
Comments (11)

GoogleCodeExporter commented on August 14, 2024
I didn't mean to submit this as a defect, by the way!

Original comment by [email protected] on 7 Sep 2012 at 4:00

from randomforest-matlab.

GoogleCodeExporter commented on August 14, 2024
No issues. It's a shortcoming of almost all classifiers I know of :)

The package doesn't currently support that.

In theory, it may or may not be possible with random forests, or decision trees in general.

The trees split the examples based on the tgini value. For a new example, one could always check whether it is classified correctly by the existing tree: if so, no change to the tree is required (usually); if not, some branches of the tree may need to be recreated. Alternatively, one could recompute the tgini value at each node of the tree that the example passes through and, if the value exceeds some threshold, recreate that part of the tree. This would require saving the current tgini values at the nodes.
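The threshold-based check described above can be sketched in Python. This is a hypothetical illustration only: the function names are invented, plain Gini impurity stands in for the package's tgini, and the package itself exposes none of this.

```python
# Sketch of the threshold-based update idea: keep the impurity stored at
# each node, recompute it as new examples pass through, and flag the node
# for rebuilding if the impurity has drifted past a threshold.
# All names here are illustrative, not part of randomforest-matlab.

def gini(class_counts):
    """Gini impurity from a list of per-class example counts."""
    total = sum(class_counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in class_counts)

def needs_rebuild(saved_gini, new_counts, threshold=0.1):
    """Rebuild the subtree if impurity drifted more than `threshold`."""
    return abs(gini(new_counts) - saved_gini) > threshold

# A node that was pure at training time (all class 0)...
saved = gini([50, 0])                 # 0.0
# ...but new examples of class 1 have since reached it.
print(needs_rebuild(saved, [50, 20])) # impurity drifted well past 0.1
```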

But random forests are greedy and approximate, so you may be able to get away with a lot of approximation and not require any retraining, unless you encounter a lot of new training examples and the distribution of the data changes.

I vaguely remember a paper in the time-series domain where, in order to adapt to drift, the authors retrain a set of trees, discard a set of trees from the original forest, and treat the combined set as a forest trained on the current data.
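That rotation scheme can be sketched abstractly in Python. The "trees" here are just string placeholders standing in for trained models; the scheme itself (retire the oldest trees, add trees trained on the newest data) is the point.

```python
from collections import deque

# Sketch of the drift-adaptation scheme: keep a fixed-size forest, and on
# each new batch of data, train some fresh trees and retire the same
# number of the oldest ones. train_tree is a stand-in for a real learner.

def train_tree(batch_id):
    return f"tree_trained_on_batch_{batch_id}"

def update_forest(forest, batch_id, n_new=2):
    """Drop the n_new oldest trees, append n_new trees trained on new data."""
    for _ in range(n_new):
        forest.popleft()                     # discard oldest tree
        forest.append(train_tree(batch_id))  # add tree on current data
    return forest

forest = deque(train_tree(0) for _ in range(6))  # initial forest of 6 trees
update_forest(forest, batch_id=1)
print(len(forest))  # still 6 trees, but 2 now reflect the newest data
```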

Original comment by abhirana on 7 Sep 2012 at 6:05


GoogleCodeExporter commented on August 14, 2024
Thanks for the reply!

After I posted this, I stumbled upon MATLAB's built-in ensemble classifiers, and it appears they provide methods to merge and append to bagged trees (http://www.mathworks.co.uk/help/toolbox/stats/treebagger.append.html). I found the built-in methods to be much slower than your implementation, but I guess it might be worth a try. The documentation is very succinct; any idea how these methods work?

Cheers,

Nicolas

Original comment by [email protected] on 7 Sep 2012 at 6:10


GoogleCodeExporter commented on August 14, 2024
Another quick question: I trained very large classifiers (~10 million examples, 28 features). The memory footprint is pretty big (it easily fills my 16 GB of RAM). I was wondering if there are fields in the structure that I could safely discard to reduce the size in memory?

Original comment by [email protected] on 7 Sep 2012 at 6:17


GoogleCodeExporter commented on August 14, 2024
Cool.

It looks like they are doing tree-level merging to create a new forest, like the time-series random forest paper I mentioned. I don't think they remove existing trees from the forest, only add more trees.

I could add an append/remove function if you want? Something like append/remove(forest1, forest2, forest1_tree_indices, forest2_tree_indices).

The issue then will be how to calculate the OOB error, which depends on which training examples were used and cannot be propagated unless the training examples are the same.
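At tree level, the proposed append amounts to concatenating selected trees from the two forests and voting over the combined set. A toy Python sketch, where the function name and the callable-tree representation are purely illustrative:

```python
# Toy sketch of tree-level forest merging: a "forest" is just a list of
# trees, each tree a callable returning a class label. Appending selected
# trees from two forests builds a new forest; prediction is a majority
# vote. Nothing here is randomforest-matlab's actual API.
from collections import Counter

def append_forests(forest1, forest2, idx1, idx2):
    """New forest from chosen tree indices of forest1 and forest2."""
    return [forest1[i] for i in idx1] + [forest2[i] for i in idx2]

def predict(forest, x):
    votes = Counter(tree(x) for tree in forest)
    return votes.most_common(1)[0][0]

# Stub trees that always vote 0 or always vote 1.
f1 = [lambda x: 0, lambda x: 0, lambda x: 1]
f2 = [lambda x: 1, lambda x: 0]
merged = append_forests(f1, f2, idx1=[0, 1], idx2=[1])
print(predict(merged, x=None))  # three trees all voting 0 -> 0
```

Note that the merged forest carries no record of which training examples each tree saw out-of-bag, which is exactly the OOB-error bookkeeping problem mentioned above.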


About the memory requirements: right now there is actually no straightforward fix. The issue is that the arrays are stored with size ntree x max-number-of-nodes, and most trees are much smaller than the maximum node count, so memory is wasted. If the arrays were stored as max-number-of-nodes x ntree instead, one could dynamically grow the node dimension when a tree gets bigger. Actually, come to think of it, append/remove may not be possible due to this memory arrangement.
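The waste from the fixed ntree x max-nodes layout is easy to quantify with back-of-the-envelope arithmetic (Python, with invented tree sizes):

```python
# Compare the two storage layouts described above: a dense
# ntree x max_nnodes array pads every tree to the size of the largest
# tree, whereas a ragged per-tree layout stores only the nodes each tree
# actually has. The node counts below are hypothetical.
node_counts = [1200, 800, 15000, 950, 1100]   # nodes per tree (made up)
max_nnodes = max(node_counts)

dense_cells = len(node_counts) * max_nnodes   # fixed-size layout
ragged_cells = sum(node_counts)               # per-tree layout
wasted = dense_cells - ragged_cells

print(dense_cells, ragged_cells, wasted)
print(f"{wasted / dense_cells:.0%} of the dense layout is padding")
```

One unusually deep tree inflates the storage of every other tree, which is why ragged (or transposed, growable) storage helps so much.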

I can give it a try if you cannot find a reasonable solution.

Original comment by abhirana on 7 Sep 2012 at 6:26


GoogleCodeExporter commented on August 14, 2024
Well, if you could give it a try that would be great! I will evaluate the built-in ensemble methods, but sadly my preliminary tests were not very encouraging.


Original comment by [email protected] on 7 Sep 2012 at 6:43


GoogleCodeExporter commented on August 14, 2024
OK, let me give that a try. It will take me about a week or so to get it done.

Original comment by abhirana on 7 Sep 2012 at 6:47


GoogleCodeExporter commented on August 14, 2024
Thanks, much appreciated!

Original comment by [email protected] on 17 Sep 2012 at 5:13


GoogleCodeExporter commented on August 14, 2024
Abhirana,

I was wondering if you have had a chance to try implementing this feature?

Thanks!

Nicolas

Original comment by [email protected] on 17 Jan 2013 at 1:43


GoogleCodeExporter commented on August 14, 2024
Thanks, Nicolas, for the reminder. I apologize for the delay.

I found out that it will require major rewriting of the code, and I haven't had a chance to make the changes and test them yet. (I am planning to defend my PhD in a month or two, so that is what's distracting me for the time being.)

I'll make it a priority to get it done soon. I am guessing that, for the time being, I can implement it for classification (do tell me if regression is more important). Is support for categorical data required for now? And what about variable importance?

Original comment by abhirana on 21 Jan 2013 at 5:19

  • Changed state: Started


GoogleCodeExporter commented on August 14, 2024
Thanks for taking the time to work on this. I only use classification, I do not use categorical data, and importance is not critical for my work.

Good luck with the PhD defense, my turn is only a few months away :)


Original comment by [email protected] on 21 Jan 2013 at 9:43

