Comments (11)
I didn't mean to submit this as a defect, by the way!
Original comment by [email protected]
on 7 Sep 2012 at 4:00
from randomforest-matlab.
no issues - it's a "defect" with almost all classifiers i know of :)
the package doesn't allow you to do that.
in theory, it may or may not be possible with random forests, or with decision
trees in general.
the trees split the examples based on the tgini value. for a new example, one
could check whether the existing tree already classifies it correctly: if so,
no change to the tree is (usually) required; if it is misclassified, some
branches of the tree may need to be rebuilt.
alternatively, one could recalculate the tgini value at each node the example
passes through and, if the value exceeds some threshold, rebuild that part of
the tree. this would require saving the current tgini values at the nodes.
but random forests are greedy and approximate, so you may be able to get away
with a lot of approximation and avoid retraining entirely, unless you
encounter many new training examples and the distribution of the data changes.
i vaguely remember a paper in the time-series domain where, in order to adapt
to drift, the authors retrain a batch of new trees, discard a batch of trees
from the original forest, and treat the resulting set as a forest trained on
the current data.
Original comment by abhirana
on 7 Sep 2012 at 6:05
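The drift-adaptation scheme described above (train a batch of new trees, discard a batch of old ones, and treat the result as a forest for the current data) can be sketched outside MATLAB. The snippet below is an illustrative Python sketch using scikit-learn rather than the randomforest-matlab package, and replacing `estimators_` directly is an informal trick, not a supported API:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)
# Old concept: label depends on feature 0; new concept has drifted to feature 1.
X_old = rng.randn(200, 5)
y_old = (X_old[:, 0] > 0).astype(int)
X_new = rng.randn(200, 5)
y_new = (X_new[:, 1] > 0).astype(int)

forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_old, y_old)
fresh = RandomForestClassifier(n_estimators=10, random_state=1).fit(X_new, y_new)

# Discard the 10 oldest trees and append the 10 freshly trained ones;
# the forest size stays at 50 trees.
forest.estimators_ = forest.estimators_[10:] + fresh.estimators_
forest.n_estimators = len(forest.estimators_)

pred = forest.predict(X_new[:5])  # the updated forest still predicts normally
```

Both forests must share the same feature space and class set for the merged ensemble to vote coherently.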
Thanks for the reply!
After I posted this, I stumbled upon MATLAB's built-in ensemble classifiers,
and it appears they have methods to merge and append bagged trees
(http://www.mathworks.co.uk/help/toolbox/stats/treebagger.append.html). I found
the built-in methods to be much slower than your implementation, but I guess it
might be worth a try. The documentation is very succinct; any idea how these
methods work?
Cheers,
Nicolas
Original comment by [email protected]
on 7 Sep 2012 at 6:10
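For comparison, scikit-learn exposes an analogous "append more trees" mechanism via `warm_start`. This is a Python/scikit-learn sketch, not how TreeBagger's append works internally (the linked page doesn't specify that):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)
X = rng.randn(300, 5)
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# With warm_start=True, raising n_estimators and refitting adds new trees
# while keeping the ones already grown.
clf = RandomForestClassifier(n_estimators=20, warm_start=True, random_state=0)
clf.fit(X, y)            # grows 20 trees
clf.n_estimators = 30
clf.fit(X, y)            # grows 10 more trees, keeps the first 20
```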
Another quick question: I trained some very large classifiers (~10 million
examples, 28 features). The memory footprint is pretty big (it easily fills my
16 GB of RAM). I was wondering if there are fields in the model structure that
I could safely discard to reduce the size in memory?
Original comment by [email protected]
on 7 Sep 2012 at 6:17
cool.
looks like they are doing tree-level merging to create a new forest, like the
time-series random forest paper i mentioned. i don't think they remove
existing trees from the forest, only add more.
i could add the append/remove functions if you want? something like
append/remove(forest1, forest2, forest1_tree_indices, forest2_tree_indices)
the issue then will be how to calculate the oob error, which depends on which
training examples each forest used and cannot be propagated unless the
training examples are the same.
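A rough sketch of how the proposed append/remove signature could behave, again in Python with scikit-learn forests standing in for the package's model structs (the `merge_forests` helper is hypothetical):

```python
import copy
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def merge_forests(forest1, forest2, forest1_tree_indices, forest2_tree_indices):
    """Build a new forest from the selected trees of two fitted source forests.

    The merged forest has no meaningful OOB error: each tree's out-of-bag set
    refers to the bootstrap sample drawn from its own forest's training data,
    so the estimates cannot be combined unless both forests were trained on
    the same examples.
    """
    merged = copy.deepcopy(forest1)  # inherit classes_, n_features_in_, etc.
    merged.estimators_ = (
        [forest1.estimators_[i] for i in forest1_tree_indices]
        + [forest2.estimators_[i] for i in forest2_tree_indices]
    )
    merged.n_estimators = len(merged.estimators_)
    return merged

# Example: two small forests trained on the same data, merged by tree index.
rng = np.random.RandomState(0)
X = rng.randn(100, 4)
y = (X[:, 0] > 0).astype(int)
f1 = RandomForestClassifier(n_estimators=8, random_state=0).fit(X, y)
f2 = RandomForestClassifier(n_estimators=8, random_state=1).fit(X, y)
merged = merge_forests(f1, f2, [0, 1, 2], [0, 1])  # 3 + 2 = 5 trees
```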
about the memory requirements: right now there is no straightforward fix. the
arrays are stored with size ntree x (maximum node count), and most trees are
much smaller than the maximum, so memory is wasted. if the arrays were instead
stored as (maximum node count) x ntree, one could dynamically grow the node
dimension when a tree gets bigger. actually, come to think of it, append/remove
may not even be possible with the current memory arrangement.
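A back-of-the-envelope illustration of the waste from the fixed layout (the tree sizes below are made-up numbers, not the package's actual constants):

```python
import numpy as np

ntree = 500                  # trees in the forest
max_nodes = 100_000          # worst-case node count that sizes every row
actual_nodes = np.full(ntree, 5_000)  # suppose typical trees need ~5,000 nodes

dense_cells = ntree * max_nodes         # fixed ntree x max_nodes layout
ragged_cells = int(actual_nodes.sum())  # per-tree arrays sized to fit
wasted_fraction = 1 - ragged_cells / dense_cells  # fraction of cells unused
```

With these assumed numbers, 95% of the fixed-size array is never used, which is why per-tree (ragged) storage would help.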
i can give it a try if you cannot find a reasonable solution.
Original comment by abhirana
on 7 Sep 2012 at 6:26
Well, if you could give it a try, that would be great! I will evaluate the
built-in ensemble methods, but sadly my preliminary tests were not very
encouraging.
Original comment by [email protected]
on 7 Sep 2012 at 6:43
ok, let me give that a try; it will take me about a week or so to get it done.
Original comment by abhirana
on 7 Sep 2012 at 6:47
Thanks, much appreciated!
Original comment by [email protected]
on 17 Sep 2012 at 5:13
Abhirana,
I was wondering if you had any chance to try implementing such a feature?
Thanks!
Nicolas
Original comment by [email protected]
on 17 Jan 2013 at 1:43
thanks for the reminder, nicolas, and apologies for the delay.
it turns out this will require a major rewrite of the code, and i haven't had
a chance to make and test such big changes yet. (i'm planning to defend my phd
in a month or two, so that's what's distracting me for the time being.)
i'll make it a priority to get it done soon. i'm guessing that for the time
being i can implement it for classification only (do tell me if regression is
more important). is support for categorical data required for now? and what
about the implementation of variable importance?
Original comment by abhirana
on 21 Jan 2013 at 5:19
- Changed state: Started
Thanks for taking the time to work on this. I only use classification, do not
use categorical data, and importance is not critical for my work.
Good luck with the PhD defense, my turn is only a few months away :)
Original comment by [email protected]
on 21 Jan 2013 at 9:43