Comments (9)
Non-contributor here:
Which algorithms do you need? Do you have too much data to fit in memory, even with sparse matrices?
I think the following algorithms would need only minimal changes:
- SGDClassifier, SGDRegressor (already available in scikit-learn with partial_fit; only slightly different)
- AdaGradClassifier, AdaGradRegressor (slightly more work depending on internals)
- SAGClassifier, SAGRegressor (slightly more work depending on internals)
Impossible algorithm-wise (these are batch methods that need the full gradient):
- FistaClassifier, FistaRegressor
- SVRGClassifier, SVRGRegressor
These might work, but I'm unsure about the theory (there may be constraints on partial_fit, e.g. how to call it and with which data):
- CDClassifier, CDRegressor
- SDCAClassifier, SDCARegressor
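For the SGD case, the streaming pattern already works out of the box in scikit-learn. A minimal sketch, with synthetic data and illustrative hyperparameters:

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.RandomState(0)
model = SGDRegressor(learning_rate="constant", eta0=0.01, random_state=0)

# Feed the data in mini-batches instead of all at once; partial_fit
# updates the coefficients without resetting them between calls.
for _ in range(100):
    X = rng.randn(32, 5)                          # one batch of 32 samples
    y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0])  # noiseless linear target
    model.partial_fit(X, y)

print(model.coef_)  # should approach [1, -2, 0.5, 0, 3]
```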
from lightning.
Thanks for the detailed overview!
I'm in a reinforcement learning setup where the whole dataset is never available up front, and I want a regression model that learns from the data seen so far without retraining from scratch. I'd like to try an optimisation algorithm with an adaptive learning rate or momentum, and lightning has a good AdaGradRegressor implementation.
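The AdaGrad update being asked for is small; the only extra state to carry between calls is the per-coordinate squared-gradient accumulator. A toy NumPy sketch, with illustrative names and step size (not lightning's internals):

```python
import numpy as np

def adagrad_step(w, grad, G, eta=0.1, eps=1e-8):
    """One AdaGrad update: the effective learning rate shrinks
    per coordinate as squared gradients accumulate in G."""
    G += grad ** 2
    w -= eta * grad / (np.sqrt(G) + eps)
    return w, G

# Toy usage: fit a single linear equation with squared loss.
w, G = np.zeros(3), np.zeros(3)
x, y = np.array([1.0, 2.0, 0.0]), 1.0
for _ in range(200):
    grad = (w @ x - y) * x        # gradient of 0.5 * (w.x - y)^2
    w, G = adagrad_step(w, grad, G)
```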
Let's see what the developers think.
Just two random remarks:
- Did you try carefully tuned vanilla SGD (the scikit-learn version with partial_fit) for your use case? I'm sceptical that AdaGrad is much better, but this may depend on your data, and I'm not an expert.
- There is a warm_start option in CDClassifier and SDCAClassifier... Maybe there is a clever way to incorporate these possibilities into your setup.
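To illustrate the warm_start idea: refit on the accumulated data, but let each fit start from the previous solution instead of from scratch. A minimal sketch with scikit-learn's LogisticRegression, which has the same option (the pattern should carry over to lightning's CDClassifier, assuming its warm_start behaves similarly):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
X = rng.randn(500, 4)
y = (X @ np.array([2.0, -1.0, 0.0, 1.0]) > 0).astype(int)

# warm_start=True: each fit() resumes from the previous coefficients,
# so refitting on a grown dataset converges in fewer iterations.
clf = LogisticRegression(warm_start=True, max_iter=50)
for n in (100, 250, 500):        # data "arrives" in growing chunks
    clf.fit(X[:n], y[:n])
```

Unlike partial_fit, each call still passes over all the accumulated data, so this trades compute for not having to touch the solver internals.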
Yeah, I'm using vanilla SGD now; it works OK. The problem is that the component should work across many tasks, and it'd be nice to have fewer parameters to tune.
I was just about to start an issue on this. I'm training models on a really big file, so the data won't fit in memory at once. Streaming and parallelization are the only way to use the data. Vanilla SGD from scikit-learn requires tuning and doesn't improve with multiple iterations. The FTRL from Kaggler.py works better, but can't be pickled.
I had a look at modifying scikit-lightning for this. The outputs_2d_ initialization in fit() should be moved to __init__(), but the Cython part should also be modified so that it doesn't reset the model parameters when partial_fit is called. Would it be possible to get these changes?
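The contract being proposed can be shown with a toy estimator: allocate the model state lazily on the first partial_fit call and never reset it on later calls (illustrative code, not lightning's actual internals):

```python
import numpy as np

class TinyOnlineRegressor:
    """Skeleton of the partial_fit contract: state is created once,
    on the first call, and only updated afterwards."""

    def __init__(self, lr=0.5):
        self.lr = lr
        self.coef_ = None                  # allocated lazily, never reset

    def partial_fit(self, X, y):
        X = np.asarray(X, dtype=float)
        if self.coef_ is None:             # first call: allocate only
            self.coef_ = np.zeros(X.shape[1])
        # later calls keep refining the same coefficients
        grad = X.T @ (X @ self.coef_ - y) / len(y)
        self.coef_ -= self.lr * grad
        return self
```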
Hi,
A patch that implements partial_fit would definitely be a nice addition!
Please submit a patch with the modifications you propose; I'll allocate time to review them.
I didn't get a patch written; I hacked the code first to see how easily it could be done. I think I got it working for the AdaGradRegressor case, but the results were not good, so I probably missed something. The results from AdaGrad without my hack weren't much better than SGD on my data, and FTRL from Kaggler was vastly better; this seems to be a general result for SGD vs. FTRL on high-dimensional data. Anyway, I got a partial_fit FTRL working by adding model pickling to Kaggler instead. I could look at contributing to lightning later.
Attached is the hack I wrote, in case someone wants to continue from that.
adagrad.py.txt
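For reference, the FTRL-proximal update that Kaggler implements (McMahan et al., 2013) is compact enough to sketch per coordinate in NumPy; the hyperparameter names follow the paper and the values are illustrative:

```python
import numpy as np

def ftrl_weights(z, n, alpha=0.5, beta=1.0, l1=0.1, l2=0.1):
    """Closed-form lazy weights: a coordinate stays exactly zero until
    |z_i| exceeds the L1 threshold, which is where FTRL's sparsity comes from."""
    w = -(z - np.sign(z) * l1) / ((beta + np.sqrt(n)) / alpha + l2)
    return np.where(np.abs(z) > l1, w, 0.0)

def ftrl_step(z, n, x, y, alpha=0.5, beta=1.0, l1=0.1, l2=0.1):
    """One online FTRL-proximal update for logistic loss on sample (x, y)."""
    w = ftrl_weights(z, n, alpha, beta, l1, l2)
    p = 1.0 / (1.0 + np.exp(-(x @ w)))           # predict before updating
    g = (p - y) * x                              # logistic-loss gradient
    sigma = (np.sqrt(n + g * g) - np.sqrt(n)) / alpha
    z += g - sigma * w
    n += g * g
    return z, n
```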
partial_fit is already supported in scikit-learn's SGD so I think we should focus on AdaGrad first.
@anttttti If you start a PR, we can help you track down the problem. Also make sure to write a unit test that checks that calling partial_fit multiple times is equivalent to fit.
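The suggested check could look like the sketch below, written against scikit-learn's SGDRegressor since lightning's AdaGrad estimators don't expose partial_fit yet. Exact equivalence only holds when both paths perform the identical sequence of per-sample updates, hence shuffle=False, a constant learning rate, and a single epoch:

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

def make_model():
    # shuffle=False + constant eta: fit() over one epoch performs the
    # same per-sample updates as partial_fit() over consecutive batches
    return SGDRegressor(learning_rate="constant", eta0=0.01,
                        shuffle=False, max_iter=1, tol=None, random_state=0)

rng = np.random.RandomState(0)
X = rng.randn(100, 3)
y = X @ np.array([1.0, -1.0, 2.0])

full = make_model().fit(X, y)

batched = make_model()
for start in range(0, 100, 25):                    # four batches of 25
    batched.partial_fit(X[start:start + 25], y[start:start + 25])

assert np.allclose(full.coef_, batched.coef_)
```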
I've made a version of FTRL available as part of a package I released:
https://github.com/anttttti/Wordbatch/blob/master/wordbatch/models/ftrl.pyx
It supports partial_fit and online learning, weighted features, a link function for classification/regression, and does instance-level parallelization with OpenMP prange.
This script probably won't fit the scope of current sklearn-contrib-lightning, so I've released it independently for now.