common file format support (scikit-learn issue, closed)

mblondel commented on May 6, 2024

It would be nice to have loaders for common file formats such as libsvm's or weka's.
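As a rough idea of what a batch libsvm loader involves: each line holds one sample as `<label> <index>:<value> ...`, listing only the non-zero features, with indices conventionally 1-based. The sketch below is a hypothetical, stdlib-only helper (`parse_libsvm_lines` is an illustrative name, not scikit-learn's API). It returns the labels plus one dict per row and strips trailing `#` comments, as the related svmlight variant allows. It does not handle `qid:` fields or build a sparse matrix.

```python
def parse_libsvm_lines(lines):
    """Parse libsvm-format lines into (labels, rows); each row maps index -> value."""
    labels, rows = [], []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop trailing comments and whitespace
        if not line:
            continue  # skip blank lines
        label, *pairs = line.split()  # first token is the label, the rest are index:value
        labels.append(float(label))
        row = {}
        for pair in pairs:
            idx, val = pair.split(":", 1)
            row[int(idx)] = float(val)
        rows.append(row)
    return labels, rows
```

For example, `parse_libsvm_lines(["1 1:0.5 3:2.0", "-1 2:1.5"])` gives the labels `[1.0, -1.0]` and the rows `[{1: 0.5, 3: 2.0}, {2: 1.5}]`. A real loader would turn the dicts into a SciPy sparse matrix.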


Comments (9)

ogrisel commented on May 6, 2024

I think we should open a sub-issue for each format, with online and batch loading variants, as soon as someone starts working on it. We can keep this issue to discuss the global design.

As for the libsvm format, we should also have a look at the Vowpal Wabbit format, which generalizes it (importance weights for both samples and features; informative tags, i.e. human-readable sample ids that are not used in training but help with debugging and results display; feature namespaces; feature cross products; ...).

See: http://hunch.net/~vw/vw_tutorial.pdf
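To make the comparison with libsvm concrete, here is a minimal sketch of parsing a VW-style line. It assumes a simplified subset of the format: a header `label [importance [tag]]` before the first `|`, then namespace sections, each of which starts with `name[:weight]` (or with a space for the default namespace) followed by `feature[:value]` items. The helper `parse_vw_line` is hypothetical. It multiplies each namespace weight into that namespace's feature values. VW details such as the base value, quoted tags and feature hashing are ignored. Cross products are not parsed at all, since VW builds them between namespaces at training time.

```python
def parse_vw_line(line):
    """Parse a simplified VW line into (label, importance, tag, features).

    features maps (namespace, feature_name) -> value, scaled by the namespace weight.
    """
    header, *sections = line.rstrip("\n").split("|")
    tokens = header.split()
    label = float(tokens[0]) if tokens else None
    importance = float(tokens[1]) if len(tokens) > 1 else 1.0  # default sample weight
    tag = tokens[2] if len(tokens) > 2 else None  # human-readable id, not a feature
    features = {}
    for section in sections:
        if not section or section[0].isspace():
            # "| a b" (space after the bar) means the default, unnamed namespace
            namespace, ns_weight, items = "", 1.0, section.split()
        else:
            first, *items = section.split()
            namespace, _, weight = first.partition(":")
            ns_weight = float(weight) if weight else 1.0
        for item in items:
            name, _, value = item.partition(":")
            features[(namespace, name)] = ns_weight * (float(value) if value else 1.0)
    return label, importance, tag, features
```

With this sketch, `parse_vw_line("1 2.0 zebra|animal:0.5 stripes legs:4 | height:1.5")` yields label `1.0`, importance `2.0`, tag `"zebra"`, and the features `{("animal", "stripes"): 0.5, ("animal", "legs"): 2.0, ("", "height"): 1.5}`.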


mblondel commented on May 6, 2024

Good points. Also, it would probably help design a better API if we had at least one algorithm with partial_fit implemented to play with.


mblondel commented on May 6, 2024

For the online variant, we probably want to add an n_passes parameter so that the iterator makes several passes over the dataset (i.e., it will yield the same chunks several times).
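A minimal sketch of such an iterator, assuming the source can be reopened for each pass. `iter_chunks` and its `open_source` argument are hypothetical names: `open_source` is a zero-argument callable that returns a fresh iterable, e.g. `lambda: open(path)`. Reopening it is what makes the repeated passes possible.

```python
from itertools import islice


def iter_chunks(open_source, chunk_size, n_passes=1):
    """Yield lists of up to chunk_size records, replaying the source n_passes times."""
    for _ in range(n_passes):
        records = iter(open_source())  # fresh iterator at the start of each pass
        while True:
            chunk = list(islice(records, chunk_size))
            if not chunk:
                break  # this pass is exhausted
            yield chunk
```

For instance, `list(iter_chunks(lambda: [1, 2, 3, 4, 5], 2, n_passes=2))` yields `[[1, 2], [3, 4], [5], [1, 2], [3, 4], [5]]`. The drawback is that the loader, not the caller, fixes the number of passes up front.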


ogrisel commented on May 6, 2024

Or we could let the caller decide how many passes it wants by implementing a reset() method for sources that support it (files, SQL databases, ...). stdin or networked data streams are not resettable unless you dump a local copy, which is what VW does (after feature hashing), if I am not mistaken.
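One way to picture the reset() alternative is the sketch below. The class names `FileSource` and `CachedStream` and the helper `run_passes` are all hypothetical. A seekable file rewinds by seeking to the start. A one-shot stream becomes resettable by keeping a local copy of everything read so far. Unlike VW, this copy holds raw items in memory rather than hashed features on disk.

```python
import io


class FileSource:
    """Resettable source over a seekable file object: reset() seeks back to the start."""

    def __init__(self, fileobj):
        self.fileobj = fileobj

    def __iter__(self):
        return self

    def __next__(self):
        line = self.fileobj.readline()
        if not line:
            raise StopIteration
        return line

    def reset(self):
        self.fileobj.seek(0)


class CachedStream:
    """Makes a one-shot stream (stdin, socket) resettable by caching what it reads."""

    def __init__(self, stream):
        self._stream = iter(stream)
        self._cache = []
        self._pos = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._pos == len(self._cache):
            # past the cached copy: pull a new item (StopIteration ends the pass)
            self._cache.append(next(self._stream))
        item = self._cache[self._pos]
        self._pos += 1
        return item

    def reset(self):
        self._pos = 0  # later passes replay the local copy


def run_passes(source, n_passes):
    """Caller-side loop: the caller, not the loader, decides how many passes to make."""
    seen = []
    for _ in range(n_passes):
        seen.extend(source)
        source.reset()
    return seen


# Usage: a file-backed source and a one-shot stream, each read twice.
file_passes = run_passes(FileSource(io.StringIO("a\nb\n")), n_passes=2)
stream_passes = run_passes(CachedStream(iter([1, 2, 3])), n_passes=2)
```

Both sources expose the same small interface, so the orchestrating code does not need to know whether a rewind is a cheap seek or a replay of a cached copy.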


mblondel commented on May 6, 2024

Good idea. By caller, I guess you mean the algorithm in partial_fit?


mblondel commented on May 6, 2024

Oops, that would be outside partial_fit, of course.


ogrisel commented on May 6, 2024

Yes, it's the stream puller / orchestrator that wraps both the input and the models with partial_fit and writes to the output stream.
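A minimal sketch of such an orchestrator, under assumptions not made in the thread. Input arrives as `(X, y)` chunks. Each model exposes `partial_fit(X, y)` and `predict(X)`. Predictions are written to the output stream *before* the model sees the chunk, so the log reflects performance on unseen data. `orchestrate` and the toy `MeanRegressor` are hypothetical stand-ins, not scikit-learn classes.

```python
import io


class MeanRegressor:
    """Toy model: predicts the running mean of all targets seen so far."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def partial_fit(self, X, y):
        self.total += sum(y)
        self.count += len(y)
        return self

    def predict(self, X):
        mean = self.total / self.count if self.count else 0.0
        return [mean] * len(X)


def orchestrate(chunks, models, output):
    """Stream puller: feed each (X, y) chunk to every model and log its predictions."""
    for i, (X, y) in enumerate(chunks):
        for name, model in models.items():
            for pred in model.predict(X):  # predict first, on data not yet trained on
                output.write(f"{i}\t{name}\t{pred}\n")
            model.partial_fit(X, y)  # then update the model with this chunk


out = io.StringIO()
orchestrate([([[0], [1]], [2.0, 4.0]), ([[2]], [6.0])], {"mean": MeanRegressor()}, out)
```

Here `out` ends up holding two `0.0` predictions for the first chunk and one `3.0` prediction for the second chunk. Swapping in a real estimator with `partial_fit` would only change the `models` dict.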


larsmans commented on May 6, 2024

Implemented a tentative (batch) loader for Weka's ARFF format. It's in my branch arff. It can load the Iris dataset, but that's just about it.
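To give a sense of the scope of such a batch ARFF loader, here is a minimal sketch covering just what a dense dataset like Iris needs. It handles `%` comments, `@RELATION`, `@ATTRIBUTE` with numeric (`NUMERIC`/`REAL`/`INTEGER`) or nominal `{a,b,...}` types, and comma-separated rows after `@DATA`. The helper `parse_arff` is hypothetical. It does not handle sparse rows, `?` missing values, quoted names, or string and date attributes. For real use, SciPy's `scipy.io.arff.loadarff` already reads ARFF files.

```python
def parse_arff(lines):
    """Parse a dense ARFF file with numeric and nominal attributes.

    Returns (relation, attributes, rows); each attribute is (name, kind) where
    kind is "numeric" or the list of allowed nominal values.
    """
    relation, attributes, rows, in_data = None, [], [], False
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        lower = line.lower()  # ARFF keywords are case-insensitive
        if in_data:
            values = [v.strip() for v in line.split(",")]
            rows.append([float(v) if kind == "numeric" else v
                         for (_, kind), v in zip(attributes, values)])
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):
                kind = [v.strip() for v in spec.strip("{}").split(",")]
            elif spec.lower() in ("numeric", "real", "integer"):
                kind = "numeric"
            else:
                raise ValueError(f"unsupported attribute type: {spec}")
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows
```

Even this restricted subset needs header parsing, per-attribute type dispatch and comment handling before any data is read.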


larsmans commented on May 6, 2024

We have a LibSVM loader now, SciPy supports ARFF (Weka), and no one ever requests ARFF support, so I guess it's not a popular file format at all (understandable, since it's quite a pain to work with). I'm closing this issue; hope you don't mind.
