Comments (9)

ogrisel commented on May 6, 2024

I think we should open a sub-issue for each format, with online and batch loading variants, as soon as someone starts working on it. We can keep this issue to discuss the global design.

As for the libsvm format, we should also look at the Vowpal Wabbit format, which generalizes it (importance weights for both samples and features; informative tags, i.e. human-readable sample ids that are not used in training but help with debugging and results display; feature namespaces; feature cross products; ...).

See: http://hunch.net/~vw/vw_tutorial.pdf
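To make the comparison with libsvm concrete, here is a minimal sketch of a parser for a simplified subset of the Vowpal Wabbit line format: an optional label, an optional importance weight, an optional tag introduced by a quote, and `|`-separated namespaces holding `name[:value]` features. The names `VWExample` and `parse_vw_line` are hypothetical, and feature cross products and other parts of the real format are not handled.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class VWExample:
    """Hypothetical container for one parsed line."""
    label: Optional[float]
    importance: float
    tag: Optional[str]
    # namespace name -> {feature name -> value}
    namespaces: Dict[str, Dict[str, float]] = field(default_factory=dict)


def parse_vw_line(line):
    """Parse "label [importance] ['tag]|ns[:scale] feat[:value] ..." (simplified)."""
    header, *sections = line.strip().split("|")
    label, importance, tag = None, 1.0, None
    numbers = []
    for token in header.split():
        if token.startswith("'"):
            tag = token[1:]  # tag is kept for display only, never used as a feature
        else:
            numbers.append(float(token))
    if numbers:
        label = numbers[0]
    if len(numbers) > 1:
        importance = numbers[1]

    example = VWExample(label, importance, tag)
    for section in sections:
        tokens = section.split()
        name, scale = "", 1.0  # "" stands for the default namespace
        # A namespace name follows the bar immediately, with no space.
        if section and not section[0].isspace() and tokens:
            name, _, raw_scale = tokens.pop(0).partition(":")
            scale = float(raw_scale) if raw_scale else 1.0
        features = example.namespaces.setdefault(name, {})
        for token in tokens:
            feat, _, raw_value = token.partition(":")
            # A feature without an explicit value counts as 1.0.
            features[feat] = scale * (float(raw_value) if raw_value else 1.0)
    return example


example = parse_vw_line("1 2.0 'doc42|text hello:2 world |meta:0.5 length:3")
```

A libsvm line carries only a label and `index:value` pairs, so the sample weight, the tag, and the namespaces are the parts a loader would need extra output slots for.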

from scikit-learn.

mblondel commented on May 6, 2024

Good points. Also, it would probably help design a better API if we had at least one algorithm with partial_fit implemented to play with.

mblondel commented on May 6, 2024

For the online variant, we probably want to add an n_passes parameter so that the iterator makes several passes over the dataset (i.e., it will yield the same chunks several times).
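A minimal sketch of that idea, assuming the dataset is any re-iterable collection of samples. The function name `iter_chunks` and the `chunk_size` parameter are hypothetical; only `n_passes` comes from the comment above.

```python
def iter_chunks(samples, chunk_size, n_passes=1):
    """Yield lists of at most chunk_size samples, going over samples n_passes times.

    Assumes samples can be iterated more than once (a list, or an object
    whose __iter__ reopens the underlying file).
    """
    for _ in range(n_passes):
        chunk = []
        for sample in samples:
            chunk.append(sample)
            if len(chunk) == chunk_size:
                yield chunk
                chunk = []
        if chunk:
            # Last, possibly shorter, chunk of this pass.
            yield chunk


chunks = list(iter_chunks([1, 2, 3, 4, 5], chunk_size=2, n_passes=2))
```

With this shape the number of passes is fixed when the iterator is created, so the consumer cannot stop early on convergence without breaking out of the loop itself.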

ogrisel commented on May 6, 2024

Or we could let the caller decide how many passes it wants by implementing a reset() method for sources that support it (files, SQL databases, ...). stdin and networked data streams are not resettable unless you dump a local copy, which is what VW does (after feature hashing) if I am not mistaken.
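One way this could look, as a sketch only: the class names `FileSource` and `StreamSource`, the `resettable` flag, and the helper `run_passes` are all hypothetical. Only the `reset()` method itself is taken from the comment above.

```python
import io


class FileSource:
    """Line-oriented source backed by a seekable file object."""

    resettable = True

    def __init__(self, fileobj):
        self._file = fileobj

    def __iter__(self):
        return (line.rstrip("\n") for line in self._file)

    def reset(self):
        # Rewind so that the next iteration starts from the first line.
        self._file.seek(0)


class StreamSource:
    """Source for stdin or a socket: a single pass only."""

    resettable = False

    def __init__(self, stream):
        self._stream = stream

    def __iter__(self):
        return (line.rstrip("\n") for line in self._stream)

    def reset(self):
        raise NotImplementedError("this stream cannot be rewound")


def run_passes(source, consume, n_passes):
    """Caller-side loop: the caller, not the source, decides how many passes to make."""
    for pass_index in range(n_passes):
        if pass_index > 0:
            source.reset()
        for item in source:
            consume(item)


seen = []
run_passes(FileSource(io.StringIO("a\nb\n")), seen.append, n_passes=2)
```

A non-resettable stream could be made resettable by wrapping it in a source that writes each item to a local cache file during the first pass and replays the cache afterwards.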

mblondel commented on May 6, 2024

Good idea. By caller, I guess you mean the algorithm in partial_fit?

mblondel commented on May 6, 2024

Oops, that would be outside partial_fit, of course.

ogrisel commented on May 6, 2024

Yes, it's the stream puller / orchestrator that wraps both the input and the models with partial_fit and writes to the output stream.
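A rough sketch of such an orchestrator, under stated assumptions: `RunningMean` is a toy stand-in for any estimator that exposes `partial_fit` and `predict`, and `pull_stream` is a hypothetical name. Neither is an existing scikit-learn API.

```python
import io


class RunningMean:
    """Toy model: predicts the mean of all targets seen so far."""

    def __init__(self):
        self.n_seen = 0
        self.mean = 0.0

    def partial_fit(self, X, y):
        # Incremental mean update, one target at a time.
        for target in y:
            self.n_seen += 1
            self.mean += (target - self.mean) / self.n_seen
        return self

    def predict(self, X):
        return [self.mean for _ in X]


def pull_stream(chunks, model, output):
    """Pull (X, y) chunks from the input, update the model, write predictions."""
    for X, y in chunks:
        model.partial_fit(X, y)
        for prediction in model.predict(X):
            output.write(f"{prediction}\n")
    return model


out = io.StringIO()
chunks = [([[0], [1]], [1.0, 3.0]), ([[2]], [5.0])]
model = pull_stream(chunks, RunningMean(), out)
```

The model never sees the input source or the output stream, so decisions such as the number of passes or whether to reset the source stay in the orchestrator.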

larsmans commented on May 6, 2024

Implemented a tentative (batch) loader for Weka's ARFF format. It's in my branch arff. It can load the Iris dataset, but that's just about it.

larsmans commented on May 6, 2024

We have LibSVM now, SciPy supports ARFF (Weka), and no one ever requests ARFF support, so I guess it's not a popular file format at all (understandable, since it's quite a pain to work with). I'm closing this issue; hope you don't mind.
