Code Monkey home page Code Monkey logo

Comments (4)

Craigacp avatar Craigacp commented on June 5, 2024

CSVLoader returns a CSVDataSource. The DataSource interface doesn't have much in the way of accessor methods, you should construct a MutableDataset from that data source which will populate the feature & output information objects that you can query. If you want to print out the examples you can iterate the data source and print each Example object.

Tribuo has a row-wise view of data, and doesn't provide a data frame style interface. If you want something more like a dataframe in Java then I think JTablesaw is supposed to be good for that, but I've not used it much.

from tribuo.

pablo3p avatar pablo3p commented on June 5, 2024

Hi there, thanks for your quick reply.
SO when passing in data, I want to make sure that it is proper, so it looks like there is no way to determine that once it is loaded and creates a CSVDataSource.
I would prefer to load then the data from CSV into something like JTablesaw, and from JTablesaw pass that into a Tribuo DataSource.
Wondering if this is possible?
Hope you can let me know.

P.

from tribuo.

Craigacp avatar Craigacp commented on June 5, 2024

You can inspect the examples after they have been loaded to make sure the pipeline is valid. I recommend looking at CSVDataSource rather than using CSVLoader as it's more flexible. There's a columnar data tutorial which explains the mechanisms - https://tribuo.org/learn/4.3/tutorials/columnar-tribuo-v4.html.

We don't currently support loading from JTablesaw into Tribuo because we can't capture the necessary provenance & reproducibility information out of a tablesaw dataset. It would be pretty useful to have though, but due to the provenance issues we've not got around to it.

from tribuo.

pablo3p avatar pablo3p commented on June 5, 2024

Hi, thanks again.
The link you provided seems to have a lot of useful concepts etc.

Yes, to have something like JTablesaw, and have that first load the CSV and then pass it onto like the CSVDataSource, I think would be really good, because you can pass on the responsibility of the "integrity" of the data to the Data Science person, because they are the subject matter experts, and they should be able to look into the DataFrame(in this case JTablesaw) and then decide that the data is in proper shape to pass into the CSVDataSource data structure. Allowing for "Human Intervention" especially at the Data-source part of the Data Pipeline, is very valuable to allow the Data Science person more control in the Data Quality aspect of the Data Pipeline. This type or kind, should be an option and should be available in Tribuo. So just wanted to elaborate on my thinking on this.
Thanks again for all your great help, really appreciate it.
Best Regards,

P

from tribuo.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.