Comments (4)
CSVLoader returns a CSVDataSource. The DataSource interface doesn't have much in the way of accessor methods, so you should construct a MutableDataset from that data source, which will populate the feature & output information objects that you can query. If you want to print out the examples, you can iterate the data source and print each Example object.
Tribuo has a row-wise view of data, and doesn't provide a data frame style interface. If you want something more like a dataframe in Java then I think JTablesaw is supposed to be good for that, but I've not used it much.
from tribuo.
Hi there, thanks for your quick reply.
So when passing in data, I want to make sure that it is well-formed, but it looks like there is no way to check that once it has been loaded into a CSVDataSource.
I would prefer to first load the data from CSV into something like JTablesaw, and then pass it from JTablesaw into a Tribuo DataSource.
I'm wondering if this is possible?
Hope you can let me know.
P.
You can inspect the examples after they have been loaded to make sure the pipeline is valid. I recommend looking at CSVDataSource rather than using CSVLoader as it's more flexible. There's a columnar data tutorial which explains the mechanisms: https://tribuo.org/learn/4.3/tutorials/columnar-tribuo-v4.html
We don't currently support loading from JTablesaw into Tribuo because we can't capture the necessary provenance & reproducibility information from a Tablesaw dataset. It would be pretty useful to have, but because of the provenance issues we've not got around to it.
Hi, thanks again.
The link you provided seems to have a lot of useful concepts etc.
Yes, having something like JTablesaw load the CSV first and then pass it on to something like CSVDataSource would be really good, I think. It hands responsibility for the integrity of the data to the data science person: they are the subject matter experts, and they should be able to look into the data frame (in this case JTablesaw) and decide whether the data is in proper shape to pass into the CSVDataSource. Allowing for human intervention, especially at the data source stage of the data pipeline, is very valuable because it gives the data science person more control over data quality. This kind of workflow should be available as an option in Tribuo. I just wanted to elaborate on my thinking on this.
Thanks again for all your great help, really appreciate it.
Best Regards,
P
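For readers who want a check on the raw file before it reaches Tribuo, the pre-load integrity check discussed above could be roughly sketched with the JDK alone. This is not part of Tribuo or Tablesaw: the class name CsvPreflight, its check method and the sample column names are hypothetical, and the sketch assumes a header row and plain comma separation with no quoted fields.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Tribuo: checks raw CSV lines before they are handed to a loader.
class CsvPreflight {

    // Returns a human-readable list of problems; an empty list means every check passed.
    // Assumes line 0 is the header and that fields are comma separated without quoting.
    // Row numbers in the messages count data rows from 1 (the header is row 0).
    static List<String> check(List<String> lines, Set<String> numericColumns) {
        List<String> problems = new ArrayList<>();
        if (lines.isEmpty()) {
            problems.add("file is empty");
            return problems;
        }
        // The -1 limit keeps trailing empty fields, so "a,b," counts as three fields.
        String[] header = lines.get(0).split(",", -1);
        for (int row = 1; row < lines.size(); row++) {
            String[] cells = lines.get(row).split(",", -1);
            if (cells.length != header.length) {
                problems.add("row " + row + ": expected " + header.length
                        + " fields, found " + cells.length);
                continue;
            }
            for (int col = 0; col < cells.length; col++) {
                String name = header[col].trim();
                String cell = cells[col].trim();
                if (cell.isEmpty()) {
                    problems.add("row " + row + ": column '" + name + "' is blank");
                } else if (numericColumns.contains(name) && !isNumber(cell)) {
                    problems.add("row " + row + ": column '" + name
                            + "' is not numeric: " + cell);
                }
            }
        }
        return problems;
    }

    // True when the text parses as a double.
    private static boolean isNumber(String text) {
        try {
            Double.parseDouble(text);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Made-up sample rows: one bad number, one blank cell, one short row.
        List<String> lines = List.of(
                "sepalLength,sepalWidth,species",
                "5.1,3.5,setosa",
                "abc,3.0,setosa",
                "4.7,,setosa",
                "4.6,3.1");
        check(lines, Set.of("sepalLength", "sepalWidth")).forEach(System.out::println);
    }
}
```

In practice the lines could come from java.nio.file.Files.readAllLines, and the file would only be passed on to CSVLoader or CSVDataSource once the returned list is empty. Because the check runs on the file itself, Tribuo still loads the data from the original CSV and its provenance is unaffected.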