Comments (7)
I'm not sure I understand the question. At the moment Tribuo doesn't have any implementations of feature selection wrappers. To add one you need to implement org.tribuo.FeatureSelector with the desired algorithm. The SelectedFeatureSet produced by a run of the algorithm can be saved out, and you can produce a dataset containing only the selected features by constructing a SelectedFeatureDataset.
from tribuo.
You should keep the test set used by the wrapper completely separate from the test set used to evaluate the final classifier, so you need to split your data into at least three chunks: a train set for the wrapper, a test set for the wrapper, and a final test set. You can also train the final classifier on the wrapper's train and test sets combined if you want, but that's not necessary. Inside the wrapper you can use a fixed train/test split, cross validation, or a fresh random split for each feature set, but all three of those options operate on whatever data you pass into the wrapper, which should be separate from your final test set.
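As a rough illustration of the three-chunk split described above, here is a minimal sketch in plain Java. It works on example indices only and does not use any Tribuo API; the class name `ThreeWaySplit` and its fields are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical helper (not part of Tribuo): splits example indices into three disjoint chunks. */
final class ThreeWaySplit {
    final List<Integer> wrapperTrain; // used to train each candidate inside the wrapper
    final List<Integer> wrapperTest;  // used to score each candidate inside the wrapper
    final List<Integer> finalTest;    // never shown to the wrapper, only to the final evaluation

    ThreeWaySplit(int numExamples, double wrapperTrainFraction, double wrapperTestFraction, long seed) {
        if (wrapperTrainFraction <= 0.0 || wrapperTestFraction <= 0.0
                || wrapperTrainFraction + wrapperTestFraction >= 1.0) {
            throw new IllegalArgumentException("Fractions must be positive and sum to less than 1");
        }
        // Shuffle the indices once with a fixed seed so the split is reproducible.
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < numExamples; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, new Random(seed));

        int trainEnd = (int) Math.round(numExamples * wrapperTrainFraction);
        int testEnd = Math.min(numExamples, trainEnd + (int) Math.round(numExamples * wrapperTestFraction));

        wrapperTrain = List.copyOf(indices.subList(0, trainEnd));
        wrapperTest = List.copyOf(indices.subList(trainEnd, testEnd));
        finalTest = List.copyOf(indices.subList(testEnd, numExamples));
    }
}
```

With 100 examples and fractions 0.4 and 0.3, the wrapper would see 70 examples in total and the remaining 30 would be held back for the final evaluation.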
That's all I need to know for now. If I have further concerns I may reopen this issue.
Dear Adam,
I implemented a wrapper FS based on the Cuckoo search algorithm, and I would like your opinion on this:
// Backslashes in a Java string literal must be escaped.
var data = new CSVLoader<>(new LabelFactory()).loadDataSource(Paths.get("C:\\Users\\20187\\Desktop\\o.csv"), "Class");
var dataSplitter = new TrainTestSplitter<Label>(data, 0.5, Trainer.DEFAULT_SEED);
var trainingPart = new MutableDataset<Label>(dataSplitter.getTrain());
var testingPart = new MutableDataset<Label>(dataSplitter.getTest());
var opt = new CuckooSearchOptimizer(testingPart,
        TransferFunction.TransferFunction_V2,
        50,
        2,
        2,
        0.1,
        1.5,
        10);
var sfs = opt.select(trainingPart);
This is what the algorithm looks like. My concern is about passing the test part to the constructor: I think the code could be cleaner, but a wrapper FS has to train and test each solution in the population, so it needs both the train and test portions. My suggestion is to pass the data source to the FS algorithm instead, such as:
// Backslashes in a Java string literal must be escaped.
var data = new CSVLoader<>(new LabelFactory()).loadDataSource(Paths.get("C:\\Users\\20187\\Desktop\\o.csv"), "Class");
var opt = new CuckooSearchOptimizer(data,
        TransferFunction.TransferFunction_V2,
        50,
        2,
        2,
        0.1,
        1.5,
        10);
var sfs = opt.getSelectedFeature();
The class would also expose some other methods to get all the needed information.
Please tell me if there is a more appropriate solution for this.
I would pass the feature selection algorithm a dataset and have it split that internally, controlled by a parameter. DataSources should only be converted into Datasets, nothing should really be processing them in the DataSource form.
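One way to read this suggestion is sketched below in plain Java, without any Tribuo types: the wrapper receives a single collection of examples and performs the train/test split itself, with the split fraction supplied as a constructor parameter. The class `InternalSplitWrapper`, its `evaluate` method, and the scorer callback are all hypothetical names for this illustration, not the Tribuo API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.function.ToDoubleBiFunction;

/** Hypothetical sketch (not Tribuo API): a wrapper that splits the data it is given internally. */
final class InternalSplitWrapper<E> {
    private final double trainFraction; // the parameter that controls the internal split
    private final long seed;

    InternalSplitWrapper(double trainFraction, long seed) {
        if (trainFraction <= 0.0 || trainFraction >= 1.0) {
            throw new IllegalArgumentException("trainFraction must be in (0, 1)");
        }
        this.trainFraction = trainFraction;
        this.seed = seed;
    }

    /**
     * Splits the supplied examples into train and test parts, then asks the scorer
     * to train on the first part and evaluate on the second.
     */
    double evaluate(List<E> data, ToDoubleBiFunction<List<E>, List<E>> scorer) {
        // Copy before shuffling so the caller's list is left untouched.
        List<E> shuffled = new ArrayList<>(data);
        Collections.shuffle(shuffled, new Random(seed));
        int cut = (int) Math.round(shuffled.size() * trainFraction);
        List<E> train = shuffled.subList(0, cut);
        List<E> test = shuffled.subList(cut, shuffled.size());
        return scorer.applyAsDouble(train, test);
    }
}
```

With this shape the caller never handles the wrapper's test portion directly; it only passes in the data that the wrapper is allowed to see.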
Okay. Inside the algorithm I need to train a model such as KNN (a lazy algorithm) to evaluate each solution in the population, so I need both the train and test parts inside the algorithm, and I can't do that by passing only the training part. I would like to know your suggestion.
I think 10-fold cross validation is suitable for such a task, and it solved the issue I was asking about. Now I want to add some other constructors and write some comments too. Thanks for your help. I will request to add the model to Tribuo, and I may add more wrapper approaches for FS in the near future. The code looks like this:
// Backslashes in a Java string literal must be escaped.
var data = new CSVLoader<>(new LabelFactory()).loadDataSource(Paths.get("C:\\Users\\20187\\Desktop\\o.csv"), "Class");
var dataSplitter = new TrainTestSplitter<Label>(data, 0.5, Trainer.DEFAULT_SEED);
var trainingPart = new MutableDataset<Label>(dataSplitter.getTrain());
var testingPart = new MutableDataset<Label>(dataSplitter.getTest());
var opt = new CuckooSearchOptimizer(TransferFunction.TransferFunction_V2,
        50,
        2,
        2,
        0.1,
        1.5,
        20);
// The optimizer now cross-validates internally, so it only needs the training part.
var sfs = opt.select(trainingPart);
System.out.println(sfs.featureNames().size());
var sfds = new SelectedFeatureDataset<>(trainingPart, sfs);
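For readers unfamiliar with the k-fold cross validation mentioned above, here is a minimal sketch in plain Java of how a wrapper might score one candidate feature set. It operates on example indices and is not the code used inside `CuckooSearchOptimizer` (which is not shown in this thread) nor a Tribuo API; the class `KFoldSketch` and the scorer callback are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.function.ToDoubleBiFunction;

/** Hypothetical sketch (not Tribuo API): k-fold cross validation over example indices. */
final class KFoldSketch {
    private KFoldSketch() {}

    /** Shuffles the indices and deals them round-robin into k disjoint folds. */
    static List<List<Integer>> folds(int numExamples, int k, long seed) {
        if (k < 2 || k > numExamples) {
            throw new IllegalArgumentException("k must be between 2 and numExamples");
        }
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < numExamples; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, new Random(seed));
        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) {
            folds.add(new ArrayList<>());
        }
        for (int i = 0; i < numExamples; i++) {
            folds.get(i % k).add(indices.get(i));
        }
        return folds;
    }

    /** Averages the scorer's result over k rounds; each fold is the test set exactly once. */
    static double crossValidate(int numExamples, int k, long seed,
                                ToDoubleBiFunction<List<Integer>, List<Integer>> scorer) {
        List<List<Integer>> folds = folds(numExamples, k, seed);
        double total = 0.0;
        for (int f = 0; f < k; f++) {
            List<Integer> test = folds.get(f);
            List<Integer> train = new ArrayList<>();
            for (int g = 0; g < k; g++) {
                if (g != f) {
                    train.addAll(folds.get(g));
                }
            }
            total += scorer.applyAsDouble(train, test);
        }
        return total / k;
    }
}
```

In a wrapper, the scorer would train a model (for example KNN) on the `train` indices restricted to the candidate features and return its accuracy on the `test` indices; the averaged value then serves as the fitness of that candidate.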