Comments (10)
You can actually provide any (train_index, test_index) iterator to the cv parameter. sklearn's cross_validate function accepts both KFold objects and iterators (the output of a KFold object's split method) as inputs. For example:
lofo_imp = LOFOImportance(dataset, cv=GroupKFold(4).split(X, y, groups), scoring="roc_auc")
from lofo-importance.
Newer sklearn versions seem to have problems with iterables in cross_validate. Converting the iterable to a list is a workaround:
lofo_imp = LOFOImportance(dataset, cv=list(GroupKFold(n_splits=4).split(X=tr, y=tr['pressure'], groups=tr['breath_id'])), scoring="neg_mean_absolute_error")
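One plausible reason the list workaround helps, stated as an assumption rather than a confirmed diagnosis: split() returns a one-shot generator, and the traceback in this thread shows LOFO calling cross_validate from _get_cv_score once per removed feature, so every call after the first would see an exhausted iterator. The sketch below uses hypothetical toy data (X, y, groups are made up) and only sklearn calls, not LOFO itself:

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Hypothetical toy data: 12 rows in 4 groups of 3 rows each.
X = np.arange(24).reshape(12, 2)
y = np.arange(12) % 2
groups = np.repeat([0, 1, 2, 3], 3)

# split() returns a generator, which can be consumed only once.
splits = GroupKFold(n_splits=4).split(X, y, groups)
first_pass = sum(1 for _ in splits)   # consumes all folds
second_pass = sum(1 for _ in splits)  # nothing left to iterate

# Materializing the folds as a list makes them reusable across calls.
folds = list(GroupKFold(n_splits=4).split(X, y, groups))
reusable_passes = [sum(1 for _ in folds), sum(1 for _ in folds)]

print(first_pass, second_pass, reusable_passes)
```

If this is indeed the cause, an empty second pass would be consistent with the "expected 3, got 0" and "list index out of range" errors reported in this thread.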
Thanks. It would be good to add some documentation about this.
@aerdem4 I get this error when using GroupKFold:
In
cv_results = cross_validate(self.model, X, y, cv=self.cv, scoring=self.scoring, fit_params=fit_params)
ValueError: not enough values to unpack (expected 3, got 0)
The sklearn cross_validate function (which lofo-importance uses in LOFOImportance._get_cv_score) has a groups keyword argument. I forked this repo and added it there. You can see it in this PR BartlomiejSkwira#1 (it's a work in progress and still requires tests).
@aerdem4 would it be a good PR candidate for your repo?
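For readers unfamiliar with that keyword, here is a minimal sketch of passing groups straight to sklearn's cross_validate together with a GroupKFold object. It does not reproduce the code in the fork; the data, the DecisionTreeClassifier model, and the variable names are hypothetical choices for illustration:

```python
import numpy as np
from sklearn.model_selection import GroupKFold, cross_validate
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: 40 rows in 8 groups of 5 rows each.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = np.tile([0, 1], 20)
groups = np.repeat(np.arange(8), 5)

# The splitter object is passed as cv, and groups is forwarded to its split().
cv_results = cross_validate(
    DecisionTreeClassifier(random_state=0),
    X,
    y,
    cv=GroupKFold(n_splits=4),
    groups=groups,
    scoring="accuracy",
)
n_scores = len(cv_results["test_score"])
print(n_scores)
```

Because the splitter object is passed rather than its split() output, cross_validate can generate fresh folds on every call.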
@BartlomiejSkwira GroupKFold is supported with the workaround above. Your PR looks nice, but it covers only one of many validation schemes. From a minimalist point of view, I think keeping the repo free of special cases may be better. But if you have an idea for including the most common validation schemes in a generic way, you are welcome to propose it.
@aerdem4 This workaround didn't work for me. I get the following:
Traceback (most recent call last):
File "/opt/conda/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/opt/conda/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
...
<my code calling lofo_importance.get_importance()>
...
File "/opt/conda/lib/python3.8/site-packages/lofo/lofo_importance.py", line 85, in get_importance
lofo_cv_scores.append(self._get_cv_score(feature_to_remove=f))
File "/opt/conda/lib/python3.8/site-packages/lofo/lofo_importance.py", line 59, in _get_cv_score
cv_results = cross_validate(self.model, X, y, cv=self.cv, scoring=self.scoring, fit_params=fit_params, groups=self.groups)
File "/opt/conda/lib/python3.8/site-packages/sklearn/utils/validation.py", line 63, in inner_f
return f(*args, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/sklearn/model_selection/_validation.py", line 260, in cross_validate
results = _aggregate_score_dicts(results)
File "/opt/conda/lib/python3.8/site-packages/sklearn/model_selection/_validation.py", line 1675, in _aggregate_score_dicts
for key in scores[0]
IndexError: list index out of range
I wonder whether I used it correctly. Here is how I used lofo:
pipe = pipeline.Pipeline(steps=[("cls", ensemble.RandomForestClassifier(random_state=RANDOM_STATE))])
cv = model_selection.GroupKFold(n_splits=N_SPLITS)
search = model_selection.GridSearchCV(
    pipe,
    param_grid,
    n_jobs=-1,
    scoring=scoring,
    cv=cv,
    verbose=0,
    refit=True,
)
search.fit(X, y, groups=groups)
dataset = Dataset(
    df=df,
    target="some_target",
    features=attribute_columns,
)
# define the validation scheme and scorer.
lofo_importance = LOFOImportance(
    dataset,
    cv=cv.split(X, y, groups),
    scoring=scoring,
    model=search.best_estimator_,
    n_jobs=n_jobs,
    # groups=groups,
)
# get the mean and standard deviation of the importances in pandas format
importance_df = lofo_importance.get_importance()  # this line throws an exception
Can you check the length of the list generated by cv.split just before passing it to LOFO? The functions you call beforehand may mutate cv, in which case cv.split may return an empty list.
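The check suggested above could look roughly like the following sketch. The helper name materialize_splits and the toy data are hypothetical and not part of lofo-importance or sklearn; the idea is simply to build the fold list once, fail early if it is empty, and inspect the fold sizes before handing the list to LOFO:

```python
import numpy as np
from sklearn.model_selection import GroupKFold


def materialize_splits(cv, X, y, groups):
    """Hypothetical helper: list the folds from cv.split and fail early if empty."""
    folds = list(cv.split(X, y, groups))
    if not folds:
        raise ValueError("cv.split produced no folds; check X, y and groups")
    return folds


# Hypothetical toy data: 9 rows in 3 groups of 3 rows each.
X = np.zeros((9, 2))
y = np.arange(9) % 2
groups = np.repeat([0, 1, 2], 3)

folds = materialize_splits(GroupKFold(n_splits=3), X, y, groups)
fold_sizes = [(len(train_idx), len(test_idx)) for train_idx, test_idx in folds]
print(len(folds), fold_sizes)
```

The resulting folds list is what would then be passed as the cv argument.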
@graceyangfan How are you using GroupKFold? The way I suggested? Can you check the input or share reproducible code?
Getting the same error as Grace.
lofo_imp = LOFOImportance(dataset, cv=GroupKFold(n_splits=4).split(X=tr, y=tr['pressure'], groups=tr['breath_id']), scoring="neg_mean_absolute_error")
ValueError: not enough values to unpack (expected 3, got 0)