Comments (10)

aerdem4 commented on May 24, 2024

You can actually pass any iterable of (train_index, test_index) pairs as the cv parameter. sklearn's cross_validate function accepts both splitter objects such as KFold and iterables of splits (the output of a splitter's split method). For example:

lofo_imp = LOFOImportance(dataset, cv=GroupKFold(4).split(X, y, groups), scoring="roc_auc")
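A minimal sketch of the behaviour this comment relies on, using only scikit-learn rather than LOFO itself. The synthetic arrays `X`, `y`, `groups` and the `LogisticRegression` model are assumptions for illustration. It precomputes GroupKFold splits as a list of `(train_index, test_index)` pairs, checks that no group lands on both sides of a fold, and passes the list straight to `cross_validate`:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupKFold, cross_validate

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 3))          # synthetic features (assumption)
y = rng.integers(0, 2, size=120)       # synthetic binary target (assumption)
groups = np.repeat(np.arange(12), 10)  # 12 groups of 10 rows each

# Materialize the splits once: a list of (train_index, test_index) pairs.
splits = list(GroupKFold(n_splits=4).split(X, y, groups))

# GroupKFold never puts the same group in both train and test of a fold.
for train_idx, test_idx in splits:
    assert set(groups[train_idx]).isdisjoint(groups[test_idx])

# cross_validate accepts the list wherever it accepts a splitter object.
results = cross_validate(LogisticRegression(), X, y, cv=splits, scoring="roc_auc")
print(len(results["test_score"]))  # one score per fold
```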

from lofo-importance.

aerdem4 commented on May 24, 2024

Newer sklearn versions seem to have problems with iterables passed to cross_validate. Converting the iterable to a list is a workaround:

lofo_imp = LOFOImportance(dataset, cv=list(GroupKFold(n_splits=4).split(X=tr, y=tr['pressure'], groups=tr['breath_id'])), scoring="neg_mean_absolute_error")
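One plausible reason the `list(...)` conversion helps (a reading of the symptoms, not something confirmed in this thread): `GroupKFold.split` returns a generator, which can only be iterated once. LOFOImportance calls `cross_validate` again for every feature it leaves out (see `get_importance` calling `_get_cv_score(feature_to_remove=f)` in the traceback further down), so every call after the first would receive zero splits. The sketch below, on small synthetic arrays, shows the difference between the generator and a list:

```python
import numpy as np
from sklearn.model_selection import GroupKFold

X = np.zeros((12, 2))                # synthetic features (assumption)
y = np.zeros(12)                     # synthetic target (assumption)
groups = np.repeat(np.arange(4), 3)  # 4 groups of 3 rows

gen = GroupKFold(n_splits=4).split(X, y, groups)
first_pass = len(list(gen))   # iterating consumes the generator
second_pass = len(list(gen))  # nothing left to yield
print(first_pass, second_pass)  # 4 0

splits = list(GroupKFold(n_splits=4).split(X, y, groups))
# A list can be iterated any number of times, e.g. once per left-out feature.
print(len(list(splits)), len(list(splits)))  # 4 4
```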

RainFung commented on May 24, 2024

Thanks. It would be good to add some documentation about this.

graceyangfan commented on May 24, 2024

@aerdem4 I get this error when using GroupKFold, raised in:

cv_results = cross_validate(self.model, X, y, cv=self.cv, scoring=self.scoring, fit_params=fit_params)

ValueError: not enough values to unpack (expected 3, got 0)

BartlomiejSkwira commented on May 24, 2024

The sklearn cross_validate function (which lofo-importance uses in LOFOImportance._get_cv_score) has a groups keyword argument. I forked this repo and added it there; you can see it in PR BartlomiejSkwira#1 (a work in progress that still needs tests).

@aerdem4 would it be a good candidate for a PR to your repo?
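For comparison, plain scikit-learn covers this case by passing the splitter object itself as `cv` and the group labels through the `groups` argument of `cross_validate`. Because the splitter produces fresh splits on each call, repeated calls keep working. The sketch below is a toy leave-one-feature-out loop on synthetic data; the data, the `Ridge` model, and the loop are assumptions for illustration, not the code from the PR or from lofo-importance:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GroupKFold, cross_validate

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))                      # synthetic features
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=60)
groups = np.repeat(np.arange(6), 10)              # 6 groups of 10 rows

cv = GroupKFold(n_splits=3)  # the splitter object, not cv.split(...)
scores = []
# Toy LOFO-style loop: drop one feature at a time and re-run cross-validation.
for drop in range(X.shape[1]):
    X_reduced = np.delete(X, drop, axis=1)
    res = cross_validate(Ridge(), X_reduced, y, cv=cv, groups=groups,
                         scoring="neg_mean_absolute_error")
    scores.append(res["test_score"].mean())
print(len(scores))  # one mean score per left-out feature
```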

aerdem4 commented on May 24, 2024

@BartlomiejSkwira GroupKFold is supported with the workaround above. Your PR looks nice, but it only covers one of many validation schemes. From a minimalist point of view, I think keeping the repo free of special cases may be better. But if you have an idea for supporting the most common validation schemes in a generic way, you are welcome to contribute it.

BartlomiejSkwira commented on May 24, 2024

@aerdem4 This workaround didn't work for me; I get:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
    ...
<my code calling lofo_importance.get_importance()>
   ...
  File "/opt/conda/lib/python3.8/site-packages/lofo/lofo_importance.py", line 85, in get_importance
    lofo_cv_scores.append(self._get_cv_score(feature_to_remove=f))
  File "/opt/conda/lib/python3.8/site-packages/lofo/lofo_importance.py", line 59, in _get_cv_score
    cv_results = cross_validate(self.model, X, y, cv=self.cv, scoring=self.scoring, fit_params=fit_params, groups=self.groups)
  File "/opt/conda/lib/python3.8/site-packages/sklearn/utils/validation.py", line 63, in inner_f
    return f(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/sklearn/model_selection/_validation.py", line 260, in cross_validate
    results = _aggregate_score_dicts(results)
  File "/opt/conda/lib/python3.8/site-packages/sklearn/model_selection/_validation.py", line 1675, in _aggregate_score_dicts
    for key in scores[0]
IndexError: list index out of range

I wonder whether I used it correctly. Here is how I used LOFO:

from lofo import Dataset, LOFOImportance
from sklearn import ensemble, model_selection, pipeline

# RANDOM_STATE, N_SPLITS, param_grid, scoring, n_jobs, X, y, groups, df and
# attribute_columns are assumed to be defined elsewhere.
pipe = pipeline.Pipeline(steps=[("cls", ensemble.RandomForestClassifier(random_state=RANDOM_STATE))])
cv = model_selection.GroupKFold(n_splits=N_SPLITS)
search = model_selection.GridSearchCV(
    pipe,
    param_grid,
    n_jobs=-1,
    scoring=scoring,
    cv=cv,
    verbose=0,
    refit=True,
)
search.fit(X, y, groups=groups)
dataset = Dataset(
    df=df,
    target="some_target",
    features=attribute_columns,
)

# define the validation scheme and scorer.
lofo_importance = LOFOImportance(
    dataset,
    cv=cv.split(X, y, groups),
    scoring=scoring,
    model=search.best_estimator_,
    n_jobs=n_jobs,
    # groups=groups,
)

# get the mean and standard deviation of the importances in pandas format
importance_df = lofo_importance.get_importance()  # this line throws an exception

aerdem4 commented on May 24, 2024

Can you check the length of the list generated by cv.split just before feeding it to LOFO? The functions you call beforehand may mutate cv, in which case cv.split may return an empty list.
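A small sketch of that check. `materialize_splits` is a hypothetical helper (not part of lofo-importance or scikit-learn): it turns either a splitter or an iterable of `(train_index, test_index)` pairs into a reusable list and fails early if the list is empty, for example because a generator was already consumed. The arrays are synthetic stand-ins for the `X`, `y`, `groups` in the snippet above:

```python
import numpy as np
from sklearn.model_selection import GroupKFold


def materialize_splits(cv, X, y=None, groups=None):
    """Hypothetical helper: return the CV splits as a list, or raise if empty."""
    if hasattr(cv, "split"):
        splits = list(cv.split(X, y, groups))  # splitter object: generate fresh splits
    else:
        splits = list(cv)  # iterable of (train, test) pairs; drains a generator
    if not splits:
        raise ValueError("cv produced no splits; was the generator already consumed?")
    return splits


# Synthetic stand-ins for the X, y, groups used above (assumption).
X = np.zeros((20, 2))
y = np.zeros(20)
groups = np.repeat(np.arange(5), 4)

cv = GroupKFold(n_splits=5)
splits = materialize_splits(cv, X, y, groups)
print(len(splits))  # 5

gen = cv.split(X, y, groups)
list(gen)  # simulate something else iterating the generator first
try:
    materialize_splits(gen, X, y, groups)
except ValueError as err:
    print(err)
```

The resulting `splits` list, rather than the generator, is what would then be passed as `cv=` to LOFOImportance.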

aerdem4 commented on May 24, 2024

@graceyangfan How do you use GroupKFold? The way I suggested? Can you check the input or share reproducible code?

Quetzalcohuatl commented on May 24, 2024

Getting the same error as Grace.

lofo_imp = LOFOImportance(dataset, cv=GroupKFold(n_splits=4).split(X=tr, y=tr['pressure'], groups=tr['breath_id']), scoring="neg_mean_absolute_error")

ValueError: not enough values to unpack (expected 3, got 0)