
Comments (6)

AntonBiryukovUofC commented on July 21, 2024

Not yet, will dig in over the weekend!

from ppscore.

AntonBiryukovUofC commented on July 21, 2024

As a baseline, we could start with a non-shuffled KFold or the TimeSeriesSplit provided in scikit-learn.
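To make the two suggested splitters concrete, here is a minimal sketch of the index splits they produce on six ordered samples. The toy array and the `n_splits` values are assumptions chosen for illustration, not anything taken from ppscore.

```python
import numpy as np
from sklearn.model_selection import KFold, TimeSeriesSplit

# Six samples in their original (e.g. chronological) order.
X = np.arange(6).reshape(-1, 1)

# Non-shuffled KFold: each test fold is a contiguous block,
# but training data may lie both before and after it.
kfold_splits = [
    (train.tolist(), test.tolist())
    for train, test in KFold(n_splits=3, shuffle=False).split(X)
]

# TimeSeriesSplit: the training indices always precede the test indices.
ts_splits = [
    (train.tolist(), test.tolist())
    for train, test in TimeSeriesSplit(n_splits=2).split(X)
]

for train, test in kfold_splits:
    print("KFold           train:", train, "test:", test)
for train, test in ts_splits:
    print("TimeSeriesSplit train:", train, "test:", test)
```

The difference matters for ordered data: only TimeSeriesSplit guarantees that a model is never trained on samples that come after the ones it is evaluated on.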


8080labs commented on July 21, 2024

Hi Anton,
that sounds great. We can pass through a cv argument that behaves like a scikit-learn cross-validator.
Did you already have a look at the code? Please let me know if anything is unclear.
Florian
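One possible way to accept "a cv argument which behaves like a scikit-learn cross-validator" is to normalize it with scikit-learn's `check_cv`. This is only a sketch: `resolve_cv` is a hypothetical helper name, not part of ppscore, and how the argument would actually be threaded through the library is not decided in this thread.

```python
from sklearn.model_selection import KFold, TimeSeriesSplit, check_cv


def resolve_cv(cv=4):
    """Hypothetical helper: turn an int or a splitter into a splitter object.

    An integer becomes a non-shuffled KFold with that many folds;
    an object that already has a split() method is returned unchanged.
    """
    return check_cv(cv, classifier=False)


from_int = resolve_cv(3)
time_series_cv = TimeSeriesSplit(n_splits=5)
from_splitter = resolve_cv(time_series_cv)

print(type(from_int).__name__, from_int.get_n_splits())
print(type(from_splitter).__name__, from_splitter.get_n_splits())
```

With this approach a caller could pass either a fold count or any splitter object, and the downstream code would only ever need to call `split()`.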


AntonBiryukovUofC commented on July 21, 2024

So here's what needs to be done, unless I missed anything.
For inspiration on the various CV options, I was thinking of using mlxtend by Sebi Raschka, e.g. similar to what he uses here:

http://rasbt.github.io/mlxtend/user_guide/feature_selection/SequentialFeatureSelector/

Re: your codebase:

  1. _calculate_model_cv_score_ - this function needs revamping, likely with the encoders plugged into a Pipeline object to avoid leakage. Alternatively, we can write an explicit for-loop over the splits and avoid using cross_val_score.
  2. _mae_normalizer(df, y, model_score) - we need to think about what to do here, as the median baseline should be calculated over the splits of the given CV object. The most likely solution is the same as above: an explicit for-loop over cv.split(X) or something similar.
  3. Pass the CV object through to score and matrix.
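The first two points can be sketched as a single explicit loop over `cv.split(X)`. This is a minimal illustration under stated assumptions, not the code from the branch: `cv_mae_with_baseline` is a hypothetical function name, the one-hot encoder and the toy data are illustrative choices, and the final `1 - model / baseline` normalization is assumed here rather than taken from the thread.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeRegressor


def cv_mae_with_baseline(X, y, cv):
    """Hypothetical helper: mean model MAE and mean median-baseline MAE over folds."""
    model_maes, baseline_maes = [], []
    for train_idx, test_idx in cv.split(X):
        X_train, X_test = X.iloc[train_idx], X.iloc[test_idx]
        y_train, y_test = y.iloc[train_idx], y.iloc[test_idx]

        # The encoder sits inside the pipeline, so it is fitted on the
        # training fold only and cannot leak information from the test fold.
        pipeline = make_pipeline(
            OneHotEncoder(handle_unknown="ignore"),
            DecisionTreeRegressor(random_state=0),
        )
        pipeline.fit(X_train, y_train)
        model_maes.append(mean_absolute_error(y_test, pipeline.predict(X_test)))

        # Naive baseline: always predict the median of the training target.
        baseline_pred = np.full(len(y_test), y_train.median())
        baseline_maes.append(mean_absolute_error(y_test, baseline_pred))
    return float(np.mean(model_maes)), float(np.mean(baseline_maes))


# Toy data (an assumption): the category fully determines the target.
df = pd.DataFrame({"x": list("abcabcabcabc"), "y": [1.0, 5.0, 9.0] * 4})
model_mae, baseline_mae = cv_mae_with_baseline(df[["x"]], df["y"], KFold(n_splits=3))

# Assumed normalization: 1 means perfect, 0 means no better than the baseline.
normalized = max(0.0, 1.0 - model_mae / baseline_mae)
print(model_mae, baseline_mae, normalized)
```

Computing the baseline inside the same loop keeps the model and the baseline on identical train/test splits, which is the concern raised in point 2.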

@8080labs Is there anything I missed?


AntonBiryukovUofC commented on July 21, 2024

A second look at cross_val_score() makes me think that we could also introduce a new scorer that calculates the baseline and the decision tree score simultaneously. I prefer doing the two things explicitly, though.

UPD: I think I have implemented most of the necessary pieces. I will try to test them on some simple examples. It is probably worth writing some tests that are sensitive to CV changes.
Check this out (also check the tests in that branch, and let me know if I should open a PR for it):

https://github.com/AntonBiryukovUofC/ppscore/blob/custom_cv_regression/src/ppscore/calculation.py

Any chance you could create a dev branch so I could stage a PR?

In the meantime, I'll think about a test that passes or fails depending on whether KFold is used with or without shuffle=True, as well as a time-series test case (which should be easy given that we use a DecisionTree here).
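A test of that kind could look roughly like the sketch below. It relies on the fact that a decision tree cannot extrapolate beyond the target values it saw during training, so a monotonic trend is hard to predict on a held-out contiguous block but easy once the rows are shuffled. The synthetic data, fold count, and the `mean_mae` helper name are assumptions made for illustration.

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic trend (an assumption): the target grows linearly with the row order.
X = np.arange(100, dtype=float).reshape(-1, 1)
y = np.arange(100, dtype=float)


def mean_mae(cv):
    """Hypothetical helper: mean absolute error of a decision tree under a given CV."""
    scores = cross_val_score(
        DecisionTreeRegressor(random_state=0),
        X,
        y,
        cv=cv,
        scoring="neg_mean_absolute_error",
    )
    return float(-scores.mean())


# Contiguous test blocks: the tree has to predict a range it never saw.
mae_ordered = mean_mae(KFold(n_splits=4, shuffle=False))

# Shuffled folds: every test point has close neighbors in the training set.
mae_shuffled = mean_mae(KFold(n_splits=4, shuffle=True, random_state=0))

print("ordered:", mae_ordered, "shuffled:", mae_shuffled)
```

A test asserting that the ordered error is clearly larger than the shuffled error would fail if the cv argument were silently ignored.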


8080labs commented on July 21, 2024

@AntonBiryukovUofC I've created a dev branch. Looking forward to your PR :)
(I looked at your code but it's better to wait for the PR so I can see the diff)

Cheers,
Tobias

