Comments (6)
Not yet, will dig in over the weekend!
from ppscore.
As a baseline, we could start with a non-shuffled KFold, or TimeSeriesSplit, both provided in scikit-learn.
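To make the difference between the two splitters concrete, here is a minimal sketch that lists the index splits each one produces. The toy array and the helper name list_splits are only for illustration and are not part of ppscore.

```python
import numpy as np
from sklearn.model_selection import KFold, TimeSeriesSplit

# 12 rows kept in their original (e.g. chronological) order.
X = np.arange(12).reshape(-1, 1)


def list_splits(cv, X):
    """Illustrative helper: return (train_indices, test_indices) as plain lists."""
    return [(train.tolist(), test.tolist()) for train, test in cv.split(X)]


# Non-shuffled KFold: contiguous test blocks, training data on both sides.
kfold_splits = list_splits(KFold(n_splits=3, shuffle=False), X)

# TimeSeriesSplit: every training index precedes every test index.
ts_splits = list_splits(TimeSeriesSplit(n_splits=3), X)

for name, splits in [("KFold", kfold_splits), ("TimeSeriesSplit", ts_splits)]:
    for train, test in splits:
        print(name, "train:", train, "test:", test)
```

With non-shuffled KFold the model can still be trained on rows that come after the test block, whereas TimeSeriesSplit only ever trains on the past.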
Hi Anton
that sounds great. We can pass through a cv argument which behaves like a scikit-learn cross-validator.
Did you already have a look at the code? Please let me know if something is unclear.
Florian
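A rough sketch of what passing a cv argument through could look like, assuming the score is computed with cross_val_score on a decision tree. The function name cv_mae and its signature are hypothetical and not the current ppscore API.

```python
import numpy as np
from sklearn.model_selection import KFold, TimeSeriesSplit, cross_val_score
from sklearn.tree import DecisionTreeRegressor


def cv_mae(X, y, cv=4, random_state=0):
    """Hypothetical helper: cross-validated MAE of a decision tree.

    `cv` is handed to scikit-learn unchanged, so it may be an integer
    or any splitter object such as KFold or TimeSeriesSplit.
    """
    model = DecisionTreeRegressor(random_state=random_state)
    scores = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_absolute_error")
    return -scores.mean()


rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 1))
y = X[:, 0] ** 2

mae_default = cv_mae(X, y)  # integer cv: non-shuffled KFold for a regressor
mae_kfold = cv_mae(X, y, cv=KFold(n_splits=4, shuffle=False))
mae_ts = cv_mae(X, y, cv=TimeSeriesSplit(n_splits=4))
print(mae_default, mae_kfold, mae_ts)
```

Because scikit-learn already accepts both integers and splitter objects for cv, the argument can be forwarded as-is without any special casing.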
So here's what needs to be done, in case I did not miss anything:
Generally, for inspiration on various CV schemes, I was thinking of using mlxtend by Sebi Raschka, e.g. similar to what he uses here:
http://rasbt.github.io/mlxtend/user_guide/feature_selection/SequentialFeatureSelector/
Re: your codebase:
- _calculate_model_cv_score_: this function needs revamping, with the encoders plugged into a Pipeline object, likely to avoid leakage. Alternatively, we can explicitly write a for-loop over the splits and avoid using cross_val_score.
- _mae_normalizer(df, y, model_score): we need to think about what to do here, as the median/baseline should be calculated over a given CV object. The most likely solution is the same as above: an explicit for over cv.split(X) or something similar.
- Pass CV somehow to score and matrix.
@8080labs Is there anything I missed?
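A minimal sketch of the explicit loop described above, assuming a single categorical feature and a regression target. The helper name fold_scores, the toy data frame and the choice of OneHotEncoder are illustrative assumptions, not ppscore's actual implementation.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeRegressor


def fold_scores(df, x, y, cv):
    """Hypothetical helper: (model MAE, median-baseline MAE) for each fold."""
    results = []
    for train_idx, test_idx in cv.split(df):
        train, test = df.iloc[train_idx], df.iloc[test_idx]
        # The encoder lives inside the pipeline, so it is fitted on the
        # training fold only and never sees the test fold.
        pipe = make_pipeline(
            OneHotEncoder(handle_unknown="ignore"),
            DecisionTreeRegressor(random_state=0),
        )
        pipe.fit(train[[x]], train[y])
        model_mae = mean_absolute_error(test[y], pipe.predict(test[[x]]))
        # Baseline: always predict the median of the training fold.
        baseline_pred = np.full(len(test), train[y].median())
        baseline_mae = mean_absolute_error(test[y], baseline_pred)
        results.append((model_mae, baseline_mae))
    return results


# Toy data: the category fully determines the target.
df = pd.DataFrame({
    "city": ["a", "b", "c"] * 4,
    "price": [1.0, 5.0, 9.0] * 4,
})
scores = fold_scores(df, "city", "price", KFold(n_splits=3, shuffle=False))
for model_mae, baseline_mae in scores:
    print("model MAE:", model_mae, "baseline MAE:", baseline_mae)
```

Computing the baseline inside the same loop keeps it tied to exactly the same train/test rows as the model score, which is the point raised for _mae_normalizer.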
A second look at cross_val_score() makes me think that we could also introduce a new scorer that calculates a baseline and the decision tree score simultaneously. I prefer doing the two things explicitly, though.
UPD: I think I have implemented most of the stuff necessary. Will try to test on some simple examples. It is probably worth making up some tests that are sensitive to CV changes.
Here, check this out (also check the tests in that branch, and let me know if I should open a PR for it):
https://github.com/AntonBiryukovUofC/ppscore/blob/custom_cv_regression/src/ppscore/calculation.py
Any chance you could create a dev branch so I could stage a PR?
In the meantime I'll think about a test that would work / fail in the case of KFold with/without shuffle=True, as well as some time-series related test case (should be easy given we have a DecisionTree here).
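One possible shape for such a test, as a sketch only: on data sorted by the target, a decision tree cannot extrapolate into a held-out block, so non-shuffled KFold should give a clearly worse error than shuffled KFold. The data and the helper name mean_mae are made up for illustration.

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Rows are sorted by the target, as in an ordered or time-indexed data set.
X = np.arange(100, dtype=float).reshape(-1, 1)
y = X[:, 0].copy()


def mean_mae(cv):
    """Illustrative helper: mean cross-validated MAE of a decision tree."""
    model = DecisionTreeRegressor(random_state=0)
    scores = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_absolute_error")
    return -scores.mean()


# Contiguous test blocks: the tree has no nearby training rows to lean on.
mae_ordered = mean_mae(KFold(n_splits=4, shuffle=False))

# Shuffled folds: every test row has close neighbours in the training set.
mae_shuffled = mean_mae(KFold(n_splits=4, shuffle=True, random_state=0))

print("ordered:", mae_ordered, "shuffled:", mae_shuffled)
```

A test built on this idea would fail if the cv object were silently ignored, since both calls would then return the same error.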
@AntonBiryukovUofC I've created a dev branch. Looking forward to your PR :)
(I looked at your code but it's better to wait for the PR so I can see the diff)
Cheers,
Tobias