Hi Rui, thank you for reaching out. We have not run any further performance tests yet; we went with a basic but reliable algorithm.
There are many papers and articles out there comparing the performance of machine learning algorithms on various datasets.
I'd say there are three takeaways:
- The best algorithm always depends on the exact data and the underlying relationship. There is usually no single winner.
- Usually, all of the algorithms pick up some signal if the data has been preprocessed correctly.
- The choice of algorithm matters more when there is a lot of data, because some algorithms may extract more information from it. In our case, we only have one feature, so the performance differences should be less extreme.
Since the intention of the PPS is to get a quick understanding of hidden patterns, I don't think it is critical to always choose the perfect learning algorithm, as long as there are no massive differences. For example, a massive difference would be a score of 0.05 versus 0.9 or 0.5.
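To make the point about comparing algorithms concrete, here is a minimal, dependency-free sketch. It assumes the regression flavour of the score, 1 - MAE(model) / MAE(naive median predictor), floored at 0, and pits a depth-1 regression tree (a deliberately simplified stand-in, not ppscore's actual DecisionTree setup) against a k-nearest-neighbours model on one synthetic feature. The helpers `pps_like_score`, `fit_stump`, `fit_knn` and `evaluate` are hypothetical names written for illustration; they are not part of the ppscore API.

```python
import random
from statistics import median


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def pps_like_score(y_true, y_pred):
    """Hypothetical stand-in for the PPS: 1 - MAE(model) / MAE(median baseline), floored at 0."""
    naive_mae = mae(y_true, [median(y_true)] * len(y_true))
    if naive_mae == 0:
        return 0.0
    return max(0.0, 1 - mae(y_true, y_pred) / naive_mae)


def fit_stump(xs, ys):
    """Depth-1 regression tree: one threshold on x, predict the mean on each side."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates tied x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, (pairs[i - 1][0] + pairs[i][0]) / 2, lm, rm)
    if best is None:  # all x identical: fall back to the overall mean
        mean_y = sum(ys) / len(ys)
        return lambda x: mean_y
    _, threshold, lm, rm = best
    return lambda x: lm if x <= threshold else rm


def fit_knn(xs, ys, k=5):
    """k-nearest-neighbours regression on a single feature."""
    train = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)

    return predict


def evaluate(fit, x_train, y_train, x_test, y_test):
    """Fit on the training rows, score on the held-out rows."""
    model = fit(x_train, y_train)
    return pps_like_score(y_test, [model(x) for x in x_test])


rng = random.Random(0)
xs = [rng.uniform(0, 1) for _ in range(300)]
ys = [x + rng.gauss(0, 0.05) for x in xs]  # roughly linear, single feature
split = 200
scores = {
    name: evaluate(fit, xs[:split], ys[:split], xs[split:], ys[split:])
    for name, fit in [("stump", fit_stump), ("knn", fit_knn)]
}
print(scores)
```

Because the stump can only output two values, one would expect it to trail the kNN model noticeably on this data. A gap of that size is the kind worth investigating, whereas two reasonable learners landing close together is the common case described above.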
Nevertheless, I would be happy to provide more performance comparisons and will share them on the repo. Maybe it turns out that the DecisionTree should not be the standard algorithm going forward.
If you run further tests on your data, I am curious to hear about the results!
Best,
Florian
from ppscore.
Hi Florian,
Thanks so much for your very quick response!
I was thinking precisely about how these might perform if we have thousands of features in an n >> p setting. In this case, just as when computing pairwise correlations, there will be a lot of false positives. So, in addition to capturing non-linear relationships (as you highlight so well in your Towards Data Science post), I was wondering whether leveraging the asymmetric property of the PPS would be helpful or would actually make your life harder, since it doubles the computational cost. I was also wondering whether other strategies (including bagging in RF) could help boost F1 scores.
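The asymmetry, and the extra cost it brings, can be illustrated with a small sketch. It assumes a hypothetical normalised-MAE score as a stand-in for the PPS and uses held-out k-nearest neighbours as the predictor; none of the helper names below belong to ppscore. With y = x^2 and x symmetric around 0, x should predict y almost perfectly while y says little about x, because the sign of x is lost. Scoring both directions needs p * (p - 1) ordered pairs for p features, twice the p * (p - 1) / 2 unordered pairs a correlation matrix needs.

```python
import random
from statistics import median


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def pps_like_score(y_true, y_pred):
    """Hypothetical stand-in for the PPS: 1 - MAE(model) / MAE(median baseline), floored at 0."""
    naive_mae = mae(y_true, [median(y_true)] * len(y_true))
    if naive_mae == 0:
        return 0.0
    return max(0.0, 1 - mae(y_true, y_pred) / naive_mae)


def directed_score(xs, ys, k=5, train_frac=0.7):
    """Score how well xs predicts ys with held-out kNN regression (direction matters)."""
    cut = int(len(xs) * train_frac)
    train = list(zip(xs[:cut], ys[:cut]))
    preds = []
    for x in xs[cut:]:
        nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
        preds.append(sum(y for _, y in nearest) / len(nearest))
    return pps_like_score(ys[cut:], preds)


def n_unordered_pairs(p):
    """Pairs needed for a symmetric measure such as correlation."""
    return p * (p - 1) // 2


def n_ordered_pairs(p):
    """Pairs needed for a directed measure: x -> y and y -> x are scored separately."""
    return p * (p - 1)


rng = random.Random(0)
xs = [rng.uniform(-1, 1) for _ in range(400)]
ys = [x ** 2 for x in xs]  # y is fully determined by x, but not vice versa

forward = directed_score(xs, ys)   # x -> y: expected to be close to 1
backward = directed_score(ys, xs)  # y -> x: expected to be near 0, the sign of x is lost
print(f"x -> y: {forward:.2f}, y -> x: {backward:.2f}")
print(n_unordered_pairs(1000), n_ordered_pairs(1000))  # 499500 999000
```

For 1,000 features that is 999,000 directed scores versus 499,500 correlations, which is why the per-pair cost of the learner matters so much in this regime.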
I'm aware that in exactly this situation, where you can easily end up testing millions of associations, you want to avoid RF because it can become too slow. Of course, one obvious solution is to do some preliminary feature selection and only compute the PPS on the reduced feature set :)
I'll keep an eye on your posts here (or, if you post these results on TDS, I'd really appreciate it if you could link them here). In any case, I will close this issue for now.
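The preliminary-selection idea can be sketched as a two-stage pipeline: a cheap, in-sample (and therefore optimistic) binned-means score on a random row subsample ranks all features, and only the top `keep` survivors get a more expensive held-out kNN score. Everything here (the binning screen, the kNN scorer, `keep`, `sample_size`, the normalised-MAE stand-in for the PPS, the synthetic data) is an illustrative assumption, not ppscore behaviour.

```python
import random
from statistics import median


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def pps_like_score(y_true, y_pred):
    """Hypothetical stand-in for the PPS: 1 - MAE(model) / MAE(median baseline), floored at 0."""
    naive_mae = mae(y_true, [median(y_true)] * len(y_true))
    if naive_mae == 0:
        return 0.0
    return max(0.0, 1 - mae(y_true, y_pred) / naive_mae)


def binned_mean_score(xs, ys, n_bins=10):
    """Cheap in-sample screen: predict y by the mean of its equal-width x-bin."""
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant feature
    bins = [min(int((x - lo) / width), n_bins - 1) for x in xs]
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for b, y in zip(bins, ys):
        sums[b] += y
        counts[b] += 1
    return pps_like_score(ys, [sums[b] / counts[b] for b in bins])


def knn_holdout_score(xs, ys, k=5, train_frac=0.7):
    """More expensive score: kNN fitted on the first rows, evaluated on the rest."""
    cut = int(len(xs) * train_frac)
    train = list(zip(xs[:cut], ys[:cut]))
    preds = []
    for x in xs[cut:]:
        nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
        preds.append(sum(y for _, y in nearest) / len(nearest))
    return pps_like_score(ys[cut:], preds)


def screen_features(columns, target, keep, sample_size, seed=0):
    """Rank features by the cheap screen on a random row subsample; return the top `keep` names."""
    rng = random.Random(seed)
    idx = rng.sample(range(len(target)), min(sample_size, len(target)))
    sub_y = [target[i] for i in idx]
    ranked = sorted(
        columns,
        key=lambda name: binned_mean_score([columns[name][i] for i in idx], sub_y),
        reverse=True,
    )
    return ranked[:keep]


rng = random.Random(1)
n = 400
columns = {f"noise_{j}": [rng.gauss(0, 1) for _ in range(n)] for j in range(50)}
columns["signal_a"] = [rng.uniform(-2, 2) for _ in range(n)]
columns["signal_b"] = [rng.uniform(-2, 2) for _ in range(n)]
target = [a ** 2 + 0.5 * b + rng.gauss(0, 0.1)
          for a, b in zip(columns["signal_a"], columns["signal_b"])]

survivors = screen_features(columns, target, keep=5, sample_size=200)
scores = {name: knn_holdout_score(columns[name], target) for name in survivors}
print(scores)
```

The screen looks at one feature at a time, just like the score itself, so it reduces cost without adding interaction effects; any feature it drops is never scored, which is the trade-off of this approach.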
Thanks once again!