
Comments (4)

FlorianWetschoreck commented on August 22, 2024

Hi Fernando, thank you for posting the question.
Many people ask us this question, but it is not easy to answer because "it depends".
I will think about it more deeply and then get back to you with an answer.

from ppscore.

FlorianWetschoreck commented on August 22, 2024

Hi Fernando,

I gave the question quite some thought, and for now I would like to reply with the following:

First, the technical interpretation of the PPS is:

  • the percentage of the possible model improvement that the feature delivers, where the scale runs from the current naive base model (PPS of 0) to a perfect deterministic model (PPS of 1).
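To make that interpretation concrete, here is a minimal sketch of the normalization it describes. It is not the ppscore implementation; the function name `predictive_power` and its parameters are hypothetical, and the assumption is that the caller knows what a perfect score looks like for the chosen metric (for example 0 for an error metric such as MAE, or 1 for F1).

```python
def predictive_power(naive_score, model_score, perfect_score):
    """Share of the naive-to-perfect gap that the model closes (hypothetical helper)."""
    # Distance from the naive baseline to a perfect model: the improvement potential.
    potential = abs(perfect_score - naive_score)
    if potential == 0:
        # The naive baseline is already perfect, so there is nothing left to gain.
        return 0.0
    # How much of that distance the model actually covers.
    gained = potential - abs(perfect_score - model_score)
    # A model that is worse than the baseline gets 0 rather than a negative score.
    return max(0.0, gained / potential)


# Error metric (lower is better, perfect = 0): naive error 10, model error 4.
regression_like = predictive_power(naive_score=10.0, model_score=4.0, perfect_score=0.0)

# Score metric (higher is better, perfect = 1): naive 0.5, model 0.75.
classification_like = predictive_power(naive_score=0.5, model_score=0.75, perfect_score=1.0)
```

In the first call the model removes 6 of the 10 units of error that a perfect model would remove, so the result is 0.6; in the second it closes half of the gap to a perfect score.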

The interpretation depends on the context:
In general, it is hard to define specific levels and interpret them without knowing the context. For example, if many columns have a PPS of 0.3, then a PPS of 0.2 might actually not be that good. However, when no other column has a PPS > 0.01, then a PPS of 0.1 might be very good, especially when trying to predict something hard like stock prices.

Nevertheless, there are some levels that are often helpful in everyday work:

  • PPS == 0 means that there is no predictive power
  • 0 < PPS < 0.2 often means that there is some relevant predictive power but it is weak
  • PPS > 0.2 often means that there is strong predictive power
  • PPS > 0.8 often means that there is a deterministic relationship in the data, for example y = 3*x or there is some underlying if...else... logic
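As a rough illustration, the levels above could be turned into a labeling helper like the one below. The function name `describe_pps` is hypothetical, the cutoffs are the rules of thumb from the list and not fixed constants of the library, and treating a score of exactly 0.2 as "weak" is an assumption because the list leaves that boundary open.

```python
def describe_pps(pps):
    """Map a PPS value to the rule-of-thumb levels from the list (hypothetical helper)."""
    if pps <= 0:
        return "no predictive power"
    if pps > 0.8:
        # Often a deterministic relationship such as y = 3*x or if/else logic.
        return "possibly deterministic"
    if pps > 0.2:
        return "strong"
    # Assumption: exactly 0.2 falls into the weak band.
    return "weak"


labels = {score: describe_pps(score) for score in (0.0, 0.1, 0.5, 0.95)}
```

These labels ignore context, so they are best read next to the scores of the other columns in the same dataset.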

Because these levels are only rough guides, it is often important to check the PPS for multiple columns and then base your interpretation on how they compare.

What do you think about this explanation? Do you have some specific scenarios, use cases or questions that you want the PPS to answer?


FernandoDoreto commented on August 22, 2024

Hi @FlorianWetschoreck, thanks for your attention and commitment. I liked your point about interpreting the score in context; it makes sense. The ranges you described also give me some insights, so I can code a way to automate a ppsThreshold (keep reading to understand what I mean).

I'm using ppscore specifically as part of an approach to detect relationships (linear, non-linear, quadratic, trigonometric, log, exponential, etc.) among variables. I think this needs a multi-faceted approach, since relationships are asymmetric and may have different shapes (linear, non-linear, etc.). I am considering combining ppscore, Spearman correlation and MIC (maximal information coefficient).

  • (1) I apply pps.matrix() to my dataset. So yes, that can have a high computing cost. Then I choose a certain "ppsThreshold" and query the matrix: model_score != 1 and ppscore > ppsThreshold. That leaves me with the variable pairs that have relevant predictive power.

  • (2) Then I calculate the Spearman correlation for these pairs, which reveals whether there is a monotonic relationship. This typically has a low computing cost.

  • (3) Finally, I calculate MIC for the unique combinations of these variable pairs. This has a high computing cost, which is why it's important to reduce the feature space with ppscore first. For me, MIC tells the relationship strength.
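Steps (1) and (2), plus the pair deduplication needed before step (3), can be sketched with the standard library alone. This is only an illustration under assumptions: the PPS matrix is assumed to be available as a list of rows with the keys x, y, ppscore and model_score, all function names are hypothetical, and the MIC computation itself is left out because it needs a separate package. In practice the Spearman part would more likely come from an existing statistics library.

```python
def relevant_pairs(pps_rows, pps_threshold):
    """Step (1): keep (x, y) pairs that pass the query from the post."""
    return [
        (row["x"], row["y"])
        for row in pps_rows
        if row["model_score"] != 1 and row["ppscore"] > pps_threshold
    ]


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(xs, ys):
    """Step (2): Spearman correlation as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one variable is constant
    return cov / (var_x * var_y) ** 0.5


def unique_pairs(pairs):
    """Input for step (3): MIC is symmetric, so (a, b) and (b, a) collapse to one pair."""
    return sorted({tuple(sorted(pair)) for pair in pairs})


# Made-up rows, only to show the shapes involved.
rows = [
    {"x": "a", "y": "b", "ppscore": 0.5, "model_score": 0.3},
    {"x": "b", "y": "a", "ppscore": 0.4, "model_score": 0.2},
    {"x": "a", "y": "a", "ppscore": 1.0, "model_score": 1},
    {"x": "a", "y": "c", "ppscore": 0.05, "model_score": 0.9},
]
kept = relevant_pairs(rows, pps_threshold=0.2)
to_score_with_mic = unique_pairs(kept)
```

Keeping the directed pairs from step (1) separate from the deduplicated pairs for step (3) preserves the asymmetry that PPS reports while avoiding duplicate MIC computations.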

So ultimately I would be able to conclude something like:

Variable A has strong predictive power on Variable B, but Variable B doesn't have predictive power on Variable A.
Variable A has a strong and positive relationship with Variable B

  • Conclusion rationale:

strong predictive power: "strong" is provided by ppscore
strong and positive relationship: "strong" is provided by the MIC level and "positive" is provided by the Spearman correlation
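A sketch of how those three measures might be combined into such sentences follows. Everything here is hypothetical: the function name `summarize`, the parameter names, and especially the two cutoffs (`pps_cutoff` and `mic_cutoff`), which are placeholders to be tuned per dataset and are not recommended values.

```python
def summarize(a, b, pps_ab, pps_ba, rho, mic, pps_cutoff=0.2, mic_cutoff=0.5):
    """Turn directional PPS, Spearman rho and MIC into sentences (hypothetical helper)."""

    def power(score):
        # "strong" comes from ppscore, judged separately for each direction.
        return "strong" if score > pps_cutoff else "little or no"

    # "positive" or "negative" comes from the sign of the Spearman correlation.
    if rho > 0:
        direction = "positive"
    elif rho < 0:
        direction = "negative"
    else:
        direction = "non-monotonic"

    # Relationship strength comes from the MIC level.
    strength = "strong" if mic > mic_cutoff else "weak"

    return [
        f"{a} has {power(pps_ab)} predictive power on {b}.",
        f"{b} has {power(pps_ba)} predictive power on {a}.",
        f"{a} has a {strength} and {direction} relationship with {b}.",
    ]


sentences = summarize("Variable A", "Variable B", pps_ab=0.6, pps_ba=0.0, rho=0.9, mic=0.8)
```

The two PPS values are passed separately because the score is asymmetric, while rho and MIC are symmetric and are passed once.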

Based on your personal and professional experience, does the "relationship detection" approach that I described make sense to you? Are you aware of any (preferably open-source) Python package that does this?

Regards, Fernando


FlorianWetschoreck commented on August 22, 2024

Hi Fernando,
What is the surrounding use case that you are working on, and why do you need to estimate both the relationship strength and the shape?
More information in this regard might inform the solution approach.
Also, I am surprised by this general notion of "positive", because it only applies to a handful of relationships, but it might be valid in your context.

About your solution approach:

  • please consider applying the Spearman correlation in parallel to the ppscore matrix instead of after it, because the ppscore is not that good at detecting linear relationships and the scores of such pairs might fall below your threshold
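One way to read this suggestion is to select candidate pairs with either measure instead of filtering by PPS first. The sketch below assumes both scores are already available as dictionaries keyed by (x, y) pairs; the function name `candidate_pairs` and the `rho_threshold` parameter are hypothetical and not part of ppscore.

```python
def candidate_pairs(pps_scores, spearman_scores, pps_threshold, rho_threshold):
    """Keep a pair if either PPS or the absolute Spearman correlation is high enough."""
    by_pps = {pair for pair, score in pps_scores.items() if score > pps_threshold}
    # Use the absolute value so strongly negative monotonic relationships also count.
    by_rho = {pair for pair, rho in spearman_scores.items() if abs(rho) >= rho_threshold}
    return by_pps | by_rho


# Made-up scores: ("a", "b") is a linear pair that PPS alone would have dropped.
pps_scores = {("a", "b"): 0.15, ("a", "c"): 0.60, ("a", "d"): 0.01}
spearman_scores = {("a", "b"): 0.90, ("a", "c"): 0.10, ("a", "d"): -0.05}
selected = candidate_pairs(pps_scores, spearman_scores, pps_threshold=0.2, rho_threshold=0.7)
```

With these inputs the union keeps both the pair found only by PPS and the pair found only by the Spearman correlation.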

