
Comments

slundberg commented on May 22, 2024

The summary_plot function is designed to give a good visual summary of which features are most important. By plotting the distribution of a feature's impact over all observations, we get a much better idea of its effect than could be captured by a single number. That said, I do use the mean absolute SHAP value to rank the features in the summary plot.

When we collapse a feature's importance across all samples into a single number, we are forced to decide what we want to measure. For example, is a feature with a large effect on a small number of observations more important than a feature with a small effect on many observations? I think the answer is application dependent.
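For reference, a minimal sketch of both views, the beeswarm distribution and the bar chart ranked by mean absolute SHAP value (the dataset and model below are just illustrative stand-ins):

```python
import shap
import xgboost

# Illustrative data and model; substitute your own.
X, y = shap.datasets.california()
model = xgboost.XGBRegressor().fit(X, y)

# SHAP values for every observation (rows) and feature (columns).
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Beeswarm: the full distribution of each feature's impact.
shap.summary_plot(shap_values, X)

# Bar chart: features ranked by mean absolute SHAP value.
shap.summary_plot(shap_values, X, plot_type="bar")
```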

A related question is assigning a P-value to a feature, and this is something I hope to release a notebook on soon.

slundberg commented on May 22, 2024

Just do np.abs(shap_values).mean(0)
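In context, a short sketch pairing those global importances with feature names (this assumes shap_values is the array returned by an explainer and X is the pandas DataFrame it was computed on):

```python
import numpy as np
import pandas as pd

# shap_values: (n_samples, n_features) array from an explainer;
# X: the DataFrame it was computed on.
global_importance = np.abs(shap_values).mean(0)

# Rank features by mean absolute SHAP value, as in the bar plot.
ranking = pd.Series(global_importance, index=X.columns).sort_values(ascending=False)
print(ranking)
```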

slundberg commented on May 22, 2024

@msobroza you can if you want. If you are just checking the sign then it won't matter.
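To make that concrete: a normalized dot product is just a cosine similarity, which divides by a positive constant and so can never flip the sign. A small self-contained sketch (a and b are random stand-ins for one feature's SHAP column from the original and a bootstrapped model):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-ins for one feature's SHAP column from the original model (a)
# and from a model retrained on a bootstrap resample (b).
a = rng.normal(size=100)
b = a + rng.normal(size=100)

raw = float(np.dot(a, b))
cosine = raw / (np.linalg.norm(a) * np.linalg.norm(b))  # normalized dot product

# Normalization divides by a positive constant, so the sign never flips.
assert np.sign(raw) == np.sign(cosine)
```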

AlxndrMlk commented on May 22, 2024

@slundberg: First of all, thanks for the great package!
Is it possible to get a list / an array of average Shapley values (instead of a plot)?
EDIT: Thanks, I found a solution!

lacava commented on May 22, 2024

If you train an XGBoost model and then explain it on your training dataset to get a matrix of shap_values, you can treat a single column as the importance of a feature across all samples. It will have both positive and negative values. What you want to know is whether those values follow a meaningful trend or are just driven by random noise. To evaluate this, you can retrain your model on a bootstrap resample of your dataset and then explain it again on your original training data to get another matrix of shap_values. If you take the dot product (or correlation) between the same column in the two matrices, you will see how well the impacts of a feature in the first model agree with the impacts of that same feature in the other model. By repeating this many times you get an estimate of the global stability of a feature (if the correlation is consistently greater than 0, you have a significant feature). There may be better ways to do this, but that's what I have done.
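A rough sketch of that procedure, assuming X is a pandas DataFrame and an xgboost regressor (the function name and defaults are illustrative):

```python
import numpy as np
import shap
import xgboost

def feature_stability(X, y, n_boot=50, seed=0):
    """Correlate each feature's SHAP column between a reference model and
    models retrained on bootstrap resamples, as described above."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    ref_model = xgboost.XGBRegressor().fit(X, y)
    ref_shap = shap.TreeExplainer(ref_model).shap_values(X)

    corrs = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, len(X), size=len(X))  # bootstrap resample of rows
        boot_model = xgboost.XGBRegressor().fit(X.iloc[idx], y[idx])
        # Explain the retrained model on the ORIGINAL training data.
        boot_shap = shap.TreeExplainer(boot_model).shap_values(X)
        for j in range(X.shape[1]):
            corrs[b, j] = np.corrcoef(ref_shap[:, j], boot_shap[:, j])[0, 1]
    return corrs  # a column that is consistently > 0 suggests a stable feature
```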

I'm wondering if it would be reasonable to estimate the significance of a variable for a fixed model by simply bootstrap-resampling the calculation of np.abs(shap_values).mean(0) over a large set of shap_value samples (training or validation data, depending on your goals). This would give you a confidence interval on the mean absolute SHAP value for each feature and would not require retraining. Of course, you would also lose the source of variation from the model-fitting procedure. Since this hasn't been mentioned: is this not a robust way of assessing whether a variable has a significant impact on the behavior of a model?
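A minimal sketch of that fixed-model version, a percentile bootstrap over the rows of an existing shap_values matrix, with no retraining involved (function name and defaults are illustrative):

```python
import numpy as np

def mean_abs_shap_ci(shap_values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for each feature's
    mean absolute SHAP value, holding the model fixed."""
    rng = np.random.default_rng(seed)
    n = shap_values.shape[0]
    boots = np.empty((n_boot, shap_values.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        boots[b] = np.abs(shap_values[idx]).mean(0)
    lo, hi = np.percentile(boots, [100 * alpha / 2, 100 * (1 - alpha / 2)], axis=0)
    return lo, hi  # per-feature interval bounds on mean |SHAP|
```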

dswatson commented on May 22, 2024

That's a good point; the full distributions are definitely more informative than a single number. P-values could be a nice touch though, so I'm curious to see what you come up with there. On a related note, I'm wondering if there's some straightforward way to calculate confidence intervals for Shapley value estimates? Seems tough to do without introducing some parametric assumptions... but then again, maybe those assumptions are justified?

slundberg commented on May 22, 2024

You might be able to make some progress on a parametric approximation if you knew the Hessian of the model you were explaining. But in practice, if you have access to the model, I would just recommend bootstrapping (retrain the model on many bootstrap resamples of your training data).

dswatson commented on May 22, 2024

Oof, that could be really time-consuming for a complex model trained on a large dataset. But I suppose it's really the only completely nonparametric way to estimate confidence intervals here... I'm curious how you'd define the null distribution for a feature's Shapley values? I'm imagining some sort of t-test like you'd use with linear model coefficients, but again, we'd need standard errors for those calculations. I guess I'll just have to wait for that notebook on P-values!

snowde commented on May 22, 2024

Hi Scott,

Have you been able to look into the potential of a p-value and confidence intervals? I believe your package could be an excellent source for statistical tests, but I am not entirely sure how to gather such values myself. This package by Susan Athey and others from the causal inference community adapts random forests just for the purpose of deriving statistical niceties: https://github.com/swager/grf

Let me know what you think.

Regards,
Derek

slundberg commented on May 22, 2024

I haven't yet posted anything on p-values, but you can use bootstrap resampling (retraining the model many times) to get confidence intervals on the SHAP values. I recommend using the dot product between the SHAP values for a single feature on the original dataset vs. the bootstrapped samples to measure a global per-feature confidence interval. The R package you point out looks nice; they seem to be specifically deriving asymptotic distributions for confidence intervals, which avoids the need for bootstrap sampling.

snowde commented on May 22, 2024

Can you elaborate on this? If not, feel free to defer me to the future notebook. "I recommend using the dot product between the SHAP values for a single feature on the original dataset vs. the bootstrapped samples to measure a global per-feature confidence interval."

msobroza commented on May 22, 2024

Hi,

Sorry if my question is not so relevant. If I understood correctly, you suggest applying a dot product between the SHAP values of a feature across all samples from the original dataset and from a bootstrapped one. But if the original dataset contains a large number of examples, this value will probably be larger than for the bootstrapped dataset. My question is: shouldn't we normalize the dot product?

Thanks,
Max

unnir commented on May 22, 2024

Plus 1.
I would be happy to get a list with feature coefficients.

hrsuraj commented on May 22, 2024


That bootstrap-retraining suggestion is a nice one. However, for it to be statistically conclusive, there would have to be a lot of retraining on many resampled datasets. In situations where that exercise is prohibitive (large datasets, limited compute resources, time constraints), what would you suggest as an alternative?

Btw, many thanks for this wonderful resource that you have chosen to make open-source!

slundberg commented on May 22, 2024

@hrsuraj you are right that bootstrapping can be computationally expensive. I don't know of any good closed-form alternatives here, though. One suggestion: you don't have to use the entire large dataset for the explanations (though you may still want to for training).

One other thought is that if you have some type of bootstrapped random forest model you might be able to just sub-select trees instead of retraining the whole model.
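A rough sketch of that tree sub-selection idea for a scikit-learn random forest. Mutating a copy of a fitted model's estimators_ list is a hack rather than a supported API, so treat this as an assumption to verify rather than a definitive implementation:

```python
import copy
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

def subselected_shap(rf, X, n_models=20, frac=0.8, seed=0):
    """Approximate bootstrap variation by explaining copies of a fitted
    forest that each keep a random subset of its trees (no retraining)."""
    rng = np.random.default_rng(seed)
    shap_matrices = []
    for _ in range(n_models):
        sub = copy.deepcopy(rf)
        keep = rng.choice(len(rf.estimators_),
                          size=int(frac * len(rf.estimators_)), replace=False)
        sub.estimators_ = [rf.estimators_[i] for i in keep]
        sub.n_estimators = len(sub.estimators_)
        shap_matrices.append(shap.TreeExplainer(sub).shap_values(X))
    # The spread across these matrices approximates model-fitting variation.
    return shap_matrices
```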

bozorgpanah commented on May 22, 2024

Hello,

I have a question regarding the SHAP feature importance plot. Since SHAP is based on Shapley value theory, I expected that the sum of all "mean SHAP values (average impact on model output magnitude)" on the x-axis would be one, but it is not. Would you please tell me more about this? Why is the sum of all mean SHAP values bigger than one?

lacava commented on May 22, 2024

The sum of the SHAP values for any sample should equal the model output for that sample (relative to the base value), not one. So I imagine the sum of the mean SHAP values would equal the mean model output over those samples. Note also that the bar plot shows mean *absolute* SHAP values, which are not normalized and have no reason to sum to one.
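To check that additivity property directly for a regression model (explainer, model, shap_values, and X here refer to whatever you already computed, as in the snippets above; allow a small numerical tolerance):

```python
import numpy as np

# Each row of SHAP values plus the base value reconstructs the prediction.
reconstructed = shap_values.sum(axis=1) + explainer.expected_value
assert np.allclose(reconstructed, model.predict(X), atol=1e-4)

# The mean |SHAP| bars are unnormalized, so their sum is arbitrary.
print(np.abs(shap_values).mean(0).sum())
```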
