
fooling-lime-shap's Introduction

Fooling LIME and SHAP: Adversarial Attacks on Post hoc Explanation Methods

This is the code for our paper, "Fooling LIME and SHAP: Adversarial Attacks on Post hoc Explanation Methods."

Read the paper.

Getting started

Set up a virtual environment and install the requirements:

conda create -n fooling_limeshap python=3.7
source activate fooling_limeshap
pip install -r requirements.txt

You should be able to run the code now!

We provide a short walkthrough on COMPAS in COMPAS_Example.ipynb. This is a good place to start to see how our method works. Applications of the attack to each data set can be found in compas_experiment.py, cc_experiment.py, and german_experiment.py.

References

Please consider citing our paper if you found this work useful!

@inproceedings{advlime:aies20,
  author = {Dylan Slack and Sophie Hilgard and Emily Jia and Sameer Singh and Himabindu Lakkaraju},
  title = {Fooling LIME and SHAP: Adversarial Attacks on Post hoc Explanation Methods},
  booktitle = {AAAI/ACM Conference on AI, Ethics, and Society (AIES)},
  year = {2020}
}

Contact

This code was developed by Dylan Slack, Sophie Hilgard, and Emily Jia. Reach out to us with any questions!

Our emails are: [email protected], [email protected], and [email protected].

fooling-lime-shap's People

Contributors

dylan-slack


fooling-lime-shap's Issues

Question about the choice of y target in the adversarial model

Will both models, the biased original model and the adversarial one, return the same prediction for a real test sample?

for _ in range(perturbation_multiplier):
    perturbed_xtrain = np.random.normal(0, self.perturbation_std, size=X.shape)
    p_train_x = np.vstack((X, X + perturbed_xtrain))
    p_train_y = np.concatenate((np.ones(X.shape[0]), np.zeros(X.shape[0])))
    all_x.append(p_train_x)
    all_y.append(p_train_y)

Here the y label indicates whether the sample is real or synthesized. Given this different target definition, I am confused about how the two models can return the same prediction values.
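One way to read the snippet in this issue: the 0/1 labels train only a detector that separates real rows from perturbed ones, not the classifier whose predictions are reported. The paper describes the adversarial model as answering with the biased model on inputs that look real and with the innocuous model otherwise, so the two would agree on real test samples whenever the detector is right. The following is a minimal sketch of that routing under stated assumptions: `biased_f`, `innocuous_psi`, `always_real`, and `never_real` are hypothetical stand-ins, not the repository's classes.

```python
import numpy as np

rng = np.random.default_rng(0)

def build_detector_data(X, perturbation_std=0.3, perturbation_multiplier=2):
    """Stack real rows (label 1) with noisy copies (label 0)."""
    all_x, all_y = [], []
    for _ in range(perturbation_multiplier):
        noise = rng.normal(0, perturbation_std, size=X.shape)
        all_x.append(np.vstack((X, X + noise)))
        all_y.append(np.concatenate((np.ones(X.shape[0]), np.zeros(X.shape[0]))))
    return np.vstack(all_x), np.concatenate(all_y)

# Hypothetical stand-ins for the two underlying models.
def biased_f(X):
    return (X[:, 0] > 0).astype(int)   # decides on feature 0 only

def innocuous_psi(X):
    return (X[:, 1] > 0).astype(int)   # decides on feature 1 only

def adversarial_predict(X, looks_real):
    """Use biased_f where the detector says 'real', innocuous_psi elsewhere."""
    is_real = np.asarray(looks_real(X)).astype(bool)
    return np.where(is_real, biased_f(X), innocuous_psi(X))

# Two extreme (assumed) detectors, used only to show the routing.
def always_real(X):
    return np.ones(X.shape[0])

def never_real(X):
    return np.zeros(X.shape[0])

X = rng.normal(size=(5, 2))
detector_x, detector_y = build_detector_data(X)

matches_biased = np.array_equal(adversarial_predict(X, always_real), biased_f(X))
matches_innocuous = np.array_equal(adversarial_predict(X, never_real), innocuous_psi(X))
```

In this sketch the detector labels never reach `biased_f` or `innocuous_psi`; they only decide which of the two answers is returned for a given row.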

Strong dependence on using kmeans background samples for SHAP

Hey! I finally got around to playing with the examples you have here, and I noticed that you were using shap.kmeans to get the background data. Since I typically use a random sample rather than kmeans (unless I am really trying to optimize run time), I just swapped

background_distribution = shap.kmeans(xtrain,10)

for

background_distribution = shap.sample(xtrain,10)

When I did this, all the adversarial results for SHAP seemed to fall apart for COMPAS: 79% of the time, race is still the top SHAP feature in the test dataset for the adversarial model.
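The 79% figure is the share of test rows in which race ranks first. For comparing the two background choices on the same footing, a minimal sketch of computing that rate from a matrix of attributions might look like this. The helper `top_feature_rate`, the feature names, and the numbers are made up for illustration and are not part of the repository or of SHAP.

```python
import numpy as np

def top_feature_rate(shap_values, feature_names, feature):
    """Fraction of rows where `feature` has the largest absolute attribution."""
    shap_values = np.asarray(shap_values)
    top_idx = np.abs(shap_values).argmax(axis=1)
    return float(np.mean(top_idx == feature_names.index(feature)))

# Made-up attributions for four test rows and three features.
names = ["race", "age", "priors_count"]
values = np.array([
    [0.40, 0.10, -0.05],
    [-0.30, 0.20, 0.10],
    [0.05, -0.50, 0.10],
    [0.25, 0.05, 0.20],
])

rate = top_feature_rate(values, names, "race")  # race ranks first in rows 0, 1, 3
```

Running the same helper on the attributions obtained with each background set would give two directly comparable rates.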

This very strong dependence on using kmeans was surprising to me, since it seems to imply SHAP is much more robust to these adversarial attacks when using a typical random background sample. Have you noticed this before, or do you have any thoughts on this? I think it is worth pointing out, but I wanted to get your feedback before suggesting to users that a random sample provides better adversarial robustness.

Thanks!

Using different estimator

I would like to use my own estimator when training the adversarial model, so I do this:

adv_lime = Adversarial_Lime_Model(racist_model_f(), innocuous_model_psi()).\
    train(xtrain, ytrain, feature_names=features, categorical_features=categorical_feature_indcs, estimator=knn)

But when I debugged the training, I noticed that the estimator is still RandomForestClassifier from /Fooling-LIME-SHAP/adversarial_models.py, lines 177-182. I am not sure why. I then commented out

#self.perturbation_identifier = RandomForestClassifier(n_estimators=rf_estimators).fit(xtrain, ytrain)

and used my own estimator instead:

self.perturbation_identifier = KNeighborsClassifier().fit(xtrain, ytrain)

I even commented out line 5 (# from sklearn.ensemble import RandomForestClassifier) just to see what was going on. The program then uses the KNeighborsClassifier at line 180, but self.perturbation_identifier on line 182 is somehow None, and I get this error: 'NoneType' object has no attribute 'predict'. The error makes sense given that self.perturbation_identifier turns to None one line later, but I don't know how that is possible. Any suggestions?
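As a hedged illustration of the pattern this issue is after, the sketch below honors a caller-supplied estimator and falls back to a default only when none is given. It does not diagnose the None reported here. `fit_perturbation_identifier`, `MajorityClassifier`, and `InPlaceClassifier` are hypothetical names, not the repository's code; the toy classes only mimic the `fit`/`predict` interface that scikit-learn estimators such as KNeighborsClassifier also expose. scikit-learn's `fit` returns the estimator itself, but a custom estimator might return nothing, so the sketch keeps the object rather than the return value of `fit`.

```python
import numpy as np

class MajorityClassifier:
    """Hypothetical default detector: always predicts the most common label."""
    def fit(self, X, y):
        values, counts = np.unique(y, return_counts=True)
        self.label_ = values[counts.argmax()]
        return self

    def predict(self, X):
        return np.full(len(X), self.label_)

class InPlaceClassifier(MajorityClassifier):
    """Hypothetical custom estimator whose fit() returns None instead of self."""
    def fit(self, X, y):
        super().fit(X, y)

def fit_perturbation_identifier(X, y, estimator=None):
    """Fit the supplied estimator if there is one, else a default."""
    model = estimator if estimator is not None else MajorityClassifier()
    model.fit(X, y)  # keep `model`, not fit()'s return value, which may be None
    if not hasattr(model, "predict"):
        raise TypeError("estimator must provide a predict() method")
    return model

X = np.zeros((4, 2))
y = np.array([0, 1, 1, 1])

custom = fit_perturbation_identifier(X, y, estimator=InPlaceClassifier())
default = fit_perturbation_identifier(X, y)
custom_predictions = custom.predict(X).tolist()
```

Writing `detector = InPlaceClassifier().fit(X, y)` would store None in this toy setup, which is the kind of failure the separate `fit` call avoids.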
