

Integrated Gradients

(a.k.a. Path-Integrated Gradients, a.k.a. Axiomatic Attribution for Deep Networks)

Contact: integrated-gradients AT gmail.com

Contributors (alphabetical by last name):

  • Kedar Dhamdhere (Google)
  • Pramod Kaushik Mudrakarta (U. Chicago)
  • Mukund Sundararajan (Google)
  • Ankur Taly (Google Brain)
  • Jinhua (Shawn) Xu (Verily)

We study the problem of attributing the prediction of a deep network to its input features, as an attempt to explain individual predictions. For instance, in an object recognition network, an attribution method could tell us which pixels of the image were responsible for a certain label being picked, or which words from a sentence were indicative of strong sentiment.

Applications range from helping developers debug and enabling analysts to explore the logic of a network, to giving end users some transparency into the reason for a network's prediction.

Integrated Gradients is a variation on computing the gradient of the prediction output w.r.t. features of the input. It requires no modification to the original network, is simple to implement, and is applicable to a variety of deep models (sparse and dense, text and vision).
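To make the computation concrete, here is a minimal, framework-agnostic sketch (not the repository's implementation): it approximates the path integral along the straight line from a baseline to the input with a Riemann sum. The grad_fn callable is a hypothetical stand-in for whatever returns the gradient of the output score w.r.t. an input, and steps=50 is an arbitrary choice.

    import numpy as np

    def integrated_gradients(inp, baseline, grad_fn, steps=50):
        # Points interpolated along the straight line from baseline to input.
        alphas = np.linspace(0.0, 1.0, steps + 1)
        interpolated = [baseline + a * (inp - baseline) for a in alphas]
        # Average the gradients at the interpolated points (Riemann sum
        # approximation of the path integral of gradients).
        avg_grads = np.mean([grad_fn(x) for x in interpolated], axis=0)
        # Scale the averaged gradients by the input-baseline difference.
        return (inp - baseline) * avg_grads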

Relevant papers and slide decks

  • Axiomatic Attribution for Deep Networks -- Mukund Sundararajan, Ankur Taly, Qiqi Yan, Proceedings of International Conference on Machine Learning (ICML), 2017

    This paper introduced the Integrated Gradients method. It presents an axiomatic justification of the method along with applications to various deep networks. Slide deck

  • Did the Model Understand the Question? -- Pramod Mudrakarta, Ankur Taly, Mukund Sundararajan, Kedar Dhamdhere, Proceedings of the Association for Computational Linguistics (ACL), 2018

    This paper discusses an application of integrated gradients for evaluating the robustness of question-answering networks. Slide deck

Implementing Integrated Gradients

This How-To document describes the steps involved in implementing integrated gradients for an arbitrary deep network.

This repository provides code for implementing integrated gradients for networks with image inputs.

We recommend starting with the notebook. To run the notebook, follow the instructions below.

  • Clone this repository

    git clone https://github.com/ankurtaly/Integrated-Gradients.git
    
  • In the same directory, run the Jupyter notebook server.

    jupyter notebook
    

    Instructions for installing Jupyter are available here. Please make sure that you have TensorFlow, NumPy, and PIL.Image installed for Python 2.7 (one possible install command is sketched after these steps).

  • Open ig_inception.ipynb and run all cells.
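If the dependencies are missing, one possible way to install them (an assumption, not from the repository; the notebook targets the TensorFlow 1.x API, so a pinned 1.x release may be required):

    pip install jupyter numpy Pillow "tensorflow<2"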


Issues

Can't extract gradients for certain images

Hi, I successfully extracted gradients for some images; it works very nicely. However, it fails on some images, like the ones attached: the gradient scores contain NaN values. How can we fix this? Thanks.

[attached example images omitted]

UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 175


    UnicodeDecodeError                        Traceback (most recent call last)
    <ipython-input> in <module>()
          1 # Load the Inception model.
    ----> 2 sess, graph = load_model()
          3
          4 # Load the labels vocabulary.
          5 labels = np.array(open(LABELS_LOC).read().split('\n'))

    <ipython-input> in load_model()
          8     cfg = tf.ConfigProto(gpu_options={'allow_growth':True})
          9     sess = tf.InteractiveSession(graph=graph, config=cfg)
    ---> 10     graph_def = tf.GraphDef.FromString(open(MODEL_LOC).read())
         11     tf.import_graph_def(graph_def)
         12     return sess, graph

    C:\ProgramData\Anaconda3\lib\encodings\cp1252.py in decode(self, input, final)
         21 class IncrementalDecoder(codecs.IncrementalDecoder):
         22     def decode(self, input, final=False):
    ---> 23         return codecs.charmap_decode(input,self.errors,decoding_table)[0]
         24
         25 class StreamWriter(Codec,codecs.StreamWriter):

    UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 175: character maps to <undefined>
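A likely fix, sketched under the assumption that MODEL_LOC points to the binary GraphDef (.pb) file: open it in binary mode so Python never applies Windows' default cp1252 text codec to the raw protobuf bytes.

    import tensorflow as tf

    MODEL_LOC = 'path/to/inception_graph.pb'  # hypothetical path

    # Read the protobuf as raw bytes ('rb'); the default text mode is what
    # triggers the cp1252 decode error on Windows.
    with open(MODEL_LOC, 'rb') as f:
        graph_def = tf.GraphDef.FromString(f.read())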

How would you justify negative attributions?

Hi Ankur,

Thank you for the excellent work on integrated gradients! It provides a great guide for exploring what a neural network is doing. May I ask whether there is any justification for negative attributions? Or should we just interpret them as smaller attributions? It is not that intuitive to see negative attributions in LSTMs.

E.g., given Attribution_1 = -1 and Attribution_2 = 1, can we naively say that Attribution_2 has more impact on the final result?
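For reference, my reading of the paper's completeness axiom is that the attributions sum to the difference between the score at the input and at the baseline,

$$\sum_i \text{IntegratedGrads}_i(x) = F(x) - F(x'),$$

which would suggest that a negative attribution means the feature pushed the score below the baseline's, while the magnitude measures how much impact it had (equal for the two attributions above).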

Best,
Yijun

Contact Email address

Hi Dr. Ankur Taly,

I tried sending an email to [email protected] with a question about implementing the method. However, the address above does not exist. Did I make a mistake in the email address, or is there another address where I can send my question?

Thanks for your help and time.

Ruo

Query Regarding the Baseline Image

I have a query regarding the baseline image and hope someone would help me out with this.

Let's say I am using AlexNet for image classification and want to find the importance of the input pixels for its predictions. As we know, AlexNet expects the input image to be normalized using a mean and std, as mentioned here.

So, I can think of two baseline images that seem to fit the definition of a black (all-zero) image given in the Integrated Gradients paper (both options are made concrete in the sketch after this list).

  1. We can use the 'average image' as the baseline. If we normalize this baseline, we would essentially be feeding a black (all-zero) input to the network. (This seems the more likely possibility to me.)
  2. We can use an 'all-zero' image as the baseline. After normalization, the actual image fed to the network would have values equal to -mean/std.
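A small sketch to make the two options concrete. The per-channel mean/std values below are the usual ImageNet constants, an assumption for illustration rather than values taken from this repository:

    import numpy as np

    # Assumed ImageNet per-channel normalization constants.
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])

    def normalize(img):
        return (img - mean) / std

    # Option 1: baseline = the average (mean) image; it normalizes to all zeros.
    baseline_mean = np.broadcast_to(mean, (224, 224, 3))
    print(np.allclose(normalize(baseline_mean), 0.0))           # True

    # Option 2: baseline = a black (all-zero) image; it normalizes to -mean/std.
    baseline_black = np.zeros((224, 224, 3))
    print(np.allclose(normalize(baseline_black), -mean / std))  # True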

It would be really helpful to me if someone could clarify this.

Thanks,
Naman

Extract classification rules?

Hi,

Is it possible to use your tools to extract classification rules (IF-ELSE rules) from a deep network, given some input features? Thanks.

Provided contact e-mail address doesn't work

Hey Ankur,

maybe this is not the best place to raise this, as it's not code related, but I attempted to send a question regarding baseline selection and apparently the [email protected] address cannot be found, so the contact in the README.md should perhaps be updated.

Thanks!

IG for Multi-Task Models

Thank you so much for the great work! I was wondering: for multi-task models, how can the attributions from two separate tasks be combined to understand the importance of features for both tasks together? Does it make sense to just sum them up?
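(My own reasoning, not from the paper: the path integral is linear in $F$, so summing per-task attributions computed with the same baseline and path equals attributing the summed score,

$$\text{IntegratedGrads}_i^{F_1+F_2}(x) = \text{IntegratedGrads}_i^{F_1}(x) + \text{IntegratedGrads}_i^{F_2}(x),$$

which seems meaningful only if the two task scores are on comparable scales.)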

positive and negative polarity

Hi,

I have a question about what positive and negative polarity mean.
How can I explain them in my paper?

Thanks.

Typo in equation (2) of the paper "Axiomatic Attribution for Deep Networks" ?

There seems to be a typo in the definition of path integrated gradients (equation 2) in the paper Axiomatic Attribution for Deep Networks.

I think it should be $\frac{\partial F(\gamma(\alpha))}{\partial x_i}$ and not $\frac{\partial F(\gamma(\alpha))}{\partial\gamma_i(\alpha)}$ (see proof below).

As I am not a specialist in path integrals, please tell me if I am missing something or if the two quantities are equivalent.

Proof (in 2 dimensions for simplicity)

Let:

  • $\gamma(\alpha)$ be a function specifying a path in $\mathbb{R}^2$ from the baseline $x'=\gamma(0)$ to the input $x=\gamma(1)$,
  • $\nabla F(\gamma(\alpha)) = \left( \frac{\partial F(\gamma(\alpha))}{\partial x_1}, \frac{\partial F(\gamma(\alpha))}{\partial x_2} \right)$ be the gradient of $F$ along the path,
  • $\gamma'(\alpha)=\left( \frac{\partial\gamma_1(\alpha)}{\partial\alpha}, \frac{\partial\gamma_2(\alpha)}{\partial\alpha} \right)$ be the derivative of the path.

We then have:

$$\begin{aligned} F(x) - F(x') &= \int_{\alpha=0}^1 \nabla F(\gamma(\alpha))\cdot\gamma'(\alpha)\,d\alpha && \text{(gradient theorem)}\\ &=\int_{\alpha=0}^1 \left( \frac{\partial F(\gamma(\alpha))}{\partial x_1}, \frac{\partial F(\gamma(\alpha))}{\partial x_2}\right) \cdot \left( \frac{\partial\gamma_1(\alpha)}{\partial\alpha}, \frac{\partial\gamma_2(\alpha)}{\partial\alpha} \right) d\alpha && \text{(definitions of gradient and derivative)}\\ &=\int_{\alpha=0}^1 \sum_{i=1}^{2} \frac{\partial F(\gamma(\alpha))}{\partial x_i}\, \frac{\partial\gamma_i(\alpha)}{\partial\alpha}\,d\alpha && \text{(definition of dot product)}\\ &=\sum_{i=1}^{2} \int_{\alpha=0}^1 \frac{\partial F(\gamma(\alpha))}{\partial x_i}\, \frac{\partial\gamma_i(\alpha)}{\partial\alpha}\,d\alpha && \text{(linearity of the integral)}\\ &= \sum_{i=1}^{2} \text{PathIntegratedGrads}_i^\gamma(x) && \text{(completeness)} \end{aligned}$$

So the definition of path integrated gradients is:

$$\text{PathIntegratedGrads}_i^{\gamma}(x) = \int_{\alpha=0}^1 \frac{\partial F(\gamma(\alpha))}{\partial x_i} \frac{\partial\gamma_i(\alpha)}{\partial\alpha}d\alpha$$
