

Integrated Gradients

(a.k.a. Path-Integrated Gradients, a.k.a. Axiomatic Attribution for Deep Networks)

Contact: integrated-gradients AT gmail.com

Contributors (alphabetical by last name):

  • Kedar Dhamdhere (Google)
  • Pramod Kaushik Mudrakarta (U. Chicago)
  • Mukund Sundararajan (Google)
  • Ankur Taly (Google Brain)
  • Jinhua (Shawn) Xu (Verily)

We study the problem of attributing the prediction of a deep network to its input features, as an attempt to explain individual predictions. For instance, in an object recognition network, an attribution method could tell us which pixels of the image were responsible for a certain label being picked, or which words from a sentence were indicative of strong sentiment.

Applications range from helping developers debug and enabling analysts to explore the logic of a network, to giving end users some transparency into the reason for a network's prediction.

Integrated Gradients is a variation on computing the gradient of the prediction output w.r.t. features of the input. It requires no modification to the original network, is simple to implement, and is applicable to a variety of deep models (sparse and dense, text and vision).
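To make the computation concrete, here is a minimal, framework-agnostic sketch (not the repository's implementation): it approximates the path integral along the straight line from a baseline to the input with a Riemann sum. The grad_fn callable is a hypothetical stand-in for whatever returns the gradient of the output score w.r.t. an input, and steps=50 is an arbitrary choice.

    import numpy as np

    def integrated_gradients(inp, baseline, grad_fn, steps=50):
        # Points interpolated along the straight line from baseline to input.
        alphas = np.linspace(0.0, 1.0, steps + 1)
        interpolated = [baseline + a * (inp - baseline) for a in alphas]
        # Average the gradients at the interpolated points (Riemann sum
        # approximation of the path integral of gradients).
        avg_grads = np.mean([grad_fn(x) for x in interpolated], axis=0)
        # Scale the averaged gradients by the input-baseline difference.
        return (inp - baseline) * avg_grads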

Relevant papers and slide decks

  • Axiomatic Attribution for Deep Networks -- Mukund Sundararajan, Ankur Taly, Qiqi Yan, Proceedings of International Conference on Machine Learning (ICML), 2017

    This paper introduced the Integrated Gradients method. It presents an axiomatic justification of the method along with applications to various deep networks. Slide deck

  • Did the Model Understand the Question? -- Pramod Mudrakarta, Ankur Taly, Mukund Sundararajan, Kedar Dhamdhere, Proceedings of the Association for Computational Linguistics (ACL), 2018

    This paper discusses an application of integrated gradients for evaluating the robustness of question-answering networks. Slide deck

Implementing Integrated Gradients

This How-To document describes the steps involved in implementing integrated gradients for an arbitrary deep network.

This repository provides code for implementing integrated gradients for networks with image inputs.

We recommend starting with the notebook. To run the notebook, follow the instructions below.

  • Clone this repository

    git clone https://github.com/ankurtaly/Integrated-Gradients.git
    
  • In the same directory, run the Jupyter notebook server.

    jupyter notebook
    

    Instructions for installing Jupyter are available here. Please make sure that you have TensorFlow, NumPy, and PIL.Image installed for Python 2.7 (one possible install command is sketched after these steps).

  • Open ig_inception.ipynb and run all cells.
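If the dependencies are missing, one possible way to install them (an assumption, not from the repository; the notebook targets the TensorFlow 1.x API, so a pinned 1.x release may be required):

    pip install jupyter numpy Pillow "tensorflow<2"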


Issues

Can't extract gradients for certain images

Hi, I successfully extracted gradients for some images; it works very nicely. However, it fails on some images, like the ones attached: the gradient scores contain NaN values. How can we fix this? Thanks.

[attached example images omitted]

UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 175


    UnicodeDecodeError                        Traceback (most recent call last)
    <ipython-input> in <module>()
          1 # Load the Inception model.
    ----> 2 sess, graph = load_model()
          3
          4 # Load the labels vocabulary.
          5 labels = np.array(open(LABELS_LOC).read().split('\n'))

    <ipython-input> in load_model()
          8     cfg = tf.ConfigProto(gpu_options={'allow_growth':True})
          9     sess = tf.InteractiveSession(graph=graph, config=cfg)
    ---> 10     graph_def = tf.GraphDef.FromString(open(MODEL_LOC).read())
         11     tf.import_graph_def(graph_def)
         12     return sess, graph

    C:\ProgramData\Anaconda3\lib\encodings\cp1252.py in decode(self, input, final)
         21 class IncrementalDecoder(codecs.IncrementalDecoder):
         22     def decode(self, input, final=False):
    ---> 23         return codecs.charmap_decode(input,self.errors,decoding_table)[0]
         24
         25 class StreamWriter(Codec,codecs.StreamWriter):

    UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 175: character maps to <undefined>
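A likely fix, sketched under the assumption that MODEL_LOC points to the binary GraphDef (.pb) file: open it in binary mode so Python never applies Windows' default cp1252 text codec to the raw protobuf bytes.

    import tensorflow as tf

    MODEL_LOC = 'path/to/inception_graph.pb'  # hypothetical path

    # Read the protobuf as raw bytes ('rb'); the default text mode is what
    # triggers the cp1252 decode error on Windows.
    with open(MODEL_LOC, 'rb') as f:
        graph_def = tf.GraphDef.FromString(f.read())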

How would you justify negative attributions?

Hi Ankur,

Thank you for the excellent work on integrated gradients! It provides a great guide for exploring what a neural network is doing. May I ask whether there is any justification for negative attributions? Or should we just interpret them as smaller attributions? It is not that intuitive to see negative attributions in LSTMs.

E.g., given Attribution_1 = -1 and Attribution_2 = 1, can we naively say that Attribution_2 has more impact on the final result?
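For reference, my reading of the paper's completeness axiom is that the attributions sum to the difference between the score at the input and at the baseline,

$$\sum_i \text{IntegratedGrads}_i(x) = F(x) - F(x'),$$

which would suggest that a negative attribution means the feature pushed the score below the baseline's, while the magnitude measures how much impact it had (equal for the two attributions above).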

Best,
Yijun

Contact Email address

Hi Dr. Ankur Taly,

I tried sending an email to [email protected] with a question about implementing the method. However, the address above does not exist. Did I make a mistake in the email address, or is there another address where I can send my question?

Thanks for your help and time.

Ruo

Query Regarding the Baseline Image

I have a query regarding the baseline image and hope someone would help me out with this.

Let's say I am using AlexNet for image classification and want to find the importance of the input pixels for its predictions. As we know, AlexNet expects the input image to be normalized using a mean and std, as mentioned here.

So, I can think of two baseline images that seem to fit the definition of a black (all-zero) image given in the Integrated Gradients paper (both options are made concrete in the sketch after this list).

  1. We can use the 'average image' as the baseline. If we normalize this baseline, we would essentially be feeding a black (all-zero) input to the network. (This seems the more likely possibility to me.)
  2. We can use an 'all-zero' image as the baseline. After normalization, the actual image fed to the network would have values equal to -mean/std.
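A small sketch to make the two options concrete. The per-channel mean/std values below are the usual ImageNet constants, an assumption for illustration rather than values taken from this repository:

    import numpy as np

    # Assumed ImageNet per-channel normalization constants.
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])

    def normalize(img):
        return (img - mean) / std

    # Option 1: baseline = the average (mean) image; it normalizes to all zeros.
    baseline_mean = np.broadcast_to(mean, (224, 224, 3))
    print(np.allclose(normalize(baseline_mean), 0.0))           # True

    # Option 2: baseline = a black (all-zero) image; it normalizes to -mean/std.
    baseline_black = np.zeros((224, 224, 3))
    print(np.allclose(normalize(baseline_black), -mean / std))  # True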

It would be really helpful to me if someone could clarify this.

Thanks,
Naman

Extract classification rules?

Hi,

Is it possible to use your tools to extract classification rules (IF-ELSE rules) from a deep network, given some input features? Thanks.

Provided contact e-mail address doesn't work

Hey Ankur,

maybe this is not the best place to raise this, as it's not code related, but I attempted to send a question regarding baseline selection and apparently the [email protected] address cannot be found, so the contact in the README.md should perhaps be updated.

Thanks!

IG for Multi-Task Models

Thank you so much for the great work! I was wondering: for multi-task models, how can the attributions from two separate tasks be combined to understand the importance of features for both tasks together? Does it make sense to just sum them up?
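(My own reasoning, not from the paper: the path integral is linear in $F$, so summing per-task attributions computed with the same baseline and path equals attributing the summed score,

$$\text{IntegratedGrads}_i^{F_1+F_2}(x) = \text{IntegratedGrads}_i^{F_1}(x) + \text{IntegratedGrads}_i^{F_2}(x),$$

which seems meaningful only if the two task scores are on comparable scales.)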

positive and negative polarity

Hi,

I have a question about what positive and negative polarity mean.
How can I explain them in my paper?

Thanks.

Typo in equation (2) of the paper "Axiomatic Attribution for Deep Networks" ?

There seems to be a typo in the definition of path integrated gradients (equation 2) in the paper Axiomatic Attribution for Deep Networks.

I think it should be $\frac{\partial F(\gamma(\alpha))}{\partial x_i}$ and not $\frac{\partial F(\gamma(\alpha))}{\partial\gamma_i(\alpha)}$ (see proof below).

As I am not a specialist in path integrals, please tell me if I am missing something or if the two quantities are equivalent.

Proof (in 2 dimensions for simplicity)

Let:

  • $\gamma(\alpha)$ be a function specifying a path in $\mathbb{R}^2$ from the baseline $x'=\gamma(0)$ to the input $x=\gamma(1)$,
  • $\nabla F(\gamma(\alpha)) = \left( \frac{\partial F(\gamma(\alpha))}{\partial x_1}, \frac{\partial F(\gamma(\alpha))}{\partial x_2} \right)$ be the gradient of $F$ along the path,
  • $\gamma'(\alpha)=\left( \frac{\partial\gamma_1(\alpha)}{\partial\alpha}, \frac{\partial\gamma_2(\alpha)}{\partial\alpha} \right)$ be the derivative of the path.

We then have:

$$\begin{aligned} F(x) - F(x') &= \int_{\alpha=0}^1 \nabla F(\gamma(\alpha))\cdot\gamma'(\alpha)\,d\alpha && \text{(gradient theorem)}\\ &=\int_{\alpha=0}^1 \left( \frac{\partial F(\gamma(\alpha))}{\partial x_1}, \frac{\partial F(\gamma(\alpha))}{\partial x_2}\right) \cdot \left( \frac{\partial\gamma_1(\alpha)}{\partial\alpha}, \frac{\partial\gamma_2(\alpha)}{\partial\alpha} \right) d\alpha && \text{(definitions of gradient and derivative)}\\ &=\int_{\alpha=0}^1 \sum_{i=1}^{2} \frac{\partial F(\gamma(\alpha))}{\partial x_i}\, \frac{\partial\gamma_i(\alpha)}{\partial\alpha}\,d\alpha && \text{(definition of dot product)}\\ &=\sum_{i=1}^{2} \int_{\alpha=0}^1 \frac{\partial F(\gamma(\alpha))}{\partial x_i}\, \frac{\partial\gamma_i(\alpha)}{\partial\alpha}\,d\alpha && \text{(linearity of the integral)}\\ &= \sum_{i=1}^{2} \text{PathIntegratedGrads}_i^\gamma(x) && \text{(completeness)} \end{aligned}$$

So the definition of path integrated gradients is:

$$\text{PathIntegratedGrads}_i^{\gamma}(x) = \int_{\alpha=0}^1 \frac{\partial F(\gamma(\alpha))}{\partial x_i} \frac{\partial\gamma_i(\alpha)}{\partial\alpha}d\alpha$$
