
Partial Order in Chaos

This repository contains the code for the paper Partial Order in Chaos: Consensus on Feature Attributions in the Rashomon Set.

Explaining models in the Rashomon Set


The main idea is that, given an ML task with finite data, there is no single best predictor but rather an equivalence class of predictors with similar empirical performance, also called a Rashomon Set $$\lbrace h\in \mathcal{H}: \text{error}(h) \leq \epsilon\rbrace.$$
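As a minimal sketch of this definition (not the repository's actual API), one can construct a Rashomon Set by simply filtering a collection of candidate models by their empirical error against a tolerance $\epsilon$:

```python
import numpy as np

def rashomon_set(models, X, y, epsilon):
    """Return the names of all models whose empirical MSE is at most epsilon.

    `models` maps a name to a fitted predictor exposing .predict();
    this is an illustrative helper, not the repo's actual interface.
    """
    errors = {name: np.mean((m.predict(X) - y) ** 2)
              for name, m in models.items()}
    return {name for name, err in errors.items() if err <= epsilon}
```

In practice the repository characterizes the Rashomon Set analytically for specific model classes (linear/additive models, kernel ridge), rather than enumerating candidates like this sketch does.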

Each of these models gives a different, yet still valid, perspective on the data. If you were to compute post-hoc explanations (Permutation Importance, SHAP, Integrated Gradients, etc.), you would draw different (and even contradictory) conclusions from the diverse models in the Rashomon Set.

The contribution of this work is to aggregate the post-hoc explanations of competing models by only highlighting the information on which all good models agree. The model consensus is visualised with a Hasse Diagram, where an arrow points from feature $i$ to feature $j$ if and only if all models agree that feature $i$ is more important than feature $j$. We present some concrete examples below.
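The consensus relation can be sketched as follows: given a matrix of per-model attributions, feature $i$ dominates feature $j$ exactly when every model ranks $i$ strictly above $j$. This is an illustrative implementation, not the repository's code, and it omits the transitive reduction needed to draw a minimal Hasse Diagram:

```python
import numpy as np

def consensus_edges(phi):
    """Consensus relations from an (n_models, n_features) attribution matrix.

    Returns the set of pairs (i, j) such that every model assigns feature i
    a strictly larger importance than feature j. Drawing a Hasse Diagram
    would additionally require a transitive reduction of these edges.
    """
    phi = np.asarray(phi)
    n_features = phi.shape[1]
    edges = set()
    for i in range(n_features):
        for j in range(n_features):
            if i != j and np.all(phi[:, i] > phi[:, j]):
                edges.add((i, j))
    return edges
```

Two features are incomparable (like Age and Priors below) precisely when neither $(i, j)$ nor $(j, i)$ survives this unanimity test.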

Permutation Importance on COMPAS

Here we compute the Global Feature Importance (GFI) on COMPAS using the Permutation Importance method. That is, we shuffle the values of a feature in the data and measure the resulting decrease in performance: the larger the decrease, the more important the feature is to the model. We show the range of GFI across models in the Rashomon Set of Kernel Ridge Regression fitted to predict COMPAS scores.
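The shuffle-and-measure procedure just described can be sketched in a few lines (a simplified single-repetition version, not the repository's implementation; scikit-learn's `sklearn.inspection.permutation_importance` offers a production-grade equivalent):

```python
import numpy as np

def permutation_importance(model, X, y, seed=0):
    """GFI of each feature as the increase in MSE after shuffling its column.

    Single permutation per feature for brevity; averaging over several
    repetitions gives a more stable estimate.
    """
    rng = np.random.default_rng(seed)
    base_error = np.mean((model.predict(X) - y) ** 2)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        X_perm = X.copy()
        X_perm[:, j] = rng.permutation(X_perm[:, j])  # break the feature-target link
        importances[j] = np.mean((model.predict(X_perm) - y) ** 2) - base_error
    return importances
```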

GFI COMPAS

First off, looking at the bar charts, no model relies strongly on Sex and Charge, so we could discard these features when fitting Kernel Ridge. Secondly, looking at the Hasse Diagram, we see that all models agree that the features Age and Priors are more important than any other. Therefore, these two features are the most important ones when predicting COMPAS scores with Kernel Ridge. We note that Age and Priors are incomparable since no directed path connects them, meaning that some good models rely more on one than the other.

Local Feature Attributions for House Price

In this experiment, we compute the Local Feature Attributions (LFA) of additive models fitted to predict house prices. Typically, the LFA would be computed with techniques like SHAP or Integrated Gradients, but for additive models both techniques yield the same LFA. We compute the LFA of one of the most expensive houses in the dataset to understand why its price is so much higher than average.
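For an additive model $f(x) = \sum_j f_j(x_j)$, both SHAP and Integrated Gradients reduce to $\phi_j(x) = f_j(x_j) - \mathbb{E}[f_j(X_j)]$, which explains why they coincide here. A hedged sketch, where the per-feature callables in `components` are a hypothetical representation of the additive model (not the repo's API) and the expectation is estimated on a background dataset:

```python
import numpy as np

def additive_lfa(components, x, X_background):
    """LFA of an additive model f(x) = sum_j f_j(x_j) at a point x.

    phi_j(x) = f_j(x_j) - mean over the background of f_j(X_j);
    for additive models, SHAP and Integrated Gradients both equal this.
    """
    return np.array([
        f_j(x[j]) - np.mean(f_j(X_background[:, j]))
        for j, f_j in enumerate(components)
    ])
```

A useful sanity check is that the attributions sum to $f(x)$ minus the mean background prediction.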

LFA Houses

The Hasse Diagram reveals that, when explaining the high price of this house, all models agree the most important feature is OverallQual=10. This suggests that the quality of the materials of the house might be a major factor in this dataset. Lower in the Hasse Diagram, we have other features (1stFlrSF=very large and GarageArea=very large) that are also important for the high price, but to a lesser extent than OverallQual=10.

Structure


The repository is structured as follows:

  • experiments: All the Python scripts needed to reproduce the experiments of the paper.
  • tests: Unit tests and visual tests that verify the base functionalities of the repository.
  • uxai: Main source code to compute the Rashomon Set of Linear/Additive models, Kernel Ridge Regression, and Random Forests. The following post-hoc explanation methods are supported:
    • linear/additive: Built-in LFA and GFI, since these models are interpretable by design.
    • kernel: Integrated Gradients (LFA) and Permutation Importance (GFI).
    • random_forests: TreeSHAP (LFA) and Mean Absolute TreeSHAP (GFI). Mean Absolute TreeSHAP averages the absolute values of the local Shapley values per feature and is the de facto approach in the SHAP library.
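The Mean Absolute TreeSHAP aggregation is a one-liner once the local Shapley values are available (a sketch of the aggregation step only; the SHAP library's `shap.TreeExplainer` produces the local values):

```python
import numpy as np

def mean_abs_shap(shap_values):
    """Aggregate local SHAP values of shape (n_samples, n_features) into a
    GFI vector by averaging absolute values per feature, as SHAP's
    summary bar plots do."""
    return np.abs(np.asarray(shap_values)).mean(axis=0)
```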
