mli-resources's Introduction

Machine Learning Interpretability (MLI)

Machine learning algorithms can create models that are more accurate than linear models, but any gain in accuracy over more traditional, better-understood, and more easily explained techniques is of little practical value to those who must explain their models to regulators or customers. For many decades, the models created by machine learning algorithms were generally taken to be black boxes. However, a recent flurry of research has introduced credible techniques for interpreting complex, machine-learned models. The materials presented here illustrate applications or adaptations of these techniques for practicing data scientists.

Want to contribute your own content? Just make a pull request.

Want to use the content in this repo? Just cite the H2O.ai machine learning interpretability team or the original author(s) as appropriate.

Contents

Practical MLI examples

(A Dockerfile is provided that will construct a container with all necessary dependencies to run the examples here.)

Installation of Examples

Dockerfile

A Dockerfile is provided to build a Docker container with all necessary packages and dependencies. This is the easiest way to use these examples if you are on Mac OS X, *nix, or Windows 10. To do so:

  1. Install and start Docker. Run the remaining steps from a terminal.
  2. Create a directory for the Dockerfile.
    $ mkdir anaconda_py36_h2o_xgboost_graphviz
  3. Fetch the Dockerfile from the mli-resources repo.
    $ curl https://raw.githubusercontent.com/h2oai/mli-resources/master/anaconda_py36_h2o_xgboost_graphviz/Dockerfile > anaconda_py36_h2o_xgboost_graphviz/Dockerfile
  4. Build a Docker image from the Dockerfile. For this and the other docker commands below, you may need to use sudo.
    $ docker build --no-cache anaconda_py36_h2o_xgboost_graphviz
  5. Display docker image IDs. You are probably interested in the most recently created image.
    $ docker images
  6. Start a container from the image and launch the Jupyter notebook server.
    $ docker run -i -t -p 8888:8888 <image_id> /bin/bash -c "/opt/conda/bin/conda install jupyter -y --quiet && /opt/conda/bin/jupyter notebook --notebook-dir=/mli-resources --ip='*' --port=8888 --no-browser --allow-root"
  7. List docker containers.
    $ docker ps
  8. Copy the sample data into the Docker container. Refer to GetData.md to obtain datasets needed for notebooks.
    $ docker cp path/to/train.csv <container_id>:/mli-resources/data/train.csv
  9. Navigate to the URL that Jupyter prints when it starts. It will likely include a token.

Manual

Install:

  1. Anaconda Python 5.1.0 from the Anaconda archives.
  2. Java.
  3. The latest stable h2o Python package.
  4. Git.
  5. XGBoost with Python bindings.
  6. GraphViz.

Anaconda Python, Java, Git, and GraphViz must be added to your system path.
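
Before starting the notebooks, it can help to confirm that these tools really are on the system path. The following is a minimal sketch, not part of the repo. The function name `find_missing_tools` is hypothetical, and the executable names are assumptions: `conda` for Anaconda, `java`, `git`, and `dot` for GraphViz. Adjust them if your platform uses different names.

```python
import shutil

# Assumed executable names for each required tool; change these if your
# installation exposes different commands.
REQUIRED_TOOLS = {
    "Anaconda Python": "conda",
    "Java": "java",
    "Git": "git",
    "GraphViz": "dot",
}


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the display names of tools whose executable is not on PATH."""
    return [name for name, exe in tools.items() if shutil.which(exe) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

If the script reports a missing tool, add its install directory to your path and re-run the check before launching Jupyter.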

From a terminal:

  1. Clone the mli-resources repository with examples.
    $ git clone https://github.com/h2oai/mli-resources.git
  2. Change into the repository directory.
    $ cd mli-resources
  3. Copy the sample data into the mli-resources repo directory. Refer to GetData.md to obtain datasets needed for notebooks.
    $ cp path/to/train.csv ./data
  4. Start the Jupyter notebook server.
    $ jupyter notebook
  5. Navigate to the port Jupyter directs you to on your machine.

Additional Code Examples

The notebooks in this repo have been revamped and refined many times. Other versions with different, and potentially interesting, details are available at these locations:

Testing Explanations

One way to test generated explanations for accuracy is to use simulated data with known characteristics. For instance, a model trained on totally random data, with no relationship between the input variables and the prediction target, should not give strong weight to any input variable, nor should it generate compelling local explanations or reason codes. Conversely, you can use simulated data with a known signal-generating function to test that explanations accurately represent that known function. Detailed examples of testing explanations with simulated data are available here. A summary of these results is available here.
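
The idea can be sketched in a few lines of dependency-free Python. This is an illustration under stated assumptions, not the repo's own test notebooks: it swaps in an ordinary least squares model and permutation importance for the GBMs and explanation techniques used in the notebooks, and all function names are hypothetical. One target depends only on the first input (a known signal), and the other is pure noise.

```python
import random


def fit_ols(X, y):
    """Fit least squares with an intercept via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Augmented matrix [X'X | X'y], solved by Gauss-Jordan elimination.
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        A[c] = [v / A[c][c] for v in A[c]]
        for r in range(p):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][p] for i in range(p)]  # [intercept, coef_0, coef_1, ...]


def predict(coef, X):
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], row)) for row in X]


def mse(y, yhat):
    return sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y)


def permutation_importance(coef, X, y, rng):
    """Increase in MSE when each input column is shuffled in turn."""
    base = mse(y, predict(coef, X))
    importances = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        rng.shuffle(col)
        X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
        importances.append(mse(y, predict(coef, X_perm)) - base)
    return importances


rng = random.Random(0)
n = 500
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(n)]

# Known signal: the target depends only on the first input.
y_signal = [3.0 * row[0] + rng.gauss(0, 0.5) for row in X]
# No signal: the target is unrelated to every input.
y_noise = [rng.gauss(0, 1) for _ in range(n)]

imp_signal = permutation_importance(fit_ols(X, y_signal), X, y_signal, rng)
imp_noise = permutation_importance(fit_ols(X, y_noise), X, y_noise, rng)

print("known signal:", [round(v, 3) for v in imp_signal])
print("pure noise:  ", [round(v, 3) for v in imp_noise])
```

With the known signal, the first input should dominate the importances and the other two should sit near zero. With the noise target, no input should stand out. An explanation method that reports strong importances in the noise case deserves suspicion.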

Webinars/Videos

Booklets

Conference Presentations

Miscellaneous Resources

General References

mli-resources's People

Contributors

iancovert, jphall663, karthiktsaliki, lingyaomeng, navdeep-g, pramitchoudhary

mli-resources's Issues

How does H2O GBM deal with missing values?

Thanks, everyone, for this amazing resource on model interpretation. I have a question about LOCO, regarding how H2O GBM deals with missing values.

The markdown says that H2O GBM deals with missing values by following the majority path in the tree. However, as I read the H2O documentation, I think H2O GBM treats a missing value as a new category and minimizes the loss function, and at test time a missing value follows the path that was optimized in training.

Am I misunderstanding something here?

Thanks,
Sandy

Dockerfile fetched from mli-resources has the wrong content

Following the instructions to build the Docker image,

  1. Fetch the Dockerfile from the mli-resources repo.
    $ curl https://raw.githubusercontent.com/h2oai/mli-resources/master/anaconda_py36_h2o_xgboost_graphviz/Dockerfile > anaconda_py36_h2o_xgboost_graphviz/Dockerfile

The resulting Dockerfile contains the status code rather than the content itself. Do this instead:
curl https://raw.githubusercontent.com/h2oai/mli-resources/master/anaconda_py36_h2o_xgboost_graphviz/Dockerfile | Select-Object -ExpandProperty Content

Examples via Docker: missing mli-resources clone & jupyter to be run w/ --allow-root

To install the examples successfully using Docker, I made the following changes:

  • There seems to be a missing step that clones the mli-resources GitHub repository. Perhaps RUN git clone https://github.com/h2oai/mli-resources.git should be added to the Dockerfile (I cloned the repo manually).
  • Jupyter refuses to start as root. Consider adding the --allow-root parameter: docker run -i -t -p 8888:8888 <image_id> /bin/bash -c "/opt/conda/bin/conda install jupyter -y --quiet && /opt/conda/bin/jupyter notebook --notebook-dir=/mli-resources --ip='*' --port=8888 --no-browser --allow-root" and/or suggesting that it be run as a normal user.
  • Perhaps it would be worth suggesting that users open the URL Jupyter provides after it starts, since it contains the auth token (README.md, step 9).

Oracle that generates Smin samples

In the given example, a GBM is used to create the tree, but TREPAN generates samples to satisfy the minimum number of samples required (Smin). How does the GBM do that here?
