
Distributing KernelSHAP using ray

This repository shows how to distribute explanations with KernelSHAP on a single node or on a Kubernetes cluster using ray. The predictions of a logistic regression model on 2560 instances from the Adult dataset are explained using KernelSHAP configured with a background set of 100 samples from the same dataset. The data preprocessing and model fitting steps are available in the scripts/ folder, but both the data and the model are downloaded automatically by the benchmarking scripts.
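
KernelSHAP estimates Shapley values by fitting a weighted linear regression over sampled feature coalitions. As a point of reference for what is being computed, a brute-force Shapley computation for a toy model can be sketched with the standard library alone (all names below are illustrative, not from this repository):

```python
import itertools
import math

def shapley_values(predict, x, baseline):
    # Exact Shapley values by enumerating all feature coalitions; KernelSHAP
    # approximates these same values with sampled coalitions, which is why it
    # scales to real feature counts while brute force does not.
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in itertools.combinations(others, size):
                with_i = [x[j] if j in subset or j == i else baseline[j] for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi

# Toy linear "model": for linear models phi_i = w_i * (x_i - baseline_i).
weights = [0.5, -1.0, 2.0]
predict = lambda v: sum(w * f for w, f in zip(weights, v))
print(shapley_values(predict, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]))  # approx. [0.5, -2.0, 6.0]
```

For a linear model with a zero baseline each attribution reduces to w_i * x_i, which makes the output easy to check by hand.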

Distributed KernelSHAP on a single multicore node

Setup

  1. Install conda
  2. Create a virtual environment with conda create --name shap python=3.7
  3. Activate the environment with conda activate shap
  4. Execute pip install . in order to install the dependencies needed to run the benchmarking scripts

Running the benchmarks

Two code versions are available:

  • One using a parallel pool of ray actors, which consume small subsets of the 2560-instance dataset to be explained
  • One using ray serve instead of the parallel pool
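
As a rough stdlib analogy of the first version, the pool pattern looks like this (explain_batch is a hypothetical stand-in for a KernelSHAP explain call, and a thread pool stands in for the pool of ray actors):

```python
from concurrent.futures import ThreadPoolExecutor

def explain_batch(batch):
    # Placeholder for the expensive per-batch KernelSHAP computation.
    return [f"explained-{i}" for i in batch]

def chunk(data, size):
    # Split the dataset into small subsets, one per task.
    return [data[i:i + size] for i in range(0, len(data), size)]

dataset = list(range(2560))  # stand-in for the 2560 instances to explain
with ThreadPoolExecutor(max_workers=5) as pool:
    per_batch = pool.map(explain_batch, chunk(dataset, 10))
explanations = [e for batch in per_batch for e in batch]
print(len(explanations))  # 2560
```

The ray actor pool works the same way conceptually, except the workers are long-lived actors that can live on other machines.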

The two methods can be run from the repository root using the scripts benchmarks/ray_pool.py and benchmarks/serve_explanations.py, respectively. The following options can be configured:

  • number of actors/replicas that the task is going to be distributed on (e.g., --workers 5 (pool), --replicas 5 (ray serve))
  • whether a benchmark (i.e., redistributing the task over an increasingly large pool or number of replicas) should be performed (-benchmark 0 to disable, -benchmark 1 to enable)
  • the number of times the task is run for the same configuration in benchmarking mode (e.g., --nruns 5)
  • how many instances can be sent to an actor/replica at once; this is a required argument (e.g., -b 1 5 10 (pool), -batch 1 5 10 (ray serve)). If more than one value is passed after the argument name, the task (or benchmark) is executed once for each batch size
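
Putting the options together, the benchmarking loop they imply can be sketched as follows (a stdlib approximation; none of the names below come from the actual scripts):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_task(n_workers, batch_size, data):
    # Re-chunk the data for the requested batch size and distribute it.
    batches = [data[i:i + batch_size] for i in range(0, len(data), batch_size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return [r for batch in pool.map(lambda b: b, batches) for r in batch]

timings = {}
data = list(range(100))
for batch_size in (1, 5, 10):   # e.g. -b 1 5 10
    runs = []
    for _ in range(5):          # e.g. --nruns 5, same configuration repeated
        t0 = time.perf_counter()
        run_task(n_workers=5, batch_size=batch_size, data=data)
        runs.append(time.perf_counter() - t0)
    timings[batch_size] = min(runs)
print(sorted(timings))  # [1, 5, 10]
```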

Distributed KernelSHAP on a Kubernetes cluster

Setup

This requires access to a Kubernetes cluster and a local installation of kubectl. Don't forget to export the path to the cluster configuration .yaml file in your KUBECONFIG environment variable, as described here, before moving on to the next steps.

Running the benchmarks

The ray_pool.py and serve_explanations.py scripts have been modified for deployment on the Kubernetes cluster; the modified versions are prefixed with k8s_. The benchmark experiments can be run via the bash scripts in the benchmarks/ folder. These scripts:

  • Apply the appropriate k8s manifest in cluster/ to the k8s cluster
  • Upload a k8s*.py file to it
  • Run the script
  • Pull the results and save them in the results directory
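
The four steps above could be reconstructed roughly as the following kubectl invocations; the manifest path, script name, pod name, and remote paths are illustrative assumptions, not values taken from the repository's scripts:

```python
# Hypothetical reconstruction of what the benchmark bash scripts do.
pod = "ray-head"  # assumed name of the pod running the ray head node
commands = [
    "kubectl apply -f cluster/pool.yaml",                    # apply the k8s manifest
    f"kubectl cp benchmarks/k8s_ray_pool.py {pod}:/home/",   # upload a k8s*.py file
    f"kubectl exec {pod} -- python /home/k8s_ray_pool.py",   # run the script
    f"kubectl cp {pod}:/home/results ./results",             # pull the results
]
for command in commands:
    print(command)
```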

Specifically:

  • Calling bash benchmarks/k8s_benchmark_pool.sh 10 20 runs the benchmark with an increasing number of workers (the cluster is reset each time the number of workers is increased). By default the experiment is run with batch sizes 1, 5, and 10; this can be changed by updating the value of BATCH in cluster/Makefile.pool
  • Calling bash benchmarks/k8s_benchmark_serve.sh 10 20 ray runs the benchmark with an increasing number of workers and batch sizes of 1, 5, and 10 for each worker count. The batch size setting can be modified from the .sh script itself. The ray argument means that ray batches single requests together and dispatches them to the same worker; if it is replaced by default, minibatches are distributed to each worker instead
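
The difference between the two dispatch modes can be illustrated with a small sketch (plain Python, not part of ray serve): with ray, single requests are accumulated into batches on the server side, whereas with default the client ships ready-made minibatches:

```python
def accumulate(requests, max_batch_size):
    # Server-side batching: single requests are collected until a full
    # batch is available, then dispatched to one worker together.
    batches, current = [], []
    for request in requests:
        current.append(request)
        if len(current) == max_batch_size:
            batches.append(current)
            current = []
    if current:  # flush the final, possibly partial, batch
        batches.append(current)
    return batches

print(accumulate(list(range(7)), max_batch_size=5))  # [[0, 1, 2, 3, 4], [5, 6]]
```

In "default" mode no accumulation happens: each minibatch already arrives fully formed and is simply routed to a worker.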

Sample results

Single node

The experiments were run on a compute-optimized dedicated machine in Digital Ocean with 32 vCPUs, which explains the attenuation of the performance gains visible below.

The results obtained by running the task using the ray parallel pool are shown below:

[Figure: single-node benchmark results, ray parallel pool]

Distributing using ray serve yields similar results:

[Figure: single-node benchmark results, ray serve]

Kubernetes cluster

The experiments were run on a cluster of two compute-optimized dedicated machines in Digital Ocean with 32 vCPUs each, which explains the attenuation of the performance gains visible below.

The results obtained running the task using the ray parallel pool over a two-node cluster are shown below:

[Figures: two-node cluster benchmark results, ray parallel pool]

Distributing using ray serve yields similar results:

[Figures: two-node cluster benchmark results, ray serve]

distributedkernelshap's People

Contributors

alexcoca

distributedkernelshap's Issues

Got ray.serialization.DeserializationError when trying to run KernelShap in distributed mode

Hello,
I read your documentation and blog post and was interested in running KernelShap in distributed mode.
I've installed ray and alibi[ray] as in the documentation. Other packages were already installed.
Running the following code on a binary dataset (1s and 0s as feature values) raised a serialization issue:

opts = {'n_cpus': 19}
start = time.time()
distrib_explainer = KernelShap(self.func_predict, distributed_opts=opts)
distrib_explainer.fit(background_set)
distrib_shap_values = distrib_explainer.explain(record_to_explain, nsamples='auto')
print(str(time.time() - start))

This is the error:
raise DeserializationError()
ray.serialization.DeserializationError

Do you know what is the problem and how it can be resolved? Thanks.
