This project demonstrates how to take a DataRobot model and build clusters based on its prediction explanations.
Status: Functional
Todo: Generate a downloadable dataset with the cluster labels added
You will need a DataRobot account and access to a dedicated prediction server.
You will also need several Python libraries, including the DataRobot package:
pip install numpy
pip install pandas
pip install scikit-learn
pip install matplotlib
pip install hdbscan
pip install datarobot
To run this application you will need a YAML configuration file that authenticates you against your DataRobot instance when using the DataRobot Python package. Please follow the configuration guidelines in the DataRobot Python package documentation to set this up.
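As a rough illustration, assuming the standard configuration format of the DataRobot Python package (a YAML file with `endpoint` and `token` keys, conventionally stored at `~/.config/datarobot/drconfig.yaml`), the file could be written like this. The helper name `write_drconfig` and the values in the usage comment are hypothetical placeholders, not part of this project.

```python
from pathlib import Path


def write_drconfig(path, endpoint, token):
    """Write a minimal DataRobot client config file (hypothetical helper)."""
    path = Path(path).expanduser()
    path.parent.mkdir(parents=True, exist_ok=True)
    # Assumption: the client only needs these two keys to authenticate.
    path.write_text(f"endpoint: '{endpoint}'\ntoken: '{token}'\n")
    return path


# Example call (placeholder values; substitute your own URL and API token):
# write_drconfig("~/.config/datarobot/drconfig.yaml",
#                "https://app.datarobot.com/api/v2",
#                "YOUR_API_TOKEN")
```

With the file in that default location, `import datarobot as dr` followed by `dr.Client()` should pick it up; a file stored elsewhere can be passed as `dr.Client(config_path=...)`.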
The core functions that retrieve the predictions and their explanations can be found in the file drpredexplanations.py
The results generated by the above file can then be clustered using one of several functions found in the file drclustering.py
The above functions are used by the example script and the web application example.
Currently the implementation allows you to build either K-Means or HDBSCAN clusters. The clustering is done on a sparse matrix representation of the prediction explanation strengths.
Additional algorithms, features and distance metrics may be added as time permits.
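To make the sparse representation concrete, here is a minimal sketch. It is not the code in drclustering.py. It assumes each scored row yields a short list of (feature, strength) pairs; the function name `explanations_to_sparse` and the toy data are hypothetical. Each row becomes a vector with one column per feature, holding the strength where that feature was cited as an explanation and zero elsewhere, and K-Means from scikit-learn is then run on that matrix.

```python
from scipy.sparse import csr_matrix
from sklearn.cluster import KMeans


def explanations_to_sparse(rows):
    """Turn per-row [(feature, strength), ...] lists into a CSR matrix."""
    # One column per distinct feature name, in sorted order.
    features = sorted({name for row in rows for name, _ in row})
    col_of = {name: j for j, name in enumerate(features)}
    data, row_idx, col_idx = [], [], []
    for i, row in enumerate(rows):
        for name, strength in row:
            row_idx.append(i)
            col_idx.append(col_of[name])
            data.append(strength)
    shape = (len(rows), len(features))
    matrix = csr_matrix((data, (row_idx, col_idx)), shape=shape, dtype=float)
    return matrix, features


# Toy data (made up): two rows explained by age/income, two by tenure/region.
toy_rows = [
    [("age", 0.9), ("income", 0.4)],
    [("age", 0.8), ("income", 0.5)],
    [("tenure", -0.7), ("region", -0.3)],
    [("tenure", -0.6), ("region", -0.4)],
]
matrix, features = explanations_to_sparse(toy_rows)

# KMeans accepts sparse input directly; random_state makes the run repeatable.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(matrix)
```

Rows that share the same explaining features with similar strengths end up in the same cluster, which is the intuition behind clustering on explanations rather than on raw feature values.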
The script example.py shows you how to create clusters by specifying a DataRobot project, model and dataset in an interactive Python session.
The file app.py and the contents of the templates directory make up a Python Flask web application that you can use to run the clustering on any of your DataRobot projects, provided that you supply a dataset to score against.
It stores the generated plots in the static folder so that they do not need to be regenerated.
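The caching idea can be sketched as follows. This is an illustration under assumptions rather than the logic in app.py: the helper `cached_plot_path`, the file-naming scheme, and the `render` callback are all hypothetical. The plot is rendered only when no file with the expected name exists yet in the static folder.

```python
import os


def cached_plot_path(static_dir, key, render):
    """Return the path of the plot for `key`, rendering it only if missing.

    `render` is a callable that takes the output path and writes an image
    there (for example, a function ending in matplotlib's `fig.savefig(path)`).
    """
    os.makedirs(static_dir, exist_ok=True)
    path = os.path.join(static_dir, f"{key}.png")
    if not os.path.exists(path):
        render(path)  # first request: generate and store the plot
    return path       # later requests: reuse the stored file
```

A key built from the project, model and clustering settings would keep plots from different runs apart; how app.py actually names its files may differ.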
To run:
python app.py
Then follow the prompts.