sato9hara / defragtrees

Python code for tree ensemble interpretation

License: MIT License

Python 94.87% R 5.13%

defragtrees's Introduction

defragTrees

Python code for tree ensemble interpretation proposed in the following paper.

S. Hara, K. Hayashi, Making Tree Ensembles Interpretable: A Bayesian Model Selection Approach. In Proceedings of the 21st International Conference on Artificial Intelligence and Statistics (AISTATS'18), pages 77--85, 2018.

Requirements

To use defragTrees:

Python3.x
Numpy
Pandas

To run the example code in the example directory:

Python: XGBoost, Scikit-learn
R: randomForest

To replicate the paper results in the paper directory:

Python: Scikit-learn, Matplotlib, pylab
R: randomForest, inTrees, nodeHarvest

Usage

Prepare data:

Input X: feature matrix, numpy array of size (num, dim).
Output y: output array, numpy array of size (num,).
- For regression, y is a real value.
- For classification, y is a class index (i.e., 0, 1, 2, ..., C-1 for C classes).
Splitter splitter: thresholds of the tree ensemble, numpy array of size (# of split rules, 2).
- Each row of splitter is (feature index, threshold), with zero-based feature indices. For example, if the split rule is "second feature < 0.5", the corresponding row of splitter is (1, 0.5).
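
The three inputs described above can be sketched with a toy example. This is only an illustration of the expected shapes: the data values and the `raw_rules` list are made up, and removing duplicate rules with `np.unique` is an assumption about convenience, not a documented requirement of defragTrees.

```python
import numpy as np

# Toy data: 6 samples, 2 features (values are made up for illustration).
X = np.array([[0.1, 0.9],
              [0.4, 0.2],
              [0.6, 0.7],
              [0.8, 0.3],
              [0.2, 0.6],
              [0.9, 0.8]])
y = np.array([0, 0, 1, 1, 0, 1])  # class indices 0..C-1 for classification

# Hypothetical split rules gathered from an ensemble, as (feature index, threshold).
# (1, 0.5) encodes "second feature < 0.5" because feature indices are zero-based.
raw_rules = [(0, 0.5), (1, 0.5), (0, 0.5), (1, 0.25)]

# Assumption: drop repeated rules. np.unique with axis=0 removes duplicate rows
# and returns the remaining rows in sorted order.
splitter = np.unique(np.array(raw_rules, dtype=float), axis=0)

print(X.shape, y.shape, splitter.shape)
```

Because a numpy array has a single dtype, the feature index column of `splitter` is stored as a float here and would need an integer cast before being used as an array index.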

Import the class:

from defragTrees import DefragModel

Fit the simplified model:

Kmax = 10 # upper bound on the number of rules to be fitted
mdl = DefragModel(modeltype='regression') # change to 'classification' if necessary.
mdl.fit(X, y, splitter, Kmax)
#mdl.fit(X, y, splitter, Kmax, fittype='EM') # use this when one wants exactly Kmax rules to be fitted

Check the learned rules:

print(mdl)

For further details, see defragTrees.py. In IPython, one can check:

import defragTrees
defragTrees?

Examples

Simple Examples

See example directory.

Replicating Paper Results

See paper directory.

defragtrees's Issues

Feature request: tree visualization + LightGBM support

It would be really nice to visualize the rules defragTrees generates as a graphical tree (similar to scikit-learn), and to have support for tree ensembles from LightGBM.

Advice needed for regression case

I was able to successfully apply this method to a practical classification problem and got pretty good results.
However, when I try to use the default setup for a regression random forest in Python (in a Jupyter notebook environment) with 100,000 data points, the kernel crashes. Is there anything I should be aware of when dealing with regression instead of classification? Thanks.

Possible printout inequality error

Based on the code in the "check" function, l[1] == 0 represents "value <= x" and l[1] == 1 represents "x < value".
However, in the print implementation, it seems that l[1] == 0 represents "value < x" and l[1] == 1 represents "x <= value".
Am I misunderstanding the implementation, or is there actually an error?
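
The two readings described above differ only when x equals the threshold. The sketch below illustrates this with two hypothetical helper functions that encode the conventions exactly as stated in this report; they are not taken from defragTrees.py.

```python
def satisfied_check(x, value, flag):
    # Convention reported for "check": 0 means value <= x, 1 means x < value.
    return value <= x if flag == 0 else x < value


def satisfied_print(x, value, flag):
    # Convention reported for the printout: 0 means value < x, 1 means x <= value.
    return value < x if flag == 0 else x <= value


value = 0.5
for x in (0.4, 0.5, 0.6):
    for flag in (0, 1):
        # The two columns of booleans disagree only on the rows where x == value.
        print(x, flag, satisfied_check(x, value, flag), satisfied_print(x, value, flag))
```

If the report is accurate, the printed rule and the evaluated rule would therefore disagree only for samples lying exactly on a split threshold.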

xgboost trees - IndexError

I'm probably doing something stupid, but I was trying to use the xgboost functionality with a toy example of my own, and it resulted in the following error:

IndexError: too many indices for array

An example script and data files may be downloaded from here.

Any idea by chance what's going on?

Example lgb failing with IndexError

I can't reproduce the LightGBM example:

> python example_lgb.py

...(training outputs)...

----- Found Rules -----
Traceback (most recent call last):
  File "example_lgb.py", line 51, in <module>
    print(mdl)
  File "../defragTrees.py", line 87, in __str__
    box, vmin, vmax = self.__r2box(self.rule_[i], self.dim_)
  File "../defragTrees.py", line 187, in __r2box
    box[1, rr[0]-1] = np.minimum(box[1, rr[0]-1], rr[2])
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
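
NumPy raises this message when an array is indexed with a non-integer value such as a float. The sketch below reproduces the error under the assumption that `rr[0]` arrives as a float; the traceback alone does not confirm this, and the `rr` tuple and `box` array here are hypothetical stand-ins rather than the library's actual data.

```python
import numpy as np

box = np.full((2, 3), np.inf)  # stand-in for the per-feature bounds
rr = (2.0, 1, 0.5)             # hypothetical rule whose feature index is a float

try:
    # Same expression as in the traceback: a float index triggers IndexError.
    box[1, rr[0] - 1] = np.minimum(box[1, rr[0] - 1], rr[2])
    failed = False
except IndexError:
    failed = True

# Casting the feature index to int makes the same update valid.
j = int(rr[0]) - 1
box[1, j] = np.minimum(box[1, j], rr[2])

print(failed, box[1])
```

If that assumption holds, casting the feature index to an integer where the rule is parsed or used would be one possible workaround.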

I am using python 3.7 with the following packages:

lightgbm==2.2.2
numpy==1.15.4
pandas==0.23.4
scikit-learn==0.20.1
scipy==1.1.0

On a related note, could you please add a requirements.txt file?
