defragTrees

Python code for tree ensemble interpretation proposed in the following paper.

Requirements

To use defragTrees:

  • Python 3.x
  • NumPy
  • Pandas

To run the example code in the example directory:

  • Python: XGBoost, Scikit-learn
  • R: randomForest

To replicate the paper results in the paper directory:

  • Python: Scikit-learn, Matplotlib, pylab
  • R: randomForest, inTrees, nodeHarvest

Usage

Prepare data:

  • Input X: feature matrix, numpy array of size (num, dim).
  • Output y: output array, numpy array of size (num,).
    • For regression, y is a real value.
    • For classification, y is class index (i.e., 0, 1, 2, ..., C-1, for C classes).
  • Splitter splitter: thresholds of tree ensembles, numpy array of size (# of split rules, 2).
    • Each row of splitter is (feature index, threshold). For example, if the split rule is "second feature < 0.5", the corresponding row of splitter is (1, 0.5), since feature indices start at 0.
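
The three inputs above can be sketched with NumPy alone. This is a minimal illustration on toy data, not part of defragTrees: the helper make_splitter is a hypothetical name, and dropping duplicate (feature index, threshold) pairs is an assumption of this sketch rather than a documented requirement. In practice the pairs would be collected from the split nodes of a trained ensemble.

```python
import numpy as np

def make_splitter(rules):
    """Stack (feature index, threshold) pairs into a (# of split rules, 2) array.

    Hypothetical helper: duplicate pairs are dropped, which is an assumption
    of this sketch, not a documented requirement of defragTrees.
    """
    arr = np.asarray(rules, dtype=float).reshape(-1, 2)
    return np.unique(arr, axis=0)  # unique rows, sorted lexicographically

# Toy regression data: 200 samples, 3 features.
rng = np.random.default_rng(0)
num, dim = 200, 3
X = rng.random((num, dim))                       # shape (num, dim)
y = np.where(X[:, 1] < 0.5, 0.0, 1.0)            # depends on the second feature
y = y + 0.1 * rng.standard_normal(num)           # shape (num,)

# "second feature < 0.5" becomes the row (1, 0.5); the repeated pair is dropped.
splitter = make_splitter([(1, 0.5), (0, 0.25), (1, 0.5), (2, 0.75)])

print(X.shape, y.shape, splitter.shape)
```

Note that the resulting array is floating point, so the feature index in the first column is stored as a float and needs an explicit int() cast before it is used to index another array.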

Import the class:

from defragTrees import DefragModel

Fit the simplified model:

Kmax = 10 # upper bound on the number of rules to be fitted
mdl = DefragModel(modeltype='regression') # change to 'classification' if necessary.
mdl.fit(X, y, splitter, Kmax)
#mdl.fit(X, y, splitter, Kmax, fittype='EM') # use this when one wants exactly Kmax rules to be fitted

Check the learned rules:

print(mdl)

For further details, see defragTrees.py. In IPython, one can check:

import defragTrees
defragTrees?

Examples

Simple Examples

See example directory.

Replicating Paper Results

See paper directory.

defragtrees's People

Contributors

sato9hara

defragtrees's Issues

Advice needed for regression case

I was able to successfully apply this method to a practical classification problem and got pretty good results.
However, when I tried the default setup for a regression random forest in Python (in a Jupyter notebook) with 100,000 data points, the kernel crashed. Is there anything I should be aware of when dealing with regression instead of classification? Thanks.

Possible printout inequality error

Based on the code in the "check" function, l[1] == 0 represents "value <= x" and l[1] == 1 represents "x < value".
However, in the print implementation, it seems that l[1] == 0 represents "value < x" and l[1] == 1 represents "x <= value".
Am I misunderstanding the implementation, or is there actually an error?
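
One way to rule out this kind of mismatch is to derive both the membership test and the printed text from a single operator table. The sketch below is not taken from defragTrees.py: the literal layout (feature index, direction, threshold), the 0-based feature index, and the meaning assigned to each direction flag are all assumptions made for illustration, and the names OPS, check, and describe are hypothetical.

```python
import numpy as np

# Hypothetical convention for this sketch only (not taken from defragTrees.py):
# a literal is (feature index, direction, threshold) with a 0-based feature index,
# direction 0 meaning "x <= threshold" and direction 1 meaning "x > threshold".
OPS = {
    0: ("<=", np.less_equal),
    1: (">", np.greater),
}

def check(X, literal):
    """Return a boolean mask of the rows of X that satisfy the literal."""
    feature, direction, threshold = literal
    _, op = OPS[direction]
    return op(X[:, feature], threshold)

def describe(literal):
    """Render the literal with the same operator that check() applies."""
    feature, direction, threshold = literal
    symbol, _ = OPS[direction]
    return f"x_{feature} {symbol} {threshold}"

X = np.array([[0.2], [0.5], [0.8]])
for literal in [(0, 0, 0.5), (0, 1, 0.5)]:
    print(describe(literal), check(X, literal))
```

Because both functions read the operator from the same entry of OPS, the boundary case x == threshold is handled identically in the test and in the printout.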

xgboost trees - IndexError

I'm probably doing something stupid, but I was trying to use the xgboost functionality with a toy example of my own, and it resulted in the following error:

IndexError: too many indices for array

An example script and data files may be downloaded from here.

Any idea by chance what's going on?

Example lgb failing with IndexError

I can't reproduce the LightGBM example:

> python example_lgb.py

...(training outputs)...

----- Found Rules -----
Traceback (most recent call last):
  File "example_lgb.py", line 51, in <module>
    print(mdl)
  File "../defragTrees.py", line 87, in __str__
    box, vmin, vmax = self.__r2box(self.rule_[i], self.dim_)
  File "../defragTrees.py", line 187, in __r2box
    box[1, rr[0]-1] = np.minimum(box[1, rr[0]-1], rr[2])
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

I am using python 3.7 with the following packages:

lightgbm==2.2.2
numpy==1.15.4
pandas==0.23.4
scikit-learn==0.20.1
scipy==1.1.0

On a related note, could you please add a requirements.txt file?
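
The message in the traceback above is the one NumPy raises when an array is indexed with a floating-point value. One plausible cause, which the report itself does not confirm, is that the feature index inside a rule is stored as a float. The sketch below reproduces the message on a toy array and shows an explicit int() cast as a workaround; the array box and the rule layout (1-based feature index, direction, threshold) are assumptions modeled on the traceback line, not code from defragTrees.py.

```python
import numpy as np

# Toy stand-in for the bounding box: row 0 holds lower bounds, row 1 upper bounds.
box = np.array([[-np.inf] * 3, [np.inf] * 3])

# Hypothetical rule stored as floats: (1-based feature index, direction, threshold).
rule = np.array([2.0, 1.0, 0.5])

message = ""
try:
    # Indexing with a float raises IndexError in NumPy.
    box[1, rule[0] - 1] = np.minimum(box[1, rule[0] - 1], rule[2])
except IndexError as err:
    message = str(err)
    print("IndexError:", message)

# Workaround: cast the feature index to int before using it as an index.
feature = int(rule[0]) - 1
box[1, feature] = np.minimum(box[1, feature], rule[2])
print(box)
```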
