
random-subgroups's Introduction

Random Subgroups python package

Making predictions with subgroups

random-subgroups is a machine learning package compatible with scikit-learn.

It uses ensembles of weak estimators, as in random forests, for classification and regression tasks. The main difference from the random forests algorithm is that it uses subgroups as estimators.

The subgroup discovery implementation in this package is built on top of the pysubgroup package. It uses many of pysubgroup's features and extends them with additional quality measures (more suitable for prediction) and different search strategies.
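To make the idea of a subgroup as a weak estimator concrete, here is a minimal sketch of a rule that is a conjunction of threshold conditions on feature columns. The tuple representation, the `covers` helper and the sample row are assumptions made for illustration; they are not the package's internal data structures.

```python
import operator

# Hypothetical representation: a subgroup is a conjunction of
# (column index, comparison, threshold) conditions.
OPS = {">=": operator.ge, "<": operator.lt}


def covers(conditions, row):
    """Return True if the row satisfies every condition of the subgroup."""
    return all(OPS[op](row[col], thr) for col, op, thr in conditions)


# Two rules written in the style printed by show_models()
rule_a = [(22, ">=", 124.16)]                 # Col22>=124.16
rule_b = [(20, "<", 15.53), (26, "<", 0.35)]  # Col20<15.53 AND Col26<0.35

# Made-up sample with 30 features (not taken from the dataset)
row = [0.0] * 30
row[20] = 14.0
row[22] = 90.0
row[26] = 0.2

print(covers(rule_a, row))  # False
print(covers(rule_b, row))  # True
```

Only the subgroups that cover a sample take part in its prediction, which is why show_decision reports fewer estimators than n_estimators.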

Installation:

pip install random-subgroups

Example of the classifier:

from randomsubgroups import RandomSubgroupClassifier
from sklearn import datasets

data = datasets.load_breast_cancer()
y = data.target
X = data.data

sg_classifier = RandomSubgroupClassifier(n_estimators=30)

sg_classifier.fit(X, y)
>>> classifiers_df = sg_classifier.show_models()

Target: 0; Model: Col26>=0.27 AND Col7>=0.06
Target: 0; Model: Col3>=435.17 AND Col6>=0.11
Target: 0; Model: Col20>=18.22 AND Col3>=806.60
Target: 0; Model: Col16>=0.02 AND Col20>=15.87
Target: 0; Model: Col17>=0.01 AND Col20>=17.91
Target: 0; Model: Col20>=17.50 AND Col22>=118.60
Target: 0; Model: Col23>=1004.60 AND Col7>=0.05
Target: 0; Model: Col0>=15.33 AND Col13>=21.73
Target: 0; Model: Col22>=124.16
Target: 0; Model: Col13>=18.88 AND Col3>=716.60
Target: 0; Model: Col12>=1.39 AND Col22>=123.11
Target: 0; Model: Col23>=1030.0 AND Col6>=0.05
Target: 0; Model: Col27>=0.15 AND Col3>=358.90
Target: 0; Model: Col15>=0.01 AND Col23>=883.99
Target: 0; Model: Col0>=10.98 AND Col27>=0.16
Target: 1; Model: Col22<105.0 AND Col27<0.16
Target: 1; Model: Col20<15.53 AND Col26<0.35
Target: 1; Model: Col7<0.05
Target: 1; Model: Col13<42.86 AND Col27<0.12
Target: 1; Model: Col23<771.47 AND Col27<0.12
Target: 1; Model: Col20<17.79 AND Col25<0.20
Target: 1; Model: Col20<15.49 AND Col27<0.15
Target: 1; Model: Col13<29.40 AND Col1<19.98
Target: 1; Model: Col20<15.75 AND Col6<0.08
Target: 1; Model: Col20<15.63 AND Col27<0.20
Target: 1; Model: Col22<104.79 AND Col29<0.10
Target: 1; Model: Col20<14.69 AND Col6<0.12
Target: 1; Model: Col27<0.12 AND Col3<693.70
Target: 1; Model: Col20<16.00 AND Col6<0.09
Target: 1; Model: Col27<0.11 AND Col7<0.06
>>> sg_classifier.show_decision(X[5])

The predicted value is: 0
From a total of 6 estimators.

The subgroups used in the prediction are:

 Predicting target 0
Col0>=10.98 AND Col27>=0.16 ---> 0
Col26>=0.27 AND Col7>=0.06 ---> 0
Col27>=0.15 AND Col3>=358.90 ---> 0
Col3>=435.17 AND Col6>=0.11 ---> 0

 Predicting target 1
Col13<29.40 AND Col1<19.98 ---> 1
Col20<15.63 AND Col27<0.20 ---> 1

The targets of the subgroups used in the prediction have the following distribution:

[Figure: distribution of the targets of the subgroups used in the prediction]
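The decision printed for X[5] is consistent with a simple majority vote over the covering subgroups: four of them predict target 0 and two predict target 1. The sketch below reproduces that count. Treating the aggregation as an unweighted vote is an assumption; the package may weight the subgroups differently.

```python
from collections import Counter

# Targets of the six subgroups listed by show_decision(X[5])
covering_targets = [0, 0, 0, 0, 1, 1]

# Unweighted majority vote (an assumption about the aggregation rule)
votes = Counter(covering_targets)
predicted = votes.most_common(1)[0][0]

print(votes[0], votes[1])  # 4 2
print(predicted)           # 0
```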

Example of the regressor:

from randomsubgroups import RandomSubgroupRegressor
from sklearn import datasets

data = datasets.load_diabetes()
y = data.target
X = data.data

sg_regressor = RandomSubgroupRegressor(n_estimators=30)

sg_regressor.fit(X, y)
>>> regressors_df = sg_regressor.show_models()

Target: 98.35; Model: Col2<-0.03 AND Col5<0.05
Target: 104.24844720496894; Model: Col6>=-0.01 AND Col7<-0.00
Target: 107.71686746987952; Model: Col6>=-0.02 AND Col7<-0.00
Target: 109.73033707865169; Model: Col3<0.06 AND Col8<-0.01
Target: 191.60625; Model: Col2>=0.00 AND Col3>=-0.03
Target: 192.41304347826087; Model: Col3>=-0.02 AND Col7>=0.03
Target: 199.28795811518324; Model: Col8>=0.01
Target: 202.17094017094018; Model: Col3>=0.04 AND Col4>=-0.05
Target: 206.8709677419355; Model: Col2>=0.02 AND Col7>=-0.04
Target: 211.1290322580645; Model: Col2>=-0.02 AND Col8>=0.01
Target: 212.44036697247705; Model: Col4>=-0.01 AND Col8>=0.03
Target: 212.8655462184874; Model: Col7>=-0.00 AND Col8>=0.02
Target: 213.66935483870967; Model: Col7>=-0.01 AND Col8>=0.03
Target: 216.0079365079365; Model: Col3>=-0.02 AND Col6<-0.01
Target: 218.92233009708738; Model: Col0>=-0.03 AND Col8>=0.03
Target: 219.56435643564356; Model: Col2>=0.02 AND Col7>=-0.00
Target: 220.40740740740742; Model: Col2>=0.02 AND Col6<-0.02
Target: 220.46153846153845; Model: Col3>=-0.04 AND Col8>=0.03
Target: 222.0222222222222; Model: Col8>=0.02 AND Col9>=0.01
Target: 222.92592592592592; Model: Col2>=0.00 AND Col3>=0.03
Target: 224.375; Model: Col6<0.00 AND Col9>=0.02
Target: 224.3939393939394; Model: Col2>=0.02 AND Col7>=-0.00
Target: 224.70833333333334; Model: Col3>=0.02 AND Col8>=0.01
Target: 226.5257731958763; Model: Col2>=-0.00 AND Col8>=0.02
Target: 233.0185185185185; Model: Col2>=0.02 AND Col7>=-0.00
Target: 239.25882352941176; Model: Col2>=0.00 AND Col9>=0.02
Target: 243.9375; Model: Col2>=0.03 AND Col3>=0.01
Target: 247.63492063492063; Model: Col2>=-0.01 AND Col9>=0.05
Target: 248.56756756756758; Model: Col2>=0.03 AND Col9>=0.02
Target: 260.29411764705884; Model: Col2>=0.06 AND Col8>=-0.01
>>> sg_regressor.show_decision(X[0])

The predicted value is: 220.93552644658507
From a total of 12 estimators.

The subgroups used in the prediction are:
Col2>=0.00 AND Col3>=-0.03 ---> 191.60625
Col8>=0.01 ---> 199.28795811518324
Col2>=0.02 AND Col7>=-0.04 ---> 206.8709677419355
Col2>=-0.02 AND Col8>=0.01 ---> 211.1290322580645
Col3>=-0.02 AND Col6<-0.01 ---> 216.0079365079365
Col2>=0.02 AND Col7>=-0.00 ---> 219.56435643564356
Col2>=0.02 AND Col6<-0.02 ---> 220.40740740740742
Col2>=0.02 AND Col7>=-0.00 ---> 224.3939393939394
Col3>=0.02 AND Col8>=0.01 ---> 224.70833333333334
Col2>=0.02 AND Col7>=-0.00 ---> 233.0185185185185
Col2>=0.03 AND Col3>=0.01 ---> 243.9375
Col2>=0.06 AND Col8>=-0.01 ---> 260.29411764705884

The targets of the subgroups used in the prediction have the following distribution:

[Figure: distribution of the targets of the subgroups used in the prediction]
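For the regressor, the predicted value printed for X[0] coincides with the arithmetic mean of the targets of the 12 covering subgroups. The short check below uses only the numbers shown in the output; whether the package always uses an unweighted mean is not confirmed by the source.

```python
# Targets of the 12 subgroups listed by show_decision(X[0])
targets = [
    191.60625,
    199.28795811518324,
    206.8709677419355,
    211.1290322580645,
    216.0079365079365,
    219.56435643564356,
    220.40740740740742,
    224.3939393939394,
    224.70833333333334,
    233.0185185185185,
    243.9375,
    260.29411764705884,
]

# Unweighted mean of the covering subgroups' targets
prediction = sum(targets) / len(targets)

print(len(targets))          # 12
print(round(prediction, 6))  # 220.935526
```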

random-subgroups's People

Contributors

rebelosa


random-subgroups's Issues

Reproducibility of Regressor

Hi,

I am presently looking at the reproducibility of the regressor.

Using the example given in the inline documentation, I unfortunately get different results with every prediction:

from contextlib import contextmanager
import sys, os

@contextmanager
def suppress_stdout():
    """
    https://stackoverflow.com/a/25061573
    :return:
    """
    with open(os.devnull, "w") as devnull:
        old_stdout = sys.stdout
        sys.stdout = devnull
        try:
            yield
        finally:
            sys.stdout = old_stdout


from randomsubgroups import RandomSubgroupRegressor
from sklearn.datasets import make_regression

X, y = make_regression(n_features=4, n_informative=2, random_state=0, shuffle=False)
for i in range(2):
    regr = RandomSubgroupRegressor(max_depth=2, random_state=0)
    regr.fit(X, y)
    print('--------------')
    print(regr.predict([[0, 0, 0, 0]]))
    with suppress_stdout():
        models = regr.show_models()
    print(models.iloc[0])

result:

--------------
[25.27883316]
Target                   -48.616693
Model     Col1<-0.83 AND Col3<-0.39
Name: 0, dtype: object
--------------
[29.77902512]
Target                   -54.594158
Model     Col0<-0.71 AND Col2<-1.15
Name: 0, dtype: object

My expectation would have been that with every (identical) call of the prediction, I would get the same prediction and submodels.
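The expectation can be turned into a small, estimator-agnostic check: fit several fresh estimators with the same seed and compare their predictions. The sketch below is an illustration only. `ToyRegressor` and `is_reproducible` are hypothetical names, and the toy estimator is a stand-in so the example runs without the package installed.

```python
import random


class ToyRegressor:
    """Stand-in estimator (not part of random-subgroups).

    It predicts the mean of a random half of y, so its output
    depends on how the random generator is seeded.
    """

    def __init__(self, random_state=None):
        self.random_state = random_state

    def fit(self, X, y):
        # random.Random(None) is seeded from system entropy
        rng = random.Random(self.random_state)
        sample = rng.sample(list(y), k=max(1, len(y) // 2))
        self.mean_ = sum(sample) / len(sample)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


def is_reproducible(make_estimator, X, y, X_query, runs=3):
    """Fit `runs` fresh estimators and check that all predictions agree."""
    results = [make_estimator().fit(X, y).predict(X_query) for _ in range(runs)]
    return all(r == results[0] for r in results)


X = [[i] for i in range(20)]
y = [float(i * i) for i in range(20)]

# Same seed on every run: predictions should be identical
print(is_reproducible(lambda: ToyRegressor(random_state=0), X, y, [[0]]))  # True

# No seed: predictions will usually differ between runs
print(is_reproducible(lambda: ToyRegressor(), X, y, [[0]]))
```

With the package installed, the same helper could be called with `lambda: RandomSubgroupRegressor(max_depth=2, random_state=0)` and the `make_regression` data from the snippet above to show the behaviour described in this issue.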

Deliver result back as list or dataframe in sg_classifier.show_models()

Hi,

thanks for bringing this library to the community.

Presently I am looking at sg_classifier.show_models().

As in pysubgroup, I would like to get the results of the discovery back in a structure that I can work with in my own code.

Presently, only a print to the console is done in show_models().

It would be nice and convenient if I could get back the results in a structured way (like in the package pysubgroup: a list or a pandas.DataFrame).
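Until such a return value is available, one possible workaround is to capture the printed text and parse it. The sketch below assumes the "Target: ...; Model: ..." line format shown in the introduction. `capture_stdout`, `parse_models` and `fake_show_models` are hypothetical helpers, with the last one standing in for a method that only prints.

```python
import contextlib
import io


def capture_stdout(func, *args, **kwargs):
    """Call func and return everything it printed to stdout as a string."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        func(*args, **kwargs)
    return buffer.getvalue()


def parse_models(text):
    """Parse lines of the form 'Target: <t>; Model: <m>' into dicts."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("Target:") or "; Model:" not in line:
            continue  # skip blank lines and anything else that was printed
        target_part, model_part = line.split("; Model:", 1)
        rows.append({
            "Target": target_part[len("Target:"):].strip(),
            "Model": model_part.strip(),
        })
    return rows


def fake_show_models():
    # Stand-in for a show_models() that only prints to the console
    print("Target: 0; Model: Col22>=124.16")
    print("Target: 1; Model: Col7<0.05")


rows = parse_models(capture_stdout(fake_show_models))
print(rows)
```

The resulting list of dicts can be passed directly to `pandas.DataFrame(rows)` if a DataFrame is preferred. Targets are kept as strings here; converting them to numbers is left to the caller.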

Thanks for considering this.
