Code Monkey home page Code Monkey logo

random-subgroups's Introduction

Random Subgroups python package

Making predictions with subgroups

random-subgroups is a machine learning package compatible with scikit-learn.

It uses ensembles of weak estimators, as in random forests, for classification and regression tasks. The main difference from the random forests algorithm is that it uses subgroups as estimators.

The subgroup discovery implementation of this package is made on top of the pysubgroup package. It uses many of the features from pysubgroup, but it also extends it with different quality measures (more suitable for prediction) and different search strategies.

Installation:

pip install random-subgroups

Example of the classifier:

from randomsubgroups import RandomSubgroupClassifier
from sklearn import datasets

data = datasets.load_breast_cancer()
y = data.target
X = data.data

sg_classifier = RandomSubgroupClassifier(n_estimators=30)

sg_classifier.fit(X, y)
>>> classifiers_df = sg_classifier.show_models()

Target: 0; Model: Col26>=0.27 AND Col7>=0.06
Target: 0; Model: Col3>=435.17 AND Col6>=0.11
Target: 0; Model: Col20>=18.22 AND Col3>=806.60
Target: 0; Model: Col16>=0.02 AND Col20>=15.87
Target: 0; Model: Col17>=0.01 AND Col20>=17.91
Target: 0; Model: Col20>=17.50 AND Col22>=118.60
Target: 0; Model: Col23>=1004.60 AND Col7>=0.05
Target: 0; Model: Col0>=15.33 AND Col13>=21.73
Target: 0; Model: Col22>=124.16
Target: 0; Model: Col13>=18.88 AND Col3>=716.60
Target: 0; Model: Col12>=1.39 AND Col22>=123.11
Target: 0; Model: Col23>=1030.0 AND Col6>=0.05
Target: 0; Model: Col27>=0.15 AND Col3>=358.90
Target: 0; Model: Col15>=0.01 AND Col23>=883.99
Target: 0; Model: Col0>=10.98 AND Col27>=0.16
Target: 1; Model: Col22<105.0 AND Col27<0.16
Target: 1; Model: Col20<15.53 AND Col26<0.35
Target: 1; Model: Col7<0.05
Target: 1; Model: Col13<42.86 AND Col27<0.12
Target: 1; Model: Col23<771.47 AND Col27<0.12
Target: 1; Model: Col20<17.79 AND Col25<0.20
Target: 1; Model: Col20<15.49 AND Col27<0.15
Target: 1; Model: Col13<29.40 AND Col1<19.98
Target: 1; Model: Col20<15.75 AND Col6<0.08
Target: 1; Model: Col20<15.63 AND Col27<0.20
Target: 1; Model: Col22<104.79 AND Col29<0.10
Target: 1; Model: Col20<14.69 AND Col6<0.12
Target: 1; Model: Col27<0.12 AND Col3<693.70
Target: 1; Model: Col20<16.00 AND Col6<0.09
Target: 1; Model: Col27<0.11 AND Col7<0.06
>>> sg_classifier.show_decision(X[5])

The predicted value is: 0
From a total of 6 estimators.

The subgroups used in the prediction are:

 Predicting target 0
Col0>=10.98 AND Col27>=0.16 ---> 0
Col26>=0.27 AND Col7>=0.06 ---> 0
Col27>=0.15 AND Col3>=358.90 ---> 0
Col3>=435.17 AND Col6>=0.11 ---> 0

 Predicting target 1
Col13<29.40 AND Col1<19.98 ---> 1
Col20<15.63 AND Col27<0.20 ---> 1

The targets of the subgroups used in the prediction have the following distribution:

output

Example of the regressor:

from randomsubgroups import RandomSubgroupRegressor
from sklearn import datasets

data = datasets.load_diabetes()
y = data.target
X = data.data

sg_regressor = RandomSubgroupRegressor(n_estimators=30)

sg_regressor.fit(X, y)
>>> regressors_df = sg_regressor.show_models()

Target: 98.35; Model: Col2<-0.03 AND Col5<0.05
Target: 104.24844720496894; Model: Col6>=-0.01 AND Col7<-0.00
Target: 107.71686746987952; Model: Col6>=-0.02 AND Col7<-0.00
Target: 109.73033707865169; Model: Col3<0.06 AND Col8<-0.01
Target: 191.60625; Model: Col2>=0.00 AND Col3>=-0.03
Target: 192.41304347826087; Model: Col3>=-0.02 AND Col7>=0.03
Target: 199.28795811518324; Model: Col8>=0.01
Target: 202.17094017094018; Model: Col3>=0.04 AND Col4>=-0.05
Target: 206.8709677419355; Model: Col2>=0.02 AND Col7>=-0.04
Target: 211.1290322580645; Model: Col2>=-0.02 AND Col8>=0.01
Target: 212.44036697247705; Model: Col4>=-0.01 AND Col8>=0.03
Target: 212.8655462184874; Model: Col7>=-0.00 AND Col8>=0.02
Target: 213.66935483870967; Model: Col7>=-0.01 AND Col8>=0.03
Target: 216.0079365079365; Model: Col3>=-0.02 AND Col6<-0.01
Target: 218.92233009708738; Model: Col0>=-0.03 AND Col8>=0.03
Target: 219.56435643564356; Model: Col2>=0.02 AND Col7>=-0.00
Target: 220.40740740740742; Model: Col2>=0.02 AND Col6<-0.02
Target: 220.46153846153845; Model: Col3>=-0.04 AND Col8>=0.03
Target: 222.0222222222222; Model: Col8>=0.02 AND Col9>=0.01
Target: 222.92592592592592; Model: Col2>=0.00 AND Col3>=0.03
Target: 224.375; Model: Col6<0.00 AND Col9>=0.02
Target: 224.3939393939394; Model: Col2>=0.02 AND Col7>=-0.00
Target: 224.70833333333334; Model: Col3>=0.02 AND Col8>=0.01
Target: 226.5257731958763; Model: Col2>=-0.00 AND Col8>=0.02
Target: 233.0185185185185; Model: Col2>=0.02 AND Col7>=-0.00
Target: 239.25882352941176; Model: Col2>=0.00 AND Col9>=0.02
Target: 243.9375; Model: Col2>=0.03 AND Col3>=0.01
Target: 247.63492063492063; Model: Col2>=-0.01 AND Col9>=0.05
Target: 248.56756756756758; Model: Col2>=0.03 AND Col9>=0.02
Target: 260.29411764705884; Model: Col2>=0.06 AND Col8>=-0.01
>>> sg_regressor.show_decision(X[0])

The predicted value is: 220.93552644658507
From a total of 12 estimators.

The subgroups used in the prediction are:
Col2>=0.00 AND Col3>=-0.03 ---> 191.60625
Col8>=0.01 ---> 199.28795811518324
Col2>=0.02 AND Col7>=-0.04 ---> 206.8709677419355
Col2>=-0.02 AND Col8>=0.01 ---> 211.1290322580645
Col3>=-0.02 AND Col6<-0.01 ---> 216.0079365079365
Col2>=0.02 AND Col7>=-0.00 ---> 219.56435643564356
Col2>=0.02 AND Col6<-0.02 ---> 220.40740740740742
Col2>=0.02 AND Col7>=-0.00 ---> 224.3939393939394
Col3>=0.02 AND Col8>=0.01 ---> 224.70833333333334
Col2>=0.02 AND Col7>=-0.00 ---> 233.0185185185185
Col2>=0.03 AND Col3>=0.01 ---> 243.9375
Col2>=0.06 AND Col8>=-0.01 ---> 260.29411764705884

The targets of the subgroups used in the prediction have the following distribution:

output

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.