The light-ml from marcosoares-92

LIGHT-ML

This library bundles preprocessing steps that are common to data science and machine learning projects in order to simplify and accelerate model development. Data scientists can therefore spend less time coding preprocessing methods and scripts, and more time engineering new features and tuning models.

Example with Customized Scikit-Learn Preprocessors

The main purpose here is to show how the objects provided by the light_ml.preprocessors module can be readily used for feature selection. More specifically, we will apply the Boruta feature selection technique.

First, let's import some common packages. We will use the iris dataset to show how the library can be used in this context.

import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

Then we build the dataset and perform a stratified train-test split:

data = load_iris()
X = pd.DataFrame(data["data"], columns=data["feature_names"])
y = pd.Series(data["target"], name="target")

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, test_size=0.2, random_state=99)
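The `stratify=y` argument asks scikit-learn to preserve the class proportions in both splits. Since iris has 50 samples per class, a quick check (a standalone sketch, not part of light-ml) should show 40 training and 10 test samples per class:

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Rebuild the same split as above so this snippet runs on its own
data = load_iris()
X = pd.DataFrame(data["data"], columns=data["feature_names"])
y = pd.Series(data["target"], name="target")
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.2, random_state=99
)

# With stratify=y, each split keeps the original class proportions (1/3 each)
counts = pd.DataFrame({"train": y_train.value_counts(), "test": y_test.value_counts()})
print(counts.sort_index())
print(X_train.shape, X_test.shape)  # (120, 4) (30, 4)
```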

Next, we import and instantiate our feature selection object:

from light_ml.preprocessors import BorutaFeatureSelector

bfs = BorutaFeatureSelector(trials=50, percentile=0.01, keep_only_tail=False)
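The summary further down reports cutoffs of 17 and 33 hits for 50 trials. One plausible reading, which we assume here rather than take from the light-ml source, is the usual Boruta test: if a feature is no better than the best random "shadow" feature, its hit count follows a Binomial(trials, 0.5) distribution, and `percentile` sets the probability mass of each tail. A sketch with SciPy:

```python
from scipy.stats import binom

# Assumption: hits under the null hypothesis ~ Binomial(trials, 0.5),
# with `percentile` used as the probability mass of each tail.
trials = 50
percentile = 0.01

lower = int(binom.ppf(percentile, trials, 0.5))      # hits at or below -> drop
upper = int(binom.ppf(1 - percentile, trials, 0.5))  # hits at or above -> keep
print(lower, upper)  # 17 33
```

These values match the cutoffs printed by `bfs.summary()`, which is consistent with (but does not prove) this interpretation of `trials` and `percentile`.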

The final step is to fit the transformer on the training data and use some of its methods and properties:

bfs.fit(X_train, y_train)
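For readers new to Boruta, the fitting step can be pictured as follows. This is a minimal sketch of the general technique, not light-ml's implementation; the `RandomForestClassifier` model, its settings, the full-dataset input, and the helper name `boruta_hits` are all our assumptions. In each trial every column is shuffled to create a "shadow" copy, a forest is fitted on real plus shadow columns, and a real feature scores a hit when its importance beats the best shadow importance.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier


def boruta_hits(X: pd.DataFrame, y: pd.Series, trials: int = 20, seed: int = 0) -> dict:
    """Count how often each feature beats the best shadow feature (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    hits = {col: 0 for col in X.columns}
    n_real = X.shape[1]
    for trial in range(trials):
        # Shadow features: each column shuffled independently, breaking its link to y
        shadow = pd.DataFrame(
            {f"shadow_{c}": rng.permutation(X[c].to_numpy()) for c in X.columns},
            index=X.index,
        )
        X_ext = pd.concat([X, shadow], axis=1)

        model = RandomForestClassifier(n_estimators=50, random_state=seed + trial)
        model.fit(X_ext, y)

        # Importances follow the column order of X_ext: real columns first
        importances = model.feature_importances_
        best_shadow = importances[n_real:].max()
        for col, imp in zip(X.columns, importances[:n_real]):
            if imp > best_shadow:
                hits[col] += 1
    return hits


data = load_iris()
X = pd.DataFrame(data["data"], columns=data["feature_names"])
y = pd.Series(data["target"], name="target")
hits = boruta_hits(X, y, trials=20)
print(hits)
```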

Summary of the feature selection procedure:

bfs.summary()

**************************************************
*                    SUMMARY                     *
**************************************************

>> Features to drop (<= 17):
	* sepal length (cm)    [hits: 6]
	* sepal width (cm)     [hits: 0]

>> Features to tentatively keep (17 < hits < 33):
	

>> Features to keep (>= 33):
	* petal length (cm)    [hits: 49]
	* petal width (cm)     [hits: 50]
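Given the hit counts and the two cutoffs, the summary amounts to sorting features into three groups. The sketch below (the helper name `group_features` is hypothetical) reproduces that grouping from the hit counts printed above:

```python
def group_features(hits: dict, lower: int, upper: int) -> dict:
    """Split features into drop / tentative / keep groups by hit count (hypothetical helper)."""
    groups = {"drop": [], "tentative": [], "keep": []}
    for feature, n in hits.items():
        if n <= lower:
            groups["drop"].append(feature)
        elif n >= upper:
            groups["keep"].append(feature)
        else:
            groups["tentative"].append(feature)
    return groups


# Hit counts copied from the summary output above (50 trials)
hits = {
    "sepal length (cm)": 6,
    "sepal width (cm)": 0,
    "petal length (cm)": 49,
    "petal width (cm)": 50,
}
groups = group_features(hits, lower=17, upper=33)
print(groups["keep"])  # ['petal length (cm)', 'petal width (cm)']
```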

Selected Features:

bfs.selected_features

['petal length (cm)', 'petal width (cm)']

Visualization of the decision regions:

bfs.show_decision_regions(show_features=True)
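We do not know which classifier `show_decision_regions` uses internally. As an illustration only, the sketch below assumes a `DecisionTreeClassifier` trained on the two selected petal features and shades its predicted class over a grid, which is the standard way decision-region plots are drawn:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

data = load_iris()
X = pd.DataFrame(data["data"], columns=data["feature_names"])
y = data["target"]
features = ["petal length (cm)", "petal width (cm)"]

# Assumption: a shallow decision tree stands in for whatever model light-ml uses
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X[features].to_numpy(), y)

# Evaluate the classifier on a dense grid spanning both features
x_min, x_max = X[features[0]].min() - 0.5, X[features[0]].max() + 0.5
y_min, y_max = X[features[1]].min() - 0.5, X[features[1]].max() + 0.5
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 300), np.linspace(y_min, y_max, 300))
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

fig, ax = plt.subplots(figsize=(6, 4))
ax.contourf(xx, yy, Z, alpha=0.3)  # shaded decision regions
ax.scatter(X[features[0]], X[features[1]], c=y, edgecolor="k", s=20)  # training points
ax.set_xlabel(features[0])
ax.set_ylabel(features[1])
fig.savefig("decision_regions.png")
```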

Transforming our dataset:

bfs.transform(X_train)

	petal length (cm)	petal width (cm)
26	1.6	0.4
8	1.4	0.2
133	5.1	1.5
101	5.1	1.9
15	1.5	0.4
...	...	...
130	6.1	1.9
84	4.5	1.5
17	1.4	0.3
56	4.7	1.6
78	4.5	1.5
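The transformed frame keeps the original row index and only the selected columns, so it appears equivalent to indexing `X_train` with `bfs.selected_features`. Since the selector is described as a customized scikit-learn preprocessor, it should presumably also fit into a `Pipeline`. The sketch below does not call light-ml; it uses a hypothetical stand-in, `ColumnSubsetSelector`, that follows the same `fit`/`transform` contract with a fixed column list:

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline


class ColumnSubsetSelector(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in for a fitted selector: keeps a fixed list of columns."""

    def __init__(self, columns):
        self.columns = columns

    def fit(self, X, y=None):
        # Store the selection as a fitted attribute, mirroring `selected_features`
        self.selected_features_ = list(self.columns)
        return self

    def transform(self, X):
        # Same index, only the selected columns
        return X[self.selected_features_]


data = load_iris()
X = pd.DataFrame(data["data"], columns=data["feature_names"])
y = pd.Series(data["target"], name="target")
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.2, random_state=99
)

pipe = Pipeline([
    ("select", ColumnSubsetSelector(columns=["petal length (cm)", "petal width (cm)"])),
    ("model", LogisticRegression(max_iter=1000)),
])
pipe.fit(X_train, y_train)
print(pipe.named_steps["select"].transform(X_train).shape)  # (120, 2)
print(f"test accuracy: {pipe.score(X_test, y_test):.3f}")
```

Putting the selector inside the pipeline means it is fitted only on the training fold, which avoids leaking test information into the feature selection step.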
