LIGHT-ML

LIGHT-ML is a library that packages steps common to most data science and machine learning projects, in order to simplify and speed up model development. Data scientists can then spend less time coding preprocessing methods and scripts, and more time creating new features and tuning models.

Example with Customized Scikit-Learn Preprocessors

The main purpose here is to show how the objects made available by the module light_ml.preprocessors can be readily used in feature selection. More specifically, we will apply the Boruta feature selection technique.
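As background, Boruta compares each real feature against shuffled "shadow" copies of the features and counts how often the real feature is more important than the best shadow. The block below is a minimal sketch of that idea, not light_ml's implementation: the function name `boruta_hits`, its parameters, and the choice of a random forest as the importance model are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier


def boruta_hits(X, y, trials=20, random_state=0):
    """Count, per feature, how often it beats the best shuffled shadow feature.

    Hypothetical helper for illustration; not part of light_ml.
    """
    rng = np.random.default_rng(random_state)
    hits = pd.Series(0, index=X.columns)
    for _ in range(trials):
        # Shadow features: every column shuffled independently, which breaks
        # any link to the target while keeping each column's distribution.
        shadows = pd.DataFrame(
            {f"shadow_{c}": rng.permutation(X[c].to_numpy()) for c in X.columns},
            index=X.index,
        )
        X_aug = pd.concat([X, shadows], axis=1)

        model = RandomForestClassifier(
            n_estimators=50, random_state=int(rng.integers(1_000_000))
        )
        model.fit(X_aug, y)

        importances = pd.Series(model.feature_importances_, index=X_aug.columns)
        best_shadow = importances[shadows.columns].max()
        # A "hit" is a trial in which the real feature outranks every shadow.
        hits += (importances[X.columns] > best_shadow).astype(int)
    return hits


data = load_iris()
X = pd.DataFrame(data["data"], columns=data["feature_names"])
y = pd.Series(data["target"], name="target")

hits = boruta_hits(X, y, trials=20)
print(hits)
```

Features that collect many hits are candidates to keep, and features that rarely beat the shadows are candidates to drop. The exact counts depend on the random seed and on the importance model.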

First, let's import some common packages and load the iris dataset to show how the library can be used in this context.

import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

Then we can build the dataset and subsequently perform train-test split:

data = load_iris()
X = pd.DataFrame(data["data"], columns=data["feature_names"])
y = pd.Series(data["target"], name="target")

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, test_size=0.2, random_state=99)

Next, we can import and instantiate our feature selection object:

from light_ml.preprocessors import BorutaFeatureSelector

bfs = BorutaFeatureSelector(trials=50, percentile=0.01, keep_only_tail=False)
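The summary printed further down reports thresholds of 17 and 33 hits for trials=50 and percentile=0.01. Those numbers are consistent with the 1% and 99% quantiles of a Binomial(50, 0.5) distribution, that is, the hit counts expected if a feature beat the shadows purely by chance. This is an inference from the printed output, not documented behavior of light_ml. The helper names below are hypothetical.

```python
from math import comb


def binomial_cdf(k, n, p=0.5):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def binomial_quantile(q, n, p=0.5):
    """Smallest k such that P(X <= k) >= q."""
    for k in range(n + 1):
        if binomial_cdf(k, n, p) >= q:
            return k
    return n


trials, percentile = 50, 0.01

# Assumed rule: hits at or below the lower quantile mean "drop",
# hits at or above the upper quantile mean "keep".
drop_threshold = binomial_quantile(percentile, trials)
keep_threshold = binomial_quantile(1 - percentile, trials)

print(drop_threshold, keep_threshold)  # 17 33
```

Under this reading, a smaller percentile widens the tentative zone between the two thresholds, and more trials give the test more power to separate features.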

The final step is to fit our transformer and use some of its methods and properties:

bfs.fit(X_train, y_train)
  • Summary of the feature selection procedure:
bfs.summary()
**************************************************
*                    SUMMARY                     *
**************************************************

>> Features to drop (<= 17):
	* sepal length (cm)    [hits: 6]
	* sepal width (cm)     [hits: 0]

>> Features to tentatively keep (17 < hits < 33):
	

>> Features to keep (>= 33):
	* petal length (cm)    [hits: 49]
	* petal width (cm)     [hits: 50]

  • Selected Features:
bfs.selected_features
['petal length (cm)', 'petal width (cm)']
  • Visualization of the decision regions:
bfs.show_decision_regions(show_features=True)

[Figure: decision regions plot]

  • Transforming our dataset:
bfs.transform(X_train)
     petal length (cm)  petal width (cm)
26                 1.6               0.4
8                  1.4               0.2
133                5.1               1.5
101                5.1               1.9
15                 1.5               0.4
...                ...               ...
130                6.1               1.9
84                 4.5               1.5
17                 1.4               0.3
56                 4.7               1.6
78                 4.5               1.5
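Because the selector follows the fit/transform convention, a reduced dataset like the one above can feed a downstream scikit-learn estimator. The sketch below uses a hypothetical stand-in, `ColumnKeeper`, instead of BorutaFeatureSelector so that it runs without light_ml installed. Whether BorutaFeatureSelector can be placed directly inside a Pipeline is an assumption that is not confirmed here.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline


class ColumnKeeper(TransformerMixin, BaseEstimator):
    """Hypothetical stand-in: keeps a fixed list of DataFrame columns."""

    def __init__(self, columns):
        self.columns = columns

    def fit(self, X, y=None):
        return self  # nothing to learn; the columns are given up front

    def transform(self, X):
        return X[list(self.columns)]


data = load_iris()
X = pd.DataFrame(data["data"], columns=data["feature_names"])
y = pd.Series(data["target"], name="target")
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.2, random_state=99
)

# The two features reported as selected in the walkthrough above.
selected = ["petal length (cm)", "petal width (cm)"]
pipeline = Pipeline([
    ("select", ColumnKeeper(selected)),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipeline.fit(X_train, y_train)
accuracy = pipeline.score(X_test, y_test)
print(f"Test accuracy with 2 features: {accuracy:.2f}")
```

Wrapping selection and model in one pipeline keeps the column choice tied to the training data, so the same columns are applied at prediction time.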

Contributors

gfluz94
