This library bundles steps that recur in most data science and machine learning projects, with the aim of making model development faster and easier. Data scientists can then spend less time writing preprocessing code and more time creating new features and tuning models.
This example shows how the objects provided by the module light_ml.preprocessors
can be used for feature selection. More specifically, we apply the Boruta feature selection technique.
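For readers unfamiliar with Boruta, the core idea can be sketched in a few lines: each trial adds shuffled "shadow" copies of the features, fits a tree ensemble, and records a hit for every real feature whose importance exceeds that of the best shadow. The code below is a minimal illustration under stated assumptions, not the implementation in light_ml.preprocessors; the function name `count_boruta_hits`, the choice of `RandomForestClassifier`, and the default of 10 trials are all hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier


def count_boruta_hits(X, y, trials=10, seed=0):
    """Count, per feature, how often it beats the best shadow feature.

    Hypothetical helper for illustration only; not part of light_ml.
    """
    rng = np.random.default_rng(seed)
    hits = pd.Series(0, index=X.columns)
    for trial in range(trials):
        # Shadow features: each column shuffled independently, which breaks
        # any relationship with the target while keeping its distribution.
        shadows = pd.DataFrame(
            {f"shadow_{col}": rng.permutation(X[col].to_numpy()) for col in X.columns},
            index=X.index,
        )
        X_aug = pd.concat([X, shadows], axis=1)
        model = RandomForestClassifier(n_estimators=50, random_state=trial)
        model.fit(X_aug, y)
        importances = pd.Series(model.feature_importances_, index=X_aug.columns)
        # A real feature scores a hit when it is more important than every shadow.
        best_shadow = importances[shadows.columns].max()
        hits += (importances[X.columns] > best_shadow).astype(int)
    return hits


data = load_iris()
X = pd.DataFrame(data["data"], columns=data["feature_names"])
y = pd.Series(data["target"], name="target")
hits = count_boruta_hits(X, y, trials=10)
print(hits)
```

Features that collect many hits are candidates to keep, and features that rarely beat the shadows are candidates to drop.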
First, let's import some common packages and load the iris
dataset to show how the library can be used in this context.
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
Then we build the dataset and perform a train-test split:
data = load_iris()
X = pd.DataFrame(data["data"], columns=data["feature_names"])
y = pd.Series(data["target"], name="target")
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, test_size=0.2, random_state=99)
Next, we import and instantiate the feature selection object:
from light_ml.preprocessors import BorutaFeatureSelector
bfs = BorutaFeatureSelector(trials=50, percentile=0.01, keep_only_tail=False)
The last step is to fit the transformer and explore some of its methods and properties:
bfs.fit(X_train, y_train)
- Summary of the feature selection procedure:
bfs.summary()
**************************************************
* SUMMARY *
**************************************************
>> Features to drop (<= 17):
   * sepal length (cm) [hits: 6]
   * sepal width (cm) [hits: 0]
>> Features to tentatively keep (17 < hits < 33):
>> Features to drop (>= 33):
   * petal length (cm) [hits: 49]
   * petal width (cm) [hits: 50]
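The cut-offs of 17 and 33 in the summary are consistent with the lower and upper 1% quantiles of a binomial distribution with 50 trials and success probability 0.5, which is a common way to decide whether a hit count differs from chance in Boruta. The sketch below computes these quantiles with the standard library only. That the library derives its thresholds this way is an assumption, and `binomial_quantile` is a hypothetical helper.

```python
from math import comb


def binomial_quantile(q, n, p=0.5):
    """Smallest k such that P(X <= k) >= q for X ~ Binomial(n, p)."""
    cumulative = 0.0
    for k in range(n + 1):
        cumulative += comb(n, k) * p**k * (1 - p) ** (n - k)
        if cumulative >= q:
            return k
    return n


# Values taken from the BorutaFeatureSelector call in this example.
trials, percentile = 50, 0.01
lower = binomial_quantile(percentile, trials)
upper = binomial_quantile(1 - percentile, trials)
print(lower, upper)
```

With these inputs the two quantiles come out as 17 and 33, matching the boundaries printed in the summary.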
- Selected features (here, the two features with at least 33 hits in the summary above):
bfs.selected_features
['petal length (cm)', 'petal width (cm)']
- Visualization of the decision regions:
bfs.show_decision_regions(show_features=True)
- Transforming our dataset:
bfs.transform(X_train)
| | petal length (cm) | petal width (cm) |
|---|---|---|
| 26 | 1.6 | 0.4 |
| 8 | 1.4 | 0.2 |
| 133 | 5.1 | 1.5 |
| 101 | 5.1 | 1.9 |
| 15 | 1.5 | 0.4 |
| ... | ... | ... |
| 130 | 6.1 | 1.9 |
| 84 | 4.5 | 1.5 |
| 17 | 1.4 | 0.3 |
| 56 | 4.7 | 1.6 |
| 78 | 4.5 | 1.5 |
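As a sketch of a possible next step, the same two columns can be kept in the test split and used to train a model. The example below uses plain pandas column indexing with the selected feature names listed above instead of the library's own API, and the choice of `LogisticRegression` is an assumption made only for illustration.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Rebuild the same split as in the example above.
data = load_iris()
X = pd.DataFrame(data["data"], columns=data["feature_names"])
y = pd.Series(data["target"], name="target")
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.2, random_state=99
)

# Feature names reported by bfs.selected_features in this example.
selected = ["petal length (cm)", "petal width (cm)"]
X_train_sel = X_train[selected]
X_test_sel = X_test[selected]

# Fit on the reduced training set and score on the reduced test set.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train_sel, y_train)
accuracy = clf.score(X_test_sel, y_test)
print(f"Test accuracy on selected features: {accuracy:.3f}")
```

Applying the selection to the test split with the columns chosen on the training split keeps the test data out of the feature selection step.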