pandas2sklearn

An integration of pandas DataFrames with scikit-learn.

The module contains:

  • DataSet, a wrapper for handling DataFrames in the style of a scikit-learn dataset.
  • DataSetTransformer, a transformation mechanism that can be easily integrated into scikit-learn pipelines.

Installation

The module can be easily installed with pip:

> pip install pandas2sklearn

Tests

The module includes basic tests of the provided functionality. Run them with:

> py.test

Usage

The module contains two classes:

DataSet

DataSet is a wrapper around a pandas DataFrame that you can use to select:

  • id
  • features
  • target

For example, suppose we have a DataFrame with the following columns:

df.columns = id, FN1, FN2, FN3, FN4, FN5, FC1, FC2, FC3, FC4, FC5, FC6, target

from pandas_sklearn import DataSet

dataset = DataSet(df, target_column='target', id_column='id')

dataset.has_target() == True
dataset.has_id() == True
dataset.target == df['target']
dataset.id == df['id']
dataset.target_names == ['FN1', 'FN2', 'FN3', 'FN4', 'FN5', 'FC1', 'FC2', 'FC3', 'FC4', 'FC5', 'FC6']
dataset.data == df[['FN1', 'FN2', 'FN3', 'FN4', 'FN5', 'FC1', 'FC2', 'FC3', 'FC4', 'FC5', 'FC6']]


# removing features that are not needed: FN4, FN5, FC1, FC5, FC6
dataset.set_feature_names(usage=DataSet.EXCLUDE, columns=['FN4', 'FN5', 'FC1', 'FC5', 'FC6'])
dataset.target_names == ['FN1', 'FN2', 'FN3', 'FC2', 'FC3', 'FC4']

# converting the dataset to dictionary
dataset.to_dict() == [
    {'FN1': 12, 'FN2': 23, 'FC2': 'coffee', 'FC3': 'xbox one', 'FC4': 'inch'},
    ...
]
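
The selection that DataSet performs can be approximated with plain pandas. The following is a minimal sketch and not the library's implementation: the sample values and the helper `split_frame` are hypothetical, and only the column roles (id, features, target) and the exclude behaviour are taken from the example above.

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the example above.
df = pd.DataFrame({
    'id': [1, 2],
    'FN1': [12, 7],
    'FN2': [23, 5],
    'FC2': ['coffee', 'tea'],
    'target': [0, 1],
})


def split_frame(frame, target_column, id_column, exclude=()):
    """Split a DataFrame into ids, feature columns, and target (illustrative helper)."""
    reserved = {target_column, id_column, *exclude}
    # Every column that is not reserved or excluded counts as a feature.
    feature_names = [c for c in frame.columns if c not in reserved]
    return frame[id_column], frame[feature_names], frame[target_column]


ids, features, target = split_frame(df, 'target', 'id', exclude=['FN2'])
feature_names = list(features.columns)

# One dictionary per row, similar in shape to the to_dict() output shown above.
records = features.to_dict(orient='records')
```

Here `feature_names` holds whatever columns remain after removing the id, the target, and the excluded columns, which mirrors the effect of the exclude call in the example.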

DataSetTransformer

A feature-wise transformer that applies a scikit-learn transformer to one or more features, e.g.

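from sklearn.preprocessing import MinMaxScaler, StandardScaler
from pandas_sklearn import DataSetTransformer

# each entry pairs one or more column names with a transformer; None leaves the column unchanged
DataSetTransformer([
    (['petal length (cm)', 'petal width (cm)'], StandardScaler()),
    ('sepal length (cm)', MinMaxScaler()),
    ('sepal width (cm)', None),
])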
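
To make the idea of a feature-wise transformation concrete, the sketch below applies each transformer to its own columns using only pandas, NumPy, and scikit-learn, then stacks the results. It is an illustration under stated assumptions, not how DataSetTransformer is implemented: the helper `transform_features` and the sample rows are hypothetical, and `None` is assumed to mean that the column passes through unchanged.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical sample rows using the iris column names from the example above.
df = pd.DataFrame({
    'petal length (cm)': [1.4, 4.7, 6.0],
    'petal width (cm)': [0.2, 1.4, 2.5],
    'sepal length (cm)': [5.1, 7.0, 6.3],
    'sepal width (cm)': [3.5, 3.2, 3.3],
})

# Same (columns, transformer) pairs as above; None is assumed to mean pass-through.
steps = [
    (['petal length (cm)', 'petal width (cm)'], StandardScaler()),
    ('sepal length (cm)', MinMaxScaler()),
    ('sepal width (cm)', None),
]


def transform_features(frame, steps):
    """Apply each transformer to its columns and stack the results side by side."""
    blocks = []
    for columns, transformer in steps:
        # Normalise a single column name to a list so we always get a 2-D array.
        columns = [columns] if isinstance(columns, str) else columns
        values = frame[columns].to_numpy(dtype=float)
        if transformer is not None:
            values = transformer.fit_transform(values)
        blocks.append(values)
    return np.hstack(blocks)


X = transform_features(df, steps)
```

The sketch fits and transforms in a single call for brevity. Inside a pipeline, a transformer would normally be fitted on the training data and then reused to transform unseen data.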

It can also be used as a step in a scikit-learn pipeline, e.g.

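from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.svm import SVC
from pandas_sklearn import DataSetTransformer

# preprocess the selected features, then fit a linear SVM on the result
pipeline = Pipeline([
    ('preprocess', DataSetTransformer([
        (['petal length (cm)', 'petal width (cm)'], StandardScaler()),
        ('sepal length (cm)', MinMaxScaler()),
        ('sepal width (cm)', None),
    ])),
    ('classify', SVC(kernel='linear'))
])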

Credit

The DataSetTransformer is based on the work of Ben Hamner and Paul Butler.

Contributors

mmourafiq
