Tabular Knowledge

Goals of package

Make it easy to analyze tabular datasets with lots of features in scikit-learn.

Motivation

Although scikit-learn provides selectors that pick features by data type, it is sometimes more useful to select features by their mutual information with the target. Doing so requires computing metadata about the dataset, similar to what TensorFlow Extended (TFX), and specifically TensorFlow Data Validation, does. On top of that, mutual information needs to be computed for each feature to determine which features have predictive power for the problem at hand.
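As a rough illustration of what per-column metadata can look like, the sketch below builds a small summary table with plain pandas. It is not the package's API: the function name `summarize_columns` and the chosen statistics (dtype, missing fraction, cardinality) are assumptions made for this example.

```python
import pandas as pd


def summarize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Build a per-column metadata table (hypothetical helper, not part of the package)."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),        # storage type of each column
        "missing_fraction": df.isna().mean(),  # share of missing values per column
        "n_unique": df.nunique(dropna=True),   # cardinality, ignoring missing values
    })


# Toy data invented for the example
df = pd.DataFrame({
    "age": [34, 51, None, 29],
    "city": ["Austin", "Boston", "Austin", "Denver"],
    "subscribed": [True, False, True, True],
})
metadata = summarize_columns(df)
print(metadata)
```

A table like this makes it easier to spot columns whose storage type does not match their meaning, such as integer codes that are really categories.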

Tabular Knowledge makes it easier to understand data in a scikit-learn/pandas environment and provides functions for filtering out features that have no predictive power.
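The idea of filtering by predictive power can be sketched with scikit-learn's own `mutual_info_classif`. This is a minimal sketch on synthetic data, not the package's functions: the helper name `rank_by_mutual_information` and the 0.1 threshold are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_classif


def rank_by_mutual_information(X: pd.DataFrame, y: pd.Series, random_state: int = 0) -> pd.Series:
    """Estimate mutual information between each numeric column and a class target.

    Hypothetical helper for this example; it only wraps mutual_info_classif.
    """
    scores = mutual_info_classif(X, y, random_state=random_state)
    return pd.Series(scores, index=X.columns, name="mutual_information").sort_values(ascending=False)


# Synthetic data: one column tracks the target closely, the other is pure noise
rng = np.random.default_rng(0)
n = 1000
y = pd.Series(rng.integers(0, 2, size=n), name="target")
X = pd.DataFrame({
    "informative": y + rng.normal(scale=0.1, size=n),
    "noise": rng.normal(size=n),
})

mi = rank_by_mutual_information(X, y)
keep = mi[mi > 0.1].index.tolist()  # illustrative threshold, tune per problem
X_filtered = X[keep]
print(mi)
```

The estimates come from a nearest-neighbor method, so they vary slightly with the random seed and sample size. A threshold is best chosen by inspecting the ranked scores rather than fixed in advance.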

A working example is available in src/client/client.py, which implements a typical workflow that can lead to a baseline model: computing metadata, fine-tuning the semantics of features, analyzing mutual information, applying basic encoding with pipelines, and fitting a model.
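The later steps of such a workflow (adjusting feature semantics, encoding with pipelines, fitting a baseline model) might look like the following sketch. It uses only scikit-learn and pandas on invented data and is not a copy of src/client/client.py; the column names and the choice of logistic regression are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer, make_column_selector
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Invented dataset for the example
rng = np.random.default_rng(42)
n = 300
df = pd.DataFrame({
    "income": rng.normal(50_000, 10_000, size=n),
    "region": rng.choice(["north", "south", "west"], size=n),
    "plan_code": rng.integers(1, 4, size=n),  # integer codes that are really categories
})
y = (df["income"] > 50_000).astype(int)

# Fine-tune semantics: mark columns that should be treated as categorical
df["plan_code"] = df["plan_code"].astype("category")
df["region"] = df["region"].astype("category")

# Basic encoding: scale numeric columns, one-hot encode categorical ones
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
preprocess = ColumnTransformer([
    ("num", numeric_steps, make_column_selector(dtype_include=np.number)),
    ("cat", OneHotEncoder(handle_unknown="ignore"), make_column_selector(dtype_include="category")),
])

# Baseline model on top of the encoded features
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(df, y)
accuracy = model.score(df, y)  # training accuracy, only a sanity check
print(f"training accuracy: {accuracy:.2f}")
```

Casting `plan_code` to a categorical dtype before building the transformer is what routes it to the one-hot encoder instead of the scaler, which is the kind of semantic adjustment the workflow describes.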

Future directions

It may be possible to build policy-driven pipelines in the future. For example, a selector that behaves like make_column_selector in scikit-learn could include only the features whose mutual information exceeds a certain threshold. The encoding could also be declared based on model type.
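One way such a selector could work is sketched below. `ColumnTransformer` accepts a callable that receives the input frame and returns column names, so a threshold policy can be wrapped in a callable object. The class `MutualInfoSelector` is hypothetical and does not exist in scikit-learn or in this package; because the callable only receives `X`, this sketch assumes the target is passed at construction time and stays row-aligned with `X`.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_selection import mutual_info_classif
from sklearn.preprocessing import StandardScaler


class MutualInfoSelector:
    """Callable column selector keyed on mutual information (hypothetical)."""

    def __init__(self, y, threshold=0.1, random_state=0):
        self.y = np.asarray(y)  # assumed to align row-for-row with X
        self.threshold = threshold
        self.random_state = random_state

    def __call__(self, X: pd.DataFrame) -> list:
        # Only numeric columns are scored in this sketch
        numeric = X.select_dtypes(include=np.number)
        scores = mutual_info_classif(numeric, self.y, random_state=self.random_state)
        return [col for col, score in zip(numeric.columns, scores) if score >= self.threshold]


# Synthetic data: "signal" tracks the target, "noise" does not
rng = np.random.default_rng(1)
n = 1000
y = rng.integers(0, 2, size=n)
X = pd.DataFrame({
    "signal": y + rng.normal(scale=0.1, size=n),
    "noise": rng.normal(size=n),
})

selector = MutualInfoSelector(y, threshold=0.2)
selected = selector(X)

# The selector plugs into ColumnTransformer the same way make_column_selector does
ct = ColumnTransformer([("selected", StandardScaler(), selector)])
Xt = ct.fit_transform(X)
print(selected, Xt.shape)
```

Holding `y` inside the selector is a simplification: under cross-validation the rows of `X` change between folds, so a production design would more likely be a transformer whose `fit(X, y)` computes the scores.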

Contributors

pritamdodeja
