
codelib's Introduction



Table of contents

  1. Introduction
    1. Objective
    2. Programming style
    3. Version control
    4. Contribution guidelines
  2. Project documents
    1. Approach
  3. Available modules
    1. Clustering - determines optimal k
    2. GLMNet - classification/regression
    3. k-nearest neighbours - classification/regression
    4. Random forest - classification/timeseries/regression
    5. XGBoost - classification/timeseries/regression
    6. Traveling salesman problem - integer programming/heuristic
    7. Transportation problem - integer programming
    8. Time series
  4. Pull request guidelines
  5. Initial setup
  6. Unit tests
  7. Important links
  8. License

Introduction

Objective

The objective of this repository is to:

  1. Create a code library/toolkit to automate commonly used machine learning techniques/approaches in a modular environment.
  2. Provide best in class approaches developed over a period of time.
  3. Reduce development time for machine learning projects.
  4. Provide a scalable solution for all machine learning projects.

Programming style

It's good practice to follow accepted standards while coding in Python:

  1. PEP 8 standards: for code style.
  2. PEP 257 standards: for docstrings.
  3. PEP 484 standards: for function annotations.
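
As a small illustration of the three standards used together, the sketch below shows a hypothetical helper (it is not part of codelib) with PEP 8 naming and layout, a PEP 257 docstring, and PEP 484 type annotations.

```python
from typing import List


def mean_absolute_error(y_true: List[float], y_pred: List[float]) -> float:
    """Return the mean absolute error between two equal-length lists.

    Raises ValueError if the lists are empty or differ in length.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)
```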

Also, it's a good idea to rate all our Python scripts with Pylint. If we score anything less than 8/10, we should consider redesigning the code architecture.

A composite Pylint rating for all the code is automatically computed when we run the tests and is prepended to the top of this file.

Version control

We use semantic versioning (SemVer) for version control. You can read about semantic versioning here.
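
For reference, a SemVer version has the form MAJOR.MINOR.PATCH. The sketch below shows one way to compare tags such as the `v0.4.0` and `v0.4.1` used in this project; the helper name `parse_version` is hypothetical and the code ignores pre-release and build suffixes.

```python
from typing import Tuple


def parse_version(tag: str) -> Tuple[int, int, int]:
    """Parse a tag such as 'v0.4.1' into (major, minor, patch)."""
    major, minor, patch = tag.lstrip("v").split(".")
    return int(major), int(minor), int(patch)


# Tuples compare element by element, which matches SemVer precedence
# for plain MAJOR.MINOR.PATCH versions.
is_newer = parse_version("v0.4.1") > parse_version("v0.4.0")
```

Comparing integers rather than strings matters once a component reaches two digits: as strings, "0.10.0" would sort before "0.9.3".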

Initial setup

bash install.sh

Requirements

The Python requirements can be found at:

  1. Requirements

Run unit tests and pylint ratings

To run all unit tests and rate all Python scripts, run the following in the project directory:

./bin/run_tests.sh

Available options:

-a default, runs both code rating and unit tests.
-u unit tests.
-r code rating.

The Pylint ratings for each Python script can be found in logs/pylint/


Important links

Guidelines


codelib's People

Contributors

bdiptesh, madhutangudu

Forkers

madhutangudu

codelib's Issues

[Maintenance]: Standardize the API of AutoArima

Describe the maintenance request

Standardize the API of AutoArima in TimeSeries module.

Current impact

  • The current API of params is inconsistent with the API of other modules, which use param.
  • The current output of model_summary is inconsistent with the output of other modules.

Version

v0.4.0 (Default)

Additional context

No response

[Maintenance]: Change label pointers and add in PR template

Describe the maintenance request

Change label pointers in issue form's yaml files and add maintenance in PR template.

Current impact

With the change in labels, the yaml file will point to incorrect labels.

Version

v0.4.1 (Default)

Additional context

No response

Clustering: Optimal k

Is your feature request related to a problem? Please describe.
A clustering module to cluster any given data (categorical/continuous/ordinal) and return the optimal clustering solution.

Describe the solution you'd like
Compute optimal clustering solution using gap-statistic.

Methods:

  1. First SE
  2. Maximum Gap
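
The two selection rules can be sketched as follows, assuming the gap values and their standard errors have already been computed for k = 1, 2, ... This assumes "First SE" refers to the rule of choosing the smallest k with Gap(k) >= Gap(k + 1) - s(k + 1); the function name and the method labels are hypothetical and not codelib's API.

```python
from typing import List


def select_k(gap: List[float], se: List[float], method: str = "one_se") -> int:
    """Pick the number of clusters from gap values for k = 1..len(gap).

    gap[i] and se[i] are the gap statistic and its standard error for
    k = i + 1. The method names are illustrative, not codelib's.
    """
    if method == "max_gap":
        # Maximum Gap: the k with the largest gap value.
        return gap.index(max(gap)) + 1
    if method == "one_se":
        # Smallest k with Gap(k) >= Gap(k + 1) - s(k + 1).
        for i in range(len(gap) - 1):
            if gap[i] >= gap[i + 1] - se[i + 1]:
                return i + 1
        return len(gap)  # no k satisfied the rule; fall back to the largest
    raise ValueError(f"unknown method: {method}")


# Made-up values for illustration only.
gap_values = [0.10, 0.35, 0.60, 0.62, 0.58]
std_errors = [0.02, 0.03, 0.03, 0.04, 0.04]
k_one_se = select_k(gap_values, std_errors, method="one_se")
k_max_gap = select_k(gap_values, std_errors, method="max_gap")
```

With these made-up numbers the two rules disagree: the standard-error rule prefers the smaller solution because the improvement from 3 to 4 clusters is within one standard error.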

Expected input

df: pandas.DataFrame
x_var: List[str]
max_cluster: int
method: Union[str]

Expected API

opt_k

Acceptance criteria

Integration tests:

  • Categorical variables only
  • Continuous variables only
  • Ordinal variables only
  • Combination of categorical/ordinal/continuous

Tasks

  • Define integration tests
  • First pass implementation of Gap statistic
  • Modular implementation

XGBoost

Is your feature request related to a problem? Please describe.
XGBoost module

Describe the solution you'd like
XGBoost module for both classification and regression. Determine optimal hyperparameters.

Methods:

  1. Classification
  2. Regression

Expected input

df: pandas.DataFrame
y_var: List[str]
x_var: List[str]
method: Union[str]
param: Dict

Expected API

fit
predict

Acceptance criteria

  • Integration tests

Tasks

  • Define integration tests
  • First pass implementation of XGBoost
  • Modular implementation

GLMNet

Is your feature request related to a problem? Please describe.
GLMNet module

Describe the solution you'd like
GLMNet module for building models. Determine optimal alpha and lambda.
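
For context, in the glmnet parameterization alpha mixes the lasso penalty (alpha = 1) with the ridge penalty (alpha = 0), and lambda scales the overall penalty strength. The sketch below computes that penalty term only; the function name is hypothetical and it says nothing about how codelib searches for the optimal values.

```python
from typing import Sequence


def elastic_net_penalty(coefs: Sequence[float], alpha: float, lam: float) -> float:
    """Elastic net penalty: lam * ((1 - alpha) / 2 * L2^2 + alpha * L1)."""
    l1 = sum(abs(b) for b in coefs)
    l2_squared = sum(b * b for b in coefs)
    return lam * ((1.0 - alpha) / 2.0 * l2_squared + alpha * l1)
```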

Expected input

df: pandas.DataFrame
y_var: List[str]
x_var: List[str]
param: Dict

Expected API

fit
predict

Acceptance criteria

  • Integration tests

Tasks

  • Define integration tests
  • First pass implementation of GLMNet
  • Modular implementation

[Maintenance]: Add version v0.4.1 in issue templates

Describe the maintenance request

Addition of version v0.4.1 in all issue templates.

Current impact

We will not be able to identify and track issues with each version release.

Version

v0.4.0 (Default)

Additional context

No response

[Bug]: AutoArima's predict method with exogenous variables

Contact Details

No response

Describe the bug

When AutoArima is called with exogenous variables and the length of y_var is greater than 1, it throws an error.

How can we reproduce the bug

  • Set y_var as sales
  • Set x_var as ["cost", "holiday"]
  • Instantiate an AutoArima model with exogenous variables

AutoArima model throws an error.

Expected behavior

No errors should be raised.

Version

v0.4.0 (Default)

Relevant log output/screenshots

No response

Additional context

No response

[Feature]: XGBoost time series method

Is your feature request related to a problem? Please describe.

Time series method to predict the next n intervals

Describe the solution you'd like

Add a time series method for building forecasting models with/without exogenous variables using XGBoost.
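
Tree-based models have no built-in notion of time, so a common way to use them for forecasting is to reframe the series as a supervised problem with lagged values as features. Whether codelib takes this approach is an assumption; the helper below is a hypothetical, dependency-free sketch of the idea.

```python
from typing import List, Tuple


def make_lag_features(
    series: List[float], n_lags: int
) -> Tuple[List[List[float]], List[float]]:
    """Turn a series into (X, y); each row of X holds the previous n_lags values."""
    features, targets = [], []
    for t in range(n_lags, len(series)):
        features.append(series[t - n_lags:t])  # values at t - n_lags .. t - 1
        targets.append(series[t])              # value to predict at t
    return features, targets


X, y = make_lag_features([10.0, 12.0, 13.0, 15.0, 18.0], n_lags=2)
```

Exogenous variables, when present, would be appended to each feature row alongside the lags.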

Expected input(s)

df: pandas.DataFrame
y_var: List[str]
x_var: List[str]
method: Union[str]
param: Dict

Expected output(s)

fit
predict

Additional context

No response

Acceptance criteria

  • Integration tests

Version

v0.4.0 (Default)

kNN: Optimal k

Is your feature request related to a problem? Please describe.
kNN module

Describe the solution you'd like
kNN module for both classification and regression. Determine optimal k.

Methods:

  1. Classification
  2. Regression

Parameters:

  1. Euclidean
  2. Cosine
  3. Majority voting
  4. Weighted voting
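
The four parameters above combine into two choices: a distance metric (Euclidean or cosine) and a voting scheme (majority or distance-weighted). The sketch below illustrates them for classification in plain Python; the function names and arguments are hypothetical and are not the module's API.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: Point, b: Point) -> float:
    """One minus cosine similarity; zero vectors are not handled here."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def knn_classify(
    train: List[Tuple[Point, str]],
    query: Point,
    k: int,
    metric: str = "euclidean",
    weighted: bool = False,
) -> str:
    """Return the label chosen by the k nearest training points."""
    dist_fn = euclidean if metric == "euclidean" else cosine_distance
    neighbours = sorted(
        ((dist_fn(x, query), label) for x, label in train),
        key=lambda pair: pair[0],
    )[:k]
    votes: Dict[str, float] = defaultdict(float)
    for dist, label in neighbours:
        # Weighted voting: closer neighbours count more (inverse distance).
        votes[label] += 1.0 / (dist + 1e-9) if weighted else 1.0
    return max(votes, key=lambda label: votes[label])
```

Majority and weighted voting can disagree when one very close neighbour is outnumbered by more distant ones, which is why the choice is worth exposing as a parameter.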

Expected input

df: pandas.DataFrame
y_var: List[str]
x_var: List[str]
method: Union[str]
param: Dict

Expected API

fit
predict

Acceptance criteria

  • Integration tests

Tasks

  • Define integration tests
  • First pass implementation of kNN
  • Modular implementation

Random forest

Is your feature request related to a problem? Please describe.
Random forest module

Describe the solution you'd like
Random forest module for both classification and regression. Determine optimal hyperparameters.

Methods:

  1. Classification
  2. Regression

Expected input

df: pandas.DataFrame
y_var: List[str]
x_var: List[str]
method: Union[str]
param: Dict

Expected API

fit
predict

Acceptance criteria

  • Integration tests

Tasks

  • Define integration tests
  • First pass implementation of random forest
  • Modular implementation

[Maintenance]: Available modules in README

Describe the maintenance request

Add a list of available modules in README.md

Current impact

A user needs to look through the code base to determine the available modules, which may be time-consuming.

Version

v0.4.1 (Default)

Additional context

No response

Traveling salesman problem

Is your feature request related to a problem? Please describe.
Traveling salesman problem

Describe the solution you'd like
Traveling salesman problem with both integer programming and heuristic.

Methods:

  1. Optimization
  2. Heuristic
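
The issue does not say which heuristic is intended. As one simple possibility, the sketch below uses the nearest-neighbour heuristic on a distance matrix; the function names are hypothetical and the result is generally not optimal.

```python
from typing import List


def nearest_neighbour_tour(dist: List[List[float]], start: int = 0) -> List[int]:
    """Build a tour by always moving to the closest unvisited city."""
    unvisited = set(range(len(dist))) - {start}
    tour = [start]
    while unvisited:
        current = tour[-1]
        nxt = min(unvisited, key=lambda city: dist[current][city])
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


def tour_length(dist: List[List[float]], tour: List[int]) -> float:
    """Length of the closed tour, including the return to the start."""
    return sum(dist[a][b] for a, b in zip(tour, tour[1:] + tour[:1]))


# Made-up symmetric distance matrix for four cities.
distances = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]
tour = nearest_neighbour_tour(distances)
length = tour_length(distances, tour)
```

A heuristic like this gives a quick feasible tour that the integer programming method can be compared against on small instances.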

Expected input

df: pandas.DataFrame
method: Union[str]
param: Dict

Expected API

solve

Acceptance criteria

  • Integration tests

Tasks

  • Define integration tests
  • Document the formulations for both IP and Heuristics
  • First pass implementation of TSP
  • Modular implementation

Time series module

Is your feature request related to a problem? Please describe.
Time series module

Describe the solution you'd like
Time series module for building forecasting models.

Expected input

df: pandas.DataFrame
y_var: List[str]
x_var: List[str]
param: Dict

Expected API

fit
predict

Acceptance criteria

  • Integration tests

Tasks

  • Define integration tests
  • First pass implementation of Time series
  • Modular implementation

[Feature]: Time series module (AutoARIMA)

Is your feature request related to a problem? Please describe.

Time series module to predict the next n intervals

Describe the solution you'd like

Time series module for building forecasting models with/without exogenous variables sans prophet/gam.

Expected input(s)

df: pandas.DataFrame
y_var: str
x_var: List[str]
params: Dict

Expected output(s)

model: object
model_summary: Dict

Additional context

Expected API

mod = AutoArima(df=df_ip,
                y_var="y",
                x_var=["cost", "stock_level", "retail_price"])

df_op = mod.predict(x_predict)

Acceptance criteria

  • Integration tests

Version

v0.4.0 (Default)

Transportation problem

Is your feature request related to a problem? Please describe.
Transportation problem

Describe the solution you'd like
Transportation problem using integer programming.
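
For orientation, the standard textbook formulation of the transportation problem is given below. The symbols are assumptions for illustration and do not come from the project's documentation: $c_{ij}$ is the unit shipping cost from source $i$ to destination $j$, $s_i$ the supply at source $i$, $d_j$ the demand at destination $j$, and $x_{ij}$ the integer quantity shipped.

```latex
\begin{aligned}
\min_{x} \quad & \sum_{i=1}^{m} \sum_{j=1}^{n} c_{ij} x_{ij} \\
\text{s.t.} \quad & \sum_{j=1}^{n} x_{ij} \le s_i, \quad i = 1, \dots, m \\
& \sum_{i=1}^{m} x_{ij} \ge d_j, \quad j = 1, \dots, n \\
& x_{ij} \in \mathbb{Z}_{\ge 0}, \quad i = 1, \dots, m, \; j = 1, \dots, n
\end{aligned}
```

The problem is feasible only when total supply is at least total demand.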

Expected input

df: pandas.DataFrame
method: Union[str]
param: Dict

Expected API

solve

Acceptance criteria

  • Integration tests

Tasks

  • Define integration tests
  • Document the formulations for IP
  • First pass implementation of TP
  • Modular implementation

[Feature]: Random forest time series method

Is your feature request related to a problem? Please describe.

Time series method to predict the next n intervals

Describe the solution you'd like

Add a time series method for building forecasting models with/without exogenous variables using Random forest.

Expected input(s)

df: pandas.DataFrame
y_var: List[str]
x_var: List[str]
method: Union[str]
param: Dict

Expected output(s)

fit
predict

Additional context

No response

Acceptance criteria

  • Integration tests

Version

v0.4.0 (Default)
