Open solution to the Home Credit Default Risk challenge

Home Page: https://www.kaggle.com/c/home-credit-default-risk

License: MIT License


open-solution-home-credit's Introduction

Home Credit Default Risk: Open Solution

Join the chat at https://gitter.im/minerva-ml/open-solution-home-credit

This is an open solution to the Home Credit Default Risk challenge.

More competitions

Check the collection of public projects, where you can find multiple Kaggle competitions with code, experiments and outputs.

Our goals

We are building an entirely open solution to this competition. Specifically:

  1. Learning from the process: updates about new ideas, code and experiments are the best way to learn data science. Our activity is especially useful for people who want to enter the competition but lack appropriate experience.
  2. Encourage more Kagglers to start working on this competition.
  3. Deliver an open source solution with no strings attached. Code is available on our GitHub repository. This solution should establish a solid benchmark, as well as provide a good base for your custom ideas and experiments. We care about clean code.
  4. We are opening our experiments as well: everybody can have a live preview of our experiments, parameters, code, etc. Check: Home Credit Default Risk and the screens below.

[Figures: "Train and validation results on folds" and "LightGBM learning curves"]

Disclaimer

In this open source solution you will find references to neptune.ml. It is a free platform for community users, which we use daily to keep track of our experiments. Please note that using neptune.ml is not necessary to proceed with this solution. You may run it as a plain Python script.

Note

As of 1.07.2019 we officially discontinued the neptune-cli client project, making neptune-client the only supported way to communicate with Neptune. That means you should run experiments via the python ... command or update loggers to neptune-client. For more information about the new client, go to the neptune-client read-the-docs page.

How to start?

Learn about our solutions

  1. Check Kaggle forum and participate in the discussions.
  2. Check our Wiki pages, where we document our work. See solutions below:

| link to code | name | CV | LB | link to description |
|---|---|---|---|---|
| solution 1 | chestnut | ? | 0.742 | LightGBM and basic features |
| solution 2 | seedling | ? | 0.747 | Sklearn and XGBoost algorithms and groupby features |
| solution 3 | blossom | 0.7840 | 0.790 | LightGBM on selected features |
| solution 4 | tulip | 0.7905 | 0.801 | LightGBM with smarter features |
| solution 5 | sunflower | 0.7950 | 0.804 | LightGBM clean dynamic features |
| solution 6 | four leaf clover | 0.7975 | 0.806 | priv. LB 0.79804, Stacking by feature diversity and model diversity |
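
The CV and LB columns in the solutions list are ROC AUC scores, the metric used by the competition. As a rough illustration of what such a number measures, here is a minimal pure-Python sketch of ROC AUC using the rank-sum formulation. The function name `roc_auc` is hypothetical and is not taken from this repository, which relies on its own pipeline code for evaluation.

```python
def roc_auc(y_true, y_score):
    """ROC AUC via the rank-sum (Mann-Whitney U) formulation.

    y_true holds 0/1 labels, y_score holds predicted scores.
    Tied scores receive the average of the ranks they span.
    """
    pairs = sorted(zip(y_score, y_true))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of identical scores starting at position i.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1

    n_pos = sum(label for _, label in pairs)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC AUC needs at least one positive and one negative")

    pos_rank_sum = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


if __name__ == "__main__":
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

A score of 0.5 corresponds to random ranking and 1.0 to perfect separation of defaulting and non-defaulting clients, so the gains between solutions in the list are small in absolute terms but meaningful on this scale.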

Start experimenting with ready-to-use code

You can jump start your participation in the competition by using our starter pack. The installation instructions below will guide you through the setup.

Installation (fast track)

  1. Clone repository and install requirements (use Python3.5):

    pip3 install -r requirements.txt

  2. Register to the neptune.ml (if you wish to use it).
  3. Run experiment based on LightGBM.

    With Neptune:

    neptune account login
    neptune run --config configs/neptune.yaml main.py train_evaluate_predict_cv --pipeline_name lightGBM

    As a plain Python script:

    python main.py -- train_evaluate_predict_cv --pipeline_name lightGBM

Installation (step by step)

Step by step installation

Hyperparameter Tuning

Various options of hyperparameter tuning are available:

  1. Random Search

    configs/neptune.yaml

      hyperparameter_search__method: random
      hyperparameter_search__runs: 100

    src/pipeline_config.py

        'tuner': {'light_gbm': {'max_depth': ([2, 4, 6], "list"),
                                'num_leaves': ([2, 100], "choice"),
                                'min_child_samples': ([5, 10, 15, 25, 50], "list"),
                                'subsample': ([0.95, 1.0], "uniform"),
                                'colsample_bytree': ([0.3, 1.0], "uniform"),
                                'min_gain_to_split': ([0.0, 1.0], "uniform"),
                                'reg_lambda': ([1e-8, 1000.0], "log-uniform"),
                                },
                  }
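
To make the shape of this search space more concrete, here is a minimal sketch of how a random search could draw configurations from it. It is not the repository's tuner. The meaning of each sampling kind is an assumption made for illustration: "list" picks one of the listed values, "choice" picks an integer from the inclusive range, "uniform" draws a float from the range, and "log-uniform" draws a float whose logarithm is uniform over the range. The names `sample_param` and `random_search` are hypothetical.

```python
import math
import random

# Search space in the same ([values], "kind") shape as the 'tuner' entry.
SEARCH_SPACE = {
    'max_depth': ([2, 4, 6], "list"),
    'num_leaves': ([2, 100], "choice"),
    'min_child_samples': ([5, 10, 15, 25, 50], "list"),
    'subsample': ([0.95, 1.0], "uniform"),
    'colsample_bytree': ([0.3, 1.0], "uniform"),
    'min_gain_to_split': ([0.0, 1.0], "uniform"),
    'reg_lambda': ([1e-8, 1000.0], "log-uniform"),
}


def sample_param(values, kind, rng):
    """Draw one value. The semantics of each kind are assumed, not the repo's."""
    if kind == "list":
        return rng.choice(values)
    low, high = values
    if kind == "choice":
        return rng.randint(low, high)  # inclusive integer range (assumption)
    if kind == "uniform":
        return rng.uniform(low, high)
    if kind == "log-uniform":
        value = math.exp(rng.uniform(math.log(low), math.log(high)))
        return min(max(value, low), high)  # guard against float round-off
    raise ValueError("unknown sampling kind: %r" % kind)


def random_search(space, runs, seed=0):
    """Yield `runs` independent configurations drawn from `space`."""
    rng = random.Random(seed)
    for _ in range(runs):
        yield {name: sample_param(values, kind, rng)
               for name, (values, kind) in space.items()}


if __name__ == "__main__":
    for config in random_search(SEARCH_SPACE, runs=3):
        print(config)
```

Under these assumptions, `runs` plays the role of `hyperparameter_search__runs`: each drawn configuration would be used to train and validate one model, and the best validation score decides which configuration is kept.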

Get involved

You are welcome to contribute your code and ideas to this open solution. To get started:

  1. Check the competition project on GitHub to see what we are working on right now.
  2. Express your interest in a particular task by writing a comment in this task, or by creating a new one with your fresh idea.
  3. We will get back to you quickly in order to start working together.
  4. Check CONTRIBUTING for some more information.

User support

There are several ways to seek help:

  1. Kaggle discussion is our primary way of communication.
  2. Read the project's Wiki, where we publish descriptions of the code, pipelines and supporting tools such as neptune.ml.
  3. Submit an issue directly in this repo.

open-solution-home-credit's People

Contributors

dependabot[bot], gitter-badger, jakubczakon, kant, ninoko, pknut, pranayaryal


open-solution-home-credit's Issues

How LightGBM works?

  • how it works?
  • which parameters are important and why?
  • rules of thumb for grid search
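
As a starting point for the grid search item, here is a minimal sketch of how a parameter grid can be enumerated. The parameter names are real LightGBM parameters, but the grid values are purely illustrative and are not recommendations from this project. The helper `iter_grid` is hypothetical.

```python
from itertools import product

# Illustrative grid only; the values are assumptions, not tuned settings.
GRID = {
    'num_leaves': [15, 31, 63],
    'learning_rate': [0.02, 0.1],
    'min_child_samples': [10, 50],
}


def iter_grid(grid):
    """Yield every combination of grid values as a parameter dict."""
    keys = sorted(grid)
    for values in product(*(grid[key] for key in keys)):
        yield dict(zip(keys, values))


if __name__ == "__main__":
    for params in iter_grid(GRID):
        print(params)
```

The number of combinations is the product of the list lengths (3 x 2 x 2 = 12 here), which is why grids are usually kept to a few values per parameter.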

pipelines for classical ML algorithms

When validation, loaders, make_submit function, and model training steps (random forest and sv regression) are ready:

  • build first pipeline from these pieces
  • run grid search
  • submit to Kaggle :)

Age buckets

  • investigate age buckets
  • check for important dates in Spain (Studies, retirement)
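
One possible way to prototype age buckets is sketched below. It assumes the age comes from the `DAYS_BIRTH` column of the competition's application table, which stores the client's age as a negative number of days relative to the application date. The bucket edges are hypothetical placeholders; the real edges would come out of the investigation described above.

```python
from bisect import bisect_right

# Hypothetical bucket edges in years, to be replaced after the investigation.
AGE_EDGES = [25, 35, 45, 55, 65]


def age_in_years(days_birth):
    """Convert DAYS_BIRTH (negative days before application) to years."""
    return -days_birth / 365.25


def age_bucket(days_birth, edges=AGE_EDGES):
    """Return the bucket index: 0 below edges[0], len(edges) at or above edges[-1]."""
    return bisect_right(edges, age_in_years(days_birth))


if __name__ == "__main__":
    for days in (-7000, -12000, -25000):
        print(days, round(age_in_years(days), 1), age_bucket(days))
```

The bucket index can then be used directly as a categorical feature, or as a key for groupby-style aggregations.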

first features for training purposes

  • Analyze dataset to identify good candidates for simple features
    • simple, direct, easy to implement with minimal effort.
  • Implement steppy Transformer that prepares features for sklearn regression algorithms:
    • random forest
    • support vector regression
    • others(?)
