This repo is a complement to my other git repository, where I trained models with XGBoost and LightGBM to predict the churn rate of sample insurance clients. In this repo the same data set has been used to train a model with CatBoost.
- Comparing CatBoost with XGBoost and LightGBM.
- Building soft and hard voters manually, saving the trained models to disk, and applying an ensemble of trained models to the test data.
- CatBoost has been used together with cross-validation.
- scikit-learn's train_test_split is used to create a 70% train, 15% validation, and 15% test split.
- A CatBoost model is trained for each fold of the k-fold cross-validation (see the first sketch after this list).
- f1_score is used to check model accuracy on each fold.
- Feature importance is calculated across all the folds together.
- The features are sorted by their importance for the target value and visualized with seaborn (second sketch below).
- Each fold's trained model is saved separately with pickle.
- All saved models are loaded back and used to predict the target value of the test data set.
- All models' predictions are averaged together and rounded to 0 if the average is less than or equal to 0.5, and to 1 otherwise (third sketch below).
- A single CatBoost performed more accurately than the ensemble of 5 CatBoosts, each trained on a different fold.
- Training a CatBoost model on all of the training data, without the split, improved performance when tested on the df_test data frame. The problem with this method is that there is no way to be sure the final model is not overfit.
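
Below is a minimal sketch of the split and per-fold training described above. The file name `insurance_churn.csv` and the target column `churn` are hypothetical placeholders, and the hyperparameters are illustrative rather than the ones used in the notebooks.

```python
import pandas as pd
from catboost import CatBoostClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold, train_test_split

# Hypothetical file and target names; adjust to the actual data set.
df = pd.read_csv("insurance_churn.csv")
X, y = df.drop(columns=["churn"]), df["churn"]
cat_cols = X.select_dtypes(include="object").columns.tolist()

# 70% train, 15% validation, 15% test via a two-step split.
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42
)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.50, stratify=y_tmp, random_state=42
)

# One CatBoost model per fold, with f1_score checked on each held-out fold.
models = []
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for fold, (tr_idx, va_idx) in enumerate(skf.split(X_train, y_train)):
    model = CatBoostClassifier(
        iterations=500, cat_features=cat_cols, random_seed=42, verbose=0
    )
    model.fit(
        X_train.iloc[tr_idx], y_train.iloc[tr_idx],
        eval_set=(X_train.iloc[va_idx], y_train.iloc[va_idx]),
    )
    preds = model.predict(X_train.iloc[va_idx])
    print(f"fold {fold}: f1 = {f1_score(y_train.iloc[va_idx], preds):.4f}")
    models.append(model)
```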
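For the feature-importance step, one way to combine the folds is to average each model's importances and plot the sorted result with seaborn. Averaging is an assumption here; the notebook may aggregate the folds differently. This continues from the sketch above.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Average the per-fold importances from the models trained above.
importances = np.mean([m.get_feature_importance() for m in models], axis=0)
imp_df = (
    pd.DataFrame({"feature": X_train.columns, "importance": importances})
    .sort_values("importance", ascending=False)
)

# Horizontal bar plot, most important feature on top.
sns.barplot(data=imp_df, x="importance", y="feature")
plt.tight_layout()
plt.show()
```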
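The pickle round trip and the two manual voters could look like this, continuing from the sketches above. The `catboost_fold_{i}.pkl` file names are placeholders.

```python
import pickle

import numpy as np

# Save each fold's trained model separately.
for i, m in enumerate(models):
    with open(f"catboost_fold_{i}.pkl", "wb") as f:
        pickle.dump(m, f)

# Load every saved model back.
loaded = []
for i in range(len(models)):
    with open(f"catboost_fold_{i}.pkl", "rb") as f:
        loaded.append(pickle.load(f))

# Soft voter: average the predicted probabilities, then threshold at 0.5.
proba = np.mean([m.predict_proba(X_test)[:, 1] for m in loaded], axis=0)
soft_vote = (proba > 0.5).astype(int)

# Hard voter: average the 0/1 labels; an average <= 0.5 rounds to 0, else 1.
votes = np.mean([m.predict(X_test).astype(int) for m in loaded], axis=0)
hard_vote = (votes > 0.5).astype(int)
```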
CatBoost can handle categorical data natively and does not require encoding. Still, we check the impact of different encodings on CatBoost (a comparison sketch follows).
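A sketch of such a comparison, assuming the variables from the sketches above: train once with CatBoost's native categorical handling and once on a one-hot encoded copy, then compare validation f1 scores. One-hot encoding stands in for "different encodings" here; the notebook may try others (e.g. label or target encoding).

```python
import pandas as pd
from catboost import CatBoostClassifier
from sklearn.metrics import f1_score

# Native handling: pass the categorical columns straight to CatBoost.
native = CatBoostClassifier(iterations=500, cat_features=cat_cols, verbose=0)
native.fit(X_train, y_train, eval_set=(X_val, y_val))

# One-hot encoding: expand the categoricals with pandas instead.
X_train_ohe = pd.get_dummies(X_train, columns=cat_cols, dtype=int)
X_val_ohe = pd.get_dummies(X_val, columns=cat_cols, dtype=int).reindex(
    columns=X_train_ohe.columns, fill_value=0
)
onehot = CatBoostClassifier(iterations=500, verbose=0)
onehot.fit(X_train_ohe, y_train, eval_set=(X_val_ohe, y_val))

# Compare validation f1 for the two encodings.
for name, model, X_v in [("native", native, X_val), ("one-hot", onehot, X_val_ohe)]:
    print(name, f1_score(y_val, model.predict(X_v)))
```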