
Customer Churn Prediction

I developed what is, to my knowledge, the most accurate predictive model on Kaggle's Telco Customer Churn dataset, with a validation-set accuracy of 91.30%. Below is an executive summary of how I tackle tabular binary classification problems.
[figure: perf — validation accuracy]
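For context, validation accuracy on a binary problem like this is computed by thresholding the model's sigmoid outputs at 0.5 and comparing against the labels. A minimal sketch (the numbers below are toy values, not the reported 91.30%):

```python
import torch

# Toy logits from a model and their true labels.
logits = torch.tensor([2.0, -1.0, 0.5, -3.0])
labels = torch.tensor([1.0, 0.0, 0.0, 0.0])

# Threshold the sigmoid probabilities at 0.5 to get hard predictions.
preds = (torch.sigmoid(logits) > 0.5).float()

# Accuracy = fraction of predictions that match the labels.
accuracy = (preds == labels).float().mean().item()
print(accuracy)  # 0.75
```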

Our Features

Each feature is listed along with its possible unique values.
[figure: the_features]

The Target

In this dataset, we are predicting whether or not a customer will discontinue their current service contract with Telco.
[figure: the_targets]
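Before training, the `Yes`/`No` churn labels must be encoded as 1/0. A minimal sketch of that step (the column name `Churn` and its labels match the Kaggle dataset; the toy rows are illustrative):

```python
import pandas as pd

# Toy frame standing in for the real dataset.
df = pd.DataFrame({"Churn": ["Yes", "No", "No", "Yes"]})

# Map the string labels to a binary target.
target = df["Churn"].map({"Yes": 1, "No": 0})
print(target.tolist())  # [1, 0, 0, 1]
```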

The libraries

To install the requisite libraries, run `conda env create -f environment.yml` from this repository's root directory; this initializes the virtual environment and installs all required libraries.
The primary dependencies for this project are PyTorch, imblearn, sklearn, and pandas.

The Model

A simple dense neural network with three layers, halving in size after each. No embedding matrix is used, as there are no high-cardinality categorical variables to contend with.
[figure: deeper_model]
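The architecture above can be sketched in PyTorch as follows. The layer widths and the input width of 45 are assumptions for illustration (the notebook derives the real input width from the encoded features); only the shape of the design — three dense layers, each half the width of the previous, ending in a single logit — comes from the description:

```python
import torch
import torch.nn as nn

class DeeperModel(nn.Module):
    """Three dense layers, each half the width of the previous."""

    def __init__(self, n_features: int = 45, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden // 2),
            nn.ReLU(),
            nn.Linear(hidden // 2, 1),  # single logit for BCEWithLogitsLoss
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

model = DeeperModel()
out = model(torch.randn(4, 45))
print(out.shape)  # torch.Size([4, 1])
```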

The hyper-parameters

  • Loss function (criterion): `nn.BCEWithLogitsLoss()`.
  • Optimizer and learning rate: `optim.Adam(deeper_model.parameters(), lr=1e-2)`.
  • Validation set size: 20%.
  • Batch size: 827.
  • Instances of each class (after oversampling): 5174.
  • Epochs: 1000.
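Wiring these choices together looks roughly like the sketch below. The model here is a stand-in linear layer and the data is random; only the criterion, optimizer, learning rate, and batch size come from the list above:

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Stand-in for the real deeper_model defined in the notebook.
deeper_model = nn.Linear(45, 1)

criterion = nn.BCEWithLogitsLoss()
optimizer = optim.Adam(deeper_model.parameters(), lr=1e-2)

# One toy batch at the listed batch size of 827.
x = torch.randn(827, 45)
y = torch.randint(0, 2, (827, 1)).float()

# A single training step: forward, loss, backward, update.
optimizer.zero_grad()
loss = criterion(deeper_model(x), y)
loss.backward()
optimizer.step()
print(loss.item())
```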

Two custom functions

I wrote two custom functions, print_unique() and training_loop(), both of which can be found in my notebook. Update: since this repository's creation, training_loop has been significantly updated in my custom_deep_learning repository.
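The actual implementations live in the notebook; as a hypothetical sketch, a `print_unique`-style helper might report each column with its unique values, matching the features table above:

```python
import pandas as pd

def print_unique(df: pd.DataFrame) -> dict:
    """Hypothetical sketch: print each column with its sorted unique
    values (the real implementation is in the notebook)."""
    uniques = {col: sorted(df[col].astype(str).unique()) for col in df.columns}
    for col, vals in uniques.items():
        print(f"{col}: {vals}")
    return uniques

# Toy frame with two of the dataset's columns.
demo = pd.DataFrame({"gender": ["Male", "Female"], "SeniorCitizen": [0, 1]})
result = print_unique(demo)
```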

My 'Trick' to the most accurate model

If you've made it this far, take a peek inside my notebook; I'll give you a free cookie :) But if you're in a rush, to summarize: the trick to achieving the highest accuracy of any model on Kaggle was to use Adam as my optimizer, which is more powerful than plain-vanilla optim.SGD, to synthetically oversample the minority class with SMOTE, and to use a neural network with sufficient capacity, which happened to be two hidden layers. nn.BatchNorm1d, nn.Dropout, and nn.Embedding are tricks that may improve generalization in a model such as this, but I elected not to implement them here.

