Code Monkey home page Code Monkey logo

modelling-allstate-claims-severity's Introduction

Modelling the AllState Claim Severity Dataset (June 2017)

by Jérôme E. Blanchet, Senior Analyst | Data Scientist

981 Gulf Pl, Ottawa, ON K1K 3X9 (819) 576-5502 [email protected] [email protected]

Abstract

This paper is about building predictive modelling with the Allstate Claim Severity Dataset. This is a work in progress. All sections are unfinished and make progress every day. The dataset include 72 binary variables, around 45 categorical variable with more than 2 categories and 15 continuous variables. All data are perfecly anonymous. We don't know what the data represent. The Target variable is continuous and represent the claim severity.

Besides showing that innovative algorithms can out-perform more traditional tools, the goal of this paper is mainly about designing creative way to show several information on a single and simple graph. For example, presenting the structure of the full dataset on a single graph or showing the behavior of all trees made of a random forest on a single chart. The objective is also to show that advanced algorithms such as Neural Hyperparameters optimization is not a blackbox process, but rather an intuitive and interpretable journey of discovery.

Right now, Keras (with tensorflow or Theano backend) is probably the most popular library for deep learning and Neural Network design. It allow you to do fast prototyping of your ideas and built a final product quickly. In the other hand, building a neural network from scratch directly with Tensorflow, Theano or Numpy has its own advantages. It will allow you to see the low level process with every single iteration. This way of doing things is important, especially if you want to convince a manager about the 'No Blackbox' status of an algorithm. This way of doing things can contribute to accelarate the Technology Transformation process within a Company. It is one thing to own the technical knowledge, it is something else to present an algorithm and succeed to show that it is interpretable, intuitive and able you to replicate results.

This paper use an academic way of showing results. In the other hand, the ultimate goal of this paper is to convince is audiences that you can extract business value with innovative algorithms and create a simple and concret final product out of it.

Table of Contents

Part 1) Data and Modules Importation....................................................................................................XXX
Part 2) Data Description and Interaction ....................................................................................................XXX

Part 3) Preprocessing the Data...............................................................................................XXX

  • 3.0 Logarithmic Tranformation, Outliers, Standardization and No-Variance Variables
  • 3.1 Dimensionality Reduction with Principal Component Analysis (PCA) Binary Variables Only
  • 3.2 Dimensionality Reduction with T-distributed stochastic neighbor embedding (t-SNE) on top (PCA)
  • 3.3 Dimensionality Reduction with Principal Component Analysis (PCA) Continuous Variables Only
  • 3.4 Dimensionality Reduction with T-distributed stochastic neighbor embedding (t-SNE) on top of (PCA)
  • 3.4 Clustering
  • ...at least 80% of the paper ;-)

Part 4) Modelling the Claims Severity Data..........................................................................XXX
Part 5) Results.............................................................................................XXX
Part 6) Bibliography.......................................................................................................XXX
Part 7) Annexe............................................................................................................XXX

List of Tested Algorithms

Benchmark (Classic Linear Based Algorithms):

  • Linear Regression...............................................................................................XXX
  • Linear Regression Ridge...............................................................................................XXX
  • Linear Regression Lasso...............................................................................................XXX
  • Linear Regression ElasticNet............................................................................................XXX

Tree Based Algorithms:

  • Single Decision Tree Regressor.............................................................................................XXX
  • Decision Tree Bagging Regressor............................................................................................XXX
  • Random Forest Regressor...............................................................................................XXX
  • Extra Trees Regressor...............................................................................................XXX
  • Ada Boost Regressor...............................................................................................XXX
  • Gradient Boosting Regressor...............................................................................................XXX
  • XGBoost Regressor...............................................................................................XXX
  • Light GBM Regressor...............................................................................................XXX

Neural Network and Stacking:

  • Neural Network with Keras (Tensorflow Backend).......................................................................XXX
  • Neural Network with Tensorflow....................................................................................XXX
  • Manual Stacking...............................................................................................XXX
  • Marios Michailidis's StackNet.........................................................................................XXX

Other Algorithms:

  • Support Vector Regression...............................................................................................XXX
  • K Neighbors Regressor...............................................................................................XXX

modelling-allstate-claims-severity's People

Stargazers

 avatar  avatar  avatar  avatar  avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.