
Deloitte_Hackathon_predict_Loan_Defaulter

PROBLEM STATEMENT

  • The aim is to predict loan status from the loan and applicant features listed below.

  • Dataset description: Train.csv - 67463 rows x 35 columns (including the target column, Loan Status)

Attributes:

  • ID: unique ID of representative
  • Loan Amount: loan amount applied
  • Funded Amount: loan amount funded
  • Funded Amount Investor: loan amount approved by the investors
  • Term: term of loan (in months)
  • Batch Enrolled: batch numbers to representatives
  • Interest Rate: interest rate (%) on loan
  • Grade: grade by the bank
  • Sub Grade: sub-grade by the bank
  • Employment Duration: duration of employment
  • Home Ownership: ownership of home
  • Verification Status: Income verification by the bank
  • Payment Plan: if any payment plan has started against loan
  • Loan Title: loan title provided
  • Debit to Income: ratio of the representative's total monthly debt repayment to self-reported monthly income, excluding mortgage
  • Delinquency - two years: number of 30+ days delinquency in past 2 years
  • Inquires - six months: total number of inquiries in last 6 months
  • Open Account: number of open credit lines in the representative's credit line
  • Public Record: number of derogatory public records
  • Revolving Balance: total credit revolving balance
  • Revolving Utilities: amount of credit a representative is using relative to revolving_balance
  • Total Accounts: total number of credit lines available in the representative's credit line
  • Initial List Status: unique listing status of the loan - W(Waiting), F(Forwarded)
  • Total Received Interest: total interest received till date
  • Total Received Late Fee: total late fee received till date
  • Recoveries: post charge off gross recovery
  • Collection Recovery Fee: post charge off collection fee
  • Collection 12 months Medical: total collections in last 12 months excluding medical collections
  • Application Type: indicates whether the representative is an individual or joint
  • Last week Pay: indicates how long (in weeks) a representative has paid EMI after batch enrolled
  • Accounts Delinquent: number of accounts on which the representative is delinquent
  • Total Collection Amount: total collection amount ever owed
  • Total Current Balance: total current balance from all accounts
  • Total Revolving Credit Limit: total revolving credit limit
  • Loan Status: 1 = Defaulter, 0 = Non Defaulters

Test.csv - 28913 rows x 34 columns (the Train.csv columns without the target column, Loan Status)

Sample Submission.csv - Please check the Evaluation section for more details on how to generate a valid submission.

The challenge is to predict the Loan Status for each row of Test.csv.

Knowledge and Skills

  • Big dataset
  • Underfitting vs overfitting
  • Optimising log_loss to generalise well on unseen data

Data Preprocessing

  • As the values of columns Employment Duration and Home Ownership are interchanged, these columns are renamed to their correct names.
  • Categorical attributes are encoded using LabelEncoder, as we will be using Random Forest to build the model.
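
The two preprocessing steps above might look roughly like the sketch below. The small DataFrame and its values are made up for illustration, and the `categorical_cols` list is an assumption; the real notebook would read Train.csv and may encode a different set of columns.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Made-up rows standing in for Train.csv. The headers are deliberately swapped:
# "Employment Duration" holds ownership categories, "Home Ownership" holds numbers.
df = pd.DataFrame({
    "Employment Duration": ["MORTGAGE", "RENT", "OWN", "RENT"],
    "Home Ownership": [176346.6, 39833.9, 91506.7, 108286.6],
    "Grade": ["B", "C", "F", "C"],
})

# Swap the two headers in a single rename call so each column gets its correct name.
df = df.rename(columns={
    "Employment Duration": "Home Ownership",
    "Home Ownership": "Employment Duration",
})

# Assumed list of text columns to encode (hypothetical subset of the real data).
categorical_cols = ["Home Ownership", "Grade"]

# One LabelEncoder per column, kept in a dict so the same mapping
# can later be applied to Test.csv with encoders[col].transform(...).
encoders = {}
for col in categorical_cols:
    encoders[col] = LabelEncoder()
    df[col] = encoders[col].fit_transform(df[col])
```

LabelEncoder assigns integer codes in sorted order of the category labels, which tree-based models such as Random Forest can split on directly.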

Features Selection

  • Used ExtraTreesClassifier to select the best features.
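
One common way to select features with ExtraTreesClassifier is to rank them by the fitted model's `feature_importances_`. The sketch below assumes that approach; the synthetic data, the number of trees, and the cut-off `k` are all illustrative choices, not values taken from the project.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Synthetic stand-in for the encoded training data (10 features, 3 informative).
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=3, n_redundant=0,
    random_state=0,
)

# Fit the tree ensemble and read its impurity-based feature importances.
selector = ExtraTreesClassifier(n_estimators=100, random_state=0)
selector.fit(X, y)
importances = selector.feature_importances_

# Keep the column indices of the k highest-scoring features (k is an assumed cut-off).
k = 5
top_k = np.argsort(importances)[::-1][:k]
X_selected = X[:, top_k]
```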

Evaluation Metric

The competition evaluation metric used is Log-loss.
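
For reference, the usual binary definition of log loss is the mean negative log-likelihood of the true labels under the predicted probabilities. The function below is a plain-Python sketch of that definition; the name `binary_log_loss` and the clipping constant are illustrative, and in practice `sklearn.metrics.log_loss` would normally be used instead.

```python
import math

def binary_log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of 0/1 labels given predicted P(label = 1)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip so log(0) never occurs
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

# Confident and correct predictions give a small loss ...
confident_right = binary_log_loss([1, 0], [0.9, 0.1])
# ... while confident and wrong predictions are penalised heavily.
confident_wrong = binary_log_loss([1, 0], [0.1, 0.9])
```

Because the metric works on probabilities rather than hard labels, lower is better and over-confident wrong predictions cost the most.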

Approach

This is a binary classification problem: predict whether a loan applicant will default or not. Logistic Regression, Random Forest and XGBoost models were built and compared on log loss, and XGBoost performed best with a log loss of 0.32. The XGBoost model then went through hyperparameter tuning and was trained on AWS SageMaker instances, which improved its F1 score from 0.91 to 0.94.
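
A model comparison of this kind could be sketched as follows. The data here is synthetic and the hyperparameters are arbitrary, so the scores will not match the figures above; XGBoost is added only if the `xgboost` package happens to be installed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed loan data.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Candidate models; settings are illustrative, not the project's.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
try:
    from xgboost import XGBClassifier  # optional dependency
    models["xgboost"] = XGBClassifier(n_estimators=100, random_state=0)
except ImportError:
    pass

# Fit each model and score its predicted default probabilities on held-out data.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_val)[:, 1]
    scores[name] = log_loss(y_val, proba, labels=[0, 1])

# Lower log loss is better.
best = min(scores, key=scores.get)
```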

Contributors

elviskoech
