Credit-Risk-Analysis

Overview of the analysis

Credit risk is an inherently unbalanced classification problem, as good loans easily outnumber risky loans. Therefore, there is a need to employ different techniques to train and evaluate models with unbalanced classes.
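To make the imbalance concrete, the sketch below scores a hypothetical baseline that labels every loan low risk. The class counts (101 high-risk and 17104 low-risk observations) are the ones quoted in the Summary; the function names are illustrative and not part of the project code.

```python
def accuracy(tp, fn, tn, fp):
    """Fraction of all observations classified correctly."""
    return (tp + tn) / (tp + fn + tn + fp)


def balanced_accuracy(tp, fn, tn, fp):
    """Mean of the per-class recalls, so each class counts equally."""
    recall_high = tp / (tp + fn)
    recall_low = tn / (tn + fp)
    return (recall_high + recall_low) / 2


# Hypothetical baseline: predict "low risk" for every loan.
# High risk is the positive class, so every high-risk loan is a false negative.
baseline = dict(tp=0, fn=101, tn=17104, fp=0)

print(f"accuracy:          {accuracy(**baseline):.3f}")
print(f"balanced accuracy: {balanced_accuracy(**baseline):.3f}")
```

Plain accuracy comes out above 99% even though no high-risk loan is caught, while balanced accuracy is 50%. This is why the results below report balanced accuracy alongside per-class precision and recall.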

Purpose

  • To apply resampling techniques, the SMOTEENN algorithm, and ensemble classifiers to predict credit risk.

Resources

Dataset: LoanStat_2019Q1.csv. Software: Python 3.7.6 and Anaconda 2020.11.

Results

Credit risk resampling Models

Oversampling

The balanced accuracy score was 65%. For the high-risk class, precision was very low at 1%, with a recall of 69% and an F1 score of 2%. Because low-risk loans far outnumber high-risk loans, the low-risk class had 100% precision, with a recall of 61%.
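This README does not name the oversampling algorithm used. As a minimal sketch, assuming naive random oversampling, the idea is to duplicate minority-class rows at random until the classes are the same size. The helper `random_oversample` and the toy data are made up for illustration and use only the standard library.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=1):
    """Duplicate randomly chosen minority rows until all classes are the same size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # Draw with replacement to make up the shortfall for this class.
        extra = rng.choices(pool, k=target - n)
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


# Toy data (made up for illustration): 8 low-risk rows, 2 high-risk rows.
X = [[i, i * 2] for i in range(10)]
y = ["low_risk"] * 8 + ["high_risk"] * 2

X_res, y_res = random_oversample(X, y)
print(Counter(y_res))
```

Only the training split would normally be resampled; the test split keeps its original imbalance, which is one reason high-risk precision stays low even when recall improves.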

SMOTE Oversampling

With this model, the balanced accuracy score was 66%. The high-risk precision (1%) and F1 score (2%) were similar to those of the oversampling model. Because of the much larger low-risk population, the low-risk precision was 100%, with a recall of 69%.
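Unlike plain oversampling, SMOTE creates new minority points instead of copying existing ones. The sketch below shows only the core interpolation step under simplifying assumptions (Euclidean distance, a tiny made-up dataset). The function `smote_like` and its parameters are hypothetical and are not the library implementation.

```python
import random


def smote_like(minority, n_new, k=2, seed=1):
    """Create n_new synthetic points by interpolating between a minority
    point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority points.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: sq_dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority-class points (made up for illustration).
high_risk = [[1.0, 1.0], [2.0, 1.5], [1.5, 3.0], [3.0, 2.5]]
new_points = smote_like(high_risk, n_new=4)
for p in new_points:
    print([round(v, 2) for v in p])
```

Each synthetic point lies on the line segment between two existing high-risk points, so the new samples stay inside the region the minority class already occupies.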

Undersampling

The balanced accuracy score for the undersampling technique was 56%. The precision, recall and F1 score for high risk were 1%, 65% and 1%, respectively, while the precision and recall for low risk were 100% and 47%.
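The README does not say which undersampling method was used. The sketch below assumes the simplest variant, random undersampling, which discards majority-class rows until every class matches the size of the smallest one. The helper name and toy data are illustrative only.

```python
import random
from collections import Counter


def random_undersample(rows, labels, seed=1):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_rows, out_labels = [], []
    for cls in counts:
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # Sample without replacement so no row is repeated.
        kept = rng.sample(pool, k=target)
        out_rows.extend(kept)
        out_labels.extend([cls] * target)
    return out_rows, out_labels


# Toy data (made up for illustration): 8 low-risk rows, 2 high-risk rows.
X = [[i, i * 2] for i in range(10)]
y = ["low_risk"] * 8 + ["high_risk"] * 2

X_res, y_res = random_undersample(X, y)
print(Counter(y_res))
```

Because most low-risk rows are thrown away, the model sees far less information about that class, which is consistent with the low-risk recall dropping to 47% here.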

Combination (Over and Under) Sampling

With this model, the balanced accuracy score was 64%, and the high-risk precision, recall and F1 score were 1%, 72% and 2%, respectively. The precision and recall for low risk were 100% and 57%.
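SMOTEENN, named in the Purpose section, follows SMOTE oversampling with an Edited Nearest Neighbours (ENN) cleaning pass. The sketch below illustrates only the cleaning idea in a simplified majority-vote form: a point is dropped when most of its k nearest neighbours carry a different label. The function and the toy data are hypothetical, and library implementations may apply a different selection rule.

```python
from collections import Counter


def edited_nearest_neighbours(rows, labels, k=3):
    """Drop every point whose label disagrees with the majority label
    of its k nearest neighbours (simplified majority-vote variant)."""

    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    keep_rows, keep_labels = [], []
    for i, (row, label) in enumerate(zip(rows, labels)):
        # Distances from this point to every other point, nearest first.
        others = [
            (sq_dist(row, r), y)
            for j, (r, y) in enumerate(zip(rows, labels))
            if j != i
        ]
        others.sort(key=lambda pair: pair[0])
        votes = Counter(y for _, y in others[:k])
        if votes.most_common(1)[0][0] == label:
            keep_rows.append(row)
            keep_labels.append(label)
    return keep_rows, keep_labels


# Toy data (made up): two tight clusters plus one high-risk point
# sitting in the middle of the low-risk cluster.
X = [[0, 0], [0, 1], [1, 0], [1, 1],
     [5, 5], [5, 6], [6, 5], [6, 6],
     [0.5, 0.5]]
y = ["low_risk"] * 4 + ["high_risk"] * 4 + ["high_risk"]

X_clean, y_clean = edited_nearest_neighbours(X, y)
print(len(X), "->", len(X_clean), Counter(y_clean))
```

In this toy case only the isolated high-risk point at [0.5, 0.5] is removed, since all three of its nearest neighbours are low risk.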

Balanced Random Forest Classifier

The Balanced Random Forest model had a balanced accuracy score of 68%. The precision, recall and F1 score for high risk were 88%, 37% and 52%, respectively. The precision and recall for low risk were both 100%.

Easy Ensemble AdaBoost Classifier

The balanced accuracy score was 93% and the precision for high risk was 9% with 92% recall. The precision and recall for low risk were 100% and 94%, respectively.

Summary

Below is a summary of the three models with the highest balanced accuracy scores:

  • Easy Ensemble Classifier: 93%
  • Balanced Random Forest: 68%
  • SMOTE: 66%

For the Random Forest model, the precision and recall for low risk were both 100%, but its balanced accuracy score was lower than that of the Easy Ensemble Classifier. The Easy Ensemble model correctly identified 93 of the 101 actual high-risk observations as true positives (92%). Similarly, it correctly identified 94% of the actual low-risk observations as true negatives (16121 out of 17104). Its balanced accuracy score was 93%. Therefore, I would recommend the Easy Ensemble Classifier for predicting credit risk.
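The Easy Ensemble figures can be reproduced from the four confusion-matrix counts implied by the numbers above. The sketch treats high risk as the positive class and derives the false negatives and false positives by subtraction. The variable names are illustrative.

```python
# Counts quoted in the summary for the Easy Ensemble model.
tp = 93            # high-risk loans predicted high risk
fn = 101 - tp      # high-risk loans predicted low risk
tn = 16121         # low-risk loans predicted low risk
fp = 17104 - tn    # low-risk loans predicted high risk

precision_high = tp / (tp + fp)
recall_high = tp / (tp + fn)
recall_low = tn / (tn + fp)
f1_high = 2 * precision_high * recall_high / (precision_high + recall_high)
balanced_acc = (recall_high + recall_low) / 2

print(f"high-risk precision: {precision_high:.0%}")
print(f"high-risk recall:    {recall_high:.0%}")
print(f"low-risk recall:     {recall_low:.0%}")
print(f"high-risk F1:        {f1_high:.0%}")
print(f"balanced accuracy:   {balanced_acc:.0%}")
```

The 9% high-risk precision follows directly from the 983 false positives: even a 6% error rate on the large low-risk class produces roughly ten times more false alarms than true high-risk detections.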
