Credit_Risk_Analysis

Overview

A peer-to-peer lending company wants to use machine learning to predict credit risk, for quicker and more reliable loan experiences. This project uses resampling, employing different techniques from the imbalanced-learn and scikit-learn libraries, to build and evaluate learning models on:

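  • Balanced Accuracy: The average of the recall obtained on each class, so the majority class cannot dominate the score.
  • Precision: Of the loans predicted to belong to a class, the fraction that actually belong to it.
  • Recall/Sensitivity: The ability of a classifier to find all the samples of a class (positive or negative).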

The goal is to determine the best-suited model for accurately predicting and classifying risky credit applications.
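The three metrics above can be computed directly from confusion-matrix counts. The following is a minimal sketch, assuming risky loans are treated as the positive class; the function name and the counts are hypothetical and are not taken from the project notebook.

```python
def classification_metrics(tp, fp, fn, tn):
    """Per-class precision and recall plus balanced accuracy.

    "Positive" means a risky loan, "negative" means a good loan.
    """
    recall_risky = tp / (tp + fn)      # share of risky loans that were caught
    recall_good = tn / (tn + fp)       # share of good loans that were cleared
    precision_risky = tp / (tp + fp)   # share of "risky" predictions that were right
    precision_good = tn / (tn + fn)    # share of "good" predictions that were right
    # Balanced accuracy is the mean of the per-class recalls.
    balanced_accuracy = (recall_risky + recall_good) / 2
    return {
        "precision_risky": precision_risky,
        "precision_good": precision_good,
        "recall_risky": recall_risky,
        "recall_good": recall_good,
        "balanced_accuracy": balanced_accuracy,
    }


# Hypothetical counts, chosen only to illustrate an imbalanced data set.
metrics = classification_metrics(tp=80, fp=900, fn=20, tn=9000)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With so few risky loans, precision for the risky class stays low even when most of them are caught, which is the pattern seen in the results below.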

Data environment

  • Jupyter Notebook
  • Python
  • imbalanced-learn
  • scikit-learn
  • NumPy
  • pathlib

Results

Over Sampling: RandomOverSampler:

Balanced Accuracy: 63%

Precision:

  • Risky Loans = 1%; the model recorded a large number of false positives
  • Good Loans = 100%; true negatives far outnumber false negatives

Recall:

  • Risky Loans = 64%; the model caught roughly two thirds of the risky loans (true positives)
  • Good Loans = 63%; about 37% of good loans were wrongly flagged as risky (false positives)
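Random over-sampling balances the classes by duplicating minority rows at random. The sketch below shows the idea in plain Python; it is a simplified stand-in for illustration, not the imbalanced-learn RandomOverSampler itself, and the function name and toy data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until
    every class has as many rows as the largest one."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_res, y_res = list(X), list(y)
    for label, count in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            X_res.append(rng.choice(rows))  # sample with replacement
            y_res.append(label)
    return X_res, y_res


# Toy data: 8 good loans and 2 risky loans with made-up feature values.
X = [[i, i * 2] for i in range(10)]
y = ["good"] * 8 + ["risky"] * 2

X_res, y_res = random_oversample(X, y)
print(Counter(y_res))
```

Only the training split should be resampled; the test split keeps its original class balance so the reported metrics reflect real-world proportions.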

SMOTE:

Balanced Accuracy: 63%

Precision:

  • Risky Loans = 1%; the model recorded a large number of false positives
  • Good Loans = 100%; true negatives far outnumber false negatives

Recall:

  • Risky Loans = 60%; the model caught a little over half of the risky loans (true positives)
  • Good Loans = 67%; about 33% of good loans were wrongly flagged as risky (false positives)
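Unlike random over-sampling, SMOTE creates new minority samples by interpolating between a minority point and one of its nearest minority neighbours. The following is a simplified sketch of that interpolation step, assuming Euclidean distance; it is not the imbalanced-learn implementation, and the function name, `k`, and the toy points are hypothetical.

```python
import math
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points on segments between minority
    points and their k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority neighbours of the base point, excluding itself.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Made-up minority-class points in a two-feature space.
minority = [[1.0, 1.0], [2.0, 1.5], [1.5, 3.0], [3.0, 2.0]]
new_points = smote_like(minority, n_new=5)
for point in new_points:
    print([round(v, 3) for v in point])
```

Every synthetic point lies between two existing minority points, so the new samples stay inside the region the minority class already occupies.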

Under Sampling: ClusterCentroids:

Balanced Accuracy: 53%

Precision:

  • Risky Loans = 1%; the model recorded a large number of false positives
  • Good Loans = 100%; true negatives far outnumber false negatives

Recall:

  • Risky Loans = 66%; the model caught roughly two thirds of the risky loans (true positives)
  • Good Loans = 40%; about 60% of good loans were wrongly flagged as risky (false positives)

Combinations: SMOTEENN:

Balanced Accuracy: 66%

Precision:

  • Risky Loans = 1%; the model recorded a large number of false positives
  • Good Loans = 100%; true negatives far outnumber false negatives

Recall:

  • Risky Loans = 75%; the model caught three quarters of the risky loans (true positives)
  • Good Loans = 58%; about 42% of good loans were wrongly flagged as risky, fewer false positives than ClusterCentroids but still a large number

BalancedRandomForestClassifier:

Balanced Accuracy: 77%

Precision:

  • Risky Loans = 4%; the model recorded a large number of false positives
  • Good Loans = 100%; true negatives far outnumber false negatives

Recall:

  • Risky Loans = 63%; the model caught a little under two thirds of the risky loans (true positives)
  • Good Loans = 92%; the model correctly cleared most good loans (true negatives), with about 8% flagged as risky

EasyEnsembleClassifier:

Balanced Accuracy: 89%

Precision:

  • Risky Loans = 7%; the model recorded a large number of false positives
  • Good Loans = 100%; true negatives far outnumber false negatives

Recall:

  • Risky Loans = 84%; the model caught most of the risky loans (true positives)
  • Good Loans = 95%; the model correctly cleared most good loans (true negatives), with about 5% flagged as risky
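Because balanced accuracy is the mean of the per-class recalls, the figures reported above can be cross-checked against each other. A short sketch using only the rounded percentages from this section (the dictionary layout is an assumption made for illustration; SMOTEENN is the imbalanced-learn class name for the combination sampler):

```python
# model: (recall for risky loans, recall for good loans, reported balanced accuracy)
reported = {
    "RandomOverSampler": (0.64, 0.63, 0.63),
    "SMOTE": (0.60, 0.67, 0.63),
    "ClusterCentroids": (0.66, 0.40, 0.53),
    "SMOTEENN": (0.75, 0.58, 0.66),
    "BalancedRandomForestClassifier": (0.63, 0.92, 0.77),
    "EasyEnsembleClassifier": (0.84, 0.95, 0.89),
}

for model, (recall_risky, recall_good, balanced_accuracy) in reported.items():
    mean_recall = (recall_risky + recall_good) / 2
    print(f"{model:32s} mean recall = {mean_recall:.3f}, "
          f"reported = {balanced_accuracy:.2f}")

# Rank the models by their reported balanced accuracy.
best_model = max(reported, key=lambda name: reported[name][2])
print("Highest balanced accuracy:", best_model)
```

Since the inputs are rounded to whole percentages, the mean recall and the reported score agree only to within about one percentage point.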

Summary

Credit risk is an inherently unbalanced classification problem, as good loans easily outnumber risky loans. Therefore, different techniques were employed to train and evaluate models with unbalanced classes. Given how much is at stake when a risky loan is approved, Recall/Sensitivity takes priority over Precision when determining which model to deploy.

The ClusterCentroids (53%), RandomOverSampler (63%), SMOTE (63%), and SMOTEENN (66%) models all had low Balanced Accuracy scores, meaning that, averaged over the two classes, they classified only about half to two thirds of the loans correctly. I would not recommend use of these models when determining credit risk. The BalancedRandomForestClassifier has a Balanced Accuracy score of 77%, but its Sensitivity score for risky loans (63%) is discouraging.

I would recommend adopting the EasyEnsembleClassifier. The model has a Balanced Accuracy of 89% in distinguishing high-risk loans from low-risk loans, and its Recall is high for both classes: 84% for risky loans and 95% for good loans. Its Precision for risky loans is still only 7%, so most loans it flags as risky are in fact good.
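The low precision for risky loans follows from the class imbalance rather than from poor recall. The sketch below derives precision from the two recall values and the share of risky loans in the data. The 0.5% prevalence is a hypothetical value chosen for illustration, since the class counts are not stated here; the function name is also an assumption.

```python
def precision_for_positive(prevalence, recall_pos, recall_neg):
    """Precision of the positive class, given its prevalence and the
    recall achieved on each class."""
    true_pos = prevalence * recall_pos               # risky loans flagged as risky
    false_pos = (1 - prevalence) * (1 - recall_neg)  # good loans flagged as risky
    return true_pos / (true_pos + false_pos)


# Recall values reported for the EasyEnsembleClassifier.
recall_risky, recall_good = 0.84, 0.95

# Hypothetical share of risky loans; not taken from the project data.
assumed_prevalence = 0.005

imbalanced = precision_for_positive(assumed_prevalence, recall_risky, recall_good)
balanced = precision_for_positive(0.5, recall_risky, recall_good)
print(f"precision at {assumed_prevalence:.1%} prevalence: {imbalanced:.1%}")
print(f"precision at 50.0% prevalence: {balanced:.1%}")
```

Under this assumption, the same recall values give a precision in the single digits on imbalanced data but a high precision on balanced data, which is why recall and balanced accuracy are the more informative metrics for comparing these models.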
