Credit_Risk_Analysis

Overview

A peer-to-peer lending company wants to use machine learning to predict credit risk, so that loan applications can be assessed faster and more reliably. Because risky loans are far rarer than good loans, this project uses resampling and ensemble techniques from the imbalanced-learn and scikit-learn libraries to build several models and evaluate them on the following metrics:

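Balanced Accuracy: The average of the recall scores of the two classes, i.e. how often the classifier is correct when both classes are weighted equally.
Precision: How reliable the classifier's predictions for a class are: of all loans predicted to be in that class, the fraction that actually are.
Recall/Sensitivity: The ability of the classifier to find all the samples of a class: of all loans that actually belong to that class, the fraction it identifies.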

The goal is to determine which model most accurately predicts and classifies risky credit applications.
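The three metrics can be made concrete with a small sketch. It assumes "risky" is the positive class and computes each metric from the four confusion-matrix counts; the helper names and the counts in the usage block are hypothetical, not results from this project. In practice, scikit-learn's `balanced_accuracy_score` and imbalanced-learn's `classification_report_imbalanced` report these values directly.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Hypothetical container; 'risky' is treated as the positive class."""
    tp: int  # risky loans correctly flagged as risky
    fp: int  # good loans wrongly flagged as risky
    fn: int  # risky loans wrongly predicted as good
    tn: int  # good loans correctly predicted as good


def precision(tp: int, fp: int) -> float:
    """Of everything predicted as the class, the fraction that truly is."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp: int, fn: int) -> float:
    """Of everything truly in the class, the fraction that was found."""
    return tp / (tp + fn) if tp + fn else 0.0


def report(c: ConfusionCounts) -> dict:
    # For the good-loan class the roles swap: tn plays the part of tp.
    risky_recall = recall(c.tp, c.fn)
    good_recall = recall(c.tn, c.fp)
    return {
        "precision_risky": precision(c.tp, c.fp),
        "precision_good": precision(c.tn, c.fn),
        "recall_risky": risky_recall,
        "recall_good": good_recall,
        # Balanced accuracy is the mean of the per-class recalls.
        "balanced_accuracy": (risky_recall + good_recall) / 2,
    }


if __name__ == "__main__":
    # Hypothetical counts for an imbalanced test set (not project results).
    counts = ConfusionCounts(tp=80, fp=900, fn=20, tn=16_100)
    for name, value in report(counts).items():
        print(f"{name}: {value:.3f}")
```

Note how a model can catch most risky loans (high recall) while most of its "risky" flags are wrong (low precision), which is the pattern seen in every model below.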

Data environment

Jupyter Notebook
Python
imbalanced-learn
scikit-learn
NumPy
Pathlib

Results

Over Sampling: RandomOverSampler:

Balanced Accuracy: 63%

Precision:

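Risky Loans = 1%; most loans flagged as risky were actually good, so the model recorded a large number of false positives
Good Loans = 100%; nearly every loan predicted as good was actually good, so very few risky loans slipped through (few false negatives)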

Recall:

Risky Loans = 64%; the model caught 64% of the actual risky loans (true positives) and missed the other 36%
Good Loans = 63%; the model correctly classified 63% of good loans; the other 37% were flagged as risky (false positives)
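Random oversampling duplicates existing minority samples until the classes are balanced. The function below is a hand-written sketch of that idea in plain Python; `random_oversample` is a hypothetical helper, not part of imbalanced-learn, whose own call would look like `RandomOverSampler(random_state=1).fit_resample(X_train, y_train)` (the `random_state` value is an assumption). Resampling is normally applied to the training split only, so the test set keeps its real class imbalance.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=1):
    """Hypothetical helper: duplicate randomly chosen samples of every smaller
    class (drawing with replacement) until each class matches the size of the
    largest class. A sketch of the idea, not imbalanced-learn's code."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        # Indices of the existing samples that belong to this class.
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            pick = rng.choice(idx)
            X_out.append(X[pick])
            y_out.append(label)
    return X_out, y_out


if __name__ == "__main__":
    # Toy data: 8 good loans (0) and 2 risky loans (1), one feature each.
    X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.7], [0.8], [5.0], [6.0]]
    y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
    X_res, y_res = random_oversample(X, y)
    print(Counter(y_res))  # both classes now have 8 samples
```

Because the added risky samples are exact copies, the model sees no new information about risky loans, only more weight on the few it already has.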

SMOTE:

Balanced Accuracy: 63%

Precision:

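Risky Loans = 1%; most loans flagged as risky were actually good, so the model recorded a large number of false positives
Good Loans = 100%; nearly every loan predicted as good was actually good, so very few risky loans slipped through (few false negatives)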

Recall:

Risky Loans = 60%; the model caught 60% of the actual risky loans (true positives) and missed the other 40%
Good Loans = 67%; the model correctly classified 67% of good loans; the other 33% were flagged as risky (false positives)
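SMOTE differs from random oversampling by synthesizing new minority samples instead of copying existing ones: each new point lies on the line segment between a minority sample and one of its nearest minority neighbours. The sketch below illustrates that interpolation in plain Python; `smote` is a hypothetical helper, and `k=5` mirrors the default `k_neighbors` of imbalanced-learn's `SMOTE` class, which implements the same idea.

```python
import math
import random


def smote(minority, n_new, k=5, seed=1):
    """Hypothetical helper: create n_new synthetic minority samples by
    interpolating between a random minority sample and one of its k nearest
    minority neighbours (Euclidean distance). Illustrative sketch only."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than points
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # The k nearest neighbours of `base`, excluding base itself.
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(p, base),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


if __name__ == "__main__":
    # Toy risky-loan feature vectors (hypothetical).
    risky = [[1.0, 1.0], [2.0, 1.5], [1.5, 3.0]]
    for point in smote(risky, n_new=3, k=2):
        print(point)
```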

Under Sampling: ClusterCentroids:

Balanced Accuracy: 53%

Precision:

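Risky Loans = 1%; most loans flagged as risky were actually good, so the model recorded a large number of false positives
Good Loans = 100%; nearly every loan predicted as good was actually good, so very few risky loans slipped through (few false negatives)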

Recall:

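Risky Loans = 66%; the model caught 66% of the actual risky loans (true positives) and missed the other 34%
Good Loans = 40%; the model correctly classified only 40% of good loans; the other 60% were flagged as risky, a large number of false positives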
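ClusterCentroids undersamples the majority class by clustering it and keeping only the cluster centroids. The sketch below assumes plain k-means (Lloyd's algorithm) with one cluster per minority sample, so both classes end up the same size; `kmeans` and `cluster_centroids_undersample` are hypothetical helpers. imbalanced-learn's `ClusterCentroids` uses scikit-learn's `KMeans` by default for the clustering step.

```python
import math
import random


def kmeans(points, k, iters=20, seed=1):
    """Hypothetical helper: plain Lloyd's k-means that returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def cluster_centroids_undersample(X, y, majority_label, seed=1):
    """Hypothetical helper: replace the majority class with k-means centroids,
    where k equals the number of minority samples."""
    majority = [x for x, lab in zip(X, y) if lab == majority_label]
    minority = [(x, lab) for x, lab in zip(X, y) if lab != majority_label]
    centroids = kmeans(majority, k=len(minority), seed=seed)
    X_res = centroids + [x for x, _ in minority]
    y_res = [majority_label] * len(centroids) + [lab for _, lab in minority]
    return X_res, y_res


if __name__ == "__main__":
    # Six good loans in two tight groups, two risky loans (hypothetical).
    X = [[0.0], [0.2], [0.4], [10.0], [10.2], [10.4], [50.0], [51.0]]
    y = [0, 0, 0, 0, 0, 0, 1, 1]
    X_res, y_res = cluster_centroids_undersample(X, y, majority_label=0)
    print(list(zip(X_res, y_res)))
```

The centroids are synthetic points, so the resampled good-loan class no longer contains any of the original good loans.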

Combination Sampling: SMOTEENN:

Balanced Accuracy: 66%

Precision:

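Risky Loans = 1%; most loans flagged as risky were actually good, so the model recorded a large number of false positives
Good Loans = 100%; nearly every loan predicted as good was actually good, so very few risky loans slipped through (few false negatives)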

Recall:

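Risky Loans = 75%; the model caught 75% of the actual risky loans (true positives) and missed the other 25%
Good Loans = 58%; the model correctly classified 58% of good loans; the other 42% were flagged as risky, fewer than with ClusterCentroids but still a large number of false positives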
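SMOTEENN combines SMOTE oversampling with a cleaning pass by Edited Nearest Neighbours (ENN), which removes samples that disagree with their neighbourhood. The sketch below shows only the ENN step, using the majority-vote rule; `edited_nearest_neighbours` is a hypothetical helper. imbalanced-learn's `EditedNearestNeighbours` defaults to `n_neighbors=3` and, with its default `kind_sel="all"`, applies a stricter rule that drops a sample unless all of its neighbours agree with it.

```python
import math
from collections import Counter


def edited_nearest_neighbours(X, y, k=3):
    """Hypothetical helper: drop every sample whose label disagrees with the
    majority label of its k nearest neighbours (Euclidean distance).
    An odd k avoids voting ties in a two-class problem."""
    keep_X, keep_y = [], []
    for i, (xi, yi) in enumerate(zip(X, y)):
        others = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: math.dist(X[j], xi),
        )[:k]
        votes = Counter(y[j] for j in others)
        majority_label, _ = votes.most_common(1)[0]
        if majority_label == yi:
            keep_X.append(xi)
            keep_y.append(yi)
    return keep_X, keep_y


if __name__ == "__main__":
    # A risky loan (1) sitting inside a cluster of good loans (0) is removed.
    X = [[0.0], [0.1], [0.2], [0.3], [0.15], [5.0], [5.1], [5.2]]
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    print(edited_nearest_neighbours(X, y))
```

In SMOTEENN the cleaning runs after SMOTE, so it can also remove synthetic points that SMOTE placed in ambiguous regions.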

BalancedRandomForestClassifier:

Balanced Accuracy: 77%

Precision:

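Risky Loans = 4%; most loans flagged as risky were still good loans, so false positives remain numerous
Good Loans = 100%; nearly every loan predicted as good was actually good, so very few risky loans slipped through (few false negatives)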

Recall:

Risky Loans = 63%; the model caught 63% of the actual risky loans (true positives) and missed the other 37%
Good Loans = 92%; the model correctly classified 92% of good loans; only 8% were flagged as risky (false positives)
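A balanced random forest gives every tree a balanced training sample instead of an ordinary bootstrap. The sketch below follows that idea: for each tree it draws, with replacement, as many samples from each class as the smallest class contains. `balanced_bootstrap` and `forest_bootstraps` are hypothetical helpers; fitting the decision trees themselves is omitted, and imbalanced-learn's `BalancedRandomForestClassifier` handles both steps internally.

```python
import random


def balanced_bootstrap(y, seed=None):
    """Hypothetical helper: sample indices for one tree. From every class it
    draws, with replacement, as many indices as the smallest class has."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    n = min(len(idx) for idx in by_class.values())  # size of smallest class
    sample = []
    for idx in by_class.values():
        sample.extend(rng.choices(idx, k=n))
    return sample


def forest_bootstraps(y, n_trees=5, seed=1):
    """Hypothetical helper: one balanced bootstrap per tree; each would be
    used to train one decision tree (training not shown)."""
    rng = random.Random(seed)
    return [balanced_bootstrap(y, seed=rng.randrange(2**32)) for _ in range(n_trees)]


if __name__ == "__main__":
    y = [0] * 90 + [1] * 10  # 90 good loans, 10 risky loans (hypothetical)
    for sample in forest_bootstraps(y, n_trees=3):
        print(sum(y[i] for i in sample), "risky of", len(sample))
```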

EasyEnsembleClassifier:

Balanced Accuracy: 89%

Precision:

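Risky Loans = 7%; most loans flagged as risky were still good loans, so false positives remain numerous
Good Loans = 100%; nearly every loan predicted as good was actually good, so very few risky loans slipped through (few false negatives)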

Recall:

Risky Loans = 84%; the model caught 84% of the actual risky loans (true positives) and missed only 16%
Good Loans = 95%; the model correctly classified 95% of good loans; only 5% were flagged as risky (false positives)
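EasyEnsemble trains several boosted learners, each on a different balanced subset: all of the risky loans plus an equally sized random sample of good loans. The sketch below builds those subsets and averages the learners' risk scores; `easy_ensemble_subsets` and `ensemble_risk` are hypothetical helpers, the AdaBoost learners are not implemented, and the minority class is assumed to be the smaller one. imbalanced-learn's `EasyEnsembleClassifier` wraps the whole procedure, and its sampling details may differ from this sketch.

```python
import random


def easy_ensemble_subsets(y, n_subsets=10, minority_label=1, seed=1):
    """Hypothetical helper: every subset keeps all minority samples plus an
    equally sized random sample (without replacement) of majority samples.
    Each subset would then train its own AdaBoost learner (not shown)."""
    rng = random.Random(seed)
    minority = [i for i, lab in enumerate(y) if lab == minority_label]
    majority = [i for i, lab in enumerate(y) if lab != minority_label]
    n = len(minority)  # assumes the minority class is the smaller one
    return [minority + rng.sample(majority, n) for _ in range(n_subsets)]


def ensemble_risk(scores):
    """Hypothetical helper: average the per-learner probabilities that a loan
    is risky into a single ensemble score."""
    return sum(scores) / len(scores)


if __name__ == "__main__":
    y = [0] * 50 + [1] * 5  # 50 good loans, 5 risky loans (hypothetical)
    subsets = easy_ensemble_subsets(y, n_subsets=3)
    for s in subsets:
        print(len(s), "samples,", sum(y[i] for i in s), "risky")
    print(ensemble_risk([0.25, 0.75]))
```

Because every subset draws a fresh sample of good loans, the ensemble as a whole still sees most of the majority class, unlike a single undersampled model.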

Summary

Credit risk is an inherently imbalanced classification problem, because good loans far outnumber risky loans. Different techniques were therefore used to train and evaluate models on imbalanced classes. Given how costly an undetected risky loan can be, Recall/Sensitivity was prioritized over Precision when deciding which model to deploy.

The ClusterCentroids (53%), RandomOverSampler (63%), SMOTE (63%), and SMOTEENN (66%) models all had low Balanced Accuracy Scores, so I would not recommend using them to determine credit risk. The BalancedRandomForestClassifier has a Balanced Accuracy Score of 77%, but its Recall for risky loans (63%) is discouraging: more than a third of risky loans would go undetected.

I would recommend adopting the EasyEnsembleClassifier. It has the highest Balanced Accuracy Score (89%) at distinguishing high-risk loans from low-risk loans, and its Recall is high for both classes: it finds 84% of risky loans and 95% of good loans. Its Precision for risky loans is still low (7%), so most of the loans it flags as risky are in fact good loans.
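Because Balanced Accuracy is the mean of the two per-class recalls, the scores in the Results section can be cross-checked from the reported Recall values, and the models can be ranked the way the summary does. The sketch below is a hypothetical helper that does this with the document's own numbers; averaging the rounded recalls lands within one percentage point of each reported score (the reported scores were presumably computed from unrounded recalls). The tie-break on risky-loan recall reflects the stated priority of Recall/Sensitivity.

```python
# Recall scores from the Results section, as (risky, good) percentages.
RECALLS = {
    "RandomOverSampler": (64, 63),
    "SMOTE": (60, 67),
    "ClusterCentroids": (66, 40),
    "SMOTEENN": (75, 58),
    "BalancedRandomForestClassifier": (63, 92),
    "EasyEnsembleClassifier": (84, 95),
}

# Balanced Accuracy Scores from the Results section.
REPORTED_BALANCED_ACCURACY = {
    "RandomOverSampler": 63,
    "SMOTE": 63,
    "ClusterCentroids": 53,
    "SMOTEENN": 66,
    "BalancedRandomForestClassifier": 77,
    "EasyEnsembleClassifier": 89,
}


def balanced_accuracy(risky_recall, good_recall):
    """Mean of the per-class recalls."""
    return (risky_recall + good_recall) / 2


def rank_models(recalls):
    """Hypothetical helper: sort models by balanced accuracy, breaking ties
    on risky-loan recall (the metric the summary prioritizes)."""
    return sorted(
        recalls,
        key=lambda name: (balanced_accuracy(*recalls[name]), recalls[name][0]),
        reverse=True,
    )


if __name__ == "__main__":
    for name in rank_models(RECALLS):
        risky, good = RECALLS[name]
        print(f"{name:32} mean_recall={balanced_accuracy(risky, good):5.1f}  "
              f"reported={REPORTED_BALANCED_ACCURACY[name]}  risky_recall={risky}")
```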

tracari / credit_risk_analysis