Credit_Risk_Analysis

Overview of the project

Credit risk is an inherently unbalanced classification problem, because good loans far outnumber risky loans. This project applies six machine learning models to credit card risk data, evaluates the performance of each model, and recommends whether any of them should be used to predict credit risk. The project consists of three tasks:

  • Use resampling models to predict credit risk
    • Oversample the data: RandomOverSampler and SMOTE algorithms
    • Undersample the data: ClusterCentroids algorithm
  • Use a combined over- and undersampling approach, the SMOTEENN algorithm, to predict credit risk
  • Use ensemble classifiers to predict credit risk
    • BalancedRandomForestClassifier algorithm
    • EasyEnsembleClassifier algorithm
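
The resampling idea behind the first task can be illustrated with a minimal plain-Python sketch of random oversampling: minority-class rows are duplicated at random until every class is as large as the biggest one. The class names listed above match those in the imbalanced-learn library, but this sketch is not that library's implementation; the function name `random_oversample` and the toy data are assumptions made for the example.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=1):
    """Duplicate randomly chosen rows of smaller classes until all classes
    have as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # All original rows that belong to this class.
        pool = [r for r, l in zip(rows, labels) if l == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))  # sample with replacement
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical toy data: 95 low-risk loans and 5 high-risk loans.
rows = [[i] for i in range(100)]
labels = ["low_risk"] * 95 + ["high_risk"] * 5

res_rows, res_labels = random_oversample(rows, labels)
print(Counter(res_labels))
```

Only the training split should be resampled; the test split keeps its original imbalance so that the reported metrics reflect real-world class proportions.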

Results

I. RandomOverSampler

  • The balanced accuracy score is 65%.
  • For high-risk loans, the precision is 1% and the recall is 63%: only 1% of the loans predicted as high risk are actually high risk, while the model identifies 63% of all high-risk loans. The F1 score is also very low (0.02).
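
The F1 score is the harmonic mean of precision and recall, which is why it stays close to the smaller of the two. A small sketch, using the rounded figures reported above (the function name `f1_score` here is a local helper, not a library call):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded high-risk figures reported for RandomOverSampler.
f1 = f1_score(0.01, 0.63)
print(round(f1, 2))  # 0.02
```

Even with a recall of 63%, a precision of 1% pulls the F1 score down to about 0.02.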

II. SMOTE Oversampling

  • The balanced accuracy score is 61.8%.
  • For high-risk loans, the precision is 1% and the recall is 59%: only 1% of the loans predicted as high risk are actually high risk, while the model identifies 59% of all high-risk loans. The F1 score is also very low (0.02).
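
Unlike random oversampling, SMOTE creates new minority samples by interpolating between an existing minority sample and one of its nearest minority neighbors. The sketch below shows that interpolation step only, in plain Python; it is a simplified illustration rather than the library implementation, and the function name `smote_like_sample`, the choice of `k`, and the toy points are assumptions.

```python
import random


def smote_like_sample(minority, k=2, seed=1):
    """Create one synthetic point between a random minority point and one
    of its k nearest minority neighbors (needs at least two points)."""
    rng = random.Random(seed)
    base = rng.choice(minority)
    # Rank the other minority points by squared Euclidean distance to base.
    others = [p for p in minority if p is not base]
    others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
    neighbor = rng.choice(others[:k])
    gap = rng.random()  # position along the segment base -> neighbor
    return [a + gap * (b - a) for a, b in zip(base, neighbor)]


# Hypothetical high-risk samples with two numeric features.
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.5], [1.2, 2.2]]
synthetic = smote_like_sample(minority)
print(synthetic)
```

Because each synthetic point lies on a line segment between two real minority points, it always falls inside the region already spanned by the minority class.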

III. ClusterCentroids Undersampling

  • The balanced accuracy score is 51%.
  • For high-risk loans, the precision is 1% and the recall is 59%: only 1% of the loans predicted as high risk are actually high risk, while the model identifies 59% of all high-risk loans. The F1 score is also very low (0.01).
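
Balanced accuracy is the mean of the per-class recall values, so for two classes a score of 51% is close to the 50% that random guessing would achieve. As a rough check, the low-risk recall implied by the two reported figures can be backed out as follows. The reported numbers are rounded, so the result is approximate, and both helper names are assumptions for this example.

```python
def balanced_accuracy(recall_per_class):
    """Mean of the recall obtained on each class."""
    return sum(recall_per_class) / len(recall_per_class)


def implied_other_recall(bal_acc, known_recall):
    """For two classes, solve bal_acc = (known + other) / 2 for `other`."""
    return 2 * bal_acc - known_recall


# Reported for ClusterCentroids: balanced accuracy 51%, high-risk recall 59%.
low_risk_recall = implied_other_recall(0.51, 0.59)
print(round(low_risk_recall, 2))  # 0.43
```

Under these rounded figures, the model would be recovering well under half of the low-risk loans, which is consistent with the very low precision on the high-risk class.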

IV. SMOTEENN Combination Sampling

  • The balanced accuracy score is 63.8%.
  • For high-risk loans, the precision is 1% and the recall is 70%: only 1% of the loans predicted as high risk are actually high risk, while the model identifies 70% of all high-risk loans. The F1 score is also very low (0.02).

V. Balanced Random Forest Classifier

  • The balanced accuracy score is 78.8%.
  • For high-risk loans, the precision is 4% and the recall is 67%: 4% of the loans predicted as high risk are actually high risk, while the model identifies 67% of all high-risk loans. The F1 score improves slightly to 0.07.

VI. Easy Ensemble Classifier

  • The balanced accuracy score is 92.5%.
  • For high-risk loans, the precision is 7% and the recall is 91%: 7% of the loans predicted as high risk are actually high risk, while the model identifies 91% of all high-risk loans. The F1 score improves further to 0.14.
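
A precision figure can be read as a ratio of false alarms to correct flags, which makes the practical cost of a low value easier to see. The sketch below applies that arithmetic to the rounded precision values reported in this document; the helper name `false_alarms_per_hit` is an assumption for the example.

```python
def false_alarms_per_hit(precision):
    """False positives per true positive: (1 - precision) / precision."""
    return (1 - precision) / precision


# Precision of 7% (Easy Ensemble) versus 1% (the resampling models).
easy_ensemble = false_alarms_per_hit(0.07)
resampling = false_alarms_per_hit(0.01)
print(round(easy_ensemble, 1))  # 13.3
print(round(resampling, 1))     # 99.0
```

So even the best-scoring model flags roughly 13 low-risk loans for every high-risk loan it catches, compared with roughly 99 for the models with 1% precision.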

Summary

In summary, the first four models are poor at predicting high-risk credit because (1) their balanced accuracy scores are all below 70%; (2) their precision scores for high-risk credit are very low (about 1%), indicating a large number of false high-risk predictions; and (3) their recall scores for high-risk credit are modest (70% or below), indicating that many high-risk credits are misclassified as low risk.

However, the two ensemble classifiers perform better overall than the previous four models, although the Balanced Random Forest's recall (67%) is slightly below SMOTEENN's (70%). In particular, the Easy Ensemble Classifier performs best, with the highest balanced accuracy (92.5%), precision (7%), recall (91%), and F1 score (0.14).

Therefore, among the six machine learning models, the Easy Ensemble Classifier is recommended for predicting high-risk credit.
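
The comparison behind this recommendation can be reproduced with a short script that ranks the six models by balanced accuracy. The numbers are the rounded figures from the Results section; the dictionary layout and variable names are assumptions for the example.

```python
# (balanced accuracy, precision, recall, F1) for the high-risk class,
# copied from the Results section.
reported = {
    "RandomOverSampler": (0.65, 0.01, 0.63, 0.02),
    "SMOTE": (0.618, 0.01, 0.59, 0.02),
    "ClusterCentroids": (0.51, 0.01, 0.59, 0.01),
    "SMOTEENN": (0.638, 0.01, 0.70, 0.02),
    "BalancedRandomForestClassifier": (0.788, 0.04, 0.67, 0.07),
    "EasyEnsembleClassifier": (0.925, 0.07, 0.91, 0.14),
}

# Sort from highest to lowest balanced accuracy.
ranked = sorted(reported.items(), key=lambda kv: kv[1][0], reverse=True)

for name, (bal_acc, precision, recall, f1) in ranked:
    print(f"{name:32s} {bal_acc:6.1%} {precision:5.0%} {recall:5.0%} {f1:5.2f}")

best = ranked[0][0]
print("Best by balanced accuracy:", best)
```

Ranking by any of the other three columns puts the same model first, since the Easy Ensemble Classifier has the highest value on all four metrics.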
