

Predicting Loan Default for Czech Bank

Lending plays an important role in everyday life. However, loan defaults cannot be avoided entirely. They carry significant risk and, at scale, can even contribute to a financial crisis. It is therefore particularly important for a bank to determine whether an applicant is eligible for a loan. In the past, this evaluation relied primarily on manual review, which was time-consuming and labor-intensive. More recently, banks have turned to machine learning to predict loan defaults automatically from applicant and account features, because it can improve both the accuracy and the efficiency of the assessment.

Dataset description

The dataset used for this analysis is the “1999 Czech Financial Dataset - Real Anonymized Transactions”, obtained from data.world. It contains real anonymized Czech bank transactions, account information, and loan records released for the PKDD’99 Discovery Challenge.

The relationships between the 8 tables are shown in the following diagram:

[Figure: relationship diagram of the 8 tables]
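The tables are linked through shared key columns. The project itself does this in R. The following is only a minimal Python sketch of the idea. It assumes that `account_id` is the key shared by the loan and account tables, and that both tables carry a clashing `date` column. The helpers `rename` and `inner_join` and all toy rows are hypothetical. The sketch shows why a clashing column is renamed before joining: otherwise one table's value would silently overwrite the other's.

```python
from collections import defaultdict


def rename(rows, mapping):
    """Return copies of the rows with columns renamed according to mapping."""
    return [{mapping.get(k, k): v for k, v in row.items()} for row in rows]


def inner_join(left, right, key):
    """Inner-join two lists of dict rows on the shared column `key`."""
    # Index the right-hand table by key so each lookup is cheap.
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    joined = []
    for lrow in left:
        for rrow in index.get(lrow[key], []):
            # Combine the rows; on a name clash the right table's value wins,
            # which is why clashing columns are renamed first.
            joined.append({**lrow, **rrow})
    return joined


# Hypothetical toy rows (the real tables are far larger).
loans = [
    {"account_id": 1, "loan_id": 10, "date": "940105", "status": "A"},
    {"account_id": 2, "loan_id": 11, "date": "940106", "status": "B"},
]
accounts = [
    {"account_id": 1, "date": "930101", "frequency": "monthly"},
    {"account_id": 3, "date": "930102", "frequency": "weekly"},
]
accounts = rename(accounts, {"date": "account_date"})
merged = inner_join(loans, accounts, "account_id")
print(merged)
```

Only account 1 appears in both tables, so the join yields a single row that keeps both the loan date and the renamed account date.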

Data Preprocessing

First, the datasets are merged on their common columns. A few columns were renamed beforehand so that the join() operation would work. Columns with more than 50% missing values were then removed. The target variable column was converted to binary, and a few categorical columns were converted to numeric format. All column values were standardized using scale(), because some columns contained only single-digit values while others had values of more than five digits.

The most important step is handling class imbalance. The dataset contains 275,989 non-defaulters and only 26,262 defaulters. Under-sampling randomly removes instances from the majority class to create a more balanced class distribution. This keeps the machine learning models from being biased towards predicting the majority class, so they can better identify patterns associated with the minority class. The ovun.sample function from the ‘ROSE’ package was used for this.
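The preprocessing steps above can be sketched in plain Python. The project itself uses R's `scale()` and `ROSE::ovun.sample`, so this is not the author's code. Several choices here are assumptions:

* Missing values are taken to be `None`.
* The under-sampling targets an exact 1:1 class ratio.
* The function names and toy rows are hypothetical.

The standardization divides by the sample standard deviation (n - 1 denominator), which matches what `scale()` does by default.

```python
import random
import statistics


def drop_sparse_columns(rows, max_missing=0.5):
    """Drop columns whose fraction of missing (None) values exceeds max_missing."""
    columns = list(rows[0].keys())
    keep = [
        c for c in columns
        if sum(r[c] is None for r in rows) / len(rows) <= max_missing
    ]
    return [{c: r[c] for c in keep} for r in rows]


def standardize(values):
    """Center to mean 0 and divide by the sample standard deviation (n - 1),
    mirroring the default behaviour of R's scale()."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mu) / sd for v in values]


def undersample(rows, label_key, seed=42):
    """Randomly drop majority-class rows until both classes are the same size."""
    rng = random.Random(seed)
    pos = [r for r in rows if r[label_key] == 1]
    neg = [r for r in rows if r[label_key] == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    balanced = minority + rng.sample(majority, len(minority))
    rng.shuffle(balanced)
    return balanced


# Hypothetical toy data: 10 rows, 2 defaulters, and a mostly empty "notes" column.
rows = [
    {"amount": 100 * i, "notes": None if i % 3 else "x", "default": int(i < 2)}
    for i in range(10)
]
cleaned = drop_sparse_columns(rows)        # "notes" is 60% missing, so it is dropped
amounts = standardize([r["amount"] for r in cleaned])
for r, z in zip(cleaned, amounts):
    r["amount"] = z                        # replace raw amounts with z-scores
balanced = undersample(cleaned, "default")  # 2 defaulters + 2 sampled non-defaulters
print(sorted(cleaned[0]), len(balanced))
```

`ovun.sample` also accepts a target sample size (`N`) or minority proportion (`p`). The strict 1:1 ratio used here is just one possible setting.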

Data Splitting

In this project, the data was split 80/20: 80% for training and 20% for testing.
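A random 80/20 split can be sketched as a seeded shuffle followed by a cut at 80% of the rows. The README does not say how the split was drawn. This sketch assumes a simple unstratified random split, and the function name `train_test_split` and the seed are hypothetical.

```python
import random


def train_test_split(rows, train_frac=0.8, seed=123):
    """Shuffle a copy of the rows with a fixed seed and cut it into train/test parts."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(round(train_frac * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


train, test = train_test_split(list(range(100)))
print(len(train), len(test))  # 80 20
```

Fixing the seed makes the split reproducible, so the different models can be compared on the same test set.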

Model Development and Evaluation

For comparison, I considered three cases for model development and evaluation: the full model, a reduced model based on the correlation matrix, and a LASSO-reduced model.
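The README does not state which correlation criterion was used for the reduced model. One common approach is to drop a feature when it is highly correlated with a feature that has already been kept. The sketch below assumes that approach, uses an illustrative |r| > 0.9 threshold, and introduces the hypothetical helpers `pearson` and `drop_correlated`.

```python
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def drop_correlated(columns, threshold=0.9):
    """Greedily keep a feature only if |r| with every already-kept feature
    is at most `threshold`; features are visited in dict order."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy features: "payments" is almost proportional to "amount".
features = {
    "amount": [1, 2, 3, 4, 5],
    "payments": [2, 4, 6, 8, 10.5],
    "duration": [5, 3, 4, 1, 2],
}
print(drop_correlated(features))  # ['amount', 'duration']
```

Here "payments" correlates with "amount" at about 0.999 and is dropped. "duration" correlates with "amount" at -0.8, so it stays.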

Contributors

brunda09
