bias-in-credit-models

Machine learning is being deployed for large-scale decision making that can strongly impact individuals' lives. If we do not consider and analyse such scenarios, we may end up building models that fail to treat groups in society equally, and that may even infringe anti-discrimination laws.

There are several algorithmic interventions to identify unfair treatment based on what is considered to be fair. This project focuses on showing how these interventions can be applied in a case study using a classification-based credit model.

Case Study Outline

I made use of a public loan book from Bondora, a P2P lending platform based in Estonia. I looked into two different protected groups: gender and age.

Bondora lends to less credit-worthy customers and therefore sees much higher default rates than traditional banks. To compensate, the interest it collects is significantly higher. On average, the loans in this dataset were for around €2,100, with a payment duration of 38 months and an interest rate of 26.30%.

For traditional banks, the cost of a false positive (misclassifying a defaulting loan) is many times greater than the reward of a true positive (correctly classifying a non-defaulting loan). Given the higher interest rates Bondora collects compared to banks, I will assume for illustration purposes a much smaller reward-to-cost ratio of 1 to 2. This ratio is used to find the thresholds that maximise profit while meeting the requirements of each algorithmic intervention.
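As a minimal sketch of how such a threshold search could work (this is my illustration, not code from this repository; the `score` and `defaulted` column names are assumptions), the expected profit for a candidate threshold under the assumed 1:2 reward-to-cost ratio can be computed as:

```python
import numpy as np
import pandas as pd

def expected_profit(df: pd.DataFrame, threshold: float,
                    reward: float = 1.0, cost: float = 2.0) -> float:
    """Profit from approving loans whose predicted repayment score clears
    `threshold`, under the assumed 1:2 reward-to-cost ratio.
    Assumes hypothetical columns: `score` = P(loan is repaid) and
    `defaulted` = 1 if the loan actually defaulted, 0 otherwise."""
    approved = df[df["score"] >= threshold]
    true_positives = (approved["defaulted"] == 0).sum()   # repaid as predicted
    false_positives = (approved["defaulted"] == 1).sum()  # approved but defaulted
    return reward * true_positives - cost * false_positives

# Profit-maximising threshold over a grid of candidates:
# best = max(np.linspace(0, 1, 101), key=lambda t: expected_profit(df, t))
```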

I then developed a classification model that predicts whether a loan is likely to be paid back, using gradient boosted decision trees. With the model's predictions, I analysed the following scenarios (a per-group threshold-search sketch follows the list):

  • Maximise profit uses different classification thresholds for each group and only aims at maximising profit.
  • Fairness through unawareness uses the same classification threshold for all groups while maximising profit.
  • Demographic parity applies different classification thresholds for each group, while keeping the same fraction of positives in each group.
  • Equal opportunity uses different classification thresholds for each group, while keeping the same true positive rate in each group.
  • Equalised odds applies different classification thresholds for each group, while keeping the same true positive rate and false positive rate in each group.
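The sketch below is my illustration of how per-group thresholds could be searched under one of these constraints, equal opportunity; it is not code from unfairness_measures.py, and it reuses the hypothetical expected_profit helper and column names from the earlier sketch:

```python
import numpy as np
import pandas as pd

def true_positive_rate(group: pd.DataFrame, threshold: float) -> float:
    """Share of genuinely good loans (defaulted == 0) that get approved."""
    good = group[group["defaulted"] == 0]
    return (good["score"] >= threshold).mean()

def equal_opportunity_thresholds(df: pd.DataFrame, attr: str,
                                 tolerance: float = 0.01):
    """Brute-force search: for each target TPR, find per-group thresholds
    that achieve it (within `tolerance`) and keep the combination with
    the highest total profit. Uses expected_profit from the sketch above."""
    candidates = np.linspace(0, 1, 101)
    best, best_profit = None, -np.inf
    for target in np.linspace(0.05, 0.95, 19):
        chosen, profit = {}, 0.0
        for name, group in df.groupby(attr):
            feasible = [t for t in candidates
                        if abs(true_positive_rate(group, t) - target) < tolerance]
            if not feasible:          # this group cannot hit the target TPR
                chosen = None
                break
            t_best = max(feasible, key=lambda t: expected_profit(group, t))
            chosen[name] = t_best
            profit += expected_profit(group, t_best)
        if chosen is not None and profit > best_profit:
            best, best_profit = chosen, profit
    return best, best_profit
```

The same search structure extends to the other interventions by swapping the constraint: matching the fraction of positives for demographic parity, or matching both the true positive and false positive rates for equalised odds.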

Project Structure

I) Data Cleaning

pre_process.py restructures the data, setting it in the right format and renaming fields as needed for visualisation. fill_missing_values.py restructures the data and fills in missing values that are used later in the modelling phase.
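As an illustration of this kind of filling (the strategies below are assumptions, not necessarily the ones fill_missing_values.py uses):

```python
import pandas as pd

def fill_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical example: fill numeric columns with their median and
    categorical columns with an explicit 'Unknown' label before modelling."""
    df = df.copy()
    for col in df.select_dtypes(include="number").columns:
        df[col] = df[col].fillna(df[col].median())
    for col in df.select_dtypes(include="object").columns:
        df[col] = df[col].fillna("Unknown")
    return df
```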

II) Data Exploration

Both notebooks take the processed and restructured data and plot the distributions, correlations and missing data.
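A minimal sketch of this kind of exploration, assuming the processed loan book is a pandas DataFrame `df` (the 'Interest' column name is an assumption):

```python
import matplotlib.pyplot as plt

# Distribution of a numeric feature ('Interest' is a hypothetical column name).
df["Interest"].hist(bins=50)
plt.xlabel("Interest rate (%)")
plt.ylabel("Number of loans")
plt.show()

# Pairwise correlations between numeric features.
print(df.select_dtypes(include="number").corr())

# Share of missing values per column, largest first.
print(df.isna().mean().sort_values(ascending=False))
```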

III) Credit Model

Performs a grid search to find the best gradient boosted decision tree model. After finding the best model, it saves the predictions together with the original data as CSV.
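A hedged sketch of such a grid search using scikit-learn's GradientBoostingClassifier; the parameter grid is illustrative, and `X`, `y` stand for the features and default labels from the cleaned loan book rather than names taken from the repository:

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Illustrative search space; the repository's actual grid may differ.
param_grid = {
    "n_estimators": [100, 300],
    "max_depth": [2, 3, 4],
    "learning_rate": [0.05, 0.1],
}
search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid,
    scoring="roc_auc",
    cv=5,
)
search.fit(X, y)  # X, y: assumed feature matrix and default labels
print(search.best_params_, search.best_score_)
```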

IV) Model Analysis and Unfairness Detection

  • model_performance.ipynb: Reviews the performance of the model using ROC curves and AUC for 'Gender' and 'Age Group' (see the AUC sketch after this list).
  • unfairness_measures.py: Finds the best thresholds for each protected class by maximising profit while meeting each algorithmic intervention's requirements, then saves all results as CSV.
  • model_fairness_interventions.ipynb: Reviews the results from unfairness_measures.py.
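For instance, a per-group AUC check could look like the sketch below (my illustration; the `score` and `defaulted` columns are the hypothetical names used in the earlier sketches):

```python
from sklearn.metrics import roc_auc_score

# AUC of the repayment score within each protected group,
# e.g. grouping by 'Gender' or 'Age Group'.
for name, group in df.groupby("Gender"):
    auc = roc_auc_score(group["defaulted"] == 0, group["score"])
    print(f"{name}: AUC = {auc:.3f}")
```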

More Information

For more information on each algorithmic intervention and the interpretation of the case study results, see: https://medium.com/@ValeriaCortezVD/preventing-discriminatory-outcomes-in-credit-models-39e1c6540353

References

Data

Bondora’s loan book. Available at: https://www.bondora.com/en/public-reports [Accessed August 18, 2018].

Main Literature

Barocas, S., Hardt, M. & Narayanan, A., 2018. Fairness and machine learning. Available at: http://fairmlbook.org/ [Accessed August 29, 2018].

Dwork, C. et al., 2012. Fairness Through Awareness. In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference. ITCS ’12. New York, NY, USA: ACM, pp. 214–226.

Hardt, M. et al., 2016. Equality of opportunity in supervised learning. In Advances in neural information processing systems. pp. 3315–3323.

Pedreshi, D., Ruggieri, S. & Turini, F., 2008. Discrimination-aware Data Mining. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD ’08. New York, NY, USA: ACM, pp. 560–568.
