Recidivism Case Study

This case study is based on two articles published in 2016, one by ProPublica and one in the Washington Post.

Both articles are about COMPAS, a statistical tool used in the justice system to assign defendants a "risk score" that is intended to reflect the risk that they will commit another crime if released.

The ProPublica article evaluates COMPAS as a binary classifier and compares its error rates for black and white defendants. It concludes that COMPAS is unfair to black defendants because they are more likely to be misclassified as high risk.

In response, the Washington Post article shows that COMPAS has the same predictive value for black and white defendants. The authors also explain that the test cannot have the same predictive value and the same error rates for both groups at the same time.

The purpose of this case study is to understand these conflicting claims, to learn about classification algorithms and the metrics we use to evaluate them, and to think about fairness and the ethics of data science.

The notebooks

  • In the first notebook I replicate the analysis from the ProPublica article and define the basic metrics we use to evaluate classification algorithms, including error rates and predictive values.

  • In the second notebook I replicate the analysis from the Washington Post article and define the calibration curve, the ROC curve, and a related metric, concordance.

  • In the third notebook I use the same methods to evaluate the performance of COMPAS for male and female defendants, and lay out the fundamental conflict between two definitions of fairness.
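As a rough sketch of the metrics named above (not the code from the notebooks), the following shows one way to compute error rates and predictive values from the four cells of a confusion matrix, and concordance from two lists of risk scores. The function names, the counts, and the scores are hypothetical and chosen only for illustration; they are not COMPAS data. Concordance here is taken to be the probability that a randomly chosen recidivist has a higher score than a randomly chosen non-recidivist, with ties counted as one half.

```python
from itertools import product


def classification_metrics(tp, fp, fn, tn):
    """Compute basic metrics from confusion-matrix counts.

    tp: predicted high risk, did recidivate
    fp: predicted high risk, did not recidivate
    fn: predicted low risk, did recidivate
    tn: predicted low risk, did not recidivate
    """
    total = tp + fp + fn + tn
    return {
        "prevalence": (tp + fn) / total,  # fraction who recidivated
        "fpr": fp / (fp + tn),            # false positive rate
        "fnr": fn / (fn + tp),            # false negative rate
        "ppv": tp / (tp + fp),            # positive predictive value
        "npv": tn / (tn + fn),            # negative predictive value
    }


def concordance(scores_pos, scores_neg):
    """Fraction of (positive, negative) pairs in which the positive case
    has the higher score; tied pairs count as one half."""
    wins = 0.0
    for s_pos, s_neg in product(scores_pos, scores_neg):
        if s_pos > s_neg:
            wins += 1.0
        elif s_pos == s_neg:
            wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical counts and scores, for illustration only.
metrics = classification_metrics(tp=30, fp=20, fn=10, tn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")

print("concordance:", concordance([8, 6, 5], [5, 3]))
```

The error rates condition on what actually happened (the columns of the confusion matrix), while the predictive values condition on what was predicted (the rows), which is why the two kinds of metric can disagree about fairness.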

These three notebooks are intended to support a module in a data science class that engages students in the context and ethical challenges of machine learning.

Slides

I used these notebooks for a module of my Data Science class at Olin College.

Over the course of three class sessions, I presented these slides and led a discussion with students. This happened in Spring 2020 when classes were run remotely, so the discussions were not as effective as they could have been. For next time I hope to develop a richer set of discussion questions.

Additional notebooks

This repository contains three additional notebooks with further explorations that you might find interesting. They are not essential for understanding the issues, and they are less complete than the first three notebooks.

  • The fourth notebook proves what I asserted in the second notebook: if you are given prevalence and error rates, you can compute predictive values; and if you are given prevalence and predictive values, you can compute error rates.

  • The fifth notebook demonstrates that the challenge of defining fairness between groups gets harder as we consider more groups, and identifies the groups with the highest and lowest errors and predictive values.

  • The sixth notebook explores what I call "the other calibration curve", the probability of being classified high risk as a function of the probability of recidivism.
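The relationship between prevalence, error rates, and predictive values described for the fourth notebook can be sketched as follows. This is a minimal illustration, not the notebook's own derivation; the function names and the numbers are hypothetical. It works with fractions of the population rather than counts, and the inverse conversion assumes that PPV + NPV is not equal to 1.

```python
def predictive_values(prevalence, fpr, fnr):
    """Given prevalence and error rates, return (ppv, npv)."""
    tp = prevalence * (1 - fnr)        # recidivists flagged high risk
    fn = prevalence * fnr              # recidivists flagged low risk
    fp = (1 - prevalence) * fpr        # non-recidivists flagged high risk
    tn = (1 - prevalence) * (1 - fpr)  # non-recidivists flagged low risk
    return tp / (tp + fp), tn / (tn + fn)


def error_rates(prevalence, ppv, npv):
    """Given prevalence and predictive values, return (fpr, fnr).

    Assumes ppv + npv != 1, otherwise the fraction classified
    high risk cannot be determined.
    """
    # q is the fraction classified high risk; it solves
    # q * ppv + (1 - q) * (1 - npv) = prevalence
    q = (prevalence - (1 - npv)) / (ppv + npv - 1)
    fp = q * (1 - ppv)
    fn = (1 - q) * (1 - npv)
    return fp / (1 - prevalence), fn / prevalence


# Two hypothetical groups with the same error rates but different prevalence.
ppv_a, npv_a = predictive_values(prevalence=0.4, fpr=1 / 3, fnr=0.25)
ppv_b, npv_b = predictive_values(prevalence=0.5, fpr=1 / 3, fnr=0.25)
print(f"group A: ppv={ppv_a:.3f}, npv={npv_a:.3f}")
print(f"group B: ppv={ppv_b:.3f}, npv={npv_b:.3f}")

# Round trip: recover the error rates of group A from its predictive values.
print(error_rates(0.4, ppv_a, npv_a))
```

In this example the two groups share the same error rates, but because their prevalence differs, their predictive values differ as well. That is the conflict between the two definitions of fairness in miniature.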

I include these notebooks in part to resist the temptation to hide my development process. I worked on this case study on and off over several years. I explored a lot of things and took a lot of wrong turns. It took me a long time to find the story, get it organized, and strike a balance between two conflicting goals: maintaining the scientific detachment that lets us tackle difficult topics, and keeping sight of the context, the people, and the human consequences.

I hope these materials will be engaging and informative for readers, and useful for teaching and learning the ethical practice of data science.
