
FairBatch: Batch Selection for Model Fairness

Authors: Yuji Roh, Kangwook Lee, Steven Euijong Whang, and Changho Suh

In Proceedings of the 9th International Conference on Learning Representations (ICLR), 2021


This repo contains the code used in the ICLR 2021 paper: FairBatch: Batch Selection for Model Fairness

Abstract: Training a fair machine learning model is essential to prevent demographic disparity. Existing techniques for improving model fairness require broad changes in either data preprocessing or model training, rendering themselves difficult-to-adopt for potentially already complex machine learning systems. We address this problem via the lens of bilevel optimization. While keeping the standard training algorithm as an inner optimizer, we incorporate an outer optimizer so as to equip the inner problem with an additional functionality: Adaptively selecting minibatch sizes for the purpose of improving model fairness. Our batch selection algorithm, which we call FairBatch, implements this optimization and supports prominent fairness measures: equal opportunity, equalized odds, and demographic parity. FairBatch comes with a significant implementation benefit -- it does not require any modification to data preprocessing or model training. For instance, a single-line change of PyTorch code for replacing batch selection part of model training suffices to employ FairBatch. Our experiments conducted both on synthetic and benchmark real data demonstrate that FairBatch can provide such functionalities while achieving comparable (or even greater) performances against the state of the arts. Furthermore, FairBatch can readily improve fairness of any pre-trained model simply via fine-tuning. It is also compatible with existing batch selection techniques intended for different purposes, such as faster convergence, thus gracefully achieving multiple purposes.
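
As a minimal sketch of that single-line change (CustomDataset and FairBatch are the class names in FairBatchSampler.py of this repo; the synthetic tensors, the stand-in model, and the exact constructor arguments are illustrative assumptions, not the repo's exact code):

    import torch
    from torch.utils.data import DataLoader
    from FairBatchSampler import CustomDataset, FairBatch

    x = torch.randn(1000, 3)                          # features
    y = torch.randint(0, 2, (1000,)).float() * 2 - 1  # labels in {-1, +1}
    z = torch.randint(0, 2, (1000,)).float()          # binary sensitive attribute
    train_data = CustomDataset(x, y, z)
    model = torch.nn.Linear(3, 1)                     # stand-in for the repo's model

    # Before: the usual uniform batch selection.
    # loader = DataLoader(train_data, batch_size=100, shuffle=True)

    # After: the single-line change -- hand the DataLoader a FairBatch sampler.
    sampler = FairBatch(model, x, y, z, batch_size=100, alpha=0.005,
                        target_fairness='eqopp', replacement=False, seed=0)
    loader = DataLoader(train_data, sampler=sampler, num_workers=0)

Here 'eqopp' selects equal opportunity; the other supported measures are chosen the same way via target_fairness.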

Setting

This directory is for simulating FairBatch on the synthetic dataset. Running the code requires PyTorch and Jupyter Notebook.

The directory contains a total of 4 files and 1 child directory: 1 README, 2 Python files, 1 Jupyter notebook, and the child directory containing 6 NumPy files for the synthetic data.

Simulation

To simulate FairBatch, please use the Jupyter notebook in the directory.

The Jupyter notebook loads the data and trains models with three different fairness metrics: equal opportunity, equalized odds, and demographic parity.

Each training run uses the FairBatch sampler, which is defined in FairBatchSampler.py. The PyTorch DataLoader serves batches to the model via the FairBatch sampler, as sketched below. Each experiment is repeated 10 times. After training, the test accuracy and fairness are reported.
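
The notebook's training loop, in very rough outline (a sketch under assumptions: the class name imported from models.py, the metric identifiers, the loss, and all hyperparameters here are illustrative, not the repo's exact code):

    import torch
    import torch.nn.functional as F
    from torch.utils.data import DataLoader
    from models import LogisticRegression              # assumed class name in models.py
    from FairBatchSampler import CustomDataset, FairBatch

    # Illustrative stand-in data; the notebook instead loads the six .npy files.
    x = torch.randn(1000, 3)
    y = torch.randint(0, 2, (1000,)).float() * 2 - 1   # labels in {-1, +1}
    z = torch.randint(0, 2, (1000,)).float()           # binary sensitive attribute
    train_data = CustomDataset(x, y, z)

    for metric in ('eqopp', 'eqodds', 'dp'):           # assumed metric identifiers
        model = LogisticRegression(3, 1)               # assumed constructor
        sampler = FairBatch(model, x, y, z, batch_size=100, alpha=0.005,
                            target_fairness=metric, replacement=False, seed=0)
        loader = DataLoader(train_data, sampler=sampler, num_workers=0)
        optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

        for epoch in range(300):
            for xb, yb, zb in loader:                  # batches drawn by FairBatch
                optimizer.zero_grad()
                logits = model(xb).squeeze()
                # Map {-1, +1} labels to {0, 1} for the BCE loss (loss choice is an assumption).
                loss = F.binary_cross_entropy_with_logits(logits, (yb.squeeze() + 1) / 2)
                loss.backward()
                optimizer.step()
        # After training, evaluate test accuracy and fairness with the test function in models.py.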

Other details

The two Python files are models.py and FairBatchSampler.py. models.py contains a logistic regression architecture and a test function; FairBatchSampler.py contains two classes, CustomDataset and FairBatch. CustomDataset defines the dataset, and FairBatch implements the FairBatch algorithm as described in the paper. A rough sketch of what models.py provides follows.
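
A minimal sketch of models.py (the class name, constructor, and the particular fairness gap computed below are assumptions, not the repo's exact code):

    import torch
    import torch.nn as nn

    class LogisticRegression(nn.Module):
        # One linear layer; the sigmoid is applied in the loss or at evaluation time.
        def __init__(self, n_features, n_outputs=1):
            super().__init__()
            self.linear = nn.Linear(n_features, n_outputs)

        def forward(self, x):
            return self.linear(x)

    def test_model(model, x_test, y_test, z_test):
        # Accuracy plus a simple demographic-parity gap: the difference in
        # positive-prediction rates between the two sensitive groups.
        # Assumes labels in {-1, +1} and a binary sensitive attribute z.
        with torch.no_grad():
            pos = (torch.sigmoid(model(x_test).squeeze()) > 0.5).float()
        preds = pos * 2 - 1                            # back to {-1, +1}
        accuracy = (preds == y_test).float().mean().item()
        dp_gap = abs(pos[z_test == 1].mean() - pos[z_test == 0].mean()).item()
        return accuracy, dp_gap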

More detailed explanations of each component can be found in the code as comments. Thanks!

Demos using Google Colab

We also release Google Colab notebooks for quick demos. Both a PyTorch version and a TensorFlow version are available.

Reference

@inproceedings{roh2021fairbatch,
  title={FairBatch: Batch Selection for Model Fairness},
  author={Yuji Roh and Kangwook Lee and Steven Euijong Whang and Changho Suh},
  booktitle={International Conference on Learning Representations},
  year={2021},
  url={https://openreview.net/forum?id=YNnpaAKeCfx}
}

Issues

Beyond binary labels/sensitive attributes

Hi Yuji,

Thanks for the great work. It seems that the implementation of the adjust_lambda function assumes binary labels/sensitive attributes, which differs from the approximation for the multiclass setting introduced in the paper.

For example,

    if yhat_yz[(1, 1)] > yhat_yz[(1, 0)]:
        self.lb1 += self.alpha
    else:
        self.lb1 -= self.alpha

I am wondering if there are plans to support multiclass labels/sensitive attributes?

Thanks,
Xudong
