ucla-vmg / mime

Code for our ECCV 2022 paper, "MIME: Minority Inclusion for Majority Group Enhancement of AI Performance"

Home Page: https://visual.ee.ucla.edu/mime.htm/

MIME: Minority Inclusion for Majority Group Enhancement of AI Performance

Pradyumna Chari, Yunhao Ba, Shreeram Athreya, Achuta Kadambi

UCLA, USA

Project webpage

https://visual.ee.ucla.edu/mime.htm/

Citation

@InProceedings{chari2022mime,
      author={Chari, Pradyumna and Ba, Yunhao and Athreya, Shreeram and Kadambi, Achuta},
      title={MIME: Minority Inclusion for Majority Group Enhancement of AI Performance},
      booktitle={ECCV},
      year={2022}
}

(A) INTRODUCTION

Figure 1: Inclusion of minorities can improve performance for majorities. We theoretically describe an effect called Minority Inclusion, Majority Enhancement (MIME). The figure depicts test classification of blue mimes, and an initial training stack, also of blue mimes. If allowed to add one more training sample, it can be better to push an orange mime onto the training stack rather than a blue mime. Test accuracy can increase by pushing orange, even though the test set consists of blue mimes alone.

Several papers have rightly included minority groups in artificial intelligence (AI) training data to improve test inference for minority groups and/or society-at-large. A society-at-large consists of both minority and majority stakeholders. An oft-held misconception is that minority inclusion does not increase performance for majority groups alone. In this paper, we make the surprising finding that including minority samples can reduce test error for the majority group. In other words, minority group inclusion leads to majority group enhancements (MIME) in performance. A theoretical existence proof of the MIME effect is presented and found to be consistent with experimental results on six different datasets.

Results

Figure 2: When the domain gap is small, the MIME effect holds. In the presence of a large domain gap, the MIME effect is absent. On five datasets, majority performance is maximized with some inclusion of minorities. All experiments are run for several trials and realizations. On the sixth dataset, the gender classification task is rescoped to a high domain gap setting: the majority group is chickens and the minority group is humans. Here, the MIME effect is absent. These observations validate our proposed theory.

(B) SETTING UP THE EXPERIMENTS

This GitHub repository provides access to the code used for the primary results of the paper.

First, please install the necessary dependencies by using the provided requirements.txt file.

Then, please follow the instructions below to set up the experiments for the six datasets and generate the corresponding results.

FairFace

Notebook for training: FairFace_Model_Training.ipynb

Requirements:

Train labels csv file: fairface_label_train.csv
Test labels csv file: fairface_label_val.csv
Images folder: fairface-img-margin025-trainval
Fixed model initialization: model_init_2class.pt (to be added by user)

All requirements should be in the same directory as the training notebook.

Link to dataset: https://github.com/dchen236/FairFace
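Before opening the notebook, it can help to confirm that everything in the FairFace requirements list sits next to it. The following is a minimal sketch, not part of the repository: the helper name missing_requirements is hypothetical, and the file names are copied from the list above.

```python
from pathlib import Path

# Names taken from the FairFace requirements list in this README.
FAIRFACE_REQUIREMENTS = [
    "FairFace_Model_Training.ipynb",
    "fairface_label_train.csv",
    "fairface_label_val.csv",
    "fairface-img-margin025-trainval",
    "model_init_2class.pt",
]


def missing_requirements(root, names):
    """Return the entries of `names` that do not exist directly under `root`."""
    root = Path(root)
    return [name for name in names if not (root / name).exists()]


if __name__ == "__main__":
    # Run from the directory that holds the training notebook.
    missing = missing_requirements(".", FAIRFACE_REQUIREMENTS)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All FairFace requirements found.")
```

The same helper can be reused for the other datasets by passing the names from their own requirements lists.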

Pet Images

Notebook for training: PetImages_Model_Training.ipynb

Requirements:

Training data folder: data
Fixed model initialization: resnet34_imp_2class.pt (to be added by user)

All requirements should be in the same directory as the training notebook.

We use a manually annotated subset of the original dataset. Link to our annotated dataset: https://drive.google.com/drive/folders/1xH3OsuMrA2UuxqQ8PvqCsyXTIWwBR02L?usp=sharing

UTKFace

Notebook for training: UTKFace_Model_Training.ipynb

Requirements:

Training data folder: UTKFace
Fixed model initialization: model_init_9class.pt (to be added by user)

All requirements should be in the same directory as the training notebook.

Link to dataset: https://susanqq.github.io/UTKFace/ (Aligned and cropped faces)

Chest-Xray14

Notebook for training: Xray_Model_training.ipynb

Requirements:

Data folder paths: data/Atelectasis/Male, data/Atelectasis/Female, data/Pneumothorax/Male, data/Pneumothorax/Female
Key to the subset of images we have used: Chest-Xray14 dataset details.csv
Fixed model initialization: resnet34_imp_2class.pt (to be added by user)

All requirements should be in the same directory as the training notebook.

Link to full dataset: https://nihcc.app.box.com/v/ChestXray-NIHCC
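Since only a subset of the full dataset is used, a quick per-folder count can confirm that the four data folders were populated as intended. This is an illustrative sketch only: the function name count_files_per_group is hypothetical, the folder names are taken from the requirements above, and the expected counts are not stated here, so none are checked.

```python
from pathlib import Path

# Folder names taken from the Chest-Xray14 requirements in this README.
CONDITIONS = ("Atelectasis", "Pneumothorax")
GENDERS = ("Male", "Female")


def count_files_per_group(data_root="data"):
    """Count regular files in each <data_root>/<condition>/<gender> folder.

    A missing folder is reported as None so that it can be told apart
    from an existing but empty folder.
    """
    root = Path(data_root)
    counts = {}
    for condition in CONDITIONS:
        for gender in GENDERS:
            folder = root / condition / gender
            if folder.is_dir():
                counts[(condition, gender)] = sum(
                    1 for p in folder.iterdir() if p.is_file()
                )
            else:
                counts[(condition, gender)] = None
    return counts


if __name__ == "__main__":
    for (condition, gender), n in count_files_per_group().items():
        status = "missing" if n is None else f"{n} files"
        print(f"data/{condition}/{gender}: {status}")
```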

Adult (Census)

Notebook for training: Adult_Model_training.ipynb

Requirements:

Training data folder: data (provided)
Fixed model initialization: model2_init_2class.pt (to be added by the user)

All requirements should be in the same directory as the training notebook.

Dataset provided in the directory as adults.csv

UTKFace-Chicken gender classification

Notebook for training: Chicken_Model_training.ipynb

Requirements:

Data folder paths: dataset/humans, dataset/chicken/male, dataset/chicken/female
dataset/humans is the UTKFace dataset folder (renamed to 'humans')
dataset/chicken is the Chicken gender dataset folder, with cock renamed to male, hen renamed to female. Train and test splits are combined and any redundant files are removed.
Fixed model initialization: model_init_2class.pt (to be added by the user)

All requirements should be in the same directory as the training notebook.

Link to (chickens) dataset: https://drive.google.com/drive/folders/1eGq8dWGL0I3rW2B9eJ_casH0_D3x7R73
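The chicken folder preparation described above (rename cock to male and hen to female, then combine the train and test splits) could be scripted roughly as follows. This is a sketch under assumptions: the source layout <src_root>/<split>/<cock|hen> and the function name merge_chicken_splits are hypothetical, and the destination folder is a parameter. The removal of redundant files is not covered because the criteria are not specified here.

```python
import shutil
from pathlib import Path

# Source class folder -> destination folder, as described in this README.
RENAME = {"cock": "male", "hen": "female"}


def merge_chicken_splits(src_root, dst_root, splits=("train", "test")):
    """Copy <src_root>/<split>/<cock|hen>/* into <dst_root>/<male|female>/.

    The <split>/<class> source layout is an assumption. File names are
    prefixed with the split name so that train and test files sharing a
    name do not overwrite each other.
    """
    src_root, dst_root = Path(src_root), Path(dst_root)
    copied = {new: 0 for new in RENAME.values()}
    for split in splits:
        for old, new in RENAME.items():
            src_dir = src_root / split / old
            if not src_dir.is_dir():
                continue  # tolerate a missing split or class folder
            dst_dir = dst_root / new
            dst_dir.mkdir(parents=True, exist_ok=True)
            for path in sorted(src_dir.iterdir()):
                if path.is_file():
                    shutil.copy2(path, dst_dir / f"{split}_{path.name}")
                    copied[new] += 1
    return copied
```

For example, merge_chicken_splits("chicken_download", "dataset/chicken") would populate dataset/chicken/male and dataset/chicken/female, where chicken_download is a placeholder for wherever the downloaded dataset was unpacked. Files are copied rather than moved, so the download stays intact.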

Plotting the Results

plot_results.ipynb - Notebook to plot the trends after the runs complete. Usage instructions are included in the notebook. Depending on the dataset, the following may need to be set up or changed:

The checkpoint directories for all trials must be located in a directory named 'all_checkpoints'.
The variable named label - guidance provided in notebook comments.
The variables named name_save_majority and name_save_minority - guidance provided in notebook comments.
The definition of the variable l_dash must be changed based on the dataset - guidance provided in notebook comments.

(C) RUNNING THE EXPERIMENTS

For each dataset, create the file structure according to the above instructions.
For each dataset, run the corresponding notebook across several trials (5 trials for all datasets except the NIH Chest-Xray14 dataset, which is run for 7 trials).
Each trial is characterized by the random seed used (included in the supplementary materials document).
For all minority training ratios within a particular trial, a fixed set of initialization weights is used so that the only differences in performance are due to the training set configuration. The user may generate this initialization manually and store it under the file name listed for that dataset; the notebook then loads it automatically.
Each trial produces results in a folder named 'checkpoints'. Rename this folder after each trial and store the renamed checkpoints folders for all trials in a folder named 'all_checkpoints'. Trends may be plotted using the plot_results.ipynb notebook.
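The renaming step after each trial can be sketched as below. The function name archive_trial_checkpoints and the per-trial folder names are hypothetical; the plotting notebook may expect a particular naming scheme, so check its comments before settling on one.

```python
import shutil
from pathlib import Path


def archive_trial_checkpoints(trial_name, src="checkpoints", dst_root="all_checkpoints"):
    """Move a finished trial's 'checkpoints' folder to <dst_root>/<trial_name>.

    Refuses to overwrite an existing trial folder, so results from an
    earlier trial are not lost by accident.
    """
    src, dst = Path(src), Path(dst_root) / trial_name
    if not src.is_dir():
        raise FileNotFoundError(f"no checkpoints folder at {src}")
    if dst.exists():
        raise FileExistsError(f"{dst} already exists")
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(src), str(dst))
    return dst
```

Calling archive_trial_checkpoints("checkpoints_trial1") after the first trial, with a new name for each later trial, leaves the working directory ready for the next run.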
