MIME: Minority Inclusion for Majority Group Enhancement of AI Performance

Pradyumna Chari, Yunhao Ba, Shreeram Athreya, Achuta Kadambi

UCLA, USA

Project webpage

https://visual.ee.ucla.edu/mime.htm/

Citation

@InProceedings{chari2022mime,
      author={Chari, Pradyumna and Ba, Yunhao and Athreya, Shreeram and Kadambi, Achuta},
      title={MIME: Minority Inclusion for Majority Group Enhancement of AI Performance},
      booktitle={ECCV},
      year={2022}
}

(A) INTRODUCTION

Figure 1: Inclusion of minorities can improve performance for majorities. We theoretically describe an effect called Minority Inclusion, Majority Enhancement (MIME). The figure depicts test classification of blue mimes, and an initial training stack, also of blue mimes. If allowed to add one more training sample, it can be better to push an orange mime onto the training stack rather than a blue mime. Test accuracy can increase by pushing orange, even though the test set consists of blue mimes alone.

Several papers have rightly included minority groups in artificial intelligence (AI) training data to improve test inference for minority groups and/or society-at-large. A society-at-large consists of both minority and majority stakeholders. An oft-held misconception is that minority inclusion does not increase performance for majority groups alone. In this paper, we make the surprising finding that including minority samples can reduce test error for the majority group. In other words, minority group inclusion leads to majority group enhancements (MIME) in performance. A theoretical existence proof of the MIME effect is presented and found to be consistent with experimental results on six different datasets.

Results

Figure 2: When the domain gap is small, the MIME effect holds. In the presence of a large domain gap, the MIME effect is absent. On five datasets, majority performance is maximized with some inclusion of minorities. All experiments are run for several trials and realizations. On dataset six, the gender classification task is rescoped to occur in a high domain gap setting. The majority group is chickens and the minority group is humans. Here, the MIME effect is absent. These observations validate our proposed theory.

(B) SETTING UP THE EXPERIMENTS

This GitHub repository provides access to the code used for the primary results of the paper.

First, please install the necessary dependencies by using the provided requirements.txt file.

Then, please follow the instructions listed out below to set up experiments for the six datasets and generate appropriate results.

FairFace

Notebook for training: FairFace_Model_Training.ipynb

Requirements:

  • Train labels csv file: fairface_label_train.csv
  • Test labels csv file: fairface_label_val.csv
  • Images: fairface-img-margin025-trainval containing the images.
  • Fixed model initialization: model_init_2class.pt (to be added by user)

All requirements should be in the same directory as the training notebook.
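
Before launching the notebook, it can help to confirm that everything in the list above is in place. The following is a minimal sketch, not part of the repository: the helper name missing_requirements is hypothetical, and the file names are copied from the FairFace requirements list. The same check should work for the other datasets by swapping in their lists.

```python
from pathlib import Path

# Names copied from the FairFace requirements list above.
FAIRFACE_REQUIREMENTS = [
    "fairface_label_train.csv",
    "fairface_label_val.csv",
    "fairface-img-margin025-trainval",
    "model_init_2class.pt",
]


def missing_requirements(notebook_dir, required):
    """Return the entries of `required` that do not exist in `notebook_dir`.

    Hypothetical helper: it only checks that each file or folder exists,
    not that its contents are valid.
    """
    base = Path(notebook_dir)
    return [name for name in required if not (base / name).exists()]
```

Calling missing_requirements(".", FAIRFACE_REQUIREMENTS) from the notebook directory returns an empty list when all four items are present.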

Link to dataset: https://github.com/dchen236/FairFace

Pet Images

Notebook for training: PetImages_Model_Training.ipynb

Requirements:

  • Training data folder: data
  • Fixed model initialization: resnet34_imp_2class.pt (to be added by user)

All requirements should be in the same directory as the training notebook.

We use a manually annotated subset of the original dataset. Link to our annotated dataset: https://drive.google.com/drive/folders/1xH3OsuMrA2UuxqQ8PvqCsyXTIWwBR02L?usp=sharing

UTKFace

Notebook for training: UTKFace_Model_Training.ipynb

Requirements:

  • Training data folder: UTKFace
  • Fixed model initialization: model_init_9class.pt (to be added by user)

All requirements should be in the same directory as the training notebook.

Link to dataset: https://susanqq.github.io/UTKFace/ (Aligned and cropped faces)

Chest-Xray14

Notebook for training: Xray_Model_training.ipynb

Requirements:

  • Data folder paths: data/Atelectasis/Male, data/Atelectasis/Female, data/Pneumothorax/Male, data/Pneumothorax/Female
  • Key to the subset of images we have used: Chest-Xray14 dataset details.csv
  • Fixed model initialization: resnet34_imp_2class.pt (to be added by user)

All requirements should be in the same directory as the training notebook.

Link to full dataset: https://nihcc.app.box.com/v/ChestXray-NIHCC

Adult (Census)

Notebook for training: Adult_Model_training.ipynb

Requirements:

  • Training data folder: data (provided)
  • Fixed model initialization: model2_init_2class.pt (to be added by the user)

All requirements should be in the same directory as the training notebook.

The dataset is provided in the directory as adults.csv

UTKFace-Chicken gender classification

Notebook for training: Chicken_Model_training.ipynb

Requirements:

  • Data folder paths: dataset/humans, dataset/chicken/male, dataset/chicken/female
  • dataset/humans is the UTKFace dataset folder (renamed to 'humans')
  • dataset/chicken is the Chicken gender dataset folder, with cock renamed to male, hen renamed to female. Train and test splits are combined and any redundant files are removed.
  • Fixed model initialization: model_init_2class.pt (to be added by the user)

All requirements should be in the same directory as the training notebook.

Link to (chickens) dataset: https://drive.google.com/drive/folders/1eGq8dWGL0I3rW2B9eJ_casH0_D3x7R73
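
The chicken folder preparation described above (renaming cock to male and hen to female, and combining the train and test splits) could be scripted roughly as follows. This is a sketch under assumptions: the downloaded folder is assumed to be laid out as <split>/<class>/ with splits named train and test, and the function name merge_chicken_splits is hypothetical. Removal of redundant files is not handled and remains a manual step.

```python
import shutil
from pathlib import Path

# Class renaming stated in the requirements above.
RENAME = {"cock": "male", "hen": "female"}


def merge_chicken_splits(source_dir, target_dir, splits=("train", "test")):
    """Copy <source>/<split>/<cock|hen>/* into <target>/<male|female>/.

    The <split>/<class> source layout is an assumption; adjust it to match
    the downloaded folder. Returns the number of files copied per class.
    """
    source, target = Path(source_dir), Path(target_dir)
    counts = {}
    for old_name, new_name in RENAME.items():
        out_dir = target / new_name
        out_dir.mkdir(parents=True, exist_ok=True)
        counts[new_name] = 0
        for split in splits:
            class_dir = source / split / old_name
            if not class_dir.is_dir():
                continue  # tolerate a missing split folder
            for path in sorted(class_dir.iterdir()):
                if path.is_file():
                    # Prefix with the split name so that identical file
                    # names in train and test do not overwrite each other.
                    shutil.copy2(path, out_dir / f"{split}_{path.name}")
                    counts[new_name] += 1
    return counts
```

For example, merge_chicken_splits("chicken_download", "dataset/chicken") would populate dataset/chicken/male and dataset/chicken/female, where chicken_download is a placeholder for wherever the dataset was saved.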

Plotting the Results

plot_results.ipynb - Notebook to plot the trends after completion of runs. Instructions for use are included in the notebook. Depending on the dataset, the following may need to be set up or changed:

  1. The checkpoints directories for all trials must be located in a directory named 'all_checkpoints'.
  2. The variable named label - guidance provided in notebook comments.
  3. The variables named name_save_majority and name_save_minority - guidance provided in notebook comments.
  4. The definition of the variable l_dash must be changed based on the dataset - guidance provided in notebook comments.

(C) RUNNING THE EXPERIMENTS

  1. For each dataset, the file structure is to be created according to the above instructions.

  2. For each dataset, the concerned notebook is run across several trials (5 trials for all datasets except the NIH Chest-Xray14 dataset, which is run for 7 trials).

  3. Each trial is characterized by the random seed used (included in the supplementary materials document).

  4. For all minority training ratios in a particular trial, a fixed set of initialization weights is used, so that the only visible differences in performance are due to the train set configuration. This initialization may be generated manually by the user and stored according to the naming convention provided. It will then be loaded automatically by the notebook.

  5. Each trial produces results in a folder named 'checkpoints'. This folder is to be renamed after each trial, and the renamed checkpoints folders for all trials are to be stored in a folder named 'all_checkpoints'. Trends may be plotted using the plot_results.ipynb notebook.
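
The renaming in step 5 can be done by hand, or with a small helper along these lines. This is only a sketch: the function name archive_trial and the trial folder names are hypothetical, since the instructions above do not prescribe a naming scheme for the renamed folders. Check the comments in the plotting notebook for any naming it expects.

```python
import shutil
from pathlib import Path


def archive_trial(run_dir, trial_name):
    """Move <run_dir>/checkpoints to <run_dir>/all_checkpoints/<trial_name>.

    Hypothetical helper. It refuses to overwrite an existing trial folder,
    so each trial needs a distinct name (for example, one based on its seed).
    """
    run = Path(run_dir)
    source = run / "checkpoints"
    if not source.is_dir():
        raise FileNotFoundError(f"No 'checkpoints' folder in {run}")
    destination = run / "all_checkpoints" / trial_name
    if destination.exists():
        raise FileExistsError(f"{destination} already exists; pick another name")
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(source), str(destination))
    return destination
```

Running archive_trial(".", "trial_1") after the first trial, archive_trial(".", "trial_2") after the second, and so on leaves one subfolder per trial inside 'all_checkpoints'.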
