princetonvisualai / gan-debiasing

Fair Attribute Classification through Latent Space De-biasing (CVPR 2021)

Python 100.00%

gan-debiasing's Issues

Wrong notation for Bias Amplification in the paper

Hi,

On page 5 of the paper, Bias Amplification is explained as follows:

For each pair of target and protected attribute values, we add $(P_{t|g} - P_{\hat{t}|g})$
if $P_{t,g} > P_t P_g$ and $-(P_{t|g} - P_{\hat{t}|g})$ otherwise,
where
$P_{t|g}$ is the fraction of images with protected attribute $g$ that have target attribute $t$, and
$P_{\hat{t}|g}$ is the fraction of images with protected attribute $g$ that are **predicted** to have target attribute $t$.

However, based on the original Directional Bias Amplification paper, I think the correct notation should be:

we add $(P_{\hat{t}|g} - P_{t|g})$ if $P_{t,g} > P_t P_g$ and $-(P_{\hat{t}|g} - P_{t|g})$ otherwise,

since we are measuring A -> T.

The code also suggests that the difference is taken as $(P_{\hat{t}|g} - P_{t|g})$:

gan-debiasing/utils.py

Line 256 in 6cec2a2

diff[i][j] = pred_bog[i][j] - data_bog[i][j]

Did I understand this correctly, or did I miss something? Thanks!
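To make the sign convention in this issue concrete, here is a minimal sketch of the corrected quantity (predicted rate minus data rate, with the sign flipped for negatively correlated pairs). It assumes binary attributes and hard 0/1 predictions, returns the raw sum without any normalization, and uses the hypothetical function name `bias_amplification_sketch`; it is not the repository's `utils.py` implementation.

```python
from itertools import product


def bias_amplification_sketch(protected, targets, preds):
    """Sum of signed (P_hat(t|g) - P(t|g)) over all (t, g) pairs.

    Assumptions (not taken from the repository): all three inputs are
    equal-length lists of 0/1 values, and no normalization is applied.
    """
    n = len(targets)
    total = 0.0
    for t, g in product((0, 1), (0, 1)):
        n_g = sum(1 for a in protected if a == g)
        if n_g == 0:
            continue  # P(t|g) is undefined for an empty group
        n_t = sum(1 for y in targets if y == t)
        n_tg = sum(1 for a, y in zip(protected, targets) if a == g and y == t)
        n_pred_tg = sum(1 for a, p in zip(protected, preds) if a == g and p == t)

        data_rate = n_tg / n_g        # P(t|g) in the data
        pred_rate = n_pred_tg / n_g   # P_hat(t|g) in the predictions
        delta = pred_rate - data_rate

        # P(t,g) > P(t) * P(g), compared in integer form to avoid rounding.
        positively_correlated = n_tg * n > n_t * n_g
        total += delta if positively_correlated else -delta
    return total


if __name__ == "__main__":
    protected = [1, 1, 1, 1, 0, 0, 0, 0]
    targets = [1, 1, 1, 0, 1, 0, 0, 0]
    preds = [1, 1, 1, 1, 0, 0, 0, 0]  # predictions exaggerate the correlation
    print(bias_amplification_sketch(protected, targets, preds))
```

With this convention, a classifier that exaggerates an existing correlation gets a positive score, and predictions that match the data exactly give zero.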

FileNotFoundError: [Errno 2] No such file or directory: 'data/fake_images/all_Male_scores.pkl'

Hi all,
I've recently started checking the fairness of models on visualization tasks.

While running linear.py (gan-debiasing project), I got a FileNotFoundError for all_Male_scores.pkl.

Please let me know if I made a mistake in the procedure below:

1. Downloaded the CelebA dataset and put it in data/celeb
2. Ran crop_images.py to crop the images to 128x128
3. Ran main.py --experiment baseline to train a standard attribute classifier for each target attribute
4. Ran generate_images.py --experiment orig
5. Ran get_scores.py. Note: it only generated all_Smiling_scores.pkl; it did not generate all_Male_scores.pkl.
6. Ran linear.py, which throws the exception (all_Male_scores.pkl file not found)

Thanks.
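One way to catch this before running linear.py is to check that a score file exists for every attribute involved. The sketch below assumes the `all_{}_scores.pkl` naming and the `data/fake_images` directory seen in the error message; the helper name `missing_score_files` is hypothetical and not part of the repository.

```python
from pathlib import Path


def missing_score_files(fake_dir, attribute_names):
    """Return the attribute names that have no all_<name>_scores.pkl file.

    Assumption: score files follow the all_<name>_scores.pkl pattern
    that appears in the error message above.
    """
    fake_dir = Path(fake_dir)
    return [
        name
        for name in attribute_names
        if not (fake_dir / f"all_{name}_scores.pkl").is_file()
    ]


if __name__ == "__main__":
    missing = missing_score_files("data/fake_images", ["Smiling", "Male"])
    if missing:
        print("No score file yet for:", ", ".join(missing))
    else:
        print("All score files are present.")
```

In the situation described here, the check would report Male as missing, which points at get_scores.py having been run for the target attribute only.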

self.network not defined, should be changed to self.model

gan-debiasing/Models/attr_classifier.py

Line 43 in 505f085

self.network.train()

Query: Requirements/dependencies

Respected sir,
While setting up the repository, I could not find a text file listing the requirements/dependencies needed to run the code. Could you please guide me?

Edit: Do you recommend installing the dependencies given in the reference Coursera Google Colab notebook?

# Import libraries.
import numpy as np
np.random.seed(123)
from sklearn import svm
import matplotlib.pyplot as plt

Crash main.py#144

deo, deo_std = utils.bootstrap_deo(val_targets[:, 1], val_targets[:, 0], val_pred)

bootstrap_deo doesn't have a default value for repeats, and this call does not pass one.
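A minimal sketch of a bootstrap helper with a default for `repeats` is shown below, so that a three-argument call like the one above works. Everything in it is an assumption made for illustration: the names `deo_sketch` and `bootstrap_deo_sketch`, the default of 100 repeats, hard 0/1 predictions, and DEO taken as the absolute gap in true-positive rate between the two protected groups. The repository's own `utils.bootstrap_deo` may differ.

```python
import random
import statistics


def deo_sketch(protected, targets, preds):
    """Absolute gap in true-positive rate between protected groups 0 and 1.

    Assumed definition for illustration. Returns None if either group
    has no positive examples in the sample.
    """
    rates = []
    for g in (0, 1):
        hits = [p for a, y, p in zip(protected, targets, preds) if a == g and y == 1]
        if not hits:
            return None
        rates.append(sum(hits) / len(hits))
    return abs(rates[0] - rates[1])


def bootstrap_deo_sketch(protected, targets, preds, repeats=100, seed=0):
    """Mean and standard deviation of DEO over bootstrap resamples."""
    rng = random.Random(seed)
    n = len(targets)
    values = []
    for _ in range(repeats):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        value = deo_sketch(
            [protected[i] for i in idx],
            [targets[i] for i in idx],
            [preds[i] for i in idx],
        )
        if value is not None:  # skip degenerate resamples
            values.append(value)
    if not values:
        raise ValueError("no usable bootstrap resample")
    return statistics.mean(values), statistics.pstdev(values)
```

Giving `repeats` a default keeps existing callers that pass it explicitly working while also allowing the shorter call.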

Training time

Hi!

Could you please add a list of estimated run times for each of the operations/training steps in the list?

Many thanks,
Dominik

Could you add a LICENSE to this repo?

function create_dataset_all in load_data.py

As I understand it, this function builds a dataset that combines the fake images with the original training data.
The part of the code I have questions about is:
labeldata = pickle.load(open(fake_params['attr_path'], 'rb'))
labeldata = np.tile(labeldata, 2)

The range of labeldata is (0, 175000) before np.tile. However, when assigning labels to the fake images, you assign the samples in (15000, 175000) the labels in (0, 160000). Is this a misalignment?
Or is there some extra preprocessing in "all_{}_scores.pkl" that I am not aware of?

Should I process it the same way as domdata?

FileNotFoundError: [Errno 2] No such file or directory: 'data/fake_images/Smiling_scores.pkl'

Hello,

I was trying to run your code and I have been struggling with an error for the past few hours.

This is the error I got when I tried to run python main.py --experiment model:
FileNotFoundError: [Errno 2] No such file or directory: 'data/fake_images/Smiling_scores.pkl'

I followed the steps in the README, but I could not find where in the code this file is created.

Full output:
{'experiment': 'model', 'experiment_name': '', 'real_data_dir': 'data/celeba', 'fake_data_dir_orig': 'data/fake_images/AllGenImages/', 'fake_data_dir_new': '', 'fake_scores_target': '', 'fake_scores_protected': '', 'cuda': True, 'random_seed': 0, 'attribute': 31, 'protected_attribute': 20, 'test_mode': False, 'num_train': 160000, 'number': 0, 'device': device(type='cuda'), 'dtype': torch.float32, 'print_freq': 100, 'total_epochs': 20, 'save_folder': 'record/model\Smiling', 'optimizer_setting': {'optimizer': <class 'torch.optim.adam.Adam'>, 'lr': 0.0001, 'weight_decay': 0}, 'dropout': 0.5, 'data_setting': {'real_params': {'path': 'data/celeba', 'attribute': 31, 'protected_attribute': 20, 'number': 0}, 'fake_params': {'path_new': 'data/fake_images/Smiling/', 'path_orig': 'data/fake_images/AllGenImages/', 'attr_path': 'data/fake_images/Smiling_scores.pkl', 'dom_path': 'data/fake_images/all_Male_scores.pkl', 'range_orig_image': (15000, 175000), 'range_orig_label': (160000, 320000), 'range_new': (0, 160000)}, 'augment': True, 'params_train': {'batch_size': 32, 'shuffle': True, 'num_workers': 0}, 'params_val': {'batch_size': 64, 'shuffle': False, 'num_workers': 0}}}
Traceback (most recent call last):
  File "main.py", line 197, in <module>
    main(opt)
  File "main.py", line 63, in main
    train = create_dataset_all(
  File "load_data.py", line 60, in create_dataset_all
    labeldata = pickle.load(open(fake_params['attr_path'], 'rb'))
FileNotFoundError: [Errno 2] No such file or directory: 'data/fake_images/Smiling_scores.pkl'

Observed Fairness was not dropped for baseline model

Hi,

The baseline model does not show the expected drop in fairness when Male is the protected attribute and Smiling is the target attribute.

I trained the standard attribute classifier with the command python main.py --experiment baseline.
According to the code, the baseline model is trained with:

Dataset: CelebA training dataset X with 162,770 images
Hyperparameters: binary cross-entropy loss for 20 epochs with a batch size of 32, using the Adam optimizer with a learning rate of 1e-4
Target attribute: Smiling (attribute 31)
Protected attribute: Male (attribute 20)

Fairness metrics for the baseline model:

Training epoch 19: [5001|5087], loss:0.07021914422512054
Avg precision all = 0.9827746330855125
Validation results:
AP : {:.1f} +- {:.1f} 98.52633278341692 0.13046754108938044
DEO : {:.1f} +- {:.1f} 3.3444695945977543 1.1048136540690787
BA : {:.1f} +- {:.1f} 0.06943005310537514 0.3600587245308783
KL : {:.1f} +- {:.1f} 0.01865371993659408 0.03730743987318816
Test results:
AP : {:.1f} +- {:.1f} 98.43127724632386 0.1274417440206799
DEO : {:.1f} +- {:.1f} 2.7732756124918376 1.1788377126697231
BA : {:.1f} +- {:.1f} -0.5825026659862016 0.4052127065723986
KL : {:.1f} +- {:.1f} 0.020252475423118695 0.04050495084623739

As you can see, the metrics above (AP, DEO, BA, KL) look fine for the baseline model,
but the results reported in the paper for the baseline model are different (DEO, BA, and KL are high).

As I understand it, DEO, BA, and KL should be high for the standard classifier.

Do I need to change anything when training the standard attribute classifier to reproduce the paper's fairness results?

Please let me know if I made a mistake.
I look forward to your response.
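A side note on the log in this issue: the lines show the literal template `{:.1f} +- {:.1f}` followed by unrounded numbers, which looks as if the template and the values were printed as separate arguments instead of being formatted together. That is a guess from the output alone, not a reading of the repository code. A small sketch of the intended formatting, using the hypothetical helper `format_metric`:

```python
def format_metric(name, mean, std):
    """Format a metric as 'NAME : mean +- std' with one decimal place."""
    return "{} : {:.1f} +- {:.1f}".format(name, mean, std)


if __name__ == "__main__":
    # Values copied from the validation results in the log above.
    print(format_metric("AP", 98.52633278341692, 0.13046754108938044))
    print(format_metric("DEO", 3.3444695945977543, 1.1048136540690787))
```

Formatted this way, the validation DEO above reads as 3.3 +- 1.1, which is easier to compare against the tables in the paper.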

How to run `main.py` file for both target and protected attribute

Hi,

get_scores.py takes in a command line argument with the attribute. You need to (1) run main.py for both the protected and target attributes, and (2) run get_scores.py with both the protected and target attributes.

Originally posted by @vramaswamy94 in #8 (comment)

Can you please provide the steps for running main.py for both the target and protected attributes (the commands to run main.py)?

Thank you in advance.
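The two steps in the quoted reply could be scripted roughly as below. This is only a sketch under explicit assumptions: the `--attribute` and `--out_file` flags of get_scores.py are taken from a command quoted in another issue on this page, while passing `--attribute` to main.py is a guess based on the `attribute` key in the printed options and should be checked against the script's argument parser. The function name and the output file pattern are illustrative.

```python
import shlex


def score_pipeline_commands(target_attr, protected_attr, target_name,
                            protected_name, fake_dir="data/fake_images"):
    """Build (but do not run) the commands for both attributes.

    ASSUMPTION: main.py accepts --attribute; this is not confirmed by
    the text on this page.
    """
    pairs = ((target_attr, target_name), (protected_attr, protected_name))
    commands = []
    # Step 1: train a baseline classifier for each attribute.
    for attr, _ in pairs:
        commands.append(["python", "main.py", "--experiment", "baseline",
                         "--attribute", str(attr)])
    # Step 2: score the generated images with each classifier.
    for attr, name in pairs:
        commands.append(["python", "get_scores.py", "--attribute", str(attr),
                         "--out_file", f"{fake_dir}/all_{name}_scores.pkl"])
    return commands


if __name__ == "__main__":
    # 31 = Smiling and 20 = Male, as in the option dumps on this page.
    for cmd in score_pipeline_commands(31, 20, "Smiling", "Male"):
        print(shlex.join(cmd))
```

The script only prints the commands, so they can be reviewed and adjusted before anything is run.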

FileNotFoundError: [Errno 2] No such file or directory: 'data/fake_images/Straight_Hair_scores.pkl'

Hello, I was trying to run your code and have been struggling with an error for the past few hours. I got this error when I tried to run python main.py --experiment model. I saw your previous response, but I am still confused. You said that running get_scores.py and changing the path to the location where the scores are stored would solve it. Do I need to generate scores for the newly generated images in "data/fake_images/Straight_Hair/"? If I only modify the out_file parameter, get_scores.py still seems to be hallucinating scores for the original images in "data/fake_images/AllGenImages/" instead of the newly generated ones. What command should I run to generate the final Straight_Hair_scores.pkl file? Running python get_scores.py --attribute 32 --out_file data/fake_images/Straight_Hair_scores.pkl does not seem to produce what I need.

Some Questions about Reproducing Experimental Results

I have faithfully followed the README instructions to reproduce the experiment, selecting the target attribute as "StraightHair". However, the baseline and final training results are as follows:

This is the improvement over baseline:

When comparing them to the results mentioned in the paper, I notice a significant difference in terms of both metrics and improvement levels:

How can I address this issue?