Hey Mateo,
Thanks for your work (I really admire your papers on concept-based models 😁) and for releasing your code!
I am trying to train Concept Bottleneck Models on CelebA using this repository. Following your approach, I am using 8 concepts, but without any hidden concepts (so all 8 concepts are used). I have tried training both independent and sequential concept bottleneck models. This is the config:
```yaml
trials: 1
results_dir: /mnt/qb/work/bethge/bkr046/CEM/results/CelebA
dataset: celeba
root_dir: /mnt/qb/work/bethge/bkr046/DATASETS/celeba_torchvision #cem/data/
image_size: 64
num_classes: 1000
batch_size: 512
use_imbalance: True
use_binary_vector_class: True
num_concepts: 8
label_binary_width: 1
label_dataset_subsample: 12
#num_hidden_concepts: 0
selected_concepts: False
num_workers: 8
sampling_percent: 1
test_subsampling: 1

intervention_freq: 1
intervention_batch_size: 1024
intervention_policies:
  - "group_random_no_prior"
competence_levels: [1, 0]
incompetence_intervention_policies:
  - "group_random_no_prior"
skip_repr_evaluation: True

shared_params:
  top_k_accuracy: [3, 5, 10]
  save_model: True
  max_epochs: 200
  patience: 15
  emb_size: 16
  extra_dims: 0
  concept_loss_weight: 1
  learning_rate: 0.005
  weight_decay: 0.000004
  weight_loss: False
  c_extractor_arch: resnet34
  optimizer: sgd
  early_stopping_monitor: val_loss
  early_stopping_mode: min
  early_stopping_delta: 0.0
  momentum: 0.9
  sigmoidal_prob: False

runs:
  - architecture: 'SequentialConceptBottleneckModel'
    extra_name: "NoInterventionInTrainingNoHiddenConcepts"
    sigmoidal_embedding: False
    concat_prob: False
    embedding_activation: "leakyrelu"
    bool: False
    extra_dims: 0
    sigmoidal_extra_capacity: False
    sigmoidal_prob: True
    training_intervention_prob: 0
```
While doing this, I noticed a few unexpected things:
- Even with 100% interventions, the accuracy of the independent model is below 50%. Since all concepts are visible, and it is an independent model, the c2y component was trained directly on ground-truth concept vectors and would have seen all concept combinations that occur in the dataset. It should therefore perform well when all ground-truth concepts are provided during interventions.
| Method | Task Accuracy | Concept Accuracy | Concept AUC | 25% Int Acc | 50% Int Acc | 75% Int Acc | 100% Int Acc |
|---|---|---|---|---|---|---|---|
| IndependentConceptBottleneckModel | 0.2743 ± 0.0000 | 0.8237 ± 0.0000 | 0.8163 ± 0.0000 | 0.3009 ± 0.0000 | 0.3229 ± 0.0000 | 0.3457 ± 0.0000 | 0.3711 ± 0.0000 |
| SequentialConceptBottleneckModel | 0.2784 ± 0.0000 | 0.8237 ± 0.0000 | 0.8163 ± 0.0000 | 0.3107 ± 0.0000 | 0.3409 ± 0.0000 | 0.3584 ± 0.0000 | 0.3711 ± 0.0000 |
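For reference, my mental model of the 100% intervention case is roughly the following sketch (the names `c2y` and `full_intervention_accuracy` are hypothetical placeholders, not this repo's API): once every concept is intervened on, the label head sees exactly the ground-truth concept vector, so full-intervention task accuracy should match the accuracy of the c2y head evaluated directly on ground-truth concepts.

```python
import numpy as np

def full_intervention_accuracy(c2y, concepts_true, y_true):
    # 100% intervention: predicted concepts are replaced by ground truth,
    # so the label head is scored on the true concept vectors directly.
    logits = c2y(concepts_true)                      # (N, num_classes)
    return float((logits.argmax(axis=-1) == y_true).mean())

# Toy linear head as a stand-in for the trained c2y component.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 230))                        # 8 concepts -> 230 classes
c2y = lambda c: c @ W
concepts = rng.integers(0, 2, size=(100, 8))         # toy binary concept vectors
labels = c2y(concepts).argmax(axis=-1)               # labels consistent with the head
print(full_intervention_accuracy(c2y, concepts, labels))  # -> 1.0
```

In this toy setting, where the labels are perfectly predictable from the concepts, the full-intervention accuracy is 1.0, which is why the ~37% number above surprised me.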
- The training accuracy of the independent model is also pretty low, which is concerning. This is from epoch 49 (which appears to be the last epoch):
```
Epoch 49: 86%|████████▌ | 24/28 [00:01<00:00, 14.65it/s, loss=3.13, y_accuracy=0.381, y_top_3_accuracy=0.548, y_top_5_accuracy=0.619, y_top_10_accuracy=0.714, val_y_accuracy=0.384, val_y_top_3_accuracy=0.557, val_y_top_5_accuracy=0.629, val_y_top_10_accuracy=0.697]
```
- The number of classes in the dataset is 230 instead of 256 (= 2^8). I guess that is because some of the 256 possible concept combinations never appear in the dataset. Did you see this as well?
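To illustrate what I mean (toy data, not the actual CelebA annotations): with 8 binary concepts the class is determined by the concept bit-vector, so there are at most 2^8 = 256 classes, but only the combinations that actually occur get a class id.

```python
import numpy as np

# 8 binary concepts -> at most 2**8 = 256 distinct concept vectors.
assert 2 ** 8 == 256

# Toy 2-bit example: combination (1, 0) never occurs, so only 3 of the
# 4 possible classes show up in the data.
concepts = np.array([[0, 0], [0, 1], [0, 1], [1, 1]])
observed = len(np.unique(concepts, axis=0))
print(observed)  # -> 3
```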
Do you think these observations are expected? If so, it would be really helpful if you could share some intuition as to why.