This code reproduces the results from the ACL 2023 paper "Stop Pre-Training: Adapt Visual-Language Models to Unseen Languages".
First, install the required dependencies:
pip install hatch gdown
Download all datasets for training the Cross-Lingual Contextualised Token Embedding Alignment and the Zero-Shot Cross-Lingual transfer to downstream tasks:
bash scripts/datasets/download_datasets.sh data
The archive contains the original files from Flickr30k, SNLI and NLVR2, which are all in English, as well as the translated files for each language required by the downstream tasks.
Note that the train/dev sets of the Flickr30k, SNLI and NLVR2 datasets were translated with the Googletrans package by running the following commands:
bash scripts/datasets/prepare_flickr30k.sh
bash scripts/datasets/prepare_snli.sh
bash scripts/datasets/prepare_nlvr2.sh
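The translation step in the scripts above relies on Googletrans. Below is a minimal, illustrative sketch of that step; the `chunked` helper and the batch size are hypothetical, not code from this repository:

```python
# Illustrative sketch of a Googletrans-based translation step.
# The `chunked` helper and batch size are hypothetical, not repo code.

def chunked(items, size):
    """Split a list of captions into batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

# With googletrans installed (pip install googletrans), each batch could be
# translated roughly like this (network call, so shown commented out):
# from googletrans import Translator
# translator = Translator()
# for batch in chunked(captions, 32):
#     results = translator.translate(batch, src="en", dest="de")
#     translated = [r.text for r in results]

captions = ["A dog runs.", "Two men talk.", "A child plays."]
print(chunked(captions, 2))
# → [['A dog runs.', 'Two men talk.'], ['A child plays.']]
```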
Then run the contextualised token alignment for each dataset:
bash scripts/alignment/token_alignment_flickr30k.sh
bash scripts/alignment/token_alignment_snli.sh
bash scripts/alignment/token_alignment_nlvr2.sh
This should create aligned word-pair files in the data folder for each dataset, as follows:
data/
    flickr30k/
        word_pairs_dev_en-de.json
        word_pairs_dev_en-es.json
        word_pairs_dev_en-id.json
        word_pairs_dev_en-ru.json
        word_pairs_dev_en-tr.json
        word_pairs_train_en-de.json
        word_pairs_train_en-es.json
        word_pairs_train_en-id.json
        word_pairs_train_en-ru.json
        word_pairs_train_en-tr.json
    nlvr2/
        word_pairs_dev_en-id.json
        word_pairs_dev_en-sw.json
        word_pairs_dev_en-ta.json
        word_pairs_dev_en-tr.json
        word_pairs_dev_en-zh-cn.json
        word_pairs_train_en-id.json
        word_pairs_train_en-sw.json
        word_pairs_train_en-ta.json
        word_pairs_train_en-tr.json
        word_pairs_train_en-zh-cn.json
    snli/
        word_pairs_dev_en-ar.json
        word_pairs_dev_en-es.json
        word_pairs_dev_en-fr.json
        word_pairs_dev_en-ru.json
        word_pairs_train_en-ar.json
        word_pairs_train_en-es.json
        word_pairs_train_en-fr.json
        word_pairs_train_en-ru.json
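Once generated, a word-pair file can be inspected with a few lines of Python. Note that the schema used below (a list of aligned [source, target] token pairs) is an assumption for illustration; adjust it to match the real files:

```python
import json
import os
import tempfile

# Illustrative only: the schema of the word_pairs_*.json files is assumed
# here to be a list of [source_token, target_token] pairs.
sample = [["dog", "Hund"], ["street", "Straße"]]

path = os.path.join(tempfile.mkdtemp(), "word_pairs_train_en-de.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(sample, f, ensure_ascii=False)

# Reading the file back gives the aligned token pairs for one language pair.
with open(path, encoding="utf-8") as f:
    pairs = json.load(f)

print(len(pairs))  # → 2
```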
Train CLiCoTEA by running the following commands (default options can be modified in the bash scripts):
# train CLiCoTEA for image/text retrieval on flickr30k in German
bash scripts/embeddings/train_clicotea.sh flickr30k albef_retrieval flickr de
# train CLiCoTEA for visual reasoning on NLVR2 in Swahili
bash scripts/embeddings/train_clicotea.sh nlvr2 albef_nlvr nlvr sw
# train CLiCoTEA for visual entailment on SNLI in French
bash scripts/embeddings/train_clicotea.sh snli albef_classification ve fr
Note that we start from pre-trained ALBEF models, which are available in the LAVIS package.
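The three invocations above all follow the same pattern: dataset, LAVIS model name, model type, target language. The helper below makes that mapping explicit; it is illustrative only, not code from this repository:

```python
# Illustrative helper (not repo code): build a train_clicotea.sh invocation
# from the dataset and target language, using the mapping shown above.
TASKS = {
    "flickr30k": ("albef_retrieval", "flickr"),       # image/text retrieval
    "nlvr2": ("albef_nlvr", "nlvr"),                  # visual reasoning
    "snli": ("albef_classification", "ve"),           # visual entailment
}

def train_command(dataset, lang):
    model_name, model_type = TASKS[dataset]
    return (f"bash scripts/embeddings/train_clicotea.sh "
            f"{dataset} {model_name} {model_type} {lang}")

print(train_command("nlvr2", "sw"))
# → bash scripts/embeddings/train_clicotea.sh nlvr2 albef_nlvr nlvr sw
```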
- Download the images for the downstream tasks from their official websites.
- Download the text data from the IGLUE benchmark with:
bash scripts/zero-shot/download_datasets.sh
- Run zero-shot evaluation:
DATA_DIR="<path to folder containing test files>"
LANG="<language to test>"
FLICKR30K_IMAGE_ROOT="<path to Flickr30k image folder>"
COCO_IMAGE_ROOT="<path to COCO image folder>"
MARVL_IMAGE_ROOT="<path to MaRVL image folder>"
PATH_TO_CHECKPOINT="<path to model checkpoint>"
- Retrieval task on xFlickrCO
bash scripts/zero-shot/zeroshot_retrieval.sh $DATA_DIR $LANG $FLICKR30K_IMAGE_ROOT $COCO_IMAGE_ROOT $PATH_TO_CHECKPOINT
- Visual entailment task on XVNLI
bash scripts/zero-shot/zeroshot_ve.sh $DATA_DIR $LANG $FLICKR30K_IMAGE_ROOT $PATH_TO_CHECKPOINT
- Visual reasoning task on MaRVL
bash scripts/zero-shot/zeroshot_vr.sh $DATA_DIR $LANG $MARVL_IMAGE_ROOT $PATH_TO_CHECKPOINT
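The retrieval script reports standard image/text retrieval metrics. As a reference, recall@K can be computed from a similarity matrix as sketched below; this is a minimal illustration, not the repository's evaluation code:

```python
# Minimal sketch of recall@K for retrieval evaluation (illustrative, not the
# repository's evaluation code). sim[i][j] is the score of query i against
# candidate j; the correct candidate for query i is assumed to be item i.
def recall_at_k(sim, k):
    hits = 0
    for i, scores in enumerate(sim):
        # Rank candidate indices by descending similarity score.
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(sim)

sim = [
    [0.9, 0.1, 0.2],  # query 0: correct candidate ranked first
    [0.3, 0.2, 0.8],  # query 1: correct candidate not in top-1
    [0.1, 0.4, 0.9],  # query 2: correct candidate ranked first
]
print(recall_at_k(sim, 1))  # → 0.6666666666666666
```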
Running all tests:
hatch run test:run
Or running a specific test:
hatch run test:run -k test_get_token_pairs
Please cite as:
@inproceedings{clicotea,
title = "Stop Pre-Training: Adapt Visual-Language Models to Unseen Languages",
author = "Karoui, Yasmine and
Lebret, R{\'e}mi and
Foroutan Eghlidi, Negar and
Aberer, Karl",
booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)",
month = jul,
year = "2023",
address = "Toronto, Canada",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.acl-short.32",
pages = "366--375",
}