clicotea's Introduction

CLiCoTEA: Cross-Lingual Contextualised Token Embedding Alignment

This code reproduces the results from the ACL 2023 paper "Stop Pre-Training: Adapt Visual-Language Models to Unseen Languages".

Installation

These dependencies must be installed:

  • Hatch: for managing the Python package
  • gdown: for downloading the datasets
pip install hatch gdown

Prepare datasets

Download all datasets needed to train the Cross-Lingual Contextualised Token Embedding Alignment and to run the zero-shot cross-lingual transfer to downstream tasks:

bash scripts/datasets/download_datasets.sh data

The archive contains the original files from Flickr30k, SNLI and NLVR2, which are all in English. It also includes the translated files for each language required by the downstream tasks.

Note that the train/dev sets of the Flickr30k, SNLI and NLVR2 datasets were translated with the Googletrans package by running the following commands:

bash scripts/datasets/prepare_flickr30k.sh
bash scripts/datasets/prepare_snli.sh
bash scripts/datasets/prepare_nlvr2.sh

Compute token alignment with awesome-align model

bash scripts/alignment/token_alignment_flickr30k.sh
bash scripts/alignment/token_alignment_snli.sh
bash scripts/alignment/token_alignment_nlvr2.sh

This should create aligned word pairs in data folder for each dataset as follows:

data/
   flickr30k/
      word_pairs_dev_en-de.json
      word_pairs_dev_en-es.json
      word_pairs_dev_en-id.json
      word_pairs_dev_en-ru.json
      word_pairs_dev_en-tr.json
      word_pairs_train_en-de.json
      word_pairs_train_en-es.json
      word_pairs_train_en-id.json
      word_pairs_train_en-ru.json
      word_pairs_train_en-tr.json
   nlvr2/
      word_pairs_dev_en-id.json
      word_pairs_dev_en-sw.json
      word_pairs_dev_en-ta.json
      word_pairs_dev_en-tr.json
      word_pairs_dev_en-zh-cn.json
      word_pairs_train_en-id.json
      word_pairs_train_en-sw.json
      word_pairs_train_en-ta.json
      word_pairs_train_en-tr.json
      word_pairs_train_en-zh-cn.json
   snli/
      word_pairs_dev_en-ar.json
      word_pairs_dev_en-es.json
      word_pairs_dev_en-fr.json
      word_pairs_dev_en-ru.json
      word_pairs_train_en-ar.json
      word_pairs_train_en-es.json
      word_pairs_train_en-fr.json
      word_pairs_train_en-ru.json
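
To confirm that the alignment step produced every file in the listing above, a small check along the following lines can help. This is a minimal sketch: the dataset and language table is copied from the listing, while the helper names `expected_word_pair_files` and `missing_word_pair_files` and the default `data` root are assumptions made for illustration and are not part of the repository.

```python
from pathlib import Path

# Target languages per dataset, copied from the expected listing above.
EXPECTED_LANGS = {
    "flickr30k": ["de", "es", "id", "ru", "tr"],
    "nlvr2": ["id", "sw", "ta", "tr", "zh-cn"],
    "snli": ["ar", "es", "fr", "ru"],
}
SPLITS = ("dev", "train")


def expected_word_pair_files(root="data"):
    """Hypothetical helper: paths of all word-pair files the listing shows."""
    root = Path(root)
    return [
        root / dataset / f"word_pairs_{split}_en-{lang}.json"
        for dataset, langs in EXPECTED_LANGS.items()
        for split in SPLITS
        for lang in langs
    ]


def missing_word_pair_files(root="data"):
    """Hypothetical helper: expected files that do not exist under root."""
    return [path for path in expected_word_pair_files(root) if not path.is_file()]


if __name__ == "__main__":
    missing = missing_word_pair_files("data")
    if missing:
        print("Missing word-pair files:")
        for path in missing:
            print(f"  {path}")
    else:
        print("All expected word-pair files are present.")
```

If any file is reported missing, rerunning the corresponding `token_alignment_*.sh` script for that dataset is the natural next step.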

Train CLiCoTEA

Train CLiCoTEA by running the following commands (to change the default options, edit the bash script):

# train CLiCoTEA for image/text retrieval on flickr30k in German
bash scripts/embeddings/train_clicotea.sh flickr30k albef_retrieval flickr de

# train CLiCoTEA for visual reasoning on NLVR2 in Swahili
bash scripts/embeddings/train_clicotea.sh nlvr2 albef_nlvr nlvr sw

# train CLiCoTEA for visual entailment on SNLI in French
bash scripts/embeddings/train_clicotea.sh snli albef_classification ve fr

Note that we start from pre-trained ALBEF models, which are available in the LAVIS package.

Zero-shot transfer to unseen languages

  1. Download the images of the downstream tasks from the official website:

Text data can be downloaded from the IGLUE Benchmark with:

bash scripts/zero-shot/download_datasets.sh
  2. Run zero-shot evaluation:
DATA_DIR="<path to folder containing test files>"
LANG="<language to test>"
FLICKR30K_IMAGE_ROOT="<place path to image folder>"
COCO_IMAGE_ROOT="<place path to image folder>"
MARVL_IMAGE_ROOT="<place path to image folder>"
PATH_TO_CHECKPOINT="<place path to model checkpoint>"
  • Retrieval task on xFlickrCO
bash scripts/zero-shot/zeroshot_retrieval.sh $DATA_DIR $LANG $FLICKR30K_IMAGE_ROOT $COCO_IMAGE_ROOT $PATH_TO_CHECKPOINT
  • Visual entailment task on XVNLI
bash scripts/zero-shot/zeroshot_ve.sh $DATA_DIR $LANG $FLICKR30K_IMAGE_ROOT $PATH_TO_CHECKPOINT
  • Visual reasoning task on MaRVL
bash scripts/zero-shot/zeroshot_vr.sh $DATA_DIR $LANG $MARVL_IMAGE_ROOT $PATH_TO_CHECKPOINT
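
When evaluating several languages in a row, it can be convenient to assemble these command lines programmatically. The sketch below only encodes the script paths and positional-argument order shown above. The function name `build_zeroshot_command`, the keyword names, and the placeholder paths are assumptions made for illustration and are not part of the repository.

```python
import shlex

# Script path and positional-argument order per task, copied from the commands above.
ZEROSHOT_TASKS = {
    "retrieval": (
        "scripts/zero-shot/zeroshot_retrieval.sh",
        ("data_dir", "lang", "flickr30k_image_root", "coco_image_root", "checkpoint"),
    ),
    "ve": (
        "scripts/zero-shot/zeroshot_ve.sh",
        ("data_dir", "lang", "flickr30k_image_root", "checkpoint"),
    ),
    "vr": (
        "scripts/zero-shot/zeroshot_vr.sh",
        ("data_dir", "lang", "marvl_image_root", "checkpoint"),
    ),
}


def build_zeroshot_command(task, **values):
    """Hypothetical helper: return the argument list for one evaluation run."""
    script, arg_names = ZEROSHOT_TASKS[task]
    missing = [name for name in arg_names if name not in values]
    if missing:
        raise ValueError(f"missing arguments for {task!r}: {', '.join(missing)}")
    return ["bash", script, *(str(values[name]) for name in arg_names)]


if __name__ == "__main__":
    # Placeholder paths; replace them with real locations before running anything.
    settings = dict(
        data_dir="data/marvl",
        marvl_image_root="images/marvl",
        checkpoint="checkpoint.pth",
    )
    for lang in ("id", "sw", "ta"):
        print(shlex.join(build_zeroshot_command("vr", lang=lang, **settings)))
```

The returned list can be passed to `subprocess.run(command, check=True)` to launch the evaluation instead of printing it.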

Running tests

Running all tests:

hatch run test:run

Or running a specific test:

hatch run test:run -k test_get_token_pairs

Citation

Please cite as:

@inproceedings{clicotea,
    title = "Stop Pre-Training: Adapt Visual-Language Models to Unseen Languages",
    author = "Karoui, Yasmine  and
      Lebret, R{\'e}mi  and
      Foroutan Eghlidi, Negar  and
      Aberer, Karl",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.acl-short.32",
    pages = "366--375",
}

clicotea's Issues

Averaging sub-token embeddings for token alignment

As described in Section 3.2 Word Alignment of the paper:

For words that are split into sub-word tokens, we consider either the left-most token embedding alignment (i.e., the first sub-word token of a word) or, the average embedding across all sub-word tokens.

The left-most token embedding alignment technique works the best overall, but the average embedding technique gives better performance on MaRVL for Swahili, Tamil, and Indonesian.

The average embedding technique needs to be implemented in the training of the token embedding alignment.
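
The difference between the two strategies can be illustrated with a small, framework-free sketch. It is not the repository's implementation: the function name `pool_word_embeddings`, the `word_ids` convention (one word index per sub-word token, `None` for special tokens, similar to what Hugging Face fast tokenizers expose), and the toy vectors are all assumptions made for illustration.

```python
def pool_word_embeddings(token_embeddings, word_ids, strategy="first"):
    """Collapse sub-word token embeddings into one vector per word.

    token_embeddings: list of equal-length lists of floats, one per token.
    word_ids: word index of each token, or None for special tokens.
    strategy: "first" keeps the left-most sub-word token embedding,
              "mean" averages over all sub-word tokens of the word.
    """
    groups = {}
    for vector, word_id in zip(token_embeddings, word_ids):
        if word_id is None:  # skip special tokens such as [CLS] / [SEP]
            continue
        groups.setdefault(word_id, []).append(vector)

    pooled = []
    for word_id in sorted(groups):
        vectors = groups[word_id]
        if strategy == "first":
            pooled.append(list(vectors[0]))
        elif strategy == "mean":
            # Average each embedding dimension across the word's sub-tokens.
            pooled.append([sum(dim) / len(vectors) for dim in zip(*vectors)])
        else:
            raise ValueError(f"unknown strategy: {strategy!r}")
    return pooled


# Toy example: "[CLS] play ##ing dog [SEP]" with 2-dimensional embeddings.
embeddings = [[0.0, 0.0], [1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [0.0, 0.0]]
word_ids = [None, 0, 0, 1, None]

first = pool_word_embeddings(embeddings, word_ids, strategy="first")
mean = pool_word_embeddings(embeddings, word_ids, strategy="mean")
print(first)  # [[1.0, 2.0], [5.0, 6.0]]
print(mean)   # [[2.0, 3.0], [5.0, 6.0]]
```

With either strategy, each aligned word pair contributes exactly one source vector and one target vector; only the way a multi-token word is reduced to a single vector changes.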
