mixed-gradient-few-shot's Introduction

Mixed-Gradient-Few-Shot

This is the repository for the Findings of NAACL 2022 paper "Por Qué Não Utiliser Alla Språk? Mixed Training with Gradient Optimization in Few-Shot Cross-Lingual Transfer".

Our code is adapted from the XTREME benchmark.

Model Card

To make the method easier to locate, sgs_card.py shows an example of how we implement the stochastic gradient surgery function.

Prerequisites

The first step is to build the virtual environment.

conda create --name xtreme --file conda-env.txt
conda activate xtreme
bash install_tools.sh

Download the Data

There are two ways to download the data. The first is easier: download our zipped data directly from Google Drive:

gdown https://drive.google.com/uc?id=1uA6QHGQ9iBqhYe45A54EPBUwsFTmHxec
unzip download.zip
rm download.zip

All data will be located in the download/ folder.

The second way is to run the download script:

bash scripts/download_data.sh

Training

To conduct few-shot learning on various tasks, we run:

bash scripts/train.sh [MODEL] [TASK] [SEED] [FEW_SHOT] [GPU]

The model and results will be stored at outputs/seed-${SEED}/

For example, the following runs 5-shot learning by fine-tuning xlm-roberta-large on the NER task (panx) with random seed 1 on GPU 0:

bash scripts/train.sh xlm-roberta-large panx 1 5 0

Note that our codebase only supports stochastic gradient surgery for the 4 tasks presented in the paper, i.e. panx, udpos, xnli, and tydiqa. The 5 seeds used in the paper are 1, 2, 3, 4, and 5.
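
Repeating a run over the five seeds can be scripted. The following is a minimal sketch, not part of the repository: it assumes the argument order of scripts/train.sh shown above, and the names `is_supported_task`, `run_all_seeds`, and `DRY_RUN` are hypothetical helpers introduced only for illustration. By default it only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and DRY_RUN are assumptions, not repository features.

MODEL="xlm-roberta-large"
TASK="panx"
FEW_SHOT=5
GPU=0
# DRY_RUN=1 prints the commands instead of executing them (default, safe to paste).
DRY_RUN="${DRY_RUN:-1}"

# Succeeds only for the four tasks the README lists as supported.
is_supported_task() {
  case "$1" in
    panx|udpos|xnli|tydiqa) return 0 ;;
    *) return 1 ;;
  esac
}

# Build (and optionally run) one train.sh call per seed used in the paper.
run_all_seeds() {
  local seed
  if ! is_supported_task "${TASK}"; then
    echo "unsupported task: ${TASK}" >&2
    return 1
  fi
  for seed in 1 2 3 4 5; do
    local cmd=(bash scripts/train.sh "${MODEL}" "${TASK}" "${seed}" "${FEW_SHOT}" "${GPU}")
    if [ "${DRY_RUN}" = "1" ]; then
      echo "${cmd[*]}"
    else
      "${cmd[@]}"
    fi
  done
}

run_all_seeds
```

Setting `DRY_RUN=0` in the environment would execute the commands sequentially on the chosen GPU; each run writes to its own outputs/seed-${SEED}/ folder, so the runs do not overwrite each other.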

Results

The results on the test sets for all target languages are written to a test_results.txt file under the outputs folder. For example, they are located at:

for panx: outputs/seed-1/panx/xlm-roberta-large-LR2e-5-epoch10-MaxLen128/test_results.txt
for udpos: outputs/seed-1/udpos/xlm-roberta-large-LR2e-5-epoch10-MaxLen128/test_results.txt
for tydiqa: outputs/seed-1/xlm-roberta-base_LR3e-5_EPOCH30_maxlen384_batchsize4_gradacc8/predictions/test_results.txt
for xnli: outputs/seed-1/xlm-roberta-base-LR2e-5-epoch5-MaxLen128/test_results.txt
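
Because the exact run-folder name differs by task, it can be simpler to search for the result files than to type each path. The sketch below is an illustration under assumptions: it relies only on the file name test_results.txt and the outputs folder mentioned above, and `list_results` and `show_results` are hypothetical helper names, not scripts shipped with the repository.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are assumptions, not repository features.

# List every test_results.txt below a root folder (default: outputs), sorted by path.
list_results() {
  local root="${1:-outputs}"
  find "${root}" -type f -name 'test_results.txt' | sort
}

# Print each results file preceded by a header line with its path.
show_results() {
  local root="${1:-outputs}"
  local f
  list_results "${root}" | while IFS= read -r f; do
    echo "== ${f}"
    cat "${f}"
  done
}

# Only run when an outputs folder exists in the current directory.
if [ -d outputs ]; then
  show_results outputs
fi
```

Passing a narrower root, for example `show_results outputs/seed-1`, limits the listing to a single seed.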

mixed-gradient-few-shot's Issues

About dataset

Hi, really nice work! I have a question. I notice that you compare your work with https://aclanthology.org/2021.acl-long.447.pdf in the XNLI experiment, but it seems that you didn't experiment on their few-shot dataset (https://github.com/fsxlt/buckets), which is important for model scores. Is that perhaps because their few-shot dataset has too many buckets (40)? I'm also curious about the NAACL reviewers' opinions on this point. Was that acceptable? Thanks!
