
ATTEMPT: Attentional Mixture of Prompt Tuning

This repository includes the original implementation of Akari Asai, Mohammadreza Salehi, Matthew E. Peters, and Hannaneh Hajishirzi, "Parameter-Efficient Multi-task Tuning via Attentional Mixtures of Soft Prompts" (EMNLP 2022).

(Figure: ATTEMPT overview)

Acknowledgements: We used Hugging Face's transformers and datasets libraries. The implementations of the baselines are from the Compacter repository. Huge thanks to the contributors of those amazing repositories!

Content

  1. Installation
  2. ATTEMPT
  3. Baselines
  4. Trained checkpoints
  5. Citation and Contact

Installation

Please run the commands below to install the dependent libraries.

conda create -n attempt_env python=3.8
conda activate attempt_env
python setup.py develop

ATTEMPT

ATTEMPT consists of two-step training: Source Prompt Training and Target Prompt Training.

Training

  1. Source Prompt Training: ATTEMPT first trains a set of soft prompts on several large-scale datasets, which we call source prompts.

  2. Target Prompt Training: For a target task, ATTEMPT newly initializes a target task prompt as well as an attention module G, and learns to interpolate the source prompts and the new task prompt using the attention weights generated by G (a rough sketch of this interpolation is shown below).
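The interpolation in step 2 can be sketched as follows. This is a minimal PyTorch illustration, not the repository's actual code: the tensor shapes are made up, and the attention logits are faked with random numbers (in ATTEMPT they are produced by the sub-network G, conditioned on the input instance).

import torch

# Illustrative shapes: t source prompts, each of length l with embedding dim d.
t, l, d = 6, 100, 768
source_prompts = torch.randn(t, l, d)                    # frozen, pretrained source prompts
target_prompt = torch.randn(l, d, requires_grad=True)    # newly initialized target prompt

# In ATTEMPT these weights come from the attention module G; faked here for illustration.
logits = torch.randn(t + 1)
weights = torch.softmax(logits, dim=0)

# Interpolate the source prompts and the new target prompt with the attention weights.
all_prompts = torch.cat([source_prompts, target_prompt.unsqueeze(0)], dim=0)  # (t+1, l, d)
instance_prompt = (weights.view(-1, 1, 1) * all_prompts).sum(dim=0)           # (l, d)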

Source Prompt Training

python run_seq2seq.py prompt_tuning.json

You can download a set of pretrained source prompts by running the commands below:

cd attempt
wget https://homes.cs.washington.edu/~akari/models/attempt/source_prompts.zip
unzip source_prompts.zip
rm source_prompts.zip
cd ..

We provide source prompts for different sizes of T5 models (T5-base, T5-large, and T5-3B). Please see the Trained checkpoints section for more details.

Target Prompt Training (single-task)

Once you obtain the source prompts, you can run target prompt training.

python run_seq2seq.py configs/attempt/single_task.json

Target Prompt Training (multi-task)

To train ATTEMPT on multiple target tasks simultaneously, as discussed in Section 3.3 of our paper (Mixed-task Mini-Batch Training), you simply need to set multiple tasks for the task_name parameter (make sure you also set dataset_config_name; you can just add "en" for each task).

e.g.,

"task_name": ["superglue-boolq", "superglue-cb", "superglue-wic", "superglue-wsc.fixed"],
"dataset_config_name": ["en", "en", "en", "en"],

An example command to conduct multi-task training for SuperGLUE is as follows:

python run_seq2seq.py configs/attempt/multitask_superglue.json

Evaluation

You can run evaluations using the eval_seq2seq.py script.

  • Run a trained model on a single target task
python eval_seq2seq.py configs/attempt/eval_single_task.json
  • Run a trained model on multiple target tasks
python eval_seq2seq.py configs/attempt/eval_superglue.json

Baselines

As with ATTEMPT, you can configure the parameters in a config JSON file. See the config files for details of the hyper-parameters.

The Adapter, BitFit, Prompt Tuning, and fine-tuning baseline implementations are mostly from the awesome Compacter repository, with some minor modifications.

Standard Fine-tuning

A command to run standard fine-tuning is shown below.

python run_seq2seq.py configs/baselines/finetuning.json

Prompt tuning

Prompt Tuning (Lester et al., 2021) inserts a small embedding (prompt) in front of the input to a frozen LM. During training, only this prompt embedding is updated.

python run_seq2seq.py configs/baselines/prompt_tuning.json
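For intuition, prompt tuning boils down to prepending a small trainable embedding matrix to the (frozen) input embeddings. The sketch below is illustrative only, with made-up shapes; it is not the repository's implementation.

import torch

batch, seq_len, d = 8, 64, 768
prompt_len = 100

# Frozen input embeddings produced by the LM's embedding layer (faked here).
input_embeds = torch.randn(batch, seq_len, d)

# The only trainable parameters: a small prompt embedding matrix.
prompt = torch.nn.Parameter(torch.randn(prompt_len, d))

# Prepend the prompt to every example in the batch before the frozen encoder.
prompt_batch = prompt.unsqueeze(0).expand(batch, -1, -1)
extended_embeds = torch.cat([prompt_batch, input_embeds], dim=1)  # (batch, prompt_len + seq_len, d)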

SPoT

SPoT (Vu et al., 2022) initializes a target task prompt with a pretrained prompt to boost prompt tuning performance. To run the SPoT baseline, you first need to acquire source prompts using the prompt tuning method.

We also provide a set of trained source prompts. See instructions at the Trained checkpoints section.

python run_seq2seq.py configs/baselines/spot.json
Important config parameters
  • prompt_embedding_path (a list of str): a list of prompt embeddings you want to load.

  • load_prefix_embeddings (bool): always set to true for SPoT, so that your target task prompt is initialized with the prompt embedding passed via the prompt_embedding_path option.

  • save_prefix_only (bool): set to true if you want to save only the prompt embedding, which avoids copying and saving the untouched LM every time.
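Conceptually, SPoT initialization amounts to the sketch below. The file name and tensor shape are placeholders; in the repository, prompts are loaded via the prompt_embedding_path / load_prefix_embeddings options described above rather than this exact code.

import torch

prompt_len, d = 100, 768

# Load a source prompt trained with vanilla prompt tuning (path is a placeholder).
source_prompt = torch.load("source_prompts/mnli_prompt.pt")  # expected shape: (prompt_len, d)

# Initialize the target task prompt from it; only this tensor is trained.
target_prompt = torch.nn.Parameter(source_prompt.clone())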

Adapter

Adapter (Houlsby et al., 2019) inserts light-weight bottleneck layers into each transformer layer.

python run_seq2seq.py configs/baselines/adapter.json
Important config parameters
  • task_reduction_factor (int): controls how much the number of parameters in the Adapters is reduced. A bigger number means fewer parameters are updated. By default we set task_reduction_factor to 32, as in Mahabadi et al. (2021).
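For intuition, a standard bottleneck adapter looks roughly like the sketch below (an illustration, not the Compacter/ATTEMPT implementation). With task_reduction_factor set to 32 and a hidden size of 768, the bottleneck has 768 / 32 = 24 dimensions.

import torch

class Adapter(torch.nn.Module):
    def __init__(self, hidden_size=768, reduction_factor=32):
        super().__init__()
        bottleneck = hidden_size // reduction_factor  # 768 // 32 = 24
        self.down = torch.nn.Linear(hidden_size, bottleneck)
        self.up = torch.nn.Linear(bottleneck, hidden_size)
        self.act = torch.nn.ReLU()

    def forward(self, hidden_states):
        # Residual connection around the small bottleneck MLP.
        return hidden_states + self.up(self.act(self.down(hidden_states)))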

BitFit

BitFit (Zaken et al., 2022) only updates the bias terms of the original LM for each task.

python run_seq2seq.py configs/baselines/bitfit.json
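In effect, BitFit freezes everything except the bias terms, as in the minimal sketch below (illustrative only; the exact parameter selection used for T5 follows the paper's setup).

import torch

def apply_bitfit(model: torch.nn.Module) -> None:
    # Freeze every parameter, then re-enable gradients for bias terms only.
    for name, param in model.named_parameters():
        param.requires_grad = name.endswith("bias")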

Trained checkpoints

Source prompts

T5-base

To download the trained source prompts for T5-base, please run the commands below:

wget https://homes.cs.washington.edu/~akari/models/attempt/source_prompts.zip
unzip source_prompts.zip

T5-large

wget https://homes.cs.washington.edu/~akari/models/attempt/attempt_large_source.zip
unzip attempt_large_source.zip

T5-3B

wget https://homes.cs.washington.edu/~akari/models/attempt/attempt_3b_source.zip
unzip attempt_3b_source.zip

Pretrained attention weights

wget https://homes.cs.washington.edu/~akari/models/attempt/attn_pretrain_nlu.zip

Target task embeddings

The target task embeddings are available on Google Drive.

For example, you can download and reproduce our paper results by running the following commands.

  • SuperGLUE ATTEMPT-mt (attempt_mt_superglue.zip)
python eval_seq2seq.py configs/attempt/eval_superglue.json
  • GLUE ATTEMPT-mt (attempt_mt_glue.zip)
python eval_seq2seq.py configs/attempt/eval_glue.json

Note: the current eval_seq2seq.py script assumes that all tasks use the same metrics, so for tasks using different metrics, you need to run evaluation separately. Support for multiple metrics will be added later for an easier evaluation pipeline.

Citation and Contact

If you find this repository helpful, please cite our paper.

@inproceedings{asai2022attempt,
  title={Parameter-Efficient Multi-task Tuning via Attentional Mixtures of Soft Prompts},
  author={Asai, Akari and Salehi, Mohammadreza and Peters, Matthew E. and Hajishirzi, Hannaneh},
  booktitle={EMNLP},
  year={2022}
}

If you have any questions about the paper, feel free to contact Akari Asai (akari[at]cs.washington.edu) or open an issue and mention @AkariAsai.
