# DART
This repository contains the code for the paper "Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners".
## Requirements

- Python 3.6
- Install dependencies with:

```bash
pip install -r requirements.txt
```
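If you use conda (optional; any Python 3.6 environment works, and the environment name `dart` below is arbitrary), setup might look like:

```bash
conda create -n dart python=3.6
conda activate dart
pip install -r requirements.txt
```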
## Data Preparation

- Prepare the processed 16-shot GLUE datasets in the `data` folder by following the instructions here.
- This repository also supports training on the SuperGLUE datasets from the paper "GPT Understands, Too"; put its data folder `FewGLUE_32dev` inside the `data` folder.
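Based on the paths used in the commands further below, the resulting layout should look roughly like this (shown for `SST-2`; the exact set of task folders depends on which datasets you prepare):

```
data
├── k-shot
│   └── SST-2
│       ├── 16-13
│       ├── 16-21
│       ├── 16-42
│       ├── 16-87
│       └── 16-100
└── FewGLUE_32dev
```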
## How to Run

### Quick Start

- To run with the default settings (whose parameters might not be optimal), use `run.py` (see the example invocation after the help output below):
  - For this paper's setting, specify `encoder=inner2` or `encoder=inner`, where the `2` stands for the two-stage training described in the paper;
  - For the manual prompt setting, use `encoder=manual`;
  - For the P-Tuning setting from the paper "GPT Understands, Too", use `encoder=lstm`.
```
$ python run.py -h
usage: run.py [-h] [--encoder {manual,lstm,inner,inner2}] [--task TASK]
              [--num_splits NUM_SPLITS] [--repeat REPEAT] [--load_manual]
              [--extra_mask_rate EXTRA_MASK_RATE]

optional arguments:
  -h, --help            show this help message and exit
  --encoder {manual,lstm,inner,inner2}
  --task TASK
  --num_splits NUM_SPLITS
  --repeat REPEAT
  --load_manual
  --extra_mask_rate EXTRA_MASK_RATE
```
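For example, a single run in this paper's two-stage setting could be launched as follows (illustrative values; `--task` accepts the task names listed in the sweep help further below):

```bash
python run.py --encoder inner2 --task SST-2
```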
- After training, you can view the results in `output/[task]/[manual,lstm,inner,inner2]/16-[13,21,42,87,100]/result.txt`.
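To compare the per-split results side by side, a small helper script such as this hypothetical sketch can be used; it assumes only the directory layout above (here the task `SST-2` with encoder `inner2`):

```python
# Hypothetical helper: print result.txt for every seed split of one task/encoder pair.
from pathlib import Path

for path in sorted(Path("output/SST-2/inner2").glob("16-*/result.txt")):
    print(f"== split {path.parent.name} ==")
    print(path.read_text().strip())
```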
### Search for Optimal Hyperparameters with W&B

- Note that the results reported in our paper follow the same evaluation protocol as the paper "Making Pre-trained Language Models Better Few-shot Learners": the best result for each seed split, found through a parameter grid search, is aggregated across splits.
- To reproduce the results in our paper, please register at wandb, log in with your API key, and run `sweep.py`:
```
$ wandb login
# enter your api_key to log in to the wandb service...
$ python sweep.py -h
usage: sweep.py [-h]
                [--task {SST-2,sst-5,mr,cr,mpqa,subj,trec,CoLA,MNLI,MNLI-mm,SNLI,QNLI,RTE-glue,MRPC,QQP}]
                [--encoder {none,mlp,lstm,inner,inner2}]
                [--seed_split {13,21,42,87,100} [{13,21,42,87,100} ...]]
                [--batch_size {4,8,16,24,32} [{4,8,16,24,32} ...]]
                [--grad_accumulation_steps {1,2,4}]
                [--extra_mask_rate EXTRA_MASK_RATE] [--sweep_id SWEEP_ID]

optional arguments:
  -h, --help            show this help message and exit
  --task {SST-2,sst-5,mr,cr,mpqa,subj,trec,CoLA,MNLI,MNLI-mm,SNLI,QNLI,RTE-glue,MRPC,QQP}
  --encoder {none,mlp,lstm,inner,inner2}
  --seed_split {13,21,42,87,100} [{13,21,42,87,100} ...]
  --batch_size {4,8,16,24,32} [{4,8,16,24,32} ...]
  --grad_accumulation_steps {1,2,4}
  --extra_mask_rate EXTRA_MASK_RATE
  --sweep_id SWEEP_ID
```
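An illustrative sweep invocation, with values picked from the choices shown above (adjust to your hardware):

```bash
python sweep.py --task SST-2 --encoder inner2 \
    --seed_split 13 21 42 87 100 \
    --batch_size 8 16 \
    --grad_accumulation_steps 1
```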
- You can then view and gather the optimal results in detail on the wandb dashboard.
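Beyond the dashboard, the best run can also be pulled programmatically via the wandb public API; in the sketch below, the project path `your-entity/your-project` and the metric name `eval_acc` are placeholders, so substitute whatever your sweep actually logs:

```python
# Sketch: fetch the best sweep run through the wandb public API.
# "your-entity/your-project" and the metric "eval_acc" are assumed placeholders.
import wandb

api = wandb.Api()
runs = api.runs("your-entity/your-project")
best = max(runs, key=lambda run: run.summary.get("eval_acc", 0.0))
print(best.name, best.summary.get("eval_acc"))
```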
### Specified Running

- To run with specified parameters, interact with `cli.py` directly.
- Take the task `SST-2` as an example (training is performed on one split at a time):
```bash
# Two-stage training with the differentiable ("inner") prompt encoder on the 16-13 split of SST-2.
export CUDA_VISIBLE_DEVICES=0 &&
python3 cli.py \
  --data_dir data/k-shot/SST-2/16-13 \
  --model_type roberta \
  --model_name_or_path roberta-large \
  --cache_dir pretrain/roberta-large \
  --task_name sst-2 \
  --output_dir output/sst-2-inner2 \
  --do_eval \
  --do_train \
  --pet_per_gpu_eval_batch_size 8 \
  --pet_per_gpu_train_batch_size 16 \
  --pet_gradient_accumulation_steps 1 \
  --pet_max_seq_length 128 \
  --pet_max_steps 250 \
  --learning_rate 1e-4 \
  --eval_set "test" \
  --prompt_encoder_type "inner" \
  --two_stage_train
```
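- In this command, `--prompt_encoder_type "inner"` together with `--two_stage_train` corresponds to the `encoder=inner2` setting of `run.py` above; omitting `--two_stage_train` should correspond to the single-stage `inner` setting.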