
How to Distill your BERT: An Empirical Study on the Impact of Weight Initialisation and Distillation Objectives

Initial code release for the paper:

How to Distill your BERT: An Empirical Study on the Impact of Weight Initialisation and Distillation Objectives (ACL 2023)

Xinpeng Wang, Leonie Weissweiler, Hinrich Schütze and Barbara Plank.

Task-Specific-Distillation

We build on the Fairseq framework for task-specific distillation of the RoBERTa model.

Train

Run task_specific_distillation/experiments.py for task-specific distillation of the RoBERTa model.

python experiments.py  --task {task} --method {method} -e {experiment} -s {stage} --mapping {mapping} --init {init} --group {group} --seeds {seeds} 

task: mnli, qnli, sst-2, cola, mrpc, qqp, rte

method: kd, hidden_mse_learn, hidden_mse_token, crd, att_kl_learn, att_mse_learn
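Here, kd refers to vanilla logit distillation, while the hidden_mse_* and att_* variants match hidden states or attention maps instead. As a rough illustration only (not the repository's implementation), the standard soft-target KD objective with a temperature T can be sketched in plain Python as:

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax with temperature scaling; a higher T gives softer targets."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions,
    scaled by T^2 as in standard knowledge distillation (Hinton et al.)."""
    p = softmax(teacher_logits, temperature)  # teacher (target) distribution
    q = softmax(student_logits, temperature)  # student distribution
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl
```

The loss is zero when student and teacher logits agree and grows as their softened distributions diverge; the temperature and T^2 scaling here are illustrative defaults.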

Task-Agnostic-Distillation

The task-agnostic-distillation code is based on the work izsak-etal-2021-train.

Data Preparation

The dataset directory includes scripts to pre-process the datasets we used in our experiments (Wikipedia, BookCorpus). See the dedicated README for full details.

Pretrain

Run task_agnostic_distillation/experiments.py to distill a transformer model from BERT_large during the pre-training stage.

python -m torch.distributed.launch run_pretraining.py --method {distillation_objective} --student_initialize ... 

See task_agnostic_distillation/README.md for a complete bash example and a detailed explanation of all training configurations.
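Distilling hidden states or attention maps from a 24-layer BERT_large into a shallower student requires mapping each student layer to a teacher layer. As an illustrative sketch only (the mapping strategies actually studied in the paper and repository may differ), a simple uniform mapping can be written as:

```python
def uniform_layer_mapping(teacher_layers, student_layers):
    """Map each student layer to a teacher layer at uniform intervals.

    Illustrative only: returns a list m where student layer i distils
    from teacher layer m[i] (0-indexed). E.g. a 6-layer student of a
    24-layer teacher matches every 4th teacher layer.
    """
    step = teacher_layers // student_layers
    return [step * (i + 1) - 1 for i in range(student_layers)]
```

For a 6-layer student of BERT_large (24 layers), this yields teacher layers [3, 7, 11, 15, 19, 23]; other choices (e.g. matching only the last layers, or learning the mapping, as the *_learn objectives suggest) are also possible.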

Finetuning

Run task_agnostic_distillation/run_glue.py for finetuning a saved checkpoint on GLUE tasks.

Example:

python run_glue.py \
  --model_name_or_path <path to model> \
  --task_name MRPC \
  --max_seq_length 128 \
  --output_dir /tmp/finetuning \
  --overwrite_output_dir \
  --do_train --do_eval \
  --evaluation_strategy steps \
  --per_device_train_batch_size 32 --gradient_accumulation_steps 1 \
  --per_device_eval_batch_size 32 \
  --learning_rate 5e-5 \
  --weight_decay 0.01 \
  --eval_steps 50 \
  --max_grad_norm 1.0 \
  --num_train_epochs 5 \
  --lr_scheduler_type polynomial \
  --warmup_steps 50

Cite

@misc{wang2023distill,
      title={How to Distill your BERT: An Empirical Study on the Impact of Weight Initialisation and Distillation Objectives}, 
      author={Xinpeng Wang and Leonie Weissweiler and Hinrich Schütze and Barbara Plank},
      year={2023},
      eprint={2305.15032},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
