With the rapid development of the Internet of Things and other mobile devices, deploying faster and more efficient deep learning techniques demands increasing effort. I created this repository to explore and develop state-of-the-art techniques for accelerating computation and making deep learning more efficient, with a focus on faster sequence modeling. The base library is fairseq, an open-source sequence modeling toolkit developed and maintained by Facebook AI Research (FAIR).
- PyTorch version >= 1.4.0
- Python version >= 3.6
- For training new models, you'll also need an NVIDIA GPU and NCCL
- To install fairseq and develop locally:
```bash
git clone https://github.com/pytorch/fairseq
cd fairseq
pip install --editable ./

# on macOS:
# CFLAGS="-stdlib=libc++" pip install --editable ./
```
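You can sanity-check the editable install with a quick import (a minimal check; `fairseq.__version__` is defined in recent releases):
```bash
python -c "import fairseq; print(fairseq.__version__)"
```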
- For faster training install NVIDIA's apex library:
```bash
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" \
    --global-option="--deprecated_fused_adam" --global-option="--xentropy" \
    --global-option="--fast_multihead_attn" ./
```
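If the build succeeded, the package should import cleanly (this only verifies the Python package itself, not every compiled extension):
```bash
python -c "import apex"
```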
- For large datasets install PyArrow:
```bash
pip install pyarrow
```
- If you use Docker, make sure to increase the shared memory size, either with `--ipc=host` or `--shm-size`, as command line options to `nvidia-docker run`.
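For example (a hypothetical invocation; replace `<image>` with your CUDA-enabled PyTorch image):
```bash
nvidia-docker run --ipc=host --rm -it <image> bash
```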
List of implemented techniques (ongoing)
- Transformer (self-attention) networks
The authors from MIT-HAN-Lab have already published their fairseq-based code, which you can check out here. Additionally, I re-implemented the Lite-Transformer, since the authors seem to have used an old version of fairseq, which may cause conflicts, especially when you need to apply the `incremental_state` function. Before you test the model, please make sure you install the CUDA versions of lightConv and dynamicConv by:
```bash
# Build and install the CUDA kernel for LightConv
cd fairseq/modules/lightconv_layer
python cuda_function_gen.py
python setup.py install

# Build and install the CUDA kernel for DynamicConv
cd ../dynamicconv_layer
python cuda_function_gen.py
python setup.py install
```
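You can verify that both extensions built correctly by importing them (a minimal check; the module names `lightconv_cuda` and `dynamicconv_cuda` are assumed from the corresponding `setup.py` scripts):
```bash
python -c "import lightconv_cuda; import dynamicconv_cuda"
```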
Here we use the IWSLT'14 German-English dataset as an example to demonstrate how to train a new Lite-Transformer.
First, download and preprocess the data:
```bash
# Download and prepare the data
cd examples/translation/
bash prepare-iwslt14.sh
cd ../..

# Preprocess/binarize the data
TEXT=examples/translation/iwslt14.tokenized.de-en
fairseq-preprocess --source-lang de --target-lang en \
    --trainpref $TEXT/train --validpref $TEXT/valid --testpref $TEXT/test \
    --destdir data-bin/iwslt14.tokenized.de-en \
    --workers 20
```
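After preprocessing, the destination directory should contain the source/target dictionaries and binarized splits, roughly as follows (exact file names may vary with the fairseq version):
```bash
ls data-bin/iwslt14.tokenized.de-en
# dict.de.txt  dict.en.txt
# train.de-en.de.bin  train.de-en.de.idx  train.de-en.en.bin  train.de-en.en.idx
# (plus the corresponding valid.* and test.* files)
```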
Next we'll train a new Lite-Transformer translation model on this data. We choose the transformer_multibranch_iwslt_de_en architecture and save the training log to 'train.log':
```bash
# The shell redirect below needs the save directory to exist up front
mkdir -p checkpoints/transformer_multibranch

CUDA_VISIBLE_DEVICES=0 fairseq-train \
    data-bin/iwslt14.tokenized.de-en \
    --arch transformer_multibranch_iwslt_de_en \
    --share-decoder-input-output-embed \
    --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \
    --lr 5e-4 --lr-scheduler inverse_sqrt --warmup-updates 4000 \
    --dropout 0.2 --weight-decay 0.0001 \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
    --max-tokens 4096 --max-update 50000 \
    --encoder-branch-type attn:1:80:4 lightweight:default:80:4 \
    --decoder-branch-type attn:1:80:4 lightweight:default:80:4 \
    --weight-dropout 0.1 \
    --encoder-embed-dim 160 --decoder-embed-dim 160 \
    --encoder-ffn-embed-dim 160 --decoder-ffn-embed-dim 160 \
    --save-dir checkpoints/transformer_multibranch \
    --tensorboard-logdir checkpoints/transformer_multibranch/log \
    > checkpoints/transformer_multibranch/train.log
```
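While training runs, you can follow progress either in the log file or in TensorBoard (assuming `tensorboard` is installed in your environment):
```bash
tail -f checkpoints/transformer_multibranch/train.log
tensorboard --logdir checkpoints/transformer_multibranch/log
```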
Finally we can evaluate our trained model:
```bash
fairseq-generate data-bin/iwslt14.tokenized.de-en \
    --path checkpoints/transformer_multibranch/checkpoint_best.pt \
    --batch-size 128 --beam 4 --remove-bpe --lenpen 0.6
```
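To translate ad-hoc sentences with the same checkpoint, `fairseq-interactive` also works (input must be tokenized and BPE-encoded the same way as the training data):
```bash
fairseq-interactive data-bin/iwslt14.tokenized.de-en \
    --path checkpoints/transformer_multibranch/checkpoint_best.pt \
    --beam 4 --remove-bpe
```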
Going forward, I will focus on state-of-the-art techniques for faster sequence modeling: re-implementing and validating them, and keeping track of all the effective approaches to efficient and accelerated computation in deep learning, built on fairseq. (More is coming...)
The full documentation contains instructions for getting started, training new models, and extending fairseq with new model types and tasks. For complete details, please check the fairseq website: https://github.com/pytorch/fairseq.
fairseq provides pre-trained models and pre-processed, binarized test sets for several tasks listed below, as well as example training and evaluation commands.
- Translation: convolutional and transformer models are available
- Language Modeling: convolutional and transformer models are available
We also have more detailed READMEs to reproduce results from specific papers:
- Training with Quantization Noise for Extreme Model Compression
- Neural Machine Translation with Byte-Level Subwords (Wang et al., 2020)
- Multilingual Denoising Pre-training for Neural Machine Translation (Liu et al., 2020)
- Jointly Learning to Align and Translate with Transformer Models (Garg et al., 2019)
- Levenshtein Transformer (Gu et al., 2019)
- Facebook FAIR's WMT19 News Translation Task Submission (Ng et al., 2019)
- RoBERTa: A Robustly Optimized BERT Pretraining Approach (Liu et al., 2019)
- wav2vec: Unsupervised Pre-training for Speech Recognition (Schneider et al., 2019)
- Mixture Models for Diverse Machine Translation: Tricks of the Trade (Shen et al., 2019)
- Pay Less Attention with Lightweight and Dynamic Convolutions (Wu et al., 2019)
- Understanding Back-Translation at Scale (Edunov et al., 2018)
- Classical Structured Prediction Losses for Sequence to Sequence Learning (Edunov et al., 2018)
- Hierarchical Neural Story Generation (Fan et al., 2018)
- Scaling Neural Machine Translation (Ott et al., 2018)
- Convolutional Sequence to Sequence Learning (Gehring et al., 2017)
- Language Modeling with Gated Convolutional Networks (Dauphin et al., 2017)