FasterSeq

About FasterSeq

With the rapid development of the Internet of Things and other mobile devices, deploying faster and more efficient deep learning techniques requires more effort. I created this repository to explore and develop state-of-the-art computation acceleration and efficient deep learning techniques for faster sequence modeling. The base library is fairseq, an open-source sequence modeling toolkit developed and maintained by Facebook AI Research.

Requirements and Installation

  • PyTorch version >= 1.4.0
  • Python version >= 3.6
  • For training new models, you'll also need an NVIDIA GPU and NCCL
  • To install fairseq and develop locally:
git clone https://github.com/pytorch/fairseq
cd fairseq
pip install --editable ./

# on macOS:
# CFLAGS="-stdlib=libc++" pip install --editable ./
  • For faster training install NVIDIA's apex library:
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" \
  --global-option="--deprecated_fused_adam" --global-option="--xentropy" \
  --global-option="--fast_multihead_attn" ./
  • For large datasets install PyArrow: pip install pyarrow
  • If you use Docker make sure to increase the shared memory size either with --ipc=host or --shm-size as command line options to nvidia-docker run.
List of implemented techniques (ongoing)

The authors from MIT-HAN-Lab have already published their fairseq-based code, which one may check out here. Additionally, I re-implemented the Lite-Transformer, since the authors appear to have used an old version of fairseq, which may cause conflicts, especially when you need to use the incremental_state function. Before you test the model, please make sure you install the CUDA versions of lightConv and dynamicConv by:

cd fairseq/modules/lightconv_layer
python cuda_function_gen.py
python setup.py install
cd fairseq/modules/dynamicconv_layer
python cuda_function_gen.py
python setup.py install
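
The lightConv kernel built above implements the lightweight convolution used in Lite-Transformer's convolutional branch. As a rough illustration of what it computes, here is a pure-Python, single-channel sketch; the real CUDA kernel operates on batched, multi-head tensors with weights shared across channel groups, but the key idea is the same: the kernel is softmax-normalized, so each output is a convex combination of a local window.

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of logits
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def lightweight_conv_1d(seq, kernel_logits):
    """Single-channel sketch of a lightweight convolution:
    softmax-normalize the kernel, then convolve with zero padding,
    so every output is a convex combination of nearby inputs."""
    w = softmax(kernel_logits)
    k = len(w)
    pad = k // 2
    padded = [0.0] * pad + list(seq) + [0.0] * pad
    return [sum(w[j] * padded[i + j] for j in range(k))
            for i in range(len(seq))]

# with equal logits the softmax kernel is uniform (1/3 each),
# so this is a simple moving average
out = lightweight_conv_1d([1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 0.0])
```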

Here we use the IWSLT'14 German-English dataset as an example to demonstrate how to train a new Lite-Transformer.

First download and preprocess the data:

# Download and prepare the data
cd examples/translation/
bash prepare-iwslt14.sh
cd ../..

# Preprocess/binarize the data
TEXT=examples/translation/iwslt14.tokenized.de-en
fairseq-preprocess --source-lang de --target-lang en \
    --trainpref $TEXT/train --validpref $TEXT/valid --testpref $TEXT/test \
    --destdir data-bin/iwslt14.tokenized.de-en \
    --workers 20
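
fairseq-preprocess builds source- and target-side dictionaries and replaces each token with its integer index before writing the binarized dataset. A simplified sketch of that mapping (fairseq's real Dictionary has more special symbols, frequency thresholds, and writes binary index files; the helper names below are illustrative):

```python
from collections import Counter

def build_dictionary(sentences, specials=("<pad>", "</s>", "<unk>")):
    # count token frequencies, then assign indices: special symbols
    # first, remaining tokens by descending frequency (ties by token)
    counts = Counter(tok for s in sentences for tok in s.split())
    vocab = list(specials) + [
        t for t, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    ]
    return {tok: i for i, tok in enumerate(vocab)}

def binarize(sentence, dictionary):
    # unknown tokens map to the <unk> index
    unk = dictionary["<unk>"]
    return [dictionary.get(tok, unk) for tok in sentence.split()]

d = build_dictionary(["wir gehen", "wir bleiben"])
ids = binarize("wir fliegen", d)  # "fliegen" is out-of-vocabulary
```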

Next we'll train a new Lite-Transformer translation model on this data: we choose the transformer_multibranch_iwslt_de_en architecture and save the training log to the train.log file.

CUDA_VISIBLE_DEVICES=0 fairseq-train \
    data-bin/iwslt14.tokenized.de-en \
    --arch transformer_multibranch_iwslt_de_en \
    --share-decoder-input-output-embed \
    --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \
    --lr 5e-4 --lr-scheduler inverse_sqrt --warmup-updates 4000 \
    --dropout 0.2 --weight-decay 0.0001 \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
    --max-tokens 4096 --max-update 50000 \
    --encoder-branch-type attn:1:80:4 lightweight:default:80:4 \
    --decoder-branch-type attn:1:80:4 lightweight:default:80:4 \
    --weight-dropout 0.1 \
    --encoder-embed-dim 160 --decoder-embed-dim 160 \
    --encoder-ffn-embed-dim 160 --decoder-ffn-embed-dim 160 \
    --save-dir checkpoints/transformer_multibranch \
    --tensorboard-logdir checkpoints/transformer_multibranch/log > checkpoints/transformer_multibranch/train.log
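
The --lr-scheduler inverse_sqrt flag combined with --warmup-updates 4000 warms the learning rate up linearly to --lr, then decays it proportionally to the inverse square root of the update number. A sketch of the schedule's shape (not the library implementation):

```python
import math

def inverse_sqrt_lr(step, base_lr=5e-4, warmup_updates=4000, warmup_init_lr=0.0):
    """Linear warmup from warmup_init_lr to base_lr over warmup_updates
    steps, then decay as base_lr * sqrt(warmup_updates / step)."""
    if step < warmup_updates:
        return warmup_init_lr + step * (base_lr - warmup_init_lr) / warmup_updates
    return base_lr * math.sqrt(warmup_updates) / math.sqrt(step)

inverse_sqrt_lr(2000)   # halfway through warmup: 2.5e-4
inverse_sqrt_lr(4000)   # peak: 5e-4
inverse_sqrt_lr(16000)  # 4x past warmup: 5e-4 * sqrt(1/4) = 2.5e-4
```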

Finally we can evaluate our trained model:

fairseq-generate data-bin/iwslt14.tokenized.de-en \
    --path checkpoints/transformer_multibranch/checkpoint_best.pt \
    --batch-size 128 --beam 4 --remove-bpe --lenpen 0.6
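
The --lenpen 0.6 flag length-normalizes beam hypotheses: each candidate's summed log-probability is divided by its length raised to the penalty, which softens the bias toward shorter translations. A minimal sketch of that rescoring (a hypothesis here is just a summed log-probability and a length):

```python
def rescore(log_prob_sum, length, lenpen=0.6):
    """Length-normalized beam score: with lenpen < 1 the normalization
    only partially compensates the preference for short outputs."""
    return log_prob_sum / (length ** lenpen)

# a short hypothesis with higher total log-prob vs. a longer one
short = rescore(-4.0, 4)
long_ = rescore(-7.0, 10)
best = max([("short", short), ("long", long_)], key=lambda kv: kv[1])
```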

Going forward, I will focus on state-of-the-art faster sequence modeling techniques, including re-implementing and validating them, and try to keep track of all the effective techniques for efficient and accelerated computation in deep learning, based on fairseq. (More is coming...)

Getting Started

The full documentation contains instructions for getting started, training new models, and extending fairseq with new model types and tasks. For complete details, please check the fairseq website: https://github.com/pytorch/fairseq.

Pre-trained models and examples

fairseq provides pre-trained models and pre-processed, binarized test sets for several tasks, as well as example training and evaluation commands.

It also has more detailed READMEs for reproducing results from specific papers.

