
Learning Deep Transformer Models for Machine Translation on Fairseq

The implementation of "Learning Deep Transformer Models for Machine Translation" (ACL 2019) by Qiang Wang, Bei Li, Tong Xiao, Jingbo Zhu, Changliang Li, Derek F. Wong, and Lidia S. Chao.

This code is based on Fairseq v0.5.0
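The core idea of DLCL (dynamic linear combination of layers) is that each layer receives a learned linear combination of the outputs of all preceding layers (including the embedding), rather than only the output of the immediately previous layer. A minimal NumPy sketch of that combination step, simplified for illustration: it omits the layer normalization applied before combining, and the function and variable names are not from this codebase.

```python
import numpy as np

def dlcl_input(memory, w):
    """Compute the input to the next layer as a learned linear combination
    of all previously computed layer outputs (index 0 is the embedding).

    memory: list of arrays, each of shape (seq_len, d_model)
    w:      one learned scalar weight per entry in `memory`
    """
    return sum(wk * yk for wk, yk in zip(w, memory))

# Toy check: with uniform weights this reduces to the mean of the memory.
mem = [np.full((4, 8), float(i)) for i in range(3)]  # outputs of 3 "layers"
x = dlcl_input(mem, [1 / 3, 1 / 3, 1 / 3])           # every entry is (0+1+2)/3
```

In the paper, the weights form a learnable lower-triangular matrix, so each layer l has its own weight vector over layers 0..l; the sketch above shows the combination for a single layer.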

Installation

  1. pip install -r requirements.txt
  2. python setup.py develop
  3. python setup.py install

NOTE: tested with torch==0.4.1

Prepare Training Data

  1. Download the preprocessed WMT'16 En-De dataset provided by Google to the project root directory

  2. Generate the binarized dataset at data-bin/wmt16_en_de_google

bash runs/prepare-wmt-en2de.sh

Train

Train deep pre-norm baseline (20-layer encoder)

bash runs/train-wmt-en2de-deep-prenorm-baseline.sh

Train deep post-norm DLCL (25-layer encoder)

bash runs/train-wmt-en2de-deep-postnorm-dlcl.sh

Train deep pre-norm DLCL (30-layer encoder)

bash runs/train-wmt-en2de-deep-prenorm-dlcl.sh

NOTE: BLEU is calculated automatically when training finishes
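For reference, BLEU combines modified n-gram precisions (n = 1..4) with a brevity penalty that punishes hypotheses shorter than the reference. A minimal single-sentence sketch of the computation; the training scripts presumably use fairseq's own scorer, which additionally handles corpus-level aggregation, tokenization, and smoothing:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(hypothesis, reference, max_n=4):
    """Unsmoothed sentence-level BLEU with the standard brevity penalty.
    Illustration only: returns 0.0 if any n-gram precision is zero."""
    precisions = []
    for n in range(1, max_n + 1):
        hyp, ref = ngrams(hypothesis, n), ngrams(reference, n)
        overlap = sum((hyp & ref).values())        # clipped n-gram matches
        total = max(sum(hyp.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0
    log_avg = sum(math.log(p) for p in precisions) / max_n
    bp = min(1.0, math.exp(1 - len(reference) / len(hypothesis)))
    return bp * math.exp(log_avg)
```

A perfect match scores 1.0 (often reported as 100 after scaling), and a hypothesis sharing no n-grams with the reference scores 0.0.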

Results

| Model | #Param. | Epoch* | BLEU |
| --- | --- | --- | --- |
| Transformer (base) | 65M | 20 | 27.3 |
| Transparent Attention (base, 16L) | 137M | - | 28.0 |
| Transformer (big) | 213M | 60 | 28.4 |
| RNMT+ (big) | 379M | 25 | 28.5 |
| Layer-wise Coordination (big) | 210M* | - | 29.0 |
| Relative Position Representations (big) | 210M | 60 | 29.2 |
| Deep Representation (big) | 356M | - | 29.2 |
| Scaling NMT (big) | 210M | 70 | 29.3 |
| Our deep pre-norm Transformer (base, 20L) | 106M | 20 | 28.9 |
| Our deep post-norm DLCL (base, 25L) | 121M | 20 | 29.2 |
| Our deep pre-norm DLCL (base, 30L) | 137M | 20 | 29.3 |

NOTE: * denotes approximate values.

dlcl's People

Contributors

  • wangqiangneu


dlcl's Issues

method of application

Excuse me, can I use Transformer-DLCL the same way as a standard Transformer? How do I use it concretely, and can I call it directly? My code is in TensorFlow.

RuntimeError: Model parameter did not receive gradient: encoder.embed_tokens.weight. Use the param in the forward pass or set requires_grad=False

@wangqiangneu Hello, I have a question. When I train with train-wmt-en2de-deep-prenorm-dlcl.sh, the error below occurs. Following the error message, I could not find what to fix in _forward(), nor where requires_grad is set, so how should I configure things to train normally?

Traceback (most recent call last):
File "/home/zwc/python-virtual-environments/dlcl-master/multiprocessing_train.py", line 46, in run
single_process_main(args)
File "/home/zwc/python-virtual-environments/dlcl-master/train.py", line 126, in main
train(args, trainer, task, epoch_itr)
File "/home/zwc/python-virtual-environments/dlcl-master/train.py", line 172, in train
log_output = trainer.train_step(sample, update_params=True)
File "/home/zwc/python-virtual-environments/dlcl-master/fairseq/trainer.py", line 152, in train_step
grad_norm = self._all_reduce_and_rescale(grad_denom)
File "/home/zwc/python-virtual-environments/dlcl-master/fairseq/trainer.py", line 228, in _all_reduce_and_rescale
flat_grads = self._flat_grads = self._get_flat_grads(self._flat_grads)
File "/home/zwc/python-virtual-environments/dlcl-master/fairseq/trainer.py", line 253, in _get_flat_grads
grads = self._get_grads()
File "/home/zwc/python-virtual-environments/dlcl-master/fairseq/trainer.py", line 247, in _get_grads
raise RuntimeError('Model parameter did not receive gradient: ' + name + '. '
RuntimeError: Model parameter did not receive gradient: encoder.embed_tokens.weight. Use the param in the forward pass or set requires_grad=False

Dataset link expired

Hello, the link to the dataset seems to have expired. Could you generate a new one? Thanks!
