
HAN_NMT

Description

Implementation of the paper "Document-Level Neural Machine Translation with Hierarchical Attention Networks" (Miculicich et al., EMNLP 2018). It is based on OpenNMT-py (v. 2.1): https://github.com/OpenNMT/OpenNMT-py

This is a restricted version: it DOES NOT support data shards or multimodal translation.

Preprocess

The data, as for any NMT baseline, consist of a source file and a target file aligned at the sentence level. However, the sentences must be kept in document order (i.e. not shuffled). Additionally, the model requires a file (doc_file) indicating the beginning of each document in the source file: each line of the doc_file gives the 0-indexed line number in the source file at which a new document starts.

Example:

0
10
25

There are 3 documents: the first spans lines 0 to 9, the second lines 10 to 24, and the third line 25 to the end.
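
For concreteness, here is a minimal sketch (not part of this repo; file and variable names are hypothetical) of how a doc_file can be written while concatenating per-document source files:

doc_paths = ["doc1.src", "doc2.src", "doc3.src"]  # one file per document, sentences in order

line_count = 0
with open("train.src", "w") as src_out, open("train.doc", "w") as doc_out:
    for path in doc_paths:
        doc_out.write(str(line_count) + "\n")  # 0-indexed line where this document starts
        with open(path) as f:
            for sentence in f:
                src_out.write(sentence)
                line_count += 1

The same procedure applies to the target side, which must stay line-aligned with the source.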

Command:

python preprocess.py -train_src [source_file] -train_tgt [target_file] -train_doc [doc_file] \
    -valid_src [source_dev_file] -valid_tgt [target_dev_file] -valid_doc [doc_dev_file] \
    -save_data [out_file]

The folder preprocess_TED_zh-en contains the files to preprocess the TED Talks zh-en dataset from https://wit3.fbk.eu/mt.php?release=2015-01.

Training

Training the sentence-level NMT baseline:

python train.py -data [data_set] -save_model [sentence_level_model] \
    -encoder_type transformer -decoder_type transformer -enc_layers 6 -dec_layers 6 \
    -label_smoothing 0.1 -src_word_vec_size 512 -tgt_word_vec_size 512 -rnn_size 512 \
    -position_encoding -dropout 0.1 -batch_size 4096 -start_decay_at 20 -report_every 500 \
    -epochs 20 -gpuid 0 -max_generator_batches 16 -batch_type tokens -normalization tokens \
    -accum_count 4 -optim adam -adam_beta2 0.998 -decay_method noam -warmup_steps 8000 \
    -learning_rate 2 -max_grad_norm 0 -param_init 0 -param_init_glorot \
    -train_part sentences

Training HAN-encoder using the sentence-level NMT model:

python train.py -data [data_set] -save_model [HAN_enc_model] \
    -encoder_type transformer -decoder_type transformer -enc_layers 6 -dec_layers 6 \
    -label_smoothing 0.1 -src_word_vec_size 512 -tgt_word_vec_size 512 -rnn_size 512 \
    -position_encoding -dropout 0.1 -batch_size 1024 -start_decay_at 2 -report_every 500 \
    -epochs 1 -gpuid 0 -max_generator_batches 32 -batch_type tokens -normalization tokens \
    -accum_count 4 -optim adam -adam_beta2 0.998 -decay_method noam -warmup_steps 8000 \
    -learning_rate 2 -max_grad_norm 0 -param_init 0 -param_init_glorot \
    -train_part all -context_type HAN_enc -context_size 3 -train_from [sentence_level_model]

Training HAN-decoder using the sentence-level NMT model:

python train.py -data [data_set] -save_model [HAN_dec_model] \
    -encoder_type transformer -decoder_type transformer -enc_layers 6 -dec_layers 6 \
    -label_smoothing 0.1 -src_word_vec_size 512 -tgt_word_vec_size 512 -rnn_size 512 \
    -position_encoding -dropout 0.1 -batch_size 1024 -start_decay_at 2 -report_every 500 \
    -epochs 1 -gpuid 0 -max_generator_batches 32 -batch_type tokens -normalization tokens \
    -accum_count 4 -optim adam -adam_beta2 0.998 -decay_method noam -warmup_steps 8000 \
    -learning_rate 2 -max_grad_norm 0 -param_init 0 -param_init_glorot \
    -train_part all -context_type HAN_dec -context_size 3 -train_from [sentence_level_model]

Training HAN-joint using the HAN-encoder model:

python train.py -data [data_set] -save_model [HAN_joint_model] \
    -encoder_type transformer -decoder_type transformer -enc_layers 6 -dec_layers 6 \
    -label_smoothing 0.1 -src_word_vec_size 512 -tgt_word_vec_size 512 -rnn_size 512 \
    -position_encoding -dropout 0.1 -batch_size 1024 -start_decay_at 2 -report_every 500 \
    -epochs 1 -gpuid 0 -max_generator_batches 32 -batch_type tokens -normalization tokens \
    -accum_count 4 -optim adam -adam_beta2 0.998 -decay_method noam -warmup_steps 8000 \
    -learning_rate 2 -max_grad_norm 0 -param_init 0 -param_init_glorot \
    -train_part all -context_type HAN_join -context_size 3 -train_from [HAN_enc_model]

Input options:

  • train_part: [sentences, context, all]
  • context_type: [HAN_enc, HAN_dec, HAN_join, HAN_dec_source, HAN_dec_context]
  • context_size: number of previous sentences used as context (see the sketch below)
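
To make the context_size option concrete, here is a small sketch of its semantics (an assumption based on the paper, not code from this repo): for sentence i, the context is the previous k sentences of the same document, so the window is truncated at each document start.

def context_indices(i, doc_starts, k=3):
    # doc_starts: sorted 0-indexed line numbers where documents begin,
    # as listed in the doc_file (e.g. [0, 10, 25])
    start = max(s for s in doc_starts if s <= i)  # start of sentence i's document
    return list(range(max(start, i - k), i))

# With doc_starts = [0, 10, 25] and k = 3, sentence 11 only gets [10] as
# context, while sentence 15 gets [12, 13, 14].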

NOTE: The transformer model is sensitive to variations in hyperparameters. The HAN is also sensitive to the batch size.

Translation

Translation is done sentence by sentence, even though this is not necessary for HAN_enc or the baseline (this could be improved); a sketch of the decoding loop follows the options below.

python translate.py -model [model] -src [test_source_file] -doc [test_doc_file] \
    -output [out_file] -translate_part all -batch_size 1000 -gpu 0

Input options:

  • translate_part: [sentences, all]
  • batch_size: maximum number of sentences to keep in memory at once.
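
The sentence-by-sentence constraint matters for the decoder-side models: each sentence attends over translations that were already produced, so targets must be generated in document order. A rough sketch of that loop, where translate_one is a hypothetical placeholder rather than a function from this repo:

def translate_one(src, src_context, tgt_context):
    # hypothetical stand-in for the actual HAN decoding call
    return "<translation of: " + src + ">"

def translate_document(sentences, k=3):
    outputs = []
    for i, src in enumerate(sentences):
        src_context = sentences[max(0, i - k):i]  # previous source sentences
        tgt_context = outputs[max(0, i - k):i]    # translations generated so far
        outputs.append(translate_one(src, src_context, tgt_context))
    return outputs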

Test files reported in the paper

The output files of the five reported systems: transformer NMT, cache NMT, HAN-decoder NMT, HAN-encoder NMT, and HAN-encoder-decoder NMT.

  • sub_es-en: OpenSubtitles
  • sub_zh-en: TV subtitles
  • TED_es-en: TED Talks WIT 2015
  • TED_zh-en: TED Talks WIT 2014

Reference:

Miculicich, L., Ram, D., Pappas, N., & Henderson, J. (2018). Document-Level Neural Machine Translation with Hierarchical Attention Networks. In Proceedings of EMNLP 2018. https://www.aclweb.org/anthology/D18-1325/

Contact:

[email protected]
