Code Monkey home page Code Monkey logo

youmaynotneedattention's Introduction

You May Not Need Attention

Code for the Eager Translation Model from the paper You May Not Need Attention by Ofir Press and Noah A. Smith.

Eager Translation Model

The following python packages are required:

  • pytorch 0.4
  • sacreBLEU
  • mosestokenizer

In addition, fast_align is needed to compute alignments.

This code requires Python 3.6+

Preprocessing

Get the translation data

  1. Download the dataset you'd like to use. For this example we'll use Sockeye Autopilot to download the WMT 2014 EN->DE dataset.
sockeye-autopilot --task wmt14_en_de --model none
  1. Enter the directory containing the bye-pair encoded (BPE) version of the data:
cd ./sockeye_autopilot/systems/wmt14_en_de/data/bpe
  1. Unzip everything
gunzip *
  1. Shuffle the tranining data
paste -d '|'  train.src train.trg | shuf | awk -v FS='|' '{ print $1 > "train.shuf.src" ; print $2 > "train.shuf.trg" }'

Run the Eager Feasibility preprocessing

  1. Combine the source and target training data into one file
paste -d ' ||| ' train.shuf.src - - - - train.shuf.trg < /dev/null > combined_srctrg
  1. Use fast_align to find the aligments of the training sentence pairs
./fast_align -i ~/sockeye_autopilot/systems/wmt14_en_de/data/bpe/combined_srctrg -d -o -v > forward.align_ende
  1. Run our script for making the training data Eager Feasible:
python  add_epsilons.py --align forward.align_ende --trg ~/sockeye_autopilot/systems/wmt14_en_de/data/bpe/to_train/train.shuf.trg --src ~/sockeye_autopilot/systems/wmt14_en_de/data/bpe/to_train/train.shuf.src --left_pad 4 --directory ~/corpus/WMTENDE/4pad/ 

Make sure the directory given for the --directory argument actually exists! The --left_pad argument specifies how many initial padding tokens should be inserted into the training dataset.

Note: The add_epsilons script may encounter sentence pairs which fast_align could not find an alignment for. If so, it will delete those lines from the training set. It will then ask you to re-run the script (with the same arguments) in order to finish the process.

Training

Use the following command to train:

python main.py --data ~/corpus/WMTENDE/4pad/  --save ~/exps/ --wdrop 0 --dropout 0.1 --dropouti 0.15 --dropouth 0.1 --dropoutcomb 0.1 --nlayer 4 --epochs 25 --bptt 60 --nhid 1000 --emsize 1000 --batch_size 200 --start_decaying_lr_step 200000 --update_interval 6500

These were the hyperparams used to train the models presented in the paper.

--save specifies where to store the model checkpoints

--nhid is the size of the LSTM

--emsize is double the word embedding size, and must be equivalent to --nhid

Translate

Once you have a trained model, you can use it to translate a document containing sentences in the source language.

python generate.py --checkpoint ~/exps/20181012-022002/model351000.pt  --data  ~/corpus/WMTENDE/4pad/    --src_path  ~/sockeye_autopilot/systems/wmt14_en_de/data/bpe/dev.src  --beam_size 5 --eval  --target_translation ~/sockeye_autopilot/systems/wmt14_en_de/data/raw/dev.trg  --epsilon_limit 3 --src_epsilon_injection 4  --start_pads 5 --language de --save_dir ./output/

This will translate the file found in --src_path , and will save the output in the --save_dir directory. In addition, it will compute the BLEU score.

Reference

If you found this code useful, please cite the following paper:

@article{press2018you,
  title={You May Not Need Attention},
  author={Press, Ofir and Smith, Noah A},
  journal={arXiv preprint arXiv:1810.13409},
  year={2018}
}

Acknowledgments

This repository is based on the code from the AWD-LSTM language model.

youmaynotneedattention's People

Watchers

 avatar  avatar  avatar

Forkers

vgupta123

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.