
Binarized Neural Machine Translation

We explore ways to reduce computation and model size for neural machine translation. Motivated by the development of binary-weight networks and XNOR networks in vision, we attempt to extend that work to machine translation. In particular, we evaluate how binary convolutions can be applied in machine translation and measure their effects.

Datasets

Although our analysis is done on the Multi30k dataset, our code supports the following datasets:

  • WMT 14 EN - FR
  • IWSLT
  • Multi30k

Models

Baseline Models

We implement four baseline models to compare our binarized models against.

Simple LSTM

[Figure: simple LSTM architecture]

An encoder-decoder model that encodes the source language with an LSTM, then presents the final hidden state to the decoder. The decoder uses the final hidden state to generate the output.

Attention RNN

[Figure: attention LSTM architecture]

An encoder-decoder model similar to the previous one, but at every decoder step it applies an attention mechanism over all the encoder outputs, conditioned on the current decoder hidden state.
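The attention step above can be sketched in a few lines. This is an illustrative numpy version of plain dot-product attention (the function name and shapes are ours, not the repo's API): score each encoder output against the current decoder hidden state, softmax the scores over time, and return the weighted sum as the context vector.

```python
import numpy as np

def attention(decoder_hidden, encoder_outputs):
    """Dot-product attention: weight each encoder output by its
    similarity to the current decoder hidden state."""
    scores = encoder_outputs @ decoder_hidden       # (T,) one score per step
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                        # softmax over time steps
    context = weights @ encoder_outputs             # (H,) weighted sum
    return context, weights

T, H = 5, 8
rng = np.random.default_rng(0)
enc = rng.standard_normal((T, H))                   # encoder outputs
dec = rng.standard_normal(H)                        # decoder hidden state
context, weights = attention(dec, enc)
```

The context vector is then concatenated with (or added to) the decoder state before predicting the next token.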

Attention QRNN

The same model as above, but using a QRNN (Quasi-Recurrent Neural Network, developed by Salesforce Research) instead of LSTMs. The QRNN should be much faster since it relies on lower-level convolutions and can be parallelized further than the Attention RNN.
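To illustrate why the QRNN parallelizes better: the candidate values z and forget gates f are produced by convolutions over the whole sequence at once, and only a cheap elementwise recurrence ("f-pooling") remains sequential. A minimal numpy sketch of that pooling step (our own illustration, not the Salesforce implementation):

```python
import numpy as np

def qrnn_f_pool(z, f):
    """QRNN f-pooling: h_t = f_t * h_{t-1} + (1 - f_t) * z_t.
    z (candidates) and f (forget gates) are computed for all time steps
    in parallel by convolutions; only this elementwise loop is sequential."""
    T, H = z.shape
    h = np.zeros((T, H))
    prev = np.zeros(H)
    for t in range(T):
        prev = f[t] * prev + (1.0 - f[t]) * z[t]
        h[t] = prev
    return h

T, H = 6, 4
rng = np.random.default_rng(0)
z = np.tanh(rng.standard_normal((T, H)))   # stand-in for conv output
f = np.zeros((T, H))                       # forget nothing: h should equal z
h = qrnn_f_pool(z, f)
```

Unlike an LSTM cell, the recurrence contains no matrix multiplications, so the sequential part is trivially cheap.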

ConvS2S

[Figure: ConvS2S architecture]

This model (implemented by FAIR) replaces RNNs with a series of convolutional layers used for both the encoder and the decoder, combined with attention.

Binarized Models

We implement two variants of binarized networks to compare performance.

ConvS2S Binarized Weight Networks

This model is the same as the one above, with one key difference: all the weights are represented by a binary tensor β and a scaling vector α such that W ≈ β · α. The benefit is that a convolution can then be estimated as (I · β) · α, replacing most full-precision multiplications with sign flips and a single scale.
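The binarization step can be sketched as follows. This is an illustrative numpy version (function name and shapes are ours): β is the elementwise sign of W, and the per-filter α that minimizes ||W − αβ||² is the mean absolute value of that filter's weights.

```python
import numpy as np

def binarize_weights(W):
    """Binary-weight approximation W ≈ alpha * beta:
    beta = sign(W), alpha = per-filter mean of |W| (the least-squares
    optimal scale for a fixed sign pattern)."""
    beta = np.sign(W)
    beta[beta == 0] = 1.0                    # map sign(0) -> +1 by convention
    # one alpha per output filter (axis 0), averaged over the rest
    alpha = np.abs(W).mean(axis=tuple(range(1, W.ndim)), keepdims=True)
    return alpha, beta

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3, 3))           # 4 filters, 3x3 each
alpha, beta = binarize_weights(W)
# A convolution with W is then estimated as (I * beta) * alpha:
# the inner products use only +/-1 weights, followed by one scale.
```

Since β costs one bit per weight instead of 32, this also gives roughly a 32x reduction in weight storage.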

ConvS2S XNOR network

This model extends the binarized weight network: the input is binarized as well, so the convolutions can be estimated as (sign(I) · sign(β)) · α.
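With both operands in {−1, +1}, the inner product reduces to bit operations, which is where the "XNOR" name comes from: pack the signs into bits, XOR (or XNOR) them, and count matches. A small self-contained sketch of that identity (the packing scheme and names are our illustration):

```python
import numpy as np

def xnor_dot(x_bits, y_bits, n):
    """Dot product of two {-1,+1}^n vectors packed as bits (+1 -> 1, -1 -> 0).
    Mismatching bit positions contribute -1, matching ones +1, so
    dot = (n - mismatches) - mismatches = n - 2 * popcount(x XOR y)."""
    mismatches = bin(x_bits ^ y_bits).count("1")
    return n - 2 * mismatches

def pack(v):
    """Pack a +/-1 vector into an integer bit mask."""
    return int("".join("1" if s > 0 else "0" for s in v), 2)

rng = np.random.default_rng(0)
n = 16
x = np.where(rng.standard_normal(n) >= 0, 1, -1)
y = np.where(rng.standard_normal(n) >= 0, 1, -1)
assert xnor_dot(pack(x), pack(y), n) == int(x @ y)
```

On hardware, 32 or 64 of these multiply-accumulates collapse into one XNOR plus one popcount instruction, which is the source of the claimed speedups.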

Notable Results

Translation Performance

[Figure: BLEU scores]

Other statistics can be found in this issue.

Model Size

We compare the model sizes of two different sets of models: first, the models we ran our Multi30k experiments on, and then the large models. Since our dataset is quite a bit smaller, we also measured the sizes of models used for larger translation datasets such as WMT, using the hyperparameters reported in their papers.

[Figure: model sizes (Multi30k models)]

[Figure: model sizes (large models)]

Set Up

A shortcut that performs all of the setup:

# creates a virtual environment and downloads the data
$ bash setup.sh

To set up the Python code, create a Python 3 environment with the following:

# create a virtual environment
$ python3 -m venv env

# activate environment
$ source env/bin/activate

# install all requirements
$ pip install -r requirements.txt

If you add a new package, you will have to update requirements.txt with the following command:

# add new packages
$ pip freeze > requirements.txt

And if you want to deactivate the virtual environment:

# deactivate the virtual env
$ deactivate

# if using Python 3.7.x, no official TensorFlow distribution is available; use this wheel on macOS:
$ pip install https://storage.googleapis.com/tensorflow/mac/cpu/tensorflow-0.12.0-py3-none-any.whl

# use this wheel on Linux
$ pip install https://github.com/adrianodennanni/tensorflow-1.12.0-cp37-cp37m-linux_x86_64/blob/master/tensorflow-1.12.0-cp37-cp37m-linux_x86_64.whl?raw=true

References

  1. Attention and Simple LSTM Pictures

  2. FairSeq ConvS2S Gif Original

Papers

  1. XNOR-Net: Paper
  2. Multi-bit quantization networks: Paper
  3. Binarized LSTM Language Model: Paper
  4. FairSeq Convolutional Sequence to Sequence Learning: Paper
  5. Quasi-Recurrent Neural Networks: Paper
  6. WMT 14 Translation Task: Paper
  7. Attention Is All You Need: Paper
  8. Imagination improves multimodal translation: Paper
  9. Multi30k dataset: Paper
  10. IWSLT: Paper

Githubs and Links

  1. Pytorch MT Seq2Seq Tutorial
  2. XNOR-net AI2
  3. Annotated Transformer (Harvard NLP)
  4. Salesforce QRNN Pytorch
  5. FairSeq
  6. Torchtext
  7. XNOR NET Pytorch

Contributors

akshatsh, kevinb22, sarahyu17
