
Neutron

Neutron: A PyTorch-based implementation of the Transformer and its variants.

This project is developed with Python 3.10.

Setup dependencies

After cloning the repository, install the core dependencies with pip install -r requirements.txt.

If you want to use BPE, enable conversion to C libraries, try the simple MT server, or use the Chinese word segmentation provided by pynlpir, you should also install the dependencies in requirements.opt.txt with pip install -r requirements.opt.txt.

Data preprocessing

BPE

We provide scripts to apply Byte-Pair Encoding (BPE) under scripts/bpe/.
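If the scripts wrap the common subword-nmt package (an assumption here; check scripts/bpe/ and requirements.opt.txt), applying learned BPE codes from Python looks roughly like this:

```python
# Sketch assuming the subword-nmt package; scripts/bpe/ may drive the
# command-line tools instead. File names are examples.
import codecs
from subword_nmt.apply_bpe import BPE

with codecs.open("bpe.codes", encoding="utf-8") as f:
    bpe = BPE(f)

print(bpe.process_line("an example sentence to segment"))
```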

Convert plain text to tensors for training

Generate training data for train.py with bash scripts/mktrain.sh; configure the variables in scripts/mktrain.sh for your setup (the shared variables must be consistent with those in scripts/bpe/mk.sh). A sketch of inspecting the resulting file follows.
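As a quick sanity check, you can inspect the resulting HDF5 file with h5py. This is only a sketch: the file name is an example, and the layout printed is whatever tools/mkiodata.py actually wrote, not an assumed schema.

```python
# Sketch: walk an HDF5 training file produced by the preprocessing
# step and print every dataset it contains. The file name below is an
# example; the layout is whatever tools/mkiodata.py wrote.
import h5py

def show(name, obj):
    if isinstance(obj, h5py.Dataset):
        print(name, obj.shape, obj.dtype)

with h5py.File("cache/train.h5", "r") as f:
    f.visititems(show)
```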

Configuration for training and testing

Most configuration is managed in cnfg/base.py; configure advanced details with cnfg/hyp.py.
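The configuration files are plain Python assignments. The variable names in the sketch below are hypothetical placeholders for illustration; check cnfg/base.py for the real ones.

```python
# Hypothetical sketch of the plain-Python configuration style of
# cnfg/base.py; these variable names are placeholders, not the real ones.
data_dir = "cache/wmt14ende"  # hypothetical: where preprocessed data lives
max_run = 128                 # hypothetical: number of training epochs
batch_tokens = 25000          # hypothetical: target tokens per batch
```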

Training

Just execute the following command to launch the training:

python train.py

Generation

Run bash scripts/mktest.sh; configure the variables in scripts/mktest.sh for your setup (while keeping the other settings consistent with those in scripts/mkbpe.sh and scripts/mktrain.sh).

Exporting Python files to C libraries

You can compile the Python classes into C libraries with python mkcy.py build_ext --inplace; the code is checked before compiling, which also serves as a simple way to find typos and bugs. This feature is powered by Cython. The generated files can be removed with tools/clean/cython.py . and rm -fr build/. Loading modules from the compiled C libraries may also be slightly faster, but not significantly.
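For reference, a minimal Cython build script looks roughly like the following. This is a generic sketch of how Cython compiles .py files into C extensions, not the contents of mkcy.py:

```python
# Generic sketch of a Cython build script (not the actual mkcy.py).
# Running `python setup.py build_ext --inplace` compiles the listed
# Python sources to C extension modules next to the originals.
from setuptools import setup
from Cython.Build import cythonize

setup(
    ext_modules=cythonize(
        ["lrsch.py"],  # example target; mkcy.py covers many more files
        compiler_directives={"language_level": "3"},
    ),
)
```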

Ranking

You can rank your corpus with a pre-trained model; the per-token perplexity of each sequence pair will be reported. Use it with:

python rank.py rsf h5f models

where rsf is the result file, h5f is the HDF5-formatted input file of your corpus (generated like the training set with tools/mkiodata.py, as in scripts/mktrain.sh), and models is a (list of) model file(s) used for the perplexity evaluation.
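For intuition, per-token perplexity is the exponential of the average negative log-likelihood over non-padding target tokens. A minimal sketch with plain cross-entropy (rank.py may differ in detail, e.g. if it reuses the label-smoothing loss):

```python
# Sketch of per-token perplexity for one sequence pair; rank.py may
# differ in detail (e.g. if it reuses the label-smoothing loss).
import torch
import torch.nn.functional as F

def per_token_perplexity(logits, targets, pad_id=0):
    # logits: (seq_len, vocab_size), targets: (seq_len,) gold token ids
    nll = F.cross_entropy(logits, targets, ignore_index=pad_id,
                          reduction="sum")
    ntokens = (targets != pad_id).sum()
    return torch.exp(nll / ntokens)
```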

Description of the other files

modules/

Fundamental modules needed to construct the Transformer.

loss/

Implementation of the label smoothing loss function required for training the Transformer.

lrsch.py

The learning rate schedule described in the paper.
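For reference, the schedule in "Attention Is All You Need" warms the learning rate up linearly for warmup steps, then decays it with the inverse square root of the step number: lrate = d_model^-0.5 * min(step^-0.5, step * warmup^-1.5). A direct transcription of that formula (not necessarily how lrsch.py organizes it):

```python
def noam_lrate(step, d_model=512, warmup=4000):
    # Learning rate schedule from "Attention Is All You Need":
    # linear warm-up for `warmup` steps, then inverse-sqrt decay.
    return d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)
```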

utils/

Functions for basic features, for example freezing/unfreezing model parameters and padding a list of tensors to the same size along a given dimension.
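As an illustration of the padding helper described above, here is a generic sketch (not the exact function in utils/):

```python
# Generic sketch of padding a list of tensors to a common size along
# one dimension; not the exact helper in utils/.
import torch
import torch.nn.functional as F

def pad_tensors(tensors, dim=-1, value=0):
    target = max(t.size(dim) for t in tensors)
    out = []
    for t in tensors:
        gap = target - t.size(dim)
        if gap > 0:
            # F.pad's spec runs from the last dimension backwards;
            # extend only `dim` at its trailing end.
            pad = [0, 0] * t.dim()
            d = dim if dim >= 0 else t.dim() + dim
            pad[2 * (t.dim() - 1 - d) + 1] = gap
            t = F.pad(t, pad, value=value)
        out.append(t)
    return out

# Example: pad two sequences to the same length along dim 0.
print([t.size() for t in pad_tensors([torch.ones(3), torch.ones(5)], dim=0)])
```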

translator.py

Encapsulates the whole translation procedure so that you can use a trained model in your application more easily.
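Usage is roughly of the following shape; the class name and constructor arguments below are hypothetical placeholders, so check translator.py for the actual interface:

```python
# Hypothetical usage sketch: the class name and arguments below are
# placeholders; see translator.py for the real interface.
from translator import TranslatorCore  # hypothetical name

trans = TranslatorCore(
    model_file="expm/model.h5",  # hypothetical argument names/paths
    src_vocab="cache/src.vcb",
    tgt_vocab="cache/tgt.vcb",
)
print(trans.translate("a BPE-segmented source sentence"))
```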

server.py

An example that relies on Flask to provide a simple web service and REST API around the translator; configure its variables before use.
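A minimal Flask endpoint of this shape is sketched below; this is a generic illustration assuming some translate(text) callable (as above), not the contents of server.py:

```python
# Generic sketch of a Flask translation endpoint, not the actual
# server.py. Assumes a `translate(text) -> str` callable is available.
from flask import Flask, jsonify, request

app = Flask(__name__)

def translate(text):
    # Placeholder: wire this to the translator encapsulation above.
    return text

@app.route("/translate", methods=["POST"])
def handle():
    src = request.get_json(force=True).get("text", "")
    return jsonify({"translation": translate(src)})

if __name__ == "__main__":
    app.run(host="127.0.0.1", port=8888)
```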

transformer/

Implementations of seq2seq models.

parallel/

Multi-GPU parallelization implementation.

datautils/

Supporting functions for data segmentation.

tools/

Scripts supporting data processing (e.g., text to tensor), analysis, model file handling, etc.

Performance

Settings: WMT 2014, English -> German, 32k joint BPE with a vocabulary threshold of 8 for BPE. Two NVIDIA GTX 1080 Ti GPUs for training, one for decoding.

Tokenized case-sensitive BLEU is measured with multi-bleu.perl; training speed and decoding speed are measured as the number of target tokens (<eos> counted, <pad> excluded) per second and the number of sentences per second, respectively:

|                           | BLEU  | Training Speed (target tokens/s) | Decoding Speed (sentences/s) |
|---------------------------|-------|----------------------------------|------------------------------|
| Attention Is All You Need | 27.3  |                                  |                              |
| Neutron                   | 28.07 | 23213.65                         | 150.15                       |

Acknowledgments

Hongfei Xu is partially supported by the Education Department of Henan Province (Grant No. 232300421386) while maintaining this project.

Details of this project can be found here, and please cite it if you enjoy the implementation :)

@article{xu2019neutron,
  author = {Xu, Hongfei and Liu, Qiuhui},
  title = "{Neutron: An Implementation of the Transformer Translation Model and its Variants}",
  journal = {arXiv preprint arXiv:1903.07402},
  archivePrefix = "arXiv",
  eprinttype = {arxiv},
  eprint = {1903.07402},
  primaryClass = "cs.CL",
  keywords = {Computer Science - Computation and Language},
  year = 2019,
  month = "March",
  url = {https://arxiv.org/abs/1903.07402},
  pdf = {https://arxiv.org/pdf/1903.07402}
}


Issues

Can't use an odd embedding size for isize

Currently isize only accepts even numbers; with isize=141 below, it fails (probably because it can't split evenly into self.w[:,0::2] and self.w[:,1::2]?):

```
self.encoder = Encoder(isize=isize, num_layer=6, nwd=vocab_size)

File "./transformer/Encoder.py", line 78, in __init__
    self.pemb = PositionalEmb(isize, xseql, 0, 0)
File "./transformer/modules.py", line 46, in __init__
    self.reset_parameters()
File "./transformer/modules.py", line 68, in reset_parameters
    self.w[:, 0::2], self.w[:, 1::2] = torch.sin(pos * rdiv_term), torch.cos(pos * rdiv_term)
RuntimeError: The expanded size of the tensor (70) must match the existing size (71) at non-singleton dimension 1. Target sizes: [512, 70]. Tensor sizes: [512, 71]
```
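The sinusoidal positional embedding interleaves sine and cosine channels, so with an odd isize the even-index slice gets one more column than the odd-index slice and the assignment shapes no longer match, which is exactly the 71-vs-70 mismatch in the traceback:

```python
import torch

isize = 141
w = torch.zeros(512, isize)
print(w[:, 0::2].size())  # torch.Size([512, 71]) -> sine channels
print(w[:, 1::2].size())  # torch.Size([512, 70]) -> cosine channels
# The slices differ by one column, so assigning equal-width sin/cos
# terms fails; isize therefore has to be even.
```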

Code of your "Learning Hard Retrieval Decoder Attention for Transformers" paper

Hi Hongfei,
Does this repo also contain the implementation of your "Learning Hard Retrieval Decoder Attention for Transformers" paper? If not, will it be released? Based on my understanding, the "hard retrieval" is achieved by replacing P with P' = MultinomialSample(P), then setting P = (P' - P).detach() + P. Please kindly correct me if I am wrong.
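For readers unfamiliar with the trick the question refers to, (P' - P).detach() + P is the straight-through estimator: the forward pass uses the sampled one-hot distribution P', while gradients flow through the soft distribution P. A sketch of the estimator as described in the question (not code from this repository):

```python
# Straight-through hard attention as described in the question above;
# a sketch, not code from this repository.
import torch
import torch.nn.functional as F

def hard_attention_st(scores):
    # scores: (..., n_keys), at least 2-D
    p = torch.softmax(scores, dim=-1)
    # Sample one key per query (forward pass uses the one-hot sample).
    idx = torch.multinomial(p.flatten(0, -2), 1).view(p.shape[:-1])
    p_hard = F.one_hot(idx, p.size(-1)).to(p.dtype)
    # Backward pass differentiates through the soft distribution p.
    return (p_hard - p).detach() + p
```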

more empirical results are expected

Hi Hongfei and Qiuhui, thanks for your efforts~

From the description in your paper, this implementation achieves 28.07 on WMT'14 En-De. It's great~ Have you tried other datasets to validate its effectiveness? Also, have you reproduced the following variants with your code, i.e., Hier*, TA, SC, and DOC?
