
Admin-Torch

Transformers Training **Stabilized**


Here we provide a plug-and-play implementation of Admin, which stabilizes Transformer training that previously diverged and achieves better performance without introducing additional hyper-parameters. Admin's design is half-precision friendly and can be reparameterized into the original Transformer.


What's New?

Beyond the original Admin implementation:

  1. admin-torch removes the profiling stage and is plug-and-play.
  2. admin-torch's implementation is more robust (see below).

Comparison with the DeepNet init and the original Admin init (on WMT'17 En-De):

[Figure: rows are Original Admin, DeepNet, and admin-torch; columns are regular batch size (8x4096) and huge batch size (128x4096).]

More details can be found in our example.

Key Idea

What complicates Transformer training?

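For a Transformer f, input x, and randomly initialized weight w, we describe its stability (output_change_scale) as

$$\mathbb{E}\left[\lVert f(x, w) - f(x, w + \delta) \rVert_2^2\right],$$

where $\delta$ is a random perturbation of the weights.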

In our study, we show that an original n-layer Transformer's output_change_scale is O(n), which destabilizes its training. Admin stabilizes Transformer training by regulating this scale to O(log n) or O(1).

More details can be found in our paper.
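As a rough illustration of what output_change_scale measures, the sketch below estimates the expected squared change in a model's output when its weights receive a random perturbation. It assumes the perturbation is i.i.d. Gaussian, and it uses a toy scalar model with one additive branch per layer, which is not a Transformer. The names `estimate_output_change_scale` and `toy_residual_stack` are hypothetical and are not part of admin-torch.

```python
import random


def estimate_output_change_scale(f, x, w, sigma=0.01, trials=20000, seed=0):
    """Monte Carlo estimate of E[(f(x, w) - f(x, w + delta))^2].

    Assumption of this sketch: delta has i.i.d. N(0, sigma^2) entries.
    """
    rng = random.Random(seed)
    base = f(x, w)
    total = 0.0
    for _ in range(trials):
        perturbed = [wi + rng.gauss(0.0, sigma) for wi in w]
        diff = f(x, perturbed) - base
        total += diff * diff
    return total / trials


def toy_residual_stack(x, w):
    """Toy scalar model: the input plus one additive branch w_i * x per layer."""
    return x + sum(wi * x for wi in w)


# For this toy model the exact value is n * sigma^2 * x^2, so it grows linearly in n.
shallow = estimate_output_change_scale(toy_residual_stack, 1.0, [0.1] * 4)
deep = estimate_output_change_scale(toy_residual_stack, 1.0, [0.1] * 16)
print(f"4 layers: {shallow:.2e}, 16 layers: {deep:.2e}")
```

In this toy setting, quadrupling the depth roughly quadruples the estimate. That is the kind of linear growth in depth that Admin is designed to avoid, although the toy model says nothing about how a real Transformer behaves.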

How to use?

install

pip install admin-torch==0.1.0

import

import admin_torch

enjoy

def __init__(self, ...):
    ...
+   self.residual = admin_torch.as_module(self, self.number_of_sub_layers)
    ...

def forward(self, x):
    ...
-   x = x + f(x)
+   x = self.residual(x, f(x))
    x = self.LN(x)
    ...

A more detailed example can be found in our documentation, and a real working example can be found at LiyuanLucasLiu/fairseq (the training recipe is available in our example).
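To make the drop-in pattern concrete, here is a minimal sketch of a residual object that is called as `residual(x, f(x))` in place of `x + f(x)`. It assumes, following the Admin paper cited below, that the shortcut branch is rescaled by a factor omega before the sublayer output is added. The class `ScaledResidual`, the function `toy_sublayer`, and the omega values are hypothetical; plain Python lists stand in for tensors, and how admin-torch actually chooses omega is not shown.

```python
class ScaledResidual:
    """Hypothetical stand-in for a residual module (not admin-torch's code).

    Computes omega * x + f_x elementwise; omega = 1.0 gives the plain x + f(x).
    """

    def __init__(self, omega=1.0):
        self.omega = omega

    def __call__(self, x, f_x):
        if len(x) != len(f_x):
            raise ValueError("x and f_x must have the same length")
        return [self.omega * xi + fi for xi, fi in zip(x, f_x)]


def toy_sublayer(x):
    # Placeholder for an attention or feed-forward sublayer: halves each entry.
    return [0.5 * xi for xi in x]


x = [1.0, -2.0, 4.0]
plain = ScaledResidual(omega=1.0)(x, toy_sublayer(x))     # identical to x + f(x)
rescaled = ScaledResidual(omega=2.0)(x, toy_sublayer(x))  # shortcut weighted by 2
print(plain, rescaled)
```

Because the only change is a weight on the shortcut, the call site stays a one-line replacement, and setting omega to 1 reduces to the ordinary residual connection.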

Citation

Please cite the following papers if you find our model useful. Thanks!

Liyuan Liu, Xiaodong Liu, Jianfeng Gao, Weizhu Chen, and Jiawei Han (2020). Understanding the Difficulty of Training Transformers. Proc. 2020 Conf. on Empirical Methods in Natural Language Processing (EMNLP'20).

@inproceedings{liu2020admin,
  title={Understanding the Difficulty of Training Transformers},
  author = {Liu, Liyuan and Liu, Xiaodong and Gao, Jianfeng and Chen, Weizhu and Han, Jiawei},
  booktitle = {Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020)},
  year={2020}
}

Xiaodong Liu, Kevin Duh, Liyuan Liu, and Jianfeng Gao (2020). Very Deep Transformers for Neural Machine Translation. arXiv preprint arXiv:2008.07772 (2020).

@article{liu_deep_2020,
 author = {Liu, Xiaodong and Duh, Kevin and Liu, Liyuan and Gao, Jianfeng},
 journal = {arXiv preprint arXiv:2008.07772},
 title = {Very Deep Transformers for Neural Machine Translation},
 year = {2020}
}
