
Transformer Variants

The Transformer introduced a new approach to sequence processing through the attention mechanism, moving away from traditional sequential processing methods. Following its success, a great deal of research has built on the Transformer. However, most of these studies use the Transformer as-is and explore additional enhancements on top of it; relatively few compare natural language processing performance across structural variations of the Transformer model itself.

To address this gap, this repo focuses on the structure of the Transformer and implements three models: the Standard Transformer, the Recurrent Transformer, and the Evolved Transformer. Each model is evaluated on three natural language generation tasks: neural machine translation, dialogue generation, and text summarization.
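Since all three variants share the same core operation, a minimal NumPy sketch of scaled dot-product attention (the mechanism from Attention Is All You Need; function and variable names are illustrative, not taken from this repo) may help:

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V."""
    d_k = q.shape[-1]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_k)      # (batch, len_q, len_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # softmax over keys
    return weights @ v                                    # (batch, len_q, d_v)

# Toy usage: batch of 2 sequences, length 4, model dimension 8
q = k = v = np.random.randn(2, 4, 8)
out = scaled_dot_product_attention(q, k, v)
```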



Model Architectures

| Standard Transformer | Recurrent Transformer | Evolved Transformer |
|---|---|---|
| The basic Transformer architecture introduced in the Attention Is All You Need paper | The weight-shared, recursively applied layer structure introduced in the Universal Transformers paper | The advanced Transformer architecture introduced in The Evolved Transformer paper |
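As a rough illustration of the Recurrent (Universal-Transformer-style) variant, the sketch below applies one shared set of layer weights repeatedly instead of stacking distinct layers; all names and the toy feed-forward block are illustrative and do not mirror this repo's code:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff, steps = 8, 16, 3           # recurrence depth instead of distinct layers

# One shared set of layer parameters, reused at every step
W1 = rng.normal(scale=0.1, size=(d_model, d_ff))
W2 = rng.normal(scale=0.1, size=(d_ff, d_model))

def shared_layer(x):
    """A toy position-wise feed-forward block with a residual connection."""
    return x + np.maximum(x @ W1, 0.0) @ W2   # ReLU FFN + residual

def recurrent_encoder(x, steps):
    """Universal-Transformer-style recurrence: the SAME layer is applied
    `steps` times, where a standard Transformer stacks `steps` distinct layers."""
    for _ in range(steps):
        x = shared_layer(x)
    return x

x = rng.normal(size=(4, d_model))             # 4 tokens
y = recurrent_encoder(x, steps)
```

The design point is parameter count: the recurrent variant holds one layer's worth of weights regardless of depth, while the standard variant's parameters grow linearly with the number of layers.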



Experimental Setups

| Data Setup | Model Setup | Training Setup |
|---|---|---|
| Machine Translation: WMT14 En-De | Embedding Dimension: 256 | Epochs: 10 |
| Dialogue Generation: Daily Dialogue | Hidden Dimension: 256 | Batch Size: 32 |
| Text Summarization: Daily Mail | PFF Dimension: 512 | Learning Rate: 5e-4 |
| Train Data Volume: 100,000 | N Heads: 512 | iters_to_accumulate: 4 |
| Valid Data Volume: 1,000 | N Layers: 6 | Gradient Clip Max Norm: 1 |
| Vocab Size: 15,000 | N Cells: 3 | Apply AMP: True |
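The setup above can be gathered into a single config object; this is a hedged sketch, and the field names are assumptions rather than the repo's actual config:

```python
from dataclasses import dataclass

@dataclass
class Config:
    """The experimental setup from the table above in one object.
    Field names are illustrative; the repo's actual config may differ."""
    emb_dim: int = 256
    hidden_dim: int = 256
    pff_dim: int = 512
    n_heads: int = 512            # as listed in the table above
    n_layers: int = 6
    n_cells: int = 3
    vocab_size: int = 15_000
    epochs: int = 10
    batch_size: int = 32
    learning_rate: float = 5e-4
    iters_to_accumulate: int = 4
    clip_max_norm: float = 1.0
    apply_amp: bool = True

cfg = Config()
```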



Results

| Model | Translation | Dialogue Generation | Summarization |
|---|---|---|---|
| Standard Transformer | - | - | - |
| Recurrent Transformer | - | - | - |
| Evolved Transformer | - | - | - |



How to Use

Clone the repository into your environment:

```shell
git clone https://github.com/moon23k/Transformer_Variants.git
```


Set up the datasets and tokenizer via the setup.py file:

```shell
python3 setup.py -task ['all', 'translation', 'dialogue', 'summarization']
```


Training, testing, and inference are all run via the run.py file:

```shell
python3 run.py -task ['translation', 'dialogue', 'summarization']
               -mode ['train', 'test', 'inference']
               -model ['standard', 'recurrent', 'evolved']
               -search ['greedy', 'beam']
```
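A sketch of how a CLI with these options might be parsed; this is an illustrative reconstruction using argparse, not the repo's actual run.py:

```python
import argparse

def build_parser():
    """Illustrative parser mirroring the options listed above."""
    p = argparse.ArgumentParser(description="Train/evaluate a Transformer variant")
    p.add_argument("-task", choices=["translation", "dialogue", "summarization"],
                   required=True)
    p.add_argument("-mode", choices=["train", "test", "inference"], required=True)
    p.add_argument("-model", choices=["standard", "recurrent", "evolved"],
                   default="standard")
    p.add_argument("-search", choices=["greedy", "beam"], default="greedy")
    return p

# Example: train the Evolved Transformer on translation
args = build_parser().parse_args(
    ["-task", "translation", "-mode", "train", "-model", "evolved"])
```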



Reference

Attention Is All You Need (Vaswani et al., 2017)
Universal Transformers (Dehghani et al., 2018)
The Evolved Transformer (So et al., 2019)
