
Transformer Variants

The Transformer introduced a new approach to sequence processing through the attention mechanism, moving away from traditional sequential processing methods. Following its success, a great deal of research has built on the Transformer. However, most of these studies use the Transformer as-is and explore additional enhancements on top of it; relatively few compare natural language processing performance across structural variations of the Transformer model itself.

To address this gap, this repo focuses on the structure of the Transformer and implements three models: the Standard Transformer, the Recurrent Transformer, and the Evolved Transformer. Each model is evaluated on three natural language generation tasks: neural machine translation, dialogue generation, and text summarization.
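Since all three variants share the same core operation, a minimal NumPy sketch of scaled dot-product attention (the mechanism from Attention Is All You Need; function and variable names are illustrative, not taken from this repo) may help:

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V."""
    d_k = q.shape[-1]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_k)      # (batch, len_q, len_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # softmax over keys
    return weights @ v                                    # (batch, len_q, d_v)

# Toy usage: batch of 2 sequences, length 4, model dimension 8
q = k = v = np.random.randn(2, 4, 8)
out = scaled_dot_product_attention(q, k, v)
```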



Model Architectures

| Standard Transformer | Recurrent Transformer | Evolved Transformer |
|---|---|---|
| The basic Transformer architecture introduced in the Attention Is All You Need paper | The weight-shared, recursively applied layer structure introduced in the Universal Transformers paper | The advanced Transformer architecture introduced in The Evolved Transformer paper |
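As a rough illustration of the Recurrent (Universal-Transformer-style) variant, the sketch below applies one shared set of layer weights repeatedly instead of stacking distinct layers; all names and the toy feed-forward block are illustrative and do not mirror this repo's code:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff, steps = 8, 16, 3           # recurrence depth instead of distinct layers

# One shared set of layer parameters, reused at every step
W1 = rng.normal(scale=0.1, size=(d_model, d_ff))
W2 = rng.normal(scale=0.1, size=(d_ff, d_model))

def shared_layer(x):
    """A toy position-wise feed-forward block with a residual connection."""
    return x + np.maximum(x @ W1, 0.0) @ W2   # ReLU FFN + residual

def recurrent_encoder(x, steps):
    """Universal-Transformer-style recurrence: the SAME layer is applied
    `steps` times, where a standard Transformer stacks `steps` distinct layers."""
    for _ in range(steps):
        x = shared_layer(x)
    return x

x = rng.normal(size=(4, d_model))             # 4 tokens
y = recurrent_encoder(x, steps)
```

The design point is parameter count: the recurrent variant holds one layer's worth of weights regardless of depth, while the standard variant's parameters grow linearly with the number of layers.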



Experimental Setups

| Data Setup | Model Setup | Training Setup |
|---|---|---|
| Machine Translation: WMT14 En-De | Embedding Dimension: 256 | Epochs: 10 |
| Dialogue Generation: Daily Dialogue | Hidden Dimension: 256 | Batch Size: 32 |
| Text Summarization: Daily Mail | PFF Dimension: 512 | Learning Rate: 5e-4 |
| Train Data Volume: 100,000 | N Heads: 512 | iters_to_accumulate: 4 |
| Valid Data Volume: 1,000 | N Layers: 6 | Gradient Clip Max Norm: 1 |
| Vocab Size: 15,000 | N Cells: 3 | Apply AMP: True |
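The setup above can be gathered into a single config object; this is a hedged sketch, and the field names are assumptions rather than the repo's actual config:

```python
from dataclasses import dataclass

@dataclass
class Config:
    """The experimental setup from the table above in one object.
    Field names are illustrative; the repo's actual config may differ."""
    emb_dim: int = 256
    hidden_dim: int = 256
    pff_dim: int = 512
    n_heads: int = 512            # as listed in the table above
    n_layers: int = 6
    n_cells: int = 3
    vocab_size: int = 15_000
    epochs: int = 10
    batch_size: int = 32
    learning_rate: float = 5e-4
    iters_to_accumulate: int = 4
    clip_max_norm: float = 1.0
    apply_amp: bool = True

cfg = Config()
```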



Results

| Model | Translation | Dialogue Generation | Summarization |
|---|---|---|---|
| Standard Transformer | - | - | - |
| Recurrent Transformer | - | - | - |
| Evolved Transformer | - | - | - |



How to Use

Clone the repository into your environment:

```shell
git clone https://github.com/moon23k/Transformer_Variants.git
```


Set up the datasets and tokenizer via the setup.py file:

```shell
python3 setup.py -task ['all', 'translation', 'dialogue', 'summarization']
```


Training, testing, and inference are all run via the run.py file:

```shell
python3 run.py -task ['translation', 'dialogue', 'summarization']
               -mode ['train', 'test', 'inference']
               -model ['standard', 'recurrent', 'evolved']
               -search ['greedy', 'beam']
```
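A sketch of how a CLI with these options might be parsed; this is an illustrative reconstruction using argparse, not the repo's actual run.py:

```python
import argparse

def build_parser():
    """Illustrative parser mirroring the options listed above."""
    p = argparse.ArgumentParser(description="Train/evaluate a Transformer variant")
    p.add_argument("-task", choices=["translation", "dialogue", "summarization"],
                   required=True)
    p.add_argument("-mode", choices=["train", "test", "inference"], required=True)
    p.add_argument("-model", choices=["standard", "recurrent", "evolved"],
                   default="standard")
    p.add_argument("-search", choices=["greedy", "beam"], default="greedy")
    return p

# Example: train the Evolved Transformer on translation
args = build_parser().parse_args(
    ["-task", "translation", "-mode", "train", "-model", "evolved"])
```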



Reference

Attention Is All You Need (Vaswani et al., 2017)
Universal Transformers (Dehghani et al., 2018)
The Evolved Transformer (So et al., 2019)
