
Transformer-Models-from-Scratch

This repository contains various transformer models that I implemented from scratch (using PyTorch) when I started learning machine learning. These models include:

  1. encoder-only transformer model for text classification

  2. decoder-only transformer model (GPT-like) trained for doing n-digit addition

    • GPT_Addition.ipynb
      • The same model (only about 0.28 million parameters) was trained separately on 2-digit, 5-digit, 10-digit, and 18-digit addition. It got every 2-digit addition right and only a very small fraction of the longer additions wrong (test accuracy for 18-digit addition is about 96.6%).
      • Most of the model's wrong answers are off by only one or two digits.
  3. full transformer model (encoder + decoder) for machine translation

    • Transformer_Multi30k_German_to_English.ipynb
      • This notebook trained a transformer model of about 26 million parameters on the Multi30k dataset and achieved a BLEU score of 35.5 on the test set. This BLEU score seems high; I think one reason is that the sentences in this dataset are relatively simple.
    • Transformer_Chinese_To_English_Translation_news-commentary-v16.ipynb
      • This notebook trained a transformer with about 90 million parameters on the news-commentary-v16 dataset. The main purpose of this notebook is to study how the model's performance (test loss and BLEU score) changes as the training set size increases. The results are shown in the plots at the end of the notebook.
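
For readers unfamiliar with the BLEU scores quoted above, here is a minimal, self-contained sketch of corpus-level BLEU (uniform weights over 1- to 4-grams, with brevity penalty). This is an illustrative re-implementation for intuition only, not the exact scorer used in the notebooks, and it assumes pre-tokenized sentences with a single reference per candidate.

```python
# Minimal corpus-level BLEU sketch (illustrative; assumes one
# reference per candidate and pre-tokenized input).
import math
from collections import Counter

def ngrams(tokens, n):
    """Count all n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidates, references, max_n=4):
    """candidates, references: lists of token lists, aligned pairwise."""
    clipped = [0] * max_n   # clipped n-gram matches, per order n
    total = [0] * max_n     # candidate n-gram counts, per order n
    cand_len = ref_len = 0
    for cand, ref in zip(candidates, references):
        cand_len += len(cand)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            c_ngrams = ngrams(cand, n)
            r_ngrams = ngrams(ref, n)
            # Clip each candidate n-gram count by its count in the reference.
            clipped[n - 1] += sum(min(cnt, r_ngrams[g])
                                  for g, cnt in c_ngrams.items())
            total[n - 1] += sum(c_ngrams.values())
    if min(clipped) == 0:          # some n-gram order has zero matches
        return 0.0
    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(c / t) for c, t in zip(clipped, total)) / max_n
    # Brevity penalty for candidates shorter than the references.
    bp = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
    return bp * math.exp(log_precision)
```

A perfect translation scores 1.0 (often reported as 100, or here 35.5 for ~0.355); any missing 4-gram order drives the score to 0 under this strict formulation, which is why real scorers are usually run at the corpus level rather than per sentence.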

Notes

Transformer_details.pdf (HTML version) contains some details of the transformer model that I found a little confusing when I first tried to implement it from scratch.
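
One detail that commonly trips people up is the sinusoidal positional encoding from "Attention Is All You Need" (Sec. 3.5): PE[pos, 2i] = sin(pos / 10000^(2i/d_model)) and PE[pos, 2i+1] = cos(pos / 10000^(2i/d_model)). Below is a pure-Python sketch of the table for clarity; an actual model would build the same values as a tensor and add it to the token embeddings. This is illustrative and not necessarily the exact code in the notebooks.

```python
# Sketch of the sinusoidal positional-encoding table (illustrative).
import math

def positional_encoding(max_len, d_model):
    """Return a max_len x d_model table of sinusoidal position encodings."""
    pe = [[0.0] * d_model for _ in range(max_len)]
    for pos in range(max_len):
        for i in range(0, d_model, 2):
            # Each even/odd dimension pair shares one frequency.
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)          # even dims: sine
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)  # odd dims: cosine
    return pe
```

Note that position 0 encodes to alternating 0s and 1s (sin 0 and cos 0), and each dimension pair oscillates at a different wavelength, which is what lets the model distinguish positions without any learned parameters.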

References

  1. Attention Is All You Need, arXiv:1706.03762
  2. The Annotated Transformer by Alexander Rush
  3. GPT-3: Language Models are Few-Shot Learners
  4. Andrej Karpathy's minGPT GitHub repository: karpathy/minGPT

Contributors

  • hbchen-one
