NLP Notes

Attention

Additive/concat Attention

Multiplicative Attention
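
A minimal sketch contrasting the two scoring functions named above. It assumes single unbatched vectors and hand-picked identity matrices that stand in for learned parameters. The additive score follows the Bahdanau-style form v · tanh(W_q q + W_k k), which equals v · tanh(W [q; k]) for W = [W_q W_k], hence "concat". The multiplicative score follows the Luong-style "general" form q · (W k). The names W_q, W_k, v and W are illustrative assumptions.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, x: Vector) -> Vector:
    """Matrix-vector product W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def dot(a: Vector, b: Vector) -> float:
    return sum(u * v for u, v in zip(a, b))


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # shift by the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def additive_score(q: Vector, k: Vector, W_q: Matrix, W_k: Matrix, v: Vector) -> float:
    """Additive/concat: v . tanh(W_q q + W_k k), i.e. v . tanh(W [q; k])."""
    hidden = [math.tanh(a + b) for a, b in zip(matvec(W_q, q), matvec(W_k, k))]
    return dot(v, hidden)


def multiplicative_score(q: Vector, k: Vector, W: Matrix) -> float:
    """Multiplicative ("general"): q . (W k); with W = I this is dot-product attention."""
    return dot(q, matvec(W, k))


def attend(weights: Vector, values: List[Vector]) -> Vector:
    """Context vector: attention-weighted sum of the value vectors."""
    return [sum(w * val[d] for w, val in zip(weights, values)) for d in range(len(values[0]))]


I2 = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical weights (identity) instead of learned ones
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0]]  # the keys double as values here
add_w = softmax([additive_score(query, k, I2, I2, [1.0, 1.0]) for k in keys])
mul_w = softmax([multiplicative_score(query, k, I2) for k in keys])
context = attend(mul_w, keys)
```

Both variants only differ in how a score is computed. The softmax and weighted sum that turn scores into a context vector are shared.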

Multi-head Self Attention / Transformer
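
A minimal, unbatched sketch of multi-head self-attention in plain Python. It assumes random stand-in weights, no masking, no biases and no dropout. Each head applies scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V, to its own slice of the projected features. The heads are then concatenated and mixed by an output projection W_o. Slicing one d_model-wide projection into heads is equivalent to giving each head its own smaller projection matrices.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Plain matrix product A @ B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A: Matrix) -> Matrix:
    return [list(col) for col in zip(*A)]


def softmax_rows(A: Matrix) -> Matrix:
    """Row-wise softmax, shifted by the row max for numerical stability."""
    out = []
    for row in A:
        m = max(row)
        exps = [math.exp(x - m) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def scaled_dot_product_attention(Q: Matrix, K: Matrix, V: Matrix) -> Matrix:
    """softmax(Q K^T / sqrt(d_k)) V for a single head."""
    scale = math.sqrt(len(K[0]))
    scores = [[s / scale for s in row] for row in matmul(Q, transpose(K))]
    return matmul(softmax_rows(scores), V)


def multi_head_self_attention(X: Matrix, W_q: Matrix, W_k: Matrix, W_v: Matrix,
                              W_o: Matrix, num_heads: int) -> Matrix:
    """Project X to Q, K, V, attend separately in each head's feature slice,
    concatenate the head outputs and apply the output projection W_o."""
    d_model = len(X[0])
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d_model // num_heads
    Q, K, V = matmul(X, W_q), matmul(X, W_k), matmul(X, W_v)
    heads = []
    for h in range(num_heads):
        cols = slice(h * d_head, (h + 1) * d_head)
        heads.append(scaled_dot_product_attention(
            [row[cols] for row in Q], [row[cols] for row in K], [row[cols] for row in V]))
    # Concatenate the per-head outputs token by token: (seq_len, d_model).
    concat = [[x for head in heads for x in head[t]] for t in range(len(X))]
    return matmul(concat, W_o)


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d_model, seq_len = 4, 3
X = random_matrix(seq_len, d_model, rng)
W_q, W_k, W_v, W_o = (random_matrix(d_model, d_model, rng) for _ in range(4))
out = multi_head_self_attention(X, W_q, W_k, W_v, W_o, num_heads=2)  # shape (3, 4)
```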

Subword Tokenization

  • Summary: HuggingFace Tokenizer Summary
  • Implementation: HuggingFace Tokenizer, Google SentencePiece
  • Unigram Language Model (ULM)
    • assumes subword occurrences are independent, so the probability of a subword sequence is the product of the individual subword probabilities
    • segments a sentence by maximizing the likelihood of the whole sentence (Viterbi algorithm)
    • both WP and ULM leverage a language model to build the subword vocabulary
  • Byte Pair Encoding (BPE)
    • starts from characters and repeatedly merges the most frequent adjacent pair into a new subword, until the desired vocabulary size is reached or the most frequent pair occurs only once
    • used in GPT-2 and RoBERTa; see Git Issue for implementation
    • tokenizers.CharBPETokenizer: OpenAIGPTTokenizerFast
    • tokenizers.ByteLevelBPETokenizer: GPT2TokenizerFast, RobertaTokenizerFast, LongformerTokenizerFast
  • WordPiece (WP)
    • similar to BPE, but merges are chosen by likelihood rather than frequency: "choose the new word unit out of all possible ones that increase the likelihood on the training data the most when added to the model"
      • define log P(sentence) = Σ_i log P(token_i)
        when adjacent tokens x and y are merged into z,
        the change in log-likelihood is log P(token_z) - (log P(token_x) + log P(token_y))
    • tokenizers.BertWordPieceTokenizer: BertTokenizerFast, DistilBertTokenizerFast, ElectraTokenizerFast, RetriBertTokenizerFast, MobileBertTokenizerFast
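
A minimal sketch of the three ideas above on a toy word-frequency corpus. It assumes single characters as the starting symbols, no end-of-word marker and no WordPiece "##" continuation prefix; real tokenizers add these details. `learn_bpe` merges the most frequent adjacent pair. `best_wordpiece_pair` instead ranks pairs by the change log P(z) - (log P(x) + log P(y)), with P(t) approximated as count(t) / total symbol count. `viterbi_segment` picks the ULM segmentation with the highest total log probability. The corpus and the subword probabilities are made up, and how ULM learns its probabilities is not shown.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Tuple

Words = Dict[Tuple[str, ...], int]  # symbol sequence -> word frequency


def to_symbols(corpus: Dict[str, int]) -> Words:
    """Split every word into single characters (the starting vocabulary)."""
    return {tuple(word): freq for word, freq in corpus.items()}


def pair_counts(words: Words) -> Counter:
    """Frequency of each adjacent symbol pair, weighted by word frequency."""
    counts: Counter = Counter()
    for symbols, freq in words.items():
        for pair in zip(symbols, symbols[1:]):
            counts[pair] += freq
    return counts


def merge_pair(words: Words, pair: Tuple[str, str]) -> Words:
    """Replace every occurrence of `pair` with one concatenated symbol."""
    merged: Words = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(corpus: Dict[str, int], num_merges: int) -> List[Tuple[str, str]]:
    """BPE: merge the most frequent pair until num_merges or the best count is 1."""
    words, merges = to_symbols(corpus), []
    for _ in range(num_merges):
        counts = pair_counts(words)
        if not counts:
            break
        best, freq = counts.most_common(1)[0]  # ties: first pair encountered wins
        if freq <= 1:
            break
        words = merge_pair(words, best)
        merges.append(best)
    return merges


def best_wordpiece_pair(words: Words) -> Tuple[str, str]:
    """WordPiece-style choice: maximise log P(z) - (log P(x) + log P(y))."""
    units: Counter = Counter()
    for symbols, freq in words.items():
        for s in symbols:
            units[s] += freq
    total = sum(units.values())
    pairs = pair_counts(words)

    def gain(pair: Tuple[str, str]) -> float:
        x, y = pair
        return (math.log(pairs[pair] / total)
                - (math.log(units[x] / total) + math.log(units[y] / total)))

    return max(pairs, key=gain)


def viterbi_segment(text: str, logp: Dict[str, float]) -> Optional[List[str]]:
    """ULM: the segmentation maximising the sum of subword log probabilities."""
    max_len = max(len(piece) for piece in logp)
    best = [0.0] + [-math.inf] * len(text)  # best[i]: best score for text[:i]
    back = [0] * (len(text) + 1)            # back[i]: start index of the last piece
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            if piece in logp and best[start] + logp[piece] > best[end]:
                best[end], back[end] = best[start] + logp[piece], start
    if best[-1] == -math.inf:
        return None  # the text cannot be covered by this vocabulary
    pieces, end = [], len(text)
    while end > 0:
        pieces.append(text[back[end]:end])
        end = back[end]
    return pieces[::-1]


corpus = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
print(learn_bpe(corpus, 2))                     # [('e', 's'), ('es', 't')]
print(best_wordpiece_pair(to_symbols(corpus)))  # ('i', 'd')
probs = {"h": 0.05, "u": 0.05, "g": 0.05, "s": 0.05, "hu": 0.1, "ug": 0.1, "gs": 0.05, "hug": 0.2}
print(viterbi_segment("hugs", {p: math.log(v) for p, v in probs.items()}))  # ['hug', 's']
```

On this toy corpus BPE starts with the frequent pair ('e', 's'), while the likelihood criterion prefers ('i', 'd'), whose parts never appear apart.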

Industrial Application

Google Neural Machine Translation System

Concepts applied: Additive/concat attention, Residual connection, Vanilla dropout
Resources: [Paper][Illustrative Intro][TF2 Implementation][Torch Implementation]

BERT: Bidirectional Encoder Representations from Transformers

Resources: [Paper]

Probabilistic Graph

Conditional Random Field

Resources: [Introduction to CRF][CRF vs MRF][CRF for Multi-label Classification][TensorFlow CRF]

Bi-LSTM CRF

Resources: [Paper][TF1.0 Implementation by Scofield]
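
A minimal sketch of linear-chain CRF decoding on top of a Bi-LSTM. It assumes the Bi-LSTM has already produced per-token emission scores and that the CRF's parameters are a tag-to-tag transition matrix plus start scores; end scores and training (the forward algorithm for the normalizer) are left out. Viterbi keeps, for each tag, the best-scoring path ending in that tag, so it can avoid label sequences that a per-token argmax would produce. The O/B/I tag set and all numbers are hypothetical.

```python
from typing import List, Sequence


def crf_viterbi(emissions: Sequence[Sequence[float]],
                transitions: Sequence[Sequence[float]],
                start: Sequence[float]) -> List[int]:
    """Return the tag path maximising
    start[y_0] + sum_t emissions[t][y_t] + sum_{t>0} transitions[y_{t-1}][y_t]."""
    num_tags = len(start)
    score = [start[j] + emissions[0][j] for j in range(num_tags)]
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(num_tags):
            # Best previous tag i for the current tag j.
            best_i = max(range(num_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Follow the back-pointers from the best final tag.
    path = [max(range(num_tags), key=lambda j: score[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


tags = ["O", "B", "I"]
NEG = -1e4  # large penalty marking a forbidden transition
transitions = [[0.0, 0.0, NEG],   # O -> I is not allowed
               [0.0, 0.0, 0.0],
               [0.0, 0.0, 0.0]]
start = [0.0, 0.0, NEG]           # a sequence cannot start with I
emissions = [[2.0, 1.0, 0.0],     # token 0 prefers O
             [0.0, 0.5, 1.0]]     # token 1 prefers I on its own
path = crf_viterbi(emissions, transitions, start)
print([tags[k] for k in path])    # ['O', 'B'], not the greedy ['O', 'I']
```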

Label Attention Network

Resources: [Paper][Torch Implementation by Author]

Modeling Tricks

Transformer Training

Pre-Layer Normalization Transformer: [Paper]
Training Tips for Transformer: [Paper]
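
A minimal sketch of where LayerNorm sits in the two residual-block variants. It assumes a toy element-wise `toy_sublayer` (hypothetical) standing in for self-attention or the feed-forward network, and a LayerNorm without learned gain and bias. The original (Post-LN) Transformer normalizes after the residual addition, LN(x + F(x)). The Pre-LN variant normalizes only the sublayer input, x + F(LN(x)), leaving the residual path untouched; Pre-LN stacks usually add one final LayerNorm after the last block.

```python
import math
from typing import Callable, List

Vector = List[float]


def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    """Normalize to zero mean and unit variance (gain and bias omitted)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def add(a: Vector, b: Vector) -> Vector:
    return [u + v for u, v in zip(a, b)]


def post_ln_block(x: Vector, sublayer: Callable[[Vector], Vector]) -> Vector:
    """Post-LN: LN(x + F(x))."""
    return layer_norm(add(x, sublayer(x)))


def pre_ln_block(x: Vector, sublayer: Callable[[Vector], Vector]) -> Vector:
    """Pre-LN: x + F(LN(x)); the identity path from x is never normalized."""
    return add(x, sublayer(layer_norm(x)))


def toy_sublayer(v: Vector) -> Vector:
    """Hypothetical stand-in for attention or the FFN."""
    return [2.0 * u for u in v]


x = [1.0, 2.0, 3.0]
post = post_ln_block(x, toy_sublayer)  # mean 0: the output is re-normalized
pre = pre_ln_block(x, toy_sublayer)    # mean 2: the input's mean survives the residual
```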

Recurrent Neural Network Normalization

Resources: [Methodology Overview][Layer Normalization]
Experience: use BatchNormalization or LayerNormalization after each RNN layer
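
A minimal sketch of the "normalize after each RNN layer" tip. It assumes a plain tanh (Elman) RNN cell with randomly initialized weights and a LayerNorm without learned gain and bias. Each layer's output at every time step is normalized across its hidden units before it feeds the next layer. This is the simple between-layer placement from the tip, not normalization inside the recurrence itself; BatchNormalization would instead normalize each unit across the batch.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Layer = Tuple[Matrix, Matrix, Vector]  # (W_x, W_h, b)


def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def rnn_layer(inputs: List[Vector], layer: Layer) -> List[Vector]:
    """Elman RNN: h_t = tanh(W_x x_t + W_h h_{t-1} + b), with h_0 = 0."""
    W_x, W_h, b = layer
    h = [0.0] * len(b)
    outputs = []
    for x in inputs:
        h = [math.tanh(sum(w * xi for w, xi in zip(W_x[i], x))
                       + sum(w * hj for w, hj in zip(W_h[i], h)) + b[i])
             for i in range(len(b))]
        outputs.append(h)
    return outputs


def stacked_rnn_with_layernorm(inputs: List[Vector], layers: List[Layer]) -> List[Vector]:
    """Run each RNN layer, then LayerNorm its per-step outputs before the next layer."""
    seq = inputs
    for layer in layers:
        seq = [layer_norm(h) for h in rnn_layer(seq, layer)]
    return seq


def random_layer(in_dim: int, hidden: int, rng: random.Random) -> Layer:
    W_x = [[rng.gauss(0.0, 0.5) for _ in range(in_dim)] for _ in range(hidden)]
    W_h = [[rng.gauss(0.0, 0.5) for _ in range(hidden)] for _ in range(hidden)]
    return W_x, W_h, [0.0] * hidden


rng = random.Random(0)
inputs = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(5)]  # 5 steps, 3 features
layers = [random_layer(3, 4, rng), random_layer(4, 4, rng)]
outputs = stacked_rnn_with_layernorm(inputs, layers)  # 5 steps, 4 normalized units each
```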

Recurrent Neural Network Dropout

Resources: [Methodology Overview][Vanilla Dropout][Variational Dropout][Recurrent Dropout]
Experience: set the dropout rate between 0.1 and 0.3; start with vanilla dropout
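
A minimal sketch of the difference between vanilla and variational dropout on a sequence. It assumes inverted dropout (kept units are scaled by 1 / (1 - rate) during training) and masks applied to the per-step inputs only; variational dropout also reuses a mask on the recurrent hidden state, which is not shown. Vanilla dropout samples a fresh mask at every time step, while variational dropout samples one mask per sequence and reuses it at every step. The rate 0.2 is taken from the 0.1 to 0.3 range suggested above.

```python
import random
from typing import List

Vector = List[float]


def dropout_mask(size: int, rate: float, rng: random.Random) -> Vector:
    """Inverted dropout: keep with probability 1 - rate, scale kept units by 1 / (1 - rate)."""
    keep = 1.0 - rate
    return [1.0 / keep if rng.random() < keep else 0.0 for _ in range(size)]


def vanilla_masks(seq_len: int, size: int, rate: float, rng: random.Random) -> List[Vector]:
    """Vanilla dropout: an independent mask at every time step."""
    return [dropout_mask(size, rate, rng) for _ in range(seq_len)]


def variational_masks(seq_len: int, size: int, rate: float, rng: random.Random) -> List[Vector]:
    """Variational dropout: one mask per sequence, reused (shared) at every time step."""
    mask = dropout_mask(size, rate, rng)
    return [mask for _ in range(seq_len)]


def apply_masks(seq: List[Vector], masks: List[Vector]) -> List[Vector]:
    return [[v * m for v, m in zip(step, mask)] for step, mask in zip(seq, masks)]


rng = random.Random(0)
seq = [[1.0] * 6 for _ in range(4)]  # 4 time steps, 6 features
vanilla = apply_masks(seq, vanilla_masks(4, 6, 0.2, rng))
variational = apply_masks(seq, variational_masks(4, 6, 0.2, rng))
# Every row of `variational` drops the same features; rows of `vanilla` generally differ.
```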
