
Awesome Long-context Language Modeling papers


Introduction (Draft by ChatGPT😄)

Traditional language models like ChatGPT have a fixed context window, which means they can only consider a limited number of tokens (words or subwords) as input when generating the next word in a sequence. For example, the original GPT-3 has a maximum context window of 2,048 tokens. This limitation poses challenges when dealing with longer pieces of text, as relevant information beyond the context window is simply cut off.
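A minimal sketch of what this limitation looks like in practice (the token IDs and window size here are illustrative, not from any real model):

```python
# Hypothetical sketch: with a fixed context window, only the most recent
# tokens are visible to the model; everything earlier is dropped.
def truncate_to_window(tokens, max_context=2048):
    """Keep only the last `max_context` tokens; earlier ones are lost."""
    return tokens[-max_context:]

doc = list(range(5000))            # pretend these are token IDs for a long document
visible = truncate_to_window(doc)
assert len(visible) == 2048
assert visible[0] == 5000 - 2048   # everything before token 2952 is invisible to the model
```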

Figure taken from Longformer

To overcome this limitation and better model long-range dependencies in text, researchers have explored various techniques and architectures. The following are some approaches that might be considered "long context language models".

Paper List

Memory/Cache-Augmented Models

Some language models incorporate external memory mechanisms, allowing them to store information from past tokens and retrieve it when necessary. These memories enable the model to maintain context over longer segments of text.
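A minimal sketch of the idea, in the spirit of segment-level caching (this is illustrative and not any specific paper's implementation): hidden states from the previous segment are stored and reused as extra keys when attending over the current segment.

```python
import numpy as np

# Illustrative memory-augmented attention: a cache of the previous segment's
# states is concatenated with the current segment's keys, so the model can
# still "see" tokens from before the current segment.
def attend_with_memory(query, keys, memory):
    """Single-head attention over cached memory plus current keys.
    For brevity, values are taken to be the same as keys."""
    all_keys = np.concatenate([memory, keys], axis=0) if memory is not None else keys
    scores = query @ all_keys.T / np.sqrt(query.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ all_keys

d = 8
memory = None
for _ in range(3):                   # process three segments in sequence
    segment = np.random.randn(4, d)  # 4 tokens per segment
    out = attend_with_memory(segment, segment, memory)
    memory = segment                 # cache this segment for the next one

print(out.shape)  # (4, 8)
```

The key design point is that the cache grows the effective context without re-running attention over the full history at every step.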

Hierarchical Models / Data-Centric Methods (compression happens in the context or in the key-value cache)

These are often transformer variants, meaning the transformer architecture itself is modified to compress the input, either in the context or in the key-value cache.

Transformer Variants (changing the KV cache or position embeddings of the transformer)

Researchers have explored variants that can better handle long context by utilizing techniques like sparse attention, axial attention, and reformulating the self-attention mechanism.
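One common sparse-attention pattern is a sliding-window (banded) mask, as used by models such as Longformer. A minimal sketch, with a made-up window size for demonstration:

```python
import numpy as np

# Illustrative sliding-window attention mask: each token may only attend to
# tokens within `window // 2` positions on either side, giving O(n * window)
# attention cost instead of O(n^2).
def sliding_window_mask(seq_len, window):
    """mask[i, j] is True when token i may attend to token j."""
    idx = np.arange(seq_len)
    return np.abs(idx[:, None] - idx[None, :]) <= window // 2

mask = sliding_window_mask(seq_len=6, window=2)
print(int(mask.sum()))  # 16: each token sees itself plus up to one neighbor per side
```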

Window-Based/On-the-fly Methods

Rather than relying on a fixed context window, some models use a sliding window approach. They process the text in smaller chunks, capturing local dependencies within each window and passing relevant information between adjacent windows.
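A minimal sketch of overlapping-window chunking (the `window` and `stride` values are illustrative assumptions): the overlap lets information at a window boundary be shared between adjacent chunks.

```python
# Hypothetical sliding-window chunking of a long token sequence. Adjacent
# chunks overlap by (window - stride) tokens so boundary context is not lost.
def chunk_with_overlap(tokens, window=512, stride=384):
    chunks = []
    for start in range(0, max(len(tokens) - window + stride, 1), stride):
        chunks.append(tokens[start:start + window])
    return chunks

tokens = list(range(1000))
chunks = chunk_with_overlap(tokens)
print(len(chunks))   # 3 windows cover the 1000 tokens
print(chunks[1][0])  # 384: each chunk starts `stride` tokens after the previous
```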

Analysis

Reinforcement Learning

Some approaches use reinforcement learning to guide the model's attention to focus on important parts of the input text while considering the context.

Benchmark

CV-Inspired

Contact Me

If you have any questions or comments, please feel free to let us know: 📧 Cheng Deng.
