

Towards Medium-Term Memory Recall in Transformer Models with a Compute Cost Independent of Context Length:

Directing Sentiment and Evoking Recall in GPT-2 from an Attention Block Based Persistent Memory using a Small Fraction of One Layer and a Case for the Spatial Separation of what might be described as Emotional Implicit Memory and Explicit Memory

This work proposes a bolt-on method that uses a k-nearest neighbour algorithm to search a persistent memory while reusing the internal machinery of the Transformer's attention blocks to determine what to search for and how to rank the search results. It was found that sentiment direction through what might be described as emotional implicit memory and recall (explicit memory) appear to arise from different layers in GPT-2 (layers 7, 8 and 9 vs. layer 10 respectively). For sentiment direction, it was found that, with a positive persistent memory, a fraction of the layers (< 3) and a fraction of each layer's tokens-heads (< 10%) were sufficient to change the VADER compound score from a median of -0.13 with an IQR of 1.6 in unadulterated GPT-2 to a median of 0.99 with an IQR of 0.01 with the method. For recall, it was found that the method pulls similar tokens out of the persistent memory as GPT-2 appears to “look” at in the context. Further, it was found that only a fraction of the first context token's tokens-heads in a single layer (5 out of 2160 total tokens-heads) was sufficient to produce similar or better recall results in most cases than when the Wikipedia text was placed in the context. This suggests that the computational cost of recall using this method may be independent of context length.


See the write-up for more details.
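
As a mental model of the mechanism, the sketch below is a minimal illustration, not the repository's implementation: the shapes, data and names are stand-ins. It uses an attention query vector to run a k-nearest neighbour search (via faiss) over stored memory keys, then weights the retrieved values with the same dot-product score an attention block would use.

import numpy as np
import faiss

d_head = 64              # per-head dimension in GPT-2 small (illustrative)
k = 5                    # neighbours to retrieve per query

# Stand-in persistent memory: keys/values as if captured from an attention block.
mem_keys = np.random.randn(10_000, d_head).astype("float32")
mem_values = np.random.randn(10_000, d_head).astype("float32")

index = faiss.IndexFlatIP(d_head)    # inner product mirrors the unnormalised attention score
index.add(mem_keys)

def retrieve(query_vec):
    """Search the persistent memory with a single attention query vector."""
    scores, idx = index.search(query_vec.reshape(1, -1).astype("float32"), k)
    weights = np.exp(scores[0] - scores[0].max())    # softmax over the retrieved scores,
    weights /= weights.sum()                         # as attention would weight them
    return (weights[:, None] * mem_values[idx[0]]).sum(axis=0)

memory_read = retrieve(np.random.randn(d_head))
print(memory_read.shape)             # (64,) -- a vector that could be mixed back into the layer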

Install

pip install torch numpy transformers datasets tiktoken wandb tqdm faiss-cpu

Dependencies:

  • pytorch
  • numpy
  • transformers for huggingface transformers (to load GPT-2 checkpoints)
  • datasets for huggingface datasets (if you want to download + preprocess OpenWebText)
  • tiktoken for OpenAI's fast BPE code
  • wandb for optional logging
  • tqdm for progress bars
  • faiss (Facebook AI Similarity Search) for k-nearest neighbour search; install faiss-gpu instead of faiss-cpu if you have an NVIDIA GPU (see the sketch after this list)
  • shutil for saving the persistent memory (part of the Python standard library, so it does not need to be installed)
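
As a rough illustration of how the faiss and shutil pieces fit together (the file names and layout below are assumptions, not the repository's actual layout): build an index over stored memory keys, write it to disk with faiss, and snapshot the memory folder with shutil.

import os
import shutil

import numpy as np
import faiss

os.makedirs("memory", exist_ok=True)
keys = np.random.randn(1_000, 64).astype("float32")   # stand-in for stored attention keys
np.save("memory/keys.npy", keys)

index = faiss.IndexFlatIP(keys.shape[1])              # inner-product index for kNN search
index.add(keys)
faiss.write_index(index, "memory/keys.index")

# shutil is only used for plain file operations, e.g. snapshotting the memory folder.
shutil.copytree("memory", "memory_backup", dirs_exist_ok=True)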

Usage

This project was adapted from Andrej Karpathy's nanoGPT and as such works in a similar fashion.

Three templates are included in this repository: one that demonstrates directing sentiment, one that demonstrates recall, and one that produces text from the unadulterated version of GPT-2. To change the various parameters of the methods or the persistent memory, edit the templates in the config directory or create new ones from them (a hedged illustration of the template style follows below).
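
For orientation, a template in the nanoGPT configurator style is simply a Python file of variable assignments that sample.py reads. The sketch below is illustrative only; the memory-related names in it are assumptions rather than the repository's actual parameters (the shipped templates in config show the real ones).

# config/sample_my_experiment.py (hypothetical template)
init_from = 'gpt2'        # load the original GPT-2 checkpoint
num_samples = 10          # samples to draw
max_new_tokens = 100      # tokens generated per sample
temperature = 0.8
seed = 1337

# Illustrative persistent-memory settings -- assumed names, not the repo's:
# memory_dir = 'memory/positive_sentiment'
# memory_layers = [7, 8, 9]
# memory_top_k = 5

Such a template would then be run like the shipped ones, e.g. python sample.py config/sample_my_experiment.py.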

Sentiment Direction

This work takes a slightly different perspective on sentiment direction. Since it is an attempt to replicate emotional implicit memory, the perspective is more akin to the difference in a person's response to a mouse running across the dinner table when they hold one of the two memories below:

1. “mice are disgusting vectors of disease”
2. “mice are so cute especially when being fed”

To trial a template for sentiment direction that uses the Type 3 method and has the same parameters as Table 1 in the write-up, run:

$ python sample.py config/sample_gpt2_type_3_method.py 

The above template loads its "memory" from a previously stored file. As would be expected, a "memory" created for the original GPT-2 does not work well with GPT-2 model weights that have been fine-tuned and so differ from the original version.
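
The write-up reports sentiment shifts with the VADER compound score. The snippet below is not part of this repository; it is just a sketch of how one might score generated samples using the vaderSentiment package (pip install vaderSentiment).

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
samples = [
    "What a wonderful, uplifting day this has been!",
    "Everything about this was dreadful.",
]
for text in samples:
    compound = analyzer.polarity_scores(text)["compound"]   # ranges from -1 to +1
    print(f"{compound:+.2f}  {text}")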

Recall

GPT-2 is not the ideal model for testing recall. To get around this, a Wikipedia paragraph and a contrived question were chosen such that GPT-2 produced the expected answer in 50 out of 100 tries/seeds when the Wikipedia paragraph was placed in the context prior to the contrived question.

To trial the template that places the Wikipedia paragraph in the persistent memory and the question in the context, and uses the Type 2 method with the same parameters as Table 4 in the write-up, run:

$ python sample.py config/sample_gpt2_type_2_method.py 
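
To tally tries/seeds like the 50-out-of-100 figure above, something along the following lines could work. It assumes sample.py accepts a --seed=N override on the command line in the nanoGPT configurator style, which may not match this repository's interface exactly; the expected answer string is a placeholder.

import subprocess

expected_answer = "..."    # placeholder: the phrase the contrived question should elicit
hits = 0
for seed in range(100):
    out = subprocess.run(
        ["python", "sample.py", "config/sample_gpt2_type_2_method.py", f"--seed={seed}"],
        capture_output=True, text=True,
    ).stdout
    hits += expected_answer in out

print(f"recalled the expected answer in {hits}/100 runs")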

GPT-2

The template to generate text through unadulterated GPT-2 is below:

$ python sample.py config/sample_gpt2.py 

Using More Capable Models

It is difficult to determine from GPT-2 whether this will be an effective method. If someone is curious enough to try it out on a more capable model, I would be incredibly interested to learn how it worked out. Perhaps leave a note in the issues.

Acknowledgements

Andrej Karpathy for generously releasing the nanoGPT repository under the MIT license.
