
Comments (11)

Bachstelze commented on August 24, 2024

The code is similar to the GPT example in this repo:

from transformers import AutoTokenizer, AutoModel
from bertviz import head_view
from bertviz import model_view

# Load the model with attention outputs enabled.
# Vicuna is an instruction-tuned model based on Llama.
model_name = "lmsys/vicuna-7b-delta-v1.1"  # or e.g. mistralai/Mistral-7B-Instruct-v0.1
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_attentions=True)

# a longer instruction prompt
input_sentence = """The sentence you are given might be too wordy, complicated, or unclear. Rewrite the sentence and make your writing clearer by keeping it concise. Whenever possible, break complex sentences into multiple sentences and eliminate unnecessary words.\n
Input: If you have any questions about my rate or if you find it necessary to increase or decrease the scope for this project, please let me know.\n\nOutput:"""
# a shorter prompt is easier to visualize; this overwrites the long one above
input_sentence = "Generate a positive review for a place."

inputs = tokenizer.encode(input_sentence, return_tensors='pt')
outputs = model(inputs)
attention = outputs[-1]  # tuple of per-layer attention weights (returned because output_attentions=True)
tokens = tokenizer.convert_ids_to_tokens(inputs[0])

# save the complete head and model views
html_head_view = head_view(attention, tokens, html_action='return')
with open("all_head_view.html", 'w') as file:
    file.write(html_head_view.data)

html_model_view = model_view(attention, tokens, html_action='return')
with open("all_model_view.html", 'w') as file:
    file.write(html_model_view.data)

# save the views for selected layers only, in case the browser can't render the full visualization
# (shorter inputs are also easier to display)
layers = [1]
html_head_view = head_view(attention, tokens, html_action='return', include_layers=layers)

with open("short_head_view.html", 'w') as file:
    file.write(html_head_view.data)

html_model_view = model_view(attention, tokens, html_action='return', include_layers=layers)
with open("short_model_view.html", 'w') as file:
    file.write(html_model_view.data)

Loading and processing already take about 30 GB of RAM. My machine starts to swap at this point, so I just save the HTML and open it for visualization once the RAM is free again.
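(A side note, not from the original comment: a minimal sketch of how the footprint could be reduced, assuming half-precision weights are acceptable. torch_dtype=torch.float16 halves the weight memory versus the float32 default, and torch.no_grad() stops PyTorch from keeping activations for backpropagation.)

import torch
from transformers import AutoTokenizer, AutoModel

model_name = "lmsys/vicuna-7b-delta-v1.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# half precision: roughly 14 GB of weights instead of roughly 28 GB
model = AutoModel.from_pretrained(model_name, output_attentions=True,
                                  torch_dtype=torch.float16)

inputs = tokenizer.encode("Generate a positive review for a place.", return_tensors='pt')
with torch.no_grad():  # no gradients needed for visualization
    outputs = model(inputs)
attention = outputs[-1]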

The output looks very repetitive.
[Attached screenshots: model_view_vicuna_small_instruction, head_view_vicuna_small_instruction, long_head_view]

In the case of Vicuna (lmsys/vicuna-7b-delta-v1.1, studied in From Language Modeling to Instruction Following: Understanding the Behavior Shift in LLMs after Instruction Tuning), all heads show essentially the same weight pattern. Every token can only attend to its preceding tokens because of the unidirectional objective of GPT-style models, e.g. the first token can only attend to the start-of-sentence token and itself. Interestingly, each token spreads its attention roughly equally over all tokens it is allowed to attend to. A token at position i therefore assigns about 1/i to each of its i visible predecessors, so the attention weights are strongest for the tokens at the beginning and decay towards the end. This gives the map an "L" shape resembling the curve of the positive multiplicative inverse (1/x).
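(A small illustrative sketch, not Vicuna's actual weights: if every token spreads its attention uniformly over the causal mask, the 1/x decay falls out directly.)

import numpy as np

n = 8  # toy sequence length
attn = np.tril(np.ones((n, n)))          # causal mask: token i sees positions 0..i
attn /= attn.sum(axis=1, keepdims=True)  # uniform attention: each row sums to 1
print(np.round(attn, 2))
# Row i holds 1/(i+1) in its first i+1 columns, so the first column reads
# 1, 1/2, 1/3, ... down the matrix -- the "L" shape described above.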

Let me know if you find other patterns or have a good explanation for this phenomenon.


Bachstelze commented on August 24, 2024

Doesn't it work as a decoder model?
I have successfully run Mistral (with lots of redundant shortcuts). The architecture should be similar.


iBibek commented on August 24, 2024

@Bachstelze, this is good news.
Can you please share the code (if it's possible)?


iBibek commented on August 24, 2024

@Bachstelze Thank you so much <3


MarioRicoIbanez commented on August 24, 2024

Hi! I am also trying to use bertviz with LLMs. Have you managed to visualize not only the self-attention of the first forward pass, but also the attention over the generated tokens, when using the model.generate method?
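(Not an answer from the thread, but one possible route, sketched under the assumption that per-step attentions are enough: generate can return them when return_dict_in_generate is set.)

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "lmsys/vicuna-7b-delta-v1.1"  # illustrative; any causal LM behaves the same
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Generate a positive review for a place.", return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=10,
                         output_attentions=True, return_dict_in_generate=True)

# out.attentions has one entry per decoding step; each entry is a tuple of
# per-layer attention tensors for that step
prompt_pass = out.attentions[0]
print(len(out.attentions), len(prompt_pass), prompt_pass[0].shape)

Note that after the first step each tensor has query length 1, so the per-step attentions would need stitching (or the finished sequence re-run through a plain forward pass) before BertViz can display them.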


Icamd commented on August 24, 2024

> Hi! I am also trying to use bertviz with LLMs. Have you managed to visualize not only the self-attention of the first forward pass, but also the attention over the generated tokens, when using the model.generate method?

Have you solved the problem? Thank you!


MarioRicoIbanez commented on August 24, 2024

Hi, I finally ended up using captum and it works perfectly!

https://captum.ai/tutorials/Llama2_LLM_Attribution
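(For reference, the core of that tutorial looks roughly like this; a sketch from memory of the captum >= 0.7 API, so the details may differ between versions.)

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from captum.attr import FeatureAblation, LLMAttribution, TextTokenInput

model_name = "meta-llama/Llama-2-7b-chat-hf"  # as in the tutorial; gated checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

# perturbation-based attribution: ablate prompt tokens, measure the output change
fa = FeatureAblation(model)
llm_attr = LLMAttribution(fa, tokenizer)

prompt = "Dave lives in Palm Coast, FL and is a lawyer. His personal interests include"
inp = TextTokenInput(prompt, tokenizer, skip_tokens=[1])  # skip the BOS token
attr_res = llm_attr.attribute(inp, target="playing golf, hiking, and cooking.")
attr_res.plot_token_attr(show=True)  # heatmap: prompt tokens vs. generated tokens

Note this shows input attributions (how much each prompt token influenced the output), not the raw attention pattern itself.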


Icamd commented on August 24, 2024

> Hi, I finally ended up using captum and it works perfectly!
>
> https://captum.ai/tutorials/Llama2_LLM_Attribution

Thank you for the information! I found that this works as well: https://github.com/mattneary/attention. I will try using captum, thank you!


Bachstelze commented on August 24, 2024

@Icamd Does https://github.com/mattneary/attention work well with bigger GPTs? Do you know how the attention weights are aggregated into one view?

@MarioRicoIbanez Can we use captum to view the attention pattern?


Bachstelze commented on August 24, 2024

Most of the time, the Llama 3 model sinks almost all of its attention into the begin-of-text token.
It is possible to load the model with 4-bit or 8-bit quantization and run BertViz, e.g. in Google Colab: https://colab.research.google.com/drive/1Fcgug4a6rv9F-Wej0rNveiM_SMNZOtrr?usp=sharing
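(For reference, a sketch of the 4-bit route, assuming bitsandbytes is installed and a GPU is available; the Llama 3 checkpoint is gated, so the name is illustrative.)

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from bertviz import model_view

model_name = "meta-llama/Meta-Llama-3-8B-Instruct"  # requires access approval
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=bnb,
                                             output_attentions=True, device_map="auto")

inputs = tokenizer.encode("Generate a positive review for a place.",
                          return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model(inputs)
# move to CPU float32 so BertViz can process the tensors
attention = tuple(a.to("cpu", dtype=torch.float32) for a in outputs.attentions)
tokens = tokenizer.convert_ids_to_tokens(inputs[0])

html = model_view(attention, tokens, html_action="return")
with open("llama3_model_view.html", "w") as f:
    f.write(html.data)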


iBibek commented on August 24, 2024

@Bachstelze, can you please clarify the part where you said:

> Most of the time, the Llama 3 model sinks almost all of its attention into the begin-of-text token.

