
Comments (11)

Bachstelze commented on August 24, 2024

The code is similar to the GPT example in this repo:

from transformers import AutoTokenizer, AutoModel
from bertviz import head_view
from bertviz import model_view

# Load the model with attention outputs enabled.
# Vicuna is an instruction-tuned model based on Llama.
model_name = "lmsys/vicuna-7b-delta-v1.1"  # or e.g. mistralai/Mistral-7B-Instruct-v0.1
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_attentions=True)

# a longer instruction prompt
input_sentence = """The sentence you are given might be too wordy, complicated, or unclear. Rewrite the sentence and make your writing clearer by keeping it concise. Whenever possible, break complex sentences into multiple sentences and eliminate unnecessary words.\n
Input: If you have any questions about my rate or if you find it necessary to increase or decrease the scope for this project, please let me know.\n\nOutput:"""
# a shorter prompt is easier to visualize; this overwrites the long one above
input_sentence = "Generate a positive review for a place."

inputs = tokenizer.encode(input_sentence, return_tensors='pt')
outputs = model(inputs)
attention = outputs[-1]  # tuple of per-layer attention weights (returned because output_attentions=True)
tokens = tokenizer.convert_ids_to_tokens(inputs[0])

# save the complete head and model views
html_head_view = head_view(attention, tokens, html_action='return')
with open("all_head_view.html", 'w') as file:
    file.write(html_head_view.data)

html_model_view = model_view(attention, tokens, html_action='return')
with open("all_model_view.html", 'w') as file:
    file.write(html_model_view.data)

# save the views for selected layers only, in case the browser can't render the full visualization
# (shorter inputs are also easier to display)
layers = [1]
html_head_view = head_view(attention, tokens, html_action='return', include_layers=layers)

with open("short_head_view.html", 'w') as file:
    file.write(html_head_view.data)

html_model_view = model_view(attention, tokens, html_action='return', include_layers=layers)
with open("short_model_view.html", 'w') as file:
    file.write(html_model_view.data)

Loading and processing already take about 30 GB of RAM. My machine starts to swap at this point, so I just save the HTML and open it for visualization once the RAM is free again.
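(A side note, not from the original comment: a minimal sketch of how the footprint could be reduced, assuming half-precision weights are acceptable. torch_dtype=torch.float16 halves the weight memory versus the float32 default, and torch.no_grad() stops PyTorch from keeping activations for backpropagation.)

import torch
from transformers import AutoTokenizer, AutoModel

model_name = "lmsys/vicuna-7b-delta-v1.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# half precision: roughly 14 GB of weights instead of roughly 28 GB
model = AutoModel.from_pretrained(model_name, output_attentions=True,
                                  torch_dtype=torch.float16)

inputs = tokenizer.encode("Generate a positive review for a place.", return_tensors='pt')
with torch.no_grad():  # no gradients needed for visualization
    outputs = model(inputs)
attention = outputs[-1]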

The output looks very repetitive.
[Attached screenshots: model_view_vicuna_small_instruction, head_view_vicuna_small_instruction, long_head_view]

In the case of Vicuna (lmsys/vicuna-7b-delta-v1.1, studied in From Language Modeling to Instruction Following: Understanding the Behavior Shift in LLMs after Instruction Tuning), all heads show essentially the same weight pattern. Every token can only attend to its preceding tokens because of the unidirectional objective of GPT-style models, e.g. the first token can only attend to the start-of-sentence token and itself. Interestingly, each token spreads its attention roughly equally over all tokens it is allowed to attend to. A token at position i therefore assigns about 1/i to each of its i visible predecessors, so the attention weights are strongest for the tokens at the beginning and decay towards the end. This gives the map an "L" shape resembling the curve of the positive multiplicative inverse (1/x).
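(A small illustrative sketch, not Vicuna's actual weights: if every token spreads its attention uniformly over the causal mask, the 1/x decay falls out directly.)

import numpy as np

n = 8  # toy sequence length
attn = np.tril(np.ones((n, n)))          # causal mask: token i sees positions 0..i
attn /= attn.sum(axis=1, keepdims=True)  # uniform attention: each row sums to 1
print(np.round(attn, 2))
# Row i holds 1/(i+1) in its first i+1 columns, so the first column reads
# 1, 1/2, 1/3, ... down the matrix -- the "L" shape described above.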

Let me know if you find other patterns or have a good explanation for this phenomenon.


Bachstelze commented on August 24, 2024

Doesn't it work as a decoder model?
I have successfully run Mistral (with lots of redundant shortcuts). The architecture should be similar.


iBibek commented on August 24, 2024

@Bachstelze, this is good news.
Can you please share the code (if it's possible)?


iBibek commented on August 24, 2024

@Bachstelze Thank you so much <3


MarioRicoIbanez commented on August 24, 2024

Hi! I am also trying to use bertviz with LLMs. Have you managed to visualize not only the self-attention of the first forward pass, but also the attention over the generated tokens, when using the model.generate method?
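(Not an answer from the thread, but one possible route, sketched under the assumption that per-step attentions are enough: generate can return them when return_dict_in_generate is set.)

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "lmsys/vicuna-7b-delta-v1.1"  # illustrative; any causal LM behaves the same
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Generate a positive review for a place.", return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=10,
                         output_attentions=True, return_dict_in_generate=True)

# out.attentions has one entry per decoding step; each entry is a tuple of
# per-layer attention tensors for that step
prompt_pass = out.attentions[0]
print(len(out.attentions), len(prompt_pass), prompt_pass[0].shape)

Note that after the first step each tensor has query length 1, so the per-step attentions would need stitching (or the finished sequence re-run through a plain forward pass) before BertViz can display them.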


Icamd commented on August 24, 2024

> Hi! I am also trying to use bertviz with LLMs. Have you managed to visualize not only the self-attention of the first forward pass, but also the attention over the generated tokens, when using the model.generate method?

Have you solved the problem? Thank you!


MarioRicoIbanez commented on August 24, 2024

Hi, I finally ended up using captum and it works perfectly!

https://captum.ai/tutorials/Llama2_LLM_Attribution
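(For reference, the core of that tutorial looks roughly like this; a sketch from memory of the captum >= 0.7 API, so the details may differ between versions.)

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from captum.attr import FeatureAblation, LLMAttribution, TextTokenInput

model_name = "meta-llama/Llama-2-7b-chat-hf"  # as in the tutorial; gated checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

# perturbation-based attribution: ablate prompt tokens, measure the output change
fa = FeatureAblation(model)
llm_attr = LLMAttribution(fa, tokenizer)

prompt = "Dave lives in Palm Coast, FL and is a lawyer. His personal interests include"
inp = TextTokenInput(prompt, tokenizer, skip_tokens=[1])  # skip the BOS token
attr_res = llm_attr.attribute(inp, target="playing golf, hiking, and cooking.")
attr_res.plot_token_attr(show=True)  # heatmap: prompt tokens vs. generated tokens

Note this shows input attributions (how much each prompt token influenced the output), not the raw attention pattern itself.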


Icamd commented on August 24, 2024

> Hi, I finally ended up using captum and it works perfectly!
>
> https://captum.ai/tutorials/Llama2_LLM_Attribution

Thank you for the information! I found that this works as well: https://github.com/mattneary/attention. I will try using captum, thank you!


Bachstelze commented on August 24, 2024

@Icamd Does https://github.com/mattneary/attention work well with bigger GPTs? Do you know how the attention weights are aggregated into one view?

@MarioRicoIbanez Can we use captum to view the attention pattern?


Bachstelze commented on August 24, 2024

Most of the time, the Llama 3 model sinks almost all of its attention into the begin-of-text token.
It is possible to load the model with 4-bit or 8-bit quantization and run BertViz, e.g. in Google Colab: https://colab.research.google.com/drive/1Fcgug4a6rv9F-Wej0rNveiM_SMNZOtrr?usp=sharing
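(For reference, a sketch of the 4-bit route, assuming bitsandbytes is installed and a GPU is available; the Llama 3 checkpoint is gated, so the name is illustrative.)

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from bertviz import model_view

model_name = "meta-llama/Meta-Llama-3-8B-Instruct"  # requires access approval
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=bnb,
                                             output_attentions=True, device_map="auto")

inputs = tokenizer.encode("Generate a positive review for a place.",
                          return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model(inputs)
# move to CPU float32 so BertViz can process the tensors
attention = tuple(a.to("cpu", dtype=torch.float32) for a in outputs.attentions)
tokens = tokenizer.convert_ids_to_tokens(inputs[0])

html = model_view(attention, tokens, html_action="return")
with open("llama3_model_view.html", "w") as f:
    f.write(html.data)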


iBibek commented on August 24, 2024

@Bachstelze, can you please clarify the part where you said:

> Most of the time, the Llama 3 model sinks almost all of its attention into the begin-of-text token.

