ericwtodd / function_vectors

Function Vectors in Large Language Models (ICLR 2024)

Home Page: https://functions.baulab.info/

Python 83.17% Jupyter Notebook 16.16% Shell 0.67%

function_vectors's Issues

Get the same clean_nll and intervention_nll when running n_shot_eval with compute_nll=True in eval_utils.py

Thanks for the great work!

I get the same clean_nll and intervention_nll when I run n_shot_eval with compute_nll=True in eval_utils.py.

I think it's because intervention_fv sets a wrong idx in add_function_vector for the case of compute_nll=True in intervention_utils.py. In this case, the input of the model is nll_inputs, which contains target. The intervention should no longer be applied to the last token of the model input (nll_inputs) as the other cases do, but to the last token of the original sentence (inputs).

function_vectors/src/utils/intervention_utils.py

Lines 164 to 169 in 54ec3bf

    
    intervention_fn = add_function_vector(edit_layer, function_vector.reshape(1, model_config['resid_dim']), model.device)
    with TraceDict(model, layers=model_config['layer_hook_names'], edit_output=intervention_fn):
        if compute_nll:
            output = model(**nll_inputs, labels=nll_targets)
            intervention_nll = output.loss.item()
            intervention_output = output.logits[:,original_pred_idx,:]

What I think should fix the bug is replacing

    intervention_fn = add_function_vector(edit_layer, function_vector.reshape(1, model_config['resid_dim']), model.device)

with

    if compute_nll:
        idx = -1 - target_len
    else:
        idx = -1
    intervention_fn = add_function_vector(edit_layer, function_vector.reshape(1, model_config['resid_dim']), model.device, idx=idx)

Does this make sense to you? It would also be helpful if you could fix any other code that may be affected by this bug. Thanks!
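The index arithmetic behind the fix proposed above can be checked on plain lists. This is a minimal sketch, not the repository's code: `intervention_index` is a hypothetical helper, the token ids are made up, and it assumes `nll_inputs` is the prompt tokens followed directly by `target_len` target tokens.

```python
def intervention_index(compute_nll: bool, target_len: int) -> int:
    """Negative index of the position that should receive the function vector.

    Hypothetical helper for illustration; it mirrors the if/else in the
    proposed fix and is not part of the repository.
    """
    if compute_nll:
        # The model input is prompt + target, so step back over the target tokens.
        return -1 - target_len
    # The model input is the prompt alone, so the last position is the right one.
    return -1


# Made-up token ids, for illustration only.
prompt_ids = [11, 12, 13, 14]        # stands in for the tokenized `inputs`
target_ids = [21, 22]                # stands in for the tokenized target
nll_ids = prompt_ids + target_ids    # stands in for `nll_inputs`

idx = intervention_index(compute_nll=True, target_len=len(target_ids))
last_prompt_token = nll_ids[idx]     # token at the position the fix would edit
default_token = nll_ids[-1]          # token at the position idx=-1 would edit
print(idx, last_prompt_token, default_token)
```

Under these assumptions, `idx=-1` lands on the final target token, while `-1 - target_len` lands on the last prompt token. In a causal model an edit at the final position can only change the prediction for the token after the target, which is not part of the NLL, so this would be consistent with `clean_nll` and `intervention_nll` coming out equal.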

Clarification on code for section 3.1 testing the robustness against different input formats

I saw in section 3.1 and appendix B (Table 8) of your paper that you experimented with different input formats to test robustness. Did you do that by changing the --prefixes argument for src/portability_eval.py?

Where did "top_heads" come from in src/utils/extract_utils.py?

I am assuming that you used causal mediation to identify the attention heads, but do you have dedicated scripts to do that? Thanks in advance for your help!

Great work!

Great work! Really enjoyed reading the paper & already have some followup ideas.

Thanks for publishing your code & data!

Best,
Chris

Getting all zero CIE matrix for GPT-NeoX

Thanks for the great work!
I'm trying to recompute function vectors for GPT-NeoX (and Pythia models using the same setup you have for GPT-NeoX). However, after running compute_indirect_effect.py, I get all zero CIE matrices (indirect_effect.pt).

I think this is because the ablation intervention has no effect on these models: the clean probabilities and the intervention probabilities are always equal, so their difference is always 0.

Other models I tried (GPT2, LLaMA) don't have this issue.

Do you know how to make the intervention work for GPT-NeoX models? Thanks!

sentence_eval last token logits

In the sentence_eval function in eval_utils.py (lines 138-189), why are the logits of the last token in sentence (the ICL prompt) taken, and not the logits of the last token in target_completion (which consists of the sentence + target)? The relevant lines are below for reference:

Line 157: inputs = tokenizer(sentence, return_tensors='pt').to(device)

Line 158: original_pred_idx = len(inputs.input_ids.squeeze()) - 1

Line 170: clean_output = output.logits[:,original_pred_idx,:]

Line 181: clean_output = model(**inputs).logits[:,-1,:]
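One possible reading of the lines quoted above, assuming a standard causal language model in which the output at position i depends only on tokens 0..i and scores the token at position i+1: the output at `original_pred_idx` is the model's prediction for the first target token, and it is the same whether or not the target is appended to the input. The sketch below illustrates only that indexing property, using a toy stand-in (`toy_causal_outputs`, a running sum) rather than a real model. All names and values are hypothetical.

```python
from itertools import accumulate


def toy_causal_outputs(token_ids):
    """Toy stand-in for a causal LM: output i depends only on tokens 0..i."""
    return list(accumulate(token_ids))


# Made-up token ids, for illustration only.
prompt_ids = [3, 1, 4]                    # stands in for the tokenized sentence
target_ids = [1, 5]                       # stands in for the tokenized target
original_pred_idx = len(prompt_ids) - 1   # last prompt position, as on line 158

prompt_only = toy_causal_outputs(prompt_ids)
with_target = toy_causal_outputs(prompt_ids + target_ids)

# The output at the last prompt position is unchanged by the appended target.
same_position_matches = prompt_only[-1] == with_target[original_pred_idx]
# The output at the final position of prompt + target is a different quantity:
# in a real model it would score whatever token follows the target.
final_position_differs = with_target[-1] != with_target[original_pred_idx]
print(same_position_matches, final_position_differs)
```

If this reading holds, lines 170 and 181 select the same prediction by two routes: indexing `original_pred_idx` when the input includes the target, and indexing `-1` when the input is the prompt alone.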

Could you provide code in Sec 3.2?

Hello, could you provide the code for Sec 3.2 that implements the experiment in Table 6? Thank you so much.
