
function_vectors's People

Contributors

ericwtodd

function_vectors's Issues

Get the same clean_nll and intervention_nll when running n_shot_eval with compute_nll=True in eval_utils.py

Thanks for the great work!

I get the same clean_nll and intervention_nll when I run n_shot_eval with compute_nll=True in eval_utils.py.

I think this is because, when compute_nll=True, intervention_fv in intervention_utils.py uses the wrong idx in add_function_vector. In that case the model input is nll_inputs, which also contains the target. The intervention should therefore no longer be applied to the last token of the model input (nll_inputs), as it is in the other cases, but to the last token of the original sentence (inputs).

    intervention_fn = add_function_vector(edit_layer, function_vector.reshape(1, model_config['resid_dim']), model.device)
    with TraceDict(model, layers=model_config['layer_hook_names'], edit_output=intervention_fn):
        if compute_nll:
            output = model(**nll_inputs, labels=nll_targets)
            intervention_nll = output.loss.item()
            intervention_output = output.logits[:,original_pred_idx,:]

I think the bug can be fixed by replacing

    intervention_fn = add_function_vector(edit_layer, function_vector.reshape(1, model_config['resid_dim']), model.device)

with

    if compute_nll:
        # nll_inputs = sentence + target, so step back over the target tokens
        idx = -1 - target_len
    else:
        idx = -1
    intervention_fn = add_function_vector(edit_layer, function_vector.reshape(1, model_config['resid_dim']), model.device, idx=idx)
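As a quick check of the index arithmetic in the proposed fix, the sketch below (plain Python, hypothetical helper names) converts each negative idx into an absolute token position. It assumes the prompt's tokens form an exact prefix of the tokenized prompt + target, so that nll_inputs is exactly target_len tokens longer than inputs; some tokenizers merge tokens across that boundary, in which case the offset would have to be measured from the actual token ids.

```python
def edit_index(compute_nll, target_len):
    """Negative token index for the intervention, following the proposed fix."""
    return -1 - target_len if compute_nll else -1


def absolute_position(neg_idx, seq_len):
    """Convert a negative (from-the-end) index into an absolute token position."""
    return seq_len + neg_idx


prompt_len, target_len = 7, 3          # hypothetical token counts
original_pred_idx = prompt_len - 1     # last prompt token

# compute_nll=True: the model sees prompt + target (10 tokens).
print(absolute_position(edit_index(True, target_len), prompt_len + target_len))  # 6
# compute_nll=False: the model sees only the prompt (7 tokens).
print(absolute_position(edit_index(False, target_len), prompt_len))  # 6
```

Both cases land on original_pred_idx (6 here), the last token of the original sentence.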

Does this make sense to you? It would also help if you could fix any other code that may be affected by this bug. Thanks!

Great work!

Great work! Really enjoyed reading the paper & already have some follow-up ideas.

Thanks for publishing your code & data!

Best,
Chris

Getting all zero CIE matrix for GPT-NeoX

Thanks for the great work!
I'm trying to recompute function vectors for GPT-NeoX (and for Pythia models, using the same setup you have for GPT-NeoX). However, after running compute_indirect_effect.py, I get all-zero CIE matrices (indirect_effect.pt).

I think this is because the ablation intervention doesn't work on these models: the clean probs and intervention probs are always equal, which is why the difference is always 0.
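To see why a hook that never fires yields exactly zero, here is a minimal sketch. It assumes the indirect effect of head (l, h) is the average, over prompts, of the patched probability of the correct token minus the unpatched ("clean") probability. The function names are hypothetical and not taken from compute_indirect_effect.py. If patching leaves the forward pass unchanged, every difference cancels and the whole matrix is zero; a check like intervention_is_noop could flag this on a handful of prompts before running the full sweep.

```python
def indirect_effect_matrix(clean_probs, intervention_probs):
    """Mean indirect effect for every (layer, head).

    clean_probs[i]: probability of the correct token for prompt i, no patching.
    intervention_probs[l][h][i]: the same probability after patching head h of layer l.
    """
    n = len(clean_probs)
    return [
        [sum(p - c for p, c in zip(head_probs, clean_probs)) / n for head_probs in layer]
        for layer in intervention_probs
    ]


def intervention_is_noop(clean_probs, intervention_probs, tol=0.0):
    """True if no patched run differs from the clean run by more than tol."""
    return all(
        abs(p - c) <= tol
        for layer in intervention_probs
        for head_probs in layer
        for p, c in zip(head_probs, clean_probs)
    )


clean = [0.2, 0.5]
# A hook that never fires reproduces the clean probabilities for every head.
noop = [[clean, clean], [clean, clean]]
print(indirect_effect_matrix(clean, noop))  # [[0.0, 0.0], [0.0, 0.0]]
print(intervention_is_noop(clean, noop))    # True
```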

Other models I tried (GPT-2, LLaMA) don't have this issue.

Do you know how to make the intervention work for GPT-NeoX models? Thanks!

sentence_eval last token logits

In the sentence_eval function in eval_utils.py (lines 138-189), why are the logits taken at the last token of sentence (the ICL prompt) rather than at the last token of target_completion (which consists of sentence + target)? I have attached the relevant lines below for reference.

Line 157: inputs = tokenizer(sentence, return_tensors='pt').to(device)

Line 158: original_pred_idx = len(inputs.input_ids.squeeze()) - 1

Line 170: clean_output = output.logits[:,original_pred_idx,:]

Line 181: clean_output = model(**inputs).logits[:,-1,:]
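One likely reason for this choice is the standard causal-LM convention, assuming (as for Hugging Face causal LMs) that the logits at position t depend only on tokens up to t and score the next token t + 1. The logits at the last prompt token are then the model's prediction for the first target token, while the logits at the last token of sentence + target would predict whatever comes after the target. The toy below (hypothetical, not repo code) checks that indexing a full-sequence run at original_pred_idx gives the same logits as taking [-1] from a prompt-only run.

```python
def toy_causal_logits(tokens, vocab_size=10):
    """Toy causal LM: logits at position t depend only on tokens[: t + 1]
    and are read as the prediction for token t + 1."""
    logits, running = [], 0
    for tok in tokens:
        running = running * 10 + tok  # summary of the prefix seen so far
        logits.append([running * (v + 1) for v in range(vocab_size)])
    return logits


prompt = [3, 1, 4, 1]       # hypothetical token ids for `sentence`
target = [5, 9]             # hypothetical token ids for the target
original_pred_idx = len(prompt) - 1

full_logits = toy_causal_logits(prompt + target)   # like the compute_nll run
prompt_logits = toy_causal_logits(prompt)          # like model(**inputs)

# Indexing the full run at original_pred_idx (cf. line 170) matches
# the last position of the prompt-only run (cf. line 181).
print(full_logits[original_pred_idx] == prompt_logits[-1])  # True
# The final position of the full run predicts the token after the target.
print(full_logits[-1] == prompt_logits[-1])  # False
```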