Comments (26)
If there are no bugs, what hyperparameters in EasyEdit do you think might affect performance?
You can try setting sequential_edit to False. Sequentially editing with ROME and MEMIT does perform very poorly; many papers have noted this.
https://arxiv.org/abs/2211.11031
https://arxiv.org/abs/2405.14768
https://arxiv.org/abs/2402.10987
https://arxiv.org/abs/2403.05330
from easyedit.
Thanks very much. We will check this hyperparameter.
(screenshot attached in the original comment)
from easyedit.
Hi, this file was updated recently; we will check the code asap.
from easyedit.
Hello, I will carefully check it tomorrow. If it's convenient for you, could you please explain the significance of your modification? I would be very grateful.
from easyedit.
Recently, I have also been working on knowledge editing.
I heard about EasyEdit from friends and wanted to borrow this tool's evaluation code for my approach.
When I used the evaluation code last night, I found that the results produced by slice_list
were not always the expected results,
i.e., when there is padding, the results are incorrect.
For example:
prompt_target: On which continent is Abrit Nunatak located? South America <|endoftext|>
Input_ids: [ 818, 644, 1175, 750, 3271, 1375, 335, 5557, 1907, 287, 30, 2159, 1810, 314, 50256]
In that batch of data, padding_size = 1 for the current case.
here:
num_prompt_toks = 11 ([818, 644, 1175, 750, 3271, 1375, 335, 5557, 1907, 287, 30])
num_pad_toks = 1 ([50256])
prompt_len = 12
slice result for label:
return [row[start_index:] for row, start_index in zip(matrix, start_indices)]
[1810, 314, 50256]
due to: [0:818, 1:644, 2:1175, 3:750, 4:3271, 5:1375, 6:335, 7:5557, 8:1907, 9:287, 10:30, 11:2159, 12:1810, 13:314, 14:50256]
expected results: [2159, 1810, 314]
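The mismatch above can be reproduced with a minimal sketch; the slice_list helper below mirrors the quoted one-liner, and all the numbers come from the example:

```python
# Minimal reproduction of the padding off-by-one described above.

def slice_list(matrix, start_indices):
    # original behavior: take everything after start_index
    return [row[start_index:] for row, start_index in zip(matrix, start_indices)]

input_ids = [818, 644, 1175, 750, 3271, 1375, 335, 5557, 1907, 287, 30,
             2159, 1810, 314, 50256]
num_prompt_toks = 11   # question tokens
num_pad_toks = 1       # trailing <|endoftext|> used as padding
num_answer_toks = 3    # answer tokens

prompt_len = num_prompt_toks + num_pad_toks          # 12: the pad is counted as prompt
buggy = slice_list([input_ids], [prompt_len])[0]
print(buggy)      # [1810, 314, 50256] — drops the first answer token, keeps the pad

expected = input_ids[num_prompt_toks:num_prompt_toks + num_answer_toks]
print(expected)   # [2159, 1810, 314]
```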
The motivation behind the above modification (mentioned last night):
the prompt_target input_ids can be treated as [prompt_len, answer_len, pad_len],
so we can slice the labels by row[start_index[0]-1 : start_index[0]+start_index[1]-1],
where start_index[0] and start_index[1] store the prompt_len and answer_len.
These two can be calculated by:
prompt_len = [(x, y) for x, y in zip(num_prompt_toks, num_answer_toks)]
It is the same for the slice of answer.
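If I read the proposal right, it could be sketched like this (I drop the -1 shift from the formula above and slice the raw input_ids directly, so the first tuple element is num_prompt_toks; the shift would reappear when slicing shifted logits/labels):

```python
# Sketch of the proposed fix, assuming start_indices holds (prompt_len, answer_len)
# pairs as produced by the zip above. Slicing an explicit answer window means
# trailing pad tokens are never included in the labels.

def slice_labels(matrix, start_indices):
    # p: number of prompt tokens, a: number of answer tokens
    return [row[p:p + a] for row, (p, a) in zip(matrix, start_indices)]

input_ids = [818, 644, 1175, 750, 3271, 1375, 335, 5557, 1907, 287, 30,
             2159, 1810, 314, 50256]
labels = slice_labels([input_ids], [(11, 3)])[0]
print(labels)   # [2159, 1810, 314] — the pad token is no longer included
```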
I hope I have not misunderstood your code.
from easyedit.
I attempted to reproduce your issue, and it is indeed a complex case.
- The model you are editing should be from the GPT-2 series. My suggestion is to remove <|endoftext|>. This way, no pad tokens will interfere with the evaluation. The original tokenizer has no pad_token_id, so most implementations assign the eos_token_id to pad_token_id. This results in an additional num_pad_toks.
- Since modifying the evaluation has a huge impact, we provide only minimal changes to this module. You can try my suggestion to remove the <|endoftext|> (like prompt_target: On which continent is Abrit Nunatak located? South America), which will not affect your editing experiments.
Please give it a try, and I hope everything goes smoothly. Thank you for your valuable feedback.
from easyedit.
Hi pengzju:
Thanks for your suggestions.
Actually, I revised this evaluation code last night for my approach, and it works well.
However, if the editing is performed on a batch of examples, the pad_token_id must be specified (as you mentioned).
So, cases with shorter length will automatically be padded with eos_token (if the eos_token_id is assigned to pad_token_id), causing an incorrect slice of labels for these cases.
The example I gave above is such a case; I merely listed one sentence from the batch for a clearer explanation.
Take a more extreme example (feeding a batch of data):
[prompt_target_1, prompt_target_2, prompt_target_3, ...]
If prompt_target_1 is: On which continent is Abrit Nunatak located? South America <|endoftext|> <|endoftext|> <|endoftext|>
(these eos_tokens are added automatically since editing is performed on batch data),
Input_ids: [ 818, 644, 1175, 750, 3271, 1375, 335, 5557, 1907, 287, 30, 2159, 1810, 314, 50256, 50256, 50256]
the slice result for the label will be [50256, 50256, 50256], due to prompt_len = 11 + 3 = 14:
[0:818, 1:644, 2:1175, 3:750, 4:3271, 5:1375, 6:335, 7:5557, 8:1907, 9:287, 10:30, 11:2159, 12:1810, 13:314, 14:50256, 15:50256, 16:50256]
original code: return [row[start_index:] for row, start_index in zip(matrix, start_indices)]
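The same arithmetic, spelled out as a small sketch of the three-pad case above:

```python
# With three pads, every "label" returned by the original slice is a pad token,
# so the answer is lost entirely. Numbers are from the example above.

input_ids = [818, 644, 1175, 750, 3271, 1375, 335, 5557, 1907, 287, 30,
             2159, 1810, 314, 50256, 50256, 50256]
num_prompt_toks, num_pad_toks, num_answer_toks = 11, 3, 3

prompt_len = num_prompt_toks + num_pad_toks   # 14
buggy = input_ids[prompt_len:]                # [50256, 50256, 50256]
good = input_ids[num_prompt_toks:num_prompt_toks + num_answer_toks]
print(buggy)   # [50256, 50256, 50256]
print(good)    # [2159, 1810, 314]
```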
what do you think?
from easyedit.
- I understand your point. Regardless, in the scenario of evaluating batch edits, we cannot avoid adding pad_token; this will lead to the issue you mentioned. However, in the editor we split the batch_size=n evaluation into n evaluations of batch_size=1. This means that the current EasyEdit evaluation module should be bug-free.
- I also find your suggestion very meaningful, but based on the discussion in #302, shouldn't we avoid using tuples for input (e.g., prompt_len = [(x, y) for x, y in zip(num_prompt_toks, num_answer_toks)])? Instead, we should use text_a + ' ' + text_b. I'm not sure whether the code you provided aligns with this concatenation format. Could you please provide a new code example (evaluating text_a + ' ' + text_b)? I will modify the corresponding code based on your suggestion.
Thanks again.
from easyedit.
Additionally, I believe your code will only work when padding_side='right'. Consider the case where padding_side is set to 'left' and batch editing is performed: the tokenization of target_new will produce many pad tokens on the left side. These should not be treated as answers or labels; they should be ignored.
However, your code includes them as part of the ACC calculation.
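One way to make that concrete (a sketch, not EasyEdit's code; -100 is the conventional ignore index for PyTorch's cross-entropy loss):

```python
# Sketch: with padding_side='left', target_new tokenizes to pads first. Any
# position-based slice from the left picks up pads as labels; masking them
# to -100 (the usual ignore index) keeps them out of the accuracy/loss.

PAD_ID = 50256
target_ids = [PAD_ID, PAD_ID, 2159, 1810, 314]   # left-padded answer tokens

naive_labels = target_ids[:3]    # [50256, 50256, 2159] — pads counted as answers
masked_labels = [-100 if t == PAD_ID else t for t in target_ids]
print(masked_labels)             # [-100, -100, 2159, 1810, 314]
```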
from easyedit.
I totally agree with you: "If we split the evaluation into n evaluations of batch_size=1, the evaluation should be bug-free."
The solution I mentioned above only works well with the setting of padding_side='right'.
from easyedit.
Therefore, batch evaluation is a very complex problem, because different editing methods use different padding_side settings, so we can only unify them into a single evaluation (bs=1).
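That unification can be sketched as a plain loop (names are illustrative, not EasyEdit's actual API):

```python
# Sketch: run each example alone (bs=1) so no pad tokens are ever added,
# then average the per-example accuracies.

def evaluate_sequentially(examples, evaluate_one):
    accs = [evaluate_one(ex) for ex in examples]
    return sum(accs) / len(accs)

# toy stand-in for a real per-example evaluator
acc = evaluate_sequentially([1, 0, 1, 1], lambda ex: float(ex))
print(acc)   # 0.75
```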
from easyedit.
It is a good choice for such complex situations.
from easyedit.
Thank you very much for the discussion. Actually, the solution you provided, num_answer_toks = [len(i) for i in tok(targets).input_ids], is also bug-free when bs=1. However, I prefer to keep the original code, and I hope you understand why I separated all batch evaluations. I wish you success with your experiments.
from easyedit.
Thanks.
from easyedit.
One last question:
Is it unreasonable to set padding_side='left'? Because the prompt should be next to the target in the generation scenario you mentioned:
(left) prompt ... padding ... target
vs. (right) prompt target ... padding ...
In the editing phase, does padding_side='left' affect model editing performance?
from easyedit.
My friends tested ROME and MEMIT on the ZsRE dataset (full data: 19086 cases) using EasyEdit, and the performance is very poor.
(GPT2-1.5B version)
ROME: {'pre': {'rewrite_acc': 0.20420484207422276, 'rephrase_acc': 0.198043279039821}, 'post': {'rewrite_acc': 0.056103763416715316, 'rephrase_acc': 0.047820100830160556, 'locality': {'neighborhood_acc': 0.005416384270414259}}}
MEMIT: {'pre': {'rewrite_acc': 0.20420484207422276, 'rephrase_acc': 0.198043279039821}, 'post': {'rewrite_acc': 0.14768535151669387, 'rephrase_acc': 0.12228780976030271, 'locality': {'neighborhood_acc': 0.07205569589870872}}}
If there are no bugs, what hyperparameters in EasyEdit do you think might affect performance?
Thanks in advance.
from easyedit.
One last question: Is it unreasonable to set padding_side='left'? Because the prompt should be next to the target in the generation scenario you mentioned:
(left) prompt ... padding ... target
vs. (right) prompt target ... padding ...
In the editing phase, does padding_side='left' affect model editing performance?
As far as I know, many autoregressive models use padding_side='left' during the pretraining stage (https://zhuanlan.zhihu.com/p/646852375).
I think that, in theory, it will not affect model performance. In practice, though, model editing does have side effects that hurt general capability; there is a lot of related literature you can look at (https://arxiv.org/abs/2401.07453).
from easyedit.
If there are no bugs, what hyperparameters in EasyEdit do you think might affect performance?
You can try setting sequential_edit to False. Sequentially editing with ROME and MEMIT does perform very poorly; many papers have noted this.
from easyedit.
Thanks very much. We will check this hyperparameter.
from easyedit.
If there are no bugs, what hyperparameters in EasyEdit do you think might affect performance?
You can try setting sequential_edit to False. Sequentially editing with ROME and MEMIT does perform very poorly; many papers have noted this.
We checked the sequential_edit setting; it is already set to False by default in EasyEdit, so our test results did not show any improvement. Are there other critical parameters that might influence the final editing performance?
Thank you for your assistance!
from easyedit.
According to EasyEdit's default parameters, the editing effect should not be too poor. Could you provide more detailed information so that we can better assist you?
from easyedit.
Hi, could you please provide more details? Have you solved your issue yet?
from easyedit.
According to EasyEdit's default parameters, the editing effect should not be too poor. Could you provide more detailed information so that we can better assist you?
Using the configuration editor=ROME with base model gpt2-xl as an example, these two images show the detailed parameter settings.
from easyedit.
Delete the code:
https://github.com/zjunlp/EasyEdit/blob/main/easyeditor/models/rome/rome_main.py#L56
https://github.com/zjunlp/EasyEdit/blob/main/easyeditor/models/rome/rome_main.py#L57
and try it again
from easyedit.
keep_original_weight will be deprecated; you can ignore this param. I will fix this issue asap.
from easyedit.
Do you have any further questions? @YuxinZhangGit
from easyedit.