Finding a Bug about easyedit HOT 26 CLOSED

SonglinZhai commented on August 16, 2024

Finding a Bug

from easyedit.

Comments (26)

pengzju commented on August 16, 2024 1

If there are no bugs, what hyperparameters in EasyEdit do you think might affect performance?

You could try setting sequential_edit to False. ROME and MEMIT do perform poorly under repeated (sequential) edits, as much of the literature has noted.

https://arxiv.org/abs/2211.11031
https://arxiv.org/abs/2405.14768
https://arxiv.org/abs/2402.10987
https://arxiv.org/abs/2403.05330

zxlzr commented on August 16, 2024

Hi, this file was updated recently; we will check the code asap.

XeeKee commented on August 16, 2024

Hello, I will carefully check it tomorrow. If it's convenient for you, could you please explain the significance of your modification? I would be very grateful.

SonglinZhai commented on August 16, 2024

I have recently been working on knowledge editing as well.
I heard about EasyEdit from friends and wanted to reuse its evaluation code in my approach.
When I used the evaluation code last night, I found that slice_list does not always produce the expected results.
Specifically, when the input is padded, the results are incorrect.

For example:

prompt_target: On which continent is Abrit Nunatak located? South America <|endoftext|>
Input_ids: [ 818, 644, 1175, 750, 3271, 1375, 335, 5557, 1907, 287, 30, 2159, 1810, 314, 50256]
In this batch, padding_size = 1 for the current case.

Here:

num_prompt_toks = 11 ([818, 644, 1175, 750, 3271, 1375, 335, 5557, 1907, 287, 30])
num_pad_toks = 1 ([50256])
prompt_len = num_prompt_toks + num_pad_toks = 12

The label slice is computed by:
return [row[start_index:] for row, start_index in zip(matrix, start_indices)]
which returns row[12:] = [1810, 314, 50256], because the indexed input_ids are:
[0:818, 1:644, 2:1175, 3:750, 4:3271, 5:1375, 6:335, 7:5557, 8:1907, 9:287, 10:30, 11:2159, 12:1810, 13:314, 14:50256]
Expected result: [2159, 1810, 314]

The motivation behind the modification I mentioned last night:

The prompt_target input_ids can be treated as [prompt_len, answer_len, pad_len],
so we can slice the labels with row[start_index[0]-1:start_index[0]+start_index[1]-1],
where start_index[0] and start_index[1] store prompt_len and answer_len.
These two can be computed by:
prompt_len = [(x,y) for x,y in zip(num_prompt_toks, num_answer_toks)]

The same applies to the answer slice.
I hope I have not misunderstood your code.
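
A minimal sketch of the slicing issue described above, using the token ids from the example. It is not EasyEdit's actual code: slice_labels_original mirrors only the quoted label branch, and slice_labels_padding_aware is a hypothetical helper that assumes right padding and slices the labels directly (without the -1 shift in the expression above).

```python
# Token ids from the example above: 11 prompt tokens, 3 answer tokens, 1 pad.
EOS = 50256  # GPT-2 <|endoftext|>, reused as the pad token in this example
row = [818, 644, 1175, 750, 3271, 1375, 335, 5557, 1907, 287, 30,
       2159, 1810, 314, EOS]

num_prompt_toks = 11
num_answer_toks = 3
num_pad_toks = 1


def slice_labels_original(matrix, start_indices):
    """Label branch quoted above: keep everything from start_index onward."""
    return [r[s:] for r, s in zip(matrix, start_indices)]


def slice_labels_padding_aware(matrix, spans):
    """Hypothetical variant: spans holds (prompt_len, answer_len) per row.

    Assumes right padding and labels that are not shifted by one position.
    """
    return [r[p:p + a] for r, (p, a) in zip(matrix, spans)]


# prompt_len as computed in the example: prompt tokens plus pad tokens.
original = slice_labels_original([row], [num_prompt_toks + num_pad_toks])
padding_aware = slice_labels_padding_aware(
    [row], [(num_prompt_toks, num_answer_toks)]
)

print(original)       # [[1810, 314, 50256]]
print(padding_aware)  # [[2159, 1810, 314]]
```

Passing the answer length explicitly means the pad count never enters the slice bounds, which is why the trailing pad token drops out of the labels.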

pengzju commented on August 16, 2024

I attempted to reproduce your issue, and it is indeed a complex case.

The model you are editing appears to be from the GPT-2 series. My suggestion is to remove <|endoftext|> so that no pad tokens interfere with the evaluation. The original tokenizer has no pad_token_id, so most implementations assign eos_token_id to pad_token_id. This produces an additional num_pad_toks.
Since modifying the evaluation has a large impact, we make only minimal changes to this module. You can try removing <|endoftext|> (e.g., prompt_target: On which continent is Abrit Nunatak located? South America); this will not affect your editing experiments.

Please give it a try, and I hope everything goes smoothly. Thank you for your valuable feedback.

SonglinZhai commented on August 16, 2024

Hi pengzju,
Thanks for your suggestions.
I actually revised this evaluation code last night for my approach, and it works well.

However, if editing is performed on a batch of examples, pad_token_id must be specified (as you mentioned).
Shorter examples are then automatically padded with eos_token (when eos_token_id is assigned to pad_token_id), which causes the labels for those examples to be sliced incorrectly.

The example I gave above is such a case; I listed only one sentence from the batch for a clearer explanation.

Take a more extreme example (feeding a batch of data):

[prompt_target_1, prompt_target_2, prompt_target_3 ...]
Suppose prompt_target_1 is: On which continent is Abrit Nunatak located? South America <|endoftext|> <|endoftext|> <|endoftext|>
(these eos tokens are added automatically because editing is performed on batched data)
Input_ids: [ 818, 644, 1175, 750, 3271, 1375, 335, 5557, 1907, 287, 30, 2159, 1810, 314, 50256, 50256, 50256]
The label slice will then be [50256, 50256, 50256], because prompt_len = 11 + 3 = 14:
[0:818, 1:644, 2:1175, 3:750, 4:3271, 5:1375, 6:335, 7:5557, 8:1907, 9:287, 10:30, 11:2159, 12:1810, 13:314, 14:50256, 15:50256, 16:50256]
original code: return [row[start_index:] for row, start_index in zip(matrix, start_indices)]

What do you think?

pengzju commented on August 16, 2024

I understand your point. Regardless, in the scenario of evaluating batch edits, we cannot avoid adding pad_token. This will lead to the issue you mentioned. However, in the editor, we split the batch_size=n evaluation into n evaluations of batch_size=1. This means that the current EasyEdit Evaluation Module should be bug-free.
I also find your suggestion very meaningful, but based on the discussion in #302, shouldn't we avoid using tuples for input (e.g., prompt_len = [(x,y) for x,y in zip(num_prompt_toks, num_answer_toks)])? Instead, we should use text_a + ' ' + text_b. I'm not sure if the code you provided aligns with this concatenation format. Could you please provide a new code example (evaluating text_a + ' ' + text_b)? I will modify the corresponding code based on your suggestion.

Thanks again.
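
A rough sketch of the batch_size=1 idea described above: each text_a + ' ' + text_b string is tokenized on its own, so no padding is ever introduced. toy_tokenize is a hypothetical whitespace tokenizer standing in for a real one, and the second example pair is made up for illustration. A real subword tokenizer may split text differently at the prompt/target boundary, so this only illustrates the control flow.

```python
def toy_tokenize(text):
    """Hypothetical stand-in for a real tokenizer: one token per word."""
    return text.split()


def label_tokens(prompt, target):
    """Tokenize one `prompt + ' ' + target` string by itself (batch size 1).

    With a single sequence there is no padding, so everything after the
    prompt tokens belongs to the label.
    """
    prompt_len = len(toy_tokenize(prompt))
    full = toy_tokenize(prompt + " " + target)
    return full[prompt_len:]


batch = [
    ("On which continent is Abrit Nunatak located?", "South America"),
    ("What is the capital of France?", "Paris"),  # made-up second example
]

# Evaluate the batch as n independent single-example evaluations.
labels = [label_tokens(prompt, target) for prompt, target in batch]
print(labels)  # [['South', 'America'], ['Paris']]
```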

pengzju commented on August 16, 2024

Additionally, I believe your code will only work when padding_side='right'. If padding_side is set to left and batch editing is performed, tokenizing target_new produces many pad tokens on the left side. These should be ignored rather than treated as answers or labels.

However, your code includes them in the ACC calculation.
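
The sketch below illustrates why offsets derived for right padding do not carry over to left padding. It reuses the token ids from the earlier example. slice_by_mask is a hypothetical alternative, not something EasyEdit provides, and it assumes an attention mask (1 for real tokens, 0 for padding) is available for each row.

```python
PAD = 50256  # eos token reused as the pad token, as in the examples above
prompt = [818, 644, 1175, 750, 3271, 1375, 335, 5557, 1907, 287, 30]
answer = [2159, 1810, 314]
pads = [PAD] * 3

right_padded = prompt + answer + pads
left_padded = pads + prompt + answer
right_mask = [1] * (len(prompt) + len(answer)) + [0] * len(pads)
left_mask = [0] * len(pads) + [1] * (len(prompt) + len(answer))


def slice_by_offsets(row, prompt_len, answer_len):
    """Offset-based slice; assumes the real tokens start at index 0."""
    return row[prompt_len:prompt_len + answer_len]


def slice_by_mask(row, mask, answer_len):
    """Hypothetical alternative: drop padded positions, keep the last answer_len tokens."""
    real = [tok for tok, m in zip(row, mask) if m == 1]
    return real[-answer_len:]


# Offsets work for right padding but select the wrong span for left padding.
print(slice_by_offsets(right_padded, len(prompt), len(answer)))  # [2159, 1810, 314]
print(slice_by_offsets(left_padded, len(prompt), len(answer)))   # [1907, 287, 30]

# The mask-based slice returns the answer tokens in both layouts.
print(slice_by_mask(right_padded, right_mask, len(answer)))  # [2159, 1810, 314]
print(slice_by_mask(left_padded, left_mask, len(answer)))    # [2159, 1810, 314]
```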

SonglinZhai commented on August 16, 2024

I totally agree with you: if the evaluation is split into n evaluations of batch_size=1, it should be bug-free.
The solution I mentioned above only works with padding_side='right'.

pengzju commented on August 16, 2024

Batch evaluation is therefore a very complex problem: different editing methods use different padding_side settings, so we can only unify them into single-example evaluation (bs = 1).

SonglinZhai commented on August 16, 2024

That is a good choice for such complex situations.

pengzju commented on August 16, 2024

Thank you very much for the discussion. Actually, the solution you provided, num_answer_toks = [len(i) for i in tok(targets).input_ids], is also bug-free when bs=1. However, I prefer to keep the original code and hope you understand why I separated all batch evaluations. I wish you success with your experiments.

SonglinZhai commented on August 16, 2024

Thanks.

SonglinZhai commented on August 16, 2024

One last question:
Is it unreasonable to set padding_side = left? In the generation scenario you mentioned, the prompt should be directly next to the target.

(left) prompt ... padding ... target VS (right) prompt target ... padding ...

Also, does padding_side = left affect model editing performance during the editing phase?

SonglinZhai commented on August 16, 2024

My friends tested ROME and MEMIT on the ZsRE dataset (full data: 19086 cases) using EasyEdit, and the performance was very poor.

(GPT2-1.5B version)

ROME: 'post': {'pre': {'rewrite_acc': 0.20420484207422276, 'rephrase_acc': 0.198043279039821}, 'post': {'rewrite_acc': 0.056103763416715316, 'rephrase_acc': 0.047820100830160556, 'locality': {'neighborhood_acc': 0.005416384270414259}}}

MEMIT: {'pre': {'rewrite_acc': 0.20420484207422276, 'rephrase_acc': 0.198043279039821}, 'post': {'rewrite_acc': 0.14768535151669387, 'rephrase_acc': 0.12228780976030271, 'locality': {'neighborhood_acc': 0.07205569589870872}}}

If there are no bugs, what hyperparameters in EasyEdit do you think might affect performance?
Thanks in advance.

pengzju commented on August 16, 2024

One last question: Is it unreasonable to set padding_side = left? In the generation scenario you mentioned, the prompt should be directly next to the target.

(left) prompt ... padding ... target VS (right) prompt target ... padding ...

Also, does padding_side = left affect model editing performance during the editing phase?

As far as I know, many autoregressive models use padding_side = left during pre-training (https://zhuanlan.zhihu.com/p/646852375).

In theory, I do not think it affects model performance. In practice, however, model editing has side effects that can degrade general capabilities. There is a lot of related literature you can look at (https://arxiv.org/abs/2401.07453).

pengzju commented on August 16, 2024

If there are no bugs, what hyperparameters in EasyEdit do you think might affect performance?

You could try setting sequential_edit to False. ROME and MEMIT do perform poorly under repeated (sequential) edits, as much of the literature has noted.

SonglinZhai commented on August 16, 2024

Thanks very much. We will check this hyperparameter.

YuxinZhangGit commented on August 16, 2024

If there are no bugs, what hyperparameters in EasyEdit do you think might affect performance?

You could try setting sequential_edit to False. ROME and MEMIT do perform poorly under repeated (sequential) edits, as much of the literature has noted.

We checked the sequential_edit setting, which is set to False by default in EasyEdit. Consequently, our test results did not show any improvement. Are there other critical parameters that might influence the final editing performance?

Thank you for your assistance!

XeeKee commented on August 16, 2024

According to EasyEdit's default parameters, the editing effect should not be too poor. Could you provide more detailed information so that we can better assist you?

zxlzr commented on August 16, 2024

Hi, could you please provide more details? Have you solved your issue yet?

YuxinZhangGit commented on August 16, 2024

According to EasyEdit's default parameters, the editing effect should not be too poor. Could you provide more detailed information so that we can better assist you?

Using the configuration of editor=ROME and base model=gpt2-xl as an example, these two images show the detailed information of the parameter settings.

pengzju commented on August 16, 2024

Delete the code:
https://github.com/zjunlp/EasyEdit/blob/main/easyeditor/models/rome/rome_main.py#L56
https://github.com/zjunlp/EasyEdit/blob/main/easyeditor/models/rome/rome_main.py#L57

and try it again

pengzju commented on August 16, 2024

keep_original_weight will be deprecated, so you can ignore this parameter. I will fix this issue asap.

pengzju commented on August 16, 2024

Do you have any further questions? @YuxinZhangGit
