
Comments (6)

Deno-V avatar Deno-V commented on July 20, 2024

I was just about to ask you for the details LOL

from heterfc.

cuizhiling avatar cuizhiling commented on July 20, 2024

I'm really sorry, but while running your code I hit another new problem. During evaluation in the middle of training, the following error appears:
.....................................................................................
.....................................................................................
Epoch 1 | Global step: 1580 Loss: 1.0206
Epoch 1 | Global step: 1590 Loss: 1.3909
--------- dev Evaluation Start ---------
Traceback (most recent call last):
File "train.py", line 253, in
strict_score, label_accuracy = evaluate(model, dev_dataloader, report_fine_grained_metric=True)
File "train.py", line 150, in evaluate
prob,_ = model(batch.edge_index, batch.edge_type, batch.x, batch.num_claim_evi,batch.attn_mask, batch.valid_idx,batch.evi_word_cnt,batch.concat_token_ids,batch.concat_attn_mask)
File "/opt/conda/envs/FC/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "Fact_Checking/Baseline/HeterFC-main/model.py", line 140, in forward
torch.max(per_claim_words[temp_sum:temp_sum+num_word],dim=0)[0]
IndexError: max(): Expected reduction dim 0 to have non-zero size.

After debugging, I found that when execution reaches model.py, line 140 ( torch.max(per_claim_words[temp_sum:temp_sum+num_word],dim=0)[0] ), some batches contain cases like this: the value of word_cnt[i] is [18, 40, 32, 6, 6, 6, 6, 7, 7, 7, 7, 7, 9, 7, 5, 8, 5, 5, 6, 13, 6, 24, 18, 21, 19, 25, 25, 21, 18, 0]. Because the word_cnt list contains a 0, when num_word takes that final 0, the slice per_claim_words[temp_sum:temp_sum+num_word] is empty, which triggers "IndexError: max(): Expected reduction dim 0 to have non-zero size." I tried simply skipping the evidence_reps.append step whenever word_cnt is 0, which seemed to fix this bug, but then other new errors appeared, for example this one:

Epoch 1 | Global step: 1590 Loss: 1.3625
--------- dev Evaluation Start ---------
Traceback (most recent call last):
File "Fact_Checking/Baseline/HeterFC-main/train.py", line 150, in evaluate
prob,_ = model(batch.edge_index, batch.edge_type, batch.x, batch.num_claim_evi,batch.attn_mask, batch.valid_idx,batch.evi_word_cnt,batch.concat_token_ids,batch.concat_attn_mask)
File "/opt/conda/envs/FC/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "Fact_Checking/Baseline/HeterFC-main/model.py", line 116, in forward
x = torch.concat(x,dim=0)
NotImplementedError: There were no tensor arguments to this function (e.g., you passed an empty list of Tensors), but no fallback function is registered for schema aten::_cat.

Debugging this one, I found a similar cause: when execution reaches model.py, line 116 ( x = torch.concat(x,dim=0) ), some batches occasionally end up with the variable x being empty.
Could you please take a look at these problems and help me figure out how to solve them? Or do you have any suggestions? Thank you very much, and sorry for the trouble!
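Both errors above can be reproduced in isolation. The following is a minimal, self-contained sketch (variable names are illustrative, not the repo's actual code): a zero num_word makes the slice empty so torch.max fails, and skipping every append leaves an empty list that torch.concat cannot handle.

```python
import torch

# 1) torch.max over an empty slice: num_word == 0 makes the slice zero-sized.
per_claim_words = torch.randn(5, 8)      # (total_words, hidden_dim), illustrative
temp_sum, num_word = 5, 0
empty_slice = per_claim_words[temp_sum:temp_sum + num_word]  # shape (0, 8)
try:
    torch.max(empty_slice, dim=0)        # analogue of model.py line 140
except IndexError as e:
    print("max error:", e)

# 2) If every evidence rep is skipped, the list stays empty and concat fails.
x = []                                   # no evidence reps collected
try:
    torch.concat(x, dim=0)               # analogue of model.py line 116
except (NotImplementedError, RuntimeError) as e:
    print("concat error:", e)
```

This suggests that skipping appends only pushes the problem downstream; the zero counts need to be prevented earlier, at preprocessing or retrieval time.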


Deno-V avatar Deno-V commented on July 20, 2024

This is getting tricky; you can contact me by email. Overall, though, I suspect the problem is in preprocessing.
I vaguely remember that while coding I considered what to do if wordcnt ever became 0, so I added this check in preprocess.py:
for i in wordcnt:
    if i == 0:
        print("Error!! in preprocess [wordcnt]")
However, I don't recall ever seeing that message printed. Did you see this error message when you ran the code?


Deno-V avatar Deno-V commented on July 20, 2024

Your word_cnt contains a 0, which caused the error. I just went and checked my own files; the checking code is as follows:

>>> import pickle; import torch; import transformers
>>> f = open('train_first_step_file.pkl', 'rb')
>>> temp = pickle.load(f); f.close()
>>> evi_word_cnt = temp['evi_word_cnt']; len(evi_word_cnt)
142582
>>> evi_word_cnt[0]
[5, 9, 9, 9, 7, 8, 11, 11, 11, 9]
>>> for i in range(len(evi_word_cnt)):
...     for j in range(len(evi_word_cnt[i])):
...         if evi_word_cnt[i][j] == 0:
...             print('Bad Case')
...

As you can see, my preprocessed file contains no wordcnt = 0 cases.
So I have the following guesses:

  1. The code I uploaded has a problem. Emmmm, admittedly it has been more than a year, so if that's the case it's a bit of a headache, but worst case I can send you my preprocessed files to use.
  2. Your retrieval results differ from mine. Perhaps some of the evidence you retrieved is empty, or contains only a few meaningless characters (e.g. spaces), so its wordcnt really is 0. In that case the retrieved evidence needs to be filtered; empty evidence should never be retrieved in the first place.
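The filtering step suggested in point 2 could look like the following sketch. The function name and input format are assumptions for illustration, not the repo's actual code:

```python
def filter_evidence(retrieved_evidence):
    """Drop evidence strings that are empty or contain only whitespace,
    so no downstream wordcnt can ever be 0."""
    return [e for e in retrieved_evidence if e and e.strip()]

# Usage: whitespace-only and empty strings are removed before preprocessing.
print(filter_evidence(["A real sentence.", "", "   ", "Another one."]))
# -> ['A real sentence.', 'Another one.']
```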


Deno-V avatar Deno-V commented on July 20, 2024

Line 74 of preprocess.py:
evidence_list.extend([r[0][0][1:] if len(r[0][0])>1 else ['No bias.'] for r in raw])
This line should already guarantee that evidence is never empty, so I find this very strange. Your debug output clearly shows that some evidence's wordcnt became 0, yet I did take measures against exactly this situation, and a warning is printed whenever it happens.
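For reference, here is a small sketch of what that line appears to do, under the assumption (inferred, not confirmed from the repo) that each r[0][0] is a token list whose first element is a title token:

```python
# Assumed structure: r[0][0] is a token list; the first token is a title marker.
raw = [
    ([["<title>", "word1", "word2"]],),   # normal evidence: drop the title token
    ([["<title>"]],),                      # no content tokens: fall back to a placeholder
]
evidence_list = []
evidence_list.extend([r[0][0][1:] if len(r[0][0]) > 1 else ['No bias.'] for r in raw])
print(evidence_list)  # -> [['word1', 'word2'], ['No bias.']]
```

If this reading is right, every entry in evidence_list has at least one token, so wordcnt should never reach 0 through this path.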

So I hope you can provide more details.
For example, did you see the warning message "Error!! in preprocess [wordcnt]" printed while running the code?

Also, please try running the following code:
import pickle; import torch; import transformers
f = open('train_first_step_file.pkl', 'rb')
temp = pickle.load(f); f.close()
evi_word_cnt = temp['evi_word_cnt']
for i in range(len(evi_word_cnt)):
    for j in range(len(evi_word_cnt[i])):
        if evi_word_cnt[i][j] == 0:
            print('Bad Case', i, j)
to pin down exactly which sample and which evidence has wordcnt = 0. That will also make it easier for me to keep debugging and find the problem. Thanks!


cuizhiling avatar cuizhiling commented on July 20, 2024

Thank you very much for your patient replies! I have sent the relevant details and files to you by email; please take a look when you have time. Thanks again!


