zjunlp / hvpnet

[NAACL 2022 Findings] Good Visual Guidance Makes A Better Extractor: Hierarchical Visual Prefix for Multimodal Entity and Relation Extraction

License: MIT License

Python 98.48% Shell 1.52%

entity-extraction relation-extraction multimodal kg prefix hvpnet ner re naacl dataset

hvpnet's Issues

RuntimeError: mat1 dim 1 must match mat2 dim 0

I hit the following error during training:

RuntimeError                              Traceback (most recent call last)
[<ipython-input-51-25f39c1f99c1>](https://localhost:8080/#) in <module>
    201 
    202 if __name__ == "__main__":
--> 203     main()

10 frames
[<ipython-input-51-25f39c1f99c1>](https://localhost:8080/#) in main()
    187     if args.do_train:
    188         # train
--> 189         trainer.train()
    190         # test best model
    191         args.load_path = os.path.join(args.save_path, 'best_model.pth')

[<ipython-input-50-1e697a6369dc>](https://localhost:8080/#) in train(self)
    278                     self.step += 1
    279                     batch = (tup.to(self.args.device)  if isinstance(tup, torch.Tensor) else tup for tup in batch)
--> 280                     attention_mask, labels, logits, loss = self._step(batch, mode="train")
    281                     avg_loss += loss.detach().cpu().item()
    282 

[<ipython-input-50-1e697a6369dc>](https://localhost:8080/#) in _step(self, batch, mode)
    459             images, aux_imgs = None, None
    460             input_ids, token_type_ids, attention_mask, labels = batch
--> 461         output = self.model(input_ids=input_ids, attention_mask=attention_mask, token_type_ids=token_type_ids, labels=labels, images=images, aux_imgs=aux_imgs)
    462         logits, loss = output.logits, output.loss
    463         return attention_mask, labels, logits, loss

[/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py](https://localhost:8080/#) in _call_impl(self, *input, **kwargs)
    725             result = self._slow_forward(*input, **kwargs)
    726         else:
--> 727             result = self.forward(*input, **kwargs)
    728         for hook in itertools.chain(
    729                 _global_forward_hooks.values(),

[<ipython-input-48-116255bafcd3>](https://localhost:8080/#) in forward(self, input_ids, attention_mask, token_type_ids, labels, images, aux_imgs)
    191     def forward(self, input_ids=None, attention_mask=None, token_type_ids=None, labels=None, images=None, aux_imgs=None):
    192         if self.args.use_prompt:
--> 193             prompt_guids = self.get_visual_prompt(images, aux_imgs)
    194             prompt_guids_length = prompt_guids[0][0].shape[2]
    195             # attention_mask: bsz, seq_len

[<ipython-input-48-116255bafcd3>](https://localhost:8080/#) in get_visual_prompt(self, images, aux_imgs)
    227         aux_prompt_guids = [torch.cat(aux_prompt_guid, dim=1).view(bsz, self.args.prompt_len, -1) for aux_prompt_guid in aux_prompt_guids]  # 3 x [bsz, 4, 3840]
    228 
--> 229         prompt_guids = self.encoder_conv(prompt_guids)  # bsz, 4, 4*2*768
    230         aux_prompt_guids = [self.encoder_conv(aux_prompt_guid) for aux_prompt_guid in aux_prompt_guids] # 3 x [bsz, 4, 4*2*768]
    231         split_prompt_guids = prompt_guids.split(768*2, dim=-1)   # 4 x [bsz, 4, 768*2]

[/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py](https://localhost:8080/#) in _call_impl(self, *input, **kwargs)
    725             result = self._slow_forward(*input, **kwargs)
    726         else:
--> 727             result = self.forward(*input, **kwargs)
    728         for hook in itertools.chain(
    729                 _global_forward_hooks.values(),

[/usr/local/lib/python3.8/dist-packages/torch/nn/modules/container.py](https://localhost:8080/#) in forward(self, input)
    115     def forward(self, input):
    116         for module in self:
--> 117             input = module(input)
    118         return input
    119 

[/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py](https://localhost:8080/#) in _call_impl(self, *input, **kwargs)
    725             result = self._slow_forward(*input, **kwargs)
    726         else:
--> 727             result = self.forward(*input, **kwargs)
    728         for hook in itertools.chain(
    729                 _global_forward_hooks.values(),

[/usr/local/lib/python3.8/dist-packages/torch/nn/modules/linear.py](https://localhost:8080/#) in forward(self, input)
     91 
     92     def forward(self, input: Tensor) -> Tensor:
---> 93         return F.linear(input, self.weight, self.bias)
     94 
     95     def extra_repr(self) -> str:

[/usr/local/lib/python3.8/dist-packages/torch/nn/functional.py](https://localhost:8080/#) in linear(input, weight, bias)
   1690         ret = torch.addmm(bias, input, weight.t())
   1691     else:
-> 1692         output = input.matmul(weight.t())
   1693         if bias is not None:
   1694             output += bias

RuntimeError: mat1 dim 1 must match mat2 dim 0

That is the full error. What is wrong here? I ran this repository without making any changes.

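How were the object images in the relation dataset obtained?

Hello,

In the multimodal relation dataset you provide, there are two folders of object images, img_detect and image_vg, and the model performs better when it uses the data in image_vg.
What is the difference between the methods used to obtain img_vg and img_detect? Why is the resolution of the images in img_vg so much higher than in img_detect, seemingly even higher than the original image?

Thank you in advance for your help.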

What are the details of using the visual grounding tool?

Hi, I noticed in your README that you would first extract the nouns in the text and then perform visual grounding.

My understanding is to use the nouns in the text as a query and get the corresponding image area by visual grounding.

However, I have two questions about this: (1) If the text contains more than one noun, is each noun a separate query, or are all the nouns combined into one query? (2) In the relation extraction data, I found samples that share the same text and the same image and differ only in the head or tail entity, yet their image regions are different. According to the procedure described in your README, samples with the same text and image should yield the same image regions.

In any case, I would like to know the exact details of the visual grounding you performed so that I can resolve the two questions above. Thanks a lot, your work is great.

Question about Twitter-2015

Hello,

I ran your code on the Twitter-2015 dataset and achieved an F1-score of 75.30, which closely aligns with the results reported in your paper. However, I later discovered that some labels in your dataset had been modified, and you mentioned that the paper's results are based on the original version. In an attempt to replicate the reported results, I replaced the 'train.txt,' 'valid.txt,' and 'test.txt' files with the original text files. This time, I only managed to achieve an F1-score of 73.27, which is clearly not an ideal outcome.

I've made several attempts to tune the parameters, but I'm still unable to match the results presented in the paper. Could you kindly provide me with some guidance or hints to help me reproduce your reported results?

Thank you very much.

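Question about the F1 calculation in the MRE task

In the results reported in your paper, the F1 for the MNER task is computed as 2pr/(p+r), but the F1 for MRE is not. Is the F1 for MRE computed in a different way?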
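
For reference, 2pr/(p+r) is the harmonic mean of precision and recall. Below is a minimal sketch of that formula, assuming micro-averaged counts of true positives, false positives, and false negatives. The function name and the example counts are hypothetical, and nothing here is confirmed to be how the paper's MRE numbers were produced.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) from raw counts; 0.0 where undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts, for illustration only.
p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=40)
print(f"P={p:.4f} R={r:.4f} F1={f1:.4f}")
```

If a reported F1 differs from 2pr/(p+r) for the reported p and r, one possible explanation is a different averaging scheme, such as averaging per-class F1 scores. That is a guess, not something the thread confirms.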

How to test the RE models

Hi,

I want to test the RE model you provided, but I cannot find the correct command to run testing. I read run.py, and it seems that the script cannot run testing even when --do_test is set. Could you provide instructions for testing?

Question about the "Segmentation fault (core dumped)" error

Request for the object image data

Thanks for your work! I would like to re-train your model, and could you please upload your originally obtained object files (such as img_vg)? Thank you!

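Twitter15_ckpt, Twitter17_ckpt, and re_ckpt require access permission

Hello, Google Drive reports that access permission is required for Twitter15_ckpt, Twitter17_ckpt, and re_ckpt. Is there another way to download them?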

No error

It's OK

Question about get_visual_prompt

Great work, but I have some questions.

prompt_guids is 4 * [bsz, 256, 2, 2]. Shouldn't torch.cat(prompt_guids, dim=1) then have shape [bsz, 1024, 2, 2]? In fact it is [bsz, 3840, 2, 2].

prompt_guids, aux_prompt_guids = self.image_model(images, aux_imgs)  # [bsz, 256, 2, 2], [bsz, 512, 2, 2]....
prompt_guids = torch.cat(prompt_guids, dim=1).view(bsz, self.args.prompt_len, -1)   # bsz, 4, 3840
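
The code comment above lists [bsz, 256, 2, 2], [bsz, 512, 2, 2], and so on, which suggests the four feature maps do not all have 256 channels. Here is a minimal sketch of the shape arithmetic, assuming the standard ResNet-50 stage widths (256, 512, 1024, 2048), a 2x2 pooled map per stage, and prompt_len = 4. The helper name is hypothetical, and it works on shapes only, not on tensors.

```python
from typing import Sequence, Tuple


def concat_and_view_shape(
    bsz: int, channels: Sequence[int], spatial: Tuple[int, int], prompt_len: int
) -> Tuple[Tuple[int, ...], Tuple[int, ...]]:
    """Mimic torch.cat(feats, dim=1).view(bsz, prompt_len, -1) on shapes only."""
    h, w = spatial
    total_channels = sum(channels)             # channel dims add up under cat(dim=1)
    cat_shape = (bsz, total_channels, h, w)
    numel_per_sample = total_channels * h * w
    if numel_per_sample % prompt_len:
        raise ValueError("elements per sample are not divisible by prompt_len")
    view_shape = (bsz, prompt_len, numel_per_sample // prompt_len)
    return cat_shape, view_shape


cat_shape, view_shape = concat_and_view_shape(
    bsz=8, channels=[256, 512, 1024, 2048], spatial=(2, 2), prompt_len=4
)
print(cat_shape)   # (8, 3840, 2, 2)
print(view_shape)  # (8, 4, 3840)
```

Under the same assumptions, a backbone with different stage widths would change the last dimension. That would no longer match a linear layer built for 3840 input features, and could plausibly produce an error such as "mat1 dim 1 must match mat2 dim 0".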

What does this code do? Is it just computing the key and value inputs by superimposing the outputs of the four ResNet blocks onto that self-attention layer?

for i in range(4):
     key_val = key_val + torch.einsum('bg,blh->blh', prompt_gate[:, i].view(-1, 1), split_prompt_guids[i])
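
As a reading aid, here is a plain-Python sketch of what the loop above appears to compute, assuming prompt_gate has shape [bsz, 4] and each split_prompt_guids[i] has shape [bsz, l, h]. Because the summed index g has size 1 after view(-1, 1), einsum('bg,blh->blh', ...) reduces to scaling each sample's block features by one scalar gate, and the loop accumulates a gate-weighted sum over the four blocks. Nested lists stand in for tensors, and the function name is hypothetical.

```python
from typing import List

Matrix = List[List[float]]  # one sample's [l][h] feature matrix


def gated_sum(gates: List[List[float]], blocks: List[List[Matrix]]) -> List[Matrix]:
    """gates[b][i] is the gate of block i for sample b; blocks[i][b] is an [l][h] matrix.

    Returns out[b][l][h] = sum_i gates[b][i] * blocks[i][b][l][h].
    """
    bsz = len(gates)
    n_l = len(blocks[0][0])
    n_h = len(blocks[0][0][0])
    out = [[[0.0] * n_h for _ in range(n_l)] for _ in range(bsz)]
    for i, block in enumerate(blocks):
        for b in range(bsz):
            for l in range(n_l):
                for h in range(n_h):
                    out[b][l][h] += gates[b][i] * block[b][l][h]
    return out


# Toy example: 1 sample, 2 blocks, l = 1, h = 2.
gates = [[0.25, 0.75]]
blocks = [[[[4.0, 8.0]]], [[[0.0, 4.0]]]]
print(gated_sum(gates, blocks))  # [[[1.0, 5.0]]]
```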

Why choose 64?
key, value = key_val[0].reshape(bsz, 12, -1, 64).contiguous(), key_val[1].reshape(bsz, 12, -1, 64).contiguous() # bsz, 12, 4, 64
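
On the choice of 64: BERT-base has a hidden size of 768 and 12 attention heads, so each head works on 768 / 12 = 64 dimensions, and the reshape above matches that per-head layout. Here is a small sketch of the arithmetic, assuming the key or value tensor has shape [bsz, prompt_len, hidden_size] before the reshape. The function name is hypothetical.

```python
from typing import Tuple


def prefix_kv_shape(
    bsz: int, prompt_len: int, hidden_size: int = 768, num_heads: int = 12
) -> Tuple[int, int, int, int]:
    """Shape of a per-head key/value prefix: (bsz, num_heads, prompt_len, head_dim)."""
    if hidden_size % num_heads:
        raise ValueError("hidden_size must be divisible by num_heads")
    head_dim = hidden_size // num_heads
    return (bsz, num_heads, prompt_len, head_dim)


print(prefix_kv_shape(bsz=8, prompt_len=4))  # (8, 12, 4, 64)
```

For a model with a different hidden size or head count, the 12 and 64 would need to follow that model's configuration rather than stay hard-coded.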
If I want to switch to a RoBERTa PLM, what do I need to modify?
When I tried, I ran into:

IndexError: index out of range in self

File "E:\deeplearning\Anaconda3\envs\py36\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "E:\deeplearning\Anaconda3\envs\py36\lib\site-packages\transformers\models\roberta\modeling_roberta.py", line 846, in forward
past_key_values_length=past_key_values_length,
File "E:\deeplearning\Anaconda3\envs\py36\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "E:\deeplearning\Anaconda3\envs\py36\lib\site-packages\transformers\models\roberta\modeling_roberta.py", line 132, in forward
token_type_embeddings = self.token_type_embeddings(token_type_ids)
File "E:\deeplearning\Anaconda3\envs\py36\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "E:\deeplearning\Anaconda3\envs\py36\lib\site-packages\torch\nn\modules\sparse.py", line 160, in forward
self.norm_type, self.scale_grad_by_freq, self.sparse)
File "E:\deeplearning\Anaconda3\envs\py36\lib\site-packages\torch\nn\functional.py", line 2044, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self

The error is raised in this call:

bert_output = self.bert(input_ids=input_ids,
                        attention_mask=prompt_attention_mask,
                        token_type_ids=token_type_ids,
                        past_key_values=prompt_guids,
                        return_dict=True)

My guess is that num_embeddings is 30522 for BERT but 50265 for RoBERTa.
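
Note that the traceback above fails inside token_type_embeddings rather than the word embedding, so vocabulary size may not be the cause. With the default Hugging Face configurations, BERT has type_vocab_size = 2 while RoBERTa has type_vocab_size = 1, so a token_type_ids tensor containing 1 would index past the end of RoBERTa's table. Below is a minimal sketch of that failure mode in plain Python, with a list standing in for the embedding table. The helper and the table contents are hypothetical.

```python
from typing import List


def lookup(table: List[List[float]], ids: List[int]) -> List[List[float]]:
    """Row lookup that mirrors an embedding layer: every id must be < len(table)."""
    for i in ids:
        if not 0 <= i < len(table):
            raise IndexError(f"index {i} out of range for table with {len(table)} rows")
    return [table[i] for i in ids]


bert_like = [[0.1, 0.2], [0.3, 0.4]]  # 2 segment types, as in BERT's default config
roberta_like = [[0.5, 0.6]]           # 1 segment type, as in RoBERTa's default config

token_type_ids = [0, 0, 1, 1]         # sentence-pair style segment ids
print(lookup(bert_like, token_type_ids))  # succeeds: both ids exist

try:
    lookup(roberta_like, token_type_ids)
except IndexError as err:
    print("RoBERTa-like table:", err)

# Possible workaround under this assumption: use all-zero segment ids.
print(lookup(roberta_like, [0] * len(token_type_ids)))
```

If this is the cause, passing token_type_ids=None or an all-zero tensor to the RoBERTa model is one possible fix. The tokenizer and any hard-coded special-token ids would also need to match the new model.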

data preprocess

Hello, could I inquire about how you handle data preprocessing specifically? Is it possible to make the relevant code public?

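Some entity types were modified in the Twitter15 dataset

Hello, comparing the twitter15 dataset used in the paper with the one used in other papers (UMT, MAF, ...), some entity types are inconsistent, and test.txt in particular contains many modified entities. Is it appropriate to compare experimental results with other models when the data is inconsistent, and what is the purpose of modifying the entity types?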

question

Hello, I ran training and testing exactly as described in the README, but the predictions are the same no matter what the input is, and the output tensor is NaN. According to my debugging, the problem is in the BERT model .py file, where "prompt = [] for name, layer in self.resnet.named_children(): if name == 'fc' or name == 'avgpool': continue x = layer(x)  # (bsz, 256, 56, 56)" causes x to become all zeros. I do not know what to change to get normal training and testing, and would appreciate guidance. Here are the predictions I printed out after my test. The two white lines above are the correct answers, and below are the predictions. You can see that they are all zeros, which corresponds to "No relationship between entities" in the label dictionary.

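Downloading the RE dataset is too slow

Hello, downloading the data with the command wget 120.27.214.45/Data/re/multimodal/data.tar.gz is very slow. Could you upload the data to Google Drive for download?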

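Question about the comparison experiments in the paper

Hello, I would like to ask about MEGA in the paper's comparison experiments. The original paper, "Multimodal Relation Extraction with Efficient Graph Alignment", only reports results for multimodal relation extraction and has no results for MNER, and the original project does not provide the processed MNER data (such as imgSG, rel_1, etc.). How did you modify the code to obtain the results? Could you provide the processed MNER data (such as imgSG, rel_1, etc.)?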

Not putting parameters into optimizer but still trainable

In modules/train.py, lines 484 and 492 both add parameters to the optimizer. The parameters for crf and fc are not added, yet they still appear to be trained. As far as I know, model parameters that are not added to the optimizer cannot be trained. Do you know why?
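
One way to check the premise is to list which parameter names are not covered by any optimizer group. The sketch below assumes the groups are built by substring matching on parameter names, which is an assumption about train.py rather than something confirmed here. All parameter names, group keywords, and the helper itself are hypothetical.

```python
from typing import Dict, Iterable, List


def uncovered_parameters(
    all_names: Iterable[str], group_keywords: Dict[str, List[str]]
) -> List[str]:
    """Return parameter names that match none of the keyword filters used for optimizer groups."""
    keywords = [kw for kws in group_keywords.values() for kw in kws]
    return sorted(n for n in all_names if not any(kw in n for kw in keywords))


# Hypothetical parameter names and group filters, for illustration only.
names = [
    "bert.encoder.layer.0.attention.self.query.weight",
    "encoder_conv.0.weight",
    "crf.transitions",
    "fc.weight",
    "fc.bias",
]
groups = {"text": ["bert"], "visual": ["encoder_conv", "gates"]}
print(uncovered_parameters(names, groups))
# ['crf.transitions', 'fc.bias', 'fc.weight']
```

In PyTorch, optimizer.step() only updates tensors that appear in optimizer.param_groups. So if a check like this reports a parameter as uncovered and its values still change between steps, the parameter is probably reaching the optimizer through another path, for example a broader filter than expected.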

Could you provide the code that uses the visual grounding tool to extract local visual images?

Email: [email protected]

Thank you very much.