The knowprompt from zjunlp

experiment result

According to your introduction, I get experiment result on semeval dataset by bash scripts/semeval.sh . but the loss is not dropping during training. where is wrong? I don't change anywhere. @njcx-ai

效果复现

使用本项目复现结果过程中，具体是用roberta-large复现semeveal数据集在8-shot下的结果，Eval/best_f1=0.149，与论文中的相差甚远，且在main.py的line 214（if not args.two_steps: trainer.test()）这里提示No 'test_dataloader()' method defined to run 'Trainer.test'，请问这个是因为公开项目不完整导致的吗？

.ckpt FileNotFoundError

尝试了几个数据集，都是在最后两个epoch时报错：FileNotFoundError: [Errno 2] No such file or directory: '/home/code/KnowPrompt/output/epoch=2-Eval/f1=0.906.ckpt'，求指点

求问作者实验需求显存大小

看到论文implement details作者实验中使用了8 3090 GPUs

compute micro F1 in code and replicate the results in paper

The code calculates the F1 score without considering "no_relation." Is there any background information regarding this calculation method?

Furthermore, running the experiments with the parameters from the paper on TACRED/V results in a score of 0. This could be because there are too many "no_relation" instances in these datasets. How can the results from the paper be replicated? (Perhaps the authors altered the class distribution in the training set or considered "no_relation" when calculating F1 scores.)

Thanks！

运行dialogue数据集bug

你好，我在dialogue数据集上面运行的时候运行python generate_k_shot.py --data_dir ./dataset --k 8 --dataset dialogue这句代码，报下面错误，请问是什莫原因吗，期待你的回复，谢谢！

运行报错

请问运行>> bash scripts/retacred.sh的时候报错：
Traceback (most recent call last):
File "main.py", line 244, in
main()
File "main.py", line 128, in main
parser = _setup_parser()
File "main.py", line 55, in _setup_parser
litmodel_class = _import_class(f"lit_models.{temp_args.litmodel_class}")
File "main.py", line 27, in import_class
class = getattr(module, class_name)
AttributeError: module 'lit_models' has no attribute 'TransformerLitModel'
scripts/retacred.sh: line 2: --model_name_or_path: command not found
scripts/retacred.sh: line 3: --accumulate_grad_batches: command not found
scripts/retacred.sh: line 4: --batch_size: command not found
scripts/retacred.sh: line 5: --data_dir: command not found
scripts/retacred.sh: line 6: --check_val_every_n_epoch: command not found
scripts/retacred.sh: line 7: --data_class: command not found
scripts/retacred.sh: line 8: --max_seq_length: command not found
scripts/retacred.sh: line 9: --model_class: command not found
scripts/retacred.sh: line 10: --t_lambda: command not found
scripts/retacred.sh: line 11: --wandb: command not found
scripts/retacred.sh: line 12: --litmodel_class: command not found
scripts/retacred.sh: line 13: --task_name: command not found
scripts/retacred.sh: line 14: --lr: command not found
这是什么原因？

pytorch_lightning.utilities.exceptions.MisconfigurationException: No `test_dataloader()` method defined to run `Trainer.test`.

The training works but it throws the exception.
Environment pytorch_lightning ==1.5.0

configure_optimizers def

Hi, many thanks for providing the code. I have a question why do you have two configure_optimizers() in the Base lightning module?
another question is that I know that the configure_optimizers function in the BertLitModel is for two-stage training but we do not need to check the optimizer_grouped_parameters in the training_step() and then optimize them?

Initialization of Type Words

I had a question regarding the initialization of type words. According to the code:
if self.args.init_type_words:
so_word = [a[0] for a in self.tokenizer(["[obj]","[sub]"], add_special_tokens=False)['input_ids']]
meaning_word = [a[0] for a in self.tokenizer(["person","organization", "location", "date", "country"], add_special_tokens=False)['input_ids']]

The meaning words are initialized with certain entity types. While these are the probable entity types for the TACRED dataset, the same is not true for the SemEval dataset.

I wanted to know how this initialization affects the working of the algorithm on other datasets like SemEval. Should we change this initialization based on the dataset?

ValueError: too many values to unpack (expected 4)

你好！
我使用自己标注的数据集，格式处理为这样
{'token': ['地', '面', '状', '况', '不', '良', '导', '致', '位', '置', '偏', '移', '。'], 'h': {'name': '位置偏移', 'pos': [8, 12]}, 't': {'name': '地面状况不良', 'pos': [0, 6]}, 'relation': '因果关系'}
（不知道格式这样是否正确。。
然后运行后，得到这个报错。

File "D:\re\knowprompt2\KnowPrompt\lit_models\transformer.py", line 210, in validation_step
input_ids, attention_mask, labels, _ = batch
ValueError: too many values to unpack (expected 4)

代码没有改动。我不知道是我格式的问题还是其他问题。
盼回复！谢谢！

code

sorry to borther you,I have seen your paper and code, but I couldn't see any prompt method,I just see the relation between two entities have the different #perfermence from trditional relation perfermence.Is there any unseen code published?

直接使用label信息，是否相当于泄露给模型信息 (已回答)

what is the difference between virtual type words and soft prompt

It seems that the virtual type words and soft prompt are the same thing, except that virtual type words are initialized according to the prior probability distribution.
May I know whether my understanding is correct?

some questions about the code

I run get_label_word.py get the PT file of semeval dataset, and then run script/semeval.sh, but I find that loss is not dropping during training, and f1 score is relatively low(0.14). What's wrong with it? Waiting for your reply, thanks.

a bug?

FileNotFoundError: [Errno 2] No such file or directory: './dataset/roberta-large_semeval.pt'
how can i get this pt

About the calculation of \phi(r)

hi, congratulations on your great work!
I have a question after reading the paper (though I have not read the code yet). How do you calculate the value of \phi(r)? It seems that you don't explain that in the paper. Thank you.

下载模型到本地后代码仍然报错：在main.py中更改了地址，仍然报错

276149990-91f70bc9-670f-431a-98cc-5d1266593c43

276150275-67a6c420-d316-4b8b-a2ea-7e3f24b1ef42

276142022-25673a7b-7c57-4a04-8785-837b04b60df4

An academic issues on "How to estimate the entity type distributions with relation class is not known"

According to your paper: you estimate the prior distributions over the candidate set C_sub and C_obj of potential entity types, according to a certain relation class, where the prior distributions are estimated by frequency statistics. But, how do you estimate the prior distributions with an unknown relation in the instance , just like “the chicken or the egg?”.

For example, the relation “per:country_of_birth” indicates the subject entity belongs to “person” and the object entity belongs to “country”. The prior distributions for C_sub can be counted as {"person":1} , but we should know this instance contains the relation "per:country_of_birth" in advance, then we can estimate the prior distributions of the candidate set.

关于two-stage中的第一个阶段，论文中的描述是否和代码不一致

大佬好！我有以下疑问：
论文 4.3 节说第一阶段训练是优化那几个虚拟类型词和答案词的 embedding，但是我看到代码（如下）transformer.py 里，else 里是用整个 embedding 来当做优化的参数，请问这是怎么回事呢？

 def configure_optimizers(self):
        no_decay_param = ["bias", "LayerNorm.weight"]

        if not self.args.two_steps: 
            parameters = self.model.named_parameters()
        else:
            # model.bert.embeddings.weight
            parameters = [next(self.model.named_parameters())]

Missing weighted average function for virtual answer word

Hi. Thanks for your great work.

The paper mentioned weighted average function on page 4, indicating embeddings of virtual words should be initialized with respect to the probability distribution. However, your code shows only a mean operation was performed. Is that a bug or does it just shows a negligible difference so that we could ignore it?

Moreover, I am a little confused about the probability distribution. Is it still based on prior distributions discussed in Entity Knowledge Injection in Part 4.1?

Thanks in advance for your patience.

KnowPrompt/lit_models/transformer.py

Line 167 in 8734c20

    
           word_embeddings.weight[continous_label_word[i]] = torch.mean(word_embeddings.weight[idx], dim=0)

python3.8 version is not compatible with pytorch1.8.1+cu111

when i run pip install -r requirments, show me this:

actually, this was only the first problem I had when I ran this codebase, and there were often configuration dependencies file errors during the run, and I didn't complete this experimental verification in the end

遇到问题求助

您好，我想问一下您几个问题：
1： virtual template words 指的是构造的模板吗？也就是图一中的
Hamilton [MASK] British [SEP]
2: answer words 指的是 [MASK] 预测的的关系词吗？图一中的 country,city,residence
3: 公式5中的 h,t,r 分别指的是什么？
4：文中多次提到 estimate the probability distributions ，这个值是怎么算的？

非常感谢

How relation embedding was constructed in the algorithm?

I want to know how the relation embedding was constructed? I learned from this repo where relation embedding was built from the bert 【mask】? But this is the same to type word embedding. Am I right?

有关virtual type words的embedding的初始化问题

您好，有以下几个问题想要请教您：

论文和代码中的virtual type words初始化方式不一致问题
在论文中[sub]和[obj]的初始化是通过embedding乘以概率𝜙_sub和𝜙_obj得到的，但在最新提交的transformer.py的第174-175行代码中似乎是直接取了均值，相当于每个实体类型的𝜙_sub和𝜙_obj都是一样的。想问一下是我对文章或者代码的理解有问题吗？
two_stage设置问题
请问在代码中实现two_stage是需要设置参数--two_steps为True吗？我在复现过程中，将--two_steps设置为True之后，测试集的结果从70多降到了20多，因此有点疑惑。

谢谢您！

跑代码报错：ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on.

而且已经把reberta-large模型下载到本地，也更改了main.py中的from_pretrain()的地址，还是报错

关于 KE loss 即论文中 "Implicit Structured Constraints" 的一些疑问

大佬好！首先对你们的卓越工作以及开源精神表示敬意！

起因：我在尝试重新运行项目中的代码的时候，对 KE loss 的有效性有一些疑问，虽然论文中你们提到“In addition, there exists rich semantic knowledge among relation labels and structural knowledge implications among relational triples, which cannot be ignored.”，我在这里将 ke_loss 理解为文中所说的 structural knowledge，但是直接使用 $(s + r - o)$ 的方式，感觉是有点简单粗暴了哈~ 而且从实验的日志来看，ke_loss 从头到尾并没有得到优化。

KnowPrompt/lit_models/transformer.py

Lines 292 to 293 in 9159e4b

    
           d_1 = torch.norm(subject_embedding + mask_relation_embedding - object_embedding, p=2) / bsz 
        
           d_2 = torch.norm(neg_subject_embedding + real_relation_embedding - neg_object_embedding, p=2) / bsz

（顺便一提，你们公开的代码里面这部分的 log 写错了，两个输出都是 loss）

KnowPrompt/lit_models/transformer.py

Lines 196 to 197 in 9159e4b

    
           self.log("Train/loss", loss) 
        
           self.log("Train/ke_loss", loss)

如果调整到正常的输出之后就会发现 ke_loss 一直在 20 左右。这也印证了我前面的想法，毕竟 $s$、$r$、$o$ 都是直接从模型直接输出的，没有经过额外的全连接之类的重新映射，直接取距离的二范数可能不太好训练。

总而言之，我的疑问如下，如果能够抽空解答一二，十分感谢~ @njcx-ai (应该是作者吧~😘)

在设计 KE loss 的时候是怎样考虑的？当时在实验的时候，KE loss 的表现又是如何？
负样本的选取中，是在 max_token_length 长度上选取的，必定也会包含 prompt 部分和句子后面 padding 部分的 token，这个当时有没有考虑呢？
负样本的计算为什么会选用 real_relation_embedding，而不是选择模型的输出呢？

KnowPrompt/lit_models/transformer.py

Lines 262 to 297 in 9159e4b

    
           def ke_loss(self, logits, labels, so, input_ids): 
        
               subject_embedding = [] 
        
               object_embedding = [] 
        
               neg_subject_embedding = [] 
        
               neg_object_embedding = [] 
        
               bsz = logits.shape[0] 
        
               for i in range(bsz): 
        
                   subject_embedding.append(torch.mean(logits[i, so[i][0]:so[i][1]], dim=0)) 
        
                   object_embedding.append(torch.mean(logits[i, so[i][2]:so[i][3]], dim=0)) 
        
                   # random select the neg samples 
        
                   st_sub = random.randint(1, logits[i].shape[0] - 6) 
        
                   span_sub = random.randint(1, 5) 
        
                   st_obj = random.randint(1, logits[i].shape[0] - 6) 
        
                   span_obj = random.randint(1, 5) 
        
                   neg_subject_embedding.append(torch.mean(logits[i, st_sub:st_sub+span_sub], dim=0)) 
        
                   neg_object_embedding.append(torch.mean(logits[i, st_obj:st_obj+span_obj], dim=0)) 
        
               subject_embedding = torch.stack(subject_embedding) 
        
               object_embedding = torch.stack(object_embedding) 
        
               neg_subject_embedding = torch.stack(neg_subject_embedding) 
        
               neg_object_embedding = torch.stack(neg_object_embedding) 
        
               # trick , the relation ids is concated,  
        
               _, mask_idx = (input_ids == self.tokenizer.mask_token_id).nonzero(as_tuple=True) 
        
               mask_output = logits[torch.arange(bsz), mask_idx] 
        
               mask_relation_embedding = mask_output 
        
               real_relation_embedding = self.model.get_output_embeddings().weight[labels+self.label_st_id] 
        
               d_1 = torch.norm(subject_embedding + mask_relation_embedding - object_embedding, p=2) / bsz 
        
               d_2 = torch.norm(neg_subject_embedding + real_relation_embedding - neg_object_embedding, p=2) / bsz 
        
               f = torch.nn.LogSigmoid() 
        
               loss = -1.*f(self.args.t_gamma - d_1) - f(d_2 - self.args.t_gamma) 
        
               return loss

How to initialize the learnable relation embedding?

在阅读本篇优秀论文的一些疑问：

在Relation Knowledge Injection这一小节提到要将关系的语义知识注入到关系的初始化向量中，请问是通过对某一关系中的单词频率来对embedding进行加权的吗？例如有关系 y = per:countries_of_residence，那么其候选集单词为{"person", "country", "residence“}，其概率分布为{“1/3”，“1/3”，“1/3”}，所以关系y的初始化是 y_initialized_embedding = 1/3 person_embedding + 1/3 country_embedding + 1/3 residence_embedding吗？
同样是这一小节中假设在一个已知 PLM 的 vocabulary 中存在一个隐式的虚拟 answer word 来表示关系标签（例如问题1中的y），请问该虚拟的word embedding如何计算得到？和问题1中的关系y的初始化是什么关系呢？
请问 Figure 2 (b) 中的[MASK]处得到的是[MASK]在虚拟的answer words V'的概率分布吗？其结果和relation embedding head的输出进行叉乘表示的是什么呢？（relation embedding head 表示的是问题1中的初始化关系吗？）

期待作者们的回复！非常感谢！

Hi! How to run this code with BART model? Would you give an example of parameters setting? Thank you so much!!

paper中虚拟的type word是每个relation都一样吗？

我看了一下代码好像[sub]和[obj]就只是个token，即对每个关系来说，这个type word embedding是一样的。为什么在paper表6里面不同的句子[sub]和[obj]周围的word会有差别，这个type word embedding在inference的时候会变吗？
（我认为的流程：训练得到每个relation的embedding+[sub]和[obj]的embedding后，inference时按照template把[sub]和[obj]、[MASK]插入，预测[MASK]，和relation embedding求相似度。）

chinese fields

I want to change English roberta model to chinese roberta，the data processed module showed that use_bert is ture,so the dataset return batch that is analysed by 5 variables to express, I don't find any module to accept the batch

关于预训练语言模型选择的问题

请问一下作者，如果我想选择bert作为预训练模型，应该修改哪些参数？

	d_1 = torch.norm(subject_embedding + mask_relation_embedding - object_embedding, p=2) / bsz
	d_2 = torch.norm(neg_subject_embedding + real_relation_embedding - neg_object_embedding, p=2) / bsz

	def ke_loss(self, logits, labels, so, input_ids):
	subject_embedding = []
	object_embedding = []
	neg_subject_embedding = []
	neg_object_embedding = []
	bsz = logits.shape[0]
	for i in range(bsz):
	subject_embedding.append(torch.mean(logits[i, so[i][0]:so[i][1]], dim=0))
	object_embedding.append(torch.mean(logits[i, so[i][2]:so[i][3]], dim=0))

	# random select the neg samples
	st_sub = random.randint(1, logits[i].shape[0] - 6)
	span_sub = random.randint(1, 5)
	st_obj = random.randint(1, logits[i].shape[0] - 6)
	span_obj = random.randint(1, 5)
	neg_subject_embedding.append(torch.mean(logits[i, st_sub:st_sub+span_sub], dim=0))
	neg_object_embedding.append(torch.mean(logits[i, st_obj:st_obj+span_obj], dim=0))

	subject_embedding = torch.stack(subject_embedding)
	object_embedding = torch.stack(object_embedding)
	neg_subject_embedding = torch.stack(neg_subject_embedding)
	neg_object_embedding = torch.stack(neg_object_embedding)
	# trick , the relation ids is concated,


	_, mask_idx = (input_ids == self.tokenizer.mask_token_id).nonzero(as_tuple=True)
	mask_output = logits[torch.arange(bsz), mask_idx]
	mask_relation_embedding = mask_output
	real_relation_embedding = self.model.get_output_embeddings().weight[labels+self.label_st_id]

	d_1 = torch.norm(subject_embedding + mask_relation_embedding - object_embedding, p=2) / bsz
	d_2 = torch.norm(neg_subject_embedding + real_relation_embedding - neg_object_embedding, p=2) / bsz
	f = torch.nn.LogSigmoid()
	loss = -1.*f(self.args.t_gamma - d_1) - f(d_2 - self.args.t_gamma)

	return loss

zjunlp / knowprompt Goto Github PK

knowprompt's People

Contributors

Stargazers

Watchers

Forkers

knowprompt's Issues

Recommend Projects

Recommend Topics

Recommend Org