
kb2e's Issues

Derivation of the gradient formula for the Wr vector in TransH

Could you share the gradient derivation for the Wr vector in TransH? I cannot derive the result used in the code:
A_tmp[rel][ii]+=belta*rate*x*tmp1;
A_tmp[rel][ii]-=belta*rate*x*tmp2;
A_tmp[rel][ii]+=belta*rate*sum_x*entity_vec[e1][ii];
A_tmp[rel][ii]-=belta*rate*sum_x*entity_vec[e2][ii];
Thanks!
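For what it's worth, here is a derivation that reproduces those four update lines. It assumes the standard TransH score and my reading of the variable names (tmp1 = w_r·h, tmp2 = w_r·t, x = the per-component derivative of the distance, sum_x accumulating x·w_r), so treat the naming as an assumption:

```latex
d = \bigl(h - (w_r^{\top}h)\,w_r\bigr) + r - \bigl(t - (w_r^{\top}t)\,w_r\bigr)
  = h + r - t - \bigl(w_r^{\top}(h-t)\bigr)\,w_r,
\qquad f = \lVert d \rVert .
\\[4pt]
\text{Let } x_i = \partial f / \partial d_i
\ (\operatorname{sign}(d_i)\text{ for }L_1,\ 2d_i\text{ for }L_2). \text{ Then}
\qquad
\frac{\partial d_i}{\partial w_{r,j}}
 = -(h_j - t_j)\,w_{r,i} \;-\; \delta_{ij}\,w_r^{\top}(h-t),
\\[4pt]
\frac{\partial f}{\partial w_{r,j}}
 = \sum_i x_i \frac{\partial d_i}{\partial w_{r,j}}
 = -(h_j - t_j)\sum_i x_i w_{r,i}
   \;-\; x_j\bigl(w_r^{\top}h - w_r^{\top}t\bigr).
```

With tmp1 = w_r·h, tmp2 = w_r·t and sum_x = Σᵢ xᵢ w_{r,i}, the descent step w_{r,j} -= rate·∂f/∂w_{r,j} (scaled by belta = ±1 for the positive/negative triple) flips the minus signs and gives exactly the four updates: += belta·rate·x·tmp1, -= belta·rate·x·tmp2, += belta·rate·sum_x·h_j, -= belta·rate·sum_x·t_j.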

TransR and TransH cannot work

Hi, TransE works on FB15K, but TransR and TransH do not. The error shows as:
Segmentation fault: 11

Normalize `relation_tmp[rel_neg]` in PTransE/Train_TransE_path.cpp

Is there a reason why the section

    norm(relation_tmp[rel]);
    norm(entity_tmp[e1]);
    norm(entity_tmp[e2]);
    norm(entity_tmp[j]);

skips normalization of rel_neg? This leads to huge scores in train_kb for the corrupt relation triple (comparisons like 0.1 + MARGIN > 250), and may lead to fewer gradient updates than expected for each label pair, possibly causing underfitting.

./TransE: Segmentation fault (core dumped). What causes this?

[New LWP 24671]
Core was generated by `./Train_TransE'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 _IO_vfscanf_internal (s=0x0, format=0x409414 "%s%d", argptr=argptr@entry=0x7fff7663cbc8, errp=errp@entry=0x0) at vfscanf.c:347
347 vfscanf.c: No such file or directory.
(gdb) where
#0 _IO_vfscanf_internal (s=0x0, format=0x409414 "%s%d", argptr=argptr@entry=0x7fff7663cbc8, errp=errp@entry=0x0) at vfscanf.c:347
#1 0x00007fa3fe62c457 in ___vfscanf (s=, format=, argptr=argptr@entry=0x7fff7663cbc8)
at vfscanf.c:3066
#2 0x00007fa3fe6337d7 in __fscanf (stream=, format=) at fscanf.c:31
#3 0x0000000000402249 in prepare() ()
#4 0x00000000004019b2 in main ()
(gdb)

PTransE

Has anyone else hit a segmentation fault (11) when running Test_TransE_path? How did you solve it?

Meaning of the data

In the entity-to-id mapping file, suppose there is a line like /m/027rn 0. What does "/m/027rn" mean?

What does the file "n2n.txt" mean?

As the title asks.
I could not find any documentation for n2n.txt. What is this file, and how is it computed?
Looking forward to your reply, thanks.

Purpose of validate.txt

What is the purpose of the file named in the title?
What happens to the results if validate.txt is removed? Would using its data for training as well give better results?

Data format

I have a question.
The test triples seem to be one-dimensional, but my data's triples are multi-dimensional. Do I need to convert them to one dimension, or take the vector norm?

Question about the training set for the Trans-family models

Hello. I extracted the entity /m/04ztj and all of its relations from FB15K/train.txt, and found several thousand triples with the relation /people/marriage_union_type/unions_of_this_type./people/marriage/spouse (spouse), and several hundred with /people/marriage_union_type/unions_of_this_type./people/marriage/location_of_ceremony. This is clearly impossible; why does the training set contain so many erroneous relations?

Train_TransR.cpp BUG

#236-#237:
norm(entity_tmp[k]);
What is the relationship between the value of k and the number of entities? When train.txt is more than 100 times larger than the number of entities, this causes a segmentation fault.

Questions about Vector Normalization in TransE

Hi,
I'm interested in your research, but I have some questions about vector normalization in Train_TransE.cpp.

Paper "Translating Embeddings for Modeling Multi-relational Data" says:

The optimization is carried out by stochastic gradient descent (in minibatch mode), over the possible h, ℓ and t, with the additional constraints that the L2-norm of the embeddings of the entities is 1 (no regularization or norm constraints are given to the label embeddings ℓ). This constraint is important for our model, as it is for previous embedding-based methods, because it prevents the training process to trivially minimize L by artificially increasing entity embeddings norms.

But I'm confused about two places in Train_TransE.cpp:

1:
Here is the normalization function in Train_TransE.cpp:
double norm(vector<double> &a)
{
    double x = vec_len(a);
    if (x > 1)
        for (int ii = 0; ii < a.size(); ii++)
            a[ii] /= x;
    return 0;
}
It seems that vectors whose length is smaller than 1 are never rescaled, so the entity norms end up satisfying ≤ 1 rather than exactly = 1 as in the paper.

2:
Relation vectors are also normalized, even though the quoted passage says no norm constraints are given to the label embeddings.

Can you give some hints to help me understand your algorithm?
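One possible reading of point 1, offered as a guess rather than the authors' intent: the `if (x>1)` guard implements projection onto the unit ball, a common relaxation of the paper's hard sphere constraint:

```latex
e \;\leftarrow\; \frac{e}{\max\bigl(1,\ \lVert e \rVert_2\bigr)}
\qquad\Longrightarrow\qquad \lVert e \rVert_2 \le 1 \text{ after every update.}
```

Vectors already inside the ball are left untouched, which still prevents the loss from being trivially minimized by inflating entity norms.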

Datasets

Could you provide the WN11 and FB13 datasets?

TransE parameter settings

Hello. The TransE code does not seem to ship with the optimal parameters preset; running it directly gives results that differ from the TransE paper, and running it with the parameters from the original paper, Translating Embeddings for Modeling Multi-relational Data (2013), gives results that are far worse. How should the parameters be set to reproduce the results of the original paper and of the TransR paper? Below are my results from running the code directly on WN18:
18 40943
left:439.644 0.8028 425.411 0.9308
right:468.461 0.8108 455.947 0.9314
And the results after switching to the original paper's parameters (method=unif, k=20, learning rate=0.01, margin=2), which are clearly wrong:
18 40943
left:1351.67 0.1482 1340.44 0.1648
right:1475.97 0.142 1466.25 0.1548

Questions about the gradients of TransH and TransR and the L1/L2 distances

1. On choosing between L1 and L2
In the Trans-family algorithms, does the choice of distance affect the final results? Looking at the code you shared on GitHub, it seems to rely mainly on the L1 distance.

2. On the partial derivatives in the gradient descent
Referring to your code on GitHub, the TransE gradient is clear to me, but I do not quite understand the gradient computations in TransH and TransR.
I have emailed you the details.
Many thanks!

e1_e2.txt

How is the file e1_e2.txt generated, and what is it for?

MeanRank does not reach the 200s

Hello. I ran your TransE implementation several times and tried many parameters, but the experiments never produce a MeanRank in the 200s; the best was only around 400. We used the WordNet18 data with 1000 training epochs, learning rates of [0.01, 0.001], vector dimensions of [20, 100], and margins of [1, 2]. Is there anything in the parameters or the program that still needs changing?
Thanks and best wishes.

entity2id & relation2id mappings

I couldn't find a description of the FB15k dataset so I'm asking this question here.

In the dataset, I saw triples such as:
"/m/08mbj32 /m/0d193h /common/annotation_category/annotations./common/webpage/topic"
(FB15k valid, line 830)

The relation seems to be two relations (/common/annotation_category/annotations and /common/webpage/topic) merged into one. Is this how 1-N-1 relations are handled? I found both parts of this specific relation in the relation2id description, but there are some other "concatenated" relations for which I cannot find the sub-parts.

If you could clear this up, I'd be really grateful.

Best,
Martin

Segmentation fault error

Thanks for releasing the project code and data.
When running the code I got a "Segmentation fault". Why is that? Any ideas?

On updating the normal vector in TransH

Dear Mr. Lin,

I have two questions about updating the normal vector:
1) Why is the normal vector Wr in Train_TransH updated as follows (lines 292-296)?

for (int ii=0; ii<n; ii++)
{
    A_tmp[rel][ii]+=belta*rate*sum_x*entity_vec[e1][ii];
    A_tmp[rel][ii]-=belta*rate*sum_x*entity_vec[e2][ii];
}
2) How does your code guarantee that Wr and r are orthogonal?

Thanks!

Question about Test_TransE

How should the final output lines
cout<<"left:"<<lsum/fb_l.size()<<'\t'<<lp_n/fb_l.size()<<"\t"<<lsum_filter/fb_l.size()<<'\t'<<lp_n_filter/fb_l.size()<<endl;
cout<<"right:"<<rsum/fb_r.size()<<'\t'<<rp_n/fb_r.size()<<'\t'<<rsum_filter/fb_r.size()<<'\t'<<rp_n_filter/fb_r.size()<<endl;
be understood?
And how can I obtain the results reported in the papers for link prediction, relation classification, triple classification, and finally text relation extraction?
Please give detailed steps; many thanks!

Request for e1_e2.txt

Hi, I am interested in your PTransE model. In PTransE, PCRA.py reads e1_e2.txt, but it is not in the data files. What should I do to get e1_e2.txt? Extract e1 and e2 from train.txt? Could you release confidence.txt directly to help me understand your algorithm? I would really appreciate your reply.

How to generate e1_e2.txt

Do you mean that e1_e2.txt contains all the top-500 entity pairs mentioned in the entity prediction task?
Can you explain explicitly how e1_e2.txt is generated? And what do you mean by "all top-500 entity pairs"?

Magic Integer `1345` in PCRA.py

From PTransE/PCRA.py:

for line in f:
    seg = line.strip().split()
    relation2id[seg[0]] = int(seg[1])
    id2relation[int(seg[1])]=seg[0]
    id2relation[int(seg[1])+1345]="~"+seg[0]
    relation_num+=1
f.close()

What does the 1345 in line 26 represent? Is it adding a negative label to the end of the list? Is 1345 some default number of ids?

Test_TransE

double tmp = calc_sum(h,l,rel);
The distance value computed on this line is never used afterwards. Does it serve some other purpose?

Could comments be added to the code?

Hello. Since everyone's coding ability differs, could you add comments to the code so that more people can understand and improve the algorithms? Many thanks.

A problem with bfgs() in Train_TransE

Shouldn't the code in bfgs() that saves the entity and relation embeddings be outside the epoch loop?

As it stands, the files are rewritten repeatedly without preserving intermediate results, which only lengthens training time.
