
kb2e's Issues

Derivation of the gradient formula for the Wr vector in TransH

Could you share the gradient derivation for the Wr vector in TransH? I cannot derive the result used in the code:
A_tmp[rel][ii]+=belta*rate*x*tmp1;
A_tmp[rel][ii]-=belta*rate*x*tmp2;
A_tmp[rel][ii]+=belta*rate*sum_x*entity_vec[e1][ii];
A_tmp[rel][ii]-=belta*rate*sum_x*entity_vec[e2][ii];
Thanks!
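For what it's worth, here is a derivation that reproduces those four update lines. It assumes the standard TransH score and my reading of the variable names (tmp1 = w_r·h, tmp2 = w_r·t, x = the per-component derivative of the distance, sum_x accumulating x·w_r), so treat the naming as an assumption:

```latex
d = \bigl(h - (w_r^{\top}h)\,w_r\bigr) + r - \bigl(t - (w_r^{\top}t)\,w_r\bigr)
  = h + r - t - \bigl(w_r^{\top}(h-t)\bigr)\,w_r,
\qquad f = \lVert d \rVert .
\\[4pt]
\text{Let } x_i = \partial f / \partial d_i
\ (\operatorname{sign}(d_i)\text{ for }L_1,\ 2d_i\text{ for }L_2). \text{ Then}
\qquad
\frac{\partial d_i}{\partial w_{r,j}}
 = -(h_j - t_j)\,w_{r,i} \;-\; \delta_{ij}\,w_r^{\top}(h-t),
\\[4pt]
\frac{\partial f}{\partial w_{r,j}}
 = \sum_i x_i \frac{\partial d_i}{\partial w_{r,j}}
 = -(h_j - t_j)\sum_i x_i w_{r,i}
   \;-\; x_j\bigl(w_r^{\top}h - w_r^{\top}t\bigr).
```

With tmp1 = w_r·h, tmp2 = w_r·t and sum_x = Σᵢ xᵢ w_{r,i}, the descent step w_{r,j} -= rate·∂f/∂w_{r,j} (scaled by belta = ±1 for the positive/negative triple) flips the minus signs and gives exactly the four updates: += belta·rate·x·tmp1, -= belta·rate·x·tmp2, += belta·rate·sum_x·h_j, -= belta·rate·sum_x·t_j.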

TransR and TransH cannot work

Hi, TransE works on FB15K, but TransR and TransH do not. The error shows as:
Segmentation fault: 11

Normalize `relation_tmp[rel_neg]` in PTransE/Train_TransE_path.cpp

Is there a reason why the section

    norm(relation_tmp[rel]);
    norm(entity_tmp[e1]);
    norm(entity_tmp[e2]);
    norm(entity_tmp[j]);

skips normalization of rel_neg? This leads to huge scores in train_kb for the corrupt relation triple (comparisons like 0.1 + MARGIN > 250), and may lead to fewer gradient updates than expected for each label pair, possibly causing underfitting.

./TransE: Segmentation fault (core dumped). What causes this?

[New LWP 24671]
Core was generated by `./Train_TransE'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 _IO_vfscanf_internal (s=0x0, format=0x409414 "%s%d", argptr=argptr@entry=0x7fff7663cbc8, errp=errp@entry=0x0) at vfscanf.c:347
347 vfscanf.c: No such file or directory.
(gdb) where
#0 _IO_vfscanf_internal (s=0x0, format=0x409414 "%s%d", argptr=argptr@entry=0x7fff7663cbc8, errp=errp@entry=0x0) at vfscanf.c:347
#1 0x00007fa3fe62c457 in ___vfscanf (s=, format=, argptr=argptr@entry=0x7fff7663cbc8)
at vfscanf.c:3066
#2 0x00007fa3fe6337d7 in __fscanf (stream=, format=) at fscanf.c:31
#3 0x0000000000402249 in prepare() ()
#4 0x00000000004019b2 in main ()
(gdb)

PTransE

Has anyone else hit a segmentation fault (11) when running Test_TransE_path? How did you solve it?

Meaning of the data

In the entity-to-id mapping file, suppose there is a line like /m/027rn 0. What does "/m/027rn" mean?

What does the file "n2n.txt" mean?

As the title asks.
I could not find any documentation for n2n.txt. What is this file, and how is it computed?
Looking forward to your reply, thanks.

Purpose of validate.txt

What is the purpose of the file named in the title?
What happens to the results if validate.txt is removed? Would using its data for training as well give better results?

Data format

I have a question.
The test triples seem to be one-dimensional, but my data's triples are multi-dimensional. Do I need to convert them to one dimension, or take the vector norm?

Question about the training set for the Trans-family models

Hello. I extracted the entity /m/04ztj and all of its relations from FB15K/train.txt, and found several thousand triples with the relation /people/marriage_union_type/unions_of_this_type./people/marriage/spouse (spouse), and several hundred with /people/marriage_union_type/unions_of_this_type./people/marriage/location_of_ceremony. This is clearly impossible; why does the training set contain so many erroneous relations?

Train_TransR.cpp BUG

#236-#237:
norm(entity_tmp[k]);
What is the relationship between the value of k and the number of entities? When train.txt is more than 100 times larger than the number of entities, this causes a segmentation fault.

Questions about Vector Normalization in TransE

Hi,
I'm interested in your research, but I have some questions about vector normalization in Train_TransE.cpp.

Paper "Translating Embeddings for Modeling Multi-relational Data" says:

The optimization is carried out by stochastic gradient descent (in minibatch mode), over the possible h, ℓ and t, with the additional constraints that the L2-norm of the embeddings of the entities is 1 (no regularization or norm constraints are given to the label embeddings ℓ). This constraint is important for our model, as it is for previous embedding-based methods, because it prevents the training process to trivially minimize L by artificially increasing entity embeddings norms.

But I'm confused about two places in Train_TransE.cpp:

1:
Here is the normalization function in Train_TransE.cpp:
double norm(vector<double> &a)
{
    double x = vec_len(a);
    if (x > 1)
        for (int ii = 0; ii < a.size(); ii++)
            a[ii] /= x;
    return 0;
}
It seems that vectors whose length is smaller than 1 are never rescaled, so the entity norms end up satisfying ≤ 1 rather than exactly = 1 as in the paper.

2:
Relation vectors are also normalized, even though the quoted passage says no norm constraints are given to the label embeddings.

Can you give some hints to help me understand your algorithm?
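One possible reading of point 1, offered as a guess rather than the authors' intent: the `if (x>1)` guard implements projection onto the unit ball, a common relaxation of the paper's hard sphere constraint:

```latex
e \;\leftarrow\; \frac{e}{\max\bigl(1,\ \lVert e \rVert_2\bigr)}
\qquad\Longrightarrow\qquad \lVert e \rVert_2 \le 1 \text{ after every update.}
```

Vectors already inside the ball are left untouched, which still prevents the loss from being trivially minimized by inflating entity norms.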

Datasets

Could you provide the WN11 and FB13 datasets?

TransE parameter settings

Hello. The TransE code does not seem to ship with the optimal parameters preset; running it directly gives results that differ from the TransE paper, and running it with the parameters from the original paper, Translating Embeddings for Modeling Multi-relational Data (2013), gives results that are far worse. How should the parameters be set to reproduce the results of the original paper and of the TransR paper? Below are my results from running the code directly on WN18:
18 40943
left:439.644 0.8028 425.411 0.9308
right:468.461 0.8108 455.947 0.9314
And the results after switching to the original paper's parameters (method=unif, k=20, learning rate=0.01, margin=2), which are clearly wrong:
18 40943
left:1351.67 0.1482 1340.44 0.1648
right:1475.97 0.142 1466.25 0.1548

Questions about the gradients of TransH and TransR and the L1/L2 distances

1. On choosing between L1 and L2
In the Trans-family algorithms, does the choice of distance affect the final results? Looking at the code you shared on GitHub, it seems to rely mainly on the L1 distance.

2. On the partial derivatives in the gradient descent
Referring to your code on GitHub, the TransE gradient is clear to me, but I do not quite understand the gradient computations in TransH and TransR.
I have emailed you the details.
Many thanks!

e1_e2.txt

How is the file e1_e2.txt generated, and what is it for?

MeanRank does not reach the 200s

Hello. I ran your TransE implementation several times and tried many parameters, but the experiments never produce a MeanRank in the 200s; the best was only around 400. We used the WordNet18 data with 1000 training epochs, learning rates of [0.01, 0.001], vector dimensions of [20, 100], and margins of [1, 2]. Is there anything in the parameters or the program that still needs changing?
Thanks and best wishes.

entity2id & relation2id mappings

I couldn't find a description of the FB15k dataset so I'm asking this question here.

In the dataset, I saw triples such as:
"/m/08mbj32 /m/0d193h /common/annotation_category/annotations./common/webpage/topic"
(FB15k valid, line 830)

The relation seems to be two relations (/common/annotation_category/annotations and /common/webpage/topic) merged into one. Is this how 1-N-1 relations are handled? I found both parts of this specific relation in the relation2id description, but there are some other "concatenated" relations for which I cannot find the sub-parts.

If you could clear this up, I'd be really grateful.

Best,
Martin

Segmentation fault error

Thanks for releasing the project code and data.
When running the code I got a "Segmentation fault". Why is that? Any ideas?

On updating the normal vector in TransH

Dear Mr. Lin,

I have two questions about updating the normal vector:
1) Why is the normal vector Wr in Train_TransH updated as follows (lines 292-296)?

for (int ii=0; ii<n; ii++)
{
    A_tmp[rel][ii]+=belta*rate*sum_x*entity_vec[e1][ii];
    A_tmp[rel][ii]-=belta*rate*sum_x*entity_vec[e2][ii];
}
2) How does your code guarantee that Wr and r are orthogonal?

Thanks!

Question about Test_TransE

How should the final output lines
cout<<"left:"<<lsum/fb_l.size()<<'\t'<<lp_n/fb_l.size()<<"\t"<<lsum_filter/fb_l.size()<<'\t'<<lp_n_filter/fb_l.size()<<endl;
cout<<"right:"<<rsum/fb_r.size()<<'\t'<<rp_n/fb_r.size()<<'\t'<<rsum_filter/fb_r.size()<<'\t'<<rp_n_filter/fb_r.size()<<endl;
be understood?
And how can I obtain the results reported in the papers for link prediction, relation classification, triple classification, and finally text relation extraction?
Please give detailed steps; many thanks!

Request for e1_e2.txt

Hi, I am interested in your PTransE model. In PTransE, PCRA.py reads e1_e2.txt, but it is not in the data files. What should I do to get e1_e2.txt? Extract e1 and e2 from train.txt? Could you release confidence.txt directly to help me understand your algorithm? I would really appreciate your reply.

How to generate e1_e2.txt

Do you mean that e1_e2.txt contains all the top-500 entity pairs mentioned in the entity prediction task?
Can you explain explicitly how e1_e2.txt is generated? And what do you mean by "all top-500 entity pairs"?

Magic Integer `1345` in PCRA.py

From PTransE/PCRA.py:

for line in f:
    seg = line.strip().split()
    relation2id[seg[0]] = int(seg[1])
    id2relation[int(seg[1])]=seg[0]
    id2relation[int(seg[1])+1345]="~"+seg[0]
    relation_num+=1
f.close()

What does the 1345 in line 26 represent? Is it adding a negative label to the end of the list? Is 1345 some default number of ids?

Test_TransE

double tmp = calc_sum(h,l,rel);
The distance value computed on this line is never used afterwards. Does it serve some other purpose?

Could comments be added to the code?

Hello. Since everyone's coding ability differs, could you add comments to the code so that more people can understand and improve the algorithms? Many thanks.

A problem with bfgs() in Train_TransE

Shouldn't the code in bfgs() that saves the entity and relation embeddings be outside the epoch loop?

As it stands, the files are rewritten repeatedly without preserving intermediate results, which only lengthens training time.
