
knowledge_graph_based_intent_network's Introduction

Learning Intents behind Interactions with Knowledge Graph for Recommendation

This is our PyTorch implementation for the paper:

Xiang Wang, Tinglin Huang, Dingxian Wang, Yancheng Yuan, Zhenguang Liu, Xiangnan He and Tat-Seng Chua (2021). Learning Intents behind Interactions with Knowledge Graph for Recommendation. Paper in arXiv. In WWW'2021, Ljubljana, Slovenia, April 19-23, 2021.

Introduction

Knowledge Graph-based Intent Network (KGIN) is a recommendation framework that consists of three components: (1) user intent modeling, (2) relational path-aware aggregation, and (3) independence modeling.

Citation

If you use our code and datasets in your research, please cite:

@inproceedings{KGIN2020,
  author    = {Xiang Wang and
              Tinglin Huang and 
              Dingxian Wang and
              Yancheng Yuan and
              Zhenguang Liu and
              Xiangnan He and
              Tat{-}Seng Chua},
  title     = {Learning Intents behind Interactions with Knowledge Graph for Recommendation},
  booktitle = {{WWW}},
  pages     = {878-887},
  year      = {2021}
}

Environment Requirement

The code has been tested running under Python 3.6.5. The required packages are as follows:

  • pytorch == 1.5.0
  • numpy == 1.15.4
  • scipy == 1.1.0
  • sklearn == 0.20.0
  • torch_scatter == 2.0.5
  • networkx == 2.5

Reproducibility & Example to Run the Codes

To demonstrate the reproducibility of the best performance reported in our paper and to help researchers verify that their model status is consistent with ours, we provide the best parameter settings (which might differ for customized datasets) in the scripts, along with the logs of our training runs.

The available command-line arguments are documented in the code (see the parser function in utils/parser.py).

  • Last-fm dataset
python main.py --dataset last-fm --dim 64 --lr 0.0001 --sim_regularity 0.0001 --batch_size 1024 --node_dropout True --node_dropout_rate 0.5 --mess_dropout True --mess_dropout_rate 0.1 --gpu_id 0 --context_hops 3
  • Amazon-book dataset
python main.py --dataset amazon-book --dim 64 --lr 0.0001 --sim_regularity 0.00001 --batch_size 1024 --node_dropout True --node_dropout_rate 0.5 --mess_dropout True --mess_dropout_rate 0.1 --gpu_id 0 --context_hops 3
  • Alibaba-iFashion dataset
python main.py --dataset alibaba-fashion --dim 64 --lr 0.0001 --sim_regularity 0.0001 --batch_size 1024 --node_dropout True --node_dropout_rate 0.5 --mess_dropout True --mess_dropout_rate 0.1 --gpu_id 0 --context_hops 3

Important argument:

  • sim_regularity
    • The weight of the independence loss, i.e., the strength of the correlation penalty.
    • Default: 1e-4 (0.0001). A sketch of how this weight enters the objective follows below.
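The following is a minimal illustration of how such a weight plausibly enters the training objective; it is not the repository's code, and bpr_loss and cor_loss are stand-in names:

    import torch

    # Stand-in loss terms; in practice these come from the model's forward pass.
    bpr_loss = torch.tensor(0.5)   # ranking loss on observed interactions
    cor_loss = torch.tensor(2.0)   # correlation (independence) penalty

    sim_regularity = 1e-4          # value of --sim_regularity
    total_loss = bpr_loss + sim_regularity * cor_loss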

Dataset

We provide three processed datasets: Amazon-book, Last-FM, and Alibaba-iFashion.

  • You can find the full version of recommendation datasets via Amazon-book, Last-FM, and Alibaba-iFashion.
  • We follow KB4Rec to preprocess Amazon-book and Last-FM datasets, mapping items into Freebase entities via title matching if there is a mapping available.
                              Amazon-book   Last-FM     Alibaba-iFashion
User-Item     #Users          70,679        23,566      114,737
Interaction   #Items          24,915        48,123      30,040
              #Interactions   847,733       3,034,796   1,781,093
Knowledge     #Entities       88,572        58,266      59,156
Graph         #Relations      39            9           51
              #Triplets       2,557,746     464,567     279,155
  • train.txt
    • Train file.
    • Each line is a user with his/her positive interactions with items: a userID followed by a list of itemIDs (see the parsing sketch after this list).
  • test.txt
    • Test file (positive instances).
    • Each line is a user with his/her positive interactions with items: a userID followed by a list of itemIDs.
    • Note that we treat all unobserved interactions as negative instances when reporting performance.
  • user_list.txt
    • User file.
    • Each line is a pair (org_id, remap_id) for one user, where org_id and remap_id represent the ID of such user in the original and our datasets, respectively.
  • item_list.txt
    • Item file.
    • Each line is a triplet (org_id, remap_id, freebase_id) for one item, where org_id, remap_id, and freebase_id represent the ID of such item in the original dataset, in our datasets, and in Freebase, respectively.
  • entity_list.txt
    • Entity file.
    • Each line is a pair (freebase_id, remap_id) for one entity in the knowledge graph, where freebase_id and remap_id represent the ID of such entity in Freebase and in our datasets, respectively.
  • relation_list.txt
    • Relation file.
    • Each line is a pair (freebase_id, remap_id) for one relation in the knowledge graph, where freebase_id and remap_id represent the ID of such relation in Freebase and in our datasets, respectively.
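To make the interaction-file format concrete, here is a minimal parsing sketch; parse_interactions is a name introduced here for illustration, and the format is the space-separated userID-plus-itemIDs layout described above:

    from collections import defaultdict

    def parse_interactions(path):
        """Parse a file where each line is: userID itemID itemID ..."""
        user_items = defaultdict(list)
        with open(path) as f:
            for line in f:
                ids = [int(tok) for tok in line.split()]
                if ids:
                    user_items[ids[0]].extend(ids[1:])
        return user_items

    train_user_set = parse_interactions('data/last-fm/train.txt')  # path is illustrative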

Acknowledgement

Any scientific publications that use our datasets should cite the paper above (see Citation) as the reference.


Nobody guarantees the correctness of the data, its suitability for any particular purpose, or the validity of results based on the use of the data set. The data set may be used for any research purposes under the following conditions:

  • The user must acknowledge the use of the data set in publications resulting from the use of the data set.
  • The user may not redistribute the data without separate permission.
  • The user may not try to deanonymise the data.
  • The user may not use this information for any commercial or revenue-bearing purposes without first obtaining permission from us.

knowledge_graph_based_intent_network's People

Contributors

huangtinglin


knowledge_graph_based_intent_network's Issues

A small question about the comment on line 69 of utils/data_loader.py

Regarding utils/data_loader.py: should the comment on line 69, # including items + users, be changed to # including items? The variable n_entities should not include users; it covers only items and the other entities.

    n_entities = max(max(triplets[:, 0]), max(triplets[:, 2])) + 1  # including items + users
    n_nodes = n_entities + n_users

->

    n_entities = max(max(triplets[:, 0]), max(triplets[:, 2])) + 1  # including items
    n_nodes = n_entities + n_users

How to restore saved model parameters

Hello! Thank you very much for sharing the code. I noticed it contains save-related logic:

    """save weight"""
    if ret['recall'][0] == cur_best_pre_0 and args.save:
        torch.save(model.state_dict(), args.out_dir + 'model_' + args.dataset + '.ckpt')

However, the next training run does not load the saved parameters. Does the code implement restoring the model?
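For reference, restoring such a checkpoint in PyTorch needs only load_state_dict. A minimal sketch, assuming model and args are constructed exactly as during training (both names come from the snippet above):

    import torch

    # Rebuild the model with the same architecture/arguments as training,
    # then load the saved parameters before evaluating or resuming.
    ckpt_path = args.out_dir + 'model_' + args.dataset + '.ckpt'
    model.load_state_dict(torch.load(ckpt_path, map_location='cpu'))
    model.eval()  # or model.train() to resume training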

Two questions about sparse relational graph construction and IG aggregation

1. I noticed that in the code

    adj = sp.coo_matrix((vals, (np_mat[:, 0], np_mat[:, 1])), shape=(n_nodes, n_nodes))

shouldn't np_mat[:, 0] and np_mat[:, 1] also be offset by n_users, as is done for the CF graph? Users are not remapped into the entity ID space, so the first n_users rows and columns of the sparse matrix should be reserved for users. Compare with the data preprocessing in KGAT:

    K, K_inv = _np_mat2sp_adj(np.array(self.relation_dict[r_id]), row_pre=self.n_users, col_pre=self.n_users)

Please correct me if I have misunderstood.

2. I noticed that in the code

    user_agg = user_agg * (disen_weight * score).sum(dim=1) + user_agg  # [n_users, channel]

the computation of the user embedding seems to deviate from Eq. (7) in the paper. [figure: Eq. (7) from the paper] The paper describes a weighted sum over the user's items and intents, whereas the code appears to sum first and then weight. Also, what is the purpose of the final + user_agg? I thought the aggregation was already implemented in

    entity_res_emb = torch.add(entity_res_emb, entity_emb)
    user_res_emb = torch.add(user_res_emb, user_emb)

Looking forward to your answer and correction. Thanks!
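For concreteness, a small self-contained sketch of the offset this issue proposes (the toy sizes and triples are illustrative, not from the repository):

    import numpy as np
    import scipy.sparse as sp

    n_users, n_entities = 3, 5
    n_nodes = n_users + n_entities
    np_mat = np.array([[0, 2], [1, 4]])   # (head entity, tail entity) pairs
    vals = np.ones(len(np_mat))

    # Shift entity IDs by n_users so the first n_users rows/columns of the
    # joint user-entity adjacency stay reserved for users.
    adj = sp.coo_matrix((vals, (np_mat[:, 0] + n_users, np_mat[:, 1] + n_users)),
                        shape=(n_nodes, n_nodes))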

About reproducibility

Thanks again for the open-source code!
Recently, when running experiments with your code, I found that two runs with completely identical settings do not produce exactly the same evaluation results. Is this normal? I noticed the code already fixes the random seeds and includes other reproducibility settings (all of which I left unchanged).

Baseline results (e.g., KGAT) differ from those in the original KGAT paper

Thanks for your open-source work; I have learned a lot from it. I have one question I hope you can answer.
In my comparison experiments, KGIN and the KGAT paper use the same Amazon and Last-FM datasets and the same evaluation metrics, so why are the recall values in the two papers similar while the ndcg values differ considerably?

How to draw Figure 4?

May I ask how you drew Figure 4? How can I create a similar plotting style?
Thanks a lot in advance if you could provide the script!
Best

Negative cor value

I have run your code several times and always get a negative cor value:

start training ... using time 496.1616, training loss at epoch 0: 298.6853, cor: -103918.234375

or

start training ...
using time 636.8174, training loss at epoch 0: 299.3976, cor: -103918.351562
+-------+-------------------+--------------------+--------------------+----------------------------------------------------------+----------------------------------------------------------+----------------------------------------------------------+----------------------------------------------------------+
| Epoch | training time | tesing time | Loss | recall | ndcg | precision | hit_ratio |
+-------+-------------------+--------------------+--------------------+----------------------------------------------------------+----------------------------------------------------------+----------------------------------------------------------+----------------------------------------------------------+
| 1 | 636.8162143230438 | 137.00747871398926 | 194.09571838378906 | [0.07513288 0.09760893 0.11306548 0.12481185 0.13517161] | [0.06848064 0.07576022 0.08097082 0.08491058 0.08831252] | [0.02926674 0.02149495 0.01776641 0.01546985 0.01389799] | [0.26474582 0.34498854 0.39756429 0.43486379 0.46601035] |
+-------+-------------------+--------------------+--------------------+----------------------------------------------------------+----------------------------------------------------------+----------------------------------------------------------+----------------------------------------------------------+

using time 636.9025, training loss at epoch 2: 170.9751, cor: -113220.000000
using time 636.6866, training loss at epoch 3: 157.5762, cor: -113220.000000
using time 637.4699, training loss at epoch 4: 149.2337, cor: -113220.000000
Does this indicate an error somewhere?

A question about Equation (7)

Hello: I would like to ask where the 1/|N_u| factor in Eq. (7) shows up in the code

    user_agg = user_agg * (disen_weight * score).sum(dim=1) + user_agg  # [n_users, channel]

Am I looking in the wrong place? Looking forward to your reply.

Question about latent_emb

Hello, and thank you to your team for providing the code. I have a question: the Recommender defines latent_emb, but according to Eqs. (1) and (2) in the paper, shouldn't latent_emb be determined by the relation vectors? Looking forward to your reply.

Implementation of Equation (7)

Hello, and thanks to your team for the KGIN code! I have a question about the implementation of Eq. (7):

    user_agg = torch.sparse.mm(interact_mat, entity_agg)
    user_agg = user_agg * (disen_weight * score).sum(dim=1) + user_agg

I understand the first line to compute the e_i part of Eq. (7). Does the + user_agg in the second line correspond to adding e_i at the end of the formula? I find it hard to understand why this is done. Someone mentioned it in a previous issue, but I still do not understand what you meant by the self-connection.
Looking forward to your answer and correction, thank you!
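One way to read the + user_agg term this issue asks about: multiplying by a weight and adding the input back is algebraically the same as scaling by one plus the weight, i.e., a residual/self-connection. A toy check (shapes are illustrative):

    import torch

    user_agg = torch.randn(4, 8)   # stand-in for the aggregated user messages
    w = torch.rand(4, 1)           # stand-in for (disen_weight * score).sum(dim=1)

    out_a = user_agg * w + user_agg   # the form used in the snippet above
    out_b = user_agg * (1.0 + w)      # identical: an explicit self-connection
    assert torch.allclose(out_a, out_b)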

Questions about the intent embedding and the parameters of the independence term

Hello! Thanks to your team for providing the code for everyone to learn from. I have two questions:
1. In the paper, e_p, the intent embedding, should be computed from the relation embeddings, but the code directly defines and uses latent_emb, which seems inconsistent with the paper (or is the intent vector not latent_emb?).
2. When modeling intent independence, the paper computes it from the learned e_p, but the code uses the parameter disen_weight_att, which I understand to be w_{rp}, the trainable weight from relation r to intent p; this also differs from the paper.

Code question

In the _bi_norm_lap method of the data loader, shouldn't the Laplacian be computed as I - D^{-1/2} A D^{-1/2}?
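For reference, a minimal sketch of the normalization in question; sym_norm_adj is a name introduced here, and whether _bi_norm_lap should return D^{-1/2} A D^{-1/2} itself or I minus it is exactly what the issue asks:

    import numpy as np
    import scipy.sparse as sp

    def sym_norm_adj(adj):
        """Return D^{-1/2} A D^{-1/2} for a sparse adjacency matrix A."""
        deg = np.asarray(adj.sum(axis=1)).flatten()
        with np.errstate(divide='ignore'):
            d_inv_sqrt = np.power(deg, -0.5)
        d_inv_sqrt[np.isinf(d_inv_sqrt)] = 0.0   # isolated nodes get weight 0
        d_mat = sp.diags(d_inv_sqrt)
        return d_mat @ adj @ d_mat

    adj = sp.coo_matrix(np.array([[0., 1.], [1., 0.]]))
    laplacian = sp.eye(2) - sym_norm_adj(adj)    # L = I - D^{-1/2} A D^{-1/2}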

About the regularization term

First of all, many thanks for your work and the open-source code; I have learned a lot from it.
I am a bit confused about the regularization term in the code, because it differs from the one in the paper: it only covers the user and item embeddings and seems to omit the other model parameters. What is your view on this?
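As background on the pattern this issue describes, BPR-style trainers often regularize only the embeddings appearing in the current batch, leaving other parameters to optimizer weight decay. A minimal sketch with illustrative names:

    import torch

    # Stand-in user/positive-item/negative-item embeddings for one batch.
    u_e, pos_e, neg_e = (torch.randn(1024, 64) for _ in range(3))

    l2_weight = 1e-5
    reg_loss = (u_e.norm(2).pow(2) + pos_e.norm(2).pow(2)
                + neg_e.norm(2).pow(2)) / u_e.shape[0]
    reg_term = l2_weight * reg_loss   # added to the ranking loss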

How is Equation (12) in the paper realized?

Hi, Huang.
The relational path-aware aggregation is implemented on lines 25-50 of KGIN.py, and the user and item representations are then summed into the final representations (Equation (13) in the paper) on lines 202-203 of KGIN.py. But I cannot find the part that captures relational paths (Equation (12) in the paper). Where is this process implemented in the code?

Thank you very much for reading my questions. Your paper is very helpful to me. I look forward to your reply.

@huangtinglin

Experimental results

I would like to ask the authors: the datasets are the same, so why are the results so different? Do you know the reason? "Knowledge graph convolutional networks for recommender systems" (a 2019 paper) typically ranks over all items excluding the training set, yet the reported results are not even on the same scale.

A question about KGIN and recommendation

Hello: Having learned about KGIN's explainability, I would like to know what KGIN's practical use is. Can it recommend a new item to a user, and how? Where is this reflected in the code or the paper?
Thank you very much for your answer.

Can you tell me about kg_final.txt?

In kg_final.txt, are the head-entity and tail-entity IDs independent of each other? My understanding is that the userIDs and itemIDs in train.txt are independent, the itemIDs there are the same as those in kg_final.txt, and the two entity columns of the knowledge graph are not independent.

Training error: "name 'test_user_set' is not defined"

Hello, I have recently been studying the code for this work. Running it with the provided command-line arguments, the test_one_user(x) function in evaluate.py raises "name 'test_user_set' is not defined" at "user_pos_test = test_user_set[u]" during training. I have debugged for a while without finding the cause; do you happen to know why?

Last-FM dataset

Hi, I would like to ask: are the items ultimately used for recommendation in the Last-FM dataset (the contents of item_list.txt) the tracks from the original data?

Thanks!

Training speed

Hello, may I ask what hardware configuration you used for your experiments? Running the code directly, my training is more than 3x slower than your released logs, and I am not sure where the problem lies. Thank you!

About user representation aggregation

In the paper, the user representation is obtained from the Intent Graph (IG). Is the IG composed only of users, intents, and items (without the entities attached to items)?

And how is a high-order user representation such as e_u^(3) calculated?
Is the high-order item representation e_i^(2) computed first, and e_u^(3) then obtained via Equations (6)-(8) in the paper?

Thanks for your reply.
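Under this issue's reading of Eqs. (6)-(8), a hedged sketch of multi-hop propagation with a layer-sum readout; kg_agg and ui_mat are hypothetical stand-ins for the KG aggregation step and the sparse user-item matrix:

    import torch

    def propagate(entity_emb, user_emb, kg_agg, ui_mat, n_hops=3):
        # Each hop: entities aggregate over the KG, then users aggregate the
        # current-hop embeddings of their interacted items; the final
        # representation sums all hops (layer-sum readout).
        entity_res, user_res = entity_emb, user_emb
        for _ in range(n_hops):
            entity_emb = kg_agg(entity_emb)                  # e_i^(l) -> e_i^(l+1)
            user_emb = torch.sparse.mm(ui_mat, entity_emb)   # e_u^(l+1) from items
            entity_res = entity_res + entity_emb
            user_res = user_res + user_emb
        return entity_res, user_res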

Questions about processing the raw data

Thank you for adding the knowledge graph to the original data, but I wonder whether all the sequences of the test set and training set are sorted by timestamp?
Can you provide the code for processing the raw data, or a timestamp for each interaction?

Questions about latent_emb

What is latent_emb used for? Is it the user intent embedding? But shouldn't the user intent embedding be disen_weight, according to the definition in the paper? Looking forward to your answer. Thank you!

About the Alibaba-iFashion dataset

I wonder how you extracted the entities of the knowledge graph. Also, I cannot find the org_id values of item_list.txt in the original dataset.
Can you explain how this dataset was processed, or share the code?
