hwwang55 / mkr


A TensorFlow implementation of MKR (Multi-task Learning for Knowledge Graph Enhanced Recommendation)

License: MIT License

Python 100.00%

recommender-systems knowledge-graph multi-task-learning

mkr's Introduction

MKR

This repository is the implementation of MKR (arXiv):

Multi-Task Feature Learning for Knowledge Graph Enhanced Recommendation
Hongwei Wang, Fuzheng Zhang, Miao Zhao, Wenjie Li, Xing Xie, and Minyi Guo.
In Proceedings of The 2019 Web Conference (WWW 2019)

MKR is a Multi-task learning approach for Knowledge graph enhanced Recommendation. MKR consists of two parts: the recommender system (RS) module and the knowledge graph embedding (KGE) module. The two modules are bridged by cross&compress units, which can automatically learn high-order interactions of item and entity features and transfer knowledge between the two tasks.
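As a rough illustration of what a cross&compress unit computes, here is a minimal sketch in plain Python. It follows my reading of the paper (an outer product of the item and entity vectors, projected back to the original dimension by four weight vectors); the function and argument names are assumptions made for this example, and the TensorFlow code in src/ may be organized differently.

```python
def cross_compress(v, e, w_vv, w_ev, w_ve, w_ee, b_v, b_e):
    """One cross&compress step for an item vector v and an entity vector e.

    All arguments are lists of floats of the same length d.
    Names are illustrative only, not taken from the repository.
    """
    d = len(v)

    # Cross: C[i][j] = v[i] * e[j] holds every pairwise feature interaction.
    c = [[v[i] * e[j] for j in range(d)] for i in range(d)]

    def matvec(m, w):
        # (M w)[i] = sum_j M[i][j] * w[j]
        return [sum(m[i][j] * w[j] for j in range(d)) for i in range(d)]

    def matvec_t(m, w):
        # (M^T w)[i] = sum_j M[j][i] * w[j]
        return [sum(m[j][i] * w[j] for j in range(d)) for i in range(d)]

    # Compress: project C and C^T back down to d dimensions for each task.
    v_next = [a + b + bias
              for a, b, bias in zip(matvec(c, w_vv), matvec_t(c, w_ev), b_v)]
    e_next = [a + b + bias
              for a, b, bias in zip(matvec(c, w_ve), matvec_t(c, w_ee), b_e)]
    return v_next, e_next
```

Stacking several such steps is what lets the two tasks exchange increasingly high-order feature interactions while each layer keeps the output at dimension d.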

Files in the folder

data/
- book/
  - BX-Book-Ratings.csv: raw rating file of Book-Crossing dataset;
  - item_index2entity_id.txt: the mapping from item indices in the raw rating file to entity IDs in the KG;
  - kg.txt: knowledge graph file;
- movie/
  - item_index2entity_id.txt: the mapping from item indices in the raw rating file to entity IDs in the KG;
  - kg.txt: knowledge graph file;
  - ratings.dat: raw rating file of MovieLens-1M;
- music/
  - item_index2entity_id.txt: the mapping from item indices in the raw rating file to entity IDs in the KG;
  - kg.txt: knowledge graph file;
  - user_artists.dat: raw rating file of Last.FM;
src/: implementations of MKR.

Running the code

Movie

$ cd src
$ python preprocess.py --dataset movie
$ python main.py

Book
- ```
$ cd src
$ python preprocess.py --dataset book
```
- open main.py;
- comment out the parameter-settings code block for MovieLens-1M;
- uncomment the parameter-settings code block for Book-Crossing;
- ```
$ python main.py
```
Music
- ```
$ cd src
$ python preprocess.py --dataset music
```
- open main.py;
- comment out the parameter-settings code block for MovieLens-1M;
- uncomment the parameter-settings code block for Last.FM;
- ```
$ python main.py
```
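If editing main.py by hand for each dataset becomes tedious, one option is to pick the settings by name. The sketch below is hypothetical and not part of the repository: `PRESETS`, `build_parser`, and `resolve_settings` are names invented here, and the preset values are deliberately left as `None` so that the real numbers are copied from the commented blocks in main.py rather than guessed.

```python
import argparse

DATASETS = ("movie", "book", "music")

# Hypothetical preset table. Values are placeholders (None) on purpose:
# fill them in from the matching parameter-settings block in main.py.
PRESETS = {name: {"n_epochs": None, "batch_size": None} for name in DATASETS}


def build_parser():
    parser = argparse.ArgumentParser(description="Select per-dataset settings")
    parser.add_argument("--dataset", choices=DATASETS, default="movie")
    # Explicit flags override the preset when they are given.
    parser.add_argument("--n_epochs", type=int, default=None)
    parser.add_argument("--batch_size", type=int, default=None)
    return parser


def resolve_settings(argv=None):
    """Merge the preset for the chosen dataset with any explicit overrides."""
    args = build_parser().parse_args(argv)
    settings = dict(PRESETS[args.dataset])
    for key in ("n_epochs", "batch_size"):
        value = getattr(args, key)
        if value is not None:
            settings[key] = value
    settings["dataset"] = args.dataset
    return settings
```

For example, `resolve_settings(["--dataset", "book", "--n_epochs", "10"])` returns the book preset with `n_epochs` overridden to 10.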

mkr's Issues

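Skipping KGE training seems to give a better AUC

Hello. On the ml1m dataset, running this code as-is reaches a test AUC of 0.9155, but when I comment out the KGE training part and run it again, the test AUC reaches 0.9169. I am very puzzled by this. Does KGE training actually hurt recommendation quality? How can this be explained?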

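Why does the description of the PER baseline say MovieLens-20M?

Why does the description of the PER baseline mention MovieLens-20M rather than MovieLens-1M? (The experiments in the paper are run on ML-1M.)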

PER [39] treats the KG as heterogeneous information networks and extracts meta-path based features to represent the connectivity between users and items. In this paper, we use manually designed user-item-attribute-item paths as features, i.e., "user-movie-director-movie", "user-movie-genre-movie", and "user-movie-star-movie" for MovieLens-20M; "user-book-author-book" and "user-book-genre-book" for Book-Crossing; "user-musician-genre-musician", "user-musician-country-musician", and "user-musician-age-musician" (age is discretized) for Last.FM. Note that PER cannot be applied to news recommendation because it's hard to pre-define meta-paths for entities in news.

the representation of relation and tail

Hi Hongwei, I am reading your code, which is very inspiring, but I am a little confused about the representation of relation and tail. In line 56, within the _build_low_layers function of the file model.py, the embedding of the tail is passed to an MLP structure, while the embedding of relation remains unchanged until concatenated with head. I believe it is contrary to the subsection 2.5 Knowledge Graph Embedding Module, where the embedding of relation should be processed with an MLP structure, and the tail should be the real feature vector. Should the embedding of the tail on line 56 be changed to the embedding of the relation?
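The two wirings being compared can be put side by side in a few lines. This only illustrates the question as stated in this issue and is not the repository's model.py: `low_layers`, `variant`, and the stand-in `mlp` callable are hypothetical names, and the cross&compress processing of the head is left out.

```python
def low_layers(head, relation, tail, mlp, variant):
    """Return (concatenated head+relation input, tail target) for one triple.

    variant="paper": the relation goes through the MLP and the tail is left
                     as the raw target vector (as the issue reads Section 2.5).
    variant="code":  the tail goes through the MLP and the relation is left
                     unchanged (as the issue reads model.py).
    """
    if variant == "paper":
        relation = mlp(relation)
    elif variant == "code":
        tail = mlp(tail)
    else:
        raise ValueError(f"unknown variant: {variant!r}")
    # List concatenation stands in for concatenating the two embeddings.
    return head + relation, tail


def double(x):
    """Toy stand-in for an MLP so the difference is easy to see."""
    return [2.0 * a for a in x]
```

With `double` as the MLP, the "paper" variant changes only the relation part of the concatenated input, while the "code" variant changes only the target that the prediction is compared against.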

When L > 1, AUC always equals 0.500

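Hello, does the KGE implementation in the code differ from the paper?

The differences I found are as follows:
1. According to Section 2.5 of the paper, the relation vector in the KGE part is updated through an L-layer perceptron, while the tail node is left unchanged (it is used to compute the loss). The code does the exact opposite: the tail node goes through the L-layer perceptron and the relation vector is left unchanged.
2. According to Equation 8 in the paper, the KGE loss is computed on the tail node, i.e. between the true value t and the predicted value t^ (dot product, then regression). In the code, however, the true value has itself been passed through the L-layer perceptron.

I do not understand why there are two such large differences. I am also somewhat confused about how the KGE part is trained:
The code trains KGE with a dedicated tensor, rmse, which appears to take the difference between the true and predicted tail nodes, average the sum of squares, and then take the square root. Why is it done this way? It seems to correspond to Equation 9 in the paper, and yet not quite (the code does not appear to have a negative sampling part).

I hope the author can find time to clear this up. Thank you.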
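To make the two quantities in this issue concrete, here is a small sketch for a single triple. It assumes the RMSE is taken over the embedding dimensions as the issue describes, and that the score on the paper side is a sigmoid of the inner product between the true and predicted tail, which is my reading of the paper rather than something confirmed here. Both function names are made up for this example, and how the repository reduces over a batch is not shown.

```python
import math


def rmse(tail_true, tail_pred):
    """Square root of the mean squared difference between two vectors."""
    squared = [(t - p) ** 2 for t, p in zip(tail_true, tail_pred)]
    return math.sqrt(sum(squared) / len(squared))


def sigmoid_dot_score(tail_true, tail_pred):
    """Sigmoid of the inner product: higher means the triple looks more plausible."""
    dot = sum(t * p for t, p in zip(tail_true, tail_pred))
    return 1.0 / (1.0 + math.exp(-dot))
```

The two are not interchangeable: RMSE is a distance to minimize and needs no negative examples, whereas a score such as `sigmoid_dot_score` is usually maximized on true triples and minimized on sampled false ones, which is where negative sampling would come in.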

Hi, I have some questions about the data.

First, what do the four columns in the rating.dat file mean? Second, the data in item_index2entity_id.txt only ranges from 1 to 3951, but the data in kg.txt goes well beyond this range.
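On the first question, and assuming it refers to the MovieLens-1M file under data/movie: the upstream MovieLens-1M documentation gives the layout of ratings.dat as `UserID::MovieID::Rating::Timestamp`. A minimal parser, with `Rating` and `parse_rating` as names chosen for this example, might look like this:

```python
from typing import NamedTuple


class Rating(NamedTuple):
    user_id: int
    movie_id: int
    rating: int      # 1 to 5 stars
    timestamp: int   # seconds since the Unix epoch


def parse_rating(line: str) -> Rating:
    """Parse one 'UserID::MovieID::Rating::Timestamp' line."""
    user_id, movie_id, rating, timestamp = line.strip().split("::")
    return Rating(int(user_id), int(movie_id), int(rating), int(timestamp))
```

For instance, `parse_rating("1::1193::5::978300760")` yields user 1 rating movie 1193 with 5 stars.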

Similar recommendation results for different users by MKR

Hi, great job! I am using MKR on my own food dataset. The evaluation at the last epoch is as follows: train auc: 0.9210 acc: 0.8376, eval auc: 0.9122 acc: 0.8310, test auc: 0.9118 acc: 0.8308. But the recommendation results for different users are very similar, like this: [1, 2, 4, 6, 7], [1, 4, 6, 7, 11], [1, 4, 6, 7, 14], [1, 4, 6, 16, 18], [6, 7, 11, 12, 14]. The only changes I made to the default settings of your book case are n_epochs=10 and batch_size=256.
What could be the problem? Thanks for your reply.
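Before looking for a cause, it can help to quantify how similar the lists really are. The sketch below is a generic diagnostic and not part of MKR: it computes the Jaccard overlap between top-K lists and counts how often each item is recommended. All names are chosen for this example.

```python
from collections import Counter
from itertools import combinations


def jaccard(a, b):
    """Size of the intersection over size of the union of two item lists."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mean_pairwise_jaccard(lists):
    """Average Jaccard overlap over all pairs of recommendation lists."""
    pairs = list(combinations(lists, 2))
    if not pairs:
        raise ValueError("need at least two lists")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def item_frequency(lists):
    """How many lists each item appears in, most frequent first."""
    return Counter(item for items in lists for item in set(items)).most_common()


# The five lists quoted in the issue above.
top_k = [[1, 2, 4, 6, 7], [1, 4, 6, 7, 11], [1, 4, 6, 7, 14],
         [1, 4, 6, 16, 18], [6, 7, 11, 12, 14]]
```

If `item_frequency(top_k)` is dominated by a handful of items that are also the most frequently rated ones in the training data, the model may be leaning on item popularity more than on user-specific signal.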

item_index2entity_id.txt clarification

Hello Mr. Wang, I'm trying to reproduce your model on a new dataset of jurisprudence documents. My only question is how the item_index2entity_id.txt file was generated. Are the item IDs those of items that appear in the knowledge graph only as tail entities?

Or are they items that appear in the knowledge graph as both head and tail entities? For example, if a movie item is a head entity in triple A and a tail entity in another triple, say triple B, is that the criterion for including it in item_index2entity_id.txt, or is it enough that the item appears as a tail entity, as you detail in your MKR paper?
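One way to check this empirically on the shipped data is to count in which positions the mapped entities occur. The sketch below assumes that item_index2entity_id.txt holds two whitespace-separated columns (item index, entity ID) and that each line of kg.txt is a `head relation tail` triple; both layouts are assumptions to verify against the actual files, and the function names are invented for this example.

```python
def load_mapping(lines):
    """Parse 'item_index entity_id' lines into a dict of strings."""
    mapping = {}
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:
            mapping[parts[0]] = parts[1]
    return mapping


def entity_roles(kg_lines, entity_ids):
    """For each given entity ID, collect the roles ('head', 'tail') it plays."""
    roles = {eid: set() for eid in entity_ids}
    for line in kg_lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        head, _relation, tail = parts
        if head in roles:
            roles[head].add("head")
        if tail in roles:
            roles[tail].add("tail")
    return roles
```

Both functions accept any iterable of lines, so they can be fed open file handles for item_index2entity_id.txt and kg.txt directly. Tallying the resulting role sets shows whether mapped items occur only as heads, only as tails, or as both.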

Dataset issues

Hi,
Thanks for your wonderful and outstanding paper. Is the dataset included with this code complete? If not, could you provide a complete one? Thanks again.

Code inconsistent with the model in the paper

Hi,
I notice that in the paper, you use the embeddings of the relation and the head in the low layers. According to equation (7) in the paper, the relation embedding should pass through an MLP. However, in the code, this part is missing, and instead you pass the tail embedding to an MLP in the low layers (model.py line 56). This seems to be inconsistent with the model in the paper. Could you please explain the reason? Thanks :)