
metahin's Introduction

MetaHIN

Source code for KDD 2020 paper "Meta-learning on Heterogeneous Information Networks for Cold-start Recommendation"

Requirements

  • Python 3.6.9
  • PyTorch 1.4.0
  • Tested on Ubuntu 16.04.1 with one GPU (GeForce RTX) and one CPU (Intel Xeon W-2133)
  • Detailed requirements

Datasets

We have uploaded the original data of DBook, Movielens and Yelp in the data/ folder.

The processed data of DBook and Movielens can be downloaded from Google Drive and BaiduYun (Extraction code: ened).

The processed data of Yelp can be generated by running data/yelp/YelpProcessor.ipynb.

Description

MetaHIN/
├── code
│   ├── main.py: the main function of the model
│   ├── Config.py: configurations for the model
│   ├── Evaluation.py: evaluates the learned embeddings w.r.t. clustering and classification
│   ├── DataHelper.py: loads data
│   ├── EmbeddingInitializer.py: maps features and initializes embedding tables
│   ├── HeteML_new.py: updates parameters in the meta-learning paradigm
│   └── MetaLeaner_new.py: the base model
├── data
│   ├── dbook
│   │   ├── original/: the original data without any preprocessing
│   │   └── DBookProcessor.ipynb: preprocesses the data
│   ├── movielens
│   │   ├── original/: the original data without any preprocessing
│   │   └── MovielensProcessor.ipynb: preprocesses the data
│   └── yelp
│       ├── original/: the original data without any preprocessing
│       └── YelpProcessor.ipynb: preprocesses the data
└── README.md

Reference

@inproceedings{lu2020meta,
  title={Meta-learning on Heterogeneous Information Networks for Cold-start Recommendation},
  author={Lu, Yuanfu and Fang, Yuan and Shi, Chuan},
  booktitle={Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery \& Data Mining},
  pages={1563--1573},
  year={2020}
}

metahin's People

Contributors

rootlu


metahin's Issues

Processed files are too large

Hello,

I understand that the data in the Google Drive link should be downloaded as the model input, but after downloading and extracting movielens I found the files take up 104 GB. Is this 104 GB of files really the model input? Isn't that too large?

Thanks for sharing.

About running the code

Hello, your work is excellent. I have been trying to reproduce your code recently and would like to ask: in what order should the code be executed?

About updating the DBookProcessor code

I'd like to ask about the processing code for the support-set and query-set JSON files, since I'm not quite sure about their format or how they are constructed. Will you release it, or could you adapt the Yelp processing code for DBook?

'DataProcessor' not found: No module named 'DataProcessor'

Hello, thanks for sharing the code. Running main.py produces the following:

Traceback (most recent call last):
  File "main.py", line 11, in <module>
    from DataHelper import DataHelper
  File "/content/gdrive/My Drive/Colab Notebooks/PaperCode/MetaHIN-master/code/DataHelper.py", line 8, in <module>
    from DataProcessor import Movielens
ModuleNotFoundError: No module named 'DataProcessor'

What causes this? Thanks.

About reproducing the code

Hello, I ran into many problems while reproducing the code.
(screenshot)
FileNotFoundError: [Errno 2] No such file or directory: './meta_training/support_u_movies.json'
Both the movielens and dbook processors report this error. If I create the file myself, I get the following error instead:
ValueError: invalid literal for int() with base 10: 'name'
YelpProcessor reports this error:
NameError: name 'new_user' is not defined
The processing code for all three datasets fails to run, so the source code cannot run either. Looking forward to your reply.

Problem with the MovieLensProcessor script

When running MovieLensProcessor, it reports that the meta_training folder is missing support_u_movies.json, query_u_movies.json, and support_u_movies_y.json. How are these three files constructed?

Running the movielens experiment

Hello! I want to run the movielens experiment. After downloading and extracting the processed dataset, what should I do next? I saw that the yelp experiment requires converting a yelp_academic_dataset.json file, but the download link for that file in your README no longer works, so I switched to the movielens experiment. However, I don't understand what the next step is after downloading the processed data.

gcmc cannot be found

Hi,

I get an error when I try to run "pip3 install -r requirements.txt".
Can you give me some instructions? Thank you very much.

Thanks,
weizhen
(image)

train_data in the various cold-start scenarios

Hello,
I reproduced your MetaHIN model locally, but my results deviate considerably from the paper.
Does train_data load the data from the meta_training folder in every cold-start scenario, or does each cold-start scenario load different data, e.g. the user cold-start scenario loading from the user_cold_testing folder? Looking forward to your reply.

Error in Data Generation

Traceback (most recent call last):
  File "./MetaHIN/code/main.py", line 147, in <module>
    training(hml, model_save=True, model_file=model_filename, device=cuda_or_cpu)
  File "./MetaHIN/code/main.py", line 35, in training
    supp_xs_s, supp_ys_s, supp_mps_s, query_xs_s, query_ys_s, query_mps_s = zip(*train_data)  # supp_um_s: (list, list, ..., 2553)
ValueError: not enough values to unpack (expected 6, got 0)
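For anyone hitting this: the unpacking fails because train_data is empty, i.e. no meta-training task files were loaded at all (see the issues above about missing support/query JSON files). A minimal sketch reproduces the symptom; the six variable names are taken from the traceback, and the empty train_data is the assumed cause:

```python
train_data = []  # what you end up with when no meta-training task files were loaded

# zip(*train_data) over an empty list is an empty iterator, so unpacking it
# into six variables raises exactly the ValueError seen in the traceback.
try:
    supp_xs_s, supp_ys_s, supp_mps_s, query_xs_s, query_ys_s, query_mps_s = zip(*train_data)
except ValueError as err:
    print(err)  # not enough values to unpack (expected 6, got 0)
```

So the fix is not in main.py itself but in making sure the data-processing step actually produced the meta-training files before training.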

Processed data files

Hello, could you share the processed data files? Every time I run the last few lines of YelpProcessor.ipynb on my machine, the computer freezes. Is it because the data files are too large?

Error when generating yelp data in YelpProcessor.ipynb

Hello !

When I try to generate the data for yelp as advised in this issue, I get the following error in YelpProcessor.ipynb:

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-2-071c77df7579> in <module>
      8     for state in states:
      9         os.mkdir("{}/{}/".format(output_dir, state))
---> 10         os.mkdir("{}/{}/".format(melu_output_dir, state))
     11         if not os.path.exists("{}/{}/{}".format(output_dir, "log", state)):
     12             os.mkdir("{}/{}/{}".format(output_dir, "log", state))

FileNotFoundError: [Errno 2] No such file or directory: '../../../MeLU/yelp//warm_up/'

If I understand correctly, I should clone the MeLU git project, but I'm not sure that is what is expected? I don't see any yelp folder in their GitHub project.
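A defensive workaround (not the author's code) is to replace os.mkdir with os.makedirs(..., exist_ok=True), which creates any missing parent directories instead of failing. A minimal sketch under a temporary directory; the MeLU/yelp path mirrors the one in the traceback and is only illustrative:

```python
import os
import tempfile

# os.mkdir raises FileNotFoundError when a parent directory is missing;
# os.makedirs creates the whole chain and exist_ok=True tolerates reruns.
base = tempfile.mkdtemp()
melu_output_dir = os.path.join(base, "MeLU", "yelp")  # parents don't exist yet
for state in ["warm_up"]:
    os.makedirs(os.path.join(melu_output_dir, state), exist_ok=True)
print(os.path.isdir(os.path.join(melu_output_dir, "warm_up")))  # True
```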

Thanks in advance, and thanks for making your code available in any case!

Missing files

The movielens dataset does not contain ./meta_training/support_u_movies.json, ./meta_training/support_u_movies_y.json, or the other two related files.

On generating support_u_movies.json and related files

  1. The datasets are processed by running the corresponding .ipynb files on the original data.
  2. However, MovieLensProcessor.ipynb directly reads files such as support_u_movies.json, which are not part of the original data.
    Is the code that generates the support set and query set missing?

Something Wrong with DataProcessor.py

Hi, when I tried to run DataProcessor.py to generate data, an error occurred.

  File "DataProcessor.py", line 339, in <module>
    ml = Movielens(os.path.join(input_dir, 'movielens'), os.path.join(output_dir, 'movielens'))
  File "DataProcessor.py", line 30, in __init__
    self.metapath_data()
  File "DataProcessor.py", line 188, in metapath_data
    u_m_directors[(user,movie)] += list(set(self.movie_directors[movie]))
KeyError: 1259

I found that it's because the keys of movie_directors are str while movie in this iteration is an int.
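The str/int mismatch described above is typical of dicts round-tripped through JSON, since JSON object keys are always strings. A minimal illustration with a hypothetical fix, not the repo's actual code:

```python
import json

# A dict keyed by movie id comes back with str keys after a JSON round-trip.
movie_directors = json.loads(json.dumps({1259: ["director_a"]}))
print(list(movie_directors))  # ['1259'] -- keys are now strings

movie = 1259  # the loop variable is an int, hence the KeyError: 1259

# One fix: normalize the keys back to int right after loading.
movie_directors = {int(k): v for k, v in movie_directors.items()}
print(movie_directors[movie])  # ['director_a']
```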

Then I noticed that u_m_directors.json and u_m_actors.json are commented out in the support_query_data function while they remain uncommented in the metapath_data function, which caused the error above. So I wonder whether these two paths are needed at all, i.e. whether I can simply comment out u_m_directors and u_m_actors as well and the problem is solved.

According to my shallow understanding of the paper, these u_m_d or u_m_a paths are just temporary paths. What we want to find are u->m and u->m->sth->m, thus we don't need to store them, right?

By the way, a giant TODO appears at the end of the support_query_data function, which makes me wonder whether this part of the published repo is unfinished. Also, I guess input_dir and output_dir should be 'movielens' rather than 'movielens_1m', judging by the project structure (not a big deal).

I list all my problems here:

  1. How can this error be solved?
  2. Do we need to store u_m_directors.json or u_m_actors.json? Do we even need to compute them?
  3. DBook & Yelp: do they work fine in data processing with the two IPython notebooks? Does the TODO matter for MovieLens?
  4. How long does the training process take (just your model, not the ablation study or baselines)?

Anyway, I want to personally thank you for the outstanding work. It's very kind of you to publish the code on GitHub. Wish you all the best in your future research career!
