
metahin's Introduction

MetaHIN

Source code for KDD 2020 paper "Meta-learning on Heterogeneous Information Networks for Cold-start Recommendation"

Requirements

  • Python 3.6.9
  • PyTorch 1.4.0
  • Tested on Ubuntu 16.04.1 with one GPU (GeForce RTX) and one CPU (Intel Xeon W-2133)
  • Detailed requirements

Datasets

We have uploaded the original data of DBook, Movielens and Yelp in the data/ folder.

The processed data of DBook and Movielens can be downloaded from Google Drive and BaiduYun (Extraction code: ened).

The processed data of Yelp can be generated by running data/yelp/YelpProcessor.ipynb.

Description

MetaHIN/
├── code
│   ├── main.py: the main function of the model
│   ├── Config.py: configurations for the model
│   ├── Evaluation.py: evaluates the learned embeddings w.r.t. clustering and classification
│   ├── DataHelper.py: loads data
│   ├── EmbeddingInitializer.py: maps features and initializes embedding tables
│   ├── HeteML_new.py: updates parameters in the meta-learning paradigm
│   └── MetaLeaner_new.py: the base model
├── data
│   ├── dbook
│   │   ├── original/: the original data without any preprocessing
│   │   └── DBookProcessor.ipynb: preprocesses the data
│   ├── movielens
│   │   ├── original/: the original data without any preprocessing
│   │   └── MovielensProcessor.ipynb: preprocesses the data
│   └── yelp
│       ├── original/: the original data without any preprocessing
│       └── YelpProcessor.ipynb: preprocesses the data
└── README.md

Reference

@inproceedings{lu2020meta,
  title={Meta-learning on Heterogeneous Information Networks for Cold-start Recommendation},
  author={Lu, Yuanfu and Fang, Yuan and Shi, Chuan},
  booktitle={Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery \& Data Mining},
  pages={1563--1573},
  year={2020}
}

metahin's People

Contributors

rootlu


metahin's Issues

Processed files are too large

Hello,

I understand that the data in the Google Drive link should be downloaded as the model input, but after downloading and extracting movielens I found the files take up 104 GB. Is this 104 GB of files really the model input? Isn't that too large?

Thanks for sharing.

About running the code

Hello, your work is excellent. I have been trying to reproduce your code recently and would like to ask: in what order should the code be executed?

About updating the DBookProcessor code

I'd like to ask about the processing code for the support-set and query-set JSON files, since I'm not quite sure about their format or how they are constructed. Will you release it, or could you adapt the Yelp processing code for DBook?

'DataProcessor' not found: No module named 'DataProcessor'

Hello, thanks for sharing the code. Running main.py produces the following:

Traceback (most recent call last):
  File "main.py", line 11, in <module>
    from DataHelper import DataHelper
  File "/content/gdrive/My Drive/Colab Notebooks/PaperCode/MetaHIN-master/code/DataHelper.py", line 8, in <module>
    from DataProcessor import Movielens
ModuleNotFoundError: No module named 'DataProcessor'

What causes this? Thanks.

About reproducing the code

Hello, I ran into many problems while reproducing the code.
(screenshot)
FileNotFoundError: [Errno 2] No such file or directory: './meta_training/support_u_movies.json'
Both the movielens and dbook processors report this error. If I create the file myself, I get the following error instead:
ValueError: invalid literal for int() with base 10: 'name'
YelpProcessor reports this error:
NameError: name 'new_user' is not defined
The processing code for all three datasets fails to run, so the source code cannot run either. Looking forward to your reply.

Problem with the MovieLensProcessor script

When running MovieLensProcessor, it reports that the meta_training folder is missing support_u_movies.json, query_u_movies.json, and support_u_movies_y.json. How are these three files constructed?

Running the movielens experiment

Hello! I want to run the movielens experiment. After downloading and extracting the processed dataset, what should I do next? I saw that the yelp experiment requires converting a yelp_academic_dataset.json file, but the download link for that file in your README no longer works, so I switched to the movielens experiment. However, I don't understand what the next step is after downloading the processed data.

gcmc cannot be found

Hi,

I get an error when I try to run "pip3 install -r requirements.txt".
Can you give me some instructions? Thank you very much.

Thanks,
weizhen
(image)

train_data in the various cold-start scenarios

Hello,
I reproduced your MetaHIN model locally, but my results deviate considerably from the paper.
Does train_data load the data from the meta_training folder in every cold-start scenario, or does each cold-start scenario load different data, e.g. the user cold-start scenario loading from the user_cold_testing folder? Looking forward to your reply.

Error in Data Generation

Traceback (most recent call last):
  File "./MetaHIN/code/main.py", line 147, in <module>
    training(hml, model_save=True, model_file=model_filename, device=cuda_or_cpu)
  File "./MetaHIN/code/main.py", line 35, in training
    supp_xs_s, supp_ys_s, supp_mps_s, query_xs_s, query_ys_s, query_mps_s = zip(*train_data)  # supp_um_s: (list, list, ..., 2553)
ValueError: not enough values to unpack (expected 6, got 0)
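For anyone hitting this: the unpacking fails because train_data is empty, i.e. no meta-training task files were loaded at all (see the issues above about missing support/query JSON files). A minimal sketch reproduces the symptom; the six variable names are taken from the traceback, and the empty train_data is the assumed cause:

```python
train_data = []  # what you end up with when no meta-training task files were loaded

# zip(*train_data) over an empty list is an empty iterator, so unpacking it
# into six variables raises exactly the ValueError seen in the traceback.
try:
    supp_xs_s, supp_ys_s, supp_mps_s, query_xs_s, query_ys_s, query_mps_s = zip(*train_data)
except ValueError as err:
    print(err)  # not enough values to unpack (expected 6, got 0)
```

So the fix is not in main.py itself but in making sure the data-processing step actually produced the meta-training files before training.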

Processed data files

Hello, could you share the processed data files? Every time I run the last few lines of YelpProcessor.ipynb on my machine, the computer freezes. Is it because the data files are too large?

Error when generating yelp data in YelpProcessor.ipynb

Hello !

When I try to generate the data for yelp as advised in this issue, I get the following error in YelpProcessor.ipynb:

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-2-071c77df7579> in <module>
      8     for state in states:
      9         os.mkdir("{}/{}/".format(output_dir, state))
---> 10         os.mkdir("{}/{}/".format(melu_output_dir, state))
     11         if not os.path.exists("{}/{}/{}".format(output_dir, "log", state)):
     12             os.mkdir("{}/{}/{}".format(output_dir, "log", state))

FileNotFoundError: [Errno 2] No such file or directory: '../../../MeLU/yelp//warm_up/'

If I understand correctly, I should clone the MeLU git project, but I'm not sure that is what is expected? I don't see any yelp folder in their GitHub project.
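A defensive workaround (not the author's code) is to replace os.mkdir with os.makedirs(..., exist_ok=True), which creates any missing parent directories instead of failing. A minimal sketch under a temporary directory; the MeLU/yelp path mirrors the one in the traceback and is only illustrative:

```python
import os
import tempfile

# os.mkdir raises FileNotFoundError when a parent directory is missing;
# os.makedirs creates the whole chain and exist_ok=True tolerates reruns.
base = tempfile.mkdtemp()
melu_output_dir = os.path.join(base, "MeLU", "yelp")  # parents don't exist yet
for state in ["warm_up"]:
    os.makedirs(os.path.join(melu_output_dir, state), exist_ok=True)
print(os.path.isdir(os.path.join(melu_output_dir, "warm_up")))  # True
```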

Thanks in advance, and thanks for making your code available in any case!

Missing files

The movielens dataset does not contain ./meta_training/support_u_movies.json, ./meta_training/support_u_movies_y.json, or the other two related files.

On generating support_u_movies.json and related files

  1. The datasets are processed by running the corresponding .ipynb files on the original data.
  2. However, MovieLensProcessor.ipynb directly reads files such as support_u_movies.json, which are not part of the original data.
    Is the code that generates the support set and query set missing?

Something Wrong with DataProcessor.py

Hi, when I tried to run DataProcessor.py to generate data, an error occurred.

  File "DataProcessor.py", line 339, in <module>
    ml = Movielens(os.path.join(input_dir, 'movielens'), os.path.join(output_dir, 'movielens'))
  File "DataProcessor.py", line 30, in __init__
    self.metapath_data()
  File "DataProcessor.py", line 188, in metapath_data
    u_m_directors[(user,movie)] += list(set(self.movie_directors[movie]))
KeyError: 1259

I found that it's because the keys of movie_directors are str while movie in this iteration is an int.
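The str/int mismatch described above is typical of dicts round-tripped through JSON, since JSON object keys are always strings. A minimal illustration with a hypothetical fix, not the repo's actual code:

```python
import json

# A dict keyed by movie id comes back with str keys after a JSON round-trip.
movie_directors = json.loads(json.dumps({1259: ["director_a"]}))
print(list(movie_directors))  # ['1259'] -- keys are now strings

movie = 1259  # the loop variable is an int, hence the KeyError: 1259

# One fix: normalize the keys back to int right after loading.
movie_directors = {int(k): v for k, v in movie_directors.items()}
print(movie_directors[movie])  # ['director_a']
```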

Then I noticed that u_m_directors.json and u_m_actors.json are commented out in the support_query_data function while they remain uncommented in the metapath_data function, which caused the error above. So I wonder whether these two paths are needed at all, i.e. whether I can simply comment out u_m_directors and u_m_actors as well and the problem is solved.

According to my shallow understanding of the paper, these u_m_d or u_m_a paths are just temporary paths. What we want to find are u->m and u->m->sth->m, thus we don't need to store them, right?

By the way, a giant TODO appears at the end of the support_query_data function, which makes me wonder whether this part of the published repo is unfinished. Also, I guess input_dir and output_dir should be 'movielens' rather than 'movielens_1m', judging by the project structure (not a big deal).

I list all my problems here:

  1. How can this error be solved?
  2. Do we need to store u_m_directors.json or u_m_actors.json? Do we even need to compute them?
  3. DBook & Yelp: do they work fine in data processing with the two IPython notebooks? Does the TODO matter for MovieLens?
  4. How long does the training process take (just your model, not the ablation study or baselines)?

Anyway, I want to personally thank you for the outstanding work. It's very kind of you to publish the code on GitHub. Wish you all the best in your future research career!
