muhanzhang / igmc

Inductive graph-based matrix completion (IGMC) from "M. Zhang and Y. Chen, Inductive Matrix Completion Based on Graph Neural Networks, ICLR 2020 spotlight".

License: MIT License

Python 98.59% Shell 1.41%


igmc's Issues

torch.nn.modules.module.ModuleAttributeError: 'RGCNConv' object has no attribute 'att'

Hi there,

Very interesting paper. However, when I train the model, I get the error "'RGCNConv' object has no attribute 'att'". The PyTorch Geometric documentation does not list such an attribute either. Could you please check? Cheers.

Regards

Is a DGL version available?

IGMC is an awesome paper on recommender systems!

I'm a DGL user, so I would like to port this code to DGL.
Is that feasible? Or do you plan to release a DGL version of this paper's implementation?

An issue about the paper

Hi, I wonder how RGCN works on the heterogeneous user-item graph. The user-item graph has two types of nodes, and they have different features.

Does it support distributed training?

May I ask whether this project supports DDP training and inference on multiple GPUs and nodes?

deleted

visualization

Hi,
I've used the 'visualize' option to view the subgraphs and predicted ratings. I would like to know how to get the user and item ids corresponding to the 'red' and 'blue' nodes in visualization.pdf, so that I can identify which target user-item pair the enclosing subgraph was built around.

I am using this code for a new dataset, and I want to annotate the nodes in the visualization with the actual row and column indices so that the rating matrix can be tallied with the corresponding subgraph. Thanks.

Not able to reproduce results

Traceback (most recent call last):
  File "Main.py", line 338, in <module>
    multiply_by=multiply_by)
  File "/Users/vasfern/Downloads/IGMC-master/models.py", line 146, in __init__
    super(IGMC, self).__init__(dataset, GCNConv, latent_dim, regression, adj_dropout, force_undirected)
  File "/Users/vasfern/Downloads/IGMC-master/models.py", line 20, in __init__
    self.convs.append(gconv(dataset.num_features, latent_dim[0]))
  File "/Users/vasfern/Downloads/IGMC-master/.venv/lib/python3.6/site-packages/torch_geometric/data/dataset.py", line 117, in num_features
    return self.num_node_features
  File "/Users/vasfern/Downloads/IGMC-master/.venv/lib/python3.6/site-packages/torch_geometric/data/dataset.py", line 112, in num_node_features
    return self[0].num_node_features
  File "/Users/vasfern/Downloads/IGMC-master/.venv/lib/python3.6/site-packages/torch_geometric/data/dataset.py", line 188, in __getitem__
    data = self.get(self.indices()[idx])
TypeError: 'tuple' object is not callable

Dealing with large graph

I ran into the same problem as #9 on a large graph. The related code is in preprocessing.py, line 156:
labels = np.full((num_users, num_items), neutral_rating, dtype=np.int32)
Could you please give some suggestions? Thanks.
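
For a sense of scale, the dense array in the line quoted above costs num_users x num_items x 4 bytes no matter how few ratings are observed. The following is a minimal sketch, not the repository's code: `dense_label_bytes` and `SparseLabels` are hypothetical names, and the sizes in the usage line are made-up illustrations. It shows the arithmetic and one possible way to store only the observed ratings while still returning the neutral rating for missing entries.

```python
def dense_label_bytes(num_users: int, num_items: int, itemsize: int = 4) -> int:
    """Bytes needed by a dense (num_users, num_items) array; int32 has itemsize 4."""
    return num_users * num_items * itemsize


class SparseLabels:
    """Hypothetical stand-in for the dense labels matrix: keeps observed ratings only."""

    def __init__(self, triples, neutral_rating):
        # triples is an iterable of (user_index, item_index, rating)
        self.neutral_rating = neutral_rating
        self._data = {(u, v): r for u, v, r in triples}

    def __getitem__(self, key):
        # Unobserved (user, item) pairs fall back to the neutral rating.
        return self._data.get(key, self.neutral_rating)

    def __len__(self):
        return len(self._data)


# Illustrative sizes only: one million users, one hundred thousand items.
print(dense_label_bytes(1_000_000, 100_000) / 1e9, "GB")  # 400.0 GB
```

Whether such a structure can replace the dense array depends on how the rest of the preprocessing indexes it, which this sketch does not cover.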

About large-scale experiments

Dear authors,
Recently I have been running subgraph-based GNN experiments on large-scale graphs, but I found it difficult to pre-process subgraphs around edges on a large-scale bipartite graph.
Could you please give some suggestions? Thanks.

Regards,

File "E:\Anaconda3\envs\IGMC-master\lib\multiprocessing\spawn.py", line 134, in _check_not_importing_main raise RuntimeError

Dear Dr. Zhang,

First, I'd like to say a big thank you for this great work and the paper.

I tried to run this program on my Windows PC.
After I ran "python Main.py --data-name yahoo_music --epochs 1 --testing --ensemble" in the PyCharm terminal (and likewise with the douban and flixster datasets), the program always ended up stuck at the same place, raising a runtime error like the one below:

The data folder ended up empty, but files were generated under the results folder.

Part of the terminal output is copied into a Google Doc. The output shows that this runtime error occurs throughout the run, but the program only gets stuck at the last occurrence.
https://docs.google.com/document/d/1EV-qGCuVi0t6q-jh1K5Dvnj9UASOLYq-djh6lD8FM7g/edit?usp=sharing

The Python and PyTorch environment is shown in the picture below.

I will try running it again under Python 3.8.1 + PyTorch 1.4.0 + PyTorch_Geometric 1.4.2 to see what happens.

Thanks so much and best regards,
Jiyang
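
The `_check_not_importing_main` error in the title is what Python raises when worker processes are started while the main module is still being imported, which happens under the spawn start method that Windows uses by default. The usual remedy is to start pools only from a guarded entry point. The following is a minimal sketch of that pattern, assuming the script currently creates its pool at module level; `square` and `main` are placeholder names, not functions from this repository.

```python
import multiprocessing as mp


def square(x: int) -> int:
    """Toy worker; stands in for the per-link subgraph extraction."""
    return x * x


def main() -> list:
    # Worker processes are created here, so this must only run from the
    # guarded entry point below, never at import time.
    with mp.Pool(processes=2) as pool:
        return pool.map(square, range(4))


if __name__ == "__main__":
    # With the spawn start method, child processes re-import this file;
    # the guard stops them from starting pools of their own.
    print(main())
```

On Linux the default start method is fork, which is why the same script may run there without the guard.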

Possibilities of adding side features?

Hi, Thanks for constantly maintaining this repo. I understand that this paper mainly focuses on the situation without side information, but in the experiments, IGMC seems to outperform many other models with side features. If we do have some side information (say, we have user features but not item features), can we add those features in a meaningful way to IGMC? Or could you recommend some other recent works in this regard (inductive recommendation w/ some side features)? Thanks in advance!

error for constructing hop-2 subgraphs

Hello,

I encountered an error when trying to construct 2-hop subgraphs by running python Main.py --data-name flixster --epochs 40 --testing --ensemble --hop 2

See error message below. Let me know if you need any further information. Thank you!

Namespace(ARR=0.001, adj_dropout=0.2, asin_pop_thres=25, batch_size=50, continue_from=None, data_appendix='', data_name='flixster', data_seed=1234, debug=False, dynamic_test=False, dynamic_train=False, dynamic_val=False, ensemble=True, epochs=40, force_undirected=False, hop='2', keep_old=False, lr=0.001, lr_decay_factor=0.1, lr_decay_step_size=50, max_nodes_per_hop=10000, max_test_num=None, max_train_num=None, max_val_num=None, multiply_by=1, no_train=False, num_relations=5, ratio=1.0, reprocess=False, sample_ratio=1.0, save_appendix='', save_interval=10, seed=1, standard_rating=False, test_freq=1, testing=True, transfer='', use_features=False, user_thres=140, visualize=False)
Command line input: python Main.py --data-name flixster --epochs 40 --testing --ensemble --hop 2
 is saved.
number of users =  2341
number of item =  2956
User features shape: (3000, 3000)
Item features shape: (3000, 3000)
All ratings are:
[0.5 1.  1.5 2.  2.5 3.  3.5 4.  4.5 5. ]
#train: 23556, #val: 4712, #test: 2617
Processing...
Enclosing subgraph extraction begins...
100%|██████████████████████████████| 32/32 [00:03<00:00,  9.79it/s]
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/igmc/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/home/ubuntu/anaconda3/envs/igmc/lib/python3.6/multiprocessing/pool.py", line 47, in starmapstar
    return list(itertools.starmap(args[0], args[1]))
  File "/home/ubuntu/proj/IGMC/util_functions.py", line 217, in subgraph_extraction_labeling
    v_fringe, u_fringe = neighbors(u_fringe, Arow), neighbors(v_fringe, Acol)
  File "/home/ubuntu/proj/IGMC/util_functions.py", line 302, in neighbors
    return set(A[list(fringe)].indices)
  File "/home/ubuntu/proj/IGMC/util_functions.py", line 61, in __getitem__
    indices = np.concatenate(self.indices[col_selector])
  File "<__array_function__ internals>", line 6, in concatenate
ValueError: need at least one array to concatenate
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "Main.py", line 337, in <module>
    max_num=args.max_train_num
  File "/home/ubuntu/proj/IGMC/util_functions.py", line 91, in __init__
    super(MyDataset, self).__init__(root)
  File "/home/ubuntu/anaconda3/envs/igmc/lib/python3.6/site-packages/torch_geometric/data/in_memory_dataset.py", line 53, in __init__
    pre_filter)
  File "/home/ubuntu/anaconda3/envs/igmc/lib/python3.6/site-packages/torch_geometric/data/dataset.py", line 93, in __init__
    self._process()
  File "/home/ubuntu/anaconda3/envs/igmc/lib/python3.6/site-packages/torch_geometric/data/dataset.py", line 166, in _process
    self.process()
  File "/home/ubuntu/proj/IGMC/util_functions.py", line 106, in process
    self.class_values, self.parallel)
  File "/home/ubuntu/proj/IGMC/util_functions.py", line 190, in links2subgraphs
    results = results.get()
  File "/home/ubuntu/anaconda3/envs/igmc/lib/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
  File "/home/ubuntu/anaconda3/envs/igmc/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/home/ubuntu/anaconda3/envs/igmc/lib/python3.6/multiprocessing/pool.py", line 47, in starmapstar
    return list(itertools.starmap(args[0], args[1]))
  File "/home/ubuntu/proj/IGMC/util_functions.py", line 217, in subgraph_extraction_labeling
    v_fringe, u_fringe = neighbors(u_fringe, Arow), neighbors(v_fringe, Acol)
  File "/home/ubuntu/proj/IGMC/util_functions.py", line 302, in neighbors
    return set(A[list(fringe)].indices)
  File "/home/ubuntu/proj/IGMC/util_functions.py", line 61, in __getitem__
    indices = np.concatenate(self.indices[col_selector])
  File "<__array_function__ internals>", line 6, in concatenate
ValueError: need at least one array to concatenate
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 32/32 [00:03<00:00,  9.64it/s]
python Main.py --data-name flixster --epochs 40 --testing --ensemble --hop 2  12.19s user 1.34s system 130% cpu 10.410 total
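
The traceback ends in `np.concatenate` receiving no arrays, which is consistent with `neighbors` being called on an empty fringe: when one side has no unvisited nodes left after the first hop, the second hop has nothing to select. The following is a minimal sketch of a guard, written in plain Python under stated assumptions: the dicts stand in for the sparse row and column indexers seen in the traceback, and `expand` is a hypothetical simplification of the hop loop, not the repository's function.

```python
from typing import Dict, List, Set, Tuple


def neighbors(fringe: Set[int], adj: Dict[int, List[int]]) -> Set[int]:
    """Union of the neighbours of every node in `fringe`.

    An empty fringe returns an empty set instead of raising.
    """
    if not fringe:
        return set()
    result: Set[int] = set()
    for node in fringe:
        result.update(adj.get(node, []))
    return result


def expand(u: int, v: int,
           user_to_items: Dict[int, List[int]],
           item_to_users: Dict[int, List[int]],
           hops: int) -> Tuple[Set[int], Set[int]]:
    """Hop-wise expansion around (u, v), stopping once both fringes are empty."""
    u_visited, v_visited = {u}, {v}
    u_fringe, v_fringe = {u}, {v}
    for _ in range(hops):
        v_fringe, u_fringe = (neighbors(u_fringe, user_to_items),
                              neighbors(v_fringe, item_to_users))
        u_fringe -= u_visited
        v_fringe -= v_visited
        if not u_fringe and not v_fringe:
            break
        u_visited |= u_fringe
        v_visited |= v_fringe
    return u_visited, v_visited


# Two users who both rated item 10: the item fringe is empty at hop 2.
print(expand(0, 10, {0: [10], 1: [10]}, {10: [0, 1]}, hops=2))  # ({0, 1}, {10})
```

Whether the guard belongs in `neighbors` or in the indexer's `__getitem__` is a design choice this sketch leaves open.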

Question about subgraphs

Hello, I'm new to GNNs and interested in graph-based recommender systems.
Thankfully, I found the nice IGMC paper.

I'm trying to understand the subgraph generation in this code.
What I did was:

Generate a CSR matrix from the example rating picture:


row = np.array([0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 6])
col = np.array([0, 1, 2, 4, 8, 4, 6, 7, 1, 3, 4, 6, 0, 6, 7, 8, 2, 3, 5, 7, 9, 1, 3, 5, 8, 0, 2, 5, 7, 9])
rat = np.array([1, 2, 5, 2, 4, 4, 5, 2, 1, 4, 3, 5, 5, 5, 1, 2, 1, 2, 5, 5, 4, 1, 4, 4, 1, 5, 4, 3, 2, 5])
rat = rat - 1  # value to index
ACsr = ssp.csr_matrix((rat, (row, col)))
Note: these values are the same as in the picture of your example rating table.

(<7x10 sparse matrix of type ''
 	with 30 stored elements in Compressed Sparse Row format>,
 array([[0, 1, 4, 0, 1, 0, 0, 0, 3, 0],
        [0, 0, 0, 0, 3, 0, 4, 1, 0, 0],
        [0, 0, 0, 3, 2, 0, 4, 0, 0, 0],
        [4, 0, 0, 0, 0, 0, 4, 0, 1, 0],
        [0, 0, 0, 1, 0, 4, 0, 4, 0, 3],
        [0, 0, 0, 3, 0, 3, 0, 0, 0, 0],
        [4, 0, 3, 0, 0, 2, 0, 1, 0, 4]]))

Then I get a batch sample

dataset_class = 'MyDynamicDataset'


Batch(x=[120, 4], edge_index=[2, 112], y=[30], edge_type=[112], batch=[120], ptr=[31])

edge_index (2, 116)
y (30,)
edge_type (116,)

edge_index [[  1   0   2   3   4   7   9   8  10  11  13  12  14  15  17  16  18  19
   21  20  22  23  25  24  26  27  29  28  30  31  33  32  34  35  37  36
   38  39  41  40  42  43  45  44  46  47  49  49  50  51  53  53  54  55
   57  56  58  59  65  64  66  67  69  68  70  71  73  72  74  75  77  76
   78  79  81  80  81  82  83  83  85  85  86  87  89  88  89  90  91  91
   93  94  97  96  98  99 101 100 102 103 104 105 107 107 109 108 110 111
  112 115 117 116 117 118 119 119]
 [  2   3   1   0   7   4  10  11   9   8  14  15  13  12  18  19  17  16
   22  23  21  20  26  27  25  24  30  31  29  28  34  35  33  32  38  39
   37  36  42  43  41  40  46  47  45  44  50  51  49  49  54  55  53  53
   58  59  57  56  66  67  65  64  70  71  69  68  74  75  73  72  78  79
   77  76  82  83  83  81  80  81  86  87  85  85  90  91  91  89  88  89
   94  93  98  99  97  96 102 103 101 100 107 107 104 105 110 111 109 108
  115 112 118 119 119 117 116 117]]
y [1. 2. 5. 2. 4. 4. 5. 2. 1. 4. 3. 5. 5. 5. 1. 2. 1. 2. 5. 5. 4. 1. 4. 4.
 1. 5. 4. 3. 2. 5.]
edge_type [3 3 3 3 3 3 2 0 2 0 2 0 2 0 0 0 0 0 1 0 1 0 3 2 3 2 0 3 0 3 0 2 0 2 0 1 0
 1 0 2 0 2 3 2 3 2 3 0 3 0 3 0 3 0 3 3 3 3 3 3 3 3 2 2 2 2 1 0 1 0 0 0 0 0
 3 3 0 3 3 0 0 2 0 2 0 2 3 0 2 3 1 1 2 2 2 2 3 3 3 3 3 2 3 2 3 3 3 3 1 1 2
 0 3 2 0 3]
ADJ (120, 120) [[0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0]
 [0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0]
 [4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0]

Can you explain how ADJ is obtained?

There were originally 30 ratings, but we got edge_index: (2, 116) and edge_type: (116,).
What do these mean?

OSError: [WinError 126] module could not be found

Hello,

I tried running
python Main.py --data-name flixster --epochs 40 --testing --ensemble

and got:

File "Main.py", line 11, in <module>
from util_functions import *
File "C:\Users\PCOvice\Downloads\IGMC-master\IGMC-master\util_functions.py", line 13, in <module>
from torch_geometric.data import Data, Dataset, InMemoryDataset
File "C:\Users\PCOvice\anaconda3\lib\site-packages\torch_geometric\__init__.py", line 2, in <module>
import torch_geometric.nn
File "C:\Users\PCOvice\anaconda3\lib\site-packages\torch_geometric\nn\__init__.py", line 2, in <module>
from .data_parallel import DataParallel
File "C:\Users\PCOvice\anaconda3\lib\site-packages\torch_geometric\nn\data_parallel.py", line 5, in <module>
from torch_geometric.data import Batch
File "C:\Users\PCOvice\anaconda3\lib\site-packages\torch_geometric\data\__init__.py", line 1, in <module>
from .data import Data
File "C:\Users\PCOvice\anaconda3\lib\site-packages\torch_geometric\data\data.py", line 8, in <module>
from torch_sparse import coalesce, SparseTensor
File "C:\Users\PCOvice\anaconda3\lib\site-packages\torch_sparse\__init__.py", line 13, in <module>
library, [osp.dirname(__file__)]).origin)
File "C:\Users\PCOvice\anaconda3\lib\site-packages\torch\_ops.py", line 105, in load_library
ctypes.CDLL(path)
File "C:\Users\PCOvice\anaconda3\lib\ctypes\__init__.py", line 364, in __init__
self._handle = _dlopen(self._name, mode)
OSError: [WinError 126] The specified module could not be found

Does anyone know how to solve this?

Thank you in advance.

questions about the benchmarks

Hello,

Thank you very much for the nice paper and code. I was wondering, when you compare IGMC with other methods, how do you replicate the results from previous work? Is there a good library for trying them all easily, or do you have to test them one by one? Thank you!

Initialization of pair(u, v)

Hi, according to the paper's node labeling, the target user and item of a subgraph should be labeled 0 and 1. But this line of code:

IGMC/util_functions.py

Line 213 in 5fa0e3c

u_dist, v_dist = [0], [0]

shows that the initial labels of the target user and item are 0 and 0. Is this a bug, or am I missing something? Thank you!
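
One reading that reconciles the two: `u_dist` and `v_dist` hold hop distances rather than final labels, and the labels are derived from them afterwards. The paper's scheme gives the target user label 0 and the target item label 1, and a user or item first reached at hop i gets label 2i or 2i+1 respectively. The following is a minimal sketch of that mapping, assuming the code derives labels from the distance lists in this way; `node_labels` is a hypothetical helper, not a quotation from the repository.

```python
from typing import List


def node_labels(u_dist: List[int], v_dist: List[int]) -> List[int]:
    """Map hop distances to labels: users get 2*d, items get 2*d + 1."""
    return [2 * d for d in u_dist] + [2 * d + 1 for d in v_dist]


# Target user and item both start at distance 0 ...
print(node_labels([0], [0]))        # [0, 1]
# ... and 1-hop users and items receive labels 2 and 3.
print(node_labels([0, 1], [0, 1]))  # [0, 2, 1, 3]
```

Under this reading, initializing both distance lists with 0 is consistent with the paper, because the user/item distinction is added by the parity of the label.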

[Feature Request] Add `requirements.txt`

The repository currently lacks a requirements.txt. It would be nice to have the explicit PyTorch Geometric version needed to run the experiments. Hopefully, this will prevent errors like #7.

Having a requirements.txt file would make reproducibility and the onboarding process much easier.
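
Until pinned versions are published, a small check at startup can at least make version mismatches visible. The following is a minimal sketch using the standard library's `importlib.metadata`; `check_versions` is a hypothetical helper, and the versions in `EXPECTED` are the ones users report elsewhere on this page (PyTorch 1.4.0, PyTorch Geometric 1.4.2), not pins confirmed by the authors.

```python
from importlib import metadata
from typing import Dict


def check_versions(expected: Dict[str, str]) -> Dict[str, str]:
    """Return {package: problem} for every package that is missing or mismatched."""
    problems = {}
    for name, wanted in expected.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if found != wanted:
            problems[name] = f"found {found}, expected {wanted}"
    return problems


# Versions reported by users on this page; treat them as assumptions.
EXPECTED = {"torch": "1.4.0", "torch-geometric": "1.4.2"}
print(check_versions(EXPECTED))
```

The comparison is an exact string match, so a build with a local suffix such as a CUDA tag would be reported as a mismatch; loosen it if that is too strict.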

Couple questions

One advantage of a node-embedding-based link predictor is that it's relatively easy to define item-to-item or user-to-user similarity from the embeddings (for recommender systems). Since your approach doesn't use node embeddings, is there any analogous way of defining similarity between nodes within one of the parts?
Have you tried loss functions and evaluations for ranking and implicit feedback instead of rating prediction and explicit feedback?

NotImplementedError

When I run the model on ml_1m, I get this error.

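Error during testing

After adding the --user_features argument and running the ml_1m dataset, testing fails with an error: Batch has no u_feature attribute. In other words, the test data has no u_feature attribute.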

u.item of the ml_100k dataset is not UTF-8 encoded

The Jupyter notebook cannot read the file, and the reproduction gets stuck.
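
The u.item file in MovieLens 100K is encoded as ISO-8859-1 (Latin-1), so reading it with the default UTF-8 codec fails on accented titles. The following is a minimal sketch of reading it with an explicit encoding; `read_pipe_file` is a hypothetical helper and the path in the usage comment is an assumption about where the file lives.

```python
from typing import List


def read_pipe_file(path: str, encoding: str = "latin-1") -> List[List[str]]:
    """Read a '|'-separated text file such as u.item with an explicit encoding."""
    rows = []
    with open(path, encoding=encoding) as f:
        for line in f:
            line = line.rstrip("\n")
            if line:  # skip blank lines
                rows.append(line.split("|"))
    return rows


# Usage, assuming the file sits at this (hypothetical) location:
# rows = read_pipe_file("raw_data/ml_100k/u.item")
```

If the file is loaded through pandas instead, passing the same encoding name to the reader should have the same effect.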

Training Stuck at the pbar stage

I encountered a small problem while running this command:
python Main.py --data-name ml_10m --save-appendix _mnhp100 --data-appendix _mnph100 --max-nodes-per-hop 100 --testing --epochs 40 --save-interval 5 --adj-dropout 0 --lr-decay-step-size 20 --ensemble --dynamic-dataset

The training has been stuck here for hours:

I made sure that I am using Python 3.8.1, PyTorch 1.4.0, and PyTorch Geometric 1.4.2, so I am not sure what is wrong. Thanks in advance for your help!

Can you share the code of the compared methods?

For example, the implementation of PinSage.

An issue about the paper

Hi, although the model uses different labels to distinguish user nodes from item nodes, the two types of nodes have different features. How does RGCN aggregate the two types of features? Also, are the labels themselves features that take part in the convolution? Thank you.

Pip requirements

Can you add the list of required packages? For example, via pip freeze > requirements.txt.

visualize fails at higher epoch counts, and edge types take values outside the class labels [0, 1] as epochs increase

I am trying to solve a binary matrix completion problem with the IGMC algorithm, keeping only two ratings, [0, 1]. When I feed in the 0/1-labeled data, create a train/test split, and run the code end to end, everything works as long as I limit training to 20 epochs: the algorithm runs, visualize works, and the graphs are generated. When I increase the epochs to 30 or 40, the training and ensemble code still runs, but the edge attributes contain labels other than 0 and 1, such as 2, 3, and 4, which makes the visualize option fail with an index error:

edge_types = [class_values[edge_types[x]] for x in g.edges()]
IndexError: index 2 is out of bounds for axis 0 with size 2

Please let me know if I am missing something or misunderstanding the code's behavior, or whether the IGMC algorithm can be applied to such a binary matrix completion problem in the first place.

Performances on ML-100k

Hi,
Thanks for sharing your code and your interesting paper.
I could not install the earlier version of PyTorch Geometric on my machine, but I did install

torch==1.7.0
torch-cluster==1.5.8
torch-geometric==1.6.3
torch-scatter==2.0.5
torch-sparse==0.6.8
and then fixed the code in train_eval.py by updating the attribute names:

-                    gconv.att,
-                    gconv.basis.view(gconv.num_bases, -1)
+                    gconv.comp,
+                    gconv.weight.view(gconv.num_bases, -1)
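
A version-tolerant way to apply the rename in the diff above is to look up whichever attribute pair exists on the layer. The following is a minimal sketch under stated assumptions: `rgcn_basis_params` is a hypothetical helper, the attribute names are taken only from the diff (`att`/`basis` in the older release, `comp`/`weight` in the newer one), and the demo objects are plain stand-ins rather than real layers.

```python
from types import SimpleNamespace


def rgcn_basis_params(conv):
    """Return (coefficients, bases) from an RGCNConv-like object.

    Tries the newer attribute names (`comp`, `weight`) first and falls back
    to the older ones (`att`, `basis`).
    """
    for coeff_name, basis_name in (("comp", "weight"), ("att", "basis")):
        if hasattr(conv, coeff_name) and hasattr(conv, basis_name):
            return getattr(conv, coeff_name), getattr(conv, basis_name)
    raise AttributeError("no known basis-decomposition attributes found")


# Stand-in objects for illustration, not real PyTorch Geometric layers.
old_style = SimpleNamespace(att="A", basis="B")
new_style = SimpleNamespace(comp="C", weight="W")
print(rgcn_basis_params(old_style))  # ('A', 'B')
print(rgcn_basis_params(new_style))  # ('C', 'W')
```

The helper only resolves names; it does not check that the tensors have the shapes the regularizer in train_eval.py expects, which may also differ between releases.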

Then I launched the following command on ml_100k (without dynamic training, to speed things up):

python Main.py --data-name ml_100k --save-appendix _mnph200 --data-appendix _mnph200 --epochs 80 --max-nodes-per-hop 200 --testing --ensemble

Epoch 80, train loss 0.847587, test rmse 0.921295:  99%|██████████████████████████████████████████████████████▎| 79/80 [4:13:22<03:10, 190.49s/it]Saving model states...
Epoch 80, train loss 0.847587, test rmse 0.921295: 100%|███████████████████████████████████████████████████████| 80/80 [4:13:22<00:00, 190.03s/it]

Test Once RMSE: 0.922065, Duration: 72.088108

However, the performance is much worse than that reported in the paper. Are there any additional hyperparameters to change? Have you tried your code with more recent torch and torch_geometric installations?

Thanks for your help!

rating prediction for new datasets

I am trying to apply this code to two different settings:

predicting 1 (or 0) instead of ratings 1-5
predicting non-standardized ratings, either continuous or with a larger number of classes

Has this method been tested for the cases above? If so, what are your suggestions?
