riannevdberg / gc-mc


gc-mc's Issues

How can I obtain the complete predicted rating matrix?

Meaning of W_user and W_movies

Hi Rianne,
Sorry, but I am a little confused by the data in the .mat files. What do W_user and W_movies mean? Thanks!

Are you using the test rating matrix during matrix completion?

I analyzed your code, and I think the test rating matrix is used while completing the test rating matrix.

Is that really the case?

I base this on the following code.

test_support = support[np.array(test_u)]
test_support_t = support_t[np.array(test_v)]

This part stores the entries that we need to predict, right?

Also, when GCMC does the encoding,

self.layers.append(OrdinalMixtureGCN(input_dim=self.input_dim,
                                     output_dim=self.hidden[0],
                                     support=self.support,
                                     support_t=self.support_t,
                                     num_support=self.num_support,
                                     u_features_nonzero=self.u_features_nonzero,
                                     v_features_nonzero=self.v_features_nonzero,
                                     sparse_inputs=True,
                                     act=tf.nn.relu,
                                     bias=False,
                                     dropout=self.dropout,
                                     logging=self.logging,
                                     share_user_item_weights=True,
                                     self_connections=False))

In this code, the GCN encodes with support, which seems to contain the very entries being predicted.

If I am mistaken, please correct me.

I think we should use only the training matrix to complete the matrix, and the test matrix should be used only for computing RMSE or other evaluation metrics.
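
One way to check this concern is on toy data. The sketch below is not the repository's code: the ratings, the split and all variable names other than `support` and `test_support` are hypothetical. It assumes the supports are built from training ratings only and then stacked side by side, and shows that indexing with the test users merely selects rows of that train-only matrix.

```python
import numpy as np
import scipy.sparse as sp

num_users, num_items, num_classes = 4, 5, 3

# Hypothetical toy ratings as (user, item, label), labels in 0..num_classes-1.
u = np.array([0, 0, 1, 2, 3, 3])
v = np.array([1, 2, 0, 4, 3, 1])
labels = np.array([0, 2, 1, 2, 0, 1])

# Hypothetical split: ratings 1 and 5 are held out for testing.
train = np.array([0, 2, 3, 4])
test = np.array([1, 5])

# Rating matrix built from training ratings only (label + 1, 0 = unobserved).
rating_mx_train = sp.csr_matrix(
    (labels[train] + 1, (u[train], v[train])),
    shape=(num_users, num_items), dtype=np.int32)

# One binary matrix per rating level, then stacked horizontally (assumed layout).
supports = [sp.csr_matrix(rating_mx_train == r + 1, dtype=np.float32)
            for r in range(num_classes)]
support = sp.hstack(supports, format='csr')

# Row selection for the test users: these rows hold their TRAINING edges.
test_u = u[test]
test_support = support[test_u]

# The held-out (user, item) pairs themselves are absent from every support.
leaked = [supports[labels[k]][u[k], v[k]] for k in test]
print(test_support.shape, test_support.nnz, leaked)
```

If the same check fails on the real pipeline, that is, a held-out pair shows up as a non-zero entry, then test ratings would indeed be leaking into the encoder.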

Error when running the code

Hello author,
My environment is Python 3.6 and TensorFlow 1.4, but when I run your code an error occurs. Here is the specific error.

Settings:
{'dataset': 'flixster', 'data_seed': 1234, 'feat_hidden': 64, 'write_summary': False, 'norm_symmetric': False, 'dropout': 0.7, 'features': True, 'epochs': 200, 'summaries_dir': 'logs/2018-11-15_15:03:52.631584', 'accumulation': 'stack', 'num_basis_functions': 2, 'learning_rate': 0.01, 'testing': True, 'hidden': [500, 75]}

number of users = 2341
number of item = 2956
Traceback (most recent call last):
  File "train.py", line 150, in <module>
    test_u_indices, test_v_indices, class_values = load_data_monti(DATASET, TESTING)
  File "/home/zxj/PycharmProjects/gc-mc/gcmc/preprocessing.py", line 274, in load_data_monti
    np.random.shuffle(rand_idx)
  File "mtrand.pyx", line 4852, in mtrand.RandomState.shuffle
  File "mtrand.pyx", line 4855, in mtrand.RandomState.shuffle
TypeError: 'range' object does not support item assignment
Can you help me solve this problem? Thank you very much!
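
This error is typical of Python 2 code run under Python 3: `range()` no longer returns a list, so `np.random.shuffle` cannot shuffle it in place. A minimal sketch of two possible workarounds follows; `num_ratings` and the seed are hypothetical stand-ins, not values taken from preprocessing.py.

```python
import numpy as np

num_ratings = 10  # hypothetical size, for illustration only
np.random.seed(0)  # hypothetical seed

# Option 1: materialise the range as a list before shuffling in place.
rand_idx = list(range(num_ratings))
np.random.shuffle(rand_idx)

# Option 2: let NumPy build the shuffled index array directly.
perm = np.random.permutation(num_ratings)

print(rand_idx, perm)
```

Either variant yields a permutation of the indices; the first is the smaller change if the surrounding code expects a list.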

Creating the dataset

Problem with the data

Hi,
I am trying to run your code, but I am confused about the data.
Are the douban data and the flixster data the same data? I ask because they have the same size and the same project names inside.
Also, I tried to open W_movies in the flixster data using the following code

and the system reports "Unable to open object (object 'W_movies' doesn't exist)".
However, I can run your code with the flixster data from the command line.
Could you tell me why my code doesn't work? Thank you so much.

Best wishes
Lee

I would like to ask about the "u_features" in the code.

I would like to ask about "u_features" in the code. It should be used as the input to the graph convolution layer, but why is it built as an identity matrix concatenated with a zero matrix?
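
A possible reading, offered as a sketch rather than as the author's explanation: without side information each node gets a one-hot ID, and padding with zeros puts users and items into one shared input space of width num_users + num_items, which is what a weight matrix shared between users and items would require. The sizes below are hypothetical.

```python
import numpy as np
import scipy.sparse as sp

num_users, num_items = 3, 4  # hypothetical sizes

# One-hot ID features for each node type.
u_id = sp.identity(num_users, format='csr')
v_id = sp.identity(num_items, format='csr')

# Users occupy the first num_users columns, items the remaining num_items.
u_features = sp.hstack([u_id, sp.csr_matrix((num_users, num_items))], format='csr')
v_features = sp.hstack([sp.csr_matrix((num_items, num_users)), v_id], format='csr')

# Stacked together, the two blocks form one identity over all nodes.
all_nodes = sp.vstack([u_features, v_features]).toarray()
print(u_features.shape, v_features.shape)
```

Seen this way, the identity-plus-zero layout is simply a unique one-hot vector per node, with no user ID colliding with an item ID.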

error: Could not find suitable distribution for Requirement.parse('cPickle')

Solution: Remove line 16 from setup.py, i.e. replace:

      install_requires=['numpy',
                        'tensorflow',
                        'scipy',
                        'pandas',
                        'cPickle',
                        'h5py'
                        ],

with:

      install_requires=['numpy',
                        'tensorflow',
                        'scipy',
                        'pandas',
                        'h5py'
                        ],

We will push a fix soon.
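
For context: cPickle is a Python 2 standard-library module, not a package on PyPI, which is why it cannot be resolved as an install requirement. If the code also has to run on Python 3, a common compatibility pattern is sketched below; the `payload` dictionary is only an illustrative value.

```python
try:
    import cPickle as pickle  # Python 2: C-accelerated pickle
except ImportError:
    import pickle  # Python 3: the accelerated version is used automatically

# Illustrative round trip to confirm the module behaves the same either way.
payload = {'num_users': 2341, 'num_items': 2956}
restored = pickle.loads(pickle.dumps(payload))
print(restored)
```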

Adding side features to train_mini_batch.py

Hi,

Thank you for the code, I read the paper and I am interested in applying this model to a large scale bipartite graph with side features for each node. I was able to get "train_mini_batch.py" up and running on movielens 10 million. I am interested in modifying the code to incorporate side features.

Do you think this is feasible? I have been looking at "train.py" and as far as I can see, I should just be able to treat the side features as additional sparse matrices. In batch mode, I can look up the required rows of the side feature matrices analogous to train_u_indices_batch on line 284 of "train_mini_batch.py". Would I have to make any internal modifications to RecommenderSideInfoGAE?

Do you think this is feasible?

Thank You,

Kuhan
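
The row lookup described above can be sketched as follows. The feature matrix here is random and its name `u_features_side` is hypothetical; only `train_u_indices_batch` is taken from the question. Whether RecommenderSideInfoGAE needs internal changes is not settled by this sketch.

```python
import numpy as np
import scipy.sparse as sp

num_users, num_side = 6, 3  # hypothetical sizes

# Hypothetical sparse side-feature matrix, one row per user.
u_features_side = sp.random(num_users, num_side, density=0.5,
                            format='csr', random_state=0)

# User indices of the ratings in the current batch (repeats are allowed).
train_u_indices_batch = np.array([4, 1, 4])

# Fancy row indexing on a CSR matrix returns one row per batch entry.
u_side_batch = u_features_side[train_u_indices_batch]
print(u_side_batch.shape)
```

CSR is the natural format here because row slicing is cheap; a CSC matrix would need conversion first.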

test data generated from training data?

Hi Rianne,

I have a question regarding the support matrix data. From the code, it seems you use the training rating matrix as the full dataset when generating the test support matrix as well.

The rating support matrix rating_mx_train is generated from the training rating data (in testing mode, it contains the training and validation data).

gc-mc/gcmc/preprocessing.py

Lines 191 to 193 in 722f37d

    
           rating_mx_train = np.zeros(num_users * num_items, dtype=np.float32) 
        
           rating_mx_train[train_idx] = labels[train_idx].astype(np.float32) + 1. 
        
           rating_mx_train = sp.csr_matrix(rating_mx_train.reshape(num_users, num_items))

support matrix is generated from the adj_train which is the rating_mx_train

gc-mc/gcmc/train.py

Lines 203 to 216 in 722f37d

    
           adj_train_int = sp.csr_matrix(adj_train, dtype=np.int32) 
        
           for i in range(NUMCLASSES): 
        
               # build individual binary rating matrices (supports) for each rating 
        
               support_unnormalized = sp.csr_matrix(adj_train_int == i + 1, dtype=np.float32) 
        
               if support_unnormalized.nnz == 0 and DATASET != 'yahoo_music': 
        
                   # yahoo music has dataset split with not all ratings types present in training set. 
        
                   # this produces empty adjacency matrices for these ratings. 
        
                   sys.exit('ERROR: normalized bipartite adjacency matrix has only zero entries!!!!!') 
        
               support_unnormalized_transpose = support_unnormalized.T 
        
               support.append(support_unnormalized) 
        
               support_t.append(support_unnormalized_transpose)

But then 'test_support' is extracted from 'support'.

gc-mc/gcmc/train.py

Line 246 in 722f37d

test_support = support[np.array(test_u)]

Shouldn't we change line 192 to
rating_mx_train[idx_nonzero] = labels[idx_nonzero].astype(np.float32) + 1.0
so that rating_mx_train contains all rating data?

gc-mc/gcmc/preprocessing.py

Lines 191 to 193 in 722f37d

    
           rating_mx_train = np.zeros(num_users * num_items, dtype=np.float32) 
        
           rating_mx_train[train_idx] = labels[train_idx].astype(np.float32) + 1. 
        
           rating_mx_train = sp.csr_matrix(rating_mx_train.reshape(num_users, num_items))

Questions On feed_dict

Adding dropout

Where is the augmented adjacency matrix in `global_normalize_bipartite_adjacency`

In the implementation of global_normalize_bipartite_adjacency function,
there is a comment

gc-mc/gcmc/preprocessing.py

Line 79 in 722f37d

# degree_u and degree_v are row and column sums of adj+I

But I don't see the augmented adjacency matrix (adj+I) in or before the function.

Where is adj+I?

Why does it matter?
If you only use adj with the different ratings, then when aggregating features from a node's neighbors (for users these are items, and vice versa), the new feature carries no information about the node itself.
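
To make the concern concrete, here is a minimal sketch of a symmetric degree normalisation on a plain bipartite adjacency matrix without self-loops. It is written for illustration only and is not a copy of `global_normalize_bipartite_adjacency`; the toy matrix and feature values are hypothetical.

```python
import numpy as np
import scipy.sparse as sp

# Hypothetical toy graph: 2 users x 3 items, no self-loops anywhere.
adj = sp.csr_matrix(np.array([[1, 0, 1],
                              [0, 1, 0]], dtype=np.float32))

degree_u = np.asarray(adj.sum(1)).flatten()  # row sums: one per user
degree_v = np.asarray(adj.sum(0)).flatten()  # column sums: one per item

# Nodes without edges get degree inf so that 1/sqrt(degree) becomes 0.
degree_u[degree_u == 0.] = np.inf
degree_v[degree_v == 0.] = np.inf

d_u = sp.diags(1. / np.sqrt(degree_u))
d_v = sp.diags(1. / np.sqrt(degree_v))
adj_norm = d_u.dot(adj).dot(d_v)  # D_u^{-1/2} A D_v^{-1/2}

# Aggregation for users reads item features only.
item_feat = np.eye(3, dtype=np.float32)
user_hidden = adj_norm.dot(item_feat)
print(user_hidden)
```

In this sketch `user_hidden` is a function of the item features alone, which is exactly the point raised above: without an adj+I term or a separate path for a node's own features, the aggregated representation does not see the node itself.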

normalize_features(feat)?

Your code in preprocessing.py, line 17:
degree = np.asarray(feat.sum(1)).flatten()
If I understand correctly, this sums horizontally over the feature matrix and then uses the inverse as a multiplier. To me, summing over axis=0 makes more sense?
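
The difference between the two choices can be seen on a small example. This sketch uses a hypothetical feature matrix and is not the body of `normalize_features`: `sum(1)` gives one value per node (row), so dividing by it makes each node's feature vector sum to 1, whereas `sum(0)` gives one value per feature (column) and rescales each feature across all nodes.

```python
import numpy as np
import scipy.sparse as sp

# Hypothetical features: 3 nodes x 2 feature dimensions.
feat = sp.csr_matrix(np.array([[1., 3.],
                               [2., 2.],
                               [0., 4.]]))

# Row normalisation: one degree per node, applied from the left.
row_sum = np.asarray(feat.sum(1)).flatten()
row_norm = sp.diags(1. / row_sum).dot(feat)

# Column normalisation: one total per feature, applied from the right.
col_sum = np.asarray(feat.sum(0)).flatten()
col_norm = feat.dot(sp.diags(1. / col_sum))

print(row_norm.toarray())
print(col_norm.toarray())
```

Which one is preferable depends on the features: row normalisation keeps every node's input on the same scale, while column normalisation equalises the scale of the individual features.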

How to apply multiple layers for the mini-batch version?

Hello,
I have a question about the mini-batch code. The first GCN layer outputs embeddings only for the batch, but the support matrix needs to be multiplied with the full-size embeddings. That is, the output embeddings of the first GCN layer cannot satisfy the requirements of the operations in the second GCN layer. How should we solve this problem? I hope to hear from you soon. Thank you.
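
One possible approach, sketched here with plain NumPy and not taken from the repository, is to expand the batch to its neighbourhood: the second layer for a batch of users only needs first-layer embeddings of the items those users rated, and those in turn can be computed from the raw user features. All names, sizes and the simplified layer (ReLU of adjacency times features times weights) are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
num_users, num_items, d_in, d_hid = 6, 5, 4, 3  # hypothetical sizes

A = (rng.random((num_users, num_items)) < 0.4).astype(np.float64)  # toy adjacency
X_u = rng.normal(size=(num_users, d_in))  # raw user features
W1 = rng.normal(size=(d_in, d_hid))
W2 = rng.normal(size=(d_hid, d_hid))


def relu(x):
    return np.maximum(x, 0.)


# Full-graph reference: layer 1 on all items, layer 2 on all users.
H1_v = relu(A.T @ X_u @ W1)
H2_u_full = relu(A @ H1_v @ W2)

# Mini-batch version: only compute what the batch users actually touch.
batch_u = np.array([0, 3])
items_needed = np.flatnonzero(A[batch_u].sum(axis=0))  # neighbours of the batch
H1_v_needed = relu(A.T[items_needed] @ X_u @ W1)  # layer 1 for those items only
H2_u_batch = relu(A[batch_u][:, items_needed] @ H1_v_needed @ W2)

print(np.allclose(H2_u_batch, H2_u_full[batch_u]))
```

The cost is that the set of nodes needed grows with every additional layer, so on dense graphs this can approach a full-graph pass; sampling a fixed number of neighbours per node is a common way to bound it.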

