
gpt-gnn's Introduction

Hi, welcome to my GitHub 👋 I am Ziniu Hu

I am currently a Postdoctoral Fellow at Caltech CMS and a part-time researcher at Google DeepMind.

My recent research focuses on Large Language Models (LLMs), including agents (tool use and memory), planning and reasoning (especially on math, code, games, and the visual world), and self-improvement.

gpt-gnn's People

Contributors

acbull · dependabot[bot] · keytoyze · zheng-da


gpt-gnn's Issues

pretrain_OAG.py Bug

Hi, I am running the file 'pretrain_OAG.py' but hit the following bug after a few iterations. I haven't been able to figure out how to solve it; do you have any idea? Thank you!

Start Pretraining...
Data Preparation: 73.9s
Epoch: 1, (1 / 266) 41.3s LR: 0.00005 Train Loss: (5.224, 10.440) Valid Loss: (5.086, 10.286) NDCG: 0.273 Norm: 0.604 queue: 12
UPDATE!!!
Data Preparation: 21.6s
Epoch: 1, (2 / 266) 40.3s LR: 0.00006 Train Loss: (4.914, 10.121) Valid Loss: (4.820, 9.884) NDCG: 0.361 Norm: 0.660 queue: 12
UPDATE!!!
Data Preparation: 22.7s
Epoch: 1, (3 / 266) 40.5s LR: 0.00007 Train Loss: (4.821, 9.512) Valid Loss: (4.682, 8.894) NDCG: 0.374 Norm: 0.729 queue: 12
UPDATE!!!
Data Preparation: 22.2s
Epoch: 1, (4 / 266) 40.5s LR: 0.00007 Train Loss: (4.712, 8.381) Valid Loss: (4.597, 7.592) NDCG: 0.362 Norm: 0.841 queue: 12
UPDATE!!!
Data Preparation: 22.8s
Epoch: 1, (5 / 266) 40.8s LR: 0.00008 Train Loss: (4.673, 7.576) Valid Loss: (4.740, 7.292) NDCG: 0.354 Norm: 0.905 queue: 12
UPDATE!!!
Data Preparation: 21.2s
Epoch: 1, (6 / 266) 40.6s LR: 0.00009 Train Loss: (4.560, 7.215) Valid Loss: (4.421, 6.747) NDCG: 0.361 Norm: 0.991 queue: 12
UPDATE!!!
Data Preparation: 28.1s
Epoch: 1, (7 / 266) 40.7s LR: 0.00010 Train Loss: (4.552, 6.979) Valid Loss: (4.371, 6.690) NDCG: 0.382 Norm: 1.057 queue: 12
UPDATE!!!
Data Preparation: 22.1s
Epoch: 1, (8 / 266) 40.4s LR: 0.00011 Train Loss: (4.519, 6.856) Valid Loss: (4.848, 6.588) NDCG: 0.348 Norm: 1.117 queue: 12
Data Preparation: 22.0s
Epoch: 1, (9 / 266) 40.1s LR: 0.00012 Train Loss: (4.421, 6.804) Valid Loss: (4.393, 6.605) NDCG: 0.383 Norm: 1.147 queue: 12
UPDATE!!!
Data Preparation: 25.9s
Epoch: 1, (10 / 266) 40.0s LR: 0.00013 Train Loss: (4.369, 6.741) Valid Loss: (4.654, 6.518) NDCG: 0.361 Norm: 1.180 queue: 12
Data Preparation: 22.3s
/opt/conda/conda-bld/pytorch_1570710743984/work/aten/src/THC/THCTensorScatterGather.cu:130: void THCudaTensor_scatterKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 2]: block: [12,0,0], thread: [328,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed.
(the same assertion repeats for threads [329,0,0] through [335,0,0] of block [12,0,0] and threads [16,0,0] through [23,0,0] of block [0,0,0])
Traceback (most recent call last):
  File "pretrain_OAG.py", line 262, in <module>
    loss.backward()
  File "/opt/conda/lib/python3.6/site-packages/torch/tensor.py", line 150, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/opt/conda/lib/python3.6/site-packages/torch/autograd/__init__.py", line 99, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: copy_if failed to synchronize: device-side assert triggered
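A common first step for this class of failure (generic PyTorch debugging advice, not specific to this repo) is to make CUDA errors synchronous, so the traceback points at the op that actually failed, and then to bounds-check the suspect index tensor on the CPU. A minimal sketch follows; check_scatter_indices is a hypothetical helper, not part of GPT-GNN:

    # Make CUDA kernel launches synchronous so the failing op raises in place.
    # Must be set before CUDA is initialized (or export it in the shell).
    import os
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

    import torch

    def check_scatter_indices(index: torch.Tensor, dim_size: int) -> None:
        # The device-side assert fires when an index is < 0 or >= the target
        # tensor's size along the scatter dimension; this reproduces the same
        # check on CPU with a readable error message.
        bad = (index < 0) | (index >= dim_size)
        if bad.any():
            raise ValueError(
                f"{int(bad.sum())} out-of-range indices for dim of size {dim_size}: "
                f"min={int(index.min())}, max={int(index.max())}"
            )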

Results of example data

Hi, could you please provide the result of GPT-GNN on the OAG-CS data? I'm wondering what the expected result is if I run your code directly on the example data you provided. It would be great if a result for the baseline (no pre-training) could also be provided. Thank you!

Can't download data

Because of network restrictions, I can't download the data from google.com. Would you mind showing the data structure? Thank you very much.

About the training time

Hi, thank you for your excellent work. However, when I run the code with the default settings for pre-training on OAG_CS, each epoch takes much longer than you reported: around 40 minutes per epoch, i.e. 40 * 400 / 60 ≈ 266.7 hours for 400 epochs, which is far longer than the 12 hours stated in the paper. My machine has a Tesla P100 and 8 Xeon E5-2690 v4 CPUs. How can I solve this problem?

The following is the log

+--------------------+-------------------------------------------+
| Parameter          | Value                                     |
+--------------------+-------------------------------------------+
| attr_ratio         | 0.500                                     |
+--------------------+-------------------------------------------+
| attr_type          | text                                      |
+--------------------+-------------------------------------------+
| neg_samp_num       | 255                                       |
+--------------------+-------------------------------------------+
| queue_size         | 256                                       |
+--------------------+-------------------------------------------+
| w2v_dir            | /data/data0/gjy/dataset/OAG/w2v_all       |
+--------------------+-------------------------------------------+
| data_dir           | /data/data0/gjy/dataset/OAG/graph_CS.pk   |
+--------------------+-------------------------------------------+
| pretrain_model_dir | /data/data0/gjy/GPT-GNN/saved/OAG/gnn.pkl |
+--------------------+-------------------------------------------+
| cuda               | 7                                         |
+--------------------+-------------------------------------------+
| sample_depth       | 3                                         |
+--------------------+-------------------------------------------+
| sample_width       | 128                                       |
+--------------------+-------------------------------------------+
| conv_name          | hgt                                       |
+--------------------+-------------------------------------------+
| n_hid              | 400                                       |
+--------------------+-------------------------------------------+
| n_heads            | 8                                         |
+--------------------+-------------------------------------------+
| n_layers           | 3                                         |
+--------------------+-------------------------------------------+
| prev_norm          | 1                                         |
+--------------------+-------------------------------------------+
| last_norm          | 1                                         |
+--------------------+-------------------------------------------+
| dropout            | 0.200                                     |
+--------------------+-------------------------------------------+
| max_lr             | 0.001                                     |
+--------------------+-------------------------------------------+
| scheduler          | cycle                                     |
+--------------------+-------------------------------------------+
| n_epoch            | 200                                       |
+--------------------+-------------------------------------------+
| n_pool             | 8                                         |
+--------------------+-------------------------------------------+
| n_batch            | 32                                        |
+--------------------+-------------------------------------------+
| batch_size         | 256                                       |
+--------------------+-------------------------------------------+
| clip               | 0.500                                     |
+--------------------+-------------------------------------------+
cuda:7
Start Loading Graph Data...
Finish Loading Graph Data!
paper PP_cite
paper rev_PP_cite
venue rev_PV_Conference
venue rev_PV_Journal
field rev_PF_in_L3
field rev_PF_in_L1
field rev_PF_in_L2
field rev_PF_in_L4
author AP_write_last
author AP_write_other
author AP_write_first
Start Pretraining...
Data Preparation: 68.7s
Epoch: 1, (1 / 41) 45.3s LR: 0.00005 Train Loss: (4.773, 9.771) Valid Loss: (4.762, 8.815) NDCG: 0.314 Norm: 20.012 queue: 1
UPDATE!!!
Data Preparation: 57.1s
Epoch: 1, (2 / 41) 40.3s LR: 0.00005 Train Loss: (4.594, 8.514) Valid Loss: (4.532, 7.968) NDCG: 0.353 Norm: 20.025 queue: 1
UPDATE!!!
Data Preparation: 29.7s
Epoch: 1, (3 / 41) 38.4s LR: 0.00006 Train Loss: (4.469, 7.768) Valid Loss: (4.628, 7.167) NDCG: 0.359 Norm: 20.035 queue: 1
UPDATE!!!
Data Preparation: 17.0s
Epoch: 1, (4 / 41) 36.8s LR: 0.00006 Train Loss: (4.426, 7.283) Valid Loss: (4.453, 6.991) NDCG: 0.367 Norm: 20.043 queue: 1
UPDATE!!!
Data Preparation: 13.0s
Epoch: 1, (5 / 41) 36.8s LR: 0.00007 Train Loss: (4.375, 7.060) Valid Loss: (4.509, 6.793) NDCG: 0.365 Norm: 20.047 queue: 1
UPDATE!!!
Data Preparation: 12.3s

How to create the node permutation?

Hi @acbull, thank you so much for sharing the code. I have a question about the graph preprocessing: how is the order (permutation) of nodes in a sampled subgraph determined? What criterion do we refer to? :-)

The experimental result on ogbn-mag

Hello, this is nice work. However, I wonder why you didn't run experiments on the ogbn-mag dataset.
I have tried to modify the code myself, but it is quite hard. Could you run experiments on ogbn-mag and share the results?
Thanks a lot.

Can I fine-tune directly on the GPT-GNN model?

hi @acbull
[screenshot from the paper]

Does that mean I must use some categories for pre-training and then use other categories for fine-tuning?
I wonder whether I can fine-tune the GPT-GNN model directly, like fine-tuning a pre-trained BERT: just adapt the data to the expected input format and then fine-tune on the downstream task.

thanks.

How to generate a new graph?

I want to generate new graphs from a given input graph. I checked the repo but am still confused about how to proceed. For a given input graph, my objective is to generate a new graph with node attributes and an adjacency matrix. Could you please show me how to proceed?

AttributeError in preprocess_reddit.py

Hi acbull,
first of all, thanks for your amazing work.
When I use preprocess_reddit.py, I get an error at line 19:

  • x = np.concatenate((dataset.data.x.numpy(), np.log(degree).reshape(-1, 1)), axis=-1)

  • AttributeError: 'NoneType' object has no attribute 'numpy'

and print(dataset.data.x) returns None.
I'd like to know how to fix this. I also have another question: which part of your code implements the Adaptive Queue from the paper, and what does "budget" mean?
Thank you!
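Since dataset.data.x comes back None, the download itself is suspect; below is a minimal check (generic PyTorch Geometric usage, not the repo's code, with a hypothetical ./reddit root) to confirm the Reddit dataset actually has node features. A None x usually means a corrupted or partial download, so re-downloading into a fresh root directory is the first thing to try.

    from torch_geometric.datasets import Reddit

    dataset = Reddit(root='./reddit')   # hypothetical root path
    data = dataset[0]
    assert data.x is not None, "node features missing: delete ./reddit and re-download"
    print(data.x.shape)                 # roughly [232965, 602] for Reddit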

attr_loss is a negative value

Thank you for your work on GPT-GNN. I am very interested in this paper and am currently reproducing your code. When I run your file example_reddit/pretrain_reddit.py, train_loss consists of two parts, link_loss and attr_loss. However, attr_loss is negative, and during training it keeps getting smaller and smaller, as shown below.

Epoch: 1, (1 / 19) 31.7s LR: 0.00088 Train Loss: (5.198, -1.370) Valid Loss: (4.632, -2.647) NDCG: 0.331 Norm: 3.278 queue: 85
UPDATE!!!
Data Preparation: 2.2s
Epoch: 1, (2 / 19) 22.3s LR: 0.00099 Train Loss: (4.473, -2.519) Valid Loss: (4.257, -3.451) NDCG: 0.398 Norm: 3.231 queue: 85
UPDATE!!!
Data Preparation: 2.4s
Epoch: 1, (3 / 19) 23.9s LR: 0.00097 Train Loss: (4.220, -2.910) Valid Loss: (4.169, -3.735) NDCG: 0.421 Norm: 4.161 queue: 85
UPDATE!!!
Data Preparation: 2.4s
Epoch: 1, (4 / 19) 25.1s LR: 0.00095 Train Loss: (4.041, -3.255) Valid Loss: (3.664, -4.088) NDCG: 0.529 Norm: 5.624 queue: 85
UPDATE!!!
Data Preparation: 2.5s
Epoch: 1, (5 / 19) 23.1s LR: 0.00093 Train Loss: (3.916, -3.580) Valid Loss: (3.703, -4.224) NDCG: 0.489 Norm: 5.439 queue: 85
UPDATE!!!
Data Preparation: 2.3s
Epoch: 1, (6 / 19) 21.1s LR: 0.00092 Train Loss: (3.839, -3.743) Valid Loss: (3.718, -4.373) NDCG: 0.499 Norm: 7.101 queue: 85
UPDATE!!!
Data Preparation: 5.7s
Epoch: 1, (7 / 19) 21.5s LR: 0.00090 Train Loss: (3.795, -3.839) Valid Loss: (3.806, -4.290) NDCG: 0.481 Norm: 6.792 queue: 85
Data Preparation: 2.9s
Epoch: 1, (8 / 19) 21.5s LR: 0.00088 Train Loss: (3.764, -3.905) Valid Loss: (3.626, -4.551) NDCG: 0.517 Norm: 7.914 queue: 85
UPDATE!!!
Data Preparation: 3.4s
Epoch: 1, (9 / 19) 21.4s LR: 0.00086 Train Loss: (3.790, -4.006) Valid Loss: (3.527, -4.650) NDCG: 0.518 Norm: 8.545 queue: 85
UPDATE!!!
Data Preparation: 4.9s
Epoch: 1, (10 / 19) 21.5s LR: 0.00085 Train Loss: (3.789, -4.053) Valid Loss: (3.567, -4.608) NDCG: 0.519 Norm: 7.481 queue: 85
Data Preparation: 4.1s
Epoch: 1, (11 / 19) 20.7s LR: 0.00083 Train Loss: (3.667, -4.089) Valid Loss: (3.434, -4.530) NDCG: 0.543 Norm: 8.159 queue: 85
Data Preparation: 4.8s
Epoch: 1, (12 / 19) 21.8s LR: 0.00081 Train Loss: (3.719, -4.171) Valid Loss: (3.316, -4.752) NDCG: 0.564 Norm: 8.720 queue: 85
UPDATE!!!
Data Preparation: 2.8s
Epoch: 1, (13 / 19) 20.7s LR: 0.00079 Train Loss: (3.656, -4.219) Valid Loss: (3.577, -4.913) NDCG: 0.521 Norm: 8.388 queue: 85
Data Preparation: 6.1s
Epoch: 1, (14 / 19) 25.3s LR: 0.00078 Train Loss: (3.630, -4.223) Valid Loss: (3.508, -4.865) NDCG: 0.527 Norm: 8.830 queue: 85
Data Preparation: 13.7s
Epoch: 1, (15 / 19) 22.7s LR: 0.00076 Train Loss: (3.666, -4.272) Valid Loss: (3.507, -4.825) NDCG: 0.530 Norm: 9.109 queue: 85
Data Preparation: 2.5s
Epoch: 1, (16 / 19) 22.2s LR: 0.00074 Train Loss: (3.648, -4.333) Valid Loss: (3.511, -5.162) NDCG: 0.543 Norm: 8.926 queue: 85
UPDATE!!!
Data Preparation: 2.9s
Epoch: 1, (17 / 19) 22.2s LR: 0.00073 Train Loss: (3.608, -4.351) Valid Loss: (3.555, -4.808) NDCG: 0.515 Norm: 9.351 queue: 85
Data Preparation: 2.4s
Epoch: 1, (18 / 19) 22.1s LR: 0.00071 Train Loss: (3.621, -4.367) Valid Loss: (3.386, -5.186) NDCG: 0.555 Norm: 9.608 queue: 85
UPDATE!!!

Apart from reducing sample_depth and sample_width to fit in memory, I have not made any other modifications to your code. May I ask whether this behavior of attr_loss is normal? Looking forward to your answer.

example_reddit can't run

When I run preprocess_reddit.py, I get an error that the download link is broken: urllib.error.HTTPError: HTTP Error 404: Not Found. How can I solve this?

time_range for the finetuning experiment

Thanks again for this awesome repo; it helps me a lot. I've got a question regarding which time_range to use when sampling subgraphs for testing. For example, in finetune_OAG_PF.py, this line is used to prepare the input to the GNN:

node_feature, node_type, edge_time, edge_index, edge_type, x_ids, ylabel =  node_classification_sample(randint(), test_pairs, test_range)

where test_range is used to filter out nodes when sampling the subgraph, as shown at L128 in data.py:

if source_time > np.max(list(time_range.keys())) or source_id in layer_data[source_type]:
    continue

It looks like some test edges (which are not the prediction targets for the current batch but might be the prediction targets for other batches) can still be included in the sampled subgraph, even after the masking process at line 114 in finetune_OAG_PF.py:

    '''
        (3) Mask out the edge between the output target nodes (paper) with output source nodes (L2 field)
    '''
    masked_edge_list = []
    for i in edge_list['paper']['field']['rev_PF_in_L2']:
        if i[0] >= args.batch_size:
            masked_edge_list += [i]
    edge_list['paper']['field']['rev_PF_in_L2'] = masked_edge_list

    masked_edge_list = []
    for i in edge_list['field']['paper']['PF_in_L2']:
        if i[1] >= args.batch_size:
            masked_edge_list += [i]
    edge_list['field']['paper']['PF_in_L2'] = masked_edge_list

I'm not sure how this impacts the evaluation. Looking forward to your feedback; one possible stricter mask is sketched below.
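Not a confirmed fix, just a sketch of the stricter mask in question: drop every test paper-field pair rather than only the current batch's targets. The hard part, which the snippet assumes away, is mapping all test pairs into the subgraph's local ids (the hypothetical test_pair_set below):

    # Sketch only: mask *all* test-target edges, not just the current batch's.
    # Assumes test_pair_set holds (paper_local_id, field_local_id) tuples for
    # every test pair mapped into this subgraph's local ids; building that
    # mapping is the part finetune_OAG_PF.py does not currently do.
    edge_list['paper']['field']['rev_PF_in_L2'] = [
        i for i in edge_list['paper']['field']['rev_PF_in_L2']
        if (i[0], i[1]) not in test_pair_set
    ]
    edge_list['field']['paper']['PF_in_L2'] = [
        i for i in edge_list['field']['paper']['PF_in_L2']
        if (i[1], i[0]) not in test_pair_set
    ]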

Error when running example_reddit/pretrain_reddit.py on CPU

When I run example_reddit/pretrain_reddit.py on CPU, I encounter the error below. Any advice is welcome.

...
File "/usr/local/lib/python3.5/dist-packages/pandas/core/generic.py", line 5063, in getattr
return object.getattribute(self, name)
File "pandas/_libs/properties.pyx", line 65, in pandas._libs.properties.AxisProperty.get
File "/usr/local/lib/python3.5/dist-packages/pandas/core/generic.py", line 5063, in getattr
return object.getattribute(self, name)
RecursionError: maximum recursion depth exceeded while calling a Python object

About the format of the data set?

hi @acbull,
my case is that I need to build a graph from a sample of the training data, not from the whole training set;
that is to say, I use the internal elements of the sample to construct a heterogeneous graph.
I was wondering if I can fine-tune GPT-GNN on it and then do a classification task downstream?

thanks!!!

'node_emb' & 'emb' in graph.node_features

Hi,

Could you tell me how you obtained the 'node_emb' and 'emb' entries in graph.node_features?

I have checked the OAG dataset, and it seems 'emb' was already added to the dataset before any training. Maybe I missed something in the paper, but could you give me some hints?

Thanks in advance

embeddings

IndexError: "index out of range in self" in training on custom dataset

Hi, I was trying to use HGTConv on a custom graph with 5 different node types, but I kept running into IndexError: index out of range in self when node_type contains only the target node type.

Error messages:
node_type = tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1..., 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1])
node_type.shape = (129,)
edge_index = tensor([[ 0, 1, 14, 3, 4, 5, 6, 7,...82, 184, 188, 189, 189,
190, 190, 192]])
edge_index.shape = (2, 353)

IndexError: index out of range in self
When calling: self.propagate(edge_index, node_inp=node_inp, node_type=node_type,
edge_type=edge_type, edge_time = edge_time)
Call ended by exception
meta_xs = gc(meta_xs, node_type_id, edge_index, edge_type, edge_time)
IndexError: index out of range in self
When calling: gc(meta_xs, node_type_id, edge_index, edge_type, edge_time)
Call ended by exception

I was looking at pyg-team/pytorch_geometric#2073, where the suggestion is that removing the cached=True argument from the GCNConv layer can solve the index error,

and at pyg-team/pytorch_geometric#1631, which suggests setting add_self_loops=False in GATConv(..., add_self_loops=False), but there is no such argument in HGTConv.
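For what it's worth, the numbers in the log above already hint at the cause: node_type has 129 entries while edge_index contains ids up to 192, so propagate indexes past the node tensor. A quick sanity check one can run before calling the conv (a generic sketch, not a repo function):

    import torch

    def check_edge_index(edge_index: torch.Tensor, num_nodes: int) -> None:
        # "index out of range in self" inside propagate() usually means
        # edge_index references node ids >= the number of node rows.
        mx = int(edge_index.max())
        if mx >= num_nodes:
            raise ValueError(
                f"edge_index references node {mx} but only {num_nodes} nodes "
                "exist; re-index the subgraph or include all referenced nodes"
            )

    # e.g. check_edge_index(edge_index, node_type.size(0))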

running time on OAG_CS dataset

Hi, thanks for providing the awesome code of GPT-GNN.

I am trying to run your code on the OAG_CS dataset, but I am not sure I've got it right. In the paper, the reported pre-training time is about 10-12 hours for 400 epochs, while it takes much longer on my side. I wonder if you could specify the computational resource requirements; for example, how many CPUs do I need to achieve a pre-training time of 10 hours? I attached the output of my run below.

+--------------------+-------------------------------+
| Parameter          | Value                         |
+--------------------+-------------------------------+
| attr_ratio         | 0.500                         |
+--------------------+-------------------------------+
| attr_type          | text                          |
+--------------------+-------------------------------+
| neg_samp_num       | 255                           |
+--------------------+-------------------------------+
| queue_size         | 256                           |
+--------------------+-------------------------------+
| w2v_dir            | ./data/oag_output/w2v_all     |
+--------------------+-------------------------------+
| data_dir           | ./data/oag_output/graph_CS.pk |
+--------------------+-------------------------------+
| pretrain_model_dir | ./tmp/model/gta_all_cs3       |
+--------------------+-------------------------------+
| cuda               | 0                             |
+--------------------+-------------------------------+
| sample_depth       | 6                             |
+--------------------+-------------------------------+
| sample_width       | 128                           |
+--------------------+-------------------------------+
| conv_name          | hgt                           |
+--------------------+-------------------------------+
| n_hid              | 400                           |
+--------------------+-------------------------------+
| n_heads            | 8                             |
+--------------------+-------------------------------+
| n_layers           | 3                             |
+--------------------+-------------------------------+
| prev_norm          | 0                             |
+--------------------+-------------------------------+
| last_norm          | 0                             |
+--------------------+-------------------------------+
| dropout            | 0.200                         |
+--------------------+-------------------------------+
| max_lr             | 0.001                         |
+--------------------+-------------------------------+
| scheduler          | cycle                         |
+--------------------+-------------------------------+
| n_epoch            | 20                            |
+--------------------+-------------------------------+
| n_pool             | 8                             |
+--------------------+-------------------------------+
| n_batch            | 32                            |
+--------------------+-------------------------------+
| batch_size         | 256                           |
+--------------------+-------------------------------+
| clip               | 0.500                         |
+--------------------+-------------------------------+
Start Loading Graph Data...
Finish Loading Graph Data!
paper PP_cite
paper rev_PP_cite
venue rev_PV_Conference
venue rev_PV_Journal
field rev_PF_in_L3
field rev_PF_in_L1
field rev_PF_in_L2
field rev_PF_in_L4
author AP_write_last
author AP_write_other
author AP_write_first
Start Pretraining...
Data Preparation: 80.1s
Epoch: 1, (1 / 41) 55.9s  LR: 0.00010 Train Loss: (5.129, 10.292)  Valid Loss: (5.082, 9.933)  NDCG: 0.306  Norm: 0.666  queue: 12
UPDATE!!!
Data Preparation: 23.3s
Epoch: 1, (2 / 41) 45.0s  LR: 0.00015 Train Loss: (4.877, 9.236)  Valid Loss: (4.861, 8.130)  NDCG: 0.320  Norm: 0.950  queue: 12
UPDATE!!!
Data Preparation: 34.6s
Epoch: 1, (3 / 41) 40.1s  LR: 0.00021 Train Loss: (4.776, 7.650)  Valid Loss: (4.899, 6.895)  NDCG: 0.327  Norm: 1.243  queue: 12
UPDATE!!!
Data Preparation: 37.3s
Epoch: 1, (4 / 41) 42.5s  LR: 0.00027 Train Loss: (4.716, 6.930)  Valid Loss: (4.697, 6.571)  NDCG: 0.334  Norm: 1.493  queue: 12
UPDATE!!!
Data Preparation: 33.9s
Epoch: 1, (5 / 41) 40.8s  LR: 0.00032 Train Loss: (4.635, 6.624)  Valid Loss: (4.614, 6.290)  NDCG: 0.341  Norm: 1.950  queue: 12
UPDATE!!!
Data Preparation: 38.9s
Epoch: 1, (6 / 41) 45.2s  LR: 0.00038 Train Loss: (4.572, 6.470)  Valid Loss: (4.568, 6.386)  NDCG: 0.357  Norm: 2.481  queue: 12
Data Preparation: 30.5s
Epoch: 1, (7 / 41) 42.7s  LR: 0.00044 Train Loss: (4.438, 6.391)  Valid Loss: (4.501, 6.224)  NDCG: 0.371  Norm: 2.532  queue: 12
UPDATE!!!

About Title Generation

Hello, I read your paper and greatly admire your work.
The paper mentions in Appendix B: "won't leak the information to the autoregressive generative objective ...".
However, reading the source code left me confused: for text generation on the OAG data, you decode with an LSTM whose input is actually all of the ground-truth characters (embeddings) of the paper title. That is, every time the model predicts the i-th character, it is fed the first i-1 ground-truth characters. This differs from conventional autoregressive text generation at inference time; was this choice based on training difficulty?
Is it because the downstream tasks don't need text generation, so text generation is only used here as a pre-training objective to improve the model's learning of the graph structure and node-attribute text, and strict autoregression isn't required? (See the sketch below for the training scheme in question.)
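For context: feeding the ground-truth prefix at every step is standard teacher forcing when training autoregressive decoders. A minimal generic sketch of the scheme (an illustration, not GPT-GNN's exact decoder):

    import torch
    import torch.nn as nn

    # Teacher forcing: at training time the LSTM input at step i is the
    # ground-truth embedding of token i-1, and the loss targets token i.
    class TitleDecoder(nn.Module):
        def __init__(self, vocab_size, emb_dim, hid_dim):
            super().__init__()
            self.emb = nn.Embedding(vocab_size, emb_dim)
            self.lstm = nn.LSTM(emb_dim, hid_dim, batch_first=True)
            self.out = nn.Linear(hid_dim, vocab_size)

        def forward(self, tokens):               # tokens: [batch, seq_len]
            inp = self.emb(tokens[:, :-1])       # ground-truth prefix
            h, _ = self.lstm(inp)
            return self.out(h)                   # predicts tokens[:, 1:]

    dec = TitleDecoder(vocab_size=1000, emb_dim=64, hid_dim=128)
    tokens = torch.randint(0, 1000, (4, 12))
    loss = nn.functional.cross_entropy(
        dec(tokens).reshape(-1, 1000), tokens[:, 1:].reshape(-1))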

'IndexError: index out of range in self' raised when I use HGTConv with my own dataset

In my work, I build the heterogeneous graph first, then apply HGTConv as shown in the tutorial.

The custom KG is shown as:
HeteroData(
  symptom={ x=[39, 128] },
  component={ x=[19, 128] },
  reason={ x=[17, 128] },
  solution={ x=[18, 128] },
  (symptom, take_place, component)={ edge_index=[2, 38] },
  (symptom, cause_by, reason)={ edge_index=[2, 33] },
  (symptom, how_to_fit, solution)={ edge_index=[2, 33] },
  (component, component_parallel, component)={ edge_index=[2, 17] }
)

And my code is as follows:

class HGT(torch.nn.Module):
    def __init__(self, hidden_channels, out_channels, num_heads, num_layers):
        super().__init__()

        self.lin_dict = torch.nn.ModuleDict()
        for node_type in data_IKG.node_types:
            self.lin_dict[node_type] = Linear(-1, hidden_channels)

        self.convs = torch.nn.ModuleList()
        for _ in range(num_layers):
            conv = HGTConv(hidden_channels, hidden_channels, data_IKG.metadata(),
                           num_heads, group='sum')  # HGTConv takes no `cached` argument
            self.convs.append(conv)  # append inside the loop, one conv per layer

        self.lin = Linear(hidden_channels, out_channels)

    def forward(self, x_dict, edge_index_dict):
        for node_type, x in x_dict.items():
            x_dict[node_type] = self.lin_dict[node_type](x).relu_()

        for conv in self.convs:
            print(1)
            x_dict = conv(x_dict, edge_index_dict)
            print(2)
        return self.lin(x_dict['symptom'])

model_IKG = HGT(hidden_channels=64, out_channels=5,
                num_heads=1, num_layers=1)

with torch.no_grad():  # Initialize lazy modules.
    out = model_IKG(data_IKG.x_dict, data_IKG.edge_index_dict)

Then it raises:
Input In [324], in HGT.forward(self, x_dict, edge_index_dict)
29 for conv in self.convs:
30 print(1)
---> 31 x_dict = conv(x_dict, edge_index_dict)
32 print(2)
33 return self.lin(x_dict['symptom'])


Then the error message is:


File ~/.virtualenvs/xlq/lib/python3.8/site-packages/torch/nn/modules/module.py:1102, in Module._call_impl(self, *input, **kwargs)
1098 # If we don't have any hooks, we want to skip the rest of the logic in
1099 # this function, and just call forward.
1100 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1101 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1102 return forward_call(*input, **kwargs)
1103 # Do not call functions when jit is used
1104 full_backward_hooks, non_full_backward_hooks = [], []

File ~/.virtualenvs/xlq/lib/python3.8/site-packages/torch_geometric/nn/conv/hgt_conv.py:159, in HGTConv.forward(self, x_dict, edge_index_dict)
156 v = (v_dict[src_type].transpose(0, 1) @ m_rel).transpose(1, 0)
158 # propagate_type: (k: Tensor, q: Tensor, v: Tensor, rel: Tensor)
--> 159 out = self.propagate(edge_index, k=k, q=q_dict[dst_type], v=v,
160 rel=self.p_rel[edge_type], size=None)
161 out_dict[dst_type].append(out)
163 # Iterate over node-types:

File ~/.virtualenvs/xlq/lib/python3.8/site-packages/torch_geometric/nn/conv/message_passing.py:309, in MessagePassing.propagate(self, edge_index, size, **kwargs)
     306 for arg in decomp_args:
     307     kwargs[arg] = decomp_kwargs[arg][i]
---> 309 coll_dict = self.__collect__(self.__user_args__, edge_index,
     310                              size, kwargs)
     312 msg_kwargs = self.inspector.distribute('message', coll_dict)
     313 for hook in self._message_forward_pre_hooks.values():

File ~/.virtualenvs/xlq/lib/python3.8/site-packages/torch_geometric/nn/conv/message_passing.py:202, in MessagePassing.__collect__(self, args, edge_index, size, kwargs)
     200 if isinstance(data, Tensor):
     201     self.__set_size__(size, dim, data)
---> 202 data = self.__lift__(data, edge_index, dim)
     204 out[arg] = data
     206 if isinstance(edge_index, Tensor):

File ~/.virtualenvs/xlq/lib/python3.8/site-packages/torch_geometric/nn/conv/message_passing.py:172, in MessagePassing.__lift__(self, src, edge_index, dim)
     170 if isinstance(edge_index, Tensor):
     171     index = edge_index[dim]
---> 172 return src.index_select(self.node_dim, index)
     173 elif isinstance(edge_index, SparseTensor):
     174     if dim == 1:
IndexError: index out of range in self


Please help me fix this problem; much appreciated.
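As a first check, a sketch assuming a recent PyTorch Geometric version: HeteroData.validate() raises if any edge_index references a node id outside the corresponding node store, which is the usual cause of "index out of range in self" inside propagate().

    # Assumes a recent PyG release that provides validate().
    data_IKG.validate(raise_on_error=True)

    # Equivalent manual check per edge type:
    for edge_type, store in data_IKG.edge_items():
        src, _, dst = edge_type
        assert int(store.edge_index[0].max()) < data_IKG[src].num_nodes, edge_type
        assert int(store.edge_index[1].max()) < data_IKG[dst].num_nodes, edge_type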

pretrain_OAG.py hgt

Thanks for your great work. I tried to run pretrain_OAG.py and got this error:
Traceback (most recent call last):
  File "/home/tiaoban/chen_qian_yu/gptgnn/GPT-GNN/example_OAG/pretrain_OAG.py", line 228, in <module>
    node_emb = gpt_gnn.gnn(node_feature.to(device), node_type.to(device), edge_time.to(device), edge_index.to(device), edge_type.to(device))
  File "/home/tiaoban/anaconda3/envs/pyg/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/tiaoban/chen_qian_yu/gptgnn/GPT-GNN/example_OAG/GPT_GNN/model.py", line 191, in forward
    meta_xs = gc(meta_xs, node_type, edge_index, edge_type, edge_time)
  File "/home/tiaoban/anaconda3/envs/pyg/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/tiaoban/chen_qian_yu/gptgnn/GPT-GNN/example_OAG/GPT_GNN/conv.py", line 169, in forward
    return self.base_conv(meta_xs, node_type, edge_index, edge_type, edge_time)
  File "/home/tiaoban/anaconda3/envs/pyg/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/tiaoban/chen_qian_yu/gptgnn/GPT-GNN/example_OAG/GPT_GNN/conv.py", line 55, in forward
    edge_type=edge_type, edge_time=edge_time)
  File "/home/tiaoban/anaconda3/envs/pyg/lib/python3.7/site-packages/torch_geometric/nn/conv/message_passing.py", line 233, in propagate
    kwargs)
  File "/home/tiaoban/anaconda3/envs/pyg/lib/python3.7/site-packages/torch_geometric/nn/conv/message_passing.py", line 156, in __collect__
    self.__set_size__(size, dim, data)
  File "/home/tiaoban/anaconda3/envs/pyg/lib/python3.7/site-packages/torch_geometric/nn/conv/message_passing.py", line 118, in __set_size__
    size[dim] = src.size(self.node_dim)
IndexError: Dimension out of range (expected to be in range of [-1, 0], but got -2)

Are there any problems with pretrain_OAG.py? Thanks!

KeyError: 'emb'

Has anyone met this problem when running preprocess_OAG.py?

[screenshot of the error]

How to choose pre-training data

Hi @acbull, thank you for your wonderful work. I have a few questions about using this method on my own dataset (10,000 nodes). 1. How should I select the pre-training data; do I randomly sample 90% of the training set (a simple sketch of such a split is shown below)? 2. My dataset is in the form of triples (sub, rel, obj); in that case, do I still process the data with preprocess.py?

thanks!!
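On question 1, if random holdout is indeed the way to go (the 90% figure is the questioner's guess, not something the repo prescribes), the split itself is short:

    # Sketch of a random 90/10 split over (sub, rel, obj) triples; the ratio
    # and the data below are illustrative, not prescribed by the repo.
    import random

    triples = [("a", "r1", "b"), ("b", "r2", "c"), ("c", "r1", "a")]
    random.seed(42)
    random.shuffle(triples)
    cut = int(0.9 * len(triples))
    pretrain_triples, finetune_triples = triples[:cut], triples[cut:]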

attribute generation for pretrain_OAG

Thanks again for this awesome repo; it helps me a lot. But I have a question about attribute generation in pretrain_OAG.
When we generate the i-th attribute, node i+1 should not be visible. However, in pretrain_OAG all source-node attributes are left unmasked, so couldn't there be a situation where the i-th node receives information from node i+1?
For example:
the i-th node is a "paper" node, and the (i+1)-th node is another "paper" node which cites the i-th "paper" node.
[diagram]

How to generate vfi_vector.csv

Hello, I really appreciate your work, and I want to pre-train the model on my own dataset. But when preprocessing the data, I don't have the file 'vfi_vector.csv'. How can I generate 'vfi_vector.csv' for my own data?

About the experiment in GPT-GNN

Hi authors,
Thanks for your amazing work on pre-training GNNs; it makes larger graphs tractable for GNN models. I saw your code on GitHub and noticed that it includes pre-training and fine-tuning, and I have a question about the experiments in your paper.

  1. How did you conduct the experiments with GraphSAGE and GAE? If I split the data into 0-70% pre-train, 70-80% train, 80-90% valid, and 90-100% test, should I feed the 0-80% portion as training data to GraphSAGE and GAE?
    Looking forward to your reply, thank you!

About ablation studies on base GNNs

Hi,

Thank you for your good paper!

The caption of Tab. 2 says that the experiments included there are under the combined transfer setting, where the accuracy for GPT-GNN + HGT is 0.407:

[screenshot of Tab. 2]

However, according to Tab. 1, GPT-GNN + HGT under this setting achieves an accuracy of 0.393; rather, 0.407 is the result of GPT-GNN + HGT under the field transfer setting.

Could you please clarify the setting used in Tab.2?

Additionally, I am also wondering how you evaluated GAE, GraphSAGE (unsp.), and Graph Infomax in the experiments of Tab. 1. As far as I know, the GNN encoders used in those three papers are GCN, the GraphSAGE architecture, and GIN, which are designed to learn from homogeneous graphs. May I ask whether you also used these GNN encoders, i.e., GCN, GraphSAGE, and GIN, for the related experiments in Tab. 1? If so, could you please elaborate on how you applied these encoders to OAG and Amazon, which are heterogeneous graphs?

Thank you!

f1_score

Hi acbull,
When I use finetune_reddit.py, I get the error name 'f1_score' is not defined at line 203,
and I actually didn't find any f1_score import in example_reddit. Could you tell me the details of f1_score?
thanks!
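A likely fix, assuming the intended metric is scikit-learn's f1_score (an assumption, since the import is simply missing from the script):

    # Add near the top of finetune_reddit.py; assumes the intended metric is
    # scikit-learn's F1 score, since no f1_score is defined in example_reddit.
    from sklearn.metrics import f1_score

    # e.g. f1_score(y_true, y_pred, average='micro') for multi-class labels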

Error when using han or hetgnn

Hi, when I fine-tune directly (without pre-training) models based on han or hetgnn, I encounter this error:

Start Loading Graph Data...
Finish Loading Graph Data!
Data Preparation: 91.0s
Traceback (most recent call last):
  File "finetune_OAG_PF.py", line 252, in <module>
    res = classifier.forward(node_rep[x_ids])
TypeError: 'NoneType' object is not subscriptable

Do you have any idea what the reason might be? Thank you!

About the dataset in GPT_GNN

I noticed that you actually have three categories (CS/Med/NN) in OAG, available as preprocessed graphs. I am interested in the full datasets for all three categories. Could you provide the raw data for Med and NN, as you did for CS? Thanks for your help in advance.

want to pretrain on my own datasets

Hi, acbull~ I think this algorithm is very interesting, and I'd really like to test it on my own graph dataset. Is there any advice or tips on how to prepare my own pre-training graph data? Thank you very much~~

get training data takes a long time

When I run this line of code:

train_data = [job.get() for job in jobs_pool[:-1]]

the program waits for it to finish for a very long time, more than 2 days.
Does anyone know where the problem is? A sketch for localizing the hang is below.
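One generic way to localize a hang like this (a sketch, not the repo's code; jobs_pool follows the script's naming, the timeout value is arbitrary) is to poll each async result with a timeout so you can see which worker is stuck:

    # Poll multiprocessing async results with a timeout instead of blocking
    # forever in job.get(); reports which worker is stuck.
    import multiprocessing

    train_data = []
    for idx, job in enumerate(jobs_pool[:-1]):
        try:
            train_data.append(job.get(timeout=600))  # 10-minute cap per worker
        except multiprocessing.TimeoutError:
            print(f"worker {idx} did not finish within 10 minutes")
            raise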
