
tf-gnn-samples's Introduction

TF Graph Neural Network Samples

This repository is the code release corresponding to an article introducing graph neural networks (GNNs) with feature-wise linear modulation (Brockschmidt, 2019). In the paper, a number of GNN architectures are discussed:

  • Gated Graph Neural Networks (GGNN) (Li et al., 2015).
  • Relational Graph Convolutional Networks (RGCN) (Schlichtkrull et al., 2017).
  • Relational Graph Attention Networks (RGAT) - a generalisation of Graph Attention Networks (Veličković et al., 2018) to several edge types.
  • Relational Graph Isomorphism Networks (RGIN) - a generalisation of Graph Isomorphism Networks (Xu et al., 2019) to several edge types.
  • Graph Neural Network with Edge MLPs (GNN-Edge-MLP) - a variant of RGCN in which messages on edges are computed using full MLPs, not just a single layer.
  • Relational Graph Dynamic Convolution Networks (RGDCN) - a new variant of RGCN in which the weights of convolutional layers are dynamically computed.
  • Graph Neural Networks with Feature-wise Linear Modulation (GNN-FiLM) - a new extension of RGCN with FiLM layers.

The results presented in the paper are based on the implementations of models and tasks provided in this repository.

This code was tested in Python 3.6 with TensorFlow 1.13.1. To install required packages, run pip install -r requirements.txt.

The code is maintained by the Deep Program Understanding project at Microsoft Research, Cambridge, UK. We are hiring.

Running

To train a model, it suffices to run python train.py MODEL_TYPE TASK, for example as follows:

$ python train.py RGCN PPI
Loading task/model-specific default parameters from tasks/default_hypers/PPI_RGCN.json.
 Loading PPI train data from data/ppi.
 Loading PPI valid data from data/ppi.
Model has 699257 parameters.
Run PPI_RGCN_2019-06-26-14-33-58_17208 starting.
 Using the following task params: {"add_self_loop_edges": true, "tie_fwd_bkwd_edges": false, "out_layer_dropout_keep_prob": 1.0}
 Using the following model params: {"max_nodes_in_batch": 12500, "graph_num_layers": 3, "graph_num_timesteps_per_layer": 1, "graph_layer_input_dropout_keep_prob": 1.0, "graph_dense_between_every_num_gnn_layers": 10000, "graph_model_activation_function": "tanh", "graph_residual_connection_every_num_layers": 10000, "graph_inter_layer_norm": false, "max_epochs": 10000, "patience": 25, "optimizer": "Adam", "learning_rate": 0.001, "learning_rate_decay": 0.98, "momentum": 0.85, "clamp_gradient_norm": 1.0, "random_seed": 0, "hidden_size": 256, "graph_activation_function": "ReLU", "message_aggregation_function": "sum"}
== Epoch 1
 Train: loss: 77.42656 || Avg MicroF1: 0.395 || graphs/sec: 15.09 | nodes/sec: 33879 | edges/sec: 1952084
 Valid: loss: 68.86771 || Avg MicroF1: 0.370 || graphs/sec: 14.85 | nodes/sec: 48360 | edges/sec: 3098674
  (Best epoch so far, target metric decreased to 224302.10938 from inf. Saving to 'trained_models/PPI_RGCN_2019-06-26-14-33-58_17208_best_model.pickle')
[...]

An overview of options can be obtained by running python train.py --help.

Note that task and model parameters can be overridden (every training run prints its current settings) using the --task-param-overrides and --model-param-overrides command line options, which take dictionaries in JSON form. For example, to choose a different number of layers, use --model-param-overrides '{"graph_num_layers": 4}'.
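
For instance, a single run combining both kinds of overrides could look as follows (the parameter names are taken from the settings printed in the sample run above):

$ python train.py RGCN PPI \
    --model-param-overrides '{"hidden_size": 128, "graph_num_layers": 4}' \
    --task-param-overrides '{"out_layer_dropout_keep_prob": 0.8}'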

Results of the training run are saved in a directory (by default trained_models/, but this can be set using the --result-dir flag). Concretely, the following two files are created:

  • ${RESULT_DIR}/${RUN_NAME}.log: A log of the training run.
  • ${RESULT_DIR}/${RUN_NAME}_best_model.pickle: A dump of the model weights achieving the best results on the validation set.

To evaluate a model, use the test.py script as follows on one of the model dumps generated by train.py:

$ python test.py trained_models/PPI_RGCN_2019-06-26-14-33-58_17208_best_model.pickle
Loading model from file trained_models/PPI_RGCN_2019-06-26-14-33-58_17208_best_model.pickle.
Model has 699257 parameters.
== Running Test on data/ppi ==
 Loading PPI test data from data/ppi.
Loss 11.13117 on 2 graphs
Metrics: Avg MicroF1: 0.954

python test.py --help provides more options, for example to specify a different test data set. A run on the default test set can also be triggered automatically after training by passing the --run-test option to train.py.

Experimental Results

Experimental results reported in the accompanying article can be reproduced using the code in this repository. More precisely, python run_ppi_benchs.py ppi_eval_results/ should produce an ASCII rendering of Table 1 - note, however, that this will take quite a while. Similarly, python run_qm9_benchs.py qm9_eval_results/ should produce an ASCII rendering of Table 2 - this will take a very long time (approx. 13 * 4 * 45 * 5 minutes, i.e., around 8 days); in practice, we used a variant of this script that parallelised the runs across many hosts using Microsoft-internal infrastructure.

Note that the training script loads fitting default hyperparameters for model/task combinations from tasks/default_hypers/{TASK}_{MODEL}.json.
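
For orientation, the override mechanics amount to loading that JSON file and updating it with the dictionaries passed on the command line. A minimal sketch (illustrative only; train.py contains the actual logic):

import json
import os

# Minimal sketch of default-hyperparameter resolution (illustrative only;
# see train.py for the actual logic). File naming follows the convention
# tasks/default_hypers/{TASK}_{MODEL}.json described above.
def load_hypers(task, model, overrides_json="{}"):
    path = os.path.join("tasks", "default_hypers", "%s_%s.json" % (task, model))
    with open(path, "r") as f:
        params = json.load(f)
    params.update(json.loads(overrides_json))  # e.g. '{"graph_num_layers": 4}'
    return params

print(load_hypers("PPI", "RGCN", '{"graph_num_layers": 4}'))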

Models

Currently, seven model types are implemented:

  • GGNN: Gated Graph Neural Networks (Li et al., 2015).
  • RGCN: Relational Graph Convolutional Networks (Schlichtkrull et al., 2017).
  • RGAT: Relational Graph Attention Networks (Veličković et al., 2018).
  • RGIN: Relational Graph Isomorphism Networks (Xu et al., 2019).
  • GNN-Edge-MLP: Graph Neural Network with Edge MLPs - a variant of RGCN in which messages on edges are computed using full MLPs, not just a single layer applied to the source state.
  • RGDCN: Relational Graph Dynamic Convolution Networks - a new variant of RGCN in which the weights of convolutional layers are dynamically computed.
  • GNN-FiLM: Graph Neural Networks with Feature-wise Linear Modulation - a new extension of RGCN with FiLM layers.

Tasks

New tasks can be added by implementing the tasks.sparse_graph_task interface. This provides hooks to load data, create task-specific output layers, and compute task-specific metrics. The documentation in tasks/sparse_graph_task.py provides a detailed overview of the interface. Currently, four tasks are implemented, illustrating different aspects of the framework.
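
As a rough orientation, the hooks look as follows. This is a standalone sketch, not the real interface: only make_task_output_model and pretty_print_epoch_task_metrics are names confirmed elsewhere on this page, the rest are assumptions, and tasks/sparse_graph_task.py remains authoritative.

from abc import ABC, abstractmethod
from typing import Any, Dict, List

class SparseGraphTaskSketch(ABC):
    # Standalone sketch of the hooks described above; not the repo's interface.
    @abstractmethod
    def load_data(self, data_dir: str) -> None:
        """Load the datasets into the task's internal representation."""

    @abstractmethod
    def make_task_output_model(self, model_ops: Dict[str, Any]) -> None:
        """Build task-specific output layers and fill model_ops['task_metrics']."""

    @abstractmethod
    def pretty_print_epoch_task_metrics(
            self, task_metric_results: List[Dict[str, Any]], num_graphs: int) -> str:
        """Aggregate per-batch metric dicts into a printable epoch summary."""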

Citation networks

The CitationNetwork task (implemented in tasks/citation_network_task.py) handles the Cora, Pubmed and Citeseer citation network datasets often used in the evaluation of GNNs (Sen et al., 2008). The implementation illustrates how to handle transductive graph learning on a single graph instance by masking out nodes that should not be considered. You can call this by running python train.py MODEL Cora (or Pubmed or Citeseer instead of Cora).
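
The core idea of that masking, in a minimal TF 1.x sketch (illustrative only, not the repository's code; all tensor names are assumptions): compute per-node losses on the full graph, but average only over the nodes belonging to the current split.

import tensorflow as tf

# Illustrative transductive masking (not the repository's code).
num_nodes, num_classes = 2708, 7  # e.g. Cora
node_logits = tf.placeholder(tf.float32, [num_nodes, num_classes])
node_labels = tf.placeholder(tf.float32, [num_nodes, num_classes])  # one-hot
split_mask = tf.placeholder(tf.float32, [num_nodes])  # 1.0 for nodes to score

# Per-node loss over the full graph, averaged only over the current split:
per_node_loss = tf.nn.softmax_cross_entropy_with_logits_v2(
    labels=node_labels, logits=node_logits)
masked_loss = tf.reduce_sum(per_node_loss * split_mask) / tf.reduce_sum(split_mask)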

To run experiments on this task, you need to download the data from https://github.com/kimiyoung/planetoid/raw/master/data. By default, the code looks for this data in data/citation-networks, but this can be changed by using --data-path "SOME/OTHER/DIR".

PPI

The PPI task (implemented in tasks/ppi_task.py) handles the protein-protein interaction task first described by Zitnik & Leskovec, 2017. The implementation illustrates how to handle the case of inductive graph learning with node-level predictions. You can call this by running python train.py MODEL PPI.

To run experiments on this task, you need to download the data as follows:

curl -LO https://data.dgl.ai/dataset/ppi.zip
unzip ppi.zip -d <path-to-directory>

By default, the code looks for this data in data/ppi, but this can be changed by using --data-path "SOME/OTHER/DIR".

Current Results

Running python run_ppi_benchs.py ppi_results/ should yield results looking like this (on an NVidia V100):

Model          Avg. MicroF1       Avg. Time (s)
GGNN           0.990 (+/- 0.001)  432.6
RGCN           0.989 (+/- 0.000)  759.0
RGAT           0.989 (+/- 0.001)  782.3
RGIN           0.991 (+/- 0.001)  704.8
GNN-Edge-MLP0  0.992 (+/- 0.000)  556.9
GNN-Edge-MLP1  0.992 (+/- 0.001)  479.2
GNN-FiLM       0.992 (+/- 0.000)  308.1

QM9

The QM9 task (implemented in tasks/qm9_task.py) handles the quantum chemistry prediction tasks first described by Ramakrishnan et al., 2014. The implementation illustrates how to handle the case of inductive graph learning with graph-level predictions. You can call this by running python train.py MODEL QM9.

The data for this task is included in the repository in data/qm9, which just contains a JSON representation of a pre-processed version of the dataset originally released by Ramakrishnan et al., 2014.
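
To get a feeling for the format, the training split can be inspected like this (a sketch; the train.jsonl.gz path and the field names follow the sample quoted in the issues further down, one JSON object per line):

import gzip
import json

# Peek at one pre-processed QM9 sample.
with gzip.open("data/qm9/train.jsonl.gz", "rt") as f:
    sample = json.loads(next(f))

print(sample["id"])                # e.g. "qm9:000001"
print(len(sample["targets"]))      # 13 regression targets
print(sample["graph"][:3])         # apparently [source, edge type, target] triples
print(sample["node_features"][0])  # per-node feature vector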

The results shown in Table 2 of the technical report can be reproduced by running python run_qm9_benchs.py qm9_results/, but this will take a very long time (several days) and should best be distributed onto different compute nodes.

VarMisuse

The VarMisuse task (implemented in tasks/varmisuse_task.py) handles the variable misuse task first described by Allamanis et al., 2018. Note that we do not fully re-implement the original model here, and so results are not (quite) comparable with the results reported in the original paper. The implementation illustrates how to handle the case of inductive graph learning with predictions based on node selection. You can call this by running python train.py MODEL VarMisuse.
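
The essence of prediction-by-node-selection, as a minimal TF 1.x sketch (illustrative only; shapes and names are assumptions, not this repository's implementation): gather the final representations of the candidate nodes, score each with a linear layer, and train with a softmax over the candidates.

import tensorflow as tf

# Illustrative node-selection output layer (not the repository's code).
hidden_size, max_candidates = 128, 8
node_states = tf.placeholder(tf.float32, [None, hidden_size])     # all nodes
candidate_ids = tf.placeholder(tf.int32, [None, max_candidates])  # per sample
correct_choice = tf.placeholder(tf.int32, [None])  # index into the candidates

candidate_states = tf.gather(node_states, candidate_ids)  # [B, C, H]
logits = tf.squeeze(tf.layers.dense(candidate_states, units=1), axis=-1)  # [B, C]
loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=correct_choice, logits=logits))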

To run experiments on this task, you need to download the dataset from https://aka.ms/iclr18-prog-graphs-dataset. To make this usable for the data loading code in this repository, you then need to edit the top lines of the script reorg_varmisuse_data.sh (from this repo) to point to the downloaded zip file and the directory you want to extract the data to, and then run it. Note that this will take a relatively long time. By default, the code looks for this data in data/varmisuse/, but this can be changed by using --data-path "SOME/OTHER/DIR".

Current Results

Running python run_varmisuse_benchs.py varmisuse_results/ should yield results looking like this (on a single NVidia V100, this will take about 2 weeks):

Model          Valid Acc          Test Acc           TestOnly Acc
GGNN           0.821 (+/- 0.009)  0.857 (+/- 0.005)  0.793 (+/- 0.012)
RGCN           0.857 (+/- 0.016)  0.872 (+/- 0.015)  0.814 (+/- 0.023)
RGAT           0.842 (+/- 0.010)  0.869 (+/- 0.007)  0.812 (+/- 0.009)
RGIN           0.842 (+/- 0.010)  0.871 (+/- 0.001)  0.811 (+/- 0.009)
GNN-Edge-MLP0  0.834 (+/- 0.003)  0.865 (+/- 0.002)  0.805 (+/- 0.014)
GNN-Edge-MLP1  0.844 (+/- 0.004)  0.869 (+/- 0.003)  0.814 (+/- 0.007)
GNN-FiLM       0.846 (+/- 0.006)  0.870 (+/- 0.002)  0.813 (+/- 0.009)

References

Allamanis et al., 2018

Miltiadis Allamanis, Marc Brockschmidt, and Mahmoud Khademi. Learning to Represent Programs with Graphs. In International Conference on Learning Representations (ICLR), 2018. (https://arxiv.org/pdf/1711.00740.pdf)

Brockschmidt, 2019

Marc Brockschmidt. GNN-FiLM: Graph Neural Networks with Feature-wise Linear Modulation. (https://arxiv.org/abs/1906.12192)

Li et al., 2015

Yujia Li, Daniel Tarlow, Marc Brockschmidt, and Richard Zemel. Gated Graph Sequence Neural Networks. In International Conference on Learning Representations (ICLR), 2016. (https://arxiv.org/pdf/1511.05493.pdf)

Ramakrishnan et al., 2014

Raghunathan Ramakrishnan, Pavlo O. Dral, Matthias Rupp, and O. Anatole von Lilienfeld. Quantum Chemistry Structures and Properties of 134 Kilo Molecules. Scientific Data, 1, 2014. (https://www.nature.com/articles/sdata201422/)

Schlichtkrull et al., 2017

Michael Schlichtkrull, Thomas N. Kipf, Peter Bloem, Rianne van den Berg, Ivan Titov, and Max Welling. Modeling Relational Data with Graph Convolutional Networks. In Extended Semantic Web Conference (ESWC), 2018. (https://arxiv.org/pdf/1703.06103.pdf)

Sen et al., 2008

Prithviraj Sen, Galileo Namata, Mustafa Bilgic, Lise Getoor, Brian Galligher, and Tina Eliassi-Rad. Collective Classification in Network Data. AI magazine, 29, 2008. (https://www.aaai.org/ojs/index.php/aimagazine/article/view/2157)

Veličković et al., 2018

Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. Graph Attention Networks. In International Conference on Learning Representations (ICLR), 2018. (https://arxiv.org/pdf/1710.10903.pdf)

Xu et al., 2019

Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How Powerful are Graph Neural Networks? In International Conference on Learning Representations (ICLR), 2019. (https://arxiv.org/pdf/1810.00826.pdf)

Zitnik & Leskovec, 2017

Marinka Zitnik and Jure Leskovec. Predicting Multicellular Function Through Multi-layer Tissue Networks. Bioinformatics, 33, 2017. (https://arxiv.org/abs/1707.04638)

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.


tf-gnn-samples's Issues

Is there documentation on how to use this?

Hi, I'm looking into this repo and wondering about details such as:

  • the format of the training data
  • what problems it can solve
  • how to use it after training and testing
  • how to generate embeddings

Is there any documentation or article hosted anywhere?

Some questions about running

Hello, I have some questions about running your code.
For the PPI data, I downloaded your ppi.zip, which contains .npy files. When running the code, I get:

ValueError: File suffix must be .json, .json.gz, .pkl or .pkl.gz: data/ppi\train_feats.npy

How can I fix it?

QM9 units mismatch

Looking at your QM9 dataset I've discovered that the raw data seems to be standardized. Here you seem to provide the targets' standard deviations. Comparing (the inverse of) these with the standard deviations (and converting all energies from kcal to eV) I mostly get a good match with the PyTorch Geometric dataset's standard deviations:

 Target: GNN-FiLM ?= PyG
      μ:    15.03 ?= 1.50
      α:    81.73 ?= 8.18
 ε_homo:     0.60 ?= 0.60
 ε_lumo:     1.29 ?= 1.27
     Δε:     1.29 ?= 1.29
   <R²>:   233.73 ?= 280.61
   ZPVE:    32.58 ?= 0.90
U0_atom:    10.41 ?= 10.33
 U_atom:    10.50 ?= 10.42
 H_atom:    10.58 ?= 10.50
 G_atom:     9.58 ?= 9.51
    c_v:    81.35 ?= 4.06

However, there is a 10x difference for μ and α, and <R²> (gap), ZPVE, and c_v don't match at all. What am I missing here?

PS: There seems to be a misunderstanding of the term chemical accuracy. Chemical accuracy is a constant associated with the precision of experimental measurements, not the standard deviation. In issue #11 you said that CHEMICAL_ACC_NORMALISING_FACTORS gives the chemical accuracies. Those don't match at all, and neither do their inverses or any other transformation:

 Target: GNN-FiLM ?= MPNN chem. accuracy
      μ:   0.0665 ?= 0.100
      α:   0.0122 ?= 0.100
 ε_homo:   0.0719 ?= 0.043
 ε_lumo:   0.0337 ?= 0.043
     Δε:   0.0335 ?= 0.043
   <R²>:   0.0043 ?= 1.200
   ZPVE:   0.0013 ?= 0.001
     U0:   0.0042 ?= 0.043
      U:   0.0041 ?= 0.043
      H:   0.0041 ?= 0.043
      G:   0.0045 ?= 0.043
    c_v:   0.0123 ?= 0.050
      ω:   0.0375 ?= 10.000

A question about graph generation for dataset

Hi, I am trying to understand the VarMisuse task and have run the code successfully. While reading the task's dataset, I found that it contains pre-generated graphs stored in *.gz files, extracted from the source projects. However, I cannot find any code for the graph generation. If it is convenient, could you please tell me whether there is any code for this preprocessing?
Thanks a lot!

VarMisuse dataset split

Hey @mmjb, how are you?

I am trying to play with training/testing of different architectures for VarMisuse, and I am trying to reproduce the exact dataset as in your experiments from the GNN-FiLM paper, such that the numbers will be comparable.

I downloaded and decompressed the dataset into data/varmisuse/.
The code assumes that there are graphs-test and graphs-testonly subdirectories, but in fact the dataset is split by project first, and then into train/dev/test.

I guess that I can re-split according to Appendix D of your ICLR'18 paper, but there are a few inconsistencies between the appendix and the actual dataset. For example: openlivewriter appears in the dataset but not in the paper, and it only has a graphs subdir but not graphs-{train,test,valid}; hangfire, optikey (train) and ravendb (dev) are in the paper but not in the dataset; and botbuilder is marked as "train" in the paper, but its graphs-train subdir is empty (unlike the rest of the training projects).

So -

  1. Did you use the paper version of the dataset or the public version for the experiments reported in the GNN-FiLM paper?
  2. If the answer is "public" - then I should just re-organize the files to create the training, test, and test-only dirs? In this case, the "dev" set contains only a single project, right?
  3. Regarding file types - when I try to load a trained model and test it on a directory which contains only the file commandline.0.gz - I'm getting an error in richpath.py that "ValueError: File suffix must be .json, .json.gz, .pkl or .pkl.gz: data/varmisuse/commandline/graphs/commandline.0.gz".

Am I missing something? Maybe it will be easiest if you could share your file structure in the data/varmisuse dir and I'll organize my files accordingly?

Thanks a lot!

QM9 Mulliken partial charges

Thank you for making the code public and helping with the target value scaling last time!

Looking at the source code I've now discovered that you are using the QM9 atom features as-is. However, your version of the dataset includes the Mulliken partial charges, as described in the original paper: https://www.nature.com/articles/sdata201422/

It is rather easy to spot since it is the only float-valued node feature (number 7). This feature should not be available to the model. You can only obtain these charges via QM-based calculations based on the molecular geometries. This is therefore an output, not an input.

In my opinion, this feature should be excluded from the dataset, as done e.g. in the Gilmer MPNN paper. Otherwise you have information leakage. This seems to be an accident -- or did I overlook something? What are your thoughts?
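
(For what it's worth, excluding such a feature would be a small preprocessing step. A sketch, assuming the 0-based node_features layout quoted in the "QM9 preprocessing Chemical Unit" issue below, where the partial charge is the single float at index 6:)

def strip_partial_charge(sample):
    # Drop the single float-valued entry (assumed to be index 6) from each
    # node's feature vector to avoid the leakage described above.
    sample["node_features"] = [f[:6] + f[7:] for f in sample["node_features"]]
    return sample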

where is data/ppi

When I run python train.py RGCN PPI, I get the error "No such file or directory: 'data/ppi/train_graph.json'".

How to add Precision and Recall Metrics

I would like to get Precision and Recall metrics during the training of the Varmisuse task. I have tried modifying varmisuse_task.py with the following code:

Inside make_task_output_model, at approximately line 438:

predicted = tf.argmax(tf.nn.softmax(logits), 1, output_type=tf.int32)
prediction_is_correct = tf.equal(predicted, correct_choices)
accuracy = tf.reduce_mean(tf.cast(prediction_is_correct, tf.float32))

TP = tf.count_nonzero(predicted * correct_choices)
TN = tf.count_nonzero((1 - predicted) * (1 - correct_choices))
FP = tf.count_nonzero(predicted * (1 - correct_choices))
FN = tf.count_nonzero((1 - predicted) * correct_choices)

precision = tf.divide(TP, TP + FP)
recall = tf.divide(TP, TP + FN)

tf.summary.scalar('accuracy', accuracy)
model_ops['task_metrics'] = {
    'loss': tf.reduce_mean(per_graph_loss),
    'total_loss': tf.reduce_sum(per_graph_loss),
    'accuracy': accuracy,
    'precision': precision,
    'recall': recall,
    'num_correct_predictions': tf.reduce_sum(tf.cast(prediction_is_correct, tf.int32)),
}

Inside pretty_print_epoch_task_metrics:

acc = sum([m['num_correct_predictions'] for m in task_metric_results]) / float(num_graphs)
precision = sum([m['precision'] for m in task_metric_results]) / float(num_graphs)
recall = sum([m['recall'] for m in task_metric_results]) / float(num_graphs)
return "Accuracy: %.3f | Precision: %.3f | Recall: %.3f" % (acc, precision, recall)

However, this code outputs nan for both Precision and Recall. If anyone knows why this happens and could point me in the right direction, I would greatly appreciate it.
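
One plausible cause (a guess, not a verified fix): precision and recall are averaged per batch, so any batch with TP + FP == 0 or TP + FN == 0 contributes nan to the epoch average. (Note also that predicted and correct_choices are candidate indices, not 0/1 labels, so the TP/TN/FP/FN products above are only meaningful in a binary setting.) Accumulating raw counts across batches and dividing once per epoch avoids the nan; a sketch:

# Sketch (untested): expose raw counts instead of per-batch ratios ...
model_ops['task_metrics'].update({'TP': TP, 'FP': FP, 'FN': FN})

# ... and aggregate them once in pretty_print_epoch_task_metrics:
tp = sum(m['TP'] for m in task_metric_results)
fp = sum(m['FP'] for m in task_metric_results)
fn = sum(m['FN'] for m in task_metric_results)
precision = tp / (tp + fp) if tp + fp > 0 else 0.0
recall = tp / (tp + fn) if tp + fn > 0 else 0.0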

Raw programs for the VarMisuse task

This is a great project! I'd like to run experiments on the VarMisuse task, and ideally on other related tasks like variable naming. The training process works fine for me, but how can I access the raw C# programs used to create the dataset? Alternatively, is there code for creating the graphs from source C# programs? I'd like to construct different graph structures, e.g. by preprocessing the programs, performing program analyses etc.

I tried to reconstruct the programs using the 'ContextGraph' property on samples in the dataset, but the programs don't seem to be correct (e.g. only 2 blocks are closed for an Akka program).

In [43]: program = ""
    ...: for u, v in raw_sample['ContextGraph']['Edges']['NextToken']:
    ...:     label = raw_sample['ContextGraph']['NodeLabels'][str(u)]
    ...:     program += label
    ...:     if label == ";":
    ...:         program += "\n"
    ...: program += raw_sample['ContextGraph']['NodeLabels'][str(v)]
    ...: print(program)
(,value)<SLOT>valuevaluevaluekvkvkvkv_clusterbucketvbucketkeyv_cluster_cluster_cluster_cluster_clusterbucketbucketbucketvkeykeyContext_cluster_cluster_cluster_cluster_cluster_cluster_cluster_cluster_cluster_clusterbucketValueHolder,v,=vvar,)vvdeltaContent=bucketkvkvcurrentkvcurrentkvkv>&&=entryvar.entryentry.entryentryentry.=>entry(.)(=>kv.Count;
Sender.SenderTell(count)countvar=_registry._registry_registry_registrySum{=bucketbucketvar{)foreach(varentryin_registrykvin.{_registryvartopicPrefix=Self.SelfSelf=Key;
...

Thanks!

Question on building the subtoken nodes

Hello

I have tried to read the VarMisuse implementation provided in varmisuse_task.py, but I have one question about building the subtoken nodes. See the following line:

if node_label in unsplittable_node_names:

It seems this check is meant to skip AST nodes and punctuation. However, when I inspect unsplittable_node_names, which is read from c_sharp.txt via dpu_utils.codeutils.get_language_keywords, the AST node types are not included, so the constructed edges contain AST nodes. According to my understanding, the AST nodes should not be involved in building the subtoken edges. Is that correct?

Thanks

ERRORS: Headers Error for 7za

Hi, when I try to unzip the varmisuse dataset, it says


$7za x ../graph-dataset.zip

7-Zip (a) [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,64 bits,32 CPUs AMD Ryzen Threadripper 2950X 16-Core Processor  (800F82),ASM,AES-NI)

Scanning the drive for archives:
1 file, 14654161628 bytes (14 GiB)

Extracting archive: ../graph-dataset.zip

ERRORS:
Headers Error

--
Path = ../graph-dataset.zip
Type = zip
ERRORS:
Headers Error
Physical Size = 14654161628
64-bit = +



Archives with Errors: 1

Open Errors: 1

However, some files still get extracted. I wonder whether this error is expected?

Some questions about graph classification.

Sorry to bother you.
I want to classify graphs (each graph has a label and contains multiple nodes). I have read the tasks in your project. To fulfil my needs:
QM9 is a regression problem, and I want to change it to a classification problem.
PPI is a node classification problem, and I want to change it to graph classification.
What is the fastest way to do this? I hope to get your suggestions, thank you.
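
As a pointer for the graph-level part of this question: the common recipe (sketched below in illustrative TF 1.x code, not taken from this repository) is to pool the final node representations per graph and attach a softmax classifier in place of QM9's regression head.

import tensorflow as tf

# Illustrative graph-classification head (names and shapes are assumptions).
hidden_size, num_classes = 128, 2
node_states = tf.placeholder(tf.float32, [None, hidden_size])  # all nodes in batch
node_to_graph = tf.placeholder(tf.int32, [None])  # graph id of each node
num_graphs = tf.placeholder(tf.int32, [])
graph_labels = tf.placeholder(tf.int32, [None])   # one label per graph

# Sum-pool node states per graph, then classify:
graph_repr = tf.unsorted_segment_sum(node_states, node_to_graph, num_graphs)
logits = tf.layers.dense(graph_repr, num_classes)
loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=graph_labels, logits=logits))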

varmisuse_data_splitter

Hi, good evening!
I am confused about using the reorg_varmisuse_data script and need your help.
When the script runs to the end, the varmisuse_data_splitter.py step is not executed successfully.
The error is as follows:

File "./utils/varmisuse_data_splitter.py", line 26
def _data_loading_worker(file_queue:Queue, result_queue:Queue) -> None:
^
SyntaxError: invalid syntax

I tried to solve this problem, but it didn't work. If you can provide some help, thank you so much!

Results not reproducible with fixed random seed.

While training the VarMisuse task with the GGNN model, I have noticed that results are not reproducible even with the random seed fixed.

What I have done:

  1. Fixed the random seed in VarMisuse_GGNN.json to a positive value (e.g., 1252).
  2. Trained two models using python train.py GGNN varmisuse --data-path <my_data_path> --result-dir <my_result_dir>.

What I expect:
With the same random seed, the results of training on the same dataset should be exactly the same, as reflected in the log files generated in the result dir.

What I get instead:
Different results, reflected in different training and validation accuracies and different numbers of epochs trained.

I was wondering whether this is an implementation error, or if there is a reason that fixing the random seed does not make results reproducible that I am overlooking.

Thank you!

QM9 preprocessing Chemical Unit

I find that the target values in /data/qm9/train.jsonl.gz do not match their real values in the original dataset.

For example, the molecule with QM9 id 000001 should be methane, so the first target (the dipole moment) should be 0 instead of [-1.7779076]. The other targets likewise lose their physical meaning.

{"targets": [[-1.7779076], [-7.5946741], [-6.7142577], [2.2468657], [5.355917], [-4.114645], [-3.1489365], [5.7098937], [5.6933656], [5.6850829], [5.7576447], [-6.1835322], [-1.3203824]], "graph": [[0, 1, 1], [0, 1, 2], [0, 1, 3], [0, 1, 4]], "id": "qm9:000001", "node_features": [[0, 1, 0, 0, 0, 6, -0.535689, 0, 0, 0, 0, 0, 1, 0, 4], [1, 0, 0, 0, 0, 1, 0.133921, 0, 0, 0, 0, 0, 0, 1, 0], [1, 0, 0, 0, 0, 1, 0.133922, 0, 0, 0, 0, 0, 0, 1, 0], [1, 0, 0, 0, 0, 1, 0.13392299, 0, 0, 0, 0, 0, 0, 1, 0], [1, 0, 0, 0, 0, 1, 0.13392299,0, 0, 0, 0, 0, 0, 1, 0]]}

The original values for the QM9:000001 molecule (methane) are as follows:

smiles | mu | alpha | homo | lumo | gap | r2 | zpve | cv | u0 | u298 | h298 | g298
C | 0 | 13.21 | -0.3877 | 0.1171 | 0.5048 | 35.3641 | 0.044749 | 6.469 | -40.4789 | -40.4761 | -40.4751 | -40.4986

What method did you use for the QM9 target preprocessing? How can we scale back to MAE in chemical units as in the paper? I'd be glad if you can help. The units for reference can be found in https://arxiv.org/pdf/1712.06113v3.pdf.

Units of QM9 targets

Hi,

Thanks for the code.
Different papers seem to report different units, and there are large differences between the errors reported in papers like MPNN and k-GNN (https://arxiv.org/pdf/1810.02244.pdf). Can you let me know the units of the QM9 targets reported in the paper?

Exception in VarMisuse task when using no_parallel=True in the _load_data() method

When using no_parallel=True in _load_data(), we can successfully load the data via the load_single_sample method, which returns a List[GraphSample]. However, __load_data() does not receive this result from "return _load_data()"; instead of a list, the result is a generator.

(The issue included screenshots of the code, the exception, and the VS Code debug session, which are not reproduced here.)

To make debugging easier, we first assigned the result to a temporary variable 'result'. It is easy to see that 'result' contains the correct data, while self.loaded_data contains no data.

Applying GNN-FiLM to dependency parsing

Hi!
I am just a fan of state of the art, not an NLP researcher.

On NLP-progress, I found this SOTA result on dependency parsing using a GNN:
https://www.aclweb.org/anthology/P19-1237
Source: https://github.com/sebastianruder/NLP-progress/blob/master/english/dependency_parsing.md

I believe they use a GAT, and they claim to be the first to use a GNN for this fundamental NLP task.

Since you have made what seems like a major, task-generic advance in GNNs, I thought it would be interesting for you to collaborate with those researchers.

can not run

I am trying to run these examples but was unable to set up a proper environment.
Is there a guide, or is this code abandoned?
It would be helpful to have a Docker image.
