
ScaLed: Sampling Enclosing Subgraphs for Link Prediction

ScaLed (Sampling Enclosing Subgraphs for Link Prediction) is a fork of SEAL that trains GNNs on sparser, sampled k-hop subgraphs for the downstream task of link prediction.

arXiv link: https://arxiv.org/abs/2206.12004

ACM Digital Library Link: https://dl.acm.org/doi/10.1145/3511808.3557688

Experimental setup

To set up the development environment, use the quick_install.sh bash script. It installs all the essential Python packages needed to run the experiments and work on the codebase. Please note that this dev setup is specifically crafted for MacBooks with Apple M1 silicon. If you have any other system, update how PyTorch and PyTorch Geometric get installed in the script (lines 5 and 6).

To run the quick install, use the command: source quick_install.sh. Note that you will be prompted to enter y at times.
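If you are not on an M1 MacBook, the PyTorch and PyTorch Geometric lines can be swapped for the builds matching your platform. As a rough example for a Linux machine with CUDA (the index URL below is only illustrative; consult the official PyTorch and PyG installation pages for the wheels that match your CUDA version):

pip install torch --index-url https://download.pytorch.org/whl/cu118
pip install torch_geometric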

In case any important package is left out, please let us know.

Useful Command line arguments

Since ScaLed is a fork of the SEAL repo, all the command line arguments that exist in SEAL-OGB (with a few exceptions) will work in ScaLed as well.

The following command line arguments were created specifically for ScaLed:

  • dataset_stats - Print dataset statistics
  • m - Set the length of each random walk taken by ScaLed (called h in the paper); see the sketch after this list
  • M - Set the number of random walks rooted at each of the source and destination nodes (called k in the paper)
  • dropedge - Set the dropedge probability for randomly dropping edge indices in the forward pass
  • cuda_device - Set the GPU ID to use (e.g., 0)
  • calc_ratio - Calculate the sparsity of ScaLed vs. SEAL subgraphs
  • pairwise - Run with a pairwise loss function
  • loss_fn - Set the loss function (defaults to BCE-with-logits loss)
  • neg_ratio - Set the number of negative edges sampled per positive edge (defaults to 1)
  • profile - Run the PyTorch Geometric profiler to get insights into GPU memory usage, running time, etc.
  • split_val_ratio - Choose the validation split ratio (defaults to 5% of the data)
  • split_test_ratio - Choose the test split ratio (defaults to 10% of the data)
  • train_mlp - Train a structure-unaware MLP (used as a baseline)
  • train_gae - Train a graph autoencoder (used as a baseline)
  • base_gae - Set the base GNN encoder for GAE training
  • dropout - Set the dropout value for the forward pass of the GNNs (defaults to 0.5)
  • seed - Set the seed for reproducibility (defaults to 1)
  • train_n2v - Train a Node2Vec model (used as a baseline)
  • train_mf - Train with matrix factorization (used as a baseline)
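
To make the roles of m and M concrete, below is a minimal sketch of the sampling idea in plain Python with NetworkX. It is an illustration under our own naming, not the repo's actual implementation:

import random
import networkx as nx

def sample_enclosing_subgraph(G: nx.Graph, src, dst, m: int, M: int, seed=None):
    # Take M random walks of length m from each endpoint of the target link
    # and induce a subgraph on every node visited along the way.
    rng = random.Random(seed)
    visited = {src, dst}
    for root in (src, dst):
        for _ in range(M):        # M walks per endpoint (k in the paper)
            node = root
            for _ in range(m):    # each walk has length m (h in the paper)
                neighbors = list(G.neighbors(node))
                if not neighbors:
                    break
                node = rng.choice(neighbors)
                visited.add(node)
    return G.subgraph(visited).copy()

The GNN then operates on this much smaller sampled subgraph instead of the full h-hop enclosing subgraph that SEAL extracts.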

Supported Datasets

We support the following datasets:

  1. ogbl-collab (other OGB link prediction datasets are untested) from "Open Graph Benchmark" (https://arxiv.org/abs/2005.00687)
  2. Planetoid datasets (Cora, PubMed, CiteSeer) from "Revisiting Semi-Supervised Learning with Graph Embeddings" (https://arxiv.org/abs/1603.08861)
  3. attributed-Facebook (other attributed datasets are untested) from "Scaling Attributed Network Embedding to Massive Graphs" (https://arxiv.org/abs/2009.00826)
  4. SEAL datasets (USAir, Yeast, etc., introduced in the original paper) from "Link Prediction Based on Graph Neural Networks" (https://arxiv.org/pdf/1802.09691.pdf)

Reproducibility

The experimental setup for every dataset is available under the experiments folder. To reproduce the results in the paper, run the scripts in this folder. We run each experiment five times on five fixed random seeds to ensure the results from the paper are reproducible.
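
For example, a full five-seed run can be expressed as a simple shell loop (the exact datasets and arguments for each experiment are recorded in the experiments scripts; this one-liner is only illustrative):

for seed in 1 2 3 4 5; do python seal_link_pred.py --dataset USAir --epochs 50 --m 2 --M 20 --seed $seed; done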

The code for all baselines is available under the baselines folder. All baselines specific to ogbl datasets are under the ogbl_datasets folder.

For example usage on non-attributed datasets, see below:

To run ScaLed on USAir with profiling for a single seed:

python seal_link_pred.py --dataset USAir --epochs 50 --m 2 --M 20 --seed 1 --profile

To run SEAL on USAir with profiling for a single seed:

python seal_link_pred.py --dataset USAir --epochs 50 --num_hops 2 --seed 1 --profile

For example usage on attributed datasets, see below:

To run ScaLed on Cora with profiling for a single seed:

python seal_link_pred.py --dataset Cora --m 3 --M 20 --use_feature --profile --seed 1

To run SEAL on Cora with profiling for a single seed:

python seal_link_pred.py --dataset Cora --num_hops 3 --use_feature --profile --seed 1
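
The baselines can be launched the same way by combining their flags from the argument list above. For instance, the following is a plausible invocation of the MLP baseline on Cora (illustrative only; the scripts under baselines record the exact settings used in the paper):

python seal_link_pred.py --dataset Cora --train_mlp --use_feature --seed 1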

Backward compatibility with SEAL-OGB

Since ScaLed is a fork of the original SEAL-OGB repo, all of their experiments work on our repo as well.

Note: This repo was earlier named SWEAL-OGB, so there may still be traces of SWEAL-OGB in the codebase. SWEAL-OGB should be treated as a synonym of ScaLed.

Reporting Issues and Improvements

We currently don't have an issue/PR template. However, if you find an issue in our code, please create an issue on GitHub. It would be great if you could give as much information as possible (the command that was run, the Python package versions, the full stack trace, etc.).

If you have any further questions, you can reach out to us via email: Paul Louis, Shweta Ann Jacob.

Miscellaneous

We also provide the following miscellaneous code:

  • plots folder - Contains all the code required to reproduce the plots in the paper
  • data folder - Contains the datasets from SEAL ("Link Prediction Based on Graph Neural Networks", https://arxiv.org/pdf/1802.09691.pdf)
  • custom_losses.py - Contains custom loss functions adapted from "Pairwise Learning for Neural Link Prediction" (https://arxiv.org/pdf/2112.02936.pdf); see the sketch after this list
  • hypertuner.py - Runs hyperparameter tuning to find good m and M values (h and k in the paper)
  • parsers folder - Parses output logs to build the tables in the paper
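
As a rough illustration of the kind of loss custom_losses.py provides, the sketch below implements a generic pairwise (AUC-style) surrogate that pushes positive edge scores above sampled negative ones. The function name and exact form are ours, not necessarily what the file implements:

import torch
import torch.nn.functional as F

def pairwise_auc_loss(pos_scores: torch.Tensor, neg_scores: torch.Tensor) -> torch.Tensor:
    # Compare every positive edge score against every negative edge score;
    # softplus(-(s_pos - s_neg)) is a smooth surrogate for the 0/1 ranking loss.
    diff = pos_scores.unsqueeze(1) - neg_scores.unsqueeze(0)  # all (pos, neg) pairs
    return F.softplus(-diff).mean()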

Bibtex

If you find our work useful, please cite us using the following:

@inproceedings{10.1145/3511808.3557688,
author = {Louis, Paul and Jacob, Shweta Ann and Salehi-Abari, Amirali},
title = {Sampling Enclosing Subgraphs for Link Prediction},
year = {2022},
isbn = {9781450392365},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3511808.3557688},
doi = {10.1145/3511808.3557688},
abstract = {Link prediction is a fundamental problem for graph-structured data (e.g., social networks, drug side-effect networks, etc.). Graph neural networks have offered robust solutions for this problem, specifically by learning the representation of the subgraph enclosing the target link (i.e., pair of nodes). However, these solutions do not scale well to large graphs as extraction and operation on enclosing subgraphs are computationally expensive. This paper presents a scalable link prediction solution, that we call ScaLed, which utilizes sparse enclosing subgraphs to make predictions. To extract sparse enclosing subgraphs, ScaLed takes multiple random walks from a target pair of nodes, then operates on the sampled enclosing subgraph induced by all visited nodes. By leveraging the smaller sampled enclosing subgraph, ScaLed can scale to larger graphs with much less overhead while maintaining high accuracy. Through comprehensive experiments, we have shown that ScaLed can produce comparable accuracy to those reported by the existing subgraph representation learning frameworks while being less computationally demanding.},
booktitle = {Proceedings of the 31st ACM International Conference on Information \& Knowledge Management},
pages = {4269--4273},
numpages = {5},
keywords = {subgraph sampling, graph neural networks, link prediction},
location = {Atlanta, GA, USA},
series = {CIKM '22}
}

Supplementary Material Related to the Paper/Research

  • You can also find our preprint on arXiv at https://arxiv.org/abs/2206.12004
  • You can watch a short video presentation of our work here, presented by Shweta Ann Jacob. The slides used for the presentation can be found here.
  • The poster for this work is available here.
  • Finally, the published copy of the paper is available on ACM DL.
