Code Monkey home page Code Monkey logo

marius's Introduction

Marius and MariusGNN

This repository contains the code for the Marius and MariusGNN papers. We have combined the two works into one unified system for training graph embeddings and graph neural networks over large-scale graphs on a single machine using the entire memory hierarchy.

Marius (OSDI '21 Paper) is designed to mitigate/reduce data movement overheads for graph embeddings using:

  • Pipelined training and IO
  • Partition caching and a buffer-aware data ordering to minimize IO for disk-based training (called BETA)

MariusGNN (EuroSys '23 Paper) utilizes the data movement optimizations from Marius and adds support for scalable graph neural network training through:

  • An optimized data structure for neighbor sampling and GNN aggregation (called DENSE)
  • An improved data ordering for disk-based training (called COMET) which minimizes IO and maximizes model accuracy (with COMET now subsuming BETA)

Build and Install

Requirements

  • CUDA >= 10.1
  • CuDNN >= 7
  • PyTorch >= 1.8
  • Python >= 3.7
  • GCC >= 7 (On Linux) or Clang >= 11.0 (On MacOS)
  • CMake >= 3.12
  • Make >= 3.8

Docker Installation

We recommend using Docker for build and installation. We provide a Dockerfile which installs all the necessary requirements and provide end-to-end instructions in examples/docker/.

Pip Installation

With the required dependencies installed, Marius and MariusGNN can be built using Pip:

git clone https://github.com/marius-team/marius.git
cd marius
pip3 install .

Installation Result

After installation, the Python API can be accessed with import marius.

The following command line tools will be also be installed:

  • marius_train: Train models using configuration files and the command line
  • marius_eval: Command line model evaluation
  • marius_preprocess: Built-in dataset downloading and preprocessing
  • marius_predict: Batch inference tool for link prediction or node classification

Command Line Interface

The command line interface supports performant in-memory and out-of-core training and evaluation of graph learning models. Experimental results from our papers can be reproduced using this interface (we also provide an exact experiment artifact for each paper in separate branches).

Quick Start:

First make sure Marius is installed.

Preprocess the FB15K_237 dataset with marius_preprocess --dataset fb15k_237 --output_dir datasets/fb15k_237_example/

Train using the example configuration file (assuming we are in the root directory of the repository) marius_train examples/configuration/fb15k_237.yaml

After running this configuration file, the MRR output by the system should be about .25 after 10 epochs.

Perform batch inference on the test set with marius_predict --config examples/configuration/fb15k_237.yaml --metrics mrr --save_scores --save_ranks

See the full example for details.

Python API

The Python API is currently experimental and can be used to perform in-memory training and evaluation of graph learning models.

See the documentation and examples/python/ for Python API usage and examples.

Citing Marius or MariusGNN

Marius (out-of-core graph embeddings)

@inproceedings{Marius,
    author = {Jason Mohoney and Roger Waleffe and Henry Xu and Theodoros Rekatsinas and Shivaram Venkataraman},
    title = {Marius: Learning Massive Graph Embeddings on a Single Machine},
    booktitle = {15th {USENIX} Symposium on Operating Systems Design and Implementation ({OSDI} 21)},
    year = {2021},
    isbn = {9781939133229},
    pages = {533--549},
    url = {https://www.usenix.org/conference/osdi21/presentation/mohoney},
    publisher = {{USENIX} Association}
}

MariusGNN (out-of-core GNN training)

@inproceedings{MariusGNN, 
    author = {Roger Waleffe and Jason Mohoney and Theodoros Rekatsinas and Shivaram Venkataraman},
    title = {MariusGNN: Resource-Efficient Out-of-Core Training of Graph Neural Networks}, 
    booktitle = {Proceedings of the Eighteenth European Conference on Computer Systems}, 
    year = {2023}, 
    isbn = {9781450394871}, 
    pages = {144โ€“161},
    url = {https://doi.org/10.1145/3552326.3567501},
    publisher = {Association for Computing Machinery}
}

marius's People

Contributors

jasonmoho avatar anzexie avatar basavaraj29 avatar rogerwaleffe avatar cthoyt avatar sarda-devesh avatar ryansun117 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.