

SubGAcc: Subgraph Operation Accelerator


The SubGAcc package is an extension library written in C with OpenMP that accelerates operations in subgraph-based graph representation learning (SGRL).

Following the principle of algorithm-system co-design, the subgraphs queried for target links/motifs (e.g. the ego-network in canonical SGRLs) are decomposed into node-level ones (e.g. a collection of walks by walk_sampler in SUREL, a set of nodes by gset_sampler in SUREL+). The join of these node-level pieces acts as a proxy for the subgraph, and the pieces can be reused across different queries.

Currently, SubGAcc provides the following methods for the scalable realization of SGRLs:

  • gset_sampler: node set sampling with a structure encoder based on landing probability (LP)
  • walk_sampler: walk sampling with a relative positional encoder (RPE)
  • batch_sampler: query sampling (a group of nodes) for mini-batch training of link prediction
  • walk_join: online joining of node-level walks to construct the proxy of the subgraph for given queries (e.g. link query $Q= \lbrace u,v \rbrace$ $\to$ join sampled walks of node $u$ and $v$ as $\mathcal{G}_{Q} = \lbrace W_u \bigoplus W_v \rbrace$)
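The join in the last item can be pictured with a small NumPy sketch. It assumes that the walks of each node are stored as an integer array of shape [num_walks, num_steps + 1] and that the join is a plain concatenation of the two walk sets. The helper join_walks and the dictionary walks are hypothetical names used only for illustration; they are not part of the SubGAcc API, and the library's own walk_join may differ.

```python
import numpy as np

def join_walks(walks, query):
    """Concatenate the pre-sampled walks of every node in a query.

    walks: dict mapping node id -> int array of shape [num_walks, num_steps + 1]
           (hypothetical storage layout, assumed for illustration).
    query: iterable of node ids, e.g. (u, v) for a link query.
    """
    return np.concatenate([walks[node] for node in query], axis=0)

# Toy walks: 2 walks of 2 steps for each of three nodes.
walks = {
    0: np.array([[0, 1, 2], [0, 2, 1]]),
    1: np.array([[1, 0, 2], [1, 2, 0]]),
    2: np.array([[2, 1, 0], [2, 0, 1]]),
}

# The walks of node 0 are sampled once and reused by both link queries.
g_01 = join_walks(walks, (0, 1))
g_02 = join_walks(walks, (0, 2))
```

Because the node-level walks are stored once, answering a new query only costs a lookup and a concatenation rather than a fresh subgraph extraction.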

Update

Feb. 25, 2023:

  • Release v2.2 with more robust memory management (allocation, release, and indexing) for billion-edge graphs.
  • Add a bitwise hash for encoding structural features.
  • Add test cases and a script for measuring wall time.

Jan. 29, 2023:

  • Release v2.1 with a refactored code base.
  • More robust memory access, with a buffer for the set sampler on large graphs (millions of nodes).

Jan. 28, 2023:

  • Release v2.0 with the walk-based set sampler gset_sampler.

Requirements

(Other versions may work, but are untested)

  • python >= 3.8
  • numpy >= 1.17
  • gcc >= 8.4

Installation

python setup.py install

Functions

gset_sampler

subg_acc.gset_sampler(indptr, indices, query, num_walks, num_steps) 
-> (numpy.array [n], numpy.array [2,?], numpy.array [?,num_steps+1])

Samples a node set for each node in query (n nodes in total) by running num_walks random walks of num_steps steps each on the input graph in CSR format (indptr, indices), and encodes the landing probability at each step for all nodes in the sampled set as structural features of the seed node.

For usage examples, see test.py.
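To make the landing-probability encoding concrete, here is a minimal pure-NumPy sketch of the idea for a single seed node. It is an illustration under assumptions, not the library's C implementation: the function name landing_counts is hypothetical, walks stop early at nodes without neighbors, and raw visit counts are returned (dividing by num_walks would give empirical landing probabilities). How SubGAcc normalizes or hashes these features is not described here.

```python
import numpy as np

def landing_counts(indptr, indices, seed_node, num_walks, num_steps, rng):
    """Run random walks from seed_node and count landings per node per step.

    Returns a dict: node id -> int array of length num_steps + 1, where entry k
    is the number of walks located at that node after k steps.
    """
    counts = {}
    for _ in range(num_walks):
        node = seed_node
        for step in range(num_steps + 1):
            row = counts.setdefault(node, np.zeros(num_steps + 1, dtype=np.int64))
            row[step] += 1
            if step == num_steps:
                break
            # Neighbors of `node` are a contiguous slice of `indices` in CSR.
            neighbors = indices[indptr[node]:indptr[node + 1]]
            if neighbors.size == 0:  # dead end: stop this walk early (assumption)
                break
            node = int(rng.choice(neighbors))
    return counts

# Path graph 0 - 1 - 2 in CSR form.
indptr = np.array([0, 1, 3, 4])
indices = np.array([1, 0, 2, 1])
rng = np.random.default_rng(0)
counts = landing_counts(indptr, indices, seed_node=0, num_walks=100, num_steps=2, rng=rng)
```

The keys of counts form the sampled node set of the seed, and each value has num_steps + 1 entries, which matches the width of the third array in the return signature above.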

Parameters

  • indptr (np.array) - Index pointer array of the adjacency matrix in CSR format.
  • indices (np.array) - Index array of the adjacency matrix in CSR format.
  • query (np.array / list) - Nodes to be queried for sampling.
  • num_walks (int) - The number of random walks.
  • num_steps (int) - The number of steps in a walk.
  • bucket (int, optional) - The buffer size for sampled neighbors per node.
  • nthread (int, optional) - The number of threads.
  • seed (int, optional) - Random seed.
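For readers unfamiliar with CSR, the following sketch builds indptr and indices from an undirected edge list using NumPy only. The helper edges_to_csr is a hypothetical name, and the int64 dtype is an assumption; the dtype that SubGAcc expects is not stated here, so check test.py before passing these arrays to the sampler.

```python
import numpy as np

def edges_to_csr(edges, num_nodes):
    """Build (indptr, indices) for an undirected graph from an edge list."""
    edges = np.asarray(edges, dtype=np.int64)
    # Store each undirected edge in both directions.
    src = np.concatenate([edges[:, 0], edges[:, 1]])
    dst = np.concatenate([edges[:, 1], edges[:, 0]])
    # Sort by source node, then by destination node.
    order = np.lexsort((dst, src))
    indices = dst[order]
    # indptr[i]:indptr[i + 1] delimits the neighbors of node i in indices.
    indptr = np.zeros(num_nodes + 1, dtype=np.int64)
    indptr[1:] = np.cumsum(np.bincount(src, minlength=num_nodes))
    return indptr, indices

# Path graph 0 - 1 - 2.
indptr, indices = edges_to_csr([(0, 1), (1, 2)], num_nodes=3)
```

If the graph is already held as a scipy.sparse.csr_matrix, its indptr and indices attributes carry the same information.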

Returns

  • nsize (np.array) - The size of sampled set for each node in query.
  • remap (np.array) - Pairs of a node id and the index of its associated structural encoding in the enc array.
  • enc (np.array) - The compressed (unique) encoding of structural features.
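One way to read the three returned arrays together is sketched below. It rests on assumptions that the description above does not confirm: that remap[0] holds node ids, that remap[1] holds row indices into enc, and that the columns of remap are grouped per query node in query order with nsize[i] columns for the i-th node. The helper split_by_query and the hand-made arrays are hypothetical and are not real sampler output.

```python
import numpy as np

def split_by_query(nsize, remap, enc):
    """Yield (node_ids, features) for each query node.

    Assumed layout: remap[0] = node ids, remap[1] = row indices into enc,
    columns grouped per query node in query order.
    """
    offsets = np.concatenate([[0], np.cumsum(nsize)])
    for start, stop in zip(offsets[:-1], offsets[1:]):
        node_ids = remap[0, start:stop]
        # Look up the shared (deduplicated) encoding rows for these nodes.
        features = enc[remap[1, start:stop]]
        yield node_ids, features

# Hand-made arrays in the assumed layout, for two query nodes.
nsize = np.array([2, 3])
remap = np.array([[0, 1, 1, 0, 2],
                  [0, 1, 0, 1, 2]])
enc = np.array([[4, 0, 2],
                [0, 4, 0],
                [0, 0, 2]])
groups = list(split_by_query(nsize, remap, enc))
```

Storing each distinct encoding once in enc and referring to it by index keeps memory use low when many sampled nodes share the same structural features.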
