nebula-dgl

nebula-dgl is a library that integrates NebulaGraph with the Deep Graph Library (DGL).

nebula-dgl is still a work in progress; there is a demo project here.

Guide

Installation

Install from PyPI

python3 -m pip install nebula-dgl
python3 -m pip install dgl dglgo -f https://data.dgl.ai/wheels/repo.html

Install from source for development

python3 -m pip install nebula3-python
python3 -m pip install dgl dglgo -f https://data.dgl.ai/wheels/repo.html

# build and install
python3 -m pip install .

Playground

Clone this repository to your local directory first.

git clone https://github.com/wey-gu/nebula-dgl.git
cd nebula-dgl
  1. Deploy NebulaGraph playground with Nebula-UP:

Install NebulaGraph:

curl -fsSL nebula-up.siwei.io/install.sh | bash

Load example data:

~/.nebula-up/load-basketballplayer-dataset.sh
  1. Start a Jupyter notebook in the same Docker network, nebula-net:
docker run -it --name dgl -p 8888:8888 --network nebula-net \
    -v "$PWD":/home/jovyan/work jupyter/datascience-notebook \
    start-notebook.sh --NotebookApp.token='secret'

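Now you can either:

  • open the Jupyter notebook in your browser on port 8888 of the Docker host, using the token secret

Or:

  • run ipython inside the container: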
docker exec -it dgl ipython
cd work
  1. Install nebula-dgl in the notebook:

!python3 -m pip install nebula3-python==3.4.0
!python3 -m pip install dgl dglgo -f https://data.dgl.ai/wheels/repo.html
!python3 -m pip install .
  1. Try with a homogeneous graph:
import yaml
import networkx as nx

from nebula_dgl import NebulaLoader


nebula_config = {
    "graph_hosts": [
                ('graphd', 9669),
                ('graphd1', 9669),
                ('graphd2', 9669)
            ],
    "nebula_user": "root",
    "nebula_password": "nebula",
}

# scan-based loader (mostly for training)

with open('example/homogeneous_graph.yaml', 'r') as f:
    feature_mapper = yaml.safe_load(f)

nebula_loader = NebulaLoader(nebula_config, feature_mapper)
homo_dgl_graph = nebula_loader.load()

# or query-based loader (mostly for small graphs at inference time)

query = """
MATCH p=()-[:follow]->() RETURN p
"""
nebula_loader = NebulaLoader(nebula_config, feature_mapper, query=query, query_space="basketballplayer")
homo_dgl_graph = nebula_loader.load()

nx_g = homo_dgl_graph.to_networkx()
nx.draw(nx_g, with_labels=True, pos=nx.spring_layout(nx_g))

Result:

[Figure: nx_draw, the graph drawn with networkx]

  1. Compute the degree centrality of the graph:
nx.degree_centrality(nx_g)

Result:

{0: 0.0,
 1: 0.04,
 2: 0.02,
 3: 0.02,
 4: 0.06,
 5: 0.06,
 6: 0.04,
 7: 0.24,
 8: 0.16,
 9: 0.0,
 10: 0.02,
 11: 0.04,
 12: 0.04,
 13: 0.04,
 14: 0.1,
 15: 0.04,
 16: 0.0,
 17: 0.1,
 18: 0.04,
 19: 0.04,
 20: 0.0,
 21: 0.0,
 22: 0.04,
 23: 0.02,
 24: 0.02,
 25: 0.04,
 26: 0.06,
 27: 0.0,
 28: 0.02,
 29: 0.0,
 30: 0.04,
 31: 0.12,
 32: 0.04,
 33: 0.22,
 34: 0.14,
 35: 0.1,
 36: 0.04,
 37: 0.14,
 38: 0.1,
 39: 0.02,
 40: 0.14,
 41: 0.08,
 42: 0.1,
 43: 0.12,
 44: 0.12,
 45: 0.08,
 46: 0.1,
 47: 0.02,
 48: 0.04,
 49: 0.12,
 50: 0.06}
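
The values above are all multiples of 0.02 because networkx divides each node's degree by n - 1, and this graph has 51 nodes. The following is a minimal pure-Python sketch of that normalization, not the networkx implementation itself. It uses a small made-up edge list (not the basketballplayer data), assumes at least two nodes, and counts every directed edge once for its source and once for its destination, as networkx does for directed graphs.

```python
from collections import Counter


def degree_centrality(num_nodes, edges):
    """Return degree / (num_nodes - 1) for nodes 0..num_nodes-1.

    Assumes num_nodes >= 2. Each directed edge adds one to the degree
    of its source and one to the degree of its destination.
    """
    degree = Counter()
    for src, dst in edges:
        degree[src] += 1
        degree[dst] += 1
    scale = 1.0 / (num_nodes - 1)
    return {node: degree[node] * scale for node in range(num_nodes)}


# Hypothetical toy graph: 5 nodes and 4 directed "follow" edges.
toy_edges = [(0, 1), (0, 2), (3, 0), (1, 2)]
centrality = degree_centrality(5, toy_edges)
print(centrality)
```

Read the other way, the 0.24 reported for node 7 above corresponds to 0.24 * 50 = 12 edge endpoints on that node.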

NebulaGraph to DGL

import yaml

from nebula_dgl import NebulaLoader


nebula_config = {
    "graph_hosts": [
                ('graphd', 9669),
                ('graphd1', 9669),
                ('graphd2', 9669)
            ],
    "nebula_user": "root",
    "nebula_password": "nebula"
}

# load feature_mapper from yaml file
with open('example/nebula_to_dgl_mapper.yaml', 'r') as f:
    feature_mapper = yaml.safe_load(f)

nebula_loader = NebulaLoader(nebula_config, feature_mapper)
dgl_graph = nebula_loader.load()

Run homogeneous graph algorithms with networkx

import networkx

with open('example/homogeneous_graph.yaml', 'r') as f:
    feature_mapper = yaml.safe_load(f)

nebula_loader = NebulaLoader(nebula_config, feature_mapper)
homo_dgl_graph = nebula_loader.load()
nx_g = homo_dgl_graph.to_networkx()

# plot it
networkx.draw(nx_g, with_labels=True)

# get degree
networkx.degree(nx_g)

# get degree centrality
networkx.degree_centrality(nx_g)

Multi-Part Loader for NebulaGraph

  1. For now, the Multi-Part Loader is about as slow as a sequential scan; its performance still needs to be profiled.
import yaml
import networkx as nx
import matplotlib.pyplot as plt

from nebula_dgl import NebulaReducedLoader


nebula_config = {
    "graph_hosts": [
                ('127.0.0.1', 9669)
            ],
    "nebula_user": "root",
    "nebula_password": "nebula",
}

with open('example/homogeneous_graph.yaml', 'r') as f:
    feature_mapper = yaml.safe_load(f)

# To use the multi-part loader, you only need to change the following line:
# replace NebulaLoader with NebulaReducedLoader.
nebula_reduced_loader = NebulaReducedLoader(nebula_config, feature_mapper)
homo_dgl_graph = nebula_reduced_loader.load()
nx_g = homo_dgl_graph.to_networkx()
nx.draw(nx_g, with_labels=True, pos=nx.spring_layout(nx_g))
plt.savefig("multi_graph.png")

nebula-dgl's People

Contributors

milittle, wey-gu

nebula-dgl's Issues

label handling

Add label expression as below:

---
# If vertex id is string type, remap_vertex_id must be true.
remap_vertex_id: True
space: yelp
# str or int
vertex_id_type: int
vertex_tags:
  - name: review
    label:
      name: is_fraud
      properties:
        - name: is_fraud
          type: int
          nullable: False
      filter:
        type: value
    features:
      - name: f0
        properties:
          - name: f0
            type: float
            nullable: False
        filter:
          type: value
      - name: f1
        properties:
          - name: f1
            type: float
            nullable: False
        filter:
          type: value
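
To make the proposed shape concrete, here is a minimal sketch of the same mapping as the Python dict that yaml.safe_load would return, together with a hypothetical helper, required_properties, that lists which properties each tag would need once the label is included. The helper is for illustration only and is not part of nebula-dgl.

```python
# The YAML above, written as the equivalent Python dict.
feature_mapper = {
    "remap_vertex_id": True,
    "space": "yelp",
    "vertex_id_type": "int",
    "vertex_tags": [
        {
            "name": "review",
            "label": {
                "name": "is_fraud",
                "properties": [
                    {"name": "is_fraud", "type": "int", "nullable": False}
                ],
                "filter": {"type": "value"},
            },
            "features": [
                {
                    "name": "f0",
                    "properties": [
                        {"name": "f0", "type": "float", "nullable": False}
                    ],
                    "filter": {"type": "value"},
                },
                {
                    "name": "f1",
                    "properties": [
                        {"name": "f1", "type": "float", "nullable": False}
                    ],
                    "filter": {"type": "value"},
                },
            ],
        }
    ],
}


def required_properties(mapper):
    """Hypothetical helper: map each tag name to the properties it reads.

    The label's properties come first, followed by the feature properties,
    with duplicates dropped.
    """
    needed = {}
    for tag in mapper.get("vertex_tags", []):
        entries = list(tag.get("features", []))
        if "label" in tag:
            entries.insert(0, tag["label"])
        names = []
        for entry in entries:
            for prop in entry["properties"]:
                if prop["name"] not in names:
                    names.append(prop["name"])
        needed[tag["name"]] = names
    return needed


print(required_properties(feature_mapper))
```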

rank awareness

As the title says, the rank field of an edge is not yet taken into account.
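
For context, NebulaGraph identifies an edge by its source, edge type, rank, and destination, so two vertices can be connected by several edges of the same type that differ only in rank. The sketch below uses made-up edges to show what is lost when rank is dropped from the key. It does not describe how nebula-dgl loads edges today or how rank support would be implemented.

```python
# Hypothetical edges as (src, edge_type, rank, dst) tuples.
edges = [
    ("player100", "serve", 0, "team204"),
    ("player100", "serve", 1, "team204"),  # same endpoints, different rank
    ("player101", "serve", 0, "team204"),
]

# Keyed with rank, every edge stays distinct.
with_rank = {(src, etype, rank, dst) for src, etype, rank, dst in edges}

# Keyed without rank, the two player100 edges collapse into one.
without_rank = {(src, etype, dst) for src, etype, _rank, dst in edges}

print(len(with_rank), len(without_rank))
```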

doc: distributed training

How to do distributed training:

Load the data and partition the graph

import dgl

g = ...  # load the DGLGraph object with nebula-dgl
dgl.distributed.partition_graph(g, 'mygraph', 2, 'data_root_dir')

It'll output the partitioned graph as:

data_root_dir/
  |-- mygraph.json          # metadata JSON. File name is the given graph name.
  |-- part0/                # data for partition 0
  |  |-- node_feats.dgl     # node features stored in binary format
  |  |-- edge_feats.dgl     # edge features stored in binary format
  |  |-- graph.dgl          # graph structure of this partition stored in binary format
  |
  |-- part1/                # data for partition 1
     |-- node_feats.dgl
     |-- edge_feats.dgl
     |-- graph.dgl
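
Before copying the partitions to other machines, it can help to confirm that every file in the tree above was written. The following is a minimal sketch of such a check. It assumes the output matches the layout shown here, and the helper names expected_partition_files and missing_partition_files are made up for this example.

```python
from pathlib import Path

# File names inside each partN/ directory, taken from the tree above.
PART_FILES = ("node_feats.dgl", "edge_feats.dgl", "graph.dgl")


def expected_partition_files(graph_name, num_parts, out_path):
    """List the metadata JSON plus the three files of every partition."""
    root = Path(out_path)
    files = [root / f"{graph_name}.json"]
    for part_id in range(num_parts):
        part_dir = root / f"part{part_id}"
        files.extend(part_dir / name for name in PART_FILES)
    return files


def missing_partition_files(graph_name, num_parts, out_path):
    """Return the expected files that do not exist on disk."""
    expected = expected_partition_files(graph_name, num_parts, out_path)
    return [path for path in expected if not path.is_file()]


# Same arguments as the partition_graph call above.
for path in missing_partition_files("mygraph", 2, "data_root_dir"):
    print("missing:", path)
```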

See more on the reference docs:

ref:

Prepare distributed training env

  • Create a cluster of machines
  • Upload the training script and the partitioned data to each machine in the cluster
    • Consider NFS/JuiceFS for easier data access from the distributed servers
  • Set up SSH access: prepare an SSH public key to enable password-less SSH authentication
  • Launch the training job

ref:
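
As part of preparing the cluster, DGL's distributed tooling is usually given a plain-text list of the machines. The sketch below renders such a list, assuming the format of one machine per line with an optional port after the IP address; check this against the DGL documentation for the version in use. The helper name render_ip_config and the addresses are made up for illustration.

```python
def render_ip_config(hosts):
    """Render one line per machine, as 'ip' or 'ip port'.

    Each host is either an IP string or an (ip, port) tuple.
    """
    lines = []
    for host in hosts:
        if isinstance(host, tuple):
            ip, port = host
            lines.append(f"{ip} {port}")
        else:
            lines.append(host)
    return "\n".join(lines) + "\n"


# Hypothetical private addresses for a two-machine cluster,
# one machine per partition created earlier.
ip_config_text = render_ip_config(["192.168.0.11", ("192.168.0.12", 30050)])
print(ip_config_text, end="")
```

Keeping the list in one place makes it easier to reuse the same hosts when copying data and checking SSH access.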
