The dgnn from dgsparse

dgNN

Note: We'd like to inform our users that dgNN will no longer be actively maintained. Our development efforts will focus on integrating its features and functionalities into dgSPARSE-Lib. We believe this consolidation will provide a more streamlined and efficient experience for our user community. Thank you for your continued support.

dgNN is a high-performance backend for GNN layers with DFG (Data Flow Graph) level optimization. The dgNN project is built on PyTorch.

How to install

Through pip

pip install dgNN

If pip cannot build dgNN, we recommend building it from source.

git clone git@github.com:dgSPARSE/dgNN.git
cd dgNN
bash install.sh

Requirements

CUDA toolkit >= 10.0
pytorch >= 1.7.0
scipy
dgl >= 0.7 (We use dgl's dataset)
ninja
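As a rough sketch, the version bounds listed above can be checked from Python before building. The helper names below (`parse_version`, `meets_minimum`, `check_environment`) are hypothetical and not part of dgNN, and the distribution names `torch` and `dgl` are assumptions: some CUDA builds of DGL were published under other names such as `dgl-cu113`, in which case the lookup reports the package as missing.

```python
import re
from importlib import metadata

# Minimum versions copied from the requirement list above.
# The distribution names ("torch", "dgl") are assumptions.
MINIMUMS = {"torch": (1, 7, 0), "dgl": (0, 7)}


def parse_version(text):
    """Turn '2.0.0+cu117' into (2, 0, 0), ignoring any non-numeric suffix."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed, minimum):
    """Compare two version tuples, padding the shorter one with zeros."""
    width = max(len(installed), len(minimum))
    pad = lambda v: tuple(v) + (0,) * (width - len(v))
    return pad(installed) >= pad(minimum)


def check_environment(minimums=MINIMUMS):
    """Return {package: (installed_version_or_None, ok)} for each requirement."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
            continue
        ok = meets_minimum(parse_version(installed), minimum)
        report[name] = (installed, ok)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_environment().items():
        status = "OK" if ok else "check this"
        print(f"{name}: {installed or 'not installed'} -> {status}")
```

This only inspects installed Python packages. It does not verify the CUDA toolkit version or that ninja is on the PATH.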

We provide a Dockerfile so that you can run dgNN in a Docker container.

cd docker
docker build -t dgnn:v1 -f Dockerfile .
docker run -it dgnn:v1 /bin/bash

Examples

Our training scripts are adapted from DGL. We currently implement three popular GNN models.

Run GAT

DGL Code

cd dgNN/script/train
python train_gatconv.py --num-hidden=64 --num-heads=4 --dataset cora --gpu 0

Run Monet

DGL Code

cd dgNN/script/train
python train_gmmconv.py --n-kernels 3 --pseudo-dim 2 --dataset cora --gpu 0

Run PointCloud

We use the modelnet40-sampled-2048 dataset for our PointNet example. DGL Code

cd dgNN/script/train
python train_edgeconv.py

Collaborative Projects

CogDL is a flexible and efficient graph-learning framework that uses GE-SpMM to accelerate GNN algorithms. This repo is included in CogDL as a submodule.

Citation

If you use our dgNN project in your research, please cite the following bib:

This project also implements some of the algorithms from GNN-computing, in particular the neighbor grouping method in SpMM. If you use dgNN in your research, please also cite the following bib:

@inproceedings{huang2021understanding,
  title={Understanding and bridging the gaps in current GNN performance optimizations},
  author={Huang, Kezhao and Zhai, Jidong and Zheng, Zhen and Yi, Youngmin and Shen, Xipeng},
  booktitle={Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming},
  pages={119--132},
  year={2021}
}

If you run into any problems with this repo, feel free to open an issue or contact us by e-mail ([email protected]).

CUDA error: no kernel image is available for execution on the device when running the training

Hi, I am trying to run your framework and I have come across such a problem.

Traceback (most recent call last):
  File "/scratch/snx3000/prenc/dgNN/dgNN/script/train/train_gatconv.py", line 202, in <module>
    main(args)
  File "/scratch/snx3000/prenc/dgNN/dgNN/script/train/train_gatconv.py", line 129, in main
    logits = model(features)
  File "/scratch/snx3000/prenc/dgnn-venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/scratch/snx3000/prenc/dgNN/dgNN/script/train/train_gatconv.py", line 52, in forward
    logits = self.gat_layers[-1](self.row_ptr,self.col_idx,self.col_ptr,self.row_idx,self.permute,h).mean(1)
  File "/scratch/snx3000/prenc/dgnn-venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/scratch/snx3000/prenc/dgNN/dgNN/layers/gatconv_layer.py", line 72, in forward
    h=self.feat_drop(h)
  File "/scratch/snx3000/prenc/dgnn-venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/scratch/snx3000/prenc/dgnn-venv/lib/python3.9/site-packages/torch/nn/modules/dropout.py", line 59, in forward
    return F.dropout(input, self.p, self.training, self.inplace)
  File "/scratch/snx3000/prenc/dgnn-venv/lib/python3.9/site-packages/torch/nn/functional.py", line 1252, in dropout
    return _VF.dropout_(input, p, training) if inplace else _VF.dropout(input, p, training)
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

The error seems to be related to PyTorch; however, the environment works properly when I run the following commands. Have you encountered something like this?

Python 3.9.16 (main, Mar  8 2023, 14:00:05)
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> import sys
>>> print('A', sys.version)
A 3.9.16 (main, Mar  8 2023, 14:00:05)
[GCC 11.2.0]
>>> print('B', torch.__version__)
B 2.0.0
>>> print('C', torch.cuda.is_available())
C True
>>> print('D', torch.backends.cudnn.enabled)
D True
>>> device = torch.device('cuda')
>>> print('E', torch.cuda.get_device_properties(device))
E _CudaDeviceProperties(name='Tesla P100-PCIE-16GB', major=6, minor=0, total_memory=16280MB, multi_processor_count=56)
>>> print('F', torch.tensor([1.0, 2.0]).cuda())
F tensor([1., 2.], device='cuda:0')
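One possible way to narrow this down (a sketch, not a confirmed diagnosis) is to compare the device's compute capability with the architectures the installed PyTorch binary was compiled for. `torch.cuda.get_device_capability()` and `torch.cuda.get_arch_list()` are existing PyTorch calls; the helpers `can_run_on` and `describe_current_device` are hypothetical names introduced here. The matching rule is a simplification: an `sm_XY` binary is treated as usable on a device with the same major version and an equal or higher minor version, and a `compute_XY` PTX entry as usable on any device of equal or higher capability.

```python
import re


def can_run_on(capability, arch_list):
    """Return True if any entry in arch_list could serve a device
    with the given (major, minor) compute capability."""
    major, minor = capability
    for arch in arch_list:
        match = re.fullmatch(r"(sm|compute)_(\d+)(\d)", arch)
        if match is None:
            continue  # skip entries this sketch does not understand
        kind = match.group(1)
        a_major, a_minor = int(match.group(2)), int(match.group(3))
        # Binary code: same major version, minor not newer than the device.
        if kind == "sm" and a_major == major and a_minor <= minor:
            return True
        # PTX: can be JIT-compiled for the same or any newer device.
        if kind == "compute" and (a_major, a_minor) <= (major, minor):
            return True
    return False


def describe_current_device():
    """Query PyTorch for the current GPU; needs torch and a CUDA device."""
    import torch  # imported lazily so can_run_on works without torch

    capability = torch.cuda.get_device_capability()
    arch_list = torch.cuda.get_arch_list()
    return capability, arch_list, can_run_on(capability, arch_list)
```

For the Tesla P100 shown above (major=6, minor=0), `can_run_on((6, 0), arch_list)` would need an `sm_60` or an older `compute_` entry to return True. Note that `get_arch_list()` describes only PyTorch's own build. The dgNN kernels are compiled separately, so their target architectures may differ. Extensions built through `torch.utils.cpp_extension` read the `TORCH_CUDA_ARCH_LIST` environment variable, but whether dgNN's `install.sh` relies on it is an assumption to verify.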
