lightaime / deep_gcns_torch Goto Github PK

Pytorch Repo for DeepGCNs (ICCV'2019 Oral, TPAMI'2021), DeeperGCN (arXiv'2020) and GNN1000(ICML'2021): https://www.deepgcns.org

License: MIT License

Python 98.24% Shell 1.76%

graph-neural-networks geometric-deep-learning 3d-point-clouds deep-gcns data-mining pytorch computer-vision bioinformatics cheminformatics social-network

deep_gcns_torch's Introduction

DeepGCNs: Can GCNs Go as Deep as CNNs?

In this work, we present new ways to successfully train very deep GCNs. We borrow concepts from CNNs, mainly residual/dense connections and dilated convolutions, and adapt them to GCN architectures. Through extensive experiments, we show the positive effect of these deep GCN frameworks.

[Project] [Paper] [Slides] [Tensorflow Code] [Pytorch Code]

Overview

We do extensive experiments to show how different components (#Layers, #Filters, #Nearest Neighbors, Dilation, etc.) effect DeepGCNs. We also provide ablation studies on different type of Deep GCNs (MRGCN, EdgeConv, GraphSage and GIN).

How to train, test and evaluate our models

Please look the details in Readme.md of each task inside examples folder. All the information of code, data, and pretrained models can be found there.

DeepGCNs (ICCV'2019, TPAMI'2021): S3DIS, PartNet, ModelNet40, PPI
DeeperGCN (Arxiv'2020): OGB
GNN'1000 (ICML'2021): OGB

Recommended Requirements

Python>=3.7
Pytorch>=1.9.0
pytorch_geometric>=1.6.0
ogb>=1.3.1 only used for experiments on OGB datasets
dgl>=0.5.3 only used for the experiment examples/ogb_eff/ogbn_arxiv_dgl

Install enviroment by runing:

source deepgcn_env_install.sh

Code Architecture

.
├── misc                    # Misc images
├── utils                   # Common useful modules
├── gcn_lib                 # gcn library
│   ├── dense               # gcn library for dense data (B x C x N x 1)
│   └── sparse              # gcn library for sparse data (N x C)
├── eff_gcn_modules         # modules for mem efficient gnns
├── examples 
│   ├── modelnet_cls        # code for point clouds classification on ModelNet40
│   ├── sem_seg_dense       # code for point clouds semantic segmentation on S3DIS (data type: dense)
│   ├── sem_seg_sparse      # code for point clouds semantic segmentation on S3DIS (data type: sparse)
│   ├── part_sem_seg        # code for part segmentation on PartNet
│   ├── ppi                 # code for node classification on PPI dataset
│   └── ogb                 # code for node/graph property prediction on OGB datasets
│   └── ogb_eff             # code for node/graph property prediction on OGB datasets with memory efficient GNNs
└── ...

Citation

Please cite our paper if you find anything helpful,

@InProceedings{li2019deepgcns,
    title={DeepGCNs: Can GCNs Go as Deep as CNNs?},
    author={Guohao Li and Matthias Müller and Ali Thabet and Bernard Ghanem},
    booktitle={The IEEE International Conference on Computer Vision (ICCV)},
    year={2019}
}

@article{li2021deepgcns_pami,
  title={Deepgcns: Making gcns go as deep as cnns},
  author={Li, Guohao and M{\"u}ller, Matthias and Qian, Guocheng and Perez, Itzel Carolina Delgadillo and Abualshour, Abdulellah and Thabet, Ali Kassem and Ghanem, Bernard},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2021},
  publisher={IEEE}
}

@misc{li2020deepergcn,
    title={DeeperGCN: All You Need to Train Deeper GCNs},
    author={Guohao Li and Chenxin Xiong and Ali Thabet and Bernard Ghanem},
    year={2020},
    eprint={2006.07739},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

@InProceedings{li2021gnn1000,
    title={Training Graph Neural Networks with 1000 layers},
    author={Guohao Li and Matthias Müller and Bernard Ghanem and Vladlen Koltun},
    booktitle={International Conference on Machine Learning (ICML)},
    year={2021}
}

License

MIT License

Contact

For more information please contact Guohao Li, Matthias Muller, Guocheng Qian.

deep_gcns_torch's People

Contributors

Stargazers

Watchers

Forkers

dlwbm123 supermousse hzhang57 peterzs azuki-miho shengyupei kshaonan coderzwei lzu-zwb soinlovelin zhwzhong seeker1943 jdc08161063 xrosliang xuyou314 shengzhang90 jlqzzz rxlgq amiormu allan0703 caesarstefanus climbingdaily iiihunter 0xmarsrover gztangde verigle tor4z jryongithub guofenggitlearning mengkunzhao elizabeth1997 jwyang slkun peterzhousz dragomirradev scape1989 cecilia930426 wutaiqiang keephappy-365 zymale eldentse mclyang sitp2018 salensoft leviome zbn123 gangmingzhao tangwenming qianrenjian nuxcodes qin-folks milkigit hermine2015 iamani baobaoluo haofanwang lodurality rizvix0 harold-jeffreys bestzsq youngjoo-kim richard-he hypnopump minhbau lchj linkun-1998 sooheon thubiter mhy-doracmon csu-muskmelon missbook520 asarigun guochengqian zkyzq sunfeng90 qmwz518 2502635892 markunan 732434665 darwin-systems cyysc1998 annopackage adityaasinha28 hansonhl hongbo-miao jacksnowwolf zhihan-lu sailfish009 zhaopengpeng1116 shuowang-ai miracleyin taishanfuxiao jianmohuo xuhuahaoren ziyan-wyq ibrandiay xiaolong-yun liujing1023 carolzxyzxy steelman97

deep_gcns_torch's Issues

Cannot run GAT with multiple heads attention on PPI

When running the following commands on PPI dataset, it will crash eventually.

python -u examples/ppi/main.py --phase train --conv gat --data_dir /data/deepgcn/ppi --block dense --n_filters 256 --n_heads 4

@lightaime @guochengqian

Question about preparing the data

Thank you for providing Pytorch version!
Could you tell how to preparing the data of the project? What about the path and name of them?
Does it the same as the Tensorflow version?

When could the pytorch version publish?

Sorry to bother!

Work of this paper is really great! Although there exists a tensorflow version, it doesn't suitable for I'm not familiar with the tensorflow framework.

So could you please tell me when could this pytorch version publish? Or, whether the version is been preparing.

GPU's capacity

I am training DGCN by my own dataset. Unlike S3DIS, my data looks like N * 20000 * 6, which means that each frame has 20000 points. I try to load it on my 2 GTX 1080Ti but I cannot even load 1 batch on one GPU. My question is : "Do you think Nvidia V100 can load my data for sure?" If you don't think V100 cannot load it neither, would you recommend that I down sample my dataset?

How much memory need in ogbn_products test?

how much memory need in ogbn_products test phase?
in dir "examples/ogb/ogbn_products" I run
python test.py --self_loop --num_layers 14 --gcn_aggr softmax_sg --t 0.1

Pycharm return
Process finished with exit code 137 (interrupted by signal 9: SIGKILL)

it seems to OOM, but I have 160GB+ memory. is it not enough for test?

about the reproducible result on S3DIS dataset

Hi, @lightaime ,

According to the provided pretrained models, the result only gives no greater then mIOU=53. However, the result in your paper had mentioned the mIOU performance is 60.0 in Table 2. Could you release the model related to mIOU=60.0?

THX!

pytorch code

Can you kindly share your pytorch code?
I don't mind the draft version:)
This is my email:
[email protected]

visualization

Hi, Could you please tell me how you visualize the S3DIS dataset?
I can't find the toolkit about from TorchTools.DataTools import indoor3d_util

Code seems to miss the fusion block

Thanks for the pytorch version!
But the code seems to miss the fusion block.

pytorch version:

        out = self.head(x, edge_index).unsqueeze(0)
        out = self.backbone(out, batch)[0]
        out = self.prediction(out.transpose(1, 0).contiguous().view(out.shape[1], -1))

tf version:

    graphs = self.build_gcn_backbone_block(self.inputs,
                                           vertex_layer_builder,
                                           edge_layer_builder,
                                           num_layers,
                                           skip_connect)
    fusion = self.build_fusion_block(graphs, num_vertices)
    self.pred = self.build_mlp_pred_block(fusion, num_classes)

Question about dataset.

When I run the code:

root@e62eeb3b75c5:/home/deep_gcns_torch-master/deep_gcns_torch-master# CUDA_VISIBLE_DEVICES=0,1 python examples/sem_seg_sparse/train.py  --multi_gpus --phase train --train_path /data/deepgcn/S3DIS
/opt/conda/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/opt/conda/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/opt/conda/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/opt/conda/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/opt/conda/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/opt/conda/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
tensorflow is not installed.
usage: train.py [-h] [--phase PHASE] [--use_cpu] [--exp_name EXP_NAME]
                [--root_dir ROOT_DIR] [--data_dir DATA_DIR]
                [--batch_size BATCH_SIZE] [--in_channels IN_CHANNELS]
                [--total_epochs TOTAL_EPOCHS] [--save_freq SAVE_FREQ]
                [--iter ITER] [--lr_adjust_freq LR_ADJUST_FREQ] [--lr LR]
                [--lr_decay_rate LR_DECAY_RATE] [--print_freq PRINT_FREQ]
                [--postname POSTNAME] [--multi_gpus] [--seed SEED]
                [--no_clutter] [--pretrained_model PRETRAINED_MODEL] [--k K]
                [--block BLOCK] [--conv CONV] [--act ACT] [--norm NORM]
                [--bias BIAS] [--n_filters N_FILTERS] [--n_blocks N_BLOCKS]
                [--dropout DROPOUT] [--epsilon EPSILON]
                [--stochastic STOCHASTIC]
train.py: error: unrecognized arguments: --train_path /data/deepgcn/S3DIS

I can't handle this error, nor can I change the directory --data_dir. I try to put the downloaded zip into it, but the following error is displayed:

                [--root_dir ROOT_DIR] [--data_dir DATA_DIR]
                [--batch_size BATCH_SIZE] [--in_channels IN_CHANNELS]
                [--total_epochs TOTAL_EPOCHS] [--save_freq SAVE_FREQ]
                [--iter ITER] [--lr_adjust_freq LR_ADJUST_FREQ] [--lr LR]
                [--lr_decay_rate LR_DECAY_RATE] [--print_freq PRINT_FREQ]
                [--postname POSTNAME] [--multi_gpus] [--seed SEED]
                [--no_clutter] [--pretrained_model PRETRAINED_MODEL] [--k K]
                [--block BLOCK] [--conv CONV] [--act ACT] [--norm NORM]
                [--bias BIAS] [--n_filters N_FILTERS] [--n_blocks N_BLOCKS]
                [--dropout DROPOUT] [--epsilon EPSILON]
                [--stochastic STOCHASTIC]
train.py: error: unrecognized arguments: ----data_dir /data/deepgcn/S3DIS
root@e62eeb3b75c5:/home/ggbond/deep_gcns_torch-master/deep_gcns_torch-master# CUDA_VISIBLE_DEVICES=3,4 python examples/sem_seg_sparse/train.py  --multi_gpus --phase train --data_dir /data/deepgcn/S3DIS
/opt/conda/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/opt/conda/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/opt/conda/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/opt/conda/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/opt/conda/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/opt/conda/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
tensorflow is not installed.
2021-02-09 05:52:05,148 saving log, checkpoint and back up code in folder: log/sem_seg_sparse-res-edge-n28-C64-k16-drop0.3-lr0.001_B16_20210209-055205_f660a4fa-88d0-41bc-9dfa-d21ab4889f75
2021-02-09 05:52:05,148 ==========       args      =============
2021-02-09 05:52:05,148 phase:train
2021-02-09 05:52:05,149 use_cpu:False
2021-02-09 05:52:05,149 exp_name:sem_seg_sparse
2021-02-09 05:52:05,149 root_dir:log
2021-02-09 05:52:05,149 data_dir:/data/deepgcn/S3DIS
2021-02-09 05:52:05,149 batch_size:16
2021-02-09 05:52:05,150 in_channels:9
2021-02-09 05:52:05,150 total_epochs:100
2021-02-09 05:52:05,150 save_freq:1
2021-02-09 05:52:05,150 iter:0
2021-02-09 05:52:05,150 lr_adjust_freq:20
2021-02-09 05:52:05,150 lr:0.001
2021-02-09 05:52:05,151 lr_decay_rate:0.5
2021-02-09 05:52:05,151 print_freq:100
2021-02-09 05:52:05,151 postname:
2021-02-09 05:52:05,151 multi_gpus:True
2021-02-09 05:52:05,151 seed:0
2021-02-09 05:52:05,151 no_clutter:False
2021-02-09 05:52:05,151 pretrained_model:
2021-02-09 05:52:05,152 k:16
2021-02-09 05:52:05,152 block:res
2021-02-09 05:52:05,152 conv:edge
2021-02-09 05:52:05,152 act:relu
2021-02-09 05:52:05,152 norm:batch
2021-02-09 05:52:05,152 bias:True
2021-02-09 05:52:05,153 n_filters:64
2021-02-09 05:52:05,153 n_blocks:28
2021-02-09 05:52:05,153 dropout:0.3
2021-02-09 05:52:05,153 epsilon:0.2
2021-02-09 05:52:05,153 stochastic:True
2021-02-09 05:52:05,153 device:cuda
2021-02-09 05:52:05,154 jobname:sem_seg_sparse-res-edge-n28-C64-k16-drop0.3-lr0.001_B16
2021-02-09 05:52:05,154 exp_dir:log/sem_seg_sparse-res-edge-n28-C64-k16-drop0.3-lr0.001_B16_20210209-055205_f660a4fa-88d0-41bc-9dfa-d21ab4889f75
2021-02-09 05:52:05,154 ckpt_dir:log/sem_seg_sparse-res-edge-n28-C64-k16-drop0.3-lr0.001_B16_20210209-055205_f660a4fa-88d0-41bc-9dfa-d21ab4889f75/checkpoint
2021-02-09 05:52:05,154 code_dir:log/sem_seg_sparse-res-edge-n28-C64-k16-drop0.3-lr0.001_B16_20210209-055205_f660a4fa-88d0-41bc-9dfa-d21ab4889f75/code
2021-02-09 05:52:05,154 writer:<torch.utils.tensorboard.writer.SummaryWriter object at 0x7f73c762b750>
2021-02-09 05:52:05,154 epoch:-1
2021-02-09 05:52:05,155 step:-1
2021-02-09 05:52:05,155 loglevel:info
2021-02-09 05:52:05,155 ==========     args END    =============
2021-02-09 05:52:05,155

2021-02-09 05:52:05,155 ===> Phase is train.
2021-02-09 05:52:05,155 ===> Creating dataloader ...
Using exist file indoor3d_sem_seg_hdf5_data.zip
Extracting /data/deepgcn/S3DIS/indoor3d_sem_seg_hdf5_data.zip
Traceback (most recent call last):
  File "examples/sem_seg_sparse/train.py", line 129, in <module>
    main()
  File "examples/sem_seg_sparse/train.py", line 20, in main
    train_dataset = GeoData.S3DIS(opt.data_dir, test_area=5, train=True, pre_transform=T.NormalizeScale())
  File "/opt/conda/lib/python3.7/site-packages/torch_geometric/datasets/s3dis.py", line 50, in __init__
    super(S3DIS, self).__init__(root, transform, pre_transform, pre_filter)
  File "/opt/conda/lib/python3.7/site-packages/torch_geometric/data/in_memory_dataset.py", line 54, in __init__
    pre_filter)
  File "/opt/conda/lib/python3.7/site-packages/torch_geometric/data/dataset.py", line 90, in __init__
    self._download()
  File "/opt/conda/lib/python3.7/site-packages/torch_geometric/data/dataset.py", line 142, in _download
    self.download()
  File "/opt/conda/lib/python3.7/site-packages/torch_geometric/datasets/s3dis.py", line 65, in download
    extract_zip(path, self.root)
  File "/opt/conda/lib/python3.7/site-packages/torch_geometric/data/extract.py", line 40, in extract_zip
    with zipfile.ZipFile(path, 'r') as f:
  File "/opt/conda/lib/python3.7/zipfile.py", line 1258, in __init__
    self._RealGetContents()
  File "/opt/conda/lib/python3.7/zipfile.py", line 1325, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file

怎么让NeighborSampler 和deepgcn配合使用 ???

怎么让 torch_geometric.data.NeighborSampler 和deepgcn配合使用 ???

用自己数据和 sage 时, 需要用NeighborSampler 来加载数据, 怎么才能和deepgcn 配合使用??
尝试了一些没成功..
你们给个示例吗? 基于Reddit数据的节点分类就行.

the pytorch geometric install commands need to be updated

Hi, @lightaime,

According to the recent update of pytorch geometric installation step, the commands mentioned in deepgcn_env_install.sh need to be updated.

Pre-trained model

I hope to be able to obtain pre-trained network data for the experiment, I have submitted the application on Google Cloud Disk, my account is [email protected].

error occurs in training modelnet40 after epoch 20

Hi, @lightaime ,

I got the error RuntimeError: CUDA error: device-side assert triggered, after training modelnet40 in epoch 20 (details in log.txt)

Could you give me some hints on how to fix this issue?

Thanks~

My training is very slow

Hi, I trained the sem_seg_sparse model using 2 Tesla V100 GPUs, however, I found the training is very very slow and it takes several minutes to forward a batch! Just a batch! The training is under the default configuration. I had checked the GPU situation and found it is working. Do you have any idea about this?

Code for reproducing OGB results table

Is there a script for getting mean and standard error? Or for getting results for independent runs? Thank you.

Not using edge features

I'd like to apply the ogbg ppa model on my own data. However, my data doesn't have edge features. Would I then just set edge_attr = torch.zeros(edge_index.shape[1]).reshape(-1,1) in the forward function (in model.py) before edge_emb = self.edge_encoder(edge_attr)?

Thanks in advance!

RuntimeError: size mismatch, m1: [2339 x 100], m2: [20 x 256] at /opt/conda/conda-bld/pytorch_1556653215914/work/aten/src/THC/generic/THCTensorMathBlas.cu:268

When I train the model as
"python -u examples/ppi/main.py --phase train --data_dir /data/deepgcn/ppi"
I meet the issue that:
===> Loading the network ...
===> loading pre-trained ...
===> No pre-trained model
===> Init the optimizer ...
===> Init Metric ...
===> Start training ...
Traceback (most recent call last):
File "/home/msy/project/deep_gcns_original/examples/ppi/main.py", line 77, in train_step
out = model(data)
File "/home/msy/anaconda3/envs/torch_1.2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/home/msy/project/deep_gcns_original/examples/ppi/architecture.py", line 49, in forward
feats = [self.head(x, edge_index)]
File "/home/msy/anaconda3/envs/torch_1.2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/home/msy/project/deep_gcns_original/gcn_lib/sparse/torch_vertex.py", line 183, in forward
return self.gconv(x, edge_index)
File "/home/msy/anaconda3/envs/torch_1.2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/home/msy/project/deep_gcns_original/gcn_lib/sparse/torch_vertex.py", line 22, in forward
return self.nn(torch.cat([x, x_j], dim=1))
File "/home/msy/anaconda3/envs/torch_1.2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/home/msy/anaconda3/envs/torch_1.2/lib/python3.7/site-packages/torch/nn/modules/container.py", line 92, in forward
input = module(input)
File "/home/msy/anaconda3/envs/torch_1.2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/home/msy/anaconda3/envs/torch_1.2/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 92, in forward
return F.linear(input, self.weight, self.bias)
File "/home/msy/anaconda3/envs/torch_1.2/lib/python3.7/site-packages/torch/nn/functional.py", line 1406, in linear
ret = torch.addmm(bias, input, weight.t())
RuntimeError: size mismatch, m1: [2339 x 100], m2: [20 x 256] at /opt/conda/conda-bld/pytorch_1556653215914/work/aten/src/THC/generic/THCTensorMathBlas.cu:268.

How can i solve it?

Cannot load pretrained model

Hi,

When I run
python test.py --use_gpu --conv_encode_edge --add_virtual_node --mlp_layers 2 --num_layers 14 --dataset ogbg-molpcba --block res+ --gcn_aggr softmax_sg --t 0.1 --model_load_path ogbg_molpcba_pretrained_model.pth
for ogbg_molpcba dataset, it returns shape mismatch errors. Can you have a check?

torch_sparse version problem

RuntimeError: Detected that PyTorch and torch_sparse were compiled with different CUDA versions. PyTorch has CUDA version 10.1 and torch_sparse has CUDA version 10.0. Please reinstall the torch_sparse that matches your PyTorch install.

GPU Memory is too high when using knn

thanks for your great job. But I use the GCNs for inference, it is too high when using knn compared with the traditional convolution operation. Do you have some suggestion or potential methods to reduce the GPU usage? thanks a lot

def dense_knn_matrix(x, k=16):

About OGB submission

Thanks for your ogbn-proteins submissions. Could you clearly specify the exact command to reproduce your reported result? We will update the leaderboard once this is confirmed.

I was confused since you provide multiple training and inference schemes. I would like you to make it clear which one you use to obtain the reported result.

Different way of defining global pooling?

I have a rather conceptual question. I noticed that the global pooling in your code find max feature from the 1024 dim vector and this is concatenated with per point feature. In the older dynamic edge convolution network, the global pooling was done across all points giving 512 dim global vector which is concatenated with per point feature.

I am wondering why did you choose to do this? In my opinion 1 dim vector per point doesn't contain much information and I think it might not make any difference in the performance.

Model cannot converge problem

Your work is very interesting. I use the network structure which introduced in your paper on the database of image classification. But I find that as the gcn network goes deeper, even use the Residual Connections of gcn , Convergence of networks get worse. when the depth of network is 2, the accuracy of train set can reach 80%, but when the depth of network grows to 7, the accuracy only drop to 50%. Is there also the problem of gradient disappearing?

default batch-size is too small, and the training is too slow

for ogbn-mol-pcba and ogbg_ppa, the default batch-size is too small, and the training is too slow, how could you finish the experiment? have you changed the batch-size, and the according other hyper parameters setting?

Have you applied DeepGCNs to the classification of objects in ModelNet40？

I think this is a natural idea that could evaluate the capacity of learning feature by DeepGCNs, which is more convenient to conduct than evaluation on S3DIS.

a small question about batch in gcn_lib/sparse/torch_edge.py

hello @guochengqian
I want to check what the layers is going on in gcn_lib/sparse/torch_edge.py. And I just wonder the line
batch_size=batch[-1]+1, so... what is batch argument? i.e., type(batch), or more specifically batch.shape should be what? (I guess it is a ```torch.Tensor````? )

here is part of the code I want to ask, with Chinese comment:

def knn_matrix(x, k=16, batch=None):
    """Get KNN based on the pairwise distance.
    Args:
        pairwise distance: (num_points, num_points)
        k: int
    Returns:
        nearest neighbors: (num_points*k ,1) (num_points, k)
    """
    # 这一步骤的操作有点没有想明白, 是不是某个地方写错了?
    batch_size = batch[-1] + 1
    # 转变形状. 
    x = x.view(batch_size, -1, x.shape[-1])

    # 这里修改为负数, 是为什么? 这是因为topk是选择其中最大的数据, 所以为了找到最小的, 自然要添加负数来进行修正. 
    neg_adj = - pairwise_distance(x)
    _, nn_idx = torch.topk(neg_adj, k=k)
    del neg_adj

    # 一共多少个点. 
    n_points = x.shape[1]

    # 加上了起始点的索引之后, 相当于是将原先的情况进行了一个修正. 将上文的相对索引修正成为绝对索引. 
    start_idx = torch.arange(0, n_points*batch_size, n_points).long().view(batch_size, 1, 1)
    if x.is_cuda:
        start_idx = start_idx.cuda()
    nn_idx += start_idx
    del start_idx


    if x.is_cuda:
        torch.cuda.empty_cache()

    nn_idx = nn_idx.view(1, -1)
    # 中心点的坐标
    center_idx = torch.arange(0, n_points*batch_size).repeat(k, 1).transpose(1, 0).contiguous().view(1, -1)
    if x.is_cuda:
        center_idx = center_idx.cuda()
    return nn_idx, center_idx

and I add a "main" function to test if it works. (just after the gcn_lib/sparse/torch_edge.py)

if __name__ == "__main__":
    import torch
    torch.manual_seed(0)
    edge_index = torch.randint(0, 10, [2, 40])
    d = Dilated(k=5, dilation=2, stochastic=True, epsilon=0.9)
    e_1 = d(edge_index)

    x = torch.randn([10, 20])
    batch = torch.randint(0, 3, [10, ])
    dknn = DilatedKnnGraph(k=5)
    e_2 = dknn(x, batch)

here is the problem:

when i changed the seed from 0 to 1, then everything goes fine?

yours sincerely,
@WMF1997

ImportError: cannot import name 'scale_translate_pointcloud' from 'utils'

When I try to run the train code of Part Segmentation on PartNet, I got this issue, can you help?

Traceback (most recent call last):
File "main.py", line 13, in
from utils import scale_translate_pointcloud
ImportError: cannot import name 'scale_translate_pointcloud' from 'utils' (/home/jrl/Github/deep_gcns_torch-master/examples/part_sem_seg/../../utils/init.py)

can add some examples such as cora, mnist ??

can add exsimples such as cora, mnist ??

PPI data load has a problem

when I run the ppi/main.py ,there is a problem about data load. how should I do?

PartNet code typo

Hi! There is a typo:
https://github.com/lightaime/deep_gcns_torch/blob/master/examples/part_sem_seg/main.py#L164

It should be the val_dataset not test_dataset for validation loader.
Thank you.

Error when training sem_seg_sparse with dense block

When using below command for training sem_seg_sparse model

CUDA_VISIBLE_DEVICES=0,1 python examples/sem_seg_sparse/train.py  --multi_gpus --phase train --train_path data/S3DIS --block dense --conv mr --batch_size 16

The following error message will pop up:

RuntimeError: size mismatch, m1: [16384 x 128], m2: [256 x 64] at /pytorch/aten/src/THC/generic/THCTensorMathBlas.cu:290

I think it is because the in_channels of DenseDynBlock in the https://github.com/lightaime/deep_gcns_torch/blob/master/gcn_lib/sparse/torch_vertex.py#L186 is wrong?

class DenseDynBlock(nn.Module):
    """
    Dense Dynamic graph convolution block
    """
    def __init__(self, channels,  kernel_size=9, dilation=1, conv='edge', act='relu', norm=None, bias=True, **kwargs):
        super(DenseDynBlock, self).__init__()
        self.body = DynConv(channels*2, channels, kernel_size, dilation, conv,
                            act, norm, bias, **kwargs)

    def forward(self, x, batch=None):
        dense = self.body(x, batch)
        return torch.cat((x, dense), 1), batch

@lightaime @guochengqian Could you please help confirm the issue reported here?

AttributeError: 'SummaryWriter' object has no attribute 'scalar_summary'

Hi @lightaime

Thanks for sharing your code!
During the training stage on the ParNet dataset, there is some bugs below:

Traceback (most recent call last):
  File "main.py", line 196, in <module>
    train(model, train_loader, val_loader, test_loader, opt)
  File "main.py", line 72, in train
    opt.writer.scalar_summary(tag, value, opt.step)
AttributeError: 'SummaryWriter' object has no attribute 'scalar_summary'

Could you help me to solve this problem? Does this a version confiliction?

Questions about 'torch_geometric.datasets.S3DIS'

Hi, thank you for your good works.
I think it is convinient to use the dataloader 'torch_geometric.datasets.S3DIS' provided by torch_geometric. However, when I look into the source code for torch_geometric.datasets.s3dis
https://pytorch-geometric.readthedocs.io/en/latest/_modules/torch_geometric/datasets/s3dis.html#S3DIS ,
I foud it only download the file 'indoor3d_sem_seg_hdf5_data.zip', which is used for training but not suitable for testing. As descripted in this repo, https://github.com/charlesq34/pointnet/tree/master/sem_seg#dataset,
the file 'indoor3d_sem_seg_hdf5_data.zip' is specified for training. When in testing, one shoud download Download 3D indoor parsing dataset then transform them into blocks and execute testing by using script 'collect_indoor3d_data.py' and 'batch_inference.py'

RuntimeError: The following operation failed in the TorchScript interpreter + CUDA OOM

Hi, I try to run the ogbn-arxiv example but end up getting this error:

RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
  File "/data5/xo2/.conda/envs/deepgcn/lib/python3.8/site-packages/torch_scatter/composite/softmax.py", line 16, in scatter_softmax
    index = broadcast(index, src, dim)

    max_value_per_index = scatter_max(src, index, dim=dim)[0]
                          ~~~~~~~~~~~ <--- HERE
    max_per_src_element = max_value_per_index.gather(dim, index)
  File "/data5/xo2/.conda/envs/deepgcn/lib/python3.8/site-packages/torch_scatter/scatter.py", line 79, in scatter_max
        out: Optional[torch.Tensor] = None,
        dim_size: Optional[int] = None) -> Tuple[torch.Tensor, torch.Tensor]:
    return torch.ops.torch_scatter.scatter_max(src, index, dim, out, dim_size)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
RuntimeError: CUDA out of memory. Tried to allocate 2.37 GiB (GPU 0; 10.76 GiB total capacity; 6.39 GiB already allocated; 1.35 GiB free; 8.42 GiB reserved in total by PyTorch)

It's weird because I'm using the default setting in Readme and this error will disappear when it runs on CPU.
The GPU I'm using is vacant but there's also a CUDA OOM error, any suggestions on this one?

The full Traceback is here:

Traceback (most recent call last):
  File "/data5/xo2/deep_gcns_torch/examples/ogb/ogbn_arxiv/main.py", line 127, in <module>
    main()
  File "/data5/xo2/deep_gcns_torch/examples/ogb/ogbn_arxiv/main.py", line 98, in main
    epoch_loss = train(model, x, edge_index, y_true, train_idx, optimizer)
  File "/data5/xo2/deep_gcns_torch/examples/ogb/ogbn_arxiv/main.py", line 42, in train
    pred = model(x, edge_index)[train_idx]
  File "/data5/xo2/.conda/envs/deepgcn/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data5/xo2/deep_gcns_torch/examples/ogb/ogbn_arxiv/model.py", line 96, in forward
    res = checkpoint(self.gcns[layer], h2, edge_index)
  File "/data5/xo2/.conda/envs/deepgcn/lib/python3.8/site-packages/torch/utils/checkpoint.py", line 163, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/data5/xo2/.conda/envs/deepgcn/lib/python3.8/site-packages/torch/utils/checkpoint.py", line 74, in forward
    outputs = run_function(*args)
  File "/data5/xo2/.conda/envs/deepgcn/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data5/xo2/deep_gcns_torch/gcn_lib/sparse/torch_vertex.py", line 67, in forward
    m = self.propagate(edge_index, x=x, edge_attr=edge_emb)
  File "/data5/xo2/.conda/envs/deepgcn/lib/python3.8/site-packages/torch_geometric/nn/conv/message_passing.py", line 253, in propagate
    out = self.aggregate(out, **aggr_kwargs)
  File "/data5/xo2/deep_gcns_torch/gcn_lib/sparse/torch_message.py", line 55, in aggregate
    out = scatter_softmax(inputs*self.t, index, dim=self.node_dim)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
  File "/data5/xo2/.conda/envs/deepgcn/lib/python3.8/site-packages/torch_scatter/composite/softmax.py", line 16, in scatter_softmax
    index = broadcast(index, src, dim)

    max_value_per_index = scatter_max(src, index, dim=dim)[0]
                          ~~~~~~~~~~~ <--- HERE
    max_per_src_element = max_value_per_index.gather(dim, index)
  File "/data5/xo2/.conda/envs/deepgcn/lib/python3.8/site-packages/torch_scatter/scatter.py", line 79, in scatter_max
        out: Optional[torch.Tensor] = None,
        dim_size: Optional[int] = None) -> Tuple[torch.Tensor, torch.Tensor]:
    return torch.ops.torch_scatter.scatter_max(src, index, dim, out, dim_size)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
RuntimeError: CUDA out of memory. Tried to allocate 2.37 GiB (GPU 0; 10.76 GiB total capacity; 6.39 GiB already allocated; 1.35 GiB free; 8.42 GiB reserved in total by PyTorch)

Thank you!!!

AttributeError: 'tuple' object has no attribute 'num_nodes'

multi_gpus have error .

python -u examples/ppi/main.py --phase train --data_dir data --block res --n_filters 256 --n_blocks 28 is ok .
But,
python -u examples/ppi/main.py --phase train --data_dir data --block res --n_filters 256 --n_blocks 28 --multi_gpus --batch_size 4
is error:
AttributeError: 'tuple' object has no attribute 'num_nodes'

error when running visualize.py for part_sem_seg example

Hi, @lightaime ,

When I follow the visualization step, I can successfully run step1 to generate *.obj.

>> python eval.py --phase test --category 1 --pretrained_model /data/code13/deep_gcns_torch/pretrained_models/part_sem_seg/ResGCN-28/PartnetSemanticSeg-Bed-L3-res-edge-n28-C64-k9-drop0.5-lr0.005_B6-val_best_model.pth --data_dir /data/code13/deep_gcns_torch/partnet --test_batch_size 4

However, when I run step2 (visualize.py) by this command:
>> python visualize.py --category 1 --obj_no 0 --dir_path /data/code13/deep_gcns_torch/pretrained_models/part_sem_seg/ResGCN-28/result --folders res, I got the following IndexError error:

Using vtk version 9.0.1
(2, 10000, 3)
0.0
0.5
Traceback (most recent call last):
  File "visualize.py", line 51, in <module>
    orientation='horizontal')
  File "/data/code13/deep_gcns_torch/utils/pc_viz.py", line 274, in visualize_part_seg
    orientation=orientation)
  File "/data/code13/deep_gcns_torch/utils/pc_viz.py", line 183, in show_pointclouds
    renderers[i].AddActor(ta)
IndexError: list index out of range

Any hints to solve this issue?

Thanks~

RuntimeError: Error(s) in loading state_dict for DeepGCN during loading the pretrained model

Hi, @lightaime ,

When I perform the testing for modelnet40 using the pretrained model by command:

python main.py --phase test --n_blocks 28 --block res  --pretrained_model ../../pretrained_models/modelnet_cls/ModelNet40-dense-edge-n14-C64-k9-drop0.5-lr0.001_B32_best_model.pth --data_dir /media/root/WDdata/dataset/modelnet40_dataset

I got the error: RuntimeError: Error(s) in loading state_dict for DeepGCN:
(details in log.txt)

Any hints to solve this issue?

Thanks~

DeeperGCNs on 3D Point Clouds

Hi, thanks for releasing this comprehensive codebase combining both DeepGCNs paper and DeeperGCNs paper. I was wondering whether you have experimented with the architectural advancements from DeeperGCNs on the 3D point cloud tasks presented in DeepGCNs? Do the new generalized aggregators, message norm, etc. improve performance for 3D point cloud tasks, too?

What is your version of PyTorch？1.2.0？or>=1.4.0

Hello! I would like to know: What is your version of PyTorch？1.2.0？or>=1.4.0
Thank you !

Latest versions of torch-sparse and torch-scatter are incompatible with pytorch 1.2

Running source deepgcn_env_install.sh to set up the local environment will report the error messages while building wheels of torch-sparse and torch-scatter as below:

/tmp/pip-install-bhreqzix/torch-scatter/csrc/segment_csr.cpp:43:24: error: ‘torch::autograd::AutogradContext’ has not been declared
   using torch::autograd::AutogradContext;

The official repository told me that the latest version requires pytorch 1.4 at least. However, the deepgcn_env_install.sh specifies pytorch=1.2. I fixed this issue by upgrading to pytorch 1.4 and reinstalling all of the extension libraries.

I'd suggest fixing the deepgcn_env_install.sh script by specifying 1.4 version of pytorch.

My two cents. Thanks.

Question about the sequence of act layer and norm layer

Thanks for the code. I'm wondering why the act layer is before norm layer in BasicConv located in torch_nn of gcn_lib.dense.

About pytorch average mIOU(over 6 areas) for S3DIS.

hello authors,

we want to get overall mIOU(6 areas) for my own trained models, but how can we get it in pytorch?

is it enough for altering from 5 to 1, ..., 6 of below sentence of attaching code(line 17-25 of test.py),
'test_dataset = GeoData.S3DIS(opt.test_path, 5, False, pre_transform=T.NormalizeScale())' ?

======
def main():
opt = OptInit().initialize()

print('===> Creating dataloader...')
test_dataset = GeoData.S3DIS(opt.test_path, 5, False, pre_transform=T.NormalizeScale())
test_loader = DenseDataLoader(test_dataset, batch_size=opt.batch_size, shuffle=False, num_workers=0)
opt.n_classes = test_loader.dataset.num_classes
if opt.no_clutter:
opt.n_classes -= 1

If the fast reply possible, we will be really grateful of you.

Thank you!

Question) Why use invertible Checkpoint Module in ogb_eff/ogbn_protiens

According to 'Training Graph Neural Networks with 1000 Layers', checkpoint consumes more memory than RevConv.
However, there is an invertible checkpoint on your code.

Is it correct that invertible checkpoint does the work of 'torch.utils.checkpoint' so that it is compatible with RevConv?
If yes, is there any reason for adding this feature even checkpoint consumes more memory than Reversible connection?

Can deep_gcn take point cloud repressed by 3d coordinate only?

I went through your paper and code and found each point has more than 3 features. However, could I take each point with xyz only and NOT including rest features like surface normal and color? Appreciate if you can provide more instruction how could I adjust the number of input channel. Thank you

TypeError: getattr(): attribute name must be string

Thank you for your good work！
But when training，i get some errors

Traceback (most recent call last):
File "sem_seg/train.py", line 141, in
main()
File "sem_seg/train.py", line 61, in main
train(model, train_loader, optimizer, scheduler, criterion, opt)
File "sem_seg/train.py", line 81, in train
out = model(data)
File "/root/anaconda3/envs/deepgcn37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/media/dgdr250/hd/hyydown/deep_gcns_torch/models/architecture.py", line 105, in forward
feats = [self.head(inputs, self.knn(inputs[:, 0:3]))]
File "/root/anaconda3/envs/deepgcn37/lib/python3.7/site-packages/torch_geometric/data/data.py", line 92, in getitem
return getattr(self, key, None)
TypeError: getattr(): attribute name must be string

Can you kindly tell me how to solve it?
Many thanks!!!

In the class GENConv, should the output dimension of edge_encoder be in_dim instead of emb_dim?

Hi, your work is great and interests me a lot.

In the class GENConv, should the output dimension of edge_encoder be in_dim instead of emb_dim, especially when the in_dim doesn't equal to the emb_dim? Thanks.

deep_gcns_torch/gcn_lib/sparse/torch_vertex.py

Line 57 in 03d553d

self.edge_encoder = torch.nn.Linear(edge_feat_dim, emb_dim)

Evaluation metrics for partnet dataset is different from original paper

The evaluation metrics for partnet dataset experiments is different from original paper:
https://github.com/daerduoCarey/partnet_seg_exps/blob/master/exps/sem_seg_pointcnn/test_general_seg.py#L114

There are a few points:

Second, the computation of shape IOU from your method will be wrong when batch size is greater than 1, as (i_1 + i_2) / (u_1 + u_2) is not equal to i_1 / u_1 + i_2 / u_2, where i is intersection and u is union, which is what your code is doing here

    I = np.sum(np.logical_and(cur_pred_mask, cur_gt_mask), dtype=np.float32)
    U = np.sum(np.logical_or(cur_pred_mask, cur_gt_mask), dtype=np.float32)
    cur_shape_iou_tot += I / U

Finally, imagine you have batch size 2, and you are checking whether U>0, it might be that for one shape union is non zero and for another it is zero. But your code with include points from both shapes in calculation. This will change the part mIOU metric also.

Please let me know if my understanding is not correct.

May be a serious bug

def train(data, dataset, model, optimizer, criterion, device):

loss_list = []
model.train()
sg_nodes, sg_edges, sg_edges_index, _ = data

**train_y = dataset.y[dataset.train_idx]**
idx_clusters = np.arange(len(sg_nodes))
np.random.shuffle(idx_clusters)

for idx in idx_clusters:

    x = dataset.x[sg_nodes[idx]].float().to(device)
    sg_nodes_idx = torch.LongTensor(sg_nodes[idx]).to(device)

    sg_edges_ = sg_edges[idx].to(device)
    sg_edges_attr = dataset.edge_attr[sg_edges_index[idx]].to(device)

    mapper = {node: idx for idx, node in enumerate(sg_nodes[idx])}

    **inter_idx = intersection(sg_nodes[idx], dataset.train_idx.tolist())**
    training_idx = [mapper[t_idx] for t_idx in inter_idx]

    optimizer.zero_grad()

    pred = model(x, sg_nodes_idx, sg_edges_, sg_edges_attr)

    **target = train_y[inter_idx].to(device)**  # inter_idx may be out of the maximal index range

    **loss = criterion(pred[training_idx].to(torch.float32), target.to(torch.float32))**
    loss.backward()
    optimizer.step()
    loss_list.append(loss.item())

return statistics.mean(loss_list)

About ogbg-molhiv results

Hello! (^_^)， thanks for the code.
I used the following command to train on the molhiv dataset：
python main.py --use_gpu --conv_encode_edge --num_layers 7 --dataset ogbg-molhiv --block res+ --gcn_aggr softmax --t 1.0 --learn_t --dropout 0.2
Why I can't reproduce the results in LeaderBoard？Is it something wrong with me?

My result is :
VAL_ROCAUC | TEST_ROCAUC
0.7976±0.0179 | 0.7744 ± 0.1234

LeaderBoard result:
VAL_ROCAUC | TEST_ROCAUC
0.8427 ± 0.0063 | 0.7858 ± 0.0117

lightaime / deep_gcns_torch Goto Github PK

deep_gcns_torch's Introduction

DeepGCNs: Can GCNs Go as Deep as CNNs?

Overview

How to train, test and evaluate our models

Recommended Requirements

Code Architecture

Citation

License

Contact

deep_gcns_torch's People

Contributors

Stargazers

Watchers

Forkers

deep_gcns_torch's Issues

Recommend Projects

Recommend Topics

Recommend Org