thangvubk / softgroup

[CVPR 2022 Oral] SoftGroup for Instance Segmentation on 3D Point Clouds

License: MIT License

Python 76.01% C++ 11.97% Cuda 9.06% C 2.61% Shell 0.35%

3d-instance-segmentation 3d-object-detection point-cloud scannet s3dis panoptic-segmentation semantic-kitti

softgroup's Introduction

SoftGroup

We provide code for reproducing the results of two papers:

SoftGroup for 3D Instance Segmentation on Point Clouds
Thang Vu, Kookhoi Kim, Tung M. Luu, Thanh Nguyen, and Chang D. Yoo.
CVPR 2022 (Oral).

Scalable SoftGroup for 3D Instance Segmentation on Point Clouds
Thang Vu, Kookhoi Kim, Tung M. Luu, Thanh Nguyen, Junyeong Kim, and Chang D. Yoo.
TPAMI 2023 (accepted).

Update

25/Nov/2022: Support SoftGroup++.
12/Sep/2022: Support panoptic segmentation on SemanticKITTI dataset.
28/Jun/2022: Support STPLS3D dataset. Add custom dataset guideline.
16/Apr/2022: The code base is refactored. The code is more extensible, readable, and consistent. The following features are supported:
- Support up-to-date pytorch 1.11 and spconv 2.1.
- Support distributed and mixed-precision training. Training time on ScanNet v2 (on 4 GPUs) is reduced from 4 days to 10 hours.
- Faster inference, which requires only 288 ms per ScanNet scan on a single Titan X.

Introduction

Existing state-of-the-art 3D instance segmentation methods perform semantic segmentation followed by grouping. Hard predictions are made during semantic segmentation, so that each point is associated with a single class. However, the errors stemming from these hard decisions propagate into grouping, which results in (1) low overlap between the predicted instances and the ground truth and (2) substantial false positives. To address these problems, this paper proposes a 3D instance segmentation method referred to as SoftGroup, which performs bottom-up soft grouping followed by top-down refinement. SoftGroup allows each point to be associated with multiple classes to mitigate the problems stemming from semantic prediction errors, and suppresses false positive instances by learning to categorize them as background. Experimental results on different datasets and multiple evaluation metrics demonstrate the efficacy of SoftGroup. Its performance surpasses the strongest prior method by a significant margin of +6.2% on the ScanNet v2 hidden test set and +6.8% on S3DIS Area 5 in terms of AP_50.
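
The difference between a hard and a soft semantic assignment can be sketched as follows. This is a minimal illustration and not the repository's implementation: the function names, the example scores, and the 0.2 threshold are assumptions chosen for the example.

```python
from typing import List


def hard_assign(scores: List[float]) -> int:
    """Hard prediction: keep only the single highest-scoring class."""
    return max(range(len(scores)), key=lambda c: scores[c])


def soft_assign(scores: List[float], score_thr: float = 0.2) -> List[int]:
    """Soft prediction: keep every class whose score exceeds the threshold."""
    return [c for c, s in enumerate(scores) if s > score_thr]


# Per-class scores for one ambiguous point (hypothetical values).
point_scores = [0.05, 0.50, 0.40, 0.05]

print(hard_assign(point_scores))  # 1
print(soft_assign(point_scores))  # [1, 2]
```

With the hard rule the point can only be grouped under class 1. With the soft rule it remains a grouping candidate for classes 1 and 2, so a wrong top-1 prediction does not necessarily break the instance it belongs to.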

Feature

State-of-the-art performance on the ScanNet benchmark and S3DIS dataset (3/Mar/2022).
High speed of 345 ms per scan on the ScanNet dataset, which is comparable with the fastest existing method (HAIS). Our refactored implementation (this code) further reduces the inference time to 288 ms per scan.
Support multiple datasets: ScanNet, S3DIS, STPLS3D, SemanticKITTI.

Installation

Please refer to installation guide.

Data Preparation

Please refer to data preparation.

Pretrained models

Instance segmentation

Dataset	Model	AP	AP_50	AP_25	Download
S3DIS	SoftGroup	51.4	66.5	75.4	model
S3DIS	SoftGroup++	50.9	67.8	76.0	model
ScanNet v2	SoftGroup	45.8	67.4	79.1	model
ScanNet v2	SoftGroup++	45.9	67.9	79.4	above
STPLS3D	SoftGroup	47.3	63.1	71.4	model
STPLS3D	SoftGroup++	46.5	62.9	71.8	above

NOTE: SoftGroup and SoftGroup++ can use the same trained model for inference on ScanNet v2 and STPLS3D.

Panoptic segmentation

Dataset	PQ	Config	Model
SemanticKITTI	60.2	config	model

Training

We use the checkpoint of HAIS as the pretrained backbone. We have already converted the checkpoint to work with spconv2.x. Download the pretrained HAIS-spconv2 model and put it in the SoftGroup/ directory.

Converted hais checkpoint: model

Note that for a fair comparison with the implementation in the STPLS3D paper, we train SoftGroup on this dataset from scratch, without the pretrained backbone.

Training S3DIS dataset

The default configs assume training on 4 GPUs. If you use fewer GPUs, you should reduce the learning rate linearly.
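
As a rough sketch of what reducing the learning rate linearly means, the helper below scales a base learning rate by the ratio of GPUs. The function name and the example value of 0.004 are assumptions for illustration; read the actual base learning rate from the config you are using.

```python
def scale_lr(base_lr: float, base_gpus: int, num_gpus: int) -> float:
    """Linearly scale the learning rate with the number of GPUs."""
    if base_gpus <= 0 or num_gpus <= 0:
        raise ValueError("GPU counts must be positive")
    return base_lr * num_gpus / base_gpus


# Hypothetical example: a config tuned for 4 GPUs with lr=0.004, run on 1 GPU.
print(scale_lr(0.004, base_gpus=4, num_gpus=1))  # 0.001
```

The scaled value would then replace the learning rate in the config before launching ./tools/dist_train.sh with the smaller GPU count.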

First, finetune the pretrained HAIS point-wise prediction network (backbone) on S3DIS.

./tools/dist_train.sh configs/softgroup_s3dis_backbone_fold5.yaml 4

Then, train the model with the backbone frozen.

./tools/dist_train.sh configs/softgroup_s3dis_fold5.yaml 4

Training ScanNet V2 dataset

Training on ScanNet does not require finetuning the backbone. Just freeze the pretrained backbone and train the model.

./tools/dist_train.sh configs/softgroup_scannet.yaml 4

Training STPLS3D dataset

./tools/dist_train.sh configs/softgroup_stpls3d_backbone.yaml 4
./tools/dist_train.sh configs/softgroup_stpls3d.yaml 4

Inference

./tools/dist_test.sh $CONFIG_FILE $CHECKPOINT $NUM_GPU

Inference without label

For example, on the ScanNet test split, change prefix to test and with_label to False in the config before running inference.
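
The change can be sketched on a config loaded as a nested dictionary. The layout below, with the options under data and then test, is an assumption for illustration; check where prefix and with_label actually sit in your config file. The helper name is hypothetical.

```python
import copy


def make_unlabeled_test_cfg(cfg: dict) -> dict:
    """Return a copy of cfg whose test split reads unlabeled 'test' scans."""
    new_cfg = copy.deepcopy(cfg)
    test_cfg = new_cfg["data"]["test"]  # assumed location of the test options
    test_cfg["prefix"] = "test"
    test_cfg["with_label"] = False
    return new_cfg


# Hypothetical config fragment, e.g. loaded from a YAML file.
cfg = {"data": {"test": {"prefix": "val", "with_label": True}}}
new_cfg = make_unlabeled_test_cfg(cfg)
print(new_cfg["data"]["test"])  # {'prefix': 'test', 'with_label': False}
```

The original dictionary is left untouched, so the same base config can still be used for validation with labels.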

Bounding box evaluation of ScanNet V2 dataset.

We provide a script to evaluate detection performance on axis-aligned boxes derived from predicted and ground-truth instances.

Step 1: Change save_instance to True in config file.
Step 2: Run evaluation code.

CUDA_VISIBLE_DEVICES=0 python test.py --config config/softgroup_default_scannet.yaml --pretrain $PATH_TO_PRETRAIN_MODEL$

Step 3: Evaluate detection performance.

python eval_det.py
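
For intuition, an axis-aligned box can be derived from an instance by taking the per-axis minimum and maximum of its points. The sketch below is a pure-Python illustration under that assumption; it is not the code in eval_det.py, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple

Corner = Tuple[float, float, float]


def instance_to_box(points: List[Sequence[float]]) -> Tuple[Corner, Corner]:
    """Return (min_corner, max_corner) of the axis-aligned box around 3D points."""
    if not points:
        raise ValueError("instance has no points")
    xs, ys, zs = zip(*points)  # split the (x, y, z) triples into three columns
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


# Three points belonging to one (hypothetical) instance.
instance_points = [(0.0, 0.0, 0.0), (1.0, 2.0, 0.5), (0.5, 1.0, 3.0)]
print(instance_to_box(instance_points))
# ((0.0, 0.0, 0.0), (1.0, 2.0, 3.0))
```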

Visualization

Please refer to visualization guide for visualizing ScanNet and S3DIS results.

Custom dataset

Please refer to custom dataset guide.

Citation

If you find our work helpful for your research, please consider citing our paper.

@inproceedings{vu2022softgroup,
  title={SoftGroup for 3D Instance Segmentation on Point Clouds},
  author={Vu, Thang and Kim, Kookhoi and Luu, Tung M. and Nguyen, Xuan Thanh and Yoo, Chang D.},
  booktitle={CVPR},
  year={2022}
}

Acknowledgements

Code is built based on HAIS, PointGroup, and spconv.

This work was partly supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2021-0-01381, Development of Causal AI through Video Understanding), and partly supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2019-0-01371, Development of brain-inspired AI with human-like intelligence).

softgroup's Issues

A question about visualization

Hello, I have a question: how did you find typical scenes to visualize for the comparison with and without SoftGroup? Could you please tell me the name of the test scene used in Fig. 5 of the paper? I'd like to visualize it for more information.

There was a problem training s3dis

[2022-04-09 21:26:07,585 INFO train.py line 108 23131] >>>>>>>>>>>>>>>> Start Evaluation >>>>>>>>>>>>>>>>
Traceback (most recent call last):
File "/home/zgj/SoftGroup-main/train.py", line 220, in
eval_epoch(dataset.val_data_loader, model, model_fn, epoch)
File "/home/zgj/SoftGroup-main/train.py", line 126, in eval_epoch
logger.info("epoch: {}/{}, val loss: {:.4f}, time: {}s".format(epoch, cfg.epochs, am_dict['loss'].avg, time.time() - start_epoch))
KeyError: 'loss'

RuntimeError: There were no tensor arguments to this function

Hello, I downloaded pretrained model for ScanNetV2 from https://drive.google.com/file/d/1XUNRfred9QAEUY__VdmSgZxGQ7peG5ms/view?usp=sharing in README and ran inference.

However, I got the following error.

Traceback (most recent call last):
File "./tools/test.py", line 146, in
main()
File "./tools/test.py", line 102, in main
result = model(batch)
File "/home2/yejin/anaconda3/envs/softgroup/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home2/yejin/anaconda3/envs/softgroup/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 705, in forward
output = self.module(*inputs[0], **kwargs[0])
File "/home2/yejin/anaconda3/envs/softgroup/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home2/yejin/PointCloud_models/SoftGroup/softgroup/model/softgroup.py", line 99, in forward
return self.forward_test(**batch)
File "/home2/yejin/PointCloud_models/SoftGroup/softgroup/util/utils.py", line 171, in wrapper
return func(*new_args, **new_kwargs)
File "/home2/yejin/PointCloud_models/SoftGroup/softgroup/model/softgroup.py", line 246, in forward_test
self.grouping_cfg)
File "/home2/yejin/PointCloud_models/SoftGroup/softgroup/util/fp16.py", line 58, in new_func
output = old_func(*new_args, **new_kwargs)
File "/home2/yejin/PointCloud_models/SoftGroup/softgroup/model/softgroup.py", line 344, in forward_grouping
proposals_idx = torch.cat(proposals_idx_list, dim=0)
RuntimeError: There were no tensor arguments to this function (e.g., you passed an empty list of Tensors), but no fallback function is registered for schema aten::_cat. This usually means that this function requires a non-empty list of Tensors. Available functions are [CPU, CUDA, QuantizedCPU, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradNestedTensor, UNKNOWN_TENSOR_TYPE_ID, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].

My GPU is an A100, and I am using torch 1.8.1.

Also, I modified dist_test.sh line 7 to OMP_NUM_THREADS=1 python -m torch.distributed.launch --use_env --nproc_per_node=$GPUS --master_port=$PORT $(dirname "$0")/test.py $CONFIG $CHECK_POINT --dist ${@:4} since torchrun is supported from torch 1.11.0.

size mismatch for input_conv.0.weight: copying a param with shape torch.Size([3, 3, 3, 6, 32]) from checkpoint, the shape in current model is torch.Size([32, 3, 3, 3, 6]).

Hi, I am using your previous code and have downloaded your pretrained model, but when I run inference, it throws the following error:
Traceback (most recent call last):
File "test_s3dis.py", line 334, in
use_cuda, cfg.test_epoch, dist=False, f=cfg.pretrain)
File "/media/ketizu/086C03DE086C03DE/SoftGroup_main1_1/SoftGroup_main1/util/utils.py", line 87, in checkpoint_restore
model.load_state_dict(net_checkpoint)
File "/home/ketizu/anaconda3/envs/boge2/lib/python3.6/site-packages/torch/nn/modules/module.py", line 845, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for SoftGroup:
size mismatch for input_conv.0.weight: copying a param with shape torch.Size([3, 3, 3, 6, 32]) from checkpoint, the shape in current model is torch.Size([32, 3, 3, 3, 6]).
THANK YOU VERY MUCH
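
The two shapes in the error message contain the same axes in a different order. This suggests, as an assumption not confirmed in this thread, that the checkpoint and the model code expect different convolution weight layouts, for example because they target different spconv versions. The sketch below only shows which axis order maps one shape onto the other; the helper name is hypothetical.

```python
from typing import Sequence, Tuple


def permute_shape(shape: Sequence[int], order: Sequence[int]) -> Tuple[int, ...]:
    """Return the shape obtained by reordering axes, like tensor.permute(*order)."""
    if sorted(order) != list(range(len(shape))):
        raise ValueError("order must be a permutation of the axes")
    return tuple(shape[axis] for axis in order)


ckpt_shape = (3, 3, 3, 6, 32)  # shape stored in the checkpoint
order = (4, 0, 1, 2, 3)        # move the last axis to the front

print(permute_shape(ckpt_shape, order))  # (32, 3, 3, 3, 6)
```

On a real PyTorch tensor the same reordering would be weight.permute(4, 0, 1, 2, 3).contiguous(). Whether permuting the weights is sufficient, or whether a matching checkpoint and code version should be used instead, depends on the versions involved.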

Incomplete categories after visualisation

Hi, when running visualize_open3d.py, is the value passed to --prediction_path /exp/scannetv2/softgroup/softgroup_default_scannet/result?

Is this the result of scene0016_00?

BFS group or Hierarchical grouping

Hello, excuse me.
I have read the previous version of the SoftGroup code, which uses the hierarchical aggregation algorithm, but the latest version uses BFS clustering, right? Does the choice of grouping algorithm have any influence on performance?
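
For readers unfamiliar with the term, BFS grouping can be sketched as below: starting from a seed point, neighbors within a radius are added repeatedly until the cluster stops growing. This is a brute-force illustration, not the repository's implementation; it ignores semantic scores and offset-shifted coordinates, and the function name and radius value are assumptions.

```python
from collections import deque
from typing import List, Sequence


def bfs_group(points: List[Sequence[float]], radius: float) -> List[List[int]]:
    """Group point indices into clusters linked by distances of at most radius."""
    r2 = radius * radius
    visited = [False] * len(points)
    clusters = []
    for seed in range(len(points)):
        if visited[seed]:
            continue
        visited[seed] = True
        queue, cluster = deque([seed]), []
        while queue:
            i = queue.popleft()
            cluster.append(i)
            # Brute-force neighbor search over all unvisited points.
            for j in range(len(points)):
                if visited[j]:
                    continue
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                if d2 <= r2:
                    visited[j] = True
                    queue.append(j)
        clusters.append(cluster)
    return clusters


pts = [(0.0, 0.0, 0.0), (0.03, 0.0, 0.0), (0.06, 0.0, 0.0), (1.0, 1.0, 1.0)]
print(bfs_group(pts, radius=0.04))  # [[0, 1, 2], [3]]
```

Points 0 and 2 are farther apart than the radius but end up in the same cluster because point 1 links them, which is the chaining behavior that distinguishes BFS grouping from a single fixed-radius query.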

Semantic segmentation

Can we use this repo for semantic segmentation only? Also, can this repo be used without semantic_id and instance_id in the annotations? I have created a custom dataset like S3DIS, but only for semantic segmentation, and I see that the annotations do not have semantic_id and instance_id.

about S3DIS test

Hi! Thanks for the great work.
I'm running the previous version of the program, and I can run the test successfully by following readme.md. First, I generated the preprocessed data by running bash prepare_data.sh. Second, I used your released pretrained model (softgroup_s3dis.pth) for testing, with Area2 as the test data, and ran "python test_s3dis.py --config/softgroup_fold5_default_s3dis.yaml",
which generated test results including semantic_pred and instance_pred. Finally, I used the script "visualization.py" for visualization.
However, the inference result is much worse than the ground truth, and I don't know what went wrong. I hope you can help.

Is it possible to train SoftGroup on our own dataset?

I have a point cloud dataset. I labeled it using CloudCompare and added two Scalar Field values: the first is the semantic class and the second is the instance class.
Could I train SoftGroup on my dataset?

Custom dataset training ?

Could you please suggest any resources for training the model with custom data such as phones, laptops, bottles, etc.? Thank you in advance.

pretrained model

Hi, could you please upload the pretrained S3DIS model to Baidu CloudDisk? I cannot download it from Google Drive. Thank you very much.

How to set the epoch and prepare_epoch

Hello, excuse me. How should I set the epoch value and the prepare_epoch value if I want to train from scratch? Have you found values that give good performance?

About training

Thank you for your great work.
Could you please describe how to train on the train set plus the val set? I see that there is a gap between the val result and the test result.

Is there any problem when using train_loader of scannetV2?

Hi!
I'm working with your code to train on ScanNetV2 for 3D object detection.
First, I want to test the code on a small subset of the dataset
(e.g., 2 train scenes, 1 val scene, and 1 test scene from ScanNetV2).

So I modified train.txt, val.txt, and test.txt to list these 4 scenes.
My dataset structure is shown below.

Fortunately, train.py starts when I run it.

However, it then fails with the error below.
The error says that there is no 'loss' key in am_dict.
I printed am_dict, and it is empty.

I also tried to iterate over the train_loader of this dataset by adding code like "for i, batch in enumerate(train_loader): print (i)".
However, this print statement produces no output. (Does that mean the train_loader is empty?)
Could you please help?
Thk 👍

/content/drive/MyDrive/softgroup/SoftGroup/util/config.py:22: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
config = yaml.load(f)
[2022-03-21 15:47:04,679 INFO log.py line 39 7139] ************************ Start Logging ************************
[2022-03-21 15:47:04,751 INFO train.py line 22 7139] Namespace(TEST_NMS_THRESH=0.3, TEST_NPOINT_THRESH=100, TEST_SCORE_THRESH=-1, batch_size=4, bg_thresh=0.0, block_reps=2, block_residual=True, class_numpoint_mean=[-1.0, -1.0, 3917.0, 12056.0, 2303.0, 8331.0, 3948.0, 3166.0, 5629.0, 11719.0, 1003.0, 3317.0, 4912.0, 10221.0, 3889.0, 4136.0, 2120.0, 945.0, 3967.0, 2589.0], classes=18, cluster_shift_meanActive=300, config='config/softgroup_default_scannet.yaml', data_root='dataset', dataset='scannetv2', dataset_dir='data/scannetv2_inst.py', dist=False, epochs=500, eval=True, exp_path='exp/scannetv2/softgroup/softgroup_default_scannet', fg_thresh=1.0, filename_suffix='_inst_nostuff.pth', fix_module=['input_conv', 'unet', 'output_layer', 'semantic_linear', 'offset_linear'], full_scale=[128, 512], ignore_label=-100, input_channel=3, iou_thr=0.5, local_rank=0, loss_weight=[1.0, 1.0, 1.0, 1.0, 1.0], lr=0.001, manual_seed=123, max_npoint=250000, max_proposal_num=200, mode=4, model_dir='model/softgroup/softgroup.py', model_name='softgroup', momentum=0.9, multiplier=0.5, optim='Adam', point_aggr_radius=0.04, prepare_epochs=-1, pretrain=None, pretrain_module=['input_conv', 'unet', 'output_layer', 'semantic_linear', 'offset_linear', 'intra_ins_unet', 'intra_ins_outputlayer'], pretrain_path='hais_ckpt.pth', save_dir='exp', save_freq=16, save_instance=False, save_pt_offsets=False, save_semantic=False, scale=50, score_fullscale=20, score_mode=4, score_scale=50, score_thr=0.2, semantic_classes=20, semantic_only=False, split='val', step_epoch=200, task='train', test_epoch=500, test_mask_score_thre=-0.5, test_seed=567, test_workers=16, train_workers=4, use_coords=True, using_NMS=False, weight_decay=0.0001, width=32)
[2022-03-21 15:47:04,757 INFO train.py line 153 7139] => creating model ...
Load pretrained input_conv: 1/1
Load pretrained unet: 390/390
Load pretrained output_layer: 5/5
Load pretrained semantic_linear: 9/9
Load pretrained offset_linear: 9/9
Load pretrained intra_ins_unet: 85/85
Load pretrained intra_ins_outputlayer: 5/5
[2022-03-21 15:47:09,078 INFO train.py line 164 7139] cuda available: True
[2022-03-21 15:47:09,130 INFO train.py line 168 7139] #classifier parameters: 30839600
[2022-03-21 15:47:09,311 INFO scannetv2_inst.py line 50 7139] Training samples: 2
[2022-03-21 15:47:09,375 INFO scannetv2_inst.py line 84 7139] Validation samples: 1
Traceback (most recent call last):
File "train.py", line 221, in
train_epoch(dataset.train_data_loader, model, model_fn, optimizer, epoch)
File "train.py", line 98, in train_epoch
logger.info("epoch: {}/{}, train loss: {:.4f}, time: {}s".format(epoch, cfg.epochs, am_dict['loss'].avg, time.time() - start_epoch))
KeyError: 'loss'
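
One possible explanation, offered as an assumption rather than a confirmed diagnosis, is that the loader drops incomplete batches. The log above reports 2 training samples and batch_size=4, in which case such a loader would yield no batches at all, am_dict would never be filled, and the lookup of 'loss' would fail. The arithmetic can be sketched as follows; the function name is hypothetical.

```python
def num_batches(num_samples: int, batch_size: int, drop_last: bool) -> int:
    """Number of batches a loader yields for a dataset of the given size."""
    if drop_last:
        return num_samples // batch_size      # incomplete final batch is discarded
    return -(-num_samples // batch_size)      # ceiling division keeps it


# Values from the log: 2 training samples, batch_size=4.
print(num_batches(2, 4, drop_last=True))   # 0
print(num_batches(2, 4, drop_last=False))  # 1
```

If this is the cause, using a batch size no larger than the number of training scenes would give the loop at least one batch.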

For testing, is the input format a txt file with xyzrgb? I made my own dataset in the S3DIS format. Is that right? Thanks.

Fails when training from scratch

Hi, I tried to train the model from scratch without initializing from hais_ckpt. But it failed due to empty proposals_idx in softgroup_ops.hierarchical_aggregation. Is it possible to train from scratch?

AP = 0

My own dataset has only one class, "box". Could you please help me check what is wrong?

[2022-04-07 18:55:30,071 INFO test_s3dis.py line 312 29467] => creating model ...
[2022-04-07 18:55:30,071 INFO test_s3dis.py line 313 29467] Classes: 1
[2022-04-07 18:55:30,439 INFO test_s3dis.py line 325 29467] cuda available: True
[2022-04-07 18:55:32,541 INFO test_s3dis.py line 329 29467] #classifier parameters (model): 30837290
[2022-04-07 18:55:32,611 INFO utils.py line 67 29467] Restore from ./exp/s3dis/softgroup/softgroup_fold5_backbone_s3dis/softgroup_fold5_backbone_s3dis-000000030.pth
[2022-04-07 18:55:32,827 INFO test_s3dis.py line 37 29467] >>>>>>>>>>>>>>>> Start Evaluation >>>>>>>>>>>>>>>>
[2022-04-07 18:55:32,905 INFO log.py line 39 29467] ************************ Start Logging ************************
[2022-04-07 18:55:32,906 INFO s3dis_inst.py line 92 29467] Testing samples (val): 3
[2022-04-07 18:55:44,316 INFO test_s3dis.py line 233 29467] instance iter: 1/3 point_num: 836374 ncluster: 1 inference time: 9.73s
[2022-04-07 18:55:53,497 INFO test_s3dis.py line 233 29467] instance iter: 2/3 point_num: 810770 ncluster: 1 inference time: 7.65s
[2022-04-07 18:56:00,431 INFO test_s3dis.py line 233 29467] instance iter: 3/3 point_num: 646888 ncluster: 1 inference time: 5.57s
[2022-04-07 18:56:00,435 INFO eval_s3dis.py line 289 29467]
[2022-04-07 18:56:00,435 INFO eval_s3dis.py line 290 29467] ################################################################
[2022-04-07 18:56:00,435 INFO eval_s3dis.py line 296 29467] what : AP AP_50% AP_25%
[2022-04-07 18:56:00,435 INFO eval_s3dis.py line 297 29467] ################################################################
[2022-04-07 18:56:00,435 INFO eval_s3dis.py line 307 29467] box : 0.000 0.000 0.000
[2022-04-07 18:56:00,435 INFO eval_s3dis.py line 313 29467] ----------------------------------------------------------------
[2022-04-07 18:56:00,436 INFO eval_s3dis.py line 318 29467] average : 0.000 0.000 0.000
[2022-04-07 18:56:00,436 INFO eval_s3dis.py line 319 29467]
[2022-04-07 18:56:00,436 INFO test_s3dis.py line 245 29467] whole set inference time: 22.94s, latency per frame: 7646.76ms
[2022-04-07 18:56:00,442 INFO test_s3dis.py line 250 29467] semantic_segmantation_accuracy: 1.0000
[2022-04-07 18:56:00,451 INFO test_s3dis.py line 252 29467] semantic_segmantation_mIoU: 1.0000

spconv version?

Hi, I am very interested in your paper. Which version of spconv did your team use?

Error when running the code "python setup.py bdist_wheel"

python setup.py bdist_wheel
running bdist_wheel
running build
running build_py
running build_ext
/home/shrijan_pf/SoftGroup/lib/spconv/build/lib.linux-x86_64-3.7
Release
-- Caffe2: CUDA detected: 11.1
-- Caffe2: CUDA nvcc is: /usr/local/cuda-11.1/bin/nvcc
-- Caffe2: CUDA toolkit directory: /usr/local/cuda-11.1
-- Caffe2: Header version is: 11.1
CMake Warning (dev) at /home/shrijan_pf/anaconda3/envs/softgroup/lib/python3.7/site-packages/cmake/data/share/cmake-3.22/Modules/FindPackageHandleStandardArgs.cmake:438 (message):
The package name passed to find_package_handle_standard_args (CUDNN) does
not match the name of the calling package (Caffe2). This can lead to
problems in calling code that expects find_package result variables
(e.g., _FOUND) to follow a certain pattern.
Call Stack (most recent call first):
/home/shrijan_pf/anaconda3/envs/softgroup/lib/python3.7/site-packages/torch/share/cmake/Caffe2/public/cuda.cmake:107 (find_package_handle_standard_args)
/home/shrijan_pf/anaconda3/envs/softgroup/lib/python3.7/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:88 (include)
/home/shrijan_pf/anaconda3/envs/softgroup/lib/python3.7/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:40 (find_package)
CMakeLists.txt:23 (find_package)
This warning is for project developers. Use -Wno-dev to suppress it.

-- Found cuDNN: v? (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/libcudnn.so)
CMake Error at /home/shrijan_pf/anaconda3/envs/softgroup/lib/python3.7/site-packages/torch/share/cmake/Caffe2/public/cuda.cmake:159 (message):
PyTorch requires cuDNN 7 and above.
Call Stack (most recent call first):
/home/shrijan_pf/anaconda3/envs/softgroup/lib/python3.7/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:88 (include)
/home/shrijan_pf/anaconda3/envs/softgroup/lib/python3.7/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:40 (find_package)
CMakeLists.txt:23 (find_package)

-- Configuring incomplete, errors occurred!
See also "/home/shrijan_pf/SoftGroup/lib/spconv/build/temp.linux-x86_64-3.7/CMakeFiles/CMakeOutput.log".
See also "/home/shrijan_pf/SoftGroup/lib/spconv/build/temp.linux-x86_64-3.7/CMakeFiles/CMakeError.log".
Traceback (most recent call last):
File "setup.py", line 86, in
zip_safe=False,
File "/home/shrijan_pf/anaconda3/envs/softgroup/lib/python3.7/site-packages/setuptools/__init__.py", line 153, in setup
return distutils.core.setup(**attrs)
File "/home/shrijan_pf/anaconda3/envs/softgroup/lib/python3.7/distutils/core.py", line 148, in setup
dist.run_commands()
File "/home/shrijan_pf/anaconda3/envs/softgroup/lib/python3.7/distutils/dist.py", line 966, in run_commands
self.run_command(cmd)
File "/home/shrijan_pf/anaconda3/envs/softgroup/lib/python3.7/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/home/shrijan_pf/anaconda3/envs/softgroup/lib/python3.7/site-packages/wheel/bdist_wheel.py", line 299, in run
self.run_command('build')
File "/home/shrijan_pf/anaconda3/envs/softgroup/lib/python3.7/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/home/shrijan_pf/anaconda3/envs/softgroup/lib/python3.7/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/home/shrijan_pf/anaconda3/envs/softgroup/lib/python3.7/distutils/command/build.py", line 135, in run
self.run_command(cmd_name)
File "/home/shrijan_pf/anaconda3/envs/softgroup/lib/python3.7/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/home/shrijan_pf/anaconda3/envs/softgroup/lib/python3.7/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "setup.py", line 39, in run
self.build_extension(ext)
File "setup.py", line 69, in build_extension
subprocess.check_call(['cmake', ext.sourcedir] + cmake_args, cwd=self.build_temp, env=env)
File "/home/shrijan_pf/anaconda3/envs/softgroup/lib/python3.7/subprocess.py", line 363, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['cmake', '/home/shrijan_pf/SoftGroup/lib/spconv', '-DCMAKE_LIBRARY_OUTPUT_DIRECTORY=/home/shrijan_pf/SoftGroup/lib/spconv/build/lib.linux-x86_64-3.7/spconv', '-DCMAKE_PREFIX_PATH=/home/shrijan_pf/anaconda3/envs/softgroup/lib/python3.7/site-packages/torch', '-DPYBIND11_PYTHON_VERSION=3.7', '-DSPCONV_BuildTests=OFF', '-DCMAKE_CUDA_FLAGS="--expt-relaxed-constexpr"', '-DCMAKE_BUILD_TYPE=Release']' returned non-zero exit status 1.
(softgroup) shrijan_pf@BQ-DX1100-CT2:~/SoftGroup/lib/spconv$

custom dataset

Hello, this is a screenshot of my dataset's ground truth. My scenes contain neatly stacked boxes, and I want to segment them one by one, but when I test, the label always has an AP of 0. I then tried to visualize the result, shown in the following screenshot:

It seems to predict the whole surface as one box. Do you think my task is suitable for your method? By the way, my training set has 35 samples. Is the problem that there are too few samples? How many samples do you think would be suitable? Thank you very much!

TypeError: transform_test() missing 2 required positional arguments: 'semantic_label' and 'instance_label'

When I run test.py on scannet test set, I got this error. There should be no labels for test set, but the dataloader asks for labels still.

Does the code work using Windows?

Hello, first of all, congratulations on your fantastic work. Currently, I am working on Windows, and my understanding is that you are going to update the code to make it compatible with all operating systems.
Moreover, I also saw that you have already updated model/softgroup/softgroup.py. So, does the code work using Windows? If so, is it required to do the 5th step following the installation guide?

while executing "./tools/dist_train.sh configs/softgroup_s3dis_backbone_fold5.yaml 1"

Hi, I changed the number of GPUs to 1, as I have only one GPU, and then started finetuning the pretrained HAIS point-wise prediction network (backbone) on S3DIS. With the unchanged config file, I first got the error below. I later changed the batch size under dataloader: train: to 1 and started training again. The following error is for the unchanged softgroup_s3dis_backbone_fold5.yaml.

2022-04-26 16:02:13,337 - INFO - Epoch [1/20][120/1020] lr: 0.004, eta: 9:33:41, mem: 5561, data_time: 0.14, iter_time: 0.84, semantic_loss: 0.5760, offset_loss: 1.3002, loss: 1.8762
Traceback (most recent call last):
File "./tools/train.py", line 185, in
main()
File "./tools/train.py", line 178, in main
train(epoch, model, optimizer, scaler, train_loader, cfg, logger, writer)
File "./tools/train.py", line 48, in train
loss, log_vars = model(batch, return_loss=True)
File "/home/shrijan_pf/anaconda3/envs/softgroup2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/shrijan_pf/anaconda3/envs/softgroup2/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 963, in forward
output = self.module(*inputs[0], **kwargs[0])
File "/home/shrijan_pf/anaconda3/envs/softgroup2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/shrijan_pf/SoftGroup/softgroup/model/softgroup.py", line 97, in forward
return self.forward_train(**batch)
File "/home/shrijan_pf/SoftGroup/softgroup/util/utils.py", line 171, in wrapper
return func(*new_args, **new_kwargs)
File "/home/shrijan_pf/SoftGroup/softgroup/model/softgroup.py", line 109, in forward_train
semantic_scores, pt_offsets, output_feats = self.forward_backbone(input, v2p_map)
File "/home/shrijan_pf/SoftGroup/softgroup/model/softgroup.py", line 268, in forward_backbone
pt_offsets = self.offset_linear(output_feats)
File "/home/shrijan_pf/anaconda3/envs/softgroup2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/shrijan_pf/anaconda3/envs/softgroup2/lib/python3.7/site-packages/torch/nn/modules/container.py", line 141, in forward
input = module(input)
File "/home/shrijan_pf/anaconda3/envs/softgroup2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/shrijan_pf/anaconda3/envs/softgroup2/lib/python3.7/site-packages/torch/nn/modules/activation.py", line 98, in forward
return F.relu(input, inplace=self.inplace)
File "/home/shrijan_pf/anaconda3/envs/softgroup2/lib/python3.7/site-packages/torch/nn/functional.py", line 1442, in relu
result = torch.relu(input)
RuntimeError: CUDA out of memory. Tried to allocate 106.00 MiB (GPU 0; 7.79 GiB total capacity; 5.49 GiB already allocated; 29.00 MiB free; 6.74 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 30288) of binary: /home/shrijan_pf/anaconda3/envs/softgroup2/bin/python
Traceback (most recent call last):
File "/home/shrijan_pf/anaconda3/envs/softgroup2/bin/torchrun", line 33, in
sys.exit(load_entry_point('torch==1.11.0', 'console_scripts', 'torchrun')())
File "/home/shrijan_pf/anaconda3/envs/softgroup2/lib/python3.7/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
return f(*args, **kwargs)
File "/home/shrijan_pf/anaconda3/envs/softgroup2/lib/python3.7/site-packages/torch/distributed/run.py", line 724, in main
run(args)
File "/home/shrijan_pf/anaconda3/envs/softgroup2/lib/python3.7/site-packages/torch/distributed/run.py", line 718, in run
)(*cmd_args)
File "/home/shrijan_pf/anaconda3/envs/softgroup2/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/shrijan_pf/anaconda3/envs/softgroup2/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 247, in launch_agent
failures=result.failures,
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

./tools/train.py FAILED

Failures:
<NO_OTHER_FAILURES>

Root Cause (first observed failure):
[0]:
time : 2022-04-26_16:02:26
host : BQ-DX1100-CT2
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 30288)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Then I changed the batch size to 1 and started training again. After a few epochs, I got the following error.
2022-04-26 17:14:51,755 - INFO - Epoch [4/20][540/4080] lr: 0.0038, eta: 5:41:24, mem: 3137, data_time: 0.00, iter_time: 0.14, semantic_loss: 0.1476, offset_loss: 0.4536, loss: 0.6012
Traceback (most recent call last):
File "./tools/train.py", line 185, in
main()
File "./tools/train.py", line 178, in main
train(epoch, model, optimizer, scaler, train_loader, cfg, logger, writer)
File "./tools/train.py", line 44, in train
for i, batch in enumerate(train_loader, start=1):
File "/home/shrijan_pf/anaconda3/envs/softgroup2/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 530, in __next__
data = self._next_data()
File "/home/shrijan_pf/anaconda3/envs/softgroup2/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1224, in _next_data
return self._process_data(data)
File "/home/shrijan_pf/anaconda3/envs/softgroup2/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1250, in _process_data
data.reraise()
File "/home/shrijan_pf/anaconda3/envs/softgroup2/lib/python3.7/site-packages/torch/_utils.py", line 457, in reraise
raise exception
AssertionError: Caught AssertionError in DataLoader worker process 1.
Original Traceback (most recent call last):
File "/home/shrijan_pf/anaconda3/envs/softgroup2/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
data = fetcher.fetch(index)
File "/home/shrijan_pf/anaconda3/envs/softgroup2/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
return self.collate_fn(data)
File "/home/shrijan_pf/SoftGroup/softgroup/data/s3dis.py", line 77, in collate_fn
return super().collate_fn(batch)
File "/home/shrijan_pf/SoftGroup/softgroup/data/custom.py", line 216, in collate_fn
assert batch_id > 0, 'empty batch'
AssertionError: empty batch

ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 30691) of binary: /home/shrijan_pf/anaconda3/envs/softgroup2/bin/python
Traceback (most recent call last):
File "/home/shrijan_pf/anaconda3/envs/softgroup2/bin/torchrun", line 33, in <module>
sys.exit(load_entry_point('torch==1.11.0', 'console_scripts', 'torchrun')())
File "/home/shrijan_pf/anaconda3/envs/softgroup2/lib/python3.7/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
return f(*args, **kwargs)
File "/home/shrijan_pf/anaconda3/envs/softgroup2/lib/python3.7/site-packages/torch/distributed/run.py", line 724, in main
run(args)
File "/home/shrijan_pf/anaconda3/envs/softgroup2/lib/python3.7/site-packages/torch/distributed/run.py", line 718, in run
)(*cmd_args)
File "/home/shrijan_pf/anaconda3/envs/softgroup2/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/shrijan_pf/anaconda3/envs/softgroup2/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 247, in launch_agent
failures=result.failures,
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

./tools/train.py FAILED

Failures:
<NO_OTHER_FAILURES>

Root Cause (first observed failure):
[0]:
time : 2022-04-26_17:14:56
host : BQ-DX1100-CT2
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 30691)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
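
One reading of the `empty batch` assertion above, offered as an assumption rather than a description of `softgroup/data/custom.py`: if the dataset returns `None` for a scan it rejects and `collate_fn` skips such entries, then with a batch size of 1 a single rejected scan leaves nothing to collate. The sketch below reproduces that behaviour in plain Python; both function names are hypothetical.

```python
def collate_skipping_rejected(batch):
    """Collect the usable samples of a batch, mimicking the failing check.

    Assumption: the dataset returns None for a sample it rejects.
    """
    kept = [sample for sample in batch if sample is not None]
    assert len(kept) > 0, 'empty batch'
    return kept


def has_usable_sample(batch):
    """Report whether a batch would pass the check above."""
    return any(sample is not None for sample in batch)


# With batch size 1, one rejected scan is enough to empty the batch.
print(has_usable_sample([None]))
# With a larger batch, the remaining scans keep it non-empty.
print(has_usable_sample([None, 'scan_a']))
```

If this reading holds, a larger batch size only makes the failure less likely; finding out which scans are rejected, and why, would be the more direct route.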

python train.py --config config/softgroup_fold5_backbone_s3dis.yaml

Hi, excuse me. When I run python train.py --config config/softgroup_fold5_backbone_s3dis.yaml, it throws the following error. Could you please help me?

Traceback (most recent call last):
File "train.py", line 218, in <module>
train_epoch(dataset.train_data_loader, model, model_fn, optimizer, epoch)
File "train.py", line 48, in train_epoch
for i, batch in enumerate(train_loader):
File "/home/ketizu/anaconda3/envs/boge2/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 819, in __next__
return self._process_data(data)
File "/home/ketizu/anaconda3/envs/boge2/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 846, in _process_data
data.reraise()
File "/home/ketizu/anaconda3/envs/boge2/lib/python3.6/site-packages/torch/_utils.py", line 369, in reraise
raise self.exc_type(msg)
TypeError: __init__() missing 1 required positional argument: 'dtype'

spconv build failed!

I'm using py37, torch 1.10, cuda 10.2, cudnn 8.4

I hit an error when running python setup.py bdist_wheel.

Console printouts:

/SoftGroup/lib/spconv$ python setup.py bdist_wheel
running bdist_wheel
running build
running build_py
creating build
creating build/lib.linux-x86_64-3.7
creating build/lib.linux-x86_64-3.7/spconv
copying spconv/conv.py -> build/lib.linux-x86_64-3.7/spconv
copying spconv/functional.py -> build/lib.linux-x86_64-3.7/spconv
copying spconv/modules.py -> build/lib.linux-x86_64-3.7/spconv
copying spconv/ops.py -> build/lib.linux-x86_64-3.7/spconv
copying spconv/pool.py -> build/lib.linux-x86_64-3.7/spconv
copying spconv/test_utils.py -> build/lib.linux-x86_64-3.7/spconv
copying spconv/__init__.py -> build/lib.linux-x86_64-3.7/spconv
creating build/lib.linux-x86_64-3.7/spconv/utils
copying spconv/utils/__init__.py -> build/lib.linux-x86_64-3.7/spconv/utils
running build_ext
/media/yangtian/SATA3/Workspace/SoftGroup/lib/spconv/build/lib.linux-x86_64-3.7
Release
-- The CXX compiler identification is GNU 7.5.0
-- The CUDA compiler identification is NVIDIA 10.2.89
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc -- works
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Looking for C++ include pthread.h
-- Looking for C++ include pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Found CUDA: /usr/local/cuda (found version "10.2")
-- Caffe2: CUDA detected: 10.2
-- Caffe2: CUDA nvcc is: /usr/local/cuda/bin/nvcc
-- Caffe2: CUDA toolkit directory: /usr/local/cuda
-- Caffe2: Header version is: 10.2
-- Found CUDNN: /usr/local/cuda/lib64/libcudnn.so
-- Found cuDNN: v8.4.0 (include: /usr/local/cuda/include, library: /usr/local/cuda/lib64/libcudnn.so)
-- /usr/local/cuda/lib64/libnvrtc.so shorthash is 08c4863f
-- Autodetected CUDA architecture(s): 7.5+PTX
-- Added CUDA NVCC flags for: -gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_75,code=compute_75
CMake Warning at /media/yangtian/SATA3/PyEnvs/ubuntu/softgroup-env/lib/python3.7/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message):
static library kineto_LIBRARY-NOTFOUND not found.
Call Stack (most recent call first):
/media/yangtian/SATA3/PyEnvs/ubuntu/softgroup-env/lib/python3.7/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:127 (append_torchlib_if_found)
CMakeLists.txt:23 (find_package)

-- Found Torch: /media/yangtian/SATA3/PyEnvs/ubuntu/softgroup-env/lib/python3.7/site-packages/torch/lib/libtorch.so
-- Found PythonInterp: /media/yangtian/SATA3/PyEnvs/ubuntu/softgroup-env/bin/python3.7 (found suitable version "3.7.5", minimum required is "3.7")
-- Found PythonLibs: /usr/lib/x86_64-linux-gnu/libpython3.7m.so
-- Performing Test HAS_CPP14_FLAG
-- Performing Test HAS_CPP14_FLAG - Success
-- pybind11 v2.3.dev0
-- Configuring done
-- Generating done
-- Build files have been written to: /media/yangtian/SATA3/Workspace/SoftGroup/lib/spconv/build/temp.linux-x86_64-3.7
Scanning dependencies of target spconv
Scanning dependencies of target spconv_nms
[ 7%] Building CUDA object src/utils/CMakeFiles/spconv_nms.dir/nms.cu.o
[ 14%] Building CXX object src/spconv/CMakeFiles/spconv.dir/all.cc.o
[ 21%] Building CXX object src/spconv/CMakeFiles/spconv.dir/indice.cc.o
[ 28%] Building CUDA object src/spconv/CMakeFiles/spconv.dir/indice.cu.o
[ 35%] Linking CUDA device code CMakeFiles/spconv_nms.dir/cmake_device_link.o
[ 42%] Linking CUDA shared library ../../../lib.linux-x86_64-3.7/spconv/libspconv_nms.so
[ 42%] Built target spconv_nms
[ 50%] Building CXX object src/spconv/CMakeFiles/spconv.dir/reordering.cc.o
/media/yangtian/SATA3/Workspace/SoftGroup/lib/spconv/src/spconv/all.cc:20:91: error: no matching function for call to ‘torch::jit::RegisterOperators::RegisterOperators(const char [28], <unresolved overloaded function type>)’
torch::jit::RegisterOperators("spconv::get_indice_pairs_2d", &spconv::getIndicePair<2>)
^
In file included from /media/yangtian/SATA3/PyEnvs/ubuntu/softgroup-env/lib/python3.7/site-packages/torch/include/torch/script.h:7:0,
from /media/yangtian/SATA3/Workspace/SoftGroup/lib/spconv/include/spconv/pool_ops.h:20,
from /media/yangtian/SATA3/Workspace/SoftGroup/lib/spconv/src/spconv/all.cc:16:
/media/yangtian/SATA3/PyEnvs/ubuntu/softgroup-env/lib/python3.7/site-packages/torch/include/torch/csrc/jit/runtime/custom_operator.h:22:12: note: candidate: torch::jit::RegisterOperators::RegisterOperators(std::vector<c10::optional<torch::jit::Operator> >)
explicit RegisterOperators(std::vector<c10::optional<Operator>> operators) {
^~~~~~~~~~~~~~~~~
/media/yangtian/SATA3/PyEnvs/ubuntu/softgroup-env/lib/python3.7/site-packages/torch/include/torch/csrc/jit/runtime/custom_operator.h:22:12: note: candidate expects 1 argument, 2 provided
/media/yangtian/SATA3/PyEnvs/ubuntu/softgroup-env/lib/python3.7/site-packages/torch/include/torch/csrc/jit/runtime/custom_operator.h:17:3: note: candidate: constexpr torch::jit::RegisterOperators::RegisterOperators()
RegisterOperators() = default;
^~~~~~~~~~~~~~~~~
/media/yangtian/SATA3/PyEnvs/ubuntu/softgroup-env/lib/python3.7/site-packages/torch/include/torch/csrc/jit/runtime/custom_operator.h:17:3: note: candidate expects 0 arguments, 2 provided
/media/yangtian/SATA3/PyEnvs/ubuntu/softgroup-env/lib/python3.7/site-packages/torch/include/torch/csrc/jit/runtime/custom_operator.h:16:18: note: candidate: constexpr torch::jit::RegisterOperators::RegisterOperators(const torch::jit::RegisterOperators&)
struct TORCH_API RegisterOperators {
^~~~~~~~~~~~~~~~~
/media/yangtian/SATA3/PyEnvs/ubuntu/softgroup-env/lib/python3.7/site-packages/torch/include/torch/csrc/jit/runtime/custom_operator.h:16:18: note: candidate expects 1 argument, 2 provided
/media/yangtian/SATA3/PyEnvs/ubuntu/softgroup-env/lib/python3.7/site-packages/torch/include/torch/csrc/jit/runtime/custom_operator.h:16:18: note: candidate: constexpr torch::jit::RegisterOperators::RegisterOperators(torch::jit::RegisterOperators&&)
/media/yangtian/SATA3/PyEnvs/ubuntu/softgroup-env/lib/python3.7/site-packages/torch/include/torch/csrc/jit/runtime/custom_operator.h:16:18: note: candidate expects 1 argument, 2 provided
src/spconv/CMakeFiles/spconv.dir/build.make:62: recipe for target 'src/spconv/CMakeFiles/spconv.dir/all.cc.o' failed
make[2]: *** [src/spconv/CMakeFiles/spconv.dir/all.cc.o] Error 1
make[2]: *** Waiting for unfinished jobs....
Scanning dependencies of target spconv_utils
[ 57%] Building CXX object src/utils/CMakeFiles/spconv_utils.dir/all.cc.o
CMakeFiles/Makefile2:108: recipe for target 'src/spconv/CMakeFiles/spconv.dir/all' failed
make[1]: *** [src/spconv/CMakeFiles/spconv.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
[ 64%] Linking CXX shared library ../../../lib.linux-x86_64-3.7/spconv/spconv_utils.cpython-37m-x86_64-linux-gnu.so
[ 64%] Built target spconv_utils
Makefile:129: recipe for target 'all' failed
make: *** [all] Error 2
Traceback (most recent call last):
File "setup.py", line 86, in <module>
zip_safe=False,
File "/media/yangtian/SATA3/PyEnvs/ubuntu/softgroup-env/lib/python3.7/site-packages/setuptools/__init__.py", line 153, in setup
return distutils.core.setup(**attrs)
File "/usr/lib/python3.7/distutils/core.py", line 148, in setup
dist.run_commands()
File "/usr/lib/python3.7/distutils/dist.py", line 966, in run_commands
self.run_command(cmd)
File "/usr/lib/python3.7/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/media/yangtian/SATA3/PyEnvs/ubuntu/softgroup-env/lib/python3.7/site-packages/wheel/bdist_wheel.py", line 299, in run
self.run_command('build')
File "/usr/lib/python3.7/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/usr/lib/python3.7/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/usr/lib/python3.7/distutils/command/build.py", line 135, in run
self.run_command(cmd_name)
File "/usr/lib/python3.7/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/usr/lib/python3.7/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "setup.py", line 39, in run
self.build_extension(ext)
File "setup.py", line 70, in build_extension
subprocess.check_call(['cmake', '--build', '.'] + build_args, cwd=self.build_temp)
File "/usr/lib/python3.7/subprocess.py", line 363, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['cmake', '--build', '.', '--config', 'Release', '--', '-j4']' returned non-zero exit status 2.

CMake Error Log:

Compiling the CUDA compiler identification source file "CMakeCUDACompilerId.cu" failed.
Compiler: /usr/local/cuda/bin/nvcc
Build flags: "--expt-relaxed-constexpr"
Id flags: -v;--keep;--keep-dir;tmp

The output was:
1
#$ NVVM_BRANCH=nvvm
#$ SPACE=
#$ CUDART=cudart
#$ HERE=/usr/local/cuda/bin
#$ THERE=/usr/local/cuda/bin
#$ TARGET_SIZE=
#$ TARGET_DIR=
#$ TARGET_DIR=targets/x86_64-linux
#$ TOP=/usr/local/cuda/bin/..
#$ NVVMIR_LIBRARY_DIR=/usr/local/cuda/bin/../nvvm/libdevice
#$ LD_LIBRARY_PATH=/usr/local/cuda/bin/../lib:/usr/local/cuda/lib64:/usr/local/cuda/bin:/home/yangtian/anaconda3/condabin:/home/yangtian/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/home/yangtian/anaconda3/bin
#$ PATH=/usr/local/cuda/bin/../nvvm/bin:/usr/local/cuda/bin:/media/yangtian/SATA3/PyEnvs/ubuntu/softgroup-env/bin:/usr/local/cuda/bin:/home/yangtian/anaconda3/condabin:/home/yangtian/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/home/yangtian/anaconda3/bin
#$ INCLUDES="-I/usr/local/cuda/bin/../targets/x86_64-linux/include"
#$ LIBRARIES= "-L/usr/local/cuda/bin/../targets/x86_64-linux/lib/stubs" "-L/usr/local/cuda/bin/../targets/x86_64-linux/lib"
#$ CUDAFE_FLAGS=
#$ PTXAS_FLAGS=
nvcc fatal : Don't know what to do with '"--expt-relaxed-constexpr"'

Determining if the pthread_create exist failed with the following output:
Change Dir: /media/yangtian/SATA3/Workspace/SoftGroup/lib/spconv/build/temp.linux-x86_64-3.7/CMakeFiles/CMakeTmp

Run Build Command:"/usr/bin/make" "cmTC_de591/fast"
/usr/bin/make -f CMakeFiles/cmTC_de591.dir/build.make CMakeFiles/cmTC_de591.dir/build
make[1]: Entering directory '/media/yangtian/SATA3/Workspace/SoftGroup/lib/spconv/build/temp.linux-x86_64-3.7/CMakeFiles/CMakeTmp'
Building CXX object CMakeFiles/cmTC_de591.dir/CheckSymbolExists.cxx.o
/usr/bin/c++ -DVERSION_INFO="1.0" -o CMakeFiles/cmTC_de591.dir/CheckSymbolExists.cxx.o -c /media/yangtian/SATA3/Workspace/SoftGroup/lib/spconv/build/temp.linux-x86_64-3.7/CMakeFiles/CMakeTmp/CheckSymbolExists.cxx
Linking CXX executable cmTC_de591
/usr/local/cmake-3.13.5-Linux-x86_64/bin/cmake -E cmake_link_script CMakeFiles/cmTC_de591.dir/link.txt --verbose=1
/usr/bin/c++ -DVERSION_INFO="1.0" CMakeFiles/cmTC_de591.dir/CheckSymbolExists.cxx.o -o cmTC_de591
CMakeFiles/cmTC_de591.dir/CheckSymbolExists.cxx.o: In function `main':
CheckSymbolExists.cxx:(.text+0x1b): undefined reference to `pthread_create'
collect2: error: ld returned 1 exit status
CMakeFiles/cmTC_de591.dir/build.make:86: recipe for target 'cmTC_de591' failed
make[1]: *** [cmTC_de591] Error 1
make[1]: Leaving directory '/media/yangtian/SATA3/Workspace/SoftGroup/lib/spconv/build/temp.linux-x86_64-3.7/CMakeFiles/CMakeTmp'
Makefile:121: recipe for target 'cmTC_de591/fast' failed
make: *** [cmTC_de591/fast] Error 2

RuntimeError: CUDA out of memory.

I'm running the latest version of the program. I can train successfully with the softgroup_s3dis_backbone_fold5.yaml file. When training with softgroup_s3dis_fold5.yaml, the first epoch finishes and validation starts, but once the progress bar reaches 11% it fails with RuntimeError: CUDA out of memory. I tried reducing the learning rate to 0.001, but that did not help. I only have a 3080 with 10 GB of video memory. Is there any way to continue the training?

duplicated

Error when using pytorch11.0

I get some errors when compiling the code using pytorch11.0+cuda11.4.
Could you provide a version that supports cuda > 11? Thanks.

S3D

RuntimeError: expected a non-empty list of Tensors

Hello, when I run python train.py --config config/softgroup_fold5_default_s3dis.yaml, it throws the following error. Could you please tell me how to deal with it?

Traceback (most recent call last):
File "test_s3dis.py", line 338, in <module>
test(model, model_fn, data_name, cfg.test_epoch)
File "test_s3dis.py", line 66, in test
preds = model_fn(batch, model, epoch)
File "/home/vipuser/PycharmProjects/pythonProject/SoftGroup_main1/model/softgroup/softgroup.py", line 482, in test_model_fn
ret = model(input_, p2v_map, coords_float, batch_idxs, batch_offsets, epoch, 'test', split=True, semantic_only=semantic_only)
File "/home/vipuser/miniconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/home/vipuser/PycharmProjects/pythonProject/SoftGroup_main1/model/softgroup/softgroup.py", line 385, in forward
proposals_idx = torch.cat(proposals_idx_list, dim=0)
RuntimeError: expected a non-empty list of Tensors

ValueError: invalid literal for int() with base 10: '106.000000'

bash prepare_data.sh
Processing: Area_1
prepare_data_inst.py:165: DtypeWarning: Columns (5) have mixed types.Specify dtype option on import or set low_memory=False.
room_label) = read_s3dis_format(area_id, room_name, data_root)
Traceback (most recent call last):
File "prepare_data_inst.py", line 165, in <module>
room_label) = read_s3dis_format(area_id, room_name, data_root)
File "prepare_data_inst.py", line 67, in read_s3dis_format
rgb = np.ascontiguousarray(room_ver[:, 3:6], dtype='uint8')
ValueError: invalid literal for int() with base 10: '106.000000'
processing: random sample
1/41 Area_1_conferenceRoom_1
2/41 Area_1_conferenceRoom_2
3/41 Area_1_copyRoom_1
4/41 Area_1_hallway_1
5/41 Area_1_hallway_2
6/41 Area_1_hallway_4
7/41 Area_1_hallway_5
8/41 Area_1_hallway_6
9/41 Area_1_hallway_7
10/41 Area_1_hallway_8
11/41 Area_1_office_10
12/41 Area_1_office_11
13/41 Area_1_office_12
14/41 Area_1_office_13
15/41 Area_1_office_14
16/41 Area_1_office_15
17/41 Area_1_office_16
18/41 Area_1_office_17
19/41 Area_1_office_18
20/41 Area_1_office_19
21/41 Area_1_office_1
22/41 Area_1_office_20
23/41 Area_1_office_22
24/41 Area_1_office_23
25/41 Area_1_office_24
26/41 Area_1_office_25
27/41 Area_1_office_26
28/41 Area_1_office_27
29/41 Area_1_office_28
30/41 Area_1_office_29
31/41 Area_1_office_2
32/41 Area_1_office_30
33/41 Area_1_office_31
34/41 Area_1_office_3
35/41 Area_1_office_4
36/41 Area_1_office_5
37/41 Area_1_office_6
38/41 Area_1_office_7
39/41 Area_1_office_8
40/41 Area_1_office_9
41/41 Area_1_pantry_1

I got an error when running bash prepare_data.sh. Also, only Area_1 was processed. Can you please help me with this issue? How can I process all areas? Thank you.
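
The `ValueError` above is what NumPy raises when an object array holding text such as `'106.000000'` is cast straight to an integer dtype, and the `DtypeWarning` about mixed types in column 5 suggests that is how the room file was read. A minimal sketch of one possible workaround, assuming the colour columns contain only numeric text: cast to float first, then to `uint8`. The helper `to_uint8_rgb` and the sample rows are hypothetical and not part of `prepare_data_inst.py`.

```python
import numpy as np


def to_uint8_rgb(room_ver):
    """Return columns 3..5 of `room_ver` as a contiguous uint8 array.

    Going through float64 lets text such as '106.000000' parse, which a
    direct cast to uint8 does not.
    """
    rgb = np.asarray(room_ver[:, 3:6], dtype=np.float64)
    return np.ascontiguousarray(rgb.astype(np.uint8))


# Hypothetical rows mimicking a mixed-type read: x, y, z, r, g, b, label.
room_ver = np.array(
    [[0.1, 0.2, 0.3, 106, '106.000000', 98, 0],
     [0.4, 0.5, 0.6, '12.000000', 255, 0, 1]],
    dtype=object,
)
rgb = to_uint8_rgb(room_ver)
print(rgb.dtype, rgb.shape)
```

If a colour column holds text that is not numeric at all, the float cast fails as well, and the offending line in the raw file has to be found and repaired instead.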

TypeError: 'NoneType' object is not callable

I'm running the latest version of the code. While training on the S3DIS dataset, the following problem occurred at the end of the first epoch. I would appreciate any help. Thank you.

2022-04-18 22:25:18,343 - INFO - Epoch [1/20][1020/1020] lr: 0.004, eta: 7:57:16, mem: 3470, data_time: 0.65, iter_time: 0.94, semantic_loss: 0.4898, offset_loss: 0.7772, loss: 1.2671
2022-04-18 22:25:24,027 - INFO - Epoch [1/20][1020/1020] lr: 0.004, eta: 7:56:36, mem: 3470, data_time: 0.00, iter_time: 0.19, semantic_loss: 0.3694, offset_loss: 0.6032, loss: 0.9726
2022-04-18 22:25:25,085 - INFO - Validation
0%| | 0/68 [00:00<?, ?it/s]Traceback (most recent call last):
File "/home/zgj/SoftGroup-main/tools/train.py", line 94, in validate
result = model(batch)
TypeError: 'NoneType' object is not callable
0%| | 0/68 [00:01<?, ?it/s]

Dataloader worker is killed

Hi, I encountered a suspicious problem after running a few hundred training epochs:

RuntimeError: DataLoader worker (pid 75890) is killed by signal: Killed.

The dataloader worker seemed to be stuck. Have you met the same problem, and how can I solve it?

Got size error while loading checkpoints

I'm trying out spconv2, but got the following error when loading the provided pretrained model. spconv2 was installed with pip install spconv2-cu102.
Envs: ubuntu 18, pytorch 1.10, cuda 10.2, spconv2

[2022-04-13 11:52:35,200 INFO test.py line 306 11019] #classifier parameters (model): 30839600
[2022-04-13 11:52:35,244 INFO utils.py line 67 11019] Restore from checkpoints/softgroup_scannet.pth
Traceback (most recent call last):
File "test.py", line 311, in <module>
use_cuda, cfg.test_epoch, dist=False, f=cfg.pretrain)
File "/media/yangtian/SATA3/Workspace/室内3d目标检测/SoftGroup/util/utils.py", line 83, in checkpoint_restore
model.load_state_dict(net_checkpoint)
File "/media/yangtian/SATA3/PyEnvs/ubuntu/softgroup-env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1483, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for SoftGroup:
size mismatch for input_conv.0.weight: copying a param with shape torch.Size([3, 3, 3, 6, 32]) from checkpoint, the shape in current model is torch.Size([3, 3, 3, 32, 6]).
size mismatch for unet.conv.2.weight: copying a param with shape torch.Size([2, 2, 2, 32, 64]) from checkpoint, the shape in current model is torch.Size([2, 2, 2, 64, 32]).
size mismatch for unet.u.conv.2.weight: copying a param with shape torch.Size([2, 2, 2, 64, 96]) from checkpoint, the shape in current model is torch.Size([2, 2, 2, 96, 64]).
size mismatch for unet.u.u.conv.2.weight: copying a param with shape torch.Size([2, 2, 2, 96, 128]) from checkpoint, the shape in current model is torch.Size([2, 2, 2, 128, 96]).
size mismatch for unet.u.u.u.conv.2.weight: copying a param with shape torch.Size([2, 2, 2, 128, 160]) from checkpoint, the shape in current model is torch.Size([2, 2, 2, 160, 128]).
size mismatch for unet.u.u.u.u.conv.2.weight: copying a param with shape torch.Size([2, 2, 2, 160, 192]) from checkpoint, the shape in current model is torch.Size([2, 2, 2, 192, 160]).
size mismatch for unet.u.u.u.u.u.conv.2.weight: copying a param with shape torch.Size([2, 2, 2, 192, 224]) from checkpoint, the shape in current model is torch.Size([2, 2, 2, 224, 192]).
size mismatch for unet.u.u.u.u.u.deconv.2.weight: copying a param with shape torch.Size([2, 2, 2, 224, 192]) from checkpoint, the shape in current model is torch.Size([2, 2, 2, 192, 224]).
size mismatch for unet.u.u.u.u.u.blocks_tail.block0.i_branch.0.weight: copying a param with shape torch.Size([1, 1, 1, 384, 192]) from checkpoint, the shape in current model is torch.Size([1, 1, 1, 192, 384]).
size mismatch for unet.u.u.u.u.u.blocks_tail.block0.conv_branch.2.weight: copying a param with shape torch.Size([3, 3, 3, 384, 192]) from checkpoint, the shape in current model is torch.Size([3, 3, 3, 192, 384]).
size mismatch for unet.u.u.u.u.deconv.2.weight: copying a param with shape torch.Size([2, 2, 2, 192, 160]) from checkpoint, the shape in current model is torch.Size([2, 2, 2, 160, 192]).
size mismatch for unet.u.u.u.u.blocks_tail.block0.i_branch.0.weight: copying a param with shape torch.Size([1, 1, 1, 320, 160]) from checkpoint, the shape in current model is torch.Size([1, 1, 1, 160, 320]).
size mismatch for unet.u.u.u.u.blocks_tail.block0.conv_branch.2.weight: copying a param with shape torch.Size([3, 3, 3, 320, 160]) from checkpoint, the shape in current model is torch.Size([3, 3, 3, 160, 320]).
size mismatch for unet.u.u.u.deconv.2.weight: copying a param with shape torch.Size([2, 2, 2, 160, 128]) from checkpoint, the shape in current model is torch.Size([2, 2, 2, 128, 160]).
size mismatch for unet.u.u.u.blocks_tail.block0.i_branch.0.weight: copying a param with shape torch.Size([1, 1, 1, 256, 128]) from checkpoint, the shape in current model is torch.Size([1, 1, 1, 128, 256]).
size mismatch for unet.u.u.u.blocks_tail.block0.conv_branch.2.weight: copying a param with shape torch.Size([3, 3, 3, 256, 128]) from checkpoint, the shape in current model is torch.Size([3, 3, 3, 128, 256]).
size mismatch for unet.u.u.deconv.2.weight: copying a param with shape torch.Size([2, 2, 2, 128, 96]) from checkpoint, the shape in current model is torch.Size([2, 2, 2, 96, 128]).
size mismatch for unet.u.u.blocks_tail.block0.i_branch.0.weight: copying a param with shape torch.Size([1, 1, 1, 192, 96]) from checkpoint, the shape in current model is torch.Size([1, 1, 1, 96, 192]).
size mismatch for unet.u.u.blocks_tail.block0.conv_branch.2.weight: copying a param with shape torch.Size([3, 3, 3, 192, 96]) from checkpoint, the shape in current model is torch.Size([3, 3, 3, 96, 192]).
size mismatch for unet.u.deconv.2.weight: copying a param with shape torch.Size([2, 2, 2, 96, 64]) from checkpoint, the shape in current model is torch.Size([2, 2, 2, 64, 96]).
size mismatch for unet.u.blocks_tail.block0.i_branch.0.weight: copying a param with shape torch.Size([1, 1, 1, 128, 64]) from checkpoint, the shape in current model is torch.Size([1, 1, 1, 64, 128]).
size mismatch for unet.u.blocks_tail.block0.conv_branch.2.weight: copying a param with shape torch.Size([3, 3, 3, 128, 64]) from checkpoint, the shape in current model is torch.Size([3, 3, 3, 64, 128]).
size mismatch for unet.deconv.2.weight: copying a param with shape torch.Size([2, 2, 2, 64, 32]) from checkpoint, the shape in current model is torch.Size([2, 2, 2, 32, 64]).
size mismatch for unet.blocks_tail.block0.i_branch.0.weight: copying a param with shape torch.Size([1, 1, 1, 64, 32]) from checkpoint, the shape in current model is torch.Size([1, 1, 1, 32, 64]).
size mismatch for unet.blocks_tail.block0.conv_branch.2.weight: copying a param with shape torch.Size([3, 3, 3, 64, 32]) from checkpoint, the shape in current model is torch.Size([3, 3, 3, 32, 64]).
size mismatch for intra_ins_unet.conv.2.weight: copying a param with shape torch.Size([2, 2, 2, 32, 64]) from checkpoint, the shape in current model is torch.Size([2, 2, 2, 64, 32]).
size mismatch for intra_ins_unet.deconv.2.weight: copying a param with shape torch.Size([2, 2, 2, 64, 32]) from checkpoint, the shape in current model is torch.Size([2, 2, 2, 32, 64]).
size mismatch for intra_ins_unet.blocks_tail.block0.i_branch.0.weight: copying a param with shape torch.Size([1, 1, 1, 64, 32]) from checkpoint, the shape in current model is torch.Size([1, 1, 1, 32, 64]).
size mismatch for intra_ins_unet.blocks_tail.block0.conv_branch.2.weight: copying a param with shape torch.Size([3, 3, 3, 64, 32]) from checkpoint, the shape in current model is torch.Size([3, 3, 3, 32, 64]).
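
Every mismatch in the log above follows one pattern: the checkpoint tensor and the model tensor agree on the three kernel dimensions and differ only in the order of the last two axes. A sketch of a converter that swaps those two axes when that is the only difference is shown below. Whether a plain swap is the correct conversion between the spconv 1.x and 2.x weight layouts is an assumption here, and `swap_last_two_axes` is a hypothetical helper. It relies only on `.shape` and `.swapaxes`, so NumPy arrays stand in for the tensors; the `bn.weight` entry is made up for illustration.

```python
import numpy as np


def swap_last_two_axes(checkpoint, model_shapes):
    """Return a copy of `checkpoint` with mismatched 5-D weights swapped.

    A weight is changed only when its shape differs from the shape the
    model expects exactly by a swap of the last two axes.
    """
    converted = {}
    for name, weight in checkpoint.items():
        shape = tuple(weight.shape)
        target = tuple(model_shapes.get(name, shape))
        if len(shape) == 5 and shape != target:
            swapped = shape[:-2] + (shape[-1], shape[-2])
            if swapped == target:
                weight = weight.swapaxes(-1, -2)
        converted[name] = weight
    return converted


# Shapes for input_conv.0.weight are taken from the log; values are dummies.
checkpoint = {
    'input_conv.0.weight': np.zeros((3, 3, 3, 6, 32)),
    'bn.weight': np.ones(32),
}
model_shapes = {
    'input_conv.0.weight': (3, 3, 3, 32, 6),
    'bn.weight': (32,),
}
converted = swap_last_two_axes(checkpoint, model_shapes)
print(converted['input_conv.0.weight'].shape)
```

Matching shapes do not guarantee matching semantics, so a converted checkpoint should be validated on a scan with known results before it is trusted.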

Does it take a .ply file to generate results?

I have generated a .ply file. Can I use it to generate the class segmentation?

error: command 'gcc' failed with exit status 1

When I use your new code and run python setup.py build_ext develop, it throws error: command 'gcc' failed with exit status 1. Have you met it?

Results on S3DIS benefit from pretraining on ScanNet, so is it a fair comparison with other methods?

Hi, thanks for your great job on SoftGroup. I have a question about the training of S3DIS. It seems that the model is initialized from a model pre-trained on ScanNet. This means that external data is used in the evaluation process of S3DIS. Are the results comparable to those of other work (like PointGroup, SSTNet, and HAIS)?

scannet training out of memory?

Training on RTX3090, the GPU consumption continues to rise.
At the 8th epoch, it's out of memory.

@thangvubk

Performance gap between self-tested and on paper

dataset

Hi, thank you for your work!
I made a dataset like S3DIS, but it is smaller. I preprocessed it with prepare_data.sh. But when I train, ...
WRONG: point num < 20000, continue
I tried changing 20000 to 2000 or 20, but it still happens.
Can you give me some advice on using small data? Thank you!

[WIP] Code refactor

#16 refactors the project, aiming for cleaner and possibly faster code. Afterward, we plan to support spconv 2.x, distributed training, mixed-precision training, and possibly more datasets/methods.

Visualize_open3D.py usage

Hello,
I trained the network with the S3DIS dataset.
Now I am trying to use the visualization, but I am probably doing something wrong.

I followed the exact steps from the README and the data preparation, and got a trained model with AP-50 of ~66%.
I did end up changing the backbone/default_s3dis.yaml files to use the s3dis_inst.py file instead of the default scannetv2_inst.py file (because I use the S3DIS dataset, although it also worked with the ScanNet version, which is weird, isn't it?) and I enabled:
save_semantic: True
save_pt_offsets: True
save_instance: True
in default_s3dis, but not for the backbone training (does the backbone training need these settings enabled as well?).
I did not change anything in the folder structure.

Currently I am using this command from ~/Softgroup:
$ python visualize_open3d.py --data_path dataset/s3dis/preprocess --prediction_path exp/s3dis/softgroup/softgroup_fold5_default_s3dis/result/val/Area_5/predicted_masks --data_split {} --room_name Area_5_conferenceRoom_2 --task {}

I am getting the following error:
AssertionError: File not exist - dataset/s3dis/preprocess/{}/Area_5_conferenceRoom_2_inst_nostuff.pth.

This is definitely true because the file path to the .pth file does not contain "/{}/", but even after changing this so that it reaches the file, it does not work. If I do that, I get this error:
xyz, rgb, label, inst_label = torch.load(input_file)
ValueError: too many values to unpack (expected 4)

I understand the errors, but I am confused because I am not sure what command to use for visualization (in other words: what files does it expect?). I also read something about needing to open TensorBoard with a command, but I couldn't get it to work.

Hopefully you can help me.
Kind Regards,
Goose.
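
On the `too many values to unpack (expected 4)` error in the post above: it means the loaded object holds more than four items. A small sketch of how star-unpacking tolerates the extra items is given below, under the assumption that the first four items are `xyz`, `rgb`, `label` and `inst_label` in that order. The record contents and the helper name are hypothetical stand-ins for whatever `torch.load(input_file)` returns.

```python
def unpack_first_four(record):
    """Split a saved record into its first four fields and the remainder."""
    xyz, rgb, label, inst_label, *extra = record
    return xyz, rgb, label, inst_label, extra


# Hypothetical stand-in for the object returned by torch.load(input_file).
record = ('xyz', 'rgb', 'label', 'inst_label', 'room_label', 'scene_name')
xyz, rgb, label, inst_label, extra = unpack_first_four(record)
print(len(record), extra)
```

Printing the length and the types of the loaded items first is a cheap way to check which preprocessing script produced the file.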

About S3DIS evaluation

How do I get the metrics mCov, mWCov, mPrec@50, and mRec@50 on S3DIS?
Thank you in advance!

MemoryError: std::bad_alloc

Thanks for sharing the code. When I try to train SoftGroup, the memory consumption increases slowly. After training for about 100 epochs, the following error occurred:
Traceback (most recent call last):
File "/home/logic/Desktop/Project/SoftGroup/train.py", line 220, in <module>
train_epoch(dataset.train_data_loader, model, model_fn, optimizer, epoch)
File "/home/logic/Desktop/Project/SoftGroup/train.py", line 61, in train_epoch
loss, _, visual_dict, meter_dict = model_fn(batch, model, epoch, semantic_only=cfg.semantic_only)
File "/home/logic/Desktop/Project/SoftGroup/model/softgroup/softgroup.py", line 556, in model_fn
ret = model(input_, p2v_map, coords_float, coords[:, 0].int(), batch_offsets, epoch, 'train', semantic_only=semantic_only)
File "/home/logic/anaconda3/envs/hais/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/logic/Desktop/Project/SoftGroup/model/softgroup/softgroup.py", line 429, in forward
input_feats, inp_map = self.clusters_voxelization(proposals_idx, proposals_offset, output_feats, coords, self.score_fullscale, self.score_scale, self.score_mode)
File "/home/logic/Desktop/Project/SoftGroup/model/softgroup/softgroup.py", line 255, in clusters_voxelization
out_coords, inp_map, out_map = softgroup_ops.voxelization_idx(clusters_coords, int(clusters_idx[-1, 0]) + 1, mode)
File "/home/logic/Desktop/Project/SoftGroup/lib/softgroup_ops/functions/softgroup_ops.py", line 209, in forward
SOFTGROUP_OP.voxelize_idx(coords, output_coords, input_map, output_map, batchsize, mode)
MemoryError: std::bad_alloc
Have you encountered a similar problem, and can you suggest how to solve it?

visualize.py usage

Hello.
Thank you for the great work and sharing it.

This is the command from #6.
python visualization.py --dataset s3dis --prediction_path ./exp/s3dis/softgroup/softgroup_fold5_default_s3dis/result/ --task instance_pred --out test.ply --data_split Area_5 --room_name Area_5_WC_1
In this command, where does the exp folder (--prediction_path) come from?

I ran inference using test.py and I got the following folders.
-inference
-coords
-gt_instance
-offset_label
-offset_pred
-pred_instance
-semantic_label
-semantic_pred
So. I don't have a exp folder of result folder.

This is my command.
!python ./tools/visualization.py --dataset s3dis --prediction_path ./inference --data_split Area_3 --room_name Area_3_conferenceRoom_1 --task instance_pred --out A3_CR1.ply

And I got the error
AssertionError: No instance result - ./inference/val/Area_3/Area_3_conferenceRoom_1.txt.

Hopefully you can help me.
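
The assertion message names a single file that the script expected under the prediction path. One way to see where test.py actually wrote the results for a room is to search the output folder for files that start with the room name. The sketch below does only that. The helper name `find_room_files` is hypothetical and not part of the repo, and it assumes result files are named after the room.

```python
from pathlib import Path


def find_room_files(prediction_path, room_name):
    """Return relative paths of all files under prediction_path whose
    file name starts with room_name (hypothetical helper, not from the repo)."""
    root = Path(prediction_path)
    if not root.is_dir():
        # Nothing to search if the prediction folder does not exist.
        return []
    matches = (p for p in root.rglob(f"{room_name}*") if p.is_file())
    return sorted(str(p.relative_to(root)) for p in matches)


if __name__ == "__main__":
    # Paths taken from the command in the issue above.
    for hit in find_room_files("./inference", "Area_3_conferenceRoom_1"):
        print(hit)
```

If the room only shows up under subfolders such as pred_instance, the path given to --prediction_path may not be laid out the way the visualization script expects.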

Can I train the model without hais_ckpt?

fatal error: THC/THC.h: No such file or directory

Hi, I would like to try out your repo, but installation step 5 (python setup.py build_ext develop) fails with:

In file included from /code/SoftGroup/lib/softgroup_ops/src/softgroup_ops.h:5,
from /code/SoftGroup/lib/softgroup_ops/src/softgroup_api.cpp:4:
/code/SoftGroup/lib/softgroup_ops/src/hierarchical_aggregation/hierarchical_aggregation.h:9:10: fatal error: THC/THC.h: No such file or directory

I found that THC.h only exists in the cuTorch code, which is very old. Am I supposed to compile Torch7 and cuTorch?
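
Newer PyTorch releases no longer ship the legacy THC headers, so a quick check is whether the installed PyTorch still provides `THC/THC.h` in the include directories it hands to extension builds. The sketch below is a minimal check under that assumption. `find_header` and `torch_include_dirs` are hypothetical helper names, while `torch.utils.cpp_extension.include_paths` is the PyTorch function that lists those directories.

```python
from pathlib import Path


def find_header(include_dirs, relative_header="THC/THC.h"):
    """Return the first include directory that contains the header, else None."""
    for directory in include_dirs:
        if (Path(directory) / relative_header).is_file():
            return str(directory)
    return None


def torch_include_dirs():
    """Include directories PyTorch uses for C++ extensions.

    Returns an empty list when PyTorch is not importable.
    """
    try:
        from torch.utils.cpp_extension import include_paths
    except ImportError:
        return []
    return include_paths()


if __name__ == "__main__":
    location = find_header(torch_include_dirs())
    if location is None:
        print("THC/THC.h is not provided by the installed PyTorch")
    else:
        print("THC/THC.h found under", location)
```

If the header is missing, the extension sources and the installed PyTorch version do not match. Compiling Torch7 or cuTorch would not be the usual fix.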

Another note: I see you are using spconv 1.x, while spconv now offers 2.x, which is faster. spconv 2.x can be installed with pip for CUDA 11.4, which is more recent. Are you considering updating to spconv 2.x? For your interest, there is a summary of what needs to change when updating from 1.x to 2.x.

the GPU program failed to execute at /pytorch/aten/src/THC/THCBlas.cu:259

After downgrading PyTorch to 1.1 and CUDA to 10.0, the spconv 1.x build passed, but I still hit a CUDA error when running test.py:

[2022-04-13 16:12:19,930 INFO test.py line 289 26347] => creating model ...
[2022-04-13 16:12:19,930 INFO test.py line 290 26347] Classes: 18
[2022-04-13 16:12:20,108 INFO test.py line 302 26347] cuda available: True
[2022-04-13 16:20:05,853 INFO test.py line 306 26347] #classifier parameters (model): 30839600
[2022-04-13 16:20:05,873 INFO utils.py line 67 26347] Restore from checkpoints/softgroup_scannet.pth
[2022-04-13 16:20:05,964 INFO test.py line 36 26347] >>>>>>>>>>>>>>>> Start Evaluation >>>>>>>>>>>>>>>>
[2022-04-13 16:20:06,064 INFO scannetv2_inst.py line 92 26347] Testing samples (test): 2
Traceback (most recent call last):
File "test.py", line 315, in <module>
test(model, model_fn, data_name, cfg.test_epoch)
File "test.py", line 58, in test
preds = model_fn(batch, model, epoch)
File "/media/yangtian/SATA3/Workspace/SoftGroup/model/softgroup/softgroup.py", line 473, in test_model_fn
ret = model(input_, p2v_map, coords_float, coords[:, 0].int(), batch_offsets, epoch, 'test', semantic_only=semantic_only)
File "/media/yangtian/SATA3/PyEnvs/ubuntu/softgroup-env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/media/yangtian/SATA3/Workspace/SoftGroup/model/softgroup/softgroup.py", line 316, in forward
output = self.input_conv(input)
File "/media/yangtian/SATA3/PyEnvs/ubuntu/softgroup-env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/media/yangtian/SATA3/PyEnvs/ubuntu/softgroup-env/lib/python3.7/site-packages/spconv/modules.py", line 123, in forward
input = module(input)
File "/media/yangtian/SATA3/PyEnvs/ubuntu/softgroup-env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/media/yangtian/SATA3/PyEnvs/ubuntu/softgroup-env/lib/python3.7/site-packages/spconv/conv.py", line 157, in forward
outids.shape[0])
File "/media/yangtian/SATA3/PyEnvs/ubuntu/softgroup-env/lib/python3.7/site-packages/spconv/functional.py", line 83, in forward
return ops.indice_conv(features, filters, indice_pairs, indice_pair_num, num_activate_out, False, True)
File "/media/yangtian/SATA3/PyEnvs/ubuntu/softgroup-env/lib/python3.7/site-packages/spconv/ops.py", line 112, in indice_conv
int(inverse), int(subm))
RuntimeError: cublas runtime error : the GPU program failed to execute at /pytorch/aten/src/THC/THCBlas.cu:259

Train SoftGroup on STPLS3D Dataset

First, thanks a lot for this big contribution to 3D point cloud analysis.

Recently a new dataset has been released and described in this paper:
STPLS3D: A Large-Scale Synthetic and Real Aerial Photogrammetry 3D Point Cloud Dataset
https://arxiv.org/abs/2203.09065
Do you have any recommendations on how to train SoftGroup on this dataset?

./tools/train.py FAILED

Failures:
<NO_OTHER_FAILURES>

Root Cause (first observed failure):
[0]:
time : 2022-04-26_17:14:56
host : BQ-DX1100-CT2
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 30691)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
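
The launcher summary above reports only exit code 1, with no error file and no traceback. According to the linked PyTorch page, the usual way to capture the underlying exception is to decorate the training entry point with `record` from `torch.distributed.elastic.multiprocessing.errors`. The sketch below shows the pattern. `main` is a hypothetical placeholder for the real entry point in tools/train.py, and the fallback decorator is there only so the snippet still runs where PyTorch is not installed.

```python
try:
    # Real PyTorch decorator that records uncaught exceptions for the launcher.
    from torch.distributed.elastic.multiprocessing.errors import record
except ImportError:
    # Assumption for illustration only: a no-op stand-in when torch is absent.
    def record(fn):
        return fn


@record
def main():
    """Hypothetical placeholder for the training entry point."""
    # The real script would parse arguments, build the model and train here.
    return 0


if __name__ == "__main__":
    main()
```

With the decorator in place, a failing rank should report the Python traceback of the original exception instead of only the exit code.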