haiyang-w / dsvt
[CVPR2023] Official Implementation of "DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets"
Home Page: https://arxiv.org/abs/2301.06051
License: Apache License 2.0
Hello, in POST_PROCESSING, POST_CENTER_LIMIT_RANGE is set to [-80, -80, -10.0, 80, 80, 10.0]. How is this range set? Does it have an effect on performance?
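A minimal sketch of what a center limit range typically does in post-processing, assuming it is used to discard predicted boxes whose centers fall outside the range. The function name `filter_by_center_range` and the sample boxes are hypothetical; this is not taken from the repository.

```python
def filter_by_center_range(boxes, limit_range):
    """Keep boxes whose (x, y, z) center lies inside limit_range.

    boxes: list of [x, y, z, dx, dy, dz, heading] (hypothetical layout)
    limit_range: [x_min, y_min, z_min, x_max, y_max, z_max]
    """
    lo, hi = limit_range[:3], limit_range[3:]
    return [
        box for box in boxes
        if all(lo[i] <= box[i] <= hi[i] for i in range(3))
    ]


# Range quoted in the question above.
limit = [-80, -80, -10.0, 80, 80, 10.0]

# Made-up predictions for illustration only.
boxes = [
    [10.0, 5.0, 0.5, 4.2, 1.8, 1.6, 0.0],   # center inside the range
    [95.0, 0.0, 0.0, 4.2, 1.8, 1.6, 0.0],   # x is beyond x_max
    [0.0, 0.0, -12.0, 0.8, 0.8, 1.7, 0.0],  # z is below z_min
]

kept = filter_by_center_range(boxes, limit)
print(len(kept), "of", len(boxes), "boxes kept")
```

Under this assumption, a range narrower than the area where ground-truth objects exist would remove valid detections, while a wider one only lets more far-away candidates through.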
This is great work, but the paper reports no experiments on the KITTI dataset. I would like to validate the performance of DSVT on KITTI. Could you please provide a configuration file for DSVT on the KITTI dataset?
Congratulations on being accepted by CVPR!
I get many 'invalid static_cast' errors when I run 'python setup.py install'.
My environment: torch == 1.13.0+cu117, CUDA version 11.7.
Once I commented out the following code in setup.py, the installation completed, which suggests the error is in ingroup_inds_cuda:
make_cuda_ext(
    name='ingroup_inds_cuda',
    module='pcdet.ops.ingroup_inds',
    sources=[
        'src/ingroup_inds.cpp',
        'src/ingroup_inds_kernel.cu',
    ]
),
It may be my PyTorch version, but the versions you recommend (below torch 1.10) don't fit my CUDA version 11.7, and since I don't have sudo permission it is difficult to change the CUDA version.
Could you please see if there is any solution? Thanks!
2023-06-30 02:10:13,430 INFO epoch: 14/20, acc_iter=112400, cur_iter=4263/7724, batch_size=4, time_cost(epoch): 1:04:10/52:05, time_cost(all): 16:04:06/10:33:21, loss=1.5466811966896057, d_time=0.02(0.02), f_time=0.81(0.88), b_time=0.83(0.90), norm=28.733774185180664, lr=0.002140055979797459
2023-06-30 02:10:58,540 INFO epoch: 14/20, acc_iter=112450, cur_iter=4313/7724, batch_size=4, time_cost(epoch): 1:04:55/51:20, time_cost(all): 16:04:51/10:32:35, loss=1.5467678713798523, d_time=0.02(0.02), f_time=0.87(0.88), b_time=0.89(0.90), norm=28.955286026000977, lr=0.002135863906495637
2023-06-30 02:11:03,331 INFO Save latest model to /root/paddlejob/workspace/env_run/DSVT/output/cfgs/dsvt_models/dsvt_plain_1f_onestage_nusences/default/ckpt/latest_model
2023-06-30 02:11:43,555 INFO epoch: 14/20, acc_iter=112500, cur_iter=4363/7724, batch_size=4, time_cost(epoch): 1:05:40/50:35, time_cost(all): 16:05:36/10:31:49, loss=1.548808376789093, d_time=0.02(0.02), f_time=0.80(0.88), b_time=0.82(0.90), norm=29.192405700683594, lr=0.002131672879084205
/bin/sh: gpustat: command not found
2023-06-30 02:11:43,990 INFO
[E ProcessGroupNCCL.cpp:587] [Rank 1] Watchdog caught collective operation timeout: WorkNCCL(OpType=BROADCAST, Timeout(ms)=1800000) ran for 1803000 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:587] [Rank 3] Watchdog caught collective operation timeout: WorkNCCL(OpType=BROADCAST, Timeout(ms)=1800000) ran for 1803001 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:587] [Rank 2] Watchdog caught collective operation timeout: WorkNCCL(OpType=BROADCAST, Timeout(ms)=1800000) ran for 1803057 milliseconds before timing out.
2023-06-30 09:52:59,721 INFO Save latest model to /root/paddlejob/workspace/env_run/DSVT/output/cfgs/dsvt_models/dsvt_plain_1f_onestage_nusences/default/ckpt/latest_model
[E ProcessGroupNCCL.cpp:341] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data. To avoid this inconsistency, we are taking the entire process down.
terminate called after throwing an instance of 'std::runtime_error'
what(): [Rank 3] Watchdog caught collective operation timeout: WorkNCCL(OpType=BROADCAST, Timeout(ms)=1800000) ran for 1803001 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:341] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data. To avoid this inconsistency, we are taking the entire process down.
terminate called after throwing an instance of 'std::runtime_error'
what(): [Rank 1] Watchdog caught collective operation timeout: WorkNCCL(OpType=BROADCAST, Timeout(ms)=1800000) ran for 1803000 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:341] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data. To avoid this inconsistency, we are taking the entire process down.
terminate called after throwing an instance of 'std::runtime_error'
what(): [Rank 2] Watchdog caught collective operation timeout: WorkNCCL(OpType=BROADCAST, Timeout(ms)=1800000) ran for 1803057 milliseconds before timing out.
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 13116 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -6) local_rank: 1 (pid: 13117) of binary: /usr/bin/python3.7
Traceback (most recent call last):
File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/usr/local/lib/python3.7/dist-packages/torch/distributed/launch.py", line 193, in <module>
main()
File "/usr/local/lib/python3.7/dist-packages/torch/distributed/launch.py", line 189, in main
launch(args)
File "/usr/local/lib/python3.7/dist-packages/torch/distributed/launch.py", line 174, in launch
run(args)
File "/usr/local/lib/python3.7/dist-packages/torch/distributed/run.py", line 713, in run
)(*cmd_args)
File "/usr/local/lib/python3.7/dist-packages/torch/distributed/launcher/api.py", line 131, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/usr/local/lib/python3.7/dist-packages/torch/distributed/launcher/api.py", line 261, in launch_agent
failures=result.failures,
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
train.py FAILED
------------------------------------------------------------
Failures:
[1]:
time : 2023-06-30_09:53:05
host : 10-67-245-145.local
rank : 2 (local_rank: 2)
exitcode : -6 (pid: 13120)
error_file: <N/A>
traceback : Signal 6 (SIGABRT) received by PID 13120
[2]:
time : 2023-06-30_09:53:05
host : 10-67-245-145.local
rank : 3 (local_rank: 3)
exitcode : -6 (pid: 13122)
error_file: <N/A>
traceback : Signal 6 (SIGABRT) received by PID 13122
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2023-06-30_09:53:05
host : 10-67-245-145.local
rank : 1 (local_rank: 1)
exitcode : -6 (pid: 13117)
error_file: <N/A>
traceback : Signal 6 (SIGABRT) received by PID 13117
Hi! I tried to train the model on four Tesla V100 GPUs but ran into this problem. Could you please suggest how to fix it? Thanks!
python3 -m torch.distributed.launch \
--nproc_per_node=4 \
--rdzv_endpoint=localhost:14430 train.py \
--launcher pytorch \
--cfg_file ./cfgs/dsvt_models/dsvt_plain_D512e.yaml \
--sync_bn --logger_iter_interval 500
/home/user/.local/lib/python3.10/site-packages/torch/distributed/launch.py:178: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use_env is set by default in torchrun.
If your script expects `--local_rank` argument to be set, please
change it to read from `os.environ['LOCAL_RANK']` instead. See
https://pytorch.org/docs/stable/distributed.html#launch-utility for
further instructions
warnings.warn(
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
terminate called after throwing an instance of 'std::bad_alloc'
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
what(): std::bad_alloc
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -6) local_rank: 0 (pid: 36) of binary: /usr/bin/python3
scripts/dist_train.sh: line 17: 26 Segmentation fault (core dumped) python3 -m torch.distributed.launch --nproc_per_node=${NGPUS} --rdzv_endpoint=localhost:${PORT} train.py --launcher pytorch ${PY_ARGS}
When I used DSVT on my own dataset, I found that it detects nearby objects (0-30 m) very well, but the mean Average Precision drops significantly at long range (30-60 m, 60 m-inf) compared with the CenterPoint model. How can this be explained?
May I ask how to calculate the FPS of the model?
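One common way to measure FPS is to time many forward passes after a warm-up and divide the count by the elapsed time. The sketch below assumes nothing about the repository: `measure_fps` and `dummy_forward` are hypothetical names, and `dummy_forward` only stands in for a real model call.

```python
import time


def measure_fps(step, n_warmup=5, n_iters=20, sync=None):
    """Return the average number of step() calls per second.

    sync: optional callable run before reading the clock. For GPU models,
    passing torch.cuda.synchronize makes the timing include kernels that
    are still running asynchronously.
    """
    for _ in range(n_warmup):      # warm-up runs are not timed
        step()
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(n_iters):
        step()
    if sync is not None:
        sync()
    elapsed = time.perf_counter() - start
    return n_iters / elapsed


def dummy_forward():
    # Stand-in workload; replace with a call such as model(batch).
    return sum(i * i for i in range(10000))


fps = measure_fps(dummy_forward)
print(f"{fps:.1f} FPS")
```

Whether data loading and post-processing are included in `step` changes the number considerably, so it helps to state which stages were timed when comparing with published figures.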
Good work! Any plan to release the TensorRT conversion code?
I would like to know which version of TensorRT the performance reported in the paper is based on.
Thanks for your excellent work. I wanted to reproduce the results of the two-stage DSVT-TS on Waymo's 100% training set, but I could not find a two-stage configuration file with CT3D as the model; all the available files are first-stage configurations with CenterPoint as the model.
Could you upload it or tell me where it is?
Thank you very much
Hi, can DynamicPillarVFE be replaced with regular VFE layers like PillarVFE3D or even MeanVFE, and how bad is the performance difference? Thanks.
Section B.1.1 of the paper says that both DSVT-P and DSVT-V have 4 DSVT blocks, but dsvt_models/dsvt_plain_1f_onestage.yaml seems to have only 1 block.
Evaluation on WOD with the Waymo metrics is too slow for me. This seems to be a common issue with OpenPCDet. If I switch to the KITTI metrics, it runs out of memory (RAM: 120 GB). Could you provide some instructions?
Hello, thank you for releasing nuscenes update!
Could you tell me please whether using DynPillarVFE is right in dsvt_plain_1f_onestage_nusences.yaml?
In other versions (e.g. for waymo) you used DynPillarVFE3D, so I'm wondering why is it different for nuscenes?
I use:
batchsize_pergpu=4, gpus = 2;
lr = 0.002
I know the note: "If you encounter a gradient that becomes NaN during fp16 training, don't worry, it's normal. You can try a few more times."
But every time, after a short training period (within 100 iterations), the loss becomes NaN, and I have tried many times (dozens, even over a hundred).
I only modified the paths in the configuration file, and I made sure that no training session loaded a previous last_model with a NaN loss.
It always shows:
epochs: 0%| | 0/24 [00:25<?, ?it/s, loss_hm=nan, loss_loc=nan, loss=nan, lr=2e-5, d_time=0.00(0.02), f_time=0.68(0.70), b_tWARNING:tensorboardX.x2num:NaN or Inf found in input tensor. | 33/19761 [00:23<3:55:35, 1.40it/s, total_it=33]
WARNING:tensorboardX.x2num:NaN or Inf found in input tensor.
WARNING:tensorboardX.x2num:NaN or Inf found in input tensor.
WARNING:tensorboardX.x2num:NaN or Inf found in input tensor.
WARNING:tensorboardX.x2num:NaN or Inf found in input tensor.
WARNING:tensorboardX.x2num:NaN or Inf found in input tensor.
WARNING:tensorboardX.x2num:NaN or Inf found in input tensor.
And this is log:
https://paste.imlgw.top/2277
Hi,
Since I'm new to this field, I found the voxel feature encoding (VFE) module hard to understand.
This module seems to be a common structure; could you please elaborate on it?
Does it come from VoxelNet or PointPillars?
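As general background: a VFE turns the raw points that fall into each voxel into one feature vector per voxel. The term comes from VoxelNet, and PointPillars applies a similar idea to vertical pillars. The simplest variant just averages the points in a voxel, which is the idea behind a mean VFE. The sketch below illustrates that averaging in plain Python; `mean_vfe` and the sample points are hypothetical and unrelated to the repository's implementation, which uses learned layers.

```python
from collections import defaultdict


def mean_vfe(points, voxel_size, pc_min):
    """Group points by voxel and average them.

    points: list of (x, y, z, intensity)
    voxel_size: (vx, vy, vz) edge lengths of one voxel
    pc_min: (x_min, y_min, z_min) lower corner of the point cloud range
    Returns {voxel_coord: mean feature vector}.
    """
    buckets = defaultdict(list)
    for p in points:
        # Integer voxel index along each axis.
        coord = tuple(
            int((p[i] - pc_min[i]) // voxel_size[i]) for i in range(3)
        )
        buckets[coord].append(p)
    # Column-wise mean over the points of each voxel.
    return {
        coord: [sum(col) / len(pts) for col in zip(*pts)]
        for coord, pts in buckets.items()
    }


points = [
    (0.1, 0.1, 0.0, 1.0),
    (0.3, 0.2, 0.0, 3.0),  # same voxel as the first point
    (1.2, 0.1, 0.0, 5.0),  # a different voxel
]
features = mean_vfe(points, voxel_size=(0.5, 0.5, 1.0), pc_min=(0.0, 0.0, -1.0))
for coord, feat in sorted(features.items()):
    print(coord, feat)
```

Learned encoders replace the plain mean with a small network applied to each point, followed by a pooling step over the voxel.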
Hello, thank you for your excellent work. I am currently limited to running experiments on the mini dataset. Have you run any experiments on mini-nuScenes, and if so, what are the metrics?
Hi, thanks for a great job!
I am having trouble converting the weights of a model trained on the nuScenes dataset. As I understand it, deploy.py only works with Waymo, and the nuScenes trtengine config yaml has not been released. I tried to write the nuScenes trtengine config file by replacing just the BACKBONE_3D part of the yaml, but before reaching that stage I ran into other problems with deploy.py. The input data (batch_dict) you provide is Waymo data, and I don't think simply swapping it for nuScenes data is a solution, because the input_shapes and dynamic_axes config must also change for nuScenes compatibility. So, is there a way to convert models trained on nuScenes to TensorRT right now, and can I resolve this myself? Are you planning to release dataset-agnostic deploy code in the near future?
I'm confused about the hybrid factors.
Given a base window shape [12, 12, 32], a hybrid one is [24, 24, 32] with the hybrid factor [2, 2, 1]. [12, 12, 32] is for non-shifting, while [24, 24, 32] is for shifting with shifts [6, 6, 0]. It seems that the only difference between non-hybrid (swin/sst-like) and hybrid version is the window shape when shifting.
In the paper, the hybrid window partition is said to improve efficiency, but I can't find a detailed explanation. Is the efficiency related to the redundant padding voxel tokens, which are fewer with a larger window shape? And why not try large windows for both, e.g., window1=(24,24) & window2=(24,24) in Table 5?
Look forward to your reply.^^
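The arithmetic described in the question can be sketched as follows: the hybrid window shape is the base shape multiplied element-wise by the hybrid factor, and a voxel's window index is its shifted coordinate divided by the window shape. The helper names are hypothetical, and the shift convention (adding the shift before dividing) is an assumption that may differ from the repository's code.

```python
def hybrid_window_shape(base_shape, hybrid_factor):
    """Element-wise product of the base window shape and the hybrid factor."""
    return [b * f for b, f in zip(base_shape, hybrid_factor)]


def window_index(voxel_coord, window_shape, shift):
    """Index of the window containing voxel_coord after applying the shift."""
    return tuple(
        (c + s) // w for c, s, w in zip(voxel_coord, shift, window_shape)
    )


base = [12, 12, 32]
factor = [2, 2, 1]
hybrid = hybrid_window_shape(base, factor)
print(hybrid)  # [24, 24, 32]

voxel = (13, 30, 5)  # made-up voxel coordinate (x, y, z)
print(window_index(voxel, base, [0, 0, 0]))    # non-shifted, base window
print(window_index(voxel, hybrid, [6, 6, 0]))  # shifted, hybrid window
```

With a larger window in the shifted pass, the same scene is covered by fewer windows, which matches the question's reading that only the window shape differs between the two passes.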
Hey thanks for open sourcing your work.
Leaving this question here so people can see it and answer if anyone tests this on PyTorch 2.0.
Congratulations on the CVPR acceptance! I have a question about Fig. 1. In the paper, your method is evaluated on an NVIDIA A100 GPU. Were the other methods, especially PointPillars and CenterPoint-Pillar, also evaluated on the same A100 GPU? CenterPoint-Pillar's speed is only 30 FPS, which seems strange given the A100's higher computing power.
Thank you for your outstanding work. I have been following your work for a long time and have been wanting to try the DSVT model for 3D object detection myself. I ran the DSVT model on OpenPCDet. The training data I used was self-prepared and followed a format similar to KITTI, including point cloud data and 3D annotation files. I have completed the preprocessing of the dataset, and I have successfully trained and tested it using both the CenterPoint and PVRCNN++ models. However, when attempting to train using the DSVT model, I encountered an error during the training process. Below is the error message:
Traceback (most recent call last):
File "train_single.py", line 245, in <module>
main()
File "train_single.py", line 189, in main
train_model(
File "/mnt/volumes/perception/lyb/openpcdet/tools/train_utils/train_utils.py", line 180, in train_model
accumulated_iter = train_one_epoch(
File "/mnt/volumes/perception/lyb/openpcdet/tools/train_utils/train_utils.py", line 56, in train_one_epoch
loss, tb_dict, disp_dict = model_func(model, batch)
File "/mnt/volumes/perception/lyb/openpcdet/tools/../pcdet/models/__init__.py", line 44, in model_func
ret_dict, tb_dict, disp_dict = model(batch_dict)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/volumes/perception/lyb/openpcdet/tools/../pcdet/models/detectors/centerpoint.py", line 12, in forward
batch_dict = cur_module(batch_dict)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/volumes/perception/lyb/openpcdet/tools/../pcdet/models/backbones_3d/dsvt.py", line 125, in forward
output = block(output, set_voxel_inds_list[stage_id], set_voxel_masks_list[stage_id], pos_embed_list[stage_id][i], \
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/volumes/perception/lyb/openpcdet/tools/../pcdet/models/backbones_3d/dsvt.py", line 193, in forward
output = layer(output, set_voxel_inds, set_voxel_masks, pos_embed)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/volumes/perception/lyb/openpcdet/tools/../pcdet/models/backbones_3d/dsvt.py", line 209, in forward
src = self.win_attn(src, pos, set_voxel_masks, set_voxel_inds)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/volumes/perception/lyb/openpcdet/tools/../pcdet/models/backbones_3d/dsvt.py", line 273, in forward
src = src + self.dropout1(src2)
RuntimeError: The size of tensor a (322658) must match the size of tensor b (322639) at non-singleton dimension 0
After receiving this error message, I checked the relevant code location and discovered that the function takes the input feature "src" (src (Tensor[float]): Voxel features with shape (N, C), where N is the number of voxels) and applies an attention mechanism to obtain "src2." However, in the operation "src = src + self.dropout1(src2)," an error occurs due to the mismatch in dimensions between the two features, preventing the training process.
To troubleshoot this issue, I added the following code snippet:
# FFN layer
print(f"src.shape: {src.shape}")
print(f"src2.shape: {src2.shape}")
During runtime, the output results were as follows:
src.shape: torch.Size([546464, 192])
src2.shape: torch.Size([546464, 192])
src.shape: torch.Size([546464, 192])
src2.shape: torch.Size([546464, 192])
src.shape: torch.Size([406328, 192])
src2.shape: torch.Size([406328, 192])
src.shape: torch.Size([406328, 192])
src2.shape: torch.Size([406328, 192])
src.shape: torch.Size([322658, 192])
src2.shape: torch.Size([322639, 192])
The execution continued until the highlighted section above, where the errors started occurring.
I inspected the line src2 = self.self_attn(query, key, value, key_padding_mask)[0] and found the following shapes:
query shape: (13907, 48, 192)
key: (13907, 48, 192)
value: (13907, 48, 192)
key_padding_mask:(13907, 48)
src2 shape: (322639, 192)
I'm not sure how to resolve this error. Could you help take a look at it?
I've attached my model configuration. If you need more information, please feel free to message me privately.
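The reported shapes (13907 sets of 48 tokens going in, 322639 rows coming out) are consistent with src2 being gathered back at the unique voxel indices that appear in set_voxel_inds. If that reading is right, 322658 - 322639 = 19 voxels are not assigned to any set. This is an inference from the numbers in the post, not a confirmed diagnosis. A small diagnostic along those lines, with the hypothetical helper `uncovered_voxels` and toy data:

```python
def uncovered_voxels(set_voxel_inds, num_voxels):
    """Return voxel ids in [0, num_voxels) that appear in no set.

    set_voxel_inds: iterable of sets, each a list of voxel ids
                    (padding by repeating an id is allowed).
    """
    covered = {idx for one_set in set_voxel_inds for idx in one_set}
    return sorted(set(range(num_voxels)) - covered)


# Toy example: 6 voxels, two sets of 4 slots each, padded by repetition.
toy_sets = [
    [0, 1, 2, 2],
    [3, 4, 4, 4],
]
missing = uncovered_voxels(toy_sets, num_voxels=6)
print(missing)  # voxel 5 is never assigned to a set
```

With real tensors, the same check can be run on set_voxel_inds converted to Python lists (for example via `.tolist()`), comparing against the first dimension of src.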
dsvt_3d.yaml
CLASS_NAMES: ['traffic_cone', 'traffic_column', 'Tripod']

DATA_CONFIG:
    BASE_CONFIG: cfgs/dataset_configs/pandar_dataset_3class.yaml
    OUTPUT_PATH: '/lpai/volumes/perception/lyb/output'
    POINT_CLOUD_RANGE: [ -82.0, -60.0, -3.0, 82.0, 60.0, 3.0 ]

    DATA_AUGMENTOR:
        DISABLE_AUG_LIST: ['placeholder']
        AUG_CONFIG_LIST:
            - NAME: gt_sampling
              USE_ROAD_PLANE: False
              DB_INFO_PATH:
                  - pandar128_dbinfos_train.pkl
              USE_SHARED_MEMORY: False # set it to True to speed up (it costs about 15GB shared memory)
              PREPARE: {
                  filter_by_min_points: [ 'traffic_cone:5', 'traffic_column:5', 'Tripod:5'],
              }
              SAMPLE_GROUPS: [ 'traffic_cone:6', 'traffic_column:5', 'Tripod:5']
              NUM_POINT_FEATURES: 4
              REMOVE_EXTRA_WIDTH: [0.0, 0.0, 0.0]
              LIMIT_WHOLE_SCENE: True

            - NAME: random_world_flip
              ALONG_AXIS_LIST: ['x', 'y']

            - NAME: random_world_rotation
              WORLD_ROT_ANGLE: [-0.78539816, 0.78539816]

            - NAME: random_world_scaling
              WORLD_SCALE_RANGE: [0.95, 1.05]

            - NAME: random_world_translation
              NOISE_TRANSLATE_STD: [0.5, 0.5, 0.5]

    DATA_PROCESSOR:
        - NAME: mask_points_and_boxes_outside_range
          REMOVE_OUTSIDE_BOXES: True

        - NAME: shuffle_points
          SHUFFLE_ENABLED: {
              'train': True,
              'test': False
          }

        - NAME: transform_points_to_voxels_placeholder
          VOXEL_SIZE: [ 0.1, 0.1, 0.15 ]

MODEL:
    NAME: CenterPoint

    VFE:
        NAME: DynamicVoxelVFE
        WITH_DISTANCE: False
        USE_ABSLOTE_XYZ: True
        USE_NORM: True
        NUM_FILTERS: [ 192, 192 ]

    BACKBONE_3D:
        NAME: DSVT
        INPUT_LAYER:
            sparse_shape: [468, 468, 32]
            downsample_stride: [[1, 1, 4], [1, 1, 4], [1, 1, 2]]
            d_model: [192, 192, 192, 192]
            set_info: [[48, 1], [48, 1], [48, 1], [48, 1]]
            window_shape: [[12, 12, 32], [12, 12, 8], [12, 12, 2], [12, 12, 1]]
            hybrid_factor: [2, 2, 1] # x, y, z
            shifts_list: [[[0, 0, 0], [6, 6, 0]], [[0, 0, 0], [6, 6, 0]], [[0, 0, 0], [6, 6, 0]], [[0, 0, 0], [6, 6, 0]]]
            normalize_pos: False

        block_name: ['DSVTBlock','DSVTBlock','DSVTBlock','DSVTBlock']
        set_info: [[48, 1], [48, 1], [48, 1], [48, 1]]
        d_model: [192, 192, 192, 192]
        nhead: [8, 8, 8, 8]
        dim_feedforward: [384, 384, 384, 384]
        dropout: 0.0
        activation: gelu
        reduction_type: 'attention'
        output_shape: [468, 468]
        conv_out_channel: 192

    MAP_TO_BEV:
        NAME: PointPillarScatter3d
        INPUT_SHAPE: [468, 468, 1]
        NUM_BEV_FEATURES: 192

    BACKBONE_2D:
        NAME: BaseBEVResBackbone
        LAYER_NUMS: [ 1, 2, 2 ]
        LAYER_STRIDES: [ 1, 2, 2 ]
        NUM_FILTERS: [ 128, 128, 256 ]
        UPSAMPLE_STRIDES: [ 1, 2, 4 ]
        NUM_UPSAMPLE_FILTERS: [ 128, 128, 128 ]

    DENSE_HEAD:
        NAME: CenterHead
        CLASS_AGNOSTIC: False
        CLASS_NAMES_EACH_HEAD: [
            ['traffic_cone', 'traffic_column', 'Tripod']
        ]
        SHARED_CONV_CHANNEL: 64
        USE_BIAS_BEFORE_NORM: False
        NUM_HM_CONV: 2
        BN_EPS: 0.001
        BN_MOM: 0.01
        SEPARATE_HEAD_CFG:
            HEAD_ORDER: ['center', 'center_z', 'dim', 'rot']
            HEAD_DICT: {
                'center': {'out_channels': 2, 'num_conv': 2},
                'center_z': {'out_channels': 1, 'num_conv': 2},
                'dim': {'out_channels': 3, 'num_conv': 2},
                'rot': {'out_channels': 2, 'num_conv': 2},
                'iou': {'out_channels': 1, 'num_conv': 2},
            }

        TARGET_ASSIGNER_CONFIG:
            FEATURE_MAP_STRIDE: 1
            NUM_MAX_OBJS: 500
            GAUSSIAN_OVERLAP: 0.1
            MIN_RADIUS: 2

        IOU_REG_LOSS: True

        LOSS_CONFIG:
            LOSS_WEIGHTS: {
                'cls_weight': 1.0,
                'loc_weight': 2.0,
                'code_weights': [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
            }

        POST_PROCESSING:
            SCORE_THRESH: 0.5
            POST_CENTER_LIMIT_RANGE: [ -82.0, -60.0, -3.0, 82.0, 60.0, 3.0 ]
            MAX_OBJ_PER_SAMPLE: 500
            USE_IOU_TO_RECTIFY_SCORE: True
            IOU_RECTIFIER: [0.68, 0.71, 0.65]
            NMS_CONFIG:
                NMS_TYPE: multi_class_nms # only for centerhead, use mmdet3d version nms
                NMS_THRESH: [0.5, 0.5, 0.6]
                NMS_PRE_MAXSIZE: [4096, 4096, 4096]
                NMS_POST_MAXSIZE: [500, 500, 500]

    POST_PROCESSING:
        RECALL_THRESH_LIST: [0.3, 0.5, 0.7]
        EVAL_METRIC: kitti

OPTIMIZATION:
    BATCH_SIZE_PER_GPU: 4
    NUM_EPOCHS: 30
    OPTIMIZER: adam_onecycle
    LR: 0.003
    WEIGHT_DECAY: 0.01
    MOMENTUM: 0.9
    MOMS: [0.95, 0.85]
    PCT_START: 0.1
    DIV_FACTOR: 100
    DECAY_STEP_LIST: [35, 45]
    LR_DECAY: 0.1
    LR_CLIP: 0.0000001
    LR_WARMUP: False
    WARMUP_EPOCH: 1
    GRAD_NORM_CLIP: 10
    LOSS_SCALE_FP16: 32.0

HOOK:
    DisableAugmentationHook:
        DISABLE_AUG_LIST: ['gt_sampling','random_world_flip','random_world_rotation','random_world_scaling', 'random_world_translation']
        NUM_LAST_EPOCHS: 1
I'm looking forward to your response and hoping to maintain communication with you.
Hello, nice work! I can't find tools/scripts; could you provide these files? Thanks!
Thanks for your incredible work!
Hello, my PyTorch version is 1.8.1 and I hit this problem during training. Does the PyTorch version have to be 1.9.0 or higher?
../pcdet/models/backbones_3d/dsvt.py", line 212, in __init__
self.self_attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout, batch_first=batch_first)
TypeError: __init__() got an unexpected keyword argument 'batch_first'
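The `batch_first` argument of `nn.MultiheadAttention` was introduced in PyTorch 1.9.0, which is consistent with the TypeError above on 1.8.1. A minimal sketch of a version guard, using only string parsing so it runs without PyTorch installed; the helper names are hypothetical, and in practice the string would come from `torch.__version__`.

```python
import re


def version_tuple(version):
    """Convert a version string such as '1.13.0+cu117' to (1, 13, 0)."""
    core = version.split("+")[0]  # drop a local build suffix like '+cu117'
    return tuple(int(part) for part in re.findall(r"\d+", core)[:3])


def supports_batch_first(version):
    """True if this PyTorch version accepts batch_first in MultiheadAttention."""
    return version_tuple(version) >= (1, 9)


for v in ["1.8.1", "1.9.0", "1.13.0+cu117"]:
    print(v, supports_batch_first(v))
```

On older versions, the usual alternatives are upgrading PyTorch or transposing inputs to the (sequence, batch, feature) layout that the module expects by default.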
To reduce the computational cost in my own project, I set 'feature_map_stride' = 2 (rather than 1 as in your setting) in 'TARGET_ASSIGNER_CONFIG', and the loss became NaN or Inf (not during fp16 training).
I tried three times and it didn't work. Do you know how to fix this problem?
Thank you!
Hi, thanks for your great work!
Based on my understanding, in the deployment of DSVT you converted the Transformer part of the network into DSVT_TrtEngine, while DSVT_Input_Layer still uses the original PyTorch code. Can DSVT_Input_Layer also be converted to ONNX-TRT? Because it contains operators such as torch.sort and torch.unique that are not supported by TRT, I plan to convert DSVT_Input_Layer into a single CUDA kernel when deploying the entire model. Do you have any suggestions for a faster and more convenient approach?
Looking forward to your reply.
When compiling pcdet.ops.ingroup_inds, which was added in this pull request, I hit a runtime error, although I could compile OpenPCDet successfully.
Traceback (most recent call last):
File "setup.py", line 133, in <module>
'src/ingroup_inds_kernel.cu',
File "miniconda3/envs/py37/lib/python3.7/site-packages/setuptools/__init__.py", line 87, in setup
return distutils.core.setup(**attrs)
File "miniconda3/envs/py37/lib/python3.7/site-packages/setuptools/_distutils/core.py", line 185, in setup
return run_commands(dist)
File "miniconda3/envs/py37/lib/python3.7/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
dist.run_commands()
File "/miniconda3/envs/py37/lib/python3.7/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
self.run_command(cmd)
File "miniconda3/envs/py37/lib/python3.7/site-packages/setuptools/dist.py", line 1208, in run_command
super().run_command(command)
File "/miniconda3/envs/py37/lib/python3.7/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File miniconda3/envs/py37/lib/python3.7/site-packages/setuptools/command/develop.py", line 34, in run
self.install_for_development()
File "/miniconda3/envs/py37/lib/python3.7/site-packages/setuptools/command/develop.py", line 114, in install_for_development
self.run_command('build_ext')
File "miniconda3/envs/py37/lib/python3.7/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
self.distribution.run_command(command)
File "miniconda3/envs/py37/lib/python3.7/site-packages/setuptools/dist.py", line 1208, in run_command
super().run_command(command)
File "miniconda3/envs/py37/lib/python3.7/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "miniconda3/envs/py37/lib/python3.7/site-packages/setuptools/command/build_ext.py", line 84, in run
_build_ext.run(self)
File ".local/lib/python3.7/site-packages/Cython/Distutils/old_build_ext.py", line 186, in run
_build_ext.build_ext.run(self)
File miniconda3/envs/py37/lib/python3.7/site-packages/setuptools/_distutils/command/build_ext.py", line 346, in run
self.build_extensions()
File "miniconda3/envs/py37/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 843, in build_extensions
build_ext.build_extensions(self)
File .local/lib/python3.7/site-packages/Cython/Distutils/old_build_ext.py", line 194, in build_extensions
self.build_extension(ext)
File "miniconda3/envs/py37/lib/python3.7/site-packages/setuptools/command/build_ext.py", line 246, in build_extension
_build_ext.build_extension(self, ext)
File miniconda3/envs/py37/lib/python3.7/site-packages/setuptools/_distutils/command/build_ext.py", line 556, in build_extension
depends=ext.depends,
File "miniconda3/envs/py37/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 668, in unix_wrap_ninja_compile
with_cuda=with_cuda)
File "miniconda3/envs/py37/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1578, in _write_ninja_file_and_compile_objects
error_prefix='Error compiling objects for extension')
File "miniconda3/envs/py37/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1916, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension
It seems that the cyclist class performs better in your paper than the pedestrian and vehicle classes. But the Waymo training set has far more samples for vehicles (4352210) and pedestrians (2037627), while cyclists have only 49518, which doesn't seem reasonable. Did you use the fade strategy during training?
Hi,
I trained a DSVT model on the Waymo dataset with batch_size = 3 on three 3090 GPUs (with torch.utils.checkpoint). The mAP/H is about 1% below your benchmark.
The drop is particularly severe for Ped., whereas Vehicle L2 shows some improvement.
Hello,
Will pretrained models be provided for this work? Thanks
Hi, thanks for your source code. I set up the environment according to the guide and try this training command under this codebase:
bash scripts/dist_train.sh 8 --cfg_file ./cfgs/dsvt_models/dsvt_plain_D512e.yaml --sync_bn --logger_iter_interval 500
The evaluation results are below and could not reach the official reference precision:
OBJECT_TYPE_TYPE_VEHICLE_LEVEL_1/AP: 0.7226
OBJECT_TYPE_TYPE_VEHICLE_LEVEL_1/APH: 0.7177
OBJECT_TYPE_TYPE_VEHICLE_LEVEL_1/APL: 0.7226
OBJECT_TYPE_TYPE_VEHICLE_LEVEL_2/AP: 0.6382
OBJECT_TYPE_TYPE_VEHICLE_LEVEL_2/APH: 0.6337
OBJECT_TYPE_TYPE_VEHICLE_LEVEL_2/APL: 0.6382
OBJECT_TYPE_TYPE_PEDESTRIAN_LEVEL_1/AP: 0.7706
OBJECT_TYPE_TYPE_PEDESTRIAN_LEVEL_1/APH: 0.6910
OBJECT_TYPE_TYPE_PEDESTRIAN_LEVEL_1/APL: 0.7706
OBJECT_TYPE_TYPE_PEDESTRIAN_LEVEL_2/AP: 0.6895
OBJECT_TYPE_TYPE_PEDESTRIAN_LEVEL_2/APH: 0.6164
OBJECT_TYPE_TYPE_PEDESTRIAN_LEVEL_2/APL: 0.6895
OBJECT_TYPE_TYPE_SIGN_LEVEL_1/AP: 0.0000
OBJECT_TYPE_TYPE_SIGN_LEVEL_1/APH: 0.0000
OBJECT_TYPE_TYPE_SIGN_LEVEL_1/APL: 0.0000
OBJECT_TYPE_TYPE_SIGN_LEVEL_2/AP: 0.0000
OBJECT_TYPE_TYPE_SIGN_LEVEL_2/APH: 0.0000
OBJECT_TYPE_TYPE_SIGN_LEVEL_2/APL: 0.0000
OBJECT_TYPE_TYPE_CYCLIST_LEVEL_1/AP: 0.7039
OBJECT_TYPE_TYPE_CYCLIST_LEVEL_1/APH: 0.6915
OBJECT_TYPE_TYPE_CYCLIST_LEVEL_1/APL: 0.7039
OBJECT_TYPE_TYPE_CYCLIST_LEVEL_2/AP: 0.6777
OBJECT_TYPE_TYPE_CYCLIST_LEVEL_2/APH: 0.6658
OBJECT_TYPE_TYPE_CYCLIST_LEVEL_2/APL: 0.6777
My training log is here
Could you give me some suggestions?
Hello! Congratulations! When will the code release? Thank you.
Training finished on three 3090 GPUs after 9 to 10 days, but the evaluation did not complete: it stopped partway through at epoch 20. After the values shown in the photo were printed, and after loading the 6019 validation samples, loading the ground truth and filtering the predictions, an index error appeared.
Hi @Haiyang-W ,
Thanks for sharing the unrefined TRT deployment script! I have a question about the lines below:
batch_dict = torch.load("input data file(after vfe)", map_location="cuda")
points = batch_dict["points"]
inputs = points
with torch.no_grad():
    ptranshierarchy3d = model.backbone_3d
    # plain version, just one stage
    ptransblocks_list = ptranshierarchy3d.stage_0
    layer_norms_list = ptranshierarchy3d.residual_norm_stage_0
    pillar_features, voxel_coords = model.vfe(inputs)
    voxel_features = model.backbone_3d(pillar_features, voxel_coords)
    voxel_info = ptranshierarchy3d.input_layer(pillar_features, voxel_coords)
    set_voxel_inds_list = [[voxel_info[f'set_voxel_inds_stage{s}_shift{i}'] for i in range(2)] for s in range(1)]
    set_voxel_masks_list = [[voxel_info[f'set_voxel_mask_stage{s}_shift{i}'] for i in range(2)] for s in range(1)]
    pos_embed_list = [[[voxel_info[f'pos_embed_stage{s}_block{b}_shift{i}'] for i in range(2)] for b in range(4)] for s in range(1)]
    allptransblockstrt_inputs = (
        pillar_features,
        set_voxel_inds_list[0][0],
        set_voxel_inds_list[0][1],
        set_voxel_masks_list[0][0],
        set_voxel_masks_list[0][1],
        torch.stack([torch.stack(v, dim=0) for v in pos_embed_list[0]], dim=0),
    )
What is the input data file after vfe in the first line? How can I create or derive this file?
Hello!
I have some issues with configuring DSVT-P for Nuscenes.
In particular, I took POINT_CLOUD_RANGE and VOXEL_SIZE from nuscenes_dataset.yaml and used them with dsvt_plain_1f_onestage.yaml,
but the model's forward pass fails in PointPillarScatter3d because indices are out of range.
Could you please provide configuration for DSVT models on Nuscenes or the logic on how to choose POINT_CLOUD_RANGE and VOXEL_SIZE with respect to model configurations?
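One consistency check that often explains out-of-range scatter indices: the voxel grid implied by POINT_CLOUD_RANGE and VOXEL_SIZE should match the grid shape the model is configured for (for example sparse_shape and INPUT_SHAPE). The sketch below assumes that relationship holds for this model. The helper names are hypothetical, and the numeric values are illustrative only, not an official nuScenes configuration.

```python
def grid_size(pc_range, voxel_size, tol=1e-6):
    """Number of voxels per axis implied by a point cloud range.

    pc_range: [x_min, y_min, z_min, x_max, y_max, z_max]
    voxel_size: [vx, vy, vz]
    Raises ValueError if the range is not a whole number of voxels.
    """
    sizes = []
    for axis in range(3):
        extent = pc_range[axis + 3] - pc_range[axis]
        n = extent / voxel_size[axis]
        if abs(n - round(n)) > tol:
            raise ValueError(f"axis {axis}: extent {extent} is not a multiple of {voxel_size[axis]}")
        sizes.append(round(n))
    return sizes


def matches_model_grid(pc_range, voxel_size, model_grid):
    """True if the data grid equals the grid shape configured in the model."""
    return grid_size(pc_range, voxel_size) == list(model_grid)


# Illustrative values (assumptions, not taken from any released config).
good = grid_size([-54.0, -54.0, -5.0, 54.0, 54.0, 3.0], [0.3, 0.3, 8.0])
print(good)  # [360, 360, 1]

# A different range and voxel size imply a different grid, so a model
# configured for 360 x 360 x 1 would receive out-of-range coordinates.
print(matches_model_grid([-51.2, -51.2, -5.0, 51.2, 51.2, 3.0], [0.2, 0.2, 8.0], [360, 360, 1]))
```

Under this assumption, changing the range or voxel size requires updating every shape entry in the model config that depends on the grid, and the window shapes may need to be revisited as well.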
python setup.py develop
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
running develop
/opt/conda/envs/py37/lib/python3.7/site-packages/setuptools/command/easy_install.py:147: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
EasyInstallDeprecationWarning,
/opt/conda/envs/py37/lib/python3.7/site-packages/setuptools/command/install.py:37: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
setuptools.SetuptoolsDeprecationWarning,
running egg_info
creating pcdet.egg-info
writing pcdet.egg-info/PKG-INFO
writing dependency_links to pcdet.egg-info/dependency_links.txt
writing requirements to pcdet.egg-info/requires.txt
writing top-level names to pcdet.egg-info/top_level.txt
writing manifest file 'pcdet.egg-info/SOURCES.txt'
reading manifest file 'pcdet.egg-info/SOURCES.txt'
adding license file 'LICENSE'
writing manifest file 'pcdet.egg-info/SOURCES.txt'
running build_ext
building 'pcdet.ops.iou3d_nms.iou3d_nms_cuda' extension
creating /root/workspace/env_run/DSVT/build
creating /root/workspace/env_run/DSVT/build/temp.linux-x86_64-cpython-37
creating /root/workspace/env_run/DSVT/build/temp.linux-x86_64-cpython-37/pcdet
creating /root/workspace/env_run/DSVT/build/temp.linux-x86_64-cpython-37/pcdet/ops
creating /root/workspace/env_run/DSVT/build/temp.linux-x86_64-cpython-37/pcdet/ops/iou3d_nms
creating /root/workspace/env_run/DSVT/build/temp.linux-x86_64-cpython-37/pcdet/ops/iou3d_nms/src
Traceback (most recent call last):
File "setup.py", line 133, in <module>
'src/ingroup_inds_kernel.cu',
File "/opt/conda/envs/py37/lib/python3.7/site-packages/setuptools/__init__.py", line 87, in setup
return distutils.core.setup(**attrs)
File "/opt/conda/envs/py37/lib/python3.7/site-packages/setuptools/_distutils/core.py", line 177, in setup
return run_commands(dist)
File "/opt/conda/envs/py37/lib/python3.7/site-packages/setuptools/_distutils/core.py", line 193, in run_commands
dist.run_commands()
File "/opt/conda/envs/py37/lib/python3.7/site-packages/setuptools/_distutils/dist.py", line 968, in run_commands
self.run_command(cmd)
File "/opt/conda/envs/py37/lib/python3.7/site-packages/setuptools/dist.py", line 1229, in run_command
super().run_command(command)
File "/opt/conda/envs/py37/lib/python3.7/site-packages/setuptools/_distutils/dist.py", line 987, in run_command
cmd_obj.run()
File "/opt/conda/envs/py37/lib/python3.7/site-packages/setuptools/command/develop.py", line 34, in run
self.install_for_development()
File "/opt/conda/envs/py37/lib/python3.7/site-packages/setuptools/command/develop.py", line 114, in install_for_development
self.run_command('build_ext')
File "/opt/conda/envs/py37/lib/python3.7/site-packages/setuptools/_distutils/cmd.py", line 317, in run_command
self.distribution.run_command(command)
File "/opt/conda/envs/py37/lib/python3.7/site-packages/setuptools/dist.py", line 1229, in run_command
super().run_command(command)
File "/opt/conda/envs/py37/lib/python3.7/site-packages/setuptools/_distutils/dist.py", line 987, in run_command
cmd_obj.run()
File "/opt/conda/envs/py37/lib/python3.7/site-packages/setuptools/command/build_ext.py", line 79, in run
_build_ext.run(self)
File "/opt/conda/envs/py37/lib/python3.7/site-packages/Cython/Distutils/old_build_ext.py", line 186, in run
_build_ext.build_ext.run(self)
File "/opt/conda/envs/py37/lib/python3.7/site-packages/setuptools/_distutils/command/build_ext.py", line 339, in run
self.build_extensions()
File "/opt/conda/envs/py37/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 708, in build_extensions
build_ext.build_extensions(self)
File "/opt/conda/envs/py37/lib/python3.7/site-packages/Cython/Distutils/old_build_ext.py", line 195, in build_extensions
_build_ext.build_ext.build_extensions(self)
File "/opt/conda/envs/py37/lib/python3.7/site-packages/setuptools/_distutils/command/build_ext.py", line 459, in build_extensions
self._build_extensions_serial()
File "/opt/conda/envs/py37/lib/python3.7/site-packages/setuptools/_distutils/command/build_ext.py", line 485, in _build_extensions_serial
self.build_extension(ext)
File "/opt/conda/envs/py37/lib/python3.7/site-packages/setuptools/command/build_ext.py", line 202, in build_extension
_build_ext.build_extension(self, ext)
File "/opt/conda/envs/py37/lib/python3.7/site-packages/setuptools/_distutils/command/build_ext.py", line 547, in build_extension
depends=ext.depends,
File "/opt/conda/envs/py37/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 524, in unix_wrap_ninja_compile
cuda_post_cflags = unix_cuda_flags(cuda_post_cflags)
File "/opt/conda/envs/py37/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 423, in unix_cuda_flags
cflags + _get_cuda_arch_flags(cflags))
File "/opt/conda/envs/py37/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1561, in _get_cuda_arch_flags
arch_list[-1] += '+PTX'
IndexError: list index out of range
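A possible reading of this traceback, offered as a sketch rather than a confirmed diagnosis: the log starts with "No CUDA runtime is found", and the failure is at arch_list[-1] += '+PTX', which raises IndexError when the list of target GPU architectures is empty. The simplified model below imitates that logic in plain Python. resolve_arch_list and its arguments are hypothetical names for illustration, not PyTorch API; TORCH_CUDA_ARCH_LIST is the environment variable that PyTorch's extension builder reads for an explicit architecture list.

```python
def resolve_arch_list(env, visible_capabilities):
    """Simplified model: prefer an explicit arch list, else use the visible GPUs."""
    raw = env.get("TORCH_CUDA_ARCH_LIST", "")
    if raw:
        # The variable may be separated by semicolons or spaces.
        archs = [a for a in raw.replace(" ", ";").split(";") if a]
    else:
        archs = sorted(f"{major}.{minor}" for major, minor in visible_capabilities)
    # With no explicit list and no visible GPU, archs is empty and this line fails.
    archs[-1] += "+PTX"
    return archs


# No environment variable and no visible GPU: reproduces the IndexError.
try:
    resolve_arch_list({}, [])
    failed = False
except IndexError:
    failed = True

# An explicit architecture list avoids the empty-list case.
explicit = resolve_arch_list({"TORCH_CUDA_ARCH_LIST": "7.5;8.0"}, [])
print(failed, explicit)
```

If this reading applies, building where the GPU is visible to PyTorch, or setting TORCH_CUDA_ARCH_LIST to the compute capability of the target GPU before running python setup.py develop, may avoid the empty list.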
Hi,
I trained a DSVT model on the entire Waymo dataset, but with a smaller batch size (batch_size = 1) and learning rate (lr_rate = 0.001). The performance for each category is 2-3% below your benchmark. Is this due to the smaller batch size and learning rate? I have attached my log below:
log_train_20230515-135943.txt
I attempted to deploy the DSVT model to TensorRT following your deployment code. Based on the official TensorRT example code, I used dynamic shapes for the dsvt_block model inputs. Model inference takes about 260 ms, whereas the PyTorch version takes only about 140 ms. Why is the TensorRT C++ code slower?
Environment
TensorRT Version: 8.5.1.7
CUDA Version: 11.8
CUDNN Version: 8.6
Hardware GPU: p4000
(the rest is the same as the public)
inference code
#include "trt_infer.h"
#include "cnpy.h"
TRTInfer::TRTInfer(TrtConfig trt_config): mEngine_(nullptr)
{
// return;
sum_cpy_feature_ = 0.0f;
sum_cpy_output_ = 0.0f;
count_ = 0;
trt_config_ = trt_config;
input_cpy_kind_ = cudaMemcpyHostToDevice;
output_cpy_kind_ = cudaMemcpyDeviceToHost;
build();
CHECKCUDA(cudaStreamCreate(&stream_), "failed to create cuda stream");
std::cout << "tensorrt init done." << std::endl;
}
bool TRTInfer::build()
{
auto builder = SampleUniquePtr<nvinfer1::IBuilder>(nvinfer1::createInferBuilder(sample::gLogger.getTRTLogger()));
if (!builder)
{
return false;
}
SampleUniquePtr<nvinfer1::IRuntime> runtime{createInferRuntime(sample::gLogger.getTRTLogger())};
if (!runtime)
{
return false;
}
// CUDA stream used for profiling by the builder.
auto profileStream = samplesCommon::makeCudaStream();
if (!profileStream)
{
return false;
}
const auto explicitBatch = 1U << static_cast<uint32_t>(NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
auto network = SampleUniquePtr<nvinfer1::INetworkDefinition>(builder->createNetworkV2(explicitBatch));
if (!network)
{
return false;
}
auto config = SampleUniquePtr<nvinfer1::IBuilderConfig>(builder->createBuilderConfig());
if (!config)
{
return false;
}
auto parser = SampleUniquePtr<nvonnxparser::IParser>(nvonnxparser::createParser(*network, sample::gLogger.getTRTLogger()));
if (!parser)
{
return false;
}
// auto constructed = constructNetwork(builder, network, config, parser);
// if (!constructed)
// {
// return false;
// }
// Replace constructNetwork with the following code:
auto parsed = parser->parseFromFile(trt_config_.model_file.c_str(), static_cast<int>(sample::gLogger.getReportableSeverity()));
if (!parsed)
{
return false;
}
for (int i = 0; i < network->getNbInputs(); i++) {
std::cout << "network->getInput(i)->getDimensions(): " << network->getInput(i)->getDimensions() << std::endl;
mInputDims.push_back(network->getInput(i)->getDimensions());
}
for (int i = 0; i < network->getNbOutputs(); i++) {
mOutputDims.push_back(network->getOutput(i)->getDimensions());
}
config->setProfileStream(*profileStream);
config->setAvgTimingIterations(1);
config->setMinTimingIterations(1);
config->setMaxWorkspaceSize(static_cast<size_t>(trt_config_.max_workspace)<<20);
if (builder->platformHasFastFp16() && trt_config_.fp16mode)
{
config->setFlag(BuilderFlag::kFP16);
}
if (builder->platformHasFastInt8() && trt_config_.int8mode)
{
config->setFlag(BuilderFlag::kINT8);
// samplesCommon::setAllDynamicRanges(network.get(), 127.0f, 127.0f); // in case use int8 without calibration
}
builder->setMaxBatchSize(1);
std::unique_ptr<nvinfer1::IInt8Calibrator> calibrator;
if (builder->platformHasFastInt8() && trt_config_.int8mode)
{
MNISTBatchStream calibrationStream(trt_config_.calib_data);
calibrator.reset(new Int8EntropyCalibrator2<MNISTBatchStream>(calibrationStream, -1, trt_config_.net_name.c_str(), trt_config_.input_name.c_str()));
config->setInt8Calibrator(calibrator.get());
}
IOptimizationProfile* profile = builder->createOptimizationProfile();
profile->setDimensions("src", OptProfileSelector::kMIN, Dims2(1000,128));
profile->setDimensions("src", OptProfileSelector::kOPT, Dims2(24629,128));
profile->setDimensions("src", OptProfileSelector::kMAX, Dims2(100000,128));
profile->setDimensions("set_voxel_inds_tensor_shift_0", OptProfileSelector::kMIN, Dims3(2,50,36));
profile->setDimensions("set_voxel_inds_tensor_shift_0", OptProfileSelector::kOPT, Dims3(2,1156,36));
profile->setDimensions("set_voxel_inds_tensor_shift_0", OptProfileSelector::kMAX, Dims3(2,5000,36));
profile->setDimensions("set_voxel_inds_tensor_shift_1", OptProfileSelector::kMIN, Dims3(2,50,36));
profile->setDimensions("set_voxel_inds_tensor_shift_1", OptProfileSelector::kOPT, Dims3(2,834,36));
profile->setDimensions("set_voxel_inds_tensor_shift_1", OptProfileSelector::kMAX, Dims3(2,3200,36));
profile->setDimensions("set_voxel_masks_tensor_shift_0", OptProfileSelector::kMIN, Dims3(2,50,36));
profile->setDimensions("set_voxel_masks_tensor_shift_0", OptProfileSelector::kOPT, Dims3(2,1156,36));
profile->setDimensions("set_voxel_masks_tensor_shift_0", OptProfileSelector::kMAX, Dims3(2,5000,36));
profile->setDimensions("set_voxel_masks_tensor_shift_1", OptProfileSelector::kMIN, Dims3(2,50,36));
profile->setDimensions("set_voxel_masks_tensor_shift_1", OptProfileSelector::kOPT, Dims3(2,834,36));
profile->setDimensions("set_voxel_masks_tensor_shift_1", OptProfileSelector::kMAX, Dims3(2,3200,36));
profile->setDimensions("pos_embed_tensor", OptProfileSelector::kMIN, Dims4(4,2,1000,128));
profile->setDimensions("pos_embed_tensor", OptProfileSelector::kOPT, Dims4(4,2,24629,128));
profile->setDimensions("pos_embed_tensor", OptProfileSelector::kMAX, Dims4(4,2,100000,128));
config->addOptimizationProfile(profile);
SampleUniquePtr<nvinfer1::IHostMemory> plan{builder->buildSerializedNetwork(*network, *config)};
if (!plan)
{
return false;
}
mEngine_ = std::shared_ptr<nvinfer1::ICudaEngine>(runtime->deserializeCudaEngine(plan->data(), plan->size()), samplesCommon::InferDeleter());
if (!mEngine_)
{
return false;
}
// Create the execution context used for inference
context_ = mEngine_->createExecutionContext();
if (!context_)
{
return false;
}
return true;
}
void TRTInfer::doinference(std::vector<void*> &inputs, std::vector<float*> &outputs, std::vector<int> &input_dynamic)
{
infer_dynamic(inputs, outputs, input_dynamic);
cudaStreamSynchronize(stream_);
}
bool TRTInfer::infer_dynamic(std::vector<void*> &inputs, std::vector<float*> &outputs, std::vector<int> &input_dynamic)
{
double t0 = getTime();
mInputDims[0] = Dims2{input_dynamic[0], 128};
mInputDims[1] = Dims3{2, input_dynamic[1], 36};
mInputDims[2] = Dims3{2, input_dynamic[2], 36};
mInputDims[3] = Dims3{2, input_dynamic[3], 36};
mInputDims[4] = Dims3{2, input_dynamic[4], 36};
mInputDims[5] = Dims4{4, 2, input_dynamic[5], 128};
mInput[0].hostBuffer.resize(mInputDims[0]);
mInput[1].hostBuffer.resize(mInputDims[1]);
mInput[2].hostBuffer.resize(mInputDims[2]);
mInput[3].hostBuffer.resize(mInputDims[3]);
mInput[4].hostBuffer.resize(mInputDims[4]);
mInput[5].hostBuffer.resize(mInputDims[5]);
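// Copy the whole [input_dynamic[0], 128] feature tensor (the original copied a single float).
std::copy((float*)(inputs[0]), (float*)(inputs[0]) + input_dynamic[0] * 128, static_cast<float*>(mInput[0].hostBuffer.data()));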
std::copy((int*)inputs[1], (int*)inputs[1] + 2* input_dynamic[1] * 36, static_cast<int*>(mInput[1].hostBuffer.data()));
std::copy((int*)inputs[2], (int*)inputs[2] + 2* input_dynamic[2] * 36, static_cast<int*>(mInput[2].hostBuffer.data()));
std::copy((bool*)inputs[3], (bool*)inputs[3] + 2* input_dynamic[3] * 36, static_cast<bool*>(mInput[3].hostBuffer.data()));
std::copy((bool*)inputs[4], (bool*)inputs[4] + 2* input_dynamic[4] * 36, static_cast<bool*>(mInput[4].hostBuffer.data()));
std::copy((float*)inputs[5], (float*)inputs[5] + 4* 2* input_dynamic[5] * 128, static_cast<float*>(mInput[5].hostBuffer.data()));
cudaStreamSynchronize(stream_);
double t1 = getTime();
mInput[0].deviceBuffer.resize(mInputDims[0]);
mInput[1].deviceBuffer.resize(mInputDims[1]);
mInput[2].deviceBuffer.resize(mInputDims[2]);
mInput[3].deviceBuffer.resize(mInputDims[3]);
mInput[4].deviceBuffer.resize(mInputDims[4]);
mInput[5].deviceBuffer.resize(mInputDims[5]);
CHECK(cudaMemcpy(mInput[0].deviceBuffer.data(), mInput[0].hostBuffer.data(), mInput[0].hostBuffer.nbBytes(), cudaMemcpyHostToDevice));
CHECK(cudaMemcpy(mInput[1].deviceBuffer.data(), mInput[1].hostBuffer.data(), mInput[1].hostBuffer.nbBytes(), cudaMemcpyHostToDevice));
CHECK(cudaMemcpy(mInput[2].deviceBuffer.data(), mInput[2].hostBuffer.data(), mInput[2].hostBuffer.nbBytes(), cudaMemcpyHostToDevice));
CHECK(cudaMemcpy(mInput[3].deviceBuffer.data(), mInput[3].hostBuffer.data(), mInput[3].hostBuffer.nbBytes(), cudaMemcpyHostToDevice));
CHECK(cudaMemcpy(mInput[4].deviceBuffer.data(), mInput[4].hostBuffer.data(), mInput[4].hostBuffer.nbBytes(), cudaMemcpyHostToDevice));
CHECK(cudaMemcpy(mInput[5].deviceBuffer.data(), mInput[5].hostBuffer.data(), mInput[5].hostBuffer.nbBytes(), cudaMemcpyHostToDevice));
cudaStreamSynchronize(stream_);
double t2 = getTime();
context_->setBindingDimensions(0, mInputDims[0]);
context_->setBindingDimensions(1, mInputDims[1]);
context_->setBindingDimensions(2, mInputDims[2]);
context_->setBindingDimensions(3, mInputDims[3]);
context_->setBindingDimensions(4, mInputDims[4]);
context_->setBindingDimensions(5, mInputDims[5]);
// context_->setBindingDimensions(6, mInputDims[6]);
std::cout << "mEngine_->getNbBindings(): " << mEngine_->getNbBindings() << std::endl;
std::cout << " mEngine_->getBindingDimensions(i)" << mEngine_->getBindingDimensions(0) << std::endl;
std::cout << " context_->getBindingDimensions(i)" << context_->getBindingDimensions(0) << std::endl;
cudaStreamSynchronize(stream_);
double t3 = getTime();
// We can only run inference once all dynamic input shapes have been specified.
if (!context_->allInputDimensionsSpecified())
{
return false;
}
mOutputDims[0] = mInputDims[0];
mOutput[0].deviceBuffer.resize(mOutputDims[0]);
mOutput[0].hostBuffer.resize(mOutputDims[0]);
std::vector<void*> processorBindings = {mInput[0].deviceBuffer.data(),
mInput[1].deviceBuffer.data(),
mInput[2].deviceBuffer.data(),
mInput[3].deviceBuffer.data(),
mInput[4].deviceBuffer.data(),
mInput[5].deviceBuffer.data(),
mOutput[0].deviceBuffer.data()};
cudaStreamSynchronize(stream_);
double t4 = getTime();
bool status = context_->executeV2(processorBindings.data());
if (!status)
{
return false;
}
cudaStreamSynchronize(stream_);
double t5 = getTime();
CHECK(cudaMemcpy(mOutput[0].hostBuffer.data(), mOutput[0].deviceBuffer.data(), mOutput[0].deviceBuffer.nbBytes(),
cudaMemcpyDeviceToHost));
cudaStreamSynchronize(stream_);
double t6 = getTime();
// cnpy::npy_save("dsvt_output_tensor.npy", static_cast<float*>(mOutput[0].hostBuffer.data()), {mOutput[0].deviceBuffer.nbBytes()/4},"w");
std::cout << "time elapse:" << t1-t0 << std::endl;
std::cout << "time elapse:" << t2-t1 << std::endl;
std::cout << "time elapse:" << t3-t2 << std::endl;
std::cout << "time elapse:" << t4-t3 << std::endl;
std::cout << "time elapse:" << t5-t4 << std::endl;
std::cout << "time elapse:" << t6-t5 << std::endl;
return true;
}
According to the results, the average time cost of each stage is as follows:
t1-t0:0.00860953
t2-t1:0.0124242
t3-t2:4.72069e-05
t4-t3:8.10623e-06
t5-t4:0.260188
t6-t5:0.00110817
Why does the C++ code take more time? Are there mistakes in the inference code?
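To put the measurements above in proportion, the following sketch sums the six reported averages and computes each stage's share of the total. The stage labels are read from the posted infer_dynamic code (what happens between each pair of getTime() calls); the variable names are chosen here for illustration.

```python
# Average seconds per stage, copied from the numbers reported above.
stage_seconds = {
    "t1-t0 (host buffer resize and std::copy)": 0.00860953,
    "t2-t1 (device buffer resize and host-to-device cudaMemcpy)": 0.0124242,
    "t3-t2 (setBindingDimensions)": 4.72069e-05,
    "t4-t3 (output buffers and binding list)": 8.10623e-06,
    "t5-t4 (executeV2)": 0.260188,
    "t6-t5 (device-to-host cudaMemcpy)": 0.00110817,
}

total = sum(stage_seconds.values())
shares = {name: seconds / total for name, seconds in stage_seconds.items()}
slowest = max(stage_seconds, key=stage_seconds.get)

for name, share in shares.items():
    print(f"{name}: {share:.1%}")
print("slowest stage:", slowest)
```

The executeV2 call accounts for roughly 92% of the measured time, so the buffer copies around it are not the main cost in these numbers.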
Hi, my PyTorch version used to be 1.8.1, and with it I was able to run python setup.py develop successfully.
But the required version is greater than 1.9, so I created a new environment with pip install torch==1.10.0+cu111 torchvision==0.11.0+cu111 torchaudio==0.10.0 -f https://download.pytorch.org/whl/torch_stable.html
and then had problems compiling. The failure appears to be in pcdet/ops/ingroup_inds:
[2/2] /nvme/yanxiangchao/perl5/drivers/cuda-11.1/bin/nvcc -I/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include -I/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/TH -I/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/THC -I/nvme/yanxiangchao/perl5/drivers/cuda-11.1/include -I/nvme/yanxiangchao/anaconda3/envs/test/include/python3.9 -c -c /nvme/yanxiangchao/perl5/pretrain/openmdf_dsvt/openmdf/pcdet/ops/ingroup_inds/src/ingroup_inds_kernel.cu -o /nvme/yanxiangchao/perl5/pretrain/openmdf_dsvt/openmdf/build/temp.linux-x86_64-cpython-39/pcdet/ops/ingroup_inds/src/ingroup_inds_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=ingroup_inds_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++14
FAILED: /nvme/yanxiangchao/perl5/pretrain/openmdf_dsvt/openmdf/build/temp.linux-x86_64-cpython-39/pcdet/ops/ingroup_inds/src/ingroup_inds_kernel.o
/nvme/yanxiangchao/perl5/drivers/cuda-11.1/bin/nvcc -I/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include -I/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/TH -I/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/THC -I/nvme/yanxiangchao/perl5/drivers/cuda-11.1/include -I/nvme/yanxiangchao/anaconda3/envs/test/include/python3.9 -c -c /nvme/yanxiangchao/perl5/pretrain/openmdf_dsvt/openmdf/pcdet/ops/ingroup_inds/src/ingroup_inds_kernel.cu -o /nvme/yanxiangchao/perl5/pretrain/openmdf_dsvt/openmdf/build/temp.linux-x86_64-cpython-39/pcdet/ops/ingroup_inds/src/ingroup_inds_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=ingroup_inds_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++14
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptr<torch::nn::Module> torch::nn::Cloneable<Derived>::clone(const c10::optional<c10::Device>&) const [with Derived = torch::nn::CrossMapLRN2dImpl]’:
/tmp/tmpxft_0000854a_00000000-6_ingroup_inds_kernel.cudafe1.stub.c:4:27: required from here
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string<char>, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string<char>, at::Tensor>&’
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string<char>, std::shared_ptr<torch::nn::Module> >’ to type ‘torch::OrderedDict<std::basic_string<char>, std::shared_ptr<torch::nn::Module> >&’
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptr<torch::nn::Module> torch::nn::Cloneable<Derived>::clone(const c10::optional<c10::Device>&) const [with Derived = torch::nn::EmbeddingBagImpl]’:
/tmp/tmpxft_0000854a_00000000-6_ingroup_inds_kernel.cudafe1.stub.c:4:27: required from here
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string<char>, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string<char>, at::Tensor>&’
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string<char>, std::shared_ptr<torch::nn::Module> >’ to type ‘torch::OrderedDict<std::basic_string<char>, std::shared_ptr<torch::nn::Module> >&’
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptr<torch::nn::Module> torch::nn::Cloneable<Derived>::clone(const c10::optional<c10::Device>&) const [with Derived = torch::nn::EmbeddingImpl]’:
/tmp/tmpxft_0000854a_00000000-6_ingroup_inds_kernel.cudafe1.stub.c:4:27: required from here
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string<char>, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string<char>, at::Tensor>&’
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string<char>, std::shared_ptr<torch::nn::Module> >’ to type ‘torch::OrderedDict<std::basic_string<char>, std::shared_ptr<torch::nn::Module> >&’
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptr<torch::nn::Module> torch::nn::Cloneable<Derived>::clone(const c10::optional<c10::Device>&) const [with Derived = torch::nn::ParameterDictImpl]’:
/tmp/tmpxft_0000854a_00000000-6_ingroup_inds_kernel.cudafe1.stub.c:4:27: required from here
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string<char>, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string<char>, at::Tensor>&’
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string<char>, std::shared_ptr<torch::nn::Module> >’ to type ‘torch::OrderedDict<std::basic_string<char>, std::shared_ptr<torch::nn::Module> >&’
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptr<torch::nn::Module> torch::nn::Cloneable<Derived>::clone(const c10::optional<c10::Device>&) const [with Derived = torch::nn::SequentialImpl]’:
/tmp/tmpxft_0000854a_00000000-6_ingroup_inds_kernel.cudafe1.stub.c:4:27: required from here
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string<char>, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string<char>, at::Tensor>&’
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string<char>, std::shared_ptr<torch::nn::Module> >’ to type ‘torch::OrderedDict<std::basic_string<char>, std::shared_ptr<torch::nn::Module> >&’
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptr<torch::nn::Module> torch::nn::Cloneable<Derived>::clone(const c10::optional<c10::Device>&) const [with Derived = torch::nn::ModuleListImpl]’:
/tmp/tmpxft_0000854a_00000000-6_ingroup_inds_kernel.cudafe1.stub.c:4:27: required from here
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string<char>, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string<char>, at::Tensor>&’
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string<char>, std::shared_ptr<torch::nn::Module> >’ to type ‘torch::OrderedDict<std::basic_string<char>, std::shared_ptr<torch::nn::Module> >&’
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptr<torch::nn::Module> torch::nn::Cloneable<Derived>::clone(const c10::optional<c10::Device>&) const [with Derived = torch::nn::ModuleDictImpl]’:
/tmp/tmpxft_0000854a_00000000-6_ingroup_inds_kernel.cudafe1.stub.c:4:27: required from here
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string<char>, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string<char>, at::Tensor>&’
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string<char>, std::shared_ptr<torch::nn::Module> >’ to type ‘torch::OrderedDict<std::basic_string<char>, std::shared_ptr<torch::nn::Module> >&’
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptr<torch::nn::Module> torch::nn::Cloneable<Derived>::clone(const c10::optional<c10::Device>&) const [with Derived = torch::nn::TransformerDecoderImpl]’:
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string<char>, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string<char>, at::Tensor>&’
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string<char>, std::shared_ptr<torch::nn::Module> >’ to type ‘torch::OrderedDict<std::basic_string<char>, std::shared_ptr<torch::nn::Module> >&’
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptr<torch::nn::Module> torch::nn::Cloneable<Derived>::clone(const c10::optional<c10::Device>&) const [with Derived = torch::nn::TransformerEncoderImpl]’:
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string<char>, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string<char>, at::Tensor>&’
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string<char>, std::shared_ptr<torch::nn::Module> >’ to type ‘torch::OrderedDict<std::basic_string<char>, std::shared_ptr<torch::nn::Module> >&’
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1717, in _run_ninja_build
subprocess.run(
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/subprocess.py", line 528, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/nvme/yanxiangchao/perl5/pretrain/openmdf_dsvt/openmdf/setup.py", line 34, in <module>
setup(
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/setuptools/__init__.py", line 87, in setup
return distutils.core.setup(**attrs)
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/setuptools/_distutils/core.py", line 185, in setup
return run_commands(dist)
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
dist.run_commands()
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 968, in run_commands
self.run_command(cmd)
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/setuptools/dist.py", line 1217, in run_command
super().run_command(command)
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 987, in run_command
cmd_obj.run()
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/setuptools/command/develop.py", line 34, in run
self.install_for_development()
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/setuptools/command/develop.py", line 114, in install_for_development
self.run_command('build_ext')
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/setuptools/_distutils/cmd.py", line 319, in run_command
self.distribution.run_command(command)
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/setuptools/dist.py", line 1217, in run_command
super().run_command(command)
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 987, in run_command
cmd_obj.run()
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/setuptools/command/build_ext.py", line 84, in run
_build_ext.run(self)
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 346, in run
self.build_extensions()
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 735, in build_extensions
build_ext.build_extensions(self)
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 466, in build_extensions
self._build_extensions_serial()
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 492, in _build_extensions_serial
self.build_extension(ext)
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/setuptools/command/build_ext.py", line 246, in build_extension
_build_ext.build_extension(self, ext)
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 547, in build_extension
objects = self.compiler.compile(
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 556, in unix_wrap_ninja_compile
_write_ninja_file_and_compile_objects(
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1399, in _write_ninja_file_and_compile_objects
_run_ninja_build(
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1733, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension
Thanks for sharing this amazing work!
Could you please provide the steps to generate the GT data from the original Waymo dataset?
Are you willing to provide the pretrained models if the user sends proof of agreement to the Waymo dataset license?
Do you plan to publish a sample script for deployment to TensorRT?
Thanks
Thanks for your work. While studying your open-source code, I came across a question.
There seems to be a bug in the process of using the position embedding vector in the operation of the DSVT Block.
It seems that the position embedding vector required for Attention is assigned incorrectly.
The questions have been summarized in the image below.
//////////////////////////////////////////////////////////////////
The example covers the case where DSVT consists of a single stage and that stage is built from two blocks.
Thanks.
Hi, thanks for your amazing paper and solid experiments, which propose a fast, easy-to-deploy, and high-performing transformer backbone.
I have a question about the hybrid window sizes. Why did you set the second window size to N times the first window size? With this choice, does the model skip the inter-window voxel relations between some adjacent window pairs of the first partition?
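To make the question concrete, here is a minimal 1-D sketch of which adjacent first-partition windows end up inside a single second-partition window. It is not taken from the DSVT code: the window size `W1 = 12`, the factor `N = 2`, the `shift` parameter, and the helper names are all assumptions chosen for illustration.

```python
# Hypothetical 1-D illustration; W1, N, and the shift are assumed values,
# not the settings used in the DSVT repository.
W1 = 12          # first window size (assumed)
N = 2            # second window is N times the first (assumed)
W2 = N * W1      # second window size


def window_id(x, window_size, shift=0):
    """Index of the 1-D window that contains voxel coordinate x."""
    return (x + shift) // window_size


def share_second_window(win_a, win_b, shift=0):
    """True if all voxels of first-partition windows win_a and win_b
    fall into one and the same second-partition window."""
    xs = list(range(win_a * W1, (win_a + 1) * W1))
    xs += list(range(win_b * W1, (win_b + 1) * W1))
    return len({window_id(x, W2, shift) for x in xs}) == 1


# Adjacent pairs of first-partition windows: (0, 1), (1, 2), (2, 3), (3, 4)
pairs = [(k, k + 1) for k in range(4)]
unshifted = {p: share_second_window(*p) for p in pairs}
shifted = {p: share_second_window(*p, shift=W1) for p in pairs}

print(unshifted)
print(shifted)
```

In this toy setup, an unshifted second partition merges pairs (0, 1) and (2, 3) but leaves (1, 2) and (3, 4) split, while shifting the second partition by `W1` covers exactly the other pairs. Whether DSVT applies such a shift between blocks is what the question asks the authors to confirm.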
Hi, thanks for your great work!
I have another question about the PyTorch-ONNX-TensorRT conversion.
In the forward function of SetAttention, indexing is used at the beginning to retrieve values. However, when I deploy similar operations, TensorRT does not support them, because the indices are themselves a tensor that changes at runtime rather than fixed indices such as x=torch.tensor([1, 2]), x[0]. How did you solve this in your TensorRT engine? Could you please provide some guidance?
DSVT/pcdet/models/backbones_3d/dsvt.py
Line 245 in 3b825d1
Looking forward to your reply.