
dsvt's People

Contributors

chenshi3, haiyang-w


dsvt's Issues

Can you provide a configuration file for the KITTI dataset?

This is great work, but the paper reports no experiments on the KITTI dataset. I would like to validate DSVT's performance on KITTI. Could you please provide a DSVT configuration file for the KITTI dataset?

Error about setup.py

Congratulations on being accepted by CVPR!
I got lots of 'invalid static_cast' errors when I ran 'python setup.py install'.
My environment: torch == 1.13.0+cu117, CUDA version 11.7.
Once I commented out the following code in setup.py, the installation completed, which means the error is in ingroup_inds_cuda:
make_cuda_ext( name='ingroup_inds_cuda', module='pcdet.ops.ingroup_inds', sources=[ 'src/ingroup_inds.cpp', 'src/ingroup_inds_kernel.cu', ]
Maybe it's my PyTorch version, but the recommended versions (below torch 1.10) don't fit my CUDA version 11.7. Since I don't have sudo permission, it's difficult to change the CUDA version.

Could you please see if there is a solution? Thanks!

Timeout when training

2023-06-30 02:10:13,430   INFO  epoch: 14/20, acc_iter=112400, cur_iter=4263/7724, batch_size=4, time_cost(epoch): 1:04:10/52:05, time_cost(all): 16:04:06/10:33:21, loss=1.5466811966896057, d_time=0.02(0.02), f_time=0.81(0.88), b_time=0.83(0.90), norm=28.733774185180664, lr=0.002140055979797459
2023-06-30 02:10:58,540   INFO  epoch: 14/20, acc_iter=112450, cur_iter=4313/7724, batch_size=4, time_cost(epoch): 1:04:55/51:20, time_cost(all): 16:04:51/10:32:35, loss=1.5467678713798523, d_time=0.02(0.02), f_time=0.87(0.88), b_time=0.89(0.90), norm=28.955286026000977, lr=0.002135863906495637
2023-06-30 02:11:03,331   INFO  Save latest model to /root/paddlejob/workspace/env_run/DSVT/output/cfgs/dsvt_models/dsvt_plain_1f_onestage_nusences/default/ckpt/latest_model
2023-06-30 02:11:43,555   INFO  epoch: 14/20, acc_iter=112500, cur_iter=4363/7724, batch_size=4, time_cost(epoch): 1:05:40/50:35, time_cost(all): 16:05:36/10:31:49, loss=1.548808376789093, d_time=0.02(0.02), f_time=0.80(0.88), b_time=0.82(0.90), norm=29.192405700683594, lr=0.002131672879084205
/bin/sh: gpustat: command not found
2023-06-30 02:11:43,990   INFO  
[E ProcessGroupNCCL.cpp:587] [Rank 1] Watchdog caught collective operation timeout: WorkNCCL(OpType=BROADCAST, Timeout(ms)=1800000) ran for 1803000 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:587] [Rank 3] Watchdog caught collective operation timeout: WorkNCCL(OpType=BROADCAST, Timeout(ms)=1800000) ran for 1803001 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:587] [Rank 2] Watchdog caught collective operation timeout: WorkNCCL(OpType=BROADCAST, Timeout(ms)=1800000) ran for 1803057 milliseconds before timing out.
2023-06-30 09:52:59,721   INFO  Save latest model to /root/paddlejob/workspace/env_run/DSVT/output/cfgs/dsvt_models/dsvt_plain_1f_onestage_nusences/default/ckpt/latest_model
[E ProcessGroupNCCL.cpp:341] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data. To avoid this inconsistency, we are taking the entire process down.
terminate called after throwing an instance of 'std::runtime_error'
  what():  [Rank 3] Watchdog caught collective operation timeout: WorkNCCL(OpType=BROADCAST, Timeout(ms)=1800000) ran for 1803001 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:341] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data. To avoid this inconsistency, we are taking the entire process down.
terminate called after throwing an instance of 'std::runtime_error'
  what():  [Rank 1] Watchdog caught collective operation timeout: WorkNCCL(OpType=BROADCAST, Timeout(ms)=1800000) ran for 1803000 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:341] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data. To avoid this inconsistency, we are taking the entire process down.
terminate called after throwing an instance of 'std::runtime_error'
  what():  [Rank 2] Watchdog caught collective operation timeout: WorkNCCL(OpType=BROADCAST, Timeout(ms)=1800000) ran for 1803057 milliseconds before timing out.
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 13116 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -6) local_rank: 1 (pid: 13117) of binary: /usr/bin/python3.7
Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.7/dist-packages/torch/distributed/launch.py", line 193, in <module>
    main()
  File "/usr/local/lib/python3.7/dist-packages/torch/distributed/launch.py", line 189, in main
    launch(args)
  File "/usr/local/lib/python3.7/dist-packages/torch/distributed/launch.py", line 174, in launch
    run(args)
  File "/usr/local/lib/python3.7/dist-packages/torch/distributed/run.py", line 713, in run
    )(*cmd_args)
  File "/usr/local/lib/python3.7/dist-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/usr/local/lib/python3.7/dist-packages/torch/distributed/launcher/api.py", line 261, in launch_agent
    failures=result.failures,
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
train.py FAILED
------------------------------------------------------------
Failures:
[1]:
  time      : 2023-06-30_09:53:05
  host      : 10-67-245-145.local
  rank      : 2 (local_rank: 2)
  exitcode  : -6 (pid: 13120)
  error_file: <N/A>
  traceback : Signal 6 (SIGABRT) received by PID 13120
[2]:
  time      : 2023-06-30_09:53:05
  host      : 10-67-245-145.local
  rank      : 3 (local_rank: 3)
  exitcode  : -6 (pid: 13122)
  error_file: <N/A>
  traceback : Signal 6 (SIGABRT) received by PID 13122
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-06-30_09:53:05
  host      : 10-67-245-145.local
  rank      : 1 (local_rank: 1)
  exitcode  : -6 (pid: 13117)
  error_file: <N/A>
  traceback : Signal 6 (SIGABRT) received by PID 13117

terminate called after throwing an instance of 'std::bad_alloc'

Hi! I tried to train the model with four Tesla V100 GPUs but encountered this problem. Could you please suggest how to fix it? Thanks!

python3 -m torch.distributed.launch \
 --nproc_per_node=4 \
 --rdzv_endpoint=localhost:14430 train.py \
 --launcher pytorch \
 --cfg_file ./cfgs/dsvt_models/dsvt_plain_D512e.yaml \
 --sync_bn --logger_iter_interval 500

/home/user/.local/lib/python3.10/site-packages/torch/distributed/launch.py:178: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use_env is set by default in torchrun.
If your script expects `--local_rank` argument to be set, please
change it to read from `os.environ['LOCAL_RANK']` instead. See 
https://pytorch.org/docs/stable/distributed.html#launch-utility for 
further instructions

  warnings.warn(
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
*****************************************
terminate called after throwing an instance of 'std::bad_alloc'
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
  what():  std::bad_alloc
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -6) local_rank: 0 (pid: 36) of binary: /usr/bin/python3
scripts/dist_train.sh: line 17:    26 Segmentation fault      (core dumped) python3 -m torch.distributed.launch --nproc_per_node=${NGPUS} --rdzv_endpoint=localhost:${PORT} train.py --launcher pytorch ${PY_ARGS}
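
The deprecation warning in the log above says that scripts launched with torchrun should read the local rank from `os.environ['LOCAL_RANK']` rather than from a `--local_rank` argument. A minimal sketch of that change follows; `resolve_local_rank` is a hypothetical helper written for illustration, not a function in train.py or OpenPCDet, and it is unrelated to the std::bad_alloc crash itself.

```python
import argparse
import os


def resolve_local_rank(argv=None, environ=None):
    """Return the local rank, preferring LOCAL_RANK from the environment.

    Hypothetical helper for illustration; not part of DSVT or OpenPCDet.
    """
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser()
    # Older launchers pass the rank on the command line instead.
    parser.add_argument("--local_rank", "--local-rank", type=int, default=0)
    args, _unknown = parser.parse_known_args(argv)
    if "LOCAL_RANK" in environ:
        # torchrun exports LOCAL_RANK for every worker process.
        return int(environ["LOCAL_RANK"])
    return args.local_rank
```

With this pattern the same script can be started by either launcher, since the command-line value is only used when the environment variable is absent.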


Detection ability drops significantly at long distances

When I used DSVT on my own dataset, I found that it detects nearby objects (0-30m) very well, but its mean Average Precision drops significantly at long distances (30-60m, 60m-inf) compared to the CenterPoint model. Can you explain why?

Question about the CT3D-based two-stage DSVT on WOD training yaml file

Thanks for your excellent work. I wanted to reproduce the results of the two-stage DSVT-TS on Waymo's 100% training set, but I could not find a two-stage configuration file that uses CT3D as the model. All the available configuration files are one-stage ones that use CenterPoint as the model.

Can you upload or tell me where it is?

Thank you very much

DynamicPillarVFE

Hi, can DynamicPillarVFE be replaced with regular VFE layers like PillarVFE3D or even MeanVFE, and how bad is the performance difference? Thanks.

Question about DSVT-P on waymo

Section B.1.1 of the paper says that both DSVT-P and DSVT-V have 4 DSVT blocks, but dsvt_models/dsvt_plain_1f_onestage.yaml seems to have only 1 block.

Waymo open dataset evaluation too slow.

Evaluation on WOD using the Waymo metrics is too slow for me. This seems to be a common issue with OpenPCDet. If I switch to the KITTI metrics, it runs out of memory (RAM: 120G). Could you provide some instructions?

DynPillarVFE in nuscenes setup

Hello, thank you for releasing nuscenes update!
Could you tell me please whether using DynPillarVFE is right in dsvt_plain_1f_onestage_nusences.yaml?
In other versions (e.g. for waymo) you used DynPillarVFE3D, so I'm wondering why is it different for nuscenes?

Loss is NaN while training with --fp16 (many times)

I use:
batchsize_pergpu = 4, gpus = 2;
lr = 0.002
I know that "If you encounter a gradient that becomes NaN during fp16 training, don't worry, it's normal. You can try a few more times."
But every time, after a short training period (within 100 iterations), the loss becomes NaN, and I have tried many times (dozens or even over a hundred).
I only modified the paths in the configuration file, and I made sure that no training session loaded a previous last_model whose loss was NaN.

It always shows:
epochs: 0%| | 0/24 [00:25<?, ?it/s, loss_hm=nan, loss_loc=nan, loss=nan, lr=2e-5, d_time=0.00(0.02), f_time=0.68(0.70), b_tWARNING:tensorboardX.x2num:NaN or Inf found in input tensor. | 33/19761 [00:23<3:55:35, 1.40it/s, total_it=33]
WARNING:tensorboardX.x2num:NaN or Inf found in input tensor.
WARNING:tensorboardX.x2num:NaN or Inf found in input tensor.
WARNING:tensorboardX.x2num:NaN or Inf found in input tensor.
WARNING:tensorboardX.x2num:NaN or Inf found in input tensor.
WARNING:tensorboardX.x2num:NaN or Inf found in input tensor.
WARNING:tensorboardX.x2num:NaN or Inf found in input tensor.

And this is log:
https://paste.imlgw.top/2277
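
For context, fp16 training is usually paired with loss scaling, and an overflow shows up as NaN or Inf gradients. The sketch below is a simplified, pure-Python illustration of dynamic loss scaling: skip the step and shrink the scale on overflow, then grow the scale again after a run of clean steps. It is not the training loop used by DSVT or OpenPCDet; the class name, the growth and backoff factors, and the interval are all assumptions chosen for illustration.

```python
import math


class DynamicLossScaler:
    """Toy dynamic loss scaler (illustrative; names and defaults are assumptions)."""

    def __init__(self, init_scale=32.0, backoff=0.5, growth=2.0, growth_interval=3):
        self.scale = init_scale
        self.backoff = backoff
        self.growth = growth
        self.growth_interval = growth_interval
        self._good_steps = 0

    def update(self, grad_norm):
        """Return True if the optimizer step should be applied."""
        if not math.isfinite(grad_norm):
            # Overflow: skip this step and shrink the scale.
            self.scale *= self.backoff
            self._good_steps = 0
            return False
        self._good_steps += 1
        if self._good_steps >= self.growth_interval:
            # Several clean steps in a row: try a larger scale again.
            self.scale *= self.growth
            self._good_steps = 0
        return True


scaler = DynamicLossScaler()
applied = [scaler.update(g) for g in [1.0, float("nan"), 2.0, 3.0, 4.0]]
```

An occasional skipped step is what the quoted advice describes as normal; a NaN on every run within the first 100 iterations points more toward a persistent cause such as the data, the learning rate, or the configuration.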

Detail about the VFE module

Hi,

Since I'm new to this field, I didn't understand the voxel feature encoding (VFE) module very easily.

It seems that this module (VFE) is a common structure, can you please elaborate on it?

Is it voxelnet or pointpillars?

Metric results for mini-nuscenes

Hello, thank you for your excellent work. I am currently limited to running experiments on the mini dataset. Have you run any experiments on mini-nuScenes, and if so, what are the metrics?

TensorRT Deployment for nuScenes Trained Model

Hi, thanks for a great job!
I am having trouble converting the weights of a model trained on the nuScenes dataset. As I understand it, deploy.py only works with Waymo, and the nuScenes trtengine config yaml has not been released. I tried to write the nuScenes trtengine config file by replacing only the BACKBONE_3D part of the yaml, but before reaching that stage I ran into other problems with deploy.py. The input data (batch_dict) you provide is Waymo data, and I don't think simply swapping it for nuScenes data is a solution, because the input_shapes and dynamic_axes config must also be changed for nuScenes compatibility. So, is there a way to convert models trained on nuScenes to TensorRT right now, and can I resolve this myself? Are you planning to release dataset-agnostic deploy code in the near future?

What is the principle of hybrid factors?

I'm confused about the hybrid factors.

Given a base window shape [12, 12, 32], a hybrid one is [24, 24, 32] with the hybrid factor [2, 2, 1]. [12, 12, 32] is for non-shifting, while [24, 24, 32] is for shifting with shifts [6, 6, 0]. It seems that the only difference between non-hybrid (swin/sst-like) and hybrid version is the window shape when shifting.

In the paper, hybrid window partition is said to improve efficiency, but I can't find a detailed explanation. Is the efficiency related to the redundant padding voxel tokens, which are fewer with a larger window shape? And why not try large windows for both, e.g., window1=(24,24) & window2=(24,24) in Table 5?

Look forward to your reply.^^
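
The arithmetic in the question can be written out as a short sketch. It only reproduces the window shapes described above and counts how many windows tile a grid; it does not settle the efficiency question. The function names and the grid size `[468, 468, 32]` are assumptions used for illustration, and the count ignores the extra window that a shift can add at the border.

```python
import math


def hybrid_window_shape(window_shape, hybrid_factor):
    """Scale the base window by the hybrid factor, axis by axis."""
    return [w * f for w, f in zip(window_shape, hybrid_factor)]


def num_windows(sparse_shape, window_shape):
    """Count the windows needed to tile the grid (shift effects ignored)."""
    return math.prod(math.ceil(s / w) for s, w in zip(sparse_shape, window_shape))


base = [12, 12, 32]
hybrid = hybrid_window_shape(base, [2, 2, 1])
sparse_shape = [468, 468, 32]  # illustrative grid size (assumption)
print(hybrid, num_windows(sparse_shape, base), num_windows(sparse_shape, hybrid))
```

A larger window means fewer windows over the same grid, which is the quantity the question about padding tokens depends on.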

Confusion about Figure 1 in paper

Congratulations on the CVPR acceptance! I have a question about Fig. 1. In your paper, your method is evaluated on an NVIDIA A100 GPU. Did you also evaluate the other methods, especially PointPillars and CenterPoint-Pillar, on the same A100 GPU? CenterPoint-Pillar's speed is only 30 FPS, which seems strange given the A100's higher computing power.

During the model training process, an error occurs in the forward propagation phase. The error message indicates a dimension mismatch between the feature processed by the attention mechanism in dsvt.py and the original feature.

Thank you for your outstanding work. I have been following your work for a long time and have been wanting to try the DSVT model for 3D object detection myself. I ran the DSVT model on OpenPCDet. The training data I used was self-prepared and followed a format similar to KITTI, including point cloud data and 3D annotation files. I have completed the preprocessing of the dataset, and I have successfully trained and tested it using both the CenterPoint and PVRCNN++ models. However, when attempting to train using the DSVT model, I encountered an error during the training process. Below is the error message:
Traceback (most recent call last):
  File "train_single.py", line 245, in <module>
    main()
  File "train_single.py", line 189, in main
    train_model(
  File "/mnt/volumes/perception/lyb/openpcdet/tools/train_utils/train_utils.py", line 180, in train_model
    accumulated_iter = train_one_epoch(
  File "/mnt/volumes/perception/lyb/openpcdet/tools/train_utils/train_utils.py", line 56, in train_one_epoch
    loss, tb_dict, disp_dict = model_func(model, batch)
  File "/mnt/volumes/perception/lyb/openpcdet/tools/../pcdet/models/__init__.py", line 44, in model_func
    ret_dict, tb_dict, disp_dict = model(batch_dict)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/volumes/perception/lyb/openpcdet/tools/../pcdet/models/detectors/centerpoint.py", line 12, in forward
    batch_dict = cur_module(batch_dict)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/volumes/perception/lyb/openpcdet/tools/../pcdet/models/backbones_3d/dsvt.py", line 125, in forward
    output = block(output, set_voxel_inds_list[stage_id], set_voxel_masks_list[stage_id], pos_embed_list[stage_id][i], \
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/volumes/perception/lyb/openpcdet/tools/../pcdet/models/backbones_3d/dsvt.py", line 193, in forward
    output = layer(output, set_voxel_inds, set_voxel_masks, pos_embed)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/volumes/perception/lyb/openpcdet/tools/../pcdet/models/backbones_3d/dsvt.py", line 209, in forward
    src = self.win_attn(src, pos, set_voxel_masks, set_voxel_inds)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/volumes/perception/lyb/openpcdet/tools/../pcdet/models/backbones_3d/dsvt.py", line 273, in forward
    src = src + self.dropout1(src2)
RuntimeError: The size of tensor a (322658) must match the size of tensor b (322639) at non-singleton dimension 0

After receiving this error message, I checked the relevant code location and discovered that the function takes the input feature "src" (src (Tensor[float]): Voxel features with shape (N, C), where N is the number of voxels) and applies an attention mechanism to obtain "src2." However, in the operation "src = src + self.dropout1(src2)," an error occurs due to the mismatch in dimensions between the two features, preventing the training process.
To troubleshoot this issue, I added the following code snippet:
# FFN layer
print(f"src.shape: {src.shape}")
print(f"src2.shape: {src2.shape}")
During runtime, the output results were as follows:
src.shape: torch.Size([546464, 192])
src2.shape: torch.Size([546464, 192])
src.shape: torch.Size([546464, 192])
src2.shape: torch.Size([546464, 192])
src.shape: torch.Size([406328, 192])
src2.shape: torch.Size([406328, 192])
src.shape: torch.Size([406328, 192])
src2.shape: torch.Size([406328, 192])
src.shape: torch.Size([322658, 192])
src2.shape: torch.Size([322639, 192])

The execution continued until the highlighted section above, where the errors started occurring.

I debugged the information for this line(src2 = self.self_attn(query, key, value, key_padding_mask)[0]) of code and obtained the following situation:
query shape: (13907, 48, 192)
key: (13907, 48, 192)
value: (13907, 48, 192)
key_padding_mask:(13907, 48)
src2 shape: (322639, 192)

I'm not sure how to resolve this error. Could you help take a look at it?

I've attached my model configuration. If you need more information, please feel free to message me privately.
dsvt_3d.yaml

CLASS_NAMES: ['traffic_cone', 'traffic_column', 'Tripod']

DATA_CONFIG:
    _BASE_CONFIG_: cfgs/dataset_configs/pandar_dataset_3class.yaml
    OUTPUT_PATH: '/lpai/volumes/perception/lyb/output'

    POINT_CLOUD_RANGE: [ -82.0, -60.0, -3.0, 82.0, 60.0, 3.0 ]
    DATA_AUGMENTOR:
        DISABLE_AUG_LIST: ['placeholder']
        AUG_CONFIG_LIST:
            - NAME: gt_sampling
              USE_ROAD_PLANE: False
              DB_INFO_PATH:
                  - pandar128_dbinfos_train.pkl

              USE_SHARED_MEMORY: False  # set it to True to speed up (it costs about 15GB shared memory)

              PREPARE: {
                filter_by_min_points: [ 'traffic_cone:5', 'traffic_column:5', 'Tripod:5'],
              }

              SAMPLE_GROUPS: [ 'traffic_cone:6', 'traffic_column:5', 'Tripod:5']
              NUM_POINT_FEATURES: 4
              REMOVE_EXTRA_WIDTH: [0.0, 0.0, 0.0]
              LIMIT_WHOLE_SCENE: True

            - NAME: random_world_flip
              ALONG_AXIS_LIST: ['x', 'y']

            - NAME: random_world_rotation
              WORLD_ROT_ANGLE: [-0.78539816, 0.78539816]

            - NAME: random_world_scaling
              WORLD_SCALE_RANGE: [0.95, 1.05]

            - NAME: random_world_translation
              NOISE_TRANSLATE_STD: [0.5, 0.5, 0.5]

    DATA_PROCESSOR:
        - NAME: mask_points_and_boxes_outside_range
          REMOVE_OUTSIDE_BOXES: True

        - NAME: shuffle_points
          SHUFFLE_ENABLED: {
            'train': True,
            'test': False
          }

        - NAME: transform_points_to_voxels_placeholder
          VOXEL_SIZE: [ 0.1, 0.1, 0.15 ]

MODEL:
    NAME: CenterPoint

    VFE:
        NAME: DynamicVoxelVFE
        WITH_DISTANCE: False
        USE_ABSLOTE_XYZ: True
        USE_NORM: True
        NUM_FILTERS: [ 192, 192 ]

    BACKBONE_3D:
        NAME: DSVT
        INPUT_LAYER:
            sparse_shape: [468, 468, 32]
            downsample_stride: [[1, 1, 4], [1, 1, 4], [1, 1, 2]]
            d_model: [192, 192, 192, 192]
            set_info: [[48, 1], [48, 1], [48, 1], [48, 1]]
            window_shape: [[12, 12, 32], [12, 12, 8], [12, 12, 2], [12, 12, 1]]
            hybrid_factor: [2, 2, 1] # x, y, z
            shifts_list: [[[0, 0, 0], [6, 6, 0]], [[0, 0, 0], [6, 6, 0]], [[0, 0, 0], [6, 6, 0]], [[0, 0, 0], [6, 6, 0]]]
            normalize_pos: False

        block_name: ['DSVTBlock','DSVTBlock','DSVTBlock','DSVTBlock']
        set_info: [[48, 1], [48, 1], [48, 1], [48, 1]]
        d_model: [192, 192, 192, 192]
        nhead: [8, 8, 8, 8]
        dim_feedforward: [384, 384, 384, 384]
        dropout: 0.0
        activation: gelu
        reduction_type: 'attention'
        output_shape: [468, 468]
        conv_out_channel: 192

    MAP_TO_BEV:
        NAME: PointPillarScatter3d
        INPUT_SHAPE: [468, 468, 1]
        NUM_BEV_FEATURES: 192

    BACKBONE_2D:
        NAME: BaseBEVResBackbone
        LAYER_NUMS: [ 1, 2, 2 ]
        LAYER_STRIDES: [ 1, 2, 2 ]
        NUM_FILTERS: [ 128, 128, 256 ]
        UPSAMPLE_STRIDES: [ 1, 2, 4 ]
        NUM_UPSAMPLE_FILTERS: [ 128, 128, 128 ]

    DENSE_HEAD:
        NAME: CenterHead
        CLASS_AGNOSTIC: False

        CLASS_NAMES_EACH_HEAD: [
          ['traffic_cone', 'traffic_column', 'Tripod']
        ]

        SHARED_CONV_CHANNEL: 64
        USE_BIAS_BEFORE_NORM: False
        NUM_HM_CONV: 2

        BN_EPS: 0.001
        BN_MOM: 0.01
        SEPARATE_HEAD_CFG:
            HEAD_ORDER: ['center', 'center_z', 'dim', 'rot']
            HEAD_DICT: {
              'center': {'out_channels': 2, 'num_conv': 2},
              'center_z': {'out_channels': 1, 'num_conv': 2},
              'dim': {'out_channels': 3, 'num_conv': 2},
              'rot': {'out_channels': 2, 'num_conv': 2},
              'iou': {'out_channels': 1, 'num_conv': 2},
            }

        TARGET_ASSIGNER_CONFIG:
            FEATURE_MAP_STRIDE: 1
            NUM_MAX_OBJS: 500
            GAUSSIAN_OVERLAP: 0.1
            MIN_RADIUS: 2

        IOU_REG_LOSS: True

        LOSS_CONFIG:
            LOSS_WEIGHTS: {
              'cls_weight': 1.0,
              'loc_weight': 2.0,
              'code_weights': [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
            }

        POST_PROCESSING:
            SCORE_THRESH: 0.5
            POST_CENTER_LIMIT_RANGE: [ -82.0, -60.0, -3.0, 82.0, 60.0, 3.0 ]
            MAX_OBJ_PER_SAMPLE: 500

            USE_IOU_TO_RECTIFY_SCORE: True
            IOU_RECTIFIER: [0.68, 0.71, 0.65]

            NMS_CONFIG:
                NMS_TYPE: multi_class_nms  # only for centerhead, use mmdet3d version nms
                NMS_THRESH: [0.5, 0.5, 0.6]
                NMS_PRE_MAXSIZE: [4096, 4096, 4096]
                NMS_POST_MAXSIZE: [500, 500, 500]

    POST_PROCESSING:
        RECALL_THRESH_LIST: [0.3, 0.5, 0.7]

        EVAL_METRIC: kitti

OPTIMIZATION:
    BATCH_SIZE_PER_GPU: 4
    NUM_EPOCHS: 30

    OPTIMIZER: adam_onecycle
    LR: 0.003
    WEIGHT_DECAY: 0.01
    MOMENTUM: 0.9

    MOMS: [0.95, 0.85]
    PCT_START: 0.1
    DIV_FACTOR: 100
    DECAY_STEP_LIST: [35, 45]
    LR_DECAY: 0.1
    LR_CLIP: 0.0000001

    LR_WARMUP: False
    WARMUP_EPOCH: 1

    GRAD_NORM_CLIP: 10
    LOSS_SCALE_FP16: 32.0

HOOK:
    DisableAugmentationHook:
        DISABLE_AUG_LIST: ['gt_sampling','random_world_flip','random_world_rotation','random_world_scaling', 'random_world_translation']
        NUM_LAST_EPOCHS: 1

I'm looking forward to your response and hoping to maintain communication with you.
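
One check that can be run on a config like the one above is whether the voxel grid implied by POINT_CLOUD_RANGE and VOXEL_SIZE agrees with the sparse_shape given to the DSVT input layer. The sketch below uses the values from the posted config; `grid_size` is a hypothetical helper, and whether this mismatch is what triggers the size error is not confirmed anywhere in the thread.

```python
def grid_size(point_cloud_range, voxel_size):
    """Number of voxels per axis: (max - min) / voxel_size, rounded to an integer."""
    mins, maxs = point_cloud_range[:3], point_cloud_range[3:]
    return [round((hi - lo) / v) for lo, hi, v in zip(mins, maxs, voxel_size)]


pc_range = [-82.0, -60.0, -3.0, 82.0, 60.0, 3.0]  # POINT_CLOUD_RANGE from the config
voxel = [0.1, 0.1, 0.15]                           # VOXEL_SIZE from the config
sparse_shape = [468, 468, 32]                      # INPUT_LAYER sparse_shape from the config

grid = grid_size(pc_range, voxel)
if grid != sparse_shape:
    print(f"grid {grid} differs from sparse_shape {sparse_shape}")
```

For these values the implied grid is 1640 x 1200 x 40 voxels, which differs from the 468 x 468 x 32 sparse_shape, so the two settings describe different grids.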

Files not found

Hello, nice work! I can't find tools/scripts. Could you provide these files? Thanks!

GPU memory size

Thanks for your incredible work!

  1. What is the minimum GPU memory needed to run training?
  2. I also could not find in the paper how much GPU memory is needed for model inference.

Question about pytorch version

Hello, my PyTorch version is 1.8.1 and I encountered this problem while training. Does the PyTorch version have to be 1.9.0 or higher?
../pcdet/models/backbones_3d/dsvt.py", line 212, in __init__
    self.self_attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout, batch_first=batch_first)
TypeError: __init__() got an unexpected keyword argument 'batch_first'
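
The `batch_first` argument of `nn.MultiheadAttention` was added in PyTorch 1.9, which is consistent with the TypeError on 1.8.1. A minimal sketch of a version guard follows; `supports_batch_first` is a hypothetical helper that takes the version string (for example the value of `torch.__version__`) so that it can be tried without importing torch.

```python
import re


def supports_batch_first(torch_version):
    """True if the version string is at least 1.9, where batch_first was added."""
    match = re.match(r"(\d+)\.(\d+)", torch_version)
    if match is None:
        raise ValueError(f"unrecognized version string: {torch_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (1, 9)


for version in ("1.8.1", "1.9.0", "1.13.0+cu117"):
    print(version, supports_batch_first(version))
```

Only the major and minor numbers are compared, so local build suffixes such as `+cu117` do not affect the result.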

Loss is NaN or Inf

To reduce the computational cost in my own project, I set 'feature_map_stride' = 2 (rather than 1 as in your setting) in 'TARGET_ASSIGNER_CONFIG', and the loss became NaN or Inf (not during fp16 training).
I tried three times and it didn't work. Do you know how to fix this problem?
Thank you!

Deploy Problem

Hi, thanks for your great work!
Based on my understanding, in the deployment of DSVT you converted the Transformer part of the network into DSVT_TrtEngine, while DSVT_Input_Layer still uses the original PyTorch code. Can DSVT_Input_Layer also be converted to ONNX-TRT? Because it contains operators such as torch.sort and torch.unique that are not supported by TRT, I plan to convert DSVT_Input_Layer into a single CUDA kernel when deploying the entire model. Do you have any suggestions for a faster and more convenient approach?
Looking forward to your reply.

runtime ERROR when compiling ingroup_inds

When compiling pcdet.ops.ingroup_inds, added in this pull request, I hit a runtime error, although I could compile OpenPCDet successfully.

Traceback (most recent call last):
File "setup.py", line 133, in <module>
'src/ingroup_inds_kernel.cu',
File "miniconda3/envs/py37/lib/python3.7/site-packages/setuptools/__init__.py", line 87, in setup
return distutils.core.setup(**attrs)
File "miniconda3/envs/py37/lib/python3.7/site-packages/setuptools/_distutils/core.py", line 185, in setup
return run_commands(dist)
File "miniconda3/envs/py37/lib/python3.7/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
dist.run_commands()
File "/miniconda3/envs/py37/lib/python3.7/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
self.run_command(cmd)
File "miniconda3/envs/py37/lib/python3.7/site-packages/setuptools/dist.py", line 1208, in run_command
super().run_command(command)
File "/miniconda3/envs/py37/lib/python3.7/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File miniconda3/envs/py37/lib/python3.7/site-packages/setuptools/command/develop.py", line 34, in run
self.install_for_development()
File "/miniconda3/envs/py37/lib/python3.7/site-packages/setuptools/command/develop.py", line 114, in install_for_development
self.run_command('build_ext')
File "miniconda3/envs/py37/lib/python3.7/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
self.distribution.run_command(command)
File "miniconda3/envs/py37/lib/python3.7/site-packages/setuptools/dist.py", line 1208, in run_command
super().run_command(command)
File "miniconda3/envs/py37/lib/python3.7/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "miniconda3/envs/py37/lib/python3.7/site-packages/setuptools/command/build_ext.py", line 84, in run
_build_ext.run(self)
File ".local/lib/python3.7/site-packages/Cython/Distutils/old_build_ext.py", line 186, in run
_build_ext.build_ext.run(self)
File miniconda3/envs/py37/lib/python3.7/site-packages/setuptools/_distutils/command/build_ext.py", line 346, in run
self.build_extensions()
File "miniconda3/envs/py37/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 843, in build_extensions
build_ext.build_extensions(self)
File .local/lib/python3.7/site-packages/Cython/Distutils/old_build_ext.py", line 194, in build_extensions
self.build_extension(ext)
File "miniconda3/envs/py37/lib/python3.7/site-packages/setuptools/command/build_ext.py", line 246, in build_extension
_build_ext.build_extension(self, ext)
File miniconda3/envs/py37/lib/python3.7/site-packages/setuptools/_distutils/command/build_ext.py", line 556, in build_extension
depends=ext.depends,
File "miniconda3/envs/py37/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 668, in unix_wrap_ninja_compile
with_cuda=with_cuda)
File "miniconda3/envs/py37/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1578, in _write_ninja_file_and_compile_objects
error_prefix='Error compiling objects for extension')
File "miniconda3/envs/py37/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1916, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension

Do you use fade strategy during training process?

It seems that the performance on the cyclist class in your paper is higher than on pedestrians and vehicles. But the vehicle (4352210) and pedestrian (2037627) categories have far more samples, while cyclists have only 49518 samples in the Waymo training set, which doesn't seem reasonable. Did you use the fade strategy during training?
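
For readers unfamiliar with the term, a "fade" strategy usually means switching off some augmentations (typically ground-truth sampling) for the last few epochs of training. The sketch below shows that idea only; the function name and arguments are hypothetical, the epoch counts are illustrative, and it does not say whether the authors used such a strategy. A user config quoted elsewhere on this page contains a DisableAugmentationHook section with DISABLE_AUG_LIST and NUM_LAST_EPOCHS fields, which the arguments here loosely mirror.

```python
def active_augmentations(epoch, total_epochs, aug_list, disable_list, num_last_epochs):
    """Return the augmentations to apply at a 0-indexed epoch."""
    if epoch >= total_epochs - num_last_epochs:
        # Final epochs: drop the augmentations listed for disabling.
        return [a for a in aug_list if a not in disable_list]
    return list(aug_list)


augs = ["gt_sampling", "random_world_flip", "random_world_rotation"]
for epoch in (0, 23):
    print(epoch, active_augmentations(epoch, 24, augs, ["gt_sampling"], 1))
```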

DSVT Voxel Performance using 100% waymo dataset

Hi,

I trained a DSVT model on the Waymo dataset with batch_size = 3 on three 3090 GPUs (with torch.utils.checkpoint). The mAP/H is about 1% below your benchmark.
The drop is particularly severe for Ped., while Vehicle L2 actually improves slightly.

my log

Could not reproduce the precision on 20% Waymo

Hi, thanks for your source code. I set up the environment according to the guide and tried this training command with this codebase:

bash scripts/dist_train.sh 8 --cfg_file ./cfgs/dsvt_models/dsvt_plain_D512e.yaml --sync_bn --logger_iter_interval 500

The evaluation results are below and could not reach the official reference precision:

OBJECT_TYPE_TYPE_VEHICLE_LEVEL_1/AP: 0.7226 
OBJECT_TYPE_TYPE_VEHICLE_LEVEL_1/APH: 0.7177 
OBJECT_TYPE_TYPE_VEHICLE_LEVEL_1/APL: 0.7226 
OBJECT_TYPE_TYPE_VEHICLE_LEVEL_2/AP: 0.6382 
OBJECT_TYPE_TYPE_VEHICLE_LEVEL_2/APH: 0.6337 
OBJECT_TYPE_TYPE_VEHICLE_LEVEL_2/APL: 0.6382 
OBJECT_TYPE_TYPE_PEDESTRIAN_LEVEL_1/AP: 0.7706 
OBJECT_TYPE_TYPE_PEDESTRIAN_LEVEL_1/APH: 0.6910 
OBJECT_TYPE_TYPE_PEDESTRIAN_LEVEL_1/APL: 0.7706 
OBJECT_TYPE_TYPE_PEDESTRIAN_LEVEL_2/AP: 0.6895 
OBJECT_TYPE_TYPE_PEDESTRIAN_LEVEL_2/APH: 0.6164 
OBJECT_TYPE_TYPE_PEDESTRIAN_LEVEL_2/APL: 0.6895 
OBJECT_TYPE_TYPE_SIGN_LEVEL_1/AP: 0.0000 
OBJECT_TYPE_TYPE_SIGN_LEVEL_1/APH: 0.0000 
OBJECT_TYPE_TYPE_SIGN_LEVEL_1/APL: 0.0000 
OBJECT_TYPE_TYPE_SIGN_LEVEL_2/AP: 0.0000 
OBJECT_TYPE_TYPE_SIGN_LEVEL_2/APH: 0.0000 
OBJECT_TYPE_TYPE_SIGN_LEVEL_2/APL: 0.0000 
OBJECT_TYPE_TYPE_CYCLIST_LEVEL_1/AP: 0.7039 
OBJECT_TYPE_TYPE_CYCLIST_LEVEL_1/APH: 0.6915 
OBJECT_TYPE_TYPE_CYCLIST_LEVEL_1/APL: 0.7039 
OBJECT_TYPE_TYPE_CYCLIST_LEVEL_2/AP: 0.6777 
OBJECT_TYPE_TYPE_CYCLIST_LEVEL_2/APH: 0.6658 
OBJECT_TYPE_TYPE_CYCLIST_LEVEL_2/APL: 0.6777 

My training log is here
Could you give me some suggestions?
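
To compare these numbers with a reference table, it can help to average them per level. The sketch below averages the LEVEL_2 AP and APH values listed above over the three classes with non-zero results; the SIGN entries are all 0.0000 and are left out here on the assumption that the class is not evaluated.

```python
# LEVEL_2 values copied from the evaluation output above.
level2 = {
    "VEHICLE": {"AP": 0.6382, "APH": 0.6337},
    "PEDESTRIAN": {"AP": 0.6895, "APH": 0.6164},
    "CYCLIST": {"AP": 0.6777, "APH": 0.6658},
}


def mean_metric(results, key):
    """Average one metric over the listed classes."""
    return sum(cls[key] for cls in results.values()) / len(results)


print(f"L2 mAP:  {mean_metric(level2, 'AP'):.4f}")
print(f"L2 mAPH: {mean_metric(level2, 'APH'):.4f}")
```

For the values above this works out to roughly 0.6685 mAP and 0.6386 mAPH at LEVEL_2.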

codes

Hello! Congratulations! When will the code be released? Thank you.

param

The parameter count is 71M in your paper, but 7.1M in the code?

IndexError in test using nuscenes

Training on three 3090 GPUs took 9 to 10 days. After it completed, the evaluation stopped partway through at epoch 20. As the screenshot shows, after loading the 6019 validation samples, loading the ground truth, and filtering the predictions, an IndexError appeared.
Screenshot from 2023-07-07 17-54-47

TensorRT deployment question

Hi @Haiyang-W ,

Thanks for sharing the unrefined TRT deployment script with me! I have a question regarding the lines below:

batch_dict = torch.load("input data file(after vfe)", map_location="cuda")
points = batch_dict["points"]
inputs = points

with torch.no_grad():
    ptranshierarchy3d = model.backbone_3d
    # plain version, just one stage
    ptransblocks_list = ptranshierarchy3d.stage_0
    layer_norms_list = ptranshierarchy3d.residual_norm_stage_0

    pillar_features, voxel_coords = model.vfe(inputs)
    voxel_features = model.backbone_3d(pillar_features, voxel_coords)

    voxel_info = ptranshierarchy3d.input_layer(pillar_features, voxel_coords)
    set_voxel_inds_list = [[voxel_info[f'set_voxel_inds_stage{s}_shift{i}'] for i in range(2)] for s in range(1)]
    set_voxel_masks_list = [[voxel_info[f'set_voxel_mask_stage{s}_shift{i}'] for i in range(2)] for s in range(1)]
    pos_embed_list = [[[voxel_info[f'pos_embed_stage{s}_block{b}_shift{i}'] for i in range(2)] for b in range(4)] for s in range(1)]

    allptransblockstrt_inputs = (
        pillar_features,
        set_voxel_inds_list[0][0],
        set_voxel_inds_list[0][1],
        set_voxel_masks_list[0][0],
        set_voxel_masks_list[0][1],
        torch.stack([torch.stack(v, dim=0) for v in pos_embed_list[0]], dim=0),
    )

What is the input data file after vfe in the first line? How can I create or derive this file?

Setup for Nuscenes

Hello!
I have some issues configuring DSVT-P for nuScenes.
In particular, I took POINT_CLOUD_RANGE and VOXEL_SIZE from nuscenes_dataset.yaml and used them with dsvt_plain_1f_onestage.yaml,
but the model's forward pass fails in PointPillarScatter3d because the indices are out of range.
Could you please provide a configuration for DSVT models on nuScenes, or explain how to choose POINT_CLOUD_RANGE and VOXEL_SIZE with respect to the model configuration?
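
As a rough way to check this kind of mismatch, the sketch below assumes the usual OpenPCDet convention that the voxel grid size per axis is (max - min) / voxel size, and that the shape the model config expects must equal that grid. The function names and the numeric values are illustrative assumptions, not the authors' nuScenes configuration.

```python
def grid_size(point_cloud_range, voxel_size):
    """Voxels per axis: (max - min) / voxel_size, rounded to the nearest int."""
    mins, maxs = point_cloud_range[:3], point_cloud_range[3:]
    return [round((hi - lo) / vs) for lo, hi, vs in zip(mins, maxs, voxel_size)]


def matches_model(point_cloud_range, voxel_size, expected_shape):
    """Return (ok, actual): ok is True when the grid equals the expected shape."""
    actual = grid_size(point_cloud_range, voxel_size)
    return actual == list(expected_shape), actual


# Illustrative values only (assumptions, not taken from this thread).
model_shape = [468, 468, 1]  # hypothetical shape a pillar-style model config expects
waymo_like = ([-74.88, -74.88, -2.0, 74.88, 74.88, 4.0], [0.32, 0.32, 6.0])
nuscenes_like = ([-51.2, -51.2, -5.0, 51.2, 51.2, 3.0], [0.1, 0.1, 0.2])

print(matches_model(*waymo_like, model_shape))
print(matches_model(*nuscenes_like, model_shape))
```

If the second check fails, voxel coordinates can exceed the shape the scatter layer allocates, which is one plausible cause of an out-of-range index. Either the range and voxel size or the shape entries in the model config would then need to be changed so that both agree.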

setup error

python setup.py develop
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
running develop
/opt/conda/envs/py37/lib/python3.7/site-packages/setuptools/command/easy_install.py:147: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
EasyInstallDeprecationWarning,
/opt/conda/envs/py37/lib/python3.7/site-packages/setuptools/command/install.py:37: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
setuptools.SetuptoolsDeprecationWarning,
running egg_info
creating pcdet.egg-info
writing pcdet.egg-info/PKG-INFO
writing dependency_links to pcdet.egg-info/dependency_links.txt
writing requirements to pcdet.egg-info/requires.txt
writing top-level names to pcdet.egg-info/top_level.txt
writing manifest file 'pcdet.egg-info/SOURCES.txt'
reading manifest file 'pcdet.egg-info/SOURCES.txt'
adding license file 'LICENSE'
writing manifest file 'pcdet.egg-info/SOURCES.txt'
running build_ext
building 'pcdet.ops.iou3d_nms.iou3d_nms_cuda' extension
creating /root/workspace/env_run/DSVT/build
creating /root/workspace/env_run/DSVT/build/temp.linux-x86_64-cpython-37
creating /root/workspace/env_run/DSVT/build/temp.linux-x86_64-cpython-37/pcdet
creating /root/workspace/env_run/DSVT/build/temp.linux-x86_64-cpython-37/pcdet/ops
creating /root/workspace/env_run/DSVT/build/temp.linux-x86_64-cpython-37/pcdet/ops/iou3d_nms
creating /root/workspace/env_run/DSVT/build/temp.linux-x86_64-cpython-37/pcdet/ops/iou3d_nms/src
Traceback (most recent call last):
File "setup.py", line 133, in
'src/ingroup_inds_kernel.cu',
File "/opt/conda/envs/py37/lib/python3.7/site-packages/setuptools/init.py", line 87, in setup
return distutils.core.setup(**attrs)
File "/opt/conda/envs/py37/lib/python3.7/site-packages/setuptools/_distutils/core.py", line 177, in setup
return run_commands(dist)
File "/opt/conda/envs/py37/lib/python3.7/site-packages/setuptools/_distutils/core.py", line 193, in run_commands
dist.run_commands()
File "/opt/conda/envs/py37/lib/python3.7/site-packages/setuptools/_distutils/dist.py", line 968, in run_commands
self.run_command(cmd)
File "/opt/conda/envs/py37/lib/python3.7/site-packages/setuptools/dist.py", line 1229, in run_command
super().run_command(command)
File "/opt/conda/envs/py37/lib/python3.7/site-packages/setuptools/_distutils/dist.py", line 987, in run_command
cmd_obj.run()
File "/opt/conda/envs/py37/lib/python3.7/site-packages/setuptools/command/develop.py", line 34, in run
self.install_for_development()
File "/opt/conda/envs/py37/lib/python3.7/site-packages/setuptools/command/develop.py", line 114, in install_for_development
self.run_command('build_ext')
File "/opt/conda/envs/py37/lib/python3.7/site-packages/setuptools/_distutils/cmd.py", line 317, in run_command
self.distribution.run_command(command)
File "/opt/conda/envs/py37/lib/python3.7/site-packages/setuptools/dist.py", line 1229, in run_command
super().run_command(command)
File "/opt/conda/envs/py37/lib/python3.7/site-packages/setuptools/_distutils/dist.py", line 987, in run_command
cmd_obj.run()
File "/opt/conda/envs/py37/lib/python3.7/site-packages/setuptools/command/build_ext.py", line 79, in run
_build_ext.run(self)
File "/opt/conda/envs/py37/lib/python3.7/site-packages/Cython/Distutils/old_build_ext.py", line 186, in run
_build_ext.build_ext.run(self)
File "/opt/conda/envs/py37/lib/python3.7/site-packages/setuptools/_distutils/command/build_ext.py", line 339, in run
self.build_extensions()
File "/opt/conda/envs/py37/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 708, in build_extensions
build_ext.build_extensions(self)
File "/opt/conda/envs/py37/lib/python3.7/site-packages/Cython/Distutils/old_build_ext.py", line 195, in build_extensions
_build_ext.build_ext.build_extensions(self)
File "/opt/conda/envs/py37/lib/python3.7/site-packages/setuptools/_distutils/command/build_ext.py", line 459, in build_extensions
self._build_extensions_serial()
File "/opt/conda/envs/py37/lib/python3.7/site-packages/setuptools/_distutils/command/build_ext.py", line 485, in _build_extensions_serial
self.build_extension(ext)
File "/opt/conda/envs/py37/lib/python3.7/site-packages/setuptools/command/build_ext.py", line 202, in build_extension
_build_ext.build_extension(self, ext)
File "/opt/conda/envs/py37/lib/python3.7/site-packages/setuptools/_distutils/command/build_ext.py", line 547, in build_extension
depends=ext.depends,
File "/opt/conda/envs/py37/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 524, in unix_wrap_ninja_compile
cuda_post_cflags = unix_cuda_flags(cuda_post_cflags)
File "/opt/conda/envs/py37/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 423, in unix_cuda_flags
cflags + _get_cuda_arch_flags(cflags))
File "/opt/conda/envs/py37/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1561, in _get_cuda_arch_flags
arch_list[-1] += '+PTX'
IndexError: list index out of range
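
The log starts with "No CUDA runtime is found", and the traceback ends at arch_list[-1] in _get_cuda_arch_flags. A likely reading is that no GPU was visible during the build, so the list of architectures was empty. The sketch below is a simplified model of that logic, not the actual PyTorch source. The function resolve_arch_list and its arguments are hypothetical; TORCH_CUDA_ARCH_LIST is the real environment variable that PyTorch's extension builder reads.

```python
import os


def resolve_arch_list(visible_capabilities, env=None):
    """Simplified model of picking CUDA architectures for an extension build.

    visible_capabilities: list of (major, minor) tuples for the visible GPUs.
    env: mapping used instead of os.environ (handy for testing).
    """
    env = os.environ if env is None else env
    requested = env.get("TORCH_CUDA_ARCH_LIST")
    if requested:
        # An explicit list bypasses GPU detection entirely.
        return requested.replace(" ", ";").split(";")
    arch_list = sorted(f"{major}.{minor}" for major, minor in visible_capabilities)
    arch_list[-1] += "+PTX"  # IndexError when no GPU is visible
    return arch_list


# No visible GPU and no override: reproduces the IndexError from the log.
try:
    resolve_arch_list([], env={})
except IndexError as exc:
    print("build would fail:", exc)

# With an explicit architecture list, detection is skipped.
print(resolve_arch_list([], env={"TORCH_CUDA_ARCH_LIST": "8.6"}))
```

In a shell this corresponds to setting TORCH_CUDA_ARCH_LIST before running python setup.py develop. The value 8.6 is only an example and depends on the target GPU. It is also worth checking why the CUDA runtime is not found, for example a CPU-only PyTorch build or a container started without GPU access.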

TensorRT show no improvement in inference speed

I deployed the DSVT model to TensorRT following your deployment code. Following the official TensorRT example code, I used dynamic shapes for the dsvt_block model inputs. Model inference takes about 260 ms, whereas the PyTorch version takes less, about 140 ms. Why is the TensorRT C++ code slower?

Environment
TensorRT Version: 8.5.1.7
CUDA Version: 11.8
CUDNN Version: 8.6
Hardware GPU: p4000
(the rest is the same as the public)

inference code

#include "trt_infer.h"
#include "cnpy.h"
TRTInfer::TRTInfer(TrtConfig trt_config): mEngine_(nullptr)
{
    // return;
    sum_cpy_feature_ = 0.0f;
    sum_cpy_output_ = 0.0f;
    count_ = 0;
    trt_config_ = trt_config;

    input_cpy_kind_ = cudaMemcpyHostToDevice;
    output_cpy_kind_ = cudaMemcpyDeviceToHost;

    build();

    CHECKCUDA(cudaStreamCreate(&stream_), "failed to create cuda stream");

    std::cout << "tensorrt init done." << std::endl;
}


bool TRTInfer::build()
{
    auto builder = SampleUniquePtr<nvinfer1::IBuilder>(nvinfer1::createInferBuilder(sample::gLogger.getTRTLogger()));
    if (!builder)
    {
        return false;
    }

    SampleUniquePtr<nvinfer1::IRuntime> runtime{createInferRuntime(sample::gLogger.getTRTLogger())};
    if (!runtime)
    {
        return false;
    }

    // CUDA stream used for profiling by the builder.
    auto profileStream = samplesCommon::makeCudaStream();
    if (!profileStream)
    {
        return false;
    }

    const auto explicitBatch = 1U << static_cast<uint32_t>(NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
    auto network = SampleUniquePtr<nvinfer1::INetworkDefinition>(builder->createNetworkV2(explicitBatch));
    if (!network)
    {
        return false;
    }

    auto config = SampleUniquePtr<nvinfer1::IBuilderConfig>(builder->createBuilderConfig());
    if (!config)
    {
        return false;
    }

    auto parser = SampleUniquePtr<nvonnxparser::IParser>(nvonnxparser::createParser(*network, sample::gLogger.getTRTLogger()));
    if (!parser)
    {
        return false;
    }

    // auto constructed = constructNetwork(builder, network, config, parser);
    // if (!constructed)
    // {
    //     return false;
    // }

    // replace constructNetwork with the following code:
    auto parsed = parser->parseFromFile(trt_config_.model_file.c_str(), static_cast<int>(sample::gLogger.getReportableSeverity()));
    if (!parsed)
    {
        return false;
    }

    for (int i = 0; i < network->getNbInputs(); i++) {
        std::cout << "network->getInput(i)->getDimensions(): " << network->getInput(i)->getDimensions() << std::endl;
        mInputDims.push_back(network->getInput(i)->getDimensions());
    }
    for (int i = 0; i < network->getNbOutputs(); i++) {
        mOutputDims.push_back(network->getOutput(i)->getDimensions());
    }

    config->setProfileStream(*profileStream);


    config->setAvgTimingIterations(1);
    config->setMinTimingIterations(1);
    config->setMaxWorkspaceSize(static_cast<size_t>(trt_config_.max_workspace)<<20);
    if (builder->platformHasFastFp16() && trt_config_.fp16mode)
    {
        config->setFlag(BuilderFlag::kFP16);
    }
    if (builder->platformHasFastInt8() && trt_config_.int8mode)
    {
        config->setFlag(BuilderFlag::kINT8);
        // samplesCommon::setAllDynamicRanges(network.get(), 127.0f, 127.0f); // in case use int8 without calibration
    }
    builder->setMaxBatchSize(1);
    
    std::unique_ptr<nvinfer1::IInt8Calibrator> calibrator;
    if (builder->platformHasFastInt8() && trt_config_.int8mode)
    {
        MNISTBatchStream calibrationStream(trt_config_.calib_data);
        calibrator.reset(new Int8EntropyCalibrator2<MNISTBatchStream>(calibrationStream, -1, trt_config_.net_name.c_str(), trt_config_.input_name.c_str()));
        config->setInt8Calibrator(calibrator.get());
    }

    IOptimizationProfile* profile = builder->createOptimizationProfile();
    profile->setDimensions("src", OptProfileSelector::kMIN, Dims2(1000,128));
    profile->setDimensions("src", OptProfileSelector::kOPT, Dims2(24629,128));
    profile->setDimensions("src", OptProfileSelector::kMAX, Dims2(100000,128));
    profile->setDimensions("set_voxel_inds_tensor_shift_0", OptProfileSelector::kMIN, Dims3(2,50,36));
    profile->setDimensions("set_voxel_inds_tensor_shift_0", OptProfileSelector::kOPT, Dims3(2,1156,36));
    profile->setDimensions("set_voxel_inds_tensor_shift_0", OptProfileSelector::kMAX, Dims3(2,5000,36));
    profile->setDimensions("set_voxel_inds_tensor_shift_1", OptProfileSelector::kMIN, Dims3(2,50,36));
    profile->setDimensions("set_voxel_inds_tensor_shift_1", OptProfileSelector::kOPT, Dims3(2,834,36));
    profile->setDimensions("set_voxel_inds_tensor_shift_1", OptProfileSelector::kMAX, Dims3(2,3200,36));
    profile->setDimensions("set_voxel_masks_tensor_shift_0", OptProfileSelector::kMIN, Dims3(2,50,36));
    profile->setDimensions("set_voxel_masks_tensor_shift_0", OptProfileSelector::kOPT, Dims3(2,1156,36));
    profile->setDimensions("set_voxel_masks_tensor_shift_0", OptProfileSelector::kMAX, Dims3(2,5000,36));
    profile->setDimensions("set_voxel_masks_tensor_shift_1", OptProfileSelector::kMIN, Dims3(2,50,36));
    profile->setDimensions("set_voxel_masks_tensor_shift_1", OptProfileSelector::kOPT, Dims3(2,834,36));
    profile->setDimensions("set_voxel_masks_tensor_shift_1", OptProfileSelector::kMAX, Dims3(2,3200,36));
    profile->setDimensions("pos_embed_tensor", OptProfileSelector::kMIN, Dims4(4,2,1000,128));
    profile->setDimensions("pos_embed_tensor", OptProfileSelector::kOPT, Dims4(4,2,24629,128));
    profile->setDimensions("pos_embed_tensor", OptProfileSelector::kMAX, Dims4(4,2,100000,128));
    config->addOptimizationProfile(profile);

    SampleUniquePtr<nvinfer1::IHostMemory> plan{builder->buildSerializedNetwork(*network, *config)};
    if (!plan)
    {
        return false;
    }

    mEngine_ = std::shared_ptr<nvinfer1::ICudaEngine>(runtime->deserializeCudaEngine(plan->data(), plan->size()), samplesCommon::InferDeleter());
    if (!mEngine_)
    {
        return false;
    }

    // Create RAII buffer manager object
    context_ = mEngine_->createExecutionContext();
    if (!context_)
    {
        return false;
    }

    return true;

}


void TRTInfer::doinference(std::vector<void*> &inputs, std::vector<float*> &outputs, std::vector<int> &input_dynamic)
{
   infer_dynamic(inputs, outputs, input_dynamic);
   cudaStreamSynchronize(stream_);
}


bool TRTInfer::infer_dynamic(std::vector<void*> &inputs, std::vector<float*> &outputs, std::vector<int> &input_dynamic)
{
    double t0 = getTime();
    mInputDims[0] = Dims2{input_dynamic[0], 128};
    mInputDims[1] = Dims3{2, input_dynamic[1], 36};
    mInputDims[2] = Dims3{2, input_dynamic[2], 36};
    mInputDims[3] = Dims3{2, input_dynamic[3], 36};
    mInputDims[4] = Dims3{2, input_dynamic[4], 36};
    mInputDims[5] = Dims4{4, 2, input_dynamic[5], 128};

    mInput[0].hostBuffer.resize(mInputDims[0]);
    mInput[1].hostBuffer.resize(mInputDims[1]);
    mInput[2].hostBuffer.resize(mInputDims[2]);
    mInput[3].hostBuffer.resize(mInputDims[3]);
    mInput[4].hostBuffer.resize(mInputDims[4]);
    mInput[5].hostBuffer.resize(mInputDims[5]);
    

    std::copy((float*)(inputs[0]), (float*)(inputs[0]) + input_dynamic[0] * 128, static_cast<float*>(mInput[0].hostBuffer.data()));
    std::copy((int*)inputs[1], (int*)inputs[1] + 2* input_dynamic[1] * 36, static_cast<int*>(mInput[1].hostBuffer.data()));
    std::copy((int*)inputs[2], (int*)inputs[2] + 2* input_dynamic[2] * 36, static_cast<int*>(mInput[2].hostBuffer.data()));
    std::copy((bool*)inputs[3], (bool*)inputs[3] + 2* input_dynamic[3] * 36, static_cast<bool*>(mInput[3].hostBuffer.data()));
    std::copy((bool*)inputs[4], (bool*)inputs[4] + 2* input_dynamic[4] * 36, static_cast<bool*>(mInput[4].hostBuffer.data()));
    std::copy((float*)inputs[5], (float*)inputs[5] + 4* 2* input_dynamic[5] * 128, static_cast<float*>(mInput[5].hostBuffer.data()));
    cudaStreamSynchronize(stream_);
    double t1 = getTime();

    mInput[0].deviceBuffer.resize(mInputDims[0]);
    mInput[1].deviceBuffer.resize(mInputDims[1]);
    mInput[2].deviceBuffer.resize(mInputDims[2]);
    mInput[3].deviceBuffer.resize(mInputDims[3]);
    mInput[4].deviceBuffer.resize(mInputDims[4]);
    mInput[5].deviceBuffer.resize(mInputDims[5]);

    CHECK(cudaMemcpy(mInput[0].deviceBuffer.data(), mInput[0].hostBuffer.data(), mInput[0].hostBuffer.nbBytes(), cudaMemcpyHostToDevice));
    CHECK(cudaMemcpy(mInput[1].deviceBuffer.data(), mInput[1].hostBuffer.data(), mInput[1].hostBuffer.nbBytes(), cudaMemcpyHostToDevice));
    CHECK(cudaMemcpy(mInput[2].deviceBuffer.data(), mInput[2].hostBuffer.data(), mInput[2].hostBuffer.nbBytes(), cudaMemcpyHostToDevice));
    CHECK(cudaMemcpy(mInput[3].deviceBuffer.data(), mInput[3].hostBuffer.data(), mInput[3].hostBuffer.nbBytes(), cudaMemcpyHostToDevice));
    CHECK(cudaMemcpy(mInput[4].deviceBuffer.data(), mInput[4].hostBuffer.data(), mInput[4].hostBuffer.nbBytes(), cudaMemcpyHostToDevice));
    CHECK(cudaMemcpy(mInput[5].deviceBuffer.data(), mInput[5].hostBuffer.data(), mInput[5].hostBuffer.nbBytes(), cudaMemcpyHostToDevice));
    cudaStreamSynchronize(stream_);
    double t2 = getTime();

    context_->setBindingDimensions(0, mInputDims[0]);
    context_->setBindingDimensions(1, mInputDims[1]);
    context_->setBindingDimensions(2, mInputDims[2]);
    context_->setBindingDimensions(3, mInputDims[3]);
    context_->setBindingDimensions(4, mInputDims[4]);
    context_->setBindingDimensions(5, mInputDims[5]);
    // context_->setBindingDimensions(6, mInputDims[6]);
    std::cout << "mEngine_->getNbBindings(): " << mEngine_->getNbBindings() << std::endl;
    std::cout << " mEngine_->getBindingDimensions(i)" <<  mEngine_->getBindingDimensions(0) << std::endl;
    std::cout << " context_->getBindingDimensions(i)" <<  context_->getBindingDimensions(0) << std::endl;
    cudaStreamSynchronize(stream_);
    double t3 = getTime();

    // We can only run inference once all dynamic input shapes have been specified.
    if (!context_->allInputDimensionsSpecified())
    {
        return false;
    }
    mOutputDims[0] = mInputDims[0];
    mOutput[0].deviceBuffer.resize(mOutputDims[0]);
    mOutput[0].hostBuffer.resize(mOutputDims[0]);
    std::vector<void*> processorBindings = {mInput[0].deviceBuffer.data(),
                                            mInput[1].deviceBuffer.data(),
                                            mInput[2].deviceBuffer.data(),
                                            mInput[3].deviceBuffer.data(),
                                            mInput[4].deviceBuffer.data(),
                                            mInput[5].deviceBuffer.data(),
                                            mOutput[0].deviceBuffer.data()};
    cudaStreamSynchronize(stream_);
    double t4 = getTime();
    bool status = context_->executeV2(processorBindings.data());
    if (!status)
    {
        return false;
    }
    cudaStreamSynchronize(stream_);
    double t5 = getTime();

    CHECK(cudaMemcpy(mOutput[0].hostBuffer.data(), mOutput[0].deviceBuffer.data(), mOutput[0].deviceBuffer.nbBytes(),
        cudaMemcpyDeviceToHost));
    cudaStreamSynchronize(stream_);
    double t6 = getTime();
    // cnpy::npy_save("dsvt_output_tensor.npy", static_cast<float*>(mOutput[0].hostBuffer.data()), {mOutput[0].deviceBuffer.nbBytes()/4},"w");
    std::cout << "time elapse:" << t1-t0 << std::endl;
    std::cout << "time elapse:" << t2-t1 << std::endl;
    std::cout << "time elapse:" << t3-t2 << std::endl;
    std::cout << "time elapse:" << t4-t3 << std::endl;
    std::cout << "time elapse:" << t5-t4 << std::endl;
    std::cout << "time elapse:" << t6-t5 << std::endl;
    return true;

}

According to the results, the average time cost of each stage is as follows:
t1-t0:0.00860953
t2-t1:0.0124242
t3-t2:4.72069e-05
t4-t3:8.10623e-06
t5-t4:0.260188
t6-t5:0.00110817

Why does the C++ code take more time? Are there mistakes in the inference code?
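
To see where the time goes, the per-stage averages above can be tallied as in the sketch below. The stage labels are my reading of what infer_dynamic does between each pair of timestamps and are assumptions, not labels from the original post.

```python
# Average time per stage in seconds, copied from the list above.
stages = {
    "t1-t0 host buffer resize and copy": 0.00860953,
    "t2-t1 device resize and host-to-device copy": 0.0124242,
    "t3-t2 setBindingDimensions": 4.72069e-05,
    "t4-t3 output buffer setup": 8.10623e-06,
    "t5-t4 executeV2": 0.260188,
    "t6-t5 device-to-host copy": 0.00110817,
}

total = sum(stages.values())
shares = {name: seconds / total for name, seconds in stages.items()}
slowest = max(stages, key=stages.get)

for name, seconds in stages.items():
    print(f"{name:45s} {seconds * 1e3:9.3f} ms  {shares[name]:6.1%}")
print(f"total {total * 1e3:.3f} ms, dominated by: {slowest}")
```

By this tally nearly all of the time is spent inside the engine execution itself, so the buffer copies and shape setup in the wrapper code account for only a small part of the gap to PyTorch.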

Error when python setup.py develop

Hi, my PyTorch version used to be 1.8.1, and with it I was able to run python setup.py develop successfully.
But the required version is greater than 1.9, so I created a new environment with pip install torch==1.10.0+cu111 torchvision==0.11.0+cu111 torchaudio==0.10.0 -f https://download.pytorch.org/whl/torch_stable.html, and then had problems compiling. The failure appears to be in pcdet/ops/ingroup_inds.

[2/2] /nvme/yanxiangchao/perl5/drivers/cuda-11.1/bin/nvcc -I/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include -I/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/nvme/yanxiangchao/anacon
da3/envs/test/lib/python3.9/site-packages/torch/include/TH -I/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/THC -I/nvme/yanxiangchao/perl5/drivers/cuda-11.1/include -I/nvme/yanxiangchao/anaconda3/envs/test/include/python3.9 -c -c /nvme/yanxi
angchao/perl5/pretrain/openmdf_dsvt/openmdf/pcdet/ops/ingroup_inds/src/ingroup_inds_kernel.cu -o /nvme/yanxiangchao/perl5/pretrain/openmdf_dsvt/openmdf/build/temp.linux-x86_64-cpython-39/pcdet/ops/ingroup_inds/src/ingroup_inds_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_N
O_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="gcc"' '-DPYBIND11_STDLIB="libstdcpp"' '-DPYBIND11_BUILD_ABI="cxxa
bi1011"' -DTORCH_EXTENSION_NAME=ingroup_inds_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++14
FAILED: /nvme/yanxiangchao/perl5/pretrain/openmdf_dsvt/openmdf/build/temp.linux-x86_64-cpython-39/pcdet/ops/ingroup_inds/src/ingroup_inds_kernel.o
/nvme/yanxiangchao/perl5/drivers/cuda-11.1/bin/nvcc -I/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include -I/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/nvme/yanxiangchao/anaconda3/en
vs/test/lib/python3.9/site-packages/torch/include/TH -I/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/THC -I/nvme/yanxiangchao/perl5/drivers/cuda-11.1/include -I/nvme/yanxiangchao/anaconda3/envs/test/include/python3.9 -c -c /nvme/yanxiangcha
o/perl5/pretrain/openmdf_dsvt/openmdf/pcdet/ops/ingroup_inds/src/ingroup_inds_kernel.cu -o /nvme/yanxiangchao/perl5/pretrain/openmdf_dsvt/openmdf/build/temp.linux-x86_64-cpython-39/pcdet/ops/ingroup_inds/src/ingroup_inds_kernel.o -D__CUDA_NO_HALF_OPERATORS -D__CUDA_NO_HALF
CONVERSIONS -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011
"' -DTORCH_EXTENSION_NAME=ingroup_inds_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++14
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptrtorch::nn::Module torch::nn::Cloneable::clone(const c10::optionalc10::Device&) const [with Derived =
torch::nn::CrossMapLRN2dImpl]’:
/tmp/tmpxft_0000854a_00000000-6_ingroup_inds_kernel.cudafe1.stub.c:4:27: required from here
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_str
ing, at::Tensor>&’
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >’ to type ‘torch::O
rderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >&’
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptrtorch::nn::Module torch::nn::Cloneable::clone(const c10::optionalc10::Device&) const [with Derived =
torch::nn::EmbeddingBagImpl]’:
/tmp/tmpxft_0000854a_00000000-6_ingroup_inds_kernel.cudafe1.stub.c:4:27: required from here
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_str
ing, at::Tensor>&’
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >’ to type ‘torch::O
rderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >&’
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptrtorch::nn::Module torch::nn::Cloneable::clone(const c10::optionalc10::Device&) const [with Derived =
torch::nn::EmbeddingImpl]’:
/tmp/tmpxft_0000854a_00000000-6_ingroup_inds_kernel.cudafe1.stub.c:4:27: required from here
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_str
ing, at::Tensor>&’
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >’ to type ‘torch::O
rderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >&’
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptrtorch::nn::Module torch::nn::Cloneable::clone(const c10::optionalc10::Device&) const [with Derived =
torch::nn::ParameterDictImpl]’:
/tmp/tmpxft_0000854a_00000000-6_ingroup_inds_kernel.cudafe1.stub.c:4:27: required from here
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_str
ing, at::Tensor>&’
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >’ to type ‘torch::O
rderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >&’
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptrtorch::nn::Module torch::nn::Cloneable::clone(const c10::optionalc10::Device&) const [with Derived =
torch::nn::SequentialImpl]’:
/tmp/tmpxft_0000854a_00000000-6_ingroup_inds_kernel.cudafe1.stub.c:4:27: required from here
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_str
ing, at::Tensor>&’
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >’ to type ‘torch::O
rderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >&’
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptrtorch::nn::Module torch::nn::Cloneable::clone(const c10::optionalc10::Device&) const [with Derived =
torch::nn::ModuleListImpl]’:
/tmp/tmpxft_0000854a_00000000-6_ingroup_inds_kernel.cudafe1.stub.c:4:27: required from here
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_str
ing, at::Tensor>&’
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >’ to type ‘torch::O
rderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >&’
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptrtorch::nn::Module torch::nn::Cloneable::clone(const c10::optionalc10::Device&) const [with Derived =
torch::nn::ModuleDictImpl]’:
/tmp/tmpxft_0000854a_00000000-6_ingroup_inds_kernel.cudafe1.stub.c:4:27: required from here
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_str
ing, at::Tensor>&’
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >’ to type ‘torch::O
rderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >&’
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptrtorch::nn::Module torch::nn::Cloneable::clone(const c10::optionalc10::Device&) const [with Derived =
torch::nn::TransformerDecoderImpl]’:
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_str
ing, at::Tensor>&’
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >’ to type ‘torch::O
rderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >&’
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptrtorch::nn::Module torch::nn::Cloneable::clone(const c10::optionalc10::Device&) const [with Derived =
torch::nn::TransformerEncoderImpl]’:
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_str
ing, at::Tensor>&’
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >’ to type ‘torch::O
rderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >&’
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1717, in _run_ninja_build
subprocess.run(
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/subprocess.py", line 528, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/nvme/yanxiangchao/perl5/pretrain/openmdf_dsvt/openmdf/setup.py", line 34, in
setup(
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/setuptools/init.py", line 87, in setup
return distutils.core.setup(**attrs)
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/setuptools/_distutils/core.py", line 185, in setup
return run_commands(dist)
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
dist.run_commands()
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 968, in run_commands
self.run_command(cmd)
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/setuptools/dist.py", line 1217, in run_command
super().run_command(command)
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 987, in run_command
cmd_obj.run()
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/setuptools/command/develop.py", line 34, in run
self.install_for_development()
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/setuptools/command/develop.py", line 114, in install_for_development
self.run_command('build_ext')
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/setuptools/_distutils/cmd.py", line 319, in run_command
self.distribution.run_command(command)
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/setuptools/dist.py", line 1217, in run_command
super().run_command(command)
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 987, in run_command
cmd_obj.run()
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/setuptools/command/build_ext.py", line 84, in run
_build_ext.run(self)
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 346, in run
self.build_extensions()
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 735, in build_extensions
build_ext.build_extensions(self)
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 466, in build_extensions
self._build_extensions_serial()
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 492, in _build_extensions_serial
self.build_extension(ext)
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/setuptools/command/build_ext.py", line 246, in build_extension
_build_ext.build_extension(self, ext)
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 547, in build_extension
objects = self.compiler.compile(
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 556, in unix_wrap_ninja_compile
_write_ninja_file_and_compile_objects(
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1399, in _write_ninja_file_and_compile_objects
_run_ninja_build(
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1733, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension

Data preparation, weights and TRT deployment

Thanks for sharing this amazing work!
Could you please provide the steps to generate the ground-truth (GT) data from the original Waymo dataset?
Are you willing to provide the pretrained models if a user sends proof of agreement to the Waymo dataset licence?
Do you plan to publish a sample script for TensorRT (TRT) deployment?

Thanks

Error in assigning the position embedding vector for multi-head self-attention

Thanks for your work. While studying your open-source code, I came across a question.

There seems to be a bug in how the position embedding vector is used inside the DSVT block.

The position embedding vector required for attention appears to be assigned incorrectly.

I have summarized the question in the images below.
This is an example of a case where DSVT is composed of one stage and each stage is designed with two blocks.

[Image: bug_0]

[Image: bug_1]

Thanks.

Why window sizes are set as multiples

Hi, thanks for your amazing paper and solid experiments. It proposes a fast, easy-to-deploy transformer backbone with remarkable performance.

I have a question about the hybrid window sizes. Why did you set the second window size to N times the first window size? With this choice, does the model skip the inter-window voxel relations between even adjacent window pairs of the first partition?
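
To make the question concrete, here is a minimal sketch of the situation being asked about. It is an illustration only, not code from the repository: it assumes 1-D voxel coordinates, hypothetical window sizes of 12 and 24 (so N = 2), and ignores any window shifting the actual model may apply. It lists the boundaries of each partition and shows which boundaries of the first partition remain boundaries in the second.

```python
# Hypothetical 1-D illustration: all sizes below are assumptions for this sketch.
def window_index(coord: int, window_size: int) -> int:
    """Index of the window that a 1-D voxel coordinate falls into."""
    return coord // window_size


def boundaries(extent: int, window_size: int) -> set:
    """Coordinates where a new window starts (excluding coordinate 0)."""
    return {
        c for c in range(1, extent)
        if window_index(c, window_size) != window_index(c - 1, window_size)
    }


extent = 48          # assumed length of the 1-D grid
small_window = 12    # assumed first window size
large_window = 24    # assumed second window size (2x the first)

small_b = boundaries(extent, small_window)   # boundaries of the first partition
large_b = boundaries(extent, large_window)   # boundaries of the second partition

bridged = small_b - large_b      # first-partition boundaries removed by the larger window
still_split = small_b & large_b  # boundaries shared by both partitions

print("first partition boundaries: ", sorted(small_b))
print("second partition boundaries:", sorted(large_b))
print("bridged by larger window:   ", sorted(bridged))
print("split in both partitions:   ", sorted(still_split))
```

In this toy setting every boundary of the larger partition is also a boundary of the smaller one, so voxels on either side of a shared boundary are never grouped into the same window by size alone.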

Deploy Problem 2

Hi, thanks for your great work!
I have another question about the PyTorch-ONNX-TensorRT conversion.

In the forward function of SetAttention, indexing is used at the beginning to retrieve values. However, when I deploy similar operations, TensorRT does not support this, because the indices are themselves a tensor that changes with the input, rather than fixed indices as in x = torch.tensor([1, 2]); x[0]. How did you solve this in your TensorRT engine? Could you please provide some guidance?

set_features = src[voxel_inds]

Looking forward to your reply.
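
For readers facing the same problem, here is a minimal sketch of one common rewrite. It is not the authors' confirmed solution. It expresses the tensor indexing as an explicit torch.index_select over flattened indices, which makes the gather operation explicit in the exported graph. The helper name gather_set_features and the shapes are assumptions for illustration: src is taken to be (num_voxels, channels) and voxel_inds an integer tensor of shape (num_sets, set_size). Whether a particular TensorRT version accepts the resulting graph is not verified here.

```python
import torch


def gather_set_features(src: torch.Tensor, voxel_inds: torch.Tensor) -> torch.Tensor:
    """Equivalent of src[voxel_inds] written with an explicit index_select.

    Hypothetical helper for illustration only.
    src:        (num_voxels, channels) feature tensor (assumed shape)
    voxel_inds: (num_sets, set_size) integer index tensor (assumed shape)
    """
    flat_inds = voxel_inds.reshape(-1)                # index_select needs a 1-D index
    gathered = torch.index_select(src, 0, flat_inds)  # (num_sets * set_size, channels)
    return gathered.reshape(*voxel_inds.shape, src.shape[-1])


torch.manual_seed(0)
src = torch.randn(6, 4)                              # toy features: 6 voxels, 4 channels
voxel_inds = torch.tensor([[0, 2, 5], [1, 1, 3]])    # toy indices: 2 sets of 3 voxels

set_features = gather_set_features(src, voxel_inds)

# The rewrite matches the original advanced-indexing expression.
assert torch.equal(set_features, src[voxel_inds])
print(set_features.shape)
```

The reshape steps only rearrange the view of the data, so the result has the same values and shape as the original indexing expression.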
