haohaonju / centerpoint

TensorRT deployment for CenterPoint Lidar Detection Model.

License: MIT License

C++ 46.16% CMake 6.23% Python 37.45% C 0.62% Cuda 3.02% Shell 0.15% Common Lisp 4.54% JavaScript 1.83% HTML 0.01%

3d-detection deployment export-onnx object-tracking tensorrt

centerpoint's Issues

tensorrt version

Which version of TensorRT are you using?

libpointpillars.so: undefined reference to `createNvOnnxParser_INTERNAL'

I ran make and got this error.

w, l, h order is incorrect

Shouldn't the ordering of width, length, and height be changed in the following line?

CenterPoint/src/postprocess.cpp

Line 284 in 7728be9

box.l = host_boxes[i + 3 * boxSizeAft];

to:

box.w = host_boxes[i + 3 * boxSizeAft];  // dx
box.l = host_boxes[i + 4 * boxSizeAft];  // dy
box.h = host_boxes[i + 5 * boxSizeAft];  // dz
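
The indexing in this snippet implies that host_boxes is stored attribute by attribute: attribute k of box i sits at index i + k * boxSizeAft. A minimal Python sketch of that layout follows. It assumes, as the proposed fix does, that attributes 3, 4, and 5 are dx, dy, and dz; the function name decode_box and the dictionary keys are illustrative and do not come from the repository.

```python
def decode_box(host_boxes, i, box_size_aft):
    """Read the size of box i from a flat, attribute-major buffer.

    Attribute k of box i is stored at host_boxes[i + k * box_size_aft].
    Mapping attributes 3..5 to w/l/h follows the fix proposed in this
    issue; it is an assumption, not confirmed repository behaviour.
    """
    def attr(k):
        return host_boxes[i + k * box_size_aft]

    return {"w": attr(3), "l": attr(4), "h": attr(5)}


# Toy buffer: 2 boxes, 6 attributes each, stored attribute by attribute.
# The value of attribute k for box i is k * 10 + i, so it is easy to check.
num_boxes = 2
buffer = [k * 10 + i for k in range(6) for i in range(num_boxes)]

print(decode_box(buffer, 0, num_boxes))
print(decode_box(buffer, 1, num_boxes))
```

With this layout, swapping the attribute indices 3 and 4 is the only change needed to exchange width and length.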

cudaErrorInvalidDeviceFunction

Hello, in samplecenterpoint.cpp I set params.load_engine = false
and provided the ONNX file paths by setting params.pfeOnnxFilePath and params.rpnOnnxFilePath.
Compilation succeeded, but when I ran the command "./centerpoint", something went wrong.
Please see the log below:

&&&& RUNNING TensorRT.sample_onnx_centerpoint [TensorRT v8001] # ./centerpoint
[04/11/2022-09:12:41] [I] Building and running a GPU inference engine for CenterPoint
[04/11/2022-09:12:41] [I] Building pfe engine . . .  
[04/11/2022-09:12:42] [I] [TRT] [MemUsageChange] Init CUDA: CPU +149, GPU +0, now: CPU 162, GPU 585 (MiB)
[04/11/2022-09:12:42] [I] ConstructNetwork !
[04/11/2022-09:12:42] [I] [TRT] ----------------------------------------------------------------
[04/11/2022-09:12:42] [I] [TRT] Input filename:   /home/wang/CenterPointTensorRT/pfe_baseline32000.onnx
[04/11/2022-09:12:42] [I] [TRT] ONNX IR version:  0.0.6
[04/11/2022-09:12:42] [I] [TRT] Opset version:    11
[04/11/2022-09:12:42] [I] [TRT] Producer name:    pytorch
[04/11/2022-09:12:42] [I] [TRT] Producer version: 1.9
[04/11/2022-09:12:42] [I] [TRT] Domain:           
[04/11/2022-09:12:42] [I] [TRT] Model version:    0
[04/11/2022-09:12:42] [I] [TRT] Doc string:       
[04/11/2022-09:12:42] [I] [TRT] ----------------------------------------------------------------
[04/11/2022-09:12:42] [W] [TRT] onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[04/11/2022-09:12:42] [I] [TRT] MatMul_0: broadcasting input1 to make tensors conform, dims(input0)=[32000,20,10][NONE] dims(input1)=[1,10,32][NONE].
[04/11/2022-09:12:42] [I] [TRT] MatMul_0: broadcasting input1 to make tensors conform, dims(input0)=[32000,20,10][NONE] dims(input1)=[1,10,32][NONE].
[04/11/2022-09:12:42] [I] [TRT] MatMul_0: broadcasting input1 to make tensors conform, dims(input0)=[32000,20,10][NONE] dims(input1)=[1,10,32][NONE].
[04/11/2022-09:12:43] [I] [TRT] MatMul_0: broadcasting input1 to make tensors conform, dims(input0)=[32000,20,10][NONE] dims(input1)=[1,10,32][NONE].
[04/11/2022-09:12:43] [I] [TRT] MatMul_18: broadcasting input1 to make tensors conform, dims(input0)=[32000,20,64][NONE] dims(input1)=[1,64,64][NONE].
[04/11/2022-09:12:43] [I] [TRT] MatMul_0: broadcasting input1 to make tensors conform, dims(input0)=[32000,20,10][NONE] dims(input1)=[1,10,32][NONE].
[04/11/2022-09:12:43] [I] [TRT] MatMul_18: broadcasting input1 to make tensors conform, dims(input0)=[32000,20,64][NONE] dims(input1)=[1,64,64][NONE].
[04/11/2022-09:12:43] [I] [TRT] [MemUsageSnapshot] Builder begin: CPU 162 MiB, GPU 585 MiB
[04/11/2022-09:12:44] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +215, GPU +85, now: CPU 377, GPU 670 (MiB)
[04/11/2022-09:12:45] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +175, GPU +90, now: CPU 552, GPU 760 (MiB)
[04/11/2022-09:12:45] [W] [TRT] Detected invalid timing cache, setup a local cache instead
[04/11/2022-09:12:55] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[04/11/2022-09:12:55] [I] [TRT] Total Host Persistent Memory: 928
[04/11/2022-09:12:55] [I] [TRT] Total Device Persistent Memory: 0
[04/11/2022-09:12:55] [I] [TRT] Total Scratch Memory: 0
[04/11/2022-09:12:55] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 0 MiB, GPU 256 MiB
[04/11/2022-09:12:55] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 555, GPU 768 (MiB)
[04/11/2022-09:12:55] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 555, GPU 778 (MiB)
[04/11/2022-09:12:55] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 555, GPU 762 (MiB)
[04/11/2022-09:12:55] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 555, GPU 746 (MiB)
[04/11/2022-09:12:55] [I] [TRT] [MemUsageSnapshot] Builder end: CPU 555 MiB, GPU 746 MiB
[04/11/2022-09:12:55] [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 555, GPU 742 (MiB)
[04/11/2022-09:12:55] [I] Create ICudaEngine  !
[04/11/2022-09:12:55] [I] [TRT] Loaded engine size: 0 MB
[04/11/2022-09:12:55] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine begin: CPU 555 MiB, GPU 742 MiB
[04/11/2022-09:12:55] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 555, GPU 750 (MiB)
[04/11/2022-09:12:55] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 555, GPU 758 (MiB)
[04/11/2022-09:12:55] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 555, GPU 742 (MiB)
[04/11/2022-09:12:55] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine end: CPU 555 MiB, GPU 742 MiB
[04/11/2022-09:12:55] [I] getNbInputs: 1 

[04/11/2022-09:12:55] [I] getNbOutputs: 1 

[04/11/2022-09:12:55] [I] getNbOutputs Name: 47 

[04/11/2022-09:12:55] [I] Building rpn engine . . .  
[04/11/2022-09:12:55] [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 555, GPU 742 (MiB)
[04/11/2022-09:12:55] [I] ConstructNetwork !
[04/11/2022-09:12:55] [I] [TRT] ----------------------------------------------------------------
[04/11/2022-09:12:55] [I] [TRT] Input filename:   /home/wang/CenterPointTensorRT/rpn_baseline.onnx
[04/11/2022-09:12:55] [I] [TRT] ONNX IR version:  0.0.6
[04/11/2022-09:12:55] [I] [TRT] Opset version:    10
[04/11/2022-09:12:55] [I] [TRT] Producer name:    pytorch
[04/11/2022-09:12:55] [I] [TRT] Producer version: 1.9
[04/11/2022-09:12:55] [I] [TRT] Domain:           
[04/11/2022-09:12:55] [I] [TRT] Model version:    0
[04/11/2022-09:12:55] [I] [TRT] Doc string:       
[04/11/2022-09:12:55] [I] [TRT] ----------------------------------------------------------------
[04/11/2022-09:12:55] [W] [TRT] Tensor DataType is determined at build time for tensors not marked as input or output.
[04/11/2022-09:12:55] [I] [TRT] [MemUsageSnapshot] Builder begin: CPU 572 MiB, GPU 742 MiB
[04/11/2022-09:12:55] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 575, GPU 750 (MiB)
[04/11/2022-09:12:55] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 575, GPU 758 (MiB)
[04/11/2022-09:12:55] [W] [TRT] Detected invalid timing cache, setup a local cache instead
[04/11/2022-09:13:11] [F] [TRT] [virtualMemoryBuffer.cpp::resizePhysical::79] Error Code 2: OutOfMemory (no further information)
[04/11/2022-09:13:11] [F] [TRT] [virtualMemoryBuffer.cpp::resizePhysical::65] Error Code 2: OutOfMemory (no further information)
[04/11/2022-09:13:11] [W] [TRT] -------------- The current system memory allocations dump as below --------------
[0x55b6e4bd5350]:151 :ScratchObject in storeCachedObject: at optimizer/gpu/cudnn/convolutionBuilder.cpp: 166 idx: 2151 time: 1.127e-06
[0x55b6e4bd20d0]:151 :ScratchObject in storeCachedObject: at optimizer/gpu/cudnn/convolutionBuilder.cpp: 166 idx: 2148 time: 9.33e-07
[0x55b6e4bd1d30]:151 :ScratchObject in storeCachedObject: at optimizer/gpu/cudnn/convolutionBuilder.cpp: 166 idx: 2145 time: 8e-07
[0x55b6e4bd1b60]:151 :ScratchObject in storeCachedObject: at optimizer/gpu/cudnn/convolutionBuilder.cpp: 166 idx: 2142 time: 1.128e-06
[0x55b6dab9b5b0]:151 :ScratchObject in storeCachedObject: at optimizer/gpu/cudnn/convolutionBuilder.cpp: 166 idx: 2139 time: 2.12e-07
[0x55b6dab9b2a0]:151 :ScratchObject in storeCachedObject: at optimizer/gpu/cudnn/convolutionBuilder.cpp: 166 idx: 2136 time: 1.91e-07
[0x55b6db2f3740]:4 :: weight scales in internalAllocate: at runtime/common/weightsPtr.cpp: 100 idx: 27 time: 7.9e-08
[0x55b6db03be30]:1280 :Conv Aspect merge bias in internalAllocate: at runtime/common/weightsPtr.cpp: 100 idx: 33 time: 8.2e-07
[0x55b6db041330]:151 :ScratchObject in storeCachedObject: at optimizer/gpu/cudnn/convolutionBuilder.cpp: 166 idx: 1067 time: 5.7e-07
[0x55b6db01ed60]:4 :: weight scales in internalAllocate: at runtime/common/weightsPtr.cpp: 100 idx: 24 time: 7.5e-08
[0x55b6db4a7750]:4 :: weight scales in internalAllocate: at runtime/common/weightsPtr.cpp: 100 idx: 13 time: 9.5e-08
[0x55b6db0eb700]:4 :: weight scales in internalAllocate: at runtime/common/weightsPtr.cpp: 100 idx: 21 time: 7.8e-08
[0x55b6db071d40]:4 :: weight scales in internalAllocate: at runtime/common/weightsPtr.cpp: 100 idx: 22 time: 7.3e-08
[0x55b6c16af680]:4 :: weight scales in internalAllocate: at runtime/common/weightsPtr.cpp: 100 idx: 15 time: 7.5e-08
[0x55b6db075390]:4 :: weight scales in internalAllocate: at runtime/common/weightsPtr.cpp: 100 idx: 12 time: 9.7e-08
[0x55b6e4bcd490]:151 :ScratchObject in storeCachedObject: at optimizer/gpu/cudnn/convolutionBuilder.cpp: 166 idx: 2133 time: 1.45e-07
[0x55b6dab0a390]:151 :ScratchObject in storeCachedObject: at optimizer/gpu/cudnn/convolutionBuilder.cpp: 166 idx: 35 time: 6.95e-07
[0x55b6c15cfe90]:4 :: weight scales in internalAllocate: at runtime/common/weightsPtr.cpp: 100 idx: 17 time: 7.7e-08
[0x55b6db0af810]:4 :: weight scales in internalAllocate: at runtime/common/weightsPtr.cpp: 100 idx: 25 time: 8.1e-08
[0x55b6db2f5830]:4 :: weight scales in internalAllocate: at runtime/common/weightsPtr.cpp: 100 idx: 11 time: 8.1e-08
[0x55b6dab8b1c0]:151 :ScratchObject in storeCachedObject: at optimizer/gpu/cudnn/convolutionBuilder.cpp: 166 idx: 50 time: 1.311e-06
[0x55b6dab99760]:151 :ScratchObject in storeCachedObject: at optimizer/gpu/cudnn/convolutionBuilder.cpp: 166 idx: 422 time: 2.69e-07
[0x55b6db0aa730]:4 :: weight scales in internalAllocate: at runtime/common/weightsPtr.cpp: 100 idx: 26 time: 7.9e-08
[0x55b6e4bca430]:151 :ScratchObject in storeCachedObject: at optimizer/gpu/cudnn/convolutionBuilder.cpp: 166 idx: 1750 time: 1.45e-07
[0x55b6db2da8c0]:4 :: weight scales in internalAllocate: at runtime/common/weightsPtr.cpp: 100 idx: 28 time: 7.4e-08
[0x55b6dab9b970]:151 :ScratchObject in storeCachedObject: at optimizer/gpu/cudnn/convolutionBuilder.cpp: 166 idx: 428 time: 8.35e-07
[0x55b6e3d0ed20]:151 :ScratchObject in storeCachedObject: at optimizer/gpu/cudnn/convolutionBuilder.cpp: 166 idx: 749 time: 2.53e-07
[0x55b6db4dad80]:4 :: weight scales in internalAllocate: at runtime/common/weightsPtr.cpp: 100 idx: 5 time: 7.5e-08
[0x55b6d9e8b930]:4 :: weight scales in internalAllocate: at runtime/common/weightsPtr.cpp: 100 idx: 7 time: 8.5e-08
[0x55b6db06e7d0]:4 :: weight scales in internalAllocate: at runtime/common/weightsPtr.cpp: 100 idx: 9 time: 8.3e-08
[0x55b6e4bca5c0]:151 :ScratchObject in storeCachedObject: at optimizer/gpu/cudnn/convolutionBuilder.cpp: 166 idx: 1444 time: 1.75e-07
[0x55b6db1218c0]:262144 :Layer Aspects merge kernel weights in internalAllocate: at runtime/common/weightsPtr.cpp: 100 idx: 30 time: 3.11e-06
[0x55b6db03d3a0]:4 :: weight scales in internalAllocate: at runtime/common/weightsPtr.cpp: 100 idx: 3 time: 3.2e-08
[0x55b6da957780]:737280 :Conv Aspect merge weights in internalAllocate: at runtime/common/weightsPtr.cpp: 100 idx: 32 time: 5.72e-07
[0x55b6da70fff0]:2097152 :Layer Aspects merge kernel weights in internalAllocate: at runtime/common/weightsPtr.cpp: 100 idx: 31 time: 5.51e-07
[0x55b6db066b60]:4 :: weight scales in internalAllocate: at runtime/common/weightsPtr.cpp: 100 idx: 10 time: 7e-08
[0x55b6dab9bbe0]:151 :ScratchObject in storeCachedObject: at optimizer/gpu/cudnn/convolutionBuilder.cpp: 166 idx: 431 time: 7.78e-07
[0x55b6db2d9290]:4 :: weight scales in internalAllocate: at runtime/common/weightsPtr.cpp: 100 idx: 14 time: 8.3e-08
[0x55b6e3d0efa0]:151 :ScratchObject in storeCachedObject: at optimizer/gpu/cudnn/convolutionBuilder.cpp: 166 idx: 752 time: 1.51e-07
[0x55b6db0429c0]:151 :ScratchObject in storeCachedObject: at optimizer/gpu/cudnn/convolutionBuilder.cpp: 166 idx: 1076 time: 9.48e-07
[0x55b6db2f3980]:4 :: weight scales in internalAllocate: at runtime/common/weightsPtr.cpp: 100 idx: 29 time: 9e-08
[0x55b6daf94d50]:4 :: weight scales in internalAllocate: at runtime/common/weightsPtr.cpp: 100 idx: 18 time: 9.5e-08
[0x55b6ceefdff0]:4 :: weight scales in internalAllocate: at runtime/common/weightsPtr.cpp: 100 idx: 6 time: 7.1e-08
[0x55b6e4bce890]:151 :ScratchObject in storeCachedObject: at optimizer/gpu/cudnn/convolutionBuilder.cpp: 166 idx: 1762 time: 7.17e-07
[0x55b6db03cea0]:4 :: weight scales in internalAllocate: at runtime/common/weightsPtr.cpp: 100 idx: 0 time: 1.12e-07
[0x55b6dab8a040]:151 :ScratchObject in storeCachedObject: at optimizer/gpu/cudnn/convolutionBuilder.cpp: 166 idx: 44 time: 6.91e-07
[0x55b6db07de90]:4 :: weight scales in internalAllocate: at runtime/common/weightsPtr.cpp: 100 idx: 4 time: 1.24e-07
[0x55b6db30bc10]:4 :: weight scales in internalAllocate: at runtime/common/weightsPtr.cpp: 100 idx: 1 time: 5.5e-08
[0x55b6dab89e50]:151 :ScratchObject in storeCachedObject: at optimizer/gpu/cudnn/convolutionBuilder.cpp: 166 idx: 38 time: 9.59e-07
[0x55b6dab8b500]:151 :ScratchObject in storeCachedObject: at optimizer/gpu/cudnn/convolutionBuilder.cpp: 166 idx: 41 time: 1.187e-06
[0x55b6dab8a630]:151 :ScratchObject in storeCachedObject: at optimizer/gpu/cudnn/convolutionBuilder.cpp: 166 idx: 47 time: 7.69e-07
[0x55b6dab8b360]:151 :ScratchObject in storeCachedObject: at optimizer/gpu/cudnn/convolutionBuilder.cpp: 166 idx: 53 time: 6.13e-07
[0x55b6db4774d0]:4 :: weight scales in internalAllocate: at runtime/common/weightsPtr.cpp: 100 idx: 16 time: 7.9e-08
[0x55b6dab8acb0]:151 :ScratchObject in storeCachedObject: at optimizer/gpu/cudnn/convolutionBuilder.cpp: 166 idx: 416 time: 2.23e-07
[0x55b6e4bca520]:151 :ScratchObject in storeCachedObject: at optimizer/gpu/cudnn/convolutionBuilder.cpp: 166 idx: 1441 time: 2.25e-07
[0x55b6dab9ad70]:151 :ScratchObject in storeCachedObject: at optimizer/gpu/cudnn/convolutionBuilder.cpp: 166 idx: 1753 time: 2.01e-07
[0x55b6c16c4510]:4 :: weight scales in internalAllocate: at runtime/common/weightsPtr.cpp: 100 idx: 20 time: 9.9e-08
[0x55b6e4bcdb70]:151 :ScratchObject in storeCachedObject: at optimizer/gpu/cudnn/convolutionBuilder.cpp: 166 idx: 1765 time: 8.13e-07
[0x55b6db2f28c0]:151 :ScratchObject in storeCachedObject: at optimizer/gpu/cudnn/convolutionBuilder.cpp: 166 idx: 419 time: 1.79e-07
[0x55b6e3d0d790]:151 :ScratchObject in storeCachedObject: at optimizer/gpu/cudnn/convolutionBuilder.cpp: 166 idx: 425 time: 8.89e-07
[0x55b6db452960]:4 :: weight scales in internalAllocate: at runtime/common/weightsPtr.cpp: 100 idx: 23 time: 8.3e-08
[0x55b6e3d0eb30]:151 :ScratchObject in storeCachedObject: at optimizer/gpu/cudnn/convolutionBuilder.cpp: 166 idx: 434 time: 3.45e-07
[0x55b6e3d0edc0]:151 :ScratchObject in storeCachedObject: at optimizer/gpu/cudnn/convolutionBuilder.cpp: 166 idx: 755 time: 1.42e-07
[0x55b6db0422f0]:151 :ScratchObject in storeCachedObject: at optimizer/gpu/cudnn/convolutionBuilder.cpp: 166 idx: 764 time: 6.61e-07
[0x55b6e4bccf60]:151 :ScratchObject in storeCachedObject: at optimizer/gpu/cudnn/convolutionBuilder.cpp: 166 idx: 1759 time: 1.113e-06
[0x55b6db041ca0]:151 :ScratchObject in storeCachedObject: at optimizer/gpu/cudnn/convolutionBuilder.cpp: 166 idx: 758 time: 1.117e-06
[0x55b6e3d0f140]:151 :ScratchObject in storeCachedObject: at optimizer/gpu/cudnn/convolutionBuilder.cpp: 166 idx: 761 time: 6.96e-07
[0x55b6e4bc97a0]:151 :ScratchObject in storeCachedObject: at optimizer/gpu/cudnn/convolutionBuilder.cpp: 166 idx: 1447 time: 1.69e-07
[0x55b6db042410]:151 :ScratchObject in storeCachedObject: at optimizer/gpu/cudnn/convolutionBuilder.cpp: 166 idx: 767 time: 6.52e-07
[0x55b6e4bc9a30]:151 :ScratchObject in storeCachedObject: at optimizer/gpu/cudnn/convolutionBuilder.cpp: 166 idx: 1450 time: 5.76e-07
[0x55b6db03d350]:4 :: weight scales in internalAllocate: at runtime/common/weightsPtr.cpp: 100 idx: 2 time: 3.1e-08
[0x55b6db0411f0]:151 :ScratchObject in storeCachedObject: at optimizer/gpu/cudnn/convolutionBuilder.cpp: 166 idx: 1058 time: 1.08e-07
[0x55b6e4bc9bd0]:151 :ScratchObject in storeCachedObject: at optimizer/gpu/cudnn/convolutionBuilder.cpp: 166 idx: 1453 time: 6.98e-07
[0x55b6db041580]:151 :ScratchObject in storeCachedObject: at optimizer/gpu/cudnn/convolutionBuilder.cpp: 166 idx: 1061 time: 1.76e-07
[0x55b6db041290]:151 :ScratchObject in storeCachedObject: at optimizer/gpu/cudnn/convolutionBuilder.cpp: 166 idx: 1064 time: 1.61e-07
[0x55b6db458e40]:4 :: weight scales in internalAllocate: at runtime/common/weightsPtr.cpp: 100 idx: 19 time: 1.02e-07
[0x55b6e4bbd490]:151 :ScratchObject in storeCachedObject: at optimizer/gpu/cudnn/convolutionBuilder.cpp: 166 idx: 1073 time: 1.34e-07
[0x55b6e4bc9f90]:151 :ScratchObject in storeCachedObject: at optimizer/gpu/cudnn/convolutionBuilder.cpp: 166 idx: 1456 time: 7.06e-07
[0x55b6e4bca100]:151 :ScratchObject in storeCachedObject: at optimizer/gpu/cudnn/convolutionBuilder.cpp: 166 idx: 1459 time: 5.64e-07
[0x55b6dab9b340]:151 :ScratchObject in storeCachedObject: at optimizer/gpu/cudnn/convolutionBuilder.cpp: 166 idx: 1756 time: 1.67e-07
[0x55b6e4bcdd10]:151 :ScratchObject in storeCachedObject: at optimizer/gpu/cudnn/convolutionBuilder.cpp: 166 idx: 1768 time: 7.89e-07
[0x55b6db0ea9e0]:4 :: weight scales in internalAllocate: at runtime/common/weightsPtr.cpp: 100 idx: 8 time: 9.3e-08
[0x55b6e3d0ef00]:151 :ScratchObject in storeCachedObject: at optimizer/gpu/cudnn/convolutionBuilder.cpp: 166 idx: 1070 time: 1.33e-07
-------------- The current device memory allocations dump as below --------------
[0]:4294967296 :HybridGlobWriter in reserveRegion: at optimizer/common/globWriter.cpp: 246 idx: 6 time: 0.000150233
[0x5021d6200]:18432 :GpuGlob deserialization in load: at runtime/deserialization/safeDeserialize.cpp: 349 idx: 4 time: 1.4569e-05
[0x504200000]:134217728 :HybridGlobWriter in reserveRegion: at optimizer/common/globWriter.cpp: 246 idx: 5 time: 0.000363279
[04/11/2022-09:13:11] [E] [TRT] Requested amount of GPU memory (4294967296 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[04/11/2022-09:13:11] [W] [TRT] Skipping tactic 2 due to oom error on requested size of 4294967296 detected for tactic 2.
Try decreasing the workspace size with IBuilderConfig::setMaxWorkspaceSize().
[04/11/2022-09:13:46] [I] [TRT] Detected 1 inputs and 6 output network tensors.
[04/11/2022-09:13:46] [I] [TRT] Total Host Persistent Memory: 17536
[04/11/2022-09:13:46] [I] [TRT] Total Device Persistent Memory: 51033088
[04/11/2022-09:13:46] [I] [TRT] Total Scratch Memory: 140175360
[04/11/2022-09:13:46] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 5 MiB, GPU 4224 MiB
[04/11/2022-09:13:46] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 753, GPU 881 (MiB)
[04/11/2022-09:13:46] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 753, GPU 889 (MiB)
[04/11/2022-09:13:46] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 753, GPU 873 (MiB)
[04/11/2022-09:13:46] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 752, GPU 857 (MiB)
[04/11/2022-09:13:46] [I] [TRT] [MemUsageSnapshot] Builder end: CPU 752 MiB, GPU 857 MiB
[04/11/2022-09:13:46] [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 800, GPU 805 (MiB)
[04/11/2022-09:13:46] [I] Create ICudaEngine  !
[04/11/2022-09:13:46] [I] [TRT] Loaded engine size: 51 MB
[04/11/2022-09:13:46] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine begin: CPU 800 MiB, GPU 805 MiB
[04/11/2022-09:13:46] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 800, GPU 864 (MiB)
[04/11/2022-09:13:46] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +1, GPU +8, now: CPU 801, GPU 872 (MiB)
[04/11/2022-09:13:46] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 800, GPU 856 (MiB)
[04/11/2022-09:13:46] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine end: CPU 800 MiB, GPU 856 MiB
[04/11/2022-09:13:46] [I] getNbInputs: 1 

[04/11/2022-09:13:46] [I] getNbOutputs: 6 

[04/11/2022-09:13:46] [I] getNbOutputs Name: 246 

[04/11/2022-09:13:46] [I] All has Built !  
[04/11/2022-09:13:46] [I] Creating pfe context 
[04/11/2022-09:13:46] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation begin: CPU 761 MiB, GPU 889 MiB
[04/11/2022-09:13:46] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 761, GPU 897 (MiB)
[04/11/2022-09:13:46] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 761, GPU 905 (MiB)
[04/11/2022-09:13:46] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation end: CPU 761 MiB, GPU 1299 MiB
[04/11/2022-09:13:46] [I] Creating rpn context 
[04/11/2022-09:13:46] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation begin: CPU 823 MiB, GPU 1362 MiB
[04/11/2022-09:13:46] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 823, GPU 1370 (MiB)
[04/11/2022-09:13:46] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 823, GPU 1378 (MiB)
[04/11/2022-09:13:46] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation end: CPU 823 MiB, GPU 2109 MiB
===========FilePath[0/10]:../../lidars/seq_0_frame_100.bin==============
[04/11/2022-09:13:47] [I] [INFO] pointNum : 177125
Success to read and Point Num  Is: 177125
terminate called after throwing an instance of 'thrust::system::system_error'
  what():  radix_sort: failed on 1st step: cudaErrorInvalidDeviceFunction: invalid device function
Aborted (core dumped)

Is this an error in bin file creation?

First of all, thank you for your project.

I converted waymo dataset to pkl file according to this guide.
And I created a bin file with your "generate_input_data.py".

When I used these bin files, the detection results were not as good as your sample results.
By contrast, the detection results on your samples (seq_0_frame_101.bin and seq_0_frame_100.bin) were very good.

Is this an error in bin file creation?
If my PC environment were the problem, the detection results on seq_0_frame_101.bin and seq_0_frame_100.bin would not match the sample results either.

The environment in which I created the bin files is listed below.
I have also attached the bin file I used and its result.

dataset: waymo open dataset v1.2.0
Waymo-open-dataset devkit: waymo-open-dataset-tf-2.11.0==1.5.0
Command
./build/centerpoint --pfeOnnxPath=models/pfe_baseline32000.onnx --rpnOnnxPath=models/rpn_baseline.onnx --savePath=results --filePath=lidars --fp16

seq_0_frame_197.bin.txt
seq_0_frame_197.zip

performance of the tensorrt model on Waymo

Hello, I am trying to evaluate the performance of the provided TensorRT model on the Waymo validation set, but I am not getting reasonable numbers. The data I used comes from the preprocessed version of https://github.com/tianweiy/CenterPoint, which is also in *.bin format.
Is the provided model intended only for measuring running time, and not mAP?

Hey, I have written a python script to show the detection results

If anyone needs to show the detection results, you can find the python script here:

https://github.com/xiaxinkai/CenterPointTensorRTDisplay

Thank you!

File Size Error! 1

pointNum : {51916}[01/18/2022-11:34:29] [E] [Error] File Size Error! 1038336
[01/18/2022-11:34:29] [I] [INFO] pointNum : 51916
Success to read and Point Num Is: 51916
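
The two numbers in this log can be checked against each other. A small sketch, assuming each point is stored as consecutive float32 values and that the loader expects 5 features per point (the point dimension mentioned in the "Question About PointDim Waymo" issue); the function name split_file_size is illustrative:

```python
BYTES_PER_FLOAT32 = 4


def split_file_size(num_bytes, point_dim=5):
    """Return (whole points, leftover bytes) for a raw float32 point file.

    point_dim=5 is an assumption taken from the PointDim issue on this
    page; a leftover other than 0 means the file size does not match it.
    """
    bytes_per_point = point_dim * BYTES_PER_FLOAT32
    return divmod(num_bytes, bytes_per_point)


# File size reported in the error message above.
print(split_file_size(1038336))      # 5 features per point
print(split_file_size(1038336, 4))   # 4 features per point
```

Under these assumptions the file holds 51916 whole points with 16 bytes left over, which matches the reported pointNum and may be what triggers the error. The same size divides evenly into 4-feature points, so the file may have been written with a different number of features per point.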

backbone when exporting to onnx

Is backbone included when exporting to onnx? I can't seem to find it.

inference time about int8 pfe

Thank you for your excellent work, @Abraham423

In the Computation Speed section of README.md, I noticed that int8 mode doesn't run faster than fp32/fp16 mode for the pfe module.

Do you know the reason?

convert model core dumped

Hello,
When converting the models from ONNX to engine files, pfe converts successfully, but fpn does not.
[03/23/2023-15:04:59] [TRT] [W] TensorRT encountered issues when converting weights between types and that could affect accuracy.
[03/23/2023-15:04:59] [TRT] [W] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
[03/23/2023-15:04:59] [TRT] [W] Check verbose logs for the list of affected weights.
[03/23/2023-15:04:59] [TRT] [W] - 41 weights are affected by this issue: Detected subnormal FP16 values.
[03/23/2023-15:04:59] [TRT] [W] - 21 weights are affected by this issue: Detected values less than smallest positive FP16 subnormal value and converted them to the FP16 minimum subnormalized value.
deserialize the engine . . .
[03/23/2023-15:04:59] [TRT] [W] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage. See CUDA_MODULE_LOADING in https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars
context_rpn <tensorrt.tensorrt.IExecutionContext object at 0x7f77521fa458>
Segmentation fault (core dumped)

When I run the C++ TensorRT version, the fpn also core dumps.

ONNX model incorrect!

This is the visualization of your pfe ONNX model.

This is the visualization of my pfe ONNX model.

I think my model is not correct. Why did my steps produce an incorrect model?

Question About PointDim Waymo

Hi, I'm new to point cloud processing. Why is the point dim here 5? Is it x, y, z, i, and is the 5th feature elongation?
Also, do you have any idea what to put in that feature if the lidar doesn't provide elongation?
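
One possible workaround, not confirmed by the repository author, is to pad the missing fifth feature with a constant. A minimal sketch, assuming float32 points stored as consecutive (x, y, z, intensity) records and a fill value of 0.0; the function name pad_points and the fill value are both assumptions, and a model trained with a real fifth feature may lose accuracy on padded input:

```python
from array import array


def pad_points(flat_points, in_dim=4, out_dim=5, fill=0.0):
    """Append constant features so each point has out_dim values.

    flat_points is a flat sequence of floats, in_dim values per point.
    The fill value is an assumption, not something the model is known
    to expect.
    """
    if len(flat_points) % in_dim != 0:
        raise ValueError("input length is not a multiple of in_dim")
    padded = array("f")  # 'f' is a 4-byte float
    for start in range(0, len(flat_points), in_dim):
        padded.extend(flat_points[start:start + in_dim])
        padded.extend([fill] * (out_dim - in_dim))
    return padded


# Two points with x, y, z, intensity only.
points = array("f", [1.0, 2.0, 3.0, 0.5, 4.0, 5.0, 6.0, 0.25])
padded = pad_points(points)
print(padded.tolist())
```

The padded array can be written to a .bin file with padded.tofile(f) on a file opened in binary mode.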

Thanks

Computation Speed on GTX 1080

Hi Abraham,

First of all, thank you for your excellent work.

GPU: GTX 1080
GPU driver: 460.91.03
Cuda: 11.1
tensorRT: 8.0.1

I got this error:
The engine plan file is generated on an incompatible device, expecting compute 6.1 got compute 8.6, please rebuild.

I rebuilt the engine files according to #3.

code output:
[01/26/2022-08:11:39] [I] Building and running a GPU inference engine for CenterPoint
[01/26/2022-08:11:39] [I] Building pfe engine . . .
[01/26/2022-08:11:39] [I] [TRT] [MemUsageChange] Init CUDA: CPU +153, GPU +0, now: CPU 168, GPU 198 (MiB)
[01/26/2022-08:11:39] [I] Create ICudaEngine !
[01/26/2022-08:11:39] [I] [TRT] Loaded engine size: 0 MB
[01/26/2022-08:11:39] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine begin: CPU 168 MiB, GPU 198 MiB
[01/26/2022-08:11:39] [W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 11.5.1 but loaded cuBLAS/cuBLAS LT 11.3.0
[01/26/2022-08:11:39] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +183, GPU +74, now: CPU 351, GPU 272 (MiB)
[01/26/2022-08:11:39] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +179, GPU +70, now: CPU 530, GPU 342 (MiB)
[01/26/2022-08:11:39] [W] [TRT] TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.0.5
[01/26/2022-08:11:39] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 529, GPU 326 (MiB)
[01/26/2022-08:11:39] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine end: CPU 529 MiB, GPU 326 MiB
[01/26/2022-08:11:39] [I] Building rpn engine . . .
[01/26/2022-08:11:39] [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 580, GPU 326 (MiB)
[01/26/2022-08:11:39] [I] Create ICudaEngine !
[01/26/2022-08:11:39] [I] [TRT] Loaded engine size: 51 MB
[01/26/2022-08:11:39] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine begin: CPU 580 MiB, GPU 326 MiB
[01/26/2022-08:11:39] [W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 11.5.1 but loaded cuBLAS/cuBLAS LT 11.3.0
[01/26/2022-08:11:39] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 581, GPU 386 (MiB)
[01/26/2022-08:11:39] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 581, GPU 394 (MiB)
[01/26/2022-08:11:39] [W] [TRT] TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.0.5
[01/26/2022-08:11:39] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 581, GPU 378 (MiB)
[01/26/2022-08:11:39] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine end: CPU 581 MiB, GPU 378 MiB
[01/26/2022-08:11:39] [I] All has Built !
[01/26/2022-08:11:39] [I] Creating pfe context
[01/26/2022-08:11:39] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation begin: CPU 560 MiB, GPU 412 MiB
[01/26/2022-08:11:39] [W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 11.5.1 but loaded cuBLAS/cuBLAS LT 11.3.0
[01/26/2022-08:11:39] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 560, GPU 420 (MiB)
[01/26/2022-08:11:39] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 560, GPU 428 (MiB)
[01/26/2022-08:11:39] [W] [TRT] TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.0.5
[01/26/2022-08:11:39] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation end: CPU 560 MiB, GPU 824 MiB
[01/26/2022-08:11:39] [I] Creating rpn context
[01/26/2022-08:11:39] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation begin: CPU 622 MiB, GPU 888 MiB
[01/26/2022-08:11:39] [W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 11.5.1 but loaded cuBLAS/cuBLAS LT 11.3.0
[01/26/2022-08:11:39] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 622, GPU 896 (MiB)
[01/26/2022-08:11:39] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 622, GPU 906 (MiB)
[01/26/2022-08:11:39] [W] [TRT] TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.0.5
[01/26/2022-08:11:39] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation end: CPU 622 MiB, GPU 1638 MiB
===========FilePath[0/10]:../../lidars/seq_0_frame_100.bin==============
[01/26/2022-08:11:39] [I] [INFO] pointNum : 177125
Success to read and Point Num Is: 177125
Num boxes before 1315
Num boxes after 500
===========FilePath[1/10]:../../lidars/seq_0_frame_101.bin==============
[01/26/2022-08:11:39] [I] [INFO] pointNum : 175893
Success to read and Point Num Is: 175893
Num boxes before 1372
Num boxes after 500
===========FilePath[2/10]:../../lidars/seq_0_frame_102.bin==============
[01/26/2022-08:11:40] [I] [INFO] pointNum : 177130
Success to read and Point Num Is: 177130
Num boxes before 1483
Num boxes after 500
===========FilePath[3/10]:../../lidars/seq_0_frame_103.bin==============
[01/26/2022-08:11:40] [I] [INFO] pointNum : 176870
Success to read and Point Num Is: 176870
Num boxes before 1400
Num boxes after 500
===========FilePath[4/10]:../../lidars/seq_0_frame_104.bin==============
[01/26/2022-08:11:40] [I] [INFO] pointNum : 174146
Success to read and Point Num Is: 174146
Num boxes before 1503
Num boxes after 500
===========FilePath[5/10]:../../lidars/seq_0_frame_105.bin==============
[01/26/2022-08:11:40] [I] [INFO] pointNum : 172962
Success to read and Point Num Is: 172962
Num boxes before 1481
Num boxes after 500
===========FilePath[6/10]:../../lidars/seq_0_frame_106.bin==============
[01/26/2022-08:11:40] [I] [INFO] pointNum : 172704
Success to read and Point Num Is: 172704
Num boxes before 1367
Num boxes after 500
===========FilePath[7/10]:../../lidars/seq_0_frame_107.bin==============
[01/26/2022-08:11:40] [I] [INFO] pointNum : 171648
Success to read and Point Num Is: 171648
Num boxes before 1429
Num boxes after 500
===========FilePath[8/10]:../../lidars/seq_0_frame_108.bin==============
[01/26/2022-08:11:40] [I] [INFO] pointNum : 170759
Success to read and Point Num Is: 170759
Num boxes before 1376
Num boxes after 500
===========FilePath[9/10]:../../lidars/seq_0_frame_109.bin==============
[01/26/2022-08:11:40] [I] [INFO] pointNum : 168130
Success to read and Point Num Is: 168130
Num boxes before 1404
Num boxes after 500
[01/26/2022-08:11:40] [I] Average PreProcess Time: 6.97109 ms
[01/26/2022-08:11:40] [I] Average PfeInfer Time: 30.7645 ms
[01/26/2022-08:11:40] [I] Average ScatterInfer Time: 0.374614 ms
[01/26/2022-08:11:40] [I] Average RpnInfer Time: 54.2483 ms
[01/26/2022-08:11:40] [I] Average PostProcess Time: 7.86936 ms
[01/26/2022-08:11:40] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 627, GPU 1572 (MiB)
[01/26/2022-08:11:40] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 565, GPU 810 (MiB)
&&&& PASSED TensorRT.sample_onnx_centerpoint [TensorRT v8001] # ./centerpoint
Free Variables .

I wonder whether these average times are OK. What do you think?
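To put the question in concrete terms, the per-stage averages from the log above can be summed into an end-to-end latency and a frame rate. This is only a rough sketch: it assumes the five stages run back to back with no overlap, which the log itself does not confirm.

```python
# Stage timings in milliseconds, copied from the log above.
stage_ms = {
    "PreProcess": 6.97109,
    "PfeInfer": 30.7645,
    "ScatterInfer": 0.374614,
    "RpnInfer": 54.2483,
    "PostProcess": 7.86936,
}

total_ms = sum(stage_ms.values())
fps = 1000.0 / total_ms  # assumption: stages are strictly sequential

# Print each stage with its share of the total.
for name, ms in stage_ms.items():
    print(f"{name:<13} {ms:8.3f} ms  {100.0 * ms / total_ms:5.1f}%")
print(f"total {total_ms:.2f} ms, about {fps:.1f} frames per second")
```

Under that assumption the total is about 100 ms per frame, roughly 10 frames per second, with RpnInfer accounting for a little over half of it.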

export_onnx set to true

Why do you skip most of the forward function when export_onnx is set to true? Won't that affect the result?

Why is my detection output so much weaker than what fp_det.gif shows?

I converted the default ONNX model to TensorRT and ran it on the newest Waymo dataset,
specifically "individual_files_validation_segment-10203656353524179475_7625_000_7645_000_with_camera_labels.tfrecord".
My result is shown below:

Few or no detection boxes.
Your result, by contrast, shows good detections:

Could you please tell me what causes this difference?

Compared with pointpillar

Hi, I'm a new user of TensorRT. I wonder how much mAP centerpoint-pointpillar gains and how much FPS it loses compared to PointPillars.

Why does postprocess need 3D NMS?

Why does postprocessing need 3D NMS?

Is it applied within the same class or across different classes?
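The difference between the two options in the question can be sketched as follows. This is a minimal illustration, not the repository's CUDA kernel: it uses an axis-aligned bird's-eye-view IoU that ignores yaw, and the function names, the `per_class` switch, and the sample detections are all hypothetical.

```python
def bev_iou_axis_aligned(a, b):
    """IoU of two boxes given as (x, y, dx, dy). Yaw is ignored (simplification)."""
    iw = min(a[0] + a[2] / 2, b[0] + b[2] / 2) - max(a[0] - a[2] / 2, b[0] - b[2] / 2)
    ih = min(a[1] + a[3] / 2, b[1] + b[3] / 2) - max(a[1] - a[3] / 2, b[1] - b[3] / 2)
    inter = max(0.0, iw) * max(0.0, ih)
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def nms(dets, iou_thresh=0.5, per_class=True):
    """Greedy NMS. dets is a list of dicts with keys 'box', 'score', 'cls'."""
    kept = []
    for d in sorted(dets, key=lambda d: d["score"], reverse=True):
        suppressed = False
        for k in kept:
            # In per-class mode, boxes of different classes never suppress each other.
            if per_class and k["cls"] != d["cls"]:
                continue
            if bev_iou_axis_aligned(d["box"], k["box"]) > iou_thresh:
                suppressed = True
                break
        if not suppressed:
            kept.append(d)
    return kept


# Hypothetical detections: two near-duplicates of class 0, one overlapping
# box of class 1, and one isolated box of class 0.
dets = [
    {"box": (0.0, 0.0, 4.0, 2.0), "score": 0.9, "cls": 0},
    {"box": (0.2, 0.0, 4.0, 2.0), "score": 0.6, "cls": 0},
    {"box": (0.1, 0.0, 4.0, 2.0), "score": 0.5, "cls": 1},
    {"box": (10.0, 10.0, 4.0, 2.0), "score": 0.4, "cls": 0},
]
same_class_only = nms(dets, per_class=True)
across_classes = nms(dets, per_class=False)
print(len(same_class_only), len(across_classes))
```

In per-class mode the overlapping class 1 box survives; in class-agnostic mode it is suppressed by the higher-scoring class 0 box. Which mode the repository uses is not stated in this thread.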

3d tracker sort

Will you provide a C++ implementation of the 3D SORT tracker? Thank you.

nvidia pytorch_quantization package int8 quantization

Hi, have you tried NVIDIA's pytorch_quantization package for quantization-aware training?

Cuda Runtime (context is destroyed)

Thanks for your great work!
I got this error when I tried to generate the TRT engine using create_engine.py. I am using CUDA 11.3 and TRT 8.0.1.6. Any suggestions?

root@desktop:/home/CenterPoint/tools# python3 create_engine.py --config waymo_centerpoint_pp_two_pfn_stride1_3x.py --pfe_onnx_path ../models/pfe_baseline32000.onnx --rpn_onnx_path ../models/rpn_baseline.onnx --pfe_engine_path pfe.engine --rpn_engine_path rpn.engine;
[TensorRT] WARNING: onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.                                                                          
building pfe trt engine . . .                                                                                                                                                                                                                                 
[TensorRT] WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.5.1 but loaded cuBLAS/cuBLAS LT 11.4.2                                                                                                                                                    
[TensorRT] WARNING: TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.2.0                                                                                                                                                                            
[TensorRT] WARNING: Detected invalid timing cache, setup a local cache instead                                                                                                                                                                                
[TensorRT] WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.5.1 but loaded cuBLAS/cuBLAS LT 11.4.2                                                                                                                                                    
[TensorRT] WARNING: TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.2.0                                                                                                                                                                            
deserialize the engine . . .                                                                                                                                                                                                                                  
[TensorRT] WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.5.1 but loaded cuBLAS/cuBLAS LT 11.4.2                                                                                                                                                    
[TensorRT] WARNING: TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.2.0                                                                                                                                                                            
[TensorRT] WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.5.1 but loaded cuBLAS/cuBLAS LT 11.4.2                                                                                                                                                    
[TensorRT] WARNING: TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.2.0                                                                                                                                                                            
context_pfe <tensorrt.tensorrt.IExecutionContext object at 0x7fee2f8ba8b0>                                                                                                                                                                                    
[TensorRT] WARNING: The logger passed into createInferBuilder differs from one already provided for an existing builder, runtime, or refitter. TensorRT maintains only a single logger pointer at any given time, so the existing value, which can be retrieved with getLogger(), will be used instead. In order to use a new logger, first destroy all existing builder, runner or refitter objects.
                                                                                                                                                                                                                                                              
[TensorRT] WARNING: Tensor DataType is determined at build time for tensors not marked as input or output.                                                                                                                                                    
building rpn trt engine . . .                                                                                                                                                                                                                                 
[TensorRT] WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.5.1 but loaded cuBLAS/cuBLAS LT 11.4.2                                                                                                                                                    
[TensorRT] WARNING: TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.2.0                                                                                                                                                                            
[TensorRT] WARNING: Detected invalid timing cache, setup a local cache instead                                                                                                                                                                                
[TensorRT] WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.5.1 but loaded cuBLAS/cuBLAS LT 11.4.2                                                                                                                                                    
[TensorRT] WARNING: TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.2.0                                                                                                                                                                            
deserialize the engine . . .                                                                                                                                                                                                                                  
[TensorRT] WARNING: The logger passed into createInferRuntime differs from one already provided for an existing builder, runtime, or refitter. TensorRT maintains only a single logger pointer at any given time, so the existing value, which can be retrieved with getLogger(), will be used instead. In order to use a new logger, first destroy all existing builder, runner or refitter objects.
                                                                                                                                                                                                                                                              
[TensorRT] WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.5.1 but loaded cuBLAS/cuBLAS LT 11.4.2                                                                                                                                                    
[TensorRT] WARNING: TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.2.0                                                                                                                                                                            
[TensorRT] WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.5.1 but loaded cuBLAS/cuBLAS LT 11.4.2                                                                                                                                                    
[TensorRT] WARNING: TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.2.0                                                                                                                                                                            
context_rpn <tensorrt.tensorrt.IExecutionContext object at 0x7fee2f8baa70>                                                                                                                                                                                    
[TensorRT] ERROR: 1: [hardwareContext.cpp::terminateCommonContext::141] Error Code 1: Cuda Runtime (context is destroyed)                                                                                                                                     
Segmentation fault (core dumped)

How to evaluate the mAP

Hello,
First, thanks for your wonderful work. I managed to reproduce the test results.

I have a small question: your readme refers to mAP figures, but I can't find anything in this project for computing them. How can I evaluate the mAP?
Thank you!

PostProcess time is too long

Hello, I tested your postprocessGPU function. It takes over 60 ms on an NVIDIA A6000. I then timed the CUDA functions "sort_by_key" and "_raw_nms_gpu": "sort_by_key" costs 7.5 ms on average for every taskidx, and "_raw_nms_gpu" costs 1 to 7 ms. Why does it take so long?

What is the input for `tools/generate_input_data.py`

As the title says: when I run inference, I need to pass the parameter --filePath=/PATH/TO/DATA, and filePath refers to the input bin files generated by tools/generate_input_data.py. So what is the input data for tools/generate_input_data.py?

Looking forward to your reply!
Thx!

Original checkpoint.pth file

Hello,
I have successfully run your CenterPoint TensorRT project with great results and would like to thank you from the bottom of my heart for sharing your excellent work!

As the Waymo dataset is too large and I don't have a GPU that can handle training on it, could you please share your checkpoint.pth file when you have time? I would like to work through the steps of exporting to ONNX and test how your project runs when ported to our lab's AI development board.

Thanks!

Segmentation fault (core dumped)

(centerpoint) yixin@yixin:~/Desktop/CenterPoint$ python3 tools/export_onnx.py \
--config tools/waymo_centerpoint_pp_two_pfn_stride1_3x.py \
--ckpt tools/centerpoint_pp_36.pth \
--pfe_save_path /home/yixin/Desktop/CenterPoint/models/pfezyx.onnx \
--rpn_save_path /home/yixin/Desktop/CenterPoint/models/rpnzyx.onnx
Import spconv fail, no support for sparse convolution!
iou3d cuda not built. You don't need this if you use circle_nms. Otherwise, refer to the advanced installation part to build this cuda extension
no apex
2022-06-18 23:18:09.004104: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
Segmentation fault (core dumped)

RTX 3060, TensorRT 8.0.1, CUDA 11.0

how to show the result

velocity

I used the Waymo tracker from CenterPoint. How can I obtain velocity from the model? Thanks.

error: 'class nvinfer1::IBuilder' has no member named 'buildSerializedNetwork'

I want to run TRT engine inference following your open-source code (https://github.com/Abraham423/CenterPoint),

but the build does not succeed.

When I run cmake .. && make
I always get the following errors. Can you give me a compiled Docker image, or tell me in detail how to solve it?

src/centerpoint.cpp:134:48: error: 'class nvinfer1::IBuilder' has no member named 'buildSerializedNetwork'

134 | SampleUniquePtr plan{builder->buildSerializedNetwork(*network, *config)};

  |                                                ^~~~~~~~~~~~~~~~~~~~~~

/data/CUDA-PointPillars-main/CenterPoint_int8/src/centerpoint.cpp:134:89: error: no matching function for call to 'std::unique_ptr<nvinfer1::IHostMemory, samplesCommon::InferDeleter>::unique_ptr()'

134 | SampleUniquePtr plan{builder->buildSerializedNetwork(*network, *config)};

How can I download your requirements document? Without it the build reports an error.

Dimension error during TensorRT inference

[01/12/2023-10:56:18] [I] [TRT] Successfully created plugin: ScatterND
[01/12/2023-10:56:18] [E] [TRT] MatMul_177: last dimension of input0 = 15 and second to last dimension of input1 = 10 but must match.
[01/12/2023-10:56:18] [E] [TRT] ModelImporter.cpp:720: While parsing node number 177 [MatMul -> "195"]:
[01/12/2023-10:56:18] [E] [TRT] ModelImporter.cpp:721: --- Begin node ---
[01/12/2023-10:56:18] [E] [TRT] ModelImporter.cpp:722: input: "193"
input: "229"
output: "195"
name: "MatMul_177"
op_type: "MatMul"

[01/12/2023-10:56:18] [E] [TRT] ModelImporter.cpp:723: --- End node ---
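The parser error above is the standard MatMul shape rule: the last dimension of the first input must equal the second-to-last dimension of the second input. A minimal sketch of that check follows. Only the 15 and the 10 come from the log; the other dimensions and the function name `matmul_shape` are placeholders, and batch broadcasting is deliberately left out.

```python
def matmul_shape(a_shape, b_shape):
    """Return the output shape of a matrix product, or raise if the inner dims differ.

    Simplification: only the last two dimensions are checked; leading
    (batch) dimensions of the first input are passed through unchanged.
    """
    if a_shape[-1] != b_shape[-2]:
        raise ValueError(
            f"last dimension of input0 = {a_shape[-1]} and second to last "
            f"dimension of input1 = {b_shape[-2]} but must match"
        )
    return tuple(a_shape[:-1]) + (b_shape[-1],)


# A weight with 10 input rows accepts 10 features per point ...
print(matmul_shape((20, 10), (10, 64)))

# ... but rejects 15 features per point, as in the log.
try:
    matmul_shape((20, 15), (10, 64))
except ValueError as err:
    print("error:", err)
```

One thing worth checking, as a guess rather than a diagnosis: whether the exported ONNX model was traced with a different per-point feature count than the one the weights expect.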

About velocity

Hi, I wonder how you obtain velocity in your implementation. I read the paper and it states that results from two time-steps are compared to calculate the velocity but I don't see it in your code. Thanks.

Paper statement: velocity estimate is special, as it requires two input map-views the current and previous time-step. It predicts
the difference in object position between the current and the past frame.
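To make the quoted statement concrete: if the network predicts a per-object displacement between the previous and the current frame, a velocity follows by dividing by the time between frames, and the same velocity can be used to estimate where the object was one frame earlier. The sketch below assumes a frame interval of 0.1 s and made-up numbers; the function names are hypothetical and this is not taken from the repository's code.

```python
def displacement_to_velocity(displacement_xy, dt):
    """Convert a displacement in metres over dt seconds into a velocity in m/s."""
    return (displacement_xy[0] / dt, displacement_xy[1] / dt)


def project_to_previous_frame(center_xy, velocity_xy, dt):
    """Estimate the object's centre dt seconds earlier from its current velocity."""
    return (center_xy[0] - velocity_xy[0] * dt,
            center_xy[1] - velocity_xy[1] * dt)


dt = 0.1  # assumed time between two lidar frames, in seconds
velocity = displacement_to_velocity((1.0, -0.5), dt)
previous_center = project_to_previous_frame((20.0, 5.0), velocity, dt)
print(velocity, previous_center)
```

A tracker can then match each projected centre against the previous frame's detections, for example by nearest distance.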

Hi, if I use nuScenes, what needs to be changed?

(^_^)

/usr/bin/ld: libpointpillars.so: undefined reference to `createNvOnnxParser_INTERNAL'

I have tried six times and always get this error. Help!

ONNX creation

How did you obtain those onnx files? Could you please provide guidance on that?

About box visualization

I want to test your model on the sample bin data you provided in the repo. I am trying to visualize the predicted boxes using the Open3D library, but the boxes seem incorrect: the person box (blue) lies horizontally like a car, while the car boxes (red) are displayed correctly. I used the following code to visualize the results:

import numpy as np
import open3d as o3d

annotation_file = open('result/seq_0_frame_100.bin.txt', 'r')
boxes = []
lines = annotation_file.readlines()
for line in lines:
    tokens = line.split()
    x = float(tokens[0])
    y = float(tokens[1])
    z = float(tokens[2])
    h = float(tokens[3])
    w = float(tokens[4])
    l = float(tokens[5])
    velX = float(tokens[6])
    velY = float(tokens[7])
    theta = float(tokens[8])
    score = float(tokens[9])
    cls = int(tokens[10])
    if score > 0.2:
        boxes.append([x,y,z,h,w,l,theta,cls])
        
        #box = [h,w,l,x,y,z,rot]

def roty(t):
    """
    Rotation about the y-axis.
    """
    c = np.cos(t)
    s = np.sin(t)
    return np.array([[c, 0, s],
                     [0, 1, 0],
                     [-s, 0, c]])

def box_center_to_corner(box):
    translation = box[0:3]
    h, w, l = box[3], box[4], box[5]
    #if the angle value is in radian then use below mentioned conversion
    #rotation_y = box[6]
    #rotation = rotation_y * (180/math.pi)                             #rad to degree
    rotation = box[6]

    # Build the box outline around the origin; (x, y, z) is the box center
    bounding_box = np.array([
        [-l/2, -l/2, l/2, l/2, -l/2, -l/2, l/2, l/2],
        [w/2, -w/2, -w/2, w/2, w/2, -w/2, -w/2, w/2],
        [-h/2, -h/2, -h/2, -h/2, h/2, h/2, h/2, h/2]])
                      

    # Standard 3x3 rotation matrix around the Z axis
    rotation_matrix = np.array([
        [np.cos(rotation), -np.sin(rotation), 0.0],
        [np.sin(rotation), np.cos(rotation), 0.0],
        [0.0, 0.0, 1.0]])

    # Repeat the [x, y, z] eight times
    eight_points = np.tile(translation, (8, 1))

    # Translate the rotated bounding box by the
    # original center position to obtain the final box
    corner_box = np.dot(rotation_matrix, bounding_box) + eight_points.transpose()

    return corner_box.transpose()
    
entities_to_draw = []  # module level, so the loop below can append to it
    
for box in boxes:
    boxes3d_pts = box_center_to_corner(box)
    # box_center_to_corner already returns shape (8, 3); the two transposes cancelled out
    boxes3d_pts = o3d.utility.Vector3dVector(boxes3d_pts)
    box3d = o3d.geometry.OrientedBoundingBox.create_from_points(boxes3d_pts)
    if box[-1] == 0:
        box3d.color = [1, 0, 0]           #Box color would be red box.color = [R,G,B]
    elif box[-1] == 1:
        box3d.color = [0, 0, 1]
    else:
        box3d.color = [0, 1, 0]
    entities_to_draw.append(box3d)

I would appreciate your feedback on this issue. Thank you.
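One possible cause worth checking is the order of the three size columns. The script above reads them as h, w, l and places l along x and h along z. If the file actually stores the sizes per axis (dx, dy, dz), a tall, narrow box would come out lying down. The sketch below compares the two readings for a single box. The size values are made up, the per-axis order is an assumption and not a confirmed file format, and `box_corners` and `extent` are hypothetical helpers.

```python
import math


def box_corners(center, size_xyz, yaw):
    """Return the 8 corners of a box with per-axis sizes, rotated by yaw about z."""
    cx, cy, cz = center
    dx, dy, dz = size_xyz
    c, s = math.cos(yaw), math.sin(yaw)
    corners = []
    for sz in (-0.5, 0.5):
        for sx, sy in ((-0.5, -0.5), (0.5, -0.5), (0.5, 0.5), (-0.5, 0.5)):
            x, y = sx * dx, sy * dy
            corners.append((cx + c * x - s * y, cy + s * x + c * y, cz + sz * dz))
    return corners


def extent(corners, axis):
    """Size of the corner set along one axis (0 = x, 1 = y, 2 = z)."""
    values = [p[axis] for p in corners]
    return max(values) - min(values)


tokens = (0.8, 0.9, 1.7)  # hypothetical size columns of a pedestrian-like box

# Reading 1 (assumption): columns are sizes along x, y, z.
as_dx_dy_dz = box_corners((0.0, 0.0, 0.0), tokens, 0.0)

# Reading 2: columns are h, w, l as in the script above (l on x, w on y, h on z).
h, w, l = tokens
as_h_w_l = box_corners((0.0, 0.0, 0.0), (l, w, h), 0.0)

print("z extent, per-axis reading:", extent(as_dx_dy_dz, 2))
print("z extent, h/w/l reading:   ", extent(as_h_w_l, 2))
```

If the per-axis reading gives pedestrians a z extent larger than their footprint, the column order in the script is the likely culprit.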

Test with ouster lidar

I would like to test the model on Ouster lidar data. How should I set the following parameters? What is the difference between X_MIN and X_CENTER_MIN (and likewise for the other axes), and what is X_STEP?

// pillar size 
#define X_STEP 0.32f
#define Y_STEP 0.32f
#define X_MIN -74.88f
#define X_MAX 74.88f
#define Y_MIN -74.88f
#define Y_MAX 74.88f
#define Z_MIN -2.0f
#define Z_MAX 4.0f

#define X_CENTER_MIN -80.0f
#define X_CENTER_MAX 80.0f
#define Y_CENTER_MIN -80.0f
#define Y_CENTER_MAX 80.0f
#define Z_CENTER_MIN -10.0f
#define Z_CENTER_MAX 10.0f

#define PI 3.141592653f
// parameters for preprocess
#define BEV_W 468
#define BEV_H 468
#define MAX_PILLARS 32000 //20000 //32000
#define MAX_PIONT_IN_PILLARS 20
#define FEATURE_NUM 10
#define PFE_OUTPUT_DIM 64
#define THREAD_NUM 4
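The values above are consistent with one relation that can be checked directly: the BEV grid size equals the covered range divided by the pillar step, (74.88 - (-74.88)) / 0.32 = 468. The sketch below reproduces that arithmetic and maps a point to its pillar cell. It covers only the X/Y range and step. What the X_CENTER_* limits are used for is not stated here, and the helper names are hypothetical.

```python
# Values copied from the defines above.
X_STEP = Y_STEP = 0.32
X_MIN, X_MAX = -74.88, 74.88
Y_MIN, Y_MAX = -74.88, 74.88


def grid_size(lo, hi, step):
    """Number of pillars along one axis; round() absorbs floating-point error."""
    return round((hi - lo) / step)


def pillar_index(x, y):
    """Return the (col, row) cell of a point, or None if it is outside the range."""
    if not (X_MIN <= x < X_MAX and Y_MIN <= y < Y_MAX):
        return None
    return int((x - X_MIN) / X_STEP), int((y - Y_MIN) / Y_STEP)


bev_w = grid_size(X_MIN, X_MAX, X_STEP)
bev_h = grid_size(Y_MIN, Y_MAX, Y_STEP)
print(bev_w, bev_h)               # matches BEV_W and BEV_H
print(pillar_index(0.1, -74.88))  # a point near the sensor, on the lower y edge
print(pillar_index(100.0, 0.0))   # outside the configured range
```

For a different sensor, changing the range or the step means BEV_W and BEV_H have to change with them, and the exported network has to accept that grid size.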

How do we toggle GPU preprocessing?

[09/17/2023-00:43:46] [I] Average PreProcess Time: 15.7913 ms
[09/17/2023-00:43:46] [I] Average PfeInfer Time: 9.98707 ms
[09/17/2023-00:43:46] [I] Average ScatterInfer Time: 0.410112 ms
[09/17/2023-00:43:46] [I] Average RpnInfer Time: 18.2958 ms
[09/17/2023-00:43:46] [I] Average PostProcess Time: 3.27232 ms

These are my times, as seen above. Based on them, it seems the CPU is handling preprocessing. How do I ensure the GPU handles the preprocessing?

Why two subgraphs instead of one whole graph

In the README, it says "Here we extract two pure nn models from the whole computation graph---pfe and rpn, this is to make it easier for trt to optimize its inference engines, and we use cuda to connect these nn engines."

Is there any repo, documentation, link, or tutorial that supports this argument? That is, why is it easier for TRT to optimize its inference engines with two ONNX files rather than one?
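For context on what sits between the two engines: the logs in this thread list a ScatterInfer stage between PfeInfer and RpnInfer. In PointPillars-style pipelines that step typically writes each pillar's feature vector into its cell of a dense BEV canvas, which is a data-dependent indexing operation rather than a plain neural-network layer. The following is a pure-Python sketch of that idea under those assumptions. It is not the repository's CUDA kernel, and the function name and toy sizes are made up.

```python
def scatter_to_bev(pillar_features, pillar_coords, bev_h, bev_w):
    """Place N pillar feature vectors of length C into a C x H x W canvas.

    pillar_features: list of N lists, each of length C
    pillar_coords:   list of N (row, col) cells, one per pillar
    Cells without a pillar stay zero.
    """
    channels = len(pillar_features[0]) if pillar_features else 0
    canvas = [[[0.0] * bev_w for _ in range(bev_h)] for _ in range(channels)]
    for feat, (row, col) in zip(pillar_features, pillar_coords):
        for c, value in enumerate(feat):
            canvas[c][row][col] = value
    return canvas


# Toy example: two pillars with 3 channels each on a 4 x 4 grid.
features = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
coords = [(0, 1), (2, 3)]
canvas = scatter_to_bev(features, coords, bev_h=4, bev_w=4)
print(canvas[0][0][1], canvas[2][2][3], canvas[1][0][0])
```

Keeping such a step outside the exported graphs leaves each ONNX file with only standard layers, which is one plausible reading of the README sentence. The thread itself does not give a reference for it.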

The engine plan file is generated on an incompatible device, expecting compute 7.0 got compute 8.6, please rebuild.

I use a V100. Please provide the original model. Thank you.