
solov2-tensorrt-cpp's Introduction

Solov2-TensorRT-CPP

In this repo, we deploy SOLOv2 to TensorRT with C++. See the video: solov2_cpp

Requirements

  • Ubuntu 16.04/18.04/20.04
  • CUDA 10.2
  • cuDNN 8
  • TensorRT 8.0.1
  • OpenCV 3.4
  • Libtorch 1.8.2
  • CMake 3.20

Acknowledgements

SOLO SOLOv2.tensorRT

Getting Started

1. Install Solov2 from SOLO

Download it and make sure it runs successfully.

2. Export the ONNX model from the original model

Download the pre-exported ONNX model from baidudisk, fetch code: qdsm.

  • Export the model yourself

Before exporting, you must modify a few parts of the original SOLOv2 code:

  • 2.1 Modify SOLO-master/mmdet/models/anchor_heads/solov2_head.py:154

The original code in solov2_head.py is:

# Original code from SOLO
x_range = torch.linspace(-1, 1, ins_kernel_feat.shape[-1], device=ins_kernel_feat.device)
y_range = torch.linspace(-1, 1, ins_kernel_feat.shape[-2], device=ins_kernel_feat.device)
y, x = torch.meshgrid(y_range, x_range)
y = y.expand([ins_kernel_feat.shape[0], 1, -1, -1])
x = x.expand([ins_kernel_feat.shape[0], 1, -1, -1])

change to:

# Modified for ONNX export: freeze the input size to 800x800 and batch size to 1
# apparently the per-level feature-map sizes for the frozen 800x800 input (strides 8, 8, 16, 32, 32)
size = {0: 100, 1: 100, 2: 50, 3: 25, 4: 25}
# .shape entries may be traced Tensors during torch.onnx.export(), so cast them to plain ints
feat_h, feat_w = ins_kernel_feat.shape[-2], ins_kernel_feat.shape[-1]
feat_h = int(feat_h.cpu().numpy()) if isinstance(feat_h, torch.Tensor) else int(feat_h)
feat_w = int(feat_w.cpu().numpy()) if isinstance(feat_w, torch.Tensor) else int(feat_w)
x_range = torch.linspace(-1, 1, feat_w, device=ins_kernel_feat.device)
y_range = torch.linspace(-1, 1, feat_h, device=ins_kernel_feat.device)
y, x = torch.meshgrid(y_range, x_range)
y = y.expand([1, 1, -1, -1])
x = x.expand([1, 1, -1, -1])
coord_feat = torch.cat([x, y], 1)
ins_kernel_feat = torch.cat([ins_kernel_feat, coord_feat], 1)
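
The int(...) casts are needed because, as the comment notes, .shape entries can come back as traced Tensors during torch.onnx.export(). A minimal standalone sketch of the same coordinate-feature pattern (illustrative only, not repo code):

import torch

def to_int(dim):
    # A shape entry may be a traced Tensor during ONNX export;
    # convert it to a plain Python int either way.
    return int(dim.cpu().numpy()) if isinstance(dim, torch.Tensor) else int(dim)

feat = torch.randn(1, 256, 25, 25)  # dummy FPN feature map
h, w = to_int(feat.shape[-2]), to_int(feat.shape[-1])
y, x = torch.meshgrid(torch.linspace(-1, 1, h), torch.linspace(-1, 1, w))
coord_feat = torch.cat([x.expand(1, 1, -1, -1), y.expand(1, 1, -1, -1)], 1)
print(torch.cat([feat, coord_feat], 1).shape)  # torch.Size([1, 258, 25, 25])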
  • 2.2 Modify SOLO-master/mmdet/models/detectors/single_stage_ins.py

In the function forward_dummy(), add the forward pass of the mask branch, as follows:

def forward_dummy(self, img):
    # backbone + FPN features
    x = self.extract_feat(img)
    # category and kernel predictions from the SOLOv2 head
    outs = self.bbox_head(x)
    # also run the mask feature head so its output is part of the exported graph
    if self.with_mask_feat_head:
        mask_feat_pred = self.mask_feat_head(
            x[self.mask_feat_head.start_level:self.mask_feat_head.end_level + 1])
        outs = (outs[0], outs[1], mask_feat_pred)
    return outs
  • 2.3 Modify SOLO-master/mmdet/models/mask_heads/mask_feat_head.py

At line 108 of mask_feat_head.py, the original code is:

x_range = torch.linspace(-1, 1, input_feat.shape[-1], device=input_feat.device)
y_range = torch.linspace(-1, 1, input_feat.shape[-2], device=input_feat.device)

change to:

# .shape entries may be traced Tensors during torch.onnx.export()
feat_h, feat_w = input_feat.shape[-2], input_feat.shape[-1]
feat_h = int(feat_h.cpu().numpy()) if isinstance(feat_h, torch.Tensor) else int(feat_h)
feat_w = int(feat_w.cpu().numpy()) if isinstance(feat_w, torch.Tensor) else int(feat_w)
x_range = torch.linspace(-1, 1, feat_w, device=input_feat.device)
y_range = torch.linspace(-1, 1, feat_h, device=input_feat.device)
  • 2.4 Export the ONNX model

Copy onnx_exporter.py and common.py to SOLO/demo/, then run:

# KITTI image size (--shape takes height width: 384 1152)
python onnx_exporter.py ../configs/solov2/solov2_light_448_r34_fpn_8gpu_3x.py ../weights/SOLOv2_light_R34.onnx --checkpoint ../checkpoints/SOLOv2_LIGHT_448_R34_3x.pth --shape 384 1152
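
Inside onnx_exporter.py, the core of the export is presumably a torch.onnx.export() call on the modified model. A hedged sketch of what that call looks like (the function name and most arguments here are assumptions; the input name "input" and opset 11 appear in the build log further below):

import torch

def export_solov2(model, out_path, height=384, width=1152):
    # Trace the modified model with a fixed-size dummy input and write ONNX
    model.eval()
    dummy = torch.randn(1, 3, height, width)
    torch.onnx.export(model, dummy, out_path,
                      opset_version=11,       # "Opset version: 11" in the build log
                      input_names=["input"],  # matches "input shape:input (1, 3, 384, 1152)"
                      do_constant_folding=True)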

3. Build the TensorRT model

First, edit the config file config.yaml:

%YAML:1.0

IMAGE_WIDTH: 1226
IMAGE_HEIGHT: 370

#SOLO
ONNX_PATH: "/home/chen/ws/dynamic_ws/src/dynamic_vins/weights/solo/SOLOv2_light_R34_1152x384_cuda102.onnx"
SERIALIZE_PATH: "/home/chen/ws/dynamic_ws/src/dynamic_vins/weights/solo/tensorrt_model_1152x384.bin"

SOLO_NMS_PRE: 500
SOLO_MAX_PER_IMG: 100
SOLO_NMS_KERNEL: "gaussian"
#SOLO_NMS_SIGMA=2.0
SOLO_NMS_SIGMA: 2.0
SOLO_SCORE_THR: 0.1
SOLO_MASK_THR: 0.5
SOLO_UPDATE_THR: 0.2

LOG_PATH: "./segmentor_log.txt"
LOG_LEVEL: "debug"
LOG_FLUSH: "debug"

DATASET_DIR: "/media/chen/EC4A17F64A17BBF0/datasets/kitti/odometry/colors/07/image_2/"
WARN_UP_IMAGE_PATH: "/home/chen/CLionProjects/InstanceSegment/config/kitti.png"
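
The %YAML:1.0 header means this is OpenCV FileStorage YAML, not plain YAML. A quick way to sanity-check the values before running (a sketch using OpenCV's Python bindings; not part of the repo):

import cv2

fs = cv2.FileStorage("./config/config.yaml", cv2.FILE_STORAGE_READ)
print(int(fs.getNode("IMAGE_WIDTH").real()))   # 1226
print(fs.getNode("ONNX_PATH").string())
print(fs.getNode("SOLO_NMS_KERNEL").string())  # gaussian
fs.release()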

Then compile the CMake project:

mkdir build && cd build
cmake ..
make -j10

Finally, build the TensorRT engine:

cd ..
./build/build_model ./config/config.yaml
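
build_model parses the ONNX file at ONNX_PATH and serializes a TensorRT engine to SERIALIZE_PATH. The repo does this in C++; roughly the same steps, sketched with the TensorRT 8 Python API (paths illustrative):

import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("SOLOv2_light_R34.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))
config = builder.create_builder_config()
config.max_workspace_size = 1 << 30  # 1 GiB workspace
engine = builder.build_serialized_network(network, config)
with open("tensorrt_model_1152x384.bin", "wb") as f:
    f.write(bytearray(engine))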

4. Run the demo

If you have the KITTI dataset, set DATASET_DIR in config.yaml to the correct path, then run:

./build/segment ./config/config.yaml

If you don't, and just want to run on a single image, set WARN_UP_IMAGE_PATH in config.yaml to a valid image path, then run:

./build/demo ./config/config.yaml

solov2-tensorrt-cpp's People

Contributors

chenjianqu


solov2-tensorrt-cpp's Issues

compile error

Hello, when I compile your code (C++14/17), I always get this error:
/home/hermione/library/libtorch/include/ATen/TensorIterator.h:200:3: error: reference to 'DeviceType' is ambiguous
DeviceType device_type(int arg=0) const { return device(arg).type(); }
Do you know the reason for this? I am using libtorch 1.8.2 + CUDA 10.2.

Error when running the code

Hello, when I run ./build/segment ./config/config.yaml I get GetQueueShapeIndex failed:[1, 128, 160, 120]. What could be the cause?

solov2_head.py:

#Modify for onnx export, frozen the input size = 800x800, batch size = 1
size = {0: 100, 1: 100, 2: 50, 3: 25, 4: 25}

What is this size line for?

Deploy SOLOv2 with TensorRT without libtorch?

Hi, professor:
Is it possible to deploy SOLOv2 without libtorch, using just the TensorRT deserialization API plus some post-processing code? It would have fewer dependencies, and installing libtorch on Jetson isn't friendly.
Your build_model already creates a raw TensorRT engine, so the demo could simply read it from file, create a TensorRT context, and run.
Please help!

ONNX export failure

I found that after ONNX export there are 11 outputs instead of the three used in the code. Also, converting with --shape 768 1344 fails immediately:
Traceback (most recent call last):
File "onnx_exporter.py", line 231, in <module>
check(args, dummy_input, check_onnx=True, check_trt=False)
File "onnx_exporter.py", line 126, in check
sess = rt.InferenceSession(args.out)
File "/home/tao/anaconda3/lib/python3.8/site-packages/onnxruntime/capi/session.py", line 195, in __init__
self._create_inference_session(providers, provider_options)
File "/home/tao/anaconda3/lib/python3.8/site-packages/onnxruntime/capi/session.py", line 205, in _create_inference_session
sess.initialize_session(providers or [], provider_options or [])
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Node (Concat_637) Op (Concat) [ShapeInferenceError] Can't merge shape info. Both source and target dimension have values but they differ. Source=49 Target=48 Dimension=2

This error appears after I added the code you describe to single_stage_ins; without it there is no error, but then there are 10 outputs. Any advice?

cate_scores.prob error

With the TensorRT model exported following your method, some computed scores are greater than 1. What could cause this?

Parameter check failed at: runtime/api/executionContext.cpp::enqueueInternal::322, condition: bindings[x] != nullptr

When I run "./build/segment ./config/config.yaml", I get the error "[E] [TRT] 3: [executionContext.cpp::enqueueInternal::322] Error Code 3: Internal Error (Parameter check failed at: runtime/api/executionContext.cpp::enqueueInternal::322, condition: bindings[x] != nullptr)". What might be the reason? Did I generate the ONNX and TensorRT model bin correctly?
This is my output from running "build_model" and "demo":

./build/build_model ./config/config.yaml

/Solov2-TensorRT-CPP/cmake-build-debug/build_model ./config/config.yaml
config_file:./config/config.yaml
createInferBuilder
[05/25/2022-22:57:19] [I] [TRT] [MemUsageChange] Init CUDA: CPU +299, GPU +0, now: CPU 301, GPU 309 (MiB)
createNetwork
createBuilderConfig
createParser
parseFromFile:
/Solov2-TensorRT-CPP/ONNX/SOLOv2_light_R34.onnx
[05/25/2022-22:57:19] [I] [TRT] ----------------------------------------------------------------
[05/25/2022-22:57:19] [I] [TRT] Input filename: ~/Solov2-TensorRT-CPP/ONNX/SOLOv2_light_R34.onnx
[05/25/2022-22:57:19] [I] [TRT] ONNX IR version: 0.0.4
[05/25/2022-22:57:19] [I] [TRT] Opset version: 11
[05/25/2022-22:57:19] [I] [TRT] Producer name: pytorch
[05/25/2022-22:57:19] [I] [TRT] Producer version: 1.3
[05/25/2022-22:57:19] [I] [TRT] Domain:
[05/25/2022-22:57:19] [I] [TRT] Model version: 0
[05/25/2022-22:57:19] [I] [TRT] Doc string:
[05/25/2022-22:57:19] [I] [TRT] ----------------------------------------------------------------
[05/25/2022-22:57:19] [W] [TRT] onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
input shape:input (1, 3, 384, 1152)
output shape:cate_pred (3872, 80)
enableDLA
buildEngineWithConfig
[05/25/2022-22:57:20] [I] [TRT] [MemUsageSnapshot] Builder begin: CPU 664 MiB, GPU 671 MiB
[05/25/2022-22:57:21] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +70, GPU +68, now: CPU 822, GPU 1012 (MiB)
[05/25/2022-22:57:21] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 822, GPU 1022 (MiB)
[05/25/2022-22:57:21] [W] [TRT] Detected invalid timing cache, setup a local cache instead
[05/25/2022-22:57:24] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[05/25/2022-22:58:39] [I] [TRT] Detected 1 inputs and 13 output network tensors.
[05/25/2022-22:58:39] [I] [TRT] Total Host Persistent Memory: 274640
[05/25/2022-22:58:39] [I] [TRT] Total Device Persistent Memory: 83921920
[05/25/2022-22:58:39] [I] [TRT] Total Scratch Memory: 0
[05/25/2022-22:58:39] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 158 MiB, GPU 675 MiB
[05/25/2022-22:58:39] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 1298, GPU 1635 (MiB)
[05/25/2022-22:58:39] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 1298, GPU 1643 (MiB)
[05/25/2022-22:58:39] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1298, GPU 1627 (MiB)
[05/25/2022-22:58:39] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1297, GPU 1611 (MiB)
[05/25/2022-22:58:39] [I] [TRT] [MemUsageSnapshot] Builder end: CPU 1210 MiB, GPU 1381 MiB
serializeModel
done

Process finished with exit code 0


./build/demo ./config/config.yaml

~/Solov2-TensorRT-CPP/cmake-build-debug/segment ./config/config.yaml
config_file:./config/config.yaml
[05/25/2022-23:35:10] [I] [TRT] [MemUsageChange] Init CUDA: CPU +298, GPU +0, now: CPU 411, GPU 309 (MiB)
[05/25/2022-23:35:11] [I] [TRT] Loaded engine size: 81 MB
[05/25/2022-23:35:11] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine begin: CPU 411 MiB, GPU 309 MiB
[05/25/2022-23:35:22] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +2140, GPU +980, now: CPU 2804, GPU 1731 (MiB)
[05/25/2022-23:35:22] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 2804, GPU 1741 (MiB)
[05/25/2022-23:35:22] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 2804, GPU 1725 (MiB)
[05/25/2022-23:35:22] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine end: CPU 2804 MiB, GPU 1725 MiB
[05/25/2022-23:35:22] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation begin: CPU 2804 MiB, GPU 1725 MiB
[05/25/2022-23:35:22] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 2804, GPU 1733 (MiB)
[05/25/2022-23:35:22] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 2804, GPU 1741 (MiB)
[05/25/2022-23:35:22] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation end: CPU 2811 MiB, GPU 2166 MiB
[05/25/2022-23:35:23] [E] [TRT] 3: [executionContext.cpp::enqueueInternal::322] Error Code 3: Internal Error (Parameter check failed at: runtime/api/executionContext.cpp::enqueueInternal::322, condition: bindings[x] != nullptr
)
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: invalid argument
Exception raised from getDeviceFromPtr at ../aten/src/ATen/cuda/CUDADevice.h:13 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits, std::allocator >) + 0x69 (0x7f30d25c1b29 in ~/NVIDIA/libtorch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&) + 0xd2 (0x7f30d25beab2 in /home/cqyd/NVIDIA/libtorch/lib/libc10.so)
frame #2: + 0x36d1ea7 (0x7f306f824ea7 in ~/NVIDIA/libtorch/lib/libtorch_cuda.so)
frame #3: + 0x7c87c (0x559bd949387c in ~/Solov2-TensorRT-CPP/cmake-build-debug/segment)
frame #4: + 0x7cdf1 (0x559bd9493df1 in ~/Solov2-TensorRT-CPP/cmake-build-debug/segment)
frame #5: + 0x7d2a7 (0x559bd94942a7 in ~/Solov2-TensorRT-CPP/cmake-build-debug/segment)
frame #6: + 0x7dff8 (0x559bd9494ff8 in ~/Solov2-TensorRT-CPP/cmake-build-debug/segment)
frame #7: + 0x7e0b2 (0x559bd94950b2 in ~/Solov2-TensorRT-CPP/cmake-build-debug/segment)
frame #8: + 0x79af5 (0x559bd9490af5 in ~/SOLOV2model/Solov2-TensorRT-CPP/cmake-build-debug/segment)
frame #9: + 0x84b4d (0x559bd949bb4d in ~/SOLOV2model/Solov2-TensorRT-CPP/cmake-build-debug/segment)
frame #10: + 0x84827 (0x559bd949b827 in ~/Solov2-TensorRT-CPP/cmake-build-debug/segment)
frame #11: + 0x16f77 (0x559bd942df77 in ~/Solov2-TensorRT-CPP/cmake-build-debug/segment)
frame #12: __libc_start_main + 0xf3 (0x7f306b82c083 in /lib/x86_64-linux-gnu/libc.so.6)
frame #13: + 0x1606e (0x559bd942d06e in ~/Solov2-TensorRT-CPP/cmake-build-debug/segment)

Process finished with exit code 134 (interrupted by signal 6: SIGABRT)

Version issue

What are your versions of PyTorch and TensorRT? With CUDA 10.2 and PyTorch 1.8 I cannot get the environment configured.

Error when exporting the ONNX file

Hello, when running this step from your instructions:

python onnx_exporter.py ../configs/solov2/solov2_light_448_r34_fpn_8gpu_3x.py ../weights/SOLOv2_light_R34.onnx --checkpoint ../checkpoints/SOLOv2_LIGHT_448_R34_3x.pth --shape 384 1152

I get the error below. Do you know the cause?
RuntimeError: Given groups=1, weight of size [256, 258, 3, 3], expected input[1, 260, 40, 40] to have 258 channels, but got 260 channels instead

Afterwards I switched the weights to SOLOv2_LIGHT_512_DCN_R50_3x; your command then ran without errors, but no corresponding .onnx output file was produced. Why is that?

compile error!

Hi, professor:
When I compile build_model, I get this error:
Solov2-TensorRT-CPP/InstanceSegment/common.h:388:9: error: ‘virtual nvinfer1::IBuilder::~IBuilder()’ is protected within this context
delete obj;
^~~~~~
Why? My environment:
NVIDIA Jetson TX2 with JetPack 4.5.1
TensorRT 7.1.3
Please help!
