
solov2-tensorrt-cpp's Introduction

Solov2-TensorRT-CPP

In this repo, we deploy SOLOv2 to TensorRT with C++. See the video: solov2_cpp

Requirements

  • Ubuntu 16.04/18.04/20.04
  • CUDA 10.2
  • cuDNN 8
  • TensorRT 8.0.1
  • OpenCV 3.4
  • Libtorch 1.8.2
  • CMake 3.20

Acknowledgements

SOLO SOLOv2.tensorRT

Getting Started

1. Install Solov2 from SOLO

Download it and make sure it runs successfully.

2. Export the ONNX model from the original model

Download the pre-exported ONNX model from baidudisk, fetch code: qdsm.

  • Export the model yourself

Before exporting, you must modify a few parts of the original SOLOv2 code:

  • 2.1 Modify SOLO-master/mmdet/models/anchor_heads/solov2_head.py:154

The original code in solov2_head.py is:

# Original code from SOLO
x_range = torch.linspace(-1, 1, ins_kernel_feat.shape[-1], device=ins_kernel_feat.device)
y_range = torch.linspace(-1, 1, ins_kernel_feat.shape[-2], device=ins_kernel_feat.device)
y, x = torch.meshgrid(y_range, x_range)
y = y.expand([ins_kernel_feat.shape[0], 1, -1, -1])
x = x.expand([ins_kernel_feat.shape[0], 1, -1, -1])

change to:

# Modified for ONNX export: freeze the input size to 800x800 and batch size to 1
# apparently the per-level feature-map sizes for the frozen 800x800 input (strides 8, 8, 16, 32, 32)
size = {0: 100, 1: 100, 2: 50, 3: 25, 4: 25}
# .shape entries may be traced Tensors during torch.onnx.export(), so cast them to plain ints
feat_h, feat_w = ins_kernel_feat.shape[-2], ins_kernel_feat.shape[-1]
feat_h = int(feat_h.cpu().numpy()) if isinstance(feat_h, torch.Tensor) else int(feat_h)
feat_w = int(feat_w.cpu().numpy()) if isinstance(feat_w, torch.Tensor) else int(feat_w)
x_range = torch.linspace(-1, 1, feat_w, device=ins_kernel_feat.device)
y_range = torch.linspace(-1, 1, feat_h, device=ins_kernel_feat.device)
y, x = torch.meshgrid(y_range, x_range)
y = y.expand([1, 1, -1, -1])
x = x.expand([1, 1, -1, -1])
coord_feat = torch.cat([x, y], 1)
ins_kernel_feat = torch.cat([ins_kernel_feat, coord_feat], 1)
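
The int(...) casts are needed because, as the comment notes, .shape entries can come back as traced Tensors during torch.onnx.export(). A minimal standalone sketch of the same coordinate-feature pattern (illustrative only, not repo code):

import torch

def to_int(dim):
    # A shape entry may be a traced Tensor during ONNX export;
    # convert it to a plain Python int either way.
    return int(dim.cpu().numpy()) if isinstance(dim, torch.Tensor) else int(dim)

feat = torch.randn(1, 256, 25, 25)  # dummy FPN feature map
h, w = to_int(feat.shape[-2]), to_int(feat.shape[-1])
y, x = torch.meshgrid(torch.linspace(-1, 1, h), torch.linspace(-1, 1, w))
coord_feat = torch.cat([x.expand(1, 1, -1, -1), y.expand(1, 1, -1, -1)], 1)
print(torch.cat([feat, coord_feat], 1).shape)  # torch.Size([1, 258, 25, 25])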
  • 2.2 Modify SOLO-master/mmdet/models/detectors/single_stage_ins.py

In the function forward_dummy(), add the forward pass of the mask branch, as follows:

def forward_dummy(self, img):
    # backbone + FPN features
    x = self.extract_feat(img)
    # category and kernel predictions from the SOLOv2 head
    outs = self.bbox_head(x)
    # also run the mask feature head so its output is part of the exported graph
    if self.with_mask_feat_head:
        mask_feat_pred = self.mask_feat_head(
            x[self.mask_feat_head.start_level:self.mask_feat_head.end_level + 1])
        outs = (outs[0], outs[1], mask_feat_pred)
    return outs
  • 2.3 Modify SOLO-master/mmdet/models/mask_heads/mask_feat_head.py

At line 108 of mask_feat_head.py, the original code is:

x_range = torch.linspace(-1, 1, input_feat.shape[-1], device=input_feat.device)
y_range = torch.linspace(-1, 1, input_feat.shape[-2], device=input_feat.device)

change to:

# .shape entries may be traced Tensors during torch.onnx.export()
feat_h, feat_w = input_feat.shape[-2], input_feat.shape[-1]
feat_h = int(feat_h.cpu().numpy()) if isinstance(feat_h, torch.Tensor) else int(feat_h)
feat_w = int(feat_w.cpu().numpy()) if isinstance(feat_w, torch.Tensor) else int(feat_w)
x_range = torch.linspace(-1, 1, feat_w, device=input_feat.device)
y_range = torch.linspace(-1, 1, feat_h, device=input_feat.device)
  • 2.4 Export the ONNX model

Copy onnx_exporter.py and common.py to SOLO/demo/, then run:

# KITTI image size (--shape takes height width: 384 1152)
python onnx_exporter.py ../configs/solov2/solov2_light_448_r34_fpn_8gpu_3x.py ../weights/SOLOv2_light_R34.onnx --checkpoint ../checkpoints/SOLOv2_LIGHT_448_R34_3x.pth --shape 384 1152
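
Inside onnx_exporter.py, the core of the export is presumably a torch.onnx.export() call on the modified model. A hedged sketch of what that call looks like (the function name and most arguments here are assumptions; the input name "input" and opset 11 appear in the build log further below):

import torch

def export_solov2(model, out_path, height=384, width=1152):
    # Trace the modified model with a fixed-size dummy input and write ONNX
    model.eval()
    dummy = torch.randn(1, 3, height, width)
    torch.onnx.export(model, dummy, out_path,
                      opset_version=11,       # "Opset version: 11" in the build log
                      input_names=["input"],  # matches "input shape:input (1, 3, 384, 1152)"
                      do_constant_folding=True)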

3. Build the TensorRT model

First, edit the config file config.yaml:

%YAML:1.0

IMAGE_WIDTH: 1226
IMAGE_HEIGHT: 370

#SOLO
ONNX_PATH: "/home/chen/ws/dynamic_ws/src/dynamic_vins/weights/solo/SOLOv2_light_R34_1152x384_cuda102.onnx"
SERIALIZE_PATH: "/home/chen/ws/dynamic_ws/src/dynamic_vins/weights/solo/tensorrt_model_1152x384.bin"

SOLO_NMS_PRE: 500
SOLO_MAX_PER_IMG: 100
SOLO_NMS_KERNEL: "gaussian"
#SOLO_NMS_SIGMA=2.0
SOLO_NMS_SIGMA: 2.0
SOLO_SCORE_THR: 0.1
SOLO_MASK_THR: 0.5
SOLO_UPDATE_THR: 0.2

LOG_PATH: "./segmentor_log.txt"
LOG_LEVEL: "debug"
LOG_FLUSH: "debug"

DATASET_DIR: "/media/chen/EC4A17F64A17BBF0/datasets/kitti/odometry/colors/07/image_2/"
WARN_UP_IMAGE_PATH: "/home/chen/CLionProjects/InstanceSegment/config/kitti.png"
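
The %YAML:1.0 header means this is OpenCV FileStorage YAML, not plain YAML. A quick way to sanity-check the values before running (a sketch using OpenCV's Python bindings; not part of the repo):

import cv2

fs = cv2.FileStorage("./config/config.yaml", cv2.FILE_STORAGE_READ)
print(int(fs.getNode("IMAGE_WIDTH").real()))   # 1226
print(fs.getNode("ONNX_PATH").string())
print(fs.getNode("SOLO_NMS_KERNEL").string())  # gaussian
fs.release()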

Then compile the CMake project:

mkdir build && cd build
cmake ..
make -j10

Finally, build the TensorRT engine:

cd ..
./build/build_model ./config/config.yaml
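
build_model parses the ONNX file at ONNX_PATH and serializes a TensorRT engine to SERIALIZE_PATH. The repo does this in C++; roughly the same steps, sketched with the TensorRT 8 Python API (paths illustrative):

import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("SOLOv2_light_R34.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))
config = builder.create_builder_config()
config.max_workspace_size = 1 << 30  # 1 GiB workspace
engine = builder.build_serialized_network(network, config)
with open("tensorrt_model_1152x384.bin", "wb") as f:
    f.write(bytearray(engine))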

4. Run the demo

If you have the KITTI dataset, set DATASET_DIR in config.yaml to the correct path, then run:

./build/segment ./config/config.yaml

If you don't, and just want to run on a single image, set WARN_UP_IMAGE_PATH in config.yaml to a valid image path, then run:

./build/demo ./config/config.yaml

solov2-tensorrt-cpp's People

Contributors

chenjianqu


solov2-tensorrt-cpp's Issues

compile error

Hello, when I compile your code (C++14/17), I always get this error:
/home/hermione/library/libtorch/include/ATen/TensorIterator.h:200:3: error: reference to 'DeviceType' is ambiguous
DeviceType device_type(int arg=0) const { return device(arg).type(); }
Do you know the reason for this? I am using libtorch 1.8.2 + CUDA 10.2.

Error when running the code

Hello, when I run ./build/segment ./config/config.yaml I get GetQueueShapeIndex failed:[1, 128, 160, 120]. What could be the cause?

solov2_head.py:

#Modify for onnx export, frozen the input size = 800x800, batch size = 1
size = {0: 100, 1: 100, 2: 50, 3: 25, 4: 25}

What is this size line for?

Deploy SOLOv2 with TensorRT without libtorch?

Hi, professor:
Is it possible to deploy SOLOv2 without libtorch, using just the TensorRT deserialization API plus some post-processing code? It would have fewer dependencies, and installing libtorch on Jetson isn't friendly.
Your build_model already creates a raw TensorRT engine, so the demo could simply read it from file, create a TensorRT context, and run.
Please help!

ONNX export failure

I found that after ONNX export there are 11 outputs instead of the three used in the code. Also, converting with --shape 768 1344 fails immediately:
Traceback (most recent call last):
File "onnx_exporter.py", line 231, in <module>
check(args, dummy_input, check_onnx=True, check_trt=False)
File "onnx_exporter.py", line 126, in check
sess = rt.InferenceSession(args.out)
File "/home/tao/anaconda3/lib/python3.8/site-packages/onnxruntime/capi/session.py", line 195, in __init__
self._create_inference_session(providers, provider_options)
File "/home/tao/anaconda3/lib/python3.8/site-packages/onnxruntime/capi/session.py", line 205, in _create_inference_session
sess.initialize_session(providers or [], provider_options or [])
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Node (Concat_637) Op (Concat) [ShapeInferenceError] Can't merge shape info. Both source and target dimension have values but they differ. Source=49 Target=48 Dimension=2

This error appears after I added the code you describe to single_stage_ins; without it there is no error, but then there are 10 outputs. Any advice?

cate_scores.prob error

With the TensorRT model exported following your method, some computed scores are greater than 1. What could cause this?

Parameter check failed at: runtime/api/executionContext.cpp::enqueueInternal::322, condition: bindings[x] != nullptr

When I run "./build/segment ./config/config.yaml", I get the error "[E] [TRT] 3: [executionContext.cpp::enqueueInternal::322] Error Code 3: Internal Error (Parameter check failed at: runtime/api/executionContext.cpp::enqueueInternal::322, condition: bindings[x] != nullptr)". What might be the reason? Did I generate the ONNX and TensorRT model bin correctly?
This is my output from running "build_model" and "demo":

./build/build_model ./config/config.yaml

/Solov2-TensorRT-CPP/cmake-build-debug/build_model ./config/config.yaml
config_file:./config/config.yaml
createInferBuilder
[05/25/2022-22:57:19] [I] [TRT] [MemUsageChange] Init CUDA: CPU +299, GPU +0, now: CPU 301, GPU 309 (MiB)
createNetwork
createBuilderConfig
createParser
parseFromFile:
/Solov2-TensorRT-CPP/ONNX/SOLOv2_light_R34.onnx
[05/25/2022-22:57:19] [I] [TRT] ----------------------------------------------------------------
[05/25/2022-22:57:19] [I] [TRT] Input filename: ~/Solov2-TensorRT-CPP/ONNX/SOLOv2_light_R34.onnx
[05/25/2022-22:57:19] [I] [TRT] ONNX IR version: 0.0.4
[05/25/2022-22:57:19] [I] [TRT] Opset version: 11
[05/25/2022-22:57:19] [I] [TRT] Producer name: pytorch
[05/25/2022-22:57:19] [I] [TRT] Producer version: 1.3
[05/25/2022-22:57:19] [I] [TRT] Domain:
[05/25/2022-22:57:19] [I] [TRT] Model version: 0
[05/25/2022-22:57:19] [I] [TRT] Doc string:
[05/25/2022-22:57:19] [I] [TRT] ----------------------------------------------------------------
[05/25/2022-22:57:19] [W] [TRT] onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
input shape:input (1, 3, 384, 1152)
output shape:cate_pred (3872, 80)
enableDLA
buildEngineWithConfig
[05/25/2022-22:57:20] [I] [TRT] [MemUsageSnapshot] Builder begin: CPU 664 MiB, GPU 671 MiB
[05/25/2022-22:57:21] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +70, GPU +68, now: CPU 822, GPU 1012 (MiB)
[05/25/2022-22:57:21] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 822, GPU 1022 (MiB)
[05/25/2022-22:57:21] [W] [TRT] Detected invalid timing cache, setup a local cache instead
[05/25/2022-22:57:24] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[05/25/2022-22:58:39] [I] [TRT] Detected 1 inputs and 13 output network tensors.
[05/25/2022-22:58:39] [I] [TRT] Total Host Persistent Memory: 274640
[05/25/2022-22:58:39] [I] [TRT] Total Device Persistent Memory: 83921920
[05/25/2022-22:58:39] [I] [TRT] Total Scratch Memory: 0
[05/25/2022-22:58:39] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 158 MiB, GPU 675 MiB
[05/25/2022-22:58:39] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 1298, GPU 1635 (MiB)
[05/25/2022-22:58:39] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 1298, GPU 1643 (MiB)
[05/25/2022-22:58:39] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1298, GPU 1627 (MiB)
[05/25/2022-22:58:39] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1297, GPU 1611 (MiB)
[05/25/2022-22:58:39] [I] [TRT] [MemUsageSnapshot] Builder end: CPU 1210 MiB, GPU 1381 MiB
serializeModel
done

Process finished with exit code 0


./build/demo ./config/config.yaml

~/Solov2-TensorRT-CPP/cmake-build-debug/segment ./config/config.yaml
config_file:./config/config.yaml
[05/25/2022-23:35:10] [I] [TRT] [MemUsageChange] Init CUDA: CPU +298, GPU +0, now: CPU 411, GPU 309 (MiB)
[05/25/2022-23:35:11] [I] [TRT] Loaded engine size: 81 MB
[05/25/2022-23:35:11] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine begin: CPU 411 MiB, GPU 309 MiB
[05/25/2022-23:35:22] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +2140, GPU +980, now: CPU 2804, GPU 1731 (MiB)
[05/25/2022-23:35:22] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 2804, GPU 1741 (MiB)
[05/25/2022-23:35:22] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 2804, GPU 1725 (MiB)
[05/25/2022-23:35:22] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine end: CPU 2804 MiB, GPU 1725 MiB
[05/25/2022-23:35:22] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation begin: CPU 2804 MiB, GPU 1725 MiB
[05/25/2022-23:35:22] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 2804, GPU 1733 (MiB)
[05/25/2022-23:35:22] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 2804, GPU 1741 (MiB)
[05/25/2022-23:35:22] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation end: CPU 2811 MiB, GPU 2166 MiB
[05/25/2022-23:35:23] [E] [TRT] 3: [executionContext.cpp::enqueueInternal::322] Error Code 3: Internal Error (Parameter check failed at: runtime/api/executionContext.cpp::enqueueInternal::322, condition: bindings[x] != nullptr
)
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: invalid argument
Exception raised from getDeviceFromPtr at ../aten/src/ATen/cuda/CUDADevice.h:13 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits, std::allocator >) + 0x69 (0x7f30d25c1b29 in ~/NVIDIA/libtorch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&) + 0xd2 (0x7f30d25beab2 in /home/cqyd/NVIDIA/libtorch/lib/libc10.so)
frame #2: + 0x36d1ea7 (0x7f306f824ea7 in ~/NVIDIA/libtorch/lib/libtorch_cuda.so)
frame #3: + 0x7c87c (0x559bd949387c in ~/Solov2-TensorRT-CPP/cmake-build-debug/segment)
frame #4: + 0x7cdf1 (0x559bd9493df1 in ~/Solov2-TensorRT-CPP/cmake-build-debug/segment)
frame #5: + 0x7d2a7 (0x559bd94942a7 in ~/Solov2-TensorRT-CPP/cmake-build-debug/segment)
frame #6: + 0x7dff8 (0x559bd9494ff8 in ~/Solov2-TensorRT-CPP/cmake-build-debug/segment)
frame #7: + 0x7e0b2 (0x559bd94950b2 in ~/Solov2-TensorRT-CPP/cmake-build-debug/segment)
frame #8: + 0x79af5 (0x559bd9490af5 in ~/SOLOV2model/Solov2-TensorRT-CPP/cmake-build-debug/segment)
frame #9: + 0x84b4d (0x559bd949bb4d in ~/SOLOV2model/Solov2-TensorRT-CPP/cmake-build-debug/segment)
frame #10: + 0x84827 (0x559bd949b827 in ~/Solov2-TensorRT-CPP/cmake-build-debug/segment)
frame #11: + 0x16f77 (0x559bd942df77 in ~/Solov2-TensorRT-CPP/cmake-build-debug/segment)
frame #12: __libc_start_main + 0xf3 (0x7f306b82c083 in /lib/x86_64-linux-gnu/libc.so.6)
frame #13: + 0x1606e (0x559bd942d06e in ~/Solov2-TensorRT-CPP/cmake-build-debug/segment)

Process finished with exit code 134 (interrupted by signal 6: SIGABRT)

Version issue

What are your versions of PyTorch and TensorRT? With CUDA 10.2 and PyTorch 1.8 I cannot get the environment configured.

Error when exporting the ONNX file

Hello, when running this step from your instructions:

python onnx_exporter.py ../configs/solov2/solov2_light_448_r34_fpn_8gpu_3x.py ../weights/SOLOv2_light_R34.onnx --checkpoint ../checkpoints/SOLOv2_LIGHT_448_R34_3x.pth --shape 384 1152

I get the error below. Do you know the cause?
RuntimeError: Given groups=1, weight of size [256, 258, 3, 3], expected input[1, 260, 40, 40] to have 258 channels, but got 260 channels instead

Afterwards I switched the weights to SOLOv2_LIGHT_512_DCN_R50_3x; your command then ran without errors, but no corresponding .onnx output file was produced. Why is that?

compile error!

Hi, professor:
When I compile build_model, I get this error:
Solov2-TensorRT-CPP/InstanceSegment/common.h:388:9: error: ‘virtual nvinfer1::IBuilder::~IBuilder()’ is protected within this context
delete obj;
^~~~~~
Why? My environment:
NVIDIA Jetson TX2 with JetPack 4.5.1
TensorRT 7.1.3
Please help!
