
onnx_tensorrt_project's Introduction

ONNX-TensorRT

Yolov5(4.0)/Yolov5(5.0)/YoloR/YoloX/Yolov4/Yolov3/CenterNet/CenterFace/RetinaFace/Classify/Unet Implementation

Demo results (images omitted): Yolov4/Yolov3/Yolov5/YoloR/YoloX, CenterNet, Unet, CenterFace, RetinaFace

INTRODUCTION

Start from a trained model file produced by Darknet/LibTorch/PyTorch/MXNet. The following models are supported:

  • yolov5-4.0 (5s/5m/5l/5x)
  • yolov5-5.0 (5s/5m/5l/5x)
  • yolov4, yolov4-tiny
  • yolov3, yolov3-tiny
  • yolor
  • YoloX
  • centernet
  • Unet
  • CenterFace
  • RetinaFace
  • classify (mnist/alexnet/resnet18/resnet34/resnet50/shufflenet_v2/mobilenet_v2)

Features

  • unequal network input width and height

  • batch inference


    onnx-tensorrt batch inference: re-export the ONNX model with the desired batch size (e.g. batch = 2)

  • supports FP32 (m_config.mode = 0), FP16 (m_config.mode = 1), and INT8 (m_config.mode = 2); see the sketch after this list

  • dynamic input size (tiny_tensorrt_dyn_onnx)
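
A minimal sketch of selecting the precision, using the Config struct documented in the API section below. The model paths and the 608x608 calibration size are example values only, not required settings.

// Precision selection via Config::mode (0 = FP32, 1 = FP16, 2 = INT8).
Config config;
config.onnxModelpath = "configs/yolov4.onnx";              // example path (assumption)
config.engineFile    = "configs/yolov4.engine";            // example path (assumption)
config.maxBatchSize  = 2;                                  // batch inference: ONNX must be re-exported with batch = 2
config.mode          = 2;                                  // 2 = INT8
config.calibration_image_list_file = "configs/images/";    // INT8 needs a directory of calibration images
config.calibration_width  = 608;                           // example calibration resolution (assumption)
config.calibration_height = 608;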

BENCHMARK

Windows x64 (detect time)

| model | size | GPU | FP32 | FP16 | INT8 | GPU memory (MB) (FP32/FP16/INT8) |
| --- | --- | --- | --- | --- | --- | --- |
| yolov3 | 608x608 | 2080ti | 28.14ms | 19.79ms | 18.53ms | 1382/945/778 |
| yolov4 | 320x320 | 2080ti | 8.85ms | 6.62ms | 6.33ms | 1130/1075/961 |
| yolov4 | 416x416 | 2080ti | 12.19ms | 10.20ms | 9.35ms | 1740/1193/1066 |
| yolov4 | 512x512 | 2080ti | 15.63ms | 12.66ms | 12.19ms | 1960/1251/1218 |
| yolov4 | 608x608 | 2080ti | 24.39ms | 17.54ms | 17.24ms | 1448/1180/1128 |
| yolov4 | 320x320 | 3070 | 9.70ms | 7.30ms | 6.37ms | 1393/1366/1238 |
| yolov4 | 416x416 | 3070 | 14.08ms | 9.80ms | 9.70ms | 1429/1394/1266 |
| yolov4 | 512x512 | 3070 | 18.87ms | 13.51ms | 13.51ms | 1485/1436/1299 |
| yolov4 | 608x608 | 3070 | 28.57ms | 19.60ms | 18.52ms | 1508/1483/1326 |
| yolov4 | 320x320 | 1070 | 18.52ms | \ | 12.82ms | 686//442 |
| yolov4 | 416x416 | 1070 | 27.03ms | \ | 20.83ms | 1480//477 |
| yolov4 | 512x512 | 1070 | 34.48ms | \ | 27.03ms | 1546//515 |
| yolov4 | 608x608 | 1070 | 50ms | \ | 35.71ms | 1272//584 |
| yolov4 | 320x320 | 1660TI | 16.39ms | 11.90ms | 10.20ms | 1034/863/787 |
| yolov4 | 416x416 | 1660TI | 23.25ms | 17.24ms | 13.70ms | 1675/1227/816 |
| yolov4 | 512x512 | 1660TI | 29.41ms | 24.39ms | 21.27ms | 1906/1322/843 |
| yolov4 | 608x608 | 1660TI | 43.48ms | 34.48ms | 26.32ms | 1445/1100/950 |
| yolov5 5s | 640x640 | 2080ti | 24.47ms | 22.46ms | 22.38ms | 720/666/652 |
| yolov5 5m | 640x640 | 2080ti | 30.61ms | 24.02ms | 23.73ms | 851/728/679 |
| yolov5 5l | 640x640 | 2080ti | 32.58ms | 25.84ms | 24.44ms | 1154/834/738 |
| yolov5 5x | 640x640 | 2080ti | 40.69ms | 29.81ms | 27.19ms | 1530/1001/827 |
| yolor_csp_x | 512x512 | 2080ti | 27.89ms | 20.54ms | 18.71ms | 2373/1060/853 |
| yolor_csp | 512x512 | 2080ti | 21.30ms | 18.06ms | 17.03ms | 1720/856/763 |
| YOLOX-Nano | 416x416 | 2080ti | 6.84ms | 6.81ms | 6.69ms | 795/782/780 |
| YOLOX-Tiny | 416x416 | 2080ti | 7.86ms | 7.13ms | 6.73ms | 823/798/790 |
| YOLOX-S | 640x640 | 2080ti | 19.51ms | 16.62ms | 16.33ms | 940/836/794 |
| YOLOX-M | 640x640 | 2080ti | 23.35ms | 18.67ms | 17.87ms | 919/716/684 |
| YOLOX-L | 640x640 | 2080ti | 28.25ms | 20.36ms | 19.24ms | 1410/855/769 |
| YOLOX-Darknet53 | 640x640 | 2080ti | 29.95ms | 20.38ms | 18.91ms | 1552/928/772 |
| YOLOX-X | 640x640 | 2080ti | 40.40ms | 22.95ms | 21.99ms | 1691/1187/1020 |
| darknet53 | 224x224 | 2080ti | 3.53ms | 1.84ms | 1.71ms | 1005/769/658 |
| darknet53 | 224x224 | 3070 | 4.29ms | 2.16ms | 1.75ms | 1227/1017/951 |
| resnet18-v2-7 | 224x224 | 2080ti | 1.89ms | 1.29ms | 1.18ms | 878/655/624 |
| unet | 512x512 | 2080ti | 20.91ms | 17.01ms | 16.05ms | 1334/766/744 |
| retinaface_r50 | 512x512 | 2080ti | 12.33ms | 8.96ms | 8.22ms | 1189/745/678 |
| mnet.25 | 512x512 | 2080ti | 6.90ms | 6.32ms | 6.23ms | 782/603/615 |

x64 (inference / detect time)

| model | size | GPU | FP32 (inference/detect) | FP16 (inference/detect) | INT8 (inference/detect) | GPU memory (MB) (FP32/FP16/INT8) |
| --- | --- | --- | --- | --- | --- | --- |
| centernet | 512x512 | 2080ti | 17.8ms/39.7ms | 15.7ms/36.49ms | 14.37ms/36.34ms | 1839/1567/1563 |
| centerface | 640x640 | 2080ti | 5.56ms/11.79ms | 4.23ms/10.89ms | / | 854/646/640 |
| centerface_bnmerged | 640x640 | 2080ti | 5.67ms/11.82ms | 4.22ms/10.46ms | / | 850/651/645 |
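
The detect-time columns above are per-call latencies. As a rough illustration only (this is not the project's benchmark harness, which is not shown here), a timing wrapper around the detector API from the API section could look like the sketch below; the helper name and template parameters are hypothetical.

// Hypothetical timing helper (illustration only, not the project's benchmark code).
// Averages the wall-clock time of repeated detect() calls, in milliseconds.
#include <chrono>
#include <vector>
#include <opencv2/opencv.hpp>

template <typename Detector, typename Result>
double average_detect_ms(Detector& detector, const std::vector<cv::Mat>& batch, int iterations = 100)
{
    std::vector<Result> results;
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
    {
        results.clear();
        detector.detect(batch, results);    // preprocess + inference + postprocess
    }
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count() / iterations;
}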

Windows 10

Model and 3rdparty

Model: https://drive.google.com/drive/folders/1KzBjmCOG9ghcq9L6-iqfz6QwBQq6Hl4_?usp=sharing or https://share.weiyun.com/td9CRDhW

3rdparty: https://drive.google.com/drive/folders/1SddUgQ5kGlv6dDGPqnVWZxgCoBY85rM2?usp=sharing or https://share.weiyun.com/WEZ3TGtb

API

struct Config
{
    std::string cfgFile = "configs/yolov3.cfg";                    // darknet cfg file (for the yolo models)
    std::string onnxModelpath = "configs/yolov3.onnx";             // input ONNX model
    std::string engineFile = "configs/yolov3.engine";              // serialized TensorRT engine to write/load
    std::string calibration_image_list_file = "configs/images/";   // directory of INT8 calibration images
    std::vector<std::string> customOutput;                         // optional custom output tensor names
    int calibration_width = 0;                                     // calibration input width
    int calibration_height = 0;                                    // calibration input height
    int maxBatchSize = 1;
    int mode;                                                      // 0 = FP32, 1 = FP16, 2 = INT8
    //std::string calibration_image_list_file_txt = "configs/calibration_images.txt";
};

class YoloDectector
{
public:
    void init(Config config);                                   // build or load the TensorRT engine
    void detect(const std::vector<cv::Mat>& vec_image,          // input images (up to maxBatchSize)
                std::vector<BatchResult>& vec_batch_result);    // one BatchResult per input image
};

REFERENCE

https://github.com/onnx/onnx-tensorrt.git

https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/sampleDynamicReshape

https://github.com/NVIDIA-AI-IOT/deepstream_reference_apps

https://github.com/enazoe/yolo-tensorrt.git

https://github.com/zerollzeng/tiny-tensorrt.git

Contact

onnx_tensorrt_project's People

Contributors

ttanzhiqiang


onnx_tensorrt_project's Issues

More about installation

Hello,

Thanks for the great work!!!

Can this be used on Ubuntu, or must it be on Windows?

If so, could you please provide more information for non-Windows users?

Thanks once again

UNet training

Hi,

I trained a model on the public dataset but the result looks strange. Could you please give some tips for training?

Thanks.

I have one class and I set the classes param to 2

from segmentation_models_pytorch import Unet  # assuming the Unet class from segmentation_models_pytorch

unet_model = Unet(encoder_name="resnet50", encoder_weights="imagenet",
                  decoder_channels=(256, 128, 64, 32, 16),
                  in_channels=3, classes=2)

--width: 512
--height: 512
--epoch: 30
--batchsize: 2

dataset sample:
900 images
ISIC_0000000
ISIC_0000000_Segmentation

Result: (image omitted)

Results of a yolov5-v5 yolov5x model differ between the Python version and this project's TensorRT version?

As the title says, I wonder whether the author has run into this: my trained yolov5x model reaches 98% accuracy in the Python version, but after converting it to TensorRT the accuracy is only about 90%. The conversion log is as follows:
[09/27/2021-11:27:04] [I] Host Latency
[09/27/2021-11:27:04] [I] min: 11.3848 ms (end to end 21.3677 ms)
[09/27/2021-11:27:04] [I] max: 13.1256 ms (end to end 24.1753 ms)
[09/27/2021-11:27:04] [I] mean: 11.67 ms (end to end 21.9034 ms)
[09/27/2021-11:27:04] [I] median: 11.5836 ms (end to end 21.7285 ms)
[09/27/2021-11:27:04] [I] percentile: 12.5283 ms at 99% (end to end 23.6667 ms at 99%)
[09/27/2021-11:27:04] [I] throughput: 0 qps
[09/27/2021-11:27:04] [I] walltime: 3.03151 s
[09/27/2021-11:27:04] [I] Enqueue Time
[09/27/2021-11:27:04] [I] min: 1.04535 ms
[09/27/2021-11:27:04] [I] max: 4.6637 ms
[09/27/2021-11:27:04] [I] median: 1.61969 ms
[09/27/2021-11:27:04] [I] GPU Compute
[09/27/2021-11:27:04] [I] min: 10.8311 ms
[09/27/2021-11:27:04] [I] max: 12.5458 ms
[09/27/2021-11:27:04] [I] mean: 11.0955 ms
[09/27/2021-11:27:04] [I] median: 11.0142 ms
[09/27/2021-11:27:04] [I] percentile: 11.9821 ms at 99%
[09/27/2021-11:27:04] [I] total compute time: 3.01798 s
&&&& PASSED TensorRT.trtexec # trtexec.exe --onnx=best.onnx --saveEngine=best.engine --fp16

How do I run yolov5 INT8 quantization?

As the title says, I see from the source code that INT8 is supported and I want to try it, but I don't know the steps. Could you give the concrete steps? My attempts keep failing. Do I need to generate an INT8 model first, or how should this be done?

yolov5_detector.cpp errors out with multi-batch input

As the title says, when yolov5_detector.cpp runs inference with a batch size greater than 1:
(screenshot: multi-batch input to yolov5-detector)
it reports the following error:
(screenshot: yolov5-detector error)
What could be causing this?

yolov3-ocr.cfg does not have down_stride

Hi, I tried to convert a yolov3-spp .pt file to ONNX, and here is the error:

Traceback (most recent call last):
  File "Libtorch_yolo_to_onnx.py", line 779, in <module>
    main()
  File "Libtorch_yolo_to_onnx.py", line 771, in main
    model_def = builder.build_onnx_graph(
  File "Libtorch_yolo_to_onnx.py", line 353, in build_onnx_graph
    major_node_specs = self._make_onnx_node(layer_name, layer_dict)
  File "Libtorch_yolo_to_onnx.py", line 426, in _make_onnx_node
    node_creators[layer_type](layer_name, layer_dict)
  File "Libtorch_yolo_to_onnx.py", line 729, in _make_yolo_node
    down_stride = int(layer_dict['down_stride'])
KeyError: 'down_stride'

many thanks!

-Scott

how to run?

My Windows environment is:
cuda 11.0
vs2019
How do I run it?
When I run it, it tells me:
cannot find nvrtc64_111_0.dll

Which third-party libraries do I need to recompile?

Unet inference speed and GPU memory usage

As the title says, following your configuration I got unet.cpp running with an image size of 512x512 on a 3060, with mode = 2 (INT8), but detection takes about 77ms, far from the 16ms listed in this project's benchmark, and GPU memory usage is about 1.2 GB. Based on your experience, what might I still have configured incorrectly?

libtorch to onnx

Is there a demo that converts a LibTorch nn::Module trained model to an ONNX model? In Python it can be converted with torch.onnx.export, but LibTorch C++ cannot export it. Can you help me solve this issue?

yolov5 engine generation failed:

reading calib cache: E:\comm_Item\Item_done\onnx_tensorrt_pro\onnx_tensorrt_project-main\model\pytorch_onnx_tensorrt_yolov5\yolov5s.table
TensorRT was linked against cuDNN 8.1.0 but loaded cuDNN 8.0.2
Detected 1 inputs and 7 output network tensors.
TensorRT was linked against cuDNN 8.1.0 but loaded cuDNN 8.0.2
TensorRT was linked against cuDNN 8.1.0 but loaded cuDNN 8.0.2
Starting Calibration.
dog.jpg 0
Calibrated batch 0 in 1.35348 seconds.
person.jpg 1
Calibrated batch 1 in 1.3612 seconds.
Post Processing Calibration data in 0.0005131 seconds.
Calibration completed in 20.8498 seconds.
reading calib cache: E:\comm_Item\Item_done\onnx_tensorrt_pro\onnx_tensorrt_project-main\model\pytorch_onnx_tensorrt_yolov5\yolov5s.table
Writing Calibration Cache for calibrator: TRT-7203-MinMaxCalibration
writing calib cache: E:\comm_Item\Item_done\onnx_tensorrt_pro\onnx_tensorrt_project-main\model\pytorch_onnx_tensorrt_yolov5\yolov5s.table size: 4711
TensorRT was linked against cuDNN 8.1.0 but loaded cuDNN 8.0.2
C:\source\rtSafe\cuda\cudaConvolutionRunner.cpp (483) - Cudnn Error in nvinfer1::rt::cuda::CudnnConvolutionRunner::executeConv: 8 (CUDNN_STATUS_EXECUTION_FAILED)
C:\source\rtSafe\cuda\cudaConvolutionRunner.cpp (483) - Cudnn Error in nvinfer1::rt::cuda::CudnnConvolutionRunner::executeConv: 8 (CUDNN_STATUS_EXECUTION_FAILED)
[2021-08-08 11:28:54.729] [info] serialize engine to E:\comm_Item\Item_done\onnx_tensorrt_pro\onnx_tensorrt_project-main\model\pytorch_onnx_tensorrt_yolov5\yolov5s_fp32_batch_1.engine
[2021-08-08 11:28:54.730] [error] engine is empty, save engine failed
[2021-08-08 11:28:54.731] [info] create execute context and malloc device memory...
[2021-08-08 11:28:54.731] [info] init engine...
