
onnx_tensorrt_project's Introduction

ONNX-TensorRT

Yolov5(4.0)/Yolov5(5.0)/YoloR/YoloX/Yolov4/Yolov3/CenterNet/CenterFace/RetinaFace/Classify/Unet Implementation

Demo results (images omitted): Yolov4/Yolov3/Yolov5/YoloR/YoloX, CenterNet, Unet, CenterFace, RetinaFace

INTRODUCTION

Start from a trained model file produced by Darknet/LibTorch/PyTorch/MXNet. The following models are supported:

  • yolov5-4.0 (5s/5m/5l/5x)
  • yolov5-5.0 (5s/5m/5l/5x)
  • yolov4, yolov4-tiny
  • yolov3, yolov3-tiny
  • yolor
  • YoloX
  • centernet
  • Unet
  • CenterFace
  • RetinaFace
  • classify (mnist/alexnet/resnet18/resnet34/resnet50/shufflenet_v2/mobilenet_v2)

Features

  • unequal network input width and height

  • batch inference


    onnx-tensorrt batch inference: re-export the ONNX model with the desired batch size (e.g. batch = 2)

  • supports FP32 (m_config.mode = 0), FP16 (m_config.mode = 1), and INT8 (m_config.mode = 2); see the sketch after this list

  • dynamic input size (tiny_tensorrt_dyn_onnx)
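
A minimal sketch of selecting the precision, using the Config struct documented in the API section below. The model paths and the 608x608 calibration size are example values only, not required settings.

// Precision selection via Config::mode (0 = FP32, 1 = FP16, 2 = INT8).
Config config;
config.onnxModelpath = "configs/yolov4.onnx";              // example path (assumption)
config.engineFile    = "configs/yolov4.engine";            // example path (assumption)
config.maxBatchSize  = 2;                                  // batch inference: ONNX must be re-exported with batch = 2
config.mode          = 2;                                  // 2 = INT8
config.calibration_image_list_file = "configs/images/";    // INT8 needs a directory of calibration images
config.calibration_width  = 608;                           // example calibration resolution (assumption)
config.calibration_height = 608;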

BENCHMARK

Windows x64 (detect time)

| model | size | GPU | FP32 | FP16 | INT8 | GPU memory (MB) (FP32/FP16/INT8) |
| --- | --- | --- | --- | --- | --- | --- |
| yolov3 | 608x608 | 2080ti | 28.14ms | 19.79ms | 18.53ms | 1382/945/778 |
| yolov4 | 320x320 | 2080ti | 8.85ms | 6.62ms | 6.33ms | 1130/1075/961 |
| yolov4 | 416x416 | 2080ti | 12.19ms | 10.20ms | 9.35ms | 1740/1193/1066 |
| yolov4 | 512x512 | 2080ti | 15.63ms | 12.66ms | 12.19ms | 1960/1251/1218 |
| yolov4 | 608x608 | 2080ti | 24.39ms | 17.54ms | 17.24ms | 1448/1180/1128 |
| yolov4 | 320x320 | 3070 | 9.70ms | 7.30ms | 6.37ms | 1393/1366/1238 |
| yolov4 | 416x416 | 3070 | 14.08ms | 9.80ms | 9.70ms | 1429/1394/1266 |
| yolov4 | 512x512 | 3070 | 18.87ms | 13.51ms | 13.51ms | 1485/1436/1299 |
| yolov4 | 608x608 | 3070 | 28.57ms | 19.60ms | 18.52ms | 1508/1483/1326 |
| yolov4 | 320x320 | 1070 | 18.52ms | \ | 12.82ms | 686//442 |
| yolov4 | 416x416 | 1070 | 27.03ms | \ | 20.83ms | 1480//477 |
| yolov4 | 512x512 | 1070 | 34.48ms | \ | 27.03ms | 1546//515 |
| yolov4 | 608x608 | 1070 | 50ms | \ | 35.71ms | 1272//584 |
| yolov4 | 320x320 | 1660TI | 16.39ms | 11.90ms | 10.20ms | 1034/863/787 |
| yolov4 | 416x416 | 1660TI | 23.25ms | 17.24ms | 13.70ms | 1675/1227/816 |
| yolov4 | 512x512 | 1660TI | 29.41ms | 24.39ms | 21.27ms | 1906/1322/843 |
| yolov4 | 608x608 | 1660TI | 43.48ms | 34.48ms | 26.32ms | 1445/1100/950 |
| yolov5 5s | 640x640 | 2080ti | 24.47ms | 22.46ms | 22.38ms | 720/666/652 |
| yolov5 5m | 640x640 | 2080ti | 30.61ms | 24.02ms | 23.73ms | 851/728/679 |
| yolov5 5l | 640x640 | 2080ti | 32.58ms | 25.84ms | 24.44ms | 1154/834/738 |
| yolov5 5x | 640x640 | 2080ti | 40.69ms | 29.81ms | 27.19ms | 1530/1001/827 |
| yolor_csp_x | 512x512 | 2080ti | 27.89ms | 20.54ms | 18.71ms | 2373/1060/853 |
| yolor_csp | 512x512 | 2080ti | 21.30ms | 18.06ms | 17.03ms | 1720/856/763 |
| YOLOX-Nano | 416x416 | 2080ti | 6.84ms | 6.81ms | 6.69ms | 795/782/780 |
| YOLOX-Tiny | 416x416 | 2080ti | 7.86ms | 7.13ms | 6.73ms | 823/798/790 |
| YOLOX-S | 640x640 | 2080ti | 19.51ms | 16.62ms | 16.33ms | 940/836/794 |
| YOLOX-M | 640x640 | 2080ti | 23.35ms | 18.67ms | 17.87ms | 919/716/684 |
| YOLOX-L | 640x640 | 2080ti | 28.25ms | 20.36ms | 19.24ms | 1410/855/769 |
| YOLOX-Darknet53 | 640x640 | 2080ti | 29.95ms | 20.38ms | 18.91ms | 1552/928/772 |
| YOLOX-X | 640x640 | 2080ti | 40.40ms | 22.95ms | 21.99ms | 1691/1187/1020 |
| darknet53 | 224x224 | 2080ti | 3.53ms | 1.84ms | 1.71ms | 1005/769/658 |
| darknet53 | 224x224 | 3070 | 4.29ms | 2.16ms | 1.75ms | 1227/1017/951 |
| resnet18-v2-7 | 224x224 | 2080ti | 1.89ms | 1.29ms | 1.18ms | 878/655/624 |
| unet | 512x512 | 2080ti | 20.91ms | 17.01ms | 16.05ms | 1334/766/744 |
| retinaface_r50 | 512x512 | 2080ti | 12.33ms | 8.96ms | 8.22ms | 1189/745/678 |
| mnet.25 | 512x512 | 2080ti | 6.90ms | 6.32ms | 6.23ms | 782/603/615 |

x64 (inference / detect time)

| model | size | GPU | FP32 (inference/detect) | FP16 (inference/detect) | INT8 (inference/detect) | GPU memory (MB) (FP32/FP16/INT8) |
| --- | --- | --- | --- | --- | --- | --- |
| centernet | 512x512 | 2080ti | 17.8ms/39.7ms | 15.7ms/36.49ms | 14.37ms/36.34ms | 1839/1567/1563 |
| centerface | 640x640 | 2080ti | 5.56ms/11.79ms | 4.23ms/10.89ms | / | 854/646/640 |
| centerface_bnmerged | 640x640 | 2080ti | 5.67ms/11.82ms | 4.22ms/10.46ms | / | 850/651/645 |
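
The detect-time columns above are per-call latencies. As a rough illustration only (this is not the project's benchmark harness, which is not shown here), a timing wrapper around the detector API from the API section could look like the sketch below; the helper name and template parameters are hypothetical.

// Hypothetical timing helper (illustration only, not the project's benchmark code).
// Averages the wall-clock time of repeated detect() calls, in milliseconds.
#include <chrono>
#include <vector>
#include <opencv2/opencv.hpp>

template <typename Detector, typename Result>
double average_detect_ms(Detector& detector, const std::vector<cv::Mat>& batch, int iterations = 100)
{
    std::vector<Result> results;
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
    {
        results.clear();
        detector.detect(batch, results);    // preprocess + inference + postprocess
    }
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count() / iterations;
}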

Windows 10

Model and 3rdparty

Model: https://drive.google.com/drive/folders/1KzBjmCOG9ghcq9L6-iqfz6QwBQq6Hl4_?usp=sharing or https://share.weiyun.com/td9CRDhW

3rdparty: https://drive.google.com/drive/folders/1SddUgQ5kGlv6dDGPqnVWZxgCoBY85rM2?usp=sharing or https://share.weiyun.com/WEZ3TGtb

API

struct Config
{
    std::string cfgFile = "configs/yolov3.cfg";                    // darknet cfg file (for the yolo models)
    std::string onnxModelpath = "configs/yolov3.onnx";             // input ONNX model
    std::string engineFile = "configs/yolov3.engine";              // serialized TensorRT engine to write/load
    std::string calibration_image_list_file = "configs/images/";   // directory of INT8 calibration images
    std::vector<std::string> customOutput;                         // optional custom output tensor names
    int calibration_width = 0;                                     // calibration input width
    int calibration_height = 0;                                    // calibration input height
    int maxBatchSize = 1;
    int mode;                                                      // 0 = FP32, 1 = FP16, 2 = INT8
    //std::string calibration_image_list_file_txt = "configs/calibration_images.txt";
};

class YoloDectector
{
public:
    void init(Config config);                                   // build or load the TensorRT engine
    void detect(const std::vector<cv::Mat>& vec_image,          // input images (up to maxBatchSize)
                std::vector<BatchResult>& vec_batch_result);    // one BatchResult per input image
};

REFERENCE

https://github.com/onnx/onnx-tensorrt.git

https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/sampleDynamicReshape

https://github.com/NVIDIA-AI-IOT/deepstream_reference_apps

https://github.com/enazoe/yolo-tensorrt.git

https://github.com/zerollzeng/tiny-tensorrt.git

Contact

onnx_tensorrt_project's People

Contributors

ttanzhiqiang


onnx_tensorrt_project's Issues

More about installation

Hello,

Thanks for the great work!!!

Can this be used on Ubuntu, or must it be on Windows?

If so, could you please provide more information for non-Windows users?

Thanks once again

UNet training

Hi,

I trained a model on the public dataset but the result looks strange. Could you please give some tips for training?

Thanks.

I have one class and I set the classes param to 2

from segmentation_models_pytorch import Unet  # assuming the Unet class from segmentation_models_pytorch

unet_model = Unet(encoder_name="resnet50", encoder_weights="imagenet",
                  decoder_channels=(256, 128, 64, 32, 16),
                  in_channels=3, classes=2)

--width: 512
--height: 512
--epoch: 30
--batchsize: 2

dataset sample:
900 images
ISIC_0000000
ISIC_0000000_Segmentation

Result: (image omitted)

Results of a yolov5-v5 yolov5x model differ between the Python version and this project's TensorRT version?

As the title says, I wonder whether the author has run into this: my trained yolov5x model reaches 98% accuracy in the Python version, but after converting it to TensorRT the accuracy is only about 90%. The conversion log is as follows:
[09/27/2021-11:27:04] [I] Host Latency
[09/27/2021-11:27:04] [I] min: 11.3848 ms (end to end 21.3677 ms)
[09/27/2021-11:27:04] [I] max: 13.1256 ms (end to end 24.1753 ms)
[09/27/2021-11:27:04] [I] mean: 11.67 ms (end to end 21.9034 ms)
[09/27/2021-11:27:04] [I] median: 11.5836 ms (end to end 21.7285 ms)
[09/27/2021-11:27:04] [I] percentile: 12.5283 ms at 99% (end to end 23.6667 ms at 99%)
[09/27/2021-11:27:04] [I] throughput: 0 qps
[09/27/2021-11:27:04] [I] walltime: 3.03151 s
[09/27/2021-11:27:04] [I] Enqueue Time
[09/27/2021-11:27:04] [I] min: 1.04535 ms
[09/27/2021-11:27:04] [I] max: 4.6637 ms
[09/27/2021-11:27:04] [I] median: 1.61969 ms
[09/27/2021-11:27:04] [I] GPU Compute
[09/27/2021-11:27:04] [I] min: 10.8311 ms
[09/27/2021-11:27:04] [I] max: 12.5458 ms
[09/27/2021-11:27:04] [I] mean: 11.0955 ms
[09/27/2021-11:27:04] [I] median: 11.0142 ms
[09/27/2021-11:27:04] [I] percentile: 11.9821 ms at 99%
[09/27/2021-11:27:04] [I] total compute time: 3.01798 s
&&&& PASSED TensorRT.trtexec # trtexec.exe --onnx=best.onnx --saveEngine=best.engine --fp16

How do I run yolov5 INT8 quantization?

As the title says, I see from the source code that INT8 is supported and I want to try it, but I don't know the steps. Could you give the concrete steps? My attempts keep failing. Do I need to generate an INT8 model first, or how should this be done?

yolov5_detector.cpp errors out with multi-batch input

As the title says, when yolov5_detector.cpp runs inference with a batch size greater than 1:
(screenshot: multi-batch input to yolov5-detector)
it reports the following error:
(screenshot: yolov5-detector error)
What could be causing this?

yolov3-ocr.cfg does not have down_stride

Hi, I tried to convert a yolov3-spp .pt file to ONNX, and here is the error:

Traceback (most recent call last):
  File "Libtorch_yolo_to_onnx.py", line 779, in <module>
    main()
  File "Libtorch_yolo_to_onnx.py", line 771, in main
    model_def = builder.build_onnx_graph(
  File "Libtorch_yolo_to_onnx.py", line 353, in build_onnx_graph
    major_node_specs = self._make_onnx_node(layer_name, layer_dict)
  File "Libtorch_yolo_to_onnx.py", line 426, in _make_onnx_node
    node_creators[layer_type](layer_name, layer_dict)
  File "Libtorch_yolo_to_onnx.py", line 729, in _make_yolo_node
    down_stride = int(layer_dict['down_stride'])
KeyError: 'down_stride'

many thanks!

-Scott

how to run?

My Windows environment is:
cuda 11.0
vs2019
How do I run it?
When I run it, it tells me:
cannot find nvrtc64_111_0.dll

Which third-party libraries do I need to recompile?

Unet inference speed and GPU memory usage

As the title says, following your configuration I got unet.cpp running with an image size of 512x512 on a 3060, with mode = 2 (INT8), but detection takes about 77ms, far from the 16ms listed in this project's benchmark, and GPU memory usage is about 1.2 GB. Based on your experience, what might I still have configured incorrectly?

libtorch to onnx

Is there a demo that converts a LibTorch nn::Module trained model to an ONNX model? In Python it can be converted with torch.onnx.export, but LibTorch C++ cannot export it. Can you help me solve this issue?

yolov5 engine generation failed:

reading calib cache: E:\comm_Item\Item_done\onnx_tensorrt_pro\onnx_tensorrt_project-main\model\pytorch_onnx_tensorrt_yolov5\yolov5s.table
TensorRT was linked against cuDNN 8.1.0 but loaded cuDNN 8.0.2
Detected 1 inputs and 7 output network tensors.
TensorRT was linked against cuDNN 8.1.0 but loaded cuDNN 8.0.2
TensorRT was linked against cuDNN 8.1.0 but loaded cuDNN 8.0.2
Starting Calibration.
dog.jpg 0
Calibrated batch 0 in 1.35348 seconds.
person.jpg 1
Calibrated batch 1 in 1.3612 seconds.
Post Processing Calibration data in 0.0005131 seconds.
Calibration completed in 20.8498 seconds.
reading calib cache: E:\comm_Item\Item_done\onnx_tensorrt_pro\onnx_tensorrt_project-main\model\pytorch_onnx_tensorrt_yolov5\yolov5s.table
Writing Calibration Cache for calibrator: TRT-7203-MinMaxCalibration
writing calib cache: E:\comm_Item\Item_done\onnx_tensorrt_pro\onnx_tensorrt_project-main\model\pytorch_onnx_tensorrt_yolov5\yolov5s.table size: 4711
TensorRT was linked against cuDNN 8.1.0 but loaded cuDNN 8.0.2
C:\source\rtSafe\cuda\cudaConvolutionRunner.cpp (483) - Cudnn Error in nvinfer1::rt::cuda::CudnnConvolutionRunner::executeConv: 8 (CUDNN_STATUS_EXECUTION_FAILED)
C:\source\rtSafe\cuda\cudaConvolutionRunner.cpp (483) - Cudnn Error in nvinfer1::rt::cuda::CudnnConvolutionRunner::executeConv: 8 (CUDNN_STATUS_EXECUTION_FAILED)
[2021-08-08 11:28:54.729] [info] serialize engine to E:\comm_Item\Item_done\onnx_tensorrt_pro\onnx_tensorrt_project-main\model\pytorch_onnx_tensorrt_yolov5\yolov5s_fp32_batch_1.engine
[2021-08-08 11:28:54.730] [error] engine is empty, save engine failed
[2021-08-08 11:28:54.731] [info] create execute context and malloc device memory...
[2021-08-08 11:28:54.731] [info] init engine...
