Comments (22)
@makaveli10 I already converted the model to ONNX and ran inference with TensorRT.
However, this involved some operations different from what this repo does, and correspondingly some special handling on the TensorRT side. Overall, the TensorRT-accelerated speed is about 38 ms at a 1280x768 input resolution; the performance is quite good.
You can add my WeChat (jintianiloveu) if you are interested in this acceleration technique.
from yolov5.
Output should look like this. You might want to git pull; we've also made recent changes to the ONNX export.
...
%416 = Shape(%285)
%417 = Constant[value = <Scalar Tensor []>]()
%418 = Gather[axis = 0](%416, %417)
%419 = Shape(%285)
%420 = Constant[value = <Scalar Tensor []>]()
%421 = Gather[axis = 0](%419, %420)
%422 = Shape(%285)
%423 = Constant[value = <Scalar Tensor []>]()
%424 = Gather[axis = 0](%422, %423)
%427 = Unsqueeze[axes = [0]](%418)
%430 = Unsqueeze[axes = [0]](%421)
%431 = Unsqueeze[axes = [0]](%424)
%432 = Concat[axis = 0](%427, %439, %440, %430, %431)
%433 = Reshape(%285, %432)
%434 = Transpose[perm = [0, 1, 3, 4, 2]](%433)
return %output, %415, %434
}
Export complete. ONNX model saved to ./weights/yolov5s.onnx
View with https://github.com/lutzroeder/netron
@jinfagang you should be able to export to onnx like this. Run this command from the /yolov5 directory.
export PYTHONPATH="$PWD"
python models/onnx_export.py --weights ./weights/yolov5s.pt --img 640 640 --batch 1
@glenn-jocher It turns out my model was trained on GPU and serialized with a CUDA device, so when the input is not on CUDA it throws this error.
However, when I force the input to CUDA, I get the opposite error; it seems some code inside the model still uses CPU tensors.
Is there any special reason for using a CPU tensor there?
The generated ONNX model seems to have augmentation enabled by default.
How can I obtain boxes, scores, and classes from these outputs?
@jinfagang onnx export should only be done when the model is on cpu.
The netron image you show is correct. The boxes are part of the v5 architecture, they are not related to image augmentation during training.
At the moment onnx export stops at the output features. This is an example P3 output (smallest boxes) for 3 anchors with a grid size 40x24. The 85 features are xywh, objectness, and 80 class confidences.
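For readers asking how to turn these raw feature maps into boxes: below is a minimal numpy sketch of the decode described above, assuming yolov5's sigmoid-based box formulas, the P3 stride of 8, the default P3 anchors from the model yaml, and a random stand-in tensor (the real exported output would take the tensor's place):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Stand-in for the exported P3 output: (batch, anchors, grid_y, grid_x, 85)
na, ny, nx, no = 3, 24, 40, 85
stride = 8                                                            # P3 stride
anchors = np.array([[10, 13], [16, 30], [33, 23]], dtype=np.float32)  # default P3 anchors
raw = np.random.randn(1, na, ny, nx, no).astype(np.float32)

y = sigmoid(raw)
gy, gx = np.meshgrid(np.arange(ny), np.arange(nx), indexing="ij")
grid = np.stack((gx, gy), axis=-1).astype(np.float32)  # (ny, nx, 2), xy order

# yolov5 decode: xy = (2*sig - 0.5 + grid) * stride, wh = (2*sig)^2 * anchor
xy = (y[..., 0:2] * 2.0 - 0.5 + grid) * stride
wh = (y[..., 2:4] * 2.0) ** 2 * anchors[None, :, None, None, :]
scores = y[..., 4:5] * y[..., 5:]                       # objectness * class conf
boxes = np.concatenate((xy, wh), axis=-1).reshape(1, -1, 4)
scores = scores.reshape(1, -1, 80)
classes = scores.argmax(-1)                             # per-candidate class id
```

After this, the usual confidence threshold and NMS would be applied over the 3x24x40 = 2880 candidates (and the same for the P4/P5 outputs at their strides and anchors).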
@jinfagang I ran into a cuda issue with an onnx export today, and pushed a fix 1e2cb6b for this. This may or may not solve your original issue.
@glenn-jocher So the output is the same as yolov3 in your previous repo? I want to access the outputs and accelerate it with TensorRT.
@glenn-jocher Can the anchor decode process also be exported into ONNX? That would make it more end-to-end when transferring to other platforms for inference.
@jinfagang yes, this would be more useful. It is more complicated to implement though, especially if you want a clean onnx graph. We will try to add this in the future.
@glenn-jocher I ran a small experiment on this; it ends up involving a ScatterND op, which is hard to convert to other platforms. If we want to eliminate this op, the postprocess code (the Detect layer here) needs to be rewritten (only for export mode, in a more complicated way, but then it exports and works perfectly).
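As a sketch of the idea (illustrated in numpy rather than the repo's actual torch Detect code): the ScatterND typically comes from in-place slice assignment on a traced tensor, and it can be avoided by computing each decoded piece separately and concatenating once:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Stand-in decoded head output (batch, anchors, grid_y, grid_x, 85)
y = sigmoid(np.random.randn(1, 3, 24, 40, 85).astype(np.float32))
grid = np.zeros((24, 40, 2), dtype=np.float32)  # placeholder grid offsets
stride, anchor = 8.0, np.float32(16.0)          # illustrative values

# ScatterND-prone pattern (in torch tracing, in-place slice assignment):
#   y[..., 0:2] = (y[..., 0:2] * 2 - 0.5 + grid) * stride
#   y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * anchor

# Export-friendly pattern: compute pieces, then a single concatenation
xy = (y[..., 0:2] * 2.0 - 0.5 + grid) * stride
wh = (y[..., 2:4] * 2.0) ** 2 * anchor
out = np.concatenate((xy, wh, y[..., 4:]), axis=-1)
```

The concatenation form traces to plain Mul/Add/Concat nodes, which most inference backends support.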
@jinfagang I also ran into this issue. I resolved it by converting the model to CUDA and then saving the weights. I used those weights to convert to an ONNX model, but I ran into some issues converting the ONNX model to TensorRT.
If you successfully converted the model to TensorRT, please let me know how you did it.
Thanks
@jinfagang great work! What is the speedup compared to using detect.py? What GPU are you using?
@glenn-jocher I'm using a GTX 1080 Ti; the speed was tested on it. The measured time includes post-processing (from the engine forward pass to NMS, copying data back to the CPU, etc.). I think the speed is about the same as the darknet yolov4 converted to TensorRT (I previously tested with 800x800 input).
The speed can still be optimized further by moving all post-processing into CUDA kernels and using fp16 or int8 quantization.
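As a toy illustration of what the fp16 and int8 options do to a tensor (this is not TensorRT's actual calibrator, just a symmetric linear-quantization sketch in numpy):

```python
import numpy as np

x = np.random.randn(1000).astype(np.float32)

# fp16: a simple cast, roughly halving memory traffic per element
x_fp16 = x.astype(np.float16)

# int8: symmetric linear quantization, scale chosen from the max magnitude
scale = np.abs(x).max() / 127.0
x_int8 = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
x_deq = x_int8.astype(np.float32) * scale  # dequantized approximation
```

Real int8 deployment additionally needs a calibration dataset so the engine can pick per-tensor scales that preserve accuracy.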
@jinfagang amazing to see you got it running in such a short time. I'm able to convert the .pt files to ONNX format, but I keep getting this error when I try to convert to TensorRT 6:
(Unnamed Layer* 0) [Slice]: slice is out of input range
While parsing node number 9 [Slice]:
3
If you have some pointers for me, I would really appreciate it. Connecting on WeChat is difficult for me because I don't have an account, and I don't have a friend who can validate a new account.
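For context, the Slice nodes at the very top of the exported graph usually come from yolov5's Focus layer, which gathers every second pixel at four phase offsets and stacks them on the channel axis; a rough numpy equivalent (a sketch, not the repo's torch code):

```python
import numpy as np

# NCHW input, as the model would see it (random stand-in image)
x = np.random.rand(1, 3, 640, 640).astype(np.float32)

# Focus: four strided slices, one per pixel phase, concatenated on channels.
# Each strided slice becomes a Slice node in the ONNX graph, which the
# TensorRT 6 parser can reject with "slice is out of input range".
patches = [x[..., ::2, ::2], x[..., 1::2, ::2],
           x[..., ::2, 1::2], x[..., 1::2, 1::2]]
y = np.concatenate(patches, axis=1)  # (1, 12, 320, 320)
```

Knowing the op originates there narrows the workaround to handling this one layer on the TensorRT side.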
@jinfagang I don't have an account on WeChat, nor a friend who can verify a new account. Could you please share your code for ONNX inference on TensorRT somehow? I am getting incorrect outputs from the engine I generated from the ONNX model.
@makaveli10 mind sharing how you were able to generate an onnx model that worked with TensorRT? Also, which version of TRT did you use?
@aj-ames https://github.com/TrojanXu/yolov5-tensorrt
Let me know if you make any sort of progress please!
@makaveli10 thanks. I will update my findings here.
I use that project (https://github.com/TrojanXu/yolov5-tensorrt), but I encounter the same error when I try to convert to TensorRT 6:
[TensorRT] ERROR: (Unnamed Layer* 0) [Slice]: slice is out of input range
ERROR: Failed to parse the ONNX file.
If you have some pointers for me, I would really appreciate it.
When I convert the ONNX model to TensorRT, I encounter the same error with TensorRT 6:
[TensorRT] ERROR: (Unnamed Layer* 0) [Slice]: slice is out of input range
ERROR: Failed to parse the ONNX file.
I am using TensorRT 6.0 with onnx 1.5.0 and 1.6.0; neither works.
If you have some pointers for me, I would really appreciate it.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Related Issues (20)
- Exception: cannot instantiate 'WindowsPath' on your system. Cache may be out of date HOT 4
- Benefit of providing images larger than the training size? HOT 5
- How do I get Dice from val in the segmentation model? HOT 3
- Error loading self trained model HOT 4
- Image not found error HOT 1
- RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 16 but got size 32 for tensor number 1 in the list. HOT 1
- How to do instance segmention on video or streaming data HOT 2
- Multi-GPU train HOT 1
- No labels in D:\yolov5\datasets\img\train.cache. Can not train without labels HOT 2
- Manual Execution HOT 2
- Add ghost modules into tf.py for exporting yolov5s-ghost.pt to tensorflow saved_model or tflite HOT 2
- polygon annotation to object detection HOT 1
- FP16 inference with TensorRT reports an error when using python export.py --weights yolov5s.onnx --include engine --half --device 0 HOT 2
- The prediction of Yolov5 HOT 2
- yolo:latest image opencv waiting "xcb" code error? HOT 11
- Similar Dataloader in yolov5 HOT 3
- Regarding predictions of yolov5 HOT 5
- Example "detect.py" get somesthing wrong HOT 3
- Extremely low precision but high mAP HOT 2
- Can yolov5 use as a part of commercial project , if so do we need to open-source the code or the whole project ? HOT 8