
jetsonyolov5's Introduction

Running YoloV5 with TensorRT Engine on Jetson.

This repository contains a step-by-step guide to building a YoloV5 model and converting it into a TensorRT engine on Jetson. It has been tested on Jetson Nano and Jetson Xavier.

Please install Jetpack OS version 4.6 as specified by Nvidia, then follow the steps below. Please follow each step exactly as shown in the video links below:

Build YoloV5 TensorRT Engine on Jetson Nano: https://www.youtube.com/watch?v=ErWC3nBuV6k

Object Detection YoloV5 TensorRT Engine on Jetson Nano: https://www.youtube.com/watch?v=-Vu65N1NRWw

Jetson Xavier:

Install Libraries

Please install the libraries below:

$ sudo apt-get update
$ sudo apt-get install -y liblapack-dev libblas-dev gfortran libfreetype6-dev libopenblas-base libopenmpi-dev libjpeg-dev zlib1g-dev
$ sudo apt-get install -y python3-pip

Install Python packages

Numpy comes pre-installed with Jetpack, so make sure you uninstall it first and confirm that it is gone. Then install the packages below with pip3:

$ pip3 install numpy==1.19.0 pandas==0.22.0 Pillow==8.4.0 PyYAML==3.12 scipy==1.5.4 psutil tqdm==4.64.1 imutils

Install PyCuda

We first need to export a few paths:

$ export PATH=/usr/local/cuda-10.2/bin${PATH:+:${PATH}}
$ export LD_LIBRARY_PATH=/usr/local/cuda-10.2/lib64:$LD_LIBRARY_PATH
$ python3 -m pip install pycuda --user
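
To confirm that PyCuda built correctly against CUDA 10.2, a quick sanity check can be run (this one-liner is our addition, not part of the original guide):

$ python3 -c "import pycuda.driver as cuda; cuda.init(); print(cuda.Device(0).name())"

It should print the name of the Tegra GPU without raising an error.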

Install Seaborn

$ sudo apt install python3-seaborn

Install torch & torchvision

$ wget https://nvidia.box.com/shared/static/fjtbno0vpo676a25cgvuqc1wty0fkkg6.whl -O torch-1.10.0-cp36-cp36m-linux_aarch64.whl
$ pip3 install torch-1.10.0-cp36-cp36m-linux_aarch64.whl
$ git clone --branch v0.11.1 https://github.com/pytorch/vision torchvision
$ cd torchvision
$ sudo python3 setup.py install 

Not required, but a useful library:

$ sudo python3 -m pip install -U jetson-stats==3.1.4

This completes the installation of all the required libraries.


Generate wts file from pt file

Yolov5s.pt and Yolov5n.pt are already provided in the repo, but you can download any other version of the YoloV5 model if you want. Then run the command below to convert the .pt file into a .wts file:

$ cd JetsonYoloV5
$ python3 gen_wts.py -w yolov5s.pt -o yolov5s.wts

Make

Create a build directory inside yolov5. Copy the generated wts file into the build directory and run the commands below. If you are using a custom model, make sure to update kNumClass in yolov5/src/config.h first.

$ cd yolov5/
$ mkdir build
$ cd build
$ cp ../../yolov5s.wts .
$ cmake ..
$ make 

Build Engine file

$ ./yolov5_det -s yolov5s.wts yolov5s.engine s

The trailing s selects the model depth/width variant (n, s, m, l, x) and must match the weights you converted.

Testing Engine file

$ ./yolov5_det -d yolov5s.engine ../images

This runs inference on the images and saves the output in the build directory.


Python Object Detection

Use app.py to run inference on any video file or camera.

$ python3 app.py

If you have a custom model, make sure to update the categories list to match your classes in yoloDet.py.
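
For reference, here is a minimal sketch of what such a script can look like, assuming the repo's YoloTRT class and the default build output paths (the engine path and the video source below are placeholders; adjust them to your setup):

import cv2
from yoloDet import YoloTRT

# paths below follow the build steps above; adjust to your setup
model = YoloTRT(library="yolov5/build/libmyplugins.so",
                engine="yolov5/build/yolov5s.engine",
                conf=0.5, yolo_ver="v5")

cap = cv2.VideoCapture(0)  # 0 = first camera; pass a file path for a video
while True:
    ret, frame = cap.read()
    if not ret:
        break
    detections, t = model.Inference(frame)  # also draws boxes on frame
    for det in detections:
        print(det["class"], det["conf"], det["box"])
    cv2.imshow("Output", frame)
    if cv2.waitKey(1) == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()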


jetsonyolov5's Issues

[TensorRT] ERROR: INVALID_ARGUMENT: getPluginCreator could not find plugin YoloLayer_TRT version 1

I have followed the tutorial and it runs successfully with "./yolo_det -d custom_model.engine ../images", but when I try "python3 app.py" this error occurs:

[TensorRT] ERROR: INVALID_ARGUMENT: getPluginCreator could not find plugin YoloLayer_TRT version 1
[TensorRT] ERROR: safeDeserializationUtils.cpp (323) - Serialization Error in load: 0 (Cannot deserialize plugin since corresponding IPluginCreator not found in Plugin Registry)
[TensorRT] ERROR: INVALID_STATE: std::exception
[TensorRT] ERROR: INVALID_CONFIG: Deserialize the cuda engine failed.
Traceback (most recent call last):
  File "app.py", line 7, in <module>
    model = YoloTRT(library="yolov5/build/libmyplugins.so", engine="yolov5/build/license_plate_detection.engine", conf=0.5, yolo_ver="v5")
  File "/home/jetson/Documents/JetsonYolov5_backup/yoloDet.py", line 39, in __init__
    self.batch_size = self.engine.max_batch_size
AttributeError: 'NoneType' object has no attribute 'max_batch_size'
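
This error usually means the custom YoloLayer_TRT plugin was not registered in the process before the engine was deserialized, or the engine and plugin library were built against a different TensorRT version than the one loading them. The plugin lives in libmyplugins.so and must be loaded before deserialization (the yoloDet.py shown further down does this via ctypes.CDLL). A minimal sketch of the required ordering, assuming the default build output path:

import ctypes
import tensorrt as trt

# Load the plugin library BEFORE deserializing, so that YoloLayer_TRT
# is present in the plugin registry; the path below is an assumption.
ctypes.CDLL("yolov5/build/libmyplugins.so")

TRT_LOGGER = trt.Logger(trt.Logger.INFO)
trt.init_libnvinfer_plugins(TRT_LOGGER, "")  # also register TensorRT's built-in plugins
with open("custom_model.engine", "rb") as f:
    engine = trt.Runtime(TRT_LOGGER).deserialize_cuda_engine(f.read())
assert engine is not None, "check plugin path and that TensorRT versions match"

If deserialize_cuda_engine returns None, the AttributeError on max_batch_size shown above follows.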

Segmentation Fault while using ROS melodic for Inferencing using TensorRT on Jetson Nano

Hello everyone,

I am currently working on a project using a Jetson Nano where I am trying to perform inference using TensorRT (YoloV5 object detection) in a ROS (Melodic) node. I have encountered a Bus Error (core dumped) / Segmentation Fault (core dumped), which seems to arise from a memory conflict between ROS and TensorRT. I am seeking advice or solutions from anyone who might have faced and resolved a similar issue.

I have provided the code to my inference node below. I would greatly appreciate any insights, suggestions, or solutions.

Environment Details:

Jetpack Version: 4.6.
ROS Version: ROS Melodic
TensorRT Version: 8.2.1.8
PyCUDA: 2019.1.2
Torch version: 1.10.0
CUDA: 10.2

import cv2
import numpy as np
import tensorrt as trt
import pycuda.autoinit
import random
import ctypes
import pycuda.driver as cuda
import time

EXPLICIT_BATCH = 1 << (int)(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
host_inputs = []
cuda_inputs = []
host_outputs = []
cuda_outputs = []
bindings = []

class YoloTRT():
    def __init__(self, library, engine, conf, yolo_ver):
        self.CONF_THRESH = conf
        self.IOU_THRESHOLD = 0.4
        self.LEN_ALL_RESULT = 38001
        self.LEN_ONE_RESULT = 38
        self.yolo_version = yolo_ver
        self.categories = ["pedestrian", "people", "bicycle", "car", "van", "truck", "tricycle", "awning-tricycle", "bus", "motor"]
        '''
        self.categories = ["person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", "truck", "boat", "traffic light",
                           "fire hydrant", "stop sign", "parking meter", "bench", "bird", "cat", "dog", "horse", "sheep", "cow",
                           "elephant", "bear", "zebra", "giraffe", "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee",
                           "skis", "snowboard", "sports ball", "kite", "baseball bat", "baseball glove", "skateboard", "surfboard",
                           "tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana", "apple",
                           "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", "chair", "couch",
                           "potted plant", "bed", "dining table", "toilet", "tv", "laptop", "mouse", "remote", "keyboard", "cell phone",
                           "microwave", "oven", "toaster", "sink", "refrigerator", "book", "clock", "vase", "scissors", "teddy bear",
                           "hair drier", "toothbrush"]
        '''
        TRT_LOGGER = trt.Logger(trt.Logger.INFO)

        ctypes.CDLL(library)

        with open(engine, 'rb') as f:
            serialized_engine = f.read()

        runtime = trt.Runtime(TRT_LOGGER)
        self.engine = runtime.deserialize_cuda_engine(serialized_engine)
        self.batch_size = self.engine.max_batch_size

        for binding in self.engine:
            size = trt.volume(self.engine.get_binding_shape(binding)) * self.batch_size
            dtype = trt.nptype(self.engine.get_binding_dtype(binding))
            host_mem = cuda.pagelocked_empty(size, dtype)
            cuda_mem = cuda.mem_alloc(host_mem.nbytes)

            bindings.append(int(cuda_mem))
            if self.engine.binding_is_input(binding):
                self.input_w = self.engine.get_binding_shape(binding)[-1]
                self.input_h = self.engine.get_binding_shape(binding)[-2]
                host_inputs.append(host_mem)
                cuda_inputs.append(cuda_mem)
            else:
                host_outputs.append(host_mem)
                cuda_outputs.append(cuda_mem)

    def PreProcessImg(self, img):
        image_raw = img
        h, w, c = image_raw.shape
        image = cv2.cvtColor(image_raw, cv2.COLOR_BGR2RGB)
        r_w = self.input_w / w
        r_h = self.input_h / h
        if r_h > r_w:
            tw = self.input_w
            th = int(r_w * h)
            tx1 = tx2 = 0
            ty1 = int((self.input_h - th) / 2)
            ty2 = self.input_h - th - ty1
        else:
            tw = int(r_h * w)
            th = self.input_h
            tx1 = int((self.input_w - tw) / 2)
            tx2 = self.input_w - tw - tx1
            ty1 = ty2 = 0
        image = cv2.resize(image, (tw, th))
        image = cv2.copyMakeBorder(image, ty1, ty2, tx1, tx2, cv2.BORDER_CONSTANT, None, (128, 128, 128))
        image = image.astype(np.float32)
        image /= 255.0
        image = np.transpose(image, [2, 0, 1])
        image = np.expand_dims(image, axis=0)
        image = np.ascontiguousarray(image)
        return image, image_raw, h, w

    def Inference(self, img):
        input_image, image_raw, origin_h, origin_w = self.PreProcessImg(img)
        np.copyto(host_inputs[0], input_image.ravel())
        stream = cuda.Stream()
        self.context = self.engine.create_execution_context()
        cuda.memcpy_htod_async(cuda_inputs[0], host_inputs[0], stream)
        t1 = time.time()
        self.context.execute_async(self.batch_size, bindings, stream_handle=stream.handle)
        cuda.memcpy_dtoh_async(host_outputs[0], cuda_outputs[0], stream)
        stream.synchronize()
        t2 = time.time()
        output = host_outputs[0]

        for i in range(self.batch_size):
            result_boxes, result_scores, result_classid = self.PostProcess(output[i * self.LEN_ALL_RESULT: (i + 1) * self.LEN_ALL_RESULT], origin_h, origin_w)

        det_res = []
        for j in range(len(result_boxes)):
            box = result_boxes[j]
            det = dict()
            det["class"] = self.categories[int(result_classid[j])]
            det["conf"] = result_scores[j]
            det["box"] = box
            det_res.append(det)
            self.PlotBbox(box, img, label="{}:{:.2f}".format(self.categories[int(result_classid[j])], result_scores[j]),)
        return det_res, t2 - t1

    def PostProcess(self, output, origin_h, origin_w):
        num = int(output[0])
        if self.yolo_version == "v5":
            pred = np.reshape(output[1:], (-1, self.LEN_ONE_RESULT))[:num, :]
            pred = pred[:, :6]
        elif self.yolo_version == "v7":
            pred = np.reshape(output[1:], (-1, 6))[:num, :]

        boxes = self.NonMaxSuppression(pred, origin_h, origin_w, conf_thres=self.CONF_THRESH, nms_thres=self.IOU_THRESHOLD)
        result_boxes = boxes[:, :4] if len(boxes) else np.array([])
        result_scores = boxes[:, 4] if len(boxes) else np.array([])
        result_classid = boxes[:, 5] if len(boxes) else np.array([])
        return result_boxes, result_scores, result_classid

    def NonMaxSuppression(self, prediction, origin_h, origin_w, conf_thres=0.5, nms_thres=0.4):
        boxes = prediction[prediction[:, 4] >= conf_thres]
        boxes[:, :4] = self.xywh2xyxy(origin_h, origin_w, boxes[:, :4])
        boxes[:, 0] = np.clip(boxes[:, 0], 0, origin_w - 1)
        boxes[:, 2] = np.clip(boxes[:, 2], 0, origin_w - 1)
        boxes[:, 1] = np.clip(boxes[:, 1], 0, origin_h - 1)
        boxes[:, 3] = np.clip(boxes[:, 3], 0, origin_h - 1)
        confs = boxes[:, 4]
        boxes = boxes[np.argsort(-confs)]
        keep_boxes = []
        while boxes.shape[0]:
            large_overlap = self.bbox_iou(np.expand_dims(boxes[0, :4], 0), boxes[:, :4]) > nms_thres
            label_match = boxes[0, -1] == boxes[:, -1]
            # Indices of boxes with lower confidence scores, large IOUs and matching labels
            invalid = large_overlap & label_match
            keep_boxes += [boxes[0]]
            boxes = boxes[~invalid]
        boxes = np.stack(keep_boxes, 0) if len(keep_boxes) else np.array([])
        return boxes

    def xywh2xyxy(self, origin_h, origin_w, x):
        y = np.zeros_like(x)
        r_w = self.input_w / origin_w
        r_h = self.input_h / origin_h
        if r_h > r_w:
            y[:, 0] = x[:, 0] - x[:, 2] / 2
            y[:, 2] = x[:, 0] + x[:, 2] / 2
            y[:, 1] = x[:, 1] - x[:, 3] / 2 - (self.input_h - r_w * origin_h) / 2
            y[:, 3] = x[:, 1] + x[:, 3] / 2 - (self.input_h - r_w * origin_h) / 2
            y /= r_w
        else:
            y[:, 0] = x[:, 0] - x[:, 2] / 2 - (self.input_w - r_h * origin_w) / 2
            y[:, 2] = x[:, 0] + x[:, 2] / 2 - (self.input_w - r_h * origin_w) / 2
            y[:, 1] = x[:, 1] - x[:, 3] / 2
            y[:, 3] = x[:, 1] + x[:, 3] / 2
            y /= r_h
        return y

    def bbox_iou(self, box1, box2, x1y1x2y2=True):
        if not x1y1x2y2:
            # Transform from center and width to exact coordinates
            b1_x1, b1_x2 = box1[:, 0] - box1[:, 2] / 2, box1[:, 0] + box1[:, 2] / 2
            b1_y1, b1_y2 = box1[:, 1] - box1[:, 3] / 2, box1[:, 1] + box1[:, 3] / 2
            b2_x1, b2_x2 = box2[:, 0] - box2[:, 2] / 2, box2[:, 0] + box2[:, 2] / 2
            b2_y1, b2_y2 = box2[:, 1] - box2[:, 3] / 2, box2[:, 1] + box2[:, 3] / 2
        else:
            # Get the coordinates of bounding boxes
            b1_x1, b1_y1, b1_x2, b1_y2 = box1[:, 0], box1[:, 1], box1[:, 2], box1[:, 3]
            b2_x1, b2_y1, b2_x2, b2_y2 = box2[:, 0], box2[:, 1], box2[:, 2], box2[:, 3]

        inter_rect_x1 = np.maximum(b1_x1, b2_x1)
        inter_rect_y1 = np.maximum(b1_y1, b2_y1)
        inter_rect_x2 = np.minimum(b1_x2, b2_x2)
        inter_rect_y2 = np.minimum(b1_y2, b2_y2)
        inter_area = np.clip(inter_rect_x2 - inter_rect_x1 + 1, 0, None) * \
                     np.clip(inter_rect_y2 - inter_rect_y1 + 1, 0, None)
        b1_area = (b1_x2 - b1_x1 + 1) * (b1_y2 - b1_y1 + 1)
        b2_area = (b2_x2 - b2_x1 + 1) * (b2_y2 - b2_y1 + 1)

        iou = inter_area / (b1_area + b2_area - inter_area + 1e-16)

        return iou

    def PlotBbox(self, x, img, color=None, label=None, line_thickness=None):
        tl = (line_thickness or round(0.002 * (img.shape[0] + img.shape[1]) / 2) + 1)  # line/font thickness
        color = color or [random.randint(0, 255) for _ in range(3)]
        c1, c2 = (int(x[0]), int(x[1])), (int(x[2]), int(x[3]))
        cv2.rectangle(img, c1, c2, color, thickness=tl, lineType=cv2.LINE_AA)
        if label:
            tf = max(tl - 1, 1)  # font thickness
            t_size = cv2.getTextSize(label, 0, fontScale=tl / 3, thickness=tf)[0]
            c2 = c1[0] + t_size[0], c1[1] - t_size[1] - 3
            cv2.rectangle(img, c1, c2, color, -1, cv2.LINE_AA)  # filled
            cv2.putText(img, label, (c1[0], c1[1] - 2), 0, tl / 3, [225, 255, 255], thickness=tf, lineType=cv2.LINE_AA,)
I am using the following node (camera.py) to provide the input (both the node above and this node are in the same ROS package):
import sys
import cv2
import imutils
import time
from yoloDet import YoloTRT

# use path for library and engine file
model = YoloTRT(library="yolov5/build/libmyplugins.so", engine="yolov5/build/yolov5s_1504.engine", conf=0.5, yolo_ver="v5")

cap = cv2.VideoCapture("./DJI_trimmed.mp4")
frame_process_time = []

while True:
    ret, frame = cap.read()
    if not ret:
        print("Failed to read the frame or end of video reached")
        break
    frame = imutils.resize(frame, width=1504)
    detections, t = model.Inference(frame)
    for obj in detections:
        print(obj['class'], obj['conf'], obj['box'])
    print("FPS: {} sec".format(1/t))

    frame_process_time.append(t)  # Store the time taken for this frame
    cv2.imshow("Output", frame)
    key = cv2.waitKey(1)
    if key == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()

# Calculate Average FPS
if frame_process_time:
    average_time_per_frame = sum(frame_process_time) / len(frame_process_time)
    average_fps = 1.0 / average_time_per_frame
    print(f"Average FPS: {average_fps}")
else:
    print("No frames were processed.")

pycuda._driver.LogicError For camera stream

Traceback (most recent call last):
  File "app.py", line 32, in <module>
    detections, t = model.Inference(frame)
  File "/jetson-inference/data/networks/JetsonYolov5/yoloDet.py", line 93, in Inference
    stream = cuda.Stream()
pycuda._driver.LogicError: cuStreamCreate failed: context is destroyed

The model runs perfectly with a local file, but when I feed in my live camera I get the above error. Can someone guide me on this issue? I appreciate your help.
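
A common cause of "context is destroyed" is that the CUDA context created by pycuda.autoinit has been torn down, or is not current on the thread that calls Inference (camera pipelines often invoke callbacks from a different thread). Below is a minimal sketch of explicit context management that is often suggested for this class of error, assuming a single GPU and that the pycuda.autoinit import in yoloDet.py is replaced by this setup:

import pycuda.driver as cuda
from yoloDet import YoloTRT

cuda.init()
ctx = cuda.Device(0).make_context()  # explicit context instead of pycuda.autoinit

# construct the model while ctx is current so its buffers live in this context
model = YoloTRT(library="yolov5/build/libmyplugins.so",
                engine="yolov5/build/yolov5s.engine", conf=0.5, yolo_ver="v5")
ctx.pop()  # leave no context current; each caller pushes it explicitly

def run_inference(frame):
    ctx.push()  # make the context current on the calling thread
    try:
        return model.Inference(frame)  # cuda.Stream() now has a live context
    finally:
        ctx.pop()

# on shutdown, release the context with ctx.detach()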

Yolov5 Segmentation on Jetson

@mailrocketsystems I need to implement YoloV5 segmentation on a Jetson TX2. I followed your repo for YoloV5 detection on Jetson. For segmentation, I think I need to create a script similar to yoloDet.py. Are you working on that, or can you guide me on how to implement YoloV5 segmentation on Jetson?

for YOLOv5m model

Hi, so the repo was initially made for the YoloV5 small model, right? It works perfectly fine for the small model, but when we use it to generate weights for the YoloV5 Medium (M) model it gives many errors. Are there any parameters we should change before using it?

performance drastically drops over time

Hi,
Having followed the tutorial and all of the steps, I've managed to read the image from my USB camera on a Jetson Nano 4GB. However, the longer it runs, the slower it gets, until the app finally crashes (we're talking about ~10 seconds here). I'm using xrdp to run it and look at the output.
Are there any ways to optimize it, make it less laggy, and infer frames faster? I need it for person following.

Thanks in advance

How to get bounding box coordinates and set maximum detection?

Hi, I'm currently building a project for detecting balls with real-time object detection on the Jetson platform. The code I'm currently using works perfectly fine, but I need the bounding box coordinates and want to limit detection to a maximum of 3 balls. Can you help me with that? Thanks.
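
The Inference method in yoloDet.py (shown in full in the ROS issue above) already returns the coordinates: each entry of det_res carries "class", "conf" and "box". A short sketch of reading them and keeping at most the three most confident detections ("ball" is a placeholder for your custom model's class name):

detections, t = model.Inference(frame)

balls = [d for d in detections if d["class"] == "ball"]  # placeholder label
balls.sort(key=lambda d: d["conf"], reverse=True)  # most confident first
for det in balls[:3]:  # cap at a maximum of 3 detections
    x1, y1, x2, y2 = det["box"]  # xyxy pixel coordinates in the original frame
    print(det["class"], det["conf"], (x1, y1, x2, y2))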

Unable to convert wts file to engine file

For person and head detection only, I downloaded the CrowdHuman YoloV5 .pt file and successfully converted it to a .wts file, but while converting the .wts file to an engine file I got an error: core dumped (aborted). The .pt file I downloaded uses the medium architecture. I want to run the CrowdHuman weight file. Please give me some suggestions on how I can convert this file with this repo.
Thanks in advance

Cannot generate wts file for custom-trained yolov5n model

I was trying to convert a custom-trained model into a .wts file, but it is not working. I am using the command below:
python3 gen_wts.py -w yolov5s.pt -o yolov5s.wts
It shows the following error:

Traceback (most recent call last):
  File "gen_wts.py", line 43, in <module>
    delattr(model.model[-1], 'anchor_grid')  # model.model[-1] is detect layer
  File "/home/cnhi/models/TRT/TRT_env_2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1685, in __delattr__
    super().__delattr__(name)
AttributeError: anchor_grid

If we comment out the line
delattr(model.model[-1], 'anchor_grid')  # model.model[-1] is detect layer
it shows the next error:
Traceback (most recent call last):
  File "gen_wts.py", line 46, in <module>
    model.model[-1].register_buffer("strides", model.model[-1].stride)
  File "/home/cnhi/models/TRT/TRT_env_2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 523, in register_buffer
    raise KeyError("attribute '{}' already exists".format(name))
KeyError: "attribute 'strides' already exists"

If we comment out this line as well, the wts file is generated, but converting the wts to an engine then fails with a "missing strides" error.

But when I converted the pretrained COCO yolov5n model, it converted to wts and then to an engine and works fine.
