
nanonets_object_tracking's Introduction

Object Tracking

Installation: use the command below to install all the necessary packages. Note that we are using Python 3.

pip install -r requirements.txt

Link to the blog: click here

This module is built on top of the original deep sort module: https://github.com/nwojke/deep_sort. Since the primary objective is to track objects, we assume that the detections are already available for the given video. The det/ folder contains detections from YOLO, SSD, and Mask R-CNN for the given video.
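
For reference, here is a minimal sketch of how a det/ file could be parsed into the frame-indexed dictionary used in the overview below (the column layout is an assumption based on the MOTChallenge-style rows in det/; verify it against the file you use):

# Hedged sketch: load a det/*.txt file into {frame_id: {'boxes': [...], 'scores': [...]}}.
# Assumes rows of the form: frame, id, x, y, w, h, score, ...
def load_detections(det_path):
    gt_dict = {}
    with open(det_path) as f:
        for line in f:
            fields = line.strip().split(',')
            if len(fields) < 7:
                continue
            frame_id = int(fields[0])
            x, y, w, h = map(float, fields[2:6])
            score = float(fields[6])
            entry = gt_dict.setdefault(frame_id, {'boxes': [], 'scores': []})
            entry['boxes'].append([x, y, w, h])
            entry['scores'].append(score)
    return gt_dict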

deepsort.py is our bridge class that utilizes the original deep sort implementation with our custom configs. We simply need to specify the encoder (feature extractor) we want to use and pass the detection outputs to get the tracked bounding boxes. test_on_video.py is our example script, which runs deepsort on a video whose detection bounding boxes are already given.

A simplified overview:

# Initialize the deep sort object.
deepsort = deepsort_rbc(wt_path='ckpts/model640.pt')  # path to the feature extractor model

# Obtain all the detections for the given frame.
detections, out_scores = get_gt(frame, frame_id, gt_dict)

# Pass detections to the deepsort object and obtain the track information.
tracker, detections_class = deepsort.run_deep_sort(frame, out_scores, detections)

# Obtain info from the tracks.
for track in tracker.tracks:
    bbox = track.to_tlbr()          # get the corrected/predicted bounding box
    id_num = str(track.track_id)    # get the ID for the particular track
    features = track.features       # get the feature vector corresponding to the detection

The tracker object returned by deepsort contains all the necessary info, such as the track_id, the predicted bounding boxes, and the corresponding feature vector of the object.
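
As a usage example (a hedged sketch, not code from this repo), the track info above can be drawn onto each frame with OpenCV:

import cv2

# Hedged sketch: overlay tracked boxes and IDs on a frame.
# Assumes `frame` is a BGR numpy array and `tracker` comes from run_deep_sort as above.
for track in tracker.tracks:
    if not track.is_confirmed() or track.time_since_update > 1:
        continue  # skip tentative or stale tracks
    x1, y1, x2, y2 = map(int, track.to_tlbr())
    cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
    cv2.putText(frame, str(track.track_id), (x1, y1 - 5),
                cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)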

Download the test video from here.

The pre-trained weights of the feature extractor are in the ckpts/ folder. With the video downloaded and all packages installed correctly, you should be able to run the demo with

python test_on_video.py

If you want to train your own feature extractor, proceed to the next section.

Training a custom feature extractor

Since the original deep sort focused on the MARS dataset, which is based on people, the feature extractor is trained on humans. We need an equivalent feature extractor for vehicles, so we will train a Siamese network for this purpose. More info on Siamese nets can be found here and here.
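
For intuition, here is a minimal sketch of one Siamese training step with a contrastive loss. This is a generic PyTorch illustration, not the exact architecture or loss used in siamese_net.py / siamese_train.py:

import torch
import torch.nn.functional as F

# Hedged sketch of a contrastive-loss step for a Siamese embedder.
# `net` maps an image batch to embeddings; `label` is 1 for crops of the same vehicle, 0 otherwise.
def contrastive_step(net, optimizer, img_a, img_b, label, margin=2.0):
    emb_a, emb_b = net(img_a), net(img_b)
    dist = F.pairwise_distance(emb_a, emb_b)
    loss = torch.mean(label * dist.pow(2) +
                      (1 - label) * torch.clamp(margin - dist, min=0.0).pow(2))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()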

We have a training and testing set extracted from the NVIDIA AI City Challenge dataset. You can download it from here.

Extract the crops and crops_test folders into the working directory. Both folders have 184 different sub-folders, each of which contains crops of a certain vehicle shot from various views. Once the folders have been extracted, go through the network configuration and the various options in siamese_net.py and siamese_dataloader.py. If satisfied, start the training process with:

python siamese_train.py

The trained weights will be stored in the ckpts/ folder. Run python siamese_test.py to test the accuracy of the trained model. Once trained, this model can be plugged into our deepsort class instance.
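
For example (the checkpoint path below is the one shipped with this repo; substitute the path your own training run produced):

from deepsort import deepsort_rbc

# Point the bridge class at the trained feature extractor.
deepsort = deepsort_rbc(wt_path='ckpts/model640.pt')  # replace with your own checkpoint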

nanonets_object_tracking's People

Contributors

abhyantrika, dependabot[bot], viplix3


nanonets_object_tracking's Issues

Why did you choose a Siamese network for appearance vectors?

Hi, I really appreciate your code.
It helped me understand the deep SORT algorithm better!

I want to know why you chose a Siamese network for appearance feature extraction.

The original paper mentions that they used a wide residual network for the features.

As far as I know, the Siamese network is specialized for few-shot learning.

Thanks in advance.

How to train the deep sort model on a custom dataset?

@abhyantrika, hope you are doing great. I have a question about how to train the deep sort model using a custom dataset. Let's say I have 3 classes, namely car, motorbike, and bus.
I have a doubt about creating the folders here.

|__car
|    |__car_image1.jpg
|    |__car_image2.jpg
|    |__car_image3.jpg
|__bus
|    |__bus_image1.jpg
|    |__bus_image2.jpg
|    |__bus_image3.jpg
|__motorbike
|    |__motorbike_image1.jpg
|    |__motorbike_image2.jpg
|    |__motorbike_image3.jpg

I have another doubt: when I downloaded the example from this repository, under the crops folder there are 329 folders, and each folder contains samples of the same car taken from different angles. Here my question arises: do we need to create a folder for each and every car, for example a separate folder for a green car with the images of the green car inside it, and a separate folder for a red car with the images of the red car inside it? Should it be like the folder structure below?

|__car
|    |_green_car
|    |  |__green_car_image1.jpg
|    |  |__green_car_image2.jpg
|    |  |__green_image3.jpg
|    |_red_car
|    |  |__red_car_image1.jpg
|    |  |__red_car_image2.jpg
|    |  |__red_car_image3.jpg

Last question: what would be the minimum number of images needed for each class?
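
For context, based on the README section above (the crops/ folder has one sub-folder per individual vehicle, each holding crops of that vehicle from various views), the Siamese training data appears to be organised per object identity rather than per class. A hedged sketch of that layout (folder names are illustrative):

crops/
|__vehicle_0001            <- all crops of one specific car/bus/bike
|    |__crop1.jpg
|    |__crop2.jpg
|__vehicle_0002
|    |__crop1.jpg
|    |__crop2.jpg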

How to evaluate the accuracy of tracking?

Below is an example of a detection result, which is the input for deep_sort.
Later on, the second column will be changed into an object ID.

In some cases, the tracker fails (wrongly assigns an object ID).
It also often creates new IDs for reappeared objects.
What I need to know is how to evaluate the accuracy of the tracking so that we can compare this algorithm with other algorithms.

Any help would be appreciated.

1,-1,613.5829,88.5738,10.7212,10.5208,-1,-1,-1
1,-1,159.3612,412.3874,11.2697,9.9765,-1,-1,-1
1,-1,612.6691,227.6705,11.7314,9.7182,-1,-1,-1
1,-1,484.7934,39.8948,12.2918,10.8645,-1,-1,-1
1,-1,268.6557,315.8924,11.2140,10.8330,-1,-1,-1
..
2,-1,568.5021,400.1530,10.8805,10.7782,-1,-1,-1
2,-1,135.4661,29.4275,11.4167,9.2357,-1,-1,-1
2,-1,564.1170,315.1545,13.9052,9.7275,-1,-1,-1
2,-1,503.0762,435.7337,11.3504,9.8643,-1,-1,-1
2,-1,611.6464,111.1447,10.4282,12.9108,-1,-1,-1
..
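
One common way to quantify tracking accuracy (MOTA, IDF1, ID switches) is the py-motmetrics package. A hedged sketch, assuming you have per-frame ground-truth IDs/boxes and tracker IDs/boxes in your own data structures (ground_truth, tracker_output, and frames below are hypothetical names):

import motmetrics as mm

# Hedged sketch: accumulate per-frame matches between ground truth and tracker output.
# Boxes are numpy arrays of [x, y, w, h]; IDs are lists of integers.
acc = mm.MOTAccumulator(auto_id=True)
for frame_id in frames:                              # hypothetical list of frame indices
    gt_ids, gt_boxes = ground_truth[frame_id]        # hypothetical ground-truth store
    trk_ids, trk_boxes = tracker_output[frame_id]    # hypothetical tracker-output store
    dists = mm.distances.iou_matrix(gt_boxes, trk_boxes, max_iou=0.5)
    acc.update(gt_ids, trk_ids, dists)

mh = mm.metrics.create()
summary = mh.compute(acc, metrics=['mota', 'idf1', 'num_switches'], name='deepsort')
print(summary)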

The detections do not match the video

  1. I use the det_ssd512.txt detections and downloaded the video from the README.
  2. I found that the video has 211 (s) * 10 (FPS) = 2110 frames, but the file only contains 1995 frames.
  3. I drew the rectangles with cv2, but they did not line up with the cars.

MOT results are poor using the Siamese network

I tested deepsort + the Siamese network under MOT evaluation; here are the results:
[image: MOT evaluation results]
It hardly shows any improvement over the official deepsort results, so I am confused about whether I am using it properly.
I would much appreciate it if anyone could help me. Thank you in advance.

How do you use YOLO v4/v5/v6/v7 for detections?

Hi, I just wanted to ask: if I want to test the newer versions of YOLO, what is the format for conversion? The newer versions give results like:

as .txt

17 0.100259 0.460938 0.200517 0.199219
17 0.536223 0.521484 0.109961 0.222656

and as numpy array

array([[          0,         183,         318,         411,     0.95672,          17],
       [        433,         213,         596,         347,       0.954,          17]])
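
One possible conversion (a hedged sketch, not a utility provided by this repo): the newer YOLO .txt rows are normalized class, cx, cy, w, h, so they need to be scaled by the image size and converted to a top-left x, y, w, h box before writing rows in the style of the det/ files. The column layout assumed below should be checked against the det/ file you want to mimic:

# Hedged sketch: convert one normalized YOLO txt row into a det/-style row.
def yolo_row_to_det(frame_id, row, img_w, img_h, score=1.0):
    cls, cx, cy, w, h = row                  # normalized centre x/y and width/height
    x = (cx - w / 2.0) * img_w               # top-left x in pixels
    y = (cy - h / 2.0) * img_h               # top-left y in pixels
    return (f"{frame_id},-1,{x:.3f},{y:.3f},"
            f"{w * img_w:.3f},{h * img_h:.3f},{score:.3f},-1,-1,-1")

For the numpy-array form (x1, y1, x2, y2, score, class), the width and height are simply x2 - x1 and y2 - y1.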

How to pass detections from an object detector to deep sort

Hi,

I am working on a project where I need to track objects. The hurdle I am facing is how to adapt the detection code below to incorporate the deepsort tracking code by passing the detections.


import os
# comment out below line to enable tensorflow logging outputs
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
import time
import tensorflow as tf
physical_devices = tf.config.experimental.list_physical_devices('GPU')
if len(physical_devices) > 0:
    tf.config.experimental.set_memory_growth(physical_devices[0], True)
from absl import app, flags, logging
from absl.flags import FLAGS
import core.utils as utils
from core.yolov4 import filter_boxes
from tensorflow.python.saved_model import tag_constants
from core.config import cfg
from PIL import Image
import cv2
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.compat.v1 import ConfigProto
from tensorflow.compat.v1 import InteractiveSession
# deep sort imports
from deep_sort import preprocessing, nn_matching
from deep_sort.detection import Detection
from deep_sort.tracker import Tracker
from tools import generate_detections as gdet
flags.DEFINE_string('framework', 'tf', '(tf, tflite, trt)')
flags.DEFINE_string('weights', './checkpoints/yolov4-416',
                    'path to weights file')
flags.DEFINE_integer('size', 416, 'resize images to')
flags.DEFINE_boolean('tiny', False, 'yolo or yolo-tiny')
flags.DEFINE_string('model', 'yolov4', 'yolov3 or yolov4')
flags.DEFINE_string('video', './data/video/test.mp4', 'path to input video or set to 0 for webcam')
flags.DEFINE_string('output', None, 'path to output video')
flags.DEFINE_string('output_format', 'XVID', 'codec used in VideoWriter when saving video to file')
flags.DEFINE_float('iou', 0.45, 'iou threshold')
flags.DEFINE_float('score', 0.50, 'score threshold')
flags.DEFINE_boolean('dont_show', False, 'dont show video output')
flags.DEFINE_boolean('info', False, 'show detailed info of tracked objects')
flags.DEFINE_boolean('count', False, 'count objects being tracked on screen')

def main(_argv):
    # Definition of the parameters
    max_cosine_distance = 0.4
    nn_budget = None
    nms_max_overlap = 0.8
    
    # initialize deep sort
    model_filename = 'model_data/mars-small128.pb'
    encoder = gdet.create_box_encoder(model_filename, batch_size=1)
    # calculate cosine distance metric
    metric = nn_matching.NearestNeighborDistanceMetric("cosine", max_cosine_distance, nn_budget)
    # initialize tracker
    tracker = Tracker(metric)

    # load configuration for object detector
    config = ConfigProto()
    config.gpu_options.allow_growth = True
    session = InteractiveSession(config=config)
    STRIDES, ANCHORS, NUM_CLASS, XYSCALE = utils.load_config(FLAGS)
    input_size = FLAGS.size
    video_path = FLAGS.video

    # load tflite model if flag is set
    if FLAGS.framework == 'tflite':
        interpreter = tf.lite.Interpreter(model_path=FLAGS.weights)
        interpreter.allocate_tensors()
        input_details = interpreter.get_input_details()
        output_details = interpreter.get_output_details()
        print(input_details)
        print(output_details)
    # otherwise load standard tensorflow saved model
    else:
        saved_model_loaded = tf.saved_model.load(FLAGS.weights, tags=[tag_constants.SERVING])
        infer = saved_model_loaded.signatures['serving_default']

    # begin video capture
    try:
        vid = cv2.VideoCapture(int(video_path))
    except:
        vid = cv2.VideoCapture(video_path)

    out = None

    # get video ready to save locally if flag is set
    if FLAGS.output:
        # by default VideoCapture returns float instead of int
        width = int(vid.get(cv2.CAP_PROP_FRAME_WIDTH))
        height = int(vid.get(cv2.CAP_PROP_FRAME_HEIGHT))
        fps = int(vid.get(cv2.CAP_PROP_FPS))
        codec = cv2.VideoWriter_fourcc(*FLAGS.output_format)
        out = cv2.VideoWriter(FLAGS.output, codec, fps, (width, height))

    frame_num = 0
    # while video is running
    while True:
        return_value, frame = vid.read()
        if return_value:
            frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            image = Image.fromarray(frame)
        else:
            print('Video has ended or failed, try a different video format!')
            break
        frame_num +=1
        # print('Frame #: ', frame_num)
        frame_size = frame.shape[:2]
        image_data = cv2.resize(frame, (input_size, input_size))
        image_data = image_data / 255.
        image_data = image_data[np.newaxis, ...].astype(np.float32)
        start_time = time.time()

        # run detections on tflite if flag is set
        if FLAGS.framework == 'tflite':
            interpreter.set_tensor(input_details[0]['index'], image_data)
            interpreter.invoke()
            pred = [interpreter.get_tensor(output_details[i]['index']) for i in range(len(output_details))]
            # run detections using yolov3 if flag is set
            if FLAGS.model == 'yolov3' and FLAGS.tiny == True:
                boxes, pred_conf = filter_boxes(pred[1], pred[0], score_threshold=0.25,
                                                input_shape=tf.constant([input_size, input_size]))
            else:
                boxes, pred_conf = filter_boxes(pred[0], pred[1], score_threshold=0.25,
                                                input_shape=tf.constant([input_size, input_size]))
        else:
            batch_data = tf.constant(image_data)
            pred_bbox = infer(batch_data)
            for key, value in pred_bbox.items():
                boxes = value[:, :, 0:4]
                pred_conf = value[:, :, 4:]

        boxes, scores, classes, valid_detections = tf.image.combined_non_max_suppression(
            boxes=tf.reshape(boxes, (tf.shape(boxes)[0], -1, 1, 4)),
            scores=tf.reshape(
                pred_conf, (tf.shape(pred_conf)[0], -1, tf.shape(pred_conf)[-1])),
            max_output_size_per_class=50,
            max_total_size=50,
            iou_threshold=FLAGS.iou,
            score_threshold=FLAGS.score
        )

        # convert data to numpy arrays and slice out unused elements
        num_objects = valid_detections.numpy()[0]
        bboxes = boxes.numpy()[0]
        bboxes = bboxes[0:int(num_objects)]
        scores = scores.numpy()[0]
        scores = scores[0:int(num_objects)]
        classes = classes.numpy()[0]
        classes = classes[0:int(num_objects)]

        # format bounding boxes from normalized ymin, xmin, ymax, xmax ---> xmin, ymin, width, height
        original_h, original_w, _ = frame.shape
        bboxes = utils.format_boxes(bboxes, original_h, original_w)

        # store all predictions in one parameter for simplicity when calling functions
        pred_bbox = [bboxes, scores, classes, num_objects]

        # read in all class names from config
        class_names = utils.read_class_names(cfg.YOLO.CLASSES)

        # by default allow all classes in .names file
        allowed_classes = list(class_names.values())
        
        # custom allowed classes (uncomment line below to customize tracker for only people)
        #allowed_classes = ['person']

        # loop through objects and use class index to get class name, allow only classes in allowed_classes list
        names = []
        deleted_indx = []
        for i in range(num_objects):
            class_indx = int(classes[i])
            class_name = class_names[class_indx]
            if class_name not in allowed_classes:
                deleted_indx.append(i)
            else:
                names.append(class_name)
        names = np.array(names)
        count = len(names)
        if FLAGS.count:
            cv2.putText(frame, "Objects being tracked: {}".format(count), (5, 35), cv2.FONT_HERSHEY_COMPLEX_SMALL, 2, (0, 255, 0), 2)
            print("Objects being tracked: {}".format(count))
        # delete detections that are not in allowed_classes
        bboxes = np.delete(bboxes, deleted_indx, axis=0)
        scores = np.delete(scores, deleted_indx, axis=0)

        # encode yolo detections and feed to tracker
        features = encoder(frame, bboxes)
        detections = [Detection(bbox, score, class_name, feature) for bbox, score, class_name, feature in zip(bboxes, scores, names, features)]
        # run non-maxima supression
        boxs = np.array([d.tlwh for d in detections])
        scores = np.array([d.confidence for d in detections])
        classes = np.array([d.class_name for d in detections])
        indices = preprocessing.non_max_suppression(boxs, classes, nms_max_overlap, scores)
        detections = [detections[i] for i in indices]       
        # Call the tracker
        tracker.predict()
        tracker.update(detections)
        
        cmap = plt.get_cmap('tab20b')
        colors = [cmap(i)[:3] for i in np.linspace(0,1,20)]

        # update tracks
        for track in tracker.tracks:
            if not track.is_confirmed() or track.time_since_update > 1:
                continue 
            bbox = track.to_tlbr()
            class_name = track.get_class()
            #frame = dd(bbox,frame,track,class_name,frame_num)
                    
        # draw bbox on screen
            color = colors[int(track.track_id) % len(colors)]
            color = [i * 255 for i in color]
            cv2.rectangle(frame, (int(bbox[0]), int(bbox[1])), (int(bbox[2]), int(bbox[3])), color, 1)
            cv2.rectangle(frame, (int(bbox[0]), int(bbox[1]-30)), (int(bbox[0])+(len(class_name)+len(str(track.track_id)))*17, int(bbox[1])), color, -1)
            cv2.putText(frame, class_name + "-" + str(track.track_id),(int(bbox[0]), int(bbox[1]-10)),0, 0.75, (255,255,255),2)

            center = ((bbox[0]+abs(bbox[0]-bbox[2])/2), (bbox[1]+abs(bbox[1]-bbox[3])/2))
        # if enable info flag then print details about each track
            if FLAGS.info:
                print("Tracker ID: {}, Class: {},  BBox Coords (xmin, ymin, xmax, ymax): {}".format(str(track.track_id), class_name, (int(bbox[0]), int(bbox[1]), int(bbox[2]), int(bbox[3]))))


        # calculate frames per second of running detections
        fps = 1.0 / (time.time() - start_time)
        print("FPS: %.2f" % fps)
        result = np.asarray(frame)
        result = cv2.cvtColor(frame, cv2.COLOR_RGB2BGR)
        
        # if not FLAGS.dont_show:
            # cv2.imshow("Output Video", result)
        
        # if output flag is set, save video file
        # if FLAGS.output:
        #     out.write(result)
        if cv2.waitKey(1) & 0xFF == ord('q'): break
    cv2.destroyAllWindows()

if __name__ == '__main__':
    try:
        app.run(main)
    except SystemExit:
        pass

The way deepsort is incorporated here is different from what is in this repo.

I'd really appreciate any advice.

Thank you.
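
For reference, here is a hedged sketch of how the per-frame detections from a loop like the one above could instead be handed to this repo's deepsort_rbc bridge (parameter order follows the simplified overview in this README; the exact box format expected by run_deep_sort should be verified in deepsort.py):

from deepsort import deepsort_rbc

# Hedged sketch: use this repo's bridge class instead of the gdet encoder + Tracker pair.
deepsort = deepsort_rbc(wt_path='ckpts/model640.pt')   # feature extractor shipped with this repo

# Inside the per-frame loop, after NMS:
#   bboxes -> assumed [x, y, w, h] per detection
#   scores -> confidence per detection
tracker, detections_class = deepsort.run_deep_sort(frame, scores, bboxes)
for track in tracker.tracks:
    print(track.track_id, track.to_tlbr())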

Training time

Could you comment on how long it takes to train the model on a single GPU? For me it seems to take very long; in particular, the beginning of each epoch almost hangs for many minutes.
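
If the stall at the start of each epoch is data loading rather than the GPU, one general PyTorch mitigation (not specific to siamese_train.py, and the dataset/batch-size names below are placeholders) is to raise the DataLoader worker count:

from torch.utils.data import DataLoader

# Hedged sketch: parallel workers and pinned memory usually shorten the per-epoch warm-up.
loader = DataLoader(train_dataset, batch_size=32, shuffle=True,
                    num_workers=4, pin_memory=True)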

Video and detections do not match.

It seems like the video linked in the README.md (https://drive.google.com/open?id=1h2Wnb98tDVB6JlCDNQXCeZpG20x6AiZ2) does not match the detections in Nanonets_object_tracking/det/.

Each of the det_*.txt files contains 1955 frames, while the video consists of 2110 frames.
This is also confirmed visually (the bounding boxes are not where the cars actually are) when using either the given model640.pt or a self-trained feature extractor on the given data, and the program crashes when trying to process frame 1956 (for good reason).

Is there a new video, or what is going on here?
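
Until the mismatch is resolved, one hedged workaround is simply to stop once the detection file runs out of frames (the names below are hypothetical and depend on how you load the detections):

import cv2

# Hedged sketch: only process frames that actually have detections.
cap = cv2.VideoCapture('vdo.avi')            # hypothetical path to the test video
max_det_frame = max(gt_dict.keys())          # gt_dict: {frame_id: detections}, built by your loader
frame_id = 0
while True:
    ret, frame = cap.read()
    frame_id += 1
    if not ret or frame_id > max_det_frame:
        break
    # ... run deepsort on this frame ...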

No detection bug

In the case of "no detection":

if out_boxes == []:
    self.tracker.predict()
    print('No detections')
    trackers = self.tracker.tracks
    return trackers

And I got:

ValueError: not enough values to unpack (expected 2, got 1)

Can you handle this case? @abhyantrika
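
The unpacking error suggests the early return hands back a single value while the caller expects two, since the README overview unpacks tracker, detections_class from run_deep_sort. A hedged fix is to keep the return shape consistent in the no-detection branch:

# Hedged sketch: return the same (tracker, detections) pair as the normal code path.
if out_boxes == []:
    self.tracker.predict()
    print('No detections')
    return self.tracker, []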

Detection file generation and what does each column signify?

Hi, I have a question regarding the detection files from YOLO and Fast-RCNN. How were they generated?
For example, a row in det_yolo3.txt looks like
1,-1,1094.277,257.304,63.898,62.347,0.400,-1,-1,-1
What does -1 signify? I am guessing that "1" signifies the track ID and the floating-point numbers following it are the bounding-box position and the confidence score. Where are the frame number and the class?
I have trained my dataset on YOLO and was hoping to use this code for tracking. Your help in this regard is really appreciated.
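
For reference, the rows appear to follow the MOTChallenge detection format (this is an inference from the files in det/, not something documented in this repo), so the first column is the frame number, the second is a track ID placeholder, and the class is not stored at all:

# 1, -1, 1094.277, 257.304, 63.898, 62.347, 0.400, -1, -1, -1
# |   |      |        |        |       |      |      \__ world x, y, z (unused, -1)
# |   |      |        |        |       |      \__ detection confidence score
# |   |      |        |        |       \__ bounding-box height
# |   |      |        |        \__ bounding-box width
# |   |      |        \__ bounding-box top (y)
# |   |      \__ bounding-box left (x)
# |   \__ track ID (-1: detections are not yet associated with tracks)
# \__ frame number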

Worse results than the original repo

Hi, I have implemented and compared the results on vehicle tracking and counting, and I see that the original deep sort performs better, i.e., this one switches IDs during very short occlusions.
This repo re-trained the CNN to extract embeddings for vehicles, so it should perform better than the original (whose CNN was trained on person features).
Should I fine-tune any hyperparameters or something? Thank you.

ERROR: Could not find a version that satisfies the requirement torch==1.0.1.post2

I was trying to follow the README.md.

Used pip install -r requirements.txt and got:

ERROR: Could not find a version that satisfies the requirement torch==1.0.1.post2 (from -r requirements.txt (line 6)) (from versions: 0.1.2, 0.1.2.post1, 0.1.2.post2)
ERROR: No matching distribution found for torch==1.0.1.post2 (from -r requirements.txt (line 6))

Any thoughts/solutions?
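
One hedged workaround (not an official fix): this error usually means the pinned torch==1.0.1.post2 wheel is not published for your Python version. Either use an older Python interpreter for which that wheel exists, or loosen the pin in requirements.txt and install a torch build available for your environment, accepting that a newer torch may need small code adjustments:

pip install torch torchvision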

Running on a custom dataset with no multiple views

Hi, I am currently trying to track grape bunches along a vineyard row. I have a dataset for training a Mask R-CNN model, so my idea was to just crop the grape bunches from my dataset and feed them to the Siamese network. The problem is that for every bunch I have a single camera view instead of multiple ones.
Do you think the deep SORT method is not useful in my case, or can I apply deep SORT with your implementation anyway?

How to generate the detections text file?

Hi, can anyone suggest how to generate the detections text file? I used detectron2 but am not sure how to output those detections to a text file. Any help would be appreciated.
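
A hedged sketch for detectron2 that writes one MOT-style row per detection per frame (assumes predictor is an already-built detectron2 DefaultPredictor and that the column convention matches the files in det/; verify both):

import cv2

# Hedged sketch: dump detectron2 detections for a video into a det/-style text file.
cap = cv2.VideoCapture('my_video.mp4')        # hypothetical input video
frame_id = 0
with open('det_custom.txt', 'w') as f:
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        frame_id += 1
        inst = predictor(frame)["instances"].to("cpu")
        boxes = inst.pred_boxes.tensor.numpy()    # (N, 4) boxes as x1, y1, x2, y2
        scores = inst.scores.numpy()
        for (x1, y1, x2, y2), s in zip(boxes, scores):
            f.write(f"{frame_id},-1,{x1:.3f},{y1:.3f},"
                    f"{x2 - x1:.3f},{y2 - y1:.3f},{s:.3f},-1,-1,-1\n")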

Testing on a Faster R-CNN model

I have trained a Faster R-CNN model to detect cycle riders in a frame.
I want to assign a unique ID to each rider, track their movement across frames, and increase a count once they cross an assigned finishing line.
Can someone please let me know how I can use this repository for my use case?

Thanks in advance!
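
A hedged sketch of the counting part, assuming a horizontal finishing line at pixel row line_y and tracks coming from this repo's tracker object (all names and the example value are hypothetical):

# Hedged sketch: count each track ID once when its box centre crosses a horizontal line.
line_y = 400                       # y-coordinate of the finishing line (example value)
counted_ids = set()                # initialise once, before the frame loop
last_cy = {}                       # track_id -> centre y from the previous frame

# Inside the per-frame loop, after run_deep_sort():
for track in tracker.tracks:
    x1, y1, x2, y2 = track.to_tlbr()
    cy = (y1 + y2) / 2.0
    prev = last_cy.get(track.track_id)
    if prev is not None and prev < line_y <= cy and track.track_id not in counted_ids:
        counted_ids.add(track.track_id)        # this rider has crossed the line
    last_cy[track.track_id] = cy

count = len(counted_ids)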

Version compatibility issue

Hi @abhyantrika, I have tried to install from the requirements.txt file, but unfortunately I am unable to install the Python packages; I get the issue below. Could you please tell me the exact versions (Python and other Python modules) that you used?
[image: pip installation error]
