
nanonets_object_tracking's Introduction

Object Tracking

Installation: use the command below to install all the necessary packages. Note that we are using Python 3.

pip install -r requirements.txt

Link to the blog: click here

This module is built on top of the original deep sort module: https://github.com/nwojke/deep_sort. Since the primary objective is to track objects, we assume that the detections are already available for the given video. The det/ folder contains detections from YOLO, SSD, and Mask R-CNN for the given video.
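
For reference, here is a minimal sketch of how a det/ file could be parsed into the frame-indexed dictionary used in the overview below (the column layout is an assumption based on the MOTChallenge-style rows in det/; verify it against the file you use):

# Hedged sketch: load a det/*.txt file into {frame_id: {'boxes': [...], 'scores': [...]}}.
# Assumes rows of the form: frame, id, x, y, w, h, score, ...
def load_detections(det_path):
    gt_dict = {}
    with open(det_path) as f:
        for line in f:
            fields = line.strip().split(',')
            if len(fields) < 7:
                continue
            frame_id = int(fields[0])
            x, y, w, h = map(float, fields[2:6])
            score = float(fields[6])
            entry = gt_dict.setdefault(frame_id, {'boxes': [], 'scores': []})
            entry['boxes'].append([x, y, w, h])
            entry['scores'].append(score)
    return gt_dict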

deepsort.py is our bridge class that utilizes the original deep sort implementation with our custom configs. We simply need to specify the encoder (feature extractor) we want to use and pass the detection outputs to get the tracked bounding boxes. test_on_video.py is our example script, which runs deepsort on a video whose detection bounding boxes are already given.

A simplified overview:

# Initialize the deep sort object.
deepsort = deepsort_rbc(wt_path='ckpts/model640.pt')  # path to the feature extractor model

# Obtain all the detections for the given frame.
detections, out_scores = get_gt(frame, frame_id, gt_dict)

# Pass detections to the deepsort object and obtain the track information.
tracker, detections_class = deepsort.run_deep_sort(frame, out_scores, detections)

# Obtain info from the tracks.
for track in tracker.tracks:
    bbox = track.to_tlbr()          # get the corrected/predicted bounding box
    id_num = str(track.track_id)    # get the ID for the particular track
    features = track.features       # get the feature vector corresponding to the detection

The tracker object returned by deepsort contains all the necessary info, such as the track_id, the predicted bounding boxes, and the corresponding feature vector of the object.
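
As a usage example (a hedged sketch, not code from this repo), the track info above can be drawn onto each frame with OpenCV:

import cv2

# Hedged sketch: overlay tracked boxes and IDs on a frame.
# Assumes `frame` is a BGR numpy array and `tracker` comes from run_deep_sort as above.
for track in tracker.tracks:
    if not track.is_confirmed() or track.time_since_update > 1:
        continue  # skip tentative or stale tracks
    x1, y1, x2, y2 = map(int, track.to_tlbr())
    cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
    cv2.putText(frame, str(track.track_id), (x1, y1 - 5),
                cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)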

Download the test video from here.

The pre-trained weights of the feature extractor are in the ckpts/ folder. With the video downloaded and all packages installed correctly, you should be able to run the demo with

python test_on_video.py

If you want to train your own feature extractor, proceed to the next section.

Training a custom feature extractor

Since the original deep sort focused on the MARS dataset, which is based on people, the feature extractor is trained on humans. We need an equivalent feature extractor for vehicles, so we will train a Siamese network for this purpose. More info on Siamese nets can be found here and here.
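
For intuition, here is a minimal sketch of one Siamese training step with a contrastive loss. This is a generic PyTorch illustration, not the exact architecture or loss used in siamese_net.py / siamese_train.py:

import torch
import torch.nn.functional as F

# Hedged sketch of a contrastive-loss step for a Siamese embedder.
# `net` maps an image batch to embeddings; `label` is 1 for crops of the same vehicle, 0 otherwise.
def contrastive_step(net, optimizer, img_a, img_b, label, margin=2.0):
    emb_a, emb_b = net(img_a), net(img_b)
    dist = F.pairwise_distance(emb_a, emb_b)
    loss = torch.mean(label * dist.pow(2) +
                      (1 - label) * torch.clamp(margin - dist, min=0.0).pow(2))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()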

We have a training and testing set extracted from the NVIDIA AI City Challenge dataset. You can download it from here.

Extract the crops and crops_test folders into the working directory. Both folders have 184 different sub-folders, each of which contains crops of a certain vehicle shot from various views. Once the folders have been extracted, go through the network configuration and the various options in siamese_net.py and siamese_dataloader.py. If satisfied, start the training process with:

python siamese_train.py

The trained weights will be stored in the ckpts/ folder. Run python siamese_test.py to test the accuracy of the trained model. Once trained, this model can be plugged into our deepsort class instance.
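
For example (the checkpoint path below is the one shipped with this repo; substitute the path your own training run produced):

from deepsort import deepsort_rbc

# Point the bridge class at the trained feature extractor.
deepsort = deepsort_rbc(wt_path='ckpts/model640.pt')  # replace with your own checkpoint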

nanonets_object_tracking's People

Contributors

abhyantrika, dependabot[bot], viplix3


nanonets_object_tracking's Issues

Why did you choose a Siamese network for appearance vectors?

Hi, I really appreciate your code.
It helped me understand the deep SORT algorithm better!

I want to know why you chose a Siamese network for appearance feature extraction.

The original paper mentions that they used a wide residual network for the features.

As far as I know, the Siamese network is specialized for few-shot learning.

Thanks in advance.

How to train the deep sort model on a custom dataset?

@abhyantrika, hope you are doing great. I have a question about how to train the deep sort model using a custom dataset. Let's say I have 3 classes, namely car, motorbike, and bus.
I have a doubt about creating the folders here.

|__car
|    |__car_image1.jpg
|    |__car_image2.jpg
|    |__car_image3.jpg
|__bus
|    |__bus_image1.jpg
|    |__bus_image2.jpg
|    |__bus_image3.jpg
|__motorbike
|    |__motorbike_image1.jpg
|    |__motorbike_image2.jpg
|    |__motorbike_image3.jpg

I have another doubt: when I downloaded the example from this repository, under the crops folder there are 329 folders, and each folder contains samples of the same car taken from different angles. Here my question arises: do we need to create a folder for each and every car, for example a separate folder for a green car with the images of the green car inside it, and a separate folder for a red car with the images of the red car inside it? Should it be like the folder structure below?

|__car
|    |_green_car
|    |  |__green_car_image1.jpg
|    |  |__green_car_image2.jpg
|    |  |__green_image3.jpg
|    |_red_car
|    |  |__red_car_image1.jpg
|    |  |__red_car_image2.jpg
|    |  |__red_car_image3.jpg

Last question: what would be the minimum number of images needed for each class?
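
For context, based on the README section above (the crops/ folder has one sub-folder per individual vehicle, each holding crops of that vehicle from various views), the Siamese training data appears to be organised per object identity rather than per class. A hedged sketch of that layout (folder names are illustrative):

crops/
|__vehicle_0001            <- all crops of one specific car/bus/bike
|    |__crop1.jpg
|    |__crop2.jpg
|__vehicle_0002
|    |__crop1.jpg
|    |__crop2.jpg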

How to evaluate the accuracy of tracking?

Below is an example of a detection result, which is the input for deep_sort.
Later on, the second column will be changed into an object ID.

In some cases, the tracker fails (wrongly assigns an object ID).
It also often creates new IDs for reappeared objects.
What I need to know is how to evaluate the accuracy of the tracking so that we can compare this algorithm with other algorithms.

Any help would be appreciated.

1,-1,613.5829,88.5738,10.7212,10.5208,-1,-1,-1
1,-1,159.3612,412.3874,11.2697,9.9765,-1,-1,-1
1,-1,612.6691,227.6705,11.7314,9.7182,-1,-1,-1
1,-1,484.7934,39.8948,12.2918,10.8645,-1,-1,-1
1,-1,268.6557,315.8924,11.2140,10.8330,-1,-1,-1
..
2,-1,568.5021,400.1530,10.8805,10.7782,-1,-1,-1
2,-1,135.4661,29.4275,11.4167,9.2357,-1,-1,-1
2,-1,564.1170,315.1545,13.9052,9.7275,-1,-1,-1
2,-1,503.0762,435.7337,11.3504,9.8643,-1,-1,-1
2,-1,611.6464,111.1447,10.4282,12.9108,-1,-1,-1
..
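
One common way to quantify tracking accuracy (MOTA, IDF1, ID switches) is the py-motmetrics package. A hedged sketch, assuming you have per-frame ground-truth IDs/boxes and tracker IDs/boxes in your own data structures (ground_truth, tracker_output, and frames below are hypothetical names):

import motmetrics as mm

# Hedged sketch: accumulate per-frame matches between ground truth and tracker output.
# Boxes are numpy arrays of [x, y, w, h]; IDs are lists of integers.
acc = mm.MOTAccumulator(auto_id=True)
for frame_id in frames:                              # hypothetical list of frame indices
    gt_ids, gt_boxes = ground_truth[frame_id]        # hypothetical ground-truth store
    trk_ids, trk_boxes = tracker_output[frame_id]    # hypothetical tracker-output store
    dists = mm.distances.iou_matrix(gt_boxes, trk_boxes, max_iou=0.5)
    acc.update(gt_ids, trk_ids, dists)

mh = mm.metrics.create()
summary = mh.compute(acc, metrics=['mota', 'idf1', 'num_switches'], name='deepsort')
print(summary)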

The detections do not match the video

  1. I use the det_ssd512.txt detections and downloaded the video from the README.
  2. I found that the video has 211 (s) * 10 (FPS) = 2110 frames, but the file only contains 1995 frames.
  3. I drew the rectangles with cv2, but they did not line up with the cars.

MOT results are poor using the Siamese network

I tested deepsort + the Siamese network under MOT evaluation; here are the results:
[image: MOT evaluation results]
It hardly shows any improvement over the official deepsort results, so I am confused about whether I am using it properly.
I would much appreciate it if anyone could help me. Thank you in advance.

How do you use YOLO v4/v5/v6/v7 for detections?

Hi, I just wanted to ask: if I want to test the newer versions of YOLO, what is the format for conversion? The newer versions give results like:

as .txt

17 0.100259 0.460938 0.200517 0.199219
17 0.536223 0.521484 0.109961 0.222656

and as numpy array

array([[          0,         183,         318,         411,     0.95672,          17],
       [        433,         213,         596,         347,       0.954,          17]])
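
One possible conversion (a hedged sketch, not a utility provided by this repo): the newer YOLO .txt rows are normalized class, cx, cy, w, h, so they need to be scaled by the image size and converted to a top-left x, y, w, h box before writing rows in the style of the det/ files. The column layout assumed below should be checked against the det/ file you want to mimic:

# Hedged sketch: convert one normalized YOLO txt row into a det/-style row.
def yolo_row_to_det(frame_id, row, img_w, img_h, score=1.0):
    cls, cx, cy, w, h = row                  # normalized centre x/y and width/height
    x = (cx - w / 2.0) * img_w               # top-left x in pixels
    y = (cy - h / 2.0) * img_h               # top-left y in pixels
    return (f"{frame_id},-1,{x:.3f},{y:.3f},"
            f"{w * img_w:.3f},{h * img_h:.3f},{score:.3f},-1,-1,-1")

For the numpy-array form (x1, y1, x2, y2, score, class), the width and height are simply x2 - x1 and y2 - y1.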

How to pass detections from an object detector to deep sort

Hi,

I am working on a project where I need to track objects. The hurdle I am facing is how to adapt the detection code below to incorporate the deepsort tracking code by passing the detections.


import os
# comment out below line to enable tensorflow logging outputs
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
import time
import tensorflow as tf
physical_devices = tf.config.experimental.list_physical_devices('GPU')
if len(physical_devices) > 0:
    tf.config.experimental.set_memory_growth(physical_devices[0], True)
from absl import app, flags, logging
from absl.flags import FLAGS
import core.utils as utils
from core.yolov4 import filter_boxes
from tensorflow.python.saved_model import tag_constants
from core.config import cfg
from PIL import Image
import cv2
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.compat.v1 import ConfigProto
from tensorflow.compat.v1 import InteractiveSession
# deep sort imports
from deep_sort import preprocessing, nn_matching
from deep_sort.detection import Detection
from deep_sort.tracker import Tracker
from tools import generate_detections as gdet
flags.DEFINE_string('framework', 'tf', '(tf, tflite, trt)')
flags.DEFINE_string('weights', './checkpoints/yolov4-416',
                    'path to weights file')
flags.DEFINE_integer('size', 416, 'resize images to')
flags.DEFINE_boolean('tiny', False, 'yolo or yolo-tiny')
flags.DEFINE_string('model', 'yolov4', 'yolov3 or yolov4')
flags.DEFINE_string('video', './data/video/test.mp4', 'path to input video or set to 0 for webcam')
flags.DEFINE_string('output', None, 'path to output video')
flags.DEFINE_string('output_format', 'XVID', 'codec used in VideoWriter when saving video to file')
flags.DEFINE_float('iou', 0.45, 'iou threshold')
flags.DEFINE_float('score', 0.50, 'score threshold')
flags.DEFINE_boolean('dont_show', False, 'dont show video output')
flags.DEFINE_boolean('info', False, 'show detailed info of tracked objects')
flags.DEFINE_boolean('count', False, 'count objects being tracked on screen')

def main(_argv):
    # Definition of the parameters
    max_cosine_distance = 0.4
    nn_budget = None
    nms_max_overlap = 0.8
    
    # initialize deep sort
    model_filename = 'model_data/mars-small128.pb'
    encoder = gdet.create_box_encoder(model_filename, batch_size=1)
    # calculate cosine distance metric
    metric = nn_matching.NearestNeighborDistanceMetric("cosine", max_cosine_distance, nn_budget)
    # initialize tracker
    tracker = Tracker(metric)

    # load configuration for object detector
    config = ConfigProto()
    config.gpu_options.allow_growth = True
    session = InteractiveSession(config=config)
    STRIDES, ANCHORS, NUM_CLASS, XYSCALE = utils.load_config(FLAGS)
    input_size = FLAGS.size
    video_path = FLAGS.video

    # load tflite model if flag is set
    if FLAGS.framework == 'tflite':
        interpreter = tf.lite.Interpreter(model_path=FLAGS.weights)
        interpreter.allocate_tensors()
        input_details = interpreter.get_input_details()
        output_details = interpreter.get_output_details()
        print(input_details)
        print(output_details)
    # otherwise load standard tensorflow saved model
    else:
        saved_model_loaded = tf.saved_model.load(FLAGS.weights, tags=[tag_constants.SERVING])
        infer = saved_model_loaded.signatures['serving_default']

    # begin video capture
    try:
        vid = cv2.VideoCapture(int(video_path))
    except:
        vid = cv2.VideoCapture(video_path)

    out = None

    # get video ready to save locally if flag is set
    if FLAGS.output:
        # by default VideoCapture returns float instead of int
        width = int(vid.get(cv2.CAP_PROP_FRAME_WIDTH))
        height = int(vid.get(cv2.CAP_PROP_FRAME_HEIGHT))
        fps = int(vid.get(cv2.CAP_PROP_FPS))
        codec = cv2.VideoWriter_fourcc(*FLAGS.output_format)
        out = cv2.VideoWriter(FLAGS.output, codec, fps, (width, height))

    frame_num = 0
    # while video is running
    while True:
        return_value, frame = vid.read()
        if return_value:
            frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            image = Image.fromarray(frame)
        else:
            print('Video has ended or failed, try a different video format!')
            break
        frame_num +=1
        # print('Frame #: ', frame_num)
        frame_size = frame.shape[:2]
        image_data = cv2.resize(frame, (input_size, input_size))
        image_data = image_data / 255.
        image_data = image_data[np.newaxis, ...].astype(np.float32)
        start_time = time.time()

        # run detections on tflite if flag is set
        if FLAGS.framework == 'tflite':
            interpreter.set_tensor(input_details[0]['index'], image_data)
            interpreter.invoke()
            pred = [interpreter.get_tensor(output_details[i]['index']) for i in range(len(output_details))]
            # run detections using yolov3 if flag is set
            if FLAGS.model == 'yolov3' and FLAGS.tiny == True:
                boxes, pred_conf = filter_boxes(pred[1], pred[0], score_threshold=0.25,
                                                input_shape=tf.constant([input_size, input_size]))
            else:
                boxes, pred_conf = filter_boxes(pred[0], pred[1], score_threshold=0.25,
                                                input_shape=tf.constant([input_size, input_size]))
        else:
            batch_data = tf.constant(image_data)
            pred_bbox = infer(batch_data)
            for key, value in pred_bbox.items():
                boxes = value[:, :, 0:4]
                pred_conf = value[:, :, 4:]

        boxes, scores, classes, valid_detections = tf.image.combined_non_max_suppression(
            boxes=tf.reshape(boxes, (tf.shape(boxes)[0], -1, 1, 4)),
            scores=tf.reshape(
                pred_conf, (tf.shape(pred_conf)[0], -1, tf.shape(pred_conf)[-1])),
            max_output_size_per_class=50,
            max_total_size=50,
            iou_threshold=FLAGS.iou,
            score_threshold=FLAGS.score
        )

        # convert data to numpy arrays and slice out unused elements
        num_objects = valid_detections.numpy()[0]
        bboxes = boxes.numpy()[0]
        bboxes = bboxes[0:int(num_objects)]
        scores = scores.numpy()[0]
        scores = scores[0:int(num_objects)]
        classes = classes.numpy()[0]
        classes = classes[0:int(num_objects)]

        # format bounding boxes from normalized ymin, xmin, ymax, xmax ---> xmin, ymin, width, height
        original_h, original_w, _ = frame.shape
        bboxes = utils.format_boxes(bboxes, original_h, original_w)

        # store all predictions in one parameter for simplicity when calling functions
        pred_bbox = [bboxes, scores, classes, num_objects]

        # read in all class names from config
        class_names = utils.read_class_names(cfg.YOLO.CLASSES)

        # by default allow all classes in .names file
        allowed_classes = list(class_names.values())
        
        # custom allowed classes (uncomment line below to customize tracker for only people)
        #allowed_classes = ['person']

        # loop through objects and use class index to get class name, allow only classes in allowed_classes list
        names = []
        deleted_indx = []
        for i in range(num_objects):
            class_indx = int(classes[i])
            class_name = class_names[class_indx]
            if class_name not in allowed_classes:
                deleted_indx.append(i)
            else:
                names.append(class_name)
        names = np.array(names)
        count = len(names)
        if FLAGS.count:
            cv2.putText(frame, "Objects being tracked: {}".format(count), (5, 35), cv2.FONT_HERSHEY_COMPLEX_SMALL, 2, (0, 255, 0), 2)
            print("Objects being tracked: {}".format(count))
        # delete detections that are not in allowed_classes
        bboxes = np.delete(bboxes, deleted_indx, axis=0)
        scores = np.delete(scores, deleted_indx, axis=0)

        # encode yolo detections and feed to tracker
        features = encoder(frame, bboxes)
        detections = [Detection(bbox, score, class_name, feature) for bbox, score, class_name, feature in zip(bboxes, scores, names, features)]
        # run non-maxima supression
        boxs = np.array([d.tlwh for d in detections])
        scores = np.array([d.confidence for d in detections])
        classes = np.array([d.class_name for d in detections])
        indices = preprocessing.non_max_suppression(boxs, classes, nms_max_overlap, scores)
        detections = [detections[i] for i in indices]       
        # Call the tracker
        tracker.predict()
        tracker.update(detections)
        
        cmap = plt.get_cmap('tab20b')
        colors = [cmap(i)[:3] for i in np.linspace(0,1,20)]

        # update tracks
        for track in tracker.tracks:
            if not track.is_confirmed() or track.time_since_update > 1:
                continue 
            bbox = track.to_tlbr()
            class_name = track.get_class()
            #frame = dd(bbox,frame,track,class_name,frame_num)
                    
        # draw bbox on screen
            color = colors[int(track.track_id) % len(colors)]
            color = [i * 255 for i in color]
            cv2.rectangle(frame, (int(bbox[0]), int(bbox[1])), (int(bbox[2]), int(bbox[3])), color, 1)
            cv2.rectangle(frame, (int(bbox[0]), int(bbox[1]-30)), (int(bbox[0])+(len(class_name)+len(str(track.track_id)))*17, int(bbox[1])), color, -1)
            cv2.putText(frame, class_name + "-" + str(track.track_id),(int(bbox[0]), int(bbox[1]-10)),0, 0.75, (255,255,255),2)

            center = ((bbox[0]+abs(bbox[0]-bbox[2])/2), (bbox[1]+abs(bbox[1]-bbox[3])/2))
        # if enable info flag then print details about each track
            if FLAGS.info:
                print("Tracker ID: {}, Class: {},  BBox Coords (xmin, ymin, xmax, ymax): {}".format(str(track.track_id), class_name, (int(bbox[0]), int(bbox[1]), int(bbox[2]), int(bbox[3]))))


        # calculate frames per second of running detections
        fps = 1.0 / (time.time() - start_time)
        print("FPS: %.2f" % fps)
        result = np.asarray(frame)
        result = cv2.cvtColor(frame, cv2.COLOR_RGB2BGR)
        
        # if not FLAGS.dont_show:
            # cv2.imshow("Output Video", result)
        
        # if output flag is set, save video file
        # if FLAGS.output:
        #     out.write(result)
        if cv2.waitKey(1) & 0xFF == ord('q'): break
    cv2.destroyAllWindows()

if __name__ == '__main__':
    try:
        app.run(main)
    except SystemExit:
        pass

The way deepsort is incorporated here is different from what is in this repo.

I'd really appreciate any advice.

Thank you.
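
For reference, here is a hedged sketch of how the per-frame detections from a loop like the one above could instead be handed to this repo's deepsort_rbc bridge (parameter order follows the simplified overview in this README; the exact box format expected by run_deep_sort should be verified in deepsort.py):

from deepsort import deepsort_rbc

# Hedged sketch: use this repo's bridge class instead of the gdet encoder + Tracker pair.
deepsort = deepsort_rbc(wt_path='ckpts/model640.pt')   # feature extractor shipped with this repo

# Inside the per-frame loop, after NMS:
#   bboxes -> assumed [x, y, w, h] per detection
#   scores -> confidence per detection
tracker, detections_class = deepsort.run_deep_sort(frame, scores, bboxes)
for track in tracker.tracks:
    print(track.track_id, track.to_tlbr())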

Training time

Could you comment on how long it takes to train the model on a single GPU? For me it seems to take very long; in particular, the beginning of each epoch almost hangs for many minutes.
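
If the stall at the start of each epoch is data loading rather than the GPU, one general PyTorch mitigation (not specific to siamese_train.py, and the dataset/batch-size names below are placeholders) is to raise the DataLoader worker count:

from torch.utils.data import DataLoader

# Hedged sketch: parallel workers and pinned memory usually shorten the per-epoch warm-up.
loader = DataLoader(train_dataset, batch_size=32, shuffle=True,
                    num_workers=4, pin_memory=True)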

Video and detections do not match.

It seems like the video linked in the README.md (https://drive.google.com/open?id=1h2Wnb98tDVB6JlCDNQXCeZpG20x6AiZ2) does not match the detections in Nanonets_object_tracking/det/.

Each of the det_*.txt files contains 1955 frames, while the video consists of 2110 frames.
This is also confirmed visually (the bounding boxes are not where the cars actually are) when using either the given model640.pt or a self-trained feature extractor on the given data, and the program crashes when trying to process frame 1956 (for good reason).

Is there a new video, or what is going on here?
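
Until the mismatch is resolved, one hedged workaround is simply to stop once the detection file runs out of frames (the names below are hypothetical and depend on how you load the detections):

import cv2

# Hedged sketch: only process frames that actually have detections.
cap = cv2.VideoCapture('vdo.avi')            # hypothetical path to the test video
max_det_frame = max(gt_dict.keys())          # gt_dict: {frame_id: detections}, built by your loader
frame_id = 0
while True:
    ret, frame = cap.read()
    frame_id += 1
    if not ret or frame_id > max_det_frame:
        break
    # ... run deepsort on this frame ...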

No detection bug

In the case of "no detection":

if out_boxes == []:
    self.tracker.predict()
    print('No detections')
    trackers = self.tracker.tracks
    return trackers

And I got:

ValueError: not enough values to unpack (expected 2, got 1)

Can you handle this case? @abhyantrika
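
The unpacking error suggests the early return hands back a single value while the caller expects two, since the README overview unpacks tracker, detections_class from run_deep_sort. A hedged fix is to keep the return shape consistent in the no-detection branch:

# Hedged sketch: return the same (tracker, detections) pair as the normal code path.
if out_boxes == []:
    self.tracker.predict()
    print('No detections')
    return self.tracker, []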

Detection file generation and what does each column signify?

Hi, I have a question regarding the detection files from YOLO and Fast-RCNN. How were they generated?
For example, a row in det_yolo3.txt looks like
1,-1,1094.277,257.304,63.898,62.347,0.400,-1,-1,-1
What does -1 signify? I am guessing that "1" signifies the track ID and the floating-point numbers following it are the bounding-box position and the confidence score. Where are the frame number and the class?
I have trained my dataset on YOLO and was hoping to use this code for tracking. Your help in this regard is really appreciated.
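
For reference, the rows appear to follow the MOTChallenge detection format (this is an inference from the files in det/, not something documented in this repo), so the first column is the frame number, the second is a track ID placeholder, and the class is not stored at all:

# 1, -1, 1094.277, 257.304, 63.898, 62.347, 0.400, -1, -1, -1
# |   |      |        |        |       |      |      \__ world x, y, z (unused, -1)
# |   |      |        |        |       |      \__ detection confidence score
# |   |      |        |        |       \__ bounding-box height
# |   |      |        |        \__ bounding-box width
# |   |      |        \__ bounding-box top (y)
# |   |      \__ bounding-box left (x)
# |   \__ track ID (-1: detections are not yet associated with tracks)
# \__ frame number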

Worse results than the original repo

Hi, I have implemented and compared the results on vehicle tracking and counting, and I see that the original deep sort performs better, i.e., this one switches IDs during very short occlusions.
This repo re-trained the CNN to extract embeddings for vehicles, so it should perform better than the original (whose CNN was trained on person features).
Should I fine-tune any hyperparameters or something? Thank you.

ERROR: Could not find a version that satisfies the requirement torch==1.0.1.post2

I was trying to follow the README.md.

Used pip install -r requirements.txt and got:

ERROR: Could not find a version that satisfies the requirement torch==1.0.1.post2 (from -r requirements.txt (line 6)) (from versions: 0.1.2, 0.1.2.post1, 0.1.2.post2)
ERROR: No matching distribution found for torch==1.0.1.post2 (from -r requirements.txt (line 6))

Any thoughts/solutions?
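
One hedged workaround (not an official fix): this error usually means the pinned torch==1.0.1.post2 wheel is not published for your Python version. Either use an older Python interpreter for which that wheel exists, or loosen the pin in requirements.txt and install a torch build available for your environment, accepting that a newer torch may need small code adjustments:

pip install torch torchvision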

Running on a custom dataset with no multiple views

Hi, I am currently trying to track grape bunches along a vineyard row. I have a dataset for training a Mask R-CNN model, so my idea was to just crop the grape bunches from my dataset and feed them to the Siamese network. The problem is that for every bunch I have a single camera view instead of multiple ones.
Do you think the deep SORT method is not useful in my case, or can I apply deep SORT with your implementation anyway?

How to generate the detections text file?

Hi, can anyone suggest how to generate the detections text file? I used detectron2 but am not sure how to output those detections to a text file. Any help would be appreciated.
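
A hedged sketch for detectron2 that writes one MOT-style row per detection per frame (assumes predictor is an already-built detectron2 DefaultPredictor and that the column convention matches the files in det/; verify both):

import cv2

# Hedged sketch: dump detectron2 detections for a video into a det/-style text file.
cap = cv2.VideoCapture('my_video.mp4')        # hypothetical input video
frame_id = 0
with open('det_custom.txt', 'w') as f:
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        frame_id += 1
        inst = predictor(frame)["instances"].to("cpu")
        boxes = inst.pred_boxes.tensor.numpy()    # (N, 4) boxes as x1, y1, x2, y2
        scores = inst.scores.numpy()
        for (x1, y1, x2, y2), s in zip(boxes, scores):
            f.write(f"{frame_id},-1,{x1:.3f},{y1:.3f},"
                    f"{x2 - x1:.3f},{y2 - y1:.3f},{s:.3f},-1,-1,-1\n")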

Testing on a Faster R-CNN model

I have trained a Faster R-CNN model to detect cycle riders in a frame.
I want to assign a unique ID to each rider, track their movement across frames, and increase a count once they cross an assigned finishing line.
Can someone please let me know how I can use this repository for my use case?

Thanks in advance!
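
A hedged sketch of the counting part, assuming a horizontal finishing line at pixel row line_y and tracks coming from this repo's tracker object (all names and the example value are hypothetical):

# Hedged sketch: count each track ID once when its box centre crosses a horizontal line.
line_y = 400                       # y-coordinate of the finishing line (example value)
counted_ids = set()                # initialise once, before the frame loop
last_cy = {}                       # track_id -> centre y from the previous frame

# Inside the per-frame loop, after run_deep_sort():
for track in tracker.tracks:
    x1, y1, x2, y2 = track.to_tlbr()
    cy = (y1 + y2) / 2.0
    prev = last_cy.get(track.track_id)
    if prev is not None and prev < line_y <= cy and track.track_id not in counted_ids:
        counted_ids.add(track.track_id)        # this rider has crossed the line
    last_cy[track.track_id] = cy

count = len(counted_ids)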

Version compatibility issue

Hi @abhyantrika, I have tried to install from the requirements.txt file, but unfortunately I am unable to install the Python packages; I get the issue below. Could you please tell me the exact versions (Python and other Python modules) that you used?
[image: pip installation error]
