MOT_Annot

This repo contains the annotation tools in Python for labeling multiple object tracking in a single camera and across multiple cameras, as well as randomly generating re-identification datasets. It was used in CityFlow: A City-Scale Benchmark for Multi-Target Multi-Camera Vehicle Tracking and Re-Identification, CVPR 2019.

[Paper] [Presentation] [Slides] [Poster]

Introduction

This package is designed for semi-automatic annotation of multiple object tracking in a single camera and across multiple cameras. It requires baseline detection and single-camera tracking results. The user can then create manual labels and incorporate them into the tracking results to generate the ground truths. When multi-camera tracking labels are available, a script is also provided for randomly generating a re-identification dataset.

Getting Started

Environment

The code was developed and tested with Python 2.7.15 on Windows 10. Other platforms may work but are not fully tested.

How to Use

We highly recommend creating a virtual environment for the following steps. For example, an introduction to Conda environments can be found here.
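
For instance, a dedicated Python 2.7 environment can be created and activated with Conda along these lines (the environment name mot_annot is only an example; older Conda versions on Windows use activate mot_annot instead):

    conda create -n mot_annot python=2.7
    conda activate mot_annot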

  1. Clone the repo, and change the current working directory to MOT_Annot, which will be referred to as ${ANNOT_ROOT}:

    cd ${ANNOT_ROOT}
    
  2. Add the videos from various data splits, scenes, and cameras to be annotated. Your directory tree may look like this:

    ${ANNOT_ROOT}
     |-- LICENSE
     |-- README.md
     |-- src
     |-- train
         |-- S01
             |-- c001
                 |-- vdo.avi
                 |-- roi.jpg
                 |-- ...
             |-- ...
         |-- ...
     |-- validation
         |-- ...
     `-- test
         |-- ...
    
    
  3. Extract frame images from input video files (frame indices starting from 1 by default):

    python src/extract_vdo_frms.py --data-root train/S01
    

    Extracted frame images
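
    The bundled script performs the extraction; as a rough sketch of the idea (assuming OpenCV is installed, and with an illustrative file-naming scheme that may differ from the script's actual output):

      import os
      import cv2

      def extract_frames(video_path, output_dir):
          """Dump every frame of a video as a JPEG, with frame indices starting from 1."""
          if not os.path.isdir(output_dir):
              os.makedirs(output_dir)
          cap = cv2.VideoCapture(video_path)
          frame_idx = 0
          while True:
              ret, frame = cap.read()
              if not ret:
                  break
              frame_idx += 1  # indices start from 1, matching the annotation tools
              cv2.imwrite(os.path.join(output_dir, '%06d.jpg' % frame_idx), frame)
          cap.release()

      # e.g., extract_frames('train/S01/c001/vdo.avi', 'train/S01/c001/frames')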

  4. Use a baseline object detection method, e.g., Detectron (Mask/Faster R-CNN), to output detection (and segmentation) results. The results need to be converted to the MOTChallenge format; an example conversion script is provided:

    python src/convert_to_motchallenge_det.py --data-root train/S01
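
    In the MOTChallenge detection format, each row is one comma-separated box, <frame>,<ID>,<bbox_x>,<bbox_y>,<bbox_wid>,<bbox_hei>,<conf>,<x>,<y>,<z>, where the ID and the 3D world coordinates are set to -1 for raw detections. For example (all values below are placeholders):

      1,-1,912.0,484.0,97.0,109.0,0.92,-1,-1,-1
      1,-1,359.0,211.0,79.0,58.0,0.85,-1,-1,-1
      2,-1,915.0,486.0,96.0,108.0,0.90,-1,-1,-1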
    
  5. Plot the detection results to visualize the performance and confirm that it is satisfactory:

    python src/plot_det_results.py --data-root train/S01
    

    Plotted detection results
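
    Conceptually, the plotting step overlays each box on its frame; a minimal sketch of such an overlay, assuming OpenCV (the color and line thickness are arbitrary choices, not what the script necessarily uses):

      import cv2

      def draw_boxes(frame, boxes, color=(0, 255, 0)):
          """Draw (x, y, w, h) boxes on a frame in place and return it."""
          for x, y, w, h in boxes:
              cv2.rectangle(frame, (int(x), int(y)), (int(x + w), int(y + h)), color, 2)
          return frame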

  6. Use a baseline single-camera tracking method, e.g., TrackletNet, to output multi-target single-camera (MTSC) tracking results. The results need to be converted to the MOTChallenge format; an example conversion script is provided:

    python src/convert_to_motchallenge_mtsc.py --data-root train/S01
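
    The single-camera tracking results follow the same comma-separated MOTChallenge layout, except that the second field holds the local track ID instead of -1, e.g. (placeholder values):

      1,3,912.0,484.0,97.0,109.0,1,-1,-1,-1
      2,3,915.0,486.0,96.0,108.0,1,-1,-1,-1
      2,7,359.0,211.0,79.0,58.0,1,-1,-1,-1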
    
  7. Plot the MTSC tracking results to visualize the performance and confirm that it is satisfactory:

    python src/plot_mtsc_results.py --data-root train/S01
    

    Plotted single-camera tracking results

  8. By checking the plotted baseline MTSC results frame by frame, manually create an annotation file, e.g., annotation.txt. Each row of the annotation file contains one of the following three types of operations (the order of the rows does not matter); a sample file is shown after this list:

    • Assign a global ID to a vehicle trajectory: assign,<local_ID>,<global_ID>
      • The vehicle(s) that are not assigned will be ignored.
    • Insert an instance to replace an existing one or fill in a missing one: insert,<frame_index>,<local_ID>,<bbox_x>,<bbox_y>,<bbox_wid>,<bbox_hei>
      • Use a tool like IrfanView to draw/adjust bounding boxes and read the coordinates.

        Annotation by IrfanView

      • The missing instance(s) in a continuous trajectory will be interpolated linearly, so there is no need to insert at every frame index.

      • The instance(s) occluded by more than 50% will be automatically detected and removed.

    • Remove a range of instance(s): remove,<frame_range>,<local_ID>
      • The <frame_range> can be represented as <frame_index>, <frame_index_start>-, -<frame_index_end>, or <frame_index_start>-<frame_index_end> (inclusive).
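
    For example, an annotation.txt combining the three operations might look like the following (all IDs, frame indices, and coordinates are placeholders):

      assign,3,101
      assign,7,102
      insert,127,3,645,318,88,64
      insert,185,3,702,341,92,70
      remove,36,7
      remove,200-,3
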
  9. Incorporate the annotations into the baseline MTSC results and generate the ground truths:

    python src/generate_ground_truths.py --data-root train/S01
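
    As noted in step 8, instances missing between two labeled frames of a continuous trajectory are filled in by linear interpolation; a minimal sketch of that idea (not the actual implementation) is:

      def interpolate_bbox(bbox_a, bbox_b, frame_a, frame_b, frame_q):
          """Linearly interpolate an (x, y, w, h) box at frame_q between two labeled frames."""
          alpha = float(frame_q - frame_a) / (frame_b - frame_a)
          return tuple(a + alpha * (b - a) for a, b in zip(bbox_a, bbox_b))

      # e.g., interpolate_bbox((100, 50, 80, 60), (140, 58, 84, 62), 10, 20, 15)
      # -> (120.0, 54.0, 82.0, 61.0)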
    
  10. Plot the ground truths using the above script for plotting MTSC results (change the input and output paths accordingly) to confirm that the annotations are accurate. If not, modify the corresponding lines in annotation.txt and repeat steps 8 and 9.

  11. Plot the ground truth crop of each global ID at each frame for further validation:

    python src/plot_gt_crops.py --data-root train/S01
    

    Plotted ground truths

  12. Generate the ground-truth labels for the evaluation system:

    python src/generate_ground_truths_eval_system.py --data-root train/S01
    
  13. Generate a random dataset for re-identification (according to the format of the VeRi dataset):

    python src/generate_reid_dataset.py --data-root ./ --output-root reid
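
    The script takes care of the sampling; at a high level, a VeRi-style split randomly divides the global identities between a training set and a query/gallery (test) set, roughly as in the sketch below (the 50/50 ratio and the function name are assumptions, not necessarily what the script uses):

      import random

      def split_reid_identities(identity_ids, test_ratio=0.5, seed=0):
          """Randomly split global IDs into training IDs and test (query/gallery) IDs."""
          ids = list(identity_ids)
          random.Random(seed).shuffle(ids)
          n_test = int(len(ids) * test_ratio)
          return set(ids[n_test:]), set(ids[:n_test])  # (train_ids, test_ids)

      # e.g., train_ids, test_ids = split_reid_identities(range(1, 101))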
    

References

Please cite these papers if you use this code in your research:

@inproceedings{Tang19CityFlow,
  author = {Zheng Tang and Milind Naphade and Ming-Yu Liu and Xiaodong Yang and Stan Birchfield and Shuo Wang and Ratnesh Kumar and David Anastasiu and Jenq-Neng Hwang},
  title = {City{F}low: {A} city-scale benchmark for multi-target multi-camera vehicle tracking and re-identification},
  booktitle = {Proc. of the Conference on Computer Vision and Pattern Recognition (CVPR)},
  pages = {8797--8806},
  address = {Long Beach, CA, USA},
  month = Jun,
  year = 2019
}

@inproceedings{Naphade19AIC19,
  author = {Milind Naphade and Zheng Tang and Ming-Ching Chang and David C. Anastasiu and Anuj Sharma and Rama Chellappa and Shuo Wang and Pranamesh Chakraborty and Tingting Huang and Jenq-Neng Hwang and Siwei Lyu},
  title = {The 2019 {AI} {C}ity {C}hallenge},
  booktitle = {Proc. of the Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
  pages = {452--460},
  address = {Long Beach, CA, USA},
  month = Jun,
  year = 2019
}

License

Code in the repository, unless otherwise specified, is licensed under the MIT License.

Contact

For any questions please contact Zheng (Thomas) Tang.
