
unicorn's People

Contributors

ifighting, masterbin-iiau, nimaboscarino


unicorn's Issues

scripts to reproduce the results in paper

Hi, thanks for your awesome work.

According to the code in this repo, you perform three-stage training: detection, tracking, and tracking plus mask, right? Could you please provide the scripts (with the specific hyper-parameters, e.g., batch size) to reproduce the results shown in your paper?

where is the tools/trt.py?

I found that tools/demo.py supports TensorRT inference, so I tried to convert the model to a TensorRT model, but where is the tools/trt.py mentioned in tools/demo.py?
if args.trt:
    assert not args.fuse, "TensorRT model is not support model fusing!"
    trt_file = os.path.join(file_name, "model_trt.pth")
    assert os.path.exists(
        trt_file
    ), "TensorRT model is not found!\n Run python3 tools/trt.py first!"
    model.head.decode_in_inference = False
    decoder = model.head.decode_outputs
    logger.info("Using TensorRT to inference")
else:
    trt_file = None
    decoder = None
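The missing file is not in the repo, but for reference here is a minimal sketch of what such a conversion script typically looks like in YOLOX-style codebases, assuming the torch2trt package (names, sizes and paths below are illustrative, not the actual tools/trt.py):

# Illustrative only: convert a loaded model with torch2trt and save the weights
# where the snippet above expects them ("model_trt.pth" in the experiment folder).
import os
import torch
from torch2trt import torch2trt

def export_trt(model, file_name, input_size=(800, 1280)):
    model.eval().cuda()
    model.head.decode_in_inference = False  # decode outside TensorRT, as in the snippet above
    x = torch.ones(1, 3, input_size[0], input_size[1]).cuda()  # dummy input for tracing
    model_trt = torch2trt(model, [x], fp16_mode=True, max_workspace_size=(1 << 32))
    torch.save(model_trt.state_dict(), os.path.join(file_name, "model_trt.pth"))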

The number of classes in the detection loss.

Wonderful work! As stated in the paper, the model is trained under the supervision of the correspondence loss and the detection loss, using data from SOT and MOT.
For the classification part of the detection loss, I want to know how you handle instances from SOT datasets whose classes fall outside the MOT datasets. Also, do you think the SOT performance of this model depends on whether the tracked object's class is in the MOT dataset?

Demo command for SOT and MOT

Hello, thanks for your work on Tracking Unification, which is really promising! I've seen that you are currently working on a demo script, but could you provide simple example commands for SOT and MOT inference?

Some explanations of the arguments would also be appreciated; I'm not sure I understand the purpose of the experiment description file.

Thanks for your time. I'd be glad to help, for example by working on the documentation.

What is the experimental setting in the paper?

Hi, in Sec 4.1 the paper says that Unicorn uses the same model parameters for all four tasks; does this mean that the Unicorn model is trained in a multi-task manner? Meanwhile, I found that in the ablation study in Sec 4.6, the four results for Unification in the single-task part of Table 7 differ from the results in Tables 1, 3, 4, and 6. What is the difference between the Unification setting in Table 7 and the settings in Tables 1, 3, 4, and 6?

How to evaluate on vot2020?

Hello!
I want to compare unicorn with our method on vot2020.
[unicorn]
label = unicorn
protocol = traxpython
command = import tools.run_vot as run_vot; run_vot.run_vot2020('unicorn_vos', 'unicorn_track_r50_mask') # Set the tracker name and the parameter name

# Specify a path to trax python wrapper if it is not visible (separate by ; if using multiple paths)
paths = /media/wuhan/disk1/wh_code_backup/Unicorn

# Additional environment paths
env_PATH = /home/wuhan/anaconda3/envs/unicorn/bin/python;${PATH}

And I modified Unicorn/external/lib/test/tracker/unicorn_vos.py as follows:

def initialize(self, image, info: dict):
    self.frame_id = 0
    # process init_info
    self.init_object_ids = info["init_object_ids"]
    self.sequence_object_ids = info['sequence_object_ids']
    # assert self.init_object_ids == self.sequence_object_ids
    # forward the reference frame once
    """resize the original image and transform the coordinates"""
    self.H, self.W, _ = image.shape
    ref_frame_t, r = self.preprocessor.process(image, self.input_size)
    """forward the network"""
    with torch.no_grad():
        _, self.out_dict_pre = self.model(imgs=ref_frame_t, mode="backbone")  # backbone output (previous frame) (b, 3, H, W)
    self.dh, self.dw = self.out_dict_pre["h"] * 2, self.out_dict_pre["w"] * 2  # STRIDE = 8
    """get initial label mask (K, H/8*W/8)"""
    self.lbs_pre_dict = {}
    self.state_pre_dict = {}
    for obj_id in self.init_object_ids:
        self.state_pre_dict[obj_id] = info["init_bbox"]
        init_box = torch.tensor(info["init_bbox"]).view(-1)
        init_box[2:] += init_box[:2] # (x1, y1, x2, y2)
        init_box_rsz = init_box * r # coordinates on the resized image
        self.lbs_pre_dict[obj_id] = F.interpolate(get_label_map(init_box_rsz, self.input_size[0], self.input_size[1]) \
            , scale_factor=1/8, mode="bilinear", align_corners=False)[0].flatten(-2).to(self.device) # (1, H/8*W/8)
    """deal with new-incoming instances"""
    self.out_dict_pre_new = [] # a list containing out_dict for new in-coming instances
    self.obj_ids_new = []

def track(self, image, info: dict = None, bboxes=None, scores=None, gt_box=None):
    self.frame_id += 1
    """resize the original image and transform the coordinates"""
    cur_frame_t, r = self.preprocessor.process(image, self.input_size)
    with torch.no_grad():
        with torch.cuda.amp.autocast(enabled=False):
            fpn_outs_cur, out_dict_cur = self.model(imgs=cur_frame_t, mode="backbone")  # backbone output (current frame)
    # deal with instances from the first frame
    final_mask_dict, inst_scores = self.get_mask_results(fpn_outs_cur, out_dict_cur, self.out_dict_pre, r, self.init_object_ids)
    # deal with instances from the intermediate frames
    for (out_dict_pre, init_object_ids) in zip(self.out_dict_pre_new, self.obj_ids_new):
        final_mask_dict_tmp, inst_scores_tmp = self.get_mask_results(fpn_outs_cur, out_dict_cur, out_dict_pre, r, init_object_ids)
        final_mask_dict.update(final_mask_dict_tmp)
        inst_scores = np.concatenate([inst_scores, inst_scores_tmp])
    # deal with instances from the current frame
    if "init_object_ids" in info.keys():
        self.out_dict_pre_new.append(out_dict_cur)
        self.obj_ids_new.append(info["init_object_ids"])
        inst_scores_tmp = np.ones((len(info["init_object_ids"]),))
        inst_scores = np.concatenate([inst_scores, inst_scores_tmp])
        for obj_id in info["init_object_ids"]:
            self.state_pre_dict[obj_id] = info["init_bbox"]
            init_box = torch.tensor(info["init_bbox"]).view(-1)
            init_box[2:] += init_box[:2] # (x1, y1, x2, y2)
            init_box_rsz = init_box * r # coordinates on the resized image
            self.lbs_pre_dict[obj_id] = F.interpolate(get_label_map(init_box_rsz, self.input_size[0], self.input_size[1]) \
                , scale_factor=1/8, mode="bilinear", align_corners=False)[0].flatten(-2).to(self.device) # (1, H/8*W/8)
            final_mask_dict[obj_id] = info["init_mask"]
    # Deal with overlapped masks
    cur_obj_ids = copy.deepcopy(self.init_object_ids)
    for obj_ids_inter in self.obj_ids_new:
        cur_obj_ids += obj_ids_inter
    if "init_object_ids" in info.keys():
        cur_obj_ids += info["init_object_ids"]
    # soft aggregation
    cur_obj_ids_int = [int(x) for x in cur_obj_ids]
    mask_merge = np.zeros((self.H, self.W, max(cur_obj_ids_int)+1)) # (H, W, N+1)
    tmp_list = []
    for cur_id in cur_obj_ids:
        mask_merge[:, :, int(cur_id)] = final_mask_dict[cur_id]
        tmp_list.append(final_mask_dict[cur_id])
    back_prob = np.prod(1 - np.stack(tmp_list, axis=-1), axis=-1, keepdims=False)
    mask_merge[:, :, 0] = back_prob
    mask_merge_final = np.argmax(mask_merge, axis=-1) # (H, W)
    for cur_id in cur_obj_ids:
        final_mask_dict[cur_id] = (mask_merge_final == int(cur_id))
    """get the final result"""
    final_mask = np.zeros((self.H, self.W), dtype=np.uint8)
    # for obj_id in cur_obj_ids:
    #     final_mask[final_mask_dict[obj_id]==1] = int(obj_id)
    final_mask = mask_merge_final
    return {"segmentation": final_mask}

But the tracking and segmentation results are all "0, 0, 0, 0".

Can you help me?
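For reference, the soft-aggregation step in the track() code above can be summarised with a tiny standalone example (toy numbers, purely illustrative):

# Per-object mask probabilities are merged by writing each object's mask into its
# own channel, setting the background channel to prod(1 - p_i), and assigning
# every pixel to the argmax channel.
import numpy as np

probs = {1: np.array([[0.9, 0.2]]), 2: np.array([[0.1, 0.7]])}  # two toy 1x2 masks
merge = np.zeros((1, 2, 3))                                     # (H, W, N+1)
for obj_id, p in probs.items():
    merge[:, :, obj_id] = p
merge[:, :, 0] = np.prod(1 - np.stack(list(probs.values()), axis=-1), axis=-1)
labels = np.argmax(merge, axis=-1)  # pixel 0 -> object 1, pixel 1 -> object 2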

Custom dataset

Hello, thanks for this interesting project. I wanted to ask how I can apply the tracker to a custom-trained YOLOX model of my own.
I have the model and I have already integrated it with ByteTrack; is there any script or readme that can help me with this?

Using Pretrained-embeddings along with custom trained detections

So I was trying to train for tracking using QDTrack association, and this requires a lot of computational power, which I can get, but first I wanted to test how efficient the method will be.

I have a custom detector that I trained. Can I use this detector for the detections and your pretrained model for the embeddings and ID association, or would that hurt the association accuracy?

Thanks in advance.

Web demo and models on Hugging Face

Hi there, congrats on the release and on the acceptance to ECCV 2022! I got SOT working on my local machine, but getting the other video-level tasks to work has been a bit difficult, so I wondered if you'd find it useful to have a demo available. To make it easier for people to tinker with your work, would you be interested in adding the models and a web demo to Hugging Face? The Hugging Face hub offers free hosting, and I'd be more than happy to help out if it's something you're interested in.

Installation error

(screenshot of the error attached)

I am going through the installation steps and can't understand the error shown in the screenshot. Can anyone help me resolve this?

CUDA mismatch error when building the module for ModuleNotFoundError: No module named 'MultiScaleDeformableAttention'

When running the code I got ModuleNotFoundError: No module named 'MultiScaleDeformableAttention', so under unicorn/models/ops/ I tried python setup.py build and python setup.py install, but I am getting the error below:

raise RuntimeError(CUDA_MISMATCH_MESSAGE.format(cuda_str_version, torch.version.cuda))
RuntimeError: The detected CUDA version (10.1) mismatches the version that was used to compile PyTorch (11.7). Please make sure to use the same CUDA versions.

I see that Deformable-DETR seems to only support CUDA 10.1 or below; how can I fix this?
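As a quick check, the CUDA toolkit version the installed PyTorch was built with can be printed like this; the extension in unicorn/models/ops has to be compiled with the same toolkit version:

# Print the CUDA version PyTorch was compiled against; compiling the extension
# with a different system toolkit (here 10.1 vs. 11.7) triggers the mismatch error.
import torch
print(torch.__version__)          # installed PyTorch version
print(torch.version.cuda)         # CUDA version PyTorch was built with, e.g. "11.7"
print(torch.cuda.is_available())  # whether a usable GPU is visible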

CPU-mode not supported?

Under the assumption that I can try out Unicorn with some pre-trained models, I tried to install it on a Mac and a PC, both without an NVIDIA GPU. In step 3, when executing bash make.sh, setup.py is called, which tests for torch.cuda.is_available() and raises an exception if the function returns False. So running in CPU mode is not foreseen?
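For reference, a minimal sketch of the kind of guard described above (illustrative, not the repo's exact setup.py):

# Illustrative guard: the CUDA extension build is aborted on CPU-only machines,
# which is why the installation fails on hardware without an NVIDIA GPU.
import torch

if not torch.cuda.is_available():
    raise NotImplementedError("CUDA is not available: the deformable attention op cannot be built")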

Question regarding input tensor preprocessing

Hi,

While following along the inference.py code to see how the model works, I noticed something in the preprocessing code.

PreprocessorX at external/lib/test/tracker/unicorn_sot.py:111 converts the RGB image back to BGR, and the normalization step is missing; self.normalize is never referenced. So the input seems to be the raw image in BGR format with values in the range [0, 255].

I wasn't able to find any code that performs normalization in the forward functions of the inner models either.
Is it just that the model was trained on raw pixel values, or am I missing something?
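For context, a rough sketch of the behaviour described above (an interpretation with hypothetical names, not the repo's exact PreprocessorX):

# Sketch: resize the frame and return a BGR tensor with raw [0, 255] values,
# without subtracting a mean or dividing by a std.
import numpy as np
import torch
import torch.nn.functional as F

def process(img_rgb: np.ndarray, input_size):
    img_bgr = img_rgb[:, :, ::-1].copy()                              # RGB -> BGR
    img_t = torch.from_numpy(img_bgr).float().permute(2, 0, 1)[None]  # (1, 3, H, W)
    r = min(input_size[0] / img_t.shape[-2], input_size[1] / img_t.shape[-1])
    img_t = F.interpolate(img_t, scale_factor=r, mode="bilinear", align_corners=False)
    return img_t, r                                                   # no normalization applied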

ValueError: Invalid num_classes

Hello @MasterBin-IIAU, when I use the exp file unicorn_track_large.py along with the weights unicorn_det_convnext_large_800x1280, the model loads fine as long as the number of classes is 8 or 80, but when I try to adapt this to my dataset (num_classes = 11) to retrain on my custom database, the model raises the error Invalid num_classes.
Is this the expected behaviour? I want to train a detector that can be used in QDTrack association (the track_omni.py script), trained on my dataset (11 classes); is that possible with the current scripts?
Thanks in advance.

Single GPU?

What needs to be changed if I’d like to train the network with only a single GPU? Is this possible?

thank you! And great work :)

Dockerfile anywhere?

Do you have any plans to create a Dockerfile for all the environments?

Thanks

mmtracking framework support?

Hi,

This is very awesome work. I want to integrate it into the mmtracking framework.
Do you agree? I would like to refactor the code and submit a pull request to the mmtracking repo.

Using custom Yolo model and video inference

  • Is there a way to upload a custom-trained model and then use it to track and perform inference on a custom video?
  • Does Unicorn only accept YOLOX, or would it be able to accept a generic trained model (e.g., YOLOv5/6/7)?

This may be addressed somewhere in the code, but I have not found it yet. Any help would be amazing!

SOT parameter question

(screenshot attached)

Hi, I'm a complete beginner. I'd like to ask what tracker_name and tracker_param in test.py should be set to in order to run the SOT test. I noticed this is a bit different from pytracking, and I'd like to know what to put in the defaults to run the SOT model.

Why not try RepLKNet as the backbone?

ConvNeXt does not seem to perform particularly well on downstream tasks, yet you achieve excellent results; do you have any good experience to share on how you did that?
Second, have you tried RepLKNet? What considerations went into this choice of backbone? If you have tried it, and it is convenient, I would appreciate it if you could share the experimental results.
Finally, I noticed that you trained with 16 A100 GPUs; can 8 cards with only 11 GB of memory each fit the task?

Thank you again for your excellent work on grand unification! Sorry to bother you again, and thanks!

on edge devices

Do the models work on edge devices, and what specs/qualifications are needed?

MOT/MOTS inference question

Is there an operation similar to feature propagation in the MOT/MOTS inference stage? How are the reference targets shown in the framework figure of the paper reflected during inference?

List index out of range in convert_to_coco_format

Hello @MasterBin-IIAU, thank you for your work and publishing it.

I am currently trying to set up an environment to benchmark Unicorn against another algorithm from someone in my company, as part of a project to prove my expertise for an internal AI degree. So bear with me if I am not 100% sure about wording and what I am doing :-).

As a first step, I intend to run a MOT-only test on MOT Challenge 17 data, since in the beginning the BDD data was not completely downloaded. I installed the Python environment, although I use Python 3.8 and CUDA 11.6 as well as PyTorch 1.12 (just to let you know).

Then I was able to run

python launch_uni.py --name unicorn_track_tiny_mot_only.py --nproc_per_node=2 --batch 16 --mode multiple

but as training would take 10 days, I want to use your provided model zoo. Therefore I created a directory called Unicorn_outputs/unicorn_track_tiny_mot_only and placed the pre-trained latest_ckpt.pth from the model zoo in it. I also changed mot_test_name to motchallenge in exp/unicorn_track.py, but there is no difference anyway when I don't change it (after I have now loaded all the BDD data as well).

When I call

python tools/track.py -f exps/default/unicorn_track_tiny_mot_only.py -c Unicorn_outputs/unicorn_track_tiny_mot_only/latest_ckpt.pth -b 1 -d 1

it throws a list index out of range error in the function convert_to_coco_format where it retrieves the label, because dataloader.dataset.class_ids only has one entry; I think this is expected, as MOT17 only has one class. But the actual output it is working on contains label numbers 0, 1, 2, 3, 6, and 7. The variable cls is a tensor starting with ten '0' values that work, but the '7' in the next position throws the error.
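For reference, a minimal standalone illustration of the failure mode described above (hypothetical values, not the repo's exact code):

# The predicted class index is used to look up dataset.class_ids, which for
# MOT17 only has a single entry, so any predicted class > 0 goes out of range.
class_ids = [1]                  # MOT17: one class ("pedestrian")
pred_classes = [0, 0, 7, 6]      # class labels produced by a multi-class head
labels = [class_ids[int(c)] for c in pred_classes]  # IndexError at c == 7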

My assumption is that something is still wrong with the number of classes, but I don't know how to proceed. I did some debugging, but so far I haven't found the solution.

Thanks for an answer, Carsten.

Which device for test?

Hi, awesome work!
I want to know which GPU you used for the inference speed reported in the paper.
Or what is the minimum GPU requirement to run the model?

install failed

my versions are:
torch 1.10.0
torchaudio 0.10.0
torchvision 0.11.0

but I get this error:

Traceback (most recent call last):
  File "tools/test_omni.py", line 11, in <module>
    from mmdet.datasets import build_dataset
  File "/home/yj/.local/lib/python3.7/site-packages/mmdet/__init__.py", line 18, in <module>
    f'MMCV=={mmcv.__version__} is used but incompatible. '
AssertionError: MMCV==1.4.6 is used but incompatible. Please install mmcv>=2.0.0rc4, <2.1.0.
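As a quick sanity check, the installed mmcv version can be printed like this (which mmcv range is compatible depends on the mmdet release that ends up installed):

# Print the installed mmcv version; the assertion above comes from mmdet's own
# import-time compatibility check against this value.
import mmcv
print("mmcv:", mmcv.__version__)   # here 1.4.6, outside the required >=2.0.0rc4, <2.1.0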
