sjtu-luhe / transvod

This repository contains the code for the paper "End-to-End Video Object Detection with Spatial-Temporal Transformers".

License: Apache License 2.0

Python 81.87% Shell 1.89% C++ 1.51% Cuda 14.74%

transvod's Issues

When will the code be made available? The ACM MM acceptance notifications have already been released.

Code/model for TransVOD++ and TransVOD Lite

Hi,

Thanks for open-sourcing your work. Are you planning to release the code or models for TransVOD Lite/++ anytime soon?
Since the results of TransVOD Lite are more compelling, releasing the source code of this model specifically would be an enormous help.

Cheers :)

single train & multi train

Hello, thank you for your nice work on TransVOD!

I have a question. As I understand it, "single train" trains only the first half of the network; once the output head after the STD has been learned, those weights are frozen and training of the full network begins. Why not train the output head and the temporal network together? Is it because of slow convergence?

Waiting for your reply!

when will the code be released?

Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] =-1.000

The code prints the hint "Please run evaluate() first", but I don't know what to do. Can you give me some advice?
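
One possible reading of the -1.000 value, offered as a sketch rather than a diagnosis of this particular run: COCO-style evaluators such as pycocotools use -1 as a sentinel for "no valid precision entries", so a summary of -1 usually means nothing was accumulated (for example, because evaluation never ran or no ground truth matched). The function below is a hypothetical pure-Python restatement of that rule, not the library's own code.

```python
def summarize_precision(precision_entries):
    """Average the valid precision entries; -1 marks 'no data'.

    Hypothetical helper: entries equal to -1 are treated as missing,
    and an input with no valid entries yields -1.0.
    """
    valid = [p for p in precision_entries if p > -1]
    if not valid:
        # Nothing was accumulated, so the summary stays at the sentinel.
        return -1.0
    return sum(valid) / len(valid)


# No valid entries: the reported AP is the sentinel, not a real score.
empty_run = summarize_precision([-1, -1, -1])

# Some valid entries: only those contribute to the mean.
partial_run = summarize_precision([0.5, -1, 1.0])
```

If this is what happened, the place to look is whether predictions and ground truth were actually passed to the evaluator before the summary was printed.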

Here is a bug?

TransVOD/models/deformable_transformer_multi.py

Lines 226 to 229 in ef864f8

    
    cur_pos_embed = lvl_pos_embed_flatten[0:1]
    ref_pos_embed_list = torch.chunk(lvl_pos_embed_flatten[1:], self.num_ref_frames, dim=0)
    ref_pos_embed = torch.cat(ref_pos_embed_list, 1)
    ref_memory = ref_memory + ref_pos_embed

Is there a bug when num_feature_levels = 4?

I can run the code when num_feature_levels = 1.

When num_feature_levels = 4, here is the error (ref_frame_num = 10):

File "deformable_transformer_multi.py", line 231, in forward
ref_spatial_shapes = spatial_shapes.expand(BS,self.num_ref_frames, 2).contiguous()
RuntimeError: The expanded size of the tensor (10) must match the existing size (4) at non-singleton dimension 1. Target sizes: [1, 10, 2]. Tensor sizes: [4, 2]
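
The error message can be reproduced on paper with the shape rule that `expand` applies: shapes are aligned from the trailing dimension, and every existing dimension must either equal the target size or be 1. The sketch below is a hypothetical pure-Python checker for that rule (`can_expand` is not a PyTorch function). It assumes `spatial_shapes` has one row per feature level, which matches the tensor size [4, 2] in the message.

```python
def can_expand(src_shape, target_shape):
    """Return True if src_shape can be expanded to target_shape.

    Mirrors the expand rule: align trailing dimensions, then each
    source dimension must equal the target dimension or be 1.
    """
    if len(src_shape) > len(target_shape):
        return False
    # Missing leading dimensions behave like size-1 dimensions.
    padding = (1,) * (len(target_shape) - len(src_shape))
    padded = padding + tuple(src_shape)
    return all(s == t or s == 1 for s, t in zip(padded, target_shape))


# num_feature_levels = 1: spatial_shapes is [1, 2], so the expand succeeds.
one_level_ok = can_expand((1, 2), (1, 10, 2))

# num_feature_levels = 4: spatial_shapes is [4, 2]; 4 is neither 10 nor 1.
four_levels_ok = can_expand((4, 2), (1, 10, 2))
```

Under that assumption, the expand only works when the number of feature levels is 1, which is consistent with the behaviour reported above.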

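Installation

I can't explain this clearly in English, so I originally wrote in Chinese; I hope the authors understand. I ran into the following problem while reproducing the work:
After the last installation step, sh ./make.sh, lines 5, 7, 8, and 16 all report "command not found". What is going on?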

random sampling for reference images?

Hi, thanks for the amazing work! I have some questions regarding your code:

Consecutive frames? The paper states that consecutive frames are used as the input to TransVOD. However, in this line,

TransVOD/datasets/vid_multi.py

Line 82 in 5a44640

ref_img_ids = random.sample(sample_range, self.num_ref_frames)

the reference frames are selected randomly, ignoring the consecutive order of the frames.
Remove the current frame from the batch? In this line,

TransVOD/datasets/vid_multi.py

Line 77 in 5a44640

if self.filter_key_img and img_id in sample_range:

the code removes the current img_id from the sample range, although the paper states that adjacent frames are fed into the model. Am I missing something here?
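
To make the two behaviours in this question concrete, here is a minimal sketch. The first function restates the two quoted lines (drop the key frame, then sample at random); the second is a hypothetical alternative that picks the nearest neighbours instead. The function names, the `seed` argument, and the integer frame ids are assumptions for illustration and are not taken from vid_multi.py.

```python
import random


def sample_reference_ids(img_ids, key_id, num_ref_frames,
                         filter_key_img=True, seed=None):
    """Random reference frames, optionally excluding the key frame."""
    rng = random.Random(seed)
    sample_range = list(img_ids)
    if filter_key_img and key_id in sample_range:
        # The key (current) frame is removed before sampling.
        sample_range.remove(key_id)
    # Sampling without replacement; temporal order is not preserved.
    return rng.sample(sample_range, num_ref_frames)


def nearest_reference_ids(img_ids, key_id, num_ref_frames):
    """Hypothetical alternative: the frames closest to the key frame."""
    candidates = [i for i in img_ids if i != key_id]
    candidates.sort(key=lambda i: (abs(i - key_id), i))
    return sorted(candidates[:num_ref_frames])


frame_ids = list(range(100, 120))
random_refs = sample_reference_ids(frame_ids, key_id=110,
                                   num_ref_frames=4, seed=0)
adjacent_refs = nearest_reference_ids(frame_ids, key_id=110,
                                      num_ref_frames=4)
```

With these inputs, `adjacent_refs` is the four frames surrounding frame 110, while `random_refs` can come from anywhere in the range, which is the discrepancy the question is about.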

Please help me with demo script

Can anyone explain how to create a demo script for TransVOD? I'm trying to use MMTracking, but I'm having trouble getting it to work. If anyone has managed to get it running and produce a demo video, I would be very grateful for your help.

Weights for r101, and multistage

Dear all, I tested the r50 single-stage model, but I couldn't find the initial weights for the other models, such as r101 and multi-stage, in the Deformable DETR repository.
Can anyone help me find the weights?

Very low precision on custom dataset

Very nice work and thank you for sharing the code. However, when I used the multi-frame model on my own dataset, I got very low precision near zero, while the single-frame deformable DETR obtained 20 to 30 mAP, which is normal for my dataset. I followed the default settings in my implementation. I wonder what might be the reason and can you upload the training logs for reference?

Is the TDTE module not used in the actual project?

TransVOD/models/deformable_transformer_multi.py

Line 39 in ef864f8

self.TDAM = False

Directions to train TransVOD on custom dataset

Please let me know how to train it on a custom dataset and the necessary structure of the dataset.
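
Since the repository ships a converter named imagenet2coco_vid.py and evaluates with a COCO evaluator, a custom dataset most likely needs COCO-style JSON annotations. The sketch below builds a minimal file of that kind and sanity-checks it. The `images`, `annotations`, and `categories` keys are standard COCO; the `video_id` and `frame_id` fields, the file path, and the `validate_coco` helper are assumptions for illustration, so check the converter script for the exact per-frame fields this code base expects.

```python
import json

dataset = {
    "images": [
        # video_id / frame_id are hypothetical video bookkeeping fields.
        {"id": 1, "file_name": "video_0001/000000.JPEG",
         "height": 720, "width": 1280, "video_id": 1, "frame_id": 0},
    ],
    "annotations": [
        # bbox follows the COCO convention: [x, y, width, height] in pixels.
        {"id": 1, "image_id": 1, "category_id": 1,
         "bbox": [100.0, 150.0, 200.0, 120.0],
         "area": 200.0 * 120.0, "iscrowd": 0},
    ],
    "categories": [{"id": 1, "name": "car"}],
}


def validate_coco(data):
    """Check that every annotation points at a known image and category."""
    image_ids = {img["id"] for img in data["images"]}
    category_ids = {cat["id"] for cat in data["categories"]}
    for ann in data["annotations"]:
        if ann["image_id"] not in image_ids:
            return False
        if ann["category_id"] not in category_ids:
            return False
        _, _, width, height = ann["bbox"]
        if width <= 0 or height <= 0:
            return False
    return True


is_valid = validate_coco(dataset)
serialized = json.dumps(dataset, indent=2)
```

Running a check like this before training catches dangling image or category ids, which otherwise tend to surface only as empty evaluation results.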

Very bad results with transVOD_multi but normal for transVOD_single

The single version of TransVOD works correctly and reaches 60% mAP on my own data, but the multi version reaches only 0.2% mAP. Something must be going wrong, but I don't know where. I checked the input data and it looks okay. Has anyone else run into this problem?

Link for ILSVRC2015 VID is dead

The link for the ILSVRC2015 VID dataset is dead. I can find some archives on Baidu Netdisk, but it's painful to download them to my local PC and then scp them to my Linux server. Would you mind uploading the VID dataset to Google Drive so that I can use gdown to pull it directly onto my Linux server?

Thank you in advance.

I want to train a model on my custom dataset. What should the dataset look like?

Running videos for viewable object detection

I was wondering how we can run our own videos for object detection so that we can get video output with the bounding boxes and labels like shown in figure 9 of the connected paper? I saw something about how mmtracking has demo scripts, but I couldn't figure out how to use TransVOD similarly to get the results I need. Thank you.

$1 r50 $2: chmod: cannot access 'r50': No such file or directory

GPUS_PER_NODE=8 ./tools/run_dist_launch.sh $1 r50 $2 configs/r50_train_single.sh
Where is r50? Is it a pretrained weight file?
Is it the one at https://drive.google.com/drive/folders/1brcbFIUupa7mfDj9sbh9mvO4BsuUBFbU/?
I would be grateful if you could answer.

Window size of reference frames

Hi! I have a question regarding the code. Why does it sample from all video frames when the number of reference frames is 10 or more? I can't seem to find this in the paper.

TransVOD/datasets/vid_multi.py

Lines 75 to 76 in 5a44640

    
    if self.num_ref_frames >= 10:
        sample_range = img_ids
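
The branch quoted above can be sketched as follows. Only the `num_ref_frames >= 10` case comes from the quoted lines; the local-window branch, the `interval` parameter, and the function name are assumptions added to show the contrast, not the repository's actual logic. One possible motivation, which the authors have not confirmed here, is that a small local window may hold fewer frames than the number requested, and `random.sample` raises `ValueError` when asked for more items than the population contains.

```python
def build_sample_range(img_ids, key_index, num_ref_frames,
                       interval=5, global_threshold=10):
    """Choose the pool of frames that reference frames are drawn from."""
    if num_ref_frames >= global_threshold:
        # Quoted behaviour: many reference frames -> sample from the whole video.
        return list(img_ids)
    # Assumed behaviour: otherwise use a local window around the key frame.
    lo = max(0, key_index - interval)
    hi = min(len(img_ids), key_index + interval + 1)
    return list(img_ids[lo:hi])


video = list(range(50))
global_pool = build_sample_range(video, key_index=25, num_ref_frames=14)
local_pool = build_sample_range(video, key_index=25, num_ref_frames=4)
edge_pool = build_sample_range(video, key_index=2, num_ref_frames=4)
```

In this sketch an 11-frame window could not supply 14 distinct reference frames, so falling back to the whole video is one way to keep the sampling valid.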

Custom dataset training

Thank you for the great work and making code public.

Suppose I want to train on my own video dataset, say X. (I can replace "ILSVRC2015 VID" with X). Then what dataset should I use in place of ILSVRC2015 DET?

Kindly let me know. Thank you.

Program crashing on evaluation

I have been trying to get this code-base working on my computer, however when I try evaluating the pre-trained model on the VID dataset, the program terminates after about an hour of testing.

I set num_workers=8 in the config scripts and ran the following commands:

GPUS_PER_NODE=1 ./tools/run_dist_launch.sh $1 eval_r50 $2 configs/r50_eval_multi.sh

and

GPUS_PER_NODE=1 ./tools/run_dist_launch.sh $1 eval_r50 $2 configs/r50_eval_single.sh

and both caused my docker container to crash without an error log after running a few tests.

My questions are:

Is this caused by a lack of hardware or something else? (see below)
If this issue is caused by a lack of hardware, how can I run the above commands more efficiently while still reproducing the results of the paper?
What is the difference between single and multi?

I am running with the following specs:

SYSTEM:
Ubuntu 22.04 LTS

HARDWARE:
AMD Ryzen 7 3700X 8-Core Processor
GeForce RTX 2060 SUPER
RAM - 16GB

DOCKER:
image - nvidia/cuda:11.6.2-cudnn8-devel-ubuntu20.04

LIBRARIES:
- conda-forge::cudatoolkit=11.3.1
- numpy=1.21.2
- pillow=8.4.0
- pip=21.2.4
- python=3.9.7
- pytorch::pytorch=1.10.2=py3.9_cuda11.3_cudnn8.2.0_0
- scipy=1.7.3
- pytorch::torchvision=0.11.3=py39_cu113
- ffmpeg=4.2.2
- tqdm=4.62.3
- pycocotools=(latest)
- tqdm=(latest)
- cython=(latest)
- scipy=(latest)
- ninja=(latest)

Thanks

where is code?

about the command cd ./models/ops sh ./make.sh

Hi, I would like to ask whether I can simply run this command without modifying the contents of make.sh.
Also, do you have any requirements for the CUDA version? I ask because the setup.py in other code I have tried has never run successfully.

cuda version

I tried to use the repo on an RTX 3070, but it did not work because CUDA should be 10.1, which cannot be installed on my system.

I tried pointing conda at the CUDA toolkit by exporting LD_LIBRARY_PATH, but I do not know what the path should be or whether that is a good approach.

I also tried to build with CUDA 11, but I could not.

Can someone give me some guidance?

when will the code be available~

Installation

When I run sh ./make.sh, I get the following errors:

Custom dataset training

I would like to train a model on my custom dataset. What should the dataset look like, including its format and JSON files?

Will the code be available soon?

Thanks for this great study. I am so interested in Trans-VODs. Hope to see the code available in here.

custom data

Hi, I want to know if we can run this pipeline on the custom data.

Please release origin model and code for VOD and trained model for VOD++ and VOD Lite.

@qianyu666 @SJTU-LuHe Hi! Please also add me as a developer and release the code ASAP.

RuntimeError: CUDA error: the launch timed out and was terminated

Hello, thank you for your work. I would like to ask a question about CUDA. When I evaluate your provided multi-frame model with 14 reference frames using r50_eval_multi.sh, the run fails near the end with RuntimeError: CUDA error: the launch timed out and was terminated. I have 4 GPUs.
The command: GPUS_PER_NODE=4 ./tools/run_dist_launch.sh $1 eval_r50 $2 configs/r50_eval_multi.sh
The logs:
Test: [42690/44032] eta: 0:26:59 class_error: 0.00 loss: 1.0426 (1.1629) loss_bbox: 0.3274 (0.2952) loss_ce: 0.3063 (0.5222) loss_giou: 0.3216 (0.3455) cardinality_error_unscaled: 299.0000 (298.3894) class_error_unscaled: 0.0000 (18.9314) loss_bbox_unscaled: 0.0655 (0.0590) loss_ce_unscaled: 0.1531 (0.2611) loss_giou_unscaled: 0.1608 (0.1728) time: 1.2177 data: 0.0269 max mem: 2606
Traceback (most recent call last):
File "main.py", line 331, in
main(args)
File "main.py", line 280, in main
data_loader_val, base_ds, device, args.output_dir)
File "/home/wmt/anaconda3/envs/Trans/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 26, in decorate_context
return func(*args, **kwargs)
File "/media/wmt/Data/exp/TransVOD/engine_multi.py", line 104, in evaluate
loss_dict = criterion(outputs, targets)
File "/home/wmt/anaconda3/envs/Trans/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/media/wmt/Data/exp/TransVOD/models/deformable_detr_multi.py", line 355, in forward
num_boxes = torch.clamp(num_boxes / get_world_size(), min=1).item()
RuntimeError: CUDA error: the launch timed out and was terminated

Good job

The idea in your paper is wonderful and enlightening. The codes in this repository are brief, effective, and easy to follow. You are a rising star in the science world.

Two files in the file imagenet2coco_vid.py

How do I get the Lists/VID_train_15frames.txt and Lists/VID_val_videos.txt files referenced in imagenet2coco_vid.py?

Can you provide demo script?

Thanks.

Some questions about the experiment in this paper

After reading your paper, I was deeply inspired. Your work has led to the successful application of Transformers to VOD.
However, there are three questions I want to ask:

What type of GPU was used in the paper's experiments, and how many?
How long does it take to train 10 (or 12) epochs?
What is the inference speed (FPS) of TransVOD?

Thanks!

for instance segmentation

Hello, I want to know whether the method you proposed can be used for instance segmentation tasks.

CUDA out of memory for coco_evaluator

First of all, thank you for your work. I wanted to ask if you know how I can solve this problem. When I evaluate your provided multi-frame model with 14 reference frames using r50_eval_multi.sh, the evaluation crashes with a CUDA out of memory error. I should mention that training with r50_train_multi worked just fine, and evaluating the single-frame model on a single GPU also works fine.

My setup is 4 x TITAN Xp GPUs with 12196 MiB each, which in my opinion should be enough for validation. What is strange is that during the evaluation each GPU uses around 4000 MiB of memory, so it shouldn't be a problem.

The logs:

Test: Total time: 6:59:35 (0.5718 s / it)
Averaged stats: class_error: 37.50 loss: 1.6011 (1.0219) loss_bbox: 0.3632 (0.2952) loss_ce: 0.9131 (0.3930) loss_giou: 0.2270 (0.3336) cardinality_error_unscaled: 298.5000 (295.6008) class_error_unscaled: 50.0000 (14.1343) loss_bbox_unscaled: 0.0726 (0.0590) loss_ce_unscaled: 0.4565 (0.1965) loss_giou_unscaled: 0.1135 (0.1668)
Traceback (most recent call last):
File "main.py", line 355, in
main(args)
File "main.py", line 288, in main
data_loader_val, base_ds, device, args.output_dir)
File "/home/alandrei/miniforge-pypy3/envs/py369/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 15, in decorate_context
return func(*args, **kwargs)
File "/data3/alandrei/Temporal_OD/TransVOD/engine_multi.py", line 141, in evaluate
coco_evaluator.synchronize_between_processes()
File "/data3/alandrei/Temporal_OD/TransVOD/datasets/coco_eval.py", line 66, in synchronize_between_processes
create_common_coco_eval(self.coco_eval[iou_type], self.img_ids, self.eval_imgs[iou_type])
File "/data3/alandrei/Temporal_OD/TransVOD/datasets/coco_eval.py", line 201, in create_common_coco_eval
img_ids, eval_imgs = merge(img_ids, eval_imgs)
File "/data3/alandrei/Temporal_OD/TransVOD/datasets/coco_eval.py", line 180, in merge
all_eval_imgs = all_gather(eval_imgs)
File "/data3/alandrei/Temporal_OD/TransVOD/util/misc.py", line 153, in all_gather
tensor_list.append(torch.empty((max_size,), dtype=torch.uint8, device="cuda"))
RuntimeError: CUDA out of memory. Tried to allocate 2.72 GiB (GPU 0; 11.91 GiB total capacity; 11.24 GiB already allocated; 40.62 MiB free; 11.32 GiB reserved in total by PyTorch)
Traceback (most recent call last):
File "main.py", line 355, in
main(args)
File "main.py", line 288, in main
data_loader_val, base_ds, device, args.output_dir)
File "/home/alandrei/miniforge-pypy3/envs/py369/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 15, in decorate_context
return func(*args, **kwargs)
File "/data3/alandrei/Temporal_OD/TransVOD/engine_multi.py", line 141, in evaluate
coco_evaluator.synchronize_between_processes()
File "/data3/alandrei/Temporal_OD/TransVOD/datasets/coco_eval.py", line 66, in synchronize_between_processes
create_common_coco_eval(self.coco_eval[iou_type], self.img_ids, self.eval_imgs[iou_type])
File "/data3/alandrei/Temporal_OD/TransVOD/datasets/coco_eval.py", line 201, in create_common_coco_eval
img_ids, eval_imgs = merge(img_ids, eval_imgs)
File "/data3/alandrei/Temporal_OD/TransVOD/datasets/coco_eval.py", line 180, in merge
all_eval_imgs = all_gather(eval_imgs)
File "/data3/alandrei/Temporal_OD/TransVOD/util/misc.py", line 153, in all_gather
tensor_list.append(torch.empty((max_size,), dtype=torch.uint8, device="cuda"))
RuntimeError: CUDA out of memory. Tried to allocate 2.72 GiB (GPU 3; 11.91 GiB total capacity; 11.27 GiB already allocated; 6.62 MiB free; 11.35 GiB reserved in total by PyTorch)
Traceback (most recent call last):
File "main.py", line 355, in
main(args)
File "main.py", line 288, in main
data_loader_val, base_ds, device, args.output_dir)
File "/home/alandrei/miniforge-pypy3/envs/py369/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 15, in decorate_context
return func(*args, **kwargs)
File "/data3/alandrei/Temporal_OD/TransVOD/engine_multi.py", line 141, in evaluate
coco_evaluator.synchronize_between_processes()
File "/data3/alandrei/Temporal_OD/TransVOD/datasets/coco_eval.py", line 66, in synchronize_between_processes
create_common_coco_eval(self.coco_eval[iou_type], self.img_ids, self.eval_imgs[iou_type])
File "/data3/alandrei/Temporal_OD/TransVOD/datasets/coco_eval.py", line 201, in create_common_coco_eval
img_ids, eval_imgs = merge(img_ids, eval_imgs)
File "/data3/alandrei/Temporal_OD/TransVOD/datasets/coco_eval.py", line 180, in merge
all_eval_imgs = all_gather(eval_imgs)
File "/data3/alandrei/Temporal_OD/TransVOD/util/misc.py", line 153, in all_gather
tensor_list.append(torch.empty((max_size,), dtype=torch.uint8, device="cuda"))
RuntimeError: CUDA out of memory. Tried to allocate 2.72 GiB (GPU 2; 11.91 GiB total capacity; 11.22 GiB already allocated; 54.62 MiB free; 11.30 GiB reserved in total by PyTorch)
Traceback (most recent call last):
File "main.py", line 355, in
main(args)
File "main.py", line 288, in main
data_loader_val, base_ds, device, args.output_dir)
File "/home/alandrei/miniforge-pypy3/envs/py369/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 15, in decorate_context
return func(*args, **kwargs)
File "/data3/alandrei/Temporal_OD/TransVOD/engine_multi.py", line 141, in evaluate
coco_evaluator.synchronize_between_processes()
File "/data3/alandrei/Temporal_OD/TransVOD/datasets/coco_eval.py", line 66, in synchronize_between_processes
create_common_coco_eval(self.coco_eval[iou_type], self.img_ids, self.eval_imgs[iou_type])
File "/data3/alandrei/Temporal_OD/TransVOD/datasets/coco_eval.py", line 201, in create_common_coco_eval
img_ids, eval_imgs = merge(img_ids, eval_imgs)
File "/data3/alandrei/Temporal_OD/TransVOD/datasets/coco_eval.py", line 180, in merge
all_eval_imgs = all_gather(eval_imgs)
File "/data3/alandrei/Temporal_OD/TransVOD/util/misc.py", line 153, in all_gather
tensor_list.append(torch.empty((max_size,), dtype=torch.uint8, device="cuda"))
RuntimeError: CUDA out of memory. Tried to allocate 2.72 GiB (GPU 1; 11.91 GiB total capacity; 11.23 GiB already allocated; 28.62 MiB free; 11.33 GiB reserved in total by PyTorch)
Traceback (most recent call last):
File "./tools/launch.py", line 192, in
main()
File "./tools/launch.py", line 188, in main
cmd=process.args)
subprocess.CalledProcessError: Command '['configs/r50_eval_multi.sh']' returned non-zero exit status 1.
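
A rough memory estimate may explain why the crash happens only at the very end, even though each GPU sits at about 4000 MiB during evaluation. The traceback shows `all_gather` allocating `torch.empty((max_size,), ..., device="cuda")` inside a list append, which suggests one padded receive buffer per rank on every GPU. The arithmetic below is a sketch under that assumption; the helper name is hypothetical and the figures are taken from the error messages above.

```python
def receive_buffers_gib(payload_gib, world_size):
    """GPU memory needed for one padded receive buffer per rank."""
    return payload_gib * world_size


PAYLOAD_GIB = 2.72    # "Tried to allocate 2.72 GiB"
WORLD_SIZE = 4        # four TITAN Xp GPUs
CAPACITY_GIB = 11.91  # "11.91 GiB total capacity"

buffers = receive_buffers_gib(PAYLOAD_GIB, WORLD_SIZE)
# Each rank also holds its own serialized payload on the GPU.
total = buffers + PAYLOAD_GIB
fits_on_gpu = total <= CAPACITY_GIB

print(f"receive buffers: {buffers:.2f} GiB, with own payload: {total:.2f} GiB")
```

Under these assumptions the gather step alone needs more than a single 12 GiB card offers, regardless of how little memory the model itself uses, which is consistent with roughly 11.2 GiB already being allocated when the final buffer request fails.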

Some questions about the lite version and ++ version

Dear authors,
Thanks for your great work. I have some questions about the Lite and ++ versions in your paper.

1. With the ResNet-101 backbone, the ++ version outperforms the Lite version by about 1.x AP@50. Why does the situation change when Swin-B is used as the backbone?
2. May I ask for the training settings of the Swin-B version and the FPS of the single-frame baseline?
3. Why is the Lite version so fast, and why does its accuracy drop significantly at window size = 1 compared with the single-frame baseline?

I would appreciate your response.
