sjtu-luhe / transvod
The repository is the code for the paper "End-to-End Video Object Detection with Spatial-Temporal Transformers"
License: Apache License 2.0
Hi,
Thanks for open-sourcing your work. Are you planning to release the code or models of TransVOD Lite/++ anytime soon?
Since the results of TransVOD Lite are more compelling, releasing the source code of this model specifically would be of enormous help.
Cheers :)
Hello, thank you for your nice work on "TransVOD"!
I have a question: "single train" trains only the first half of the network; then, after the output head following STD has been learned, the weights are fixed and training of the full network begins. Why not train the output head and the temporal network together? Is it because of slow convergence?
Waiting for your reply!
Hello, when I run evaluation with the following command,
GPUS_PER_NODE=8 ./tools/run_dist_launch.sh $1 eval_r50 $2 configs/r50_eval_multi.sh
and all the results are -1:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
The code prints the hint "please run evaluate() first", but I don't know what to do. Can you give me some advice?
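For context, COCO-style summaries report -1 for a metric when no valid precision or recall entries exist for it, which is what happens when nothing was actually evaluated (for instance, when the per-image evaluation results are empty). The sketch below mimics that averaging rule in plain Python. `summarize_metric` is a hypothetical helper written for illustration, not the pycocotools API, and it does not diagnose why this particular run produced no entries.

```python
def summarize_metric(values):
    """Average the valid entries; -1 is the 'no data' placeholder.

    Hypothetical helper: it mirrors the COCO-style convention that
    slots are initialised to -1 and only valid slots are averaged.
    """
    valid = [v for v in values if v > -1]
    if not valid:
        # No slot was ever filled, so the summary itself is -1.
        return -1.0
    return sum(valid) / len(valid)


# Nothing was evaluated: every slot still holds the placeholder.
empty_run = [-1.0] * 5
# Some slots were filled by matched detections.
partial_run = [0.5, 0.75, -1.0, 0.25]

print(summarize_metric(empty_run))    # all placeholders
print(summarize_metric(partial_run))  # mean of the three valid entries
```

If every line of the summary shows -1.000, a reasonable first check is whether the predictions and the ground-truth annotations refer to the same image IDs and category IDs.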
TransVOD/models/deformable_transformer_multi.py
Lines 226 to 229 in ef864f8
I can run the code when num_feature_levels = 1.
When num_feature_levels = 4, here is the error (ref_frame_num = 10):
File "deformable_transformer_multi.py", line 231, in forward
ref_spatial_shapes = spatial_shapes.expand(BS,self.num_ref_frames, 2).contiguous()
RuntimeError: The expanded size of the tensor (10) must match the existing size (4) at non-singleton dimension 1. Target sizes: [1, 10, 2]. Tensor sizes: [4, 2]
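The error above follows from the general rule that `expand` can only stretch dimensions of size 1 (or add new leading dimensions). A tensor of shape [4, 2] therefore cannot be expanded to [1, 10, 2], whereas [1, 2] can, which matches the report that `num_feature_levels = 1` works. The sketch below checks that rule on plain shape tuples; `can_expand` is a hypothetical helper and not part of PyTorch or of this repository.

```python
def can_expand(src_shape, target_shape):
    """Return True if a tensor of src_shape could be expanded to target_shape.

    Hypothetical helper illustrating the rule: shapes are aligned from the
    trailing dimension, new leading dimensions are allowed, and an existing
    dimension may only change size if it is currently 1.
    """
    if len(target_shape) < len(src_shape):
        return False
    offset = len(target_shape) - len(src_shape)
    for i, src in enumerate(src_shape):
        tgt = target_shape[offset + i]
        if src != tgt and src != 1:
            return False
    return True


# Shapes taken from the error message (4 feature levels, 10 reference frames).
print(can_expand((4, 2), (1, 10, 2)))  # non-singleton 4 cannot become 10
# With a single feature level the level dimension is a singleton.
print(can_expand((1, 2), (1, 10, 2)))
```

This only explains why the call fails; how the multi-level spatial shapes should be combined with the reference frames is a design question for the authors.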
Hi, thanks for the amazing work! I have some questions regarding your code:
Consecutive frames? The paper states that consecutive frames are used as the input of TransVOD. However, in this line,
TransVOD/datasets/vid_multi.py
Line 82 in 5a44640
Remove the current frame from the batch? In this line,
TransVOD/datasets/vid_multi.py
Line 77 in 5a44640
Can anyone explain how to create a demo script for TransVOD? I'm trying to use MMTracking, but I'm having trouble getting it to work. If anyone has managed to get it running and produce a demo video, I would be very grateful for your help.
Dear All. I tested with r50 single-stage model, but I couldn't find the initial weights for other models such as r101, multi-stage from the deformable detr repository.
Can anyone help me find the weights?
Very nice work and thank you for sharing the code. However, when I used the multi-frame model on my own dataset, I got very low precision near zero, while the single-frame deformable DETR obtained 20 to 30 mAP, which is normal for my dataset. I followed the default settings in my implementation. I wonder what might be the reason and can you upload the training logs for reference?
Please let me know how to train it on a custom dataset and the necessary structure of the dataset.
The single-frame version of TransVOD works correctly and reaches 60% mAP on my own data, but the multi-frame version reaches only 0.2% mAP. Something must be going wrong, but I don't know where. I checked the input data and it appears to be fine. Has anyone else run into this problem?
The link for the ILSVRC2015 VID dataset is dead. I can find some archives on Baidu Netdisk, but it is painful to download them to my local PC and then scp them to my Linux server. Would you mind uploading the VID dataset to Google Drive so that I can use gdown to pull it directly from my Linux server?
Thank you in advance.
I was wondering how we can run our own videos for object detection so that we can get video output with the bounding boxes and labels like shown in figure 9 of the connected paper? I saw something about how mmtracking has demo scripts, but I couldn't figure out how to use TransVOD similarly to get the results I need. Thank you.
GPUS_PER_NODE=8 ./tools/run_dist_launch.sh $1 r50 $2 configs/r50_train_single.sh
What is r50 in this command? Is it a pretrained weight file?
Is it the one at https://drive.google.com/drive/folders/1brcbFIUupa7mfDj9sbh9mvO4BsuUBFbU/?
I would be grateful if you could answer.
Hi! I have a question regarding the code. Why was the decision made to sample from all video frames when the number of reference frames is greater than 10? I can't seem to find it in the paper.
TransVOD/datasets/vid_multi.py
Lines 75 to 76 in 5a44640
Thank you for the great work and making code public.
Suppose I want to train on my own video dataset, say X. (I can replace "ILSVRC2015 VID" with X). Then what dataset should I use in place of ILSVRC2015 DET?
Kindly let me know. Thank you.
I have been trying to get this code-base working on my computer, however when I try evaluating the pre-trained model on the VID dataset, the program terminates after about an hour of testing.
I set num_workers=8 in the config scripts and ran the following commands:
GPUS_PER_NODE=1 ./tools/run_dist_launch.sh $1 eval_r50 $2 configs/r50_eval_multi.sh
and
GPUS_PER_NODE=1 ./tools/run_dist_launch.sh $1 eval_r50 $2 configs/r50_eval_single.sh
and both caused my docker container to crash without an error log after running a few tests.
My two questions are:
I am running with the following specs:
SYSTEM:
Ubuntu 22.04 LTS
HARDWARE:
AMD Ryzen 7 3700X 8-Core Processor
GeForce RTX 2060 SUPER
RAM - 16GB
DOCKER:
image - nvidia/cuda:11.6.2-cudnn8-devel-ubuntu20.04
LIBRARIES:
- conda-forge::cudatoolkit=11.3.1
- numpy=1.21.2
- pillow=8.4.0
- pip=21.2.4
- python=3.9.7
- pytorch::pytorch=1.10.2=py3.9_cuda11.3_cudnn8.2.0_0
- scipy=1.7.3
- pytorch::torchvision=0.11.3=py39_cu113
- ffmpeg=4.2.2
- tqdm=4.62.3
- pycocotools=(latest)
- tqdm=(latest)
- cython=(latest)
- scipy=(latest)
- ninja=(latest)
Thanks
Hi, I would like to ask if I can simply execute this command without modifying the content of the make.sh file.
Also, do you have any requirements for CUDA? I ask because the setup.py of other code I have tried has never executed successfully.
I tried to use the repo on an RTX 3070, but that did not work because CUDA should be 10.1, which cannot be installed on my system.
I tried to point conda at the CUDA toolkit by exporting LD_LIBRARY_PATH, but I do not know what the path is or whether that is a good approach.
I also tried to build with CUDA 11, but I could not.
Can someone give me some guidance?
I would like to train a model with my custom dataset, so I wonder how the dataset should be; its format, and JSON files?
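As a starting point for the format question, the repository's `imagenet2coco_vid.py` script (mentioned elsewhere on this page) suggests that annotations are stored as COCO-style JSON. The sketch below builds a minimal COCO-style dictionary in plain Python. The standard keys (`images`, `annotations`, `categories`, `bbox` as `[x, y, width, height]`) follow the COCO convention, but the per-frame fields `video_id` and `frame_id`, the helper name `make_coco_style`, and the sample file names are assumptions made for illustration; the fields the TransVOD data loader actually requires should be checked against the repository's dataset code.

```python
import json


def make_coco_style(frames, boxes, class_names):
    """Build a minimal COCO-style annotation dict (hypothetical helper).

    frames: list of dicts with file_name, width, height, video_id, frame_id
    boxes:  list of dicts with image_id, category_id, bbox=[x, y, w, h]
    """
    categories = [{"id": i + 1, "name": name} for i, name in enumerate(class_names)]
    images = []
    for i, frame in enumerate(frames):
        images.append({
            "id": i + 1,
            "file_name": frame["file_name"],
            "width": frame["width"],
            "height": frame["height"],
            # Assumed video bookkeeping fields, not confirmed by the source.
            "video_id": frame["video_id"],
            "frame_id": frame["frame_id"],
        })
    annotations = []
    for j, box in enumerate(boxes):
        x, y, w, h = box["bbox"]
        annotations.append({
            "id": j + 1,
            "image_id": box["image_id"],
            "category_id": box["category_id"],
            "bbox": [x, y, w, h],
            "area": w * h,
            "iscrowd": 0,
        })
    return {"images": images, "annotations": annotations, "categories": categories}


# Hypothetical two-frame video with one labelled object in the first frame.
frames = [
    {"file_name": "video_0/000000.jpg", "width": 1280, "height": 720, "video_id": 1, "frame_id": 0},
    {"file_name": "video_0/000001.jpg", "width": 1280, "height": 720, "video_id": 1, "frame_id": 1},
]
boxes = [{"image_id": 1, "category_id": 1, "bbox": [100, 50, 200, 150]}]

dataset = make_coco_style(frames, boxes, ["car"])
print(json.dumps(dataset, indent=2)[:200])
```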
Thanks for this great study. I am so interested in Trans-VODs. Hope to see the code available in here.
Hi, I want to know if we can run this pipeline on the custom data.
@qianyu666 @SJTU-LuHe Hi! Please also add me as a developer and release the code ASAP.
Hello author, thank you for your work. I would like to ask a question about CUDA. When I evaluate your provided multi-frame model with 14 ref frames using r50_eval_multi.sh, the run fails near the end with the error RuntimeError: CUDA error: the launch timed out and was terminated. I have 4 GPUs.
the command: GPUS_PER_NODE=4 ./tools/run_dist_launch.sh $1 eval_r50 $2 configs/r50_eval_multi.sh
The logs:
Test: [42690/44032] eta: 0:26:59 class_error: 0.00 loss: 1.0426 (1.1629) loss_bbox: 0.3274 (0.2952) loss_ce: 0.3063 (0.5222) loss_giou: 0.3216 (0.3455) cardinality_error_unscaled: 299.0000 (298.3894) class_error_unscaled: 0.0000 (18.9314) loss_bbox_unscaled: 0.0655 (0.0590) loss_ce_unscaled: 0.1531 (0.2611) loss_giou_unscaled: 0.1608 (0.1728) time: 1.2177 data: 0.0269 max mem: 2606
Traceback (most recent call last):
File "main.py", line 331, in
main(args)
File "main.py", line 280, in main
data_loader_val, base_ds, device, args.output_dir)
File "/home/wmt/anaconda3/envs/Trans/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 26, in decorate_context
return func(*args, **kwargs)
File "/media/wmt/Data/exp/TransVOD/engine_multi.py", line 104, in evaluate
loss_dict = criterion(outputs, targets)
File "/home/wmt/anaconda3/envs/Trans/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/media/wmt/Data/exp/TransVOD/models/deformable_detr_multi.py", line 355, in forward
num_boxes = torch.clamp(num_boxes / get_world_size(), min=1).item()
RuntimeError: CUDA error: the launch timed out and was terminated
The idea in your paper is wonderful and enlightening. The codes in this repository are brief, effective, and easy to follow. You are a rising star in the science world.
How do I get the Lists/VID_train_15frames.txt and Lists/VID_val_videos.txt files referenced in imagenet2coco_vid.py?
Thanks.
After reading your paper, I was deeply inspired. Your work has led to the successful application of Transformers to VOD.
However, there are three questions I want to ask:
Hello, I want to know whether the method you proposed can be used for instance segmentation tasks.
First of all, thank you for your work. I wanted to ask if you know how I can solve this problem. When I try to evaluate your provided multi-frame model with 14 ref frames, using r50_eval_multi.sh, the evaluation crashes with a CUDA out of memory error. I should mention that training with r50_train_multi worked just fine, and evaluation with the single-frame model on a single GPU also works fine.
My setup is 4 x TITAN Xp GPUs with 12196 MiB each, which in my opinion should be enough for validation. What is strange is that during the evaluation each GPU is at around 4000 MiB of memory usage, so it shouldn't be a problem.
The logs:
Test: Total time: 6:59:35 (0.5718 s / it)
Averaged stats: class_error: 37.50 loss: 1.6011 (1.0219) loss_bbox: 0.3632 (0.2952) loss_ce: 0.9131 (0.3930) loss_giou: 0.2270 (0.3336) cardinality_error_unscaled: 298.5000 (295.6008) class_error_unscaled: 50.0000 (14.1343) loss_bbox_unscaled: 0.0726 (0.0590) loss_ce_unscaled: 0.4565 (0.1965) loss_giou_unscaled: 0.1135 (0.1668)
Traceback (most recent call last):
File "main.py", line 355, in
main(args)
File "main.py", line 288, in main
data_loader_val, base_ds, device, args.output_dir)
File "/home/alandrei/miniforge-pypy3/envs/py369/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 15, in decorate_context
return func(*args, **kwargs)
File "/data3/alandrei/Temporal_OD/TransVOD/engine_multi.py", line 141, in evaluate
coco_evaluator.synchronize_between_processes()
File "/data3/alandrei/Temporal_OD/TransVOD/datasets/coco_eval.py", line 66, in synchronize_between_processes
create_common_coco_eval(self.coco_eval[iou_type], self.img_ids, self.eval_imgs[iou_type])
File "/data3/alandrei/Temporal_OD/TransVOD/datasets/coco_eval.py", line 201, in create_common_coco_eval
img_ids, eval_imgs = merge(img_ids, eval_imgs)
File "/data3/alandrei/Temporal_OD/TransVOD/datasets/coco_eval.py", line 180, in merge
all_eval_imgs = all_gather(eval_imgs)
File "/data3/alandrei/Temporal_OD/TransVOD/util/misc.py", line 153, in all_gather
tensor_list.append(torch.empty((max_size,), dtype=torch.uint8, device="cuda"))
RuntimeError: CUDA out of memory. Tried to allocate 2.72 GiB (GPU 0; 11.91 GiB total capacity; 11.24 GiB already allocated; 40.62 MiB free; 11.32 GiB reserved in total by PyTorch)
Traceback (most recent call last):
File "main.py", line 355, in
main(args)
File "main.py", line 288, in main
data_loader_val, base_ds, device, args.output_dir)
File "/home/alandrei/miniforge-pypy3/envs/py369/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 15, in decorate_context
return func(*args, **kwargs)
File "/data3/alandrei/Temporal_OD/TransVOD/engine_multi.py", line 141, in evaluate
coco_evaluator.synchronize_between_processes()
File "/data3/alandrei/Temporal_OD/TransVOD/datasets/coco_eval.py", line 66, in synchronize_between_processes
create_common_coco_eval(self.coco_eval[iou_type], self.img_ids, self.eval_imgs[iou_type])
File "/data3/alandrei/Temporal_OD/TransVOD/datasets/coco_eval.py", line 201, in create_common_coco_eval
img_ids, eval_imgs = merge(img_ids, eval_imgs)
File "/data3/alandrei/Temporal_OD/TransVOD/datasets/coco_eval.py", line 180, in merge
all_eval_imgs = all_gather(eval_imgs)
File "/data3/alandrei/Temporal_OD/TransVOD/util/misc.py", line 153, in all_gather
tensor_list.append(torch.empty((max_size,), dtype=torch.uint8, device="cuda"))
RuntimeError: CUDA out of memory. Tried to allocate 2.72 GiB (GPU 3; 11.91 GiB total capacity; 11.27 GiB already allocated; 6.62 MiB free; 11.35 GiB reserved in total by PyTorch)
Traceback (most recent call last):
File "main.py", line 355, in
main(args)
File "main.py", line 288, in main
data_loader_val, base_ds, device, args.output_dir)
File "/home/alandrei/miniforge-pypy3/envs/py369/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 15, in decorate_context
return func(*args, **kwargs)
File "/data3/alandrei/Temporal_OD/TransVOD/engine_multi.py", line 141, in evaluate
coco_evaluator.synchronize_between_processes()
File "/data3/alandrei/Temporal_OD/TransVOD/datasets/coco_eval.py", line 66, in synchronize_between_processes
create_common_coco_eval(self.coco_eval[iou_type], self.img_ids, self.eval_imgs[iou_type])
File "/data3/alandrei/Temporal_OD/TransVOD/datasets/coco_eval.py", line 201, in create_common_coco_eval
img_ids, eval_imgs = merge(img_ids, eval_imgs)
File "/data3/alandrei/Temporal_OD/TransVOD/datasets/coco_eval.py", line 180, in merge
all_eval_imgs = all_gather(eval_imgs)
File "/data3/alandrei/Temporal_OD/TransVOD/util/misc.py", line 153, in all_gather
tensor_list.append(torch.empty((max_size,), dtype=torch.uint8, device="cuda"))
RuntimeError: CUDA out of memory. Tried to allocate 2.72 GiB (GPU 2; 11.91 GiB total capacity; 11.22 GiB already allocated; 54.62 MiB free; 11.30 GiB reserved in total by PyTorch)
Traceback (most recent call last):
File "main.py", line 355, in
main(args)
File "main.py", line 288, in main
data_loader_val, base_ds, device, args.output_dir)
File "/home/alandrei/miniforge-pypy3/envs/py369/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 15, in decorate_context
return func(*args, **kwargs)
File "/data3/alandrei/Temporal_OD/TransVOD/engine_multi.py", line 141, in evaluate
coco_evaluator.synchronize_between_processes()
File "/data3/alandrei/Temporal_OD/TransVOD/datasets/coco_eval.py", line 66, in synchronize_between_processes
create_common_coco_eval(self.coco_eval[iou_type], self.img_ids, self.eval_imgs[iou_type])
File "/data3/alandrei/Temporal_OD/TransVOD/datasets/coco_eval.py", line 201, in create_common_coco_eval
img_ids, eval_imgs = merge(img_ids, eval_imgs)
File "/data3/alandrei/Temporal_OD/TransVOD/datasets/coco_eval.py", line 180, in merge
all_eval_imgs = all_gather(eval_imgs)
File "/data3/alandrei/Temporal_OD/TransVOD/util/misc.py", line 153, in all_gather
tensor_list.append(torch.empty((max_size,), dtype=torch.uint8, device="cuda"))
RuntimeError: CUDA out of memory. Tried to allocate 2.72 GiB (GPU 1; 11.91 GiB total capacity; 11.23 GiB already allocated; 28.62 MiB free; 11.33 GiB reserved in total by PyTorch)
Traceback (most recent call last):
File "./tools/launch.py", line 192, in
main()
File "./tools/launch.py", line 188, in main
cmd=process.args)
subprocess.CalledProcessError: Command '['configs/r50_eval_multi.sh']' returned non-zero exit status 1.
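One way to read the tracebacks above: the failure is in `all_gather` after the test loop has finished, not in the model itself. Assuming that `all_gather` allocates one receive buffer of `max_size` bytes per participating rank on each GPU (an assumption based on the `tensor_list.append(torch.empty((max_size,), ...))` line in the traceback, not verified against the repository), the buffers alone would need roughly the amount computed below. `gather_buffer_bytes` is a hypothetical helper for this back-of-the-envelope estimate.

```python
GIB = 1024 ** 3


def gather_buffer_bytes(max_size_bytes, world_size):
    """Bytes of receive buffers on one GPU, assuming one buffer per rank."""
    return max_size_bytes * world_size


# Values from the log: each allocation is 2.72 GiB and there are 4 GPUs.
per_rank = 2.72 * GIB
total = gather_buffer_bytes(per_rank, world_size=4)

print(f"{total / GIB:.2f} GiB of receive buffers per GPU")
```

Under that assumption the gather step by itself approaches the 11.91 GiB capacity of each card, which could plausibly account for the out-of-memory error even though the evaluation loop used only about 4000 MiB.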
Dear authors:
Thanks for your great work. I have some questions about the Lite version and the ++ version in your paper.
1. With the ResNet-101 backbone, the ++ version outperforms the Lite version by about 1.x AP@50, but why does the situation change when using Swin-B as the backbone?
2. May I ask for the training settings of the Swin base version and the FPS of the single-frame baseline?
3. Why is the Lite version so fast, yet its accuracy drops significantly when window size = 1 compared to the single-frame baseline?
I would appreciate your response.