transvod's People

Contributors

lxtgh, qianyuzqy, sjtu-luhe

transvod's Issues

Code/model for TransVOD++ and TransVOD Lite

Hi,

Thanks for the open source contribution of your work. Are you planning to release code or models of TransVOD Lite/++ anytime soon?
Since the results of TransVOD Lite are more compelling, releasing the source code of this model specifically would be of enormous help.

Cheers :)

single train & multi train

Hello, Thank you for your nice work about "TransVOD"!

I have a question: "single train" only trains the first half of the network. After the output head following the STD has been learned, those weights are fixed and the full network is trained. Why not train the output head and the temporal network together? Is it because of slow convergence?

Waiting for your reply! 

Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] =-1.000

Hello, when I run evaluation with the following command,
GPUS_PER_NODE=8 ./tools/run_dist_launch.sh $1 eval_r50 $2 configs/r50_eval_multi.sh
and all the results are -1:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] =-1.000
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] =-1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] =-1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000

The code prints the hint "please run evaluate() first", but I don't know what to do. Can you give me some advice?
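
For readers hitting the same output: in the COCO evaluation convention, -1 is a sentinel meaning "no valid entries", not a score. The following is a minimal sketch of that convention only; `summarize` is a hypothetical helper written for illustration and is not taken from the TransVOD or pycocotools source.

```python
def summarize(precisions):
    """Mean of the valid entries; -1.0 when no entry is valid.

    Entries equal to -1 mark slots with nothing to evaluate
    (for example, no ground truth for that category).
    """
    valid = [p for p in precisions if p > -1]
    if not valid:
        return -1.0
    return sum(valid) / len(valid)


# Every slot is the sentinel, so the summary is the sentinel too.
print(summarize([-1, -1, -1]))
# Sentinel slots are ignored when at least one slot is valid.
print(summarize([0.5, -1, 0.75]))
```

If every metric is -1.000, a plausible first check is whether the image ids and category ids in the predictions actually overlap with those in the ground-truth annotation file, since an empty overlap leaves every slot at the sentinel.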

Is there a bug when num_feature_levels = 4?

I can run the code when num_feature_levels = 1.

When num_feature_levels = 4, here is the error (ref_frame_num = 10):

File "deformable_transformer_multi.py", line 231, in forward
ref_spatial_shapes = spatial_shapes.expand(BS,self.num_ref_frames, 2).contiguous()
RuntimeError: The expanded size of the tensor (10) must match the existing size (4) at non-singleton dimension 1. Target sizes: [1, 10, 2]. Tensor sizes: [4, 2]
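
The error message follows from the rule that `expand` can only stretch dimensions of size 1. Below is a small pure-Python sketch of that rule (it does not handle the `-1` placeholder), assuming `spatial_shapes` has one row per feature level as the reported tensor size `[4, 2]` suggests. `first_bad_dim` is a hypothetical helper, not part of PyTorch or TransVOD.

```python
def first_bad_dim(src_shape, target_shape):
    """Return the target dimension that blocks an expand, or None if it is legal.

    Shapes are aligned from the right; a source dimension must either
    equal the target dimension or be 1.
    """
    offset = len(target_shape) - len(src_shape)
    if offset < 0:
        return 0
    for i, src in enumerate(src_shape):
        if src != 1 and src != target_shape[offset + i]:
            return offset + i
    return None


# Four feature levels: size 4 cannot be stretched to 10 reference frames.
print(first_bad_dim((4, 2), (1, 10, 2)))
# One feature level: size 1 can be stretched, so the call goes through.
print(first_bad_dim((1, 2), (1, 10, 2)))
```

This is consistent with the report that the code runs with num_feature_levels = 1 and fails with 4: the level dimension is being reused as the reference-frame dimension.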

Installation

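I cannot explain this clearly in English, so I originally wrote in Chinese; I hope the authors understand. I ran into the following problem while reproducing the work:
In the last step of Installation, after running sh ./make.sh, lines 5, 7, 8, and 16 are all reported as "command not found". What is going on?
(Screenshot of the problem)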

random sampling for reference images?

Hi, thanks for the amazing work! I have some questions regarding your code:

  1. Consecutive frames? In the paper, it states that consecutive frames are used as the input of TransVOD. However, in this line,

    ref_img_ids = random.sample(sample_range, self.num_ref_frames)

    the reference samples are selected randomly, ignoring the consecutive nature of the frames.

  2. Remove the current frame from the batch? In this line,

    if self.filter_key_img and img_id in sample_range:

    the code removes the current img_id from the sample range. In the paper, it states that adjacent frames are fed into the code. Am I missing something here?
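
To make the two behaviours in this question concrete, here is a minimal sketch contrasting random sampling with a nearest-neighbour (consecutive) choice of reference frames. Only `random.sample`, `sample_range`, `num_ref_frames`, `filter_key_img`, and `img_id` come from the quoted lines; the function itself and its `consecutive` flag are hypothetical and are not the repository's implementation.

```python
import random


def sample_reference_ids(img_id, sample_range, num_ref_frames,
                         filter_key_img=True, consecutive=False):
    """Pick reference frame ids for the key frame `img_id`."""
    candidates = list(sample_range)
    # Optionally drop the key frame so it is not used as its own reference.
    if filter_key_img and img_id in candidates:
        candidates.remove(img_id)
    if consecutive:
        # Hypothetical alternative: take the frames closest to the key frame.
        candidates.sort(key=lambda i: (abs(i - img_id), i))
        return sorted(candidates[:num_ref_frames])
    # Behaviour matching the quoted line: uniform random choice.
    return random.sample(candidates, num_ref_frames)


window = range(5, 16)
print(sample_reference_ids(10, window, 4))                    # random ids
print(sample_reference_ids(10, window, 4, consecutive=True))  # [8, 9, 11, 12]
```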

Please help me with demo script

Can anyone explain how to create a demo script for TransVOD? I'm trying to use MMtracking but I'm having trouble making it. If anyone has managed to get it running and get a demo video up, I would be very grateful if anyone can help.

Weights for r101, and multistage

Dear All. I tested with r50 single-stage model, but I couldn't find the initial weights for other models such as r101, multi-stage from the deformable detr repository.
Can anyone help me find the weights?

Very low precision on custom dataset

Very nice work and thank you for sharing the code. However, when I used the multi-frame model on my own dataset, I got very low precision near zero, while the single-frame deformable DETR obtained 20 to 30 mAP, which is normal for my dataset. I followed the default settings in my implementation. I wonder what might be the reason and can you upload the training logs for reference?

Link for ILSVRC2015 VID is dead

The link for the ILSVRC2015 VID dataset is dead. I can find some archives on Baidu Netdisk, but it's painful to download them to my local PC and then scp them to my Linux server. Would you mind uploading the VID dataset to Google Drive so that I can use gdown to pull this dataset directly onto my Linux server?

Thank you in advance.

Running videos for viewable object detection

I was wondering how we can run our own videos for object detection so that we can get video output with the bounding boxes and labels like shown in figure 9 of the connected paper? I saw something about how mmtracking has demo scripts, but I couldn't figure out how to use TransVOD similarly to get the results I need. Thank you.
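
One step any such demo needs is turning raw detections into pixel boxes that can be drawn on a frame. The sketch below assumes TransVOD keeps the DETR-style output convention of normalized (cx, cy, w, h) boxes with a per-box score; that assumption, the function name, and the 0.5 threshold are illustrative and not confirmed by this thread.

```python
def to_pixel_boxes(boxes_cxcywh, scores, img_w, img_h, score_threshold=0.5):
    """Convert normalized (cx, cy, w, h) boxes to pixel (x0, y0, x1, y1, score)."""
    kept = []
    for (cx, cy, w, h), score in zip(boxes_cxcywh, scores):
        if score < score_threshold:
            continue  # skip low-confidence detections
        x0 = (cx - w / 2) * img_w
        y0 = (cy - h / 2) * img_h
        x1 = (cx + w / 2) * img_w
        y1 = (cy + h / 2) * img_h
        kept.append((x0, y0, x1, y1, score))
    return kept


boxes = [(0.5, 0.5, 0.5, 0.5), (0.25, 0.25, 0.5, 0.5)]
scores = [0.9, 0.1]
print(to_pixel_boxes(boxes, scores, img_w=640, img_h=480))
# [(160.0, 120.0, 480.0, 360.0, 0.9)]
```

The resulting tuples can then be passed to whatever drawing and video-writing library is available.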

Custom dataset training

Thank you for the great work and making code public.

Suppose I want to train on my own video dataset, say X. (I can replace "ILSVRC2015 VID" with X). Then what dataset should I use in place of ILSVRC2015 DET?

Kindly let me know. Thank you.

Program crashing on evaluation

I have been trying to get this code-base working on my computer, however when I try evaluating the pre-trained model on the VID dataset, the program terminates after about an hour of testing.

I set num_workers=8 in the config scripts and ran the following commands:

GPUS_PER_NODE=1 ./tools/run_dist_launch.sh $1 eval_r50 $2 configs/r50_eval_multi.sh

and

GPUS_PER_NODE=1 ./tools/run_dist_launch.sh $1 eval_r50 $2 configs/r50_eval_single.sh

and both caused my docker container to crash without an error log after running a few tests.

My questions are:

  • Is this caused by a lack of hardware or something else? (see below)
  • If this issue is caused by a lack of hardware, how can I run the above commands more efficiently while still reproducing the results of the paper?
  • What is the difference between single and multi?

I am running with the following specs:

SYSTEM:
Ubuntu 22.04 LTS

HARDWARE:
AMD Ryzen 7 3700X 8-Core Processor
GeForce RTX 2060 SUPER
RAM - 16GB

DOCKER:
image - nvidia/cuda:11.6.2-cudnn8-devel-ubuntu20.04

LIBRARIES:
- conda-forge::cudatoolkit=11.3.1
- numpy=1.21.2
- pillow=8.4.0
- pip=21.2.4
- python=3.9.7
- pytorch::pytorch=1.10.2=py3.9_cuda11.3_cudnn8.2.0_0
- scipy=1.7.3
- pytorch::torchvision=0.11.3=py39_cu113
- ffmpeg=4.2.2
- tqdm=4.62.3
- pycocotools=(latest)
- tqdm=(latest)
- cython=(latest)
- scipy=(latest)
- ninja=(latest)

Thanks

about the command cd ./models/ops sh ./make.sh

Hi, I would like to ask whether I can simply run this command without modifying the make.sh file.
Also, do you have any requirements for the CUDA version? The setup.py of other code I have tried has never built successfully, and I don't know why.

cuda version

I tried to use the repo on the RTX3070, but that did not work because Cuda should be 10.1 and that can not be installed on my system.

I tried to point conda at the CUDA toolkit by exporting LD_LIBRARY_PATH, but I do not know what the path should be or whether that is a good approach.

I also tried to build on CUDA 11, but I could not.

Can someone give me some guidance?
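
Some background that may help here: the RTX 3070 is an Ampere card with compute capability 8.6, and the first CUDA toolkit release that can target 8.6 is 11.1, so CUDA 10.1 cannot work on it regardless of library paths. The lookup below is a small illustrative sketch; the table lists only three architectures and the helper name is hypothetical.

```python
# Minimum CUDA toolkit version able to target each compute capability.
# Only a few entries are listed; extend as needed.
MIN_CUDA_FOR_CAPABILITY = {
    (7, 5): (10, 0),  # Turing, e.g. RTX 2060 SUPER
    (8, 0): (11, 0),  # Ampere A100
    (8, 6): (11, 1),  # Ampere GA10x, e.g. RTX 3070
}


def toolkit_supports(capability, toolkit_version):
    """True if `toolkit_version` (major, minor) can compile for `capability`."""
    return toolkit_version >= MIN_CUDA_FOR_CAPABILITY[capability]


print(toolkit_supports((8, 6), (10, 1)))  # False
print(toolkit_supports((8, 6), (11, 1)))  # True
```

In practice this means the custom ops need to be built against a PyTorch binary compiled for CUDA 11.1 or newer.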

Custom dataset training

I would like to train a model with my custom dataset, so I wonder how the dataset should be; its format, and JSON files?
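
The tracebacks elsewhere on this page show evaluation going through datasets/coco_eval.py, so a COCO-style annotation file is a reasonable starting assumption. The sketch below shows only the standard COCO fields; any video-specific fields the TransVOD loaders may require are not shown, and the file names, ids, and the `find_dangling_annotations` helper are made up for illustration.

```python
import json

# Minimal COCO-style annotation structure (illustrative values only).
dataset = {
    "images": [
        {"id": 1, "file_name": "video_0001/000000.jpg", "width": 1280, "height": 720},
        {"id": 2, "file_name": "video_0001/000001.jpg", "width": 1280, "height": 720},
    ],
    "categories": [{"id": 1, "name": "car"}],
    "annotations": [
        # bbox is [x, y, width, height] in pixels.
        {"id": 1, "image_id": 1, "category_id": 1,
         "bbox": [100.0, 150.0, 200.0, 80.0], "area": 16000.0, "iscrowd": 0},
    ],
}


def find_dangling_annotations(coco):
    """Return ids of annotations that point at a missing image or category."""
    image_ids = {img["id"] for img in coco["images"]}
    category_ids = {cat["id"] for cat in coco["categories"]}
    return [
        ann["id"]
        for ann in coco["annotations"]
        if ann["image_id"] not in image_ids or ann["category_id"] not in category_ids
    ]


print(find_dangling_annotations(dataset))  # []
serialized = json.dumps(dataset, indent=2)  # text to write to the annotation file
```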

custom data

Hi, I want to know if we can run this pipeline on the custom data.

RuntimeError: CUDA error: the launch timed out and was terminated

Hello author, thank you for your work. I would like to ask a question about CUDA. When I evaluate your provided multi-frame model with 14 ref frames using r50_eval_multi.sh, the run fails near the end with RuntimeError: CUDA error: the launch timed out and was terminated. I have 4 GPUs.
The command: GPUS_PER_NODE=4 ./tools/run_dist_launch.sh $1 eval_r50 $2 configs/r50_eval_multi.sh
The logs:
Test: [42690/44032] eta: 0:26:59 class_error: 0.00 loss: 1.0426 (1.1629) loss_bbox: 0.3274 (0.2952) loss_ce: 0.3063 (0.5222) loss_giou: 0.3216 (0.3455) cardinality_error_unscaled: 299.0000 (298.3894) class_error_unscaled: 0.0000 (18.9314) loss_bbox_unscaled: 0.0655 (0.0590) loss_ce_unscaled: 0.1531 (0.2611) loss_giou_unscaled: 0.1608 (0.1728) time: 1.2177 data: 0.0269 max mem: 2606
Traceback (most recent call last):
File "main.py", line 331, in
main(args)
File "main.py", line 280, in main
data_loader_val, base_ds, device, args.output_dir)
File "/home/wmt/anaconda3/envs/Trans/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 26, in decorate_context
return func(*args, **kwargs)
File "/media/wmt/Data/exp/TransVOD/engine_multi.py", line 104, in evaluate
loss_dict = criterion(outputs, targets)
File "/home/wmt/anaconda3/envs/Trans/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/media/wmt/Data/exp/TransVOD/models/deformable_detr_multi.py", line 355, in forward
num_boxes = torch.clamp(num_boxes / get_world_size(), min=1).item()
RuntimeError: CUDA error: the launch timed out and was terminated

Good job

The idea in your paper is wonderful and enlightening. The codes in this repository are brief, effective, and easy to follow. You are a rising star in the science world.

Some questions about the experiment in this paper

After reading your paper, I was deeply inspired. Your work has led to the successful application of Transformers to VOD.
However, there are three questions I want to ask:

  1. What is the type and quantity of GPU used in the experiments in the paper?
  2. How long does it take to train 10 (or 12) epochs?
  3. What is the inference speed (FPS) of TransVOD?

Thanks!

for instance segmentation

Hello, I want to know whether the method you proposed can be used for instance segmentation tasks.

CUDA out of memory for coco_evaluator

First of all, thank you for your work. I wanted to ask if you know how I can solve this problem. When I try to evaluate your provided multi-frame model with 14 ref frames, using r50_eval_multi.sh, the evaluation crashes with a CUDA out of memory error. I should mention that training with r50_train_multi worked just fine, and evaluation with the single-frame model on a single GPU also works fine.

My setup is 4 x TITAN Xp GPUs with 12196 MiB each, which in my opinion should be enough for validation. What is strange is that during the evaluation each GPU is at around 4000 MiB memory usage, so it shouldn't be a problem.

The logs:

Test: Total time: 6:59:35 (0.5718 s / it)
Averaged stats: class_error: 37.50 loss: 1.6011 (1.0219) loss_bbox: 0.3632 (0.2952) loss_ce: 0.9131 (0.3930) loss_giou: 0.2270 (0.3336) cardinality_error_unscaled: 298.5000 (295.6008) class_error_unscaled: 50.0000 (14.1343) loss_bbox_unscaled: 0.0726 (0.0590) loss_ce_unscaled: 0.4565 (0.1965) loss_giou_unscaled: 0.1135 (0.1668)
Traceback (most recent call last):
File "main.py", line 355, in
main(args)
File "main.py", line 288, in main
data_loader_val, base_ds, device, args.output_dir)
File "/home/alandrei/miniforge-pypy3/envs/py369/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 15, in decorate_context
return func(*args, **kwargs)
File "/data3/alandrei/Temporal_OD/TransVOD/engine_multi.py", line 141, in evaluate
coco_evaluator.synchronize_between_processes()
File "/data3/alandrei/Temporal_OD/TransVOD/datasets/coco_eval.py", line 66, in synchronize_between_processes
create_common_coco_eval(self.coco_eval[iou_type], self.img_ids, self.eval_imgs[iou_type])
File "/data3/alandrei/Temporal_OD/TransVOD/datasets/coco_eval.py", line 201, in create_common_coco_eval
img_ids, eval_imgs = merge(img_ids, eval_imgs)
File "/data3/alandrei/Temporal_OD/TransVOD/datasets/coco_eval.py", line 180, in merge
all_eval_imgs = all_gather(eval_imgs)
File "/data3/alandrei/Temporal_OD/TransVOD/util/misc.py", line 153, in all_gather
tensor_list.append(torch.empty((max_size,), dtype=torch.uint8, device="cuda"))
RuntimeError: CUDA out of memory. Tried to allocate 2.72 GiB (GPU 0; 11.91 GiB total capacity; 11.24 GiB already allocated; 40.62 MiB free; 11.32 GiB reserved in total by PyTorch)
Traceback (most recent call last):
File "main.py", line 355, in
main(args)
File "main.py", line 288, in main
data_loader_val, base_ds, device, args.output_dir)
File "/home/alandrei/miniforge-pypy3/envs/py369/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 15, in decorate_context
return func(*args, **kwargs)
File "/data3/alandrei/Temporal_OD/TransVOD/engine_multi.py", line 141, in evaluate
coco_evaluator.synchronize_between_processes()
File "/data3/alandrei/Temporal_OD/TransVOD/datasets/coco_eval.py", line 66, in synchronize_between_processes
create_common_coco_eval(self.coco_eval[iou_type], self.img_ids, self.eval_imgs[iou_type])
File "/data3/alandrei/Temporal_OD/TransVOD/datasets/coco_eval.py", line 201, in create_common_coco_eval
img_ids, eval_imgs = merge(img_ids, eval_imgs)
File "/data3/alandrei/Temporal_OD/TransVOD/datasets/coco_eval.py", line 180, in merge
all_eval_imgs = all_gather(eval_imgs)
File "/data3/alandrei/Temporal_OD/TransVOD/util/misc.py", line 153, in all_gather
tensor_list.append(torch.empty((max_size,), dtype=torch.uint8, device="cuda"))
RuntimeError: CUDA out of memory. Tried to allocate 2.72 GiB (GPU 3; 11.91 GiB total capacity; 11.27 GiB already allocated; 6.62 MiB free; 11.35 GiB reserved in total by PyTorch)
Traceback (most recent call last):
File "main.py", line 355, in
main(args)
File "main.py", line 288, in main
data_loader_val, base_ds, device, args.output_dir)
File "/home/alandrei/miniforge-pypy3/envs/py369/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 15, in decorate_context
return func(*args, **kwargs)
File "/data3/alandrei/Temporal_OD/TransVOD/engine_multi.py", line 141, in evaluate
coco_evaluator.synchronize_between_processes()
File "/data3/alandrei/Temporal_OD/TransVOD/datasets/coco_eval.py", line 66, in synchronize_between_processes
create_common_coco_eval(self.coco_eval[iou_type], self.img_ids, self.eval_imgs[iou_type])
File "/data3/alandrei/Temporal_OD/TransVOD/datasets/coco_eval.py", line 201, in create_common_coco_eval
img_ids, eval_imgs = merge(img_ids, eval_imgs)
File "/data3/alandrei/Temporal_OD/TransVOD/datasets/coco_eval.py", line 180, in merge
all_eval_imgs = all_gather(eval_imgs)
File "/data3/alandrei/Temporal_OD/TransVOD/util/misc.py", line 153, in all_gather
tensor_list.append(torch.empty((max_size,), dtype=torch.uint8, device="cuda"))
RuntimeError: CUDA out of memory. Tried to allocate 2.72 GiB (GPU 2; 11.91 GiB total capacity; 11.22 GiB already allocated; 54.62 MiB free; 11.30 GiB reserved in total by PyTorch)
Traceback (most recent call last):
File "main.py", line 355, in
main(args)
File "main.py", line 288, in main
data_loader_val, base_ds, device, args.output_dir)
File "/home/alandrei/miniforge-pypy3/envs/py369/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 15, in decorate_context
return func(*args, **kwargs)
File "/data3/alandrei/Temporal_OD/TransVOD/engine_multi.py", line 141, in evaluate
coco_evaluator.synchronize_between_processes()
File "/data3/alandrei/Temporal_OD/TransVOD/datasets/coco_eval.py", line 66, in synchronize_between_processes
create_common_coco_eval(self.coco_eval[iou_type], self.img_ids, self.eval_imgs[iou_type])
File "/data3/alandrei/Temporal_OD/TransVOD/datasets/coco_eval.py", line 201, in create_common_coco_eval
img_ids, eval_imgs = merge(img_ids, eval_imgs)
File "/data3/alandrei/Temporal_OD/TransVOD/datasets/coco_eval.py", line 180, in merge
all_eval_imgs = all_gather(eval_imgs)
File "/data3/alandrei/Temporal_OD/TransVOD/util/misc.py", line 153, in all_gather
tensor_list.append(torch.empty((max_size,), dtype=torch.uint8, device="cuda"))
RuntimeError: CUDA out of memory. Tried to allocate 2.72 GiB (GPU 1; 11.91 GiB total capacity; 11.23 GiB already allocated; 28.62 MiB free; 11.33 GiB reserved in total by PyTorch)
Traceback (most recent call last):
File "./tools/launch.py", line 192, in
main()
File "./tools/launch.py", line 188, in main
cmd=process.args)
subprocess.CalledProcessError: Command '['configs/r50_eval_multi.sh']' returned non-zero exit status 1.
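
A rough reading of the traceback: the failure happens in all_gather, where each process appends a padded uint8 buffer of max_size bytes on the GPU. Assuming one such buffer is allocated per participating process (which the `tensor_list.append(...)` line suggests, though the surrounding loop is not shown in the log), the arithmetic below estimates the extra memory each GPU would need. The helper name is hypothetical.

```python
GIB = 1024 ** 3


def all_gather_buffer_bytes(max_payload_bytes, world_size):
    """Bytes of receive buffers on one rank: one padded buffer per process."""
    return max_payload_bytes * world_size


# 2.72 GiB is the single allocation size reported in the error message.
payload = int(2.72 * GIB)
needed = all_gather_buffer_bytes(payload, world_size=4)
print(round(needed / GIB, 2))  # 10.88
```

Under that assumption, four processes need close to 11 GiB of buffers per GPU on top of the model, which would explain an out-of-memory error at the synchronization step even though inference itself used only about 4000 MiB.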

Some questions about the lite version and ++ version

Dear authors:
Thanks for your great work. I have some questions about the Lite version and the ++ version in your paper.

  1. With the ResNet-101 backbone, the ++ version outperforms the Lite version by about 1.x AP@50. Why does this change when Swin-B is used as the backbone?

  2. May I ask for the training settings of the Swin-B version and the FPS of the single-frame baseline?

  3. Why is the Lite version so fast, and why does its accuracy drop significantly compared to the single-frame baseline when window size = 1?

I would appreciate your response.
