idea-research / dab-detr
[ICLR 2022] Official implementation of the paper "DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR"
License: Apache License 2.0
Hi there, I'm trying to fine-tune on my own dataset (2 classes), and I'd like to know which parameters of the trained model I should remove besides these:
del checkpoint['model']['class_embed.0.weight']
del checkpoint['model']['class_embed.0.bias']
del checkpoint['model']['class_embed.1.weight']
del checkpoint['model']['class_embed.1.bias']
del checkpoint['model']['class_embed.2.weight']
del checkpoint['model']['class_embed.2.bias']
del checkpoint['model']['class_embed.3.weight']
del checkpoint['model']['class_embed.3.bias']
del checkpoint['model']['class_embed.4.weight']
del checkpoint['model']['class_embed.4.bias']
del checkpoint['model']['class_embed.5.weight']
del checkpoint['model']['class_embed.5.bias']
There's something else I need to change as I'm receiving the following error:
exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
RuntimeError: The size of tensor a (91) must match the size of tensor b (2) at non-singleton dimension 0
Any hint? Thanks!
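One likely cause, offered as a guess: the error is raised inside the optimizer update (exp_avg is Adam/AdamW's first-moment buffer), which suggests the optimizer state saved in the checkpoint, still shaped for 91 classes, is being restored together with the model. A minimal sketch, assuming the checkpoint is a dict with a 'model' entry and possibly 'optimizer' / 'lr_scheduler' / 'epoch' entries as in DETR-style training scripts: drop every class_embed.* tensor by prefix and discard the saved optimizer/scheduler state so it is rebuilt for the new head. The helper name strip_for_finetune is hypothetical.

```python
def strip_for_finetune(checkpoint, head_prefixes=("class_embed.",)):
    """Hypothetical helper: prepare a DETR-style checkpoint for a new class count.

    Removes classification-head weights (their shapes depend on num_classes)
    and any saved optimizer / LR-scheduler state, whose Adam moments
    (exp_avg, exp_avg_sq) would otherwise keep the old 91-class shapes.
    """
    model_state = checkpoint["model"]
    for key in list(model_state.keys()):  # copy the keys: we mutate the dict
        if key.startswith(head_prefixes):
            del model_state[key]
    for key in ("optimizer", "lr_scheduler", "epoch"):
        checkpoint.pop(key, None)  # forces a fresh optimizer for the new head
    return checkpoint


# Usage sketch (path is a placeholder):
# ckpt = torch.load("checkpoint.pth", map_location="cpu")
# ckpt = strip_for_finetune(ckpt)
# model.load_state_dict(ckpt["model"], strict=False)  # head keeps its fresh init
```

With the optimizer entry gone, a resume path that only restores optimizer state when it is present should build new Adam buffers matching the 2-class head.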
Thank you very much for your work. How can I visualize the self-attention part, for example the outputs of the encoder or the inputs of the cross-attention?
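A generic PyTorch approach, not specific to this repo: register forward hooks on the modules of interest (for example found by filtering model.named_modules() by name) and store what they receive and return. The sketch uses a standalone torch.nn.MultiheadAttention as a stand-in; DAB-DETR uses its own attention classes, so the module names and the structure of their outputs are assumptions you would need to check. The helper attach_recorders is hypothetical.

```python
import torch
import torch.nn as nn


def attach_recorders(named_modules):
    """Hypothetical helper: hook each module and keep its latest inputs/outputs."""
    records, handles = {}, []
    for name, module in named_modules:
        def hook(mod, inputs, output, name=name):  # default arg binds this name
            # `inputs` holds only the positional arguments of forward()
            records[name] = {"inputs": inputs, "output": output}
        handles.append(module.register_forward_hook(hook))
    return records, handles


attn = nn.MultiheadAttention(embed_dim=16, num_heads=4)  # stand-in module
records, handles = attach_recorders([("self_attn", attn)])

x = torch.randn(10, 2, 16)  # (seq_len, batch, embed_dim)
attn(x, x, x, need_weights=True)
attn_output, attn_weights = records["self_attn"]["output"]
query_in = records["self_attn"]["inputs"][0]  # what went into the attention

for h in handles:  # remove the hooks when done
    h.remove()
```

attn_weights (averaged over heads, shape batch x target x source) can then be reshaped to the feature-map grid and plotted; the same hook on a cross-attention module records its inputs.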
I am using an 8-node V100 setup, and the environment is below:
Python 3.8.5 (default, Sep 4 2020, 07:30:14)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'1.9.0+cu102'
Error info: CUDA out of memory when running test.py
When I set up the deformable multi-head attention operator, the following error is reported:
nvcc fatal : Unsupported gpu architecture 'compute_86'
Traceback (most recent call last):
File "/data/queenie_liu/anaconda3/envs/mmTrans/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1723, in _run_ninja_build
env=env)
File "/data/queenie_liu/anaconda3/envs/mmTrans/lib/python3.7/subprocess.py", line 512, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "setup.py", line 72, in <module>
cmdclass={"build_ext": torch.utils.cpp_extension.BuildExtension},
File "/data/queenie_liu/anaconda3/envs/mmTrans/lib/python3.7/site-packages/setuptools/__init__.py", line 153, in setup
return distutils.core.setup(**attrs)
File "/data/queenie_liu/anaconda3/envs/mmTrans/lib/python3.7/distutils/core.py", line 148, in setup
dist.run_commands()
File "/data/queenie_liu/anaconda3/envs/mmTrans/lib/python3.7/distutils/dist.py", line 966, in run_commands
self.run_command(cmd)
File "/data/queenie_liu/anaconda3/envs/mmTrans/lib/python3.7/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/data/queenie_liu/anaconda3/envs/mmTrans/lib/python3.7/distutils/command/build.py", line 135, in run
self.run_command(cmd_name)
File "/data/queenie_liu/anaconda3/envs/mmTrans/lib/python3.7/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/data/queenie_liu/anaconda3/envs/mmTrans/lib/python3.7/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/data/queenie_liu/anaconda3/envs/mmTrans/lib/python3.7/site-packages/setuptools/command/build_ext.py", line 79, in run
_build_ext.run(self)
File "/data/queenie_liu/anaconda3/envs/mmTrans/lib/python3.7/distutils/command/build_ext.py", line 340, in run
self.build_extensions()
File "/data/queenie_liu/anaconda3/envs/mmTrans/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 735, in build_extensions
build_ext.build_extensions(self)
File "/data/queenie_liu/anaconda3/envs/mmTrans/lib/python3.7/distutils/command/build_ext.py", line 449, in build_extensions
self._build_extensions_serial()
File "/data/queenie_liu/anaconda3/envs/mmTrans/lib/python3.7/distutils/command/build_ext.py", line 474, in _build_extensions_serial
self.build_extension(ext)
File "/data/queenie_liu/anaconda3/envs/mmTrans/lib/python3.7/site-packages/setuptools/command/build_ext.py", line 202, in build_extension
_build_ext.build_extension(self, ext)
File "/data/queenie_liu/anaconda3/envs/mmTrans/lib/python3.7/distutils/command/build_ext.py", line 534, in build_extension
depends=ext.depends)
File "/data/queenie_liu/anaconda3/envs/mmTrans/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 565, in unix_wrap_ninja_compile
with_cuda=with_cuda)
File "/data/queenie_liu/anaconda3/envs/mmTrans/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1404, in _write_ninja_file_and_compile_objects
error_prefix='Error compiling objects for extension')
File "/data/queenie_liu/anaconda3/envs/mmTrans/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1733, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension
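'compute_86' is compute capability 8.6 (Ampere GA10x, e.g. RTX 30-series cards), and nvcc only supports that target from CUDA 11.1 onward, so this error usually means the CUDA toolkit building the extension is older than 11.1. Besides upgrading the toolkit, a common workaround is to set the TORCH_CUDA_ARCH_LIST environment variable, which torch.utils.cpp_extension reads, to an architecture the installed nvcc supports. The helper below is a hypothetical sketch of that choice for the sm_86 case only; it relies on sm_80 binaries running on sm_86 devices (same major version) and on PTX being JIT-compiled by the driver for newer GPUs.

```python
def pick_arch_list(cuda_version, capability):
    """Hypothetical helper: choose a TORCH_CUDA_ARCH_LIST value nvcc can build.

    cuda_version: (major, minor) of the nvcc toolkit, e.g. (11, 0)
    capability:   (major, minor) of the GPU, e.g. (8, 6)
    Only the sm_86 case is handled; other capabilities are returned unchanged.
    """
    if capability != (8, 6) or cuda_version >= (11, 1):
        # nvcc understands this target natively (sm_86 arrived in CUDA 11.1)
        return f"{capability[0]}.{capability[1]}"
    if cuda_version >= (11, 0):
        # sm_80 cubins are binary-compatible with sm_86 GPUs
        return "8.0"
    # Older toolkits: embed PTX that the driver can JIT-compile for Ampere
    return "7.5+PTX"


# Usage sketch, before re-running the extension build:
# import os, torch
# os.environ["TORCH_CUDA_ARCH_LIST"] = pick_arch_list((11, 0), torch.cuda.get_device_capability())
```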
Hello, thanks for your great work!
After training finished and produced log.txt, I tried to visualize it with plot_logs, but I get an error on this line:
Traceback (most recent call last):
File "H:/yjs/code/DAB-DETR-main/tmp.py", line 53, in <module>
fig, axs = plot_logs(log_path)
File "H:\yjs\code\DAB-DETR-main\util\plot_utils.py", line 65, in plot_logs
df.interpolate().ewm(com=ewm_col).mean().plot(
File "H:\Anaconda\lib\site-packages\pandas\util\_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File "H:\Anaconda\lib\site-packages\pandas\core\frame.py", line 10712, in interpolate
return super().interpolate(
File "H:\Anaconda\lib\site-packages\pandas\core\generic.py", line 6899, in interpolate
new_data = obj._mgr.interpolate(
File "H:\Anaconda\lib\site-packages\pandas\core\internals\managers.py", line 377, in interpolate
return self.apply("interpolate", **kwargs)
File "H:\Anaconda\lib\site-packages\pandas\core\internals\managers.py", line 327, in apply
applied = getattr(b, f)(**kwargs)
File "H:\Anaconda\lib\site-packages\pandas\core\internals\blocks.py", line 1369, in interpolate
new_values = values.fillna(value=fill_value, method=method, limit=limit)
File "H:\Anaconda\lib\site-packages\pandas\core\arrays\_mixins.py", line 218, in fillna
value, method = validate_fillna_kwargs(
File "H:\Anaconda\lib\site-packages\pandas\util\_validators.py", line 372, in validate_fillna_kwargs
method = clean_fill_method(method)
File "H:\Anaconda\lib\site-packages\pandas\core\missing.py", line 120, in clean_fill_method
raise ValueError(f"Invalid fill method. Expecting {expecting}. Got {method}")
ValueError: Invalid fill method. Expecting pad (ffill) or backfill (bfill). Got linear
My pandas version is 1.3.5.
I don't know whether I am using it incorrectly or whether this is a bug in pandas. How can I fix it?
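One plausible cause (an assumption, not verified against this log): the DataFrame built from log.txt also contains non-numeric columns (for example list-valued COCO evaluation entries), and DataFrame.interpolate() runs over every column, including ones where linear interpolation is not defined. A minimal workaround is to restrict the frame to the columns being plotted and coerce them to numbers before interpolating. The helper smooth_fields is hypothetical; plot_logs would call something like it in place of df.interpolate().ewm(...).

```python
import numpy as np
import pandas as pd


def smooth_fields(df, fields, ewm_col=0):
    """Interpolate and EWM-smooth only the requested numeric columns."""
    cols = [c for c in fields if c in df.columns]  # skip absent fields
    numeric = df[cols].apply(pd.to_numeric, errors="coerce")  # non-numbers -> NaN
    return numeric.interpolate().ewm(com=ewm_col).mean()


# Toy log: one numeric field with a gap, one list-valued evaluation field
df = pd.DataFrame({
    "train_loss": [1.0, np.nan, 3.0],
    "test_coco_eval_bbox": [[0.1], [0.2], [0.3]],
})
smoothed = smooth_fields(df, ["train_loss"])
```

With com=0 the exponential weighting reduces to the raw values, so smoothed only shows the interpolated gap; larger values smooth more.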
Hello, I noticed that in https://github.com/IDEA-opensource/DAB-DETR/blob/94f20eda2be7cf38ddf94102db92e949b6846a65/models/DAB_DETR/transformer.py#L333 there is a parameter rm_self_attn_decoder that is set to False by default.
Have you ever set it to True? What was the performance?
Can I use your project to train on my own VOC dataset?
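Assuming the training pipeline expects COCO-style annotation JSON, as DETR-derived code bases usually do, one route is to convert the Pascal VOC XML files first. The sketch below is a hypothetical converter: it assigns category ids 1..K in the order of class_names, copies VOC coordinates as-is into COCO's [x, y, width, height], and ignores the 'difficult' flag; all of these are illustrative choices, not the repo's conventions.

```python
import xml.etree.ElementTree as ET


def voc_to_coco(xml_strings, class_names):
    """Hypothetical converter: VOC XML annotation strings -> COCO-style dict."""
    cat_ids = {name: i + 1 for i, name in enumerate(class_names)}
    coco = {
        "images": [],
        "annotations": [],
        "categories": [{"id": i, "name": n} for n, i in cat_ids.items()],
    }
    ann_id = 1
    for img_id, xml_text in enumerate(xml_strings, start=1):
        root = ET.fromstring(xml_text)
        size = root.find("size")
        coco["images"].append({
            "id": img_id,
            "file_name": root.findtext("filename"),
            "width": int(size.findtext("width")),
            "height": int(size.findtext("height")),
        })
        for obj in root.iter("object"):
            box = obj.find("bndbox")
            x0, y0, x1, y1 = (float(box.findtext(k)) for k in ("xmin", "ymin", "xmax", "ymax"))
            w, h = x1 - x0, y1 - y0  # COCO stores width/height, not corners
            coco["annotations"].append({
                "id": ann_id,
                "image_id": img_id,
                "category_id": cat_ids[obj.findtext("name")],
                "bbox": [x0, y0, w, h],
                "area": w * h,
                "iscrowd": 0,
            })
            ann_id += 1
    return coco
```

The result can be written with json.dump and used wherever the dataset code expects a COCO annotation file.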
Thanks for your great work. I have some questions about the implementation of DAB-Deformable-DETR.
1. DAB-DETR uses sinehw, while DAB-Deformable-DETR uses the original sine. Is there any reason for this difference?
2. The code uses dim_feedforward=2048. How does it perform with 1024?
I have some doubts about the modulated HW attention part of DAB-DETR. In line 238 of transformer.py, query_sine_embed only takes the embedding of x and y. Shouldn't lines 243 and 244 (the modulated HW attention) operate on the embedding of w and h? Why do they still operate on the embedding of x and y? It's a bit confusing.
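For reference, the paper describes modulated positional attention as rescaling the x and y terms of the positional attention by the ratio between a reference width/height (predicted from the query content) and the anchor's own w/h, i.e. roughly (PE(x)·PE(x_ref)·w_ref/w + PE(y)·PE(y_ref)·h_ref/h)/sqrt(D). So w and h enter as scale factors on the x/y embeddings rather than through their own embedding. A minimal sketch of that scaling on toy tensors; the names mirror the ones discussed here, the [y-half | x-half] layout is an assumption, and none of it is copied from the repo.

```python
import torch


def modulate_xy_embedding(query_sine_embed, ref_hw, anchor_wh):
    """Sketch of the modulation step only (no attention computed).

    query_sine_embed: (nq, bs, d) sine embedding of anchor centres,
                      laid out as [y-half | x-half] (assumed order)
    ref_hw:           (nq, bs, 2) reference (w, h) predicted from content
    anchor_wh:        (nq, bs, 2) anchor (w, h)
    """
    d = query_sine_embed.shape[-1]
    out = query_sine_embed.clone()
    # Scaling the x-half by w_ref / w scales PE(x).PE(x_key) by the same factor
    out[..., d // 2:] *= (ref_hw[..., 0] / anchor_wh[..., 0]).unsqueeze(-1)
    # Likewise the y-half by h_ref / h
    out[..., : d // 2] *= (ref_hw[..., 1] / anchor_wh[..., 1]).unsqueeze(-1)
    return out


q = torch.ones(3, 2, 8)
ref = torch.full((3, 2, 2), 0.5)
wh = torch.tensor([0.25, 1.0]).expand(3, 2, 2)  # narrow, tall anchors
out = modulate_xy_embedding(q, ref, wh)
```

Because the cross-attention logit is a dot product, a narrow anchor (small w) enlarges the x-term and sharpens attention along x, which is how box size influences the positional attention.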
The model predicts bbox offsets twice: in DAB-DETR/blob/main/models/DAB_DETR/transformer.py, lines 255-265, and in DAB-DETR/blob/main/models/DAB_DETR/DABDETR.py, lines 171-184.
The difference is that the input of the second prediction is normalized.
The second prediction seems unnecessary.
Am I right?
In models/DAB_DETR/DABDETR.py, line 85 should be
"bbox_embed_diff_each_layer: dont share weights of prediction heads. Default for False.(shared weights.)"
rather than the original "Default for True", so that it is consistent with line 72.
Am I right?
Thanks for your work!
It seems that the 'look forward twice' strategy mentioned in DINO has been implemented in DAB DETR, even in Deformable DETR.
I notice that the iterative regression operation is implemented in both dab_deformable_detr.py and deformable_transformer.py.
Is there anything wrong with my understanding?
Looking forward to your reply!
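To make the distinction concrete: in DINO's terminology, "look forward once" computes each layer's supervised box from a detached previous box, while "look forward twice" computes layer i's supervised box from the undetached box of layer i-1, so the loss at layer i also reaches layer i-1's offset. The toy two-layer sketch below (scalar boxes, hypothetical function names, leaf tensors standing in for the offsets each decoder layer would predict) shows the difference through the gradient of the first offset. It illustrates the two variants only; it does not settle which one the repo's iterative refinement implements.

```python
import torch


def inverse_sigmoid(x, eps=1e-5):
    x = x.clamp(0, 1)
    return torch.log(x.clamp(min=eps) / (1 - x).clamp(min=eps))


def refine(box, delta):
    """Iterative box refinement: update in logit space, squash back to (0, 1)."""
    return (inverse_sigmoid(box) + delta).sigmoid()


def two_layer_predictions(ref0, delta1, delta2, look_forward_twice):
    box1 = refine(ref0, delta1)   # layer-1 prediction (supervised)
    box1_input = box1.detach()    # reference passed on to layer 2
    # In a real decoder, delta2 would be computed by layer 2 from box1_input.
    prev = box1 if look_forward_twice else box1_input
    box2 = refine(prev, delta2)   # layer-2 prediction (supervised)
    return box1, box2


d1 = torch.tensor(0.3, requires_grad=True)
d2 = torch.tensor(-0.2, requires_grad=True)
_, box2 = two_layer_predictions(torch.tensor(0.5), d1, d2, look_forward_twice=False)
box2.backward()  # loss on layer 2 only: d1.grad stays None here
```

Re-running with look_forward_twice=True gives d1 a non-zero gradient from the layer-2 loss.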
I've tried to train DAB-Deformable-DETR with multiple GPUs on one Ubuntu server using torch.nn.DataParallel, but a runtime error was raised: "Expected tensor for argument #1 'input' to have the same device as tensor for argument #2 'weight'; but device 0 does not equal 1 (while checking arguments for cudnn_convolution)". Is this a bug, or is there another way to train DAB-Deformable-DETR with multiple GPUs?
COCO has only 80 object detection classes, so why does the code predict 91 categories?
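A likely explanation, following the DETR code base this repo builds on: COCO's 80 detection category ids are not contiguous; they run from 1 to 90 with gaps, so sizing the classifier as max_id + 1 = 91 lets the raw ids be used directly as class indices, and the unused indices are simply never targets. The snippet rebuilds the id list to show the arithmetic and, for a custom dataset, the alternative of remapping ids to a contiguous range.

```python
# COCO 2017 detection category ids: 80 classes, ids 1..90 with gaps
coco_ids = (
    list(range(1, 12)) + list(range(13, 26)) + [27, 28] + list(range(31, 45))
    + list(range(46, 66)) + [67, 70] + list(range(72, 83)) + list(range(84, 91))
)
num_classes = max(coco_ids) + 1  # head size that lets raw ids index directly
missing = sorted(set(range(1, 91)) - set(coco_ids))  # ids that never occur

# Alternative for a custom dataset: remap ids to contiguous indices 0..K-1
to_contiguous = {cid: i for i, cid in enumerate(sorted(coco_ids))}
```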
Hi, thanks for your nice work!
But I am confused about the code. What is the purpose of subtracting the maximum attention weight?
Looking for your reply!
I am having some trouble understanding the role of refpoint_embed in the DABDeformableDETR module, in particular what its 4 values represent. Do they represent x, y, w, h, or do they correspond to the 4 levels of the input feature maps? From line 391 in models/dab_deformable_detr/deformable_transformer.py, the four values seem to be multiplied by the valid ratios of the four levels and broadcast along the xywh dimension. Also, when they are fed into deformable attention, the order of the dimensions indicates that the 4 values correspond to 4 levels. On the other hand, when random_refpoints_xy is used, the first two values seem to represent x and y instead. It's a bit confusing.
I have some doubts on line https://github.com/IDEA-opensource/DAB-DETR/blob/main/models/DAB_DETR/transformer.py#L242 .
refHW_cond = self.ref_anchor_head(output).sigmoid() # nq, bs, 2
This line asks the model to learn absolute value of w, h from output. But NO supervision is applied. Besides, the 'output' tensor is used to learn the OFFSET of bbox (x, y, w, h).
So, can the model learn the width and height as expected?
Thanks for your great work. I notice a difference between DAB-DETR and Conditional DETR: DAB-DETR has an MLP, 'self.query_scale', before each transformer encoder layer. Is this operation described in the paper, or in another paper, to explain its effect?
Thanks for your great work! I am curious about when the code will be released. Could you provide an approximate date? Thanks.
Hi, first thanks for your great work to improve DETR-like detectors!
I have a question about the cat() operation used to get the query positional embedding with gen_sineembed_for_position() in models/DAB_DETR/transformer.py, lines 51 and 61. Why is pos = torch.cat((pos_y, pos_x)...) used here? Shouldn't x come before y?
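For reference, a minimal sketch of a DETR-style sine embedding of normalized anchor centres, written to mirror the concatenation order asked about (pos_y before pos_x). The temperature and the 128 features per coordinate are parameters here, not values taken from the repo. The order itself is arbitrary as long as it matches the key side: DETR's PositionEmbeddingSine for the image features also concatenates y before x, so in the query-key dot product the x-features of the query meet the x-features of the key.

```python
import math

import torch


def sine_embed_xy(pos, num_feats=128, temperature=10000.0):
    """Sine/cosine embedding of normalized (x, y) centres.

    pos: (nq, bs, >=2) with x = pos[..., 0], y = pos[..., 1] in [0, 1]
    returns (nq, bs, 2 * num_feats) laid out as [y-features | x-features]
    """
    scale = 2 * math.pi
    idx = torch.arange(num_feats, device=pos.device)
    dim_t = temperature ** (2 * (idx // 2) / num_feats)  # paired frequencies
    pos_x = pos[..., 0, None] * scale / dim_t
    pos_y = pos[..., 1, None] * scale / dim_t
    # sin on even feature indices, cos on odd ones, interleaved
    pos_x = torch.stack((pos_x[..., 0::2].sin(), pos_x[..., 1::2].cos()), dim=-1).flatten(-2)
    pos_y = torch.stack((pos_y[..., 0::2].sin(), pos_y[..., 1::2].cos()), dim=-1).flatten(-2)
    return torch.cat((pos_y, pos_x), dim=-1)


p = torch.tensor([[[0.25, 0.75]]])  # one query, batch of one: x=0.25, y=0.75
emb = sine_embed_xy(p, num_feats=4)
```

Swapping to x-first would work equally well only if every consumer (the key-side encoding and any code that slices the halves) were changed consistently.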
Is it possible that multiple labels of the same query end up in the top-k during post-processing? How should this situation be handled?
Thank you for your help!
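Assuming the post-processor follows Deformable DETR and ranks all (query, class) pairs together, the same query can indeed appear several times in the top-k with different labels, because each selected flat index encodes both the query and the class. The sketch shows that decoding and, as one possible alternative (not the repo's behaviour), keeping only each query's best-scoring class before ranking. Both function names are hypothetical.

```python
import torch


def topk_flat(prob, k):
    """Rank all (query, class) pairs jointly; a query may be picked repeatedly.

    prob: (num_queries, num_classes) sigmoid scores for one image
    """
    num_classes = prob.shape[1]
    scores, flat_idx = prob.flatten().topk(k)
    query_idx = torch.div(flat_idx, num_classes, rounding_mode="floor")
    labels = flat_idx % num_classes
    return scores, query_idx, labels


def topk_one_label_per_query(prob, k):
    """Alternative: keep each query's best class, then rank the queries."""
    best_scores, best_labels = prob.max(dim=1)
    scores, query_idx = best_scores.topk(min(k, prob.shape[0]))
    return scores, query_idx, best_labels[query_idx]


prob = torch.tensor([[0.9, 0.8, 0.1],
                     [0.2, 0.3, 0.7]])
s, q, l = topk_flat(prob, 3)  # query 0 is selected twice, with labels 0 and 1
```

Duplicates of this kind are usually harmless for COCO AP, since a lower-scored second label mostly costs precision at low scores; the per-query variant trades that for possibly missing a genuinely ambiguous object.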
Dear Shilong,
Thanks again for the amazing work! Your code supports Swin Transformer as the backbone, but I could not find its ImageNet pre-trained weights. Could you please kindly provide them?
Thank you in advance!
https://github.com/IDEA-opensource/DAB-DETR/blob/309f6ad92af7a62d7732c1bdf1e0c7a69a7bdaef/models/DAB_DETR/transformer.py#L446
Here, the last dimension of tgt2 is 2*256, while the last dimension of tgt is 256. Won't this cause a mismatch error?
In the paper, a section says that the optimal temperature for the positional embedding in your model is 20. However, this line in gen_sineembed_for_position indicates that a temperature of 10000 is used. Is there something I missed in understanding the code?
Besides, the paper also says that only the x and y coordinates are used to generate the positional embedding for the cross-attention, but this line, despite the comment num_queries x batch_size x 2, actually operates on a num_queries x batch_size x 4 tensor when I print its shape. Does this perform better than using only x and y, or is the performance similar?
Instead of 42.2, I got 43.1 AP on the default setup with your pre-trained weights. That's a big surprise! :) May I know the reason?
When I try to compile the CUDA operator, I get these errors:
C:/Users/Ali/DAB-DETR-main/models/dab_deformable_detr/ops/src\cuda/ms_deform_im2col_cuda.cuh(97): error: identifier "__floorf" is undefined in device code
C:/Users/Ali/DAB-DETR-main/models/dab_deformable_detr/ops/src\cuda/ms_deform_im2col_cuda.cuh(98): error: calling a host function("__floorf") from a device function("ms_deform_attn_col2im_bilinear ") is not allowed
C:/Users/Ali/DAB-DETR-main/models/dab_deformable_detr/ops/src\cuda/ms_deform_im2col_cuda.cuh(98): error: identifier "__floorf" is undefined in device code
C:/Users/Ali/DAB-DETR-main/models/dab_deformable_detr/ops/src\cuda/ms_deform_im2col_cuda.cuh(172): error: calling a host function("__floorf") from a device function("ms_deform_attn_col2im_bilinear_gm ") is not allowed
C:/Users/Ali/DAB-DETR-main/models/dab_deformable_detr/ops/src\cuda/ms_deform_im2col_cuda.cuh(172): error: identifier "__floorf" is undefined in device code
C:/Users/Ali/DAB-DETR-main/models/dab_deformable_detr/ops/src\cuda/ms_deform_im2col_cuda.cuh(173): error: calling a host function("__floorf") from a device function("ms_deform_attn_col2im_bilinear_gm ") is not allowed
C:/Users/Ali/DAB-DETR-main/models/dab_deformable_detr/ops/src\cuda/ms_deform_im2col_cuda.cuh(173): error: identifier "__floorf" is undefined in device code
12 errors detected in the compilation of "C:/Users/Ali/AppData/Local/Temp/tmpxft_00002e2c_00000000-7_ms_deform_attn_cuda.cpp1.ii".
error: command 'C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\bin\nvcc.exe' failed with exit code 1
Any suggestions?
Thank you for your excellent work! After reading the DAB-DETR decoder code, I have a question about the reference points.
The decoder code contains "reference_points = new_reference_points.detach()". I am confused about why the detach operator is used. In my understanding, with detach the gradient won't be backpropagated to the reference embedding; the gradient is cut off.
Looking forward to your reply! Thank you!
Hi! Thanks for your excellent work. I'm wondering how to evaluate the FLOPs of the DAB-DETR model.
I can't directly use the DETR script, which raises an AssertionError in jit_handles.py.
facebookresearch/detr#110
Could you please share your Python script?
Hi~ Thanks for your excellent work! I'm confused about one operation in the attention weight calculation.
The attention implementation contains a small modification that I have not found in the paper.
The code is:
# previous choice of conditional detr and nn.MultiheadAttention
attn_output_weights = softmax(attn_output_weights, dim=-1)
# DAB-DETR modified this line:
attn_output_weights = softmax(attn_output_weights - attn_output_weights.max(dim=-1, keepdim=True)[0], dim=-1)
Does this procedure come from some previous work that I have not read?
Will doing this improve the performance?
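Softmax is invariant to subtracting the same constant from every logit in a row, because the factor exp(-c) cancels between numerator and denominator, so the modified line should not change the result mathematically. The usual motivation for the subtraction is numerical stability: it keeps exp() from overflowing when logits are large (especially in half precision). Whether it changes accuracy in practice is a question for the authors; the snippet below only checks the math with a hand-written softmax.

```python
import torch


def naive_softmax(logits, dim=-1):
    exp = logits.exp()
    return exp / exp.sum(dim=dim, keepdim=True)


def stable_softmax(logits, dim=-1):
    """Subtract the row-wise max before exponentiating: same result, no overflow."""
    shifted = logits - logits.max(dim=dim, keepdim=True)[0]
    exp = shifted.exp()
    return exp / exp.sum(dim=dim, keepdim=True)


small = torch.tensor([[1.0, 2.0, 3.0]])
large = small + 100.0  # exp(101..103) exceeds float32's max (about 3.4e38)
```

Here naive_softmax(large) yields NaN (inf / inf), while stable_softmax(large) equals stable_softmax(small).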
Hello,
Thanks for the great work.
I wonder whether the architecture changes of DAB-Deformable-DETR, such as the two-stage structure (mixed query selection) described in the DINO-DETR paper, will also be released in this repo or only on the DINO-DETR page.
An additional question: will the DINO-DETR GitHub repo be based on this repo, or will there be drastic changes? If we start from this repo, will it be easy to port the DINO training and model modifications into it once the DINO page is released?
Thank you so much in advance for your responses.
I'm following the installation guide. When running test.py in step 4, I got RuntimeError: CUDA out of memory. Is it okay to proceed (using a smaller batch size for training or inference), or will this affect the performance?
* True check_forward_equal_with_pytorch_double: max_abs_err 8.67e-19 max_rel_err 2.35e-16
* True check_forward_equal_with_pytorch_float: max_abs_err 4.66e-10 max_rel_err 1.13e-07
* True check_gradient_numerical(D=30)
* True check_gradient_numerical(D=32)
* True check_gradient_numerical(D=64)
* True check_gradient_numerical(D=71)
* True check_gradient_numerical(D=1025)
Traceback (most recent call last):
File "/home/azureuser/WilliamJustin/DAB-DETR/models/dab_deformable_detr/ops/test.py", line 86, in <module>
check_gradient_numerical(channels, True, True, True)
File "/home/azureuser/WilliamJustin/DAB-DETR/models/dab_deformable_detr/ops/test.py", line 76, in check_gradient_numerical
gradok = gradcheck(func, (value.double(), shapes, level_start_index, sampling_locations.double(), attention_weights.double(), im2col_step))
File "/home/azureuser/miniconda3/envs/jstnxu-DAB-DETR/lib/python3.9/site-packages/torch/autograd/gradcheck.py", line 1400, in gradcheck
return _gradcheck_helper(**args)
File "/home/azureuser/miniconda3/envs/jstnxu-DAB-DETR/lib/python3.9/site-packages/torch/autograd/gradcheck.py", line 1414, in _gradcheck_helper
_gradcheck_real_imag(gradcheck_fn, func, func_out, tupled_inputs, outputs, eps,
File "/home/azureuser/miniconda3/envs/jstnxu-DAB-DETR/lib/python3.9/site-packages/torch/autograd/gradcheck.py", line 1061, in _gradcheck_real_imag
gradcheck_fn(func, func_out, tupled_inputs, outputs, eps,
File "/home/azureuser/miniconda3/envs/jstnxu-DAB-DETR/lib/python3.9/site-packages/torch/autograd/gradcheck.py", line 1097, in _slow_gradcheck
numerical = _transpose(_get_numerical_jacobian(func, tupled_inputs, outputs, eps=eps, is_forward_ad=use_forward_ad))
File "/home/azureuser/miniconda3/envs/jstnxu-DAB-DETR/lib/python3.9/site-packages/torch/autograd/gradcheck.py", line 146, in _get_numerical_jacobian
jacobians += [get_numerical_jacobian_wrt_specific_input(fn, inp_idx, inputs, outputs, eps,
File "/home/azureuser/miniconda3/envs/jstnxu-DAB-DETR/lib/python3.9/site-packages/torch/autograd/gradcheck.py", line 290, in get_numerical_jacobian_wrt_specific_input
return _combine_jacobian_cols(jacobian_cols, outputs, input, input.numel())
File "/home/azureuser/miniconda3/envs/jstnxu-DAB-DETR/lib/python3.9/site-packages/torch/autograd/gradcheck.py", line 230, in _combine_jacobian_cols
jacobians = _allocate_jacobians_with_outputs(outputs, numel, dtype=input.dtype if input.dtype.is_complex else None)
File "/home/azureuser/miniconda3/envs/jstnxu-DAB-DETR/lib/python3.9/site-packages/torch/autograd/gradcheck.py", line 45, in _allocate_jacobians_with_outputs
out.append(t.new_zeros((numel_input, t.numel()), **options))
RuntimeError: CUDA out of memory. Tried to allocate 7.50 GiB (GPU 0; 15.75 GiB total capacity; 7.50 GiB already allocated; 7.30 GiB free; 7.50 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
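The size of the failed allocation can be reproduced by hand, which suggests the failure comes from the numerical gradient check itself rather than from the operator: gradcheck builds a dense float64 Jacobian of size numel(input) x numel(output). Assuming test.py uses the same toy shapes as Deformable DETR's operator test (N = 1, M = 2 heads, Lq = 2 queries, feature levels (6, 4) and (3, 2), so S = 30 positions), the Jacobian with respect to value for channels = 2048, the size after the passing D=1025, is exactly 7.50 GiB. The helper is a hypothetical back-of-the-envelope calculator.

```python
def value_jacobian_gib(channels, n=1, heads=2, num_queries=2,
                       level_shapes=((6, 4), (3, 2)), bytes_per_elem=8):
    """GiB needed by gradcheck's dense Jacobian w.r.t. `value` (float64)."""
    s = sum(h * w for h, w in level_shapes)            # spatial positions, all levels
    value_numel = n * s * heads * channels             # value: (N, S, M, D)
    output_numel = n * num_queries * heads * channels  # output: (N, Lq, M * D)
    return value_numel * output_numel * bytes_per_elem / 2**30


for d in (1025, 2048):
    print(d, round(value_jacobian_gib(d), 2))
```

If the shapes are as assumed, the OOM reflects only this stress test on a 16 GB card; the forward and smaller gradient checks above passed, and the test's memory use says little directly about training.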
The original code is:
https://github.com/IDEA-opensource/DAB-DETR/blob/309f6ad92af7a62d7732c1bdf1e0c7a69a7bdaef/models/dab_deformable_detr/deformable_transformer.py#L390
https://github.com/IDEA-opensource/DAB-DETR/blob/309f6ad92af7a62d7732c1bdf1e0c7a69a7bdaef/models/dab_deformable_detr/deformable_transformer.py#L391
https://github.com/IDEA-opensource/DAB-DETR/blob/309f6ad92af7a62d7732c1bdf1e0c7a69a7bdaef/models/dab_deformable_detr/deformable_transformer.py#L392
I think the code "reference_points[: , : , None]" should be "reference_points[ : , None , : ]", because the last dimension of "torch.cat([src_valid_ratios, src_valid_ratios], -1)[:, None]" has a length of 4 and means [w_ratio, h_ratio, w_ratio, h_ratio] for [x, y, w, h], so the last dimension of "reference_points" should be [x, y, w, h].
However, with "reference_points[: , : , None]", the last dimension will be [x,x,x,x], [y,y,y,y], [w,w,w,w] or [h,h,h,h] after broadcasting. Actually, in this case the last dimension of "reference_points_input" has a length of 4, but reference_points_input[...,0] = reference_points_input[...,2] and reference_points_input[...,1] = reference_points_input[...,3]. That means the result is [x * w_ratio, x * h_ratio, x * w_ratio, x * h_ratio].
But what we want should be [x * w_ratio, y * h_ratio, w * w_ratio, h * h_ratio].
Is there something wrong with my understanding?
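A quick way to settle this is to run the expression on small tensors. Assuming reference_points has shape (bs, nq, 4) holding (x, y, w, h) and src_valid_ratios has shape (bs, n_levels, 2) holding (w_ratio, h_ratio), as described above, the sketch compares the expression with an explicit loop that computes [x * w_ratio, y * h_ratio, w * w_ratio, h * h_ratio] per level.

```python
import torch

bs, nq, n_levels = 2, 3, 4
reference_points = torch.rand(bs, nq, 4)        # (x, y, w, h) per query
src_valid_ratios = torch.rand(bs, n_levels, 2)  # (w_ratio, h_ratio) per level

ratios = torch.cat([src_valid_ratios, src_valid_ratios], -1)[:, None]  # (bs, 1, L, 4)
expr = reference_points[:, :, None] * ratios                            # (bs, nq, L, 4)

# Explicit loop: each level gets [x*w_r, y*h_r, w*w_r, h*h_r]
loop = torch.empty(bs, nq, n_levels, 4)
for b in range(bs):
    for q in range(nq):
        for l in range(n_levels):
            loop[b, q, l] = reference_points[b, q] * ratios[b, 0, l]
```

On a 3-D tensor, [:, :, None] inserts the new axis before the coordinate axis, giving (bs, nq, 1, 4), so the coordinates stay on the last axis and are broadcast over levels; [:, None, :] would give (bs, 1, nq, 4), which does not line up with the level axis.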
First of all, thank you very much for the great work.
My question is:
A NotImplementedError is raised when executing "self.detr.input_proj(src)" in the DETRsegm class.
"self.detr.transformer(src_proj, mask, self.detr.query_embed.weight, pos[-1])"
In the code above, self.detr.query_embed is an attribute that is only initialized when use_dab=False.
I'm curious whether this is intended in the implementation.
Could you please convert your model to ONNX? I want to test inference with TensorRT.
I am trying to convert it to ONNX myself but get the following error:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0, and cpu! (when checking argument for argument index in method wrapper__index_select)
Thanks for the great contribution to the literature.
I have a quick question. Is there a specific reason for not using pattern embedding in DAB-Deformable-DETR? Is pattern embedding not applicable to the deformable structure? It might not increase the performance of DAB-DETR, but I haven't seen any ablation study on that.
Thanks in advance
Thanks for your great work.
After testing the code, how can I draw the bounding boxes on the result images?
Is there a command or code for this?
Can't two_stage and use_dab be set to True at the same time?
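A minimal sketch with Pillow, assuming you already have per-image boxes in the model's normalized (cx, cy, w, h) format together with their scores; the function names and the 0.5 score threshold are arbitrary choices for illustration, not part of the repo.

```python
from PIL import Image, ImageDraw


def cxcywh_to_xyxy_pixels(box, img_w, img_h):
    """Normalized (cx, cy, w, h) -> absolute (x0, y0, x1, y1) in pixels."""
    cx, cy, w, h = box
    return (round((cx - w / 2) * img_w), round((cy - h / 2) * img_h),
            round((cx + w / 2) * img_w), round((cy + h / 2) * img_h))


def draw_detections(image, boxes, scores, threshold=0.5, color=(255, 0, 0)):
    """Draw boxes whose score passes the threshold onto a copy of `image`."""
    out = image.convert("RGB")  # convert() returns a new image
    draw = ImageDraw.Draw(out)
    for box, score in zip(boxes, scores):
        if score < threshold:
            continue
        x0, y0, x1, y1 = cxcywh_to_xyxy_pixels(box, *out.size)
        draw.rectangle([x0, y0, x1, y1], outline=color, width=2)
        draw.text((x0, max(0, y0 - 12)), f"{score:.2f}", fill=color)  # label above box
    return out


img = Image.new("RGB", (100, 100))
vis = draw_detections(img, [(0.5, 0.5, 0.2, 0.2), (0.1, 0.1, 0.1, 0.1)], [0.9, 0.1])
# vis.save("result.jpg") writes the annotated image
```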
if self.two_stage: ...
elif self.use_dab: ...
Thank you for your excellent work.
I would like to know whether you used the two-stage strategy when training DAB-Deformable-DETR. If so, does it give DAB-Deformable-DETR a performance boost?
When you visualize Figure 7 (positional attention maps with different temperatures), why don't the maps show any periodicity?
Since the sinusoidal embedding uses sin/cos functions, shouldn't there be some repeated patterns?
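One way to look at this, offered as an explanation rather than the authors' answer: for interleaved sin/cos pairs, PE(x)·PE(x_ref) collapses to a sum of cosines, sum_i cos(2*pi*(x - x_ref) / T^(2i/d)), one term per frequency. Each term is periodic, but the periods differ and are generally incommensurate, so inside the image the sum has a single dominant peak at x = x_ref instead of a visibly repeating pattern. The sketch evaluates that sum for normalized offsets in [-1, 1], assuming d = 128 features per coordinate and a 2*pi scale as in DETR-style embeddings.

```python
import math

import torch


def positional_similarity(delta, temperature, num_feats=128):
    """PE(x) . PE(x_ref) as a function of delta = x - x_ref (normalized coords).

    Each sin/cos pair sharing a frequency contributes cos(2*pi*delta / dim_t).
    """
    i = torch.arange(num_feats // 2, dtype=torch.float64)
    dim_t = temperature ** (2 * i / num_feats)
    return torch.cos(2 * math.pi * delta[:, None] / dim_t).sum(dim=-1)


delta = torch.arange(-100, 101, dtype=torch.float64) / 100  # offsets -1.00 .. 1.00
for t in (20.0, 10000.0):
    sim = positional_similarity(delta, t)
    print(t, sim.argmax().item(), sim[100].item())  # index 100 is delta = 0
```

For both temperatures the maximum (64, one per frequency) occurs only at zero offset; the temperature changes how quickly the similarity falls off around it, which is what the figure compares.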
Thanks for your great and detailed code!
In dab_deformable_detr.py, is this part about dealing with an insufficient number of feature channels (for Deformable DETR)?
It seems that the last channel of the features is fed into backbone[1] repeatedly (channel? or something else?).
Could you please help me understand that?
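For context, in Deformable DETR, which DAB-Deformable-DETR builds on, the backbone returns fewer feature maps than num_feature_levels, so the missing levels are created with a stride-2 3x3 convolution: the first extra level from the last backbone map, each further one from the previous extra level. The padding mask is resized to each new level, and the position encoding (backbone[1] in the code under discussion) is computed per level. The sketch reproduces that pattern with made-up channel sizes; the function name is hypothetical and the position-encoding call is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def make_extra_levels(last_feature, image_mask, num_extra, in_channels, hidden_dim):
    """Create extra pyramid levels with stride-2 3x3 convs (Deformable-DETR style).

    last_feature: (B, in_channels, H, W) last backbone output
    image_mask:   (B, H_img, W_img) bool padding mask of the input images
    """
    projs = nn.ModuleList()
    for level in range(num_extra):
        c_in = in_channels if level == 0 else hidden_dim
        projs.append(nn.Sequential(
            nn.Conv2d(c_in, hidden_dim, kernel_size=3, stride=2, padding=1),
            nn.GroupNorm(32, hidden_dim),
        ))
    srcs, masks = [], []
    x = last_feature
    for proj in projs:
        x = proj(x)  # first: backbone map; later: previous extra level
        mask = F.interpolate(image_mask[None].float(), size=x.shape[-2:]).to(torch.bool)[0]
        srcs.append(x)
        masks.append(mask)  # a position encoding would be computed from (x, mask) here
    return srcs, masks


feat = torch.randn(2, 64, 8, 8)
img_mask = torch.zeros(2, 64, 64, dtype=torch.bool)
srcs, masks = make_extra_levels(feat, img_mask, num_extra=2, in_channels=64, hidden_dim=32)
```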
It seems that 'engine.py' doesn't include the test function.
Hi, I'd like to reproduce DAB-DETR, and I have two questions about some technical details of DAB-DETR.
i. How do you initialize the learnable anchor boxes (results in Table 2)? Why not use the setting in Table 8 (random initialization and fixing them in the first decoder layer) as the default?
ii. I am confused about modulated positional attention in Section 4.4. Is it an improvement on "conditional cross attention" in Conditional DETR (split cross attention into two parts, content and spatial dot-products)? Does the proposed modulated positional attention add referenced w into spatial dot-products?
Thanks for your great work.
However, after reading the code, I'm confused about the setting of dropout.
The dropout rates used in the encoder and decoder layers of DAB-DETR and DAB-Deformable-DETR are set to 0.0 by default.
https://github.com/IDEA-opensource/DAB-DETR/blob/2a096e2d59fc804b20dd6da78b504654647107c7/main.py#L79-L80
https://github.com/IDEA-opensource/DAB-DETR/blob/2a096e2d59fc804b20dd6da78b504654647107c7/models/DAB_DETR/transformer.py#L459-L462
https://github.com/IDEA-opensource/DAB-DETR/blob/2a096e2d59fc804b20dd6da78b504654647107c7/models/dab_deformable_detr/deformable_transformer.py#L449-L456
DETR and Deformable-DETR use 0.1 by default. Does this mean that training DAB-DETR or DAB-Deformable-DETR without dropout gives better performance?
Can the inference test script be released?
Also, could you release a script that outputs results in JSON format?
Thanks.
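Until an official script exists, detections can be written in the standard COCO results format: a JSON list of {image_id, category_id, bbox, score} entries with bbox as [x, y, width, height] in absolute pixels. The sketch assumes each image's post-processed output is a dict with 'scores', 'labels' and absolute xyxy 'boxes' tensors, as DETR-style PostProcess modules return, and that labels are already COCO category ids; the function name is hypothetical.

```python
import json

import torch


def to_coco_results(outputs_by_image):
    """outputs_by_image: {image_id: {"scores": (K,), "labels": (K,), "boxes": (K, 4) xyxy}}"""
    results = []
    for image_id, out in outputs_by_image.items():
        boxes = out["boxes"].clone()
        boxes[:, 2:] -= boxes[:, :2]  # xyxy -> xywh
        for box, score, label in zip(boxes.tolist(), out["scores"].tolist(), out["labels"].tolist()):
            results.append({
                "image_id": int(image_id),
                "category_id": int(label),
                "bbox": [round(v, 2) for v in box],
                "score": round(score, 4),
            })
    return results


# Usage sketch:
# with open("results.json", "w") as f:
#     json.dump(to_coco_results(all_outputs), f)
```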
batch_size = 1,
epochs = 50,
lr_drop = 40,
modelname = 'dab_detr',
num_workers = 6,
10% of the entire COCO 2017 dataset
So far, I've trained for 20 epochs, but the AP and AR results are the same in every epoch:
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.001
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.003
And the average statistics for the 20th epoch are:
class_error: 88.89 loss: 16.2750 (16.8940) loss_ce: 0.8517 (0.9079) loss_bbox: 0.7610 (0.8277) loss_giou: 1.1219 (1.0805) loss_ce_0: 0.8508 (0.9079) loss_bbox_0: 0.7550 (0.8284) loss_giou_0: 1.1278 (1.0790) loss_ce_1: 0.8519 (0.9078) loss_bbox_1: 0.7502 (0.8281) loss_giou_1: 1.1264 (1.0795) loss_ce_2: 0.8520 (0.9078) loss_bbox_2: 0.7614 (0.8283) loss_giou_2: 1.1250 (1.0794) loss_ce_3: 0.8514 (0.9080) loss_bbox_3: 0.7613 (0.8279) loss_giou_3: 1.1236 (1.0799) loss_ce_4: 0.8517 (0.9079) loss_bbox_4: 0.7611 (0.8278) loss_giou_4: 1.1225 (1.0802) loss_ce_unscaled: 0.8517 (0.9079) class_error_unscaled: 80.0000 (77.4109) loss_bbox_unscaled: 0.1522 (0.1655) loss_giou_unscaled: 0.5610 (0.5403) loss_xy_unscaled: 0.0560 (0.0604) loss_hw_unscaled: 0.0998 (0.1051) cardinality_error_unscaled: 293.0000 (293.0660) loss_ce_0_unscaled: 0.8508 (0.9079) loss_bbox_0_unscaled: 0.1510 (0.1657) loss_giou_0_unscaled: 0.5639 (0.5395) loss_xy_0_unscaled: 0.0561 (0.0605) loss_hw_0_unscaled: 0.1034 (0.1052) cardinality_error_0_unscaled: 293.0000 (293.0660) loss_ce_1_unscaled: 0.8519 (0.9078) loss_bbox_1_unscaled: 0.1500 (0.1656) loss_giou_1_unscaled: 0.5632 (0.5397) loss_xy_1_unscaled: 0.0560 (0.0605) loss_hw_1_unscaled: 0.1029 (0.1051) cardinality_error_1_unscaled: 293.0000 (293.0660) loss_ce_2_unscaled: 0.8520 (0.9078) loss_bbox_2_unscaled: 0.1523 (0.1657) loss_giou_2_unscaled: 0.5625 (0.5397) loss_xy_2_unscaled: 0.0560 (0.0605) loss_hw_2_unscaled: 0.1022 (0.1052) cardinality_error_2_unscaled: 293.0000 (293.0660) loss_ce_3_unscaled: 0.8514 (0.9080) loss_bbox_3_unscaled: 0.1523 (0.1656) loss_giou_3_unscaled: 0.5618 (0.5400) loss_xy_3_unscaled: 0.0560 (0.0604) loss_hw_3_unscaled: 0.1014 (0.1052) cardinality_error_3_unscaled: 293.0000 (293.0660) loss_ce_4_unscaled: 0.8517 (0.9079) loss_bbox_4_unscaled: 0.1522 (0.1656) loss_giou_4_unscaled: 0.5612 (0.5401) loss_xy_4_unscaled: 0.0560 (0.0604) loss_hw_4_unscaled: 0.1006 (0.1052) cardinality_error_4_unscaled: 293.0000 (293.0660)
So what could be the problem? Dataset? Training parameters? Or something else?
Thank you!