szq0214 / dsod
DSOD: Learning Deeply Supervised Object Detectors from Scratch. In ICCV 2017.
License: Other
Has anyone implemented another version, e.g. in MXNet?
Thanks
Hi @szq0214,
will you release the training code?
Thanks.
Hi,
Thank you for your work, it is great.
Some questions:
1. The pretrained model cannot detect small objects.
2. Is it better than RON?
Thank you
Hi,
I would like to train on my own dataset, which consists of only gray-level images. Could you tell me how I could adapt DSOD to work with only 1 channel? Thanks!
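A hedged sketch of the usual adaptation, in case it helps: rebuild the lmdb from grayscale images (convert_annoset inherits a gray flag from convert_imageset, if this repo's copy keeps it), and give the data layer a single mean_value. Since DSOD trains from scratch and Caffe's conv layers infer their input channel count from the data blob, the 1-channel data layer should be the only structural change. All names and values below are assumptions, not taken from this repo:

from caffe import layers as L, params as P

data, label = L.AnnotatedData(
    ntop=2,
    data_param=dict(source="my_gray_trainval_lmdb",   # path assumed
                    batch_size=32, backend=P.Data.LMDB),
    transform_param=dict(mean_value=[117]),           # ONE value for 1 channel
    annotated_data_param=dict(label_map_file="labelmap.prototxt"))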
I am getting the following message when running the training command
python examples/dsod/DSOD300_pascal.py
32554 detection_output_layer.cu:113] Couldn't find any detections
32554 detection_output_layer.cu:113] Couldn't find any detections
32554 detection_output_layer.cu:113] Couldn't find any detections
32554 detection_output_layer.cu:113] Couldn't find any detections
32554 detection_output_layer.cu:113] Couldn't find any detections
Hey guys!
Is it also possible to do video detection with this model, as in Wei Liu's SSD implementation?
Best wishes
I use my own data, but it reports: Check failed: mean_values_.size() == 1 || mean_values_.size() == img_channels Specify either 1 mean_value or as many as channels: 1
Could you help me?
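A hedged reading of that check: Caffe's DataTransformer requires the number of mean_value entries to be either 1 or equal to the image channel count, and your lmdb apparently holds 1-channel images while the prototxt keeps the three VOC means. A sketch of the change needed in every transform_param (the single value is a placeholder, not a recommendation):

# Before (3-channel VOC defaults):
transform_param = dict(mean_value=[104, 117, 123])
# After, for 1-channel data:
transform_param = dict(mean_value=[117])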
Thanks for sharing the code. I want to re-implement this net in another framework, but the definition of the pooling layer differs from Caffe's.
In Caffe, the output size of a pooling layer is computed with a ceil function, as reflected in most of your code. But for the final layers, I don't understand why it seems to become a floor function.
I mean the progression should be
300x300→150x150→75x75→38x38→19x19→10x10→5x5→3x3→2x2
But in your code:
model2 = add_bl_layer2(model1, 256, dropout, 1) # pooling4: 10x10
net.Third = model2
model3 = add_bl_layer2(model2, 128, dropout, 1) # pooling5: 5x5
net.Fourth = model3
model4 = add_bl_layer2(model3, 128, dropout, 1) # pooling6: 3x3
net.Fifth = model4
model5 = add_bl_layer2(model4, 128, dropout, 1) # pooling7: 1x1
I don't know why 3x3→1x1. Could you give me some suggestions?
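A hedged worked example, in case it helps: as far as I can tell from Caffe's pooling_layer.cpp and conv_layer.cpp, pooling rounds up (ceil) while convolution rounds down (floor), and at a 3x3 input both the 2x2/stride-2 pooling branch and a 3x3/pad-1/stride-2 conv branch land on 2x2. So the "# pooling7: 1x1" comment appears to be simply wrong (see also the 2x2 final feature map reported in another issue below), rather than the rounding rule changing:

import math

def caffe_pool_out(i, k, s, p=0):
    # PoolingLayer rounds up
    return int(math.ceil(float(i + 2 * p - k) / s)) + 1

def caffe_conv_out(i, k, s, p=0):
    # ConvolutionLayer rounds down
    return (i + 2 * p - k) // s + 1

print(caffe_pool_out(3, k=2, s=2))       # -> 2, not 1
print(caffe_conv_out(3, k=3, s=2, p=1))  # -> 2 as well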
Hi, @szq0214:
I only have two GTX 1080 GPUs and want to reproduce your GRP-DSOD. When I change batch_size and accum_batch_size to 6 and 30, the mAP is just 63%. What should I do to get the results from your paper?
Thanks.
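Not an official answer, but two hedged things to check when shrinking the batch: the learning rate usually has to scale down with the effective batch size, and gradient accumulation does not help BatchNorm, which still only sees the 6 samples per iteration (DSOD leans heavily on BN because it trains from scratch). A sketch of the linear-scaling heuristic, with the reference values assumed rather than read from this repo:

# Linear LR scaling heuristic (reference values are assumptions):
ref_batch, ref_lr = 128, 0.1
my_effective_batch = 30              # your accum_batch_size
base_lr = ref_lr * my_effective_batch / float(ref_batch)
print(base_lr)                       # ~0.023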
@szq0214
What is the difference between DSOD300_pascal and DSOD300_pascal++?
The last pooling size is inconsistent with the paper.
In the paper:
the size of the pooling7 feature map = 1×1
In model_libs.py:
the size of the pooling7 feature map = 2×2
Also, the DSOD prediction layers differ from Figure 1 in the paper.
Hi, I was trying to fine-tune a pre-trained model on my dataset, and I need to change the number of classes from 21 to 2, so I planned to modify the Python script rather than the prototxt files directly. But the model the Python script creates is about 28.6M, which differs in size from every model offered in this repository. If I want to train a model with 2 classes, do I have to train it without a pre-trained model?
Many thanks!
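In case it helps, a hedged sketch of where the class count usually enters an SSD-style training script like this one (the helper name is from stock SSD and may differ here). The smaller file size is expected, since the confidence branches shrink with num_classes:

num_classes = 2   # background + your 1 object class

# The conf layers size themselves from num_classes, so the MultiBox head
# must be rebuilt with the new value (stock-SSD helper, assumed present):
# mbox_layers = CreateMultiBoxHead(net, num_classes=num_classes, ...)

# The loss and detection-output params must match as well:
multibox_loss_param = dict(num_classes=num_classes, background_label_id=0)
det_out_param = dict(num_classes=num_classes, background_label_id=0)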
First of all, the idea of training from scratch is awesome. I have a question: have you tried large input images (600×1000)?
Hi
Just wondering how you produced
VOC0712Plus_test_lmdb
We can download the images from the official website, but not the VOC12 test annotations.
How did you compute the VOC12 mAP outside the evaluation platform without annotations?
Maybe you could provide GRP-DSOD pretrained models like DSOD300.
I want to implement DSOD in TensorFlow. I tried to write it but made some mistakes. Are there any TensorFlow versions?
Thanks
I have downloaded the DSOD_voc+coco model and modified the corresponding prototxt according to the video test in the SSD project. While it works well in the SSD project, the test fails when setting up the DSOD network, throwing the following error:
F0922 17:37:27.110465 13992 bbox_util.cpp:2197] Check failed: label < colors.size() (2 vs. 0)
*** Check failure stack trace: ***
@ 0x7f6b48c805cd google::LogMessage::Fail()
@ 0x7f6b48c82433 google::LogMessage::SendToLog()
@ 0x7f6b48c8015b google::LogMessage::Flush()
@ 0x7f6b48c82e1e google::LogMessageFatal::~LogMessageFatal()
@ 0x7f6b493cb414 caffe::VisualizeBBox<>()
@ 0x7f6b49765844 caffe::DetectionOutputLayer<>::Forward_gpu()
@ 0x7f6b494689e1 caffe::Net<>::ForwardFromTo()
@ 0x7f6b49468ad7 caffe::Net<>::Forward()
@ 0x4199a3 test()
@ 0x415aa5 main
@ 0x7f6b47688830 __libc_start_main
@ 0x416679 _start
@ (nil) (unknown)
And here is the modified part of the DSOD prototxt (I mainly modified the input layer and the detection output layer according to the SSD settings).
The input layer is:
layer {
  name: "data"
  type: "VideoData"
  top: "data"
  transform_param {
    mean_value: 104
    mean_value: 117
    mean_value: 123
    resize_param {
      prob: 1
      resize_mode: WARP
      height: 300
      width: 300
      interp_mode: LINEAR
    }
  }
  data_param {
    batch_size: 1
  }
  video_data_param {
    video_type: VIDEO
    video_file: "examples/videos/ILSVRC2015_train_00755001.mp4"
    skip_frames: 1
  }
}
And the detection layer is:
layer {
  name: "detection_out"
  type: "DetectionOutput"
  bottom: "mbox_loc"
  bottom: "mbox_conf_flatten"
  bottom: "mbox_priorbox"
  bottom: "data"
  top: "detection_out"
  include {
    phase: TEST
  }
  transform_param {
    mean_value: 104
    mean_value: 117
    mean_value: 123
    resize_param {
      prob: 1
      resize_mode: WARP
      height: 576
      width: 1024
      interp_mode: LINEAR
    }
  }
  detection_output_param {
    num_classes: 21
    share_location: true
    background_label_id: 0
    nms_param {
      nms_threshold: 0.449999988079
      top_k: 400
    }
    save_output_param {
      output_directory: "data/VOC0712/dsod_labelmap_voc.prototxt"
    }
    code_type: CENTER_SIZE
    keep_top_k: 200
    confidence_threshold: 0.00999999977648
    visualize: true
    visualize_threshold: 0.3
  }
}
Interestingly, when I disable visualization by setting visualize: false, the network runs fine, but I can't tell whether the results are right without the visualized video. Has anyone met the same problem, and how did you deal with it?
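A hedged guess from reading the pasted prototxt: in SSD's DetectionOutputLayer the visualization colors are generated from the label map, and the failed check (2 vs 0) says that color list is empty. Your save_output_param points output_directory at a labelmap file, so perhaps label_map_file was intended; expressed as the equivalent net-spec dict (field name from stock SSD):

# save_output_param as it probably should read (hedged):
save_output_param = dict(
    label_map_file="data/VOC0712/dsod_labelmap_voc.prototxt")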
I can only train with a batch size of 6 on my single TITAN X (Pascal) without running out of memory. So what is your trick for overcoming the GPU memory constraints in the paper?
Thank you~~
Hi,
Recently, I tried to train a DSOD512 version that follows the original SSD512 settings except for the DSOD backbone.
But the accuracy was not as good as DSOD300.
Have you tried training DSOD512?
Thanks :)
Hi,
I want to know how you measured the inference time.
Did you use the caffe time tool, or did you measure the total time over the 4952 VOC test images?
Thanks in advance :)
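For what it's worth, a minimal pycaffe timing sketch (file names assumed), as an alternative to the caffe time CLI tool; it measures forward passes only, after a warm-up:

import time
import caffe

caffe.set_mode_gpu()
net = caffe.Net("DSOD300_deploy.prototxt", "DSOD300.caffemodel", caffe.TEST)
net.forward()                                  # warm-up
n = 100
t0 = time.time()
for _ in range(n):
    net.forward()
print("ms per forward:", (time.time() - t0) / n * 1000)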
Could you provide a Dockerfile? Does anyone have a Dockerfile?
When I'm training a DSOD model on VOC 07+12 with python examples/dsod/DSOD300_pascal.py, I encounter:
Traceback (most recent call last):
File "examples/dsod/DSOD300_pascal.py", line 380, in <module>
DSOD300_V3_Body(net, from_layer='data')
NameError: name 'DSOD300_V3_Body' is not defined
What should I do to deal with it? Thank you~
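A hedged guess: this NameError usually means the script resolved a model_libs.py that predates DSOD (e.g. stock SSD's copy on PYTHONPATH) instead of the one patched in this repo. A quick check:

# Verify which model_libs the interpreter picks up and whether it carries
# the DSOD body builder named in the traceback:
import model_libs
print(model_libs.__file__)                      # should point into this repo
print(hasattr(model_libs, "DSOD300_V3_Body"))   # False -> stale/stock copy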
I am getting the following message when running the training command
python examples/dsod/DSOD300_pascal.py
I0312 18:13:06.186707 31109 layer_factory.hpp:77] Creating layer data
I0312 18:13:06.186813 31109 net.cpp:100] Creating Layer data
I0312 18:13:06.186830 31109 net.cpp:408] data -> data
I0312 18:13:06.186846 31109 net.cpp:408] data -> label
F0312 18:13:06.189031 31210 db_lmdb.hpp:15] Check failed: mdb_status == 0 (2 vs. 0) No such file or directory
*** Check failure stack trace: ***
@ 0x7f8d26f785cd google::LogMessage::Fail()
@ 0x7f8d26f7a433 google::LogMessage::SendToLog()
@ 0x7f8d26f7815b google::LogMessage::Flush()
@ 0x7f8d26f7ae1e google::LogMessageFatal::~LogMessageFatal()
@ 0x7f8d2784b770 caffe::db::LMDB::Open()
@ 0x7f8d2768a396 caffe::DataReader<>::Body::InternalThreadEntry()
@ 0x7f8d2767c465 caffe::InternalThread::entry()
@ 0x7f8d1cc4f5d5 (unknown)
@ 0x7f8d159ee6ba start_thread
@ 0x7f8d25fcf3dd clone
@ (nil) (unknown)
Aborted (core dumped)
Any help would be appreciated. All steps before running the training command were successful.
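The trace points at caffe::db::LMDB::Open(), so the lmdb paths written into the generated prototxt most likely don't exist yet; in the SSD-style workflow they are produced by data/VOC0712/create_data.sh. A hedged sanity check, with the default paths assumed:

import os

# Default SSD-style lmdb locations (paths are assumptions):
for p in ["examples/VOC0712/VOC0712_trainval_lmdb",
          "examples/VOC0712/VOC0712_test_lmdb"]:
    print(p, "ok" if os.path.isdir(p)
          else "MISSING -> run data/VOC0712/create_data.sh")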
@szq0214 @liuzhuang13 When I test on video, DSOD300 occupies about 1500 MB of GPU memory, just like SSD300 with ResNet-101. I have tried the official memory-optimized version of DenseNet, and its GPU memory footprint is not particularly heavy. Is this a problem of the implementation lacking those memory optimizations?
Hi,
I tried to train DSOD300 using this file on 8 TITAN Xp GPUs.
But the result is 76.94%, which is lower than the reported 77.7%.
I couldn't find out why this happens.
Has anyone else faced this problem?
How long is your training time on one TITAN X GPU, or on 8 GPUs?
Method    Data   Pre-train  Backbone  Prediction  FPS  # Params  Input    mAP
SSD300S†  07+12  ✗          VGGNet    Plain       46   26.3M     300×300  69.6
SSD300S†  07+12  ✗          VGGNet    Dense       37   26.0M     300×300  70.4
In Table 4 of your paper, Dense-SSD seems to have no advantage over VGG-SSD: similar precision but slower.
Thanks for sharing the GRP-DSOD code. Reading it, I find that the result of the ReLU isn't used in this part:
def global_level(net, from_layer, relu_name):
    fc = L.InnerProduct(net[relu_name], num_output=1)
    sigmoid = L.Sigmoid(fc, in_place=True)
    att_name = "{}_att".format(from_layer)
    sigmoid = L.Reshape(sigmoid, reshape_param=dict(shape=dict(dim=[-1])))
    scale = L.Scale(net[att_name], sigmoid, axis=0, bias_term=False, bias_filler=dict(value=0))
    relu = L.ReLU(scale, in_place=True)
    residual = L.Eltwise(net[from_layer], scale)
    gatt_name = "{}_gate".format(from_layer)
    net[gatt_name] = residual
    return net
relu = L.ReLU(scale, in_place=True)
Is this a mistake, or was it intentionally discarded?
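If I read caffe's NetSpec serialization right, this is stronger than an unused variable: to_proto() only emits layers reachable via bottoms from tops that were assigned to the net, and this in-place ReLU's top is never assigned nor consumed, so the layer should be missing from the generated prototxt entirely, i.e. the residual gate uses the pre-ReLU scale. A small hedged check:

# Hedged check that an unassigned, unconsumed in-place layer is dropped
# by NetSpec's to_proto (pycaffe from the SSD fork assumed):
import caffe
from caffe import layers as L

n = caffe.NetSpec()
n.data = L.Input(input_param=dict(shape=dict(dim=[1, 3, 8, 8])))
scale = L.Power(n.data, power=1.0)
relu = L.ReLU(scale, in_place=True)   # top never reaches an assigned blob
n.out = L.Eltwise(n.data, scale)
print("ReLU" in str(n.to_proto()))    # expect False: dead layer dropped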
Hi, @szq0214. Sorry for bothering you again. Can you tell me what I should change to test on VOC2012? The default is 2007.
Hello,
First of all I want to say thank you for releasing the code.
Can you please tell me if I can train with my own custom dataset?
Because it is not clear to me.
Thank you
Or do you mean that when training from scratch, performance is much worse than with fine-tuning?
And thanks for sharing the code.
I want to know how to prepare the VOC12 test lmdb to run training on the VOC 07++12 dataset. Can anyone help me? Thanks a lot.
I copied all my test files to one path, and I want to batch-test these image files and obtain the detection results for them. Could you tell me a method?
It seems your model graph is inconsistent with the paper (Table 1, Output Size) for the Transition w/o Pooling Layers (1) and (2).
In the paper:
Transition w/o Pooling Layer (1) channel = 1120
Transition w/o Pooling Layer (2) channel = 1568
In the model graph:
Convolution49 num output = 1184
Convolution66 num output = 256
Also, I don't quite understand the purpose of Transition w/o Pooling Layer (1): you neither compress nor expand its filter count (num input = num output), and you don't branch it out for prediction. By removing it (Convolution49 + BN50 + ReLU50) you would have one compact Dense Block (3+4) with 8 × 2 = 16 dense layers. So what is the reason to explicitly inject such an extra (BN+ReLU+1x1 Conv) block in between?
When I run DSOD300_pascal.py, I get an error: No module named model_libs. What can I do?
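A hedged fix, assuming the repo follows the SSD layout where model_libs.py ships alongside the scripts: make its directory importable before the import line (adjust the path to wherever this repo actually keeps model_libs.py):

import os
import sys

# Path is an assumption; point it at the directory holding model_libs.py:
sys.path.insert(0, os.path.join(os.getcwd(), "examples/dsod"))
from model_libs import *  # noqa: F401,F403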
@szq0214
Hi!
I trained DSOD on my own dataset, but the training time is 10× that of SSD.
Why is DSOD so slow?
Hi everyone,
How long does DSOD take to train when I have 20,000 images of 1 class?
My GPU is a Quadro P4000 (compute capability 6.1).
Is there any Python script to run a detection test on one input image using one of your pretrained models?
Thanks,
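Not from this repo, but a hedged single-image sketch along the lines of stock SSD's ssd_detect example; the deploy/weights file names are assumptions:

import numpy as np
import caffe

caffe.set_mode_gpu()
net = caffe.Net("DSOD300_deploy.prototxt",      # name assumed
                "DSOD300_VOC0712.caffemodel",   # name assumed
                caffe.TEST)

# Standard SSD-style preprocessing: BGR order, mean-subtracted, 300x300
transformer = caffe.io.Transformer({"data": net.blobs["data"].data.shape})
transformer.set_transpose("data", (2, 0, 1))
transformer.set_mean("data", np.array([104, 117, 123]))
transformer.set_raw_scale("data", 255)
transformer.set_channel_swap("data", (2, 1, 0))

image = caffe.io.load_image("test.jpg")
net.blobs["data"].data[...] = transformer.preprocess("data", image)
# Each row: [image_id, label, confidence, xmin, ymin, xmax, ymax] (normalized)
for det in net.forward()["detection_out"][0, 0]:
    if det[2] >= 0.5:
        print(int(det[1]), float(det[2]), det[3:7])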
When I run python DSOD300_pascal.py, I get many messages like:
I0809 19:00:06.018213 8332 detection_output_layer.cu:113] Couldn't find any detections.
What should I do?
Hi, with your changes to the SSD model, the last layer has a 2×2 spatial size, not 1×1 anymore. This stems from the fact that the last 3×3×128 conv layer has padding 1, and the parallel pooling branch, with kernel size 2, will also output a 2×2 feature instead of 1×1. You can double-check this by reading Caffe's conv_layer.cpp:
const int output_dim = (input_dim + 2 * pad_data[i] - kernel_extent) / stride_data[i] + 1;
output_dim = (3 + 2 * 1 - 3) / 2 + 1 = 2 / 2 + 1 = 1 + 1 = 2
Also, Caffe's output reflects this:
I1023 13:46:00.587738 43 net.cpp:100] Creating Layer Sixth
I1023 13:46:00.587746 43 net.cpp:434] Sixth <- Convolution77
I1023 13:46:00.587751 43 net.cpp:434] Sixth <- Convolution79
I1023 13:46:00.587757 43 net.cpp:408] Sixth -> Sixth
I1023 13:46:00.587786 43 net.cpp:150] Setting up Sixth
I1023 13:46:00.587792 43 net.cpp:157] Top shape: 2 256 2 2 (2048)
Given this, I think the step size in the Sixth_norm_mbox_priorbox should be 150 (= 300/2) instead of 300 (=300/1).
EDIT: I should also point out that I have made NO modification whatsoever to the source code.