Comments (16)
As a result:
- There is no simple way to train Yolo at a resolution larger than 416x416, but you can use a .weights file trained at 416x416 to detect at a larger resolution:
1.1. Yolo automatically resizes any image to 416x416, which makes it impossible to detect small objects.
1.2. If you want to detect small objects on 832x832 or 1088x1088 images, simply use the .weights file trained at 416x416 and change these lines in your .cfg file (Line 4 in 76dbdae):
From:
subdivisions=8
height=416
width=416
To:
subdivisions=64
height=1088
width=1088
In detail:
- Yolo v2 has no side parameter, but it has the parameter num=5 (number of anchors): Line 232 in 76dbdae
- and the output of the network is filters = (classes + coords + 1)*num: Line 224 in 76dbdae
- In the new Yolo v2 the number of sides is determined by the network resolution, the number of layers and their strides/steps.
- "YOLO's convolutional layers downsample the image by a factor of 32 so by using an input image of 416 we get an output feature map of 13 × 13." https://arxiv.org/pdf/1612.08242v1.pdf
So sides will automatically increase from 13 to 34. (Changing subdivisions is only necessary to reduce GPU-RAM consumption.)
On the image below:
- left: cfg-file (classes=6, num=5, filters=55, subdivision=8, width=416, height=416); as you can see, the output layer is 13x13x55
- right: cfg-file (classes=6, num=5, filters=55, subdivision=64, width=1088, height=1088); as you can see, the output layer is 34x34x55
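The two output shapes above can be reproduced with a little arithmetic. This is only a minimal sketch: the function name yolo_v2_output_shape is made up for illustration, and it assumes a total downsampling factor of 32 and coords=4, as in the figures quoted in this comment.

```python
def yolo_v2_output_shape(width, height, classes, num=5, coords=4, stride=32):
    """Return (grid_w, grid_h, filters) for a YOLO v2 style output layer.

    Hypothetical helper: stride=32 and coords=4 are assumptions taken
    from the numbers quoted in this thread, not read from darknet itself.
    """
    grid_w = width // stride   # number of cells along the width
    grid_h = height // stride  # number of cells along the height
    filters = (classes + coords + 1) * num
    return grid_w, grid_h, filters


small = yolo_v2_output_shape(416, 416, classes=6)
large = yolo_v2_output_shape(1088, 1088, classes=6)
print(small)  # (13, 13, 55)
print(large)  # (34, 34, 55)
```

Only the grid size changes with the resolution; filters depends on classes and num alone, which is presumably why the same .weights file can be reused.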
- If you want to increase precision by training at a higher resolution, you can train Yolo with dynamic resolution 320x320 - 608x608 by setting the flag random=1 (Line 244 in 76dbdae).
This increases mAP by about 1%: https://arxiv.org/pdf/1612.08242v1.pdf
from darknet.
Simply change these lines in your .cfg file: https://groups.google.com/d/msg/darknet/MumMJ2D8H9Y/UBeJOa-eCwAJ
from
[net]
batch=64
subdivisions=8
height=416
width=416
to
[net]
batch=64
subdivisions=16
height=544
width=544
If you run out of memory, set subdivisions=64.
Then train using this .cfg file.
You can also train at 416x416 but detect at 544x544 or more, for example 832x832.
- Trained 416x416, and detection 416x416:
- Trained 416x416, and detection 832x832:
from darknet.
If you still want to train Yolo at the high resolution 1088x1088, you can try this, but there is no guarantee of success:
- change this line (Line 83 in 76dbdae):
  - from:
    int dim = (rand() % 10 + 10) * 32;
  - to:
    int dim = args.w;
- set the dynamic resolution flag random=1 in your .cfg file (Line 244 in 76dbdae)
- change the subdivisions, height and width lines in your .cfg file (Line 4 in 76dbdae):
From:
subdivisions=8
height=416
width=416
To:
subdivisions=64
height=1088
width=1088
- train Yolo as usual
Your .cfg file should look like this if you use 6 classes (objects) and resolution 1088x1088: http://pastebin.com/GY9NPfmc
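To see why the line int dim = (rand() % 10 + 10) * 32; has to be changed for 1088x1088, it helps to list the sizes it can produce. The sketch below only mirrors that expression in Python; possible_dims is a hypothetical name, and it assumes rand() % 10 yields the integers 0 to 9.

```python
def possible_dims(step=32, offset=10, choices=10):
    """List every value that (rand() % choices + offset) * step can take.

    Hypothetical helper mirroring the C expression quoted above.
    """
    return [(r + offset) * step for r in range(choices)]


dims = possible_dims()
print(dims[0], dims[-1])  # 320 608
print(1088 in dims)       # False
```

So with random=1 the original line resizes the network to somewhere between 320 and 608, which matches the 320x320 - 608x608 range mentioned earlier and never reaches 1088.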
from darknet.
I have successfully trained YOLO at 544 * 544; the trick is that the training images should be larger than this size. It sacrifices speed, though, as the YOLO authors show on the FPS/mAP curve.
from darknet.
What needs to change in order to use weights pretrained at 416x416 to detect at 544x544 or higher?
Thanks,
from darknet.
@kaishijeng To use weights pretrained at 416x416 to detect at 832x832, we need changes of the same type in your custom .cfg file or in the default yolo.cfg/yolo-voc.cfg:
from
[net]
batch=64
subdivisions=8
height=416
width=416
to
[net]
batch=64
subdivisions=64
height=832
width=832
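If you prefer not to edit the file by hand, the same change can be scripted. This is just a rough sketch, not part of darknet: set_net_options is a hypothetical helper that rewrites key=value lines inside the [net] section only, and it works line by line because a darknet .cfg repeats section names such as [convolutional].

```python
def set_net_options(cfg_text, **options):
    """Return cfg_text with the given keys replaced inside [net] only.

    Hypothetical helper for illustration; it does not validate values.
    """
    out = []
    in_net = False
    for line in cfg_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("["):
            # Track whether we are currently inside the [net] section.
            in_net = stripped == "[net]"
        elif in_net and "=" in stripped and not stripped.startswith("#"):
            key = stripped.split("=", 1)[0].strip()
            if key in options:
                line = f"{key}={options[key]}"
        out.append(line)
    return "\n".join(out)


example = """[net]
batch=64
subdivisions=8
height=416
width=416

[convolutional]
filters=32
"""

updated = set_net_options(example, subdivisions=64, height=832, width=832)
print(updated)
```

The layer sections and batch are left untouched, so the result is meant to be used with the unchanged .weights file.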
from darknet.
@kaishijeng Higher resolution requires more GPU memory.
If you get an "out of memory" error, you should decrease the batch value or increase the subdivisions value. batch/subdivisions images are processed at once.
64/8 requires more than 4 GB of GPU-RAM at 832x832 resolution when cuDNN is used, so you should use 64/64.
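As a quick way to reason about these settings, the number of images processed at once can be computed from batch and subdivisions. The sketch below is only an illustration with hypothetical helper names; the pixel ratio is a rough indicator of how much more data each image carries, not a measurement of GPU memory.

```python
def images_per_step(batch, subdivisions):
    """Images processed at once: batch / subdivisions (hypothetical helper)."""
    if batch % subdivisions != 0:
        raise ValueError("batch should be divisible by subdivisions")
    return batch // subdivisions


def pixel_ratio(new_size, old_size=416):
    """How many times more pixels a square input has than old_size x old_size."""
    return (new_size * new_size) / (old_size * old_size)


print(images_per_step(64, 8))   # 8
print(images_per_step(64, 64))  # 1
print(pixel_ratio(832))         # 4.0
```

Going from 64/8 to 64/64 cuts the images held in memory at once from 8 to 1, while an 832x832 input has four times the pixels of a 416x416 one.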
from darknet.
@AlexeyAB Thanks!!
I'm starting to train with 832, but at the start of training the log has a few nan values.
Log:
Region Avg IOU: 0.287866, Class: 0.028420, Obj: 0.548111, No Obj: 0.513275, Avg Recall: 0.100000, count: 20
Region Avg IOU: 0.361796, Class: 0.022684, Obj: 0.525597, No Obj: 0.514241, Avg Recall: 0.333333, count: 9
Region Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.513698, Avg Recall: -nan, count: 0
Region Avg IOU: 0.243378, Class: 0.014114, Obj: 0.460395, No Obj: 0.513718, Avg Recall: 0.000000, count: 6
Region Avg IOU: 0.509384, Class: 0.016810, Obj: 0.515260, No Obj: 0.513929, Avg Recall: 0.500000, count: 2
Can I ignore these nan values and continue training?
Thanks again!
from darknet.
@matakk If nan occurs only in some of the lines, I think this is normal. Try to detect each type of object (each object at least once) after 2000-4000 iterations, and if all is OK, you can continue training.
Also note that you should use a version no earlier than 10 Jan 2017, in which a bug was fixed: b831db5
About nan: nan occurs here if count == 0 (Line 320 in 2fc5f6d).
This may be because:
- each of the 30 generated truth boxes is wrong, see if(!truth.x) break; (Line 262 in 2fc5f6d)
- or l.batch == 0 here, for (b = 0; b < l.batch; ++b), but batch is equal to 64 from the .cfg file (Line 187 in 2fc5f6d)
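One way to check whether the nan lines in a training log are of this harmless kind is to confirm that they only appear together with count: 0. The sketch below is an assumption-laden illustration: parse_region_line is a hypothetical helper, and it assumes the log format shown earlier in this thread.

```python
import math
import re

# Matches the first average and the trailing count of a "Region Avg IOU" line.
LINE_RE = re.compile(r"Region Avg IOU: ([^,]+), .*count: (\d+)")


def parse_region_line(line):
    """Return (avg_iou, count) for a region log line, or None if it differs."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    # float() accepts "-nan" and returns nan.
    return float(match.group(1)), int(match.group(2))


log = [
    "Region Avg IOU: 0.361796, Class: 0.022684, Obj: 0.525597, "
    "No Obj: 0.513275, Avg Recall: 0.333333, count: 9",
    "Region Avg IOU: -nan, Class: -nan, Obj: -nan, "
    "No Obj: 0.513698, Avg Recall: -nan, count: 0",
]

parsed = [parse_region_line(line) for line in log]
# nan together with a non-zero count would deserve a closer look.
suspicious = [p for p in parsed if math.isnan(p[0]) and p[1] != 0]
print(suspicious)  # []
```

An empty list means every nan line had count 0, which is the case described above.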
from darknet.
@AlexeyAB
Please have a look at #30
@matakk
Have you trained on 544 * 544?
from darknet.
I tried to detect a relatively small object with a network trained at 416*416 and 640*480 video input, but the network cannot detect it from far away.
Could the reason be that I have not included images that show the object from far away in the training/validation dataset?
from darknet.
@VanitarNordic You should not change the aspect ratio; use network size 608x608 for detection.
Objects in the training dataset should have the same relative size (in %) as in the detection dataset.
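To compare relative sizes between the two datasets, something like the following could be used. It is only a sketch under stated assumptions: the helper names and the 32x24-pixel example object are made up, and it assumes the frame is simply stretched to the network size.

```python
def relative_size(obj_w, obj_h, img_w, img_h):
    """Object size as a fraction of the image size (hypothetical helper)."""
    return obj_w / img_w, obj_h / img_h


def size_on_network(obj_w, obj_h, img_w, img_h, net_w, net_h):
    """Object size in pixels after the image is stretched to net_w x net_h.

    Assumes plain resizing without letterboxing, which is an assumption here.
    """
    rel_w, rel_h = relative_size(obj_w, obj_h, img_w, img_h)
    return rel_w * net_w, rel_h * net_h


# Made-up example: a 32x24-pixel object in a 640x480 video frame.
rel = relative_size(32, 24, 640, 480)
at_416 = size_on_network(32, 24, 640, 480, 416, 416)
at_608 = size_on_network(32, 24, 640, 480, 608, 608)
print(rel)
print(at_416)
print(at_608)
```

The relative size stays the same whatever the network size is, so it is the number to compare between training and detection images; a larger network only gives the same object more pixels.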
from darknet.
No, I trained the model at 416*416 resolution.
The input live video resolution is 640*480 for testing.
By detection-dataset, do you mean the validation images used in the training process, or unseen images used when we test the model?
from darknet.
Detection-dataset means the images or video on which you want to detect objects. Did you change the network size?
What was the average relative size of the object:
- in training-dataset?
- in detection-dataset?
"Could the reason be that I have not included images that show the object from far away in the training/validation dataset?"
Yes.
from darknet.
Okay, I got it. Thanks.
Yes, I changed the network size to 416*416 to run the speed test.
Yes, in the training dataset the object sizes are normal and not from far away, that's correct. In the detection dataset I sometimes put the object far from the camera and the network was unable to detect it. I think (as you correctly mentioned) that if I want to detect the object at all scales and conditions, I should add training/validation images which cover these conditions.
from darknet.