
Comments (13)

PatricLee commented on July 23, 2024

Hi @iraadit , sorry for the late reply.
I tried training a 960x320 network on my dataset the other day and it worked fine. It took fewer iterations to train (or at least it felt that way), and it has slightly higher accuracy than the 416x416 network I trained earlier, probably because 960x320 is a larger resolution than 416x416.

But if you are in the same scenario as I am, where all the data have the same aspect ratio, then maybe Alexey is right and there is little point in training a non-square network with the same aspect ratio as the data instead of a square one.


AlexeyAB commented on July 23, 2024

BTW, I've also noticed that the learning rate becomes 10 times larger after 100 iterations. For example, when I set the learning rate to 0.0001 as in the example, it automatically changes to 0.001 after 100 iterations and the network diverges. So I had to set the learning rate to 0.00001 so that it would actually be 0.0001, and the network worked just fine. Is it programmed this way?

"the network worked just fine" - It depends on the number of classes and the number of images. For PascalVOC seems optimal values in the yolo-voc.cfg

How it is programmed: see paragraph 5 in #30 (comment)

If learning_rate = 0.0001, policy=steps, steps=100,25000,35000 and scales=10,.1,.1, then the actual learning_rate will be:

  • [0 - 100] iterations learning_rate will be 0.0001
  • [100 - 25000] iterations learning_rate will be 0.001
  • [25000 - 35000] iterations learning_rate will be 0.0001
  • [35000 - ...] iterations learning_rate will be 0.00001
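For reference, the steps schedule can be reproduced in a few lines of C. The sketch below is a simplified illustration of the scheduling logic (not the exact darknet source); base_lr, steps and scales correspond to the learning_rate, steps and scales lines of the .cfg:

    #include <stdio.h>

    /* Simplified sketch of policy=steps: start from the base learning rate
       and multiply it by scales[i] once training passes steps[i] iterations. */
    float current_rate(float base_lr, const int *steps, const float *scales,
                       int num_steps, int iteration)
    {
        float rate = base_lr;
        for (int i = 0; i < num_steps; ++i) {
            if (steps[i] > iteration) return rate;
            rate *= scales[i];
        }
        return rate;
    }

    int main(void)
    {
        int   steps[]  = {100, 25000, 35000};
        float scales[] = {10, .1f, .1f};
        /* prints 0.000100 0.001000 0.000100 0.000010 */
        printf("%f %f %f %f\n",
               current_rate(0.0001f, steps, scales, 3, 50),
               current_rate(0.0001f, steps, scales, 3, 200),
               current_rate(0.0001f, steps, scales, 3, 30000),
               current_rate(0.0001f, steps, scales, 3, 40000));
        return 0;
    }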



AlexeyAB commented on July 23, 2024

Yes, strictly speaking, Recall should always be greater than (or equal to) IoU. But Yolo calculates the average of the best IoUs instead of the average IoU, and it counts True Positives instead of computing Recall.
That's why I advise you to pay attention to IoU (the average best IoU is closer to the true IoU than the True Positive count is to Recall): https://github.com/AlexeyAB/darknet#when-should-i-stop-training

https://en.wikipedia.org/wiki/Precision_and_recall
[image: https://hsto.org/files/ca8/866/d76/ca8866d76fb840228940dbf442a7f06a.jpg]


Yolo calculates the average of the best IoUs instead of the average IoU, and counts True Positives instead of Recall:

fprintf(stderr, "%5d %5d %5d\tRPs/Img: %.2f\tIOU: %.2f%%\tRecall:%.2f%%\n", i, correct, total, (float)proposals/(i+1), avg_iou*100/total, 100.*correct/total);
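To make those two columns concrete, the following is a small self-contained C toy (an illustration, not darknet source code): for each ground-truth box it takes the best IoU over all detections, averages those best IoUs for the IOU column, and counts a True Positive whenever the best IoU exceeds a threshold for the Recall column. The box_iou here is a simplified reimplementation.

    #include <stdio.h>

    /* Toy illustration (not darknet source) of how the IOU and Recall
       columns are computed. Boxes are centre x, centre y, width, height. */
    typedef struct { float x, y, w, h; } box;

    static float overlap(float x1, float w1, float x2, float w2)
    {
        float l = (x1 - w1 / 2 > x2 - w2 / 2) ? x1 - w1 / 2 : x2 - w2 / 2;
        float r = (x1 + w1 / 2 < x2 + w2 / 2) ? x1 + w1 / 2 : x2 + w2 / 2;
        return r - l;
    }

    static float box_iou(box a, box b)  /* simplified reimplementation */
    {
        float w = overlap(a.x, a.w, b.x, b.w);
        float h = overlap(a.y, a.h, b.y, b.h);
        if (w <= 0 || h <= 0) return 0;
        float inter = w * h;
        return inter / (a.w * a.h + b.w * b.h - inter);
    }

    int main(void)
    {
        box truths[] = { {0.30f, 0.30f, 0.20f, 0.20f}, {0.70f, 0.70f, 0.20f, 0.20f} };
        box dets[]   = { {0.31f, 0.29f, 0.20f, 0.20f}, {0.90f, 0.10f, 0.10f, 0.10f} };
        int total = 0, correct = 0;
        float avg_iou = 0, iou_thresh = 0.5f;

        for (int j = 0; j < 2; ++j) {             /* loop over ground-truth boxes  */
            ++total;
            float best_iou = 0;
            for (int k = 0; k < 2; ++k) {         /* best IoU over all detections  */
                float iou = box_iou(dets[k], truths[j]);
                if (iou > best_iou) best_iou = iou;
            }
            avg_iou += best_iou;                  /* average of the BEST IoUs       */
            if (best_iou > iou_thresh) ++correct; /* True Positives, not true Recall */
        }
        printf("IOU: %.2f%%  Recall: %.2f%%\n", avg_iou * 100 / total, 100. * correct / total);
        return 0;
    }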


PatricLee commented on July 23, 2024

Well, that's why my Recall curve looked so much like the True Positive curve.

Thank you for your reply though, and for your amazing work.

I've finished training on the VOC dataset, validated the network on the VOC test set, and compared my results to the yolo-voc.weights I downloaded. I noticed that although I'm getting about as many true positives and about the same average IoU as the downloaded network, my network has noticeably more RPs/Img (about 160 vs 75), so I have some questions:

  • Does this mean that my RPN part has not yet converged and requires further training?
  • Will this (more region proposals per image) cause a performance issue, such as more time spent detecting objects?


AlexeyAB commented on July 23, 2024

my network has noticeably more RPs/Img (about 160 vs 75), so I have some questions:

Does this mean that my RPN part has not yet converged and requires further training?

Hard to say. But it may also be an effect of the bug on Windows that I just fixed: 4422399

Will this (more region proposals per image) cause a performance issue, such as more time spent detecting objects?

No, this should not significantly affect performance.


PatricLee commented on July 23, 2024

Thanks for the correction, Alexey, it seems to work... though I can't tell for sure yet.

One last question. Since I'm currently working on autonomous driving, my camera has a really wide angle and an unusual aspect ratio of about 3:1, so:
- Is it possible to modify the input of the network so that the network also has a 3:1 aspect ratio (say, inputs of 600x200)? And if it is, what do I have to modify besides 'height' and 'width' in the .cfg file?
- Will this lead to a performance improvement (or greater IoU, to be more specific) in my scenario, compared to a network with a 1:1 aspect ratio, like 416x416?

For now I'm getting an average IoU of about 65% on my dataset, and that's not good enough when detecting objects for autonomous driving. I wonder if I could improve this somehow.

Again, thank you for your amazing work and amazing answers.


AlexeyAB commented on July 23, 2024

You can try to set width=608 and height=224

height=416

  1. It must always be a multiple of 32, such as 608x224 (19x32 by 7x32), not 600x200 (a trivial check is sketched right after this list)
  2. I didn't test non-square resolutions, so I can't say whether there will be any bugs or undefined behavior.
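As a trivial illustration of the multiple-of-32 constraint (a standalone snippet, not darknet code), a resolution is valid for the .cfg only if both sides are divisible by 32:

    #include <stdio.h>

    /* Illustration only: both network dimensions must be multiples of 32. */
    static int valid_dim(int v) { return v > 0 && v % 32 == 0; }

    int main(void)
    {
        printf("600x200 valid: %d\n", valid_dim(600) && valid_dim(200)); /* 0 */
        printf("608x224 valid: %d\n", valid_dim(608) && valid_dim(224)); /* 1 (19*32 x 7*32) */
        return 0;
    }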

I used Yolo for detection on a wide image (stitched from 8 cameras) with a wide angle of ~200 degrees, but I divided it into many 416x416 square images and ran Yolo for each square image separately on 4 GPUs.

I think if your training dataset has the same 3:1 aspect ratio as your detection dataset, then you should use the square 416x416 resolution.


To increase IoU:

  1. You can train Yolo with the flag random=1 (instead of the current setting):

    random=0

  2. You can train Yolo with the steps multiplied by number_of_classes/20; for example, if you use 6 classes, then steps=100,7500,10000 (instead of):

    steps=100,25000,35000

  3. For detection (not for training) you can use a larger resolution, for example 832x832, with the weights file trained at 416x416 resolution.
    (Or, if you trained at 608x224 resolution, then you can change the resolution to 1216x448 after training.)

  4. Also, maybe for detection (not for training) you should rescale the anchors from 16:9 to 3:1, i.e. divide every second value by 1.7; they should then be anchors = 1.08,0.71, 3.42,2.59, 6.63,6.69, 9.42,3.00, 16.62,6.19 instead of the current values (see the sketch after this list):

    anchors = 1.08,1.19, 3.42,4.41, 6.63,11.38, 9.42,5.11, 16.62,10.52
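A tiny sketch of that rescaling (an illustration, not darknet code): keep the widths and divide every second value, i.e. each anchor height, by the chosen factor. It prints the rescaled list, matching the values above up to rounding:

    #include <stdio.h>

    /* Illustration only: rescale the default anchors by dividing every
       second value (the height of each width,height pair) by a factor. */
    int main(void)
    {
        float anchors[] = {1.08f, 1.19f, 3.42f, 4.41f, 6.63f, 11.38f,
                           9.42f, 5.11f, 16.62f, 10.52f};
        float factor = 1.7f;  /* ~16:9 -> ~3:1, as suggested above */
        int n = sizeof(anchors) / sizeof(anchors[0]);

        printf("anchors = ");
        for (int i = 0; i < n; ++i) {
            float v = (i % 2 == 1) ? anchors[i] / factor : anchors[i];
            printf("%.2f%s", v, (i + 1 < n) ? ", " : "\n");
        }
        return 0;
    }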


PatricLee commented on July 23, 2024

Thank you so much for your answers, I will try them out.


iraadit commented on July 23, 2024

Hi @PatricLee
Have you tried training with a non-square size? Did it work?


MyVanitar commented on July 23, 2024

Also, maybe for detection (not for training) you should rescale the anchors from 16:9 to 3:1, i.e. divide every second value by 1.7; they should then be anchors = 1.08,0.71, 3.42,2.59, 6.63,6.69, 9.42,3.00, 16.62,6.19:

Why should we not train the model with the newly calculated anchors?

I think if your training dataset has the same 3:1 aspect ratio as your detection dataset, then you should use the square 416x416 resolution.

How can we calculate this when each image has its own width and height?
If you think it is a good idea, we could pad the images (add black area around them) so they all have the same size (for example 960x960) and then start to annotate them.


Brandy24 commented on July 23, 2024

I am training 5 classes on a CPU (Intel Core i7-5500, 2.4 GHz) with 8 GB RAM. How many pictures per class should I train on to get a good result? And how long will it take to finish?


stephanecharette commented on July 23, 2024

I am training 5 classes on a CPU (Intel Core i7-5500, 2.4 GHz) with 8 GB RAM. How many pictures per class should I train on to get a good result? And how long will it take to finish?

Not sure why you chose this closed issue to post your question. But I would argue that you cannot possibly train a 5-class network on a CPU. It would take weeks if not months to train. Get yourself a decent GPU, or rent one from Amazon AWS, Linode, Google, Azure, etc...

See this recent post I made about a 2-class network. It took 4 hours to train a network with a GPU, but it would have taken 16 days on my 16-core 3.2 GHz CPU: https://www.ccoderun.ca/programming/2020-01-04_neural_network_training/

