Comments (18)
Exactly. This is one of the reasons. I implemented a few fixes and training is still going on.
They include (a rough sketch of how they fit together follows below):
- Fix for burn-in (leads to a better "initialization" at the beginning). It was skipped due to an off-by-one error...
- Linear interpolation for LR decay
- Decaying the LR based on optimizer steps instead of the number of batches, as they differ due to gradient accumulation; as you suggested, this leads to a slower decay
- Using SGD with Nesterov momentum instead of Adam (brings a surprising benefit with some hyperparameters)
- Higher initial learning rate (0.01 for SGD)
- Multiplying the loss by the mini-batch size to account for split gradients
I will create a PR soon, but I am currently on vacation and my training is still running.
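For context, the pieces could fit together roughly like this in a PyTorch training loop. This is only a sketch: the toy model, random data, milestone steps, accumulation factor, and batch size are all made-up stand-ins, not the repo's actual code or values.

```python
import torch
from torch import nn

# Dummy stand-ins: a toy model and random data take the place of the repo's
# Darknet network and COCO loader. All thresholds below are illustrative.
model = nn.Linear(10, 1)
base_lr = 0.01  # higher initial learning rate for SGD
optimizer = torch.optim.SGD(model.parameters(), lr=base_lr,
                            momentum=0.9, nesterov=True)

burn_in = 1000            # measured in optimizer steps, not batches
# (optimizer step, LR multiplier) breakpoints for the interpolated decay
milestones = [(1000, 1.0), (40000, 1.0), (45000, 0.1), (50000, 0.01)]
accumulate = 4            # gradient accumulation factor
mini_batch_size = 8

def lr_at(step):
    if step < burn_in:    # burn-in ramp-up (darknet uses a quartic curve)
        return base_lr * (step / burn_in) ** 4
    for (s0, m0), (s1, m1) in zip(milestones, milestones[1:]):
        if step <= s1:    # linear interpolation instead of hard LR drops
            t = (step - s0) / (s1 - s0)
            return base_lr * (m0 + t * (m1 - m0))
    return base_lr * milestones[-1][1]

optimizer_steps = 0
for batch_i in range(64):
    x, y = torch.randn(mini_batch_size, 10), torch.randn(mini_batch_size, 1)
    loss = nn.functional.mse_loss(model(x), y)
    # Multiply the loss by the mini-batch size to account for split gradients.
    (loss * mini_batch_size).backward()
    if (batch_i + 1) % accumulate == 0:
        optimizer_steps += 1          # decay on optimizer steps, not batches
        for g in optimizer.param_groups:
            g["lr"] = lr_at(optimizer_steps)
        optimizer.step()
        optimizer.zero_grad()
```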
from pytorch-yolov3.
I can confirm the current settings work well for COCO when using pretrained darknet weights. On the COCO test set this checkpoint gets an mAP of 0.52318.
from pytorch-yolov3.
Did you use the ImageNet pretrained backbone weights (weights/darknet53.conv.74)?
Training entirely from random initialization is not feasible on COCO in 24 epochs. Even if you use the pretrained backbone, more than 50 epochs are needed. Ideally you train a couple hundred epochs (the default value is 300).
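For reference, loading just the backbone might look roughly like this. The `Darknet` class and `load_darknet_weights` helper are my assumptions about this repo's layout, so check the actual names in your checkout:

```python
from pytorchyolo.models import Darknet  # assumed import path

# Build the model from the cfg, then load only the ImageNet pretrained
# backbone. The .conv.74 file is in raw darknet format, so the darknet
# loader is used here rather than torch.load.
model = Darknet("config/yolov3.cfg")
model.load_darknet_weights("weights/darknet53.conv.74")
```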
from pytorch-yolov3.
Thanks for your quick reply! I forgot to load the ImageNet pretrained backbone weights before; now I have loaded them and trained the model for 60 epochs, and the mAP is 0.03232. Is this a normal value? Can you give me some suggestions? Thank you very much!
from pytorch-yolov3.
I also started a training run after you opened the issue, and I also have an mAP of ~3 at the same epoch. I would train it for a couple hundred more epochs and maybe try to tune the hyperparameters a bit. I don't train COCO from scratch that often with this repo. I mostly train on in-house datasets and get mAPs in the high 90s for them, but the default hyperparameters should work for COCO, so I will check that.
from pytorch-yolov3.
Thank you for your attention to this issue. I have now trained from the ImageNet pretrained backbone for 114 epochs, but the mAP is only 0.03480. I suspect that even after 300 training epochs, the performance will not be good.
from pytorch-yolov3.
That is strange. You could try to deactivate the data augmentation and see what happens.
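Something along these lines might work for that ablation. The `ListDataset` and `DEFAULT_TRANSFORMS` names are my assumptions about this repo's dataset utilities, so verify them against your checkout:

```python
from pytorchyolo.utils.datasets import ListDataset          # assumed module paths
from pytorchyolo.utils.transforms import DEFAULT_TRANSFORMS

# Build the training dataset with the plain resize/pad transforms instead of
# the augmenting ones, to rule augmentation in or out as the culprit.
dataset = ListDataset("data/coco/trainvalno5k.txt",
                      img_size=416,
                      transform=DEFAULT_TRANSFORMS)  # no augmentation
```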
from pytorch-yolov3.
I am currently testing a hyperparameter set that achieves 11.6 mAP at epoch 3. I'll keep you updated.
from pytorch-yolov3.
I trained with the default hyperparameters on my own dataset, and it works well. So I think the default hyperparameters are not suitable for the COCO dataset. What dataset did you test with? Is it COCO? Thanks for sharing.
from pytorch-yolov3.
I am currently trying to find better hyperparameters for COCO and already have a few promising sets. Sadly, training on COCO takes quite some time. Even if you are running 4 nodes with different hyperparameters, progress is slow :/ I'll keep you updated.
from pytorch-yolov3.
I think the reason for the poor performance on the COCO dataset is that the learning rate decays so fast that it is too small after 50 epochs. In your code, the LR is multiplied by 0.1 once the epoch is greater than 50 and by 0.01 once it is greater than 56.
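For illustration, that schedule (taking the multipliers as relative to the base LR) behaves like a PyTorch `MultiStepLR`; the model and base LR below are placeholders:

```python
import torch
from torch import nn

# Placeholder model; only the schedule matters here.
model = nn.Linear(10, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# x0.1 after epoch 50 and again after epoch 56 -> 0.01 of the base LR.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[50, 56], gamma=0.1)

for epoch in range(60):
    # ... one training epoch would go here ...
    scheduler.step()
    if epoch in (50, 56):
        print(f"epoch {epoch}: lr = {optimizer.param_groups[0]['lr']:.0e}")
# By epoch 56 the LR is down to 1e-5, which is arguably too small to keep
# making progress on COCO.
```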
from pytorch-yolov3.
Sorry for the bad phone screenshot; my laptop broke during my vacation...
Green is SGD with the higher LR etc., and blue is the beginning of the training with the fixed burn-in and the longer, interpolated LR decay.
from pytorch-yolov3.
May I ask if you are using the Adam optimizer in the .cfg without making any changes?
from pytorch-yolov3.
Adam is the default at the moment afaik
from pytorch-yolov3.
Wow, can you show me the modification strategy you mentioned above in yolov3.cfg? I see you mentioned that changing Adam to SGD may work better.
from pytorch-yolov3.
I found that there are only four data augmentation operations, which rotate the image and change its saturation and so on, and those are the first entries in the cfg file. What else is there? I also found that there seems to be only one strategy for adjusting the learning rate, which is multiplying it by 0.1 according to the steps.
from pytorch-yolov3.
I'm sorry, I'm asking a few too many questions, but I really want to know, as it might be useful for me when training on large datasets.
from pytorch-yolov3.
Setting the learning rate to 1e-3 and cancelling the lr_decrease helps in my case.
from pytorch-yolov3.