Comments (9)
@ildoonet
Thank you for your reply.
I do understand your concerns, but I don't agree that reporting the best performance is cheating. As I said, the best model can reasonably be taken to represent the performance of the method. The difference between the best and the last model comes from the step-decay learning rate. In our case, using a cosine learning rate on CIFAR100, the best and last models are almost the same (within ±0.1% accuracy).
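For readers following this point: the best-vs-last gap depends heavily on the schedule, because a step schedule drops the learning rate abruptly while cosine annealing decays it smoothly toward zero, so the model barely moves in the final epochs. A rough sketch of the two schedules (plain Python; the epoch counts and milestones are illustrative, not the repo's exact settings):

```python
import math

def cosine_lr(base_lr, epoch, total_epochs):
    """Cosine-annealed learning rate: smooth decay from base_lr toward 0."""
    return 0.5 * base_lr * (1 + math.cos(math.pi * epoch / total_epochs))

def step_lr(base_lr, epoch, milestones=(150, 225), gamma=0.1):
    """Step schedule: multiply the rate by gamma at each milestone epoch."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr

# Near the end of a 300-epoch run, the cosine LR is nearly zero,
# so the network is almost frozen and best ~= last.
print(cosine_lr(0.1, 299, 300))  # tiny, close to 0
print(step_lr(0.1, 299))         # still a fixed small step size
```

With cosine annealing the final epochs take vanishingly small steps, which is consistent with the observation above that best and last checkpoints agree to within ±0.1%.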
All the experiments we re-implemented were conducted in the same experimental setting, and the best model was selected for every other method as well, so there are no cheating or fair-comparison issues.
Our best model's performance is not an instantaneous peak: we ran the experiment several times and report the mean of the best performances.
from cutmix-pytorch.
@ildoonet
For clarification and further discussion, I am re-opening this issue.
The baseline has a similar top-1 error to the one your paper reports (16.45), but with CutMix (p=0.5), the result is somewhat poor compared to the reported value (14.23).
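For anyone following along, here is a rough sketch of the per-batch logic under discussion: with probability p=0.5 a mixing ratio is drawn from a Beta distribution, a box is cut, and the label-mixing weight is corrected by the actual pasted area. This is a plain-Python illustration with assumed names (`rand_bbox`, `cutmix_lambda`), not the repo's exact code:

```python
import random

def rand_bbox(W, H, lam):
    """Sample a box whose area is roughly (1 - lam) of the image."""
    cut_rat = (1.0 - lam) ** 0.5
    cut_w, cut_h = int(W * cut_rat), int(H * cut_rat)
    cx, cy = random.randint(0, W - 1), random.randint(0, H - 1)  # box center
    x1, y1 = max(cx - cut_w // 2, 0), max(cy - cut_h // 2, 0)
    x2, y2 = min(cx + cut_w // 2, W), min(cy + cut_h // 2, H)
    return x1, y1, x2, y2

def cutmix_lambda(W=32, H=32, alpha=1.0, p=0.5):
    """Label-mixing weight for one batch; 1.0 means CutMix was skipped."""
    if random.random() >= p:
        return 1.0                      # skip CutMix for this batch
    lam = random.betavariate(alpha, alpha)
    x1, y1, x2, y2 = rand_bbox(W, H, lam)
    # Correct lam to the true pasted-box area (clipping can shrink the box).
    return 1.0 - (x2 - x1) * (y2 - y1) / (W * H)
```

The area correction matters because the sampled box is clipped at the image border, so the realized mixing ratio can differ from the Beta draw.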
I re-ran our code on CIFAR100 three times, and we got:
| run | at 300 epochs | best acc |
|---|---|---|
| try1 | 14.78 | 14.23 |
| try2 | 15.44 | 14.50 |
| try3 | 15.00 | 14.68 |
| average | 15.07 | 14.47 |
Also, for ImageNet-1K, we got:
| run | at 300 epochs | best |
|---|---|---|
| try1 | 21.20 | 21.19 |
| try2 | 21.61 | 21.61 |
| try3 | 21.40 | 21.40 |
| average | 21.403 | 21.400 |
Interestingly, we got the best performance near the last epoch of training.
I wonder whether it is right to use the best validation accuracy. As you can see, the converged model's accuracy is slightly lower than the best one, and it is hard to be sure that the best accuracy represents the model's true performance.
For the ImageNet-1K task, many methods report their best validation accuracy during training because they cannot access the test dataset. Of course, we will add a statement to our final paper: "We report the best performance during training."
Thanks for the clarification.
We evaluated on the CIFAR datasets using the same evaluation strategy, and we tried our best to reproduce the baselines (mixup, cutout, and so on) and reported their best performance for a fair comparison.
But I have a question about what the "true performance" is, as you put it. I'm not sure that the only way to represent a model's true performance is to report the last epoch's performance, because the model can fluctuate at the end of training and we cannot guarantee that it has converged at the last epoch. Therefore, researchers usually train models and pick the best one by validating on the validation set.
In short, we chose the best model to represent the performance of the method, and I think both approaches, selecting the best model or the last model, make sense for evaluating trained models.
But your comments about the best and last models are well worth considering in future work.
But your comments about the best and last models are very worth to consider for future work.
When I worked on Fast AutoAugment, I used the converged value instead of the instantaneous peak, and as far as I know, AutoAugment measures performance in the same way.
First, nice work on Fast AutoAugment!
My guess is that Fast AutoAugment and AutoAugment use cosine learning-rate decay, so they fluctuate less at the end of training and their best and last performance would be similar.
I recently found that CutMix + a cosine learning rate works well on the CIFAR datasets, so we will report both the best and the last performance when using a cosine learning rate. I hope the gap between the best and the last models will be smaller than with the current training scheme.
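Reporting both numbers boils down to tracking two statistics per run. A minimal sketch of that bookkeeping (plain Python over a per-epoch accuracy trace; the helper name `summarize_run` and the example values are illustrative):

```python
def summarize_run(val_accs):
    """Given per-epoch validation accuracies, report best and last."""
    best_epoch = max(range(len(val_accs)), key=lambda e: val_accs[e])
    return {
        "best": val_accs[best_epoch],   # peak validation accuracy
        "best_epoch": best_epoch,       # when the peak occurred
        "last": val_accs[-1],           # converged (final-epoch) accuracy
        "gap": val_accs[best_epoch] - val_accs[-1],
    }

# Example trace: accuracy climbs, peaks, then fluctuates slightly.
trace = [60.0, 70.0, 84.9, 85.3, 85.0]
print(summarize_run(trace))
```

Reporting the `gap` alongside both numbers also makes it easy to see whether a schedule (e.g. cosine vs. step) actually narrows the difference between the peak and the converged model.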
'Cheating' is a rather harsh word. However, comparing peak values does favor oscillating, risky methods.
- How can I reproduce your result? In particular, with your provided code and sample commands, I should be able to reproduce 14.23% top-1 error with PyramidNet+CutMix. It would be great if you could provide the specific environment and command to reproduce the result, or perhaps this will help you find problems in this repo.
We used PyTorch 1.0.0 and Tesla P40 GPUs. The paper's experiments were conducted on our cloud system (NSML).
I recently re-tested our code on a local machine for CIFAR100 and ImageNet using this repo, and I got slightly lower performance on CIFAR100 (top-1 error 14.5~14.6, similar to your report) but better performance on ImageNet (top-1 error 21.4). One possible reason is the difference between the cloud system and the local machines.
We note that the results (top-1 error 14.5 on CIFAR100) are still much better than the important baselines (cutout, mixup, etc.). In the camera-ready version of our paper, we may update the performance to 14.5 on CIFAR100 and 21.4 on ImageNet for better reproducibility on local machines.
- Did you use the last validation accuracy after training, or the best (peak) validation accuracy during training? I saw some code tracking the best validation accuracy during training and printing the value before terminating, so I assume you used the best (peak) validation accuracy.
As you can see in the code, we choose the best validation accuracy.
Thanks!
@hellbell Thanks, I guess that this reproducibility issue is not from the environment.
I wonder whether it is right to use the best validation accuracy. As you can see, the converged model's accuracy is slightly lower than the best one, and it is hard to be sure that the best accuracy represents the model's true performance. When I worked on Fast AutoAugment, I used the converged value instead of the instantaneous peak, and as far as I know, AutoAugment measures performance in the same way.
Anyway, thanks for the clarification.
@ildoonet
I agree with some points in your reply, and it is worth looking at final-performance (converged-performance) comparisons. But our paper also reports the best performance of the other algorithms, re-implemented for a fair comparison. Only the few methods that we could not reproduce were reported with their original papers' scores.
Anyway, thank you for the constructive comments!
I guess that if you mention both the best and the converged accuracy, it will be okay. Reporting only an instantaneous peak can be considered cheating, or validation over-fitting.
But as you say, the true performance of a model is hard to measure even if we have a held-out set used only for testing.
Also, I have trained many models with a cosine learning rate and saw similar gaps there too.
Anyway, thanks for your consideration and long explanations. This has given me a lot to think about.
If choosing the best performance is cheating, then many people are cheating, so I don't agree with your point. Rather, @hellbell is fairly correct. Thanks for the interesting work!
I deeply apologize for the misunderstanding caused by my poor English and word choice. CutMix inspired me a lot and has helped me a lot in my research.