
Comments (9)

hellbell commented on July 28, 2024

@ildoonet
Thank you for your reply.
I understand your concerns, but I don't agree that reporting the best performance is cheating. As I said, the best model can reasonably be treated as representing the performance of the method. The gap between the best and the last model comes from the step-decay learning rate. In our case, using a cosine learning rate on CIFAR100, the best and last models are almost the same (within ±0.1% accuracy).
All the experiments we re-implemented were conducted in the same experimental setting, and the best model is selected for every other method as well, so there is no cheating or fair-comparison issue.
Our best model's performance is not an instantly peaked high value, because we ran the experiments several times and report the mean of the best performances.
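To make the schedule point concrete, here is a minimal sketch (generic PyTorch, not code from this repo; `lr_curve` is a hypothetical helper) tracing the two decay schemes: step decay keeps a relatively high rate until a late drop, so accuracy can fluctuate between drops, while cosine annealing decays smoothly toward zero, which is why the best and last epochs end up nearly identical.

```python
# Minimal sketch (not the authors' code): trace the per-epoch learning rate
# for the two schedules discussed above.
import torch

def lr_curve(make_sched, epochs=300, base_lr=0.1):
    """Record the learning rate at each epoch for a given scheduler factory."""
    opt = torch.optim.SGD([torch.nn.Parameter(torch.zeros(1))], lr=base_lr)
    sched = make_sched(opt)
    lrs = []
    for _ in range(epochs):
        lrs.append(opt.param_groups[0]["lr"])
        sched.step()
    return lrs

# Step decay: divide the rate by 10 at epochs 150 and 225 (a common CIFAR recipe).
step = lr_curve(lambda o: torch.optim.lr_scheduler.MultiStepLR(o, milestones=[150, 225], gamma=0.1))
# Cosine annealing over the full 300 epochs.
cosine = lr_curve(lambda o: torch.optim.lr_scheduler.CosineAnnealingLR(o, T_max=300))

print(step[-1], cosine[-1])  # 0.001 vs. ~0.0 at the final epoch
```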


hellbell commented on July 28, 2024

@ildoonet
For clarification and further discussion, I am re-opening this issue.

> The baseline has a similar top-1 error to what your paper reports (16.45), but with CutMix (p=0.5) the result is somewhat poor compared to the reported value (14.23).

I re-ran our code on CIFAR100 three times, and we got (top-1 error, %):

|         | at 300 epochs | best  |
| ------- | ------------- | ----- |
| try 1   | 14.78         | 14.23 |
| try 2   | 15.44         | 14.50 |
| try 3   | 15.00         | 14.68 |
| average | 15.07         | 14.47 |

Also, for ImageNet-1K, we got (top-1 error, %):

|         | at 300 epochs | best   |
| ------- | ------------- | ------ |
| try 1   | 21.20         | 21.19  |
| try 2   | 21.61         | 21.61  |
| try 3   | 21.40         | 21.40  |
| average | 21.403        | 21.400 |

Interestingly, we got the best performance near the last epoch of the training.

> I wonder if it is right to use the best validation accuracy. As you can see, the converged model's accuracy is slightly lower than the best one, and it is hard to be sure that the best accuracy represents the model's true performance.

For the ImageNet-1K task, many methods report their best validation accuracy during training because they cannot access the 'test dataset'. Of course, we will add the statement "We report the best performance during training." to our final paper for clarification.
We evaluated on the CIFAR datasets using the same evaluation strategy. We also tried our best to reproduce the baselines (Mixup, Cutout, and so on) and report their best performance for a fair comparison.
But I have a question about what the 'true performance' you mention actually is.
I'm not sure that the only way to represent the true performance of a model is to report the last epoch's performance, because the model can fluctuate at the end of training and we cannot guarantee that it has converged by the last epoch. Therefore, researchers usually train models and pick the best one by validating on the validation set.
In short, we choose the best model to represent the performance of the method, and I think both approaches, selecting the best model or the last model, make sense for evaluating trained models.
But your comments about the best and last models are well worth considering for future work.
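To illustrate the two conventions side by side, here is a minimal sketch (generic PyTorch, not this repo's code; `train_one_epoch` and `evaluate` are caller-supplied placeholders) that records the best validation accuracy seen during training alongside the last-epoch accuracy:

```python
# Minimal sketch of the two reporting conventions discussed above:
# track the best validation accuracy at any epoch, and keep the
# last-epoch accuracy as the "converged" number.
import copy

def train_and_track(model, train_one_epoch, evaluate, epochs=300):
    """train_one_epoch and evaluate are caller-supplied placeholders."""
    best_acc, best_state, acc = 0.0, None, 0.0
    for _ in range(epochs):
        train_one_epoch(model)        # one pass over the training set
        acc = evaluate(model)         # validation accuracy for this epoch
        if acc > best_acc:            # "best model" convention
            best_acc = acc
            best_state = copy.deepcopy(model.state_dict())
    return best_acc, acc, best_state  # best acc, last acc, best weights
```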

> When I worked on Fast AutoAugment, I used the converged value instead of the instantly peaked high value, and as far as I know, AutoAugment measures performance in the same way.

First of all, nice work on Fast AutoAugment!
My guess is that Fast AutoAugment and AutoAugment use cosine learning-rate decay, so they fluctuate less at the end of training, and the best and last performance would therefore be similar.
I recently found that CutMix + cosine learning rate works well on the CIFAR dataset, so we will report both the best and the last performance when using a cosine learning rate. I hope the gap between the best and last models will be smaller than with the current training scheme.
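For readers following the thread, here is a minimal sketch of the CutMix operation under discussion (apply-probability p, mixing ratio drawn from Beta(alpha, alpha), labels mixed in proportion to the patch area, as described in the CutMix paper). This is an illustration, not the repository's implementation:

```python
# Minimal sketch (not the repo's code) of CutMix with apply-probability p.
import numpy as np
import torch

def cutmix(images, labels, alpha=1.0, p=0.5):
    """images: (N, C, H, W) tensor; labels: (N,) class indices.
    Returns mixed images plus (labels_a, labels_b, lam) for the mixed loss:
    loss = lam * CE(out, labels_a) + (1 - lam) * CE(out, labels_b)."""
    if np.random.rand() > p:                      # skip with probability 1 - p
        return images, labels, labels, 1.0
    lam = np.random.beta(alpha, alpha)            # mixing ratio
    idx = torch.randperm(images.size(0))          # shuffled partner for each sample
    H, W = images.shape[2], images.shape[3]
    # Sample a box whose area is roughly (1 - lam) of the image.
    rh, rw = int(H * np.sqrt(1 - lam)), int(W * np.sqrt(1 - lam))
    cy, cx = np.random.randint(H), np.random.randint(W)
    y1, y2 = np.clip(cy - rh // 2, 0, H), np.clip(cy + rh // 2, 0, H)
    x1, x2 = np.clip(cx - rw // 2, 0, W), np.clip(cx + rw // 2, 0, W)
    images[:, :, y1:y2, x1:x2] = images[idx, :, y1:y2, x1:x2]
    # Correct lam for clipping at the image borders.
    lam = 1 - ((y2 - y1) * (x2 - x1)) / (H * W)
    return images, labels, labels[idx], lam
```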


JiyueWang commented on July 28, 2024

'Cheating' is a rather harsh word. However, comparing the peak value indeed benefits oscillating and risky methods.


hellbell commented on July 28, 2024

@ildoonet

> 1. How can I reproduce your result? In particular, with your provided code and sample commands, I should be able to reproduce a top-1 error of 14.23% with PyramidNet + CutMix. It would be great if you could provide the specific environment and commands to reproduce the result, or perhaps this will help you find some problems in this repo.

We used PyTorch 1.0.0 and Tesla P40 GPUs. The paper's experiments were conducted on our cloud system (NSML).
I recently re-tested our code on a local machine for CIFAR100 and ImageNet using this repo, and I got slightly lower performance on CIFAR100 (top-1 error 14.5~14.6, similar to your report) but better performance on ImageNet (top-1 error 21.4). One possible reason is the difference between the cloud system and the local machines.
We note that the result (top-1 error 14.5 on CIFAR100) is still much better than the important baselines (Cutout, Mixup, etc.). In the camera-ready version of our paper, we may update the numbers to 14.5 on CIFAR100 and 21.4 on ImageNet for better reproducibility on local machines.

> 2. Did you use the 'last validation accuracy' after training or the 'best validation accuracy (peak accuracy)' during training? I saw some code tracking the best validation accuracy while training and printing out the value before terminating, so I assume you used the 'best (peak) validation accuracy'.

As you can see in the code, we choose the best validation accuracy.

Thanks!


ildoonet commented on July 28, 2024

@hellbell Thanks, I guess that this reproducibility issue is not from the environment.

I wonder if it is right to use the best validation accuracy. As you can see, the converged model's accuracy is slightly lower than the best one, and it is hard to be sure that the best accuracy represents the model's true performance. When I worked on Fast AutoAugment, I used the converged value instead of the instantly peaked high value, and as far as I know, AutoAugment measures performance in the same way.

Anyway, thanks for the clarification.


hellbell commented on July 28, 2024

[Updated reply]

@ildoonet
I agree with your reply on some points, and it is worth looking at final-performance (or converged-performance) comparisons. But our paper also reports the best performance of the other algorithms, which we re-implemented, for a fair comparison. Only the few methods we could not reproduce were reported with the scores from their original papers.
Anyway, thank you for the constructive comments!


ildoonet commented on July 28, 2024

I guess that if you report both the best and the converged accuracy, it will be okay. Reporting only an instantly peaked high value is somewhat considered cheating, or validation overfitting.

But as you say, the true performance of the model is hard to measure, even if we have a held-out set used only for testing.

Also, I trained many models with a cosine learning rate, and there are similar gaps there as well.

Anyway, thanks for your consideration and the long explanations. This is very helpful for me and gives me a lot to think about.


GuoleiSun commented on July 28, 2024

> I guess that if you report both the best and the converged accuracy, it will be okay. Reporting only an instantly peaked high value is somewhat considered cheating, or validation overfitting.
>
> But as you say, the true performance of the model is hard to measure, even if we have a held-out set used only for testing.
>
> Also, I trained many models with a cosine learning rate, and there are similar gaps there as well.
>
> Anyway, thanks for your consideration and the long explanations. This is very helpful for me and gives me a lot to think about.

If choosing the best performance is cheating, then many people are cheating, so I don't agree with your point. Rather, @hellbell is fairly correct. Thanks for the interesting work!


ildoonet commented on July 28, 2024

I deeply apologize for the misrepresentation caused by my poor English and poor word choice. CutMix inspired me a lot and helped me a lot in my research.

