Comments (5)
@rederxz
Thank you for your constructive feedback.
As noted in our paper, we tried two alpha values (0.5 and 1.0) for mixup training and chose alpha=1.0 because it performed better than 0.5. So we didn't try alpha values below 0.5, but as you said, it would be worth finding the optimal alpha for mixup.
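To make the role of alpha concrete: in mixup, the mixing coefficient lambda is drawn from Beta(alpha, alpha), so alpha directly controls how aggressively pairs are blended. Below is a minimal pure-Python sketch (a stand-in for the usual tensor implementation; the helper names are illustrative, not from the repo):

```python
import random

def mixup_lambda(alpha: float, rng: random.Random) -> float:
    """Sample the mixing coefficient lambda ~ Beta(alpha, alpha)."""
    return rng.betavariate(alpha, alpha) if alpha > 0 else 1.0

def mixup_pair(x1, x2, y1, y2, alpha, rng):
    """Mix two examples (flat lists of floats) and their one-hot labels
    with the same lambda, as in the original mixup formulation."""
    lam = mixup_lambda(alpha, rng)
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam
```

With alpha=1.0, Beta(1, 1) is uniform on [0, 1]; smaller alpha pushes lambda toward 0 or 1, i.e. weaker mixing on average.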
In fact, after running some experiments, we found that the performance of Mixup and CutMix can be close on ImageNet with their respective preferred alpha settings (0.2 and 1.0).
Could you give more detail about this, such as the accuracy, training settings, and so on?
Have you run any related experiments, and what do you think about this?
As I remember, in our training settings CutMix was always better than Mixup for ResNet variants regardless of the alpha value.
However, for lightweight architectures such as EfficientNet variants, Mixup and CutMix show similar performance gains.
So I think there should be a better strategy. Some recent works (e.g., https://arxiv.org/pdf/2012.12877.pdf) apply Mixup and CutMix at the same time for a performance boost.
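The combined strategy mentioned above is typically implemented by randomly switching between the two augmentations per batch. A minimal sketch of such a dispatcher (the function name and the 0.5 switch probability are assumptions for illustration, not the paper's exact recipe):

```python
import random

def apply_batch_aug(batch, mixup_fn, cutmix_fn, switch_prob=0.5, rng=None):
    """For each batch, apply either Mixup or CutMix, chosen at random.
    `switch_prob` is the (assumed) probability of picking CutMix."""
    rng = rng or random.Random()
    if rng.random() < switch_prob:
        return cutmix_fn(batch)
    return mixup_fn(batch)
```

Because the choice is made once per batch, each batch sees a single consistent augmentation, which keeps the label-mixing bookkeeping simple.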
from cutmix-pytorch.
Thanks for your reply! Some of our experiments are still in progress. I will upload detailed results in a few days.
Here are the results.
Our experiments:
model (Resolution) | augmentation | regularization | batch size | optimizer | lr | epochs | lr_schedule | wd | acc | Reference |
---|---|---|---|---|---|---|---|---|---|---|
ResNet_vd-50 160 | ResizedCrop | label smooth 0.1 mixup_batch alpha=0.2 | 256 * 4 | SGD | 0.1 * 4 | 200 | Cosine | 0.0001 | 78.58 | / |
ResNet_vd-50 160 | ResizedCrop | label smooth 0.1 mixup_batch alpha=1.0 | 256 * 4 | SGD | 0.1 * 4 | 200 | Cosine | 0.0001 | 77.55 | / |
ResNet_vd-50 160 | ResizedCrop | label smooth 0.1 cutmix_batch alpha=1.0 | 256 * 4 | SGD | 0.1 * 4 | 200 | Cosine | 0.0001 | 78.43 | / |
model (Resolution) | augmentation | regularization | batch size | optimizer | lr | epochs | lr_schedule | wd | acc | Reference |
---|---|---|---|---|---|---|---|---|---|---|
ResNet_vd-50 avd 160 | ResizedCrop | label smooth 0.1 cutmix_batch alpha=0.2 | 256 * 4 | SGD | 0.1 * 4 | 200 | Cosine | 0.0001 | 79.13 | / |
ResNet_vd-50 avd 160 | ResizedCrop | label smooth 0.1 cutmix_batch alpha=1.0 | 256 * 4 | SGD | 0.1 * 4 | 200 | Cosine | 0.0001 | 78.68 | / |
model (Resolution) | augmentation | regularization | batch size | optimizer | lr | epochs | lr_schedule | wd | acc | Reference |
---|---|---|---|---|---|---|---|---|---|---|
ResNet_vd-50 224 | ResizedCrop | mixup_batch alpha=0.2 | 256 * 4 | SGD | 0.1 * 4 | 300 | Cosine | 0.0001 | 79.00 | / |
ResNet_vd-50 224 | ResizedCrop | mixup_batch alpha=1.0 | 256 * 4 | SGD | 0.1 * 4 | 300 | Cosine | 0.0001 | 78.44 | / |
ResNet_vd-50 224 | ResizedCrop | cutmix_batch alpha=0.2 | 256 * 4 | SGD | 0.1 * 4 | 300 | Cosine | 0.0001 | 79.15 | / |
ResNet_vd-50 224 | ResizedCrop | cutmix_batch alpha=1.0 | 256 * 4 | SGD | 0.1 * 4 | 300 | Cosine | 0.0001 | 79.17 | / |
Results from PaddleClas
model (Resolution) | augmentation | regularization | batch size | optimizer | lr | epochs | lr_schedule | wd | acc | Reference |
---|---|---|---|---|---|---|---|---|---|---|
ResNet-50 224 | ResizedCrop | mixup_batch alpha=0.2 | 256 | SGD | 0.1 | 300 | Cosine | 0.0001 | 78.28 | page |
ResNet-50 224 | ResizedCrop | cutmix_batch alpha=0.2 | 256 | SGD | 0.1 | 300 | Cosine | 0.0001 | 78.39 | page |
Experiments in the paper of CutMix
model (Resolution) | augmentation | regularization | batch size | optimizer | lr | epochs | lr_schedule | wd | acc | Reference |
---|---|---|---|---|---|---|---|---|---|---|
ResNet-50 224 | ResizedCrop | mixup_batch alpha=1.0 | 256 | SGD | 0.1 | 300 | Step | 0.0001 | 77.42 | cutmix paper |
ResNet-50 224 | ResizedCrop | cutmix_batch alpha=1.0 | 256 | SGD | 0.1 | 300 | Step | 0.0001 | 78.60 | cutmix paper |
We can see that Mixup's performance is better when alpha equals 0.2 than when alpha equals 1.0. Also, the gap between Mixup and CutMix becomes smaller when alpha equals 0.2, which is also confirmed by the results from PaddleClas.
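One plausible intuition for why small alpha helps Mixup: Beta(0.2, 0.2) concentrates lambda near 0 and 1, so most batches are only lightly mixed, while Beta(1, 1) mixes uniformly. This can be checked with a quick Monte Carlo sketch (the threshold 0.4 is an arbitrary choice for "nearly unmixed"):

```python
import random

def extreme_fraction(alpha: float, n: int = 10000, seed: int = 0) -> float:
    """Fraction of lambda ~ Beta(alpha, alpha) draws with |lambda - 0.5| > 0.4,
    i.e. pairs that are left almost unmixed."""
    rng = random.Random(seed)
    return sum(abs(rng.betavariate(alpha, alpha) - 0.5) > 0.4
               for _ in range(n)) / n
```

For alpha=0.2 the fraction is well above half, whereas for alpha=1.0 (uniform) it is exactly the interval length, 0.2 in expectation.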
Thank you for sharing the results! They are great experiments.
Given your results, I agree that alpha should be 0.2 for Mixup in ImageNet experiments. If there's a chance to revise or extend our paper, this information would be very useful :)
At the same time, I'm curious about the CutMix result with alpha=1.0 in the PaddleClas table; I guess its performance would be better than with alpha=0.2.
Thanks!
I agree that alpha influences the performance of both CutMix and Mixup, and that it may have a greater impact on Mixup on ImageNet.
Thanks for your reply! :smiley: