Comments (5)
@rederxz
Thank you for your constructive feedback.
As noted in our paper, we tried two alpha values (0.5 and 1.0) for mixup training and chose alpha=1.0 because it performed better than 0.5. So we didn't try alpha values below 0.5, but as you said, it would be worth finding the optimal alpha for mixup.
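To make the role of alpha concrete: in mixup, the mixing coefficient lambda is drawn from Beta(alpha, alpha), so alpha directly controls how aggressively pairs are blended. Below is a minimal pure-Python sketch (a stand-in for the usual tensor implementation; the helper names are illustrative, not from the repo):

```python
import random

def mixup_lambda(alpha: float, rng: random.Random) -> float:
    """Sample the mixing coefficient lambda ~ Beta(alpha, alpha)."""
    return rng.betavariate(alpha, alpha) if alpha > 0 else 1.0

def mixup_pair(x1, x2, y1, y2, alpha, rng):
    """Mix two examples (flat lists of floats) and their one-hot labels
    with the same lambda, as in the original mixup formulation."""
    lam = mixup_lambda(alpha, rng)
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam
```

With alpha=1.0, Beta(1, 1) is uniform on [0, 1]; smaller alpha pushes lambda toward 0 or 1, i.e. weaker mixing on average.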
In fact, after running some experiments, we found that the performance of Mixup and CutMix can be close on ImageNet with their respective preferred alpha settings (0.2 and 1.0).
Could you give more detail about this, such as the accuracy, training settings, and so on?
Have you run any related experiments, and what do you think about this?
As I remember, in our training settings CutMix was always better than Mixup for ResNet variants regardless of the alpha value.
However, for lightweight architectures such as EfficientNet variants, Mixup and CutMix show similar performance gains.
So I think there should be a better strategy. Some recent works (e.g., https://arxiv.org/pdf/2012.12877.pdf) apply Mixup and CutMix at the same time for a performance boost.
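The combined strategy mentioned above is typically implemented by randomly switching between the two augmentations per batch. A minimal sketch of such a dispatcher (the function name and the 0.5 switch probability are assumptions for illustration, not the paper's exact recipe):

```python
import random

def apply_batch_aug(batch, mixup_fn, cutmix_fn, switch_prob=0.5, rng=None):
    """For each batch, apply either Mixup or CutMix, chosen at random.
    `switch_prob` is the (assumed) probability of picking CutMix."""
    rng = rng or random.Random()
    if rng.random() < switch_prob:
        return cutmix_fn(batch)
    return mixup_fn(batch)
```

Because the choice is made once per batch, each batch sees a single consistent augmentation, which keeps the label-mixing bookkeeping simple.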
from cutmix-pytorch.
Thanks for your reply! Some of our experiments are still in progress. I will upload detailed results in a few days.
Here are the results.
Our experiments:
model (Resolution) | augmentation | regularization | batch size | optimizer | lr | epochs | lr_schedule | wd | acc | Reference |
---|---|---|---|---|---|---|---|---|---|---|
ResNet_vd-50 160 | ResizedCrop | label smooth 0.1 mixup_batch alpha=0.2 | 256 * 4 | SGD | 0.1 * 4 | 200 | Cosine | 0.0001 | 78.58 | / |
ResNet_vd-50 160 | ResizedCrop | label smooth 0.1 mixup_batch alpha=1.0 | 256 * 4 | SGD | 0.1 * 4 | 200 | Cosine | 0.0001 | 77.55 | / |
ResNet_vd-50 160 | ResizedCrop | label smooth 0.1 cutmix_batch alpha=1.0 | 256 * 4 | SGD | 0.1 * 4 | 200 | Cosine | 0.0001 | 78.43 | / |
model (Resolution) | augmentation | regularization | batch size | optimizer | lr | epochs | lr_schedule | wd | acc | Reference |
---|---|---|---|---|---|---|---|---|---|---|
ResNet_vd-50 avd 160 | ResizedCrop | label smooth 0.1 cutmix_batch alpha=0.2 | 256 * 4 | SGD | 0.1 * 4 | 200 | Cosine | 0.0001 | 79.13 | / |
ResNet_vd-50 avd 160 | ResizedCrop | label smooth 0.1 cutmix_batch alpha=1.0 | 256 * 4 | SGD | 0.1 * 4 | 200 | Cosine | 0.0001 | 78.68 | / |
model (Resolution) | augmentation | regularization | batch size | optimizer | lr | epochs | lr_schedule | wd | acc | Reference |
---|---|---|---|---|---|---|---|---|---|---|
ResNet_vd-50 224 | ResizedCrop | mixup_batch alpha=0.2 | 256 * 4 | SGD | 0.1 * 4 | 300 | Cosine | 0.0001 | 79.00 | / |
ResNet_vd-50 224 | ResizedCrop | mixup_batch alpha=1.0 | 256 * 4 | SGD | 0.1 * 4 | 300 | Cosine | 0.0001 | 78.44 | / |
ResNet_vd-50 224 | ResizedCrop | cutmix_batch alpha=0.2 | 256 * 4 | SGD | 0.1 * 4 | 300 | Cosine | 0.0001 | 79.15 | / |
ResNet_vd-50 224 | ResizedCrop | cutmix_batch alpha=1.0 | 256 * 4 | SGD | 0.1 * 4 | 300 | Cosine | 0.0001 | 79.17 | / |
Results from PaddleClas
model (Resolution) | augmentation | regularization | batch size | optimizer | lr | epochs | lr_schedule | wd | acc | Reference |
---|---|---|---|---|---|---|---|---|---|---|
ResNet-50 224 | ResizedCrop | mixup_batch alpha=0.2 | 256 | SGD | 0.1 | 300 | Cosine | 0.0001 | 78.28 | page |
ResNet-50 224 | ResizedCrop | cutmix_batch alpha=0.2 | 256 | SGD | 0.1 | 300 | Cosine | 0.0001 | 78.39 | page |
Experiments in the paper of CutMix
model (Resolution) | augmentation | regularization | batch size | optimizer | lr | epochs | lr_schedule | wd | acc | Reference |
---|---|---|---|---|---|---|---|---|---|---|
ResNet-50 224 | ResizedCrop | mixup_batch alpha=1.0 | 256 | SGD | 0.1 | 300 | Step | 0.0001 | 77.42 | cutmix paper |
ResNet-50 224 | ResizedCrop | cutmix_batch alpha=1.0 | 256 | SGD | 0.1 | 300 | Step | 0.0001 | 78.60 | cutmix paper |
We can see that Mixup's performance is better when alpha equals 0.2 than when alpha equals 1.0. Also, the gap between Mixup and CutMix becomes smaller when alpha equals 0.2, which is also confirmed by the results from PaddleClas.
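One plausible intuition for why small alpha helps Mixup: Beta(0.2, 0.2) concentrates lambda near 0 and 1, so most batches are only lightly mixed, while Beta(1, 1) mixes uniformly. This can be checked with a quick Monte Carlo sketch (the threshold 0.4 is an arbitrary choice for "nearly unmixed"):

```python
import random

def extreme_fraction(alpha: float, n: int = 10000, seed: int = 0) -> float:
    """Fraction of lambda ~ Beta(alpha, alpha) draws with |lambda - 0.5| > 0.4,
    i.e. pairs that are left almost unmixed."""
    rng = random.Random(seed)
    return sum(abs(rng.betavariate(alpha, alpha) - 0.5) > 0.4
               for _ in range(n)) / n
```

For alpha=0.2 the fraction is well above half, whereas for alpha=1.0 (uniform) it is exactly the interval length, 0.2 in expectation.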
Thank you for sharing the results! They are great experiments.
Given your results, I agree that alpha should be 0.2 for Mixup in ImageNet experiments. If there's a chance to revise or extend our paper, this information would be very useful :)
At the same time, I'm curious about the CutMix result with alpha=1.0 in the PaddleClas table; I guess its performance would be better than with alpha=0.2.
Thanks!
I agree that alpha influences the performance of both CutMix and Mixup, and that it may have a greater impact on Mixup on ImageNet.
Thanks for your reply! :smiley: