
Comments (4)

SHMCU commented on August 17, 2024

I think that to make mean teacher work, you have to set consistency_weight to a positive value. On the mean teacher PyTorch page it is set to 100.0, and logit_distance_cost is set to 0.01 for the CIFAR-10 experiment. I believe these settings are necessary to make mean teacher work.

from mean-teacher.
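As a rough sketch (plain Python, with made-up loss values standing in for real batch losses; the function name and defaults here are illustrative, not the repo's exact API), the weights discussed above combine into the training loss roughly like this:

```python
def total_loss(class_loss, consistency_loss, res_loss,
               consistency_weight=100.0, logit_distance_cost=0.01):
    """Combine the mean-teacher loss terms (illustrative sketch).

    class_loss:       supervised cross-entropy on labeled examples.
    consistency_loss: student-vs-teacher prediction distance.
    res_loss:         distance between the student's two output heads
                      (only used when logit_distance_cost > 0).
    """
    loss = class_loss + consistency_weight * consistency_loss
    if logit_distance_cost > 0:
        loss += logit_distance_cost * res_loss
    return loss

# With consistency_weight = 0 the teacher has no influence on the loss:
print(total_loss(0.7, 0.05, 0.2,
                 consistency_weight=0.0,
                 logit_distance_cost=0.0))  # -> 0.7
```

This makes the point of the comment concrete: with consistency_weight left at zero, the consistency term vanishes and training reduces to an ordinary supervised convnet.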

rracinskij commented on August 17, 2024

It looks like logit_distance_cost should be set to a positive value only if the student model has two outputs.
And yes, the total loss depends on the teacher model only if consistency_weight is non-zero. But then the accuracy of my minimalistic MNIST implementation is lower than that of a single convnet.


tarvaina commented on August 17, 2024

Hi,

So if I understood correctly, your dataset is MNIST with 1000 labeled and 59000 unlabeled examples? And you are using a convolutional network with mean teacher and comparing the results against a bare convolutional network?

Yes, you should set consistency > 0. The best value for consistency may depend on the dataset, the mix of unlabeled/labeled examples per batch, and other things. A bad consistency cost can lead to worse performance than not using one at all. The ema_decay parameter may also affect performance a lot. See Figure 4 in the paper for what these look like for SVHN.
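For context, ema_decay controls how fast the teacher tracks the student. A minimal sketch of that exponential-moving-average update, in plain Python over flat weight lists (real implementations update parameter tensors in place):

```python
def ema_update(teacher_weights, student_weights, ema_decay=0.999):
    """Move each teacher weight toward the matching student weight.

    teacher = ema_decay * teacher + (1 - ema_decay) * student
    A decay close to 1.0 makes the teacher change slowly, averaging
    over many student steps; a small decay makes it follow closely.
    """
    return [ema_decay * t + (1.0 - ema_decay) * s
            for t, s in zip(teacher_weights, student_weights)]

teacher = [0.0, 0.0]
student = [1.0, 2.0]
teacher = ema_update(teacher, student, ema_decay=0.9)
print(teacher)  # close to [0.1, 0.2]
```

This is why ema_decay matters so much: it sets the effective averaging window of the teacher.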

At the beginning of training, the labeled examples are much more useful than the unlabeled examples. If you have a high consistency cost at the start, it may hurt learning. There are two ways around this: either use a consistency ramp-up, or use logit_distance_cost > 0 (and yes, two outputs from the network). These are also hyperparameters that may require tuning.
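The consistency ramp-up mentioned above can be sketched as follows; this uses the exp(-5 * (1 - t)^2) sigmoid-shaped schedule from the mean-teacher code, though the function and parameter names here are illustrative:

```python
import math

def sigmoid_rampup(current_epoch, rampup_length=5):
    """Ramp a factor from ~0 up to 1 over the first epochs.

    Early in training the factor is near exp(-5) ~ 0.007, so the
    consistency cost has little effect; after rampup_length epochs
    it reaches 1 and the full consistency_weight applies.
    """
    if rampup_length == 0:
        return 1.0
    t = min(max(current_epoch / rampup_length, 0.0), 1.0)
    return math.exp(-5.0 * (1.0 - t) ** 2)

consistency_weight = 100.0
for epoch in [0, 1, 3, 5]:
    print(epoch, consistency_weight * sigmoid_rampup(epoch))
```

Multiplying the consistency weight by this factor lets the labeled loss dominate early, which is exactly the failure mode the ramp-up is meant to avoid.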

See also https://github.com/CuriousAI/mean-teacher#tips-for-choosing-hyperparameters-and-other-tuning if you didn't already.


DISAPPEARED13 commented on August 17, 2024

Hi there,
I noticed this problem too. Since the paper mentions only two losses to optimize (the classification loss and the consistency loss), in what situation does the student model have two outputs? As far as I can tell, the difference between the outputs is that they go through different fc layers. Is this for representation learning or something similar?

Thanks a lot!
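For context, the two-output setup being asked about can be sketched in plain Python: two separate fc layers on top of the same shared representation, with the logit distance cost penalizing disagreement between them. (Which head feeds which loss, and all names here, are illustrative; in the mean-teacher code one head is used for the classification loss and the other for the consistency loss.)

```python
def linear(weights, features):
    """One fully connected layer: each weight row produces one logit."""
    return [sum(w * f for w, f in zip(row, features)) for row in weights]

def two_head_student(shared_features, fc_class, fc_cons):
    """Two output heads over the same shared representation.

    The residual loss (res_loss) is the mean squared distance between
    the two heads' logits; weighting it by logit_distance_cost keeps
    the heads from drifting apart while letting them specialize.
    """
    class_logits = linear(fc_class, shared_features)
    cons_logits = linear(fc_cons, shared_features)
    res_loss = sum((a - b) ** 2
                   for a, b in zip(class_logits, cons_logits)) / len(class_logits)
    return class_logits, cons_logits, res_loss
```

With identical fc layers the residual loss is zero; as the heads diverge, the penalty grows.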

