
ssda_mme's People

Contributors

dependabot[bot], ksaito-ut


SSDA_MME's Issues

Hyper-parameters used in the t-SNE plot

Dear Authors,
Thank you for making the code open source! I had a small question while reading the paper: could you let me know the approximate hyper-parameters used for the t-SNE embeddings shown in the paper, i.e. the perplexity, number of iterations, and learning rate, among others? I am plotting the t-SNE using this function. Please let me know if that would be possible.
Thanks,
Megh
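For reference while waiting on the authors: the paper does not appear to state these hyper-parameters, but scikit-learn's defaults (perplexity 30, roughly 1000 iterations) are a common starting point. A minimal, purely illustrative sketch on dummy features:

```python
import numpy as np
from sklearn.manifold import TSNE

# Dummy stand-in for extracted features; in practice these would be
# the feature extractor's outputs for source/target samples.
rng = np.random.RandomState(0)
features = rng.randn(100, 16)

# perplexity=30 is scikit-learn's default; init="pca" tends to give
# more stable embeddings. Generic choices, not the paper's settings.
tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=0)
embedding = tsne.fit_transform(features)
print(embedding.shape)  # (100, 2)
```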

Cannot reproduce the results on DomainNet when K = 1

Thanks for sharing your project!
Using the data splits and hyper-parameter settings from the paper, I can obtain similar classification performance on DomainNet when K = 3, but not when K = 1 (e.g., on P to R I get 73.65 accuracy instead of the 76.1 reported in the paper; on R to S I get 59.655 instead of the 61.0 reported in the paper).

Am I missing some important details?

Gradient reversal layer

Hi,
thanks for publishing the code.
I have a question regarding training with the gradient reversal layer.
As I understand from the code (main.py), training consists of two steps:

  1. you update the feature extractor and classifier with respect to the labeled data;
  2. you update the feature extractor and classifier with respect to the unlabeled data, using the gradient reversal layer.

My question is whether two steps are needed, since in the paper you mention it is done in one step.
Couldn't it be done like this:

  • loss.backward() with labeled data
  • t_loss.backward() with unlabeled data
  • update the model - optimizer.step()

Thanks in advance
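On the proposed variant above: PyTorch accumulates gradients across backward() calls, so calling loss.backward() and t_loss.backward() followed by a single optimizer.step() is numerically identical to one backward() on the summed loss. A minimal sketch with toy stand-in losses (not the repo's actual objectives):

```python
import torch

torch.manual_seed(0)
w = torch.randn(3, requires_grad=True)
x = torch.randn(3)

# Two backward() calls accumulate into w.grad (toy stand-ins for the
# labeled-data loss and the unlabeled-data entropy loss).
loss_a = (w * x).sum()
loss_b = (w ** 2).sum()
loss_a.backward()
loss_b.backward()
grad_two_calls = w.grad.clone()

# One backward() on the summed loss yields the same gradient.
w.grad = None
((w * x).sum() + (w ** 2).sum()).backward()
grad_one_call = w.grad.clone()

assert torch.allclose(grad_two_calls, grad_one_call)
```

Note that this equivalence holds only if no optimizer step is taken between the two backward() calls; if the labeled-data step updates the weights before the unlabeled batch is processed (as a two-step loop would), the second forward pass sees different weights, so the two schemes are not strictly identical.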

Reproducing the results for Resnet-34 in real-to-sketch adaptation

Hello, thank you for sharing the code.

Could you publish the commands for training the model to reproduce the Table 1 results in the paper for the 3-shot setting? I ran training with "labeled_source_images_real.txt" as source annotations, "labeled_target_images_sketch_3.txt" as the labeled target, and "unlabeled_target_images_sketch_3.txt" for domain adaptation. The model's performance was tested on "unlabeled_target_images_sketch_3.txt".

For ResNet-34 I obtained the following results:
ACC All 62.775959, ACC Averaged over Classes 63.848788
Table 1 of the paper states that the model's performance is 72.2%.

Maybe I made a mistake in the training setup. Thank you.
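For what it's worth, the repository's README suggests a command along these lines for this setting; the exact flag names below are assumptions reconstructed from main.py's argument parser, so please verify against `python main.py --help`:

```shell
# Assumed flags -- verify against the repo's README / main.py --help.
python main.py --dataset multi --source real --target sketch \
    --net resnet34 --num 3 --method MME
```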

How does maximizing the entropy of unlabeled data w.r.t. the classifier work?

First, thanks for your sharing code!

Actually, I am a little confused about how maximizing the entropy of unlabeled data w.r.t. the classifier works, as described in Section 3.2 (training objectives).
First, you train F (the feature extractor) and C (the classifier) on labeled data with a cross-entropy minimization objective; it is intuitive that the prototype of a class (e.g., class A) will end up at the center of that class's feature distribution.

In the second step, you mention that maximizing the entropy of unlabeled data w.r.t. the classifier pushes all the prototypes (representative points) toward the feature distribution of the target domain.
That is correct, but how can you ensure that the prototype of class A (initially at the center of the source class-A feature distribution) will be pushed to the center of the target class-A feature distribution?
I ask because the figures in your paper show each class-specific prototype being pushed to the class-specific center of the feature distribution in the target domain.

Or is class-specific updating unnecessary (or unachievable) at this stage, with the next step minimizing the entropy w.r.t. the feature extractor instead?

If so, do you think first minimizing the entropy w.r.t. the feature extractor and then maximizing it w.r.t. the classifier would be better? Or does the order not matter, since the minimax is applied alternately?

It is easy and intuitive to see that alternating minimax training can refine performance, but I am still confused about how the first entropy-maximization step works (pushing prototypes to class-specific target centers).
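For context, my reading of the paper's minimax objectives (notation reconstructed from memory, so treat this as a sketch rather than a verbatim quote): with $\mathcal{L}$ the cross-entropy loss on labeled data and $H$ the entropy of the classifier's predictions on unlabeled target data,

```latex
% Mean prediction entropy on unlabeled target data D_u (K classes)
H = -\mathbb{E}_{x \in \mathcal{D}_u} \sum_{y=1}^{K} p(y \mid x) \log p(y \mid x)

% Adversarial updates: C maximizes H (prototypes move toward target
% features); F minimizes H (target features cluster around prototypes).
\hat{\theta}_C = \operatorname*{arg\,min}_{\theta_C} \; \mathcal{L} - \lambda H,
\qquad
\hat{\theta}_F = \operatorname*{arg\,min}_{\theta_F} \; \mathcal{L} + \lambda H
```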

Confused about the ENT and AdENT

SSDA_MME/main.py

Lines 195 to 204 in 81c3a9c

if args.method == 'ENT':
    loss_t = entropy(F1, output, args.lamda)
    loss_t.backward()
    optimizer_f.step()
    optimizer_g.step()
elif args.method == 'MME':
    loss_t = adentropy(F1, output, args.lamda)
    loss_t.backward()
    optimizer_f.step()
    optimizer_g.step()

SSDA_MME/utils/loss.py

Lines 28 to 41 in 81c3a9c

def entropy(F1, feat, lamda, eta=1.0):
    out_t1 = F1(feat, reverse=True, eta=-eta)
    out_t1 = F.softmax(out_t1)
    loss_ent = -lamda * torch.mean(torch.sum(out_t1 *
                                             (torch.log(out_t1 + 1e-5)), 1))
    return loss_ent


def adentropy(F1, feat, lamda, eta=1.0):
    out_t1 = F1(feat, reverse=True, eta=eta)
    out_t1 = F.softmax(out_t1)
    loss_adent = lamda * torch.mean(torch.sum(out_t1 *
                                              (torch.log(out_t1 + 1e-5)), 1))
    return loss_adent

Thank you for your code.
From the code it seems that
the ENT method tries to minimize entropy on the classifier but maximize it on the feature extractor, while
the AdENT method tries to maximize entropy on the classifier but minimize it on the feature extractor, which is what your paper proposes.

BUT, in the paper the ENT baseline seems to be described as minimizing entropy on both the classifier and the feature extractor, as in Yves Grandvalet and Yoshua Bengio, "Semi-supervised learning by entropy minimization," NIPS 2005.

So I'm very confused about this. I'm looking forward to hearing from you.
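On the sign conventions alone, here is a small check one can run: as coded in utils/loss.py, entropy() returns +λ·H and adentropy() returns −λ·H (with H the usual positive entropy), so minimizing the former minimizes the prediction entropy while minimizing the latter maximizes it. What the reversal layer then does to the feature extractor is a separate question; my reading (an assumption about grad_reverse worth verifying) is that passing eta=-eta in entropy() cancels the reversal, so under ENT both networks minimize entropy, consistent with Grandvalet & Bengio. The NumPy check below verifies only the sign of the loss values:

```python
import numpy as np

def softmax(z):
    # Numerically stable row-wise softmax.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

lamda = 0.1
logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 0.3]])
p = softmax(logits)

# Entropy of the predictions (the usual positive quantity H).
H = -np.mean(np.sum(p * np.log(p + 1e-5), axis=1))

# Mirrors utils/loss.py:
#   entropy()   = -lamda * mean(sum(p log p)) = +lamda * H
#   adentropy() = +lamda * mean(sum(p log p)) = -lamda * H
loss_ent = -lamda * np.mean(np.sum(p * np.log(p + 1e-5), axis=1))
loss_adent = lamda * np.mean(np.sum(p * np.log(p + 1e-5), axis=1))

assert np.isclose(loss_ent, lamda * H)
assert np.isclose(loss_adent, -lamda * H)
```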
