
ssda_mme's People

Contributors

dependabot[bot], ksaito-ut


SSDA_MME's Issues

Hyper-parameters used in the t-SNE plot

Dear Authors,
Thank you for making the code open source! I had a small question while reading the paper: could you let me know the approximate hyper-parameters used for the t-SNE embeddings shown in the paper, i.e. the perplexity, number of iterations, and learning rate, among others? I am plotting the t-SNE using this function. Please let me know if that would be possible.
Thanks,
Megh
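For reference while waiting on the authors: the paper does not appear to state these hyper-parameters, but scikit-learn's defaults (perplexity 30, roughly 1000 iterations) are a common starting point. A minimal, purely illustrative sketch on dummy features:

```python
import numpy as np
from sklearn.manifold import TSNE

# Dummy stand-in for extracted features; in practice these would be
# the feature extractor's outputs for source/target samples.
rng = np.random.RandomState(0)
features = rng.randn(100, 16)

# perplexity=30 is scikit-learn's default; init="pca" tends to give
# more stable embeddings. Generic choices, not the paper's settings.
tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=0)
embedding = tsne.fit_transform(features)
print(embedding.shape)  # (100, 2)
```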

Cannot reproduce the results on DomainNet when K = 1

Thanks for sharing your project!
Using the data splits and hyper-parameter settings from the paper, I can obtain similar classification performance on DomainNet when K = 3, but not when K = 1 (e.g., on P to R I get 73.65 accuracy instead of the 76.1 reported in the paper; on R to S I get 59.655 instead of the 61.0 reported in the paper).

Am I missing some important details?

Gradient reversal layer

Hi,
thanks for publishing the code.
I have a question regarding training with the gradient reversal layer.
As I understand from the code (main.py), training consists of two steps:

  1. you update the feature extractor and classifier with respect to the labeled data;
  2. you update the feature extractor and classifier with respect to the unlabeled data, using the gradient reversal layer.

My question is whether two steps are needed, since in the paper you mention it is done in one step.
Couldn't it be done like this:

  • loss.backward() with labeled data
  • t_loss.backward() with unlabeled data
  • update the model - optimizer.step()

Thanks in advance
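On the proposed variant above: PyTorch accumulates gradients across backward() calls, so calling loss.backward() and t_loss.backward() followed by a single optimizer.step() is numerically identical to one backward() on the summed loss. A minimal sketch with toy stand-in losses (not the repo's actual objectives):

```python
import torch

torch.manual_seed(0)
w = torch.randn(3, requires_grad=True)
x = torch.randn(3)

# Two backward() calls accumulate into w.grad (toy stand-ins for the
# labeled-data loss and the unlabeled-data entropy loss).
loss_a = (w * x).sum()
loss_b = (w ** 2).sum()
loss_a.backward()
loss_b.backward()
grad_two_calls = w.grad.clone()

# One backward() on the summed loss yields the same gradient.
w.grad = None
((w * x).sum() + (w ** 2).sum()).backward()
grad_one_call = w.grad.clone()

assert torch.allclose(grad_two_calls, grad_one_call)
```

Note that this equivalence holds only if no optimizer step is taken between the two backward() calls; if the labeled-data step updates the weights before the unlabeled batch is processed (as a two-step loop would), the second forward pass sees different weights, so the two schemes are not strictly identical.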

Reproducing the results for Resnet-34 in real-to-sketch adaptation

Hello, thank you for sharing the code.

Could you publish the commands for training the model to reproduce the Table 1 results in the paper for the 3-shot setting? I ran training with "labeled_source_images_real.txt" as source annotations, "labeled_target_images_sketch_3.txt" as the labeled target, and "unlabeled_target_images_sketch_3.txt" for domain adaptation. The model's performance was tested on "unlabeled_target_images_sketch_3.txt".

For ResNet-34 I obtained the following results:
ACC All 62.775959, ACC Averaged over Classes 63.848788
Table 1 of the paper states that the model's performance is 72.2%.

Maybe I made a mistake in the training setup. Thank you.
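For what it's worth, the repository's README suggests a command along these lines for this setting; the exact flag names below are assumptions reconstructed from main.py's argument parser, so please verify against `python main.py --help`:

```shell
# Assumed flags -- verify against the repo's README / main.py --help.
python main.py --dataset multi --source real --target sketch \
    --net resnet34 --num 3 --method MME
```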

How does maximizing the entropy of unlabeled data w.r.t. the classifier work?

First, thanks for your sharing code!

Actually, I am a little confused about how maximizing the entropy of unlabeled data w.r.t. the classifier works, as described in Section 3.2 (training objectives).
First, you train F (the feature extractor) and C (the classifier) on labeled data with a cross-entropy minimization objective; it is intuitive that the prototype of a class (e.g., class A) will end up at the center of that class's feature distribution.

In the second step, you mention that maximizing the entropy of unlabeled data w.r.t. the classifier pushes all the prototypes (representative points) toward the feature distribution of the target domain.
That is correct, but how can you ensure that the prototype of class A (initially at the center of the source class-A feature distribution) will be pushed to the center of the target class-A feature distribution?
I ask because the figures in your paper show each class-specific prototype being pushed to the class-specific center of the feature distribution in the target domain.

Or is class-specific updating unnecessary (or unachievable) at this stage, with the next step minimizing the entropy w.r.t. the feature extractor instead?

If so, do you think first minimizing the entropy w.r.t. the feature extractor and then maximizing it w.r.t. the classifier would be better? Or does the order not matter, since the minimax is applied alternately?

It is easy and intuitive to see that alternating minimax training can refine performance, but I am still confused about how the first entropy-maximization step works (pushing prototypes to class-specific target centers).
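For context, my reading of the paper's minimax objectives (notation reconstructed from memory, so treat this as a sketch rather than a verbatim quote): with $\mathcal{L}$ the cross-entropy loss on labeled data and $H$ the entropy of the classifier's predictions on unlabeled target data,

```latex
% Mean prediction entropy on unlabeled target data D_u (K classes)
H = -\mathbb{E}_{x \in \mathcal{D}_u} \sum_{y=1}^{K} p(y \mid x) \log p(y \mid x)

% Adversarial updates: C maximizes H (prototypes move toward target
% features); F minimizes H (target features cluster around prototypes).
\hat{\theta}_C = \operatorname*{arg\,min}_{\theta_C} \; \mathcal{L} - \lambda H,
\qquad
\hat{\theta}_F = \operatorname*{arg\,min}_{\theta_F} \; \mathcal{L} + \lambda H
```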

Confused about the ENT and AdENT

SSDA_MME/main.py

Lines 195 to 204 in 81c3a9c

if args.method == 'ENT':
    loss_t = entropy(F1, output, args.lamda)
    loss_t.backward()
    optimizer_f.step()
    optimizer_g.step()
elif args.method == 'MME':
    loss_t = adentropy(F1, output, args.lamda)
    loss_t.backward()
    optimizer_f.step()
    optimizer_g.step()

SSDA_MME/utils/loss.py

Lines 28 to 41 in 81c3a9c

def entropy(F1, feat, lamda, eta=1.0):
    out_t1 = F1(feat, reverse=True, eta=-eta)
    out_t1 = F.softmax(out_t1)
    loss_ent = -lamda * torch.mean(torch.sum(out_t1 *
                                             (torch.log(out_t1 + 1e-5)), 1))
    return loss_ent


def adentropy(F1, feat, lamda, eta=1.0):
    out_t1 = F1(feat, reverse=True, eta=eta)
    out_t1 = F.softmax(out_t1)
    loss_adent = lamda * torch.mean(torch.sum(out_t1 *
                                              (torch.log(out_t1 + 1e-5)), 1))
    return loss_adent

Thank you for your code.
From the code it seems that
the ENT method tries to minimize entropy on the classifier but maximize it on the feature extractor, while
the AdENT method tries to maximize entropy on the classifier but minimize it on the feature extractor, which is what your paper proposes.

BUT, in the paper the ENT baseline seems to be described as minimizing entropy on both the classifier and the feature extractor, as in Yves Grandvalet and Yoshua Bengio, "Semi-supervised learning by entropy minimization," NIPS 2005.

So I'm very confused about this. I'm looking forward to hearing from you.
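On the sign conventions alone, here is a small check one can run: as coded in utils/loss.py, entropy() returns +λ·H and adentropy() returns −λ·H (with H the usual positive entropy), so minimizing the former minimizes the prediction entropy while minimizing the latter maximizes it. What the reversal layer then does to the feature extractor is a separate question; my reading (an assumption about grad_reverse worth verifying) is that passing eta=-eta in entropy() cancels the reversal, so under ENT both networks minimize entropy, consistent with Grandvalet & Bengio. The NumPy check below verifies only the sign of the loss values:

```python
import numpy as np

def softmax(z):
    # Numerically stable row-wise softmax.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

lamda = 0.1
logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 0.3]])
p = softmax(logits)

# Entropy of the predictions (the usual positive quantity H).
H = -np.mean(np.sum(p * np.log(p + 1e-5), axis=1))

# Mirrors utils/loss.py:
#   entropy()   = -lamda * mean(sum(p log p)) = +lamda * H
#   adentropy() = +lamda * mean(sum(p log p)) = -lamda * H
loss_ent = -lamda * np.mean(np.sum(p * np.log(p + 1e-5), axis=1))
loss_adent = lamda * np.mean(np.sum(p * np.log(p + 1e-5), axis=1))

assert np.isclose(loss_ent, lamda * H)
assert np.isclose(loss_adent, -lamda * H)
```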
