
Comments (8)

jik0730 avatar jik0730 commented on August 23, 2024

Hi, thanks for your interest in the code.
What about trying params['meta_learner.features.1.bn1.running_mean']?
The reason is that the code optimizes an instance of the 'MetaLearner' class, which stores the network in an attribute 'self.meta_learner'; as a result, every key in the parameter dictionary starts with the 'meta_learner.' prefix.
If there are other issues, let me know :)
Thank you!
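A minimal sketch of why the prefix appears (the module names here are illustrative, not the repo's actual architecture — any `nn.Module` that stores its network in an attribute behaves this way):

```python
import torch.nn as nn

# Hypothetical wrapper mirroring the structure described above: the optimized
# model holds the actual network in an attribute, so every state_dict key is
# prefixed with that attribute's name.
class MetaLearner(nn.Module):
    def __init__(self):
        super().__init__()
        self.meta_learner = nn.Sequential(
            nn.Conv2d(1, 4, kernel_size=3),
            nn.BatchNorm2d(4),
        )

    def forward(self, x):
        return self.meta_learner(x)

model = MetaLearner()
params = model.state_dict()

# Every key carries the attribute name as a prefix,
# e.g. 'meta_learner.1.running_mean' for the BatchNorm buffer.
assert all(k.startswith('meta_learner.') for k in params)
assert 'meta_learner.1.running_mean' in params
```

The same rule applies recursively, which is why the full key in the repo also includes the inner submodule names (`features`, `bn1`, ...).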

from maml-in-pytorch.

rshaojimmy avatar rshaojimmy commented on August 23, 2024

> Hi, thanks for your interest in the code.
> What about trying params['meta_learner.features.1.bn1.running_mean']?
> The reason is that the code optimizes an instance of the 'MetaLearner' class, which stores the network in an attribute 'self.meta_learner'; as a result, every key in the parameter dictionary starts with the 'meta_learner.' prefix.
> If there are other issues, let me know :)
> Thank you!

Yes, that solved my previous problem. But I have found another implementation noting that BatchNorm's running_mean and running_var carry no gradient, so these two should be excluded when computing theta_old and theta_new.

May I ask whether you agree with it? Thanks.

jik0730 avatar jik0730 commented on August 23, 2024

Yes, running_mean and running_var in BatchNorm have no gradient, since they are just buffers. And yes, those two should be excluded when computing theta.
In our implementation those buffers are never used, because we never switch the model to eval() mode. For each task, in training and evaluation alike, the batch norm layers use the current task's (mini-batch's) statistics in the forward pass. This has to be the case, since each task's data distribution may differ from the others'.
Thanks!
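The buffer-vs-parameter distinction can be checked directly in PyTorch (a small self-contained sketch, independent of this repo's code):

```python
import torch.nn as nn

bn = nn.BatchNorm2d(4)

# weight/bias are nn.Parameters and receive gradients; running_mean and
# running_var are registered buffers and never appear in bn.parameters().
param_names = {name for name, _ in bn.named_parameters()}
buffer_names = {name for name, _ in bn.named_buffers()}

assert param_names == {'weight', 'bias'}
assert {'running_mean', 'running_var'} <= buffer_names
assert not bn.running_mean.requires_grad

# So when computing theta_new = theta_old - lr * grad, iterating over
# named_parameters() excludes the buffers automatically.
```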

rshaojimmy avatar rshaojimmy commented on August 23, 2024

> For each task given in training or evaluation, in batch norm layer, it uses current task's (mini-batch's) statistics for forward path. This has to be true since for each task its data distribution may differ from other tasks.

Thanks for your reply!

Does this mean that the momentum for batch norm should be set to 1?

jik0730 avatar jik0730 commented on August 23, 2024

Yes, but only if you put the model into eval() mode. In our current code the momentum value does not matter, since we never call eval().
However, if you do use eval() mode, and test points arrive one by one, the momentum should be set to 1, which means evaluation will use the support set's batch norm statistics. That said, many meta-learning papers assume that each task (episode) provides both a support set and a query set.
I think the more correct way to benefit from this transductive setting is to forward the support and query sets together: update the meta-parameters to task-specific parameters with the support set, then put the model into eval() mode with momentum=1 to evaluate the query set.
Correct me if I'm wrong, thanks!
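That recipe can be sketched with toy tensors (the model and set sizes below are illustrative, not the repo's; with momentum=1, one forward pass over the support set in train mode overwrites the running statistics with the support set's statistics, which eval() then reuses for the query set):

```python
import torch
import torch.nn as nn

# Illustrative model: momentum=1 makes the running stats equal the last batch's stats.
model = nn.Sequential(nn.BatchNorm1d(8, momentum=1.0), nn.Linear(8, 5))

support = torch.randn(25, 8)   # e.g. a 5-way 5-shot support set
query = torch.randn(75, 8)

model.train()
_ = model(support)             # running stats <- support set statistics

# With momentum=1 the history is discarded: running_mean is exactly the support mean.
bn = model[0]
assert torch.allclose(bn.running_mean, support.mean(dim=0), atol=1e-5)

model.eval()
query_logits = model(query)    # query set normalized with the support statistics
```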

rshaojimmy avatar rshaojimmy commented on August 23, 2024

Yes, but I find that the momentum is set to 1 by default in your implementation, with the comment "NOTE we do not need to care about running_mean and var since momentum=1". However, the default value in PyTorch's batch norm is 0.1.

I wonder whether the value of the momentum will affect the process of meta-learning, and may I ask why you set the value of momentum to 1 by default?

Thank you very much.

jik0730 avatar jik0730 commented on August 23, 2024
  1. I wonder whether the value of the momentum will affect the process of meta-learning ...
    I'm pretty sure the momentum value will not affect the training of meta-learning (here, few-shot learning), because it is used only to compute the moving average of the batch-norm statistics, which are used in evaluation after calling model.eval().

  2. may I ask why you set the value of momentum by 1 by default?
    If we set momentum=1, the batch-norm statistics are set to the previous mini-batch's statistics, which is not desirable in a general supervised learning task (that's why the default value is 0.1). However, in the meta-learning context, in the meta-evaluation phase we want the batch-norm statistics to be computed only from the current task's support set, and then to compute predictions for the query set with those statistics. That's why we set momentum=1 by default.

I agree the NOTE in the code might be confusing, and it will be corrected soon. Thanks for asking! If you want further clarification on this issue, let me know.
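The effect of the momentum value can be verified directly. PyTorch's update rule is running_stat <- (1 - momentum) * running_stat + momentum * batch_stat, so the default 0.1 blends all past mini-batches while momentum=1 replaces the stats with the current batch's (a small sketch, independent of this repo's code):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x1, x2 = torch.randn(32, 8), torch.randn(32, 8)

bn_default = nn.BatchNorm1d(8)                 # momentum=0.1 (PyTorch default)
bn_replace = nn.BatchNorm1d(8, momentum=1.0)   # replace stats each batch

for bn in (bn_default, bn_replace):
    bn.train()
    bn(x1)
    bn(x2)

# momentum=1: running_mean is exactly the last batch's mean
assert torch.allclose(bn_replace.running_mean, x2.mean(dim=0), atol=1e-5)
# momentum=0.1: running_mean still mixes in x1's statistics
assert not torch.allclose(bn_default.running_mean, x2.mean(dim=0))
```

Either way, in train mode the forward pass itself always normalizes with the current batch's statistics; momentum only affects the buffers that eval() mode reads.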

rshaojimmy avatar rshaojimmy commented on August 23, 2024
>   1. I wonder whether the value of the momentum will affect the process of meta-learning ...
>     I'm pretty sure the momentum value will not affect the training of meta-learning (here, few-shot learning), because it is used only to compute the moving average of the batch-norm statistics, which are used in evaluation after calling model.eval().
>   2. may I ask why you set the value of momentum by 1 by default?
>     If we set momentum=1, the batch-norm statistics are set to the previous mini-batch's statistics, which is not desirable in a general supervised learning task (that's why the default value is 0.1). However, in the meta-learning context, in the meta-evaluation phase we want the batch-norm statistics to be computed only from the current task's support set, and then to compute predictions for the query set with those statistics. That's why we set momentum=1 by default.
>
> I agree the NOTE in the code might be confusing, and it will be corrected soon. Thanks for asking! If you want further clarification on this issue, let me know.

Noted! Thank you very much!
