
Comments (9)

pengzhenghao commented on August 18, 2024

I think in this implementation they use softmax as the output activation when sampling actions.
Looking at the code below, you can see they originally tried an argmax-style activation via return CategoricalPdType(ac_space.n) (now commented out) when sampling, but in the end they use the softmax version when training the Q network.

def make_pdtype(ac_space):
    from gym import spaces
    if isinstance(ac_space, spaces.Box):
        assert len(ac_space.shape) == 1
        return DiagGaussianPdType(ac_space.shape[0])
    elif isinstance(ac_space, spaces.Discrete):
        # return CategoricalPdType(ac_space.n)
        return SoftCategoricalPdType(ac_space.n)
    elif isinstance(ac_space, spaces.MultiDiscrete):
        #return MultiCategoricalPdType(ac_space.low, ac_space.high)
        return SoftMultiCategoricalPdType(ac_space.low, ac_space.high)
    elif isinstance(ac_space, spaces.MultiBinary):
        return BernoulliPdType(ac_space.n)
    else:
        raise NotImplementedError


djbitbyte commented on August 18, 2024

Hello, @pengzhenghao!

I've looked into the functions involved again. I think they use SoftCategoricalPdType(ac_space.n), then SoftCategoricalPdType.sample() to add noise to the logits, and finally output softmax(logits - noise) from the actor network.

And the noise added to the action comes from:

def sample(self):
    u = tf.random_uniform(tf.shape(self.logits))
    return U.softmax(self.logits - tf.log(-tf.log(u)), axis=-1)

I don't quite get why they handle the noise in this way.
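For reference, here is a minimal NumPy sketch (an editorial illustration, not code from the repository) of what that noise term is doing. With u ~ Uniform(0, 1), the quantity -log(-log(u)) is a standard Gumbel(0, 1) sample, so logits - log(-log(u)) is just logits plus Gumbel noise; taking an argmax of that gives an exact categorical sample (the Gumbel-max trick), and replacing the argmax with a softmax gives the differentiable relaxation used here.

import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)                      # the categorical distribution we want to sample from

# Gumbel(0, 1) noise: u ~ Uniform(0, 1), g = -log(-log(u)).
# This matches the "logits - tf.log(-tf.log(u))" term in sample() above.
u = rng.uniform(size=(100_000, 3))
g = -np.log(-np.log(u))

# Gumbel-max trick: argmax(logits + g) is an exact draw from Categorical(probs).
hard = np.argmax(logits + g, axis=-1)
print(np.bincount(hard) / len(hard))         # empirical frequencies, close to probs
print(probs)

# Gumbel-softmax relaxation: swap the argmax for a temperature-controlled softmax,
# giving a nearly one-hot but differentiable vector instead of a discrete index.
tau = 1.0
print(softmax((logits + g[0]) / tau))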


djbitbyte commented on August 18, 2024

The sample function in the distribution is an implementation of Gumbel-softmax. I added it to my code and it helps speed up and stabilize the training, but my speaker still cannot distinguish the different landmarks.

How do you handle the action exploration then?


LiuQiangOpenMind commented on August 18, 2024

Hello, @pengzhenghao!
I don't quite get why they handle the noise in the form of a log-log link function. Since the log-log link function is non-linear, the noise randomly generated each time can fluctuate; how do you control the degree of noise to ensure adequate action exploration?


pengzhenghao commented on August 18, 2024

> Hello, @pengzhenghao!
> I don't quite get why they handle the noise in the form of a log-log link function. Since the log-log link function is non-linear, the noise randomly generated each time can fluctuate; how do you control the degree of noise to ensure adequate action exploration?

The Gumbel-Softmax trick is an important re-parameterization trick that can help smooth back-propagation by making the sampling step differentiable. I suggest searching for the keyword "gumbel softmax" for more information. Sorry for not providing more detail, as I do not thoroughly understand the whole Gumbel-softmax process myself...
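To make the re-parameterization point concrete, here is a small PyTorch sketch (an editorial illustration, not code from this thread): the soft Gumbel-softmax sample keeps a gradient path back to the logits, while a hard argmax sample does not.

import torch
import torch.nn.functional as F

logits = torch.randn(4, requires_grad=True)

# Differentiable (soft) sample: logits plus Gumbel noise pushed through a softmax,
# so gradients flow from the sampled vector back into the logits.
soft = F.gumbel_softmax(logits, tau=1.0, hard=False)
soft.sum().backward()
print(logits.grad)                  # non-None: the policy can be trained through the sample

# A plain argmax over noisy logits is an exact categorical sample, but it is a
# discrete index with no gradient path, which is why a deterministic-policy-gradient
# method like MADDPG needs the relaxation instead.
g = torch.distributions.Gumbel(0.0, 1.0).sample(logits.shape)
print(torch.argmax(logits.detach() + g))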


Ah31 commented on August 18, 2024

Hello @djbitbyte!

You said that Gumbel-softmax helps to speed up and stabilize the training. I am trying to reproduce the results in PyTorch and am using torch.nn.functional._gumbel_softmax_sample when sampling the action for the current state, as follows:

[screenshot of the action-sampling code omitted]
I am also using torch.nn.functional.gumbel_softmax to compute target actions for the next states and to compute the current agent's action that is fed into actor_local.
Based on the original code and the algorithm, I cannot understand why training does not converge once I use gumbel_softmax.

Thanks in advance!
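For context, a rough sketch of where those two calls typically sit in a MADDPG-style update; the tiny actor networks below are placeholders for illustration only, not code from this thread or from the repository.

import torch
import torch.nn.functional as F

# Placeholder networks: observation (dim 8) -> action logits (5 discrete actions).
actor = torch.nn.Linear(8, 5)
target_actor = torch.nn.Linear(8, 5)

obs = torch.randn(32, 8)
next_obs = torch.randn(32, 8)

# Target action for the critic's TD target: no gradient is needed through it,
# so a hard (one-hot) sample under no_grad is enough.
with torch.no_grad():
    target_action = F.gumbel_softmax(target_actor(next_obs), tau=1.0, hard=True)

# Action used in the actor loss: it must stay differentiable with respect to the
# actor's parameters, so the (straight-through) Gumbel-softmax output is fed
# directly into the centralized critic when maximizing its Q-value.
current_action = F.gumbel_softmax(actor(obs), tau=1.0, hard=True)
print(target_action.shape, current_action.shape)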


Ah31 commented on August 18, 2024

Hello!
Just to mention that there were many other issues in my code, apart from the Gumbel-softmax, because of which the training was not converging.


kargarisaac commented on August 18, 2024

> Hello!
> Just to mention that there were many other issues in my code, apart from the Gumbel-softmax, because of which the training was not converging.

Hi, I'm trying to understand how to use gumbel_softmax in PyTorch to reproduce the results. I'm using PPO, but it cannot even fully learn the task with only one agent and one landmark. It reaches a reasonable level, but it is not nearly as good as MADDPG. I think the problem is the plain softmax and Categorical distribution I use, and I want to change it to Gumbel-softmax. I used:

policy_dist = distributions.Categorical(F.gumbel_softmax(policy_logits_out, tau=1, hard=False).to("cpu"))

But I didn't get good results. There is also a distributions.Gumbel in PyTorch. I think I am using them incorrectly.

Can you provide an example to use them in your own algorithm?

Thank you
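As an editorial aside (not a reply from the thread): torch.distributions.Gumbel can be used to build the same relaxed sample by hand, and the output of gumbel_softmax is normally used directly as the relaxed one-hot action rather than being wrapped in another Categorical, which would sample a second time. A rough sketch, assuming nothing beyond standard PyTorch:

import torch
import torch.nn.functional as F

logits = torch.randn(3, requires_grad=True)
tau = 1.0

# Building the relaxed sample manually with torch.distributions.Gumbel ...
g = torch.distributions.Gumbel(0.0, 1.0).sample(logits.shape)
manual = F.softmax((logits + g) / tau, dim=-1)

# ... is, up to the particular noise draw, what F.gumbel_softmax computes.
builtin = F.gumbel_softmax(logits, tau=tau, hard=False)

# Both are probability-like vectors that can serve directly as a relaxed
# one-hot action; re-sampling them through distributions.Categorical would
# discard the gradient path back to the logits.
print(manual, builtin)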


tanxiangtj commented on August 18, 2024

> The sample function in the distribution is an implementation of Gumbel-softmax. I added it to my code and it helps speed up and stabilize the training, but my speaker still cannot distinguish the different landmarks.
>
> How do you handle the action exploration then?

Can you provide the code of your implementation of Gumbel-softmax? I am running into the same problem when using MADDPG.
Many thanks.

