
Comments (10)

qijinshe commented on August 23, 2024

I found that it can be solved by deleting "self.action_scale" in line 103 of model.py.
This term does not appear in the SAC paper. But I'm not sure; maybe I am missing something.
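
For context, here is a minimal sketch of where a scale term like this can come from. It assumes (the thread does not confirm this) that the action is computed as action_scale * tanh(x_t) + action_bias. Under that assumption, the change-of-variables correction uses the derivative of the whole map, which includes the scale; the formula in the paper corresponds to a scale of 1. All function names below are hypothetical.

```python
import math

def squash(x_t, action_scale=1.0, action_bias=0.0):
    """Assumed squashing map: scale and shift a tanh-squashed sample."""
    return action_scale * math.tanh(x_t) + action_bias

def analytic_derivative(x_t, action_scale=1.0):
    """d(action)/d(x_t) for the assumed map; note the action_scale factor."""
    return action_scale * (1.0 - math.tanh(x_t) ** 2)

def numeric_derivative(x_t, action_scale=1.0, h=1e-6):
    """Central finite difference, used only to check the analytic form."""
    return (squash(x_t + h, action_scale) - squash(x_t - h, action_scale)) / (2 * h)

if __name__ == "__main__":
    for scale in (1.0, 2.0):
        print(scale, analytic_derivative(0.3, scale), numeric_derivative(0.3, scale))
```

If the environment's actions already lie in [-1, 1], the scale is 1 and the term has no effect, so removing it would only change behavior for other action ranges.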

from pytorch-soft-actor-critic.

chelydrae commented on August 23, 2024

I am facing the same problem. It is not solved yet, but I can locate the issue:

In GaussianPolicy -> sample(): The policy output x_t is transformed by tanh(x_t). If you look at the shape of tanh(), you will find that it is approximately 1 or -1 for most arguments, apart from a small region around zero (roughly between -5 and 5). As a result, we receive a clipped y_t, and the actions are "fixed" to the action_space constraints. The algorithm stops exploring and therefore keeps increasing the temperature factor alpha. This leads first to an exploding temperature factor and second to an exploding critic loss.
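
A minimal sketch of the saturation effect described above, in plain Python. It assumes the usual tanh-squashed Gaussian log-probability with a small stabilizing constant; the names squash_correction, squashed_log_prob and EPS are hypothetical and are not taken from model.py.

```python
import math

EPS = 1e-6  # assumed small constant that keeps the log finite

def normal_log_prob(x, mean, std):
    """Log-density of a univariate Gaussian."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std) - 0.5 * math.log(2 * math.pi)

def squash_correction(x_t, action_scale=1.0, eps=EPS):
    """log of d(action)/d(x_t) for action = action_scale * tanh(x_t), plus eps."""
    y_t = math.tanh(x_t)
    return math.log(action_scale * (1.0 - y_t * y_t) + eps)

def squashed_log_prob(x_t, mean, std, action_scale=1.0):
    """Log-probability of the squashed action via change of variables."""
    return normal_log_prob(x_t, mean, std) - squash_correction(x_t, action_scale)

if __name__ == "__main__":
    for x_t in (0.0, 2.0, 5.0, 10.0):
        # As x_t grows, tanh saturates and the log-probability is pushed up.
        print(x_t, math.tanh(x_t), squashed_log_prob(x_t, mean=x_t, std=1.0))
```

When x_t sits deep in the saturated region, the correction term becomes strongly negative, so the reported log-probability is large and the estimated entropy is very low. That is consistent with the temperature being pushed upward.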

The solution should be to return x_t instead of action from the sample() method. However, this leads to the error shown below.

I am trying to solve this and will let you know if there are updates from my side. If you have any thoughts or input on this, please let me know.

Traceback (most recent call last):
  File "/Users/hammlerp/PycharmProjects/SupplyChainOptimization/src/Optimization/SoftActorCritic/sac_main.py", line 94, in <module>
    action = agent.select_action(state)  # Sample action from policy
  File "/Users/hammlerp/PycharmProjects/SupplyChainOptimization/src/Optimization/SoftActorCritic/sac.py", line 48, in select_action
    action, _, _ = self.policy.sample(state)
  File "/Users/hammlerp/PycharmProjects/SupplyChainOptimization/src/Optimization/SoftActorCritic/model.py", line 98, in sample
    normal = Normal(mean, std)
  File "/Users/hammlerp/opt/anaconda3/envs/SupplyChainOptimization/lib/python3.8/site-packages/torch/distributions/normal.py", line 50, in __init__
    super(Normal, self).__init__(batch_shape, validate_args=validate_args)
  File "/Users/hammlerp/opt/anaconda3/envs/SupplyChainOptimization/lib/python3.8/site-packages/torch/distributions/distribution.py", line 53, in __init__
    raise ValueError("The parameter {} has invalid values".format(param))
ValueError: The parameter loc has invalid values


chelydrae commented on August 23, 2024

> So the code is working on your side with the learned temperature and w/o modification?

Yes. It works without modification. If I have time in the evening I'll check LunarLander for you.


seneca-bit commented on August 23, 2024

Did you ever manage to solve this problem? I'm encountering a similar exploding temperature in my environment. No matter what target entropy value I choose, the policy eventually reaches that entropy and the temperature starts increasing...
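
To make the mechanism concrete, here is a minimal sketch of a learned-temperature update. It assumes the commonly used loss -log_alpha * mean(log_pi + target_entropy) and a plain gradient-descent step; the function name alpha_step and the learning rate are hypothetical and are not taken from the repository.

```python
import math

def alpha_step(log_alpha, log_probs, target_entropy, lr=0.01):
    """One gradient-descent step on loss = -log_alpha * mean(log_pi + target_entropy)."""
    mean_log_pi = sum(log_probs) / len(log_probs)
    grad = -(mean_log_pi + target_entropy)  # d(loss)/d(log_alpha)
    return log_alpha - lr * grad

if __name__ == "__main__":
    log_alpha = 0.0
    # Entropy estimate (-mean log_pi = -12.0) stays below the target (-1.0),
    # so every step raises log_alpha and alpha grows without bound.
    for _ in range(100):
        log_alpha = alpha_step(log_alpha, [12.0, 12.0], target_entropy=-1.0)
    print(math.exp(log_alpha))
```

In this sketch alpha only stops growing once the policy's entropy estimate rises to the target. If something keeps the entropy pinned below it, the temperature keeps increasing.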


reubenwong97 commented on August 23, 2024

Hi, not at the moment. I also compared the implementation to OpenAI's baselines and experimented with theirs but with similar results. Working on multiple things at the moment but will provide an update if I find anything.


thomashirtz commented on August 23, 2024

> this is equal to 1 or -1 for most of the arguments besides a small area between -5 and 5

Can you explain a little more what you meant by that? I am also trying to fix it on my side.


thomashirtz commented on August 23, 2024

> Hi, not at the moment. I also compared the implementation to OpenAI's baselines and experimented with theirs but with similar results. Working on multiple things at the moment but will provide an update if I find anything.

Does the OpenAI implementation also have the exploding temperature?


chelydrae commented on August 23, 2024

> this is equal to 1 or -1 for most of the arguments besides a small area between -5 and 5

> Can you explain a little more what you meant by that? I am also trying to fix it on my side.

The issue is solved on my side. Are you using a custom environment? My problem was related to the environment. Make sure to penalize your agent when it proposes a value outside the desired interval.
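
One way such a penalty could look is sketched below. This is a hypothetical illustration, not the environment used in this thread; the function name penalized_reward, the linear penalty, and penalty_weight are all assumptions.

```python
def penalized_reward(base_reward, action, low, high, penalty_weight=1.0):
    """Subtract a penalty proportional to how far each action component
    lies outside the interval [low, high]."""
    overshoot = sum(max(low - a, 0.0) + max(a - high, 0.0) for a in action)
    return base_reward - penalty_weight * overshoot

if __name__ == "__main__":
    print(penalized_reward(1.0, [0.5, -0.5], -1.0, 1.0))       # inside the interval
    print(penalized_reward(1.0, [1.5, -2.0], -1.0, 1.0, 2.0))  # outside on both components
```

In a custom environment, a function like this would be called inside the step logic before returning the reward.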


thomashirtz commented on August 23, 2024

> this is equal to 1 or -1 for most of the arguments besides a small area between -5 and 5

> Can you explain a little more what you meant by that? I am also trying to fix it on my side.

> The issue is solved on my side. Are you using a custom environment? My problem was related to the environment. Make sure to punish your agent when it proposes a value outside the desired interval.

I am confused. I thought the aim of the squashed Gaussian was to keep actions from going outside the interval? So the code is working on your side with the learned temperature and w/o modification?

(I am trying to use it on 'LunarLanderContinuous-v2' right now; the scores hover around 0, and to solve it you need a score above 200.)


thomashirtz commented on August 23, 2024

Oh, my bad, it is working 😶 Thank you though :)

