
Comments (18)

maximecb commented on May 22, 2024

PPO does run with drop_last = False. The policy loss values seem normal. Action values don't seem to immediately saturate.

from pytorch-a2c-ppo-acktr-gail.

ikostrikov commented on May 22, 2024

Ok. Then I will add it, it might help.

Yes, this is a good idea. I will try to add it this week.


ikostrikov commented on May 22, 2024

One solution is just to process actions in such a way that they are in [-1, 1], for example to add an action wrapper that uses tanh to remap them. But it's a little bit surprising that the output is so large (because the network is initialized to produce outputs roughly in the range of [-1, 1]).

Do you normalize the inputs?


maximecb commented on May 22, 2024

What do you mean by normalize the inputs?

Regarding using tanh, do you think it would be a good idea to add code to always pass the outputs through tanh, and then remap that to the range of the action space?


ikostrikov commented on May 22, 2024

So they have zero mean and unit variance (also then you need to remove / 255 from the model).

Yes, but just as an action wrapper for gym.

In any case, I'm going to add an optional normalization for image inputs probably today.
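
For reference, a minimal sketch of zero-mean, unit-variance normalization for image observations, assuming a batch of raw uint8 frames in [0, 255]. The helper name `normalize_images` and the use of per-batch statistics are assumptions made for illustration, not the repository's code; a running estimate over many batches is another common choice.

```python
import numpy as np

def normalize_images(batch, eps=1e-8):
    """Shift and scale a batch of images to zero mean and unit variance.

    `batch` is assumed to hold raw pixel values in [0, 255]. Statistics are
    taken over the whole batch, which is an illustrative choice.
    """
    batch = batch.astype(np.float64)
    mean = batch.mean()
    std = batch.std()
    # eps guards against division by zero for a constant image batch.
    return (batch - mean) / (std + eps)

# Hypothetical batch of 4 RGB frames of size 8x8.
rng = np.random.default_rng(0)
frames = rng.integers(0, 256, size=(4, 3, 8, 8), dtype=np.uint8)
normalized = normalize_images(frames)
```

With inputs prepared this way, a fixed `/ 255` inside the model would rescale already-normalized values, which is why it would need to be removed.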


maximecb commented on May 22, 2024

My images are in [0, 255], and not normalized.

The CNNPolicy takes the gym action_space as an input, so it should probably have the action scaling code.


maximecb commented on May 22, 2024

Thanks for being so responsive. Your code is the best PyTorch implementation of A2C/ACKTR I found. :)


ikostrikov commented on May 22, 2024

Thanks! I'm glad that this code can be useful.

At the moment, you can use this wrapper (I want to keep my code as consistent as possible with OpenAI implementations, but I'm going to add it to a new branch at some point).

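import gym
import numpy as np

class ScaleActions(gym.ActionWrapper):
    """Squash raw policy outputs with tanh, then rescale them to the action bounds."""
    def __init__(self, env=None):
        super(ScaleActions, self).__init__(env)

    # Older gym releases route step() to _step(); newer ones expect step() or action().
    def _step(self, action):
        action = (np.tanh(action) + 1) / 2 * (self.action_space.high - self.action_space.low) + self.action_space.low
        return self.env.step(action)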
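
The remapping used by the wrapper can be checked in isolation, without a gym environment. This is only a sketch: the helper name `scale_action` and the bounds `low` and `high` are made up for illustration.

```python
import numpy as np

def scale_action(raw, low, high):
    """Squash an unbounded action with tanh, then rescale it to [low, high]."""
    unit = (np.tanh(raw) + 1.0) / 2.0   # tanh output in [-1, 1] mapped to [0, 1]
    return unit * (high - low) + low

# Hypothetical bounds for a two-dimensional action space.
low = np.array([-2.0, 0.0])
high = np.array([2.0, 1.0])

# Very negative, zero, and very positive raw network outputs.
raw = np.array([[-1e3, -1e3], [0.0, 0.0], [1e3, 1e3]])
scaled = scale_action(raw, low, high)
```

Whatever the raw output is, the scaled action stays inside the bounds; a raw output of zero lands at the midpoint of each dimension.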


maximecb commented on May 22, 2024

> At the moment, you can use this wrapper

Thanks. I will give it a try. I do think you should scale the actions in the main branch though. If the actions aren't within the action space bounds, it's obviously a bug, even if OpenAI is doing it. TBH, I don't like their code that much. I find it's only geared to work with a few environments they tested, everything else requires modifications to the environment and to their code.

One small update: the huge action values are only a problem with your A2C implementation. This problem does not happen with ACKTR. The huge action values correspond with huge policy loss values (in the 10^9+ range).


ikostrikov commented on May 22, 2024

By the way, did you try to use PPO?

After playing with all algorithms I found PPO to be the most robust and reliable one.


maximecb commented on May 22, 2024

I didn't try PPO because of this requirement:

assert args.num_processes * args.num_steps % args.batch_size == 0

I can only run one process at the moment, my gym environment connects to a ROS/gazebo robot simulator.


ikostrikov commented on May 22, 2024

Oh. Actually, this requirement isn't really necessary. You can just remove this line and also change

sampler = BatchSampler(SubsetRandomSampler(range(args.num_processes * args.num_steps)), args.batch_size * args.num_processes, drop_last=False)

to

sampler = BatchSampler(SubsetRandomSampler(range(args.num_processes * args.num_steps)), args.batch_size * args.num_processes, drop_last=True)


maximecb commented on May 22, 2024

Hi again. I think there's an issue with wrapping the action outputs in a tanh. The actions quickly saturate at [1, 1]. At this point, I think the algorithm stays stuck there, unable to learn anymore. It would probably work better with a function that saturates less quickly?

Will try PPO now (without the wrapper)...
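
One way to see why a saturated tanh is hard to leave: once the raw output is large, further changes to it barely move the action that is actually executed. A small sketch, with the helper names `executed_action` and `sensitivity` invented for illustration; it does not reproduce the training dynamics, only the shape of the squashing function.

```python
import math

def executed_action(raw):
    """Action actually sent to the environment after tanh squashing, in [-1, 1]."""
    return math.tanh(raw)

def sensitivity(raw, delta=0.5):
    """Change in the executed action when the raw output moves by `delta`."""
    return executed_action(raw + delta) - executed_action(raw)

near_zero = sensitivity(0.0)   # raw output close to 0: the action responds strongly
saturated = sensitivity(5.0)   # raw output far from 0: the action hardly changes
```

Near zero, a raw step of 0.5 moves the executed action by roughly 0.46; at a raw value of 5, the same step changes it by well under 0.001.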


ikostrikov commented on May 22, 2024

I think it probably saturates so quickly because advantages are not normalized (so it makes a huge step). In PPO advantages are normalized so it should work better.
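
Advantage normalization is commonly written as subtracting the mean and dividing by the standard deviation. The sketch below shows that common form; the helper name `normalize_advantages`, the `eps` value, and the sample numbers are assumptions, and whether this matches the repository's PPO code line for line is not confirmed here.

```python
import numpy as np

def normalize_advantages(advantages, eps=1e-5):
    """Rescale advantages to zero mean and unit standard deviation."""
    advantages = np.asarray(advantages, dtype=np.float64)
    # eps keeps the division stable when all advantages are nearly equal.
    return (advantages - advantages.mean()) / (advantages.std() + eps)

# Hypothetical advantages with a very large scale.
raw_advantages = np.array([1.0e9, -2.0e9, 5.0e8, 3.0e9])
normalized_advantages = normalize_advantages(raw_advantages)
```

After normalization the values are of order one regardless of the original scale, so a single gradient step is no longer multiplied by huge advantage values.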


maximecb commented on May 22, 2024

Having an issue with PPO:

Traceback (most recent call last):
  File "main.py", line 250, in <module>
    main()
  File "main.py", line 243, in main
    final_rewards.max(), -dist_entropy.data[0],
UnboundLocalError: local variable 'dist_entropy' referenced before assignment


ikostrikov commented on May 22, 2024

It means that an iteration of PPO didn't happen.


maximecb commented on May 22, 2024

That's what I thought. Due to there just being one process?


ikostrikov commented on May 22, 2024

Probably, because batch size was larger than the number of samples.

Try to keep drop_last = False.
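
The effect of `drop_last` when the batch size exceeds the number of samples can be reproduced without PyTorch. The sketch below is a simplified, unshuffled stand-in for a batch sampler; the `batches` and `run_updates` helpers and the numbers are hypothetical.

```python
def batches(indices, batch_size, drop_last):
    """Split indices into consecutive batches, like a simple batch sampler."""
    batch = []
    for idx in indices:
        batch.append(idx)
        if len(batch) == batch_size:
            yield batch
            batch = []
    # A trailing partial batch is kept only when drop_last is False.
    if batch and not drop_last:
        yield batch

num_samples = 5    # hypothetical: num_processes * num_steps
batch_size = 32    # hypothetical: larger than num_samples

def run_updates(drop_last):
    """Return the number of update steps and the last loss (None if no step ran)."""
    updates = 0
    last_loss = None
    for batch in batches(range(num_samples), batch_size, drop_last):
        last_loss = float(len(batch))   # stand-in for a real loss computation
        updates += 1
    return updates, last_loss

with_drop = run_updates(drop_last=True)      # the only (partial) batch is dropped
without_drop = run_updates(drop_last=False)  # the partial batch is kept
```

With `drop_last=True` the loop body never executes, so anything assigned only inside it stays unset, which is consistent with the `UnboundLocalError` for `dist_entropy` reported above. With `drop_last=False` one update runs on the 5 available samples.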

