Comments (18)
PPO does run with drop_last=False. The policy loss values seem normal, and the action values don't immediately saturate.
from pytorch-a2c-ppo-acktr-gail.
Ok. Then I will add it, it might help.
Yes, this is a good idea. I will try to add it this week.
One solution is just to process actions in such a way that they are in [-1, 1], for example to add an action wrapper that uses tanh to remap them. But it's a little bit surprising that the output is so large (because the network is initialized to produce outputs roughly in the range of [-1, 1]).
Do you normalize the inputs?
What do you mean by normalize the inputs?
Regarding using tanh, do you think it would be a good idea to add code to always pass the outputs through tanh, and then remap that to the range of the action space?
So that they have zero mean and unit variance (you would then also need to remove the / 255 from the model).
Yes, but just as an action wrapper for gym.
In any case, I'm going to add an optional normalization for image inputs probably today.
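To make "zero mean and unit variance" concrete, here is a minimal sketch of standardizing [0, 255] image observations. The constants 128 and 64 are purely illustrative; in practice you would use statistics computed from the data (e.g. a running mean/std):

```python
import numpy as np

def normalize_obs(obs, mean=128.0, std=64.0):
    # Shift [0, 255] pixel values toward zero mean, unit variance.
    # mean/std are illustrative constants, not values from the repo;
    # replace them with running statistics in practice.
    return (np.asarray(obs, dtype=np.float32) - mean) / std

print(normalize_obs(np.array([128.0, 192.0])))  # [0.0, 1.0]
```

If you normalize this way, the / 255 inside the model must be dropped, otherwise the inputs get scaled twice.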
My images are in [0, 255], and not normalized.
The CNNPolicy takes the gym action_space as an input, so it should probably have the action scaling code.
Thanks for being so responsive. Your code is the best PyTorch implementation of A2C/ACKTR I found. :)
Thanks! I'm glad that this code can be useful.
At the moment, you can use this wrapper (I want to keep my code as consistent as possible with OpenAI implementations, but I'm going to add it to a new branch at some point).
import gym
import numpy as np

class ScaleActions(gym.ActionWrapper):
    def __init__(self, env=None):
        super(ScaleActions, self).__init__(env)

    def _step(self, action):
        # Squash raw actions with tanh into (-1, 1), then rescale to [low, high].
        action = (np.tanh(action) + 1) / 2 * (self.action_space.high - self.action_space.low) + self.action_space.low
        return self.env.step(action)
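As a quick sanity check of the remap formula, here it is standalone with illustrative bounds low = -2.0, high = 2.0 (not taken from any particular environment):

```python
import numpy as np

low, high = -2.0, 2.0  # assumed action bounds, for illustration only

def remap(action):
    # Same formula as the wrapper: tanh squashes to (-1, 1),
    # then the result is rescaled linearly to [low, high].
    return (np.tanh(action) + 1) / 2 * (high - low) + low

print(remap(0.0))    # 0.0 -> midpoint of the range
print(remap(50.0))   # ~2.0 -> saturates near high
print(remap(-50.0))  # ~-2.0 -> saturates near low
```

The endpoints show the saturation issue discussed below: once the raw network outputs grow large, tanh pins the environment action at the bounds.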
At the moment, you can use this wrapper
Thanks. I will give it a try. I do think you should scale the actions in the main branch though. If the actions aren't within the action space bounds, it's obviously a bug, even if OpenAI is doing it. TBH, I don't like their code that much. I find it's only geared to work with a few environments they tested, everything else requires modifications to the environment and to their code.
One small update: the huge action values are only a problem with your A2C implementation. This problem does not happen with ACKTR. The huge action values correspond with huge policy loss values (in the 10^9+ range).
By the way, did you try to use PPO?
After playing with all algorithms I found PPO to be the most robust and reliable one.
I didn't try PPO because of this requirement:
assert args.num_processes * args.num_steps % args.batch_size == 0
I can only run one process at the moment, my gym environment connects to a ROS/gazebo robot simulator.
Oh. Actually, this requirement isn't really necessary. You can just remove this line and also change
sampler = BatchSampler(SubsetRandomSampler(range(args.num_processes * args.num_steps)), args.batch_size * args.num_processes, drop_last=True)
to
sampler = BatchSampler(SubsetRandomSampler(range(args.num_processes * args.num_steps)), args.batch_size * args.num_processes, drop_last=False)
Hi again. There's an issue with wrapping the action outputs in a tanh, I think. The actions quickly saturate at the bounds of [-1, 1]. At that point the algorithm seems to get stuck there, unable to learn anymore. It would probably work better with a function that saturates less quickly?
Will try PPO now (without the wrapper)...
I think it probably saturates so quickly because advantages are not normalized (so it makes a huge step). In PPO advantages are normalized so it should work better.
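Advantage normalization is a one-liner; a minimal sketch (plain NumPy, not the repo's exact code) of the standardization step applied before the policy update:

```python
import numpy as np

def normalize_advantages(advantages, eps=1e-5):
    # Standardize advantages to zero mean, unit variance before the policy
    # update. This bounds the effective step size regardless of the raw
    # return scale, so a 10^9-scale advantage no longer produces a huge step.
    advantages = np.asarray(advantages, dtype=np.float64)
    return (advantages - advantages.mean()) / (advantages.std() + eps)

print(normalize_advantages([1e9, 2e9, 3e9]))  # same result as for [1, 2, 3]
```

This is why the huge policy loss values seen with A2C (and the resulting immediate tanh saturation) should not occur with PPO.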
Having an issue with PPO:
Traceback (most recent call last):
  File "main.py", line 250, in <module>
    main()
  File "main.py", line 243, in main
    final_rewards.max(), -dist_entropy.data[0],
UnboundLocalError: local variable 'dist_entropy' referenced before assignment
It means that an iteration of PPO didn't happen.
That's what I thought. Due to there just being one process?
Probably, because the batch size was larger than the number of samples.
Try to keep drop_last = False.
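A small plain-Python illustration of why this fails (mimicking the chunking behavior of torch.utils.data.BatchSampler, so it runs without PyTorch): with one process, num_processes * num_steps can be smaller than the batch size, and drop_last=True then discards the only (undersized) batch, so no PPO update runs and dist_entropy is never assigned.

```python
def batches(indices, batch_size, drop_last):
    # Mimics torch.utils.data.BatchSampler's chunking behavior.
    out = [indices[i:i + batch_size] for i in range(0, len(indices), batch_size)]
    if drop_last and out and len(out[-1]) < batch_size:
        out.pop()  # discard the final, smaller-than-batch_size batch
    return out

# e.g. one process, 5 steps -> 5 samples, but batch_size = 8:
samples = list(range(5))
print(batches(samples, 8, drop_last=True))   # [] -> zero minibatches, no update
print(batches(samples, 8, drop_last=False))  # [[0, 1, 2, 3, 4]] -> one small batch
```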