Comments (2)
Hmm, that's a good point. I am not sure what exactly is happening under the hood in TensorFlow here, but I would imagine that most of the time the gradients are well within the clipping boundary, so this shouldn't have much of an adverse effect. But I think you're right, this may not be 100% correct.
It seems a bit ugly to fix this. I guess you would need to combine the two train ops by iterating through all the gradients, adding them, and then clipping only the shared ones?
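A rough, framework-free sketch of that idea, to make the steps concrete. Everything here is an assumption for illustration: gradients are plain dicts mapping a variable name to a list of floats, the helper `combine_and_clip_shared` and the variable names (`shared/w`, `policy/w`, `value/w`) are hypothetical, and clipping is done by global norm over the shared variables only. It is not the repository's actual train op.

```python
import math


def combine_and_clip_shared(policy_grads, value_grads, shared_names, clip_norm):
    """Sum per-variable gradients from two heads, then clip only the shared ones.

    Hypothetical helper: gradients are dicts of {variable_name: list_of_floats}.
    """
    combined = {}
    for grads in (policy_grads, value_grads):
        for name, grad in grads.items():
            if name in combined:
                # Variable touched by both heads: add the gradients elementwise.
                combined[name] = [a + b for a, b in zip(combined[name], grad)]
            else:
                combined[name] = list(grad)

    # Global norm computed over the shared variables only.
    squared = sum(
        x * x
        for name in shared_names
        if name in combined
        for x in combined[name]
    )
    norm = math.sqrt(squared)
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0

    # Rescale the shared gradients; head-specific gradients are left untouched.
    for name in shared_names:
        if name in combined:
            combined[name] = [x * scale for x in combined[name]]
    return combined


# Toy gradients (made-up numbers): both heads contribute to "shared/w".
policy_grads = {"shared/w": [3.0, 0.0], "policy/w": [10.0]}
value_grads = {"shared/w": [0.0, 4.0], "value/w": [-7.0]}

result = combine_and_clip_shared(
    policy_grads, value_grads, shared_names={"shared/w"}, clip_norm=1.0
)
print(result)
```

In this toy case the summed shared gradient has norm 5 and is scaled down to norm 1, while the policy-only and value-only gradients pass through unchanged.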
from reinforcement-learning.
@dennybritz I totally agree. It could be ugly, but it may also be a chance to refactor. Sorry to get back to you so late. I was busy implementing ACER, which is something like an off-policy version of A3C.
I think one way to do this is to add up the losses from the policy net and the value net first, then compute the gradients and clip them. I guess that requires a lot of changes to the whole architecture, because PolicyEstimator and ValueEstimator are currently separate classes.
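To illustrate why the order matters, here is a small pure-Python sketch with made-up numbers. It assumes both train ops clip by global norm and then apply their gradients to the same shared weights; the helper names `clip_by_global_norm` and `add` are defined here for the example and are not TensorFlow calls.

```python
import math


def clip_by_global_norm(grad, clip_norm):
    """Rescale a flat gradient so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(x * x for x in grad))
    if norm <= clip_norm:
        return list(grad)
    return [x * clip_norm / norm for x in grad]


def add(a, b):
    """Elementwise sum of two flat gradients."""
    return [x + y for x, y in zip(a, b)]


def l2_norm(grad):
    return math.sqrt(sum(x * x for x in grad))


clip_norm = 5.0
# Hypothetical gradients of the two losses w.r.t. the same shared weights.
policy_grad = [3.0, 4.0]   # norm 5, already within the bound
value_grad = [-3.0, 4.0]   # norm 5, already within the bound

# Clip each head separately, then apply both to the shared weights.
separate = add(
    clip_by_global_norm(policy_grad, clip_norm),
    clip_by_global_norm(value_grad, clip_norm),
)

# Sum first (equivalent to differentiating the summed loss), then clip once.
joint = clip_by_global_norm(add(policy_grad, value_grad), clip_norm)

print("separate:", separate, "norm:", l2_norm(separate))
print("joint:   ", joint, "norm:", l2_norm(joint))
```

With separate clipping, neither gradient is touched, yet the total update to the shared weights has norm 8, above the intended bound of 5. Clipping the summed gradient keeps the update at norm 5.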
My suggestion is that we merge PolicyEstimator and ValueEstimator into a single class, something like this:
def build_shared_network(input):
    ...
    return shared

def policy_network(shared):
    ...
    return mu, sigma

def value_network(shared):
    ...
    return logits

class Estimator():
    def __init__(self, ...):
        ...
        shared = build_shared_network(...)
        mu, sigma = policy_network(shared)
        logits = value_network(shared)
        self.pi_loss = ...
        self.vf_loss = ...
        self.loss = self.pi_loss + self.vf_loss - entropy
        if trainable:
            self.optimizer = ...
            self.grads_and_vars = self.optimizer.compute_gradients(self.loss)
This has several advantages:
- we don't need to pass the "reuse" argument to build_shared_network anymore
- we need only 1 optimizer instead of 2 in separate classes
    if trainable:
        self.optimizer = tf.train.RMSPropOptimizer(0.00025, 0.99, 0.0, 1e-6)
        ...
- we need only 1 make_train_op()
    net_train_op = make_train_op(self.net, self.global_net)
    # self.vnet_train_op = make_train_op(self.value_net, self.global_value_net)
    # self.pnet_train_op = make_train_op(self.policy_net, self.global_policy_net)
But this is a big change and I'm not sure whether that's a good idea.