Hi there, I noticed that even though policy net and value net share


Gradient clipping in A3C (reinforcement-learning, open issue)

dennybritz commented on May 12, 2024


Comments (2)

dennybritz commented on May 12, 2024

Hmm, that's a good point. I am not sure exactly what is happening under the hood in TensorFlow here, but I would imagine that most of the time the gradients are well within the clipping boundary, so this shouldn't have much of an adverse effect. But I think you're right, this may not be 100% correct.

It seems a bit ugly to fix this. I guess you would need to combine the two train ops by iterating through all gradients, adding them, and then clipping only the shared ones?
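One possible reading of that idea, as a minimal framework-free sketch rather than code from the repository: gradients from the two train ops are merged by variable name, summed wherever both networks touch the same variable, and only those summed (shared) gradients are rescaled to a maximum norm. The function name `merge_and_clip_shared`, the dict-of-lists representation, and the variable names in the usage lines are all assumptions made for illustration.

```python
import math


def merge_and_clip_shared(policy_grads, value_grads, clip_norm):
    """Sum gradients per variable, then norm-clip only the shared ones.

    Both arguments are hypothetical {variable_name: [floats]} dicts standing
    in for the (gradient, variable) pairs of the two train ops.
    """
    shared = set(policy_grads) & set(value_grads)
    merged = {}
    for name in sorted(set(policy_grads) | set(value_grads)):
        if name in shared:
            # Variable used by both networks: add the two gradients.
            merged[name] = [p + v for p, v in
                            zip(policy_grads[name], value_grads[name])]
        else:
            source = policy_grads if name in policy_grads else value_grads
            merged[name] = list(source[name])

    # L2 norm taken over the shared (summed) gradients only.
    norm = math.sqrt(sum(g * g for name in shared for g in merged[name]))
    if norm > clip_norm:
        scale = clip_norm / norm
        for name in shared:
            merged[name] = [g * scale for g in merged[name]]
    return merged


# Made-up gradients: "shared/w" appears in both networks.
policy_grads = {"shared/w": [3.0, 4.0], "policy/w": [1.0]}
value_grads = {"shared/w": [6.0, 8.0], "value/w": [2.0]}
merged = merge_and_clip_shared(policy_grads, value_grads, clip_norm=5.0)
print(merged)
```

Here the summed shared gradient [9, 12] has norm 15 and is scaled back to norm 5, while the head-specific gradients pass through untouched. Whether the non-shared gradients should also be clipped is a design choice this sketch leaves open.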


poweic commented on May 12, 2024

@dennybritz I totally agree. It could be ugly, but it may also be a chance to refactor. Sorry to get back to you so late; I was busy implementing ACER, which is something like an off-policy version of A3C.

I think one way to do this is to add up the losses from the policy net and the value net first, then compute the gradients of the combined loss, and then clip them. I guess that requires a lot of changes to the whole architecture, because PolicyEstimator and ValueEstimator are currently separate classes.
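To illustrate why the order matters, here is a small numeric sketch in plain Python (no TensorFlow). It assumes a single set of shared weights and made-up gradient values, and it simplifies the clipping to a per-vector L2 norm clip; the helper names `clip_by_norm` and `l2_norm` are hypothetical and not taken from the repository.

```python
import math


def l2_norm(grad):
    """L2 norm of a gradient given as a list of floats."""
    return math.sqrt(sum(g * g for g in grad))


def clip_by_norm(grad, clip_norm):
    """Rescale grad so that its L2 norm is at most clip_norm."""
    norm = l2_norm(grad)
    if norm <= clip_norm:
        return list(grad)
    return [g * clip_norm / norm for g in grad]


# Made-up gradients of the two losses w.r.t. the same shared weights.
policy_grad = [3.0, 4.0]   # norm 5
value_grad = [6.0, 8.0]    # norm 10
clip_norm = 5.0

# Separate train ops: each gradient is clipped alone, then both are applied.
separate = [p + v for p, v in zip(clip_by_norm(policy_grad, clip_norm),
                                  clip_by_norm(value_grad, clip_norm))]

# Single loss: the gradient of a sum is the sum of gradients, clipped once.
summed = [p + v for p, v in zip(policy_grad, value_grad)]
combined = clip_by_norm(summed, clip_norm)

print(l2_norm(separate))  # total update on the shared weights, separate ops
print(l2_norm(combined))  # total update on the shared weights, single loss
```

With these numbers the separately clipped updates add up to a norm of 10 on the shared weights, twice the intended limit of 5, while clipping the gradient of the summed loss keeps the total at 5.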

My suggestion is that we merge PolicyEstimator and ValueEstimator into a single class, something like this:

def build_shared_network(input):
  ...
  return shared
  
def policy_network(shared):
  ...
  return mu, sigma
  
def value_network(shared):
  ...
  return logits

class Estimator():
  def __init__(self, ...):
    ...
    shared = build_shared_network(...)
    mu, sigma = policy_network(shared)
    logits = value_network(shared)
    
    self.pi_loss = ...
    self.vf_loss = ...
    self.loss = self.pi_loss + self.vf_loss - entropy

    if trainable:
      self.optimizer = ...
      self.grads_and_vars = self.optimizer.compute_gradients(self.loss)

This has several advantages:

1. We no longer need to pass the "reuse" argument to build_shared_network.

2. We need only one optimizer instead of two in separate classes:

if trainable:
  self.optimizer = tf.train.RMSPropOptimizer(0.00025, 0.99, 0.0, 1e-6)
  ...

3. We need only one make_train_op() call:

net_train_op = make_train_op(self.net, self.global_net)
# self.vnet_train_op = make_train_op(self.value_net, self.global_value_net)
# self.pnet_train_op = make_train_op(self.policy_net, self.global_policy_net)

But this is a big change, and I'm not sure whether it's a good idea.
