carpedm20 / naf-tensorflow

"Continuous Deep Q-Learning with Model-based Acceleration" in TensorFlow

License: MIT License

Python 99.06% Shell 0.94%

tensorflow gym continuous-rl reinforcement-learning deep-reinforcement-learning deep-learning

naf-tensorflow's Introduction

Normalized Advantage Functions (NAF) in TensorFlow

TensorFlow implementation of Continuous Deep Q-Learning with Model-based Acceleration.

Requirements

Python 2.7
gym
TensorFlow 0.9+

Usage

First, install prerequisites with:

$ pip install tqdm gym[all]

To train a model for an environment with a continuous action space:

$ python main.py --env_name=Pendulum-v0 --is_train=True
$ python main.py --env_name=Pendulum-v0 --is_train=True --display=True

To test a trained model and record the screen with gym:

$ python main.py --env_name=Pendulum-v0 --is_train=False
$ python main.py --env_name=Pendulum-v0 --is_train=False --display=True

Results

Training details of Pendulum-v0 with different hyperparameters.

$ python main.py --env_name=Pendulum-v0 # dark green
$ python main.py --env_name=Pendulum-v0 --action_fn=tanh # light green
$ python main.py --env_name=Pendulum-v0 --use_batch_norm=True # yellow
$ python main.py --env_name=Pendulum-v0 --use_seperate_networks=True # green

Author

Taehoon Kim / @carpedm20

naf-tensorflow's Issues

What is run2(...) in naf.py?

I have a question.

What is the run2() function in naf.py?

Hello, I was wondering which hyperparameters were used for the blue line in the README plot (assets/Pendulum-v0_2016-07-15.png). The setting for that line seems to be missing from the code block above the plot.

Unable to run

@carpedm20, after applying #6 I run into the following conflict:

python main.py --env=Pendulum-v0 --is_train=True --display=True
{'action_fn': 'tanh',
 'action_w': 'uniform_big',
 'batch_size': 100,
 'clip_action': False,
 'discount': 0.99,
 'display': True,
 'env_name': 'Pendulum-v0',
 'hidden_dims': '[100, 100]',
 'hidden_fn': 'tanh',
 'hidden_w': 'uniform_big',
 'is_train': True,
 'learning_rate': 0.001,
 'log_level': 'INFO',
 'max_episodes': 10000,
 'max_steps': 200,
 'monitor': False,
 'noise': 'linear_decay',
 'noise_scale': 0.3,
 'random_seed': 123,
 'tau': 0.001,
 'update_repeat': 10,
 'use_batch_norm': False,
 'use_seperate_networks': False,
 'w_reg': 'none',
 'w_reg_scale': 0.001}
2017-12-19 18:04:45.657697: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-12-19 18:04:45.657712: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-12-19 18:04:45.657717: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-12-19 18:04:45.657721: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
[2017-12-19 18:04:45,658] Making new env: Pendulum-v0
[2017-12-19 18:04:45,689] Creating prediction network...
[2017-12-19 18:04:45,691] Creating shared networks for v, l, and mu
Traceback (most recent call last):
  File "main.py", line 118, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "main.py", line 99, in main
    scope='pred_network', **shared_args
  File "/Users/Victor/NAF-tensorflow/src/network.py", line 66, in __init__
    row = tf.pad(tf.concat(1, (diag_elem, non_diag_elems)), ((0, 0), (idx, 0)))
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.py", line 1043, in concat
    dtype=dtypes.int32).get_shape(
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 676, in convert_to_tensor
    as_ref=False)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 741, in internal_convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/constant_op.py", line 113, in _constant_tensor_conversion_function
    return constant(v, dtype=dtype, name=name)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/constant_op.py", line 102, in constant
    tensor_util.make_tensor_proto(value, dtype=dtype, shape=shape, verify_shape=verify_shape))
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/tensor_util.py", line 374, in make_tensor_proto
    _AssertCompatible(values, dtype)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/tensor_util.py", line 302, in _AssertCompatible
    (dtype.name, repr(mismatch), type(mismatch).__name__))
TypeError: Expected int32, got list containing Tensors of type '_Message' instead.

I'd appreciate it if you could take a look and provide some guidance on how to address this issue.
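
The `TypeError` raised from `network.py` line 66 is consistent with pre-1.0 code running on TensorFlow 1.x: in TensorFlow 1.0 the argument order of `tf.concat` was reversed, from `tf.concat(concat_dim, values)` to `tf.concat(values, axis)`. The sketch below only illustrates that ordering difference. `concat_args` is a hypothetical helper name, not part of this repository or of TensorFlow, and the strings stand in for the tensors named in the traceback.

```python
def concat_args(tf_version, values, axis):
    """Return positional args for tf.concat in the order this version expects.

    Hypothetical helper: TensorFlow < 1.0 used tf.concat(concat_dim, values),
    TensorFlow >= 1.0 uses tf.concat(values, axis).
    """
    major = int(tf_version.split(".")[0])
    if major >= 1:
        return (values, axis)
    return (axis, values)


# Placeholder strings stand in for the tensors named in the traceback.
parts = ("diag_elem", "non_diag_elems")
print(concat_args("0.9.0", parts, 1))  # old order: axis first
print(concat_args("1.4.0", parts, 1))  # new order: values first
```

With a real install this would presumably be called as `tf.concat(*concat_args(tf.__version__, (diag_elem, non_diag_elems), 1))`. If only TensorFlow 1.x needs to be supported, swapping the two arguments at line 66 should have the same effect.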

The action 'mu' only ranges from -1 to 1

The action 'mu' output by the network is bounded only by the tanh activation function, so the action can only range from -1 to 1.
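
One common way around this limit is to rescale the tanh output to the environment's action bounds. The following is a minimal sketch of that idea, not code from this repository: `scale_action` is a hypothetical helper, and the bounds of -2 and 2 are example values.

```python
import math


def scale_action(raw, low, high):
    """Map an unbounded value to the interval [low, high] via tanh."""
    squashed = math.tanh(raw)          # lies in [-1, 1]
    center = (high + low) / 2.0        # midpoint of the target interval
    half_range = (high - low) / 2.0    # half the width of the interval
    return center + half_range * squashed


# Example bounds of -2 and 2 (assumed for illustration).
for raw in (-3.0, 0.0, 3.0):
    print(raw, scale_action(raw, -2.0, 2.0))
```

In practice `low` and `high` would come from the environment's action space rather than being hard-coded.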

About OU Noise Function

I think the paper uses a modified OU process (Section 8.2). Is that implemented in your code?
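
For reference, a textbook discretized Ornstein-Uhlenbeck process can be sketched as below. This is the standard form only: it is neither the modified variant the question refers to nor necessarily what this repository implements. The class name `OUNoise` and the default values for `theta`, `sigma`, and `dt` are illustrative assumptions.

```python
import math
import random


class OUNoise:
    """Standard discretized Ornstein-Uhlenbeck process (scalar)."""

    def __init__(self, theta=0.15, sigma=0.2, mu=0.0, dt=1.0, seed=None):
        self.theta = theta    # pull strength toward the mean
        self.sigma = sigma    # scale of the random perturbation
        self.mu = mu          # long-run mean
        self.dt = dt          # time step
        self.rng = random.Random(seed)
        self.x = mu           # current state starts at the mean

    def reset(self):
        """Return the state to the mean, e.g. at the start of an episode."""
        self.x = self.mu

    def sample(self):
        """Advance one step: x += theta*(mu - x)*dt + sigma*sqrt(dt)*N(0, 1)."""
        drift = self.theta * (self.mu - self.x) * self.dt
        diffusion = self.sigma * math.sqrt(self.dt) * self.rng.gauss(0.0, 1.0)
        self.x += drift + diffusion
        return self.x


noise = OUNoise(seed=123)
print([round(noise.sample(), 4) for _ in range(5)])
```

Successive samples are correlated in time, which is why this kind of noise is often preferred over independent Gaussian noise for exploration in continuous control.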

Reproducing results of the paper on Mujoco domain

Working on paper branch (link).

environment	Best return for 200 steps
InvertedPendulum-v1
InvertedDoublePendulum-v1
Reacher-v1
HalfCheetah-v1	100
Swimmer-v1
Hopper-v1
Walker2d-v1
Ant-v1
Humanoid-v1
HumanoidStandup-v1

run error

I got this error:
Traceback (most recent call last):
  File "main.py", line 77, in <module>
    tf.app.run()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 30, in run
    sys.exit(main(sys.argv))
  File "main.py", line 72, in main
    conf.max_step, conf.max_update, conf.max_episode, conf.test_step)
  File "/Users/FloodSurge/DeepReinforcementLearning/gymCodes/NAF-tensorflow-master/src/naf.py", line 57, in __init__
    name='pred_network', **shared_args
  File "/Users/FloodSurge/DeepReinforcementLearning/gymCodes/NAF-tensorflow-master/src/network.py", line 108, in __init__
    L = tf.transpose(tf.pack(rows, axis=1), (0, 2, 1))
TypeError: pack() got an unexpected keyword argument 'axis'

The method signature is:
tf.pack(values, name='pack')

I removed axis=1 but still got errors.
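
The missing `axis` keyword suggests an install older than the TensorFlow 0.9+ listed under Requirements (from TensorFlow 1.0 onward the op is named `tf.stack`). Simply dropping `axis=1` changes the result, because the rows are then stacked along axis 0 instead of axis 1. The plain-Python sketch below, using nested lists and hypothetical helper names, illustrates that an axis-0 stack followed by swapping the first two axes gives the same layout as stacking along axis 1.

```python
def stack_axis0(rows):
    """Stack equally shaped (batch, n) rows into shape (n_rows, batch, n)."""
    return [row for row in rows]


def swap_first_two_axes(x):
    """Turn shape (a, b, n) into (b, a, n), like a (1, 0, 2) transpose."""
    return [list(group) for group in zip(*x)]


def stack_axis1(rows):
    """Emulate stacking along axis 1 using only an axis-0 stack."""
    return swap_first_two_axes(stack_axis0(rows))


# Two rows, each with batch size 3 and 2 features.
row0 = [[1, 2], [3, 4], [5, 6]]
row1 = [[0, 7], [0, 8], [0, 9]]
stacked = stack_axis1([row0, row1])
print(len(stacked), len(stacked[0]), len(stacked[0][0]))  # 3 2 2
print(stacked[0])  # [[1, 2], [0, 7]]
```

In TensorFlow terms this would correspond to something like `tf.transpose(tf.pack(rows), (1, 0, 2))` in place of `tf.pack(rows, axis=1)`. That is an untested assumption for old installs, and upgrading TensorFlow is likely the simpler route.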

There is no optimization of prediction network in run2 function

As the title says: in the run2 function, should the optimization step be added before the target network update?
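
For context, the usual ordering in target-network methods is to take a gradient step on the prediction network first and then move the target network toward it. The sketch below shows that ordering with plain lists of weights. It is a generic illustration, not the repository's `run2` code: `sgd_step` and `soft_update` are hypothetical helpers, the gradients are made up, and the `learning_rate` and `tau` values are borrowed from the config dump in the "Unable to run" issue.

```python
def sgd_step(weights, grads, learning_rate):
    """One plain gradient-descent step on the prediction network weights."""
    return [w - learning_rate * g for w, g in zip(weights, grads)]


def soft_update(target, pred, tau):
    """Move target weights a fraction tau of the way toward pred weights."""
    return [tau * p + (1.0 - tau) * t for t, p in zip(target, pred)]


pred = [1.0, -2.0]      # prediction network weights (toy values)
target = [0.0, 0.0]     # target network weights (toy values)
grads = [0.5, -0.5]     # made-up gradients for illustration

# 1) Optimize the prediction network.
pred = sgd_step(pred, grads, learning_rate=0.001)
# 2) Only then let the target network track the updated prediction network.
target = soft_update(target, pred, tau=0.001)
print(pred, target)
```

If the optimization step is skipped, the soft update just copies a fraction of unchanged prediction weights into the target, so no learning takes place.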

Imagination rollouts

Is there any code available for the imagination rollouts described in the paper?