naf-tensorflow's Introduction

Normalized Advantage Functions (NAF) in TensorFlow

TensorFlow implementation of Continuous Deep Q-Learning with Model-based Acceleration.
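For orientation, here is a minimal sketch of the quantity NAF learns, written in plain Python rather than TensorFlow. It assumes the paper's parameterization: Q(x, u) = V(x) + A(x, u), with A(x, u) = -1/2 (u - mu)^T P (u - mu) and P = L L^T, where L is lower-triangular with an exponentiated diagonal. The function names and the row-by-row layout of the flat vector are illustrative assumptions, not taken from this repository.

```python
import math


def build_lower_triangular(flat, n):
    """Fill an n x n lower-triangular matrix row by row from a flat list.

    Diagonal entries are exponentiated so they stay strictly positive.
    The row-by-row ordering is an assumption made for this sketch.
    """
    assert len(flat) == n * (n + 1) // 2
    L = [[0.0] * n for _ in range(n)]
    k = 0
    for i in range(n):
        for j in range(i + 1):
            L[i][j] = math.exp(flat[k]) if i == j else flat[k]
            k += 1
    return L


def advantage(u, mu, L):
    """A(x, u) = -1/2 (u - mu)^T L L^T (u - mu)."""
    n = len(u)
    d = [u[i] - mu[i] for i in range(n)]
    # z = L^T d, so that d^T L L^T d equals the squared norm of z.
    z = [sum(L[i][j] * d[i] for i in range(n)) for j in range(n)]
    return -0.5 * sum(v * v for v in z)


def q_value(v, u, mu, L):
    """Q(x, u) = V(x) + A(x, u)."""
    return v + advantage(u, mu, L)
```

Because P is positive definite, the advantage is never positive and reaches zero exactly at u = mu, so the greedy action is mu(x) and Q there equals V(x).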

algorithm

Requirements

Usage

First, install prerequisites with:

$ pip install tqdm gym[all]

To train a model for an environment with a continuous action space:

$ python main.py --env_name=Pendulum-v0 --is_train=True
$ python main.py --env_name=Pendulum-v0 --is_train=True --display=True

To test a trained model and record the screen with gym:

$ python main.py --env_name=Pendulum-v0 --is_train=False
$ python main.py --env_name=Pendulum-v0 --is_train=False --display=True

Results

Training details of Pendulum-v0 with different hyperparameters.

$ python main.py --env_name=Pendulum-v0 # dark green
$ python main.py --env_name=Pendulum-v0 --action_fn=tanh # light green
$ python main.py --env_name=Pendulum-v0 --use_batch_norm=True # yellow
$ python main.py --env_name=Pendulum-v0 --use_seperate_networks=True # green

Pendulum-v0_2016-07-15

References

Author

Taehoon Kim / @carpedm20

naf-tensorflow's Issues

README plot question

Hello, I was wondering which hyperparameters were used for the blue line in the README plot (assets/Pendulum-v0_2016-07-15.png). The code block above the plot does not list a command for that line.

Unable to run

@carpedm20, after applying #6 I run into the following conflict:

python main.py --env=Pendulum-v0 --is_train=True --display=True
{'action_fn': 'tanh',
 'action_w': 'uniform_big',
 'batch_size': 100,
 'clip_action': False,
 'discount': 0.99,
 'display': True,
 'env_name': 'Pendulum-v0',
 'hidden_dims': '[100, 100]',
 'hidden_fn': 'tanh',
 'hidden_w': 'uniform_big',
 'is_train': True,
 'learning_rate': 0.001,
 'log_level': 'INFO',
 'max_episodes': 10000,
 'max_steps': 200,
 'monitor': False,
 'noise': 'linear_decay',
 'noise_scale': 0.3,
 'random_seed': 123,
 'tau': 0.001,
 'update_repeat': 10,
 'use_batch_norm': False,
 'use_seperate_networks': False,
 'w_reg': 'none',
 'w_reg_scale': 0.001}
2017-12-19 18:04:45.657697: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-12-19 18:04:45.657712: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-12-19 18:04:45.657717: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-12-19 18:04:45.657721: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
[2017-12-19 18:04:45,658] Making new env: Pendulum-v0
[2017-12-19 18:04:45,689] Creating prediction network...
[2017-12-19 18:04:45,691] Creating shared networks for v, l, and mu
Traceback (most recent call last):
  File "main.py", line 118, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "main.py", line 99, in main
    scope='pred_network', **shared_args
  File "/Users/Victor/NAF-tensorflow/src/network.py", line 66, in __init__
    row = tf.pad(tf.concat(1, (diag_elem, non_diag_elems)), ((0, 0), (idx, 0)))
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.py", line 1043, in concat
    dtype=dtypes.int32).get_shape(
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 676, in convert_to_tensor
    as_ref=False)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 741, in internal_convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/constant_op.py", line 113, in _constant_tensor_conversion_function
    return constant(v, dtype=dtype, name=name)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/constant_op.py", line 102, in constant
    tensor_util.make_tensor_proto(value, dtype=dtype, shape=shape, verify_shape=verify_shape))
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/tensor_util.py", line 374, in make_tensor_proto
    _AssertCompatible(values, dtype)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/tensor_util.py", line 302, in _AssertCompatible
    (dtype.name, repr(mismatch), type(mismatch).__name__))
TypeError: Expected int32, got list containing Tensors of type '_Message' instead.

I'd appreciate it if you could take a look and provide some guidance on how to address this issue.
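The `TypeError` in this traceback is the usual symptom of calling `tf.concat` with the pre-1.0 argument order, `tf.concat(concat_dim, values)`, on TensorFlow 1.0 or later, where the signature is `tf.concat(values, axis)`. A possible workaround, sketched below, is a small wrapper that picks the order from the installed version. The helper name `concat_compat` is hypothetical and not part of this repository or of TensorFlow; the module is passed in as an argument so the sketch does not import TensorFlow itself.

```python
def concat_compat(tf, values, axis):
    """Concatenate tensors under either tf.concat argument order.

    `tf` is the imported tensorflow module. This helper is a hypothetical
    workaround, not an API provided by TensorFlow or by this repository.
    """
    major = int(tf.__version__.split(".")[0])
    if major >= 1:
        # TensorFlow >= 1.0: tf.concat(values, axis)
        return tf.concat(values, axis)
    # TensorFlow < 1.0: tf.concat(concat_dim, values)
    return tf.concat(axis, values)
```

With such a helper, the failing line in `src/network.py` could presumably be written as `concat_compat(tf, (diag_elem, non_diag_elems), 1)`. If only TensorFlow 1.x needs to be supported, swapping the two arguments directly is simpler.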

About OU Noise Function

I think the paper uses a modified OU process (section 8.2). Is that implemented in your code?
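For reference, a standard (unmodified) Ornstein-Uhlenbeck process can be sketched as follows. This is the textbook discretization dx = theta * (mu - x) * dt + sigma * dW; it is neither the paper's modified variant nor necessarily what this repository implements. The class name and the default values are assumptions chosen for illustration.

```python
import math
import random


class OUNoise:
    """Discretized Ornstein-Uhlenbeck process.

    Each step applies x += theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, 1).
    Default parameter values are illustrative assumptions.
    """

    def __init__(self, dim, mu=0.0, theta=0.15, sigma=0.3, dt=1.0, seed=None):
        self.dim = dim
        self.mu = mu
        self.theta = theta
        self.sigma = sigma
        self.dt = dt
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        # Restart the process at its long-run mean.
        self.state = [self.mu] * self.dim

    def sample(self):
        # Mean-reverting drift plus Gaussian diffusion, per dimension.
        scale = self.sigma * math.sqrt(self.dt)
        self.state = [
            x + self.theta * (self.mu - x) * self.dt + scale * self.rng.gauss(0.0, 1.0)
            for x in self.state
        ]
        return list(self.state)
```

Typical use would be to call `reset()` at the start of each episode and add `sample()` to the policy's action at every step, which yields temporally correlated exploration noise.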

Reproducing results of the paper on Mujoco domain

Working on paper branch (link).

Environment                  Best return for 200 steps
InvertedPendulum-v1
InvertedDoublePendulum-v1
Reacher-v1
HalfCheetah-v1               100
Swimmer-v1
Hopper-v1
Walker2d-v1
Ant-v1
Humanoid-v1
HumanoidStandup-v1

run error

I got this error:
Traceback (most recent call last):
  File "main.py", line 77, in <module>
    tf.app.run()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 30, in run
    sys.exit(main(sys.argv))
  File "main.py", line 72, in main
    conf.max_step, conf.max_update, conf.max_episode, conf.test_step)
  File "/Users/FloodSurge/DeepReinforcementLearning/gymCodes/NAF-tensorflow-master/src/naf.py", line 57, in __init__
    name='pred_network', **shared_args
  File "/Users/FloodSurge/DeepReinforcementLearning/gymCodes/NAF-tensorflow-master/src/network.py", line 108, in __init__
    L = tf.transpose(tf.pack(rows, axis=1), (0, 2, 1))
TypeError: pack() got an unexpected keyword argument 'axis'

The method's signature is:
tf.pack(values, name='pack')

I removed axis=1 but still got errors.
