
async-rl's People

Contributors

bryant1410, coreylynch, ei-grad, osh, xu-song

async-rl's Issues

OSError, why?

OSError: dlopen(/Users/weyn/anaconda/lib/python2.7/site-packages/atari_py/ale_interface/build/libale_c.so, 6): Symbol not found: __ZNKSt5ctypeIcE13_M_widen_initEv
Referenced from: /Users/weyn/anaconda/lib/python2.7/site-packages/atari_py/ale_interface/build/libale_c.so
Expected in: /usr/lib/libstdc++.6.dylib
in /Users/weyn/anaconda/lib/python2.7/site-packages/atari_py/ale_interface/build/libale_c.so

Reward doesn't go up ....

I ran the async DQN model out of the box with 3 seeds on 7 Atari games (Pong, Breakout, Seaquest, BeamRider, SpaceInvaders, Qbert, and Enduro) across 24 threads. However, the reward stays flat for all the games up to 11M global time steps. I've also run Breakout up to 30M global steps with 5 seeds and the reward doesn't go up either. Has anybody else had this issue?

No local network synchronization

I'm curious why you decided not to create a local copy of the variables in the worker threads and sync them with the global network at the end of each rollout. Does it create issues that the global network (which is used for inference during the rollout) gets updated in the middle of a rollout? Is there a reason you changed the algorithm from the one described in the Asynchronous Methods for Deep RL paper?
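For reference, a minimal sketch of the per-rollout synchronization the paper describes might look like the following. This is not code from this repo; the variable lists and the worker-loop comments are hypothetical, TF 1.x-style.

    # Minimal sketch (not from this repo) of the per-rollout sync described in the
    # paper: each worker keeps a local copy of the network and copies the global
    # weights into it before collecting a rollout. `global_vars` / `local_vars`
    # are hypothetical lists of tf.Variable objects.
    import tensorflow as tf

    def make_sync_op(global_vars, local_vars):
        """Build an op that copies every global variable into the local copy."""
        return tf.group(*[local_v.assign(global_v)
                          for local_v, global_v in zip(local_vars, global_vars)])

    # Hypothetical worker loop:
    #   sess.run(sync_op)                      # refresh the local weights
    #   rollout = collect_rollout(local_net)   # act with the frozen local copy
    #   grads = compute_gradients(rollout)     # gradients w.r.t. local variables
    #   sess.run(apply_grads_to_global)        # asynchronous update of the global net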

pretrained model

Hi @coreylynch, thanks for the awesome project!

I was wondering, do you have the Keras weights of a pretrained agent somewhere? I was looking to do some quick visualizations with breakout.

Best,
-eder

TensorFlow outdated

I guess this code was written for an older version of TensorFlow?

x = tf.reshape(x, tf.pack([-1, prod(shape(x)[1:])]))
AttributeError: 'module' object has no attribute 'pack'

Is it possible to update this code to the latest TensorFlow?

Thanks!
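For what it's worth, tf.pack was renamed tf.stack in TensorFlow 1.0, so a hedged sketch of the equivalent line on a newer TensorFlow might look like this (the `flatten` wrapper and the use of NumPy to compute the static size are my own assumptions, not the repo's helpers):

    # Hedged sketch, assuming TensorFlow >= 1.0 where tf.pack became tf.stack.
    # `flatten` stands in for whatever helper the original code actually uses.
    import numpy as np
    import tensorflow as tf

    def flatten(x):
        """Flatten all but the batch dimension (assumes the trailing dims are static)."""
        flat_dim = int(np.prod(x.get_shape().as_list()[1:]))
        return tf.reshape(x, tf.stack([-1, flat_dim]))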

t_max = 32

Hello,

In the A3C paper they state t_max = 5; is there any reason you set it to 32?

Actually, I don't really understand why the batch size should be so small. Why shouldn't we use traditional batch sizes of 128 or more frames? Shouldn't that make learning stronger?
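For context, here is a rough sketch (my own, not the repo's code) of how t_max bounds the rollout in A3C-style n-step updates; `env`, `policy`, `value_fn`, and `apply_update` are hypothetical stand-ins:

    # Rough sketch of an A3C-style worker step: collect at most T_MAX transitions,
    # compute n-step bootstrapped returns, then apply one gradient update.
    T_MAX = 32     # value used in this repo; the paper uses 5
    GAMMA = 0.99

    def run_rollout(env, state, policy, value_fn, apply_update):
        states, actions, rewards = [], [], []
        done = False
        for _ in range(T_MAX):                           # t_max caps the rollout length
            action = policy(state)
            next_state, reward, done = env.step(action)  # hypothetical env interface
            states.append(state)
            actions.append(action)
            rewards.append(reward)
            state = next_state
            if done:
                break
        # n-step returns, bootstrapped from the value of the last state if not terminal
        R = 0.0 if done else value_fn(state)
        returns = []
        for r in reversed(rewards):
            R = r + GAMMA * R
            returns.append(R)
        returns.reverse()
        apply_update(states, actions, returns)           # one update per (at most) t_max steps
        return state, done

So t_max mainly trades off update frequency against the length of the n-step return, rather than acting like a traditional minibatch size.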

clipping

In the code, the rewards returned from the environment are clipped between -1 and 1. But I believe Breakout gives rewards higher than 1 for bricks in rows nearer the top. What is the rationale for clipping?
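For concreteness, the clipping in question amounts to something like this trivial NumPy sketch (not the repo's exact code):

    # Trivial sketch of reward clipping: any reward outside [-1, 1] is saturated,
    # so e.g. a 7-point brick in Breakout contributes the same as a 1-point one.
    import numpy as np

    raw_reward = 7.0
    clipped_reward = float(np.clip(raw_reward, -1.0, 1.0))   # -> 1.0

The usual rationale, from the DQN and A3C papers, is that clipping keeps gradient magnitudes comparable across games so a single learning rate works everywhere, at the cost of losing the distinction between small and large rewards.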

When are you planning to have A3C FF (Algorithm 2) and A3C LSTM (Algorithm 3) done?

What is your timeline for having n-step Q-learning, A3C FF (Algorithm 2), and A3C LSTM (Algorithm 3) done, as per the next steps in Keras + TensorFlow? I have some code for a stock-trading game that uses standard deep Q-learning with experience replay, but I would like to use A3C LSTM with experience replay as per the research paper. Let me know if you are interested in incorporating the stock-trading game into your code (I will email you the zip; it is 6 small Python files). It is in Keras + TensorFlow.

ValueError: need more than 4 values to unpack

When I try to run a3c.py, I run into the following problem:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "a3c.py", line 71, in actor_learner_thread
s, a, R, minimize, p_network, v_network = graph_ops
ValueError: need more than 4 values to unpack
Following a solution on Stack Overflow, I added a comma to the code, but it failed.
I would appreciate it if anyone can help me.
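A hedged illustration of what the error means (the names mirror the traceback, but the tuple contents here are hypothetical): build_graph is returning fewer values than actor_learner_thread tries to unpack, so the fix is to make the two sides agree rather than to add a comma.

    # Illustration only: unpacking 6 names from a 4-element tuple raises exactly
    # this error (the wording differs slightly between Python 2 and Python 3).
    graph_ops = ("s", "a", "R", "minimize")                     # only 4 values...
    try:
        s, a, R, minimize, p_network, v_network = graph_ops     # ...unpacked into 6
    except ValueError as e:
        print(e)   # Python 2: "need more than 4 values to unpack"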

Do results differ only because of the seed?

You write that one should try experiments with multiple seeds. Did you find that results differ substantially given only different seeds?

I'm asking because in the paper, Mnih et al. take the best 5 out of 50 runs with different learning rates. However, from the paper it's not clear to me whether the methods are sensitive to the choice of learning rate or unstable in general.

About the randomness of the performance

I am currently trying to run your code and reproduce the reported performance, but the mean reward is stuck around a score of 5. I have run it three times and got the same result each time. The code seems to run fine though.

How random is the performance? How many trials did you run before obtaining the results presented in the README?

RGB image

How can I use the raw RGB image instead of the grayscale one?
I'm having trouble with the neural network's input shape, which doesn't match the observation shape (84, 84, 3).
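If it helps, here is a hedged sketch of what the input side would have to look like for RGB frames, using the Keras 1.x API seen elsewhere in this repo. The 84x84 size, the channels-last ordering, and stacking frames along the channel axis are my assumptions, not necessarily how the repo does it.

    # Hedged sketch: declare 3 channels per frame instead of 1, and skip the
    # grayscale conversion in the preprocessing. Shapes/ordering are illustrative.
    from keras.layers import Input, Convolution2D

    agent_history_length = 4
    channels = 3                                   # RGB instead of grayscale
    inputs = Input(shape=(84, 84, agent_history_length * channels))
    conv1 = Convolution2D(nb_filter=16, nb_row=8, nb_col=8, subsample=(4, 4),
                          activation='relu', border_mode='same')(inputs)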

tf.Variable unexpected keyword 'dtype'

Using TensorFlow backend.
can't determine number of CPU cores: assuming 4
I tensorflow/core/common_runtime/local_device.cc:25] Local device intra op parallelism threads: 4
can't determine number of CPU cores: assuming 4
I tensorflow/core/common_runtime/local_session.cc:45] Local session inter op parallelism threads: 4
INFO:gym.envs.registration:Making new env: Breakout-v0
[2016-07-16 22:50:28,278] Making new env: Breakout-v0
Traceback (most recent call last):
File "async_dqn.py", line 310, in
tf.app.run()
File "/Library/Python/2.7/site-packages/tensorflow/python/platform/default/_app.py", line 11, in run
sys.exit(main(sys.argv))
File "async_dqn.py", line 301, in main
graph_ops = build_graph(num_actions)
File "async_dqn.py", line 173, in build_graph
s, q_network = build_network(num_actions=num_actions, agent_history_length=FLAGS.agent_history_length, resized_width=FLAGS.resized_width, resized_height=FLAGS.resized_height)
File "/Users/nathaniel/Downloads/async-rl-master/model.py", line 10, in build_network
model = Convolution2D(nb_filter=16, nb_row=8, nb_col=8, subsample=(4,4), activation='relu', border_mode='same')(inputs)
File "/usr/local/lib/python2.7/site-packages/keras/engine/topology.py", line 458, in call
self.build(input_shapes[0])
File "/usr/local/lib/python2.7/site-packages/keras/layers/convolutional.py", line 296, in build
self.W = self.init(self.W_shape, name='{}_W'.format(self.name))
File "/usr/local/lib/python2.7/site-packages/keras/initializations.py", line 61, in glorot_uniform
return uniform(shape, s, name=name)
File "/usr/local/lib/python2.7/site-packages/keras/initializations.py", line 33, in uniform
name=name)
File "/usr/local/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 103, in variable
v = tf.Variable(value, dtype=_convert_string_dtype(dtype), name=name)
TypeError: __init__() got an unexpected keyword argument 'dtype'
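This usually indicates a Keras/TensorFlow version mismatch: the Keras backend passes dtype to tf.Variable, which very old TensorFlow builds don't accept. A quick, hedged way to confirm is simply to print both versions and then upgrade TensorFlow (or pin Keras) accordingly.

    # Hedged diagnostic: print the installed versions to confirm the mismatch.
    import tensorflow as tf
    import keras

    print("tensorflow", tf.__version__)
    print("keras", keras.__version__)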

How to speed up training with GPU?

Hey! Thanks a bunch for sharing this.
I've made some attempts of speeding up the training with a GPU, but if there is any increase at all - it's very little. I get about 10 global frames/steps per sec when running the algorithm not on ALE but on a very simple python-script I've written myself. I've tried other GPU-compatible DL-algoritms and the slowdown doesn't seem to originate from scrip I've written. Do you have any idea of how to manage this issue?
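One thing worth checking (a generic TF 1.x-era sketch, not this repo's code) is whether TensorFlow is actually placing ops on the GPU at all; enabling device placement logging makes that visible. Also note that with small networks and one inference per environment step, A3C-style training is often CPU/Python-bound, so GPU gains can be modest either way.

    # Hedged sketch: log device placement so the console shows which device runs
    # each op. The tiny graph below exists only to trigger some placements.
    import tensorflow as tf

    config = tf.ConfigProto(log_device_placement=True)
    config.gpu_options.allow_growth = True   # don't grab all GPU memory up front
    config.allow_soft_placement = True       # fall back to CPU if a kernel is missing
    with tf.Session(config=config) as sess:
        with tf.device('/gpu:0'):
            a = tf.constant([1.0, 2.0])
            b = tf.constant([3.0, 4.0])
            c = a + b
        print(sess.run(c))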
