
async-rl's People

Contributors

bryant1410, coreylynch, ei-grad, osh, xu-song

async-rl's Issues

OSError, why?

OSError: dlopen(/Users/weyn/anaconda/lib/python2.7/site-packages/atari_py/ale_interface/build/libale_c.so, 6): Symbol not found: __ZNKSt5ctypeIcE13_M_widen_initEv
Referenced from: /Users/weyn/anaconda/lib/python2.7/site-packages/atari_py/ale_interface/build/libale_c.so
Expected in: /usr/lib/libstdc++.6.dylib
in /Users/weyn/anaconda/lib/python2.7/site-packages/atari_py/ale_interface/build/libale_c.so

Reward doesn't go up ....

I ran the async DQN model out of the box with 3 seeds on 7 Atari games (Pong, Breakout, Seaquest, BeamRider, SpaceInvaders, Qbert, and Enduro) across 24 threads. However, the reward stays flat for all the games up to 11M global time steps. I've also run Breakout up to 30M global steps with 5 seeds and the reward doesn't go up either. Has anybody else had this issue?

No local network synchronization

I'm curious why you decided not to create a local copy of the variables in the worker threads and sync them with the global network at the end of each rollout. Does it create issues that the global network (which is used for inference during the rollout) gets updated in the middle of a rollout? Is there a reason you changed the algorithm from the one described in the Asynchronous Methods for Deep RL paper?
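For reference, a minimal sketch of the per-rollout synchronization the paper describes might look like the following. This is not code from this repo; the variable lists and the worker-loop comments are hypothetical, TF 1.x-style.

    # Minimal sketch (not from this repo) of the per-rollout sync described in the
    # paper: each worker keeps a local copy of the network and copies the global
    # weights into it before collecting a rollout. `global_vars` / `local_vars`
    # are hypothetical lists of tf.Variable objects.
    import tensorflow as tf

    def make_sync_op(global_vars, local_vars):
        """Build an op that copies every global variable into the local copy."""
        return tf.group(*[local_v.assign(global_v)
                          for local_v, global_v in zip(local_vars, global_vars)])

    # Hypothetical worker loop:
    #   sess.run(sync_op)                      # refresh the local weights
    #   rollout = collect_rollout(local_net)   # act with the frozen local copy
    #   grads = compute_gradients(rollout)     # gradients w.r.t. local variables
    #   sess.run(apply_grads_to_global)        # asynchronous update of the global net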

pretrained model

Hi @coreylynch, thanks for the awesome project!

I was wondering, do you have the Keras weights of a pretrained agent somewhere? I was looking to do some quick visualizations with breakout.

Best,
-eder

TensorFlow outdated

I guess this code was written for an older version of TensorFlow?

x = tf.reshape(x, tf.pack([-1, prod(shape(x)[1:])]))
AttributeError: 'module' object has no attribute 'pack'

Is it possible to update this code to the latest TensorFlow?

Thanks!
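For what it's worth, tf.pack was renamed tf.stack in TensorFlow 1.0, so a hedged sketch of the equivalent line on a newer TensorFlow might look like this (the `flatten` wrapper and the use of NumPy to compute the static size are my own assumptions, not the repo's helpers):

    # Hedged sketch, assuming TensorFlow >= 1.0 where tf.pack became tf.stack.
    # `flatten` stands in for whatever helper the original code actually uses.
    import numpy as np
    import tensorflow as tf

    def flatten(x):
        """Flatten all but the batch dimension (assumes the trailing dims are static)."""
        flat_dim = int(np.prod(x.get_shape().as_list()[1:]))
        return tf.reshape(x, tf.stack([-1, flat_dim]))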

t_max = 32

Hello,

In the A3C paper they state t_max = 5; is there any reason you set it to 32?

Actually, I don't really understand why the batch size should be so small. Why shouldn't we use traditional batch sizes of 128 or more frames? Shouldn't that make learning stronger?
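For context, here is a rough sketch (my own, not the repo's code) of how t_max bounds the rollout in A3C-style n-step updates; `env`, `policy`, `value_fn`, and `apply_update` are hypothetical stand-ins:

    # Rough sketch of an A3C-style worker step: collect at most T_MAX transitions,
    # compute n-step bootstrapped returns, then apply one gradient update.
    T_MAX = 32     # value used in this repo; the paper uses 5
    GAMMA = 0.99

    def run_rollout(env, state, policy, value_fn, apply_update):
        states, actions, rewards = [], [], []
        done = False
        for _ in range(T_MAX):                           # t_max caps the rollout length
            action = policy(state)
            next_state, reward, done = env.step(action)  # hypothetical env interface
            states.append(state)
            actions.append(action)
            rewards.append(reward)
            state = next_state
            if done:
                break
        # n-step returns, bootstrapped from the value of the last state if not terminal
        R = 0.0 if done else value_fn(state)
        returns = []
        for r in reversed(rewards):
            R = r + GAMMA * R
            returns.append(R)
        returns.reverse()
        apply_update(states, actions, returns)           # one update per (at most) t_max steps
        return state, done

So t_max mainly trades off update frequency against the length of the n-step return, rather than acting like a traditional minibatch size.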

clipping

In the code, the rewards returned from the environment are clipped between -1 and 1. But I believe Breakout gives rewards higher than 1 for bricks in rows nearer the top. What is the rationale for clipping?
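For concreteness, the clipping in question amounts to something like this trivial NumPy sketch (not the repo's exact code):

    # Trivial sketch of reward clipping: any reward outside [-1, 1] is saturated,
    # so e.g. a 7-point brick in Breakout contributes the same as a 1-point one.
    import numpy as np

    raw_reward = 7.0
    clipped_reward = float(np.clip(raw_reward, -1.0, 1.0))   # -> 1.0

The usual rationale, from the DQN and A3C papers, is that clipping keeps gradient magnitudes comparable across games so a single learning rate works everywhere, at the cost of losing the distinction between small and large rewards.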

When are you planning to have A3C FF (Algorithm 2) and A3C LSTM (Algorithm 3) done?

What is your timeline for having n-step Q-learning, A3C FF (Algorithm 2), and A3C LSTM (Algorithm 3) done, as per the next steps in Keras + TensorFlow? I have some code for a stock-trading game that uses standard deep Q-learning with experience replay, but I would like to use A3C LSTM with experience replay as per the research paper. Let me know if you are interested in incorporating the stock-trading game into your code (I will email you the zip; it is 6 small Python files). It is in Keras + TensorFlow.

ValueError: need more than 4 values to unpack

When I try to run a3c.py, I run into the following problem:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "a3c.py", line 71, in actor_learner_thread
s, a, R, minimize, p_network, v_network = graph_ops
ValueError: need more than 4 values to unpack
Following a solution on Stack Overflow, I added a comma to the code, but it failed.
I would appreciate it if anyone can help me.
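A hedged illustration of what the error means (the names mirror the traceback, but the tuple contents here are hypothetical): build_graph is returning fewer values than actor_learner_thread tries to unpack, so the fix is to make the two sides agree rather than to add a comma.

    # Illustration only: unpacking 6 names from a 4-element tuple raises exactly
    # this error (the wording differs slightly between Python 2 and Python 3).
    graph_ops = ("s", "a", "R", "minimize")                     # only 4 values...
    try:
        s, a, R, minimize, p_network, v_network = graph_ops     # ...unpacked into 6
    except ValueError as e:
        print(e)   # Python 2: "need more than 4 values to unpack"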

Do results differ only because of the seed?

You write that one should try experiments with multiple seeds. Did you find that results differ substantially given only different seeds?

I'm asking because in the paper, Mnih et al. take the best 5 out of 50 runs with different learning rates. However, from the paper it's not clear to me whether the methods are sensitive to the choice of learning rate or unstable in general.

About the randomness of the performance

I am currently trying to run your code and reproduce the reported performance, but the mean reward is stuck around a score of 5. I have run it three times and got the same result each time. The code seems to run fine though.

How random is the performance? How many trials did you run before obtaining the results presented in the README?

RGB image

How can I use the raw RGB image instead of the grayscale one?
I'm having trouble with the neural network's input shape, which doesn't match the observation shape (84, 84, 3).
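If it helps, here is a hedged sketch of what the input side would have to look like for RGB frames, using the Keras 1.x API seen elsewhere in this repo. The 84x84 size, the channels-last ordering, and stacking frames along the channel axis are my assumptions, not necessarily how the repo does it.

    # Hedged sketch: declare 3 channels per frame instead of 1, and skip the
    # grayscale conversion in the preprocessing. Shapes/ordering are illustrative.
    from keras.layers import Input, Convolution2D

    agent_history_length = 4
    channels = 3                                   # RGB instead of grayscale
    inputs = Input(shape=(84, 84, agent_history_length * channels))
    conv1 = Convolution2D(nb_filter=16, nb_row=8, nb_col=8, subsample=(4, 4),
                          activation='relu', border_mode='same')(inputs)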

tf.Variable unexpected keyword 'dtype'

Using TensorFlow backend.
can't determine number of CPU cores: assuming 4
I tensorflow/core/common_runtime/local_device.cc:25] Local device intra op parallelism threads: 4
can't determine number of CPU cores: assuming 4
I tensorflow/core/common_runtime/local_session.cc:45] Local session inter op parallelism threads: 4
INFO:gym.envs.registration:Making new env: Breakout-v0
[2016-07-16 22:50:28,278] Making new env: Breakout-v0
Traceback (most recent call last):
File "async_dqn.py", line 310, in
tf.app.run()
File "/Library/Python/2.7/site-packages/tensorflow/python/platform/default/_app.py", line 11, in run
sys.exit(main(sys.argv))
File "async_dqn.py", line 301, in main
graph_ops = build_graph(num_actions)
File "async_dqn.py", line 173, in build_graph
s, q_network = build_network(num_actions=num_actions, agent_history_length=FLAGS.agent_history_length, resized_width=FLAGS.resized_width, resized_height=FLAGS.resized_height)
File "/Users/nathaniel/Downloads/async-rl-master/model.py", line 10, in build_network
model = Convolution2D(nb_filter=16, nb_row=8, nb_col=8, subsample=(4,4), activation='relu', border_mode='same')(inputs)
File "/usr/local/lib/python2.7/site-packages/keras/engine/topology.py", line 458, in call
self.build(input_shapes[0])
File "/usr/local/lib/python2.7/site-packages/keras/layers/convolutional.py", line 296, in build
self.W = self.init(self.W_shape, name='{}_W'.format(self.name))
File "/usr/local/lib/python2.7/site-packages/keras/initializations.py", line 61, in glorot_uniform
return uniform(shape, s, name=name)
File "/usr/local/lib/python2.7/site-packages/keras/initializations.py", line 33, in uniform
name=name)
File "/usr/local/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 103, in variable
v = tf.Variable(value, dtype=_convert_string_dtype(dtype), name=name)
TypeError: __init__() got an unexpected keyword argument 'dtype'
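This usually indicates a Keras/TensorFlow version mismatch: the Keras backend passes dtype to tf.Variable, which very old TensorFlow builds don't accept. A quick, hedged way to confirm is simply to print both versions and then upgrade TensorFlow (or pin Keras) accordingly.

    # Hedged diagnostic: print the installed versions to confirm the mismatch.
    import tensorflow as tf
    import keras

    print("tensorflow", tf.__version__)
    print("keras", keras.__version__)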

How to speed up training with GPU?

Hey! Thanks a bunch for sharing this.
I've made some attempts of speeding up the training with a GPU, but if there is any increase at all - it's very little. I get about 10 global frames/steps per sec when running the algorithm not on ALE but on a very simple python-script I've written myself. I've tried other GPU-compatible DL-algoritms and the slowdown doesn't seem to originate from scrip I've written. Do you have any idea of how to manage this issue?
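One thing worth checking (a generic TF 1.x-era sketch, not this repo's code) is whether TensorFlow is actually placing ops on the GPU at all; enabling device placement logging makes that visible. Also note that with small networks and one inference per environment step, A3C-style training is often CPU/Python-bound, so GPU gains can be modest either way.

    # Hedged sketch: log device placement so the console shows which device runs
    # each op. The tiny graph below exists only to trigger some placements.
    import tensorflow as tf

    config = tf.ConfigProto(log_device_placement=True)
    config.gpu_options.allow_growth = True   # don't grab all GPU memory up front
    config.allow_soft_placement = True       # fall back to CPU if a kernel is missing
    with tf.Session(config=config) as sess:
        with tf.device('/gpu:0'):
            a = tf.constant([1.0, 2.0])
            b = tf.constant([3.0, 4.0])
            c = a + b
        print(sess.run(c))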
