jaromiru / ai-blog
Accompanying repository for Let's make a DQN / A3C series.
Home Page: https://jaromiru.com
License: MIT License
I am trying to adapt your code to train the agent to play Breakout. I tried both the CartPole-basic file and the Seaquest-DDQN-PER file, but the agent doesn't seem to learn after training for a couple of hundred episodes (the total reward is around 1 to 3 on average). Have you tried to train the agent to play Breakout with this code? If so, were there any results? I was using "Breakout-ram-v0" for the CartPole-basic file and "Breakout-v0" for the Seaquest-DDQN-PER file.
Any chance you could help me get this running on a Retro Gym environment?
On your blog page on theory about A3C:
https://jaromiru.com/2017/03/26/lets-make-an-a3c-implementation/
you define the get_sample function in the Agent class:
    def get_sample(memory, n):
        r = 0.
        for i in range(n):
            r += memory[i][2] * (GAMMA ** i)
        s, a, _, _ = memory[0]
        _, _, _, s_ = memory[n-1]
        return s, a, r, s_
but in the actual code at line 183 the for loop is missing:
    def get_sample(memory, n):
        s, a, _, _ = memory[0]
        _, _, _, s_ = memory[n-1]
        return s, a, self.R, s_
I think your blog is right and the implementation is missing that part, right? Thanks
I suspect there is a race condition in the Optimizer class of your A3C implementation, here:
    if len(self.train_queue[0]) < MIN_BATCH:
        time.sleep(0) # yield
        return
    with self.lock_queue:
        s, a, r, s_, s_mask = self.train_queue
        self.train_queue = [ [], [], [], [], [] ]
If Optimizers A and B both pass the queue size check and A then takes all experiences from the train queue, Optimizer B gets nothing and raises an exception. The easiest way to reproduce it is to run the script in the MountainCar environment (though I don't know why it's OK in CartPole).
I guess the check should also be included in the critical section, after acquiring the lock on the train queue.
Thank you!
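A minimal sketch of the suggested fix, assuming the queue is a list of five parallel lists as in the snippet above. The class name TrainQueue, the methods push and pop_batch, and the value of MIN_BATCH are made up for illustration and are not taken from the repository. The size check is repeated after the lock is acquired, so a second optimizer that loses the race returns early instead of working on an empty batch.

```python
import threading

MIN_BATCH = 4  # illustrative value, not the repository's setting


class TrainQueue:
    """Hypothetical wrapper around the shared train queue."""

    def __init__(self):
        self.lock_queue = threading.Lock()
        self.train_queue = [[], [], [], [], []]  # s, a, r, s_, s_mask

    def push(self, s, a, r, s_, s_mask):
        with self.lock_queue:
            for column, item in zip(self.train_queue, (s, a, r, s_, s_mask)):
                column.append(item)

    def pop_batch(self):
        """Return a full batch, or None if there is not enough data."""
        # Cheap unlocked check: avoids taking the lock when the queue is clearly short.
        if len(self.train_queue[0]) < MIN_BATCH:
            return None
        with self.lock_queue:
            # Re-check under the lock: another thread may have emptied the queue
            # between the first check and the lock acquisition.
            if len(self.train_queue[0]) < MIN_BATCH:
                return None
            batch = self.train_queue
            self.train_queue = [[], [], [], [], []]
        return batch


queue = TrainQueue()
for i in range(MIN_BATCH):
    queue.push(i, 0, 1.0, i + 1, 1.0)

first = queue.pop_batch()   # gets the whole batch
second = queue.pop_batch()  # queue is now empty, so this returns None
```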
Hi,
Although I can run other scripts, I get the following error when I attempt to run Seaquest-DDQN-PER.py (Using theano backend):
Using Theano backend.
Using gpu device 0: GeForce GT 730M (CNMeM is disabled, cuDNN 5103)
/usr/local/lib/python2.7/dist-packages/theano/sandbox/cuda/__init__.py:600: UserWarning: Your cuDNN version is more recent than the one Theano officially supports. If you see any problems, try updating Theano or downgrading cuDNN to version 5.
  warnings.warn(warn)
[2016-11-18 15:58:01,675] Making new env: Seaquest-v0
Traceback (most recent call last):
  File "run.py", line 261, in <module>
    agent = Agent(stateCnt, actionCnt)
  File "run.py", line 139, in __init__
    self.brain = Brain(stateCnt, actionCnt)
  File "run.py", line 49, in __init__
    self.model = self._createModel()
  File "run.py", line 59, in _createModel
    model.add(Dense(output_dim=512, activation='relu'))
  File "/usr/local/lib/python2.7/dist-packages/keras/models.py", line 312, in add
    output_tensor = layer(self.outputs[0])
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 487, in __call__
    self.build(input_shapes[0])
  File "/usr/local/lib/python2.7/dist-packages/keras/layers/core.py", line 695, in build
    name='{}_W'.format(self.name))
  File "/usr/local/lib/python2.7/dist-packages/keras/initializations.py", line 59, in glorot_uniform
    return uniform(shape, s, name=name)
  File "/usr/local/lib/python2.7/dist-packages/keras/initializations.py", line 32, in uniform
    return K.random_uniform_variable(shape, -scale, scale, name=name)
  File "/usr/local/lib/python2.7/dist-packages/keras/backend/theano_backend.py", line 140, in random_uniform_variable
    return variable(np.random.uniform(low=low, high=high, size=shape),
  File "mtrand.pyx", line 1565, in mtrand.RandomState.uniform (numpy/random/mtrand/mtrand.c:17319)
OverflowError: Range exceeds valid bounds
using Tensorflow backend:
Using TensorFlow backend. I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so.1 locally I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally [2016-11-18 15:57:31,713] Making new env: Seaquest-v0 I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties: name: GeForce GT 730M major: 3 minor: 5 memoryClockRate (GHz) 0.758 pciBusID 0000:01:00.0 Total memory: 1023.88MiB Free memory: 161.57MiB I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y I tensorflow/core/common_runtime/gpu/gpu_device.cc:838] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GT 730M, pci bus id: 0000:01:00.0) E tensorflow/stream_executor/cuda/cuda_driver.cc:965] failed to allocate 161.57M (169422848 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY Traceback (most recent call last): File "run.py", line 260, in <module> agent = Agent(stateCnt, actionCnt) File "run.py", line 138, in __init__ self.brain = Brain(stateCnt, actionCnt) File "run.py", line 48, in __init__ self.model = self._createModel() File "run.py", line 54, in _createModel model.add(Convolution2D(32, 8, 8, subsample=(4,4), activation='relu', input_shape=(self.stateCnt))) File "/usr/local/lib/python2.7/dist-packages/keras/models.py", line 280, in add layer.create_input_layer(batch_input_shape, input_dtype) File 
"/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 370, in create_input_layer self(x) File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 514, in __call__ self.add_inbound_node(inbound_layers, node_indices, tensor_indices) File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 572, in add_inbound_node Node.create_node(self, inbound_layers, node_indices, tensor_indices) File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 149, in create_node output_tensors = to_list(outbound_layer.call(input_tensors[0], mask=input_masks[0])) File "/usr/local/lib/python2.7/dist-packages/keras/layers/convolutional.py", line 466, in call filter_shape=self.W_shape) File "/usr/local/lib/python2.7/dist-packages/keras/backend/tensorflow_backend.py", line 1639, in conv2d x = tf.nn.conv2d(x, kernel, strides, padding=padding) File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_nn_ops.py", line 394, in conv2d data_format=data_format, name=name) File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 703, in apply_op op_def=op_def) File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2319, in create_op set_shapes_for_outputs(ret) File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1711, in set_shapes_for_outputs shapes = shape_func(op) File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/common_shapes.py", line 246, in conv2d_shape padding) File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/common_shapes.py", line 184, in get2d_conv_output_size (row_stride, col_stride), padding_type) File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/common_shapes.py", line 149, in get_conv_output_size "Filter: %r Input: %r" % (filter_size, input_size)) ValueError: Filter must not be larger than the input: Filter: (8, 8) Input: 
(2, 84)
Based on the discussion here, the normal Huber loss should be used in DQN.
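For reference, the standard Huber loss is quadratic for small errors and linear for large ones. Below is a minimal NumPy sketch of that definition; the function name huber_loss and the default delta=1.0 are illustrative choices, not code from the repository.

```python
import numpy as np


def huber_loss(y_true, y_pred, delta=1.0):
    """Mean Huber loss: 0.5*e^2 if |e| <= delta, else delta*(|e| - 0.5*delta)."""
    error = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    abs_error = np.abs(error)
    quadratic = 0.5 * error ** 2
    linear = delta * (abs_error - 0.5 * delta)
    return np.where(abs_error <= delta, quadratic, linear).mean()


small = huber_loss([0.0], [0.5])  # inside the quadratic region
large = huber_loss([0.0], [3.0])  # inside the linear region
```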
Thanks for this repo. Very useful.
However, would you elaborate on why you are using two models, model and model_, in the brain? In agent.replay, it looks like you use one model to predict on s and another to do the same on s_, while you only train model. I can see that at some point in the learning process, you update model_'s weights while the agent is observing.
Can you please explain how this trick is different from what you had in CartPole-Basic.py?
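The pattern described in the question matches what is usually called a target network in DQN: a periodically refreshed copy of the trained network supplies the bootstrap values for s_, so the targets do not shift on every gradient step. Here is a toy sketch of that idea, assuming this is what model_ is for. LinearQ, td_targets and update_target are hypothetical names standing in for the Keras models and are not the repository's code.

```python
import numpy as np

GAMMA = 0.99


class LinearQ:
    """Toy stand-in for a Keras model: Q(s) = s @ w."""

    def __init__(self, state_dim, n_actions, seed):
        self.w = np.random.default_rng(seed).normal(size=(state_dim, n_actions))

    def predict(self, states):
        return states @ self.w


def td_targets(online, target, s, a, r, s_, done):
    """Training targets for the online net; bootstrap values come from the frozen copy."""
    q = online.predict(s).copy()
    q_next = target.predict(s_).max(axis=1)
    q[np.arange(len(a)), a] = r + GAMMA * q_next * (1.0 - done)
    return q


def update_target(online, target):
    """Copy the online weights into the target net (done only every so often)."""
    target.w = online.w.copy()


online = LinearQ(state_dim=4, n_actions=2, seed=0)
target = LinearQ(state_dim=4, n_actions=2, seed=1)
update_target(online, target)

s = np.eye(4)[:2]
s_ = np.eye(4)[2:]
targets = td_targets(online, target, s,
                     a=np.array([0, 1]),
                     r=np.array([1.0, -1.0]),
                     s_=s_,
                     done=np.array([0.0, 1.0]))
```

Only the online network would be fitted to these targets; between calls to update_target the second network stays fixed.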
The threading library used in the A3C example is not truly parallel. See https://docs.python.org/3/library/threading.html.
In CPython, due to the Global Interpreter Lock, only one thread can execute Python code at once.
The implementation still benefits from this design, but performance is severely degraded. This could possibly be fixed by rewriting it with the multiprocessing library, https://docs.python.org/3.6/library/multiprocessing.html
If anyone wants to do it, let me know.
Hello, first of all, congrats on the article.
I'm using it for study, and I'm trying to run your code to understand it better.
So, I have some questions:
tensorflow.python.framework.errors_impl.FailedPreconditionError: Error while reading resource variable dense_3/kernel from Container: localhost. This could mean that the variable was uninitialized. Not found: Container localhost does not exist. (Could not find resource: localhost/dense_3/kernel)
In line 202 of A3C version, R is divided by GAMMA. But in line 211, R is not.
According to your blog:
R_1=(R_0 - r_0+gamma^n * r_n)/gamma
I think line 211 is supposed to be
self.R = (self.R - self.memory[0][2]) / GAMMA
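The recurrence quoted from the blog can be checked numerically against the direct sum. The sketch below does only that; it does not reproduce the repository's bookkeeping of self.R, and the reward values, GAMMA and N are arbitrary illustration values.

```python
GAMMA = 0.9
N = 3


def n_step_return(rewards, start, n, gamma):
    """Direct sum: r_start + gamma*r_(start+1) + ... + gamma^(n-1)*r_(start+n-1)."""
    return sum(rewards[start + i] * gamma ** i for i in range(n))


rewards = [1.0, -2.0, 0.5, 3.0, 4.0]

R0 = n_step_return(rewards, 0, N, GAMMA)

# Recurrence from the blog: R_1 = (R_0 - r_0 + gamma^n * r_n) / gamma
R1_incremental = (R0 - rewards[0] + GAMMA ** N * rewards[N]) / GAMMA

# The same quantity computed from scratch, one step later
R1_direct = n_step_return(rewards, 1, N, GAMMA)
```

Note that the division by gamma applies to the whole bracket, which is why the order of operations matters when the update is split across several lines of code.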
I could be wrong, but it does not seem that you are annealing the bias with importance sampling as suggested in the PER paper (section 3.4).
w_i = (1/N * 1/P(i))^beta
I think you would have to multiply your gradients by this w_i term.
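As a rough illustration of the formula above, here is a NumPy sketch that computes w_i for a sampled batch. The function name importance_weights is hypothetical, and dividing by the largest weight in the batch is one common normalisation choice assumed here rather than something taken from the repository.

```python
import numpy as np


def importance_weights(priorities, sampled_idx, beta):
    """w_i = (1/N * 1/P(i))^beta for the sampled transitions, scaled to at most 1."""
    priorities = np.asarray(priorities, dtype=float)
    probs = priorities / priorities.sum()        # P(i)
    n = len(priorities)                          # N, the memory size
    w = (1.0 / (n * probs[sampled_idx])) ** beta
    return w / w.max()                           # assumed normalisation for stability


uniform = importance_weights([2.0, 2.0, 2.0], np.array([0, 2]), beta=0.4)
skewed = importance_weights([1.0, 3.0], np.array([0, 1]), beta=1.0)
```

With uniform priorities every weight is 1, so the correction vanishes; with skewed priorities, the transition that is sampled more often receives the smaller weight. The per-sample loss (and therefore the gradient) would then be multiplied by these weights.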
First, what are EPS_START, EPS_STOP, and EPS_STEPS? If I want episodes to last until the game naturally terminates an episode, how would I modify these? Could I just set EPS_STEPS to be a really large value?
Second, I'm using a 3D state space, and for some reason the following lines:
s = np.vstack(s)
a = np.vstack(a)
r = np.vstack(r)
s_ = np.vstack(s_)
s_mask = np.vstack(s_mask)
result in s and s_ having 3 dimensions instead of the proper 4 (batch dimension included). I changed these lines to:
s = np.array(s)
a = np.vstack(a)
r = np.vstack(r)
s_ = np.array(s_)
s_mask = np.vstack(s_mask)
Is that an acceptable solution or am I screwing up the logic in this way?
Thank you so much for your clarification. This code is immensely helpful and I appreciate it and the thorough explanation very much.
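The shape difference described above can be reproduced in isolation. This sketch uses dummy 84x84x3 states, which is an assumed shape for illustration only: np.vstack concatenates along the first axis, while np.array adds a new leading batch axis.

```python
import numpy as np

# Four dummy 3D states (height, width, channels); the shape is an assumption.
batch_3d = [np.zeros((84, 84, 3)) for _ in range(4)]

stacked_v = np.vstack(batch_3d)   # concatenates along axis 0 -> (336, 84, 3)
stacked_a = np.array(batch_3d)    # adds a batch axis         -> (4, 84, 84, 3)

# For 1D states, vstack already yields (batch, features).
batch_1d = [np.zeros(4) for _ in range(4)]
stacked_1d = np.vstack(batch_1d)  # -> (4, 4)

# Scalars such as rewards become a column vector with vstack.
rewards = np.vstack([1.0, 0.0, -1.0, 0.5])  # -> (4, 1)
```

So np.array (or np.stack) keeps one state per batch entry for multi-dimensional states, and keeping np.vstack for a, r and s_mask preserves their (batch, 1) column shape.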
Hi,
Under which license is this published?
I like the SumTree implementation and some other parts, and I'm curious about the conditions for using your code and how you should be credited.
Given that the OpenAI Gym environment MountainCar-v0 ALWAYS returns -1.0 as a reward (even when goal is achieved), I don't understand how DQN with experience-replay converges, yet I know it does, because I have working code (basically your awesome code, that is) that proves it.
It is my understanding that ultimately there needs to be a "sparse reward" that is found. Yet as far as I can see from the openAI Gym code, there is never any reward other than -1. It feels more like a "no reward" environment.
What almost answers my question, but in fact does not: when the task is completed quickly, the return (sum of rewards) of the episode is larger. So if the car never finds the flag, the return is -1000. If the car finds the flag quickly the return might be -200. The reason this does not answer my question is because with DQN and experience replay, those returns (-1000, -200) are never present in the experience replay memory. All the memory has are tuples of the form (state, action, reward, next_state), and of course tuples are pulled from memory at random, not episode-by-episode.
If reaching the flag yielded a reward of +1 (or 100) etc.... things would make more sense to me...
So, I don't see anything in the memory that indicates that the episode was performed well.
And thus, I have no idea why this DQN code is working for MountainCar.
PS: I asked this question on your blog too (as a comment). Apologies for duplication -- I'm not sure where you look and don't look :)
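One possible way to look at this, offered as a toy sketch rather than an analysis of the repository's code: in Q-learning the target for a terminal transition is just r, with no bootstrap term, while every other target is r + gamma * max Q(s_). With a constant reward of -1, that alone makes states closer to the goal less negative. The 1D chain below is a made-up stand-in for MountainCar; N_STATES, step and solve are hypothetical names.

```python
import numpy as np

GAMMA = 0.99
N_STATES = 6        # states 0..5; state 5 is the terminal "flag"
ACTIONS = (-1, +1)  # move left / move right


def step(s, a):
    """Toy chain dynamics: the reward is always -1, as in MountainCar-v0."""
    s_next = min(max(s + a, 0), N_STATES - 1)
    done = s_next == N_STATES - 1
    return s_next, -1.0, done


def solve(sweeps=200):
    """Repeated Q-learning style backups over all (state, action) pairs."""
    q = np.zeros((N_STATES, len(ACTIONS)))
    for _ in range(sweeps):
        for s in range(N_STATES - 1):  # the terminal state is never updated
            for i, a in enumerate(ACTIONS):
                s_next, r, done = step(s, a)
                # No bootstrap on terminal transitions: this is the only "signal".
                bootstrap = 0.0 if done else GAMMA * q[s_next].max()
                q[s, i] = r + bootstrap
    return q


# Value of each non-terminal state under the greedy policy
values = solve().max(axis=1)[:-1]
```

In this toy setting the state next to the flag ends up with value -1 and each step further away is more negative, so individual (state, action, reward, next_state) tuples are enough to rank states without ever storing an episode return.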
https://github.com/jaara/AI-blog/blob/361e8c79dcec861e30418f82de17b644028d8623/CartPole-basic.py#L112-L113
If I am not mistaken, this should not be agent but self, as agent is the variable created outside of class Agent. The program will therefore work as intended, but this would be better as self.
According to the Keras 2 API, the following changes are needed in the Brain class: output_dim becomes units, and nb_epoch becomes epochs.
    def _createModel(self):
        model = Sequential()
        model.add(Dense(units=64, activation='relu', input_dim=stateCnt))
        model.add(Dense(units=actionCnt, activation='linear'))
        opt = RMSprop(lr=0.00025)
        model.compile(loss=hubert_loss, optimizer=opt)
        return model
    def train(self, x, y, epoch=1, verbose=0):
        self.model.fit(x, y, batch_size=64, epochs=epoch, verbose=verbose)
I am also working on multi-threaded prediction with an RL model. I had been stuck on this issue for about a week until I saw this line of your code:
model._make_predict_function() # have to initialize before threading
Can you tell me what this function does and why this line is needed? I can't find any documentation about it.
Just in case you do not know, your blog seems to be down the past few days.
Keith