jaromiru / ai-blog

390.0 23.0 175.0 20 KB

Accompanying repository for Let's make a DQN / A3C series.

Home Page: https://jaromiru.com

License: MIT License

Python 100.00%

dqn a3c reinforcement-learning tutorial keras tensorflow

ai-blog's People

ionelhosu wangxiao5791509 vyraun hyqleonardo killervictor innixma nikicicz jonygli shashankn91 hydchydc darkforte philkuz haoshuji simeneide zxsted ryannnxu zencoding simmoncn jolinxql just-rks mhasana faur zhexiaozhe alanxu89 hellokitty8 malagori xuexixuexihaha mightyroy jaehyek newebug derekwei77 rajatg tacalvin moraval psfournier haha-533 egaus junkbot xzw0005 micahcarroll mehdimashayekhi collector-m yhtp6 jiths shubhampachori12110095 djbyrne davidsonggithub verigibest pinkgranny cg31 scratchcat1 filipre feiua royerk maxilirator samching dav-rnd kastnerkyle wojciechmigda taylorkangbeck oskli acherestes khangdaomsc hbcbh1999 rahulsridhar-conv sunshinejnjn jagatfx ellerylin mekladious auserj strike60 afcarl mdasadul youngjt codeingreen kent-wong guo-m batermj william0523 dailyncepu laojiang012 panyisheng kirarpit liangchenms jrjdr edmarisov w0lv3r1nix shashazhengchuan myausweis lucas110550 puzzler10 joelalb baichenjia zzlking swansealeo gouthamhm anands09 ejcv vulcanrowley jingranburangyongzhongwen

ai-blog's Issues

Playing Breakout with the program

I am trying to adapt your code to train the agent to play Breakout. I tried both the CartPole-basic file and the Seaquest-DDQN-PER file, but the agent doesn't seem to learn after training for a couple of hundred episodes (the total reward is around 1 to 3 on average). Have you tried training the agent to play Breakout with this code? If so, were there any results? I used "Breakout-ram-v0" with the CartPole-basic file and "Breakout-v0" with the Seaquest-DDQN-PER file.

Retro

Any chance you could help me get this running on a Retro Gym environment?

A3C get_sample potential error

On your blog page about A3C:
https://jaromiru.com/2017/03/26/lets-make-an-a3c-implementation/
you define the get_sample function in the Agent class like this:
    def get_sample(memory, n):
        r = 0.
        for i in range(n):
            r += memory[i][2] * (GAMMA ** i)
        s, a, _, _  = memory[0]
        _, _, _, s_ = memory[n-1]

        return s, a, r, s_

but in the actual code, at line 183, the for loop is missing:

    def get_sample(memory, n):
        s, a, _, _  = memory[0]
        _, _, _, s_ = memory[n-1]

        return s, a, self.R, s_

I think your blog is right and the implementation is missing that part, right? Thanks
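For reference, a runnable version of the blog's get_sample, which computes the discounted n-step return R = r_0 + GAMMA*r_1 + ... + GAMMA^(n-1)*r_(n-1) with an explicit loop. It is written as a standalone function with gamma passed in (an assumption made for self-containment); memory is assumed to be a list of (s, a, r, s_) tuples, oldest first:

```python
GAMMA = 0.99  # assumed discount factor

def get_sample(memory, n, gamma=GAMMA):
    """Collapse the first n transitions of memory into one n-step transition.

    memory: list of (s, a, r, s_) tuples, oldest first.
    Returns (s_0, a_0, R, s_n) with R = sum over i < n of gamma**i * r_i."""
    R = 0.0
    for i in range(n):
        R += memory[i][2] * (gamma ** i)
    s, a, _, _ = memory[0]       # state and action of the oldest transition
    _, _, _, s_ = memory[n - 1]  # state reached after n steps
    return s, a, R, s_
```

The script's version returns self.R instead, which suggests (judging only from the snippet) that the same sum is maintained incrementally across steps rather than recomputed with a loop.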

Race condition in Optimizer in A3C script

I believe there is a race condition in the Optimizer class of your A3C implementation here:

    if len(self.train_queue[0]) < MIN_BATCH:
      time.sleep(0)  # yield
      return

    with self.lock_queue:
      s, a, r, s_, s_mask = self.train_queue
      self.train_queue = [ [], [], [], [], [] ]

If optimizers A and B both pass the queue-size check and A then takes all experiences from the train queue, B gets nothing and raises an exception. The easiest way to reproduce this is to run the script in the MountainCar environment (though I don't know why it works fine in CartPole).

I think the check should also be moved into the critical section, after acquiring the lock on the train queue.

Thank you!
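A minimal sketch of the fix proposed above, with the size check moved inside the lock so that the check and the swap happen atomically. TrainQueue and pop_batch are hypothetical names used only for this sketch; MIN_BATCH and the five-column queue follow the snippet above, and the value 32 is a placeholder:

```python
import threading
import time

MIN_BATCH = 32  # placeholder; use the script's own constant

class TrainQueue:
    """Hypothetical wrapper around the Optimizer's shared train_queue."""

    def __init__(self):
        self.lock_queue = threading.Lock()
        self.train_queue = [[], [], [], [], []]  # s, a, r, s_, s_mask

    def push(self, s, a, r, s_, s_mask):
        with self.lock_queue:
            for column, value in zip(self.train_queue, (s, a, r, s_, s_mask)):
                column.append(value)

    def pop_batch(self):
        """Take the whole queue if it holds at least MIN_BATCH samples, else return None.

        Check and swap happen under the same lock, so two optimizer threads
        can never both pass the check and then race for the same samples."""
        with self.lock_queue:
            if len(self.train_queue[0]) < MIN_BATCH:
                batch = None
            else:
                batch = self.train_queue
                self.train_queue = [[], [], [], [], []]
        if batch is None:
            time.sleep(0)  # yield to other threads, outside the lock
        return batch
```

An optimizer step would then call pop_batch(), return early on None, and otherwise unpack s, a, r, s_, s_mask from the returned batch.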

Continue training an LSTM model in Keras while deploying it with Flask on one GPU?

Error in Seaquest-DDQN-PER.py

Hi,
Although I can run the other scripts, I get the following error when I try to run Seaquest-DDQN-PER.py (using the Theano backend):
    Using Theano backend.
    Using gpu device 0: GeForce GT 730M (CNMeM is disabled, cuDNN 5103)
    /usr/local/lib/python2.7/dist-packages/theano/sandbox/cuda/__init__.py:600: UserWarning: Your cuDNN version is more recent than the one Theano officially supports. If you see any problems, try updating Theano or downgrading cuDNN to version 5.
      warnings.warn(warn)
    [2016-11-18 15:58:01,675] Making new env: Seaquest-v0
    Traceback (most recent call last):
      File "run.py", line 261, in <module>
        agent = Agent(stateCnt, actionCnt)
      File "run.py", line 139, in __init__
        self.brain = Brain(stateCnt, actionCnt)
      File "run.py", line 49, in __init__
        self.model = self._createModel()
      File "run.py", line 59, in _createModel
        model.add(Dense(output_dim=512, activation='relu'))
      File "/usr/local/lib/python2.7/dist-packages/keras/models.py", line 312, in add
        output_tensor = layer(self.outputs[0])
      File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 487, in __call__
        self.build(input_shapes[0])
      File "/usr/local/lib/python2.7/dist-packages/keras/layers/core.py", line 695, in build
        name='{}_W'.format(self.name))
      File "/usr/local/lib/python2.7/dist-packages/keras/initializations.py", line 59, in glorot_uniform
        return uniform(shape, s, name=name)
      File "/usr/local/lib/python2.7/dist-packages/keras/initializations.py", line 32, in uniform
        return K.random_uniform_variable(shape, -scale, scale, name=name)
      File "/usr/local/lib/python2.7/dist-packages/keras/backend/theano_backend.py", line 140, in random_uniform_variable
        return variable(np.random.uniform(low=low, high=high, size=shape),
      File "mtrand.pyx", line 1565, in mtrand.RandomState.uniform (numpy/random/mtrand/mtrand.c:17319)
    OverflowError: Range exceeds valid bounds

Using the TensorFlow backend:
    Using TensorFlow backend.
    I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
    I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
    I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
    I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so.1 locally
    I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
    [2016-11-18 15:57:31,713] Making new env: Seaquest-v0
    I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
    name: GeForce GT 730M
    major: 3 minor: 5 memoryClockRate (GHz) 0.758
    pciBusID 0000:01:00.0
    Total memory: 1023.88MiB
    Free memory: 161.57MiB
    I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
    I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y
    I tensorflow/core/common_runtime/gpu/gpu_device.cc:838] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GT 730M, pci bus id: 0000:01:00.0)
    E tensorflow/stream_executor/cuda/cuda_driver.cc:965] failed to allocate 161.57M (169422848 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
    Traceback (most recent call last):
      File "run.py", line 260, in <module>
        agent = Agent(stateCnt, actionCnt)
      File "run.py", line 138, in __init__
        self.brain = Brain(stateCnt, actionCnt)
      File "run.py", line 48, in __init__
        self.model = self._createModel()
      File "run.py", line 54, in _createModel
        model.add(Convolution2D(32, 8, 8, subsample=(4,4), activation='relu', input_shape=(self.stateCnt)))
      File "/usr/local/lib/python2.7/dist-packages/keras/models.py", line 280, in add
        layer.create_input_layer(batch_input_shape, input_dtype)
      File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 370, in create_input_layer
        self(x)
      File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 514, in __call__
        self.add_inbound_node(inbound_layers, node_indices, tensor_indices)
      File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 572, in add_inbound_node
        Node.create_node(self, inbound_layers, node_indices, tensor_indices)
      File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 149, in create_node
        output_tensors = to_list(outbound_layer.call(input_tensors[0], mask=input_masks[0]))
      File "/usr/local/lib/python2.7/dist-packages/keras/layers/convolutional.py", line 466, in call
        filter_shape=self.W_shape)
      File "/usr/local/lib/python2.7/dist-packages/keras/backend/tensorflow_backend.py", line 1639, in conv2d
        x = tf.nn.conv2d(x, kernel, strides, padding=padding)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_nn_ops.py", line 394, in conv2d
        data_format=data_format, name=name)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 703, in apply_op
        op_def=op_def)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2319, in create_op
        set_shapes_for_outputs(ret)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1711, in set_shapes_for_outputs
        shapes = shape_func(op)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/common_shapes.py", line 246, in conv2d_shape
        padding)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/common_shapes.py", line 184, in get2d_conv_output_size
        (row_stride, col_stride), padding_type)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/common_shapes.py", line 149, in get_conv_output_size
        "Filter: %r Input: %r" % (filter_size, input_size))
    ValueError: Filter must not be larger than the input: Filter: (8, 8) Input: (2, 84)
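The TensorFlow error reports an input of (2, 84), i.e. the convolution is treating the first two axes of the state as height and width. One plausible reading, offered here as an assumption, is that the state is stored channels-first, as (2, 84, 84) stacked frames, while the backend expects channels-last. A minimal numpy sketch of moving the frame axis to the end; IMAGE_STACK, IMAGE_HEIGHT and IMAGE_WIDTH are hypothetical names for the state dimensions:

```python
import numpy as np

# Assumption: each state is IMAGE_STACK stacked frames stored channels-first,
# i.e. shape (IMAGE_STACK, IMAGE_HEIGHT, IMAGE_WIDTH) = (2, 84, 84).
IMAGE_STACK, IMAGE_HEIGHT, IMAGE_WIDTH = 2, 84, 84

def to_channels_last(state):
    """(C, H, W) -> (H, W, C): move the frame-stack axis to the end."""
    return np.moveaxis(state, 0, -1)

def batch_to_channels_last(batch):
    """(B, C, H, W) -> (B, H, W, C) for a whole minibatch."""
    return np.moveaxis(batch, 1, -1)

state = np.zeros((IMAGE_STACK, IMAGE_HEIGHT, IMAGE_WIDTH), dtype=np.float32)
print(to_channels_last(state).shape)  # (84, 84, 2)
```

With states in this layout, the first layer's input_shape would become (IMAGE_HEIGHT, IMAGE_WIDTH, IMAGE_STACK). The alternative is to keep the data as is and switch Keras to channels-first ordering (the image_dim_ordering setting in Keras 1, image_data_format in Keras 2), though not every backend supports that layout on every device.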

Using pseudo-huber loss is incorrect

Based on the discussion here, the normal Huber loss should be used in DQN.
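To make the difference concrete, a numpy sketch of both losses as functions of the TD error, with threshold delta = 1.0. Huber is exactly quadratic for |err| <= delta and exactly linear beyond it, so its gradient is clipped to +-delta; pseudo-Huber only approaches that behavior smoothly:

```python
import numpy as np

def huber(err, delta=1.0):
    """Huber loss: 0.5*err^2 for |err| <= delta, delta*(|err| - 0.5*delta) beyond."""
    abs_err = np.abs(err)
    quadratic = 0.5 * np.square(err)
    linear = delta * (abs_err - 0.5 * delta)
    return np.where(abs_err <= delta, quadratic, linear)

def pseudo_huber(err, delta=1.0):
    """Smooth approximation: delta^2 * (sqrt(1 + (err/delta)^2) - 1)."""
    return delta ** 2 * (np.sqrt(1.0 + np.square(err / delta)) - 1.0)

errs = np.array([0.0, 0.5, 2.0])
print(huber(errs).tolist())  # [0.0, 0.125, 1.5]
print(pseudo_huber(errs))    # slightly smaller everywhere except at 0
```

Recent tf.keras versions also provide tf.keras.losses.Huber(delta=...), which could be passed to model.compile instead of a hand-written loss.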

why two models in the brain?

Thanks for this repo. Very useful.
However, could you elaborate on why you use two models, model and model_, in the brain? In agent.replay(), it looks like you use one model to predict on s and the other to predict on s_, while you only train model. I can also see that at some point during learning, you update model_'s weights while the agent is observing.

Can you please explain how this trick differs from what you had in CartPole-basic.py?
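The setup described here matches the standard DQN target-network trick: model is trained on every replay, while model_ is a periodically synced copy used only to compute the bootstrap values for s_, which keeps the targets from shifting with every gradient step. A numpy sketch, with a hypothetical LinearQ class standing in for a Keras model (it mimics the predict, get_weights and set_weights methods):

```python
import numpy as np

class LinearQ:
    """Hypothetical stand-in for a Keras model: Q(s, .) = s @ W."""

    def __init__(self, n_state, n_action, rng):
        self.W = rng.normal(size=(n_state, n_action))

    def predict(self, s):
        return s @ self.W

    def get_weights(self):
        return [self.W.copy()]

    def set_weights(self, weights):
        self.W = weights[0].copy()

def dqn_targets(model, target_model, s, a, r, s_, done, gamma=0.99):
    """Training targets for `model`; bootstrap values come from the frozen `target_model`."""
    t = model.predict(s).copy()        # untaken actions keep their current estimate
    q_next = target_model.predict(s_)  # evaluated with the target network only
    bootstrap = gamma * (1.0 - done) * q_next.max(axis=1)  # no bootstrap after a terminal step
    t[np.arange(len(a)), a] = r + bootstrap
    return t

# Every UPDATE_TARGET_FREQUENCY steps (hypothetical constant), sync the copy:
# target_model.set_weights(model.get_weights())
```

The Double DQN variant goes one step further: the trained model picks the argmax action for s_ and the target model evaluates it.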

Threading in A3C is not really concurrent

The threading library used in the A3C example is not really concurrent. See https://docs.python.org/3/library/threading.html.

In CPython, due to the Global Interpreter Lock, only one thread can execute Python code at once.

This implementation still benefits from this design; however, its performance is severely degraded. This could possibly be fixed by rewriting it with the multiprocessing library: https://docs.python.org/3.6/library/multiprocessing.html

If anyone wants to do it, let me know.

Just some questions

Hello, first of all, congrats on the article.
I'm using it to study, and I'm trying to run your code to understand it better.
So, I have some questions:

Do you remember the exact versions of tensorflow, keras and gym used?
I tried TensorFlow 1.15 and Keras 2.3.1 and I'm getting the following error. Do you know what I am doing wrong?
tensorflow.python.framework.errors_impl.FailedPreconditionError: Error while reading resource variable dense_3/kernel from Container: localhost. This could mean that the variable was uninitialized. Not found: Container localhost does not exist. (Could not find resource: localhost/dense_3/kernel)

A3C version missing a GAMMA

In line 202 of the A3C version, R is divided by GAMMA, but in line 211 it is not.
According to your blog:
$R_1 = \frac{R_0 - r_0 + \gamma^n r_n}{\gamma}$
I think line 211 is supposed to be:
self.R = self.R - self.memory[0][2]/GAMMA
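To check the blog's recurrence numerically, a small sketch comparing the incremental update with a direct computation of the n-step return on a hypothetical reward sequence. It verifies the formula itself; what line 211 should contain still depends on how the rest of the script updates R between steps:

```python
GAMMA = 0.99  # assumed discount factor

def n_step_return(rewards, start, n, gamma=GAMMA):
    """Direct computation: r_start + gamma*r_(start+1) + ... + gamma^(n-1)*r_(start+n-1)."""
    return sum(rewards[start + i] * gamma ** i for i in range(n))

def shift_return(R, r_first, r_new, n, gamma=GAMMA):
    """Blog recurrence: drop the oldest reward, append the newest, rescale by 1/gamma."""
    return (R - r_first + gamma ** n * r_new) / gamma

rewards = [1.0, 2.0, 3.0, 4.0, 5.0]
n = 3
R0 = n_step_return(rewards, 0, n)
R1 = shift_return(R0, rewards[0], rewards[n], n)
print(abs(R1 - n_step_return(rewards, 1, n)) < 1e-9)  # True
```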

annealing bias

I could be wrong, but it does not seem that you anneal the bias with importance sampling as suggested in the PER paper (Section 3.4):

$w_i = \left( \frac{1}{N} \cdot \frac{1}{P(i)} \right)^{\beta}$

I think you would have to multiply your gradients by this w_i term.
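For reference, a numpy sketch of these weights as the PER paper defines them: P(i) is the sampling probability of transition i (with a SumTree, presumably its priority divided by the tree's total), N is the number of stored transitions, the weights are normalized by their maximum so they only scale updates downwards, and beta is annealed linearly towards 1. beta0 and total_steps are placeholder values, not taken from the script:

```python
import numpy as np

def is_weights(probs, n_stored, beta):
    """PER importance-sampling weights w_i = (1/N * 1/P(i))^beta, scaled so max(w) == 1."""
    w = (1.0 / (n_stored * np.asarray(probs, dtype=float))) ** beta
    return w / w.max()

def anneal_beta(step, beta0=0.4, total_steps=100_000):
    """Linearly increase beta from beta0 to 1 over total_steps, then hold at 1."""
    return min(1.0, beta0 + (1.0 - beta0) * step / total_steps)

probs = np.array([0.5, 0.25, 0.25])  # sampling probabilities of a minibatch
print(is_weights(probs, n_stored=3, beta=1.0))  # high-priority sample gets the smallest weight
```

Each sample's loss would then be scaled by its weight; in Keras, one option is passing the weights as the sample_weight argument of model.fit.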

A couple of questions

First, what are EPS_START, EPS_STOP, and EPS_STEPS? If I want episodes to last until the game terminates naturally, how would I modify these? Could I just set EPS_STEPS to a really large value?
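Judging by their names (an assumption; the script's definitions are not quoted here), these constants most likely define an epsilon-greedy exploration schedule: epsilon starts at EPS_START and moves linearly to EPS_STOP over EPS_STEPS steps. If so, they control how often random actions are taken, not when an episode ends, so a very large EPS_STEPS would only prolong exploration. A sketch with placeholder values:

```python
EPS_START = 1.0    # placeholder values, not the script's
EPS_STOP = 0.1
EPS_STEPS = 10_000

def get_epsilon(step):
    """Linearly anneal epsilon from EPS_START to EPS_STOP over EPS_STEPS steps, then hold."""
    if step >= EPS_STEPS:
        return EPS_STOP
    return EPS_START + step * (EPS_STOP - EPS_START) / EPS_STEPS

print(get_epsilon(0), get_epsilon(EPS_STEPS))  # 1.0 0.1
```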

Second, I'm using a 3D state space, and for some reason the following lines:

s = np.vstack(s)
a = np.vstack(a)
r = np.vstack(r)
s_ = np.vstack(s_)
s_mask = np.vstack(s_mask)

result in s and s_ having 3 dimensions instead of the proper 4 (batch dimension included). I changed these lines to:

s = np.array(s)
a = np.vstack(a)
r = np.vstack(r)
s_ = np.array(s_)
s_mask = np.vstack(s_mask)

Is that an acceptable solution or am I screwing up the logic in this way?
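The shape difference comes from how the two functions treat their inputs: np.vstack concatenates along the first axis and only promotes 1-D inputs to rows, so it yields (batch, features) for vector states but merges the batch into the first axis for 3-D states, whereas np.array (or np.stack) adds a new leading batch axis. A small sketch with hypothetical zero-filled states:

```python
import numpy as np

batch = [np.zeros((3, 4, 5)) for _ in range(8)]  # 8 hypothetical 3-D states
print(np.vstack(batch).shape)  # (24, 4, 5): batch merged into the first axis
print(np.array(batch).shape)   # (8, 3, 4, 5): new leading batch axis

vec_batch = [np.zeros(5) for _ in range(8)]      # 8 hypothetical vector states
print(np.vstack(vec_batch).shape)  # (8, 5): 1-D rows are stacked, as intended
```

So the change looks consistent for s and s_, assuming every state in the batch has the same shape; for scalar or 1-D entries such as r, vstack still produces one row per sample.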

Thank you so much for your clarification. This code is immensely helpful and I appreciate it and the thorough explanation very much.

Licence

Hi,
Which licence is it published under?
I like the SumTree implementation and some other parts, and I'm curious what the conditions are for using your code and how you should be credited.

Conceptual question about DQN when reward is always -1

Given that the OpenAI Gym environment MountainCar-v0 ALWAYS returns -1.0 as a reward (even when goal is achieved), I don't understand how DQN with experience-replay converges, yet I know it does, because I have working code (basically your awesome code, that is) that proves it.

It is my understanding that ultimately there needs to be a "sparse reward" that is found. Yet as far as I can see from the openAI Gym code, there is never any reward other than -1. It feels more like a "no reward" environment.

What almost answers my question, but in fact does not: when the task is completed quickly, the return (sum of rewards) of the episode is larger. So if the car never finds the flag, the return is -1000. If the car finds the flag quickly, the return might be -200. The reason this does not answer my question is that with DQN and experience replay, those returns (-1000, -200) are never present in the experience replay memory. All the memory holds are tuples of the form (state, action, reward, next_state), and of course tuples are pulled from memory at random, not episode by episode.

If reaching the flag yielded a reward of +1 (or +100, etc.), things would make more sense to me.

So, I don't see anything in the memory that indicates that the episode was performed well.

And thus, I have no idea why this DQN code is working for MountainCar.

PS: I asked this question on your blog too (as a comment). Apologies for duplication -- I'm not sure where you look and don't look :)
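One way to see why this can still work (a sketch of the standard argument, not a statement about this particular code): with the usual DQN target, a transition into a terminal state (typically stored with next_state set to None or a done flag) uses the target r alone, without the gamma * max Q(s', .) bootstrap term. A state from which the flag is k steps away then has value -(1 - gamma^k)/(1 - gamma), which is less negative the closer the flag is, so the ordering of actions is recovered from single transitions plus the terminal signal. The toy value iteration below, on a hypothetical 1-D chain where every step costs -1 and the last state is terminal, shows the effect:

```python
import numpy as np

def chain_values(n_states=10, gamma=0.99, iters=200):
    """Value iteration on a hypothetical 1-D chain.

    From state i the agent moves left or right; every step yields reward -1,
    and entering the last state (the 'flag') ends the episode."""
    goal = n_states - 1
    V = np.zeros(n_states)                    # V[goal] stays 0: terminal
    for _ in range(iters):
        new_V = V.copy()
        for i in range(goal):
            targets = []
            for j in (max(i - 1, 0), i + 1):  # left (clamped at 0), right
                if j == goal:
                    targets.append(-1.0)      # terminal: target is r alone, no bootstrap
                else:
                    targets.append(-1.0 + gamma * V[j])
            new_V[i] = max(targets)
        V = new_V
    return V

print(np.round(chain_values(), 2))  # values rise towards the goal
```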

Agent/self mixup

https://github.com/jaara/AI-blog/blob/361e8c79dcec861e30418f82de17b644028d8623/CartPole-basic.py#L112-L113
If I am not mistaken, this should be self rather than agent, since agent is the variable created outside class Agent. The program still works as intended, but self would be better here.

Keras 2 API

According to the Keras 2 API, the following need to change in the Brain class:

output_dim to units

nb_epoch to epochs

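    from keras.models import Sequential
    from keras.layers import Dense
    from keras.optimizers import RMSprop

    # stateCnt, actionCnt and hubert_loss are assumed to be defined at module level, as in the script
    class Brain:
        # ... other Brain methods unchanged ...

        def _createModel(self):
            model = Sequential()

            model.add(Dense(units=64, activation='relu', input_dim=stateCnt))
            model.add(Dense(units=actionCnt, activation='linear'))

            opt = RMSprop(lr=0.00025)
            model.compile(loss=hubert_loss, optimizer=opt)

            return model

        def train(self, x, y, epoch=1, verbose=0):
            self.model.fit(x, y, batch_size=64, epochs=epoch, verbose=verbose)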

what is model._make_predict_function() used for?

I am also working on multi-threaded prediction with an RL model. I had been stuck on this issue for about a week until I saw this line of your code:

model._make_predict_function()	# have to initialize before threading

So can you tell me what this function does and why we need to add this line? I can't find any documentation about it.

how to add an LSTM layer?

site down

Just in case you don't know, your blog seems to have been down for the past few days.

Keith