rlcode / reinforcement-learning
Minimal and Clean Reinforcement Learning Examples
License: MIT License
I am trying to implement A3C for Gridworld by appropriately modifying the run() method of the A3C CartPole example. However, I am getting the error below:
Exception ignored in: <bound method PhotoImage.__del__ of <PIL.ImageTk.PhotoImage object at 0x7f7b807077b8>>
Traceback (most recent call last):
  File "/home/akb/.local/lib/python3.4/site-packages/PIL/ImageTk.py", line 130, in __del__
    name = self.__photo.name
AttributeError: 'PhotoImage' object has no attribute '_PhotoImage__photo'
Exception in thread Thread-3:
Traceback (most recent call last):
  File "/usr/lib/python3.4/threading.py", line 920, in _bootstrap_inner
    self.run()
  File "deep_a3c.py", line 159, in run
    env = Env()
  File "/home/akb/reinforcement-learning/1-grid-world/6-deep-sarsa/environment_a3c.py", line 21, in __init__
    self.shapes = self.load_images()
  File "/home/akb/reinforcement-learning/1-grid-world/6-deep-sarsa/environment_a3c.py", line 58, in load_images
    Image.open("../img/rectangle.png").resize((30, 30)))
  File "/home/akb/.local/lib/python3.4/site-packages/PIL/ImageTk.py", line 124, in __init__
    self.__photo = tkinter.PhotoImage(**kw)
  File "/usr/lib/python3.4/tkinter/__init__.py", line 3419, in __init__
    Image.__init__(self, 'photo', name, cnf, master, **kw)
  File "/usr/lib/python3.4/tkinter/__init__.py", line 3375, in __init__
    self.tk.call(('image', 'create', imgtype, name,) + options)
RuntimeError: main thread is not in main loop
Any help with this error would be appreciated. If possible, could you also provide an implementation of A3C on Gridworld?
Thanks,
Akilesh
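For what it's worth, this RuntimeError usually means a tkinter object is being created or used outside the main thread, and tkinter GUI objects only work from the main thread. Below is a minimal, hypothetical sketch of keeping GUI construction out of the A3C worker threads; GridWorldEnv and its render_enabled flag are my own names, not the repo's.

import threading

class GridWorldEnv:
    def __init__(self, render_enabled=False):
        # Build tkinter/PIL objects only when rendering is enabled,
        # and only ever from the main thread.
        self.render_enabled = render_enabled
        if render_enabled:
            import tkinter
            self.root = tkinter.Tk()

    def step(self, action):
        # State transition with no GUI calls, so it is safe in any thread.
        next_state, reward, done = [0.0] * 15, 0.0, False
        return next_state, reward, done

def worker():
    env = GridWorldEnv(render_enabled=False)  # no tkinter in worker threads
    state, reward, done = env.step(0)

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()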
The URL is embedded in the given line:
Minimal and clean examples of reinforcement learning algorithms presented by RLCode team.
Are there any variants of A2C that use mini-batch updates instead of training at every time step? If so, could you explain the pros and cons of such an approach?
Thanks,
Akilesh
You are doing Q-Learning:
# get action for the current state and go one step in environment
action = agent.get_action(state)
next_state, reward, done, info = env.step(action)
But isn't that SARSA?
a = np.argmax(target_next[i])
target[i][action[i]] = reward[i] + self.discount_factor * (target_val[i][a])
Is that a mistake or is that a valid approach? I'm new to RL...
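For reference, here is a small sketch of my own (variable names such as q_next_online are illustrative, not the repo's) contrasting the one-step targets used by Q-learning, SARSA, and Double DQN. The snippet above looks closest to the Double DQN form, where the greedy action is selected with one network and evaluated with the other.

import numpy as np

gamma = 0.99
reward, done = 1.0, False
q_next_online = np.array([0.2, 0.7, 0.1])  # online net's Q(s', .)
q_next_target = np.array([0.3, 0.6, 0.4])  # target net's Q(s', .)
next_action = 2                            # action actually taken in s' (SARSA only)

# Q-learning: bootstrap from the greedy action under the target network
q_learning_target = reward + gamma * np.max(q_next_target) * (not done)
# SARSA: bootstrap from the action the behaviour policy actually took in s'
sarsa_target = reward + gamma * q_next_target[next_action] * (not done)
# Double DQN: select the argmax with the online net, evaluate it with the target net
a = np.argmax(q_next_online)
double_dqn_target = reward + gamma * q_next_target[a] * (not done)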
I don't understand the effect of the moving obstacles in the grid world (Deep SARSA and REINFORCE), since in environment.py the negative rewards are hard-coded for obstacles at coordinates [0, 1], [1, 2], [2, 3].
Thanks,
Akilesh
Hi,
It's a really great repo for learning RL. However, if you could provide some links/blogs explaining each algorithm, that would be even more beneficial for users.
Thanks,
Akilesh
Any idea how to go about implementing diagonal movement in the grid world?
How many days/episodes did it take until breakout_a3c converged? Did you try using an LSTM for faster convergence?
I tried to run Pong Policy Gradient for 2000 episodes on the original file with no results whatsoever. Then I boosted the reward for positive points (points scored by the learner, right side) to 20 and got this result:
I boosted the learner's point reward to 100 and after around 1500 episodes got a slight improvement, similar to that in the picture. I ran it to 8100 episodes and there was no improvement except for slightly smaller variance. Forgive my being naive, but having successfully run three versions of CartPole I was expecting some logical results.
As you can see from the picture, the variance is large and after an improvement around episodes 800-900 the results seem stagnant.
Has anybody run it for more episodes, tried to tweak the rewards, and brought the results up and the variance down?
Given the policy, should I boost the penalty for the opponent's (left side) scoring points?
Any guidance will be appreciated. Thanks.
Hi all,
Thanks for your amazing project!
I have a question. If I want to add dropout to the network for policy gradient, how can I do that?
I think that in order to do that, I would need to completely change the code. Right now the workflow is as follows:
Have a state -> do a forward computation -> get the output -> compute the gradient -> create a new <input, output> pair to train the network -> train the network on that <input, output> pair for one epoch -> repeat.
However, to add dropout we would need to change the workflow as follows:
Have a state -> do a forward computation -> get the output -> compute the gradient -> backpropagate the gradient -> modify the network parameters -> repeat.
This seems really complicated for an automatic differentiation system like Keras, I think. Any ideas?
Thanks a lot for your help!
Best,
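Regarding the dropout question above: as far as I understand, a Keras Dropout layer is active during fit/train_on_batch and disabled during predict, so the REINFORCE-style workflow should not need to change. A minimal sketch of my own (layer sizes and hyperparameters are illustrative, not the repo's):

from keras.layers import Dense, Dropout
from keras.models import Sequential
from keras.optimizers import Adam

state_size, action_size = 4, 2

# Policy network with dropout between the hidden layers
actor = Sequential([
    Dense(24, input_dim=state_size, activation='relu'),
    Dropout(0.2),
    Dense(24, activation='relu'),
    Dropout(0.2),
    Dense(action_size, activation='softmax'),
])
actor.compile(loss='categorical_crossentropy', optimizer=Adam(lr=0.001))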
Dqn_per applies no importance-sampling weights during training; shouldn't that be a problem?
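For reference, one common way to apply prioritized-replay importance-sampling weights in Keras is to pass them as per-sample weights to fit. The sketch below is my own illustration, not the repo's implementation; the beta annealing schedule and the replay buffer itself are omitted.

import numpy as np

def importance_sampling_weights(priorities, beta, eps=1e-6):
    # w_i = (N * P(i))^(-beta), normalized so the largest weight is 1
    probs = (priorities + eps) / np.sum(priorities + eps)
    weights = (len(priorities) * probs) ** (-beta)
    return weights / weights.max()

# usage inside a training step (model, states, targets come from the agent):
# weights = importance_sampling_weights(sampled_priorities, beta)
# model.fit(states, targets, sample_weight=weights,
#           batch_size=len(states), epochs=1, verbose=0)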
Could you please provide examples of how to use the saved models (.h5 files) at test time for the Grid world and CartPole environments?
Thanks,
Akilesh
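As an illustration, loading a saved CartPole model for evaluation could look roughly like the sketch below. It assumes the network is rebuilt with the same architecture that was used for training and that the weight file sits at the repo's usual save path; both of these are assumptions on my part.

import gym
import numpy as np
from keras.layers import Dense
from keras.models import Sequential

env = gym.make('CartPole-v1')
state_size = env.observation_space.shape[0]
action_size = env.action_space.n

# Rebuild the same architecture that was trained, then load the weights
model = Sequential([
    Dense(24, input_dim=state_size, activation='relu'),
    Dense(24, activation='relu'),
    Dense(action_size, activation='linear'),
])
model.load_weights('./save_model/cartpole_dqn.h5')  # path is an assumption

state = env.reset()
done, score = False, 0
while not done:
    q = model.predict(np.reshape(state, [1, state_size]))[0]
    state, reward, done, _ = env.step(np.argmax(q))  # greedy, no exploration
    score += reward
print('score:', score)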
PC1
CPU: Intel i5
no graphics card
Python 3.5
TensorFlow 1.14
Keras 2.3.0
PC2
CPU: Intel i7
RTX 2070
Python 3.5
TensorFlow 1.14
Keras 2.3.0
When I execute breakout_a3c.py,
the following problem occurs on both computers.
I guess the issue is related to the threading library...
Model: "model_18"
Layer (type)                 Output Shape              Param #
input_9 (InputLayer)         (None, 84, 84, 4)         0
conv2d_17 (Conv2D)           (None, 20, 20, 16)        4112
conv2d_18 (Conv2D)           (None, 9, 9, 32)          8224
flatten_9 (Flatten)          (None, 2592)              0
dense_25 (Dense)             (None, 256)               663808
dense_27 (Dense)             (None, 1)                 257
Total params: 676,401
Trainable params: 676,401
Non-trainable params: 0
Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/home/name/AdvRL/reinforcement-learning/3-atari/1-breakout/breakout_a3c.py", line 207, in run
    action, policy = self.get_action(history)
  File "/home/name/AdvRL/reinforcement-learning/3-atari/1-breakout/breakout_a3c.py", line 327, in get_action
    policy = self.local_actor.predict(history)[0]
  File "/home/name/.local/lib/python3.5/site-packages/keras/engine/training.py", line 1462, in predict
    callbacks=callbacks)
  File "/home/name/.local/lib/python3.5/site-packages/keras/engine/training_arrays.py", line 276, in predict_loop
    callbacks.model.stop_training = False
  File "/home/name/.local/lib/python3.5/site-packages/keras/engine/network.py", line 323, in __setattr__
    super(Network, self).__setattr__(name, value)
  File "/home/name/.local/lib/python3.5/site-packages/keras/engine/base_layer.py", line 1215, in __setattr__
    if not _DISABLE_TRACKING.value:
AttributeError: '_thread._local' object has no attribute 'value'
(The identical traceback is raised in Thread-3 through Thread-9.)
I want to run another Atari game, but its performance doesn't look good. Could anyone help me? Can I achieve this just by changing gym.make('ENV_NAME') and its real_action? Any help is much appreciated.
I changed the code as described above, but its performance is not good and I get errors like this:
Traceback (most recent call last):
  File "ddqn_spaceinvaders.py", line 372, in <module>
    agent.train_replay(step)
  File "ddqn_spaceinvaders.py", line 235, in train_replay
    history[i] = np.float32(mini_batch[i][0] / 255.)
TypeError: unsupported operand type(s) for /: 'tuple' and 'float'
Excuse me,
I cannot read Korean.
Could you translate the comments into English?
I was surprised to see this loss function because it is generally used when the target is a distribution (i.e., it sums to 1). That is not the case for the advantage estimate. However, I worked out the math and it does appear to be doing the right thing, which is neat!
I think this trick should be mentioned in the code.
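A small numeric check of the trick (my own sketch, not from the repo): with the target set to advantage * one_hot(action), categorical cross-entropy H(p, q) = -sum(p_i * log(q_i)) reduces to -advantage * log(pi(a|s)), which is exactly the per-sample policy-gradient objective.

import numpy as np

policy = np.array([0.2, 0.5, 0.3])  # pi(.|s) from the actor network
action, advantage = 1, 2.0

p = advantage * np.eye(len(policy))[action]  # advantage * one-hot target
cross_entropy = -np.sum(p * np.log(policy))
assert np.isclose(cross_entropy, -advantage * np.log(policy[action]))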
First, thanks for making this. It's very easy to get started with and has really helped me move things forward on a personal project of mine I've been struggling with for months. This is really awesome work. Thanks again.
In my efforts to tweak the code from your A3C cartpole implementation to work with my own custom OpenAI environment, I've discovered a few things that I think can help make it generalize a bit more.
def get_action(self, state, actionfilter):
    policy = self.actor.predict(np.reshape(state, [1, self.state_size]))[0]
    policy = np.multiply(policy, actionfilter)
    probs = policy / np.sum(policy)
    action = np.random.choice(self.action_size, 1, p=probs)[0]
    return action
I found this GitHub repo while studying RL at https://www.gitbook.com/book/dnddnjs/rl/details.
There were hardly any RL resources, which made things difficult, but thanks to the translated material you provided on gitbook I was able to study quickly. Thank you.
can not open
Hi,
First of all I just want to say awesome work on the library overall, really love the concept 👍
I have an issue where cartpole_a3c converges relatively quickly (around episode 300-400), then keeps doing well for a while, and then suddenly collapses and does not recover. Has anyone else experienced this?
Is there any code to play Breakout by loading the saved model?
When I try to run Breakout_DQN I get the following error:
gym.error.DeprecatedEnv: Env BreakoutDeterministic-v4 not found (valid versions include ['BreakoutDeterministic-v3', 'BreakoutDeterministic-v0'])
What version of gym are you using? (I'm using 0.8.1)
Hi,
Is it possible to give an image as input in the Gridworld environment? Can you suggest ways in which this could be done? Is there a way of converting the tkinter Canvas into a numpy array which could then be fed into a ConvNet?
Thanks,
Akilesh
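One possible route (an assumption on my part, and it requires Ghostscript for PIL's EPS support) is to export the Canvas as PostScript and rasterize it with PIL:

import io
import numpy as np
from PIL import Image

def canvas_to_array(canvas, size=(84, 84)):
    # Export the canvas as PostScript, rasterize it with PIL,
    # then convert it to a normalized grayscale numpy array.
    ps = canvas.postscript(colormode='color')
    img = Image.open(io.BytesIO(ps.encode('utf-8')))
    img = img.convert('L').resize(size)
    return np.asarray(img, dtype=np.float32) / 255.0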
Hi Shangtong, I am new to reinforcement learning. I have a scenario in which a machine learning model predicts a target properly. I want to figure out the input parameters needed to attain a particular target value. Any suggestions would be great.
The input parameters may range from 4 to 20, and since they are discrete numeric values there may be very many combinations of them.
I have trained the model for around 4 days; the episode count is now 14278, while the score is still only 40-50. What could be the problem?
Hi,
Could anyone please elaborate on the use of memory in the CartPole A3C implementation? The saved samples don't appear to be used during training.
Thanks,
Akilesh
References a function that doesn't exist.
Just want to know whether it makes sense to apply the technique in
https://github.com/rlcode/reinforcement-learning/blob/master/2-cartpole/3-reinforce/cartpole_reinforce.py#L45
to the A3C implementations, for CartPole and Breakout? Thanks.
Hi, I would like to test some hyperparameters, and using threading would make that much faster. But when I run threading on the DQN and DDQN algorithms, I get the error:
<Tensor Tensor("dense_1/kernel:0", shape=(2, 32), dtype=float32_ref) is not an element of this graph>
It seems Keras can't support threading, but your A3C works, which is strange to me.
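A workaround that is often suggested for Keras with the TF1 backend is to finalize the model's predict function in the main thread and re-enter the same default graph inside each worker thread. Whether it works depends on the exact Keras/TensorFlow versions, so treat the sketch below as an assumption rather than a guaranteed fix (and note that _make_predict_function is a private Keras helper).

import threading
import numpy as np
import tensorflow as tf
from keras.layers import Dense
from keras.models import Sequential

model = Sequential([Dense(32, input_dim=2, activation='relu'),
                    Dense(2, activation='linear')])
model._make_predict_function()   # build the predict graph eagerly, in the main thread
graph = tf.get_default_graph()

def worker():
    with graph.as_default():     # reuse the main thread's graph
        q_values = model.predict(np.zeros((1, 2)))
        print(q_values)

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()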
Hello,
I am aware of the smart trick for implementing policy gradients (see this for a reference: https://github.com/rlcode/reinforcement-learning/blob/master/2-cartpole/3-reinforce/cartpole_reinforce.py). Specifically, categorical cross-entropy is defined as H(p, q) = -sum(p_i * log(q_i)). For the action taken, a, we can set p_a = advantage * (one-hot vector indexing action a). Meanwhile, q_a is the output of the policy network, which is the probability of taking action a, i.e. policy(s, a).
However, when the number of output classes is huge (e.g. in machine translation or language modeling), I simply cannot convert the output into a one-hot vector in the first place using the to_categorical(output, num_classes=output_class) function in Keras.
Because of this, I cannot apply the trick to compute p_a.
So how can I implement the policy gradient in this case?
I hope I have made my question clear!
Many thanks for your help!
Best,
Cuong
@fredcallaway: I saw you commented on the code so I tagged you here as well. If you can give me an answer, I would really appreciate it ...
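One possible way around the one-hot issue (a sketch under my own assumptions, not the repo's approach) is to use sparse categorical cross-entropy with the integer action id as the label and the advantage as a per-sample weight; the resulting loss is again -advantage * log(pi(a|s)) without materializing huge one-hot vectors. Sizes and names below are illustrative.

import numpy as np
from keras.layers import Dense
from keras.models import Sequential
from keras.optimizers import Adam

state_size, vocab_size = 128, 50000  # e.g. a language-model-sized action space

policy_net = Sequential([
    Dense(256, input_dim=state_size, activation='relu'),
    Dense(vocab_size, activation='softmax'),
])
policy_net.compile(loss='sparse_categorical_crossentropy', optimizer=Adam(lr=1e-3))

states = np.random.randn(4, state_size).astype(np.float32)
actions = np.array([7, 42, 3, 19999])          # integer action ids, no one-hot needed
advantages = np.array([1.0, -0.5, 2.0, 0.3])   # per-sample weights

policy_net.train_on_batch(states, actions, sample_weight=advantages)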
Dear authors,
Do you have any tutorial for the code listed in this GitHub repo, or have you created your own tutorial for the code?
Thanks
If I increase both the HEIGHT and WIDTH from 5 to 10, keeping the obstacles and the final goal at the same positions, the Deep SARSA network doesn't seem to converge. What do you think the problem is? Should I increase the depth or the dimensions of the hidden layers in the actor and critic networks?
Thanks,
Akilesh
Hello, the trained agent plays CartPole-v1 with a score of 500, but when I restart it with
self.load_model = True and the correct file name, it starts learning again with low scores.
How can I load the weights and have the trained agent play without learning?
I understand that state_size is 15. Could you please elaborate on what each of the 15 values denotes or signifies?
Thanks,
Akilesh
Running python Gridworld_DQN.py gives the error:
  File "/home/wangdawei/anaconda2/envs/py3/lib/python3.6/site-packages/dask/dataframe/core.py", line 38, in <module>
    pd.computation.expressions.set_use_numexpr(False)
AttributeError: module 'pandas' has no attribute 'computation'
At first I thought it was a pandas problem, but it turned out to be a dask version problem:
updating dask to a newer version solved it.
Hi,
In the CartPole DDQN, the following Q(s,a) formula uses target_val; is it the one-step reward or the expected future reward?
target[i][action[i]] = reward[i] + self.discount_factor * (target_val[i][a])
Can I ask how long it takes to train the Breakout DQN model, and what graphics card do you have? Thanks!
Amazing work!!
I tried running the A3C algorithm for Breakout and it works great!
Where did you get the background information in order to write the code? It's a little different from what is explained in the "Asynchronous Methods for Deep Reinforcement Learning" paper.
Thanks :)
Why does the implementation (deep_sarsa_agent.py) have action_space = [0, 1, 2, 3, 4] when there are only 4 possible actions that the agent can take (as specified in the environment.py)?
Thanks,
Akilesh
First of all, great tutorials! I've been basing my own projects on this repo to better understand RL, but through the process I found that persisting the Q-learning agent turns out to be really difficult because of its final size.
I tried pickle, json, jsonpickle, cPickle, marshal, klepto, dbm, and finally h5py, and I noticed it might not be as easy as it seems, because none of these worked. My 64-bit Linux Mint system kills the process and leaves a 0-byte file where the q_table should be.
The agent actually works, rewards getting better and all, but once it's trained past a certain point it seems to become impossible to persist it back to disk. I tried creating swap space, from the intuition that it was running out of memory, to no avail.
I would be glad if anyone has a fix for this. Thanks!
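One thing that might help (a sketch using only the standard library's shelve module, under the assumption that q_table maps hashable states to lists of action values) is to persist the table record by record instead of pickling it in one piece:

import shelve

def save_q_table(q_table, path='q_table.db'):
    # Write each state's action values as its own record so the whole
    # table never has to be serialized in memory at once.
    with shelve.open(path) as db:
        for state, values in q_table.items():
            db[str(state)] = values

def load_q_table(path='q_table.db'):
    with shelve.open(path) as db:
        return {state: values for state, values in db.items()}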
I have compared the implementation with the book "RL: An Introduction". It seems the MSE loss and cross-entropy loss do not yield the same update rules as Actor-Critic. The book's rules are w = w + alpha * I * delta * grad(v_hat(s, w)) for the value function, and theta = theta + alpha * I * delta * grad(ln pi(action)) for the policy. For the value function in particular, the MSE loss seems to pick up an extra factor of v_hat.
Firstly, thanks for the great collection of code and articles. The articles were very useful in understanding DQN and implementing it.
However, my code is very bad at learning, and I am not sure what is wrong with it. I am using DDQN and passing rewards based on different criteria. The state is just a normalized version of the board itself.
My code repo is here: https://github.com/codetiger/MachineLearning-2048
Let me know if you can review it and help me understand why my code doesn't learn anything even after 1000 episodes.
Hi,
Could you provide an implementation of prioritized experience replay for either Gridworld or Cartpole environment?
Thanks,
Akilesh
Great stuff! This has been extremely helpful! My only suggestion would be, in line 78, changing mini_batch = random.sample(self.memory, batch_size) to mini_batch = random.sample(list(self.memory), batch_size); otherwise you get the following error: "TypeError: Population must be a sequence or set. For dicts, use list(d)."
I was looking at the code for Breakout and saw various saved models, but the code only saves one model, so how were the other models saved? I want to know whether they were saved after making some changes to the code.
Hi.
Really nice job. This is the most readable and "easiest" code I have found for an A3C implementation. With regular TensorFlow on the CPU the code works fine, but with tensorflow-gpu I get the error below.
Do you know why this is happening, and is it possible to get the A3C code working with GPU acceleration?
Thanks in advance!
Caused by op 'IsVariableInitialized_16/IsVariableInitialized_22/IsVariableInitialized/IsVariableInitialized_13/IsVariableInitialized_6/IsVariableInitialized_7', defined at:
File "C:\Users\trek\.vscode\extensions\ms-python.python-2018.3.1\pythonFiles\PythonTools\visualstudio_py_debugger.py", line 2068, in new_thread_wrapper
func(*posargs, **kwargs)
File "C:\Users\trek\Anaconda3\envs\tensorflow\lib\threading.py", line 882, in _bootstrap
self._bootstrap_inner()
File "C:\Users\trek\Anaconda3\envs\tensorflow\lib\threading.py", line 914, in _bootstrap_inner
self.run()
File "d:\Thesis\Code\examples\cartpole\a3c2.py", line 159, in run
action = self.get_action(state)
File "d:\Thesis\Code\examples\cartpole\a3c2.py", line 209, in get_action
policy = self.actor.predict(np.reshape(state, [1, self.state_size]))[0]
File "C:\Users\trek\Anaconda3\envs\tensorflow\lib\site-packages\keras\engine\training.py", line 1835, in predict
verbose=verbose, steps=steps)
File "C:\Users\trek\Anaconda3\envs\tensorflow\lib\site-packages\keras\engine\training.py", line 1330, in _predict_loop
batch_outs = f(ins_batch)
File "C:\Users\trek\Anaconda3\envs\tensorflow\lib\site-packages\keras\backend\tensorflow_backend.py", line 2476, in __call__
session = get_session()
File "C:\Users\trek\Anaconda3\envs\tensorflow\lib\site-packages\keras\backend\tensorflow_backend.py", line 192, in get_session
[tf.is_variable_initialized(v) for v in candidate_vars])
File "C:\Users\trek\Anaconda3\envs\tensorflow\lib\site-packages\keras\backend\tensorflow_backend.py", line 192, in <listcomp>
[tf.is_variable_initialized(v) for v in candidate_vars])
File "C:\Users\trek\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\ops\variables.py", line 1203, in is_variable_initialized
return state_ops.is_variable_initialized(variable)
File "C:\Users\trek\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\ops\state_ops.py", line 180, in is_variable_initialized
return gen_state_ops.is_variable_initialized(ref=ref, name=name)
File "C:\Users\trek\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\ops\gen_state_ops.py", line 175, in is_variable_initialized
result = _op_def_lib.apply_op("IsVariableInitialized", ref=ref, name=name)
File "C:\Users\trek\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 768, in apply_op
op_def=op_def)
File "C:\Users\trek\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\ops.py", line 2336, in create_op
original_op=self._default_original_op, op_def=op_def)
File "C:\Users\trek\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\ops.py", line 1228, in __init__
self._traceback = _extract_stack()
InvalidArgumentError (see above for traceback): Cannot colocate nodes 'IsVariableInitialized_16/IsVariableInitialized_22/IsVariableInitialized/IsVariableInitialized_13/IsVariableInitialized_6/IsVariableInitialized_7' and 'Adam_1/iterations': Cannot merge devices with incompatible types: '/job:localhost/replica:0/task:0/device:GPU:0' and '/job:localhost/replica:0/task:0/device:CPU:0'
[[Node: IsVariableInitialized_16/IsVariableInitialized_22/IsVariableInitialized/IsVariableInitialized_13/IsVariableInitialized_6/IsVariableInitialized_7 = IsVariableInitialized[_class=["loc:@Adam/beta_1", "loc:@Adam/beta_2", "loc:@Adam_1/iterations", "loc:@Variable_27", "loc:@Variable_4", "loc:@dense_3/bias"], dtype=DT_FLOAT](Variable_4)]]
From reinforcement-learning/2-cartpole/1-dqn/cartpole_dqn.py, train_model:
def train_model(self):
    if len(self.memory) < self.train_start:
        return
    batch_size = min(self.batch_size, len(self.memory))
    mini_batch = random.sample(self.memory, batch_size)

    update_input = np.zeros((batch_size, self.state_size))
    update_target = np.zeros((batch_size, self.state_size))
    action, reward, done = [], [], []

    for i in range(self.batch_size):
        update_input[i] = mini_batch[i][0]
        action.append(mini_batch[i][1])
        reward.append(mini_batch[i][2])
        update_target[i] = mini_batch[i][3]
        done.append(mini_batch[i][4])

    target = self.model.predict(update_input)
    target_val = self.target_model.predict(update_target)

    for i in range(self.batch_size):
        # Q Learning: get maximum Q value at s' from target model
        if done[i]:
            target[i][action[i]] = reward[i]
        else:
            target[i][action[i]] = reward[i] + self.discount_factor * (
                np.amax(target_val[i]))

    # and do the model fit!
    self.model.fit(update_input, target, batch_size=self.batch_size,
                   epochs=1, verbose=0)
In this part of the code, why do you use self.batch_size after taking the minimum of self.batch_size and the length of the memory? Wouldn't batch_size be better?
The actor net takes the state as input and outputs a policy containing the probability of each action. In train_model(), the ground truth for training the actor net is 'advantages', which is not a probability distribution over the possible actions. So how does the categorical cross-entropy computation between the predicted output of the actor net and 'advantages' work?
Thanks,
Akilesh
# initialize target model with same weights as the model, in case we load a model
# shouldn't this be done after load_model?
self.update_target_model()

if self.load_model:
    self.model.load_weights("./save_model/cartpole_dqn.h5")

could be

if self.load_model:
    self.model.load_weights("./save_model/cartpole_dqn.h5")
self.update_target_model()

so that if we load a saved model, the target_model gets the saved weights rather than starting with the Keras-initialized weights.
I'm going to test this, but it seems like the loaded model would be using an inferior target_model for at least the first episode, and the model weights could get adjusted in the wrong way in that first episode, slightly slowing down its learning.