keon / deep-q-learning
Minimal Deep Q Learning (DQN & DDQN) implementations in Keras
Home Page: https://keon.io/deep-q-learning
License: MIT License
I have several questions:
1- When I compare the code with the algorithm presented in "Human-level control through deep reinforcement learning", I cannot find the third initialization (initializing the target action-value function). I also cannot find the last step, "every C steps reset Qhat = Q". Could you explain where they are, or what is done differently to achieve them? These steps seem essential!
2- I have my own environment. If I want the state to be a vector such as state=[a,b,c] instead of a single input to the DQN, what should I do?
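On question 1: in the Nature algorithm the target network Qhat starts as an exact copy of Q (its weights are initialized to Q's weights) and is overwritten with Q's weights every C steps. A minimal sketch, assuming the weights are held as a list of NumPy arrays; in Keras the copy itself would be `target_model.set_weights(model.get_weights())`. The class name and `sync_every` parameter are hypothetical.

```python
import numpy as np


class TargetSync:
    """Keeps a target copy of the online weights and refreshes it every `sync_every` steps.

    Weights are plain lists of NumPy arrays, standing in (as an assumption)
    for Keras' model.get_weights() / target_model.set_weights().
    """

    def __init__(self, online_weights, sync_every):
        self.online = online_weights      # updated in place by training
        self.sync_every = sync_every      # the "C" in "every C steps reset Qhat = Q"
        self.steps = 0
        # Initialization of the target network: same weights as the online network.
        self.target = [w.copy() for w in online_weights]

    def step(self):
        """Call once per training step; returns True when the target was refreshed."""
        self.steps += 1
        if self.steps % self.sync_every == 0:
            self.target = [w.copy() for w in self.online]
            return True
        return False
```

Between refreshes the target weights stay frozen, which is what keeps the regression targets stable.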
On reloading, the model performs very poorly compared to training.
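One plausible cause (an assumption, not confirmed in this thread): if only the weights are reloaded, the exploration rate epsilon still starts at its initial value, so at first the agent acts mostly at random and the score looks poor even though the network is trained. A sketch of switching to greedy evaluation after loading; `EpsilonGreedy` and `for_evaluation` are hypothetical stand-ins for the agent's own epsilon logic.

```python
import random


class EpsilonGreedy:
    """Hypothetical stand-in for the agent's exploration logic (not the repository's class)."""

    def __init__(self, n_actions, epsilon=1.0, epsilon_min=0.01):
        self.n_actions = n_actions
        self.epsilon = epsilon          # starts high for training
        self.epsilon_min = epsilon_min

    def act(self, q_values, rng=random):
        # Explore with probability epsilon, otherwise take the highest Q-value.
        if rng.random() < self.epsilon:
            return rng.randrange(self.n_actions)
        return max(range(self.n_actions), key=lambda a: q_values[a])

    def for_evaluation(self, keep_min=False):
        """Call after reloading trained weights so the agent stops acting mostly at random."""
        self.epsilon = self.epsilon_min if keep_min else 0.0
        return self
```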
This is extremely helpful code, thanks for sharing! I have a bit of a hypothetical question. Let's say that after training the agent using your code I want to be able to predict the q-values for moving to the right or left given a new combination of inputs. (i.e. do some type of model.predict(new_input), or test the code on new data). Where in the code would this go? Could you do model.predict(new_input) at the end of your main function outside of the for loop?
I ask because I wonder where the model parameters are being saved and if this affects where you call model.predict(new_input) for new data. Let me know if anything is unclear!
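Prediction does not need to live inside the training loop: once training has finished (or after reloading saved weights, e.g. with Keras' `model.load_weights`), the model can be called on a new input reshaped to a batch of one. A sketch, where `predict_fn` stands in for `model.predict` and the toy linear model is purely illustrative:

```python
import numpy as np


def greedy_action(predict_fn, raw_state, state_size):
    """Return (q_values, best_action) for a single new observation."""
    # Keras models expect a batch dimension: shape (1, state_size).
    batch = np.reshape(np.asarray(raw_state, dtype=np.float32), [1, state_size])
    q_batch = predict_fn(batch)          # shape (1, action_size)
    q_values = q_batch[0]                # [0] selects the only row of the batch
    return q_values, int(np.argmax(q_values))


# Toy stand-in for a trained model: Q(s) = s @ W (illustrative only).
W = np.array([[1.0, -1.0],
              [0.0, 2.0],
              [0.5, 0.5],
              [-1.0, 1.0]], dtype=np.float32)


def toy_predict(batch):
    return batch @ W
```

In the real agent, `predict_fn` would simply be `agent.model.predict`.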
The DQN algorithm from Nature uses a target network to compute the target Q-values for training.
So I think the code in ddqn.py is really an implementation of the (Nature) DQN algorithm.
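For comparison, the two targets differ only in how the next action is chosen. Nature DQN uses y = r + gamma * max_a' Qhat(s', a'), so the target network both selects and evaluates a'; Double DQN selects a' with the online network and evaluates it with the target network. A NumPy sketch over a batch, assuming `q_next_online` and `q_next_target` are the two networks' predictions for the next states:

```python
import numpy as np


def dqn_targets(rewards, dones, q_next_target, gamma):
    """Nature DQN: y = r + gamma * max_a' Q_target(s', a'); y = r at terminal states."""
    best = q_next_target.max(axis=1)
    return rewards + gamma * best * (1.0 - dones)


def double_dqn_targets(rewards, dones, q_next_online, q_next_target, gamma):
    """Double DQN: choose a' with the online net, evaluate it with the target net."""
    a_star = q_next_online.argmax(axis=1)
    evaluated = q_next_target[np.arange(len(a_star)), a_star]
    return rewards + gamma * evaluated * (1.0 - dones)
```

Checking which of these two forms the target computation in ddqn.py matches is a quick way to settle whether it is DQN with a target network or Double DQN.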
@keon Thanks for your useful code. Just one question: how can we add the K-frame stacking described in the last sentences of the first paragraph of section 4.1 of Mnih et al., Nature 2015?
4.1 Preprocessing and Model Architecture
Working directly with raw Atari frames, which are 210 × 160 pixel images with a 128 color palette, can be computationally demanding, so we apply a basic preprocessing step aimed at reducing the input dimensionality. The raw frames are preprocessed by first converting their RGB representation to gray-scale and down-sampling it to a 110 × 84 image. The final input representation is obtained by cropping an 84 × 84 region of the image that roughly captures the playing area. The final cropping stage is only required because we use the GPU implementation of 2D convolutions from [11], which expects square inputs. For the experiments in this paper, the function φ from algorithm 1 applies this preprocessing to the last 4 frames of a history and stacks them to produce the input to the Q-function.
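A rough sketch of that φ step: convert each RGB frame to grayscale and keep the last 4 processed frames stacked along a channel axis, giving a (1, H, W, 4) input for a channels-last convolutional model. Down-sampling to 110 × 84 and cropping to 84 × 84 are left out here (an image library such as OpenCV or Pillow would normally handle them), and the `FrameStack` helper plus padding with the first frame at episode start are assumptions for illustration.

```python
from collections import deque

import numpy as np


def to_gray(rgb_frame):
    """RGB (H, W, 3) uint8 -> grayscale (H, W) float32 in [0, 1] using ITU-R 601 luma weights."""
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    return (rgb_frame.astype(np.float32) @ weights) / 255.0


class FrameStack:
    """Keeps the last k preprocessed frames and returns them as one (1, H, W, k) input."""

    def __init__(self, k=4):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, first_frame):
        # At episode start there is no history yet, so repeat the first frame k times.
        processed = to_gray(first_frame)
        self.frames.clear()
        for _ in range(self.k):
            self.frames.append(processed)
        return self.state()

    def step(self, frame):
        self.frames.append(to_gray(frame))  # the oldest frame drops out automatically
        return self.state()

    def state(self):
        # Stack along the last axis, then add a batch dimension.
        return np.stack(list(self.frames), axis=-1)[np.newaxis, ...]
```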
The neural network would not converge with only 1 epoch, right?
```python
for e in range(EPISODES):
    state = env.reset()
    state = np.reshape(state, [1, state_size])
    for time in range(500):
        # env.render()
        action = agent.act(state)
        next_state, reward, done, _ = env.step(action)
        reward = reward if not done else -10
        next_state = np.reshape(next_state, [1, state_size])
        agent.remember(state, action, reward, next_state, done)
        state = next_state
        if done:
            print("episode: {}/{}, score: {}, e: {:.2}"
                  .format(e, EPISODES, time, agent.epsilon))
            break
    if len(agent.memory) > batch_size:
        agent.replay(batch_size)
```
Hi, I found a bug in your code.
The agent.replay(batch_size) call should be in the inner loop, i.e. train_on_batch at every time step, not once per episode.
Your version can solve CartPole, but not LunarLander (also from OpenAI Gym).
FYI, the formal algorithm is given in "Human-level control through deep reinforcement learning".
Go jackets!
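A sketch of the placement described in the report, with `agent.replay` called inside the time-step loop rather than once per episode. `env` and `agent` are assumed to expose the same methods as in the snippet above (old Gym API with a 4-value `step`); `run_episode` and `train_every` are hypothetical (`train_every=1` means train at every step).

```python
import numpy as np


def run_episode(env, agent, state_size, batch_size, max_steps=500, train_every=1):
    """One episode with experience replay performed inside the time-step loop."""
    state = np.reshape(env.reset(), [1, state_size])
    score = max_steps
    for t in range(max_steps):
        action = agent.act(state)
        next_state, reward, done, _ = env.step(action)
        reward = reward if not done else -10
        next_state = np.reshape(next_state, [1, state_size])
        agent.remember(state, action, reward, next_state, done)
        state = next_state
        # Train from the replay memory every `train_every` steps, not once per episode.
        if len(agent.memory) > batch_size and t % train_every == 0:
            agent.replay(batch_size)
        if done:
            score = t
            break
    return score
```

With this placement an episode of length T triggers up to T minibatch updates instead of one.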
Hi. I tried to change the ddqn code to update in batches like dqn_batch, but with this change there is no learning at all, and I have no idea why. It is a simple change, and I even set the batch size to 1, so it should behave exactly like no batching.
I'm trying to test this on a small TicTacToe game, but I've been failing miserably for days: the DQN keeps choosing actions with negative rewards.
First of all, thank you very much for your work; it really helped me understand RL. I would like to ask how you got the game to display on screen, since I didn't find it in the code. Thank you very much!
Hi,
Thanks for the excellent repository. Extremely useful.
I have trained a model and saved the weight file in .h5 format. How would I predict the action for the new environment?
Thank you,
KK
I added 2 convolutional layers and trained this on MiniWorld (another Gym environment), but I keep getting this:
```
IndexError Traceback (most recent call last)
in
39
40 if len(agent.memory) > batch_size:
---> 41 agent.replay(batch_size)
42
43 if e % 10 == 0:
in replay(self, batch_size)
59
60
---> 61 target_f[0][action] = target
62 self.model.fit(state, target_f, epochs=1, verbose=0)
IndexError: index 17447 is out of bounds for axis 0 with size 60
```
I don't know why I got the index 17447...
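A plausible explanation, assuming the new convolutional layers feed the `Dense` output layer without a `Flatten` layer in between: Keras then applies the Dense layer along the last axis at every spatial position, so `model.predict` returns shape (1, H, W, action_size) instead of (1, action_size). `np.argmax(act_values[0])` flattens that array and can return an index far larger than the number of actions, and `target_f[0][action]` then fails on the H axis, which would fit "axis 0 with size 60". The NumPy sketch below reproduces the effect with made-up shapes; `check_q_shape` is a hypothetical guard, and the usual fix is a `Flatten()` layer before the Dense head.

```python
import numpy as np

N_ACTIONS = 3

# Hypothetical conv model with no Flatten before Dense: predict() gives (1, H, W, N_ACTIONS).
act_values = np.zeros((1, 60, 80, N_ACTIONS))
act_values[0, 10, 5, 2] = 1.0
action = int(np.argmax(act_values[0]))  # argmax flattens: 10*80*3 + 5*3 + 2 = 2417

target_f = act_values.copy()
message = ""
try:
    target_f[0][action] = 1.0          # same indexing as in replay()
except IndexError as err:
    message = str(err)                 # index 2417 is out of bounds for axis 0 with size 60


def check_q_shape(q_values, n_actions):
    """Hypothetical sanity check to run right after model.predict()."""
    if q_values.ndim != 2 or q_values.shape[1] != n_actions:
        raise ValueError(f"expected shape (batch, {n_actions}), got {q_values.shape}")
```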
Thanks, Keon, for your great code!
I have two questions:
1- What does [0] mean in self.model.predict(next_state)[0] and return np.argmin(act_values[0])? Does it mean the first element of the batch?
2- If, in addition to batching, I need my state to include the states from the previous K time steps, what changes are necessary? I want to pass state = state[i-k+1], ..., state[i-1], state[i], not just one state. How can I do this?
Thanks again
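On question 1: `model.predict` always returns a batch of shape (batch_size, action_size); with a single state the batch has one row, so `[0]` picks that row. On question 2, one simple option for low-dimensional states (an assumption, not what the repository does) is to concatenate the last K state vectors into a single input of length K * state_size and size the network's input layer to match. `History` is a hypothetical helper:

```python
from collections import deque

import numpy as np


class History:
    """Concatenates the last k observations into one (1, k * state_size) input row."""

    def __init__(self, k, state_size):
        self.k = k
        self.state_size = state_size
        self.buffer = deque(maxlen=k)

    def reset(self, obs):
        # No history yet at episode start: repeat the first observation k times.
        self.buffer.clear()
        for _ in range(self.k):
            self.buffer.append(np.asarray(obs, dtype=np.float32))
        return self.state()

    def push(self, obs):
        self.buffer.append(np.asarray(obs, dtype=np.float32))  # the oldest one is dropped
        return self.state()

    def state(self):
        # Oldest first: [s_{i-k+1}, ..., s_{i-1}, s_i]
        return np.concatenate(list(self.buffer)).reshape(1, self.k * self.state_size)
```

The stacked row is what would be stored in memory and passed to `act`/`replay` in place of the single state.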
First, thank you for this wonderful code.
In the replay function, there is one model.fit(state, target_f) call per sample in the minibatch (i.e. if there are 32 samples, there are 32 fit calls).
I think all samples of the minibatch could be used in a single update with one train_on_batch(states, targets_f) call, which would speed up processing.
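A sketch of that suggestion: predict Q-values for all states and next states of the minibatch at once, overwrite only the entries of the actions taken, and then do one update such as Keras' `model.train_on_batch(states, targets)`. Here `predict_fn` stands in for `model.predict`, transitions are assumed to be stored as `(state, action, reward, next_state, done)` with states of shape (1, state_size), and `build_batch_targets` is a hypothetical name.

```python
import numpy as np


def build_batch_targets(minibatch, predict_fn, gamma):
    """Return (states, targets) for a single train_on_batch call."""
    states = np.vstack([t[0] for t in minibatch])            # (B, state_size)
    actions = np.array([t[1] for t in minibatch])
    rewards = np.array([t[2] for t in minibatch], dtype=np.float32)
    next_states = np.vstack([t[3] for t in minibatch])
    dones = np.array([t[4] for t in minibatch], dtype=np.float32)

    targets = np.array(predict_fn(states), dtype=np.float32)   # current Q(s, .) as baseline
    q_next = predict_fn(next_states).max(axis=1)              # max_a' Q(s', a')
    # Only the taken action's entry changes; terminal transitions keep just the reward.
    targets[np.arange(len(minibatch)), actions] = rewards + gamma * q_next * (1.0 - dones)
    return states, targets
```

Two `predict` calls and one `train_on_batch` per minibatch replace 2B `predict` calls and B `fit` calls.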
I get this NumPy error while running the script dqn.py:
2022-10-06 23:47:28.547558: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1934] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform. Skipping registering GPU devices...
2022-10-06 23:47:28.547772: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
/home/akshayparanjape/PhD/deep-q-learning/venv_dqn/lib/python3.8/site-packages/keras/optimizers/optimizer_v2/adam.py:114: UserWarning: The `lr` argument is deprecated, use `learning_rate` instead.
  super().__init__(name, **kwargs)
/home/akshayparanjape/PhD/deep-q-learning/venv_dqn/lib/python3.8/site-packages/numpy/core/_asarray.py:102: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  return array(a, dtype, copy=False, order=order)
Traceback (most recent call last):
  File "ddqn.py", line 100, in <module>
    state = np.reshape(state, [1, state_size])
  File "<__array_function__ internals>", line 5, in reshape
  File "/home/akshayparanjape/PhD/deep-q-learning/venv_dqn/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 299, in reshape
    return _wrapfunc(a, 'reshape', newshape, order=order)
  File "/home/akshayparanjape/PhD/deep-q-learning/venv_dqn/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 55, in _wrapfunc
    return _wrapit(obj, method, *args, **kwds)
  File "/home/akshayparanjape/PhD/deep-q-learning/venv_dqn/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 44, in _wrapit
    result = getattr(asarray(obj), method)(*args, **kwds)
ValueError: cannot reshape array of size 2 into shape (1,4)
Has anybody encountered the same issue?
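The traceback fits the API change in Gym 0.26 and later (an assumption about the installed version): `env.reset()` returns `(observation, info)` and `env.step()` returns five values `(observation, reward, terminated, truncated, info)`, so `np.reshape(state, [1, state_size])` receives a 2-element tuple. A small compatibility sketch that accepts either API; the helper names are hypothetical, and pinning an older Gym version is the other obvious option.

```python
import numpy as np


def reset_obs(env):
    """Return only the observation, for both the old and new Gym reset() APIs."""
    out = env.reset()
    # Gym >= 0.26 returns (obs, info); older versions return obs alone.
    if isinstance(out, tuple) and len(out) == 2 and isinstance(out[1], dict):
        return out[0]
    return out


def step_4(env, action):
    """Return (obs, reward, done, info) for both the 4- and 5-value step() APIs."""
    out = env.step(action)
    if len(out) == 5:
        obs, reward, terminated, truncated, info = out
        # Simplification: the original loop only knows `done`, so truncation counts as done.
        return obs, reward, terminated or truncated, info
    return out
```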
This would break in environments that return more or fewer than 4 state values for unpacking.
r1 = (env.x_threshold - abs(x)) / env.x_threshold - 0.8
r2 = (env.theta_threshold_radians - abs(theta)) / env.theta_threshold_radians - 0.5
reward = r1 + r2
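A sketch of the same shaped reward that indexes the observation instead of unpacking it: for CartPole, index 0 is the cart position and index 2 is the pole angle, and the thresholds are passed in (Gym's CartPole environment sets `env.x_threshold = 2.4` and `env.theta_threshold_radians` to 12 degrees). Environments with a different observation layout still need their own indices; the function name and index parameters are hypothetical.

```python
import math

# Values used by Gym's CartPole environment (read env.x_threshold etc. when available).
CARTPOLE_X_THRESHOLD = 2.4
CARTPOLE_THETA_THRESHOLD = math.radians(12)


def shaped_cartpole_reward(obs, x_threshold, theta_threshold, x_index=0, theta_index=2):
    """Reward for staying near the centre and upright, without unpacking all 4 values."""
    x = obs[x_index]
    theta = obs[theta_index]
    r1 = (x_threshold - abs(x)) / x_threshold - 0.8
    r2 = (theta_threshold - abs(theta)) / theta_threshold - 0.5
    return r1 + r2
```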
Hi, is it just me, or is the algorithm not learning? I collected the rewards for all episodes and they converge to 10.
I think you may be interested in this URL: https://yanpanlau.github.io/2016/07/10/FlappyBird-Keras.html
If the cart is already all the way to the right, we can't really select the action that pushes it further right. So would it make sense to disallow that action, either in the random case (by sampling again) or in the network case (by choosing the action with the next-highest predicted Q-value)?
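Both options can be sketched with a boolean mask of currently valid actions: when exploring, sample uniformly among the valid actions only; when exploiting, set the Q-values of invalid actions to -inf before the argmax, which is equivalent to taking the next-highest valid Q-value. How the mask is computed is environment-specific and assumed here; the function name is hypothetical.

```python
import numpy as np


def masked_epsilon_greedy(q_values, valid_mask, epsilon, rng):
    """Pick an action among the valid ones only.

    q_values:   (n_actions,) Q-values from the network
    valid_mask: (n_actions,) booleans, True where the action is allowed
    rng:        a numpy.random.Generator
    """
    valid = np.flatnonzero(valid_mask)
    if valid.size == 0:
        raise ValueError("no valid action")
    if rng.random() < epsilon:
        return int(rng.choice(valid))                # explore among valid actions only
    masked = np.where(valid_mask, q_values, -np.inf)  # invalid actions can never win
    return int(np.argmax(masked))
```

Sampling directly from the valid set avoids the resample loop and always terminates.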
I found a minor issue on line 42.
Currently:
`return env.action_space.sample()`
Should be:
`return self.env.action_space.sample()`
P.S. It's better practice not to put a lot of code in the global namespace (e.g., directly under `if __name__ == '__main__':`); it's safer to use an actual `main()` function.
I tried several machines, but with CUDA 9 and 10 it seems to be a lot slower on GPU.
Why is this the case? Am I the only one getting this behavior?
Should the weights be updated at every time step? (I think it is better to update the weights every, for instance, 10 time steps, i.e. when T % 10 == 0, and then save the weights.) But in the code, they seem to be updated every 10 episodes?
Hi,
I uncommented lines 69, 90 and 91 in dqn.py, but the weights do not seem to be reloaded: the score restarts at a very low value. ddqn.py seems to have the same issue.
Kind regards,
Sylvain.