fg91 / deep-q-learning

TensorFlow implementation of DeepMind's DQN with double dueling networks

Jupyter Notebook 97.22% Gnuplot 2.78%


deep-q-learning's Issues

evaluation_score is NaN

I have been running this code for several days, and it works quite well. But sometimes evaluation_score comes back as NaN, and I couldn't figure out why. Has anybody run into this? Thanks!

Request help on Double DQN

Hi, thanks a lot for your great work!

I have a question: in the Double DQN, does the following code need a stop_gradient?

target_q = rewards + (gamma*double_q * (1-terminal_flags))

The double_q comes from the target DQN. When the main DQN is updated, the error will be backpropagated into the target DQN unless we stop the flow, right? So do we need to stop the gradient as follows?

target_q = tf.stop_gradient(target_q)

Could you please give some advice? Thanks.
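
For anyone following this question, here is a minimal pure-Python sketch of the target computation, assuming the next-state Q-values of both networks have already been evaluated and are passed in as plain lists. The function name `double_dqn_targets` and the arguments `q_main_next` and `q_target_next` are hypothetical and do not come from the repository. When the target is computed like this and handed to the loss as data, it is a constant as far as the optimizer is concerned, so no gradient can reach the target network. If `target_q` is instead built inside the TensorFlow graph, wrapping it in `tf.stop_gradient` as proposed above is a cheap way to make that explicit.

```python
def double_dqn_targets(rewards, terminal_flags, q_main_next, q_target_next, gamma=0.99):
    """Compute Double DQN targets for a batch of transitions.

    q_main_next / q_target_next hold, per transition, the Q-values of the
    next state as estimated by the main and the target network (assumed to
    be precomputed; the names are illustrative only).
    """
    targets = []
    for reward, done, q_main, q_target in zip(
        rewards, terminal_flags, q_main_next, q_target_next
    ):
        # The main network selects the action ...
        best_action = max(range(len(q_main)), key=lambda a: q_main[a])
        # ... and the target network evaluates it.
        double_q = q_target[best_action]
        # Same formula as in the issue: no bootstrapping on terminal states.
        targets.append(reward + gamma * double_q * (1 - done))
    return targets


targets = double_dqn_targets(
    rewards=[1.0, 0.0],
    terminal_flags=[0, 1],
    q_main_next=[[0.1, 0.9], [0.5, 0.2]],
    q_target_next=[[2.0, 3.0], [4.0, 5.0]],
    gamma=0.5,
)
print(targets)  # [2.5, 0.0]
```

The second transition is terminal, so its target is just the reward.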

Unknown character...

In the notebook, the sentence "a reward received $k$ time steps in the future is worth only..." is followed by $\gamma$ raised to the power of an unknown character...
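
For context, and as an assumption about what the notebook intends rather than a reading of the garbled character: under the standard definition of the discounted return used in Sutton and Barto's textbook, the reward that arrives $k$ steps in the future, $R_{t+k}$, carries the weight $\gamma^{k-1}$.

```latex
G_t = R_{t+1} + \gamma R_{t+2} + \gamma^{2} R_{t+3} + \dots
    = \sum_{k=1}^{\infty} \gamma^{k-1} R_{t+k},
\qquad 0 \le \gamma \le 1 .
```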

DQN with RGB image as input

Hello,
I am trying to build a DQN that takes an RGB image as input.
Python code:

import tensorflow as tf
from tensorflow.keras.initializers import VarianceScaling
from tensorflow.keras.layers import Add, Conv2D, Dense, Flatten, Input, Lambda, Subtract
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam


def build_QNetwork_RGB(n_actions, learning_rate, history_length, input_shape):
    """Builds a dueling DQN as a Keras model
    Arguments:
        n_actions: Number of possible actions the agent can take
        learning_rate: Learning rate
        history_length: Number of historical frames the agent can see
        input_shape: Shape of the preprocessed frame the model sees
    Returns:
        A compiled Keras model
    """
    # 5D input: (batch, history, height, width, channels)
    model_input = Input(shape=(history_length, input_shape[0], input_shape[1], input_shape[2]))
    x = Lambda(lambda layer: layer / 255)(model_input)  # normalize by 255
    x = Conv2D(32, (8, 8), strides=4, kernel_initializer=VarianceScaling(scale=2.), activation='relu', use_bias=False)(x)
    x = Conv2D(64, (4, 4), strides=2, kernel_initializer=VarianceScaling(scale=2.), activation='relu', use_bias=False)(x)
    x = Conv2D(64, (3, 3), strides=1, kernel_initializer=VarianceScaling(scale=2.), activation='relu', use_bias=False)(x)
    x = Conv2D(1024, (7, 7), strides=1, kernel_initializer=VarianceScaling(scale=2.), activation='relu', use_bias=False)(x)

    # Split into value and advantage streams (axis 4 is the channel axis of the 5D tensor)
    val_stream, adv_stream = Lambda(lambda w: tf.split(w, 2, 4))(x)  # custom splitting layer

    val_stream = Flatten()(val_stream)
    val = Dense(1, kernel_initializer=VarianceScaling(scale=2.))(val_stream)

    adv_stream = Flatten()(adv_stream)
    adv = Dense(n_actions, kernel_initializer=VarianceScaling(scale=2.))(adv_stream)

    # Combine streams into Q-Values
    reduce_mean = Lambda(lambda w: tf.reduce_mean(w, axis=1, keepdims=True))  # custom layer for reduce mean

    q_vals = Add()([val, Subtract()([adv, reduce_mean(adv)])])

    # Build model
    model = Model(model_input, q_vals)
    model.compile(Adam(learning_rate), loss=tf.keras.losses.Huber())

    model.summary()

    return model

===========================================================
When I use this function to build the network:

INPUT_SHAPE = (84, 84, 3) # Size of the preprocessed input frame.
HISTORY_LENGTH = 5
Num_Actions = 4

BATCH_SIZE = 32 # Number of samples the agent learns from at once
LEARNING_RATE = 0.00001

MAIN_DQN = build_QNetwork_RGB(Num_Actions, LEARNING_RATE, HISTORY_LENGTH, INPUT_SHAPE)

===========================================================
I got:
Model: "model"

Layer (type)                    Output Shape         Param #     Connected to
input_1 (InputLayer)            [(None, 5, 84, 84, 3 0
lambda (Lambda)                 (None, 5, 84, 84, 3) 0           input_1[0][0]
conv2d (Conv2D)                 (None, 5, 20, 20, 32 6144        lambda[0][0]
conv2d_1 (Conv2D)               (None, 5, 9, 9, 64)  32768       conv2d[0][0]
conv2d_2 (Conv2D)               (None, 5, 7, 7, 64)  36864       conv2d_1[0][0]
conv2d_3 (Conv2D)               (None, 5, 1, 1, 1024 3211264     conv2d_2[0][0]
lambda_1 (Lambda)               [(None, 5, 1, 1, 512 0           conv2d_3[0][0]
flatten_1 (Flatten)             (None, 2560)         0           lambda_1[0][1]
dense_1 (Dense)                 (None, 4)            10244       flatten_1[0][0]
flatten (Flatten)               (None, 2560)         0           lambda_1[0][0]
lambda_2 (Lambda)               (None, 1)            0           dense_1[0][0]
dense (Dense)                   (None, 1)            2561        flatten[0][0]
subtract (Subtract)             (None, 4)            0           dense_1[0][0]
                                                                 lambda_2[0][0]
add (Add)                       (None, 4)            0           dense[0][0]
                                                                 subtract[0][0]

Total params: 3,299,845
Trainable params: 3,299,845
Non-trainable params: 0

I feel that this model is not well built. Do you have an idea how to correct it if this is the case?
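
One way to sanity-check the summary above is to redo its arithmetic by hand. The sketch below uses plain Python with no TensorFlow and hypothetical helper names. It reproduces the spatial sizes and parameter counts of the four convolutions, assuming unpadded ("valid") convolutions without bias, as in the posted code. It also shows that the first layer's 6144 weights correspond to only 3 input channels. The history dimension of 5 is therefore carried along like an extra batch axis, so every frame is convolved separately with the same filters, and the frames are only combined in the Dense layers (2560 = 5 * 512 inputs per stream). A possible alternative, offered here as an assumption and not as the repository author's recommendation, is to stack the frames along the channel axis, giving an input of shape (84, 84, 15).

```python
def conv_out(size, kernel, stride):
    """Spatial output size of an unpadded ('valid') convolution."""
    return (size - kernel) // stride + 1


def conv_params(kernel, in_channels, filters):
    """Number of weights in a square-kernel convolution without bias."""
    return kernel * kernel * in_channels * filters


# (kernel, stride, filters) for the four Conv2D layers in the posted code
LAYERS = [(8, 4, 32), (4, 2, 64), (3, 1, 64), (7, 1, 1024)]


def trace(size, channels):
    """Return per-layer spatial sizes and parameter counts."""
    sizes, params = [], []
    for kernel, stride, filters in LAYERS:
        size = conv_out(size, kernel, stride)
        params.append(conv_params(kernel, channels, filters))
        sizes.append(size)
        channels = filters  # output filters become the next layer's input channels
    return sizes, params


# As posted: each RGB frame (3 channels) is convolved on its own.
sizes, params = trace(84, 3)
print(sizes)   # [20, 9, 7, 1]
print(params)  # [6144, 32768, 36864, 3211264]

# Hypothetical alternative: 5 RGB frames stacked into 15 channels.
_, stacked_params = trace(84, 5 * 3)
print(stacked_params[0])  # 30720
```

With the stacked input, the first convolution sees all frames at once, and each stream would flatten to 512 features instead of 2560.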
