What is the purpose of setting the number of OBSERVE steps larger than the size of REPLAY_MEMORY?

Question about OBSERVE state about deeplearningflappybird HOT 6 CLOSED

yenchenlin commented on April 28, 2024

Question about OBSERVE state

from deeplearningflappybird.

Comments (6)

yenchenlin commented on April 28, 2024

Hello @mrgloom,
I set the number of OBSERVE steps that high just for demo purposes 😄

If you are trying to reproduce the model,
I've added a section about that.

yenchenlin commented on April 28, 2024

Hi @mrgloom,
If the comments above have answered your question, would you please close this issue?
Thanks!

mrgloom commented on April 28, 2024

I'm still not sure how the number of OBSERVE timesteps is estimated. Is it just an arbitrary number such that BATCH < OBSERVE < REPLAY_MEMORY?

Also, what if I can't run all 3000000 timesteps in one go? How can training be continued? Do I just set OBSERVE to the same value, load the CNN weights, and set EXPLORE = (3000000 - steps_already_trained)?
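One way to reason about the resume question is to treat epsilon as a pure function of the global timestep, so that a restarted run only needs the saved step count. The sketch below is not taken from the repository: `epsilon_at` is a hypothetical helper, the linear annealing shape is an assumption, and it assumes you persist the step count yourself alongside the weights.

```python
def epsilon_at(t, observe, explore, initial_epsilon, final_epsilon):
    """Exploration rate at global timestep t, assuming linear annealing.

    Hypothetical helper: epsilon stays at initial_epsilon during the first
    `observe` steps, then decreases linearly to final_epsilon over the next
    `explore` steps, and stays at final_epsilon afterwards.
    """
    if t <= observe:
        return initial_epsilon
    frac = min(1.0, (t - observe) / explore)
    return initial_epsilon - frac * (initial_epsilon - final_epsilon)


# Illustrative values only, not the repository's settings.
steps_already_trained = 60
resumed_epsilon = epsilon_at(steps_already_trained, observe=10, explore=100,
                             initial_epsilon=0.1, final_epsilon=0.0)
```

With this formulation, resuming means restoring the weights and the step counter and leaving OBSERVE and EXPLORE unchanged. If the replay memory is not saved with the weights, it would still need to be refilled before sampling minibatches again.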

yenchenlin commented on April 28, 2024

Hello @mrgloom

It's an arbitrary number satisfying BATCH < OBSERVE <= REPLAY_MEMORY.

However, I set it according to the reference paper and empirical results.
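The inequality can be illustrated with a bounded buffer. This is a minimal sketch under stated assumptions: it models the replay memory as a `collections.deque` with `maxlen`, `fill_memory` is a hypothetical helper, integers stand in for real transitions, and the sizes are illustrative rather than the repository's values.

```python
import random
from collections import deque


def fill_memory(observe_steps, capacity):
    """Append one placeholder transition per OBSERVE step to a bounded buffer."""
    memory = deque(maxlen=capacity)
    for t in range(observe_steps):
        # Stand-in for a (state, action, reward, next_state, terminal) tuple.
        memory.append(t)
    return memory


REPLAY_MEMORY = 50  # illustrative capacity
BATCH = 8           # illustrative minibatch size

# OBSERVE > REPLAY_MEMORY: the oldest 30 transitions are evicted before training.
overfull = fill_memory(observe_steps=80, capacity=REPLAY_MEMORY)

# BATCH < OBSERVE <= REPLAY_MEMORY: everything observed is kept and a minibatch fits.
ok = fill_memory(observe_steps=20, capacity=REPLAY_MEMORY)
minibatch = random.sample(list(ok), BATCH)
```

Under these assumptions, observing for more steps than the buffer holds only discards the earliest transitions, and observing for fewer steps than BATCH leaves too few samples to draw a minibatch.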

mrgloom commented on April 28, 2024

Also, is there anything special about the OBSERVE state? For example, should the bird pass through a pipe at least once during this state, or is that not necessary?
Or is the OBSERVE state just used to initialize the replay memory?

I also ran 2 training cases (about 150000 timesteps each): one with the recommended parameters and another with no EXPLORE state at all (I set FINAL_EPSILON and INITIAL_EPSILON to 0).

I found that without the EXPLORE state it also learns to play fine. My intuition is that without exploration it will choose longer routes while trying to maximize the score, which leads to riskier play, whereas taking a random action at each timestep with small probability makes the model learn to play more safely (so is it some kind of regularization?).

What my intuition can't explain is how the model learns to play the game if the bird does not pass any pipes during the OBSERVE state.

yenchenlin commented on April 28, 2024

OBSERVE is only used to fill in the replay memory.

Regarding why it still works without the EXPLORE state, I think it's because this network is overkill for this game.
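For reference, here is what setting both epsilons to 0 means in a generic epsilon-greedy rule. This is a sketch, not the repository's code: `choose_action` is a hypothetical helper and the Q-values are made up.

```python
import random


def choose_action(q_values, epsilon, rng=random):
    """Epsilon-greedy selection over a list of Q-values (hypothetical helper).

    With probability epsilon a uniformly random action index is returned;
    otherwise the index of the largest Q-value is returned.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


# epsilon = 0: rng.random() is never below 0, so the choice is always greedy.
greedy_choices = [choose_action([0.2, 0.9], epsilon=0.0) for _ in range(100)]
```

With epsilon fixed at 0, the only variety in the collected experience comes from the game itself and from the network's changing Q-values during training.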
