
Comments (6)

yenchenlin commented on April 28, 2024

Hello @mrgloom ,
I set the number of OBSERVE steps that high just for demo purposes 😄

If you are trying to reproduce the model,
I've added a section about that.

from deeplearningflappybird.

yenchenlin commented on April 28, 2024

Hi @mrgloom ,
If the comments above have answered your question, would you please close this issue?
Thanks!


mrgloom commented on April 28, 2024

I'm still not sure how the number of OBSERVE timesteps is estimated. Is it just an arbitrary number such that BATCH < OBSERVE < REPLAY_MEMORY?

Also, what if I can't run all 3000000 timesteps in one go? How can training be continued? Just set OBSERVE to the same value, load the CNN weights, and set EXPLORE = (3000000 - steps_already_trained)?


yenchenlin commented on April 28, 2024

Hello @mrgloom

  1. It is an arbitrary number satisfying BATCH < OBSERVE <= REPLAY_MEMORY.
     However, I set it according to the reference paper and empirical results.

  2. Yes.
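
The two answers above can be sketched in code. This is a minimal illustration, not the repository's implementation: it assumes epsilon is annealed linearly from INITIAL_EPSILON to FINAL_EPSILON over EXPLORE steps, and the helper names `check_phase_sizes` and `resume_schedule` are hypothetical. Under that assumption, resuming with a shortened EXPLORE only continues the same schedule if epsilon also restarts from the value it had already reached.

```python
def check_phase_sizes(batch, observe, replay_memory):
    """Return True if BATCH < OBSERVE <= REPLAY_MEMORY holds."""
    return batch < observe <= replay_memory


def resume_schedule(steps_annealed, explore, initial_epsilon, final_epsilon):
    """Settings for continuing an interrupted run (hypothetical helper).

    steps_annealed is assumed to count only the steps during which epsilon
    was being annealed, i.e. steps taken after the OBSERVE phase.
    Returns (remaining EXPLORE steps, epsilon to restart from), assuming
    linear annealing.
    """
    remaining = max(0, explore - steps_annealed)
    fraction_done = min(1.0, steps_annealed / explore)
    current_epsilon = initial_epsilon - fraction_done * (initial_epsilon - final_epsilon)
    return remaining, current_epsilon


# Illustrative values only; check them against your own configuration.
print(check_phase_sizes(batch=32, observe=10000, replay_memory=50000))
print(resume_schedule(1500000, 3000000, 0.1, 0.0001))
```

The second return value would be used in place of INITIAL_EPSILON when restarting, alongside EXPLORE = (3000000 - steps_already_trained) as discussed above.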


mrgloom commented on April 28, 2024

Also, is there anything special about the OBSERVE state? For example, should the bird pass through a pipe at least once during this state, or is that not necessary?
Or is the OBSERVE state just used to initialize the replay memory?

I also ran 2 training cases (about 150000 timesteps): one with the recommended parameters and another with no EXPLORE state at all (I set both FINAL_EPSILON and INITIAL_EPSILON to 0).

I found that without the EXPLORE state it also learns to play fine. My intuition is that it will choose longer routes while trying to maximize the score, which leads to riskier play, whereas taking a random action with small probability at each timestep makes the model learn to play more safely (so is it some kind of regularization?).

What my intuition can't explain is how the model learns to play the game if the bird does not pass any pipes during the OBSERVE state.


yenchenlin commented on April 28, 2024

OBSERVE is only used to fill in the replay memory.
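
A rough sketch of what "fill in the replay memory" means, under stated assumptions: transitions are stored in a bounded buffer on every step, and minibatch updates only begin once the step counter passes OBSERVE. The function `run_steps` and the placeholder transitions are hypothetical and stand in for the real game loop and network update.

```python
import random
from collections import deque


def run_steps(num_steps, observe, batch, replay_memory, seed=0):
    """Simulate the observe/train phases; return (memory size, update count)."""
    rng = random.Random(seed)
    memory = deque(maxlen=replay_memory)  # oldest transitions are evicted first
    updates = 0
    for t in range(1, num_steps + 1):
        # Placeholder transition; a real agent would store something like
        # (state, action, reward, next_state, terminal).
        memory.append((t, rng.randrange(2), rng.random()))
        if t > observe:
            # BATCH < OBSERVE guarantees enough stored transitions to sample.
            minibatch = rng.sample(list(memory), batch)
            assert len(minibatch) == batch
            updates += 1  # stands in for one gradient step on the minibatch
    return len(memory), updates


print(run_steps(num_steps=50, observe=20, batch=4, replay_memory=30))
```

In this sketch the buffer keeps receiving new transitions after the OBSERVE phase ends, so later minibatches are not limited to what was collected while observing.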

Regarding why it still works without the EXPLORE state, I think it's because this network is overkill for this game.
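
For reference, here is a minimal sketch of epsilon-greedy action selection, assuming that is how the EXPLORE state picks actions; `choose_action` is a hypothetical helper, not a function from the repository. With epsilon set to 0, as in the experiment described above, the random branch is never taken and the agent always follows its current Q-value estimates.

```python
import random


def choose_action(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore: uniform random action
    # Exploit: index of the largest estimated Q-value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Illustrative Q-values for two actions (e.g. do nothing, flap).
q = [0.2, 0.9]
print(choose_action(q, epsilon=0.0))  # always the greedy action when epsilon is 0
```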
