yenchenlin / DeepLearningFlappyBird
Flappy Bird hack using Deep Reinforcement Learning (Deep Q-learning).
License: MIT License
I see from here that all the rewards are added into the deque. We need to sample the +1 and -1 rewards from the deque to use them, so do you think learning may be slow?
(Originally in Chinese: aren't the transitions with reward 1 and -1 also all put into the deque? Then the probability of a 1 or -1 reward being sampled is very low, so the feedback would be very slow, wouldn't it?)
Thank you @yenchenlin
but why do you use the same value for INITIAL_EPSILON and FINAL_EPSILON?
Are the inputs really the last 4 frames, or is it just one frame stacked four times? The code below seems to indicate that one frame, stacked into four copies, serves as the input.
do_nothing[0] = 1
x_t, r_0, terminal = game_state.frame_step(do_nothing)             # grab the first frame
x_t = cv2.cvtColor(cv2.resize(x_t, (80, 80)), cv2.COLOR_BGR2GRAY)  # resize and grayscale
ret, x_t = cv2.threshold(x_t, 1, 255, cv2.THRESH_BINARY)           # binarize
s_t = np.stack((x_t, x_t, x_t, x_t), axis=2)                       # the same frame, stacked 4x
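If I read deep_q_network.py correctly, only this initial state duplicates one frame; inside the training loop the stack is rolled forward, so after a few steps it holds the last 4 distinct frames. A NumPy sketch of the two cases, with dummy frames standing in for the preprocessed screen:

```python
import numpy as np

# Initial state: the very first frame x_t is duplicated 4 times.
x_t = np.zeros((80, 80))
s_t = np.stack((x_t, x_t, x_t, x_t), axis=2)   # shape (80, 80, 4)

# Every later step: the newest preprocessed frame x_t1 is prepended
# and the oldest channel dropped, so s_t1 holds the last 4 frames.
x_t1 = np.ones((80, 80))
x_t1 = np.reshape(x_t1, (80, 80, 1))
s_t1 = np.append(x_t1, s_t[:, :, :3], axis=2)  # still (80, 80, 4)
```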
Why can't I read the PNG files from assets?
Is that a problem with my setup?
Could you enlighten me on the reason why you use pooling in this architecture? As far as I know, pooling can make a network insensitive to the location of an object in the image. Thanks in advance.
Hi,
Thanks for your nice code and documentation.
I saw the report from Kevin Chen where he experimented with three difficulty levels (easy, medium, hard) of the game. Can you please tell me which difficulty level the game is set to in your code, and how to change the difficulty level if I want to?
I guess it's related to the value of PIPEGAPSIZE in wrapped_flappy_bird.py; currently it's set to 100. Is that hard mode? Can I change the difficulty level by increasing or decreasing PIPEGAPSIZE? If so, are there specific values for those modes?
Thanks!
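I can't confirm the exact gap sizes Kevin Chen used, but wiring PIPEGAPSIZE to named presets would look like the sketch below. Every value except the repo's 100 is an illustrative guess, not a number from the report:

```python
# Hypothetical difficulty presets for wrapped_flappy_bird.py.
# The repo ships with PIPEGAPSIZE = 100; a smaller gap is harder.
DIFFICULTY = {"easy": 130, "medium": 115, "hard": 100}  # guesses, except 100

PIPEGAPSIZE = DIFFICULTY["hard"]
print(PIPEGAPSIZE)  # 100
```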
Memory leak?
It runs well, but gets slower and slower as memory usage increases.
Hi @yenchenlin1994, love your implementation!
I went through your code and I can't seem to find where you've frozen the target network?
Unless I'm missing something in my excess-caffeine-induced brain fade, you continue to update the target every batch?
Wouldn't that hurt your convergence rate badly?
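As far as I can see this is right: the repo computes TD targets from the same network it trains. The Nature DQN fix keeps a frozen copy that is hard-synced every C steps. A toy sketch of the pattern (the class and numbers are illustrative, not the repo's code):

```python
import copy

class QNetwork:
    """Stand-in for the TF graph; parameters held in a dict."""
    def __init__(self):
        self.params = {"w": 0.0}

    def train_step(self):
        self.params["w"] += 0.1   # pretend gradient update

online = QNetwork()
target = copy.deepcopy(online)    # frozen copy that produces Q-targets

C = 100                           # sync period (10k steps in Mnih et al. 2015)
for t in range(1, 301):
    online.train_step()
    # The TD target y = r + gamma * max_a Q_target(s', a) would be
    # computed with `target` here, keeping it fixed between syncs.
    if t % C == 0:
        target = copy.deepcopy(online)   # periodic hard update
```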
I want to know what's wrong here, thanks very much.
libpng warning: iCCP: known incorrect sRGB profile
(warning repeated many times)
Traceback (most recent call last):
  File "deep_q_network.py", line 215, in <module>
    main()
  File "deep_q_network.py", line 212, in main
    playGame()
  File "deep_q_network.py", line 209, in playGame
    trainNetwork(s, readout, h_fc1, sess)
  File "deep_q_network.py", line 82, in trainNetwork
    readout_action = tf.reduce_sum(tf.multiply(readout, a), reduction_indices=1)
AttributeError: 'module' object has no attribute 'multiply'
In the figure, I wonder whether your math is valid. I mean:
input 80 x 80 x 4 -- conv. w/ 8 x 8 x 4 x 32, stride 4 --> output 19 x 19 x 32
(because (80 - 8) / 4 + 1 = 19) => your result was 20 x 20 x 32
input 10 x 10 x 32 -- conv. w/ 4 x 4 x 32 x 64, stride 2 --> output 4 x 4 x 64
(because (10 - 4) / 2 + 1 = 4) => your result was 5 x 5 x 64
and...
input 3 x 3 x 64 -- conv. w/ 3 x 3 x 64 x 64 --> your result was 3 x 3 x 64 (is this possible?)
Am I wrong?
Since I am a newbie in this area, if I misunderstood, please teach me.
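If it helps: the figure's numbers follow from padding. The conv2d helper in deep_q_network.py calls tf.nn.conv2d with padding='SAME' (output size ceil(n / stride)), while the (n - k) / stride + 1 formula in the question assumes VALID (no) padding. A quick check of the three cases above:

```python
import math

def out_size(n, k, s, padding):
    """Spatial output size of a conv layer under TF's padding rules."""
    if padding == "SAME":
        return math.ceil(n / s)   # input is zero-padded so every stride fits
    return (n - k) // s + 1       # VALID: no padding, the formula above

# The three cases from the question, with SAME padding (what the code uses):
print(out_size(80, 8, 4, "SAME"))   # 20 (not 19)
print(out_size(10, 4, 2, "SAME"))   # 5  (not 4)
print(out_size(3, 3, 1, "SAME"))    # 3  (so yes, 3x3 -> 3x3 is possible)

# The (n - k) / s + 1 formula assumes VALID padding instead:
print(out_size(80, 8, 4, "VALID"))  # 19
```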
Traceback (most recent call last):
  File "deep_q_network.py", line 8, in <module>
    import wrapped_flappy_bird as game
  File "game/wrapped_flappy_bird.py", line 19, in <module>
    IMAGES, SOUNDS, HITMASKS = flappy_bird_utils.load()
  File "game/flappy_bird_utils.py", line 21, in load
    pygame.image.load('assets/sprites/0.png').convert_alpha(),
pygame.error: File is not a Windows BMP file
FINAL_EPSILON = 0.0001
INITIAL_EPSILON = 0.0001
epsilon = INITIAL_EPSILON
Since the two constants are equal, epsilon will never be updated: the condition below is never true.
if epsilon > FINAL_EPSILON and t > OBSERVE:
    epsilon -= (INITIAL_EPSILON - FINAL_EPSILON) / EXPLORE
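For the annealing to do anything, INITIAL_EPSILON has to exceed FINAL_EPSILON; the shipped values are presumably equal because the provided model is already trained and should stay greedy. A sketch of the intended schedule, with scaled-down illustrative values:

```python
INITIAL_EPSILON = 0.1    # illustrative: explore early during training
FINAL_EPSILON = 0.0001
OBSERVE = 1000           # scaled down from the repo's values for brevity
EXPLORE = 100000

epsilon = INITIAL_EPSILON
for t in range(OBSERVE + EXPLORE + 1):
    if epsilon > FINAL_EPSILON and t > OBSERVE:
        epsilon -= (INITIAL_EPSILON - FINAL_EPSILON) / EXPLORE
# epsilon has now annealed linearly from 0.1 down to ~0.0001
```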
Why are the parameters (and network structure) in the code quite different from those of the paper? Which is better? Thank you.
The pooling layers are all commented out. Why? Such as:
#h_pool2 = max_pool_2x2(h_conv2)
[root@localhost DeepLearningFlappyBird]# python3.6 deep_q_network.py
ALSA lib pulse.c:243:(pulse_connect) PulseAudio: Unable to connect: Connection refused
libpng warning: iCCP: known incorrect sRGB profile
(warning repeated many times)
Traceback (most recent call last):
  File "deep_q_network.py", line 8, in <module>
    import wrapped_flappy_bird as game
  File "game/wrapped_flappy_bird.py", line 19, in <module>
    IMAGES, SOUNDS, HITMASKS = flappy_bird_utils.load()
  File "game/flappy_bird_utils.py", line 42, in load
    SOUNDS['die'] = pygame.mixer.Sound('assets/audio/die' + soundExt)
pygame.error: Unable to open file 'assets/audio/die.ogg'
[root@localhost DeepLearningFlappyBird]#
2016-12-03 15:39:16.578 Python[10293:65253] 15:39:16.578 WARNING: 140: This application, or a library it uses, is using the deprecated Carbon Component Manager for hosting Audio Units. Support for this will be removed in a future release. Also, this makes the host incompatible with version 3 audio units. Please transition to the API's in AudioComponent.h.
Traceback (most recent call last):
  File "deep_q_network.py", line 8, in <module>
    import wrapped_flappy_bird as game
  File "game/wrapped_flappy_bird.py", line 19, in <module>
    IMAGES, SOUNDS, HITMASKS = flappy_bird_utils.load()
  File "game/flappy_bird_utils.py", line 21, in load
    pygame.image.load('assets/sprites/0.png').convert_alpha(),
pygame.error: Failed loading libpng.dylib: dlopen(libpng.dylib, 2): image not found
I was wondering about the tf.reduce_sum output and y: both are 1-D and the MSE cost term is 1-D, but the gradient to be propagated needs the same dimensions as the network output, i.e. (1, ACTIONS) = (1, 2). Is the final loss gradient just replicated in both dimensions, i.e. (1, 1) -> (1, 2)?
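If it helps: the gradient isn't replicated. Since readout_action is the sum of readout * a with a one-hot a, the chain rule routes the scalar loss gradient only into the chosen action's output; the other output gets zero. A NumPy check with toy numbers:

```python
import numpy as np

readout = np.array([1.5, -0.3])   # network output: Q-values for 2 actions
a = np.array([0.0, 1.0])          # one-hot: action 1 was taken
y = 0.8                           # TD target

readout_action = np.sum(readout * a)        # scalar Q(s, a_taken) = -0.3
loss = (y - readout_action) ** 2

# Chain rule: d(loss)/d(readout) = -2 * (y - Q) * d(Q)/d(readout),
# and d(Q)/d(readout) = a, so the (1, ACTIONS)-shaped gradient is
# masked by the one-hot `a`, not replicated across both outputs.
grad = -2.0 * (y - readout_action) * a
```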
Thanks @yenchenlin
It seems that these 4 frames are the same; see https://github.com/yenchenlin/DeepLearningFlappyBird/blob/master/deep_q_network.py#L102
I read it from here.
Why does the program only use the current state and the next state?
Why does using only these two states work?
Thank you @yenchenlin
Could it be possible to accelerate the game to save training time?
Hello
Hi,
DeepLearningFlappyBird crashes on launch on Mac OS X El Capitan; here is the error log:
tensorflow.python.framework.errors.NotFoundError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for bird-dqn-30000
I tried to run this, and this showed up in the terminal (on Ubuntu):
W tensorflow/core/kernels/io.cc:228] Not found: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for bird-dqn-30000
So I went to the "saved_networks" folder, and "bird-dqn-30000" was there.
Inside the checkpoint file, the content was:
model_checkpoint_path: "bird-dqn-30000"
all_model_checkpoint_paths: "bird-dqn-10000"
all_model_checkpoint_paths: "bird-dqn-20000"
all_model_checkpoint_paths: "bird-dqn-30000"
I changed the first line to:
model_checkpoint_path: "saved_networks/bird-dqn-30000"
all_model_checkpoint_paths: "bird-dqn-10000"
all_model_checkpoint_paths: "bird-dqn-20000"
all_model_checkpoint_paths: "bird-dqn-30000"
Worked fine.
I don't know if this is a real issue, or just something messy with my system. Just want to let you know.
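A workaround that avoids editing the checkpoint file by hand could prefix the directory in code before calling saver.restore(). The helper below is hypothetical, not from the repo:

```python
import os

def resolve_checkpoint_path(ckpt_dir, model_checkpoint_path):
    """If the checkpoint file stores a bare filename (as older
    tf.train.Saver versions wrote), prefix the directory so that
    saver.restore() can find the file when run from the repo root."""
    if not os.path.dirname(model_checkpoint_path):
        return os.path.join(ckpt_dir, model_checkpoint_path)
    return model_checkpoint_path

print(resolve_checkpoint_path("saved_networks", "bird-dqn-30000"))
```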
If I don't change the penalty value, I can't reproduce the result even after about 6 million steps.
Is the reward rule right?
I tried, but I failed.
Hello! I want to reproduce the project and have modified the configuration as you suggested.
But it seems that the bird jumps to the top every time and hits the pipe.
Maybe after some steps the bird will learn to be a little more intelligent?
If so, how many steps?
Thanks.
Hey there, if anyone else is getting the same error, please uninstall the TensorFlow version on Windows using pip uninstall tensorflow and then re-install TensorFlow.
You might use versions of TensorFlow other than 1.5.0 if that one is not working.
You can also downgrade with pip install tensorflow==1.1.
BTW amazing stuff man, Kudos!
What is the purpose of making the number of OBSERVE steps larger than the size of REPLAY_MEMORY?
Appreciate this excellent work. I got a lot of inspiration from this work on pygame.
I have managed to train an AI for a more difficult version of Flappy Bird: the horizontal distance between adjacent pipes and the gap between the upper and lower pipes are random within a certain range rather than fixed. Instead of neural networks and reinforcement learning, I use evolution strategies and Cartesian genetic programming, which attempts to build the control function (a math expression) directly using only basic arithmetic operators. With a small population of size 10, the bird can learn to fly quite well in typically fewer than 50 generations, which seems to be much more efficient than simple neuro-evolution.
I implement this algorithm with Python and pygame. For those who are interested, please check my GitHub repository. A demo is here.
How do I start training the bot again from the beginning with a different learning rate?
The code works fine but lacks freezing of the target network.
How do I handle the PNG files? I got the following message and have no idea how to solve it. I tried installing PIL and Pillow, but that didn't work.
$python deep_q_network.py
Traceback (most recent call last):
  File "deep_q_network.py", line 8, in <module>
    import wrapped_flappy_bird as game
  File "game/wrapped_flappy_bird.py", line 19, in <module>
    IMAGES, SOUNDS, HITMASKS = flappy_bird_utils.load()
  File "game/flappy_bird_utils.py", line 21, in load
    pygame.image.load('assets/sprites/0.png').convert_alpha(),
pygame.error: File is not a Windows BMP file
Oops, sorry. I typed the issue in the wrong repo. I'm trying to figure how to delete the issue.
EDIT: Turns out you can't delete issues on GitHub. Really sorry.
Can we somehow speed up the emulator? Maybe we can run it without visualization?
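One common pygame trick (not specific to this repo) is SDL's dummy video driver, which skips window creation entirely; to actually run faster, the FPS cap in wrapped_flappy_bird.py would also need raising if it throttles with clock.tick. Sketch:

```python
import os

# SDL must see this before pygame initializes its display, so put it
# at the very top of deep_q_network.py, before `import pygame` runs
# anywhere. With the "dummy" driver no window is ever opened.
os.environ["SDL_VIDEODRIVER"] = "dummy"
```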
Please, will the program stop after the training?
I've implemented DQL for Flappy Bird in Keras, and I find that pickling more than 50,000 experiences takes over 11 GB of storage due to the inefficiency of pickle (or cPickle, for that matter), while the actual size of the queue is around 5 GB according to sys.getsizeof() (there is no better alternative for getting the size of Python objects).
Did you face this issue? I would imagine using a database like sqlite should be more efficient.
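Before reaching for a database, it may be worth checking the dtype: if the stored frames are float64, each pixel costs 8 bytes, while storing uint8 and converting back only when sampling a minibatch shrinks the memory roughly 8x. A sketch (the frame shape follows this repo's 80x80x4 states):

```python
import pickle

import numpy as np

# An 80x80x4 state as float64, which naive preprocessing can produce...
frame_f64 = np.random.rand(80, 80, 4)
# ...versus 1 byte per pixel, roughly 8x smaller when pickled.
frame_u8 = (frame_f64 * 255).astype(np.uint8)

print(len(pickle.dumps(frame_f64)))   # ~200 KB per state
print(len(pickle.dumps(frame_u8)))    # ~25 KB per state

# Convert back to float only at minibatch time:
restored = frame_u8.astype(np.float32) / 255.0
```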
After running python deep_q_network.py I get:
AttributeError: module 'tensorflow' has no attribute 'InteractiveSession'
Then pip install --upgrade tensorflow==0.7 returns:
ERROR: Could not find a version that satisfies the requirement tensorflow==0.7 (from versions: 2.2.0rc1, 2.2.0rc2, 2.2.0rc3, 2.2.0rc4, 2.2.0, 2.3.0rc0, 2.3.0rc1, 2.3.0rc2, 2.3.0)
ERROR: No matching distribution found for tensorflow==0.7
How can I run this?
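The repo targets TF 1.x graph mode, and tensorflow==0.7 is no longer on PyPI. With TF 2.x installed, one workaround (untested against this repo) is the v1 compatibility shim, replacing the import at the top of deep_q_network.py:

```python
# v1 compatibility shim: restores graph-mode APIs such as
# InteractiveSession and placeholder under TensorFlow 2.x.
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()
sess = tf.InteractiveSession()
```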
Below are some resources which may improve the current implementation.
I will try to implement all of these when I have time!
https://github.com/yenchenlin/DeepLearningFlappyBird/blob/master/deep_q_network.py#L82-L83
I cannot find the math that supports the multiply operation.
Update deep_q_network.py:
...
readout_action = tf.reduce_sum(tf.mul(readout, a), reduction_indices=1)
...
To:
readout_action = tf.reduce_sum(tf.multiply(readout, a), reduction_indices=1)
TF 1.0 change:
http://stackoverflow.com/questions/42217059/tensorflowattributeerror-module-object-has-no-attribute-mul
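Regarding the math behind the multiply: a is a one-hot action vector, so the element-wise product followed by reduce_sum simply selects the Q-value of the action actually taken from the network's output; it is equivalent to indexing. A NumPy illustration with toy values:

```python
import numpy as np

readout = np.array([[0.7, 1.2],
                    [2.0, -0.5]])   # Q-values: batch of 2 states, 2 actions
a = np.array([[0.0, 1.0],
              [1.0, 0.0]])          # one-hot vectors for the actions taken

# Multiplying zeroes out the non-chosen action; the row-wise sum then
# yields Q(s, a_taken) for each sample: 1.2 and 2.0 here.
readout_action = np.sum(readout * a, axis=1)
```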
As I found on the internet, this game can be beaten without the use of deep learning [https://github.com/chncyhn/flappybird-qlearning-bot].
So can you help me understand what is more beneficial about using deep learning for this game, rather than simply using q-learning?
Traceback (most recent call last):
  File "deep_q_network.py", line 8, in <module>
    import wrapped_flappy_bird as game
  File "game/wrapped_flappy_bird.py", line 19, in <module>
    IMAGES, SOUNDS, HITMASKS = flappy_bird_utils.load()
  File "game/flappy_bird_utils.py", line 42, in load
    SOUNDS['die'] = pygame.mixer.Sound('assets/audio/die' + soundExt)
MemoryError