deep_reinforcement_learning_pong's Introduction

Deep reinforcement learning with pixel features in Atari Pong Game

This project builds an intelligent agent that can play and win the Atari game Pong (https://gym.openai.com/envs/Pong-v0/). The agent is trained with neural networks and deep learning.

Introduction

In the Pong environment, the agent has three related elements: action, reward and state.

  • Actions: the agent takes an action at each time step t. There are six actions, including moving up, moving down, staying put, and firing the ball.
  • Rewards: the environment produces a reward, which the agent receives, when the opponent fails to hit the ball back towards the agent. The agent wins the game when it reaches 21 points.
  • State: the environment updates the state St, which is defined by four game frames stacked together. The frames capture the motion of the two paddles and the ball, along with the background features that the agent needs to learn during the game.

The network suggested by https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf is used to approximate the action values. It consists of three convolutional layers followed by two dense layers. In addition to the network used for training (the train network), a second network with an identical architecture gets its weights by copying them from the train network periodically during training, and is used to compute the action-value labels. This second network (called the target network in the paper) is set up to avoid instability in training.
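As a rough check of the architecture described above, the sketch below traces the layer output shapes. It assumes the layer sizes reported in the cited paper (an 84x84 input with 4 stacked frames, convolutions of 32 8x8 filters with stride 4, 64 4x4 filters with stride 2, and 64 3x3 filters with stride 1, then a 512-unit dense layer); the sizes actually used in this project are not stated here, and the function names are hypothetical.

```python
def conv_out(size, kernel, stride):
    """Output width/height of an unpadded ("valid") convolution."""
    return (size - kernel) // stride + 1


# (filters, kernel, stride) for the three convolutional layers.
# Assumption: values taken from the cited paper, not from this project's code.
CONV_LAYERS = [(32, 8, 4), (64, 4, 2), (64, 3, 1)]


def trace_shapes(size=84, channels=4, hidden=512, n_actions=6):
    """Return the output shape of every layer, from input to action values."""
    shapes = [(size, size, channels)]
    for filters, kernel, stride in CONV_LAYERS:
        size = conv_out(size, kernel, stride)
        channels = filters
        shapes.append((size, size, channels))
    shapes.append((size * size * channels,))  # flatten before the dense layers
    shapes.append((hidden,))                  # first dense layer
    shapes.append((n_actions,))               # second dense layer: one value per action
    return shapes


for shape in trace_shapes():
    print(shape)
```

The final layer has one output per action, so a single forward pass yields the action values for all six Pong actions at once.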

The model is trained in the following three setups.

Simple Deep Q Learning Using Only Train Network

Initially, the model is trained without a target network. The action-value labels are instead computed by the train network itself, so the train network is used both for model training and for label computation.
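A minimal sketch of such a label, assuming the standard Q-learning form (reward plus the discounted maximum action value of the next state). The `done` flag, the discount `gamma=0.99`, and the function name are illustrative assumptions, not values taken from this project.

```python
def q_label(reward, next_q_values, done, gamma=0.99):
    """Label for the action taken: r + gamma * max_a Q(s', a), or just r at episode end."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


# Hypothetical action values that the train network outputs for the next state.
next_q_train = [0.1, 0.5, -0.2, 0.0, 0.3, 0.2]

print(q_label(0.0, next_q_train, done=False))  # bootstraps from the train network
print(q_label(1.0, next_q_train, done=True))   # terminal step: the reward alone
```

Because the same network both produces `next_q_train` and is updated towards the label, every gradient step also moves the label, which is the instability the target network is meant to reduce.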

Simple Deep Q Learning Using Target Network

Next, a simple deep Q-network model is established as the baseline model. This time a target network, whose parameter values are periodically copied from the train network, is used to compute the labels.
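The periodic copy can be sketched as follows. This is a framework-free illustration: weights are represented as a plain dictionary, and the class name `TargetSync` and the interval `sync_every` are hypothetical. In a real deep learning framework the copy would go through that framework's own weight-transfer calls.

```python
import copy


class TargetSync:
    """Keeps a frozen copy of the train weights and refreshes it every `sync_every` steps."""

    def __init__(self, train_weights, sync_every):
        self.sync_every = sync_every
        self.target_weights = copy.deepcopy(train_weights)

    def step(self, step_count, train_weights):
        """Copy the weights when step_count is a multiple of sync_every; return True if copied."""
        if step_count % self.sync_every == 0:
            self.target_weights = copy.deepcopy(train_weights)
            return True
        return False


train_weights = {"dense/kernel": [0.0, 0.0]}
sync = TargetSync(train_weights, sync_every=3)
for step in range(1, 7):
    train_weights["dense/kernel"][0] += 1.0  # stand-in for a gradient update
    copied = sync.step(step, train_weights)
    print(step, copied, sync.target_weights["dense/kernel"][0])
```

Between copies the target weights stay fixed, so the labels computed from them do not shift with every update of the train network.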

Double Q Network

Lastly, a double Q-network model is tried so that its performance can be compared with that of the baseline model. This time the best action is chosen by using the train network to compute the action values of the next state and taking the action with the maximum value. The target network is then used to compute the action value of this “best action,” which is used as the label.
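The two-step label described above can be sketched as below. The reward term, the discount `gamma=0.99`, the `done` flag, and the function name are assumptions following the usual Q-learning form; the example action values are made up for illustration.

```python
def double_q_label(reward, next_q_train, next_q_target, done, gamma=0.99):
    """Select the best next action with the train network, evaluate it with the target network."""
    if done:
        return reward
    # Step 1: the train network picks the action with the highest value.
    best_action = max(range(len(next_q_train)), key=next_q_train.__getitem__)
    # Step 2: the target network supplies the value of that action.
    return reward + gamma * next_q_target[best_action]


# Hypothetical action values for the same next state from the two networks.
next_q_train = [0.2, 0.9, 0.1]
next_q_target = [0.5, 0.3, 0.8]

print(double_q_label(0.0, next_q_train, next_q_target, done=False))
```

Here the train network selects action 1, so the label uses the target network's value for action 1 (0.3) rather than the target network's own maximum (0.8). Separating selection from evaluation in this way is what distinguishes this setup from the baseline.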

Getting Started

The whole project was done in Google Colaboratory (https://colab.research.google.com/).

  • Run the "Set Up Google Cloud GPU" section first to set up the GPU for faster computation.
  • Run the subsequent chunks of code to start training.

Results

Built With

Acknowledgments
