
sarl_atari_pong's Introduction

Single-Agent RL Atari Pong

Single-agent classic reinforcement learning (no Deep RL) on Atari Pong, developed as a course project for Distributed Artificial Intelligence at the University of Modena and Reggio Emilia, Italy.

Observation preprocessing

The screen pixel observation is downsampled by a factor of 3 along the rows and 2 along the columns, reaching a shape of 53 x 80. Only the pixels from 35 to 92 are considered, i.e. the side walls and the scores are cut out to reduce the number of pixels.
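The crop-and-downsample step can be sketched with plain NumPy slicing. This is only an illustration, not the project's code: the function name `preprocess` is hypothetical, the input is assumed to be a standard 210 x 160 Atari frame, and the crop bounds `top=35` and `bottom=192` are assumptions picked so that such a frame yields the 53 x 80 shape. Substitute the project's actual bounds.

```python
import numpy as np

def preprocess(frame, top=35, bottom=192, row_step=3, col_step=2):
    """Crop the playing field and downsample it by strided slicing.

    `top` and `bottom` are assumed crop bounds (hypothetical values),
    `row_step` and `col_step` are the downsampling factors from the text.
    """
    cropped = frame[top:bottom]               # drop the score area and borders
    return cropped[::row_step, ::col_step]    # keep every 3rd row, every 2nd column

# A blank frame with the standard Atari resolution (rows x columns).
frame = np.zeros((210, 160), dtype=np.uint8)
small = preprocess(frame)
print(small.shape)
```

Strided slicing keeps the step cheap (it returns a view, not a copy), at the price of discarding pixels instead of averaging them.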

The number of states is calculated from the resized screen dimensions (described in the previous section) as:

$$53 \times 80 \; (pos\_ball) \times 53 \; (pos\_agent) \times 6 \; (n\_actions) = 1\,348\,320 \; (states)$$

$$1\,348\,320 \; (states) \times 4 \; (byte) \approx 5.4 \; MB$$
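The arithmetic can be checked by allocating a table of that shape. The axis order `(ball_row, ball_col, agent_row, action)` and the `float32` dtype (4 bytes per entry) are assumptions made for this sketch; the source only gives the sizes.

```python
import numpy as np

# Assumed layout: ball row, ball column, agent row, action.
QTABLE_SHAPE = (53, 80, 53, 6)

qtable = np.zeros(QTABLE_SHAPE, dtype=np.float32)  # 4 bytes per entry

n_entries = qtable.size      # 53 * 80 * 53 * 6
n_bytes = qtable.nbytes      # entries * 4 bytes
print(n_entries, round(n_bytes / 1e6, 1), "MB")
```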

I assumed that the agent does not need to know the position of the competitor in order to win the game, so I counted the states only for agent_0. This assumption makes the game partially observable.

Learning

In this project I investigated the potential of Q-Learning (RL) for extracting smart behaviours. I focused mainly on the hard convergence problem caused by sparsity, i.e. the qtables are big. To tackle this problem I experimented with the effects of a gaussian reward (smoother reward) and of qtable initialization.
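For reference, the standard tabular Q-learning update can be sketched as follows. This is the textbook rule, not necessarily the exact code used in the project: the function name `q_update`, the state encoding `(ball_row, ball_col, agent_row)` and the values of `alpha` and `gamma` are assumptions.

```python
import numpy as np

def q_update(q, state, action, reward, next_state,
             alpha=0.1, gamma=0.99, done=False):
    """Apply one Q-learning step in place and return the new Q-value.

    `state` and `next_state` are (ball_row, ball_col, agent_row) tuples
    (assumed encoding); `q` has one extra trailing axis for the actions.
    """
    # Bootstrapped target: immediate reward plus discounted best next value.
    target = reward if done else reward + gamma * float(np.max(q[next_state]))
    index = state + (action,)
    q[index] += alpha * (target - q[index])
    return float(q[index])

q = np.zeros((53, 80, 53, 6), dtype=np.float32)
new_value = q_update(q, (1, 2, 3), 0, 1.0, (1, 2, 4), alpha=0.5, gamma=0.9)
```

With more than a million entries and a reward that arrives only when a point is scored, most entries are rarely updated, which is the sparsity problem described above.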

Qtable Initialization

At first I was convinced that initializing the qtable with values different from zero could be a good solution, as happens in neural networks. I soon realized that random initialization wasn't actually good: it introduced noise in the Q-learning convergence, since the update relies on the qtable values.
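A small sketch of why this happens: the Q-learning target bootstraps from the table itself, so a randomly initialized table produces non-zero targets even when no reward has been observed. The uniform range `[-1, 1]`, the seed, the helper name `td_target` and the table layout are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
shape = (53, 80, 53, 6)  # assumed layout: ball row, ball col, agent row, action

q_zero = np.zeros(shape, dtype=np.float32)
q_rand = rng.uniform(-1.0, 1.0, size=shape).astype(np.float32)  # assumed range

def td_target(q, reward, next_state, gamma=0.99):
    """Bootstrapped Q-learning target for a non-terminal transition."""
    return reward + gamma * float(np.max(q[next_state]))

next_state = (10, 20, 30)
target_zero = td_target(q_zero, 0.0, next_state)  # no reward, no signal
target_rand = td_target(q_rand, 0.0, next_state)  # no reward, but a spurious signal
print(target_zero, target_rand)
```

With the zero table, an unrewarded transition leaves the value untouched. With the random table, the same transition pulls the value towards an arbitrary number.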

The image above shows this behaviour: random initialization performs worse than zero initialization.

Gaussian Rewards

To address the sparsity problem, I implemented gaussian smoothing on the reward signal. Since there is a close relationship between the states and the screen's pixels, it makes sense to spread the reward spatially by smoothing (e.g. if a specific pixel is a great location to catch the ball, then it is reasonable that the neighbouring ones are good positions too).
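One way to spread a reward over neighbouring positions is to multiply it by a small Gaussian kernel centred on the rewarded pixel. The sketch below is a minimal illustration under stated assumptions, not the project's implementation: the names `gaussian_kernel` and `spread_reward`, the value `sigma=1.0`, and the choice to scale the kernel so that the centre keeps the full reward are all hypothetical.

```python
import numpy as np

def gaussian_kernel(size, sigma=1.0):
    """Return a size x size Gaussian kernel whose centre value is 1."""
    half = size // 2
    ax = np.arange(-half, half + 1)
    xx, yy = np.meshgrid(ax, ax)
    return np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))

def spread_reward(reward, row, col, shape=(53, 80), size=3, sigma=1.0):
    """Return a reward map with `reward` smoothed around (row, col)."""
    half = size // 2
    kernel = gaussian_kernel(size, sigma)
    reward_map = np.zeros(shape, dtype=np.float32)
    # Clip the kernel window so it stays inside the grid.
    r0, r1 = max(row - half, 0), min(row + half + 1, shape[0])
    c0, c1 = max(col - half, 0), min(col + half + 1, shape[1])
    kr0 = r0 - (row - half)
    kc0 = c0 - (col - half)
    reward_map[r0:r1, c0:c1] = reward * kernel[kr0:kr0 + (r1 - r0),
                                               kc0:kc0 + (c1 - c0)]
    return reward_map

map_3x3 = spread_reward(1.0, row=10, col=20, size=3)
map_5x5 = spread_reward(1.0, row=10, col=20, size=5)
```

A larger kernel touches more positions per reward event (25 instead of 9 away from the borders), so more qtable entries receive a learning signal from each event.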

The comparison shows that the gaussian reward converges faster to a defined threshold. mCR10 is the mean over the last 10 steps of the cumulative reward signal.
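A rolling mean such as mCR10 can be tracked with a fixed-length buffer. In this sketch the class name `MeanCumulativeReward`, the helper `first_step_reaching` and the sample values are hypothetical; it also assumes one cumulative-reward value is recorded per step and that the mean is taken over fewer values until the window fills up.

```python
from collections import deque

class MeanCumulativeReward:
    """Mean of the last `window` recorded cumulative rewards (mCR10 for window=10)."""

    def __init__(self, window=10):
        self.values = deque(maxlen=window)  # old values drop out automatically

    def add(self, cumulative_reward):
        self.values.append(cumulative_reward)
        return sum(self.values) / len(self.values)

def first_step_reaching(cumulative_rewards, threshold, window=10):
    """Return the first index at which the rolling mean reaches `threshold`, or None."""
    tracker = MeanCumulativeReward(window)
    for step, value in enumerate(cumulative_rewards):
        if tracker.add(value) >= threshold:
            return step
    return None

tracker = MeanCumulativeReward(window=10)
for value in range(1, 13):       # made-up cumulative rewards 1..12
    mcr10 = tracker.add(value)   # after the loop: mean of 3..12
```

Comparing the step returned by `first_step_reaching` across training runs gives a simple measure of which setting converges faster to the threshold.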

Reward kernel: 3x3 vs 5x5

The comparison shows that the 5x5 reward converges faster than the 3x3 one. mCR10 is the mean over the last 10 steps of the cumulative reward signal.

3x3 Kernel

The following images show the qtable state (in the 3x3 smoothed reward setting) for each action of the racket.

The title of each subplot gives the coordinate position of the racket when the action is performed, and the subplot itself shows the ball position. In other words, it tells whether it is good (white) or bad (black) for the racket to be in that position (the number in the subplot title) while performing that action.
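The data behind one such subplot can be obtained by fixing the racket position and the action and slicing the qtable over the ball positions. The sketch below assumes the layout `(ball_row, ball_col, agent_row, action)` and a min-max scaling to `[0, 1]` for the grayscale mapping (0 as black, 1 as white); the function name `ball_map` and the random table are hypothetical stand-ins for a trained qtable.

```python
import numpy as np

def ball_map(q, agent_row, action):
    """Return the ball-position map for one racket position and one action,
    scaled to [0, 1] so that it can be shown as a grayscale image."""
    values = q[:, :, agent_row, action].astype(np.float64)
    lo, hi = values.min(), values.max()
    if hi == lo:                      # flat slice: nothing learned yet
        return np.zeros_like(values)
    return (values - lo) / (hi - lo)

# Random values stand in for a trained qtable in this illustration.
rng = np.random.default_rng(0)
q = rng.normal(size=(53, 80, 53, 6)).astype(np.float32)
image = ball_map(q, agent_row=26, action=2)
print(image.shape)
```

Each resulting 53 x 80 array can then be passed to any image-plotting routine with a grayscale colour map.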

5x5 Kernel

The following images show the qtable state for each action of the Pong racket after training with a 5x5 smoothed reward. The images are read in the same way as described in the 3x3 Kernel section.

sarl_atari_pong's People

Contributors

fmolivato
