
Actor-Critic

The A2C Reinforcement Learning Method


Introduction

This project contains an implementation of the Advantage Actor-Critic (A2C) reinforcement learning method, and includes an example on Cart-Pole. Cart-Pole is a game in which the player (in this case, our agent) attempts to balance a pole on a cart. At each time step, the player applies a fixed-magnitude push to the cart, either left or right. An episode of the game is lost if the pole falls more than 15 degrees from vertical, and it is won if the player survives 200 time steps. To be considered a solution, an agent must average 195 or more time steps over 100 consecutive episodes.
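The solve criterion above amounts to a rolling average over recent episodes. A minimal sketch of such a check (a hypothetical helper, not part of this repo):

```python
def is_solved(episode_rewards, target=195.0, window=100):
    """Return True once the average reward over the last `window`
    episodes reaches `target` (CartPole-v0's solve criterion)."""
    if len(episode_rewards) < window:
        return False
    recent = episode_rewards[-window:]
    return sum(recent) / window >= target


print(is_solved([200.0] * 100))  # True: perfect scores over 100 episodes
print(is_solved([10.0] * 100))   # False: average well below 195
```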

Results

Side-by-side comparison of random agent (takes random actions) and trained A2C agent:

Rewards at each episode for 4 separate trials:

Training can be quite unstable, even with extensive hyperparameter tuning.

Implementation Details

OpenAI Gym provides a variety of reinforcement learning environments: https://gym.openai.com/envs/

Their CartPole-v0 environment was used for this project.

At each time step, the agent provides an action to the environment, and the environment provides an observation and a reward. In the case of Cart-Pole the reward at each time step is 1, so the total reward for an episode equals the number of time steps the agent survives. An observation is an array consisting of the following: (cart position, cart velocity, pole angle, pole rotation rate).
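This interaction loop can be sketched with a random-action rollout. The tuple unpacking below is only there to cope with both the classic 4-tuple `step` API and the newer 5-tuple one, so the snippet runs across Gym versions:

```python
import gym

env = gym.make("CartPole-v0")
obs = env.reset()
if isinstance(obs, tuple):        # newer Gym versions return (obs, info)
    obs = obs[0]

total_reward, done = 0.0, False
while not done:
    action = env.action_space.sample()   # 0 = push left, 1 = push right
    result = env.step(action)
    if len(result) == 5:                 # newer API: terminated/truncated flags
        obs, reward, terminated, truncated, _ = result
        done = terminated or truncated
    else:                                # classic API: single done flag
        obs, reward, done, _ = result
    total_reward += reward               # reward is 1 per surviving step

env.close()
print(total_reward)  # number of time steps the random agent survived
```

An A2C agent replaces `env.action_space.sample()` with an action drawn from the actor's output probabilities.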

This implementation of A2C uses two neural networks:


Actor: takes in an observation as input and outputs action probabilities
self.actor = nn.Sequential(
    nn.Linear(4, 128),
    nn.ReLU(),
    nn.Linear(128, 2)
).double()

Critic: takes in an observation and outputs a value which estimates the expected return at the current state
self.critic = nn.Sequential(
    nn.Linear(4, 128),
    nn.ReLU(),
    nn.Linear(128, 1)
).double()

Note: The above code hard-codes the network architectures for Cart-Pole; however, the actual module in src/a2c.py infers the input and output dimensions from the environment and can therefore be used with any OpenAI Gym environment.
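A sketch of how such a module might infer its sizes, and of the advantage-based update that gives A2C its name. All names here are hypothetical, not the repo's actual API; any object exposing Gym-style `observation_space`/`action_space` attributes would work, and a CartPole-shaped stub is used so the snippet is self-contained:

```python
import torch
import torch.nn as nn

class A2C(nn.Module):
    """Actor-critic pair whose layer sizes are read from the env."""
    def __init__(self, env, hidden=128):
        super().__init__()
        obs_dim = env.observation_space.shape[0]   # 4 for Cart-Pole
        n_actions = env.action_space.n             # 2 for Cart-Pole
        self.actor = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, n_actions)
        ).double()
        self.critic = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        ).double()

    def forward(self, obs):
        return self.actor(obs), self.critic(obs)

# Minimal duck-typed stand-in for a Gym env (CartPole-shaped)
class _Stub:
    def __init__(self, **kw): self.__dict__.update(kw)

env = _Stub(observation_space=_Stub(shape=(4,)), action_space=_Stub(n=2))
net = A2C(env)

# One-step update sketch: advantage = r + gamma * V(s') - V(s)
obs = torch.zeros(4, dtype=torch.double)
next_obs = torch.ones(4, dtype=torch.double)
reward, gamma = 1.0, 0.99

logits, value = net(obs)
_, next_value = net(next_obs)
advantage = reward + gamma * next_value.detach() - value
log_prob = torch.log_softmax(logits, dim=-1)[0]   # log-prob of (assumed) action 0
actor_loss = -log_prob * advantage.detach()       # policy gradient weighted by advantage
critic_loss = advantage.pow(2)                    # value regression toward the TD target
loss = actor_loss + critic_loss
```

Calling `loss.backward()` and stepping an optimizer over many rollouts is the essence of the training loop.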

Built With

  • Python 3

  • PyTorch

  • OpenAI Gym

Installation and Running Scripts

  1. Clone the repo and change into directory

    $ git clone https://github.com/Lucasc-99/Actor-Critic.git
    $ cd Actor-Critic
  2. Install Pytorch and Gym

    $ pip3 install torch
    $ pip3 install gym
  3. Run scripts

    $ python3 src/cart-pole-baseline.py
    $ python3 src/cart-pole-a2c.py

Contributors

lucasc-99