
Lunar Lander

The Lunar Lander environment is a rocket trajectory optimization problem. The goal is to land as close to the landing pad as possible. The rocket starts at the top center of the screen with a random initial force applied to its center of mass.

There are four discrete actions: do nothing, fire the left engine, fire the main engine, and fire the right engine.

Each observation is an 8-dimensional vector containing the lander's x and y position, its x and y linear velocity, its angle, its angular velocity, and two boolean flags indicating whether each leg is in contact with the ground.

Positive rewards are received for landing (100 to 140 points, depending on the position), with an additional +100 if the lander comes to rest. Firing the engines incurs a small penalty (-0.03), and crashing a large one (-100). The problem is considered solved at an average return of 200 points.
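The reward bookkeeping above can be sketched in a few lines. The values come from the text; the helper names and the toy episode are illustrative, not taken from the repo:

```python
def episode_return(rewards):
    """Sum of per-step rewards for one episode."""
    return sum(rewards)

def is_solved(returns, threshold=200.0):
    """The task counts as solved when the average return reaches 200 points."""
    return sum(returns) / len(returns) >= threshold

# A toy episode: fifty engine firings (-0.03 each), then a landing bonus (+100).
rewards = [-0.03] * 50 + [100.0]
print(round(episode_return(rewards), 2))  # 98.5
```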

The following RL algorithms were implemented:

  • Neural Fitted Q Iteration (NFQ)
  • Deep Q-Network (DQN)
  • REINFORCE with baseline / Vanilla Policy Gradient (VPG)
  • Advantage Actor Critic (AC)

For comparability, all algorithms use a 2-layer MLP (128 and 64 hidden units) and a discount factor of 0.999. The learning rate is set individually per agent.
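The shared discount factor of 0.999 enters through the Monte Carlo return G_t = r_t + gamma * G_{t+1}. A stdlib sketch of that backward recursion (the function name is illustrative; the repo's internals may differ):

```python
def discounted_returns(rewards, gamma=0.999):
    """Compute G_t = r_t + gamma * G_{t+1} backwards over one episode."""
    g, out = 0.0, []
    for r in reversed(rewards):
        g = r + gamma * g
        out.append(g)
    return out[::-1]  # restore time order

# gamma = 0.5 keeps the arithmetic exact for illustration
print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))  # [1.75, 1.5, 1.0]
```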

How to

Install dependencies with pip install -r requirements.txt.

Run main.py train <agent> <episodes> to train an agent.

Run main.py evaluate <agent> <episodes> <render> to evaluate a pre-trained agent.

<agent> (string)      NFQ, DQN, VPG, or AC

<episodes> (int)      number of episodes

<render> (bool)       display episodes on screen

Neural Fitted Q Iteration

(Figures: training progress, and the trained agent after 2000 episodes.)

Reference: M. Riedmiller (2005) Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method
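NFQ turns stored transitions into a supervised training set: the target for (s, a) is r + gamma * max_a' Q(s', a'), or just r at terminal states, and the network is then re-fit on the whole batch. A stdlib sketch of the target construction, with a toy Q lookup (the table and transitions are illustrative, not from the repo):

```python
GAMMA = 0.999

def nfq_targets(transitions, q_values, gamma=GAMMA):
    """transitions: list of (state, action, reward, next_state, done) tuples.

    Returns ((state, action), target) pairs for a supervised fitting step.
    """
    targets = []
    for s, a, r, s_next, done in transitions:
        y = r if done else r + gamma * max(q_values[s_next])
        targets.append(((s, a), y))
    return targets

q_values = {"s1": [0.0, 10.0]}  # toy Q(s1, .) over two actions
batch = [("s0", 1, 1.0, "s1", False),    # non-terminal: bootstrapped target
         ("s0", 0, -100.0, "s1", True)]  # terminal (crash): reward only
print(nfq_targets(batch, q_values))
```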

Deep Q-Network

(Figures: training progress, and the trained agent after 1000 episodes.)

Reference: V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller (2013) Playing Atari with Deep Reinforcement Learning
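Two DQN ingredients can be sketched with the stdlib: a replay buffer and epsilon-greedy action selection over the four discrete actions. The names and capacity are illustrative assumptions, not taken from the repo:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size FIFO store of transitions for decorrelated minibatches."""

    def __init__(self, capacity=50_000):
        self.memory = deque(maxlen=capacity)  # old transitions drop off

    def push(self, transition):
        self.memory.append(transition)

    def sample(self, batch_size):
        return random.sample(self.memory, batch_size)

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

buf = ReplayBuffer()
buf.push(("s", 2, -0.03, "s_next", False))
print(epsilon_greedy([0.1, 0.5, 0.2, 0.0], epsilon=0.0))  # 1 (greedy action)
```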

REINFORCE with baseline / Vanilla Policy Gradient

(Figures: training progress, and the trained agent after 5000 episodes.)

Reference: R. Sutton and A. Barto (2018) Reinforcement Learning: An Introduction, p. 328

Reference: OpenAI: Spinning Up in Deep RL!, Vanilla Policy Gradient
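The "with baseline" part means each log-probability gradient is weighted by the return minus a baseline, which reduces variance without biasing the gradient. A stdlib sketch using the batch-mean return as the baseline (an illustrative choice; the repo's baseline may differ, e.g. a learned value function):

```python
def baselined_advantages(returns):
    """Subtract the batch-mean baseline from each episode return."""
    baseline = sum(returns) / len(returns)
    return [g - baseline for g in returns]

# Episodes better than average get positive weight, worse get negative.
print(baselined_advantages([100.0, 200.0, 300.0]))  # [-100.0, 0.0, 100.0]
```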

Advantage Actor Critic

(Figures: training progress, and the trained agent after 1000 episodes.)

Reference: RL Course by David Silver - Lecture 7: Policy Gradient Methods
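A common advantage estimate for actor-critic methods is the one-step TD error, A(s, a) ≈ r + gamma * V(s') - V(s): the actor is updated with this advantage and the critic toward the TD target. A stdlib sketch with illustrative values (the repo's estimator may differ, e.g. multi-step returns):

```python
GAMMA = 0.999

def td_advantage(reward, v_s, v_next, done, gamma=GAMMA):
    """One-step TD error used as the advantage estimate."""
    target = reward if done else reward + gamma * v_next
    return target - v_s

# gamma = 0.5 keeps the arithmetic exact: (1 + 0.5 * 4) - 5 = -2
print(td_advantage(reward=1.0, v_s=5.0, v_next=4.0, done=False, gamma=0.5))  # -2.0
```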

Comparison

The score is the average return over 100 episodes of the trained agent.

Algorithm                     Score
Neural Fitted Q Iteration    -24.90
Deep Q-Network               271.47
Vanilla Policy Gradient      172.49
Advantage Actor Critic       205.77
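The score column is just the mean of 100 evaluation-episode returns. A stdlib sketch of that averaging (the episode returns below are made-up placeholders, not the repo's data):

```python
from statistics import mean

def evaluation_score(episode_returns):
    """Average return over a set of evaluation episodes."""
    return mean(episode_returns)

print(evaluation_score([200.0, 210.0, 190.0]))  # 200.0
```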

Dependencies

  • Python v3.10.9
  • Gym v0.26.2
  • Matplotlib v3.6.2
  • Numpy v1.24.1
  • Pandas v1.5.2
  • PyTorch v1.13.1
  • Tqdm v4.64.1
  • Typer v0.7.0
