The Lunar Lander environment is a rocket trajectory optimization problem. The goal is to touch down on the landing pad, as close to its center as possible. The rocket starts at the top center of the screen with a random initial force applied to its center of mass.
There are four discrete actions: do nothing, fire the left engine, fire the main engine, and fire the right engine.
Each observation is an 8-dimensional vector containing: the lander position in x & y, its linear velocity in x & y, its angle, its angular velocity, and two boolean flags indicating whether each leg has contact with the ground.
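For readability, the eight observation components can be given names. A small illustrative sketch; the field names are my own, not part of the environment API:

```python
from typing import NamedTuple

class LanderObs(NamedTuple):
    """Named view of the 8-dimensional LunarLander observation (field names are illustrative)."""
    x: float              # horizontal position
    y: float              # vertical position
    vx: float             # horizontal velocity
    vy: float             # vertical velocity
    angle: float          # lander angle
    angular_vel: float    # angular velocity
    left_contact: float   # 1.0 if the left leg touches the ground
    right_contact: float  # 1.0 if the right leg touches the ground

# Wrap a raw observation vector for named access:
obs = LanderObs(0.0, 1.4, 0.0, -0.5, 0.02, 0.0, 0.0, 0.0)
print(obs.vy)  # -0.5
```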
Landing on the pad yields a positive reward of 100-140 points, depending on the position, with an additional +100 if the lander comes to rest (and -100 for crashing). Firing the main engine incurs a small penalty of -0.3 per frame (-0.03 for the side engines). The problem is considered solved at 200 points.
The following RL algorithms were implemented:
- Neural Fitted Q Iteration (NFQ)
- Deep Q-Network (DQN)
- REINFORCE with baseline / Vanilla Policy Gradient (VPG)
- Advantage Actor Critic (AC)
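The value-based agents (NFQ, DQN) select actions from estimated Q-values, typically epsilon-greedily. A minimal sketch of that selection rule; the function and parameter names are illustrative, not the repository's API:

```python
import random

def epsilon_greedy(q_values, epsilon):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    # argmax over the action indices
    return max(range(len(q_values)), key=lambda a: q_values[a])

# With epsilon = 0 the choice is purely greedy:
action = epsilon_greedy([0.1, 0.5, 2.3, -0.4], epsilon=0.0)  # -> 2 (highest Q-value)
```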
For a fair comparison, all algorithms use a 2-layer MLP with hidden sizes (128, 64) and a discount factor of 0.999. The learning rate is set individually per algorithm.
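With a discount factor of 0.999, the return at each time step can be computed in a single backward pass over an episode's rewards. A minimal sketch of that computation, not the repository's actual implementation:

```python
def discounted_returns(rewards, gamma=0.999):
    """Compute G_t = r_t + gamma * G_{t+1} for every step, back to front."""
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns

discounted_returns([1.0, 1.0, 1.0])  # -> [2.997001, 1.999, 1.0] (up to float rounding)
```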
Install dependencies with `pip install -r requirements.txt`.

Run `main.py train <agent> <episodes>` to train an agent.

Run `main.py evaluate <agent> <episodes> <render>` to evaluate a pre-trained agent.
- `<agent>` (string): NFQ, DQN, VPG or AC
- `<episodes>` (int): number of episodes
- `<render>` (bool): display episodes on screen
*(Training plot and rollout after 2000 episodes)*
*(Training plot and rollout after 1000 episodes)*
*(Training plot and rollout after 5000 episodes)*
Reference: R. Sutton and A. Barto (2018), Reinforcement Learning: An Introduction, p. 328
Reference: OpenAI: Spinning Up in Deep RL!, Vanilla Policy Gradient
*(Training plot and rollout after 1000 episodes)*
Reference: RL Course by David Silver - Lecture 7: Policy Gradient Methods
The score is the average return over 100 episodes of the trained agent.
Agent | Score
---|---
Neural Fitted Q Iteration | -24.90 |
Deep Q-Network | 271.47 |
Vanilla Policy Gradient | 172.49 |
Advantage Actor Critic | 205.77 |
- Python v3.10.9
- Gym v0.26.2
- Matplotlib v3.6.2
- Numpy v1.24.1
- Pandas v1.5.2
- PyTorch v1.13.1
- Tqdm v4.64.1
- Typer v0.7.0