AlbertaU_RLCapstone

The capstone project for the Reinforcement Learning Specialization by the University of Alberta on Coursera. We simulate the landing of a lunar module on the moon and train the agent using deep reinforcement learning.

Concretely, we define a Markov Decision Process (MDP) in which the lunar module is the agent, the actions are firing the left or right thruster, and the state consists of the module's coordinates, angle, and velocity. The agent is rewarded for landing in a designated zone and penalized for landing too quickly or outside the zone.
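To make the MDP above concrete, here is a minimal sketch of how the state, actions, and landing reward could be represented. The names `State`, `Action`, and `landing_reward`, as well as every threshold and reward magnitude, are hypothetical placeholders chosen for illustration; they are not taken from the project code.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    # Only the two thrusters named in the text; a real lander may expose more actions.
    FIRE_LEFT = 0
    FIRE_RIGHT = 1


@dataclass(frozen=True)
class State:
    x: float      # horizontal position
    y: float      # altitude above the ground
    angle: float  # tilt in radians
    vx: float     # horizontal velocity
    vy: float     # vertical velocity (negative means descending)


def landing_reward(state: State,
                   zone_left: float = -1.0,
                   zone_right: float = 1.0,
                   max_safe_speed: float = 0.5) -> float:
    """Reward at touchdown. All thresholds and magnitudes are made-up placeholders."""
    in_zone = zone_left <= state.x <= zone_right
    speed = (state.vx ** 2 + state.vy ** 2) ** 0.5

    # Landing inside the designated zone is rewarded, landing outside is penalized.
    reward = 100.0 if in_zone else -100.0

    # Landing too quickly costs more the further the speed exceeds the safe limit.
    if speed > max_safe_speed:
        reward -= 50.0 * (speed - max_safe_speed)
    return reward
```

With these placeholder values, a slow touchdown at `x = 0` scores higher than a fast one at the same position, and any touchdown outside the zone starts from a negative reward.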

This is an episodic task: each episode terminates when the agent makes contact with the ground or flies outside the environment boundaries. The agent learns at every time step (temporal-difference learning), which is more efficient than waiting for the end of an episode and lets learning continue even if input is lost partway through (e.g. because of damaged sensors).

Given this formalization, we train the agent with the Expected SARSA algorithm, because it handles episodic control problems with continuous state variables through TD learning. We choose Expected SARSA over Q-learning because we expect an epsilon-soft policy to cope better than a deterministic policy with function approximation and state aliasing.
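The difference between the two algorithms shows up in the TD target. The sketch below computes both targets with NumPy, assuming an epsilon-greedy policy (one common kind of epsilon-soft policy). The function names, `gamma`, and `epsilon` are illustrative assumptions rather than values from the project.

```python
import numpy as np


def epsilon_greedy_probs(q_values: np.ndarray, epsilon: float) -> np.ndarray:
    """Action probabilities of an epsilon-greedy (hence epsilon-soft) policy."""
    n_actions = q_values.shape[0]
    probs = np.full(n_actions, epsilon / n_actions)
    probs[np.argmax(q_values)] += 1.0 - epsilon
    return probs


def expected_sarsa_target(reward: float, q_next: np.ndarray, terminal: bool,
                          gamma: float = 0.99, epsilon: float = 0.1) -> float:
    """r + gamma * sum_a pi(a | s') * Q(s', a); just r at the end of an episode."""
    if terminal:
        return reward
    probs = epsilon_greedy_probs(q_next, epsilon)
    return reward + gamma * float(np.dot(probs, q_next))


def q_learning_target(reward: float, q_next: np.ndarray, terminal: bool,
                      gamma: float = 0.99) -> float:
    """r + gamma * max_a Q(s', a), shown only for comparison."""
    if terminal:
        return reward
    return reward + gamma * float(np.max(q_next))
```

Expected SARSA averages the next-state action values under the policy the agent actually follows, while Q-learning bootstraps from the greedy action only. With `epsilon = 0` the two targets coincide.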

The action values (Q-values) are estimated by a two-layer neural network trained with the Adam optimizer. A neural network lets the agent approximate non-linear value functions, which should make its predictions more robust.
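As a rough illustration of such a network, the NumPy sketch below assumes "two-layer" means one ReLU hidden layer followed by a linear output layer with one Q-value per action. The class names, initialization scheme, and hyperparameters are assumptions made for this example and may differ from the project's actual implementation.

```python
import numpy as np


class TwoLayerQNetwork:
    """One ReLU hidden layer followed by a linear output layer (one Q-value per action)."""

    def __init__(self, state_dim, hidden_units, n_actions, seed=0):
        rng = np.random.default_rng(seed)
        self.params = {
            "W1": rng.normal(0.0, np.sqrt(2.0 / state_dim), (state_dim, hidden_units)),
            "b1": np.zeros(hidden_units),
            "W2": rng.normal(0.0, np.sqrt(2.0 / hidden_units), (hidden_units, n_actions)),
            "b2": np.zeros(n_actions),
        }

    def q_values(self, states):
        """Q-values for a batch of states with shape (batch, state_dim)."""
        hidden = np.maximum(0.0, states @ self.params["W1"] + self.params["b1"])
        return hidden @ self.params["W2"] + self.params["b2"]

    def gradients(self, states, actions, targets):
        """Gradient of the batch mean of 0.5 * (Q(s, a) - target)^2."""
        p = self.params
        pre = states @ p["W1"] + p["b1"]
        hidden = np.maximum(0.0, pre)
        q = hidden @ p["W2"] + p["b2"]
        rows = np.arange(states.shape[0])
        # Only the Q-value of the action actually taken receives an error signal.
        dq = np.zeros_like(q)
        dq[rows, actions] = (q[rows, actions] - targets) / states.shape[0]
        dhidden = (dq @ p["W2"].T) * (pre > 0)
        return {
            "W1": states.T @ dhidden, "b1": dhidden.sum(axis=0),
            "W2": hidden.T @ dq, "b2": dq.sum(axis=0),
        }


class Adam:
    """Standard Adam update with bias-corrected moment estimates."""

    def __init__(self, params, step_size=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.step_size, self.beta1, self.beta2, self.eps = step_size, beta1, beta2, eps
        self.m = {k: np.zeros_like(v) for k, v in params.items()}
        self.v = {k: np.zeros_like(v) for k, v in params.items()}
        self.t = 0

    def step(self, params, grads):
        """Update the parameter arrays in place."""
        self.t += 1
        for k in params:
            self.m[k] = self.beta1 * self.m[k] + (1 - self.beta1) * grads[k]
            self.v[k] = self.beta2 * self.v[k] + (1 - self.beta2) * grads[k] ** 2
            m_hat = self.m[k] / (1 - self.beta1 ** self.t)
            v_hat = self.v[k] / (1 - self.beta2 ** self.t)
            params[k] -= self.step_size * m_hat / (np.sqrt(v_hat) + self.eps)
```

In a training step, `targets` would be the TD targets for a batch of transitions, and `Adam.step(net.params, net.gradients(states, actions, targets))` would move the predicted Q-values of the taken actions toward them.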

We also use an experience replay buffer (a store of previously experienced transitions) so that each experience can be reused for several updates, which makes training more efficient.
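A replay buffer of this kind can be sketched in a few lines of standard-library Python. The sketch assumes a fixed capacity with oldest-first eviction and uniform random sampling; the names `Transition` and `ReplayBuffer` and the method signatures are illustrative, not the project's own.

```python
import random
from collections import deque, namedtuple

Transition = namedtuple(
    "Transition", ["state", "action", "reward", "next_state", "terminal"]
)


class ReplayBuffer:
    """Fixed-size store of transitions with uniform random sampling."""

    def __init__(self, capacity, seed=None):
        # A deque with maxlen silently drops the oldest transition when full.
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def append(self, state, action, reward, next_state, terminal):
        self.buffer.append(Transition(state, action, reward, next_state, terminal))

    def sample(self, batch_size):
        """Return batch_size distinct transitions chosen uniformly at random."""
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```

A typical pattern is to append one transition per environment step and, once the buffer holds at least `batch_size` transitions, sample a mini-batch for each network update.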

After about 300 episodes, the module had learned to land safely on the moon.

After 1000 episodes, the reward had reached a plateau.

Visualization

Simulation

Learning Curve

albertau_rlcapstone's People

Contributors

seraphimstreets
