RL Project: Function Approximation and REINFORCE

Your Task

In this project you will implement the following RL algorithms, in this order:

  1. Sarsa with action-value function approximation
  2. Monte Carlo with action-value function approximation
  3. REINFORCE with soft-max policy
  4. REINFORCE with baseline and soft-max policy

all for the Cart-Pole task from Gymnasium (formerly OpenAI Gym).
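
As a rough orientation for the first method, the following sketch shows a single semi-gradient Sarsa step. It assumes a linear approximator q(s, a) = w[a] · s instead of the neural network you will use in the project, and all function and variable names are made up for illustration; they are not taken from the supplied files.

```python
# Minimal sketch: one semi-gradient Sarsa step with a linear q(s, a) = w[a] . s.
# The linear approximator and all names here are illustrative assumptions.

def q_value(weights, state, action):
    """Linear action-value estimate for one action."""
    return sum(w * x for w, x in zip(weights[action], state))


def sarsa_update(weights, state, action, reward, next_state, next_action,
                 terminated, alpha=0.1, gamma=1.0):
    """Apply w <- w + alpha * (target - q(s, a)) * grad q(s, a) in place."""
    target = reward
    if not terminated:
        # Bootstrap from the action actually selected in the next state.
        target += gamma * q_value(weights, next_state, next_action)
    td_error = target - q_value(weights, state, action)
    # For a linear q, the gradient with respect to w[action] is the state itself.
    weights[action] = [w + alpha * td_error * x
                       for w, x in zip(weights[action], state)]
    return td_error


# Cart-Pole has 4 observation features and 2 discrete actions.
weights = [[0.0] * 4 for _ in range(2)]
state = [0.1, 0.0, -0.05, 0.0]
next_state = [0.1, 0.2, -0.05, -0.3]
td = sarsa_update(weights, state, 1, 1.0, next_state, 0, terminated=False)
```

With a neural network, the explicit gradient line would be replaced by a loss on the same TD error and an optimizer step; only the weights of the action taken are touched in this linear version.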

Your Tools

There are many Python files, and all of them require editing; grep for BEGIN to find the places where your code is required. This is much less work than it seems at first glance; read on for details.

The *Agent.py files define two hierarchies of base classes for generic RL algorithms, and the CartPole*.py files each instantiate a specific method for this task. At a glance, these are the dependencies (by inheritance or inclusion) between the files:

DiscreteAgent
  DiscreteSarsaAgent
    CartPoleSarsa
  DiscreteMonteCarloAgent
    CartPoleMonteCarlo

ReinforceAgent
  CartPoleReinforce
  ReinforceBaselineAgent
    CartPoleReinforceBaseline
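
If it helps to picture the tree above as code, here is a bare skeleton. The class names come from the tree; treating every edge as plain inheritance and leaving the bodies empty are assumptions (the actual files may use inclusion in places), and the `lineage` helper is hypothetical.

```python
# Skeleton only. Class names are taken from the dependency tree; modelling every
# edge as inheritance and the empty bodies are assumptions for illustration.

class DiscreteAgent:
    """Generic base class of the first hierarchy."""


class DiscreteSarsaAgent(DiscreteAgent):
    """Task-independent Sarsa logic."""


class CartPoleSarsa(DiscreteSarsaAgent):
    """Cart-Pole specific instantiation."""


class DiscreteMonteCarloAgent(DiscreteAgent):
    """Task-independent Monte Carlo logic."""


class CartPoleMonteCarlo(DiscreteMonteCarloAgent):
    """Cart-Pole specific instantiation."""


class ReinforceAgent:
    """Generic base class of the second hierarchy."""


class CartPoleReinforce(ReinforceAgent):
    """Cart-Pole specific instantiation."""


class ReinforceBaselineAgent(ReinforceAgent):
    """REINFORCE extended by a learned baseline."""


class CartPoleReinforceBaseline(ReinforceBaselineAgent):
    """Cart-Pole specific instantiation."""


def lineage(cls):
    """Names from cls up to its root base class, excluding object."""
    return [c.__name__ for c in cls.__mro__ if c is not object]


print(lineage(CartPoleReinforceBaseline))
# ['CartPoleReinforceBaseline', 'ReinforceBaselineAgent', 'ReinforceAgent']
```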

In addition, see run.sh and plots.py for evaluation.

Some Hints

  • Since all q and h functions operate on the same state (=observation) space, you can probably use the same neural network architecture everywhere.

  • The trainEpisode() method of the ReinforceAgent is almost identical to that of the DiscreteMonteCarloAgent.

  • Part of the update() method of the ReinforceBaselineAgent includes an almost verbatim copy of the update() method of the ReinforceAgent.

  • Don't expect flawless learning: no agent will perform perfectly in every episode. However, each agent should achieve perfect results on multiple consecutive episodes.

  • See the lecture notes for further hints on implementation with PyTorch.
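
To make the overlap mentioned in the hints more concrete, the sketch below computes the per-step returns of one finished episode (needed by Monte Carlo as well as REINFORCE) and applies one REINFORCE step to a soft-max policy, with an optional baseline. It assumes linear action preferences h(s, a) = theta[a] · s rather than a PyTorch network, and it omits the gamma^t factor, which equals 1 in the undiscounted case. All names are illustrative and not taken from the supplied files.

```python
import math


def discounted_returns(rewards, gamma=1.0):
    """Return G_t for every step t, computed backwards over one episode."""
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


def softmax_policy(theta, state):
    """Action probabilities from linear preferences h(s, a) = theta[a] . s."""
    prefs = [sum(w * x for w, x in zip(row, state)) for row in theta]
    top = max(prefs)  # subtract the maximum for numerical stability
    exps = [math.exp(p - top) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(theta, state, action, g, baseline=0.0, alpha=0.01):
    """theta <- theta + alpha * (G_t - b) * grad log pi(a | s), in place."""
    probs = softmax_policy(theta, state)
    for b, row in enumerate(theta):
        indicator = 1.0 if b == action else 0.0
        for i, x in enumerate(state):
            # For linear preferences: d log pi(a|s) / d theta[b][i]
            # = (1[a == b] - pi(b|s)) * s[i].
            row[i] += alpha * (g - baseline) * (indicator - probs[b]) * x


theta = [[0.0] * 4 for _ in range(2)]  # 2 actions, 4 observation features
returns = discounted_returns([1.0, 1.0, 1.0], gamma=0.5)
reinforce_step(theta, [1.0, 0.0, 0.0, 0.0], action=0, g=2.0, alpha=0.1)
```

Passing a state-value estimate as `baseline` turns the same step into the policy part of REINFORCE with baseline; the baseline itself would be trained by a separate update that is not shown here.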

What and How to Hand In

Upload this entire directory as an archive file to OLAT. In addition to its original content (including your edits), it should contain the plots generated by plots.py (8 in total), covering at least one run of each method. Write any observations or comments into submission.md.

Do not include any bulky log or .npy files.

An Alternative Task

The above task explores discrete actions on episodic tasks. In class we also discussed (or will discuss) continuous action spaces and non-episodic tasks (i.e., tasks that may continue forever unless the agent fails, in which case a new episode starts). The classic control tasks of Gymnasium include tasks with continuous action spaces, but these are not suitable for our basic methods, for reasons (to be) discussed in class. Of these, the Pendulum task can easily be turned into a non-episodic task, but even then it is not useful for our purposes.

Instead of doing Your Task specified above, you may choose to do the following:

  • Adapt or create a task (as a class derived from gymnasium.Env) that either involves continuous actions, is non-episodic, or both, and that is solvable by the methods we discussed in class.

  • Implement one such method, building as much as possible on the supplied Python files, and demonstrate that it works, analogously to the above instructions. In particular, implement one of the following methods:

    • Differential Semi-Gradient Sarsa or Continuing Actor-Critic for a continuing task with discrete actions
    • REINFORCE or Episodic Actor-Critic with a policy parametrized as a Normal distribution for an episodic task with continuous actions
    • Continuing Actor-Critic for a non-episodic task with continuous actions
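
For the continuous-action options, a policy parametrized as a Normal distribution needs two things: a way to sample an action and the gradient of the log-probability with respect to the distribution parameters. The sketch below shows both for a scalar action, assuming the policy outputs a mean `mu` and a log standard deviation `log_sigma` (a common choice that keeps sigma positive, but an assumption here). The function names are hypothetical; in PyTorch the gradients would come from autograd instead.

```python
import math
import random


def gaussian_log_prob(action, mu, log_sigma):
    """log N(action | mu, sigma^2) with sigma = exp(log_sigma)."""
    sigma = math.exp(log_sigma)
    return (-0.5 * ((action - mu) / sigma) ** 2
            - log_sigma - 0.5 * math.log(2.0 * math.pi))


def gaussian_log_prob_grads(action, mu, log_sigma):
    """Gradients of the log-probability with respect to (mu, log_sigma)."""
    sigma = math.exp(log_sigma)
    z = (action - mu) / sigma
    # d/d mu = (a - mu) / sigma^2,  d/d log_sigma = z^2 - 1
    return z / sigma, z * z - 1.0


def sample_action(mu, log_sigma, rng=random):
    """Draw one action from the Normal policy."""
    return rng.gauss(mu, math.exp(log_sigma))


rng = random.Random(0)  # seeded generator for reproducibility
action = sample_action(mu=0.2, log_sigma=-0.3, rng=rng)
d_mu, d_log_sigma = gaussian_log_prob_grads(action, 0.2, -0.3)
```

In a REINFORCE or actor-critic update, these gradients would be scaled by the return (or the TD error) and chained through whatever function produces `mu` and `log_sigma` from the state.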

Contributors

falk358
