In this project you will implement the following RL algorithms, in this order, all for the Cart-Pole task from Gymnasium (formerly OpenAI Gym):
- Sarsa with action-value function approximation
- Monte Carlo with action-value function approximation
- REINFORCE with soft-max policy
- REINFORCE with baseline and soft-max policy
There are a lot of Python files. All require editing; grep for `BEGIN` to see where your code is required. It is much less work than it seems at first glance; read on for details.
The `*Agent.py` files define two hierarchies of base classes for generic RL algorithms, and the `CartPole*.py` files each instantiate a specific method for this task. At a glance, these are the dependencies (by inheritance or inclusion) between the files:
```
DiscreteAgent
    DiscreteSarsaAgent
        CartPoleSarsa
    DiscreteMonteCarloAgent
        CartPoleMonteCarlo
ReinforceAgent
    CartPoleReinforce
    ReinforceBaselineAgent
        CartPoleReinforceBaseline
```
In addition, see `run.sh` and `plots.py` for evaluation.
- Since all q and h functions operate on the same state (= observation) space, you can probably use the same neural network architecture everywhere (a sketch follows this list).
- The `trainEpisode()` method of the `ReinforceAgent` is almost identical to that of the `DiscreteMonteCarloAgent` (see the second sketch below).
- The `update()` method of the `ReinforceBaselineAgent` includes an almost verbatim copy of the `update()` method of the `ReinforceAgent` (see the third sketch below).
- Don't expect perfect results; no agent will learn perfectly. However, each agent should achieve perfect results on multiple consecutive episodes.
- See the lecture notes for further hints on implementation with PyTorch.
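
Regarding the shared architecture: a small fully connected network is usually sufficient for CartPole. The following is only a minimal sketch under assumed sizes and names; it is not part of the supplied files:

```python
import torch.nn as nn

class MLP(nn.Module):
    """Small fully connected network for CartPole observations (4 floats).
    All sizes here are assumptions; tune them as needed."""

    def __init__(self, n_in=4, n_out=2, n_hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_in, n_hidden), nn.ReLU(),
            nn.Linear(n_hidden, n_hidden), nn.ReLU(),
            nn.Linear(n_hidden, n_out),
        )

    def forward(self, x):
        return self.net(x)
```

The same class could then serve as a q network or soft-max policy head with `n_out=2` (one output per action) and as a baseline/value network with `n_out=1`.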
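The similarity between the two `trainEpisode()` methods presumably reflects that both are Monte Carlo methods: they roll out one episode and then compute the discounted returns backwards. A hedged sketch of that shared computation (the function name and signature are assumptions, not the supplied interface):

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = R_{t+1} + gamma * G_{t+1} backwards over one episode.
    gamma = 0.99 is an assumed discount factor; use the agent's own setting."""
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns
```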
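As for the shared part of `update()`: the policy-gradient step is the same in plain REINFORCE and in REINFORCE with baseline, except that the return G_t is replaced by the advantage G_t - v(s_t). A minimal sketch of that common step (names and tensor conventions are assumptions):

```python
import torch

def policy_gradient_loss(log_probs, returns, baselines=None):
    """REINFORCE loss: -sum_t log pi(a_t|s_t) * G_t.
    With a baseline, G_t is replaced by the advantage G_t - v(s_t);
    detaching keeps baseline gradients out of the policy update."""
    returns = torch.as_tensor(returns)
    if baselines is not None:
        returns = returns - baselines.detach()
    return -(torch.stack(log_probs) * returns).sum()
```

The `ReinforceBaselineAgent` would additionally fit the baseline itself, e.g. by minimizing a mean squared error between v(s_t) and G_t.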
Upload this entire directory as an archive file to OLAT. Besides its original content (including your edits) it should contain plots generated by `plots.py`, 8 in total, for at least one run of each method. Write any observations or comments into `submission.md`. Do not include any bulky log or `.npy` files.
The above task explores discrete actions on episodic tasks. In class we also discussed (or will discuss) continuous action spaces and non-episodic tasks, i.e., tasks that may potentially continue forever and only start a new episode when they fail. The classic control tasks of Gymnasium include tasks with continuous action spaces, but these are not suitable for our basic methods, for reasons (to be) discussed in class. Of these, the Pendulum task can easily be turned into a non-episodic task, but again, it is not useful for our purposes.
Instead of doing Your Task as specified above, you may choose to do the following:
- Adapt or create a task (as a class derived from `gymnasium.Env`) that either involves continuous actions, is non-episodic, or both, and that is solvable by the methods we discussed in class.
- Implement one such method, building as much as possible on the supplied Python files, and demonstrate that it works, analogously to the above instructions. In particular, implement one of the following methods:
  - Differential Semi-Gradient Sarsa or Continuing Actor-Critic for a continuing task with discrete actions
  - REINFORCE or Episodic Actor-Critic with a policy parametrized as a Normal distribution for an episodic task with continuous actions (a sketch of such a policy follows this list)
  - Continuing Actor-Critic for a non-episodic task with continuous actions
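
For the continuous-action options, the policy is typically parametrized as a Normal distribution whose mean (and possibly standard deviation) come from a network. A minimal sketch of such a policy head in PyTorch; all names and sizes are illustrative assumptions:

```python
import torch
import torch.nn as nn

class NormalPolicy(nn.Module):
    """Policy parametrizing a Normal distribution over a 1-D action."""

    def __init__(self, n_in, n_hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(n_in, n_hidden), nn.ReLU())
        self.mean = nn.Linear(n_hidden, 1)
        # State-independent log standard deviation, learned directly.
        self.log_std = nn.Parameter(torch.zeros(1))

    def forward(self, obs):
        h = self.body(obs)
        return torch.distributions.Normal(self.mean(h), self.log_std.exp())
```

During an episode, `dist = policy(obs)`, `action = dist.sample()`, and `dist.log_prob(action)` provide everything the REINFORCE or actor-critic update needs.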