
rl_algorithms



Welcome!

This repository contains Reinforcement Learning algorithms used for research at Medipixel. The source code will be updated frequently. We warmly welcome external contributors! :)

Demo clips: BC agent on LunarLanderContinuous-v2, RainbowIQN agent on PongNoFrameskip-v4, SAC agent on Reacher-v2.

Contributors

Thanks goes to these wonderful people (emoji key):

Jinwoo Park (Curt) 💻
Kyunghwan Kim 💻
darthegg 💻

This project follows the all-contributors specification.

Algorithms

  1. Advantage Actor-Critic (A2C)
  2. Deep Deterministic Policy Gradient (DDPG)
  3. Proximal Policy Optimization Algorithms (PPO)
  4. Twin Delayed Deep Deterministic Policy Gradient Algorithm (TD3)
  5. Soft Actor Critic Algorithm (SAC)
  6. Behaviour Cloning (BC with DDPG, SAC)
  7. Prioritized Experience Replay (PER with DDPG) - see the sampling sketch after this list
  8. From Demonstrations (DDPGfD, SACfD, DQfD)
  9. Rainbow DQN
  10. Rainbow IQN (without DuelingNet) - DuelingNet degrades performance
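As a quick illustration of one of these building blocks, below is a minimal sketch of proportional prioritized sampling as described by Schaul et al. [6]. It is a generic toy implementation for explanation only, not this repository's replay buffer.

import numpy as np

class ToyPrioritizedBuffer:
    """Toy proportional PER: P(i) ~ p_i^alpha, with importance-sampling weights."""

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha
        self.data = []
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0

    def add(self, transition):
        # New transitions get the current maximum priority so they are replayed at least once.
        max_prio = self.priorities.max() if self.data else 1.0
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            self.data[self.pos] = transition
        self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        prios = self.priorities[: len(self.data)]
        probs = prios ** self.alpha
        probs /= probs.sum()
        indices = np.random.choice(len(self.data), batch_size, p=probs)
        # Importance-sampling weights correct the bias introduced by non-uniform sampling.
        weights = (len(self.data) * probs[indices]) ** (-beta)
        weights /= weights.max()
        return [self.data[i] for i in indices], indices, weights

    def update_priorities(self, indices, td_errors, eps=1e-6):
        # Priorities are the absolute TD errors plus a small epsilon to keep them non-zero.
        self.priorities[indices] = np.abs(td_errors) + eps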

Performance

We have tested each algorithm on some of the environments listed below.

Performance was measured at commit 4248057. Please note that these results will not be updated frequently.

Reacher-v2

We reproduced the performance of DDPG, TD3, and SAC on Reacher-v2 (Mujoco). They reach scores of roughly -3.5 to -4.5. See the W&B log for more details.

[Figure: reacher-v2_baselines]

PongNoFrameskip-v4

RainbowIQN learns the game remarkably fast: it reaches the perfect score (21) within 100 episodes. The idea behind RainbowIQN roughly follows W. Dabney et al. See the W&B log for more details.

[Figure: pong_dqn]
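For reference, the core of IQN (Dabney et al. [17]) is quantile regression with a Huber loss. Below is a minimal, self-contained PyTorch sketch of that loss following the paper's conventions; it is illustrative and not taken from this repository.

import torch

def quantile_huber_loss(td_errors, taus, kappa=1.0):
    """Quantile-regression Huber loss from the IQN paper.

    td_errors: tensor of shape [batch, n_taus, n_target_taus], pairwise TD errors.
    taus:      tensor of shape [batch, n_taus, 1], quantile fractions in (0, 1).
    """
    abs_errors = td_errors.abs()
    # Huber loss L_kappa(delta): quadratic near zero, linear in the tails.
    huber = torch.where(
        abs_errors <= kappa,
        0.5 * td_errors.pow(2),
        kappa * (abs_errors - 0.5 * kappa),
    )
    # Asymmetric quantile weighting |tau - 1{delta < 0}|.
    weight = torch.abs(taus - (td_errors.detach() < 0).float())
    # Sum over the quantile dimension, average over target samples and batch.
    return (weight * huber / kappa).sum(dim=1).mean()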

LunarLander-v2 / LunarLanderContinuous-v2

We used these environments only for quick verification of each algorithm, so some of the experiments may not show the best possible performance. Click the following lines to see the figures.

LunarLander-v2: RainbowDQN, RainbowDQfD


See W&B log for more details.

[Figure: lunarlander-v2_dqn]

LunarLanderContinuous-v2: A2C, PPO, DDPG, TD3, SAC


See W&B log for more details.

[Figure: lunarlandercontinuous-v2_baselines]

LunarLanderContinuous-v2: DDPG, PER-DDPG, DDPGfD, BC-DDPG


See W&B log for more details.

[Figure: lunarlandercontinuous-v2_ddpg]

LunarLanderContinuous-v2: SAC, SACfD, BC-SAC


See W&B log for more details.

[Figure: lunarlandercontinuous-v2_sac]

Getting started

Prerequisites

In order to run Mujoco environments (e.g. Reacher-v2), you need to acquire a Mujoco license.

Installation

First, clone the repository.

git clone https://github.com/medipixel/rl_algorithms.git
cd rl_algorithms

Second, install the packages required to run the code. Just type:

make dep

For developers

Run the additional command below to configure formatting and linting. Formatting and linting will then run automatically whenever you commit code.

make dev

After running make dev, you can validate the code with the following commands.

make format  # for formatting
make test  # for linting

Usages

You can train or test an algorithm on env_name if examples/env_name/algorithm.py exists. (examples/env_name/algorithm.py contains the hyper-parameters and network details.)

python run_env_name.py --algo algorithm

E.g., running Soft Actor-Critic on LunarLanderContinuous-v2:

python run_lunarlander_continuous_v2.py --algo sac <other-options>

E.g., running a custom agent, if you have written your own example at examples/env_name/ddpg-custom.py:

python run_env_name.py --algo ddpg-custom

The agent will run with the hyper-parameter and model settings you configured.
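Purely as an illustration of the idea (the actual layout of the example files in this repository may differ), a hypothetical examples/env_name/ddpg-custom.py might collect its hyper-parameters in a plain dict that the run script consumes:

# Hypothetical hyper-parameter block for a custom DDPG example.
# All names and values below are illustrative assumptions, not the repository's schema.
hyper_params = {
    "gamma": 0.99,                        # discount factor
    "tau": 1e-3,                          # soft target-update coefficient
    "buffer_size": int(1e6),              # replay buffer capacity
    "batch_size": 128,
    "lr_actor": 1e-4,
    "lr_critic": 1e-3,
    "initial_random_actions": int(1e4),   # warm-up steps with random actions
}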

Arguments for run-files

In addition, the run files accept various arguments. To see all available options, run the command below; a rough sketch of how such a parser might be defined follows the argument list.

python <run-file> -h
  • --test
    • Start test mode (no training).
  • --off-render
    • Turn off rendering.
  • --log
    • Turn on logging using W&B.
  • --seed <int>
    • Set random seed.
  • --save-period <int>
    • Set saving period of model and optimizer parameters.
  • --max-episode-steps <int>
    • Set maximum episode step number of the environment. If the number is less than or equal to 0, it uses the default maximum step number of the environment.
  • --episode-num <int>
    • Set the number of episodes for training.
  • --render-after <int>
    • Start rendering after the number of episodes.
  • --load-from <save-file-path>
    • Load the saved models and optimizers at the beginning.
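As noted above, here is a minimal Python argparse sketch of how flags like these could be declared. It is illustrative only; the defaults shown are assumptions, not the repository's actual parser.

import argparse

def build_parser():
    parser = argparse.ArgumentParser(description="Run an RL algorithm on an environment.")
    parser.add_argument("--algo", type=str, default="sac", help="algorithm name")
    parser.add_argument("--test", action="store_true", help="test mode (no training)")
    parser.add_argument("--off-render", action="store_true", help="turn off rendering")
    parser.add_argument("--log", action="store_true", help="turn on W&B logging")
    parser.add_argument("--seed", type=int, default=777, help="random seed")
    parser.add_argument("--save-period", type=int, default=100, help="save period for model/optimizer parameters")
    parser.add_argument("--max-episode-steps", type=int, default=0,
                        help="maximum steps per episode; <= 0 keeps the environment's default limit")
    parser.add_argument("--episode-num", type=int, default=1500, help="number of training episodes")
    parser.add_argument("--render-after", type=int, default=0, help="start rendering after this many episodes")
    parser.add_argument("--load-from", type=str, default=None, help="path to saved models and optimizers")
    return parser

args = build_parser().parse_args()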

W&B for logging

We use W&B to log network parameters and other metrics. For more details, read the W&B tutorial.
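A minimal sketch of how W&B logging typically looks in a training loop, using the public wandb API (this is not the repository's exact logging code, and the project name is an assumption):

import wandb

# Assumed project name; pass your own entity/project as needed.
wandb.init(project="rl_algorithms", config={"algo": "sac", "seed": 777})

for episode in range(10):
    score = float(episode)  # stand-in for the real episode return
    wandb.log({"episode": episode, "score": score})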

Class Diagram

The class diagram is available at issue #135. It will not be updated frequently. [Figure: RL_Algorithms_ClassDiagram]

References

  1. T. P. Lillicrap et al., "Continuous control with deep reinforcement learning." arXiv preprint arXiv:1509.02971, 2015.
  2. J. Schulman et al., "Proximal Policy Optimization Algorithms." arXiv preprint arXiv:1707.06347, 2017.
  3. S. Fujimoto et al., "Addressing function approximation error in actor-critic methods." arXiv preprint arXiv:1802.09477, 2018.
  4. T. Haarnoja et al., "Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor." arXiv preprint arXiv:1801.01290, 2018.
  5. T. Haarnoja et al., "Soft Actor-Critic Algorithms and Applications." arXiv preprint arXiv:1812.05905, 2018.
  6. T. Schaul et al., "Prioritized Experience Replay." arXiv preprint arXiv:1511.05952, 2015.
  7. M. Andrychowicz et al., "Hindsight Experience Replay." arXiv preprint arXiv:1707.01495, 2017.
  8. A. Nair et al., "Overcoming Exploration in Reinforcement Learning with Demonstrations." arXiv preprint arXiv:1709.10089, 2017.
  9. M. Vecerik et al., "Leveraging Demonstrations for Deep Reinforcement Learning on Robotics Problems with Sparse Rewards." arXiv preprint arXiv:1707.08817, 2017.
  10. V. Mnih et al., "Human-level control through deep reinforcement learning." Nature, 518 (7540):529–533, 2015.
  11. van Hasselt et al., "Deep Reinforcement Learning with Double Q-learning." arXiv preprint arXiv:1509.06461, 2015.
  12. Z. Wang et al., "Dueling Network Architectures for Deep Reinforcement Learning." arXiv preprint arXiv:1511.06581, 2015.
  13. T. Hester et al., "Deep Q-learning from Demonstrations." arXiv preprint arXiv:1704.03732, 2017.
  14. M. G. Bellemare et al., "A Distributional Perspective on Reinforcement Learning." arXiv preprint arXiv:1707.06887, 2017.
  15. M. Fortunato et al., "Noisy Networks for Exploration." arXiv preprint arXiv:1706.10295, 2017.
  16. M. Hessel et al., "Rainbow: Combining Improvements in Deep Reinforcement Learning." arXiv preprint arXiv:1710.02298, 2017.
  17. W. Dabney et al., "Implicit Quantile Networks for Distributional Reinforcement Learning." arXiv preprint arXiv:1806.06923, 2018.

