relax's Introduction

ReLAx

ReLAx - Reinforcement Learning Applications

ReLAx is an object-oriented library for deep reinforcement learning built on top of PyTorch.

Implemented Algorithms

The ReLAx library contains implementations of the following algorithms:

Special Features

ReLAx offers a set of special features:

And other options for building non-standard RL architectures:

Usage With Custom Environments

Custom, user-defined environments can also be written and used with ReLAx; a minimal sketch of one is shown below.
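The sketch assumes the classic gym.Env interface (reset/step plus action and observation spaces), which is what the Sampler wrapper consumes in the examples in this README. The environment itself (BitFlipEnv, its reward and dynamics) is purely illustrative and not part of ReLAx:

import numpy as np
import gym
from gym import spaces

class BitFlipEnv(gym.Env):
    """Toy environment: flip bits one at a time until all of them are ones."""

    def __init__(self, n_bits=4):
        super().__init__()
        self.n_bits = n_bits
        self.action_space = spaces.Discrete(n_bits)
        self.observation_space = spaces.Box(low=0.0, high=1.0,
                                            shape=(n_bits,), dtype=np.float32)
        self.state = np.zeros(n_bits, dtype=np.float32)

    def reset(self):
        # Start from a random bit configuration
        self.state = np.random.randint(0, 2, size=self.n_bits).astype(np.float32)
        return self.state.copy()

    def step(self, action):
        # Flip the chosen bit
        self.state[action] = 1.0 - self.state[action]
        done = bool(self.state.all())
        reward = 1.0 if done else -0.1
        return self.state.copy(), reward, done, {}

An instance of such an environment can then be wrapped the same way as the built-in tasks, e.g. sampler = Sampler(BitFlipEnv()).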

Minimal Examples

On Policy

import torch
import gym

from relax.rl.actors import VPG
from relax.zoo.policies import CategoricalMLP
from relax.data.sampling import Sampler

# Create training and eval envs
env = gym.make("CartPole-v1")
eval_env = gym.make("CartPole-v1")

# Wrap them into Sampler
sampler = Sampler(env)
eval_sampler = Sampler(eval_env)

# Define Vanilla Policy Gradient actor
actor = VPG(
    device=torch.device('cuda'), # torch.device('cpu') if no gpu available
    policy_net=CategoricalMLP(acs_dim=2, obs_dim=4,
                              nlayers=2, nunits=64),
    learning_rate=0.01
)

# Run training loop:
for i in range(100):
    
    # Sample training data
    train_batch = sampler.sample(n_transitions=1000,
                                 actor=actor,
                                 train_sampling=True)
    
    # Update VPG actor
    actor.update(train_batch)
    
    # Collect evaluation episodes
    eval_batch = eval_sampler.sample_n_episodes(n_episodes=5,
                                                actor=actor,
                                                train_sampling=False)
    
    # Print average return per iteration
    print(f"Iter: {i}, eval score: {eval_batch.create_logs()['avg_return']}")
    

Off Policy

import torch
import gym

from relax.rl.actors import ArgmaxQValue
from relax.rl.critics import DQN

from relax.exploration import EpsilonGreedy
from relax.schedules import PiecewiseSchedule
from relax.zoo.critics import DiscQMLP

from relax.data.sampling import Sampler
from relax.data.replay_buffer import ReplayBuffer

# Create training and eval envs
env = gym.make("CartPole-v1")
eval_env = gym.make("CartPole-v1")

# Wrap them into Sampler
sampler = Sampler(env)
eval_sampler = Sampler(eval_env)

# Define schedules
# For the first 5k steps lr is 0 and eps is 1:
# no learning, only random sampling; afterwards lr=5e-5 and eps=1e-3
lr_schedule = PiecewiseSchedule({0: 5000}, 5e-5)
eps_schedule = PiecewiseSchedule({1: 5000}, 1e-3)

# Define actor
actor = ArgmaxQValue(
    exploration=EpsilonGreedy(eps=eps_schedule)
)

# Define critic
critic = DQN(
    device=torch.device('cuda'), # torch.device('cpu') if no gpu available
    critic_net=DiscQMLP(obs_dim=4, acs_dim=2, 
                        nlayers=2, nunits=64),
    learning_rate=lr_schedule,
    batch_size=100,
    target_updates_freq=3000
)

# Provide actor with critic
actor.set_critic(critic)

# Run q-iteration training loop:
print_every = 1000
replay_buffer = ReplayBuffer(100000)

for i in range(100000):
    
    # Sample training data (one transition)
    train_batch = sampler.sample(n_transitions=1,
                                 actor=actor,
                                 train_sampling=True)
                                 
    # Add it to buffer                             
    replay_buffer.add_paths(train_batch)
    
    # Update DQN critic
    critic.update(replay_buffer)
    
    # Update ArgmaxQValue actor (only to step schedules)
    actor.update()
    
    if i > 0 and i % print_every == 0:
        # Collect evaluation episodes
        eval_batch = eval_sampler.sample_n_episodes(n_episodes=5,
                                                    actor=actor,
                                                    train_sampling=False)

        # Print average return per iteration
        print(f"Iter: {i}, "
              f"eval score: {eval_batch.create_logs()['avg_return']}, "
              f"buffer score: {replay_buffer.create_logs()['avg_return']}")

Installation

Building from GitHub Source

Installing into a separate virtual environment:

git clone https://github.com/nslyubaykin/relax
cd relax
conda create -n relax python=3.6
conda activate relax
pip install -r requirements.txt
pip install -e .
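
To confirm the editable install resolved correctly, a quick smoke test is to import one of the classes used in the minimal examples above (an informal check, not an official one; it only verifies that the package and its dependencies import):

python -c "from relax.rl.actors import VPG; print('ReLAx imported OK')"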

MuJoCo

To install MuJoCo, do the following steps:

mkdir ~/.mujoco
cd ~/.mujoco
wget http://www.roboti.us/download/mujoco200_linux.zip
unzip mujoco200_linux.zip
mv mujoco200_linux mujoco200
rm mujoco200_linux.zip
wget http://www.roboti.us/file/mjkey.txt

Then, add the following line to the bottom of your .bashrc:

export LD_LIBRARY_PATH=~/.mujoco/mujoco200/bin/

Finally, install mujoco_py itself:

pip install mujoco-py==2.0.2.2

Note: the installation often crashes with the error "command 'gcc' failed with exit status 1". To fix this, run:

sudo apt-get install gcc
sudo apt-get install build-essential

Then try installing mujoco-py==2.0.2.2 again.
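
Once mujoco-py builds, a reasonable smoke test is the one-liner below (an assumption, not an official check; it relies on the standard Gym MuJoCo task HalfCheetah-v2). The first import of mujoco_py triggers its compilation, so this is also where build problems surface:

python -c "import mujoco_py, gym; gym.make('HalfCheetah-v2'); print('MuJoCo OK')"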

Atari Environments

The ReLAx package was developed and tested with gym[atari]==0.17.2. Newer versions should also work; however, their compatibility with the provided Atari wrappers is uncertain.

To install Gym Atari:

pip install gym[atari]==0.17.2

In case of a "ROMs not found" error, do the following steps (a quick check that the ROMs were registered is shown after them):

  1. Download the ROMs archive
wget http://www.atarimania.com/roms/Roms.rar
  2. Unpack it
unrar x Roms.rar
  3. Install atari_py
pip install atari_py
  4. Provide atari_py with the ROMs
python -m atari_py.import_roms ROMS
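
To verify that the ROMs were registered, a simple informal check is the one-liner below; it assumes the stock PongNoFrameskip-v4 environment id shipped with gym[atari]==0.17.2:

python -c "import gym; gym.make('PongNoFrameskip-v4'); print('Atari OK')"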

Further Developments

The following functionality is planned to be added in the future:

  • Curiosity (RND)
  • Offline RL (CQL, BEAR, BCQ, SAC-N, EDAC)
  • Decision Transformers
  • PPG
  • QR-DQN
  • IQN
  • FQF
  • Discrete SAC
  • NAF
  • Stochastic environment models
  • Improving documentation

Known Issues

  • Lack of documentation (currently compensated for with usage examples)
  • On some systems relax.zoo.layers.NoisyLinear seems to leak memory. The issue is very unpredictable and not yet fully understood. Sometimes installing different versions of PyTorch and CUDA fixes it. If the problem persists, consider not using noisy linear layers as a workaround.
  • Filtering & Reward Weighted Refinement does not yet reach the performance declared in its paper
  • DYNA-Q is not compatible with PER, as it is unclear which priority to assign to synthetic branched transitions (a possible option: the same priority as the parent transition)


relax's Issues

Solved

Hello Nikita,

first of all, I would like to acknowledge that this is probably the cleanest ML code I have seriously ever seen. Thank you for that!

So I am using FRWR and Random Shooting, and I have a question about the horizon.
I am running on limited resources in real time (GTX 1060 + Windows + CUDA; overall I would say RS is quite fast, CEM slow, FRWR a bit slow), and I see that the horizon length scales poorly with respect to overall performance. So I experimented with lowering the iterations per second and reduced, for instance, the ensemble size to 3, candidate_sequences = 300, horizon = 3-5. I thought it would be nicer to spread the sampling into the future by 2^x (so horizon = [1, 2, 4, 8, 16]), maybe also weighting the later steps even higher, and discarding the intermediate steps when evaluating candidates. That way I could span a bigger time horizon with the same number of steps. (Maybe this approach is already used in the MPC community, but I am fairly new to it, so I don't know.)
I see you have a very interesting scheduler system, but I am confused about how to initialize an exponential schedule that lags the horizon in its tensor dimensions. I made a drawing for this.

[Image: mpc horizon mod]

Update: in the data utils' get_next_lag_obs, I tried:

    lag_obs_split = np.split(lag_obs, 
                             indices_or_sections=1+nlags**2,   # pow  or   =1+int(nlags**1.25)
                             axis=concat_axis)

I am also thinking of converting all lists to np.arrays and JIT-ing the buffers with JAX (and trying Hugging Face Accelerate), but I am not sure.

Also, a short question: would it be possible to replace RS/CEM with (NN-approximated) MPPI, as they suggest (a combination of MPPI/FRWR)?

Anyway, thank you for your beautiful and very instructive codebase! (I am wondering what, or which place, taught you this way of ('purely pythonic'?) coding.)

Best

Lee
Living Computation Foundation Member
