
rlberry's Introduction

A Reinforcement Learning Library for Research and Education



What is rlberry?

Writing reinforcement learning algorithms is fun! But after the fun, we have lots of boring things to implement: run our agents in parallel, average and plot results, optimize hyperparameters, compare to baselines, create tricky environments, and so on!

rlberry is a Python library that makes your life easier by doing all these things with a few lines of code, so that you can spend most of your time developing agents. rlberry also provides implementations of several RL agents, benchmark environments and many other useful tools.

We provide a number of tools to help you achieve reproducibility, statistical comparisons of RL agents, and nice visualizations.
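
For a taste of the workflow, here is a minimal sketch (based on the AgentManager API used in the issues further down this page; the agent, environment and hyperparameters are illustrative) that trains a few instances of an agent in parallel, plots learning curves, and compares the final policies:

import numpy as np
from rlberry.envs.benchmarks.ball_exploration import PBall2D
from rlberry.agents.torch.a2c import A2CAgent
from rlberry.manager import AgentManager, plot_writer_data, evaluate_agents

# Train 4 instances of A2C on the same environment, in parallel.
manager = AgentManager(
    A2CAgent,
    (PBall2D, dict()),                      # environment constructor and kwargs
    fit_budget=250,                         # number of training episodes
    init_kwargs=dict(gamma=0.99, horizon=50, learning_rate=3e-4),
    eval_kwargs=dict(eval_horizon=50, n_simulations=20),
    n_fit=4,
    seed=123,
)
manager.fit()

# Learning curves (cumulative rewards) and evaluation of the final policies.
plot_writer_data([manager], tag="episode_rewards", preprocess_func=np.cumsum,
                 title="cumulative rewards")
print(evaluate_agents([manager]))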

Installation

For a stable release, install the latest (minimal) version:

pip install rlberry

The documentation includes more installation instructions.

Getting started

In our dev documentation, you will find a quick start for the library, a user guide with a few tutorials on using rlberry, and some examples. See also the stable documentation for the version corresponding to the latest release.

Changelog

See the changelog for a history of the changes made to rlberry.

Other rlberry projects

rlberry-scool: the repository used for teaching purposes. It mainly contains basic agents and environments, in a version that makes it easier for students to learn.

rlberry-research: the repository where our research team keeps agents, environments, and tools compatible with rlberry. It is a permanent "work in progress" repository, and some of the code may no longer be maintained.

Citing rlberry

If you use rlberry in scientific publications, we would appreciate citations using the following BibTeX entry:

@misc{rlberry,
    author = {Domingues, Omar Darwiche and Flet-Berliac, Yannis and Leurent, Edouard and M{\'e}nard, Pierre and Shang, Xuedong and Valko, Michal},
    doi = {10.5281/zenodo.5544540},
    month = {10},
    title = {{rlberry - A Reinforcement Learning Library for Research and Education}},
    url = {https://github.com/rlberry-py/rlberry},
    year = {2021}
}

About us

This project was initiated and is actively maintained by the Inria SCOOL team. More information here.

Contributing

Want to contribute to rlberry? Please check our contribution guidelines. If you want to add any new agents or environments, do not hesitate to open an issue!

rlberry's People

Contributors

adriennetuynman, aleshi94, borishamadej, brahimdriss, codacy-badger, dependabot[bot], eleurent, julient01, kohlerhector, menardprr, mmcenta, omardrwch, pre-commit-ci[bot], remydegenne, riccardodv, riiswa, sauxpa, timotheemathieu, xuedong, yannberthelot, yfletberliac


rlberry's Issues

setup.py does not install some rlberry packages

In setup.py, we have:

from setuptools import setup, find_packages
packages = find_packages(exclude=['docs', 'notebooks', 'assets'])

and some packages are missing from the list.

For instance, when installing from pip (pip install 'rlberry[full]==0.1') and running

from rlberry.agents.torch.reinforce import REINFORCEAgent

we receive the error ModuleNotFoundError: No module named 'rlberry.agents.torch.utils'
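
One possible direction for a fix (a sketch, not the actual patch) is to ask find_packages to include every rlberry subpackage explicitly, so that subpackages such as rlberry.agents.torch.utils are always shipped:

from setuptools import setup, find_packages

# Include rlberry and all of its subpackages (e.g. rlberry.agents.torch.utils),
# rather than excluding directories by name.
packages = find_packages(include=["rlberry", "rlberry.*"])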

Add algorithms with regret guarantees

Is your feature request related to a problem? Please describe.
Feature request: Is there a plan for adding more algorithms with theoretical guarantees in the tabular setting or with linear function approximation with known features?

Describe the solution you'd like
Tabular algorithm with regret bound
E.g.

Exploration in Linear function approximation with known features
e.g.

  • LSVI-UCB
  • UCRL-VTR
  • RLSVI
  • LSVI-PHI
  • Tabular implementation of the above feature-based algorithms. (Use one-hot feature and then reformulate the algorithm in the tabular representation.)

Describe alternatives you've considered
Is rlberry framework suitable to implement these algorithms?
If so, I am willing to contribute to some implementations of the above algorithms in this framework after getting familiar with rlberry.

NameError: name 'torch' is not defined

Describe the bug
Installing rlberry with pip install -e .[test] does not install PyTorch.
Then, running python examples/demo_ucbvi_and_opqtl.py gives the following error:

  File "/xxx/rlberry/rlberry/exploration_tools/typing.py", line 48, in process_type
    elif isinstance(arg, torch.Tensor):
NameError: name 'torch' is not defined

To Reproduce

pip install -e .[test]
python examples/demo_ucbvi_and_opqtl.py

Expected behavior
The script should run successfully even without PyTorch, because this is a tabular algorithm for a tabular environment.

Desktop (please complete the following information):
- OS: MacOS
- Version 11.5.2
- Python version 3.7.11
- PyTorch version N/A
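
A possible workaround (a sketch of the idea, using a simplified, hypothetical version of process_type; the real function has a different signature) is to make the torch import optional and only test against torch.Tensor when PyTorch is actually installed:

import numpy as np

# Optional dependency: only use torch if it is installed.
try:
    import torch
    _TORCH_INSTALLED = True
except ImportError:
    _TORCH_INSTALLED = False


def process_type(arg):
    """Classify the input type without requiring PyTorch (simplified sketch)."""
    if isinstance(arg, np.ndarray):
        return "numpy"
    elif _TORCH_INSTALLED and isinstance(arg, torch.Tensor):
        return "torch"
    return "other"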

Access issue

I think I have an access issue when I try to push:
ERROR: Repository not found.
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

Can you check it out @omardrwch ?

Correct docstring

I included an API section in the doc that presents the main classes of rlberry. However, this section is generated from the docstrings of the classes, and some of those docstrings have problems. We should rewrite them.

Ideally, each docstring should follow the NumPy docstring format. It must at least contain a brief description, the parameters, and the return value, but it is also nice to have references, descriptions of attributes, etc.
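
For reference, here is a minimal example of the target format (illustrative only, not taken from the codebase):

def evaluate(agent, horizon, n_simulations=10):
    """
    Evaluate an agent by running Monte-Carlo simulations.

    Parameters
    ----------
    agent : Agent
        Agent whose policy is evaluated.
    horizon : int
        Maximum number of steps per simulation.
    n_simulations : int, default=10
        Number of independent simulations to average over.

    Returns
    -------
    float
        Mean cumulative reward over the simulations.

    References
    ----------
    .. [1] NumPy docstring guide,
       https://numpydoc.readthedocs.io/en/latest/format.html
    """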

Support environments on rlberry?

Currently, rlberry supports some environments, but we would like to add other environments developed by the SCOOL group to the repository. This raises the question of whether we should have environments as part of rlberry or not. An alternative would be to create another repository (like an RL Baselines Zoo for rlberry) in which we would put code for the environments (and possibly other things such as default hyperparameters).

I think it would be cleaner to separate the two repositories so that we can keep rlberry concise. In that case, we would still need to:

  1. Decide what things will go into this new repository (trained agents, default hyperparameters, etc..)
  2. Give the new repository a name
  3. Create issues to remove what isn't needed here

Some minor questions about readthedocs

Hello guys, some minor questions about the doc:

1. Is the following numbering (2, 1, 2, 1) expected? [screenshot]

2. Clicking the "cite us" button shows a "not found" page. [screenshot]

3. Not very important, more of an aesthetic question: I find it a bit jarring when things are spread over two lines. [screenshot]

Bandits

Here is another brainstorming issue.

There has been a proposal to implement bandits in rlberry. The idea would be to:

  • Facilitate parallel computing, logging and plots.
  • Have some benchmarks (for bandit algorithms and for environments)
  • Have benchmark datasets/environments
  • Have environments for corrupted, differentially private, and fair bandits

Questions are:

  • Do we go ahead and create a bandit module?
  • What are the guidelines for this module (simplicity for the user, readability, ...)?
  • There are a lot of algorithms/environments. We could decide (similarly to scikit-learn) to include only algorithms that have been cited a certain number of times, and something similar for environments?

[ignore]

The samples seem to be coming from the most recently added values:

return next(self.dataset)

See the output of

import matplotlib.pyplot as plt
import numpy as np
from rlberry.agents.jax.utils.replay_buffer import ReplayBuffer

replay = ReplayBuffer(batch_size=1, chunk_size=1, max_replay_size=10)
replay.setup_entry(name='state', shape=(2,), dtype=np.float32)
replay.build()
with replay.get_writer() as writer:
    sampled_from_replay = []
    for ii in range(100):
        state = ii * np.ones((2,), dtype=np.float32)
        state[1] += 0.1
        writer.append(dict(state=state))
        writer.end_episode()
        batch = replay.sample()
        print(state, batch.data['state'][0][0])
        sampled_from_replay.append(batch.data['state'][0][0][0])
        print('--------------------')
    plt.plot(sampled_from_replay)
    plt.show()

pytest error "No module named 'pyvirtualdisplay'"

$ pytest

==================================================== test session starts ====================================================
platform darwin -- Python 3.7.9, pytest-6.1.1, py-1.9.0, pluggy-0.13.1
rootdir: /Users/yannisfletberliac/Developer/rlberry-py/rlberry
collected 162 items / 1 error / 161 selected

========================================================== ERRORS ===========================================================
___________________________ ERROR collecting rlberry/rendering/tests/test_rendering_interface.py ____________________________
ImportError while importing test module '/Users/yannisfletberliac/Developer/rlberry-py/rlberry/rlberry/rendering/tests/test_rendering_interface.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../../miniconda3/envs/rlberry/lib/python3.7/importlib/__init__.py:127: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
rlberry/rendering/tests/test_rendering_interface.py:4: in <module>
from pyvirtualdisplay import Display
E ModuleNotFoundError: No module named 'pyvirtualdisplay'
================================================== short test summary info ==================================================
ERROR rlberry/rendering/tests/test_rendering_interface.py
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
===================================================== 1 error in 1.49s ======================================================

Use logging for logging

As this project is likely to involve a substantial amount of code and debugging effort, it would be best to move from print statements to the official Python 3 logging module, which provides advanced control over the logs (levels of severity, output to file or console, enable/disable logging of specific modules, etc.)
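
A minimal sketch of what this could look like in a module (plain standard-library logging, nothing rlberry-specific):

import logging

# One logger per module; the module name shows up in the log records.
logger = logging.getLogger(__name__)


def fit(budget):
    logger.info("Starting training for %d episodes", budget)
    for episode in range(budget):
        logger.debug("episode %d done", episode)


if __name__ == "__main__":
    # Configure the root logger once, at the application entry point.
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(name)s %(levelname)s: %(message)s",
    )
    fit(3)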

Bugs with deep algos

I am opening this issue because there seems to be some instability with deep RL algorithms, and I propose we document it and try to resolve it here. Please post your examples of clean code that does not behave as it should, and we will try to solve them.

I begin with DQN:

from rlberry.envs import Chain
from rlberry.envs.benchmarks.ball_exploration import PBall2D
from rlberry.manager import AgentManager
from rlberry.agents.torch import DQNAgent
import numpy as np


env_ctor = PBall2D
env_kwargs = {}

agent = AgentManager(
    DQNAgent,
    (env_ctor, env_kwargs),
    fit_budget=3,
    n_fit=1,
    init_kwargs={"horizon": 3},
    seed=42,
)
agent.fit()
env = env_ctor(**env_kwargs)  # instantiate an environment for evaluation
state = env.reset()
agent.agent_handlers[0].policy(state)

results in

---------------------------------------------------------------------------

FileNotFoundError                         Traceback (most recent call last)

<ipython-input-2-716414c230d7> in <module>()
     17     seed=42,
     18 )
---> 19 agent.fit()
     20 state = env.reset()
     21 agent.agent_handlers[0].policy(state)


/usr/local/lib/python3.7/dist-packages/torch/serialization.py in __init__(self, name, mode)
    209 class _open_file(_opener):
    210     def __init__(self, name, mode):
--> 211         super(_open_file, self).__init__(open(name, mode))
    212 
    213     def __exit__(self, *args):

FileNotFoundError: [Errno 2] No such file or directory: 'rlberry_data/temp/manager_data/DQN_2022-02-02_14-06-50_f62f4977/agent_handlers/idx_0'

I also tried with a larger horizon and fit_budget, but I get the same error. It seems that my agent is not saved as it should be (indeed, no rlberry_data directory appeared, so the dumping has a problem).

Non-deterministic behavior when using threading and torch agents

The output of the code below is only reproducible when parallelization='process', and not when parallelization='thread'.

This seems to happen only with torch agents. In each thread, AgentManager calls set_external_seed(), which sets the seed of PyTorch, so I guess the problem is that PyTorch is sharing seeds among threads, and we get a non-deterministic behavior inherent to multithreading.

I don't think it is necessarily something to be fixed, but it's nice to be aware of it.

import numpy as np
from rlberry.envs.benchmarks.ball_exploration import PBall2D
from rlberry.agents.torch.a2c import A2CAgent
from rlberry.manager import AgentManager, plot_writer_data, evaluate_agents
from rlberry.seeding import set_external_seed


if __name__ == '__main__':
    set_external_seed(123)

    # --------------------------------
    # Define train and evaluation envs
    # --------------------------------
    train_env = (PBall2D, dict())
    eval_env = (PBall2D, dict())

    # -----------------------------
    # Parameters
    # -----------------------------
    N_EPISODES = 250
    GAMMA = 0.99
    HORIZON = 50

    params_a2c = {"gamma": GAMMA,
                  "horizon": HORIZON,
                  "learning_rate": 0.0003}

    eval_kwargs = dict(eval_horizon=HORIZON, n_simulations=20)

    a2c_stats = AgentManager(
        A2CAgent,
        train_env,
        fit_budget=N_EPISODES,
        init_kwargs=params_a2c,
        eval_kwargs=eval_kwargs,
        n_fit=4,
        seed=123,
        parallelization='thread')

    agent_manager_list = [a2c_stats]

    for st in agent_manager_list:
        st.fit()

    # learning curves
    plot_writer_data(agent_manager_list,
                     tag='episode_rewards',
                     preprocess_func=np.cumsum,
                     title='cumulative rewards',
                     show=False)

    # compare final policies
    output = evaluate_agents(agent_manager_list)
    print(output)

    for st in agent_manager_list:
        st.clear_output_dir()

(running pytest) Warning in rlberry/agents/dynprog/tests/test_value_iteration.py

Test output:

================================================ warnings summary ================================================
rlberry/agents/dynprog/tests/test_value_iteration.py::test_bellman_operator_monotonicity_and_contraction[0.001-2-1]
/home/omardrwch/miniconda3/envs/rlberry/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject
return f(*args, **kwds)

-- Docs: https://docs.pytest.org/en/stable/warnings.html
========================================= 61 passed, 1 warning in 2.34s ==========================================

The warning disappears when we remove

@jit(nopython=True)

from the functions in rlberry.agents.dynprog.utils

Improve setup.py

Make a 'light' installation possible, without PyTorch, PyOpenGL, and other packages that are not essential.
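
One way to do this (a sketch with hypothetical extra names and dependency lists) is to keep only the core dependencies in install_requires and move the heavy ones into extras_require:

from setuptools import setup, find_packages

setup(
    name="rlberry",
    packages=find_packages(include=["rlberry", "rlberry.*"]),
    install_requires=["numpy", "scipy", "matplotlib"],  # minimal core
    extras_require={
        "torch_agents": ["torch", "tensorboard"],        # hypothetical extra
        "rendering": ["PyOpenGL", "PyOpenGL-accelerate", "pyvirtualdisplay"],
        "full": ["torch", "tensorboard", "PyOpenGL", "PyOpenGL-accelerate",
                 "pyvirtualdisplay", "optuna", "numba"],
    },
)

Users could then run pip install rlberry for the light version, or pip install rlberry[full] for everything.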

American English or British English?

In the README (and maybe in the codebase), there are occurrences of "optimisation" (e.g. in diagram and text) and "optimization" (in text). Also, "visualisation".

We should pick one for the whole repo and stick to it. I usually use American English in papers (optimization, visualization, generalization) so I would vote for American English. What do you prefer?

Include old examples in the doc

With PR #72 I moved the examples to examples_old, because including them in the doc requires some explaining/cleaning/sorting. It would be nice to re-include them.

Warning: for now, sphinx-gallery only supports (static) plot output. This is restrictive for RL, so I want to see whether we can adapt the matplotlib animation support from sphinx-gallery to handle mp4 or gif output; I just have to figure out how.

Naming convention for agents

I suggest VeryCoolAgent as convention, what do you think?

This would change, for instance:
PPOAgent -> PpoAgent

RSUCBVIAgent -> RsUcbviAgent

etc.

A2C + MultipleStats (multiprocessing) error

Describe the bug

Error when A2C is run with MultipleStats.

Caused by line 185:

self.memory.logprobs.append(action_logprob)

To Reproduce

from rlberry.envs import gym_make
from rlberry.agents.a2c import A2CAgent
from rlberry.stats import AgentStats, MultipleStats


# Environment
env = gym_make('CartPole-v1')

# Parameters
params = {}

params['a2c'] = {
          "n_episodes": 5,
          "gamma": 0.99,
          "horizon": 500
}

# Create AgentStats for REINFORCE and A2C
mstats = MultipleStats()
mstats.append(
    AgentStats(A2CAgent,
               env,
               init_kwargs=params['a2c'],
               n_fit=4,
               n_jobs=4)
)

# Fit
mstats.run()

raises the error:

multiprocessing.pool.MaybeEncodingError: Error sending result: '[<rlberry.stats.agent_stats.AgentStats object at 0x7f9e08faf390>]'. Reason: 'RuntimeError('Cowardly refusing to serialize non-leaf tensor which requires_grad, since autograd does not support crossing process boundaries. If you just want to transfer the data, call detach() on the tensor before serializing (e.g., putting it on the queue).')'
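
The error message already hints at the direction of a fix: tensors that still require grad must be detached before crossing process boundaries. Below is a standalone demonstration of the principle (not a patch to the agent; in the actual code the detach would have to happen where the fitted agents are serialized back to the main process, since the A2C update itself still needs the computation graph):

import torch

# Minimal demonstration: a detached tensor no longer requires grad,
# so it can be pickled and sent across process boundaries.
logits = torch.randn(4, requires_grad=True)
action_logprob = torch.log_softmax(logits, dim=0)[0]

memory_logprobs = []
memory_logprobs.append(action_logprob.detach())  # instead of action_logprob

print(memory_logprobs[0].requires_grad)  # False: safe to serialize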

PyOpenGL-accelerate install issue

I am trying to do pip install -e .[full] and get the following error. I have no idea how to resolve it. Any idea how I can solve this?

Stored in directory: /Users/hmishfaq/Library/Caches/pip/wheels/9f/18/84/8f69f8b08169c7bae2dde6bd7daf0c19fca8c8e500ee620a28
Building wheel for PyOpenGL-accelerate (setup.py) ... error
ERROR: Command errored out with exit status 1:
command: /Users/hmishfaq/miniconda3/envs/rlberry/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/hx/pgz_r8ps30n36q4qds34t0340000gn/T/pip-install-c4eb4muq/pyopengl-accelerate_e9ab1b03c35f4124afb92343fb341d15/setup.py'"'"'; file='"'"'/private/var/folders/hx/pgz_r8ps30n36q4qds34t0340000gn/T/pip-install-c4eb4muq/pyopengl-accelerate_e9ab1b03c35f4124afb92343fb341d15/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' bdist_wheel -d /private/var/folders/hx/pgz_r8ps30n36q4qds34t0340000gn/T/pip-wheel-s6cu0h__
cwd: /private/var/folders/hx/pgz_r8ps30n36q4qds34t0340000gn/T/pip-install-c4eb4muq/pyopengl-accelerate_e9ab1b03c35f4124afb92343fb341d15/
Complete output (14 lines):
running bdist_wheel
running build
running build_py
creating build
creating build/lib.macosx-10.9-x86_64-3.7
creating build/lib.macosx-10.9-x86_64-3.7/OpenGL_accelerate
copying OpenGL_accelerate/__init__.py -> build/lib.macosx-10.9-x86_64-3.7/OpenGL_accelerate
running build_ext
building 'OpenGL_accelerate.wrapper' extension
creating build/temp.macosx-10.9-x86_64-3.7
creating build/temp.macosx-10.9-x86_64-3.7/src
gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/Users/hmishfaq/miniconda3/envs/rlberry/include -arch x86_64 -I/Users/hmishfaq/miniconda3/envs/rlberry/include -arch x86_64 -I/private/var/folders/hx/pgz_r8ps30n36q4qds34t0340000gn/T/pip-install-c4eb4muq/pyopengl-accelerate_e9ab1b03c35f4124afb92343fb341d15/.. -I/private/var/folders/hx/pgz_r8ps30n36q4qds34t0340000gn/T/pip-install-c4eb4muq/pyopengl-accelerate_e9ab1b03c35f4124afb92343fb341d15/src -I/private/var/folders/hx/pgz_r8ps30n36q4qds34t0340000gn/T/pip-install-c4eb4muq/pyopengl-accelerate_e9ab1b03c35f4124afb92343fb341d15 -I/Users/hmishfaq/miniconda3/envs/rlberry/include/python3.7m -c src/wrapper.c -o build/temp.macosx-10.9-x86_64-3.7/src/wrapper.o
xcrun: error: invalid active developer path (/Library/Developer/CommandLineTools), missing xcrun at: /Library/Developer/CommandLineTools/usr/bin/xcrun
error: command 'gcc' failed with exit status 1

ERROR: Failed building wheel for PyOpenGL-accelerate
Running setup.py clean for PyOpenGL-accelerate
Successfully built pyperclip
Failed to build PyOpenGL-accelerate
Installing collected packages: zipp, urllib3, typing-extensions, pyasn1, idna, chardet, wcwidth, rsa, requests, pyperclip, pyasn1-modules, pbr, oauthlib, numpy, MarkupSafe, importlib-metadata, greenlet, colorama, cachetools, attrs, stevedore, sqlalchemy, requests-oauthlib, PyYAML, python-editor, PrettyTable, Mako, google-auth, cmd2, werkzeug, tqdm, tensorboard-plugin-wit, protobuf, packaging, markdown, llvmlite, grpcio, google-auth-oauthlib, EasyProcess, colorlog, cmaes, cliff, alembic, absl-py, torch, tensorboard, rlberry, pyvirtualdisplay, PyOpenGL-accelerate, PyOpenGL, optuna, numba, ffmpeg-python
Attempting uninstall: numpy
Found existing installation: numpy 1.20.1
Uninstalling numpy-1.20.1:
Successfully uninstalled numpy-1.20.1
Attempting uninstall: rlberry
Found existing installation: rlberry 0.1
Uninstalling rlberry-0.1:
Successfully uninstalled rlberry-0.1
Running setup.py develop for rlberry
Running setup.py install for PyOpenGL-accelerate ... error
ERROR: Command errored out with exit status 1:
command: /Users/hmishfaq/miniconda3/envs/rlberry/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/hx/pgz_r8ps30n36q4qds34t0340000gn/T/pip-install-c4eb4muq/pyopengl-accelerate_e9ab1b03c35f4124afb92343fb341d15/setup.py'"'"'; file='"'"'/private/var/folders/hx/pgz_r8ps30n36q4qds34t0340000gn/T/pip-install-c4eb4muq/pyopengl-accelerate_e9ab1b03c35f4124afb92343fb341d15/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record /private/var/folders/hx/pgz_r8ps30n36q4qds34t0340000gn/T/pip-record-6eze1xi4/install-record.txt --single-version-externally-managed --compile --install-headers /Users/hmishfaq/miniconda3/envs/rlberry/include/python3.7m/PyOpenGL-accelerate
cwd: /private/var/folders/hx/pgz_r8ps30n36q4qds34t0340000gn/T/pip-install-c4eb4muq/pyopengl-accelerate_e9ab1b03c35f4124afb92343fb341d15/
Complete output (14 lines):
running install
running build
running build_py
creating build
creating build/lib.macosx-10.9-x86_64-3.7
creating build/lib.macosx-10.9-x86_64-3.7/OpenGL_accelerate
copying OpenGL_accelerate/__init__.py -> build/lib.macosx-10.9-x86_64-3.7/OpenGL_accelerate
running build_ext
building 'OpenGL_accelerate.wrapper' extension
creating build/temp.macosx-10.9-x86_64-3.7
creating build/temp.macosx-10.9-x86_64-3.7/src
gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/Users/hmishfaq/miniconda3/envs/rlberry/include -arch x86_64 -I/Users/hmishfaq/miniconda3/envs/rlberry/include -arch x86_64 -I/private/var/folders/hx/pgz_r8ps30n36q4qds34t0340000gn/T/pip-install-c4eb4muq/pyopengl-accelerate_e9ab1b03c35f4124afb92343fb341d15/.. -I/private/var/folders/hx/pgz_r8ps30n36q4qds34t0340000gn/T/pip-install-c4eb4muq/pyopengl-accelerate_e9ab1b03c35f4124afb92343fb341d15/src -I/private/var/folders/hx/pgz_r8ps30n36q4qds34t0340000gn/T/pip-install-c4eb4muq/pyopengl-accelerate_e9ab1b03c35f4124afb92343fb341d15 -I/Users/hmishfaq/miniconda3/envs/rlberry/include/python3.7m -c src/wrapper.c -o build/temp.macosx-10.9-x86_64-3.7/src/wrapper.o
xcrun: error: invalid active developer path (/Library/Developer/CommandLineTools), missing xcrun at: /Library/Developer/CommandLineTools/usr/bin/xcrun
error: command 'gcc' failed with exit status 1
----------------------------------------
ERROR: Command errored out with exit status 1: /Users/hmishfaq/miniconda3/envs/rlberry/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/hx/pgz_r8ps30n36q4qds34t0340000gn/T/pip-install-c4eb4muq/pyopengl-accelerate_e9ab1b03c35f4124afb92343fb341d15/setup.py'"'"'; file='"'"'/private/var/folders/hx/pgz_r8ps30n36q4qds34t0340000gn/T/pip-install-c4eb4muq/pyopengl-accelerate_e9ab1b03c35f4124afb92343fb341d15/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record /private/var/folders/hx/pgz_r8ps30n36q4qds34t0340000gn/T/pip-record-6eze1xi4/install-record.txt --single-version-externally-managed --compile --install-headers /Users/hmishfaq/miniconda3/envs/rlberry/include/python3.7m/PyOpenGL-accelerate Check the logs for full command output.
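
Note that the root cause here is not rlberry: "xcrun: error: invalid active developer path" means the macOS Command Line Tools are missing, so gcc cannot compile the PyOpenGL-accelerate extension. Installing them (typically with xcode-select --install) and re-running the command should get past this error; alternatively, a lighter install that skips PyOpenGL-accelerate avoids the compilation entirely.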

a_idx2str up and down inverted in GridWorld ?

Hi, I found something weird in the controls in GridWorld. It seems like up and down are inverted:
I used the first cells of the Google Colab tutorial in Google Colab:

from IPython import get_ipython
COLAB = False
if 'google.colab' in str(get_ipython()):
    COLAB = True

if COLAB:
    # install rlberry library
    !git clone https://github.com/rlberry-py/rlberry.git 
    !cd rlberry && git pull && pip install -e . > /dev/null 2>&1

    # install ffmpeg-python for saving videos
    !pip install ffmpeg-python > /dev/null 2>&1

    # install optuna for hyperparameter optimization
    !pip install optuna > /dev/null 2>&1

    # packages required to show video
    !pip install pyvirtualdisplay > /dev/null 2>&1
    !apt-get install -y xvfb python-opengl ffmpeg > /dev/null 2>&1

    print("")
    print(" ~~~  Libraries installed, please restart the runtime! ~~~ ")
    print("")

Then,

# Create directory for saving videos
!mkdir videos > /dev/null 2>&1

# Initialize display and import function to show videos
import rlberry.colab_utils.display_setup
from rlberry.colab_utils.display_setup import show_video

And finally, I just move down: action 3

from rlberry.envs import GridWorld

# A grid world is a simple environment with finite states and actions, on which 
# we can test simple algorithms.
# -> The reward function can be accessed by: env.R[state, action]
# -> And the transitions: env.P[state, action, next_state]
env_ctor = GridWorld
env_kwargs = dict(nrows=3, ncols=10,
                reward_at = {(1,1):0.1, (2, 9):1.0},
                walls=((1,4),(2,4), (1,5)),
                success_probability=0.9)
env = env_ctor(**env_kwargs)

## ---- MY CODE ---- ##
print("Information on action: ", env.a_idx2str)

env.enable_rendering()
print(env.state)
env.reset()
print(env.state)
env.step(3)
print(env.state)
env.step(0)
## ---- MY CODE ---- ##

# save video and clear buffer
env.save_video('./videos/gw.mp4', framerate=5)
env.clear_render_buffer()
# show video
show_video('./videos/gw.mp4')

And this is the output:
[two screenshots of the rendered grid world]

The agent went upward...

Hyperparameter optim samplers

Is your feature request related to a problem? Please describe.
Why are only 'random' and 'optuna_default' supported? Are there any additional technical difficulties in including other samplers?

Describe the solution you'd like
All Optuna samplers being supported.

Describe alternatives you've considered
NA

Additional context
NA

`Model` should inherit from `gym.Env`

gym.Env is a well-established standard for environments. I think our Model class should only be an extension, since it is basically the same thing with the added ability to define generative models, plus centralized seeding.
But it should not be an independent class as it is now, since that prevents using all the existing tools tailored for gym.Env.

For instance, when I needed an episodic version of MountainCar to test DQN, I did the following:

from rlberry.envs.classic_control import MountainCar
from gym.wrappers import TimeLimit

env = MountainCar()
env = TimeLimit(env, max_episode_steps=200)

which results in AttributeError: 'MountainCar' object has no attribute 'metadata'.

The solution with the current mindset would be to define a custom rlberry.TimeLimit, but I suspect this approach would end up replicating all of gym.Env in our own definitions, which is not desirable.

TL;DR: make gym the default interface, and inherit from it to add or modify features for specific usages.
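
A rough sketch of the direction proposed here (hypothetical code, just to illustrate what inheriting from gym.Env would buy us):

import gym
from gym import spaces


class Model(gym.Env):
    """Sketch: rlberry's Model as a thin layer on top of gym.Env.

    metadata, render modes and the standard reset/step API are inherited from
    gym.Env, so wrappers such as gym.wrappers.TimeLimit work out of the box;
    Model only adds what rlberry needs (e.g. a generative-model interface and
    centralized seeding).
    """

    def __init__(self):
        super().__init__()
        self.observation_space = spaces.Discrete(1)
        self.action_space = spaces.Discrete(1)

    def sample(self, state, action):
        """Generative model: return (next_state, reward, done, info)."""
        raise NotImplementedError

    def reset(self):
        return 0

    def step(self, action):
        return 0, 0.0, True, {}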

Separate Value and Policy networks in AVEC

Is your feature request related to a problem? Please describe.
Lack of uniformity with PPO and A2C.

Describe the solution you'd like
Separate Value and Policy nets.

Describe alternatives you've considered
None.

Additional context
None.

Doc improvement proposal

Proposition of improvements:

  • Add the link to https://rlberry.readthedocs.io in the "about" section in main github page
  • Have an API documentation in the doc (which automatically copies the docstrings of the major functions)
  • Support MathJax in the doc so that we can use equations
  • Add some examples, visual illustrations and plots of the main functions (e.g. comparison of agents...)
  • Put most of the content of the README in the docs: I find the README too long, and it tries to do the job of the doc. More generally, adopt a policy of having all documentation in the doc?
  • Maybe enhance the design of the doc a bit, add the logo... We could also take inspiration from (or use) the scikit-learn theme (which is also a Sphinx theme)?
  • Take inspiration from the scikit-learn contribution page for the contribution section of the doc
  • Have the doc built at PR time (using https://docs.readthedocs.io/en/stable/pull-requests.html); this is useful to check that the doc is working, typically whether the example code renders a plot or not.

I am opening this issue for brainstorming purposes, because I had these ideas when I saw the rlberry GitHub page. If someone has other ideas, they can add them here, and I may add more ideas later on. When we agree on some main ideas, I will make

MultipleStats not reproducible due to multi-thread/process seeding

MultipleStats fits several instances of AgentStats using multiple threads. Then, each AgentStats worker calls a global rlberry.seeding, defined in the process where the MultipleStats instance is created.

Using this global seeding is incompatible with the rlberry.experiment module, which uses global reseeding before creating AgentStats instances, to ensure reproducibility.
However, when using MultipleStats, the AgentStats threads use the global seed defined in the MultipleStats process, not the global seed defined before creating AgentStats with rlberry.experiment.

I'm avoiding MultipleStats for now.

JAX ReplayBuffer cannot handle nested entries

trajectory[key] = self.writer.history[key][-self.chunk_size:]

shape=[self._chunk_size, *shape],

Proposed solution:

import jax
import numpy as np
import tensorflow as tf
import tree

...

# In ChunkWriter.append():
trajectory[key] = jax.tree_map(lambda x: x[-self.chunk_size:], self.writer.history[key])

...

# In ReplayBuffer:
def setup_entry(self, name, shape, dtype):
    """
    Setup new entry in the replay buffer.

    Parameters
    ----------
    name : str
        Entry name.
    shape : Tuple
        Shape of the data. Can be nested (tuples).
    dtype :
        Type of the data. Can be nested.
    """
    if name in self._signature:
        raise ValueError(f'Entry {name} already added to the replay buffer.')

    # handle possibly nested shapes
    shape_with_chunk = jax.tree_map(
        lambda x: np.array((self._chunk_size,) + tuple(x), dtype=np.int32),
        shape, is_leaf=(lambda y: isinstance(y, tuple)))

    self._signature[name] = tree.map_structure(
        lambda *x: tf.TensorSpec(*x), shape_with_chunk, dtype
    )
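
With a change along these lines, a nested entry could then be declared roughly like this (hypothetical usage, assuming the patched setup_entry accepts dict-nested shapes and dtypes):

import numpy as np
from rlberry.agents.jax.utils.replay_buffer import ReplayBuffer

replay = ReplayBuffer(batch_size=1, chunk_size=1, max_replay_size=10)

# One entry holding a nested structure: a feature vector plus a scalar step index.
replay.setup_entry(
    name="obs",
    shape={"features": (2,), "t": ()},              # nested shapes
    dtype={"features": np.float32, "t": np.int32},  # matching nested dtypes
)
replay.build()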

Modification of the agents and evaluation

I propose that, instead of always having to define an eval function in the agents, we have an eval parameter that can be either "cumulative_reward", "best_arm_identification", ..., or "custom"; if eval is "custom", the eval function must be specified.

This would simplify the creation of a basic agent because in the end, we often use the same evaluations.

Example of the resulting template agent:

class MyAgent(Agent):
    name = "MyAgent"

    def __init__(self,
                 env,
                 param_1,
                 param_2,
                 param_n,
                 eval="cumulative_reward",
                 **kwargs):
        Agent.__init__(self, env, eval=eval, **kwargs)

    def fit(self, budget: int):
        """
        ** Must be implemented. **

        Trains the agent, given a computational budget (e.g. number of steps or episodes).
        """
        # code to train the agent
        # ...
        pass

    @classmethod
    def sample_parameters(cls, trial):
        """
        ** Optional **

        Sample hyperparameters for hyperparam optimization using
        Optuna (https://optuna.org/).

        Parameters
        ----------
        trial: optuna.trial
        """
        # Note: param_1 and param_2 are in the constructor.

        # for example, param_1 could be the batch_size...
        param_1 = trial.suggest_categorical('param_1',
                                            [1, 4, 8, 16, 32, 64])
        # ... and param_2 could be a learning_rate
        param_2 = trial.suggest_loguniform('param_2', 1e-5, 1)
        return {
            'param_1': param_1,
            'param_2': param_2,
        }
