
rlberry's Introduction

A Reinforcement Learning Library for Research and Education



What is rlberry?

Writing reinforcement learning algorithms is fun! But after the fun, we have lots of boring things to implement: run our agents in parallel, average and plot results, optimize hyperparameters, compare to baselines, create tricky environments, and so on!

rlberry is a Python library that makes your life easier by doing all these things with a few lines of code, so that you can spend most of your time developing agents. rlberry also provides implementations of several RL agents, benchmark environments and many other useful tools.

We provide a number of tools to help you achieve reproducibility, statistical comparisons of RL agents, and nice visualizations.
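
For a taste of the workflow, here is a minimal sketch (based on the AgentManager API used in the issues further down this page; the agent, environment and hyperparameters are illustrative) that trains a few instances of an agent in parallel, plots learning curves, and compares the final policies:

import numpy as np
from rlberry.envs.benchmarks.ball_exploration import PBall2D
from rlberry.agents.torch.a2c import A2CAgent
from rlberry.manager import AgentManager, plot_writer_data, evaluate_agents

# Train 4 instances of A2C on the same environment, in parallel.
manager = AgentManager(
    A2CAgent,
    (PBall2D, dict()),                      # environment constructor and kwargs
    fit_budget=250,                         # number of training episodes
    init_kwargs=dict(gamma=0.99, horizon=50, learning_rate=3e-4),
    eval_kwargs=dict(eval_horizon=50, n_simulations=20),
    n_fit=4,
    seed=123,
)
manager.fit()

# Learning curves (cumulative rewards) and evaluation of the final policies.
plot_writer_data([manager], tag="episode_rewards", preprocess_func=np.cumsum,
                 title="cumulative rewards")
print(evaluate_agents([manager]))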

Installation

For a stable release, install the latest (minimal) version:

pip install rlberry

The documentation includes more installation instructions.

Getting started

In our dev documentation, you will find a quick start for the library, a user guide with a few tutorials on using rlberry, and some examples. See also the stable documentation for the version corresponding to the latest release.

Changelog

See the changelog for a history of the changes made to rlberry.

Other rlberry projects

rlberry-scool: the repository used for teaching purposes. It mainly contains basic agents and environments, in a version that makes it easier for students to learn.

rlberry-research: the repository where our research team keeps agents, environments, and tools compatible with rlberry. It is a permanent "work in progress" repository, and some of the code may no longer be maintained.

Citing rlberry

If you use rlberry in scientific publications, we would appreciate citations using the following BibTeX entry:

@misc{rlberry,
    author = {Domingues, Omar Darwiche and Flet-Berliac, Yannis and Leurent, Edouard and M{\'e}nard, Pierre and Shang, Xuedong and Valko, Michal},
    doi = {10.5281/zenodo.5544540},
    month = {10},
    title = {{rlberry - A Reinforcement Learning Library for Research and Education}},
    url = {https://github.com/rlberry-py/rlberry},
    year = {2021}
}

About us

This project was initiated and is actively maintained by the Inria SCOOL team. More information here.

Contributing

Want to contribute to rlberry? Please check our contribution guidelines. If you want to add any new agents or environments, do not hesitate to open an issue!

rlberry's People

Contributors

adriennetuynman, aleshi94, borishamadej, brahimdriss, codacy-badger, dependabot[bot], eleurent, julient01, kohlerhector, menardprr, mmcenta, omardrwch, pre-commit-ci[bot], remydegenne, riccardodv, riiswa, sauxpa, timotheemathieu, xuedong, yannberthelot, yfletberliac


rlberry's Issues

setup.py does not install some rlberry packages

In setup.py, we have:

from setuptools import setup, find_packages
packages = find_packages(exclude=['docs', 'notebooks', 'assets'])

and some packages are missing from the list.

For instance, when installing from pip (pip install 'rlberry[full]==0.1') and running

from rlberry.agents.torch.reinforce import REINFORCEAgent

we receive the error ModuleNotFoundError: No module named 'rlberry.agents.torch.utils'
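
One possible direction for a fix (a sketch, not the actual patch) is to ask find_packages to include every rlberry subpackage explicitly, so that subpackages such as rlberry.agents.torch.utils are always shipped:

from setuptools import setup, find_packages

# Include rlberry and all of its subpackages (e.g. rlberry.agents.torch.utils),
# rather than excluding directories by name.
packages = find_packages(include=["rlberry", "rlberry.*"])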

Add algorithms with regret guarantees

Is your feature request related to a problem? Please describe.
Feature request: Is there a plan for adding more algorithms with theoretical guarantees in the tabular setting or with linear function approximation with known features?

Describe the solution you'd like
Tabular algorithm with regret bound
E.g.

Exploration in Linear function approximation with known features
e.g.

  • LSVI-UCB
  • UCRL-VTR
  • RLSVI
  • LSVI-PHI
  • Tabular implementation of the above feature-based algorithms. (Use one-hot feature and then reformulate the algorithm in the tabular representation.)

Describe alternatives you've considered
Is rlberry framework suitable to implement these algorithms?
If so, I am willing to contribute to some implementations of the above algorithms in this framework after getting familiar with rlberry.

NameError: name 'torch' is not defined

Describe the bug
Installing rlberry with pip install -e .[test] does not install PyTorch.
Then, running python examples/demo_ucbvi_and_opqtl.py gives the following error:

  File "/xxx/rlberry/rlberry/exploration_tools/typing.py", line 48, in process_type
    elif isinstance(arg, torch.Tensor):
NameError: name 'torch' is not defined

To Reproduce

pip install -e .[test]
python examples/demo_ucbvi_and_opqtl.py

Expected behavior
The script should run successfully even without PyTorch, because this is a tabular algorithm for a tabular environment.

Desktop (please complete the following information):
- OS: MacOS
- Version 11.5.2
- Python version 3.7.11
- PyTorch version N/A
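
A possible workaround (a sketch of the idea, using a simplified, hypothetical version of process_type; the real function has a different signature) is to make the torch import optional and only test against torch.Tensor when PyTorch is actually installed:

import numpy as np

# Optional dependency: only use torch if it is installed.
try:
    import torch
    _TORCH_INSTALLED = True
except ImportError:
    _TORCH_INSTALLED = False


def process_type(arg):
    """Classify the input type without requiring PyTorch (simplified sketch)."""
    if isinstance(arg, np.ndarray):
        return "numpy"
    elif _TORCH_INSTALLED and isinstance(arg, torch.Tensor):
        return "torch"
    return "other"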

Access issue

I think I have an access issue when I try to push:
ERROR: Repository not found.
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

Can you check it out @omardrwch ?

Correct docstring

I included an API section in the doc that presents the main classes of rlberry. However, this section is generated from the docstrings of the classes, and some of those docstrings have problems. We should rewrite them.

Ideally, each docstring should follow the NumPy docstring format. It must at least contain a brief description, the parameters, and the return value, but it is also nice to have references, descriptions of attributes, etc.
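
For reference, here is a minimal example of the target format (illustrative only, not taken from the codebase):

def evaluate(agent, horizon, n_simulations=10):
    """
    Evaluate an agent by running Monte-Carlo simulations.

    Parameters
    ----------
    agent : Agent
        Agent whose policy is evaluated.
    horizon : int
        Maximum number of steps per simulation.
    n_simulations : int, default=10
        Number of independent simulations to average over.

    Returns
    -------
    float
        Mean cumulative reward over the simulations.

    References
    ----------
    .. [1] NumPy docstring guide,
       https://numpydoc.readthedocs.io/en/latest/format.html
    """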

Support environments on rlberry?

Currently, rlberry supports some environments, but we would like to add other environments developed by the SCOOL group to the repository. This raises the question of whether we should have environments as part of rlberry or not. An alternative would be to create another repository (like an RL Baselines Zoo for rlberry) in which we would put code for the environments (and possibly other things such as default hyperparameters).

I think it would be cleaner to separate the two repositories so that we can keep rlberry concise. In that case, we would still need to:

  1. Decide what things will go into this new repository (trained agents, default hyperparameters, etc..)
  2. Give the new repository a name
  3. Create issues to remove what isn't needed here

Some minor questions about readthedocs

Hello guys, some minor questions about the doc:

1. Is the following numbering (2, 1, 2, 1) expected? [screenshot]

2. Clicking the "cite us" button shows a "not found" page. [screenshot]

3. Not very important, more of an aesthetic question: I find it a bit jarring when things are spread over two lines. [screenshot]

Bandits

Here is another brainstorming issue.

There has been a proposal to implement bandits in rlberry. The idea would be to:

  • Facilitate parallel computing, logging and plots.
  • Have some benchmarks (for bandit algorithms and for environments)
  • Have benchmark datasets/environments
  • Have environments for corrupted, differentially private, and fair bandits

Questions are:

  • Do we go ahead and create a bandit module?
  • What are the guidelines for this module (simplicity for the user, readability, ...)?
  • There are a lot of algorithms/environments. We could decide (similarly to scikit-learn) to include only algorithms that have been cited a certain number of times, and something similar for environments?

[ignore]

The samples seem to be coming from the most recently added values:

return next(self.dataset)

See the output of

import matplotlib.pyplot as plt
import numpy as np
from rlberry.agents.jax.utils.replay_buffer import ReplayBuffer

replay = ReplayBuffer(batch_size=1, chunk_size=1, max_replay_size=10)
replay.setup_entry(name='state', shape=(2,), dtype=np.float32)
replay.build()
with replay.get_writer() as writer:
    sampled_from_replay = []
    for ii in range(100):
        state = ii * np.ones((2,), dtype=np.float32)
        state[1] += 0.1
        writer.append(dict(state=state))
        writer.end_episode()
        batch = replay.sample()
        print(state, batch.data['state'][0][0])
        sampled_from_replay.append(batch.data['state'][0][0][0])
        print('--------------------')
    plt.plot(sampled_from_replay)
    plt.show()

pytest error "No module named 'pyvirtualdisplay'"

$ pytest

==================================================== test session starts ====================================================
platform darwin -- Python 3.7.9, pytest-6.1.1, py-1.9.0, pluggy-0.13.1
rootdir: /Users/yannisfletberliac/Developer/rlberry-py/rlberry
collected 162 items / 1 error / 161 selected

========================================================== ERRORS ===========================================================
___________________________ ERROR collecting rlberry/rendering/tests/test_rendering_interface.py ____________________________
ImportError while importing test module '/Users/yannisfletberliac/Developer/rlberry-py/rlberry/rlberry/rendering/tests/test_rendering_interface.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../../miniconda3/envs/rlberry/lib/python3.7/importlib/__init__.py:127: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
rlberry/rendering/tests/test_rendering_interface.py:4: in <module>
from pyvirtualdisplay import Display
E ModuleNotFoundError: No module named 'pyvirtualdisplay'
================================================== short test summary info ==================================================
ERROR rlberry/rendering/tests/test_rendering_interface.py
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
===================================================== 1 error in 1.49s ======================================================

Use logging for logging

As this project is likely to involve a substantial amount of code and debugging effort, it would be best to move from print statements to the official Python 3 logging module, which provides advanced control over the logs (levels of severity, output to file or console, enable/disable logging of specific modules, etc.)
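
A minimal sketch of what this could look like in a module (plain standard-library logging, nothing rlberry-specific):

import logging

# One logger per module; the module name shows up in the log records.
logger = logging.getLogger(__name__)


def fit(budget):
    logger.info("Starting training for %d episodes", budget)
    for episode in range(budget):
        logger.debug("episode %d done", episode)


if __name__ == "__main__":
    # Configure the root logger once, at the application entry point.
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(name)s %(levelname)s: %(message)s",
    )
    fit(3)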

Bugs with deep algos

I am opening this issue because there seems to be some instability with deep RL algorithms, and I propose we document it and try to resolve it here. Please post your examples of clean code that does not behave as it should, and we will try to solve them.

I begin with DQN:

from rlberry.envs import Chain
from rlberry.envs.benchmarks.ball_exploration import PBall2D
from rlberry.manager import AgentManager
from rlberry.agents.torch import DQNAgent
import numpy as np


env_ctor = PBall2D
env_kwargs = {}

agent = AgentManager(
    DQNAgent,
    (env_ctor, env_kwargs),
    fit_budget=3,
    n_fit=1,
    init_kwargs={"horizon": 3},
    seed=42,
)
agent.fit()
env = env_ctor(**env_kwargs)  # instantiate an environment for evaluation
state = env.reset()
agent.agent_handlers[0].policy(state)

results in

---------------------------------------------------------------------------

FileNotFoundError                         Traceback (most recent call last)

<ipython-input-2-716414c230d7> in <module>()
     17     seed=42,
     18 )
---> 19 agent.fit()
     20 state = env.reset()
     21 agent.agent_handlers[0].policy(state)


/usr/local/lib/python3.7/dist-packages/torch/serialization.py in __init__(self, name, mode)
    209 class _open_file(_opener):
    210     def __init__(self, name, mode):
--> 211         super(_open_file, self).__init__(open(name, mode))
    212 
    213     def __exit__(self, *args):

FileNotFoundError: [Errno 2] No such file or directory: 'rlberry_data/temp/manager_data/DQN_2022-02-02_14-06-50_f62f4977/agent_handlers/idx_0'

I also tried with a larger horizon and fit_budget, but I get the same error. It seems that my agent is not saved as it should be (indeed, no rlberry_data directory appeared, so the dumping has a problem).

Non-deterministic behavior when using threading and torch agents

The output of the code below is only reproducible when parallelization='process', and not when parallelization='thread'.

This seems to happen only with torch agents. In each thread, AgentManager calls set_external_seed(), which sets the seed of PyTorch, so I guess the problem is that PyTorch is sharing seeds among threads, and we get a non-deterministic behavior inherent to multithreading.

I don't think it is necessarily something to be fixed, but it's nice to be aware of it.

import numpy as np
from rlberry.envs.benchmarks.ball_exploration import PBall2D
from rlberry.agents.torch.a2c import A2CAgent
from rlberry.manager import AgentManager, plot_writer_data, evaluate_agents
from rlberry.seeding import set_external_seed


if __name__ == '__main__':
    set_external_seed(123)

    # --------------------------------
    # Define train and evaluation envs
    # --------------------------------
    train_env = (PBall2D, dict())
    eval_env = (PBall2D, dict())

    # -----------------------------
    # Parameters
    # -----------------------------
    N_EPISODES = 250
    GAMMA = 0.99
    HORIZON = 50

    params_a2c = {"gamma": GAMMA,
                  "horizon": HORIZON,
                  "learning_rate": 0.0003}

    eval_kwargs = dict(eval_horizon=HORIZON, n_simulations=20)

    a2c_stats = AgentManager(
        A2CAgent,
        train_env,
        fit_budget=N_EPISODES,
        init_kwargs=params_a2c,
        eval_kwargs=eval_kwargs,
        n_fit=4,
        seed=123,
        parallelization='thread')

    agent_manager_list = [a2c_stats]

    for st in agent_manager_list:
        st.fit()

    # learning curves
    plot_writer_data(agent_manager_list,
                     tag='episode_rewards',
                     preprocess_func=np.cumsum,
                     title='cumulative rewards',
                     show=False)

    # compare final policies
    output = evaluate_agents(agent_manager_list)
    print(output)

    for st in agent_manager_list:
        st.clear_output_dir()

(running pytest) Warning in rlberry/agents/dynprog/tests/test_value_iteration.py

Test output:

================================================ warnings summary ================================================
rlberry/agents/dynprog/tests/test_value_iteration.py::test_bellman_operator_monotonicity_and_contraction[0.001-2-1]
/home/omardrwch/miniconda3/envs/rlberry/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject
return f(*args, **kwds)

-- Docs: https://docs.pytest.org/en/stable/warnings.html
========================================= 61 passed, 1 warning in 2.34s ==========================================

The warning disappears when we remove

@jit(nopython=True)

from the functions in rlberry.agents.dynprog.utils

Improve setup.py

Make a 'light' installation possible, without PyTorch, PyOpenGL, and other packages that are not essential.
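
One way to do this (a sketch with hypothetical extra names and dependency lists) is to keep only the core dependencies in install_requires and move the heavy ones into extras_require:

from setuptools import setup, find_packages

setup(
    name="rlberry",
    packages=find_packages(include=["rlberry", "rlberry.*"]),
    install_requires=["numpy", "scipy", "matplotlib"],  # minimal core
    extras_require={
        "torch_agents": ["torch", "tensorboard"],        # hypothetical extra
        "rendering": ["PyOpenGL", "PyOpenGL-accelerate", "pyvirtualdisplay"],
        "full": ["torch", "tensorboard", "PyOpenGL", "PyOpenGL-accelerate",
                 "pyvirtualdisplay", "optuna", "numba"],
    },
)

Users could then run pip install rlberry for the light version, or pip install rlberry[full] for everything.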

American English or British English?

In the README (and maybe in the codebase), there are occurrences of "optimisation" (e.g. in diagram and text) and "optimization" (in text). Also, "visualisation".

We should pick one for the whole repo and stick to it. I usually use American English in papers (optimization, visualization, generalization) so I would vote for American English. What do you prefer?

Include old examples in the doc

With PR #72 I moved the examples to examples_old, because including them in the doc requires some explaining/cleaning/sorting. It would be nice to re-include them.

Warning: for now, sphinx-gallery only supports (static) plot output. This is restrictive for RL, so I want to see whether we can adapt the matplotlib animation support from sphinx-gallery to handle mp4 or gif output; I just have to figure out how.

Naming convention for agents

I suggest VeryCoolAgent as convention, what do you think?

This would change, for instance:
PPOAgent -> PpoAgent

RSUCBVIAgent -> RsUcbviAgent

etc.

A2C + MultipleStats (multiprocessing) error

Describe the bug

Error when A2C is run with MultipleStats.

Caused by line 185:

self.memory.logprobs.append(action_logprob)

To Reproduce

from rlberry.envs import gym_make
from rlberry.agents.a2c import A2CAgent
from rlberry.stats import AgentStats, MultipleStats


# Environment
env = gym_make('CartPole-v1')

# Parameters
params = {}

params['a2c'] = {
          "n_episodes": 5,
          "gamma": 0.99,
          "horizon": 500
}

# Create AgentStats for REINFORCE and A2C
mstats = MultipleStats()
mstats.append(
    AgentStats(A2CAgent,
               env,
               init_kwargs=params['a2c'],
               n_fit=4,
               n_jobs=4)
)

# Fit
mstats.run()

raises the error:

multiprocessing.pool.MaybeEncodingError: Error sending result: '[<rlberry.stats.agent_stats.AgentStats object at 0x7f9e08faf390>]'. Reason: 'RuntimeError('Cowardly refusing to serialize non-leaf tensor which requires_grad, since autograd does not support crossing process boundaries. If you just want to transfer the data, call detach() on the tensor before serializing (e.g., putting it on the queue).')'
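
The error message already hints at the direction of a fix: tensors that still require grad must be detached before crossing process boundaries. Below is a standalone demonstration of the principle (not a patch to the agent; in the actual code the detach would have to happen where the fitted agents are serialized back to the main process, since the A2C update itself still needs the computation graph):

import torch

# Minimal demonstration: a detached tensor no longer requires grad,
# so it can be pickled and sent across process boundaries.
logits = torch.randn(4, requires_grad=True)
action_logprob = torch.log_softmax(logits, dim=0)[0]

memory_logprobs = []
memory_logprobs.append(action_logprob.detach())  # instead of action_logprob

print(memory_logprobs[0].requires_grad)  # False: safe to serialize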

PyOpenGL-accelerate install issue

I am trying to do pip install -e .[full] and get the following error. I have no idea how to resolve it. Any idea how I can solve this?

Stored in directory: /Users/hmishfaq/Library/Caches/pip/wheels/9f/18/84/8f69f8b08169c7bae2dde6bd7daf0c19fca8c8e500ee620a28
Building wheel for PyOpenGL-accelerate (setup.py) ... error
ERROR: Command errored out with exit status 1:
command: /Users/hmishfaq/miniconda3/envs/rlberry/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/hx/pgz_r8ps30n36q4qds34t0340000gn/T/pip-install-c4eb4muq/pyopengl-accelerate_e9ab1b03c35f4124afb92343fb341d15/setup.py'"'"'; file='"'"'/private/var/folders/hx/pgz_r8ps30n36q4qds34t0340000gn/T/pip-install-c4eb4muq/pyopengl-accelerate_e9ab1b03c35f4124afb92343fb341d15/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' bdist_wheel -d /private/var/folders/hx/pgz_r8ps30n36q4qds34t0340000gn/T/pip-wheel-s6cu0h__
cwd: /private/var/folders/hx/pgz_r8ps30n36q4qds34t0340000gn/T/pip-install-c4eb4muq/pyopengl-accelerate_e9ab1b03c35f4124afb92343fb341d15/
Complete output (14 lines):
running bdist_wheel
running build
running build_py
creating build
creating build/lib.macosx-10.9-x86_64-3.7
creating build/lib.macosx-10.9-x86_64-3.7/OpenGL_accelerate
copying OpenGL_accelerate/__init__.py -> build/lib.macosx-10.9-x86_64-3.7/OpenGL_accelerate
running build_ext
building 'OpenGL_accelerate.wrapper' extension
creating build/temp.macosx-10.9-x86_64-3.7
creating build/temp.macosx-10.9-x86_64-3.7/src
gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/Users/hmishfaq/miniconda3/envs/rlberry/include -arch x86_64 -I/Users/hmishfaq/miniconda3/envs/rlberry/include -arch x86_64 -I/private/var/folders/hx/pgz_r8ps30n36q4qds34t0340000gn/T/pip-install-c4eb4muq/pyopengl-accelerate_e9ab1b03c35f4124afb92343fb341d15/.. -I/private/var/folders/hx/pgz_r8ps30n36q4qds34t0340000gn/T/pip-install-c4eb4muq/pyopengl-accelerate_e9ab1b03c35f4124afb92343fb341d15/src -I/private/var/folders/hx/pgz_r8ps30n36q4qds34t0340000gn/T/pip-install-c4eb4muq/pyopengl-accelerate_e9ab1b03c35f4124afb92343fb341d15 -I/Users/hmishfaq/miniconda3/envs/rlberry/include/python3.7m -c src/wrapper.c -o build/temp.macosx-10.9-x86_64-3.7/src/wrapper.o
xcrun: error: invalid active developer path (/Library/Developer/CommandLineTools), missing xcrun at: /Library/Developer/CommandLineTools/usr/bin/xcrun
error: command 'gcc' failed with exit status 1

ERROR: Failed building wheel for PyOpenGL-accelerate
Running setup.py clean for PyOpenGL-accelerate
Successfully built pyperclip
Failed to build PyOpenGL-accelerate
Installing collected packages: zipp, urllib3, typing-extensions, pyasn1, idna, chardet, wcwidth, rsa, requests, pyperclip, pyasn1-modules, pbr, oauthlib, numpy, MarkupSafe, importlib-metadata, greenlet, colorama, cachetools, attrs, stevedore, sqlalchemy, requests-oauthlib, PyYAML, python-editor, PrettyTable, Mako, google-auth, cmd2, werkzeug, tqdm, tensorboard-plugin-wit, protobuf, packaging, markdown, llvmlite, grpcio, google-auth-oauthlib, EasyProcess, colorlog, cmaes, cliff, alembic, absl-py, torch, tensorboard, rlberry, pyvirtualdisplay, PyOpenGL-accelerate, PyOpenGL, optuna, numba, ffmpeg-python
Attempting uninstall: numpy
Found existing installation: numpy 1.20.1
Uninstalling numpy-1.20.1:
Successfully uninstalled numpy-1.20.1
Attempting uninstall: rlberry
Found existing installation: rlberry 0.1
Uninstalling rlberry-0.1:
Successfully uninstalled rlberry-0.1
Running setup.py develop for rlberry
Running setup.py install for PyOpenGL-accelerate ... error
ERROR: Command errored out with exit status 1:
command: /Users/hmishfaq/miniconda3/envs/rlberry/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/hx/pgz_r8ps30n36q4qds34t0340000gn/T/pip-install-c4eb4muq/pyopengl-accelerate_e9ab1b03c35f4124afb92343fb341d15/setup.py'"'"'; file='"'"'/private/var/folders/hx/pgz_r8ps30n36q4qds34t0340000gn/T/pip-install-c4eb4muq/pyopengl-accelerate_e9ab1b03c35f4124afb92343fb341d15/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record /private/var/folders/hx/pgz_r8ps30n36q4qds34t0340000gn/T/pip-record-6eze1xi4/install-record.txt --single-version-externally-managed --compile --install-headers /Users/hmishfaq/miniconda3/envs/rlberry/include/python3.7m/PyOpenGL-accelerate
cwd: /private/var/folders/hx/pgz_r8ps30n36q4qds34t0340000gn/T/pip-install-c4eb4muq/pyopengl-accelerate_e9ab1b03c35f4124afb92343fb341d15/
Complete output (14 lines):
running install
running build
running build_py
creating build
creating build/lib.macosx-10.9-x86_64-3.7
creating build/lib.macosx-10.9-x86_64-3.7/OpenGL_accelerate
copying OpenGL_accelerate/__init__.py -> build/lib.macosx-10.9-x86_64-3.7/OpenGL_accelerate
running build_ext
building 'OpenGL_accelerate.wrapper' extension
creating build/temp.macosx-10.9-x86_64-3.7
creating build/temp.macosx-10.9-x86_64-3.7/src
gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/Users/hmishfaq/miniconda3/envs/rlberry/include -arch x86_64 -I/Users/hmishfaq/miniconda3/envs/rlberry/include -arch x86_64 -I/private/var/folders/hx/pgz_r8ps30n36q4qds34t0340000gn/T/pip-install-c4eb4muq/pyopengl-accelerate_e9ab1b03c35f4124afb92343fb341d15/.. -I/private/var/folders/hx/pgz_r8ps30n36q4qds34t0340000gn/T/pip-install-c4eb4muq/pyopengl-accelerate_e9ab1b03c35f4124afb92343fb341d15/src -I/private/var/folders/hx/pgz_r8ps30n36q4qds34t0340000gn/T/pip-install-c4eb4muq/pyopengl-accelerate_e9ab1b03c35f4124afb92343fb341d15 -I/Users/hmishfaq/miniconda3/envs/rlberry/include/python3.7m -c src/wrapper.c -o build/temp.macosx-10.9-x86_64-3.7/src/wrapper.o
xcrun: error: invalid active developer path (/Library/Developer/CommandLineTools), missing xcrun at: /Library/Developer/CommandLineTools/usr/bin/xcrun
error: command 'gcc' failed with exit status 1
----------------------------------------
ERROR: Command errored out with exit status 1: /Users/hmishfaq/miniconda3/envs/rlberry/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/hx/pgz_r8ps30n36q4qds34t0340000gn/T/pip-install-c4eb4muq/pyopengl-accelerate_e9ab1b03c35f4124afb92343fb341d15/setup.py'"'"'; file='"'"'/private/var/folders/hx/pgz_r8ps30n36q4qds34t0340000gn/T/pip-install-c4eb4muq/pyopengl-accelerate_e9ab1b03c35f4124afb92343fb341d15/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record /private/var/folders/hx/pgz_r8ps30n36q4qds34t0340000gn/T/pip-record-6eze1xi4/install-record.txt --single-version-externally-managed --compile --install-headers /Users/hmishfaq/miniconda3/envs/rlberry/include/python3.7m/PyOpenGL-accelerate Check the logs for full command output.
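
Note that the root cause here is not rlberry: "xcrun: error: invalid active developer path" means the macOS Command Line Tools are missing, so gcc cannot compile the PyOpenGL-accelerate extension. Installing them (typically with xcode-select --install) and re-running the command should get past this error; alternatively, a lighter install that skips PyOpenGL-accelerate avoids the compilation entirely.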

a_idx2str up and down inverted in GridWorld ?

Hi, I found something weird in the controls in GridWorld. It seems like up and down are inverted:
I used the first cells of the Google Colab tutorial in Google Colab:

from IPython import get_ipython
COLAB = False
if 'google.colab' in str(get_ipython()):
    COLAB = True

if COLAB:
    # install rlberry library
    !git clone https://github.com/rlberry-py/rlberry.git 
    !cd rlberry && git pull && pip install -e . > /dev/null 2>&1

    # install ffmpeg-python for saving videos
    !pip install ffmpeg-python > /dev/null 2>&1

    # install optuna for hyperparameter optimization
    !pip install optuna > /dev/null 2>&1

    # packages required to show video
    !pip install pyvirtualdisplay > /dev/null 2>&1
    !apt-get install -y xvfb python-opengl ffmpeg > /dev/null 2>&1

    print("")
    print(" ~~~  Libraries installed, please restart the runtime! ~~~ ")
    print("")

Then,

# Create directory for saving videos
!mkdir videos > /dev/null 2>&1

# Initialize display and import function to show videos
import rlberry.colab_utils.display_setup
from rlberry.colab_utils.display_setup import show_video

And finally, I just move down: action 3

from rlberry.envs import GridWorld

# A grid world is a simple environment with finite states and actions, on which 
# we can test simple algorithms.
# -> The reward function can be accessed by: env.R[state, action]
# -> And the transitions: env.P[state, action, next_state]
env_ctor = GridWorld
env_kwargs = dict(nrows=3, ncols=10,
                reward_at = {(1,1):0.1, (2, 9):1.0},
                walls=((1,4),(2,4), (1,5)),
                success_probability=0.9)
env = env_ctor(**env_kwargs)

## ---- MY CODE ---- ##
print("Information on action: ", env.a_idx2str)

env.enable_rendering()
print(env.state)
env.reset()
print(env.state)
env.step(3)
print(env.state)
env.step(0)
## ---- MY CODE ---- ##

# save video and clear buffer
env.save_video('./videos/gw.mp4', framerate=5)
env.clear_render_buffer()
# show video
show_video('./videos/gw.mp4')

And this is the output:
[two screenshots of the rendered grid world]

The agent went upward...

Hyperparameter optim samplers

Is your feature request related to a problem? Please describe.
Why are only 'random' and 'optuna_default' supported? Are there any additional technical difficulties in including other samplers?

Describe the solution you'd like
All Optuna samplers being supported.

Describe alternatives you've considered
NA

Additional context
NA

`Model` should inherit from `gym.Env`

gym.Env is a well-established standard for environments. I think our Model class should only be an extension, since it is basically the same thing with the added ability to define generative models, plus centralized seeding.
But it should not be an independent class as it is now, since that prevents using all the existing tools tailored for gym.Env.

For instance, when I needed an episodic version of MountainCar to test DQN, I did the following:

from rlberry.envs.classic_control import MountainCar
from gym.wrappers import TimeLimit

env = MountainCar()
env = TimeLimit(env, max_episode_steps=200)

which results in AttributeError: 'MountainCar' object has no attribute 'metadata'.

The solution with the current mindset would be to define a custom rlberry.TimeLimit, but I suspect this approach would end up replicating all of gym.Env in our own definitions, which is not desirable.

TL;DR: make gym the default interface, and inherit from it to add or modify features for specific usages.
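
A rough sketch of the direction proposed here (hypothetical code, just to illustrate what inheriting from gym.Env would buy us):

import gym
from gym import spaces


class Model(gym.Env):
    """Sketch: rlberry's Model as a thin layer on top of gym.Env.

    metadata, render modes and the standard reset/step API are inherited from
    gym.Env, so wrappers such as gym.wrappers.TimeLimit work out of the box;
    Model only adds what rlberry needs (e.g. a generative-model interface and
    centralized seeding).
    """

    def __init__(self):
        super().__init__()
        self.observation_space = spaces.Discrete(1)
        self.action_space = spaces.Discrete(1)

    def sample(self, state, action):
        """Generative model: return (next_state, reward, done, info)."""
        raise NotImplementedError

    def reset(self):
        return 0

    def step(self, action):
        return 0, 0.0, True, {}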

Separate Value and Policy networks in AVEC

Is your feature request related to a problem? Please describe.
Lack of uniformity with PPO and A2C.

Describe the solution you'd like
Separate Value and Policy nets.

Describe alternatives you've considered
None.

Additional context
None.

Doc improvement proposal

Proposition of improvements:

  • Add the link to https://rlberry.readthedocs.io in the "about" section in main github page
  • Have an API documentation in the doc (which automatically copies the docstrings of the major functions)
  • Support MathJax in the doc so that we can use equations
  • Add some examples, visual illustrations and plots of the main functions (e.g. comparison of agents...)
  • Put most of the content of the README in the docs: I find the README too long, and it tries to do the job of the doc. More generally, adopt a policy of having all documentation in the doc?
  • Maybe enhance the design of the doc a bit, add the logo... We could also take inspiration from (or use) the scikit-learn theme (which is also a Sphinx theme)?
  • Take inspiration from the scikit-learn contribution page for the contribution section of the doc
  • Have the doc built at PR time (using https://docs.readthedocs.io/en/stable/pull-requests.html); this is useful to check that the doc is working, typically whether the example code renders a plot or not.

I am opening this issue for brainstorming purposes, because I had these ideas when I saw the rlberry GitHub page. If someone has other ideas, they can add them here, and I may add more ideas later on. When we agree on some main ideas, I will make

MultipleStats not reproducible due to multi-thread/process seeding

MultipleStats fits several instances of AgentStats using multiple threads. Then, each AgentStats worker calls a global rlberry.seeding, defined in the process where the MultipleStats instance is created.

Using this global seeding is incompatible with the rlberry.experiment module, which uses global reseeding before creating AgentStats instances, to ensure reproducibility.
However, when using MultipleStats, the AgentStats threads use the global seed defined in the MultipleStats process, not the global seed defined before creating AgentStats with rlberry.experiment.

I'm avoiding MultipleStats for now.

JAX ReplayBuffer cannot handle nested entries

trajectory[key] = self.writer.history[key][-self.chunk_size:]

shape=[self._chunk_size, *shape],

Proposed solution:

import jax
import numpy as np
import tensorflow as tf
import tree

...

# In ChunkWriter.append():
trajectory[key] = jax.tree_map(lambda x: x[-self.chunk_size:], self.writer.history[key])

...

# In ReplayBuffer:
def setup_entry(self, name, shape, dtype):
    """
    Setup new entry in the replay buffer.

    Parameters
    ----------
    name : str
        Entry name.
    shape : Tuple
        Shape of the data. Can be nested (tuples).
    dtype :
        Type of the data. Can be nested.
    """
    if name in self._signature:
        raise ValueError(f'Entry {name} already added to the replay buffer.')

    # handle possibly nested shapes
    shape_with_chunk = jax.tree_map(
        lambda x: np.array((self._chunk_size,) + tuple(x), dtype=np.int32),
        shape, is_leaf=(lambda y: isinstance(y, tuple)))

    self._signature[name] = tree.map_structure(
        lambda *x: tf.TensorSpec(*x), shape_with_chunk, dtype
    )
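
With a change along these lines, a nested entry could then be declared roughly like this (hypothetical usage, assuming the patched setup_entry accepts dict-nested shapes and dtypes):

import numpy as np
from rlberry.agents.jax.utils.replay_buffer import ReplayBuffer

replay = ReplayBuffer(batch_size=1, chunk_size=1, max_replay_size=10)

# One entry holding a nested structure: a feature vector plus a scalar step index.
replay.setup_entry(
    name="obs",
    shape={"features": (2,), "t": ()},              # nested shapes
    dtype={"features": np.float32, "t": np.int32},  # matching nested dtypes
)
replay.build()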

Modification of the agents and evaluation

I propose that, instead of always having to define an eval function in the agents, we have an eval parameter that can be either "cumulative_reward", "best_arm_identification", ..., or "custom"; if eval is "custom", the eval function must be specified.

This would simplify the creation of a basic agent because in the end, we often use the same evaluations.

Example of the resulting template agent:

class MyAgent(Agent):
    name = "MyAgent"

    def __init__(self,
                 env,
                 param_1,
                 param_2,
                 param_n,
                 eval="cumulative_reward",
                 **kwargs):
        Agent.__init__(self, env, eval=eval, **kwargs)

    def fit(self, budget: int):
        """
        ** Must be implemented. **

        Trains the agent, given a computational budget (e.g. number of steps or episodes).
        """
        # code to train the agent
        # ...
        pass

    @classmethod
    def sample_parameters(cls, trial):
        """
        ** Optional **

        Sample hyperparameters for hyperparam optimization using
        Optuna (https://optuna.org/).

        Parameters
        ----------
        trial: optuna.trial
        """
        # Note: param_1 and param_2 are in the constructor.

        # for example, param_1 could be the batch_size...
        param_1 = trial.suggest_categorical('param_1',
                                            [1, 4, 8, 16, 32, 64])
        # ... and param_2 could be a learning_rate
        param_2 = trial.suggest_loguniform('param_2', 1e-5, 1)
        return {
            'param_1': param_1,
            'param_2': param_2,
        }
