
CARL – The Benchmark Library


CARL (context adaptive RL) provides highly configurable contextual extensions to several well-known RL environments. It's designed to test your agent's generalization capabilities in all scenarios where intra-task generalization is important.

Feel free to check out our paper and our short blog post!

Benchmarks

Benchmarks include:

  • OpenAI gym classic control suite extended with several physics context features like gravity or friction

  • OpenAI gym Box2D BipedalWalker, LunarLander and CarRacing, each with its own modification possibilities, like new vehicles to race

  • All Brax locomotion environments with exposed internal features like joint strength or torso mass

  • Super Mario (TOAD-GAN), a procedurally generated jump'n'run game with control over level similarity

  • dm_control environments based on the MuJoCo physics engine, extended with different context features

Screenshot of each environment included in CARL.

For more information, check out our documentation!

Installation

We recommend you use a virtual environment (e.g. Anaconda) to install CARL and its dependencies. We recommend and test with Python 3.9 under Linux.

First, clone our repository and install the basic requirements:

git clone https://github.com/automl/CARL.git --recursive
cd CARL
pip install .

This will only install the basic classic control environments, which should run on most operating systems. For the full set of environments, use the install options:

pip install -e .[box2d,brax,dm_control,mario,rna]

These may not be compatible with Windows systems. The Box2D environment may need to be installed via conda on macOS systems:

conda install -c conda-forge gym-box2d

In general, we test on Linux systems, but aim to keep the benchmark as compatible with macOS as possible. At this point, however, RNA and Mario will not run on any operating system besides Linux.

To install ToadGAN for the Mario environment, first install the system requirements:

# System requirements
sudo apt install libfreetype6-dev xvfb

If you want to use RNA, please take a look at the associated README.

CARL's Contextual Extension

CARL contextually extends an environment by making the context visible and configurable. During training we can therefore encounter different contexts and train for generalization. As an example, we show how Brax' Fetch is extended and embedded by CARL. Different instantiations can be achieved by setting the context features to different values.

CARL contextually extends Brax' Fetch.
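In code, training across contexts boils down to passing a set of context dicts to the environment constructor. Below is a minimal sketch assuming the gymnasium-style reset API; the context feature name GRAVITY_Y and the exact constructor signature are assumptions, so please consult the documentation for the real keys:

from carl.envs import CARLLunarLander

# Three instances that differ only in gravity (hypothetical feature name).
contexts = {
    0: {"GRAVITY_Y": -10.0},
    1: {"GRAVITY_Y": -5.0},
    2: {"GRAVITY_Y": -1.0},
}
env = CARLLunarLander(contexts=contexts)

obs, info = env.reset()  # an instance, and thus a context, is selected on reset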

Cite Us

If you use CARL in your research, please cite our paper on the benchmark:

@article{BenEim2023a,
  author       = {Carolin Benjamins and
                  Theresa Eimer and
                  Frederik Schubert and
                  Aditya Mohan and
                  Sebastian Döhler and
                  André Biedenkapp and
                  Bodo Rosenhahn and
                  Frank Hutter and
                  Marius Lindauer},
  title        = {Contextualize Me - The Case for Context in Reinforcement Learning},
  journal      = {Transactions on Machine Learning Research},
  year         = {2023},
}

References

OpenAI Gym, Brockman et al., 2016, arXiv:1606.01540

Brax – A Differentiable Physics Engine for Large Scale Rigid Body Simulation, Freeman et al., NeurIPS 2021 (Datasets and Benchmarks Track)

TOAD-GAN: Coherent Style Level Generation from a Single Example, Awiszus et al., AIIDE 2020

dm_control: Software and Tasks for Continuous Control, Tunyasuvunakool et al., Software Impacts, 2020

License

CARL falls under the Apache License 2.0 (see the file 'LICENSE'), as permitted by all the work we build on. This includes CARLMario, which is based not on the Nintendo game but on TOAD-GAN and TOAD-GUI, which run under an MIT license. They in turn make use of the Mario AI framework (https://github.com/amidos2006/Mario-AI-Framework). This is not the original game but a replica, explicitly built for research purposes, and it includes a copyright notice (https://github.com/amidos2006/Mario-AI-Framework#copyrights).


carl's Issues

Inquiry Regarding RL Libraries and Curves in CARL Project

Hello CARL Team,

I am currently exploring the CARL project and came across your paper. I have a couple of questions regarding the implementation details mentioned in the paper, as well as the curve generation on the CARL website.

  1. In your paper, you reference the use of an RL code library for your experiments. Could you please specify which RL code library you used for the experiments described in the paper? This information would be valuable for those interested in replicating your results.

  2. Additionally, I noticed the presence of curves on the CARL website, particularly on this page: https://automl.github.io/CARL/main/source/environments/environment_families/classic_control.html. Could you kindly clarify which RL library or framework was used to generate these curves? Understanding the underlying RL library would assist users in better comprehending the provided visualizations.

I appreciate your assistance in providing this information, as it will contribute to a deeper understanding of your work and the CARL project as a whole.

Thank you for your time and consideration.

Integrate context mask

Add an argument "context_mask" to mask out context features, implemented in the getter property method for the context. This is useful if context features are added later on and the network shapes would otherwise mismatch, or if we only want to change and work with a subset of context features. A sketch of the idea follows below.
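A minimal sketch of how such a mask could work, assuming the context is stored as a plain dict (class and attribute names here are illustrative, not CARL's actual internals):

from typing import Optional

class MaskedContextMixin:
    def __init__(self, context: dict, context_mask: Optional[list] = None):
        self._context = context
        self.context_mask = context_mask  # e.g. ["gravity", "masspole"]

    @property
    def context(self) -> dict:
        # Expose only the masked subset of context features.
        if self.context_mask is None:
            return self._context
        return {k: v for k, v in self._context.items() if k in self.context_mask}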

Dataset Folders for CARLRNA

Provide the option to specify several folders, each containing a dataset, instead of the single path supported right now.

Using with stable-baselines3

Hi,

Thank you for this software, it's really useful for my research. I'm currently running some baseline algorithms with stable-baselines3 and somehow have trouble instantiating the replay buffer: it demands the action space to be of type 'gymnasium.spaces.box.Box', although the CARL environment's action space is of type 'gym.spaces.box.Box'. Let me know how I could fix it? Thanks!
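One possible workaround (a sketch, not an official fix): convert the legacy gym Box into an equivalent gymnasium Box before handing the environment to stable-baselines3.

import gymnasium.spaces
import numpy as np

def to_gymnasium_box(space) -> gymnasium.spaces.Box:
    # Build a gymnasium Box with the same bounds and dtype as the gym Box.
    return gymnasium.spaces.Box(
        low=np.asarray(space.low), high=np.asarray(space.high), dtype=space.dtype
    )

# Usage: env.action_space = to_gymnasium_box(env.action_space)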

more advanced instance selection

Right now, only the modes random and roundrobin are implemented for instance selection. For more sophisticated schedules, there should be an option to pass the selector as a class or function. A base instance selector should be written for this, with random and roundrobin implemented on top of it, as sketched below.
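A rough sketch of such a base class (all names are assumptions, not existing CARL API):

import random

class InstanceSelector:
    def __init__(self, instances: list):
        self.instances = instances

    def select(self):
        raise NotImplementedError

class RandomSelector(InstanceSelector):
    def select(self):
        return random.choice(self.instances)

class RoundRobinSelector(InstanceSelector):
    def __init__(self, instances: list):
        super().__init__(instances)
        self._idx = -1

    def select(self):
        # Cycle through the instances in order.
        self._idx = (self._idx + 1) % len(self.instances)
        return self.instances[self._idx]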

Different observation/state array types returned with `hide_context` True vs False

The observation/state returned on reset and step with hide_context set to True is of type <class 'jaxlib.xla_extension.DeviceArray'>, whereas the observation/state returned with hide_context set to False is of type np.ndarray.

Specifically:
The output of state from this line is of type jaxlib.xla_extension.DeviceArray, which is preserved when hide_context is True.

state = self.env.reset(**kwargs)

The state returned by build_context_adaptive_state is of type np.ndarray.

return state

This creates issues downstream where a numpy array is expected: e.g. when the observation is converted to a torch tensor, a Jax DeviceArray raises an error. A fix should consistently return the observation/state as a numpy array. This was tested with the CARLAnt environment.
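A minimal sketch of such a fix, applied wherever the state is returned:

import numpy as np

def as_numpy_state(state):
    # np.asarray converts a Jax DeviceArray to np.ndarray and is
    # effectively a no-op for inputs that are already numpy arrays.
    return np.asarray(state)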

add citation.cff

In general, the in-code __authors__ is for the people who developed the code; for paper references, you can use GitHub's CITATION.cff.

An example is the one for auto-sklearn, where the main section is for the software and the preferred-citation part is for the paper. When you use the "cite this repository" feature on the right, it will use the preferred-citation part, in which you can include an accompanying paper.

The full spec is here if you need the reference :)

Originally posted by @eddiebergman in #32 (comment)
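A minimal sketch of what such a CITATION.cff could look like for CARL (author list abbreviated; see the CFF spec for the full schema):

cff-version: 1.2.0
message: "If you use CARL, please cite the paper below."
title: "CARL"
authors:
  - family-names: Benjamins
    given-names: Carolin
preferred-citation:
  type: article
  title: "Contextualize Me - The Case for Context in Reinforcement Learning"
  journal: "Transactions on Machine Learning Research"
  year: 2023
  authors:
    - family-names: Benjamins
      given-names: Carolin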

Subtle error in action_space type for dm_control (gym/gymnasium)

In the codebase we first wrap dm_control environments into gym with MujocoToGymWrapper. But when we then wrap them into gymnasium, we don't cast the action_space (and possibly other attributes) into the corresponding gymnasium spaces class. Making MujocoToGymWrapper use gymnasium instead of gym would probably fix this problem.
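A sketch of what the gymnasium-based wrapper could do (class and attribute names are assumptions; only the action space is shown):

import gymnasium.spaces
import numpy as np

class MujocoToGymnasiumWrapper:
    def __init__(self, dm_env):
        # dm_control action specs are BoundedArrays with minimum/maximum.
        spec = dm_env.action_spec()
        self.action_space = gymnasium.spaces.Box(
            low=np.asarray(spec.minimum, dtype=np.float32),
            high=np.asarray(spec.maximum, dtype=np.float32),
            dtype=np.float32,
        )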

Vector support for non-Brax environments

Brax has a simple system for vectorization that we can support easily; the same may not be true for all other environments, where we usually have a list of envs. It would still be nice to support the gym vector env, though. The important question to answer here is how to handle context resets and whether we can choose different contexts for parallel envs. A sketch of one possible shape follows below.
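One way this could look with gymnasium's vector API, assigning each parallel env its own context (the context feature name is hypothetical, and the context handling on reset is exactly the open question here):

import gymnasium
from gymnasium.wrappers import FlattenObservation
from carl.envs import CARLLunarLander

def make_env(context: dict):
    def _init():
        return FlattenObservation(CARLLunarLander(contexts={0: context}))
    return _init

# Each parallel env is fixed to one context; resets keep it unchanged.
env_fns = [make_env({"GRAVITY_Y": -g}) for g in (1.0, 5.0, 10.0)]
vec_env = gymnasium.vector.AsyncVectorEnv(env_fns)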

Leaderboards?

Do we want leaderboards like OpenAI? If yes, how do we do this?

Fix Installation

Make env families separately installable (to circumvent the Box2D installation error on macOS).

  • update setup.py
  • update readme: how to install modularly, and where the installation has been tested
  • update requirements.txt (soften requirements)
  • update src/envs/__init__.py (see how it's done for RNA)
  • update documentation

Support for Python 3.8

Hi,

I am trying to use tf-agents and CARL together on GPU, and to make sure everything is compatible I am using python=3.8, tensorflow=2.3, tf-agents=0.6.0. But CARL doesn't work with python=3.8. Support for Python 3.8 would be great.

Configurable initial state distributions

Motivated by this Twitter thread, it might be convenient to make the initial state distributions of the environments in CARL configurable.

This has already been done for e.g. MountainCarContinuous, but for others it's still missing; see the sketch below.
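A minimal sketch of a configurable initial state distribution (the class and parameter names are illustrative; the defaults mirror MountainCar's usual uniform start-position range):

import numpy as np

class InitialStateDistribution:
    def __init__(self, low: float = -0.6, high: float = -0.4, seed: int = 0):
        self.low, self.high = low, high
        self.rng = np.random.default_rng(seed)

    def sample(self) -> float:
        # Draw the start position uniformly from the configured interval.
        return float(self.rng.uniform(self.low, self.high))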

Allow sampling contexts on the fly on reset

It would be nice to just pass bounds for the context features to the environment as a dict and have it sample contexts from this distribution on every reset, instead of passing a pre-defined list of contexts; see the sketch below.

P.S. I'm new to this repo, so I don't know if this suggestion comes with performance/implementation challenges.
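A rough sketch of the proposed behavior, with a sampler that an env could call inside reset (all names here are hypothetical):

import numpy as np

class OnTheFlyContextSampler:
    def __init__(self, bounds: dict, seed: int = 0):
        # bounds maps a context feature name to a (low, high) tuple.
        self.bounds = bounds
        self.rng = np.random.default_rng(seed)

    def sample(self) -> dict:
        return {k: float(self.rng.uniform(lo, hi)) for k, (lo, hi) in self.bounds.items()}

sampler = OnTheFlyContextSampler({"gravity": (1.0, 20.0), "masspole": (0.05, 0.5)})
context = sampler.sample()  # would be called on every env.reset()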

CARLVehicleRacing: More dynamics

  • add weather to the world, e.g., wind as a per-episode constant and rain reducing the friction
  • add different aerodynamic drags

Integrate DM Control

  • (convert test file to jupyter notebook. I would like to keep that)
  • check tests / write more to increase coverage
  • update README.md
  • update documentation
  • add dm_control to requirements
  • support dict observation space

Uniform step function in RNA

As the base env's step function is named differently, the CARLEnv step is overridden in RNA. That isn't ideal and should be changed, probably by wrapping the base method as sketched below.
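A sketch of that wrapping (the base method name execute is hypothetical):

class RNAStepAdapter:
    def __init__(self, base_env):
        self.base_env = base_env

    def step(self, action):
        # Delegate to the base env's differently named step method.
        return self.base_env.execute(action)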

Documentation

  • Briefly describe contextual RL and what context is

CARLLunarLander not training properly

Hi,

I have been using CARL with TensorFlow Agents to test some things. Everything was working fine until I updated from CARL version 0.2.0 to 1.0.0. After this update, training on CARLLunarLander would not lead to good performance. I was able to replicate similar behavior with the stable-baselines3 PPO agent; the code is below.


from functools import partial

import gymnasium
from gymnasium.wrappers import FlattenObservation

from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.evaluation import evaluate_policy

from carl.envs import CARLLunarLander

def env_fn():
    return FlattenObservation(CARLLunarLander())

vec_env_ll = make_vec_env(env_fn, n_envs=16)
ppo_agent = PPO("MlpPolicy", vec_env_ll, verbose=1)

ppo_agent.learn(total_timesteps=int(1e6))

mean_reward, std_reward = evaluate_policy(ppo_agent, ppo_agent.get_env(), n_eval_episodes=100)
print("Reward: Mean = {0}, Standard deviation = {1}".format(mean_reward, std_reward))
# Output: Reward: Mean = -130.27439046, Standard deviation = 45.32413066301294

env_fn_gymn = partial(
    gymnasium.make, "LunarLander-v2"
)
vec_env_gymn = make_vec_env(env_fn_gymn, n_envs=16)
ppo_agent2 = PPO("MlpPolicy", vec_env_gymn, verbose=1)

ppo_agent2.learn(total_timesteps=int(1e6))

mean_reward, std_reward = evaluate_policy(ppo_agent2, ppo_agent2.get_env(), n_eval_episodes=100)
print("Reward: Mean = {0}, Standard deviation = {1}".format(mean_reward, std_reward))
# Output: Reward: Mean = 261.80816654, Standard deviation = 38.24869197512363

The expected behavior is that the rewards obtained by training on CARLLunarLander and on gymnasium.make("LunarLander-v2") should be close to each other, since they are identical environments. However, this is not the case.
The jupyter notebook with the code and more details is here.

PS: Similar behavior can be seen with the TensorFlow Agents PPO agent.

Thanks.

DMControlEnv (walker/quadruped) has zero gravity in default context

Hi, I am trying to use the dm_control envs walker/quadruped with the default context value, but gravity is zero even though the context has the correct value. Here are the packages installed in my Python 3.9 environment:

absl-py==2.0.0
antlr4-python3-runtime==4.9.3
appdirs==1.4.4
asttokens==2.4.1
astunparse==1.6.3
beautifulsoup4==4.12.2
blinker==1.7.0
brax==0.9.3
bs4==0.0.1
cachetools==5.3.2
carl-bench @ git+https://github.com/automl/CARL.git@b96661be3a2d10bf77969f2bdca6d05b84d54673
certifi==2023.11.17
chardet==5.2.0
charset-normalizer==3.3.2
chex==0.1.85
click==8.1.7
cloudpickle==3.0.0
comm==0.2.0
ConfigArgParse==1.7
ConfigSpace==0.7.1
contextlib2==21.6.0
contourpy==1.2.0
crafter==1.8.2
cycler==0.12.1
dataclasses==0.6
DataProperty==1.0.1
debugpy==1.8.0
decorator==4.4.2
dm-control==1.0.15
dm-env==1.6
dm-tree==0.1.8
docker-pycreds==0.4.0
etils==1.5.2
exceptiongroup==1.2.0
executing==2.0.1
Farama-Notifications==0.0.4
Flask==3.0.0
Flask-Cors==4.0.0
flatbuffers==1.12
flax==0.7.5
fonttools==4.45.1
fsspec==2023.10.0
gast==0.4.0
gitdb==4.0.11
GitPython==3.1.40
glfw==2.6.3
google-auth==2.23.4
google-auth-oauthlib==0.4.6
google-pasta==0.2.0
grpcio==1.59.3
gym==0.26.2
gym-notices==0.0.8
gymnasium==0.29.1
h5py==3.10.0
idna==3.6
imageio==2.33.0
imageio-ffmpeg==0.4.9
importlib-metadata==6.8.0
importlib-resources==6.1.1
ipykernel==6.27.1
ipython==8.18.1
itsdangerous==2.1.2
jax==0.4.20
jaxlib==0.4.20+cuda11.cudnn86
jaxopt==0.8.2
jedi==0.19.1
Jinja2==3.1.2
jupyter_client==8.6.0
jupyter_core==5.5.0
keras==2.9.0
Keras-Preprocessing==1.1.2
kiwisolver==1.4.5
labmaze==1.0.6
libclang==16.0.6
lxml==4.9.3
Markdown==3.5.1
markdown-it-py==3.0.0
MarkupSafe==2.1.3
matplotlib==3.8.2
matplotlib-inline==0.1.6
mbstrdecoder==1.1.3
mdurl==0.1.2
ml-collections==0.1.1
ml-dtypes==0.3.1
more-itertools==10.1.0
moviepy==1.0.3
msgpack==1.0.7
mujoco==3.0.1
nest-asyncio==1.5.8
numpy==1.26.2
numpyencoder==0.3.0
nvidia-cublas-cu11==11.11.3.6
nvidia-cuda-cupti-cu11==11.8.87
nvidia-cuda-nvcc-cu11==11.8.89
nvidia-cuda-nvrtc-cu11==11.8.89
nvidia-cuda-runtime-cu11==11.8.89
nvidia-cudnn-cu11==8.9.6.50
nvidia-cufft-cu11==10.9.0.58
nvidia-cusolver-cu11==11.4.1.48
nvidia-cusparse-cu11==11.7.5.86
nvidia-nccl-cu11==2.19.3
oauthlib==3.2.2
omegaconf==2.3.0
opensimplex==0.4.5
opt-einsum==3.3.0
optax==0.1.7
orbax-checkpoint==0.4.3
packaging==23.2
pandas==2.1.3
parso==0.8.3
pathvalidate==3.2.0
pexpect==4.9.0
Pillow==10.1.0
platformdirs==4.0.0
proglog==0.1.10
prompt-toolkit==3.0.41
protobuf==3.20.0
psutil==5.9.6
ptyprocess==0.7.0
pure-eval==0.2.2
pyasn1==0.5.1
pyasn1-modules==0.3.0
pygame==2.5.2
pyglet==2.0.10
Pygments==2.17.2
PyOpenGL==3.1.7
pyparsing==3.1.1
pytablewriter==1.2.0
python-dateutil==2.8.2
pytinyrenderer==0.0.14
pytz==2023.3.post1
PyYAML==6.0.1
pyzmq==25.1.1
requests==2.31.0
requests-oauthlib==1.3.1
rich==13.7.0
rsa==4.9
ruamel.yaml==0.17.32
ruamel.yaml.clib==0.2.8
scipy==1.11.4
sentry-sdk==1.38.0
setproctitle==1.3.3
six==1.16.0
smmap==5.0.1
soupsieve==2.5
stack-data==0.6.3
tabledata==1.3.3
tabulate==0.9.0
tcolorpy==0.1.4
tensorboard==2.9.0
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.1
tensorboardX==2.6.2.2
tensorflow-cpu==2.9.0
tensorflow-estimator==2.9.0
tensorflow-io-gcs-filesystem==0.34.0
tensorflow-probability==0.23.0
tensorstore==0.1.50
termcolor==2.3.0
toolz==0.12.0
tornado==6.4
tqdm==4.66.1
traitlets==5.14.0
trimesh==4.0.5
typepy==1.3.2
typing_extensions==4.8.0
tzdata==2023.3
urllib3==2.1.0
wandb==0.16.0
wcwidth==0.2.12
Werkzeug==3.0.1
wrapt==1.16.0
xvfbwrapper==0.2.9
zipp==3.17.0

General improvements list

Just some things I noted that might be nice to know about and that I can contribute. None are urgent, just documenting them:

  • The tests folder should be moved up one directory, as it isn't part of the source code. Doing so means the tests aren't included in a distribution if CARL is put up on PyPI.

  • I would recommend pytest over the built-in unittest; it's a lot more flexible, but this probably needs some guidance.

  • Seems like the .env file in the src directory shouldn't be required?

  • The setup.cfg could be a little more explicit, especially with declaring where the package actually is. Here's some reading if you like.

  • Seems everything to do with submodules is no longer relevant; the .gitmodules file and the instructions in the README.md with respect to --recursive can be deleted.

  • Could set up GitHub Actions to do some general checks

    • Run tests
    • Set up code coverage, gives reports like this and even more detailed things like this.
    • Can do checks for formatting, using black, isort, flake8 and mypy.
  • Set up a pyproject.toml and .flake8, which will configure all of those formatting tools

  • A Makefile for basic tasks, which makes contributors' lives (and your own) easier. An example can be seen in this PR for auto-sklearn.

  • Pip has a hard time resolving all the dependencies for version #b7382fe. That's not easy to fix, but freezing requirements to specific versions might help; it has other issues down the line, though, such as testing new versions of libraries or getting the latest updates.

About brax

Hello, the brax version and the main branch seem to be mismatched: I cannot import '_SYSTEM_CONFIG' from brax.envs.ant.

In carl/envs/brax/carl_ant.py: from brax.envs.ant import _SYSTEM_CONFIG, Ant
