sail-sg / envpool

C++-based high-performance parallel environment execution engine (vectorized env) for general RL environments.

Home Page: https://envpool.readthedocs.io

License: Apache License 2.0

Dockerfile 0.71% Makefile 0.80% Starlark 6.55% C++ 64.87% Python 26.80% C 0.19% Shell 0.08%
reinforcement-learning parallel-processing cpp17 pybind11 reinforcement-learning-environments threadpool atari-games vizdoom gym high-performance-computing

envpool's Introduction



EnvPool is a C++-based batched environment pool built on pybind11 and a thread pool. It delivers high performance (~1M raw FPS with Atari games and ~3M raw FPS with the Mujoco simulator on a DGX-A100) and compatible APIs (it supports both gym and dm_env interfaces, both sync and async execution, and both single- and multi-player environments). It currently supports Atari games, Mujoco, classic control tasks, ViZDoom, DeepMind Control, and more; see the documentation for the full list.

EnvPool's key highlights are its raw throughput, its batch-first design, and its drop-in compatibility with existing gym/dm_env-based code.

Check out our arXiv paper for more details!

Installation

PyPI

EnvPool is currently hosted on PyPI. It requires Python >= 3.7.

You can simply install EnvPool with the following command:

$ pip install envpool

After installation, open a Python console and type

import envpool
print(envpool.__version__)

If no error occurs, you have successfully installed EnvPool.

From Source

Please refer to the build-from-source guideline in the documentation.

Documentation

The tutorials and API documentation are hosted on envpool.readthedocs.io.

The example scripts are under the examples/ folder; benchmark scripts are under the benchmark/ folder.

Benchmark Results

We benchmark with the ALE Atari environment PongNoFrameskip-v4 (with environment wrappers from OpenAI Baselines) and the Mujoco environment Ant-v3 on different hardware setups, including a TPUv3-8 virtual machine (VM) with 96 CPU cores and 2 NUMA nodes, and an NVIDIA DGX-A100 with 256 CPU cores and 8 NUMA nodes. Baselines include 1) a naive Python for-loop; 2) the most popular RL environment parallelization approach, Python subprocess-based execution, e.g., gym.vector_env; 3) Sample Factory, to our knowledge the fastest RL environment executor prior to EnvPool.

We report EnvPool performance in sync mode, async mode, and NUMA + async mode, compared with the baselines across different numbers of workers (i.e., numbers of CPU cores). As the results show, EnvPool achieves significant improvements over the baselines in all settings. On the high-end setup, EnvPool reaches 1 million frames per second on Atari and 3 million frames per second on Mujoco with 256 CPU cores, which is 14.9x / 19.6x the gym.vector_env baseline. On a typical PC setup with 12 CPU cores, EnvPool's throughput is 3.1x / 2.9x that of gym.vector_env.

Atari (Highest FPS)  | Laptop (12) | Workstation (32) | TPU-VM (96) | DGX-A100 (256)
---------------------|-------------|------------------|-------------|---------------
For-loop             | 4,893       | 7,914            | 3,993       | 4,640
Subprocess           | 15,863      | 47,699           | 46,910      | 71,943
Sample-Factory       | 28,216      | 138,847          | 222,327     | 707,494
EnvPool (sync)       | 37,396      | 133,824          | 170,380     | 427,851
EnvPool (async)      | 49,439      | 200,428          | 359,559     | 891,286
EnvPool (numa+async) | /           | /                | 373,169     | 1,069,922

Mujoco (Highest FPS) | Laptop (12) | Workstation (32) | TPU-VM (96) | DGX-A100 (256)
---------------------|-------------|------------------|-------------|---------------
For-loop             | 12,861      | 20,298           | 10,474      | 11,569
Subprocess           | 36,586      | 105,432          | 87,403      | 163,656
Sample-Factory       | 62,510      | 309,264          | 461,515     | 1,573,262
EnvPool (sync)       | 66,622      | 380,950          | 296,681     | 949,787
EnvPool (async)      | 105,126     | 582,446          | 887,540     | 2,363,864
EnvPool (numa+async) | /           | /                | 896,830     | 3,134,287

Please refer to the benchmark page for more details.

API Usage

The following shows both synchronous and asynchronous API usage of EnvPool. You can also run the full script at examples/env_step.py.

Synchronous API

import envpool
import numpy as np

# make gym env
env = envpool.make("Pong-v5", env_type="gym", num_envs=100)
# or use envpool.make_gym(...)
obs = env.reset()  # should be (100, 4, 84, 84)
act = np.zeros(100, dtype=int)
obs, rew, term, trunc, info = env.step(act)

In synchronous mode, envpool closely resembles openai-gym/dm-env: it has reset and step functions with the same meaning. There is one difference, however: batch interaction is the default in envpool. Therefore, when creating the envpool, a num_envs argument denotes how many envs you would like to run in parallel.

env = envpool.make("Pong-v5", env_type="gym", num_envs=100)

The first dimension of action passed to the step function should equal num_envs.

act = np.zeros(100, dtype=int)

You don't need to manually reset an environment when its done flag is true; auto-reset is enabled by default for all envs in envpool.
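For example, the synchronous snippet above can be extended into a rollout loop that never calls reset inside the loop. This is a minimal sketch using a random policy (the step count is arbitrary):

import envpool
import numpy as np

env = envpool.make("Pong-v5", env_type="gym", num_envs=100)
obs = env.reset()
for _ in range(1000):
    # random actions; finished envs are reset automatically by envpool
    act = np.random.randint(env.action_space.n, size=100)
    obs, rew, term, trunc, info = env.step(act)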

Asynchronous API

import envpool
import numpy as np

# make asynchronous
num_envs = 64
batch_size = 16
env = envpool.make("Pong-v5", env_type="gym", num_envs=num_envs, batch_size=batch_size)
action_num = env.action_space.n
env.async_reset()  # send the initial reset signal to all envs
while True:
    obs, rew, term, trunc, info = env.recv()
    env_id = info["env_id"]
    action = np.random.randint(action_num, size=batch_size)
    env.send(action, env_id)

In asynchronous mode, the step function is split into two parts: the send and recv functions. send takes two arguments, a batch of actions and the corresponding env_id that each action should be sent to. Unlike step, send does not wait for the envs to execute and return the next state; it returns immediately after the actions are fed to the envs (which is why it is called async mode).

env.send(action, env_id)

To get the "next states", we need to call the recv function. However, recv does not guarantee that you will get back the "next states" of the envs you just called send on. Instead, whatever envs finishes execution gets recved first.

state = env.recv()

Besides num_envs, there is one more argument, batch_size. While num_envs defines how many envs in total are managed by the envpool, batch_size specifies the number of envs involved in each interaction with the envpool. For example, with 64 envs executing in the envpool, each send/recv call interacts with a batch of 16 envs.

envpool.make("Pong-v5", env_type="gym", num_envs=64, batch_size=16)

There are other configurable arguments with envpool.make; please check out EnvPool Python interface introduction.

Contributing

EnvPool is still under development. More environments will be added, and we always welcome contributions to help make EnvPool better. If you would like to contribute, please check out our contribution guideline.

License

EnvPool is released under the Apache 2.0 license.

Other third-party source code and data are under their corresponding licenses; we do not include their source code or data in this repo.

Citing EnvPool

If you find EnvPool useful, please cite it in your publications.

@inproceedings{weng2022envpool,
 author = {Weng, Jiayi and Lin, Min and Huang, Shengyi and Liu, Bo and Makoviichuk, Denys and Makoviychuk, Viktor and Liu, Zichen and Song, Yufan and Luo, Ting and Jiang, Yukun and Xu, Zhongwen and Yan, Shuicheng},
 booktitle = {Advances in Neural Information Processing Systems},
 editor = {S. Koyejo and S. Mohamed and A. Agarwal and D. Belgrave and K. Cho and A. Oh},
 pages = {22409--22421},
 publisher = {Curran Associates, Inc.},
 title = {Env{P}ool: A Highly Parallel Reinforcement Learning Environment Execution Engine},
 url = {https://proceedings.neurips.cc/paper_files/paper/2022/file/8caaf08e49ddbad6694fae067442ee21-Paper-Datasets_and_Benchmarks.pdf},
 volume = {35},
 year = {2022}
}

Disclaimer

This is not an official Sea Limited or Garena Online Private Limited product.

envpool's People

Contributors

51616, alicia1529, araffin, benjamin-eecs, ethanluoyc, hansbug, leninilyich, lkevinzc, markus28, mavenlin, peilinrao, quangr, trinkle23897, vwxyzjn, wangsiping97, yufansong, yukunj


envpool's Issues

[Feature Request] More APIs for Environment Parameters Updating

Motivation

It seems that this project can contribute to a wide range of robotic learning research directions 👍

However, a core limitation of the current version is that there is no API for curriculum learning, domain randomization, or other environment-updating functions.

These APIs are very common in recent works on RL for legged robots, quadrotors, dexterous hands, etc.

For example, we might want the training environment to start from an easy stage and then become progressively harder.

Normally, we can parameterize the env with some modifiable parameters; updating those parameters then automatically changes the env.

Solution

A simple solution is to add some APIs to the main env classes, such as update_parameters and init_parameters, backed by compatible C++ functions. A good reference is this module, which updates the env parameters and performs randomization for robot learning.
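A hypothetical sketch of what such an API could look like from Python (update_parameters and its keyword arguments are illustrative only; nothing like this exists in envpool yet):

import envpool
import numpy as np

env = envpool.make_gym("Ant-v3", num_envs=16)
env.reset()

# hypothetical curriculum: gradually increase task difficulty
for difficulty in np.linspace(0.1, 1.0, 10):
    # update_parameters is the proposed (not existing) API from this feature request
    env.update_parameters(env_id=np.arange(16), difficulty=difficulty)
    # ... collect rollouts / train at this difficulty ...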

Alternative Solution

Add child classes of Env (e.g., ParameterizedEnv) that include the needed APIs.

Checklist

  • [Done] I have checked that there is no similar issue in the repo (required)

[BUG] dangling pointer for multidim action array

Describe the bug

template <typename dtype>
Array NumpyToArray(const py::array& arr) {
  using array_t = py::array_t<dtype, py::array::c_style | py::array::forcecast>;
  array_t arr_t(arr);
  ShapeSpec spec(arr_t.itemsize(),
                 std::vector<int>(arr_t.shape(), arr_t.shape() + arr_t.ndim()));
  return Array(spec, reinterpret_cast<char*>(arr_t.mutable_data()));
}

When the input numpy array is not stored in C-style order, NumpyToArray() creates a new local numpy object arr_t and then returns an Array pointing to deallocated memory. For one-dimensional action arrays this code works fine, because 1-D arrays are both C-style and Fortran-style contiguous. But in the multi-dimensional case, the env sometimes receives the wrong input.

To Reproduce

Pass in a non-C-style action array, e.g., a two-dimensional action input cast into Fortran order. I can't provide a short reproduction example because it requires an env that takes 2-D action input.
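For illustration only (not a full reproduction), the kind of input layout described above can be produced like this; the shape is hypothetical:

import numpy as np

act = np.zeros((8, 6), dtype=np.float64)  # hypothetical (num_envs, act_dim) action batch
act_f = np.asfortranarray(act)            # Fortran-ordered, i.e., not C-contiguous
print(act_f.flags["C_CONTIGUOUS"])        # False: forcecast will copy into a temporary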

Expected behavior

The input action sometimes becomes garbled.

Additional context


It may be safer to add an assertion such as assert(arr_t.ptr() == arr.ptr()) in this case. It's not safe to assume py::array_t will return a reference rather than a new object. You may also want to check whether any other code is written under this wrong assumption. Anyway, I think your work is remarkable; I'm enjoying training agents with envpool.

Reason and Possible fixes

I have already opened a pull request that casts the action array into C-style order:
#64

Checklist

  • I have checked that there is no similar issue in the repo (required)
  • I have read the documentation (required)
  • I have provided a minimal working example to reproduce the bug (required)

[BUG] The reset function doesn't actually seem to reset the environment, at least for the "Pong-v5" env

The reset function doesn't actually seem to reset the environment, at least for the "Pong-v5" env. In the following code, I'm using a random policy to step the env. I first step the env n_init_steps times, then reset it, then run 3 episodes to completion and check their episode returns. After resetting the env, I expect the episode return to still be close to -21 because the policy is random. But as I increase n_init_steps, the episode return starts increasing. If I don't reset the env after the n_init_steps steps, the episode returns for the 3 episodes are close to -21 as expected. The episode returns calculated assuming the env was not reset are around -21. Am I doing something wrong, or is this a bug?

def test_envpool_resets_correctly() -> None:
    def gather_rewards(n_init_steps: int, reset: bool = True):
        env = envpool.make_gym("Pong-v5", num_envs=1, seed=0)
        ep_returns: list[float] = []
        def policy():
            return np.asarray([env.action_space.sample()])
        curr_ep_return = 0
        for _ in range(n_init_steps):
            _, rewards, dones, _ = env.step(policy())
            curr_ep_return += rewards.item()
            if dones.item():
                ep_returns.append(curr_ep_return)
                curr_ep_return = 0
        old_ep_return = curr_ep_return
        if reset:
            env.reset()
            curr_ep_return = 0
        # Copy ep_returns to avoid changing the original list
        # old_ep_returns assumes env was not reset
        old_ep_returns = [x for x in ep_returns]
        for _ in range(3):
            while True:
                _, rewards, dones, _ = env.step(policy())
                curr_ep_return += rewards.item()
                old_ep_return += rewards.item()
                if dones.item():
                    old_ep_returns.append(old_ep_return)
                    ep_returns.append(curr_ep_return)
                    old_ep_return = 0
                    curr_ep_return = 0
                    break
        return ep_returns, old_ep_returns

    print(gather_rewards(0))  # ([-21.0, -21.0, -19.0], [-21.0, -21.0, -19.0])
    print(gather_rewards(500))  # ([-10.0, -21.0, -21.0], [-21.0, -21.0, -21.0])
    print(gather_rewards(800))  # ([-2.0, -21.0, -21.0], [-20.0, -21.0, -21.0])
    print(gather_rewards(500, False))  # ([-20.0, -20.0, -21.0], [-20.0, -20.0, -21.0])
    print(gather_rewards(800, False))  # ([-21.0, -20.0, -20.0, -21.0], [-21.0, -20.0, -20.0, -21.0])

Originally posted by @AdityaGudimella in #119

[Feature Request] ACME Integration

https://github.com/deepmind/acme

Road Map:

@TianyiSun316

  • Go through ACME codebase and integrate vector_env to the available algorithms;
  • Write Atari examples;
  • Check Atari performance: Pong and Breakout;
  • Submit PR;

@LeoGuo98

  • Do some experiments with sample efficiency (actually you can try out with different libraries, either ACME, tianshou, or sb3, this doesn't depend on the previous item)

Resources:

tianshou: #51
stable-baselines3: #39
cleanrl: #48 #53

cc @zhongwen

[BUG] Failed to make several Atari environments list in the docs

Describe the bug

Making several Atari environments fails with a 'Wrap ROM error'.

List of envs:

Combat-v5
Joust-v5
MazeCraze-v5
Warlords-v5

To Reproduce

import envpool
envpool.make_gym('Combat-v5')
Attempt to wrap ROM "/home/benjamin/anaconda3/envs/envpool/lib/python3.8/site-packages/envpool/atari/atari_roms/combat/combat.bin"(0ef64cdbecccb7049752a3de0b7ade14) failed.
If you're using an MD5 mismatched ROM, please make sure the filename is in snake case.
e.g., space_invaders.bin

For a list of supported ROMs see https://github.com/mgbellemare/Arcade-Learning-Environment

Expected behavior

Make those envs successfully

Screenshots


System info

Describe the characteristic of your environment:

  • envpool is installed via pip
  • envpool version: 0.6.1.post1
  • numpy version: 1.22.4
  • Python version: 3.8.8
0.6.1.post1 1.22.4 3.8.8 (default, Apr 13 2021, 19:58:26)
[GCC 7.3.0] linux

Checklist

  • I have checked that there is no similar issue in the repo (required)
  • I have read the documentation (required)
  • I have provided a minimal working example to reproduce the bug (required)

[Feature Request] async_reset supports multiple calls until getting obs from all envs

Motivation

In my project, I set each env's initial state from the obs returned by step (sync mode) or recv (async mode); only after this initialization do I start training or evaluation. Everything works in sync mode. However, in async mode, with env_num 16 and batch_size 8 for example, I call async_reset once and get the initial state of 8 envs. If I call async_reset again, an error is raised, so I cannot initialize all envs' initial states at the beginning.

Solution

Allow calling async_reset multiple times to get all envs' observations: in the example above, the first call would reset 8 envs and the second call would reset the other 8.

Alternatives

Alternatively, it would also be fine if reset (sync mode) could be called in async mode and behaved exactly as in sync mode.
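One possible workaround under the current API, assuming a single async_reset signals all envs (as the README's comment suggests) and that info["env_id"] identifies the envs in each received batch, is to keep receiving until every env has reported its initial observation. A sketch:

import envpool
import numpy as np

num_envs, batch_size = 16, 8
env = envpool.make_gym("Pong-v5", num_envs=num_envs, batch_size=batch_size)

env.async_reset()  # signals all envs once
initial_obs = {}
while len(initial_obs) < num_envs:
    obs, rew, term, trunc, info = env.recv()
    for i, eid in enumerate(info["env_id"]):
        if int(eid) not in initial_obs:
            initial_obs[int(eid)] = obs[i]
    # keep the already-received envs running so the rest can come through
    env.send(np.zeros(len(info["env_id"]), dtype=int), info["env_id"])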

Additional context

No.

Checklist

  • I have checked that there is no similar issue in the repo (required)

[Feature Request] Add Pixel-level Observations for mujoco

Motivation

#107 (comment)

Solution

Init thought:

Add a render or off_screen_render method in both versions of mujoco_env.h.

Additional context

mujoco_py uses MjRenderPool for multiprocess off-screen and on-screen rendering on multiple GPUs.

Road Map

  • Go through their record.cc and get familiar with off-screen rendering using 3 different OpenGL contexts: GLFW, OSMesa, and EGL
  • Update BUILD file
  • dm_control: Physics.render and Camera
  • mujoco_py
  • Add some unit tests (good to submit the first PR here);

Resource

dm_control

mujoco_py

FrameStack Wrapper

Checklist

  • I have checked that there is no similar issue in the repo (required)

[BUG] Reward is not deterministic after seeding the env

Describe the bug

I use envpool to make HalfCheetah-v3 with a fixed seed, but the rewards are not the same across several runs. Specifically, only the reward returned by the first env is non-deterministic; the other envs are fine. If num_envs is small, this bug does not occur.

To Reproduce

import envpool
import numpy as np

def random_rollout():
    np.random.seed(0)
    n = 32
    envs = envpool.make_gym('HalfCheetah-v3', num_envs=n, seed=123)
    envs.reset()
    rew_sum = 0
    for _ in range(10):
        action = np.random.rand(n, envs.action_space.shape[0])
        obs, rew, done, info = envs.step(action)
        rew_sum += rew
    envs.close()
    return rew_sum


if __name__ == "__main__":
    a = random_rollout()
    b = random_rollout()
    print(a - b)

Output:

[-0.01131058  0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.
  0.          0.        ]

Expected behavior

The reward should be deterministic after seeding.

System info

Describe the characteristic of your environment:

  • envpool version: '0.6.0'
  • envpool is installed via pip
  • Python version: 3.8.10
0.6.0 1.21.5 3.8.10 (default, Jun  4 2021, 15:09:15) 
[GCC 7.5.0] linux

Checklist

  • I have checked that there is no similar issue in the repo (required)
  • I have read the documentation (required)
  • I have provided a minimal working example to reproduce the bug (required)

[Feature Request] Procgen integration

https://github.com/openai/procgen

Env List:

  • bigfish
  • bossfight
  • caveflyer
  • chaser
  • climber
  • coinrun
  • dodgeball
  • fruitbot
  • heist
  • jumper
  • leaper
  • maze
  • miner
  • ninja
  • plunder
  • starpilot

Road Map:

  • Get comfortable with current codebase, go through https://envpool.readthedocs.io/en/latest/pages/env.html and add a toy environment by yourself locally;
  • Download Procgen and run on your local machine, try with different env settings and see the actual behavior;
  • Go through their source code https://github.com/openai/procgen/tree/master/procgen/src, understand the code structure and where we can bind EnvPool APIs (I think the entry is vecgame.cpp);
  • Integrate only one game and let it work;
  • Add some unit tests (good to submit the first PR here);
  • Integrate other games (submit another PR) and related tests.

Resources:

[Feature Request] Template for private env that uses envpool as a dependency

Motivation

It is desirable that one can develop their own env without having to work inside envpool's code base, while still being able to register the env with envpool and create it with the make function. This already seems possible with the current code base; we just need a template repo.

Solution

import envpool
import my_private_env
my_env = envpool.make("MyPrivateEnv")

Here my_private_env is developed as a separate package.

Checklist

  • I have checked that there is no similar issue in the repo (required)

cc @zhongwen

[BUG] Segfault when batch size is larger than 255 on Atari environments

Describe the bug

Segfault when batch size is larger than 255 on Atari environments

MuJoCo environment seems to work well.

To Reproduce

Steps to reproduce the behavior.

import time

import envpool
import numpy as np

batch_size = 256  # set to 255 works

env = envpool.make_gym(
    "Breakout-v5",
    stack_num=1,
    num_envs=batch_size * 2,
    batch_size=batch_size,
    use_inter_area_resize=False,
    img_width=88,
    img_height=88,
    num_threads=0,
    thread_affinity_offset=0,
)
action = np.array(
    [env.action_space.sample() for _ in range(batch_size)]
)

counter = 0

env.async_reset()

last_time = time.time()
while True:
    obs, rew, done, info = env.recv()

    env_id = info["env_id"]
    env.send(action, env_id)

    counter += batch_size
    if counter >= 100000:
        cur_time = time.time()
        print("TPS", counter / (cur_time - last_time))

        counter = 0
        last_time = cur_time
[1]    2959596 segmentation fault (core dumped)  python test_envpool.py

Expected behavior

Can run with large batch size, like 1024, 2048, etc.

System info

Describe the characteristic of your environment:

import envpool, numpy, sys
print(envpool.__version__, numpy.__version__, sys.version, sys.platform)
0.6.1.post1 1.21.2 3.8.12 (default, Oct 12 2021, 13:49:34) 
[GCC 7.5.0] linux

Additional context

Setting the batch size to 1024 sometimes works and sometimes segfaults:

1024
TPS 49611.30131772514
TPS 57661.12695997062
TPS 52648.235412990536
TPS 52059.6945247295
[1]    2971074 segmentation fault (core dumped)  python test_envpool.py

Reason and Possible fixes

Checklist

  • I have checked that there is no similar issue in the repo (required)
  • I have read the documentation (required)
  • I have provided a minimal working example to reproduce the bug (required)

[BUG] Can't install on python 3.9 / 3.10 on macOS

I tried to install envpool on python 3.9.5 and 3.10.0 with pip install envpool and got the following in both cases:

ERROR: Could not find a version that satisfies the requirement envpool (from versions: none)
ERROR: No matching distribution found for envpool

I haven't checked other python versions though.

[Feature Request] Box2D integration

https://github.com/openai/gym/tree/master/gym/envs/box2d

Env List:

  • CarRacing-v1 (#71)
  • BipedalWalker-v3 (#131)
  • BipedalWalkerHardcore-v3 (#131)
  • LunarLander-v2 (#111)
  • LunarLanderContinuous-v2 (#111)

Road Map:

  • Get comfortable with current codebase, go through https://envpool.readthedocs.io/en/latest/pages/env.html and add a toy environment by yourself locally;
  • Run Box2D environments on your local machine [2], try with different env settings and see the actual behavior;
  • Go through the pyBox2d code [1] and think about how we can directly call those methods via EnvPool's single-environment abstraction;
  • Integrate only one game and let it work (you only need to translate python to C++);
  • Add some unit tests (good to submit the first PR here);
  • Integrate other environments (submit another PR) and related tests.

Resources:

  1. https://github.com/pybox2d/pybox2d/tree/master/Box2D
  2. First install gym, then run with
import gym
env = gym.make("CarRacing-v0")
env.reset()
for _ in range(10):
  env.step(env.action_space.sample())
  env.render()

[Feature Request] Compatibility with gym and SB3 wrapper

Motivation

Related to #33

I was trying to make env pool work with SB3 and I noticed different inconsistencies with classic gym envs / gym vector envs.
I wrote a wrapper, but currently there is no way to properly handle terminal observations (as mentioned in #33) because I cannot step a particular env... (env.send() does exist, but env.recv() does not guarantee the result comes from the same env).

Solution

My current solution is below. I can also make a PR if you think it makes sense to integrate it directly into envpool (it would make it easier for people already using gym / SB3 to adopt envpool ;)).

import gym
import envpool
from gym.envs.registration import EnvSpec

from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import VecEnvWrapper, VecMonitor
from stable_baselines3.common.env_util import make_vec_env

import numpy as np

from stable_baselines3.common.vec_env.base_vec_env import (
    VecEnv,
    VecEnvStepReturn,
    VecEnvWrapper,
)

num_envs = 4
env_id = "Pendulum-v0"
seed = 0
use_env_pool = True


class VecAdapter(VecEnvWrapper):
    def __init__(self, venv):
        venv.num_envs = venv.spec.config.num_envs
        super().__init__(venv=venv)

    def step_async(self, actions: np.ndarray) -> None:
        self.actions = actions

    def reset(self):
        return self.venv.reset()

    def step_wait(self):
        # TODO: handle terminal obs
        obs, reward, done, info = self.venv.step(self.actions)
        infos = []
        # convert to list
        for i in range(self.num_envs):
            infos.append(
                {
                    key: info[key][i]
                    for key in info.keys()
                    if isinstance(info[key], np.ndarray)
                }
            )
        return obs, reward, done, infos


if use_env_pool:
    env = envpool.make(env_id, env_type="gym", num_envs=num_envs, seed=seed)
    env.spec.id = env_id
    env = VecAdapter(env)
    env = VecMonitor(env)
else:
    env = make_vec_env(env_id, n_envs=num_envs)


model = PPO(
    "MlpPolicy",
    env,
    n_steps=1024,
    learning_rate=1e-3,
    use_sde=True,
    sde_sample_freq=4,
    gae_lambda=0.95,
    gamma=0.9,
    verbose=1,
    seed=seed,
)
try:
    model.learn(100_000)
except KeyboardInterrupt:
    pass

Alternative

A better alternative would be to fix those inconsistencies directly in the c++ code.

Checklist

  • I have checked that there is no similar issue in the repo (required)

[BUG] async_reset crashes when called multiple times

Describe the bug

When we call async_reset multiple times, we get a crash.

The use case here is that I want to run many parallel episodes with the async interface, and then once all the episodes are complete, start some new episodes again.

To Reproduce


from ipdb import set_trace
import envpool
import numpy as np

if __name__ == '__main__':
    EPISODES_PER_GRADIENT = 1
    BATCH_SIZE = 1

    ## Initialize environments
    env = envpool.make("CartPole-v1", env_type="gym", num_envs=EPISODES_PER_GRADIENT, batch_size=BATCH_SIZE)

    for round in range(10000000):
        print(f"==> Round {round: 4}.", end="\r")
        ## Reset all environments
        env.async_reset()

        ## Play exactly 100 episodes
        for i in range(100):
            state, rew, done, info = env.recv()
            env_id = info["env_id"]
            env.send(np.random.randint(env.action_space.n, size=BATCH_SIZE), env_id)
$ python3.9 test2.py 
terminate called after throwing an instance of 'std::out_of_range'
  what():  StateBuffer out of storage
Aborted

Expected behavior

Each environment should reset, no crashing.

System info

Linux, python3.9, using anaconda, installed via pip install.

import envpool, numpy, sys
print(envpool.__version__, numpy.__version__, sys.version, sys.platform)

0.5.3.post1 1.22.3 3.9.12 (main, Apr  5 2022, 06:56:58) 
[GCC 7.5.0] linux

Checklist

  • I have checked that there is no similar issue in the repo (required)
  • I have read the documentation (required)
  • I have provided a minimal working example to reproduce the bug (required)

`.dockerignore` should ignore `.git` folder

It seems that .dockerignore is a symbolic link to .gitignore, which doesn't look like good practice. Generally, it should ignore the binaries (as specified in .gitignore) and also ignore the .git folder. That way, when developers make minor changes to their git repository (e.g., git pull), the docker image won't need rebuilding.
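For example, a minimal standalone .dockerignore (contents illustrative) could look like:

# VCS metadata; changing it should not invalidate the Docker build cache
.git
# build outputs and caches, mirroring .gitignore
bazel-*
__pycache__
*.so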

[Feature Request] Mujoco integration

https://github.com/openai/gym/tree/master/gym/envs/mujoco

Env List:

Road Map:

  • Get comfortable with current codebase, go through https://envpool.readthedocs.io/en/latest/pages/env.html and add a toy environment by yourself locally;
  • Download Mujoco and run on your local machine [1] [5], try with different env settings and see the actual behavior;
  • Go through their code [1] [2] (I think it's better to go through both openai and deepmind versions, but only use deepmind's solution as reference), understand their ctype APIs and what we can use to bind with EnvPool APIs [3];
  • Integrate only one game and let it work;
  • Add some unit tests (good to submit the first PR here);
  • Integrate other environments (submit another PR) and related tests.

Resources:

  1. https://github.com/openai/mujoco-py
  2. https://github.com/deepmind/dm_control/tree/master/dm_control/mujoco
  3. https://github.com/deepmind/mujoco/blob/main/doc/programming.rst
  4. It is quite similar with Atari games which we have already integrated: https://github.com/mgbellemare/Arcade-Learning-Environment
  5. First install gym and mujoco, then run with
import gym
env = gym.make("Ant-v3")
env.reset()
for _ in range(10):
  env.step(env.action_space.sample())
  env.render()
  6. https://github.com/ikostrikov/gym_dmc/blob/master/compare.py (a checker script)

[BUG] Misalignment on Humanoid-v3

Reported by rl_games

Initial result reported by @Benjamin-eecs

  1. envpool v3 train, gym v3 test:
     gym v3: 17 steps, reward ~82
     envpool v3: 1001 steps, reward ~10000, but with variance (maybe the seed was not fixed?)
  2. gym v3 train, envpool v3 test:
     gym v3: 1000 steps, reward ~6200
     envpool v3: 29 steps, reward ~144

[Feature Request] Turn off auto-reset

Motivation

The documentation says auto-reset is enabled by default, but for certain applications it is better to have it turned off.

Solution

When auto-reset is off, terminated environments should not be run. They should continue to return dummy state/action/reward/etc. in synchronous mode, and they should never be included in the batch in async mode.
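A hypothetical sketch of the requested behavior from the user side (the auto_reset argument shown here is the proposal, not an existing envpool option):

import envpool
import numpy as np

# hypothetical: auto_reset=False is the requested feature, not a real argument
env = envpool.make_gym("Pong-v5", num_envs=8, auto_reset=False)
obs = env.reset()
alive = np.ones(8, dtype=bool)
while alive.any():
    act = np.random.randint(env.action_space.n, size=8)
    obs, rew, term, trunc, info = env.step(act)
    alive &= ~(term | trunc)  # finished envs would keep returning dummy transitions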

Checklist

  • I have checked that there is no similar issue in the repo (required)

[Feature Request] Support elementwise bounds for array spec

Motivation

Per discussion at

https://github.com/sail-sg/envpool/pull/25/files#r753731476

For the state/action array, we need to specify the range of valid values; e.g., in the cartpole env we need a different range for each element. Currently we only support specifying a global low/high value for all elements in the array.

Solution

Support both global low/high and elementwise low/high, depending on what the env developer passes in the C++ code. We need a data structure that can take both; the difficulty is how to support nested initializer lists with dynamic depth.
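On the Python side, the target behavior is what gym's spaces already express: per-element low/high arrays rather than a single scalar pair. A small illustrative sketch (the cartpole-style bounds below are only an example):

import numpy as np
from gym import spaces

# elementwise bounds: each observation dimension gets its own valid range
low = np.array([-4.8, -np.inf, -0.418, -np.inf], dtype=np.float32)
obs_space = spaces.Box(low=low, high=-low, dtype=np.float32)
print(obs_space.low, obs_space.high)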

Checklist

  • I have checked that there is no similar issue in the repo (required)

[BUG] no Acc-v3 environment

Describe the bug


When using tianshou, there's no Acc-v3 environment.

File "test_dqn_acc.py", line 246, in
Acc_tain()
File "test_dqn_acc.py", line 112, in Acc_tain
args.task, num_envs=args.training_num, env_type="gym"
File "/home/zhulin/.conda/envs/mytorch/lib/python3.7/site-packages/envpool/registration.py", line 43, in make
f"{task_id} is not supported, envpool.list_all_envs() may help."
AssertionError: Acc-v3 is not supported, envpool.list_all_envs() may help.

[BUG] Terminal observation missing

Describe the bug

Unless I'm mistaken, the env resets automatically when an episode is over, which means the terminal observation is not accessible to the agent; this prevents proper bootstrapping for infinite-horizon problems.

See openai/gym#2484 and openai/gym#1632

Expected behavior

openai/gym#2484

Reason and Possible fixes

Add terminal_observation key to the info dict, as already done for the timelimit truncation.

Checklist

  • I have checked that there is no similar issue in the repo (required)
  • I have read the documentation (required)
  • I have provided a minimal working example to reproduce the bug (required)

[Feature Request] Core Improvement

Road Map:

  1. Install and run envpool examples successfully;
  2. Setup bazel build environment successfully: refer to the online documentation or CI scripts under .github/;
  3. Read code, typically envpool/core/, make sure you understand the current system structure; in short:
  • array.h contains a C++-based numpy-style array, paired with spec.h; for data transfer
  • dict.h contains a C++-based dict; for declaring env attributes such as observation space and action space;
  • env_spec.h contains the common field of environment attributes;
  • env.h is the single env abstraction in envpool; other C++ envs MUST inherit this class and override Reset, Step, and IsDone;
  • state_buffer_queue.h and action_buffer_queue.h are (almost) lock-free queue implementations for AsyncEnvPool;
  • envpool.h is the base class for the various envpool implementations, e.g., PyEnvPool is the python binding of envpool, and AsyncEnvPool is the C++ async env execution;
  • async_envpool.h contains the main logic of concurrent execution; the thread pool is inside its constructor.

Things to do:

  • #27
  • #168
  • Find out the cause of the following issue, and maybe #3, or a lock-free thread pool?
However, the curve shows something strange: suppose we use N CPU cores on a single machine; when we set the number of threads M to M < N/2, the performance scales linearly, but when M >= N/2, the performance gain is limited.

For Win/Mac:

[BUG] Breakout-v5 Performance Regression

Describe the bug

PPO can no longer reach a game score of 400 on Breakout-v5 within 10M training steps (same hyperparameters), as it can on BreakoutNoFrameskip-v4.


To Reproduce

Run the https://wandb.ai/costa-huang/cleanRL/runs/26k4q5jo/code?workspace=user-costa-huang to reproduce envpool's results and https://wandb.ai/costa-huang/cleanRL/runs/1ngqmz96/code?workspace=user-costa-huang to reproduce BreakoutNoFrameskip-v4 results.

Expected behavior

PPO should obtain 400 game scores in the Breakout-v5 given 10M steps of training

System info

Describe the characteristic of your environment:

import envpool, numpy, sys
print(envpool.__version__, numpy.__version__, sys.version, sys.platform)

0.4.3 1.21.5 3.9.5 (default, Jul 19 2021, 13:27:26) 
[GCC 10.3.0] linux

Reason and Possible fixes

I ran gym's ALE/Breakout-v5 as well and also observed a regression, as shown below; but looking into it, that turned out to be because ALE/Breakout-v5 uses the full action space by default (14 discrete actions), whereas Breakout-v5 here uses the minimal 4 discrete actions. So I have no idea why the regression happens with envpool...


Checklist

  • I have checked that there is no similar issue in the repo (required)
  • I have read the documentation (required)
  • I have provided a minimal working example to reproduce the bug (required)

[Feature Request] DMC framestack wrapper support

Motivation

DMC experiments always need a framestack wrapper. However, it is hard to implement outside of envpool, because the env would have to be wrapped before the vector envs are created.

Solution

Can you explain why, and whether it is possible to support this wrapper?

Checklist

  • I have checked that there is no similar issue in the repo (required)

Atari option for repeat_action_probability

The -v5 Gym Atari environments have sticky actions enabled by default (with repeat_action_probability=0.25, see here). This makes it impossible to replicate the original results from several key papers, especially the DQN Nature paper.

Would it be possible to add an option to the Atari environment options that lets the user change repeat_action_probability to a different value? I believe that internally this can be accomplished by forwarding the argument to either gym.make or the ALE constructor.
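A hypothetical sketch of the requested interface (the repeat_action_probability keyword is the proposed option from this request, not a confirmed envpool argument):

import envpool

# proposed usage: disable sticky actions to match the original DQN Nature paper setting
env = envpool.make_gym("Pong-v5", num_envs=8, repeat_action_probability=0.0)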

[BUG] Cannot save SB3 VecNormalize wrapped env using env pool

Describe the bug

The SB3 VecNormalize wrapper allows saving an environment. This is required, for instance, when a VecNormalize wrapper is applied to the env and its statistics need to be restored at test/evaluation time. Envpool appears not to support this.

To Reproduce

Steps to reproduce the behavior.

I used the SB3 example with Acrobot-v1 (since Pendulum-v0 appears to be deprecated now) with one slight change: https://github.com/sail-sg/envpool/blob/master/examples/sb3_examples/ppo.py

I additionally wrap the environment with VecNormalize, e.g.:

from stable_baselines3.common.vec_env import VecNormalize
if use_env_pool:
  env = envpool.make(env_id, env_type="gym", num_envs=num_envs, seed=seed)
  env.spec.id = env_id
  env = VecAdapter(env)
  env = VecNormalize(env)
  env = VecMonitor(env)

Then I try to save the env:

path = "/content/"
env.save(path)
AttributeError                            Traceback (most recent call last)

<ipython-input-22-d83fb0aff1e3> in <module>()
      1 path = "/content/"
----> 2 env.save(path)

2 frames

/usr/local/lib/python3.7/dist-packages/stable_baselines3/common/vec_env/base_vec_env.py in __getattr__(self, name)
    301         which have unique attributes of interest.
    302         """
--> 303         blocked_class = self.getattr_depth_check(name, already_found=False)
    304         if blocked_class is not None:
    305             own_class = f"{type(self).__module__}.{type(self).__name__}"

/usr/local/lib/python3.7/dist-packages/stable_baselines3/common/vec_env/base_vec_env.py in getattr_depth_check(self, name, already_found)
    353         else:
    354             # this wrapper does not have the attribute. Keep searching.
--> 355             shadowed_wrapper_class = self.venv.getattr_depth_check(name, already_found)
    356 
    357         return shadowed_wrapper_class

/usr/local/lib/python3.7/dist-packages/stable_baselines3/common/vec_env/base_vec_env.py in getattr_depth_check(self, name, already_found)
    353         else:
    354             # this wrapper does not have the attribute. Keep searching.
--> 355             shadowed_wrapper_class = self.venv.getattr_depth_check(name, already_found)
    356 
    357         return shadowed_wrapper_class

AttributeError: 'AcrobotGymEnvPool' object has no attribute 'getattr_depth_check'

System info

Tried this on Google Colab.

import envpool, numpy, sys
print(envpool.__version__, numpy.__version__, sys.version, sys.platform)
0.4.4 1.19.5 3.7.12 (default, Sep 10 2021, 00:21:48) 
[GCC 7.5.0] linux



Checklist

  • I have checked that there is no similar issue in the repo (required)
  • I have read the documentation (required)
  • I have provided a minimal working example to reproduce the bug (required)

[BUG] Mujoco: WARNING: Unknown warning type Time = 0.4000.

Describe the bug

When running a mujoco env with certain configurations, this warning is raised.

To Reproduce

On my laptop

$ cd benchmark
$ python3 test_envpool.py --env mujoco --num-envs 17 --batch-size 17
Namespace(batch_size=17, env='mujoco', num_envs=17, num_threads=0, seed=0, thread_affinity_offset=0,
total_step=50000)
  0%|                                                                         | 0/50000 [00:00<?, ?it/s]
WARNING: Unknown warning type Time = 5.0500.

  0%|▏                                                           | 102/50000 [00:00<00:49, 1011.42it/s]
WARNING: Unknown warning type Time = 5.0500.

WARNING: Unknown warning type Time = 5.0500.

WARNING: Unknown warning type Time = 5.0500.

WARNING: Unknown warning type Time = 5.0500.

Expected behavior

No warning

System info

My laptop. But this phenomenon also exists in other hardware, e.g., workstation: 50 wait 20.

Additional context

Related issue: openai/mujoco-py#340
This error only occurs when batch_size > 16.
Manually checked for NaN and inf, but everything looks good.
The sleep solution helps to some extent, but it's not a good solution; i.e., adding a print reduces the probability of this warning occurring.

Reason and Possible fixes

Wait until DeepMind releases the mujoco source code (hopefully in mid-May), then add a traceback and find the root cause.
This error is generated from libmujoco.so.2.1.5.
A pure C++ working example that reproduces this bug would be very helpful.

Checklist

  • I have checked that there is no similar issue in the repo (required)
  • I have read the documentation (required)
  • I have provided a minimal working example to reproduce the bug (required)

[Feature Request] Comparison with ELF

Motivation

ELF hosts multiple games in parallel with C++ threading, and it says any game with a C/C++ interface can be plugged into the framework by writing a simple wrapper. This can serve as a baseline for the Atari scenario.

Resource

Checklist

  • I have checked that there is no similar issue in the repo (required)

Bazel 3rd party libraries availability in mainland China

We need to figure out a possible solution for these users; otherwise, many links are unavailable and it's quite hard for them to build this project locally.


Here is the solution: (and they are documented at https://envpool.readthedocs.io/en/latest/pages/build.html as well)

  1. For docker build: make docker-dev-cn
  2. For installing golang: see https://studygolang.com/dl
  3. For installing go module: go env -w GOPROXY=https://goproxy.cn
  4. For installing bazel (feel free to change the version 5.1.1 below):
wget https://mirrors.huaweicloud.com/bazel/5.1.1/bazel-5.1.1-linux-x86_64
chmod +x bazel-5.1.1-linux-x86_64
mkdir -p $HOME/go/bin
mv bazel-5.1.1-linux-x86_64 $HOME/go/bin/bazel
  5. For fetching 3rd-party libraries in bazel: configure HTTP_PROXY and HTTPS_PROXY: https://docs.bazel.build/versions/main/external.html#using-proxies
  6. For fetching pypi wheels: uncomment extra_pip_args in envpool/pip.bzl to switch the pypi source (feel free to change the source URL):
     if "pip_requirements" not in native.existing_rules().keys():
         pip_install(
             name = "pip_requirements",
             python_interpreter = "python3",
             quiet = False,
             requirements = "@envpool//third_party/pip_requirements:requirements.txt",
-            # extra_pip_args = ["--extra-index-url", "https://mirrors.aliyun.com/pypi/simple"],
+            extra_pip_args = ["--extra-index-url", "https://mirrors.aliyun.com/pypi/simple"],
         )
