Vector API - design considerations about gymnasium HOT 5 CLOSED

farama-foundation commented on June 19, 2024

Vector API - design considerations

from gymnasium.

Comments (5)

MarcCote commented on June 19, 2024

Hi, would it make sense for VectorEnv.reset(...) to accept a list of options similar to seeds, i.e. one options dict per env?
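
A minimal sketch of how per-env options could be normalised, assuming a single dict (or None) is shared by all sub-environments and a sequence supplies one entry per sub-environment. The helper name broadcast_options is hypothetical and not part of Gymnasium.

```python
from typing import Any, Optional, Sequence, Union

Options = Optional[dict[str, Any]]


def broadcast_options(
    options: Union[Options, Sequence[Options]], num_envs: int
) -> list[Options]:
    """Expand `options` into one entry per sub-environment (hypothetical helper)."""
    if options is None or isinstance(options, dict):
        # A single dict (or None) is shared by every sub-environment.
        return [options] * num_envs
    per_env = list(options)
    if len(per_env) != num_envs:
        raise ValueError(f"Expected {num_envs} options entries, got {len(per_env)}.")
    return per_env
```

A vector reset could then zip the result with the per-env seeds and call each sub-environment's reset(seed=..., options=...) in turn.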

pseudo-rnd-thoughts commented on June 19, 2024

An additional change would be to when reset is called.
Currently, the Gym vector API resets a sub-environment inside step as soon as that step terminates or truncates the episode. The observation from the terminating step is then added to the info as "final_observation".
A proposed solution is noted in this comment, openai/gym#2279 (comment), which would be necessary if we wanted EnvPool to work alongside the Gymnasium vector API.

sven1977 commented on June 19, 2024

Thanks for your thoughts on this, @MarcCote! Sven here from the RLlib team :)

An update from RLlib regarding vectorization: we are actively looking into fully deprecating all of RLlib's in-house env APIs (VectorEnv, BaseEnv, etc., except for MultiAgentEnv) in favor of just supporting gymnasium (and its own vector option).
We are currently a few days away from merging our gymnasium support PR.

On the autoreset issue you mentioned: I would prefer autoreset (similar to how DeepMind has always done it), but then per-episode seeding (and options) would no longer be accessible. If per-episode seeding and options are really essential for users, then we won't be able to avoid having an additional reset_at(vector_idx: int, *, seed=None, options=None) method (like we will have in RLlib after moving to gymnasium in a few days).
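
A minimal sketch of the reset_at idea, assuming a vector env without autoreset where the caller resets finished sub-environments itself. CounterEnv and ManualResetVectorEnv are hypothetical toy classes for illustration; they are neither RLlib's nor Gymnasium's implementation.

```python
from typing import Any, Optional


class CounterEnv:
    """Hypothetical toy env: the observation is a step counter."""

    def __init__(self, horizon: int = 3) -> None:
        self.horizon = horizon
        self.t = 0
        self.last_seed: Optional[int] = None

    def reset(self, *, seed: Optional[int] = None, options: Optional[dict[str, Any]] = None):
        self.last_seed = seed  # recorded only to show the seed reached this env
        self.t = (options or {}).get("start", 0)
        return self.t, {}

    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= self.horizon, False, {}


class ManualResetVectorEnv:
    """Hypothetical vector env without autoreset."""

    def __init__(self, envs) -> None:
        self.envs = list(envs)

    def reset(self):
        results = [env.reset() for env in self.envs]
        return [obs for obs, _ in results], [info for _, info in results]

    def reset_at(self, vector_idx: int, *, seed=None, options=None):
        """Reset only the sub-environment at `vector_idx`."""
        return self.envs[vector_idx].reset(seed=seed, options=options)

    def step(self, actions):
        results = [env.step(a) for env, a in zip(self.envs, actions)]
        obs, rewards, terminated, truncated, infos = map(list, zip(*results))
        return obs, rewards, terminated, truncated, infos


venv = ManualResetVectorEnv([CounterEnv(horizon=1), CounterEnv(horizon=5)])
obs, infos = venv.reset()
obs, rewards, terminated, truncated, infos = venv.step([0, 0])
for idx, (term, trunc) in enumerate(zip(terminated, truncated)):
    if term or trunc:
        # Per-episode seed and options for just this sub-environment.
        obs[idx], infos[idx] = venv.reset_at(idx, seed=123, options={"start": 0})
```

Here only the first sub-environment finishes after one step, so it alone is reset with its own seed and options while the second keeps running.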

pseudo-rnd-thoughts commented on June 19, 2024

Thanks for your thoughts, @sven1977.

I'm unfamiliar with how DeepMind has achieved environment vectorisation. Could you provide a short summary of the difference between DeepMind's approach and OpenAI's approach (or EnvPool's)?

Being able to set the autoreset seed is an interesting idea that I hadn't heard of before. I don't think we will include this in the VectorEnv definition, as we want users to be able to create custom vector environments with minimal requirements.
However, like @MarcCote's proposal, I would be interested in adding this as a special parameter of SyncVectorEnv or AsyncVectorEnv.

The current reset type definition for those classes is reset(self, seed: Union[int, list[int]], options: Optional[dict[str, Any]]) -> tuple[ObsType, dict[str, Any]]. As we already use list[int] to set the initial reset seeds of the individual environments, I'm unsure what other options we have. Using a callable is the best option I can think of, other than adding another function. Any other thoughts?
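
A rough sketch of the callable option, assuming the callable receives the sub-environment index and that sub-environment's episode count and returns the seed for its next (auto)reset. SeedSchedule and the callable's signature are hypothetical; offsetting a single int seed by the sub-environment index is also an assumption of this sketch.

```python
from typing import Callable, Optional, Sequence, Union

SeedFn = Callable[[int, int], Optional[int]]  # (env_idx, episode_idx) -> seed
SeedArg = Union[None, int, Sequence[Optional[int]], SeedFn]


class SeedSchedule:
    """Hypothetical helper yielding a seed for every (auto)reset of a sub-env."""

    def __init__(self, seed: SeedArg, num_envs: int) -> None:
        self.episode_counts = [0] * num_envs
        self.seed_fn: Optional[SeedFn] = None
        if callable(seed):
            self.seed_fn = seed
            self.initial: list[Optional[int]] = [None] * num_envs
        elif seed is None:
            self.initial = [None] * num_envs
        elif isinstance(seed, int):
            # Assumption: offset one int per sub-env so their streams differ.
            self.initial = [seed + i for i in range(num_envs)]
        else:
            self.initial = list(seed)
            if len(self.initial) != num_envs:
                raise ValueError(f"Expected {num_envs} seeds, got {len(self.initial)}.")

    def next_seed(self, env_idx: int) -> Optional[int]:
        """Seed for the next reset of sub-env `env_idx`."""
        episode = self.episode_counts[env_idx]
        self.episode_counts[env_idx] += 1
        if self.seed_fn is not None:
            return self.seed_fn(env_idx, episode)
        # int / list seeds only apply to the first reset; later resets pass
        # None so the sub-env keeps its existing random stream.
        return self.initial[env_idx] if episode == 0 else None
```

With an int or list, only the initial reset is seeded; with a callable such as lambda env_idx, episode: 100 * env_idx + episode, every autoreset gets its own seed without adding another method.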

pseudo-rnd-thoughts commented on June 19, 2024

Discussion on the step function call order, which, if adopted, would be a backward-compatibility-breaking change for both the AutoResetWrapper and the new SyncVectorEnv and AsyncVectorEnv. The proposed call order is identical to the current implementation within EnvPool. This does not necessarily change the VectorEnv implementation.

@RedTachyon @vwxyzjn @araffin I would be interested in your thoughts.

Consider a very simple training loop:

import gymnasium as gym

env = gym.make("CartPole-v1")

obs, info = env.reset()

for training_step in range(1_000):
    action = policy(obs)
    next_obs, reward, terminated, truncated, next_info = env.step(action)
    
    replay_buffer.append((obs, info, action, next_obs, reward, terminated, truncated))
    
    if terminated or truncated:
        obs, info = env.reset()
    else:
        obs, info = next_obs, next_info

With the AutoReset wrapper (and the vector implementations), the loop looks very similar. While autoresetting removes the need for one if statement, I believe it requires adding a new one to pick the correct final observation.

import gymnasium as gym

env = gym.make("CartPole-v1")
env = gym.wrappers.AutoResetWrapper(env)

obs, info = env.reset()

for training_step in range(1_000):
    action = policy(obs)
    next_obs, reward, terminated, truncated, next_info = env.step(action)
    
    if terminated or truncated:
        replay_buffer.append((obs, info, action, next_info["final_observation"], reward, terminated, truncated))
    else:
        replay_buffer.append((obs, info, action, next_obs, reward, terminated, truncated))
     
    obs, info = next_obs, next_info

I propose the following change to the AutoReset wrapper and vector implementations. The difference is when reset is called. In the current implementation, if a step ends an episode, reset is called within that same step call, and the old obs and info are packaged into the new info from reset. This PR changes it so that the reset happens in the next step following the episode ending.

import gymnasium as gym

env = gym.make("CartPole-v1")
env = gym.experimental.wrappers.AutoReset(env)

obs, info = env.reset()

for training_step in range(1_000):
    action = policy(obs)
    next_obs, reward, terminated, truncated, next_info = env.step(action)
    
    replay_buffer.append((obs, info, action, next_obs, reward, terminated, truncated))
    obs, info = next_obs, next_info

Advantages

Simplifies training code. Currently, you need to check whether the sub-environment's episode ended and then collect the last observation and info (for truncated cases) to save to the replay buffer. With a similar change to the vector implementations, users shouldn't need to use info["final_observation"][env_num].

Disadvantages

This is a large breaking change for old training code. This could be mitigated by renaming the current implementation to LegacySyncVectorEnv.
If an episode has terminated, then there is a "dead" observation where the action has to be ignored.
It is only possible to convert "old" to "new" and not "new" to "old".
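
To make the "dead" step concrete, here is a minimal sketch of a next-step autoreset wrapper around a toy environment. CounterEnv and NextStepAutoReset are hypothetical names for illustration, not the proposed Gymnasium implementation, and returning a reward of 0.0 on the reset step is an assumption of this sketch.

```python
class CounterEnv:
    """Hypothetical toy env: the observation is a step counter."""

    def __init__(self, horizon=2):
        self.horizon = horizon
        self.t = 0

    def reset(self, *, seed=None, options=None):
        self.t = 0
        return self.t, {}

    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= self.horizon, False, {}


class NextStepAutoReset:
    """Hypothetical wrapper: reset on the step *after* an episode ends."""

    def __init__(self, env):
        self.env = env
        self.needs_reset = False

    def reset(self, *, seed=None, options=None):
        self.needs_reset = False
        return self.env.reset(seed=seed, options=options)

    def step(self, action):
        if self.needs_reset:
            # "Dead" step: the action is ignored and the env is reset instead.
            obs, info = self.env.reset()
            self.needs_reset = False
            return obs, 0.0, False, False, info
        obs, reward, terminated, truncated, info = self.env.step(action)
        self.needs_reset = terminated or truncated
        return obs, reward, terminated, truncated, info


env = NextStepAutoReset(CounterEnv(horizon=2))
obs, info = env.reset()
trajectory = []
for _ in range(4):
    next_obs, reward, terminated, truncated, info = env.step(0)
    trajectory.append((obs, next_obs, terminated))
    obs = next_obs

# trajectory == [(0, 1, False), (1, 2, True), (2, 0, False), (0, 1, False)]
```

The third transition (2 to 0) is the dead step: its action had no effect, so training code would need to skip or mask it, but the final observation 2 arrives directly from step without reading info.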
