Comments (5)

MarcCote commented on June 19, 2024

Hi, would it make sense for VectorEnv.reset(...) to accept a list of options similar to seeds, i.e. one options dict per env?
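
To make the request concrete, here is a minimal sketch of per-env options, not the Gymnasium implementation. `CountingEnv` and `reset_all` are hypothetical names used only for illustration. The sketch assumes a single int seed is spread as `seed + i` across sub-environments and a single options dict is shared by all of them.

```python
from typing import Any, Optional, Sequence, Union


class CountingEnv:
    """Hypothetical toy env: its state starts at options["start"] (default 0)."""

    def reset(self, *, seed: Optional[int] = None, options: Optional[dict] = None):
        self.state = (options or {}).get("start", 0)
        return self.state, {"seed": seed}


def reset_all(
    envs: Sequence[Any],
    seed: Union[None, int, Sequence[Optional[int]]] = None,
    options: Union[None, dict, Sequence[Optional[dict]]] = None,
):
    """Reset every env, giving each one its own seed and its own options dict."""
    n = len(envs)
    # Assumption: an int seed becomes seed, seed + 1, ... for the sub-environments.
    if seed is None or isinstance(seed, int):
        seeds = [None if seed is None else seed + i for i in range(n)]
    else:
        seeds = list(seed)
    # Assumption: a single dict (or None) is shared by every sub-environment.
    if options is None or isinstance(options, dict):
        per_env_options = [options] * n
    else:
        per_env_options = list(options)
    if len(seeds) != n or len(per_env_options) != n:
        raise ValueError("expected one seed and one options dict per env")
    results = [
        env.reset(seed=s, options=o)
        for env, s, o in zip(envs, seeds, per_env_options)
    ]
    observations = [obs for obs, _ in results]
    infos = [info for _, info in results]
    return observations, infos
```

With two envs, `reset_all(envs, seed=[1, 2], options=[{"start": 5}, {"start": 9}])` would start the first env at 5 and the second at 9.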

from gymnasium.

pseudo-rnd-thoughts commented on June 19, 2024

An additional change would be when reset is called.
Currently, the Gym vector API implements it such that if a step is terminated or truncated, reset is called within the step. The original terminating step observation is added to the info as "final_obs".
A proposed solution is noted in this comment, openai/gym#2279 (comment), which would be necessary if we wanted EnvPool to work alongside the Gymnasium vector API.
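
As a rough illustration of the current same-step behaviour described above, assuming a toy environment rather than the real vector classes: `ShortEpisodeEnv` and `SameStepAutoReset` are hypothetical names, and the `"final_info"` key is an assumption added alongside `"final_obs"`.

```python
class ShortEpisodeEnv:
    """Hypothetical toy env: the observation is the step count, and the
    episode terminates after `length` steps."""

    def __init__(self, length=2):
        self.length = length
        self.t = 0

    def reset(self, *, seed=None, options=None):
        self.t = 0
        return self.t, {}

    def step(self, action):
        self.t += 1
        terminated = self.t >= self.length
        return self.t, 1.0, terminated, False, {}


class SameStepAutoReset:
    """Sketch of same-step autoreset: reset is called inside the step that
    ends the episode, and the final observation is moved into `info`."""

    def __init__(self, env):
        self.env = env

    def reset(self, **kwargs):
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        if terminated or truncated:
            final_obs, final_info = obs, info
            # The returned obs now belongs to the NEXT episode.
            obs, info = self.env.reset()
            info = {**info, "final_obs": final_obs, "final_info": final_info}
        return obs, reward, terminated, truncated, info
```

The reward and the terminated/truncated flags still describe the episode that just ended, while the returned observation is already the first one of the new episode.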

sven1977 commented on June 19, 2024

Thanks for your thoughts on this @MarcCote ! Sven here from the RLlib team :)

  • An update from RLlib wrt vectorization: We are actively looking into fully deprecating all of RLlib's in-house env APIs (VectorEnv, BaseEnv, etc., except for MultiAgentEnv), in favor of just supporting gymnasium (and its own vector option).
  • We are currently a few days away from merging our gymnasium support PR.

On the autoreset issue you mentioned: I would prefer autoreset (similar to how deepmind has always done it), but then the per-episode seeding (and option'ing) would not be accessible anymore. If per-episode seeding and option'ing is really essential for users, then we won't be able to avoid having an additional reset_at(vector_idx: int, *, seed=None, options=None) method (like we will have in RLlib after moving to gymnasium in a few days).
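
A minimal sketch of what such a `reset_at` method could look like, using the signature quoted above. This is not RLlib's implementation; `NoisyStartEnv` and `ListVectorEnv` are hypothetical stand-ins.

```python
import random


class NoisyStartEnv:
    """Hypothetical toy env: the start state is options["offset"] plus a
    seeded random digit."""

    def reset(self, *, seed=None, options=None):
        rng = random.Random(seed)
        offset = (options or {}).get("offset", 0)
        self.state = offset + rng.randint(0, 9)
        return self.state, {}


class ListVectorEnv:
    """Holds a list of sub-environments and allows resetting one by index."""

    def __init__(self, envs):
        self.envs = list(envs)
        # Latest observation per sub-environment; None until it is first reset.
        self.observations = [None] * len(self.envs)

    def reset_at(self, vector_idx: int, *, seed=None, options=None):
        """Reset only the sub-environment at `vector_idx`, with its own
        seed and options; the other sub-environments are left untouched."""
        obs, info = self.envs[vector_idx].reset(seed=seed, options=options)
        self.observations[vector_idx] = obs
        return obs, info
```

The caller would decide when to call `reset_at`, so per-episode seeds and options stay available, at the cost of the caller tracking which sub-environments have finished.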

pseudo-rnd-thoughts commented on June 19, 2024

Thanks for your thoughts @sven1977

I'm unfamiliar with how DeepMind has achieved environment vectorisation. Could you provide a short summary of the difference between DeepMind's approach and OpenAI's approach (or EnvPool's)?

Being able to set the autoreset seed is an interesting idea that I hadn't heard of before. I don't think we will include this in the VectorEnv definition, as we want users to be able to create custom vector environments with minimal requirements.
However, like @MarcCote's proposal, I would be interested in adding this as a special parameter of SyncVectorEnv or AsyncVectorEnv.

The current reset type definition for those classes is reset(self, seed: Union[int, list[int]], options: Optional[dict[str, Any]]) -> tuple[ObsType, dict[str, Any]]. As we already use list[int] for setting the initial reset seeds of the individual environments, I'm unsure what other options we have. Using a callable is the best option I can think of, other than adding another function. Any other thoughts?
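
One way the callable idea could look, purely as a sketch: the vector env asks a user-supplied function for the reset keyword arguments each time a sub-environment is (auto)reset. `reset_kwargs_fn`, its `(env_index, episode_index)` arguments, `RecordingEnv`, and `AutoResetVector` are all hypothetical and not part of Gymnasium.

```python
class RecordingEnv:
    """Hypothetical toy env that remembers the seed/options of its last
    reset and terminates after two steps."""

    def reset(self, *, seed=None, options=None):
        self.last_reset = (seed, options)
        self.t = 0
        return 0, {}

    def step(self, action):
        self.t += 1
        return self.t, 0.0, self.t >= 2, False, {}


class AutoResetVector:
    """Same-step autoreset where a callable supplies per-episode reset kwargs."""

    def __init__(self, envs, reset_kwargs_fn):
        self.envs = list(envs)
        self.reset_kwargs_fn = reset_kwargs_fn
        self.episode_counts = [0] * len(self.envs)

    def _reset_one(self, i):
        # Ask the callable which seed/options to use for this env and episode.
        kwargs = self.reset_kwargs_fn(i, self.episode_counts[i])
        self.episode_counts[i] += 1
        return self.envs[i].reset(**kwargs)

    def reset(self):
        results = [self._reset_one(i) for i in range(len(self.envs))]
        return [obs for obs, _ in results], [info for _, info in results]

    def step(self, actions):
        results = []
        for i, (env, action) in enumerate(zip(self.envs, actions)):
            obs, reward, terminated, truncated, info = env.step(action)
            if terminated or truncated:
                final_obs = obs
                obs, info = self._reset_one(i)
                info = {**info, "final_obs": final_obs}
            results.append((obs, reward, terminated, truncated, info))
        return results


def example_reset_kwargs(env_index, episode_index):
    """Example policy: a distinct seed per env and per episode."""
    return {
        "seed": 1000 * env_index + episode_index,
        "options": {"episode": episode_index},
    }
```

The trade-off is that the seeding logic lives in the callable rather than at the call site, so it would need to be deterministic for runs to be reproducible.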

pseudo-rnd-thoughts commented on June 19, 2024

This is a discussion of the step function call order which, if adopted, would be a backward-compatibility-breaking change for both the AutoResetWrapper and the new SyncVectorEnv and AsyncVectorEnv. The proposed call order is identical to the current implementation within EnvPool. This does not necessarily change the VectorEnv implementation.

@RedTachyon @vwxyzjn @araffin Would be interested in your thoughts

Consider a very simple training loop:

import gymnasium as gym

env = gym.make("CartPole-v1")

obs, info = env.reset()

for training_step in range(1_000):
    action = policy(obs)
    next_obs, reward, terminated, truncated, next_info = env.step(action)
    
    replay_buffer.append((obs, info, action, next_obs, reward, terminated, truncated))
    
    if terminated or truncated:
        obs, info = env.reset()
    else:
        obs, info = next_obs, next_info

The AutoReset wrapper (and the Vector implementations) look very similar to this implementation. While the auto resetting removes the need for one if statement, I believe it requires adding a new if statement.

import gymnasium as gym

env = gym.make("CartPole-v1")
env = gym.wrappers.AutoResetWrapper(env)

obs, info = env.reset()

for training_step in range(1_000):
    action = policy(obs)
    next_obs, reward, terminated, truncated, next_info = env.step(action)
    
    if terminated or truncated:
        replay_buffer.append((obs, info, action, next_info["final_observation"], reward, terminated, truncated))
    else:
        replay_buffer.append((obs, info, action, next_obs, reward, terminated, truncated))
     
    obs, info = next_obs, next_info

I propose the following change to the AutoReset wrapper and vector implementations. The difference is when reset is called. In the current implementation, if a step ends an episode, reset is called within that same step call, and the final obs and info are packaged into the new info returned by reset. This PR changes it so that the reset happens in the next step call following the episode ending.

import gymnasium as gym

env = gym.make("CartPole-v1")
env = gym.experimental.wrappers.AutoReset(env)

obs, info = env.reset()

for training_step in range(1_000):
    action = policy(obs)
    next_obs, reward, terminated, truncated, next_info = env.step(action)
    
    replay_buffer.append((obs, info, action, next_obs, reward, terminated, truncated))
    obs, info = next_obs, next_info

Advantages

  1. Simplifies training code. Currently, you need to check whether the sub-environment's episode ended and then collect the last observation and info (for truncated cases) to save to the replay buffer. With a similar change to the vector implementations, users shouldn't need to use info["final_observation"][env_num].

Disadvantages

  1. This is a large breaking change for old training code. This could be fixed by renaming the current implementation to LegacySyncVectorEnv.
  2. If an episode has terminated, then there is a "dead" observation where the action has to be ignored.
  3. It is only possible to convert "old" to "new" and not "new" to "old".
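
For reference, a minimal sketch of the proposed next-step reset, including the "dead" step from the second disadvantage. This is not the actual wrapper code: `ShortEpisodeEnv` and `NextStepAutoReset` are hypothetical, and returning a reward of 0.0 with both flags False on the reset step is an assumption.

```python
class ShortEpisodeEnv:
    """Hypothetical toy env: the observation is the step count, and the
    episode terminates after `length` steps."""

    def __init__(self, length=2):
        self.length = length
        self.t = 0

    def reset(self, *, seed=None, options=None):
        self.t = 0
        return self.t, {}

    def step(self, action):
        self.t += 1
        terminated = self.t >= self.length
        return self.t, 1.0, terminated, False, {}


class NextStepAutoReset:
    """Sketch of next-step autoreset: the step that ends an episode returns
    the true final observation, and the following step call performs the reset."""

    def __init__(self, env):
        self.env = env
        self.needs_reset = False

    def reset(self, **kwargs):
        self.needs_reset = False
        return self.env.reset(**kwargs)

    def step(self, action):
        if self.needs_reset:
            # "Dead" step: the action is ignored and the env is reset instead.
            obs, info = self.env.reset()
            self.needs_reset = False
            # Assumption: zero reward and no episode end on the reset step.
            return obs, 0.0, False, False, info
        obs, reward, terminated, truncated, info = self.env.step(action)
        self.needs_reset = terminated or truncated
        return obs, reward, terminated, truncated, info
```

Under this sketch, the transition produced by the dead step links the final observation of one episode to the first observation of the next, so a training loop would probably want to skip or mask it when filling the replay buffer.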
