Comments (5)
Hi, would it make sense for VectorEnv.reset(...) to accept a list of options, similar to seeds, i.e. one options dict per env?
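To make the question concrete, here is a minimal sketch of what per-env options could look like. ToyEnv and ToySyncVectorEnv are hypothetical stand-ins written for this example (they are not Gymnasium classes), and the "start" option key is made up for illustration.

```python
from typing import Optional, Sequence


class ToyEnv:
    """Stand-in for a single environment; NOT a real Gymnasium class."""

    def reset(self, *, seed: Optional[int] = None, options: Optional[dict] = None):
        options = options or {}
        # The hypothetical "start" option selects the initial observation.
        self.state = options.get("start", 0)
        return self.state, {"seed": seed}


class ToySyncVectorEnv:
    """Hypothetical vector env whose reset takes one options dict per sub-env."""

    def __init__(self, envs: Sequence[ToyEnv]):
        self.envs = list(envs)

    def reset(self, *, seed=None, options=None):
        n = len(self.envs)
        # A single value (or None) is broadcast to every sub-env.
        seeds = seed if isinstance(seed, (list, tuple)) else [seed] * n
        opts = options if isinstance(options, (list, tuple)) else [options] * n
        if len(seeds) != n or len(opts) != n:
            raise ValueError("need exactly one seed and one options dict per sub-env")
        results = [env.reset(seed=s, options=o) for env, s, o in zip(self.envs, seeds, opts)]
        observations = [obs for obs, _ in results]
        infos = [info for _, info in results]
        return observations, infos


vec_env = ToySyncVectorEnv([ToyEnv() for _ in range(3)])
obs, infos = vec_env.reset(seed=[1, 2, 3], options=[{"start": 10}, None, {"start": 30}])
```

A single dict would still be accepted and applied to every sub-env, so existing calls would keep working under this assumption.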
from gymnasium.
An additional change would be when reset is called.
Currently, the Gym vector API implements it such that if a step is terminated or truncated, reset is called within the step. The original terminating step observation is added to the info as "final_obs".
A proposed solution is noted in this comment openai/gym#2279 (comment), which would be necessary if we wanted EnvPool to work alongside the Gymnasium vector API.
Thanks for your thoughts on this @MarcCote ! Sven here from the RLlib team :)
- Some update from RLlib wrt vectorization: We are actively looking into fully deprecating all of RLlib's in-house env APIs (VectorEnv, BaseEnv, etc., except for MultiAgentEnv), in favor of just supporting gymnasium (and its own vector option).
- We are currently a few days away from merging our gymnasium support PR.
On the autoreset issue you mentioned: I would prefer autoreset (similar to how DeepMind has always done it), but then the per-episode seeding (and option'ing) would not be accessible anymore. If per-episode seeding and option'ing is really essential for users, then we won't be able to avoid having an additional reset_at(vector_idx: int, *, seed=None, options=None) method (like we will have in RLlib after moving to gymnasium in a few days).
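As a rough illustration of the reset_at idea, the sketch below uses a toy vector env without autoreset, where the caller resets each finished sub-env with its own seed. CountingEnv and ManualResetVectorEnv are hypothetical classes for this example only; this is not RLlib's or Gymnasium's implementation.

```python
from typing import Optional


class CountingEnv:
    """Toy env (not a Gymnasium class): obs counts steps, episode ends after 3."""

    def reset(self, *, seed: Optional[int] = None, options: Optional[dict] = None):
        self.t = (options or {}).get("start", 0)
        return self.t, {"seed": seed}

    def step(self, action):
        self.t += 1
        terminated = self.t >= 3
        return self.t, 1.0, terminated, False, {}


class ManualResetVectorEnv:
    """Hypothetical vector env without autoreset; the caller resets finished sub-envs."""

    def __init__(self, num_envs: int):
        self.envs = [CountingEnv() for _ in range(num_envs)]

    def reset(self, *, seed=None, options=None):
        return [env.reset(seed=seed, options=options)[0] for env in self.envs]

    def reset_at(self, vector_idx: int, *, seed=None, options=None):
        # Reset only one sub-env, with its own per-episode seed and options.
        return self.envs[vector_idx].reset(seed=seed, options=options)

    def step(self, actions):
        return [env.step(a) for env, a in zip(self.envs, actions)]


vec_env = ManualResetVectorEnv(num_envs=2)
obs = vec_env.reset()
episode_count = 0
for _ in range(4):
    results = vec_env.step([0, 0])
    for idx, (next_obs, reward, terminated, truncated, info) in enumerate(results):
        if terminated or truncated:
            episode_count += 1
            # Per-episode seed chosen by the caller at reset time.
            obs[idx], _ = vec_env.reset_at(idx, seed=episode_count)
        else:
            obs[idx] = next_obs
```

The trade-off is visible in the loop: per-episode seeds stay available, but the training code has to handle episode ends itself again.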
Thanks for your thoughts @sven1977
I'm unfamiliar with how DeepMind has achieved environment vectorisation. Could you provide a short summary of the difference between DeepMind's approach and OpenAI's approach (or EnvPool's)?
Being able to set the autoreset seed is an interesting idea that I hadn't heard of before. I don't think we will include this in the VectorEnv definition, as we want users to be able to create custom vector environments with minimal requirements.
However, like @MarcCote's proposal, I would be interested in adding this to SyncVectorEnv or AsyncVectorEnv as a special parameter.
The current reset type definition for those classes is reset(self, seed: Union[int, list[int]], options: Optional[dict[str, Any]]) -> tuple[ObsType, dict[str, Any]]. As we already use list[int] for setting the initial reset seeds of the individual environments, I'm unsure what other options we have. Using a callable is the best option I can think of, other than adding another function. Any other thoughts?
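For discussion, the callable idea could look roughly like the sketch below. The autoreset_seed parameter and both classes are hypothetical and exist only in this example; rewards and termination flags are left out of the step return to keep it short.

```python
import itertools
from typing import Callable, Optional


class CountingEnv:
    """Toy env (not a Gymnasium class) whose episodes last two steps."""

    def reset(self, *, seed: Optional[int] = None, options: Optional[dict] = None):
        self.t = 0
        return self.t, {"seed": seed}

    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= 2, False, {}


class AutoresetVectorEnv:
    """Hypothetical sync vector env that asks a callable for each autoreset seed."""

    def __init__(self, num_envs: int,
                 autoreset_seed: Optional[Callable[[int], Optional[int]]] = None):
        self.envs = [CountingEnv() for _ in range(num_envs)]
        # autoreset_seed(env_index) -> seed for that sub-env's next episode.
        self.autoreset_seed = autoreset_seed or (lambda env_index: None)

    def reset(self, *, seed=None):
        seeds = seed if isinstance(seed, list) else [seed] * len(self.envs)
        return [env.reset(seed=s)[0] for env, s in zip(self.envs, seeds)]

    def step(self, actions):
        observations, infos = [], []
        for idx, (env, action) in enumerate(zip(self.envs, actions)):
            obs, reward, terminated, truncated, info = env.step(action)
            if terminated or truncated:
                # Same-step autoreset, seeded through the user-supplied callable.
                info = {"final_observation": obs}
                obs, reset_info = env.reset(seed=self.autoreset_seed(idx))
                info.update(reset_info)
            observations.append(obs)
            infos.append(info)
        return observations, infos


counter = itertools.count(100)
vec_env = AutoresetVectorEnv(2, autoreset_seed=lambda env_index: next(counter))
vec_env.reset(seed=[0, 1])
vec_env.step([0, 0])
observations, infos = vec_env.step([0, 0])
```

A callable keeps the reset signature unchanged, at the cost of the seed logic living outside the training loop.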
Discussion on the step function call order, which, if adopted, is a backward-compatibility-breaking change for the AutoResetWrapper and for the new SyncVectorEnv and AsyncVectorEnv. The proposed function call order is identical to the current implementation within EnvPool. This does not necessarily change the VectorEnv implementation.
@RedTachyon @vwxyzjn @araffin Would be interested in your thoughts
Considering a very simple training loop
import gymnasium as gym

# `policy` and `replay_buffer` are assumed to be defined by the user.
env = gym.make("CartPole-v1")
obs, info = env.reset()
for training_step in range(1_000):
    action = policy(obs)
    next_obs, reward, terminated, truncated, next_info = env.step(action)
    replay_buffer.append((obs, info, action, next_obs, reward, terminated, truncated))
    if terminated or truncated:
        obs, info = env.reset()
    else:
        obs, info = next_obs, next_info
The AutoReset wrapper (and the Vector implementations) looks very similar to this implementation. While the auto resetting removes the need for one if statement, I believe it requires adding a new if statement.
import gymnasium as gym

# `policy` and `replay_buffer` are assumed to be defined by the user.
env = gym.make("CartPole-v1")
env = gym.wrappers.AutoResetWrapper(env)
obs, info = env.reset()
for training_step in range(1_000):
    action = policy(obs)
    next_obs, reward, terminated, truncated, next_info = env.step(action)
    if terminated or truncated:
        # next_obs is already the reset observation; the true last observation is in the info.
        replay_buffer.append((obs, info, action, next_info["final_observation"], reward, terminated, truncated))
    else:
        replay_buffer.append((obs, info, action, next_obs, reward, terminated, truncated))
    obs, info = next_obs, next_info
I propose the following change to the AutoReset wrapper and vector implementations. The difference is when reset is called. In the current implementation, if the data resulting from a step call ends the episode, then reset is called within the same step, with the old obs and info packaged into the new info from reset. This PR changes it such that the reset will happen in the next step following the episode ending.
import gymnasium as gym

# `policy` and `replay_buffer` are assumed to be defined by the user.
env = gym.make("CartPole-v1")
env = gym.experimental.wrappers.AutoReset(env)
obs, info = env.reset()
for training_step in range(1_000):
    action = policy(obs)
    next_obs, reward, terminated, truncated, next_info = env.step(action)
    replay_buffer.append((obs, info, action, next_obs, reward, terminated, truncated))
    obs, info = next_obs, next_info
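To show the proposed call order without depending on any particular Gymnasium version, here is a self-contained sketch. CountingEnv and NextStepAutoReset are hypothetical classes for this example, not the actual wrapper from the PR, and returning a reward of 0.0 on the reset step is my assumption.

```python
class CountingEnv:
    """Toy env (not a Gymnasium class) whose episodes terminate after two steps."""

    def reset(self, *, seed=None, options=None):
        self.t = 0
        return self.t, {}

    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= 2, False, {}


class NextStepAutoReset:
    """Hypothetical wrapper: reset happens on the step AFTER an episode ends."""

    def __init__(self, env):
        self.env = env
        self.needs_reset = False

    def reset(self, *, seed=None, options=None):
        self.needs_reset = False
        return self.env.reset(seed=seed, options=options)

    def step(self, action):
        if self.needs_reset:
            # The previous step ended the episode: ignore the action and reset.
            obs, info = self.env.reset()
            self.needs_reset = False
            return obs, 0.0, False, False, info
        obs, reward, terminated, truncated, info = self.env.step(action)
        self.needs_reset = terminated or truncated
        return obs, reward, terminated, truncated, info


env = NextStepAutoReset(CountingEnv())
obs, info = env.reset()
transitions = []
for _ in range(5):
    next_obs, reward, terminated, truncated, next_info = env.step(0)
    transitions.append((obs, next_obs, terminated))
    obs, info = next_obs, next_info
```

In this toy run the third transition goes from the final observation (2) to the reset observation (0); that is the step whose action is ignored.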
Advantages
- Simplifies training code. Currently, you need to check if a sub-environment's episode has ended, then collect the last observation and info (for truncated cases) to save to the replay buffer. With a similar change to the vector implementations, users shouldn't need to use info["final_observation"][env_num].
Disadvantages
- This is a large breaking change for old training code. This could be fixed by renaming the current implementation to LegacySyncVectorEnv.
- If an episode has terminated, then there is a "dead" observation where the action has to be ignored.
- It is only possible to convert "old" to "new" and not "new" to "old".
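For the "old" to "new" direction, a conversion could be sketched as below: the adapter holds back the reset observation that the same-step implementation returns and hands it out on the following step. Both classes are hypothetical toys written for this example, and the "final_info" key and the 0.0 reward on the dead step are assumptions.

```python
class SameStepAutoResetEnv:
    """Toy env with 'old' semantics: resets inside step and reports final_observation."""

    def reset(self, *, seed=None, options=None):
        self.t = 0
        return self.t, {}

    def step(self, action):
        self.t += 1
        if self.t >= 2:
            final_obs = self.t
            reset_obs, info = self.reset()
            info = dict(info, final_observation=final_obs, final_info={})
            return reset_obs, 1.0, True, False, info
        return self.t, 1.0, False, False, {}


class OldToNewAutoReset:
    """Hypothetical adapter exposing next-step autoreset on top of a same-step env."""

    def __init__(self, env):
        self.env = env
        self.pending = None  # (reset_obs, reset_info) held back for the next step

    def reset(self, *, seed=None, options=None):
        self.pending = None
        return self.env.reset(seed=seed, options=options)

    def step(self, action):
        if self.pending is not None:
            # "Dead" step: the action is ignored and the stored reset data is returned.
            obs, info = self.pending
            self.pending = None
            return obs, 0.0, False, False, info
        obs, reward, terminated, truncated, info = self.env.step(action)
        if terminated or truncated:
            info = dict(info)
            final_obs = info.pop("final_observation")
            final_info = info.pop("final_info", {})
            self.pending = (obs, info)
            return final_obs, reward, terminated, truncated, final_info
        return obs, reward, terminated, truncated, info


env = OldToNewAutoReset(SameStepAutoResetEnv())
env.reset()
observations = [env.step(0)[0] for _ in range(4)]
```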