Comments (12)
I think using the last observation is fine in this situation then maybe raising an error if the user tries to step
when all environment episodes have ended. This should avoid issues where the user step
infinitely when all observations are equivalent
from gymnasium.
I don't think I like this, VectorEnvs are meant to have autoreset implemented inside by default -- this is part of the specification of a valid VectorEnv, it'd be a dangerous situation if a "raw" VectorEnv that we supply as a built-in doesn't fulfill it.
This means I cannot leverage VectorEnv to perform parallel evaluations.
I don't see how this is true? You can do evaluation the normal way, and discard any steps from unfinished episodes (which you know based on the truncated
and terminated
values). Removing the autoreset would essentially lead to undefined behavior (taking actions in a terminated environment), which is not a good thing to have.
from gymnasium.
The code returns last_state which contains all the information returned by the last step call
Good point, I misread it. But note that it's very likely that learning code would interpret that as a very long chain of one-step episodes, which is not great.
How's the autoreset handled in the vectorised cartpole?
Gymnasium/gymnasium/envs/classic_control/cartpole.py
Lines 430 to 435 in 5974003
The way I see, I'm not making any assumption about how the Envs should deal with action sent after termination signal was returned.
You're assuming that it makes sense to call step
after the environment terminates, before resetting it again. So notably it shouldn't raise an exception, which would be a pretty reasonable thing to do. If we remove the autoreset, then you can have perfectly reasonable code (env that raises an exception) and on top of it you use a perfectly valid vectorization (AsyncVectorEnv or something), and now everything is crashing. Yes, you're providing a wrapper that does it for Sync/Async vector envs, but we shouldn't require people to use that, or to implement that specific functionality.
Also correct me if I'm wrong, but this IgnoreDoneEnv
wrapper actually does exactly what you want, right? So for Sync/Async vector envs you can just do that, and other vectorization methods anyways can't be expected to provide this functionality. So why should this be included in the core API, if a wrapper works just fine for your application?
Isn't the purpose of having a AutoResetWrapper to have control on when I want this functionality
This is the case for individual environments where it's much less critical -- you can just do if terminated or truncated: env.reset()
and you're golden without autoreset.
With vectorized environments, you basically can't train without autoreset, since you can only reset all the environments the same time, and not just a proper subset of them. If you reset after one env terminates, you lose the tails from all the other envs. If you reset after all envs terminate, then you lost a whole bunch of time computing actions for environments which have been terminated anyways. That's why I'm still convinced that autoreset should be considered a core feature of the vector API, and not something optional. And for the rare edge cases where you want to customize it - just use a wrapper.
from gymnasium.
I'll try that.
There's a good chance you can actually just do
gym.make_vec(..., vectorization_mode="async", wrappers=[IgnoreDoneEnv])
and it will just work the way you want it to work (after changing the wrapper as you described). In fact I'd definitely hope it does work, that's how it's designed.
from gymnasium.
I think the application makes sense and we should support it but can we have this as an option that is disabled by default (autoreset is used) due to backward compatibility.
from gymnasium.
It can easily be made an option, but what is the expected behavior when it is set to false? Is copying the last state over until a manually reset, the best way (i.e., would users expect such behavior)?
from gymnasium.
Good idea. I'll make a PR for this proposal.
Also, I noticed when registering an environment, there is an autoreset option. If you have an autoresetting environment and vectorize it (Sync or Async), the autoreset logic is applied twice, i.e. one from the AutoResetWrapper, and another one in the VectorEnv.step function.
from gymnasium.
Right, I shouldn't have said that I can't, I meant to say it is more convenient to not have to track the step history, do filtering based on termination, and look into the info
dict for 'final_observation' and 'final_info'. Simply looking at the last output of envs.step()
is cleaner in terms of number of lines in the user code.
Removing the autoreset would essentially lead to undefined behavior (taking actions in a terminated environment), which is not a good thing to have.
Not if taking an action in a terminated environment is a no-op (as done in the code included in the original post).
it'd be a dangerous situation if a "raw" VectorEnv that we supply as a built-in doesn't fulfill it.
What do you mean by dangerous?
from gymnasium.
Not if taking an action in a terminated environment is a no-op
That's not true for all envs, and enforcing it would be a huge (and imo bad) change. Also notice that your code only returns the observation, ignoring the rest of the step return. That's probably a simple omission, but it avoids a few potentially important design choices - should we keep returning the same value of terminated
? What about truncated
? Or maybe they should be False
for all these "fake" steps?
What do you mean by dangerous?
By far, the main purpose of gym.Env
and now also gym.VectorEnv
is providing a shared API. It doesn't matter what environment I use, if I have any VectorEnv
, I should be able to interact with it the exact same way as any other environment (bar env-specific additions) and get valid results.
My idea is to have the autoreset behavior be included as part of the VectorEnv
specification, i.e. for your custom VectorEnv
to be valid, it must handle autoreset on its own. If we ship an implementation of VectorEnv
which does not fulfill this requirement, we're basically breaking our own contract, and negating the whole point of having the shared API.
FWIW this is also why I'm uncertain what to do with the list of seeds and list of options, and I'm delaying the decision for now until I can really properly analyze it. This is a very important decision that we can't just take casually, even though it might seem trivial in terms of the code.
The benefit of this proposal, as far as I can tell, is saving a few CPU cycles during evaluation in some specific environments in specific situations (notice that if you're running 8 parallel envs and want to collect a total of 10 evaluation episodes, this will in fact slow you down), which I don't think is impactful enough for this to be worth it.
from gymnasium.
That's not true for all envs, and enforcing it would be a huge (and imo bad) change.
The way I see, I'm not making any assumption about how the Envs should deal with action sent after termination signal was returned. The no-op is enforced at higher-level by skipping the call to self.env.step
in the first place.
Also notice that your code only returns the observation, ignoring the rest of the step return. That's probably a simple omission, but it avoids a few potentially important design choices - should we keep returning the same value of terminated? What about truncated? Or maybe they should be False for all these "fake" steps?
The code returns last_state
which contains all the information returned by the last step call (i.e., before environment ended). I don't see why terminated/truncated should change value as no action was sent to the sub env. That said I can see some issue where TimeLimit would set truncated=True
depending on the wrappers order.
By far, the main purpose of gym.Env and now also gym.VectorEnv is providing a shared API. It doesn't matter what environment I use, if I have any VectorEnv, I should be able to interact with it the exact same way as any other environment (bar env-specific additions) and get valid results.
How's the autoreset handled in the vectorised cartpole?
My idea is to have the autoreset behavior be included as part of the
VectorEnv
specification, i.e. for your customVectorEnv
to be valid, it must handle autoreset on its own. If we ship an implementation ofVectorEnv
which does not fulfill this requirement, we're basically breaking our own contract, and negating the whole point of having the shared API.
Isn't the purpose of having a AutoResetWrapper
to have control on when I want this functionality in the first place? Right now, if I register a new environment with make(..., autoreset=True)
then wants to vectorize it, I get the unexpected behavior of having two autoresets and the info dict looks like this info['final_info'][0]['final_info']['some_key']
.
FWIW this is also why I'm uncertain what to do with the list of seeds and list of options, and I'm delaying the decision for now until I can really properly analyze it. This is a very important decision that we can't just take casually, even though it might seem trivial in terms of the code.
Take all the time you need. I still believe it is a useful feature to have as a user.
The benefit of this proposal, as far as I can tell, is saving a few CPU cycles during evaluation in some specific environments in specific situations (notice that if you're running 8 parallel envs and want to collect a total of 10 evaluation episodes, this will in fact slow you down), which I don't think is impactful enough for this to be worth it.
Compute is not the issue I was looking to address, I want cleaner user code. Also, some environments might use GPU rendering, so it's not only CPU cycles. Also, resets can be expensive, e.g. launching a game.
from gymnasium.
But note that it's very likely that learning code would interpret that as a very long chain of one-step episodes, which is not great. You're assuming that it makes sense to call step after the environment terminates, before resetting it again. So notably it shouldn't raise an exception, which would be a pretty reasonable thing to do. If we remove the autoreset, then you can have perfectly reasonable code (env that raises an exception) and on top of it you use a perfectly valid vectorization (AsyncVectorEnv or something), and now everything is crashing. Yes, you're providing a wrapper that does it for Sync/Async vector envs, but we shouldn't require people to use that, or to implement that specific functionality.
Very good points. I now understand your hesitation.
Also correct me if I'm wrong, but this IgnoreDoneEnv wrapper actually does exactly what you want, right? So for Sync/Async vector envs you can just do that, and other vectorization methods anyways can't be expected to provide this functionality. So why should this be included in the core API, if a wrapper works just fine for your application?
I thought originally that I needed a way to disable the autoreset that is happening in the step function. Since any wrapper provided during gym.make_vec
is going to be executed on the first line of Async/Sync.step()
(ie. before the autoreset code), I could catch the terminated/truncated and changed it back to False but carry over the last state while adding some flag to info
. I'll try that.
from gymnasium.
I was able to bypass the autoreset and make it works as I wanted with an inner Wrapper
to carry over the state and an outer VectorWrapper
to set terminateds
and truncateds
correctly. While autoreset=False
would be useful, implementing the correct and expected behavior is indeed depending on the use case, thus I agree it doesn't make sense to be included in the core API. Thanks for taking the time to look into this.
Here's my wrappers code for reference.
class SkipDone(gym.Wrapper):
def reset(self, **kwargs):
observation, info = self.env.reset(**kwargs)
self._last_state = None
self._is_done = False
return observation, info
def step(self, action):
if not self._is_done:
observation, reward, terminated, truncated, info = self.env.step(action)
self._is_done = terminated or truncated
info["_terminated"] = terminated
info["_truncated"] = truncated
terminated = truncated = False # To avoid being autoreset.
self._last_state = (observation, reward, terminated, truncated, info)
return self._last_state
class PostProcessSkipDone(gym.vector.VectorWrapper):
def step(self, actions):
observations, rewards, terminateds, truncateds, infos = self.env.step(actions)
terminateds = infos["_terminated"]
truncateds = infos["_truncated"]
return observations, rewards, terminateds, truncateds, infos
envs_eval = gym.make_vec(
"ScienceWorld-v0",
num_envs=10,
vectorization_mode="async",
vector_kwargs={"shared_memory": False},
wrappers=[SkipDone]
)
envs_eval = PostProcessSkipDone(envs_eval)
from gymnasium.
Related Issues (20)
- [Proposal] new continuous space `Continuous` HOT 1
- [Bug Report] Code examples in Vector API documentation use deprecated "gym.vector.make" instead of "gym.make_vec" HOT 3
- [Proposal] Add Optional State Variable in `initial` method of `FuncEnv` HOT 1
- [Question] Rough timeline for Atari (ALE) support for gymnasium>=1.0. HOT 3
- [Question] Collect multiple images from different cameras HOT 6
- [Proposal] Default metadata in BaseMujocoEnv HOT 1
- Setting up seed properly in Custom env HOT 1
- [Question] ValueError: ndarray is not C-contiguous HOT 3
- [Question] Is there any reason to add epsilon to action mean? HOT 4
- How to interact with the LunarLander-v2 with a keyboard?
- [Bug Report] Human vector rendering fails
- spaces.Dict: Make `sorted(spaces.items())` optional HOT 2
- [Bug Report] Wrong information about the beginning observation in InvertedDoublePendulum-v4 HOT 1
- ERROR: Cannot install gymnasium[atari]==0.26.1, gymnasium[atari]==0.26.2 and gymnasium[atari]==0.26.3 because these package versions have conflicting dependencies. HOT 2
- [Bug Report] Discrete space does not register as inheriting from gymnasium.spaces.Space when imported directly HOT 1
- [Question] What actions are expected for a previously terminated sub-env in a vector (next `env.step()` should return reset obs so the actions shouldn't matter)? HOT 2
- [Bug Report] 'HumanRendering' object has no attribute 'window' HOT 1
- [Bug Report] Can't install the Gymnasium with Atari dependencies. HOT 1
- [Bug Report] The env checker does not detect when the environment is not deterministic after the first reset. HOT 2
- [Proposal] Allow frame stack for stacks of size 1 HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from gymnasium.