Proposal At the moment, both SyncVectorE

[Proposal] Remove autoreset logic from VectorEnv and rely on AutoResetWrapper instead about gymnasium HOT 12 CLOSED

MarcCote commented on June 19, 2024

[Proposal] Remove autoreset logic from VectorEnv and rely on AutoResetWrapper instead

from gymnasium.

Comments (12)

pseudo-rnd-thoughts commented on June 19, 2024 1

I think using the last observation is fine in this situation then maybe raising an error if the user tries to step when all environment episodes have ended. This should avoid issues where the user step infinitely when all observations are equivalent

from gymnasium.

RedTachyon commented on June 19, 2024 1

I don't think I like this, VectorEnvs are meant to have autoreset implemented inside by default -- this is part of the specification of a valid VectorEnv, it'd be a dangerous situation if a "raw" VectorEnv that we supply as a built-in doesn't fulfill it.

This means I cannot leverage VectorEnv to perform parallel evaluations.

I don't see how this is true? You can do evaluation the normal way, and discard any steps from unfinished episodes (which you know based on the truncated and terminated values). Removing the autoreset would essentially lead to undefined behavior (taking actions in a terminated environment), which is not a good thing to have.

from gymnasium.

RedTachyon commented on June 19, 2024 1

The code returns last_state which contains all the information returned by the last step call

Good point, I misread it. But note that it's very likely that learning code would interpret that as a very long chain of one-step episodes, which is not great.

How's the autoreset handled in the vectorised cartpole?

Gymnasium/gymnasium/envs/classic_control/cartpole.py

Lines 430 to 435 in 5974003

    
           if any(done): 
        
               # This code was generated by copilot, need to check if it works 
        
               self.state[:, done] = self.np_random.uniform( 
        
                   low=self.low, high=self.high, size=(4, done.sum()) 
        
               ).astype(np.float32) 
        
               self.steps[done] = 0

The way I see, I'm not making any assumption about how the Envs should deal with action sent after termination signal was returned.

You're assuming that it makes sense to call step after the environment terminates, before resetting it again. So notably it shouldn't raise an exception, which would be a pretty reasonable thing to do. If we remove the autoreset, then you can have perfectly reasonable code (env that raises an exception) and on top of it you use a perfectly valid vectorization (AsyncVectorEnv or something), and now everything is crashing. Yes, you're providing a wrapper that does it for Sync/Async vector envs, but we shouldn't require people to use that, or to implement that specific functionality.

Also correct me if I'm wrong, but this IgnoreDoneEnv wrapper actually does exactly what you want, right? So for Sync/Async vector envs you can just do that, and other vectorization methods anyways can't be expected to provide this functionality. So why should this be included in the core API, if a wrapper works just fine for your application?

Isn't the purpose of having a AutoResetWrapper to have control on when I want this functionality

This is the case for individual environments where it's much less critical -- you can just do if terminated or truncated: env.reset() and you're golden without autoreset.

With vectorized environments, you basically can't train without autoreset, since you can only reset all the environments the same time, and not just a proper subset of them. If you reset after one env terminates, you lose the tails from all the other envs. If you reset after all envs terminate, then you lost a whole bunch of time computing actions for environments which have been terminated anyways. That's why I'm still convinced that autoreset should be considered a core feature of the vector API, and not something optional. And for the rare edge cases where you want to customize it - just use a wrapper.

from gymnasium.

RedTachyon commented on June 19, 2024 1

I'll try that.

There's a good chance you can actually just do

gym.make_vec(..., vectorization_mode="async", wrappers=[IgnoreDoneEnv])

and it will just work the way you want it to work (after changing the wrapper as you described). In fact I'd definitely hope it does work, that's how it's designed.

from gymnasium.

pseudo-rnd-thoughts commented on June 19, 2024

I think the application makes sense and we should support it but can we have this as an option that is disabled by default (autoreset is used) due to backward compatibility.

from gymnasium.

MarcCote commented on June 19, 2024

It can easily be made an option, but what is the expected behavior when it is set to false? Is copying the last state over until a manually reset, the best way (i.e., would users expect such behavior)?

from gymnasium.

MarcCote commented on June 19, 2024

Good idea. I'll make a PR for this proposal.

Also, I noticed when registering an environment, there is an autoreset option. If you have an autoresetting environment and vectorize it (Sync or Async), the autoreset logic is applied twice, i.e. one from the AutoResetWrapper, and another one in the VectorEnv.step function.

from gymnasium.

MarcCote commented on June 19, 2024

Right, I shouldn't have said that I can't, I meant to say it is more convenient to not have to track the step history, do filtering based on termination, and look into the info dict for 'final_observation' and 'final_info'. Simply looking at the last output of envs.step() is cleaner in terms of number of lines in the user code.

Removing the autoreset would essentially lead to undefined behavior (taking actions in a terminated environment), which is not a good thing to have.

Not if taking an action in a terminated environment is a no-op (as done in the code included in the original post).

it'd be a dangerous situation if a "raw" VectorEnv that we supply as a built-in doesn't fulfill it.

What do you mean by dangerous?

from gymnasium.

RedTachyon commented on June 19, 2024

Not if taking an action in a terminated environment is a no-op

That's not true for all envs, and enforcing it would be a huge (and imo bad) change. Also notice that your code only returns the observation, ignoring the rest of the step return. That's probably a simple omission, but it avoids a few potentially important design choices - should we keep returning the same value of terminated? What about truncated? Or maybe they should be False for all these "fake" steps?

What do you mean by dangerous?

By far, the main purpose of gym.Env and now also gym.VectorEnv is providing a shared API. It doesn't matter what environment I use, if I have any VectorEnv, I should be able to interact with it the exact same way as any other environment (bar env-specific additions) and get valid results.

My idea is to have the autoreset behavior be included as part of the VectorEnv specification, i.e. for your custom VectorEnv to be valid, it must handle autoreset on its own. If we ship an implementation of VectorEnv which does not fulfill this requirement, we're basically breaking our own contract, and negating the whole point of having the shared API.

FWIW this is also why I'm uncertain what to do with the list of seeds and list of options, and I'm delaying the decision for now until I can really properly analyze it. This is a very important decision that we can't just take casually, even though it might seem trivial in terms of the code.

The benefit of this proposal, as far as I can tell, is saving a few CPU cycles during evaluation in some specific environments in specific situations (notice that if you're running 8 parallel envs and want to collect a total of 10 evaluation episodes, this will in fact slow you down), which I don't think is impactful enough for this to be worth it.

from gymnasium.

MarcCote commented on June 19, 2024

That's not true for all envs, and enforcing it would be a huge (and imo bad) change.

The way I see, I'm not making any assumption about how the Envs should deal with action sent after termination signal was returned. The no-op is enforced at higher-level by skipping the call to self.env.step in the first place.

Also notice that your code only returns the observation, ignoring the rest of the step return. That's probably a simple omission, but it avoids a few potentially important design choices - should we keep returning the same value of terminated? What about truncated? Or maybe they should be False for all these "fake" steps?

The code returns last_state which contains all the information returned by the last step call (i.e., before environment ended). I don't see why terminated/truncated should change value as no action was sent to the sub env. That said I can see some issue where TimeLimit would set truncated=True depending on the wrappers order.

By far, the main purpose of gym.Env and now also gym.VectorEnv is providing a shared API. It doesn't matter what environment I use, if I have any VectorEnv, I should be able to interact with it the exact same way as any other environment (bar env-specific additions) and get valid results.

How's the autoreset handled in the vectorised cartpole?

My idea is to have the autoreset behavior be included as part of the VectorEnv specification, i.e. for your custom VectorEnv to be valid, it must handle autoreset on its own. If we ship an implementation of VectorEnv which does not fulfill this requirement, we're basically breaking our own contract, and negating the whole point of having the shared API.

Isn't the purpose of having a AutoResetWrapper to have control on when I want this functionality in the first place? Right now, if I register a new environment with make(..., autoreset=True) then wants to vectorize it, I get the unexpected behavior of having two autoresets and the info dict looks like this info['final_info'][0]['final_info']['some_key'].

FWIW this is also why I'm uncertain what to do with the list of seeds and list of options, and I'm delaying the decision for now until I can really properly analyze it. This is a very important decision that we can't just take casually, even though it might seem trivial in terms of the code.

Take all the time you need. I still believe it is a useful feature to have as a user.

The benefit of this proposal, as far as I can tell, is saving a few CPU cycles during evaluation in some specific environments in specific situations (notice that if you're running 8 parallel envs and want to collect a total of 10 evaluation episodes, this will in fact slow you down), which I don't think is impactful enough for this to be worth it.

Compute is not the issue I was looking to address, I want cleaner user code. Also, some environments might use GPU rendering, so it's not only CPU cycles. Also, resets can be expensive, e.g. launching a game.

from gymnasium.

MarcCote commented on June 19, 2024

But note that it's very likely that learning code would interpret that as a very long chain of one-step episodes, which is not great. You're assuming that it makes sense to call step after the environment terminates, before resetting it again. So notably it shouldn't raise an exception, which would be a pretty reasonable thing to do. If we remove the autoreset, then you can have perfectly reasonable code (env that raises an exception) and on top of it you use a perfectly valid vectorization (AsyncVectorEnv or something), and now everything is crashing. Yes, you're providing a wrapper that does it for Sync/Async vector envs, but we shouldn't require people to use that, or to implement that specific functionality.

Very good points. I now understand your hesitation.

Also correct me if I'm wrong, but this IgnoreDoneEnv wrapper actually does exactly what you want, right? So for Sync/Async vector envs you can just do that, and other vectorization methods anyways can't be expected to provide this functionality. So why should this be included in the core API, if a wrapper works just fine for your application?

I thought originally that I needed a way to disable the autoreset that is happening in the step function. Since any wrapper provided during gym.make_vec is going to be executed on the first line of Async/Sync.step() (ie. before the autoreset code), I could catch the terminated/truncated and changed it back to False but carry over the last state while adding some flag to info. I'll try that.

from gymnasium.

MarcCote commented on June 19, 2024

I was able to bypass the autoreset and make it works as I wanted with an inner Wrapper to carry over the state and an outer VectorWrapper to set terminateds and truncateds correctly. While autoreset=False would be useful, implementing the correct and expected behavior is indeed depending on the use case, thus I agree it doesn't make sense to be included in the core API. Thanks for taking the time to look into this.

Here's my wrappers code for reference.

class SkipDone(gym.Wrapper):

    def reset(self, **kwargs):
        observation, info = self.env.reset(**kwargs)
        self._last_state = None
        self._is_done = False
        return observation, info

    def step(self, action):
        if not self._is_done:
            observation, reward, terminated, truncated, info = self.env.step(action)
            self._is_done = terminated or truncated
            info["_terminated"] = terminated
            info["_truncated"] = truncated

            terminated = truncated = False  # To avoid being autoreset.
            self._last_state = (observation, reward, terminated, truncated, info)

        return self._last_state


class PostProcessSkipDone(gym.vector.VectorWrapper):
    
    def step(self, actions):
        observations, rewards, terminateds, truncateds, infos = self.env.step(actions)
        terminateds = infos["_terminated"]
        truncateds = infos["_truncated"]
        return observations, rewards, terminateds, truncateds, infos


envs_eval = gym.make_vec(
    "ScienceWorld-v0",
    num_envs=10,
    vectorization_mode="async",
    vector_kwargs={"shared_memory": False},
    wrappers=[SkipDone]
)
envs_eval = PostProcessSkipDone(envs_eval)

from gymnasium.

[Proposal] Remove autoreset logic from VectorEnv and rely on AutoResetWrapper instead about gymnasium HOT 12 CLOSED

Comments (12)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent

	if any(done):
	# This code was generated by copilot, need to check if it works
	self.state[:, done] = self.np_random.uniform(
	low=self.low, high=self.high, size=(4, done.sum())
	).astype(np.float32)
	self.steps[done] = 0