Code Monkey home page Code Monkey logo

Comments (14)

pseudo-rnd-thoughts avatar pseudo-rnd-thoughts commented on June 27, 2024 1

Hi, thanks for your comment.

In short, yes, I agree with your pain. However, I still believe this is not generally possible due to the current implementation of flatten with Discrete which cannot be solved without possibly breaking a large amount of users' code.
I don't know if SB3 supports custom spaces but I suggest making your own custom space for OneHot. An alternative solution is a hack where you create a new class that inherits from Box with a custom sample function that is unflattenable.

from gymnasium.

pseudo-rnd-thoughts avatar pseudo-rnd-thoughts commented on June 27, 2024

Hey, I think this is a good idea.
Would you be interested in adding the argument to MultiBinary?

from gymnasium.

fracapuano avatar fracapuano commented on June 27, 2024

Hey @pseudo-rnd-thoughts , absolutely. How do we proceed from here?

from gymnasium.

RedTachyon avatar RedTachyon commented on June 27, 2024

I'm not sure about this tbh. If we introduce this, we'd also have to adjust the contains method.

The code in both sample and contains would probably have a structure like

def sample(self):
    if self.onehot:
        ...
    else:
        ...

at which point the functionality should probably just be split into two classes.

So then we can consider whether we want to create a new OneHot space. But unless I'm missing something, this conveys the same exact information as a Discrete (or MultiDiscrete if we want a different shape), just represented differently, so it feels redundant.

from gymnasium.

fracapuano avatar fracapuano commented on June 27, 2024

I am not sure I completely agree that this would convey the same information of MultiDiscrete...
MultiDiscrete uses integer encoding to represent different classes whereas this directly point to one-hot encoding, which just feel way more natural when dealing with lots of problems.

I agree you can always go from one to the other, it is just a matter of how common one practice is with respect to others.

The contains method can be adjusted with ease. As a matter of fact, I have just implemented my own custom space starting from MultiBinary and modifying it. I could post is here maybe, you'll see is nothing too complex.
I agree with the self.onehot switch. I was thinking I could implement the sample and contains methods with:

def sample(self): 
    def sample_onehot(self): 
         # code for sampling with onehot
         return sample
     def sample_simple(self): 
         # old code
         return sample
     if self.onehot: 
         return sample_onehot()
     else: 
         return sample_simple()

The same structure would be used for contains. Maybe I could post here my OneHotSpace once I am done testing it out.

from gymnasium.

RedTachyon avatar RedTachyon commented on June 27, 2024

I am not sure I completely agree that this would convey the same information of MultiDiscrete

I agree you can always go from one to the other, it is just a matter of how common one practice is with respect to others.

These two statements are in contradiction.

To be clear with what I mean, you have a direct correspondence between Discrete(n) and Onehot(n). 3 is the exact same as [0, 0, 0, 1], it's just differently presented. If we add a OneHot space, we do not enable using any new spaces because anything you'd like to express as a OneHot, you can already express as a Discrete.


I was thinking I could implement the sample and contains methods with:

To clarify my other point about the if X: ... else: ... structure, the point was that it's generally just bad practice. Here's one SO thread, you can find a bunch more sources explaining it. The code with two functions defined inside of sample is even more hacky, and we're trying to stay pretty strict on good code quality unless there are really good reasons to compromise on it.


Overall my argument against adding this goes roughly as follows:

  1. If we were to implement this as part of MultiBinary, we'd introduce a code smell via the boolean argument
  2. We don't want to introduce bad new code
  3. If we were to add this feature in some way, a far preferable approach would be just splitting it in two separate classes, i.e. MultiBinary stays as it is, and OneHot is created
  4. If OneHot exists, it exactly duplicates Discrete or MultiDiscrete, so adding it is redundant.

I might be open to a different approach for adding this functionality, but I don't know how important it even is for Gymnasium to handle. Converting between integers and one-hot arrays is fairly trivial. If you want to use one-hot in your code, you can call space.contains(onehot.argmax()) and it's done. Going the other way could be something like np.eye(space.n)[space.sample()]

It miiiiiiiight make sense to have back-and-forth conversion as a method on Discrete, so you'd have e.g.

space = Discrete(5)
space.to_onehot(space.sample())  # [0, 1, 0, 0, 0]
space.from_onehot([0, 1, 0, 0, 0])  # 2

but I'm not sure how useful it really is since DL/RL libraries usually have their own utilities to do that (and it's a one-liner anyways, unless we want to optimize for high dimensionality)

from gymnasium.

fracapuano avatar fracapuano commented on June 27, 2024

Thanks for the very precise answer. I clearly am a novice to these topics, hence my appreciation for such a detailed answer.

I agree on most of the things you said. I am still convinced that, for good design purposes in custom environments, having a OneHot space already available would make things easier but this only is a personal opinion. By the way, I already implemented such space (mainly by modifying here-and-there MultiBinary), maybe you can have a look if you thing that might interesting.

Ultimately, I am convinced that having a OneHot already available would make custom-env development easier, but I am more than happy to accept your feedback on this

from gymnasium.

fracapuano avatar fracapuano commented on June 27, 2024

from gymnasium.

fracapuano avatar fracapuano commented on June 27, 2024

Very similar to MultiBinary ;)

from gymnasium.

pseudo-rnd-thoughts avatar pseudo-rnd-thoughts commented on June 27, 2024

This conversion reminds me of #102 which pointed out that flatten(Discrete) -> Box means that samples of Box are not one-hotted therefore, unflatten(Box.sample) are not contained in the original Discrete space.
Therefore this might introduce more problems if people expect that flatten_space(Discrete) -> OneHot

To me, this is part of the legacy / technical debt that Gym / Gymnasium has and is difficult to change / fix due to its size and aim of backward compatibility.

from gymnasium.

fracapuano avatar fracapuano commented on June 27, 2024

This conversion reminds me of #102 which pointed out that flatten(Discrete) -> Box means that samples of Box are not one-hotted therefore, unflatten(Box.sample) are not contained in the original Discrete space.
Therefore this might introduce more problems if people expect that flatten_space(Discrete) -> OneHot

I am not sure I completely understand this... Can you please point out which conversion are you referring to?

So why not adding a OneHot space for all those cases in which it just feel more natural than using Discrete/MultiDiscrete for custom environment design? I already have mine implemented and tested. Let me know if you would be interested in testing it out.

from gymnasium.

fracapuano avatar fracapuano commented on June 27, 2024

Hey there, any update?

from gymnasium.

pseudo-rnd-thoughts avatar pseudo-rnd-thoughts commented on June 27, 2024

Hey,

Read #102 for the whole discussion, in short, Gymnasium allows users to flatten a space which for Discrete is transformed to a Box. The problem is that samples generated by the Box cannot be reverted back to the Discrete values as they are not one-hotted. See the example code in #102. The problem is that we can't solve this and just have to put a warning that flattened samples, i.e., Box.sample(), are not guaranteed to exist within the original space, i.e., Discrete.

One of the proposals in #102 was to introduce a OneHot as replacement to Box in this case but we found there are critical reasons we can't do this.
In addition to the code smell issues pointed out by @RedTachyon, I suspect that introducing the OneHot will cause confusion with the issue related above and given that Discrete is functionally identical to OneHot for users if they want this feature then I don't think we will add a OneHot space. Finally, adding a space into Gymnasium is not as simple as one thinks given we have many spaces.utils and vector.utils functions that need to be added / updated.

Thank you for opening the issue, I am going to close the issue, we can continue the discussion until there is a particularly strong argument that we haven't considered yet then we can reopen

from gymnasium.

zig1000 avatar zig1000 commented on June 27, 2024

Hi, I have a fairly simple motivating example and ran across this and #102.

I'm trying to write a new gym env (hopefully written The Right Way from the start), and I'd like to run stable-baselines3 on it.
The env has an action space of type Dict(Discrete, ...).

stable-baseline3 does not support Dict action spaces, so as far as I can tell the Right way to handle this is to wrap the env in a FlattenAction wrapper. However, doing so results in errors as stable-baselines I assume calls sample() on its end, and generates invalid one-hots of e.g. all-zeros.

The advice in #102 to sample before flattening does not seem to apply here as the sampling isn't on my end.

The only fix I can think of would be to do some truly horrible one-hotifying in the FlattenAction wrapper before the invalid values can be passed to unflatten. np.argmax also doesn't seem like it would suffice alone as it front-biases the results in the case of ties. Similarly if I changed my space to use MultiBinary my env-internal code would be a mess of calls to an unbiased-argmax.

But overall the fundamental problem seems to me to be that there does not exist a space that is both 1. flat 2. sample() behaves like a one-hot. Discrete violates 1, MultiBinary violates 2. Without such a space, there doesn't appear to be a 'clean' env setup for both random sampling and actually feeding values to/from a NN.

To me, this would make the addition of the Discrete-lookalike OneHot space worth it. It would also save users from googling for half an hour trying to figure out whether they should go for MultiBinary/Box or Discrete when they know they want a one hot (as I did). Due to this I think in some cases it may actually reduce confusion in spite of the similarity to Discrete.

from gymnasium.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.