I am trying to train VMAS with my custom model, which takes both the 2D bird's-eye fully observable game map observation and the continuous observations as input. In the development roadmap, I noticed that you are also planning to implement the 2D bird's-eye view.
I created the following working example with a dummy env.
import ray
import gym
from gym import spaces
import numpy as np
from ray.rllib.agents.ppo import PPOTrainer, DEFAULT_CONFIG
from ray.rllib.models import ModelCatalog
from ray.rllib.models.torch.torch_modelv2 import TorchModelV2
from ray.rllib.models.torch.fcnet import FullyConnectedNetwork as TorchFC
from ray.rllib.models.torch.visionnet import VisionNetwork as TorchVis
from ray.tune.registry import register_env
from ray.rllib.utils.framework import try_import_torch
torch, nn = try_import_torch()
ray.init(local_mode=True)
class TorchCustomModel(TorchModelV2, nn.Module):
    """Two-branch RLlib model: an FC net over the 'fc' observation component
    and a vision net over the 'vis' image component, fused by a linear head.

    The value function is a learned linear mix of the two branches'
    value estimates.
    """

    def __init__(self, obs_space, action_space, num_outputs, model_config, name):
        TorchModelV2.__init__(
            self, obs_space, action_space, num_outputs, model_config, name
        )
        nn.Module.__init__(self)
        # One sub-network per component of the Dict observation space.
        self._fc_branch = TorchFC(
            obs_space.original_space['fc'], action_space, num_outputs,
            model_config, name,
        )
        self._vis_branch = TorchVis(
            obs_space.original_space['vis'], action_space, num_outputs,
            model_config, name,
        )
        # Mixes the two scalar value estimates into a single one.
        self.value_f = nn.Linear(2, 1)
        # Projects the concatenated branch outputs to the action-distribution
        # inputs (mean + log-std per action dim for a continuous Box space).
        self.head = nn.Linear(num_outputs * 2, action_space.shape[0] * 2)

    def forward(self, input_dict, state, seq_lens):
        """Run both branches on their observation component and fuse them."""
        obs = input_dict["obs"]
        fc_features, _ = self._fc_branch({"obs": obs['fc']}, state, seq_lens)
        vis_features, _ = self._vis_branch({"obs": obs['vis']}, state, seq_lens)
        fused = torch.cat((fc_features, vis_features), -1)
        return self.head(fused), []

    def value_function(self):
        """Linear combination of the two branches' value predictions."""
        branch_values = torch.stack(
            [self._fc_branch.value_function(), self._vis_branch.value_function()],
            -1,
        )
        return self.value_f(branch_values).squeeze(-1)
# My custom environment
class MyEnv(gym.Env):
    """Dummy environment emitting a Dict observation: a flat 100-float 'fc'
    vector and a 96x96x3 'vis' image, both uniform in [0, 1).

    Rewards 1 per step and never terminates — only meant to exercise the
    custom two-branch model.
    """

    def __init__(self, env_config):
        self.observation_space = spaces.Dict({
            "fc": spaces.Box(low=0, high=1, shape=(100,)),
            "vis": spaces.Box(low=0, high=1, shape=(96, 96, 3)),
        })
        self.action_space = spaces.Box(low=-1, high=1, shape=(2,))

    def _random_obs(self):
        # Single source of truth for observation construction — this dict was
        # previously duplicated verbatim in reset() and step().
        return {"fc": np.random.rand(100), "vis": np.random.rand(96, 96, 3)}

    def reset(self):
        """Return an initial random observation."""
        return self._random_obs()

    def step(self, action):
        """Ignore the action; return (obs, reward=1, done=False, info={})."""
        return self._random_obs(), 1, False, {}
# Register the env and custom model under names the config can reference.
register_env("my_env", lambda cfg: MyEnv(cfg))
ModelCatalog.register_custom_model("my_model", TorchCustomModel)

# Start from PPO defaults and layer the experiment-specific settings on top.
config = DEFAULT_CONFIG.copy()
config.update({
    'framework': 'torch',
    'model': {
        "custom_model": "my_model",
        "dim": 96,
        # Filters sized so the vision net reduces a 96x96 input to a
        # 1x1 spatial output.
        "conv_filters": [[16, [8, 8], 4], [32, [4, 4], 2], [256, [11, 11], 2]],
    },
    'env': "my_env",
})

trainer = PPOTrainer(config=config)
for iteration in range(10):
    print(trainer.train())