melting-pot-contest-2023's Issues

bugs when running the training code

2023-09-11 10:21:49,337 ERROR tune_controller.py:911 -- Trial task failed for trial PPO_meltingpot_fcb07_00000
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/ray/air/execution/_internal/event_manager.py", line 110, in resolve_future
result = ray.get(future)
File "/usr/local/lib/python3.10/dist-packages/ray/_private/auto_init_hook.py", line 24, in auto_init_wrapper
return fn(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/ray/_private/worker.py", line 2495, in get
raise value
File "python/ray/_raylet.pyx", line 1787, in ray._raylet.task_execution_handler
File "python/ray/_raylet.pyx", line 1684, in ray._raylet.execute_task_with_cancellation_handler
File "python/ray/_raylet.pyx", line 1366, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 1367, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 1583, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 864, in ray._raylet.store_task_errors
ray.exceptions.RayActorError: The actor died because of an error raised in its creation task, ray::PPO.__init__() (pid=13902, ip=172.28.0.12, actor_id=7e027fca141b6dc2cdd8f15501000000, repr=PPO)
File "/usr/local/lib/python3.10/dist-packages/ray/rllib/algorithms/algorithm.py", line 517, in __init__
super().__init__(
File "/usr/local/lib/python3.10/dist-packages/ray/tune/trainable/trainable.py", line 169, in __init__
self.setup(copy.deepcopy(self.config))
File "/usr/local/lib/python3.10/dist-packages/ray/rllib/algorithms/algorithm.py", line 639, in setup
self.workers = WorkerSet(
File "/usr/local/lib/python3.10/dist-packages/ray/rllib/evaluation/worker_set.py", line 179, in init
raise e.args[0].args[2]
File "/usr/local/lib/python3.10/dist-packages/ray/rllib/evaluation/rollout_worker.py", line 525, in init
self._update_policy_map(policy_dict=self.policy_dict)
File "/usr/local/lib/python3.10/dist-packages/ray/rllib/evaluation/rollout_worker.py", line 1727, in _update_policy_map
self._build_policy_map(
File "/usr/local/lib/python3.10/dist-packages/ray/rllib/evaluation/rollout_worker.py", line 1838, in _build_policy_map
new_policy = create_policy_for_framework(
File "/usr/local/lib/python3.10/dist-packages/ray/rllib/utils/policy.py", line 142, in create_policy_for_framework
return policy_class(observation_space, action_space, merged_config)
File "/usr/local/lib/python3.10/dist-packages/ray/rllib/algorithms/ppo/ppo_torch_policy.py", line 64, in init
self._initialize_loss_from_dummy_batch()
File "/usr/local/lib/python3.10/dist-packages/ray/rllib/policy/policy.py", line 1418, in _initialize_loss_from_dummy_batch
actions, state_outs, extra_outs = self.compute_actions_from_input_dict(
File "/usr/local/lib/python3.10/dist-packages/ray/rllib/policy/torch_policy_v2.py", line 571, in compute_actions_from_input_dict
return self._compute_action_helper(
File "/usr/local/lib/python3.10/dist-packages/ray/rllib/utils/threading.py", line 24, in wrapper
return func(self, *a, **k)
File "/usr/local/lib/python3.10/dist-packages/ray/rllib/policy/torch_policy_v2.py", line 1291, in _compute_action_helper
dist_inputs, state_out = self.model(input_dict, state_batches, seq_lens)
File "/usr/local/lib/python3.10/dist-packages/ray/rllib/models/modelv2.py", line 259, in call
res = self.forward(restored, state or [], seq_lens)
File "/usr/local/lib/python3.10/dist-packages/ray/rllib/models/torch/recurrent_net.py", line 259, in forward
return super().forward(input_dict, state, seq_lens)
File "/usr/local/lib/python3.10/dist-packages/ray/rllib/models/torch/recurrent_net.py", line 98, in forward
output, new_state = self.forward_rnn(inputs, state, seq_lens)
File "/usr/local/lib/python3.10/dist-packages/ray/rllib/models/torch/recurrent_net.py", line 274, in forward_rnn
self._features, [h, c] = self.lstm(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/rnn.py", line 810, in forward
self.check_forward_args(input, hx, batch_sizes)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/rnn.py", line 730, in check_forward_args
self.check_input(input, batch_sizes)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/rnn.py", line 218, in check_input
raise RuntimeError(
RuntimeError: input.size(-1) must be equal to input_size. Expected 148, got 24

I faced this issue when following the guidelines in this repo to evaluate.
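
For context, the error itself is a plain PyTorch shape check and can be reproduced in isolation (a minimal sketch, unrelated to the contest code); it suggests the observation features reaching the LSTM wrapper are not the size the model was built for:

import torch
import torch.nn as nn

# The trace above comes from an LSTM built for 148-dim inputs receiving 24-dim features.
lstm = nn.LSTM(input_size=148, hidden_size=64, batch_first=True)
bad_batch = torch.zeros(2, 5, 24)  # (batch, time, features) with the wrong feature size
try:
    lstm(bad_batch)
except RuntimeError as e:
    print(e)  # input.size(-1) must be equal to input_size. Expected 148, got 24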

Can't find file:params.json

Traceback (most recent call last):
File "/home/zhcao/Melting-Pot-Contest-2023/baselines/evaluation/evaluate.py", line 137, in
results, scenario = run_evaluation(args)
File "/home/zhcao/Melting-Pot-Contest-2023/baselines/evaluation/evaluate.py", line 47, in run_evaluation
f = open(config_file)
FileNotFoundError: [Errno 2] No such file or directory: 'None/params.json'

Where is "params.json" required by "render_models.py"

I am trying to run the baseline. When running the rendering routine I get the following error:
Call:
python baselines/train/render_models.py --config_dir ./
Error:

Traceback (most recent call last):
  File "/home/ildefons/aicrowd/Melting-Pot-Contest-2023/baselines/train/render_models.py", line 94, in <module>
    render_model(args)
  File "/home/ildefons/aicrowd/Melting-Pot-Contest-2023/baselines/train/render_models.py", line 17, in render_model
    f = open(config_file)

I basically cannot find the params.json file. Where is it?
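
For reference, a minimal sketch of what the loading step presumably amounts to (the exact code in evaluate.py / render_models.py may differ): Ray Tune writes a params.json into each trial directory under the results folder, so --config_dir has to point at that trial directory rather than at the repo root:

import json
import os

# Hypothetical trial directory; substitute the one Ray Tune created for your run.
config_dir = "./results/torch/clean_up/PPO_meltingpot_00b18_00000_0_2023-10-19_14-23-22"
config_file = os.path.join(config_dir, "params.json")

if not os.path.isfile(config_file):
    raise FileNotFoundError(f"No params.json in {config_dir}; pass the trial directory via --config_dir")

with open(config_file) as f:
    params = json.load(f)
print(sorted(params.keys()))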

Error when running run_ray_train.py

File "/home/tess/anaconda3/envs/marlEnv/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm_config.py", line 3623, in _check_if_correct_nn_framework_installed
raise ImportError(
ImportError: PyTorch was specified as the framework to use (via config.framework('torch'))! However, no installation was found. You can install PyTorch via pip install torch.

Ubuntu 20.04
Python 3.10.13 (from anaconda3)
Pytorch 2.0.1
Ray 2.6.1
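
One hedged thing to check: whether the interpreter that launches Ray is the same one that has torch installed (a mismatch between conda environments is a common cause of this ImportError). For example:

# Run with the exact interpreter used to launch run_ray_train.py.
import sys

print(sys.executable)  # should point inside the marlEnv conda env

try:
    import torch
    print("torch", torch.__version__, "CUDA available:", torch.cuda.is_available())
except ImportError:
    print("torch is not importable from this interpreter")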

TensorFlow GPU Detection Issue in WSL2

Description:

I have followed the installation guidelines for setting up TensorFlow in a WSL2 environment, but I'm encountering an issue where TensorFlow is unable to detect my GPU. I would like assistance in resolving this issue.

Steps to Reproduce:

The following command python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))" returns:

2023-09-04 22:53:51.646492: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-09-04 22:53:51.723849: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib/wsl/lib::/lib:/home/tyren/miniconda3/lib/
2023-09-04 22:53:51.723880: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2023-09-04 22:53:52.212623: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib/wsl/lib::/lib:/home/tyren/miniconda3/lib/
2023-09-04 22:53:52.212707: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib/wsl/lib::/lib:/home/tyren/miniconda3/lib/
2023-09-04 22:53:52.212726: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2023-09-04 22:53:53.303714: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib/wsl/lib::/lib:/home/tyren/miniconda3/lib/
2023-09-04 22:53:53.303816: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib/wsl/lib::/lib:/home/tyren/miniconda3/lib/
2023-09-04 22:53:53.303870: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib/wsl/lib::/lib:/home/tyren/miniconda3/lib/
2023-09-04 22:53:53.303919: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcufft.so.10'; dlerror: libcufft.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib/wsl/lib::/lib:/home/tyren/miniconda3/lib/
2023-09-04 22:53:53.400425: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib/wsl/lib::/lib:/home/tyren/miniconda3/lib/
2023-09-04 22:53:53.400533: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib/wsl/lib::/lib:/home/tyren/miniconda3/lib/
2023-09-04 22:53:53.400558: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1934] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
[]

Meanwhile, the following torch code reports that a GPU device is available:

import torch

if torch.cuda.is_available():
    num_gpus = torch.cuda.device_count()
    print(f"PyTorch can use {num_gpus} GPU(s).")
    for i in range(num_gpus):
        gpu_properties = torch.cuda.get_device_properties(i)
        print(f"GPU {i}: {gpu_properties.name}, Memory: {gpu_properties.total_memory / (1024**3):.2f}GB")
else:
    print("PyTorch cannot use GPU. Running on CPU.")

# PyTorch can use 1 GPU(s).
# GPU 0: NVIDIA RTX A1000 Laptop GPU, Memory: 4.00GB
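
One additional hedged check (assuming TensorFlow 2.x): print the CUDA/cuDNN versions this TensorFlow build was compiled against, so they can be compared with the libraries the log above fails to load (libcudart.so.11.0, libcudnn.so.8, and so on):

import tensorflow as tf

# Shows which CUDA / cuDNN versions this TF wheel expects to find on LD_LIBRARY_PATH.
build = tf.sysconfig.get_build_info()
print("built with CUDA:", build.get("cuda_version"))
print("built with cuDNN:", build.get("cudnn_version"))
print("visible GPUs:", tf.config.list_physical_devices("GPU"))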

Note:

I have already checked the following:

Any assistance in resolving this issue would be greatly appreciated.

Thank you for your help!

Issue with running training code with torch

Hello,

I have run the setup.py file and ray_patch.sh but still got the following error when running python baselines/train/run_ray_train.py --framework torch.

Traceback (most recent call last):
  File "/ccn2/u/ziyxiang/Melting-Pot-Contest-2023/baselines/train/run_ray_train.py", line 173, in <module>
    ).fit()
  File "/data/ziyxiang/anaconda3/envs/mpc_main/lib/python3.10/site-packages/ray/tune/tuner.py", line 347, in fit
    return self._local_tuner.fit()
  File "/data/ziyxiang/anaconda3/envs/mpc_main/lib/python3.10/site-packages/ray/tune/impl/tuner_internal.py", line 588, in fit
    analysis = self._fit_internal(trainable, param_space)
  File "/data/ziyxiang/anaconda3/envs/mpc_main/lib/python3.10/site-packages/ray/tune/impl/tuner_internal.py", line 703, in _fit_internal
    analysis = run(
  File "/data/ziyxiang/anaconda3/envs/mpc_main/lib/python3.10/site-packages/ray/tune/tune.py", line 1107, in run
    runner.step()
  File "/data/ziyxiang/anaconda3/envs/mpc_main/lib/python3.10/site-packages/ray/tune/execution/tune_controller.py", line 280, in step
    self._maybe_update_trial_queue()
  File "/data/ziyxiang/anaconda3/envs/mpc_main/lib/python3.10/site-packages/ray/tune/execution/tune_controller.py", line 411, in _maybe_update_trial_queue
    if not self._update_trial_queue(blocking=not dont_wait_for_trial):
  File "/data/ziyxiang/anaconda3/envs/mpc_main/lib/python3.10/site-packages/ray/tune/execution/trial_runner.py", line 1112, in _update_trial_queue
    self.add_trial(trial)
  File "/data/ziyxiang/anaconda3/envs/mpc_main/lib/python3.10/site-packages/ray/tune/execution/tune_controller.py", line 383, in add_trial
    super().add_trial(trial)
  File "/data/ziyxiang/anaconda3/envs/mpc_main/lib/python3.10/site-packages/ray/tune/execution/trial_runner.py", line 597, in add_trial
    trial.create_placement_group_factory()
  File "/data/ziyxiang/anaconda3/envs/mpc_main/lib/python3.10/site-packages/ray/tune/experiment/trial.py", line 553, in create_placement_group_factory
    default_resources = trainable_cls.default_resource_request(self.config)
  File "/data/ziyxiang/anaconda3/envs/mpc_main/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 2193, in default_resource_request
    cf.validate()
  File "/data/ziyxiang/anaconda3/envs/mpc_main/lib/python3.10/site-packages/ray/rllib/algorithms/ppo/ppo.py", line 315, in validate
    super().validate()
  File "/data/ziyxiang/anaconda3/envs/mpc_main/lib/python3.10/site-packages/ray/rllib/algorithms/pg/pg.py", line 100, in validate
    super().validate()
  File "/data/ziyxiang/anaconda3/envs/mpc_main/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm_config.py", line 773, in validate
    self._check_if_correct_nn_framework_installed(_tf1, _tf, _torch)
  File "/data/ziyxiang/anaconda3/envs/mpc_main/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm_config.py", line 3623, in _check_if_correct_nn_framework_installed
    raise ImportError(
ImportError: PyTorch was specified as the framework to use (via `config.framework('torch')`)! However, no installation was found. You can install PyTorch via `pip install torch`.

I double-checked that torch is indeed installed; here is the output from nvidia-smi:

-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.44       Driver Version: 495.44       CUDA Version: 11.5     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |

Training ended unexpectedly.

Hi, @rstrivedi
Here are my arguments:
Running trails with the following arguments: Namespace(num_workers=2, num_gpus=0, local=False, no_tune=False, algo='ppo', framework='torch', exp='clean_up', seed=123, results_dir='./results', logging='INFO', wandb=False, downsample=True, as_test=False)

After starting the training, the run automatically ended after around 400 steps (nearly 2 minutes), and it seems that no errors were thrown. Do you have any suggestions for modification?

(PPO pid=299168) 2023-10-19 14:23:40,547 INFO rollout_worker.py:786 -- Training on concatenated sample batches:
(PPO pid=299168)
(PPO pid=299168) { 'count': 32,
(PPO pid=299168) 'policy_batches': { 'agent_3': { 'action_dist_inputs': np.ndarray((32, 9), dtype=float32, min=-0.176, max=0.307, mean=0.013),

...

(PPO pid=299168)
(PPO pid=299168) 2023-10-19 14:23:40,553 INFO rnn_sequencing.py:178 -- Padded input for RNN/Attn.Nets/MA:
....
(RolloutWorker pid=302375) /home/ldp/anaconda3/envs/mpc_main/lib/python3.10/site-packages/gymnasium/spaces/box.py:227: UserWarning: WARN: Casting input x to numpy array.
(RolloutWorker pid=302375) logger.warn("Casting input x to numpy array.")
...
(RolloutWorker pid=302375) 2023-10-19 14:23:33,831 INFO policy.py:1294 -- Policy (worker=2) running on CPU. [repeated 7x across cluster]
(PPO pid=299168) 2023-10-19 14:23:34,325 INFO torch_policy_v2.py:113 -- Found 0 visible cuda devices. [repeated 14x across cluster]
...
(PPO pid=299168) 2023-10-19 14:23:34,339 INFO util.py:118 -- Using connectors: [repeated 14x across cluster]
(PPO pid=299168) 2023-10-19 14:23:34,339 INFO util.py:119 -- AgentConnectorPipeline [repeated 14x across cluster]
(PPO pid=299168) StateBufferConnector [repeated 14x across cluster]
(PPO pid=299168) ViewRequirementAgentConnector [repeated 14x across cluster]
(PPO pid=299168) 2023-10-19 14:23:34,339 INFO util.py:120 -- ActionConnectorPipeline [repeated 14x across cluster]
(PPO pid=299168) ConvertToNumpyConnector [repeated 14x across cluster]
(PPO pid=299168) NormalizeActionsConnector [repeated 14x across cluster]
(PPO pid=299168) ImmutableActionsConnector [repeated 14x across cluster]
(RolloutWorker pid=302374) 2023-10-19 14:23:40,526 INFO rollout_worker.py:732 -- Completed sample batch:
(RolloutWorker pid=302374) 'agent_1': { 'action_dist_inputs': np.ndarray((200, 9), dtype=float32, min=-0.0, max=0.0, mean=-0.0),
(RolloutWorker pid=302374) 'agent_2': { 'action_dist_inputs': np.ndarray((200, 9), dtype=float32, min=-0.43, max=0.916, mean=0.128),
(RolloutWorker pid=302374) 'agent_3': { 'action_dist_inputs': np.ndarray((200, 9), dtype=float32, min=-0.314, max=0.355, mean=0.004),
(RolloutWorker pid=302374) 'agent_4': { 'action_dist_inputs': np.ndarray((200, 9), dtype=float32, min=-0.433, max=0.446, mean=-0.019),
(RolloutWorker pid=302374) 'agent_5': { 'action_dist_inputs': np.ndarray((200, 9), dtype=float32, min=-0.573, max=1.049, mean=0.05),
(RolloutWorker pid=302374) 'agent_6': { 'action_dist_inputs': np.ndarray((200, 9), dtype=float32, min=-0.464, max=0.5, mean=-0.004).

Result(
metrics={'custom_metrics': {}, 'episode_media': {}, 'info': {'learner': {'agent_3': {'learner_stats': {'allreduce_latency': 0.0, 'grad_gnorm': 0.2938595721563488, 'cur_kl_coeff': 0.19999999999999998, 'cur_lr': 5.000000000000001e-05, 'total_loss': 0.01326687481046783, 'policy_loss': 0.011805192082956956, 'vf_loss': 0.0014429467907768848, 'vf_explained_var': -1.0, 'kl': 9.368463734633353e-05, 'entropy': 2.1968671936737865, 'entropy_coeff': 0.0}, 'model': {}, 'custom_metrics': {}, 'num_agent_steps_trained': 32.0, 'num_grad_updates_lifetime': 285.5, 'diff_num_grad_updates_vs_sampler_policy': 284.5}, 'agent_6': {'learner_stats': {'allreduce_latency': 0.0, 'grad_gnorm': 0.44844337551151975, 'cur_kl_coeff': 0.19999999999999998, 'cur_lr': 5.000000000000001e-05, 'total_loss': 0.063614134408795, 'policy_loss': 0.062265382422820516, 'vf_loss': 0.0006033147813166013, 'vf_explained_var': -1.0, 'kl': 0.0037272102948426424, 'entropy': 2.197162473829169, 'entropy_coeff': 0.0}, 'model': {}, 'custom_metrics': {}, 'num_agent_steps_trained': 32.0, 'num_grad_updates_lifetime': 285.5, 'diff_num_grad_updates_vs_sampler_policy': 284.5}, 'agent_2': {'learner_stats': {'allreduce_latency': 0.0, 'grad_gnorm': 0.33651486742391923, 'cur_kl_coeff': 0.19999999999999998, 'cur_lr': 5.000000000000001e-05, 'total_loss': 0.020297989991836643, 'policy_loss': 0.01227701953693963, 'vf_loss': 0.004015607813974049, 'vf_explained_var': -0.9549619204119633, 'kl': 0.020026806541649823, 'entropy': 2.196402873072708, 'entropy_coeff': 0.0}, 'model': {}, 'custom_metrics': {}, 'num_agent_steps_trained': 32.0, 'num_grad_updates_lifetime': 285.5, 'diff_num_grad_updates_vs_sampler_policy': 284.5}, 'agent_0': {'learner_stats': {'allreduce_latency': 0.0, 'grad_gnorm': 0.08610846985768723, 'cur_kl_coeff': 0.19999999999999998, 'cur_lr': 5.000000000000001e-05, 'total_loss': 0.00414218608486025, 'policy_loss': 0.002945413388181151, 'vf_loss': 0.0011898500088354863, 'vf_explained_var': -1.0, 'kl': 3.462271995048046e-05, 'entropy': 2.1969731778429265, 'entropy_coeff': 0.0}, 'model': {}, 'custom_metrics': {}, 'num_agent_steps_trained': 32.0, 'num_grad_updates_lifetime': 285.5, 'diff_num_grad_updates_vs_sampler_policy': 284.5}, 'agent_4': {'learner_stats': {'allreduce_latency': 0.0, 'grad_gnorm': 0.7889667204074692, 'cur_kl_coeff': 0.19999999999999998, 'cur_lr': 5.000000000000001e-05, 'total_loss': 0.04514940154264893, 'policy_loss': -0.005434698990562506, 'vf_loss': 0.04920632253490846, 'vf_explained_var': -1.0, 'kl': 0.006888877354785194, 'entropy': 2.195473243897421, 'entropy_coeff': 0.0}, 'model': {}, 'custom_metrics': {}, 'num_agent_steps_trained': 32.0, 'num_grad_updates_lifetime': 285.5, 'diff_num_grad_updates_vs_sampler_policy': 284.5}, 'agent_1': {'learner_stats': {'allreduce_latency': 0.0, 'grad_gnorm': 0.20199027515032836, 'cur_kl_coeff': 0.19999999999999998, 'cur_lr': 5.000000000000001e-05, 'total_loss': 0.021706062150106096, 'policy_loss': 0.01644499337202624, 'vf_loss': 0.005171904300403789, 'vf_explained_var': -1.0, 'kl': 0.00044581278011054644, 'entropy': 2.195561667074237, 'entropy_coeff': 0.0}, 'model': {}, 'custom_metrics': {}, 'num_agent_steps_trained': 32.0, 'num_grad_updates_lifetime': 285.5, 'diff_num_grad_updates_vs_sampler_policy': 284.5}, 'agent_5': {'learner_stats': {'allreduce_latency': 0.0, 'grad_gnorm': 2.785927937036021, 'cur_kl_coeff': 0.19999999999999998, 'cur_lr': 5.000000000000001e-05, 'total_loss': 0.356755890442761, 'policy_loss': 0.024646596205339096, 'vf_loss': 0.3239888458541317, 'vf_explained_var': 
-0.735680664945067, 'kl': 0.040602242583161224, 'entropy': 2.195885472130357, 'entropy_coeff': 0.0}, 'model': {}, 'custom_metrics': {}, 'num_agent_steps_trained': 32.0, 'num_grad_updates_lifetime': 285.5, 'diff_num_grad_updates_vs_sampler_policy': 284.5}}, 'num_env_steps_sampled': 400, 'num_env_steps_trained': 400, 'num_agent_steps_sampled': 2800, 'num_agent_steps_trained': 2800}, 'sampler_results': {'episode_reward_max': nan, 'episode_reward_min': nan, 'episode_reward_mean': nan, 'episode_len_mean': nan, 'episode_media': {}, 'episodes_this_iter': 0, 'policy_reward_min': {}, 'policy_reward_max': {}, 'policy_reward_mean': {}, 'custom_metrics': {}, 'hist_stats': {'episode_reward': [], 'episode_lengths': []}, 'sampler_perf': {}, 'num_faulty_episodes': 0, 'connector_metrics': {}}, 'episode_reward_max': nan, 'episode_reward_min': nan, 'episode_reward_mean': nan, 'episode_len_mean': nan, 'episodes_this_iter': 0, 'policy_reward_min': {}, 'policy_reward_max': {}, 'policy_reward_mean': {}, 'hist_stats': {'episode_reward': [], 'episode_lengths': []}, 'sampler_perf': {}, 'num_faulty_episodes': 0, 'connector_metrics': {}, 'num_healthy_workers': 2, 'num_in_flight_async_reqs': 0, 'num_remote_worker_restarts': 0, 'num_agent_steps_sampled': 2800, 'num_agent_steps_trained': 2800, 'num_env_steps_sampled': 400, 'num_env_steps_trained': 400, 'num_env_steps_sampled_this_iter': 400, 'num_env_steps_trained_this_iter': 400, 'num_env_steps_sampled_throughput_per_sec': 7.399711776182035, 'num_env_steps_trained_throughput_per_sec': 7.399711776182035, 'num_steps_trained_this_iter': 400, 'agent_timesteps_total': 2800, 'timers': {'training_iteration_time_ms': 54056.106, 'sample_time_ms': 6118.289, 'learn_time_ms': 47902.557, 'learn_throughput': 8.35, 'synch_weights_time_ms': 32.938}, 'counters': {'num_env_steps_sampled': 400, 'num_env_steps_trained': 400, 'num_agent_steps_sampled': 2800, 'num_agent_steps_trained': 2800}, 'done': True, 'trial_id': '00b18_00000', 'perf': {'cpu_util_percent': 1.6532467532467536, 'ram_util_percent': 8.0}, 'experiment_tag': '0'},
path='/home/ldp/competitions/meltingpot/Melting-Pot-Contest-2023/results/torch/clean_up/PPO_meltingpot_00b18_00000_0_2023-10-19_14-23-22',
checkpoint=Checkpoint(local_path=/home/ldp/competitions/meltingpot/Melting-Pot-Contest-2023/results/torch/clean_up/PPO_meltingpot_00b18_00000_0_2023-10-19_14-23-22/checkpoint_000001)
)
(RolloutWorker pid=302374) [repeated 30x across cluster]
(RolloutWorker pid=302374) { 'count': 200,
(RolloutWorker pid=302374) 'policy_batches': { 'agent_0': { 'action_dist_inputs': np.ndarray((200, 9), dtype=float32, min=-0.0, max=0.001, mean=0.0),
(RolloutWorker pid=302374) 'action_logp': np.ndarray((200,), dtype=float32, min=-2.615, max=-1.737, mean=-2.185), [repeated 7x across cluster]
(RolloutWorker pid=302374) 'actions': np.ndarray((200,), dtype=int64, min=0.0, max=8.0, mean=3.71), [repeated 7x across cluster]
...
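
A hedged reading of the output: the Result above ends with 'done': True, which suggests the run simply hit its Tune stop condition after 400 sampled steps rather than crashing. As an illustration only (not the contest script itself), a stop criterion is passed to the Tuner roughly like this, and the trial ends cleanly once it is met:

from ray import air, tune

# Illustrative only: a Tuner whose stop condition is a small timesteps_total
# will report done=True and exit after ~400 sampled env steps with no error.
run_config = air.RunConfig(stop={"timesteps_total": 400})  # hypothetical threshold
tuner = tune.Tuner("PPO", run_config=run_config)

If that is what is happening, raising the corresponding stop threshold in the training configuration should let the run continue.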

Is there a faster way to see scenario evaluation than generating a video?

Hello!
In my case, generating the video with

python baselines/evaluation/evaluate.py --num_episodes 1 --eval_on_scenario 1 --scenario allelopathic_harvest__open_0 REST_OF_ARGUMENTS

takes ~10 minutes.

Previously the vp90 codec was used, which compressed really well but took 3+ seconds to encode each frame, so one episode took ~40 minutes to generate.
I have swapped vp90 for the mp4v codec, and now encoding takes only 0.3 seconds per frame, but the environment still takes ~1 second per step, so it takes ~20 minutes to finish the game and generate the final video.

Is there an already available way to record the environment faster? Maybe a better codec, or a way to make stepping faster? I see that the GPU is barely utilized. Another idea would be to record each agent's action/game state at every step, save it to a log, and then replay it in the game/environment runner (like replays in video games such as the Lux-AI competition on Kaggle, Quake, Dota 2, Counter-Strike, etc.), as in the sketch below.
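
A rough sketch of that idea under stated assumptions (the environment follows the dm_env reset/step API, is built with the same seed for recording and replay, and exposes a global 'WORLD.RGB' observation; none of these names come from the contest code):

import json

def record_actions(env, choose_actions, log_path="episode_actions.json"):
    """Run one episode without rendering, saving only the joint action taken at each step."""
    timestep = env.reset()
    log = []
    while not timestep.last():
        actions = choose_actions(timestep)  # whatever policy lookup the runner uses
        log.append([int(a) for a in actions])
        timestep = env.step(actions)
    with open(log_path, "w") as f:
        json.dump(log, f)

def replay_and_render(env, log_path="episode_actions.json"):
    """Re-run the logged actions later, rendering frames only during replay."""
    with open(log_path) as f:
        log = json.load(f)
    env.reset()
    frames = []
    for actions in log:
        timestep = env.step(actions)
        frames.append(timestep.observation[0]["WORLD.RGB"])  # assumed global-view key
    return frames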

Contributing towards additional baselines/tutorials resources

Hi, I was wondering if there is any interest in help with additional tutorials/baselines? I posted on the Melting Pot repo itself (google-deepmind/meltingpot#113) but since users will likely refer here, I figure maybe it’d be useful to contribute towards this.

I am the project manager/maintainer of PettingZoo, and created a conversion wrapper adapting MeltingPot environments to work with PettingZoo using the Shimmy library (https://shimmy.farama.org/environments/meltingpot/). I have also been updating PettingZoo's internal tutorials to give users a better starting point, with examples using Stable-Baselines3, CleanRL, RLlib, Tianshou, LangChain (proof of concept), and AgileRL. I would be happy to create some simple scripts demonstrating how to use the conversion wrappers and then basic examples with different libraries, if that is something you are interested in.

I would also be happy to help out with the baselines aspect of things, for example benchmarking the implementation of MADDPG from RLlib vs AgileRL, though I imagine the tutorials would be a more practical/simpler way to contribute.
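
For anyone curious in the meantime, usage of that conversion wrapper looks roughly like this (a sketch from memory; the exact class name, reset return values, and step tuple depend on the installed Shimmy/PettingZoo versions, so check the Shimmy docs):

from shimmy import MeltingPotCompatibilityV0  # requires shimmy[meltingpot]

# Wrap a Melting Pot substrate as a PettingZoo parallel environment and run
# one episode with random actions.
env = MeltingPotCompatibilityV0(substrate_name="clean_up", render_mode=None)
env.reset()
while env.agents:
    actions = {agent: env.action_space(agent).sample() for agent in env.agents}
    observations, rewards, terminations, truncations, infos = env.step(actions)
env.close()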

M1 MacOS support

Hello,

The tensorflow package cannot be installed on M1 Macs, so I propose modifying this line in the setup.py file from:

'tensorflow==2.11.1'

to:

 'tensorflow==2.11.1' if sys.platform != 'darwin' or platform.processor() != 'arm' else 'tensorflow-macos==2.11.0',
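
For completeness, a sketch of how the conditional could sit in setup.py together with the imports it needs (assuming the requirement lives in a plain install_requires list; the package name and other entries here are placeholders):

import platform
import sys

from setuptools import setup

setup(
    name="melting-pot-contest-baselines",  # placeholder name
    install_requires=[
        # Plain tensorflow wheels have no arm64 macOS build for 2.11, so fall
        # back to tensorflow-macos on Apple Silicon.
        'tensorflow==2.11.1'
        if sys.platform != 'darwin' or platform.processor() != 'arm'
        else 'tensorflow-macos==2.11.0',
        # ... remaining requirements unchanged ...
    ],
)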

resume training based on restored checkpoint

I was trying to figure out how to resume training based on a restored checkpoint with run_ray_train.py. Specifically:

from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.algorithms.algorithm import Algorithm
from ray.tune import registry
from baselines.train import make_envs


ray.init(local_mode=True, ignore_reinit_error=True)
registry.register_env("meltingpot", make_envs.env_creator)

## train mode, two failed attempts
my_ppo_config = PPOConfig().environment("meltingpot")
my_ppo = my_ppo_config.build()

# method1: fail at .build stage
PPOConfig().environment("meltingpot").build().restore(checkpoint_dir)

# method2: failed at .train stage
Algorithm.from_checkpoint(checkpoint_dir).train()

I get a KeyError; details are shown below:

ray::RolloutWorker.__init__() (pid=180001, ip=10.0.0.182, actor_id=17cb813ab79e0c981feebd6e01000000, repr=<ray.rllib.evaluation.rollout_worker._modify_class.<locals>.Class object at 0x7f6b2de58850>)
  File "anaconda3/envs/mpc_main/lib/python3.10/site-packages/ray/rllib/evaluation/rollout_worker.py", line 397, in __init__
    self.env = env_creator(copy.deepcopy(self.env_context))
  File "/home/researchyw20/meltingpot/code/Melting-Pot-Contest-2023/baselines/train/make_envs.py", line 10, in env_creator
    env = substrate.build(env_config['substrate'], roles=env_config['roles'])
  File "anaconda3/envs/mpc_main/lib/python3.10/site-packages/ml_collections/config_dict/config_dict.py", line 909, in __getitem__
    raise KeyError(self._generate_did_you_mean_message(key, str(e)))
KeyError: "'substrate'"

Any help on this is appreciated.
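
A hedged guess at the cause, based on the KeyError: when the config is rebuilt from scratch (method 1), env_config ends up without the 'substrate' and 'roles' entries that make_envs.env_creator reads. Passing them explicitly, along the lines of the sketch below, may get past this error (values are placeholders, and the model/policy config still has to match the checkpoint for restore to succeed):

from ray.rllib.algorithms.ppo import PPOConfig

# Placeholder substrate/roles; take the real values from the experiment config
# that was used for the original training run.
my_ppo_config = PPOConfig().environment(
    "meltingpot",
    env_config={
        "substrate": "clean_up",
        "roles": ["default"] * 7,
    },
)
my_ppo = my_ppo_config.build()
my_ppo.restore(checkpoint_dir)  # checkpoint_dir as in the snippet above
# my_ppo.train()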

NaN episode rewards during baseline training

Problem:
I am encountering an issue while running the MeltingPot baseline Ray training model. The episode rewards I am getting are consistently NaN (Not-a-Number).


Steps to Reproduce:
python baselines/train/run_ray_train.py --num_gpus 1 --wandb True
The training args are set as:

        # training
        "seed": args.seed,
        "rollout_fragment_length": 5, # Divide episodes into fragments of this many steps each during rollouts. 
        "train_batch_size": 40, # Batch size (batch * rollout_fragment_length) Trajectories of this size are collected from rollout workers and combined into a larger batch of train_batch_size for learning. 
        "sgd_minibatch_size": 32, #  PPO further divides the train batch into minibatches for multi-epoch SGD
        "disable_observation_precprocessing": True,
        "use_new_rl_modules": False,
        "use_new_learner_api": False,
        "framework": args.framework,  # torch or tensorflow 

        # agent model
        "fcnet_hidden": (4, 4), # fully connected network
        "post_fcnet_hidden": (16,), # Layer sizes after the fully connected torso.
        "cnn_activation": "relu",
        "fcnet_activation": "relu",
        "post_fcnet_activation": "relu",
        # == LSTM ==
        "use_lstm": True,
        "lstm_use_prev_action": True,
        "lstm_use_prev_reward": False,
        "lstm_cell_size": 2,  # A cell, is an LSTM unit 
        "shared_policy": False,

Please let me know if there's any additional information or logs needed to diagnose this issue. Thank you for your assistance in resolving this problem.
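
One hedged note on why the metric can be NaN even though training runs: RLlib only fills in episode_reward_mean once at least one episode has fully completed, and with train_batch_size = 40 env steps per iteration the first episode of a long substrate takes many iterations to finish. A back-of-the-envelope check (episode length is an assumed number, not taken from the config above):

# Assumed numbers: adjust episode_length to the actual substrate horizon.
episode_length = 1000
steps_per_iteration = 40  # train_batch_size from the posted config
print(episode_length / steps_per_iteration)  # ~25 iterations before episode_reward_mean can be non-NaN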

Error in Clean Up Human Player Mode

Running an episode with 7 players: ['1', '2', '3', '4', '5', '6', '7'].
Traceback (most recent call last):
File "/home/tess/Desktop/MARL/contest/meltingpot/human_players/play_clean_up.py", line 91, in
main()
File "/home/tess/Desktop/MARL/contest/meltingpot/human_players/play_clean_up.py", line 83, in main
level_playing_utils.run_episode(
File "/home/tess/Desktop/MARL/contest/meltingpot/human_players/level_playing_utils.py", line 344, in run_episode
verbose_fn(timestep, i, player_index)
File "/home/tess/Desktop/MARL/contest/meltingpot/human_players/play_clean_up.py", line 44, in verbose_fn
cleaned = env_timestep.observation[f'{lua_index}.PLAYER_CLEANED']
KeyError: '1.PLAYER_CLEANED'

It only appears when verbose=True.
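
A hedged workaround sketch (the signature is approximated from the traceback, not copied from play_clean_up.py): guard the lookup so verbose mode degrades gracefully when the per-player debug observation is not exposed:

def verbose_fn(env_timestep, player_index, current_player_index):
    lua_index = player_index + 1
    # .get avoids the KeyError when '<lua_index>.PLAYER_CLEANED' is missing.
    cleaned = env_timestep.observation.get(f'{lua_index}.PLAYER_CLEANED')
    if cleaned is None:
        return
    print(f'Player {lua_index} cleaned: {cleaned}')  # placeholder for the original verbose output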

Evaluation doesn't show results for al_harvest and clean_up substrates, and unable to get evaluation working when use_attention is True

We are starting to test attention mechanisms, and we get no evaluation result with either the evaluation.py file or the local_evaluation.py file. We start from a configs.py file that works without attention and set the attention parameters: we deactivate use_lstm and set use_attention. The evaluation.py results can be seen here: https://drive.google.com/drive/folders/1s2OIwmb_bFjUoJ9O_OnI2BG-GL_fk_gU?usp=sharing. Also attached are samples of the result when submitting as well as when doing local evaluation.
error_submission_oct_18.txt
output (1).txt

Shape error

Hello, I am encountering a shape error when running the training script CUDA_VISIBLE_DEVICES=0 python baselines/train/run_ray_train.py --framework torch --exp al_harvest. Any help is greatly appreciated.

2023-09-04 21:31:18,216 ERROR tune.py:1144 -- Trials did not complete: [PPO_meltingpot_55a86_00000]
(PPO pid=6862) File "/home/paperspace/miniconda3/envs/mpc_main/lib/python3.10/site-packages/ray/rllib/models/torch/recurrent_net.py", line 274, in forward_rnn
(PPO pid=6862) self._features, [h, c] = self.lstm(
(PPO pid=6862) File "/home/paperspace/miniconda3/envs/mpc_main/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
(PPO pid=6862) return forward_call(*args, **kwargs)
(PPO pid=6862) File "/home/paperspace/miniconda3/envs/mpc_main/lib/python3.10/site-packages/torch/nn/modules/rnn.py", line 810, in forward
(PPO pid=6862) self.check_forward_args(input, hx, batch_sizes)
(PPO pid=6862) File "/home/paperspace/miniconda3/envs/mpc_main/lib/python3.10/site-packages/torch/nn/modules/rnn.py", line 730, in check_forward_args
(PPO pid=6862) self.check_input(input, batch_sizes)
(PPO pid=6862) File "/home/paperspace/miniconda3/envs/mpc_main/lib/python3.10/site-packages/torch/nn/modules/rnn.py", line 218, in check_input
(PPO pid=6862) raise RuntimeError(
(PPO pid=6862) RuntimeError: input.size(-1) must be equal to input_size. Expected 147, got 27
(PPO pid=6862)
(PPO pid=6862) During handling of the above exception, another exception occurred:
(PPO pid=6862)
(PPO pid=6862) ray::PPO.__init__() (pid=6862, ip=10.64.34.33, actor_id=afbc4db286cfab682041540a01000000, repr=PPO)
(PPO pid=6862) File "/home/paperspace/miniconda3/envs/mpc_main/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 517, in __init__
(PPO pid=6862) super().__init__(
(PPO pid=6862) File "/home/paperspace/miniconda3/envs/mpc_main/lib/python3.10/site-packages/ray/tune/trainable/trainable.py", line 169, in __init__
(PPO pid=6862) self.setup(copy.deepcopy(self.config))
(PPO pid=6862) File "/home/paperspace/miniconda3/envs/mpc_main/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 639, in setup
(PPO pid=6862) self.workers = WorkerSet(
(PPO pid=6862) File "/home/paperspace/miniconda3/envs/mpc_main/lib/python3.10/site-packages/ray/rllib/evaluation/worker_set.py", line 179, in __init__
(PPO pid=6862) raise e.args[0].args[2]
(PPO pid=6862) RuntimeError: input.size(-1) must be equal to input_size. Expected 147, got 27

assert i == len(f), f AssertionError: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]

I have manually installed Melting Pot and have also changed the RLlib torch model file to apply the LSTM wrapper fix. I am getting the assertion error below when running training with these arguments.
Running trails with the following arguments: Namespace(num_workers=8, num_gpus=0, local=False, no_tune=False, algo='ppo', framework='torch', exp='clean_up', seed=123, results_dir='./results', logging='INFO', wandb=False, downsample=True, as_test=False)
Failure # 1 (occurred at 2023-09-02_22-43-34)
ray::PPO.train() (pid=86686, ip=######, actor_id=b2a47515069c17b28793673b01000000, repr=PPO)
File "/home/saidinesh/Desktop/Projects/Melting-Pot-Contest-2023/rllib-env/lib/python3.10/site-packages/ray/tune/trainable/trainable.py", line 375, in train
raise skipped from exception_cause(skipped)
File "/home/saidinesh/Desktop/Projects/Melting-Pot-Contest-2023/rllib-env/lib/python3.10/site-packages/ray/tune/trainable/trainable.py", line 372, in train
result = self.step()
File "/home/saidinesh/Desktop/Projects/Melting-Pot-Contest-2023/rllib-env/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 851, in step
results, train_iter_ctx = self._run_one_training_iteration()
File "/home/saidinesh/Desktop/Projects/Melting-Pot-Contest-2023/rllib-env/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 2835, in _run_one_training_iteration
results = self.training_step()
File "/home/saidinesh/Desktop/Projects/Melting-Pot-Contest-2023/rllib-env/lib/python3.10/site-packages/ray/rllib/algorithms/ppo/ppo.py", line 455, in training_step
train_results = train_one_step(self, train_batch)
File "/home/saidinesh/Desktop/Projects/Melting-Pot-Contest-2023/rllib-env/lib/python3.10/site-packages/ray/rllib/execution/train_ops.py", line 56, in train_one_step
info = do_minibatch_sgd(
File "/home/saidinesh/Desktop/Projects/Melting-Pot-Contest-2023/rllib-env/lib/python3.10/site-packages/ray/rllib/utils/sgd.py", line 129, in do_minibatch_sgd
local_worker.learn_on_batch(
File "/home/saidinesh/Desktop/Projects/Melting-Pot-Contest-2023/rllib-env/lib/python3.10/site-packages/ray/rllib/evaluation/rollout_worker.py", line 810, in learn_on_batch
info_out[pid] = policy.learn_on_batch(batch)
File "/home/saidinesh/Desktop/Projects/Melting-Pot-Contest-2023/rllib-env/lib/python3.10/site-packages/ray/rllib/utils/threading.py", line 24, in wrapper
return func(self, *a, **k)
File "/home/saidinesh/Desktop/Projects/Melting-Pot-Contest-2023/rllib-env/lib/python3.10/site-packages/ray/rllib/policy/torch_policy_v2.py", line 729, in learn_on_batch
grads, fetches = self.compute_gradients(postprocessed_batch)
File "/home/saidinesh/Desktop/Projects/Melting-Pot-Contest-2023/rllib-env/lib/python3.10/site-packages/ray/rllib/utils/threading.py", line 24, in wrapper
return func(self, *a, **k)
File "/home/saidinesh/Desktop/Projects/Melting-Pot-Contest-2023/rllib-env/lib/python3.10/site-packages/ray/rllib/policy/torch_policy_v2.py", line 929, in compute_gradients
pad_batch_to_sequences_of_same_size(
File "/home/saidinesh/Desktop/Projects/Melting-Pot-Contest-2023/rllib-env/lib/python3.10/site-packages/ray/rllib/policy/rnn_sequencing.py", line 155, in pad_batch_to_sequences_of_same_size
feature_sequences, initial_states, seq_lens = chop_into_sequences(
File "/home/saidinesh/Desktop/Projects/Melting-Pot-Contest-2023/rllib-env/lib/python3.10/site-packages/ray/rllib/policy/rnn_sequencing.py", line 387, in chop_into_sequences
assert i == len(f), f
AssertionError: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]

How to install dmlab2d on CentOS

I tried to set up the MeltingPot environment on my lab's server but failed when I ran
SYSTEM_VERSION_COMPAT=0 pip install dmlab2d
which raised the error:
ERROR: Could not find a version that satisfies the requirement dmlab2d (from versions: none) ERROR: No matching distribution found for dmlab2d
The system on the server is CentOS Linux release 7.9.2009. I also tried on my Mac (M1 chip) and the installation succeeded, so I suppose the failure is caused by the system version. I would prefer to run on the server for better GPU resources; is there some way to install dmlab2d on CentOS?
