
pymarl2's Introduction

- If you want high sample efficiency, please use qmix_high_sample_efficiency.yaml,
- which trains with 4 rollout processes: slower in wall-clock time, but more sample efficient.
- Performance is *not* comparable between models trained with different numbers of rollout processes.

PyMARL2

Open-source code for Rethinking the Implementation Tricks and Monotonicity Constraint in Cooperative Multi-Agent Reinforcement Learning.

This repository is fine-tuned for the StarCraft Multi-Agent Challenge (SMAC). For other multi-agent tasks, we also recommend an optimized implementation of QMIX: https://github.com/marlbenchmark/off-policy.

StarCraft II version: SC2.4.10. Difficulty: 7.

2022.10.10 update: added qmix_high_sample_efficiency.yaml, which uses 4 processes for training: slower, but with higher sample efficiency.

2021.10.28 update: added the Google Research Football environment (vdn_gfootball.yaml), using `simple115` features.

2021.10.4 update: added QMIX with attention (qmix_att.yaml) as a baseline for communication tasks.

Finetuned-QMIX

There are many code-level tricks in multi-agent reinforcement learning (MARL), such as:

  • Value function clipping (clip max Q values for QMIX)
  • Value Normalization
  • Reward scaling
  • Orthogonal initialization and layer scaling
  • Adam
  • Neural networks hidden size
  • learning rate annealing
  • Reward Clipping
  • Observation Normalization
  • Gradient Clipping
  • Large Batch Size
  • N-step returns (including GAE($\lambda$), Q($\lambda$), ...; see the sketch after this list)
  • Rollout Process Number
  • $\epsilon$-greedy annealing steps
  • Death Agent Masking
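
For illustration, below is a minimal, self-contained sketch of backward-recursive TD($\lambda$) targets (the N-step return trick) in the style of PyMARL-like codebases; the function name, tensor shapes, and toy values are assumptions for illustration, not this repository's exact implementation.

import torch

def build_td_lambda_targets(rewards, terminated, mask, target_qs, gamma, td_lambda):
    # Backward-recursive TD(lambda) targets over a batch of episodes.
    # Assumed shapes: rewards/terminated/mask are [batch, T, 1], target_qs is [batch, T+1, 1].
    ret = target_qs.new_zeros(*target_qs.shape)
    # bootstrap from the final target Q unless the episode has already terminated
    ret[:, -1] = target_qs[:, -1] * (1 - terminated.sum(dim=1))
    for t in range(ret.shape[1] - 2, -1, -1):
        ret[:, t] = td_lambda * gamma * ret[:, t + 1] + mask[:, t] * (
            rewards[:, t]
            + (1 - td_lambda) * gamma * target_qs[:, t + 1] * (1 - terminated[:, t])
        )
    # lambda-returns for t = 0 .. T-1
    return ret[:, :-1]

# toy usage: a batch of 2 episodes with 3 timesteps each
B, T = 2, 3
rewards = torch.ones(B, T, 1)
terminated = torch.zeros(B, T, 1)
mask = torch.ones(B, T, 1)
target_qs = torch.zeros(B, T + 1, 1)
targets = build_td_lambda_targets(rewards, terminated, mask, target_qs, gamma=0.99, td_lambda=0.6)
print(targets.shape)  # torch.Size([2, 3, 1])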

Related Works

  • Implementation Matters in Deep RL: A Case Study on PPO and TRPO
  • What Matters In On-Policy Reinforcement Learning? A Large-Scale Empirical Study
  • The Surprising Effectiveness of MAPPO in Cooperative, Multi-Agent Games

Using a few of the tricks above, we enabled QMIX (qmix.yaml) to solve almost all hard scenarios of SMAC (with fine-tuned hyperparameters for each scenario).

| Scenarios | Difficulty | QMIX (batch_size=128) | Finetuned-QMIX |
|---|---|---|---|
| 8m | Easy | - | 100% |
| 2c_vs_1sc | Easy | - | 100% |
| 2s3z | Easy | - | 100% |
| 1c3s5z | Easy | - | 100% |
| 3s5z | Easy | - | 100% |
| 8m_vs_9m | Hard | 84% | 100% |
| 5m_vs_6m | Hard | 84% | 90% |
| 3s_vs_5z | Hard | 96% | 100% |
| bane_vs_bane | Hard | 100% | 100% |
| 2c_vs_64zg | Hard | 100% | 100% |
| corridor | Super Hard | 0% | 100% |
| MMM2 | Super Hard | 98% | 100% |
| 3s5z_vs_3s6z | Super Hard | 3% | 93% (hidden_size=256, qmix_large.yaml) |
| 27m_vs_30m | Super Hard | 56% | 100% |
| 6h_vs_8z | Super Hard | 0% | 93% ($\lambda$=0.3, epsilon_anneal_time=500000) |

Re-Evaluation

Afterwards, we re-evaluated numerous QMIX variants with the tricks normalized (a general set of hyperparameters), and found that QMIX achieves state-of-the-art (SOTA) performance.

| Scenarios | Difficulty | QMIX | VDNs | Qatten | QPLEX | WQMIX | LICA | VMIX | DOP | RIIT |
|---|---|---|---|---|---|---|---|---|---|---|
| 2c_vs_64zg | Hard | 100% | 100% | 100% | 100% | 100% | 100% | 98% | 84% | 100% |
| 8m_vs_9m | Hard | 100% | 100% | 100% | 95% | 95% | 48% | 75% | 96% | 95% |
| 3s_vs_5z | Hard | 100% | 100% | 100% | 100% | 100% | 96% | 96% | 100% | 96% |
| 5m_vs_6m | Hard | 90% | 90% | 90% | 90% | 90% | 53% | 9% | 63% | 67% |
| 3s5z_vs_3s6z | S-Hard | 75% | 43% | 62% | 68% | 56% | 0% | 56% | 0% | 75% |
| corridor | S-Hard | 100% | 98% | 100% | 96% | 96% | 0% | 0% | 0% | 100% |
| 6h_vs_8z | S-Hard | 84% | 87% | 82% | 78% | 75% | 4% | 80% | 0% | 19% |
| MMM2 | S-Hard | 100% | 96% | 100% | 100% | 96% | 0% | 70% | 3% | 100% |
| 27m_vs_30m | S-Hard | 100% | 100% | 100% | 100% | 100% | 9% | 93% | 0% | 93% |
| Discrete PP | - | 40 | 39 | - | 39 | 39 | 30 | 39 | 38 | 38 |
| Avg. Score | Hard+ | 94.9% | 91.2% | 92.7% | 92.5% | 90.5% | 29.2% | 67.4% | 44.1% | 84.0% |

(QMIX, VDNs, Qatten, QPLEX, and WQMIX are value-based; LICA, VMIX, DOP, and RIIT are policy-based.)

Communication

We also tested our QMIX-with-attention (qmix_att.yaml, $\lambda=0.3$, attention_heads=4) on some maps (from NDQ) that require communication; a small illustrative attention sketch follows the results table.

| Scenarios (2M steps) | Difficulty | Finetuned-QMIX (no communication) | QMIX-with-attention (communication) |
|---|---|---|---|
| 1o_10b_vs_1r | - | 56% | 87% |
| 1o_2r_vs_4r | - | 50% | 95% |
| bane_vs_hM | - | 0% | 0% |
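
As an illustration only, here is a minimal PyTorch sketch of multi-head attention over per-agent hidden states, in the spirit of an attention-augmented QMIX agent; the shapes, dimensions, and use of torch.nn.MultiheadAttention are assumptions, not the actual code behind qmix_att.yaml.

import torch
import torch.nn as nn

n_agents, hidden_dim, n_heads = 10, 64, 4  # illustrative sizes (attention_heads=4 as in qmix_att.yaml)
attn = nn.MultiheadAttention(embed_dim=hidden_dim, num_heads=n_heads, batch_first=True)

# per-agent RNN hidden states for one timestep: [batch, n_agents, hidden_dim]
h = torch.randn(32, n_agents, hidden_dim)

# each agent attends over all agents' hidden states, acting as a learned communication channel
h_comm, attn_weights = attn(h, h, h)
print(h_comm.shape, attn_weights.shape)  # torch.Size([32, 10, 64]) torch.Size([32, 10, 10])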

Google Football

We also tested VDN (vdn_gfootball.yaml) on some maps from Google Research Football; an environment-creation sketch follows the results table. Specifically, we use `simple115` features to train the model (the original Google Research Football paper uses more complex CNN features). We did not test QMIX because this environment does not provide global state information.

| Scenarios | Difficulty | VDN ($\lambda=1.0$) |
|---|---|---|
| academy_counterattack_hard | - | 0.71 (test score) |
| academy_counterattack_easy | - | 0.87 (test score) |
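
For reference, a standalone sketch of creating a Google Research Football environment with simple115-style vector features is shown below; the exact argument values (e.g. the simple115v2 representation name and the number of controlled players) are assumptions for illustration, not the repository's wrapper code.

import gfootball.env as football_env

# create a GRF scenario with 115-dimensional vector observations per controlled player
env = football_env.create_environment(
    env_name="academy_counterattack_hard",
    representation="simple115v2",              # simple115-style features instead of CNN pixels
    number_of_left_players_agent_controls=4,   # control 4 left-side players
    rewards="scoring",
)

obs = env.reset()
print(len(obs), obs[0].shape)  # 4 controlled players, each with a 115-dim observation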

Usage

PyMARL is WhiRL's framework for deep multi-agent reinforcement learning and includes implementations of the following algorithms:

Value-based Methods: QMIX, VDN, Qatten, QPLEX, WQMIX

Actor-Critic Methods: COMA, VMIX, LICA, DOP, RIIT

Installation instructions

Install Python packages

# requires Anaconda 3 or Miniconda 3
conda create -n pymarl python=3.8 -y
conda activate pymarl

bash install_dependecies.sh

Set up StarCraft II (2.4.10) and SMAC:

bash install_sc2.sh

This will download SC2.4.10 into the 3rdparty folder and copy over the maps necessary to run the experiments.

Set up Google Football:

bash install_gfootball.sh

Command Line Tool

Run an experiment

# For SMAC
python3 src/main.py --config=qmix --env-config=sc2 with env_args.map_name=corridor
# For Difficulty-Enhanced Predator-Prey
python3 src/main.py --config=qmix_predator_prey --env-config=stag_hunt with env_args.map_name=stag_hunt
# For Communication tasks
python3 src/main.py --config=qmix_att --env-config=sc2 with env_args.map_name=1o_10b_vs_1r
# For Google Football (Insufficient testing)
# map_name: academy_counterattack_easy, academy_counterattack_hard, five_vs_five...
python3 src/main.py --config=vdn_gfootball --env-config=gfootball with env_args.map_name=academy_counterattack_hard env_args.num_agents=4

The config files act as defaults for an algorithm or environment. They are all located in src/config:

--config refers to the config files in src/config/algs
--env-config refers to the config files in src/config/envs

Run n parallel experiments

# bash run.sh config_name env_config_name map_name_list (arg_list threads_num gpu_list experiments_num)
bash run.sh qmix sc2 6h_vs_8z epsilon_anneal_time=500000,td_lambda=0.3 2 0 5

Each xxx_list is comma-separated.

All results will be stored in the Results folder and named with map_name.

Kill all training processes

# all Python and game processes of the current user will be killed
bash clean.sh

Citation

@article{hu2021rethinking,
      title={Rethinking the Implementation Tricks and Monotonicity Constraint in Cooperative Multi-Agent Reinforcement Learning}, 
      author={Jian Hu and Siyang Jiang and Seth Austin Harding and Haibin Wu and Shih-wei Liao},
      year={2021},
      eprint={2102.03479},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

pymarl2's People

Contributors

hijkzzz, htensor


pymarl2's Issues

gfootball version

What version of gfootball is used? I encounter many errors when using football==2.10.

RuntimeError: invalid multinomial distribution (encountering probability entry < 0)

Hello, the code throws an error when running the following test:
main.py --config=coma --env-config=one_step_matrix_game with save_model=True use_tensorboard=True save_model_interval=1000 t_max=50000 runner='episode' batch_size_run=1 use_cuda=False

Error message:
Traceback (most recent calls WITHOUT Sacred internals):
File "D:/sby/RL/pymarl2/main.py", line 35, in my_main
run_REGISTRY[_config['run']](_run, config, _log)
File "D:\sby\RL\pymarl2\run\run.py", line 56, in run
run_sequential(args=args, logger=logger)
File "D:\sby\RL\pymarl2\run\run.py", line 181, in run_sequential
episode_batch = runner.run(test_mode=False)
File "D:\sby\RL\pymarl2\runners\episode_runner.py", line 70, in run
actions = self.mac.select_actions(self.batch, t_ep=self.t, t_env=self.t_env, test_mode=test_mode)
File "D:\sby\RL\pymarl2\controllers\basic_controller.py", line 23, in select_actions
t_env, test_mode=test_mode)
File "D:\sby\RL\pymarl2\components\action_selectors.py", line 105, in select_action
picked_actions = Categorical(masked_policies).sample().long()
File "D:\Anaconda3\lib\site-packages\torch\distributions\categorical.py", line 107, in sample
samples_2d = torch.multinomial(probs_2d, sample_shape.numel(), True).T
RuntimeError: invalid multinomial distribution (encountering probability entry < 0)

I looked into the problem: it should be in rnn_agent.py, at x = F.relu(self.fc1(inputs.view(-1, e)), inplace=True).
After many iterations the gradients accumulate and explode, so the output contains NaN.
Could you please look into this? Thanks.
My environment is Win10, PyTorch 1.x.
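
For context, gradient clipping is one of the code-level tricks listed earlier in this README and is the usual mitigation for exploding gradients; below is a minimal, self-contained PyTorch sketch of gradient-norm clipping (the model, data, and max_norm value are placeholders, not this repository's settings).

import torch

model = torch.nn.Linear(4, 2)
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)

x, y = torch.randn(8, 4), torch.randn(8, 2)
loss = torch.nn.functional.mse_loss(model(x), y)

optimiser.zero_grad()
loss.backward()
# rescale gradients so that their global norm does not exceed max_norm
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=10.0)
optimiser.step()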

TensorBoard logger not working

Hi, thanks for the good work! I installed the dependencies as instructed and successfully started training. However, it seems that the TensorBoard logs are not written to the /result directory, although I set the use_tensorboard param to "true" in src/config/default.yaml. Could you please help me with this?

The latest smac_run_data.json

Is it possible to provide the latest smac_run_data.json? I found that the results provided by SMAC were out of date and only went up to 2 million steps.

Set trained model as opponent?

Hello,

With state-of-the-art algorithms, the win rate is already close to 1 on many maps. Would the author be interested in further extending pymarl2 to support battles between two models trained by different algorithms? I think this could break through the difficulty ceiling of SC2's built-in AI and keep SMAC alive.

No log.json in the results folder when running pymarl2

After running the following command:
python src/main.py --config=qmix --env-config=sc2 with env_args.map_name=corridor
there is no log.json file (as produced by pymarl) in the results folder. Where does pymarl2 store the experiment data?

Also, when running the original pymarl repository with the configurations in default.yaml and sc2.yaml, log.json only records data for a single episode. Is this caused by an incorrect configuration?

About MAC

Hi, I am a beginner and having some difficulty reading the code. I saw that this framework uses a MAC architecture. Could you briefly explain what MAC refers to here? Thanks!

QMIX win rate stays at 0 on the corridor map, and an illegal-action assertion error occurs

With qmix_high_sample_efficiency.yaml on the harder corridor map, test_battle_won_mean stays at 0 (the same happens with qmix.yaml). Could you advise where the problem might be? The complete output file is attached.

There is also a fairly random problem: in some runs, an assertion error caused by an illegal action makes the program exit directly (as shown in the cout output below). Is there a solution for this?

Looking forward to your reply. Thanks!

[INFO 10:16:50] my_main t_env: 3086524 / 10050000
[INFO 10:16:50] my_main Estimated time left: 22 hours, 38 minutes, 44 seconds. Time passed: 6 hours, 22 minutes, 13 seconds
[INFO 10:16:58] my_main Recent Stats | t_env: 3086524 | Episode: 81500
battle_won_mean: 0.0000 ep_length_mean: 42.6525 epsilon: 0.0500 grad_norm: 0.2738
loss_td: 0.0163 q_taken_mean: 0.6618 return_mean: 9.7562 return_std: 1.3360
target_mean: 0.6574 td_error_abs: 0.0163 test_battle_won_mean: 0.0000 test_ep_length_mean: 43.0625
test_return_mean: 10.0525 test_return_std: 1.6540
[INFO 10:17:52] my_main Updated target network
Process Process-2:
Traceback (most recent call last):
File "/home/ts1-guest/anaconda3/envs/pymarl/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/home/ts1-guest/anaconda3/envs/pymarl/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/ts1-guest/RL_study/pymarl2/src/runners/parallel_runner.py", line 233, in env_worker
reward, terminated, env_info = env.step(actions)
File "/home/ts1-guest/RL_study/pymarl2/src/envs/starcraft/StarCraft2Env.py", line 406, in step
sc_action = self.get_agent_action(a_id, action)
File "/home/ts1-guest/RL_study/pymarl2/src/envs/starcraft/StarCraft2Env.py", line 477, in get_agent_action
assert avail_actions[action] == 1,
AssertionError: Agent 1 cannot perform action 14
CloseHandler: 127.0.0.1:54100 disconnected
cout.txt

Why use NRNNAgent and NMAC?

Hi, I've been reading through the code and found that the NRNNAgent (from n_rnn_agent.py) has almost negligible differences from the RNNAgent (from rnn_agent.py). The same goes for the BasicMAC (basic_controller.py) and the NMAC (n_controller.py).
It could be my carelessness. Could you kindly explain the reasons behind this design? What does the "n" in the names (as in n_controller.py) stand for?
Thank you in advance!

Running Trained Agents

Hi there,

Is there a simple command to run an already trained model without further learning? I understand I can comment out the training in the code and replace the epsilon-greedy action selection with one that always follows the learned policy, but isn't there a straightforward way or command to run a trained model n times to get results for analysis?

Question about GRF

Hi, awesome work! You extended GRF into the pymarl framework. However, when I run it with vdn_gfootball.yaml, there is a lot of debugging information. Could you please help me fix it?

Detail: absl Dump "episode_done": count limit reached / disabled

[Help]pysc2.lib.remote_controller.ConnectError: Failed to connect to the SC2 websocket. Is it up?

When I run command

python3 src/main.py --config=qmix --env-config=sc2 with env_args.map_name=corridor

I encounter this error.

Traceback (most recent call last):
  File "src/main.py", line 109, in <module>
    ex.run_commandline(params)
  File "/opt/conda/envs/pymarl/lib/python3.7/site-packages/sacred/experiment.py", line 318, in run_commandline
    options=args,
  File "/opt/conda/envs/pymarl/lib/python3.7/site-packages/sacred/experiment.py", line 276, in run
    run()
  File "/opt/conda/envs/pymarl/lib/python3.7/site-packages/sacred/run.py", line 238, in __call__
    self.result = self.main_function(*args)
  File "/opt/conda/envs/pymarl/lib/python3.7/site-packages/sacred/config/captured_function.py", line 42, in captured_function
    result = wrapped(*args, **kwargs)
  File "src/main.py", line 35, in my_main
    run_REGISTRY[_config['run']](_run, config, _log)
  File "/root/pymarl2/src/run/run.py", line 54, in run
    run_sequential(args=args, logger=logger)
  File "/root/pymarl2/src/run/run.py", line 177, in run_sequential
    episode_batch = runner.run(test_mode=False)
  File "/root/pymarl2/src/runners/parallel_runner.py", line 89, in run
    self.reset()
  File "/root/pymarl2/src/runners/parallel_runner.py", line 78, in reset
    data = parent_conn.recv()
  File "/opt/conda/envs/pymarl/lib/python3.7/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/opt/conda/envs/pymarl/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/opt/conda/envs/pymarl/lib/python3.7/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
KeyboardInterrupt

This issue (google-deepmind/pysc2#281) says I have to open the StarCraft II game itself rather than just the Battle.net launcher, but I don't know how to do that.

Could you give any advice?

VMIX reports NaN

Traceback (most recent call last):
  File "/home/xhwang/anaconda3/envs/pymarl/lib/python3.8/site-packages/sacred/experiment.py", line 312, in run_commandline                                    
    return self.run(                                                                                                                                          
  File "/home/xhwang/anaconda3/envs/pymarl/lib/python3.8/site-packages/sacred/experiment.py", line 276, in run                                                
    run()                                                                                                                                                     
  File "/home/xhwang/anaconda3/envs/pymarl/lib/python3.8/site-packages/sacred/run.py", line 238, in __call__                                                  
    self.result = self.main_function(*args)                                                                                                                   
  File "/home/xhwang/anaconda3/envs/pymarl/lib/python3.8/site-packages/sacred/config/captured_function.py", line 42, in captured_function                     
    result = wrapped(*args, **kwargs)                                                                                                                         
  File "src/main.py", line 38, in my_main                                                                                                                     
    run_REGISTRY[_config['run']](_run, config, _log)                                                                                                          
  File "/NAS2020/Workspaces/DRLGroup/xhwang/Lab/SCII/pymarl2/src/run/run.py", line 54, in run                                                                 
    run_sequential(args=args, logger=logger)                                                                                                                  
  File "/NAS2020/Workspaces/DRLGroup/xhwang/Lab/SCII/pymarl2/src/run/run.py", line 195, in run_sequential                                                     
    learner.train(episode_sample, runner.t_env, episode)                                                                                                      
  File "/NAS2020/Workspaces/DRLGroup/xhwang/Lab/SCII/pymarl2/src/learners/policy_gradient_v2.py", line 58, in train                      
    advantages, td_error, targets_taken, log_pi_taken, entropy = self._calculate_advs(batch, rewards, terminated, actions, avail_actions,                     
  File "/NAS2020/Workspaces/DRLGroup/xhwang/Lab/SCII/pymarl2/src/learners/policy_gradient_v2.py", line 115, in _calculate_advs                                
    entropy = categorical_entropy(pi).reshape(-1)  #[bs, t, n_agents, 1]                                                                                      
  File "/NAS2020/Workspaces/DRLGroup/xhwang/Lab/SCII/pymarl2/src/components/action_selectors.py", line 110, in categorical_entropy                            
    return Categorical(probs=probs).entropy()                                                                                                                 
  File "/home/xhwang/anaconda3/envs/pymarl/lib/python3.8/site-packages/torch/distributions/categorical.py", line 64, in __init__                              
    super(Categorical, self).__init__(batch_shape, validate_args=validate_args)                                                  
  File "/home/xhwang/anaconda3/envs/pymarl/lib/python3.8/site-packages/torch/distributions/distribution.py", line 55, in __init__
    raise ValueError(                                                                                                                                         
ValueError: Expected parameter probs (Tensor of shape (8, 54, 10, 18)) of distribution Categorical(probs: torch.Size([8, 54, 10, 18])) to satisfy the constraint Simplex(), but found invalid values:

The rest of the output is data that I did not paste; the problem is that it contains NaN.

About the SC2 version

Could comparison experiments with this project also be run in the SC2 2.4.6 environment? If so, are there any parameters that need to be adjusted for that version?

Problem when modify maps

Hi,

I am doing some personal research, and I used one of your maps (1o_10b_vs_1r.SC2Map) because its terrain design suits my tasks. I have modified the map with regard to the type and number of agents, but when I tried to change some terrain features the code gives me the following error:

Error

The only modification I made was to elevate some of the terrain, so I did not change the size of the map.

Do you know the reason for this error?

Thanks!

About the Linux environment

Hello,

I would like to do more tests with pymarl2. However, it seems that SC2.4.10 cannot run on CentOS Linux 7.9.2009 due to glibc 2.17, and I got the following:

/StarCraftII/Versions/Base75689/SC2_x64: /usr/lib64/libc.so.6: version 'GLIBC_2.18' not found (required by/StarCraftII/Libs/libstdc++.so.6)

Could you please provide your operating environment info? Thanks!

TypeError: 'Boost.Python.class' object is not iterable

Traceback (most recent call last):
  File "src/main.py", line 14, in <module>
    from run import REGISTRY as run_REGISTRY
  File "/workspace/pymarl2/src/run/__init__.py", line 1, in <module>
    from .run import run as default_run
  File "/workspace/pymarl2/src/run/run.py", line 12, in <module>
    from learners import REGISTRY as le_REGISTRY
  File "/workspace/pymarl2/src/learners/__init__.py", line 4, in <module>
    from .ppo_learner import PPOLearner
  File "/workspace/pymarl2/src/learners/ppo_learner.py", line 3, in <module>
    from controllers.n_controller import NMAC
  File "/workspace/pymarl2/src/controllers/__init__.py", line 3, in <module>
    from .basic_controller import BasicMAC
  File "/workspace/pymarl2/src/controllers/basic_controller.py", line 2, in <module>
    from components.action_selectors import REGISTRY as action_REGISTRY
  File "/workspace/pymarl2/src/components/action_selectors.py", line 1, in <module>
    from matplotlib.pyplot import xcorr
  File "/root/anaconda3/envs/pymarl/lib/python3.8/site-packages/matplotlib/__init__.py", line 129, in <module>
    from . import _api, _version, cbook, _docstring, rcsetup
  File "/root/anaconda3/envs/pymarl/lib/python3.8/site-packages/matplotlib/cbook/__init__.py", line 2048, in <module>
    class _OrderedSet(collections.abc.MutableSet):
  File "/root/anaconda3/envs/pymarl/lib/python3.8/abc.py", line 85, in __new__
    cls = super().__new__(mcls, name, bases, namespace, **kwargs)
TypeError: 'Boost.Python.class' object is not iterable

Hello, I got this error when running the example from the README. How can I solve it?

Some issues with COMA and QPLEX

First of all, thank you very much for sharing the code.
First issue: COMA also runs into the NaN problem when running 8m_vs_9m, as follows:
ValueError: Expected parameter probs (Tensor of shape (107520, 15)) of distribution Categorical(probs: torch.Size([107520, 15])) to satisfy the constraint Simplex(), but found invalid values:
tensor([[0.0000, 0.2148, 0.2207, ..., 0.0000, 0.0000, 0.0000],
[0.0000, 0.2108, 0.2247, ..., 0.0000, 0.0000, 0.0000],
[0.0000, 0.2141, 0.2229, ..., 0.0000, 0.0000, 0.0000],
...,
[ nan, nan, nan, ..., nan, nan, nan],
[ nan, nan, nan, ..., nan, nan, nan],
[ nan, nan, nan, ..., nan, nan, nan]],
device='cuda:0', grad_fn=)
Second issue:
On 8m_vs_9m, QPLEX's performance collapses late in training. Is this a problem with the algorithm itself, or are the hyperparameters not tuned?

COMA produces NaN

Same situation as in issue 30,
but increasing the entropy loss as suggested in your reply does not solve it.
In fact, the error appears before the entropy loss is computed for the first time during training, so modifying the entropy loss has no effect.
The error occurs at line 80 of coma_learner: dist_entropy = Categorical(pi).entropy().view(-1)

Comparing with pymarl: in pymarl's COMA config, mask_before_softmax is FALSE, while in this repository it is TRUE; I suspect this is what causes the problem.
Then, after modifying line 41 of basic_controller.py, agent_outs[reshaped_avail_actions == 0] = -1e10, which seems to cause the NaN, the NaN no longer appears.

So the NaN problem should be caused there.
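
For reference, here is a minimal standalone illustration of the two masking orders being compared (masking before the softmax with a large negative logit versus masking after the softmax with renormalization); this is a toy example with made-up tensors, not the repository's controller or action-selector code.

import torch
import torch.nn.functional as F

logits = torch.randn(1, 5)               # toy agent outputs for 5 actions
avail = torch.tensor([[1, 1, 0, 0, 0]])  # only the first two actions are available

# mask before softmax: unavailable actions get a large negative logit
masked_logits = logits.clone()
masked_logits[avail == 0] = -1e10
probs_before = F.softmax(masked_logits, dim=-1)

# mask after softmax: zero out unavailable actions and renormalize
probs = F.softmax(logits, dim=-1) * avail
probs_after = probs / probs.sum(dim=-1, keepdim=True)

print(probs_before, probs_after)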

About test_won_rates

Hi, I have a small question: why are the timesteps corresponding to test_won_rates different every run? How should this be handled when plotting?

AttributeError: 'TracebackException' object has no attribute 'exc_traceback' and RuntimeError: [enforce fail at alloc_cpu.cpp:73] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 7506720000 bytes. Error code 12 (Cannot allocate memory)

When I run python3 src/main.py --config=qmix --env-config=sc2 with env_args.map_name=corridor, I get the following error.

Traceback (most recent call last):
  File "/home/jindingquan/.conda/envs/pymarl/lib/python3.8/site-packages/sacred/experiment.py", line 312, in run_commandline
    return self.run(
  File "/home/jindingquan/.conda/envs/pymarl/lib/python3.8/site-packages/sacred/experiment.py", line 276, in run
    run()
  File "/home/jindingquan/.conda/envs/pymarl/lib/python3.8/site-packages/sacred/run.py", line 238, in __call__
    self.result = self.main_function(*args)
  File "/home/jindingquan/.conda/envs/pymarl/lib/python3.8/site-packages/sacred/config/captured_function.py", line 42, in captured_function
    result = wrapped(*args, **kwargs)
  File "src/main.py", line 38, in my_main
    run_REGISTRY[_config['run']](_run, config, _log)
  File "/home/jindingquan/pymarl2-master/src/run/run.py", line 54, in run
    run_sequential(args=args, logger=logger)
  File "/home/jindingquan/pymarl2-master/src/run/run.py", line 114, in run_sequential
    buffer = ReplayBuffer(scheme, groups, args.buffer_size, env_info["episode_limit"] + 1,
  File "/home/jindingquan/pymarl2-master/src/components/episode_buffer.py", line 209, in __init__
    super(ReplayBuffer, self).__init__(scheme, groups, buffer_size, max_seq_length, preprocess=preprocess, device=device)
  File "/home/jindingquan/pymarl2-master/src/components/episode_buffer.py", line 28, in __init__
    self._setup_data(self.scheme, self.groups, batch_size, max_seq_length, self.preprocess)
  File "/home/jindingquan/pymarl2-master/src/components/episode_buffer.py", line 75, in _setup_data
    self.data.transition_data[field_key] = th.zeros((batch_size, max_seq_length, *shape), dtype=dtype, device=self.device)
RuntimeError: [enforce fail at alloc_cpu.cpp:73] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 7506720000 bytes. Error code 12 (Cannot allocate memory)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "src/main.py", line 112, in <module>
    ex.run_commandline(params)
  File "/home/jindingquan/.conda/envs/pymarl/lib/python3.8/site-packages/sacred/experiment.py", line 347, in run_commandline
    print_filtered_stacktrace()
  File "/home/jindingquan/.conda/envs/pymarl/lib/python3.8/site-packages/sacred/utils.py", line 493, in print_filtered_stacktrace
    print(format_filtered_stacktrace(filter_traceback), file=sys.stderr)
  File "/home/jindingquan/.conda/envs/pymarl/lib/python3.8/site-packages/sacred/utils.py", line 528, in format_filtered_stacktrace
    return "".join(filtered_traceback_format(tb_exception))
  File "/home/jindingquan/.conda/envs/pymarl/lib/python3.8/site-packages/sacred/utils.py", line 568, in filtered_traceback_format
    current_tb = tb_exception.exc_traceback
AttributeError: 'TracebackException' object has no attribute 'exc_traceback'

How can I fix this? Please help!

KeyError: 'gfootball'

I wanted to run Google Research Football and found that an error was reported. The final error can be resolved through Sacred, but why is KeyError: 'gfootball' reported?
pymarl Failed after 0:00:00!
Traceback (most recent call last):
File "/home/zhq/anaconda3/envs/pymarl/lib/python3.8/site-packages/sacred/experiment.py", line 263, in run_commandline
return self.run(cmd_name, config_updates, named_configs, {}, args)
File "/home/zhq/anaconda3/envs/pymarl/lib/python3.8/site-packages/sacred/experiment.py", line 212, in run
run()
File "/home/zhq/anaconda3/envs/pymarl/lib/python3.8/site-packages/sacred/run.py", line 233, in call
self.result = self.main_function(*args)
File "/home/zhq/anaconda3/envs/pymarl/lib/python3.8/site-packages/sacred/config/captured_function.py", line 48, in captured_function
result = wrapped(*args, **kwargs)
File "src/main.py", line 38, in my_main
run_REGISTRY[_config['run']](_run, config, _log)
File "/home/zhq/cheng/pymarl2-master/src/run/run.py", line 54, in run
run_sequential(args=args, logger=logger)
File "/home/zhq/cheng/pymarl2-master/src/run/run.py", line 85, in run_sequential
runner = r_REGISTRY[args.runner](args=args, logger=logger)
File "/home/zhq/cheng/pymarl2-master/src/runners/parallel_runner.py", line 20, in init
env_fn = env_REGISTRY[self.args.env]
KeyError: 'gfootball'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "src/main.py", line 112, in
ex.run_commandline(params)
File "/home/zhq/anaconda3/envs/pymarl/lib/python3.8/site-packages/sacred/experiment.py", line 290, in run_commandline
print_filtered_stacktrace()
File "/home/zhq/anaconda3/envs/pymarl/lib/python3.8/site-packages/sacred/utils.py", line 482, in print_filtered_stacktrace
print(format_filtered_stacktrace(filter_traceback), file=sys.stderr)
File "/home/zhq/anaconda3/envs/pymarl/lib/python3.8/site-packages/sacred/utils.py", line 516, in format_filtered_stacktrace
return ''.join(filtered_traceback_format(tb_exception))
File "/home/zhq/anaconda3/envs/pymarl/lib/python3.8/site-packages/sacred/utils.py", line 562, in filtered_traceback_format
current_tb = tb_exception.exc_traceback
AttributeError: 'TracebackException' object has no attribute 'exc_traceback'

Question about exploratory action selection

I compared the code of pymarl and pymarl2,

and found that this exploratory action selection in pymarl's basic_controller.py:
https://github.com/oxwhirl/pymarl/blob/c971afdceb34635d31b778021b0ef90d7af51e86/src/controllers/basic_controller.py#L40-L48

if not test_mode:
    # Epsilon floor
    epsilon_action_num = agent_outs.size(-1)
    if getattr(self.args, "mask_before_softmax", True):
        # With probability epsilon, we will pick an available action uniformly
        epsilon_action_num = reshaped_avail_actions.sum(dim=1, keepdim=True).float()

    agent_outs = ((1 - self.action_selector.epsilon) * agent_outs
                   + th.ones_like(agent_outs) * self.action_selector.epsilon/epsilon_action_num)

has been moved into action_selectors.py:

epsilon_action_num = (avail_actions.sum(-1, keepdim=True) + 1e-8)
masked_policies = ((1 - self.epsilon) * masked_policies
                   + avail_actions * self.epsilon / epsilon_action_num)
masked_policies[avail_actions == 0] = 0

and the computation also seems to differ in whether masking is applied. Could you explain why this change was made?

Two questions

Hello, I came here from your comment under 'starry...'. I originally chose that StarCraft open-source code because the original version had too few comments.
1. When running qmix now, are the hyperparameters already fine-tuned?
2. About the maps: where does a map like 7sz in QPLEX come from? I did not see this map on the StarCraft download page, and you also provide new maps here.

win10

Hi, when I run the code on Win10, the following occurs:

def recursive_dict_update(d, u):
    for k, v in u.items():

Exception has occurred: AttributeError
'NoneType' object has no attribute 'items'
File "D:\src\main.py", line 55, in recursive_dict_update
for k, v in u.items():
File "D:\src\main.py", line 95, in
config_dict = recursive_dict_update(config_dict, env_config)

Do you know how to solve it? Thanks.

Question about policy iterations

Hello, in your paper I saw $S = E \times P \times I$, where S is the total number of samples, E is the number of samples in each episode,
P is the number of rollout processes, and I is the number of policy iterations. Does "policy iterations" here refer to target_update_interval, or to how often a training step is performed?
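
As a purely hypothetical illustration of the formula (the numbers are not from the paper): with $E = 60$ samples per episode, $P = 8$ rollout processes, and $I = 2000$ policy iterations, the total sample count would be $S = 60 \times 8 \times 2000 = 960{,}000$.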

Question about training QMIX on 5m_vs_6m

Hello, if QMIX (config=qmix.yaml) is used directly to train on 5m_vs_6m, do the parameters in qmix.yaml need to be modified? We trained directly on 5m_vs_6m with the version provided in this repository, and test_won_rate is only around 0.2, while the paper reports about 0.9. Could you advise what the problem might be?

difference between parallel_runner and episode_runner?

Hello! I found that many SMAC-related papers emphasize using 8 parallel environments for sampling in their experiments. However, in the open-source code of some papers, many researchers cannot reproduce the paper's results with parallel_runner and batch_size_run=8; they need to use episode_runner instead. What is the difference between the two, and why does the better performance in some scenarios come from episode_runner?
I am looking forward to your reply, thanks!
