anujmahajanoxf / maven Goto Github PK

View Code? Open in Web Editor NEW

56.0 7.0 21.0 5.04 MB

Submission for MAVEN: Multi-Agent Variational Exploration

Dockerfile 0.79% Shell 0.96% Python 98.26%

maven's Introduction

MAVEN

Code, poster and slides for MAVEN: Multi-Agent Variational Exploration, NeurIPS 2019.

The paper can be found at https://arxiv.org/abs/1910.07483.

Citation

Please use the following bibtex entry for citation:

@inproceedings{mahajan2019maven,
  title={MAVEN: Multi-Agent Variational Exploration},
  author={Mahajan, Anuj and Rashid, Tabish and Samvelyan, Mikayel and Whiteson, Shimon},
  booktitle={Advances in Neural Information Processing Systems},
  pages={7611--7622},
  year={2019}
}

maven's People

Contributors

Stargazers

Watchers

maven's Issues

Training 6h_vs_8z failed

Hi, I tried to run the code on 6h_vs_8z and failed. It shows something wrong in the starcraft2.py. And I also found 6h_vs_8z is not a registered game in your code. Could you please help me to fix it?

PySC2==3.0

Traceback (most recent call last):
  File "/home/me/anaconda3/envs/sc_ray/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/home/me/anaconda3/envs/sc_ray/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/home/me/maven/maven_code/src/runners/parallel_runner.py", line 333, in env_worker
    env.reset()
  File "/home/me/maven/maven_code/src/envs/starcraft2/starcraft2.py", line 270, in reset
    return self.get_obs(), self.get_state()
  File "/home/me/maven/maven_code/src/envs/starcraft2/starcraft2.py", line 878, in get_state
    ally_state[al_id, 1] = al_unit.weapon_cooldown / max_cd # cooldown
TypeError: unsupported operand type(s) for /: 'float' and 'NoneType'

Error when loading environment

Hi, I saw the following error when the code runs:

"Process Process-1:
Traceback (most recent call last):
File "/home/jovyan/miniconda3/envs/pytorch/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/home/jovyan/miniconda3/envs/pytorch/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/home/jovyan/MAVEN/maven_code/src/runners/parallel_runner.py", line 311, in env_worker
env = env_fn.x()
File "/home/jovyan/MAVEN/maven_code/src/envs/init.py", line 7, in env_fn
return env(**kwargs)
File "/home/jovyan/MAVEN/maven_code/src/envs/starcraft2/starcraft2.py", line 142, in init
self._launch()
File "/home/jovyan/MAVEN/maven_code/src/envs/starcraft2/starcraft2.py", line 206, in _launch
self._sc2_proc = self._run_config.start(version=self.game_version, window_size=self.window_size)
File "/home/jovyan/miniconda3/envs/pytorch/lib/python3.6/site-packages/pysc2/run_configs/platforms.py", line 205, in start
want_rgb=want_rgb, extra_args=extra_args, **kwargs)
File "/home/jovyan/miniconda3/envs/pytorch/lib/python3.6/site-packages/pysc2/run_configs/platforms.py", line 88, in start
self, exec_path=exec_path, version=self.version, **kwargs)
TypeError: type object got multiple values for keyword argument 'version'"

Do you have any idea on this? Also can you share your "requirement.txt" so I can know which version of the tool I should use? Many thanks!

run error

Hi, could you please help me to solve the problem? It's happened when I test code 'python3 src/main.py --config=noisemix_episode --env-config=sc2 with env_args.map_name=3s5z'

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [32, 132, 8, 14]], which is output 0 of SliceBackward, is at version 2; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

The running configurations in the paper

Hello,

Thanks for your codes. I'm running the codes with noisemix_smac.yaml in algs and sc2.yaml in envs.

However, it is hard to reproduce the MAVEN's results in the corridor and 6h_vs_8z.

Could you please provide the running configurations of MAVEN in your paper.

Thanks!

Docker setup

Thanks for the very inspiring paper and code. Following your suggestion, I tried to run your code in the provided docker image. However, when I run
bash run.sh $GPU python3 src/main.py --config=noisemix_episode --env-config=sc2 with env_args.map_name=3s5z,
I get error:
Launching container named 'iliu3_python3_1204202011_0_SlUA' on GPU '0' standard_init_linux.go:207: exec user process caused "exec format error".

I am able to run the code of pymarl (https://github.com/oxwhirl/pymarl) repo with the docker image they provided.

Thanks.

TypeError: type object got multiple values for keyword argument 'version'

I pip installed the dependencies fromrequirements.txt of PYMARL on python3.6 and I don't think there is a problem with dependencies, but I keep receiving the following error when I run the experiment python src/main.py --config=noisemix_episode --env-config=sc2 with env_args.map_name=3s5z

Traceback (most recent call last):
  File "/usr/lib64/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib64/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/mop216/MAVEN/maven_code/src/runners/parallel_runner.py", line 311, in env_worker
    env = env_fn.x()
  File "/home/mop216/MAVEN/maven_code/src/envs/__init__.py", line 7, in env_fn
    return env(**kwargs)
  File "/home/mop216/MAVEN/maven_code/src/envs/starcraft2/starcraft2.py", line 142, in __init__
    self._launch()
  File "/home/mop216/MAVEN/maven_code/src/envs/starcraft2/starcraft2.py", line 206, in _launch
    self._sc2_proc = self._run_config.start(version=self.game_version, window_size=self.window_size)
  File "/home/mop216/MAVENN/lib64/python3.6/site-packages/pysc2/run_configs/platforms.py", line 205, in start
    want_rgb=want_rgb, extra_args=extra_args, **kwargs)
  File "/home/mop216/MAVENN/lib64/python3.6/site-packages/pysc2/run_configs/platforms.py", line 88, in start
    self, exec_path=exec_path, version=self.version, **kwargs)
TypeError: type object got multiple values for keyword argument 'version'

Please note that --config=noise_qmix_parallel --env-config=nmatrix configuration runs properly and uses nmatrix environment. I think there goes something wrong with the starcraft2.py which I cannot figure it out.

Could you please give a baseline model so that we could see the result intuitively?

Hi,
I just setup the environment and everything worked fine! Thanks for the great work~

But I wonder if you could provide us with the baseline model so we could use it without retraining. Because the training process on those super hard maps takes a really long time.

Thanks again and I am glad for your reply~

the code of 4step and 3step map

May I know the code of environment "3step" and "4step" ? I only find map file here but I do not know the reward and environment configuration of both map in neither paper or code.

Help for the configuration of nstep_matrix game.

Hello,

Can you share MAVEN's .ymal file for the n_step matrix in the paper?

Some parameters are missing in the plot_keys/keys_matrix_games and I'm not sure my current setting is the same as yours.

Thank you very much!

How can I reproduce the results of Corridor

Hi, which config and command should I use to train the results of Corridor?

Gradient error on noisemix

I am trying to reproduce your results and I have an issue running the code (not running on a docker). The backpropagation fails with the following error:

[ERROR 13:59:02] pymarl Failed after 0:00:23!
Traceback (most recent calls WITHOUT Sacred internals):
  File "src/main.py", line 34, in my_main
    run(_run, _config, _log)
  File "/home/USER/.tmp/MAVEN/maven_code/src/run.py", line 48, in run
    run_sequential(args=args, logger=logger)
  File "/home/USER/.tmp/MAVEN/maven_code/src/run.py", line 181, in run_sequential
    learner.train(episode_sample, runner.t_env, episode)
  File "/home/USER/.tmp/MAVEN/maven_code/src/learners/noise_q_learner.py", line 168, in train
    loss.backward()
  File "/home/USER/.general_env/lib/python3.8/site-packages/torch/tensor.py", line 195, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/USER/.general_env/lib/python3.8/site-packages/torch/autograd/__init__.py", line 97, in backward
    Variable._execution_engine.run_backward(
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [32, 56, 3, 9]], which is output 0 of SliceBackward, is at version 2; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

During handling of the above exception, another exception occurred:

Traceback (most recent calls WITHOUT Sacred internals):
  File "/usr/lib/python3.8/contextlib.py", line 131, in __exit__
    self.gen.throw(type, value, traceback)
  File "/usr/lib/python3.8/subprocess.py", line 1079, in wait
    return self._wait(timeout=timeout)
  File "/usr/lib/python3.8/subprocess.py", line 1796, in _wait
    raise TimeoutExpired(self.args, timeout)
subprocess.TimeoutExpired: Command '['tee', '-a', '/tmp/tmp8fr28bt_']' timed out after 1 seconds

I have spent some time searching for the error and came up with the following:

In the noise QLearner, the latest version of PyMARL shows that the target_max_qvals should be computed from a detached version of mac_out in the train function, section "Max over target Q-Values"
Here I am not sure at all, but computing q_softmax_actions with the detached mac_out removes the backprop error. However, I don't know if the MI loss is correctly backpropagated afterwards.

This is what I changed to have an up and running code, in the train function of src/learners/noise_q_learner.py:

        # Max over target Q-Values
        if self.args.double_q:
            mac_out_detach = mac_out.clone().detach()
            mac_out_detach[avail_actions == 0] = -9999999
            cur_max_actions = mac_out_detach[:, 1:].max(dim=3, keepdim=True)[1]
            target_max_qvals = torch.gather(
                target_mac_out, 3, cur_max_actions
            ).squeeze(3)
            # Get actions that maximise live Q (for double q-learning)
            #mac_out[avail_actions == 0] = -9999999
            #cur_max_actions = mac_out[:, 1:].max(dim=3, keepdim=True)[1]
            #target_max_qvals = th.gather(target_mac_out, 3, cur_max_actions).squeeze(3)
        else:
            target_max_qvals = target_mac_out.max(dim=3)[0]

        # Discriminator
        mac_out_detach = mac_out.clone().detach()
        mac_out_detach[avail_actions == 0] = -9999999
        q_softmax_actions = torch.nn.functional.softmax(
            mac_out_detach[:, :-1], dim=3
        )
        #mac_out[avail_actions == 0] = -9999999
        #q_softmax_actions = th.nn.functional.softmax(mac_out[:, :-1], dim=3)

Can you tell me if these changes are ok and if not how the gradient propagation should be fixed? I assume the proper way to fix the discriminator backprop problem would be to remove the part of the target that is in line with unavailable actions. I will keep looking for that in the code but I would really like to have your input.

anujmahajanoxf / maven Goto Github PK

maven's Introduction

MAVEN

Citation

maven's People

Contributors

Stargazers

Watchers

Forkers

maven's Issues

Recommend Projects

Recommend Topics

Recommend Org