We provide training data generated by running MADDPG in 8 MPE scenarios. Pictures, testing data (in `./benchmark_files/`), training data (in `./learning_curves/`), and trained models (in `./models/`) are all included.
List of scenarios:

- Coop s1 - `simple_reference` (No Formal Name), SameR Coop
- Coop s2 - `simple_speaker_listener` (Cooperative communication), SameR Coop
- Coop s3 - `simple_spread` (Cooperative navigation), SameR Coop
- Comp s4 - `simple_adversary` (Physical deception), Non-zerosum Comp
- Comp s5 - `simple_crypto` (Covert communication), Zerosum Comp
- Comp s6 - `simple_push` (Keep-away), Non-zerosum Comp
- Comp s7 - `simple_tag` (Predator-prey), Non-zerosum Comp
- Coop&Comp s8 - `simple_world_comm` (No Formal Name), Non-zerosum Comp, SameR Coop, DiffR Coop
One can run the following commands:

```shell
python train.py --scenario simple_reference --save-dir models/s1/ma_s1_e20/ --exp-name ma_s1_e20 --benchmark
python train.py --scenario simple_speaker_listener --save-dir models/s2/ma_s2_e20/ --exp-name ma_s2_e20 --benchmark
python train.py --scenario simple_spread --save-dir models/s3/ma_s3_e20/ --exp-name ma_s3_e20 --benchmark
python train.py --scenario simple_adversary --save-dir models/s4/ma_s4_e20/ --exp-name ma_s4_e20 --benchmark
python train.py --scenario simple_crypto --save-dir models/s5/ma_s5_e20/ --exp-name ma_s5_e20 --benchmark
python train.py --scenario simple_push --save-dir models/s6/ma_s6_e20/ --exp-name ma_s6_e20 --benchmark
python train.py --scenario simple_tag --save-dir models/s7/ma_s7_e20/ --exp-name ma_s7_e20 --benchmark
python train.py --scenario simple_world_comm --save-dir models/s8/ma_s8_e20/ --exp-name ma_s8_e20 --benchmark
```
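Since the eight commands differ only in the scenario name and index, they can be generated with a small shell loop. This is a convenience sketch, not part of the repository; by default it only prints each command, and passing `run` as the first argument executes them instead.

```shell
# Print (or, with "run" as the first argument, execute) the eight
# benchmark training commands listed above.
scenarios="simple_reference simple_speaker_listener simple_spread simple_adversary \
simple_crypto simple_push simple_tag simple_world_comm"
i=1
for s in $scenarios; do
  cmd="python train.py --scenario $s --save-dir models/s$i/ma_s${i}_e20/ --exp-name ma_s${i}_e20 --benchmark"
  if [ "${1:-}" = "run" ]; then $cmd; else echo "$cmd"; fi
  i=$((i + 1))
done
```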
Status: Archive (code is provided as-is, no updates expected)
This is the code for implementing the MADDPG algorithm presented in the paper: Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments. It is configured to be run in conjunction with environments from the Multi-Agent Particle Environments (MPE). Note: this codebase has been restructured since the original paper, and the results may vary from those reported in the paper.
Update: the original implementation for policy ensemble and policy estimation can be found here. The code is provided as-is.
- To install, `cd` into the root directory and type `pip install -e .`
- Known dependencies: Python (3.5.4), OpenAI gym (0.10.5), tensorflow (1.8.0), numpy (1.14.5)
We demonstrate here how the code can be used in conjunction with the Multi-Agent Particle Environments (MPE).
- Download and install the MPE code here by following the README.
- Ensure that `multiagent-particle-envs` has been added to your `PYTHONPATH` (e.g. in `~/.bashrc` or `~/.bash_profile`).
- To run the code, `cd` into the `experiments` directory and run `train.py`:

  `python train.py --scenario simple`

- You can replace `simple` with any environment in the MPE you'd like to run.
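For the `PYTHONPATH` step, a line like the following in `~/.bashrc` or `~/.bash_profile` is enough; the clone location `~/multiagent-particle-envs` is just an assumed example, so adjust it to wherever you installed the MPE code.

```shell
# Assumes MPE was cloned to ~/multiagent-particle-envs; adjust to your path.
# ${PYTHONPATH:-} keeps this safe even if PYTHONPATH was previously unset.
export PYTHONPATH="$HOME/multiagent-particle-envs:${PYTHONPATH:-}"
```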
- `--scenario`: defines which environment in the MPE is to be used (default: `"simple"`)
- `--max-episode-len`: maximum length of each episode for the environment (default: `25`)
- `--num-episodes`: total number of training episodes (default: `60000`)
- `--num-adversaries`: number of adversaries in the environment (default: `0`)
- `--good-policy`: algorithm used for the 'good' (non-adversary) policies in the environment (default: `"maddpg"`; options: {`"maddpg"`, `"ddpg"`})
- `--adv-policy`: algorithm used for the adversary policies in the environment (default: `"maddpg"`; options: {`"maddpg"`, `"ddpg"`})

- `--lr`: learning rate (default: `1e-2`)
- `--gamma`: discount factor (default: `0.95`)
- `--batch-size`: batch size (default: `1024`)
- `--num-units`: number of units in the MLP (default: `64`)

- `--exp-name`: name of the experiment, used as the file name to save all results (default: `None`)
- `--save-dir`: directory where intermediate training results and model will be saved (default: `"/tmp/policy/"`)
- `--save-rate`: model is saved every time this number of episodes has been completed (default: `1000`)
- `--load-dir`: directory where training state and model are loaded from (default: `""`)

- `--restore`: restores previous training state stored in `load-dir` (or in `save-dir` if no `load-dir` has been provided), and continues training (default: `False`)
- `--display`: displays to the screen the trained policy stored in `load-dir` (or in `save-dir` if no `load-dir` has been provided), but does not continue training (default: `False`)
- `--benchmark`: runs benchmarking evaluations on a saved policy, saving results to the `benchmark-dir` folder (default: `False`)
- `--benchmark-iters`: number of iterations to run benchmarking for (default: `100000`)
- `--benchmark-dir`: directory where benchmarking data is saved (default: `"./benchmark_files/"`)
- `--plots-dir`: directory where training curves are saved (default: `"./learning_curves/"`)
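As an illustration of how these options combine, the following sketch assembles (and prints, rather than runs) one full invocation. The flags are the ones documented above; the scenario choice, adversary count, and directory names are made-up examples, not defaults.

```shell
# Build a complete train.py invocation from the documented flags.
# simple_tag, 3 adversaries, and ./policy_tag/ are illustrative choices.
cmd="python train.py \
 --scenario simple_tag --num-adversaries 3 \
 --good-policy maddpg --adv-policy ddpg \
 --lr 1e-2 --gamma 0.95 --batch-size 1024 --num-units 64 \
 --save-dir ./policy_tag/ --exp-name tag_vs_ddpg"
echo "$cmd"
```

After training, reusing the same `--scenario` and `--num-adversaries` together with `--load-dir ./policy_tag/ --display` would show the learned policies on screen without continuing training.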
- `./experiments/train.py`: contains code for training MADDPG on the MPE
- `./maddpg/trainer/maddpg.py`: core code for the MADDPG algorithm
- `./maddpg/trainer/replay_buffer.py`: replay buffer code for MADDPG
- `./maddpg/common/distributions.py`: useful distributions used in `maddpg.py`
- `./maddpg/common/tf_util.py`: useful tensorflow functions used in `maddpg.py`
If you used this code for your experiments or found it helpful, consider citing the following paper:
```
@article{lowe2017multi,
  title={Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments},
  author={Lowe, Ryan and Wu, Yi and Tamar, Aviv and Harb, Jean and Abbeel, Pieter and Mordatch, Igor},
  journal={Neural Information Processing Systems (NIPS)},
  year={2017}
}
```