
Structured Attentive Reasoning Network (SARNet)

Code repository for Learning Multi-Agent Communication through Structured Attentive Reasoning

Cite

If you use this code, please consider citing SARNet:

@inproceedings{rangwala2020learning,
 author = {Rangwala, Murtaza and Williams, Ryan},
 booktitle = {Advances in Neural Information Processing Systems},
 pages = {10088--10098},
 title = {Learning Multi-Agent Communication through Structured Attentive Reasoning},
 url = {https://proceedings.neurips.cc/paper/2020/file/72ab54f9b8c11fae5b923d7f854ef06a-Paper.pdf},
 volume = {33},
 year = {2020}
}

Installation

  • To install, cd into the root directory and type pip install -e .

  • Known dependencies: Python (3.5.4+), OpenAI gym (0.10.5), tensorflow (1.14.0)

Install my implementation of Multi-Agent Particle Environments (MPE) (https://github.com/openai/multiagent-particle-envs), included in this repository:

  • cd into multiagent-particle-envs and type pip install -e .

Install my implementation of Traffic Junction (https://github.com/IC3Net/IC3Net/tree/master/ic3net-envs), included in this repository:

  • cd into ic3net-envs and type python setup.py develop
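
Putting the steps together, a complete installation from the repository root might look like this (a sketch; it assumes the bundled multiagent-particle-envs and ic3net-envs directories sit at the repository root, as in this repo):

  # Install the SARNet package itself
  pip install -e .
  # Install the bundled Multi-Agent Particle Environments
  cd multiagent-particle-envs && pip install -e . && cd ..
  # Install the bundled Traffic Junction environment
  cd ic3net-envs && python setup.py develop && cd ..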

Architectures Implemented

Use the following architecture names with --adv-test and --good-test to define the agents' communication. Adversarial agents are the default agents for fully cooperative environments; good agents are only used in competing environments. An example invocation is given after the list.

  • SARNet: --adv-test SARNET or --good-test SARNET

  • TarMAC: --adv-test TARMAC or --good-test TARMAC

  • CommNet: --adv-test COMMNET or --good-test COMMNET

  • IC3Net: --adv-test IC3NET or --good-test IC3NET

  • MADDPG: --adv-test DDPG or --good-test DDPG

To use a MAAC-type critic:

  • MAAC: --adv-critic-model MAAC or --gd-critic-model MAAC
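
For example, a competing predator-prey run could pair SARNet predators with DDPG prey (a sketch, not a tested configuration; the scenario and the omitted training flags are illustrative):

  python train.py --env-type mpe --scenario simple_tag_3 --policy-grad maddpg --num-adversaries 3 --adv-test SARNET --good-test DDPG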

Environments

Select the environment with --env-type, which takes the following argument:

  • Multi-Agent Particle Environment: mpe

--scenario then selects the specific scenario. For the multi-agent particle environment, use one of the following (an example invocation follows the scenario lists):

  • Predator-Prey with 3 vs 1: simple_tag_3
  • Predator-Prey with 6 vs 2: simple_tag_6
  • Predator-Prey with 12 vs 4: simple_tag_12
  • Predator-Prey with 15 vs 5: simple_tag_15
  • Cooperative Navigation with 3 agents: simple_spread_3
  • Cooperative Navigation with 6 agents: simple_spread_6
  • Cooperative Navigation with 10 agents: simple_spread_10
  • Cooperative Navigation with 20 agents: simple_spread_20
  • Physical Deception with 3 vs 1: simple_adversary_3
  • Physical Deception with 4 vs 2: simple_adversary_6
  • Physical Deception with 12 vs 4: simple_adversary_12

For Traffic Junction:

  • Traffic Junction: --env-type ic3net --scenario traffic_junction
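
For example (sketches showing only the environment-selection flags; the remaining training flags appear in full under Example Scripts below):

  # MPE cooperative navigation with 6 agents
  python train.py --env-type mpe --scenario simple_spread_6 --num-adversaries 6
  # Traffic Junction
  python train.py --env-type ic3net --scenario traffic_junction --num-adversaries 6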

Specifying Number of Agents

The number of cooperating agents is specified with --num-adversaries. For environments with competing agents, the code automatically accounts for the remaining "good" agents.
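
For example, in the 3-vs-1 predator-prey scenario the three predators form the cooperating adversarial team, and the single remaining prey ("good") agent is accounted for automatically (a sketch under that assumption):

  python train.py --env-type mpe --scenario simple_tag_3 --num-adversaries 3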

Training Policies

We support training with DDPG for continuous action spaces and REINFORCE for discrete action spaces. Pass one of the following arguments:

  • --policy-grad maddpg for continuous action spaces
  • --policy-grad reinforce for discrete action spaces

Additionally, to enable TD3 and recurrent trajectory updates, pass --td3 and specify the trajectory length to update over with --len-traj-update 10.

Recurrent importance sampling is enabled with --PER-sampling.
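
Combined, a minimal continuous-action invocation might look like the following (illustrative only; complete commands appear under Example Scripts below):

  python train.py --policy-grad maddpg --td3 --len-traj-update 10 --PER-sampling --env-type mpe --scenario simple_spread_3 --num-adversaries 3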

Example Scripts

  • Cooperative Navigation with 6 SARNet Agents: python train.py --policy-grad maddpg --env-type mpe --scenario simple_spread_6 --num-adversaries 6 --key-units 32 --value-units 32 --query-units 32 --len-traj-update 10 --td3 --PER-sampling --encoder-model LSTM --max-episode-len 100

  • Traffic Junction with 6 SARNet Agents: python train.py --env-type ic3net --scenario traffic_junction --policy-grad reinforce --num-adversaries 6 --adv-test SARNET --gpu-device 0 --exp-name SAR-TJ6-NoCurrLr --max-episode-len 20 --num-env 50 --dim 6 --add_rate_min 0.3 --add_rate_max 0.3 --curr_start 250 --curr_end 1250 --num-episodes 500000 --batch-size 500 --difficulty easy --vision 0

References

  • Theano-based abstractions from Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments.

  • Segment tree for PER from OpenAI Baselines.

  • Attention-based abstractions/operations from MAC Network.


Issues

program interruption

Hello! I'm very sorry to bother you. Every time the program runs to about 1000 rounds, the system automatically terminates it. I guess the reason is that my computer's configuration is too low. Even after I reduced the number of processes to 10 and the batch_size to 64, it still couldn't run 2000 rounds. What kind of server configuration do you use to run this program? Or do you know of any solution? I would be very grateful!

still cannot reproduce the results

I updated my repository and ran the following command:
python train.py --policy-grad reinforce --env-type ic3net --scenario traffic_junction --num_adversaries 6 --key-units 32 --value-units 32 --query-units 32 --len-traj-update 10 --encoder-model LSTM --max-episode-len 20 --add_rate_min 0.3
and got bad results:
......
steps: 468000, episodes: 23400.0, mean episode success: 0.40984188034188804, time: 718.005
steps: 472000, episodes: 23600.0, mean episode success: 0.4111906779661017, time: 724.915
steps: 476000, episodes: 23800.0, mean episode success: 0.41230672268907564, time: 731.739
steps: 480000, episodes: 24000.0, mean episode success: 0.41337708333333334, time: 738.46
...Finished total of 24010.0 episodes.

I updated the code to TF2, so there might be bugs. Could you please show me your output?

Questions regarding the algorithm

I have a few questions about your training algorithm:

  1. How are shared policy parameters updated? From my understanding, it seems you are updating them once in each agent that uses the shared params. Is this correct?
  2. How many rounds of communication are you performing per environment time step? I see only one round, as opposed to 2 or more rounds in TarMAC.
  3. In the policy optimization step, how do you compute the fresh actions that will be passed in to the q function? It seems that you are first taking the actions from the replay buffer and only replacing the action for the agent being optimized; is this correct? Also, when you compute this action, are you generating new messages in the forward pass through all the agents? My concern here is that there will be a mismatch between the fresh messages and the stale actions of the other agents.

Thanks for your help.

Cannot reproduce the results

Hello, I ran the following command as mentioned in Readme.md:
python train.py --policy-grad reinforce --env-type ic3net --scenario traffic_junction --num-adversaries 6 --key-units 32 --value-units 32 --query-units 32 --len-traj-update 10 --encoder-model LSTM --max-episode-len 20 --add_rate_min 0.3 --num-env 200 --num-episodes 1000000
and after training for a long time, the output results were still bad:
steps: 19752000, episodes: 987600.0, mean episode success: 0.3341001923855812, time: 28891.135
Is there something wrong with my settings?

Query on GPU utilization for training

Hi,
Thank you for releasing the code base of the paper.
Could you provide some details on the computation time and computing setup used to produce the results for the paper?
Does the code implementation support multi-GPU usage?

Thank you.
