cts198859 / deeprl_network

multi-agent deep reinforcement learning for networked system control.

deeprl_network's Introduction

Networked Multi-agent RL (NMARL)

This repo implements state-of-the-art MARL algorithms for networked system control, where the observability and communication of each agent are limited to its neighborhood. For a fair comparison, all algorithms are applied to A2C agents and classified into two groups: IA2C contains non-communicative policies that use neighborhood information only, whereas MA2C contains communicative policies with certain communication protocols.
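
To make the neighborhood restriction concrete, here is a minimal sketch (not the repo's code) of how each agent's observation could be assembled from its own state and its neighbors' states. The adjacency matrix layout, the state values, and the function name neighborhood_obs are all hypothetical.

```python
from typing import List


def neighborhood_obs(states: List[List[float]],
                     adjacency: List[List[int]]) -> List[List[float]]:
    """Return, for each agent, its own state followed by its neighbors' states.

    adjacency[i][j] == 1 means agent j is a neighbor of agent i
    (hypothetical layout chosen for this illustration).
    """
    observations = []
    for i, own_state in enumerate(states):
        ob = list(own_state)
        for j, is_neighbor in enumerate(adjacency[i]):
            if is_neighbor and j != i:
                ob.extend(states[j])
        observations.append(ob)
    return observations


# Three agents on a line: 0 - 1 - 2
states = [[0.1], [0.2], [0.3]]
adjacency = [[0, 1, 0],
             [1, 0, 1],
             [0, 1, 0]]
obs = neighborhood_obs(states, adjacency)
```

Agent 1 sees both of its neighbors, while agents 0 and 2 each see only agent 1; no agent observes the full global state.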

Available IA2C algorithms:

Available MA2C algorithms:

Available NMARL scenarios:

ATSC Grid: Adaptive traffic signal control in a synthetic traffic grid.
ATSC Monaco: Adaptive traffic signal control in a real-world traffic network from Monaco city.
CACC Catch-up: Cooperative adaptive cruise control for catching up with the leading vehicle.
CACC Slow-down: Cooperative adaptive cruise control for following the leading vehicle to slow down.

Requirements

Python3 == 3.5
Tensorflow == 1.12.0
SUMO >= 1.1.0

Usages

First define all hyperparameters (including algorithm and DNN structure) in a config file under [config_dir] (see the examples), and create the base directory of each experiment [base_dir]. For ATSC Grid, please call build_file.py to generate SUMO network files before training.
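
As an illustration of what such a config file might contain, the sketch below parses an INI-style config with Python's configparser. The section name MODEL_CONFIG and the keys reward_norm, n_fc, and n_lstm are names that appear elsewhere on this page; their placement in one section and all values other than the reward_norm example are assumptions, not the repo's defaults.

```python
import configparser

# Hypothetical config text: the layout and the n_fc / n_lstm values are
# placeholders chosen for this example only.
EXAMPLE_CONFIG = """
[MODEL_CONFIG]
reward_norm = 800
n_fc = 64
n_lstm = 64
"""


def load_model_config(text: str) -> dict:
    """Parse the MODEL_CONFIG section into a dict of typed values."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["MODEL_CONFIG"]
    return {
        "reward_norm": section.getfloat("reward_norm"),
        "n_fc": section.getint("n_fc"),
        "n_lstm": section.getint("n_lstm"),
    }


model_config = load_model_config(EXAMPLE_CONFIG)
```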

To train a new agent, run

python3 main.py --base-dir [base_dir] train --config-dir [config_dir]

Training config/data and the trained model will be output to [base_dir]/data and [base_dir]/model, respectively.

To access tensorboard during training, run

tensorboard --logdir=[base_dir]/log

To evaluate a trained agent, run

python3 main.py --base-dir [base_dir] evaluate --evaluation-seeds [seeds]

Evaluation data will be output to [base_dir]/eva_data. Make sure evaluation seeds are different from those used in training.
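
A quick sanity check along these lines can catch accidental seed reuse before launching evaluation. This is a hedged sketch: the comma-separated seed format, the seed values, and both helper names are assumptions, not the repo's actual argument parsing.

```python
def parse_seeds(seed_arg: str) -> list:
    """Parse a comma-separated seed string such as '2000,2025' (assumed format)."""
    return [int(s) for s in seed_arg.split(",") if s.strip()]


def check_disjoint(train_seeds, eval_seeds) -> bool:
    """Raise ValueError if any evaluation seed was also used for training."""
    overlap = sorted(set(train_seeds) & set(eval_seeds))
    if overlap:
        raise ValueError("evaluation seeds reused from training: %s" % overlap)
    return True


train_seeds = [0, 1, 2]                 # placeholder values
eval_seeds = parse_seeds("2000,2025")   # placeholder values
ok = check_disjoint(train_seeds, eval_seeds)
```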

To visualize the agent behavior in ATSC scenarios, run

python3 main.py --base-dir [base_dir] evaluate --evaluation-seeds [seed] --demo

It is recommended to use only one evaluation seed for the demo run. This will launch the SUMO GUI, and view.xml can be applied to visualize queue length and intersection delay in edge color and thickness.

Reproducibility

The paper results are based on an out-of-date SUMO version 0.32.0. We have re-run the ATSC experiments with SUMO 1.2.0 using the master code, and provided the following training plots as reference. The paper conclusions remain the same.

Training plots (figures omitted): Grid, Monaco.

The PyTorch implementation is also available on the pytorch branch.

Citation

For more implementation details and the underlying reasoning, please check our paper Multi-agent Reinforcement Learning for Networked System Control.

@inproceedings{
chu2020multiagent,
title={Multi-agent Reinforcement Learning for Networked System Control},
author={Tianshu Chu and Sandeep Chinchali and Sachin Katti},
booktitle={International Conference on Learning Representations},
year={2020},
url={https://openreview.net/forum?id=Syx7A3NFvH}
}

deeprl_network's Issues

Can you explain a bit about the Largegrid Env?

Thanks for sharing the code. It really helps me understand the paper and algorithms. However, I can't figure out what the largegrid Env is about. Is it an environment defined by others in a paper? If not, could you explain it a bit? Otherwise, could you point me to the paper?
Thanks again for your help.

Can this framework be run on a custom environment? How?

FingerPrint algorithm

Hello,

For FingerPrint, the original paper, Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning, uses the iteration number and the annealing rate as the fingerprint.

But in your code, I cannot tell which fingerprint you are using, and it seems that the fingerprint is never updated. Could you help me understand this?
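
For reference, the fingerprint idea attributed to that paper can be sketched as appending a low-dimensional summary of the training stage to each observation. The snippet below illustrates that reading only, taking the rate mentioned above to be an exploration rate; it is not taken from this repo, and the function and variable names are hypothetical.

```python
def add_fingerprint(obs, iteration, exploration_rate, max_iteration):
    """Append a (normalized iteration, exploration rate) pair to an observation."""
    fingerprint = [iteration / max_iteration, exploration_rate]
    return list(obs) + fingerprint


obs = [0.5, 0.2]  # made-up observation
augmented = add_fingerprint(obs, iteration=250, exploration_rate=0.1,
                            max_iteration=1000)
```

Because the fingerprint changes as training progresses, it would have to be recomputed each time an observation is built, which is the update step the question asks about.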

Delay time and queue length

Thank you for sharing your code.

However, I could not find the code that produces the queue length and delay time shown in your paper.

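Concrete example of running the demo

I ran the code following the steps, but the very first command is reported as wrong; it keeps complaining about missing arguments.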

CPU or gpu?

Hello, thank you for sharing the code. The code seems to run on the CPU and is very slow. Can I use GPU acceleration? If so, what should I do?

Errors when running in the ATSC net environment

Something is wrong with the "traci" package, and it keeps stopping the training.

2020-09-26 11:16:41,135 [INFO] Training: a dim [6, 4, 2, 2, 2, 4, 2, 4, 2, 5, 2, 2, 4, 2, 2, 4, 2, 3, 6, 3, 2, 4, 4, 4, 4, 4, 6, 3], agent dim: 28
2020-09-26 11:16:41,136 [INFO] Use cpu for pytorch...
2020-09-26 11:16:41,208 [ERROR] Can not find checkpoint for /home/liubo/deeprl_net/ia2c_net_1.0/model/
Loading configuration... done.
Error: Answered with error to command 0xc2: The phase duration must be given as an integer.
Traceback (most recent call last):
File "main.py", line 161, in <module>
train(args)
File "main.py", line 104, in train
trainer.run()
File "/home/liubo/deeprl_network/utils.py", line 218, in run
ob, done, R = self.explore(ob, done)
File "/home/liubo/deeprl_network/utils.py", line 156, in explore
next_ob, reward, done, global_reward = self.env.step(action)
File "/home/liubo/deeprl_network/envs/atsc_env.py", line 182, in step
self._set_phase(action, 'yellow', self.yellow_interval_sec)
File "/home/liubo/deeprl_network/envs/atsc_env.py", line 516, in _set_phase
self.sim.trafficlight.setPhaseDuration(node_name, phase_duration)
File "/home/liubo/virtual-env/py36/lib/python3.6/site-packages/traci/_trafficlight.py", line 283, in setPhaseDuration
tc.CMD_SET_TL_VARIABLE, tc.TL_PHASE_DURATION, tlsID, phaseDuration)
File "/home/liubo/virtual-env/py36/lib/python3.6/site-packages/traci/connection.py", line 141, in _sendDoubleCmd
self._sendExact()
File "/home/liubo/virtual-env/py36/lib/python3.6/site-packages/traci/connection.py", line 109, in _sendExact
raise TraCIException(err, prefix[1], _RESULTS[prefix[2]])
traci.exceptions.TraCIException: The phase duration must be given as an integer.
Error: tcpip::Socket::recvAndCheck @ recv: peer shutdown
Quitting (on error).

continuous action spaces

Thanks for your contribution. Can the algorithms be used only in discrete action spaces, and not in continuous ones? If I want to use them in a continuous action space, how should I modify them?
Thanks!

size mismatch in `config_ia2c_grid` PyTorch implementation

When I use the PyTorch implementation to train with config_ia2c_grid, I encounter this bug

Could anyone give me some feedback? Thanks!

Could you share the requirements.txt file?

Could you tell me which versions of Python and TensorFlow you are using for the code?

I am trying to run the basic CACC setup without SUMO and I am getting the following errors:
Traceback (most recent call last):
File "/network/home/jitanian/thesis/deeprl_network/spatio-temp/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 490, in apply_op
preferred_dtype=default_dtype)
File "/network/home/jitanian/thesis/deeprl_network/spatio-temp/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 741, in internal_convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "/network/home/jitanian/thesis/deeprl_network/spatio-temp/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 614, in _TensorTensorConversionFunction
% (dtype.name, t.dtype.name, str(t)))
ValueError: Tensor conversion requested dtype bool for Tensor with dtype int64: 'Tensor("nc/boolean_mask/Reshape_1:0", shape=(8,), dtype=int64)'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "main.py", line 161, in <module>
train(args)
File "main.py", line 99, in train
model = init_agent(env, config['MODEL_CONFIG'], total_step, seed)
File "main.py", line 63, in init_agent
total_step, config, seed=seed)
File "/network/home/jitanian/thesis/deeprl_network/agents/models.py", line 196, in __init__
total_step, seed, model_config)
File "/network/home/jitanian/thesis/deeprl_network/agents/models.py", line 110, in _init_algo
self.policy = self._init_policy()
File "/network/home/jitanian/thesis/deeprl_network/agents/models.py", line 240, in _init_policy
self.neighbor_mask, n_fc=self.n_fc, n_h=self.n_lstm)
File "/network/home/jitanian/thesis/deeprl_network/agents/policies.py", line 198, in __init__
self._init_policy(n_agent, neighbor_mask, n_h)
File "/network/home/jitanian/thesis/deeprl_network/agents/policies.py", line 329, in _init_policy
self.pi_fw, self.v_fw, self.new_states = self._build_net('forward')
File "/network/home/jitanian/thesis/deeprl_network/agents/policies.py", line 287, in _build_net
h, new_states = lstm_comm(ob, policy, done, self.neighbor_mask, self.states, 'lstm_comm')
File "/network/home/jitanian/thesis/deeprl_network/agents/utils.py", line 192, in lstm_comm
mi = tf.expand_dims(tf.reshape(tf.boolean_mask(out_m, masks[i]), [-1]), axis=0)
File "/network/home/jitanian/thesis/deeprl_network/spatio-temp/lib/python3.5/site-packages/tensorflow/python/ops/array_ops.py", line 1117, in boolean_mask
return _apply_mask_1d(tensor, mask)
File "/network/home/jitanian/thesis/deeprl_network/spatio-temp/lib/python3.5/site-packages/tensorflow/python/ops/array_ops.py", line 1089, in _apply_mask_1d
indices = squeeze(where(mask), squeeze_dims=[1])
File "/network/home/jitanian/thesis/deeprl_network/spatio-temp/lib/python3.5/site-packages/tensorflow/python/ops/array_ops.py", line 2326, in where
return gen_array_ops.where(input=condition, name=name)
File "/network/home/jitanian/thesis/deeprl_network/spatio-temp/lib/python3.5/site-packages/tensorflow/python/ops/gen_array_ops.py", line 3824, in where
result = _op_def_lib.apply_op("Where", input=input, name=name)
File "/network/home/jitanian/thesis/deeprl_network/spatio-temp/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 513, in apply_op
(prefix, dtypes.as_dtype(input_arg.type).name))
TypeError: Input 'input' of 'Where' Op has type int64 that does not match expected type of bool.
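
The final TypeError says that the mask passed to boolean_mask has dtype int64 rather than bool. The NumPy sketch below illustrates why the mask dtype matters and shows an explicit cast. In TensorFlow the analogous cast would be tf.cast(mask, tf.bool), though whether that is the right fix for this repo and TensorFlow version is an assumption. The array contents are made up.

```python
import numpy as np

# One row of messages per agent (made-up values).
messages = np.array([[1.0, 1.5], [2.0, 2.5], [3.0, 3.5]])
# A 0/1 neighbor mask stored as integers, as in the error above.
int_mask = np.array([1, 0, 1], dtype=np.int64)

# With an integer array NumPy performs index lookup (rows 1, 0, 1), not masking.
indexed = messages[int_mask]

# Casting to bool gives the intended selection of rows 0 and 2.
masked = messages[int_mask.astype(bool)]
```

NumPy silently reinterprets the integer mask as indices, whereas the TensorFlow op in the traceback rejects it outright; in both cases the mask needs a boolean dtype to behave as a mask.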

The pytorch implementation cannot use gpu to train

I want to train on a GPU, but the code trains only on the CPU by default. So I changed the parameter use_gpu=True as follows.

But then, I got this error:

So how can I fix this bug? Thank you very much

How is the traffic distribution graph drawn?

Sorry to bother you. Could you tell me how to draw this graph? I haven't found the source code in the repository. Thank you very much.

setup_sumo.h failed on ubuntu 18.04.3 LTS

libtool: link: g++-5 -Wall -Wformat -Woverloaded-virtual -Wshadow -O2 -DNDEBUG -Wuninitialized -ffast-math -fstrict-aliasing -finline-functions -fomit-frame-pointer -fexpensive-optimizations -DHAVE_JPEG_H=1 -DHAVE_PNG_H=1 -DHAVE_TIFF_H=1 -DHAVE_ZLIB_H=1 -DHAVE_BZ2LIB_H=1 -DHAVE_XFT_H=1 -I/usr/include/freetype2 -DHAVE_XSHM_H=1 -DHAVE_XSHAPE_H=1 -DHAVE_XCURSOR_H=1 -DHAVE_XRENDER_H=1 -DHAVE_XRANDR_H=1 -DHAVE_XFIXES_H=1 -DHAVE_XINPUT_H=1 -DNO_XIM -DHAVE_GLU_H=1 -DHAVE_GL_H=1 -o .libs/chart chart.o icons.o ./.libs/libCHART-1.6.so /tmp/fox-20210117-6371-x26beb/fox-1.6.56/src/.libs/libFOX-1.6.so ../src/.libs/libFOX-1.6.so -lX11 -lXext /usr/lib/x86_64-linux-gnu/libfreetype.so -lfontconfig -lXft -lXcursor -lXrender -lXrandr -lXfixes -lXi -lm -ldl -lpthread -lrt -ljpeg -lpng -ltiff -lz -lbz2 -lGLU -lGL -Wl,-rpath -Wl,/home/linuxbrew/.linuxbrew/Cellar/fox/1.6.56_2/lib
/home/linuxbrew/.linuxbrew/bin/ld: /home/linuxbrew/.linuxbrew/lib/libfontconfig.so: undefined reference to `FT_Done_MM_Var'

ubuntu@ubuntu-intel-nuc:~/deeprl_network$ nm -D /home/linuxbrew/.linuxbrew/lib/libfontconfig.so | grep FT_Done_MM_Var
U FT_Done_MM_Var

Why is the reward norm different between models?

For example,
In config_ia2c_catchup.ini, the reward_norm is 800
while in config_ma2c_dial_catchup.ini, the reward_norm is 5000
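
If reward_norm acts as a divisor applied to raw rewards before training (an assumption about the code, not something confirmed in this thread), then different values simply rescale the same raw reward, as the sketch below shows. The raw reward value and the helper name are made up.

```python
def normalize_reward(raw_reward: float, reward_norm: float) -> float:
    """Scale a raw reward by a normalization constant (assumed behavior)."""
    return raw_reward / reward_norm


raw = -4000.0  # made-up raw reward
scaled_ia2c = normalize_reward(raw, 800.0)    # reward_norm in config_ia2c_catchup.ini
scaled_dial = normalize_reward(raw, 5000.0)   # reward_norm in config_ma2c_dial_catchup.ini
```

Under that assumption, the two configs would feed rewards of quite different magnitude to the learner for the same environment outcome, which is why the choice of constant is worth asking about.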

Will manual parallel sampling cause problems with the LSTM design?

Hi, since SUMO is slow and, as far as we know, does not support parallel sampling, we are trying to manually construct several parallel envs during training, each with SUMO as its core, and step them serially. This seems to become an off-policy training process, since samples from several envs are collected. My concern is whether this will disturb the LSTM, since it records the global hidden states of a single env.
If we want to end up with parallel sampling, is asynchronous sampling necessary?
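
One way to keep recurrent states from mixing across manually stepped environments is to store a separate hidden state per environment and reset it when that environment finishes an episode. The sketch below is a toy illustration with a stand-in recurrent update, not the repo's LSTM; every name in it is hypothetical.

```python
class PerEnvStates:
    """Hold one recurrent state vector per environment (toy illustration)."""

    def __init__(self, n_env: int, n_hidden: int):
        self.n_hidden = n_hidden
        self.states = [[0.0] * n_hidden for _ in range(n_env)]

    def step(self, env_id: int, obs_value: float, done: bool):
        """Update only the state of env_id; reset it if the episode ended."""
        if done:
            self.states[env_id] = [0.0] * self.n_hidden
        else:
            # Stand-in for an LSTM update: a simple leaky accumulation.
            self.states[env_id] = [0.5 * h + obs_value
                                   for h in self.states[env_id]]
        return self.states[env_id]


tracker = PerEnvStates(n_env=2, n_hidden=2)
tracker.step(0, 1.0, done=False)   # env 0 -> [1.0, 1.0]
tracker.step(1, 4.0, done=False)   # env 1 -> [4.0, 4.0]
tracker.step(0, 1.0, done=False)   # env 0 -> [1.5, 1.5], unaffected by env 1
tracker.step(1, 0.0, done=True)    # env 1 episode ends -> reset to zeros
```

This only addresses state bookkeeping; whether the collected samples remain on-policy is a separate question that the sketch does not answer.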

size mismatch when running the environment

When I run the given models in ATSC Monaco (without changing any code), the following error appears:
'RuntimeError: size mismatch, m1: [1 x 56], m2: [48 x 64] at /opt/conda/conda-bld/pytorch_1579022071601/work/aten/src/TH/generic/THTensorMath.cpp:136'

the complete error is:
'
Traceback (most recent call last):
File "main.py", line 161, in <module>
train(args)
File "main.py", line 104, in train
trainer.run()
File "/home/ziqi/deeprl_network/utils.py", line 218, in run
ob, done, R = self.explore(ob, done)
File "/home/ziqi/deeprl_network/utils.py", line 151, in explore
policy, action = self._get_policy(ob, done)
File "/home/ziqi/deeprl_network/utils.py", line 115, in _get_policy
policy = self.model.forward(ob, done, self.ps)
File "/home/ziqi/deeprl_network/agents/models.py", line 271, in forward
actions, out_type)
File "/home/ziqi/deeprl_network/agents/policies.py", line 221, in forward
h, new_states = self._run_comm_layers(ob, done, fp, self.states_fw)
File "/home/ziqi/deeprl_network/agents/policies.py", line 341, in _run_comm_layers
s_i = self._get_comm_s(i, n_n, x, h, p)
File "/home/ziqi/deeprl_network/agents/policies.py", line 486, in _get_comm_s
return F.relu(self.fc_x_layers[i](torch.cat([x[i].unsqueeze(0), nx_i], dim=1))) +
File "/root/anaconda3/envs/py35pt/lib/python3.5/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/root/anaconda3/envs/py35pt/lib/python3.5/site-packages/torch/nn/modules/linear.py", line 87, in forward
return F.linear(input, self.weight, self.bias)
File "/root/anaconda3/envs/py35pt/lib/python3.5/site-packages/torch/nn/functional.py", line 1370, in linear
ret = torch.addmm(bias, input, weight.t())
'
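
The error message reports an input of shape [1 x 56] being multiplied by a weight of shape [48 x 64]. The inner dimensions (56 and 48) disagree, which suggests the layer was built for a different input size than it received. A small, framework-free check like the following can make such mismatches explicit; the helper name is hypothetical.

```python
def check_matmul_shapes(input_shape, weight_shape):
    """Return the output shape of input @ weight, or raise on a mismatch."""
    rows, inner_in = input_shape
    inner_w, cols = weight_shape
    if inner_in != inner_w:
        raise ValueError(
            "size mismatch: input has %d features but the layer expects %d"
            % (inner_in, inner_w)
        )
    return (rows, cols)


try:
    check_matmul_shapes((1, 56), (48, 64))   # shapes taken from the error above
    message = ""
except ValueError as err:
    message = str(err)

ok_shape = check_matmul_shapes((1, 48), (48, 64))
```

Comparing the observation and neighbor dimensions used to build the layer against the tensors actually concatenated at run time would be a natural place to start looking.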