
deeprl_network's Introduction

Networked Multi-agent RL (NMARL)

This repo implements the state-of-the-art MARL algorithms for networked system control, with observability and communication of each agent limited to its neighborhood. For fair comparison, all algorithms are applied to A2C agents, classified into two groups: IA2C contains non-communicative policies which utilize neighborhood information only, whereas MA2C contains communicative policies with certain communication protocols.

Available IA2C algorithms:

Available MA2C algorithms:

Available NMARL scenarios:

  • ATSC Grid: Adaptive traffic signal control in a synthetic traffic grid.
  • ATSC Monaco: Adaptive traffic signal control in a real-world traffic network from Monaco city.
  • CACC Catch-up: Cooperative adaptive cruise control for catching up with the leading vehicle.
  • CACC Slow-down: Cooperative adaptive cruise control for following the leading vehicle to slow down.

Requirements

Usages

First define all hyperparameters (including algorithm and DNN structure) in a config file under [config_dir] (examples), and create the base directory of each experiment [base_dir]. For ATSC Grid, please call build_file.py to generate SUMO network files before training.
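As a rough illustration of the config step, the sketch below parses an INI-style config with Python's standard configparser. The section name MODEL_CONFIG is taken from main.py as quoted in a traceback further down this page; the INI format itself and the keys shown (algo, n_fc, n_lstm) are assumptions for illustration, not the repository's confirmed schema.

```python
import configparser

# Hypothetical config text; keys and values are illustrative only.
EXAMPLE_CONFIG = """
[MODEL_CONFIG]
algo = ia2c
n_fc = 64
n_lstm = 64
"""


def load_model_config(text):
    """Parse INI-style text and return the MODEL_CONFIG section as a dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["MODEL_CONFIG"]
    return {
        "algo": section.get("algo"),
        "n_fc": section.getint("n_fc"),      # converted from string to int
        "n_lstm": section.getint("n_lstm"),  # converted from string to int
    }


model_config = load_model_config(EXAMPLE_CONFIG)
print(model_config)
```

Keeping the parsing in one small function makes it easy to validate a config file before starting a long training run.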

  1. To train a new agent, run
python3 main.py --base-dir [base_dir] train --config-dir [config_dir]

Training config/data and the trained model will be output to [base_dir]/data and [base_dir]/model, respectively.

  2. To access tensorboard during training, run
tensorboard --logdir=[base_dir]/log
  3. To evaluate a trained agent, run
python3 main.py --base-dir [base_dir] evaluate --evaluation-seeds [seeds]

Evaluation data will be output to [base_dir]/eva_data. Make sure evaluation seeds are different from those used in training.

  4. To visualize the agent behavior in ATSC scenarios, run
python3 main.py --base-dir [base_dir] evaluate --evaluation-seeds [seed] --demo

It is recommended to use only one evaluation seed for the demo run. This will launch the SUMO GUI, and view.xml can be applied to visualize queue length and intersection delay in edge color and thickness.
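The commands above can also be assembled from a script, for example when launching several experiments. The sketch below only builds the argument lists and checks that evaluation seeds do not overlap with training seeds; it does not run anything. The function names are hypothetical, and the seeds argument is passed through verbatim because the expected format of [seeds] is not specified here.

```python
import shlex


def build_train_cmd(base_dir, config_dir):
    """Argument list for the training command shown above."""
    return ["python3", "main.py", "--base-dir", base_dir,
            "train", "--config-dir", config_dir]


def build_evaluate_cmd(base_dir, seeds_arg, demo=False):
    """Argument list for the evaluation command; seeds_arg is passed as-is."""
    cmd = ["python3", "main.py", "--base-dir", base_dir,
           "evaluate", "--evaluation-seeds", seeds_arg]
    if demo:
        cmd.append("--demo")  # launches the SUMO GUI in ATSC scenarios
    return cmd


def check_seeds_disjoint(train_seeds, eval_seeds):
    """Raise if any evaluation seed was also used for training."""
    overlap = set(train_seeds) & set(eval_seeds)
    if overlap:
        raise ValueError("seeds used in both training and evaluation: %s"
                         % sorted(overlap))


train_cmd = build_train_cmd("exp1", "config/my_config.ini")  # hypothetical paths
print(" ".join(shlex.quote(part) for part in train_cmd))
```

The resulting lists can be handed to subprocess.run once the paths and seed values are known.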

Reproducibility

The paper results are based on an out-of-date SUMO version 0.32.0. We have re-run the ATSC experiments with SUMO 1.2.0 using the master code, and provided the following training plots as reference. The paper conclusions remain the same.

[Training plots: Grid, Monaco]

The PyTorch implementation is also available at branch pytorch.

Citation

For more implementation details and the underlying reasoning, please check our paper Multi-agent Reinforcement Learning for Networked System Control.

@inproceedings{
chu2020multiagent,
title={Multi-agent Reinforcement Learning for Networked System Control},
author={Tianshu Chu and Sandeep Chinchali and Sachin Katti},
booktitle={International Conference on Learning Representations},
year={2020},
url={https://openreview.net/forum?id=Syx7A3NFvH}
}

deeprl_network's People

Contributors

cts198859


deeprl_network's Issues

Can you explain a bit about the Largegrid Env?

Thanks for sharing the code. It really helps me understand the paper and the algorithms. However, I can't figure out what the Largegrid env is about. Is this an environment defined by others in a paper? If not, could you do me a favor and explain it a bit? Otherwise, could you point me to the paper?
Thanks again for your help.

FingerPrint algorithm

Hello,

For FingerPrint, the original paper, Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning, uses the iteration number and the annealing rate as the fingerprint.

But in your code, I cannot tell which fingerprint you are using, and it seems you don't update the fingerprint. Could you help me understand that?

Delay time and queue length

Thank you for sharing your code.

However, I did not see the code that produces the queue length and delay time shown in your paper.


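A concrete example of running the demo

I ran the code following the steps, but the very first command is already reported as wrong; it keeps saying that arguments are missing.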

CPU or GPU?

Hello, thank you for sharing the code. The code seems to be running on the CPU and is very slow. Can I use GPU acceleration? If so, what should I do?

Errors pop up when running in the ATSC net environment

Something is wrong with the "traci" package, and it stops the training every time.

2020-09-26 11:16:41,135 [INFO] Training: a dim [6, 4, 2, 2, 2, 4, 2, 4, 2, 5, 2, 2, 4, 2, 2, 4, 2, 3, 6, 3, 2, 4, 4, 4, 4, 4, 6, 3], agent dim: 28
2020-09-26 11:16:41,136 [INFO] Use cpu for pytorch...
2020-09-26 11:16:41,208 [ERROR] Can not find checkpoint for /home/liubo/deeprl_net/ia2c_net_1.0/model/
Loading configuration... done.
Error: Answered with error to command 0xc2: The phase duration must be given as an integer.
Traceback (most recent call last):
File "main.py", line 161, in <module>
train(args)
File "main.py", line 104, in train
trainer.run()
File "/home/liubo/deeprl_network/utils.py", line 218, in run
ob, done, R = self.explore(ob, done)
File "/home/liubo/deeprl_network/utils.py", line 156, in explore
next_ob, reward, done, global_reward = self.env.step(action)
File "/home/liubo/deeprl_network/envs/atsc_env.py", line 182, in step
self._set_phase(action, 'yellow', self.yellow_interval_sec)
File "/home/liubo/deeprl_network/envs/atsc_env.py", line 516, in _set_phase
self.sim.trafficlight.setPhaseDuration(node_name, phase_duration)
File "/home/liubo/virtual-env/py36/lib/python3.6/site-packages/traci/_trafficlight.py", line 283, in setPhaseDuration
tc.CMD_SET_TL_VARIABLE, tc.TL_PHASE_DURATION, tlsID, phaseDuration)
File "/home/liubo/virtual-env/py36/lib/python3.6/site-packages/traci/connection.py", line 141, in _sendDoubleCmd
self._sendExact()
File "/home/liubo/virtual-env/py36/lib/python3.6/site-packages/traci/connection.py", line 109, in _sendExact
raise TraCIException(err, prefix[1], _RESULTS[prefix[2]])
traci.exceptions.TraCIException: The phase duration must be given as an integer.
Error: tcpip::Socket::recvAndCheck @ recv: peer shutdown
Quitting (on error).
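
The error message indicates that the SUMO build in use rejects a non-integer phase duration. One possible workaround, sketched under the assumption that the value reaching setPhaseDuration is a float holding a whole number (such as 2.0), is to convert it to an int before the call. The helper name is hypothetical, and this is not confirmed as the maintainers' fix.

```python
def to_whole_seconds(duration):
    """Return duration as an int, rejecting values that are not whole numbers."""
    as_int = int(round(duration))
    if abs(duration - as_int) > 1e-9:
        raise ValueError("phase duration %r is not a whole number of seconds"
                         % (duration,))
    return as_int


# Intended use inside _set_phase (left as a comment because it needs a live
# TraCI connection):
# self.sim.trafficlight.setPhaseDuration(node_name, to_whole_seconds(phase_duration))
print(to_whole_seconds(2.0))
```

Rejecting fractional values instead of silently truncating them keeps an unintended change of signal timing from going unnoticed.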

continuous action spaces

Thanks for your contribution. Can the algorithms be used only in discrete action spaces, and not in continuous action spaces? If I want to use these algorithms in a continuous action space, how should I modify them?
Thanks!

Could you share the requirements.txt file?

Could you tell me which versions of Python and TensorFlow you are using for the code?

I am trying to run the basic CACC setup without SUMO and I am getting the following errors:
Traceback (most recent call last):
File "/network/home/jitanian/thesis/deeprl_network/spatio-temp/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 490, in apply_op
preferred_dtype=default_dtype)
File "/network/home/jitanian/thesis/deeprl_network/spatio-temp/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 741, in internal_convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "/network/home/jitanian/thesis/deeprl_network/spatio-temp/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 614, in _TensorTensorConversionFunction
% (dtype.name, t.dtype.name, str(t)))
ValueError: Tensor conversion requested dtype bool for Tensor with dtype int64: 'Tensor("nc/boolean_mask/Reshape_1:0", shape=(8,), dtype=int64)'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "main.py", line 161, in <module>
train(args)
File "main.py", line 99, in train
model = init_agent(env, config['MODEL_CONFIG'], total_step, seed)
File "main.py", line 63, in init_agent
total_step, config, seed=seed)
File "/network/home/jitanian/thesis/deeprl_network/agents/models.py", line 196, in __init__
total_step, seed, model_config)
File "/network/home/jitanian/thesis/deeprl_network/agents/models.py", line 110, in _init_algo
self.policy = self._init_policy()
File "/network/home/jitanian/thesis/deeprl_network/agents/models.py", line 240, in _init_policy
self.neighbor_mask, n_fc=self.n_fc, n_h=self.n_lstm)
File "/network/home/jitanian/thesis/deeprl_network/agents/policies.py", line 198, in __init__
self._init_policy(n_agent, neighbor_mask, n_h)
File "/network/home/jitanian/thesis/deeprl_network/agents/policies.py", line 329, in _init_policy
self.pi_fw, self.v_fw, self.new_states = self._build_net('forward')
File "/network/home/jitanian/thesis/deeprl_network/agents/policies.py", line 287, in _build_net
h, new_states = lstm_comm(ob, policy, done, self.neighbor_mask, self.states, 'lstm_comm')
File "/network/home/jitanian/thesis/deeprl_network/agents/utils.py", line 192, in lstm_comm
mi = tf.expand_dims(tf.reshape(tf.boolean_mask(out_m, masks[i]), [-1]), axis=0)
File "/network/home/jitanian/thesis/deeprl_network/spatio-temp/lib/python3.5/site-packages/tensorflow/python/ops/array_ops.py", line 1117, in boolean_mask
return _apply_mask_1d(tensor, mask)
File "/network/home/jitanian/thesis/deeprl_network/spatio-temp/lib/python3.5/site-packages/tensorflow/python/ops/array_ops.py", line 1089, in _apply_mask_1d
indices = squeeze(where(mask), squeeze_dims=[1])
File "/network/home/jitanian/thesis/deeprl_network/spatio-temp/lib/python3.5/site-packages/tensorflow/python/ops/array_ops.py", line 2326, in where
return gen_array_ops.where(input=condition, name=name)
File "/network/home/jitanian/thesis/deeprl_network/spatio-temp/lib/python3.5/site-packages/tensorflow/python/ops/gen_array_ops.py", line 3824, in where
result = _op_def_lib.apply_op("Where", input=input, name=name)
File "/network/home/jitanian/thesis/deeprl_network/spatio-temp/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 513, in apply_op
(prefix, dtypes.as_dtype(input_arg.type).name))
TypeError: Input 'input' of 'Where' Op has type int64 that does not match expected type of bool.
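
The final TypeError says that the mask reaching the Where op has type int64, while a bool is expected. A common remedy for this class of error is to cast a 0/1 mask to boolean before masking; in TensorFlow that could be written with tf.cast(mask, tf.bool), though whether this is the right fix for this code and TensorFlow version is an assumption. The framework-free sketch below illustrates the idea with plain lists; both function names are hypothetical.

```python
def to_bool_mask(mask):
    """Convert a 0/1 integer mask to a list of booleans."""
    for m in mask:
        if m not in (0, 1):
            raise ValueError("mask entries must be 0 or 1, got %r" % (m,))
    return [bool(m) for m in mask]


def boolean_mask(values, mask):
    """Keep values[i] where mask[i] is True; the mask must be strictly boolean."""
    if len(values) != len(mask):
        raise ValueError("values and mask must have the same length")
    if not all(isinstance(m, bool) for m in mask):
        raise TypeError("mask must contain bool entries, not integers")
    return [v for v, keep in zip(values, mask) if keep]


neighbor_mask = [1, 0, 1]  # integer mask, as in the reported error
print(boolean_mask([10, 20, 30], to_bool_mask(neighbor_mask)))
```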

The pytorch implementation cannot use gpu to train

I want to train on a GPU, but the code trains on the CPU by default, so I set the parameter use_gpu=True (screenshot not included).
After that, I got an error (screenshot not included).
How can I fix this bug? Thank you very much.

setup_sumo.h failed on ubuntu 18.04.3 LTS

libtool: link: g++-5 -Wall -Wformat -Woverloaded-virtual -Wshadow -O2 -DNDEBUG -Wuninitialized -ffast-math -fstrict-aliasing -finline-functions -fomit-frame-pointer -fexpensive-optimizations -DHAVE_JPEG_H=1 -DHAVE_PNG_H=1 -DHAVE_TIFF_H=1 -DHAVE_ZLIB_H=1 -DHAVE_BZ2LIB_H=1 -DHAVE_XFT_H=1 -I/usr/include/freetype2 -DHAVE_XSHM_H=1 -DHAVE_XSHAPE_H=1 -DHAVE_XCURSOR_H=1 -DHAVE_XRENDER_H=1 -DHAVE_XRANDR_H=1 -DHAVE_XFIXES_H=1 -DHAVE_XINPUT_H=1 -DNO_XIM -DHAVE_GLU_H=1 -DHAVE_GL_H=1 -o .libs/chart chart.o icons.o ./.libs/libCHART-1.6.so /tmp/fox-20210117-6371-x26beb/fox-1.6.56/src/.libs/libFOX-1.6.so ../src/.libs/libFOX-1.6.so -lX11 -lXext /usr/lib/x86_64-linux-gnu/libfreetype.so -lfontconfig -lXft -lXcursor -lXrender -lXrandr -lXfixes -lXi -lm -ldl -lpthread -lrt -ljpeg -lpng -ltiff -lz -lbz2 -lGLU -lGL -Wl,-rpath -Wl,/home/linuxbrew/.linuxbrew/Cellar/fox/1.6.56_2/lib
/home/linuxbrew/.linuxbrew/bin/ld: /home/linuxbrew/.linuxbrew/lib/libfontconfig.so: undefined reference to `FT_Done_MM_Var'

ubuntu@ubuntu-intel-nuc:~/deeprl_network$ nm -D /home/linuxbrew/.linuxbrew/lib/libfontconfig.so | grep FT_Done_MM_Var
U FT_Done_MM_Var

Whether manual parallel sampling will cause problem with LSTM design

Hi, since SUMO is slow and, as far as we know, does not support parallel sampling, we are trying to manually construct several parallel envs during training, each with SUMO as its core, and step them serially. It seems that this becomes an off-policy training process, since samples from several envs are collected. My concern is whether this will disturb the LSTM, since it records the global hidden states of a single env.
If we want to end up with parallel sampling, is asynchronous sampling necessary?
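
One way to read the concern above: if several environments are stepped in turn while sharing a single recurrent state, each environment's history leaks into the others. A minimal sketch, assuming each environment can be given its own hidden state that is reset when its episode ends, is shown below. The class name, the toy update rule standing in for an LSTM cell, and the reset policy are all hypothetical and do not describe the repository's design.

```python
class PerEnvStates:
    """Keeps one recurrent state per environment, reset at episode end."""

    def __init__(self, n_envs, state_size):
        self.state_size = state_size
        self.states = [[0.0] * state_size for _ in range(n_envs)]

    def get(self, env_id):
        return self.states[env_id]

    def update(self, env_id, new_state, done):
        # On episode end, start the next episode from a zero state.
        if done:
            self.states[env_id] = [0.0] * self.state_size
        else:
            self.states[env_id] = list(new_state)


def toy_recurrent_step(state, ob):
    """Stand-in for an LSTM cell: decay the old state and add the observation."""
    return [0.5 * s + ob for s in state]


store = PerEnvStates(n_envs=2, state_size=3)
# Two serial sweeps over two envs; env 1 finishes its episode in the second sweep.
sweeps = [((1.0, 10.0), (False, False)), ((1.0, 10.0), (False, True))]
for observations, dones in sweeps:
    for env_id, (ob, done) in enumerate(zip(observations, dones)):
        new_state = toy_recurrent_step(store.get(env_id), ob)
        store.update(env_id, new_state, done)

print(store.get(0), store.get(1))
```

This only addresses state bookkeeping; whether the collected samples remain on-policy depends on how and when the policy is updated.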

size mismatch when running environment

When I run the given models in ATSC Monaco (without changing any code), the following error appears:
'RuntimeError: size mismatch, m1: [1 x 56], m2: [48 x 64] at /opt/conda/conda-bld/pytorch_1579022071601/work/aten/src/TH/generic/THTensorMath.cpp:136'

the complete error is:
'
Traceback (most recent call last):
File "main.py", line 161, in <module>
train(args)
File "main.py", line 104, in train
trainer.run()
File "/home/ziqi/deeprl_network/utils.py", line 218, in run
ob, done, R = self.explore(ob, done)
File "/home/ziqi/deeprl_network/utils.py", line 151, in explore
policy, action = self._get_policy(ob, done)
File "/home/ziqi/deeprl_network/utils.py", line 115, in _get_policy
policy = self.model.forward(ob, done, self.ps)
File "/home/ziqi/deeprl_network/agents/models.py", line 271, in forward
actions, out_type)
File "/home/ziqi/deeprl_network/agents/policies.py", line 221, in forward
h, new_states = self._run_comm_layers(ob, done, fp, self.states_fw)
File "/home/ziqi/deeprl_network/agents/policies.py", line 341, in _run_comm_layers
s_i = self._get_comm_s(i, n_n, x, h, p)
File "/home/ziqi/deeprl_network/agents/policies.py", line 486, in _get_comm_s
return F.relu(self.fc_x_layers[i](torch.cat([x[i].unsqueeze(0), nx_i], dim=1))) +
File "/root/anaconda3/envs/py35pt/lib/python3.5/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/root/anaconda3/envs/py35pt/lib/python3.5/site-packages/torch/nn/modules/linear.py", line 87, in forward
return F.linear(input, self.weight, self.bias)
File "/root/anaconda3/envs/py35pt/lib/python3.5/site-packages/torch/nn/functional.py", line 1370, in linear
ret = torch.addmm(bias, input, weight.t())
'
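
The message reports multiplying a 1 x 56 input by a 48 x 64 matrix, which suggests that a layer built for 48 input features received 56. To locate such a mismatch earlier, a shape check can be placed before the multiplication. The sketch below is framework-free and the function name is hypothetical; it only illustrates the shape rule and does not diagnose why the observation size differs.

```python
def check_matmul_shapes(input_shape, weight_t_shape):
    """Return the output shape of input @ weight_t, or raise a descriptive error."""
    rows, in_features = input_shape
    expected_in, out_features = weight_t_shape
    if in_features != expected_in:
        raise ValueError(
            "input has %d features but the layer expects %d (difference: %d)"
            % (in_features, expected_in, in_features - expected_in))
    return (rows, out_features)


print(check_matmul_shapes((1, 48), (48, 64)))
try:
    check_matmul_shapes((1, 56), (48, 64))  # shapes from the reported error
except ValueError as err:
    print(err)
```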
