nkrgit / fleet-scheduling-using-maddpg-multi-agent-rl

A Multi-Agent DDPG (MADDPG) implementation for solving a vehicle scheduling problem.

Jupyter Notebook 100.00%

fleet-management grid-world keras-tensorflow maddpg openai-gym-environments q-learning

Fleet-Scheduling-using-MADDPG-Multi-Agent-RL

Goal:

To develop a Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm and apply it to two multi-agent environments: a custom Vehicle Scheduling environment and Simple Adversary from the OpenAI Multi-Agent Particle Environment.

Multi-Agent Environment

Two cars in a 4x4 Grid-world environment
- 1st car: goal is to reach the top-right cell of the grid
- 2nd car: goal is to reach the top-left cell of the grid
- State space: 16 states: {s0, s1, s2,...s15}
- Action space: {0: down, 1: up, 2: right, 3: left, 4: no move}
- Reward structure
  - Moves towards the target: 1
  - Moves away from the target: -3
  - Stays in the same position: -5
  - Reaches the target: 100
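
The reward rules above can be sketched as a single step function for one car. This is only an illustration, not the notebook's code: the row-major numbering with s0 in the top-left corner, the Manhattan distance, the clamping at the walls, and the names `GRID`, `MOVES`, `TARGETS`, `manhattan` and `step` are all assumptions.

```python
# Hypothetical sketch of the per-car reward rule in the 4x4 grid.
GRID = 4

# Action ids follow the README: 0 down, 1 up, 2 right, 3 left, 4 no move.
MOVES = {0: (1, 0), 1: (-1, 0), 2: (0, 1), 3: (0, -1), 4: (0, 0)}

# Assumed row-major numbering with s0 in the top-left corner,
# so the top-right cell is s3 and the top-left cell is s0.
TARGETS = {"car1": 3, "car2": 0}


def manhattan(a, b):
    """Manhattan distance between two cell indices (assumed metric)."""
    (row_a, col_a), (row_b, col_b) = divmod(a, GRID), divmod(b, GRID)
    return abs(row_a - row_b) + abs(col_a - col_b)


def step(state, action, target):
    """Return (next_state, reward) for one car."""
    row, col = divmod(state, GRID)
    d_row, d_col = MOVES[action]
    # Assumption: a move into a wall leaves the car where it is.
    new_row = min(max(row + d_row, 0), GRID - 1)
    new_col = min(max(col + d_col, 0), GRID - 1)
    new_state = new_row * GRID + new_col

    if new_state == state:
        return new_state, -5   # stayed in the same position
    if new_state == target:
        return new_state, 100  # reached the target
    if manhattan(new_state, target) < manhattan(state, target):
        return new_state, 1    # moved towards the target
    return new_state, -3       # moved away from the target
```

For example, `step(2, 2, TARGETS["car1"])` moves the first car right from s2 into its target cell s3. How the two cars interact (for instance, collisions) is not described in this README and is left out of the sketch.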

Simple Adversary - OpenAI Multi Agent particle environment

3 agents – 1 adversary and 2 good agents (Physical deception)
Environment – 2 landmarks (Green – target landmark, Black – dummy landmark)
Rewards:
- For agents:
  - Positive reward: based on the distance between the target landmark and the good agent closest to it
  - Negative reward: based on the distance between the adversary and the target landmark
- For adversary:
  - Positive reward: based on the distance between the adversary and the target landmark
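
A minimal sketch of how these distance-based rewards could be computed. The Euclidean distance, the exact signs, and the function names `distance`, `adversary_reward` and `good_agent_reward` are assumptions; the README only states which distances each reward depends on.

```python
import numpy as np


def distance(a, b):
    """Euclidean distance between two 2-D positions (assumed metric)."""
    return float(np.linalg.norm(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)))


def adversary_reward(adversary_pos, target_pos):
    """Assumed shaping: the closer the adversary is to the target, the higher its reward."""
    return -distance(adversary_pos, target_pos)


def good_agent_reward(good_positions, adversary_pos, target_pos):
    """Assumed shaping for the shared reward of the good agents.

    It rises as the closest good agent nears the target and falls
    as the adversary nears the target.
    """
    closest_good = min(distance(p, target_pos) for p in good_positions)
    return -closest_good + distance(adversary_pos, target_pos)
```

With one good agent sitting on the target and the adversary 5 units away, `good_agent_reward` gives 5.0 and `adversary_reward` gives -5.0 under these assumptions.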

Implementation:

Implemented Q-learning and MADDPG on both the Vehicle Scheduling and Simple Adversary environments.
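
For reference, a tabular Q-learning update for a single car might look like the sketch below. It assumes one 16 x 5 table per car (16 states, 5 actions); the function name `q_learning_update` and the values of `alpha` and `gamma` are illustrative and not taken from the notebook.

```python
import numpy as np


def q_learning_update(q_table, state, action, reward, next_state, done,
                      alpha=0.1, gamma=0.9):
    """Apply one Q-learning update in place and return the table.

    alpha (learning rate) and gamma (discount factor) are hypothetical defaults.
    """
    # No bootstrapping from the next state once the episode has ended.
    best_next = 0.0 if done else float(np.max(q_table[next_state]))
    td_target = reward + gamma * best_next
    q_table[state, action] += alpha * (td_target - q_table[state, action])
    return q_table


# One table per car: 16 grid states x 5 actions.
q_table = np.zeros((16, 5))
q_learning_update(q_table, state=2, action=2, reward=100, next_state=3, done=True)
```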

MADDPG:

Every Agent has
- Actor Network:
  - Inputs: States, Actions
  - Outputs: Probabilities
- Critic Network:
  - Inputs: States, Actions
  - Outputs: Q values
To avoid chasing a moving target during training, each agent also keeps target networks whose weights change only slowly:
- Target Actor Network (updated with soft updates)
- Target Critic Network (updated with soft updates)
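
A soft update blends a small fraction of the online weights into the target weights at each step. The sketch below shows the idea on plain NumPy arrays; the function name `soft_update` and the value of `tau` are assumptions, since the README does not state the update rate used.

```python
import numpy as np


def soft_update(target_weights, online_weights, tau=0.01):
    """Return new target weights: tau * online + (1 - tau) * target, per array.

    tau is a hypothetical default; small values keep the target nearly frozen.
    """
    return [tau * w + (1.0 - tau) * t
            for t, w in zip(target_weights, online_weights)]


target = [np.zeros(2)]
online = [np.ones(2)]
target = soft_update(target, online, tau=0.1)  # target moves 10% of the way
```

If the networks are Keras models, the two lists would typically come from `model.get_weights()`, with the result written back through `target_model.set_weights(...)`. Note that in the standard MADDPG formulation the actor acts on its own agent's observation, while the critic sees the states and actions of all agents.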

Improved Version of MADDPG

I developed an improved version of MADDPG in which an ε-greedy step is applied on top of the noise that is already added to the actions chosen by the deterministic policy.
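
One way to read this combination, for a discrete action set, is sketched below. It is an interpretation rather than the notebook's code: the Gaussian noise, the order of the two steps, and the names `select_action`, `epsilon` and `noise_scale` are all assumptions.

```python
import numpy as np


def select_action(actor_output, rng, epsilon=0.1, noise_scale=0.1):
    """Pick a discrete action from noisy actor scores, with an extra ε-greedy step.

    actor_output: one score (or probability) per action.
    rng: a numpy.random.Generator.
    """
    scores = np.asarray(actor_output, dtype=float)
    # Step 1: perturb the deterministic policy output with exploration noise.
    noisy_scores = scores + rng.normal(0.0, noise_scale, size=scores.shape)
    # Step 2: with probability epsilon, ignore the policy and act uniformly at random.
    if rng.random() < epsilon:
        return int(rng.integers(len(scores)))
    return int(np.argmax(noisy_scores))


rng = np.random.default_rng(0)
action = select_action([0.1, 0.6, 0.1, 0.1, 0.1], rng)
```

The ε-greedy step guarantees some uniformly random actions even when the noise is too small to change which action has the highest score.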

Observations:

Q-learning does not work well on the Vehicle Scheduling environment.
MADDPG performs better than Q-learning.
Care is needed when implementing MADDPG, since the critic network can over-estimate Q-values.
MADDPG works well in an environment with continuous state and action spaces (i.e., Simple Adversary).

fleet-scheduling-using-maddpg-multi-agent-rl's Issues

NameError: name 'RandomAgent' is not defined

Hi nkrgit,

  I am very sorry to disturb you. I ran the code unsuccessfully; here are some details. Could you help me?
  Thanks in advance.

Sincerely,
Xianfeng Ye

$ py Fleet_Scheduling_Report_MADDPG.py
/home/srv/anaconda/anaconda3/envs/rllib/lib/python3.8/site-packages/google/colab/data_table.py:30: UserWarning: IPython.utils.traitlets has moved to a top-level traitlets package.
  from IPython.utils import traitlets as _traitlets
WARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.
Traceback (most recent call last):
  File "Fleet_Scheduling_Report_MADDPG.py", line 415, in <module>
    q_learning()
  File "Fleet_Scheduling_Report_MADDPG.py", line 213, in q_learning
    agent = RandomAgent(env)#creating a random agent to explore the given environments
NameError: name 'RandomAgent' is not defined
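
The traceback shows that `RandomAgent` is used in `q_learning()` without being defined anywhere the script can see. One plausible cause, not confirmed by the repository, is that the class was defined in a notebook cell that did not make it into the exported `.py` file. As a stopgap, a stand-in like the one below could be defined before `q_learning()` is called. The class body and the method name `act` are hypothetical; the original class may expose a different interface. It assumes only that `env` has a Gym-style `action_space` with a `sample()` method.

```python
class RandomAgent:
    """Hypothetical stand-in: picks uniformly random actions from a Gym-style env."""

    def __init__(self, env):
        # Gym-style environments expose an action_space with a sample() method.
        self.action_space = env.action_space

    def act(self, observation=None):
        """Ignore the observation and return a random valid action."""
        return self.action_space.sample()
```

If the rest of the script calls a differently named method on the agent, the stand-in would need to be adapted to match.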
