Code Monkey home page Code Monkey logo

wargames's Introduction

WarGames

An Analysis of Emergent Properties of Information Processing Systems Operating in Complex Environments

"A strange game. The only winning move is not to play. How about a nice game of chess?" ~ Joshua (WarGames, 1983)

Goal

To understand how the economics of an environment influence the emergence of cooperation and conflict in multi-agent, multi-goal interaction.

Why?

  1. A system is called degenerate when it reaches a state where it's impossible for agents to achieve the optimal outcome. Understanding the conditions in which a system degenerates allows us to optimize the processes that develop human systems.
  2. Understanding the nature of cooperation and conflict through the study of mechanical information processing systems (i.e. computers) yields insights into the beginnings of human society and builds a foundation for analysis of extraterrestrial life.

Problem Formulation

We propose the use of reinforcement learning as a model of behavior for a rational agent operating in a partially observed environment. Note that the environment often includes other agents. We then vary the properties of the environment and agent models and study the resulting interaction through statistical simulation.

Experiments

  1. Shoot Out - Agent A and Agent B live in a nxn grid world with a single overlapping path to a goal state. Agent A and Agent B can either 1) stay still 2) move in any of the 4 cardinal directions or 3) attack in any of the 4 cardinal directions. Only one agent can stay in the goal square. Will Agent A and Agent B learn to attack one another to maximize their individual reward?
  2. Pas de Deux - Agent A and Agent B again live in a nxn grid world. This time a reward is only achieved if both agents occupy squares that are mirror images of each other. Over time, the reward derived from staying in a goal state decreases and a rational agent would move to another square. Will Agent A and Agent B learn to mirror each other's actions to achieve maximal reward?
  3. Prisoner's Dilemma - Agent A and Agent B live in a simple world that simulates the conditions of Prisoner's Dilemma. Will Agent A and Agent B learn to cooperate in a single iteration of Prisoner's Dilemma? What if we increase the number of iterations of Prisoner's Dilemma? What if the number of total iterations of Prisoner's Dilemma is known by the agents?
  4. One Night Werewolf - Repeat experiment 3 but allowing each agent to signal an intent to the other agents. Each agent can choose to either lie or not to lie. How does this change the outcome of iterated Prisoner's Dilemma? What if we add a third Agent C? What if each agent has a predefined level of trustworthiness that is known by them alone and allows them to lie about their intent only a certain fraction of the time? What if some percentage of the time an agent's intended signal is distorted (i.e. the communication mode is lossy as is true of the English language)?
  5. Alien Invasion - Agent A, Agent B, and SuperAgent C live in a nxn grid world with obstacles. The goal state for Agent A and Agent B is to 'capture' SuperAgent C. SuperAgent C has complete knowledge of the environment and the locations of Agent A and B. Will Agent A and B learn to cooperate to capture the more technologically advanced Agent C?

Results

Explore

AgentRandom

AgentRandom

AgentQ

AgentQ

Shoot Out

Pas de Deux

Prisoner's Dilemma

In the figures below, orange represents betrayal and green represents cooperation. The movement of the agents through the grid represents progression through time.

Will Agent A and Agent B learn to cooperate in single iteration Prisoner's Dilemma? PrisonersDilemmaSingleIteration

In single iteration Prisoner's Dilemma, the optimal action is to betray. As illustrated in the figure above, the reinforcement learning agents have recovered the game theoretic optimal behavior.

What if the agents can remember a history of the last 5 iterations of Prisoner's Dilemma? PrisonersDilemmaHistory

What if the agents can remember a history of the last 3 iterations and the number of iterations of Prisoner's Dilemma remaining is known by the agents? PrisonersDilemmaHistoryTimeLeft

One Night Werewolf

Alien Invasion

Thought Experiments

The experiments below are only meant to think about. If reinforcement learning proves to be a reasonable model for a general information processing system that is capable of recovering the game theoretic optimal strategy, maybe someday these thought experiments could be formulated in a testable way.

  1. Neurons & Neocortex - Simulate an artificial neuron with a simple action space: {fire, do nothing}. When a neuron fires, it changes the properties of its local environment which often includes the environment of neighboring neurons. The goal of each neuron is to use an internal representation of the local environment to optimize the system's overall capacity to predict the next input. What patterns in actuation emerge when an entire population of neurons is exposed to spatio-temporally varying input? What if the inputs are completely random? What if the inputs represent an encoding of simple patterns of beeps? What if the inputs represent an encoding of Beethoven's 5th Symphony?
  2. Religion as a Natural Phenomenon - Are religious belief systems an emergent property of information processing systems evolving in the constraints of our physical universe and an Earth-like environment? What belief systems would emerge in Flatland?

Resources

Game Theory of Mind

Game Theory Using Genetic Algorithms

Understanding Agent Cooperation

Homo Economicus

Multi-agent Reinforcement Learning in Sequential Social Dilemmas

Multiagent Cooperation and Competition with Deep Reinforcement Learning

When Machine Learning Meets AI and Game Theory

Deep Learning for Predicting Human Strategic Behavior

A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning

Multi-Agent Cooperation and the Emergence of Natural Language

Multi-Agent Reinforcement Learning Paper Collection

Flow: Architecture and Benchmarking for Reinforcement Learning in Traffic Control

Complex dynamics of elementary cellular automata emerging from chaotic rules

Modeling the Formation of Social Conventions in Multi-Agent Populations

Valuing knowledge, information and agency in Multi-agent Reinforcement Learning: a case study in smart buildings

Ingredients for Robotic Research

Social norm complexity and past reputations in the evolution of cooperation

wargames's People

Contributors

blisschapman avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.