WarGames

An Analysis of Emergent Properties of Information Processing Systems Operating in Complex Environments

"A strange game. The only winning move is not to play. How about a nice game of chess?" ~ Joshua (WarGames, 1983)

Goal

To understand how the economics of an environment influence the emergence of cooperation and conflict in multi-agent, multi-goal interaction.

Why?

We call a system degenerate when it reaches a state from which the agents can no longer achieve the optimal outcome. Understanding the conditions under which a system degenerates allows us to improve the processes by which human systems develop.
Understanding the nature of cooperation and conflict through the study of mechanical information processing systems (i.e. computers) yields insights into the beginnings of human society and builds a foundation for analysis of extraterrestrial life.

Problem Formulation

We propose reinforcement learning as a model of the behavior of a rational agent operating in a partially observed environment, which often includes other agents. We then vary the properties of the environment and of the agent models and study the resulting interactions through statistical simulation.
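As a rough illustration of the agent model, here is a minimal sketch of a tabular Q-learning agent with epsilon-greedy exploration. The class name `TabularQAgent`, its method names, and the default hyperparameters are assumptions made for this example; they are not taken from the project's code.

```python
import random
from collections import defaultdict


class TabularQAgent:
    """Hypothetical tabular Q-learning agent (illustrative names and defaults)."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1, seed=None):
        self.actions = list(actions)
        self.alpha = alpha        # learning rate
        self.gamma = gamma        # discount factor
        self.epsilon = epsilon    # probability of taking a random action
        self.rng = random.Random(seed)
        self.q = defaultdict(float)  # (observation, action) -> estimated value

    def act(self, obs):
        """Epsilon-greedy choice; ties between best actions are broken at random."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        best = max(self.q[(obs, a)] for a in self.actions)
        return self.rng.choice([a for a in self.actions if self.q[(obs, a)] == best])

    def learn(self, obs, action, reward, next_obs, done):
        """One-step Q-learning update toward reward + gamma * max_a Q(next_obs, a)."""
        future = 0.0 if done else max(self.q[(next_obs, a)] for a in self.actions)
        target = reward + self.gamma * future
        self.q[(obs, action)] += self.alpha * (target - self.q[(obs, action)])
```

In a multi-agent run, each agent would hold its own table and treat the other agents as part of the environment, which is one simple way to read "the environment often includes other agents".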

Experiments

Shoot Out - Agent A and Agent B live in an n x n grid world with a single overlapping path to a goal state. Each agent can 1) stay still, 2) move in any of the 4 cardinal directions, or 3) attack in any of the 4 cardinal directions. Only one agent can occupy the goal square. Will Agent A and Agent B learn to attack one another to maximize their individual reward?
Pas de Deux - Agent A and Agent B again live in an n x n grid world. This time a reward is earned only when the two agents occupy squares that are mirror images of each other. The reward for staying in a goal state decreases over time, so a rational agent would eventually move to another square. Will Agent A and Agent B learn to mirror each other's actions to achieve maximal reward?
Prisoner's Dilemma - Agent A and Agent B live in a simple world that simulates the conditions of Prisoner's Dilemma. Will Agent A and Agent B learn to cooperate in a single iteration of Prisoner's Dilemma? What if we increase the number of iterations of Prisoner's Dilemma? What if the number of total iterations of Prisoner's Dilemma is known by the agents?
One Night Werewolf - Repeat experiment 3 (Prisoner's Dilemma), but allow each agent to signal its intent to the other agents. Each agent can choose whether or not to lie. How does this change the outcome of iterated Prisoner's Dilemma? What if we add a third agent, Agent C? What if each agent has a predefined level of trustworthiness, known to it alone, that allows it to lie about its intent only a certain fraction of the time? What if some percentage of the time an agent's intended signal is distorted (i.e., the communication channel is lossy, as is true of the English language)?
Alien Invasion - Agent A, Agent B, and SuperAgent C live in an n x n grid world with obstacles. The goal of Agent A and Agent B is to 'capture' SuperAgent C. SuperAgent C has complete knowledge of the environment and of the locations of Agents A and B. Will Agents A and B learn to cooperate to capture the more technologically advanced SuperAgent C?
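The signaling mechanics of One Night Werewolf could be sketched as follows. This is only one possible reading of the description: trustworthiness is modeled as a per-agent lie rate, and the lossy channel as an independent chance of flipping the signal in transit. The names `signal`, `transmit`, `lie_rate`, and `noise` are hypothetical.

```python
import random

COOPERATE, BETRAY = "cooperate", "betray"


def flip(move):
    """Return the opposite move."""
    return BETRAY if move == COOPERATE else COOPERATE


def signal(intent, lie_rate, rng):
    """Signal an agent sends: it misreports its intent with probability lie_rate."""
    return flip(intent) if rng.random() < lie_rate else intent


def transmit(sent, noise, rng):
    """Lossy channel: the signal is distorted (flipped) with probability noise."""
    return flip(sent) if rng.random() < noise else sent


def received_signal(intent, lie_rate, noise, rng):
    """What the other agent observes after possible lying and distortion."""
    return transmit(signal(intent, lie_rate, rng), noise, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    # Assumed example: an agent that lies 20% of the time over a 10% lossy channel.
    samples = [received_signal(COOPERATE, 0.2, 0.1, rng) for _ in range(1000)]
    print(samples.count(COOPERATE) / len(samples))
```

Keeping lying and channel noise as separate steps makes it possible to vary them independently, which is what the questions in that experiment ask for.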

Results

Explore

AgentRandom

AgentQ

Shoot Out

Pas de Deux

Prisoner's Dilemma

In the figures below, orange represents betrayal and green represents cooperation. The movement of the agents through the grid represents progression through time.

Will Agent A and Agent B learn to cooperate in single iteration Prisoner's Dilemma?

In single iteration Prisoner's Dilemma, the game theoretic optimal action is to betray, because betrayal yields the higher individual payoff whatever the other agent does. As illustrated in the figure above, the reinforcement learning agents have recovered this behavior.
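The claim that betrayal is optimal in a single iteration can be checked directly from a payoff table. The sketch below uses the conventional textbook payoffs (5, 3, 1, 0); these numbers are an assumption, since the payoffs used in the experiments are not stated here.

```python
COOPERATE, BETRAY = "cooperate", "betray"
MOVES = (COOPERATE, BETRAY)

# Assumed payoffs for the row player: PAYOFF[(my_move, their_move)].
PAYOFF = {
    (COOPERATE, COOPERATE): 3,  # mutual cooperation
    (COOPERATE, BETRAY): 0,     # I cooperate, the other betrays
    (BETRAY, COOPERATE): 5,     # I betray a cooperator
    (BETRAY, BETRAY): 1,        # mutual betrayal
}


def best_response(their_move):
    """Move that maximizes my payoff against a fixed opponent move."""
    return max(MOVES, key=lambda mine: PAYOFF[(mine, their_move)])


def is_strictly_dominant(move):
    """True if `move` beats the alternative against every opponent move."""
    other = BETRAY if move == COOPERATE else COOPERATE
    return all(PAYOFF[(move, t)] > PAYOFF[(other, t)] for t in MOVES)


if __name__ == "__main__":
    for their_move in MOVES:
        print(their_move, "->", best_response(their_move))
    print("betray strictly dominant:", is_strictly_dominant(BETRAY))
```

With these payoffs, betrayal is the best response to either move, even though mutual cooperation would leave both agents better off than mutual betrayal.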

What if the agents can remember a history of the last 5 iterations of Prisoner's Dilemma?

What if the agents can remember a history of the last 3 iterations and the number of iterations of Prisoner's Dilemma remaining is known by the agents?
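One way to give an agent memory is to fold the recent history into its observation. The sketch below builds a hashable state from the last few joint moves and, optionally, the number of iterations remaining. The class `HistoryObservation` and its fields are hypothetical and only illustrate the idea.

```python
from collections import deque


class HistoryObservation:
    """Hypothetical observation builder for iterated Prisoner's Dilemma."""

    def __init__(self, memory, total_rounds=None):
        # Keep only the most recent `memory` joint moves.
        self.history = deque(maxlen=memory)
        # None means the agent is not told how many iterations remain.
        self.remaining = total_rounds

    def record(self, my_move, their_move):
        """Store the outcome of one iteration and count down if the horizon is known."""
        self.history.append((my_move, their_move))
        if self.remaining is not None:
            self.remaining -= 1

    def state(self):
        """Hashable state usable as a key in a tabular value function."""
        return (tuple(self.history), self.remaining)


if __name__ == "__main__":
    obs = HistoryObservation(memory=3, total_rounds=10)
    for _ in range(4):
        obs.record("cooperate", "betray")
    print(obs.state())
```

Note that the number of distinct states grows quickly with the memory length (4 joint outcomes per remembered iteration), so longer histories need more simulated games to learn from.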

One Night Werewolf

Alien Invasion

Thought Experiments

The experiments below are meant only as food for thought. If reinforcement learning proves to be a reasonable model of a general information processing system that is capable of recovering the game theoretic optimal strategy, maybe someday these thought experiments could be formulated in a testable way.

Neurons & Neocortex - Simulate an artificial neuron with a simple action space: {fire, do nothing}. When a neuron fires, it changes the properties of its local environment which often includes the environment of neighboring neurons. The goal of each neuron is to use an internal representation of the local environment to optimize the system's overall capacity to predict the next input. What patterns in actuation emerge when an entire population of neurons is exposed to spatio-temporally varying input? What if the inputs are completely random? What if the inputs represent an encoding of simple patterns of beeps? What if the inputs represent an encoding of Beethoven's 5th Symphony?
Religion as a Natural Phenomenon - Are religious belief systems an emergent property of information processing systems evolving under the constraints of our physical universe and an Earth-like environment? What belief systems would emerge in Flatland?