Andrew Errity

School Of Computing, Dublin City University.
Home

Research

Publications

Undergrad Work

Links

Contact

Evolving Strategies for the Prisoner's Dilemma

Functional Specification

Further details of this software may be found here.

1 Introduction

1.1 Scope and purpose

This Functional Specification aims to describe the features of this software project. It describes the product's behaviour as seen by an external observer, and contains the information and data needed for the design. This document defines what the functionality will be, but not how that functionality will be implemented. This document will also provide some background information and detail the projects objectives.


1.2 The Prisoner's Dilemma

The Prisoner's Dilemma is a classic game used by social scientists and game theorists to investigate collaboration and cooperation. The name comes from a particular illustration of the interaction which is credited to Albert Tucker in the 1950s [1]. The story is that of two prisoners placed in separate cells, with the aim of getting one prisoner to implicate the other. Each prisoner is given the option to defect against the other, by giving evidence against them. If both prisoners defect (give evidence) then the judge, in no doubt over their guilt, will send them both to prison for 3 years. If both prisoners cooperate (don't give evidence), then the judge, with less clear indication of guilt, will send them both to prison for only 1 year. However if one prisoner defects and the other does not, the judge will take this as a clear sign of guilt, allowing the defector (evidence giver) to walk free whilst sentencing the other prisoner to 5 years. The rules of this game can be more clearly illustrated in the following "pay-off matrix" (Table 1).


Table 1

This game has proven to be an important model when studying cooperation. There is only one rational way to play this game: always defect. If you think your opponent will cooperate, then you can get a higher score (less time in prison) by defecting. If you think your opponent will defect, the best you can do is also defect. Therein lies the dilemma, when played by rational players both players will always defect, this is due to the fact that from a player's local viewpoint this is always the best choice. However it can be seen, when viewed globally, that if both cooperated, each would have done better.

This game becomes more interesting when play is repeated indefinitely allowing players to develop more sophisticated strategies; this variation of the game is known as the Iterated Prisoner's Dilemma (IPD). One of the most successful strategies in the IPD, as proven by Axelrod [2], is known as Tit-for-Tat. This strategy involves cooperating on the first iteration and in subsequent rounds copying the opponent's previous move.

Despite the seeming simplicity of this game it has generated huge quantities of research and fascinates social scientists. It's a game whose principles may have wide-ranging explanatory power. For example, some biologists believe "that many wild animals and plants are engaged in ceaseless games of Prisoner's Dilemma" (Dawkins 1989). The game is also used to study aspects of economic, social and political behaviour such as the nuclear arms race and stock market buying patterns.


1.3 Overview

The aim of this project is to create a working model of the Iterated Prisoner's Dilemma (IPD) described above. This model will allow many different prisoners (or organisms) to exist in the world, each organism beginning with a random strategy. Each organism shall then engage in numerous games of IPD with other random organisms. Each organism's performance in the world can be represented as the organisms average "pay-off" as determined by the game rules. The best performing strategies can then be allowed to reproduce in order to evolve an optimal strategy for the game.


1.4 Objectives

The purpose of this project is to evolve optimal strategies for the iterated prisoner's dilemma game. This will also provide a comparison of the effective performance of different strategies. Furthermore the project will feature a graphical user interface which will allow users to study the interactions of the varying strategies and the population changes of each strategy. In addition the GUI will allow the user to alter the model settings and view the effects on the world. This program should be useful to those with an interest in game theory, social sciences, artificial life or to casual users.


1.5 Constraints

It is proposed the application be written in Java. Thus the only software requirement is that the target platform runs the Java virtual machine.

2 Functional and Data Description

2.1 Genetic Algorithms

Genetic Algorithms take the essential aspects of the traditional picture of biological evolution and represent them as computer models. This is done by creating a virtual world populated with 'organisms', each organism's performance in the world must be quantifiable; this quantity is then used to represent each organism's relative 'fitness'. The 'fittest' organisms are then permitted to reproduce in some manner. This process is then repeated until the average fitness of the population has converged to an optimal value.

This procedure will be used to 'evolve' optimal strategies for playing the IPD. In order to apply this scheme to the IPD we need some way to represent the players within the system. The prisoners will be represented by a 'chromosome' which will determine how they play the game and will be used for reproduction. This 'chromosome' will be a simple string of bits representing the player's strategy.

The exact nature of this bit string will be left for more technical documentation. However an indication of one way this bit string may encode a player's strategy can be found in the "Evolution of Cooperation" by Robert Axelrod, the most famous source for this type of study. Axelrod based the player's next response on the outcome of the last 3 plays. As each game as four possible outcomes (CC, DD, CD, DC : Cooperate, Defect) a 64bit sting (43) could be used to encode a player's strategy as a sequence of C's and D's.

This representation of each player's strategy can then be used to allow a player to play an indefinite number of rounds of IPD against a random opponent. The "pay-offs" gained during this can be used to calculate each players fitness. This fitness will determine which players reproduce. This reproduction will be handled by a genetic algorithm. The basic genetic operators of cross-over and mutation will be used to achieve this reproduction. Crossover involves taking the two parent chromosomes, breaking them both at the same randomly chosen location and rejoining the parts, one from each parent. The effect is that the offspring inherits part of the chromosome from one parent and part from the other. Mutation then takes place by randomly changing a very small proportion of the bits in the chromosome.


2.2 Graphical User Interface

This model of the IPD will be presented to the user via a graphical interface (e.g. a java applet).

This interface will contain a large area which provides an abstract view of the players within the world. Each strategy will be represented as a different colour. The world will be displayed as a two-dimensional plane; this plane will display the players, with players next to those they are currently playing the IPD with. This will allow a user to see the population variations and which strategies are playing each other.

The interface will also feature a statistics section which will provide the user with information regarding population sizes, rates of change, average pay-offs and any other useful information. This information will be presented to the user in the form of graphs or charts. The manner in which this data is presented will be user configurable.

In addition the interface will feature a component which allows a user to alter the program variables, thus allowing a user to manipulate the model and view the effects of these changes. These variables may include initial population sizes, fitness thresholds, pay offs, speed, degree of mutation and any other useful variables.

Figure 1 below provides an example of how such a GUI may look. This is intended as an example; the actual implementation may differ from this.


Figure 1

3 References

[1] Shaun P. Hargreaves Heap and Yanis Varoufakis, "Game Theory : A Critical Introduction", Routledge, 146, 1995
[2] Robert Axelrod, "The Evolution of Cooperation", New York: Basic Books, 1984