Evolution and Repeated Games

D. Fudenberg (Harvard)
E. Maskin (IAS, Princeton)
             C         D
    C      2, 2     −1, 3
    D      3, −1     0, 0

(C = Cooperate, D = Defect)

Theory of repeated games important
• central model for explaining how self-interested agents can cooperate
• used in economics, biology, political science and other fields

But theory has a serious flaw:
• although cooperative behavior possible, so is uncooperative behavior (and everything in between)
• theory doesn’t favor one behavior over another
• theory doesn’t make sharp predictions

Evolution (biological or cultural) can promote efficiency
• might hope that uncooperative behavior will be “weeded out”
• this view expressed in Axelrod (1984)

Basic idea:

             C         D
    C      2, 2     −1, 3
    D      3, −1     0, 0

• Start with population of repeated game strategy Always D
• Consider small group of mutants using Conditional C (play C until someone plays D, thereafter play D)
  – does essentially same against Always D as Always D does
  – does much better against Conditional C than Always D does
• Thus Conditional C will invade Always D
• uncooperative behavior driven out

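The invasion argument can be checked with a short simulation. The helper names and the 200-period horizon below are illustrative choices, not from the paper:

```python
# Payoffs from the matrix above (C = cooperate, D = defect).
PAYOFF = {('C', 'C'): (2, 2), ('C', 'D'): (-1, 3),
          ('D', 'C'): (3, -1), ('D', 'D'): (0, 0)}

def always_d(my_hist, other_hist):
    return 'D'

def conditional_c(my_hist, other_hist):
    # Play C until someone plays D, thereafter play D.
    return 'D' if 'D' in my_hist or 'D' in other_hist else 'C'

def avg_payoff(s1, s2, periods=200):
    """Average per-period payoff to s1 when s1 meets s2."""
    h1, h2, total = [], [], 0
    for _ in range(periods):
        a1, a2 = s1(h1, h2), s2(h2, h1)
        total += PAYOFF[(a1, a2)][0]
        h1.append(a1); h2.append(a2)
    return total / periods

print(avg_payoff(conditional_c, always_d))       # ~0: one -1, then 0 forever
print(avg_payoff(always_d, always_d))            # 0
print(avg_payoff(conditional_c, conditional_c))  # 2.0: mutants cooperate
print(avg_payoff(always_d, conditional_c))       # ~0: one 3, then 0 forever
```

With a small mutant share, the mutants’ extra payoff against each other outweighs the negligible difference against Always D — exactly the invasion condition.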
             C         D
    C      2, 2     −1, 3
    D      3, −1     0, 0

But consider ALT: alternate between C and D until pattern broken, thereafter play D
• can’t be invaded by some other strategy
  – other strategy would have to alternate or else would do much worse against ALT than ALT does
• Thus ALT is “evolutionarily stable”
• But ALT is quite inefficient (average payoff 1)

• Still, ALT highly inflexible
  – relies on perfect alternation
  – if pattern broken, get D forever
• What if there is a (small) probability of mistake in execution?

             C         D
    C      2, 2     −1, 3
    D      3, −1     0, 0

• Consider mutant strategy s identical to ALT except if (by mistake) alternating pattern broken
  – s signals “intention” to cooperate by playing C in following period
  – if other strategy plays C too, s plays C forever
  – if other strategy plays D, s plays D forever
• s identical to ALT before pattern broken
• s and ALT each get about 0 against ALT, after pattern broken
• s gets 2 against s; ALT gets about 0 against s, after pattern broken
• so s invades ALT

Main results in paper (for 2-player symmetric repeated games)
(1) If s evolutionarily stable and
  – discount rate r small (future important)
  – mistake probability p small (but p > 0)
  then s (almost) “efficient”
(2) If payoffs (v, v) “efficient”, then exists ES strategy s (almost) attaining (v, v) provided
  – r small
  – p small relative to r
• generalizes Fudenberg–Maskin (1990), in which r = p = 0

Finite symmetric 2-player game g : A × A → ℝ²
• if g(a1, a2) = (v1, v2), then g(a2, a1) = (v2, v1)
• normalize payoffs so that min_{a2} max_{a1} g1(a1, a2) = 0
• V = convex hull { g(a1, a2) : (a1, a2) ∈ A × A }

• (v1, v2) ∈ V strongly efficient if w ≡ v1 + v2 = max_{(v1′, v2′) ∈ V} (v1′ + v2′)

             C         D
    C      2, 2     −1, 3
    D      3, −1     0, 0

  (2, 2) unique strongly efficient pair

           0, 0     1, 2
           2, 1     0, 0

  { (v1, v2) : v1 + v2 = 3, 1 ≤ v1 ≤ 2 } strongly efficient

Repeated game: g repeated infinitely many times
• period t history h(t) = ((a1(1), a2(1)), …, (a1(t−1), a2(t−1)))
• π(h) = ((a2(1), a1(1)), …, (a2(t−1), a1(t−1)))  (players’ roles reversed)
• H = set of all histories
• repeated game strategy s : H → Δ(A)
  – assume finitely complex (playable by finite computer)
• in each period, probability p that i makes mistake
  – chooses ai ≠ si(h) (equal probabilities for all actions)
  – mistakes independent across players

t 1



r
 1 
r, p
U1  s1 , s2  
E  
 g1  a1  t  , a2  t   s1 , s2 , p 
1  r  t 1  1  r 

t 1



r
1


r, p
U1  s1 , s2 h  
E  
 g1  a1  t  , a2  t   s1 , s2 , p, h 
1  r  t 1  1  r 

13
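A Monte Carlo sketch of this payoff for the Prisoner’s Dilemma above (the helper names, trembles modeled as action flips, and the truncated horizon are simplifying assumptions):

```python
import random

# Payoffs of the Prisoner's Dilemma above; helper names are illustrative.
PAYOFF = {('C', 'C'): (2, 2), ('C', 'D'): (-1, 3),
          ('D', 'C'): (3, -1), ('D', 'D'): (0, 0)}

def discounted_payoff(s1, s2, r, p, horizon=500, trials=100, seed=0):
    """Estimate U1^{r,p}(s1, s2): normalized discounted expected payoff,
    with each player's action flipped with probability p each period."""
    rng = random.Random(seed)
    delta = 1 / (1 + r)                     # per-period discount factor
    total = 0.0
    for _ in range(trials):
        h1, h2, u = [], [], 0.0
        for t in range(horizon):
            a1, a2 = s1(h1, h2), s2(h2, h1)
            if rng.random() < p: a1 = 'D' if a1 == 'C' else 'C'
            if rng.random() < p: a2 = 'D' if a2 == 'C' else 'C'
            u += delta**t * PAYOFF[(a1, a2)][0]
            h1.append(a1); h2.append(a2)
        total += (r / (1 + r)) * u          # normalization from the slide
    return total / trials

def always_c(my_hist, other_hist): return 'C'
def always_d(my_hist, other_hist): return 'D'

# with no mistakes, two cooperators earn the stage payoff 2
print(round(discounted_payoff(always_c, always_c, r=0.1, p=0.0), 3))  # 2.0
```

The r/(1+r) factor makes the sum of discount weights equal 1, so payoffs stay on the scale of the stage game.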
• informally, s evolutionarily stable (ES) if no mutant s′ can invade population with big proportion s and small proportion s′
• formally, s is ES w.r.t. (q̄, r, p) if for all s′ ≠ s and all q ≤ q̄
  (1 − q) U1^{r,p}(s, s) + q U1^{r,p}(s, s′) ≥ (1 − q) U1^{r,p}(s′, s) + q U1^{r,p}(s′, s′)
• evolutionary stability
  – expressed statically here
  – but can be given precise dynamic meaning

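The formal condition can be written directly as a function. The payoff table below is a toy illustration in the spirit of the ALT example, not numbers from the paper:

```python
def is_es(U, s, mutants, q):
    """Static ES test: s resists every mutant s2 entering with share q if the
    incumbent's payoff against the (1-q, q) mix is at least the mutant's."""
    for s2 in mutants:
        incumbent = (1 - q) * U[(s, s)] + q * U[(s, s2)]
        invader = (1 - q) * U[(s2, s)] + q * U[(s2, s2)]
        if invader > incumbent:
            return False
    return True

# Toy long-run average payoffs: a mutant that matches ALT against ALT (~1)
# but cooperates fully with its own kind (2) tips the balance.
U = {('ALT', 'ALT'): 1.0, ('ALT', 'mut'): 1.0,
     ('mut', 'ALT'): 1.0, ('mut', 'mut'): 2.0}
print(is_es(U, 'ALT', ['mut'], q=0.1))   # False: the mutant invades

# If the mutant instead did badly against ALT, ALT would pass the test.
U2 = {('ALT', 'ALT'): 1.0, ('ALT', 'mut'): 1.0,
      ('mut', 'ALT'): 0.0, ('mut', 'mut'): 2.0}
print(is_es(U2, 'ALT', ['mut'], q=0.1))  # True
```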
• population of b players
• suppose time measured in “epochs” T = 1, 2, . . .
• strategy sT = state in epoch T
  − most players in population use sT
• group of mutants (of size a) plays s′
  − a drawn randomly from {1, 2, …, ā}, where ā/b ≤ q̄
  − s′ drawn randomly from finitely complex strategies
• M random drawings of pairs of players
  − each pair plays repeated game
• sT+1 = strategy with highest average score

Theorem 1: For any q̄, p, r and ε > 0, there exists T* such that, for all T′ ≥ T*, there exists M(T′) such that, for all M ≥ M(T′),
(i) if s not ES, Pr{ sT+t ≠ s for all t ≥ T′ | sT = s } ≥ ε
(ii) if s is ES, Pr{ sT+t = s for all t ≥ T′ | sT = s } ≥ 1 − ε

Let v* = min { v ≥ 0 : there exists v′ such that (v, v′) strongly efficient }

Theorem 2: Given ε > 0 and q̄ > 0, there exist r̄ and p̄ such that, for all r ∈ (0, r̄) and p ∈ (0, p̄), if s is ES w.r.t. (q̄, r, p), then
U1^{r,p}(s, s | h) ≥ v* − ε for all h.

             C         D
    C      2, 2     −1, 3
    D      3, −1     0, 0

v* = 2, so U1^{r,p}(s, s) ≥ 2 − ε

           0, 0     1, 2
           2, 1     0, 0

v* = 1, so U1^{r,p}(s, s) ≥ 1 − ε

Proof:
Suppose U1^{r,p}(s, s | h) < v* − ε for some h
• will construct mutant s′ that can invade
• let h* = arg min_h U1^{r,p}(s, s | h)
• if s = ALT, h* = any history for which alternating pattern broken

Construct s′ so that
• s′(h) = s(h), if h not a continuation of h* or π(h*)
• after h*, strategy s′
  – “signals” willingness to cooperate by playing differently from s for 1 period (assume s is pure strategy)
  – if other player responds positively, plays strongly efficiently thereafter
  – if not, plays according to s thereafter
• after π(h*) (assume π(h*) ≠ h*), s′
  – responds positively if other strategy has signaled, and thereafter plays strongly efficiently
  – plays according to s otherwise

• because h* is already worst history, s′ loses for only 1 period by signaling (small loss if r small)
• if p small, probability that s′ “misreads” other player’s intention is small
• hence, s′ does nearly as well against s as s does against itself (even after h* or π(h*))
• s′ does very well against itself (strong efficiency), after h* or π(h*):
  U1^{r,p}(s′, s′ | h*) + U1^{r,p}(s′, s′ | π(h*)) ≈ w

• remains to check how well s does against s′
• by definition of h*,
  U1^{r,p}(s, s | h*) + U1^{r,p}(s, s | π(h*)) ≤ w − δ for some δ > 0
• ignoring effect of p,
  U1^{r,p}(s, s′ | h*) ≈ U1^{r,p}(s, s | h*)
  Also, after deviation by s′, punishment started again, and so
  U1^{r,p}(s, s′ | π(h*)) ≤ U1^{r,p}(s, s | π(h*))
  Hence
  U1^{r,p}(s, s′ | h*) + U1^{r,p}(s, s′ | π(h*)) ≤ w − δ
• so s does appreciably worse against s′ than s′ does against s′

• Summing up, we have:
  (1 − q) U1^{r,p}(s′, s) + q U1^{r,p}(s′, s′) > (1 − q) U1^{r,p}(s, s) + q U1^{r,p}(s, s′)
• s is not ES

• Theorem 2 implies for Prisoner’s Dilemma that, for any ε > 0,
  U1^{r,p}(s, s | h) ≥ 2 − ε for r and p small
• doesn’t rule out punishments of arbitrary (finite) length

• Consider strategy s with “cooperative” and “punishment” phases
  – in cooperative phase, play C
  – stay in cooperative phase until one player plays D, in which case go to punishment phase
  – in punishment phase, play D
  – stay in punishment phase for m periods (and then go back to cooperative phase) unless at some point some player chooses C, in which case restart punishment
• For any m,
  U1^{r,p}(s, s | h) → 2 (efficiency), as r → 0 and p → 0

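A simulation sketch of this phase strategy against itself. The implementation details — replaying the joint history to recover the current phase, and modeling mistakes as action flips with probability p — are illustrative assumptions:

```python
import random

PAYOFF = {('C', 'C'): (2, 2), ('C', 'D'): (-1, 3),
          ('D', 'C'): (3, -1), ('D', 'D'): (0, 0)}

def phase_strategy(m, h1, h2):
    """Replay the joint history to find the current phase, then act."""
    phase, count = 'coop', 0
    for a1, a2 in zip(h1, h2):
        if phase == 'coop':
            if 'D' in (a1, a2):
                phase, count = 'punish', 0
        elif 'C' in (a1, a2):
            count = 0                      # someone played C: restart
        else:
            count += 1
            if count == m:                 # m periods of (D, D) served
                phase, count = 'coop', 0
    return 'C' if phase == 'coop' else 'D'

def avg_payoff(m, p, periods=1000, seed=1):
    rng = random.Random(seed)
    h1, h2, total = [], [], 0
    for _ in range(periods):
        a1 = phase_strategy(m, h1, h2)
        a2 = phase_strategy(m, h2, h1)
        if rng.random() < p: a1 = 'D' if a1 == 'C' else 'C'
        if rng.random() < p: a2 = 'D' if a2 == 'C' else 'C'
        total += PAYOFF[(a1, a2)][0]
        h1.append(a1); h2.append(a2)
    return total / periods

print(avg_payoff(m=3, p=0.01))    # below 2: mistakes trigger punishments
print(avg_payoff(m=3, p=0.001))   # closer to 2 as p shrinks
```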
Can sharpen Theorem 2 for Prisoner’s Dilemma:
Given q̄, there exist r̄ and p̄ such that, for all r ∈ (0, r̄) and p ∈ (0, p̄), if s is ES w.r.t. (q̄, r, p), then it cannot entail a punishment lasting more than (3 + 2q̄)/(2q̄) periods
Proof: very similar to that of Theorem 2

For r and p too big, ES strategy s may not be “efficient”
• if r = ∞, back in one-shot case
• if p ≥ ½, then even fully cooperative strategies in Prisoner’s Dilemma generate payoffs ≤ 1.

Theorem 3: Let (v, v) ∈ V with v > 0.
For all ε > 0, there exist q̄ > 0 and r̄ > 0 s.t. for all r ≤ r̄ there exists p̄ > 0 s.t. for all p ≤ p̄ there exists s ES w.r.t. (q̄, r, p) for which
U1^{r,p}(s, s) ≥ v − ε

Proof: Construct s so that
• along equilibrium path of (s, s), payoffs are (approximately) (v, v)
• punishments are nearly strongly efficient
  – deviating player (say 1) minimaxed long enough to wipe out gain
  – thereafter go to strongly efficient point
  – overall payoffs after deviation: (v′, w − v′) with v′ < v
• if r and p small, (s, s) is a subgame perfect equilibrium

• In Prisoner’s Dilemma, consider s that
  – plays C the first period
  – thereafter, plays C if and only if either both players played C previous period or neither did
• strategy s
  – is efficient
  – entails punishments that are as short as possible
  – is modification of Tit-for-Tat (C the first period; thereafter, do what other player did previous period)
• Tit-for-Tat not ES
  – if mistake (D, C) occurs then get wave of alternating punishments:
    (C, D), (D, C), (C, D), ...
    until another mistake made

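The contrast can be seen by replaying both strategies against themselves with a single forced mistake (helper names and the mistake timing are illustrative):

```python
def tit_for_tat(my_hist, other_hist):
    return other_hist[-1] if other_hist else 'C'

def modified_tft(my_hist, other_hist):
    # C in period 1; then C iff both players took the same action last period
    if not my_hist:
        return 'C'
    return 'C' if my_hist[-1] == other_hist[-1] else 'D'

def play(s1, s2, periods, mistake_at=2):
    """Run the match, forcing player 1 to play D in period `mistake_at`."""
    h1, h2 = [], []
    for t in range(1, periods + 1):
        a1, a2 = s1(h1, h2), s2(h2, h1)
        if t == mistake_at:
            a1 = 'D'
        h1.append(a1); h2.append(a2)
    return list(zip(h1, h2))

print(play(tit_for_tat, tit_for_tat, 6))
# [('C','C'), ('D','C'), ('C','D'), ('D','C'), ('C','D'), ('D','C')]  -- wave
print(play(modified_tft, modified_tft, 6))
# [('C','C'), ('D','C'), ('D','D'), ('C','C'), ('C','C'), ('C','C')] -- recovers
```

Plain Tit-for-Tat echoes the mistake forever; the modified version spends one period at (D, D) and then resynchronizes on C.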
           a       b       c       d
    a    0, 0    4, 1    0, 0    0, 0
    b    1, 4    0, 0    0, 0    0, 0
    c    0, 0    0, 0    0, 0    0, 0
    d    0, 0    0, 0    0, 0    2, 2

modified battle of the sexes

Let s = play d as long as in all past periods either
  – both players played d, or
  – neither played d
if single player deviates from d
  – henceforth, that player plays b
  – other player plays a
• s is ES even though inefficient
  – any attempt to improve on efficiency punished forever
  – can’t invade during punishment, because punishment efficient

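A quick payoff check on the matrix above shows why s holds up (the dictionary encoding is illustrative):

```python
# Stage payoffs of the modified battle of the sexes above.
G = {('a','a'): (0,0), ('a','b'): (4,1), ('a','c'): (0,0), ('a','d'): (0,0),
     ('b','a'): (1,4), ('b','b'): (0,0), ('b','c'): (0,0), ('b','d'): (0,0),
     ('c','a'): (0,0), ('c','b'): (0,0), ('c','c'): (0,0), ('c','d'): (0,0),
     ('d','a'): (0,0), ('d','b'): (0,0), ('d','c'): (0,0), ('d','d'): (2,2)}

path = G[('d','d')][0]                           # 2 per period on the path
best_dev = max(G[(x,'d')][0] for x in 'abc')     # best one-shot deviation: 0
punished = G[('b','a')][0]                       # deviator then earns 1 < 2
punish_sum = sum(G[('b','a')])                   # 5: deviator plays b, other a
max_sum = max(u1 + u2 for u1, u2 in G.values())  # w = 5

print(path, best_dev, punished, punish_sum == max_sum)  # 2 0 1 True
```

Deviating from d yields 0 at once and 1 per period thereafter instead of 2, so it never pays; and since the punishment payoffs (1, 4) already sum to the maximum w = 5, no mutant can gain during the punishment either.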
Consider potential invader s′
For any h, s′ cannot do better against s than s does against itself, since (s, s) equilibrium; hence, for all h,
(∗)   U1^{r,p}(s′, s | h) ≤ U1^{r,p}(s, s | h)
But also U1^{r,p}(s′, s | h) ≥ U1^{r,p}(s, s | h) (otherwise s′ can’t invade), and so
(∗∗)  U1^{r,p}(s′, s | h) = U1^{r,p}(s, s | h)
For s′ to invade, need
(∗∗∗) U1^{r,p}(s′, s′ | h′) + U1^{r,p}(s′, s′ | π(h′)) > U1^{r,p}(s, s′ | h′) + U1^{r,p}(s, s′ | π(h′)) for some h′
Claim: (∗∗∗) implies h′ involves deviation from equil path of (s, s)
  only other possibility:
  – s′ different from s on equil path
  – then s′ punished by s
  – violates (∗∗)
we thus have U1^{r,p}(s′, s′ | h′) + U1^{r,p}(s′, s′ | π(h′)) ≤ w
Hence, from (∗∗), rhs of (∗∗∗) ≥ w, so inequality not feasible

For Theorem 3 to hold, p must be small relative to r
• consider modified Tit-for-Tat against itself (play C if and only if both players took same action last period)
• with every mistake, there is an expected loss of
  2 − (½ · 3 + ½ · (−1)) = 1 the first period
  2 − 0 = 2 the second period
• so overall the expected loss from mistakes is approximately 3p (1 + r)/r
• By contrast, a mutant strategy that signals, etc. and doesn’t punish at all against itself loses only about p (1 + r)/r
• so if r is small enough relative to p, mutant can invade
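The arithmetic behind the comparison, written out as a worked check of the slide’s numbers (not code from the paper):

```python
# Expected per-mistake loss for modified Tit-for-Tat against itself.
# Mistake period: one player plays D, the other C, so a given player
# earns 3 or -1 with equal probability instead of 2.
first_period_loss = 2 - (0.5 * 3 + 0.5 * (-1))   # = 1
# Next period both play D (last actions differed), earning 0 instead of 2.
second_period_loss = 2 - 0                        # = 2
per_mistake = first_period_loss + second_period_loss
print(per_mistake)  # 3.0
# A non-punishing mutant loses only the mistake period itself (~1 per
# mistake), hence the comparison 3p(1+r)/r versus p(1+r)/r on the slide.
```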