Evolution and Repeated Games
D. Fudenberg (Harvard)
E. Maskin (IAS, Princeton)
The Prisoner's Dilemma (C = Cooperate, D = Defect; row player's payoff listed first):

        C        D
  C    2, 2    −1, 3
  D    3, −1    0, 0
The theory of repeated games is important:
• the central model for explaining how self-interested agents can cooperate
• used in economics, biology, political science, and other fields
But the theory has a serious flaw:
• although cooperative behavior is possible, so is uncooperative behavior (and everything in between)
• the theory doesn't favor one behavior over another
• the theory doesn't make sharp predictions
Evolution (biological or cultural) can promote efficiency:
• one might hope that uncooperative behavior will be "weeded out"
• this view is expressed in Axelrod (1984)
Basic idea (payoffs as in the Prisoner's Dilemma above):
• Start with a population playing the repeated-game strategy Always D
• Consider a small group of mutants using Conditional C (play C until someone plays D, thereafter play D):
  – it does essentially the same against Always D as Always D does
  – it does much better against Conditional C than Always D does
• Thus Conditional C will invade Always D
• uncooperative behavior is driven out
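This invasion argument can be checked numerically. Below is a minimal sketch; the payoffs come from the matrix above, while the function names and the 1000-period horizon are illustrative choices:

```python
# Stage-game payoffs to the row player in the Prisoner's Dilemma above.
PAYOFF = {('C', 'C'): 2, ('C', 'D'): -1, ('D', 'C'): 3, ('D', 'D'): 0}

def always_d(history):
    """Always D: defect regardless of history."""
    return 'D'

def conditional_c(history):
    """Conditional C: play C until someone plays D, thereafter play D."""
    broken = any('D' in pair for pair in history)
    return 'D' if broken else 'C'

def average_payoff(s1, s2, periods=1000):
    """Average per-period payoff to player 1 when s1 meets s2."""
    history, total = [], 0
    for _ in range(periods):
        a1 = s1(history)
        a2 = s2([(b, a) for a, b in history])  # opponent sees mirrored history
        total += PAYOFF[(a1, a2)]
        history.append((a1, a2))
    return total / periods

print(average_payoff(always_d, always_d))            # 0.0
print(average_payoff(conditional_c, always_d))       # -0.001: one-period loss
print(average_payoff(conditional_c, conditional_c))  # 2.0
print(average_payoff(always_d, conditional_c))       # 0.003: 3 once, then 0
```

Conditional C does essentially as well against Always D as Always D does against itself, and far better against its own kind — exactly the invasion condition in the bullets above.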
But consider ALT: alternate between C and D until the pattern is broken; thereafter play D.
• ALT can't be invaded by some other strategy:
  – the other strategy would have to alternate too, or else it would do much worse against ALT than ALT does against itself
• Thus ALT is "evolutionarily stable"
• But ALT is quite inefficient (average payoff 1)
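ALT's payoff against itself is easy to verify (a sketch; the 100-period horizon is an arbitrary choice):

```python
# ALT vs. ALT: both alternate C, D, C, D, ... in step, so realized play
# alternates (C, C) and (D, D); payoffs are from the matrix above.
PAYOFF = {('C', 'C'): 2, ('C', 'D'): -1, ('D', 'C'): 3, ('D', 'D'): 0}

def alt_action(t):
    """ALT's intended action in period t (1-indexed), pattern unbroken."""
    return 'C' if t % 2 == 1 else 'D'

payoffs = [PAYOFF[(alt_action(t), alt_action(t))] for t in range(1, 101)]
average_alt = sum(payoffs) / len(payoffs)
print(average_alt)  # 1.0: far below the strongly efficient payoff of 2
```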
• Still, ALT is highly inflexible:
  – it relies on perfect alternation
  – if the pattern is broken, both players get D forever
• What if there is a (small) probability of a mistake in execution?
• Consider a mutant strategy s identical to ALT except when (by mistake) the alternating pattern is broken:
  – s signals its "intention" to cooperate by playing C in the following period
  – if the other strategy plays C too, s plays C forever
  – if the other strategy plays D, s plays D forever
• s is identical to ALT before the pattern is broken
• s and ALT each get about 0 against ALT after the pattern is broken
• s gets 2 against s, while ALT gets about 0 against s, after the pattern is broken
• so s invades ALT
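The post-breakdown comparison can be sketched with fixed action paths (payoffs from the matrix above; the horizon and helper name are illustrative):

```python
# Continuation play after the alternating pattern has been broken.
PAYOFF = {('C', 'C'): 2, ('C', 'D'): -1, ('D', 'C'): 3, ('D', 'D'): 0}

def avg(path1, path2):
    """Average payoff to player 1 along fixed action paths."""
    return sum(PAYOFF[(a1, a2)] for a1, a2 in zip(path1, path2)) / len(path1)

T = 100
# s meets ALT: s signals C once, ALT answers D, then both play D forever.
s_vs_alt = avg(['C'] + ['D'] * (T - 1), ['D'] * T)   # -0.01, about 0
# ALT meets ALT: D forever on both sides.
alt_vs_alt = avg(['D'] * T, ['D'] * T)               # 0.0
# s meets s: both signal C, see the signal, and cooperate forever.
s_vs_s = avg(['C'] * T, ['C'] * T)                   # 2.0
print(s_vs_alt, alt_vs_alt, s_vs_s)
```

s sacrifices one period against ALT but secures 2 per period against itself, so s invades.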
Main results in the paper (for 2-player symmetric repeated games):
(1) If s is evolutionarily stable, and
  – the discount rate r is small (the future is important), and
  – the mistake probability p is small (but p > 0),
then s is (almost) "efficient".
(2) If the payoffs (v, v) are "efficient", then there exists an ES strategy s (almost) attaining (v, v), provided
  – r is small, and
  – p is small relative to r.
This generalizes Fudenberg-Maskin (1990), in which r = p = 0.
Finite symmetric 2-player game g : A × A → R^2
• if g(a1, a2) = (v1, v2), then g(a2, a1) = (v2, v1)
• normalize payoffs so that min_{a2} max_{a1} g1(a1, a2) = 0
• V = convex hull of { g(a1, a2) : (a1, a2) ∈ A × A }
• (v1, v2) ∈ V is strongly efficient if v1 + v2 = w ≡ max { v1' + v2' : (v1', v2') ∈ V }

In the Prisoner's Dilemma
        C        D
  C    2, 2    −1, 3
  D    3, −1    0, 0
(2, 2) is the unique strongly efficient pair.

In the game
       0, 0    1, 2
       2, 1    0, 0
every (v1, v2) with v1 + v2 = 3 and 1 ≤ v1 ≤ 2 is strongly efficient.
Repeated game: g repeated infinitely many times
• period-t history as seen by player 1: h = ((a1(1), a2(1)), ..., (a1(t−1), a2(t−1)))
• as seen by player 2: h̃ = ((a2(1), a1(1)), ..., (a2(t−1), a1(t−1)))
• H = set of all histories
• repeated-game strategy s : H → A
  – assume s is finitely complex (playable by a finite computer)
• in each period, probability p that player i makes a mistake
  – i.e., chooses some ai ≠ si(h), with equal probabilities for all such actions
  – mistakes are independent across players
U1^{r,p}(s1, s2) = E[ Σ_{t=1}^∞ (r/(1+r)) (1/(1+r))^{t−1} g1(a1(t), a2(t)) | s1, s2, p ]

U1^{r,p}(s1, s2 | h) = E[ Σ_{t=1}^∞ (r/(1+r)) (1/(1+r))^{t−1} g1(a1(t), a2(t)) | s1, s2, p, h ]

(the second expression conditions on play having reached history h)
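The weights (r/(1+r))·(1/(1+r))^{t−1} sum to 1, so a constant stream is worth its per-period value. A quick sketch (the function name and the truncation at 10,000 periods are illustrative choices):

```python
def discounted_average(payoffs, r):
    """Normalized discounted sum: sum_t (r/(1+r)) * (1/(1+r))**(t-1) * g_t."""
    delta = 1.0 / (1.0 + r)
    weight = r / (1.0 + r)
    # enumerate starts at t = 0, matching the exponent t - 1 above
    return sum(weight * delta ** t * g for t, g in enumerate(payoffs))

constant = discounted_average([2] * 10000, r=0.1)         # ≈ 2.0
alternating = discounted_average([2, 0] * 5000, r=0.001)  # ≈ 1.0: ALT's path
print(round(constant, 4), round(alternating, 2))
```

The alternating stream is ALT's path against itself, confirming its average payoff of about 1 as r → 0.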
• informally, s is evolutionarily stable (ES) if no mutant s' can invade a population with a big proportion of s and a small proportion of s'
• formally, s is ES w.r.t. (q̄, r, p) if, for all s' ≠ s and all q ≤ q̄,
  (1 − q) U1^{r,p}(s, s) + q U1^{r,p}(s, s') ≥ (1 − q) U1^{r,p}(s', s) + q U1^{r,p}(s', s')
• evolutionary stability
  – is expressed statically here
  – but can be given a precise dynamic meaning
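The ES inequality can be turned into a small checker. A sketch under stated conventions: `U[(x, y)]` stands for the long-run payoff U1^{r,p}(x, y), and since both sides are linear in q it suffices to test the endpoints q = 0 and q = q̄:

```python
def is_es(U, s, mutants, q_bar):
    """Check the ES inequality above for every mutant and all q <= q_bar."""
    for s_prime in mutants:
        if s_prime == s:
            continue
        for q in (0.0, q_bar):  # linearity in q: endpoints suffice
            incumbent = (1 - q) * U[(s, s)] + q * U[(s, s_prime)]
            invader = (1 - q) * U[(s_prime, s)] + q * U[(s_prime, s_prime)]
            if invader > incumbent:
                return False
    return True

# Limit-of-means payoffs for Always D vs. Conditional C (labels are mine):
U = {('AllD', 'AllD'): 0, ('AllD', 'CondC'): 0,
     ('CondC', 'AllD'): 0, ('CondC', 'CondC'): 2}
print(is_es(U, 'AllD', ['CondC'], q_bar=0.1))  # False: Conditional C invades
print(is_es(U, 'CondC', ['AllD'], q_bar=0.1))  # True
```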
• population of b players
• time is measured in "epochs" T = 1, 2, ...
• strategy s_T = the state in epoch T
  – most players in the population use s_T
• a group of mutants (of size α) plays s'
  – α is drawn randomly from {1, 2, ..., ᾱ}, where ᾱ = b·q̄
  – s' is drawn randomly from the finitely complex strategies
• M random drawings of pairs of players
  – each pair plays the repeated game
• s_{T+1} = the strategy with the highest average score
Theorem 1: For any q̄, p, r and ε > 0, there exists T* such that, for all T ≥ T*, there exists M(T) such that, for all M ≥ M(T):
(i) if s is not ES, Pr{ s_{T+t} = s for all t ≤ T | s_T = s } < ε
(ii) if s is ES, Pr{ s_{T+t} = s for all t ≤ T | s_T = s } > 1 − ε
Let v̄ = min { v ≥ 0 : there exists v' such that (v, v') is strongly efficient }.

Theorem 2: Given ε > 0 and q̄ > 0, there exist r̄ and p̄ such that, for all r ∈ (0, r̄) and p ∈ (0, p̄), if s is ES w.r.t. (q̄, r, p), then
  U1^{r,p}(s, s | h) ≥ v̄ − ε for all h.
In the Prisoner's Dilemma
        C        D
  C    2, 2    −1, 3
  D    3, −1    0, 0
v̄ = 2, so U1^{r,p}(s, s) ≥ 2 − ε.

In the game
       0, 0    1, 2
       2, 1    0, 0
v̄ = 1, so U1^{r,p}(s, s) ≥ 1 − ε.
Proof:
Suppose U1^{r,p}(s, s | h) < v̄ − ε for some h.
• we will construct a mutant s' that can invade
• let h* = argmin over h of U1^{r,p}(s, s | h)
• if s = ALT, h* is any history at which the alternating pattern has been broken
Construct s' so that:
• s'(h) = s(h) if h is not a continuation of h* or h̃* (h* as seen by the other player)
• after h*, strategy s'
  – "signals" willingness to cooperate by playing differently from s for 1 period (assume s is a pure strategy)
  – if the other player responds positively, plays strongly efficiently thereafter
  – if not, plays according to s thereafter
• after h̃* (assume h̃* ≠ h*), s'
  – responds positively if the other strategy has signaled, and thereafter plays strongly efficiently
  – otherwise plays according to s
• because h* is already the worst history, s' loses for only 1 period by signaling (a small loss if r is small)
• if p is small, the probability that s' "misreads" the other player's intention is small
• hence s' does nearly as well against s as s does against itself (even after h* or h̃*)
• s' does very well against itself (strong efficiency) after h* or h̃*:
  U1^{r,p}(s', s' | h*) + U1^{r,p}(s', s' | h̃*) ≈ w
• it remains to check how well s does against s'
• by definition of h*,
  U1^{r,p}(s, s | h*) + U1^{r,p}(s, s | h̃*) ≤ w − δ for some δ > 0
• ignoring the effect of p,
  U1^{r,p}(s, s' | h̃*) = U1^{r,p}(s, s | h̃*);
  also, after the deviation (signal) by s', the punishment starts again, and so
  U1^{r,p}(s, s' | h*) ≈ U1^{r,p}(s, s | h*);
  hence
  U1^{r,p}(s, s' | h*) + U1^{r,p}(s, s' | h̃*) ≤ w − δ (approximately)
• so s does appreciably worse against s' than s' does against s'
• Summing up, we have
  (1 − q) U1^{r,p}(s', s) + q U1^{r,p}(s', s') > (1 − q) U1^{r,p}(s, s) + q U1^{r,p}(s, s')
• so s is not ES ∎
• Theorem 2 implies, for the Prisoner's Dilemma, that for any ε > 0,
  U1^{r,p}(s, s | h) ≥ 2 − ε for r and p small
• but this doesn't rule out punishments of arbitrary (finite) length
• Consider a strategy s with "cooperative" and "punishment" phases:
  – in the cooperative phase, play C
  – stay in the cooperative phase until one player plays D, in which case go to the punishment phase
  – in the punishment phase, play D
  – stay in the punishment phase for m periods (and then go back to the cooperative phase), unless at some point some player chooses C, in which case restart the punishment
• For any m,
  U1^{r,p}(s, s | h) → 2 (efficiency) as r → 0 and p → 0
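The convergence claim can be illustrated by playing s against itself with one forced mistake. A sketch: payoffs are from the Prisoner's Dilemma above, there is no discounting (approximating r → 0), and the horizon, names, and mistake timing are illustrative choices:

```python
PAYOFF = {('C', 'C'): 2, ('C', 'D'): -1, ('D', 'C'): 3, ('D', 'D'): 0}

def average_payoff(m, mistake_at=None, periods=600):
    """s vs. s with an m-period punishment phase; player 1 may err once."""
    phase, left, total = 'coop', 0, 0
    for t in range(1, periods + 1):
        intended = 'C' if phase == 'coop' else 'D'
        a1 = 'D' if t == mistake_at else intended  # the forced mistake
        a2 = intended
        total += PAYOFF[(a1, a2)]
        if phase == 'coop':
            if 'D' in (a1, a2):
                phase, left = 'punish', m
        elif 'C' in (a1, a2):
            left = m          # a C during punishment restarts it
        else:
            left -= 1
            if left == 0:
                phase = 'coop'
    return total / periods

print(average_payoff(m=3))                # 2.0 with no mistakes
print(average_payoff(m=3, mistake_at=5))  # ≈ 1.99: m punishment periods, then recovery
```

A single mistake costs only m + 1 periods, so the long-run average stays near 2 for any fixed m.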
Theorem 2 can be sharpened for the Prisoner's Dilemma:
Given q̄, there exist r̄ and p̄ such that, for all r ∈ (0, r̄) and p ∈ (0, p̄), if s is ES w.r.t. (q̄, r, p), then s cannot entail a punishment lasting more than (3 + 2q̄)/(2q̄) periods.
Proof: very similar to that of Theorem 2.
For r and p too big, an ES strategy s may not be "efficient":
• if r = ∞, we are back in the one-shot case
• if p ≥ 1/2, then even fully cooperative strategies in the Prisoner's Dilemma generate payoffs of at most 1
Theorem 3: Let (v, v) ∈ V with v ≥ v̄.
For all ε > 0, there exist q̄ > 0 and r̄ > 0 such that, for all r ≤ r̄, there exists p̄ > 0 such that, for all p ≤ p̄, there exists s, ES w.r.t. (q̄, r, p), for which
  U1^{r,p}(s, s) ≥ v − ε.
Proof: Construct s so that
• along the equilibrium path of (s, s), payoffs are (approximately) (v, v)
• punishments are nearly strongly efficient:
  – the deviating player (say player 1) is minimaxed long enough to wipe out any gain
  – thereafter play goes to a strongly efficient point
  – overall payoffs after a deviation: (v, w − v)
• if r and p are small, (s, s) is a subgame perfect equilibrium
• In the Prisoner's Dilemma, consider the strategy s that
  – plays C in the first period
  – thereafter plays C if and only if either both players played C the previous period or neither did
• strategy s
  – is efficient
  – entails punishments that are as short as possible
  – is a modification of Tit-for-Tat (C in the first period; thereafter, do what the other player did the previous period)
• Tit-for-Tat itself is not ES:
  – if a mistake (D, C) occurs, we get a wave of alternating punishments
    (C, D), (D, C), (C, D), ...
    until another mistake is made
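The contrast between Tit-for-Tat's alternating punishment wave and the modified strategy's quick recovery can be replayed directly (a sketch; the forced mistake at period 3 and the helper names are illustrative):

```python
def tft(my_hist, opp_hist):
    """Tit-for-Tat: C first, then copy the other player's last action."""
    return opp_hist[-1] if opp_hist else 'C'

def modified_tft(my_hist, opp_hist):
    """Modified TFT: C first; thereafter C iff both players played the
    same action last period."""
    if not my_hist:
        return 'C'
    return 'C' if my_hist[-1] == opp_hist[-1] else 'D'

def run(strategy, mistake_at, periods=8):
    """Play strategy against itself; player 1 is forced to D at mistake_at."""
    h1, h2 = [], []
    for t in range(1, periods + 1):
        a1 = 'D' if t == mistake_at else strategy(h1, h2)
        a2 = strategy(h2, h1)
        h1.append(a1)
        h2.append(a2)
    return list(zip(h1, h2))

print(run(tft, mistake_at=3))           # wave: (D,C), (C,D), (D,C), (C,D), ...
print(run(modified_tft, mistake_at=3))  # (D,C), (D,D), then back to (C,C)
```

After the mistake, plain Tit-for-Tat echoes the defection back and forth forever, while the modified strategy punishes for one period and returns to (C, C).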
Modified battle of the sexes:

        a       b       c       d
  a   0, 0    4, 1    0, 0    0, 0
  b   1, 4    0, 0    0, 0    0, 0
  c   0, 0    0, 0    0, 0    0, 0
  d   0, 0    0, 0    0, 0    2, 2

Let s = play d as long as, in every past period, either
  – both players played d, or
  – neither player played d;
if a single player deviates from d, then
  – henceforth that player plays b
  – and the other player plays a
• s is ES even though it is inefficient:
  – any attempt to improve on efficiency is punished forever
  – no one can invade during the punishment, because the punishment is efficient
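A quick check of the efficiency claims for this game (a sketch; the dictionary encoding is an illustrative choice):

```python
# The modified battle of the sexes above, keyed by (row action, column action).
G = {('a', 'a'): (0, 0), ('a', 'b'): (4, 1), ('a', 'c'): (0, 0), ('a', 'd'): (0, 0),
     ('b', 'a'): (1, 4), ('b', 'b'): (0, 0), ('b', 'c'): (0, 0), ('b', 'd'): (0, 0),
     ('c', 'a'): (0, 0), ('c', 'b'): (0, 0), ('c', 'c'): (0, 0), ('c', 'd'): (0, 0),
     ('d', 'a'): (0, 0), ('d', 'b'): (0, 0), ('d', 'c'): (0, 0), ('d', 'd'): (2, 2)}

w = max(u + v for u, v in G.values())  # maximal joint payoff
path = sum(G[('d', 'd')])              # joint payoff on the equilibrium path
punishment = sum(G[('a', 'b')])        # joint payoff while punishing a deviator
print(w, path, punishment)  # 5 4 5: the path is inefficient, the punishment is not
```

The punishment outcome (a, b) attains the maximal joint payoff, which is why invaders gain nothing by deviating during it.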
Consider a potential invader s'.
• For any h, s' cannot do better against s than s does against itself, since (s, s) is an equilibrium; hence, for all h,
  (*)  U1^{r,p}(s', s | h) ≥ U1^{r,p}(s, s | h)  (otherwise s' can't invade)
  and so
  (**)  U1^{r,p}(s', s | h) = U1^{r,p}(s, s | h)
• For s' to invade, we need
  (***)  U1^{r,p}(s', s' | h') > U1^{r,p}(s, s' | h') for some h'
• Claim: (***) implies that h' involves a deviation from the equilibrium path of (s, s)
  – the only other possibility is that s' differs from s on the equilibrium path
  – but then s' is punished by s
  – which violates (*)
• After such a deviation the punishment is strongly efficient, so we have
  U1^{r,p}(s, s' | h') + U1^{r,p}(s', s | h̃') = w
• hence, from (**), the right-hand side of (***) equals w − U1^{r,p}(s', s | h̃'); since s' against itself can generate at most w jointly, the strict inequality in (***) is not feasible
For Theorem 3 to hold, p must be small relative to r:
• consider modified Tit-for-Tat against itself (play C if and only if both players took the same action last period)
• with every mistake, there is an expected loss of
  2 − (½ · 3 + ½ · (−1)) = 1 in the first period, and
  2 − 0 = 2 in the second period
• so overall the expected loss from mistakes is approximately 3p(1 + r)/r
• by contrast, a mutant strategy that signals, etc., and doesn't punish at all against itself loses only about p(1 + r)/r
• so if r is small enough relative to p, the mutant can invade
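The loss comparison on this slide can be reproduced arithmetically (a sketch; the particular values p = 0.01 and r = 0.001 are hypothetical):

```python
# Per-mistake losses relative to the cooperative payoff of 2 (slide values),
# scaled by p * (1 + r) / r, the discounted rate of future mistakes.
def total_loss(per_mistake, p, r):
    return per_mistake * p * (1 + r) / r

loss_first = 2 - (0.5 * 3 + 0.5 * (-1))  # the mistaken period itself: 1
loss_second = 2 - 0                      # one punishment period: 2
loss_mtft = loss_first + loss_second     # modified TFT: 3 per mistake
loss_mutant = loss_first                 # non-punishing mutant: 1 per mistake

p, r = 0.01, 0.001  # p large relative to r
edge = total_loss(loss_mtft, p, r) - total_loss(loss_mutant, p, r)
print(loss_mtft, loss_mutant, edge)  # the mutant's edge grows as r -> 0
```

With the per-mistake losses fixed at 3 versus 1, the mutant's advantage scales like 2p(1 + r)/r, which blows up as r → 0 with p held fixed.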