learning-Paris


Bayesian and non-Bayesian Learning in Games
Ehud Lehrer
Tel Aviv University,
School of Mathematical Sciences
Including joint works with:
Ehud Kalai, Rann Smorodinsky, Eilon Solan.
Learning in Games
Informal definition of learning: a decentralized process that
converges (in some sense) to (some) equilibrium.
Bayesian (rational) learning: Players do not start in equilibrium,
but
• they have some initial belief about other players’ strategies
• they are rational: they maximize their payoffs
• they take into account future payoffs
Convergence in REPEATED GAME
Non-Bayesian learning: Players
• don’t have any initial belief about other players’ strategies
• don’t maximize their payoffs
• don’t take into account future payoffs
Convergence (of the empirical frequency) to an equilibrium of the
ONE-SHOT GAME
Bayesian vs. non-Bayesian
Bayesian learning: Players do not start in equilibrium, but they
start with a “grain” of idea about what other players do.
Nature of results: players eventually play something close to an
equilibrium of the repeated game.
Non-Bayesian learning: Players have no idea about other
players’ actions. They don’t care to maximize payoffs.
Nature of results: the statistics of past actions looks like an
equilibrium of the one-shot game.
Important tools
Bayesian learning: merging of two probability measures along a
filtration (an increasing sequence of σ-fields)
Non-Bayesian learning: approachability
Both were initiated by Blackwell (the first with Dubins)
Repeated Games with Vector Payoffs
• I = finite set of actions of player 1.
• J = finite set of actions of player 2.
• M = (m_{i,j}) = a payoff matrix. Entries are vectors in R^d.
A set F is approachable by player 1 if there is a strategy σ s.t.
∀ε > 0, ∃N, ∀τ:
P_{σ,τ}( sup_{n ≥ N} d(x̄_n, F) > ε ) < ε
(Here x̄_n denotes the average payoff vector up to stage n.)

A set F is excludable by player 2 if there are a strategy τ and δ > 0 s.t.
∀ε > 0, ∃N, ∀σ:
P_{σ,τ}( inf_{n ≥ N} d(x̄_n, F) < δ ) < ε
There are sets which are neither approachable nor
excludable.
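The objects in these definitions can be made concrete in a small simulation. The sketch below is my addition, not from the talk: it uses the 2×2 vector-payoff matrix that appears later as the talk's Example, takes F to be the negative orthant of R^2, plays random actions, and tracks d(x̄_n, F).

```python
import random

# 2x2 game with vector payoffs in R^2 (the talk's example matrix).
M = {(0, 0): (-1.0, 1.0), (0, 1): (1.0, -1.0),
     (1, 0): (1.0, 1.0),  (1, 1): (-1.0, -1.0)}

def dist_to_neg_orthant(x):
    # d(x, F) for F = { y : y_k <= 0 }: only positive coordinates matter.
    return sum(max(c, 0.0) ** 2 for c in x) ** 0.5

def average_payoff(history):
    # x_bar_n = (1/n) * sum over the played action pairs of m_{i_t, j_t}
    n = len(history)
    return tuple(sum(M[a][k] for a in history) / n for k in range(2))

random.seed(0)
history = [(random.randint(0, 1), random.randint(0, 1)) for _ in range(1000)]
x_bar = average_payoff(history)
print(x_bar, dist_to_neg_orthant(x_bar))
```

Approachability asks for a strategy making this distance small against every opponent, not just the random one simulated here.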
Approachability
Applications (a sample):
• No-regret (Hannan)
• Repeated games with incomplete information (Aumann-Maschler)
• Learning (Foster-Vohra, Hart-Mas Colell)
• Manipulation of calibration tests (Foster-Vohra, Lehrer,
Smorodinsky-Sandroni-Vohra)
• Generating generalized normal numbers (Lehrer)
Characterization of Approachable Sets
m_{p,q} = Σ_{i,j} p_i m_{i,j} q_j
H(p) = { m_{p,q} : q ∈ Δ(J) }
A closed set F ⊆ R^d is a B-set if for every x ∉ F there is y ∈ F that
satisfies:
1. y is a closest point in F to x.
2. The hyperplane perpendicular to the line xy that passes through y
separates x from H(p), for some p ∈ Δ(I).
[Figure: a point x outside F, its closest point y in F, the line xy, and
the hyperplane through y perpendicular to xy separating x from H(p₀).]
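The B-set condition can be checked numerically. The following is a hedged sketch, not from the talk: the base game (matching pennies) and the choice of F as the negative orthant of regret vectors are my assumed example. For that F the closest point is the coordinatewise minimum with 0, and a candidate p proportional to the positive part of x (the regret-matching distribution) makes the separation hold.

```python
# Base one-shot payoffs of player 1 in matching pennies (assumed example).
u = [[1.0, -1.0], [-1.0, 1.0]]

def regret_payoff(i, j):
    # Vector payoff m_{i,j} in R^2: component k is the regret u(k,j) - u(i,j).
    return [u[k][j] - u[i][j] for k in range(2)]

def bset_point_check(x, tol=1e-9):
    """Verify Blackwell's B-set condition at x for F = the negative orthant.

    y is the closest point of F to x, so d = x - y is the positive part
    of x.  The candidate mixed action p weights actions proportionally
    to d (regret matching); checking pure q suffices because m_{p,q}
    is linear in q."""
    y = [min(c, 0.0) for c in x]
    d = [a - b for a, b in zip(x, y)]
    s = sum(d)
    if s == 0.0:                      # x is already in F: nothing to check
        return True
    p = [c / s for c in d]
    for j in range(2):
        m_pj = [sum(p[i] * regret_payoff(i, j)[k] for i in range(2))
                for k in range(2)]
        if sum(dk * (mk - yk) for dk, mk, yk in zip(d, m_pj, y)) > tol:
            return False
    return True

print(bset_point_check([0.5, 0.2]))   # → True
```

A short calculation shows why this p works: the inner product ⟨d, m_{p,j}⟩ telescopes to exactly zero when p is proportional to d.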
Characterization of Approachable Sets
Theorem [Blackwell, 1956]: every B-set F is approachable.
The approaching strategy plays at each stage n a mixed action p such that
H(p) and x̄_n are separated by the hyperplane through a closest point to
x̄_n in F, perpendicular to the line connecting them.
With this strategy: E_{σ,τ}[ d(x̄_n, F) ] ≤ 2|M| / √n
Theorem [Blackwell, 1956]: every convex set is either
approachable or excludable.
Theorem [Hou, 1971; Spinat, 2002]: every minimal (w.r.t. set
inclusion) approachable set is a B-set.
Or: A set is approachable if and only if it contains a B-set.
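As an illustration of the approaching strategy, here is a sketch in which the vector payoff is the regret vector of a matching-pennies game (my choice of example, not from the talk) and F is the negative orthant. For that target Blackwell's strategy reduces to regret matching, and d(x̄_n, F) decays on the order of 1/√n.

```python
import random

# Matching pennies payoffs for player 1 (assumed example); the vector
# payoff at stage n is the regret vector (u(k, j_n) - u(i_n, j_n))_k,
# and F is the negative orthant of R^2.
u = [[1.0, -1.0], [-1.0, 1.0]]

def play(rounds, rng):
    reg = [0.0, 0.0]                  # cumulative regret vector
    for n in range(rounds):
        # Blackwell's strategy for the negative orthant reduces to
        # regret matching: play k with prob. proportional to reg_k^+.
        pos = [max(r, 0.0) for r in reg]
        s = sum(pos)
        p = pos[0] / s if s > 0 else 0.5
        i = 0 if rng.random() < p else 1
        j = rng.randint(0, 1)         # arbitrary (here random) opponent
        for k in range(2):
            reg[k] += u[k][j] - u[i][j]
    # distance of the average vector payoff to the negative orthant
    avg = [r / rounds for r in reg]
    return sum(max(c, 0.0) ** 2 for c in avg) ** 0.5

print(play(20000, random.Random(1)))  # small, on the order of 1/sqrt(n)
```

The same distance stays small against any opponent, which is exactly Hannan's no-regret property obtained from approachability.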
Bounded Computational Capacity
A strategy is k-bounded-recall if it depends only on the last k
pairs of actions (and it does not depend on previously played
actions).
A (non-deterministic) automaton is given by:
• A finite state space.
• A probability distribution over states, according to which the
initial state is chosen.
• A set of inputs (say, the set I × J of action pairs).
• A set of outputs (say, I , the set of player 1’s actions).
• A rule that assigns to each state a probability distribution over
outputs.
• A transition rule that assigns to every state and every input a
probability distribution over the next state.
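The five ingredients above can be written down directly. The sketch below is my own minimal rendering (class and example names are mine, not from the talk); the usage example is a hypothetical 1-recall strategy that copies the opponent's last action.

```python
import random

class RandomAutomaton:
    """A minimal sketch of a non-deterministic automaton strategy:
    finite states, a random initial state, an output rule mapping
    states to distributions over actions, and a random transition rule."""

    def __init__(self, init_dist, output_rule, transition, rng):
        self.rng = rng
        self.output_rule = output_rule   # state -> dist over outputs
        self.transition = transition     # (state, input) -> dist over states
        self.state = self._draw(init_dist)

    def _draw(self, dist):
        # dist: list of (value, probability) pairs summing to 1
        x, acc = self.rng.random(), 0.0
        for value, prob in dist:
            acc += prob
            if x <= acc:
                return value
        return dist[-1][0]

    def step(self, last_input):
        action = self._draw(self.output_rule[self.state])
        self.state = self._draw(self.transition[(self.state, last_input)])
        return action

# Hypothetical 2-state example: the output follows the state, and the
# state copies the opponent's last action (a 1-recall strategy).
rng = random.Random(0)
out = {0: [(0, 1.0)], 1: [(1, 1.0)]}
tr = {(s, (i, j)): [(j, 1.0)] for s in (0, 1) for i in (0, 1) for j in (0, 1)}
a = RandomAutomaton([(0, 0.5), (1, 0.5)], out, tr, rng)
print([a.step((1, 1)) for _ in range(3)])   # → [1, 1, 1]
```

A k-bounded-recall strategy is the special case whose state space is the set of last-k action-pair histories.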
Approachability and Bounded Capacity
A set F is approachable with bounded-recall strategies by
player 1 if for every ε > 0, the set B(F, ε) := { y : d(y, F) ≤ ε } is
approachable by some bounded-recall strategy.
A set F is excludable against bounded-recall strategies by player 2
if player 2 has a strategy τ and δ > 0 such that
∀ε > 0, for every bounded-recall strategy σ, ∃N:
P_{σ,τ}( inf_{n ≥ N} d(x̄_n, F) < δ ) < ε
Theorem (w/ Eilon Solan): The following statements are equivalent
for closed sets.
1. The set F is approachable with bounded-recall strategies.
2. The set F is approachable with automata.
3. The set F contains a convex approachable set.
4. The set F is not excludable against bounded-recall strategies.
4 points to note:
1. A set is approachable with automata if and only if it is
approachable by bounded-recall strategies.
2. A complete characterization of sets that are approachable
with bounded-recall strategies.
3. A set which is not approachable with bounded-recall
strategies is excludable against all bounded-recall strategies.
4. We do not know whether the same holds for automata.
Example
(-1,1)  (1,-1)
(1,1)   (-1,-1)
(On board)
Good news: in applications, target sets are convex
(a point or a whole orthant, positive or negative).
Approachability in Hilbert space
• I = finite set of actions of player 1.
• J = finite set of actions of player 2.
• M = (m_{i,j}) = a payoff matrix. Entries are points in a Hilbert
space (random variables).
All may change with the stage n.
A set F is approachable by player 1 if there is a strategy σ s.t.
∀ε > 0, ∃N, ∀τ:
P_{σ,τ}( sup_{n ≥ N} d(x̄_n, F) > ε ) < ε
Advantage: allows for infinitely many constraints
Theorem: Suppose that at stage n the average payoff is x̄_n
and y is a closest point in F to x̄_n. If the hyperplane perpendicular to
the line x̄_n y that passes through y separates between x̄_n and H(p), for
some p ∈ Δ(I), then F is approachable.
Approachability and law of large numbers
X_1, X_2, ... are uncorrelated r.v.'s with E(X_i) = 0.
The inner product is E(X_i X_j).
F = {0}.
At any stage n, E( X̄_n X_{n+1} ) = 0.
[Figure: X̄_n and X_{n+1} are orthogonal; the set F = {0}.]
The game: each player has only one action. The payoff at stage n
is X_n. Thus F is approachable; this is the strong law of large
numbers. (When the payoffs are not uniformly bounded, an
additional boundedness condition is needed.)
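A quick numerical illustration of this one-action game (the uniform [-1, 1] payoff distribution is my assumed example): approachability of F = {0} says |x̄_n| shrinks as n grows.

```python
import random

# One-action game whose stage payoff is X_n, i.i.d. with mean 0.
# The running average x_bar_n approaching F = {0} is the strong law.
def running_averages(n_steps, rng):
    total = 0.0
    averages = []
    for n in range(1, n_steps + 1):
        total += rng.uniform(-1.0, 1.0)
        averages.append(abs(total / n))
    return averages

avgs = running_averages(50000, random.Random(42))
print(avgs[99], avgs[-1])   # |x_bar_n| shrinks as n grows
```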
Problem: approachability in normed spaces.
Activeness function
H is L2 (even over a finite probability space).
At stage n the characteristic function K n indicates which
coordinates are active and which are not.
The average payoff at stage n is
X̄_n = ( Σ_{t=1}^n K_t X_t ) / ( Σ_{t=1}^n K_t )
(coordinate-wise).
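The activeness-weighted average can be computed directly (a sketch of mine, not from the talk; the three-coordinate example data is invented): each coordinate is averaged only over the stages at which it was active.

```python
# Activeness-weighted average X_bar_n: K_t is the 0/1 indicator of
# the coordinates active at stage t, and each coordinate of X_bar_n
# averages only the stages where that coordinate was active.
def weighted_average(payoffs, activeness):
    dim = len(payoffs[0])
    num = [0.0] * dim
    den = [0.0] * dim
    for x_t, k_t in zip(payoffs, activeness):
        for c in range(dim):
            num[c] += k_t[c] * x_t[c]
            den[c] += k_t[c]
    # a coordinate of X_bar_n is defined only once it has been active
    return [num[c] / den[c] if den[c] > 0 else None for c in range(dim)]

payoffs = [[1.0, 2.0, 5.0], [3.0, 4.0, 7.0]]
activeness = [[1, 0, 0], [1, 1, 0]]
print(weighted_average(payoffs, activeness))   # → [2.0, 4.0, None]
```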
Applications:
1. Repeated games with incomplete information – different games are
active at different times.
2. Construction of normal numbers.
3. Manipulability of many calibration tests.
4. A general no-regret theorem (against many replacing schemes).
5. Convergence to correlated equilibria along many sequences.
Activeness function – cont.
Theorem: Suppose that F is convex. Let y_n be the closest point in F to
the average payoff at time n, X̄_n. If the hyperplane perpendicular to
the line

K_{n+1} ( X̄_n − y_n ) / Σ_{t=1}^n K_t

that passes through y_n separates between X̄_n and H(p), for some
p ∈ Δ(I), then F is approachable.