0, 1 - Carnegie Mellon School of Computer Science
Download
Report
Transcript 0, 1 - Carnegie Mellon School of Computer Science
Communication Complexity as a
Lower Bound for Learning in Games
Vincent Conitzer and Tuomas Sandholm
Computer Science Department
Carnegie Mellon University
Overview
• Little work has been done on finding lower bounds on
convergence time for learning in games
• Here, we derive such lower bounds for settings where
the other player’s payoffs are not observed
– Other player’s actions can be observed
• We use tools from communication complexity theory to
get these lower bounds
Matrix games
Player 2
(simultaneously)
chooses a column
Player 1
chooses a row
4, 2
3, 0
Player 1’s
payoff
0, 3
1, 1
Player 2’s
payoff
Solution concepts
(Strict) Dominance:
one strategy always
better than another
4, 2
0, 3
3, 0
1, 1
Iterated Dominance:
repeat dominance process
4, 2
0, 3
3, 0
1, 1
4, 2
0, 3
3, 0
1, 1
(Pure) Nash Equilibrium: pair of
strategies so that neither player
wants to unilaterally deviate
Learning in games (in this paper)
4, 2
3, 0
4, ?
3, ?
0, ?
1, ?
0, 3
1, 1
?, 2
?, 0
?, 3
?, 1
Many different algorithms for
learning in games…
• Iterated best response
• Fictitious play
• …
• Sometimes don’t converge, but when they do:
How do we know there are no faster algorithms?
Communication lower bound
• Suppose players are only interested in converging to a
solution as soon as possible
– No concern for payoffs while learning
– No concern for which of the solutions they converge to
• They still need to communicate enough information to
compute a solution
• In n x n game, in a round, each player can communicate
log(n) bits to the other by choice of strategy
• If c is the number of bits of communication required to
compute the solution…
• … then require at least c/(2log(n)) rounds to convergence
• Communication complexity theory tells us c
Two-player model for
communication complexity [Yao 79]
• Player 1 holds input x, player 2 holds input y
• They seek to compute the value of a known function f(x, y)
– Mapping to {0, 1}
• Players alternatingly send a bit according to protocol
• When protocol terminates function value should be known
• Deterministic communication complexity = worst case
number of bits sent for one pair x, y for the best protocol
• Nondeterministic protocols: next bit to be sent chosen
nondeterministically
– Allowed false positives OR false negatives as long as always
correct for some nondeterministic choices
A lower bounding technique from
communication complexity
• Tree of all possible
communications:
Player 1
Player 2
• Every input pair x, y goes
to some leaf
0
0
1
1 0
1
f=1 f=1 f=0
x1, y1
x2, y1
x2, y3
• If x1, y1 goes to same leaf as x2, y2, then so must x1, y2 and x2, y1
– Only possible if f is same in all 4 cases
• Suppose we have a fooling set {(x1, y1), (x2, y2), …, (xn, yn)} all with
f(xi, yi) = f0, but for any i j, either f(xi, yj) f0 or f(xj, yi) f0
– So all n pairs xi, yi go to different leaves => tree depth log(n)
f=0
• Also lower bound on nondeterministic comm. complexity
– With false positives or negatives allowed, depending on f0
Our strategy
• For each solution concept we study, we give a
communication protocol for finding a solution
– Gives upper bound on communication complexity O(f(n))
• Then we give a large fooling set
– Matching lower bound on communication complexity (f(n))
– Fooling set F here is set of games each of which has (does not
have) at least one solution, and
– for any pair of games g1 and g2 in F, either
• using the row player’s payoffs from g1 and the column player’s payoffs
from g2 yields a game that has no solution (at least one solution), or
• using the row player’s payoffs from g2 and the column player’s payoffs
from g1 yields a game that has no solution (at least one solution)
1, 1
0, 0
0, 0
1, 1
0, 0
1, 1
1, 1
0, 0
Row payoffs
Column payoffs
1, 0
0, 1
0, 1
1, 0
Pure Nash – upper bound
• What is the communication complexity of
determining whether a pure-strategy Nash
equilibrium exists?
• One protocol: communicate for every entry
whether you would deviate from it
3, ? 2, ? 0, ?
3, ? 4, ? 1, ?
1, ? 4, ? 0, ?
Nash Eq.
?, 2 ?, 3 ?, 3
?, 4 ?, 0 ?, 3
?, 2 ?, 2 ?, 3
• One bit per entry, so O(n2) communication
Pure Nash – lower bound
• Consider games with all entries
0, 1 or 1, 0
• Almost never have a pure Nash eq
– Exception: whole row of 1, 0 or
whole column of 0, 1
– Exclude exceptions from fooling set
0, 1 1, 0 0, 1
1, 0 0, 1 1, 0
1, 0 1, 0 0, 1
0, 1 1, 0 0, 1
0, 1 0, 1 1, 0
1, 0 1, 0 0, 1
• But for any two different games,
can combine payoffs to get 1, 1
Nash equilibrium:
Row payoffs
• Fooling set of size (almost) 2^(n2),
so require (n2) communication
Column payoffs
0, 1 1, 0 0, 1
1, 1 0, 1 1, 0
1, 0 1, 0 0, 1
Pure Nash with unique best response
– upper bound
• What if the column player always
has a unique best response?
• Let just her communicate her best
responses
• Now row player knows whether a
pure Nash eq exists
– can communicate this in one bit
?, 2 ?, 2 ?, 3
?, 4 ?, 0 ?, 3
?, 2 ?, 2 ?, 3
3, ? 2, ? 0, ?
3, ? 4, ? 1, ?
1, ? 4, ? 0, ?
• Pointing out a unique best response requires log(n) bits of
communication, so O(n log(n)) communication total
Pure Nash with unique best response
- lower bound
• Consider (generalization to
n x n of) this game:
– No Nash eq
– Column player has unique
best response
• Any permutation of the rows
will still have no Nash eq
swapped
0, 1
1, 0
1, 0
1, 0
0, 1
1, 0
1, 0
1, 0
0, 1
0, 1
1, 0
1, 0
1, 0
1, 0
0, 1
1, 0
0, 1
1, 0
Row payoffs
Column payoffs
• But combining payoffs as
0, 1
1, 0
1, 0
before always yields game
1, 0
0, 0
1, 1
with Nash equilibria
1, 0
1, 1
0, 0
• Fooling set of size n!
• => require (log(n!)) = (n log(n)) communication
Iterated strict dominance
– upper bound
• One protocol: let players alternatingly communicate a
dominated strategy
3, ?
4, ?
1, ?
2, ?
4, ?
1, ?
1, ?
0, ?
0, ?
3, ?
4, ?
1, ?
2, ?
4, ?
1, ?
1, ?
0, ?
0, ?
?, 2
?, 4
?, 4
?, 3
?, 2
?, 3
?, 0
?, 1
?, 5
?, 2
?, 4
?, 4
?, 3
?, 2
?, 3
?, 0
?, 1
?, 5
• Communicating a strategy takes log(n) bits, at most 2n
strategies to eliminate, so O(n log(n)) communication
Iterated strict dominance lower bound
• Consider (generalization
to n x n of) this game:
• Any permutation of the
rows will also solve:
swapped
• But combining the
payoffs will not solve:
• Fooling set of size n!, so
require at least (log(n!)) =
(n log(n)) communication
0, 1
1, 0
1, 0
1, 0
0, 1
0, 1
1, 0
1, 0
0, 1
0, 1
0, 1
1, 0
0, 1
0, 1
0, 1
1, 1
0, 1
1, 0
1, 0
1, 0
0, 1
0, 1
1, 0
1, 0
0, 1
0, 1
1, 0
0, 1
0, 1
0, 1
1, 1
0, 1
Row payoffs
Column payoffs
0, 1
1, 0
1, 0
0, 1
0, 1
1, 0
0, 1
0, 1
0, 1
0, 1
1,STUCK!
1
1, 1
1, 0
1, 0
0, 0
0, 1
Conclusions & additional results
• Pure Nash equilibrium: (n2)
• Pure Nash eq with unique best response: (n log
n)
• Iterated strict dominance: (n log n)
– Whether or not dominance by mixed strategies allowed
• Iterated weak dominance: (n log n)
– Whether or not dominance by mixed strategies allowed
– O(n log n) only known to hold for nondeterministic
• Backward induction: (n) where n = #nodes in
tree
Future research
• How do specific learning algorithms compare?
• Can we find sensible learning algorithms that achieve the
lower bounds?
– Taking into account agents’ preferences over solutions and over
paths of learning
• If not, can we get stronger lower bounds by restricting
the allowed communication?
– E.g., allow players to only communicate with actions that
appear to give good payoffs
Thank you for your attention!