
Seminar on
Coordinated Systems
SYS 793
Fall 2004
Department of Systems and
Information Engineering
University of Virginia
Motivation
• We are increasingly reliant on large-scale,
distributed engineering systems: Internet, national
power grid, etc.
– coordination is often achieved in engineering systems through the
specification of ad hoc protocols for relatively well defined
(constrained) interactions between distributed systems.
– existing protocols are not adequately tuned for new applications
and/or unexpected situations
– there appears to be little in the way of underlying guiding principles
and theory for designing and operating such systems
– Notions of game theory and decentralized control go only part way
toward revealing the basic problems associated with distributed
engineering systems
SYS 793
• In this seminar:
– we will explore the recent literature on the
intersections between game theory,
distributed planning in robotics and artificial
intelligence, and distributed control.
– Faculty and students are expected to present
a paper (to be chosen from the list given
below).
• After each presentation we shall have a discussion
session where the main contributions of each
paper will be critically assessed.
Parameters
• Time:
– Wednesdays, 5-6:15
• Place:
– Olsson 005
• Credit:
– One hour, pass/fail
• Registration:
– Schedule number 95083 (SYS 793, Section 2)
• Webpage:
https://toolkit.itc.virginia.edu/cgilocal/tk/UVa_SEAS_2004_Fall_SYS793-2
Student Responsibilities
• To earn a passing grade, each student will:
– prepare a presentation on at least one
approved paper
– lead subsequent discussion on salient features
of the paper
– hand over slides for publication on the course
website
Partial List of Approved Papers
R. Aumann. Subjectivity and correlation in randomized strategies. Journal of Mathematical Economics, 1:67–96, 1974.
H. Bui, S. Venkatesh, and D. Kieronska. A framework for coordination and learning among teams of agents. Lecture
Notes in Computer Science, 1441:164–178, 1998.
J. Doran, S. Franklin, N. Jennings, and T. Norman. On cooperation in multi-agent systems. The Knowledge Engineering
Review, 12(3):309–314, 1997.
G. Ellison. Learning, local interaction and coordination. Econometrica, 61:1047–1071, 1993.
D. Gauthier. Coordination. Dialogue, 14:195–221, 1975.
D. Gilbert. Rationality and salience. Philosophical Studies, 57:61–77, 1989.
V. Gervasi and G. Prencipe. Robotic cops: The intruder problem. In Proceedings of IEEE International Conference on
Systems, Man, and Cybernetics, 2003.
M. Kandori, G. Mailath, and R. Rob. Learning, mutation and long-run equilibria in games. Econometrica, 61:29–56,
1993.
G. Prencipe. Corda: Distributed coordination of a set of autonomous mobile robots. In Proceedings of the Fourth
European Research Seminar on Advances in Distributed Systems, pages 185–190, 2001.
P. Vanderschraaf. Learning and coordination: inductive deliberation, equilibrium and convention. Routledge, 2001.
X. Wang and T. Sandholm. Reinforcement learning to play an optimal Nash equilibrium in team Markov games. In
Advances in Neural Information Processing Systems 15 (NIPS-2002), 2002.
G. Weiss. Multiagent Systems: a Modern Approach to Distributed Artificial Intelligence. The MIT Press, Cambridge,
1999.
Agenda
• Coordinated Systems Research Group
(CSRG)
• Introduction to “Coordinated Systems”
Coordinated Systems
Research Group
CSRG Mandate
• The CSRG focuses on issues of coordination in large
scale decentralized engineering systems.
– Engineering systems are increasingly reliant on coordination,
more than on (centralized) optimization processes and/or
control.
– Typically, coordination is achieved through ad hoc protocols for
relatively well defined (constrained) interactions between
distributed systems.
– However, as information systems become more integrated into
society, we find that existing protocols aren’t adequately tuned
for new applications and/or unexpected situations.
• CSRG goals:
– Develop theory for achieving coordination with limited
communication and without the benefit of predefined protocols
– Transform theory into practice by developing and evaluating
prototype applications
CSRG Activities
• Distributed decision-making without communication
– fault tolerant and adaptive construction of overlay networks for
group communication and collaboration.
– remote sensing.
– robotic coordination.
• Auction mechanisms for “on-demand” IT services
– market mechanisms for efficient allocation of distributed
computation and communication services
• Fictitious play in Internet traffic engineering
– account for competing interests of end-to-end connections within
and across Internet autonomous systems
Peter Beling
• Associate Professor
– Department of Systems and
Information Engineering
• Research:
– Financial engineering
– Optimization theory &
computational complexity
• Funded Projects:
– Solution Concepts for Static
Coordination Problems
• NASA LaRC Grant NNL-04-AA66G
Alfredo Garcia
• Assistant Professor
– Department of Systems and
Information Engineering
• Research:
– Modeling and control of
communications networks
– Stochastic Optimization and
Optimal Control
• Funded Projects:
– Complex Networks
Optimization
• NSF Grant DMI-0217371
– Security of Supply & Strategic
Learning in Restructured
Power Markets
• NSF Grant ECS-0224747
Stephen Patek
• Assistant Professor
– Department of Systems and
Information Engineering
• Research:
– Modeling and control of
communications networks
– Stochastic Optimization and
Optimal Control
– Coordination Processes
• Funded Projects:
– Dynamic Coordination Processes
for Distributed Planning with
Limited Communication, NSF
(DST-0414727)
Current Students
• Himanshu Gupta
• Kaushik Sinha
• Yijia Zhao
Introduction to Coordinated
Systems
Example: Dial / Wait
• Two Players, Two Actions, with a common cost (1 = not connected, 0 = connected):

                     Player 2: Dial    Player 2: Wait
   Player 1: Dial          1                 0
   Player 1: Wait          0                 1
• What makes sense?
– Two “coordinated minimum-cost solutions” :
(Dial, Wait) and (Wait, Dial).
– Unfortunately, neither player knows which one to select.
Some Thoughts
• Arbitrarily selecting an action is irrational!
– If Player 1 arbitrarily chooses to Dial, Player 2 could also arbitrarily
choose to Dial, resulting in a worst case outcome.
– Worst case solutions are easy to achieve arbitrarily (without
coordination).
• Randomly selecting an action makes more sense!
– If Player 1 chooses to Dial with probability p and Player 2 chooses to
Dial with probability q, then the expected cost of the outcome is
V(p, q) = E[Cost | p, q] = pq + (1 − p)(1 − q) = Pr(Not connecting)
– Only by introducing “mixed actions” (randomized decisions) is it
meaningful to talk about expected cost. (A small computational sketch of
this cost function follows below.)
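As a rough illustration (our own addition, not part of the seminar slides), the following minimal Python sketch evaluates the expected-cost function V(p, q) = pq + (1 − p)(1 − q) for the Dial / Wait game; the function name and the printed test points are illustrative assumptions.

```python
def expected_cost(p: float, q: float) -> float:
    """Expected cost in the Dial / Wait game when Player 1 Dials with
    probability p and Player 2 Dials with probability q."""
    # Cost is 1 if both Dial or both Wait (not connected), 0 otherwise.
    return p * q + (1 - p) * (1 - q)

if __name__ == "__main__":
    print(expected_cost(1.0, 1.0))  # both always Dial -> 1.0 (never connect)
    print(expected_cost(1.0, 0.0))  # one Dials, one Waits -> 0.0 (always connect)
    print(expected_cost(0.5, 0.5))  # both randomize 50/50 -> 0.5
```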
Interesting Observations
• Suppose Player 1 chooses p = .5; then
E[Cost | .5, q] = .5q + (1 − .5)(1 − q) = .5[q + (1 − q)] = .5
In other words, regardless of Player 2’s decision, Player 1 is able to “lock
in” an expected cost of .5.
• Suppose Player 2 chooses q = .5; then
E[Cost | p, .5] = p(.5) + (1 − p)(1 − .5) = .5[p + (1 − p)] = .5
In other words, regardless of Player 1’s decision, Player 2 is able to lock in
an expected cost of .5.
A Strong Equilibrium
Each player has the ability to lock in an
expected cost of .5, namely by choosing p = .5
and q = .5, respectively.
– In fact, given p = .5 and q = .5, neither player has
the ability to change (let alone improve) the
expected cost of the solution.
– So, p = .5 and q = .5 constitutes a strong
equilibrium solution for the Dial / Wait problem.
(A brief numerical check of this lock-in property follows below.)
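To make the lock-in claim concrete, here is a minimal, self-contained check (our own sketch, not from the slides) that sweeps Player 2’s Dial probability over a grid and confirms the expected cost stays at .5 whenever Player 1 mixes 50/50; by symmetry the same holds with the roles reversed.

```python
def expected_cost(p: float, q: float) -> float:
    # Expected cost in the Dial / Wait game: Pr(both Dial) + Pr(both Wait).
    return p * q + (1 - p) * (1 - q)

if __name__ == "__main__":
    # Player 1 fixes p = 0.5; sweep Player 2's Dial probability q over a grid.
    for q in [i / 10 for i in range(11)]:
        assert abs(expected_cost(0.5, q) - 0.5) < 1e-12, "lock-in property violated"
    print("With p = 0.5, the expected cost is 0.5 for every q tested.")
```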
Connection to Game Theory
• The Dial / Wait problem is a finite, two-player, noncooperative game in
strategic form (with identical interests)
• For the Dial / Wait game there are two pure strategy
Nash equilibria:
– (Dial, Wait)
– (Wait, Dial)
and exactly one mixed (non-pure) strategy
Nash equilibrium:
– p = .5, q = .5.
(A brute-force check of these equilibria is sketched below.)
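As a hedged illustration (our own addition), the sketch below brute-forces the pure-strategy Nash equilibria of the 2×2 identical-interest cost matrix used above (1 = not connected, 0 = connected); a profile is a pure equilibrium when neither player can lower the common cost by deviating alone.

```python
# Common cost matrix: rows index Player 1's action, columns index Player 2's.
ACTIONS = ["Dial", "Wait"]
COST = [[1, 0],   # (Dial, Dial) -> 1, (Dial, Wait) -> 0
        [0, 1]]   # (Wait, Dial) -> 0, (Wait, Wait) -> 1

def pure_nash_equilibria(cost):
    """Return action profiles from which no unilateral deviation lowers the cost."""
    equilibria = []
    for i in range(2):
        for j in range(2):
            p1_stable = all(cost[i][j] <= cost[k][j] for k in range(2))
            p2_stable = all(cost[i][j] <= cost[i][k] for k in range(2))
            if p1_stable and p2_stable:
                equilibria.append((ACTIONS[i], ACTIONS[j]))
    return equilibria

if __name__ == "__main__":
    print(pure_nash_equilibria(COST))  # [('Dial', 'Wait'), ('Wait', 'Dial')]
```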
Remarks
• The pure strategy equilibria are completely uninteresting.
– They can only be achieved if the players are allowed to coordinate their
actions.
• Minimax is correct here, but it is not a reasonable approach in
general.
• The mixed strategy equilibrium makes a lot of sense.
– It achieves a decent value of expected cost and is not the result of an
arbitrary decision.
– In fact, all that’s required in this example is that one of the two players
play their equilibrium strategy.
• But, in general, there can be multiple mixed strategy
equilibria, and it is not obvious which one each player would
select.
N-Agent, Multistage Problems
Uncoordinated Decision-Making
We consider N-agent multistage decision
situations, where
– all N agents must select an available action at each
stage, without coordinating their actions in
advance (simultaneous decision making)
– all N agents perceive the same cost (disutility)
associated with the joint selection of actions at each
stage.
Key Idea
Since all agents share the same notion of cost,
they would coordinate their actions (if they
could) to pick out a minimum-cost joint
solution.
– This would be easy, even without coordination, if
there were a single minimum-cost joint selection
of actions.
– Unfortunately, if more than one “coordinated
minimum-cost solution” exists, then there may not
be a clear course of action for all agents.
Dynamic Version of the Dial / Wait
Problem
[State diagram: the process stays in the “not connected” state when the joint action is
(Dial, Dial) or (Wait, Wait), and moves to the absorbing “connected” state when it is
(Dial, Wait) or (Wait, Dial).]
• Both players decide to Dial or Wait in stages.
– If both decide to Dial or if both decide to Wait, then they remain
unconnected.
• Both players are interested in reaching the connected state in as few stages as
possible. (A simulation sketch of this repeated game follows below.)
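Assuming both players independently Dial with probability .5 at every stage (the mixed equilibrium of the one-shot game), the following simulation sketch, which is our own illustration rather than part of the slides, estimates how many stages it takes to connect; with success probability .5 per stage, the average should be near 2.

```python
import random

def stages_to_connect(p: float = 0.5, q: float = 0.5, rng=random) -> int:
    """Play the Dial / Wait game repeatedly until the players connect
    (exactly one Dials); return the number of stages used."""
    stages = 0
    while True:
        stages += 1
        dial1 = rng.random() < p   # True = Dial, False = Wait
        dial2 = rng.random() < q
        if dial1 != dial2:         # one Dials while the other Waits -> connected
            return stages

if __name__ == "__main__":
    rng = random.Random(0)
    trials = 100_000
    average = sum(stages_to_connect(rng=rng) for _ in range(trials)) / trials
    print(f"average stages to connect: {average:.3f}")  # roughly 2.0
```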
Formalization
• State space, X
– Set of conditions associated with the operation of an underlying system.
• Mixed action sets, Ai(x)
– Actions available to player i when the system is in state x ∈ X.
• Transition reward function, gx(a1, …, aN)
– Expected reward (perceived equally by all players) associated with the profile
of mixed actions (a1, …, aN) at state x ∈ X.
• State transition probability, pxy(a1, …, aN)
– Probability of transitioning from x ∈ X to y ∈ X under the profile of actions
(a1, …, aN).
• Time horizon, T
– Number of stages of decision-making before the process ends.
(A data-structure sketch of this formalization follows below.)
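As a rough, non-authoritative sketch of how this formalization might be carried into code (every name below is our own assumption), a Python dataclass can bundle the state space, per-player action sets, common transition reward, transition probabilities, and time horizon; the tiny instance at the end encodes the dynamic Dial / Wait problem.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

State = str                          # element of the state space X
ActionProfile = Tuple[str, ...]      # one action per player: (a1, ..., aN)

@dataclass
class IdenticalInterestGame:
    states: List[State]                                         # X
    actions: Dict[Tuple[State, int], List[str]]                 # Ai(x) for player i at state x
    reward: Callable[[State, ActionProfile], float]             # g_x(a1, ..., aN), common to all players
    transition: Callable[[State, ActionProfile, State], float]  # p_xy(a1, ..., aN)
    horizon: int                                                # T, number of decision stages

def dial_wait_game() -> IdenticalInterestGame:
    """Illustrative instance: two players try to reach the 'connected' state."""
    def reward(x, a):
        # Common reward of 1 only when the players connect from the unconnected state.
        return 1.0 if x == "not_connected" and a[0] != a[1] else 0.0

    def transition(x, a, y):
        if x == "connected":                        # absorbing terminal state
            return 1.0 if y == "connected" else 0.0
        connects = a[0] != a[1]                     # exactly one player Dials
        return 1.0 if (y == "connected") == connects else 0.0

    actions = {}
    for i in range(2):
        actions[("not_connected", i)] = ["Dial", "Wait"]
        actions[("connected", i)] = ["Wait"]

    return IdenticalInterestGame(
        states=["not_connected", "connected"],
        actions=actions,
        reward=reward,
        transition=transition,
        horizon=10,
    )
```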
Network Connection Recovery
Example
Set Up
• Consider a network in which n connections are
served by a direct link between two nodes A and B.
[Figure: nodes A and B carrying n connections; two alternative links of
capacities m and n − m connect A to B.]
• Suppose two alternative links exist with capacities m and
(n − m), respectively.
• Note that all n connections can still be accommodated, but how
should they re-route themselves?
Dynamic Recovery
• The initial state of this process is < m, n-m >.
• Rules:
– If k < m connections select the top link, then those k connections
are satisfied, but the other link is overwhelmed.
• The system transitions to < m-k, n-m >, with n-k connections left to
be satisfied.
• We are left with a network routing problem similar to the original one
but involving the same or fewer unsatisfied connections.
– If k = m connections select the top link, then all n connections are
satisfied.
• The system transitions to the terminal (fully recovered) state.
– If k > m connections select the top link, then the top link is
overwhelmed, but the n-k other connections are satisfied.
• The system transitions to < m, k-m >, with k connections left to be
satisfied.
• We are left with a network routing problem similar to the original one
but involving the same or fewer unsatisfied connections.
• All players randomize their selection of links. (A simulation sketch of this
recovery process follows below.)
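The following simulation sketch of this recovery dynamic is our own illustration: the randomization rule (each unsatisfied connection picks the top link with probability proportional to its remaining capacity) is an assumption, since the slides only state that connections randomize.

```python
import random

def recovery_stages(m: int, n: int, seed: int = 0) -> int:
    """Simulate the randomized recovery process; return the number of stages
    until all n connections are re-routed over links of capacity m and n - m."""
    rng = random.Random(seed)
    top, bottom = m, n - m      # remaining capacities; unsatisfied connections = top + bottom
    stages = 0
    while top + bottom > 0:
        stages += 1
        remaining = top + bottom
        # Assumed rule: choose the top link with probability proportional to its capacity.
        p_top = top / remaining
        k = sum(rng.random() < p_top for _ in range(remaining))
        if k < top:        # top link under-used: k connections satisfied, bottom link overwhelmed
            top -= k
        elif k == top:     # exact fit: every remaining connection is satisfied
            top = bottom = 0
        else:              # top link overwhelmed: the (remaining - k) bottom choices are satisfied
            bottom = k - top
    return stages

if __name__ == "__main__":
    print(recovery_stages(m=3, n=10))   # stages needed in one sample run
```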
Remote Sensing
Example
Problem Overview
• N sensing platforms
– Identical in capability
• M targets
• Autonomous planning
• Strict collision avoidance protocol
– Impossible for two or more platforms to investigate a
single target simultaneously
• Finite Time Horizon, T
Simple Example
• Parameters
– 3 UAVs
– 5 points of interest
– 2 sensing opportunities
• Stage 1
– Green UAV successfully
reaches its target
– Blue and Red UAVs
compete and fail to make
the necessary
observations
Simple Example
• Stage 2
– Green and Blue UAVs
compete for a new target
– Red UAV is successful
• Summary
– We end up only
observing two targets!
– (With coordination, we
could have arranged for
all points of interest to be
revealed. A simulation sketch of
this uncoordinated setting follows below.)
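To quantify the loss from uncoordinated selection in this example, the sketch below (our own illustration; the uniform-random decision rule is an assumption, not the seminar’s) estimates how many of the 5 points of interest 3 UAVs observe in 2 sensing opportunities when each UAV independently picks a not-yet-observed target at random and any target chosen by two or more UAVs goes unobserved that stage.

```python
import random
from collections import Counter

def targets_observed(n_uavs=3, n_targets=5, n_stages=2, rng=None) -> int:
    """One run of the uncoordinated sensing example; returns distinct targets observed."""
    rng = rng or random.Random()
    unobserved = set(range(n_targets))
    for _ in range(n_stages):
        if not unobserved:
            break
        picks = Counter(rng.choice(sorted(unobserved)) for _ in range(n_uavs))
        # Only targets chosen by exactly one UAV are observed (collision avoidance).
        unobserved -= {t for t, count in picks.items() if count == 1}
    return n_targets - len(unobserved)

if __name__ == "__main__":
    rng = random.Random(0)
    trials = 50_000
    average = sum(targets_observed(rng=rng) for _ in range(trials)) / trials
    print(f"average targets observed without coordination: {average:.2f} of 5")
```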
Research Question
What decision rule should we implement
within each sensing platform so that we gather
as much information as possible within the
available time, without requiring the platforms
to coordinate their actions?
SYS 793 Topics of Interest
Topics of Interest
• Game Theory
– solution concepts for rational decision-making
– alternatives to Nash and correlated equilibria for noncooperative games
– equilibrium selection
• Learning algorithms for games
– including fictitious play
– reinforcement learning
• Multi-Agent Frameworks/Applications
– Robotic Cops, Robo-anything
– Team theory
• Philosophy of Coordination