GSMDPs for Multi-Robot Sequential Decision Making
Download
Report
Transcript GSMDPs for Multi-Robot Sequential Decision Making
By: Messias, Spaan, Lima
Presented by: Mike Plasker
DMES – Ocean Engineering
Introduction
Robotic Planning under uncertainty
MDP solutions
Limited real-world application
Assumptions for Multi-Robot teams
Communication (Inexpensive, free, or costly)
Synchronous and steady state transitions
Discretization of environment
A Different Approach
States and actions discrete (like MDP)
Continuous measure of time
State transitions regarded as random ‘events’
Advantages
Non-Markovian effects of discretization minimized
Fully reactive to changes
Communication only required for ‘events’
GSMDPs
Generic temporal probability distributions over events
Can model concurrent (persistently enabled) events
Solvable by discrete-time MDP algorithms by
obtaining an equivalent (semi-)Markovian model
Avoids negative effects of synchronous alternatives
Why GSMDPs for Robotics
Cooperative Robotics requires:
Operation in inherently continuous environments
Uncertainty in actions (and observations)
Joint decision making for optimization
Reactive
Definitions
multiagent GSMDP: tuple <d, S, X, A, T, F, R, C, h>
d = number agents
S = state space (contains state factors)
X = state factors
A = set of joint actions
T = transition function
F = time model
R = instantaneous reward function
C = cumulative reward rate
h = planning over continuous time
Definitions
Event in a GSMDP:
An abstraction to state transitions that share the same
properties
Persistently enabled events:
Events that are enabled from step ‘t’ to step ‘t+1’, but not
triggered at step ‘t’
Common Approach
Synchronous action
Pre-defined time step
• Performance
• Reaction time
GSMDPs
Persistently enabled events modeled by allowing their
temporal distributions to depend on the time they
were enabled
Explicit modeling of non-Markovian effects from
discretization
Communication efficiency
Modeling Events
Group state transitions as events to minimize temporal
distributions and transitions(battery low)
Transition function found by estimating relative
frequency of each transition in the event
Time model found by timing the transition data
Approximated as a phase-type distribution
Replaces events with acyclic Markov chains
Events (cont.)
Not always possible
Decompose events with minimum duration into
deterministically timed transitions
Can then better approximate using phase-type
distribution
Solving a GSMDP
Can be viewed as an equivalent discrete-time MDP
Almost all solution algorithms for MDPs work
Experiment
Robotic soccer
Score a goal (reward 150)
Passing around obstacle (reward 60)
Results
MDP: T = 4s
GSMDP
Results
No idle time
Reduced
communication
Improved scoring
efficiency
System failures (zero
goals) independent
of model
Example Video
Future Work
Extend to partially observable domains
Apply bilateral phase distributions to increase the class
of non-Markovian events that are able to be modeled
Questions?
MESSIAS, J.; SPAAN, M.; LIMA, P.. GSMDPs for Multi-Robot Sequential DecisionMaking. AAAI Conference on Artificial Intelligence, North America, jun. 2013. Available
at: <http://www.aaai.org/ocs/index.php/AAAI/AAAI13/paper/view/6432/6843>. Date
accessed: 06 Apr. 2014