GSMDPs for Multi-Robot Sequential Decision Making

Download Report

Transcript GSMDPs for Multi-Robot Sequential Decision Making

By: Messias, Spaan, Lima
Presented by: Mike Plasker
DMES – Ocean Engineering
Introduction
 Robotic Planning under uncertainty
 MDP solutions
 Limited real-world application
Assumptions for Multi-Robot teams
 Communication (Inexpensive, free, or costly)
 Synchronous and steady state transitions
 Discretization of environment
A Different Approach
 States and actions discrete (like MDP)
 Continuous measure of time
 State transitions regarded as random ‘events’
Advantages
 Non-Markovian effects of discretization minimized
 Fully reactive to changes
 Communication only required for ‘events’
GSMDPs
 Generic temporal probability distributions over events
 Can model concurrent (persistently enabled) events
 Solvable by discrete-time MDP algorithms by
obtaining an equivalent (semi-)Markovian model
 Avoids negative effects of synchronous alternatives
Why GSMDPs for Robotics
Cooperative Robotics requires:
 Operation in inherently continuous environments
 Uncertainty in actions (and observations)
 Joint decision making for optimization
 Reactive
Definitions
multiagent GSMDP: tuple <d, S, X, A, T, F, R, C, h>
d = number agents
S = state space (contains state factors)
X = state factors
A = set of joint actions
T = transition function
F = time model
R = instantaneous reward function
C = cumulative reward rate
h = planning over continuous time
Definitions
Event in a GSMDP:
An abstraction to state transitions that share the same
properties
Persistently enabled events:
Events that are enabled from step ‘t’ to step ‘t+1’, but not
triggered at step ‘t’
Common Approach
 Synchronous action
 Pre-defined time step
• Performance
• Reaction time
GSMDPs
 Persistently enabled events modeled by allowing their
temporal distributions to depend on the time they
were enabled
 Explicit modeling of non-Markovian effects from
discretization
 Communication efficiency
Modeling Events
 Group state transitions as events to minimize temporal




distributions and transitions(battery low)
Transition function found by estimating relative
frequency of each transition in the event
Time model found by timing the transition data
Approximated as a phase-type distribution
Replaces events with acyclic Markov chains
Events (cont.)
 Not always possible
 Decompose events with minimum duration into
deterministically timed transitions
 Can then better approximate using phase-type
distribution
Solving a GSMDP
 Can be viewed as an equivalent discrete-time MDP
 Almost all solution algorithms for MDPs work
Experiment
 Robotic soccer
 Score a goal (reward 150)
 Passing around obstacle (reward 60)
Results
MDP: T = 4s
GSMDP
Results
 No idle time
 Reduced
communication
 Improved scoring
efficiency
 System failures (zero
goals) independent
of model
Example Video
Future Work
 Extend to partially observable domains
 Apply bilateral phase distributions to increase the class
of non-Markovian events that are able to be modeled
Questions?
MESSIAS, J.; SPAAN, M.; LIMA, P.. GSMDPs for Multi-Robot Sequential DecisionMaking. AAAI Conference on Artificial Intelligence, North America, jun. 2013. Available
at: <http://www.aaai.org/ocs/index.php/AAAI/AAAI13/paper/view/6432/6843>. Date
accessed: 06 Apr. 2014