Transcript RACE

Decision-Theoretic Planning with
(Re)Deployment of Components in
Distributed Real-time & Embedded Systems
Douglas C. Schmidt
Nishanth Shankaran, John S. Kinnebrew,
Gautam Biswas, Dipa Suri, & Adam S. Howell
Research Sponsored by NASA, Lockheed Martin, & Raytheon
Enterprise Distributed Real-time Embedded (DRE) Systems
• Operate under limited resources
• Tight real-time performance QoS constraints
• Dynamic & uncertain environments
• Dynamic & changing goals
• Distribution of computation
• Multiple onboard processors
• Task distribution among satellites/processors
• Integration of information
• Data collection – Gizmo agent
• Process data – Science agent
• Downlink to earth – Comm. agent
• Coordinated operation
(Figure callouts: Limited CPU, memory & network bandwidth; Possible hardware failure)
Motivating Application: Earth Science Enterprise Mission
(Figure: Trigger → Collect Data → Process Data → Downlink to Earth, with data passed along as Message A, Message B, & the Final Result)
System Description
• End-to-end system tasks/work-flows are represented as operational strings of components
• Classes of operational strings with respect to importance: Mission Critical, Mission Support, & Best Effort
• Operational strings simultaneously share resources
• Strings are dynamically added/removed from the system based on mission & mode
System requirements
1. Automatically & accurately adapt to dynamic changes in requirements & conditions
2. Handle and recover from system failures
Integrated Solution
Spreading Activation Partial Order Planner (SA-POP) for decision-theoretic planning under resource constraints, combined with
Resource Allocation & Control Engine (RACE), a dynamic resource management framework for DRE systems
SA-POP Research & Development Challenges
Research Challenges
1. Efficiently handle uncertainty in planning
2. Incorporate resource-aware scheduling with planning
Development Challenges
1. Take advantage of functionally interchangeable components to efficiently meet resource constraints
2. Plan with multiple interacting goals, but produce distinct operational strings
(Figure: Probabilistic Domain Knowledge, Mission Goals, & System Knowledge are inputs to SA-POP, whose output drives Deployment, Configuration, & Control)
SA-POP is available at:
www.dre.vanderbilt.edu/~jkinnebrew/SA-POP
SA-POP: Planning in DRE Systems with Components
Task is an abstraction of functionality
• Multiple (parameterized) components may have the same function but different resource usage
Task Network specifies probabilistic effects & requirements for tasks
• Condition nodes specify data flow & system/environmental conditions
• Task nodes have links to/from condition nodes specifying effects/preconditions
• Links incorporate probabilistic information about domains
Task Map allows conversion between tasks & components
• Maps tasks (functionality abstraction) to parameterized components (implementation)
• Associates expected or worst-case resource usage with each implementation
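A task map lookup can be sketched as below, assuming each implementation is a parameterized component annotated with an expected and a worst-case CPU figure. The `Implementation` class, the `candidates` function, and the `ScienceAgent` component and its `resolution` parameter are all hypothetical illustrations, not SA-POP's actual data structures.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Implementation:
    component: str                         # parameterized component type (hypothetical)
    params: Tuple[Tuple[str, str], ...]    # configuration parameters (name, value)
    expected_cpu: float                    # expected CPU usage, % of one node
    worst_cpu: float                       # worst-case CPU usage, % of one node


TaskMap = Dict[str, List[Implementation]]


def candidates(task_map: TaskMap, task: str, cpu_budget: float,
               worst_case: bool = True) -> List[Implementation]:
    """Implementations of `task` whose worst-case (or expected) CPU fits the budget, cheapest first."""
    def cost(impl: Implementation) -> float:
        return impl.worst_cpu if worst_case else impl.expected_cpu
    return sorted((i for i in task_map.get(task, []) if cost(i) <= cpu_budget), key=cost)


# Hypothetical task map: one task, two interchangeable configurations of the same component
task_map: TaskMap = {
    "ProcessData": [
        Implementation("ScienceAgent", (("resolution", "high"),), expected_cpu=35.0, worst_cpu=50.0),
        Implementation("ScienceAgent", (("resolution", "low"),), expected_cpu=15.0, worst_cpu=20.0),
    ],
}
```

With a 30% budget only the low-resolution configuration qualifies under worst-case accounting, while planning on expected usage with a 40% budget admits both; which figure to plan against is the choice the slide leaves open.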
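Operational String specifies a component-based application to achieve a goal
• Set of tasks along with ordering & timing constraints
• Data connections between tasks
• Implementation (parameterized component) suggested for each task
(Figure: Mission Goals, Probabilistic Domain Knowledge, & System Knowledge feed the Task Network & Task Map; SA-POP's Spreading Activation and Planning/Scheduling turn them into Operational Strings for Deployment, Configuration, & Control)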
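The three parts of an operational string can be captured in a small record, with a consistency check that the ordering and data connections are acyclic and that time windows respect them. This is a minimal sketch: the `OperationalString` class, its field names, and the rule that a data connection implies producer-before-consumer ordering are assumptions for illustration, and time windows are treated as fixed intervals.

```python
from dataclasses import dataclass, field
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set, Tuple


@dataclass
class OperationalString:
    goal: str
    windows: Dict[str, Tuple[float, float]]   # task -> (start, end) time window, seconds
    implementations: Dict[str, str]           # task -> suggested parameterized component
    ordering: List[Tuple[str, str]] = field(default_factory=list)    # (before, after)
    data_links: List[Tuple[str, str]] = field(default_factory=list)  # (producer, consumer)

    def _precedences(self) -> List[Tuple[str, str]]:
        # Assumption: every data connection also implies producer-before-consumer ordering
        return self.ordering + self.data_links

    def execution_order(self) -> List[str]:
        """Topologically sort tasks; raises CycleError if the constraints are cyclic."""
        preds: Dict[str, Set[str]] = {task: set() for task in self.windows}
        for before, after in self._precedences():
            preds[after].add(before)
        return list(TopologicalSorter(preds).static_order())

    def is_consistent(self) -> bool:
        """Acyclic precedences, and each predecessor's window ends before its successor's starts."""
        try:
            self.execution_order()
        except CycleError:
            return False
        return all(self.windows[b][1] <= self.windows[a][0] for b, a in self._precedences())


# Hypothetical string for the Earth-science work-flow
s = OperationalString(
    goal="ResultOnEarth",
    windows={"CollectData": (0, 10), "ProcessData": (10, 25), "Downlink": (25, 30)},
    implementations={"CollectData": "Gizmo", "ProcessData": "Science", "Downlink": "Comm"},
    data_links=[("CollectData", "ProcessData"), ("ProcessData", "Downlink")],
)
```

`graphlib` is in the Python standard library from 3.9 onward.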
SA-POP: Expected Utility Calculation using Spreading Activation
Forward propagation of probabilities; backward propagation of utilities
• Precondition nodes $c_i$: input data or system/environment preconditions
• Task node $a_j$
• Effect nodes $c_k$: output data or system/environment effects
• Precondition link weights: $w_{ij} = P(a_j^s \mid c_i = \mathrm{true}) - P(a_j^s \mid c_i = \mathrm{false})$
• Effect link weights: $w_{jk} = P(c_k = \mathrm{true} \mid a_j^x)$ or $-P(c_k = \mathrm{false} \mid a_j^x)$
• where $a_j^s$: action $a_j$ was successful; $a_j^x$: action $a_j$ was executed
(Figure: the Task Network & Task Map feed SA-POP's Spreading Activation and Planning/Scheduling)
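The two propagation passes can be illustrated on a tiny chain of tasks. This is a minimal sketch under simplifying assumptions, not SA-POP's algorithm: each task has exactly one precondition and one effect, the precondition weight is the difference $w_{ij}$ above, and effects are assumed to occur only when the task succeeds (so a hypothetical `p_effect` stands in for $P(c_k = \mathrm{true} \mid a_j^x)$). Task names and probabilities are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TaskNode:
    name: str
    precondition: str       # single precondition node c_i (simplifying assumption)
    p_succ_if_true: float   # P(a_j^s | c_i = true)
    p_succ_if_false: float  # P(a_j^s | c_i = false)
    effect: str             # single effect node c_k
    p_effect: float         # P(c_k = true | a_j succeeded), a simplification of P(c_k | a_j^x)

    @property
    def w_pre(self) -> float:
        # Precondition link weight w_ij
        return self.p_succ_if_true - self.p_succ_if_false


def forward(chain: List[TaskNode], initial: Dict[str, float]) -> Dict[str, float]:
    """Propagate probabilities from preconditions to effects along an ordered chain."""
    prob = dict(initial)
    for t in chain:
        pc = prob.get(t.precondition, 0.0)
        # Law of total probability over the precondition being true or false
        prob[t.name] = pc * t.p_succ_if_true + (1.0 - pc) * t.p_succ_if_false
        prob[t.effect] = prob[t.name] * t.p_effect
    return prob


def backward(chain: List[TaskNode], goal_utility: Dict[str, float]) -> Dict[str, float]:
    """Propagate utilities from goal conditions back to tasks and their preconditions."""
    util = dict(goal_utility)
    for t in reversed(chain):
        util[t.name] = t.p_effect * util.get(t.effect, 0.0)
        # A precondition is worth the success-probability gain it buys, times the task's utility
        util[t.precondition] = util.get(t.precondition, 0.0) + t.w_pre * util[t.name]
    return util


chain = [
    TaskNode("CollectData", "Trigger", 0.9, 0.0, "RawData", 1.0),
    TaskNode("ProcessData", "RawData", 0.8, 0.0, "FinalResult", 0.95),
]
prob = forward(chain, {"Trigger": 1.0})
util = backward(chain, {"FinalResult": 10.0})
print(round(prob["FinalResult"], 3), round(util["Trigger"], 3))  # 0.684 6.84
```

In this linear case the utility propagated back to `Trigger` equals the goal's expected utility (0.684 × 10), which is the kind of expected-utility information the planner uses to rank choices.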
SA-POP: Operational String Generation
Four hierarchical decision points in each interleaved planning+scheduling step:
Partial Order Planning:
1. Goal/subgoal choice: choose an open condition, i.e., a goal or subgoal that is unsatisfied in the current plan.
2. Task choice: choose a task that can achieve current open condition.
Resource Constrained Scheduling:
3. Task instantiation: choose an implementation for this task from the Task Map.
4. Scheduling decision(s): adjust task start/end time windows and/or add ordering constraints between tasks to avoid potential resource violations.
Continue recursively
(Figure: Mission Goals, the Task Network, & the Task Map feed SA-POP's interleaved Planning & Scheduling)
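The four decision points can be sketched as a recursive backtracking search. This is a deliberately simplified illustration, not SA-POP itself: it builds a total order instead of a partial order, takes the first feasible choice instead of ranking by expected utility, and replaces time-window scheduling with a single system-wide CPU budget. `TASKS`, `TASK_MAP`, and all implementation names and costs are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical task network: task -> (preconditions, effects)
TASKS: Dict[str, Tuple[List[str], List[str]]] = {
    "CollectData": ([], ["RawData"]),
    "ProcessData": (["RawData"], ["FinalResult"]),
    "Downlink": (["FinalResult"], ["ResultOnEarth"]),
}
# Hypothetical task map: task -> [(implementation, CPU units)]
TASK_MAP: Dict[str, List[Tuple[str, int]]] = {
    "CollectData": [("GizmoHiRes", 40), ("GizmoLoRes", 15)],
    "ProcessData": [("ScienceFull", 50), ("ScienceFast", 25)],
    "Downlink": [("CommAgent", 10)],
}

Step = Tuple[str, str]  # (task, chosen implementation)


def plan(open_conds: List[str], achieved: Set[str], steps: List[Step],
         cpu_used: int, cpu_budget: int) -> Optional[List[Step]]:
    """Return steps in execution order, or None if no choice fits the budget."""
    if not open_conds:
        return steps
    cond, rest = open_conds[0], open_conds[1:]          # 1. goal/subgoal choice
    if cond in achieved:
        return plan(rest, achieved, steps, cpu_used, cpu_budget)
    for task, (pre, eff) in TASKS.items():             # 2. task choice
        if cond not in eff:
            continue
        for impl, cpu in TASK_MAP[task]:               # 3. task instantiation
            if cpu_used + cpu > cpu_budget:            # 4. scheduling: reject resource violations
                continue
            # New task runs before the steps that needed it; its preconditions become open
            found = plan(pre + rest, achieved | set(eff), [(task, impl)] + steps,
                         cpu_used + cpu, cpu_budget)
            if found is not None:
                return found
    return None                                        # caller backtracks to its next choice
```

With a budget of 60 units, the search first tries `ScienceFull`, finds no collector that still fits, and backtracks to `ScienceFast`; this is the recursive "continue or backtrack" behavior the slide describes.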
RACE Research & Development Challenges
Research Challenges
1. Efficiently allocate computing & network resources to application components
2. Avoid over-utilization of system resources – ensure system stability
3. Maintain QoS even in the presence of failure
4. Ensure end-to-end QoS requirements are met – even under high load conditions
Development Challenges
1. Need multiple resource management algorithms depending on application characteristics & current system condition (resource availability)
2. A single resource management mechanism customized for a specific mission goal or set of mission goals might be effective for that specific scenario
3. However, it cannot be reused for other scenarios, so the wheel must be reinvented for every scenario
(Figure: the Intelligent Mission Planner (SA-POP) passes Operational Strings with Varying Resource & QoS Requirements to the Resource Allocation and Control Engine (RACE), which uses a Uniform Interface to Deploy and Manage Components on a Target Platform with Varying Resource Availabilities and Capabilities)
RACE Functional Architecture
• Dynamic resource management framework atop CORBA Component Model (CCM) middleware (CIAO/DAnCE)
• Can easily be generalized to other middleware, e.g., DDS
• Allocates components to available resources
• Configures components to satisfy QoS requirements based on dynamic mission goals
• Performs run-time adaptation
• Coarse-grained mechanisms
• React to new missions, drastic changes in mission goals, or unexpected circumstances such as loss of resources
• e.g., component re-allocation or migration
• Fine-grained mechanisms
• Compensate for drift & smaller variations in resource usage
• e.g., adjustment of application parameters, such as QoS settings
(Figure: the Intelligent Planner (SA-POP) passes Operational Strings with Varying Resource & QoS Requirements to RACE (Allocation Algorithms & Control Algorithms), which deploys components through the Uniform Interface to Deploy and Manage Components (CIAO/DAnCE middleware) onto a Target Platform/Domain with Varying Resource Availabilities and Capabilities, then monitors & adapts the resource allocation to application components)
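"Allocates components to available resources" can be made concrete with a classic bin-packing heuristic. The slides do not say which allocation algorithms RACE ships, so the first-fit-decreasing function below is only one generic example of a pluggable allocator; component names, CPU demands (in percent), and the 70% per-node set-point are hypothetical.

```python
from typing import Dict, Optional


def first_fit_decreasing(demands: Dict[str, int],
                         capacity: Dict[str, int]) -> Optional[Dict[str, str]]:
    """Place each component on the first node with enough spare CPU (%),
    trying the most demanding components first. Returns None if any cannot be placed."""
    free = dict(capacity)
    placement: Dict[str, str] = {}
    for comp, need in sorted(demands.items(), key=lambda kv: kv[1], reverse=True):
        for node in sorted(free):           # deterministic node order
            if free[node] >= need:
                free[node] -= need
                placement[comp] = node
                break
        else:                               # no node had enough spare capacity
            return None
    return placement


set_point = {"node1": 70, "node2": 70}      # hypothetical per-node CPU set-point (%)
print(first_fit_decreasing({"Gizmo": 50, "Science": 60, "Comm": 20}, set_point))
```

Capping each node at a set-point below 100% leaves headroom for the run-time control mechanisms to absorb drift without re-allocation.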
RACE Software Component Architecture
Input Adapter
• A generic interface that translates application component descriptors to internal data structures
Plan Analyzer
• Examines input descriptors & selects appropriate allocation algorithms based on application characteristics
• Adds appropriate application QoS & resource monitors
Planner Manager
• Maintains a registry of installed planners (algorithm implementations) along with their overhead, & the currently executing sequences of planners generated by the Plan Analyzer
(Figure: descriptors of the assembly/components to be deployed enter the Input Adapter; the Plan Analyzer, Planner Manager, Allocators, & Controllers exchange component information & deployment plans, use resource utilization data from the Target Manager, & deploy or modify application components on the domain resources through the middleware framework)
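The division of labor between the Plan Analyzer and the Planner Manager can be sketched as a registry of named planner functions plus a rule that picks a sequence of them. The `PlannerManager` class, the `analyze` rule, the planner names `allocate_cpu` and `assign_priority`, and the overhead figures are hypothetical; RACE's real interfaces are CCM components, not Python functions.

```python
from typing import Callable, Dict, List, Tuple

Plan = Dict[str, object]              # hypothetical internal form produced by the Input Adapter
Planner = Callable[[Plan], Plan]


class PlannerManager:
    """Registry of installed planners (algorithm implementations) and their overhead."""

    def __init__(self) -> None:
        self._registry: Dict[str, Tuple[float, Planner]] = {}

    def install(self, name: str, overhead_ms: float, planner: Planner) -> None:
        self._registry[name] = (overhead_ms, planner)

    def total_overhead(self, sequence: List[str]) -> float:
        return sum(self._registry[name][0] for name in sequence)

    def execute(self, sequence: List[str], plan: Plan) -> Plan:
        for name in sequence:             # each planner refines the previous planner's output
            plan = self._registry[name][1](plan)
        return plan


def analyze(plan: Plan) -> List[str]:
    """Hypothetical Plan Analyzer rule: always allocate; add priority assignment if a deadline is given."""
    sequence = ["allocate_cpu"]
    if plan.get("deadline_ms") is not None:
        sequence.append("assign_priority")
    return sequence


def allocate_cpu(plan: Plan) -> Plan:
    # Placeholder allocator: puts every component on node1
    return {**plan, "placement": {c: "node1" for c in plan["components"]}}


def assign_priority(plan: Plan) -> Plan:
    return {**plan, "priority": "high" if plan.get("importance") == "mission-critical" else "low"}


pm = PlannerManager()
pm.install("allocate_cpu", 2.0, allocate_cpu)
pm.install("assign_priority", 0.5, assign_priority)
app: Plan = {"components": ["Gizmo", "Science", "Comm"], "importance": "mission-critical", "deadline_ms": 500}
result = pm.execute(analyze(app), app)
```

Keeping planners behind a name-based registry is what lets new allocation or control algorithms be installed without changing the analyzer's callers.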
RACE Software Component Architecture
CIAO/DAnCE Middleware
• A CCM middleware framework atop which the components of the operational strings are deployed
Target Manager
• Runtime resource utilization monitor that tracks the utilization of system resources, such as onboard CPU, memory, & network bandwidth
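A Target Manager-style monitor can be sketched as a sliding-window average of utilization samples per (node, resource) pair. The `TargetManager` class, its `report`/`utilization`/`over` methods, and the window size are hypothetical; how RACE actually collects and aggregates samples is not described on the slide.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Tuple


class TargetManager:
    """Hypothetical sketch: sliding window of utilization samples per (node, resource)."""

    def __init__(self, window: int = 5) -> None:
        self._samples: Dict[Tuple[str, str], Deque[float]] = defaultdict(lambda: deque(maxlen=window))

    def report(self, node: str, resource: str, utilization: float) -> None:
        # Oldest sample is dropped automatically once the window is full
        self._samples[(node, resource)].append(utilization)

    def utilization(self, node: str, resource: str) -> float:
        samples = self._samples.get((node, resource))
        if not samples:
            return 0.0
        return sum(samples) / len(samples)

    def over(self, set_point: float) -> List[Tuple[str, str]]:
        """(node, resource) pairs whose smoothed utilization exceeds the set-point."""
        return sorted(key for key in self._samples if self.utilization(*key) > set_point)
```

Averaging over a short window smooths out momentary spikes before the controller reacts, at the cost of reacting a few samples later.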
RACE: Addressing Dynamic Resource Management Challenges
Need for a control framework
• Allocation algorithms allocate resources to components based on current system condition & estimated resource requirements
• No accurate a priori knowledge of input workload & how the execution time depends on input workload
• Dynamic changes in system operating modes
Control objectives:
• Ensure end-to-end QoS requirements are met at all times
• Ensure resource utilization is below the set-point – ensure system stability
• RACE Controller: Reallocates resources to meet control objectives
• RACE Control Agents: Map resource reallocation to OS/application-specific parameters
(Figure: the Target Manager, fed by CPU Resource Utilization Monitors, & the QoS Manager, fed by Application QoS Monitors, provide Resource Utilization & Application QoS Information to the RACE Controller, which drives the RACE Control Agents: the OS Control Agent & Application Control Agents)
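The feedback loop between monitors, controller, and control agents can be illustrated with a simple integral-style controller that adjusts a best-effort component's invocation rate until CPU utilization settles at the set-point. The slide does not specify RACE's control algorithm; the `control_step` and `simulate` functions, the linear utilization model, and every numeric value here are hypothetical.

```python
from typing import List, Tuple


def control_step(rate: float, utilization: float, set_point: float,
                 gain: float, min_rate: float, max_rate: float) -> float:
    """Integral-style update: lower the rate when utilization exceeds the set-point, raise it when below."""
    error = set_point - utilization
    return max(min_rate, min(max_rate, rate + gain * error))


def simulate(steps: int = 20, set_point: float = 0.7, gain: float = 50.0) -> Tuple[float, List[float]]:
    base_load, cpu_per_hz = 0.3, 0.01   # hypothetical plant: utilization = base + cost * rate
    rate = 60.0                          # hypothetical initial rate (Hz) of a best-effort component
    history: List[float] = []
    for _ in range(steps):
        utilization = base_load + cpu_per_hz * rate   # stands in for a Target Manager reading
        history.append(utilization)
        # The new rate is what an application control agent would apply
        rate = control_step(rate, utilization, set_point, gain, 1.0, 100.0)
    return rate, history


final_rate, history = simulate()
```

Here `gain * cpu_per_hz = 0.5`, so the utilization error halves each control period and utilization approaches 0.7 from above. With a real plant the gain has to be tuned: if `gain * cpu_per_hz` exceeded 2, this loop would oscillate with growing amplitude.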
RACE is available with the Component-Integrated ACE ORB (CIAO)
(deuce.doc.wustl.edu/Download.html)
Experimentation Results – Hardware / Software Testbed
(Figure: a Mission Critical Operational String & a Best-Effort Operational String deployed across Nodes 1–6)
• Experiments were performed on the ISISLab testbed at Vanderbilt University
(www.dre.vanderbilt.edu/ISISlab)
• Hardware Configuration
• 6 nodes, each with dual 2.8 GHz Intel Xeon processors, 1 GB physical memory, a 1 Gb/s Ethernet network interface, & a 40 GB hard drive
• Software Configuration
• Red Hat Fedora Core Release 4 operating system
• ACE+TAO+CIAO Middleware
• Two operational strings (one mission-critical, one best-effort), each with 6 components, were deployed on 6 nodes
Experimentation Results – Performance Analysis
• An end-to-end deadline of 500 ms was specified for the mission-critical operational string
• The mission-critical string was deployed at time T = 0 s, & the best-effort string was deployed at time T = 1800 s
• Until T = 1800 s, the end-to-end execution time of the mission-critical string is below its deadline
• At T = 1800 s, the end-to-end execution time of the mission-critical string rises well above its deadline
(Figure: Execution Time of the mission-critical string plotted against its Deadline over time)
• This is due to excessive resource consumption by the best-effort string
• RACE reacts to the increase in execution time by performing adaptive system control: modifying operating system priority and scheduler class and/or tearing down lower-priority operational string(s)
RACE ensures end-to-end deadline of mission critical string is met
even under fluctuations in resource availability/demand
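The reaction described above can be sketched as an escalation ladder that applies progressively stronger adaptations while the deadline is still missed. The ladder order, action names, and `next_adaptation` function are hypothetical; on Linux, real mechanisms for the first two steps exist (for example Python's `os.setpriority` and `os.sched_setscheduler`), but this sketch only chooses actions and does not call them. The measurements in the loop are made up for illustration and are not the experiment's data.

```python
from typing import Optional, Sequence, Set

# Hypothetical escalation ladder, mildest first, mirroring the three kinds of adaptation on the slide
LADDER: Sequence[str] = (
    "raise_os_priority",
    "change_scheduler_class",
    "tear_down_lower_priority_string",
)


def next_adaptation(exec_time_ms: float, deadline_ms: float, applied: Set[str]) -> Optional[str]:
    """Return the next adaptation to apply, or None if the deadline is met or nothing is left."""
    if exec_time_ms <= deadline_ms:
        return None
    for action in LADDER:
        if action not in applied:
            return action
    return None


applied: Set[str] = set()
for measured in (420.0, 900.0, 650.0, 480.0):   # hypothetical end-to-end times, ms
    action = next_adaptation(measured, 500.0, applied)
    if action is not None:
        applied.add(action)
    print(measured, action)
```

Ordering the ladder from least to most disruptive means the best-effort string is torn down only if priority and scheduler changes alone do not restore the deadline.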
Lessons Learned from SA-POP & RACE Integration
Flexible
• Pluggable resource allocation & control algorithms in RACE
• SA-POP task network & task map tailored to domain
• Unlimited combinations of goals, goal priorities, & timing requirements in SA-POP
• Shared task map allows substitution of functionally equivalent task implementations by RACE
Scalable
• Separation of concerns between SA-POP & RACE limits search spaces in each
• SA-POP handles cascading planning choices in operational string generation
• SA-POP only considers resource allocation feasibility with coarse-grained (system-wide) resource constraints
• RACE handles resource allocation optimization with fine-grained (individual processing node) resource constraints & dynamic control for fixed operational strings
(Figure: Mission Scientists supply Mission Goals & Domain Experts supply the Task Network & Task Map to SA-POP (Spreading Activation, Planning + Scheduling); SA-POP's Operational Strings go to RACE (Allocation Algorithms, Control Algorithms), which deploys & manages components on the Satellite System through a uniform interface to the Component Middleware Infrastructure (CIAO/DAnCE); Resource & Application Monitors return Resource Utilization & Application Performance Data, along with Deployment/Mission Feedback)
Lessons Learned from SA-POP & RACE Integration
Dynamic
• SA-POP task network with spreading activation provides expected utility information for generating robust applications in uncertain environments
• SA-POP replanning is achieved efficiently with an incrementally updated task network & plan repair as necessary
• RACE control algorithms alleviate the need for replanning in many cases
• RACE provides reallocation & redeployment of revised operational strings when replanning is necessary