Transcript: Chapter 19

Design of Distributed Real-Time Systems
Ramani Arunachalam
Case Study: MARS
● MARS (Maintainable Real-Time System)
  – Distributed, fault-tolerant, hard real-time
  – Objectives
    ● Guaranteed timeliness
    ● Testability
    ● Maintainability
    ● Fault-tolerance
    ● Systematic software development
  – Time-triggered architecture
Objectives
● Guaranteed timeliness
  – Based on resource adequacy at peak load
  – Statistical assurances are not enough
● Testability
  – Architecture should support testability of timeliness
● Maintainability
  – Needed to remedy hardware faults and design errors, and to respond to change requests
  – Localized consequences -> minimized effort
Objectives
● Fault tolerance
  – Redundancy
  – On-line maintenance
● Systematic software development
  – No 'trial and error' integration
  – OS guarantees predictable temporal behaviour
State View
● Time-triggered observation of states
  – Observe RT entities at predefined intervals
● Intelligent input/output
  – Observation grid
  – Intelligent sensor
    ● Preprocesses raw data from the input device
    ● Observes at a finer granularity, called the perception granularity
State View
● Intelligent actuator
  – Post-processes data from the computer system before sending it to the output device
● State messages
  – Produced at observation points
  – Minimal synchronization requirement
  – No need for buffer management
  – Unidirectional (from RT entity)
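The state-message properties listed above (minimal synchronization, no buffer management) can be illustrated with a small sketch. It assumes the usual state-message semantics, namely that a new observation overwrites the previous one in place and that reading does not consume the value. The class name `StateMessage` and its methods are hypothetical and are not taken from MARS.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StateMessage:
    """Single-slot holder for the latest observation of one RT entity.

    Hypothetical illustration: a write overwrites the slot in place and a
    read does not consume the value, so no queue is ever needed.
    """
    value: Optional[float] = None
    observed_at: Optional[int] = None  # observation-grid tick of the sample

    def write(self, value: float, tick: int) -> None:
        # Overwrite in place: the older observation is simply discarded.
        self.value = value
        self.observed_at = tick

    def read(self) -> tuple:
        # Non-consuming read: any number of readers see the same state.
        return self.value, self.observed_at


temperature = StateMessage()
temperature.write(20.5, tick=1)
temperature.write(21.0, tick=2)   # replaces the tick-1 observation
first_read = temperature.read()
second_read = temperature.read()  # same result; nothing was consumed
```

Because only the most recent state is kept, a slow reader never causes a queue to grow, which is one way to read "no need for buffer management".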
Structure
● Clusters
  – Autonomous subsystems
  – Disjoint name spaces
  – State message exchanges
  – Composed of fault-tolerant units (FTUs)
  – Real-time communication channel (TDMA)
● FTU
  – Composed of replicated components
  – Active and shadow components
Structure
● Component
  – Smallest replaceable unit
  – Fail-silent (correct results or none)
  – Termination upon failure
● Task execution
  – Task: software inside a component
  – Starts at a predefined time
  – Proceeds without any communication or synchronization
  – Execution time is deterministic
Operation
● Results of periodic tasks are sent as state messages
● Execution time of communication is also predefined
● A real-time transaction is a progression of processing and communication actions between a stimulus from and a response to the environment
● Static scheduling (at compile time!)
● At run-time, no surprises
● Modes (operating, emergency)
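With static scheduling, run-time dispatching reduces to a table lookup. The sketch below illustrates that idea only; the cycle length, the task names and the table layout are invented for the example and do not describe the actual MARS schedule format.

```python
# Hypothetical off-line schedule for one cycle of 10 time units.
# Each entry is (start_offset, duration, action); all names are illustrative.
CYCLE_LENGTH = 10
SCHEDULE = [
    (0, 2, "read_sensor"),
    (2, 3, "compute_control"),
    (5, 1, "send_state_message"),   # communication has a fixed slot too
    (6, 2, "drive_actuator"),
]


def validate(schedule, cycle_length):
    """Check, before run-time, that slots do not overlap and fit in the cycle."""
    end_of_previous = 0
    for start, duration, _ in sorted(schedule):
        if start < end_of_previous or start + duration > cycle_length:
            return False
        end_of_previous = start + duration
    return True


def action_at(time, schedule=SCHEDULE, cycle_length=CYCLE_LENGTH):
    """Return the action scheduled at an absolute time, or None if idle."""
    offset = time % cycle_length
    for start, duration, action in schedule:
        if start <= offset < start + duration:
            return action
    return None
```

All checking happens in `validate`, before the system runs; `action_at` makes no decisions of its own, which is the sense in which there are "no surprises" at run-time.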
Fault-tolerance
● Two levels of redundancy
● Active redundancy at FTU level
  – If a component fails, the standby becomes active
● Time redundancy at component level
  – Every task is executed twice and the results are compared
● TDMA monitor
  – Monitors temporal behaviour
  – Controls the output from the component
● Distributed clock synchronization
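Time redundancy at component level can be sketched as follows: the task is run twice and its result is released only if both runs agree. This is a minimal illustration, not the MARS implementation. The function name is hypothetical, and returning `None` stands in for fail-silent behaviour (so the sketch assumes tasks never legitimately return `None`).

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def run_with_time_redundancy(task: Callable[[], T]) -> Optional[T]:
    """Execute `task` twice and release the result only if both runs agree.

    Returning None models fail-silent behaviour: on a mismatch (or an
    exception) the component produces no output rather than a wrong one.
    """
    try:
        first = task()
        second = task()
    except Exception:
        return None
    return first if first == second else None


def healthy_task():
    return sum(range(10))          # deterministic: same result on both runs


calls = {"count": 0}


def transiently_faulty_task():
    # Simulated transient fault: only the first execution is corrupted.
    calls["count"] += 1
    return 45 if calls["count"] > 1 else 44


ok = run_with_time_redundancy(healthy_task)
silent = run_with_time_redundancy(transiently_faulty_task)
```

Here `ok` carries the agreed result, while `silent` is `None` because the two executions disagreed. At FTU level, the standby component would then take over.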
Fault-tolerance
● Replica determinism
  – All replicated components perform the same state changes at the same point in time
  – Prohibit reading of local time
  – All replicas should agree on when to change mode
● Component reintegration
  – i-state (initialization state), h-state (history state)
  – Reintegration point: when the size of the h-state is small
  – New component gets the h-state at this point
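The reintegration step can be sketched as below, assuming the usual reading of i-state as data fixed after initialization and h-state as data that changes while the component runs. The `Replica` class, its fields and `reintegrate` are hypothetical names used only for illustration.

```python
import copy


class Replica:
    """Toy replicated component; class and field names are hypothetical."""

    def __init__(self, i_state):
        self.i_state = i_state   # fixed after initialization (here: a gain)
        self.h_state = {"tick": 0, "accumulated": 0}  # evolves with each step

    def step(self, global_tick, observation):
        # Only the global tick passed in is used, never a local clock reading,
        # so replicas fed the same inputs make the same state changes.
        self.h_state["tick"] = global_tick
        self.h_state["accumulated"] += self.i_state["gain"] * observation


def reintegrate(active, i_state):
    """Build a repaired replica from the i-state plus a copy of the h-state."""
    newcomer = Replica(i_state)
    newcomer.h_state = copy.deepcopy(active.h_state)
    return newcomer


i_state = {"gain": 2}
active = Replica(i_state)
for tick, obs in [(1, 5), (2, 7)]:
    active.step(tick, obs)

shadow = reintegrate(active, i_state)       # taken at the reintegration point
for tick, obs in [(3, 1), (4, 4)]:          # both replicas now see the same inputs
    active.step(tick, obs)
    shadow.step(tick, obs)
```

After reintegration the two replicas hold identical h-states. Keeping the h-state small at the reintegration point keeps the copy cheap.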
Summary
● Maintenance
  – Failed component doesn't affect the FTU
  – On-line reintegration after repair
  – Change in software
    ● Does it fit in the current schedule?
    ● Otherwise, new mode with new schedule
● Summary
  – Strict separation of functionality, timeliness and dependability
  – Designed for temporal behaviour; testing simplified
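The maintenance question "does it fit in the current schedule?" can be pictured as a search for an idle gap in the static schedule. The sketch below is deliberately simplified. It looks only at slot lengths within one cycle and ignores deadlines and precedence constraints, and the function name and schedule values are hypothetical.

```python
def find_free_slot(schedule, cycle_length, needed):
    """Return the earliest start offset of an idle gap of length `needed`, or None.

    `schedule` is a list of (start_offset, duration) pairs within one cycle.
    """
    cursor = 0
    for start, duration in sorted(schedule):
        if start - cursor >= needed:
            return cursor
        cursor = max(cursor, start + duration)
    if cycle_length - cursor >= needed:
        return cursor
    return None


current = [(0, 2), (2, 3), (6, 2)]   # hypothetical existing schedule, cycle = 10
small_change = find_free_slot(current, 10, needed=1)   # fits in the gap at offset 5
large_change = find_free_slot(current, 10, needed=4)   # no gap is long enough
```

When the result is `None`, the change does not fit, and the alternative listed under "Change in software" applies: a new mode with a new schedule.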
Delta-4 XPA
● Objectives
  – “A real-time system is not assured to meet deadlines outside operational envelope”
  – Bounded-demand school
    ● Operational envelope is predictable
    ● Impractical assumption for complex systems
  – Unbounded-demand school
    ● Complete definition of the operational envelope is not possible
    ● Graceful degradation if the system falls outside the envelope
  – XPA implements hard real-time but falls back to best-effort behaviour when required
[Figure: layered architecture, top to bottom: Deltase; group management layer; time and group communication; abstract network layer (physical + MAC + firmware)]
Architecture
● Network infrastructure
  – FDDI supports urgent traffic and has built-in fault tolerance
  – Token bus/ring has media redundancy for availability
● Time
  – Internal time maintained by a distributed time server
  – Clocks synchronized to within tens of microseconds
  – External time: one of the standard time references
● Group communication
  – Services ranging from atomic multicast to datagram
  – Very fast services of varying reliability
Architecture
● Group communication
  – Distributed replication management
    ● BestEffortN: guarantees delivery to N elements
    ● BestEffortTo: guarantees delivery to named elements
    ● AtLeastN, AtLeastTo: guaranteed service even when the sender fails
● Group management
  – Distributed group manager object
  – Management and distribution of groups of objects
  – Incorporates knowledge of the various modes of replication
Architecture
● Application support environment (Deltase)
  – Client-server and producer-consumer interactions
  – Apps written using Deltase or converted using preprocessors
● Timeliness
  – What to do under overload conditions?
    ● Static off-line scheduling: too many possibilities
    ● On-line scheduling: can find feasible schedules if not overloaded
Timeliness
● Scheduling policy uses “precedence”
  – Combination of priority and earliest-deadline
  – Few priority classes, to avoid unfairness
  – Within a priority class, earliest-deadline-first
● Design-time and run-time timeliness
  – Targetline: instant chosen by the designer for provision of service
  – Liveline and deadline: earliest and latest times at which service may be provided
  – Violations of these are detected at run-time; the actions to take are defined at design-time
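The liveline/targetline/deadline window can be sketched as a run-time check that maps each violation to an action fixed at design time. This is an illustration only. The function names, the outcome labels and the actions in `DESIGN_TIME_ACTIONS` are hypothetical, and the sketch assumes the targetline lies between the liveline and the deadline.

```python
def classify_service_time(t, liveline, targetline, deadline):
    """Classify a service instant against its design-time window."""
    if not (liveline <= targetline <= deadline):
        raise ValueError("expected liveline <= targetline <= deadline")
    if t < liveline:
        return "liveline_violation"   # service provided too early
    if t > deadline:
        return "deadline_violation"   # service provided too late
    return "timely"


# Hypothetical design-time table: which action to take for each violation.
DESIGN_TIME_ACTIONS = {
    "liveline_violation": "delay_delivery",
    "deadline_violation": "switch_to_degraded_service",
}


def runtime_check(t, liveline, targetline, deadline):
    """Return the design-time action triggered at run-time, or None if timely."""
    outcome = classify_service_time(t, liveline, targetline, deadline)
    return DESIGN_TIME_ACTIONS.get(outcome)
```

For a window with liveline 10, targetline 15 and deadline 20, `runtime_check(12, 10, 15, 20)` triggers no action, while instants before 10 or after 20 select the corresponding predefined action.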
Preemption
● Leader-follower model for replication
  – Decisions are made by a privileged replica, i.e. the leader
  – Preemption point
    ● Point at which an interrupt will be served
  – A high-precedence message arrives for a process that is not currently running
    ● Increase the process's precedence to that of the message
    ● This causes the process to be scheduled
    ● These actions are propagated to the followers
    ● Followers perform identical operations
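The precedence rule (priority class first, then earliest deadline) and the precedence raise on message arrival can be sketched together. Everything here is an assumption made for illustration: the `Process` fields, the convention that a smaller class number is more urgent, and the way the leader's action is replayed on a follower. None of it is the Delta-4 XPA API.

```python
from dataclasses import dataclass


@dataclass
class Process:
    name: str
    priority_class: int   # assumption: smaller number = more urgent class
    deadline: int         # absolute deadline in ticks

    def precedence(self):
        # Compared lexicographically: class first, then earliest deadline.
        return (self.priority_class, self.deadline)


def pick_next(ready):
    """Select the ready process with the highest precedence."""
    return min(ready, key=Process.precedence)


def deliver(process, msg_precedence):
    """Raise the receiver to the message's precedence if that is higher."""
    if msg_precedence < process.precedence():
        process.priority_class, process.deadline = msg_precedence
        return True
    return False


def make_replica():
    return {p.name: p for p in (
        Process("logger", 2, 50),
        Process("control", 1, 30),
        Process("alarm_handler", 2, 80),
    )}


leader, follower = make_replica(), make_replica()
before = pick_next(leader.values()).name

# The leader decides, then forwards the same action for the follower to replay.
action = ("alarm_handler", (0, 10))
deliver(leader[action[0]], action[1])
deliver(follower[action[0]], action[1])

after_leader = pick_next(leader.values()).name
after_follower = pick_next(follower.values()).name
```

Before the message arrives, `control` is selected. After the raise, both leader and follower select `alarm_handler`, because the follower applied the identical operation.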
Desynchronization
● Followers must not drift too far from the leader
● Followers too fast
  – Reach the preemption point before the leader
  – Remain blocked until the leader notifies them
● Followers too slow
  – Leader timestamps notifications
  – If a follower has not executed the action by T + t(desync)
    ● Desynchronization event raised
    ● Another follower takes over
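The "too slow" check amounts to comparing the follower's execution time against the leader's timestamp T plus the bound t(desync). A minimal sketch, with a hypothetical function name and return labels, and with times given as plain integers:

```python
def check_follower(notification_time, t_desync, executed_at, now):
    """Classify a follower against the limit T + t(desync).

    notification_time: timestamp T placed on the notification by the leader
    executed_at: when the follower executed the action, or None if not yet
    now: current time at which the check is made
    """
    limit = notification_time + t_desync
    if executed_at is not None and executed_at <= limit:
        return "ok"
    if executed_at is None and now <= limit:
        return "pending"            # still within the allowed lag
    return "desynchronized"         # raise the desynchronization event
```

For T = 100 and t(desync) = 5, a follower that executed at 103 is fine, one that has not executed by time 104 is still pending, and one that has not executed by 106 (or executed only at 107) is reported as desynchronized.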
Summary
● Communication support using groups
  – Oriented to distributed computing
● Tradeoffs between QoS and efficiency
  – Group manager uses atomic multicast for orderly delivery
  – Leader-follower uses reliable, non-ordered delivery
● Group management service
  – Executes leader-follower, detects replica failure
  – Clones the replica at another node