CS514: Intermediate Course in Operating Systems

Download Report

Transcript CS514: Intermediate Course in Operating Systems

Reliable Distributed Systems
Real Time Systems
Topics for this lecture

Adding clocks to distributed systems





Also, to non-distributed ones
We’ll just touch on real-time operating systems
and scheduling; an active area until recently
Recent machines are so fast that importance of
topic is now reduced…
Using time in broadcast protocols
Comparison of CASD with gossip
Many systems need real-time



Air traffic control talks about “when” planes
will be at various locations
Doctors talk about how patients responded
after drug was given, or change therapy after
some amount of time
Process control software runs factor floors by
coordinating what machine tools do, and
when
Many systems need real-time


Video and multi-media systems need
isochronous communication protocols that
coordinate video, voice, and other data
sources
Telecommunications systems must guarantee
real-time response despite failures, for
example when switching telephone calls
Real time in real systems



These real systems combine the need for logical
consistency and coordination (for control) with the
need for physical coordination and timely behavior
Issue becomes one of integrating real-time tools and
support with logical tools and support such as we
have already considered
Not real-time or logical time, but want both for
different aspects of a single application!
Clock Synchronization and
Synchronous Systems



Up to now, we restricted attention to logical
notions of time: “a happens before b”
But recall that we also touched on real clocks
Two views of real-time:


Supporting clocks that programs can consult as
needed
Making direct use of real-time inside protocols
Clock Synchronization



Topic was extremely active during early
1980’s
Best known algorithms include the one in
OSF/1 UNIX (based on one by Marzullo at
Xerox), the “optimally accurate” algorithm of
Srikanth and Toueg, and the “probabilistic”
algorithm of Cristian
Introduction of Global Positioning System is
eliminating the need for this form of
synchronization
Clock synchronization
protocols


Would like to think of network as having a
single “clock” that all processes can consult
But need to implement this using many
“local” clocks that can:



Initially be set incorrectly
Drift over the course of time
Clock synchronization tries to keep clocks
close to true real-time and minimize their
tendency to drift
Precision and Accuracy



Accuracy measures local clocks relative to an external
source of accurate real-time. Accurate clocks are
close to real-time
Precision measures local clocks relative to each other.
Precise clocks are close to each other
Skew is the numerical limit on the maximum distance that correct clocks can drift apart. E.g. could
say the “maximum skew is 1sec” for some system
How clock synchronization used
to work



Periodically, all processors would run a clock
sync protocol, for example by broadcasting
the reading from their clocks
Each receives a set of values from the others
(sets may differ due to faults!)
Algorithm would pick a synchronized value
from the set; analysis used to prove
properties of clocks
Global Positioning System




Satellite system launched by military in early
1990’s, became public and inexpensive
Can think of satellites as broadcasting the
time
Small radio receiver picks up signals from
three satellites and triangulates to determine
position
Same computation also yields extremely
accurate clock (accurate to a few
milliseconds)
Clock synchronization with
GPS




Put two GPS receivers (or more) on a
network
Periodically, receivers broadcast the “true
time”
Other machines only need to adjust their
clocks to overcome propagation delays for
clock sync messages through the network!
Well matched to the a-posteriori clock
synchronization approach
Basic idea




GPS receiver broadcasts “the time is now
10:00” on a broadcast network (ethernet)
Receivers note the time when they receive
the message: [10:01, 9:58, ....] and reply
with values
GPS receiver subtracts the median value
Differences [1, -2, ...] now give the drift of
the clock of the destination relative to the
median clock
A-posteriori method,
adjustment stage



Now we distribute these drift values back to
the processors, which compensate for the
rate of drift over the time period to the next
synchronization
Can show that this yields clocks that are
optimal both in accuracy and precision
A processor with a broken clock has a good
chance of discovering it during
synchronization
Using real-time


One option is to use a real-time operating
system, clock synchronization algorithm, and
to design protocols that exploit time
Example: MARS system uses pairs of
redundant processors to perform actions
fault-tolerantly and meet deadlines. Has
been applied in process control systems.
(Another example: Delta-4)
Features of real-time
operating systems

The O/S itself tends to be rather simple


They are structured in terms of “tasks”




Big black boxes behave unpredictably
A task is more or less a thread
But typically come with expected runtime,
deadlines, priorities, “interruptability”, etc
User decomposes application into task-like
component parts and then expresses goals in
a form that RTOS can handle
Widely used on things like medical devices
RTOS can be beneficial



Lockheed Martin
ATL timed CORBA
method invocations
Variation in
response time was
huge with a normal
Linux OS
When using a
Timesys RTOS the
variability is
eliminated!
Real-time broadcast protocols


Can also implement broadcast protocols that
make direct use of temporal information
Examples:



Broadcast that is delivered at same time by all
correct processes (plus or minus the clock skew)
Distributed shared memory that is updated within
a known maximum delay
Group of processes that can perform periodic
actions
A real-time broadcast
p0
p1
p2
p3
p4
p5
t+a
t
t+b
*
*
*
*
*
Message is sent at time t by p0. Later both p0 and p1 fail. But
message is still delivered atomically, after a bounded delay, and
within a bounded interval of time (at non-faulty processes)
A real-time distributed shared
memory
p0
p1
p2
t
set x=3
t+a
t+b
x=3
p3
p4
p5
At time t p0 updates a variable in a distributed shared memory.
All correct processes observe the new value after a bounded
delay, and within a bounded interval of time.
Periodic process group:
Marzullo
p0
p1
p2
p3
p4
p5
Periodically, all members of a group take some action.
Idea is to accomplish this with minimal communication
The CASD protocols



Also known as the “ -T” protocols
Developed by Cristian and others at IBM, was
intended for use in the (ultimately, failed) FAA
project
Goal is to implement a timed atomic
broadcast tolerant of Byzantine failures
Basic idea of the CASD
protocols




Assumes use of clock synchronization
Sender timestamps message
Recipients forward the message using a
flooding technique (each echos the message
to others)
Wait until all correct processors have a copy,
then deliver in unison (up to limits of the
clock skew)
CASD picture
p0
p1
p2
p3
p4
p5
t+a
t
t+b
*
*
*
*
*
p0, p1 fail. Messages are lost when echoed by p2, p3
Idea of CASD




Assume known limits on number of processes that
fail during protocol, number of messages lost
Using these and the temporal assumptions, deduce
worst-case scenario
Now now that if we wait long enough, all (or no)
correct process will have the message
Then schedule delivery using original time plus a
delay computed from the worst-case assumptions
The problems with CASD



In the usual case, nothing goes wrong, hence
the delay can be very conservative
Even if things do go wrong, is it right to
assume that if a message needs between 0
and ms to make one hope, it needs [0,n*  ]
to make n hops?
How realistic is it to bound the number of
failures expected during a run?
CASD in a more typical run
p0
p1
p2
p3
p4
p5
t
t+a
t+b
*
*
*
*
*
*
... leading developers to employ more
aggressive parameter settings
p0
p1
p2
p3
p4
p5
t
t+a
t+b
*
*
*
*
*
*
CASD with over-aggressive paramter settings
starts to “malfunction”
p0
p1
p2
p3
p4
p5
t
t+a
t+b
*
*
*
*
all processes look “incorrect” (red) from time to time
CASD “mile high”


When run “slowly” protocol is like a real-time
version of abcast
When run “quickly” protocol starts to give
probabilistic behavior:

If I am correct (and there is no way to know!)
then I am guaranteed the properties of the
protocol, but if not, I may deliver the wrong
messages
How to repair CASD in this
case?


Gopal and Toueg developed an extension, but
it slows the basic CASD protocol down, so it
wouldn’t be useful in the case where we want
speed and also real-time guarantees
Can argue that the best we can hope to do is
to superimpose a process group mechanism
over CASD (Verissimo and Almeida are
looking at this).
Why worry?



CASD can be used to implement a distributed
shared memory (“delta-common storage”)
But when this is done, the memory
consistency properties will be those of the
CASD protocol itself
If CASD protocol delivers different sets of
messages to different processes, memory will
become inconsistent
Why worry?



In fact, we have seen that CASD can do just
this, if the parameters are set aggressively
Moreover, the problem is not detectable
either by “technically faulty” processes or
“correct” ones
Thus, DSM can become inconsistent and we
lack any obvious way to get it back into a
consistent state
Using CASD in real
environments




Would probably need to set the parameters
close to the range where CASD can
malfunction, but rarely
Hence would need to add a self-stabilization
algorithm to restore consistent state of
memory after it becomes inconsistent
Problem has not been treated in papers on
CASD
pbcast protocol does this
Using CASD in real
environments

Once we build the CASD mechanism how
would we use it?



Could implement a shared memory
Or could use it to implement a real-time state
machine replication scheme for processes
US air traffic project adopted latter approach

But stumbled on many complexities…
Using CASD in real
environments

Pipelined computation

Transformed computation
Issues?


Could be quite slow if we use conservative
parameter settings
But with aggressive settings, either process
could be deemed “faulty” by the protocol

If so, it might become inconsistent



Protocol guarantees don’t apply
No obvious mechanism to reconcile states within
the pair
Method was used by IBM in a failed effort to
build a new US Air Traffic Control system
Similar to MARS

Research system done in Austria by Hermann
Kopetz




Basic idea is that everything happens twice
Receiver can suppress duplicates but is
guaranteed of at least one copy of each message
Used to overcome faults without loss of real-time
guarantees
MARS is used in the BMW but gets close to a
hardware f.tol. scheme
Many more issues….




What if a process starts to lag?
What if applications aren’t strictly deterministic?
How should such a system be managed?
How can a process be restarted?


If not, the system eventually shuts down!
How to measure the timing behavior of
components, including the network
FAA experience?



It became too hard to work all of this
out
Then they tried a transactional
approach, also had limited success
Finally, they gave up!


$6B was lost…
A major fiasco, ATC is still a mess
Totem approach




Start with extended virtual synchrony model
Analysis used to prove real-time delivery
properties
Enables them to guarantee delivery within
about 100-200ms on a standard broadcast
LAN
Contrast with our 85us latency for Horus!
Tradeoffs between
consistency, time


Notice that as we push CASD to run
faster we lose consistency
Contrast with our virtual synchrony
protocols: they run as fast as they can
(often, much faster than CASD when it
is not malfunctioning) but don’t
guarantee real-time delivery
A puzzle



Suppose that experiments show that 99.99%
of Horus or Ensemble messages are delivered
in 85us +/- 10us for some known maximum
load
Also have a theory that shows that 100% of
Totem messages are delivered in about
150ms for reasonable assumptions
And have the CASD protocols which work well
with  around 250ms for similar LAN’s
A puzzle




Question: is there really a difference between
these forms of guarantees?
We saw that CASD is ultimately probabilistic.
Since Totem makes assumptions, it is also,
ultimately, probabilistic
But the experimentally observed behavior of
Horus is also probabilistic
... so why isn’t Horus a “real-time” system?
What does real-time mean?

To the real-time community?



A system that provably achieves its deadlines
under stated assumptions
Often achieved using delays!
To the pragmatic community?


The system is fast enough to accomplish our goals
Experimentally, it never seems to lag behind or
screw up
Some real-time issues

Scheduling




Given goals, how should tasks be scheduled?
Periodic, a-periodic and completely ad-hoc tasks
What should we do if a system misses its
goals?
How can we make components highly
predictable in terms of their real-time
performance profile?
Real-time today

Slow transition



Older, special purpose operating systems and
components, carefully hand-crafted for
predictability
Newer systems are simply so fast (and can be
dedicated to task) that what used to be hard is
now easy
In effect, we no longer need to worry about realtime, in many cases, because our goals are so
easily satisfied!