CS514: Intermediate Course in Operating Systems

Download Report

Transcript CS514: Intermediate Course in Operating Systems

CS514: Intermediate
Course in Operating
Systems
Professor Ken Birman
Ben Atkin: TA
Lecture 23: Nov. 14
Topics for this lecture
• Adding clocks to distributed systems
– Also, to non-distributed one
– We won’t look at real-time operating
systems and scheduling, but this was an
active area until recently
– Recent machines are so fast that
importance of topic is now reduced…
• Using time in broadcast protocols
• Comparison of CASD with gossip
Many systems need realtime
• Air traffic control talks about “when”
planes will be at various locations
• Doctors talk about how patients
responded after drug was given, or
change therapy after some amount
of time
• Process control software runs factor
floors by coordinating what machine
tools do, and when
Many systems need realtime
• Video and multi-media systems need
isochronous communication
protocols that coordinate video,
voice, and other data sources
• Telecommunications systems must
guarantee real-time response
despite failures, for example when
switching telephone calls
Real time in real
systems
• These real systems combine the need for
logical consistency and coordination (for
control) with the need for physical
coordination and timely behavior
• Issue becomes one of integrating real-time
tools and support with logical tools and
support such as we have already
considered
• Not real-time or logical time, but want both
for different aspects of a single
application!
Clock Synchronization and
Synchronous Systems
• Up to now, we restricted attention to
logical notions of time: “a happens
before b”
• Today look at adding real clocks to
such a system
• Two views of real-time:
– Supporting clocks that programs can
consult as needed
– Making direct use of real-time inside
protocols
Clock Synchronization
• Topic was extremely active during
early 1980’s
• Best known algorithms include the
one in OSF/1 UNIX (based on one by
Marzullo at Xerox), the “optimally
accurate” algorithm of Srikanth and
Toueg, and the “probabilistic”
algorithm of Cristian
• Introduction of Global Positioning
System is eliminating the need for
this form of synchronization
Clock synchronization
protocols
• Would like to think of network as
having a single “clock” that all
processes can consult
• But need to implement this using
many “local” clocks that can:
– Initially be set incorrectly
– Drift over the course of time
• Clock synchronization tries to keep
clocks close to true real-time and
minimize their tendency to drift
Precision and Accuracy
• Accuracy measures local clocks relative to
an external source of accurate real-time.
Accurate clocks are close to real-time
• Precision measures local clocks relative to
each other. Precise clocks are close to
each other
• Skew is the numerical limit on the
maximum dis-tance that correct clocks
can drift apart. E.g. could say the
“maximum skew is 1sec” for some system
Thought question
• Suppose that you were an air-traffic
controller
• Assume that clock synchronization
is by a protocol that runs on your
local area network. Each control
center runs its own clock sync.
• Would you prefer an algorithm that
optimizes for the best possible
accuracy, or for the best possible
precision?
... and the answer
• Air traffic controller basically says to
planes “during time t0 to t1 fly from a to b”
• Planes presumably have accurate clocks
.... so ATC needs accurate clocks, too.
• Precision is like internal consistency.
Accuracy is like external consistency
(dynamic uniformity)
How clock synchronization
used to work
• Periodically, all processors would
run a clock sync protocol, for
example by broadcasting the reading
from their clocks
• Each receives a set of values from
the others (sets may differ due to
faults!)
• Algorithm would pick a synchronized
value from the set; analysis used to
prove properties of clocks
Thinks known about
clock sync.
• Accuracy of clock synchronization is
limited by end-to-end latencies for
messages passing through the
network
• Example: you tell me that it is
10:00am, exactly.
• I am unsure how long the message
took to pass through the network,
hence I conclude that the time is in
the range [10:00, 10:00+]
Better way to express
this
• Suppose that I know something
about the mean message
propagation latency
• The quality of clock synchronization
is at best proportional to the
possible variation in latency relative
to this mean
• A-posteriori method of Verissimo and
Rodriguez uses this to obtain optimal
precision (but is less accurate than
the algorithm of Srikanth, Toueg)
Global Positioning
System
• Satellite system launched by military
in early 1990’s, became public and
inexpensive
• Can think of satellites as
broadcasting the time
• Small radio receiver picks up signals
from three satellites and
triangulates to determine position
• Same computation also yields
extremely accurate clock (accurate
to a few milliseconds)
Clock synchronization
with GPS
• Put two GPS receivers (or more) on a
network
• Periodically, receivers broadcast the
“true time”
• Other machines only need to adjust
their clocks to overcome
propagation delays for clock sync
messages through the network!
• Well matched to the a-posteriori
clock synchronization approach
Basic idea
• GPS receiver broadcasts “the time is
now 10:00” on a broadcast network
(ethernet)
• Receivers note the time when they
receive the message: [10:01, 9:58,
....] and reply with values
• GPS receiver subtracts the median
value
• Differences [1, -2, ...] now give the
drift of the clock of the destination
relative to the median clock
A-posteriori method,
adjustment stage
• Now we distribute these drift values
back to the processors, which
compensate for the rate of drift over
the time period to the next
synchronization
• Can show that this yields clocks that
are optimal both in accuracy and
precision
• A processor with a broken clock has
a good chance of discovering it
during synchronization
Using real-time
• One option is to use a real-time
operating system, clock
synchronization algorithm, and to
design protocols that exploit time
• Example: MARS system uses pairs of
redundant processors to perform
actions fault-tolerantly and meet
deadlines. Has been applied in
process control systems. (Another
example: Delta-4)
Real-time broadcast
protocols
• Can also implement broadcast
protocols that make direct use of
temporal information
• Examples:
– Broadcast that is delivered at same time
by all correct processes (plus or minus
the clock skew)
– Distributed shared memory that is
updated within a known maximum delay
– Group of processes that can perform
periodic actions
A real-time broadcast
p0
p1
p2
p3
p4
p5
t+a
t
t+b
*
*
*
*
*
Message is sent at time t by p0. Later both p0 and p1 fail. But
message is still delivered atomically, after a bounded delay, and
within a bounded interval of time (at non-faulty processes)
A real-time distributed
shared memory
p0
p1
p2
t
set x=3
t+a
t+b
x=3
p3
p4
p5
At time t p0 updates a variable in a distributed shared memory.
All correct processes observe the new value after a bounded
delay, and within a bounded interval of time.
Periodic process group:
Marzullo
p0
p1
p2
p3
p4
p5
Periodically, all members of a group take some action.
Idea is to accomplish this with minimal communication
The CASD protocols
• Also known as the “ -T” protocols
• Developed by Cristian and others at
IBM, was intended for use in the
(ultimately, failed) FAA project
• Goal is to implement a timed atomic
broadcast tolerant of Byzantine
failures
Basic idea of the CASD
protocols
• Assumes use of clock
synchronization
• Sender timestamps message
• Recipients forward the message
using a flooding technique (each
echos the message to others)
• Wait until all correct processors
have a copy, then deliver in unison
(up to limits of the clock skew)
CASD picture
p0
p1
p2
p3
p4
p5
t+a
t
t+b
*
*
*
*
*
p0, p1 fail. Messages are lost when echoed by p2, p3
Idea of CASD
• Assume known limits on number of
processes that fail during protocol, number
of messages lost
• Using these and the temporal
assumptions, deduce worst-case scenario
• Now now that if we wait long enough, all
(or no) correct process will have the
message
• Then schedule delivery using original time
plus a delay computed from the worst-case
assumptions
The problems with
CASD
• In the usual case, nothing goes
wrong, hence the delay can be very
conservative
• Even if things do go wrong, is it right
to assume that if a message needs
between 0 and ms to make one
hope, it needs [0,n*  ] to make n
hops?
• How realistic is it to bound the
number of failures expected during a
run?
CASD in a more typical run
p0
p1
p2
p3
p4
p5
t
t+a
t+b
*
*
*
*
*
*
... leading developers to employ more
aggressive parameter settings
p0
p1
p2
p3
p4
p5
t
t+a
t+b
*
*
*
*
*
*
CASD with over-aggressive paramter
settings starts to “malfunction”
p0
p1
p2
p3
p4
p5
t
t+a
t+b
*
*
*
*
all processes look “incorrect” (red) from time to time
CASD “mile high”
• When run “slowly” protocol is like a
real-time version of abcast
• When run “quickly” protocol starts
to give probabilistic behavior:
– If I am correct (and there is no way to
know!) then I am guaranteed the
properties of the protocol, but if not, I
may deliver the wrong messages
How to repair CASD in
this case?
• Gopal and Toueg developed an
extension, but it slows the basic
CASD protocol down, so it wouldn’t
be useful in the case where we want
speed and also real-time guarantees
• Can argue that the best we can hope
to do is to superimpose a process
group mechanism over CASD
(Verissimo and Almeida are looking
at this).
Why worry?
• CASD can be used to implement a
distributed shared memory (“deltacommon storage”)
• But when this is done, the memory
consistency properties will be those
of the CASD protocol itself
• If CASD protocol delivers different
sets of messages to different
processes, memory will become
inconsistent
Why worry?
• In fact, we have seen that CASD can
do just this, if the parameters are
set aggressively
• Moreover, the problem is not
detectable either by “technically
faulty” processes or “correct” ones
• Thus, DSM can become inconsistent
and we lack any obvious way to get
it back into a consistent state
Using CASD in real
environments
• Would probably need to set the
parameters close to the range where
CASD can malfunction, but rarely
• Hence would need to add a selfstabilization algorithm to restore
consistent state of memory after it
becomes inconsistent
• Problem has not been treated in
papers on CASD
• pbcast protocol does this
Using CASD in real
environments
• Once we build the CASD mechanism
how would we use it?
– Could implement a shared memory
– Or could use it to implement a real-time
state machine replication scheme for
processes
• US air traffic project adopted latter
approach
– But stumbled on many complexities…
Using CASD in real
environments
• Pipelined computation
• Transformed computation
Issues?
• Could be quite slow if we use
conservative parameter settings
• But with aggressive settings, either
process could be deemed “faulty” by
the protocol
– If so, it might become inconsistent
• Protocol guarantees don’t apply
– No obvious mechanism to reconcile
states within the pair
• Utility of the method thus unclear
Similar to MARS
• Research system done in Austria by
Hermann Kopetz
– Basic idea is that everything happens
twice
– Receiver can suppress duplicates but is
guaranteed of at least one copy of each
message
– Used to overcome faults without loss of
real-time guarantees
• MARS is used in the BMW but gets
close to a hardware f.tol. scheme
Many more issues….
• What if a process starts to lag?
• What if applications aren’t strictly
deterministic?
• How should such a system be
managed?
• How can a process be restarted?
– If not, the system eventually shuts down!
• How to measure the timing behavior
of components, including the network
FAA experience?
• It became too hard to work all
of this out
• Then they tried a transactional
approach, also had limited
success
• Finally, they gave up!
– $6B was lost…
– A major fiasco, ATC is still a mess
Totem approach
• Start with extended virtual
synchrony model
• Analysis used to prove real-time
delivery properties
• Enables them to guarantee delivery
within about 100-200ms on a
standard broadcast LAN
• Contrast with our 85us latency for
Horus!
Tradeoffs between
consistency, time
• Notice that as we push CASD to
run faster we lose consistency
• Contrast with our virtual
synchrony protocols: they run
as fast as they can (often, much
faster than CASD when it is not
malfunctioning) but don’t
guarantee real-time delivery
A puzzle
• Suppose that experiments show that
99.99% of Horus or Ensemble
messages are delivered in 85us +/10us for some known maximum load
• Also have a theory that shows that
100% of Totem messages are
delivered in about 150ms for
reasonable assumptions
• And have the CASD protocols which
work well with  around 250ms for
similar LAN’s
A puzzle
• Question: is there really a difference
between these forms of guarantees?
• We saw that CASD is ultimately
probabilistic. Since Totem makes
assumptions, it is also, ultimately,
probabilistic
• But the experimentally observed
behavior of Horus is also
probabilistic
• ... so why isn’t Horus a “real-time”
system?
What does real-time
mean?
• To the real-time community?
– A system that provably achieves its
deadlines under stated assumptions
– Often achieved using delays!
• To the pragmatic community?
– The system is fast enough to accomplish
our goals
– Experimentally, it never seems to lag
behind or screw up
Some real-time issues
• Scheduling
– Given goals, how should tasks be
scheduled?
– Periodic, a-periodic and completely adhoc tasks
• What should we do if a system
misses its goals?
• How can we make components
highly predictable in terms of their
real-time performance profile?
Real-time today
• Slow transition
– Older, special purpose operating
systems and components, carefully
hand-crafted for predictability
– Newer systems are simply so fast (and
can be dedicated to task) that what
used to be hard is now easy
– In effect, we no longer need to worry
about real-time, in many cases, because
our goals are so easily satisfied!