Transcript CS514-lec

CS514: Intermediate
Course in Operating
Systems
Professor Ken Birman
Ben Atkin: TA
Lecture 2: August 29
Overview of Lecture
• Fundamentals: terminology and
components of a reliable
distributed computing system
• Communication technologies
and their properties
• Basic communication services
• Internet protocols
• End-to-end argument
Some terminology
• A program is the code you type in
• A process is what you get when you run it
• A message is used to communicate
between processes. Arbitrary size.
• A packet is a fragment of a message that
might travel on the wire. Variable size but
limited, usually to 1400 bytes or less.
• A protocol is an algorithm by which
processes cooperate to do something
using message exchanges.
More terminology
• A network is the infrastructure that links
the computers, workstations, terminals,
servers, etc.
– It consists of routers
– They are connected by communication links
• A network application is one that fetches
needed data from servers over the network
• A distributed system is a more complex
application designed to run on a network.
Such a system has multiple processes that
cooperate to do something.
A network is like a “mostly
reliable” post office
Why isn’t it totally
reliable?
• Links can corrupt messages
– Rare in the high quality ones on the
Internet “backbone”
– More common with wireless
connections, cable modems, ADSL
• Routers can get overloaded
– When this happens they drop messages
– As we’ll see, this is very common
• But protocols that retransmit lost
packets can increase reliability
How do distributed systems differ
from network applications?
• Distributed systems may have many
components but are often designed to
mimic a single, non-distributed process
running at a single place.
• “State” is spread around in a distributed
system
• Networked application is free-standing and
centered around the user or computer
where it runs. (E.g. “web browser.)
Distributed system is spread out, decentralized. (E.g. “air traffic control system”)
What about the Web?
• Browser is independent: fetches data you
request when you ask for it.
• Web servers don’t keep track of who is
using them. Each request is self-contained
and treated independently of all others.
– Cookies don’t count: they sit on your machine
– And the database of account info doesn’t count
either… this is “ancient” history, nothing recent
• ... So the web has two network
applications that talk to each other
– The browser on your machine
– The web server it happens to connect with…
which has a database “behind” it
You and the Web
Cookie identifies this
user, encodes past
preferences
HTTP request
Web browser with
stashed cookies
Database
Web servers are kept current by the
database but usually don’t talk to it
when your request comes in
You and the Web
Web servers immediately
forget the interaction
Reply updates cookie
You and the Web
Web servers have no
memory of the interaction
Purchase is a “transaction”
on the database
Examples of Distributed
Systems
• Air traffic control system with
workstations for the controllers
• Banking/brokerage trading system
that coord-inates trading (risk
management) at multiple locations
• Factory floor control system that
monitors devices and replans work
as they go on/offline
This course is about
reliability
• We want to build distributed systems that
can be relied upon to do the correct thing
and to provide services according to the
user’s expectations
• Not all systems need reliability
– If a web site doesn’t respond, you just try again later
– If you end up with two wheels of brie, well, throw a
party!
• Reliability is a growing requirement in
“critical” settings but these remain a small
percentage of the overall market for
networked computers
Reliability is a broad
term
• Fault-Tolerance: remains correct despite failures
• High or continuous availability: resumes service
after failures, doesn’t wait for repairs
• Performance: provides desired responsiveness
• Recoverability: can restart failed components
• Consistency: coordinates actions by multiple
components, so they mimic a single one
• Security: authenticates access to data, services
• Privacy: protects identity, locations of users
“Failure” also has many
meanings
• Halting failures: component simply stops
• Fail-stop: halting failures with notifications
• Omission failures: failure to send/recv.
message
• Network failures: network link breaks
• Network partition: network fragments into
two or more disjoint subnetworks
• Timing failures: action early/late; clock
fails, etc.
• Byzantine failures: arbitrary malicious
behavior
Examples of failures
• My PC suddenly freezes up while running a
text processing program. No damage is
done. This is a halting failure
• A network file server tells its clients that it
is about to shut down, then goes offline.
This is a failstop failure. (The notification
can be trusted)
• An intruder hacks the network and
replaces some parts with fakes. This is a
Byzantine failure.
More terminology
• A real-world network is what we work on.
It has computers, links that can fail, and
some problems synchronizing time. But
this is hard to model in a formal way.
• An asynchronous distributed system is a
theoretical model of a network with no
notion of time
• A synchronous distributed system, in
contrast, has perfect clocks and bounds all
all events, like message passing.
Model we’ll use?
• Our focus is on real-world networks,
halting failures, and extremely practical
techniques
• The closest model is the asynchronous
one; we use it to reason about protocols
– Most often, employ asynchronous model to
illustrate techniques we can actually implement
in real-world settings
– And usually employ the synchronous model to
obtain impossibility results
– Question: why not prove impossibility results in
an asynchronous model, or use the synchronous
one to illustrate techniques that we might really
use?
ISO protocol layers:
Oft-cited Standard
Application
The program using a communication connection
Presentation Software to encode data into messages, and decode on reception
Logic associated with guaranteeing end-to-end reliability and
Session
flow control, if desired
Software for fragmenting big messages into small packets
Transport
Routing functionality, limited to small packets
Network
Data-link
The protocol that represents packets on the wire
• ISO is tied to a TCP-style of connection
• Match with modern protocols is poor
•We are mostly at “layer 4” – session
Internet protocol suite
• Can be understood in terms of ISO
• Defines “addressing” standard, basic
network layer (IP packets, limited to
1400 bytes), and session protocols
(TCP, UDP, UDP-multicast)
• Includes standard “domain name
service” that maps host names to IP
addresses
• DNS itself is tree-structured and
caches data
Major internet protocols
•
•
•
•
•
•
TCP, UDP, FTP, Telnet
Email: Simple Mail Transfer Protocol (SMTP)
News: Network News Transfer Protocol (NNTP)
DNS: Domain name service protocol
NIS: Network information service (a.k.a. “YP”)
LDAP: Protocol for talking to the management
information database (MIB) on a computer
• NFS: Network file system protocol for UNIX
• X11: X-server display protocol
• Web: HyperText Transfer Protocol (HTTP), and SSL
(one of the widely used security protocols)
Typical hardware
options
• Ethernet: 10Mbit CSMA technology, limited
to 1400 byte packets. Uses single coax
cable.
• FDDI: twisted pair, self-repairing if cable
breaks
• Bridged Ethernet: common in big LAN’s,
ring with multiple ethernet segments
• Fast Ethernet: 100Mbit version of ethernet
• ATM: switching technology for fiber optic
paths. Can run at 155Mbits/second or
more. Very reliable, but mostly used in
telephone systems.
Implications for
reliability?
• Protocol designers have problems
predicting the properties of localarea networks
• Latencies and throughput may vary
widely even in a single installation
• Hardware properties differ widely;
often, must assume the leastcommon-denominator
• Packet loss a minor problem in
hardware itself
Technology trends
Note tremendous growth
in WAN speeds
700
600
CPU MIPS
Memory MB
LAN Mbits
WAN Mbits
O/S overhead
500
400
300
200
100
0
19851990
19901995
19952000
20002005
Source: Scientific American, Sept. 1995
Typical latencies
(milliseconds)
Disk I/O
1000
100
Ethernet
RPC
10
1
ATM
roundtrip
0.1
WAN
roundtrip
5
-2
0
0
0
0
0
0
2
1
9
9
5
-2
0
0
5
9
9
-1
0
9
9
1
9
8
5
-1
9
9
0
0.01
1
WAN, disk
latencies are
fairly constant
due to
physical
limitations
Note dramatic drop in
LAN latencies over ATM
O/S latency: the most expensive
overhead on LAN communication!
40
35
30
25
20
15
10
5
0
O/S
overhead as
percentage
19851990
19952000
Broad observations
• A discontinuity is currently occuring in
WAN communication speeds!
• Other performance curves are all similar
• Disks have “maxed out” and hence are
looking slower and slower
• Memory of remote computers looks “closer
and closer”
• O/S imposed communication latencies has
risen in relative terms over past decade!
Implications?
• The revolution in WAN
communication we are now seeing is
not surprising and will continue
• Look for a shift from disk storage
towards more use of access to
remote objects “over the network”
• O/S overhead is already by far the
main obstacle to low latency and
this problem will seem worse and
worse unless O/S communication
architectures evolve in major ways.
More Implications
• Look for full motion video to the
workstation by around 2005 or 2010
• Low LAN latencies: an unexploited “niche”
• One puzzle: what to do with extremely high
data throughput but relatively high WAN
latencies
• O/S architecture and whole concept of O/S
must change to better exploit the “pool of
memory” of a cluster of machines;
otherwise, disk latencies will loom higher
and higher
Reliability and
performance
• Some think that more reliable means “slower”
– Indeed, it usually costs time to overcome failure
– For example, if a packet is lost probably need to
resend it, and may need to solicit the retransmission
• But for many applications, performance is a big
part of the application itself: too slow means “not
reliable” for these!
• Reliable systems thus must look for highest
possible performance
• ... but unlike unreliable systems, they can’t cut
corners in ways that make them flakey but faster
Back to the internet: IP
layer
• Addresses have a machine address and a
“port” number. The port selects the
application when a packet arrives. (If none
is bound to that port, packet is dropped)
• Fixed maximum size of 1400 bytes
• Each machine can have multiple addresses
• Special “broadcast” address delivered to all
• “Class D” addresses used for multicast
groups
• Running out of addresses, so tricks with
addresses are increasingly common
IP gotcha’s
• IP messages are not authenticated in any
way. The sender can lie about who it is,
and can send to any host or port on the
network
• A system can lie about its own machine
address
• IP messages are not reliable. They can be
(and are) dropped at many stages
• IP routing: basically static these days
IP multicast
sends to: “123.45.87.51”
both blue machines accept
IP address “123.45.87.51”
• Key insight: IP address is just an abstraction.
• Any machine can potentially “spoof” any other!
UDP protocol
• Lives above IP and looks much like IP
• Permits larger packets (fragments
them into a burst of IP packets; if any
is lost UDP packet will be dropped on
receive side)
• Also can run in a multicast mode
• Most applications use UDP; IP layer is
typically “reserved” for kernel-level
applications
UDP loss rates can be
very high!
• Hunt experimented with this (book
reproduces some of his data)
• UDP is normally very reliable
• If sender overruns receiver, even
briefly, 100% of data may be lost!
• Easy to provoke this problem even
with source, dest on same
machine!!!
• O/S makes no effort to detect or
avoid loss!
TCP protocol
• Implemented over IP, considered “reliable”
• Supports a byte-stream model, like a pipe.
• Implemented using sliding-window
protocol
• Many variations on protocol to optimize
performance, window size, reduce header
size, etc. We’ll focus on the “most basic”
TCP protocol here and won’t look at the
optimizations
TCP sliding window
sender provides data
window has k “segments”
mi+k mi+k-1 ....
receiver replies with
acks and nacks. sender
resends missing data
mi
-
IP packets carry segments
- mi+k-2 - mi+k-3 ...
mi
receiver consumes data
Observations
• Trick is to have a big enough window
so that data flow is continuous. This
permits sender, receiver to “match”
their data rates
• Must retransmit as soon as possible
but not so soon that duplicates get
through (undesirable overhead)
• Channel “breaks” after many retries,
i.e. after a crash or if the network
gets “lossy” for a while
TCP failures: not
“failstop”
• Many applications treat a broken
TCP channel as a sign of remote
crash
• But in fact, a TCP channel can easily
break under network overload or due
to other transient conditions!
• Applications would then believe
destination to have failed when it is
actually still running.
A basic insight
• In fact, there is no way to know if a
message has reached its destination
unless it is explicitly acknowledged
• And if it is, the sender of the ack.
has no way to know if it was
received!
• Distributed systems are always
slightly out of sync! This will be a
very big issue for us later
Most distributed
systems use UDP or TCP
• Applications typically lack
permission to use IP (otherwise
could “break” TCP protocol)
• UDP multicast is hard to use
because it isn’t always available and
information on hardware layout of a
LAN is not available in any standard
format or from any standard service
• Under heavy load, both UDP and TCP
begin to misbehave (high loss rates,
broken channels, etc)
End-to-End argument
• Suppose an IP packet will take n
hops to its destination, and can be
lost with probability p on each hop
• Now, say that we want to transfer a
file of k records that each fit in one
IP (or UDP) packet
• Should we use a retransmission
protocol running “end-to-end” or n
TCP protocols in a chain?
End-to-End argument
source
dest
Probability of successful transit: (1-p)n,
Expected packets lost: k-k*(1-p)n
Saltzer et. al. analysis
• If p is very small, the overhead of
the n TCP protocols (excess bytes
sent, lost CPU time, etc) will be
higher than if we just send the whole
file, then retransmit the missing
records
• Generalization: low-level transport
systems should focus on speed, not
reliability; the application layer
should worry about “properties”
needed by the application
Example
• Suppose that 2% of file will be resent.
Wouldn’t want to impose a 10% overhead
on the net and slow the transfer down by
7% for this purpose!
• But the end-to-end argument would not
apply if:
– p or n is large, hence (1-p)n approaches 0
– cost of recovery when a problem occurs is very
high
– reliability property is hard for users to
implement
• Justify complex mechanisms against these
points.
For next time
• Read chapters 1-3
• If you were in charge of extending
the communication layer, what
would you change or add?
• If computers can set their own
addresses and fill out packets in any
way they like, how can a system like
UNIX possibly support user-id’s over
a network? Or are user-id’s a big
fake?