Transcript CS514-lec

CS514: Intermediate
Course in Operating
Systems
Professor Ken Birman
Ben Atkin: TA
Perspectives on Computing
Systems and Networks
• CS314: Hardware and architecture
• CS414: Operating Systems with a focus on single-processor and multi-processor systems
• CS513: A course on security for operating systems
and networks
• CS514: Emphasis on “middleware”: networks,
distributed computing, technologies for building
reliable applications over the middleware
• CS614: A survey of current research frontiers in the
operating systems and middleware space
• CS444, CS476, CS644, CS676: networks, routers,
theory of network protocols, not offered recently
Styles of Course
• CS514 tries to be practical in emphasis:
– We look at the tools used in real products and
real systems
– The focus is on technology one could build / buy
– But not specific products
• CS614 emphasis is on research
opportunities
– We try to understand the state of the art
– Idea is to find good research topics
• Both have projects, but
– CS514 builds on popular middleware
components
– CS614 tries to break new ground
Recent Trends
• Massive network rollout
• Larger and larger numbers of small
devices, web-compatible cell phones
• Object orientation and components
emerge as prevailing structural option
• Widespread use of transactions for
reliability and atomicity
• XML: The web-ization of everything
• Java/Jini, .NET: code can run on anything
• Client-server yielding to scalable
replication
Understanding Trends
• Basically two options
– Study the fundamentals
– Then apply to specific tools
• Or
– Study specific tools
– Extract fundamental insights from
examples
Ken’s bias
• I work on reliable, secure distributed
computing
– Air traffic control systems
– Stock exchanges
– Next generation electric power grid
• To me, the question is:
How can we build systems that do what
we need them to do, reliably, accurately,
and in a secure manner?
Butler Lampson’s Insight
• Why computer scientists didn’t
invent the web
– CS researchers would have wanted it to
“work”
– The web doesn’t really work
– But it doesn’t really need to!
• Gives some reason to suspect that
Ken’s bias isn’t widely shared!
World Wide Web
• A seductive pastime, but
increasingly seen as a serious
business model
• Idea would be to put information you
need at your fingertips to enable
better, more informed, more
intelligent actions
• The Web can also replace paper
entirely: a world-wide tool for
sharing knowledge
Relying on the Web:
Banking
• Companies and individuals will need
to rely on the Web for this model to
work:
– Broker will rely upon up-to-the-minute
stock quotes and investment data and
advice
– Back office will trade stocks based on
what the broker currently wants
– Criminals will try to violate
security/privacy to steal funds or
manipulate trades
Relying on the Web:
Medicine
• Web-style interface in a hospital
• Doctor relies on accuracy of patient
status records to make treatment
decisions
• Nurse relies on accuracy of drug
dosage and frequency data to
administer treatment
• Hospital legally obligated to provide
for security and privacy of the data
Relying on the Web:
Publisher
• More and more publications will go
electronic in coming years (so will
movies, MTV videos, classical music,
etc)
• Publisher’s edge: quality of authors,
quality of material. Will “sell”
information
• But for this to work, need reliable
ways to charge for access and to
limit access to authorized
individuals!
Air Traffic Control on
the Web
• Web interface could easily show
planes, natural for controller
interactions
• But clearly need to know that
trajectory and flight data is current
and consistent
• Also need help with routing options
• Continuous availability is vital.
Security and privacy also needed
New Air Traffic Control
System: AAS
• Started by FAA in 1989 to
replace existing ATC system
• Current system has video
display of radar for controllers
to use
• Database has information about
each flight
• Telephones to talk to the planes
ATC systems divide
country up
More details on ATC
• Each sector has a control center
• Centers may have few or many (50)
controllers
• Data comes from a radar system that
broadcasts updates every 10
seconds
• Database keeps other flight data
• Controllers each “own” smaller subsectors
Current System has
Problems!
• Overloaded computers that often
crash
• Getting slow as volume of air traffic
rises
• Inconsistent displays a problem:
phantom planes, missing planes,
stale information
• Some major outages recently
(Newark down for 1/2 hour, LA down
for 1 hour in 1995). One near-miss
associated with LA outage
Concept of New System
• Replace video terminals with
workstations
• Build a highly available real-time
system guaranteeing no more than 3
seconds downtime per year
• Offer much better user interface to
ATC controllers, with intelligent
course recommendations and
warnings about future course
changes that will be needed
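To get a feel for how demanding the 3-seconds-per-year goal is, the arithmetic can be sketched as follows. This is only an illustration: the function names are hypothetical, a 365-day year is assumed, and the half-hour figure is the Newark outage mentioned earlier in these slides.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 (assumes a 365-day year)


def availability(downtime_seconds_per_year: float) -> float:
    """Fraction of the year the system is up, given its yearly downtime."""
    return 1.0 - downtime_seconds_per_year / SECONDS_PER_YEAR


def downtime_budget(avail: float) -> float:
    """Seconds of downtime per year permitted by a given availability."""
    return (1.0 - avail) * SECONDS_PER_YEAR


def outage_vs_budget(outage_seconds: float, budget_seconds: float) -> float:
    """How many yearly budgets a single outage consumes."""
    return outage_seconds / budget_seconds


if __name__ == "__main__":
    # The AAS goal from the slide: no more than 3 seconds down per year.
    print(f"3 s/year means availability of {availability(3):.9f}")
    # For comparison, "five nines" (an assumed reference point, not from the slides).
    print(f"99.999% allows {downtime_budget(0.99999):.2f} s/year")
    # The half-hour Newark outage measured against the 3-second budget.
    print(f"A 30-minute outage uses {outage_vs_budget(30 * 60, 3):.0f} yearly budgets")
```

Under these assumptions the goal works out to roughly seven nines of availability, about a hundred times stricter than five nines.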
ATC Architecture
(Figure: ATC architecture, with components labeled “Network Infrastructure” and “Database”)
Technologies Used
• Base on standard, off-the-shelf
workstations (easier to maintain,
upgrade, manage)
• IBM proposed software for fault-tolerance
and consistent system implementation
• Fancy graphical user interface much
like the Web, pop-up menus for
control decisions, etc.
Project Was a Fiasco!!
• IBM unable to implement a fault-tolerant
software architecture! Problem was much
harder than they expected.
• Even a non-distributed interface
turned out to be very hard, major
delays, scaled back goals
• Resulting system is unsatisfactory
even before delivery
Free Flight
• Many think this is the next step in
aviation
• Planes use GPS receivers to track
own location accurately
• Combine radar and a shared
database to see each other
• Each pilot makes own routing
decisions
• ATC controllers only act in
emergencies
Free Flight (cont)
• Now each plane is like an ATC
workstation
• Each pilot must make decisions
consistent with those of other pilots
• ... but if FAA’s project failed in 1994,
why should free flight succeed in
2010?
• Something is wrong with the
distributed systems infrastructure!
Other critical
applications
• Banking, stock markets, stock brokers
• Health care, hospital automation
• Control of power plants, electric grid
• Telecommunications infrastructure
• Electronic commerce and electronic cash
on the Web (very important emerging area)
• Corporate “information” base: a company’s
memory of decisions, technologies,
strategy
• Military command, control, intelligence
systems
We depend on
distributed systems!
• If these critical systems don’t work
– When we need them
– Correctly
– Fast enough
– Securely and privately
• ... then revenue, health and safety,
and national security may be at risk!
Signs of a Crisis in
Computing
• Highly visible fiascos: ATC project,
Denver luggage handling system,
London Stock Exchange.
• Hackers pose an increasingly
serious threat: disrupted telephone
services, break-ins to critical
computing systems
• Vendors offering little in the way of
reliability (security situation is
better)
Critical Needs of Critical
Applications
• Security: Can tell who is doing what and
can use this to enforce authorization
• Privacy: Intruders can’t see data or user
IDs
• Availability: System is continuously “up”
• Recoverability: Can restart failed
components
• Consistency: Actions of system at different
locations are consistent with each other.
Web Brownouts
(Figure: a Web request traced in nine numbered steps. Steps 1-3 go
through the name service, with nodes cafe.org, sf.cafe.org,
cornell.edu and cs.cornell.edu. Steps 4-9 pass through the Local Web
Proxy (cached documents), the Cornell Web Proxy (cached documents)
and the Cornell Web Server.)
The network name service is structured like an inverted tree.
The Web browser’s system only needs to contact local name
and web services.
• Domain name service (DNS) can overload
(1-3)
• Server or proxies can overload, crash (4-9)
• Communication lines can overload or break
• DNS or proxy can return “stale” data
Infrastructure Needs to
Change
• To avoid brownouts need to make
more use of replicated (cached) data
• DNS replication: caching of host
addresses
• Web proxies: replicate copies of
documents
• Creates a new challenge:
– Coherence: guarantee that a cached
copy of an object is up to date
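The coherence problem can be seen in a small sketch of a time-to-live (TTL) cache, the kind of expiry rule DNS caches and Web proxies commonly use. Everything here is illustrative: the `TTLCache` class, the `demo` function, the placeholder addresses and the 60-second TTL are assumptions, not part of any real proxy or resolver.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Hypothetical cache that serves stored copies until they expire."""

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch          # asks the origin (authoritative source)
        self._ttl = ttl_seconds      # how long a cached copy is trusted
        self._clock = clock          # injectable so the demo can control time
        self._entries: Dict[str, Tuple[str, float]] = {}

    def get(self, key: str) -> str:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[1]:
            return entry[0]          # hit: returned without checking the origin
        value = self._fetch(key)     # miss or expired: go back to the origin
        self._entries[key] = (value, now + self._ttl)
        return value


def demo() -> Tuple[str, str, str]:
    origin = {"cs.cornell.edu": "address-A"}   # placeholder data
    now = [0.0]                                # fake clock, in seconds
    cache = TTLCache(lambda k: origin[k], ttl_seconds=60, clock=lambda: now[0])

    first = cache.get("cs.cornell.edu")        # miss: fetched from the origin
    origin["cs.cornell.edu"] = "address-B"     # the origin changes
    now[0] = 30.0
    stale = cache.get("cs.cornell.edu")        # hit: the old copy is served
    now[0] = 61.0
    fresh = cache.get("cs.cornell.edu")        # expired: refetched
    return first, stale, fresh


if __name__ == "__main__":
    print(demo())
```

A TTL only bounds how long a copy can be out of date; it does not guarantee the copy is current. A shorter TTL narrows the stale window but sends more load back to the servers that replication was meant to relieve.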
What this course is
really about
• Distributed computing is rapidly
transforming the way we work, live,
the way that companies do business.
• Increasingly, distributed computing
systems are the only ones you can
buy.
• The challenge: build distributed
systems which can be relied upon in
critical settings
What’s the Story Today?
• Few distributed systems or Web
applications consider reliability
issues
• The ones that do worry about
reliability are often naive about what
they are getting into, leading to
highly visible failures
• But we do have technical answers to
many of the basic problems and
some exciting initial options
Goals for this course?
• Understand the basic technologies
from which distributed systems are
constructed
• Maintain a degree of emphasis on
reliability issues throughout: how
reliable are the standard
technologies? Can they be used
reliably despite their limitations?
• Look at advanced technologies in
context of real systems built in
standard ways
Trends are changing
• More and more pressure on industry
– When the network is down, your company won’t
make money
– Clients want tools they can rely on
• This is creating pressure on vendors who
offer middleware
• Result is a new emphasis on scalability
and reliability
• We want reliability, as long as we can have
performance and scalability too.
Technologies we will
cover
• RPC and client-server computing; Streams
• Internet technologies (email, news, msg. bus) and
trends (the “next generation Internet”)
• DCE, CORBA, COM: Object-Oriented and Component
Environments
• Web technologies (HTTP, XML), how the popular
scalable architectures work
• Process group computing and scalability issues
• Transactions and reliability
• Just a Taste of Security
• System Management, Clusters, Realtime
Course Overview: 24
lectures
• Intro + Basic technologies: 4
lectures
• Web and Internet: 2 lectures
• Reliability technologies
– Distributed “group” solutions: 6 lectures
– Security options: 2 lectures
– Real-time issues: 2 lectures
– Transactional systems: 2 lectures
– Management: 3 lectures
– Other topics: 3 lectures
Project
• CS514 has
– Homeworks, from time to time
– A reasonably ambitious software project (can be
used to satisfy your MEng project requirement)
• Projects can be done in groups
• Usually involve tackling reliability or
scalability with some popular technology
• This semester, hoping to use two Java-oriented b2b technologies
– HP’s eSpeak
– BEA Systems “WebLogic”
• You’ll teach yourself how to use them
Major Themes?
• Modularity (also known as object-orientation).
Better structured systems
are more reliable.
• Performance. Technologies need to be
fast to be perceived as working well
• Exploiting group structures. These are
common in reliable distributed systems
• Rigor. We want to know why a technique
works: ad-hoc solutions often break under
stress
Scalability
• Suddenly the hot issue for industry
• Basically, customers expect
solutions that
– Can be developed on a small scale
– Continue to work during prime-time
– Scalability and stability: can be
considered from many dimensions
• Today, most of the popular
solutions scale poorly!
The Prevailing Mindset
• Many developers believe that
reliable systems are clumsy,
overengineered, slow
• Image: a “robust bridge”. Sounds
like some sort of ugly, heavy eyesore
• The Web and the Net are about
elegant, light-weight, fast systems:
“antithesis” of robust ones
• Reliability is also at odds with using
standard components and packages
Insights From Course?
• Reliability techniques are often very
elegant
• Complexity is a challenge; modularity
used to control these costs
• Can achieve high performance in reliable
distributed systems
... but they sometimes are hard to
combine with standard technologies
Lightweight but Resilient
Bridges, Secure Computing
Enclaves
• A good way to imagine the
technology we seek
• Our job is to build those enclaves
• Trick is to use the technical tools
the right way!
• In CS514, we won’t study the
security aspects of the problem in
more than a shallow manner
Recommended Reading
• Textbook: read the Introduction
• While surfing the Web, think about
outages
• Keep a count over half an hour of surfing
the net: how often did you have
problems? What sorts of problems?
• Find the University of Michigan Web
pages on internet availability. What does
this data tell you?