Reliability Tools and Options
Professor Ken Birman
Dept. of Computer Science
Cornell University
Last Time
 We saw that reliability is a complex
spectrum of properties and tradeoffs
 We developed the idea of e-triage
 And we glanced at some technologies
Today
 Last of three lectures on reliability
 Focus on technologies in more depth




What can they do for us?
How do they work?
How well do they integrate?
Limitations? Scalability issues?
Technologies
 Communication Tools:











TCP/IP
Remote Procedure Call (or “method invocation”)
Process group membership tracking and multicast
Publish/subscribe (also called “MOMS”)
Checkpoint and Restart, perhaps with mirrored disks
Transactions and Databases
Web servers and Java/JNI/JavaScript
Components and Object-oriented architectures
Cluster fault-tolerance and load-balancing
Traditional Linux tools and “scripts”
Hardware reliability – fault-tolerant computers
Sorting Things Out
 Computer scientists like to think in terms
of big chunks of technology that they
classify into categories
 Often we talk about “layers”
 Lowest layers are close to the hardware
 Higher layers deal with things closer to
the user who sits in front of a screen
Examples of Layers
Applications
Server Technologies
Middleware
Network Protocols
Operating System
What Makes a Layer?
 Layer uses stuff below it but nothing from
above it
 And the layer offers a set of services to things
above it
 Sometimes we imagine a layer as a thing that
transforms a computer or a network into a
new one with new properties!
 Somewhat like looking through a set of magic
eyeglasses, each one somehow transforming
the world into a magic new world…
Operating System
 Major ones are





Windows (several variants), from Microsoft
Linux (one of many versions of Unix)
Macintosh OS
Palm OS
VxWorks, QNX
 Many other minor ones
Operating System
 It runs the hardware for one computer

Also supports “processes”, manages memory and other
resources, provides security
 People refer to the OS as a “platform”


Applications use OS features and run “on” it
They don’t need to deal with special issues involving the
hardware because the OS handles them
 These days OS also includes components that handle
networking
 A modern OS is structured as a set of “objects”
Protocols
 These are little programs that run in
applications or in the OS
 They work by sending messages over network
connections
 Goal is to do something useful in a distributed
manner



For example, network can lose packets
But web pages can’t tolerate missing chunks of data!
So web uses a protocol that resends lost packets
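A minimal sketch of the resend idea, assuming a simulated link that drops packets at random and assuming acknowledgements themselves are never lost. The function name `send_reliably` and its parameters are made up for this illustration; this is stop-and-wait, far simpler than what the Web really uses.

```python
import random

def send_reliably(chunks, loss_rate=0.3, seed=1, max_tries=50):
    """Deliver every chunk in order over a link that randomly drops packets.

    Stop-and-wait: send one chunk, and resend it until it gets through.
    """
    rng = random.Random(seed)            # seeded so the run is repeatable
    delivered, transmissions = [], 0
    for chunk in chunks:
        for _ in range(max_tries):
            transmissions += 1
            if rng.random() >= loss_rate:    # packet survived the network
                delivered.append(chunk)
                break
        else:
            raise TimeoutError("gave up resending " + repr(chunk))
    return delivered, transmissions

page = ["<html>", "<body>hello</body>", "</html>"]
received, sent = send_reliably(page)
```

The receiver ends up with every chunk of the page, in order; the price is that `sent` can be larger than the number of chunks.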
Representative
Protocols?
 Just look at two examples to get the feel
 Don’t worry about the details
 Idea is to understand the “kinds of
things” each layer is doing, not the
specifics


We do teach the specifics in Cornell courses
But any one of these would take weeks to
cover in a comprehensive way
Communication Tools:
TCP/IP
 The basic communication technology of the
Web
 Works like a telephone call:




Your browser connects to a server using its IP
address (looks like 128.64.77.133)
Your request is sent as a message over the
connection. The result comes back.
The connection automatically matches sending and
receiving rates (easily fooled by noisy links!)
Also, automatically corrects for data loss
TCP sliding window
[Figure: the sender provides data; the window has k “segments” (mi … mi+k); IP packets carry segments; the receiver consumes data and replies with acks and nacks; the sender resends missing data. When an acknowledgement is received, the segment number keeps incrementing but the slot number is reused.]
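A rough simulation of a sliding window, assuming a lossy simulated link and perfect feedback from the receiver (standing in for acks and nacks). The name `sliding_window_transfer` and its parameters are hypothetical, and real TCP is considerably more involved. Segment n is kept in slot n % k, so segment numbers keep growing while slots are reused.

```python
import random

def sliding_window_transfer(data, k=4, loss_rate=0.25, seed=7, max_rounds=10_000):
    """Move `data` to a receiver with at most k unacknowledged segments in flight."""
    rng = random.Random(seed)     # seeded so the run is repeatable
    slots = [None] * k            # sender buffer: segment n lives in slot n % k
    base = 0                      # oldest unacknowledged segment number
    next_seq = 0                  # next segment number to place in the window
    received = {}                 # receiver side: segment number -> payload
    output = []                   # data the receiving application has consumed
    rounds = 0
    while base < len(data):
        rounds += 1
        if rounds > max_rounds:
            raise TimeoutError("link too lossy")
        # Fill free slots with new segments (slot numbers are reused).
        while next_seq < len(data) and next_seq < base + k:
            slots[next_seq % k] = data[next_seq]
            next_seq += 1
        # Send, or resend, every segment the receiver is still missing.
        for seq in range(base, next_seq):
            if seq not in received and rng.random() >= loss_rate:
                received[seq] = slots[seq % k]
        # Receiver consumes in order; each ack slides the window forward.
        while base in received:
            output.append(received.pop(base))
            base += 1
    return output, rounds
```

With no loss, ten segments and k = 4 need three rounds; with loss, the same data still arrives in order but takes more rounds.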
TCP/IP: Pros and
Cons
 Simple, widely supported way to communicate
 Overcomes packet loss, duplication, out of
order delivery


But can reduce rate down to zero when network
becomes congested, easily fooled by a noisy link
Also, connections can break even if neither
endpoint actually fails.
 Things that use TCP, like web browsers, inherit
these benefits… and these problems!
Communication Tools:
RPC
 Idea is that each program declares a set
of actions it can perform – “methods”
that can be invoked using an “interface”
 Client programs “bind” to interface
 Send a message to invoke a method,
reply comes back in form of a message
too. Special protocols overcome failure
The basic RPC protocol
 Server registers with name service
 Client “binds” to server
 Client prepares, sends request
 Server receives request, invokes handler, sends reply
 Client unpacks reply
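The steps of the protocol can be sketched in a few lines, assuming everything runs inside one program: a dictionary stands in for the name service and a direct method call stands in for the network. The class names and the "calculator" service are invented for the illustration, and there is no timeout or failure handling here.

```python
import json

name_service = {}     # service name -> server object (stand-in for a real directory)

class Server:
    def __init__(self, name, handlers):
        self.handlers = handlers          # method name -> Python function
        name_service[name] = self         # server registers with name service

    def receive(self, request_bytes):
        request = json.loads(request_bytes)                # receives request
        handler = self.handlers[request["method"]]
        result = handler(*request["args"])                 # invokes handler
        return json.dumps({"result": result}).encode()     # sends reply

class Client:
    def __init__(self, name):
        self.server = name_service[name]                   # client "binds" to server

    def call(self, method, *args):
        # Prepares the request as a message (bytes), as if for the network.
        request = json.dumps({"method": method, "args": args}).encode()
        reply = self.server.receive(request)               # sends request, waits for reply
        return json.loads(reply)["result"]                 # unpacks reply

Server("calculator", {"add": lambda a, b: a + b})
client = Client("calculator")
total = client.call("add", 2, 3)
```

To the caller, `client.call("add", 2, 3)` looks like an ordinary method invocation, even though the request and the reply both travel as messages.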
RPC Summary
 Basic technology in most “client-server”
situations with exception of the Web
 Can hide packet loss but not server failure
 Can certainly fail (due to timeout) when server
and client are actually both healthy
 Many limitations in terms of form of data you
can send, packet size, etc.
When are they used?
 TCP is used to transfer “objects”


Usually objects are reasonably large
Examples are email messages, files, web
pages, copies of programs
 RPC is used when a program asks for a
service provided by some other program

Best for small requests and replies
Concept Of
Middleware
 Middleware is any kind of a software tool that
runs over a basic infrastructure



Provides a standard set of services for some class
of applications
Idea is that OS and network may be “too general”
Middleware creates a better environment for some
large class of applications that all share a need
poorly addressed by the lower layers
 Middleware is increasingly important
Communication Middleware
Example: Multicast
 Broad term covering a variety of one-many
communication tools
 We talk about the:




Process group: set of programs for which
membership is tracked
Multicast: a way of sending data to group
State transfer: brings a joining program up to date
Order, atomicity: guarantee that messages are seen
in same order by all members, despite failure
Virtual Synchrony
Model
[Figure: timelines for processes p, q, r, s, t across successive group views]
G0={p,q}: r, s request to join
G1={p,q,r,s}: r, s added; state xfer; then p fails (crash)
G2={q,r,s}: t requests to join
G3={q,r,s,t}: t added, state xfer
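A toy model of the view sequence, assuming a single program plays every process and that each member's "state" is just the list of messages it has seen. The class `ProcessGroup` and the messages m1, m2, m3 are made up for the illustration; real systems must also agree on views despite failures, which this sketch ignores.

```python
class ProcessGroup:
    """Tracks membership as a sequence of views and delivers multicasts per view."""

    def __init__(self, members):
        self.views = [sorted(members)]               # views[0] is G0
        self.state = {m: [] for m in members}        # each member's message log

    @property
    def view(self):
        return self.views[-1]                        # the current view

    def multicast(self, message):
        # Every member of the current view sees the message, in the same order.
        for m in self.view:
            self.state[m].append(message)

    def join(self, *newcomers):
        donor = self.view[0]                         # any current member can supply state
        for m in newcomers:
            self.state[m] = list(self.state[donor])  # state transfer
        self.views.append(sorted(self.view + list(newcomers)))

    def fail(self, member):
        del self.state[member]
        self.views.append([m for m in self.view if m != member])

group = ProcessGroup(["p", "q"])     # G0 = {p, q}
group.multicast("m1")
group.join("r", "s")                 # G1 = {p, q, r, s}
group.multicast("m2")
group.fail("p")                      # G2 = {q, r, s}
group.join("t")                      # G3 = {q, r, s, t}
group.multicast("m3")
```

After the run, q, r, s and t all hold the same log, even though r, s and t joined late: state transfer filled in what they missed.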
Communication Middleware
Example: Publish/Subscribe
 Packaging of one-many communication
tools into an elegant, easily understood
form
 Idea is that data producers “publish”
information, marked with “subjects” that
each item is about
 Subscribers “subscribe” to the subjects
of interest to them
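A bare-bones message bus, assuming everything lives in one program and subscribers are plain callback functions. `MessageBus` and the subjects "red" and "green" are just labels chosen for the sketch; real products add networking, spooling and playback.

```python
from collections import defaultdict

class MessageBus:
    def __init__(self):
        self.subscribers = defaultdict(list)    # subject -> list of callbacks

    def subscribe(self, subject, callback):
        self.subscribers[subject].append(callback)

    def publish(self, subject, item):
        # Deliver to every subscriber of this subject; report how many received it.
        for callback in self.subscribers[subject]:
            callback(subject, item)
        return len(self.subscribers[subject])

bus = MessageBus()
inbox = []
bus.subscribe("red", lambda subject, item: inbox.append((subject, item)))
bus.publish("red", "item 1")       # delivered to the one "red" subscriber
bus.publish("green", "item 2")     # nobody subscribed, so nobody receives it
```

Publishers never name their receivers; they only name a subject, which is what makes the model easy to extend over time.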
Conceptually, a
message “bus”




Boxes are publishers (red / green subjects)
Circles are subscribers (same subject colors)
Disks represent spoolers used for playback
Flexible and easily extended over time
 Supports huge numbers of subjects
Publish/Subscribe
Pros and Cons
 Conceptually very simple, popular
 But in practice the infrastructure can be
limiting and cumbersome
 Often end up with more or less all processes
receiving more or less all the messages,
anyhow
 Example of a technology that made more
sense when computers were slower
When Are They Used?
 Process groups?



New York Stock Exchange, Swiss Exchange
French air traffic control system
AEGIS rebuild
 Publish-Subscribe message bus



Most trading floors
Factory automation and process control
Some internal use for gluing databases to web sites
Servers
 Many modern technologies follow a client-server programming model


You are the client
The server handles incoming requests
 This model is probably the big success of the
1980-2000 period for computing
 Normally, client connects to server on network
and uses some form of RPC to talk to it
Servers
 Web servers
 Database servers
 Weblogic: a fancy web server that combines
features needed for eCommerce sites
 Mail servers, message queuing servers
 Other application-specific servers

E.g. computer-aided design, payroll, etc…
Servers
 Secretly, most servers are a database
perhaps extended to know about a
specific category of application or use


We call these domain-specific refinements
Idea is that an Oracle database, out of the
box, is a very general platform but that a lot
of work is needed to use it for, say, payroll
 Databases use “transactional” model
Transactions and
Databases
 One of the very big, well supported
technologies
 Associated with databases
 Each program “runs a transaction”
begin
action1 action2 action3 ….
commit or abort
 Either entire transaction is performed, or entire
transaction is erased (if disrupted by crash)
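The begin / commit-or-abort pattern can be tried with Python's built-in sqlite3 module. The `accounts` table, the names alice and bob, and the `transfer` helper are assumptions made up for this sketch, not part of any particular product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money as one transaction: both updates happen, or neither does."""
    try:
        with conn:   # commits if the block succeeds, rolls back (aborts) on an exception
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                         (amount, src))
            (balance,) = conn.execute("SELECT balance FROM accounts WHERE name = ?",
                                      (src,)).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
        return True
    except ValueError:
        return False

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))

ok = transfer(conn, "alice", "bob", 30)        # commits
failed = transfer(conn, "alice", "bob", 500)   # aborts; the debit is erased
```

After both calls the balances are 70 and 30: the aborted transfer left no trace, even though its first update had already run.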
ACID Properties
 Atomicity: entire group of actions is treated as
one “atomic unit”
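 Consistency: each committed transaction leaves the
database in a valid state (the “C” is sometimes glossed as
concurrency: more than one transaction can run at the
same time on the same database)
 Isolation: concurrent transactions are isolated from each
other, as if only one ran at a time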
 Durability: committed transactions survive
failures and recoveries
Pros and Cons
 Mixture of a powerful model with powerful,
comprehensive vendor support



More or less integrated with web
But recovery can be slow
And high availability databases usually sacrifice some aspects
of ACID guarantee
 Note that vendors offer “replication” products but
nobody uses these – performance is terrible.
 Hot topic: cluster-style parallel servers

Clustering is a way to get scalability
Trends in Systems
 Enough on layers
 In previous lectures looked at business issues
associated with the Internet
 Today have also seen lots of technology



Mixture of current systems
Emerging products and systems
Technologies
 What comes next in distributed computing?
Ways of posing
questions
 As a business question:

I want to get rich, what should I invest in?


Ultimately a flaky and meaningless question
Should ask “what should I learn about”
 As a research question

I want to be famous, what should I invent?

If you’re so smart, you should tell me!
 As a big-picture question


Where is dramatic change inevitable?
This question makes more sense than the others
Looking for Exciting
Change
 Our goal is to anticipate dramatic,
unexpected change
 Is there a methodology for identifying
the big opportunities?
 How can we apply it to networks and
distributed computing?
Traditional Areas
 File systems
 Communications
 Naming of objects, interoperation
 Security
 Resource management
 Transactions
 Extensibility
Emerging areas
 Scalable service management
 Tools for hosting data
 Mechanisms for offloading work from
customers onto 3rd party solution
provider systems
 QoS mechanisms
 Power-aware and mobility support
Where are the big
opportunities?
 We could review these one topic at a
time, but that might get dull
 Can we develop a methodology for
recognizing big opportunities and
“leaping in”?
Technology trends
[Chart: CPU MIPS, Memory MB, LAN Mbits, WAN Mbits and O/S overhead, on a scale of 0 to 700, for the periods 1985-1990, 1990-1995, 1995-2000 and 2000-2005]
Source: Scientific American, Sept. 1995
Note tremendous growth in WAN speeds
Typical latencies (milliseconds)
[Chart: Disk I/O, Ethernet RPC, ATM roundtrip and WAN roundtrip, on a scale of 0.01 to 1000 ms, for the periods 1985-1990, 1990-1995, 1995-2000 and 2000-2005]
WAN, disk latencies are fairly constant due to physical limitations
O/S latency: the most expensive overhead on LAN communication!
[Chart: O/S overhead in proportional terms, on a scale of 0 to 40, for 1985-1990 and 1995-2000]
Suggests?
 Notice that revolutionary opportunity is
triggered by technical discontinuity
 To predict a revolution…
… just identify a technology sector about to
be shaken up by a trend that breaks the
usual relationships
… predict “big things will happen”
Recent revolutions
 Internet became much faster, more
widely available
 Operating systems became object
oriented


Enabled the Web
Which enabled all sorts of B2B
developments people knew were coming…
Other examples?
 For a long time, PCs were slow and
balky, but very cheap
 But around 1990 technology gave us a
fast, big PC


Suddenly, desktop world yielded to PC world
Price point can trigger a discontinuity
Other examples?
 We used to be short on memory hence relied
heavily on disks
 But around 1985 memory sizes and cost
changed the equation
 Suddenly massive caches made sense


Giving us ideas like log-structured file systems and
new styles of caching in file and database systems
A world where 100% hit rates made sense
Looking to the
future?
 Major discontinuities:




Move from PC to PDA/telephone hybrids
Mobility, disconnected operation
Emergence of huge numbers of computing
systems that need to cooperate
Perhaps, some form of QoS?
Want to have an
impact?
 Trick is to zero in on one of these areas
 Be an early player


For example, get a mobile hand-held system and
start to play with it
Lots of things in the legacy infrastructure just aren’t
right for it


Your opportunity: fix a few of them by doing the obvious
things
And you’ll instantly be famous!
Mobile Trends
 Nomadicity: increasingly powerful nomadic
devices



Anticipate fusion of web browser, telephone and
also PDA functionality
Some devices of this sort already exist – but they
remain primitive
Low bandwidth interaction a big obstacle right now
– you can’t talk to it, but typing without a keyboard
is a pain
Mobile trends
 Communications standards



We already are seeing widespread use of
wireless ethernet cards
Bluetooth is the next big step: widespread
low-power connectivity for small devices
XML helps: data objects are readily
understood… fewer proprietary standards
Mobile trends
 Power conservation



Also better understood
Flexibility: compute faster or slower, move
code or data, sleep or run more actively
Signal strength also a factor
Mobile trends
 Suggests a future in which



We’ll move from place to place with our
computing context
In a given setting, devices find the
appropriate local resources and can talk to
them
And device is smart about when to ship
code, when to ship data
Mobile trends
 But this also points to a missing link: exciting
research opportunity

How to do naming of objects in this new mobile
world?




User wants a single personalized name for resources and a
single name space
But we also need to share things
And how to organize or structure a nomadic or
wireless environment
Peer-to-peer and multi-peer opportunity will be
enormous
Illustrating…
 A discontinuous development



From fixed infrastructure to mobile wireless one
High performance but power-aware
Fusion of previously independent technologies
(voice, web, email)
 Stress on existing infrastructure


We tend to adapt the existing infrastructure to the
new setting
But a whole new approach may be needed
Driving…
 New ideas in file systems

How should we do file systems for mobile and
wireless systems?
 Communication


How should we do point to point and multicast for
wireless peer-to-peer or “ad-hoc” networks?
Is TCP the right protocol for a wireless connection
to a server?
 The list goes on…
Dangers
 It is easy to overreach


People tend to try to do 10 things all at the
same time…
Need to be incremental
 Challenge?


Picking the right first step
The right infrastructure can enable just
about anything!
But we’re out of time…
 Take-aways from this lecture series?
 Business roles in eCommerce



Examples of existing sectors
Some thought about business role in developing
new technology-limited ventures
And some review of how technologies are
structured
 Leading to an angle on how to identify big
emerging opportunity areas
What should I know?
 If you want to remember just one thing…



Remember the French air traffic control project
Where the US project overreached and failed, the
French went slowly, tested like crazy, and built a
better system that really worked
Scalability and stability of technology is the key
 Be French!



Also drink moderate amounts of good red wine
Visit http://www.fromages.com now and then
Remember that vision of the world as 100 people…