P00603_ArchitectureOfDS

download report

Transcript P00603_ArchitectureOfDS

C Cox
P00603 Lectures 3:
Distributed Systems Architecture
Multicomputer and Multiprocessor
Middleware
The client-server model
RPC
Distributed Programming paradigms
Programming with sockets
Fault tolerance
Amdahl’s law
Reading
•
•
•
•
A.S. Tanenbaum and M. van Steen
(2003) Distributed Systems - principles
and paradigms, , Prentice Hall, **
G. Colouris, J, Dollimore and T.
Kindberg (2005) Distributed Systems –
concepts and design, Addison Wesley,
4th Edition **
B. Wilkinson and M Allen (2004)
Parallel Programming, techniques and
applications using networked
workstations and parallel computers,
Prentice Hall. *
Middleware Architecture with Patterns
and Frameworks,
http://proton.inrialpes.fr/~krakowia/MW
-Book/Chapters/Intro/intro.html
1
Single CPU Systems
Disk(s)
CPU
I/O
Memory
Multi-CPU Systems (1):
Shared Memory Multiprocessor Systems
CPU
CPU
Disk(s)
CPU
I/O
Memory
Multi-CPU Systems (2):
Distributed Memory Multicomputer systems
CPU
Memory
CPU
Memory
....
network
Note: can have hybrid containing both
shared and distributed memory
Multiprocessor and Multicomputer Architectures
• Multiprocessor –
• tightly coupled MIMD computer system with shared
memory
• Some authors (e.g. Tanenbaum) do not consider
shared memory multiprocessors to be true distributed
systems
• Multicomputer –
• Loosely coupled MIMD computer system with
distributed (or private) memory. May use Distributed
Shared Memory.
Bus based multiprocessors
1.7
Switch based multiprocessors
a)
b)
A crossbar switch – fast access but n2 switches
An omega switching network – log2(n) stages, each with n/2 switches
1.8
Processor Coupling
•
Loosely coupled, e.g. Distributed Memory Multicomputer
network
M
P
M
P
M
P
 IPC by message passing
 Typically PC or workstation clusters
 Physically distributed components
 Characterized by longer message delays and limited bandwidth
• Tightly Coupled, e.g. Shared Memory Multiprocessor (hypothetical)
Shared Memory
M
P
M
P
M
P
M
P
I/O
 Processors connected in close proximity
Characterized by short message delays, high bandwidth
IPC via Shared Memory
The architecture illustrated is not practical (except for small numbers of
processors) because of memory and bus contention etc
Processor-Memory Interconnection Network
Distributed (or private) memory
Shared memory
1.6
Homogeneous Multicomputer Systems – Processor
Arrays
a)
b)
Grid
Hypercube
1-9
Typical Supercomputer Architecture: e.g. Thinking Machines
CM-2, CM-5, Cray T3E etc.
Also Inmos Transputer
What is middleware, why do we need it?
What is it?
• Software designed to help build distributed systems
•
•
Is considered as the “/” in Client/Server
Traditionally are the layers between the application and the operating
system
• Hides complexity and heterogeneity of distributed system
• Provides common programming abstraction and infrastructure for
distributed applications
• Generally consists of a set of functions or services to allow
processes running on separate machines to interact
•
Examples: Sun RPC, DCE, CORBA, DCOM, Java RMI, PVM
11
Where is Middleware found
• Layer between OS and distributed applications that provides
interoperability and a portable API
•
•
•
Built on top of transport layer in the ISO/OSI 7 layer reference model
Bridges gap between low-level OS comms and programming language
abstractions
Middleware covers presentation and session OSI layer
DistributedApplications
Applications
Distributed
Distributed
Applications
Middleware
OperatingSystem
SystemComms
Comms
Operating
Operating
System Comms
Network
Network
Network
(remote calls, object invocation,
messages, …)
(sockets, IP, TCP, UDP, …)
(packets, bits…)
12
Why is it needed?
• Hard to integrate network applications (different NOS,
DB etc.) so support for distributed applications
• Need for system integration, not just systems
programming
• Cant afford custom solutions
• Open systems (well defined interface, portable,
interoperable etc.)
• Distributed transaction processing (integrity, security
etc.)
• End to end QoS requirements
13
Middleware Functionality
• Location transparency to access servers across a
network
• Operates independently of network services
• Allows data to be filtered for security or privacy
• Can provide fault tolerance (reliable operation)
• Can provide and monitor Quality of Service
• Masks heterogeneity of underlying networks, hardware,
operating system and programming languages – so
provides a uniform programming model with standard
services
14
Some types of Middleware
•
•
•
•
•
•
•
Remote Procedure Call
Object Oriented Middleware
Message Oriented Middleware
Message passing libraries
SQL database oriented
Transaction processing monitor
Enterprise Service Bus
15
Object Oriented Middleware - OOM
• Objects can be local or remote
• Proxy objects provide local interface to remote objects
• e.g. Java RMI, CORBA (OMG), DCOM (Microsoft)
16
Message Oriented Middleware - MOM
• Is a middleware technology for asynchronous message passing
programming model, rather than request-reply
• Provides a message queue, similar to the operating system mailbox
feature
• Designed to address limitations of RPC
• Queue manager can provide QoS for fault tolerance, priority etc
• Provides a standard API across platforms
• Advanced Message Queing Protocol (AMQP) becoming a standard
• Provides fault tolerance
•
Ref: http://en.wikipedia.org/wiki/Message-oriented_middleware
17
Message Passing Libraries – PVM, MPI
• Libraries of functions providing
• dynamic process creation
• message passing primitives
• synchronization primitives
•
•
•
•
A degree of fault tolerance, portability and load balance
Parameter marshalling to allow heterogenous operation
Single client multiple server model
Ideal for parallel computing, numerical analysis and HPC
18
Transaction Processing (TP) Monitor Middleware
• Used for processing distributed database transactions that require
a single multi-stage transaction to be shared between multiple
parties
• Transaction needs to be ACID
ACID:
Atomic, Consistent, Isolated and Durable
19
Enterprise Service Bus – ESB
• Is an evolution of Enterprise Application Integration
(EAI=middleware to integrate systems and applications)
• Is a software architecture providing services via an event driven
messaging engine (the bus)
• The Enterprise Service Bus (or message broker) provides secure
interoperability between applications, using XML and web service
interfaces, e.g. J2EE, .NET
• Provides a message queue, similar to the operating system mailbox
feature
• Queue manager can provide QoS for fault tolerance, priority etc
20
Middleware Programming Options:
• Library of functions, e.g. PVM
• Via external Interface Definition Language (IDL), e.g. rpcgen IDL
defines interface to remore components + mapping from IDL to
programming language
• Language with runtime system for distributed processing provided,
e.g. JavaRMI
21
Distributed Computing Middleware Technologies
Low level Sockets IPC
• Connection-oriented (stream) TCP
• Connectionless (datagram) UDP
Shared Memory – Threads
• POSIX threads
Parallel Compilers
• Parallel C
• Fortran90, High Performance Fortran
Message passing – programming libraries
• PVM
• MPI
22
(cotd) Distributed Computing Technologies
Object Oriented Middleware
• CORBA (OMG)
• Java RMI
• COM and DCOM (Microsoft)
.NET
• Microsoft framework/environment for building web services
Vmware
• Run multiple virtual computers from a single PC
Grid Middleware (toolkits)
• Globus, Condor, gLite (used at CERN LHC)
23
Key Commercial IT Middleware
• XML – W3C open standard document encoding
• SOAP –Simple Object Access Protocol, for
implementing web services
• Web Services – interoperable communications
mechanism between networked machines
• Service Oriented Architecture – services to provide
communications, e.g. DCOM, CORBA
24
Client Server Model
•
•
•
•
Subset of message passing
2 entities: Client, Server
basic communication by peer-peer protocols
interaction based on request-reply (or invocation-result) dialogue
request
Client
Server
reply
example servers: file, database, SQL, web, mail, etc.
25
•
•
•
•
•
•
•
Client
Client Server - Entity Roles
starts communication
must have server destination ready (server starts up first)
makes requests of servers
never use other clients
do not carry out requests, only issues them
(in practice a client may, for example, download code from a remote server and
run it on a local server)
request
Client
Server
reply
•
•
•
•
•
•
•
•
Server
holds information
provides a service
starts before client
wait for client request
carry out request
return result (reply) to client
never make use of clients
can make requests of other servers (delegation)
26
Comparison to Peer-Peer Model
•
•
•
•
one entity: Peer
any peer can request or carry out requests
more complex dialogue
who starts sending etc
request
Peer
Peer
reply
27
Client-Server Design:
Thin and Thick Clients
•
Partitioning of software between client and server
• Thin Client
•
•
client processing is small, bulk takes place at server
client hardware requirements are less, e.g. PC X-terminal running
Exceed
• Thick (or Fat) Client
•
•
client performs bulk of processing with server providing basic tasks,
e.g. disk storage and retrieval
e.g. PC connection to internet. This architecture provides a practical
solution but creates an ever increasing need for faster processors,
high capacity storage and complex operating systems (bloatware)
28
Client-Server Design:
Binding and the Name Server
•
•
•
•
•
•
locate a server (find address) = binding
address = process + machine on network
binding can be static (fixed server) or dynamic (location may
change). Note, dynamic binding implies location transparency,
e.g. if server fails another can be found and address of service
changed to it.
name server = fixed (static) server that gives address + status of
servers
name server can locate a server by broadcasting to all
name server often used to authenticate clients
29
Client-Server Design: Use of Name Server
Server
request
register
reply
Server address
Client
Name
Server
Request a server or
service
30
Example Client-Server Protocols
Need for Protocols
•
Allow heterogeneous platforms onto
internet
•
Allow flexibility in application
modifications
•
Hides lower level detail from upper
layers
•
Provides good abstraction for
constructing communications
software
•
TCP/IP Reference Model - Protocol
stack for the internet (usually called
IP, or more exactly IPv4 or IPv6)
Code
From
To
Semantics
REQ
C
S
Request
REP
S
C
Reply
ACK
one
other
Message has arrived
AYA
C
S
Are you alive
IAA
S
C
I am alive
TA
S
C
Server busy, try again
AU
S
C
Address unknown
31
General Server Design Techniques Iterative Server
• code contains a loop – per service invocation
• only one client request at a time
– may lead to bottleneck if server busy
• algorithm
loop
receive request
process request
return reply
note: may need poison request to kill (stop) server
32
General Server Design Techniques Concurrent Server
• code contains UNIX fork (to create duplicate or
clone process)
• clone (child) process services request, original (parent)
process awaits next request
• algorithm
loop
receive request
fork to create clone server
If clone Then
process request
return result
kill (clone) process
EndIf
• note: may need poison request to kill (stop) server
33
Statefull and Stateless server design
Statefull – server stores full client transaction details
• advantage: less data to send (e.g. customer number) in
repeated client requests, server can resend reply if previous is
lost
• disadvantage: more difficult to recover if server fails, risk of
loosing server data
• best for repeated client requests
Stateless – server stores no client data
• advantage: easier to recover from server fail since no data is
stored i.e. client is not relying on server storage
• disadvantage: more lengthy messages since full client data is
needed for each transaction
• best for one off client requests
34
RPC - Remote Procedure Call (Birrell and Nelson 1984)
•
•
•
•
•
•
Client makes call to process running on a remote machine using message
passing
Synchronous request-reply semantics implemented with message passing
Extended to remote object method invocation in Java RMI
Marshalling of parameters and return values via stubs
e.g. Sun RPC, Java RMI, NFS operates using an RPC via UDP
Usually have limited exception handling facility
Birrell, A. D.; Nelson, B. J. (1984). "Implementing remote procedure calls".
ACM Transactions on Computer Systems 2: 39.
35
Remote Procedure Calling Mechanism
36
RPC Call example
Message flow reverses
for server return to client
Client Program:
sum = server->Add(3,4);
Server Program:
int Add(int x,int, y) {}
Client Stub:
int Add(int x, int y)
{ Alloc message buffer;
Mark as “Add” call;
Add x,y to buffer;
Send message;
}
Server Stub:
Add_Stub(Message)
{ Remove x,y from buffer
r = Add(x, y);
}
RPC Runtime:
Send message to server;
RPC Runtime:
Receive message;
Dispatch,
call Add_Stub;
37
RPC Pseudocode
// client code
result = function(parameters)
// client stub
function(parameters)
{ address a=bind(“function”);
socket s=connect(a);
send(s,”function”);
send(s,parameters);
receive(s,result); //blocking
return result;
}
// server main loop
void rpc_server()
{ register(“function”,address);
while (true)
{ socket s=accept(); //blocking
receive(s,id);
if (id == “function”)
dispatch_function(s);
close(s);
}
}
// server stub
void dispatch_function(socket s)
{ receive(s,parameters);
result = function(parameters);
send(s,result);
}
Can use rpcgen tool to
automate RPC code
generation
38
Asynchronous Remote Procedure Call - Timing
39
Distributed Computing Paradigms
• Network programming – the client server model using TCP or
UDP, sockets and message passing
• Concurrent Programming – UNIX fork and threads. Parallel
programming using clusters of multicomputers and message
passing libraries, e.g. PVM, MPI
• Object Based Systems - Java RMI, CORBA etc. ref
Tanenbaum Ch.9
• Distributed File Systems - e,g. NFS, (ref. Tanenbaum Ch.10)
• Document Based Systems - e.g. Lotus Notes and WWW (ref
Tanenbaum Ch.11)
• Distributed Coordination Based Systems - e.g. Linda, TIB,
Java Jini, (ref Tanenbaum Ch.12)
40
Recovery and Fault Tolerance - Reading
• 1. Tannenbaum, Chapters 6 and 7
• 2. Coulouris, Chapter 15
• 3. Storey, N (1996) Safety Critical Computer Systems,
Prentice Hall, Chapter 6
• 4. Hennessy and Patterson (2003) Computer Architecture,
Morgan Kaufmann
Fault Tolerance
 Hardware, software and networks cannot be totally free from
failures
 Fault tolerance is a non-functional (QoS) requirement that requires
a system to continue to operate, even in the presence of faults
 Fault tolerance should be achieved with minimal involvement of
users or system administrators (who can be an inherent source of
failures themselves)
 Distributed systems can be more fault tolerant than centralized
(where a failure is often total), but with more processor hosts
generally the occurrence of individual faults is likely to be more
frequent
 Notion of a partial failure in a distributed system
 In distributed systems the replication and redundancy can be
hidden (by the provision of transparency)
42
Fault Tolerance in distributed systems is achieved by:
• Hardware redundancy, i.e. replicated
facilities to provide a high degree of
availability and fault tolerance
• Software recovery, e.g. by rollback to
recover systems back to a recent consistent
state upon detection of a fault
As (distributed) systems become more complex it
becomes more important to deal with errors locally
(good design principle)
43
Faults: attributes, consequences and strategies
Attributes
• Availability
• Reliability
• Safety
• Confidentiality
• Integrity
• Maintainability
44
Consequences
• Fault
• Error
• Failure
Strategies
• Fault prevention
• Fault tolerance
• Fault recovery
• Fault forcasting
Failure Models in Distributed Systems
Concepts: Client uses a collection of servers
+ Dependency of components (e.g. database depends on disk file system)
Failure Types in Server
· Crash – server halts, but was working ok until then, e.g. O.S.
failure
· Omission – server fails to receive or respond or reply, e.g.
server not listening or buffer overflow
· Timing – server response time is outside its specification,
client may give up
· Response – incorrect response or incorrect processing due to
control flow out of synchronization
· Arbitrary value (or Byzantine) – server behaving erratically,
for example providing arbitrary responses at arbitrary times.
Server output is inappropriate but it is not easy to determine this
to be incorrect. E.g. duplicated message due to buffering
problem. Alternatively there may be a malicious element
involved. [Byzantine: Turkish empire, 330-1423, dogged by
conspiracies, lies etc]
45
Reliable Client-Server Communication
Client-Server semantics works fine providing client and server do not
fail, i.e. continue to run.
In the case of process failure the following situations need to be dealt with:
Client unable to locate server
Client request to server is lost
Server crash after receiving client request
Server reply to client is lost
Client crash after sending server request
46
Client unable to locate server, e.g. server down, or server has changed
Solution
- Use an exception handler – but this is not always possible in the
programming language used
Client request to server is lost
Solution
- Use a timeout to await server reply, then re-send – but be careful about
idempotent operations
- If multiple requests appear to get lost assume ‘cannot locate server’ error
47
Server crash after receiving client request
Problem may be not being able to tell if request was carried out (e.g. client
requests print page, server may stop before or after printing, before
acknowledgement)
Options
- rebuild server and retry client request (assuming ‘at least once’ semantics
for request)
- give up and report request failure (assuming ‘at most once’ semantics)
what is usually required is exactly once samantics, but this difficult to
guarantee
Server reply to client is lost
Client can simply set timer and if no reply in time assume server down,
request lost or server crashed during processing request.
If possible make operations idempotent (repeatable)
48
Client crash after sending server request
Server unable to reply to client (orphan request)
Options and Issues:
- Extermination of orphan servers, but this can escalate creating grand
orphans. Also there may be files open etc
- Reincarnation. Time divided into epochs (large intervals). When client
restarts it broadcasts to all, and starts a new time epoch. Servers dealing
with client requests from a previous epoch can be terminated. Also
unreachable servers (e.g. in different network areas) may later reply, but will
refer to obsolete epoch numbers.
- Gentle reincarnation, as above but an attempt is made to contact the client
owner (e.g. who may be logged out) to take action
Expiration, server times out if client cannot be reached to return reply
49
Client Server Call Semantics
At least once
- request can be transmitted end executed more than once as the operations
are idempotent i.e. op(state) = op( op(state) )
At most once
- duplicate requests are ignored (filtered out) but server reply is re-sent,
without reapplying the request/operation
Maybe
- Implies no fault tolerance, no guarantee of reply from server is promised. If
server crashes, clients gets no help.
Exactly Once
- Is the ideal, but usually there is no way to arrange this.
50
Fault Tolerance in Distributed Systems
see ITU-T X.700:
Fault Management
System attributes:
· Availability – system always ready for use, or probability that system
is ready or available at a given time
· Reliability – property that a system can run without failure, for a given
time
· Safety – indicates the safety issues in the case the system fails
· Maintainability – refers to the ease of repair to a failed system
Failure in a distributed system = when a service cannot be fully provided
System failure may be partial
A single failure may affect other parts of a system (failure escalation)
51
Faults, Errors and Failures
Fault
Error
causes
Fault – is a defect within
the system
Error – is observed by a
deviation from the expected
behaviour of the system
Failure occurs when the
system can no longer
perform as required (does
not meet spec)
Fault Tolerance – is ability
of system to provide a
service, even in the
presence of errors
52
Failure
results in
Types of Fault
Hard or Permanent – repeatable error, e.g. failed
component, power fail, fire, flood, design error
(usually software), sabotage
Soft Fault
Transient – occurs once or seldom, often due to
unstable environment (e.g. bird flies past microwave
transmitter)
Intermittent – occurs randomly, but where factors
influencing fault are not clearly identified, e.g.
unstable component
Operator error – human error
Hardware Fault Models
Example from Digital Logic Design
 Stuck At fault, e.g. using AND(.) and OR(+) gates
X=A.B
with A=0 X=0
(since 0.B=0)
A=1 X=B
(since 1.B=B)
X=A+B
with A=0 X=B
(since 0+B=B)
A=1 X=1
(since 1+B=1)
 Bridging fault – two inputs become joined or short-circuited (possibly model
using ‘wired OR’ or ‘wired AND’ function)
 Stuck Open – output becomes fixed, and independent of the inputs
e.g X=A.B represented as X=1
 Observability – ability to see effect of a fault (system outputs are sufficient to
determine system state)
 Controllability – ability to control system state via the system inputs
Testability theory from
digital systems design
53
Hardware Failure Terminology
MTBF
MTTD
fault
occurs



error
noticed
MTTR
error
diagnosed
time
repair
fault
occurs
error
noticed
MTTD – mean time to detection
MTTR – mean time to repair
MTBF – mean time between faults, a measure of system reliability
e.g. component MTBF=5days and MTTR=4 hours
what is component availability?
Ans: prob of component non-availability
4
4


 0.032
5  24  4 124
so availability=1-0.032=0.968 (96.8%)
54
Software Faults
· Program code (may) contains ‘bugs’ if actual behaviour disagrees with
the intended specification. These faults may occur from:
o specification error
o design error
o coding error, e.g. use on un-initialized variables
o integration error
o run time error e.g. operating system stack overflow, divide by zero
Software failure is (usually) deterministic, i.e. predictable, based on the
state of the system. There is no random element to the failure – unless
the system state cannot be specified precisely. A non-deterministic fault
behaviour usually indicates that the relevant system state parameters
have not been identified.
Fault coverage – defines the fraction of possible faults that can be
detected by testing (statement, condition or structural analysis)
55
Distributed, Parallel and Concurrency Concepts
Distributed
Physically separate autonomous processors that interact and collaborate
Parallel
Processing occurring on more than one processor within the same time
frame
Concurrent
Processing occurring on more than one processor that is synchronized in
real-time
Parallel Computing Concepts
•
Multiple parallel or concurrent processes that are running and
cooperating at the same time (spinning plates analogy)
•
Concept of Speedup:
SpeedupP  
Time with 1 processor
P

Time with P processor 1  P - 1 f
•
This is Amdahl’s law and is a function of the number of processors and
the fraction of code that must run sequentially
•
Deal with communication and synchronization in lecture 2.
With 1 processor
f sT
1 - f s T
time for job = T
serial
part
proc 1
proc 2
f sT
parallel
part
1 - f s  T
P
1 - f s  T
P
With P processors
...
proc P
1 - f s  T
P
time for job = f sT  1 - f s 
T
P
Deriving the Amdahl Speedup formulae
T
T1
PT1
P
P
S p P   1 



T p fT1  1 - f T1 P fPT1  1 - f T1 fP  1 - f  1  P - 1 f
Speedup:
In limit:
E p P  
Efficiency:
f=0.01
S p P 
P

1
f
f=0.20
message passing speedup
message passing speedup
message passing speedup
5
14
12
20
lim P  
S p P  
1
1  P - 1 f
f=0.05
25
S p P 
4
15
10
5
8
Speedup
Speedup
Speedup
10
Series1
6
4
1
3
5
7
0
9 11 13 15 17 19 21 23 25 27 29
P
Series1
Series1
2
1
2
0
3
0
1
3
5
7
9 11 13 15 17 19 21 23 25 27 291
P
3
5
7
9 11 13 15 17 19 21 23 25 27 29
P
Example problem
•
•
•
98% of the code for a problem can run in parallel. If the time taken with 1
processor is 12 seconds, calculate:
a) The time required, the speedup and efficiency where 20 and 80 processors
are used, respectively.
a) Solution:
f  0.02 T1  12
P  20
T
P
T p  fT1  1 - f T1 P  S p P   1 
T p 1  P - 1 f
•
T20  fT1  1 - f T1 P  0.828
T
12
S p 20   1 
 14.5
T p .828
T80  fT1  1 - f T1 P  0.381
T
12
S p 80   1 
 31
T p 0.381
E p 20  
E p 80  
S p 20 
P
 0.725
S p 80 
P
 0.39
b)
Determine the minimum number of processors required in order to
achieve an efficiency of 0.5
Example problem ..cotd
•
•
98% of the code for a problem can run in parallel. If the time taken with 1
processor is 12 seconds, calculate:
b)
Determine the minimum number of processors required in order to
achieve an efficiency of 0.5
•
b) solution:
•
E p P  
1
 0.5 with
1  P - 1 f
f  0.02
has solution P  51
Distributed System Design – for message passing
program design
•
•
•
•
•
•
•
•
Start with sequential program
Determine dependencies the identify code that can execute concurrently
• Need to understand algorithm
• Exploit inherent parallelism
• My require some algorithm restructuring
Decompose problem using control (functional) or data parallelism
Consider available machine architecture
Choose programming paradigm
Determine communication and add message passing code
Compile and test
Optimise performance
• Measure performance
• Locate bottlenecks
• Minimise message passing
• Load balance
62