CSS434: Parallel & Distributed Computing

Download Report

Transcript CSS434: Parallel & Distributed Computing

CSS434 System Models
Textbook Ch2
Professor: Munehiro Fukuda
CSS434 System Models
1
Outline





Parallel versus distributed systems
Service layers
Platform models
Middleware models
Reasons for distributed systems
CSS434 System Models
2
Parallel v.s. Distributed Systems
Parallel Systems
Distributed Systems
Memory
Tightly coupled shared memory
UMA, NUMA
Distributed memory
Message passing, RPC, and/or used
of distributed shared memory
Control
Global clock control
SIMD, MIMD
No global clock control
Synchronization algorithms needed
Processor
interconnection
Order of Tbps
Bus, mesh, tree, mesh of tree, and
hypercube (-related) network
Order of Gbps
Ethernet(bus), token ring and SCI
(ring), myrinet(switching network)
Main focus
Performance
Scientific computing
Performance(cost and scalability)
Reliability/availability
Information/resource sharing
CSS434 System Models
3
Service Layers in Distributed
Systems
Applications, services
Middlew are
Operating s ys tem
Platform
Computer and netw ork hardw are
CSS434 System Models
4
Distributed Computing
Environment
DCE Applications
Threads
RPC
Distributed Time Service
Security Distributed File Service
Name
Platforms
CSS434 System Models
5
Platform Milestones in
Distributed Systems
1945-1950s
Loading monitor
1950s-1960s
Batch system
1960s
Multiprogramming
1960s-1970s
Time sharing systems
Multics, IBM360
1969-1973
WAN and LAN
ARPAnet, Ethernet
1960s-early1980s
Minicomputers
PDP, VAX
Early 1980s
Workstations
Alto
1980s – present
Workstation/Server models
Sprite, V-system
1990s
Clusters
Beowulf
Late 1990s
Grid computing
Globus, Legion
CSS434 System Models
6
Platforms






Minicomputer model
Workstation model
Workstation-server model
Processor-pool model
Cluster model
Grid computing
CSS434 System Models
7
Minicomputer Model
Minicomputer
Minicomputer


ARPA
net
Minicomputer
Extension of Time sharing system
 User must log on his/her home minicomputer.
 Thereafter, he/she can log on a remote machine by telnet.
Resource sharing
 Database
 High-performance devices
CSS434 System Models
8
Workstation Model
Workstation
Workstation
100Mbps
LAN
Workstation


Workstation
Workstation
Process migration
 Users first log on his/her personal workstation.
 If there are idle remote workstations, a heavy job may
migrate to one of them.
Problems:
 How to find am idle workstation
 How to migrate a job
 What if a user log on the remote machine
CSS434 System Models
9
Workstation-Server Model
Client workstations

Diskless
Workstation

Graphic/interactive applications processed in local

All file, print, http and even cycle computation
Workstation
Workstation
requests are sent to servers.

Server minicomputers

Each minicomputer is dedicated to one or more
100Gbps
different types of services.
LAN

Client-Server model of communication

RPC (Remote Procedure Call)

RMI (Remote Method Invocation)
MiniMiniMini A Client process calls a server process’
Computer Computer Computer
function.
file server http server cycle server
 No process migration invoked
 Example: NFS

CSS434 System Models
10
Processor-Pool Model

100Mbps
LAN
Server 1

Server N

Clients:
 They log in one of terminals
(diskless workstations or X
terminals)
 All services are dispatched to
servers.
Servers:
 Necessary number of processors
are allocated to each user from
the pool.
Better utilization but less interactivity
CSS434 System Models
11
Cluster Model
Workstation
Workstation
Client
 Takes a client-server
Workstation
model

Server
100Mbps
 Consists of many
LAN
PC/workstations
http server2
connected to a highhttp server N
http server1
speed network.
Slave
Master Slave Slave
 Puts more focus on
N
node
1
2
performance: serves for
requests in parallel.

1Gbps SAN
CSS434 System Models
12
Grid Computing

Workstation
Supercomputer

Cluster


Supercomputer
Cluster


Workstation
Collect computing power of
supercomputers and clusters sparsely
located over the nation and make it
available as if it were the electric grid
Distributed Supercomputing

Very large problems needing lots of CPU,
memory, etc.
High-Throughput Computing

Harnessing many idle resources
On-Demand Computing

Remote resources integrated with local
computation
Data-intensive Computing

Using distributed data
Collaborative Computing

Minicomputer
High-speed
Information high way
Goal
Workstation

Support communication among multiple parties
CSS434 System Models
13
Middleware Models
Middleware Models
Platforms
Client-server model
Workstation-server model
Services provided by multiple
servers
Cluster model
Proxy servers and caches
ISP server
Cluster model
Peer processes
Workstation model
Mobile code and agents
Workstation model
Workstation-server model
Thin clients
Processor-pool model
Cluster model
CSS434 System Models
14
Client-Server Model
Client
invocation
res ult
HTTP server
invocation
Server
File server
DNS server
Server
res ult
Client
Key:
Process :
Computer:
Workstation
Workstation
Workstation
100Gbps
LAN
MiniMiniMiniComputer
Computer
Computer
file server http server cycle server
CSS434 System Models
15
Services Provided by Multiple
Servers
Service
Server
Replication
• Availability
• Performance
Client
Server
Workstation
Workstation
Client
Server
Workstation
100Gbps
LAN
Slave
N
MasterSlave Slave
node 1
2
Ex. altavista.digital.com DB server
CSS434 System Models
1Gbps SAN
16
Proxy Servers and Caches
Web
server
Client
Proxy
server
Client
Web
server
Ex. Internet Service Provider
Workstation
Workstation
Workstation
100Gbps
LAN
MasterSlave Slave
node 1
2
Slave
N
1Gbps SAN
CSS434 System Models
17
Peer Processes
Application
Application
Coordination
code
Coordination
code
Application
Distributed whiteboard application
Coordination
code
Workstation
Workstation
Workstation
CSS434 System Models
100Gbps
LAN
Workstation
Workstation
18
Mobile Code and Agents
a) client request res ults in the dow nloading of applet c ode
Client
Applet code
Web
s erv er
b) client interac ts w ith the applet
Client
Applet
Web
s erv er
Workstation
Workstation
Workstation
100Gbps
LAN
MiniMiniMiniComputer
Computer
Computer
file server http server cycle server
CSS434 System Models
19
Network Computers and Thin
Clients
Network computer or PC
Compute server
X11
Diskless workstations
Application
Process
network
Thin
Client
Workstation
Workstation
Workstation
100Gbps
LAN
100Gbps
LAN
Server 1
MasterSlave Slave
node 1
2
Slave
N
Server N
1Gbps SAN
CSS434 System Models
20
Reasons for Distributed
Computing Systems

Inherently distributed applications


Information sharing among distributed users




Emergence of Gbit network and high-speed/cheap MPUs

Effective for coarse-grained or embarrassingly parallel applications
Reliability
Non-stopping (availability) and voting features.
Scalability


Sharing DB/expensive hardware and controlling remote lab. devices
Better cost-performance ratio / Performance


CSCW or groupware
Resource sharing


Distributed DB, worldwide airline reservation, banking system
Loosely coupled connection and hot plug-in
Flexibility

Reconfigure the system to meet users’ requirements
CSS434 System Models
21
Network v.s. Distributed
Operating Systems
Features
Network OS
Distributed OS
SSI
(Single System Image)
NO
Ssh, sftp, no view of remote
memory
YES
Process migration, NFS,
DSM (Distr. Shared memory)
Autonomy
High
Local OS at each computer
No global job coordination
Low
A single system-wide OS
Global job coordination
Fault Tolerance
Unavailability grows as faulty
machines increase.
Unavailability remains little
even if fault machines
increase.
CSS434 System Models
22
Issues in Distributed Computing System
Transparency (=SSI)





Access transparency
 Memory access: DSM
 Function call: RPC and RMI
Location transparency
 File naming: NFS
 Domain naming: DNS (Still location concerned.)
Migration transparency
 Automatic state capturing and migration
Concurrency transparency (See the next page)
 Event ordering: Message delivery and memory consistency
Other transparency:
 Failure, Replication, Performance, and Scaling
CSS434 System Models
23
Issues in Distributed Computing System
Event Ordering
s end
X
receiv e
1
m1
2
Y
receiv e
4
m2
s end
3
receiv e
Phy sical
time
receiv e
s end
Z
receiv e
receiv e
m3
A
t1
m1
m2
receiv e receiv e receiv e
t3
t2
CSS434 System Models
24
Issues in Distributed Computing System
Reliability




Faults
 Omission failure (See the next page.)
 Byzantine failure
Fault avoidance
 The more machines involved, the less avoidance capability
Fault tolerance
 Redundancy techniques
 K-fault tolerance needs K + 1 replicas
 K-Byzantine failures needs 2K + 1 replicas.
 Distributed control
 Avoiding a complete fail stop
Fault detection and recovery
 Atomic transaction
 Stateless servers
CSS434 System Models
25
Omission and Arbitrary Failure
Class of failure
Fail-stop
Affects
Process
Description
Process halts and remains halted. Other processes may
detect this state.
Crash
Process Process halts and remains halted. Other processes may
not be able to detect this state.
Omission
Channel A message inserted in an outgoing message buffer never
arrives at the other end’s incoming message buffer.
Send-omission Process A process completes a send, but the message is not put
in its outgoing message buffer.
Receive-omission Process A message is put in a process’s incoming message
buffer, but that process does not receive it.
Arbitrary
Process or Process/channel exhibits arbitrary behaviour: it may
(Byzantine)
channel send/transmit arbitrary messages at arbitrary times,
commit omissions; a process may stop or take an
incorrect step.
CSS434 System Models
26
Flexibility


User
applications
Monolithic
Kernel
(Unix)
Ease of modification
Ease of enhancement
User
applications
Monolithic
Kernel
(Unix)
User
applications
Monolithic
Kernel
(Unix)
User
applications
User
applications
User
applications
Daemons
(file, name,
Paging)
Daemons
(file, name,
Paging)
Daemons
(file, name,
Paging)
Microkernel
(Mach)
Microkernel
(Mach)
Microkernel
(Mach)
Network
Network
CSS434 System Models
27
Performance/Scalability
Unlike parallel systems, distributed systems involves OS
intervention and slow network medium for data transfer
 Send messages in a batch:


Cache data


Avoid OS intervention (= zero-copy messaging).
Avoid centralized entities and algorithms


Avoid repeating the same data transfer
Minimizing data copy


Avoid OS intervention for every message transfer.
Avoid network saturation.
Perform post operations on client sides

Avoid heavy traffic between clients and servers
CSS434 System Models
28
Heterogeneity



Data and instruction formats depend on each machine
architecture
If a system consists of K different machine types, we
need K–1 translation software.
If we have an architecture-independent standard
data/instruction formats, each different machine
prepares only such a standard translation software.

Java and Java virtual machine
CSS434 System Models
29
Security


Lack of a single point of control
Security concerns:





Messages may be stolen by an enemy.
Messages may be plagiarized by an enemy.
Messages may be changed by an enemy.
Services may be denied by an enemy.
Cryptography is the only known practical
mechanism.
CSS434 System Models
30
Exercises (No turn-in)
1.
2.
3.
4.
5.
6.
7.
In what respect are distributed computing systems superior
to parallel systems?
In what respect are parallel systems superior to distributed
computing systems?
Discuss the difference between the workstation-server and
the processor-pool model from the availability view point.
Discuss the difference between the processor-pool and the
cluster model from the performance view point.
What is Byzantine failure? Why do we need 2k+1 replica for
this type of failure?
Discuss about pros and cons of Microkernel.
Why can we avoid OS intervention by zero copy?
CSS434 System Models
31