Networks of Workstations
Prabhaker Mateti
Wright State University
Overview
• Parallel computers
• Concurrent computation
• Parallel Methods
• Message Passing
• Distributed Shared Memory
• Programming Tools
• Cluster configurations
Granularity of Parallelism
• Fine-Grained Parallelism
• Medium-Grained Parallelism
• Coarse-Grained Parallelism
• NOWs (Networks of Workstations)
Fine-Grained Machines
• Tens of thousands of Processors
• Processors
– Slow (bit serial)
– Small (K bits of RAM)
– Distributed Memory
• Interconnection Networks
– Message Passing
• Single Instruction Multiple Data (SIMD)
Sample Meshes
• Massively Parallel Processor (MPP)
• TMC CM-2 (Connection Machine)
• MasPar MP-1/2
Medium-Grained Machines
• Typical Configurations
– Thousands of processors
– Processors have power between coarse- and fine-grained
• Either shared or distributed memory
• Traditionally: Research Machines
• Single Code Multiple Data (SCMD)
Medium-Grained Machines
• Ex: Cray T3E
• Processors
– DEC Alpha EV5 (600 MFLOPS peak)
– Max of 2048
• Peak Performance: 1.2 TFLOPS
• 3-D Torus
• Memory: 64 MB - 2 GB per CPU
Coarse-Grained Machines
• Typical Configurations
– Hundreds of Processors
• Processors
– Powerful (fast CPUs)
– Large (cache, vectors, multiple fast buses)
• Memory: Shared or Distributed-Shared
• Multiple Instruction Multiple Data (MIMD)
Coarse-Grained Machines
• SGI Origin 2000:
– PEs (MIPS R10000): Max of 128
– Peak Performance: 49 Gflops
– Memory: 256 GBytes
– Crossbar switches for interconnect
• HP/Convex Exemplar:
– PEs (HP PA-RISC 8000): Max of 64
– Peak Performance: 46 Gflops
– Memory: Max of 64 GBytes
– Distributed crossbar switches for interconnect
Networks of Workstations
• Exploit inexpensive Workstations/PCs
• Commodity network
• The NOW becomes a “distributed memory multiprocessor”
• Workstations send+receive messages
• C and Fortran programs with PVM, MPI, etc. libraries
• Programs developed on NOWs are portable to supercomputers for production runs
“Parallel” Computing
• Concurrent Computing
• Distributed Computing
• Networked Computing
• Parallel Computing
Definition of “Parallel”
• S1 begins at time b1, ends at e1
• S2 begins at time b2, ends at e2
• S1 || S2
– Begins at min(b1, b2)
– Ends at max(e1, e2)
– Equiv to S2 || S1
Data Dependency
• x := a + b; y := c + d;
• x := a + b || y := c + d;
• y := c + d; x := a + b;
• x depends on a and b; y depends on c and d
• Assumes a, b, c, and d are independent
Types of Parallelism
• Result
• Specialist
• Agenda
Perfect Parallelism
• Also called
– Embarrassingly Parallel
– Result parallel
• Computations that can be subdivided into sets of independent tasks that require little or no communication
– Monte Carlo simulations
– F(x, y, z)
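A minimal sketch of result parallelism, assuming a Monte Carlo estimate of pi in C (the function name, task count, and sample count are illustrative): each task draws its own random points and needs no communication until its count is combined at the end.

/* Sketch: an embarrassingly parallel Monte Carlo estimate of pi.
   Each task uses its own seed and sample count; the tasks are fully
   independent and could each run on a separate workstation. */
#include <stdio.h>
#include <stdlib.h>

long count_hits(unsigned int seed, long samples)   /* one independent task */
{
    long hits = 0;
    for (long i = 0; i < samples; i++) {
        double x = (double)rand_r(&seed) / RAND_MAX;
        double y = (double)rand_r(&seed) / RAND_MAX;
        if (x * x + y * y <= 1.0)
            hits++;
    }
    return hits;
}

int main(void)
{
    const int tasks = 4;             /* e.g., one per workstation */
    const long samples = 1000000L;   /* per task */
    long total = 0;
    for (int t = 0; t < tasks; t++)  /* serial loop here; each call is independent */
        total += count_hits((unsigned int)(t + 1), samples);
    printf("pi ~= %f\n", 4.0 * total / ((double)tasks * samples));
    return 0;
}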
MW Model
• Manager
– Initiates computation
– Tracks progress
– Handles workers’ requests
– Interfaces with user
• Workers
– Spawned and terminated by manager
– Make requests to manager
– Send results to manager
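A minimal manager/worker sketch, assuming MPI (the slides mention both PVM and MPI; the task and result values are placeholders): rank 0 acts as the manager, handing one integer task to each worker and collecting one result back.

/* Manager/worker sketch with MPI. Rank 0 is the manager. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {                       /* manager */
        for (int w = 1; w < size; w++) {
            int task = w * 10;             /* hand out work */
            MPI_Send(&task, 1, MPI_INT, w, 0, MPI_COMM_WORLD);
        }
        for (int w = 1; w < size; w++) {
            int result;                    /* collect results */
            MPI_Recv(&result, 1, MPI_INT, w, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("result from worker %d: %d\n", w, result);
        }
    } else {                               /* worker */
        int task, result;
        MPI_Recv(&task, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        result = task * task;              /* stand-in for real work */
        MPI_Send(&result, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}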
Reduction
• Combine several sub-results into one
• Reduce r1 r2 … rn with op
• Becomes r1 op r2 op … op rn
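A small sketch of reduction, assuming MPI: each rank contributes a sub-result r_i, and MPI_Reduce combines them with the given operator (here MPI_SUM) at rank 0.

/* Reduction sketch: r1 op r2 op ... op rn, with op = sum. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, total;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    int sub_result = rank + 1;                     /* this process's r_i */
    MPI_Reduce(&sub_result, &total, 1, MPI_INT,
               MPI_SUM, 0, MPI_COMM_WORLD);        /* combine across all ranks */
    if (rank == 0)
        printf("reduced total = %d\n", total);
    MPI_Finalize();
    return 0;
}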
Data Parallelism
• Also called
– Domain Decomposition
– Specialist
• Same operations performed on many data elements simultaneously
– Matrix operations
– Compiling several files
Control Parallelism
• Different operations performed simultaneously on different processors
• E.g., simulating a chemical plant: one processor simulates the preprocessing of chemicals, one simulates the reactions in the first batch, another simulates refining the products, etc.
Process communication
• Shared Memory
• Message Passing
Shared Memory
• Process A writes to a memory location
• Process B reads from that memory location
• Synchronization is crucial
• Excellent speed
Shared Memory
• Needs hardware support:
– multi-ported memory
• Atomic operations:
– Test-and-Set
– Semaphores
Shared Memory Semantics: Assumptions
• Global time is available, in discrete increments.
• Shared variable s has value vi at time ti, i = 0, 1, …
• Process A: s := v1 at time t1
• Assume no other assignment occurred after t1.
• Process B reads s at time t and gets value v.
Shared Memory: Semantics
• Value of Shared Variable
– v = v1, if t > t1
– v = v0, if t < t1
– v = ??, if t = t1
• t = t1 ± one discrete quantum
• Next Update of Shared Variable
– Occurs at t2
– t2 = t1 + ?
Condition Variables and Semaphores
• Semaphores
– V(s) ::= < s := s + 1 >
– P(s) ::= < when s > 0 do s := s - 1 >
• Condition variables
– C.wait()
– C.signal()
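As a concrete counterpart to P and V, a minimal sketch using POSIX semaphores (sem_wait behaves like P, sem_post like V); the writer/reader roles are illustrative.

/* P/V with POSIX semaphores: the reader blocks until the writer signals. */
#include <semaphore.h>
#include <pthread.h>
#include <stdio.h>

sem_t s;
int shared = 0;

void *writer(void *arg)
{
    shared = 42;     /* write the shared location */
    sem_post(&s);    /* V(s): signal that the value is ready */
    return NULL;
}

int main(void)
{
    pthread_t t;
    sem_init(&s, 0, 0);          /* initial value 0: reader must wait */
    pthread_create(&t, NULL, writer, NULL);
    sem_wait(&s);                /* P(s): block until the writer has posted */
    printf("read %d\n", shared); /* guaranteed to see 42 */
    pthread_join(t, NULL);
    sem_destroy(&s);
    return 0;
}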
Distributed Shared Memory
• A common address space that all the computers in the cluster share.
• Difficult to describe semantics.
Distributed Shared Memory: Issues
• Distributed
– Spatially
– LAN
– WAN
• No global time available
Messages
• Messages are sequences of bytes moving between processes
• The sender and receiver must agree on the type structure of values in the message
• “Marshalling” of data
Message Passing
• Process A sends a data buffer as a message to process B.
• Process B waits for a message from A, and when it arrives copies it into its own local memory.
• No memory shared between A and B.
Message Passing
• Obviously,
– Messages cannot be received before they are sent.
– A receiver waits until there is a message.
• Asynchronous
– Sender never blocks, even if infinitely many messages are waiting to be received
– Semi-asynchronous is a practical version of the above, with a large but finite amount of buffering
Message Passing: Point to Point
• Q: send(m, P)
– Send message m to process P
• P: recv(x, Q)
– Receive a message from process Q, and place it in variable x
• The message data
– Type of x must match that of m
– As if x := m
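A minimal sketch of the send(m, P) / recv(x, Q) pair, assuming MPI point-to-point calls: rank 1 plays Q and sends m, rank 0 plays P and receives it into x.

/* Point-to-point message passing with MPI. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 1) {
        double m = 3.14;
        MPI_Send(&m, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);   /* send(m, P) */
    } else if (rank == 0) {
        double x;
        MPI_Recv(&x, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);                          /* recv(x, Q) */
        printf("x = %f\n", x);                                /* as if x := m */
    }
    MPI_Finalize();
    return 0;
}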
Broadcast
• One sender, multiple receivers
• Not all receivers may receive at the same time
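A minimal broadcast sketch, assuming MPI: the root supplies the value and every rank, root included, leaves MPI_Bcast holding it, though not necessarily at the same instant.

/* One sender, multiple receivers: MPI_Bcast. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, value = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0)
        value = 99;                                   /* sender's data */
    MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD); /* every rank calls it */
    printf("rank %d now has %d\n", rank, value);      /* ranks may finish at different times */
    MPI_Finalize();
    return 0;
}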
Types of Sends
• Synchronous
• Asynchronous
Synchronous Message Passing
• Sender blocks until receiver is ready to receive.
• Cannot send messages to self.
• No buffering.
Message Passing: Speed
• Speed not so good
– Sender copies message into system buffers.
– Message travels the network.
– Receiver copies message from system buffers into local memory.
– Special virtual memory techniques help.
Message Passing: Programming
• Less error-prone than shared memory
Message Passing: Synchronization
• Synchronous MP:
– Sender waits until receiver is ready.
– No intermediary buffering
Barrier Synchronization
• Processes wait until “all” arrive
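A minimal sketch, assuming MPI: no process returns from MPI_Barrier until every process in the communicator has reached it.

/* Barrier synchronization: all processes wait until "all" arrive. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("rank %d: before barrier\n", rank);  /* may interleave freely */
    MPI_Barrier(MPI_COMM_WORLD);                /* wait for every process */
    printf("rank %d: after barrier\n", rank);   /* printed only once all have arrived */
    MPI_Finalize();
    return 0;
}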
Parallel Software Development
• Algorithmic conversion by compilers
Development of Distributed+Parallel Programs
• New code + algorithms
• Old programs rewritten
– in new languages that have distributed and parallel primitives
– with new libraries
• Parallelize legacy code
Conversion of Legacy Software
• Mechanical conversion by software tools
• Reverse engineer its design, and re-code
Automatically Parallelizing Compilers
• Compilers analyze programs and parallelize (usually loops).
• Easy to use, but with limited success
OpenMP on Networks of Workstations
• OpenMP is an API for shared-memory architectures.
• User gives hints as directives to the compiler
• http://www.openmp.org
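A minimal sketch of an OpenMP directive in C (the array size and the reduction are illustrative): the pragma is only a hint, and a compiler without OpenMP support simply runs the loop serially.

/* OpenMP directive as a compiler hint; compile with, e.g., gcc -fopenmp. */
#include <stdio.h>

#define N 1000000

int main(void)
{
    static double a[N], b[N];
    double sum = 0.0;

    /* Ask the compiler to split the iterations among threads and
       combine the per-thread partial sums. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++) {
        a[i] = 2.0 * i;
        b[i] = a[i] + 1.0;
        sum += b[i];
    }
    printf("sum = %f\n", sum);
    return 0;
}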
Message Passing Libraries
• Programmer is responsible for data distribution, synchronization, and sending and receiving information
• Parallel Virtual Machine (PVM)
• Message Passing Interface (MPI)
• BSP
BSP: Bulk Synchronous Parallel model
• Divides computation into supersteps
• In each superstep a processor can work on local data and send messages.
• At the end of the superstep, a barrier synchronization takes place and all processors receive the messages which were sent in the previous superstep
• http://www.bsp-worldwide.org/
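A sketch in the style of BSPlib (an assumption: the Oxford BSPlib interface with bsp_begin, bsp_put, and bsp_sync): each process does local work in one superstep, its puts are delivered at the bsp_sync barrier, and the data is used in the next superstep.

/* BSP superstep sketch, assuming the BSPlib API. */
#include <bsp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    bsp_begin(bsp_nprocs());              /* start the SPMD section */
    {
        int p = bsp_pid();
        int n = bsp_nprocs();
        int local = p * p;                /* superstep: purely local work */
        static int results[256];          /* assumes at most 256 processes */

        bsp_push_reg(results, sizeof(results));
        bsp_sync();                       /* registration takes effect */

        bsp_put(0, &local, results, p * (int)sizeof(int), sizeof(int));
        bsp_sync();                       /* superstep ends; puts are delivered */

        if (p == 0) {
            for (int i = 0; i < n; i++)   /* next superstep: use received data */
                printf("from %d: %d\n", i, results[i]);
        }
    }
    bsp_end();
    return 0;
}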
BSP Library
• Small number of subroutines to implement
– process creation,
– remote data access, and
– bulk synchronization.
• Linked to C, Fortran, … programs
Parallel Languages
• Shared-memory languages
• Parallel object-oriented languages
• Parallel functional languages
• Concurrent logic languages
Tuple Space: Linda
• <v1, v2, …, vk>
• Atomic Primitives
– In (t)
– Read (t)
– Out (t)
– Eval (t)
• Host language: e.g., JavaSpaces
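An illustrative fragment in the style of C-Linda (an assumption; the ?var formals are a language extension, not plain C), showing the slide's primitives: eval spawns work whose result becomes a tuple, in withdraws matching tuples, out deposits one, and rd would read without removing.

/* C-Linda-style sketch of tuple-space operations. */
int square(int x) { return x * x; }

void master(void)
{
    int i, v, sum = 0;
    for (i = 0; i < 10; i++)
        eval("square", i, square(i));   /* Eval: live tuple; turns into data when done */
    for (i = 0; i < 10; i++) {
        in("square", ?i, ?v);           /* In: withdraw a matching tuple, binding i and v */
        sum += v;
    }
    out("sum", sum);                    /* Out: deposit the result tuple */
    /* rd("sum", ?v) would Read it without removing it from the space */
}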
Data Parallel Languages
• Data is distributed over the processors as arrays
• Entire arrays are manipulated:
– A(1:100) = B(1:100) + C(1:100)
• Compiler generates parallel code
– Fortran 90
– High Performance Fortran (HPF)
Parallel Functional
Languages
• Erlang http://www.erlang.org/
• SISAL http://www.llnl.gov/sisal/
• PCN Argonne
Clusters
Buildings-Full of Workstations
1. Distributed OSes have not taken a foothold.
2. Powerful personal computers are ubiquitous.
3. Mostly idle: more than 90% of the uptime?
4. 100 Mb/s LANs are common.
5. Windows and Linux are the top two OSes in terms of installed base.
Cluster Configurations
• NOW -- Networks of Workstations
• COW -- Clusters of Dedicated Nodes
• Clusters of Come-and-Go Nodes
• Beowulf clusters
Beowulf
• Collection of compute nodes
• Full trust in each other
– Login from one node into another without authentication
– Shared file system subtree
• Dedicated
Close Cluster Configuration
[Diagram: four compute nodes and a file server node connected by a high-speed network; a service network links them through a gateway node to the front-end, which faces the external network.]
Open Cluster Configuration
[Diagram: four compute nodes and a file server node connected by a high-speed network; the nodes and the front-end are directly reachable from the external network.]
Interconnection Network
• Most popular: Fast Ethernet
• Network topologies
– Mesh
– Torus
• Switch vs. Hub
Software Components
• Operating System
– Linux, FreeBSD, …
• Parallel programming
– PVM, MPI
• Utilities, …
• Open source
Software Structure of PC Cluster
[Diagram: on every node, a parallel program runs on the parallel virtual machine layer, which sits on the OS layer and the hardware layer; the nodes are joined by a high-speed network.]
Single System View
• Single system view
– Common filesystem structure view from any node
– Common accounts on all nodes
– Single software installation point
• Benefits
– Easy to install and maintain system
– Easy to use for users
Installation Steps
• Install Operating system
• Setup a Single System View
– Shared filesystem
– Common accounts
– Single software installation point
• Install parallel programming packages such as MPI, PVM, BSP
• Install utilities, libraries, and applications
Linux Installation
• Linux has many distributions: Red Hat, Caldera, SuSE, Debian, …
• Caldera is easy to install
• All of the above except Debian upgrade with RPM package management
• Mandrake and SuSE come with a very complete set of software
Clusters with Part-Time Nodes
• Cycle Stealing: Running jobs on a workstation that does not belong to the jobs’ owner
• Definition of Idleness: E.g., no keyboard and no mouse activity
• Tools/Libraries
– Condor
– PVM
– MPI
Migration of Jobs
• Policies
– Immediate-Eviction
– Pause-and-Migrate
• Technical Issues
– Checkpointing: Preserving the state of the process so it can be resumed.
– Migrating from one architecture to another
Summary
• Parallel
– computers
– computation
• Parallel Methods
• Communication primitives
– Message Passing
– Distributed Shared Memory
• Programming Tools
• Cluster configurations