
Technologies for
Cluster Computing
Oren Laadan
Columbia University
<[email protected]>
ECI, July 2005
Course Overview (contd)

• What is Cluster Computing?
  - Parallel computing, enabling technologies, definition of a cluster, taxonomy
• Middleware
  - SSI, operating system support, software support
  - Virtualization & Process Migration
• Resource sharing
  - Job assignment, load balancing, information dissemination
• Grids
Motivation
• Demanding Applications
  - Modeling and simulations (physics, weather, CAD, aerodynamics, finance, pharmaceutical)
  - Business and e-commerce (eBay, Oracle)
  - Internet (Google, eAnything)
  - Number crunching (encryption, data mining)
  - Entertainment (animation, simulators)
• CPUs are reaching physical limits
  - Dimensions
  - Heat dissipation
How to Run Applications Faster?

• 3 ways to improve performance:
  - Work Harder
  - Work Smarter
  - Get Help
• And in computers:
  - Using faster hardware
  - Optimized algorithms and techniques
  - Multiple computers to solve a particular task
Parallel Computing
• Hardware: Instructions or Data?
  - SISD – classic CPU
  - SIMD – vector computers
  - MISD – pipelined computers
  - MIMD – general-purpose parallelism
• Software adjustments
  - Parallel programming: multiple processes collaborating, with communication and synchronization between them (see the sketch below)
  - Operating systems, compilers, etc.
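To make the "multiple processes collaborating" idea above concrete, here is a minimal sketch in C using plain POSIX processes and a pipe. The range split, values, and helper name are illustrative assumptions, not anything taken from the course material.

/* Minimal sketch: two cooperating processes split a sum and
 * combine their results over a pipe (illustrative only). */
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

static long partial_sum(long from, long to) {
    long s = 0;
    for (long i = from; i <= to; i++)
        s += i;
    return s;
}

int main(void) {
    int fd[2];
    if (pipe(fd) != 0) { perror("pipe"); return 1; }

    pid_t pid = fork();
    if (pid == 0) {                      /* child: second half of the range */
        long s = partial_sum(500001, 1000000);
        write(fd[1], &s, sizeof s);      /* communication: send result to parent */
        _exit(0);
    }

    long mine = partial_sum(1, 500000);  /* parent: first half of the range */
    long theirs = 0;
    read(fd[0], &theirs, sizeof theirs); /* synchronization: blocks until child answers */
    waitpid(pid, NULL, 0);

    printf("total = %ld\n", mine + theirs);
    return 0;
}

On a cluster the same pattern is usually expressed with message passing between nodes rather than a pipe, but the collaboration and synchronization structure is the same.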
Parallel Computer Architectures

Taxonomy of MIMD:
• SMP - Symmetric Multi-Processing
• MPP - Massively Parallel Processors
• CC-NUMA - Cache-Coherent Non-Uniform Memory Access
• Distributed Systems
  - COTS – Commodity Off The Shelf
  - NOW – Network of Workstations
  - Clusters
Taxonomy of MIMD (contd)

• SMP
  - 2-64 processors today
  - Everything shared
  - Single copy of the OS
  - Scalability issues (hardware, software)
• MPP
  - Nothing shared
  - Several hundred nodes
  - Fast interconnection
  - Inferior cost/performance ratio
Taxonomy of MIMD (contd)
• CC-NUMA
  - Scalable multiprocessor system
  - Global view of memory at each node
• Distributed systems
  - Conventional networks of independent nodes
  - Multiple system images and OSs
  - Each node can be of any type (SMP, MPP, etc.)
  - Difficult to use and to extract performance from
Taxonomy of MIMD (contd)
• Clusters
  - Nodes connected with a high-speed network
  - Operate as an integrated collection of resources
  - Single system image
  - High-performance computing – commodity supercomputing
  - High-availability computing – mission-critical applications
Taxonomy of MIMD - summary
Enabling Technologies
• Performance of individual components
  - Microprocessor (×2 every 18 months)
  - Memory capacity (×4 every 3 years)
  - Storage (capacity: same!) – SAN, NAS
  - Network (scalable gigabit networks)
  - OS, programming environments
  - Applications
• Rate of performance improvement exceeds that of specialized systems
The “killer” workstation
• Traditional usage
  - Workstations with Unix for science & industry
  - PCs for administrative work & word processing
• Recent trend
  - Rapid convergence in processor performance and kernel-level functionality of PCs vs. workstations
  - Killer CPU, killer memory, killer network, killer OS, killer applications…
Computer Food Chain
Towards Commodity HPC
• Link together multiple computers to jointly solve a computational problem
• Ubiquitous availability of commodity high-performance components
• Out: expensive, specialized, proprietary parallel computers
• In: cheaper clusters of loosely coupled workstations
History of Cluster Computing
[Timeline figure: history of cluster computing — 1960, 1980s, 1990, 1995+, 2000+]
Why PC/WS Clustering Now?

• Individual PCs/workstations have become increasingly powerful
• The development cycle of supercomputers is too long
• Commodity network bandwidth is increasing and latency is decreasing
• Easier to integrate into existing networks
• Typical user utilization of PCs/WSs is low (< 10%)
• Development tools for PCs/WSs are more mature
• PC/WS clusters are cheap and readily available
• Clusters can leverage future technologies and be easily grown
What is a Cluster?

• Cluster – a parallel or distributed processing system consisting of a collection of interconnected stand-alone computers cooperatively working together as a single, integrated computing resource.
• Each node in the cluster is:
  - A UP/MP system with memory, I/O facilities, and an OS
  - Connected via a fast interconnect or a LAN
  - Presented to users and applications as part of a single system
Cluster Architecture
[Architecture diagram: sequential and parallel applications run on top of a parallel programming environment and cluster middleware (single system image and availability infrastructure); the middleware spans multiple PC/workstation nodes, each running communications software over its network interface hardware, all joined by the cluster interconnection network/switch.]
A Winning Symbiosis
• Parallel processing
  - Create MPP- or DSM-like parallel processing systems
• Network RAM
  - Use cluster-wide available memory to aggregate a substantial cache in RAM
• Software RAID
  - Use arrays of workstation disks to provide cheap, highly available, and scalable storage and parallel I/O
• Multi-path communications
  - Use multiple networks for parallel file transfer
Design Issues
• Cost/performance ratio
• Increased availability
• Single system image (look-and-feel of one system)
• Scalability (physical, size, performance, capacity)
• Fast communication (network and protocols)
• Resource balancing (CPU, network, memory, storage)
• Security and privacy
• Manageability (administration and control)
• Usability and applicability (programming environment, cluster-aware apps)
Cluster Objectives
• High performance
  - Usually dedicated clusters for HPC
  - Partitioning between users
• High throughput
  - Steal idle cycles (cycle harvesting)
  - Maximum utilization of available resources
• High availability
  - Fail-over configuration
  - Heartbeat connections (see the sketch below)
• Combined: HP and HA
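As a rough illustration of a heartbeat connection, the sketch below periodically sends a UDP "I am alive" message to a peer node. The peer address, port, and one-second interval are made-up assumptions for the example; real HA middleware (e.g., a fail-over manager) is considerably more involved.

/* Illustrative UDP heartbeat sender: periodically tells a peer node
 * that this node is alive. Address, port, and interval are assumed values. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>

int main(void) {
    int sock = socket(AF_INET, SOCK_DGRAM, 0);
    if (sock < 0) { perror("socket"); return 1; }

    struct sockaddr_in peer;
    memset(&peer, 0, sizeof peer);
    peer.sin_family = AF_INET;
    peer.sin_port   = htons(9999);                       /* assumed port */
    inet_pton(AF_INET, "192.168.1.2", &peer.sin_addr);   /* assumed peer node */

    const char msg[] = "heartbeat";
    for (;;) {
        sendto(sock, msg, sizeof msg, 0,
               (struct sockaddr *)&peer, sizeof peer);   /* send liveness message */
        sleep(1);                                        /* assumed 1-second interval */
    }
}

The peer would declare this node failed after missing several consecutive heartbeats and then trigger fail-over of its services.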
Example: MOSIX at HUJI
Example: Berkeley NOW
Cluster Components
• Nodes
• Operating System
• Network
• Interconnects
• Communication protocols & services
• Middleware
• Programming models
• Applications
Cluster Components: Nodes
• Multiple high-performance computers
  - PCs
  - Workstations
  - SMPs (CLUMPS)
• Processors
  - Intel/AMD x86 processors
  - IBM PowerPC
  - Digital Alpha
  - Sun SPARC
Cluster Components: OS
• Basic services:
  - Easy access to hardware
  - Share hardware resources seamlessly
  - Concurrency (multiple threads of control)
• Operating systems:
  - Linux (Beowulf, and many more)
  - Microsoft NT (Illinois HPVM, Cornell Velocity)
  - SUN Solaris (Berkeley NOW, C-DAC PARAM)
  - Mach µ-kernel (CMU)
  - Cluster OS (Solaris MC, MOSIX)
  - OS gluing layers (Berkeley Glunix)
Cluster Components: Network
• High-performance networks/switches
  - Ethernet (10 Mbps), Fast Ethernet (100 Mbps), Gigabit Ethernet (1 Gbps)
  - SCI (Scalable Coherent Interface – 12 µs latency)
  - ATM (Asynchronous Transfer Mode)
  - Myrinet (1.2 Gbps)
  - QsNet (5 µs latency for MPI messages)
  - FDDI (Fiber Distributed Data Interface)
  - Digital Memory Channel
  - InfiniBand
Cluster Components:
Interconnects
• Standard Ethernet
  - 10 Mbps, cheap, easy to deploy
  - Bandwidth and latency don't match CPU capabilities
• Fast Ethernet and Gigabit Ethernet
  - Fast Ethernet – 100 Mbps
  - Gigabit Ethernet – 1000 Mbps
• Myrinet
  - 1.28 Gbps full-duplex interconnect, 5-10 µs latency
  - Programmable on-board processor
  - Leverages MPP technology
Interconnects (contd)
• InfiniBand
  - Latency < 7 µs
  - Industry standard based on VIA
  - Connects components within a system
• SCI – Scalable Coherent Interface
  - Interconnection technology for clusters
  - Directory-based cache scheme
• VIA – Virtual Interface Architecture
  - Standard for a low-latency communication software interface
Cluster Interconnects: Comparison

Criteria                | Gigabit Ethernet | Gigabit cLAN        | Infiniband | Myrinet             | SCI
Bandwidth (MB/s)        | < 100            | < 125               | 850        | 230                 | < 320
Latency (µs)            | < 100            | 7-10                | < 7        | 10                  | 1-2
Hardware availability   | Now              | Now                 | Now        | Now                 | Now
Linux support           | Now              | Now                 | Now        | Now                 | Now
Max # of nodes          | 1000's           | 1000's              | > 1000's   | 1000's              | 1000's
Protocol implementation | Hardware         | Firmware on adaptor | Hardware   | Firmware on adaptor | Firmware on adaptor
VIA support             | NT/Linux         | NT/Linux            | Software   | Linux               | Software
MPI support             | MVICH            | 3rd party           | MPI/Pro    | 3rd party           | 3rd party
Cluster Components:
Communication protocols
• Fast communication protocols (and user-level communication):
  - Standard TCP/IP, 0-copy TCP/IP
  - Active Messages (Berkeley)
  - Fast Messages (Illinois)
  - U-Net (Cornell)
  - XTP (Virginia)
  - Virtual Interface Architecture (VIA)
Cluster Components:
Communication services
• Communication infrastructure
  - Bulk-data transport
  - Streaming data
  - Group communications
• Provide important QoS parameters
  - Latency, bandwidth, reliability, fault-tolerance
• Wide range of communication methodologies (see the MPI sketch below)
  - RPC
  - DSM
  - Stream-based and message passing (e.g., MPI, PVM)
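Since MPI is cited above as a representative message-passing interface, the following generic sketch shows its basic point-to-point pattern (one rank sends, another receives). It is a standard MPI usage example, not code from the course; the payload value is arbitrary.

/* Minimal MPI point-to-point example: rank 0 sends an integer,
 * rank 1 receives it. Compile with mpicc, run with mpirun -np 2. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;                              /* arbitrary payload */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d from rank 0\n", value);
    }

    MPI_Finalize();
    return 0;
}

Launched across cluster nodes, the same source runs as one process per node, with the MPI library handling the underlying transport.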
Cluster Components: Middleware
• Resides between the OS and the applications
• Provides infrastructure to transparently support:
  - Single System Image (SSI): makes the collection appear as a single machine
  - System Availability (SA): monitoring, checkpoint, restart, migration
  - Resource Management and Scheduling (RMS)
Cluster Components:
Programming Models
• Threads (PCs, SMPs, NOW, ...) (see the Pthreads sketch below)
  - POSIX Threads, Java Threads
• OpenMP
• MPI (Message Passing Interface)
• PVM (Parallel Virtual Machine)
• Software DSMs (Shmem)
• Compilers
  - Parallel code generators; C/C++/Java/Fortran
• Performance Analysis Tools
• Visualization Tools
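As a small example of the thread-based model listed above, this POSIX Threads sketch has two workers each fill half of a shared array before the main thread joins them. The array size and worker count are arbitrary choices for illustration.

/* Minimal POSIX Threads example: two workers initialize halves of a
 * shared array, then the main thread joins them. Compile with -pthread. */
#include <stdio.h>
#include <pthread.h>

#define N 8
static int data[N];

static void *worker(void *arg) {
    long half = (long)arg;                   /* 0 = first half, 1 = second half */
    for (int i = half * N / 2; i < (half + 1) * N / 2; i++)
        data[i] = i * i;                     /* fill this worker's share */
    return NULL;
}

int main(void) {
    pthread_t t[2];
    for (long h = 0; h < 2; h++)
        pthread_create(&t[h], NULL, worker, (void *)h);
    for (int h = 0; h < 2; h++)
        pthread_join(t[h], NULL);            /* wait for both workers */

    for (int i = 0; i < N; i++)
        printf("%d ", data[i]);
    printf("\n");
    return 0;
}

Threads share one address space, so this model fits a single SMP node; across cluster nodes the same decomposition is typically expressed with MPI, PVM, or a software DSM.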
Cluster Components:
Applications
• Sequential
• Parametric Modeling
  - Embarrassingly parallel
• Parallel / Distributed
  - Cluster-aware
  - Grand Challenge applications
  - Web servers, data mining
Clusters Classification (I)
• Application Target
  - High Performance (HP) clusters: Grand Challenge applications
  - High Availability (HA) clusters: mission-critical applications
Clusters Classification (II)
• Node Ownership
  - Dedicated clusters
  - Non-dedicated clusters
    - Adaptive parallel computing
    - Communal multiprocessing
Clusters Classification (III)
• Node Hardware
  - Clusters of PCs (CoPs)
    - Piles of PCs (PoPs)
  - Clusters of Workstations (COWs)
  - Clusters of SMPs (CLUMPs)
Clusters Classification (IV)
• Node Operating System
  - Linux clusters (e.g., Beowulf)
  - Solaris clusters (e.g., Berkeley NOW)
  - NT clusters (e.g., HPVM)
  - AIX clusters (e.g., IBM SP2)
  - SCO/Compaq clusters (Unixware)
  - Digital VMS clusters
  - HP-UX clusters
  - Microsoft Wolfpack clusters
Clusters Classification (V)
• Node Configuration
  - Homogeneous clusters: all nodes have similar architectures and run the same OS
  - Semi-homogeneous clusters: similar architectures and OS, varying performance capabilities
  - Heterogeneous clusters: nodes have different architectures and run different OSs
Clusters Classification (VI)
• Levels of Clustering
  - Group clusters (#nodes: 2-99)
  - Departmental clusters (#nodes: 10s to 100s)
  - Organizational clusters (#nodes: many 100s)
  - National metacomputers (WAN/Internet)
  - International metacomputers (Internet-based, #nodes: 1000s to many millions)
• Grid Computing
• Web-based Computing
• Peer-to-Peer Computing
Summary: Key Benefits
• High Performance
  - With cluster-aware applications
• High Throughput
  - Resource balancing and sharing
• High Availability
  - Redundancy in hardware, OS, and applications
• Expandability and Scalability
  - Expand on demand by adding hardware