Cluster - University of Technology

Cluster
Trần Hữu Lộc (00706140)
Nguyễn Thành Trung (00706151)

Outline

Introduction
Cluster architectures
System Design
Parallel Programming Environments and Tools
Cluster Applications

Introduction

Grand challenge applications are solved using computer modeling, simulation and analysis (weather forecasting, military applications, simulation, astrophysics …)
Minicomputers were large and expensive
The development of powerful microprocessors
High-speed LANs

How to Run Applications Faster?

Use faster hardware
Use optimized algorithms and techniques to solve computational tasks
Use multiple computers to solve a particular task

History

First clusters: the 1960s, or even the late 1950s
Cluster research has gone hand in hand with research on networks and the Unix operating system since the early 1970s
The first commercial clustering product was ARCnet, developed by Datapoint in 1977
VAXcluster in 1984
Tandem Himalaya and the IBM S/390 Parallel Sysplex in 1994
…

What is a Cluster?

A cluster is a type of parallel or distributed processing system that consists of a collection of interconnected stand-alone computers working together cooperatively as a single, integrated computing resource.
A node: a single-processor or multiprocessor system with memory, I/O facilities, and an OS
A cluster:
+ Generally two or more computers (nodes) connected together, either in a single cabinet or physically separated and connected via a LAN
+ Provides a cost-effective way to gain features and benefits

Cluster Architecture
[Figure: layered cluster architecture. Sequential and parallel applications sit on top, with a parallel programming environment beneath the parallel applications. Below them is the cluster middleware (single system image and availability infrastructure). At the bottom are the PC/workstation nodes, each with its communications software and network interface hardware, connected by the cluster interconnection network/switch.]

System Design

Performance Requirements
Hardware Platforms
Operating Systems
Single System Image (SSI)
Middleware

Performance Requirements

Common Cluster Modes

High Performance (dedicated)
High Throughput (idle cycle harvesting)
High Availability (fail-over)
A Unified System – HP and HA within the same cluster

Performance Requirements

The Need for Performance Evaluation

Hardware – Idle processors due to conflicts over memory access and communication paths.
Operating System – Inefficient internal scheduler, file systems and memory allocation/de-allocation.
Middleware – Inefficient distribution and coordination of tasks; high inter-processor communication latency.
Applications – Inefficient algorithms that do not exploit the natural concurrency of a problem.

Performance Requirements

Some indices for global measurements

Execution rate: the machine output per unit of time, measured in MIPS (million instructions per second)
Speedup: $S_p = \dfrac{T_1}{T_p}$
Efficiency: $E_p = \dfrac{S_p}{p}$
where $T_1$ is the execution time on one processor and $T_p$ the execution time on $p$ processors
Other indices: $R_p$, $U_p$, $Q_p$
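
The speedup and efficiency indices can be checked with a few lines of Python. This is only a minimal sketch: the function names and the timings (120 s on one node, 20 s on 8 nodes) are illustrative assumptions, not measurements from the slides.

```python
def speedup(t1: float, tp: float) -> float:
    """Speedup: time on one processor divided by time on p processors."""
    if tp <= 0:
        raise ValueError("tp must be positive")
    return t1 / tp


def efficiency(t1: float, tp: float, p: int) -> float:
    """Efficiency: speedup divided by the number of processors p."""
    if p < 1:
        raise ValueError("p must be at least 1")
    return speedup(t1, tp) / p


# Hypothetical timings: 120 s on one node, 20 s on 8 nodes.
s = speedup(120.0, 20.0)
e = efficiency(120.0, 20.0, 8)
print(f"speedup = {s:.2f}, efficiency = {e:.0%}")
```

An efficiency well below 1 suggests that time is being lost to communication, idle processors or serial parts of the program.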
Hardware Platforms

Multiple High Performance Computers
+ PCs
+ Workstations
+ SMPs (CLUMPS)

Hardware Platforms

Processors
+ Intel x86 processors: Pentium Pro and Pentium Xeon
+ AMD x86, Cyrix x86, etc.
+ Digital Alpha: the Alpha 21364 processor integrates processing, memory controller and network interface into a single chip
+ IBM PowerPC
+ Sun SPARC
+ SGI MIPS
+ HP PA

Network Technology

Communication Protocols
+ Connection-oriented or connectionless
+ Offer various levels of reliability, from guaranteed in-order delivery (reliable) to no delivery guarantee (unreliable)
+ Unbuffered (synchronous) or buffered (asynchronous)

Internet protocols: TCP/IP, UDP
Low-latency protocols: Active Messages, Fast Messages, the VMMC (Virtual Memory-Mapped Communication) system, U-Net, and Basic Interface for Parallelism (BIP)
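
The difference between connectionless and connection-oriented protocols can be seen with ordinary UDP and TCP sockets. The sketch below is an illustration only: it runs over the loopback interface in a single process, and the payloads and variable names are made up for the example.

```python
import socket

HOST = "127.0.0.1"  # loopback only; port 0 lets the OS pick a free port

# Connectionless (UDP): no handshake, each datagram stands alone.
udp_rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
udp_rx.bind((HOST, 0))
udp_rx.settimeout(2.0)
udp_tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
udp_tx.sendto(b"ping", udp_rx.getsockname())
udp_data, _ = udp_rx.recvfrom(1024)

# Connection-oriented (TCP): connect/accept first, then a reliable byte stream.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind((HOST, 0))
listener.listen(1)
tcp_tx = socket.create_connection(listener.getsockname(), timeout=2.0)
tcp_rx, _ = listener.accept()
tcp_tx.sendall(b"hello")
tcp_tx.shutdown(socket.SHUT_WR)  # tell the receiver the stream has ended

chunks = []
while True:
    chunk = tcp_rx.recv(1024)
    if not chunk:  # empty read means the peer closed its side
        break
    chunks.append(chunk)
tcp_data = b"".join(chunks)

for sock in (udp_rx, udp_tx, listener, tcp_tx, tcp_rx):
    sock.close()
print(udp_data, tcp_data)
```

On a real network the UDP datagram could be lost or reordered, whereas TCP would retransmit. The low-latency protocols listed on the slide try to avoid the overhead that this reliability adds.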
Network Technology

Hardware Products
+ Ethernet (10 Mbps)
+ Fast Ethernet (100 Mbps)
+ Gigabit Ethernet (1 Gbps)
+ SCI (Scalable Coherent Interface; 12 µs MPI latency)
+ ATM (Asynchronous Transfer Mode)
+ Myrinet (1.28 Gbps)
+ QsNet (Quadrics Supercomputers World; 5 µs latency for MPI messages)
+ Digital Memory Channel
+ FDDI (Fiber Distributed Data Interface)
+ InfiniBand

Operating Systems


The operating system for a cluster resides on every node
Two fundamental services for users
+ Make the computer hardware easier to use: create a virtual machine that differs markedly from the real machine
+ Share hardware resources among users: processor multitasking
The new concept in OS services
+ Support multiple threads of control within a process itself: parallelism within a process (multithreading)
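
Several threads of control inside one process can be sketched with Python's standard thread pool. The function names and the workload (a sum of squares split into ranges) are assumptions chosen for illustration. Note that in standard CPython builds the global interpreter lock limits the speedup for CPU-bound work like this, so the example shows the structure rather than a performance gain.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(bounds):
    """Work done by one thread: sum of squares over [lo, hi)."""
    lo, hi = bounds
    return sum(i * i for i in range(lo, hi))


def parallel_sum_of_squares(n, workers=4):
    """Split range(n) into chunks and sum them in threads of one process."""
    if n <= 0:
        return 0
    step = -(-n // workers)  # ceiling division so every index is covered
    ranges = [(lo, min(lo + step, n)) for lo in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # All threads share the process's memory; only the ranges differ.
        return sum(pool.map(partial_sum, ranges))


total = parallel_sum_of_squares(1000)
print(total)
```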
Operating Systems

Node Operating System

Linux Clusters (e.g., Beowulf)
Solaris Clusters (e.g., Berkeley NOW)
NT Clusters (e.g., HPVM)
AIX Clusters (e.g., IBM SP2)
SCO/Compaq Clusters (Unixware)
Digital VMS Clusters
HP-UX Clusters
Microsoft Wolfpack Clusters

Single System Image (SSI)

Hides the heterogeneous and distributed nature of the available resources and presents them to users and applications as a single unified computing resource
High availability
Transparency of resource management
Scalable performance

Single System Image (SSI)

Services and Benefits









Single entry point
Single user interface
Single process space
Single I/O space (SIOS)
Single file hierarchy
Single virtual networking
Single job-management system
Single control point and management
Checkpointing and Process Migration
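
Checkpointing can be illustrated at application level with a job that saves its state after every step and resumes from the last saved state after a failure. This is only a sketch under stated assumptions: the `run` function, the `stop_after` parameter that simulates a crash, and the file name are all hypothetical, and a real SSI layer would checkpoint processes transparently rather than inside the application.

```python
import os
import pickle
import tempfile


def run(total_steps, ckpt_path, stop_after=None):
    """Sum 0..total_steps-1, checkpointing after each step.

    stop_after simulates a node failure after that many steps in this run.
    """
    if os.path.exists(ckpt_path):
        with open(ckpt_path, "rb") as f:
            state = pickle.load(f)  # resume from the last checkpoint
    else:
        state = {"step": 0, "acc": 0}

    done = 0
    while state["step"] < total_steps:
        if stop_after is not None and done >= stop_after:
            return None  # simulated failure: the result is not produced
        state["acc"] += state["step"]
        state["step"] += 1
        done += 1
        # Write to a temporary file and rename it, so a crash during the
        # write cannot leave a half-written checkpoint behind.
        tmp_path = ckpt_path + ".tmp"
        with open(tmp_path, "wb") as f:
            pickle.dump(state, f)
        os.replace(tmp_path, ckpt_path)
    return state["acc"]


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "job.ckpt")
    first = run(10, path, stop_after=4)  # "fails" after 4 steps
    result = run(10, path)               # restarts from step 4
print(first, result)
```

Because the checkpoint is an ordinary file, the restart could in principle happen on a different node that sees the same file system, which is the basic idea behind process migration.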
Middleware

Introduction
+ A layer of software sandwiched between the operating system and applications.
+ A means of integrating software applications running in a heterogeneous environment.

Heterogeneity
+ Hardware platforms become heterogeneous
+ Must support very different applications

Overview
+ Helps application developers overcome these heterogeneities.
+ Provides services for the management and administration of a heterogeneous system

Middleware – Technological scope

Message-based Middleware
RPC-based Middleware
CORBA
OLE/COM
Internet Middleware
Java Technologies
Cluster Management Software

Middleware – Technological scope

Message-based Middleware
+ Uses a common communications protocol to exchange data between applications, hiding the low-level message-passing primitives from the application developer
+ Examples: Parallel Virtual Machine (PVM) and MPI

RPC-based Middleware
+ Remote Procedure Call (RPC) lets a requesting process directly execute a procedure on another computer and receive a response
+ Uses marshalling to transfer data structures from one computer to another
+ Examples: Network Information Service [9] (NIS) and Network File System [10] (NFS)
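
The RPC pattern (call a procedure on another machine, with arguments and results marshalled for transport) can be tried with Python's standard XML-RPC modules. This is a sketch, not how NIS or NFS are implemented: the `node_stats` procedure is hypothetical, and both "machines" are threads of one process talking over loopback.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def node_stats(values):
    """Procedure that runs on the 'remote' side (hypothetical example)."""
    return {"count": len(values), "total": sum(values)}


# Server side: bind to a free loopback port and expose the procedure.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(node_stats, "node_stats")
host, port = server.server_address
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: the call looks local; the list is marshalled to XML,
# sent over HTTP, and the returned dict is unmarshalled from the reply.
with ServerProxy(f"http://{host}:{port}/") as proxy:
    reply = proxy.node_stats([3, 5, 8])

server.shutdown()
server.server_close()
print(reply)
```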
Middleware – Technological scope

CORBA
+ An architectural framework that specifies the mechanisms for processing distributed objects
+ Object Management Architecture (OMA): Object Request Broker (ORB), object services, application services, application objects

COM/OLE
+ Object Linking and Embedding (OLE): a highly generic object model and a set of object-oriented interfaces that allow applications to intercommunicate
+ Component Object Model (COM): defines mechanisms for creating objects and for communication between clients and objects that are spread across a distributed environment

Middleware – Technological scope

Internet Middleware
+ Hypertext Transfer Protocol (HTTP), Common Gateway Interface (CGI), etc.

Java Technologies
+ Java Remote Method Invocation (RMI)
+ Jini: a set of APIs and network protocols used to create and deploy distributed systems organized as federations of services

Cluster Management Software
+ Administers and manages jobs submitted to workstation clusters
+ Optimizes the use of available resources: sets priorities, steals idle CPU cycles, migrates tasks, and ensures that tasks complete
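
One job of cluster management software, dispatching waiting jobs to idle nodes in priority order, can be sketched with a heap-based queue. The `JobQueue` class, the job names and the node names are hypothetical. Real schedulers also handle resource requirements, migration and failures, which this sketch leaves out.

```python
import heapq
import itertools


class JobQueue:
    """Minimal priority queue of jobs; lower number means higher priority."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # FIFO tie-breaker for equal priorities

    def submit(self, name, priority):
        heapq.heappush(self._heap, (priority, next(self._order), name))

    def dispatch(self, idle_nodes):
        """Assign the highest-priority waiting jobs to the given idle nodes."""
        assignments = {}
        for node in idle_nodes:
            if not self._heap:
                break  # more idle nodes than waiting jobs
            _, _, name = heapq.heappop(self._heap)
            assignments[node] = name
        return assignments


queue = JobQueue()
queue.submit("render", priority=5)
queue.submit("weather-model", priority=1)
queue.submit("backup", priority=9)
queue.submit("chemistry", priority=1)

plan = queue.dispatch(["node01", "node02", "node03"])
print(plan)
```

Here the two priority-1 jobs are placed first, in submission order, and the lowest-priority job waits until another node becomes idle.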
System Administration

Introduction

Manageability of a system: how usable it is in terms of actually producing computational value, and what “comfort level” it gives users
Computer science research: performance testing, benchmarking, and software tuning
Production-computing environment: provides reliable computing cycles with dependable networking, application software, and OS
Good systems manageability equates directly to better results

System Administration

System Planning

Hardware Considerations: workstations with a low cost per compute cycle
Performance Specifications: performance testing, benchmarking, and software tuning
Memory speed and interleave
Processor core speed vs. bus speed
PCI bus speed and width
Multiprocessor issues: single- or multiprocessor building blocks
Cluster Interconnect Considerations: efficient data transfers, low processor-cycle overhead for those transfers, and highly optimized network interconnects

System Administration

Software Considerations

Remote Access: Windows (Telnet, Terminal Services, IIS), Unix (SSH, Telnet, X Window System, FTP)

System Installation: Windows (Remote Installation Service; third-party tools such as Norton Ghost and Imagecast), Unix (IBM's Linux Utility for Cluster Installation (LUI), VA Linux's VA SystemImager)

System Monitoring & Remote Control of Nodes
+ Probing by direct access to kernel memory
+ Probing by file system interface
+ Collecting the performance information
+ Scalability
+ Optimizing the network traffic
+ Reducing the intrusiveness
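
Probing through the file system interface can be illustrated on Linux, where the kernel exposes the load average as the text file `/proc/loadavg`. The sketch below is an assumption-laden illustration: the function names are made up, and `read_loadavg` simply returns `None` on systems without that file.

```python
from pathlib import Path


def parse_loadavg(text):
    """Parse the Linux /proc/loadavg format, e.g. '0.42 0.30 0.25 1/123 4567'."""
    fields = text.split()
    load1, load5, load15 = (float(x) for x in fields[:3])
    running, total = (int(x) for x in fields[3].split("/"))
    return {
        "load1": load1,      # 1-minute load average
        "load5": load5,      # 5-minute load average
        "load15": load15,    # 15-minute load average
        "running": running,  # currently runnable scheduling entities
        "total": total,      # total scheduling entities
    }


def read_loadavg(path="/proc/loadavg"):
    """Probe the local node; returns None where the file does not exist."""
    p = Path(path)
    return parse_loadavg(p.read_text()) if p.exists() else None


sample = parse_loadavg("0.42 0.30 0.25 1/123 4567\n")
print(sample)
```

Reading a small text file like this is cheap, which matters for the intrusiveness point above: a monitor that polls every node too often becomes a load of its own.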





System Administration

Remote Management: Tools and Technology

Remote monitoring and control of nodes, copying/moving/removing files, remote shutdown and restart, security maintenance, parallel execution

Scheduling Systems

Parallel Programming Environments and Tools

Threads (PCs, SMPs, NOW, ...)
+ POSIX Threads
+ Java Threads
MPI (Message Passing Interface)
+ Linux, NT, on many Supercomputers
PVM (Parallel Virtual Machine)
Parametric Programming

Parallel Programming Environments and Tools

Software DSMs (Shmem)
Compilers
+ C/C++/Java
+ Parallel programming with C++ (MIT Press book)
RAD (rapid application development tools)
+ GUI based tools for PP modeling
Debuggers
Performance Analysis Tools
Visualization Tools

Applications

Sequential
Parallel / Distributed (cluster-aware applications)
+ Grand Challenge applications: Weather Forecasting, Quantum Chemistry, Molecular Biology Modeling, Engineering Analysis (CAD/CAM), …
+ PDBs, web servers, data mining

Operational Benefits

High Performance: aggregates computing power across nodes to solve a problem faster.

Expandability and Scalability: easy to expand by increasing the number of nodes.

High Throughput: harnesses the ever-growing power of desktop computing resources while protecting the rights and needs of their interactive users.

High Availability: keeps services available when individual nodes fail.
