Parallel Computing Overview


Introduction to Parallel Computing:
Architectures, Systems, and
Programming
Prof. Rajkumar Buyya
Cloud Computing and Distributed Systems (CLOUDS) Lab.
The University of Melbourne, Australia
www.buyya.com
Serial Vs. Parallel Services
[Figure: customers queuing at a single service counter (serial) vs. being served at two counters simultaneously (parallel)]
Overview of the Talk
- Introduction
- Why Parallel Processing?
- Parallel System H/W Architecture
- Parallel Operating Systems
- Parallel Programming Models
- Summary
Computing Elements
[Figure: layered view of a multi-processor computing system. From top to bottom: Applications, Programming paradigms, Threads interface, Operating system, Microkernel, and the hardware: multiple processors (P). Processes and their threads are mapped onto the processors.]
Two Eras of Computing
[Figure: timeline from 1940 to 2030. The Sequential Era and then the Parallel Era each evolve through the same stages: Architectures, System Software/Compilers, Applications, and P.S.Es, with each stage progressing from R&D through Commercialization to Commodity.]
History of Parallel Processing
- The notion of parallel processing can be traced back to a calculating tablet dated around 100 BC.
- The tablet has 3 calculating positions capable of operating simultaneously.
- From this we can infer that they were aimed at "speed" or "reliability".
Motivating Factor: Human Brain
- The human brain consists of a large number (more than a billion) of neural cells that process information. Each cell works like a simple processor, and only the massive interaction among all cells and their parallel processing makes the brain's abilities possible.
- An individual neuron's response speed is slow (on the order of milliseconds), yet the aggregate speed with which complex calculations are carried out by (billions of) neurons demonstrates the feasibility of parallel processing.
Why Parallel Processing?
- Computation requirements are ever increasing: simulations, scientific prediction (earthquakes), distributed databases, weather forecasting (will it rain tomorrow?), search engines, e-commerce, Internet service applications, data center applications, finance (investment risk analysis), oil exploration, mining, etc.
- Silicon-based (sequential) architectures are reaching their limits in processing capability (clock speed), as they are constrained by the speed of light and thermodynamics.
Human Architecture! Growth Performance
[Figure: human growth against age (5 to 45 years and beyond): vertical growth up to about age 20, horizontal growth thereafter]
Computational Power Improvement
[Figure: computational power improvement (C.P.I.) vs. number of processors: a uniprocessor's power stays flat, while a multiprocessor's power grows with the number of processors]
Why Parallel Processing?
- Hardware improvements like pipelining and superscalar execution are not scaling well and require sophisticated compiler technology to extract performance from them.
- Techniques such as vector processing work well only for certain kinds of problems.
Why Parallel Processing?
- Significant developments in networking technology are paving the way for network-based, cost-effective parallel computing.
- Parallel processing technology is now mature and is being exploited commercially:
  - All computers (including desktops and laptops) are now based on parallel processing (e.g., multicore) architectures.
Processing Elements Architecture

Processing Elements
- Flynn proposed a classification of computer systems based on the number of instruction and data streams that can be processed simultaneously. They are:
  - SISD (Single Instruction and Single Data): conventional computers
  - SIMD (Single Instruction and Multiple Data): data-parallel, vector computing machines
  - MISD (Multiple Instruction and Single Data): systolic arrays
  - MIMD (Multiple Instruction and Multiple Data): general-purpose machines
SISD: A Conventional Computer
[Figure: a single processor receiving one instruction stream, consuming one data input stream, and producing one data output stream]
- Speed is limited by the rate at which the computer can transfer information internally.
- Ex: PCs, workstations
The MISD Architecture
[Figure: processors A, B, and C each receive their own instruction stream (A, B, C) but operate on the same single data input stream, producing a single data output stream]
- More of an intellectual exercise than a practical configuration: a few have been built, but none are commercially available.
SIMD Architecture
[Figure: a single instruction stream drives processors A, B, and C; each operates on its own data input stream (A, B, C) and produces its own data output stream, e.g., Ci <= Ai * Bi]
- Ex: CRAY vector processing machines, Thinking Machines CM*, Intel MMX (multimedia support)
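The Ci <= Ai * Bi operation above is the canonical data-parallel pattern: one instruction applied to many data elements. A conceptual sketch in plain Python (the parallelism here is only notional; real SIMD hardware or vector units apply the multiply to all elements in lockstep):

```python
# One "instruction" (multiply) applied across all data streams:
# processor i computes Ci <= Ai * Bi on its own operand pair.
A = [1, 2, 3, 4]      # data input stream, one element per processor
B = [10, 20, 30, 40]
C = [a * b for a, b in zip(A, B)]  # the same operation on every element
```

The key property is that there is a single control flow; only the data differs per element.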
MIMD Architecture
[Figure: processors A, B, and C each receive their own instruction stream (A, B, C), operate on their own data input stream, and produce their own data output stream]
- Unlike SISD and MISD machines, a MIMD computer works asynchronously.
- Shared memory (tightly coupled) MIMD, e.g., multicore
- Distributed memory (loosely coupled) MIMD
Shared Memory MIMD machine
Processor
A
M
E
M B
O U
R S
Y
Processor
B
M
E
M B
O U
R S
Y
Processor
C
M
E
M B
O U
R S
Y
Global Memory System
Communication:
Source PE writes data to GM & destination PE retrieves it
 Easy to build, conventional OSes of SISD can be easily be ported
 Limitation : reliability & expandability. A memory component or
any processor failure affects the whole system.
 Increase of processors leads to memory contention.
Ex. : Silicon graphics supercomputers and now Multicore systems
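The shared-memory communication style above can be sketched with OS threads, which share a single address space much as PEs share global memory. This is an illustrative sketch (not an example from the slides); the lock mirrors the memory-contention point, since every "PE" serializes on the same shared location:

```python
import threading

counter = 0              # the "global memory" cell every PE reads and writes
lock = threading.Lock()  # serializes access, mirroring memory contention

def pe(n):
    # Each thread plays the role of one processing element.
    global counter
    for _ in range(n):
        with lock:       # source PE writes; all others see the update
            counter += 1

threads = [threading.Thread(target=pe, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# counter == 40_000: every update became visible through shared memory
```

Without the lock, the increments of different threads could interleave and lose updates, which is exactly the kind of hazard shared-memory programming must manage.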
Distributed Memory MIMD
[Figure: processors A, B, and C, each attached via a memory bus to its own memory system (A, B, C), connected to one another through IPC channels]
- Communication: IPC (Inter-Process Communication) via a high-speed network.
- The network can be configured as a tree, mesh, cube, etc.
- Unlike shared memory MIMD:
  - easily/readily expandable
  - highly reliable (a CPU failure does not affect the whole system)
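The distributed-memory style can be sketched with Python's `multiprocessing` module: each process has its own address space ("local memory"), so all communication must go through a pipe, standing in for the IPC channel. The `parallel_sum` helper is an illustrative name, not an API from the slides:

```python
from multiprocessing import Pipe, Process

def node(conn):
    # Each node owns its local memory; data arrives only via IPC.
    chunk = conn.recv()
    conn.send(sum(chunk))
    conn.close()

def parallel_sum(chunks):
    # One process plus one pipe (IPC channel) per chunk of work.
    pairs = [Pipe() for _ in chunks]
    procs = [Process(target=node, args=(child,)) for _, child in pairs]
    for p in procs:
        p.start()
    for (parent, _), chunk in zip(pairs, chunks):
        parent.send(chunk)          # ship the data to the remote node
    results = [parent.recv() for parent, _ in pairs]
    for p in procs:
        p.join()
    return results

if __name__ == "__main__":
    print(parallel_sum([[1, 2, 3], [4, 5, 6]]))
```

Because nothing is shared, a crashed worker cannot corrupt another node's memory, which is the reliability argument made above.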
Types of Parallel Systems
- Tightly Coupled Systems:
  - Shared Memory Parallel
    - Smallest extension to existing systems
    - Program conversion is incremental
  - Distributed Memory Parallel
    - Completely new systems
    - Programs must be reconstructed
- Loosely Coupled Systems:
  - Clusters (now Clouds)
    - Built using commodity systems
    - Centralised management
  - Grids
    - Aggregation of distributed systems
    - Decentralised management
Laws of caution.....
- Speed of computation is proportional to the square root of system cost:
  Speed ∝ √Cost
- Speedup by a parallel computer increases as the logarithm of the number of processors:
  Speedup = log2(no. of processors)
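These two pessimistic estimates are easy to put numbers on. A small sketch, with the proportionality constant `k` assumed to be 1:

```python
import math

def speed_from_cost(cost, k=1.0):
    # Speed proportional to the square root of system cost:
    # quadrupling the budget only doubles the speed.
    return k * math.sqrt(cost)

def speedup_from_processors(processors):
    # Speedup grows only as log2 of the processor count.
    return math.log2(processors)

ratio = speed_from_cost(4.0) / speed_from_cost(1.0)  # 2.0
speedup = speedup_from_processors(1024)              # 10.0
```

So under these cautionary laws, a 4x budget buys 2x speed, and 1024 processors yield a speedup of only 10.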
Caution....
- Very fast developments in network computing and related areas have blurred concept boundaries, causing a lot of terminological confusion: concurrent computing, parallel computing, multiprocessing, supercomputing, massively parallel processing, cluster computing, distributed computing, Internet computing, grid computing, Cloud computing, etc.
- At the user level, even well-defined distinctions such as shared memory vs. distributed memory are disappearing due to new advances in technology.
- Good tools for parallel application development and debugging are yet to emerge.
Caution....
- There are no strict delimiters for contributors to the area of parallel processing: computer architecture, operating systems, high-level languages, algorithms, databases, computer networks, ...
- All have a role to play.
Operating Systems for High Performance Computing
Operating Systems for PP
- MPP systems with thousands of processors require an OS radically different from current ones.
- Every CPU needs an OS:
  - to manage its resources
  - to hide its details
- Traditional operating systems are heavy, complex, and not suitable for MPP.
Operating System Models
- A framework that unifies the features, services, and tasks performed.
- Three approaches to building an OS:
  - Monolithic OS
  - Layered OS
  - Microkernel-based / client-server OS (suitable for MPP systems)
- Simplicity, flexibility, and high performance are crucial for such an OS.
Monolithic Operating System
[Figure: application programs in user mode invoke system services; the entire OS runs in kernel mode directly on the hardware]
- Better application performance
- Difficult to extend
- Ex: MS-DOS
Layered OS
[Figure: application programs in user mode; kernel mode is split into layers: system services, memory & I/O device management, and process scheduling, above the hardware]
- Easier to enhance
- Each layer of code accesses the lower-level interface
- Low application performance
- Ex: UNIX
Traditional OS
[Figure: application programs in user mode; a single monolithic OS, built by the OS designer, fills kernel mode between the applications and the hardware]
New trend in OS design
[Figure: application programs and servers both run in user mode; only a small microkernel remains in kernel mode above the hardware]
Microkernel/Client Server OS (for MPP Systems)
[Figure: a client application with a thread library, plus file, network, and display servers, all in user mode; they communicate via send/reply messages through the microkernel, which runs in kernel mode on the hardware]
- A tiny OS kernel provides the basic primitives (processes, memory, IPC).
- Traditional services become user-level subsystems.
- Application performance competitive with a monolithic OS.
- OS = Microkernel + User Subsystems
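The send/reply structure above can be mimicked in miniature: a user-level "file server" thread services request messages while the client blocks waiting for the reply, with queues standing in for the microkernel's IPC. All names here (`file_server`, `rpc`, the fake file contents) are illustrative, not from any real microkernel API:

```python
import queue
import threading

def file_server(requests):
    # A user-level subsystem: the "service" lives outside the kernel
    # and is reached only by message passing.
    while True:
        op, arg, reply = requests.get()
        if op == "stop":
            return
        if op == "read":
            reply.put(f"contents of {arg}")  # fake file contents

def rpc(requests, op, arg):
    # Client side of send/reply: send a message, block until answered.
    reply = queue.Queue()
    requests.put((op, arg, reply))
    return reply.get()

requests = queue.Queue()
server = threading.Thread(target=file_server, args=(requests,))
server.start()
motd = rpc(requests, "read", "/etc/motd")  # send, then wait for reply
requests.put(("stop", None, None))
server.join()
```

The client never touches the server's state directly; everything flows through messages, which is what lets a microkernel keep services isolated and replaceable.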
Few Popular Microkernel Systems
- MACH (CMU)
- PARAS (C-DAC)
- Chorus
- QNX
- (Windows)
Parallel Programs
- Consist of multiple active "processes" simultaneously solving a given problem.
- The communication and synchronization between these parallel processes form the core of parallel programming efforts.
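A minimal sketch of the two ingredients named above, parallel processes plus synchronization, using Python threads and a barrier (an illustrative pattern, not tied to any particular system in these slides):

```python
import threading

partials = [0, 0]
barrier = threading.Barrier(2)  # neither worker proceeds until both arrive

def worker(i, data):
    partials[i] = sum(data)  # independent local computation
    barrier.wait()           # synchronization point: wait for the peer
    # past the barrier, both partial results are guaranteed complete

a = threading.Thread(target=worker, args=(0, [1, 2, 3]))
b = threading.Thread(target=worker, args=(1, [4, 5, 6]))
a.start(); b.start()
a.join(); b.join()
total = partials[0] + partials[1]
```

The barrier is the synchronization; the shared `partials` list is the communication. Getting these two right is what the slide calls the core of the effort.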
Parallel Programming Models
- Shared Memory Model
  - DSM
  - Threads/OpenMP (enabled for clusters)
  - Java threads (HKU JESSICA, IBM cJVM)
- Message Passing Model
  - PVM
  - MPI
- Hybrid Model
  - Mixing shared and distributed memory models
  - Using OpenMP and MPI together
- Object and Service Oriented Models
  - Wide-area distributed computing technologies:
    - OO: CORBA, DCOM, etc.
    - Services: Web Services-based service composition
Summary/Conclusions
- Parallel processing has become a reality:
  - E.g., SMPs are used extensively as (Web) servers.
  - The threads concept is utilized everywhere.
- Clusters have emerged as popular data centers and processing engines:
  - E.g., the Google search engine.
- The emergence of commodity high-performance CPUs, networks, and OSes has made parallel computing applicable to enterprise and consumer applications:
  - E.g., Oracle {9i,10g} databases on Clusters/Grids.
  - E.g., Facebook and Twitter running on Clouds.