Parallel, Distributed, and Multithreaded Computing


Parallel Processing:
Architecture Overview
Subject Code: 433-498
Rajkumar Buyya
Grid Computing and Distributed Systems (GRIDS) Lab.
The University of Melbourne
Melbourne, Australia
www.gridbus.org
Overview of the Talk
- Why Parallel Processing?
- Parallel Hardware
- Parallel Operating Systems
- Parallel Programming Paradigms
- Grand Challenges
Computing Elements

[Diagram: a layered stack of computing elements. From top to bottom: Applications; Programming paradigms; Threads interface; Operating system with a microkernel; Multi-processor computing system; Hardware, shown as processors (P), each running processes and threads.]
Two Eras of Computing

[Timeline figure, 1940 to 2030: computing moves from the Sequential Era into the Parallel Era. In each era, Architectures appear first, followed by System Software/Compilers, Applications, and Problem-Solving Environments (PSEs), with each maturing from R&D through commercialization to commodity.]
History of Parallel Processing
- Parallel processing can be traced back to a tablet dated around 100 BC.
- The tablet has three calculating positions.
- We can infer that the multiple positions were meant for either reliability or speed.
Motivating Factors
- Just as we learned to fly not by constructing a machine that flaps its wings like a bird, but by applying the aerodynamic principles demonstrated by nature...
- ...we model parallel processing after the parallelism found in biological species.
Motivating Factors
- The aggregate speed with which neurons carry out complex calculations demonstrates the feasibility of parallel processing: each individual neuron's response is slow (on the order of milliseconds), yet together they compute rapidly.
Why Parallel Processing?
- Computation requirements are ever increasing: visualization, distributed databases, simulations, scientific prediction (e.g. earthquakes), etc.
- Sequential architectures are reaching physical limits (speed of light, thermodynamics).
Human Architecture! Growth Performance

[Figure: human growth versus age (5 to 45 years): vertical growth dominates early in life, then gives way to horizontal growth.]

Computational Power Improvement

[Figure: computational power versus number of processors: a multiprocessor's power keeps improving as processors are added, while a uniprocessor's stays flat.]
Why Parallel Processing?
- The technology of parallel processing is mature and can be exploited commercially; there is significant R&D work on the development of tools and environments.
- Significant developments in networking technology are paving the way for heterogeneous computing.
Why Parallel Processing?
- Hardware improvements such as pipelining and superscalar execution are not scalable and require sophisticated compiler technology.
- Vector processing works well only for certain kinds of problems.
Parallel Program has & needs ...
- Multiple "processes" active simultaneously solving a given problem, in general on multiple processors.
- Communication and synchronization among its processes (this forms the core of parallel programming effort); a minimal sketch follows below.
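As an illustration I have added (it is not from the original slides), the C program below shows both ingredients using POSIX threads: several simultaneously active threads each sum one slice of an array, and a mutex synchronizes their updates to the shared total. The names (partial_sum, NTHREADS, etc.) are illustrative.

/* Minimal sketch: multiple active threads plus synchronization.
   Compile with: cc -pthread sum.c */
#include <pthread.h>
#include <stdio.h>

#define N 1000000
#define NTHREADS 4

static int data[N];
static long total = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *partial_sum(void *arg)
{
    long id = (long)arg;
    long lo = id * (N / NTHREADS), hi = lo + N / NTHREADS;
    long sum = 0;
    for (long i = lo; i < hi; i++)
        sum += data[i];
    pthread_mutex_lock(&lock);   /* synchronization: one writer at a time */
    total += sum;
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void)
{
    pthread_t tid[NTHREADS];
    for (long i = 0; i < N; i++)
        data[i] = 1;
    for (long t = 0; t < NTHREADS; t++)
        pthread_create(&tid[t], NULL, partial_sum, (void *)t);
    for (long t = 0; t < NTHREADS; t++)
        pthread_join(tid[t], NULL);
    printf("total = %ld\n", total);  /* expect 1000000 */
    return 0;
}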
Processing Elements: Architecture

Processing Elements
- A simple classification by Flynn, based on the number of instruction and data streams:
  - SISD: conventional
  - SIMD: data parallel, vector computing
  - MISD: systolic arrays
  - MIMD: very general, multiple approaches
- The current focus is on the MIMD model, using general-purpose processors (no shared memory).
SISD: A Conventional Computer

[Diagram: a single instruction stream and a single data input stream feed one processor, which produces a single data output stream.]

Speed is limited by the rate at which the computer can transfer information internally.
Ex: PC, Macintosh, workstations
The MISD Architecture

[Diagram: a single data input stream passes through processors A, B and C, each driven by its own instruction stream (A, B, C), to produce a single data output stream.]

More of an intellectual exercise than a practical configuration. A few have been built, but none are commercially available.
SIMD Architecture

[Diagram: a single instruction stream drives processors A, B and C in lockstep; each processor has its own data input stream (A, B, C) and data output stream (A, B, C), computing Ci <= Ai * Bi.]

Ex: CRAY vector processing, Thinking Machines CM*, Intel MMX (multimedia support)
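To make the Ci <= Ai * Bi idea concrete, here is a toy C loop I have added: the same multiply operation is applied to every element, so a vectorizing compiler (e.g. at -O2/-O3) can map it onto SIMD instructions such as MMX or SSE.

/* Data-parallel sketch of Ci <= Ai * Bi. */
#include <stdio.h>

#define N 8

int main(void)
{
    float a[N] = {1, 2, 3, 4, 5, 6, 7, 8};
    float b[N] = {8, 7, 6, 5, 4, 3, 2, 1};
    float c[N];

    for (int i = 0; i < N; i++)   /* same instruction, different data */
        c[i] = a[i] * b[i];

    for (int i = 0; i < N; i++)
        printf("%g ", c[i]);
    printf("\n");
    return 0;
}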
MIMD Architecture

[Diagram: processors A, B and C each receive their own instruction stream (A, B, C) and data input stream (A, B, C), and produce their own data output stream (A, B, C).]

Unlike SISD and MISD machines, a MIMD computer works asynchronously.
- Shared-memory (tightly coupled) MIMD
- Distributed-memory (loosely coupled) MIMD
Shared Memory MIMD Machine

[Diagram: processors A, B and C connect through memory buses to a single Global Memory System.]

Communication: the source PE writes data to global memory and the destination PE retrieves it.
- Easy to build; conventional OSes for SISD machines can easily be ported.
- Limitations: reliability and expandability. A failure of any memory component or processor affects the whole system.
- Increasing the number of processors leads to memory contention.
Ex: Silicon Graphics supercomputers....
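A hedged sketch of this write-then-retrieve communication style, which I have added, uses threads on one machine as stand-ins for PEs sharing global memory: the "source" writes a value into a shared variable and the "destination" retrieves it, with a mutex and condition variable providing the synchronization. The names (shared_slot, source, destination) are illustrative.

/* Shared-memory communication sketch. Compile with: cc -pthread comm.c */
#include <pthread.h>
#include <stdio.h>

static int shared_slot;          /* stands in for a global-memory word */
static int ready = 0;
static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;

static void *source(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&m);
    shared_slot = 42;            /* write data to "global memory" */
    ready = 1;
    pthread_cond_signal(&cv);
    pthread_mutex_unlock(&m);
    return NULL;
}

static void *destination(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&m);
    while (!ready)               /* wait until the source has written */
        pthread_cond_wait(&cv, &m);
    printf("destination read %d\n", shared_slot);
    pthread_mutex_unlock(&m);
    return NULL;
}

int main(void)
{
    pthread_t s, d;
    pthread_create(&d, NULL, destination, NULL);
    pthread_create(&s, NULL, source, NULL);
    pthread_join(s, NULL);
    pthread_join(d, NULL);
    return 0;
}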
Distributed Memory MIMD

[Diagram: processors A, B and C, each with its own memory bus and local memory system (A, B, C), connected to one another by IPC channels.]

Communication: IPC over a high-speed network; the network can be configured as a tree, mesh, cube, etc.

Unlike shared-memory MIMD:
- Easily/readily expandable
- Highly reliable (any single CPU failure does not affect the whole system)
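As a minimal sketch of message passing between private memories (my example; the slides name no particular API), the program below uses the standard MPI interface: process 0 sends an integer from its own memory to process 1 over the interconnect.

/* Distributed-memory communication sketch.
   Build with mpicc, run with e.g.: mpirun -np 2 ./a.out */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;              /* exists only in process 0's memory */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("process 1 received %d via IPC\n", value);
    }
    MPI_Finalize();
    return 0;
}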
Laws of caution.....
- The speed of a computer is proportional to the square of its cost:
  speed = k * (cost)^2, i.e. cost grows only as the square root of speed.
- The speedup achieved by a parallel computer increases as the logarithm of the number of processors:
  speedup = log2(P), where P is the number of processors.
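To see how pessimistic the logarithmic law is, this small C program (an illustration I have added) tabulates speedup = log2(P): even 1024 processors would yield a speedup of only 10.

/* Tabulate the logarithmic speedup law. Compile with: cc speedup.c -lm */
#include <stdio.h>
#include <math.h>

int main(void)
{
    for (int p = 2; p <= 1024; p *= 2)
        printf("P = %4d  ->  speedup = %2.0f\n", p, log2(p));
    return 0;
}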
Caution....
- Very fast developments in parallel processing and related areas have blurred concept boundaries, causing a lot of terminological confusion: concurrent computing/programming, parallel computing/processing, multiprocessing, distributed computing, etc.

It's hard to imagine a field that changes as rapidly as computing.
Caution....
- Computer science is an immature science (it lacks a standard taxonomy and terminology).
Caution....
- Even well-defined distinctions like shared memory and distributed memory are merging due to new advances in technology.
- Good environments for development and debugging are yet to emerge.
Caution....
- There are no strict delimiters for contributors to the area of parallel processing: computer architecture, operating systems, high-level languages, databases and computer networks all have a role to play.
- This makes it a hot topic of research.
Operating Systems for High Performance Computing

Types of Parallel Systems
- Shared Memory Parallel: the smallest extension to existing systems; program conversion is incremental.
- Distributed Memory Parallel: completely new systems; programs must be reconstructed.
- Clusters: a form of distributed memory parallel system with slower communication.
Operating Systems for PP
- MPP systems with thousands of processors require an OS radically different from current ones.
- Every CPU needs an OS:
  - to manage its resources
  - to hide its details
- Traditional systems are heavy and complex, and not suitable for MPP.
Operating System Models
- A framework that unifies the features, services and tasks performed.
- Three approaches to building an OS:
  - Monolithic OS
  - Layered OS
  - Microkernel-based/client-server OS (suitable for MPP systems)
- Simplicity, flexibility and high performance are crucial for the OS.
Monolithic Operating System

[Diagram: application programs run in user mode; a single block of system services runs in kernel mode directly above the hardware.]

- Better application performance
- Difficult to extend
Ex: MS-DOS
Layered OS

[Diagram: application programs (user mode) call into system services, which sit above memory & I/O device management, process scheduling and the hardware (kernel mode).]

- Easier to enhance
- Each layer of code accesses the lower-level interface
- Lower application performance
Ex: UNIX
Traditional OS

[Diagram: application programs run in user mode; the OS, supplied by the OS designer, occupies kernel mode above the hardware.]

New Trend in OS Design

[Diagram: application programs and OS servers run in user mode; only a small microkernel occupies kernel mode above the hardware.]
Microkernel/Client-Server OS (for MPP Systems)

[Diagram: a client application using a thread library runs in user mode alongside file, network and display servers; they communicate by send/reply messages through a kernel-mode microkernel that sits above the hardware.]

- A tiny OS kernel provides the basic primitives (processes, memory, IPC).
- Traditional services become user-level subsystems.
- Application performance competitive with a monolithic OS.
- OS = Microkernel + User Subsystems
Ex: Mach, PARAS, Chorus, etc.
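The send/reply pattern can be sketched with ordinary UNIX processes and pipes standing in for microkernel IPC (my illustration; a real microkernel such as Mach uses ports and messages instead): a client sends a request to a user-level "server" process and blocks for the reply.

/* Client-server send/reply sketch over pipes. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    int req[2], rep[2];          /* request and reply channels */
    pipe(req);
    pipe(rep);

    if (fork() == 0) {           /* user-level "server" process */
        char buf[64], answer[80];
        ssize_t n = read(req[0], buf, sizeof buf - 1);
        if (n < 0)
            _exit(1);
        buf[n] = '\0';
        snprintf(answer, sizeof answer, "served: %s", buf);
        write(rep[1], answer, strlen(answer) + 1);  /* reply */
        _exit(0);
    }

    /* client: send a request, block for the reply */
    write(req[1], "open file", 9);
    char reply[80];
    read(rep[0], reply, sizeof reply);
    printf("client got \"%s\"\n", reply);
    wait(NULL);
    return 0;
}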
Few Popular Microkernel Systems
- MACH, CMU
- PARAS, C-DAC
- Chorus
- QNX
- (Windows)