Introduction - Computer Science and Engineering
Download
Report
Transcript Introduction - Computer Science and Engineering
Introduction
• What is Parallel Algorithms?
• Why Parallel Algorithms?
• Evolution and Convergence of Parallel
Algorithms
• Fundamental Design Issues
What is Parallel Algorithm?
• An algorithm which uses multiple components of hardware
and software.
• Level of parallelism
–
–
–
–
–
–
Circuits -- VLSI Arithmetic
Functional Units -- Instructional level
Processors
Processes
Tasks (Job)
Data
• Closely related to parallel architectures and computation
model
What is Parallel Architecture?
• A parallel computer is a collection of processing
elements that cooperate to solve problems fast
• Issues:
– Resource
• how large a collection?
• how powerful are the processing elements?
– Communication and Synchronization
• how do the elements cooperate and communicate?
• how are data transmitted between processors?
• what are the abstractions and primitives for cooperation?
– Performance and Scalability
• how does it all translate into performance?
• how does it scale?
Inevitability of Parallel Computing
• Application demands: Demands for computing cycles
– Scientific computing: CFD, Biology, Chemistry, Physics, ...
– General-purpose computing: Video, Graphics, CAD, Databases, TP...
• Technology Trends
– Number of transistors on chip growing rapidly
– Clock rates expected to go up only slowly
• Architecture Trends
– Instruction-level parallelism valuable but limited
– Coarser-level parallelism, as in MPs, the most viable approach
• Economics
• Current trends:
– Today’s microprocessors have multiprocessor support
– Servers and workstations becoming MP: Sun, SGI, DEC, COMPAQ!...
– Tomorrow’s microprocessors are multiprocessors
Driving Force of Parallel Architectures
• Application Trends
Applications provide wish lists
(Grand Challenge Problems)
Weather Simulation, Data Mining, N-body
Some problems require terra (10^12) to Peta flops (10^15)
• VLSI Technology Trends
Smaller minimum feature size (transistor size)
Increasing Die Size => Transistor count double every 3 year
• Architectural Trends
Trends in Computer Performance:
Annual rate of 1.35 before 1985, 1.85 after 1986 (RISC)
• Programming Model Convergence
• Economics
Application Trends
• Transition to parallel computing has occurred for
scientific and engineering computing
• In rapid progress in commercial computing
– Database and transactions as well as financial
– Usually smaller-scale, but large-scale systems also used
• Desktop also uses multithreaded programs, which
are a lot like parallel programs
• Demand for improving throughput on sequential
workloads
– Greatest use of small-scale multiprocessors
• Solid application demand exists and will increase
General Technology Trends
• Microprocessor performance increases 50% - 100% per year
• Transistor count doubles every 3 years
• DRAM size quadruples every 3 years
• Huge investment per generation is carried by huge commodity market
180
160
140
DEC
alpha
120
100
80
60
40
20
MIPS
Sun 4 M/120
260
MIPS
M2000
IBM
RS6000
540
Integer
HP 9000
750
0
1987
1988
1989
1990
1991
1992
• Not that single-processor performance is plateauing, but that
parallelism is a natural way to improve it.
FP
Technology: A Closer Look
• Basic advance is decreasing feature size ( )
– Circuits become either faster or lower in power
• Die size is growing too
– Clock rate improves roughly proportional to improvement in
– Number of transistors improves like (or faster)
• Performance > 100x per decade; clock rate 10x, rest transistor count
• How to use more transistors?
– Parallelism in processing
• multiple operations per cycle reduces CPI
– Locality in data access
Proc
$
• avoids latency and reduces CPI
• also improves processor utilization
– Both need resources, so tradeoff
• Fundamental issue is resource distribution, as in uniprocessors
Interconnect
Economics
• Commodity microprocessors not only fast but CHEAP
• Development cost is tens of millions of dollars (5-100 typical)
• BUT, many more are sold compared to supercomputers
– Crucial to take advantage of the investment, and use the commodity
building block
– Exotic parallel architectures no more than special-purpose
• Multiprocessors being pushed by software vendors (e.g.
database) as well as hardware vendors
• Standardization by Intel makes small, bus-based SMPs
commodity
• Desktop: few smaller processors versus one larger one?
– Multiprocessor on a chip
Challenge of Parallel Processing:
Micro-processors, Memory are cheap.
Connect bunches of them together.
Use either aynchonously or syncronously.
Example: each borad: 100 of 10MFLOP micro-processor
each system: 100 Board
Speed: 100GFLOP
Reason:
Cache Coherancy
Extra Design Time for Bus, interface
Compiler may not find parallelism easily
Communication and I/O complexity
Economics
Amdahl's Law
Writing Parallel Programs not easy
Amdahl's Law
Speedup S is bounded by
1
S < ------------f + (1-f)/p
p: number of PEs, f: fraction of serial part.
Example: if 10% of a program must be sequentially, maximum speed-up is 10
To achieve S=20 for p=100, 99.75% of codes should be parallel
How to overcome the law?
Increase problem size: Scaled speed-up.
Develop parallel algorithm maximizing the parallelism and minimizing the
sequentail operations.
How to Solve Communication Overhead
Increase Communication bandwith
Decrease Communication Latency
Communication Latency Hiding
Increase Granularity (Figure 8.4 of Henessey's book)