Future of High Performance Computing
Thom Dunning
National Center for Supercomputing Applications
University of Illinois at Urbana-Champaign
Outline of Presentation
• Directions in Computing Technology
• From uni-core to multi-core chips
• On to many-core chips
• From Terascale to Petascale Computing
• Science @ Petascale
• Blue Waters Petascale Computing System
• Path to Exascale Computing
• Issues for beyond petascale computing
• Take Home Lessons
Petascale Summer School • 6-9 July 2010 • Urbana, Illinois
Directions in Computing Technology
A major shift is underway in computing technology with multi-core and many-core chips
Directions in Computing Technology
Increasing Performance of Microprocessors
[Chart: clock frequency (MHz) of Intel Pentium-class processors over time]
"In the past, performance scaling in conventional single-core processors has been accomplished largely through increases in clock frequency (accounting for roughly 80 percent of the performance gains to date)."
Platform 2015, S. Y. Borkar et al., 2006, Intel Corporation
Directions in Computing Technology
Problem with Uni-core Microprocessors
[Chart: power density (W/cm²) versus decreasing feature size (1.5 µm down to 0.07 µm) for Intel processors from the i386 through the Pentium 4 (Willamette, Prescott); as chip frequency increases, power density climbs past that of a hot plate toward nuclear-reactor and rocket-nozzle levels]
Directions in Computing Technology
From Uni-core to Multi-core Processors
Intel’s Nehalem
• Modular
• Up to 8 cores
• 3 levels of cache
• Integrated memory controller
• Multiple QuickPath Interconnects
Directions in Computing Technology
Switch to Multicore Chips
[Chart: clock frequency (MHz) levels off as microprocessors switch to multicore designs]
“For the next several years the only way to obtain significant increases in microprocessor performance will be through increasing use of parallelism: dual core, quad core, 8× in 2009-10, 16× in 2011-12, and so on.”
Directions in Computing Technology
On to Many-core Chips
• AMD Llano (4 x86 cores + 480 stream processors)
• NVIDIA Fermi (512 cores)
• Intel Teraflops Chip (80 cores)
• Intel Many Integrated Cores (>80 x86+ cores)
Directions in Computing Technology
Recent Evolution of NVIDIA GPUs

GPU                  G80             GT200           Fermi
Transistors          681 million     1,400 million   3,000 million
CUDA Cores           128             240             512
DP Floating Point    None            30 FMA/cycle    256 FMA/cycle
SP Floating Point    128 MAD/cycle   240 MAD/cycle   512 FMA/cycle
Shared Memory        16 KB/SM        16 KB/SM        16 or 48 KB/SM
L1 Cache             None            None            16 or 48 KB/SM
L2 Cache             None            None            768 KB
ECC Memory           No              No              Yes
Address Width        32-bit          32-bit          64-bit

Peak DP performance = 256 FMA/cycle × 2 flops/FMA × 1.5 GHz = 768 GF
Peak SP performance = 512 FMA/cycle × 2 flops/FMA × 1.5 GHz = 1,536 GF
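
As a quick check on the arithmetic above, here is a minimal C sketch that reproduces the peak numbers using only the FMA rates from the table and the 1.5 GHz clock used on the slide:

#include <stdio.h>

/* Peak rate (GF) = FMA issue rate per cycle x 2 flops per FMA x clock (GHz) */
static double peak_gflops(int fma_per_cycle, double clock_ghz)
{
    return fma_per_cycle * 2.0 * clock_ghz;
}

int main(void)
{
    /* Fermi figures from the table above, 1.5 GHz clock as on the slide */
    printf("Fermi peak DP: %6.0f GF\n", peak_gflops(256, 1.5));   /* 768 GF   */
    printf("Fermi peak SP: %6.0f GF\n", peak_gflops(512, 1.5));   /* 1,536 GF */
    return 0;
}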
Directions in Computing Technology
Fermi Streaming Multiprocessor Architecture
• Streaming Multiprocessors (SMs): 16 SMs per chip
• Each SM has:
  • 32 CUDA cores
  • Floating point and integer units for each core
  • Fused multiply-add instruction (illustrated in the sketch after this list)
  • 16 load-store units
  • 4 special function units for transcendental functions (sin, cos, reciprocal, square root)
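
To make the fused multiply-add instruction concrete: an FMA computes a*b + c as one operation with a single rounding, and it is the building block of kernels such as dot products and matrix multiplies. The sketch below is ordinary host C using the standard library’s fma(), offered only as an illustration of the operation; it is not GPU code and is not taken from the talk.

#include <math.h>
#include <stdio.h>

/* Dot product written as a chain of fused multiply-adds (a*b + c in one step). */
static double dot(const double *a, const double *b, int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; ++i)
        sum = fma(a[i], b[i], sum);   /* one rounding per multiply-add */
    return sum;
}

int main(void)
{
    double a[4] = {1.0, 2.0, 3.0, 4.0};
    double b[4] = {5.0, 6.0, 7.0, 8.0};
    printf("dot = %.1f\n", dot(a, b, 4));   /* prints 70.0 */
    return 0;
}

On most Unix systems this links with -lm; an optimizing compiler can map fma() to a hardware FMA instruction where one exists.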
Directions in Computing Technology
AMD’s Fusion “Application Processing Unit”
• Heterogeneous Architecture
  • x86 cores
  • Streaming processors
• High Performance Interconnect
• High Performance Memory Controller
Blue Waters: From Terascale to Petascale Computing
A computing system for solving the most challenging compute-, memory- and data-intensive problems
Blue Waters
NSF Track 1 Solicitation
“The petascale HPC environment will enable investigations of computationally challenging problems that require computing systems capable of delivering sustained performance approaching 10¹⁵ floating point operations per second (petaflops) on real applications, that consume large amounts of memory, and/or that work with very large data sets.”
Leadership-Class System Acquisition - Creating a Petascale Computing Environment for Science and Engineering, NSF 06-573
Blue Waters
Computational Science and Engineering
Petascale computing will enable advances in a broad range of science and engineering disciplines:
• Molecular Science
• Astronomy
• Weather & Climate Forecasting
• Health
• Earth Science
Blue Waters
Desired Attributes of Petascale System
• Maximum Core Performance
…to minimize the number of cores needed for a given performance level and lessen the impact of code sections with limited scalability (see the Amdahl’s law sketch after this list)
• Low Latency, High Bandwidth Interconnect
…to enable science and engineering applications to scale to tens to
hundreds of thousands of cores
• Large, Fast Memories
…to solve the most memory-intensive problems
• Large, Fast I/O System and Data Archive
…to solve the most data-intensive problems
• Reliable Operation
…to enable the solution of Grand Challenge problems
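
The first attribute is essentially an Amdahl’s law argument: whatever fraction of a code does not scale limits the payoff from adding cores, so faster individual cores reduce how many are needed. A minimal C sketch of that relationship follows; the 90% parallel fraction is an illustrative assumption, not a Blue Waters figure.

#include <stdio.h>

/* Amdahl's law: speedup on n cores when a fraction p of the work parallelizes */
static double amdahl(double p, double n)
{
    return 1.0 / ((1.0 - p) + p / n);
}

int main(void)
{
    /* Illustrative only: a code that is 90% parallel */
    double p = 0.90;
    for (double n = 1; n <= 1024; n *= 4)
        printf("%6.0f cores -> speedup %5.2f\n", n, amdahl(p, n));
    /* Speedup saturates near 1/(1-p) = 10, which is why per-core
       performance and highly scalable algorithms both matter. */
    return 0;
}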
Blue Waters
Building Blue Waters
Blue Waters will be the most powerful computer in the world for scientific research when it comes on line in the summer of 2011.
Blue Waters
Blue Waters Building Block
• Power7 Chip: 8 cores, 32 threads; L1, L2, L3 cache (32 MB); up to 256 GF (peak); 128 GB/s memory bw
• Quad-chip Module (QCM): 4 Power7 chips; 1 TF (peak); 128 GB memory; 512 GB/s memory bw
• IH Server Node: 8 QCMs (256 cores); 8 TF (peak); 1 TB memory; 4 TB/s memory bw; 8 Hub chips; power supplies; PCIe slots
• Hub Chip: 1,128 GB/s bw; 45 nm technology
• Blue Waters Building Block: 32 IH server nodes; 256 TF (peak); 32 TB memory; 128 TB/s memory bw; 4 storage systems (>500 TB); 10 tape drive connections
• Blue Waters System: ~10 PF peak; ~1 PF sustained; >300,000 cores; ~1.2 PB of memory; >18 PB of disk storage; 500 PB of archival storage; ≥100 Gbps connectivity; fully water cooled
Blue Waters is built from components that can be used to build systems with a wide range of capabilities—from servers to beyond Blue Waters.
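
The peak figures on this slide compose multiplicatively up the hardware hierarchy. A small C sketch, using only numbers stated above, works through the aggregation and the resulting memory-bandwidth-to-flops balance at the node level:

#include <stdio.h>

int main(void)
{
    /* Peak figures taken from the slide */
    double chip_gf  = 256.0;               /* Power7 chip: up to 256 GF (peak) */
    double qcm_tf   = 4 * chip_gf / 1000;  /* 4 chips per QCM: ~1 TF (1.024)   */
    double node_tf  = 8 * 1.0;             /* 8 QCMs at ~1 TF each: 8 TF       */
    double block_tf = 32 * node_tf;        /* 32 IH nodes: 256 TF per block    */
    double node_bw  = 4.0;                 /* node memory bandwidth, TB/s      */

    printf("Quad-chip module: ~%.2f TF\n", qcm_tf);
    printf("IH server node:    %.0f TF\n", node_tf);
    printf("Building block:    %.0f TF\n", block_tf);
    printf("Node balance:      %.1f bytes/flop\n", node_bw / node_tf);
    return 0;
}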
Blue Waters
Comparison: Jaguar and Blue Waters

System Attribute                       ORNL Jaguar (#1)    NCSA Blue Waters
Vendor (Model)                         Cray (XT5)          IBM (PERCS)
Processor                              AMD Opteron         IBM Power7
Peak Performance (PF)                  2.3                 ~10
Sustained Performance (PF)             ?                   ≳1
Number of Cores/Chip                   6                   8
Number of Processor Cores              224,256             >300,000
Amount of Memory (TB)                  299                 ~1,200
Amount of Memory per Core (GB)         1⅓                  ~4
Amount of On-line Disk Storage (PB)    5                   >18
Sustained Disk Transfer (TB/sec)       0.24                >1.5
Amount of Archival Storage (PB)        20                  up to 500
Blue Waters Project
Critical Features of Blue Waters. I
• High Performance Compute Module
  • SMP system: four Power7 chips + hub chip
  • Performance: 1 TF
  • Memory: 128 GB
• High Performance Interconnect
  • High bandwidth, low latency
  • Hub chip: > 1 TB/sec per QCM
  • Latency: ~1 µsec
  • Fully connected, two-tier network
  • Copper + optical links
Blue Waters Project
Critical Features of Blue Waters. II
• High Performance I/O and Data Archive Systems
  • Large storage subsystems
    • On-line disks: > 18 PB (usable)
    • Archival tapes: up to 500 PB
  • High sustained disk transfer rate: > 1.5 TB/sec
  • Fully integrated storage system: GPFS + HPSS
• General
  • Hardware support for global shared memory
Petascale Summer School
•
6-9 July 2010
•
Urbana, Illinois
Blue Waters
National Petascale Computing Facility
• Partners: EYP MCF/Gensler, IBM, Yahoo!
• Modern Data Center
  • 90,000+ ft² total
  • 30,000 ft² raised floor
  • 20,000 ft² machine gallery
• Energy Efficiency
  • LEED certified Gold (goal: Platinum)
  • PUE = 1.1–1.2
Path to Exascale Computing
Although an exascale computer is at least 10 years away, the issues being confronted will impact all systems beyond Blue Waters
Blue Waters
A Glimpse into the Future: Sequoia

System Attribute                       NCSA Blue Waters    LLNL Sequoia
Vendor (Model)                         IBM (PERCS)         IBM BG/Q
Processor                              IBM Power7          IBM PowerPC
Peak Performance (PF)                  ~10                 ~20
Sustained Performance (PF)             ≳1                  ?
Number of Cores/Chip                   8                   16
Number of Processor Cores              >300,000            ~1,600,000
Amount of Memory (TB)                  ~1,200              ~1,600
Amount of On-line Disk Storage (PB)    >18                 ~50
Sustained Disk Transfer (TB/sec)       >1.5                0.5–1.0
Path from Petascale to Exascale
• Levels of concurrency
• Cores: 100s of thousands ➙100s of millions
• Threads: million ➙ billion
• Clock Rate of Core
• No significant increase
• Memory per Core
• 1-4 GB ➙ 10s–100s of MB (see the sketch after this list)
• Aggressive Fault Management in HW and SW
• Power Consumption
• 10 MW ➙ 40 MW – 150 MW
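
The memory-per-core projection is just total memory divided by core count. The short C sketch below uses the Blue Waters figures quoted earlier in the talk (~1.2 PB, >300,000 cores); the 10 PB, 100-million-core exascale machine is a purely hypothetical, illustrative design point, not one from the talk:

#include <stdio.h>

int main(void)
{
    const double GB_PER_PB = 1.0e6;   /* 10^6 GB per PB */

    /* Blue Waters figures from earlier in the talk */
    double bw_mem_pb = 1.2;           /* ~1.2 PB of memory */
    double bw_cores  = 3.0e5;         /* >300,000 cores    */

    /* Hypothetical exascale machine: illustrative numbers only */
    double ex_mem_pb = 10.0;
    double ex_cores  = 1.0e8;         /* 100s of millions of cores */

    printf("Blue Waters: %.1f GB/core\n",
           bw_mem_pb * GB_PER_PB / bw_cores);                 /* 4.0 GB/core */
    printf("Exascale:    %.0f MB/core\n",
           ex_mem_pb * GB_PER_PB * 1000.0 / ex_cores);        /* 100 MB/core */
    return 0;
}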
Take Home Lessons
• Examine New Computing Technologies
  • Computers of the future will be based on many-core chips
  • Details TBD, but they may be heterogeneous
• Focus on Scalable Algorithms
  • The only significant speed gains in the future will come through increased parallelization
• Explore New Programming Models
  • Computing systems will be (are!) collections of SMPs
  • Need to assess and improve MPI/OpenMP, UPC, CAF (a minimal hybrid MPI + OpenMP sketch follows this list)
• Enhance Reliability
  • Systems level (e.g., virtualization)
  • Applications level
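
Since the systems described here are collections of SMP nodes, the natural baseline programming model is hybrid: MPI between nodes and OpenMP threads within a node. The following minimal C sketch shows the pattern; it is a generic illustration, not code from the talk, and assumes an MPI library and an OpenMP-capable compiler (e.g., built with "mpicc -fopenmp hybrid.c").

#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank, nranks;

    /* Ask for an MPI library that tolerates threaded ranks */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    double local = 0.0;

    /* Each rank sums its share of the work using all of its cores */
    #pragma omp parallel for reduction(+:local)
    for (int i = rank; i < 1000000; i += nranks)
        local += 1.0 / (1.0 + (double)i);

    /* Combine the per-rank partial sums across the machine */
    double global = 0.0;
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum = %.6f (%d ranks, %d threads/rank)\n",
               global, nranks, omp_get_max_threads());

    MPI_Finalize();
    return 0;
}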
Questions?