BG/P Draft Disclosure Deck - Institute of Network Coding
Overview of the Blue Gene supercomputers
Dr. Dong Chen
IBM T.J. Watson Research Center
Yorktown Heights, NY
Supercomputer trends
Blue Gene/L and Blue Gene/P architecture
Blue Gene applications
Terminology:
FLOPS = Floating Point Operations Per Second
Giga = 10^9, Tera = 10^12, Peta = 10^15, Exa = 10^18
Peak speed vs. sustained speed
Top 500 list (top500.org):
Based on the Linpack benchmark:
Solve the dense linear system A x = b
A is an N x N dense matrix; total FP operations ~ 2/3 N^3 + 2 N^2 (see the sketch after this list)
Green 500 list (green500.org):
Rates the Top 500 supercomputers in FLOPS/Watt
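For illustration, here is a minimal sketch (not part of the original deck) of how the Linpack operation count above translates into a runtime estimate; the matrix size N and the sustained rate are arbitrary example values:

    #include <stdio.h>

    int main(void) {
        double N = 1.0e6;            /* example matrix dimension       */
        double sustained = 1.0e15;   /* example sustained rate: 1 PF/s */

        /* Linpack FP operation count from the slide: ~ 2/3 N^3 + 2 N^2 */
        double flops = (2.0 / 3.0) * N * N * N + 2.0 * N * N;

        printf("total FP operations: %.3e\n", flops);
        printf("time at %.1e FLOPS sustained: %.0f s (%.2f hours)\n",
               sustained, flops / sustained, flops / sustained / 3600.0);
        return 0;
    }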
Supercomputer speeds over time
[Chart: peak speed (FLOPS, log scale from roughly 10^2 to 10^17) versus year, 1940-2020. The curve runs from ENIAC (vacuum tubes), Univac, IBM 701/704/7090, IBM Stretch, CDC 6600/7600, ILLIAC IV, CDC Star 100, Cray 1, Cyber 205, X-MP, SX-2, S810/20, Cray 2, Y-MP, the i860 MPPs (Delta, CM-5, Paragon, T3D, NWT, CP-PACS, SX-3/44), T3E, ASCI Red, ASCI Red Option, Blue Mountain, Blue Pacific, ASCI White, ASCI Q, ASCI Purple, the SX-4/5/6/8/8R/9 and SR8000 vector machines, Earth Simulator, Red Storm, Thunder, Columbia (NASA), BG/L, BG/P, Pleiades, TACC, Jaguar, and Roadrunner, up to the next-generation BG/Q.]
CMOS Scaling in Petaflop Era
– Three decades of exponential clock rate (and electrical power!) growth have ended
– Instruction Level Parallelism (ILP) growth has ended
– Single-threaded performance improvement is dead (Bill Dally)
– Yet Moore's Law continues in transistor count
– Industry response: multi-core, i.e. double the number of cores every 18 months instead of the clock frequency (and power!)
Source: “The Landscape of Computer Architecture,” John Shalf, NERSC/LBNL, presented at ISC07, Dresden, June 25, 2007
TOP500 Performance Trend
Over the long haul IBM has demonstrated continued leadership in various TOP500 metrics, even as performance continues its relentless growth.
IBM has most aggregate performance for last 22 lists
IBM has #1 system for 10 out of last 12 lists (13 in total)
IBM has most in Top10 for last 14 lists
IBM has most systems 14 out of last 22 lists
[Chart: Rmax performance (log scale) by TOP500 list, Jun 1993 through Jun 2010, with labeled points at 24.67 TF, 433.2 TF, 1.759 PF, and 32.43 PF; blue square markers indicate IBM leadership. Source: www.top500.org]
President Obama Honors IBM's Blue Gene
Supercomputer With National Medal Of Technology
And Innovation
Ninth time IBM has received nation's most prestigious tech award
Blue Gene has led to breakthroughs in science, energy efficiency and analytics
WASHINGTON, D.C. - 18 Sep 2009: President Obama recognized IBM (NYSE: IBM) and its Blue
Gene family of supercomputers with the National Medal of Technology and Innovation, the
country's most prestigious award given to leading innovators for technological achievement.
President Obama will personally bestow the award at a special White House ceremony on
October 7. IBM, which earned the National Medal of Technology and Innovation on eight other
occasions, is the only company recognized with the award this year.
Blue Gene's speed and expandability have enabled business and science to address a wide
range of complex problems and make more informed decisions -- not just in the life sciences, but
also in astronomy, climate, simulations, modeling and many other areas. Blue Gene systems
have helped map the human genome, investigated medical therapies, safeguarded nuclear
arsenals, simulated radioactive decay, replicated brain power, flown airplanes, pinpointed tumors,
predicted climate trends, and identified fossil fuels – all without the time and money that would
have been required to physically complete these tasks.
The system also reflects breakthroughs in energy efficiency. With the creation of Blue Gene, IBM
dramatically shrank the physical size and energy needs of a computing system whose processing
speed would have required a dedicated power plant capable of generating power to thousands of
homes.
The influence of the Blue Gene supercomputer's energy-efficient design and computing model
can be seen today across the Information Technology industry. Today, 18 of the top 20 most
energy efficient supercomputers in the world are built on IBM high performance computing
technology, according to the latest Supercomputing 'Green500 List' announced by Green500.org
in July, 2009.
Blue Gene Roadmap
• BG/L (5.7 TF/rack) – 130 nm ASIC (1999-2004 GA)
– 104 racks, 212,992 cores, 596 TF/s, 210 MF/W; dual-core system-on-chip
– 0.5/1 GB/node
• BG/P (13.9 TF/rack) – 90 nm ASIC (2004-2007 GA)
– 72 racks, 294,912 cores, 1 PF/s, 357 MF/W; quad-core SoC, DMA
– 2/4 GB/node
– SMP support, OpenMP, MPI
• BG/Q (209 TF/rack)
– 20 PF/s
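A rough cross-check, not stated on the slide: if the MF/W figures are applied to the quoted peak numbers, the 596 TF/s, 104-rack BG/L system at 210 MF/W works out to about 596e12 / 210e6 ≈ 2.8 MW, or roughly 27 kW per rack, and the 1 PF/s, 72-rack BG/P system at 357 MF/W likewise to about 2.8 MW, or roughly 39 kW per rack.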
Blue Gene Technology Roadmap
[Roadmap chart, performance versus year:]
– 2004: Blue Gene/L (PPC 440 @ 700 MHz), scalable to 595 TFlops
– 2007: Blue Gene/P (PPC 450 @ 850 MHz), scalable to 3.56 PF
– 2010: Blue Gene/Q (Power multi-core), scalable to 100 PF
Note: All statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and represent
goals and objectives only.
BlueGene/L System Buildup
Chip: 2 processors, 2.8/5.6 GF/s, 4 MB
Compute Card: 2 chips (1x2x1), 5.6/11.2 GF/s, 2.0 GB
Node Card: 16 compute cards, 0-2 I/O cards (32 chips, 4x4x2), 90/180 GF/s, 32 GB
Rack: 32 node cards, 2.8/5.6 TF/s, 1 TB
System: 64 racks (64x32x32), 180/360 TF/s, 64 TB
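As a sanity check, a small sketch (not from the deck) that multiplies up the packaging hierarchy above, using the 1 GB/node memory option:

    #include <stdio.h>

    int main(void) {
        double gf_per_chip = 5.6;   /* GF/s peak, dual-core BG/L chip           */
        double gb_per_chip = 1.0;   /* GB DDR per node (0.5 GB on some systems) */

        int chips_per_card      = 2;   /* compute card                 */
        int cards_per_node_card = 16;  /* node card: 16 compute cards  */
        int node_cards_per_rack = 32;
        int racks               = 64;  /* full system, 64x32x32 torus  */

        double card_gf      = gf_per_chip * chips_per_card;             /* 11.2  */
        double node_card_gf = card_gf * cards_per_node_card;            /* ~180  */
        double rack_tf      = node_card_gf * node_cards_per_rack / 1e3; /* ~5.7  */
        double system_tf    = rack_tf * racks;                          /* ~367  */
        double system_tb    = gb_per_chip * chips_per_card * cards_per_node_card
                              * node_cards_per_rack * racks / 1024.0;   /* 64 TB */

        printf("compute card: %.1f GF/s\n", card_gf);
        printf("node card:    %.1f GF/s\n", node_card_gf);
        printf("rack:         %.2f TF/s\n", rack_tf);
        printf("system:       %.0f TF/s, %.0f TB\n", system_tf, system_tb);
        return 0;
    }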
BlueGene/L Compute ASIC
[Block diagram: two PPC 440 CPU cores (one usable as an I/O processor), each with 32k/32k L1 caches and a "Double FPU", attached via a 4:1 PLB to small L2 buffers with snoop logic and to a multiported shared SRAM buffer; a shared L3 directory for the EDRAM (with ECC) fronts a 4 MB embedded-DRAM L3 cache; a DDR controller with ECC (128 + 16 ECC bits) drives 512/1024 MB of external DDR. Network and service interfaces on the chip: torus (6 out and 6 in links, each a 1.4 Gb/s link), collective (3 out and 3 in links, each a 2.8 Gb/s link), global interrupt (4 global barriers or interrupts), Gbit Ethernet, and JTAG access.]
Double Floating-Point Unit
[Diagram: quadword load and store datapaths feeding a primary and a secondary floating-point register file, P0-P31 and S0-S31.]
– Two replicas of a standard single-pipe PowerPC FPU
– 2 x 32 64-bit registers
– Attached to the PPC440 core using the APU interface
  – Issues instructions across the APU interface
  – Instruction decode performed in the Double FPU
– Separate APU interface from the LSU provides up to 16 B of data for loads and stores
  – Datapath width is 16 bytes
  – Feeds the two FPUs with 8 bytes each every cycle
– Two FP multiply-add operations per cycle
  – 2.8 GF/s peak
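For reference (not spelled out on the slide): at the 700 MHz BG/L clock, two fused multiply-adds per cycle are 4 floating-point operations per cycle, i.e. 4 x 700 MHz = 2.8 GF/s per core, and 5.6 GF/s for the two cores of a compute chip, matching the system-buildup figures above.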
Blue Gene/L Memory Characteristics
Memory:
– L1: 32 kB/32 kB per processor
– L2: 2 kB per processor
– SRAM: 16 kB per node
– L3: 4 MB (ECC) per node
– Main store: 512 MB (ECC) per node; 32 TB for the full system (64k nodes)
Bandwidth:
– L1 to registers: 11.2 GB/s, independent R/W and instruction
– L2 to L1: 5.3 GB/s, independent R/W and instruction
– L3 to L2: 11.2 GB/s
– Main (DDR): 5.3 GB/s
Latency:
– L1 miss, L2 hit: 13 processor cycles (pclks)
– L2 miss, L3 hit: 28 pclks (EDRAM page hit/EDRAM page miss)
– L2 miss (main store): 75 pclks for DDR closed-page access (L3 disabled/enabled)
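As a rough conversion (assuming one pclk is 1 / 700 MHz ≈ 1.43 ns, which the slide does not state explicitly): an L2 hit costs about 19 ns, an L3 hit about 40 ns, and a closed-page DDR access about 107 ns.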
Blue Gene Interconnection Networks
3 Dimensional Torus
– Interconnects all compute nodes (65,536)
– Virtual cut-through hardware routing
– 1.4 Gb/s on all 12 node links (2.1 GB/s per node)
– Communications backbone for computations
– 0.7/1.4 TB/s bisection bandwidth, 67 TB/s total bandwidth
Global Collective Network
– One-to-all broadcast functionality
– Reduction operations functionality
– 2.8 Gb/s of bandwidth per link; latency of tree traversal 2.5 µs
– ~23 TB/s total binary tree bandwidth (64k machine)
– Interconnects all compute and I/O nodes (1024)
Low Latency Global Barrier and Interrupt
– Round trip latency 1.3 µs
Control Network
– Boot, monitoring and diagnostics
Ethernet
– Incorporated into every node ASIC
– Active in the I/O nodes (1:64)
– All external comm. (file I/O, control, user interaction, etc.)
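A small sketch (not from the deck) of the torus arithmetic behind the per-node figure above, for the 64x32x32 BG/L machine; the hop-count and aggregate-bandwidth lines are illustrative estimates:

    #include <stdio.h>

    int main(void) {
        int dims[3] = {64, 32, 32};   /* torus dimensions in nodes    */
        double link_gbit = 1.4;       /* Gb/s per link, per direction */
        int links_per_node = 12;      /* 6 outgoing + 6 incoming      */

        /* Each dimension wraps around, so the farthest node is dim/2
           hops away in that dimension. */
        int max_hops = dims[0] / 2 + dims[1] / 2 + dims[2] / 2;    /* 64 hops  */

        long nodes = (long)dims[0] * dims[1] * dims[2];            /* 65,536   */
        double per_node_gbytes = links_per_node * link_gbit / 8.0; /* 2.1 GB/s */
        double total_tbytes = nodes * 6 * link_gbit / 8.0 / 1000.0;

        printf("nodes: %ld, farthest node: %d hops\n", nodes, max_hops);
        printf("per-node link bandwidth: %.1f GB/s\n", per_node_gbytes);
        printf("aggregate one-way link bandwidth: ~%.0f TB/s "
               "(same ballpark as the 67 TB/s quoted)\n", total_tbytes);
        return 0;
    }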
BlueGene/P System Buildup
Chip: 4 processors, 13.6 GF/s, 8 MB EDRAM
Compute Card: 1 chip, 20 DRAMs, 13.6 GF/s, 2.0 GB DDR2 (4.0 GB 6/30/08)
Node Card: 32 compute cards, 0-1 I/O cards (32 chips, 4x4x2), 435 GF/s, 64 (128) GB
Rack: 32 node cards, 13.9 TF/s, 2 (4) TB
System: 72 racks (72x32x32, cabled 8x8x16), 1 PF/s, 144 (288) TB
BlueGene/P compute ASIC
[Block diagram: four PPC 450 cores, each with 32k I1/32k D1 L1 caches, a Double FPU, a snoop filter, and a private L2, connected through multiplexing switches to a shared SRAM and, via two shared L3 directories for eDRAM (with ECC), to two 4 MB eDRAM L3 cache banks usable as on-chip memory, each with a 512-bit data + 72-bit ECC path; a DMA engine with arbiter and a hybrid PMU with 256x64b SRAM also sit on the switch; two DDR2 controllers with ECC drive the external DDR2 DRAM bus at 13.6 GB/s. Network and service interfaces: torus (6 bidirectional links at 3.4 Gb/s), collective (3 bidirectional links at 6.8 Gb/s), global barrier (4 global barriers or interrupts), 10 Gb Ethernet, and JTAG access.]
Blue Gene/P Memory Characteristics
Memory (per node):
– L1: 32 kB/32 kB per processor
– L2: 2 kB per processor
– L3: 8 MB (ECC)
– Main store: 2-4 GB (ECC)
Bandwidth:
– L1 to registers: 6.8 GB/s instruction read, 6.8 GB/s data read, 6.8 GB/s write
– L2 to L1: 5.3 GB/s, independent R/W and instruction
– L3 to L2: 13.6 GB/s
– Main (DDR): 13.6 GB/s
Latency:
– L1 hit: 3 processor cycles (pclks)
– L1 miss, L2 hit: 13 pclks
– L2 miss, L3 hit: 46 pclks (EDRAM page hit/EDRAM page miss)
– L2 miss (main store): 104 pclks for DDR closed-page access (L3 disabled/enabled)
BlueGene/P Interconnection Networks
3 Dimensional Torus
– Interconnects all compute nodes (73,728)
– Virtual cut-through hardware routing
– 3.4 Gb/s on all 12 node links (5.1 GB/s per node)
– 0.5 µs latency between nearest neighbors, 5 µs to the farthest
– MPI: 3 µs latency for one hop, 10 µs to the farthest
– Communications backbone for computations
– 1.7/3.9 TB/s bisection bandwidth, 188 TB/s total bandwidth
Collective Network
– One-to-all broadcast functionality
– Reduction operations functionality
– 6.8 Gb/s of bandwidth per link per direction
– Latency of one-way tree traversal 1.3 µs, MPI 5 µs
– ~62 TB/s total binary tree bandwidth (72k machine)
– Interconnects all compute and I/O nodes (1152)
Low Latency Global Barrier and Interrupt
– Latency of one way to reach all 72K nodes 0.65 µs, MPI 1.6 µs
November 2007 Green 500
[Bar chart: Linpack GFLOPS/W for BG/L, BG/P, SGI 8200, an HP cluster, Cray systems at Sandia, ORNL and NERSC, and a JS21 cluster at BSC. BG/P leads at roughly 0.35-0.37 GFLOPS/W, BG/L follows at about 0.21, and the remaining systems fall between roughly 0.02 and 0.15.]
Relative power, space and cooling efficiencies (published specs per peak performance)
[Grouped bar chart comparing IBM BG/P, Sun/Constellation, Cray/XT4 and SGI/ICE on racks/TF, kW/TF, sq ft/TF and tons (of cooling)/TF, on a relative scale from 0% to 400%.]
System Power Efficiency
[Bar chart: Linpack GF/Watt for BG/L (2005), BG/P (2007), SGI (NASA Ames), Roadrunner (2008), Cray XT5 (2009), Tianhe-1A (2010), Fujitsu K (2010), TiTech (2010), and a BG/Q prototype (2010); values range from roughly 0.23-0.37 GF/W for BG/L and BG/P up to 1.68 GF/W for the BG/Q prototype, with the other systems between about 0.25 and 0.96 GF/W. Source: www.top500.org]
HPCC 2009
IBM BG/P, 0.557 PF peak (40 racks):
– Class 1: Number 1 on G-RandomAccess (117 GUPS)
– Class 2: Number 1
Cray XT5, 2.331 PF peak:
– Class 1: Number 1 on G-HPL (1533 TF/s)
– Class 1: Number 1 on EP-Stream (398 TB/s)
– Class 1: Number 1 on G-FFT (11 TF/s)
Source: www.top500.org
Main Memory Capacity per Rack
[Bar chart, axis 0-4500, for LRZ IA64, Cray XT4, ASC Purple, Roadrunner, BG/P, Sun TACC, and SGI ICE.]
Peak Memory Bandwidth per node (byte/flop)
[Bar chart, axis 0-2, for SGI ICE, Sun TACC, Itanium 2, POWER5, Cray XT5 (4-core), Cray XT3 (2-core), Roadrunner, and BG/P (4-core).]
Main Memory Bandwidth per Rack
[Bar chart, axis 0-14000, for LRZ Itanium, Cray XT5, ASC Purple, Roadrunner, BG/P, Sun TACC, and SGI ICE.]
Interprocessor Peak Bandwidth per node (byte/flop)
[Bar chart, axis 0-0.8, for Roadrunner, a Dell Myrinet x86 cluster, Sun TACC, Itanium 2, POWER5, NEC Earth Simulator, Cray XT4 (2-core), Cray XT5 (4-core), and BG/L and BG/P.]
Failures per Month per TF
From: http://acts.nersc.gov/events/Workshop2006/slides/Simon.pdf
Execution Modes in BG/P per Node
[Diagrams: one node with four cores; hardware abstractions shown in black, software abstractions (processes P0-P3, threads T0-T3) in blue.]
– Quad Mode (Virtual Node Mode, VNM): 4 processes, 1 thread per process
– Dual Mode: 2 processes, 1-2 threads per process
– SMP Mode: 1 process, 1-4 threads per process

Next Generation HPC
– Many core
– Expensive memory
– Two-tiered programming model
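The two-tiered model (MPI across nodes, threads within a node) maps directly onto these modes. Below is a minimal hybrid MPI + OpenMP sketch, not taken from the deck: launched with 4 ranks per node and 1 thread it corresponds to VN mode, with 2 ranks and 2 threads to Dual mode, and with 1 rank and 4 threads to SMP mode (the mode itself is selected at job-launch time, not in the code):

    #include <stdio.h>
    #include <mpi.h>
    #include <omp.h>

    int main(int argc, char **argv) {
        int provided, rank, nranks;

        /* FUNNELED is enough here: only the master thread calls MPI. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nranks);

        #pragma omp parallel
        {
            /* One line per thread of each process on the node. */
            printf("rank %d of %d, thread %d of %d\n",
                   rank, nranks, omp_get_thread_num(), omp_get_num_threads());
        }

        MPI_Finalize();
        return 0;
    }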
Blue Gene Software Hierarchical Organization
– Compute nodes are dedicated to running the user application and almost nothing else: a simple compute node kernel (CNK)
– I/O nodes run Linux and provide a more complete range of OS services: files, sockets, process launch, signaling, debugging, and termination
– Service node performs system management services (e.g., partitioning, heartbeating, error monitoring), transparent to application software
– Front-end nodes and file system, attached over the 10 Gb Ethernet (functional) and 1 Gb Ethernet (control) networks
Noise measurements (from Adolfy Hoisie)
Blue Gene/P System Architecture
[Diagram: the service node (system console, DB2, MMCS, LoadLeveler) and the front-end nodes and file servers connect to the machine over a functional 10 Gb Ethernet and a 1 Gb control Ethernet (FPGA, JTAG, I2C); each I/O node runs Linux with ciod and a file-system client and serves a group of compute nodes (C-Node 0 ... C-Node n) over the collective (tree) network; compute nodes run applications on CNK and are interconnected by the torus network.]
BG/P Software Stack Source Availability
[Diagram: the software stack spans application, system, firmware and hardware layers across the compute nodes, I/O nodes, node/link/service cards, and the service node/front-end nodes. On the compute and I/O nodes: applications, ESSL, XL runtime, open toolchain runtime, MPI, MPI-IO, GPSHMEM, the message layer and messaging SPIs, CNK, CIOD, the Linux kernel, GPFS (1), totalviewd, node SPIs, common node services (hardware init, RAS, recovery, mailbox), bootloader, diags, and firmware. On the service node and front-end nodes: the high-level control system (MMCS: partitioning, job management and monitoring, RAS, administrator interfaces, CIODB), the low-level control system (power on/off, hardware probe, hardware init, parallel monitoring, parallel boot, mailbox), DB2, LoadLeveler, CSM, ISV schedulers and debuggers, mpirun, Bridge API, HPC Toolkit, PerfMon, and BG Nav. A key colors each component by source availability:]
– Closed. No source provided. Not buildable.
– Closed. Buildable source. No redistribution of derivative works allowed under license.
– New open source reference implementation licensed under CPL.
– New open source community under CPL license. Active IBM participation.
– Existing open source communities under various licenses. BG code will be contributed and/or new sub-community started.
Note 1: GPFS does have an open build license available which customers may utilize.
Areas Where BG is Used
Weather/Climate Modeling
(GOVERNMENT / INDUSTRY / UNIVERSITIES)
Computational Fluid Dynamics – Airplane and Jet Engine Design, Chemical Flows,
Turbulence (ENGINEERING / AEROSPACE)
Seismic Processing: (PETROLEUM, Nuclear industry)
Particle Physics: (LATTICE Gauge QCD)
Systems Biology – Classical and Quantum Molecular Dynamics
(PHARMA / MED INSURANCE / HOSPITALS / UNIV)
Modeling Complex Systems
(PHARMA / BUSINESS / GOVERNMENT / UNIVERSITIES)
Large Database Search
Nuclear Industry
Astronomy
(UNIVERSITIES)
Portfolio Analysis via Monte Carlo
(BANKING / FINANCE / INSURANCE)
LLNL Applications
IDC Technical Computing Systems Forecast
Bio Sci: Genomics, proteomics, pharmacogenomics, pharma research, bioinformatics, drug discovery
Chem Eng: Chemical engineering: molecular modeling, computational chemistry, process design
CAD: Mechanical CAD, 3D wireframe (mostly graphics)
CAE: Computer-aided engineering: finite element modeling, CFD, crash, solid modeling (cars, aircraft, ...)
DCC&D: Digital content creation and distribution
Econ Fin: Economic and financial modeling, econometric modeling, portfolio management, stock market modeling
EDA: Electronic design and analysis: schematic capture, logic synthesis, circuit simulation, system modeling
Geo Sci: Geo sciences and geo engineering: seismic analysis, oil services, reservoir modeling
Govt Lab: Government labs and research centers: government-funded R&D
Defense: Surveillance, signal processing, encryption, command, control, communications, intelligence, geospatial image management, weapon design
Software Engineering: Development and testing of technical applications
Technical Management: Product data management, maintenance records management, revision control, configuration management
Academic: University-based R&D
Weather: Atmospheric modeling, meteorology, weather forecasting
What is driving the need for more HPC cycles?
Genome Sequencing
Materials Science
Biological
Modeling
Pandemic Research
Fluid Dynamics
Drug Discovery
Financial Modeling
Climate Modeling
Geophysical Data Processing
HPC Use Cases
Capability
– Calculations not possible on small machines
– Usually these calculations involve systems where many disparate scales are modeled
– One scale defines the required work per "computation step"; a different scale determines the total time to solution
– Useful as proofs of concept
Examples:
– Protein folding: 10^-15 s to 1 s
– Refined grids in weather forecasting: 10 km today -> 1 km in a few years
– Full simulation of the human brain
Complexity
– Calculations which seek to combine multiple components to produce an integrated model of a complex system
– Individual components can have significant computational requirements
– Coupling between components requires that all components be modeled simultaneously
– As components are modeled, changes at the interfaces are constantly transferred between the components
– Critical to manage multiple scales in physical systems
Examples:
– Water cycle modeling in climate/environment
– Geophysical modeling for oil recovery
– Virtual fab
– Multisystem / coupled systems modeling
Understanding
– Repetition of a basic calculation many times with different model parameters, inputs and boundary conditions
– Goal is to develop a clear understanding of the behavior, dependencies and sensitivities of the solution over a range of parameters
– Essential to develop parameter understanding and sensitivity analysis
Examples:
– Multiple independent simulations of hurricane paths to develop probability estimates of possible paths and possible strengths
– Thermodynamics of protein/drug interactions
– Sensitivity analysis in oil reservoir modeling
– Optimization of aircraft wing design
Capability
Complexity: Modern Integrated Water Management
Sensors
– Physical, chemical, biological, environmental
– In-situ, remotely sensed
– Planning and placement
Physical Models
– Climate, hydrological, meteorological, ecological
Model Strategy
– Selection
– Integration & coupling
– Validation
– Temporal/spatial scales
Analyses
– Stochastic models & statistics
– Machine learning
– Optimization
Enabling IT
– HPC
– Visualization
– Data management
Partner Ecosystem
– Climatologists
– Environmental observation systems companies
– Sensor companies
– Environmental sciences consultants
– Engineering services companies
– Subject matter experts
– Universities
Adv Water Mgmt Reference IT Architecture
Time horizons: historical – present – near future – seasonal – long term – far future
Overall Efficiencies of BG Applications - Major Scientific Advances
Applications (with domain and site where given): Qbox (DFT, LLNL); CPMD (IBM); MGDC; ddcMD (classical MD, LLNL); new ddcMD (LLNL); MDCASK (LLNL); SPaSM (LANL); LAMMPS (SNL); RXFF; GMD; Rosetta (UW); AMBER; quantum chromodynamics with CPS, MILC and Chroma; sPPM (CFD, LLNL); Miranda and Raptor (LLNL); DNS3D; NEK5 (thermal hydraulics, ANL); HYPO4D and PLB (lattice Boltzmann); ParaDis (dislocation dynamics, LLNL); WRF (weather, NCAR); POP (oceanography); HOMME (climate, NCAR); GTC (plasma physics, PPPL); Nimrod (GA); FLASH (type Ia supernova); Cactus (general relativity); DOCK5 and DOCK6; Argonne v18 nuclear potential; "Cat" brain simulation.
[The remaining table columns did not survive transcription intact: sustained efficiencies as a fraction of peak (listed values include 56.5%, 30%, 27.6%, 22%, 18%, 17.4%, 17%, 16%, 12%, 10% and 7%), notes such as "highest scaling", the 2005, 2006 and 2007 Gordon Bell Awards, the 2006 and 2009 Gordon Bell Special Awards and the 2010 Bonner Prize, and the machine configurations used (from 4 to 104 BG/L racks and up to 40 BG/P racks).]
High Performance Computing Trends
Three distinct phases:
– Past: exponential growth in processor performance, mostly through CMOS technology advances
– Near term: exponential (or faster) growth in the level of parallelism
– Long term: power cost = system cost; invention required
The curve is indicative not only of peak performance but also of performance/$.
1 PF: 2008; 10 PF: 2011
Supercomputer Peak Performance
[Chart: peak speed (FLOPS, log scale) versus year introduced, 1940-2020, with a doubling time of about 1.5 years and the Past / Near Term / Long Term phases marked. The curve runs from ENIAC (vacuum tubes), UNIVAC, IBM 701/704, IBM 7090 (transistors), IBM Stretch, CDC 6600 (ICs), CDC 7600, ILLIAC IV, CDC STAR-100 (vectors), CRAY-1, Cyber 205, X-MP2 (parallel vectors), X-MP4, Y-MP8, CRAY-2, SX-2, S-810/20, VP2600/10, SX-3/44, the i860 MPPs (Delta, Paragon, CM-5, T3D, NWT, CP-PACS), T3E, ASCI Red, ASCI Red Option, SX-4, SX-5, Blue Pacific, ASCI White, Earth Simulator, Red Storm, ASCI Purple, Blue Gene/L and Blue Gene/P, up to Blue Gene/Q.]