Some Thoughts on Technology and Strategies for Petaflops
Possible paths to Petaflops
• Traditional Commodity Clusters
• Leverage Moore’s law on GP microprocessors
• Interconnect and memory bandwidth problems
• Type C machines
• DARPA HPCS paths (e.g. Cascade etc.)
• Embedded systems based Clusters
• QCDOC one example
• BG/L another example
Rick Stevens
Argonne Chicago
Beyond Commodity Clusters
• Improved design capability
• Small groups can design SoCs
• Small groups can gain access to state of the art
fabrication capabilities
• Design cycles are getting shorter thanks to
increasing availability of off-the-shelf IP
• Blue Logic, MIPS, etc.
• QCDOC example
Hardware/Software Co-design
• Application kernels
• Simple FORTRAN-like C code: well-behaved
basic blocks with performance
requirement annotations
• Compiler builds performance model for each basic
block
• Decision point based on performance estimate
• Compile for GPU or synthesize logic/FPGA code
• Generate glue code/runtime
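The decision flow on this slide can be sketched roughly as follows. This is an illustrative sketch only, not the author's tool: the `BasicBlock` fields, the two sustained-rate constants, and the `choose_target` and `plan` helpers are all hypothetical, and the performance model is reduced to a single FLOP count divided by an assumed rate.

```python
from dataclasses import dataclass


@dataclass
class BasicBlock:
    name: str
    flops: float             # work per invocation (hypothetical model input)
    required_seconds: float  # performance requirement annotation


# Assumed sustained rates in FLOP/s; placeholders, not measurements.
GPU_RATE = 1.0e9
FPGA_RATE = 8.0e9


def estimate_seconds(block: BasicBlock, rate: float) -> float:
    """Trivial performance model: work divided by sustained rate."""
    return block.flops / rate


def choose_target(block: BasicBlock) -> str:
    """Decision point: keep the block on the GPU path if the estimate meets
    the annotation, otherwise try synthesized logic, otherwise report it."""
    if estimate_seconds(block, GPU_RATE) <= block.required_seconds:
        return "gpu"
    if estimate_seconds(block, FPGA_RATE) <= block.required_seconds:
        return "fpga"
    return "unmet"


def plan(blocks: list[BasicBlock]) -> dict[str, str]:
    """Map each kernel to a target; glue code/runtime would be generated from this."""
    return {b.name: choose_target(b) for b in blocks}
```

A real flow would replace the one-line model with per-block estimates from the compiler, but the shape of the decision (estimate, compare against the annotation, pick a backend) stays the same.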
Special purpose SoCs
• Network Processing Units (NPUs)
• Core of fast IP switches and routers
• Many companies producing 10 Gbps components and
moving towards 40 Gbps parts
• DSPs
• Cell phone base stations: signal processing and
array-on-a-chip processors
• Example is a 2 GHz, 175 million transistor, 64-processor
DSP array at several hundred dollars a chip in quantities of
1,000
Graphics Accelerators
• NVIDIA GeForce4 example
• > 100 M transistors
• High-speed (QDR) RAM interface > 10 GB/s
• Moving towards general-purpose processors
• Cg programming language (programmable shaders)
• Evolving to become faster than the main CPU on
a commodity based node
• Pentium or Itanium2 processor becomes a service
processor?
Extendable Cores
• Possible target for HPC Hardware/Software Co-design
• Provides a reconfigurable node platform
• Xilinx Virtex-II Pro
• Multiple PowerPC cores (1-4)
• Millions of gates of FPGA
• Clock rates lag high-performance chips
• Other vendors producing similar things
• MIPS cores, SPARClite cores, etc.
Billion Transistor Dies by 2005/6
• Design challenges and opportunities
• Many 32-bit cores available at < 500,000 transistors
• Several 64-bit cores available at < 2,000,000
transistors
• Complete SoC libraries becoming available (e.g.
Blue Logic)
• Unprecedented opportunity for semi-custom
node architectures based on SoC technologies
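To make the opportunity concrete, the figures on this slide give a rough upper bound on cores per die. The sketch below is back-of-the-envelope arithmetic only; the `logic_fraction` parameter is a hypothetical knob for the share of the die left for cores after memory, interconnect, and I/O, and is not a number from the talk.

```python
DIE_TRANSISTORS = 1_000_000_000   # billion-transistor die
CORE_32_BIT = 500_000             # upper bound per 32-bit core (from the slide)
CORE_64_BIT = 2_000_000           # upper bound per 64-bit core (from the slide)


def max_cores(die: int, core: int, logic_fraction: float = 1.0) -> int:
    """Upper bound on core count if `logic_fraction` of the die goes to cores.

    logic_fraction is an assumed parameter; 1.0 ignores caches, memory
    controllers, interconnect, and I/O entirely.
    """
    return int(die * logic_fraction) // core


print(max_cores(DIE_TRANSISTORS, CORE_32_BIT))        # 2000
print(max_cores(DIE_TRANSISTORS, CORE_64_BIT))        # 500
print(max_cores(DIE_TRANSISTORS, CORE_32_BIT, 0.25))  # 500
```

Even with only a quarter of the die devoted to cores, the budget allows hundreds of 32-bit cores, which is the headroom that makes semi-custom node designs plausible.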
Design Tools are Improving
• We can start to think in terms similar to desktop
publishing from 20 years ago
• Mass customization will become possible but:
• What design macros are needed?
• How to involve algorithms and applications developers in
the design process?
• How to connect with systems software (OS, runtime,
libraries)?
Evolution of Commodity Clusters
[Figure: cluster architecture diagram. Recoverable labels:]
• I/O
• Commodity Network
• GPU/Node … : O(1000) nodes, GP services
• High-Performance Interconnect
• SoCs … : O(100K) nodes, semi-custom or reconfigurable
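As a rough sizing check for the O(100K)-node tier, the per-node rate needed to reach one petaflops follows from simple division. The node counts below take the order-of-magnitude labels at face value (an assumption), and `per_node_flops` is a hypothetical helper; it ignores efficiency losses and any contribution from the service nodes.

```python
PETAFLOPS = 1.0e15  # 1 petaflops in FLOP/s


def per_node_flops(target_flops: float, nodes: int) -> float:
    """Sustained FLOP/s each node must deliver to reach the aggregate target."""
    return target_flops / nodes


# O(100K) semi-custom or reconfigurable nodes: 10 GFLOP/s each
print(per_node_flops(PETAFLOPS, 100_000) / 1e9)   # 10.0
# O(1000) nodes alone would need 1 TFLOP/s each
print(per_node_flops(PETAFLOPS, 1_000) / 1e12)    # 1.0
```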
Systems Software for SoCs
• Embedded Processor Systems Software
• DSP: real-time OS/runtime, ~40K in on-chip
FLASH ROM (shadow RAM), off-chip
extensions in the future
• NPUs: real-time runtime support, < 100K
typically, some general-purpose co-processors
(Linux typically used in Juniper)
• Graphics processors: on-chip runtime support,
upgradeable via device drivers
A Few Recommendations
• Comprehensive applications studies
• To determine feasibility of acceleration via semi-custom
SoC/CLoCs
• To understand what OS functions are actually required
for full HPC applications
• Establish some design challenges
• Pick several core algorithms (besides lattice gauge) and do
some paper designs to validate the possible advantages of
SoC based approaches
• An augmented cluster testbed
• GP Linux cluster with SoC/CLoC based compute
backends