talk02 David Scott

HPC@Intel
Platforms and Technology
CCGSC
September 10, 2006
Dr. David Scott
Petascale Product Line Architect
Legal Disclaimer
• Information in this document is provided in connection with Intel® products.
• No license, express or implied, by estoppel or otherwise, to any intellectual property rights is granted by
this document. Except as provided in Intel's Terms and Conditions of Sale for such products, Intel assumes
no liability whatsoever, and Intel disclaims any express or implied warranty, relating to sale and/or use of
Intel products including liability or warranties relating to fitness for a particular purpose, merchantability, or
infringement of any patent, copyright or other intellectual property right. Intel products are not intended for
use in medical, life saving, or life sustaining applications.
• Intel may make changes to specifications and product descriptions at any time, without notice.
• Designers must not rely on the absence or characteristics of any features or instructions marked
"reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility
whatsoever for conflicts or incompatibilities arising from future changes to them.
• This document contains information on products in the design phase of development. The information here
is subject to change without notice. Do not finalize a design with this information.
• Intel Xeon™, Pentium® 4, Itanium®, Itanium 2, Prescott, Prestonia, Nocona, Jayhawk, Potomac, Tulsa, and
Dempsey processors may contain design defects or errors known as errata which may cause the product
to deviate from published specifications. Current characterized errata are available on request.
• Contact your local Intel sales office or your distributor to obtain the latest specifications before placing
your product order.
• Copies of documents which have an order number and are referenced in this document, or other Intel
literature, may be obtained by calling 1-800-548-4725, or by visiting Intel's website at
<http://www.intel.com>.
Intel, Itanium, Xeon and Pentium are trademarks or registered trademarks of
Intel Corporation or its subsidiaries in the United States and other countries.
AGENDA
• New Processors
• New HPC focused platforms
• Technologies for the future
Core-Duo™ Processors
Let’s Take A Look Inside
Historical Driving Forces
[Figure panel: "Increased Performance via Increased Frequency". Frequency (MHz), log scale, against year, 1970 to 2020.]
[Figure panel: "Shrinking Geometry". Feature Size (um), log scale, against year, 1970 to 2020.]
Annotations:
• 1946: 20 Numbers in Main Memory
• 1971: I4004 Processor, 2300 Transistors
• 2005: 65nm, 1B+ Transistors
The Challenges
[Figure panel: "Power Limitations". CPU Power (W), log scale, against year.]
[Figure panel: "Diminishing Voltage Scaling". Supply Voltage (V), log scale, against process generation: 0.7um, 0.5um, 0.35um, 0.25um, 0.18um, 0.13um, 90nm, 65nm, 45nm, 30nm. Annotation: ~30%.]
Power = Capacitance x Voltage^2 x Frequency
also, since frequency scales roughly with voltage,
Power ~ Voltage^3
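The power relation on this slide can be tried out numerically. The following is a minimal sketch, not anything from the talk: the function names are hypothetical, and the cubic case assumes that frequency is scaled by the same factor as voltage.

```python
def dynamic_power(capacitance, voltage, frequency):
    """Switching power from the slide's relation: P = C * V^2 * f."""
    return capacitance * voltage ** 2 * frequency


def relative_power(voltage_scale, frequency_scale=None):
    """Power relative to a baseline after scaling voltage and frequency.

    If frequency_scale is omitted, frequency is ASSUMED to track voltage,
    which gives the slide's "Power ~ Voltage^3" behaviour.
    """
    if frequency_scale is None:
        frequency_scale = voltage_scale
    return voltage_scale ** 2 * frequency_scale


if __name__ == "__main__":
    # Illustrative scale factors only; these are not measured values.
    for scale in (1.0, 0.9, 0.8, 0.7):
        share = relative_power(scale)
        print(f"V and f scaled to {scale:.0%}: power is {share:.1%} of baseline")
```

Under that assumption, a modest cut in voltage and frequency buys a much larger cut in power, which is the motivation for spending the transistor budget on more cores rather than higher clocks.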
Intel® Core™ Microarchitecture
Low Power, Scalable, High Performance
• Intel® Wide Dynamic Execution
• Intel® Intelligent Power Capability
• Intel® Advanced Smart Cache
• Intel® Smart Memory Access
• Intel® Advanced Digital Media Boost
65nm
• Woodcrest: Server Optimized
• Conroe: Desktop Optimized
• Merom: Mobile Optimized
*Graphics not representative of actual die photo or relative size
Intel® Wide Dynamic Execution
EACH CORE
• Efficient 14 stage pipeline
• Deeper buffers
• 4 wide decode to execute
• 4 wide micro-op execute
• Micro and macro fusion
• Enhanced ALUs
[Block diagram: CORE 1 and CORE 2, each showing Instruction Fetch and Pre-Decode, Instruction Queue, Decode, Rename / Alloc, Retirement Unit (Reorder Buffer), Schedulers, Execute]
Intel® Intelligent Power Capability
• Process
– 65nm
– Strained Silicon
– Low-K Dielectric
– More Metal Layers
• Coarse Grained
– Aggressive Clock Gating
– Enhanced Speed-Step
• Ultra Fine Grained
– Low VCC Arrays
– Blocks Controlled Via Sleep Transistors
• Transistor
– Low Leakage Transistors
– Sleep Transistors
Intel Performance Leadership for Life Sciences
Geomeans of relative performance
“Woodcrest” single thread relative performance compared to Opteron*
[Bar chart: relative performance, vertical axis 0.0 to 2.0. Higher is better.]
Intel outperforms AMD across all applications tested
• Computational Chemistry: Gaussian(3,4), GAMESS(3,4), Amber(3,4), GROMACS(2,4), NAMD(2,6)
• Bioinformatics: HMMER(1,4), BLAST(1,5), ClustalWMPI(2,4)
(1) Woodcrest: Dual-Core Intel® Xeon® processor, 2-socket sys., 3.0GHz, 4MB L2 cache, 4GB Memory
(2) Woodcrest: Dual-Core Intel® Xeon® processor, 2-socket sys., 3.0GHz, 4MB L2 cache, 8GB Memory
(3) Woodcrest: Dual-Core Intel® Xeon® processor, 2-socket sys., 3.0GHz, 4MB L2 cache, 16GB Memory
(4) Dual-Core AMD* Opteron* processor 280, 2-socket sys. 2.4GHz, 1MB L2 cache, 16GB Memory
(5) Dual-Core AMD* Opteron* processor 285, 2-socket sys. 2.6GHz, 1MB L2 cache, 4GB Memory
(6) AMD* Opteron* processor 252, 2-socket sys. 2.6GHz, 1MB L2 cache, 16GB Memory
Source: Intel Internal Measurement
* Other brands and names may be claimed as the property of others.
Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel®
products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers
should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more
information on performance tests and on the performance of Intel products, reference
http://www.intel.com/performance/resources/benchmark_limitations.htm or call (U.S.) 1-800-628-8686 or 1-916-356-3104.
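The chart reports geometric means of relative performance. As a rough sketch of how such a summary figure is usually formed, the snippet below takes per-workload speedup ratios and returns their geometric mean. The function name and the sample ratios are hypothetical; none of the numbers come from the measurements behind this slide.

```python
import math


def geometric_mean(ratios):
    """Geometric mean of positive per-workload performance ratios."""
    if not ratios:
        raise ValueError("need at least one ratio")
    if any(r <= 0 for r in ratios):
        raise ValueError("ratios must be positive")
    # Average in log space, then map back.
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


if __name__ == "__main__":
    # Hypothetical ratios (system A time / system B time) for four workloads.
    sample_ratios = [1.2, 1.5, 0.9, 1.8]
    print(f"geomean of relative performance: {geometric_mean(sample_ratios):.2f}")
```

The geometric mean is the usual choice for ratios because the result does not depend on which system is taken as the baseline: swapping the two systems simply inverts it.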
Core™ Microarchitecture Advances With Quad Core
Energy Efficient Performance
[Chart: DP Performance Per Watt, comparison with SPECint_rate at the platform level. Vertical axis 1X to 4X.]
• Server: Irwindale H1 ‘05, Paxville DP H2 ‘05, Dempsey MV H1 ‘06, Woodcrest H2 ‘06, Clovertown (Quad Core) H1 ‘07
• Desktop: Kentsfield (Quad Core)
Source: Intel®
AGENDA
• New Processors
• New HPC focused platforms
• Technologies for the future
Motivation
• Caretta & Port Townsend:
– Provide a higher memory BW / FLOP option than DP Xeon
– Provide a less expensive option than DP Xeon
• Atoka
– High Density DP solution
• Metrics
– Performance
  – Core: we lead
  – Bus: close (depends on STREAM binaries etc.) + 2x cache size
– Performance / Watt: we lead
– Performance / SqFt: we match
– Performance / $: we lead
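The "memory BW / FLOP" metric above can be estimated from a board's front-side bus speed and its peak floating-point rate. The sketch below is illustrative only: the function names are hypothetical, the 8-byte bus width is an assumption, all sockets are assumed to share a single bus, and the clock speeds and FLOPs-per-cycle values in the example are placeholders rather than figures from this talk.

```python
def bus_bandwidth_gb_s(fsb_mt_s, bus_width_bytes=8):
    """Peak bus bandwidth in GB/s; the 8-byte bus width is an assumption."""
    return fsb_mt_s * 1e6 * bus_width_bytes / 1e9


def bytes_per_flop(fsb_mt_s, sockets, cores_per_socket, clock_ghz,
                   flops_per_cycle, bus_width_bytes=8):
    """Peak bytes of bus bandwidth per peak FLOP.

    Simplification: all sockets are assumed to share one bus.
    """
    peak_gflops = sockets * cores_per_socket * clock_ghz * flops_per_cycle
    return bus_bandwidth_gb_s(fsb_mt_s, bus_width_bytes) / peak_gflops


if __name__ == "__main__":
    # Placeholder inputs: same bus, one socket versus two sockets.
    single = bytes_per_flop(800, sockets=1, cores_per_socket=2,
                            clock_ghz=3.0, flops_per_cycle=2)
    dual = bytes_per_flop(800, sockets=2, cores_per_socket=2,
                          clock_ghz=3.0, flops_per_cycle=2)
    print(f"single socket: {single:.3f} bytes/FLOP")
    print(f"dual socket:   {dual:.3f} bytes/FLOP")
```

With everything else held equal, halving the number of sockets that share a bus doubles the bandwidth available per FLOP, which is the trade-off the single-socket boards are aimed at.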
Caretta Features
HPC BOARD FEATURES
• Single Intel® Pentium D processor (Presler, Smithfield)
• Support for Pentium 4 (CedarMill)
• Chipset: Mukilteo + ICH7
• 4 DIMM (max 8GB) - DDR2 533/667 with U-ECC
• 800 MHz FSB
• Integrated 2 port SATA2 with RAID 0/1
• 2xGbE (TekoaE + Tabor)
• 2xUSB2 external
• Rear video & serial port
• Internal headers: serial, 2xUSB2, I2C
• Custom 5.95” x 13”, 6 layer
• Custom power connector
• Client Management iAMT via TekoaE
[Board diagram labels: GbE, Video, ICH, Memory, MCH, CPU]
Port Townsend Features
HPC BOARD FEATURES
• Single Intel® Pentium D processor (Conroe, Kentsfield)
• Chipset: Mukilteo2 + ICH7
• 4 DIMM (max 8GB) - DDR2 533/667 with U-ECC
• 1066 FSB
• PCIe x8 – support for IB MemFree card & SFF GbE card
• Integrated 2 port SATA2 with RAID 0/1
• 2xGbE (Tekoa + TekoaE)
• 2xUSB2 external (crash cart)
• Rear video & serial port
• Internal headers: serial (3pin), 2xUSB2, I2C
• Custom 5.95” x 13”, 6 layer
• Custom power connector
• Client Management iAMT via TekoaE
[Board diagram labels: GbE, ICH, Memory, MCH, CPU, VRD, PCI-E x8]
AtokaV Features
HPC BOARD FEATURES
• Dual Intel® Xeon processor (WC, CTN)
• Chipset: Greencreek + ESB2
• 8 FBD (max 32GB) - DDR2 533/667
• 1333 FSB
• PCIe x8 – slot
• Mellanox IB 4x DDR single port down
• Integrated 2 port SATA2 with RAID 0/1
• 2xGbE (Gilgal)
• 2xUSB2 external (crash cart)
• Rear video & serial port
• Internal headers: serial (3pin), 1xUSB2, I2C
• Custom 6.5” x 16.5”
• Custom power connector
• Client Management via IPMI module / GbE port
• Support for 32Mbit flash & embedded Linux
[Board diagram labels: VRD, CPU, CPU, MCH, Memory, ESB2, PCI-E x8, GbE, IB]
Pics
• Port Townsend – 1U ‘side by side’ reference chassis
Pics
• Port Townsend – 4U Blade Can
• Port Townsend – AC Blade
AGENDA
• New Processors
• New HPC focused platforms
• Technologies for the future
Today’s Packaging Technology
• Multi-Chip Package
• Wire-Bonded Stacked Die
[Diagram labels: Flash, DRAM, CPU, DRAM, CPU]
3D Stacking Research
Wafer Stacking
[Diagram labels: Top Thin Wafer, Bottom Wafer, DRAM, CPU, Bonding Interface, Bonding Structures, Thru-Silicon Via, Metal lines on backside of thin wafer]
Source: Intel
3D Stacking Research
Die Stacking
[Diagram: Die 1 through Die 7 stacked on a package substrate. Labels: CPU, DRAM, DRAM, Flash, Analog, Via, Metal Pad, Pkg. Substrate]
Source: Intel
Chip-to-Chip Signaling Challenge
The Opportunity of Silicon Photonics
• Enormous ($ billions) CMOS infrastructure, process learning, and capacity
– Draft continued investment in Moore’s law
• Potential to integrate multiple optical devices
• Micromachining could provide smart packaging
• Potential to converge computing & communications
To benefit from this, optical wafers must run alongside existing product.
Intel’s Silicon Photonics Research
• First Continuous Silicon Laser (Nature 2/17/05)
• 1GHz (Nature ‘04), 10 Gb/s (‘05)
First: Innovate to prove silicon is a viable optical material
Silicon Photonics
[Diagram labels: Filter, CMOS Circuitry, Photodetector, Laser, Modulator, Passive Alignment]
Silicon Photonics Future Vision
• Data Center Fabrics
• Chip-to-Chip Interconnects
• Chemical Analysis
• Backplane and Display Interconnects
• Medical Lasers
Q&A