ITRS-2001 Design Summary - Computer Science and Engineering

ITRS-2001 Design ITWG
July 18, 2001
International Technology Roadmap for Semiconductors Conference
Preliminary Work in Progress – Not for Publishing
System Drivers Chapter
• Define IC products that drive mfg, design technologies
• Replace the 1999 SOC Chapter
• ORTCs + SDs = “consistent framework for tech requirements”
• Four drivers
– (HVC) MPU (USA)
– (HVC) DRAM (Korea?)
– M/S (Europe)
– SOC (Japan/USA, same as “ASIC-LP”)
• Each driver section
• Nature, evolution, formal definition of this driver
• What market forces apply to this driver ?
• What technology elements (process, device, design) does this
drive?
• Key figures of merit, and futures
• Working (in progress) text material (handout): MPU, SOC
• M/S material: presented by European Design TWG
Mixed-Signal Design
Roadmap for ITRS
Ralf Brederlow°, Stephane Donnay+,
Joseph Sauerer#, Maarten Vertregt*,
Piet Wambacq+, and Werner Weber°
°Infineon Technologies, +IMEC, #Fraunhofer-Institut for
Integrated Circuits, *Philips Semiconductor
Overview
• Today, the digital part of circuits is most critical for performance
and is dominating chip area
• But in many new IC-products the mixed-signal part becomes
important for performance and cost
• This shift in paradigms leads to the need for a definition of the
analog boundary conditions in the design part of the ITRS
roadmap
• The goal is to define criteria for needs of future analog/RF circuit
performance and compare it to device parameters:
– choose critical and important analog/RF circuits
– identify circuit performance needs
– and related device parameter needs
Mixed-Signal System Driver Roadmap
• Mixed-signal circuits increasingly critical to system performance, cost
– Define “AMS boundary conditions” for Design technology roadmap by choosing
critical and important analog/RF circuits, then identifying circuit performance
needs and related device parameter needs
• Based on figures of merit for four basic analog building blocks, can
estimate future device parameter needs
[Figure: roadmap for basic analog/RF circuits (A/D converter, low-noise amplifier, voltage-controlled oscillator, power amplifier) feeding a roadmap for device parameter needs (Lmin, analog transistor gm/gds, …) over 2001 … 2015]
Mixed-Signal System Drivers
[Figure: resolution (bits, 4 to 22) versus signal bandwidth (1 kHz to 1 GHz), with power levels from 1 mW to 1 kW marked; applications plotted: super audio, audio, GSM basestation, GSM, cable, DTV, UMTS, storage, telephony, Bluetooth, interconnectivity, video]
System drivers for mass markets can be identified from
the FoM approach
Summary
Figures of merit for basic analog, mixed-signal and RF-circuits
are defined as a measure for progress in the mixed-signal
performance
They are projected into the future by estimating technology
needs and performance progress from the past
Device parameters important for continuous progress in mixed-signal
design in future technology nodes are derived from these
figures of merit
The roadmap includes all necessary information for estimating
future technology and system drivers in mixed-signal design
“System Driver” Models Are Changing
Example – MPU
• Old MPU model – 3 flavors
• Cost-performance at introduction (CP-Intro)
– 340+ mm2 die, small L1 cache (32KB in 180nm)
• Cost-performance at production (CP-Prod)
– 170+ mm2 die, shrink of previous generation’s CP-Intro chip
• High-performance (HP)
– 310+ mm2 die, same as CP-Prod but with large L2 cache (512KB in 180nm)
• SRAM and logic transistor counts double every generation
• New MPU model - 2 flavors
• Cost-performance at production (CP)
– 140 mm2 die, “Pentium 4” / “desktop”
• High-performance at production (HP)
– 310 mm2 die, “Itanium” / “server”
• Both CP, HP have multiple cores (“helper engines”), on-board L3 cache, …
– Multi-cores == more dedicated, less general-purpose logic; driven by power
and reuse considerations; reflect convergence of MPU and SOC
• “Moore’s Law” still applies to tx counts, but NOT to frequency
– doubling is each generation, NOT each 18 months
Example Supporting Analyses
(MPU Diminishing Returns)
• Pollack’s Rule
– In a given process technology, new uArch takes 2-3x area of old (last
generation) uArch, and provides only 40% more performance
– Backup: SPECint, SPECfp per MHz, SPECint per Watt all decreasing
• Power knob running out
– Speed == Power
– 10W/cm2 limit for convection cooling, 50W/cm2 limit for forced-air cooling
– Large currents, large power surges on wakeup
– Cf. 140A supply current, 150W total power at 1.2V Vdd for EV8 (Compaq)
• Speed knob running out (new clock frequency model)
– Historically, 2x clock frequency every process generation (see Backup Slides)
• 1.4x from device scaling (but running into t_ox, other limits – see Device discussion)
• 1.4x from fewer logic stages (from 40-100 down to around 14 FO4 INV delays)
– Clocks cannot be generated with period < 6-8 FO4 INV delays
– Pipelining overhead (1-1.5 FO4 INV delay for pulse-mode latch, 2-3 for FF)
– Around 14-16 FO4 INV delays is limit for clock period (L1 $ access, 64b add)
• Cannot continue 2x freq per generation trend in ITRS
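The diminishing-returns arithmetic on this slide can be sketched in a few lines. This is illustrative only: the function name is hypothetical, and the inputs are the slide's own figures (2-3x area for about 40% more performance, and two roughly 1.4x contributions to the historical clock trend).

```python
# Illustrative arithmetic only; the function name is hypothetical.

def perf_per_area_ratio(area_growth: float, perf_growth: float = 1.4) -> float:
    """Performance per unit area of a new microarchitecture relative to the
    old one, in the same process technology (Pollack's Rule inputs)."""
    return perf_growth / area_growth


for area_growth in (2.0, 3.0):
    ratio = perf_per_area_ratio(area_growth)
    print(f"{area_growth:.0f}x area, 1.4x performance: "
          f"{ratio:.2f}x performance per unit area")

# Historical clock trend from the slide: device scaling times fewer logic stages.
historical_freq_gain = 1.4 * 1.4
print(f"historical frequency gain per generation: {historical_freq_gain:.2f}x")
```

Under these inputs, performance per unit area falls with each new microarchitecture, and the two 1.4x factors together account for the roughly 2x historical clock trend.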
New Layout Density Models
• Semi-custom Logic: Avg size of 4t gate = 32MP2 = 320F2
– MP = lower-level contacted metal pitch
– F = min feature size (technology node)
– 32 = 8 tracks standard-cell height times 4 tracks width (average NAND2)
– Additional whitespace factor = 2x (i.e., 100% overhead)
• Custom Logic: 1.25x ASIC density
• SRAM: (used in MPU)
– bitcell area (units of F^2) is near flat: 223.19*F (um) + 97.748
– peripheral overhead = 60%
– memory content is increasing (driver: power) and increasingly fragmented
– will see paradigm shifts in architecture/stacking; eDRAM, 1-T SRAM, …
• Significant SRAM density increase, slight Logic density
decrease, compared to 1999 ITRS
– 130nm node: old ASIC logic density = 13M tx/cm2, new = 11.6M tx/cm2
– 130nm node: old SRAM density = 70M tx/cm2, new = 140M tx/cm2
– Chief impact: power densities, logic-memory balance on chip
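The SRAM bitcell fit quoted above (223.19*F + 97.748, in units of F^2 with F in um) can be evaluated as follows. This is a minimal sketch: the function names are hypothetical, and treating the 60% peripheral overhead as a 1.6x multiplier on bitcell area is an assumption, since the slide does not say how the overhead is applied.

```python
# Sketch of the SRAM bitcell fit from the slide; function names are hypothetical.

def sram_a_factor(f_um: float) -> float:
    """Bitcell area in units of F^2, using the slide's linear fit (F in um)."""
    return 223.19 * f_um + 97.748


def sram_bitcell_area_um2(f_um: float) -> float:
    """Bitcell area in um^2 at feature size F (um)."""
    return sram_a_factor(f_um) * f_um ** 2


def sram_area_per_bit_um2(f_um: float, peripheral_overhead: float = 0.60) -> float:
    """Area per bit including periphery.
    ASSUMPTION: overhead is applied as a (1 + overhead) multiplier."""
    return sram_bitcell_area_um2(f_um) * (1.0 + peripheral_overhead)


for f_um in (0.18, 0.13):
    print(f"F = {f_um} um: A-factor = {sram_a_factor(f_um):.1f} F^2, "
          f"bitcell = {sram_bitcell_area_um2(f_um):.3f} um^2, "
          f"with periphery = {sram_area_per_bit_um2(f_um):.3f} um^2")
```

The sketch stops at area per bit; it does not attempt to reproduce the tx/cm2 density figures on the slide, which may rest on assumptions not stated here.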
SOC-LP Model
• Power gap
– Must reduce dynamic and static power to avoid zero logic content limit
– Hits low-power SOC before hits MPU
– SOC degree of freedom: low-power (not high-perf) process
• SOC-LP model reconciled with ASIC-LP (Japan) model
– Physical bottom gate length lags high-performance devices by 2 years
• Adopted in Grenoble meeting
– Technology node used in density must be backed out accordingly
– Many accompanying device parameter changes
• Vth increased (up to .3 x Vdd limit)
• Ig, Ioff constant at 100pA/um (L(Operating)P), 1pA/um (L(STandby)P)
• Tox higher
• CV/I higher, clock frequencies lower (by 10x from MPU frequencies)
• SOC-LP driver: low-power PDA
– Composition: CPU cores, embedded cores, SRAM/eDRAM
– Roadmap for IO bandwidth, processing power, GOPS/mW efficiency
– Die size grows at 20% per node
Outline of ITRS DESIGN Chapter
• Context
– Scope of Design Technology
– High-level summary of complexities (at level of “issues”)
– Cost, productivity, quality, and other metrics of Design Technology
• Overview of Needs
– Derived from system drivers (e.g., power- or cost-driven design)
• Summary of Difficult Challenges (handout)
• Detailed Statements of Needs, Potential Solutions
– Design Process, System-Level, Functional Verification,
Logic/Physical/Circuit, Test
Design Cost and Quality Requirement
• Design cost of “largest ASIC” rises despite major DT innovations
• Other Dataquest numbers confirm memory content rising
• Currently seeking metric, data, requirements for design quality
(Dataquest, 2001) Cost Metrics Forecast
[Figure: design cost for largest possible ASIC versus same cost with RTL methodology only; cost axis $10,000,000 to $100,000,000,000 (log scale); years 1990 to 2005]
Design Cost Requirement
• “Largest possible ASIC” design cost model
• engineer cost per year increases 5% per year ($181,568 in 1990)
• EDA tool cost per year increases 3.9% per year ($99,301 in 1990)
• #Gates in largest ASIC design per ORTCs (.25M in 1990, 250M in 2005)
• %Logic Gates constant at 70%
• #Engineers / Million Logic Gates decreasing from 250 in 1990 to 5 in 2005
• Productivity due to 8 Design Technology innovations (3.5 of which are still
unavailable): RTL methodology; In-house P&R; Tall-thin engineer; Small-block
reuse; Large-block reuse; IC implementation suite; Intelligent
testbench; ES-level methodology
• Small refinements: (1) whether 30% memory content is fixed; (2) modeling
increased amount of large-block reuse (not just the ability to do large-block
reuse). No discussion of other design NRE (mask cost, etc.).
• #Engineers per ASIC design still rising (44 in 1990 to 875 in
2005), despite assumed 50x improvement in designer
productivity
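The headcount figures on this slide follow directly from the model parameters, as the sketch below shows. Function names are hypothetical. The slide does not state how engineer and tool costs combine into total design cost, so the sketch reports only the compounded per-year rates rather than a total.

```python
# Sketch of the headcount arithmetic in the "largest possible ASIC" model.
# Function names are hypothetical; the numbers come from the slide.

def engineers_needed(total_gates_millions: float,
                     engineers_per_million_logic_gates: float,
                     logic_fraction: float = 0.70) -> float:
    """Engineers required, given total gates (millions) and productivity."""
    logic_gates_millions = total_gates_millions * logic_fraction
    return logic_gates_millions * engineers_per_million_logic_gates


def compounded(base_1990: float, annual_growth: float, year: int) -> float:
    """A 1990 cost rate grown at a fixed annual percentage up to `year`."""
    return base_1990 * (1.0 + annual_growth) ** (year - 1990)


team_1990 = engineers_needed(0.25, 250)   # .25M gates, 250 engineers per M logic gates
team_2005 = engineers_needed(250, 5)      # 250M gates, 5 engineers per M logic gates
productivity_gain = 250 / 5               # improvement in gates per engineer

print(f"1990 team: about {team_1990:.0f} engineers")
print(f"2005 team: about {team_2005:.0f} engineers")
print(f"productivity gain: {productivity_gain:.0f}x")
print(f"2005 engineer cost per year: ${compounded(181_568, 0.05, 2005):,.0f}")
print(f"2005 EDA tool cost per year: ${compounded(99_301, 0.039, 2005):,.0f}")
```

Gate count grows 1000x while productivity grows 50x, which is why the team still grows by about 20x.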
Design Quality Requirement
• “Normalized transistor” quality model under
development
• speed, power, density in a given technology
• analog vs. digital
• custom vs. semi-custom vs. generated
• first-silicon success
• other: simple / complex clocking, verification/test effort
and coverage, manufacturing cost, adherence to
schedule, …
• Design process quality model: early in development
Backup
Device Roadmap Changes
• Cf. Process Integration, Devices and Structures (PIDS) Chapter
• CV/I device delay metric: historically decreases by 17%/year
– Since frequency improvement from shorter pipelines no longer available,
perhaps we do need to keep scaling CV/I …
– Bottom line: PIDS is running up against limits of planar CMOS, and is
shifting at least some of the pain to “design/architecture improvements”
• Continuing CV/I trend necessitates huge growth in Ioff
• Subthreshold Ioff at room temperature increases from 0.01 uA/um in 2001 to
10 uA/um at end of ITRS (22nm node)
• Ioff increases by at least order of magnitude at ~100 deg C operating temps
• Static power becomes a huge problem: multi-Vt, multi-Vdd, substrate biasing,
constant-throughput power minimization, etc. must be coherently and
simultaneously applied/optimized by automatic tools
• Also necessitates aggressive reduction in tox
• Physical tox thickness hovers at < 1.4nm (down to 1.0nm) starting in 2001,
even assuming arrival of high-k gate dielectrics starting in 2004
• Implies huge variability mitigation challenges for Design Technology
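The CV/I and Ioff trends above compound quickly. The sketch below is illustrative only: the function name is hypothetical, and the 15-year span is an assumed horizon for the example, not a figure from the slide.

```python
# Illustrative compounding of the device trends quoted on the slide.
# The function name and the 15-year horizon are assumptions.

def cv_over_i_factor(years: int, annual_decrease: float = 0.17) -> float:
    """Remaining fraction of CV/I delay after `years` of 17%/year decrease."""
    return (1.0 - annual_decrease) ** years


# Subthreshold leakage endpoints from the slide, in uA/um at room temperature.
ioff_2001 = 0.01
ioff_end = 10.0
ioff_growth = ioff_end / ioff_2001

horizon_years = 15  # ASSUMPTION: example horizon only
print(f"CV/I after {horizon_years} years: "
      f"{cv_over_i_factor(horizon_years):.3f} of its starting value")
print(f"Ioff growth over the roadmap: about {ioff_growth:.0f}x")
```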
Assembly/Packaging Roadmap Changes
• MPU pad counts (Tables 3a/3b of 2000 ITRS ORTC Chap.)
flat from 2001-2005, while chip current draw increases 64%
• Effective bump pitch roughly constant at 350 µm throughout
roadmap
– Bump/pad counts scale with chip area only, do not increase with technology
demands (IR drop, L*di/dt)
– → metal resources needed to control <10% IR drop skyrocket since Ichip and
wiring resistance increase → challenge for Design Technology
– Later technologies (30-40nm) also have too few bumps to carry maximum
current draw (e.g., 1250 Vdd pads at 30nm with bump pitch of 250 µm can
each carry 150mA → 187.5A max capability but Ichip/Vdd > 300A)
• A&P Rationale: cost control (puts pain onto Design)
• Design Rationalization: must introduce power constraints
– ITRS2001 will have strong power-constrained focus
• Cost of liquid cooling, refrigeration, etc. impractical anyway
• 30-50 W/cm2 limit for forced-air cooling with fins
• MPU power dissipation capped at 150W for entire ITRS; MPU chip area
held constant (more area can’t be used well within 150W power budget)
• Design DOFs for Power Reduction: see Backup Slides
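The bump-current example on this slide (1250 Vdd pads at 150mA each against a demand above 300A) is a one-line calculation, sketched here. The function name is hypothetical, and 300A is used as a lower bound on demand, as on the slide.

```python
# Sketch of the bump-current shortfall example; the function name is hypothetical.

def max_bump_current_a(n_vdd_pads: int, amps_per_pad: float) -> float:
    """Total supply current the Vdd bumps can carry, in amperes."""
    return n_vdd_pads * amps_per_pad


capability_a = max_bump_current_a(1250, 0.150)  # slide: 1250 pads, 150 mA each
demand_lower_bound_a = 300.0                    # slide: demand exceeds 300 A
min_shortfall_a = demand_lower_bound_a - capability_a

print(f"bump capability: {capability_a:.1f} A")
print(f"shortfall: at least {min_shortfall_a:.1f} A")
```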
Big Picture
• ITRS takes Moore’s Law as a constraint
• Problem: We signed up for the “wrong” Moore’s Law
– 2x frequency, 2x xtors, bits every node → power, utility contradictions
– Each increment of performance is more and more costly
• Compounding problems
– no architecture awareness (2x memory, 2x logic xtors in lock-step)
– no application awareness (e.g., low-power networked-embedded SOC)
– planar CMOS-centric (no DGFET, FinFET in requirements)
– uneven acknowledgment of cost (mask NRE, design cost, cost of technology
development, manufacturing cost, …)
• New in 2001: Can Design solve it? Can Designers help?
– PIDS : 17%/year improvement in CV/I metric → punt Ioff, Rds, …
– A&P : bump pitch improves < chip area → punt IR drop, power
– Interconnect : what total variability can Designers tolerate?
2001: Design Technology Better Integrated
With Other Supporting Technologies
• Problem: Design has always been “metric-free”
– Metric → “red brick wall” → requirement for R&D investment
• Goal 1: show red bricks in Design Technology
• Goal 2: shift red bricks from other supporting technologies
– e.g., lithography CD variability requirement → solved by new Design
techniques that can better handle variability
– e.g., mask data volume requirement → solved by Design/Mfg
interfaces and flows that pass functional requirements, verification
knowledge to mask writing and inspection
– e.g., Simplex “X initiative” → as much impact as copper?
• But..
– Need metrics of design cost, design quality
– Need serious validation/participation from EDA, system, ASIC
companies
Need to Beef Up…
• Test roadmap discussion is missing
• Cost of test will soon exceed cost of manufacturing
• At-speed test stresses tester technology
• How solid is treatment of BIST, analog test, SOC test, etc.?
• Cost (big hole in ITRS)
• Manufacturing cost, NRE cost (design, mask, …), technology development
cost (who should solve a given red brick wall?)
• Key challenges for EDA (with respect to ITRS)
• Circuit/layout optimizations in the face of manufacturing variability
• System cost-driven design technology
• Holistic analysis, management of power (both dynamic and static)
• Circuit- and methodology-level IP: global signaling and synchronization,
off-chip IO; power delivery and management
• Metrics, needs roadmap for quality/cost of design and design process
• Verification and test
• Software
New On-Chip Max Clock Frequency Model
• Flat at 16 FO4 INV delays
– FO4 INV delay = delay of inverter driving load equal to 4x its input cap
= roughly 14x CV/I device delay metric in ITRS PIDS Chapter
• No local interconnect in the model
– negligible, and scales with device performance
• No (buffered) global interconnect in the model, either
– was unrealistically fast in ITRS99 model
– global interconnects are pipelined (clock frequency is set by time
needed to complete local computation loops, not time for global
communication - cf. Pentium-4 and Alpha-21264)
– Note: interconnect delay per se is not a problem (!)
• Clock period decreases from 26 FO4 delays in 2001
(Pentium-4) at historical rates, then flattens at 16 FO4 delays
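The clock model above reduces to a short formula: clock period = (FO4 delays per cycle) x (FO4 delay), with FO4 delay taken as roughly 14x the CV/I metric. The sketch below uses hypothetical function names, and the 1 ps CV/I value is a placeholder for illustration, not a roadmap number.

```python
# Sketch of the on-chip max clock frequency model; function names are
# hypothetical and the CV/I input is a placeholder, not a roadmap value.

FO4_PER_CVI = 14.0  # slide: FO4 INV delay is roughly 14x the CV/I delay metric


def fo4_delay_ps(cv_over_i_ps: float) -> float:
    """FO4 inverter delay in ps, from the CV/I device delay metric in ps."""
    return FO4_PER_CVI * cv_over_i_ps


def max_clock_ghz(cv_over_i_ps: float, fo4_per_cycle: float = 16.0) -> float:
    """Max clock frequency in GHz for a cycle of `fo4_per_cycle` FO4 delays."""
    period_ps = fo4_per_cycle * fo4_delay_ps(cv_over_i_ps)
    return 1000.0 / period_ps


cvi_ps = 1.0  # ASSUMPTION: placeholder CV/I delay for illustration
f_26 = max_clock_ghz(cvi_ps, fo4_per_cycle=26)  # 2001 starting point on the slide
f_16 = max_clock_ghz(cvi_ps, fo4_per_cycle=16)  # where the model flattens
print(f"26 FO4 per cycle: {f_26:.2f} GHz; 16 FO4 per cycle: {f_16:.2f} GHz")
print(f"remaining gain from shorter cycles: {f_16 / f_26:.3f}x")
```

Once the cycle flattens at 16 FO4 delays, any further frequency gain in this model has to come from CV/I scaling alone.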
A-Factor for SRAM Cell Size
(square feature size)
SRAM “A-Factors” for Simple 6T SRAM Cell using
Microprocessor Logic CMOS Process Technology
[Figure: bar chart of SRAM A-factor (axis 0 to 200) by technology and source; values tabulated below]
DRAM half-pitch (F) and source             A-Factor (A*F2)
0.13micron (Intel, IEDM2000, p.567)        143.7
0.13micron (Motorola, IEDM2000, p.571)     146.74
0.18micron (Intel, IEDM1998, p.197)        172.53
0.25micron (Intel, IEDM1996, p.847)        164.16
0.35micron (Intel, IEDM1994)               167.3
0.3micron (Motorola, IEDM1994)             175.6