CMOS VLSI Design 4th Ed.
Lecture 21: Packaging, Power, & Clock
Outline
Packaging
Power Distribution
Clock Distribution
21: Package, Power, and Clock
CMOS VLSI Design 4th Ed.
Packages
Package functions
– Electrical connection of signals and power from
chip to board
– Little delay or distortion
– Mechanical connection of chip to board
– Removes heat produced on chip
– Protects chip from mechanical damage
– Compatible with thermal expansion
– Inexpensive to manufacture and test
Package Types
Through-hole vs. surface mount
Chip-to-Package Bonding
Traditionally, chip is surrounded by pad frame
– Metal pads on 100–200 µm pitch
– Gold bond wires attach pads to package
– Lead frame distributes signals in package
– Metal heat spreader helps with cooling
Advanced Packages
Bond wires contribute parasitic inductance
Fancy packages have many signal, power layers
– Like tiny printed circuit boards
Flip-chip places connections across surface of die
rather than around periphery
– Top level metal pads covered with solder balls
– Chip flips upside down
– Carefully aligned to package (done blind!)
– Heated to melt balls
– Also called C4 (Controlled Collapse Chip Connection)
LGA Package
1366 gold-plated pads
Package Parasitics
Use many VDD, GND in parallel
– Lowers the effective supply inductance and splits IDD across many pins
[Figure: package parasitics. Labels: signal pads, signal pins, bond wire, lead frame, package capacitor, chip VDD and GND, board VDD and GND.]
Heat Dissipation
60 W light bulb has surface area of 120 cm²
Itanium 2 die dissipates 130 W over 4 cm²
– Chips have enormous power densities
– Cooling is a serious challenge
Package spreads heat to larger surface area
– Heat sinks may increase surface area further
– Fans increase airflow rate over surface area
– Liquid cooling used in extreme cases ($$$)
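As a quick check of the light bulb comparison, the following minimal Python sketch computes both power densities from the figures on this slide. The function name `power_density` is an illustrative choice, not something from the lecture.

```python
def power_density(power_w: float, area_cm2: float) -> float:
    """Return power density in W/cm^2."""
    return power_w / area_cm2


# Figures quoted on the slide.
bulb = power_density(60.0, 120.0)      # 60 W bulb over 120 cm^2
itanium2 = power_density(130.0, 4.0)   # Itanium 2 die: 130 W over 4 cm^2

print(f"bulb: {bulb:.2f} W/cm^2")
print(f"die:  {itanium2:.1f} W/cm^2")
print(f"the die is about {itanium2 / bulb:.0f}x denser")
```

The die runs at tens of W/cm² while the bulb stays below 1 W/cm², which is why the package has to spread heat over a larger area.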
Thermal Resistance
$\Delta T = \theta_{ja} P$
– $\Delta T$: temperature rise on chip
– $\theta_{ja}$: thermal resistance of chip junction to ambient
– $P$: power dissipation on chip
Thermal resistances combine like resistors
– Series and parallel
$\theta_{ja} = \theta_{jp} + \theta_{pa}$
– Series combination
Example
Your chip has a heat sink with a thermal resistance to the package of 4.0 °C/W.
The resistance from chip to package is 1 °C/W.
The system box ambient temperature may reach 55 °C.
The chip temperature must not exceed 100 °C.
What is the maximum chip power dissipation?
(100 °C − 55 °C) / (4 °C/W + 1 °C/W) = 9 W
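The same calculation can be sketched in Python. This is a minimal example that assumes the thermal resistances are purely in series, as in the example; the helper name `max_power` is hypothetical.

```python
def max_power(t_max_c: float, t_ambient_c: float, thetas_c_per_w: list[float]) -> float:
    """Maximum power (W) for a series chain of thermal resistances (degC/W)."""
    theta_ja = sum(thetas_c_per_w)          # series resistances add
    return (t_max_c - t_ambient_c) / theta_ja


# Values from the example: 1 degC/W chip-to-package, 4 degC/W heat sink.
p_max = max_power(t_max_c=100.0, t_ambient_c=55.0, thetas_c_per_w=[1.0, 4.0])
print(f"maximum power: {p_max:.1f} W")
```

Lowering either resistance, for example with a better heat sink, raises the allowed power in direct proportion to the reduction in the total.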
Temperature Sensor
Monitor die temperature and throttle performance if it
gets too hot
Use a pair of pnp bipolar transistors
– Vertical pnp available in CMOS
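$I_c = I_s e^{qV_{BE}/kT} \;\Rightarrow\; V_{BE} = \frac{kT}{q}\ln\frac{I_c}{I_s}$
$\Delta V_{BE} = V_{BE1} - V_{BE2} = \frac{kT}{q}\left(\ln\frac{I_{c1}}{I_s} - \ln\frac{I_{c2}}{I_s}\right) = \frac{kT}{q}\ln\frac{I_{c1}}{I_{c2}} = \frac{kT}{q}\ln m$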
Voltage difference is proportional to absolute temp
– Measure with on-chip A/D converter
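To illustrate the proportionality, here is a minimal Python sketch of the relation ΔVBE = (kT/q) ln m and its inverse. The current ratio m = 10 and the 300 K test temperature are assumptions chosen for illustration; the lecture does not give values.

```python
import math

K_BOLTZMANN = 1.380649e-23    # J/K
Q_ELECTRON = 1.602176634e-19  # C


def delta_vbe(temp_k: float, m: float) -> float:
    """Voltage difference (V) between two pnp devices with current ratio m."""
    return (K_BOLTZMANN * temp_k / Q_ELECTRON) * math.log(m)


def temperature_from_delta_vbe(dv: float, m: float) -> float:
    """Invert the relation: absolute temperature (K) from a measured delta VBE."""
    return dv * Q_ELECTRON / (K_BOLTZMANN * math.log(m))


M_RATIO = 10.0                       # assumed collector current ratio
dv = delta_vbe(300.0, M_RATIO)       # assumed die temperature of 300 K
t_back = temperature_from_delta_vbe(dv, M_RATIO)
print(f"delta VBE = {dv * 1e3:.1f} mV, recovered T = {t_back:.1f} K")
```

Because I_s cancels in the difference, the result depends only on the current ratio and the temperature, which is what makes the pair usable as a sensor.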
Power Distribution
Power Distribution Network functions
– Carry current from pads to transistors on chip
– Maintain stable voltage with low noise
– Provide average and peak power demands
– Provide current return paths for signals
– Avoid electromigration & self-heating wearout
– Consume little chip area and wire
– Easy to lay out
Power Requirements
VDD = VDDnominal – Vdroop
Want Vdroop within ±10% of VDD
Sources of Vdroop
– IR drops
– L di/dt noise
IDD changes on many time scales
[Figure: power vs. time, with max, average, and min levels marked; clock gating annotated.]
IR Drop
A chip draws 24 W from a 1.2 V supply. The power
supply impedance is 5 mΩ. What is the IR drop?
IDD = 24 W / 1.2 V = 20 A
IR drop = (20 A)(5 mΩ) = 100 mV
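The IR drop example can be reproduced with a few lines of Python. This is a minimal sketch using the numbers from the slide; the function name `ir_drop` is hypothetical.

```python
def ir_drop(power_w: float, vdd_v: float, r_supply_ohm: float) -> float:
    """IR drop (V) for a chip drawing power_w from a vdd_v supply."""
    i_dd = power_w / vdd_v            # supply current in amps
    return i_dd * r_supply_ohm


drop = ir_drop(power_w=24.0, vdd_v=1.2, r_supply_ohm=5e-3)
print(f"IR drop = {drop * 1e3:.0f} mV ({100 * drop / 1.2:.1f}% of VDD)")
```

At 100 mV, the drop alone uses most of a 10% droop budget on a 1.2 V supply, before any L di/dt noise is counted.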
IR Introduced Noise
Power Distribution
Power Distribution
Low level distribution is in metal 1.
Power has to be strapped in higher layers of metal.
The spacing is set by IR drop, electromigration, and
inductive effects.
Always use multiple contacts on straps.
Power and Ground Distribution
3 Metal Layers (EV4)
4 Metal Layers (EV5)
6 Metal Layers (EV6)
Power Supply Droop
L di/dt Noise
L di/dt Noise
A 1.2 V chip switches from an idle mode consuming
5 W to a full-power mode consuming 53 W. The
transition takes 10 clock cycles at 1 GHz. The
supply inductance is 0.1 nH. What is the L di/dt
droop?
ΔI = (53 W – 5 W)/(1.2 V) = 40 A
Δt = 10 cycles * (1 ns / cycle) = 10 ns
L di/dt droop = (0.1 nH) * (40 A / 10 ns) = 0.4 V
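The L di/dt example can be checked with a minimal Python sketch. It assumes the current ramps linearly over the transition, as the slide's arithmetic does; the function name `ldidt_droop` is hypothetical.

```python
def ldidt_droop(p_idle_w: float, p_full_w: float, vdd_v: float,
                cycles: int, f_clk_hz: float, l_supply_h: float) -> float:
    """Supply droop (V) from a linear current ramp through the supply inductance."""
    delta_i = (p_full_w - p_idle_w) / vdd_v   # change in supply current (A)
    delta_t = cycles / f_clk_hz               # transition time (s)
    return l_supply_h * delta_i / delta_t


droop = ldidt_droop(p_idle_w=5.0, p_full_w=53.0, vdd_v=1.2,
                    cycles=10, f_clk_hz=1e9, l_supply_h=0.1e-9)
print(f"L di/dt droop = {droop:.2f} V")
```

A 0.4 V droop on a 1.2 V supply is a third of VDD, which motivates the mitigation techniques that follow.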
Dealing with L di/dt
Separate power pins for I/O pads and chip core.
Multiple power and ground pins.
Careful selection of positions of power and ground
pins on package.
Increase rise and fall times as much as possible.
Schedule current consuming transitions.
Use advanced packaging technologies.
Use decoupling capacitances on the board.
Use decoupling capacitances on chip.
Choosing the Right Pin
Decoupling Capacitance
Bypass Capacitors
Need low supply impedance at all frequencies
Ideal capacitors have impedance decreasing with ω
Real capacitors have parasitic R and L
– Leads to resonant frequency of capacitor
[Figure: impedance magnitude (10^-2 to 10^2) vs. frequency (10^4 to 10^10 Hz) for a 1 µF capacitor with 0.25 nH of series inductance and 0.03 Ω of series resistance.]
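The resonance can be sketched in Python by modeling the real capacitor as a series R-L-C branch. The component values (1 µF, 0.25 nH, 0.03 Ω) are the ones labeled in the figure; the series-branch model and the function names are assumptions made for illustration.

```python
import math


def cap_impedance(freq_hz: float, c_f: float, esl_h: float, esr_ohm: float) -> float:
    """|Z| of a capacitor modeled as series R + L + C."""
    w = 2 * math.pi * freq_hz
    return abs(complex(esr_ohm, w * esl_h - 1.0 / (w * c_f)))


def self_resonant_freq(c_f: float, esl_h: float) -> float:
    """Frequency where the inductive and capacitive reactances cancel."""
    return 1.0 / (2 * math.pi * math.sqrt(esl_h * c_f))


C, ESL, ESR = 1e-6, 0.25e-9, 0.03      # values labeled in the figure
f0 = self_resonant_freq(C, ESL)
print(f"self-resonant frequency: {f0 / 1e6:.1f} MHz")
for f in (1e4, 1e6, f0, 1e8, 1e10):
    print(f"{f:10.3g} Hz -> |Z| = {cap_impedance(f, C, ESL, ESR):.3g} ohm")
```

Below resonance the branch looks capacitive and the impedance falls with frequency; above it the parasitic inductance dominates and the impedance rises again, with the series resistance setting the minimum.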
De-coupling Capacitor Ratios
EV4
– total effective switching capacitance = 12.5nF
– 128nF of de-coupling capacitance
– de-coupling/switching capacitance ~ 10x
EV5
– 13.9nF of switching capacitance
– 160nF of de-coupling capacitance
EV6
– 34nF of effective switching capacitance
– 320nF of de-coupling capacitance -- not enough!
Source: B. Herrick (Compaq)
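The ratios implied by these figures can be tabulated with a short Python sketch. The capacitance values are taken from the slide; the dictionary layout is only an illustrative choice.

```python
# (switching capacitance, decoupling capacitance) in nF, as quoted on the slide.
CHIPS = {
    "EV4": (12.5, 128.0),
    "EV5": (13.9, 160.0),
    "EV6": (34.0, 320.0),
}

ratios = {name: decap / switching for name, (switching, decap) in CHIPS.items()}
for name, ratio in ratios.items():
    print(f"{name}: decoupling/switching = {ratio:.1f}x")
```

All three generations sit near 10x, so the "not enough" remark for EV6 reflects the larger current steps of that design rather than a lower ratio.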
EV6 De-coupling Capacitance
Design for ΔIdd = 25 A @ Vdd = 2.2 V, f = 600 MHz
– 0.32-µF of on-chip de-coupling capacitance was
added
• Under major busses and around major gridded clock
drivers
• Occupies 15-20% of die area
– 1-µF 2-cm² Wirebond Attached Chip Capacitor
(WACC) significantly increases “Near-Chip” decoupling
• 160 Vdd/Vss bondwire pairs on the WACC minimize
inductance
Source: B. Herrick (Compaq)
EV6 WACC
389 Signal - 198 VDD/VSS Pins
389 Signal Bondwires
395 VDD/VSS Bondwires
320 VDD/VSS Bondwires
WACC
Microprocessor
Heat Slug
587 IPGA
Source: B. Herrick (Compaq)
Power System Model
Power comes from regulator on system board
– Board and package add parasitic R and L
– Bypass capacitors help stabilize supply voltage
– But capacitors also have parasitic R and L
Simulate system for time and frequency responses
[Figure: power system model. Board: voltage regulator (VDD), bulk capacitor, printed circuit board planes, ceramic capacitor. Package: package and pins, package capacitor. Chip: solder bumps, on-chip capacitor, on-chip current demand.]
Frequency Response
Multiple capacitors in parallel
– Large capacitor near regulator has low impedance
at low frequencies
– But also has a low self-resonant frequency
– Small capacitors near chip and on chip have low
impedance at high frequencies
Choose caps to get low impedance at all frequencies
[Figure: impedance vs. frequency (Hz).]
Example: Pentium 4
Power supply impedance for Pentium 4
– Spike near 100 MHz caused by package L
Step response to sudden supply current change
– 1st droop: on-chip bypass caps
– 2nd droop: package capacitance
– 3rd droop: board capacitance
[Xu08]
[Wong06]
Distributed Model
Charge Pumps
Sometimes a different supply voltage is needed but
little current is required
– 20 V for Flash memory programming
– Negative body bias for leakage control during sleep
Generate the voltage on-chip with a charge pump
Energy Scavenging
Ultra-low power systems can scavenge their energy
from the environment rather than needing batteries
– Solar calculator (solar cells)
– RFID tags (antenna)
– Tire pressure monitors powered by vibrational
energy of tires (piezoelectric generator)
Thin film microbatteries deposited on the chip can
store energy for times of peak demand
Capacitive Cross Talk
Capacitive Cross Talk Dynamic Node
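[Figure: dynamic node Y (capacitance CY), precharged to VDD by CLK, with a pull-down network (PDN) on inputs In1, In2, In3; Y couples through CXY to a neighboring wire X that swings between 2.5 V and 0 V.]
3 × 1 µm overlap: 0.19 V disturbance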
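A floating node coupled to a switching neighbor behaves like a capacitive divider, ΔVY = ΔVX · CXY / (CXY + CY). The Python sketch below illustrates this; the capacitance values (0.5 fF coupling, 6 fF node) are assumptions picked to be consistent with the 0.19 V figure above, since the slide does not list them.

```python
def coupled_disturbance(delta_vx: float, c_xy: float, c_y: float) -> float:
    """Voltage step on a floating node Y when neighbor X swings by delta_vx."""
    return delta_vx * c_xy / (c_xy + c_y)


# Assumed values, not from the slide: 0.5 fF coupling, 6 fF on the dynamic node.
C_XY, C_Y = 0.5e-15, 6e-15
dv_y = coupled_disturbance(delta_vx=2.5, c_xy=C_XY, c_y=C_Y)
print(f"disturbance on Y: {dv_y:.2f} V")

# Coupling fraction implied by the slide's numbers alone (0.19 V out of a 2.5 V swing).
implied_fraction = 0.19 / 2.5
print(f"implied CXY/(CXY+CY): {implied_fraction:.3f}")
```

Because nothing restores the charge on a floating node, the disturbance persists until the next precharge, which is why dynamic nodes are the sensitive case.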
Capacitive Cross Talk Driven Node
[Figure: driven node Y (driver resistance RY, capacitance CY) coupled through CXY to wire X; plot of V (Volt), 0 to 0.5, vs. t (nsec), 0 to 1, for increasing rise time tr of VX.]
$\tau_{XY} = R_Y (C_{XY} + C_Y)$
Keep time-constant smaller than rise time
Dealing with Capacitive Cross Talk
Avoid floating nodes
Protect sensitive nodes
Make rise and fall times as large as possible
Differential signaling
Do not run wires together for a long distance
Use shielding wires
Use shielding layers
Shielding
[Figure: cross-section showing a shielding wire and a shielding layer; labels: GND, VDD, GND, substrate (GND).]
Cross Talk and Performance
- When neighboring lines switch in the opposite direction of the victim line, delay increases
DELAY DEPENDENT UPON ACTIVITY IN NEIGHBORING WIRES
Miller Effect
- Both terminals of the coupling capacitor Cc are switched in opposite directions (0 → Vdd, Vdd → 0)
- Effective voltage is doubled and additional charge is needed (from Q = CV)
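The Miller effect is often captured with a multiplier on the coupling capacitance: 0 when the neighbor switches with the victim, 1 when it is quiet, and 2 when it switches the opposite way. The Python sketch below applies that model; the 50 fF values are hypothetical and chosen only for illustration.

```python
# Multiplier applied to the coupling capacitance Cc for each neighbor activity.
MILLER_FACTOR = {"same": 0.0, "quiet": 1.0, "opposite": 2.0}


def effective_capacitance(c_gnd_ff: float, c_c_ff: float, neighbor: str) -> float:
    """Effective load (fF) seen by the victim's driver."""
    return c_gnd_ff + MILLER_FACTOR[neighbor] * c_c_ff


# Hypothetical wire: 50 fF to ground, 50 fF to one neighbor.
loads = {activity: effective_capacitance(50.0, 50.0, activity)
         for activity in MILLER_FACTOR}
for activity, load in loads.items():
    print(f"neighbor {activity:8s}: {load:.0f} fF")
```

With equal ground and coupling capacitance, the load (and roughly the RC delay) varies by a factor of three depending on what the neighbor does.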
Impact of Cross Talk on Delay
r is ratio between capacitance to GND and to neighbor
Dealing with Cross-Talk
Evaluate and improve
Constructive layout generation
Predictable structures
Avoid worst case patterns
Structured Predictable Interconnect
[Figure: wire fabric in which signal wires (S) are interleaved with VDD (V) and GND (G) wires.]
Example: Dense Wire Fabric ([Sunil Khatri])
Trade-off:
• Cross-coupling capacitance 40x lower, 2% delay variation
• Increase in area and overall capacitance
Also: FPGAs, VPGAs
Clock Distribution
On a small chip, the clock distribution network is just
a wire
– And possibly an inverter for clkb
On practical chips, the RC delay of the wire
resistance and gate load is very long
– Variations in this delay cause clock to get to
different elements at different times
– This is called clock skew
Most chips use repeaters to buffer the clock and
equalize the delay
– Reduces but doesn’t eliminate skew
Example
Example
Skew comes from differences in gate and wire delay
– With right buffer sizing, clk1 and clk2 could ideally
arrive at the same time.
– But power supply noise changes buffer delays
– clk2 and clk3 will always see RC skew
[Figure: clock example. gclk; 3 mm wire to clk1 (1.3 pF load); 3.1 mm wire to clk2 (0.4 pF load); 0.5 mm wire to clk3 (0.4 pF load).]
Clock Uncertainties
Clock Nonidealities
Clock skew
– Spatial variation in temporally equivalent clock
edges; deterministic + random, tSK
Clock jitter
– Temporal variations in consecutive edges of the
clock signal; modulation + random noise
– Cycle-to-cycle (short-term) tJS
– Long term tJL
Variation of the pulse width
– Important for level sensitive clocking
Review: Skew Impact
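[Figure: flip-flops F1 and F2 clocked by clk, with combinational logic (CL) between Q1 and D2; timing diagrams for the setup case (Tc, tpcq, tpd, tsetup, tskew) and the hold case (tccq, tcd, thold, tskew).]
$t_{pd} \le T_c - (t_{pcq} + t_{setup} + t_{skew})$, where the term in parentheses is the sequencing overhead
$t_{cd} \ge t_{hold} - t_{ccq} + t_{skew}$
Ideally full cycle is available for work
Skew adds sequencing overhead
Increases hold time too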
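The two flip-flop constraints, t_pd ≤ Tc − (t_pcq + t_setup + t_skew) and t_cd ≥ t_hold − t_ccq + t_skew, can be sketched in Python as follows. All timing values are hypothetical, in picoseconds, and chosen only to show the effect of skew.

```python
def max_logic_delay(t_c: int, t_pcq: int, t_setup: int, t_skew: int) -> int:
    """Setup constraint: largest allowed propagation delay t_pd (ps)."""
    return t_c - (t_pcq + t_setup + t_skew)


def min_contamination_delay(t_hold: int, t_ccq: int, t_skew: int) -> int:
    """Hold constraint: smallest allowed contamination delay t_cd (ps)."""
    return t_hold - t_ccq + t_skew


# Hypothetical flip-flop parameters in ps (not from the lecture).
T_C, T_PCQ, T_SETUP, T_HOLD, T_CCQ = 1000, 80, 50, 30, 40

for skew in (0, 50):
    t_pd_max = max_logic_delay(T_C, T_PCQ, T_SETUP, skew)
    t_cd_min = min_contamination_delay(T_HOLD, T_CCQ, skew)
    print(f"skew {skew:3d} ps: t_pd <= {t_pd_max} ps, t_cd >= {t_cd_min} ps")
```

In this example, 50 ps of skew removes 50 ps of useful cycle time and turns a hold requirement that was met with margin (a negative bound) into a positive minimum delay.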
Solutions
Reduce clock skew
– Careful clock distribution network design
– Plenty of metal wiring resources
Analyze clock skew
– Only budget actual, not worst case skews
– Local vs. global skew budgets
Tolerate clock skew
– Choose circuit structures insensitive to skew
Clock Dist. Networks
Ad hoc
Grids
H-tree
Hybrid
H-Trees
Fractal structure
– Gets clock arbitrarily close to any point
– Matched delay along all paths
Delay variations cause skew
– A and B might see big skew
[Figure: H-tree with points A and B marked.]
More realistic H-tree
[Restle98]
Itanium 2 H-Tree
Four levels of buffering:
– Primary driver
– Repeater
– Second-level clock buffer (SLCB)
– Gater
Route around obstructions
[Figure: Itanium 2 H-tree showing the primary buffer, repeaters, and typical SLCB locations.]
Itanium 2 Repeaters
Spines
Pentium IV Clock Spines
Pentium IV Clock Spines
Clock Grids
Use grid on two or more levels to carry clock
Make wires wide to reduce RC delay
Ensures low skew between nearby points
But possibly large skew across die
The Grid System
[Figure: clock grid with four GCLK drivers.]
• No RC matching
• Large power
Alpha Clock Grids
Alpha 21064
Alpha 21164
Alpha 21264
[Figure: gclk grid layouts for the Alpha 21064, 21164, and 21264; PLL marked.]
Example: DEC Alpha 21164
Clock Frequency: 300 MHz - 9.3 Million Transistors
Total Clock Load: 3.75 nF
Power in Clock Distribution network : 20 W (out of 50)
Uses Two Level Clock Distribution:
• Single 6-stage driver at center of chip
• Secondary buffers drive left and right side
clock grid in Metal3 and Metal4
Total driver size: 58 cm!
21164 Clocking
tcycle = 3.3 ns
tskew = 150 ps
trise = 0.35 ns
2 phase single wire clock, distributed globally
2 distributed driver channels
– Reduced RC delay/skew
– Improved thermal distribution
– 3.75 nF clock load
– 58 cm final driver width
Local inverters for latching
Conditional clocks in caches to reduce power
More complex race checking
Device variation
[Figure: clock waveform; location of clock drivers on die (pre-driver, final drivers).]
Clock Drivers
Clock Skew in Alpha Processor
EV6 (Alpha 21264) Clocking 600 MHz – 0.35 micron CMOS
tcycle = 1.67 ns
trise = 0.35 ns
tskew = 50 ps
2 Phase, with multiple conditional buffered clocks
– 2.8 nF clock load
– 40 cm final driver width
Local clocks can be gated “off” to save power
Reduced load/skew
Reduced thermal issues
Multiple clocks complicate race checking
[Figure: global clock waveform; PLL location.]
21264 Clocking
EV6 Clock Results
[Figure: GCLK skew map (at Vdd/2 crossings), scale 5 to 50 ps; GCLK rise time map (20% to 80%, extrapolated to 0% to 100%), scale 300 to 345 ps.]
EV7 Clock Hierarchy
Active Skew Management and Multiple Clock Domains
+ widely dispersed drivers
+ DLLs compensate static and low-frequency variation
+ divides design and verification effort
+ tailored clocks
- DLL design and verification is added work
[Figure: clock domains GCLK (CPU core), NCLK (Mem Ctrl), L2L_CLK and L2R_CLK (L2 cache), with SYSCLK, a PLL, and three DLLs.]
Hybrid Networks
Use H-tree to distribute clock to many points
Tie these points together with a grid
Ex: IBM Power4, PowerPC
– H-tree drives 16-64 sector buffers
– Buffers drive total of 1024 points
– All points shorted together with grid
Clock Gaters
Adaptive Deskewing
Self-timed and Asynchronous Design
Functions of clock in synchronous design
1) Acts as completion signal
2) Ensures the correct ordering of events
Truly asynchronous design
1) Completion is ensured by careful timing analysis
2) Ordering of events is implicit in logic
Self-timed design
1) Completion ensured by completion signal
2) Ordering imposed by handshaking protocol
Self-Timed Pipelined Datapath
[Figure: self-timed pipeline from In to Out. Registers R1, R2, R3 feed logic blocks F1, F2, F3 (delays tpF1, tpF2, tpF3); handshake (HS) modules exchange Req/Ack between stages and Start/Done with each logic block.]
Completion Signal Generation
[Figure: logic network (In to Out) in parallel with a delay module (Start to Done).]
Using Delay Element (e.g. in memories)
Completion Signal Generation
Using Redundant Signal Encoding
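One common redundant encoding is dual-rail, where each bit uses two wires and the all-zero pair means "no data yet". The Python sketch below models completion detection for that scheme. The specific encoding ((1, 0) for logic 1, (0, 1) for logic 0, (0, 0) for not ready) is an assumption for illustration, since the slide does not spell it out.

```python
NOT_READY = (0, 0)   # assumed spacer value: neither rail asserted


def encode_dual_rail(bits: list[int]) -> list[tuple[int, int]]:
    """Encode each bit as a (true_rail, false_rail) pair."""
    return [(1, 0) if b else (0, 1) for b in bits]


def done(word: list[tuple[int, int]]) -> bool:
    """Completion: OR the two rails of each bit, then AND across all bits."""
    return all(bool(t or f) for t, f in word)


result = encode_dual_rail([1, 0, 1, 1])
partial = [result[0], NOT_READY, result[2], result[3]]  # one bit still evaluating

print("all bits valid:", done(result))
print("one bit pending:", done(partial))
```

The completion signal rises only once every bit has left the not-ready state, so it tracks the actual data-dependent delay instead of a worst-case estimate.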
Completion Signal in DCVSL
Self-Timed Adder
[Figure: self-timed adder. (a) Differential carry generation: carry chains precharged by Start, built from propagate (P0–P3), generate (G0–G3), and kill (K0–K3) signals, producing C1–C4 and their complements from C0. (b) Completion signal: Done derived from the carry pairs C0–C4 and Start.]
Completion Signal Using Current Sensing