Lecture 7: Power
Outline
 Power and Energy
 Dynamic Power
 Static Power
7: Power
CMOS VLSI Design 4th Ed.
Power and Energy
 Power is drawn from a voltage source attached to
the VDD pin(s) of a chip.
 Instantaneous Power: $P(t) = I(t)\,V(t)$
 Energy: $E = \int_0^T P(t)\,dt$
 Average Power: $P_{avg} = \frac{E}{T} = \frac{1}{T}\int_0^T P(t)\,dt$
Power in Circuit Elements
$P_{VDD}(t) = I_{DD}(t)\,V_{DD}$
$P_R(t) = \frac{V_R^2(t)}{R} = I_R^2(t)\,R$
$E_C = \int_0^\infty I(t)\,V(t)\,dt = \int_0^\infty C\,\frac{dV}{dt}\,V(t)\,dt = C\int_0^{V_C} V\,dV = \frac{1}{2} C V_C^2$
Charging a Capacitor
 When the gate output rises
– Energy stored in capacitor is $E_C = \frac{1}{2} C_L V_{DD}^2$
– But energy drawn from the supply is $E_{VDD} = \int_0^\infty I(t)\,V_{DD}\,dt = \int_0^\infty C_L\,\frac{dV}{dt}\,V_{DD}\,dt = C_L V_{DD} \int_0^{V_{DD}} dV = C_L V_{DD}^2$
– Half the energy from VDD is dissipated in the pMOS transistor as heat, other half stored in capacitor
 When the gate output falls
– Energy in capacitor is dumped to GND
– Dissipated as heat in the nMOS transistor
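A quick numeric check of the energy bookkeeping for one rising transition, as a minimal Python sketch. The values CL = 150 fF and VDD = 1.0 V are taken from the switching example in this lecture; the function name is illustrative only.

```python
def charging_energies(c_load, vdd):
    """Return (supply, stored, heat) energies in joules for one rising output transition."""
    e_supply = c_load * vdd ** 2          # E_VDD = C_L * VDD^2, drawn from the supply
    e_stored = 0.5 * c_load * vdd ** 2    # E_C = 1/2 * C_L * VDD^2, left on the capacitor
    e_heat = e_supply - e_stored          # remainder dissipated in the pMOS transistor
    return e_supply, e_stored, e_heat


e_supply, e_stored, e_heat = charging_energies(150e-15, 1.0)
print(f"supply: {e_supply:.3e} J, stored: {e_stored:.3e} J, heat: {e_heat:.3e} J")
```

The stored half is later dissipated in the nMOS transistor when the output falls, so a full rise/fall cycle costs CL·VDD² in total.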
Switching Waveforms
 Example: VDD = 1.0 V, CL = 150 fF, f = 1 GHz
Switching Power
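$P_{switching} = \frac{1}{T}\int_0^T i_{DD}(t)\,V_{DD}\,dt = \frac{V_{DD}}{T}\int_0^T i_{DD}(t)\,dt = \frac{V_{DD}}{T}\left[T f_{sw} C V_{DD}\right] = C V_{DD}^2 f_{sw}$
[Figure: supply VDD delivering current iDD(t) to a capacitance C that switches at frequency fsw]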
Activity Factor
 Suppose the system clock frequency = f
 Let fsw = af, where a = activity factor
– If the signal is a clock, a = 1
– If the signal switches once per cycle, a = ½
 Dynamic power:
$P_{switching} = a C V_{DD}^2 f$
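The dynamic power formula can be tried on the numbers from the Switching Waveforms example (VDD = 1.0 V, CL = 150 fF, f = 1 GHz). This is a minimal sketch; the function name and the choice of the two activity factors to compare are illustrative.

```python
def switching_power(activity, cap, vdd, freq):
    """P_switching = a * C * VDD^2 * f, in watts."""
    return activity * cap * vdd ** 2 * freq


# A clock node (a = 1) versus a node that switches once per cycle (a = 1/2).
p_clock = switching_power(1.0, 150e-15, 1.0, 1e9)
p_data = switching_power(0.5, 150e-15, 1.0, 1e9)
print(f"clock node: {p_clock * 1e6:.1f} uW, data node: {p_data * 1e6:.1f} uW")
```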
Short Circuit Current
 When transistors switch, both nMOS and pMOS
networks may be momentarily ON at once
 Leads to a blip of “short circuit” current.
 < 10% of dynamic power if rise/fall times are
comparable for input and output
 We will generally ignore this component
Power Dissipation Sources
 Ptotal = Pdynamic + Pstatic
 Dynamic power: Pdynamic = Pswitching + Pshortcircuit
– Switching load capacitances
– Short-circuit current
 Static power: Pstatic = (Isub + Igate + Ijunct + Icontention)VDD
– Subthreshold leakage
– Gate leakage
– Junction leakage
– Contention current
Power Dissipation
 Power dissipation breakdown in the Niagara 2 processor (Sun, 8 cores, 84 W)
Dynamic Power Example
 1 billion transistor chip
– 50M logic transistors
• Average width: 12 λ
• Activity factor = 0.1
– 950M memory transistors
• Average width: 4 λ
• Activity factor = 0.02
– 1.0 V 65 nm process
– C = 1 fF/μm (gate) + 0.8 fF/μm (diffusion)
 Estimate dynamic power consumption @ 1 GHz.
Neglect wire capacitance and short-circuit current.
Solution
$C_{logic} = (50 \times 10^6)(12\lambda)(0.025\ \mu m/\lambda)(1.8\ fF/\mu m) = 27\ nF$
$C_{mem} = (950 \times 10^6)(4\lambda)(0.025\ \mu m/\lambda)(1.8\ fF/\mu m) = 171\ nF$
$P_{dynamic} = \left[0.1\,C_{logic} + 0.02\,C_{mem}\right](1.0)^2(1.0\ GHz) = 6.1\ W$
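The same estimate can be reproduced in a few lines of Python. This is a sketch of the arithmetic only; the constant and function names are illustrative, and all numbers come from the example statement.

```python
LAMBDA_UM = 0.025      # um per lambda in the 65 nm process
CAP_PER_UM = 1.8e-15   # F/um: 1 fF/um gate + 0.8 fF/um diffusion


def total_cap(n_transistors, width_lambda):
    """Total switched capacitance in farads for n transistors of the given average width."""
    return n_transistors * width_lambda * LAMBDA_UM * CAP_PER_UM


c_logic = total_cap(50e6, 12)    # logic transistors
c_mem = total_cap(950e6, 4)      # memory transistors

vdd, freq = 1.0, 1e9
p_dynamic = (0.1 * c_logic + 0.02 * c_mem) * vdd ** 2 * freq
print(f"C_logic = {c_logic * 1e9:.0f} nF, C_mem = {c_mem * 1e9:.0f} nF, P = {p_dynamic:.2f} W")
```

Although memory holds 95% of the transistors, its low activity factor keeps its share of the dynamic power close to that of the logic.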
Dynamic Power Reduction
$P_{switching} = a C V_{DD}^2 f$
 Try to minimize:
– Activity factor
– Capacitance
– Supply voltage
– Frequency
Activity Factor Estimation
 Let Pi = Prob(node i = 1)
– $\bar{P}_i = 1 - P_i$ = Prob(node i = 0)
 $a_i = \bar{P}_i \cdot P_i$
 Completely random data has P = 0.5 and a = 0.25
 Data is often not completely random
– e.g. upper bits of 64-bit words representing bank
account balances are usually 0
 Data propagating through ANDs and ORs has lower
activity factor
– Depends on design, but typically a ≈ 0.1
Switching Probability
Example
 A 4-input AND is built out of two levels of gates
 Estimate the activity factor at each node if the inputs
have P = 0.5
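One way to work this example is sketched below. The original figure is not available here, so the topology is an assumption: two NAND2 gates feeding a NOR2, which together form a 4-input AND. The inputs are also assumed to be independent, and each node's activity factor is taken as P(1 - P).

```python
def activity(p_one):
    """Activity factor a = P * (1 - P) for a node that is 1 with probability P."""
    return p_one * (1.0 - p_one)


def p_nand2(pa, pb):
    """Probability that a NAND2 output is 1, for independent inputs."""
    return 1.0 - pa * pb


def p_nor2(pa, pb):
    """Probability that a NOR2 output is 1, for independent inputs."""
    return (1.0 - pa) * (1.0 - pb)


p_in = 0.5
p_n1 = p_nand2(p_in, p_in)    # first-level NAND2 (assumed topology)
p_n2 = p_nand2(p_in, p_in)    # second first-level NAND2
p_y = p_nor2(p_n1, p_n2)      # NOR2 output = AND of all four inputs

for name, p in [("input", p_in), ("n1", p_n1), ("n2", p_n2), ("y", p_y)]:
    print(f"{name}: P = {p:.4f}, a = {activity(p):.4f}")
```

Under these assumptions the activity factor falls from 1/4 at the inputs to 3/16 at the internal nodes and 15/256 at the output.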
Example
 Compare the two cases below:
Clock Gating
 The best way to reduce the activity is to turn off the
clock to registers in unused blocks
– Saves clock activity (a = 1)
– Eliminates all switching activity in the block
– Requires determining if block will be used
Capacitance
 Gate capacitance
– Fewer stages of logic
– Small gate sizes
 Wire capacitance
– Good floorplanning to keep communicating
blocks close to each other
– Drive long wires with inverters or buffers rather
than complex gates
Voltage / Frequency
 Run each block at the lowest possible voltage and
frequency that meets performance requirements
 Voltage Domains
– Provide separate supplies to different blocks
– Level converters required when crossing
from low to high VDD domains
 Dynamic Voltage Scaling
– Adjust VDD and f according to
workload
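A rough sense of why scaling voltage and frequency together pays off, as a minimal sketch. It assumes dynamic power follows a·C·VDD²·f with a and C unchanged, and it ignores leakage. The scale factors 0.8 and 0.5 are made-up illustration values, not figures from the lecture.

```python
def relative_dynamic_power(vdd_scale, freq_scale):
    """Dynamic power relative to nominal when VDD and f are scaled (a and C held fixed)."""
    return vdd_scale ** 2 * freq_scale


# Hypothetical operating point: 80% of nominal VDD at 50% of nominal frequency.
ratio = relative_dynamic_power(0.8, 0.5)
print(f"dynamic power is {ratio:.0%} of nominal")
```

Halving the frequency alone would only halve the power; the additional saving comes from the quadratic dependence on VDD.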
Voltage Domains
 The easiest approach is to associate each block in a
floorplan with a voltage
 You can also perform clustered voltage scaling
Voltage Domains
 Dynamic voltage scaling
Short Circuit Currents
[Figure: circuit with supply Vdd, input Vin, output Vout, and load CL; plot of supply current IVDD (mA), 0 to 0.15, versus Vin (V), 0 to 5.0]
How to keep Short-Circuit Currents Low?
Short-circuit current goes to zero if tfall >> trise,
but this cannot be done for cascaded logic, so ...
Minimizing Short-Circuit Power
[Figure: normalized power Pnorm (0 to 8) versus tsin/tsout (0 to 5) for Vdd = 3.3, 2.5, and 1.5]
Resonant Circuits
 Especially useful in clocking. IBM has demonstrated
resonant clocking for a practical processor.
Static Power
 Static power is consumed even when chip is
quiescent.
– Leakage draws power from nominally OFF
devices
– Ratioed circuits burn power in fight between ON
transistors
Static Power Example
 Revisit power estimation for 1 billion transistor chip
 Estimate static power consumption
– Subthreshold leakage
• Normal Vt: 100 nA/μm
• High Vt: 10 nA/μm
• High Vt used in all memories and in 95% of logic gates
– Gate leakage: 5 nA/μm
– Junction leakage: negligible
Solution
$W_{normal\text{-}V_t} = (50 \times 10^6)(12\lambda)(0.025\ \mu m/\lambda)(0.05) = 0.75 \times 10^6\ \mu m$
$W_{high\text{-}V_t} = \left[(50 \times 10^6)(12\lambda)(0.95) + (950 \times 10^6)(4\lambda)\right](0.025\ \mu m/\lambda) = 109.25 \times 10^6\ \mu m$
$I_{sub} = \left[W_{normal\text{-}V_t} \times 100\ nA/\mu m + W_{high\text{-}V_t} \times 10\ nA/\mu m\right] / 2 = 584\ mA$
$I_{gate} = \left[(W_{normal\text{-}V_t} + W_{high\text{-}V_t}) \times 5\ nA/\mu m\right] / 2 = 275\ mA$
$P_{static} = (584\ mA + 275\ mA)(1.0\ V) = 859\ mW$
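The static power estimate can be checked with a short script. This is a sketch of the arithmetic only; variable names are illustrative. The factor of 1/2 is the one used in the solution, presumably because about half the transistors are OFF (and leak subthreshold current) while the other half are ON (and leak gate current) at any given time.

```python
LAMBDA_UM = 0.025                      # um per lambda in the 65 nm process

w_logic = 50e6 * 12 * LAMBDA_UM        # total logic transistor width, um
w_mem = 950e6 * 4 * LAMBDA_UM          # total memory transistor width, um

w_normal_vt = 0.05 * w_logic           # 5% of logic uses normal Vt
w_high_vt = 0.95 * w_logic + w_mem     # remaining logic plus all memory uses high Vt

# Leakage per um of width, in amperes (values from the example statement).
i_sub = (w_normal_vt * 100e-9 + w_high_vt * 10e-9) / 2
i_gate = (w_normal_vt + w_high_vt) * 5e-9 / 2

vdd = 1.0
p_static = (i_sub + i_gate) * vdd
print(f"I_sub = {i_sub * 1e3:.0f} mA, I_gate = {i_gate * 1e3:.0f} mA, "
      f"P_static = {p_static * 1e3:.0f} mW")
```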
Subthreshold Leakage
 For Vds > 50 mV
$I_{sub} = I_{off}\,10^{\frac{V_{gs} + \eta (V_{ds} - V_{DD}) - k_\gamma V_{sb}}{S}}$
 Ioff = leakage at Vgs = 0, Vds = VDD
Typical values in 65 nm
Ioff = 100 nA/μm @ Vt = 0.3 V
Ioff = 10 nA/μm @ Vt = 0.4 V
Ioff = 1 nA/μm @ Vt = 0.5 V
η = 0.1
kγ = 0.1
S = 100 mV/decade
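The leakage model can be explored numerically with a small function. This is a minimal sketch that assumes the exponent has the form [Vgs + η(Vds - VDD) - kγ·Vsb] / S and uses the typical 65 nm values listed above as defaults; the function and parameter names are illustrative.

```python
def i_sub(i_off, vgs, vds, vsb, vdd=1.0, eta=0.1, k_gamma=0.1, s=0.1):
    """Subthreshold leakage, valid for Vds > 50 mV.

    i_off is the leakage at Vgs = 0, Vds = VDD; s is the subthreshold slope in V/decade.
    """
    exponent = (vgs + eta * (vds - vdd) - k_gamma * vsb) / s
    return i_off * 10 ** exponent


nominal = i_sub(100e-9, vgs=0.0, vds=1.0, vsb=0.0)         # reference condition
negative_gate = i_sub(100e-9, vgs=-0.1, vds=1.0, vsb=0.0)  # gate 100 mV below source
print(f"nominal: {nominal * 1e9:.1f} nA/um, Vgs = -0.1 V: {negative_gate * 1e9:.1f} nA/um")
```

With S = 100 mV/decade, pulling the gate 100 mV below the source cuts leakage by one decade.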
Stack Effect
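 Series OFF transistors have less leakage
– Vx > 0, so N2 has negative Vgs
$I_{sub} = I_{off}\,10^{\frac{\eta (V_x - V_{DD})}{S}}$ (through N1) $= I_{off}\,10^{\frac{-V_x + \eta\left((V_{DD} - V_x) - V_{DD}\right) - k_\gamma V_x}{S}}$ (through N2)
$V_x = \frac{\eta V_{DD}}{1 + 2\eta + k_\gamma}$
$I_{sub} = I_{off}\,10^{\frac{-\eta V_{DD}\left(\frac{1 + \eta + k_\gamma}{1 + 2\eta + k_\gamma}\right)}{S}} \approx I_{off}\,10^{\frac{-\eta V_{DD}}{S}}$
[Figure: series OFF transistors N1 and N2 with intermediate node Vx and supply VDD]
– Leakage through 2-stack reduces ~10x
– Leakage through 3-stack reduces further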
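The ~10x figure can be checked numerically. The sketch below assumes the intermediate node settles at Vx = η·VDD / (1 + 2η + kγ), the value at which the two series transistors carry equal current, and uses η = kγ = 0.1, S = 100 mV/decade, and VDD = 1.0 V. Function names are illustrative.

```python
def stack_vx(vdd, eta=0.1, k_gamma=0.1):
    """Intermediate node voltage of a 2-stack of OFF transistors."""
    return eta * vdd / (1 + 2 * eta + k_gamma)


def stack_leakage_ratio(vdd, eta=0.1, k_gamma=0.1, s=0.1):
    """Leakage of the 2-stack relative to I_off of a single OFF transistor."""
    vx = stack_vx(vdd, eta, k_gamma)
    # Bottom transistor N1: Vgs = 0, Vds = Vx, Vsb = 0.
    return 10 ** (eta * (vx - vdd) / s)


vx = stack_vx(1.0)
ratio = stack_leakage_ratio(1.0)
print(f"Vx = {vx * 1e3:.0f} mV, stack leakage = {ratio:.3f} x I_off")
```

The result is a little over one tenth of the single-transistor leakage, in line with the ~10x reduction quoted for a 2-stack.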
Threshold Effect
Leakage Control
 Leakage and delay trade off
– Aim for low leakage in sleep and low delay in
active mode
 To reduce leakage:
– Increase Vt: multiple Vt
• Use low Vt only in critical circuits
– Increase Vs: stack effect
• Input vector control in sleep
– Decrease Vb
• Reverse body bias in sleep
• Or forward body bias in active mode
Gate Leakage
 Extremely strong function of tox and Vgs
– Negligible for older processes
– Approaches subthreshold leakage at 65 nm and
below in some processes
 An order of magnitude less for pMOS than nMOS
 Control leakage in the process using tox > 10.5 Å
– High-k gate dielectrics help
– Some processes provide multiple tox
• e.g. thicker oxide for 3.3 V I/O transistors
 Control leakage in circuits by limiting VDD
NAND3 Leakage Example
 100 nm process
Ign = 6.3 nA
Igp = 0
Ioffn = 5.63 nA Ioffp = 9.3 nA
Data from [Lee03]
Junction Leakage
 From reverse-biased p-n junctions
– Between diffusion and substrate or well
 Ordinary diode leakage is negligible
 Band-to-band tunneling (BTBT) can be significant
– Especially in high-Vt transistors where other
leakage is small
– Worst at Vdb = VDD
 Gate-induced drain leakage (GIDL) exacerbates junction leakage
– Worst for Vgd = -VDD (or more negative)
Power Gating
 Turn OFF power to blocks when they are idle to
save leakage
– Use virtual VDD (VDDV)
– Gate outputs to prevent
invalid logic levels to next block
 Voltage drop across sleep transistor degrades
performance during normal operation
– Size the transistor wide enough to minimize
impact
 Switching wide sleep transistor costs dynamic power
– Only justified when circuit sleeps long enough
Power Gating
 When a block is gated, the state must either be saved or reset
upon power-up.
– Either use registers with a second VDD.
– Or save everything to memory.
 Power gating may be done externally with a disable input to a
voltage regulator or internally with high VT header or footer
switches.
 External power gating eliminates leakage altogether, but it
takes a long time and significant energy.
 The power transistor actually consists of many transistors in
parallel, which should be controlled individually to combat L di/dt
noise and IR drops.
 Also, best Ion/Ioff is obtained for specific L and W values.
Multiple Thresholds
 Selective application of multiple threshold voltages
can maintain performance on critical paths with low-Vt
transistors while reducing leakage on other paths
with high-Vt transistors.
 Using multiple thresholds adds to the cost of the
process.
 One can alternatively use non-minimum L transistors
for non-critical paths, thus raising the threshold
voltages via the short-channel effect.
 For example, in Intel’s 65 nm process, making transistors
10% longer reduces Ion by 10% but reduces Ioff by a factor of 3.
Variable Thresholds
 Using body bias, one can dynamically adjust
threshold voltages.
 This is called variable threshold CMOS (VTCMOS).
 Use low-Vt devices and reverse body bias during
sleep.
 Alternatively, use high-Vt devices and forward body
bias during operation.
 Too much reverse body bias (e.g. < -1.2 V) leads to
greater junction leakage due to BTBT.
 Too much forward body bias (e.g. > 0.4 V) leads to
large current through the body-to-source diodes.
Variable Thresholds
 Below is an n-well process with body bias.
 Normally, triple well processes should be utilized.
Input Vector Control
 Applying the input pattern that consumes the least power
during sleep can minimize the leakage power in that
block.
 Be aware that applying this pattern itself dissipates
power.
Energy-Delay Optimization
 What is the best choice for VDD and Vt in a certain
technology and application?
 What does “best” mean?
 Let us start with minimum energy.
 Energy corresponds to the power-delay product (PDP).
 The minimum occurs in the subthreshold region where VDD < Vt.
 Von Neumann argued that the minimum energy could be found from
thermodynamics and is kT ln 2.
 Meindl found the minimum voltage at which an inverter
can operate by equating the slope at the
switching point to -1.
Energy-Delay Optimization
 Meindl took n = 1 for subthreshold operation.
 The minimum voltage turns out to be $V_{min} = 2\,\frac{kT}{q}\,\ln 2 \approx 36\ \text{mV at 300 K}$
 The energy stored on the gate capacitance of a
MOSFET is $E = \frac{Q V_{DD}}{2}$
 The minimum charge is q.
 Emin = kT ln 2 = 2.9 × 10^-21 J.
 For comparison: 0.5 μm, 5 V process: 1.5 × 10^-13 J; 65 nm, 1 V process: 3 × 10^-16 J.
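These limits are easy to evaluate numerically. The sketch below assumes Vmin = 2(kT/q) ln 2 and Emin = q·Vmin/2 = kT ln 2, with the SI values of the Boltzmann constant and the electron charge; the function names are illustrative.

```python
import math

K_B = 1.380649e-23       # Boltzmann constant, J/K
Q_E = 1.602176634e-19    # electron charge, C


def v_min(temp_k):
    """Minimum supply voltage for n = 1: 2 * (kT/q) * ln 2."""
    return 2 * (K_B * temp_k / Q_E) * math.log(2)


def e_min(temp_k):
    """Minimum switching energy for a single electron charge: q * Vmin / 2 = kT ln 2."""
    return Q_E * v_min(temp_k) / 2


print(f"Vmin = {v_min(300) * 1e3:.1f} mV, Emin = {e_min(300):.2e} J")
```

The quoted per-transition energies for real processes (1.5 × 10^-13 J and 3 × 10^-16 J) are five to eight orders of magnitude above this limit.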
Energy-Delay Optimization
 However, this situation does not really minimize
energy because the circuits run so slowly that the
leakage energy dominates.
 The true minimum energy is at a point where
switching and leakage energies are balanced.
 In subthreshold operation, current drops
exponentially as VDD - Vt decreases, while switching energy improves
quadratically with VDD.
 Ignoring DIBL, gate and junction leakage, and short
circuit power one can find the minimum energy point
easily.
Minimum Energy
 The delay of N gates operating in subthreshold
region is given by $D = \frac{N k C_g V_{DD}}{I_{off}\,10^{V_{DD}/S}}$
 The energy consumed in one cycle is
$E_{switching} = C_{eff} V_{DD}^2$
$E_{leak} = I_{sub} V_{DD} D = W_{eff} N k C_g\,10^{-V_{DD}/S}\,V_{DD}^2$
$E_{total} = E_{switching} + E_{leak}$
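Since the closed-form minimum is awkward, a numerical sweep is one way to locate it. The sketch below assumes E_total = C_eff·VDD² + (W_eff·N·k·C_g)·VDD²·10^(-VDD/S) and lumps W_eff·N·k·C_g into a single parameter, leak_cap. The values 1 fF and 1000 fF are hypothetical, chosen only so that leakage matters. The search is bounded below at an assumed 0.1 V because this model goes to zero as VDD goes to 0, where real circuits no longer function.

```python
def total_energy(vdd, c_eff, leak_cap, s=0.1):
    """Switching plus leakage energy per cycle; leak_cap stands for W_eff * N * k * C_g."""
    e_switching = c_eff * vdd ** 2
    e_leak = leak_cap * vdd ** 2 * 10 ** (-vdd / s)
    return e_switching + e_leak


def min_energy_vdd(c_eff, leak_cap, s=0.1, v_lo=0.1, v_hi=1.0, steps=901):
    """Grid search for the supply voltage with the lowest total energy in [v_lo, v_hi]."""
    grid = [v_lo + i * (v_hi - v_lo) / (steps - 1) for i in range(steps)]
    return min(grid, key=lambda v: total_energy(v, c_eff, leak_cap, s))


# Hypothetical parameters: C_eff = 1 fF, lumped leakage term = 1000 fF.
v_opt = min_energy_vdd(c_eff=1e-15, leak_cap=1000e-15)
print(f"minimum-energy VDD is about {v_opt:.2f} V")
```

Lowering the activity factor shrinks C_eff relative to the leakage term, which pushes the minimum-energy point toward a higher VDD.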
Minimum Energy
 Note that this equation depends on switching
activity.
 Also, only inverters were used in the analysis.
 Other gates can also be considered.
 Temperature affects the behavior strongly.
 Even in this case, taking the derivative and equating
to zero yields messy equations.
 Contour plots are more informative.
Minimum Energy
a= 1
7: Power
a = 0.1
CMOS VLSI Design 4th Ed.
55
Minimum Energy
 The minimum energy points are not practical:
the energy decreases by about 10 times,
but the frequency decreases by 10,000 to 100,000
times.
 A better alternative that takes both energy
and speed into account is the energy-delay product (EDP).
Minimum EDP
 First, ignore leakage.
 Use the alpha-power law to include velocity
saturation.
 EDP is given by
$EDP = k\,\frac{C_{eff}^2\,V_{DD}^3}{(V_{DD} - V_t)^{\alpha}}$
 Differentiating with respect to VDD and setting to zero gives
$V_{DD\text{-}opt} = \frac{3}{3 - \alpha}\,V_t$
Minimum EDP
 The alpha-power-law exponent α is typically between 1 and 2.
 Hence, the optimum VDD is around 2Vt.
 Differentiating with respect to Vt gives the optimum
Vt to be zero.
 This is because leakage was neglected.
 Leakage should also be introduced and the equation
should be solved again.
 The results are messy, but can be described in
terms of contour plots.
 The dashed lines represent speed normalized to the
minimum EDP point.
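The leakage-free optimum can be tabulated directly. This is a minimal sketch assuming VDD-opt = 3·Vt / (3 - α), the result of setting the derivative of the EDP expression to zero, where α is the alpha-power-law exponent. The threshold voltage of 0.3 V is only an example value.

```python
def vdd_opt(vt, alpha):
    """Supply voltage that minimizes EDP when leakage is ignored."""
    if not 0 < alpha < 3:
        raise ValueError("alpha must lie between 0 and 3 for a finite optimum")
    return 3.0 * vt / (3.0 - alpha)


vt = 0.3  # example threshold voltage in volts
for alpha in (1.0, 1.5, 2.0):
    print(f"alpha = {alpha}: VDD-opt = {vdd_opt(vt, alpha):.2f} V "
          f"({vdd_opt(vt, alpha) / vt:.1f} x Vt)")
```

Across the typical range of α the optimum runs from 1.5 Vt to 3 Vt, which is why about 2 Vt is a reasonable rule of thumb.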
Minimum Energy under a Delay Constraint
Low Power Architectures
Power Management Modes
 Intel Atom Processor
– HFM: 2 GHz, 1 V, 2 W.
– LFM: 600 MHz, 0.75 V.
– Sleep modes: C1-C6
 For a typical workload, the chip spends 80% to 90%
of its time in C6 mode.
 The average power drops to 220 mW.
 Chips are usually designed for average power.
 Software designed to draw maximum power and
overheat a chip is called a thermal virus.
Pitfalls and Fallacies
 Oversizing gates.
 Designing for speed regardless of power.
 Reporting power at a given frequency rather than
energy per operation.
 Reporting PDP where EDP should actually be used.
 Failing to account for leakage.