Transcript Lecture 7
Lecture 7: Power
Outline
Power and Energy
Dynamic Power
Static Power
7: Power
CMOS VLSI Design 4th Ed.
Power and Energy
Power is drawn from a voltage source attached to
the VDD pin(s) of a chip.
Instantaneous Power: $P(t) = I(t)\,V(t)$
Energy: $E = \int_0^T P(t)\,dt$
Average Power: $P_{avg} = \frac{E}{T} = \frac{1}{T}\int_0^T P(t)\,dt$
Power in Circuit Elements
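Voltage source: $P_{VDD}(t) = I_{DD}(t)\,V_{DD}$
Resistor: $P_R(t) = \frac{V_R^2(t)}{R} = I_R^2(t)\,R$
Capacitor: $E_C = \int_0^\infty I(t)\,V(t)\,dt = \int_0^\infty C\frac{dV}{dt}V(t)\,dt = C\int_0^{V_C} V\,dV = \frac{1}{2}CV_C^2$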
Charging a Capacitor
When the gate output rises
– Energy stored in capacitor is
$E_C = \frac{1}{2} C_L V_{DD}^2$
– But energy drawn from the supply is
$E_{VDD} = \int_0^\infty I(t)\,V_{DD}\,dt = \int_0^\infty C_L \frac{dV}{dt} V_{DD}\,dt = C_L V_{DD} \int_0^{V_{DD}} dV = C_L V_{DD}^2$
– Half the energy from VDD is dissipated in the pMOS
transistor as heat, other half stored in capacitor
When the gate output falls
– Energy in capacitor is dumped to GND
– Dissipated as heat in the nMOS transistor
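The energy split on a rising output can be checked numerically. The sketch below is an illustration, not part of the lecture: it assumes the pMOS behaves like a fixed resistor `r_on` (a hypothetical value; the result does not depend on it) and integrates the charging current with the midpoint rule.

```python
import math

def charge_capacitor(c_load, vdd, r_on, steps=20_000, n_tau=20.0):
    """Integrate one RC charging event; return (E_supply, E_cap, E_heat) in joules.

    Assumption: the pull-up is modeled as a constant resistance r_on.
    """
    tau = r_on * c_load
    dt = n_tau * tau / steps
    e_supply = 0.0
    e_heat = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt                       # midpoint of the time step
        i = (vdd / r_on) * math.exp(-t / tau)    # charging current
        e_supply += i * vdd * dt                 # energy drawn from VDD
        e_heat += i * i * r_on * dt              # energy burned in the pull-up
    e_cap = 0.5 * c_load * vdd ** 2              # energy left on the capacitor
    return e_supply, e_cap, e_heat

# CL = 150 fF and VDD = 1.0 V as in the lecture's example; r_on = 1 kOhm is hypothetical.
e_supply, e_cap, e_heat = charge_capacitor(150e-15, 1.0, 1e3)
print(f"supply: {e_supply * 1e15:.1f} fJ, stored: {e_cap * 1e15:.1f} fJ, "
      f"heat: {e_heat * 1e15:.1f} fJ")
```

Changing `r_on` changes how fast the output rises but not the energies, which is why the resistance does not appear in the formulas.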
Switching Waveforms
Example: VDD = 1.0 V, CL = 150 fF, f = 1 GHz
Switching Power
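$P_{switching} = \frac{1}{T}\int_0^T i_{DD}(t)\,V_{DD}\,dt = \frac{V_{DD}}{T}\int_0^T i_{DD}(t)\,dt = \frac{V_{DD}}{T}\left[T f_{sw} C V_{DD}\right] = C V_{DD}^2 f_{sw}$
[Figure: supply VDD delivering current iDD(t) to a capacitance C that switches at frequency fsw]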
Activity Factor
Suppose the system clock frequency = f
Let fsw = αf, where α = activity factor
– If the signal is a clock, α = 1
– If the signal switches once per cycle, α = ½
Dynamic power:
$P_{switching} = \alpha C V_{DD}^2 f$
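As a quick numerical illustration, the sketch below evaluates the switching power formula with the values from the Switching Waveforms example (VDD = 1.0 V, CL = 150 fF, f = 1 GHz). The function name is hypothetical.

```python
def switching_power(alpha, c_load, vdd, f_clk):
    """Switching power in watts: alpha * C * VDD^2 * f."""
    return alpha * c_load * vdd ** 2 * f_clk

# Values from the Switching Waveforms example.
p_clock = switching_power(1.0, 150e-15, 1.0, 1e9)   # clock-like node, alpha = 1
p_data = switching_power(0.5, 150e-15, 1.0, 1e9)    # node switching once per cycle
print(f"clock node: {p_clock * 1e6:.0f} uW, data node: {p_data * 1e6:.0f} uW")
```

The clock-like node dissipates about 150 uW; halving the activity factor halves the power.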
Short Circuit Current
When transistors switch, both nMOS and pMOS
networks may be momentarily ON at once
Leads to a blip of “short circuit” current.
< 10% of dynamic power if rise/fall times are
comparable for input and output
We will generally ignore this component
Power Dissipation Sources
Ptotal = Pdynamic + Pstatic
Dynamic power: Pdynamic = Pswitching + Pshortcircuit
– Switching load capacitances
– Short-circuit current
Static power: Pstatic = (Isub + Igate + Ijunct + Icontention)VDD
– Subthreshold leakage
– Gate leakage
– Junction leakage
– Contention current
Power Dissipation
Power dissipation breakdown in the Niagara 2
processor (Sun, 8 cores, 84 W)
Dynamic Power Example
1 billion transistor chip
– 50M logic transistors
• Average width: 12 λ
• Activity factor = 0.1
– 950M memory transistors
• Average width: 4 λ
• Activity factor = 0.02
– 1.0 V 65 nm process
– C = 1 fF/μm (gate) + 0.8 fF/μm (diffusion)
Estimate dynamic power consumption @ 1 GHz.
Neglect wire capacitance and short-circuit current.
Solution
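$C_{logic} = (50 \times 10^6)(12\lambda)(0.025\,\mu m/\lambda)(1.8\,fF/\mu m) = 27\,nF$
$C_{mem} = (950 \times 10^6)(4\lambda)(0.025\,\mu m/\lambda)(1.8\,fF/\mu m) = 171\,nF$
$P_{dynamic} = \left[0.1\,C_{logic} + 0.02\,C_{mem}\right](1.0)^2(1.0\,GHz) = 6.1\,W$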
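The arithmetic of this solution can be reproduced with a short script. This is only a sketch of the calculation; the constant and function names are chosen here for illustration.

```python
LAMBDA_UM = 0.025               # um per lambda in the 65 nm process
C_PER_UM = 1.0e-15 + 0.8e-15    # F/um: gate plus diffusion capacitance
VDD = 1.0                       # volts
F_CLK = 1.0e9                   # Hz

def total_capacitance(n_transistors, width_lambda):
    """Total capacitance in farads for n transistors of the given average width."""
    return n_transistors * width_lambda * LAMBDA_UM * C_PER_UM

c_logic = total_capacitance(50e6, 12)    # logic transistors
c_mem = total_capacitance(950e6, 4)      # memory transistors

# Weight each capacitance by its activity factor (0.1 for logic, 0.02 for memory).
p_dynamic = (0.1 * c_logic + 0.02 * c_mem) * VDD ** 2 * F_CLK
print(f"C_logic = {c_logic * 1e9:.0f} nF, C_mem = {c_mem * 1e9:.0f} nF, "
      f"P_dynamic = {p_dynamic:.1f} W")
```

Although the memory holds 95% of the transistors, its low activity factor keeps its contribution (about 3.4 W) close to that of the logic (2.7 W).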
Dynamic Power Reduction
$P_{switching} = \alpha C V_{DD}^2 f$
Try to minimize:
– Activity factor
– Capacitance
– Supply voltage
– Frequency
Activity Factor Estimation
Let $P_i$ = Prob(node i = 1)
– $\bar{P}_i = 1 - P_i$
$\alpha_i = \bar{P}_i \cdot P_i$ (probability that node i is 0 in one cycle and 1 in the next)
Completely random data has P = 0.5 and α = 0.25
Data is often not completely random
– e.g. upper bits of 64-bit words representing bank
account balances are usually 0
Data propagating through ANDs and ORs has lower
activity factor
– Depends on design, but typically α ≈ 0.1
Switching Probability
Example
A 4-input AND is built out of two levels of gates
Estimate the activity factor at each node if the inputs
have P = 0.5
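A possible worked version of this example is sketched below. The gate-level structure is an assumption, since the schematic is not in the transcript: two NAND2 gates (outputs n1 and n2) feeding a NOR2 gate (output Y), which together form a 4-input AND. The inputs are also assumed to be independent.

```python
from fractions import Fraction

def activity(p_one):
    """Activity factor of a node that is 1 with probability p_one."""
    return (1 - p_one) * p_one

def nand2(pa, pb):
    """Probability that a NAND2 output is 1, for independent inputs."""
    return 1 - pa * pb

def nor2(pa, pb):
    """Probability that a NOR2 output is 1, for independent inputs."""
    return (1 - pa) * (1 - pb)

p_in = Fraction(1, 2)
# Assumed structure: Y = NOR(NAND(A, B), NAND(C, D)) = A AND B AND C AND D
p_n1 = nand2(p_in, p_in)
p_n2 = nand2(p_in, p_in)
p_y = nor2(p_n1, p_n2)

for name, p in (("input", p_in), ("n1", p_n1), ("n2", p_n2), ("Y", p_y)):
    print(f"{name}: P = {p}, alpha = {activity(p)}")
```

Under these assumptions the activity factor falls from 1/4 at the inputs to 3/16 at the internal nodes and 15/256 at the output, in line with the observation that data passing through ANDs and ORs has a lower activity factor.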
Example
Compare the two cases below:
Example
Example
Clock Gating
The best way to reduce the activity is to turn off the
clock to registers in unused blocks
– Saves clock activity (α = 1)
– Eliminates all switching activity in the block
– Requires determining if block will be used
Capacitance
Gate capacitance
– Fewer stages of logic
– Small gate sizes
Wire capacitance
– Good floorplanning to keep communicating
blocks close to each other
– Drive long wires with inverters or buffers rather
than complex gates
Voltage / Frequency
Run each block at the lowest possible voltage and
frequency that meets performance requirements
Voltage Domains
– Provide separate supplies to different blocks
– Level converters required when crossing
from low to high VDD domains
Dynamic Voltage Scaling
– Adjust VDD and f according to
workload
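The benefit of scaling voltage and frequency together follows from the switching power formula. The sketch below assumes that activity factor and capacitance stay unchanged and that leakage is ignored, so it estimates the dynamic part only. The operating points are the two Intel Atom modes quoted later in the lecture (2 GHz at 1 V and 600 MHz at 0.75 V), used here purely as illustrative inputs.

```python
def relative_dynamic_power(v_new, f_new, v_ref, f_ref):
    """Ratio of switching power at (v_new, f_new) to that at (v_ref, f_ref).

    Assumes the same activity factor and capacitance at both operating points.
    """
    return (v_new / v_ref) ** 2 * (f_new / f_ref)

def relative_energy_per_cycle(v_new, v_ref):
    """Ratio of switching energy per cycle, which depends on voltage only."""
    return (v_new / v_ref) ** 2

power_ratio = relative_dynamic_power(0.75, 600e6, 1.0, 2e9)
energy_ratio = relative_energy_per_cycle(0.75, 1.0)
print(f"dynamic power ratio: {power_ratio:.2f}, energy per cycle ratio: {energy_ratio:.2f}")
```

Lowering frequency alone reduces power but not energy per operation; the energy saving comes from the quadratic dependence on VDD.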
Voltage Domains
Voltage Domains
Voltage Domains
The easiest approach is to associate each block in a
floorplan with a voltage
You can also perform clustered voltage scaling
Voltage Domains
Dynamic voltage scaling
Voltage Domains
Short Circuit Currents
[Figure: CMOS inverter (Vdd, Vin, Vout, load CL) and a plot of supply current IVDD (mA, 0 to 0.15) versus Vin (V, 0 to 5.0)]
How to keep Short-Circuit Currents Low?
Short-circuit current goes to zero if the output fall time is much longer
than the input rise time (tfall >> trise), but this cannot be done for cascaded logic, so ...
Minimizing Short-Circuit Power
[Figure: normalized power Pnorm (0 to 8) versus tsin/tsout (0 to 5), with curves for Vdd = 3.3 V, 2.5 V, and 1.5 V]
Resonant Circuits
Especially useful in clocking. IBM has demonstrated
resonant clocking for a practical processor.
Static Power
Static power is consumed even when chip is
quiescent.
– Leakage draws power from nominally OFF
devices
– Ratioed circuits burn power in fight between ON
transistors
Static Power Example
Revisit power estimation for 1 billion transistor chip
Estimate static power consumption
– Subthreshold leakage
• Normal Vt: 100 nA/μm
• High Vt: 10 nA/μm
• High Vt used in all memories and in 95% of
logic gates
– Gate leakage: 5 nA/μm
– Junction leakage: negligible
Solution
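$W_{normal\text{-}Vt} = (50 \times 10^6)(12\lambda)(0.025\,\mu m/\lambda)(0.05) = 0.75 \times 10^6\,\mu m$
$W_{high\text{-}Vt} = \left[(50 \times 10^6)(12\lambda)(0.95) + (950 \times 10^6)(4\lambda)\right](0.025\,\mu m/\lambda) = 109.25 \times 10^6\,\mu m$
$I_{sub} = \left[W_{normal\text{-}Vt} \times 100\,nA/\mu m + W_{high\text{-}Vt} \times 10\,nA/\mu m\right]/2 = 584\,mA$
$I_{gate} = \left[(W_{normal\text{-}Vt} + W_{high\text{-}Vt}) \times 5\,nA/\mu m\right]/2 = 275\,mA$
$P_{static} = (584\,mA + 275\,mA)(1.0\,V) = 859\,mW$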
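The same numbers can be reproduced in a few lines. This sketch only mirrors the arithmetic of the solution; the variable names are illustrative, and the division by 2 is read here as "about half of the transistors leak in a given state", which is an interpretation rather than something the slide states.

```python
LAMBDA_UM = 0.025                 # um per lambda in the 65 nm process
N_LOGIC, W_LOGIC = 50e6, 12       # logic transistor count, average width in lambda
N_MEM, W_MEM = 950e6, 4           # memory transistor count, average width in lambda
VDD = 1.0                         # volts

# Total transistor width in um: 5% of logic is normal-Vt, everything else is high-Vt.
w_normal = N_LOGIC * W_LOGIC * LAMBDA_UM * 0.05
w_high = (N_LOGIC * W_LOGIC * 0.95 + N_MEM * W_MEM) * LAMBDA_UM

# Leakage per um: 100 nA (normal Vt), 10 nA (high Vt), 5 nA (gate).
i_sub = (w_normal * 100e-9 + w_high * 10e-9) / 2
i_gate = (w_normal + w_high) * 5e-9 / 2
p_static = (i_sub + i_gate) * VDD

print(f"I_sub = {i_sub * 1e3:.0f} mA, I_gate = {i_gate * 1e3:.0f} mA, "
      f"P_static = {p_static * 1e3:.0f} mW")
```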
Subthreshold Leakage
For Vds > 50 mV
$I_{sub} = I_{off}\,10^{\frac{V_{gs} + \eta(V_{ds} - V_{DD}) - k_\gamma V_{sb}}{S}}$
Ioff = leakage at Vgs = 0, Vds = VDD
Typical values in 65 nm
Ioff = 100 nA/μm @ Vt = 0.3 V
Ioff = 10 nA/μm @ Vt = 0.4 V
Ioff = 1 nA/μm @ Vt = 0.5 V
η = 0.1
kγ = 0.1
S = 100 mV/decade
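The subthreshold model can be written as a small function to see how each terminal voltage changes the leakage. This is a sketch with the lecture's typical 65 nm values as defaults (DIBL coefficient 0.1, body effect coefficient 0.1, S = 100 mV/decade); the function and parameter names are hypothetical.

```python
def subthreshold_current(i_off, vgs, vds, vsb, vdd, eta=0.1, k_gamma=0.1, s=0.100):
    """Subthreshold leakage (same unit as i_off), valid for Vds > 50 mV.

    eta: DIBL coefficient, k_gamma: body effect coefficient, s: slope in V/decade.
    """
    exponent = (vgs + eta * (vds - vdd) - k_gamma * vsb) / s
    return i_off * 10 ** exponent

I_OFF = 100e-9   # A/um at Vt = 0.3 V
VDD = 1.0

nominal = subthreshold_current(I_OFF, vgs=0.0, vds=VDD, vsb=0.0, vdd=VDD)
neg_gate = subthreshold_current(I_OFF, vgs=-0.1, vds=VDD, vsb=0.0, vdd=VDD)
body_bias = subthreshold_current(I_OFF, vgs=0.0, vds=VDD, vsb=1.0, vdd=VDD)
print(f"nominal: {nominal * 1e9:.1f} nA/um, Vgs = -0.1 V: {neg_gate * 1e9:.1f} nA/um, "
      f"Vsb = 1.0 V: {body_bias * 1e9:.1f} nA/um")
```

With S = 100 mV/decade, each 100 mV of negative gate drive cuts leakage by a factor of 10, while the same reduction through the body effect needs a full volt of source-to-body bias.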
Stack Effect
Series OFF transistors have less leakage
– Vx > 0, so N2 has negative Vgs
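[Figure: two series OFF nMOS transistors between VDD and GND; N2 on top, N1 on the bottom, intermediate node Vx]
$I_{sub} = \underbrace{I_{off}\,10^{\frac{\eta(V_x - V_{DD})}{S}}}_{N1} = \underbrace{I_{off}\,10^{\frac{-V_x + \eta\left((V_{DD} - V_x) - V_{DD}\right) - k_\gamma V_x}{S}}}_{N2}$
$V_x = \frac{\eta V_{DD}}{1 + 2\eta + k_\gamma}$
$I_{sub} = I_{off}\,10^{\frac{-\eta V_{DD}\left(\frac{1 + \eta + k_\gamma}{1 + 2\eta + k_\gamma}\right)}{S}} \approx I_{off}\,10^{\frac{-\eta V_{DD}}{S}}$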
– Leakage through 2-stack reduces ~10x
– Leakage through 3-stack reduces further
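The 2-stack result can be checked by computing the intermediate voltage and confirming that both transistors carry the same current. The sketch below uses the lecture's typical 65 nm coefficients (0.1 for DIBL and body effect, S = 100 mV/decade) and VDD = 1.0 V; the function name is hypothetical.

```python
def stack_effect(vdd, eta=0.1, k_gamma=0.1, s=0.100):
    """Return (Vx, I_stack / I_off) for two series OFF nMOS transistors."""
    vx = eta * vdd / (1 + 2 * eta + k_gamma)
    # Bottom transistor N1: Vgs = 0, Vds = Vx, Vsb = 0
    ratio_n1 = 10 ** (eta * (vx - vdd) / s)
    # Top transistor N2: Vgs = -Vx, Vds = VDD - Vx, Vsb = Vx
    ratio_n2 = 10 ** ((-vx + eta * ((vdd - vx) - vdd) - k_gamma * vx) / s)
    assert abs(ratio_n1 - ratio_n2) < 1e-12   # series devices carry the same current
    return vx, ratio_n1

vx, ratio = stack_effect(1.0)
print(f"Vx = {vx * 1e3:.0f} mV, leakage reduced by about {1 / ratio:.1f}x")
```

With these values the exact expression gives a reduction somewhat below 10x, while the approximation with the fraction set to 1 gives exactly 10x.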
Threshold Effect
Leakage Control
Leakage and delay trade off
– Aim for low leakage in sleep and low delay in
active mode
To reduce leakage:
– Increase Vt: multiple Vt
• Use low Vt only in critical circuits
– Increase Vs: stack effect
• Input vector control in sleep
– Decrease Vb
• Reverse body bias in sleep
• Or forward body bias in active mode
Gate Leakage
Extremely strong function of tox and Vgs
– Negligible for older processes
– Approaches subthreshold leakage at 65 nm and
below in some processes
An order of magnitude less for pMOS than nMOS
Control leakage in the process using tox > 10.5 Å
– High-k gate dielectrics help
– Some processes provide multiple tox
• e.g. thicker oxide for 3.3 V I/O transistors
Control leakage in circuits by limiting VDD
NAND3 Leakage Example
100 nm process
Ign = 6.3 nA
Igp = 0
Ioffn = 5.63 nA Ioffp = 9.3 nA
Data from [Lee03]
Junction Leakage
From reverse-biased p-n junctions
– Between diffusion and substrate or well
Ordinary diode leakage is negligible
Band-to-band tunneling (BTBT) can be significant
– Especially in high-Vt transistors where other
leakage is small
– Worst at Vdb = VDD
Gate-induced drain leakage (GIDL) exacerbates junction leakage
– Worst for Vgd = -VDD (or more negative)
Power Gating
Turn OFF power to blocks when they are idle to
save leakage
– Use virtual VDD (VDDV)
– Gate outputs to prevent
invalid logic levels to next block
Voltage drop across sleep transistor degrades
performance during normal operation
– Size the transistor wide enough to minimize
impact
Switching wide sleep transistor costs dynamic power
– Only justified when circuit sleeps long enough
Power Gating
When a block is gated, the state must either be saved or reset
upon power-up.
– Either use registers with a second VDD.
– Or save everything to memory.
Power gating may be done externally with a disable input to a
voltage regulator or internally with high-Vt header or footer
switches.
External power gating eliminates leakage altogether, but entering
and leaving sleep takes a long time and significant energy.
The power switch actually consists of many transistors in
parallel, which should be controlled individually to combat L di/dt
noise and IR drops.
Also, the best Ion/Ioff ratio is obtained for specific L and W values.
Multiple Thresholds
Selective application of multiple threshold voltages
can maintain performance on critical paths with low-Vt
transistors while reducing leakage on other paths
with high-Vt transistors.
Using multiple thresholds adds to the cost of the
process.
Alternatively, one can use non-minimum-L transistors
on non-critical paths, raising the threshold
voltage via the short-channel effect.
For example, in Intel’s 65 nm process, making transistors 10% longer
reduces Ion by 10% but Ioff by a factor of 3.
Variable Thresholds
Using body bias, one can dynamically adjust
threshold voltages.
This is called variable threshold CMOS (VTCMOS).
Use low-Vt devices and reverse body bias during
sleep.
Alternatively, use high-Vt devices and forward body
bias during operation.
Too much reverse body bias (e.g. < -1.2 V) leads to
greater junction leakage due to BTBT.
Too much forward body bias (e.g. > 0.4 V) leads to
large current through the body-to-source diodes.
Variable Thresholds
Below is an n-well process with body bias.
Normally, triple well processes should be utilized.
Input Vector Control
Applying the input pattern that leaks the least
during sleep can minimize the power of that
block.
Note that applying this pattern itself
dissipates power.
Energy-Delay Optimization
What is the best choice for VDD and Vt in a certain
technology and application?
What does “best” mean?
Let us start with minimum energy.
Energy corresponds to the power-delay product (PDP).
The minimum occurs in the subthreshold region, where VDD < Vt.
Von Neumann argued that the minimum energy could be found from
thermodynamics and is kT ln 2.
Meindl found the minimum voltage at which an inverter
can operate by setting the slope at the
switching point to -1.
Energy-Delay Optimization
Meindl took n = 1 for subthreshold operation.
The minimum voltage turns out to be
$V_{min} = 2\frac{kT}{q}\ln 2 = 36\,mV$ @ 300 K
The energy stored on the gate capacitance of a
MOSFET is
$E = \frac{Q V_{DD}}{2}$
The minimum charge is q.
$E_{min} = kT\ln 2 = 2.9 \times 10^{-21}$ J.
For comparison: 0.5 μm 5 V process, $1.5 \times 10^{-13}$ J; 65 nm 1 V process, $3 \times 10^{-16}$ J.
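These limits are easy to verify numerically. The sketch below uses the SI values of the Boltzmann constant and the elementary charge at T = 300 K, and also confirms that a single electron charge at the minimum voltage stores exactly kT ln 2.

```python
import math

K_B = 1.380649e-23       # Boltzmann constant, J/K
Q_E = 1.602176634e-19    # elementary charge, C
T = 300.0                # temperature, K

v_min = 2 * (K_B * T / Q_E) * math.log(2)   # minimum supply voltage
e_min = K_B * T * math.log(2)               # minimum switching energy
e_check = Q_E * v_min / 2                   # E = Q * VDD / 2 with Q = q, VDD = v_min

print(f"V_min = {v_min * 1e3:.1f} mV, E_min = {e_min:.2e} J")
```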
Energy-Delay Optimization
However, this situation does not really minimize
energy because the circuits run so slowly that the
leakage energy dominates.
The true minimum energy is at a point where
switching and leakage energies are balanced.
In subthreshold operation, current drops
exponentially with VDD - Vt, while switching energy improves
only quadratically with VDD.
Ignoring DIBL, gate and junction leakage, and short-circuit
power, one can find the minimum energy point
easily.
Energy-Delay Optimization
Minimum Energy
The delay of N gates operating in the subthreshold
region is given by
$D = \frac{N k C_g V_{DD}}{I_{off}\,10^{V_{DD}/S}}$
The energy consumed in one cycle is
$E_{switching} = C_{eff} V_{DD}^2$
$E_{leak} = I_{sub} V_{DD} D = W_{eff} N k C_g\,10^{-V_{DD}/S}\,V_{DD}^2$
$E_{total} = E_{switching} + E_{leak}$
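One way to locate the minimum of the total energy is a simple voltage sweep. The sketch below is illustrative only: the switched capacitance is normalized to 1 and the leakage coefficient (the product W_eff N k C_g in the same normalized units) is set to an arbitrary hypothetical value of 1000. It uses S = 100 mV/decade and starts the sweep at the 36 mV operating limit quoted earlier in the lecture.

```python
S = 0.100             # subthreshold slope, V/decade (typical 65 nm value)
V_MIN = 0.036         # lowest usable VDD in volts (Meindl limit at 300 K)
C_EFF = 1.0           # hypothetical: normalized switched capacitance
LEAK_COEFF = 1000.0   # hypothetical: normalized W_eff * N * k * C_g

def e_switching(vdd):
    """Normalized switching energy per cycle."""
    return C_EFF * vdd ** 2

def e_leak(vdd):
    """Normalized leakage energy per cycle; grows as the circuit slows down."""
    return LEAK_COEFF * 10 ** (-vdd / S) * vdd ** 2

def e_total(vdd):
    return e_switching(vdd) + e_leak(vdd)

# Sweep VDD in 1 mV steps from V_MIN to 1.0 V and keep the lowest-energy point.
candidates = [mv / 1000 for mv in range(36, 1001)]
v_opt = min(candidates, key=e_total)
print(f"minimum-energy VDD is about {v_opt:.3f} V "
      f"(E_switching = {e_switching(v_opt):.3f}, E_leak = {e_leak(v_opt):.3f})")
```

With these made-up coefficients the minimum lands at a few hundred millivolts. A smaller leakage coefficient, which corresponds to a higher activity factor, moves it to a lower voltage.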
Minimum Energy
Note that this equation depends on switching
activity.
Also, only inverters were used in the analysis.
Other gates can also be considered.
Temperature affects the behavior strongly.
Even in this case, taking the derivative and equating
to zero yields messy equations.
Contour plots are more informative.
Minimum Energy
[Figure: minimum energy plots for activity factors α = 1 and α = 0.1]
Minimum Energy
The minimum energy points are not practical:
energy decreases by about 10 times,
but frequency decreases by 10,000 to 100,000
times.
A better metric that accounts for both energy
and speed is the energy-delay product (EDP).
Minimum EDP
First, ignore leakage.
Use the alpha-power law to include velocity
saturation.
EDP is given by
$EDP = k\,\frac{C_{eff}^2\,V_{DD}^3}{(V_{DD} - V_t)^{\alpha}}$
Differentiating with respect to VDD and setting to zero
$V_{DD\text{-}opt} = \frac{3}{3 - \alpha}\,V_t$
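The closed-form optimum can be cross-checked against a brute-force sweep of the EDP expression. In the sketch below, k and C_eff are normalized to 1 (they do not affect the location of the minimum), and Vt = 0.3 V is picked from the lecture's 65 nm values purely for illustration.

```python
def edp(vdd, vt, alpha, k=1.0, c_eff=1.0):
    """Energy-delay product from the alpha-power law, with leakage ignored."""
    return k * c_eff ** 2 * vdd ** 3 / (vdd - vt) ** alpha

def vdd_opt(vt, alpha):
    """Closed-form VDD that minimizes EDP: 3 * Vt / (3 - alpha)."""
    return 3 * vt / (3 - alpha)

VT = 0.3   # volts, illustrative choice

results = {}
for alpha in (1.0, 1.5, 2.0):
    # Sweep VDD in 1 mV steps above Vt and pick the lowest-EDP point.
    sweep = [VT + 0.001 * i for i in range(1, 2001)]
    v_numeric = min(sweep, key=lambda v: edp(v, VT, alpha))
    results[alpha] = (vdd_opt(VT, alpha), v_numeric)
    print(f"alpha = {alpha}: formula {vdd_opt(VT, alpha):.3f} V, sweep {v_numeric:.3f} V")
```

Across the range of the exponent from 1 to 2, the optimum moves from 1.5 Vt to 3 Vt, with 2 Vt at the midpoint.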
Minimum EDP
α is typically between 1 and 2.
Hence, the optimum VDD is around 2Vt.
Differentiating with respect to Vt gives the optimum
Vt to be zero.
This is because leakage was neglected.
Leakage should also be introduced and the equation
should be solved again.
The results are messy, but can be described in
terms of contour plots.
The dashed lines represent speed normalized to the
minimum EDP point.
Minimum EDP
Minimum Energy under a Delay Constraint
Low Power Architectures
Power Management Modes
Power Management Modes
Intel Atom Processor
– HFM: 2 GHz, 1 V, 2 W.
– LFM: 600 MHz, 0.75 V.
– Sleep modes: C1-C6
For a typical workload, the chip spends 80% to 90%
of its time in C6 mode.
The average power drops to 220 mW.
Chips are usually designed for average power.
Software designed to draw maximum power and
overheat chips is called a thermal virus.
Pitfalls and Fallacies
Oversizing gates.
Designing for speed regardless of power.
Reporting power at a given frequency rather than
energy per operation.
Reporting PDP where EDP should actually be used.
Failing to account for leakage.