Chapter 4: Circuit Level Optimization at Design Time
Optimizing Power @ Design Time
Circuits
Jan M. Rabaey
Dejan Marković
Borivoje Nikolić
Low Power Design Essentials ©2008
Chapter 4
Chapter Outline
Optimization framework for energy-delay trade-off
Dynamic power optimization
– Multiple supply voltages
– Transistor sizing
– Technology mapping
Static power optimization
– Multiple thresholds
– Transistor stacking
4.2
Energy/Power Optimization Strategy
For given function and activity, an optimal operation
point can be derived in the energy-performance space
Time of optimization depends upon activity profile
Different optimizations apply to active and static power
[Table: time of optimization by activity profile. Columns: Fixed activity, Variable activity, No activity (standby). Rows: Active, Static. Entries: Design time, Run time, Sleep.]
4.3
Energy-Delay Optimization and Trade-off
[Figure: energy/op versus delay. The trade-off space spans Dmin to Dmax in delay and Emin to Emax in energy; an unoptimized design is marked.]
Maximize throughput for given energy or
Minimize energy for given throughput
Other important metrics: Area, Reliability, Reusability
4.4
The Design Abstraction Stack
A very rich set of design parameters to consider!
It helps to consider options in relation to their
abstraction layer
System/Application: choice of algorithm
Software: amount of concurrency
(Micro-)Architecture: parallel versus pipelined, general purpose versus application specific
Logic/RT: logic family, standard cell versus custom
Circuit: sizing, supply, thresholds
Device: bulk versus SOI
[The slide brackets the layers addressed in this chapter.]
4.5
Optimization Can/Must Span Multiple Levels
Architecture
Micro-Architecture
Circuit (Logic & FFs)
Design optimization combines top-down and bottom-up:
“meet-in-the-middle”
4.6
Energy-Delay Optimization
[Figure: energy/op versus delay curves for topology A and topology B, shown separately and then combined.]
Globally optimal energy-delay curve for a
given function
4.7
Some Optimization Observations
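$S_A = \left. \frac{\partial E / \partial A}{\partial D / \partial A} \right|_{A = A_0}$
[Figure: energy versus delay. The curves f(A, B0) and f(A0, B) pass through the design point (A0, B0) at delay D0; S_A and S_B are marked on them.]
Energy-Delay Sensitivities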
[Ref: V. Stojanovic, ESSCIRC’02]
4.8
Finding the Optimal Energy-Delay Curve
Pareto-optimal:
the best that can be achieved without disadvantaging at least one metric.
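$\Delta E = S_A \cdot (-\Delta D) + S_B \cdot \Delta D$
[Figure: energy versus delay around the design point (A0, B0) at delay D0, showing the curves f(A, B0), f(A0, B), and f(A1, B) and a delay step ∆D.]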
On the optimal curve, all sensitivities must be equal
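The equal-sensitivity condition follows from a first-order balance. The sketch below assumes that S_A and S_B are the slopes ∂E/∂D obtained by tuning A alone or B alone, and that ∆D is small enough for a linear approximation to hold:

```latex
\begin{aligned}
\Delta E &= S_A \cdot (-\Delta D) + S_B \cdot \Delta D = (S_B - S_A)\,\Delta D \\
S_A \neq S_B &\;\Rightarrow\; \text{some sign of } \Delta D \text{ gives } \Delta E < 0 \text{ at unchanged delay} \\
\text{optimum} &\;\Rightarrow\; S_A = S_B
\end{aligned}
```

Tuning A to shorten the delay by ∆D and tuning B to give the same ∆D back leaves the delay unchanged. If the two sensitivities differ, one direction of this exchange lowers the energy, so the starting point cannot lie on the optimal curve.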
4.9
Reducing Active Energy @ Design Time
$E_{active} \sim \alpha \, C_L \, V_{swing} \, V_{DD}$
$P_{active} \sim \alpha \, C_L \, V_{swing} \, V_{DD} \, f$
Reducing voltages
– Lowering the supply voltage (VDD) at the expense of clock
speed
– Lowering the logic swing (Vswing)
Reducing transistor sizes (CL)
– Slows down logic
Reducing activity (α)
– Reducing switching activity through transformations
– Reducing glitching by balancing logic
4.10
Observation
Downsizing and/or lowering the supply on the critical path
lowers the operating frequency
Downsizing non-critical paths reduces energy for free, but
– Narrows down the path delay distribution
– Increases impact of variations, impacts robustness
[Figure: two histograms of # of paths versus tp (path), each with the target delay marked.]
4.11
Circuit Optimization Framework
minimize Energy (VDD, VTH, W)
subject to Delay (VDD, VTH, W) ≤ Dcon
Constraints
VDDmin < VDD < VDDmax
VTHmin < VTH < VTHmax
Wmin < W
Reference case
– Dmin sizing @ VDDmax, VTHref
[Figure: energy/op versus delay for topology A and topology B.]
[Ref: V. Stojanovic, ESSCIRC’02]
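A one-variable instance of this framework can be sketched as follows. It is only an illustration: sizing and threshold are held fixed, energy is taken as proportional to VDD squared, and delay follows the alpha-power dependence with the 90 nm fit values quoted later in the chapter (Von = 0.37 V, αd = 1.53, VDDref = 1.2 V). The function names and the lower supply bound of 0.5 V are hypothetical.

```python
# Minimal sketch: minimize energy over VDD alone, subject to a delay constraint.
V_ON = 0.37      # V, fit value quoted for the 90 nm delay model
ALPHA_D = 1.53   # fit value quoted for the 90 nm delay model
V_REF = 1.2      # V, reference supply (VDDmax in this sketch)


def norm_delay(vdd):
    """Delay relative to its value at V_REF (alpha-power law)."""
    def shape(v):
        return v / (v - V_ON) ** ALPHA_D
    return shape(vdd) / shape(V_REF)


def norm_energy(vdd):
    """Switching energy relative to its value at V_REF (E ~ VDD^2)."""
    return (vdd / V_REF) ** 2


def min_energy_vdd(d_con, vdd_min=0.5, vdd_max=V_REF, tol=1e-6):
    """Lowest VDD in [vdd_min, vdd_max] with norm_delay(VDD) <= d_con.

    Delay falls monotonically as VDD rises, and energy rises with VDD,
    so the lowest feasible supply is the minimum-energy one.
    """
    if norm_delay(vdd_max) > d_con:
        raise ValueError("delay constraint cannot be met at VDDmax")
    if norm_delay(vdd_min) <= d_con:
        return vdd_min
    lo, hi = vdd_min, vdd_max  # lo violates the constraint, hi satisfies it
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if norm_delay(mid) <= d_con:
            hi = mid
        else:
            lo = mid
    return hi


vdd_opt = min_energy_vdd(d_con=1.1)  # allow a 10% delay increase
```

With several variables (VDD, VTH, W per gate) bisection no longer applies and a general constrained optimizer is needed; the structure of objective, delay constraint, and bounds stays the same.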
4.12
Optimization Framework: Generic Network
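[Figure: stage i (input capacitance Ci, parasitic capacitance γCi, supply VDD,i) drives wire capacitance Cw and stage i+1 (input capacitance Ci+1, supply VDD,i+1).]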
Gate in stage i loaded by fanout (stage i+1)
4.13
Alpha-power based Delay Model
$t_p = \frac{K_d V_{DD}}{(V_{DD} - V_{on})^{\alpha_d}} \cdot \frac{\gamma C_i + C_w + C_{i+1}}{\gamma C_i} = \tau_{nom} \left( 1 + \frac{C_w + C_{i+1}}{\gamma C_i} \right)$
Fit parameters: Von, αd, Kd, γ
Fitted values: Von = 0.37 V, αd = 1.53, τnom = 6 ps, γ = 1.35
[Figure: FO4 delay (normalized) versus VDD / VDDref, simulation and model (90nm technology).]
[Figure: delay tp (ps) versus fanout (Ci+1/Ci), simulation and model.]
VDDref = 1.2V, technology 90 nm
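The delay model can be evaluated directly. The sketch below uses the fitted values quoted on the slide; Kd is not quoted, so it is backed out under the assumption that τnom = 6 ps holds at VDDref = 1.2 V. The function names and the `wire_ratio` parameter (Cw/Ci) are hypothetical.

```python
# Minimal sketch of the alpha-power delay model for an inverter stage.
V_ON = 0.37        # V
ALPHA_D = 1.53
GAMMA = 1.35       # ratio of parasitic to input capacitance
TAU_NOM_PS = 6.0   # ps, assumed to apply at V_REF
V_REF = 1.2        # V

# Assumption: choose K_d so that tau(V_REF) equals TAU_NOM_PS.
K_D = TAU_NOM_PS * (V_REF - V_ON) ** ALPHA_D / V_REF


def tau(vdd):
    """Supply-dependent time constant K_d * VDD / (VDD - Von)^alpha_d, in ps."""
    return K_D * vdd / (vdd - V_ON) ** ALPHA_D


def inverter_delay_ps(fanout, vdd=V_REF, wire_ratio=0.0):
    """t_p = tau(VDD) * (1 + (Cw + C_{i+1}) / (gamma * C_i)).

    fanout is C_{i+1}/C_i and wire_ratio is Cw/C_i.
    """
    return tau(vdd) * (1.0 + (wire_ratio + fanout) / GAMMA)


fo4_ref = inverter_delay_ps(4)            # FO4 delay at the reference supply
fo4_low = inverter_delay_ps(4, vdd=0.6)   # same gate at half the supply
```

Under these assumptions the FO4 delay at 1.2 V is 6 ps × (1 + 4/1.35), roughly 23.8 ps, and it grows quickly as VDD approaches Von.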
4.14
Combined with Logical Effort Formulation
For Complex Gates
$t_p = \tau_{nom} \left( p_i + \frac{f_i g_i}{\gamma} \right)$
Parasitic delay pi – depends upon gate topology
Electrical effort fi ≈ Si+1/Si
Logical effort gi – depends upon gate topology
Effective fanout hi = figi
[Ref: I. Sutherland, Morgan Kaufmann’99]
4.15
Dynamic Energy
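$E_{dyn} = (\gamma C_i + C_w + C_{i+1}) V_{DD,i}^2 = C_i (\gamma + f_i) V_{DD,i}^2$
$f_i = (C_w + C_{i+1}) / C_i \approx S_{i+1} / S_i$
$C_i = K_e S_i$
[Figure: stage i (Ci, γCi, supply VDD,i) driving Cw and stage i+1 (Ci+1, supply VDD,i+1).]
$E_i = K_e S_i (V_{DD,i-1}^2 + \gamma V_{DD,i}^2)$
= energy consumed by logic gate i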
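The per-gate energy expression E_i = K_e S_i (V_DD,i-1² + γ V_DD,i²) can be tabulated for a chain of gates. This is a minimal sketch: K_e is set to 1 (a hypothetical normalization, so energies come out in units of K_e·V²), γ = 1.35 is the fit value quoted for the delay model, and the function names are illustrative.

```python
# Minimal sketch: energy attributed to each gate and summed over a chain.
GAMMA = 1.35   # fit value quoted for the 90 nm delay model
K_E = 1.0      # hypothetical normalization: capacitance per unit size


def stage_energy(size, vdd_prev, vdd_own, k_e=K_E, gamma=GAMMA):
    """E_i = K_e * S_i * (V_{DD,i-1}^2 + gamma * V_{DD,i}^2).

    The input capacitance is charged from the previous stage's supply,
    the parasitic capacitance from the gate's own supply.
    """
    return k_e * size * (vdd_prev ** 2 + gamma * vdd_own ** 2)


def chain_energy(sizes, vdds, vdd_in):
    """Sum of stage energies; vdd_in is the supply of the stage driving gate 1."""
    prev = [vdd_in] + list(vdds[:-1])
    return sum(stage_energy(s, vp, v) for s, vp, v in zip(sizes, prev, vdds))


e_single = stage_energy(2.0, 1.0, 1.0)
e_chain = chain_energy([1.0, 4.0, 16.0], [1.2, 1.2, 1.2], vdd_in=1.2)
```

The sum covers only the gates themselves; the energy of the final load would have to be added separately.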
4.16
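Optimizing Return on Investment (ROI)
Depends on sensitivity (∂E/∂D)
Gate sizing
$\frac{\partial E / \partial S_i}{\partial D / \partial S_i} = -\frac{E_i}{\tau_{nom} (h_i - h_{i-1})}$
∞ for equal h (Dmin)
Supply voltage
$\frac{\partial E / \partial V_{DD}}{\partial D / \partial V_{DD}} = -\frac{E}{D} \cdot \frac{2 \left( 1 - V_{on} / V_{DD} \right)}{\alpha_d - 1 + V_{on} / V_{DD}}$
max at VDD(max) (Dmin)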
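The supply-voltage sensitivity can be checked numerically. The sketch below assumes E proportional to VDD² and D proportional to VDD/(VDD − Von)^αd with the 90 nm fit values; constant factors are dropped, so only the comparison between the closed form and the finite-difference estimate is meaningful. Function names are hypothetical.

```python
# Minimal sketch: closed-form supply sensitivity versus a numerical derivative.
V_ON = 0.37      # V, fit value quoted for the 90 nm delay model
ALPHA_D = 1.53   # fit value quoted for the 90 nm delay model


def energy(vdd):
    """Switching energy up to a constant factor (E ~ VDD^2)."""
    return vdd ** 2


def delay(vdd):
    """Alpha-power delay up to a constant factor."""
    return vdd / (vdd - V_ON) ** ALPHA_D


def sens_closed_form(vdd):
    """-(E/D) * 2 (1 - Von/VDD) / (alpha_d - 1 + Von/VDD)."""
    x = V_ON / vdd
    return -(energy(vdd) / delay(vdd)) * 2.0 * (1.0 - x) / (ALPHA_D - 1.0 + x)


def sens_numeric(vdd, h=1e-6):
    """(dE/dVDD) / (dD/dVDD) by central differences."""
    d_energy = energy(vdd + h) - energy(vdd - h)
    d_delay = delay(vdd + h) - delay(vdd - h)
    return d_energy / d_delay


checks = [(v, sens_closed_form(v), sens_numeric(v)) for v in (0.6, 0.9, 1.2)]
```

The magnitude grows with VDD, which matches the remark that the sensitivity is largest at the maximum supply, where the delay is minimal.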
4.17
Example: Inverter Chain
Properties of inverter chain
– Single path topology
– Energy increases geometrically from input to output
[Figure: chain of N inverters with sizes S1 = 1, S2, S3, …, SN driving the load CL.]
Goal
– Find optimal sizing S = [S1, S2, …, SN], supply voltage, and
buffering strategy to achieve the best energy-delay tradeoff
4.18
Inverter Chain: Gate Sizing
[Figure: effective fanout h versus stage (1 to 7) for delay increments dinc = 0%, 1%, 10%, 30%, and 50%.]
$S_i^2 = \frac{S_{i-1} S_{i+1}}{1 + \mu S_{i-1}}$
$\mu = -\frac{2 K_e V_{DD}^2}{\tau_{nom} F_S}$
$F_S \propto \frac{E_i}{h_i - h_{i-1}}$
[Ref: Ma, JSSC’94]
Variable taper achieves minimum energy
Reduce number of stages at large dinc
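The variable taper can be reproduced from the sizing recursion S_i² = S_{i-1} S_{i+1} / (1 + μ S_{i-1}). The sketch below treats μ as a given non-negative number (on the slide it depends on K_e, VDD, τnom, and the sizing sensitivity), expresses the load as an equivalent inverter size, and solves the recursion by repeated sweeps. The function names and the example values are hypothetical.

```python
# Minimal sketch: solve the inverter-chain sizing recursion by repeated sweeps.
import math


def size_chain(n_stages, load_size, mu, sweeps=2000):
    """Return sizes[0..n_stages] with sizes[0] = 1 and sizes[n_stages] = load_size.

    Interior sizes satisfy S_i^2 = S_{i-1} * S_{i+1} / (1 + mu * S_{i-1}).
    """
    # Start from the uniform (geometric) taper, which is the mu = 0 solution.
    sizes = [load_size ** (i / n_stages) for i in range(n_stages + 1)]
    for _ in range(sweeps):
        for i in range(1, n_stages):
            sizes[i] = math.sqrt(
                sizes[i - 1] * sizes[i + 1] / (1.0 + mu * sizes[i - 1])
            )
    return sizes


def fanouts(sizes):
    """Per-stage fanout S_{i+1} / S_i."""
    return [sizes[i + 1] / sizes[i] for i in range(len(sizes) - 1)]


uniform = size_chain(4, 256.0, mu=0.0)    # minimum-delay sizing: fanout 4 everywhere
tapered = size_chain(4, 256.0, mu=0.05)   # energy-aware sizing
```

For μ = 0 every stage has the same fanout. For μ > 0 the interior gates shrink and the fanout grows from input to output, because each step of the recursion gives h_i / h_{i-1} = 1 + μ S_{i-1}.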
4.19
Inverter Chain: VDD Optimization
[Figure: per-stage VDD / VDDnom versus stage (1 to 7) for delay increments dinc = 0%, 1%, 10%, 30%, and 50%.]
VDD reduces energy of the final load first
Variable taper achieved by voltage scaling
4.20
Inverter Chain: Optimization Results
[Figure: sensitivity (normalized) versus dinc (%), and energy reduction (%) versus dinc (%). Curves: S, gVDD, 2VDD, cVDD.]
Parameter with the largest sensitivity has the largest
potential for energy reduction
Two discrete supplies mimic per-stage VDD
4.21
Example: Kogge-Stone Tree Adder
Tree adder
– Long wires
– Re-convergent paths
– Multiple active outputs
[Figure: Kogge-Stone tree with inputs (A0, B0) to (A15, B15) and Cin, and outputs S0 to S15.]
[Ref: P. Kogge, Trans. Comp’73]
4.22
Tree Adder: Sizing vs. Dual-VDD Optimization
Reference design: all paths are critical
[Figure: three panels. Reference: D = Dmin. Sizing: E (-54%) at dinc = 10%. 2Vdd: E (-27%) at dinc = 10%.]
Internal energy ⇒ S more effective than VDD
– S: E(-54%), 2Vdd: E(-27%) at dinc = 10%
4.23
Tree Adder: Multi-dimensional Search
[Figure: energy-delay curves versus Delay / Dmin for the reference design and for optimization over (VDD, VTH), (S, VDD), (S, VTH), and (S, VDD, VTH).]
Can get pretty close to optimum with only 2 variables
Getting the minimum delay (maximum speed) is very expensive
4.24
Multiple Supply Voltages
Block-level supply assignment
– Higher throughput/lower latency functions are
implemented in higher VDD
– Slower functions are implemented with lower VDD
– This leads to so-called “voltage islands” with separate
supply grids
– Level conversion performed at block boundaries
Multiple supplies inside a block
– Non-critical paths moved to lower supply voltage
– Level conversion within the block
– Physical design challenging
4.25
Using Three VDD’s
[Figure: power reduction ratio as a function of the lower supplies V2 (V) and V3 (V), shown as a surface and as a contour plot. © IEEE 2002]
V1 = 1.5V, VTH = 0.3V
[Ref: T. Kuroda, ICCAD’02]
4.26
Optimum Number of VDD’s
[Figure: VDD ratio (V2/V1, V3/V1, V4/V1) and P ratio (P2/P1, P3/P1, P4/P1) versus V1 (V) for the supply sets { V1, V2 }, { V1, V2, V3 }, and { V1, V2, V3, V4 }. © IEEE 2001]
The more VDD’s the less power, but the effect saturates
Power reduction effect decreases with scaling of VDD
Optimum V2/V1 is around 0.7
[Ref: M. Hamada, CICC’01]
4.27
Lessons: Multiple Supply Voltages
Two supply voltages per block are optimal
Optimal ratio between the supply voltages is 0.7
Level conversion is performed on the voltage boundary,
using a level-converting flip-flop (LCFF)
An option is to use an asynchronous level converter
– More sensitive to coupling and supply noise
4.28
Distributing Multiple Supply Voltages
[Figure: layouts of a VDDH circuit and a VDDL circuit (rails VDDH, VDDL, VSS; inputs i1, i2; outputs o1, o2) in the conventional style and in the shared N-well style.]
4.29
Conventional
[Figure: N-well isolation between VDDH and VDDL circuits. (a) Dedicated row: alternating VDDH and VDDL rows. (b) Dedicated region: separate VDDH and VDDL regions.]
4.30
Shared N-Well
[Figure: VDDH and VDDL circuits in a shared N-well, with VDDH, VDDL, and VSS rails. (a) Floor plan image.]
[Shimazaki et al, ISSCC’03]
4.31
Example: Multiple Supplies in a Block
[Figure: conventional design versus CVS structure. Both show logic between flip-flops (FF) with the critical path marked; the CVS structure adds level-shifting F/Fs. © IEEE 1998]
Lower VDD portion is shared
“Clustered voltage scaling”
[Ref: M. Takahashi, ISSCC’98]
4.32
Level Converting Flip-Flops (LCFFs)
[Figure: schematics of a master-slave LCFF and a pulsed half-latch LCFF, each with its level-conversion stage marked. © IEEE 2003]
Pulsed Half-Latch versus Master-Slave LCFFs
Smaller # of MOSFETs / clock loading
Faster level conversion using half-latch structure
Shorter D-Q path from pulsed circuit
[Ref: F. Ishihara, ISLPED’03]
4.33
Dynamic Realization of Pulsed LCFF
Pulsed precharge LCFF (PPR)
– Fast level conversion by precharge mechanism
– Suppressed charge/discharge toggle by conditional capture
– Short D-Q path
[Figure: schematic of the pulsed precharge latch, with the level-conversion stage marked. © IEEE 2003]
[Ref: F. Ishihara, ISLPED’03]
4.34
Case Study: ALU for 64-bit Processor
[Figure: ALU block diagram with clock generation, 9:1, 5:1, and 2:1 MUXes on the operands ain and bin, gp generation, carry generation, partial sum, sum selector, logical unit, and the long loop-back bus sumb (0.5pF) with INV1 and INV2. VDDH and VDDL circuits are distinguished. © IEEE 2003]
[Ref: Y. Shimazaki, ISSCC’03]
4.35
Low-Swing Bus and Level Converter
[Figure: low-swing bus (sum, sumb) driven through INV1 and received by the domino level converter (9:1 MUX) with INV2, a keeper, and the pc, sel, and ain0 signals. © IEEE 2003]
INV2 is placed near 9:1 MUX to increase noise immunity
Level conversion is done by a domino 9:1 MUX
[Ref: Y. Shimazaki, ISSCC’03]
4.36
Measured Results: Energy and Delay
[Figure: energy (pJ) versus TCYCLE (ns) at room temperature for the single-supply and shared-well (VDDH=1.8V) designs, annotated at 1.16GHz. © IEEE 2003]
VDDL=1.4V: Energy -25.3%, Delay +2.8%
VDDL=1.2V: Energy -33.3%, Delay +8.3%
[Ref: Y. Shimazaki, ISSCC’03]
4.37
Practical Transistor Sizing
Continuous sizing of transistors only an option in
custom design
In ASIC design flows, options set by available
library
Discrete sizing options made possible in
standard-cell design methodology by providing
multiple options for the same cell
– Leads to larger libraries (> 800 cells)
– Easily integrated into technology mapping
4.38
Technology Mapping
[Figure: gate network with inputs a, b, c, d and output f, annotated slack=1.]
Larger gates reduce capacitance, but are slower
4.39
Technology Mapping
Example: 4-input AND
(a) Implemented using 4 input NAND + INV
(b) Implemented using 2 input NAND + 2-input NOR
Gate type   Area (cell unit)   Input cap. (fF)   Library 1: High-Speed, average delay (ps)   Library 2: Low-Power, average delay (ps)
INV         3                  1.8               7.0 + 3.8 CL                                12.0 + 6.0 CL
NAND2       4                  2.0               10.3 + 5.3 CL                               16.3 + 8.8 CL
NAND4       5                  2.0               13.6 + 5.8 CL                               22.7 + 10.2 CL
NOR2        3                  2.2               10.7 + 5.4 CL                               16.7 + 8.9 CL
(delay formula: CL in fF)
(numbers calibrated for 90 nm)
4.40
Technology Mapping – Example
4-input AND      (a) NAND4 + INV   (b) NAND2 + NOR2
Area             8                 11
HS: Delay (ps)   31.0 + 3.8 CL     32.7 + 5.4 CL
LP: Delay (ps)   53.1 + 6.0 CL     52.4 + 8.9 CL
Sw Energy (fF)   0.1 + 0.06 CL     0.83 + 0.06 CL
Area
– 4-input more compact than 2-input (2 gates vs. 3 gates)
Timing
– both implementations are 2-stage realizations
– 2nd stage INV (a) is better driver than NOR2 (b)
– For more complex blocks, simpler gates will show better
performance
Energy
– Internal switching increases energy in the 2-input case
– Low-power library has worse delay, but lower leakage (see later)
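The area and delay entries of the comparison can be recomputed from the library data (area, input capacitance, and delay coefficients per cell). The sketch below assumes that the first stage is loaded only by the input capacitance of the second stage and that option (b) uses two NAND2 gates feeding one NOR2; the data structure and function names are hypothetical.

```python
# Minimal sketch: recompute area and two-stage delay of both 4-input AND mappings.
# Per cell: (area in cell units, input cap in fF, {library: (intrinsic ps, slope ps/fF)})
CELLS = {
    "INV":   (3, 1.8, {"HS": (7.0, 3.8),  "LP": (12.0, 6.0)}),
    "NAND2": (4, 2.0, {"HS": (10.3, 5.3), "LP": (16.3, 8.8)}),
    "NAND4": (5, 2.0, {"HS": (13.6, 5.8), "LP": (22.7, 10.2)}),
    "NOR2":  (3, 2.2, {"HS": (10.7, 5.4), "LP": (16.7, 8.9)}),
}


def two_stage_delay(first, second, lib):
    """Return (constant in ps, slope in ps/fF) for first -> second driving CL.

    The first stage sees the input capacitance of the second stage as its load.
    """
    d_first, k_first = CELLS[first][2][lib]
    d_second, k_second = CELLS[second][2][lib]
    c_in_second = CELLS[second][1]
    return d_first + k_first * c_in_second + d_second, k_second


def area(cells):
    """Total area of a list of cell names."""
    return sum(CELLS[c][0] for c in cells)


option_a = {"area": area(["NAND4", "INV"]),
            "HS": two_stage_delay("NAND4", "INV", "HS"),
            "LP": two_stage_delay("NAND4", "INV", "LP")}
option_b = {"area": area(["NAND2", "NAND2", "NOR2"]),
            "HS": two_stage_delay("NAND2", "NOR2", "HS"),
            "LP": two_stage_delay("NAND2", "NOR2", "LP")}
```

The constants agree with the table after rounding to one decimal, and the slopes are those of the output gate, which is why the INV-terminated option (a) drives large loads better.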
4.41
Gate-Level Tradeoffs for Power
Technology mapping
Gate selection
Sizing
Pin assignment
Logical Optimizations
Factoring
Restructuring
Buffer insertion/deletion
Don’t care optimization
4.42
Logic Restructuring
Logic restructuring to minimize spurious transitions
[Figure: example networks annotated with signal values.]
Buffer insertion for path balancing
[Figure: example network annotated with path delays.]
4.43
Algebraic Transformations
Idea: Modify network to reduce capacitance
[Figure: f = ab + ac with node probabilities p1 = 0.05, p2 = 0.05, p3 = 0.075, rewritten as f = a(b + c) with node probabilities p4 = 0.75, p5 = 0.075.]
pa = 0.1; pb = 0.5; pc = 0.5
Caveat: This may increase activity!
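The node probabilities in this example can be reproduced by enumerating the input combinations. The sketch below assumes statistically independent inputs, reads the two networks as f = ab + ac and f = a(b + c), and estimates the 0→1 transition probability of a node as p(1 − p), which further assumes that successive cycles are independent. The mapping of p1 to p5 onto nodes and all function names are assumptions.

```python
# Minimal sketch: signal probabilities and activities for f = ab + ac vs. f = a(b + c).
from itertools import product

P_ONE = {"a": 0.1, "b": 0.5, "c": 0.5}  # probability that each input is 1


def prob(fn):
    """Probability that fn(a, b, c) evaluates to 1 for independent inputs."""
    total = 0.0
    for a, b, c in product((0, 1), repeat=3):
        weight = 1.0
        for name, bit in zip("abc", (a, b, c)):
            weight *= P_ONE[name] if bit else 1.0 - P_ONE[name]
        if fn(a, b, c):
            total += weight
    return total


def activity(p):
    """0->1 transition probability of a node with signal probability p."""
    return p * (1.0 - p)


nodes = {
    "p1": prob(lambda a, b, c: a and b),               # a.b
    "p2": prob(lambda a, b, c: a and c),               # a.c
    "p3": prob(lambda a, b, c: (a and b) or (a and c)),  # output of ab + ac
    "p4": prob(lambda a, b, c: b or c),                # b + c
    "p5": prob(lambda a, b, c: a and (b or c)),        # output of a(b + c)
}
activities = {name: activity(p) for name, p in nodes.items()}
```

The factored form has one internal node instead of two, but that node (b + c) has a much higher activity than either product term, which is the caveat stated on the slide.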
4.44
Lessons from Circuit Optimization
Joint optimization over multiple design parameters
possible using sensitivity-based optimization framework
– Equal marginal costs ⇔ Energy-efficient design
Peak performance is VERY power inefficient
– About 70% energy reduction for 20% delay penalty
– Additional variables for higher energy-efficiency
Two supply voltages in general sufficient; 3 or more
supply voltages only offer small advantage
Choice between sizing and supply voltage parameters
depends upon circuit topology
But … leakage not considered so far
4.45
Considering Leakage @ Design Time
Considering leakage as well as dynamic
power is essential in sub-100 nm
technologies
Leakage is not necessarily a bad thing
– Increased leakage leads to improved
performance, allowing for lower supply voltages
– Again a trade-off issue …
4.46
Leakage – Not Necessarily a Bad Thing
[Figure: normalized energy versus Estatic/Edynamic (log scale, 10^-2 to 10^1). Version 1: VTHref - 180mV, 0.81 VDDmax. Version 2: VTHref - 140mV, 0.52 VDDmax. © IEEE 2004]
$\left( \frac{E_{Lk}}{E_{Sw}} \right)_{opt} = \frac{2}{\ln \left( \frac{L_d}{\alpha_{avg}} \right) - K}$
Topology        Inv   Add   Dec
(ELk/ESw)opt    0.8   0.5   0.2
Optimal designs have high leakage (ELk/ESw ≈ 0.5)
Must adapt to process and activity variations
[Ref: D. Markovic, JSSC’04]
4.47
Refining the Optimization Model
Switching energy
$E_{dyn} = \alpha_{0 \to 1} K_e S (\gamma + f) V_{DD}^2$
Leakage energy
$E_{stat} = S \, I_0(\Psi) \, e^{-\frac{V_{TH} - \lambda_d V_{DD}}{kT/q}} \, V_{DD} \, T_{cycle}$
with:
I0(Ψ): normalized leakage current with inputs in state Ψ
4.48
Reducing Leakage @ Design Time
Using longer transistors
– Limited benefit
– Increase in active current
Using higher thresholds
– Channel doping
– Stacked devices
– Body biasing
Reducing the voltage!!
4.49
Longer Channels
[Figure: normalized leakage power and normalized switching energy versus transistor length (100 to 200 nm), 90 nm CMOS.]
10% longer gates reduce leakage by 50%
Increases switching power by 18% with W/L = const.
Doubling L reduces leakage by 5x
Impacts performance
– Attractive when W does not have to be increased (e.g. memory)
4.50
Using Multiple Thresholds
There is no need for level conversion
Dual thresholds can be added to standard design flows
– High-VTh and Low-VTh libraries are standard in sub-0.18 µm
processes
– For example: can synthesize using only high-VTh cells and then
swap in low-VTh cells in place to improve timing.
– Second VTh insertion can be combined with resizing
Only two thresholds are needed per block
– Using more than two yields small improvements
4.51
Three VTH’s
[Figure: leakage reduction ratio as a function of VTH.2 (V) and VTH.3 (V), shown as a surface and as a contour plot. © IEEE 2002]
VDD = 1.5V, VTH.1 = 0.3V
Impact of third threshold very limited
[Ref: T. Kuroda, ICCAD’02]
4.52
Using Multiple Thresholds
Cell-by-cell VTH assignment (not at block level)
Achieves all-low-VTH performance with substantial
reduction in leakage
[Figure: logic between flip-flops (FF), with high-VTH and low-VTH cells distinguished.]
[Ref: S. Date, SLPE’94]
4.53
Dual-VT Domino
Low-threshold transistors used only in critical paths
[Figure: dual-VT domino stages n and n+1 with clocks Clkn and Clkn+1, nodes Dn and Dn+1, inverters Inv1 to Inv3, and transistor P1. Shaded transistors are low threshold.]
4.54
Multiple Thresholds and Design Methodology
Easily introduced in standard cell design
methodology by extending cell libraries with cells
with different thresholds
– Selection of cells during technology mapping
– No impact on dynamic power
– No interface issues (as was the case with multiple
VDD’s)
Impact: Can reduce leakage power substantially
4.55
Dual-VTH Design for High-Performance Design
                 High-VTH Only   Low-VTH Only   Dual VTH
Total Slack      -53 psec        0 psec         0 psec
Dynamic Power    3.2 mW          3.3 mW         3.2 mW
Static Power     914 nW          3873 nW        1519 nW
All designs synthesized automatically using Synopsys flows
[Courtesy: Synopsys, Toshiba, 2004]
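The table can be reduced to two figures of merit with a few lines of arithmetic. The sketch below uses only the numbers in the table; the dictionary layout and function names are hypothetical.

```python
# Minimal sketch: leakage saving and static share of total power from the table.
STATIC_NW = {"high_vth": 914.0, "low_vth": 3873.0, "dual_vth": 1519.0}
DYNAMIC_MW = {"high_vth": 3.2, "low_vth": 3.3, "dual_vth": 3.2}


def static_saving(design, baseline="low_vth"):
    """Fractional reduction in static power relative to the baseline design."""
    return 1.0 - STATIC_NW[design] / STATIC_NW[baseline]


def static_share(design):
    """Static power as a fraction of total (static + dynamic) power."""
    static_mw = STATIC_NW[design] * 1e-6  # nW -> mW
    return static_mw / (static_mw + DYNAMIC_MW[design])


dual_saving = static_saving("dual_vth")   # about 0.61
dual_share = static_share("dual_vth")
```

The dual-VTH design meets the same zero-slack timing as the all-low-VTH design while cutting static power by roughly 61%; the high-VTH-only design leaks least but misses timing by 53 psec.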
4.56
Example: High- vs. Low-Threshold Libraries
[Figure: leakage power (nW) for selected combinational tests in 130 nm CMOS (i10, des, C7552, seq, pair, and the average AVER), comparing LVth, LVth+HVth, HVth, and HVth+LVth.]
[Courtesy: Synopsys 2004]
4.57
Complex Gates Increase Ion/Ioff Ratio
[Figure: Ion versus VDD (V) and Ioff (nA) versus VDD (V), each for stack and no stack (90nm technology).]
Ion and Ioff of single NMOS versus stack of 10 NMOS
transistors
Transistors in stack are sized up to give similar drive
4.58
Complex Gates Increase Ion/Ioff Ratio
[Figure: Ion/Ioff ratio (x 10^5) versus VDD (V) for stack and no stack (90nm technology), annotated “Factor 10!”.]
Stacking transistors suppresses submicron effects
Reduced velocity saturation
Reduced DIBL effect
Allows for operation at lower thresholds
4.59
Complex Gates Increase Ion/Ioff Ratio
Example: 4-input NAND, Fan-in (4) versus Fan-in (2)
With transistors sized for similar performance:
Leakage of Fan-in(2) = Leakage of Fan-in(4) x 3
(Averaged over all possible input patterns)
[Figure: leakage current (nA) versus input pattern for the Fan-in (2) and Fan-in (4) implementations.]
4.60
Example: 32 bit Kogge-Stone Adder
[Figure: % of input vectors versus standby leakage current, annotated “factor 18”. © Springer 2001]
Reducing the threshold by 150 mV increases leakage of
single NMOS transistor by factor 60
[Ref: S.Narendra, ISLPED’01]
4.61
Summary
Circuit optimization can lead to substantial
energy reduction at limited performance loss
Energy-delay plots are the perfect mechanism
for analyzing energy-delay trade-offs.
Well-defined optimization problem over W,
VDD and VTH parameters
Increasingly better support by today’s CAD
flows
Observe: leakage is not necessarily bad – if
appropriately managed.
4.62
References
Books:
A. Bellaouar, M.I Elmasry, Low-Power Digital VLSI Design Circuits and Systems, Kluwer
Academic Publishers, 1st Ed, 1995.
D. Chinnery, K. Keutzer, Closing the Gap Between ASIC and Custom, Springer, 2002.
D. Chinnery, K. Keutzer, Closing the Power Gap Between ASIC and Custom, Springer, 2007.
J. Rabaey, A. Chandrakasan, B. Nikolic, Digital Integrated Circuits: A Design Perspective, 2nd ed,
Prentice Hall 2003.
I. Sutherland, B. Sproull, D. Harris, Logical Effort: Designing Fast CMOS Circuits, Morgan Kaufmann, 1st Ed, 1999.
Articles:
R.W. Brodersen, M.A. Horowitz, D. Markovic, B. Nikolic, V. Stojanovic, “Methods for True Power
Minimization,” Int. Conf. on Computer-Aided Design (ICCAD), pp. 35-42, Nov. 2002.
S. Date, N. Shibata, S. Mutoh, and J. Yamada, "1V 30MHz Memory-Macrocell-Circuit Technology
with a 0.5µm Multi-Threshold CMOS," Proceedings of the 1994 Symposium on Low Power
Electronics, San Diego, CA, pp. 90-91, Oct. 1994.
M. Hamada, Y. Ootaguro, T. Kuroda, “Utilizing Surplus Timing for Power Reduction,” IEEE
Custom Integrated Circuits Conf., (CICC), pp. 89-92, Sept. 2001.
F. Ishihara, F. Sheikh, B. Nikolic, “Level conversion for dual-supply systems,” Int. Conf. Low
Power Electronics and Design, (ISLPED), pp. 164-167, Aug. 2003.
P.M. Kogge and H.S. Stone, “A Parallel Algorithm for the Efficient Solution of General Class of
Recurrence Equations,” IEEE Trans. Comput., vol. C-22, no. 8, pp. 786-793, Aug 1973.
T. Kuroda, “Optimization and control of VDD and VTH for low-power, high-speed CMOS design,”
Proceedings ICCAD 2002, pp. , San Jose, Nov. 2002.
4.63
References
Articles (cont.):
H.C. Lin and L.W. Linholm, “An Optimized Output Stage for MOS Integrated Circuits,” IEEE J.
Solid-State Circuits, vol. SC-10, no. 2, pp. 106-109, Apr. 1975.
S. Ma and P. Franzon, “Energy Control and Accurate Delay Estimation in the Design of CMOS
Buffers,” IEEE J. Solid-State Circuits, vol. 29, no. 9, pp. 1150-1153, Sept. 1994.
D. Markovic, V. Stojanovic, B. Nikolic, M.A. Horowitz, R.W. Brodersen, “Methods for True Energy-Performance Optimization,” IEEE Journal of Solid-State Circuits, vol. 39,
no. 8, pp. 1282-1293, Aug. 2004.
MathWorks, http://www.mathworks.com
S. Narendra, S. Borkar, V. De, D. Antoniadis, A. Chandrakasan, “Scaling of stack effect and its
applications for leakage reduction,” Int. Conf. Low Power Electronics and Design, (ISLPED), pp.
195-200, Aug. 2001.
T. Sakurai and R. Newton, “Alpha-Power Law MOSFET Model and its Applications to CMOS
Inverter Delay and Other Formulas,” IEEE J. Solid-State Circuits, vol. 25, no. 2,
pp. 584-594, Apr. 1990.
Y. Shimazaki, R. Zlatanovici, B. Nikolic, “A shared-well dual-supply-voltage 64-bit ALU,” Int. Conf.
Solid-State Circuits, (ISSCC), pp. 104-105, Feb. 2003.
V. Stojanovic, D. Markovic, B. Nikolic, M.A. Horowitz, R.W. Brodersen, “Energy-Delay Tradeoffs
in Combinational Logic using Gate Sizing and Supply Voltage Optimization,” European Solid-State Circuits Conf., (ESSCIRC), pp. 211-214, Sept. 2002.
M. Takahashi et al., “A 60mW MPEG video codec using clustered voltage scaling with variable
supply-voltage scheme,” IEEE Int. Solid-State Circuits Conf., (ISSCC), pp. 36-37,
Feb. 1998.
4.64