Low Power Design of Integrated Systems

Assoc. Prof. Dimitrios Soudris
Technology Directions: SIA Roadmap

Year                    1999    2002    2005    2008    2011     2014
Feature size (nm)       180     130     100     70      50       35
Logic trans/cm²         6.2M    18M     39M     84M     180M     390M
Cost/trans (mc)         1.735   .580    .255    .110    .049     .022
#pads/chip              1867    2553    3492    4776    6532     8935
Clock (MHz)             1250    2100    3500    6000    10000    16900
Chip size (mm²)         340     430     520     620     750      900
Wiring levels           6-7     7       7-8     8-9     9        10
Power supply (V)        1.8     1.5     1.2     0.9     0.6      0.5
High-perf pow (W)       90      130     160     170     175      183
Battery pow (W)         1.4     2       2.4     2.8     3.2      3.7
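Dividing the high-performance power row by the power-supply row gives a rough estimate of the supply current each generation must deliver, which shows why power delivery gets harder as Vdd falls. A minimal sketch, assuming the roadmap values above and treating the whole power budget as drawn at the nominal supply (I = P / Vdd):

```python
# Values copied from the SIA roadmap table:
# year -> (high-performance power in W, power supply in V)
ROADMAP = {
    1999: (90, 1.8),
    2002: (130, 1.5),
    2005: (160, 1.2),
    2008: (170, 0.9),
    2011: (175, 0.6),
    2014: (183, 0.5),
}

def supply_current(power_w: float, vdd_v: float) -> float:
    """Average supply current in amperes, assuming I = P / Vdd."""
    return power_w / vdd_v

currents = {year: supply_current(p, v) for year, (p, v) in ROADMAP.items()}

for year, amps in currents.items():
    print(f"{year}: {amps:6.1f} A")
```

Under this simplification the current grows from about 50 A (1999) to about 366 A (2014), while the power itself only roughly doubles.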
Technology Directions: Technology Process Evolution (SIA Roadmap, 2002)
[Figure: evolution of the number of transistors, clock frequency, performance, and power consumption]
Power Terminology
• Power is the rate at which energy is delivered or exchanged
  » electrical energy is converted into heat energy during operation
• Power dissipation: the rate at which energy is taken from the source (Vdd) and converted into heat
Why Lower Power?
• Large market for portable devices
  – e.g. laptops, mobile phones
• Achieve larger transistor integration
  – Pentium IV contains 42 million transistors
  – Teraflops chip contains 1.9 billion transistors
• Need for “green” computers
  – 10% of total electrical energy consumed by PCs
Battery Technology Improvements
The Industry’s Reaction
• Reduce chip capacitance through process scaling
  ==> Expensive
• Reduce voltage levels from 5V → 3.3V → 2V
  ==> The industry is slow to move (microprocessors, memory, ...)
• Better circuit techniques
  ==> Gated clocks, power-down of non-operational units, ...
• Example: IBM 80 MHz PowerPC RISC (3 W @ 3.3V)
  – Power management logic determines activity on a per-cycle basis
  – Clocks of idle blocks are turned off → 12-30% savings
  – Doze, Nap, and Sleep modes (5 mW)
Example: Intel Pentium-II processor
• Pentium-1: 15 W (5 V, 66 MHz)
• Pentium-2: 8 W (3.3 V, 133 MHz)
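As a rough check against the dynamic power relation (P proportional to C·Vdd²·f), the sketch below asks how much power the Pentium-1 would draw if only its supply and clock moved to the Pentium-2 operating point. It assumes all power is dynamic and that switching activity is unchanged, so the remaining gap is loosely attributed to a smaller switched capacitance; these are simplifying assumptions, not measured data.

```python
def scaled_dynamic_power(p_old, v_old, f_old, v_new, f_new):
    """Scale a power figure assuming P ~ Vdd^2 * f with C and activity held fixed."""
    return p_old * (v_new / v_old) ** 2 * (f_new / f_old)

# Operating points from the slide: 15 W at 5 V / 66 MHz, 8 W at 3.3 V / 133 MHz.
predicted = scaled_dynamic_power(15.0, 5.0, 66e6, 3.3, 133e6)

# Ratio of reported to predicted power, read (under the assumptions above)
# as an implied switched-capacitance ratio.
implied_c_ratio = 8.0 / predicted

print(f"predicted power at 3.3 V / 133 MHz: {predicted:.1f} W")
print(f"implied switched-capacitance ratio: {implied_c_ratio:.2f}")
```

Voltage scaling alone would keep the power near 13.2 W despite the doubled clock; the reported 8 W implies roughly 0.61× the switched capacitance under this model.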
Where Does Power Go in CMOS?
• The power consumption in digital CMOS circuits
Pavg = Pdynamic + Pshort-circuit + Pleakage
• Dynamic Power Consumption
Charging and Discharging Capacitors
• Short Circuit Currents
Short Circuit Path between Supply Rails during Switching
• Leakage (Static)
Leaking diodes and transistors
Present & Future in Power Consumption
Dynamic Power Consumption (1)
$P_{dynamic} = C_L V_{dd}^2 N f$
• where $V_{dd}$ is the supply voltage, $C_L$ the load capacitance, N the average number of charging (0→1) transitions per clock cycle, and f the operating frequency
[Figure (a)-(c): inverter output OUT driving the load capacitance CL, with the charging current drawn from Vdd and the discharging current flowing to ground]
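The formula can be evaluated directly. A minimal sketch, assuming N counts charging (0→1) transitions so that C_L·Vdd² is the energy drawn per transition; the capacitance, supply, activity, and frequency values are hypothetical:

```python
def dynamic_power(c_load_f: float, vdd_v: float, n_transitions: float, f_hz: float) -> float:
    """P_dynamic = C_L * Vdd^2 * N * f, in watts.

    n_transitions: average number of charging (0->1) transitions
    of the switched capacitance per clock cycle.
    """
    return c_load_f * vdd_v ** 2 * n_transitions * f_hz

# Hypothetical example: 1 nF switched capacitance, 1.8 V, N = 0.2, 100 MHz.
p = dynamic_power(1e-9, 1.8, 0.2, 100e6)
print(f"{p * 1e3:.1f} mW")  # 64.8 mW

# Halving Vdd alone cuts dynamic power by 4x because of the quadratic term.
p_half_v = dynamic_power(1e-9, 0.9, 0.2, 100e6)
print(f"{p / p_half_v:.1f}x")  # 4.0x
```

Each of the four reduction levers on the next slide (capacitance, supply voltage, frequency, switching activity) corresponds to one argument of this function.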
Dynamic Power Consumption (2)
• For technologies down to 0.35 µm, the dynamic consumption is about 80% of the total consumption
• Goal ===> reduce dynamic power consumption by
  – reduction of capacitance
  – reduction of supply voltage
  – reduction of frequency
  – reduction of switching activity
  – or a combination of the above factors
Leakage current consumption
• the reverse-bias diode leakage at the transistor drains, and
• the sub-threshold current through a turned-off transistor channel
[Figure: the leakage of a reverse-biased pMOS transistor: p+ regions in an n-type substrate, with leakage current through the reverse-biased drain-substrate diode]
[Figure: subthreshold leakage with respect to gate-source voltage: log ID (10^-15 to 10^-3) versus VGS (0 to 2 V), showing the subthreshold and saturated regions for decreasing VDS, Vdd]
The Design Flow
[Figure: (a) design flow from System Specifications through System-Level Design, Architecture-Level Design, Logic-Level Design, and Circuit-Level Design / Layout synthesis; (b) the same flow with an Analysis/Estimation step at the system, architecture, logic, and circuit levels, supported by power models for system-level components, for macrocells and control logic, and for gates and cells]
Power savings in terms of the design level
[Figure: power savings increase toward the higher design levels: System level (10-20x), Behavior level, RT level (2-5x), Logic level, Transistor level (20-50%), Layout level]
Lower Vdd Increases Delay
$T_d = \frac{C_L V_{dd}}{I}, \quad I \sim (V_{dd} - V_t)^2$
$\frac{T_d(V_{dd}=2)}{T_d(V_{dd}=5)} = \frac{2 \cdot (5 - 0.7)^2}{5 \cdot (2 - 0.7)^2} \approx 4.4$
[Figure: normalized delay (1.00 to 7.50) versus Vdd (2 to 6 V) in a 2.0 µm technology for a multiplier, a clock generator, a ring oscillator, a microcoded DSP chip, an adder, and an adder (SPICE)]
Relatively independent of logic function and style.
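The delay ratio in the worked example can be reproduced numerically; C_L cancels in the ratio. This is a first-order sketch using the square-law current model from the slide (it ignores effects such as velocity saturation), with Vt = 0.7 V as in the example:

```python
def relative_delay(vdd: float, vt: float) -> float:
    """Delay up to a constant: Td ~ C_L * Vdd / I with I ~ (Vdd - Vt)^2."""
    if vdd <= vt:
        raise ValueError("model only valid for Vdd > Vt")
    return vdd / (vdd - vt) ** 2

VT = 0.7  # threshold voltage used in the slide's worked example

ratio = relative_delay(2.0, VT) / relative_delay(5.0, VT)
print(f"Td(2 V) / Td(5 V) = {ratio:.2f}")  # 4.38

# Delay normalized to the 5 V value, as on the plot's vertical axis.
for vdd in (1.5, 2.0, 3.0, 4.0, 5.0):
    norm = relative_delay(vdd, VT) / relative_delay(5.0, VT)
    print(f"Vdd = {vdd:.1f} V -> normalized delay {norm:.2f}")
```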
Reducing Vdd
$P \times t_d = E_t = C_L V_{dd}^2$
$\frac{E(V_{dd}=2)}{E(V_{dd}=5)} = \frac{C_L \cdot 2^2}{C_L \cdot 5^2} = 0.16$
[Figure: normalized power-delay product (0.03 to 1.5) versus Vdd (1 to 5 V) for a 51-stage ring oscillator and an 8-bit adder, following a quadratic dependence]
Strong function of voltage (V² dependence).
Relatively independent of logic function and style.
Power-delay product improves with lowering Vdd.
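The energy ratio follows from the power-delay product alone; the load capacitance cancels, which is why the result does not depend on the logic function. A minimal sketch with a hypothetical 1 pF load:

```python
def switching_energy(c_load_f: float, vdd_v: float) -> float:
    """Energy per transition from the power-delay product: E = P * td = C_L * Vdd^2."""
    return c_load_f * vdd_v ** 2

C_L = 1e-12  # hypothetical 1 pF load; it cancels in the ratio

ratio = switching_energy(C_L, 2.0) / switching_energy(C_L, 5.0)
print(f"E(2 V) / E(5 V) = {ratio:.2f}")  # 0.16
```

The 6× energy saving comes at the price of the longer delay shown on the previous slide, which is why architectural techniques are needed to recover throughput.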
Lowering the Threshold
[Figure: delay versus Vdd (marked at 2Vt), and drain current ID versus VGS for Vt = 0 and Vt = 0.2]
Reduces the speed loss, but increases leakage.
Interesting design approach:
DESIGN FOR $P_{leakage} = P_{dynamic}$
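To make the speed/leakage trade-off concrete, the sketch below combines the square-law delay model from the earlier slide with a common exponential model of subthreshold leakage, I_off proportional to 10^(−Vt/S). The subthreshold swing S = 100 mV/decade, the 1 V supply, and the two threshold values are illustrative assumptions, not values from these slides.

```python
SWING_V_PER_DECADE = 0.1  # assumed subthreshold swing S (100 mV/decade)

def relative_delay(vdd: float, vt: float) -> float:
    """Td ~ Vdd / (Vdd - Vt)^2 (square-law current model, constants dropped)."""
    return vdd / (vdd - vt) ** 2

def relative_leakage(vt: float, swing: float = SWING_V_PER_DECADE) -> float:
    """Off-current ~ 10^(-Vt / S): one decade more leakage per S volts of Vt reduction."""
    return 10 ** (-vt / swing)

VDD = 1.0                   # hypothetical supply
vt_high, vt_low = 0.4, 0.2  # hypothetical thresholds

speedup = relative_delay(VDD, vt_high) / relative_delay(VDD, vt_low)
leakage_increase = relative_leakage(vt_low) / relative_leakage(vt_high)

print(f"delay reduced by {speedup:.2f}x")               # 1.78x
print(f"leakage increased by {leakage_increase:.0f}x")  # 100x
```

Under these assumptions, lowering Vt by 0.2 V buys less than 2× in speed but costs two decades of leakage, which motivates balancing leakage and dynamic power rather than pushing Vt as low as possible.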
Transistor Sizing for Power Minimization
[Figure: small W/L’s give lower capacitance but require a higher voltage; large W/L’s give higher capacitance but allow a lower voltage]
Larger devices are useful only when the load is dominated by interconnect capacitance.
Minimum-sized devices are usually optimal for low power.
Techniques to reduce supply voltage
• Algorithm: transformations to exploit concurrency
• Architecture: parallelism and pipelining
• Circuit/Logic: transistor sizing, fast logic structures
• Technology: threshold voltage reduction, feature size scaling
Techniques to minimize the switched capacitance
• System: partitioning, power-down, power states
• Algorithm: complexity, concurrency, regularity, locality, data representation
• Architecture: concurrency, instruction set selection, signal correlations, data representation, data encoding
• Circuit/Logic: transistor sizing, logic optimization, power down, layout optimization
• Technology: advanced packaging, SOI
Power consumption of data transfer and storage compared with datapath operations, both in hardware [Men95] and in software [Tiw94, Gon96].
[Figure: relative energy per operation of datapath operations compared with on-chip SRAM and external memory accesses]
Architecture Power Optimization Techniques
• Architecture-driven voltage reduction: the key idea is to speed up the circuit so that the supply voltage can be reduced while still meeting the throughput constraints. Voltage reduction can be achieved by introducing parallelism in hardware or by inserting pipeline flip-flops.
• Switching activity minimization: prevent the generation and propagation of spurious transitions, or reduce the number of transitions, e.g. by retiming, path balancing, or data representation.
• Switched capacitance minimization: minimize the capacitance that is switched.
• Dynamic power management: under certain conditions a part of the circuit becomes inactive, so unnecessary calculations can be avoided, e.g. with gated clocks, operand isolation, precomputation, and guarded evaluation.
Architecture Trade-offs: Reference Data Path
• Critical path delay = $T_{adder} + T_{comparator}$ (= 25 ns) ⇒ $f_{ref}$ = 40 MHz
• Total capacitance being switched = $C_{ref}$
• $V_{dd} = V_{ref}$ = 5 V
• Power for reference datapath: $P_{ref} = C_{ref} V_{ref}^2 f_{ref}$
Voltage Reduction Technique: Parallelism
• The clock rate can be reduced by half with the same throughput ⇒ $f_{par} = f_{ref}/2$
• $V_{par} = V_{ref}/1.7$, $C_{par} = 2.15\,C_{ref}$
• $P_{par} = (2.15\,C_{ref})(V_{ref}/1.7)^2 (f_{ref}/2) \approx 0.37\,P_{ref}$
Voltage Reduction Technique: Pipeline
• $f_{pipe} = f_{ref}$, $C_{pipe} = 1.1\,C_{ref}$, $V_{pipe} = V_{ref}/1.7$
• Voltage can be dropped while maintaining the original throughput
• $P_{pipe} = C_{pipe} V_{pipe}^2 f_{pipe} = (1.1\,C_{ref})(V_{ref}/1.7)^2 f_{ref} \approx 0.38\,P_{ref}$
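The parallel and pipelined power figures can be checked by evaluating P = C·V²·f relative to the reference datapath, using the slides' scaling factors (2.15× and 1.1× capacitance, Vdd reduced by 1.7×, half or full clock rate). A minimal sketch; the comparison with a 2.9 V supply is added only to show how sensitive the result is to rounding the scaled voltage:

```python
VREF = 5.0  # reference supply from the slides (V)

def relative_power(c_ratio: float, v_ratio: float, f_ratio: float) -> float:
    """P / P_ref for P = C * V^2 * f, given ratios to the reference datapath."""
    return c_ratio * v_ratio ** 2 * f_ratio

# Factors from the slides: both versions lower Vdd by 1.7x.
p_par = relative_power(2.15, 1 / 1.7, 0.5)   # parallel: half clock rate
p_pipe = relative_power(1.1, 1 / 1.7, 1.0)   # pipelined: same clock rate
print(f"parallel:  {p_par:.3f} P_ref")   # 0.372
print(f"pipelined: {p_pipe:.3f} P_ref")  # 0.381

# If the scaled supply is rounded to 2.9 V (5 / 1.7 is about 2.94 V),
# the results drop to about 0.36 and 0.37.
p_par_29 = relative_power(2.15, 2.9 / VREF, 0.5)
p_pipe_29 = relative_power(1.1, 2.9 / VREF, 1.0)
print(f"with 2.9 V: {p_par_29:.2f} and {p_pipe_29:.2f} P_ref")  # 0.36 and 0.37
```

Either way both techniques cut power to a little over a third of the reference, with parallelism slightly ahead despite its larger capacitance overhead.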
Comparisons
Logic Style and Power Consumption
• Power-delay product improves as voltage decreases
• The “best” logic style minimizes power-delay for a given delay
constraint
The concept of gating clock signals
[Figure (a)-(c): a comparator output (X < Y) used to gate the clock of a register; gated-clock scheme 1 and gated-clock scheme 2 (with a register REG); clock and gated-clock waveforms over one clock period for each scheme]
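The idea behind clock gating can be illustrated with a cycle-level model of one register: with a free-running clock the register is clocked every cycle and a feedback multiplexer reloads the old value when the enable is low, whereas with a gated clock the edge only reaches the register when the enable is high. This is a behavioral sketch only; it counts clock edges seen by the register as a rough proxy for clock power and ignores how the gating signal is generated and timed (the issue the two schemes in the figure address). The stimulus is hypothetical.

```python
def run_register(data, enable, gated):
    """Cycle-level model of one register; returns (outputs, clock_edges_seen).

    gated=False: free-running clock, feedback mux keeps q when enable is 0.
    gated=True:  the clock edge only reaches the register when enable is 1.
    """
    q = 0
    outputs = []
    clock_edges = 0
    for d, en in zip(data, enable):
        if gated:
            if en:
                clock_edges += 1
                q = d
        else:
            clock_edges += 1
            q = d if en else q
        outputs.append(q)
    return outputs, clock_edges

# Hypothetical stimulus; the enable could come from a comparator such as X < Y.
data = [5, 6, 7, 8, 9, 10]
enable = [1, 0, 0, 1, 0, 0]

ungated_out, ungated_edges = run_register(data, enable, gated=False)
gated_out, gated_edges = run_register(data, enable, gated=True)

print(ungated_out == gated_out)    # True: same behavior
print(ungated_edges, gated_edges)  # 6 2
```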
Resource Sharing Can Increase Activity
Reducing Effective Capacitance
[Figure: global bus architecture versus local bus architecture]
Shared Resources incur Switching Overhead
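A small bit-toggle count shows why sharing can raise activity: time-multiplexing two streams on one bus destroys the word-to-word correlation each stream had on its own bus. The two data streams and the 8-bit width are hypothetical, and the toggle count is only a proxy for switched energy (it ignores differences in wire capacitance between bus architectures).

```python
def bus_toggles(words, width=8):
    """Total number of bit transitions on a bus that carries `words` in order."""
    mask = (1 << width) - 1
    return sum(bin((a ^ b) & mask).count("1") for a, b in zip(words, words[1:]))

# Hypothetical, slowly varying data streams.
a = [0, 1, 2, 3]
b = [255, 254, 253, 252]

# Dedicated buses: each stream keeps its own temporal correlation.
dedicated = bus_toggles(a) + bus_toggles(b)

# One shared, time-multiplexed bus: a0, b0, a1, b1, ...
shared_sequence = [w for pair in zip(a, b) for w in pair]
shared = bus_toggles(shared_sequence)

print(dedicated, shared)  # 8 52
```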
Data representation
• Sign-extension activity significantly reduced using
sign-magnitude representation
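The effect of sign extension can be seen by counting bit transitions for a small signal that fluctuates around zero: in two's complement every sign change flips all the high-order bits, while in sign-magnitude only the sign bit and a few low-order bits change. A minimal sketch with an 8-bit word and a hypothetical sample sequence:

```python
WIDTH = 8

def twos_complement(x: int, width: int = WIDTH) -> int:
    """Two's-complement code of x in `width` bits."""
    return x & ((1 << width) - 1)

def sign_magnitude(x: int, width: int = WIDTH) -> int:
    """MSB holds the sign, the remaining bits hold |x| (assumes |x| fits)."""
    sign = 1 << (width - 1)
    return (sign | -x) if x < 0 else x

def toggles(codes):
    """Number of bit transitions between consecutive codes."""
    return sum(bin(a ^ b).count("1") for a, b in zip(codes, codes[1:]))

samples = [1, -1, 2, -2]  # hypothetical small signal around zero

tc = toggles([twos_complement(x) for x in samples])
sm = toggles([sign_magnitude(x) for x in samples])
print(tc, sm)  # 20 5
```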
Switching Activity in Adders
Switching Activity in Multipliers
Signals and Operations Reordering
• Example: complex multiplication
Trading a multiplication for an addition
[Figure: (a) original implementation of the complex multiplication (inputs Ar, Ai, Xr, Xi; outputs Yr, Yi); (b) reordered implementation using the terms Ai+Ar and Ai−Ar]
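One standard way to trade a multiplication for an addition in the complex product Y = A·X uses exactly the terms Ai+Ar and Ai−Ar; the precise arrangement of the original figure is not recoverable from the text, so this is a sketch of the technique rather than a copy of that circuit. Because Ai+Ar and Ai−Ar depend only on A, they can be precomputed when A is a fixed coefficient, leaving three multiplications and three additions instead of four multiplications and two additions.

```python
def complex_mult_direct(ar, ai, xr, xi):
    """Y = A * X with four multiplications and two additions/subtractions."""
    yr = ar * xr - ai * xi
    yi = ar * xi + ai * xr
    return yr, yi

def complex_mult_three(ar, ai, xr, xi):
    """Same product with three multiplications.

    (ai + ar) and (ai - ar) depend only on A, so for a fixed
    coefficient they can be precomputed.
    """
    t = ar * (xr + xi)
    yr = t - xi * (ai + ar)
    yi = t + xr * (ai - ar)
    return yr, yi

print(complex_mult_direct(3, 4, 5, -2))  # (23, 14)
print(complex_mult_three(3, 4, 5, -2))   # (23, 14)
```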
Module Selection
[Figure: data-flow graphs with multiplications *i, *ii, *iii and additions +i, +ii, implemented with different modules from the RTL library]
RTL library:
Module                    Area     Latency   Power
ripple adder              2744     30 ns     1199 μW
carry look-ahead adder    3959     20 ns     1467 μW
array multiplier          16185    60 ns     18540 μW
Wallace multiplier        18443    40 ns     23545 μW
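Module selection can be framed as picking, for each operation, the library module that minimizes power while meeting a latency constraint. The sketch below does this for a single multiplication feeding an addition. It assumes the Area/Latency/Power triples belong to the modules as listed in the RTL library (the assignment is inferred from the extracted figure, where faster modules are also larger), that the two operations are chained so their latencies add, and that the pair's power is the sum of the library values; the latency budgets are hypothetical.

```python
from itertools import product

# RTL library values: name -> (latency in ns, power in microwatts).
MULTIPLIERS = {
    "array multiplier": (60, 18540),
    "Wallace multiplier": (40, 23545),
}
ADDERS = {
    "ripple adder": (30, 1199),
    "carry look-ahead adder": (20, 1467),
}

def select_modules(max_latency_ns):
    """Lowest-power (multiplier, adder, power) for a multiply feeding an add, or None."""
    best = None
    for (mul, (t_mul, p_mul)), (add, (t_add, p_add)) in product(MULTIPLIERS.items(), ADDERS.items()):
        if t_mul + t_add > max_latency_ns:
            continue  # chained latency exceeds the budget
        power = p_mul + p_add
        if best is None or power < best[2]:
            best = (mul, add, power)
    return best

for budget in (90, 80, 60, 50):
    print(budget, select_modules(budget))
```

As the budget tightens, the selection moves from the slow, low-power modules toward the fast, power-hungry ones, until no combination fits.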
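Glitching activity reduction (3)
Function:
if (x < y) then
  z = c + d
else
  z = a + b
[Figure: two architectures implementing the function with a comparator (x < y), adder(s), and multiplexer(s)]
ARCHITECTURE 1 power consumption: without glitches 823.9 μW, with glitches 1650 μW
ARCHITECTURE 2 power consumption: without glitches 951.7 μW, with glitches 1357.7 μW
Architecture 1 consumes less power when glitches are ignored, but Architecture 2 consumes less once glitches are taken into account.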
Two-Level Logic Circuits Switching Activity Minimization (1)
• Taking into account the static and transition probabilities (i.e. temporal correlation) of the primary inputs, we can insert additional input signals into certain gates of the first logic level (i.e. the AND gates), resulting in reduced switching activity
• Appropriately selected input signals force the outputs of the AND gates to logic level zero for a number of combinations of the binary input signals
Two-Level Logic Circuits Switching Activity Minimization (2)
• Example: $F = x_0 x_1 + x_0 x_2 + x_0 x_3$
• Signal x3 exhibits low transition probability and high static-1 probability, while signals x0, x1, and x2 are characterized by high transition probabilities
[Figure: initial logic circuit, with AND gates g1, g2, g3 on inputs (x0, x1), (x0, x2), (x0, x3) and outputs y1, y2, y3 combined by gate g4 into F; modified logic circuit, with outputs y1', y2', y3' combined by g4 into F']
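The extracted example equation lost its operators; assuming F = x0·x1 + x0·x2 + x0·x3 (no complemented literals), one way to apply the idea is to add NOT x3 as an extra input to g1 and g2. Whenever x3 = 1 the output is already F = x0, so g1 and g2 are redundant and can be forced to 0; since x3 is mostly 1 and rarely toggles, their outputs then stay quiet. This is a hypothetical instance of the technique, not necessarily the exact modified circuit of the figure. The sketch checks equivalence over all 16 input combinations and counts AND-gate output transitions for a hypothetical stimulus in which x3 stays at 1.

```python
from itertools import product

def and_outputs(x0, x1, x2, x3, modified):
    """Outputs of the first-level AND gates g1, g2, g3.

    modified=True adds NOT x3 to g1 and g2 (hypothetical choice).
    """
    not_x3 = 1 - x3
    if modified:
        return (x0 & x1 & not_x3, x0 & x2 & not_x3, x0 & x3)
    return (x0 & x1, x0 & x2, x0 & x3)

def f(x0, x1, x2, x3, modified):
    """Second-level OR gate g4."""
    return int(any(and_outputs(x0, x1, x2, x3, modified)))

def and_toggles(vectors, modified):
    """Count output transitions of g1..g3 over a sequence of input vectors."""
    count = 0
    prev = None
    for v in vectors:
        out = and_outputs(*v, modified)
        if prev is not None:
            count += sum(a != b for a, b in zip(prev, out))
        prev = out
    return count

# Both circuits implement the same function for all 16 input combinations.
equivalent = all(f(*bits, False) == f(*bits, True) for bits in product((0, 1), repeat=4))

# Hypothetical stimulus: x0, x1, x2 count through all values while x3 stays at 1.
vectors = [(t & 1, (t >> 1) & 1, (t >> 2) & 1, 1) for t in range(8)]
print(equivalent, and_toggles(vectors, False), and_toggles(vectors, True))  # True 13 7
```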