Transcript lecture-11

11 Low Power Techniques
Contents
1. Introduction
2. Why low power?
3. Future Opportunities for Low-Power
4. How to reduce power
11.1
1. Introduction
1) Drivers for IC progress
[Figure: FPGA vs. full custom compared along the axes Delay, Power, Size, Reliability^-1, Flexibility^-1 (programmability), Design TAT, and Cost]
⇒ Silicon is the winner, and among the many silicon technologies, CMOS is the winner.
⇒ It will remain so for at least the next 25 years.
11.2
There's no show stopper (in technology)!
e.g., quantum mechanics/thermodynamics (min. switching energy, power dissipation),
electromagnetics (speed of light),
materials, etc.
Except for the multi-billion $ investment cost!
Moore's law will keep being honored.
Why?
1. No insurmountable obstacle exists.
2. People believe in it and behave accordingly.
⇒ A huge opportunity exists, but only if we do well in exploiting
1) cross-breeding, co-utilization, and co-development among interacting technologies
2) technology sharing over networks
11.3
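2) Big Picture: If power reduction is THE goal, you need to visit all areas to achieve it.
[Table: design levels (rows) against design metrics (columns); the cell contents are not recoverable]
Columns: Speed, Power, Design time, Fab. cost, Programmability
Rows: algorithm, architecture, logic, circuit, device, process, material, S/W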
11.4
Analogy: vertical engineer vs. horizontal engineer
If you want to sell a graphics chip, you need to do anything that helps achieve it, from design and application to marketing, etc.
[Figure: matrix of products (P-core, wireless, graphics, giga-bit switch, MPEG, RAMBUS) against activities (marketing, application, legal affairs (IP), manufacturing, verification, design, testing, simulation, process tuning), labelled "Horizontal engineer" and "Vertical engineer"]
11.5
2. Why low power?
1) Battery technology progresses slowly: a 5-8x improvement in 200 years
200 years ago: lead-acid battery, 25 watt-hour/kg
⇒ now: lithium polymer battery, 200 watt-hour/kg
By comparison, semiconductor technology has improved 10^6 times in 30 years (CPU speed)
and 4 times every 3 years (memory density) ⇒ still a wild, wild frontier stretching before us!
2) Heat dissipation problem:
You don't want a big cooling tower for each IC!
3) Energy saving:
minimize the amount of energy consumption, and recirculation period,
otherwise our earth will be EXHAUSTED.
4) Convenience:
too many wires around: a mess
11.6
3. Future Opportunities for Low-Power
1) PDA (Personal Digital Assistant)
telephone, pager, pen-based input, schedule keeper, audio/video entertainment, fax, video camera, data security with fingerprint and/or voice recognition, speech recognition, application S/W, teleconferencing, …
[Figure: PDA, base station (RF), and application server; function sharing for "low-power"ing the PDA]
2) Tablet (descendant of the current notebook)
11.7
3) Virtual Reality (VR) headset for games
: allows you to move around, but only if there is no wire.
: delegates complex processing to a fixed server, while performing only video decompression itself.
4) Military:
No chance for wires, and no heavy batteries, as you are too busy.
Information warfare:
1) A soldier locates an enemy tank using a laser rangefinder with GPS
2) and sends a request (for an airstrike) to the control officers;
3) an aircraft nearby gets the command.
11.8
5) Pico-cell based home network for Games
[Figure: home network. Labels: FTTH, satellite, xDSL, cable; phone & TV I/F; home automation, home cellular, A/V digital network; PDA, cellular video-phone, HDTV, VCR, game, camera, printer, temperature control, security]
Get all available service,
Allow all possible communications among home devices,
But with no messy wires.
11.9
6) Medical Uses
pace maker(implanted)
health monitor
hearing aids
7) GPS (for travellers/explorers, drivers (car, ship, boat), soldiers, …)
8) RF ID (for identifying people, animals, cars, …)
passive type: resonant LC circuits
active type (no battery, draws RF power from the RF field)
9) Smart cards:
resident ID card, cash withdrawal
encryption, COS (card OS)
11.10
4. How to reduce Power
• By all means possible: algorithm, S/W, architecture, data representation, logic & circuits, place & route, clock, process, library, material
1) algorithm:
adjusting the # of taps (N) in FIR filters by measuring noise power.
[Figure: filter transfer function for N=10 and for N=6 (low power)]
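The tap-adjustment idea can be sketched as follows. This is only an illustration: the Hamming-windowed low-pass design, the function names, the cutoff, and the noise-power threshold are assumptions, and only the tap counts N=10 and N=6 come from the slide. Each output sample costs N multiply-accumulates, so dropping from 10 to 6 taps when the measured noise is low saves work roughly in proportion.

```python
import math


def lowpass_taps(n_taps, cutoff=0.25):
    """Hamming-windowed sinc low-pass coefficients (cutoff in cycles/sample)."""
    mid = (n_taps - 1) / 2.0
    taps = []
    for k in range(n_taps):
        x = k - mid
        if x == 0:
            ideal = 2 * cutoff
        else:
            ideal = math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * k / (n_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]  # normalise to unity DC gain


def choose_num_taps(noise_power, threshold=0.01, n_low=6, n_high=10):
    """Hypothetical policy: use the long filter only when noise is high."""
    return n_high if noise_power > threshold else n_low


def fir_filter(samples, taps):
    """Direct-form FIR; each output costs len(taps) multiply-accumulates."""
    out = []
    for i in range(len(samples)):
        acc = 0.0
        for k, h in enumerate(taps):
            if i - k >= 0:
                acc += h * samples[i - k]
        out.append(acc)
    return out


if __name__ == "__main__":
    n = choose_num_taps(noise_power=0.002)
    print("taps used:", n, "MACs per sample:", n)
```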
11.11
2) Software: similar to the case of reducing code size & improving speed of execution
- instruction selection and ordering ⇒ the compiler's job, to minimize bus switching
- minimize memory space & access (reduce cache misses)
- codesign for low power
- slow down clock
- halt clock
- lower VDD
- shut down
11.12
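3) Architectures
• Parallel architecture
[Figure: a unit running at VDD and f, and n parallel units each running at VDD/n and f/n, selected through MUXes]
Switching power:
$P_1 \propto C V_{DD}^2 f$
$P_2 \propto (1+\beta)\, n \cdot C \left(\frac{V_{DD}}{n}\right)^2 \frac{f}{n} \approx \frac{P_1}{n^2}$ for the same speed ($\beta$: overhead)
$t \propto \frac{C V_{DD}}{i} \sim \frac{C V_{DD}}{(V_{DD}-V_T)^2} \sim \frac{1}{V_{DD}} \;\therefore\; f \propto V_{DD}$
Sacrifice area for low power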
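As a rough numerical check of the parallel-architecture estimate, the sketch below evaluates the two switching-power expressions. The function names, the `overhead` parameter (standing for the overhead term in P2), and the component values are illustrative assumptions; the model also inherits the slide's first-order assumption that speed scales with VDD.

```python
def switching_power(c, vdd, f):
    """Single unit: P1 = C * VDD^2 * f."""
    return c * vdd ** 2 * f


def parallel_power(c, vdd, f, n, overhead=0.0):
    """n parallel units, each at VDD/n and f/n, plus a fractional overhead."""
    return (1.0 + overhead) * n * c * (vdd / n) ** 2 * (f / n)


if __name__ == "__main__":
    c, vdd, f = 10e-12, 3.3, 100e6  # hypothetical values
    p1 = switching_power(c, vdd, f)
    for n in (1, 2, 4):
        p2 = parallel_power(c, vdd, f, n, overhead=0.1)
        print(f"n={n}: P2/P1 = {p2 / p1:.3f}")
```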
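11.13
• Pipelining
[Figure: logic running at VDD and f, and the same logic divided by latches into n stages running at VDD/n and f]
$P_1 \propto C V_{DD}^2 f$
$P_2 \propto (1+\beta)\, C \cdot \left(\frac{V_{DD}}{n}\right)^2 f \approx \frac{P_1}{n^2}$
i) If VDD is reduced to 1/n, the speed also drops to 1/n.
ii) With n pipeline stages, the logic complexity of each stage becomes 1/n, so the speed (throughput) becomes n times higher.
iii) The speed is therefore maintained. ($\beta$ is the pipelining overhead, e.g., mismatch among the stage delays, ….)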
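The three pipelining steps can be checked with a first-order model. This is a sketch under the lecture's own simplification that delay is inversely proportional to VDD; the function names, the constant `k`, and the `overhead` parameter are assumptions made for illustration.

```python
def stage_delay(logic_depth, vdd, k=1.0):
    """First-order model: delay proportional to logic depth / VDD."""
    return k * logic_depth / vdd


def pipelined_power(c, vdd, f, n, overhead=0.0):
    """n-stage pipeline at VDD/n and unchanged f, plus a fractional overhead."""
    return (1.0 + overhead) * c * (vdd / n) ** 2 * f


if __name__ == "__main__":
    depth, vdd, n = 40, 3.3, 4  # hypothetical values
    # A stage has depth/n logic but runs at VDD/n: the delay per stage is unchanged.
    print(stage_delay(depth, vdd), stage_delay(depth / n, vdd / n))
    print(pipelined_power(10e-12, vdd, 100e6, n, overhead=0.05))
```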
11.14
• Minimizing switching power consumption on the BUS: $C V^2 f$
- effective capacitance
- activity-driven bus placement
[Figure: buses ordered by decreasing α (activity) against physical capacitance (distance from core to pads). Labels: SRAM data; address bus (mostly READ operation, mostly sequential access; α: small); display data (α: large)]
priority for placing bus (route, layer)
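One way to read activity-driven placement is as a pairing problem: give the most active bus the route with the lowest capacitance. The sketch below uses made-up activity factors and route capacitances (none of these numbers are from the lecture) and compares the greedy pairing against every possible assignment.

```python
from itertools import permutations


def assign_routes(bus_activity, route_caps):
    """Pair the most active bus with the lowest-capacitance route."""
    buses = sorted(bus_activity, key=bus_activity.get, reverse=True)
    routes = sorted(route_caps, key=route_caps.get)
    return dict(zip(buses, routes))


def total_power(assignment, bus_activity, route_caps, v=1.0, f=1.0):
    """Sum of activity * C * V^2 * f over all buses."""
    return sum(bus_activity[b] * route_caps[r] * v ** 2 * f
               for b, r in assignment.items())


# Hypothetical activity factors and relative route capacitances.
bus_activity = {"display_data": 0.5, "sram_data": 0.3, "address": 0.1}
route_caps = {"near": 1.0, "mid": 2.0, "far": 4.0}

if __name__ == "__main__":
    greedy = assign_routes(bus_activity, route_caps)
    print(greedy, total_power(greedy, bus_activity, route_caps))
```

For a sum of products like this, sorting one list in descending order and the other in ascending order gives the minimum, which is why the greedy pairing matches the exhaustive search.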
11.15
• ΔV (voltage swing) reduction
[Figure labels: low V, high V, I/F, I/F, large C, small C]
- low-swing bus: $\Delta V \approx 0.1\, V_{DD}$
ex. GTL (Xerox), CTT (Mosaid), JTL (Jedec), LVTTL, LVCTT, ….
- charge-recycling bus
11.16
• BUS invert encoding:
- send inverted signals when the majority of bits are switching, and de-invert at the receiver.
[Figure: source data → EX-OR → DATA bus → received data, with polarity decision logic generating the polarity signal]
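Bus-invert encoding can be sketched in a few lines. The class and function names are illustrative assumptions, and the polarity line is modelled as a separate return value, so its own transitions are not counted.

```python
def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


class BusInvertEncoder:
    def __init__(self, width):
        self.width = width
        self.mask = (1 << width) - 1
        self.bus = 0  # value currently driven on the data lines

    def encode(self, data):
        """Return (bus_value, polarity); polarity=1 means sent inverted."""
        polarity = 0
        if 2 * hamming(data, self.bus) > self.width:  # majority would switch
            data = ~data & self.mask
            polarity = 1
        self.bus = data
        return data, polarity


def decode(bus_value, polarity, width):
    """Receiver side: de-invert when the polarity signal is set."""
    mask = (1 << width) - 1
    return (~bus_value & mask) if polarity else bus_value


if __name__ == "__main__":
    enc = BusInvertEncoder(8)
    for word in (0x00, 0xFF, 0x0F, 0xF0):
        bus, pol = enc.encode(word)
        print(f"{word:02x} -> bus {bus:02x}, polarity {pol}")
```

With this rule, no transfer toggles more than half of the data lines.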
11.17
• F (frequency) lowering:
Multiply f by N using a PLL before distribution.
[Figure: f/N master clock feeding PLLs]
11.18
4) Data representation
• Gray code vs. binary 2's (or 1's) complement
# of toggles ratio: $\frac{B_n}{G_n} = \frac{2(2^n - 1)}{2^n} \to 2$
• signed magnitude vs. 2's complement
signed magnitude: at a zero crossing, only the sign bit changes.
2's complement: at a zero crossing, full switching occurs.
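The toggle counts can be checked by brute force. The sketch below counts bit toggles over one full wrap-around cycle of an n-bit counter and compares a sign change in the two number formats; the helper names are illustrative.

```python
def gray(i):
    """Binary-reflected Gray code of i."""
    return i ^ (i >> 1)


def toggles_per_cycle(n, encode):
    """Total bit toggles over one full wrap-around count of an n-bit counter."""
    size = 1 << n
    return sum(bin(encode(i) ^ encode((i + 1) % size)).count("1")
               for i in range(size))


def sign_magnitude(x, n):
    """n-bit sign-magnitude pattern of x."""
    return ((1 << (n - 1)) | -x) if x < 0 else x


def twos_complement(x, n):
    """n-bit two's-complement pattern of x."""
    return x & ((1 << n) - 1)


if __name__ == "__main__":
    for n in (2, 4, 8):
        b = toggles_per_cycle(n, lambda i: i)
        g = toggles_per_cycle(n, gray)
        print(n, b, g, b / g)
```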
11.19
5) Logic
• Signal gating: masking unwanted switching activity so that it does not propagate forward and cause unnecessary power dissipation.
• The additional power due to control signal generation should be small.
The frequency of the control signal needs to be lower than the signal frequency.
11.20
• Logic encoding: binary vs. Gray code for counters
11.21
11.22
• State encoding
[Figure: one state transition graph under two state assignments, (M1) vs. (M2), with state codes 00, 01, 11 and transition probabilities 0.1, 0.3, 0.4]
E(M) ≡ expected # of switchings per transition
E(M1) = 2(0.3+0.4) + 1(0.1+0.1) = 1.6
E(M2) = 1(0.3+0.4+0.1) + 2(0.1) = 1.0
- assigning don't cares to either 1 or 0 for low switching
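The expected-switching figure is a probability-weighted Hamming distance, which is easy to compute for any candidate assignment. The transition graph on the slide is not recoverable from the transcript, so the three-state graph below is a hypothetical one, chosen only because it reproduces the slide's totals of 1.6 and 1.0.

```python
def expected_switching(transitions, encoding):
    """Sum over transitions of probability * Hamming distance of the codes."""
    return sum(p * bin(encoding[s] ^ encoding[t]).count("1")
               for s, t, p in transitions)


# Hypothetical graph: (from_state, to_state, probability).
transitions = [
    ("S0", "S1", 0.3),
    ("S1", "S0", 0.4),
    ("S1", "S2", 0.1),
    ("S2", "S0", 0.1),
]
m1 = {"S0": 0b00, "S1": 0b11, "S2": 0b01}  # frequent transitions flip 2 bits
m2 = {"S0": 0b00, "S1": 0b01, "S2": 0b11}  # frequent transitions flip 1 bit

if __name__ == "__main__":
    print(expected_switching(transitions, m1))
    print(expected_switching(transitions, m2))
```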
11.23
• Precomputation logic:
- saves power by masking uninfluential input signals to the combinational logic with g(x), the precomputation logic.
- I.e., for the output f(x), there may be some conditions under which f(x) is independent of some set of input signals latched in R2, which can be disabled according to g(x).
11.24
ex.) Binary comparator: f(A,B) = 1 if A > B
$g(x) = A_n \oplus B_n$ (when the most significant bits differ, they alone decide the result)
11.25
• Systematic method to derive a pre-computation function, g(x), given f(x), R1 and R2
• Let $f(p_1, \dots, p_m, x_1, \dots, x_n)$ be a Boolean function, where $p_1, \dots, p_m$ are the pre-computed inputs corresponding to R1, and $x_1, \dots, x_n$ are the gated inputs corresponding to R2.
• Let $f_{x_i}$ ($f_{\bar{x}_i}$) be the Boolean function obtained by setting $x_i = 1$ ($x_i = 0$) in f.
• Define $U_{x_i} f$ (= universal quantification of f w.r.t. $x_i$) $= f_{x_i} \cdot f_{\bar{x}_i}$
• Then $U_{x_i} f = 1$ implies f = 1 regardless of the value of $x_i$, because $U_{x_i} f = 1$ means $f_{x_i} = f_{\bar{x}_i} = 1$ in Shannon's decomposition of f w.r.t. $x_i$:
$f = x_i \cdot f_{x_i} + \bar{x}_i \cdot f_{\bar{x}_i}$
11.26
• Let $g_1 = U_{x_1} U_{x_2} \dots U_{x_n} f$.
Then $g_1 = 1$ implies that f = 1 regardless of the values of $x_1, \dots, x_n$.
I.e., $g_1 = 1$ is one of the conditions where f is independent of the input values of $x_1, \dots, x_n$.
• Similarly, let $g_0 = U_{x_1} U_{x_2} \dots U_{x_n} \bar{f}$.
Then $g_0 = 1$ implies that f = 0 regardless of $x_1, \dots, x_n$.
• Then $g = g_1 + g_0$ is the pre-computation function.
I.e., if g = 1, we can disable the loading of $x_1, \dots, x_n$ into R2 because the output f is independent of the gated inputs.
• g, computed this way, may not be the unique pre-computation function, but it contains the largest number of 1's in its truth table among all pre-computation functions.
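The derivation can be carried out by brute force on a truth table: for each assignment of the pre-computed inputs, g1 = 1 if f is 1 for every assignment of the gated inputs, and g0 = 1 if f is 0 for every one of them. The sketch below is an illustration with assumed function names; it uses a 2-bit comparator whose most significant bits are the pre-computed inputs.

```python
from itertools import product


def precomputation_function(f, m, n):
    """Return {p: (g1, g0)} for each assignment p of the m pre-computed inputs.

    f takes m pre-computed inputs followed by n gated inputs.
    """
    table = {}
    for p in product((0, 1), repeat=m):
        outputs = [f(*p, *x) for x in product((0, 1), repeat=n)]
        g1 = int(all(outputs))      # universal quantification of f
        g0 = int(not any(outputs))  # universal quantification of not-f
        table[p] = (g1, g0)
    return table


def greater_than(a1, b1, a0, b0):
    """2-bit comparator: 1 if A > B, with A = a1 a0 and B = b1 b0."""
    return int(2 * a1 + a0 > 2 * b1 + b0)


if __name__ == "__main__":
    for p, (g1, g0) in precomputation_function(greater_than, 2, 2).items():
        print(p, "g =", g1 | g0)
```

For this comparator the derived g is 1 exactly when the two most significant bits differ.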
11.27
• Examples 1)
Precomputation architecture based on Shannon's decomposition:
$f(x_1, \dots, x_n) = x_i \cdot f_{x_i} + \bar{x}_i \cdot f_{\bar{x}_i}$
11.28
• Ex 2)
Latch-based pre-computation architecture:
11.29
6) Low Power Circuits
• Use static rather than dynamic circuits to avoid unnecessary precharge
• low static power
- self reverse bias for reducing subthreshold current
- word line drivers
[Figure: circuit labels VDD, S, I1, I2, Pc (Wc), X, Pdi, with S=0 (active) and S=1 (standby); plot of ln ID vs. VGS with the active and standby points]
11.30
• Compromise between dynamic and leakage power dissipation
11.31
• Multi-VT (threshold):
speed-critical part: low VT
power-critical part: high VT
- by back-gate bias: routing is difficult
- by additional implant
• Adiabatic computing:
Power dissipation is due to the voltage drop across R ⇒ reduce it by gradual rise & fall of the inputs
⇒ multi-step clock waveform
[Figure: R and C charging circuit]
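The benefit of a multi-step waveform can be seen from the standard RC charging result: when a capacitor is charged through a resistor by a voltage step of size ΔV and allowed to settle, the resistor dissipates ½·C·ΔV², independent of R. The sketch below applies that to N equal steps; the function name and values are illustrative.

```python
def stepwise_charging_loss(c, v, steps):
    """Energy dissipated in R when C is charged to v in equal voltage steps,
    assuming the capacitor settles fully after each step."""
    dv = v / steps
    return steps * 0.5 * c * dv ** 2


if __name__ == "__main__":
    c, v = 1e-12, 3.3  # hypothetical values
    for n in (1, 2, 4, 8):
        print(n, stepwise_charging_loss(c, v, n))
```

The total loss is C·V²/(2N), so it falls in proportion to the number of steps.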
11.32
• Delay vs. power supply voltage (Td vs. VDD)
$T_d \propto V_{DD}^{-1}$
11.33
• Power delay product (energy) vs. delay for various circuits
11.34
7) Power reduction in clock network
• Why bother with the clock network?
- In a synchronous circuit, the clock is generally the highest-frequency signal.
- And the clock typically drives a large load, as it has to reach many sequential elements.
- In the Alpha chip, power consumption in the clock network is 40% of the total.
• Clock gating:
- The most popular method for power reduction of clock signals
- effective when some functional module (ALU, memory, FPU, etc.) is not required for some extended period.
- The gated clock suffers additional gate delay due to the gating function.
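A behavioural sketch of clock gating follows. It models the gate as a simple AND of clock and enable and counts the transitions delivered to the module; the function names and the waveform are assumptions, and a real design would also have to keep the enable stable while the clock is high to avoid glitches.

```python
def gated_clock(clk, enable):
    """AND-type gating of a sampled clock waveform (lists of 0/1)."""
    return [c & e for c, e in zip(clk, enable)]


def transitions(wave):
    """Number of 0->1 and 1->0 changes in a sampled waveform."""
    return sum(1 for i in range(1, len(wave)) if wave[i] != wave[i - 1])


def clock_power(c_clk, vdd, f, enabled_fraction=1.0):
    """Switching power of a clock net that toggles only while enabled."""
    return enabled_fraction * c_clk * vdd ** 2 * f


if __name__ == "__main__":
    clk = [0, 1] * 8                 # 8 clock cycles
    enable = [1] * 8 + [0] * 8       # module idle for the second half
    print(transitions(clk), transitions(gated_clock(clk, enable)))
```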
11.35
• Reduced clock swing:
- Conventional vs. half-swing clocking
11.36
- Charge sharing circuit for half-swing clock
When CLK is low, $V_H = \frac{C_1 + C_A}{C_1 + C_4 + C_A + C_B} \times V_{dd}$
When CLK is high, $V_H = \frac{C_2 + C_A}{C_2 + C_3 + C_A + C_B} \times V_{dd}$
$\therefore V_H \cong 0.5\, V_{dd}$ if $C_A = C_B \gg C_1, C_2, C_3, C_4$
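The two charge-sharing expressions have the same form, so one helper covers both clock phases. In the sketch below the parameter names `c_top` and `c_bottom` are assumptions: they stand for C1 and C4 when CLK is low, and for C2 and C3 when CLK is high. The capacitor values are illustrative.

```python
def half_swing_level(vdd, ca, cb, c_top, c_bottom):
    """V_H = (c_top + CA) / (c_top + c_bottom + CA + CB) * Vdd."""
    return (c_top + ca) / (c_top + c_bottom + ca + cb) * vdd


if __name__ == "__main__":
    vdd = 3.3
    # CA = CB much larger than the parasitic capacitors: V_H is close to Vdd/2.
    print(half_swing_level(vdd, ca=100.0, cb=100.0, c_top=2.0, c_bottom=1.0))
    # Comparable capacitors: V_H moves away from Vdd/2.
    print(half_swing_level(vdd, ca=2.0, cb=2.0, c_top=2.0, c_bottom=1.0))
```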
11.37
- Simple charge sharing circuit
11.38
• Tri-state keeper circuit:
- A floating node with its potential somewhere between GND and VDD is noise-sensitive and can cause DC power dissipation in the fanin gate
- Floating bus suppressor circuit
11.39
• Blocking gate
- Fanin gates connected to a node that is floating (as it is powered down) can experience large short-circuit current.
– Use a blocking NAND gate as below:
11.40
• Reduction of switching activity:
- guarded evaluation:
– adding latches or blocking gates before C/L if its outputs are not used.
– Ex).
11.41
- Careful bus multiplexing for positively correlated data streams
- Aggressive bus multiplexing for negatively correlated data streams
11.42
8) Process:
• VDD reduction ⇒ reduce VT
• Reduce standby current ⇒ VT not too small
(these two requirements conflict)
• Reduce leakage current ⇒ junction profile, high subthreshold swing
• Reduce switching power ⇒ reduce parasitic C (the same goal as for high speed)
retrograded channel
trench
sidewall spacer for S/D implant
11.43
9) Library:
• Small size, various sizes for transistor sizing: delay balancing to reduce glitches
• Long interconnect on a low-C layer to reduce buffer size
[Figure labels: large C, small C]
10) Material
low ε inter-layer dielectric
low ρ material for interconnect ⇒ copper
11.44