13-leakage-b - Texas A&M University

Computing with Leakage Currents
Nikhil Jayakumar, Kanupriya Gulati, Rajesh Garg and Sunil P. Khatri
ECE Department
Texas A&M University
1
Outline
 Sub-threshold circuits – the opportunity
 Challenges
 Process/temperature/voltage variations
 Energy minimization in sub-threshold circuits
 Re-claiming the speed penalty
 What’s next?
2
Introduction
 Power consumption has become a significant
hurdle for recent ICs
 Higher power consumption leads to
 Shorter battery life
 Higher on-chip temperatures – reduced operating
life of the chip
 There is a large and growing class of applications
where power reduction is paramount – not speed.
 Such applications are ideal candidates for subthreshold circuit design.
 OK, so what is sub-threshold design??
3
Sub-threshold Leakage
$$
I_{ds}^{sub} = I_o \, \frac{W}{L} \, e^{\frac{V_{gs} - V_T - V_{off}}{n v_t}} \left(1 - e^{-\frac{V_{ds}}{v_t}}\right), \quad \text{when } V_{gs} < V_T
$$

 As supply voltage scales down, the VT of the devices is
scaled down as well.
 A larger VT would reduce leakage but increase delay.
 Leakage increases exponentially with decreasing VT
 Until a few process generations ago, leakage power was
negligible compared to dynamic power
 But leakage power is now becoming comparable with
dynamic power. Ouch (three times).
 Can we turn this dilemma into an opportunity ?
4
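The exponential dependence of leakage on VT can be checked numerically. The sketch below evaluates the sub-threshold current expression from the previous slide (exponential in (Vgs - VT - Voff)/(n·vt), with a (1 - exp(-Vds/vt)) drain term). The parameter values (`i_o`, `n`, `voff`, the two threshold voltages) are illustrative assumptions, not values from the presentation, and `i_o` is held constant with temperature for simplicity.

```python
import math

BOLTZMANN = 1.380649e-23           # J/K
ELECTRON_CHARGE = 1.602176634e-19  # C


def thermal_voltage(temp_k=300.0):
    """Thermal voltage v_t = kT/q in volts."""
    return BOLTZMANN * temp_k / ELECTRON_CHARGE


def subthreshold_current(vgs, vds, vth, temp_k=300.0,
                         i_o=1e-7, w_over_l=1.0, n=1.5, voff=0.0):
    """Sub-threshold drain current; i_o, n and voff are assumed values."""
    v_t = thermal_voltage(temp_k)
    gate_term = math.exp((vgs - vth - voff) / (n * v_t))
    drain_term = 1.0 - math.exp(-vds / v_t)
    return i_o * w_over_l * gate_term * drain_term


# Leakage (Vgs = 0) for two hypothetical threshold voltages 100 mV apart.
leak_high_vt = subthreshold_current(vgs=0.0, vds=0.2, vth=0.30)
leak_low_vt = subthreshold_current(vgs=0.0, vds=0.2, vth=0.20)
leak_ratio = leak_low_vt / leak_high_vt
print(f"Lowering VT by 100 mV multiplies leakage by {leak_ratio:.1f}")
```

Under these assumed parameters, a 100 mV reduction in VT raises the leakage current by more than an order of magnitude, which is the trend the bullets above describe.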
The Opportunity
                    Traditional Ckt                  Sub-threshold Ckt (Vb = 0V)      Sub-threshold Ckt (Vb = VDD)
Process   Delay(ps)   Power(W)   P-D-P(J)    Delay ↑   Power ↓    P-D-P ↓     Delay ↑   Power ↓    P-D-P ↓
bsim70    14.157      4.08E-05   5.82E-07    17.01X    308.82X    18.50X      9.93X     141.10X    14.43X
bsim100   17.118      6.39E-05   1.08E-06    24.60X    497.54X    20.08X      12.00X    100.96X    8.20X
 Compared traditional circuit with sub-threshold (obtained by
simply setting VDD < VT)
 Performed simulations for 2 different processes on a 21 stage
ring oscillator.
 Impressive power reduction (100X – 500X)
 Power-Delay-Product (P-D-P) improves by as much as 20X
 P-D-P is an important metric to compare circuit design styles
 Delay penalty of 10X – 25X can be reduced:
 By applying forward body bias (dynamic)
 By reducing VT values (static)
5
The Opportunity
 We also performed experiments with lower VT values.
 VT can be modified with no extra cost
bsim70
VT (V)   Delay ↑   Power ↓    P-D-P ↓
0.18     16.15X    167.52X    10.41X
0.17     14.88X    151.99X    10.09X
0.16     13.78X    137.73X    9.95X
0.15     13.15X    124.59X    8.86X
0.14     12.43X    112.73X    9.40X
0.13     12.32X    101.85X    8.02X

bsim100
VT (V)   Delay ↑   Power ↓    P-D-P ↓
0.27     23.32X    479.85X    20.60X
0.25     22.43X    464.33X    20.16X
0.23     21.02X    444.23X    20.05X
0.21     18.69X    400.89X    20.27X
0.19     18.42X    366.28X    18.98X
0.17     17.51X    323.26X    17.98X
 Delays improved, while the PDP improvement remained high.
6
Sub-threshold Logic
 Advantages
$$
I_{ds}^{sub} = I_o \, \frac{W}{L} \, e^{\frac{V_{gs} - V_T - V_{off}}{n v_t}} \left(1 - e^{-\frac{V_{ds}}{v_t}}\right)
$$

 Circuits get faster at higher temperature. Hence no
need for expensive cooling techniques.
 Device transconductance is an exponential function
of Vgs which results in a high ratio of on versus off
current. Hence noise margins are near-ideal.
 Note that device is never “on”. It is just “off” or
“exponentially more off”, so to say
 Disadvantages
 Ids has an exponential dependence on temperature.
 Ids is highly dependent on process variations (such
as VT variations).
 Ids is small. This explains the delay penalty
7
Solving the Problem of Delay Sensitivity to Process, Voltage and Temperature Variations
8
Our Solution
 We propose a technique that uses self-adjusting
body-bias to phase-lock the circuit delay to a beat
clock.
 Use a network of PLAs to implement circuits.
 Several PLAs in a cluster share a common Nbulk
node.
 A representative PLA in each cluster is chosen to
phase lock the delay of the PLAs to the beat clock
 If the delay is too high, a forward body bias is
applied to speed up the PLA.
 If the delay is low, the body bias is brought back
down to zero to slow down the PLA.
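The control rule in the last two bullets can be sketched as a simple bang-bang loop. This is only a behavioural illustration, not the authors' charge-pump circuit: `pla_delay`, its exponential dependence on body bias, the 0.2 V sensitivity, the step size and the bias limit are all hypothetical choices made so the example runs.

```python
import math


def pla_delay(body_bias, base_delay, sensitivity=0.2):
    """Hypothetical delay model: forward body bias shortens the delay."""
    return base_delay * math.exp(-body_bias / sensitivity)


def lock_to_beat_clock(base_delay, beat_period,
                       steps=200, pump_step=0.01, max_bias=0.5):
    """Adjust the shared Nbulk bias once per beat-clock cycle."""
    bias = 0.0
    for _ in range(steps):
        if pla_delay(bias, base_delay) > beat_period:
            # Representative PLA is too slow: raise the forward body bias.
            bias = min(max_bias, bias + pump_step)
        else:
            # PLA is fast enough: let the bias fall back toward zero.
            bias = max(0.0, bias - pump_step)
    return bias, pla_delay(bias, base_delay)


# A slow corner (30% too slow at zero bias) and a fast corner (20% too fast).
slow_bias, slow_delay = lock_to_beat_clock(base_delay=1.3, beat_period=1.0)
fast_bias, fast_delay = lock_to_beat_clock(base_delay=0.8, beat_period=1.0)
print(slow_bias, slow_delay, fast_bias, fast_delay)
```

In this toy model the slow corner settles into a small oscillation around the bias at which the delay matches the beat period, while the fast corner stays at zero bias, since the rule described above only applies forward bias and never slows a PLA below its unbiased delay.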
9
PLA structure
 We use precharged
NOR-NOR PLAs
as the structure of
choice.
 Wordlines run
horizontally.
 Inputs (and their
complements) and
the outputs run
vertically.
 Several PLAs in a
cluster share a
common Nbulk
node.
10
The Charge Pump
11
Effectiveness of the Approach
 We simulated a single
PLA from 0°C to
100°C. We also applied
VT variations (10%)
and VDD variations
(10%).
 The light region shows
the variations on delay
over all the corners.
 The red region shows
the delays with the
self-adjusting body-bias circuit.
12
An Example Showing Phase Locking
[Figure annotations: VDD change 0.2V to 0.22V; VDD change 0.22V to 0.18V]
 This figure shows how
the body bias (and
hence the delay of the
PLA) changes with
changes in VDD.
 The adjustment is very
quick (within a few
clock cycles).
13
What about Energy Minimization?
Minimum Power does not mean Minimum Energy…
We are interested in minimum energy operation given the
application scenario envisioned.
14
What about Energy ??
 Minimizing VDD reduces power.
 But minimum VDD does not mean minimum Energy!
 There exists an optimum VDD for minimum Energy.
15
Finding the Optimum VDD
 While one level of
PLAs is Evaluating, the
others are Precharged.
 The Precharged PLAs
are consuming leakage
power.
 Hence optimum VDD
depends on logical
depth.
$$
\text{Energy} = \left(\text{PchgingEnergy}_{dyn} + \text{EvaluatingEnergy}_{dyn}\right) \cdot D + \delta \cdot \frac{D \cdot (D - 1)}{2} \cdot \left(\text{PchgedPower}_{static} + \text{EvaluatedPower}_{static}\right)
$$
16
The Optimum VDD
[Figure panels: 25°C and 100°C]
 The optimum VDD value increases with increased logical depth.
 The optimum VDD can vary with temperature (since the circuits
get faster with temperature).
 The optimum VDD can be estimated given the logical depth and
delay for each PLA.
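A toy model illustrates why an energy-optimal VDD exists and why it grows with logical depth. It is a sketch under stated assumptions, not the authors' estimation procedure: dynamic energy per PLA is taken as C·VDD², the per-PLA delay is assumed to shrink as exp(-VDD/(n·vt)), and while one level evaluates, the other `depth - 1` levels are assumed to leak. The constants `N_VT`, `cap`, `leak_ratio` and the search range are hypothetical.

```python
import math

N_VT = 1.5 * 0.0259  # assumed slope factor n times thermal voltage v_t (V)


def total_energy(vdd, depth, cap=1.0, leak_ratio=1000.0):
    """Energy (arbitrary units) of one pass through a depth-level PLA network."""
    # Switching energy of precharging and evaluating one PLA.
    dyn_per_pla = cap * vdd ** 2
    # Leakage energy of one idle PLA during one PLA delay: the delay grows
    # exponentially as VDD drops, so this term explodes at low VDD.
    leak_per_idle_slot = leak_ratio * cap * vdd ** 2 * math.exp(-vdd / N_VT)
    # While each of the `depth` levels evaluates, the other levels sit idle.
    idle_slots = depth * (depth - 1)
    return depth * dyn_per_pla + idle_slots * leak_per_idle_slot


def optimum_vdd(depth, lo=0.15, hi=0.60, step=0.001):
    """Grid search for the VDD that minimises total_energy."""
    count = int(round((hi - lo) / step)) + 1
    grid = [lo + i * step for i in range(count)]
    return min(grid, key=lambda v: total_energy(v, depth))


optima = {depth: optimum_vdd(depth) for depth in (2, 10, 20)}
for depth, vdd in optima.items():
    print(f"depth {depth:2d}: optimum VDD = {vdd:.3f} V")
```

In this model the optimum moves to higher VDD as the depth increases, because deeper networks spend more PLA-delays leaking and therefore benefit more from the shorter delay that a higher VDD buys.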
17
Reclaiming Part of the Speed Penalty
18
Micropipelining
Handshaking Logic
 For high-speed operation, a network of
PLAs can be implemented as an
Asynchronous Micropipeline.
 P1 triggers a precharge event
 P2 triggers an evaluate event
 Latency increases, but throughput
improves dramatically.
19
Micropipelining Results
                         Delay (ns)                                   Area (μm²)
Ckt     Non-μpipelined  μ-pipelined  Improvement    Non-μpipelined  μ-pipelined  Improvement
C432    2665            475          0.18           7392            10080        1.36
C499    2665            475          0.18           9408            12096        1.29
alu4    3340            475          0.14           9408            12768        1.36
count   1315            475          0.36           3360            4032         1.20
rot     3565            475          0.13           12768           21504        1.68
apex6   2890            475          0.16           16128           24192        1.50
C1908   4465            475          0.11           16128           24864        1.54
c2670   4015            475          0.12           22848           31584        1.38
c1355   3790            475          0.13           14112           20832        1.48
c3540   8290            475          0.06           45024           75936        1.69
c880    2665            475          0.18           10752           14112        1.31
pair    5140            475          0.09           43680           67200        1.54
Avg                                  0.1533                                      1.4444
 We get an average speedup of 7X over a nonmicropipelined design.
 After this, sub-threshold circuits are slower by a factor of
1.5X – 3.5X compared to their traditional (non-micropipelined)
counterparts.
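The delay figures in the results table can be cross-checked with a few lines of arithmetic. The sketch below takes the constant 475 ns entry as the micropipelined delay and the per-circuit values as the non-micropipelined delays, and treats the "Improvement" column as their ratio; that reading of the table is an assumption, and the variable names are illustrative.

```python
NON_PIPELINED_DELAY_NS = {
    "C432": 2665, "C499": 2665, "alu4": 3340, "count": 1315,
    "rot": 3565, "apex6": 2890, "C1908": 4465, "c2670": 4015,
    "c1355": 3790, "c3540": 8290, "c880": 2665, "pair": 5140,
}
PIPELINED_DELAY_NS = 475  # same for every circuit in the table

# Ratio of micropipelined to non-micropipelined delay for each circuit.
ratios = {name: PIPELINED_DELAY_NS / delay
          for name, delay in NON_PIPELINED_DELAY_NS.items()}
avg_ratio = sum(ratios.values()) / len(ratios)
speedup_from_avg_ratio = 1.0 / avg_ratio

print(f"average delay ratio: {avg_ratio:.4f}")
print(f"speedup implied by the average ratio: {speedup_from_avg_ratio:.2f}X")
```

The average ratio computed this way agrees with the tabulated 0.1533 to within rounding, and its reciprocal is roughly in line with the quoted average speedup.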
20
Layout of the PLA
 Each PLA has 16 inputs, 14 outputs and 24 rows (cubes).
21
Ambient Light Powered ICs
 The approach lends itself to being powered by
energy scavenged from ambient light
 Early studies show that this is feasible
 New Cadmium Sulfide/Cadmium Telluride solar
panels achieve 0.09W/cm2. (Silicon panels produce
0.015 W/cm2)
 Estimated power consumption for a subthreshold
processor of this size is about 10mW.
 So the CdS/CdTe panel could power our processor
with a 9X safety margin
 Challenges include how to store energy (battery?
Supercapacitors? MIM capacitors?).
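The safety-margin figure follows from a one-line power budget. The sketch below assumes a panel area of 1 cm², which the slide does not state but which reproduces the quoted 9X margin; `safety_margin` is a hypothetical helper name.

```python
import math


def safety_margin(panel_w_per_cm2, panel_area_cm2, load_w):
    """Ratio of harvested panel power to the power the load consumes."""
    return panel_w_per_cm2 * panel_area_cm2 / load_w


PROCESSOR_POWER_W = 0.010  # about 10 mW, from the estimate above
ASSUMED_AREA_CM2 = 1.0     # assumption: not given in the presentation

cdte_margin = safety_margin(0.09, ASSUMED_AREA_CM2, PROCESSOR_POWER_W)
silicon_margin = safety_margin(0.015, ASSUMED_AREA_CM2, PROCESSOR_POWER_W)
print(f"CdS/CdTe margin: {cdte_margin:.1f}X, silicon margin: {silicon_margin:.1f}X")
```

Under the same area assumption, a silicon panel would leave a much thinner margin than the CdS/CdTe panel.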
22
What next?
 Explore extensions to structured ASIC approaches
 Fabrication of a subthreshold design (in 2006)
 Mixed-signal – with small processor and
transceiver on a single die.
 Set up a small hardware lab for debug/diagnosis
 Validate the experiments we discussed
 Hope to use this test-chip to validate other ideas as
well.
 Develop a design methodology for subthreshold electronics, tuned for widespread use.
23
Summary
 Sub-threshold circuit design is promising due to
extreme low power.
 The delay phase locking approach helps sub-threshold
logic design overcome the hurdle of sensitivity to PVT
variations.
 This can help achieve a significant yield improvement.
 The study on optimum VDD for minimum Energy helps
to fix an optimum VDD for a given logical depth.
 Micro-pipelining helps bridge the delay gap.
 Sub-threshold design approaches are appealing for a
widening class of low power or energy applications.
 Goal : Help bring sub-threshold logic design into the
mainstream of VLSI technology.
24
Thank you!!
25