power modeling and leakage reduction
Leakage Modeling and
Reduction
Amit Agarwal, Lei He et al.
Presenters: Qun Gu
Ho-Yan Wong
Courtesy of Lei He
Outline
Introduction
Circuit level leakage reduction
System level leakage reduction
Coupled leakage and thermal simulation
and management
Power Trends
Circuit Power
Dynamic power: determined by circuit
performance requirements, among other
factors. Its share of total power is
getting smaller.
Short-circuit power: the pull-up (PU) and
pull-down (PD) networks both partially
conduct during switching. Small share
(<10%).
Leakage power: increasingly important. It
depends on many factors, such as device
geometry, temperature, doping,
processing, and data pattern, which makes
it complicated and worth further study.
Leakage Power Sources
Subthreshold leakage
Gate leakage
Reverse-biased junction band-to-band tunneling (BTBT) leakage
[Figure: MOSFET cross-section (gate, source, drain, n+ regions, bulk) marking the three leakage paths]
Leakage Dependences
Circuit Techniques to Reduce Leakage
Leakage Reduction Techniques
Design Time Techniques
• Dual threshold CMOS
Run Time Techniques
• Standby
    • Natural Transistor Stacks
    • Sleep Transistor (MTCMOS)
    • Forward/Reverse Body Biasing (VTCMOS)
• Active
    • Dynamic Vth Scaling (DVTS)
Dual Threshold CMOS
Ways to adjust Vth in fabrication:
• Adjust tox (a thicker oxide gives a higher Vth)
How?
• Low Vth for critical path
• High Vth for non-critical path
Concerns:
• The assignment is not straightforward. Sometimes there is a tradeoff
between using high Vth and low Vth devices.
• Varying Vth is not always feasible at low supply voltages.
• Increasing the number of critical paths will sometimes hurt
circuit performance.
Natural Transistor Stacks
How?
• Reduce the leakage by stacking the devices.
Concerns:
• Tradeoff between speed and
power
• Leakage depends on the data pattern
• Tradeoff with other leakage
components (gate leakage)
Sleep Transistor (MTCMOS)
How?
Insert an extra series-connected transistor
(a sleep transistor with high Vth) in the PU/PD
path of a gate and turn it off in standby
mode.
Disadvantages:
• Increases area and delay
• Data retention problem
• Hard to turn on completely at very low
supply voltages
Improvements for MTCMOS: VRC
Virtual power/ground Rails Clamp
(VRC)
• Solves the data retention problem
with diodes
• Virtual rail level changes are clamped
• Allows data to be retained in SRAM
arrays
Alternative: Super Cutoff CMOS (SCCMOS),
with low Vth
• In standby mode, the PMOS gate is driven to
Vcc + 0.4 V and the NMOS gate to Vss - 0.4 V
to fully cut off leakage.
Forward/Reverse Body Biasing (VTCMOS)
RBB (Reverse Body Bias): zero
body bias in active mode, a deep
reverse bias in standby mode.
Disadvantages:
• Increases PN junction reverse
leakage
• Technology scaling worsens
short-channel effects and weakens
the Vth modulation capability
FBB (Forward Body Bias): high Vth in
standby mode, forward body biasing to
achieve better current drive in active mode.
Disadvantages:
• Larger junction capacitance
• High body effect for stack devices
Technology improvement for high Vth:
• Different doping profile
• Higher work function materials
Dynamic Vth Scaling (DVTS)
How?
• When the critical path replica frequency is less than the reference CLK,
adjust the bias to decrease Vth.
• Otherwise adjust the bias to increase Vth.
Results:
• The lowest Vth is delivered (NBB-no body bias) if the highest
performance is required.
• When the performance demand is low, clock frequency is lowered
and Vth is raised via RBB to reduce the run time leakage power
dissipation.
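The control decision above can be sketched as a single update step. This is a minimal illustration, not the circuit from the slides: the function name, the fixed bias step, and the bias limits are assumptions, and the reverse bias is modeled as a non-negative voltage where 0 V stands for NBB.

```python
def dvts_step(replica_freq_hz, ref_clk_hz, reverse_bias_v,
              step_v=0.05, max_reverse_bias_v=1.0):
    """Return the next reverse body bias; 0.0 V means no body bias (NBB).

    step_v and max_reverse_bias_v are illustrative values, not from the slides.
    """
    if replica_freq_hz < ref_clk_hz:
        # Replica is too slow: reduce the reverse bias, which lowers Vth.
        reverse_bias_v -= step_v
    else:
        # Replica meets the clock: add reverse bias, which raises Vth
        # and cuts leakage.
        reverse_bias_v += step_v
    # Keep the bias between NBB and the assumed maximum reverse bias.
    return min(max(reverse_bias_v, 0.0), max_reverse_bias_v)
```

Calling this once per control interval makes the bias settle near the point where the replica just meets the reference clock, which is the behavior the Results bullets describe.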
Process Variation and Leakage
Variation Sources:
• Channel length
• Transistor width
• Oxide thickness
• Flat-band voltage
• Random dopant effect
A larger leakage spread affects:
• Robustness of logic
circuits
• Circuit design margin
[Figure: measured IDSAT and IOFF variation (150nm process)]
Circuit Techniques for Compensating Process Variation:
• Adaptive body biasing for process compensation
• Process variation compensation in dynamic circuits
Adaptive Body Biasing for Process
Compensation
Due to the worsening parameter fluctuation:
• Some dies may not meet the target frequency.
• Others exceed the leakage power constraints.
How?
• Slow dies that fail to meet the desired frequency can be forward
body biased to improve performance, at the cost of more leakage power.
• Dies with excess leakage can be reverse body biased to
meet the leakage power specification.
Effects:
Adaptive body biasing reduces the spread of the die frequency distribution
by 7X compared with conventional zero body bias.
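The per-die decision can be sketched as follows. The function name, the string labels, and the rule that a die which is both slow and leaky is treated as slow first are assumptions made for illustration; the slides do not say how that case is handled.

```python
def choose_body_bias(die_freq_hz, die_leakage_w, target_freq_hz, leakage_limit_w):
    """Pick a body-bias direction for one die (illustrative policy only)."""
    if die_freq_hz < target_freq_hz:
        # Too slow: forward body bias trades leakage for speed.
        return "FBB"
    if die_leakage_w > leakage_limit_w:
        # Fast enough but too leaky: reverse body bias cuts leakage.
        return "RBB"
    # Meets both constraints: leave at zero body bias.
    return "ZBB"
```

A real scheme would choose a bias magnitude as well as a direction; this sketch only shows the classification step.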
Process Variation Compensation in Dynamic Circuits (I)
Dynamic circuits need keepers to compensate for leakage current and hold
data.
Considerations for keeper size:
• An unnecessarily large keeper hurts circuit performance
• Dies with excess leakage cannot meet the robustness requirements
without a large enough keeper.
Programmable
keeper size scheme:
The desired effective keeper
width can be chosen
from {0, W, 2W, …, 7W}
according to the control
bits.
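One way to obtain the eight widths is three binary-weighted keeper legs (W, 2W, 4W) enabled by a 3-bit control word. That structure is an assumption made for this sketch; the slides only give the set of selectable widths.

```python
def effective_keeper_width(control_bits, unit_width):
    """Effective keeper width for a 3-bit control word (0..7).

    Assumes binary-weighted legs of width W, 2W and 4W; this leg
    arrangement is hypothetical, not taken from the slides.
    """
    if not 0 <= control_bits <= 7:
        raise ValueError("control_bits must fit in 3 bits")
    leg_weights = (1, 2, 4)
    enabled = sum(weight for bit, weight in enumerate(leg_weights)
                  if (control_bits >> bit) & 1)
    return enabled * unit_width
```

For example, `effective_keeper_width(0b101, W)` enables the W and 4W legs for an effective width of 5W.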
Process Variation Compensation in Dynamic Circuits (II)
Simulation Results:
• 5X reduction in the number of robustness failing dies and 10%
improvement in average performance.
• The variation spread of the robustness and delay distributions is reduced
by 55% and 35%, respectively
System Level Leakage Reduction
Motivation
Leakage characteristics and reduction
Coupled leakage and thermal simulation and management
• Power and thermal simulation
• Dynamic power and thermal management
• Vdd scaling with cooling selection
Motivation
Leakage current has increased due to
scaling in Vt, L, and tox
Leakage power becomes more important
due to high leakage devices and low
activity rates
Leakage power depends greatly on
temperature
Power States at System Level
3 Power states defined at system level:
1. Active Mode – circuit in operation;
P= Pd + Ps
2. Standby Mode – circuit is idle but ready
to execute; P= Ps
3. Inactive Mode – circuit is deactivated by
leakage reduction techniques; P < Ps
System Level Leakage Power
Modeling
Early model:
Ps = Vdd * NFET * kdesign * Ileakage
Later model, with two leakage power
reduction techniques applied (covered later):
Ps = Vdd * Ngate * Iavg
Leakage Power Characteristics
Minimum Idle Time (M.I.T)
M.I.T. = {Es-i + Ei-s – Pi * (ts-i + ti-s)} / (Ps – Pi)
Idle Period
Leakage power reduction is useful only
when Idle Period > M.I.T.
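The M.I.T. formula is the break-even point between staying in standby and switching to the inactive state. The sketch below evaluates it, reading Es-i and Ei-s as the transition energies (standby to inactive and back) and ts-i and ti-s as the transition times; that reading of the symbols and the argument names are assumptions.

```python
def minimum_idle_time(e_si, e_is, t_si, t_is, p_standby, p_inactive):
    """Minimum idle time for which entering the inactive state saves energy.

    Energies in joules, times in seconds, powers in watts.
    """
    if p_standby <= p_inactive:
        raise ValueError("inactive power must be below standby power")
    return (e_si + e_is - p_inactive * (t_si + t_is)) / (p_standby - p_inactive)
```

At an idle period equal to the returned value, the standby energy Ps * t matches the energy of the round trip, Es-i + Ei-s + Pi * (t - ts-i - ti-s); for any longer idle period the inactive state wins.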
Runtime Leakage Reduction for
Caches
Caches dissipate a large amount of leakage
power because of their large SRAM arrays
Different techniques have been developed to
reduce L1 cache Ps, e.g. DRI, SWAY
The basic principle is to dynamically turn off
part of the cache array
Ps Reduction for L2 Caches
The L2 cache has a much larger miss penalty, so
the L1 approaches cannot be applied directly
Use VRC to reduce Ps, and use time-out
based control mechanisms to shut down
the L2 cache data portion
The time-out threshold can be fixed (FTO),
dynamic, or set by feedback control (FCTO)
Ps Reduction for L2 Caches cont’d
FTO
Time out threshold is set as M.I.T.
FCTO
Adjust the time-out threshold with a proportional-integral (PI) feedback controller
Update the time-out threshold according to
N: L2 cache miss rate in the previous time window
Told: time-out threshold in the previous time window
New time-out threshold T = Told + (N – Setpoint) *
Gain
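The update rule can be written directly as a function. The lower clamp `t_min` is an added assumption to keep the threshold from going negative; the slides do not mention one.

```python
def update_timeout_threshold(t_old, miss_rate, setpoint, gain, t_min=0.0):
    """One FCTO update: T = Told + (N - Setpoint) * Gain.

    t_min is a hypothetical lower bound, not part of the slide formula.
    """
    t_new = t_old + (miss_rate - setpoint) * gain
    return max(t_min, t_new)
```

When the miss rate in the last window exceeds the setpoint, the threshold grows and the data portion is shut down less eagerly; when it is below the setpoint, the threshold shrinks.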
Circuits for FCTO
[Figure: FCTO circuit. The request address (tag, index, block offset) is checked for a tag match against the tag portion. The hit/miss result feeds the timeout controller (counter and comparator), which sends wakeup/shutdown signals to the data portion. The threshold controller combines Nmiss, the setpoint, and the gain (subtract, multiply, add) and stores the result in the threshold register, whose output sets the time-out threshold.]
Comparison of L2 Leakage Reduction
Time-out control (FTO and FCTO) achieves a much smaller
performance penalty than SWAY and DRI
Targeting a 1% performance loss, FCTO obtains more
power reduction than FTO does.

Benchmark   Power reduction (%)             Performance penalty (%)
            FTO     FCTO    SWAY    DRI     FTO    FCTO   SWAY   DRI
go          52.21   63.80   57.55   56.79   1.06   1.10   9.95   7.39
li          12.92   27.87   26.64   26.56   0.93   1.07   7.28   7.71
equake      35.75   48.61   46.40   45.71   0.84   1.01   9.73   10.58
art         0.07    2.20    2.17    2.18    0.37   0.92   3.18   3.14
System Level Leakage Reduction
Motivation
Leakage characteristics and reduction
Coupled leakage and thermal simulation and management
• Power and thermal simulation
• Dynamic power and thermal management
• Vdd scaling with cooling selection
Temperature Aware Computing
[Figure: temperature-aware simulation flow, with the following blocks]
• Workload (e.g. SPEC 2k), uArch, floorplan, packaging
• Initial conditions (T, delay)
• Performance simulator (e.g. SimpleScalar, IMPACT)
• Dynamic power estimation (e.g. Wattch)
• Leakage estimation
• Coupled power and thermal simulator (e.g. PTscalar, PowerImpact)
• Adjusted conditions (T, delay)
• Temperature-aware architecture techniques (DVS, DTM, reconfigurability, power model, GALS, etc.)
Leakage Model with Temperature
Scaling
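Exponential scaling based on BSIM3v3
Logic circuits in ITRS 100nm technology:
$$P_s = N_{gate} \, V_{dd} \, I_{avg}(T_0, V_{dd0}) \, T^2 \exp\left(\frac{1986.13\,V_{dd} - 4396.09}{T}\right)$$
Memory units in ITRS 100nm technology:
$$P_l(T, V_{dd}) = \left(5.30 \times 10^{-10}\,\mathit{words} + 1.72 \times 10^{-9}\,\mathit{wordsize}\right) V_{dd} \, T^2 \exp\left(\frac{1986.13\,V_{dd} - 4396.09}{T}\right)$$
$$P_c(T, V_{dd}) = 5.29 \times 10^{-10}\,\mathit{words} \cdot \mathit{wordsize} \cdot V_{dd} \, T^2 \exp\left(\frac{711.92\,V_{dd} - 3725.53}{T}\right)$$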
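To get a feel for the temperature term of the logic-circuit model, the sketch below evaluates only the factor T^2 * exp((a * Vdd + b) / T) and compares two temperatures. It rests on two assumptions that the slide text does not state outright: T is in kelvin, and the exponent coefficients have the signs a = +1986.13 and b = -4396.09, so that leakage rises with both temperature and Vdd. The function names are illustrative.

```python
import math

def leakage_temperature_factor(temp_k, vdd, a=1986.13, b=-4396.09):
    """Temperature- and Vdd-dependent factor T^2 * exp((a*Vdd + b) / T).

    The signs of a and b are assumed, chosen so leakage grows with T.
    """
    return temp_k ** 2 * math.exp((a * vdd + b) / temp_k)

def leakage_ratio(temp_hot_c, temp_cold_c, vdd):
    """Ratio of logic leakage at two Celsius temperatures, same Vdd."""
    hot_k = temp_hot_c + 273.15
    cold_k = temp_cold_c + 273.15
    return (leakage_temperature_factor(hot_k, vdd)
            / leakage_temperature_factor(cold_k, vdd))

ratio_110_vs_80 = leakage_ratio(110.0, 80.0, 1.2)
```

Under these assumptions the ratio between 110 °C and 80 °C at 1.2 V comes out somewhat below 2, which is in line with the "up to 2X" difference reported for the processor experiments later in the deck.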
Delay with Vdd and Temperature Scaling
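Based on the SPICE level 1 model, the transistor
saturation current Isat is proportional to
$$I_{sat} \propto \frac{(V_{dd} - V_t)^{1.2}}{T^{1.19}}$$
We obtain
$$\mathrm{delay}(V_{dd}, T) \propto \frac{V_{dd}}{I_{sat}} \propto \frac{V_{dd} \, T^{1.19}}{(V_{dd} - V_t)^{1.2}}$$
[Figure: normalized gate delay (75% to 100%) versus Vdd (1 to 1.3 V) at T = 60 °C, 80 °C, and 100 °C, ITRS 100nm technology]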
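The delay scaling can be explored numerically with a short sketch. It assumes the form delay ∝ Vdd * T^1.19 / (Vdd - Vt)^1.2 with T in kelvin, and it uses a threshold voltage of 0.3 V, which is a placeholder chosen for illustration since the slides do not give Vt. Delay is normalized to Vdd = 1.0 V at 100 °C, the slowest corner of the plotted range.

```python
def relative_delay(vdd, temp_k, vt=0.3, alpha=1.2, mu=1.19):
    """Unnormalized gate delay, Vdd * T^mu / (Vdd - Vt)^alpha.

    vt = 0.3 V is a hypothetical value, not from the slides.
    """
    return vdd * temp_k ** mu / (vdd - vt) ** alpha

def normalized_delay(vdd, temp_c, ref_vdd=1.0, ref_temp_c=100.0, vt=0.3):
    """Delay relative to the reference corner (1.0 V, 100 degrees C by default)."""
    return (relative_delay(vdd, temp_c + 273.15, vt)
            / relative_delay(ref_vdd, ref_temp_c + 273.15, vt))
```

With these placeholder values, raising Vdd or lowering the temperature both shorten the normalized delay, matching the trend of the three curves in the figure.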
Thermal Modeling
For the lumped RC thermal circuit:
Thermal resistance Rth: the ability to remove heat to the ambient in
steady-state conditions
Thermal capacitance Cth: captures the delay between a change in
power and the corresponding change in temperature
Thermal time constant τ = Rth * Cth
A distributed model is needed for an accurate solution
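For a single lumped RC node driven by a constant power step, the temperature follows the standard first-order response T(t) = Tamb + P * Rth * (1 - exp(-t / τ)). The sketch below evaluates it; the function and argument names are illustrative, and it deliberately ignores the temperature dependence of leakage.

```python
import math

def lumped_temperature(t_s, power_w, r_th, c_th, t_ambient_c):
    """Temperature of one lumped RC thermal node after a power step at t = 0.

    r_th in degrees C per watt, c_th in joules per degree C.
    """
    tau = r_th * c_th  # thermal time constant
    return t_ambient_c + power_w * r_th * (1.0 - math.exp(-t_s / tau))
```

After one time constant the node has covered about 63% of its final rise of P * Rth, which is why Rth sets the steady state and Cth only sets how quickly it is reached.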
Coupled Power and Thermal
Simulation
A simulation time step ts < 0.5% of the thermal time
constant (~10^6 cycles) gives negligible
temperature and power calculation errors
Clock gating reduces dynamic power and
also leakage energy
Leakage energy changes with operating
temperature
Leakage Power at Different Temperature
[Figure: normalized total power (0% to 100%), split into dynamic power and leakage power, for benchmarks art and gcc at temperature settings 35, 85, and 110 °C and "Dep"; 100nm, 3.33GHz, 1.2V]
• uP similar to DEC Alpha 21264, with clock gating
• Leakage differs by up to 2X between 80 °C and 110 °C
• Leakage also differs between applications
• Coupled thermal and power simulation is a must
Thermal Runaway
Thermal runaway is caused by the positive
feedback loop between on-resistance,
temperature, and power
It is also a result of the interaction between
leakage power and temperature
Component temperature ↑ → leakage power ↑
exponentially → temperature ↑
If cooling is not adequate, both keep increasing
Thermal Runaway cont’d
Assuming no throttling
and constant power
consumption, the
condition for thermal
runaway is equivalent
to d²T/dt² > 0
The lowest temperature
that meets this criterion is
the runaway temperature
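The runaway criterion can be illustrated with a small coupled simulation of one lumped thermal node. This is a sketch under stated assumptions, not the simulator used in the slides: leakage is modeled as a simple exponential in temperature (P_leak = P_ref * exp(k * (T - Tamb))), dynamic power is constant, integration is forward Euler, and every numeric default is a made-up illustrative value.

```python
import math

def find_runaway_temperature(r_th, c_th=10.0, t_ambient_c=45.0,
                             p_dynamic_w=40.0, p_leak_ref_w=10.0,
                             k_per_c=0.03, dt_s=0.01, steps=6000):
    """Return the first temperature where d2T/dt2 > 0 while heating, else None."""
    temp_c = t_ambient_c
    for _ in range(steps):
        # Assumed leakage model: exponential in temperature.
        p_leak = p_leak_ref_w * math.exp(k_per_c * (temp_c - t_ambient_c))
        # Heat balance of the lumped RC node gives dT/dt.
        rate = (p_dynamic_w + p_leak - (temp_c - t_ambient_c) / r_th) / c_th
        # Differentiating once more: d2T/dt2 = (dP_leak/dT - 1/Rth) * dT/dt / Cth.
        accel = (k_per_c * p_leak - 1.0 / r_th) * rate / c_th
        if rate > 0.0 and accel > 0.0:
            return temp_c  # heating is accelerating: runaway
        temp_c += rate * dt_s
    return None  # temperature settled without accelerating
```

With the illustrative defaults, a thermal resistance of 0.3 lets the node settle, while 0.8 crosses into runaway once the leakage slope dP_leak/dT exceeds 1/Rth. That crossing point is the runaway temperature in the sense used above.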
Dynamic Power and Thermal
Management (DPTM)
Goal: Maximize throughput subject to maximum
on-chip temperature constraint
For each time window of X cycles, stop or
throttle instruction fetch in δ·X cycles,
0 <= δ <= 1
Feedback controller (proportional-integral) to
adjust δ:
For each time window, update δ according to
• Current maximum on-chip temperature
• δ in previous time window
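A per-window δ update can be sketched with an incremental (velocity-form) PI controller, which needs only the current maximum temperature, the previous error, and the previous δ. The class name, the gains, and the reading of δ as the fraction of the window in which fetch is throttled are assumptions; the slides do not give the controller equations.

```python
class FetchThrottleController:
    """Incremental PI controller for the fetch-throttling fraction delta."""

    def __init__(self, temp_limit_c, kp, ki):
        self.temp_limit_c = temp_limit_c
        self.kp = kp  # proportional gain (illustrative)
        self.ki = ki  # integral gain per time window (illustrative)
        self.delta = 0.0
        self.prev_error = 0.0

    def update(self, max_temp_c):
        """Update delta once per time window from the hottest on-chip reading."""
        error = max_temp_c - self.temp_limit_c
        # Velocity form: change = Kp * (e_k - e_{k-1}) + Ki * e_k.
        change = self.kp * (error - self.prev_error) + self.ki * error
        # Clamp to the valid range 0 <= delta <= 1.
        self.delta = min(1.0, max(0.0, self.delta + change))
        self.prev_error = error
        return self.delta
```

Because δ itself is the stored state and is clamped on every update, the controller cannot wind up beyond the range it can actually apply.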
Dynamic Power and Thermal
Management (DPTM)
Fetch toggling toggles I-cache, I-TLB,
branch prediction and decode units
Dynamic frequency scaling (DFS) and
Dynamic Voltage Scaling (DVS) adjust the
clock freq and Vdd stall
Activity migration moves activity to
another copy of the component at a lower
temperature
Need for Temperature Dependent
Leakage Model
Dynamic thermal
management using
fetch toggling with PI
feedback controller
Implemented 2
models: simple (fixed
Ps) and accurate (Ps
is temp. dependent)
Validation of PI-based DPTM
Compared with two practices:
• No dynamic management: lower Vdd to avoid thermal violations
• Cooling down: on reaching the thermal threshold, stop the
whole processor until the maximum
temperature is X °C lower than the threshold
(X = 5 in our experiments)
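The cooling-down baseline is a hysteresis rule, sketched below as a per-sample state update. The function and argument names are illustrative; the default margin of 5 corresponds to X = 5.

```python
def cooling_down_step(max_temp_c, stopped, threshold_c, margin_c=5.0):
    """Return True if the processor should be stopped for the next interval."""
    if stopped:
        # Stay stopped until the temperature is margin_c below the threshold.
        return max_temp_c > threshold_c - margin_c
    # Running: stop as soon as the thermal threshold is reached.
    return max_temp_c >= threshold_c
```

Feeding it the temperature sequence 78, 80, 79, 76, 75 with a threshold of 80 keeps the processor running, stops it at 80, holds it stopped at 79 and 76, and releases it at 75.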
System Performance
[Figure: throughput (BIPS, 2.0 to 5.5) versus Vdd (1 to 1.3 V) for feedback control (max T = 80 °C), no management (max T = 110 °C), and simple cooling down (max T = 80 °C), with the maximum throughput marked]
DPTM by feedback control may improve throughput
by up to 11% compared with the no-DPTM case
DPTM allows designing for the common workload rather than
the worst case => thermal speculation
Active Cooling
Direct water-spray cooling
Thermal resistance of 0.067, compared with 0.8 for a
conventional heatsink
Microchannel with liquid coolant, …
Impacts of Water Cooling
[Figure: throughput (BIPS, 0 to 7) and power efficiency (BIPS/W, 0 to 0.4) versus Vdd (1 to 1.8 V) for water cooling (max T = 60 °C) and air cooling (max T = 80 °C)]
Water cooling increases the maximum throughput by 30%
It improves power efficiency by 9% and slows
down the decay of power efficiency
References
Amit Agarwal et al., “Leakage Mechanisms
and Leakage Control for Nano-Scale
CMOS Circuits”, Purdue University.
Lei He et al., “System Level Leakage
Reduction Considering the
Interdependence of Temperature and
Leakage”, UCLA.