power modeling and leakage reduction


Leakage Modeling and
Reduction
Amit Agarwal, Lei He et al.
Presenters: Qun Gu
Ho-Yan Wong
Courtesy of Lei He
Outline
• Introduction
• Circuit level leakage reduction
• System level leakage reduction
• Coupled leakage and thermal simulation and management

Power Trends
Circuit Power



• Dynamic Power: determined by circuit performance requirements, among other factors. Its share of total power is getting smaller.
• Short-Circuit Power: dissipated while the pull-up (PU) and pull-down (PD) circuits both partially conduct. A small share (<10%).
• Leakage Power: increasingly important. It depends on many factors, such as device geometry, temperature, doping, processing, and data pattern, which makes it complicated and worth further study.
Leakage Power Sources
• Subthreshold leakage
• Gate leakage
• Reverse-biased junction band-to-band tunneling (BTBT) leakage
[Figure: MOSFET cross-section (gate, source, drain, n+ regions, bulk) marking the subthreshold, gate, and reverse-biased junction BTBT leakage paths]
Leakage Dependences
Circuit Techniques to Reduce Leakage
• Design Time Techniques
  • Dual threshold CMOS
• Run Time Techniques
  • Standby Leakage Reduction Techniques
    • Natural Transistor Stacks
    • Sleep Transistor (MTCMOS)
    • Forward/Reverse Body Biasing (VTCMOS)
  • Active Leakage Reduction Techniques
    • Dynamic Vth Scaling (DVTS)
Dual Threshold CMOS
Approaches to adjusting Vth in fabrication:
• Adjustment of tox (the higher tox, the higher Vth)
How?
• Low Vth for critical path
• High Vth for non-critical path
Concerns:
• This is not straightforward. Sometimes there is a trade-off between high-Vth and low-Vth assignment.
• Vth adjustment does not always succeed at low supply voltages.
• Increasing the number of critical paths will sometimes hurt circuit performance.
Natural Transistor Stacks
How?
• Reduce the leakage by stacking the devices.
Concerns:
• Trade-off between speed and power
• Data-pattern dependent
• Trade-off with other leakage power (gate leakage)
Sleep Transistor (MTCMOS)
How?
Insert an extra series-connected transistor (a sleep transistor with high Vth) in the PU/PD path of a gate and turn it off in the standby mode of operation.
Disadvantages:
• Increase area and delay
• Data retention problem
• Hard to turn on completely at very low
supply voltages
Improvements for MTCMOS - VRC
Virtual power/ground Rails Clamp (VRC)
• Solves the data retention problem with diodes
• Virtual level changes are clamped
• Allows data to be retained in SRAM arrays
Alternatives: Super cutoff CMOS (SCCMOS), with low Vth
• In standby mode, the PMOS gate is at Vcc + 0.4 V and the NMOS gate is at Vss - 0.4 V to fully cut off leakage.
Forward/Reverse Body Biasing (VTCMOS)
RBB (Reverse Body Bias): zero
body bias in active mode, a deep
reverse bias in standby mode.
Disadvantages:
• Increases PN junction reverse leakage
• Technology scaling worsens short channel effects and weakens the Vth modulation capability
FBB (Forward Body Bias): high Vth in
standby mode, forward body biasing to
achieve better current drive in active mode.
Disadvantages:
• Larger junction capacitance
• High body effect for stack devices
Technology improvement for high Vth:
• Different doping profile
• Higher work function materials
Dynamic Vth Scaling (DVTS)
How?
• When the critical path replica frequency is less than the reference CLK, adjust bias to decrease Vth.
• Otherwise adjust bias to increase Vth.
Results:
• The lowest Vth is delivered (NBB-no body bias) if the highest
performance is required.
• When the performance demand is low, clock frequency is lowered
and Vth is raised via RBB to reduce the run time leakage power
dissipation.
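The DVTS rule above can be sketched as a single control step. This is only an illustrative sketch: the function name, the fixed bias step, and the bias limits are hypothetical and not taken from the slides; a body bias of 0 V stands for NBB and negative values stand for RBB.

```python
def dvts_step(body_bias_v, replica_freq_hz, ref_clk_hz,
              step_v=0.05, min_bias_v=-0.5, max_bias_v=0.0):
    """Return the updated body bias after one DVTS control step.

    0 V means no body bias (lowest Vth); negative values mean reverse
    body bias (higher Vth). step_v and the limits are hypothetical.
    """
    if replica_freq_hz < ref_clk_hz:
        # Critical path replica is too slow: relax the reverse bias so Vth drops.
        body_bias_v += step_v
    else:
        # Replica meets the clock: deepen the reverse bias so Vth rises
        # and run-time leakage falls.
        body_bias_v -= step_v
    # Keep the bias inside the allowed range.
    return max(min_bias_v, min(max_bias_v, body_bias_v))
```

Calling `dvts_step` once per sampling interval makes the bias hover around the point where the replica just meets the reference clock.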
Process Variation and Leakage
Variation Sources:
• Channel length
• Transistor width
• Oxide thickness
• Flat-band voltage
• Random dopant effect
A larger spread of leakage affects:
• Robustness of logic circuits.
• Circuit design margin.
[Figure: measured IDSAT and IOFF variation (150nm process)]
Circuit Techniques for Compensating Process Variation:
• Adaptive body biasing for process compensation
• Process variation compensation in dynamic circuits
Adaptive Body Biasing for Process
Compensation
Due to the worsening parameter fluctuation:
• Some dies may not meet the target frequency.
• Others exceed the leakage power constraints.
How?
• Slow dies that fail to meet the desired frequency can be forward body biased to improve performance, at the cost of more leakage power.
• On the other hand, dies with excess leakage can be reverse body biased to meet the leakage power specifications.
Effects:
• Adaptive body bias reduces the spread of the die frequency distribution by 7X, compared to a conventional zero body bias.
Process Variation Compensation in Dynamic Circuits (I)
Dynamic circuits need keepers to compensate for leakage current and hold data.
Considerations for keeper size:
• An unnecessarily large keeper will hurt circuit performance.
• Dies with excess leakage cannot meet the robustness requirements without a large enough keeper.
Programmable keeper size scheme:
• A desired effective keeper width can be chosen among {0, W, 2W, …, 7W} according to the control bits.
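As a rough illustration of the programmable keeper, the sketch below maps a control word to an effective keeper width. It assumes three control bits driving binary-weighted keeper legs of width W, 2W, and 4W, which is one way to obtain the eight widths listed; the slides do not state the actual encoding, and the function name is hypothetical.

```python
def effective_keeper_width(control_bits, unit_width):
    """Effective keeper width for a 3-bit control word (b2, b1, b0).

    Assumes binary-weighted legs of width 4W, 2W, and W, so the result
    ranges over {0, W, 2W, ..., 7W}. This encoding is an assumption.
    """
    if len(control_bits) != 3 or any(bit not in (0, 1) for bit in control_bits):
        raise ValueError("control_bits must be three values, each 0 or 1")
    b2, b1, b0 = control_bits
    # Each enabled leg adds its width to the total.
    return (4 * b2 + 2 * b1 + b0) * unit_width
```

A leaky die would be programmed with a larger control word, while a low-leakage die keeps the keeper small to preserve speed.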
Process Variation Compensation in Dynamic Circuits (II)
Simulation Results:
• 5X reduction in the number of dies failing robustness and 10% improvement in average performance.
• Variation spread of the robustness and delay distributions is reduced by 55% and 35%, respectively.
System Level Leakage Reduction
• Motivation
• Leakage characteristics and reduction
• Coupled leakage and thermal simulation and management
  • Power and thermal simulation
  • Dynamic power and thermal management
  • Vdd scaling with cooling selection
Motivation
• Leakage current has increased due to scaling in Vt, L, and tox
• Leakage power becomes more important due to high leakage devices and low activity rates
• Leakage power depends greatly on temperature

Power States at System Level
3 Power states defined at system level:
1. Active Mode – circuit in operation;
P= Pd + Ps
2. Standby Mode – circuit is idle but ready
to execute; P= Ps
3. Inactive Mode – circuit is deactivated by
leakage reduction techniques; P < Ps
System Level Leakage Power Modeling
• Early model: Ps = Vdd * NFET * kdesign * Ileakage
• Later model, with application of 2 leakage power reduction techniques (later): Ps = Vdd * Ngate * Iavg

Leakage Power Characteristics

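Minimum Idle Time (M.I.T.)
• M.I.T. = {Es-i + Ei-s - Pi * (ts-i + ti-s)} / (Ps - Pi)
  • Es-i, Ei-s: energy of the standby-to-inactive and inactive-to-standby transitions; ts-i, ti-s: the corresponding transition times; Pi: power in inactive mode
• Leakage power reduction is useful only when Idle Period > M.I.T.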
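The M.I.T. expression is the break-even idle length at which staying in standby costs the same energy as switching to inactive mode and back. A minimal sketch of that calculation follows; the function and parameter names are illustrative, and any numbers passed in are hypothetical rather than taken from the slides.

```python
def minimum_idle_time(e_si, e_is, t_si, t_is, p_standby, p_inactive):
    """Break-even idle period (s) from the M.I.T. formula.

    e_si, e_is: transition energies (J), standby->inactive and back.
    t_si, t_is: transition times (s).
    p_standby, p_inactive: power in standby and inactive modes (W).
    """
    if p_standby <= p_inactive:
        raise ValueError("inactive mode must consume less power than standby")
    return (e_si + e_is - p_inactive * (t_si + t_is)) / (p_standby - p_inactive)


def should_deactivate(idle_period, mit):
    """Deactivation saves energy only if the idle period exceeds the M.I.T."""
    return idle_period > mit
```

At an idle period exactly equal to the M.I.T., the standby energy `p_standby * t` equals `e_si + e_is + p_inactive * (t - t_si - t_is)`, which is where the formula comes from.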

Runtime Leakage Reduction for Caches
• Caches dissipate a large amount of leakage power due to large SRAM array structures
• Different techniques have been developed to reduce L1 cache Ps, e.g. DRI, SWAY
• The basic principle is to dynamically turn off part of the cache array structure

Ps Reduction for L2 Caches
• The L2 cache has a much larger miss penalty, so the approaches for L1 cannot be directly applied
• Use VRC to reduce Ps, and use time-out based control mechanisms to shut down the L2 cache data portion
• The time-out threshold could be fixed (FTO), dynamic, or set by feedback control (FCTO)

Ps Reduction for L2 Caches cont’d
• FTO
  • Time-out threshold is set as M.I.T.
• FCTO
  • Adjust the time-out threshold with the proportional-integral (PI) feedback controller
  • Update the time-out threshold according to
    • N: L2 cache miss rate in previous time window
    • Told: Time-out threshold in previous time window
  • New time-out threshold T = Told + (N - Setpoint) * Gain
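The FCTO update rule and the time-out counter can be sketched as follows. This is a simplified software model under stated assumptions: the class name, the lower bound on the threshold, and the per-cycle `tick` interface are hypothetical, and only the update formula itself comes from the slide.

```python
def update_timeout_threshold(t_old, n_miss, setpoint, gain, t_min=0.0):
    """New threshold T = Told + (N - Setpoint) * Gain, floored at t_min.

    t_min is a hypothetical safeguard that keeps the threshold non-negative.
    """
    return max(t_min, t_old + (n_miss - setpoint) * gain)


class TimeoutCounter:
    """Counts idle cycles and shuts the data portion down on time-out."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.idle_cycles = 0
        self.asleep = False

    def tick(self, accessed):
        """Advance one cycle; return True while the data portion is shut down."""
        if accessed:
            # Any access resets the counter and wakes the data portion.
            self.idle_cycles = 0
            self.asleep = False
        else:
            self.idle_cycles += 1
            if self.idle_cycles >= self.threshold:
                self.asleep = True
        return self.asleep
```

A miss rate above the setpoint raises the threshold, so the cache is shut down less eagerly; a miss rate below it lowers the threshold and saves more leakage.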
Circuits for FCTO
[Figure: FCTO circuits. The request address is split into tag, index, and block offset, and the tag portion is checked for a tag match (hit/miss). The timeout controller compares a counter against the threshold and sends wakeup/shutdown signals to the data portion. The threshold controller receives hit/miss from the timeout controller, forms (Nmiss - setpoint) x gain, and adds it to the threshold register to produce the threshold output.]
Comparison of L2 Leakage Reduction
• Time-out schemes (FTO and FCTO) achieve a much smaller performance penalty
• Targeting 1% performance loss, FCTO obtains more power reduction than FTO does

             Power reduction (%)              Performance penalty (%)
Benchmark    FTO     FCTO    SWAY    DRI      FTO    FCTO   SWAY   DRI
go           52.21   63.80   57.55   56.79    1.06   1.10   9.95   7.39
li           12.92   27.87   26.64   26.56    0.93   1.07   7.28   7.71
equake       35.75   48.61   46.40   45.71    0.84   1.01   9.73   10.58
art          0.07    2.20    2.17    2.18     0.37   0.92   3.18   3.14
Temperature Aware Computing
[Figure: simulation flow. Inputs: workload (e.g. Spec 2k), uArch, floorplan, packaging, and initial conditions (T, delay). Blocks: performance simulator (e.g. SimpleScalar, IMPACT); coupled power and thermal simulator (e.g. PTscalar, PowerImpact) containing dynamic power estimation (e.g. Wattch) and leakage estimation; adjusted conditions (T, delay); temperature-aware architecture techniques (DVS, DTM, reconfigurability, power model, GALS, etc.).]
Leakage Model with Temperature Scaling
• Exponential scaling based on BSIM3v3
• Logic circuits in ITRS 100nm technology:
  $$P_s = N_{gate} \cdot V_{dd} \cdot I_{avg}(T_0, V_{dd0}) \cdot T^2 \cdot \exp\left(\frac{1986.13\,V_{dd} - 4396.09}{T}\right)$$
• Memory units in ITRS 100nm technology:
  $$P_l(T, V_{dd}) = \left(5.30 \times 10^{-10} \cdot words + 1.72 \times 10^{-9} \cdot wordsize\right) \cdot T^2 \cdot \exp\left(\frac{1986.13\,V_{dd} - 4396.09}{T}\right) \cdot V_{dd}$$
  $$P_c(T, V_{dd}) = 5.29 \times 10^{-10} \cdot words \cdot wordsize \cdot T^2 \cdot \exp\left(\frac{711.92\,V_{dd} - 3725.53}{T}\right) \cdot V_{dd}$$
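To get a feel for how steeply the logic-circuit model scales with temperature, the sketch below evaluates only the temperature and Vdd dependent factor T^2 * exp((a*Vdd + b)/T). It assumes a = 1986.13 and b = -4396.09, with T in kelvin; the signs are an assumption chosen so that leakage grows with both temperature and Vdd, and the function names are illustrative.

```python
import math

ALPHA_LOGIC = 1986.13   # coefficient of Vdd in the exponent (sign assumed positive)
BETA_LOGIC = -4396.09   # constant term in the exponent (sign assumed negative)


def leakage_scale(temp_k, vdd, alpha=ALPHA_LOGIC, beta=BETA_LOGIC):
    """Temperature/Vdd scaling factor T^2 * exp((alpha*Vdd + beta) / T)."""
    return temp_k ** 2 * math.exp((alpha * vdd + beta) / temp_k)


def leakage_ratio(t_hot_c, t_cold_c, vdd):
    """Ratio of logic leakage at two temperatures (deg C) for the same Vdd."""
    hot = leakage_scale(t_hot_c + 273.15, vdd)
    cold = leakage_scale(t_cold_c + 273.15, vdd)
    return hot / cold
```

Under these assumptions, `leakage_ratio(110, 80, 1.2)` comes out a little under 2, in line with the "up to 2X" figure quoted later in the deck.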


Delay with Vdd and Temperature Scaling
• Based on the SPICE level 1 model, transistor saturation current Isat is proportional to
  $$I_{sat} \propto \frac{(V_{dd} - V_t)^{1.2}}{T^{1.19}}$$
• We obtain
  $$delay(V_{dd}, T) \propto \frac{V_{dd}}{I_{sat}} \propto \frac{V_{dd} \cdot T^{1.19}}{(V_{dd} - V_t)^{1.2}}$$
[Figure: normalized gate delay (75% to 100%) versus Vdd (1 V to 1.3 V) at T = 60, 80, and 100 °C, ITRS 100nm technology]
Thermal Modeling
• For the lumped RC thermal circuit:
  • Thermal resistance Rth: the ability to remove heat to the ambient in steady-state condition
  • Thermal capacitance Cth: captures the delay between a change in power and the corresponding change in temperature
  • Thermal time constant τ = Rth * Cth
• A distributed model is needed for an accurate solution
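The lumped RC model can be simulated with a simple forward-Euler loop. The sketch below assumes a single thermal node obeying Cth * dT/dt = P - (T - Tamb)/Rth; the function name and all parameter values in the usage note are hypothetical.

```python
def simulate_lumped_rc(power_w, r_th, c_th, t_ambient, t_initial, dt, steps):
    """Forward-Euler simulation of one lumped thermal RC node.

    Returns the list of temperatures, starting with t_initial.
    Units are assumed consistent (e.g. W, K/W, J/K, seconds).
    """
    temps = [t_initial]
    temp = t_initial
    for _ in range(steps):
        # Heat in (power) minus heat removed to ambient through Rth.
        temp += dt * (power_w - (temp - t_ambient) / r_th) / c_th
        temps.append(temp)
    return temps
```

For example, with `r_th=0.8`, `c_th=50.0`, and a time step of 0.5% of the time constant (`dt=0.2`), the temperature settles toward `t_ambient + power_w * r_th` after several time constants.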
Coupled Power and Thermal Simulation
• Simulating with a time step ts < 0.5% of the thermal time constant (~10^6 cycles) gives negligible temperature and power calculation errors
• Clock gating reduces dynamic power and also leakage energy
• Leakage energy changes with operating temperature

Leakage Power at Different Temperature
[Figure: normalized total power, split into dynamic power and leakage power, for benchmarks art and gcc at 35, 85, 110 °C and "Dep"; 100nm, 3.33GHz, 1.2V]
• uP similar to DEC Alpha 21264 and with clock gating
• Leakage differs by up to 2X between 80°C and 110°C
  • Differs for different applications too
• Coupled thermal and power simulation is a must
Thermal Runaway
• Thermal runaway is caused by the positive feedback loop between on-resistance, temperature, and power
• It is also a result of the interaction between leakage power and temperature
  • Component temperature ↑ → leakage power ↑ exponentially → temperature ↑
  • If cooling is not adequate, both keep increasing
Thermal Runaway cont’d
• Assuming no throttling and constant power consumption, the condition for thermal runaway is equivalent to d²T/dt² > 0
• The lowest temperature that meets the thermal runaway criterion is the runaway temperature
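The runaway criterion can be explored with a small coupled simulation. The sketch below is illustrative only: it assumes a single lumped thermal node and a generic exponential leakage model P_leak = P_ref * exp(k * (T - T_ref)) with hypothetical constants, not the leakage model from the slides. It reports the first temperature at which the temperature is rising and accelerating (d²T/dt² > 0).

```python
import math


def find_runaway(p_dyn, p_leak_ref, r_th, c_th, t_amb_c, dt, steps,
                 t_ref_c=85.0, k=0.023, accel_tol=1e-9):
    """Simulate coupled leakage and temperature; detect thermal runaway.

    Returns (final_temp_c, runaway_temp_c). runaway_temp_c is None if the
    temperature never rises with positive acceleration. k, t_ref_c, and
    accel_tol are hypothetical modelling constants.
    """
    temp = t_amb_c
    prev_rate = None
    for _ in range(steps):
        p_leak = p_leak_ref * math.exp(k * (temp - t_ref_c))
        # dT/dt for the lumped node: net heat flow divided by Cth.
        rate = (p_dyn + p_leak - (temp - t_amb_c) / r_th) / c_th
        if prev_rate is not None and rate > 0:
            accel = (rate - prev_rate) / dt   # discrete d2T/dt2
            if accel > accel_tol:
                return temp, temp
        prev_rate = rate
        temp += dt * rate
    return temp, None
```

With adequate cooling (small `r_th`) the heating rate decays and the temperature settles; with a larger `r_th` the rate starts growing again at some temperature, which the function returns as the runaway temperature.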
Dynamic Power and Thermal Management (DPTM)
• Goal: Maximize throughput subject to a maximum on-chip temperature constraint
• For each time window of X cycles, stop or throttle instruction fetch in δX cycles
  • 0 <= δ <= 1
• Feedback controller (Proportional Integral) to adjust δ:
  • For each time window, update δ according to
    • Current maximum on-chip temperature
    • δ in previous time window
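One way to realize such a controller is an incremental (velocity-form) PI update, which naturally uses δ from the previous window. The sketch below is an assumption-laden illustration: the class name, the gains `kp` and `ki`, and the choice of velocity form are hypothetical, since the slides do not give the update equation.

```python
class FetchThrottleController:
    """PI controller that adjusts the throttled fraction delta per window."""

    def __init__(self, t_threshold_c, kp=0.05, ki=0.02):
        self.t_threshold_c = t_threshold_c
        self.kp = kp            # hypothetical proportional gain
        self.ki = ki            # hypothetical integral gain
        self.delta = 0.0        # fraction of the window with fetch throttled
        self.prev_error = 0.0

    def update(self, t_max_c):
        """Update delta from the current maximum on-chip temperature."""
        error = t_max_c - self.t_threshold_c
        # Velocity-form PI: previous delta plus proportional and integral terms.
        delta = self.delta + self.kp * (error - self.prev_error) + self.ki * error
        self.delta = min(1.0, max(0.0, delta))   # enforce 0 <= delta <= 1
        self.prev_error = error
        return self.delta


def fetch_enabled_cycles(delta, window_cycles):
    """Cycles in the window during which instruction fetch is allowed."""
    return int(round((1.0 - delta) * window_cycles))
```

A chip running below the threshold keeps δ at 0 (no throttling), and δ rises toward 1 as the maximum temperature stays above the threshold.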
Dynamic Power and Thermal
Management (DPTM)
• Fetch toggling toggles the I-cache, I-TLB, branch prediction, and decode units
• Dynamic frequency scaling (DFS) and dynamic voltage scaling (DVS) adjust the clock frequency and Vdd, incurring a stall
• Activity migration moves activity to another copy of the component at a lower temperature

Need for Temperature Dependent Leakage Model
• Dynamic thermal management using fetch toggling with PI feedback controller
• Implemented 2 models: simple (fixed Ps) and accurate (Ps is temperature dependent)
Validation of PI-based DPTM
Compared with two practices:
• No dynamic management
  • Lower Vdd to avoid thermal violations
• Cooling down
  • If reaching the thermal threshold, stop the whole processor until the maximum temperature is X °C lower than the threshold
  • X = 5 in our experiments

System Performance
[Figure: throughput (BIPS) versus Vdd (1 V to 1.3 V), with the maximum throughput marked, for feedback control (Max T=80°C), no management (Max T=110°C), and simple cooling down (Max T=80°C)]
• DPTM by feedback control may improve throughput by up to 11% compared to the no-DPTM case
• DPTM allows designing for the common workload rather than the worst case => thermal speculation
Active Cooling

• Direct water-spray cooling
  • Thermal resistance of 0.067, compared to 0.8 for a conventional heatsink
• Microchannel with liquid coolant, …
Impacts of Water Cooling
[Figure: throughput (BIPS) and power efficiency (BIPS/W) versus Vdd (1 V to 1.8 V) for water cooling (Max T=60°C) and air cooling (Max T=80°C)]
• Increases the maximum throughput by 30%
• Improves power efficiency by 9% and slows down the decay of power efficiency
References
• Amit Agarwal et al., “Leakage Mechanisms and Leakage Control for Nano-Scale CMOS Circuits”, Purdue University.
• Lei He et al., “System Level Leakage Reduction Considering the Interdependence of Temperature and Leakage”, UCLA.