On the Limits of Leakage Power Reduction in Caches

Yan Meng, Tim Sherwood and Ryan Kastner
UC, Santa Barbara
HPCA-2005
Overview


Caches are good targets for tackling the leakage problem
Much work has been done in this field
• Gated-Vdd
• [Powell 01], [Agarwal 02], [Roy 02], [Hu 02], [Kaxiras 01], [Zhou 03], [Velusamy 02]
• Multiple supply voltages
• [Flautner 02], [Kim 02,04], [Mudge 04]
• Others
• [Hu 03], [Li 04], [Heo 02], [Hanson 01], [Li 03], [Bai 05], [Skadron 04], [Zhang 02], [Azizi et al. 03]
Research Question and Finding

What is the best leakage power saving we could hope to achieve with existing techniques?

Far more potential remains for further reducing leakage power in caches
Outline






Motivation
Definitions
Optimal approach
The generalized model
Experimental results
Conclusions
Motivation
Why study the leakage problem?
• Leakage power: the dominant source of power consumption as technology scales down below 100nm
[Figure: projected leakage power consumption as a fraction of the total power consumption, according to the International Technology Roadmap for Semiconductors. Y-axis: Leakage Power/Total Power, 0% to 100%. X-axis: Year, 1999 to 2009.]
Motivation

Why tackle the leakage problem through caches?
• Caches: huge chip area (50% in 2005 [ITRS])
• Major source of leakage power consumption
[Figure: Alpha 21364 microprocessor die photo, from http://www.oracle.com/technology/products/rdb/pdf/2002_tech_forums/rdbtf_2002_opt_on_alpha_mdr.pdf]
Motivation

How to tackle the problem with existing techniques?
• Keep frequently accessed cache lines active to ensure high performance
• Turn off cache lines that are not used for a long time
• Use low supply voltage to save power for the rest
What’s the best that the existing circuit and architecture techniques could achieve?
How much room is left for further research?
Definitions – Cache Interval

Time between two successive accesses to the same cache line
[Figure: timeline showing the interval Ii, of length |Ii|, between access(i) and access(i+1)]
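The definition above can be illustrated with a minimal sketch that derives interval lengths from an access trace. The trace format, a list of (cycle, line_id) pairs in cycle order, and the function name are assumptions made for this example and are not taken from the slides.

```python
from collections import defaultdict


def cache_intervals(trace):
    """Return, for each cache line, the lengths of its intervals.

    `trace` is an iterable of (cycle, line_id) pairs sorted by cycle.
    This input format is hypothetical, chosen only for illustration.
    """
    last_access = {}               # line_id -> cycle of most recent access
    intervals = defaultdict(list)  # line_id -> list of interval lengths
    for cycle, line in trace:
        if line in last_access:
            # Interval = time between two successive accesses to this line.
            intervals[line].append(cycle - last_access[line])
        last_access[line] = cycle
    return dict(intervals)


trace = [(0, "A"), (5, "B"), (12, "A"), (2000, "B"), (2012, "A")]
print(cache_intervals(trace))
```

Lines that are accessed only once produce no interval in this sketch, since an interval needs both a starting and an ending access.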
Definitions – Operating Modes

Active mode
• Power on the whole cache line
• No power saving
Sleep mode [Roy01, Hu01]
• Sleep/“turn off” transistors
• Lose data
• Refetch data with high overhead
Drowsy mode [Flautner02, Mudge04]
• Use low supply voltage to save power when it is not needed
• Preserve data for fast reaccess
• Wake up to the high voltage and return data
[Figure: supply voltage versus time over an interval |Ii| for each mode. Active: voltage stays at Vdd. Sleep: voltage drops from Vdd to 0, with phases s1, s2, s3, s4. Drowsy: voltage drops from Vdd to Vddlow, with phases d1, d2, d3.]
Choosing Operating Modes
Which mode should be applied to an interval of length |Ii|?
• Active mode
• Sleep mode
• Drowsy mode
Optimal Approach

Differences
• Studying optimality
• Combining all three modes to achieve the maximal leakage power saving

Optimal policy
• Oracle knowledge of future address trace
• Applying the appropriate operating mode on each cache interval
• Obtaining optimal leakage power saving
• Formal proof of the optimality
Inflection Points


Which mode to apply on each interval?
Active-drowsy inflection point a
• The least amount of time drowsy mode needs
to save energy

Sleep-drowsy inflection point b
• The time where sleep and drowsy modes
consume the same amount of energy
Selecting Operating Modes with Inflection Points
[Flowchart: the length |I| of each interval I selects its operating mode]
• |I| ≤ a: active interval, active mode
• a < |I| ≤ b: drowsy interval, drowsy mode
• |I| > b: sleep interval, sleep mode
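The selection rule can be sketched as a small function. The function name is hypothetical; the example values a = 6 and b = 1057 cycles are the 70nm inflection points reported in the results table of this deck.

```python
def select_mode(length, a, b):
    """Pick the operating mode for an interval of `length` cycles.

    a: active-drowsy inflection point, b: sleep-drowsy inflection point.
    """
    if length <= a:
        return "active"   # too short for drowsy mode to save energy
    if length <= b:
        return "drowsy"   # drowsy costs no more than sleep up to b
    return "sleep"        # long interval: sleeping costs less


A_70NM, B_70NM = 6, 1057  # inflection points in cycles for 70nm
for length in (4, 6, 500, 1057, 1058, 20000):
    print(length, select_mode(length, A_70NM, B_70NM))
```

With oracle knowledge of the future trace, every interval length is known in advance, so the rule can be applied to each interval independently.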
Calculating Inflection Points

Active-drowsy inflection point a
$$a = \arg\min_t \{ E_{\text{Drowsy saving}}(t) \ge 0 \} = d_1 + d_3$$

Sleep-drowsy inflection point b
$$b = \{ t : E_{\text{Drowsy}}(t) = E_{\text{Sleep}}(t) \}$$
$$E_{\text{Drowsy}} = \sum_{i=1,2,3} P_L(d_i) \cdot d_i$$
$$E_{\text{Sleep}} = \sum_{i=1,2,3,4} P_L(s_i) \cdot s_i + C_D$$

[Figure: supply voltage versus time over an interval |Ii|. Drowsy: Vdd drops to Vddlow, with phases d1, d2, d3. Sleep: Vdd drops to 0, with phases s1, s2, s3, s4.]
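As a rough illustration of how the two inflection points follow from the energy definitions, the sketch below solves for them under a simplified model. It assumes that transition phases leak at the active-mode rate and that each mode has a single resting power level, which is coarser than the per-phase sums on the slide. All names and numeric values are hypothetical and are not taken from HotLeakage.

```python
from dataclasses import dataclass


@dataclass
class Params:
    p_active: float  # leakage power per cycle at Vdd
    p_drowsy: float  # leakage power per cycle at Vddlow
    p_sleep: float   # leakage power per cycle when gated off
    d1: int          # cycles to enter drowsy mode
    d3: int          # cycles to wake up from drowsy mode
    s_trans: int     # total cycles spent in sleep transitions
    c_d: float       # dynamic energy C_D of refetching the lost line


def e_drowsy(t, p):
    """Energy of a drowsy interval of t cycles (simplified model)."""
    trans = p.d1 + p.d3
    return p.p_active * trans + p.p_drowsy * (t - trans)


def e_sleep(t, p):
    """Energy of a sleep interval of t cycles, including the refetch."""
    return p.p_active * p.s_trans + p.p_sleep * (t - p.s_trans) + p.c_d


def inflection_points(p):
    a = p.d1 + p.d3  # drowsy saves nothing before its transitions finish
    # Both energies are linear in t, so equate them and solve for t.
    fixed_drowsy = (p.p_active - p.p_drowsy) * (p.d1 + p.d3)
    fixed_sleep = (p.p_active - p.p_sleep) * p.s_trans + p.c_d
    b = (fixed_sleep - fixed_drowsy) / (p.p_drowsy - p.p_sleep)
    return a, b


params = Params(p_active=1.0, p_drowsy=0.25, p_sleep=0.05,
                d1=3, d3=3, s_trans=30, c_d=200.0)
a, b = inflection_points(params)
print(f"a = {a} cycles, b = {b:.0f} cycles")
```

In this model a larger refetch cost C_D pushes b upward, while a larger gap between drowsy and sleep leakage pulls it downward.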
Saving Leakage Power without Performance Degradation




• Deriving the interval lengths with perfect knowledge of the future address trace
• Fetching any needed data just before it is needed
• Avoiding any performance impact
• Taking into account the power cost of just-in-time refetch C_D
Saving Leakage Power without Performance Degradation
[Figure: energy consumption between access(i) and access(i+1) in five cases: (a) the active mode, (b) the sleep mode w/o perfect prefetching, (c) the sleep mode w/ perfect prefetching, (d) the drowsy mode w/o perfect prefetching, (e) the drowsy mode w/ perfect prefetching. With perfect prefetching, data is fetched just before it is needed. Legend: active energy, transition energy, drowsy energy, fetch energy, energy consumption due to system stall, saved energy.]
The Generalized Model

Parameterized model
Inputs
• Wake-up latencies
• Interval distribution
• Leakage power of each state
• Transition energy between states
Outputs
• Optimal savings of OPT-Drowsy, OPT-Sleep, and OPT-Hybrid
Can be extended to accommodate future technologies and power saving modes
Publicly available
• http://express.ece.ucsb.edu/software/leakage.html
[Figure: state diagram with states Active, Drowsy, and Sleep, labeled with leakage powers P(Active), P(Drowsy), P(Sleep) and transition energies EAD, EDA, EAS, ESA]
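The shape of such a parameterized model can be sketched as follows. This is not the authors' released software: the function, the dictionary layout, and every number are hypothetical. It assumes each low-power mode costs a fixed round-trip transition energy plus a per-cycle leakage term, and is usable only when the interval is at least as long as the mode's total transition latency.

```python
def optimal_savings(histogram, power, overhead, latency):
    """Fraction of leakage energy saved by each optimal scheme.

    histogram: interval length in cycles -> number of such intervals
    power:     mode -> leakage power per cycle
    overhead:  mode -> fixed round-trip transition energy
    latency:   mode -> minimum interval length the mode can cover
    """
    def energy(mode, t):
        if t < latency[mode]:
            return float("inf")  # interval too short for this mode
        return overhead[mode] + power[mode] * t

    schemes = {
        "OPT-Drowsy": ("active", "drowsy"),
        "OPT-Sleep": ("active", "sleep"),
        "OPT-Hybrid": ("active", "drowsy", "sleep"),
    }
    baseline = sum(n * energy("active", t) for t, n in histogram.items())
    savings = {}
    for name, modes in schemes.items():
        # Per interval, pick the cheapest mode the scheme allows.
        total = sum(n * min(energy(m, t) for m in modes)
                    for t, n in histogram.items())
        savings[name] = 1 - total / baseline
    return savings


power = {"active": 1.0, "drowsy": 0.25, "sleep": 0.05}
overhead = {"active": 0.0, "drowsy": 4.5, "sleep": 228.5}
latency = {"active": 0, "drowsy": 6, "sleep": 30}
histogram = {4: 100, 100: 50, 5000: 10}
result = optimal_savings(histogram, power, overhead, latency)
for name, value in result.items():
    print(f"{name}: {value:.1%}")
```

Choosing the minimum-energy mode per interval is equivalent to comparing the interval length against the inflection points, and a new power saving mode could be added as one more entry in each dictionary.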
Methodology


Core: Compaq Alpha 21264 [Kessler 99]
Memory
• 2-way L1 instruction and data caches, 64KB
• Unified direct mapped L2 cache, 2MB
• LRU replacement policy
Tools
• SimAlpha simulator
• HotLeakage
• Leakage power and dynamic cost
• Parameters: taken from HotLeakage

Averaged results over all benchmark applications
Calculating Inflection Points
Inflection points (cycles)

Technology   Active-drowsy point   Sleep-drowsy point
180nm        6                     103084
130nm        6                     10328
90nm         6                     5088
70nm         6                     1057

• The sleep-drowsy point decreases from 180nm to 70nm
• Because the leakage power consumption increases while the dynamic power consumption caused by an induced miss decreases
• Our approach can be parameterized and applied to many other memory technologies
• 70nm, the most advanced technology, is used in the rest of our study
Exploring the Upper-bound
[Figure: leakage power savings (y-axis, 50% to 100%) for the L1 data cache under OPT-Drowsy, Sleep(10K), OPT-Sleep(10K), and OPT-Hybrid, shown for the average and for ammp, applu, gcc, gzip, mesa, and vortex]
• OPT-Drowsy: no performance penalty for waking up data
• Sleep(10K): turning off cache lines after 10K cycles [Hu01]
• OPT-Sleep(10K): sleep applied to intervals with lengths greater than 10K cycles
• OPT-Hybrid: optimally combining three modes w/o performance penalty
Research Finding
[Figure: leakage power savings (y-axis, 50% to 100%) of OPT-Drowsy, Sleep(10K), OPT-Sleep(10K), and OPT-Hybrid for the instruction cache and the data cache]
• Larger leakage saving can be achieved for data cache
• Drowsy and sleep modes each achieve fairly high savings
• Savings are complementary: potential in combining drowsy and sleep technologies
Conclusions

Why leakage?
• Leakage: dominant source of power consumption as technology scales down below 100nm
• Caches: primary targets to tackle the problem
Optimal approach and software
• Calculating the maximal leakage savings
• Quantifying how much room is left for improvement
• Used to guide future power management policy research
Great potential in combining techniques
• Optimally combining Active, Drowsy, and Sleep
• The optimal approach reduces power dissipation
• Instruction cache: by a factor of 5.3
• Data cache: by a factor of 2