pacs03 - University of Utah

Hot-and-Cold:
Using Criticality in the Design
of Energy-Efficient Caches
Rajeev Balasubramonian, University of Utah
Viji Srinivasan, IBM T.J. Watson
Sandhya Dwarkadas, University of Rochester
Alper Buyuktosunoglu, IBM T.J. Watson
All Instructions are not Created Equal
Critical instructions – lie on the program critical path
Non-critical instructions – can be slowed without
increasing execution time
• Potential to improve cache performance (?)
[Srinivasan ’01] [Fisk ’99]
• Prioritization policies [Fields ’01] [Tune ’01]
• Energy-efficient ALUs [Seng ’01]
Energy-Delay Trade-Offs
• Example energy-delay trade-off techniques:
  - Voltage scaling, transistor sizing, way prediction, serial access
  - Gated-ground cells, high Vt
[Figure: normalized delay (y-axis, 1 to 1.7) versus normalized dynamic energy (x-axis, 1 to 2); series labeled "Transistor sizing" and "Variable threshold voltage"]
Vt        Normalized leakage   Normalized delay
Low       8.5                  0.88
Nominal   1                    1
High      0.23                 1.34
Exploiting Criticality
• Design two static banks:
  - hot bank: fast and high power
  - cold bank: slow and low power
• Instructions have to be classified as critical or not, and
• Data has to be placed in one of the two banks
Energy-efficient ALUs are easier to handle, as there is no
associated storage
Criticality Metric
Oldest-N: The N oldest instructions in the queue
are critical
  - Younger instructions are likely to be on mispredicted paths or can tolerate latencies
  - N can be varied based on program needs
  - Minimal hardware overhead
  - Behavior comparable to more complex metrics
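The Oldest-N rule can be sketched in a few lines. This is a minimal software illustration, not the hardware implementation: the `Instr` record, its `seq` field (a program-order sequence number, smaller meaning older), and the function name are all hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    seq: int      # hypothetical program-order sequence number (smaller = older)
    opcode: str


def oldest_n_critical(issue_queue, n):
    """Return the sequence numbers of the n oldest instructions in the queue.

    Under the Oldest-N metric these are the instructions tagged critical;
    every other instruction in the queue is treated as non-critical.
    """
    by_age = sorted(issue_queue, key=lambda instr: instr.seq)
    return {instr.seq for instr in by_age[:n]}


queue = [Instr(12, "load"), Instr(7, "add"), Instr(9, "load"), Instr(15, "mul")]
critical = oldest_n_critical(queue, n=2)
```

With `n=2`, the instructions with sequence numbers 7 and 9 are tagged critical. Raising or lowering `n` corresponds to varying N based on program needs.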
Instruction Classification
[Figure: percentage of loads with the same behavior as the last invocation (y-axis, 0 to 100) for bzip, crafty, eon, gap, gcc, gzip, parser, twolf, vortex, vpr, and AM]
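The metric plotted for instruction classification, the percentage of loads that behave the same as on their last invocation, could be measured from a trace roughly as follows. This is a sketch under assumptions: the trace format (pairs of load PC and a critical flag) and the choice to skip each load's first invocation, which has no previous outcome to compare against, are not specified in the slides.

```python
def same_as_last_rate(trace):
    """Percentage of load invocations whose criticality matches the
    previous invocation of the same static load (same PC).

    trace: iterable of (load_pc, was_critical) pairs in execution order.
    The first invocation of each PC is not counted (assumption).
    """
    last_outcome = {}
    same = total = 0
    for pc, was_critical in trace:
        if pc in last_outcome:
            total += 1
            if last_outcome[pc] == was_critical:
                same += 1
        last_outcome[pc] = was_critical
    return 100.0 * same / total if total else 0.0


trace = [
    (0x40, True), (0x44, False),
    (0x40, True), (0x44, True),
    (0x40, True), (0x44, True),
]
rate = same_as_last_rate(trace)  # 3 of 4 repeat invocations match: 75.0
```

A high rate suggests that a simple last-outcome predictor indexed by load PC would classify most loads correctly.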
Data Classification
[Figure: percentage of cache blocks (y-axis, 0 to 50) versus percentage of critical accesses to a cache block (x-axis, in buckets 0-10%, 10-20%, ..., 90-100%); the 0-10% end is annotated "Exclusively non-critical" and the 90-100% end "Exclusively critical"]
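A histogram of this kind, cache blocks bucketed by the share of their accesses that come from critical instructions, could be computed as sketched below. The access-trace format and the function name are assumptions made for illustration; integer arithmetic is used for bucketing to avoid floating-point edge cases.

```python
from collections import defaultdict


def critical_access_histogram(accesses, num_buckets=10):
    """Bucket cache blocks by their percentage of critical accesses.

    accesses: iterable of (block_address, was_critical) pairs.
    Returns a list of num_buckets percentages (of all touched blocks);
    bucket 0 covers 0-10% critical accesses, the last covers 90-100%.
    """
    counts = defaultdict(lambda: [0, 0])  # block -> [critical, total]
    for block, was_critical in accesses:
        counts[block][0] += int(was_critical)
        counts[block][1] += 1

    hist = [0] * num_buckets
    for critical, total in counts.values():
        # A block with 100% critical accesses falls in the last bucket.
        bucket = min(critical * num_buckets // total, num_buckets - 1)
        hist[bucket] += 1

    n = len(counts)
    return [100.0 * h / n for h in hist] if n else [0.0] * num_buckets


accesses = [
    (0xA, True), (0xA, True),     # block A: exclusively critical
    (0xB, False), (0xB, False),   # block B: exclusively non-critical
    (0xC, True), (0xC, False),    # block C: half critical
    (0xD, False),                 # block D: exclusively non-critical
]
hist = critical_access_histogram(accesses)
```

Blocks piling up in the first and last buckets indicate that most data is accessed either almost exclusively by critical or almost exclusively by non-critical instructions, which is what makes a static two-bank placement plausible.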
Hot-and-Cold Microarchitecture
[Figure: block diagram with Dispatch, Bank Predictor, Issue Queue, Hot bank, Cold bank, Criticality Counters, Placement Predictor, and L2]
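The slides name criticality counters and a placement predictor but do not describe their policy. Purely as an illustration, one plausible scheme is a small saturating counter per cache block that is incremented on critical accesses and decremented on non-critical ones, with the block steered to the hot bank when the counter is in its upper half. The counter width, the threshold, the default for unseen blocks, and all names below are assumptions, not details from the talk.

```python
HOT, COLD = "hot", "cold"


class PlacementTracker:
    """Hypothetical per-block saturating criticality counters."""

    def __init__(self, bits=2):
        self.max_count = (1 << bits) - 1           # 3 for a 2-bit counter
        self.threshold = (self.max_count + 1) // 2  # upper half means hot
        self.counters = {}

    def record_access(self, block, critical):
        # Unseen blocks start at the threshold (assumption: default to hot).
        count = self.counters.get(block, self.threshold)
        if critical:
            count = min(count + 1, self.max_count)
        else:
            count = max(count - 1, 0)
        self.counters[block] = count

    def preferred_bank(self, block):
        count = self.counters.get(block, self.threshold)
        return HOT if count >= self.threshold else COLD


tracker = PlacementTracker()
tracker.record_access(0x100, critical=False)
bank_after_noncritical = tracker.preferred_bank(0x100)
tracker.record_access(0x100, critical=True)
bank_after_critical = tracker.preferred_bank(0x100)
```

In this sketch a single non-critical access moves a fresh block to the cold bank, and a subsequent critical access moves it back to hot. A real design would also have to weigh the cost of migrating data between banks.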
Performance Results
[Figure: harmonic mean (HM) of IPCs (y-axis, 1 to 1.3) for base and hot-and-cold (h-c) configurations, including h-c with no penalties and variants labeled with 2-, 4-, and 6-cycle latencies; the individual bar labels did not survive extraction]
Energy Results
[Figure: L1 energy in pJ/instr (y-axis, 0 to 700), split into cold-bank and hot-bank components, for three configurations: base case; h-c, cold=0.6; h-c, cold=0.2]
Results Summary
• Bank mispredict rate of 9.5%
• Criticality mismatch rate of 26%
• Performance loss = 2.7% (data reorganization) + (0.8 × slowdown)
• L1 cache energy savings of 37%
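The performance-loss expression in the summary can be evaluated with a small helper. The slide does not define the units of "slowdown", so this sketch assumes it is given in percent, which makes the result a percentage as well; the constant and function names are introduced here for illustration.

```python
REORG_LOSS_PCT = 2.7   # fixed loss attributed to data reorganization
SLOWDOWN_COEFF = 0.8   # multiplier applied to the slowdown term


def performance_loss_pct(slowdown_pct):
    """Evaluate: loss = 2.7% + 0.8 x slowdown (units of slowdown assumed)."""
    return REORG_LOSS_PCT + SLOWDOWN_COEFF * slowdown_pct


loss_no_slowdown = performance_loss_pct(0.0)
loss_with_slowdown = performance_loss_pct(5.0)
```

With no slowdown the loss reduces to the 2.7% reorganization term; any slowdown adds 0.8 of its value on top of that.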
Related Work
• Recent split-cache organization by Abella and
Gonzalez [ICCD’03]
[Figure labels: Base, Fast, Slow]
• Data allocation based on criticality of accessing
instruction
Conclusions
• Data and instruction classification is reasonably
accurate
• Overhead from contention is non-trivial
• Results are worthwhile in limited settings
The use of criticality for data cache reorganization
yields little benefit