Transcript 7810-13

CS 7810
Lecture 13
Pipeline Gating: Speculation Control For
Energy Reduction
S. Manne, A. Klauser, D. Grunwald
Proceedings of ISCA-25
June 1998
Cost of Speculation
Mispredict rates (%): 9.9, 12.2, 23.9, 10.4, 6.9, 4.6, 11.3, 1.7
Pipeline Gating
• Low-confidence branches throttle instruction fetch until they are resolved
• Pipeline gating usually lasts for fewer than five cycles
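The gating idea can be sketched as a counter of unresolved low-confidence branches that stalls fetch while it is at or above a threshold. This is a minimal illustration, not the paper's hardware: the class name `FetchGate`, its method names, and the default threshold are hypothetical.

```python
class FetchGate:
    """Toy model of pipeline gating (hypothetical names, illustrative only)."""

    def __init__(self, threshold=1):
        # Assumed parameter: how many unresolved low-confidence branches
        # must be in flight before instruction fetch is gated.
        self.threshold = threshold
        self.unresolved_low_conf = 0

    def on_branch_fetched(self, low_confidence):
        # Only branches the estimator marks as low-confidence are counted.
        if low_confidence:
            self.unresolved_low_conf += 1

    def on_branch_resolved(self, was_low_confidence):
        # A resolved low-confidence branch no longer holds the gate.
        if was_low_confidence and self.unresolved_low_conf > 0:
            self.unresolved_low_conf -= 1

    def fetch_allowed(self):
        return self.unresolved_low_conf < self.threshold


gate = FetchGate(threshold=1)
gate.on_branch_fetched(low_confidence=True)
print(gate.fetch_allowed())   # fetch is gated while the branch is unresolved
gate.on_branch_resolved(was_low_confidence=True)
print(gate.fetch_allowed())   # fetch resumes once it resolves
```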
Metrics
• SPEC (specificity): fraction of all mispredicted
branches detected as low-confidence by the
confidence estimator (coverage)
• PVN (predictive value of a negative test): probability
that a branch flagged as low-confidence is actually
mispredicted (accuracy)
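As a worked illustration of the two metrics, the sketch below computes SPEC and PVN from a list of per-branch outcomes. The function name `spec_and_pvn`, the input format, and the sample trace are assumptions made for this example, not data from the paper.

```python
def spec_and_pvn(branches):
    """branches: iterable of (low_confidence, mispredicted) boolean pairs."""
    mispredicted = 0
    low_conf = 0
    low_conf_and_mispredicted = 0
    for is_low_conf, is_mispredicted in branches:
        if is_mispredicted:
            mispredicted += 1
        if is_low_conf:
            low_conf += 1
        if is_low_conf and is_mispredicted:
            low_conf_and_mispredicted += 1
    # SPEC: share of mispredicted branches that were flagged (coverage).
    spec = low_conf_and_mispredicted / mispredicted if mispredicted else 0.0
    # PVN: share of flagged branches that were mispredicted (accuracy).
    pvn = low_conf_and_mispredicted / low_conf if low_conf else 0.0
    return spec, pvn


# Made-up trace: 4 mispredicts, 3 of them flagged; 6 branches flagged in all.
trace = ([(True, True)] * 3 + [(False, True)] * 1
         + [(True, False)] * 3 + [(False, False)] * 3)
print(spec_and_pvn(trace))
```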
Confidence Estimators
• Perfect: to gauge potential benefits
• Static: branches that have low prediction rates
• JRS: if a branch has yielded N successive correct
predictions, it has high confidence
• Saturating counters: unbiased counter value or
disagreement in two predictors → low confidence
• Distance: mispredicts are clustered, hence the first 4
branches after a mispredict have low confidence
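Two of the estimators above are simple enough to sketch in a few lines. The code below is an illustration under stated assumptions, not the paper's implementation: the class names, the table size, the modulo indexing, and the default value of N are all hypothetical, and the delay between predicting a branch and learning its outcome is ignored.

```python
class JRSEstimator:
    """Per-branch resetting counters: N correct predictions in a row
    mark a branch as high confidence (table layout is an assumption)."""

    def __init__(self, table_size=1024, threshold=4):
        self.table_size = table_size      # assumed table size
        self.threshold = threshold        # the slide's N, value assumed
        self.counters = [0] * table_size

    def _index(self, pc):
        return pc % self.table_size       # simplistic indexing, an assumption

    def high_confidence(self, pc):
        return self.counters[self._index(pc)] >= self.threshold

    def update(self, pc, prediction_correct):
        i = self._index(pc)
        if prediction_correct:
            self.counters[i] = min(self.counters[i] + 1, self.threshold)
        else:
            self.counters[i] = 0          # reset on a mispredict


class DistanceEstimator:
    """The first `window` branches after a mispredict are low confidence."""

    def __init__(self, window=4):
        self.window = window
        self.branches_since_mispredict = window  # start as high confidence

    def high_confidence(self):
        return self.branches_since_mispredict >= self.window

    def update(self, prediction_correct):
        if prediction_correct:
            self.branches_since_mispredict += 1
        else:
            self.branches_since_mispredict = 0


jrs = JRSEstimator(threshold=2)
jrs.update(0x40, True)
jrs.update(0x40, True)
print(jrs.high_confidence(0x40))
```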
SPEC and PVN
SPEC (coverage): % of mispredicted branches flagged as low-confidence by the estimator
PVN (accuracy): % of low-confidence branches that are mispredicted
• It is easier to achieve a high SPEC value than PVN
• A high PVN value can be achieved by requiring N low-confidence branches
to invoke gating: if PVN is 30%, re-defining low-confidence as two
low-confidence branches increases PVN to 51% (1 - 0.7^2, assuming the
two branches mispredict independently)
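The 30% to 51% figure follows from the chance that at least one of N low-confidence branches is mispredicted. The helper below is a small sketch of that arithmetic; the function name is hypothetical, and it assumes each low-confidence branch mispredicts independently with probability equal to PVN.

```python
def prob_at_least_one_mispredict(pvn, n):
    """Probability that at least one of n low-confidence branches is
    mispredicted, assuming independent branches with the same PVN."""
    return 1.0 - (1.0 - pvn) ** n


for n in (1, 2, 3):
    print(n, round(prob_at_least_one_mispredict(0.30, n), 3))
```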
Gating Results
Results
• Can gating improve performance? – only if cache
pollution is significant
• Less than 1% performance loss and up to 38%
reduction in extra work
• Energy consumption could go up: some work is
independent of the number of executed instructions (e.g., clock
distribution), so increased execution time can increase energy
• Pipeline gating should reduce power consumption
Results
Cache Decay: Exploiting Generational Behavior
to Reduce Cache Leakage Power
S. Kaxiras, Z. Hu, M. Martonosi
Proceedings of ISCA-28
July 2001
Leakage Power Trends
• Circuit delay ∝ 1/(V – Vth)
• Leakage ∝ num transistors (incr)
supply voltage (decr)
(exp) low thresh. voltage (incr)
• L1 and L2 caches are the biggest
contributors (high transistor budgets)
Vdd-Gating
• Leakage can be reduced by gating off the
supply voltage to the circuit
• When applied to a cache, the contents of the
SRAM cell are lost
• Cache decay: apply Vdd-gating when you do not
care about cache contents
Lifetime of a Cache Line
Overheads
• Hardware to determine when to decay
• Introduces additional cache misses
• Normalized cache leakage power =
Active ratio (fraction of cache that is powered on) +
(Counter overhead : Leak) × activity +
(L2 access energy : Leak) × num-misses
• Increased execution time (< 0.7%)
• L2 access/leakage ratio is ~9
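The normalized leakage expression above can be evaluated directly. The sketch below is only an illustration: the function and parameter names are hypothetical, the activity and miss terms are assumed to be normalized so that each product is dimensionless, and every input value except the L2 access/leakage ratio of 9 is made up.

```python
def normalized_leakage(active_ratio, counter_to_leak, counter_activity,
                       l2_to_leak, extra_misses):
    """Sum of the three terms of the normalized cache leakage power.

    A result below 1.0 means decay saves energy overall relative to a
    cache that is always powered on.
    """
    return (active_ratio
            + counter_to_leak * counter_activity
            + l2_to_leak * extra_misses)


# Illustrative inputs only (not measurements), except l2_to_leak = 9.
value = normalized_leakage(active_ratio=0.5,
                           counter_to_leak=0.01,
                           counter_activity=1.0,
                           l2_to_leak=9.0,
                           extra_misses=0.001)
print(round(value, 3))
```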
Skier’s Dilemma
New skis: $400
Ski rentals: $20
Heuristic: Buy skis after rental cost = purchase price
Ski trips:    5      10     15     20     25     50
Optimal:      $100   $200   $300   $400   $400   $400
Heuristic:    $100   $200   $300   $800   $800   $800
Likewise, decay a cache line when the cost of an
additional miss equals leakage dissipated so far
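The cost rows of the skier's dilemma can be reproduced with a short sketch. The function names are hypothetical, and the code assumes the purchase price is a whole multiple of the rental price and that the heuristic buys as soon as total rental spending reaches the purchase price.

```python
def optimal_cost(trips, rent=20, buy=400):
    """Cost with perfect knowledge of the number of trips."""
    return min(trips * rent, buy)


def heuristic_cost(trips, rent=20, buy=400):
    """Rent until rental spending equals the purchase price, then buy."""
    break_even_trips = buy // rent
    if trips < break_even_trips:
        return trips * rent
    return break_even_trips * rent + buy


for trips in (5, 10, 15, 20, 25, 50):
    print(trips, optimal_cost(trips), heuristic_cost(trips))
```

Under these assumptions the heuristic never pays more than twice the optimal cost, which is what makes it attractive when the number of trips (or the dead time of a cache line) is not known in advance.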
Tracking Dead Time
• Each line has a 2-bit counter that gets reset on
every access and gets incremented every 2500
cycles through a global signal (negligible overhead)
• After 10,000 clock cycles, the counter reaches
the max value and triggers a decay
• Adaptive decay: Start with a short decay period;
if you have a quick miss, double the period; if there
is no miss, halve the period
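The counter scheme and the adaptive policy can be sketched as follows. This is a toy model under stated assumptions: `DecayLine`, `adapt_period`, and the period bounds are hypothetical names and values, the line decays after four global ticks (4 × 2500 = 10,000 cycles), and the exact encoding of those ticks in a 2-bit counter is not modeled.

```python
class DecayLine:
    """One cache line with a per-line decay counter (toy model)."""

    def __init__(self, ticks_to_decay=4):
        self.ticks_to_decay = ticks_to_decay  # 4 ticks of 2500 cycles each
        self.ticks = 0
        self.powered = True

    def access(self):
        """Reset the counter; return True if the line had decayed,
        i.e. this access is a decay-induced miss."""
        decay_miss = not self.powered
        self.powered = True
        self.ticks = 0
        return decay_miss

    def global_tick(self):
        """Called once per global tick (every 2500 cycles)."""
        if not self.powered:
            return
        self.ticks += 1
        if self.ticks >= self.ticks_to_decay:
            self.powered = False  # gate Vdd: contents are lost


def adapt_period(period, quick_miss, min_period=1, max_period=64):
    """Double the decay period after a quick miss, otherwise halve it.
    The bounds are assumed values for illustration."""
    if quick_miss:
        return min(period * 2, max_period)
    return max(period // 2, min_period)


line = DecayLine()
for _ in range(4):
    line.global_tick()
print(line.powered)    # the line has decayed after four idle ticks
print(line.access())   # the next access is a decay-induced miss
```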
Results
Overheads
Other Results
• L2 cache is equally suitable for decay techniques:
lifetimes are scaled by a factor of 10, and an extra
miss also costs a lot more
• For their experiments, there is little interference
from multiprogramming
• Some instructions can easily be identified as
last touches to a cache block – potential for early
cache decay
• Can this apply to the branch predictor or the register file?