Single-ISA Heterogeneous
Multi-Core Architectures:
The Potential for Processor
Power Reduction
Rakesh Kumar, Keith I. Farkas,
Norman P. Jouppi, Parthasarathy
Ranganathan, Dean M. Tullsen
Presenter: Borys Bradel
1
Introduction
Different programs have different
requirements (e.g. ILP)
Extends to phases of a single program
Heterogeneous cores
Use core that matches the requirements
Reuse existing cores
Use multiple generations of the same
family of processors
2
Outline
Methodology
Experiments
Hardware
Assumptions
Power
Optimal – energy/energy delay product
Heuristic based – static/dynamic
Related Work
Conclusion
3
Single ISA Multi-Core Benefits
Small area overhead because of the
growth in core sizes between
generations
Clock frequencies of older cores would
scale with technology
A 1 GHz P3 performs roughly like a 1.4 GHz P4
Pipeline depth was increased precisely because
clock frequency could not otherwise scale
4
Hardware – Alpha Family
2 in-order cores
EV4 = 21064
EV5 = 21164
2 out-of-order cores
EV6 = 21264
EV8- = 21464 (multithreading support
removed)
5
Hardware Size
15% more area than
just using 21464
6
Assumptions
Can switch cores dynamically
Private L1 cache and common L2 cache
All cores use 0.10 micron technology
Single process executing on a single core at any one
time
2.1 GHz clock (the 600 MHz of the 0.35 micron 21264, scaled to 0.10 micron)
Input voltage 1.2V
Cores shut down when idle
1000 cycle restart cost (staged, phase-locked loop left
running)
150 ns memory access
Cache stall cycles derived from CACTI
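The 2.1 GHz figure follows from scaling the 21264's 600 MHz clock linearly with feature size. A minimal sketch of that arithmetic, assuming frequency scales inversely with feature size (the function name is hypothetical, not from the paper):

```python
def scale_frequency_mhz(base_mhz, base_feature_um, target_feature_um):
    """Scale a clock frequency, assuming it is inversely proportional to feature size."""
    return base_mhz * base_feature_um / target_feature_um


# 21264: 600 MHz at 0.35 micron, scaled to the 0.10 micron process assumed for all cores.
scaled_mhz = scale_frequency_mhz(600.0, 0.35, 0.10)

# The 1000 cycle restart cost expressed in microseconds at the scaled clock.
restart_us = 1000 / scaled_mhz

print(f"scaled clock: {scaled_mhz / 1000:.1f} GHz")
print(f"restart cost: {restart_us:.2f} us")
```

At this clock the restart cost is well under a microsecond, which is why switching cores every few million instructions is plausible.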
7
Core Configurations
8
Power Model
Use Wattch to account for activity based
dissipation
Use scaling and offset factors to account for
other factors
This hybrid model is closer to manufacturer’s
data points
Peak power: data sheet values minus the L2 cache and
output pin power
Typical power: scaled based on Intel chips
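One way to read "scaling and offset factors" is a linear correction applied to the Wattch output. The sketch below assumes that linear form and assumes the two factors are fitted so that the low and high ends of the Wattch range land on the typical and peak targets; all names and numbers are hypothetical, not taken from the paper.

```python
def fit_scale_and_offset(wattch_low_w, wattch_high_w, typical_w, peak_w):
    """Fit a line mapping the Wattch range onto the [typical, peak] targets (assumed form)."""
    scale = (peak_w - typical_w) / (wattch_high_w - wattch_low_w)
    offset_w = typical_w - scale * wattch_low_w
    return scale, offset_w


def hybrid_power_w(wattch_power_w, scale, offset_w):
    """Activity-based Wattch estimate corrected by a per-core scale and offset."""
    return scale * wattch_power_w + offset_w


# Hypothetical core: Wattch reports 20 W to 60 W, targets are 10 W typical and 40 W peak.
scale, offset_w = fit_scale_and_offset(20.0, 60.0, 10.0, 40.0)
mid_activity_w = hybrid_power_w(40.0, scale, offset_w)
print(scale, offset_w, mid_activity_w)
```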
9
Power and Area Statistics
10
Performance Modeling
Use SMTSIM, a cycle-accurate simulator
SimPoint is used to identify a
representative region of each program
and how many instructions need to be
fast-forwarded to reach it
11
Varying Performance Ratio
12
Varying Energy Efficiency Ratio
13
Oracle Switching for Energy
Performance always within 10% of EV8-
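The oracle can be sketched as a per-interval choice: among the cores whose performance stays within 10% of EV8-, pick the one that uses the least energy. This is a minimal sketch that ignores switching cost; the IPC and energy numbers are made up for illustration and are not results from the paper.

```python
def pick_core_for_energy(interval, reference="EV8-", max_loss=0.10):
    """Return the lowest-energy core whose IPC is within max_loss of the reference core."""
    min_ipc = (1.0 - max_loss) * interval[reference]["ipc"]
    eligible = {name: m for name, m in interval.items() if m["ipc"] >= min_ipc}
    return min(eligible, key=lambda name: eligible[name]["energy"])


# Hypothetical per-interval measurements (IPC, energy in arbitrary units).
trace = [
    {  # high-ILP phase: only the out-of-order cores keep up
        "EV4": {"ipc": 0.6, "energy": 1.0},
        "EV5": {"ipc": 1.0, "energy": 2.0},
        "EV6": {"ipc": 1.9, "energy": 4.0},
        "EV8-": {"ipc": 2.0, "energy": 10.0},
    },
    {  # memory-bound phase: a simple core is nearly as fast
        "EV4": {"ipc": 0.40, "energy": 0.8},
        "EV5": {"ipc": 0.47, "energy": 1.5},
        "EV6": {"ipc": 0.49, "energy": 3.0},
        "EV8-": {"ipc": 0.50, "energy": 8.0},
    },
]

schedule = [pick_core_for_energy(interval) for interval in trace]
print(schedule)
```

With these made-up numbers the oracle runs the first phase on EV6 and drops to EV5 for the memory-bound phase.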
14
Oracle Switching for Energy
15
Oracle Switching for Energy
Delay Product
Performance always within 50% of EV8-
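The energy-delay product multiplies energy by execution time, so it penalizes slow cores more than energy alone does. A minimal sketch, reading "within 50%" as "at least half the performance of EV8-" (an assumption) and using made-up power and time values:

```python
def energy_delay_product(power_w, time_s):
    """Energy (power * time) multiplied by delay (time)."""
    energy_j = power_w * time_s
    return energy_j * time_s


def pick_core_for_edp(interval, reference="EV8-", min_relative_perf=0.50):
    """Return the core with the lowest energy-delay product that meets the performance floor."""
    max_time_s = interval[reference]["time_s"] / min_relative_perf
    eligible = [name for name, m in interval.items() if m["time_s"] <= max_time_s]
    return min(
        eligible,
        key=lambda name: energy_delay_product(
            interval[name]["power_w"], interval[name]["time_s"]
        ),
    )


# Hypothetical measurements for one interval.
interval = {
    "EV4": {"power_w": 2.0, "time_s": 3.0},   # too slow: excluded by the 50% floor
    "EV5": {"power_w": 5.0, "time_s": 1.9},
    "EV6": {"power_w": 10.0, "time_s": 1.2},
    "EV8-": {"power_w": 40.0, "time_s": 1.0},
}

best = pick_core_for_edp(interval)
print(best)
```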
16
Oracle Switching for Energy
Delay Product
17
Others
Voltage/frequency scaling – not as good
Static core selection
only EV6 and EV8- are used
Dynamic heuristic
Running average performance within 10%
Every 100 time intervals (100 million
instructions), cores are sampled for 5
intervals each
Select best core based on sampling
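The sampling loop above can be sketched as follows. This is a simplified sketch: it samples every core (an assumption about which cores are tried), uses a caller-supplied `edp_of` callback as a hypothetical stand-in for the measured metric, and does not model the running-average performance bound.

```python
SAMPLE_PERIOD = 100  # intervals run on the chosen core between sampling phases
SAMPLE_LENGTH = 5    # intervals each candidate core is sampled for


def dynamic_schedule(num_intervals, cores, edp_of, start_core):
    """Return the core used in each interval under periodic sampling.

    edp_of(core, interval_index) is a hypothetical callback returning the
    metric to minimize (for example energy-delay product) for that interval.
    """
    schedule = []
    current = start_core
    while len(schedule) < num_intervals:
        # Steady phase: stay on the currently selected core.
        schedule.extend([current] * SAMPLE_PERIOD)
        # Sampling phase: try each core for a few intervals and total its metric.
        totals = {}
        for core in cores:
            total = 0.0
            for _ in range(SAMPLE_LENGTH):
                total += edp_of(core, len(schedule))
                schedule.append(core)
            totals[core] = total
        # Select the best core based on the samples.
        current = min(totals, key=totals.get)
    return schedule[:num_intervals]


# Toy metric: fixed per-core values, made up for illustration.
toy_edp = {"EV4": 3.0, "EV5": 2.0, "EV6": 1.0, "EV8-": 4.0}
plan = dynamic_schedule(
    250, list(toy_edp), lambda core, i: toy_edp[core], start_core="EV8-"
)
print(plan[0], plan[100], plan[120], plan[249])
```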
18
Results for Heuristics
19
Results for Heuristics/Static Core
20
Related Work
Gating based power optimization
Cannot gate at a fine enough granularity
May still have leakage
This could be thought of as gating to
reduce capabilities of different units
Voltage and frequency scaling
Chip wide – one size does not fit all
Fine grained – granularity problems
21
Conclusions
Heterogeneous multi-core architectures
reduce the energy-delay product
Using several cores from the same
family is good
More fine grained than other approaches
Reduces development/testing costs
Is it scalable?
Just use EV6??
22