University of Connecticut
School of Engineering
Department of Electrical & Computer Engineering
Wattch: A Framework for Architecture-Level Power Analysis and Optimizations
Authors: D. Brooks, V. Tiwari, and M. Martonosi
Reviewer: Junxia Ma
May 1st, 2008
1
Contents
Overview of this Work
Power Modeling Methodology
Model Validation
Case Studies
Conclusions
2
Overview: motivation
Power is increasingly important in
modern processors
Power/performance tradeoffs need to be
made more visible
Circuit level power analysis tools are
slow and late in the design process
3
Overview: contribution
Wattch: a framework for analyzing and
optimizing CPU power consumption at
the architecture level;
Runs 1000X or more faster than
layout-level power tools;
Maintains accuracy within 10% of
estimates from layout-level power tools;
Based on parameterized power models
combined with per-cycle resource usage counts
4
Overview: contribution
Scenario A: Microarchitectural tradeoffs
App -> Binary -> SimPower (Config 1) -> Watts-1
App -> Binary -> SimPower (Config 2) -> Watts-2
Scenario B: Compiler Optimizations
App -> Binary 1 -> SimPower (Common Config) -> Watts-1
App -> Binary 2 -> SimPower (Common Config) -> Watts-2
5
Overview: contribution
Scenario C: Hardware Optimizations
Baseline: App -> Binary -> SimPower (Config 1) -> Watts-1
Additional hardware?
Array structure? Use current models
Custom structure? Estimate power of structure
Then: SimPower -> Watts-2
6
Power Modeling Methodology-1
Overall Structure of the Power Simulator (figure)
Inputs: Hardware Config, Binary
Cycle-Level Performance Simulator -> Performance Estimate
Cycle-Level Performance Simulator -> Cycle-by-Cycle Hardware Access Counts -> Parameterizable Power Models -> Power Estimate
Slide annotation: Foundation of this work
7
Power Modeling Methodology-2
Main processor units:
Array Structures
Fully Associative Content-Addressable Memories
Combinational Logic and Wires
Clocking
Dynamic power: $P_d = C V_{dd}^2 a f$
where C is the load capacitance, V_dd the supply voltage, a the switching activity factor, and f the clock frequency
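As a rough illustration of the dynamic power equation on this slide, the following Python sketch evaluates P_d = C * V_dd^2 * a * f. The function name and the numeric values are hypothetical, chosen only for illustration; they are not taken from the Wattch paper.

```python
def dynamic_power(load_cap_farads, vdd_volts, activity, freq_hz):
    """Return dynamic power in watts: P_d = C * Vdd^2 * a * f."""
    if not 0.0 <= activity <= 1.0:
        raise ValueError("activity factor must be between 0 and 1")
    return load_cap_farads * vdd_volts ** 2 * activity * freq_hz


# Hypothetical example: 1 nF of switched capacitance, a 2.0 V supply,
# an activity factor of 0.5, and a 600 MHz clock.
power_watts = dynamic_power(1e-9, 2.0, 0.5, 600e6)
print(f"{power_watts:.2f} W")
```

Because the supply voltage enters quadratically, halving V_dd in this sketch cuts the result to a quarter, while halving the frequency or activity factor only halves it.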
8
Power Modeling Methodology-3
Equations for Capacitance of Critical Nodes
9
Array Structure
(Figure: array structure schematic. Labels: Precharge; Wordline Driver, fed from the Decoder; Cell Access Transistors; Number of Bitlines; Number of Wordlines; bitlines routed to Sense Amps)
10
CAM Structure
Key sizing parameters in this CAM:
The issue/commit width of the machine (W)
The instruction window size (impacts CAM’s height)
Physical register tag size (impacts CAM’s width)
11
Complex Logic Blocks
Two larger complex logic blocks
considered:
i): instruction selection logic;
ii): dependency check logic
For result buses: model power
consumption by estimating the
bus length
For ALUs: scale based on previous
research results
12
Clocking
The clocking network can be the most
significant source of power consumption
Sources of clock power consumption:
i) Global clock metal lines
ii) Global clock buffers
iii) Clock loading
13
Common CPU hardware
structures and model type used
Uses SimpleScalar’s hardware configuration parameters as inputs
14
SimpleScalar Interface
The power models are interfaced with
SimpleScalar, which
keeps track of which units are accessed per cycle
records the total energy consumed for an application
SimpleScalar provides a simulation environment for
out-of-order processors with 5-stage pipelines.
Speculative execution is supported
This work extended SimpleScalar to provide a variable number
of additional pipeline stages between fetch and issue.
Assumes a 7-cycle misprediction penalty
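The per-cycle bookkeeping described on this slide can be sketched roughly as follows. This is a minimal illustration, not the SimpleScalar or Wattch code: the unit names, the per-access energies, and the `total_energy_nj` helper are all hypothetical placeholders.

```python
from collections import Counter

# Hypothetical per-access energies in nanojoules; placeholders, not Wattch values.
ENERGY_PER_ACCESS_NJ = {"icache": 0.5, "regfile": 0.25, "alu": 0.25}


def total_energy_nj(access_trace):
    """Sum energy over a trace of per-cycle access counts.

    access_trace is a list with one dict per cycle, mapping a unit name
    to the number of accesses made to that unit in that cycle.
    Returns (total energy, per-unit energy breakdown).
    """
    totals = Counter()
    for cycle_accesses in access_trace:
        for unit, count in cycle_accesses.items():
            totals[unit] += count * ENERGY_PER_ACCESS_NJ[unit]
    return sum(totals.values()), dict(totals)


# Two simulated cycles with made-up access counts.
trace = [
    {"icache": 1, "regfile": 2, "alu": 1},
    {"icache": 1, "regfile": 0, "alu": 2},
]
energy, per_unit = total_energy_nj(trace)
print(energy, per_unit)
```

Keeping a per-unit breakdown alongside the total mirrors the kind of structure-by-structure power reporting shown later in the validation slides.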
15
Conditional Clocking Styles
Power consumption of benchmarks with conditional clocking
on multi-ported hardware
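The slide names conditional clocking styles without defining them. The sketch below assumes four illustrative policies for a multi-ported unit: always full power, full power only when any port is used, power scaled linearly with port usage, and linear scaling with a residual idle fraction. The style names, the 10% idle fraction, and the function itself are assumptions made for illustration and may not match the definitions used in the paper.

```python
def unit_power(max_power, ports_used, total_ports, style, idle_fraction=0.1):
    """Estimate one unit's power for a cycle under an assumed clocking style."""
    if not 0 <= ports_used <= total_ports:
        raise ValueError("ports_used must be between 0 and total_ports")
    if style == "always_on":
        # No clock gating: the unit always draws its maximum power.
        return max_power
    if style == "all_or_nothing":
        # Full power if any port is active, zero otherwise.
        return max_power if ports_used > 0 else 0.0
    if style == "linear":
        # Power scales with the fraction of ports in use.
        return max_power * ports_used / total_ports
    if style == "linear_with_idle":
        # Like "linear", but an unused unit still draws a residual fraction.
        if ports_used == 0:
            return idle_fraction * max_power
        return max_power * ports_used / total_ports
    raise ValueError(f"unknown clocking style: {style}")


# Hypothetical 4-port unit with an 8 W maximum, one port in use.
for style in ("always_on", "all_or_nothing", "linear", "linear_with_idle"):
    print(style, unit_power(8.0, 1, 4, style))
```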
16
Simulation Speed
For lower-level tools, running PowerMill
on a 64-bit adder for 100 test vectors
takes ~1 hr
In the same amount of time, Wattch can
simulate a full CPU running roughly 280M
SimpleScalar instructions and generate
both power and performance estimates!
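For a sense of scale, the figures on this slide can be turned into a simulation rate. The short calculation below uses only the slide's numbers (roughly 280M instructions in about one hour); the variable names are illustrative.

```python
# Figures quoted on the slide: ~280M instructions simulated in ~1 hour.
instructions_simulated = 280e6
seconds_per_hour = 3600

rate = instructions_simulated / seconds_per_hour
print(f"about {rate / 1e3:.0f}K simulated instructions per second")
```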
17
Model Validation-1
Three methods of validation
Validation 1: Model Capacitance vs. Physical Schematics
Total capacitance values are within 6~11%
18
Model Validation-2
Validation 2: Relative power consumption by structure
The clock power model used by
Wattch is based on H-tree style
which was used in Alpha 21264,
not in Intel processors
Comparison for Pentium Pro
Comparison for Alpha 21264
19
Model Validation-3
Validation 3: Max power consumption for three CPUs
Maximum power, modeled vs reported
On average 30% lower than
reported, reflecting systematic
underestimation
Configuration of Processors
20
Validation Summary
For capacitance estimates: ~10%
(validation 1)
Relative accuracy: 10~13% (validation 2)
Limitations
Wattch does not model all of the miscellaneous logic in real
microprocessors
Different circuit design styles can lead to different results
Most up-to-date industrial fabrication data is unavailable
The model will be most accurate when
comparing CPUs of similar fabrication
technology
21
Case Studies
Baseline Configuration
Of Simulated Processor
Use SPECint95 and SPECfp95
benchmark suites;
Benchmarks are compiled
using Compaq Alpha cc
compiler
For each program simulate
200M instructions
Metrics used:
i) Power
ii) Performance
iii) Energy
iv) Energy-Delay Product
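The four metrics listed above relate to each other in a simple way: energy is average power times execution time, and the energy-delay product multiplies energy by execution time again. The sketch below shows this with hypothetical inputs; only the 200M instruction count comes from the slide, while the power, cycle count, clock frequency, and the `summarize` helper are made up for illustration.

```python
def summarize(avg_power_watts, cycles, instructions, clock_hz):
    """Derive performance, energy, and energy-delay product for one run."""
    delay_s = cycles / clock_hz              # execution time in seconds
    energy_j = avg_power_watts * delay_s     # energy = power * time
    return {
        "power_w": avg_power_watts,
        "ipc": instructions / cycles,        # performance metric
        "energy_j": energy_j,
        "edp_js": energy_j * delay_s,        # energy-delay product
    }


# Hypothetical run: 200M instructions in 250M cycles at 500 MHz, 20 W average.
result = summarize(20.0, 250e6, 200e6, 500e6)
print(result)
```

Because the energy-delay product weights execution time twice, a configuration that lowers power but lengthens the run can still come out worse on this metric.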
22
Case Study –
A Microarchitectural Exploration
(Figure panels:)
IPC for gcc
IPC for turb3d
Power for gcc
Power for turb3d
Energy-delay product for gcc
Energy-delay product for turb3d
23
Case Study –
Power Analysis of Loop Unrolling
Effect of loop unrolling on performance and power
24
Case Study –
Power Analysis of Loop Unrolling
Detailed breakdown of power dissipation
25
Conclusions
This paper presents a simulator framework for a wide
range of architectural and compiler evaluations;
Wattch has the benefit of low-level validation, which
can help researchers do power modeling at higher
abstraction levels;
Wattch can provide feedback to compilers on power
consumption, enabling power-aware compilers;
Design choices differ slightly when power
metrics are taken into account; Wattch is intended to
help explore these tradeoffs;
Wattch has limitations and needs improvement.
26