Power-Aware Compilation - University of Virginia


Power-Aware Compilation
CS 671
April 22, 2008
Why Worry about Power Dissipation?
• Battery life
• Thermal issues: affect cooling, packaging, reliability, timing
• Environment
CS 671 – Spring 2008
Power Dissipation Trends
[Figure: power density (W/cm², log scale from 1 to 1000) versus year (1980 to 2010) for the 386, 486, Pentium, Pentium Pro, Pentium 2, Pentium 3, Pentium 4, and Pentium 4 (Prescott), with Hot Plate and Nuclear Reactor shown as reference power densities.]
Cooking-Aware Computing
Intel vs. Duracell
[Figure: improvement compared to year 0 (1x to 16x) versus time (0 to 6 years) for processor (MIPS), hard disk (capacity), memory (capacity), and battery (energy stored).]
No Moore’s Law in batteries: 2-3%/year growth
Environment
• Environmental Protection Agency (EPA): computers account for 10% of commercial electricity consumption
  • Includes peripherals, possibly also manufacturing
• Data center growth was cited as a contributor to the 2000/2001 California Energy Crisis
• Equivalent power (at only 30% efficiency) is needed for air conditioning (AC)
• CFCs used for refrigeration
• Lap burn
• Fan noise
Where Does the Juice Go in Laptops?
Now We Know Why Power is Important
What can we do about it?
Two components to the problem:
• #1: Understand where and why power is dissipated
• #2: Think about ways to reduce it at all levels of the computing hierarchy
• In the past, #1 was difficult to accomplish except at the circuit level
• Consequently, most low-power efforts were circuit-related
Power: The Basics
Dynamic “switching” power vs. static “leakage” power
• Dynamic power dominates, but static power is increasing in importance
• Trends in each
Static power: steady, per-cycle energy cost
Dynamic power: capacitive and short-circuit
• Capacitive power: charging/discharging at transitions from 0→1 and 1→0
• Short-circuit power: power due to brief short-circuit current during transitions
• Most research focuses on capacitive power, but there is recent work on the others
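Power Issues in Microprocessors
[Figure: three panels. Capacitive (Dynamic) Power: circuit diagram labeled Vdd, Vin, Vout, and CL. Static (Leakage) Power: circuit diagram labeled Vin, Vout, CL, ISub, and IGate. Di/Dt (Vdd/Gnd Bounce): voltage (V) and current (A) traces over 20 cycles, with the minimum voltage marked. Temperature also appears as a label.]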
Capacitive Power Dissipation
Power ~ $\frac{1}{2} C V^2 A f$
• Capacitance (C): function of wire length, transistor size
• Supply voltage (V): has been dropping with successive fab generations
• Activity factor (A): how often, on average, do wires switch?
• Clock frequency (f): increasing…
Lowering Dynamic Power
Reducing Vdd has a quadratic effect
• However, it has a negative (~linear) effect on performance
Lowering CL
• May improve performance as well
• Keep transistors small (keeps intrinsic gate and diffusion capacitance small)
Reduce switching activity
• A function of signal transition statistics and clock rate
• Clock gating idle units
• Impacted by logic and architecture decisions
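As a rough illustration of the three levers above, the following sketch evaluates the capacitive power formula (½ C V² A f) at a baseline and then changes one factor at a time. The operating point (1 nF, 1.2 V, activity 0.1, 1 GHz) is made up for the example and does not describe any real processor.

```python
def dynamic_power(c_farads, vdd_volts, activity, freq_hz):
    """Capacitive switching power in watts: 1/2 * C * V^2 * A * f."""
    return 0.5 * c_farads * vdd_volts ** 2 * activity * freq_hz

# Hypothetical operating point (illustrative numbers only).
baseline = dynamic_power(1e-9, 1.2, 0.1, 1e9)

# Halving Vdd alone: quadratic effect, power drops to one quarter.
half_vdd = dynamic_power(1e-9, 0.6, 0.1, 1e9)

# Halving the activity factor alone (for example through clock gating):
# linear effect, power drops to one half.
half_activity = dynamic_power(1e-9, 1.2, 0.05, 1e9)

print(f"baseline:      {baseline:.4f} W")
print(f"half Vdd:      {half_vdd:.4f} W ({half_vdd / baseline:.2f}x)")
print(f"half activity: {half_activity:.4f} W ({half_activity / baseline:.2f}x)")
```

The sketch ignores the performance cost of a lower Vdd, which the slide notes is roughly linear.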
Power vs. Energy
Power consumption in watts
• Rate at which energy is consumed over time
• Determines battery life in hours
• Sets packaging limits
Energy efficiency in joules
• Energy = power * delay (joules = watts * seconds)
• A lower energy number means less power is needed to perform a computation at the same frequency
Power vs. Energy Metrics
Power-delay Product (PDP) = Pavg * t
• PDP is the average energy consumed per switching event
Energy-delay Product (EDP) = PDP * t
• Takes into account that one can trade increased delay for lower energy/operation
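A small worked example may help show why the two metrics can disagree. The two design points below are hypothetical; the functions simply apply the PDP and EDP definitions given above.

```python
def pdp(p_avg_watts, delay_s):
    """Power-delay product: energy in joules."""
    return p_avg_watts * delay_s

def edp(p_avg_watts, delay_s):
    """Energy-delay product: joule-seconds."""
    return pdp(p_avg_watts, delay_s) * delay_s

# Two hypothetical design points for the same computation: (Pavg in W, t in s).
fast = (2.0, 1.0)   # high power, short delay
slow = (0.4, 4.0)   # low power, long delay

for name, point in (("fast", fast), ("slow", slow)):
    print(f"{name}: PDP = {pdp(*point):.1f} J, EDP = {edp(*point):.1f} J*s")
```

Here the slow design wins on PDP (1.6 J against 2.0 J) but loses on EDP (6.4 J·s against 2.0 J·s), because EDP penalizes the extra delay that bought the energy saving.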
Low-Power Software Strategies
Code running on CPU
• Code optimizations for low power
Code accessing memory objects
• SW optimizations for memory
Data flowing on the buses
• I/O coding for low power
Compiler controlled power management
[Figure: CPU, Cache, Memory]
Code Optimizations for Low Power
High-level operations (e.g., a C statement) can be compiled into different instruction sequences
– different instructions & ordering have different power
Instruction Selection
• Select a minimum-power instruction mix for executing a piece of high level code
Instruction Packing & Dual Memory Loads
• Two on-chip memory banks
– Dual load vs. two single loads
– Almost 50% energy savings
Code Optimizations for Low Power
Reorder instructions to reduce switching effect at functional units and I/O buses
• Cold scheduling minimizes instruction bus transitions
Operand swapping
• Swap the operands at the input of multiplier
• Result is unaltered, but power changes significantly!
Other standard compiler optimizations
• Intermediate level: Software pipelining, dead code elimination, redundancy elimination
• Low level: Register allocation and other machine specific optimizations
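The idea behind cold scheduling can be sketched by counting bit flips between consecutive instruction words and reordering to reduce them. This is a simplified illustration, not the published algorithm: it assumes the instructions are mutually independent (so any order is legal), uses made-up 8-bit encodings, and applies a plain greedy nearest-neighbor ordering. A real scheduler would respect data dependencies.

```python
def hamming(a, b):
    """Number of bit positions in which two instruction words differ."""
    return bin(a ^ b).count("1")

def bus_transitions(words):
    """Total bit flips on the instruction bus when words are fetched in order."""
    return sum(hamming(x, y) for x, y in zip(words, words[1:]))

def greedy_cold_order(words):
    """Greedy reordering: after each word, fetch the remaining word that
    differs from it in the fewest bits. Assumes no dependencies."""
    remaining = list(words)
    order = [remaining.pop(0)]
    while remaining:
        nxt = min(remaining, key=lambda w: hamming(order[-1], w))
        remaining.remove(nxt)
        order.append(nxt)
    return order

# Hypothetical 8-bit instruction encodings.
block = [0b00000000, 0b11111111, 0b00000001, 0b11111110]
reordered = greedy_cold_order(block)

print("original order :", bus_transitions(block), "transitions")      # 23
print("greedy reorder :", bus_transitions(reordered), "transitions")  # 9
```

The same instructions execute in both cases; only the number of bus transitions changes.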
Code Optimizations for Low Power
Use processor-specific instruction styles
• On ARM, the default int type is ~20% more efficient than char or short, as the latter result in sign or zero extension
• On ARM, conditional instructions can be used instead of branches
ARM vs. THUMB
ARM – 32-bit instructions, requires fewer instructions
THUMB – 16-bit instructions, requires more instructions
Switching between ARM and THUMB modes takes time
Minimizing Memory Access Costs
Reduce memory accesses, make better use of registers
• Register access consumes less power than memory access
Easy way: minimize number of read/write operations
Cache optimizations
• Reorder memory accesses to improve cache hit rates
Can use existing techniques for high-performance code generation
Minimizing Memory Access Costs
Loop optimizations such as loop unrolling and loop fusion also reduce memory power consumption
More effective: explicitly target minimization of switching activity on I/O buses and exploit the memory hierarchy
• Data allocation to minimize I/O bus transitions
– map large arrays with known access patterns to main memory to minimize address bus transitions
– works in conjunction with coding of address buses
• Exploiting memory hierarchy
– organizing video and DSP data to maximize use of the higher levels (lower power) of the memory hierarchy
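To make the address-bus point concrete, the sketch below counts bit transitions for a sequential array walk under two address encodings. Gray code is used here as one example of address bus coding; the slides do not say which coding scheme is meant, so treat that choice as an assumption.

```python
def to_gray(n):
    """Binary-reflected Gray code of n."""
    return n ^ (n >> 1)

def transitions(seq):
    """Total bit flips between consecutive values driven on the bus."""
    return sum(bin(a ^ b).count("1") for a, b in zip(seq, seq[1:]))

# Sequential access to 16 consecutive addresses (a known access pattern).
addresses = list(range(16))

binary_flips = transitions(addresses)
gray_flips = transitions([to_gray(a) for a in addresses])

print("plain binary addresses:", binary_flips, "transitions")  # 26
print("Gray-coded addresses  :", gray_flips, "transitions")    # 15
```

The saving depends on the access pattern being sequential, which is why the data allocation step (placing arrays so that accesses are predictable) and the bus coding work together.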
Observation: Execution-time Variation
Significant variation in execution time of real-time tasks
But, variation is not random due to correlation in underlying signal (speech, sensor etc.)
[Figure: Speech CODEC Frame Decode Time. Decode time (usecs, 0 to 500) versus real time (msecs, 0 to 660).]
Observation: Applications Tolerant to Deadline Misses
E.g. sensor networks
Computation deadline misses lead to data loss
Packet loss common in wireless links
Significant probability of error in sensor signals
• noisy sensor channels
Applications designed to tolerate noisy/bad data by exploiting spatio-temporal redundancy
• high transient losses acceptable if localized in time or space
If the communication is noisy, and applications are loss tolerant, is it worthwhile to strive for perfect noise-free computing?
Exploiting Execution-time Variation and Tolerance to Deadline Misses
Idea: predict execution time of task instance and dynamically scale voltage so as to minimize shutdown
Execution time prediction
• learn distribution of execution times (pdf)
• provide hints
– MPEG decode can tell whether frame is I, P, or B
But, some deadlines are missed!
Adaptive control loop to keep missed deadlines < limit
Provides adaptive power-fidelity trade-off
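The scheme above can be sketched as a small simulation. Everything specific here is an assumption made for illustration: the predictor is simply the last observed execution time (not a learned distribution), the control loop adjusts a safety margin on that prediction, voltage is assumed to scale with speed so that energy per unit of work goes as speed squared, and the trace is a synthetic slowly varying signal rather than measured data.

```python
import math

def run_dvs(exec_times, deadline, miss_limit=0.05, margin=1.2, step=0.05):
    """Simulate predictive voltage scaling for a periodic task.

    exec_times: full-speed execution time of each task instance.
    Returns (miss_rate, relative_energy), where relative_energy is
    measured against always running at full speed.
    """
    predicted = exec_times[0]   # hypothetical predictor: last observed time
    misses = 0
    energy = 0.0
    full_speed_energy = 0.0
    for n, actual in enumerate(exec_times, start=1):
        # Slowest speed that still meets the deadline if the prediction holds.
        speed = min(1.0, max(0.1, predicted * margin / deadline))
        if actual / speed > deadline:
            misses += 1
        # Assumption: energy per unit of work scales with speed squared.
        energy += actual * speed ** 2
        full_speed_energy += actual
        # Adaptive control: widen the margin while the running miss rate
        # exceeds the limit, otherwise narrow it slowly.
        if misses / n > miss_limit:
            margin += step
        else:
            margin = max(1.0, margin - step / 10)
        predicted = actual
    return misses / len(exec_times), energy / full_speed_energy

# Synthetic, correlated execution times (usecs) with a 400 usec deadline.
trace = [100 + 50 * math.sin(i / 10) for i in range(200)]
miss_rate, relative_energy = run_dvs(trace, deadline=400)
print(f"miss rate: {miss_rate:.3f}, energy vs. full speed: {relative_energy:.3f}")
```

Raising `miss_limit` lets the margin shrink further and saves more energy at the cost of more missed deadlines, which is the power-fidelity trade-off named above.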
Compiler-Controlled DVFS
MICRO’05 – Princeton
Use compiler to find (predict) large regions where low frequency won’t hurt performance
Sensor Network Compilation
PLDI 2007 – University of Pittsburgh
Energy cost: 1 bit over the wire == 1000 executed instructions
Rework binary “patches” to minimize difference from original binary
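A back-of-the-envelope sketch shows why a smaller patch matters, taking the 1-bit-to-1000-instructions ratio above as an energy equivalence. The byte-wise difference count is a crude stand-in for a real patch format, and the three tiny "binaries" are invented for the example.

```python
INSTRUCTIONS_PER_BIT = 1000  # ratio quoted above

def patch_bytes(old, new):
    """Crude patch size: byte positions that differ, plus any length change."""
    differing = sum(a != b for a, b in zip(old, new))
    return differing + abs(len(old) - len(new))

def transmit_cost(n_bytes):
    """Cost of sending n_bytes, in executed-instruction equivalents."""
    return n_bytes * 8 * INSTRUCTIONS_PER_BIT

# Hypothetical deployed binary and two candidate recompilations of an update.
old = bytes([0x10, 0x20, 0x30, 0x40, 0x50, 0x60])
shuffled = bytes([0x11, 0x22, 0x33, 0x44, 0x50, 0x60])  # many bytes changed
stable = bytes([0x10, 0x20, 0x31, 0x40, 0x50, 0x60])    # earlier choices kept

for name, candidate in (("shuffled", shuffled), ("stable", stable)):
    size = patch_bytes(old, candidate)
    print(f"{name}: {size} patch bytes, ~{transmit_cost(size)} instruction equivalents")
```

Under this ratio, each byte kept out of the patch is worth 8000 executed instructions, so slightly slower code that stays close to the original binary can still save energy overall.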
Power-Aware Compilation
Not all optimizations target performance
Power-aware optimizations are
• Most important on embedded systems
• Most effective on VLIW architectures
• Still present primarily in the research community
It’s important to rethink many of our notions of “optimization”