Transcript Talk Slides
Energy Conservation and Adaptation
in
Mobile and Embedded Systems
Faculty: Kang G. Shin
Grad students: Babu Pillai
Hai Huang
Sabbatical leave from
Real-Time Computing Laboratory
University of Michigan
http://www.eecs.umich.edu/~kgshin
My Current Thrust Areas
Wired and Wireless QoS Networking
Internet QoS with DiffServ, MPLS, overlay networks
Dependable real-time protocols
Wireless QoS and security protocols
Embedded Systems
Model-based integration of application SW
Modeling and integration of OS and network services
Energy-aware real-time OS and applications
Embedded systems networks
Internet Servers and Adaptware
Workload differentiation and protection
Overload detection and avoidance
DDoS attacks
Outline
Motivation
Energy-efficient real-time OSs
Real-time dynamic voltage scaling
Energy-aware QoS
Conclusions
Motivation
Handheld and mobile comp and comm devices
are everywhere!
Increasingly complex SW and faster HW demand
more energy
Rapid increases in HW complexity, speed, and
power consumption, but battery technology is not
keeping up
Need to conserve energy, improve computational
efficiency via OS on power-constrained systems
Real-Time & Energy Constraints
Many power-constrained embedded or mobile
systems have real-time tasks
Time-critical computations, typically periodic
Need to provide guarantees for meeting deadlines
Available stored energy fundamentally limits
system runtime
Need to use energy efficiently, allocate to the
most critical or desirable computations, while
meeting real-time requirements
Real-Time Task Characteristics
Typically well-defined task set
Canonical real-time task, Ti:
Is periodic, with period Pi
Has worst-case execution time (WCET), Ci
Has relative deadline, di typically equal to Pi
Periodic model can accommodate aperiodic and
sporadic tasks
Schedulability of RT systems well-studied
Energy-Efficient RTOS
Reduce overhead services =>
lower computational overhead
=>lower CPU power consumption
Optimized IPC for periodic RT tasks
Combined Static Dynamic (CSD) scheduling
Protocol stack layer-bypassing
Eliminate naming services
Reported at ACM SOSP’99, NOSSDAV’00, USENIX’02
Exploit HW mechanisms, e.g., voltage scaling of
CPU, power management of memory subsystem
Memory Power Management
Goal: reduce power dissipation for memory access
Main memory consists of multiple devices, each
with independently-controlled power states
Switch devices not needed for current task to lowpower states
Modify page allocation to reduce number of
devices in use by each task
59-94% memory power reduction with RDRAM
Will present at USENIX’03
RT-DVS
Goal: reduce per-cycle CPU energy costs
Reducing frequency permits lower voltage
Lower voltage on CPU to obtain V2 savings per
cycle
Frequency change affects execution time,
disrupts RT scheduling
Have developed energy-conserving algs for DVS
that preserve RT guarantees (ACM SOSP’01)
CPU Operating Voltage
Predominant device technology is CMOS
E V 2
Maximum gate delays inversely related to
voltage
Can reduce unit computation energy by
reducing frequency and voltage
Dynamic Voltage Scaling (DVS)
[Weiser+94]
busy system increase frequency
idle system reduce frequency
Many algorithms for non-RT applications
Software adjustable PLL, voltage regulator
often
available, but intended for other things
XScale, SpeedStep, PowerNow!, Crusoe
Our Approach
Development of real-time DVS (RT-DVS)
DVS
algorithms that maintain RT guarantees
Simple enough for online scheduling
Work closely with existing RT sched algs.
In-depth simulation
Implementation in a real, working system
Static Voltage Scaling
Earliest-Deadline-First (EDF)
Worst-case utilization: ui = Ci / Pi
Frequency selection: ui f / fmax
utilization
1.00
0.75
0.50
0.25
0.00
u3= 0.4
u2= 0.2
u1= 0.3
f = 0.9*fmax
Simulation
Parameterized simulation
Synthetic random real-time task sets
uniform distribution of short (1-10), medium (10100), long (100-1000ms) periods
Vary computation time distributions
Vary hardware specifications
Compare different scheduling algorithms
and theoretical bounds.
Simulation Setup
Input: task set, system parameters, sched alg
Output: energy consumption of each alg
System parameters:
list of freqs and voltages
actual fraction of WCET for each task
idle level (idle vs. normal op energy
consumed)
Theoretical lower bounds:
task execution thruput only w/o timing issues
Simulation Results
8 tasks in a task set
100 random task sets
Workload = total worstcase utilization
3 freq./volt. settings:
5V, 1.0*fmax
4V, 0.75*fmax
3V, 0.5*fmax
1
Normalized Energy
Static
0.8
0.6
0.4
0.2
0
0.1
0.3
0.5
Workload
0.7
0.9
Dynamic Scaling
utilization
If each task uses less than WCET, we
can use lower frequency during its
invocation interval
1.00
0.75
0.50
0.25
0.00
u3= 0.2
f = 0.7*fmax
u3= 0.4
(k-1)P3
kP3
u2= 0.2 f = 0.9*fmax
u1= 0.3
time
Cycle-Conserving EDF
= fmax* ui
at Ti release set ui = Ci / Pi
at Ti finish set ui = actual execution time / Pi
f / fmax
fdesired
1.00
0.75
0.50
0.25
0.00
U=0.9
U=0.7
U=0.8 U=0.7
U=0.7
U=0.6
U=0.6
U=0.5
C1/3 C /2
2
C3/2
D1
u1 = 0.3
u2 = 0.2
u3 = 0.4
C1/3
D2
D3
time
Task 1
Task 2
Task 3
Simulation Results
8 tasks in a set
70% WCET each
invocation
3 freq./volt.
settings:
5V, 1.0*fmax
4V, 0.75*fmax
3V, 0.5*fmax
1
Normalized Energy
Static
ccEDF
0.8
0.6
0.4
0.2
0
0.1
0.3
0.5
Workload
0.7
0.9
Proactive Techniques
Tasks typically use much less than WCETs
Proactively reduce frequencies
Look ahead to meet future deadlines
Consider all tasks together
Look-Ahead EDF
Minimize current
frequency
Trade current savings for
potential future loss
Plan to defer work
beyond next immediate
deadline
Ensure future deadlines
with “reservation”
Task 1
Task 2
Task 3
D1
D2
D3
time
D1
D2
D3
time
D1
D2
D3 D1 time
Simulation Results
8 tasks in a set
70% WCET each
invocation
3 freq./volt.
settings:
5V, 1.0*fmax
4V, 0.75*fmax
3V, 0.5*fmax
1
Normalized Energy
Static
ccEDF
laEDF
0.8
0.6
0.4
0.2
0
0.1
0.3
0.5
Workload
0.7
0.9
More Simulation Results
Number of tasks is not important
Voltage and frequency settings greatly
affect performance
Look-ahead does not always perform best
Algorithms can perform close to
theoretical lower bound
Implementation
PC notebook computer
AMD K6-2+ processor, 550 MHz
PowerNow! Technology:
frequency can be changed 200-550MHz in 50MHz
increments
voltage selection 1.4V or 2.0V
empirical mapping between voltage and frequency
switching overheads: 0.4ms (voltage), 41 micros
(freq)
Processing 20W, screen backlight 7W
Software Architecture
Real-time extension to Linux 2.2
Modular design, plug-in schedulers
RT Task Set
PowerNow
module
Real-Time
module
Linux 2.2 Kernel
Hardware
Scheduler
module
Measurements
30
25
20
EDF
ccEDF
laEDF
Simulated Watts
Use oscilloscope with current probe
Synthetic RT workload w/ backlighting off
Similar to simulation results (20-40% savings)
Measured Watts
15
10
5
0
0.2
0.4
0.6
Workload
0.8
0.95
4
3.5
3
2.5
2
1.5
1
0.5
0
EDF
ccEDF
laEDF
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
Workload
Power Measurement
Interesting Observations
The very first invocation of a task may
overrun its WCET due to ‘cold’ processor
OS states
Dynamic addition of a task may cause
transient deadline misses, especially with
more aggressive schemes
Insert the task immediately, but release it after
completion of current invocations of all existing tasks
Related work
Most are loosely-coupled with OS and based on
avg processor utilization
Many non-RT DVS papers (esp., UCB)
Offline WCET analysis + online heuristic
Computation time probability heuristics
[Krishna+00], [Swaminathan+00]
[Gruian01]
Compiler-based, application-level DVS
[Mosse+00]
RT-DVS Discussion
Designed and evaluated 5 DVS algorithms for realtime systems (ACM SOSP’01)
Provide deadline guarantees while scaling
frequency and voltage
Simple enough to be used as online schedulers
Excellent energy savings, comparable to non-RT DVS
Implemented in real system on top of Linux
Code available at: http://kabru.eecs.umich.edu/rtos/
But, we need a larger energy framework
Limits of Low-level Techniques
DVS & processor halt conserve energy only
when extra capacity available
No general guidelines on how to make best use
of limited energy
Cannot provide more energy and runtime to
more critical or valuable tasks
Need to adapt app workload to maximize
system gains or utility of computation
Example
A remote surveillance device transmits
compressed video and audio
Solar-powered, but must run overnight
3 real-time tasks:
Radio transmitter (critical): constant bit rate
Video codec (degradable):
-high quality (30 fps, 640x480) MPEG4
-low quality (10 fps, 160x120) MPEG1
Audio codec (noncritical): mp3, either ON or OFF
Example, cont’d
Adapt task set based on power consumption of
tasks, available energy, hours until daylight, and
relative value of the tasks, e.g.,
During daytime or high battery levels:
radio, video at high quality, audio ON
Low battery at night: radio, video low quality, audio ON
Energy is critically low: radio, video low quality, audio OFF
Dynamic adaptation needed in general, as battery
levels and time until daylight are variable
EQoS
Need to maximize benefits gained from energy
spent, but HOW?
=> Energy-aware Quality of Service (EQoS):
Vary per-task QoS, which directly affects the task’s
utility and energy consumption
Select a set of task QoS levels to maximize total
system utility over given runtime
Cast selection into tractable, optimization problem
EQoS Design
EQoS design goals:
Leverage low-level halt, DVS
mechanisms
Meet system runtime goal
Maximize benefits of task
execution
Need methods of changing
task QoS levels and specifying
benefits and energy requirements
How to change per-task QoS?
Adopt RT fault-tolerance techniques:
Period extension
Imprecise computation
Apply different algorithms or CODECs
Omission
Degraded service requires less energy
For EQoS, need to specify set of QoS levels and
their average required energy for each task
Utility
Abstract notion of value from executing tasks
Need to specify utility for each QoS level of a
task:
Increasing Rewards for Increasing Service (IRIS)
Performance Index (PI) for control applications
Perceived quality metrics for multimedia
Actual specification flexible to types of
applications and systems designed
EQoS Algorithms
Given:
Select a QoS level for each task to:
tasks with QoS levels defined, energy required, and utility gained
for each level
remaining system energy
desired runtime, or known time until recharge
achieve desired runtime
maximize total utility
This can be formulated as a MCKP
Each task as a category, and the set of its QoS levels as items in
the category
Knapsack size = power budget
Item values and weights = utility rates and power consumption
MCKP vs. EQoS Problem
EQoS MC Knapsack Problem
...
Task
Category Item 1
Value
Energy Weight
Task
Category Item 2
...
Value
Energy Weight
Task
Category Item n
Value
Energy Weight
Value
Energy Weight
...
Value
Energy Weight
...
Value
Energy Weight
Energy Weight Limit
Runtime Constraint
Optimal Algorithms
NP-hard: all KP can be expressed as MCKP
Exponential Search - O(mn)
...
Branch-and-Bound (BB)
Need
fast bound computation
Can use LMCKP as upper bound
May still require exponential time
Optimal Algorithms, cont’d
Dynamic Programming (DP)
time, O(mnk)
Partial solutions for 1, 2, …, n tasks for all
possible power budgets (energy/runtime)
Pseudo-polynomial
Heuristics
Linear:
Use
LMCKP solution, as with BB bound
Drop fractional part
Greedy:
Start
with same approach as LMCKP
Continue selecting smaller upgrades
O(nm) overhead, without upgrades
sorting
Simulation
Permits exploring a large space multidimensional task set space
Simulate various hardware configurations, RT
scheduling, DVS mechanisms
Static RM, Static EDF, ccRM, ccEDF, laEDF
Generated 1000 random task sets, each with 10
tasks, and each of which has up to 5 QoS levels
QoS degradation models period extension, imprecise
computation, algorithmic change
Simulation Results
EQoS algorithms achieve desired runtime
DVS conserves extra energy, throws off
estimated runtime
Greedy
Linear
Max
Min
Opt
1.2
2
Normalized Utility
Normalized Runtime
1
1.5
1
0.5
0.8
0.6
0.4
0.2
0
0.00E+00
2.00E+09
4.00E+09
6.00E+09
Initial Energy
8.00E+09
0
min
linear
greedy
opti
max
Simulation Results - DVS
DVS increases energy efficiency
Throws off adaptation -- extends runtime
3 volt/freq:
5V,
1.0*fmax
4V, .75*fmax
3V, .35*fmax
Linear
2.00E+09
4.00E+09
Max
Min
Opt
3.5
Normalized Runtime
Greedy
3
2.5
2
1.5
1
0.5
0
0.00E+00
6.00E+09
Initial Energy
8.00E+09
Simulation Results, cont’d
DVS compensation achieves desired runtime
with higher utility
Greedy
Linear
Max
Min
Opt
3
DVS
2
DVS-comp
Normalized Utility
Normalized Runtime
2.5
1.5
1
0.5
2
1.5
1
0.5
0
0.00E+00
2.00E+09
4.00E+09
6.00E+09
8.00E+09
0
min
linear
greedy
opti
max
Initial Energy
Adaptation w/ DVS Compensation
Utility comparison between DVS
compensation and w/o compensation
Implementation
Implemented on Linux 2.2
Periodic real-time support
PowerNow! driver
Real-time scheduler modules
EQoS adaptation module
Battery monitoring module
Currently supports Athlon, Duron, K6-2 processors that
implement AMD’s PowerNow! Technology
Experiments
Measurements on a Compaq Presario 1200Z
Implement RT version of Lame MP3 encoder
use quality parameter to vary QoS
multiple concurrent instances
Results follow trend observed in simulations
Conclusions
RT-DVS provides low-level CPU voltage control
Maintains timing guarantees for RT tasks
Significant energy savings, comparable to non-RT DVS
EQoS provides task adaptation in energyconstrained real-time systems
Provides guidelines to best utilize available energy
among tasks
Frame energy adaptation as a tractable problem
Heuristics work nearly as well as optimal algorithms in
practice