Transcript Talk Slides

Energy Conservation and Adaptation
in
Mobile and Embedded Systems
Faculty: Kang G. Shin
Grad students: Babu Pillai
Hai Huang
Sabbatical leave from
Real-Time Computing Laboratory
University of Michigan
http://www.eecs.umich.edu/~kgshin
My Current Thrust Areas



Wired and Wireless QoS Networking
 Internet QoS with DiffServ, MPLS, overlay networks
 Dependable real-time protocols
 Wireless QoS and security protocols
Embedded Systems
 Model-based integration of application SW
 Modeling and integration of OS and network services
 Energy-aware real-time OS and applications
 Embedded systems networks
Internet Servers and Adaptware
 Workload differentiation and protection
 Overload detection and avoidance
 DDoS attacks
Outline
Motivation
 Energy-efficient real-time OSs
 Real-time dynamic voltage scaling
 Energy-aware QoS
 Conclusions

Motivation




Handheld and mobile comp and comm devices
are everywhere!
Increasingly complex SW and faster HW demand
more energy
Rapid increases in HW complexity, speed, and
power consumption, but battery technology is not
keeping up
Need to conserve energy, improve computational
efficiency via OS on power-constrained systems
Real-Time & Energy Constraints

Many power-constrained embedded or mobile
systems have real-time tasks
Time-critical computations, typically periodic
 Need to provide guarantees for meeting deadlines



Available stored energy fundamentally limits
system runtime
Need to use energy efficiently, allocate to the
most critical or desirable computations, while
meeting real-time requirements
Real-Time Task Characteristics


Typically well-defined task set
Canonical real-time task, Ti:
Is periodic, with period Pi
 Has worst-case execution time (WCET), Ci
 Has relative deadline, di typically equal to Pi



Periodic model can accommodate aperiodic and
sporadic tasks
Schedulability of RT systems well-studied
Energy-Efficient RTOS

Reduce overhead services =>
lower computational overhead
=>lower CPU power consumption
Optimized IPC for periodic RT tasks
 Combined Static Dynamic (CSD) scheduling
 Protocol stack layer-bypassing
 Eliminate naming services
 Reported at ACM SOSP’99, NOSSDAV’00, USENIX’02


Exploit HW mechanisms, e.g., voltage scaling of
CPU, power management of memory subsystem
Memory Power Management





Goal: reduce power dissipation for memory access
Main memory consists of multiple devices, each
with independently-controlled power states
Switch devices not needed for current task to lowpower states
Modify page allocation to reduce number of
devices in use by each task
59-94% memory power reduction with RDRAM

Will present at USENIX’03
RT-DVS





Goal: reduce per-cycle CPU energy costs
Reducing frequency permits lower voltage
Lower voltage on CPU to obtain V2 savings per
cycle
Frequency change affects execution time,
disrupts RT scheduling
Have developed energy-conserving algs for DVS
that preserve RT guarantees (ACM SOSP’01)
CPU Operating Voltage
Predominant device technology is CMOS
 E V 2
 Maximum gate delays inversely related to
voltage
 Can reduce unit computation energy by
reducing frequency and voltage

Dynamic Voltage Scaling (DVS)
[Weiser+94]
busy system  increase frequency
idle system  reduce frequency
 Many algorithms for non-RT applications
 Software adjustable PLL, voltage regulator

 often
available, but intended for other things
 XScale, SpeedStep, PowerNow!, Crusoe
Our Approach

Development of real-time DVS (RT-DVS)
 DVS
algorithms that maintain RT guarantees
 Simple enough for online scheduling
 Work closely with existing RT sched algs.
In-depth simulation
 Implementation in a real, working system

Static Voltage Scaling
Earliest-Deadline-First (EDF)
 Worst-case utilization: ui = Ci / Pi
 Frequency selection: ui  f / fmax

utilization
1.00
0.75
0.50
0.25
0.00
u3= 0.4
u2= 0.2
u1= 0.3
f = 0.9*fmax
Simulation
Parameterized simulation
 Synthetic random real-time task sets


uniform distribution of short (1-10), medium (10100), long (100-1000ms) periods
Vary computation time distributions
 Vary hardware specifications
 Compare different scheduling algorithms
and theoretical bounds.

Simulation Setup




Input: task set, system parameters, sched alg
Output: energy consumption of each alg
System parameters:
 list of freqs and voltages
 actual fraction of WCET for each task
 idle level (idle vs. normal op energy
consumed)
Theoretical lower bounds:
 task execution thruput only w/o timing issues
Simulation Results



8 tasks in a task set
100 random task sets
Workload = total worstcase utilization
3 freq./volt. settings:
 5V, 1.0*fmax
 4V, 0.75*fmax
 3V, 0.5*fmax
1
Normalized Energy

Static
0.8
0.6
0.4
0.2
0
0.1
0.3
0.5
Workload
0.7
0.9
Dynamic Scaling
utilization
If each task uses less than WCET, we
can use lower frequency during its
invocation interval
1.00
0.75
0.50
0.25
0.00
u3= 0.2
f = 0.7*fmax
u3= 0.4
(k-1)P3
kP3
u2= 0.2 f = 0.9*fmax
u1= 0.3
time
Cycle-Conserving EDF
= fmax* ui
at Ti release set ui = Ci / Pi
at Ti finish set ui = actual execution time / Pi


f / fmax
 fdesired
1.00
0.75
0.50
0.25
0.00
U=0.9
U=0.7
U=0.8 U=0.7
U=0.7
U=0.6
U=0.6
U=0.5
C1/3 C /2
2
C3/2
D1
u1 = 0.3
u2 = 0.2
u3 = 0.4
C1/3
D2
D3
time
Task 1
Task 2
Task 3
Simulation Results


8 tasks in a set
70% WCET each
invocation
3 freq./volt.
settings:
5V, 1.0*fmax
 4V, 0.75*fmax
 3V, 0.5*fmax

1
Normalized Energy

Static
ccEDF
0.8
0.6
0.4
0.2
0
0.1
0.3
0.5
Workload
0.7
0.9
Proactive Techniques
Tasks typically use much less than WCETs
 Proactively reduce frequencies
 Look ahead to meet future deadlines
 Consider all tasks together

Look-Ahead EDF




Minimize current
frequency
Trade current savings for
potential future loss
Plan to defer work
beyond next immediate
deadline
Ensure future deadlines
with “reservation”
Task 1
Task 2
Task 3
D1
D2
D3
time
D1
D2
D3
time
D1
D2
D3 D1 time
Simulation Results


8 tasks in a set
70% WCET each
invocation
3 freq./volt.
settings:
5V, 1.0*fmax
 4V, 0.75*fmax
 3V, 0.5*fmax

1
Normalized Energy

Static
ccEDF
laEDF
0.8
0.6
0.4
0.2
0
0.1
0.3
0.5
Workload
0.7
0.9
More Simulation Results
Number of tasks is not important
 Voltage and frequency settings greatly
affect performance
 Look-ahead does not always perform best
 Algorithms can perform close to
theoretical lower bound

Implementation



PC notebook computer
AMD K6-2+ processor, 550 MHz
PowerNow! Technology:
frequency can be changed 200-550MHz in 50MHz
increments
 voltage selection 1.4V or 2.0V
 empirical mapping between voltage and frequency
 switching overheads: 0.4ms (voltage), 41 micros
(freq)


Processing 20W, screen backlight 7W
Software Architecture
Real-time extension to Linux 2.2
 Modular design, plug-in schedulers

RT Task Set
PowerNow
module
Real-Time
module
Linux 2.2 Kernel
Hardware
Scheduler
module
Measurements

30
25
20
EDF
ccEDF
laEDF
Simulated Watts

Use oscilloscope with current probe
Synthetic RT workload w/ backlighting off
Similar to simulation results (20-40% savings)
Measured Watts

15
10
5
0
0.2
0.4
0.6
Workload
0.8
0.95
4
3.5
3
2.5
2
1.5
1
0.5
0
EDF
ccEDF
laEDF
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
Workload
Power Measurement
Interesting Observations
The very first invocation of a task may
overrun its WCET due to ‘cold’ processor
OS states
 Dynamic addition of a task may cause
transient deadline misses, especially with
more aggressive schemes


Insert the task immediately, but release it after
completion of current invocations of all existing tasks
Related work
Most are loosely-coupled with OS and based on
avg processor utilization
 Many non-RT DVS papers (esp., UCB)
 Offline WCET analysis + online heuristic


Computation time probability heuristics


[Krishna+00], [Swaminathan+00]
[Gruian01]
Compiler-based, application-level DVS

[Mosse+00]
RT-DVS Discussion

Designed and evaluated 5 DVS algorithms for realtime systems (ACM SOSP’01)
 Provide deadline guarantees while scaling
frequency and voltage
 Simple enough to be used as online schedulers
Excellent energy savings, comparable to non-RT DVS
Implemented in real system on top of Linux
Code available at: http://kabru.eecs.umich.edu/rtos/

But, we need a larger energy framework



Limits of Low-level Techniques




DVS & processor halt conserve energy only
when extra capacity available
No general guidelines on how to make best use
of limited energy
Cannot provide more energy and runtime to
more critical or valuable tasks
Need to adapt app workload to maximize
system gains or utility of computation
Example



A remote surveillance device transmits
compressed video and audio
Solar-powered, but must run overnight
3 real-time tasks:
Radio transmitter (critical): constant bit rate
 Video codec (degradable):
-high quality (30 fps, 640x480) MPEG4
-low quality (10 fps, 160x120) MPEG1
 Audio codec (noncritical): mp3, either ON or OFF

Example, cont’d

Adapt task set based on power consumption of
tasks, available energy, hours until daylight, and
relative value of the tasks, e.g.,




During daytime or high battery levels:
radio, video at high quality, audio ON
Low battery at night: radio, video low quality, audio ON
Energy is critically low: radio, video low quality, audio OFF
Dynamic adaptation needed in general, as battery
levels and time until daylight are variable
EQoS

Need to maximize benefits gained from energy
spent, but HOW?
=> Energy-aware Quality of Service (EQoS):
Vary per-task QoS, which directly affects the task’s
utility and energy consumption
 Select a set of task QoS levels to maximize total
system utility over given runtime
 Cast selection into tractable, optimization problem

EQoS Design

EQoS design goals:
Leverage low-level halt, DVS
mechanisms
 Meet system runtime goal
 Maximize benefits of task
execution


Need methods of changing
task QoS levels and specifying
benefits and energy requirements
How to change per-task QoS?

Adopt RT fault-tolerance techniques:
Period extension
 Imprecise computation
 Apply different algorithms or CODECs
 Omission



Degraded service requires less energy
For EQoS, need to specify set of QoS levels and
their average required energy for each task
Utility


Abstract notion of value from executing tasks
Need to specify utility for each QoS level of a
task:
Increasing Rewards for Increasing Service (IRIS)
 Performance Index (PI) for control applications
 Perceived quality metrics for multimedia


Actual specification flexible to types of
applications and systems designed
EQoS Algorithms

Given:




Select a QoS level for each task to:



tasks with QoS levels defined, energy required, and utility gained
for each level
remaining system energy
desired runtime, or known time until recharge
achieve desired runtime
maximize total utility
This can be formulated as a MCKP



Each task as a category, and the set of its QoS levels as items in
the category
Knapsack size = power budget
Item values and weights = utility rates and power consumption
MCKP vs. EQoS Problem
EQoS MC Knapsack Problem
...
Task
Category Item 1
Value
Energy Weight
Task
Category Item 2
...
Value
Energy Weight
Task
Category Item n
Value
Energy Weight
Value
Energy Weight
...
Value
Energy Weight
...
Value
Energy Weight
Energy Weight Limit
Runtime Constraint
Optimal Algorithms
NP-hard: all KP can be expressed as MCKP
 Exponential Search - O(mn)
...
 Branch-and-Bound (BB)

 Need
fast bound computation
 Can use LMCKP as upper bound
 May still require exponential time
Optimal Algorithms, cont’d

Dynamic Programming (DP)
time, O(mnk)
 Partial solutions for 1, 2, …, n tasks for all
possible power budgets (energy/runtime)
 Pseudo-polynomial
Heuristics

Linear:
 Use
LMCKP solution, as with BB bound
 Drop fractional part

Greedy:
 Start
with same approach as LMCKP
 Continue selecting smaller upgrades

O(nm) overhead, without upgrades
sorting
Simulation


Permits exploring a large space multidimensional task set space
Simulate various hardware configurations, RT
scheduling, DVS mechanisms


Static RM, Static EDF, ccRM, ccEDF, laEDF
Generated 1000 random task sets, each with 10
tasks, and each of which has up to 5 QoS levels

QoS degradation models period extension, imprecise
computation, algorithmic change
Simulation Results


EQoS algorithms achieve desired runtime
DVS conserves extra energy, throws off
estimated runtime
Greedy
Linear
Max
Min
Opt
1.2
2
Normalized Utility
Normalized Runtime
1
1.5
1
0.5
0.8
0.6
0.4
0.2
0
0.00E+00
2.00E+09
4.00E+09
6.00E+09
Initial Energy
8.00E+09
0
min
linear
greedy
opti
max
Simulation Results - DVS
DVS increases energy efficiency
 Throws off adaptation -- extends runtime

3 volt/freq:
 5V,
1.0*fmax
 4V, .75*fmax
 3V, .35*fmax
Linear
2.00E+09
4.00E+09
Max
Min
Opt
3.5
Normalized Runtime

Greedy
3
2.5
2
1.5
1
0.5
0
0.00E+00
6.00E+09
Initial Energy
8.00E+09
Simulation Results, cont’d

DVS compensation achieves desired runtime
with higher utility
Greedy
Linear
Max
Min
Opt
3
DVS
2
DVS-comp
Normalized Utility
Normalized Runtime
2.5
1.5
1
0.5
2
1.5
1
0.5
0
0.00E+00
2.00E+09
4.00E+09
6.00E+09
8.00E+09
0
min
linear
greedy
opti
max
Initial Energy
Adaptation w/ DVS Compensation
Utility comparison between DVS
compensation and w/o compensation
Implementation

Implemented on Linux 2.2






Periodic real-time support
PowerNow! driver
Real-time scheduler modules
EQoS adaptation module
Battery monitoring module
Currently supports Athlon, Duron, K6-2 processors that
implement AMD’s PowerNow! Technology
Experiments



Measurements on a Compaq Presario 1200Z
Implement RT version of Lame MP3 encoder
 use quality parameter to vary QoS
 multiple concurrent instances
Results follow trend observed in simulations
Conclusions

RT-DVS provides low-level CPU voltage control
Maintains timing guarantees for RT tasks
 Significant energy savings, comparable to non-RT DVS


EQoS provides task adaptation in energyconstrained real-time systems
Provides guidelines to best utilize available energy
among tasks
 Frame energy adaptation as a tractable problem
 Heuristics work nearly as well as optimal algorithms in
practice
