Transcript Slide 1

System-Wide Energy Minimization for
Real-Time Tasks: Lower Bound and
Approximation
Xiliang Zhong and Cheng-Zhong Xu
Dept. of Electrical & Computer Engg.
Wayne State University
Detroit, Michigan
http://www.cic.eng.wayne.edu
Outline

Introduction



Related Work
System-Wide Energy Optimization for periodic
tasks




The optimal algorithm
A fully polynomial time approximation scheme
Performance Evaluation
System-Wide Energy Optimization for sporadic
Tasks


Processor and system energy model
Solution and evaluation
Conclusions
2
Introduction


Mobile/Embedded devices are power critical,
with limited battery capacity
Software assisted power management

Dynamic power management (DPM)


Resource shutdown after a timeout
Dynamic voltage/frequency scaling (DVS)


Processing speed designed for peak performance
Slowdown the processor voltage / speed when not
fully utilized
3



The dynamic CPU
power is , P ∝ v2f
Reducing v also
reduce the maximum
processors frequency
Approximately,
energy per cycle∝ f2
Energy per cycle
Dynamic voltage scaling (DVS)
1
0.5
DVS
No-DVS
0
0.2
0.4
0.6
0.8
1
Normalized CPU speed
Energy per cycle of PXA processor
Processor slowdown leads to super-linear energy
savings, while linear execution time increase

4
System-Wide Energy


Processor also has leakage power
Applications may use other components such as
memory and peripheral devices


System-wide energy consumed in running a task


Can be in active, standby, sleep, and shutdown states
CPU, resource standby and active energy
Lowering CPU frequency can increase overall
energy expenditure due to prolonged resource
standby time of other components
5
System-Wide Energy (cont.)

critical speed,
the speed with
minimum energy
per cycle


Not energy
efficient using
lower speed
Energy per cycle
x10-9
Processor only
Standby power 0.2 W
Standby power 0.6 W
Standby power 1.2 W
8
6
4
2
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Normalized speed
Energy per cycle of PXA processor with
different standby power
Execute a task at speed no lower than its critical
speed, then put the devices into low power state

A combined use of slowdown and shutdown
6
Related Work

CPU energy minimization for periodic tasks:


Few studies on system-wide energy minimization

Applications w/o deadlines



Subject to a performance loss [Choi et al.’04]
Real-time periodic tasks on CPU w/ continuous speed levels

Heuristics [Zhuo and Chakrabarti’05]
Real-time periodic tasks on CPU w/ discrete speed levels


Heuristics [Mejia-Alvarez’04], approximations [Chen and Kuo’05]
Heuristics [Jejurikar and Gupta’04]
This work


Pseudo-polynomial algorithm for optimal solutions and
polynomial approximated schemes
Applicable to both offline periodic tasks and online sporadic
tasks in processors with practical discrete levels
7
System-wide energy optimization

Periodic Tasks (Offline)




Sporadic Tasks (Online)



: worst case execution time under max speed
: task period and deadline
: normalized speed of task
Task releases have irregular intervals
Online scheduling based on uncompleted tasks, no
assumption about future task releases
The objective is to minimize

overall energy consumption including CPU and all
other system components while meeting deadline
constraints of all the tasks
8
Energy Minimization for Periodic Tasks

Minimization of energy consumption for n
periodic tasks in a hyper-period,
Feasible constraint under EDF
Boundary constraint

Practical processors with discrete speed levels


The minimization is an NP-hard Multiple Choice
KnapSack (MCKP) problem
There exist pseudo-polynomial solutions to MCKP with
integer coefficients, not applicable in this problem
9
An Example




Basic idea: first solve subprobs with fewer #tasks
A system with an PXA processor with 5 normalized
speed [0.15 0.4 0.6 0.8 1]
System with memory, flash, and WNIC
An example real-time workload w/ 4 periodic tasks
Task Execution Period Utilization
time
Required
resources
Critical
speed
1
2
3
4
cpu
cpu,memory
cpu,mem,flash
cpu,mem, WNIC
0.4
0.4
0.6
0.6
6.4
1.6
1.2
1.08
16
20
12
9
0.4
0.08
0.1
0.12
10
Solution to task 1
Task 1, execution time 6.4; deadline 16; utilization 0.4
 Branch on four normalized speeds [0.4 0.6 0.8 1]
0, 0 f: pruned by feasibility condtion
e: pruned by energy condition
(utilization, energy)

f
e
(1, 2.72) (0.667, 4.267) (0.5, 7.2)

e
(0.4, 10.24)
task 1
State pruning

Feasibility condition:


The 1st node at speed 0.4 removed with utilization already 1
Energy condition


Task 1 at the smallest speed (2nd , 0.6); tasks 2-4 at the max. Total
Energy=7.6 (upper bound)
Task 1 at 3rd or 4th speed (0.8 or 1); tasks 2-4 at the min. The
11
required energy exceeds 7.6. The two states can be removed
Solution to the first three tasks
0, 0
pairs of (utilization, energy)
f: pruned by feasibility condtion
e: pruned by energy condition
d: pruned by dominance f
e
(1, 2.72) (0.667, 4.267) (0.5, 7.2)
f
f
(0.867, 5.75)
e
(0.4, 10.24)
task 1
task 2
(0.767, 6.467)
(0.747, 7.147)
task 3
f
f
d
f
(0.93, 8.47) (0.867, 9.107) (0.87, 9.40)

(0.847, 9.786)
Dominance condition

The states (0.867, 9.107) and (0.87, 9.4) of task 3



First one leads to smaller utilization
Any feasible schedule by the second can also be satisfied
by the first
12
First one uses less energy; the second can be removed
(utilization, energy)
f: pruned by feasibility condtion
e: pruned by energy condition
d: pruned by dominance
0, 0
f
1, 2.72 0.667, 4.267
f
f
0.867, 5.75
e
0.5, 7.2
e
0.4, 10.24
task 1
task 2
0.767, 6.467
0.747, 7.147
task 3
f
0.93, 8.47
f
0.867, 9.107
f
d
0.87, 9.40 0.847, 9.786
f
f
f
1.07, 10.37 0.987, 11.159
optimal state
e
task 4
e
0.967, 11.84
Maximum state number reduced to 6/4*4*3*3 = 0.4 %
A fully polynomial
approximation scheme (FPTAS)

State # is pseudo-polynomial in task number.


can be reduced by providing approximated solutions
Approximated with worst case perf. guarantee


An algorithm is said to be an approximation scheme
if for a given in (0,1), we have
A more desirable approximation scheme (FPTAS) has
a polynomial running time in both the number of
tasks and the performance ratio
14
A fully polynomial
approximation scheme (cont.)

Divide the energy values into a number of
groups each of size r,



Each value
scaled and rounded to
Energy values in the same group are treated equally
Find the group size r, subject to a given
performance bound



Solving
Energy value of each task introduces an error no
larger than group size r
Accumulated errors of n tasks no larger than n*r
A lower bound of E* is when all tasks run at their
critical speeds (Emin), i.e., E*≥ Emin
derives group size r
15
Performance Evaluation

Simulation Settings





A system with an PXA processor
memory: standby power 0.2W, standby time 20%~60% of task
execution
flash drive: 0.4W and 10%~25%
wireless interface: 1W and 5%~20%
Periodic Tasks



Randomly generated deadlines w/ utilization from 0.1~1
Each task randomly chooses a subset of resources
Algorithms implemented
 CPU-DVS, speed control for CPU energy consumption
 CS-DVS, a heuristic algorithm for system-wide energy savings
[Jejurikar and Gupta ISLPED2004],
 OPT-P, the proposed optimal solution
 Approximated scheme with perf. bounds 0.01, 0.1, 0.5
16
Energy Consumption
Performance Evaluation (Periodic tasks)
1.4
CS-DVS
0.5-APPROX
0.1-APPROX
OPT-P
CPU-DVS
1.3
1.2 23%
1.1
16%
8%
1
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
Processor Utilizations
• Proposed algorithms 23% less energy than CPU-only solutions
• Energy consumption up to 16% more efficient than CS-DVS
• Approximation algorithms effectively bound the performance errors
17
Energy Minimization for Sporadic Tasks

Online energy minimization for all
uncompleted tasks
n feasible constraints under EDF
boundary constraint

On a processor with discrete speed levels

Prove the problem is an instance of Multidimensional MCKP (NP-hard in the strong sense,
any optimal solution has exponential running time)
18
Sporadic Tasks (cont.)




Consider three tasks released at time
0 with deadlines 3, 5, 7
Feasibility of a task (e.g. J2) is not
affected by tasks finished later (tasks
in a non-decreasing order of deadlines)
Satisfy one constraint (e.g. J3) at each
J1
iteration
Can be solved by a pseudo-polynomial
algorithm for the optimal solution and 1
an approximation scheme (FPTAS)
J2
3
J3
5
7 Time
19
Performance Evaluation (Sporadic tasks)

Experimental Settings



Varied number of tasks
Task inter-release times generated by an exponential dist.
Algorithms implemented
 TV-DVS, adaptive speed scaling for CPU energy
consumption on processors w/ continuous levels [Zhong
and Xu RTSS2005]
 DVSST, CPU energy consumption with only frequency
scaling available (continuous levels) [Qadi et al. RTSS2003]
 OPT-S, the proposed optimal solution
 0.1, 0.5-approximation, approximated solutions with
different performance settings
20
Energy Consumption
Energy consumption (Sporadic tasks)
2.9
2.5
2.1
1.7 56%
1.3
0.9
10
TV-DVS
0.5-APPROX
OPT-S
DVSST
23%
20
30
40
50
60
70
80
90 100
Number of Tasks
• Small task number: Energy consumption up to 56%
more efficient than TVDVS and DVSST
•Large task number: 23% more efficient
21
Conclusion

System-wide energy minimization for periodic tasks



Minimization for online sporadic tasks


pseudo-polynomial algorithm for the optimal solution
approximated solution in moderate running time with
bounded performance degradation (FPTAS)
Pseudo-polynomial algorithm and an FPTAS by exploiting
inherent properties of online task scheduling
On-going work


Implementation of the policies in an embedded system with
PXA270 processor
Energy/Time overhead voltage and speed switches;
overhead in putting a resource into low power state
22
System-Wide Energy Minimization for Real-Time Tasks:
Lower Bound and Approximation
Thank you!
23
Algorithm running time
(s)
Algorithm running time
• Algorithm running time for
schedules in a 10-minutes run
• OPT-S has higher running time,
but <1% task execution time
• Comparable time for
approximation algorithms with
TV-DVS
Complexity in CPU time (s)
• Running time measured in a Pentium 4 10000
machine with 2 GHz processor
100
• OPT-P has a higher complexity than CS1
DVS
• Below 90 ms for systems with up to 50 0.01
tasks
0
• All approximation algorithms require no
more than 0.4 s to finish
OPT-P
0.01-APPROX
0.1-APPROX
0.5-APPROX
CS-DVS
20
40
60
80
100
Number of tasks
OPT-S
TV-DVS
0.1-APPROX
0.5-APPROX
6
4
2
0
10
20
30
40
50
60
70
80
90
100
Number of Tasks
24