Power Aware
Real-time Systems
Rami Melhem
A joint project with
Daniel Mosse,
Bruce Childers,
Mootaz Elnozahy
Outline
Introduction to real-time systems
Voltage and frequency scaling (no slide)
Speed adjustment in frame-based systems
Dynamic speed adjustment in multiprocessor environment
Speed adjustment in general periodic systems
Static speed adjustment for tasks with different power consumptions
Maximizing computational reward for given energy and deadline
Tradeoff between energy consumption and reliability
The Pecan board
Real-time systems
[Taxonomy figure: hard vs. soft real-time systems; periodic vs. aperiodic (frame-based) task models; preemptive vs. non-preemptive scheduling; uni-processor vs. parallel processors]
Periodic, Rate Monotonic scheduling
• n tasks with maximum computation times Ci and periods Ti, for i = 1, …, n.
[Timeline figure: Task 1 with C = 2, T = 4; Task 2 with C = 3, T = 7]
• Static priority scheduling (higher priority to the task with the shorter period).
• If C2 = 3.1, then task 2 will miss its deadline although the utilization is 2/4 + 3.1/7 ≈ 0.94, which is less than 1.
The Liu and Layland RMS utilization bound is n(2^(1/n) − 1) (a schedulability check using this bound is sketched after the next slide).
Periodic, EDF scheduling
• n tasks with maximum computation times Ci and periods Ti, for i = 1, …, n.
[Timeline figure: Task 1 with C = 2, T = 4; Task 2 with C = 3, T = 7]
• Dynamic priority scheduling (higher priority to the task with the earlier deadline).
• All tasks will meet their deadlines if the utilization is not more than 1.
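A minimal sketch of the two schedulability tests above, using the task parameters from the running example (the function names and structure are illustrative, not from the slides):

```python
def rms_schedulable(tasks):
    """Sufficient test for RMS: utilization <= Liu and Layland bound n(2^(1/n) - 1)."""
    n = len(tasks)
    u = sum(c / t for c, t in tasks)
    return u <= n * (2 ** (1.0 / n) - 1)

def edf_schedulable(tasks):
    """Necessary and sufficient test for preemptive EDF: utilization <= 1."""
    return sum(c / t for c, t in tasks) <= 1.0

tasks = [(2, 4), (3.1, 7)]        # (Ci, Ti) pairs from the example
print(rms_schedulable(tasks))     # False: utilization 0.94 exceeds the 2-task bound 0.83
print(edf_schedulable(tasks))     # True:  utilization 0.94 <= 1
```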
1. Speed adjustment in frame-based systems
Static speed adjustment
Assumption: all tasks have the same deadline.
[Figure: running at Smax completes the WCET well before the deadline; stretching execution to the deadline allows a lower CPU speed between Smin and Smax]
Select the speed based on the worst-case execution time (WCET) and the deadline.
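A minimal sketch of this static choice (assumptions mine: the WCET is expressed in time units at Smax, and execution time scales linearly with the inverse of the speed):

```python
def static_frame_speed(wcet_at_smax, deadline, s_min, s_max):
    """Pick one speed for the whole frame from the WCET and the deadline."""
    # running slower stretches execution linearly: time = wcet_at_smax * (s_max / s)
    s = s_max * (wcet_at_smax / deadline)     # lowest speed that still meets the deadline
    return min(max(s, s_min), s_max)          # clamp to the feasible speed range

print(static_frame_speed(wcet_at_smax=6.0, deadline=10.0, s_min=0.3, s_max=1.0))  # 0.6
```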
Dynamic speed adjustment techniques for linear code
Note: a task very rarely consumes its estimated worst-case execution time (WCET); the actual execution time (ACET) is usually shorter, leaving slack.
[Figure: after each completed task, the speed is recomputed from the remaining WCET and the remaining time to the deadline]
• Speed adjustment based on the remaining WCET.
• Greedy speed adjustment: give the reclaimed slack to the next task rather than to all future tasks (see the sketch below).
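A minimal sketch of recomputing the speed from the remaining WCET at each task boundary (assumptions mine: execution times are expressed at Smax and there is one frame-wide deadline). The greedy variant would instead hand all of the reclaimed slack to the next task only.

```python
def remaining_wcet_speeds(wcet, acet, deadline, s_min, s_max):
    """Recompute each task's speed from the remaining WCET and the remaining time."""
    now, speeds = 0.0, []
    for i in range(len(wcet)):
        remaining_wcet = sum(wcet[i:])                 # worst case still ahead (at s_max)
        s = s_max * remaining_wcet / (deadline - now)  # stretch the remaining worst case
        s = min(max(s, s_min), s_max)
        speeds.append(s)
        now += acet[i] * (s_max / s)                   # task i actually runs for its ACET
    return speeds
    # a greedy variant would give all of the reclaimed slack to the next task only

print(remaining_wcet_speeds([3, 3, 2], [1, 2, 2], deadline=10.0, s_min=0.2, s_max=1.0))
```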
Dynamic speed adjustment techniques for linear code
• Speed adjustment based on the remaining average execution time: the speed is chosen from the remaining time and the remaining average-case work, with Smax as a fallback when tasks run longer than their averages.
[Figure: WCET vs. ACET timelines for the average-based speed choice]
An alternate point of view
[Figure: the slack available to the next task is the reclaimed slack (the unused portion of completed tasks' WCETs) plus stolen slack (time borrowed from the WCET allocation of future tasks, which may then have to run at Smax); a sketch of this speed choice follows.]
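One way to formalize the average-based choice with the Smax fallback (a sketch under my own assumptions, not necessarily the exact scheme used in the project): pick the next task's speed from the remaining average-case work, but never below the level that keeps the worst case feasible when all later tasks run at Smax.

```python
def next_task_speed(avg_work, wcet, remaining_time, s_min, s_max):
    """avg_work[i], wcet[i]: remaining average / worst-case work (at speed 1) of the
    not-yet-executed tasks; index 0 is the task about to run."""
    s_target = sum(avg_work) / remaining_time          # enough if everything is average
    # worst case stays feasible: task 0 may use its full WCET, later tasks run at s_max
    budget = remaining_time - sum(wcet[1:]) / s_max
    s_floor = wcet[0] / budget
    return min(max(s_target, s_floor, s_min), s_max)

# Example: two tasks remain; their averages are much smaller than their worst cases.
print(next_task_speed(avg_work=[2, 2], wcet=[4, 4], remaining_time=10.0, s_min=0.2, s_max=1.0))
```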
Dynamic speed adjustment techniques for non-linear code
[Control-flow figure: a branch with outgoing paths taken with probabilities p1, p2 and p3; each path has a minimum, average and maximum remaining execution time]
At a branch:
• The remaining WCET is based on the longest path.
• The remaining average-case execution time is based on the branching probabilities (from trace information).
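A small sketch of the two estimates at a branch point (the path lengths and probabilities below are illustrative only):

```python
def remaining_wcet(paths):
    """paths: list of (probability, remaining_cycles) per branch outcome."""
    return max(cycles for _, cycles in paths)

def remaining_avg(paths):
    return sum(p * cycles for p, cycles in paths)

paths = [(0.7, 40), (0.2, 90), (0.1, 120)]   # p1, p2, p3 from trace information
print(remaining_wcet(paths))   # 120  -> longest path
print(remaining_avg(paths))    # 58.0 -> probability-weighted average
```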
2. Periodic, non-frame-based systems
Each task has a WCET Ci and a period Ti.
Earliest Deadline First (EDF) scheduling.
Static speed adjustment: if the utilization U = Σi Ci/Ti < 1, then we can reduce the speed from Smax by a factor of U (i.e., run at U·Smax) and still guarantee that deadlines are met.
Note: the average utilization Uav can be much less than U.
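A small sketch of this static computation (the task set and the average execution times are illustrative):

```python
def edf_static_speed(wcet, period, acet, s_max):
    """Statically scale the CPU speed for an EDF task set.
    wcet[i], period[i], acet[i]: worst-case time at s_max, period, average time at s_max."""
    U = sum(c / t for c, t in zip(wcet, period))      # worst-case utilization
    U_avg = sum(a / t for a, t in zip(acet, period))  # average utilization
    assert U <= 1.0, "task set not schedulable even at s_max"
    # static speed, plus the speed corresponding to the (much lower) average utilization
    return U * s_max, U_avg * s_max

print(edf_static_speed(wcet=[2, 3], period=[4, 7], acet=[1, 1.5], s_max=1.0))
```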
Greedy dynamic speed adjustment
Giving reclaimed slack to the next ready task is not always a
correct scheme
Theorem: Reclaimed slack has to be associated with a deadline and can be given safely to a task with an earlier or equal deadline.
Aggressive dynamic speed adjustment
Theorem: If tasks 1, …, k are ready and will complete before the next task arrival, then we can swap the time allocations of the k tasks. That is, we can add stolen slack to the reclaimed slack.
Experimental rule: do not be too aggressive; do not reduce the speed of a task below a certain speed (the optimal speed determined by Uav).
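A minimal sketch of deadline-tagged slack bookkeeping consistent with the first theorem (the data structure and interface are my own illustration): slack reclaimed from a job keeps that job's deadline and may only be consumed by a job with an earlier or equal deadline.

```python
class SlackManager:
    def __init__(self):
        self.slack = []                     # list of [amount, deadline]

    def reclaim(self, unused_time, deadline):
        self.slack.append([unused_time, deadline])

    def claim(self, job_deadline, needed):
        """Return how much slack a job with `job_deadline` may safely use."""
        granted = 0.0
        for item in self.slack:
            amount, d = item
            if d >= job_deadline and granted < needed:   # earlier-or-equal deadline rule
                take = min(amount, needed - granted)
                item[0] -= take
                granted += take
        self.slack = [it for it in self.slack if it[0] > 1e-12]
        return granted
```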
3. Speed adjustment in multiprocessors
1. the case of independent tasks on two processors
Canonical execution ==> all tasks consume their WCET.
[Figure: independent tasks with WCETs 6, 4, 4 and 2 are dispatched from a global queue onto processors P1 and P2 with a common deadline of 12. With no speed management both processors finish well before the deadline; with static speed management the tasks are stretched (to 9, 6, 6 and 3) so that canonical execution finishes exactly at the deadline.]
Dynamic speed adjustment
Non-canonical execution ==> tasks consume their ACET.
If we select the initial speed based on WCET, can we do dynamic speed adjustment and still meet the deadline?
[Figure: the statically scaled tasks have (WCET, ACET) pairs (9,3), (6,5), (6,6) and (3,3). With greedy slack reclamation (GSR), a processor gives all of its reclaimed slack to the next task it dispatches, and the schedule misses the deadline of 12. With slack sharing (SS), a processor that completes early shares the reclaimed slack with the other processor, and the deadline is met (see the sketch below).]
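A simplified sketch of slack sharing on the example above. The bookkeeping below is my own formulation, chosen so that it reproduces the (9,3), (6,5), (6,6), (3,3) example; the published algorithm may differ in details. Each processor tracks the time at which it would become free under canonical execution, and an early-finishing processor swaps canonical free times with the processor holding the earliest one before claiming slack.

```python
import heapq

def slack_sharing(tasks, num_procs=2):
    """tasks: list of (wcet, acet) in canonical dispatch order (already statically scaled)."""
    queue = list(tasks)
    # canonical_free[p]: time processor p becomes free if every task ran for its WCET
    canonical_free = [0.0] * num_procs
    events = [(0.0, p) for p in range(num_procs)]     # (actual free time, processor)
    heapq.heapify(events)
    finish_times = []
    while queue:
        now, p = heapq.heappop(events)                # processor p is actually free at `now`
        wcet, acet = queue.pop(0)
        # take the earliest canonical free time among all processors (the "swap")
        q = min(range(num_procs), key=lambda i: canonical_free[i])
        canonical_start = canonical_free[q]
        canonical_free[q], canonical_free[p] = canonical_free[p], canonical_start
        slack = max(0.0, canonical_start - now)       # reclaimed slack usable by this task
        budget = wcet + slack                         # time allotted to the task
        slowdown = budget / wcet                      # stretch factor (speed = 1/slowdown)
        finish = now + acet * slowdown                # actual completion time
        canonical_free[p] = canonical_start + wcet
        finish_times.append(finish)
        heapq.heappush(events, (finish, p))
    return max(finish_times)

# The example from the slides: scaled WCETs 9, 6, 6, 3 with ACETs 3, 5, 6, 3.
print(slack_sharing([(9, 3), (6, 5), (6, 6), (3, 3)]))   # 12.0 -> meets the deadline
```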
Dynamic speed adjustment (2 processors)
2. dependent tasks
Use list scheduling.
[Figure: a task graph whose tasks have WCETs 2, 3, 4, 3, 6 and 6 is dispatched from a Ready_Q onto processors P1 and P2; under canonical execution (every task consuming its WCET) both processors are busy until the deadline of 12.]
• Assume that we adjust the speed statically such that canonical execution meets the deadline.
• Can we reclaim unused slack dynamically and still meet the deadline?
Dynamic speed adjustment (2 processors)
2. dependent tasks (continued)
[Figure: in the actual execution the task with WCET 3 completes after only 1 time unit (shown as "3,1"). With slack sharing alone, tasks become ready and enter the Ready_Q in a different order than under canonical execution, and the resulting schedule misses the deadline of 12.]
Dynamic speed adjustment (2 processors)
Solution: use a Wait_Q to enforce the canonical order in the Ready_Q.
[Figure: with the Wait_Q in place, the non-canonical execution with slack sharing dispatches tasks in their canonical order and meets the deadline of 12.]
• A task is put in the Wait_Q when its last predecessor starts execution.
• Tasks are ordered in the Wait_Q by their expected start times under WCET.
• Only the head of the Wait_Q can move to the Ready_Q.
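A minimal sketch of the Wait_Q bookkeeping described in the bullets above (the Task and queue interfaces are my own illustration; dispatching and speed setting are omitted, and all tasks are assumed to be submitted before execution starts):

```python
import heapq
from itertools import count

class Task:
    def __init__(self, name, canonical_start):
        self.name = name
        self.canonical_start = canonical_start   # start time in the canonical (WCET) schedule
        self.predecessors, self.successors = [], []

class WaitQueue:
    """Sketch of the Wait_Q rule; only the head may be promoted to the Ready_Q."""
    def __init__(self):
        self._tie = count()
        self.started, self.finished = {}, {}     # per task: number of preds started/finished
        self.wait_q, self.ready_q = [], []

    def submit(self, task):
        self.started[task] = self.finished[task] = 0
        if not task.predecessors:                # source tasks enter the Wait_Q directly
            heapq.heappush(self.wait_q, (task.canonical_start, next(self._tie), task))
        self._promote()

    def on_start(self, task):
        # a successor joins the Wait_Q when its *last* predecessor starts execution
        for succ in task.successors:
            self.started[succ] += 1
            if self.started[succ] == len(succ.predecessors):
                heapq.heappush(self.wait_q, (succ.canonical_start, next(self._tie), succ))

    def on_finish(self, task):
        for succ in task.successors:
            self.finished[succ] += 1
        self._promote()

    def _promote(self):
        # only the head of the Wait_Q (earliest canonical start time) may move to the
        # Ready_Q, and only once all of its predecessors have actually finished
        while self.wait_q:
            _, _, head = self.wait_q[0]
            if self.finished[head] < len(head.predecessors):
                break
            heapq.heappop(self.wait_q)
            self.ready_q.append(head)
```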
Theoretical results
For independent tasks, if canonical execution finishes at time T, then
non-canonical execution with slack sharing finishes at or before time
T.
For dependent tasks, if canonical execution finishes at time T, then,
non-canonical execution with slack sharing and a wait queue
finishes at or before time T.
Implication:
• Can optimize energy based on WCET (static speed adjustment)
• At run time, can use reclaimed slack to further reduce energy
(dynamic speed adjustment), while still guaranteeing deadlines.
Simulation results
• We simulated task graphs from real applications (matrix operations and the solution of linear equations).
• We assumed that power consumption is proportional to S^3.
• Typical results are as follows:
[Plots: normalized energy consumption (20%–100%) under static and dynamic speed/voltage adjustment, shown as a function of the number of processors (2–12, with 50% dynamic slack) and as a function of the average dynamic slack (0.3–0.8, on 2 processors).]
4. Static optimization when different
tasks consume different power
[Figure: tasks 1, 2 and 3 executed back to back between the start time and the deadline]
If the power consumption functions are identical for all tasks, then, to minimize the total energy, all tasks have to execute at the same speed.
If, however, the power functions Pi(S) are different for tasks i = 1, …, n, then using the same speed for all tasks does not minimize the energy consumption.
Let Ci = number of cycles needed to complete task i
Minimizing energy consumption
Example: three tasks with C1 = C2 = C3.
If Pi(S) = ai S^2 for task i, then the energy consumed by task i is Ei = Pi(Si)·ti = ai Ci^2/ti (written as Ei = ai/ti with Ci normalized to one cycle).
If a1 = a2 = a3, then t1 = t2 = t3 = D/3 minimizes the total energy.
[Figure: the interval of length D from the start time to the deadline is split evenly, D/3 per task]
Minimizing energy consumption
Example: three tasks with C1 = C2 = C3.
If Pi(S) = ai S^2 for task i, then the energy consumed by task i is Ei = ai/ti.
If a1, a2 and a3 are different, then t1 = t2 = t3 does not minimize the total energy.
[Figure: the energy-optimal allocation gives the tasks unequal shares t1, t2, t3 of the interval D, rather than D/3 each]
Minimizing energy consumption
The problem is to find Si, i = 1, …, n, such that we
minimize  Σi ti · Pi(Si)
subject to  Σi ti ≤ D  and  Smin ≤ Si ≤ Smax,
where ti = Ci/Si and D is the time available before the deadline.
• We solved this optimization problem, consequently developing a solution for arbitrary convex power functions.
• Algorithm complexity: O(n^2 log n)
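For intuition, a minimal closed-form sketch for the special case Pi(S) = ai S^3, ignoring the Smin/Smax bounds (the general convex case needs the O(n^2 log n) algorithm mentioned above; the function below is illustrative only). Setting the derivative of Σ ai Ci^3/ti^2 to a common Lagrange multiplier gives ti proportional to ai^(1/3) Ci.

```python
def cubic_power_allocation(a, c, deadline):
    """Return per-task times ti minimizing sum(ti * ai * (ci/ti)**3)
    subject to sum(ti) == deadline, for cubic power functions (no speed bounds)."""
    weights = [ai ** (1.0 / 3.0) * ci for ai, ci in zip(a, c)]
    total = sum(weights)
    times = [deadline * w / total for w in weights]
    speeds = [ci / ti for ci, ti in zip(c, times)]
    energy = sum(ai * si ** 3 * ti for ai, si, ti in zip(a, speeds, times))
    return times, speeds, energy

# Example: three tasks with equal cycle counts but different power coefficients.
times, speeds, energy = cubic_power_allocation(a=[1.0, 2.0, 4.0], c=[1.0, 1.0, 1.0], deadline=3.0)
print(times)   # tasks with larger ai get more time (i.e., run slower)
```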
Maximizing the system’s reward
General problem assumptions:
• tasks have different power/speed functions
• tasks have different rewards as functions of number of
executed cycles
[Figures: a speed profile that runs task 1 at speed S1 for C1 cycles, task 2 at S2 for C2, and task 3 at S3 for C3 before the deadline, together with each task's power/speed curve and its reward as a function of the executed cycles C1, C2, C3]
Maximizing the system’s reward
Theorem: If the power functions are all of the form ai S^p (e.g., all quadratic or all cubic), then the reward is maximized when the power consumed is the same for all tasks.
[Figure: equal-power speed profile before the deadline, with task i running at speed Si for Ci cycles; the energy is the area under the power profile]
Given the speeds, we know how to maximize the total reward while meeting the deadline.
Maximizing the system’s reward
If the power functions of the tasks are not all of the form ai S^p:
1) Ignore energy and maximize the reward, R, within the deadline.
2) If the available energy is exceeded:
– remove Δt from a task such that the decrease in R is minimal;
– use Δt to decrease the speed of a task, such as to maximize the decrease in energy consumption.
3) Repeat step 2 until the energy constraints are satisfied.
[Figure: a slice Δt of execution time is taken from one task and used to slow down another; a rough sketch of this loop follows]
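A rough sketch of the greedy loop above. The reward and power functions used in the example are hypothetical placeholders (concave reward, convex power), not the ones used in the project, and the step size Δt and bookkeeping are my own simplifications.

```python
import math

def greedy_reward_energy(tasks, deadline, energy_budget, dt=0.1):
    """tasks: list of dicts with 'cycles', 'time', 'reward' (fn of cycles),
    'power' (fn of speed). The times are assumed to already sum to the deadline."""
    assert abs(sum(t['time'] for t in tasks) - deadline) < 1e-9

    def energy(t):
        s = t['cycles'] / t['time']
        return t['power'](s) * t['time']

    while sum(energy(t) for t in tasks) > energy_budget:
        # 1) remove dt from the task whose reward decreases the least
        def reward_loss(t):
            s = t['cycles'] / t['time']
            return t['reward'](t['cycles']) - t['reward'](t['cycles'] - dt * s)
        donors = [t for t in tasks if t['time'] > dt]
        if not donors:
            break                       # cannot shrink any task further
        j = min(donors, key=reward_loss)
        j['cycles'] -= dt * (j['cycles'] / j['time'])
        j['time'] -= dt

        # 2) give dt to the task whose energy decreases the most when slowed down
        def energy_gain(t):
            slowed = dict(t, time=t['time'] + dt)
            return energy(t) - energy(slowed)
        k = max(tasks, key=energy_gain)
        k['time'] += dt
    return tasks

# Hypothetical example: logarithmic rewards, cubic power with different coefficients.
tasks = [
    {'cycles': 4.0, 'time': 4.0, 'reward': lambda c: math.log(1 + c), 'power': lambda s: 1.0 * s ** 3},
    {'cycles': 4.0, 'time': 4.0, 'reward': lambda c: 2 * math.log(1 + c), 'power': lambda s: 3.0 * s ** 3},
]
greedy_reward_energy(tasks, deadline=8.0, energy_budget=5.0)
```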
Tradeoff between energy & dependability
Basic hypothesis:
Dependable systems must include redundant capacity
in either time or space (or both)
Redundancy can also be exploited to reduce power
consumption
Time redundancy
(checkpointing and rollbacks)
Space redundancy
Exploring time redundancy
The slack can be used to
1) add checkpoints
2) reserve recovery time
3) reduce processing speed
[Figure: execution at a speed below Smax, with checkpoints and a reserved recovery interval, still finishing by the deadline]
For a given number of checkpoints, we can find the speed that minimizes energy consumption while guaranteeing recovery and timeliness.
Optimal number of checkpoints
More checkpoints = more overhead + less recovery slack
[Figure: a computation of C cycles with per-checkpoint overhead r within a deadline D; the energy plotted against the number of checkpoints has a minimum]
For a given slack (C/D) and checkpoint overhead (r/C), we can find the number of checkpoints that minimizes energy consumption while guaranteeing recovery and timeliness (a toy model is sketched below).
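An illustrative toy model only (the assumptions are mine, not the paper's exact formulation): a task of C worst-case cycles must finish by deadline D at some speed S ≤ Smax, with k checkpoints of overhead r each; the recovery of one segment (C/(k+1) cycles) is reserved at Smax, and the fault-free energy is modeled as C·S^2 (dynamic power proportional to S^3 times execution time C/S).

```python
def best_checkpoint_count(C, D, r, s_max, k_max=50):
    best = None
    for k in range(k_max + 1):
        # time left for normal execution after checkpoint overhead and the
        # reserved recovery of one segment at Smax
        avail = D - k * r - (C / (k + 1)) / s_max
        if avail <= 0:
            continue
        s = C / avail                  # lowest speed that still fits
        if s > s_max:
            continue                   # infeasible with this many checkpoints
        energy = C * s ** 2            # fault-free energy (faults are rare)
        if best is None or energy < best[1]:
            best = (k, energy, s)
    return best                        # (number of checkpoints, energy, speed)

print(best_checkpoint_count(C=6.0, D=10.0, r=0.2, s_max=1.0))
```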
Non-uniform checkpointing
Observation: May continue executing at Smax
after recovery.
Disadvantage: increases energy consumption when a
fault occurs (a rare event)
Advantage: recovery in an early section can use slack
created by execution of later sections at Smax
Requires non-uniform checkpoints.
Non-uniform checkpointing
Can find
# of checkpoints,
their distribution, and
the CPU speed
such that
energy is minimized,
recovery is guaranteed
and deadlines are met
The Pecan board
• An experimental platform for power management research
• Profiling power usage of applications.
• Development of active software power control.
• An initial prototype for thin/dense servers based on
Embedded PowerPC processors.
• A software development platform for PPC405 processors.
Block Diagram
[Figure: PPC405 processor connected to RAM, a PCI bridge, Flash, RTC, Ethernet, Serial, PCMCIA and an FPGA]
• Single-board computer as a PCI adapter; non-transparent bridge.
• Up to 128MB SDRAM.
• Serial, 10/100 Ethernet, boot Flash, RTC, FPGA, PCMCIA.
• Components grouped on voltage islands for power monitoring & control.
PCI-Based Network
[Figure: several PPC405 boards, each with RAM, Flash, Serial and Ethernet, attached through PCI bridges to Ethernet network(s)]
Software: Linux
• Linux for PPC 405GP available in-house & commercially (MontaVista).
• PCI provides easy expandability.
• Leverage open-source drivers for networking over PCI -- Fast access to
network-based file systems.