Name Center for Applied Scientific Computing month day, 2001


DVSleak: Combining Leakage Reduction and Voltage Scaling in Feedback EDF Scheduling
Yifan Zhu, Frank Mueller
North Carolina State University
Center for Efficient, Secure and Reliable Computing
Background

Dyn. Voltage scaling (DVS): lowers dyn. power
Dynamic power was dominating
— Power ~ $p_t C_L V_{dd}^2 f_{clk} + I_{leak} V_{dd} + P_{short}$
Leakage becoming dominant

Sleep: lowers leakage (static) power
Real-Time Systems


Hard real-time systems
— periodic, preemptive, independent tasks [Liu, Layland]
– w/ known worst-case execution time (WCET)
— jobs: periodically released instances of a task
— WCET: measured at the max. freq., w/o DVS
— most practical system: U << 1
Earliest-deadline-first (EDF) scheduling
— $\sum_{i} \frac{C_i}{P_i} \le 1$, $C_i$ = WCET, $P_i$ = period
— $\sum_{i=1}^{n} \frac{C_i}{P_i} \le \alpha$, $\alpha = \frac{f_{act}}{f_{max}}$ ($0 < \alpha \le 1$): DVS scaling factor
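The EDF utilization test with a DVS scaling factor can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it covers static scaling only, the `(wcet, period)` task tuples and all function names are hypothetical, and the four discrete levels (25%, 50%, 75%, 100% of fmax) are borrowed from the experimental setup described later in this talk.

```python
from fractions import Fraction

# Discrete frequency levels as fractions of f_max (taken from the talk's setup).
LEVELS = (Fraction(1, 4), Fraction(1, 2), Fraction(3, 4), Fraction(1))


def utilization(tasks):
    """Return sum(C_i / P_i) for an iterable of (wcet, period) pairs."""
    return sum(Fraction(c) / Fraction(p) for c, p in tasks)


def edf_feasible(tasks, alpha=1):
    """EDF test under scaling factor alpha: sum(C_i / P_i) <= alpha."""
    return utilization(tasks) <= Fraction(alpha)


def lowest_feasible_level(tasks, levels=LEVELS):
    """Return the smallest discrete alpha that still passes the EDF test.

    Returns None when the task set is infeasible even at f_max.
    """
    u = utilization(tasks)
    for alpha in sorted(levels):
        if u <= alpha:
            return alpha
    return None


# Hypothetical task set: (WCET at f_max, period), U = 13/20.
tasks = [(1, 4), (1, 5), (2, 10)]
level = lowest_feasible_level(tasks)
print(f"U = {utilization(tasks)}, lowest feasible level = {level}")
```

Exact fractions are used so that a utilization sitting exactly on a level boundary is not rejected by floating-point rounding.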
Motivation




Embedded systems with limited power supply
DVS for real-time system
— trade-off: energy saving vs. timing requirements
— lower CPU voltage/frequency → longer to complete
Task workloads change dynamically
— WCET overestimates actual execution time
— wide variation of execution times
–Longest vs. shortest times
Sleeping: 1-2 orders of magnitude less power
— DVS below threshold → more energy than sleeping
— Long idle → more energy than sleeping
— But wakeup overhead (cold misses in cache)
Motivation

Real-world examples:
— graphics: 78% of WCET [Wegener/Mueller]
— defense: 87%; automotive: 74%
— benchmarks: 30-89%; image recognition: 85% [Wolf]
Prior DVS algorithms: lack adaptability to dynamic workloads
Fig. Look-ahead DVS [Pillai/Shin]: normalized energy vs. WCET utilization (0.1 to 1), for c = 50% WCET and c in [20% WCET, 80% WCET]
Contribution


A feedback-based framework for dynamic workloads
[LCTES’02, RTAS’04, LCTES’05]
New: A hybrid sleep+DVS scheme, 2 observations:
1. Limit to DVS → use sleep below certain threshold
2. Trade-off idle vs. sleep → depends on length of inactivity
3. Feedback helps in these decisions

Simulation experiments

Comparison with prior work
Related Work

Dynamic Voltage Scaling
— General purpose DVS: Weiser, Govil, Pering, Grunwald
— Real-time DVS: Lee, Pillai, Aydin
— Optimality of DVS: Ishihara, Qu, Lorch, Xie, Saewong

Feedback Real-time Scheduling
— Stankovic, Lu, Varma, Poellabauer, Minerick

Leakage-aware DVS scheduling
— Lee, Quan, Jejurikar ’04/’05, Zhang
— We compare with Jejurikar’05 (closest related, best scheme)
Feedback-DVS Framework

V/f selector:
error = $c_i - C_i^A$
(V,f) = func(error)
Fig. Feedback-DVS Framework
Maximum EDF schedule
→ determine slack in EDF schedule
→ assumes: c = WCET
Voltage-Frequency Selector
α: $\frac{1}{\alpha}\frac{C_k}{P_k} + \sum_{i \in \{1,\dots,n\} \setminus \{k\}} \frac{C_i}{P_i} \le 1$
Task splitting with WCET: $C_i = C_i^A + C_i^B$
— $C_i^A$ at freq. α (0 < α ≤ 100%); $C_i^B$ at max. freq.
— More aggressive:
– α < uniform frequency w/o splitting
— Objective:
– T finishes within the 1st portion → lower energy consumption
α = $C_i^A / (C_i^A + \text{slack})$

Still guaranteed to meet deadline
→ proof in prior paper
Fig. Frequency f vs. time t for a split task: Ta runs for $C_i^A/\alpha$ at reduced frequency, then Tb runs for $C_i^B$ at 100%
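The task-splitting rule can be made concrete with a small calculation. This is a sketch under stated assumptions, not the selector from the prior papers: `c_a` (the first portion of the WCET) and `slack` are simply given as inputs, frequencies are continuous rather than discrete, and all names are hypothetical.

```python
def split_speed(c_a, slack):
    """Scaling factor for the first portion: alpha = C_A / (C_A + slack)."""
    if c_a <= 0 or slack < 0:
        raise ValueError("need c_a > 0 and slack >= 0")
    return c_a / (c_a + slack)


def split_schedule(wcet, c_a, slack):
    """Split a WCET budget into C_A (run at alpha) and C_B (run at f_max).

    Returns (alpha, wall-clock length of part A, length of part B).
    Part A stretches to c_a / alpha, which equals c_a + slack.
    """
    if not 0 < c_a <= wcet:
        raise ValueError("need 0 < c_a <= wcet")
    alpha = split_speed(c_a, slack)
    return alpha, c_a / alpha, wcet - c_a


def uniform_speed(wcet, slack):
    """Scaling factor if the whole WCET were stretched over wcet + slack."""
    return wcet / (wcet + slack)


# Hypothetical numbers: WCET of 10 time units, 6 of them in part A, slack of 6.
wcet, c_a, slack = 10, 6, 6
alpha, len_a, len_b = split_schedule(wcet, c_a, slack)
print(alpha, len_a, len_b, uniform_speed(wcet, slack))
```

With these inputs the worst case still completes within `wcet + slack` time units, while the first portion runs at a lower frequency than a uniform slowdown of the whole task would allow.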
Extension to Leakage-aware DVS

Power ~ $p_t C_L V_{dd}^2 f_{clk} + I_{leak} V_{dd} + P_{short}$
Dynamic power does NOT dominate anymore!

Static power exceeds dynamic power when the voltage is reduced below a threshold value, the critical speed
— Voltage below threshold → not energy efficient anymore
— Sleeping may be better
But need to consider wakeup overhead
— Mostly due to cache refill
— Calculated statically based on time to refill reused lines
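One common way to weigh sleeping against the wakeup overhead is a break-even idle time. The sketch below is an illustration under assumptions, not the power model used in this talk: the wakeup cost is folded into a single energy value `e_wakeup`, idle and sleep power are constants, and all names and numbers are hypothetical.

```python
def break_even_time(p_idle, p_sleep, e_wakeup):
    """Idle length at which sleeping and idling cost the same energy.

    Idling for t costs p_idle * t; sleeping costs p_sleep * t + e_wakeup.
    Setting both equal gives t = e_wakeup / (p_idle - p_sleep).
    """
    if p_idle <= p_sleep:
        raise ValueError("sleep must draw less power than idle")
    return e_wakeup / (p_idle - p_sleep)


def should_sleep(idle_time, p_idle, p_sleep, e_wakeup):
    """Sleep only if the idle slot is longer than the break-even time."""
    return idle_time > break_even_time(p_idle, p_sleep, e_wakeup)


# Hypothetical values: power in mW, time in ms, energy in microjoules.
P_IDLE, P_SLEEP, E_WAKEUP = 100, 4, 480
print(break_even_time(P_IDLE, P_SLEEP, E_WAKEUP))  # break-even in ms
```

A wakeup cost dominated by cache refill would enter this sketch through `e_wakeup`, so a larger reused cache footprint pushes the break-even time up.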
Speed Reduction vs. Task Delaying
Fig. Two timelines of task T over time t: speed reduction vs. delaying the start time

Why delay the start time of a task?
— To maximize the CPU sleeping time
Delay Dispatching a Task
Fig. Three schedules of T1, T2, T3 over time t: (i) Consider Schedule, with idle slots idle1 and idle2; (ii) No Delay; (iii) Delay, with sleep intervals; WCET and CB are marked
1. If idle1+idle2 > tth before DVS but < tth afterwards → no DVS
2. If idle1+idle2 < tth → no delay
3. If idle1 < CB → no delay
4. Otherwise delay
(tth: threshold for sleep)

Still guaranteed to meet deadline → proof in paper
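The four dispatch rules can be read as a first-match decision chain. The sketch below encodes that literal reading and nothing more: how the rules interact beyond their listed order is not stated on the slide, so the first-match ordering, the function name, and the `idle_after_dvs` parameter (total idle time left once DVS has stretched execution) are assumptions made for illustration.

```python
def dispatch_rule(idle1, idle2, idle_after_dvs, t_th, c_b):
    """Return the outcome of the first rule that applies.

    idle1, idle2    -- idle slots before DVS is applied
    idle_after_dvs  -- total idle time that would remain after DVS (assumed input)
    t_th            -- threshold for sleep
    c_b             -- length of the C_B portion
    """
    total = idle1 + idle2
    # Rule 1: DVS would shrink a sleepable idle period below the threshold.
    if total > t_th and idle_after_dvs < t_th:
        return "no DVS"
    # Rule 2: even the combined idle time is too short to sleep.
    if total < t_th:
        return "no delay"
    # Rule 3: the first idle slot is shorter than C_B.
    if idle1 < c_b:
        return "no delay"
    # Rule 4: otherwise, delay the task.
    return "delay"


# Hypothetical slots, all in the same time unit.
print(dispatch_rule(idle1=3, idle2=6, idle_after_dvs=7, t_th=5, c_b=2))
```

Returning the matched outcome as a string keeps the sketch close to the slide; a real scheduler would more likely return separate flags for the DVS and delay decisions.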
Scaling below the Critical Speed


Pure DVS: should never scale frequency below critical speed
DVS combined with sleeping:
— sleep if idle slot > threshold tth
— If idle slot is too short (< tth), scale below critical speed
– No other work to do (in contrast to non real-time)
– Lower frequency/voltage → power savings
Experimental Framework




Scheduling simulator
— Accurately reflects energy trends [Zhu’05] → PPC405LP
Use the same power model as [Jejurikar’04]
— Critical speed, wakeup cost
Assume four discrete frequency levels:
— 25%, 50%, 75%, 100% of fmax
Compare energy in hyperperiod (const. amount of work) for
— Pure Feedback-DVS
— DVS+sleep: Feedback-DVS w/ sleep policy (no delay policy)
— DSR-DP: dyn. procrastination+slack reclamation [Jejurikar’05]
— DVSleak: feedback-DVS w/ sleep & delay now/later policies
— Lower-bound schedule: best frequency + sleep for max. idle
3 Tasks, Const. Execution, 25% WCET
Fig. Energy consumption (mJ) vs. WCET utilization (0.1 to 0.9) for Pure DVS, DVS+sleep, DSR-DP, DVSleak, and Lower Bound


Significant savings w/ sleep, more for low utilizations
DVSleak: Delay → most impact for medium to high utilizations
— Close to lower bound
3 Tasks, Const. Execution, 75% WCET
Fig. Energy consumption (mJ) vs. WCET utilization (0.1 to 0.9) for Pure DVS, DVS+sleep, DSR-DP, DVSleak, and Lower Bound

All schemes: resilient to actual/WCET ratio

DVSleak never worse than other schemes, savings:
— 50% over pure, 20% over DVS+sleep, 8.5% over DSR-DP
3 Tasks, Var. Execution (pat1), 75% WCET
Fig. Energy consumption (mJ) vs. WCET utilization (0.1 to 0.9) for Pure DVS, DVS+sleep, DSR-DP, DVSleak, and Lower Bound


DVSleak: more resilient to fluctuating exec. times (unchanged)
→ feedback helps!
All others: 5-10% more energy consumption than for const. exec.
10 Tasks, Const. Execution, 25% WCET
Fig. Energy consumption (mJ) vs. WCET utilization (0.1 to 0.9) for Pure DVS, DVS+sleep, DSR-DP, DVSleak, and Lower Bound

More tasks → 5-10% higher energy cost (switching)

DVSleak still best of all (~ same margin)
Length of Task Periods




U=60%, E normalized to hyperperiod task set 2, c=50% WCET
Harmonic (1) vs. non-harmonic (2):
— 10-27% more energy for non-harmonic → cannot fold jobs released at same time → more uncertainty
Longer (2) vs. shorter (3) periods for non-harmonic:
— 2-28% more energy for shorter periods → more job releases, less sleep time
— DVSleak ~ 15% lower energy than DSR-DP
Feedback more important for shorter periods
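The effect of period structure on job releases can be illustrated by counting releases per hyperperiod. This is a sketch for intuition only, not part of the simulator: it assumes integer periods, synchronous release at time 0, and Python 3.9+ for `math.lcm`; the period sets and function names are hypothetical.

```python
from math import lcm


def hyperperiod(periods):
    """Least common multiple of all task periods."""
    return lcm(*periods)


def is_harmonic(periods):
    """True if every period divides each longer period."""
    ps = sorted(periods)
    return all(b % a == 0 for a, b in zip(ps, ps[1:]))


def releases_per_hyperperiod(periods):
    """Total number of jobs released within one hyperperiod."""
    h = hyperperiod(periods)
    return sum(h // p for p in periods)


def distinct_release_instants(periods):
    """Number of distinct points in time at which at least one job is released."""
    h = hyperperiod(periods)
    return len({k * p for p in periods for k in range(h // p)})


harmonic = [4, 8]        # hypothetical harmonic set
non_harmonic = [4, 6]    # hypothetical non-harmonic set
for ps in (harmonic, non_harmonic):
    print(ps, is_harmonic(ps), hyperperiod(ps),
          releases_per_hyperperiod(ps), distinct_release_instants(ps))
```

In the harmonic set every release of the longer-period task coincides with a release of the shorter-period task, whereas the non-harmonic set scatters releases over more distinct instants, which breaks idle time into more pieces.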
Conclusion


DVSleak: Novel Feedback DVS + leakage (sleep), benefits for
— fluctuating execution times
— shorter task periods
→ can scale below critical speed
— medium utilizations (most common)
→ sleep policy by itself enough for high/low utilizations (always sleep/never sleep)
DVSleak energy savings over other schemes:
— avg. 50% over DVS-only
— avg. 20% over DVS+sleep
— avg. 8.5% over [Jejurikar’05]
— Sleep now/later important when actual exec. << WCET

Prior: Evaluation on a real embedded platform
— V²f model works for OS scheduling
Future Work



Implementation on IBM PPC 405LP test board
Has been used for DVS experiments
— Oscilloscope, data acquisition card for voltage / current
Assessing sleep modes
1. Clock suspend → same power, all still up
2. Suspend → 1/10 power, SDRAM up
3. Hibernate → N/A (SDRAM → NVRAM)
4. Standby → N/A (APM over I2C)

Need faster resume (reactivating devices slow → low-power modes)