Exploiting Symmetry in SAT-Based Boolean Matching for
Download
Report
Transcript Exploiting Symmetry in SAT-Based Boolean Matching for
Optimality and Improvement of
Dynamic Voltage Scaling Algorithms for
Multimedia Applications
Authors:
Zhen Cao, Brian Foo, Lei He and Mihaela van der Schaar
Presented by :
Amarnath Kasibhatla & ViswaKiran Popuri
Electrical Engineering Dept., UCLA
Outline
Background
Problem formulation
Optimal Offline Solution
Effective Online Algorithm
Simulations and Results
Conclusions
Overview of the Background Review
Why DVS is required and how it helps?
Brief Review of DVS algorithms
Changes in Communication & Computation
From Wall-plugged Towards Portable systems
Battery did not scale as much! Has LIMITED ENERGY
And that limited energy is expected to last long.
From General-purpose Towards Intelligent systems
Computationally more expensive (freq. of operation is ever
increasing)! Requires MORE ENERGY
From Continuous-Mode Towards Burst-Mode
Multimedia (PEAK requirement >> AVERAGE throughput)
For e.g. Video transmission over Mobile phones
Need an ENERGY-efficient processor to
support high peak operating freq while
conserving energy in other modes.
CMOS & DV(F)S
Energy = Csw*Vdd2*fclk*Texec (Quadratic wrt Vdd)
Delay α Vdd/(Vdd–Vth)α (Freq scales Linearly wrt Vdd )
Dynamic Voltage Scaling: Reduce Vdd to an extent
that the Delay requirements are just met.
Example: fclk = (n/ ΔT)
Case1:
Texec= ΔT/2
ΔT
E1 = (1/2)*CVdd2n
Case2:
Texec= ΔT, (Vdd/2), fclk/2
E2 = (1/8)*CVdd2n
= (1/4)*E1
75 % Reduction in Energy!
Taken from *[1]
Need and Challenge of Power Management
Multimedia applications are energy sensitive
Computationally demanding
Stringent deadlines for multiple tasks
Processed on energy-limited devices
DVS (dynamic voltage scaling) algorithms to reduce
energy while meeting deadlines
DVS algorithms need to deal with uncertainty in
Job complexity (time-varying workloads)
Communication delay (e.g. wireless channel)
DVS Hardware/Algorithm requirements
Hardware: Programmable DC-DC switching voltage
regulator, Programmable clock gen & Processor with
multiple Operating Points.
Algorithm: Accurate Prediction of workload requirement
for a give computation task.
Error in Prediction causes Deadline miss or Low energyefficiency
Taken from *[1]
Overview of the background review
Why DVS is required and how it helps?
Types of DVS algorithms
DVS algorithms
Single-task deadline based
Multiple-task deadline based
Feedback control based
Stochastic Model based
Single-task deadline based
Uses Worst-Case Exec Time (WCET) or Average Case
Exec Time (ACET)
Frame-based DVS [1]
Each frame is handled individually for accurate prediction of
decoding time.
Cross-layer adaptation [2]
Adapts Hardware layer (Vdd Scaling) for small changes in
processing overload (fine granularity).
Taken from *[2]
(+) Low computational Complexity
(-) Tasks with imminent deadlines take huge energy to finish
task in time.
Multiple-task deadline based
Real Time-DVS (RT-DVS) [3]
Normal-DVS
Based
on average throughput (For e.g ACET)
Simple feedback mechanism: Detect the idle time
and adjust the freq. This might cause deadline miss
DVS must be tightly coupled with the real-time scheduler of the
OS.
Rate-Monotonic
Scheduler (RM): Static, Quickest
Task first
Earliest-Deadline-First Scheduler (EDF): Dynamic,
Task with earliest deadline first
Multiple-task deadline based
Look-Ahead RT-DVS: Defer as much work as possible.
Set oper freq to meet the minimum work that must be
done to meet the deadlines.
Taken from *[3]
Feedback Control Based
Earlier approaches of hard real-time
scheduling rely on a priori
knowledge of WCET.
The actual execution time varies a
lot from the estimated WCET.
If actual exec time is lesser, proc
consumes more Energy ( & hence
computed earlier) than required.
~60% degradation
For e.g. 60% degradation in RTDVS for fluctuating workload.
Use feedback control techniques in
real-time scheduling for hard realtime systems such that the DVS
scheme should adjust to the everchanging workload as fast as
possible.
Taken from *[4]
Feedback Control Based
Feedback-DVS [4]
Based on the difference
between CA and the actual
exec time (error) & the
WCETs (Maximal Schedule
Profile) the Vol/Freq
selector chooses a V,F.
Using the V,F input, the
scheduler schedules the
next ready task (from the
Task Queue) using EDF
policy.
The estimated exec time for
the next job is fed-back for
later decision making.
Taken from *[4]
CA : Execution time of
the first portion of tasks.
Maximal Schedule
Profile: Has the offline
generated exec times.
Complexity of Multimedia Applications
Huge work-load variations between different classes of jobs
Work load distribution within each class of decoding jobs is
observed to estimate mean and variance through off-line
training.
Model for Stochastic Complexity of Jobs
Class-based stochastic model [5]
A job class is a particular GOP frame type
Near Gaussian distributed complexity
Parameters derived offline and transmitted online with low cost
B. Foo and M. van der Schaar, "A Queuing Theoretic Approach to Processor Power
Adaptation for Video Decoding Systems," IEEE Trans. Signal Process., vol. 56, no. 1,
pp. 378-392, Jan. 2008
References
[1] “Frame-Based Dynamic Voltage and Frequency Scaling for a
MPEG Decoder” Kihwan Choi et al, ICCAD 2002
[2] “GRACE: Cross-layer adaptation for multimedia quality and
battery energy” W. Yuan et al, IEEE trans. On Mobile Computing,
2006
[3] “Real time dynamic voltage scaling for low power embedded
operating systems” P.Pillai et al, Proc of ACM symposium on
Operating Systems, 2001
[4] “Feedback EDF scheduling exploiting dynamic voltage scaling”
Y. Zhu et al, Proc. Of International Conf on Comp. Arch, 2004
[5] "A Queuing Theoretic Approach to Processor Power Adaptation
for Video Decoding Systems," B. Foo and M. van der Schaar, IEEE
Trans. Signal Process., vol. 56, no. 1, pp. 378-392, Jan. 2008
Contributions
Efficient LP-based (instead of ILP-based) optimal offline
DVS algorithm.
It is not clear how far are the existing online algorithms are
from the optimal solution.
Existing ILP-based offline solutions are not scalable.
Current offline algorithm enables optimality study of online
DVS algorithms.
Effective online approach by sequential robust linear
programming, namely SLP/r.
Consumes 0.3% more energy versus 4% more energy for
the best existing work, when both compared with optimal
solution.
Applicable to other delay-sensitive applications with timevarying workloads.
e.g. real-time stream queries for financial or medical data
and manufacturing process control.
Power Model
The power model used for Dynamic power:
Sub-threshold leakage power
Lg is the number of devices in the circuit, Ij is the reverse-bias junction
current, Vbs is the body bias voltage. K3, K4 and K5 are constant
fitting parameters.
Sleep mode of operation is also considered (which makes the power
model non-convex):
Power = 0 ; Frequency = 0
Outline
Background
Problem formulation
Optimal Offline Solution
Effective Online Algorithm
Simulations and Results
Conclusions
Formulation of DVS Problem
Given
A sequence of decoding jobs (stochastic complexity,
stochastic arrival time, deterministic deadline).
A set of voltages including power gating, each with
associated clock frequency and power.
Find
The time and voltage level for each voltage switch.
Minimize
Energy.
Subject to
Start a job after it arrives.
Finish a job before its deadline.
Formulation of DVS Problem - Contd.
Let C = {C1, C2 ... CM}, T = {T1,T2 ... TM} and D = {D1, D2 ... DM} be the
complexity, arrival time and display deadlines of M incoming jobs.
Let F = {F0, F1, ... FK} and P = {P0, P1 ... PK} be the available frequency
and power switch levels.
The scheduling solution S = {Ts, Vs, N}, where N is the number of
voltage switches; Ts = {t0, t1, ... tN, tN+1} and Vs = {v0, v1 .. vN} are the
time and voltage levels for each switch, then the DVS problem is:
Outline
Background
Problem formulation
Optimal Offline Solution
Effective Online Algorithm
Simulations and Results
Conclusions
DVS Problem in Time – Complexity Space
DVS Problem in Time – Complexity Space
Property of DVS Solution
Compared to multimedia jobs (around 109 clock cycles), voltage switching
overhead (around 10 clock cycles) is negligible.
We call time interval with constant U(t) and L(t) as the adaptation interval.
Primary Theorem: for an adaptation interval, an arbitrary ordering of any
accumulative computation curve consumes the same energy.
What really matters is the percentage of time for each voltage level.
U(t)
U(t)
L(t)
L(t)
Seq : 2 ,0 ,1 ,3 ,4
Seq : 0 ,1 ,2 ,3 ,4
LP Formulation for DVS
Length of adaptation
interval i
s. t.
Frequency
Power j
allocation of voltage level j in
adaptation interval i
Outline
Background
Problem formulation
Optimal Offline Solution
Effective Online Algorithm
Simulations and Results
Conclusions
rLP Formulation
s. t.
Difference from offline formulation
L(t) and U(t) become stochastic
L(t) and U(t) depend on stochastic complexity of jobs
A sequence of rLP for a sequence of time windows, each window is
bigger than an adaptation interval
Illustration of SLP/r
Prediction
……
U(t) for robust
linear programming
SD
mean
T1 '
T2'
T3 '
D1
L(t) for robust
linear programming
D2
D3Media
Time
Illustration of SLP/r
Prediction
rLP
……
U(t) for robust
linear programming
L(t) for robust
linear programming
T1 '
T2'
T3 '
D1
D2
D3Media
Time
Illustration of SLP/r
Prediction
Commitment
rLP
U(t)
real complexity
and arrive time of jobs
……
L(t) for robust
linear programming
T1 '
T2'
T3 '
D1
D2
D3Media
Time
Illustration of SLP/r
Prediction
Commitment
rLP
U(t)
real complexity
and arrive time of jobs
……
L(t) for robust
linear programming
T1 '
T2'
T3 '
D1
D2
D3Media
Time
Illustration of SLP/r
Prediction
Commitment
rLP
Process a new time window
U(t)
real complexity
and arrive time of jobs
L(t) for robust
linear programming
D1
D2
D3Media
Time
Outline
Background
Problem formulation
Optimal Offline Solution
Effective Online Algorithm
Simulations and Results
Conclusions
Experimental Setup
Vdd between 0.6V and 1.0V with step sizes of 0.1V, plus power
gating
Our algorithms are applicable to any power model.
Video sequence consisting of 10 different scenes.
Compare SLP/r with:
Queuing-Based Stochastic Algorithm [Foo, 2008]
Deterministic laEDF [Pillai, 2001]
Monte Carlo simulation of stochastic complexity and arrival
time to verify all results
Recap of SLP/r
Confidence level
Linear online prediction function for workload of each job
class:
mean + k * standard deviation
Deciding trade-off between miss rate and energy
Granularity of SLP/r
Number of jobs to commit before shifting the window
Deciding tradeoff between runtime and quality of solution
Comparison of Energy/Miss Rate
Granularity = 1 job
Optimality study:
laEDF: 15% more
energy.
Queuing-based: 4%
more.
Online algorithm SLP/r
1% more energy
Granularity VS Quality
Changing granularity from 1 to 4jobs
We reduce runtime, energy and miss rate
simultaneously
Granularity = 4 jobs:
0.03% miss rate with 0.3% more
energy than optimal
Energy VS Granularity/Confidence Level
Minimum energy setting:
Granularity = 4 jobs
Confidence level =1.5
Miss Rate VS Granularity/Confidence Level
Granularity = 4 jobs
Confidence level =1.5,
Miss rate close to zero
Outline
Background
Optimal Offline Solution
Effective Online Algorithm
Simulations and Results
Conclusions
Conclusions
An efficient optimal offline DVS algorithm based a tractable LP
formulation.
Enables optimality study for DVS algorithms.
An effective online approach SLP/r by sequential robust linear
programming.
Consumes 0.3% more energy versus 4% for the best
existing work, when both compared with optimal solution.
Thanks!
Q&A
Back-up slides
Existing Online Algorithms
Uses worst or average case execution time [Pillai ACM
Symposium on OS’01] [Choi, ICCAD’02] [Zhu LCTES’07] [Nahrstedt et
al., ITMC’06]
Ensures hard deadlines not missed
Soft deadlines (and slack reclamation) to reduce energy
consumption
Online workload prediction
Feedback-based [Zhu LCTES’07]
Adaptive linear prediction [Akyol 2007]
Buffer-constrained DVS [Maxiaguine et al, ICHSC’05]
Stochastic Queuing-based DVS [Foo 2008]
It’s not clear how far online algorithms
are away from the optimal solution
Existing Offline Algorithms
Offline algorithms can be used for optimality study
Optimal solution assuming that complexity and arrival time
are known based on trace
Lower bound of energy for online algorithms
Existing ILP-based offline algorithms are not scalable
[Akyol, 2007] [Zhang, ICCAD’07]