Exploiting Symmetry in SAT-Based Boolean Matching for

Download Report

Transcript Exploiting Symmetry in SAT-Based Boolean Matching for

Optimality and Improvement of
Dynamic Voltage Scaling Algorithms for
Multimedia Applications
Authors:
Zhen Cao, Brian Foo, Lei He and Mihaela van der Schaar
Presented by :
Amarnath Kasibhatla & ViswaKiran Popuri
Electrical Engineering Dept., UCLA
Outline
 Background
 Problem formulation
 Optimal Offline Solution
 Effective Online Algorithm
 Simulations and Results
 Conclusions
Overview of the Background Review
 Why DVS is required and how it helps?
 Brief Review of DVS algorithms
Changes in Communication & Computation
 From Wall-plugged Towards Portable systems


Battery did not scale as much! Has LIMITED ENERGY
And that limited energy is expected to last long.
 From General-purpose Towards Intelligent systems

Computationally more expensive (freq. of operation is ever
increasing)! Requires MORE ENERGY
 From Continuous-Mode Towards Burst-Mode


Multimedia (PEAK requirement >> AVERAGE throughput)
For e.g. Video transmission over Mobile phones
Need an ENERGY-efficient processor to
support high peak operating freq while
conserving energy in other modes.
CMOS & DV(F)S
 Energy = Csw*Vdd2*fclk*Texec (Quadratic wrt Vdd)
 Delay α Vdd/(Vdd–Vth)α (Freq scales Linearly wrt Vdd )
 Dynamic Voltage Scaling: Reduce Vdd to an extent
that the Delay requirements are just met.
 Example: fclk = (n/ ΔT)
 Case1:
 Texec= ΔT/2
ΔT

E1 = (1/2)*CVdd2n
 Case2:
 Texec= ΔT, (Vdd/2), fclk/2

E2 = (1/8)*CVdd2n
= (1/4)*E1
75 % Reduction in Energy!
Taken from *[1]
Need and Challenge of Power Management
 Multimedia applications are energy sensitive

Computationally demanding

Stringent deadlines for multiple tasks

Processed on energy-limited devices
 DVS (dynamic voltage scaling) algorithms to reduce
energy while meeting deadlines
 DVS algorithms need to deal with uncertainty in

Job complexity (time-varying workloads)

Communication delay (e.g. wireless channel)
DVS Hardware/Algorithm requirements
 Hardware: Programmable DC-DC switching voltage
regulator, Programmable clock gen & Processor with
multiple Operating Points.
 Algorithm: Accurate Prediction of workload requirement
for a give computation task.

Error in Prediction causes Deadline miss or Low energyefficiency
Taken from *[1]
Overview of the background review
 Why DVS is required and how it helps?
 Types of DVS algorithms
DVS algorithms




Single-task deadline based
Multiple-task deadline based
Feedback control based
Stochastic Model based
Single-task deadline based
 Uses Worst-Case Exec Time (WCET) or Average Case
Exec Time (ACET)
 Frame-based DVS [1]

Each frame is handled individually for accurate prediction of
decoding time.
 Cross-layer adaptation [2]

Adapts Hardware layer (Vdd Scaling) for small changes in
processing overload (fine granularity).
Taken from *[2]
 (+) Low computational Complexity
 (-) Tasks with imminent deadlines take huge energy to finish
task in time.
Multiple-task deadline based
 Real Time-DVS (RT-DVS) [3]

Normal-DVS
 Based
on average throughput (For e.g ACET)
 Simple feedback mechanism: Detect the idle time
and adjust the freq. This might cause deadline miss

DVS must be tightly coupled with the real-time scheduler of the
OS.
 Rate-Monotonic
Scheduler (RM): Static, Quickest
Task first
 Earliest-Deadline-First Scheduler (EDF): Dynamic,
Task with earliest deadline first
Multiple-task deadline based
 Look-Ahead RT-DVS: Defer as much work as possible.
Set oper freq to meet the minimum work that must be
done to meet the deadlines.
Taken from *[3]
Feedback Control Based
 Earlier approaches of hard real-time
scheduling rely on a priori
knowledge of WCET.
 The actual execution time varies a
lot from the estimated WCET.
 If actual exec time is lesser, proc
consumes more Energy ( & hence
computed earlier) than required.

~60% degradation
For e.g. 60% degradation in RTDVS for fluctuating workload.
 Use feedback control techniques in
real-time scheduling for hard realtime systems such that the DVS
scheme should adjust to the everchanging workload as fast as
possible.
Taken from *[4]
Feedback Control Based
 Feedback-DVS [4]
 Based on the difference
between CA and the actual
exec time (error) & the
WCETs (Maximal Schedule
Profile) the Vol/Freq
selector chooses a V,F.
 Using the V,F input, the
scheduler schedules the
next ready task (from the
Task Queue) using EDF
policy.
 The estimated exec time for
the next job is fed-back for
later decision making.
Taken from *[4]
 CA : Execution time of
the first portion of tasks.
 Maximal Schedule
Profile: Has the offline
generated exec times.
Complexity of Multimedia Applications
 Huge work-load variations between different classes of jobs
 Work load distribution within each class of decoding jobs is
observed to estimate mean and variance through off-line
training.
Model for Stochastic Complexity of Jobs
 Class-based stochastic model [5]



A job class is a particular GOP frame type
Near Gaussian distributed complexity
Parameters derived offline and transmitted online with low cost
B. Foo and M. van der Schaar, "A Queuing Theoretic Approach to Processor Power
Adaptation for Video Decoding Systems," IEEE Trans. Signal Process., vol. 56, no. 1,
pp. 378-392, Jan. 2008
References
 [1] “Frame-Based Dynamic Voltage and Frequency Scaling for a
MPEG Decoder” Kihwan Choi et al, ICCAD 2002
 [2] “GRACE: Cross-layer adaptation for multimedia quality and
battery energy” W. Yuan et al, IEEE trans. On Mobile Computing,
2006
 [3] “Real time dynamic voltage scaling for low power embedded
operating systems” P.Pillai et al, Proc of ACM symposium on
Operating Systems, 2001
 [4] “Feedback EDF scheduling exploiting dynamic voltage scaling”
Y. Zhu et al, Proc. Of International Conf on Comp. Arch, 2004
 [5] "A Queuing Theoretic Approach to Processor Power Adaptation
for Video Decoding Systems," B. Foo and M. van der Schaar, IEEE
Trans. Signal Process., vol. 56, no. 1, pp. 378-392, Jan. 2008
Contributions
 Efficient LP-based (instead of ILP-based) optimal offline
DVS algorithm.



It is not clear how far are the existing online algorithms are
from the optimal solution.
Existing ILP-based offline solutions are not scalable.
Current offline algorithm enables optimality study of online
DVS algorithms.
 Effective online approach by sequential robust linear
programming, namely SLP/r.

Consumes 0.3% more energy versus 4% more energy for
the best existing work, when both compared with optimal
solution.
 Applicable to other delay-sensitive applications with timevarying workloads.

e.g. real-time stream queries for financial or medical data
and manufacturing process control.
Power Model
 The power model used for Dynamic power:
 Sub-threshold leakage power
 Lg is the number of devices in the circuit, Ij is the reverse-bias junction
current, Vbs is the body bias voltage. K3, K4 and K5 are constant
fitting parameters.
 Sleep mode of operation is also considered (which makes the power
model non-convex):
Power = 0 ; Frequency = 0
Outline
 Background
 Problem formulation
 Optimal Offline Solution
 Effective Online Algorithm
 Simulations and Results
 Conclusions
Formulation of DVS Problem
 Given


A sequence of decoding jobs (stochastic complexity,
stochastic arrival time, deterministic deadline).
A set of voltages including power gating, each with
associated clock frequency and power.
 Find

The time and voltage level for each voltage switch.
 Minimize

Energy.
 Subject to


Start a job after it arrives.
Finish a job before its deadline.
Formulation of DVS Problem - Contd.
 Let C = {C1, C2 ... CM}, T = {T1,T2 ... TM} and D = {D1, D2 ... DM} be the
complexity, arrival time and display deadlines of M incoming jobs.
 Let F = {F0, F1, ... FK} and P = {P0, P1 ... PK} be the available frequency
and power switch levels.
 The scheduling solution S = {Ts, Vs, N}, where N is the number of
voltage switches; Ts = {t0, t1, ... tN, tN+1} and Vs = {v0, v1 .. vN} are the
time and voltage levels for each switch, then the DVS problem is:
Outline
 Background
 Problem formulation
 Optimal Offline Solution
 Effective Online Algorithm
 Simulations and Results
 Conclusions
DVS Problem in Time – Complexity Space
DVS Problem in Time – Complexity Space
Property of DVS Solution
 Compared to multimedia jobs (around 109 clock cycles), voltage switching
overhead (around 10 clock cycles) is negligible.
 We call time interval with constant U(t) and L(t) as the adaptation interval.
 Primary Theorem: for an adaptation interval, an arbitrary ordering of any
accumulative computation curve consumes the same energy.

What really matters is the percentage of time for each voltage level.
U(t)
U(t)
L(t)
L(t)
Seq : 2 ,0 ,1 ,3 ,4
Seq : 0 ,1 ,2 ,3 ,4
LP Formulation for DVS
Length of adaptation
interval i
s. t.
Frequency
Power j
allocation of voltage level j in
adaptation interval i
Outline
 Background
 Problem formulation
 Optimal Offline Solution
 Effective Online Algorithm
 Simulations and Results
 Conclusions
rLP Formulation
s. t.
Difference from offline formulation
L(t) and U(t) become stochastic
 L(t) and U(t) depend on stochastic complexity of jobs
 A sequence of rLP for a sequence of time windows, each window is
bigger than an adaptation interval
Illustration of SLP/r
Prediction
……
U(t) for robust
linear programming
SD
mean
T1 '
T2'
T3 '
D1
L(t) for robust
linear programming
D2
D3Media
Time
Illustration of SLP/r
Prediction
rLP
……
U(t) for robust
linear programming
L(t) for robust
linear programming
T1 '
T2'
T3 '
D1
D2
D3Media
Time
Illustration of SLP/r
Prediction
Commitment
rLP
U(t)
real complexity
and arrive time of jobs
……
L(t) for robust
linear programming
T1 '
T2'
T3 '
D1
D2
D3Media
Time
Illustration of SLP/r
Prediction
Commitment
rLP
U(t)
real complexity
and arrive time of jobs
……
L(t) for robust
linear programming
T1 '
T2'
T3 '
D1
D2
D3Media
Time
Illustration of SLP/r
Prediction
Commitment
rLP
Process a new time window
U(t)
real complexity
and arrive time of jobs
L(t) for robust
linear programming
D1
D2
D3Media
Time
Outline
 Background
 Problem formulation
 Optimal Offline Solution
 Effective Online Algorithm
 Simulations and Results
 Conclusions
Experimental Setup
 Vdd between 0.6V and 1.0V with step sizes of 0.1V, plus power
gating

Our algorithms are applicable to any power model.
 Video sequence consisting of 10 different scenes.
 Compare SLP/r with:


Queuing-Based Stochastic Algorithm [Foo, 2008]
Deterministic laEDF [Pillai, 2001]
 Monte Carlo simulation of stochastic complexity and arrival
time to verify all results
Recap of SLP/r
 Confidence level

Linear online prediction function for workload of each job
class:
mean + k * standard deviation

Deciding trade-off between miss rate and energy
 Granularity of SLP/r


Number of jobs to commit before shifting the window
Deciding tradeoff between runtime and quality of solution
Comparison of Energy/Miss Rate
 Granularity = 1 job
Optimality study:
laEDF: 15% more
energy.
Queuing-based: 4%
more.
Online algorithm SLP/r
1% more energy
Granularity VS Quality
Changing granularity from 1 to 4jobs
We reduce runtime, energy and miss rate
simultaneously
Granularity = 4 jobs:
0.03% miss rate with 0.3% more
energy than optimal
Energy VS Granularity/Confidence Level
Minimum energy setting:
Granularity = 4 jobs
Confidence level =1.5
Miss Rate VS Granularity/Confidence Level
Granularity = 4 jobs
Confidence level =1.5,
Miss rate close to zero
Outline
 Background
 Optimal Offline Solution
 Effective Online Algorithm
 Simulations and Results
 Conclusions
Conclusions
 An efficient optimal offline DVS algorithm based a tractable LP
formulation.

Enables optimality study for DVS algorithms.
 An effective online approach SLP/r by sequential robust linear
programming.

Consumes 0.3% more energy versus 4% for the best
existing work, when both compared with optimal solution.
Thanks!
Q&A
Back-up slides
Existing Online Algorithms
 Uses worst or average case execution time [Pillai ACM
Symposium on OS’01] [Choi, ICCAD’02] [Zhu LCTES’07] [Nahrstedt et
al., ITMC’06]


Ensures hard deadlines not missed
Soft deadlines (and slack reclamation) to reduce energy
consumption
 Online workload prediction




Feedback-based [Zhu LCTES’07]
Adaptive linear prediction [Akyol 2007]
Buffer-constrained DVS [Maxiaguine et al, ICHSC’05]
Stochastic Queuing-based DVS [Foo 2008]
It’s not clear how far online algorithms
are away from the optimal solution
Existing Offline Algorithms
 Offline algorithms can be used for optimality study


Optimal solution assuming that complexity and arrival time
are known based on trace
Lower bound of energy for online algorithms
 Existing ILP-based offline algorithms are not scalable

[Akyol, 2007] [Zhang, ICCAD’07]