Real-Time Scheduling
for
Multiprocessor Platforms
Marko Bertogna
Scuola Superiore S.Anna,
Pisa, Italy
Outline

- Why do we need multiprocessor RTS?
- Why are multiprocessors difficult?
- Global vs partitioned scheduling
- Existing scheduling algorithms
- Schedulability results
- Evaluation of existing techniques
- Future research
As Moore's law goes on…

- Number of transistors/chip doubles every 18 to 24 months
- …heating becomes a problem: the Pentium Tejas was cancelled!

[Figure: power density (W/cm²), log scale from 0.1 to 1000, vs. year (1971-2008). The curve climbs from the 4004/8008/8080/8085 through the 8086/286/386/486 to the Pentium P1-P4, crossing the "hot plate" level and heading toward nuclear-reactor power densities.]

P ∝ V² f: clock speed limited to less than 4 GHz
Why did it happen?

Integration technology constantly improves…

- CMOS manufacturing process
- Half-pitch: half the distance between identical features in an array of transistors
Technology trends

- Reduced gate sizes
- Higher frequencies allowed
- More space available

BUT

- Physical limits of semiconductor-based microelectronics
- Larger dynamic power consumed
- Leakage current becomes important
- Problematic data synchronization (GALS)
Frequency

Lead microprocessor frequency doubled every 2 years.

[Figure: frequency (MHz), log scale from 0.1 to 10000, vs. year (1970-2010), from the 4004/8008/8080/8085 through the 8086/286/386/486 to the Pentium and P6. Courtesy, Intel]
Intel's timeline

Year | Processor       | Manufacturing technology | Frequency | Number of transistors
1971 | 4004            | 10 µm   | 108 kHz  | 2,300
1972 | 8008            | 10 µm   | 800 kHz  | 3,500
1974 | 8080            | 6 µm    | 2 MHz    | 4,500
1978 | 8086            | 3 µm    | 5 MHz    | 29,000
1979 | 8088            | 3 µm    | 5 MHz    | 29,000
1982 | 286             | 1.5 µm  | 6 MHz    | 134,000
1985 | 386             | 1.5 µm  | 16 MHz   | 275,000
1989 | 486             | 1 µm    | 25 MHz   | 1,200,000
1993 | Pentium         | 0.8 µm  | 66 MHz   | 3,100,000
1995 | Pentium Pro     | 0.6 µm  | 200 MHz  | 5,500,000
1997 | Pentium II      | 0.25 µm | 300 MHz  | 7,500,000
1999 | Pentium III     | 0.18 µm | 500 MHz  | 9,500,000
2000 | Pentium 4       | 0.18 µm | 1.5 GHz  | 42,000,000
2002 | Pentium M       | 90 nm   | 1.7 GHz  | 55,000,000
2005 | Pentium D       | 65 nm   | 3.2 GHz  | 291,000,000
2006 | Core 2 Duo      | 65 nm   | 2.93 GHz | 291,000,000
2007 | Core 2 Quad     | 65 nm   | 2.66 GHz | 582,000,000
2008 | Core 2 Quad X   | 45 nm   | 3 GHz    | 820,000,000
2010 | Core i3, i5, i7 | 32 nm   | 3.33 GHz | …
Frequency and power

- f = operating frequency
- V = supply voltage (V ≈ 0.3 + 0.7 f)
  - Reducing the voltage therefore forces a more than proportional frequency reduction
- P = Pdynamic + Pstatic = power consumed
  - Pdynamic ∝ A C V² f (main contributor down to hundreds of nm)
  - Pstatic ∝ V · Ileak (due to leakage → important when going below 100 nm)
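The V ≈ 0.3 + 0.7f relation makes the multicore trade-off easy to check numerically. A minimal sketch (my own, not from the slides), in normalized units where f = 1 is the nominal frequency, V(1) = 1, and the activity A and capacitance C are folded into a single constant:

```python
def v(f):
    # supply voltage needed to sustain normalized frequency f
    return 0.3 + 0.7 * f

def p_dyn(f):
    # dynamic power P ∝ A C V^2 f, with A·C normalized to 1
    return v(f) ** 2 * f

one_fast = p_dyn(1.0)       # one core at full speed: 1.0
two_slow = 2 * p_dyn(0.5)   # two cores at half speed: 2 · 0.65² · 0.5 = 0.4225
print(two_slow / one_fast)  # ≈ 0.42: same cycles/s, ~58% less dynamic power
```

Under this model, two half-speed cores deliver the same aggregate cycles per second for well under half the dynamic power, which is exactly the argument for denser, slower chips.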

Design considerations (I)

P ∝ A C V² f + V Ileak

- Number of transistors grows → A grows
- Die size grows → C grows
- Reducing V would allow a quadratic reduction of dynamic power
- But clock frequency would decrease more than linearly, since V ≈ 0.3 + 0.7 f, unless the threshold voltage Vth is reduced as well
- But, again, there are problems: Ileak ≈ Isub + Igox increases when Vth is low!
Design considerations (II)

P ∝ A C V² f + V Ileak

- Reducing Vth and gate dimensions → leakage current becomes dominant in recent process technologies → static dissipation
- Static power dissipation is always present, unless the device state is lost

Summarizing: there is no way out for classic frequency scaling on single-core systems!
Power delivery and dissipation

[Figure: power (Watts), log scale from 0.1 to 100000, vs. year (1971-2008), from the 4004/8008/8080/8085 through the 8086/286/386/486 to the Pentium; the extrapolated trend reaches 500 W, 1.5 kW, 5 kW, 18 kW. Courtesy, Intel]
UltraSPARC power consumption

The largest amount of power is consumed by:
- Cores
- Leakage
Keeping Moore's law alive

- Exploit the immense number of transistors in other ways
- Reduce gate sizes, keeping the frequency sufficiently low
- Use a higher number of slower logic gates

In other words: switch to multicore systems!

Solution: denser chips with transistors operating at lower frequencies → MULTICORE SYSTEMS
What is this?

1979 - David May’s B0042 board - 42 Transputers
The multicore invasion (high-end)

- Intel's Core 2, Itanium, Xeon: 2, 4 cores
- AMD's Opteron, Athlon 64 X2, Phenom: 2, 4 cores
- IBM's POWER7: 8 cores
- IBM-Toshiba-Sony Cell processor: 8 cores (PSX3)
- Sun's Niagara UltraSPARC: 8 cores
- Microsoft's Xenon: 3 cores (Xbox 360)
- Tilera's TILE64: 64 cores
- Others (network processors, DSP, GPU, …)
The multicore invasion (embedded)

- ARM's MPCore: 4 cores
- ALTERA's Nios II: x cores
- Network processors are being replaced by multicore chips (Broadcom's 8-core processors)
- DSP: TI, Freescale, Atmel, Picochip (up to 300 cores, communication domain)
How many cores in the future?

- Application dependent
- Typically few for high-end computing
- Many trade-offs:
  - transistor density
  - technology limits
  - Amdahl's law
Beyond 2 billion transistors/chip

Intel's Tukwila:
- Itanium based
- 2.046 billion FETs
- Quad-core
- 65 nm technology
- 2 GHz at 170 W
- 30 MB cache
- 2-way SMT → 8 threads/clock
How many cores in the future?

- Intel's 80-core prototype already available
- Able to transfer a TB of data/s (while a Core 2 Duo reaches 1.66 GB of data/s)
How many cores in the future?

Berkeley: weather simulation at 1.5 km resolution, 1000× real time, with 3M custom-tailored Tensilica cores:
- Petaflop computer
- Power around 2.5 MW
- Estimated for 2011
- Cost around 75 M$
- Main obstacle will be on software
Supercomputers

- Petaflop supercomputers (current supercomputers have teraflop computing power)
- IBM's Bluegene/P:
  - 3 Pflops
  - 884,736 quad-core PowerPCs
  - ready before 2010
Prediction

Patterson & Hennessy: "number of cores will double every 18 months, while power, clock frequency and costs will remain constant"
Amdahl's law

- Originally made for speed-ups in a portion of a program
- Later adapted to measure the speedup obtained by increasing the number of processors
- P = parallel portion of a given application
- N = number of processors/cores
- The total speedup obtained increasing N is:

  Speedup(N) = 1 / ((1 − P) + P/N)
Considerations on Amdahl's law

- For N arbitrarily large, the maximum speedup tends to 1 / (1 − P)
- In practice, performance/price falls rapidly as N is increased once there is even a small component of (1 − P)
- Example: P = 90% → (1 − P) = 10% → speedup < 10
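The saturation in the example is easy to see numerically. A one-line sketch of Amdahl's formula (illustrative, not from the slides):

```python
def amdahl_speedup(P, N):
    # Amdahl's law: speedup of a program with parallel fraction P on N cores
    return 1.0 / ((1.0 - P) + P / N)

print(amdahl_speedup(0.9, 4))     # ≈ 3.08
print(amdahl_speedup(0.9, 1000))  # ≈ 9.91, saturating below 1/(1-P) = 10
```

Going from 4 to 1000 cores triples the speedup at most; the serial 10% dominates.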
Amdahl's law

[Figure: speedup vs. number of processors for different values of the parallel fraction P.]
Consequences of Amdahl's law

- "Law of diminishing returns": picking the optimal improvements, the gain becomes smaller each time; so it is with adding processors
- Considering the memory, bus and I/O bottlenecks as well, the situation gets worse
- Parallel computing is only useful for:
  - limited numbers of processors, or
  - problems with very high values of P → "embarrassingly parallel problems"
Embarrassingly parallel problems

Problems for which:
- no particular effort is needed to segment the problem into a very large number of parallel tasks
- there is no essential dependency (or communication) between those parallel tasks
- 1 core @ 4 GHz = 2 cores @ 2 GHz

Examples: GPU-handled problems, 3D projection (independent rendering of each pixel), brute-force searching in cryptography
Performance boost with multicore

- Interrupts can be handled on an idle processor instead of preempting the running process (also for programs written for single core)
- Not faster execution, but smoother appearance
- For inherently parallel applications (graphic operations, servers, compilers, distributed computing) the speedup is proportional to the number of processors
- Limitations due to serialized RAM access and cache coherency
Less likely to benefit from multicores

- I/O-bound tasks
- Tasks composed of a series of pipeline-dependent calculations
- Tasks that frequently communicate with each other
- Tasks that contend for shared resources
Exploiting multicores

- Multicore-capable OSs:
  - Windows NT 4.0/2000/XP/2003 Server/Vista
  - Linux and Unix-like systems
  - Mac OS
  - VxWorks, QNX, etc.
- Multi-threaded applications
  - Multicore optimizations for game engines (Half-Life 2: Episode Two, Crysis, etc.)
- Software development tools
Parallel programming

Existing parallel programming models:
- OpenMP
- MPI
- IBM's X10
- Intel's TBB (abstraction for C++)
- Sun's Fortress
- Cray's Chapel
- Cilk (Cilk++)
- Codeplay's Sieve C++
- Rapidmind Development Platform
Identical vs heterogeneous cores

ARM's MPCore:
- 4 identical ARMv6 cores

STI's Cell Processor:
- One Power Processor Element (PPE)
- 8 Synergistic Processing Elements (SPE)
Allocating tasks to processors

Possible partitioning choices:
- Partition by CPU load
- Partition by information-sharing requirements
- Partition by functionality
- Use the least possible number of processors, or run at the lowest possible frequency

Depends on considerations like fault tolerance, power consumed, temperature, etc.
Real-time scheduling theory for multiprocessors

Different models:
- Deadline model
  - Implicit, constrained, arbitrary
- Task model
  - Periodic (synchronous, asynchronous)
  - Sporadic
  - Generalized multiframe, recurring (DAG), …
- Priority model
  - Static (fixed) task priority, static job priority, arbitrary
- Migration model
  - Global, partitioned, job-level partitioned, clustered, etc.
System model

- Platform with m identical processors
- Task set τ with n periodic or sporadic tasks τi:
  - Period or minimum inter-arrival time Ti
  - Worst-case execution time Ci
  - Deadline Di
  - Utilization Ui = Ci/Ti, density λi = Ci/min(Di,Ti)
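The task parameters above can be captured in a small helper class (a sketch; the class and attribute names are my own, not from the slides):

```python
from dataclasses import dataclass

@dataclass
class Task:
    C: float  # worst-case execution time
    D: float  # relative deadline
    T: float  # period / minimum inter-arrival time

    @property
    def utilization(self):   # U_i = C_i / T_i
        return self.C / self.T

    @property
    def density(self):       # λ_i = C_i / min(D_i, T_i)
        return self.C / min(self.D, self.T)

t = Task(C=2, D=5, T=10)
print(t.utilization, t.density)  # 0.2 0.4
```

For a constrained-deadline task (D ≤ T) the density is at least as large as the utilization, which is why density-based tests are generally more pessimistic.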
Problems addressed

- Feasibility problem
- Run-time scheduling problem
- Schedulability problem

[Figure: tasks τ1…τ5 to be mapped onto CPU1-CPU3.]

All w.r.t. a given task model.
Assumptions

- Independent tasks
- Job-level parallelism prohibited:
  - the same job cannot be simultaneously executed on more than one processor
- Preemption and migration support:
  - for global schedulers, a preempted task can resume its execution on a different processor
- Cost of preemption/migration integrated into the task WCET
Uniprocessor RT systems

- Solid theory (starting from the '70s)
- Optimal schedulers
- Tight schedulability tests for different task models
- Shared-resource protocols
- Bandwidth reservation schemes
- Hierarchical schedulers
- RTOS support
EDF for uniprocessor systems

- Optimality: if a collection of jobs can be scheduled with a given scheduling algorithm, then it is schedulable with EDF as well
- Bounded number of preemptions
- Efficient implementations
- Exact feasibility conditions:
  - Linear test for implicit deadlines: Utot ≤ 1
  - Pseudo-polynomial test for constrained and arbitrary deadlines [Baruah et al. '90]
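Both conditions fit in a few lines. Below is an illustrative sketch (my own) of the demand-bound-function test for one processor, assuming sporadic tasks given as (C, D, T) tuples and Utot < 1 whenever some Di < Ti, so the pseudo-polynomial bound on the checking interval applies:

```python
from math import floor

def dbf(C, D, T, t):
    # Demand bound function: total execution of jobs of the task
    # that have both release time and deadline inside [0, t].
    return max(0, floor((t - D) / T) + 1) * C

def edf_schedulable(tasks):
    # tasks: list of (C, D, T) sporadic tasks, one processor
    Utot = sum(C / T for C, D, T in tasks)
    if Utot > 1:
        return False
    if all(D >= T for C, D, T in tasks):
        return True                  # here Utot <= 1 is already sufficient
    if Utot == 1:
        raise ValueError("pseudo-polynomial bound needs Utot < 1")
    # Check dbf(t) <= t at every absolute deadline up to the bound L
    L = Utot / (1 - Utot) * max(T - D for C, D, T in tasks)
    deadlines = sorted({D + k * T
                        for C, D, T in tasks
                        for k in range(int((L - D) // T) + 1)
                        if D + k * T <= L})
    return all(sum(dbf(C, D, T, t) for C, D, T in tasks) <= t
               for t in deadlines)

print(edf_schedulable([(1, 2, 4), (2, 3, 6)]))  # True
```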
Uniprocessor feasibility

EDF is optimal for arbitrary job collections.

Task model \ Deadline model     | Implicit              | Constrained or Arbitrary
Sporadic or Synchronous Periodic | Linear test: Utot ≤ 1 | Unknown complexity; pseudo-polynomial test if Utot < 1: simulate EDF until Utot/(1−Utot) · max(Ti−Di)
Asynchronous Periodic            | Linear test: Utot ≤ 1 | Strong NP-hard; exponential test: simulate EDF until 2H+Dmax+rmax
RT scheduling for uniprocessors

- Optimal priority assignments for sporadic and synchronous periodic task systems:
  - RM for implicit deadlines
  - DM for constrained deadlines
- Exact FP schedulability conditions:
  - Response Time Analysis for fixed-priority systems: the response time of task k is given by the fixed point of Rk in the iteration

    Rk = Ck + Σ_{i ∈ hp(k)} ⌈Rk/Ti⌉ Ci
Uniprocessor static priorities

Task model \ Deadline model     | Implicit      | Constrained   | Arbitrary
Sporadic or Synchronous Periodic | RM optimality | DM optimality | Unknown complexity; Audsley's bottom-up algorithm (exponential complexity)
Asynchronous Periodic            | Unknown complexity; Audsley's bottom-up algorithm (exponential complexity), for all deadline models
Uniprocessor static priority schedulability

Task model \ Deadline model     | Implicit | Constrained | Arbitrary
Sporadic or Synchronous Periodic | Pseudo-polynomial: simulation until Tmax, or RTA | Pseudo-polynomial: simulation until Dmax, or RTA | Unknown complexity; Lehoczky's test (exponential)
Asynchronous Periodic            | Strong NP-hard; simulation until 2H+rmax or other exponential tests, for all deadline models
Multiprocessors are difficult

"The simple fact that a task can use only one processor even when several processors are free at the same time adds a surprising amount of difficulty to the scheduling of multiple processors" [Liu '69]
Multiprocessor RT systems

- First theoretical results starting from 2000
- Many NP-hard problems
- Few optimal results
- Heuristic approaches
- Simplified task models
- Only sufficient schedulability tests
- Limited RTOS support
Global vs partitioned scheduling

- Global scheduling: a single system-wide queue, from which tasks are dispatched to all processors
- Partitioned approach: multiple per-processor queues; each task is statically assigned to one CPU

[Figure: a global queue τ5 τ4 τ3 τ2 τ1 feeding CPU1-CPU3, vs. separate per-CPU queues.]
Partitioned scheduling

The scheduling problem reduces to:

- a bin-packing problem (allocating tasks τ1…τ5 to processors): NP-hard in the strong sense; various heuristics used: FF, NF, BF, FFDU, BFDD, etc.
- plus a well-known uniprocessor scheduling problem on each CPU: EDF (Utot ≤ 1), RM (RTA)

Global (work-conserving) and partitioned approaches are incomparable.
Global scheduling on SMP

A global queue, ordered according to a given policy, holds the ready tasks τ1…τ5. The first m tasks are scheduled upon the m CPUs.
Global scheduling on SMP

When a task finishes its execution, the next one in the queue is scheduled on the available CPU.
Global scheduling on SMP

When a higher-priority task arrives, it preempts the task with the lowest priority among the executing ones.
Global scheduling on SMP

When another task ends its execution, the preempted task can resume its execution; in the example, task τ4 "migrated" from CPU3 to CPU1.
Global scheduling

- The m highest-priority ready jobs are always the ones executing
- Work-conserving scheduler: no processor is ever idled when a task is ready to execute
Global scheduling: advantages

- Load automatically balanced
- Easier re-scheduling (dynamic loads, selective shutdown, etc.)
- Lower average response time (see queueing theory)
- More efficient reclaiming and overload management

Drawbacks:
- Number of preemptions
- Migration cost: can be mitigated by proper HW (e.g., MPCore's Direct Data Intervention)
- Few schedulability tests → further research needed
Global scheduling problem

- Pfair optimal only for implicit deadlines: Utot ≤ m
  - preemption and synchronization issues
- No optimal scheduler known for more general task models
- Classic schedulers are not optimal: Dhall's effect
- Hybrid schedulers: EDF-US, RM-US, DM-DS, AdaptiveTkC, fpEDF, EDF(k), EDZL, …
Dhall's effect

Example: m processors, n = m+1 tasks, Di = Ti:
- τ1, …, τm = (1, T−1): m light tasks
- τm+1 = (T, T): 1 heavy task

As T grows, Utot → 1, yet the heavy task misses its deadline: at time 0 the m light jobs have earlier deadlines and occupy all processors, so the heavy job cannot execute continuously over [0, T].

EDF can fail at very low utilizations.
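A toy discrete-time simulator (my own construction, assuming synchronous release at time 0, integer parameters and implicit deadlines) reproduces the effect for m = 2, T = 4:

```python
def simulate_gedf(tasks, m, horizon):
    # Unit-slot simulation of global EDF, synchronous release at 0.
    # tasks: (C, T) pairs with D = T, all integers.
    jobs = []                                  # [absolute deadline, remaining]
    for t in range(horizon):
        for C, T in tasks:
            if t % T == 0:
                jobs.append([t + T, C])        # new job, deadline t + T
        jobs.sort()                            # EDF: earliest deadline first
        for job in jobs[:m]:                   # the m earliest-deadline jobs run
            job[1] -= 1
        jobs = [j for j in jobs if j[1] > 0]
        if any(d <= t + 1 for d, _ in jobs):   # unfinished job past its deadline
            return False
    return True

# m = 2, T = 4: two light tasks (1, 3) and one heavy task (4, 4).
# Utot = 2/3 + 1 ≈ 1.67 on two CPUs, yet global EDF misses:
print(simulate_gedf([(1, 3), (1, 3), (4, 4)], 2, 4))  # False
```

The heavy job loses one slot to the light jobs at time 0 and again at time 3, so it cannot fit its 4 units into [0, 4].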
Hybrid schedulers

- EDF-US, RM-US, DM-DS, fpEDF: give highest static priority to the heaviest tasks and schedule the remaining ones with EDF/RM/DM
- EDF(k), RM(k), DM(k): give highest priority to the k heaviest tasks and schedule the remaining ones with EDF/RM/DM
- EDZL: schedule tasks with EDF, raising the priority of the jobs that reach zero laxity
- AdaptiveTkC: assign priorities according to a function (T − k·C)
- …
Global vs partitioned

There are task sets that are schedulable only with a global scheduler.

Example: τ1 = (1,2); τ2 = (2,3); τ3 = (2,3)

Valid also for global FP, assigning p2 > p1 > p3.
Global vs partitioned

There are task sets that are schedulable only with a partitioned scheduler.

Example: τ1 = (2,3); τ2 = (3,4); τ3 = (5,15); τ4 = (5,20)

[Figure: a feasible partitioned schedule on processors 1 and 2.]
Global vs partitioned

τ1 = (2,3); τ2 = (3,4); τ3 = (5,15); τ4 = (5,20)

- In the interval [0,12) there are 9 jobs → 9! possible job priority assignments
- For all of them there is either a deadline miss or an idle slot in [0,12)
- Since the total utilization equals m → deadline miss
Global vs partitioned (FP)

There are task sets that are schedulable only with a partitioned scheduler.

Example: τ1 = (4,6); τ2 = (7,12); τ3 = (4,12); τ4 = (10,24)

[Figure: a feasible partitioned schedule on processors 1 and 2.]

All 4! = 24 global priority assignments lead to a deadline miss.
Global vs partitioned (FP)

Example with p1 > p2 > p3 > p4: τ1 = (4,6); τ2 = (7,12); τ3 = (4,12); τ4 = (10,24)

[Figure: the global FP schedule for this priority assignment, showing the deadline miss.]
Partitioned scheduling heuristics

- First Fit (FF)
- Best Fit (BF)
- Worst Fit (WF)
- Next Fit (NF)
- FFD (First Fit Decreasing)
- BFD (Best Fit Decreasing)
- …
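A First Fit partitioner under the uniprocessor EDF condition Utot ≤ 1 per CPU can be sketched as follows (illustrative; tasks are given directly as utilizations, and the function name is my own):

```python
def first_fit_edf(tasks, m):
    # Assign each task (given as a utilization) to the first of the
    # m processors where it still fits under the EDF bound Utot <= 1.
    cpus = [0.0] * m
    assignment = []
    for u in tasks:
        for i in range(m):
            if cpus[i] + u <= 1.0 + 1e-12:   # small tolerance for float sums
                cpus[i] += u
                assignment.append(i)
                break
        else:
            return None                      # task fits on no processor
    return assignment

print(first_fit_edf([0.6, 0.6, 0.3, 0.3, 0.1], 2))  # [0, 1, 0, 1, 0]
```

The decreasing-utilization variants (FFD, BFD) simply sort the tasks before running the same loop.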
Partitioned schedulability

- Lopez et al.: EDF-FF gives the best utilization bound among all possible partitioning methods
- The bound is: Utot ≤ (m+1)/2
- A refined bound, when Umax is the maximum utilization among all tasks, is:

  Utot ≤ (β m + 1)/(β + 1), where β = ⌊1/Umax⌋
Partitioned schedulability: tightness of the bound

- The bound Utot ≤ (β m + 1)/(β + 1) is tight
- Take βm + 1 tasks with utilization Umax
- By definition of β, β+1 tasks of utilization Umax do not fit into one processor, since Umax > 1/(β+1) → the set is well defined
- At least one processor should allocate β+1 or more tasks → but they do not fit, since their total utilization exceeds 1 → deadline miss
Partitioned schedulability

- BF, BFD, FFD also give the same utilization bound, but with a higher computational complexity
- Among fixed-priority systems, the best utilization bound is given by FFD and BFD
- The bound for fixed priority is somewhat more complicated (see Lopez et al. '04)
Global schedulers (implicit)

- Pfair algorithms are optimal for (periodic and sporadic) systems with implicit deadlines
- Based on GPS, with a lag bounded by one:
  - in any interval of length t, a task with utilization U will execute for an amount W, with Ut − 1 < W < Ut + 1
- Different Pfair algorithms (PF, PD, PD²) [see Anderson et al.]
- Other optimal algorithms: LLREF, EKG, BF
- All these algorithms suffer from a large number of preemptions/migrations
Global scheduling (constrained)

- No optimal algorithm is known for constrained or arbitrary deadline systems
- No optimal on-line algorithm is possible for arbitrary collections of jobs [Leung and Whitehead]
- Even for sporadic task systems, optimality requires clairvoyance [Fisher et al. '09]
Global scheduling: main results

Many sufficient schedulability tests:
- GFB (RTSJ'01)
- BAK (RTSS'03 → TPDS'05)
- BAR (RTSS'07)
- LOAD (ECRTS'07, ECRTS'08, RTSJ'08 → RTSJ'09)
- BCL (ECRTS'05 → TPDS'09)
- RTA (RTSS'07)
- FF-DBF (ECRTS'09)
Global scheduling: main results

- Utilization-based tests (implicit deadlines):
  - EDF → Goossens et al.: Utot ≤ m(1−Umax) + Umax
  - fpEDF → Baruah: Utot ≤ (m+1)/2
  - RM-US → Bertogna et al.: Utot ≤ (m+1)/3
- Polynomial tests:
  - EDF, FP → Baker: O(n²) and O(n³) tests
  - EDZL → Cirinei, Baker: O(n²) test
  - EDF, FP, WC → Bertogna et al.: O(n²) test
- Pseudo-polynomial tests:
  - EDF, FP → Fisher, Baruah: load-based tests
  - EDF, FP, WC → Bertogna et al.: RTA
  - EDF → Baruah et al.: BAR and FF-DBF
Global schedulability tests

- Few dominance results
- Most tests are incomparable
- Different possible metrics for evaluation
Possible metrics for evaluation

- Percentage of schedulable task sets detected:
  - over a randomly generated load
  - depends on the task generation method
- Processor speedup factor s:
  - all feasible task sets pass the test on a platform in which all processors are s times as fast
- Run-time complexity
- Sustainability and predictability properties:
  - the test still succeeds if Ci decreases, or Ti, Di increase
Processor speedup factor

- All feasible task sets pass the schedulability test on a platform in which all processors are s times as fast
- Phillips et al. '97: each collection of jobs that is feasible on m processors can be scheduled with EDF when the processors are (2 − 1/m) times as fast
- A test is better if its speedup bound is closer to 1
Sustainability

- A scheduling algorithm is sustainable iff schedulability of a task set is preserved when:
  1. decreasing execution requirements
  2. increasing periods or inter-arrival times
  3. increasing relative deadlines
- Baker and Baruah [ECRTS'09]: global EDF for sporadic task sets is sustainable w.r.t. points 1 and 2
- The same notion applies to tests: a sustainable schedulability test keeps its positive answer under the changes above
The GFB test

- For implicit deadline systems (Di = Ti)
- Linear complexity
- Utilization-based test (tight)

A task set is schedulable with EDF on a platform with m identical processors if:

  Utot ≤ m(1 − Umax) + Umax

where Utot is the total utilization and Umax = maxi{Ci/Ti} is the utilization of the heaviest task.
The GFB test

[Figure: the schedulability region Utot ≤ m(1−Umax) + Umax in the (Umax, Utot) plane, going from Utot = m at Umax = 0 down to Utot = 1 at Umax = 1.]

- Density-based test → linear complexity
- Sustainable w.r.t. all parameters
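The GFB condition is directly executable. A sketch (my own), with tasks given as (C, T) pairs and implicit deadlines:

```python
def gfb_schedulable(tasks, m):
    # GFB test for global EDF, implicit deadlines:
    # schedulable if Utot <= m * (1 - Umax) + Umax
    utils = [C / T for C, T in tasks]
    Utot, Umax = sum(utils), max(utils)
    return Utot <= m * (1 - Umax) + Umax

# Three half-utilization tasks on 2 CPUs: 1.5 <= 2*(0.5) + 0.5
print(gfb_schedulable([(1, 2), (1, 2), (1, 2)], 2))  # True
```

On the Dhall example the test correctly refuses: with Umax = 1 the bound collapses to Utot ≤ 1.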
Density-based tests

- EDF: λtot ≤ m(1 − λmax) + λmax
- EDF-DS[1/2]: λtot ≤ (m+1)/2 [ECRTS'05]
  - gives highest priority to (at most m−1) tasks having λ ≥ 1/2, and schedules the remaining ones with EDF
- DM: λtot ≤ m(1 − λmax)/2 + λmax
- DM-DS[1/3]: λtot ≤ (m+1)/3 [OPODIS'05]
  - gives highest priority to (at most m−1) tasks having λ ≥ 1/3, and schedules the remaining ones with DM (only constrained deadlines)
Critical instant

- A particular configuration of releases that leads to the largest possible response time of a task
- Makes it possible to derive exact schedulability tests analyzing just the critical-instant situation
- Uniprocessor FP and EDF: a critical instant is when
  - all tasks arrive synchronously
  - all jobs are released as soon as permitted

Response Time Analysis for uniprocessors (FP): the response time of task k is given by the fixed point of Rk in the iteration

  Rk = Ck + Σ_{i ∈ hp(k)} ⌈Rk/Ti⌉ Ci
Multiprocessor anomaly

Synchronous periodic arrival of jobs is not a critical instant for multiprocessors:

τ1 = (1,1,2); τ2 = (1,1,3); τ3 = (5,6,6)

[Figure, from [Bar07]: the synchronous situation meets all deadlines, but delaying the second job of τ2 by one unit makes τ3 miss.]

→ Need to find pessimistic situations to derive sufficient schedulability tests.
Problem window

[Figure: the problem window of length L ending at the first missed deadline of task τk (parameters Ck, Dk); an interfering task τi (parameters Ci, Di, Ti) contributes a carry-in εi at the start of the window.]
Adopted techniques

- Consider the interference on the problem job
- Bound the interference with the workload
- Use an upper bound on the workload

Existing schedulability tests differ in:
- Problem window selection: L
- Carry-in bound εi in the considered window:
  - amount of each contribution (BAK, LOAD, BCL, RTA)
  - number of carry-in contributions (BAR, LOAD)
  - total amount of all contributions (FF-DBF, GFB)
Introducing the interference

- Ik = total interference suffered by task τk
- Ik^i = interference of task τi on task τk

[Figure: a schedule on CPU1-CPU3 showing the intervals in [rk, rk+Rk) in which τk is preempted by the interfering tasks.]

Since τk is delayed only when all m processors are busy with other tasks:

  Ik(Rk) = ⌊(1/m) Σ_{i≠k} Ik^i(Rk)⌋

  Rk = Ck + Ik(Rk) = Ck + ⌊(1/m) Σ_{i≠k} Ik^i(Rk)⌋
Limiting the interference

It is sufficient to consider at most the portion (Rk − Ck + 1) of each term Ik^i in the sum: a larger contribution from a single task cannot further delay τk.

[Figure: the same schedule on CPU1-CPU3, with each interference term truncated at Rk − Ck + 1.]

It can be proved that WCRTk is given by the fixed point of:

  Rk = Ck + ⌊(1/m) Σ_{i≠k} min(Ik^i(Rk), Rk − Ck + 1)⌋
Bounding the interference

Exactly computing the interference is complex. Pessimistic assumptions:
1. Bound the interference of a task with its workload:  Ik^i(Rk) ≤ Wi(Rk)
2. Use an upper bound on the workload
Bounding the workload

Consider a situation in which, for an interfering task τi in a window of length L:
- the first job executes as close as possible to its deadline
- successive jobs execute as soon as possible

  Wi(L) ≤ wi(L) = Ni(L)·Ci + εi(L)

where:

  Ni(L) = ⌊(L + Di − Ci)/Ti⌋   (# jobs, excluding the last one)
  εi(L) = min(Ci, L + Di − Ci − Ni(L)·Ti)   (last job)
RTA for generic global schedulers

An upper bound on the WCRT of task k is given by the fixed point of Rk in the iteration:

  Rk = Ck + ⌊(1/m) Σ_{i≠k} min(wi(Rk), Rk − Ck + 1)⌋

The slack of task k is then at least: Sk = Dk − Rk.
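The workload bound wi(L) used in this iteration can be computed directly (a sketch; the function name and parameter order are my own):

```python
from math import floor

def workload_bound(C, D, T, L):
    # w_i(L): first job executes as close as possible to its deadline,
    # successive jobs execute as soon as possible.
    N = floor((L + D - C) / T)        # full jobs, excluding the last one
    eps = min(C, L + D - C - N * T)   # contribution of the last job
    return N * C + max(0, eps)

print(workload_bound(2, 5, 7, 10))  # N = floor(13/7) = 1, eps = 2 -> 4
```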
Improvement using slack values

The workload bound wi(L) places the first interfering job as close as possible to its deadline. If a lower bound Si on the slack of τi is known, the first job cannot complete later than Di − Si, so the bound can be tightened.
Improvement using slack values

With slack Si, the first job of τi completes by Di − Si:

  Wi(L) ≤ wi(L, Si) = Ni(L, Si)·Ci + εi(L, Si)

where:

  Ni(L, Si) = ⌊(L + Di − Ci − Si)/Ti⌋
  εi(L, Si) = min(Ci, L + Di − Ci − Si − Ni(L, Si)·Ti)
Improvement using slack values

Equivalently, using the response-time bound Ri = Di − Si:

  Wi(L) ≤ wi(L, Ri) = Ni(L, Ri)·Ci + εi(L, Ri)

where:

  Ni(L, Ri) = ⌊(L + Ri − Ci)/Ti⌋
  εi(L, Ri) = min(Ci, L + Ri − Ci − Ni(L, Ri)·Ti)
RTA for generic global schedulers

An upper bound on the WCRT of task k is given by the fixed point of Rk in the iteration:

  Rk = Ck + ⌊(1/m) Σ_{i≠k} min(wi(Rk, Ri), Rk − Ck + 1)⌋

If a fixed point Rk ≤ Dk is reached for every task k in the system, the task set is schedulable with any work-conserving global scheduler.
Iterative schedulability test

1. All response times Ri initialized to Di
2. Compute the response-time bound for tasks 1,…,n:
   - if smaller than the old value → update Ri
   - if Ri > Di, mark as temporarily not schedulable
3. If all tasks have Ri ≤ Di → return success
4. If no response time has been updated for tasks 1,…,n → return fail
5. Otherwise, return to point 2
RTA refinement for Fixed Priority

- The interference of lower-priority tasks is always null: Ik^i(Rk) = 0, ∀ i > k
- An upper bound on the WCRT of task k is then given by the fixed point of Rk in the iteration:

  Rk = Ck + ⌊(1/m) Σ_{i<k} min(wi(Rk, Ri), Rk − Ck + 1)⌋
RTA refinement for EDF

A different bound can be derived analyzing the worst-case workload in a situation in which:
- the interfering and interfered tasks have a common deadline
- all jobs execute as late as possible

  Ik^i(Rk) ≤ w̄i(Dk, Ri)

[Figure: jobs of τi (parameters Ci, Di, Ti, Ri) packed as late as possible before the common deadline Dk of τk.]
RTA refinement for EDF

With w̄i(Dk, Ri) denoting this EDF-specific workload bound, an upper bound on the WCRT of task k is given by the fixed point of Rk in the iteration:

  Rk = Ck + ⌊(1/m) Σ_{i≠k} min(wi(Rk, Ri), w̄i(Dk, Ri), Rk − Ck + 1)⌋
Complexity

- Pseudo-polynomial complexity
- Fast average behavior
- Lower complexity for Fixed Priority systems:
  - at most one slack update per task, if slacks are updated in decreasing priority order
- Possible to reduce complexity by limiting the number of rounds
Polynomial complexity test

- A simpler test can be derived avoiding the iterations on the response times
- A lower bound on the slack of τk is computed directly, without the fixed-point iteration
- The iteration on the slack values is the same
- Performance comparable to the RTA-based test
- Complexity down to O(n²)
BAK

- Polynomial complexity: O(n³)
- Pessimistic version: O(n²)
- Performance not as good as later tests
BAR

- Go back until the first instant at which some processor is idle
- At most (m−1) carry-in contributions

[Figure: the extended problem window, in the cases with and without carry-in contributions.]

- When Utot < m → pseudo-polynomial complexity (limits the values Ak to check)
LOAD

- Computation is exponential in the worst case
- Polynomial and pseudo-polynomial approximations exist
- Sustainable
- Processor speedup bound of …
RTA

- An upper bound on the WCRT of each task is given by the fixed point of the iteration above
- Iteratively refine the response-time bounds using already computed values
- Pseudo-polynomial complexity
- Sustainable w.r.t. task periods
FF-DBF

[Figure: the forced-forward demand bound function of a task τi (parameters Ci, Di, Ti) over the problem window, assuming execution at speed s, with δmax ≤ s ≤ 1.]

- Pseudo-polynomial complexity
- Best processor speedup bound among these tests
Multiprocessor feasibility

Task model \ Deadline model     | Implicit | Constrained | Arbitrary
Sporadic or Synchronous Periodic | Linear test: Utot ≤ m | Unknown complexity; synchronous periodic is not a critical instant | Unknown complexity
Asynchronous Periodic            | Unknown complexity; Horn's algorithm in (0,H] | Strong NP-hard | Strong NP-hard
Multiprocessor run-time scheduling

Task model \ Deadline model | Implicit | Constrained | Arbitrary
Sporadic             | P-fair, GPS | Requires clairvoyance | Requires clairvoyance
Synchronous Periodic  | P-fair, GPS, LLREF, EKG, BF | Unknown complexity; clairvoyance not needed; Horn's algorithm in (0,H] | Unknown complexity; clairvoyance not needed
Asynchronous Periodic | P-fair, GPS, LLREF, EKG, BF | Unknown complexity; clairvoyance not needed | Unknown complexity; clairvoyance not needed
Feasibility conditions

- Utot > m, load > m, or load* > m → not feasible
- Σi Ci/min(Di,Ti) ≤ m → feasible
- In between: unknown → only sufficient feasibility and schedulability tests
Multiprocessor static job priority feasibility

- Sporadic: unknown complexity (synchronous periodic is not a critical instant)
- Synchronous periodic: unknown complexity; simulation until the hyperperiod for all N! job priority assignments
- Asynchronous periodic: unknown complexity for implicit deadlines; strong NP-hard otherwise
Multiprocessor static job priority schedulability

- Sporadic: unknown complexity (synchronous periodic is not a critical instant)
- Synchronous periodic: unknown complexity; simulation until the hyperperiod
- Asynchronous periodic: strong NP-hard
Multiprocessor static priority run-time scheduling

- Periodic (synchronous or asynchronous): unknown complexity; Cucu's optimal priority assignment
- Sporadic: unknown complexity
Multiprocessor static priority feasibility

- Sporadic: unknown complexity (synchronous periodic is not a critical instant)
- Synchronous periodic: strong NP-hard; simulation until the hyperperiod for all n! priority assignments
- Asynchronous periodic: strong NP-hard; simulation on an exponential feasibility interval for all n! priority assignments
Multiprocessor static priority schedulability

- Sporadic: unknown complexity (synchronous periodic is not a critical instant)
- Synchronous periodic: unknown complexity; simulation until the hyperperiod
- Asynchronous periodic: strong NP-hard; simulation on an exponential feasibility interval
Conclusions

- Multiprocessor real-time systems are a promising field to explore
- Still few existing results, far from tight conditions
- Future work:
  - find tighter schedulability tests
  - take shared resources into account
  - integrate into the Resource Reservation framework
Bibliography

- Sustainability: Sanjoy Baruah and Alan Burns. Sustainable scheduling analysis. In Proceedings of the IEEE Real-Time Systems Symposium, Rio de Janeiro, December 2006.
- Speedup: Cynthia A. Phillips, Cliff Stein, Eric Torng, and Joel Wein. Optimal time-critical scheduling via resource augmentation. In Proceedings of the Twenty-Ninth Annual ACM Symposium on Theory of Computing, El Paso, Texas, 4-6 May 1997.
- BAR test: Sanjoy Baruah. Techniques for multiprocessor global schedulability analysis. In Proceedings of the IEEE Real-Time Systems Symposium, Tucson, December 2007.
- FF-DBF test: Sanjoy Baruah, Vincenzo Bonifaci, Alberto Marchetti-Spaccamela, and Sebastian Stiller. Implementation of a speedup-optimal global EDF schedulability test. In Proceedings of the Euromicro Conference on Real-Time Systems, Dublin, Ireland, July 2009.
- RTA test: Marko Bertogna and Michele Cirinei. Response-time analysis for globally scheduled symmetric multiprocessor platforms. In 28th IEEE Real-Time Systems Symposium (RTSS), Tucson, Arizona (USA), 2007.
- BCL: Marko Bertogna, Michele Cirinei, and Giuseppe Lipari. Schedulability analysis of global scheduling algorithms on multiprocessor platforms. IEEE Transactions on Parallel and Distributed Systems, 20(4):553-566, April 2009.
- GFB test: Joel Goossens, Shelby Funk, and Sanjoy Baruah. Priority-driven scheduling of periodic task systems on multiprocessors. Real-Time Systems, 25(2-3):187-205, 2001.
The end