
Chapter 5, CPU Scheduling
1
5.1 Basic Concepts
• The goal of multi-programming is to maximize
the utilization of the CPU as a system resource
by having a process running on it at all times
• Supporting multi-programming means
encoding the ability in the O/S to switch
between currently running jobs
• Switching between jobs can be non-preemptive or preemptive
2
• Simple, non-preemptive scheduling means that a
new process can be scheduled on the CPU only
when the current job has begun waiting, for I/O,
for example
• Non-preemptive means that the O/S will not
preempt the currently running job in favor of
another one
• I/O is the classic case of waiting, and it is the
scenario that is customarily used to explain
scheduling concepts
3
The CPU-I/O Burst Cycle
• A CPU burst refers to the period of time when
a given process occupies the CPU before
making an I/O request or taking some other
action which causes it to wait
• CPU bursts are of varying length and can be
plotted in a distribution by length
4
• Overall system activity can also be plotted as a
distribution of CPU and other activity bursts
by processes
• The distribution of CPU burst lengths tends to
be exponential or hyperexponential
5
6
The CPU scheduler = the short term
scheduler
• Under non-preemptive scheduling, when the
processor becomes idle, a new process has to be
picked from the ready queue and have the CPU
allocated to it
• Note that the ready queue doesn’t have to be
FIFO, although that is a simple, initial assumption
• It does tend to be some sort of linked data
structure with a queuing discipline which
implements the scheduling algorithm
7
Preemptive scheduling
• Preemptive scheduling is more advanced than
non-preemptive scheduling.
• Preemptive scheduling can take into account
factors besides I/O waiting when deciding
which job should be given the CPU.
• A list of scheduling points will be given next.
• It is worthwhile to understand what each of these points means.
8
• Scheduling decisions can be made at these
points:
1. A process goes from the run state to the wait
state (e.g., I/O wait, wait for a child process to
terminate)
2. A process goes from the run state to the ready
state (e.g., as the result of an interrupt)
3. A process goes from the wait state to the ready
state (e.g., I/O completes)
4. A process terminates
9
• Scheduling has to occur at points 1 and 4.
• If it only occurs then, this is non-preemptive
or cooperative scheduling
• If scheduling is also done at points 2 and 3,
this is preemptive scheduling
10
• Points 1 and 4 are given in terms of the job
that will give up the CPU.
• Points 2 and 3 seem to relate to which process
might become available to run that could
preempt the currently running process.
11
• Historically, simple systems existed without
timers, just like they existed without mode
bits, for example
• It is possible to write a simple, non-preemptive operating system for multi-programming without multi-tasking
• Without a timer or other signaling, jobs could
only be switched when one was waiting for
I/O
12
• However, recall that much of the discussion in
the previous chapters assumed the use of
interrupts, timers, etc., to trigger a context
switch
• This implies preemptive scheduling
• Preemptive schedulers are more difficult to
write than non-preemptive schedulers, and
they raise complex technical questions
13
• The problem with preemption comes from
data sharing between processes
• If two concurrent processes share data,
preemption of one or the other can lead to
inconsistent data, lost updates in the shared
data, etc.
14
• Note that kernel data structures hold state for
user processes.
• The user processes do not directly dictate
what the kernel data structures contain, but
by definition, the kernel loads the state of >1
user process
15
• This means that the kernel data structures
themselves have the characteristic of data
shared between processes
• As a consequence, in order to be correctly
implemented, preemptive scheduling has to
prevent inconsistent state in the kernel data
structures
16
• Concurrency is rearing its ugly head again,
even though it still hasn’t been thoroughly
explained.
• The point is that it will become apparent that
concurrency is a condition that is inherent to a
preemptive scheduler.
• Therefore, a complete explanation of
operating systems eventually requires a
complete explanation of concurrency issues.
17
• The idea that the O/S is based on shared data
about processes can be explained concretely by
considering the movement of PCB’s from one
queue to another
• If an interrupt occurs while one system process is
moving a PCB, and the PCB has been removed
from one queue, but not yet added to another,
this is an error state
• In other words, the data maintained internally by
the O/S is now wrong/broken/incorrect…
18
Possible solutions to the problem
• So the question becomes, can the scheduler
be coded so that inconsistent queue state
couldn’t occur?
• One solution would be to only allow switching
on I/O blocks.
• The idea is that interrupts will be queued
rather than instantaneous (a queuing
mechanism will be needed)
19
• This means that processes will run to a point
where they can be moved to an I/O queue and
the next process will not be scheduled until
that happens
• This solves the problem of concurrency in
preemptive scheduling in a mindless way
• This solution basically means backing off to
non-preemptive scheduling
20
Other solutions to the problem
• 1. Only allow switching after a system call
runs to completion.
• In other words, make kernel processes
uninterruptible.
• If the code that moves PCB’s around can’t be
interrupted, inconsistent state can’t result.
• This solution also assumes a queuing system
for interrupts.
21
• 2. Make certain code segments in the O/S
uninterruptible.
• This is the same idea as the previous one, but
with finer granularity.
• It increases concurrency because interrupts
can at least occur in parts of kernel code, not
just at the ends of kernel code calls.
22
• Note that interruptibility of the kernel is
related to the problem of real time operating
systems
• If certain code blocks are not interruptible,
you are not guaranteed a fixed, maximum
response time to any particular system
request or interrupt that you generate
23
• You may have to wait an indeterminate
amount of time while the uninterruptible
code finishes processing
• This violates the requirement for a hard real-time system
24
Scheduling and the dispatcher
• The dispatcher = the module called by the
short term scheduler which
– Switches context
– Switches to user mode
– Jumps to the location in user code to run
• Speed is desirable.
• Dispatch latency refers to time lost in the
switching process
25
Scheduling criteria
• There are various algorithms for scheduling
• There are also various criteria for evaluating
them
• Performance is always a trade-off
• You can never maximize all of the criteria with
one scheduling algorithm
26
Criteria
• CPU utilization. The higher, the better. 40%-90% is realistic
• Throughput = processes completed / unit time
• Turnaround time = total time for any single
process to complete
27
• Waiting time = total time spent waiting in O/S
queues
• Response time = time between submission
and first visible sign of response to the request
– This is important in interactive systems
28
Depending on the criterion, you may
want to
• Strive to attain an absolute maximum or
minimum (utilization, throughput)
• Minimize or maximize the average
(turnaround, waiting)
• Minimize or maximize the variance (for time-sharing, minimize the variance, for example)
29
5.3 Scheduling Algorithms
• 5.3.1 First-Come, First-Served (FCFS)
• 5.3.2 Shortest-Job-First (SJF)
• 5.3.3 Priority
• 5.3.4 Round Robin (RR)
• 5.3.5 Multilevel Queue
• 5.3.6 Multilevel Feedback Queue
30
• Reality involves a steady stream of many, many
CPU bursts
• Reality involves balancing a number of different
performance criteria or measures
• It is worth keeping these facts in the back of your
mind
• Examples of the different scheduling algorithms
will be given below but these examples will be
based on a very few processes and a limited
number of bursts
31
• The examples will be given as a short list of
processes and their burst times.
• This information will be turned into simple Gantt
charts which make it possible to visualize the
situation.
• The scheduling algorithms will be evaluated and
compared based on a simple measure of average
waiting time.
• It is relatively simple to see the values needed for
the calculation using a Gantt chart.
32
FCFS Scheduling
• The name, first-come, first-served, should be
self-explanatory
• This is an older, simpler scheduling algorithm
• It is non-preemptive
• It is not suitable for interactive time sharing
• It can be implemented with a simple FIFO
(ready, or scheduling) queue of PCB’s
33
• Consider the following scenario:
• Process   Burst length
• P1        24 ms.
• P2        3 ms.
• P3        3 ms.
34
Avg. wait time = (0 + 24 + 27) / 3 = 17 ms.
35
• Compare with a different arrival order:
• P2, P3, P1
36
Avg. wait time = (0 + 3 + 6) / 3 = 3 ms.
37
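The arithmetic behind these two FCFS examples can be mechanized in a few lines. Below is a minimal sketch (the class and method names are invented for illustration) that computes the average wait time for bursts that are all ready at t = 0 and are served in the order given:

```java
// Minimal sketch: FCFS average wait time for bursts served in array order,
// all assumed ready at t = 0 (no later arrivals, no I/O).
public class FcfsDemo {
    static double averageWait(int[] bursts) {
        int elapsed = 0;        // time at which the next process will start
        int totalWait = 0;
        for (int burst : bursts) {
            totalWait += elapsed;   // this process waited until 'elapsed'
            elapsed += burst;       // then it runs for its whole burst
        }
        return (double) totalWait / bursts.length;
    }

    public static void main(String[] args) {
        System.out.println(averageWait(new int[]{24, 3, 3}));  // order P1, P2, P3 -> 17.0
        System.out.println(averageWait(new int[]{3, 3, 24}));  // order P2, P3, P1 -> 3.0
    }
}
```

Feeding in the two arrival orders above reproduces the 17 ms and 3 ms averages.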
Additional comments on performance
analysis
• It is clear that average wait time varies greatly
depending on the arrival order of processes
and their varying burst lengths
• As a consequence, it is also possible to
conclude that for any given set of processes
and burst lengths, arbitrary FCFS scheduling
does not result in a minimal or optimal
average wait time
38
• FCFS scheduling is subject to the convoy effect
• There is the initial arrival order of process
bursts
• After that, the processes enter the ready
queue after I/O waits, etc.
• Let there be one CPU bound job (long CPU
burst)
• Let there be many I/O bound jobs (short CPU
bursts)
39
• Scenario:
• The CPU bound job holds the CPU
• The other jobs finish their I/O waits and enter
the ready queue
• Each of the other jobs is scheduled, FCFS, and
is quickly finished with the CPU due to an I/O
request
• The CPU bound job then takes the CPU again
40
• CPU utilization may be high (good) under this
scheme
• The CPU bound job is a hog
• The I/O bound jobs spend a lot of their time
waiting
• Therefore, the average wait time will tend to be
high
• Recall that FCFS is not preemptive, so once the
jobs have entered, scheduling only occurs when a
job voluntarily enters a wait state due to an I/O
request or some other condition
41
SJF Scheduling
• The name, shortest-job-first, is not quite self-explanatory
• Various ideas involved deserve explanation
• Recall that these thumbnail examples of
scheduling are based on bursts, not the overall
job time
• For scheduling purposes, it is the length of the
next burst that is important
42
• In a sense, SJF is really shortest-next-CPU-burst-first scheduling
• There is no perfect way of predicting the
length of the next burst
• Implementing SJF in practice involves devising
formulas for predicting the next burst length
based on past performance
43
• SJF can be a non-preemptive algorithm.
• When one process relinquishes the CPU for an
I/O wait, for example, all other processes are
available for scheduling at this time t = 0
• The job with the shortest predicted next CPU
burst would be chosen for scheduling
44
• SJF can also be implemented as a preemptive
algorithm.
• Jobs enter the ready queue at different times.
• These may be new jobs that have just entered
the system or jobs that have finished waiting
because the system has handled their I/O
request
45
• Let a job enter the ready queue while another job
is running.
• Let the newly ready job have a shorter predicted
CPU burst time than the predicted remaining CPU
burst time of the currently running job.
• Then the newly ready job preempts the current
job.
• Under the preemptive scenario a more
descriptive name for the algorithm would be
“shortest remaining time first” scheduling.
46
• A non-preemptive example will be given first
• Its performance characteristics will be
compared with FCFS scheduling
• Then there will be a discussion of how to
predict burst times
• This will be followed by a preemptive example
47
Non-preemptive Example
• Consider the following scenario:
• Process   burst length
• P1        6 ms.
• P2        8 ms.
• P3        7 ms.
• P4        3 ms.
• Recall that for a miniature scenario like this, the assumption is that all jobs (bursts) are available for scheduling at time t = 0.
48
SJF order: P4, P1, P3, P2
Average wait time = (0 + 3 + 9 + 16) / 4 = 7 ms.
49
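Because every burst is assumed available at t = 0, non-preemptive SJF amounts to sorting the bursts by predicted length and then doing the same running-total wait calculation as FCFS. A minimal sketch (illustrative names, not from the text):

```java
import java.util.Arrays;

// Minimal sketch: non-preemptive SJF with all bursts ready at t = 0 is just
// FCFS applied to the bursts sorted shortest-first.
public class SjfDemo {
    static double averageWait(int[] bursts) {
        int[] sorted = bursts.clone();
        Arrays.sort(sorted);                // shortest predicted burst first
        int elapsed = 0, totalWait = 0;
        for (int burst : sorted) {
            totalWait += elapsed;
            elapsed += burst;
        }
        return (double) totalWait / bursts.length;
    }

    public static void main(String[] args) {
        // P1..P4 from the example: 6, 8, 7, 3 ms
        System.out.println(averageWait(new int[]{6, 8, 7, 3}));  // prints 7.0
    }
}
```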
• In general, the average wait time for SJF scheduling is
lower than the average wait time for FCFS scheduling
of the same processes
• This is illustrated by scheduling the jobs of this example
in FCFS order and comparing the average wait time
• Although the assumption is that all bursts are available
at time t = 0, for the comparison with FCFS, the arrival
order in the ready queue makes a difference
• Let the arrival order be represented by the subscripts
• The example using FCFS scheduling is given next
50
FCFS average wait time = (0 + 6 + 14 + 21) / 4 = 10.25 ms.
51
• In theory, SJF is actually optimal for average wait
time performance
• Always doing the shortest burst first minimizes
the aggregate wait time for all processes
• This is only theoretical because burst length can’t
be known
• In a batch system user estimates might be used
• In an interactive system user estimates make no
sense
52
Devising a formula for predicting burst
time
• The only basis for such a formula is past
performance
• What follows is the definition of an exponential
average function for this purpose
• Let t_n = the actual observed length of the nth CPU burst for a given process
• Let T_{n+1} = the predicted value of the next burst
• Let α be given such that 0 <= α < 1
• Then define T_{n+1} as follows:
• T_{n+1} = α t_n + (1 – α) T_n
53
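As a concrete illustration of the update rule T_{n+1} = α t_n + (1 – α) T_n, here is a minimal sketch of a predictor. The class name, the starting guess, and the trace of burst lengths fed to it are all made up for the example:

```java
// Minimal sketch: exponential averaging of CPU burst lengths.
public class BurstPredictor {
    private final double alpha;    // weighting factor, 0 <= alpha < 1
    private double prediction;     // T_n, the current prediction

    BurstPredictor(double alpha, double initialGuess) {
        this.alpha = alpha;
        this.prediction = initialGuess;    // T_0 is just a guess
    }

    /** Record the actual length of the burst that just finished and
     *  return the prediction for the next burst. */
    double observe(double actualBurst) {
        prediction = alpha * actualBurst + (1 - alpha) * prediction;
        return prediction;
    }

    public static void main(String[] args) {
        BurstPredictor p = new BurstPredictor(0.5, 10.0);   // alpha = 1/2, T_0 = 10
        for (double t : new double[]{6, 4, 6, 4, 13, 13, 13}) {   // invented trace
            System.out.printf("actual %.0f -> next prediction %.2f%n", t, p.observe(t));
        }
    }
}
```

Running it shows the lagging behavior noted later in this section: the predictions drift toward whatever the recent actual values have been.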
Explanation
• Consider the coefficient α
• It is a weighting factor.
• It weights the most recent actual performance
vs. the performance before that
• To get an idea of the purpose α serves,
consider the results of the three choices given
below
54
• If α = 0, no weight is given to the most recent
burst time in predicting the next
• If α = ½, one half weight is given to the most
recent burst time and one half weight to all
burst times before that in predicting the next
• If α = 1, the prediction of the next burst time is
based completely on the most recent burst
time
55
• t_n appears in the formula.
• This was the most recent actual burst time.
• T_n also appears in the formula.
• It is the previous prediction.
• Although it is a prediction, it includes real past performance information because
• T_n = α t_{n-1} + (1 – α) T_{n-1}
• In other words, T_n includes t_{n-1}
56
• Ultimately this formula expands to include all
previous actual performance figures
• It regresses back to the initial predicted value, T_0
• T_0 has to be some sort of guess because it is not based on any immediately preceding value for that process
• Some arbitrary constant can be used, a system
average can be used, perhaps a value from a
previous run of the same process was kept, etc.
57
Expanding the formula
• Expanding the formula illustrates why it is known as an exponential average
• The expansion gives a better feel for the role
of the components in the formula
58
• T_n = α t_{n-1} + (1 – α) T_{n-1}
• T_{n-1} = α t_{n-2} + (1 – α) T_{n-2}
• T_{n-2} = α t_{n-3} + (1 – α) T_{n-3}
• …
• T_2 = α t_1 + (1 – α) T_1
• T_1 = α t_0 + (1 – α) T_0
• T_0 is an estimate
59
• The expansion looks like this:
• T_{n+1} = α t_n + (1 – α)(α t_{n-1} + (1 – α)(… α t_0 + (1 – α) T_0)…)
•         = α t_n + (1 – α) α t_{n-1} + (1 – α)^2 α t_{n-2} + … + (1 – α)^n α t_0 + (1 – α)^{n+1} T_0
60
• The first term is: α t_n
• The general term is: (1 – α)^j α t_{n-j}
• The last term is: (1 – α)^{n+1} T_0
61
In words
• The most recent actual performance, t_n, gets weight α
• All previous performances, t_i, are multiplied by α and by a factor of (1 – α)^j, where the value of j is determined by how far back in time t_i occurred
• Since (1 – α) < 1, as you go back in time, the weight of a given term on the current prediction is exponentially reduced
62
• Incidentally, if α = 1 – α = ½, the coefficients α and (1 – α)^j in each term combine into a single factor of (½)^{j+1}.
63
• The following graph illustrates the results of applying the formula with T_0 = 10 and α = ½
• With α = ½, the exponential coefficients on the terms of the prediction are ½, (½)^2, (½)^3, …
• Note that the formula tends to produce a lagging,
not a leading indicator
• In other words, as the actual values shift up or
down, the prediction gradually approaches the
new reality, whatever it might be, but the
prediction lags reality
64
[Figure: actual burst length vs. predicted burst length over successive bursts, with T_0 = 10 and α = ½]
65
Preemptive SJF
• This was mentioned previously.
• If a waiting job enters the ready queue with an estimated burst length shorter than the remaining burst time of the currently running job, then the shorter job preempts the one on the CPU.
• This can be called “shortest remaining time first”
scheduling.
• Unlike in the previous examples, the arrival time
of a process now makes a difference
66
• An example of preemptive SJF scheduling will
now be given.
• For the purposes of comparison the
performance of the same set of jobs will then
be given with non-preemptive scheduling
67
• Consider the following scenario:
• Process   arrival time   burst length
• P1        0              8 ms.
• P2        1              4 ms.
• P3        2              9 ms.
• P4        3              5 ms.
68
Preemptive SJF average wait time = (0 + 9 + 0 + 15 + 2) / 4 = 6.5 ms.
69
Walking through the example
• P1 arrives at t = 0 and starts
• P2 arrives at t = 1
– P2’s burst length = 4
– P1’s remaining burst length = 8 – 1 = 7
– P2 preempts
70
• P3 arrives at t = 2
– P3’s burst length = 9
– P2’s remaining burst length = 4 – 1 = 3
– P1’s remaining burst length = 7
– No preemption
71
• P4 arrives at t = 3
– P4’s burst length = 5
– P3’s remaining burst length = 9
– P2’s remaining burst length = 3 – 1 = 2
– P1’s remaining burst length = 7
– No preemption
72
• P2 runs to completion at t = 5
• P4 is scheduled.
– It runs to completion at t = 10
• P1 is rescheduled.
– It runs to completion at t = 17
• P3 is scheduled.
– It runs to completion at t = 26
73
Calculating the wait times for the
example
• P1 has 2 episodes
– 1st
• enters at t = 0
• starts at t = 0
• wait time = 0
– 2nd
• waits from t = 1 to t = 10
• wait time = 10 – 1 = 9
– Total P1 wait time = 0 + 9
74
• P2 has 1 episode
– Enters at t = 1
– starts at t = 1
– wait time = 1 – 1 = 0
• P3 has 1 episode
– Enters at t = 2
– starts at t = 17
– wait time = 17 – 2 = 15
75
• P4 has 1 episode
– Enters at t = 3
– starts at t = 5
– wait time = 5 – 3 = 2
• Total wait time = sum of waits for P1 through
P4
• = 0 + 9 + 0 + 15 + 2 = 26
• Average wait time = 26 / 4 = 6.5
76
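The walkthrough above can be checked mechanically. The sketch below (invented names, one-millisecond time steps) simulates shortest-remaining-time-first for the four processes in the example and reproduces the 6.5 ms average:

```java
// Minimal sketch: shortest-remaining-time-first (preemptive SJF), simulated
// one millisecond at a time. Wait time = finish - arrival - burst.
public class SrtfDemo {
    public static void main(String[] args) {
        int[] arrival   = {0, 1, 2, 3};      // P1..P4 arrival times
        int[] burst     = {8, 4, 9, 5};      // P1..P4 burst lengths
        int[] remaining = burst.clone();
        int[] finish    = new int[burst.length];
        int done = 0, time = 0;

        while (done < burst.length) {
            int pick = -1;
            for (int i = 0; i < burst.length; i++) {
                if (arrival[i] <= time && remaining[i] > 0
                        && (pick == -1 || remaining[i] < remaining[pick])) {
                    pick = i;                          // arrived job with least remaining time
                }
            }
            if (pick == -1) { time++; continue; }      // nothing has arrived yet: CPU idle
            remaining[pick]--;                         // run the chosen job for 1 ms
            time++;
            if (remaining[pick] == 0) {
                finish[pick] = time;
                done++;
            }
        }

        double totalWait = 0;
        for (int i = 0; i < burst.length; i++) {
            totalWait += finish[i] - arrival[i] - burst[i];
        }
        System.out.println(totalWait / burst.length);  // prints 6.5
    }
}
```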
The same processes under non-preemptive SJF
77
• Notice how it doesn’t really look like SJF until
after P1 is finished
• At that point the other processes have arrived
and are available to be scheduled
• Time t = 8 is like time t = 0 and from there it is
clearly SJF
• Of course, it would be more complicated if some
jobs arrived even later
• The average wait time is longer than for the
preemptive example due to the behavior of P1 at
the beginning
78
Average wait time for this non-preemptive SJF example
• P1 wait = 0 – 0 = 0
• P2 wait = 8 – 1 = 7
• P3 wait = 17 – 2 = 15
• P4 wait = 12 – 3 = 9
• Total wait time = 0 + 7 + 15 + 9 = 31
• Average wait time = 31 / 4 = 7.75 ms.
79
• Notice that the wait time without preemption
is higher than the wait time with preemption
• This is not surprising
• P1 burns up 8 time units at the beginning,
adding to the overall wait time of all of the
remaining processes
80
Priority Scheduling
• A priority is assigned to each process
• High priority processes are scheduled before low
priority ones
• Processes of equal priority are handled in FCFS
order
• In the textbook a high priority process is given a
low number and a low priority process is given a
high number, e.g., 0-7, 0-4095
• Note that SJF is a type of priority scheduling
where the priority is inversely proportional to the
predicted length of the next burst
81
Priority Example
• Consider the following scenario:
• Process   burst length   priority
• P1        10 ms.         3
• P2        1 ms.          1
• P3        2 ms.          4
• P4        1 ms.          5
• P5        5 ms.          2
82
Average wait time = (0 + 1 + 6 + 16 + 18) / 5 = 41 / 5 = 8.2 ms.
83
How process priorities are set
• Internal priority setting
• SJF is an example
• Other criteria that have been used:
– Time limits
– Memory requirements
– (I/O burst) / (CPU burst)
84
• External priority setting:
– Importance of process
– Type or amount of funding
– Sponsoring department
– politics
85
• Priority scheduling can be either preemptive or
non-preemptive
• Priority scheduling can lead to indefinite blocking
= process starvation
• Low priority jobs may be delayed until low load
times
• Low priority jobs might be lost (in system crashes,
e.g.) before they’re finished
• Solution to starvation: aging. Raise a process’s
priority by n units for every m time units it’s been
in the system
86
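A minimal sketch of the aging rule just described, assuming the textbook convention that a lower number means a higher priority; the interval, the boost amount, and the PCB fields are invented for illustration:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch: aging to prevent starvation. Every AGING_INTERVAL time units
// that a process has spent waiting, its priority number is decreased by BOOST
// (lower number = higher priority in this convention).
public class AgingDemo {
    static class Pcb {
        final String name;
        int priority;        // lower number = higher priority
        int waited = 0;      // time units spent waiting so far
        Pcb(String name, int priority) { this.name = name; this.priority = priority; }
    }

    static final int AGING_INTERVAL = 10;   // every m = 10 time units...
    static final int BOOST = 1;             // ...raise priority by n = 1 unit

    static void tick(List<Pcb> readyQueue) {
        for (Pcb p : readyQueue) {
            p.waited++;
            if (p.waited % AGING_INTERVAL == 0 && p.priority > 0) {
                p.priority -= BOOST;        // creep toward the highest priority, 0
            }
        }
    }

    public static void main(String[] args) {
        List<Pcb> ready = new ArrayList<>();
        ready.add(new Pcb("P1", 7));
        for (int t = 0; t < 70; t++) tick(ready);
        System.out.println(ready.get(0).priority);   // prints 0 after 70 time units
    }
}
```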
Round Robin (RR) Scheduling
• This is the time-sharing scheduling algorithm
• It is FCFS with fixed time-slice preemption
• The time slice, or time quantum, is in the
range of 10ms.-100ms.
• The ready queue is a circularly linked list
• The scheduler goes around the list allocating 1
quantum per process
87
• A process may use a whole time slice
• If it is unfinished when the timer goes off, it is
taken off the CPU and added to the “tail” of the
circularly linked list
• A process may also block (I/O, e.g.) before the
quantum is over
• If so, it goes into a waiting queue
• When it is ready again it is added to the tail of the
circularly linked ready queue
88
• The so-called tail of the queue moves
• It is the point behind the currently scheduled
process
89
90
• RR scheduling depends on a hardware timer
• RR scheduling provides for fairness in dividing
up the CPU as a shared resource
• It has the disadvantage of long average
waiting times for all processes contending for
it
91
• If this is interactive time-sharing, the time a job
spends waiting for human I/O (blocked) will far
outweigh the time spent waiting for access to the
CPU while in the circularly linked ready queue
• Also, human expectations for finishing a job will
be relatively long compared to the sum of all
waiting times for the job.
• However, human expectations of a short response
time may be demanding.
92
• Like essentially everything in CS, RR scheduling involves
a trade-off, in particular, fairness vs. waiting time
• Everything has a price
• The question of whether the trade-off in performance
is worth it in this case depends in large part on human
factors
• Empirically, it’s been shown that the advantages of
time-slicing in this way outweigh the disadvantages
• Essentially every modern operating system does
something that is an elaboration on the RR idea
93
RR Example
• Consider the following scenario:
• Let the time slice be 4 ms.
• Process   burst length
• P1        24 ms.
• P2        3 ms.
• P3        3 ms.
94
Average wait time = (0 + 6 + 4 + 7) / 3 = 17 / 3 = 5 2/3 ms.
95
• Wait time for P1 = 0 initially
• Wait time for P1 = 10 – 4 = 6 when scheduled
again
• Wait time for P2 = 4
• Wait time for P3 = 7
96
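The round robin bookkeeping above is easy to mechanize. The following sketch (invented names, all jobs ready at t = 0, no I/O) runs the ready queue with a 4 ms quantum and reproduces the 17/3 ms average wait:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Minimal sketch: round robin with a fixed quantum. Since there is no I/O here,
// each job's wait time is its completion time minus its burst length.
public class RoundRobinDemo {
    static double averageWait(int[] bursts, int quantum) {
        int n = bursts.length;
        int[] remaining = bursts.clone();
        int[] finish = new int[n];
        Deque<Integer> ready = new ArrayDeque<>();
        for (int i = 0; i < n; i++) ready.add(i);

        int time = 0;
        while (!ready.isEmpty()) {
            int i = ready.poll();
            int slice = Math.min(quantum, remaining[i]);
            time += slice;
            remaining[i] -= slice;
            if (remaining[i] > 0) ready.add(i);   // unfinished: back to the tail
            else finish[i] = time;
        }

        double totalWait = 0;
        for (int i = 0; i < n; i++) totalWait += finish[i] - bursts[i];
        return totalWait / n;
    }

    public static void main(String[] args) {
        // P1 = 24 ms, P2 = 3 ms, P3 = 3 ms, quantum = 4 ms
        System.out.println(averageWait(new int[]{24, 3, 3}, 4));   // prints 5.666...
    }
}
```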
• The performance of round robin depends on
the length of the time slice
• If the length of the slice is > any single process
burst, then RR = FCFS
• If the slice is short, then in theory a machine
with n users behaves like n machines, each
1/nth as fast as the actual machine
• This is the ideal, which ignores the overhead
from switching between jobs
97
• A simple measure to gauge overhead cost is:
• (context switch time) / (time slice length)
• In order for time sharing to be practical, this
ratio has to be relatively small
• The context switch time is dependent on
hardware speed and O/S code efficiency
(speed)
98
• It is interesting to consider the parameters
involved and their effects on what is possible
• Historically, there was a time when time-sharing
was simply not possible
• Device speeds were so slow that to make the
ratio acceptable, the time slice had to be
impractically large.
• What is the point of time-sharing if the time slice
is one second, for example?
• Moore’s Law solved this problem
99
• In modern micro-devices, like handhelds, multi-tasking may still be impractical
• The devices are much faster than older devices so
the cost may be more evident when considering
the amount of code run for various purposes
rather than time
• Consider this ratio: (number of lines of machine
code run for context switch) / (number of lines of
code run for some lightweight application)
100
• The context switch is still pure overhead which
eats into the number of cycles available for
useful work, eats into battery life directly with
computational costs, eats into battery life if
the display remains on while O/S code runs,
etc.
101
• In any case, for larger-scale systems, multi-tasking
is supportable, but the question then becomes
one of performance tuning
• System speed ultimately determines the answers
to these questions:
• How small can a time slice be, and still consist of
a useful number of cycles?
• How small can a time slice be, and still consist of
a useful number of cycles compared to the
overhead cost of switching?
102
• How many slices of a given size should be
given per unit time to a single process in order
to satisfy its needs?
• The total number of slices per unit time
divided by the number needed per process
gives n, the total number of users or processes
that can be supported at any given time
103
• Once all of the parameters are settled, you can
then determine what percent of your processing
cost is pure overhead in order to support that
number of users
• In general, system administrators have some
control over the multi-tasking level
• An administrator may have the ability to limit the
number of jobs or change the time slice size in
order to improve performance for those jobs that
enter or in order to reduce cycles wasted on
context switching
104
• Round robin scheduling illustrates other
performance issues besides just average waiting
time and overall efficiency of CPU resource
utilization
• As pointed out earlier, if time slices are long,
scheduling can degenerate into FCFS
• FCFS doesn’t fairly allocate the CPU in a time
sharing environment
• On the other hand, if the system speed supports
very small time slices, another factor can come
into play
105
• Consider overall average process turnaround time
as a function of time slice size
• It’s not just a question of how many time slices
should be allocated among jobs at the macro
level.
• It’s what the relationship should be between time
slices and the CPU bursts of jobs at the micro
level
• Delays will increase if individual process bursts
have to wait for multiple slices
106
• The typical size of process bursts is work-load
dependent and is something that an operating
system can monitor and report to the
administrator.
• The rule of thumb for system design and
tuning is that 80% of all process CPU bursts
should finish within 1 time slice
• Empirically, this shares the CPU while still
achieving reasonable performance
107
• Digression on the 80-20 rule…
108
• Just as there are trade-offs in CS, lots of things
can be viewed on a spectrum.
• The whole topic of time slices nicely illustrates
a spectrum.
109
• At one extreme, in a very old system,
overhead costs might be so high that time
slices would be so long that you were just
accomplishing multi-programming, not multi-tasking
• In other words, whole jobs completed within
their time slice, more or less.
110
• At the next step down, slices might still be so
large that under multi-tasking users were not
satisfied with response times
• From the point of view of dividing up the
processor, you might simply be getting FCFS of
CPU bursts among various jobs.
111
• The sweet spot is at the 80-20 mark
• There is an approximate match between time
slice size and burst length for 80% of the CPU
bursts for all processes
• 20% of CPU bursts are exceptionally large and
it’s a performance benefit to break them into
separate slices, allowing other process bursts
to be scheduled in between
112
• At the other end of the spectrum, if you make
the time slice too small, you break too many
bursts, and again the overhead cost of
switching can begin to eat away at the benefit
of sharing
113
• Next, the effect of time slice size variation is
examined with an example
• The example calculates both turnaround time
and average wait time
114
RR time slice size variations
• Consider the following scenario with a time
slice size of 4:
• Process   burst length
• P1        6 ms.
• P2        3 ms.
• P3        1 ms.
• P4        7 ms.
115
Average turnaround time = (14 + 7 + 8 + 17) / 4 = 46 / 4 = 11 ½ ms.
116
Average wait time = ((0 + 8) + 4 + 7 + (8 + 2)) / 4 = 29 / 4 = 7 ¼ ms.
117
• Average waiting time and average turnaround time measure closely related things, since turnaround time is just waiting time plus the fixed burst time
• Average turnaround time varies as the time
slice size varies
• However, it doesn’t vary in a regular fashion
118
• Depending on the relative length of process
bursts and time slice size, a larger slice may
lead to slower turnaround
• In the following chart, the middle two bars
represent the case just presented, of a time
slice of 4
• The surrounding bars show what happens
with smaller and larger time slice sizes
• Some are higher and some are lower
119
[Bar chart: average turnaround time and average waiting time for time slice sizes 1 through 7]
120
• Keep in mind that all of these examples are
thumbnails
• They are designed to give some idea of what’s
going on, but they are not realistic in size
• In real life design and tuning would be based
on an analysis of a statistically significant mass
(relatively large) of historical or ongoing data
121
Multi-level Queue Scheduling
• You could argue whether this statement is
literally correct, but it may have some
explanatory value:
• Round robin scheduling is the classic way of
dividing processor time among processes
• In a sense, queue based scheduling systems are
just complex elaborations of round robin
scheduling
• They are ways of dividing processor time with
various bells and whistles to achieve certain
desirable effects
122
A simple example
• Let interactive jobs be foreground jobs
• Let batch jobs be background jobs
• Let foreground and background be distinguished
by keeping the jobs in separate queues where the
queues have separate queuing
disciplines/scheduling algorithms
• For example, use RR scheduling for foreground
jobs
• Use FCFS for batch jobs
123
• The follow-up question becomes, how do you
coordinate scheduling between the two queues?
• One possibility: Fixed priority preemptive
scheduling.
• Batch jobs only run if the interactive queue is
empty
• Another possibility: Time slicing.
• For example, the interactive queue is given 80%
of the time slices and the batch queue is given
20%
124
• Consider the following set of scheduling design
decisions, for example:
• Different classes of jobs are permanently
assigned to different queues
• The queues have priorities relative to each other
• Each queue implements its own scheduling
algorithm for the processes in it, which are of
equal priority
• Consider the set of queues shown on the next
overhead in a system with such a design:
125
An Example
126
• The coordination between queues would be
similar to the interactive/batch example
• Fixed priority preemptive scheduling would mean
that any time a job entered a queue of a higher
priority, any currently running job would have to
step aside
• Lower priority jobs could only run if all higher
priority queues were empty
• It would also be possible to time slice between
the queues, giving a certain percent of CPU time
to each one
127
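A minimal sketch of the fixed-priority coordination just described, with each queue represented as a plain FIFO; the queue contents and names are made up for illustration, and each queue could of course run its own discipline internally:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Minimal sketch: fixed-priority preemptive coordination between queues --
// always dispatch from the highest-priority non-empty queue.
public class MultilevelQueueDemo {
    static String pickNext(List<Deque<String>> queues) {
        for (Deque<String> q : queues) {      // queues listed highest priority first
            if (!q.isEmpty()) return q.poll();
        }
        return null;                          // nothing is ready anywhere
    }

    public static void main(String[] args) {
        Deque<String> interactive = new ArrayDeque<>(List.of("editor"));
        Deque<String> batch = new ArrayDeque<>(List.of("payroll"));
        List<Deque<String>> queues = List.of(interactive, batch);

        System.out.println(pickNext(queues));  // "editor": interactive queue wins
        System.out.println(pickNext(queues));  // "payroll": batch runs only when the queue above is empty
    }
}
```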
Multi-level Feedback Queue
Scheduling
• The next elaboration is to allow processes to
change priority—and change between queues
accordingly
• The change may be based on characteristics such
as CPU or I/O usage or time spent in the system
• In general, CPU greedy processes can be moved
to a lower queue
• This gives interactive jobs and I/O bound jobs
with shorter CPU bursts higher priority
128
• Such a system can also handle ageing.
• If a job is in a lower priority queue too long, it
can be moved to a higher one, preventing
starvation
• A diagram of a simple system is given on the
next overhead.
• This is followed by explanations
129
An Example
130
Queuing Discipline
• 1. The relative priority of the queues is fixed.
– Jobs in queue 1 execute only if queue 0 is empty.
– Jobs in queue 2 execute only if queue 1 is empty.
• 2. Every new job enters queue 0.
– If its burst is <= 8, it stays there.
– Otherwise, it’s moved to queue 1.
• 3. When a job in queue 1 is scheduled
– If it has a burst length > 16, it’s preempted and
moved to queue 2.
131
• 4. Jobs can move back up to a different queue
if their burst lengths are within the quantum
of the higher priority queue.
• 5. Note that in a sense, this queuing scheme
predicts future performance on the basis of
the most recent burst length.
132
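The demotion and promotion rules of this example discipline can be boiled down to a single decision function. The quantum values come from the example above; the class and method names are invented, and this is only a sketch of the rule, not of a whole scheduler:

```java
// Minimal sketch of the feedback rule: queue 0 has an 8 ms quantum, queue 1 a
// 16 ms quantum, queue 2 is effectively FCFS. A job whose burst exceeds the
// quantum of its current queue drops one level; a later short burst moves it
// back up to the highest queue whose quantum it fits in (rule 4).
public class MlfqDemo {
    static final int[] QUANTUM = {8, 16, Integer.MAX_VALUE};

    /** Queue the job should occupy after a burst of the given length. */
    static int nextQueue(int currentQueue, int burstLength) {
        if (burstLength > QUANTUM[currentQueue] && currentQueue < 2) {
            return currentQueue + 1;                    // demote: needed more than one quantum
        }
        for (int q = 0; q < currentQueue; q++) {
            if (burstLength <= QUANTUM[q]) return q;    // promote back up
        }
        return currentQueue;
    }

    public static void main(String[] args) {
        System.out.println(nextQueue(0, 5));    // 0: fits in queue 0's quantum
        System.out.println(nextQueue(0, 20));   // 1: longer than 8 ms, demoted
        System.out.println(nextQueue(1, 20));   // 2: longer than 16 ms, demoted again
        System.out.println(nextQueue(2, 6));    // 0: a short burst moves it back up
    }
}
```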
Defining Characteristics of a General Multi-Level Feedback Queue Scheduling System
• 1. The number of queues.
• 2. The scheduling algorithm for each queue.
• 3. The method used to determine when to
upgrade a process.
• 4. The method used to determine when to
downgrade a process.
• 5. The method used to determine which
queue a job will enter when it needs service
(initially).
133
• Multi-level feedback queue systems are the most
general and the most complex.
• The example given was simply that, an example.
• In theory, such a system can be configured to
perform well for a particular hardware
environment and job mix.
• In reality, there are no ways of setting the
scheduling parameters except for experience,
analysis, and trial and error.
134
Multiple Processor Scheduling
• Load sharing = the possibility of spreading work
among >1 processor, assuming you can come up
with a scheduling algorithm.
• Homogeneous systems = each processor is the
same.
• Any process can be assigned to any processor in
the system.
• Even in homogeneous systems, a process may be
limited to a certain processor if a needed
peripheral is attached to that processor
135
Approaches to Multiple Processor
Scheduling
• Asymmetric multi-processing = master-slave
architecture.
• The scheduling code runs on one processor
only.
• Symmetric Multi-Processing (SMP) = each
processor is self-scheduling.
• Under SMP there is still the question of
whether the ready queue is local or global.
136
• To maximize concurrency, you need a global
ready queue.
• Maintaining a global ready queue requires
cooperation (concurrency control).
• This is a difficult problem, so most systems
maintain local ready queues for each
processor.
• Most modern O/S’s support SMP: Windows, Solaris, Linux, Mac OS X.
137
Processor Affinity
• This term refers to trying to keep the same job
on the same processor.
• Moving jobs between processors is expensive.
• Everything that might have been cached
would be lost unless explicitly recovered.
• Soft affinity = not guaranteed to stay on the
same processor.
• Hard affinity = guaranteed to stay on the same
processor.
138
Load Balancing
• This term refers to trying to keep all
processors busy at all times.
• This is an issue if there are at least as many
jobs as there are processors.
• If a global ready queue is implemented, load
balancing would naturally be part of the
algorithm.
139
• If a system only maintains local ready queues and
there is not hard affinity, there are two
approaches to moving jobs among processors:
• Push migration = a single system process
regularly checks processor utilization and pushes
processes from busy processors to idle ones.
• Pull migration = an idle processor reaches into
the ready queue of a busy processor and extracts
a process for itself.
140
• Both kinds of migration can be built into a
system (Linux for example).
• By definition, migration and affinity are in
opposition.
• There is a performance trade-off.
• Some systems try to gauge imbalance in load
and only do migration if the imbalance rises
above a certain threshold.
141
Thread Scheduling
• This is essentially an expansion on ideas raised in
the last chapter.
• The term “contention scope” refers to the level at
which scheduling is occurring.
• Process Contention Scope (PCS) = the scheduling
of threads on lightweight processes.
• In many-to-one or many-to-many schemes,
threads of one or more user processes contend
with each other to be scheduled.
• This is usually priority based, but not necessarily
preemptive.
142
• System contention scope (SCS) = the
scheduling of kernel level threads on the
actual machine.
• In a one-to-one mapping scheme, these kernel
threads happen to represent user threads
belonging to one or more processes.
143
(Hyperthreading)
• Sometimes the question arises, what does or
did the term hyperthreading mean?
• In previous editions of the book, this was
explained in some detail.
• It has been replaced by a discussion of
scheduling in multicore processors
• For the sake of completeness, a little bit of
information on hyperthreading is given before
taking up multicore processors
144
• Symmetric multi-threading = SMT
• Definition: Provide multiple logical processors
rather than multiple physical processors.
• This is known as hyperthreading on Intel chips.
• At a hardware level:
– Each logical processor has its own architecture state
(register values).
– Each logical processor receives and handles its own
interrupts.
– All hardware resources are shared.
145
Multicore Processors
• Multicore processors basically consist of >1
CPU paired with a register set on a single chip
• Each core appears as a separate CPU to the
operating system
• One of the key things to note about this
architecture is the separate cores still share
access to a single main memory
146
• In classical systems (one CPU, many
processes), the need to access shared
secondary storage (do I/O) is a source of
significant delay
• This affects scheduling decisions/algorithms
• In multicore systems, it turns out that at the
micro-level, the need to access shared
memory is a significant source of delay
147
• This delay is known as memory stall, and it
affects multicore scheduling
decisions/algorithms
• Memory stall at the core/thread level is
analogous to I/O wait at the process level
• One solution to the delay is to implement multi-threaded multicore systems
• The idea is that 2 (or more) hardware threads
are assigned to each core
148
• If the application process running on one of the
hardware threads assigned to the core stalls, then
that thread is switched out and the other one
takes the core
• The thing to note about the term “hardware
thread” is that to the O/S, such a thread appears
as a separate processor
• In other words, hardware threads truly are a
hardware construct, which is implemented below
the O/S level
149
• The engineers who design the chip accomplish
switching between hardware threads using
microcode and silicon solutions
• From the software person’s point of view, it’s
necessary to write a multi-processing O/S, but it
doesn’t matter whether the multiple physical
processors are virtual or real
• All that matters is the multi-processor interface
presented by the hardware to the O/S
150
• In summary, consider a system with dual cores
and dual threads
• From the operating system point of view, such
a system has four processors
• The book points out that the switching of hardware threads on and off of a core may be coarse-grained or fine-grained
• This describes, in essence, how large or small
the time slices are for each hardware thread
151
• Each approach has its advantages and
disadvantages, but the details are not
important
• These are engineering questions, not O/S
questions
• The book also points out that scheduling is
now happening both in the O/S and on the
chip
152
• The general scheduling ideas discussed earlier
still apply to the implementation of scheduling
in the O/S
• The same concepts also apply at the hardware level
• Engineers may choose to alternate hardware threads using round robin scheduling, for example, or some other recognizable technique
153
Virtualization and Scheduling
• The main point of this section is that if you
add a layer to system software, there will be
effects on or interactions with other layers
• In a virtual machine environment, each virtual
machine runs an O/S with a scheduling
algorithm implemented in it
154
• Each virtual machine appears to be running on a
separate processor, when in fact, it is effectively
just getting time slices of an underlying processor
• The actual performance of processes scheduled
by the user O/S on the virtual machine is at the
mercy of how many real machine cycles the
virtual machine is actually given
• The effort put into optimizing scheduling
performance at the user level may be undone by
the way the virtual machine layer works
155
Operating System Examples
• In most previous chapters, the O/S example
sections have been skipped because they
involve needless detail.
• Concrete examples will be covered here for
two reasons:
– To give an idea of how complex real systems are.
– To show that if you know the basic principles, you
can tease apart the different pieces of an actual
implementation.
156
Solaris Scheduling
• Solaris scheduling is based on four priority
classes:
– Real time
– System
– Time sharing
– Interactive
157
• Practical points of Solaris scheduling:
– High numbers = high priority, range of values: 0-59
– The four different priority classes are implemented in
three queues (3 and 4 are together).
– The distinction between 3 and 4 is that if a process
requires the generation of windows, it is given a
higher priority.
– It is apparent that with 60 priorities and 3 queues,
more than one priority value applies to each queue
158
• There is an inverse relationship between
priority and time slice size.
– A small time slice = quick response for high
priority (interactive type) jobs.
– A large time slice = good throughput for low
priority (CPU bound) jobs.
159
Solaris Scheduling Queue—Notice that
Jobs Don’t Move Between Queues
160
Solaris Dispatch Table for Interactive and Time-sharing Threads

Starting Priority   Allocated Time Quantum   New Priority after Quantum Expiration   New Priority after Return from Sleep
0                   200                      0                                       50
5                   200                      0                                       50
10                  160                      0                                       51
15                  160                      5                                       51
20                  120                      10                                      52
25                  120                      15                                      52
30                  80                       20                                      53
35                  80                       25                                      54
40                  40                       30                                      55
45                  40                       35                                      56
50                  40                       40                                      58
55                  40                       45                                      58
59                  20                       49                                      59
161
Later Versions of Solaris Add these
Details
• Fixed priority threads
• Fair share priority threads
• System processes don’t change priorities
• Real-time processes have the absolute highest priorities
162
• Each scheduling class has a set of priorities.
• These are translated into global priorities and
the scheduler uses the global priorities to
schedule
• Among threads of equal priority, the
scheduler does RR scheduling
163
Windows XP Scheduling
• XP (kernel) thread scheduling is priority based
preemptive
• This supports soft real-time applications
• There are 32 priorities, 0-31
• A high number = a high priority
• There is a separate queue for each priority
• Priority 0 is used for memory management
and will not come up further
164
• There is a relationship between priorities in
the dispatcher and classes of jobs defined in
the Win32 API
• There are 6 API classes divided into 2 groups
according to the priorities they have
165
• Class: Real time. Priorities: 16-31
• Variable (priority) classes:
– High priority
– Above normal priority
– Normal priority
– Below normal priority
– Idle priority
• The priorities of these classes can vary from 1-15
166
• Within each class there are 7 additional subdivisions:
– Time critical
– Highest
– Above normal
– Normal
– Below normal
– Lowest
– Idle
167
• Each thread has a base priority
• This corresponds to the relative priority it’s
given within its class
• The default base value would be the “normal”
relative priority for the class
• The distribution of values among classes and
relative priorities is shown in the following
table
168
Columns = Priority Classes, Rows = Relative Priorities within Classes
The ‘Normal’ row contains the base priorities for the classes.

               Real-time   High   Above normal   Normal   Below normal   Idle priority
Time-critical  31          15     15             15       15             15
Highest        26          15     13             10       8              6
Above normal   25          14     11             9        7              5
Normal         24          13     10             8        6              4
Below normal   23          13     9              7        5              3
Lowest         22          11     8              6        4              2
Idle           16          1      1              1        1              1
169
• The scheduling algorithm dynamically changes a
thread’s priority if it’s in the variable group
• If a thread’s time quantum expires, its priority is lowered, but not below its base priority
• When a thread is released from waiting, its priority is raised.
• How much it’s raised depends on what it was
waiting for. For example:
– Waiting for keyboard I/O—large raise
– Waiting for disk I/O—smaller raise
170
• For an interactive process, if the user thread is
given a raise, the windowing process it’s
running in is also given a raise
• These policies favor interactive and I/O bound
jobs and attempt to control threads that are
CPU hogs
171
• XP has another feature that aids windowing
performance
• If several process windows are on the screen and one is brought to the foreground, its time quantum is increased by a factor such as 3 so that it can get something done before being preempted.
172
Linux Scheduling
• Skip this
• Two concrete examples are enough
173
Java Scheduling
• The JVM scheduling specification isn’t detailed
• Thread scheduling is supposed to be priority
based
• It does not have to be preemptive
• Round-robin scheduling is not required, but a
given implementation may have it
174
• If a JVM implementation doesn’t have time-slicing or preemption, the programmer can try
to devise cooperative multi-tasking in
application code
• The relevant Java API method call is
Thread.yield();
• This can be called in the run() method of a
thread at the point where it is willing to give
up the CPU to another thread
175
• Java Thread class name priorities:
– Thread.MIN_PRIORITY (value = 1)
– Thread.MAX_PRIORITY (value = 10)
– Thread.NORM_PRIORITY (value = 5)
• A new thread is given the priority of the
thread that created it
• The default priority is NORM
• The system doesn’t change a thread’s priority
176
• The programmer can assign a thread a priority
value in the range 1-10
• The relevant Java API method call is:
Thread.currentThread().setPriority(value);
• This is done in the thread’s run() method
177
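A minimal sketch showing both calls in context; the class name and the loop body are invented for illustration:

```java
// Minimal sketch: a Runnable that lowers its own priority and yields inside run().
public class CooperativeWorker implements Runnable {
    @Override
    public void run() {
        // Request a below-normal priority for this thread (valid range is 1..10).
        Thread.currentThread().setPriority(Thread.NORM_PRIORITY - 2);
        for (int i = 0; i < 5; i++) {
            System.out.println("working, step " + i);
            Thread.yield();   // hint that this thread is willing to give up the CPU
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t = new Thread(new CooperativeWorker());
        t.start();
        t.join();
    }
}
```

As the following slides note, both calls are only hints: how they map onto kernel thread priorities and time-slicing is up to the particular JVM and operating system.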
• This specification isn’t foolproof though
• Java thread priorities have to be mapped to
O/S kernel thread priorities
• If the difference between Java priorities isn’t
great enough, they may be mapped to the
same priority in the implementing system
• The author gives Windows NT as an example
where this can happen
178
Scheduling Algorithm Evaluation
• In general, algorithm selection is based on
multiple criteria.
• For example:
• Maximize CPU utilization under the constraint
that maximum response time is <1 second
• Maximize throughput such that turnaround
time, on average, is linearly proportional to
total execution time
179
• Various approaches can be used to try to predict performance for a given scheduling algorithm:
• Deterministic models
• Queuing models
• Simulation modeling
• Implementation
• Brief remarks will be made on each of these
approaches
180
Deterministic Modeling
• This is a form of analytic evaluation
• For a given set of jobs, with all parameters
known, you can determine performance under
various scheduling scenarios
• This is OK for developing examples and exploring
possibilities
• It’s not generally a practical way to pick a
scheduling algorithm for a real system with an
unknown mix of jobs
• The thumbnail analyses with Gantt charts are a
simplified example of deterministic modeling
181
Queuing Models
• These are basically statistical models where
the statistical assumptions are based on past
observation of real systems
• The first distribution of interest is the arrival of
jobs into the system
• This is typically a Poisson distribution
182
• All of the rest of the system distributions tend
to be exponential (or hyper-exponential—
remember the graph given earlier of burst
lengths)
– CPU burst occurrence distribution
– CPU burst length distribution
– I/O burst occurrence distribution
– I/O wait length distribution
183
• Given these distributions it is possible to
calculate:
• Throughput
• CPU utilization
• Waiting time
• A simple example of calculating a statistic
follows
184
A Simple Example of an Analysis
Formula
• Let the following parameters be given:
• N = number of processes in a queue
• L = arrival rate of processes
• W = average waiting time in queue
• Then
• N = L * W
• This is known as Little’s formula
185
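As a made-up numeric illustration (the numbers are not from the text):

```latex
% Hypothetical numbers, for illustration only: if processes arrive at a rate of
% L = 7 per second and each spends W = 2 seconds waiting on average, then
N = L \times W = 7\ \text{processes/sec} \times 2\ \text{sec} = 14\ \text{processes in the queue}
```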
• Given any two of the parameters, the third can be
calculated
• Note that the formula applies when the system is
in a steady state—the number of processes
entering the queue = the number of processes
leaving the queue
• Increase or decrease in the queue length occurs
when the system is not in steady state (when the
arrival and departure processes are not equal)
186
• Queuing models are not perfect
• They are limited by the match or mismatch
between the chosen distributions and actual
behavior
• They are simplifications because they
aggregate behavior and may overlook some
factors
187
• They rely on mathematical assumptions (they
treat arrival, service, and waiting as
mathematically independent distributions,
when for each process/burst these successive
events are related)
• They are useful for getting ideas, but they do
not perfectly match reality
188
Simulation Modeling
• Basic elements of a simulation:
– A clock (discrete event simulation)
– Data structures modeling state
– Modules which model activity which changes
state
189
• Simulation input:
• Option 1: Random number generation based on
statistical distributions for processes (again,
mathematical simplification)
• Option 2: Trace tapes.
– These are records of events in actual runs.
– They provide an excellent basis for comparing two
algorithms
– Industrial strength operating systems have hooks which allow the collection of performance information down to the level of individual processes and bursts
190
• Simulation models can generate statistics on all
aspects of performance under the simulated
workload
• Obtaining suitable input data and coding the
simulation are not trivial tasks.
• Coding an O/S and living with the
implementation choices it embodies are also not
trivial
• Making the model may be worth the cost if it aids
in developing the O/S
191
Implementation
• Implementation is the gold standard of
algorithm evaluation and system testing
• You code an algorithm, install it in the O/S,
and test it under real conditions
• Problems:
– Coding cost
– Installation cost
– User reactions to modifications
192
User Reactions
• Changes in O/S code result from perceived
shortcomings in performance
• Performance depends on the algorithm, the
mix of jobs, and the behavior of jobs
• If the algorithm is changed, users will change
their code and behavior to adapt to the
altered O/S runtime environment
• This can cause the same performance problem
to recur, or a new problem to occur
193
• Examples:
• If job priority is gauged by size (smaller jobs
given higher priority), programmers may break
their applications into separate processes
• If job priority is gauged by frequency of I/O
(I/O bound processes are given higher
priority), programmers may introduce
(needless) I/O into their applications
194
• Without resorting to subterfuge, a Java
application programmer has some control
over behavior in a single application by
threading and using calls like yield() and
setPriority()
195
• A large scale O/S will be tunable.
• This goes back to the idea of the separation of
mechanism and policy
• The system administrator will be able to set
scheduling algorithms and their parameters to
meet the job mix at any given time
• It will not be necessary to re-code the
operating system except in cases of major
version changes
196
The End
197