Process Scheduling
Process Scheduling
National Chung Cheng University
Graduate Institute of Computer Science and Information Engineering
Prof. 羅習五
1
Outline
• OS schedulers
• Unix scheduling
• Linux scheduling
• Linux 2.4 scheduler
• Linux 2.6 scheduler
– O(1) scheduler
– CFS scheduler
2
Introduction
preemptive & cooperative multitasking
• A multitasking operating system is one that
can simultaneously interleave execution of
more than one process.
• Multitasking operating systems come in two
flavors: cooperative multitasking and
preemptive multitasking.
– Linux provides preemptive multitasking.
– Mac OS 9 and earlier are the most notable
examples of cooperative multitasking.
3
Linux scheduler –
Scheduling Policy
• Scheduling policy determines what runs when
– fast process response time (low latency)
– maximal system utilization (high throughput)
• Processes classification:
– I/O-bound processes: spend much of their time
submitting and waiting on I/O requests
– Processor-bound processes: spend much of their time
executing code
• Unix variants tend to favor I/O-bound processes,
thus providing good process response time
4
Linux scheduler –
Process Priority
• Linux’s priority-based scheduling
– Rank processes based on their worth and need for
processor time.
– Both the user and the system may set a process's priority
to influence the scheduling behavior of the system.
• Dynamic priority-based scheduling
– Begins with an initial base priority
– Then enables the scheduler to increase or decrease the
priority dynamically to fulfill scheduling objectives.
– E.g., a process that is spending more time waiting on I/O
will receive an elevated dynamic priority.
5
Linux scheduler –
Priority Ranges
• Two separate priority ranges.
– nice value, from -20 to +19 with a default of 0.
• Larger nice values correspond to a lower priority. (you
are being nice to the other processes on the system).
– real-time priority, which by default ranges from 0 to 99.
• All real-time processes are at a higher priority than
normal processes.
• Linux implements real-time priorities in accordance
with POSIX standards on the matter.
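To make the two ranges above concrete, here is a minimal user-space sketch (error handling omitted; the real-time call requires root or CAP_SYS_NICE):

/* Sketch: setting a nice value vs. a real-time priority. */
#include <sched.h>
#include <sys/resource.h>

int main(void)
{
    /* Normal task: nice value +10, i.e. a lower priority
     * (being nice to the other processes on the system). */
    setpriority(PRIO_PROCESS, 0, 10);

    /* Real-time task: priority 50 in the real-time range; any real-time
     * task is preferred over every normal (nice-based) task. */
    struct sched_param sp = { .sched_priority = 50 };
    sched_setscheduler(0, SCHED_FIFO, &sp);

    return 0;
}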
6
2.4 scheduler
• Non-preemptible kernel
– Set p->need_resched if schedule() should be
invoked at the 'next opportunity' (i.e., on the next
return from kernel mode to user mode).
• Round-robin
– task_struct->counter: the number of clock ticks left
to run in this scheduling slice, decremented on each
timer tick.
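A self-contained sketch of what that tick accounting amounts to (illustrative only; in 2.4 the real work happens in the timer-interrupt path, e.g. update_process_times()):

/* Sketch of 2.4-style tick accounting, using a stand-in type. */
struct tick_sketch {
    long counter;        /* clock ticks left in this scheduling slice     */
    int  need_resched;   /* ask for schedule() on the return to user mode */
};

static void timer_tick(struct tick_sketch *curr)
{
    if (curr->counter > 0)
        curr->counter--;             /* consume one tick of the quantum   */
    if (curr->counter == 0)
        curr->need_resched = 1;      /* quantum used up: reschedule soon  */
}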
7
2.4 scheduler
1. Check if schedule() was invoked from interrupt
handler (due to a bug) and panic if so.
2. Use spin_lock_irq() to lock ‘runqueue_lock’
3. Check if a task is ‘runnable’
– in TASK_RUNNING state
– in TASK_INTERRUPTIBLE state and a signal is pending
4. Examine the ‘goodness’ of each process
5. Context switch
8
2.4 scheduler – ‘goodness’
• ‘goodness’: identifying the best candidate
among all processes in the runqueue list.
– ‘goodness’ = 0: the entity has exhausted its
quantum.
– 0 < ‘goodness’ < 1000: the entity is a conventional
process/thread that has not exhausted its
quantum; a higher value denotes a higher level of
goodness.
9
2.4 scheduler – ‘goodness’
if (p->mm == prev->mm)
return p->counter + p->priority + 1;
else
return p->counter + p->priority;
• A small bonus is given to the task p if it
shares the address space with the previous
task.
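Putting the pieces above together, here is a condensed, self-contained sketch of goodness(), following the description on these slides rather than the verbatim 2.4 source (the stand-in struct keeps only the fields used here):

#define SCHED_OTHER 0

struct task_sketch {
    int  policy;        /* SCHED_OTHER, or a real-time policy        */
    int  rt_priority;   /* real-time priority (real-time tasks only) */
    long counter;       /* ticks left in the current quantum         */
    long priority;      /* static priority / base quantum            */
    void *mm;           /* address space (shared between threads)    */
};

/* A higher return value means a better candidate to run next. */
static int goodness(const struct task_sketch *p, const struct task_sketch *prev)
{
    int weight;

    if (p->policy != SCHED_OTHER)          /* real-time task: always wins  */
        return 1000 + p->rt_priority;

    weight = p->counter;                   /* quantum exhausted => 0       */
    if (weight == 0)
        return 0;

    weight += p->priority;                 /* static priority contribution */
    if (p->mm == prev->mm)                 /* same address space: small    */
        weight += 1;                       /* bonus, caches are still warm */
    return weight;
}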
10
2.4 scheduler - SMP
[Figure: in 2.4 all CPUs share a single, global run queue]
11
2.4 scheduler - SMP
The scheduler examines the processor field of each process
and gives a large, constant bonus (PROC_CHANGE_PENALTY,
usually 15) to the process that was last executed on the
'this_cpu' CPU.
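Inside goodness(), the SMP case adds roughly the following branch (identifiers as in the 2.4 kernel; shown as a fragment of the weighting code sketched earlier):

#ifdef CONFIG_SMP
    /* Prefer the task that last ran on this CPU: its cache and TLB
     * contents may still be warm. */
    if (p->processor == this_cpu)
        weight += PROC_CHANGE_PENALTY;   /* usually 15 */
#endif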
12
2.4 scheduler - performance
• The algorithm does not scale well
– It is inefficient to re-compute all dynamic priorities at
once.
• The predefined quantum is too large for high
system loads (for example: a server)
• I/O-bound process boosting strategy is not
optimal
– a good strategy to ensure a short response time for
interactive programs, but…
– some batch programs with almost no user interaction
are I/O-bound.
13
Recalculating Timeslices
(kernel 2.4)
• Problems:
– Can take a long time. Worse, it scales O(n) for n tasks
on the system.
– Recalculation must occur under some sort of lock
protecting the task list and the individual process
descriptors. This results in high lock contention.
– Nondeterminism is a problem for deterministic real-time programs.
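The recalculation in question is, roughly, this fragment of 2.4's schedule(), executed when every runnable task has used up its counter:

/* O(n) pass over ALL tasks in the system, under tasklist_lock. */
recalculate:
    spin_unlock_irq(&runqueue_lock);
    read_lock(&tasklist_lock);
    for_each_task(p)
        p->counter = (p->counter >> 1) + NICE_TO_TICKS(p->nice);
    read_unlock(&tasklist_lock);
    spin_lock_irq(&runqueue_lock);

This single loop is the O(n) cost and the source of the lock contention listed above.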
14
Processes classification
• Definition:
– I/O-bound processes: spend much of their time
submitting and waiting on I/O requests
– Processor-bound processes: spend much of their
time executing code
• Linux tends to favor I/O-bound processes, thus
providing good process response time
• How to classify processes?
15
16
[Figure: some tasks have exhausted their time quantum (tq=0) while others still have quantum left (tq≠0); what should a task's new quantum (tq=???) be?]
17
Scheduling policy
time_quantum_new = time_quantum_old / 2 + time_quantum_table[static_priority]
dynamic_priority ≈ time_quantum_new
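A quick worked example (the table value of 6 ticks is only illustrative): a CPU-bound task that fully used its quantum gets time_quantum_new = 0/2 + 6 = 6 ticks, whereas a task that went to sleep with 4 ticks left gets 4/2 + 6 = 8 ticks. Since dynamic_priority ≈ time_quantum_new, the sleeper (likely I/O-bound) also ends up with the higher dynamic priority.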
18
[Figure (animation over several slides): applying the recalculation rule, each task, whether its quantum was exhausted (tq=0) or not (tq≠0), receives a new quantum (tq=?)]
21
2.6 scheduler
[Figure: in 2.6 each CPU has its own run queue; tasks migrate between run queues (push + pull) to balance load]
22
2.6 scheduler –
User Preemption
• User preemption can occur
– When returning to user-space from a system call
– When returning to user-space from an interrupt
handler
23
2.6 scheduler –
Kernel Preemption
• The Linux kernel is a fully preemptive kernel.
– It is possible to preempt a task at any point, so long as the
kernel is in a state in which it is safe to reschedule.
– “safe to reschedule”: kernel does not hold a lock
• The Linux design:
– addition of a preemption counter, preempt_count, to each
process's thread_info
– This count increments once for each lock that is acquired and
decrements once for each lock that is released
• Kernel preemption can also occur explicitly, when a task in
the kernel blocks or explicitly calls schedule().
– no additional logic is required to ensure that the kernel is in a
state that is safe to preempt!
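An illustrative sketch of that rule follows (not the kernel's real implementation, which lives in <linux/preempt.h> and is folded into the locking primitives; the struct and function names below are made up for the example):

extern void schedule(void);

/* Minimal stand-in for thread_info, for illustration only. */
struct thread_info_sketch {
    int preempt_count;   /* > 0: preemption is currently forbidden   */
    int need_resched;    /* a higher-priority task is waiting to run */
};

static void lock_acquired(struct thread_info_sketch *ti)
{
    ti->preempt_count++;                 /* e.g. done inside spin_lock()   */
}

static void lock_released(struct thread_info_sketch *ti)
{
    ti->preempt_count--;                 /* e.g. done inside spin_unlock() */
    if (ti->preempt_count == 0 && ti->need_resched)
        schedule();                      /* no locks held: safe to preempt */
}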
24
Kernel Preemption
• Kernel preemption can occur
– When an interrupt handler exits, before returning
to kernel-space
– When kernel code becomes preemptible again
– If a task in the kernel explicitly calls schedule()
– If a task in the kernel blocks (which results in a call
to schedule())
25
O(1) & CFS scheduler
• 2.5 ~ 2.6.22: O(1) scheduler
– Time complexity: O(1)
– Uses a per-CPU run queue (with an active array and
an expired array) to realize the ready queue
• 2.6.23~present: Completely Fair Scheduler
(CFS)
– Time complexity: O(log n)
– the ready queue is implemented as a red-black
tree
26
O(1) scheduler
• Implement fully O(1) scheduling.
– Every algorithm in the new scheduler completes in constant-time, regardless of the
number of running processes. (Since the 2.5 kernel).
• Implement perfect SMP scalability.
– Each processor has its own locking and individual runqueue.
• Implement improved SMP affinity.
– Attempt to group tasks to a specific CPU and continue to run them there.
– Only migrate tasks from one CPU to another to resolve imbalances in runqueue sizes.
• Provide good interactive performance.
– Even during considerable system load, the system should react and schedule interactive
tasks immediately.
• Provide fairness.
– No process should find itself starved of timeslice for any reasonable amount of time.
Likewise, no process should receive an unfairly high amount of timeslice.
• Optimize for the common case of only one or two runnable processes, yet
scale well to multiple processors, each with many processes.
27
The Priority Arrays
• Each runqueue contains two priority arrays (defined in
kernel/sched.c as struct prio_array)
– Active array: all tasks with timeslice left.
– Expired array: all tasks that have exhausted their timeslice.
• Priority arrays provide O(1) scheduling.
– Each priority array contains one queue of runnable processes per
priority level.
– The priority arrays also contain a priority bitmap used to efficiently
discover the highest-priority runnable task in the system.
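The underlying structure is roughly the following (condensed from 2.6's kernel/sched.c; BITMAP_SIZE is sized to cover the 140 priority levels):

/* Condensed prio_array: one runnable-task list per priority level and
 * one bitmap bit per non-empty list. */
struct prio_array {
    int nr_active;                      /* total runnable tasks in this array  */
    unsigned long bitmap[BITMAP_SIZE];  /* bit i set <=> queue[i] is non-empty */
    struct list_head queue[MAX_PRIO];   /* one run list per priority 0-139     */
};

Picking the next task is then just two constant-time steps: idx = sched_find_first_bit(array->bitmap), followed by taking the first task on array->queue[idx].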
28
The Linux O(1) scheduler algorithm
29
runqueue
Each runqueue contains two priority arrays: active and expired.
Each of these priority arrays contains a list of tasks indexed
according to priority (0-139).
[Figure: a runqueue pointing to its active and expired priority arrays, each holding one task queue per priority level]
31
runqueue
Linux assigns higher-priority tasks a longer time slice:
time quantum ≈ 1/priority (where "priority" is the numeric
value, so a smaller number, i.e. a higher priority, yields a
larger quantum).
[Figure: tasks tsk1, tsk2, tsk3 queued in the active array]
32
runqueue
Linux chooses the task with the
highest priority from the active
array for execution.
[Figure: tsk1, tsk2, tsk3 sit in the active array; the first task in the highest non-empty priority queue is picked to run]
33
runqueue
Round-robin
[Figure (animation over several slides): tasks of equal priority are served round-robin within their priority queue; after running, a task moves to the tail of the queue]
36
runqueue
Most tasks have dynamic priorities that are based on their
"nice" value (static priority) plus or minus 5:
dynPrio = staticPrio + bonus, with bonus in the range -5 ~ +5.
The bonus is derived from how long the task sleeps: an
I/O-bound task that sleeps a lot receives a negative bonus,
i.e. a priority boost.
[Figure: an I/O-bound task is requeued in the active array at its boosted priority]
37
runqueue
When all tasks have exhausted
their time slices, the two priority
arrays are exchanged!
[Figure: the active and expired array pointers are swapped]
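The exchange itself is only a pointer swap, condensed from 2.6's schedule() (the real code also resets a few bookkeeping fields):

/* When the active array has run dry, reuse the old arrays by swapping
 * the two pointers: an O(1) operation, unlike 2.4's O(n) recalculation. */
array = rq->active;
if (unlikely(!array->nr_active)) {
    rq->active  = rq->expired;   /* expired tasks become runnable again */
    rq->expired = array;         /* the now-empty array collects tasks as
                                    they exhaust their fresh timeslices */
    array = rq->active;
}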
38
The O(1) scheduling algorithm
sched_find_first_bit()
[Figure: the priority bitmap with a few bits set; sched_find_first_bit() returns the index of the first set bit, i.e. the highest-priority non-empty queue, whose head task runs next]
39
The O(1) scheduling algorithm
Insert: O(1)
Remove: O(1)
Find first set bit: O(1)
[Figure: every basic operation on the priority arrays completes in constant time]
40
find first set bit O(1)
/* __ffs(): return the index of the least-significant set bit in 'word'
 * (undefined if word == 0). A fixed number of mask-and-shift steps,
 * so the cost is constant regardless of the input. */
static inline unsigned long __ffs(unsigned long word)
{
    int num = 0;
#if BITS_PER_LONG == 64
    if ((word & 0xffffffff) == 0) {   /* nothing set in the low 32 bits */
        num += 32;
        word >>= 32;
    }
#endif
    if ((word & 0xffff) == 0) {       /* nothing set in the low 16 bits */
        num += 16;
        word >>= 16;
    }
    if ((word & 0xff) == 0) {
        num += 8;
        word >>= 8;
    }
    if ((word & 0xf) == 0) {
        num += 4;
        word >>= 4;
    }
    if ((word & 0x3) == 0) {
        num += 2;
        word >>= 2;
    }
    if ((word & 0x1) == 0)
        num += 1;
    return num;
}
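The scheduler uses __ffs() to scan the 140-bit priority bitmap. A simplified, self-contained version of that scan is sketched below; my_sched_find_first_bit is an illustrative name (the kernel ships an optimized, per-architecture sched_find_first_bit()):

/* Sketch: find the first set bit of a small multi-word bitmap.
 * The 140-bit priority bitmap spans only a handful of longs, so the
 * number of __ffs() calls is bounded by a small constant: still O(1). */
static inline int my_sched_find_first_bit(const unsigned long *b, int nwords)
{
    int i;
    for (i = 0; i < nwords; i++)
        if (b[i])
            return __ffs(b[i]) + i * BITS_PER_LONG;
    return nwords * BITS_PER_LONG;    /* no bit set: the array is empty */
}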
41
2.6 scheduler –
CFS
• Classical schedulers compute time slices for
each process in the system and allow them to
run until their time slice/quantum is used up.
– After that, the time slices of all processes need to
be recalculated.
• CFS considers only the wait time of a process
– The task with the most need for CPU time is
scheduled.
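As a self-contained sketch of that idea (the names are illustrative, not the kernel's API): at every scheduling decision, run the task whose received CPU time is smallest relative to its weight.

/* Sketch: pick the runnable task that is furthest behind its
 * weighted fair share of the CPU. */
struct sketch_task {
    unsigned long long weight;    /* derived from the nice value     */
    unsigned long long runtime;   /* CPU time received so far, in ns */
};

static struct sketch_task *pick_next(struct sketch_task *t, int n)
{
    struct sketch_task *best = &t[0];
    int i;
    for (i = 1; i < n; i++)
        /* compare runtime/weight without division: cross-multiply */
        if (t[i].runtime * best->weight < best->runtime * t[i].weight)
            best = &t[i];
    return best;
}

The kernel avoids this linear scan by keeping the runnable tasks sorted in a red-black tree (see the rb-tree slide below), so the same decision costs O(log n).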
42
2.6 scheduler –
CFS
43
2.6 scheduler –
CFS (motivation)
• Traditional Linux scheduling policy
HP: high priority, LP: low priority
[Figure: an HP task and an LP task under the traditional priority/timeslice-based Linux policy]
44
2.6 scheduler –
CFS (motivation)
• Traditional Unix scheduling policy
WT: waiting time
[Figure: under the traditional Unix policy a task's priority is boosted according to its waiting time (WT)]
45
2.6 scheduler –
CFS (the ideal case)
[Figure: each task runs on its own virtualized CPU; a high-priority task gets a fast virtual CPU, a low-priority task a slow one]
46
2.6 scheduler –
CFS (the ideal case)
[Figure: the same tasks on their virtual CPUs; the high-priority tasks' virtual CPUs run quickly, the low-priority tasks' run slowly]
47
2.6 scheduler –
CFS (the ideal case)
[Figure: six tasks with relative speeds (weights) 8, 8, 8, 3, 3, 3 running on their virtual CPUs]
48
2.6 scheduler –
CFS (the ideal case)
[Figure (animation over several slides): as time passes, each task's accumulated CPU time grows in proportion to its weight; the weight-8 tasks advance faster than the weight-3 tasks, but all six make continuous progress]
53
2.6 scheduler –
CFS (the implementation)
[Figure (animation over several slides): a real CPU runs only one task at a time, so CFS approximates the ideal by repeatedly picking the task that has fallen furthest behind its weighted fair share; over time the six tasks with weights 8, 8, 8, 3, 3, 3 receive CPU time in roughly an 8:8:8:3:3:3 ratio]
70
2.6 scheduler –
CFS
71
2.6 scheduler –
the RB-Tree
• To sort tasks on the red-black tree, the kernel
uses the difference fair_clock - wait_runtime.
– While fair_clock is a measure for the CPU time a
task would have gotten if scheduling were
completely fair,
– wait_runtime is a direct measure for the
unfairness caused by the imperfection of real
systems.
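As a sketch of that ordering (the field names mirror the description above and are illustrative, not necessarily the exact kernel fields):

/* Sketch: the rb-tree key used by early CFS. The leftmost node
 * (smallest key) is the next task to run. */
struct sketch_entity {
    long long fair_key;       /* key under which the entity is sorted */
    long long wait_runtime;   /* "unfairness" currently owed to it    */
};

static void compute_key(struct sketch_entity *se, long long fair_clock)
{
    /* A task that has been treated unfairly (large wait_runtime) gets a
     * smaller key, sits further left in the tree, and is picked sooner. */
    se->fair_key = fair_clock - se->wait_runtime;
}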
72
2.6 scheduler –
issues
• Different priority levels for tasks (i.e., nice
values) must be taken into account
• Tasks must not be switched too often because
a context switch has a certain overhead.
73
2.6 scheduler –
fields in the task_struct
74
2.6 scheduler –
fields in the task_struct
• prio and normal_prio indicate the dynamic
priorities, static_prio the static priority of a
process.
– The static priority is the priority assigned to the
process when it was started.
– normal_prio denotes a priority that is
computed based on the static priority and the
scheduling policy of the process.
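Condensed from kernel/sched.c, the derivation of normal_prio looks roughly like this (MAX_RT_PRIO is 100, so real-time tasks map into 0..99 and normal tasks into 100..139):

/* Roughly how normal_prio is computed: a lower number means a higher
 * priority. Real-time tasks invert their rt_priority into 0..99;
 * normal tasks simply reuse their static (nice-based) priority. */
static inline int normal_prio(struct task_struct *p)
{
    if (task_has_rt_policy(p))                  /* SCHED_FIFO / SCHED_RR */
        return MAX_RT_PRIO - 1 - p->rt_priority;
    return p->static_prio;                      /* 100..139, from nice   */
}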
75
2.6 scheduler –
fields in the task_struct
• The scheduler is not limited to scheduling
processes; it can also work with larger entities.
This allows for implementing group scheduling.
76
2.6 scheduler –
fields in the task_struct
• cpus_allowed is a bit field used on
multiprocessor systems to restrict the CPUs on
which a process may run.
– sched_setaffinity()
– sched_getaffinity()
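A minimal user-space example of restricting the calling process to CPU 0 (error handling omitted):

/* Pin the calling process to CPU 0 with sched_setaffinity(). */
#define _GNU_SOURCE
#include <sched.h>

int main(void)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(0, &set);                          /* allow only CPU 0        */
    sched_setaffinity(0, sizeof(set), &set);   /* pid 0 = calling process */

    /* From here on the kernel keeps cpus_allowed = {CPU 0}. */
    return 0;
}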
77
2.6 scheduler –
priority
78
2.6 scheduler –
priority
kernel/sched.c
static const int prio_to_weight[40] = {
 /* -20 */ 88761, 71755, 56483, 46273, 36291,
 /* -15 */ 29154, 23254, 18705, 14949, 11916,
 /* -10 */  9548,  7620,  6100,  4904,  3906,
 /*  -5 */  3121,  2501,  1991,  1586,  1277,
 /*   0 */  1024,   820,   655,   526,   423,
 /*   5 */   335,   272,   215,   172,   137,
 /*  10 */   110,    87,    70,    56,    45,
 /*  15 */    36,    29,    23,    18,    15,
};

Each entry is roughly 1.25 times the next one
(prio_to_weight[i] ≈ prio_to_weight[i+1] × 1.25),
so each nice level changes a task's CPU share by about 10%.
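A quick worked example of what these weights mean: with one nice-0 task (weight 1024) and one nice-5 task (weight 335) runnable, CFS grants them about 1024/(1024+335) ≈ 75% and 335/(1024+335) ≈ 25% of the CPU, respectively; renicing the second task back to 0 restores a 50/50 split.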
79
2.6 scheduler –
priority
80
Summary
• The concept of OS schedulers
• Maximize throughput.
– This is what system administrators care about.
– How to maximize throughput (CPU & I/O).
• What is the major drawback of the Linux 2.4
scheduler?
• Pros and cons of Linux 2.6 schedulers
– O(1)
– CFS
81