Process Scheduling


Process Scheduling
National Chung Cheng University
Graduate Institute of Computer Science and Information Engineering
Instructor: 羅習五
1
Outline
• OS schedulers
• Unix scheduling
• Linux scheduling
• Linux 2.4 scheduler
• Linux 2.6 scheduler
– O(1) scheduler
– CFS scheduler
2
Introduction
preemptive & cooperative multitasking
• A multitasking operating system is one that
can simultaneously interleave execution of
more than one process.
• Multitasking operating systems come in two
flavors: cooperative multitasking and
preemptive multitasking.
– Linux provides preemptive multitasking
– Mac OS 9 and earlier are the most notable examples of
cooperative multitasking.
3
Linux scheduler –
Scheduling Policy
• Scheduling policy determines what runs when
– fast process response time (low latency)
– maximal system utilization (high throughput)
• Processes classification:
– I/O-bound processes: spend much of their time
submitting and waiting on I/O requests
– Processor-bound processes: spend much of their time
executing code
• Unix variants tend to favor I/O-bound processes,
thus providing good process response time
4
Linux scheduler –
Process Priority
• Linux’s priority-based scheduling
– Rank processes based on their worth and need for
processor time.
– Both the user and the system may set a process's priority
to influence the scheduling behavior of the system.
• Dynamic priority-based scheduling
– Begins with an initial base priority
– Then enables the scheduler to increase or decrease the
priority dynamically to fulfill scheduling objectives.
– E.g., a process that is spending more time waiting on I/O
will receive an elevated dynamic priority.
5
Linux scheduler –
Priority Ranges
• Two separate priority ranges.
– nice value, from -20 to +19 with a default of 0.
• Larger nice values correspond to a lower priority. (you
are being nice to the other processes on the system).
– real-time priority, by default range from 0 to 99.
• All real-time processes are at a higher priority than
normal processes.
• Linux implements real-time priorities in accordance
with POSIX standards on the matter.
6
2.4 scheduler
• Non-preemptible kernel
– Set p->need_resched if schedule() should be
invoked at the ‘next opportunity’ (i.e., on the return
from kernel mode to user mode).
• Round-robin
– task_struct->counter: number of clock ticks left to
run in this scheduling slice, decremented by a
timer.
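For illustration, the per-tick bookkeeping described above can be sketched as follows (a simplified, hypothetical version of what the 2.4 timer-interrupt path does; not the exact kernel source):

/* Called from the timer interrupt for the currently running task:
 * consume one tick of the slice and request a reschedule once the
 * slice is used up. */
void scheduler_tick_sketch(struct task_struct *p)
{
        if (p->counter > 0 && --p->counter == 0)
                p->need_resched = 1;   /* invoke schedule() at the next
                                          kernel-to-user transition */
}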
7
2.4 scheduler
1. Check whether schedule() was invoked from an interrupt
handler (which would be a bug) and panic if so.
2. Use spin_lock_irq() to lock ‘runqueue_lock’
3. Check if a task is ‘runnable’
– in TASK_RUNNING state
– in TASK_INTERRUPTIBLE state and a signal is pending
4. Examine the ‘goodness’ of each process
5. Context switch
8
2.4 scheduler – ‘goodness’
• ‘goodness’: identifying the best candidate
among all processes in the runqueue list.
– ‘goodness’ = 0: the entity has exhausted its
quantum.
– 0 < ‘goodness’ < 1000: the entity is a conventional
process/thread that has not exhausted its
quantum; a higher value denotes a higher level of
goodness.
9
2.4 scheduler – ‘goodness’
if (p->mm == prev->mm)
        return p->counter + p->priority + 1;   /* small bonus: shared address space */
else
        return p->counter + p->priority;
• A small bonus is given to the task p if it
shares the address space with the previous
task.
10
2.4 scheduler - SMP
(Figure: in 2.4, all CPUs share a single global run queue.)
11
2.4 scheduler - SMP
On SMP, goodness() also examines the processor field of each
process and gives a constant bonus (PROC_CHANGE_PENALTY,
usually 15) to the process that was last executed on the current
CPU (this_cpu).
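Putting slides 9–12 together, a simplified sketch of the goodness computation might look like the following. This is illustrative only and follows the formula on slide 10; the real 2.4 goodness() in kernel/sched.c uses the nice value rather than p->priority and also handles SCHED_YIELD:

static int goodness(struct task_struct *p, int this_cpu, struct mm_struct *this_mm)
{
        int weight;

        if (p->policy != SCHED_OTHER)          /* real-time task: always beats */
                return 1000 + p->rt_priority;  /* conventional tasks           */

        weight = p->counter;                   /* remaining clock ticks        */
        if (weight == 0)
                return 0;                      /* quantum exhausted            */

#ifdef CONFIG_SMP
        if (p->processor == this_cpu)          /* last ran on this CPU:        */
                weight += PROC_CHANGE_PENALTY; /* cache likely still warm      */
#endif
        if (p->mm == this_mm)                  /* shares the previous task's   */
                weight += 1;                   /* address space                */

        return weight + p->priority;           /* formula from slide 10        */
}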
12
2.4 scheduler - performance
• The algorithm does not scale well
– It is inefficient to re-compute all dynamic priorities at
once.
• The predefined quantum is too large for high
system loads (for example: a server)
• I/O-bound process boosting strategy is not
optimal
– a good strategy to ensure a short response time for
interactive programs, but…
– some batch programs with almost no user interaction
are I/O-bound.
13
Recalculating Timeslices
(kernel 2.4)
• Problems:
– Can take a long time. Worse, it scales O(n) for n tasks
on the system.
– Recalculation must occur under some sort of lock
protecting the task list and the individual process
descriptors. This results in high lock contention.
– Nondeterminism is a problem for deterministic real-time programs.
14
Processes classification
• Definition:
– I/O-bound processes: spend much of their time
submitting and waiting on I/O requests
– Processor-bound processes: spend much of their
time executing code
• Linux tends to favor I/O-bound processes, thus
providing good process response time
• How to classify processes?
15
16
(Figure: some tasks have exhausted their time quantum (tq = 0) while others still have quantum left (tq ≠ 0); what quantum (tq = ???) should a task receive when quanta are recalculated?)
17
Scheduling policy
time_quantum_new = time_quantum_old / 2 + time_quantum_table[static_priority]
dynamic_priority ≈ time_quantum_new
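For example, a CPU-bound task that always exhausts its quantum (time_quantum_old = 0) simply gets time_quantum_table[static_priority] back, while a task that sleeps and rarely uses its quantum keeps accumulating half of the leftover and converges toward 2 × time_quantum_table[static_priority]; the carried-over quantum is what boosts the dynamic priority of I/O-bound tasks.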
18
(Figure, repeated on slides 19–21: the scheduler walks the tasks whose quanta have expired (tq = 0) and assigns each a new quantum according to the formula above, while tasks that still have quantum left (tq ≠ 0) keep their remaining ticks.)
21
2.6 scheduler
(Figure: each CPU has its own run queue; tasks migrate between run queues (put + pull) to balance the load.)
22
2.6 scheduler –
User Preemption
• User preemption can occur
– When returning to user-space from a system call
– When returning to user-space from an interrupt
handler
23
2.6 scheduler –
Kernel Preemption
• The Linux kernel is a fully preemptive kernel.
– It is possible to preempt a task at any point, so long as the
kernel is in a state in which it is safe to reschedule.
– “safe to reschedule”: kernel does not hold a lock
• The Linux design:
– addition of a preemption counter, preempt_count, to each
process's thread_info (sketched below)
– This count is incremented once for each lock that is acquired and
decremented once for each lock that is released
• Kernel preemption can also occur explicitly, when a task in
the kernel blocks or explicitly calls schedule().
– no additional logic is required to ensure that the kernel is in a
state that is safe to preempt!
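As a rough illustration of this design (hypothetical helper names; the real kernel uses preempt_disable()/preempt_enable() and the preempt_count field of thread_info):

struct thread_info_sketch {
        int preempt_count;   /* one increment per lock currently held      */
        int need_resched;    /* a higher-priority task is waiting to run   */
};

static void on_lock_acquired(struct thread_info_sketch *ti)
{
        ti->preempt_count++;                /* kernel is now non-preemptible */
}

static void on_lock_released(struct thread_info_sketch *ti)
{
        ti->preempt_count--;
        if (ti->preempt_count == 0 && ti->need_resched) {
                /* all locks dropped: it is now safe to preempt,
                 * so this is where schedule() would be invoked */
        }
}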
24
Kernel Preemption
• Kernel preemption can occur
– When an interrupt handler exits, before returning
to kernel-space
– When kernel code becomes preemptible again
– If a task in the kernel explicitly calls schedule()
– If a task in the kernel blocks (which results in a call
to schedule())
25
O(1) & CFS scheduler
• 2.5 ~ 2.6.22: O(1) scheduler
– Time complexity: O(1)
– Using “run queue” (an active Q and an expired Q)
to realize the ready queue
• 2.6.23~present: Completely Fair Scheduler
(CFS)
– Time complexity: O(log n)
– the ready queue is implemented as a red-black
tree
26
O(1) scheduler
• Implement fully O(1) scheduling.
– Every algorithm in the new scheduler completes in constant-time, regardless of the
number of running processes. (Since the 2.5 kernel).
• Implement perfect SMP scalability.
– Each processor has its own locking and individual runqueue.
• Implement improved SMP affinity.
– Attempt to group tasks to a specific CPU and continue to run them there.
– Only migrate tasks from one CPU to another to resolve imbalances in runqueue sizes.
• Provide good interactive performance.
– Even during considerable system load, the system should react and schedule interactive
tasks immediately.
• Provide fairness.
– No process should find itself starved of timeslice for any reasonable amount of time.
Likewise, no process should receive an unfairly high amount of timeslice.
• Optimize for the common case of only one or two runnable processes, yet
scale well to multiple processors, each with many processes.
27
The Priority Arrays
• Each runqueue contains two priority arrays (defined in
kernel/sched.c as struct prio_array; see the excerpt below)
– Active array: all tasks with timeslice left.
– Expired array: all tasks that have exhausted their timeslice.
• Priority arrays provide O(1) scheduling.
– Each priority array contains one queue of runnable processes per
priority level.
– The priority arrays also contain a priority bitmap used to efficiently
discover the highest-priority runnable task in the system.
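For reference, struct prio_array looks roughly like this in kernel/sched.c (abridged from one 2.6.x release; exact types vary slightly between versions):

struct prio_array {
        unsigned int     nr_active;            /* runnable tasks in this array          */
        unsigned long    bitmap[BITMAP_SIZE];  /* one bit per priority level, set when
                                                  that level's queue is non-empty       */
        struct list_head queue[MAX_PRIO];      /* one run list per priority (0–139)     */
};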
28
The Linux O(1) scheduler algorithm
29

(Figure: each runqueue contains two priority arrays, active and expired; each priority array holds one list of tasks indexed by priority (0–139).)
31

(Figure: Linux assigns higher-priority tasks longer time slices; time quantum ≈ 1/priority. Tasks tsk1–tsk3 sit in the active array.)
32

(Figure: Linux chooses the highest-priority task in the active array for execution.)
33
(Figure, slides 34–36: tasks of equal priority in the active array are run round-robin.)
36

(Figure: most tasks have dynamic priorities based on their "nice" value (static priority) plus or minus a bonus of up to 5; the bonus reflects how interactive (I/O-bound) a task is, derived from its sleep time: dynPrio = staticPrio + bonus, bonus ∈ [-5, +5].)
37

(Figure: when all tasks in the active array have exhausted their time slices, the active and expired arrays are exchanged.)
38
The O(1) scheduling algorithm
(Figure: sched_find_first_bit() scans the priority bitmap; set bits mark priority levels whose queues contain runnable tasks (here tsk1, tsk2, tsk3), and the first set bit identifies the highest-priority queue.)
39
The O(1) scheduling algorithm
(Figure: inserting a task, removing a task, and finding the first set bit in the bitmap are each O(1) operations.)
40
find first set bit O(1)
/* Returns the index of the least significant set bit in 'word'. */
static inline unsigned long __ffs(unsigned long word)
{
        int num = 0;

#if BITS_PER_LONG == 64
        if ((word & 0xffffffff) == 0) {
                num += 32;
                word >>= 32;
        }
#endif
        if ((word & 0xffff) == 0) {
                num += 16;
                word >>= 16;
        }
        if ((word & 0xff) == 0) {
                num += 8;
                word >>= 8;
        }
        if ((word & 0xf) == 0) {
                num += 4;
                word >>= 4;
        }
        if ((word & 0x3) == 0) {
                num += 2;
                word >>= 2;
        }
        if ((word & 0x1) == 0)
                num += 1;

        return num;
}
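As a usage sketch (a hypothetical helper, not the kernel source; the real sched_find_first_bit() does essentially this over the 140-bit priority bitmap): scan the bitmap one word at a time and apply __ffs() to the first non-zero word to locate the highest-priority non-empty queue.

/* Illustrative only: lower bit index = higher priority. */
static int find_highest_prio(const unsigned long *bitmap, int max_prio)
{
        int base;

        for (base = 0; base < max_prio; base += BITS_PER_LONG)
                if (bitmap[base / BITS_PER_LONG])
                        return base + __ffs(bitmap[base / BITS_PER_LONG]);

        return max_prio;        /* no runnable task at any priority level */
}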
41
2.6 scheduler –
CFS
• Classical schedulers compute time slices for
each process in the system and allow them to
run until their time slice/quantum is used up.
– After that, the time slices of all processes need to be recalculated.
• CFS considers only the wait time of a process
– The task with the most need for CPU time is
scheduled.
42
2.6 scheduler –
CFS
43
2.6 scheduler –
CFS (motivation)
• Traditional Linux scheduling policy
(Figure: a timeline of a high-priority (HP) task and a low-priority (LP) task.)
44
2.6 scheduler –
CFS (motivation)
• Traditional Unix scheduling policy
(Figure: a timeline annotated with each task's waiting time (WT).)
45
2.6 scheduler –
CFS (the ideal case)
(Figure: an idealized, perfectly fair CPU ("virtualization") runs all tasks simultaneously, giving high-priority tasks a larger share than low-priority tasks.)
46
2.6 scheduler –
CFS (the ideal case)
(Figure: on the idealized CPU, high-priority tasks make progress quickly and low-priority tasks make progress slowly.)
47
2.6 scheduler –
CFS (the ideal case)
(Figure: six tasks, three with weight/speed 8 and three with weight/speed 3, sharing the idealized CPU.)
48
2.6 scheduler –
CFS (the ideal case)
(Figure, animation over slides 49–53: as time advances on the idealized CPU, all six tasks run simultaneously; the weight-8 tasks accumulate CPU time faster than the weight-3 tasks, in proportion to their weights.)
53
2.6 scheduler –
CFS (the implementation)
(Figure, animation over slides 54–70: a real CPU runs only one task at a time, so CFS approximates the ideal case by always running the task that has so far received the least CPU time relative to its weight; the weight-8 and weight-3 tasks take turns, and over time their accumulated CPU times approach the ideal 8 : 3 per-task proportion.)
70
2.6 scheduler –
CFS
71
2.6 scheduler –
the RB-Tree
• To sort tasks on the red-black tree, the kernel
uses the difference fair_clock - wait_runtime.
– While fair_clock is a measure for the CPU time a
task would have gotten if scheduling were
completely fair,
– wait_runtime is a direct measure for the
unfairness caused by the imperfection of real
systems.
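The following self-contained sketch (user-space, illustrative only; the names pick_next and vruntime and the linear scan are simplifications, since the kernel keys a red-black tree instead) shows the selection rule implied above: each task accumulates runtime scaled by the inverse of its weight, and the scheduler always runs the task that is furthest behind.

#include <stdio.h>

struct task {
        const char  *name;
        unsigned int weight;     /* higher weight = higher priority    */
        double       vruntime;   /* weighted CPU time received so far  */
};

/* The kernel picks the leftmost node of an rb-tree; a linear scan
 * over a small array keeps this sketch simple. */
static struct task *pick_next(struct task *t, int n)
{
        struct task *best = &t[0];
        for (int i = 1; i < n; i++)
                if (t[i].vruntime < best->vruntime)
                        best = &t[i];
        return best;
}

int main(void)
{
        struct task tasks[] = { { "A", 8, 0 }, { "B", 8, 0 }, { "C", 3, 0 } };
        const double slice = 1.0;                 /* one arbitrary time unit per pick */

        for (int tick = 0; tick < 11; tick++) {
                struct task *next = pick_next(tasks, 3);
                next->vruntime += slice / next->weight;   /* heavy tasks advance slower */
                printf("tick %2d: run %s\n", tick, next->name);
        }
        return 0;
}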
72
2.6 scheduler –
issues
• Different priority levels for tasks (i.e., nice
values) must be taken into account
• Tasks must not be switched too often because
a context switch has a certain overhead.
73
2.6 scheduler –
fields in the task_struct
74
2.6 scheduler –
fields in the task_struct
• prio and normal_prio indicate the dynamic
priorities, static_prio the static priority of a
process.
– The static priority is the priority assigned to the
process when it was started.
– normal_prio denotes a priority that is
computed based on the static priority and the
scheduling policy of the process.
75
2.6 scheduler –
fields in the task_struct
• The scheduler is not limited to scheduling
processes, but can also work with larger
entities. This allows for implementing group
scheduling.
76
2.6 scheduler –
fields in the task_struct
• cpus_allowed is a bit field used on
multiprocessor systems to restrict the CPUs on
which a process may run.
– sched_setaffinity()
– sched_getaffinity()
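Collecting the fields mentioned on the last few slides, an abridged (and reordered) excerpt of the scheduling-related members of struct task_struct in the CFS-era kernels looks roughly like this; many members are omitted:

struct task_struct {
        /* ... */
        int prio, static_prio, normal_prio;      /* dynamic and static priorities     */
        unsigned int rt_priority;                /* real-time priority (0–99)         */
        const struct sched_class *sched_class;   /* which scheduler handles the task  */
        struct sched_entity se;                  /* schedulable entity (enables group
                                                    scheduling)                       */
        unsigned int policy;                     /* SCHED_NORMAL, SCHED_FIFO, ...     */
        cpumask_t cpus_allowed;                  /* CPUs this task may run on         */
        /* ... */
};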
77
2.6 scheduler –
priority
78
2.6 scheduler –
priority
kernel/sched.c
static const int prio_to_weight[40] = {
 /* -20 */     88761,     71755,     56483,     46273,     36291,
 /* -15 */     29154,     23254,     18705,     14949,     11916,
 /* -10 */      9548,      7620,      6100,      4904,      3906,
 /*  -5 */      3121,      2501,      1991,      1586,      1277,
 /*   0 */      1024,       820,       655,       526,       423,
 /*   5 */       335,       272,       215,       172,       137,
 /*  10 */       110,        87,        70,        56,        45,
 /*  15 */        36,        29,        23,        18,        15,
};

prio_to_weight[i] ≈ prio_to_weight[i+1] × 1.25 (each nice step scales the weight by roughly 25%)
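For example, since CFS divides CPU time in proportion to these weights, a nice-0 task (weight 1024) competing with a nice-5 task (weight 335) receives about 1024 / (1024 + 335) ≈ 75% of the CPU, and the nice-5 task about 25%.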
79
2.6 scheduler –
priority
80
Summary
• The concept of OS schedulers
• Maximize throughput.
– This is what system administrators care about.
– How to maximize throughput (CPU & I/O).
• What is the major drawback of the Linux 2.4
scheduler?
• Pros and cons of Linux 2.6 schedulers
– O(1)
– CFS
81