Scheduling in Linux
Scheduling in Linux and Windows 2000
CS 431
Sanjay Kumar (01D07041)
Sanjay Kumar Ram (01D07042)
Linux Kernel
Linux Processes
Scheduling Policy
Data Structures and Scheduling Algorithm in Uniprocessor System
Scheduling Algorithm for Multiprocessor
Scheduling in Windows 2000
Linux Kernel
Kernel subsystem overview
Linux Processes
Three classes of processes:
Interactive processes
Batch processes
Real-time processes
Linux processes have the following states:
Running
Waiting
Stopped
Zombie
Scheduling Policy
The Linux kernel is non-preemptive, but user processes are preemptible
The set of rules used to determine when and how to select a new process to run is called the scheduling policy
Linux scheduling is based on the time-sharing technique - several processes are allowed to run "concurrently"
CPU time is roughly divided into "slices"
The scheduling policy is also based on ranking processes according to their priority
Static priority - assigned by users to real-time processes; it ranges from 1 to 99 and is never changed by the scheduler
Dynamic priority - the sum of the base time quantum (the base priority of the process) and the number of ticks of CPU time left to the process before its quantum expires in the current epoch
Scheduling Policy (contd.)
Process Preemption
A process is preempted when the dynamic priority of the currently running process is lower than that of a process waiting in the runqueue
A process may also be preempted when its time quantum expires
How Long Must a Quantum Last?
Quantum duration too short - the system overhead caused by task switching becomes excessively high
Quantum duration too long - processes no longer appear to be executed concurrently; the response time of interactive applications and the responsiveness of the system degrade
The rule of thumb adopted by Linux is: choose a
duration as long as possible, while keeping good
system response time
Data Structures and Scheduling
Algorithm in Uniprocessor System
The Linux scheduling algorithm works by dividing the CPU time
into epochs
Each process has a base time quantum - the time-quantum value the scheduler assigns to a process that has exhausted its quantum in the previous epoch
In a single epoch, every process has a specified time quantum
An epoch ends when all runnable processes have exhausted their quantum
Users can change the base time quantum of their processes by using the nice( ) and setpriority( ) system calls
The INIT_TASK macro sets the value of the base time quantum
of process 0 (swapper) to DEF_PRIORITY, which is approx.
200ms
Data Structures and Scheduling Algorithm in
Uniprocessor System (contd.)
The first entry in the process table is the special init process
task_struct - represents the state of each task running in the system
scheduling policy - SCHED_OTHER, SCHED_FIFO, SCHED_RR
a field for process state - running, waiting, stopped, zombie
a field that indicates the process's priority
counter - a field which holds the number of clock ticks left in the current quantum
a doubly linked list (next_task and prev_task) - to keep track
of all executing processes
mm_struct - contains a process's memory management
information
process ID and group ID are also stored
fs_struct - File specific process data
fields that hold timing information
Data Structures and Scheduling Algorithm in
Uniprocessor System (contd.)
Each time the scheduler is run it does the following:
Kernel work - scheduler runs the bottom half handlers and processes the
scheduler task queue
Current process
current process must be processed before another process can be
selected to run
If the scheduling policy of the current process is round robin, then it is put onto the back of the run queue
If the task is INTERRUPTIBLE and it has received a signal since the last
time it was scheduled then its state becomes RUNNING
If the current process has timed out, then its state becomes RUNNING
If the current process is RUNNING then it will remain in that state
Processes that were neither RUNNING nor INTERRUPTIBLE are
removed from the run queue
Data Structures and Scheduling Algorithm in
Uniprocessor System (contd.)
Process Selection
most deserving process is selected by the scheduler
real time processes are given higher priority than ordinary
processes
when several processes have the same priority, the one
nearest the front of the run queue is chosen
when a new process is created the number of ticks left to the
parent is split in two halves, one for the parent and one for
the child
priority and counter fields are used both to implement time-sharing and to compute the process's dynamic priority
Data Structures and Scheduling Algorithm in
Uniprocessor System (contd.)
schedule( ) Function
implements the scheduler.
finds a process in the runqueue list and then assigns the CPU to it
Direct invocation
1. Inserts current in the proper wait queue
2. Changes the state of current either to TASK_INTERRUPTIBLE or to
TASK_UNINTERRUPTIBLE
3. Invokes schedule( )
4. Checks if the resource is available; if not, goes to step 2
5. Once the resource is available, removes current from the wait queue
Lazy invocation
When current has used up its quantum of CPU time; this is done by the
update_process_times( ) function
When a process is woken up and its priority is higher than that of the
current process; performed by the reschedule_idle( ) function, which is
invoked by the wake_up_process( ) function
When a sched_setscheduler( ) or sched_yield( ) system call is issued
Data Structures and Scheduling Algorithm in
Uniprocessor System (contd.)
Actions performed by schedule( )
Before actually scheduling a process, the schedule( ) function starts by
running the functions left by other kernel control paths in various queues
The function then executes all active unmasked bottom halves
Scheduling
value of current is saved in the prev local variable and the
need_resched field of prev is set to 0
a check is made to determine whether prev is a Round Robin real-time
process. If so, schedule( ) assigns a new quantum to prev and puts it at
the bottom of the runqueue list
if the state of prev is TASK_INTERRUPTIBLE and it has pending signals, the function wakes up the process
schedule( ) repeatedly invokes the goodness( ) function on the
runnable processes to determine the best candidate
when the counter field of every runnable process has reached zero, schedule( ) assigns to all existing processes a fresh quantum, whose duration is the priority value plus half the remaining counter value
Data Structures and Scheduling Algorithm in
Uniprocessor System (contd.)
goodness( ) function
identifies the best candidate among all processes in the runqueue list.
It receives as input parameters prev (the descriptor pointer of the
previously running process) and p (the descriptor pointer of the
process to evaluate)
The integer value c returned by goodness( ) measures the
"goodness" of p and has the following meanings:
c = -1000, p must never be selected; this value is returned when the
runqueue list contains only init_task
c = 0, p has exhausted its quantum. Unless p is the first process in the
runqueue list and all runnable processes have also exhausted their
quantum, it will not be selected for execution.
0 < c < 1000, p is a conventional process that has not exhausted its
quantum; a higher value of c denotes a higher level of goodness.
c >= 1000, p is a real-time process; a higher value of c denotes a higher
level of goodness.
Scheduling Algorithm for Multiprocessor
The Linux scheduler is defined in kernel/sched.c.
The new scheduler has been designed to accomplish specific goals:
Implement fully O(1) scheduling: every algorithm in the new scheduler completes in constant time, regardless of the number of running processes or any other input.
Implement perfect SMP scalability: each processor has its own locking and individual runqueue.
Implement improved SMP affinity: naturally attempt to group tasks on a specific CPU and continue to run them there, migrating tasks from one CPU to another only to resolve imbalances in runqueue sizes.
Scheduling algorithm for multiprocessor
(contd.)
Provide good interactive performance: even under considerable system load, the system should react and schedule interactive tasks immediately.
Provide fairness: no process should find itself starved of timeslice for any reasonable amount of time, and no process should receive an unfairly large timeslice.
Optimize for the common case of only one or two runnable processes, yet scale well to multiple processors, each with many processes.
Runqueues
It is the basic data structure in the scheduler.
It is defined in kernel/sched.c as struct runqueue.
It is the list of runnable processes on a given processor; there is
one runqueue per processor. Each runnable process is on
exactly one runqueue.
It additionally contains per-processor scheduling information.
Scheduling algorithm for multiprocessor
(contd.)
A group of macros is used to obtain specific runqueues.
cpu_rq(processor) returns a pointer to the runqueue
associated with the given processor.
this_rq() returns the runqueue of the current processor.
task_rq(task) returns a pointer to the runqueue on which the
given task is queued.
this_rq_lock() locks the current runqueue
rq_unlock(struct runqueue *rq) unlocks the given
runqueue.
double_rq_lock() and double_rq_unlock() lock and unlock a pair of runqueues in a safe order, to avoid deadlock
Scheduling algorithm for multiprocessor
(contd.)
Priority Arrays
Each runqueue contains two priority arrays, the active
and the expired array.
Each priority array contains one queue of runnable processes per priority level.
Priority arrays are the data structures that provide O(1) scheduling.
They also contain a priority bitmap used to efficiently discover the highest-priority runnable task in the system.
Each priority array contains a bitmap field that has at least one bit for every priority on the system.
Scheduling algorithm for multiprocessor
(contd.)
Finding the highest priority task on the system is therefore only a
matter of finding the first set bit in the bitmap.
Each priority array also contains an array called queue of struct
list_head queues, one queue for each priority.
[Figure: The Linux O(1) scheduler algorithm]
Scheduling algorithm for multiprocessor
(contd.)
Recalculating Timeslices
The new Linux scheduler alleviates the need for a recalculate
loop. Instead, it maintains two priority arrays for each processor:
1. active array - contains all the tasks in the associated runqueue that have timeslice left.
2. expired array - contains all the tasks in the associated runqueue that have exhausted their timeslice.
When each task's timeslice reaches zero, its timeslice is
recalculated before it is moved to the expired array.
Scheduling algorithm for multiprocessor
(contd.)
• Recalculating all the timeslices is then as simple as just switching the active and expired arrays. Because the arrays are accessed only via pointer, switching them is as fast as swapping two pointers.
• This swap is a key feature of the new O(1) scheduler. Instead of recalculating each process's priority and timeslice all the time, the O(1) scheduler performs a simple two-step array swap.
Scheduling algorithm for multiprocessor
(contd.)
Sleeping and waking up
Scheduling algorithm for multiprocessor
(contd.)
The load balancer
Scheduling in Windows 2000
Windows 2000 scheduler:
Concern is with threads not processes
• i.e. threads are scheduled, not processes
Threads have priorities 0 through 31
31 down to 16 - 16 "real-time" levels
15 down to 1 - 15 variable levels
0 - used by the zero page thread
(the idle thread(s) run when nothing else is ready)
Manual Process Priority Adjustments
Thread Scheduling
Strictly priority driven
32 queues (FIFO lists) of ready threads
• One queue for each priority level
• Queues are common to all CPUs
When thread becomes ready, it:
• either runs immediately, or
• is inserted at the tail end of the Ready queue for its current
priority
On a uniprocessor, highest priority Ready thread
always runs
Time-sliced, round-robin within a priority level
Thread Scheduling
Multiprocessor Issues
On a multiprocessor, highest-priority n threads
will always run (subject to Affinity, explained
later)
No attempt is made to share processors “fairly”
among processes, only among threads
Tries to keep threads on the same CPU
Scheduling Scenarios
Preemption
A thread becomes ready at a higher priority
than the running thread
• Lower-priority thread is preempted
• Preempted thread goes to the tail of its Ready
queue
Scheduling Scenarios (contd.)
Voluntary switch
When the running thread gives up the CPU
• Waiting on a dispatcher object
• Termination
• Explicit lowering of priority
Schedule the thread at the head of the next nonempty Ready queue
Running thread experiences quantum end
Priority is decremented unless at thread base priority
Thread goes to tail of Ready queue for its new priority
May continue running if no equal or higher-priority
threads are Ready – i.e. it “gets” the new quantum
Priority Adjustments
Threads in “dynamic” classes can have priority
adjustments applied to them (Boost or Decay)
Idle, Below Normal, Normal, Above Normal and High
Carried out upon wait completion
Used to avoid CPU starvation
No automatic adjustments in “real-time” class
Priority 16 and above
Scheduling is therefore “predictable” with respect to
the “real-time” threads
• Note though that this does not mean that there are absolute
latency guarantees
Priority Boosting
Priority boost takes place after a wait
Occurs when a wait (usually I/O) is resolved
• Slow devices / long waits = big boost
• Fast devices / short waits = small boost
Boost is applied to thread’s base priority
• Does not go over priority 15
Keeps I/O devices busy
CPU Starvation
Balance Set Manager looks for “starved” threads
Special boost applied
BSM is a thread running at priority 16, waking up
once per second
Looks at threads that have been Ready for 4 seconds
or more
Boosts up to 10 Ready threads per pass
Boosted to priority 15
Quantum is doubled
Does not apply in real-time range
Multiprocessor Support
By default, threads can run on any available
processor
Soft affinity (introduced in NT 4.0)
Every thread has an “ideal processor”
When thread becomes ready:
• if “ideal” is idle, it runs there
• else, if previous processor is idle, it runs there
• else, may look at next thread, to run on its ideal processor
Multiprocessor Support (contd.)
Hard affinity
Restricts thread to a subset of the available
CPUs
Can lead to:
• threads getting less CPU time than they normally would
References
http://iamexwiwww.unibe.ch/studenten/schlpbch/linuxScheduling/LinuxScheduling.html
http://oopweb.com/OS/Documents/tlk/VolumeFrames.html
http://www.samspublishing.com/articles/article.asp?p=101760&rl=1
http://www.oreilly.com/catalog/linuxkernel/chapter/ch10.html
G. M. Candea and M. B. Jones, "Vassal: Loadable Scheduler Support for Multi-Policy Scheduling," Proceedings of the 2nd USENIX Windows NT Symposium, Seattle, Washington, August 3-4, 1998