Lecture III: OS Support
CMPT 401
Dr. Alexandra Fedorova
The Role of the OS
• The operating system needs to provide support
for implementation of distributed systems
• We will look at how distributed systems services
interact with the operating systems
• We will discuss the support that the operating
system needs to provide
CMPT 401 © A. Fedorova
Direct Interaction with the OS
[Figure: Process (a DS component) → system calls → OS]
• A process interacts directly with the OS via system calls
• Example: a web browser, a web server
Interaction via Middleware Layer
[Figure: Process (a DS component) → function calls or IPC → Middleware → system calls → OS]
• A process interacts with the OS indirectly, via a middleware layer
• The middleware layer interacts directly with the OS
• Example: a peer-to-peer file system implemented over a distributed hash table
Interaction via Inclusion
[Figure: the DS component sits inside the OS]
• A DS component is part of the operating system, i.e., an operating system daemon
• Example: the Network File System (NFS) daemon
• Runs as a kernel thread, shares the address space with the kernel, and interacts with the rest of the OS via function calls
• Why would one want to build a DS component that interacts with the OS via inclusion?
Digression: Protection Implementation in the Kernel
• System calls are expensive
• Why? Protection domains
• Refresh your memory of memory protection from your OS class
• Good thing: we get memory protection
• Bad thing: crossing protection domains is expensive. Why?
• So is this the best solution?
Alternative: Protection Via Language
• Safety is guaranteed by the language and its runtime
• The compiler checks that memory accesses are safe
• In addition, each program carries a manifest stating what it will and will not do
• This way you get protection
• And there is no need for hardware protection domains: everything can run in a single address space
• Singularity: an OS from Microsoft Research that implemented these concepts
• ... End digression
Infrastructure Provided by the OS
• Networking
– Interface to network devices
– Implementation of common protocols: TCP, UDP, IP
• Processes and threads
– Efficient scheduling, load balancing and thread
switching
– Efficient thread synchronization
– Efficient inter-process communication (IPC)
The Need for Good Process/Thread Support
• Many distributed applications are implemented using multiple threads or processes
• Why?
Motivation for Multithreaded Designs
• Servers provide access to large data sets (web servers, e-commerce servers)
• Even in the presence of caching, they often need to do I/O (to access files on disk or on a network file system)
• I/O takes much longer than computation
• Overlapping I/O with computation improves response time
• Threads make it easy to overlap I/O with computation
• While one thread blocks on I/O, another can perform computation
[Figure: compute/block timelines over time for a single thread and for multiple threads; in the same period, a single thread completes 1 request while multiple threads complete 1.6 requests]
Process or Thread Scheduling
• Will use “process” and “thread” interchangeably
– A single-threaded process maps to a kernel thread
– Each thread in a multithreaded process (usually) maps to a kernel thread
• A scheduler decides which thread runs next on the CPU
• To ensure good support for DS components, a scheduler must:
– Be scalable
– Balance the load well
– Ensure good interactive response
– Keep context switches to a minimum (why?)
Case Study: Solaris™ 10 OS
• Solaris is often used on server systems
• Known for its good scalability, good load
balancing and interactive performance
• We will look at Solaris runqueues and how they
are managed
– A runqueue is a scheduling queue
– A structure containing pointers to runnable threads, i.e., threads that are waiting for a CPU
Runqueues in Solaris
[Figure: a global kernel priority queue (kpqueue), plus per-CPU user priority queues (disp_qs) for CPU0, CPU1, …, each with one queue per priority level Pri 0, Pri 1, …, Pri N]
• There is a user-level queue for each priority level
• A dispatcher runs the thread from the highest-priority non-empty queue
Processor Load Balancing
• Load balancing ensures that the load is evenly distributed among the CPUs on a multiprocessor
• This improves the overall response time
• The Solaris kernel keeps the runqueues balanced when it enqueues a thread into a runqueue
/*
* setbackdq() keeps runqs balanced such that the difference in length
* between the chosen runq and the next one is no more than RUNQ_MAX_DIFF.
* (…)
*/
A comment from the Solaris source code. Source: http://cvs.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/disp/disp.c, line 1200
Tuning Thread Priorities for Improved Response Time
• If a thread has waited too long for a processor, its priority is elevated,
so no thread is starved
• Threads holding critical resources are put to the front of the queue so
that they release those resources as quickly as possible
/*
* Put the specified thread on the front of the dispatcher
* queue corresponding to its current priority.
*
* Called with the thread in transition, onproc or stopped state
* and locked (transition implies locked) and at high spl.
* Returns with the thread in TS_RUN state and still locked.
*/
A comment on setfrontdq from the Solaris source code. Source: http://cvs.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/disp/disp.c, line 1381
Ensuring Good Responsiveness in the Time-Sharing Scheduler
• Solaris’s time-sharing scheduler (the default scheduler)
assigns priorities so as to ensure good interactive
performance
• Timeslice: the amount of time a thread can run on CPU
before it is pre-empted
• If thread T used up its entire timeslice on the CPU:
– priority(T) ↓, timeslice(T) ↑
• If thread T gave up the CPU before using up its timeslice:
– priority(T) ↑, timeslice(T) ↓
• Why is this done?
Time-Sharing Scheduler: Answers
• Minimizing context switch costs:
– CPU-bound threads stay on CPU longer without a context switch
– In compensation, they are scheduled less often, due to decreased
priority
– Reducing the number of context switches improves performance
• Ensuring good response for interactive applications
– Interactive applications usually don’t use up their entire timeslice
– Example: process a network message and release the CPU before
the timeslice expires
– Those applications will have their priority elevated, so they will respond quickly when a response is needed (e.g., when the next network packet arrives)
What Limits Performance of MP/MT Applications?
• The cost of context switching – depends on the hardware; the OS
cannot fix it alone
– Save/restore the registers
– Flush the CPU pipeline
– If switching address spaces
• May need to flush the TLB (depends on the processor)
• May need to flush the cache (depends on the processor)
• The cost of inter-process communication (IPC): requires context switching
• The cost of inter-thread synchronization – by and large depends on
the program structure; OS can fix some of it, but not all
Thread Synchronization
• If a lock is not available, threads wait
• Execution becomes serialized
Next…
• Talk about synchronization
• Operating system support for efficient
synchronization
• Transactional memory – new programming
paradigm for efficient synchronization