Advanced Operating Systems
Lecture 7: Concurrency
University of Tehran
Dept. of EE and Computer Engineering
By: Dr. Nasser Yazdani
Univ. of Tehran
Distributed Operating Systems
How to use shared resources
Some general problems and solutions.
References
Outline
Introduction
Motivation
Implementing mutual exclusion
Implementing restartable atomic sequence
Kernel design considerations
The performance of three software techniques
Conclusions
Why Coordinate?
Critical section:
Must execute atomically, without
interruption.
Atomicity usually only w.r.t. other operations
on the same data structures.
What are sources of interruption?
Hardware interrupts, UNIX signals.
Thread pre-emption.
Interleaving of multiple CPUs.
Spooling Example: Correct
Shared memory (spooler directory): slots 4 to 6 hold abc, Prog.c, Prog.n; out marks the next file to print; in marks the next free slot (7)
Each process has its own local variable: int next_free;
Process 1:
1 next_free = in;
2 Stores F1 into slot next_free;
3 in = next_free + 1
Process 2:
4 next_free = in
5 Stores F2 into slot next_free;
6 in = next_free + 1
Result: F1 is stored in slot 7 and F2 in slot 8.
Spooling Example: Races
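Shared memory (spooler directory): slots 4 to 6 hold abc, Prog.c, Prog.n; in marks the next free slot (7)
Each process has its own local variable: int next_free;
1 Process 1: next_free = in;
2 Process 2: next_free = in /* value: 7 */
3 Process 1: Stores F1 into slot next_free;
4 Process 1: in = next_free + 1
5 Process 2: Stores F2 into slot next_free;
6 Process 2: in = next_free + 1
Result: both processes use slot 7, so F2 overwrites F1.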
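The two interleavings of the spooling example can be replayed deterministically. The sketch below is an illustration only: the `Spooler` struct, `SLOTS`, and the function names are assumptions made for this example, and the two "processes" are just local variables stepped by hand in the order given on the slides.

```c
#include <string.h>

#define SLOTS 10

/* Shared spooler directory: the slot array plus the shared `in` index. */
typedef struct {
    const char *slot[SLOTS];
    int in;
} Spooler;

static void spooler_init(Spooler *s) {
    memset(s, 0, sizeof *s);
    s->slot[4] = "abc";
    s->slot[5] = "Prog.c";
    s->slot[6] = "Prog.n";
    s->in = 7;                       /* next free slot, as on the slides */
}

/* Correct order: process 1 finishes all three steps before process 2 starts. */
static void run_correct(Spooler *s) {
    int next_free1, next_free2;      /* one private copy per process */
    next_free1 = s->in;              /* 1 */
    s->slot[next_free1] = "F1";      /* 2 */
    s->in = next_free1 + 1;          /* 3 */
    next_free2 = s->in;              /* 4 */
    s->slot[next_free2] = "F2";      /* 5 */
    s->in = next_free2 + 1;          /* 6 */
}

/* Racy order: process 2 reads `in` before process 1 has updated it. */
static void run_racy(Spooler *s) {
    int next_free1, next_free2;
    next_free1 = s->in;              /* 1: process 1 reads 7 */
    next_free2 = s->in;              /* 2: process 2 also reads 7 */
    s->slot[next_free1] = "F1";      /* 3 */
    s->in = next_free1 + 1;          /* 4 */
    s->slot[next_free2] = "F2";      /* 5: overwrites F1 in slot 7 */
    s->in = next_free2 + 1;          /* 6 */
}
```

After `run_racy`, slot 7 holds F2 and slot 8 is still empty, which is the lost update the slide describes.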
Critical Section Problem
N threads all competing to use the same shared
data
This can lead to a race condition
Each thread has a code segment, called a
critical section, in which shared data is accessed
We need to ensure that when one thread is
executing in its critical section, no other thread
is allowed to execute in its critical section
Critical Region (Critical Section)
Process {
  while (true) {
    ENTER CRITICAL SECTION
    Access shared variables; // Critical Section
    LEAVE CRITICAL SECTION
    Do other work
  }
}
Critical Region Requirements
Mutual Exclusion:
At most one process may execute within the critical section at a time.
Progress:
If no process is executing in its critical section and several processes are
trying to get into their critical section, then entry to the critical section
cannot be postponed indefinitely.
No process running outside its critical region may block other
processes
Bounded Wait:
A process requesting entry to a critical section should only have to
wait for a bounded number of other processes to enter and leave the
critical section.
No process should have to wait forever to enter its critical region
Speed and Number of CPUs:
No assumptions may be made about the speed or the number of CPUs.
Critical Regions (2)
Mutual exclusion using critical regions
Synchronization approaches
Disabling Interrupts
Lock Variables
Strict Alternation
Peterson’s solution
TSL
Sleep and Wakeup
Message sending
Disabling Interrupts
How does it work?
Disable all interrupts just after entering a critical section
and re-enable them just before leaving it.
Why does it work?
With interrupts disabled, no clock interrupts can occur.
(The CPU is only switched from one process to another
as a result of clock or other interrupts, and with
interrupts disabled, no switching can occur.)
Problems:
What if the process forgets to re-enable the interrupts?
Multiprocessor? (Disabling interrupts only affects one
CPU.)
Only used inside the OS
Lock Variables
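int lock = 0;
while (lock) { };   /* busy wait while the lock is taken */
lock = 1;
/* enter critical section */
access shared variable;
/* leave critical section */
lock = 0;
Does the above code work?
No: two threads can both see lock == 0 before either sets it to 1, so both enter the critical section.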
Strict Alternation
Thread Me; /* For two threads */
{
  while (true)
  {
    while (turn != my_thread_id) { };   /* busy waiting */
    Access shared variables; // Critical Section
    turn = other_thread_id;
    Do other work
  }
}
Satisfies mutual exclusion but not progress.
Why? (A thread cannot re-enter until the other thread has taken its turn, even when the other thread is busy outside its critical section.)
Notes:
A lock (here, the turn variable) that uses busy waiting is called a spin lock
Using Flags
int flag[2]= {false, false};
Thread Me;
{
while (true)
{ flag[my_thread_id] = true;
while (flag[other_thread_id] ) { };
Access shared variables; // Critical Section;
flag[my_thread_id] = false;
Do other work
}
}
Can block indefinitely
Why? (You go ahead!)
Test & Set (TSL)
Requires hardware support
Does test and set atomically
char Test_and_Set (char* target)
// All done atomically
{ char temp = *target;
  *target = true;
  return temp;
}
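A spin lock can be built on top of an atomic test-and-set. The sketch below uses C11's `atomic_flag_test_and_set` in place of a raw TSL instruction, which is an assumption about the platform; the `SpinLock` type and the `spin_*` function names are hypothetical and exist only for this example.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type for this sketch; a clear flag means "free". */
typedef struct {
    atomic_flag held;
} SpinLock;

static void spin_init(SpinLock *l) {
    atomic_flag_clear(&l->held);
}

/* One test-and-set attempt: returns true if the caller now holds the lock. */
static bool spin_try_lock(SpinLock *l) {
    /* atomic_flag_test_and_set sets the flag and returns its previous value. */
    return !atomic_flag_test_and_set(&l->held);
}

static void spin_lock(SpinLock *l) {
    while (!spin_try_lock(l)) {
        /* busy wait until the previous value was "clear" */
    }
}

static void spin_unlock(SpinLock *l) {
    atomic_flag_clear(&l->held);
}
```

A thread would wrap its critical section in `spin_lock(&l)` and `spin_unlock(&l)`. Only the thread that observes the old value "clear" gets in; every other caller sees "set" and keeps spinning.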
Problems with TSL
Operates at motherboard speeds, not
CPU.
Prevents other use of the memory
system.
Much slower than cached load or store.
Interferes with other CPUs and DMA.
Silly to spin in TSL on a uniprocessor.
Add a thread_yield() after every TSL.
Other Similar Hardware Instruction
Swap (can play the same role as TSL)
void Swap (char* x, char* y)
// All done atomically
{ char temp = *x;
  *x = *y;
  *y = temp;
}
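An atomic swap can guard a critical section in the same way as test-and-set: swap the value "held" into the lock word and check what came out. The sketch below uses C11's `atomic_exchange` as a stand-in for a hardware Swap instruction; the variable `swap_lock` and the `swap_*` function names are assumptions made for this example.

```c
#include <stdatomic.h>

/* 0 = free, 1 = held. The name and encoding are choices made for this sketch. */
static atomic_int swap_lock = 0;

/* Swap 1 into the lock word; nonzero result means the caller acquired it. */
static int swap_try_acquire(void) {
    return atomic_exchange(&swap_lock, 1) == 0;
}

static void swap_acquire(void) {
    while (!swap_try_acquire()) {
        /* spin: the old value was 1, so someone else holds the lock */
    }
}

static void swap_release(void) {
    atomic_store(&swap_lock, 0);
}
```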
Peterson’s Solution
int flag[2]={false, false};
int turn;
Thread Me;
{
while (true)
{ flag[my_thread_id] = true;
turn = other_thread_id;
while (flag[other_thread_id] && turn == other_thread_id) { };
Access shared variables; // Critical Section;
flag[my_thread_id] = false;
Do other work
}
}
It works!!!
Why?
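Peterson's solution as written above can be transcribed into C for two threads with ids 0 and 1. One assumption is added here: the shared variables are C11 atomics with the default sequentially consistent ordering, because with plain `int` variables a modern compiler or CPU may reorder the store to `flag` and the load of the other thread's flag, which breaks the algorithm. The function names `peterson_enter` and `peterson_leave` are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool flag[2];   /* flag[i]: thread i wants to enter */
static atomic_int turn;       /* the thread that gets priority when both want in */

static void peterson_enter(int me) {
    int other = 1 - me;
    atomic_store(&flag[me], true);   /* announce intent */
    atomic_store(&turn, other);      /* give priority to the other thread */
    while (atomic_load(&flag[other]) && atomic_load(&turn) == other) {
        /* busy wait: the other thread wants in and has priority */
    }
}

static void peterson_leave(int me) {
    atomic_store(&flag[me], false);
}
```

If both threads set `turn` at nearly the same time, the later write wins, so exactly one of them sees `turn == other` and waits; this is the reason both cannot be in the critical section together.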
Sleep and Wakeup
Problem with previous solutions
Busy waiting
Wasting CPU
Priority Inversion:
A high-priority process busy-waits for a low-priority process to leave the critical section;
the low-priority process can never run, because the high-priority process is never
blocked.
Solution: sleep and wakeup
When blocked, go to sleep
Wakeup when it is OK to retry entering the critical
section
Semaphore: a primitive whose operations perform the sleep and wakeup
Semaphores
A semaphore's count represents the number of
abstract resources available.
A new kind of variable with two operations
The Down (P) operation is used to acquire a
resource and decrements count.
The Up (V) operation is used to release a
resource and increments count.
Any semaphore operation is indivisible
(atomic)
Semaphores solve the lost-wakeup problem that the
wakeup-waiting bit was meant to patch
What’s Up? What’s Down?
Definitions of P and V:
Down(S) {
while (S <= 0) { }; // no-op
S= S-1;
}
Up(S) {
S++;
}
Counting semaphores: 0..N
Binary semaphores: 0,1
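The Down and Up definitions above only behave correctly if the test and the decrement in Down happen as one indivisible step. The sketch below shows one way to get that with a C11 compare-and-swap loop; it keeps the busy-waiting form of the slide rather than sleep and wakeup. The `CountingSem` type and the `csem_*` names are hypothetical, and `csem_try_down` is an extra non-blocking helper added for this example.

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_int count;
} CountingSem;

static void csem_init(CountingSem *s, int initial) {
    atomic_init(&s->count, initial);
}

/* One attempt at Down: succeeds only if count > 0, decrementing atomically. */
static bool csem_try_down(CountingSem *s) {
    int c = atomic_load(&s->count);
    while (c > 0) {
        /* On failure the call reloads c with the current count. */
        if (atomic_compare_exchange_weak(&s->count, &c, c - 1)) {
            return true;
        }
    }
    return false;
}

static void csem_down(CountingSem *s) {
    while (!csem_try_down(s)) {
        /* busy wait, as in the slide's Down */
    }
}

static void csem_up(CountingSem *s) {
    atomic_fetch_add(&s->count, 1);
}
```

Initialising the count to 1 gives a binary semaphore; initialising it to N gives a counting semaphore for N resources.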
Possible Deadlocks with Semaphores
Example: P0 and P1 share two semaphores S and Q
S := 1; Q := 1;
P0                                  P1
Down(S); // S = 0                   Down(Q); // Q = 0
Down(Q); // Q = -1, P0 blocked      Down(S); // S = -1, P1 blocked
Up(S);                              Up(Q);
Up(Q);                              Up(S);
DEADLOCK
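The deadlock interleaving can be replayed in a single-threaded model where a Down that would block simply reports failure. The sketch also shows a common remedy that is not on the slide and is added here as an assumption: if both processes acquire the semaphores in the same order, the process that loses the first Down holds nothing, so the winner can always finish. All names below are hypothetical.

```c
#include <stdbool.h>

/* Single-threaded model of a semaphore: returns true if Down would succeed. */
static bool model_try_down(int *sem) {
    if (*sem > 0) {
        (*sem)--;
        return true;
    }
    return false;                    /* the caller would block here */
}

/* Opposite acquisition order: P0 takes S then Q, P1 takes Q then S.
   Returns true if both processes end up blocked while holding one semaphore. */
static bool opposite_order_deadlocks(void) {
    int S = 1, Q = 1;
    bool p0_has_S = model_try_down(&S);   /* P0: Down(S) */
    bool p1_has_Q = model_try_down(&Q);   /* P1: Down(Q) */
    bool p0_has_Q = model_try_down(&Q);   /* P0: Down(Q) would block */
    bool p1_has_S = model_try_down(&S);   /* P1: Down(S) would block */
    return p0_has_S && p1_has_Q && !p0_has_Q && !p1_has_S;
}

/* Same acquisition order (S then Q) for both processes.
   Returns true if P0 obtains both semaphores while P1 waits holding nothing. */
static bool same_order_makes_progress(void) {
    int S = 1, Q = 1;
    bool p0_has_S = model_try_down(&S);   /* P0: Down(S) */
    bool p1_has_S = model_try_down(&S);   /* P1: Down(S) would block */
    bool p0_has_Q = model_try_down(&Q);   /* P0: Down(Q) succeeds */
    return p0_has_S && !p1_has_S && p0_has_Q;
}
```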
Monitor
A simpler way to synchronize
A set of programmer defined operators
monitor monitor-name {
  // variable declarations
  public entry P1(..)
  {... };
  ......
  public entry Pn(..)
  {...};
  begin
    initialization code
  end
}
Monitor Properties
The internal implementation of a monitor type
cannot be accessed directly by the various threads.
The encapsulation provided by the monitor type
limits access to the local variables to the local
procedures.
The monitor construct prohibits concurrent access
to the procedures defined within the monitor.
Only one thread/process can be active within the
monitor at a time.
Synchronization is built in.
Cooperating Processes via
Message Passing
IPC is best provided by a “messaging system”
Messaging systems and shared memory systems
are not mutually exclusive; they can be used
simultaneously within a single OS or a single
process
Two basic operations:
Send (destination, &message)
Receive (source, &message)
Message size: Fixed or Variable size.
Real life analogy: conversation
Message Passing
Direct Communication
Binds the algorithm to process names
Sender explicitly names the receiver, or
receiver explicitly names the sender
Send(P,message)
Receive(Q,message)
Link is established automatically between
every pair of processes that want to
communicate
Processes must know each other's
identity
One link per pair of processes
Indirect Communication
send(A,message) /* send a message to mailbox A */
receive(A,message) /* receive a message from mailbox A */
A mailbox is an abstract object into which messages can be
placed and from which they can be removed.
A mailbox is owned either by a process or by the system
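A mailbox can be pictured as a small bounded queue of messages. The sketch below is a single-threaded model only: the `Mailbox` type, the capacity and message size constants, and the `mailbox_*` functions are assumptions made for this example, and a real mailbox shared between processes would also need synchronization around every operation.

```c
#include <stdbool.h>
#include <string.h>

#define MAILBOX_CAPACITY 4
#define MESSAGE_SIZE 32

typedef struct {
    char slots[MAILBOX_CAPACITY][MESSAGE_SIZE];
    int head;    /* index of the oldest message */
    int count;   /* number of messages currently stored */
} Mailbox;

static void mailbox_init(Mailbox *mb) {
    mb->head = 0;
    mb->count = 0;
}

/* Place a message in the mailbox; returns false if the mailbox is full.
   Messages longer than MESSAGE_SIZE - 1 characters are truncated. */
static bool mailbox_send(Mailbox *mb, const char *message) {
    if (mb->count == MAILBOX_CAPACITY) {
        return false;
    }
    int tail = (mb->head + mb->count) % MAILBOX_CAPACITY;
    strncpy(mb->slots[tail], message, MESSAGE_SIZE - 1);
    mb->slots[tail][MESSAGE_SIZE - 1] = '\0';
    mb->count++;
    return true;
}

/* Remove the oldest message into out (at least MESSAGE_SIZE bytes);
   returns false if the mailbox is empty. */
static bool mailbox_receive(Mailbox *mb, char *out) {
    if (mb->count == 0) {
        return false;
    }
    strcpy(out, mb->slots[mb->head]);
    mb->head = (mb->head + 1) % MAILBOX_CAPACITY;
    mb->count--;
    return true;
}
```

Neither side names the other here: sender and receiver only need to agree on the mailbox, which is what distinguishes indirect from direct communication.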
Mailbox Communication
Ownership by Process
When the process terminates, mailbox disappears
Process that declares a mailbox is the owner
Any other process that knows the name of the
mailbox, can use it.
Ownership by System
Mailbox is independent of any process
OS provides mechanisms that allow the user to
Create a mailbox
Send and receive messages through the mailbox
Destroy a mailbox
Asynchronous Messaging
In automatic buffering, sender does not
know if the message was received
Use ACKNOWLEDGEMENT mechanism
Process P:
Send(Q,message)
Receive(Q,message)
Process Q:
Receive(P,message)
Send(P,"acknowledgement")
Message Problems (1)
Failures can occur when using messaging systems
We need error recovery: exception-condition
handling
Process termination
P (sender) terminates and Q (receiver) waits (blocks)
forever
Solutions: the system terminates Q; the system notifies Q that P
has terminated; or Q has an internal mechanism (a timer) that limits how long
it waits for a message from P
P (sender) sends a message and Q (receiver) has
terminated: with automatic buffering, P keeps sending messages
until the buffer is full, or forever; with no buffering,
P blocks forever
Message Problems (2)
Lost Messages
OS guarantees retransmission
Sender is responsible for detecting lost
messages using timeouts
Sender gets an exception
Scrambled Messages
Message arrives from sender P to receiver Q,
but it is scrambled and corrupted due to
noise in the communication channel
Solution: need an error detection mechanism such as a
checksum to detect errors, and an error
correction mechanism (e.g. retransmission) to
correct them
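Error detection with a checksum can be sketched as follows: the sender transmits the checksum along with the payload, and the receiver recomputes it and asks for retransmission on a mismatch. The 16-bit additive checksum and the function names below are assumptions chosen for brevity; this form detects any single flipped bit but misses some other errors (for example, reordered bytes), which is why stronger codes are often used in practice.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>

/* 16-bit additive checksum: the sum of all bytes modulo 65536. */
static uint16_t checksum16(const unsigned char *data, size_t len) {
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum = (sum + data[i]) & 0xFFFFu;
    }
    return (uint16_t)sum;
}

/* Receiver side: true if the payload matches the checksum sent with it.
   A false result would trigger a retransmission request. */
static bool message_intact(const unsigned char *data, size_t len, uint16_t sent) {
    return checksum16(data, len) == sent;
}
```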
Fast Mutual Exclusion for Uniprocessors
Describe restartable atomic sequences (an
optimistic mechanism for implementing atomic
operations on a uniprocessor)
Assumes that short, atomic sequences are
rarely interrupted.
Relies on a recovery mechanism.
Performance improvements.
Motivation of efficient
mutual-exclusion
Modern applications use multiple threads
As a program structuring device
As a mechanism for portability to multiprocessors
As a way to manage I/O and server concurrency
Many OSs are built on top of a microkernel
Many services are implemented as multithreaded
user-level applications
Even single threaded programs rely on basic OS
services that are implemented outside the kernel
Implementing mutual
exclusion on a uniprocessor
Pessimistic methods
Memory-interlocked instruction
Software reservation
Kernel emulation
Restartable atomic sequences
Memory-interlocked
instruction
Implicitly delays interrupts until the
instruction completes.
Requires special hardware support from
the processor and bus.
The cycle time for an interlocked access is
several times greater than that for a non-interlocked access.
Software reservation
Explicitly guards against arbitrary
interleaving.
A thread must register its intent to
perform an atomic operation, and then
wait.
Examples:
Dekker’s algorithm
Lamport’s algorithm
Peterson’s algorithm
Kernel emulation
A strictly uniprocessor solution
Explicitly disables interrupts during
operations that must execute atomically.
Although it requires no special hardware, its
runtime cost is high.
The kernel must be invoked on every
synchronization operation
Restartable atomic
sequence
Instead of using a mechanism that guards against
interrupts, we can instead recognize when an
interrupt occurs and recover.
The recovery process: “restart the sequence”.
Restartable atomic sequences are attractive because they:
Do not require hardware support.
Have a short code path with one load and one store per
atomic read-modify-write.
Do not involve the kernel on every atomic operation.
Implementing restartable
atomic sequences
Require kernel support to ensure that a
suspended thread is resumed at the
beginning of the sequence.
Strategies for implementing kernel support
Explicit registration in Mach
Designated sequences in Taos
Explicit registration in
Mach
The kernel keeps track of each address
space’s restartable atomic sequence.
An application registers the starting
address and length of the sequence with
the kernel.
If registration fails:
Replace the restartable atomic sequence with
code that uses a conventional mechanism.
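The kernel-side check implied by explicit registration can be modelled as a pure function: given the registered range and the PC at which a thread was suspended, decide where the thread should resume. This is a sketch under stated assumptions, not Mach's actual code; the `RasRegistration` type and `ras_resume_pc` function are hypothetical names introduced for illustration.

```c
#include <stdint.h>
#include <stdbool.h>

/* One registered sequence per address space, as described above. */
typedef struct {
    uintptr_t start;     /* address of the first instruction of the sequence */
    uintptr_t length;    /* length of the sequence in bytes */
    bool registered;     /* false if the application registered nothing */
} RasRegistration;

/* Called by the (hypothetical) kernel for a suspended thread:
   returns the PC at which the thread should later resume. */
static uintptr_t ras_resume_pc(const RasRegistration *r, uintptr_t suspended_pc) {
    if (r->registered &&
        suspended_pc >= r->start &&
        suspended_pc - r->start < r->length) {
        return r->start;         /* interrupted mid-sequence: restart it */
    }
    return suspended_pc;         /* outside the sequence: resume normally */
}
```

Because the decision depends only on one address range, the check is cheap, but it also means every atomic operation has to live inside that single range.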
Costs of explicit
registration
Cost of subroutine linkage
Because the kernel identifies restartable
atomic sequences by a single PC range per
address space, they cannot be inlined.
Cost of checking return PC
The kernel must check the return PC whenever
a thread is suspended.
The savings must make this additional scheduling overhead
worthwhile.
Designated sequences in
Taos
The kernel must recognize every
interrupted sequence.
Uses a two-stage check to recognize
atomic sequences.
1st: rejects most interrupted code sequences
that are not restartable.
(the opcode of the suspended instruction is used as an
index into a hash table containing instructions eligible
to appear in a restartable atomic sequence)
2nd: uses another table (indexed by opcode)
44
Costs of designated
sequences
Cost of the two-stage check on every
thread switch.
Kernel design
considerations
Placement of the PC check
Mutual exclusion in the kernel
Placement of the PC check
When should the kernel check/adjust the PC of
a suspended thread?
When it is first suspended.
When it is about to be resumed.
Detection at user level
Whenever a suspended thread is resumed by the
kernel, it returns to a fixed user-level sequence.
Determine if the thread was suspended within a
restartable atomic sequence.
(Costs: added complexity and overhead, and the return address must be
saved on the user-level stack at each suspension)
Mutual exclusion in the
kernel
The kernel is itself a client of thread
management facilities.
Two events can trigger a thread switch:
Page fault
Thread preemption
Careless ordering of the PC check could lead to
mutual recursion between the thread scheduler
and the virtual memory system.
Performance of three software techniques for
mutual exclusion
“R.A.S.” vs. “Kernel Emulation” vs. “Software reservation”
Discuss performance at three levels
Basic overhead of the various mechanisms.
Effect on the performance of common thread
management operations.
Effect of mutual exclusion overhead on the
performance of several applications.
Microbenchmarks
Performance is measured with a test that enters a
critical section (Test-and-Set) in a loop 1M times
Two versions of Lamport's algorithm (fast and meta)
Thread management
overhead
Different thread management packages
Two threads alternate using a mutex and a condition variable
Application performance
afs-bench: file-system intensive, like cp
Parthenon-n: theorem prover with n threads
Procon-64: producer-consumer
Thread suspensions: for R.A.S., the number of times the PC check is performed
Conclusions
R.A.S. represent a “common case”
approach to mutual exclusion on a
uniprocessor.
R.A.S. are appropriate for uniprocessors
that do not support memory-interlocked
atomic instructions.
Even on processors that do have hardware
support for synchronization, better
performance may be possible.
Next Lecture
Distributed systems
References
Read the first chapter of the book