On Optimistic Methods for Concurrency Control
Advanced
Operating Systems
Lecture 7: Concurrency
University of Tehran
Dept. of EE and Computer Engineering
By:
Dr. Nasser Yazdani
Univ. of Tehran
Distributed Operating Systems
1
How to use shared resource
Some general problems and solutions.
References
Fast Mutual Exclusion for Uniprocessors
On Optimistic Methods for Concurrency
Control
Outline
Introduction
Motivation
Implementing mutual exclusion
Implementing restartable atomic sequence
Kernel design considerations
The performance of three software
techniques
Conclusions
Why Coordinate?
Critical section:
Must execute atomically, without
interruption.
Atomicity usually only w.r.t. other operations
on the same data structures.
What are sources of interruption?
Hardware interrupts, UNIX signals.
Thread pre-emption.
Interleaving of multiple CPUs.
Spooling Example: Correct
Two processes share a spooler directory (slots 4 through 7, holding abc, Prog.c, Prog.n) and the shared variables in and out; each process also has a private variable next_free. Initially in = 7.
1. Process 1: next_free = in (reads 7)
2. Process 1: stores F1 into slot next_free (slot 7)
3. Process 1: in = next_free + 1 (in becomes 8)
4. Process 2: next_free = in (reads 8)
5. Process 2: stores F2 into slot next_free (slot 8)
6. Process 2: in = next_free + 1 (in becomes 9)
Each file lands in its own slot: the outcome is correct.
Spooling Example: Races
Same setup, but Process 1 is preempted between reading in and updating it:
1. Process 1: next_free = in (reads 7)
2. Process 2 (after a context switch): next_free = in (also reads 7)
3. Process 1: stores F1 into slot next_free (slot 7)
4. Process 1: in = next_free + 1 (in becomes 8)
5. Process 2: stores F2 into slot next_free (slot 7), overwriting F1
6. Process 2: in = next_free + 1 (in stays 8)
F1 is lost: a race condition.
Critical Section Problem
N threads all compete to use the same shared data.
This can lead to a race condition.
Each thread has a code segment, called a critical section, in which the shared data is accessed.
We need to ensure that when one thread is executing in its critical section, no other thread is allowed to execute in its critical section.
Critical Region (Critical Section)
Process {
  while (true) {
    ENTER CRITICAL SECTION
    Access shared variables;  // Critical Section
    LEAVE CRITICAL SECTION
    Do other work
  }
}
Critical Region
Requirement
Mutual Exclusion:
Only one process may execute within the critical section at a time.
Progress:
If no process is executing in its critical section, a process requesting entry to its critical section cannot be postponed indefinitely.
No process running outside its critical region may block other processes.
Bounded Wait:
A process requesting entry to a critical section should only have to
wait for a bounded number of other processes to enter and leave the
critical section.
No process should have to wait forever to enter its critical region
Speed and Number of CPUs:
No assumption may be made about speeds or the number of CPUs.
Critical Regions (2)
Mutual exclusion using critical regions
Synchronization approaches
Disabling Interrupts
Lock Variables
Strict Alternation
Peterson’s solution
TSL
Sleep and Wakeup
Message sending
Disabling Interrupts
How does it work?
Disable all interrupts just after entering a critical section and re-enable them just before leaving it.
Why does it work?
With interrupts disabled, no clock interrupt, and hence no context switch, can occur.
Problems:
What if the process forgets to re-enable the interrupts?
Multiprocessor? (disabling interrupts only affects one CPU)
Only used inside the OS
Lock Variables
int lock = 0;

while (lock) { };          // wait until lock is free
lock = 1;                  // EnterCriticalSection
access shared variables;
lock = 0;                  // LeaveCriticalSection

Does the above code work? (No: a thread can be preempted between testing lock and setting it, so two threads may both see lock == 0 and enter together.)
Strict Alternation
Thread Me;  /* for two threads */
{
  while (true)
  { while (turn != my_thread_id) { };  /* busy waiting */
    Access shared variables;  // Critical Section
    turn = other_thread_id;
    Do other work
  }
}

Satisfies mutual exclusion but not progress.
Why?
Note:
A lock (here, the turn variable) that uses busy waiting is called a spin lock.
Using Flags
int flag[2]= {false, false};
Thread Me;
{
while (true)
{ flag[my_thread_id] = true;
while (flag[other_thread_id] ) { };
Access shared variables; // Critical Section;
flag[my_thread_id] = false;
Do other work
}
}
Can block indefinitely
Why? (You go ahead!)
Test & Set (TSL)
Requires hardware support
Does test and set atomically
char Test_and_Set(char* target)
// all done atomically
{ char temp = *target;
  *target = true;
  return temp;
}
Problems with TSL
Operates at motherboard speeds, not
CPU.
Prevents other use of the memory
system.
Much slower than cached load or store.
Interferes with other CPUs and DMA.
Silly to spin in TSL on a uniprocessor.
Add a thread_yield() after every TSL.
Other Similar Hardware
Instruction
Swap = TSL
void Swap(char* x, char* y)
// all done atomically
{ char temp = *x;
  *x = *y;
  *y = temp;
}
Univ. of Tehran
Distributed Operating Systems
18
Peterson’s Solution
int flag[2]={false, false};
int turn;
Thread Me;
{
while (true)
{ flag[my_thread_id] = true;
turn = other_thread_id;
while (flag[other_thread_id]
and turn == other_thread_id ) { };
Access shared variables; // Critical Section;
flag[my_thread_id] = false;
Do other work
}
}
It works!!!
Why?
Sleep and Wakeup
Problem with previous solutions
Busy waiting
Wasting CPU
Priority Inversion:
a high priority waits for a low priority to leave the critical section
the low priority can never execute since the high priority is not
blocked.
Solution: sleep and wakeup
When blocked, go to sleep
Wakeup when it is OK to retry entering the critical
section
Semaphores provide operations that combine sleep and wakeup.
Semaphores
A semaphore's count represents the number of available abstract resources.
New variable having 2 operations
The Down (P) operation is used to acquire a
resource and decrements count.
The Up (V) operation is used to release a
resource and increments count.
Any semaphore operation is indivisible
(atomic)
Semaphores solve the lost-wakeup problem (no separate wakeup bit is needed).
What’s Up? What’s Down?
Definitions of P and V:
Down(S) {
  while (S <= 0) { };  // no-op
  S = S - 1;
}
Up(S) {
  S = S + 1;
}
Counting semaphores: 0..N
Binary semaphores: 0,1
Possible Deadlocks with
Semaphores
Example: P0 and P1 share two semaphores S and Q, both initialized to 1 (S = 1; Q = 1).

P0:                      P1:
Down(S);  // S = 0       Down(Q);  // Q = 0
Down(Q);  // Q = -1      Down(S);  // S = -1
// P0 blocked            // P1 blocked
Up(S);                   Up(Q);
Up(Q);                   Up(S);

DEADLOCK: each process holds one semaphore and waits forever for the other.
Monitor
A simpler way to synchronize
A set of programmer defined operators
monitor monitor-name {
  // variable declarations
  public entry P1(...)
  { ... };
  ......
  public entry Pn(...)
  { ... };
  begin
    initialization code
  end
}
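C has no monitor construct, but the pattern can be emulated: one mutex guards every entry procedure, which gives the "one active thread inside the monitor" property described on the next slide. A sketch of a counter monitor (all names illustrative):

```c
#include <pthread.h>

/* A monitor emulated in C: the mutex is acquired on entry to every
 * public procedure, so at most one thread is active inside at a time,
 * and the state (value) is only touched by the entry procedures. */
typedef struct {
    pthread_mutex_t lock;   /* the monitor's implicit entry lock */
    int value;              /* hidden local variable */
} counter_monitor;

static void counter_init(counter_monitor *m) {
    pthread_mutex_init(&m->lock, NULL);
    m->value = 0;                      /* initialization code */
}

static void counter_inc(counter_monitor *m) {   /* public entry */
    pthread_mutex_lock(&m->lock);
    m->value++;
    pthread_mutex_unlock(&m->lock);
}

static int counter_get(counter_monitor *m) {    /* public entry */
    pthread_mutex_lock(&m->lock);
    int v = m->value;
    pthread_mutex_unlock(&m->lock);
    return v;
}
```

A full monitor would add condition variables (pthread_cond_t) for waiting inside entry procedures; the mutex alone already provides the mutual exclusion.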
Monitor Properties
The internal implementation of a monitor type
cannot be accessed directly by the various threads.
The encapsulation provided by the monitor type
limits access to the local variables only by the local
procedures.
The monitor construct does not allow concurrent access to the procedures defined within the monitor.
Only one thread/process can be active within the
monitor at a time.
Synchronization is built in.
Cooperating Processes via Message Passing
IPC is best provided by a “messaging system”
Messaging system and shared memory system
are not mutually exclusive, they can be used
simultaneously within a single OS or single
process
Two basic operations:
Send (destination, &message)
Receive (source, &message)
Message size: Fixed or Variable size.
Real life analogy: conversation
Message Passing
Direct Communication
Binds the algorithm to Process name
Sender explicitly names the receiver, or
receiver explicitly names the sender
Send(P,message)
Receive(Q,message)
Link is established automatically between
every pair of processes that want to
communicate
Processes must know each other's identity
One link per pair of processes
Indirect Communication
send(A,message) /* send a
message to mailbox A */
receive(A,message) /* receive
a message from mailbox A */
A mailbox is an abstract object into which messages can be placed and from which they can be removed.
Mailbox is owned either by a
process or by the system
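A mailbox can be sketched as a bounded buffer guarded by a mutex and two condition variables: send blocks when the mailbox is full, receive blocks when it is empty. All names below (mailbox, mbox_send, mbox_receive) are illustrative, not a real OS API:

```c
#include <pthread.h>

#define MBOX_SIZE 8

/* A mailbox: a bounded queue of messages shared between processes
 * (threads here). Senders and receivers block when they cannot proceed. */
typedef struct {
    int msgs[MBOX_SIZE];
    int head, tail, count;
    pthread_mutex_t lock;
    pthread_cond_t not_full, not_empty;
} mailbox;

static void mbox_init(mailbox *mb) {
    mb->head = mb->tail = mb->count = 0;
    pthread_mutex_init(&mb->lock, NULL);
    pthread_cond_init(&mb->not_full, NULL);
    pthread_cond_init(&mb->not_empty, NULL);
}

static void mbox_send(mailbox *mb, int msg) {   /* send(A, message) */
    pthread_mutex_lock(&mb->lock);
    while (mb->count == MBOX_SIZE)
        pthread_cond_wait(&mb->not_full, &mb->lock);
    mb->msgs[mb->tail] = msg;
    mb->tail = (mb->tail + 1) % MBOX_SIZE;
    mb->count++;
    pthread_cond_signal(&mb->not_empty);
    pthread_mutex_unlock(&mb->lock);
}

static int mbox_receive(mailbox *mb) {          /* receive(A, message) */
    pthread_mutex_lock(&mb->lock);
    while (mb->count == 0)
        pthread_cond_wait(&mb->not_empty, &mb->lock);
    int msg = mb->msgs[mb->head];
    mb->head = (mb->head + 1) % MBOX_SIZE;
    mb->count--;
    pthread_cond_signal(&mb->not_full);
    pthread_mutex_unlock(&mb->lock);
    return msg;
}
```

Messages are delivered in FIFO order; for communication between separate address spaces, a real system would use a kernel object such as a POSIX message queue instead.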
Fast Mutual Exclusion for Uniprocessors
Describes restartable atomic sequences (an optimistic mechanism for implementing atomic operations on a uniprocessor).
Assumes that short atomic sequences are rarely interrupted.
Relies on a recovery mechanism.
Yields performance improvements.
Motivation for efficient mutual exclusion
Modern applications use multiple threads
As a program structuring device
As a mechanism for portability to multiprocessors
As a way to manage I/O and server concurrency
Many OSs are built on top of a microkernel
Many services are implemented as multithreaded user-level applications
Even single-threaded programs rely on basic OS services that are implemented outside the kernel
Implementing mutual
exclusion on a uniprocessor
Pessimistic methods
Memory-interlocked instruction
Software reservation
Kernel emulation
Restartable atomic sequences
Memory-interlocked
instruction
Implicitly delays interrupts until the
instruction completes.
Require special hardware support from
the processor and bus.
The cycle time for an interlocked access is
several times greater than that for a noninterlocked access.
Software reservation
Explicitly guards against arbitrary
interleaving.
A thread must register its intent to
perform an atomic operation, and then
wait.
Examples:
Dekker’s algorithm
Lamport’s algorithm
Peterson’s algorithm
Kernel emulation
A strictly uniprocessor solution
Explicitly disables interrupts during
operations that must execute atomically.
Although it requires no special hardware, its runtime cost is high:
the kernel must be invoked on every synchronization operation.
Restartable atomic
sequence
Instead of using a mechanism that guards
against interrupts, we can instead recognize
when an interrupt occurs and recover.
The recovery process: “restart the sequence”.
Restartable atomic sequences are attractive because they:
Do not require hardware support.
Have a short code path with one load and store
per atomic read-modify-write.
Do not involve the kernel on every atomic
operation.
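The idea can be illustrated with a user-level simulation. In the real mechanism the kernel resets the PC of a thread preempted inside the sequence; here an interrupted flag stands in for that, and a goto stands in for the rollback. Everything below (ras_increment, the flag) is an illustrative sketch, not code from the paper:

```c
#include <stdbool.h>

/* Simulated kernel state: set when a "context switch" lands inside the
 * restartable atomic sequence. The real kernel would instead move the
 * suspended thread's PC back to the start of the sequence. */
static bool interrupted = false;

static int ras_increment(int *counter) {
restart:
    /* --- begin restartable atomic sequence --- */
    int value = *counter;      /* one load ...                     */
    if (interrupted) {         /* simulated preemption detected:   */
        interrupted = false;   /* recover by restarting the whole  */
        goto restart;          /* sequence from the beginning      */
    }
    *counter = value + 1;      /* ... and one store                */
    /* --- end restartable atomic sequence --- */
    return *counter;
}
```

Because the sequence has a single store at its end, restarting it from the top is always safe: no partial effect is ever visible, which is exactly the property that lets the kernel recover instead of guarding against interrupts.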
Implementing restartable
atomic sequences
Require kernel support to ensure that a
suspended thread is resumed at the
beginning of the sequence.
Strategies for implementing kernel support:
Explicit registration in Mach
Designated sequences in Taos
Explicit registration in
Mach
The kernel keeps track of each address
space’s restartable atomic sequence.
An application registers the starting address and length of the sequence with the kernel.
If registration fails, the application replaces the restartable atomic sequence with code that uses conventional mechanisms.
Costs of explicit
registration
Cost of subroutine linkage
Because the kernel identifies restartable atomic sequences by a single PC range per address space, they cannot be inlined.
Cost of checking the return PC
The kernel must check the return PC whenever a thread is suspended.
Sequences must be used often enough to make this additional scheduling overhead worthwhile.
Designated sequences in
Taos
The kernel must recognize every interrupted sequence.
Uses a two-stage check to recognize atomic sequences:
1st: rejects most interrupted code sequences that are not restartable.
(the opcode of the suspended instruction is used as an index into a hash table containing instructions eligible to appear in a restartable atomic sequence)
2nd: uses another table, also indexed by opcode.
Kernel design
considerations
Cost of the two-stage check on every
thread switch
Placement of the PC check
Mutual exclusion in the kernel
Placement of the PC check
When should the kernel check/adjust the PC of
a suspended thread?
When it is first suspended.
When it is about to be resumed.
Detection at user level
Whenever a suspended thread is resumed by the
kernel, it returns to a fixed user-level sequence.
Determine if the thread was suspended within a
restartable atomic sequence.
(this adds complexity and overhead: the return address must be saved to the user-level stack at each suspension)
Mutual exclusion in the
kernel
The kernel is itself a client of thread
management facilities.
Two events can trigger a thread switch:
Page fault
Thread preemption
Careless ordering of the PC check could lead to
mutual recursion between the thread scheduler
and the virtual memory system.
The performance
R.A.S. vs. kernel emulation vs. software reservation
Discuss performance at three levels
Basic overhead of various mechanisms.
Effect on the performance of common thread
management operations.
Effect of mutual exclusion overhead on the performance of several applications.
Microbenchmarks
Performance is measured with a test that enters a critical section (via TSL) in a loop one million times.
Two versions of Lamport's algorithm are included (fast and meta).
Thread management
overhead
Different thread management packages
Two threads alternately using a mutex and a condition variable
Application performance
afs-bench: file-system intensive, like cp
Parthenon-n: theorem prover with n threads
Procon-64: producer-consumer
Thread suspensions: for R.A.S., the number of times the restart check runs
Conclusions
R.A.S. represent a “common case”
approach to mutual exclusion on a
uniprocessor.
R.A.S. are appropriate for uniprocessors
that do not support memory-interlocked
atomic instructions.
Even on processors that do have hardware support for synchronization, better performance may be possible.
Next Lecture
Distributed systems
References
Read the first chapter of the book
Read "The Anatomy of the Grid: Enabling Scalable Virtual Organizations"