Transcript ppt

The Synergy Between Non-blocking
Synchronization and Operating System
Structure
By Michael Greenwald and David Cheriton
Presented by Jonathan Walpole
CS510 – Advanced Operating Systems
1
The Cache Kernel





Yet Another Attempt at a µ-kernel
Minimal privileged mode code: Caching model of
kernel functionality
OS functionality in user-mode class libraries
allowing application-customizablility of OS
Memory and signal-based communication
Goal: scalable, robust, flexible operating system
design
CS510 – Advanced Operating Systems
2
Synergy of NBS and kernel structure

Claim: NBS and Good Design go together in the
Cache Kernel design and implementation
o
o
NBS allows better OS structuring
Good OS structure simplifies NBS
CS510 – Advanced Operating Systems
3
Non-blocking Synchronization

Basic idea
o
o
o
o

Associate version number with data structure
Read version number at start of critical section
Atomically check and increment version number at end of
critical section at the same time as applying the update(s),
using a Double Compare and Swap (DCAS) instruction.
Try again on failure
Similar to Synthesis approach
CS510 – Advanced Operating Systems
4
Example: Deletion from Linked List
do {
retry:
backoffIfNeeded();
version = list->version;
/* Read version # */
for (p = list->head;(p->next != elt); p = p->next) {
if (p == NULL) {
/* Not found */
if (version != list->version)
{ goto retry; }
/* List changed */
return NULL;
/* Really not found */
}
}
} while( !DCAS(
&(list->version),
&(p->next),
version,
elt,
version+1,
elt->next
)
)
CS510 – Advanced Operating Systems
5
Double Compare and Swap
int DCAS(
int *addr1,
int old1,
int new1,
int *addr2,
int old2,
int new2)
{
<begin atomic>
if ((*addr1 == old1) && (*addr2 == old2)) {
*addr1 = new1;
*addr2 = new2;
return(TRUE);
} else {
return(FALSE);
}
<end atomic>
}
CS510 – Advanced Operating Systems
6
Problem …

What happens if someone else deletes an element
while we are traversing the list?
o
o
o
o

What if we end up traversing into the free pool?
What if the memory is reused?
What if we end up in a different data structure?
What if the memory is reused for a different type?
How can we avoid these problems?
CS510 – Advanced Operating Systems
7
Solutions to reader hijacking


If memory is reclaimed immediately upon updates,
readers must protect themselves from hijacking
Possible solution 1?
o

Readers use a load-linked / store-conditional (LL/SC)
sequence to detect changed pointers at the end of their
critical sections … or at the end of each use of a pointer
Solution 2:
o
Readers verify that version numbers associated with data
structures have not changed (2 memory barriers) at the
end of each use of a pointer
CS510 – Advanced Operating Systems
8
Type-stable memory management (TSM)

Descriptor that is allocated as type T is guaranteed
to remain of type T for at least tstable
o
o
o
I.e. cannot switch quickly from T1 to T2 by reallocation
Restrict or prevent memory reuse???
Generalization of existing technique:
•

Advantage
o

e.g. process-descriptors are statically allocated at system init,
and are type-stable for lifetime of system
T * ptr remains a pointer to an instance of T even if ptr is
changed asynchronously by another process
Problem
o
Unacceptable restriction on memory reuse for production
systems
CS510 – Advanced Operating Systems
9
TSM aids Non-Blocking Synchronization

Type stability ensures safety of pointer
dereferences.
o
Without TSM, delete example is too big to fit on slide
•
o
o
And very expensive to execute
Need to check for changes on each loop iteration
TSM makes NBS code simpler and faster
CS510 – Advanced Operating Systems
10
Other benefits of NBS in Cache Kernel


Signals are only form of IPC in Cache Kernel
NBS simplifies synchronization in signal handlers
o
Makes it easier to write efficient signal-safe code
CS510 – Advanced Operating Systems
11
Contention Minimizing Data Structures (CMDS)

Localization-based structuring used in Cache kernel:
o
o
o

Replication: Per-processor structures (ie. run queue)
Extra level of read-mostly hierarchy (ie. hash tables with
per bucket synchronization)
Cache block alignment of descriptors
Well-known techniques used in other systems
(Tornado, Synthesis, …)
CS510 – Advanced Operating Systems
12
Benefits of CMDS

Minimizes logical and physical contention
o
o


Reduces lock conflicts/convoys in lock-based system
Reduces synchronization contention with NBS
o

Minimizes (memory) consistency overhead
Minimizes false sharing
fewer retries from conflict at point of DCAS
CMDS is good for locks, cache consistency and NBS!
o
NBS needs CMDS
CS510 – Advanced Operating Systems
13
Minimizing the Window of Inconsistency
Another general strategy to improve performance
1) Delay writes, and group together at end
2) Find intermediate consistent states
3) Keep log to allow undo/backout

Small W of I reduces probability of leaving
inconsistent data structures after failure.
CS510 – Advanced Operating Systems
14
Minimizing the Window of Inconsistency




Preemption-safe: Easy to back-out of a critical
section
Reduces lock hold time, and thus, contention (in lockbased systems)
Less probability of failure leaving inconsistency
Good for system design whether you use blocking or
non-blocking synchronization!
CS510 – Advanced Operating Systems
15
Does NBS minimize the window of inconsistency?

What is the relationship between window of
inconsistency and probability of conflict
(contention)?
CS510 – Advanced Operating Systems
16
Priority Inversion Issues

NBS allows synchronization to be subordinate to
scheduling
o
o

It avoids priority inversion problem of locks
Also, page fault,I/O, and other blocking
Highest priority process always makes progress
CS510 – Advanced Operating Systems
17
Why NBS is good for OS structure

Fail-stop safe: OS class libraries tolerant of user
threads being terminated.
o



Most Cache Kernel OS functionality implemented in usermode class libraries.
Portable: same code on uniprocessor, multiprocessor,
and signal handlers.
Deadlock free (almost)
NBS supports class library implementation of OS
functions
CS510 – Advanced Operating Systems
18
Implementing non-blocking Synchronization

Basic approach
o
o
o

Read version number
Rely on TSM for type safety of pointers
Increment version number and check with every
modification (abort if changed).
Straightforward transformation from locking
o
o
o
… so long as you have DCAS and can tolerate TSM
Replace acquire with read of version number
Replace release with DCAS …
CS510 – Advanced Operating Systems
19
Implementing non-blocking Synchronization (cont.)

Variants of the basic approach:
o
o

Optimization:
o
o

N reads, 1 write: No backout
2 reads, 2 writes: No version number
Cache based advisory locking avoids contention (& “useless
parallelism”)
Uses Cload instruction
In the Cache kernel, every case of synchronization
falls into the special cases
CS510 – Advanced Operating Systems
20
Complexity and Correctness

DCAS reduces size of algorithms (lines of code) by
75% compared to CAS, for linked lists, queues and
stacks
o

Special case used of DCAS reduce complexity further
Relatively straightforward transformation from
locking
o
Similar code size
CS510 – Advanced Operating Systems
21
Performance Issues


Simulation-based study
With non-preemptive scheduling:
o
o

DCAS-based NBS almost as fast as spin-locks
CAS-based NBS slower
With preemptive scheduling:
o
DCAS and CAS-based NBS better than spin-locks
CS510 – Advanced Operating Systems
22
Conclusions

“Good OS structure” can support Non-blocking synchronization
o
o
o

Non-blocking synchronization can support convenient OS
structure
o
o
o

Type-Stable Mem. Mgmt (TSM)
Data Structures that Minimize Contention (CMDS)
Minimizing the window of inconsistency
Avoids deadlock; allows signals as sole IPC
Fault tolerant; enables functionality to be moved from kernel.
Performance; isolates scheduling from synch.
Strong synergy between non-blocking synch & good OS
structure.
CS510 – Advanced Operating Systems
23
Advantages of Non-blocking Synchronization

Non-blocking:
o

Portability:
o

Minimizes interference between synchronization and
process scheduling
Recovery:
o

same code on uni and multiprocessors and signal handlers
Performance:
o

Signal handlers don’t deadlock
Insulation from process failures
So why isn’t it universally deployed?
CS510 – Advanced Operating Systems
24
Obstacles to deployment

Complexity:
o

Correctness:
o

hard to convince that there are no subtle bugs
Performance:
o

confusing to design and write efficient algorithms
increased contention, high overhead
Unfamiliarity and insufficient system support
CS510 – Advanced Operating Systems
25