Thread implementation issues
Multi-processor Scheduling
Two implementation choices
Single, global ready queue
Per-processor run queue
Which is better?
Queue-per-processor
Advantages of queue per processor
Promotes processor affinity (better cache locality)
Removes a centralized bottleneck
(a single global queue lives in shared memory, and every processor must lock it)
Supported by default in Linux 2.6
Java 1.6 support: a double-ended queue
(java.util.Deque)
Use a bounded buffer per consumer
If nothing in a consumer’s queue, steal work from
somebody else
If too much in the queue, push work somewhere else
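The stealing scheme above can be sketched in Java. This is an illustrative sketch, not the Linux or java.util.concurrent implementation; `WorkStealingWorker` and its methods are made-up names, and it uses `ConcurrentLinkedDeque` (a thread-safe `Deque` added in Java 7) so the owner and thieves can work on opposite ends:

```java
import java.util.concurrent.ConcurrentLinkedDeque;

// Sketch: each worker owns a deque. The owner pushes and pops at
// the head (LIFO, good cache locality); idle workers steal from
// the tail of a victim's deque (FIFO, oldest work first).
class WorkStealingWorker {
    final ConcurrentLinkedDeque<Runnable> queue = new ConcurrentLinkedDeque<>();

    void submit(Runnable task) {
        queue.addFirst(task);                // owner adds at the head
    }

    Runnable nextTask(WorkStealingWorker[] others) {
        Runnable task = queue.pollFirst();   // try our own queue first
        if (task != null) return task;
        for (WorkStealingWorker victim : others) {
            task = victim.queue.pollLast();  // steal from a victim's tail
            if (task != null) return task;
        }
        return null;                         // nothing to run anywhere
    }
}
```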
Thread Implementation Issues
Andrew Whitaker
Where do Threads Come From?
A few choices:
The operating system
A user-mode library
Some combination of the two…
Option #1: Kernel Threads
Threads implemented inside the OS
Thread operations (creation, deletion,
yield) are system calls
Scheduling handled by the OS scheduler
Described as “one-to-one”: one user thread mapped to one kernel thread
Every invocation of Thread.start() creates a kernel thread
[Diagram: a process whose user threads are each bound to their own OS thread]
Option #2: User threads
Implemented as a library inside
a process
All operations (creation,
destruction, yield) are normal
procedure calls
Described as “many-to-one”
Many user-perceived threads map
to a single OS process/thread
[Diagram: a process with many user threads multiplexed onto a single OS thread]
Process Address Space Review
Every process has a
user stack and a
program counter
In addition, each
process has a kernel
stack and program
counter
(not shown here)
[Diagram: address space layout, top to bottom: stack (SP points into it), heap (dynamically allocated memory), static data (data segment), code (text segment; PC points into it)]
Threaded Address Space
Every thread always
has its own user stack
and program counter
For both user, kernel
threads
For user threads, there
is only a single kernel
stack, program
counter, PCB, etc.
User address space (same layout for both user and kernel threads):
[Diagram: one stack per thread, each with its own SP (T1, T2, T3), above a shared heap (dynamically allocated memory), static data (data segment), and code (text segment); each thread has its own PC (T1, T2, T3) into the code]
User Threads vs. Kernel Threads
User threads are faster
Operations do not pass through the OS
But, user threads suffer from:
Lack of physical parallelism
Only run on a single processor!
Poor performance with I/O
A single blocking operation stalls the entire
application
For these reasons, most (if not all) major OSes
provide some form of kernel threads
When Would User Threads Be
Useful?
The calculator?
The web server?
The Fibonacci GUI?
Option #3: Two-level Model
OS supports native multithreading
And, a user library maps multiple
user threads to a single kernel
thread
“Many-to-many”
Potentially captures the best of
both worlds
Cheap thread operations
Parallelism
[Diagram: a process whose many user threads are multiplexed onto a smaller number of OS threads]
Problems with Many-to-Many
Threads
Lack of coordination between user and
kernel schedulers
“The left hand doesn’t know what the right is doing”
Specific problems
Poor performance
e.g., the OS preempts a thread holding a crucial lock
Deadlock
Given K kernel threads, at most K user threads can
block
Other runnable threads are starved out!
Scheduler Activations, UW 1991
Add a layer of communication between kernel
and user schedulers
Examples:
Kernel tells user-mode that a task has blocked
User scheduler can re-use this execution context
Kernel tells user-mode that a task is ready to resume
Allows the user scheduler to alter the user-thread/kernel-thread mapping
Supported by newest release of NetBSD
Implementation Spilling Over into the
Interface
In practice, programmers have
learned to live with expensive
kernel threads
For example, thread pools
Re-use a static set of threads
throughout the lifetime of the
program
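Java ships thread pools in `java.util.concurrent`. A minimal example using the standard `Executors.newFixedThreadPool` API: 100 tasks run on just 4 re-used kernel threads, rather than paying thread-creation cost per task.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PoolDemo {
    public static void main(String[] args) throws InterruptedException {
        // A static set of 4 worker threads, re-used for every task
        ExecutorService pool = Executors.newFixedThreadPool(4);
        AtomicInteger completed = new AtomicInteger();

        for (int i = 0; i < 100; i++) {
            pool.execute(completed::incrementAndGet);  // submit work; no new thread
        }

        pool.shutdown();                               // stop accepting new tasks
        pool.awaitTermination(10, TimeUnit.SECONDS);   // wait for the 100 tasks
        System.out.println(completed.get());           // prints 100
    }
}
```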
Locks
interface Lock {
  public void acquire();
  // only one thread allowed between an acquire and a release
  public void release();
}
Used for implementing critical sections
Modern languages (Java, C#) implicitly
acquire and release locks
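For example, Java’s `synchronized` statement acquires an object’s built-in (intrinsic) lock on entry and releases it on exit, even if an exception is thrown. A minimal sketch (the `Counter` class is a made-up example):

```java
class Counter {
    private int count = 0;

    public void increment() {
        synchronized (this) {  // implicit acquire of this object's lock
            count++;           // critical section
        }                      // implicit release, even on exception
    }

    public synchronized int get() {  // shorthand for synchronized (this)
        return count;
    }
}
```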
Two Varieties of Locks
Spin locks
Threads busy wait until the lock is freed
Thread stays in the ready/running state
Blocking locks
Threads yield the processor until the lock is
freed
Thread transitions to the blocked state
Why Use Spin Locks?
Spin Locks can be faster
No context switching required
Sometimes, blocking is not an option
For example, in the kernel scheduler
implementation
Spin locks are never used on a
uniprocessor: spinning only burns the CPU
time the lock holder needs to run and
release the lock
Bogus Spin Lock Implementation
class SpinLock implements Lock {
  private volatile boolean isLocked = false;

  public void acquire() {
    while (isLocked) { ; } // busy wait
    // RACE: another thread can slip in between the test above
    // and the assignment below
    isLocked = true;
  }

  public void release() {
    isLocked = false;
  }
}
Multiple threads can
acquire this lock!
Hardware Support for Locking
Problem: Lack of atomicity in testing and
setting the isLocked flag
Solution: Hardware-supported atomic
instructions
e.g., atomic test-and-set
Java conveniently abstracts these
primitives (AtomicInteger, and friends)
Corrected Spin Lock
class SpinLock implements Lock {
  private final AtomicBoolean isLocked = new AtomicBoolean(false);

  public void acquire() {
    // atomically get the old value and set a new value
    while (isLocked.getAndSet(true)) { ; }
  }

  public void release() {
    assert (isLocked.get() == true);
    isLocked.set(false);
  }
}
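A quick exercise of the getAndSet-based lock above under real contention (`SpinLockDemo` is a made-up test harness, not part of the lecture code): two threads increment a shared counter 100,000 times each, and no updates are lost.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class SpinLockDemo {
    static final AtomicBoolean isLocked = new AtomicBoolean(false);
    static int counter = 0;   // deliberately NOT atomic; the lock protects it

    static void acquire() { while (isLocked.getAndSet(true)) { ; } }
    static void release() { isLocked.set(false); }

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) {
                acquire();
                counter++;    // critical section
                release();
            }
        };
        Thread t1 = new Thread(work), t2 = new Thread(work);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(counter); // prints 200000
    }
}
```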
Blocking Locks: Acquire
Implementation
Atomically test-and-set locked status
If lock is already held:
Set thread state to blocked
Add PCB (task_struct) to a wait queue
Invoke the scheduler
Problem: must ensure
thread-safe access to
the wait queue!
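A user-level analogue of this recipe can be sketched in Java, with `LockSupport.park`/`unpark` standing in for "invoke the scheduler" and a `ConcurrentLinkedQueue` as the wait queue. The `BlockingLock` class is illustrative (and non-reentrant), not how the kernel implements it:

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

class BlockingLock {
    private final AtomicBoolean locked = new AtomicBoolean(false);
    // Thread-safe wait queue (stands in for the kernel's PCB queue)
    private final ConcurrentLinkedQueue<Thread> waiters =
        new ConcurrentLinkedQueue<>();

    public void acquire() {
        Thread current = Thread.currentThread();
        waiters.add(current);                 // join the wait queue
        // Block until we are at the head AND win the atomic test-and-set
        while (waiters.peek() != current
                || !locked.compareAndSet(false, true)) {
            LockSupport.park();               // "invoke the scheduler"
        }
        waiters.remove();                     // we hold the lock now
    }

    public void release() {
        locked.set(false);
        LockSupport.unpark(waiters.peek());   // wake next waiter (no-op if null)
    }
}
```

The loop re-checks the condition after every wakeup, so a spurious return from `park` is harmless; `unpark` issued before a thread parks simply makes its next `park` return immediately, avoiding the lost-wakeup race.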
Disabling Interrupts
Prevents the processor from being
interrupted
Serves as a coarse-grained lock
Must be used with extreme care
No I/O or timers can be processed
Thread-safe Blocking Locks
Atomically test-and-set locked status
If lock is already held:
Set thread state to blocked
Disable interrupts
Add PCB (task_struct) to a wait queue
Invoke the scheduler
Next task re-enables interrupts
Disabling Interrupts on a
Multiprocessor
Disabling interrupts can be done locally or
globally (for all processors)
Global disabling is extremely heavyweight
Linux: spin_lock_irq
Disable interrupts on the local processor
Grab a spin lock to lock out other processors
Preview For Next Week
public class Example extends Thread {
  private static int x = 1;
  private static int y = 1;
  private static boolean ready = false;

  public static void main(String[] args) {
    Thread t = new Example();
    t.start();
    x = 2;
    y = 2;
    ready = true;
  }

  public void run() {
    while (!ready)
      Thread.yield(); // give up the processor
    System.out.println("x=" + x + " y=" + y);
  }
}
What Does This Program Print?
Answer: it’s a race condition; many
different outputs are possible:
x=2, y=2
x=1, y=2
x=2, y=1
x=1, y=1
Or, the program may print nothing: without
synchronization, the writes in main may never
become visible to the other thread, so the
ready loop runs forever