Thread implementation issues


Multi-processor Scheduling
 Two implementation choices
Single, global ready queue
Per-processor run queue
 Which is better?
Queue-per-processor
 Advantages of queue per processor
 Promotes processor affinity (better cache locality)
 Removes a centralized bottleneck, which lives in global memory
 Supported by default in Linux 2.6
 Java 1.6 support: a double-ended queue (java.util.Deque)
Use a bounded buffer per consumer
If nothing in a consumer’s queue, steal work from somebody else (sketched below)
If too much in the queue, push work somewhere else
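A minimal sketch of the queue-per-consumer, work-stealing idea, assuming a hypothetical Worker class (the names here are illustrative, not from the slides); the JDK's ForkJoinPool implements the same pattern internally:

import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.concurrent.ThreadLocalRandom;

// Illustrative work-stealing sketch: each worker owns a deque, takes work
// from its own end, and steals from the opposite end of another worker's deque.
class Worker {
    private final ConcurrentLinkedDeque<Runnable> queue = new ConcurrentLinkedDeque<>();
    private final Worker[] all;  // every worker in the pool, including this one

    Worker(Worker[] all) { this.all = all; }

    void submit(Runnable task) { queue.addFirst(task); }

    Runnable next() {
        Runnable task = queue.pollFirst();        // prefer local work (processor affinity)
        if (task == null) {
            Worker victim = all[ThreadLocalRandom.current().nextInt(all.length)];
            task = victim.queue.pollLast();       // steal from somebody else's queue
        }
        return task;                              // null if everyone is idle
    }
}

Taking from one end and stealing from the other keeps the owner and the thieves mostly out of each other's way, which is why a double-ended queue (java.util.Deque) is the natural data structure here.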
Thread Implementation Issues
Andrew Whitaker
Where do Threads Come From?
A few choices:
The operating system
A user-mode library
Some combination of the two…
Option #1: Kernel Threads
 Threads implemented inside the OS
 Thread operations (creation, deletion, yield) are system calls
 Scheduling handled by the OS scheduler
 Described as “one-to-one”: one user thread mapped to one kernel thread
Every invocation of Thread.start() creates a kernel thread (see the example below)
(Figure: a process whose user threads each map to their own OS thread)
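For instance (an illustrative snippet, not from the slides; the class name KernelThreadDemo is a placeholder), under this one-to-one model the start() call below ends up in a system call that creates a new kernel thread:

// Each Thread.start() maps to kernel-thread creation under a one-to-one implementation.
public class KernelThreadDemo {
    public static void main(String[] args) throws InterruptedException {
        Thread t = new Thread(
            () -> System.out.println("running on " + Thread.currentThread().getName()));
        t.start();   // thread creation is a system call; the OS scheduler takes over
        t.join();    // wait for the kernel thread to finish
    }
}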
Option #2: User threads
Implemented as a library inside a process
All operations (creation, destruction, yield) are normal procedure calls
Described as “many-to-one”: many user-perceived threads map to a single OS process/thread
(Figure: many user-level threads inside one process, all multiplexed onto a single OS thread)
Process Address Space Review
Every process has a user stack and a program counter
In addition, each process has a kernel stack and program counter (not shown here)
(Figure: process address space, top to bottom: stack (SP), heap (dynamically allocated memory), static data (data segment), and code (text segment, PC))
Threaded Address Space
 Every thread always has its own user stack and program counter, for both user and kernel threads
 For user threads, there is only a single kernel stack, program counter, PCB, etc.
(Figure: user address space (for both user and kernel threads): a separate stack and SP for threads 1-3, a shared heap (dynamically allocated memory), static data (data segment), and code (text segment), with a separate PC for each thread)
User Threads vs. Kernel Threads
User threads are faster
Operations do not pass through the OS
But, user threads suffer from:
Lack of physical parallelism
Only run on a single processor!
Poor performance with I/O
A single blocking operation stalls the entire application
For these reasons, most (all?) major OSes provide some form of kernel threads
When Would User Threads Be Useful?
The calculator?
The web server?
The Fibonacci GUI?
Option #3: Two-level Model
 OS supports native multithreading
 And, a user library maps multiple user threads to a single kernel thread
 “Many-to-many”
 Potentially captures the best of both worlds
Cheap thread operations
Parallelism
(Figure: many user threads in a process multiplexed onto a smaller number of OS threads)
Problems with Many-to-Many Threads
Lack of coordination between user and kernel schedulers
“Left hand not talking to the right”
Specific problems
Poor performance
e.g., the OS preempts a thread holding a crucial lock
Deadlock
Given K kernel threads, at most K user threads can block
Other runnable threads are starved out!
Scheduler Activations, UW 1991
 Add a layer of communication between kernel and user schedulers
 Examples:
Kernel tells user-mode that a task has blocked
 User scheduler can re-use this execution context
Kernel tells user-mode that a task is ready to resume
 Allows the user scheduler to alter the user-thread/kernel-thread mapping
 Supported by newest release of NetBSD
Implementation Spilling Over into the Interface
In practice, programmers have learned to live with expensive kernel threads
For example, thread pools
Re-use a static set of threads throughout the lifetime of the program (see the example below)
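A small illustrative example of the thread-pool idea using the JDK's ExecutorService (the class name PoolDemo and the pool size of four are arbitrary choices, not from the slides):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// A static set of four threads is created once and re-used for every task,
// so the cost of kernel-thread creation is paid only at startup.
public class PoolDemo {
    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 100; i++) {
            final int id = i;
            pool.submit(() ->
                System.out.println("task " + id + " on " + Thread.currentThread().getName()));
        }
        pool.shutdown();   // accept no new tasks; queued tasks still run to completion
    }
}

Tasks queue up and run on whichever pooled thread becomes free, rather than each task paying for its own kernel thread.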
Locks
interface Lock {
    public void acquire();
    // only one thread allowed between an acquire and a release
    public void release();
}
Used for implementing critical sections
Modern languages (Java, C#) implicitly acquire and release locks (see the example below)
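For example (an illustrative sketch, not from the slides), a Java synchronized method acquires the receiving object's monitor lock on entry and releases it on exit, even if the method throws, so no explicit acquire()/release() calls appear in the source:

// Implicit locking in Java: the monitor lock is acquired and released automatically.
class Account {
    private int balance = 0;

    public synchronized void deposit(int amount) {
        balance += amount;   // critical section: only one thread at a time per Account
    }

    public synchronized int getBalance() {
        return balance;
    }
}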
Two Varieties of Locks
Spin locks
Threads busy wait until the lock is freed
Thread stays in the ready/running state
Blocking locks
Threads yield the processor until the lock is freed
Thread transitions to the blocked state
Why Use Spin Locks?
Spin Locks can be faster
No context switching required
Sometimes, blocking is not an option
For example, in the kernel scheduler implementation
Spin locks are never used on a uniprocessor
Bogus Spin Lock Implementation
class SpinLock implements Lock {
    private volatile boolean isLocked = false;

    public void acquire() {
        while (isLocked) { ; }  // busy wait
        isLocked = true;
    }

    public void release() {
        isLocked = false;
    }
}
Multiple threads can acquire this lock!
Hardware Support for Locking
Problem: Lack of atomicity in testing and setting the isLocked flag
Solution: Hardware-supported atomic instructions
e.g., atomic test-and-set
Java conveniently abstracts these primitives (AtomicInteger, and friends)
Corrected Spin Lock
import java.util.concurrent.atomic.AtomicBoolean;

class SpinLock implements Lock {
    private final AtomicBoolean isLocked = new AtomicBoolean(false);

    public void acquire() {
        // get the old value, set a new value
        while (isLocked.getAndSet(true)) { ; }
    }

    public void release() {
        assert (isLocked.get() == true);
        isLocked.set(false);
    }
}
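A brief usage sketch, assuming the corrected SpinLock above (the Counter class is illustrative, not from the slides):

// Protecting a shared counter with the corrected spin lock.
class Counter {
    private final SpinLock lock = new SpinLock();
    private int count = 0;

    public void increment() {
        lock.acquire();
        try {
            count++;         // critical section
        } finally {
            lock.release();  // always release, even if the critical section throws
        }
    }
}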
Blocking Locks: Acquire Implementation
Atomically test-and-set locked status
If lock is already held:
Set thread state to blocked
Add PCB (task_struct) to a wait queue
Invoke the scheduler
Problem: must ensure thread-safe access to the wait queue! (a user-level sketch follows below)
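The slide describes the kernel-level mechanism; as a rough user-level analogue (an illustrative sketch, not the kernel implementation), java.util.concurrent.locks.LockSupport can stand in for blocking and waking a thread, and a concurrent queue plays the role of the wait queue:

import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

// park() stands in for "set state to blocked and invoke the scheduler";
// the concurrent queue stands in for the thread-safe wait queue.
class BlockingLock {
    private final AtomicBoolean isLocked = new AtomicBoolean(false);
    private final Queue<Thread> waitQueue = new ConcurrentLinkedQueue<>();

    public void acquire() {
        while (!isLocked.compareAndSet(false, true)) {   // atomic test-and-set
            Thread self = Thread.currentThread();
            waitQueue.add(self);                         // "add the PCB to a wait queue"
            if (isLocked.get()) {                        // re-check to avoid a lost wakeup
                LockSupport.park(this);                  // block until a releaser unparks us
            }
            waitQueue.remove(self);
        }
    }

    public void release() {
        isLocked.set(false);
        Thread next = waitQueue.peek();
        if (next != null) {
            LockSupport.unpark(next);                    // wake one waiting thread
        }
    }
}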
Disabling Interrupts
Prevents the processor from being interrupted
Serves as a coarse-grained lock
Must be used with extreme care
No I/O or timers can be processed
Thread-safe Blocking Locks
Atomically test-and-set locked status
If lock is already held:
Set thread state to blocked
Disable interrupts
Add PCB (task_struct) to a wait queue
Invoke the scheduler
Next task re-enables interrupts
Disabling Interrupts on a Multiprocessor
Disabling interrupts can be done locally or globally (for all processors)
Global disabling is extremely heavyweight
Linux: spin_lock_irq
Disable interrupts on the local processor
Grab a spin lock to lock out other processors
Preview For Next Week
public class Example extends Thread {
    private static int x = 1;
    private static int y = 1;
    private static boolean ready = false;

    public static void main(String[] args) {
        Thread t = new Example();
        t.start();
        x = 2;
        y = 2;
        ready = true;
    }

    public void run() {
        while (!ready)
            Thread.yield();  // give up the processor
        System.out.println("x= " + x + " y= " + y);
    }
}
What Does This Program Print?
Answer: it's a race condition. Many different outputs are possible
x=2, y=2
x=1,y=2
x=2,y=1
x=1,y=1
Or, the program may print nothing!
The ready loop runs forever