
User-Level Interprocess
Communication for Shared
Memory Multiprocessors
Brian N. Bershad, Thomas E.
Anderson, Edward D. Lazowska, and
Henry M. Levy
Presented by Arthur Strutzenberg
Interprocess Communication
• The LRPC paper/presentation discussed the need for
– Failure Isolation
– Extensibility
– Modularity
• There is usually a balance between these 3 needs and performance
• This is a central theme for this paper as well.
Interprocess Communication
• Traditionally this is the responsibility of the
Kernel
– This suffers from two problems
• Architectural performance (kernel mediation is costly)
• Interaction between kernel-based communication and user-level threads
– Generally designers use a pessimistic (non-cooperative) approach
• This raises the following question:
“How can you have your cake and eat it too?”
Interprocess Communication
• What if the communication layer is extracted out of the kernel and made part of the user level?
• This can increase performance by allowing
– Messages to be sent directly between address spaces
– Elimination of unnecessary processor reallocation
– Amortization (processor reallocation, when needed, is spread over several independent calls)
– Parallelism in message passing to be exploited
[Diagram: two address spaces, each layered Application / Stub / URPC / Fast Threads, connected by a shared Message Channel; the Kernel sits below both.]
User-Level Remote Procedure Call
(URPC)
• Allows communication between address
spaces without kernel mediation
• Isolates
– Processor Reallocation
– Thread Management
– Data Transfer
• The kernel is ONLY responsible for allocating processors to address spaces
URPC & Communication
• Application-to-OS communication is typically
– A narrow channel (ports)
– A limited number of operations
• Create
• Send
• Receive
• Destroy
• Most modern OSs have support for RPC
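As a rough illustration of such a narrow, port-style interface, here is a minimal sketch; the names and signatures are assumptions for illustration, not an actual OS API.

    #include <stddef.h>

    /* Hypothetical port handle and the four operations listed above. */
    typedef int Port;

    Port port_create(void);                                   /* Create  */
    int  port_send(Port p, const void *msg, size_t len);      /* Send    */
    int  port_receive(Port p, void *buf, size_t max_len);     /* Receive */
    int  port_destroy(Port p);                                /* Destroy */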
URPC & Communication
• What does this buy URPC?
– The definition of RPC generally says little about how the channels of communication must operate
– It also generally does not specify how processor scheduling (reallocation) will interact with data transfer
URPC & Communication
• URPC exploits this freedom:
– Messages are passed through logical channels kept in memory that is shared between client and server
• This memory, once allocated, is kept intact
– Thread management is user level (lightweight instead of “kernel weight”)
• (Haven’t we read this in another paper?)
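A minimal sketch of what such a shared-memory message channel might look like; the paper does not give a concrete layout here, so the struct below (names, queue depth, message format) is purely an assumption.

    #define QUEUE_DEPTH 64

    /* Illustrative fixed-size message; the real stub-marshalled format differs. */
    typedef struct { int procedure_id; char args[240]; } Message;

    /* One direction of the channel: a bounded queue guarded by a
     * test-and-set lock word (used non-spinning, as a later slide explains). */
    typedef struct {
        volatile int lock;
        unsigned     head, tail;
        Message      slots[QUEUE_DEPTH];
    } MessageQueue;

    /* The bidirectional channel lives in memory mapped into both the client's
     * and the server's address space and is kept intact once allocated. */
    typedef struct {
        MessageQueue calls;    /* client -> server */
        MessageQueue replies;  /* server -> client */
    } MessageChannel;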
URPC & Thread Management
• There is less overhead in switching a processor to another thread in the same address space (a context switch) than in reallocating it to a thread in a different address space (a processor reallocation)
– URPC uses this, along with the user-level scheduler, to always give preference to threads within the same address space
URPC & Thread Management
• Some numbers for comparison:
– A context switch within the address space
• 15 microseconds
– A processor reallocation
• 55 microseconds
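A sketch of the preference this implies for an idle processor, assuming hypothetical helper routines (none of these names come from the paper): pick up a runnable thread in the local address space when one exists, and only fall back to the expensive kernel-mediated reallocation when there is no local work.

    typedef struct Thread Thread;
    typedef struct AddressSpace AddressSpace;

    /* Hypothetical helpers, assumed to be provided by the user-level runtime. */
    Thread       *next_local_runnable_thread(void);
    void          context_switch_to(Thread *t);
    AddressSpace *find_underpowered_server(void);
    void          donate_processor_to(AddressSpace *as);

    void idle_processor_loop(void) {
        for (;;) {
            Thread *t = next_local_runnable_thread();
            if (t != NULL) {
                context_switch_to(t);        /* ~15 us: stays in this address space */
                continue;
            }
            AddressSpace *srv = find_underpowered_server();
            if (srv != NULL)
                donate_processor_to(srv);    /* ~55 us: kernel-mediated reallocation */
        }
    }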
URPC & Processor Allocation
• What happens when a client invokes a
procedure on a server process and the server
has no processors allocated to it?
– URPC calls this “underpowered”
– The paper identifies this as a load balancing problem
– The solution is reallocation from client to server
• A client with an idle processor can elect to reallocate the idle
processor to the server
• This is not free, however: reallocation is expensive and requires a call into the kernel
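A one-line sketch of what “underpowered” means in this setting; the field names are assumptions, but the idea follows the slide: pending work with no processors to run it.

    #include <stdbool.h>

    typedef struct {
        int pending_messages;       /* calls sent to the server, not yet replied to */
        int processors_allocated;   /* processors the kernel currently gives it     */
    } ServerLoadInfo;               /* illustrative bookkeeping, not from the paper */

    bool server_is_underpowered(const ServerLoadInfo *s) {
        return s->pending_messages > 0 && s->processors_allocated == 0;
    }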
Rationale for URPC
• The design of the URPC package presented in this paper has three main components
– Thread Management
– Data Transfer
– Processor Reallocation
[Diagram (repeated from earlier): two address spaces, each layered Application / Stub / URPC / Fast Threads, connected by a shared Message Channel; the Kernel sits below both.]
Let’s kill two birds with one stone
• URPC uses an “optimistic reallocation
policy” which makes the following
assumptions
– The Client will always have other work to do
– The server will (soon) have a processor
available to service messages
• This leads to the “amortization of cost”
– The cost of a processor reallocation is spread
over several calls
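For a rough feel of the amortization (using the 55-microsecond reallocation figure from the earlier slide): if a single processor reallocation ends up covering, say, 10 outstanding calls before the processor is returned, the reallocation cost attributed to each call drops to about 5.5 microseconds; the “10 calls” figure here is purely illustrative.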
Why the optimistic approach
doesn’t always hold
• This approach does not work as well when the
application
– Runs as a single thread
– Is real time
– Has high-latency I/O
– Makes priority invocations
• URPC handles these cases by allowing the client’s address space to force a processor reallocation to the server’s address space even though there might still be local work to do
The Kernel handles Processor
Reallocation
• URPC handles this through a kernel call named “Processor.Donate”
• This passes control of an idle processor down to the kernel, and then back up to a specified address in the receiving address space
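The paper names the call, but this presentation does not show its signature, so the prototype below is only a guess at its shape: the caller gives up its processor to the kernel, which resumes it at a designated entry point in the receiving address space.

    typedef struct AddressSpace AddressSpace;

    /* Hypothetical C-level shape of Processor.Donate (signature is an assumption):
     * hand the calling processor to the kernel, have it reallocated to `receiver`,
     * and resume execution there at `entry_address` with `arg` made available. */
    void Processor_Donate(AddressSpace *receiver,
                          void (*entry_address)(void *arg),
                          void *arg);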
Voluntary Return of Processors
• The policy of URPC on its server
processors is
“…Upon receipt of a processor from a client
address, return the processor when all
outstanding messages from the client
have generated replies, or when the
server determines that the client has
become ‘underpowered’….”
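A sketch of how a server might honor this voluntary-return policy on a donated processor; all helper names are assumptions, and Processor_Donate is the hypothetical prototype sketched above.

    #include <stdbool.h>

    typedef struct Client Client;
    typedef struct AddressSpace AddressSpace;

    /* Hypothetical helpers provided by the server's URPC runtime. */
    int  outstanding_messages(const Client *c);     /* calls from c lacking replies */
    bool client_is_underpowered(const Client *c);
    void handle_next_message(Client *c);            /* run one call, enqueue reply  */
    AddressSpace *client_address_space(Client *c);
    void client_resume_entry(void *arg);            /* entry point back in client   */
    void Processor_Donate(AddressSpace *receiver, void (*entry)(void *), void *arg);

    void run_on_donated_processor(Client *c) {
        /* Work on the donating client's calls until they are all answered,
         * or until that client itself looks underpowered... */
        while (outstanding_messages(c) > 0 && !client_is_underpowered(c))
            handle_next_message(c);

        /* ...then voluntarily hand the processor back to the client. */
        Processor_Donate(client_address_space(c), client_resume_entry, c);
    }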
Parallels to the User Threads
Paper
• Even though URPC implements a policy/protocol, there is absolutely no way to enforce it. This has the potential to lead to some interesting side effects.
• This is extremely similar to some of the
problems discussed in the User Threads paper
– For example, a server thread could conceivably
continue to hold a donated processor and handle
requests from other clients
What this leads to…
• One word: STARVATION
– URPC itself reallocates processors directly only to balance load, which is not enough to prevent starvation
• In other words, the system also needs the notion of preemptive reallocation
– Preemptive reallocation must also adhere to:
• No higher-priority thread waits while a lower-priority thread runs
• No processor idles when there is work for it to do (even if the
work is in another address space)
Controlling Channel Access
• Data flows in URPC between address spaces through bidirectional shared-memory queues. Each queue has a test-and-set lock on either end, which the paper specifically states must be NON-SPINNING
– The protocol is: if the lock is free, acquire it; otherwise go on and do something else
– Remember, this protocol operates under the assumption that there is always other work to do!!
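A minimal sketch of such a non-spinning, test-and-set channel lock using C11 atomics; the function names are illustrative, but the protocol is the one described above: try once, and if the lock is busy, go do something else instead of spinning.

    #include <stdatomic.h>
    #include <stdbool.h>

    typedef struct { atomic_flag held; } channel_lock;   /* one per queue end */

    /* Single test-and-set attempt; returns true only if we acquired the lock. */
    static bool channel_trylock(channel_lock *l) {
        return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
    }

    static void channel_unlock(channel_lock *l) {
        atomic_flag_clear_explicit(&l->held, memory_order_release);
    }

    /* Usage following the stated protocol: if the lock is free, drain the queue;
     * otherwise another thread is already doing so, and (by assumption) the
     * caller has other work it can turn to. */
    void maybe_service_channel(channel_lock *l) {
        if (channel_trylock(l)) {
            /* ... dequeue and dispatch any pending messages here ... */
            channel_unlock(l);
        }
        /* else: no spinning -- return and let the caller find other work */
    }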
Data Transfer Using Shared
Memory
• There is still the risk of what the paper
refers to as the “abusability factor” with
RPC, where Clients & Servers can
– Overload each other
– Deny service
– Provide bogus results
– Violate communication protocols
• URPC passes the responsibility for handling this off to the stubs.
Cross-Address Space Procedure
Call and Thread Management
• This section of the paper identifies that there is a correspondence between
Send / Receive (messaging)
and
Start / Stop (threads)
• Does this not remind everybody of a classic
paper that we had to read?
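A sketch of that correspondence in code, with all names invented for illustration: receiving a call message amounts to starting a user-level thread to run it, and sending a call amounts to stopping (blocking) the caller until the reply is received.

    typedef struct Channel Channel;
    typedef struct { void (*proc)(void *); void *args; } Message;  /* illustrative */

    /* Hypothetical URPC / FastThreads primitives (names are assumptions). */
    int  urpc_receive(Channel *ch, Message *out);          /* nonblocking; 1 if msg */
    void urpc_send(Channel *ch, const Message *m);
    void thread_start(void (*proc)(void *), void *args);   /* make a runnable thread */
    void thread_stop_until_reply(const Message *m);        /* block calling thread   */

    /* Receive ~ Start: each incoming call message starts a thread to execute it. */
    void server_dispatch(Channel *ch) {
        Message m;
        while (urpc_receive(ch, &m))
            thread_start(m.proc, m.args);
    }

    /* Send ~ Stop: the client sends the call and stops until the reply arrives. */
    void client_call(Channel *ch, const Message *m) {
        urpc_send(ch, m);
        thread_stop_until_reply(m);
    }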
Another link to the User Threads
Paper
• Additionally, the paper identifies three arguments regarding the thread/message relationship
– High performance thread management
facilities are needed for fine-grained parallel
programs
– High performance can only be provided at the
user level
– The close interaction between communication
and thread management can be exploited
URPC Performance
• Some comparisons:
(values are in microseconds)
Test               URPC Fast Threads   Taos Threads   Ratio of Taos Cost to URPC Cost
Procedure Call                     7              7                               1.0
Fork                              43           1192                              27.7
Fork;Join                        102           1574                              15.4
Yield                             37             57                               1.5
Acquire, Release                  27             27                               1.0
PingPong                          53            271                               5.1
URPC Performance
• URPC can be broken down into 4 components
– Send
– Poll
– Receive
– Dispatch
(values are in microseconds)

Component   Client   Server
Poll            18       13
Send             6        6
Receive         10        9
Dispatch        20       25
Total           54       53
Call Latency and Throughput
• Call latency is the time from when a thread calls into the stub until control returns from the stub.
• Latency and throughput are load dependent, and depend on
– Number of client processors (C)
– Number of server processors (S)
– Number of runnable threads in the client’s address space (T)
• The graphs measure how long it takes to make 100,000 “Null” procedure calls into the server in a “tight loop”
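A sketch of that measurement loop, assuming a hypothetical microsecond timer and a stub-generated Null procedure (neither name is from the slides):

    #include <stdio.h>

    #define NUM_CALLS 100000

    /* Hypothetical: stub-generated cross-address-space Null call and a timer. */
    void Null(void);
    unsigned long microseconds_now(void);

    void measure_null_call_latency(void) {
        unsigned long start = microseconds_now();
        for (int i = 0; i < NUM_CALLS; i++)
            Null();                               /* tight loop of Null calls */
        unsigned long elapsed = microseconds_now() - start;
        printf("average latency: %.2f us per call\n", (double)elapsed / NUM_CALLS);
    }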
Call Latency and Throughput
Conclusions
• In certain circumstances, it makes sense
to move the Communication layer from the
kernel to user space.
• Most OSs are designed for a uniprocessor system and then ported over to an SMMP system.
– URPC is one example of a system designed for an SMMP from the start, and it takes direct advantage of the characteristics of that system
Conclusions
• As a lead-in to Professor Walpole’s discussion and Q&A, let’s conclude by trying to fill out the following table:
RPC Type       Similarities   Differences
Generic RPC
LRPC
URPC