Fast Communication
Firefly RPC
Lightweight RPC
CS 614
Tuesday March 13, 2001
Jeff Hoy
Why Remote Procedure Call?
Simplify building distributed systems and applications
Looks like local procedure call
Transparent to user
Balance between semantics and efficiency
Universal programming tool
Secure inter-process communication
RPC Model
[Diagram: the client application calls through the client stub and client runtime; the call crosses the network to the server runtime and server stub, which invoke the server application; the return follows the reverse path]
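To make the layering concrete, here is a minimal C sketch of the client side of this model: the application calls a stub, the stub marshals arguments into a buffer, and a runtime ships the buffer off (simulated in place here). The names add_stub and rpc_runtime_send are illustrative assumptions, not part of Firefly RPC.

    /* Hedged sketch of the client-side layering in the diagram above. */
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Client runtime: pretend to ship a request buffer across the network
     * and receive a reply (the "server" side is simulated in place). */
    static size_t rpc_runtime_send(const uint8_t *req, size_t len,
                                   uint8_t *reply, size_t reply_cap) {
        int32_t a, b;
        memcpy(&a, req, sizeof a);              /* "server stub" unmarshals    */
        memcpy(&b, req + sizeof a, sizeof b);
        int32_t result = a + b;                 /* "server application" runs   */
        (void)len; (void)reply_cap;
        memcpy(reply, &result, sizeof result);  /* reply travels back          */
        return sizeof result;
    }

    /* Client stub: marshal arguments, hand off to the runtime, unmarshal result. */
    static int32_t add_stub(int32_t a, int32_t b) {
        uint8_t req[2 * sizeof(int32_t)], reply[sizeof(int32_t)];
        memcpy(req, &a, sizeof a);
        memcpy(req + sizeof a, &b, sizeof b);
        rpc_runtime_send(req, sizeof req, reply, sizeof reply);
        int32_t result;
        memcpy(&result, reply, sizeof result);
        return result;
    }

    /* Client application: the remote call looks like a local procedure call. */
    int main(void) {
        printf("add(1, 2) via the stub = %d\n", add_stub(1, 2));
        return 0;
    }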
RPC In Modern Computing
CORBA and Internet Inter-ORB Protocol (IIOP)
Each CORBA server object exposes a set of methods
DCOM and Object RPC
Built on top of RPC
Java and Java Remote Method Protocol (JRMP)
Interface exposes a set of methods
XML-RPC, SOAP
RPC over HTTP and XML
Goals
Firefly RPC
Inter-machine Communication
Maintain Security and Functionality
Speed
Lightweight RPC
Intra-machine Communication
Maintain Security and Functionality
Speed
Firefly RPC
Hardware
DEC Firefly multiprocessor
1 to 5 MicroVAX CPUs per node
Concurrency considerations
10 megabit Ethernet
Takes advantage of 5 CPUs
Fast Path in an RPC
Transport Mechanisms
IP/UDP
DECNet byte stream
Shared Memory (intra-machine only)
Determined at bind time
Inside the transport: procedures “Starter”, “Transporter”, and “Ender”, plus “Receiver” for the server
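One way to picture “determined at bind time” is a table of transport procedures attached to the binding. A sketch in C, assuming hypothetical struct and field names rather than the actual Firefly source:

    #include <stddef.h>

    /* Illustrative only: a binding could record the chosen transport as a
     * table of the four procedures named above, filled in at bind time for
     * IP/UDP, the DECnet byte stream, or shared memory. */
    typedef struct Packet Packet;

    typedef struct Transport {
        Packet *(*starter)(size_t arg_bytes);   /* hand out a call packet buffer  */
        void    (*transporter)(Packet *call);   /* send the call, await the reply */
        void    (*ender)(Packet *result);       /* free the result packet         */
        void    (*receiver)(void);              /* server side: wait and dispatch */
    } Transport;

    typedef struct Binding {
        const Transport *transport;  /* selected once at bind time */
        unsigned server_id;
    } Binding;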
Caller Stub
Gets control from calling program
Calls “Starter” for packet buffer
Copies arguments into the buffer
Calls “Transporter” and waits for reply
Copies result data into caller’s result variables
Calls “Ender” and frees result packet
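The five caller-stub steps map almost directly onto code. A hedged C sketch, with assumed packet accessors (PacketArgs, PacketResults) standing in for the generated stub's inline knowledge of the packet layout:

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Assumed transport entry points and packet accessors; the real stubs are
     * generated code calling the bound transport's procedures. */
    typedef struct Packet Packet;
    Packet *Starter(size_t arg_bytes);        /* allocate a call packet buffer      */
    void    Transporter(Packet *call);        /* transmit the call, wait for reply  */
    void    Ender(Packet *result);            /* return the result packet to a pool */
    void   *PacketArgs(Packet *p);            /* hypothetical: argument area        */
    void   *PacketResults(Packet *p);         /* hypothetical: result area          */

    /* Sketch of a caller stub following the five steps above. */
    int32_t ExampleCallerStub(int32_t arg) {
        Packet *p = Starter(sizeof arg);                   /* get a packet buffer    */
        memcpy(PacketArgs(p), &arg, sizeof arg);           /* copy arguments in      */
        Transporter(p);                                    /* send, wait for reply   */
        int32_t result;
        memcpy(&result, PacketResults(p), sizeof result);  /* copy result out        */
        Ender(p);                                          /* free the result packet */
        return result;
    }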
Server Stub
Receives incoming packet
Copies data onto the stack or into a new data block, or leaves it in the packet
Calls server procedure
Copies the result into the call packet and transmits it
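A matching C sketch of a server stub for a single small-argument procedure; the helper names are assumptions, since real stubs are generated per interface:

    #include <stdint.h>
    #include <string.h>

    typedef struct Packet Packet;
    void   *PacketArgs(Packet *p);            /* hypothetical packet accessors      */
    void   *PacketResults(Packet *p);
    void    TransmitReply(Packet *call);      /* send the reused call packet back   */
    int32_t ServerProcedure(int32_t arg);     /* the exported procedure being stubbed */

    /* Sketch of a server stub: copy a small argument onto the stack, call the
     * server procedure, place the result back in the same packet, and transmit. */
    void ExampleServerStub(Packet *call) {
        int32_t arg;
        memcpy(&arg, PacketArgs(call), sizeof arg);
        int32_t result = ServerProcedure(arg);
        memcpy(PacketResults(call), &result, sizeof result);
        TransmitReply(call);
    }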
Transport Mechanism
“Transporter” procedure
Completes the RPC header
Calls “Sender” to complete the UDP, IP, and Ethernet headers (Ethernet is the chosen means of communication)
Invokes the Ethernet driver via a kernel trap and queues the packet
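A C sketch of the “Transporter” path described above, with an assumed header layout and kernel entry point; the real Firefly packet format and trap interface differ:

    #include <stdint.h>

    /* Illustrative header and entry-point names only. */
    struct rpc_header { uint32_t call_id; uint32_t proc_index; };

    void Sender(void *packet);            /* completes UDP, IP, and Ethernet headers */
    void KernelQueuePacket(void *packet); /* kernel trap: queue on the Ethernet driver */

    /* Sketch of the "Transporter": finish the RPC header, let "Sender" fill in
     * the lower-layer headers, then trap to the kernel to queue the packet. */
    void ExampleTransporter(void *packet, uint32_t call_id, uint32_t proc_index) {
        struct rpc_header *h = (struct rpc_header *)packet;  /* assumed position */
        h->call_id = call_id;
        h->proc_index = proc_index;
        Sender(packet);             /* UDP, IP, Ethernet headers                  */
        KernelQueuePacket(packet);  /* queue for transmission, then await reply   */
    }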
Transport Mechanism
“Receiver” procedure
Server thread awakens in “Receiver”
“Receiver” calls the interface stub identified in the received packet, and the interface stub calls the procedure stub
Reply is similar
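A C sketch of the “Receiver” dispatch loop; the packet accessors and stub lookup are assumptions that stand in for the indices carried in the call packet:

    #include <stdint.h>

    typedef struct Packet Packet;
    typedef void (*ProcStub)(Packet *call);       /* generated per-procedure stub */

    /* Assumed helpers: the call packet identifies the interface and procedure,
     * and a table maps that to the generated stub. */
    Packet  *WaitForCallPacket(void);             /* blocks an idle server thread */
    uint32_t PacketInterface(Packet *p);
    uint32_t PacketProc(Packet *p);
    ProcStub LookupStub(uint32_t interface_id, uint32_t proc_index);

    /* Sketch of the "Receiver" loop: a pool of server threads sleeps here; an
     * awakened thread dispatches one call and then waits again. */
    void ExampleReceiver(void) {
        for (;;) {
            Packet *call = WaitForCallPacket();
            ProcStub stub = LookupStub(PacketInterface(call), PacketProc(call));
            stub(call);   /* stub unmarshals, runs the procedure, sends the reply */
        }
    }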
Threading
Client Application creates RPC thread
Server Application creates call thread
Threads operate in the server application’s address space
No need to spawn an entire process
Threads need to consider locking resources
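A small, runnable illustration of the locking point, using POSIX threads as a stand-in for the Firefly's threads: several call threads share one address space and must serialize access to shared server state.

    #include <pthread.h>
    #include <stdio.h>

    /* Call threads share the server's address space, so shared state needs locks. */
    static long calls_handled = 0;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *handle_call(void *arg) {
        (void)arg;                       /* a real handler would unmarshal a packet */
        pthread_mutex_lock(&lock);       /* threads must consider locking resources */
        calls_handled++;
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    int main(void) {
        pthread_t workers[4];
        for (int i = 0; i < 4; i++)
            pthread_create(&workers[i], NULL, handle_call, NULL);
        for (int i = 0; i < 4; i++)
            pthread_join(workers[i], NULL);
        printf("calls handled: %ld\n", calls_handled);
        return 0;
    }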
Performance Enhancements
Over traditional RPC
Stubs marshal arguments, rather than having library functions handle them
RPC procedures are called through procedure variables rather than via a lookup table
Server retains call packet for results
Buffers reside in shared memory
Sacrifices abstract structure
Performance Analysis
Null() Procedure
No arguments or return value
Measures base latency of RPC mechanism
Multi-threaded caller and server
Time for 10,000 RPCs
Base latency – 2.66ms
MaxResult latency (1500 bytes) – 6.35ms
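The measurement methodology is easy to mirror locally: time N back-to-back calls and divide by N. The sketch below times a local no-op, so it shows only the method, not Firefly's wire latencies.

    #include <stdio.h>
    #include <time.h>

    /* Local stand-in: the real benchmark sends 10,000 Null() RPCs over the wire
     * and divides total elapsed time by the call count. */
    static void Null(void) {}
    static void (*volatile null_call)(void) = Null;  /* volatile: keep the calls */

    int main(void) {
        const int N = 10000;
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < N; i++)
            null_call();
        clock_gettime(CLOCK_MONOTONIC, &t1);
        double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("mean latency: %.3f us per call\n", secs * 1e6 / N);
        return 0;
    }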
Send and Receive Latency
With larger packets, transmission time dominates
Overhead becomes less of an issue
Good for Firefly RPC, assuming large transmissions over the network
Is overhead acceptable for intra-machine communication?
Stub Latency
Significant overhead for small packets
Fewer Processors
Seconds for 1,000 Null() calls
Why the slowdown with one processor?
Fast path can be followed only in a multiprocessor environment
Lock conflicts, scheduling problems
Why little speedup past two processors?
Future Improvements
Hardware
Faster network will help larger packets
Tripling CPU speed would reduce Null() time by 52% and MaxResult by 36%
Software
Omit IP and UDP headers for Ethernet datagrams, 2-4% gain
Redesign RPC protocol, ~5% gain
Busy-waiting threads, 10-15% gain
Write more in assembler, 5-10% gain
Other Improvements
Firefly RPC handles intra-machine communication through the same mechanisms as inter-machine communication
Firefly RPC also has very high overhead for small packets
Does this matter?
RPC Size Distribution
Majority of RPC transfers under 200 bytes
Frequency of Remote Activity
Most calls are to the same machine
Traditional RPC
Most calls are small messages that take place between domains of the same machine
Traditional RPC contains unnecessary overhead, like
Scheduling
Copying
Access validation
Lightweight RPC (LRPC)
Also written for the DEC Firefly system
Mechanism for communication between different protection domains on the same system
Significant performance improvements over traditional RPC
Overhead Analysis
Theoretical minimum to invoke Null() across domains: kernel trap + context change to call, and a trap + context change to return
Theoretical minimum on the Firefly: 109 us
Actual cost: 464 us
Sources of Overhead
355 us added
Stub overhead
Message buffer overhead
Not so much in Firefly RPC
Message transfer and flow control
Scheduling and abstract threads
Context Switch
Implementation of LRPC
Similar to RPC
Call to server is done through kernel trap
Kernel validates the caller
Servers export interfaces
Clients bind to server interfaces before making a call
Binding
Servers export interfaces through a clerk
The clerk registers the interface
Clients bind to the interface through a call to the kernel
Server replies with an entry address and the size of its A-stack
Client gets a Binding Object from the kernel
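A C sketch of that binding handshake; the structures, field names, and kernel/clerk entry points are assumptions for illustration, not the actual kernel interface:

    #include <stddef.h>
    #include <stdint.h>

    typedef struct {
        uint32_t key;         /* capability identifying the binding          */
        void    *entry;       /* server entry address returned at bind time  */
        size_t   astack_size; /* size of the shared argument stack (A-stack) */
    } BindReply;

    typedef struct {
        uint32_t key;         /* the Binding Object handed to the client     */
        void    *astack;      /* A-stack mapped into both client and server  */
        void    *entry;
    } Binding;

    /* Assumed entry points: the server's clerk registers the interface, and
     * the client binds through a kernel call. */
    void      ClerkRegisterInterface(const char *interface_name);  /* server side */
    BindReply KernelBind(const char *interface_name);              /* client side */
    void     *KernelMapAStack(size_t size);

    Binding BindToInterface(const char *interface_name) {
        BindReply r = KernelBind(interface_name);   /* kernel consults the clerk */
        Binding b = { r.key, KernelMapAStack(r.astack_size), r.entry };
        return b;
    }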
Calling
Each procedure is represented by a stub
Client makes a call through the stub
Manages A-stacks
Traps to the kernel
Kernel switches context to the server
Server returns by its own stub
No verification needed
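A hedged C sketch of the client side of an LRPC call: the stub places arguments on the shared A-stack and traps to the kernel, which validates the Binding Object and switches the caller's thread into the server domain. Names are illustrative, not the real trap interface.

    #include <stdint.h>
    #include <string.h>

    typedef struct {
        uint32_t key;      /* Binding Object handle from bind time       */
        void    *astack;   /* shared argument stack (A-stack)            */
        uint32_t proc;     /* procedure index within the bound interface */
    } LrpcBinding;

    /* Assumed kernel trap: validates the Binding Object, switches the caller's
     * thread into the server domain at the registered entry, and returns when
     * the server's own stub returns. */
    void KernelLrpcTrap(uint32_t key, uint32_t proc);

    /* Client call stub sketch: arguments go directly onto the A-stack, so the
     * kernel never needs to copy or interpret them. */
    int32_t ExampleLrpcStub(LrpcBinding *b, int32_t arg) {
        memcpy(b->astack, &arg, sizeof arg);        /* push the argument on the A-stack  */
        KernelLrpcTrap(b->key, b->proc);            /* trap; kernel validates the caller */
        int32_t result;
        memcpy(&result, b->astack, sizeof result);  /* result comes back in place        */
        return result;
    }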
Stub Generation
Procedure representation
Call stub for client
Entry stub for server
LRPC merges protocol layers
Stub generator creates run-time stubs in assembly language
Portability sacrificed for performance
Falls back on Modula-2+ for complex calls
Multiple Processors
LRPC caches domains on idle processors
Kernel checks for an idling processor in the server domain
If a processor is found, the caller thread can execute on the idle processor without switching context
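In rough C pseudocode (names assumed), the dispatch decision looks like this:

    /* Illustrative dispatch only; function names are assumptions. */
    typedef struct Domain Domain;
    typedef struct Thread Thread;

    int  FindIdleProcessorIn(Domain *server);             /* -1 if none is cached     */
    void RunOnProcessor(int cpu, Thread *caller);         /* no context switch needed */
    void ContextSwitchTo(Domain *server, Thread *caller); /* slower fallback path     */

    void DispatchLrpcCall(Domain *server, Thread *caller) {
        int cpu = FindIdleProcessorIn(server);
        if (cpu >= 0)
            RunOnProcessor(cpu, caller);      /* fast path: server domain already loaded */
        else
            ContextSwitchTo(server, caller);  /* otherwise pay for a full context switch */
    }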
Argument Copying
Traditional RPC copies arguments four times for intra-machine calls
Client stub to RPC message to kernel’s message to server’s message to server’s stack
In many cases, LRPC needs to copy the arguments only once
Client stub to A-stack
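The single-copy case in code; this is only meant to contrast the copy counts, with the A-stack pointer assumed to come from the binding:

    #include <stdint.h>
    #include <string.h>

    /* Traditional intra-machine RPC copies an argument four times (client stub
     * -> RPC message -> kernel message -> server message -> server stack). In
     * the common LRPC case there is a single copy: the client stub writes into
     * the A-stack, which both domains share. */
    void CopyArgumentOnce(void *astack, int32_t arg) {
        memcpy(astack, &arg, sizeof arg);   /* the only copy; the server reads it in place */
    }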
Performance Analysis
LRPC is roughly three times faster than traditional RPC
Null() LRPC cost: 157 us, close to the 109 us theoretical minimum
Additional overhead from stub generation and kernel execution
Single-Processor Null() LRPC
Performance Comparison
LRPC versus traditional RPC (in us)
Multiprocessor Speedup
Inter-machine Communication
LRPC is best for messages between domains on the same machine
The first instruction of the LRPC stub checks whether the call is cross-machine
If so, the stub branches to conventional RPC
Larger messages are handled well; LRPC scales linearly with packet size, like traditional RPC
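A sketch of that first check, assuming the binding records at bind time whether the server is remote; the two call paths are placeholders:

    #include <stdint.h>

    typedef struct {
        int      remote;   /* set at bind time when the server is on another machine */
        uint32_t key;
    } StubBinding;

    /* Assumed paths; only the branch structure is the point here. */
    int32_t ConventionalRpcCall(StubBinding *b, int32_t arg);  /* cross-machine fallback */
    int32_t LrpcSameMachineCall(StubBinding *b, int32_t arg);  /* LRPC fast path         */

    /* The stub's first test is whether the call crosses machines; if so, it
     * branches to conventional RPC. */
    int32_t ExampleDispatchStub(StubBinding *b, int32_t arg) {
        if (b->remote)
            return ConventionalRpcCall(b, arg);
        return LrpcSameMachineCall(b, arg);
    }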
Cost
LRPC avoids needless scheduling, copying, and locking by integrating the client, kernel, server, and message protocols
Abstraction is sacrificed for functionality
RPC is built into operating systems (Linux DCE RPC, MS RPC)
Conclusion
Firefly RPC is fast compared to most RPC implementations. LRPC is even faster.
Are they fast enough?
“The performance of Firefly RPC is now good enough that programmers accept it as the standard way to communicate” (1990)
Is speed still an issue?