
Reliable Client-Server Communication
Reliable Communication
• So far: we have concentrated on process resilience (by means of process groups).
• What about reliable communication channels?
• Error detection:
– Framing of packets to allow for bit error detection
– Use of frame numbering to detect packet loss
• Error correction:
– Add enough redundancy that corrupted packets can be corrected automatically
– Request retransmission of lost packets, or of the last N packets
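The detection mechanisms above can be sketched as follows. This is a minimal illustration that assumes a hypothetical frame layout: a 4-byte sequence number and a CRC-32 checksum in front of the payload. The names `make_frame`, `check_frame`, and `missing_frames` are illustrative only and do not come from any real protocol. The checksum catches bit errors. Gaps in the sequence numbers reveal lost frames, which the receiver could then ask to have retransmitted.

```python
import struct
import zlib
from typing import List, Optional, Tuple

# Hypothetical frame layout (assumption for illustration):
# 4-byte sequence number | 4-byte CRC-32 of the payload | payload bytes
HEADER = struct.Struct("!II")


def make_frame(seq: int, payload: bytes) -> bytes:
    """Frame a payload with its sequence number and a CRC-32 checksum."""
    return HEADER.pack(seq, zlib.crc32(payload)) + payload


def check_frame(frame: bytes) -> Optional[Tuple[int, bytes]]:
    """Return (seq, payload), or None if the checksum reveals a bit error."""
    seq, crc = HEADER.unpack_from(frame)
    payload = frame[HEADER.size:]
    if zlib.crc32(payload) != crc:
        return None  # corrupted frame: discard it (or request retransmission)
    return seq, payload


def missing_frames(received_seqs: List[int], expected_next: int = 0) -> List[int]:
    """Sequence numbers skipped so far, i.e. frames that were probably lost."""
    missing: List[int] = []
    for seq in sorted(set(received_seqs)):
        missing.extend(range(expected_next, seq))
        expected_next = seq + 1
    return missing


frame = make_frame(7, b"hello")
corrupted = frame[:-1] + bytes([frame[-1] ^ 0x01])  # flip one bit of the payload
print(check_frame(frame), check_frame(corrupted), missing_frames([0, 1, 3, 5]))
```

Here frames 2 and 4 are reported as missing. A receiver could use that list to request retransmission.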
Reliable Communication
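• Observation: Most of this work assumes point-to-point communication
– TCP provides reliable point-to-point channels
– TCP masks omission failures (lost messages) by means of acknowledgments and retransmissions
– But what if the TCP connection itself breaks? Such crash failures are not masked.
• What about high-level communication facilities, such as RPC?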
Traditional RPC
Principle of RPC between a client and server program.
Remote Procedure Calls
• A remote procedure call occurs in the following steps:
1. The client procedure calls the client stub in the normal way.
2. The client stub builds a message and calls the local operating system.
3. The client’s OS sends the message to the remote OS.
4. The remote OS gives the message to the server stub.
5. The server stub unpacks the parameters and calls the server.
6. The server does the work and returns the result to the stub.
7. The server stub packs it in a message and calls its local OS.
8. The server’s OS sends the message to the client’s OS.
9. The client’s OS gives the message to the client stub.
10. The stub unpacks the result and returns to the client.
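The ten steps can be traced in a small in-process sketch. It assumes JSON as the message format. A plain function call (`network_send`) stands in for the two operating systems and the network. The procedure `add` and the other names are hypothetical, and no real RPC framework is involved.

```python
import json


# Step 6: the server procedure that does the actual work (hypothetical example).
def add(a: int, b: int) -> int:
    return a + b


SERVER_PROCEDURES = {"add": add}


def server_stub(request: bytes) -> bytes:
    # Step 5: unpack the parameters and call the server procedure.
    msg = json.loads(request.decode("utf-8"))
    result = SERVER_PROCEDURES[msg["proc"]](*msg["args"])
    # Step 7: pack the result into a reply message.
    return json.dumps({"result": result}).encode("utf-8")


def network_send(message: bytes) -> bytes:
    # Steps 3-4 and 8-9: stand-in for both operating systems and the network.
    # It simply hands the request to the server stub and returns the reply.
    return server_stub(message)


def client_stub(proc: str, *args):
    # Step 2: build a message containing the procedure name and parameters.
    request = json.dumps({"proc": proc, "args": list(args)}).encode("utf-8")
    reply = network_send(request)
    # Step 10: unpack the result and return it to the caller.
    return json.loads(reply.decode("utf-8"))["result"]


# Step 1: the client calls the stub as if it were a local procedure.
print(client_stub("add", 2, 3))  # prints 5
```

The client never sees the messages. This hiding of communication is the transparency that the failure cases below put at risk.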
RPC Failures
• Five different classes of failures:
1. Can’t find server.
2. Request message lost.
3. Server crashes after receiving request.
4. Reply message is lost.
5. Client crashes after sending request.
Methods
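• 1: No server: report the failure back to the client
– Raise an exception.
– Drawback: transparency is lost, because a local call would never fail this way.
• 2: Lost request: resend the message
– Start a timer when sending; if no reply arrives before it expires, send the request again.
– But repeated timeouts may mean that the server is down, not that the request was lost.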
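A timer-and-resend client can be sketched as below. This is a minimal illustration, not a real RPC library. `LossyChannel` is a hypothetical stand-in that drops its first few requests, and it raises `TimeoutError` immediately instead of actually waiting. The retry limit `max_attempts` is an assumed parameter.

```python
class LossyChannel:
    """Hypothetical channel that loses the first `drops` requests it is given."""

    def __init__(self, drops: int):
        self.drops = drops
        self.attempts = 0

    def call(self, request: str, timeout: float) -> str:
        self.attempts += 1
        if self.attempts <= self.drops:
            # Simulated loss: a real client would wait `timeout` seconds here.
            raise TimeoutError(f"no reply within {timeout}s")
        return f"reply to {request}"


def call_with_resend(channel: LossyChannel, request: str,
                     timeout: float = 0.5, max_attempts: int = 3) -> str:
    """Send a request, resending it each time the timer expires."""
    for _ in range(max_attempts):
        try:
            return channel.call(request, timeout)
        except TimeoutError:
            continue  # timer expired: assume the request was lost and resend it
    # Every attempt timed out: requests may keep getting lost, or the server
    # may be down. The client cannot tell which, so it reports failure.
    raise ConnectionError(f"no reply after {max_attempts} attempts")


print(call_with_resend(LossyChannel(drops=2), "read(x)"))  # prints: reply to read(x)
```

Giving up after a bounded number of attempts is a design choice. Without a limit, the client would wait forever if the server were down.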
3: Server Crashes
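• Harder issue: the server can crash at two different points: after carrying out the request but before replying, or before carrying it out at all.
– The client would handle these cases differently if it knew which one occurred.
– But the client only sees that no reply arrived. How can it tell which case occurred and act accordingly?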
• Solution?
Server Crashes
• At least once: The server guarantees it will carry out an operation at least once, no matter what. So the client keeps trying until a reply comes back.
• At most once: The server guarantees it will carry out an operation at most once. So the client reports failure immediately.
• No general solution for exactly once.
• Consider a print server that crashes and comes back up.
– Client sends a message, gets an ack.
– The server sends a completion message either right before or right after printing.
– If the server crashes and recovers, the client can never reissue the request, always reissue it, reissue it only if it received no ack, or reissue it only if it did receive an ack.
Print Server
• Three events that can happen at the server:
1. Send the completion message (M).
2. Print the text (P).
3. Crash (C).
Server Crashes
• These events can occur in six different orderings:
1. M → P → C: A crash occurs after sending the completion message and printing the text.
2. M → C (→ P): A crash happens after sending the completion message, but before the text could be printed.
3. P → M → C: A crash occurs after printing the text and sending the completion message.
4. P → C (→ M): The text is printed, after which a crash occurs before the completion message could be sent.
5. C (→ P → M): A crash happens before the server could do anything.
6. C (→ M → P): A crash happens before the server could do anything.
Server Crashes
• Server crashes and comes back up.
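The combinations of server orderings and client reissue strategies can be worked out mechanically. The sketch below makes three assumptions. First, the client counts itself "ACKed" exactly when it received the completion message M before the crash. Second, a reissued request is printed exactly once after recovery. Third, outcomes are labelled OK (printed once), DUP (printed twice), and ZERO (never printed). The names are illustrative only.

```python
# Server strategies and the three orderings each allows (events before/after C).
ORDERINGS = {
    "M->P": ["MPC", "MC(P)", "C(MP)"],
    "P->M": ["PMC", "PC(M)", "C(PM)"],
}

# Client reissue strategies; `acked` = completion message M arrived before the crash.
STRATEGIES = {
    "Always": lambda acked: True,
    "Never": lambda acked: False,
    "Only when ACKed": lambda acked: acked,
    "Only when not ACKed": lambda acked: not acked,
}


def before_crash(ordering: str) -> str:
    """Events completed before the crash, e.g. 'MC(P)' -> 'M'."""
    return ordering.split("C", 1)[0]


def outcome(ordering: str, reissue) -> str:
    done = before_crash(ordering)
    # Prints before the crash, plus one more if the client reissues (assumption).
    prints = int("P" in done) + int(reissue("M" in done))
    return {0: "ZERO", 1: "OK", 2: "DUP"}[prints]


header = [o for group in ORDERINGS.values() for o in group]
print(f"{'':<20}", " ".join(f"{o:<6}" for o in header))
for name, strategy in STRATEGIES.items():
    row = [outcome(o, strategy) for o in header]
    print(f"{name:<20}", " ".join(f"{r:<6}" for r in row))
```

Every row in the printed table contains at least one DUP or ZERO. No reissue strategy is correct for all six orderings, which is why there is no general solution for exactly-once semantics.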
4: Reply lost
• Detecting a lost reply can be hard, because the missing reply may also mean that the server crashed. The client does not know whether the server carried out the operation.
• Solution:
– None in general, except to try to make operations idempotent: repeatable without any harm done if they were already carried out before.
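Idempotence can be illustrated with a small sketch. It uses a hypothetical `Account` class and a helper, `lost_reply_then_retry`, that simulates the lost-reply case: the server executes the request, the reply is lost, and the client retransmits. Setting a balance is idempotent, but adding to it is not.

```python
class Account:
    """Hypothetical server-side state used to contrast two kinds of operations."""

    def __init__(self, balance: int = 0):
        self.balance = balance

    def set_balance(self, amount: int) -> None:
        # Idempotent: executing it twice leaves the same state as executing it once.
        self.balance = amount

    def deposit(self, amount: int) -> None:
        # Not idempotent: every execution changes the state again.
        self.balance += amount


def lost_reply_then_retry(operation, *args) -> None:
    """Simulate a lost reply: the server ran the request, then the client resends it."""
    operation(*args)  # first execution; its reply is lost
    operation(*args)  # client times out and retransmits; the server runs it again


acct = Account(100)
lost_reply_then_retry(acct.set_balance, 150)
print(acct.balance)  # 150: same result as a single execution

acct = Account(100)
lost_reply_then_retry(acct.deposit, 50)
print(acct.balance)  # 200: the deposit was applied twice
```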
5: Client crashes
• Problem: The server is doing work and holding resources for nothing (an orphan computation).
– The client kills (or rolls back) the orphan when it reboots.
– The client broadcasts a new epoch number when it recovers ⇒ servers kill orphans from earlier epochs.
– Require computations to complete within T time units. Older ones are simply removed.
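The epoch and expiration approaches can be sketched together. This minimal illustration uses a hypothetical server-side `OrphanTracker` that records, for each running computation, the requesting client, that client's epoch, and the start time. It also assumes that "killing" a computation simply means dropping it from the list.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Computation:
    client: str
    epoch: int         # client's epoch when it issued the request
    started_at: float  # start time, in the same units as T


@dataclass
class OrphanTracker:
    max_duration: float  # T: computations running longer than this are removed
    running: List[Computation] = field(default_factory=list)

    def start(self, client: str, epoch: int, now: float) -> None:
        self.running.append(Computation(client, epoch, now))

    def on_new_epoch(self, client: str, epoch: int) -> None:
        # The client rebooted and broadcast a new epoch: anything it started
        # in an earlier epoch is an orphan and is killed.
        self.running = [c for c in self.running
                        if not (c.client == client and c.epoch < epoch)]

    def expire(self, now: float) -> None:
        # Expiration: computations older than T time units are removed.
        self.running = [c for c in self.running
                        if now - c.started_at <= self.max_duration]


server = OrphanTracker(max_duration=10.0)
server.start("A", epoch=1, now=0.0)
server.start("B", epoch=1, now=0.0)
server.on_new_epoch("A", 2)   # A's old computation is an orphan
server.start("B", epoch=1, now=8.0)
server.expire(now=12.0)       # B's first computation exceeded T
print([(c.client, c.started_at) for c in server.running])  # [('B', 8.0)]
```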
• Question: What’s the rolling back for?