Chapter 5: End-to-End Protocols

Transcript Chapter 5: End-to-End Protocols

Computer Networks: A Systems Approach, 5e
Larry L. Peterson and Bruce S. Davie
Chapter 5
End-to-End Protocols
Copyright © 2010, Elsevier Inc. All rights Reserved
1
• How to turn this host-to-host packet delivery service
into a process-to-process communication channel
Chapter 5
4
Problem
2
Chapter 5
4
End-to-end Protocols
• Application-level processes that use end-to-end protocol
services have certain requirements
• E.g. Common properties that a transport protocol can
be expected to provide
–
–
–
–
–
–
–
Guarantees message delivery
Delivers messages in the same order they were sent
Delivers at most one copy of each message
Supports arbitrarily large messages
Supports synchronization between the sender and the receiver
Allows the receiver to apply flow control to the sender
Supports multiple application processes on each host
3
• Typical limitations of the network on which transport
protocol will operate
–
–
–
–
–
Chapter 5
4
End-to-end Protocols
Drop messages
Reorder messages
Deliver duplicate copies of a given message
Limit messages to some finite size
Deliver messages after an arbitrarily long delay
Such a network is said to provide a best-effort level of service.
4
• Challenge for Transport Protocols
Chapter 5
4
End-to-end Protocols
– Develop algorithms that turn the less-than-desirable
properties of the underlying network into the high level of
service required by application programs
5
Chapter 5
4
End-to-End Protocols
TCP
6
Chapter 5
4
Simple Demultiplexer (UDP)
• Extends host-to-host delivery service of the underlying
network into a process-to-process communication
service
• Adds a level of demultiplexing which allows multiple
application processes on each host to share the network
7
Chapter 5
4
Simple Demultiplexer (UDP)
• Sender: multiplex of UDP datagrams.
– UDP datagrams are received from multiple application programs.
– A single sequence of UDP datagrams is passed to the IP layer.
• Receiver: demultiplex of UDP datagrams.
– A single sequence of UDP datagrams is received from the IP layer.
– Each UDP datagram received is passed to appropriate application program.
Port 1
Port 2
Port 3
UDP
Demultiplexing based on port
UDP Datagram arrival at receiver
IP layer
8
Processes
UDP
Multiplexer
IP
Chapter 5
4
• UDP provides application multiplexing (via port numbers)
Processes
UDP
Demultiplexer
IP
9
Chapter 5
4
Simple Demultiplexer (UDP)
Format for UDP header
10
• Multiplexing and demultiplexing is only based on
port numbers.
Chapter 5
4
Simple Demultiplexer (UDP)
• Port can be used for sending a UDP datagram to
other hosts.
• Port can be used for receiving a UDP datagram
from other hosts.
11
Chapter 5
4
Simple Demultiplexer (UDP)
• Typically, a port is implemented by a
message queue (buffer)
UDP Message Queue
• Upon message receipt:
• When a message arrives, the
protocol (e.g., UDP) appends the
message to the end of the queue
• If the queue is full, the message is
discarded
• Message Process:
• When an application process wants
to receive a message, one is removed
from the front of the queue.
• Application processthen processes
the message.
12
• How does a process learns the port for the
process to which it wants to send a message?
Chapter 5
4
Simple Demultiplexer (UDP)
– Typically, a client process initiates a message exchange with a
server process.
– Once a client has contacted a server, the server knows the
client’s port (from the SrcPrt field contained in the message
header) and can reply to it.
13
• How does the client learns the server’s port in the first
place?
Chapter 5
4
Simple Demultiplexer (UDP)
– A common approach is for the server to accept messages at a
well-known port.
– i.e each server receives its messages at some fixed port that is
widely published (Static Binding)
• Like well-known emergency phone number 911.
– IANA assigns port numbers to specific services.
– List of all assignments is universally published by IANA
• Port numbers have a range of 0..65535
• Static Binding: ports 0-1023
• DNS : port 53 , SMTP: port 25 , FTP: port 21
14
Chapter 5
4
• Part of a IANA Port Assignment table:
15
• Dynamic binding
–
–
–
–
Chapter 5
4
Static Vs Dynamic Binding
Server port is well defined (static binding)
Client's port number is often chosen from the dynamic port range
Program dynamically receives port from operating system.
Port range (1024-5000)
• IANA later changed this to 49152-65535
• UDP and TCP: hybrid approach.
• Some port numbers are assigned to fixed services.
• Many port numbers are available for dynamic assignment to
application programs.
16
• Fixed UDP Port Numbers
Chapter 5
4
Simple Demultiplexer (UDP)
– Linux and Unix: /etc/services
17
• A client-server application such as DNS uses the
services of UDP because a client needs to send a
short request to a server and to receive a quick
response from it.
Chapter 5
4
Example 1
– The request and response can each fit in one user
datagram.
– Since only one message is exchanged in each direction,
the connectionless feature is not an issue;
– i.e. The client or server does not worry that messages are
delivered out of order.
18
• Is using UDP for sending emails appropriate?
• Why?
Chapter 5
4
Question
19
Chapter 5
4
Answer
• A client-server application such as SMTP, which is used in
electronic mail, CANNOT use the services of UDP
because a user can send a long e-mail message, which
may include multimedia (images, audio, or video).
• If the application uses UDP and the message does not fit
in one single user datagram, the message must be split
by the application into different user datagrams.
• Here the connectionless service may create problems.
The user datagrams may arrive and be delivered to the
receiver application out of order. The receiver application
may not be able to reorder the pieces. This means the
connectionless service has a disadvantage for an
application program that sends long messages.
20
• Can downloading a large text file from internet
use UDP as transport protocol?
• Why?
Chapter 5
4
Question
21
• Assume we are downloading a very large text
file from the Internet.
• We definitely need to use a transport layer that
provides reliable service.
• We don’t want part of the file to be missing or
corrupted when we open the file.
• The delay created between the delivery of the
parts are not an overriding concern for us; we
wait until the whole file is composed before
looking at it.
• In this case, UDP is not a suitable transport
layer.
Chapter 5
4
Answer
22
• Is using UDP for watching a real time stream
video OK?
• Why?
Chapter 5
4
Question
23
Chapter 5
4
Answer
• Assume we are watching a real-time stream video on our
computer.
• Such a program is considered a long file;
• it is divided into many small parts and broadcast in real time.
• The parts of the message are sent one after another.
•
If the transport layer is supposed to resend a corrupted or lost
frame, the synchronizing of the whole transmission may be lost
– The viewer suddenly sees a blank screen and needs to wait until the
second transmission arrives. This is not tolerable.
• However, if each small part of the screen is sent using one single
user datagram, the receiving UDP can easily ignore the corrupted
or lost packet and deliver the rest to the application program.
– That part of the screen is blank for a very short period of the time,
which most viewers do not even notice.
• However, video cannot be viewed out of order, so streaming
audio, video, and voice applications that run over UDP must
reorder or drop frames that are out of sequence.
24
Chapter 5
4
Reliable Byte Stream (TCP)
• IP provides a connectionless/unrealible packet delivery to host
• UDP provides delivery to multiple ports within a host
• In contrast to UDP, Transmission Control Protocol (TCP) offers
the following services
– Byte-stream service:
• Application programs see stream of bytes flowing from sender to receiver.
– Reliable:
• The sequence of bytes sent is the same as the sequence of bytes received.
• Technique: sliding window protocol.
– Connection oriented:
• A connection between sender and receiver is established before sending
data
– Like UDP: Delivery to multiple ports within a host
25
• TCP Provides :
– Flow control :
Chapter 5
4
Reliable Byte Stream (TCP)
• Prevent senders from overrunning the capacity of the receivers
• I.e Packages are not sent faster than they can be received.
• Technique: TCP variant of sliding window protocol.
– Congestion control:
• Prevent too much data from being injected into the network,
• i.e stop switches or links becoming overloaded
26
• TCP is a byte-oriented protocol:
Chapter 5
4
TCP Segment
– sender writes bytes into a TCP connection
– receiver reads bytes out of the TCP connection.
• Buffered Transfer:
– Application software may transmit individual bytes across the
stream.
– However, TCP does NOT transmit individual bytes over the
Internet.
– Rather, bytes are aggregated to packets before sending for
efficiency
27
– TCP on the destination host then empties the contents of the
packet into a receive buffer
– The receiving process reads from this buffer.
– The packets exchanged between TCP peers are called
segments.
Chapter 5
4
TCP Segment
28
Chapter 5
4
TCP Segment
How TCP manages a byte stream.
29
Chapter 5
4
TCP Header
TCP Header Format
• SrcPort and DstPort:
– identify the source and destination ports, respectively.
• Acknowledgment, SequenceNum, and AdvertisedWindow :
– involved in TCP’s sliding window algorithm.
• SequenceNum:
– Each byte of data has a sequence number
– Contains the sequence number for the first byte of data carried in that
segment.
30
• Flags: relay control information between TCP peers.
Chapter 5
4
TCP Header
– Possible flags :SYN, FIN, RESET, PUSH, URG, and ACK.
– The SYN and FIN flags are used when establishing and
terminating a TCP connection
– The ACK flag is set any time the Acknowledgment field is valid
– The URG flag signifies that this segment contains urgent data.
– The PUSH flag signifies that the sender invoked the push
operation
– The RESET flag signifies that the receiver has become confused
• it received a segment it did not expect to receive
• so receiver wants to abort the connection.
• Checksum:
– like in UDP
31
Chapter 5
4
TCP Connections
• Use of protocol port numbers.
– A TCP connection is identified by a pair of endpoints.
– An endpoint is a pair (host IP address, port number).
– E.g.
• Connection 1: (18.26.0.36, 1069) and (128.10.2.3, 25).
• Connection 2: (128.9.0.32, 1184) and (128.10.2.3, 53).
• Connection 3: (127.2.254.139, 1184) and (128.10.2.3, 53).
– Different connections may share the endpoints!
• Multiple clients may be connected to the same service.
32
• Connection setup:
Chapter 5
4
Connection Establishment/Termination in TCP
– Client do an active open to a
server
– Two sides engage in an
exchange of messages to
establish the connection.
– Then send data
• Both sides must signal that
they are ready to transfer
data.
– Both sides must also agree
on initial sequence numbers.
– Initial sequence numbers are
randomly chosen
Timeline for three-way handshake algorithm
33
Chapter 5
4
Connection Establishment
1.
Segment1 (ClientServer):
• State the initial sequence number it plans to
use
• Flags = SYN, SequenceNum = x.
1
2.
3.
Segment2(ServerClient):
• Acknowledges client’s sequence number
• Flags = ACK, Ack = x+1
• States its own beginning sequence number
• Flags = SYN, SequenceNum = y
Segment3(ClientServer):
• Acknowledges the server’s sequence number
• Flags = ACK, Ack = y +1
2
3
ACK filed acknowledge
next sequence number
expected from other side
34
• Connection Teardown:
– When participant is done sending data,
it closes one direction of the connection
Client
Chapter 5
4
Connection Establishment/Termination in TCP
Server
– Either side can initiate tear down
– Send FIN signal
– “I’m not going to send any more data”
– Other side can continue sending data
– Half open connection
– Must continue to acknowledge
– Acknowledging FIN
– Acknowledge last sequence number + 1
35
• Connection setup is an asymmetric
Chapter 5
4
Connection Establishment/Termination in TCP
– one side does a passive open
– the other side does an active open
• Connection teardown is symmetric
– each side has to close the connection independently
36
Chapter 5
4
End-to-end Issues
• At the heart of TCP is the sliding window algorithm
http://histrory.visualland.net/tcp_swnd.html
• TCP runs over the Internet rather than a point-to-point
link
• The following issues need to be addressed by the sliding
window algorithm
– TCP supports logical connections between processes
– TCP connections are likely to have widely different RTT times
– http://www.caida.org/research/performance/rtt/walrus0202/
– Packets may get reordered in the Internet
37
• Simple Positive Acknowledgement Protocol
Chapter 5
4
Sliding Window Revisited
– Normal situation:
• Positive acknowledgement with retransmission
Sender
Receiver
time
38
Sender
Chapter 5
4
• Error in transmission: packet loss
Receiver
Timeout
+
retransmission
• Under-utilize the network capacity.
• There is only one message at a time in the network.
39
Chapter 5
4
Sliding Window Protocol
• Better form of positive
acknowledgement and
retransmission.
– Sender may transmit multiple
packets before waiting for an
acknowledgement.
• Windows of packets of small
fixed size N.
– All packets within window may be
transmitted.
– If first packet inside the window is
acknowledged, window slides one
element further.
Sender sliding window
Send frame1- 4
1
2
3
4
5
6
7
8
9
Ack 1 arrive  send frame5
1
2
3
4
5
6
7
8
9
Ack 2 arrive  send frame6
1
2
3
4
5
6
7
8
9
40
Chapter 5
4
Sliding Window Revisited
Sender
Receiver
• Performance depends on window size N and on speed of
packet delivery.
– N = 1: simple positive acknowledgement protocol.
– By increasing N, network idle time may be eliminated.
• Steady state:
– sender transmits packets as fast as network can transfer them.
– Well tuned sliding window protocol keeps network saturated.
41
Chapter 5
4
Sliding Window Revisited
Relationship between TCP send buffer (a) and receive buffer (b).
42
Chapter 5
4
TCP Sliding Window
• Sending Side
– LastByteAcked ≤ LastByteSent ≤ LastByteWritten
• Receiving Side
– LastByteRead < NextByteExpected ≤ LastByteRcvd + 1
• E.g. At Sender:
Sending window size
(advertised by the receiver)
1
2
Bytes 1-2
acknowledged
3
4
5
6
Bytes 3-6 sent but
Not yet acknowledged
LastByteAcked
7
8
9
Bytes 7-9 not sent
Will send next
LastByteSent
10
11
Bytes 10-11 cannot
be sent until window moves
LastByteWritten
Fig. A Send Window
43
Chapter 5
4
TCP’s Sliding Window Mechanism
• TCP’s variant of the sliding window algorithm, which
serves several purposes:
• (1) it guarantees the reliable delivery of data,
• (2) it ensures that data is delivered in order, and
• (3) it enforces flow control between the sender and the receiver.
44
• Operates on byte level:
Chapter 5
4
TCP Sliding Window
– Bytes in window are sent from first to last.
– First bytes have to be acknowledged before window may slide.
• The size of a window is variable.
• Receiver delivers window advertisements to sender:
– On setup of a connection, the receiver informs the sender about the
size of its window
• =how many bytes the receiver is willing to accept = size of buffer on
receiver).
– The sender sends at most as many bytes as determined by the
window advertisement.
– Every ACK message from the receiver contains a new window
advertisement
– The sender adapts its window size correspondingly.
45
• Solves the problem of flow control:
Chapter 5
4
TCP Sliding Window
– As the window slides, the receiver may adjust the
speed of the sender to its own speed by modifying
the window size.
46
• How does TCP decide to transmit a segment?
Chapter 5
4
Triggering Transmission
– TCP supports a byte stream abstraction
– Application programs write bytes into streams
– It is up to TCP to decide that it has enough bytes to send a
segment
47
• What factors governs triggering transmission?
– Ignore flow control: window is wide open
Chapter 5
4
Triggering Transmission
– TCP has three mechanism to trigger the transmission of a
segment
1.
TCP maintains a variable MSS(Maximum Segment Size)
– TCP sends a segment as soon as it has collected MSS bytes from the
sending process
2.
Sending process has explicitly asked TCP to send it
– TCP supports push operation : immediate send rather than buffer
– E.g. Telnet application: interactive, each keystroke immediately sent to other side
3.
When a timer fires
– Resulting segment contains as many bytes as are currently buffered for
transmission
48
• Caused by poorly implemented TCP flow control.
Chapter 5
4
Silly Window Syndrome
• Silly Window Syndrome:
– Each ACK advertises a small window
– Each segment carries a small amount of data
• Problems:
– Poor use of network bandwidth
– Unnecessary computational overhead
• E.g.
– Server/Receiver consume data slowly.
– So it requests client/sender to reduce the sending window size
– If the server continues to be unable to process all incoming data:
• The window becomes smaller and smaller (shrink to a silly value)
• Data transmission becomes extremely inefficient.
49
Chapter 5
4
Silly Window Syndrome
2. Syndrome created by the Receiver
– Problem:
• Receiving application program consumes data slowly
• So receiver advertise smaller window to sender
– Solution :
• Clark’s solution
• Sending an ACK but announcing a window size of zero until there is
enough space to accommodate a segment of max. size or until half of
the buffer is empty
• i.e. receiver consumes data until “enough” space available to
advertise
TCP/IP Protocol Suite
50
50
• Syndrome created by the Sender
Chapter 5
4
Silly Window Syndrome
– Problem:
• Sending application program creates data slowly
– Goal:
• Do not send smaller segments
• Wait and collect data to send in a larger block
– How long should the sending TCP wait before transmitting?
• Solution: Nagle’s algorithm
• Nagle’s algorithm takes into account
– (1) the speed of the application program that creates the data, and
– (2) the speed of the network that transports the data
51
Chapter 5
4
Nagle’s Algorithm
• Solves slow sender problem
• If there is data to send but the window open is less than MSS,
then we may want to wait some amount of time before sending
the available data
• But how long?
– If we wait too long, then we hurt interactive applications like Telnet
– If we don’t wait long enough, then we risk sending a bunch of tiny packets
and falling into the silly window syndrome
• Solution:
– Introduce a timer and to transmit when the timer expires
52
• We could use a clock-based timer(e.g. fires every 100 ms)
Chapter 5
4
Nagle’s Algorithm
• Nagle introduced an elegant self-clocking solution
• Key Idea:
– As long as TCP has any data in flight, the sender will eventually receive an
ACK
– This ACK can be treated like a timer firing, triggering the transmission of
more data
53
When the application produces data to send
if both the available data and the window ≥ MSS
send a full segment
else
if there is unACKed data in flight
buffer the new data until an ACK arrives
else
send all the new data now
Chapter 5
4
Nagle’s Algorithm
54
• TCP estimates the round-trip delay for each
active connection
Chapter 5
4
Adaptive Retransmission
• For each connection:
– TCP generates a sequence of round-trip estimates
and
– produces a weighted average
55
• Why would TCP estimate RTT delay per
connection?
Chapter 5
4
Question
56
• The goal is to wait long enough to decide that a
packet was lost and needs to be retransmitted,
without waiting longer than necessary
Chapter 5
4
Answer
– We don’t want timeout to expire too soon or too late.
• Ideally, we would like to set the retransmission
timer to a value just slightly larger than
the round-trip time (RTT) between the two TCP
devices
Ref:
http://www.tcpipguide.com/free/t_TCPAdaptiveRetransmissionandRetransmissionTimerCal.ht
57
m
Chapter 5
4
• The problem is that there is no such “typical”
round-trip time.
• Why?
58
Chapter 5
4
• There are two main reasons for this:
1. Differences In Connection Distance:
1.
Pinging to Georgia State University, GA from within csbsju United State
2.
Pinging to a Oxford University in UK from csbsju United States
3.
Ping to Kharagpur college in India from csbsju United States
59
Chapter 5
4
2. Transient Delays and Variability:
–
The amount of time it takes to send data between any two
devices will vary over time due to various happenings on the
internetwork:
•
fluctuations in traffic, router loads etc.
• E.g.
60
Chapter 5
4
• It is for these reasons that TCP does not attempt to use a
static, single number for its retransmission timers
(timeout).
• Instead, TCP uses a dynamic, or adaptive retransmission
scheme.
61
• In addition to round trip delay estimations, TCP also maintains an
estimate of the variance and uses a linear combination of the
estimated mean and variance as the value of the timeout, as
shown below
Receiver
Sender
Chapter 5
4
Adaptive Retransmission
Receiver
Sender
estimate1
estimate1
estimate2
estimate2
timeout
timeout
Packet lost
Packet lost
Example 1: Connection with longer RTT
Example 2: Connection with shorter RTT
62
Chapter 5
4
Adaptive Retransmission
• Adaptive Retransmission Based on Round-Trip Time
Calculations
– TCP attempts to determine the approximate round-trip
time between the devices, and adjusts it over time to
compensate for increases or decreases in the average
delay.
– To learn how this is done look at RFC 2988
RTTestimated = ( * RTTestimated) + ((1- ) * RTTsample)
New estimate of Rtt
Smoothing factor [0..1]
Previous estimate of Rtt
Rtt for most recent segment/Ack
–  is a smoothing factor between 0 and 1
http://www.tcpipguide.com/free/t_TCPAdaptiveRetransmissionandRetransmissionTimerCal-2.htm
63
• Which values will be appropriate for  ?
Chapter 5
4
Question
64
Chapter 5
4
Answer
• RTTestimated = ( * RTTestimated) + ((1- ) * RTTsample)
• Higher values of  (closer to 1)
– provide better smoothing
– avoid sudden changes as a result of one very fast or very
slow RTT measurement.
– Conversely, this also slows down how quickly TCP reacts
to more sustained changes in round-trip time.
• Lower values of  (closer to 0) :
– make the RTT change more quickly in reaction to
changes in measured RTT
– can cause “over-reaction” when RTTs fluctuate wildly.
65
• Original Algorithm
Chapter 5
4
Adaptive Retransmission
1. Measure SampleRTT for each segment/ ACK pair
2. Compute weighted average of RTT
• EstRTT =  x EstRTT + (1 -  )x SampleRTT
  between 0.8 and 0.9
3. Set timeout based on EstRTT
• TimeOut = 2 x EstRTT
– Why?
66
• Problem (Acknowledgement Ambiguity)
Chapter 5
4
Original Algorithm
– ACK does not really acknowledge a transmission
• It actually acknowledges the receipt of data
– When a segment is retransmitted and then an ACK
arrives at the sender
• It is impossible to decide if this ACK should be associated
with the first or the second transmission for calculating
RTTs
– Why?
67
RTT appear too high
RTT appear too low
Chapter 5
4
Original Algorithm
Associating the ACK with (a) original transmission versus (b) retransmission
68
• How do you correct this problem?
Chapter 5
4
Question
69
• Surprisingly simple solution!
Chapter 5
4
Answer: Karn/Partridge Algorithm
• Refines RTT Calculation:
– Do not sample RTT when retransmitting
• Only measures SampleRTT for segments that have been sent only
once.
• Algorithm also included a second small change:
– Double timeout after each retransmission
• Do NOT base timeout on the last EstimatedRTT
• Use Exponential backoff (similar to Ethernets)
– Why?
70
• Karn-Partridge algorithm was an improvement over
the original approach
Chapter 5
4
Karn/Partridge Algorithm
• But it does not eliminate congestion
• We need to understand how timeout is related to
congestion
– If you timeout too soon, you may unnecessarily retransmit
a segment which adds load to the network
71
• Main problem with the original computation is that
it does not take variance of Sample RTTs into
consideration.
Chapter 5
4
Karn/Partridge Algorithm
EstRTT =  x EstRTT + (1 -  )x SampleRTT
• If the variance among Sample RTTs is small
– Then the Estimated RTT can be better trusted
– There is no need to multiply this by 2 to compute the
timeout
TimeOut = 2 x EstRTT
72
• On the other hand, a large variance in the samples
suggest that timeout value should not be tightly
coupled to the Estimated RTT
Chapter 5
4
Karn/Partridge Algorithm
• Jacobson/Karels proposed a new scheme for TCP
retransmission
73
• Difference = SampleRTT − EstimatedRTT
• EstimatedRTT = EstimatedRTT + ( × Difference)
• Deviation = Deviation + (|Difference| − Deviation)
Chapter 5
4
Jacobson/Karels Algorithm
–  is a fraction between 0 and 1.
– i.e. we calculate both the mean RTT and the variation in that mean.
• TimeOut = μ × EstimatedRTT +  × Deviation
– μ is typically set to 1
–  is set to 4.
• Thus, when the variance is small:
– TimeOut is close to EstimatedRTT;
• When variance is large :
– It causes the deviation term to dominate the calculation.
• Demo: http://www.grahn.us/projects/jacobson-karels/
74
Chapter 5
4
Summary
• We have discussed how to convert host-to-host packet
delivery service to process-to-process communication channel.
• We have discussed UDP
• We have discussed TCP
75

Chapter 5: End-to-End Protocols

Transcript Chapter 5: End-to-End Protocols

Directory