Transport Protocols, UDP, TCP
Week 10
Transport Protocols, UDP, TCP
1
Orientation
We move one layer up and look at the
transport layer across the Internet.
[Figure: protocol stack. User processes (Application Layer) sit on top of TCP and UDP (Transport Layer), which run over IP (IP Layer), which runs over the network protocols (e.g., Ethernet) and the media.]
2
Orientation
TCP and UDP are end-to-end protocols
They are only implemented at the hosts
[Figure: two hosts, each with the stack Application / TCP/UDP / IP / network protocols, connected through an IP router. The router implements only IP and the protocols of Network 1 and Network 2; TCP/UDP appear only at the hosts.]
3
Transport Protocols in the Internet
• The Internet supports 2 transport protocols
UDP - User Datagram Protocol
• datagram oriented
• unreliable, connectionless
• simple
• unicast and multicast
• useful for multimedia applications
• used for control protocols: network management (SNMP), routing (RIP), naming (DNS), etc.
TCP - Transmission Control Protocol
• stream oriented
• reliable, connection-oriented
• complex
• only unicast
• used for data applications: web (http), email (smtp), file transfer (ftp), SecureCRT, etc.
4
UDP - User Datagram Protocol
UDP extends the host-to-host delivery service of IP to
an application process-to-application process delivery
service
It does this by multiplexing and demultiplexing packets
from multiple application-to-application communication
sessions
[Figure: applications over UDP over IP at each end host; the routers in between implement only IP.]
5
UDP packet format
Packet layout: IP header (20 bytes) | UDP header (8 bytes) | UDP data (payload)
UDP header layout (32 bits per row):
bits 0-15: Source Port Number | bits 16-31: Destination Port Number
bits 0-15: UDP message length | bits 16-31: Checksum
• Port numbers identify sending and receiving applications (processes).
Maximum port number is 2^16 - 1 = 65,535
• Message Length is between 8 bytes (i.e., data field can be empty) and 65,535
bytes (length of UDP header and data in bytes)
• Checksum is for UDP header and UDP data
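The 8-byte header described above can be sketched with Python's `struct` module. This is a minimal illustration, not a full UDP implementation: the function names are hypothetical, and the checksum is simply left at 0 rather than computed.

```python
import struct

# Four 16-bit fields in network byte order:
# source port, destination port, message length, checksum.
UDP_HEADER = struct.Struct("!HHHH")


def build_udp_header(src_port, dst_port, payload, checksum=0):
    """Pack a UDP header; the length field counts header plus data in bytes."""
    length = UDP_HEADER.size + len(payload)
    return UDP_HEADER.pack(src_port, dst_port, length, checksum)


def parse_udp_header(datagram):
    """Unpack the first 8 bytes of a UDP datagram into a dictionary."""
    src_port, dst_port, length, checksum = UDP_HEADER.unpack_from(datagram)
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "length": length,
        "checksum": checksum,
    }


# Assumed example values: ports 5353 and 53 with a 5-byte payload.
header = build_udp_header(5353, 53, b"hello")
fields = parse_udp_header(header + b"hello")
print(fields)
```

With an empty payload the length field is 8, which is the minimum message length mentioned above.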
6
Port Numbers
UDP (and TCP) use port numbers to identify
applications
Port numbers are 16 bits long, so each host has 65,536 possible UDP port numbers (0 to 65,535).
[Figure: IP demultiplexes incoming packets to TCP or UDP based on the Protocol field in the IP header; TCP and UDP each demultiplex to the user processes based on port number.]
7
TCP
Service offered by TCP
TCP Header
TCP Connection Establishment and
Termination
Flow control
Error control
Congestion control
8
TCP = Transmission Control Protocol
Provides a reliable unicast end-to-end byte stream over an unreliable internetwork.
[Figure: a byte stream is exposed at each of two TCP endpoints, which communicate across an IP internetwork.]
9
TCP is reliable
• Byte stream is broken up into chunks which are called segments
• Detecting errors:
• TCP has checksums for header and data. Segments with invalid
checksums are discarded
• Each segment that is transmitted has a sequence number.
• Receiver sends acknowledgments (ACKs) for segments
• Sender maintains a timer. An ACK is expected before the timer
times out
• Correcting errors:
• Lost or errored segments are retransmitted.
• Selective repeat ARQ scheme
• Cumulative ACKs
10
Byte Stream Service
To the lower layers, TCP handles data in "segments"
To the higher layers, TCP handles data as a sequence of
bytes and does not preserve the boundaries between application writes
So:
Higher layers do not know about the beginning and
end of segments!
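[Figure: the sending application performs two writes (1. write 100 bytes, 2. write 20 bytes) into TCP's queue of bytes to be transmitted; the bytes travel as segments; the receiving application performs three reads (1. read 40 bytes, 2. read 40 bytes, 3. read 40 bytes) from TCP's queue of bytes that have been received.]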
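The write/read pattern in the figure can be reproduced with a small sketch. It uses a local stream socket pair as a stand-in for a TCP connection (an assumption made so that the example needs no network); the helper `recv_exact` is hypothetical and exists because a single `recv` call may return fewer bytes than requested.

```python
import socket


def recv_exact(sock, n):
    """Read exactly n bytes from a stream socket."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("stream closed before n bytes arrived")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def byte_stream_demo():
    """Write 100 bytes, then 20 bytes; read the stream back 40 bytes at a time."""
    writer, reader = socket.socketpair()  # connected pair of stream sockets
    with writer, reader:
        writer.sendall(b"a" * 100)  # 1. write 100 bytes
        writer.sendall(b"b" * 20)   # 2. write 20 bytes
        return [recv_exact(reader, 40) for _ in range(3)]


chunks = byte_stream_demo()
for chunk in chunks:
    print(len(chunk), chunk[:4], chunk[-4:])
```

The third read mixes the tail of the first write with all of the second write, which shows that the reader sees only a sequence of bytes, not the writer's boundaries.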
11
TCP Format
• TCP segments have a header of 20 bytes plus options, followed by >= 0 data bytes
Packet layout: IP header (20 bytes) | TCP header (20 bytes) | TCP data
TCP header layout (32 bits per row; the first five rows are the fixed 20 bytes):
bits 0-15: Source Port Number | bits 16-31: Destination Port Number
Sequence number (32 bits)
Acknowledgment number (32 bits)
header length (4 bits) | reserved (6 bits, set to 0) | Flags | window size
TCP checksum | urgent pointer
Options (if any)
DATA (optional)
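As a rough sketch of this layout, the fixed 20-byte header can be unpacked with Python's `struct` module. The function name and the sample values are assumptions for illustration; options and checksum verification are not handled.

```python
import struct

# Fixed part of the header: two 16-bit ports, 32-bit sequence and
# acknowledgment numbers, then four 16-bit words (header length + reserved
# + flags, window size, checksum, urgent pointer).
TCP_FIXED_HEADER = struct.Struct("!HHIIHHHH")

# The six flag bits, from least significant to most significant.
FLAG_NAMES = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]


def parse_tcp_header(segment):
    """Decode the fixed 20-byte TCP header at the start of a segment."""
    (src_port, dst_port, seq_no, ack_no,
     offset_and_flags, window, checksum, urgent) = TCP_FIXED_HEADER.unpack_from(segment)
    header_len = (offset_and_flags >> 12) * 4  # 4-bit field counts 32-bit words
    flags = {name for bit, name in enumerate(FLAG_NAMES)
             if offset_and_flags & (1 << bit)}
    return {
        "src_port": src_port, "dst_port": dst_port,
        "seq_no": seq_no, "ack_no": ack_no,
        "header_len": header_len, "flags": flags,
        "window": window, "checksum": checksum, "urgent": urgent,
    }


# Hypothetical SYN segment without options: header length 5 words, SYN bit set.
sample = TCP_FIXED_HEADER.pack(1121, 23, 1031880193, 0, (5 << 12) | 0x02, 16384, 0, 0)
parsed = parse_tcp_header(sample)
print(parsed)
```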
13
TCP header fields - Port
Numbers
Port Number:
• A port number identifies the endpoint of a connection.
• A pair <IP address, port number> identifies one endpoint of a
connection.
• Two pairs <client IP address, client port number> and <server
IP address, server port number> identify a TCP connection.
[Figure: two hosts, each with applications attached to ports on top of TCP and IP; the example port numbers shown are 23, 80, 104 on one host and 7, 80, 16 on the other.]
14
TCP header fields - Sequence
Number
Sequence Number (SeqNo):
Sequence number is 32 bits long.
So the range of SeqNo is
0 <= SeqNo <= 2^32 - 1 (about 4.3 Gbytes)
The sequence number of a segment identifies the position, in the stream of
data from the sending TCP to the receiving TCP, of the first
byte of data in that segment.
Initial Sequence Number (ISN) of a connection is set during
connection establishment
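[Figure: a byte stream split into three segments. Segment 1 carries bytes 1 to 500 (Seq. No. 1), Segment 2 carries bytes 501 to 1000 (Seq. No. 501), Segment 3 carries bytes 1001 to 1500 (Seq. No. 1001).]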
15
TCP header fields - Ack. No.
Acknowledgment
Number (AckNo):
Acknowledgments are piggybacked, i.e.,
a segment from A → B contains an acknowledgement for a
segment sent in the B → A direction
The AckNo in the B → A segment header contains the
SeqNo for the next segment expected at B for the A → B
flow
Example: The acknowledgment for a 1500-byte segment
with the sequence number 0 is AckNo=1500
A host uses the AckNo field to send acknowledgements.
If a host sends an AckNo in a segment it sets the “ACK
flag”
16
TCP header fields - Ack. No.
Contd.
Example:
Sender sends two segments with bytes “1..1500”
and “1501..3000”, but receiver only gets the
second segment.
• What is the sequence number of the first segment?
• What is the sequence number of the second segment?
• What is the ACK number sent in response by the
receiver when it receives the second segment?
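A small sketch can help check answers to questions like these. It assumes cumulative acknowledgments (the AckNo is the next byte the receiver expects in order) and a stream that starts at byte 1; the function name is hypothetical.

```python
def cumulative_ack(first_expected, received_ranges):
    """Return the AckNo after receiving the given (first_byte, last_byte) ranges.

    With cumulative ACKs the receiver can only acknowledge data that has
    arrived without gaps, so a missing earlier range holds the AckNo back.
    """
    expected = first_expected
    for first, last in sorted(received_ranges):
        if first > expected:
            break  # gap: the byte numbered `expected` has not arrived yet
        expected = max(expected, last + 1)
    return expected


# Only the second segment (bytes 1501..3000) has arrived.
ack_with_gap = cumulative_ack(1, [(1501, 3000)])
# Both segments have arrived.
ack_complete = cumulative_ack(1, [(1, 1500), (1501, 3000)])
print(ack_with_gap, ack_complete)
```

Under these assumptions the segments carry sequence numbers 1 and 1501, and the receiver that only has the second segment still acknowledges byte 1.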
17
TCP header fields - Header
Length
Header Length (4 bits):
Length of header in 32-bit words
Note that TCP header has variable length
(minimum of 20 bytes)
18
TCP header fields - Flags
Flag bits:
URG:
Urgent pointer is valid
– If the bit is set, the following bytes contain an urgent
message in the range:
SeqNo <= urgent message <= SeqNo+urgent pointer
ACK: Acknowledgement Number is valid
PSH: PUSH Flag
– Notification from sender to the receiver that the
receiver should pass all data that it has to the
application as soon as possible.
– Normally set by sender when the sender’s buffer is
empty (so TCP does not wait expecting more data)
19
TCP header fields - Flags
Contd.
Flag bits:
RST: Reset the connection
– The flag causes the receiver to reset the connection
– Receiver of a RST terminates the connection and
notifies the higher-layer application of the reset
SYN: Synchronize sequence numbers
– Sent in the first packet when opening a connection
FIN: Sender is finished with sending
– Used for closing a connection
– Both sides of a connection must send a FIN
20
TCP header fields
Window Size:
Each side of the connection advertises its receiving
window size
Window size is the maximum number of bytes that a
receiver can accept.
Maximum window size is 2^16 - 1 = 65,535 bytes
TCP Checksum:
TCP checksum covers both TCP header and TCP data
Urgent Pointer:
Only valid if URG flag is set
21
TCP header fields - Options
Options - a few examples:
End of Options: kind=0 (1 byte)
NOP (no operation): kind=1 (1 byte)
Maximum Segment Size: kind=2 (1 byte), len=4 (1 byte), maximum segment size (2 bytes)
22
TCP header fields
Options:
NOP is used to pad TCP header to a multiple of
4 bytes
Maximum Segment Size:
• Sets the maximum length of the segments
• This option can only appear in a SYN segment
23
Connection Management in TCP
Opening a TCP Connection
Closing a TCP Connection
Special Scenarios
State Diagram
25
TCP Connection Establishment
TCP uses a three-way handshake to open a connection:
(1) ACTIVE OPEN: Client sends a segment with
– SYN bit set
– port number of client, port number of server
– initial sequence number (ISN) of client
(2) PASSIVE OPEN: Server responds with a segment with
– SYN bit set
– initial sequence number of server
– ACK for ISN of client
(3) Client acknowledges by sending a segment with:
– ACK for ISN of server
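The three steps can be written down as a short sketch of the sequence and acknowledgment numbers involved. The function and the dictionary layout are illustrative assumptions, and the ISN values 1000 and 5000 are arbitrary.

```python
SEQ_SPACE = 2 ** 32  # sequence numbers are 32 bits and wrap around


def three_way_handshake(client_isn, server_isn):
    """Return the three handshake segments as simple dictionaries."""
    syn = {"from": "client", "flags": {"SYN"}, "seq": client_isn}
    syn_ack = {
        "from": "server",
        "flags": {"SYN", "ACK"},
        "seq": server_isn,
        "ack": (client_isn + 1) % SEQ_SPACE,  # ACK for the client's ISN
    }
    ack = {
        "from": "client",
        "flags": {"ACK"},
        "seq": (client_isn + 1) % SEQ_SPACE,
        "ack": (server_isn + 1) % SEQ_SPACE,  # ACK for the server's ISN
    }
    return [syn, syn_ack, ack]


segments = three_way_handshake(client_isn=1000, server_isn=5000)
for segment in segments:
    print(segment)
```

Each SYN is acknowledged with ISN + 1, which is why the acknowledgment numbers are one larger than the initial sequence numbers.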
26
Three-Way Handshake
[Figure: time-sequence diagram between aida.poly.edu and mng.poly.edu:
(1) aida to mng: SYN (SeqNo = x)
(2) mng to aida: SYN (SeqNo = y, AckNo = x + 1)
(3) aida to mng: ack (y + 1)]
27
A Closer Look with tcpdump
aida issues a "telnet mng" (aida.poly.edu to mng.poly.edu)
1 aida.poly.edu.1121 > mng.poly.edu.telnet: S 1031880193:1031880193(0)
win 16384 <mss 1460,nop,wscale 0,nop,nop,timestamp>
2 mng.poly.edu.telnet > aida.poly.edu.1121: S 172488586:172488586(0)
ack 1031880194 win 8760 <mss 1460>
3 aida.poly.edu.1121 > mng.poly.edu.telnet: . ack 172488587 win 17520
4 aida.poly.edu.1121 > mng.poly.edu.telnet: P 1031880194:1031880218(24)
ack 172488587 win 17520
5 mng.poly.edu.telnet > aida.poly.edu.1121: P 172488587:172488590(3)
ack 1031880218 win 8736
6 aida.poly.edu.1121 > mng.poly.edu.telnet: P 1031880218:1031880221(3)
ack 172488590 win 17520
28
Three-Way Handshake
[Figure: the handshake as a time-sequence diagram between aida.poly.edu and mng.poly.edu:
aida to mng: S 1031880193:1031880193(0) win 16384 <mss 1460, ...>
mng to aida: S 172488586:172488586(0) ack 1031880194 win 8760 <mss 1460>
aida to mng: ack 172488587 win 17520]
29
First data segment sequence
number
Note that the first data segment after the
three-way handshake starts with the
sequence number that follows the one used by the SYN
segment (ISN + 1), because the SYN consumes one sequence number
30
Why start with a new ISN
The problem with starting off each connection with a
sequence number of 1 is that it introduces the possibility of
segments from different connections getting mixed up.
Traditionally, each device chose the ISN by making use of a
timed counter, like a clock of sorts, that was incremented
every 4 microseconds. This counter was initialized when TCP
started up and then its value increased by 1 every 4
microseconds until it reached the largest 32-bit value
possible (4,294,967,295) at which point it “wrapped around”
to 0 and resumed incrementing.
Period: about 4.8 hours (2^32 x 4 microseconds is roughly 17,180 seconds)
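The wrap-around period of such a counter can be checked with a few lines of arithmetic. The constant names are chosen here for illustration.

```python
TICK_SECONDS = 4e-6        # the counter is incremented every 4 microseconds
SEQUENCE_SPACE = 2 ** 32   # number of distinct 32-bit values

period_seconds = SEQUENCE_SPACE * TICK_SECONDS
period_hours = period_seconds / 3600

print(f"{period_seconds:.0f} seconds, about {period_hours:.2f} hours")
```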
31
TCP Connection Termination
Each end of the data flow must be shut down
independently (“half-close”)
If one end is done it sends a FIN segment. This
means that no more data will be sent
Four steps involved:
(1) X sends a FIN to Y (active close)
(2) Y ACKs the FIN,
(at this time: Y can still send data to X)
(3) and Y sends a FIN to X (passive close)
(4) X ACKs the FIN.
32
Connection termination with
tcpdump
aida.poly.edu
mng.poly.edu
1 mng.poly.edu.telnet > aida.poly.edu.1121: F 172488734:172488734(0)
ack 1031880221 win 8733
2 aida.poly.edu.1121 > mng.poly.edu.telnet: . ack 172488735 win 17484
3 aida.poly.edu.1121 > mng.poly.edu.telnet: F 1031880221:1031880221(0)
ack 172488735 win 17520
4 mng.poly.edu.telnet > aida.poly.edu.1121: . ack 1031880222 win 8733
33
TCP Connection Termination
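[Figure: the termination as a time-sequence diagram between aida.poly.edu and mng.poly.edu:
mng to aida: F 172488734:172488734(0) ack 1031880221 win 8733
aida to mng: . ack 172488735 win 17484
aida to mng: F 1031880221:1031880221(0) ack 172488735 win 17520
mng to aida: . ack 1031880222 win 8733]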
34
TCP Half-close
FIN
ACK of FIN
DATA
ACK of DATA
FIN
ACK of FIN
35
MSS
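[Figure: hosts A, B and C; one network has MTU = 1500 and the other has MTU = 296. The two SYN segments announce <mss 1460> and <mss 256>, i.e., each MTU minus 40 bytes of IP and TCP headers.]
Default is generally 536 bytes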
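The announced values follow from the MTUs by subtracting the header sizes. The sketch below assumes 20-byte IP and TCP headers without options; the function name is hypothetical.

```python
IP_HEADER_BYTES = 20   # IP header without options
TCP_HEADER_BYTES = 20  # TCP header without options


def mss_for_mtu(mtu):
    """Largest TCP payload that fits into one IP packet of the given MTU."""
    return mtu - IP_HEADER_BYTES - TCP_HEADER_BYTES


for mtu in (1500, 296, 576):
    print(f"MTU {mtu} -> MSS {mss_for_mtu(mtu)}")
```

The last line shows where the default comes from: a 576-byte IP datagram leaves room for 536 bytes of TCP data.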
36
Difference between TCP
connections and connections in a
connection-oriented network
TCP “connections” are not the same as connections
in a connection-oriented network
In a connection-oriented network, a signaling
procedure is used to reserve bandwidth for the
connection on every link of the end-to-end path
(e.g., circuit-switched networks)
A TCP connection involves the maintenance of
state information at the end hosts
Purpose is to provide error correction for TCP segments
Initial sequence number exchanged to avoid accidentally
sending data to an old connection
37
TCP flow control
• Flow Control:
How to prevent the sender
from overrunning the receiver buffer?
•Flow Control in TCP
• TCP implements sliding window flow control
• Window size is usually sent within acknowledgements.
39
Window Management in TCP
The receiver returns two parameters to the
sender in an ACK:
AckNo (32 bits)
window size (win) (16 bits)
The interpretation is:
• I am ready to receive new data with
SeqNo= AckNo, AckNo+1, …., AckNo+Win-1
Receiver can acknowledge data without opening
the window
Receiver can change the window size without
acknowledging data
40
TCP Flow Control
receive side of TCP connection has a receive buffer:
app process may be slow at reading from buffer
flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
speed-matching service: matching the send rate to the receiving app's drain rate
41
TCP Flow control: how it works
(Suppose TCP receiver discards out-of-order segments)
spare room in buffer
= RcvWindow
= RcvBuffer - [LastByteRcvd - LastByteRead]
Rcvr advertises spare room by including value of RcvWindow in segments
Sender limits unACKed data to RcvWindow
guarantees receive buffer doesn't overflow
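The spare-room formula can be sketched as a small function. The function name and the 4096-byte buffer are assumptions chosen for illustration.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room in the receive buffer: RcvBuffer - [LastByteRcvd - LastByteRead]."""
    buffered = last_byte_rcvd - last_byte_read  # bytes received but not yet read
    if not 0 <= buffered <= rcv_buffer:
        raise ValueError("buffered data must fit into the receive buffer")
    return rcv_buffer - buffered


# A 4096-byte buffer as data arrives and the application reads it.
print(rcv_window(4096, last_byte_rcvd=2048, last_byte_read=0))     # half full
print(rcv_window(4096, last_byte_rcvd=4096, last_byte_read=0))     # full
print(rcv_window(4096, last_byte_rcvd=4096, last_byte_read=1024))  # 1K read
```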
42
Sliding windows
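[Figure: sliding window over the bytes 1, 2, ..., 11, ... of the stream. From left to right the regions are: sent and acknowledged; sent, not ACKed; usable window (can send ASAP); can't send until window moves. The offered window advertised by the receiver covers the sent-but-not-ACKed bytes plus the usable window.]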
43
Sliding Window: Example
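[Figure: time-sequence diagram between Sender and Receiver; the receiver buffer holds 4K and starts empty (0).]
1. Sender sends 2K of data (2K, SeqNo=0); the receiver buffer now holds 2K
2. Receiver replies AckNo=2048, Win=2048
3. Sender sends 2K of data (2K, SeqNo=2048); the receiver buffer is full (4K)
4. Receiver replies AckNo=4096, Win=0; sender blocked
5. The receiver buffer drops to 3K; receiver sends AckNo=4096, Win=1024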
44
Sliding Window: In-class example
[Figure: time-sequence diagram between Sender and Receiver; buffer labels in the figure: 4K bytes, 4K bytes, 1K.]
Receiver advertises: win 4096
Sender sends: 1:1025(1024)
How many more segments can it send now? 3 segments
Sender sends: 1025:2049(1024), 2049:3073(1024), 3073:4097(1024)
Receiver replies: ack 1025 win 3072
How many segments can it send now?
NOTATION: 1:1025(1024)
Sequence number: Is 1025 carried in TCP header? Is 1024 carried in TCP header? What is 1024?
45
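Sliding Window: In-class example answers
[Figure: time-sequence diagram between Sender and Receiver; buffer labels in the figure: 4K bytes, 4K bytes, 1K.]
Receiver advertises: win 4096
Sender sends: 1:1025(1024)
How many more segments can it send now? 3 segments
Sender sends: 1025:2049(1024), 2049:3073(1024), 3073:4097(1024)
Receiver replies: ack 1025 win 3072
How many segments can it send now? 0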
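The answers can be checked with a short sketch of the usable-window computation. It assumes that the window announced by the receiver starts at AckNo, that the initial advertisement corresponds to AckNo = 1, and that every segment carries 1024 bytes; the function name is hypothetical.

```python
def usable_window(ack_no, win, next_seq):
    """Bytes the sender may still transmit.

    The receiver accepts SeqNo = AckNo ... AckNo + Win - 1, and the sender
    has already sent everything below next_seq.
    """
    return max(0, ack_no + win - next_seq)


SEGMENT_SIZE = 1024

# After "win 4096" and the first segment 1:1025(1024): next byte to send is 1025.
after_first = usable_window(ack_no=1, win=4096, next_seq=1025)
# After all four segments and "ack 1025 win 3072": next byte to send is 4097.
after_ack = usable_window(ack_no=1025, win=3072, next_seq=4097)

print(after_first // SEGMENT_SIZE, after_ack // SEGMENT_SIZE)
```

The acknowledgment moves the left edge of the window to 1025, but the smaller window keeps the right edge at 4097, so nothing new can be sent.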
46
Silly Window Syndrome
Let's say that the server
is only able to remove 1
byte of data from the
buffer for every 3 it
receives.
Let's say it also removes
40 additional bytes from
the buffer during the
time it takes for the next
client's segment to
arrive.
In the worst case, the
client then sends a
segment with exactly one
byte, refilling the buffer
until the application
draws off the next byte.
47
TCP error control
ARQ scheme with positive cumulative ACKs
Delayed ACKs:
TCP delays transmission of ACKs for up to
200ms
The hope is to have data ready in that time
frame. Then, the ACK can be piggybacked with
the data segment.
49
Delayed ACK timer
This timer ticks every 200ms.
First timeout occurs based on when the timer was initialized,
which is when the system was rebooted.
The figure below explains why the ACK delay is
UP TO 200 ms (and not equal to 200 ms).
[Figure: timeline of timer ticks 1 to 12, 200 ms per tick. TCP receives a segment somewhere between two ticks. The delayed ACK timer expires at the next tick; the ACK has to be sent at this point whether or not the TCP buffer has received data to enable piggybacking.]
50
TCP Retransmission Timer
Retransmission Timer:
The setting of the retransmission timer is
crucial for efficiency
Timeout value too small -> results in
unnecessary retransmissions
Timeout
value too large -> long waiting time
before a retransmission can be issued
A problem is that the delays in the network are
not fixed
Therefore, the retransmission timers must be
adaptive
51
Measuring TCP Retransmission
Timers
ftp session
from aida
to rigoletto
aida.poly.edu
rigoletto.poly.edu
•Transfer file from aida to rigoletto
• Unplug Ethernet cable in the middle of file transfer
52
tcpdump Trace
10:42:01.704681 aida.40001 > rigoletto.ftp-data: . 161189:162649(1460) ack 1 win 17520
10:42:01.705603 aida.40001 > rigoletto.ftp-data: . 162649:164109(1460) ack 1 win 17520
10:42:01.706753 aida.40001 > rigoletto.ftp-data: . 164109:165569(1460) ack 1 win 17520
10:42:02.741764 aida.40001 > rigoletto.ftp-data: . 161189:162649(1460) ack 1 win 17520
10:42:05.741788 aida.40001 > rigoletto.ftp-data: . 161189:162649(1460) ack 1 win 17520
10:42:11.741828 aida.40001 > rigoletto.ftp-data: . 161189:162649(1460) ack 1 win 17520
10:42:23.741951 aida.40001 > rigoletto.ftp-data: . 161189:162649(1460) ack 1 win 17520
10:42:47.742176 aida.40001 > rigoletto.ftp-data: . 161189:162649(1460) ack 1 win 17520
10:43:35.742587 aida.40001 > rigoletto.ftp-data: . 161189:162649(1460) ack 1 win 17520
10:44:39.743140 aida.40001 > rigoletto.ftp-data: . 161189:162649(1460) ack 1 win 17520
10:45:43.743702 aida.40001 > rigoletto.ftp-data: . 161189:162649(1460) ack 1 win 17520
10:46:47.744271 aida.40001 > rigoletto.ftp-data: . 161189:162649(1460) ack 1 win 17520
10:47:51.752138 aida.40001 > rigoletto.ftp-data: . 161189:162649(1460) ack 1 win 17520
10:48:55.745547 aida.40001 > rigoletto.ftp-data: . 161189:162649(1460) ack 1 win 17520
10:49:59.746123 aida.40001 > rigoletto.ftp-data: . 161189:162649(1460) ack 1 win 17520
10:51:03.745839 aida.40001 > rigoletto.ftp-data: R 165569:165569(0) ack 1 win 17520
53
Interpreting the Measurements
The interval between retransmission attempts in seconds is:
1.03, 3, 6, 12, 24, 48, 64, 64, 64, 64, 64, 64, 64.
Time between retransmissions is doubled each time (Exponential Backoff Algorithm)
Timer is not increased beyond 64 seconds
TCP gives up after 13th attempt and 9 minutes (total timeout; tcp_ip_abort_interval is 2 mins in Solaris and can be programmed by the administrator; 9 mins is the commonly used old timeout value)
[Figure: plot of the retransmission interval in Seconds (axis 0 to 600) against Transmission Attempts (axis 0 to 12).]
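The measured intervals are consistent with a doubling timer that is capped at 64 seconds. The sketch below generates such a schedule; the function name is hypothetical, and the nominal first timeout of 1.5 seconds is an assumption (the measured first interval is shorter because the timer runs on coarse clock ticks).

```python
def backoff_schedule(first_timeout, cap, attempts):
    """Retransmission intervals that double each time, up to a fixed cap."""
    intervals = []
    timeout = first_timeout
    for _ in range(attempts):
        intervals.append(min(timeout, cap))
        timeout *= 2
    return intervals


schedule = backoff_schedule(first_timeout=1.5, cap=64, attempts=13)
total_seconds = sum(schedule)

print(schedule)
print(f"total: {total_seconds} s, about {total_seconds / 60:.1f} minutes")
```

Thirteen intervals of this form add up to roughly nine minutes, which matches the point at which the connection is reset in the trace.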
54
TCP timers
First timeout occurs based on when timer was initialized.
This explains why the first timeout occurs at 1.03 sec and not 1.5.
If the base timer clock is 500 ms, the first timeout occurs after 3
timer ticks. This happens to occur at 1.03 sec after first segment
was sent. Subsequent retransmissions occur at 3 sec, 6 sec, 12 sec,
etc.
[Figure: timeline of timer ticks 1 to 12, 500 ms per tick. TCP sends the first segment somewhere between two ticks. The retransmission timer first expires after three ticks (<1.5 sec; in this case it happens to be 1.03 sec) and then after six ticks (3 sec).]
55
Adaptive mechanism
The retransmission mechanism of TCP is adaptive
The retransmission timers are set based on round-trip time (RTT)
measurements that TCP performs
The RTT is based on time difference between segment transmission and ACK
But:
TCP does not ACK each segment
Each connection has only one timer
Can't start a second RTT measurement if timing on one segment is in progress
[Figure: time-sequence diagram. Segment 1 and the ACK for Segment 1 give RTT #1. Segments 2 and 3 are acknowledged together (ACK for Segment 2 + 3), giving RTT #2. Segments 4 and 5 are followed by the ACK for Segment 4 and the ACK for Segment 5; RTT #3 is measured in this exchange.]
56
Computation of RTO in adaptive
scheme
Retransmission timer is set to a Retransmission Timeout (RTO) value.
RTO is calculated based on the RTT measurements.
The RTT measurements are smoothed by the following estimators A (mean
RTT value) and D (smoothed mean deviation of RTT):
Err = M - A
A ← A + g·Err = (1 - g)·A + g·M
D ← D + h·(|Err| - D) = (1 - h)·D + h·|Err|
RTO = A + 4D
The gains are set to h=1/4 and g=1/8
– In the formula for computing the new smoothed mean RTT A, 0.125
times the newly measured value (M) is added to 0.875 times the old
smoothed value of A
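The estimator can be sketched in a few lines. The function name is hypothetical; M is the newly measured RTT, and the gains default to g = 1/8 and h = 1/4 as stated above.

```python
def update_rto(a, d, m, g=1 / 8, h=1 / 4):
    """Apply one RTT measurement m and return the new (A, D, RTO)."""
    err = m - a                  # Err = M - A
    a = a + g * err              # smoothed mean RTT
    d = d + h * (abs(err) - d)   # smoothed mean deviation
    return a, d, a + 4 * d       # RTO = A + 4D


# Assumed starting point A = 1, D = 1 and one measurement M = 2.
new_a, new_d, new_rto = update_rto(a=1, d=1, m=2)
print(new_a, new_d, new_rto)
```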
57
In-class example
Assume A=1, D=1 (initial values)
[Figure: time-sequence diagram. Segment 1 is sent and its ACK arrives (RTT = 2); Segment 2 is sent and lost (X, packet lost); Segment 2 is retransmitted and then acknowledged. The question at each stage is: RTO = ?]
58
Example of RTO computation (adaptive)
Assume A=1, D=1 (initial values)
• Err = 2 -1 =1 (since M, the measured RTT is 2)
• A = 1 + 0.125×1= 1.125; D = 1+0.25 (1-1)=1
• RTO = A+4D=1.125+4 = 5.125
• This is why in the figure below when segment 2 is lost,
it is retransmitted after 5.125 sec.
[Figure: time-sequence diagram. Segment 1 is sent and its ACK arrives (RTT = 2); Segment 2 is sent and lost (X, packet lost); after RTO = 5.125 Segment 2 is retransmitted, and the ACK for Segment 2 follows.]
59
In-class example
Assume A=1, D=1 (initial values)
Initially: RTO=A+4D=5
After the ACK for Segment 1 (RTT =2): RTO=A+4D=5.125 (adaptive: new A = 1.125; D=1)
Segment 2 is sent and lost (X, packet lost); it is retransmitted after 5.125 sec since that is the retransmission timer value
When Segment 2 is retransmitted: RTO=10.25 (doubling)
After the ACK for Segment 2: RTO=10.25 (Karn's algorithm)
60
Karn’s Algorithm
If an ACK for a retransmitted segment is received, the sender cannot tell if the ACK belongs to the original or the retransmission.
The RTT measurement started for the original transmission should be terminated.
There will be no RTT measurement for the original or retransmitted segment
Therefore A and D cannot be updated when the ACK is received, and hence no new RTO computation at this point.
Don't confuse this with the RTO being doubled when the segment is retransmitted following the exponential doubling rule.
[Figure: time-sequence diagram. A segment is sent, a Timeout occurs, the segment is retransmitted, and an ACK arrives; it is unclear which of the two transmissions the RTT should be measured from (RTT ?).]
Karn's algorithm:
• RTT measurement is suspended
• RTO is doubled
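How the adaptive RTO, the doubling on timeout, and Karn's algorithm interact can be sketched with a small class. The class and method names are hypothetical, and the sketch tracks only the timer value, not the segments themselves.

```python
class RtoTimer:
    """Sketch of an RTO value maintained with backoff and Karn's algorithm."""

    def __init__(self, a, d, g=1 / 8, h=1 / 4):
        self.a, self.d, self.g, self.h = a, d, g, h
        self.rto = a + 4 * d

    def on_timeout(self):
        """The segment is retransmitted: double the RTO (exponential backoff)."""
        self.rto *= 2

    def on_ack(self, measured_rtt, was_retransmitted):
        """Update A, D and RTO unless the ACKed segment was retransmitted."""
        if was_retransmitted:
            return  # Karn's algorithm: no RTT sample, so no new RTO either
        err = measured_rtt - self.a
        self.a += self.g * err
        self.d += self.h * (abs(err) - self.d)
        self.rto = self.a + 4 * self.d


timer = RtoTimer(a=1, d=1)
history = [timer.rto]                              # initial RTO
timer.on_ack(measured_rtt=2, was_retransmitted=False)
history.append(timer.rto)                          # after a valid RTT sample
timer.on_timeout()
history.append(timer.rto)                          # after the retransmission
timer.on_ack(measured_rtt=None, was_retransmitted=True)
history.append(timer.rto)                          # ACK of a retransmitted segment
print(history)
```

With the starting values A = 1 and D = 1 the RTO goes from 5 to 5.125, doubles to 10.25 at the timeout, and stays at 10.25 when the ambiguous ACK arrives.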
61
In-class example
At t1: RTO = 6 sec; A = 2; D = 1
At t2: RTO = ?
At t3: RTO = ?
[Figure: time-sequence diagram with time marks t1 to t9. The connection opens with SYN, SYN + ACK and ACK; Segments 1 to 6 are then sent and acknowledged. The diagram marks a Timeout, an interval labeled 3 sec, and the measurements RTT #1, RTT #2 and RTT #3.]
62
In-class example
At t1: RTO = 6 sec; A = 2; D = 1
At t2: Timeout (= 6 sec); RTO = 12 sec (doubling)
At t3: RTO = 12 sec (Karn's algorithm)
[Figure: time-sequence diagram with time marks t1 to t9. The connection opens with SYN, SYN + ACK and ACK; Segments 1 to 6 are then sent and acknowledged. The diagram marks a Timeout, an interval labeled 3 sec, and the measurements RTT #1, RTT #2 and RTT #3.]
63
Thus there are two schemes for determining
RTO and two schemes for controlling RTT
measurement
RTO
• Exponential backoff if a segment is retransmitted
• Adaptive RTO as a function of RTT (A+4D)
RTT measurement
• Karn's algorithm: no RTT measurement on retransmitted segment
• Can't start a second RTT measurement if timing on one segment is in progress: if an RTT measurement is in progress and a new segment is sent, then no RTT measurement is taken for the new segment
64