Transcript PPT
Part 3: Transport Layer
CSE 3461/5461
Reading: Chapter 3, Kurose and Ross
1
Part 3: Transport Layer
Chapter goals:
Chapter Overview:
• Understand principles behind • Transport layer services
transport layer services:
• Multiplexing/demultiplexing
– Multiplexing/demultiplexing • Connectionless transport: UDP
– Reliable data transfer
• Principles of reliable data transfer
– Flow control
• Connection-oriented transport: TCP
– Congestion control
• Instantiation and
implementation in the
Internet
– Reliable transfer
– Flow control
– Connection management
• Principles of congestion control
• TCP congestion control
2
Transport Services and Protocols
• Provide logical
communication between
application processes
running on different hosts
• Transport protocols run in
end systems
• Transport layer vs.
network layer services:
– Network layer: data transfer
between end systems
– Transport layer: data transfer
between processes; relies on,
enhances, network layer
services
application
transport
network
data link
physical
network
data link
physical
network
data link
physical
network
data link
physical
network
data link
physical
network
data link
physical
application
transport
network
data link
physical
3
Transport-Layer Protocols
Internet transport services:
• Reliable, in-order unicast
delivery (TCP)
– Congestion
– Flow control
– Connection setup
• Unreliable (“best-effort”),
unordered unicast or multicast
delivery: UDP
• Services not available:
– Real-time
– Bandwidth guarantees
– Reliable multicast
application
transport
network
data link
physical
network
data link
physical
network
data link
physical
network
data link
physical
network
data link
physical
network
data link
physical
application
transport
network
data link
physical
4
Multiplexing/Demultiplexing (1)
Recall: Segment is a unit of data
exchanged between transport
layer entities
– Aka TPDU: transport
protocol data unit
Application-layer
data
Segment
header
Segment
P1
M
Ht
M
H n Segment
Application
Transport
Network
Demultiplexing: Delivering
received segments to
correct app layer processes
Receiver
P3
M
M
Application
Transport
Network
P4
M
P2
Application
Transport
Network
5
Multiplexing/Demultiplexing (2)
Multiplexing:
Gathering data from multiple
app processes, enveloping
data with header (later used
for demultiplexing)
Multiplexing/demultiplexing:
• Based on sender, receiver port
numbers, IP addresses
– Source, destination port
numbers in each segment
– Recall: Well-known port
numbers for specific
applications
• Internal action between layers: in
the network, messages from
different applications will be in
different packets.
32 bits
Source port #
Dest port #
Other header fields
Application
data
(message)
TCP/UDP segment format
6
Multiplexing/Demultiplexing: Examples
Host A
Source Port: X
Dest. Port: 23
Server B
Source Port:23
Dest. Port: X
Source IP: C
Dest IP: B
Source Port: Y
Dest. Port: 80
Port Use: Simple Telnet App
Web Client
Host A
Web Client
Host C
Source IP: A
Dest IP: B
Source Port: X
Dest. Port: 80
Source IP: C
Dest IP: B
Source Port: X
Dest. Port: 80
Web
Server B
Port Use: Web Server
7
UDP: User Datagram Protocol
[RFC 768]
• “No frills,” “bare bones”
Internet transport protocol
• “Best effort” service, UDP
segments may be:
– Lost
– Delivered out of order to
app
• Connectionless:
– No handshaking between
UDP sender, receiver
– Each UDP segment handled
independently of others
Why is there a UDP?
• No connection establishment
(which can add delay)
• Simple: no connection state at
sender, receiver
• Small segment header
• No congestion control: UDP
can blast away as fast as desired
8
UDP: More
• Often used for streaming
multimedia apps
– Loss tolerant
– Rate sensitive
• Other UDP uses
Length, in
bytes of UDP
segment,
(why?): including
header
– DNS
– SNMP
• Reliable transfer over UDP: add
reliability at application layer
– Application-specific error
recovery!
32 Bits
Source Port #
Dest Port #
Length
Checksum
Application
data
(message)
UDP Segment Format
9
UDP Checksum
Goal: Detect “errors” (e.g., flipped bits) in transmitted segment
Sender:
Receiver:
• Treat segment contents as
sequence of 16-bit integers
• Checksum: addition (one’s
complement sum) of segment
contents
• Sender puts checksum value
into UDP checksum field
• Compute checksum of received
segment
• Check if computed checksum
equals checksum field value:
– NO: Error detected
– YES: No error detected. But
maybe errors nonetheless?
More later ….
10
Principles of Reliable Data Transfer (1)
• Important in application, transport, link layers
– Top-10 list of important networking topics!
•
Characteristics of unreliable channel will determine complexity of reliable data
transfer protocol (RDT)
11
Principles of Reliable Data Transfer (2)
• Important in application, transport, link layers
– Top-10 list of important networking topics!
•
Characteristics of unreliable channel will determine complexity of reliable data
transfer protocol (RDT)
12
Principles Of Reliable Data Transfer (3)
• Important in application, transport, link layers
– Top-10 list of important networking topics!
•
Characteristics of unreliable channel will determine complexity of reliable data
transfer protocol (RDT)
13
Reliable Data Transfer: Getting Started (1)
rdt_send(): Called from above, (e.g.,
by app.). Passed data to
deliver to receiver upper layer
send
side
udt_send(): Called by RDT,
to transfer packet over
unreliable channel to receiver
deliver_data(): Called by RDT
to deliver data to upper layer
receive
side
rdt_rcv(): Called when packet
arrives on rcv-side of channel
14
Reliable Data Transfer: Getting Started (2)
We will:
• Incrementally develop sender, receiver sides of
Reliable Data Transfer protocol (RDT)
• Consider only unidirectional data transfer
– But control info will flow on both directions!
• Use finite state machines (FSM) to specify sender,
Event causing state transition
receiver
Actions taken on state transition
State: when in this
“state” next state
uniquely
determined by
next event
state
1
Event
Actions
state
2
15
RDT1.0: Reliable Transfer over
Reliable Channel
• Underlying channel perfectly reliable
– No bit errors
– No loss of packets
• Separate FSMs for sender, receiver:
– Sender sends data into underlying channel
– Receiver reads data from underlying channel
Wait for
call from
above
rdt_send(data)
packet = make_pkt(data)
udt_send(packet)
Sender
Wait for
call from
below
rdt_rcv(packet)
extract (packet,data)
deliver_data(data)
Receiver
16
RDT2.0: Channel with Bit Errors (1)
• Underlying channel may flip bits in packet
– Checksum to detect bit errors
• The question: how to recover from errors:
– acknowledgements (ACKs): receiver explicitly tells
sender that pkt received OK
– negative acknowledgements (NAKs): receiver explicitly
tells sender that pkt had errors
– sender
retransmits
pkt on receipt
of NAK
How
do humans
recover
from “errors”
• new mechanisms in rdt2.0 (beyond rdt1.0):
during conversation?
– error detection
– receiver feedback: control msgs (ACK,NAK) rcvr>sender
17
RDT2.0: Channel with Bit Errors (2)
• Underlying channel may flip bits in packet
– Checksum to detect bit errors
• The question: how to recover from errors:
– Acknowledgements (ACKs): receiver explicitly tells
sender that pkt received OK
– Negative acknowledgements (NAKs): receiver explicitly
tells sender that pkt had errors
– Sender retransmits pkt on receipt of NAK
• New mechanisms in RDT2.0 (beyond RDT1.0):
– Error detection
– Feedback: control msgs (ACK,NAK) from receiver to
sender
18
RDT2.0: FSM Specification
rdt_send(data)
sndpkt = make_pkt(data, checksum)
udt_send(sndpkt)
rdt_rcv(rcvpkt) &&
isNAK(rcvpkt)
Wait
for
Wait for call
ACK or
udt_send(sndpkt)
from above
NAK
rdt_rcv(rcvpkt) && isACK(rcvpkt)
L
Sender
Receiver
rdt_rcv(rcvpkt) &&
corrupt(rcvpkt)
udt_send(NAK)
Wait for call
from below
rdt_rcv(rcvpkt) &&
notcorrupt(rcvpkt)
extract(rcvpkt,data)
deliver_data(data)
udt_send(ACK)
19
RDT2.0: Operation with no Errors
rdt_send(data)
snkpkt = make_pkt(data, checksum)
udt_send(sndpkt)
rdt_rcv(rcvpkt) &&
isNAK(rcvpkt)
Wait for
Wait for call
ACK or
udt_send(sndpkt)
from above
NAK
rdt_rcv(rcvpkt) && isACK(rcvpkt)
L
rdt_rcv(rcvpkt) &&
corrupt(rcvpkt)
udt_send(NAK)
Wait for call
from below
rdt_rcv(rcvpkt) &&
notcorrupt(rcvpkt)
extract(rcvpkt,data)
deliver_data(data)
udt_send(ACK)
20
RDT2.0: Error Scenario
rdt_send(data)
snkpkt = make_pkt(data, checksum)
udt_send(sndpkt)
rdt_rcv(rcvpkt) &&
isNAK(rcvpkt)
Wait for
Wait for call
ACK or
udt_send(sndpkt)
from above
NAK
rdt_rcv(rcvpkt) &&
corrupt(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) && isACK(rcvpkt)
Λ
Wait for call
from below
rdt_rcv(rcvpkt) &&
notcorrupt(rcvpkt)
extract(rcvpkt,data)
deliver_data(data)
udt_send(ACK)
21
RDT2.0 has a Fatal Flaw!
What happens if
ACK/NAK
corrupted?
• Sender doesn’t know what
happened at receiver!
• Can’t just retransmit:
possible duplicate
Handling duplicates:
• Sender retransmits current
pkt if ACK/NAK corrupted
• Sender adds sequence
number to each pkt
• Receiver discards (doesn’t
deliver up) duplicate pkt
Stop and Wait
Sender sends one packet,
then waits for receiver response
22
RDT2.1: Sender, Handles Garbled ACK/NAKs
rdt_send(data)
sndpkt = make_pkt(0, data, checksum)
udt_send(sndpkt)rdt_rcv(rcvpkt) &&
( corrupt(rcvpkt) ||
Wait
for
Wait for call
isNAK(rcvpkt) )
ACK or
0 from
udt_send(sndpkt)
NAK 0
above
rdt_rcv(rcvpkt)
&& notcorrupt(rcvpkt)
&& isACK(rcvpkt)
rdt_rcv(rcvpkt)
&& notcorrupt(rcvpkt)
&& isACK(rcvpkt)
L
rdt_rcv(rcvpkt) &&
( corrupt(rcvpkt) ||
isNAK(rcvpkt) )
udt_send(sndpkt)
L
Wait for
ACK or
NAK 1
Wait for
call 1 from
above
rdt_send(data)
sndpkt = make_pkt(1, data, checksum)
udt_send(sndpkt)
23
RDT2.1: Receiver, Handles Garbled ACK/NAKs
rdt_rcv(rcvpkt) && notcorrupt(rcvpkt)
&& has_seq0(rcvpkt)
rdt_rcv(rcvpkt) &&
(corrupt(rcvpkt)
sndpkt =
make_pkt(NAK,chksum)
udt_send(sndpkt)
extract(rcvpkt,data)
deliver_data(data)
sndpkt = make_pkt(ACK, chksum)
udt_send(sndpkt)
Wait for
0 from
below
rdt_rcv(rcvpkt) &&
not corrupt(rcvpkt)
&&
has_seq1(rcvpkt)
sndpkt = make_pkt(ACK,chksum)
udt_send(sndpkt)
Wait for
1 from
below
rdt_rcv(rcvpkt) &&
(corrupt(rcvpkt)
sndpkt =
make_pkt(NAK,chksum)
udt_send(sndpkt)
rdt_rcv(rcvpkt) &&
not corrupt(rcvpkt)
&&
has_seq0(rcvpkt)
sndpkt = make_pkt(ACK,chksum)
udt_send(sndpkt)
rdt_rcv(rcvpkt) && notcorrupt(rcvpkt)
&& has_seq1(rcvpkt)
extract(rcvpkt,data)
deliver_data(data)
sndpkt = make_pkt(ACK, chksum)
udt_send(sndpkt)
24
RDT2.1: Discussion
Sender:
• Seq # added to pkt
• Two seq. numbers (0,1)
will suffice. Why?
• Must check if received
ACK/NAK corrupted
• Twice as many states
– State must “remember”
whether “expected” pkt
should have seq # 0 or 1
Receiver:
• Must check if received
packet is duplicate
– State indicates whether 0
or 1 is expected pkt seq #
• Note: receiver can not
know if its last
ACK/NAK received
OK at sender
25
RDT2.2: A NAK-Free Protocol
• Same functionality as RDT2.1, using ACKs only
• Instead of NAK, receiver sends ACK for last pkt
received OK
– Receiver must explicitly include seq # of pkt being ACKed
• Duplicate ACK at sender results in same action as NAK:
retransmit current pkt
26
RDT2.2: Sender, Receiver Fragments
rdt_send(data)
sndpkt = make_pkt(0, data, checksum)
udt_send(sndpkt)
rdt_rcv(rcvpkt) &&
( corrupt(rcvpkt) ||
Wait for
Wait for
isACK(rcvpkt,1) )
ACK 0
call 0 from
udt_send(sndpkt)
above
Sender FSM
fragment
rdt_rcv(rcvpkt) &&
(corrupt(rcvpkt) ||
has_seq1(rcvpkt)) Wait for
udt_send(sndpkt)
0 from
below
rdt_rcv(rcvpkt)
&& notcorrupt(rcvpkt)
&& isACK(rcvpkt,0)
L
Receiver FSM
fragment
rdt_rcv(rcvpkt) && notcorrupt(rcvpkt)
&& has_seq1(rcvpkt)
extract(rcvpkt,data)
deliver_data(data)
sndpkt = make_pkt(ACK1, chksum)
udt_send(sndpkt)
27
RDT3.0: Channels with Errors and Loss
New assumption:
Approach: sender waits
Underlying channel can
“reasonable” amount of
also lose packets (data,
time for ACK
• Retransmits if no ACK
ACKs)
– Checksum, seq. #,
ACKs, retransmissions
will help … but not
enough
received in this time
• If pkt (or ACK) just delayed
(not lost):
– Retransmission will be
duplicate, but seq. #’s
already handles this
– Receiver must specify
seq # of pkt being
ACKed
• Requires countdown timer
28
RDT3.0 Sender
rdt_send(data)
rdt_rcv(rcvpkt) &&
sndpkt = make_pkt(0, data, checksum) ( corrupt(rcvpkt) ||
udt_send(sndpkt)
isACK(rcvpkt,1) )
start_timer
L
rdt_rcv(rcvpkt)
L
Wait
for
ACK0
Wait for
call 0 from
above
rdt_rcv(rcvpkt)
&& notcorrupt(rcvpkt)
&& isACK(rcvpkt,1)
rdt_rcv(rcvpkt)
&& notcorrupt(rcvpkt)
&& isACK(rcvpkt,0)
stop_timer
stop_timer
timeout
udt_send(sndpkt)
start_timer
rdt_rcv(rcvpkt) &&
( corrupt(rcvpkt) ||
isACK(rcvpkt,0) )
L
timeout
udt_send(sndpkt)
start_timer
Wait
for
ACK1
Wait for
call 1 from
above
rdt_rcv(rcvpkt)
rdt_send(data)
L
sndpkt = make_pkt(1, data, checksum)
udt_send(sndpkt)
start_timer
29
RDT3.0 in Action (1)
Receiver
Sender
send pkt0
rcv ack0
send pkt1
ack0
rcv pkt0
send ack0
rcv ack0
send pkt1
pkt1
ack1
rcv ack1
send pkt0
send pkt0
pkt0
Receiver
Sender
rcv pkt1
send ack1
pkt0
ack0
rcv pkt0
send ack0
pkt1
X
loss
pkt0
ack0
rcv pkt0
send ack0
timeout
resend pkt1
pkt1
ack1
rcv ack1
send pkt0
(a) no loss
rcv pkt1
send ack1
pkt0
ack0
rcv pkt0
send ack0
(b) packet loss
30
RDT3.0 in Action (2)
Receiver
Sender
Receiver
Sender
send pkt0
rcv ack0
send pkt1
pkt0
ack0
rcv pkt0
send ack0
X
ack0
pkt1
ack1
rcv pkt1
(detect duplicate)
send ack1
pkt0
ack0
(c) ACK loss
rcv pkt0
send ack0
rcv pkt0
send ack0
pkt1
rcv pkt1
send ack1
rcv pkt1
send ack1
loss
rcv ack1
send pkt0
rcv ack0
send pkt1
pkt0
pkt1
ack1
timeout
resend pkt1
send pkt0
ack1
timeout
resend pkt1
rcv ack1
send pkt0
rcv ack1
send pkt0
pkt1
pkt0
ack1
ack0
pkt0
ack0
rcv pkt1
(detect duplicate)
send ack1
rcv pkt0
send ack0
rcv pkt0
(detect duplicate)
send ack0
(d) premature timeout/ delayed ACK
31
Performance of RDT3.0
• RDT3.0 is correct, but performance stinks
• E.g.: 1 Gbps link, 15 ms prop. delay, 8000 bit packet:
8000 bits
L
Dtrans = R =
109 bits/sec
= 8 microsecs
– Usender: utilization – fraction of time sender busy sending
U
sender =
L/R
RTT + L / R
=
.008
30.008
= 0.00027
– If RTT=30 msec, 1KB pkt every 30 msec: 33kB/sec thruput
over 1 Gbps link
•
Network protocol limits use of physical resources!
32
RDT3.0: Stop-and-Wait Operation
sender
receiver
first packet bit transmitted, t = 0
last packet bit transmitted, t = L / R
first packet bit arrives
last packet bit arrives, send ACK
RTT
ACK arrives, send next
packet, t = RTT + L / R
U
sender =
L/R
RTT + L / R
=
.008
30.008
= 0.00027
33
Pipelined Protocols
Pipelining: sender allows multiple, “in-flight”, yetto-be-acknowledged pkts
– Range of sequence numbers must be increased
– Buffering at sender and/or receiver
• Two generic forms of pipelined protocols: Go-Back-N,
Selective Repeat
34
Pipelining: Increased Utilization
sender
receiver
first packet bit transmitted, t = 0
last bit transmitted, t = L / R
first packet bit arrives
last packet bit arrives, send ACK
last bit of 2nd packet arrives, send ACK
last bit of 3rd packet arrives, send ACK
RTT
ACK arrives, send next
packet, t = RTT + L / R
Packet pipelining increases
utilization by a factor of 3!
U
sender =
3L / R
RTT + L / R
=
.0024
30.008
= 0.00081
35
Pipelined Protocols: Overview
Go-Back-N:
• Sender can have up to
N unacked packets in
pipeline
• Receiver only sends
cumulative ACK
– Doesn’t ACK packet if
there’s a gap
• Sender has timer for
oldest unACKed packet
– When timer expires,
retransmit all unACKed
packets
Selective Repeat:
• Sender can have up to N
unacked packets in
pipeline
• Receiver sends individual
ACK for each packet
• Sender maintains timer
for each unACKed packet
– When timer expires,
retransmit only that
unacked packet
36
Go-Back-N: Sender
• k-bit seq # in pkt header
• “Window” of up to N, consecutive unACKed pkts allowed
•
ACK(n): ACKs all pkts up to, including seq # n - “cumulative
ACK”
– may receive duplicate ACKs (see receiver)
• timer for oldest in-flight pkt
• timeout(n): retransmit packet n and all higher seq # pkts in
window
37
GBN: Sender Extended FSM
rdt_send(data)
L
base=1
nextseqnum=1
if (nextseqnum < base+N) {
sndpkt[nextseqnum] = make_pkt(nextseqnum,data,chksum)
udt_send(sndpkt[nextseqnum])
if (base == nextseqnum)
start_timer
nextseqnum++
}
else
refuse_data(data)
timeout
Wait
rdt_rcv(rcvpkt)
&& corrupt(rcvpkt)
start_timer
udt_send(sndpkt[base])
udt_send(sndpkt[base+1])
…
udt_send(sndpkt[nextseqnum-1])
rdt_rcv(rcvpkt) &&
notcorrupt(rcvpkt)
base = getacknum(rcvpkt)+1
If (base == nextseqnum)
stop_timer
else
start_timer
38
GBN: Receiver Extended FSM
default
udt_send(sndpkt)
L
Wait
expectedseqnum=1
sndpkt =
make_pkt(expectedseqnum,
ACK,chksum)
rdt_rcv(rcvpkt)
&& notcurrupt(rcvpkt)
&& hasseqnum(rcvpkt,expectedseqnum)
extract(rcvpkt,data)
deliver_data(data)
sndpkt = make_pkt(expectedseqnum,ACK,chksum)
udt_send(sndpkt)
expectedseqnum++
ACK-only: always send ACK for correctly-received pkt
with highest in-order seq #
– May generate duplicate ACKs
– Need only remember expectedseqnum
• Out-of-order pkt:
– Discard (don’t buffer): no receiver buffering!
– Re-ACK pkt with highest in-order seq #
39
GBN in Action
Sender window (N=4)
012345678
012345678
012345678
012345678
012345678
012345678
Sender
send pkt0
send pkt1
send pkt2
send pkt3
(wait)
rcv ack0, send pkt4
rcv ack1, send pkt5
ignore duplicate ACK
012345678
012345678
012345678
012345678
pkt 2 timeout
send pkt2
send pkt3
send pkt4
send pkt5
Receiver
X loss
receive pkt0, send ack0
receive pkt1, send ack1
receive pkt3, discard,
(re)send ack1
receive pkt4, discard,
(re)send ack1
receive pkt5, discard,
(re)send ack1
rcv pkt2, deliver, send ack2
rcv pkt3, deliver, send ack3
rcv pkt4, deliver, send ack4
rcv pkt5, deliver, send ack5
40
Selective Repeat
• Receiver individually acknowledges all correctly
received pkts
– Buffers pkts, as needed, for eventual in-order delivery
to upper layer
• Sender only resends pkts for which ACK not
received
– Sender timer for each unACKed pkt
• Sender window
– N consecutive seq #’s
– Limits seq #s of sent, unACKed pkts
41
Selective Repeat: Sender, Receiver Windows
42
Selective Repeat
Sender
Data from above:
• If next available seq # in
window, send pkt
Timeout(n):
• Resend pkt n, restart timer
ACK(n) in [sendbase,sendbase+N]:
• Mark pkt n as received
• If n smallest unACKed pkt,
advance window base to
next unACKed seq #
Receiver
Pkt n in [rcvbase,
rcvbase+N-1]
• send ACK(n)
• out-of-order: buffer
• in-order: deliver (also
deliver buffered, in-order
pkts), advance window to
next not-yet-received pkt
Pkt n in [rcvbaseN,rcvbase-1]
• ACK(n)
Otherwise:
•
Ignore
43
Selective Repeat in Action
Sender window (N=4)
012345678
012345678
012345678
012345678
012345678
012345678
Sender
send pkt0
send pkt1
send pkt2
send pkt3
(wait)
Receiver
X loss
rcv ack0, send pkt4
rcv ack1, send pkt5
record ack3 arrived
012345678
012345678
012345678
012345678
pkt 2 timeout
send pkt2
record ack4 arrived
record ack4 arrived
receive pkt0, send ack0
receive pkt1, send ack1
receive pkt3, buffer,
send ack3
receive pkt4, buffer,
send ack4
receive pkt5, buffer,
send ack5
rcv pkt2; deliver pkt2,
pkt3, pkt4, pkt5; send ack2
Q: what happens when ack2 arrives?
44
Selective Repeat:
Dilemma
Example:
• Seq #s: 0, 1, 2, 3
• Window size = 3
• Receiver sees no
difference in two
scenarios!
• Duplicate data
accepted as new in (b)
Q: What relationship
between seq # size
and window size to
avoid problem in (b)?
Receiver window
(after receipt)
Sender window
(after receipt)
0123012
pkt0
0123012
pkt1
0123012
0123012
pkt2
0123012
0123012
pkt3
0123012
0123012
pkt0
(a) no problem
X
Will accept packet
with seq number 0
Receiver can’t see sender side.
Receiver behavior identical in both cases!
Something’s (very) wrong!
0123012
pkt0
0123012
pkt1
0123012
0123012
pkt2
0123012
timeout
retransmit pkt0
0123012
(b) oops!
X
X
X
0123012
pkt0
Will accept packet
with seq number 0
45
TCP: Overview
• Point-to-point:
RFCs: 793, 1122, 1323, 2018, 2581
• Full duplex data:
– One sender, one receiver
– Bi-directional data flow in
same connection
– MSS: Maximum segment
size
• Reliable, in-order byte
steam:
– No “message boundaries”
• Pipelined:
• Connection-oriented:
– Handshaking (exchange of
control msgs) inits sender,
receiver state before data
exchange
– TCP congestion and flow
control set window size
• Send & receive buffers
• Flow controlled:
socket
door
application
writes data
application
reads data
TCP
send buffer
TCP
receive buffer
socket
door
– Sender will not overwhelm
receiver
segment
46
TCP Segment Structure
32 bits
URG: urgent data
(generally not used)
ACK: ACK # valid
Source port #
Dest port #
Sequence number
Counting by bytes
of data
(not segments!)
Acknowledgement number
PSH: push data now
(Generally not used)
RST, SYN, FIN:
Connection establishment
(Setup, teardown cmds)
Internet
Checksum
(As in UDP)
Head Not
UA P R S F
Len Used
Checksum
Rcvr window size
Ptr urgent data
Options (variable length)
# Bytes
receiver willing
to accept
Application
data
(variable length)
47
TCP Sequence #s and ACKs
Seq. #’s:
– Byte stream “number”
of first byte in
segment’s data
ACKs:
– Seq. # of next byte
expected from other
side
– Cumulative ACK
Q: How does receiver handle
out-of-order segments?
– A: TCP spec doesn’t
say, up to implementer
Host A
Host B
User
types
‘C’
Host ACKs
receipt of
‘C’, echoes
back ‘C’
Host ACKs
receipt
of echoed
‘C’
time
Simple Telnet Scenario
48
TCP: Reliable Data Transfer
Event: Data received
from application above
Create, send segment
Wait
Wait
For
for
Event
event
Simplified sender, assuming
• One way data transfer
• No flow, congestion control
Event: timer timeout for
segment with seq # y
Retransmit segment
Event: ACK received,
with ACK # y
ACK processing
49
TCP: Reliable Data Transfer
50
TCP ACK Generation [RFC 1122, RFC 2581]
Event
TCP Receiver Action
In-order segment arrival,
no gaps,
everything else already ACKed
Delayed ACK. Wait up to 500ms
for next segment. If no next segment,
send ACK
In-order segment arrival,
no gaps,
one delayed ACK pending
Immediately send single
cumulative ACK
Out-of-order segment arrival
higher-than-expect seq. #
gap detected
Send duplicate ACK, indicating seq. #
of next expected byte
Arrival of segment that
partially or completely fills gap
Immediate ACK if segment starts
at lower end of gap
51
TCP: Retransmission Scenarios
X
loss
time
Host A
Host B
Lost ACK scenario
Host B
Seq=100 timeout
Seq=92 timeout
timeout
Host A
time
Premature timeout,
cumulative ACKs
52
TCP Flow Control
Flow control
Sender won’t overrun
receiver’s buffers by
transmitting too much,
too fast
RcvBuffer = Size or TCP receive buffer
RcvWindow = Amount of spare room in buffer
Receiver: Explicitly informs
sender of (dynamically
changing) amount of free
buffer space
– RcvWindow field in
TCP segment
Sender: Keeps the amount of
transmitted, unACKed data
less than most recently
received RcvWindow
Receiver buffering
53
TCP Round Trip Time and Timeout
Q: How to set TCP
timeout value?
• Longer than RTT
– Note: RTT will vary
• Too short: premature
timeout
– Unnecessary
retransmissions
• Too long: slow reaction to
segment loss
Q: How to estimate RTT?
• SampleRTT: measured time from
segment transmission until ACK
receipt
– Ignore retransmissions,
cumulatively ACKed segments
• SampleRTT will vary, want estimated
RTT “smoother”
– Use several recent measurements,
not just current SampleRTT
54
TCP Round Trip Time and Timeout
Exponential weighted moving average
Influence of given sample decreases exponentially fast
Typical value of x: 0.1
Setting the timeout
• EstimatedRTT plus “safety margin”
• Large variation in EstimatedRTT needs larger safety margin
55
TCP Connection Management (1)
Recall: TCP sender, receiver
Three way handshake:
establish “connection” before
exchanging data segments
• Initialize TCP variables:
– Sequence #s
– Buffers, flow control info
(e.g. RcvWindow)
• Client: Connection initiator
Step 1: Client end system sends TCP
SYN control segment to server
– Specifies initial seq #
(client_isn)
Step 2: Server end system receives
SYN, replies with SYNACK control
segment
– ACKs received SYN
Socket clientSocket =
– Allocates buffers
new Socket("hostname","port#");
– Specifies server → receiver
initial seq. # (server_isn)
• Server: Contacted by client
Step 3: Client allocates buffers and
Socket connectionSocket =
variables upon receiving SYNACK
welcomeSocket.accept();
– SYN=0
– Seq = client_isn + 1
– Ack = server_isn +1
56
TCP Connection Management (2)
Closing a connection:
Client closes socket:
clientSocket.close();
Client
Server
close
Step 1: Client end system sends
close
Step 2: Server receives FIN,
replies with ACK. Closes
connection, sends FIN.
Timed wait
TCP FIN control segment to
server
Closed
57
TCP Connection Management (3)
Step 3: Client receives FIN,
replies with ACK.
Client
Server
closing
– Enters “timed wait” - will
respond with ACK to
received FINs
closing
Step 4: Server receives ACK.
Note: With small modification,
can handle simultaneous FINs.
Timed wait
Connection closed.
Closed
Closed
58
TCP Connection Management (4)
TCP server
lifecycle
TCP client
lifecycle
59
Principles of Congestion Control
Congestion:
• Informally: “too many sources sending too much data too
fast for network to handle”
• Different from flow control!
• Manifestations:
– Lost packets (buffer overflow at routers)
– Long delays (queueing in router buffers)
• Top-10 problem!
60
Causes/Costs of Congestion:
Scenario 1
• Two senders, two
receivers
• One router, infinite
buffers
• No retransmission
• Large delays
when congested
• Maximum
achievable
throughput
61
Causes/Costs of Congestion:
Scenario 2
• One router, finite buffers
• Sender retransmission of lost packet
62
Approaches Towards Congestion Control
Two broad approaches towards congestion control:
End-end congestion
control:
Network-assisted
congestion control:
• No explicit feedback from
network
• Congestion inferred from endsystem observed loss, delay
• Approach taken by TCP
• Routers provide feedback to
end systems
– Single bit indicating
congestion (SNA, DECbit,
TCP/IP ECN, ATM)
– Explicit rate sender should
send at
63
TCP Congestion Control (1)
• End-end control (no network assistance)
• Transmission rate limited by congestion window size, cwnd,
over segments:
cwnd
w segments, each with MSS bytes sent in one RTT:
64
TCP Congestion Control (2)
• “Probing” for usable
bandwidth:
– Ideally: transmit as fast as
possible (cwnd as large as
possible) without loss
– Increase cwnd until loss
(congestion)
– Loss: decrease cwnd, then
begin probing (increasing)
again
• Two “phases”
– Slow start
– Congestion avoidance
•
Important variables:
– cwnd
– ssthresh: defines
threshold between two slow
start phase, congestion
control phase
65
TCP Slow Start
Host B
RTT
Host A
• Exponential increase (per RTT)
in window size (not so slow!)
• Loss event: timeout (Tahoe
TCP) and/or or three duplicate
ACKs (Reno TCP)
time
66
TCP Congestion Avoidance
* TCP Reno skips slow start (fast
recovery) after three duplicate ACKs
67
TCP Congestion Control FSM
New ACK!
New ACK!
new ACK
cwnd = cwnd + MSS
(MSS/cwnd)
dupACKcount = 0
.
duplicate ACK
dupACKcount++
Λ
cwnd = 1 MSS
ssthresh = 64 KB
dupACKcount = 0
new ACK
cwnd = cwnd+MSS
dupACKcount = 0
transmit new segment(s), as allowed
transmit new segment(s), as allowed
Slow
start
timeout
ssthresh = cwnd/2
cwnd = 1 MSS
dupACKcount = 0
retransmit missing segment
dupACKcount == 3
cwnd > ssthresh
Congestion
avoidance
Λ
timeout
ssthresh = cwnd/2
cwnd = 1 MSS
dupACKcount = 0
duplicate ACK
dupACKcount++
retransmit missing segment
New ACK!
timeout
ssthresh = cwnd/2
cwnd = 1
dupACKcount = 0
New ACK
retransmit missing segment
cwnd = ssthresh
dupACKcount = 0
ssthresh= cwnd/2
cwnd = ssthresh + 3
ssthresh= cwnd/2
cwnd = ssthresh + 3
retransmit missing segment
dupACKcount == 3
transmit new segment(s), as allowed
Fast
recovery
duplicate ACK
cwnd = cwnd + MSS
transmit new segment(s), as allowed
AIMD
TCP congestion
avoidance:
• AIMD: Additive
increase, multiplicative
decrease
– Increase window by 1
per RTT
– Decrease window by
factor of 2 on loss event
TCP Fairness
Fairness goal: if N TCP
sessions share same
bottleneck link, each
should get 1/N of link
capacity
TCP connection 1
TCP
connection 2
Bottleneck
router
capacity C
69
Why is TCP Fair?
Two competing sessions:
• Additive increase gives slope of 1, as throughout increases
• Multiplicative decrease decreases throughput proportionally
R
Equal bandwidth share
Loss: decrease window by factor of 2
Congestion avoidance: additive increase
Loss: decrease window by factor of 2
Congestion avoidance: additive increase
Connection 1 throughput
R
70
Summary
• Principles behind transport
layer services:
–
–
–
–
Multiplexing/demultiplexing
Reliable data transfer
Flow control
Congestion control
Next:
• Leaving the network “edge”
(application transport layer)
• Entering the network “core”
• Instantiation and
implementation in the Internet
– UDP
– TCP
71