3rd Edition: Chapter 3

Download Report

Transcript 3rd Edition: Chapter 3

What’s the Internet: “nuts and bolts” view
 millions of connected
computing devices: hosts
= end systems

examples of hosts?
router
workstation
server
mobile
local ISP
 running network apps
 examples of applications?
regional ISP
company
network
Transport Layer
3-1
What’s the Internet: “nuts and bolts” view
 communication links
 fiber, copper, radio,
satellite
 transmission rate =
bandwidth
• typical bandwidth for
modem? wireless?
router
workstation
server
mobile
local ISP
regional ISP
 routers: forward packets
(chunks of data)

what’s in a packet?
company
network
Transport Layer
3-2
What’s the Internet: “nuts and bolts” view
 protocols control sending,
receiving of msgs

e.g., TCP, IP, HTTP, FTP, PPP
 Internet: “network of networks”


loosely hierarchical
public Internet versus private
intranet
router
server

mobile
local ISP
 Internet standards

workstation
regional ISP
RFC: Request for comments
IETF: Internet Engineering
Task Force
company
network
Transport Layer
3-3
What’s the Internet: a service view
 communication infrastructure
enables distributed
applications:

Web, email, other examples?
 communication services
provided to apps:

connection-oriented reliable
• example apps?

Connectionless unreliable
• example apps?
Transport Layer
3-4
What’s a protocol?
human protocols:
 “what’s the time?”
 “I have a question”
 introductions
network protocols:
 machines rather than
humans
 all communication activity
in Internet governed by
protocols
protocols define format, order of msgs
sent and received among network
entities, and actions taken on msg
transmission, receipt
Transport Layer
3-5
What’s
a
protocol?
a human protocol and a computer network protocol:
Hi
TCP connection
req
Hi
TCP connection
response
Got the
time?
Get http://www.awl.com/kurose-ross
2:00
<file>
time
Q: Why are protocols so important?
Transport Layer
3-6
A closer look at network structure:
 network edge: applications
and hosts
 network core:


routers
network of networks
 access networks, physical
media: communication links
Transport Layer
3-7
Protocol “Layers”
Networks are complex!
 many “pieces”:
 hosts
 routers
 links of various media
 applications
 protocols
 hardware, software
Question:
Is there any hope of organizing
structure of network?
Or at least our discussion of
networks?
Transport Layer
3-8
Organization of air travel
ticket (purchase)
ticket (complain)
baggage (check)
baggage (claim)
gates (load)
gates (unload)
runway takeoff
runway landing
airplane routing
airplane routing
airplane routing
 a series of steps
Transport Layer
3-9
Layering of airline functionality
ticket (purchase)
ticket (complain)
ticket
baggage (check)
baggage (claim
baggage
gates (load)
gates (unload)
gate
runway (takeoff)
runway (land)
takeoff/landing
airplane routing
airplane routing
airplane routing
departure
airport
airplane routing
airplane routing
intermediate air-traffic
control centers
arrival
airport
Layers: each layer implements a service
 via its own internal-layer actions
 relying on services provided by layer below
Transport Layer 3-10
Why layering?
Dealing with complex systems:
 explicit structure allows identification, relationship of
complex system’s pieces
 layered reference model for discussion
 modularization eases maintenance, updating of system
 change of implementation of layer’s service transparent
to rest of system
 e.g., change in gate procedure doesn’t affect rest of
system
 layering considered harmful?
Transport Layer
3-11
Internet protocol stack
 application: supporting network applications
 FTP, SMTP, STTP
 transport: host-host data transfer
 TCP, UDP
 network: routing of datagrams from source to
destination

transport
IP, routing protocols
 link: data transfer between neighboring
network elements

application
PPP, Ethernet
 physical: bits “on the wire”
network
link
physical
Transport Layer 3-12
source
message
segment Ht
datagram Hn Ht
frame
Hl Hn Ht
M
M
M
M
Encapsulation
application
transport
network
link
physical
Hl Hn Ht
M
link
physical
Hl Hn Ht
M
switch
destination
M
Ht
M
Hn Ht
Hl Hn Ht
M
M
application
transport
network
link
physical
Hn Ht
Hl Hn Ht
M
M
network
link
physical
Hn Ht
Hl Hn Ht
M
M
router
Transport Layer 3-13
Internet transport protocols services
TCP service:
 connection-oriented: setup




required between client and
server processes
reliable transport between
sending and receiving process
flow control: sender won’t
overwhelm receiver
congestion control: throttle
sender when network overloaded
does not provide: timing,
minimum bandwidth guarantees
UDP service:
 unreliable data transfer
between sending and
receiving process
 does not provide: connection
setup, reliability, flow
control, congestion control,
timing, or bandwidth
guarantee
Q: why bother? Why is there a
UDP?
Transport Layer 3-14
Transport vs. network layer
 network layer: logical
communication between
hosts
 transport layer: logical
communication between
processes

relies on, enhances,
network layer services
Household analogy:
12 kids sending letters to 12 kids
 processes = kids
 app messages = letters in
envelopes
 hosts = houses
 transport protocol = Ann and
Bill
 network-layer protocol =
postal service
Transport Layer 3-15
Reliable data transfer: getting started
We’ll:
 incrementally develop sender, receiver sides of
reliable data transfer protocol (rdt)
 consider only unidirectional data transfer

but control info will flow on both directions!
 use finite state machines (FSM) to specify
sender, receiver
state: when in this
“state” next state
uniquely determined
by next event
state
1
event causing state transition
actions taken on state transition
event
actions
state
2
Transport Layer 3-16
Rdt1.0: reliable transfer over a reliable channel
 underlying channel perfectly reliable
 no bit errors
 no loss of packets
 separate FSMs for sender, receiver:
 sender sends data into underlying channel
 receiver read data from underlying channel
Wait for
call from
above
rdt_send(data)
packet = make_pkt(data)
udt_send(packet)
sender
Wait for
call from
below
rdt_rcv(packet)
extract (packet,data)
deliver_data(data)
receiver
Transport Layer 3-17
Rdt2.0: channel with bit errors
 underlying channel may flip bits in packet
 checksum to detect bit errors
 the question: how to recover from errors:
 acknowledgements (ACKs): receiver explicitly tells sender
that pkt received OK
 negative acknowledgements (NAKs): receiver explicitly
tells sender that pkt had errors
 sender retransmits pkt on receipt of NAK
 new mechanisms in rdt2.0 (beyond rdt1.0):


error detection
receiver feedback: control msgs (ACK,NAK) rcvr->sender
Transport Layer 3-18
rdt2.0: FSM specification
rdt_send(data)
snkpkt = make_pkt(data, checksum)
udt_send(sndpkt)
rdt_rcv(rcvpkt) &&
isNAK(rcvpkt)
Wait for
Wait for
call from
ACK or
udt_send(sndpkt)
above
NAK
rdt_rcv(rcvpkt) && isACK(rcvpkt)
L
sender
receiver
rdt_rcv(rcvpkt) &&
corrupt(rcvpkt)
udt_send(NAK)
Wait for
call from
below
rdt_rcv(rcvpkt) &&
notcorrupt(rcvpkt)
extract(rcvpkt,data)
deliver_data(data)
udt_send(ACK)
Transport Layer 3-19
rdt2.0: operation with no errors
rdt_send(data)
snkpkt = make_pkt(data, checksum)
udt_send(sndpkt)
rdt_rcv(rcvpkt) &&
isNAK(rcvpkt)
Wait for
Wait for
call from
ACK or
udt_send(sndpkt)
above
NAK
rdt_rcv(rcvpkt) && isACK(rcvpkt)
L
rdt_rcv(rcvpkt) &&
corrupt(rcvpkt)
udt_send(NAK)
Wait for
call from
below
rdt_rcv(rcvpkt) &&
notcorrupt(rcvpkt)
extract(rcvpkt,data)
deliver_data(data)
udt_send(ACK)
Transport Layer 3-20
rdt2.0: error scenario
rdt_send(data)
snkpkt = make_pkt(data, checksum)
udt_send(sndpkt)
rdt_rcv(rcvpkt) &&
isNAK(rcvpkt)
Wait for
Wait for
call from
ACK or
udt_send(sndpkt)
above
NAK
rdt_rcv(rcvpkt) && isACK(rcvpkt)
L
rdt_rcv(rcvpkt) &&
corrupt(rcvpkt)
udt_send(NAK)
Wait for
call from
below
rdt_rcv(rcvpkt) &&
notcorrupt(rcvpkt)
extract(rcvpkt,data)
deliver_data(data)
udt_send(ACK)
Transport Layer 3-21
rdt2.0 has a fatal flaw!
 Stop and Wait

Sender sends one packet, then waits for receiver response
 What happens if ACK/NAK corrupted?


sender doesn’t know what happened at receiver!
can’t just retransmit: possible duplicate
 Handling duplicates:



sender adds sequence number to each pkt
sender retransmits current pkt if ACK/NAK garbled
receiver discards (doesn’t deliver up) duplicate pkt
Transport Layer 3-22
rdt2.1: sender, handles garbled ACK/NAKs
rdt_send(data)
sndpkt = make_pkt(0, data, checksum)
udt_send(sndpkt)
rdt_rcv(rcvpkt) &&
( corrupt(rcvpkt) ||
Wait
for
Wait for
isNAK(rcvpkt) )
ACK or
call 0 from
udt_send(sndpkt)
NAK 0
above
rdt_rcv(rcvpkt)
&& notcorrupt(rcvpkt)
&& isACK(rcvpkt)
rdt_rcv(rcvpkt)
&& notcorrupt(rcvpkt)
&& isACK(rcvpkt)
L
rdt_rcv(rcvpkt) &&
( corrupt(rcvpkt) ||
isNAK(rcvpkt) )
udt_send(sndpkt)
L
Wait for
ACK or
NAK 1
Wait for
call 1 from
above
rdt_send(data)
sndpkt = make_pkt(1, data, checksum)
udt_send(sndpkt)
Transport Layer 3-23
rdt2.1: receiver, handles garbled ACK/NAKs
rdt_rcv(rcvpkt) && notcorrupt(rcvpkt)
&& has_seq0(rcvpkt)
rdt_rcv(rcvpkt) && (corrupt(rcvpkt)
extract(rcvpkt,data)
deliver_data(data)
sndpkt = make_pkt(ACK, chksum)
udt_send(sndpkt)
rdt_rcv(rcvpkt) && (corrupt(rcvpkt)
sndpkt = make_pkt(NAK, chksum)
udt_send(sndpkt)
rdt_rcv(rcvpkt) &&
not corrupt(rcvpkt) &&
has_seq1(rcvpkt)
sndpkt = make_pkt(ACK, chksum)
udt_send(sndpkt)
sndpkt = make_pkt(NAK, chksum)
udt_send(sndpkt)
Wait for
0 from
below
Wait for
1 from
below
rdt_rcv(rcvpkt) && notcorrupt(rcvpkt)
&& has_seq1(rcvpkt)
rdt_rcv(rcvpkt) &&
not corrupt(rcvpkt) &&
has_seq0(rcvpkt)
sndpkt = make_pkt(ACK, chksum)
udt_send(sndpkt)
extract(rcvpkt,data)
deliver_data(data)
sndpkt = make_pkt(ACK, chksum)
udt_send(sndpkt)
Transport Layer 3-24
rdt2.1: discussion
Sender:
 seq # added to pkt
 two seq. #’s (0,1) will
suffice. Why?
 must check if received
ACK/NAK corrupted
 twice as many states

state must “remember”
whether “current” pkt
has 0 or 1 seq. #
Receiver:
 must check if received
packet is duplicate

state indicates whether
0 or 1 is expected pkt
seq #
 note: receiver can not
know if its last
ACK/NAK received OK
at sender
Transport Layer 3-25
rdt2.2: a NAK-free protocol
 same functionality as rdt2.1, using ACKs only
 instead of NAK, receiver sends ACK for last pkt
received OK

receiver must explicitly include seq # of pkt being ACKed
 duplicate ACK at sender results in same action as
NAK: retransmit current pkt
Transport Layer 3-26
rdt2.2: sender, receiver fragments
rdt_send(data)
sndpkt = make_pkt(0, data, checksum)
udt_send(sndpkt)
rdt_rcv(rcvpkt) &&
( corrupt(rcvpkt) ||
Wait for
Wait for
isACK(rcvpkt,1) )
ACK
call 0 from
0
udt_send(sndpkt)
above
sender FSM
fragment
rdt_rcv(rcvpkt) &&
(corrupt(rcvpkt) ||
has_seq1(rcvpkt))
udt_send(sndpkt)
Wait for
0 from
below
rdt_rcv(rcvpkt)
&& notcorrupt(rcvpkt)
&& isACK(rcvpkt,0)
receiver FSM
fragment
L
rdt_rcv(rcvpkt) && notcorrupt(rcvpkt)
&& has_seq1(rcvpkt)
extract(rcvpkt,data)
deliver_data(data)
sndpkt = make_pkt(ACK1, chksum)
udt_send(sndpkt)
Transport Layer 3-27
rdt3.0: channels with errors and loss
New assumption:
underlying channel can
also lose packets (data
or ACKs)

checksum, seq. #, ACKs,
retransmissions will be
of help, but not enough
Approach: sender waits
“reasonable” amount of time
for ACK
 retransmits if no ACK received in
this time
 if pkt (or ACK) just delayed (not
lost):
 retransmission will be
duplicate, but use of seq. #’s
already handles this
 receiver must specify seq #
of pkt being ACKed
 requires countdown timer
Transport Layer 3-28
rdt3.0 sender
rdt_send(data)
sndpkt = make_pkt(0, data, checksum)
udt_send(sndpkt)
start_timer
rdt_rcv(rcvpkt)
L
rdt_rcv(rcvpkt)
&& notcorrupt(rcvpkt)
&& isACK(rcvpkt,1)
rdt_rcv(rcvpkt) &&
( corrupt(rcvpkt) ||
isACK(rcvpkt,0) )
timeout
udt_send(sndpkt)
start_timer
rdt_rcv(rcvpkt)
&& notcorrupt(rcvpkt)
&& isACK(rcvpkt,0)
stop_timer
stop_timer
timeout
udt_send(sndpkt)
start_timer
L
Wait
for
ACK0
Wait for
call 0from
above
L
rdt_rcv(rcvpkt) &&
( corrupt(rcvpkt) ||
isACK(rcvpkt,1) )
Wait
for
ACK1
Wait for
call 1 from
above
rdt_send(data)
rdt_rcv(rcvpkt)
L
sndpkt = make_pkt(1, data, checksum)
udt_send(sndpkt)
start_timer
Transport Layer 3-29
rdt3.0 in action
Transport Layer 3-30
rdt3.0 in action
Transport Layer 3-31
Performance of rdt3.0
 rdt3.0 works, but performance stinks
 Why?
Transport Layer 3-32
rdt3.0: stop-and-wait operation
sender
receiver
first packet bit transmitted, t = 0
last packet bit transmitted, t = L / R
RTT
first packet bit arrives
last packet bit arrives, send ACK
ACK arrives, send next
packet, t = RTT + L / R
Transport Layer 3-33
Pipelined protocols
Pipelining: sender allows multiple, “in-flight”, yet-to-beacknowledged pkts


range of sequence numbers must be increased
buffering at sender and/or receiver
 Two generic forms of pipelined protocols: go-Back-N, selective
repeat
Transport Layer 3-34
Pipelining: increased utilization
sender
receiver
first packet bit transmitted, t = 0
last bit transmitted, t = L / R
RTT
first packet bit arrives
last packet bit arrives, send ACK
last bit of 2nd packet arrives, send ACK
last bit of 3rd packet arrives, send ACK
ACK arrives, send next
packet, t = RTT + L / R
Transport Layer 3-35
Go-Back-N
Sender:
 k-bit seq # in pkt header
 “window” of up to N, consecutive unack’ed pkts allowed
 ACK(n): ACKs all pkts up to, including seq # n - “cumulative ACK”
may deceive duplicate ACKs (see receiver)
 timer for each in-flight pkt
 timeout(n): retransmit pkt n and all higher seq # pkts in window

Transport Layer 3-36
GBN: receiver
ACK-only: always send ACK for correctlyreceived pkt with highest in-order seq #
may generate duplicate ACKs
 need only remember expectedseqnum

 out-of-order pkt:
 discard (don’t buffer) -> isn't this bad?
 Re-ACK pkt with highest in-order seq #
Transport Layer 3-37
GBN in
action
Transport Layer 3-38
Selective Repeat
 receiver individually acknowledges all correctly
received pkts

buffers pkts, as needed, for eventual in-order delivery
to upper layer
 sender only resends pkts for which ACK not
received

sender timer for each unACKed pkt
 sender window
 N consecutive seq #’s
 again limits seq #s of sent, unACKed pkts
Transport Layer 3-39
Selective repeat: sender, receiver windows
Transport Layer 3-40
Selective repeat
sender
data from above :
receiver
pkt n in
[rcvbase, rcvbase+N-1]
 if next available seq # in
 send ACK(n)
timeout(n):
 in-order: deliver (also deliver
window, send pkt
 resend pkt n, restart timer
ACK(n) in [sendbase,sendbase+N]:
 mark pkt n as received
 if n smallest unACKed pkt,
advance window base to
next unACKed seq #
 out-of-order: buffer
buffered, in-order pkts),
advance window to next notyet-received pkt
pkt n in
[rcvbase-N,rcvbase-1]
 ACK(n)
otherwise:
 ignore
Transport Layer 3-41
Selective repeat in action
Transport Layer 3-42
Selective repeat:
dilemma
Example:
 seq #’s: 0, 1, 2, 3
 window size=3
 receiver sees no
difference in two
scenarios!
 incorrectly passes
duplicate data as new in
(a)
Q: what relationship
between seq # size and
window size?
Transport Layer 3-43
TCP: Overview
 full duplex data:
 point-to-point:

RFCs: 793, 1122, 1323, 2018, 2581
one sender, one receiver

 reliable, in-order byte

steam:

no “message boundaries”
 pipelined:

 connection-oriented:

TCP congestion and flow
control set window size
 send & receive buffers
socket
door
application
reads data
TCP
send buffer
TCP
receive buffer
handshaking (exchange of
control msgs) init’s sender,
receiver state before data
exchange
 flow controlled:

application
writes data
bi-directional data flow in
same connection
MSS: maximum segment
size
sender will not overwhelm
receiver
socket
door
segment
Transport Layer 3-44
TCP segment structure
32 bits
URG: urgent data
(generally not used)
ACK: ACK #
valid
PSH: push data now
(generally not used)
RST, SYN, FIN:
connection estab
(setup, teardown
commands)
Internet
checksum
(as in UDP)
source port #
dest port #
sequence number
acknowledgement number
head not
UA P R S F
len used
checksum
Receive window
Urg data pnter
Options (variable length)
counting
by bytes
of data
(not segments!)
# bytes
rcvr willing
to accept
application
data
(variable length)
Transport Layer 3-45
TCP seq. #’s and ACKs
Seq. #’s:
 byte stream “number”
of first byte in
segment’s data
ACKs:
 seq # of next byte
expected from other
side
 cumulative ACK
 piggybacking
Q: how receiver handles out-oforder segments
 A: TCP spec doesn’t say,
- up to implementor
Host A
User
types
‘C’
Host B
host ACKs
receipt of
‘C’, echoes
back ‘C’
host ACKs
receipt
of echoed
‘C’
simple telnet scenario
time
Transport Layer 3-46
TCP Round Trip Time and Timeout
Q: how to set TCP
timeout value?
 longer than RTT

but RTT varies
 too short: premature
timeout
 unnecessary
retransmissions
 too long: slow reaction
to segment loss
Q: how to estimate RTT?
 SampleRTT: measured time from
segment transmission until ACK
receipt
 ignore retransmissions
 SampleRTT will vary, want
estimated RTT “smoother”
 average several recent
measurements, not just
current SampleRTT
Transport Layer 3-47
Example RTT estimation:
RTT: gaia.cs.umass.edu to fantasia.eurecom.fr
350
RTT (milliseconds)
300
250
200
150
100
1
8
15
22
29
36
43
50
57
64
71
78
85
92
99
106
time (seconnds)
SampleRTT
Estimated RTT
Transport Layer 3-48
TCP reliable data transfer
 TCP creates rdt
service on top of IP’s
unreliable service
 Pipelined segments
 Cumulative acks
 TCP uses single
retransmission timer
 Retransmissions are
triggered by:


timeout events
duplicate acks
 Initially consider
simplified TCP sender:


ignore duplicate acks
ignore flow control,
congestion control
Transport Layer 3-49
TCP sender events:
data rcvd from app:
 Create segment with
seq #
 seq # is byte-stream
number of first data
byte in segment
 start timer if not
already running (think
of timer as for oldest
unacked segment)
 expiration interval:
TimeOutInterval
timeout:
 retransmit segment
that caused timeout
 restart timer
Ack rcvd:
 If acknowledges
previously unacked
segments


update what is known to
be acked
start timer if there are
outstanding segments
Transport Layer 3-50
NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum
loop (forever) {
switch(event)
event: data received from application above
create TCP segment with sequence number NextSeqNum
if (timer currently not running)
start timer
pass segment to IP
NextSeqNum = NextSeqNum + length(data)
event: timer timeout
retransmit not-yet-acknowledged segment with
smallest sequence number
start timer
event: ACK received, with ACK field value of y
if (y > SendBase) {
SendBase = y
if (there are currently not-yet-acknowledged segments)
start timer
}
} /* end of loop forever */
TCP
sender
(simplified)
Comment:
• SendBase-1: last
cumulatively
ack’ed byte
Example:
• SendBase-1 = 71;
y= 73, so the rcvr
wants 73+ ;
y > SendBase, so
that new data is
acked
Transport Layer 3-51
TCP: retransmission scenarios
Host A
X
loss
Sendbase
= 100
SendBase
= 120
SendBase
= 100
time
SendBase
= 120
lost ACK scenario
Host B
Seq=92 timeout
Host B
Seq=92 timeout
timeout
Host A
time
premature timeout
Transport Layer 3-52
TCP retransmission scenarios (more)
timeout
Host A
Host B
X
loss
SendBase
= 120
time
Cumulative ACK scenario
Transport Layer 3-53
TCP ACK generation
[RFC 1122, RFC 2581]
Event at Receiver
TCP Receiver action
Arrival of in-order segment with
expected seq #. All data up to
expected seq # already ACKed
Delayed ACK. Wait up to 500ms
for next segment. If no next segment,
send ACK
Arrival of in-order segment with
expected seq #. One other
segment has ACK pending
Immediately send single cumulative
ACK, ACKing both in-order segments
Arrival of out-of-order segment
higher-than-expect seq. # .
Gap detected
Immediately send duplicate ACK,
indicating seq. # of next expected byte
Arrival of segment that
partially or completely fills gap
Immediate send ACK, provided that
segment startsat lower end of gap
Transport Layer 3-54
Fast Retransmit
 Time-out period often
relatively long:

long delay before
resending lost packet
 Detect lost segments
via duplicate ACKs.


Sender often sends
many segments back-toback
If segment is lost,
there will likely be many
duplicate ACKs.
 If sender receives 3
ACKs for the same
data, it supposes that
segment after ACKed
data was lost:

fast retransmit: resend
segment before timer
expires
Transport Layer 3-55
TCP Flow Control
 receive side of TCP
connection has a
receive buffer:
flow control
sender won’t overflow
receiver’s buffer by
transmitting too much,
too fast
 speed-matching
 app process may be
service: matching the
send rate to the
receiving app’s drain
rate
slow at reading from
buffer
Transport Layer 3-56
TCP Flow control: how it works
 Rcvr advertises spare
(Suppose TCP receiver
discards out-of-order
segments)
 spare room in buffer
room by including value
of RcvWindow in
segments
 Sender limits unACKed
data to RcvWindow

guarantees receive
buffer doesn’t overflow
= RcvWindow
= RcvBuffer-[LastByteRcvd LastByteRead]
Transport Layer 3-57
TCP Connection Management
Recall: TCP sender, receiver establish “connection” before
exchanging data segments
 initialize TCP variables:
 seq. #s
 buffers, flow control info (e.g. RcvWindow)
 client: connection initiator
Socket clientSocket = new
Socket("hostname","port
number");
 server: contacted by client
Socket connectionSocket = welcomeSocket.accept();
Transport Layer 3-58
TCP Connection Management
Three way handshake:
Step 1: client host sends TCP SYN segment to server
 specifies initial seq #
 no data
Step 2: server host receives SYN, replies with SYNACK
segment
server allocates buffers
 specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment,
which may contain data

Transport Layer 3-59
TCP Connection Management (cont.)
Closing a connection:
client closes socket:
clientSocket.close();
client
server
close
Step 1: client end system sends
TCP FIN control segment to
server
close
replies with ACK. Closes
connection, sends FIN.
timed wait
Step 2: server receives FIN,
closed
Transport Layer 3-60
TCP Connection Management (cont.)
Step 3: client receives FIN,
replies with ACK.

Enters “timed wait” - will
respond with ACK to
received FINs
client
server
closing
closing
Step 4: server, receives ACK.
Note: with small modification,
can handle simultaneous FINs.
timed wait
Connection closed.
closed
closed
Transport Layer 3-61
TCP Congestion Control
 end-end control (no network
assistance)
 sender limits transmission:
LastByteSent-LastByteAcked
 CongWin
 Roughly,
 CongWin is dynamic,
function of
CongWin
rate = network congestion
Bytes/sec
perceived
RTT
How does sender perceive
congestion?
 loss event = timeout or 3
duplicate acks
 TCP sender reduces rate
(CongWin) after loss event
three mechanisms:



AIMD
slow start
conservative after timeout
events
Transport Layer 3-62
TCP AIMD
multiplicative decrease:
cut CongWin in half
after loss event
congestion
window
additive increase:
increase CongWin by
1 MSS every RTT in
the absence of loss
events: probing
24 Kbytes
16 Kbytes
8 Kbytes
time
Long-lived TCP connection
Transport Layer 3-63
TCP Slow Start
 When connection begins,
CongWin = 1 MSS


Example: MSS = 500
bytes & RTT = 200 msec
initial rate = 20 kbps
 When connection begins,
increase rate
exponentially fast until
first loss event
 available bandwidth may
be >> MSS/RTT

desirable to quickly ramp
up to respectable rate
Transport Layer 3-64
TCP Slow Start (more)
 When connection


Host B
RTT
begins, increase rate
exponentially until
first loss event:
Host A
double CongWin every
RTT
done by incrementing
CongWin for every ACK
received
 Summary: initial rate
is slow but ramps up
exponentially fast
time
Transport Layer 3-65
Refinement
 After 3 dup ACKs:
CongWin is cut in half
 window then grows linearly
 But after timeout event:
 CongWin instead set to 1
MSS;
 window then grows
exponentially
 to a threshold, then grows
linearly

Philosophy:
• 3 dup ACKs indicates
network capable of
delivering some segments
• timeout before 3 dup
ACKs is “more alarming”
Transport Layer 3-66
Refinement (more)
Q: When should the
exponential increase
switch to linear?
A: When CongWin gets
to 1/2 of its value
before timeout.
Implementation:
 Variable Threshold
 At loss event, Threshold is set
to 1/2 of CongWin just before
loss event
Transport Layer 3-67
Summary: TCP Congestion Control
 When CongWin is below Threshold, sender in slow-start
phase, window grows exponentially.
 When CongWin is above Threshold, sender is in congestion-
avoidance phase, window grows linearly.
 When a triple duplicate ACK occurs, Threshold set to
CongWin/2 and CongWin set to Threshold.
 When timeout occurs, Threshold set to CongWin/2 and
CongWin is set to 1 MSS.
Transport Layer 3-68
TCP sender congestion control
Event
State
TCP Sender Action
Commentary
ACK receipt
for previously
unacked
data
Slow Start
(SS)
CongWin = CongWin + MSS,
If (CongWin > Threshold)
set state to “Congestion
Avoidance”
Resulting in a doubling of
CongWin every RTT
ACK receipt
for previously
unacked
data
Congestion
Avoidance
(CA)
CongWin = CongWin+MSS *
(MSS/CongWin)
Additive increase, resulting
in increase of CongWin by
1 MSS every RTT
Loss event
detected by
triple
duplicate
ACK
SS or CA
Threshold = CongWin/2,
CongWin = Threshold,
Set state to “Congestion
Avoidance”
Fast recovery,
implementing multiplicative
decrease. CongWin will not
drop below 1 MSS.
Timeout
SS or CA
Threshold = CongWin/2,
CongWin = 1 MSS,
Set state to “Slow Start”
Enter slow start
Duplicate
ACK
SS or CA
Increment duplicate ACK count
for segment being acked
CongWin and Threshold not
changed
Transport Layer 3-69
Interplay between routing and
forwarding
routing algorithm
local forwarding table
header value output link
0100
0101
0111
1001
3
2
2
1
value in arriving
packet’s header
0111
1
3 2
Transport Layer 3-70
Graph abstraction
5
2
u
2
1
Graph: G = (N,E)
v
x
3
w
3
1
5
z
1
y
2
N = set of routers = { u, v, w, x, y, z }
E = set of links ={ (u,v), (u,x), (v,x), (v,w), (x,w), (x,y), (w,y), (w,z), (y,z) }
Remark: Graph abstraction is useful in other network contexts
Example: P2P, where N is set of peers and E is set of TCP connections
Transport Layer 3-71
Graph abstraction: costs
5
2
u
v
2
1
x
• c(x,x’) = cost of link (x,x’)
3
w
3
1
5
z
1
y
- e.g., c(w,z) = 5
2
• cost could always be 1, or
inversely related to bandwidth,
or inversely related to
congestion
Cost of path (x1, x2, x3,…, xp) = c(x1,x2) + c(x2,x3) + … + c(xp-1,xp)
Question: What’s the least-cost path between u and z ?
Routing algorithm: algorithm that finds least-cost path
Transport Layer 3-72
Routing Algorithm classification
Global or decentralized
information?
Global:
 all routers have complete
topology, link cost info
 “link state” algorithms
Decentralized:
 router knows physicallyconnected neighbors, link costs
to neighbors
 iterative process of
computation, exchange of info
with neighbors
 “distance vector” algorithms
Static or dynamic?
Static:
 routes change slowly over
time
Dynamic:
 routes change more quickly
 periodic update
 in response to link cost
changes
Transport Layer 3-73
A Link-State Routing Algorithm
Dijkstra’s algorithm
 net topology, link costs known
to all nodes
 accomplished via “link
state broadcast”
 all nodes have same info
 computes least cost paths
from one node (‘source”) to all
other nodes
 gives forwarding table for
that node
 iterative: after k iterations,
know least cost path to k
dest.’s
Notation:
 c(x,y): link cost from node x
to y; = ∞ if not direct
neighbors
 D(v): current value of cost of
path from source to dest. v
 p(v): predecessor node along
path from source to v
 N': set of nodes whose least
cost path definitively known
Transport Layer 3-74
Dijsktra’s Algorithm
1 Initialization:
2 N' = {u}
3 for all nodes v
4
if v adjacent to u
5
then D(v) = c(u,v)
6
else D(v) = ∞
7
8 Loop
9 find w not in N' such that D(w) is a minimum
10 add w to N'
11 update D(v) for all v adjacent to w and not in N' :
12
D(v) = min( D(v), D(w) + c(w,v) )
13 /* new cost to v is either old cost to v or known
14 shortest path cost to w plus cost from w to v */
15 until all nodes in N'
Transport Layer 3-75
Dijkstra’s algorithm: example
Step
0
1
2
3
4
5
N'
u
ux
uxy
uxyv
uxyvw
uxyvwz
D(v),p(v) D(w),p(w)
2,u
5,u
2,u
4,x
2,u
3,y
3,y
D(x),p(x)
1,u
D(y),p(y)
∞
2,x
D(z),p(z)
∞
∞
4,y
4,y
4,y
5
2
u
v
2
1
x
3
w
3
1
5
z
1
y
2
Transport Layer 3-76
Dijkstra’s algorithm, discussion
Algorithm complexity: n nodes
 each iteration: need to check all nodes, w, not in N
 n(n+1)/2 comparisons: O(n2)
 more efficient implementations possible: O(nlogn)
Oscillations possible:
 e.g., link cost = amount of carried traffic
D
1
1
0
A
0 0
C
e
1+e
e
initially
B
1
2+e
A
0
D 1+e 1 B
0
0
C
… recompute
routing
0
D
1
A
0 0
C
2+e
B
1+e
… recompute
2+e
A
0
D 1+e 1 B
e
0
C
… recompute
Transport Layer 3-77
Distance Vector Algorithm (1)
Bellman-Ford Equation (dynamic programming)
Define
dx(y) := cost of least-cost path from x to y
Then
dx(y) = min {c(x,v) + dv(y) }
where min is taken over all neighbors of x
Transport Layer 3-78
Bellman-Ford example (2)
5
2
u
v
2
1
x
3
w
3
1
5
z
1
y
Clearly, dv(z) = 5, dx(z) = 3, dw(z) = 3
2
B-F equation says:
du(z) = min { c(u,v) + dv(z),
c(u,x) + dx(z),
c(u,w) + dw(z) }
= min {2 + 5,
1 + 3,
5 + 3} = 4
Node that achieves minimum is next
hop in shortest path ➜ forwarding table
Transport Layer 3-79
Distance Vector Algorithm (3)
 Dx(y) = estimate of least cost from x to y
 Distance vector: Dx = [Dx(y): y є N ]
 Node x knows cost to each neighbor v: c(x,v)
 Node x maintains Dx = [Dx(y): y є N ]
 Node x also maintains its neighbors’ distance
vectors

For each neighbor v, x maintains
Dv = [Dv(y): y є N ]
Transport Layer 3-80
Distance vector algorithm (4)
Basic idea:
 Each node periodically sends its own distance
vector estimate to neighbors
 When node a node x receives new DV estimate
from neighbor, it updates its own DV using B-F
equation:
Dx(y) ← minv{c(x,v) + Dv(y)}
for each node y ∊ N
 Under minor, natural conditions, the estimate Dx(y)
converge the actual least cost dx(y)
Transport Layer 3-81
Distance Vector Algorithm (5)
Iterative, asynchronous: each
local iteration caused by:
 local link cost change
 DV update message from
neighbor
Distributed:
Each node:
wait for (change in local link
cost of msg from neighbor)
 each node notifies neighbors
only when its DV changes

neighbors then notify their
neighbors if necessary
recompute estimates
if DV to any dest has
changed, notify neighbors
Transport Layer 3-82
Dx(y) = min{c(x,y) + Dy(y), c(x,z) + Dz(y)}
= min{2+0 , 7+1} = 2
node x table
cost to
x y z
x ∞∞ ∞
y ∞∞ ∞
z 71 0
from
from
from
from
x 0 2 7
y 2 0 1
z 7 1 0
cost to
x y z
x 0 2 7
y 2 0 1
z 3 1 0
x 0 2 3
y 2 0 1
z 3 1 0
cost to
x y z
x 0 2 3
y 2 0 1
z 3 1 0
x
2
y
7
1
z
cost to
x y z
from
from
from
x ∞ ∞ ∞
y 2 0 1
z ∞∞ ∞
node z table
cost to
x y z
x 0 2 3
y 2 0 1
z 7 1 0
cost to
x y z
cost to
x y z
from
from
x 0 2 7
y ∞∞ ∞
z ∞∞ ∞
node y table
cost to
x y z
cost to
x y z
Dx(z) = min{c(x,y) +
Dy(z), c(x,z) + Dz(z)}
= min{2+1 , 7+0} = 3
x 0 2 3
y 2 0 1
z 3 1 0
time
Transport Layer 3-83
Distance Vector: link cost changes
Link cost changes:
 node detects local link cost change
 updates routing info, recalculates
distance vector
 if DV changes, notify neighbors
“good
news
travels
fast”
1
x
4
y
50
1
z
At time t0, y detects the link-cost change, updates its DV,
and informs its neighbors.
At time t1, z receives the update from y and updates its table.
It computes a new least cost to x and sends its neighbors its DV.
At time t2, y receives z’s update and updates its distance table.
y’s least costs do not change and hence y does not send any
message to z.
Transport Layer 3-84
Distance Vector: link cost changes
Link cost changes:
 good news travels fast
 bad news travels slow - “count
to infinity” problem!
 44 iterations before algorithm
stabilizes: see text
60
x
4
y
50
1
z
Poissoned reverse:
 If Z routes through Y to get
to X :

Z tells Y its (Z’s) distance to
X is infinite (so Y won’t route
to X via Z)
 will this completely solve
count to infinity problem?
Transport Layer 3-85
Comparison of LS and DV algorithms
Message complexity
 LS: with n nodes, E links, O(nE)
msgs sent
 DV: exchange between
neighbors only
 convergence time varies
Speed of Convergence
 LS: O(n2) algorithm requires
O(nE) msgs
 may have oscillations
 DV: convergence time varies
 may be routing loops
 count-to-infinity problem
Robustness: what happens if router
malfunctions?
LS:


DV:


node can advertise incorrect
link cost
each node computes only its own
table
DV node can advertise incorrect
path cost
each node’s table used by
others
• error propagate thru network
Transport Layer 3-86
Multiple Access Links and Protocols
Two types of “links”:
 point-to-point
 PPP for dial-up access
 point-to-point link between Ethernet switch and host
 broadcast (shared wire or medium)
 traditional Ethernet
 upstream HFC
 802.11 wireless LAN
Transport Layer 3-87
Multiple Access protocols
 single shared broadcast channel
 two or more simultaneous transmissions by nodes: interference

collision if node receives two or more signals at the same time
multiple access protocol
 distributed algorithm that determines how nodes share channel,
i.e., determine when node can transmit
 communication about channel sharing must use channel itself!

no out-of-band channel for coordination
Transport Layer 3-88
Ideal Mulitple Access Protocol
Broadcast channel of rate R bps
1. When one node wants to transmit, it can send at
rate R.
2. When M nodes want to transmit, each can send at
average rate R/M
3. Fully decentralized:


no special node to coordinate transmissions
no synchronization of clocks, slots
4. Simple
Transport Layer 3-89
MAC Protocols: a taxonomy
Three broad classes:
 Channel Partitioning


divide channel into smaller “pieces” (time slots,
frequency, code)
allocate piece to node for exclusive use
 Random Access
 channel not divided, allow collisions
 “recover” from collisions
 “Taking turns”
 Nodes take turns, but nodes with more to send can take
longer turns
Transport Layer 3-90
Channel Partitioning MAC protocols: TDMA
TDMA: time division multiple access
 access to channel in "rounds"
 each station gets fixed length slot (length = pkt trans time)
in each round
 unused slots go idle
 example: 6-station LAN, 1,3,4 have pkt, slots 2,5,6 idle
Transport Layer 3-91
Channel Partitioning MAC protocols: FDMA
FDMA: frequency division multiple access
 channel spectrum divided into frequency bands
 each station assigned fixed frequency band
 unused transmission time in frequency bands go idle
 example: 6-station LAN, 1,3,4 have pkt, frequency bands 2,5,6
frequency bands
idle
Transport Layer 3-92
Random Access Protocols
 When node has packet to send


transmit at full channel data rate R.
no a priori coordination among nodes
 two or more transmitting nodes ➜ “collision”,
 random access MAC protocol specifies:


how to detect collisions
how to recover from collisions (e.g., via delayed
retransmissions)
 Examples of random access MAC protocols:



slotted ALOHA
ALOHA
CSMA, CSMA/CD, CSMA/CA
Transport Layer 3-93
Slotted ALOHA
Assumptions
 all frames same size
 time is divided into equal
size slots, time to transmit 1
frame
 nodes start to transmit
frames only at beginning of
slots
 nodes are synchronized
 if 2 or more nodes transmit
in slot, all nodes detect
collision
Operation
 when node obtains fresh frame,
it transmits in next slot
 no collision, node can send new
frame in next slot
 if collision, node retransmits
frame in each subsequent slot
with prob. p until success
Transport Layer 3-94
Slotted ALOHA
Pros
 single active node can
continuously transmit at full
rate of channel
 highly decentralized: only
slots in nodes need to be in
sync
 simple
Cons
 collisions, wasting slots
 idle slots
 nodes may be able to
detect collision in less than
time to transmit packet
 clock synchronization
Transport Layer 3-95
Slotted Aloha efficiency
Efficiency is the long-run
fraction of successful slots
when there are many nodes, each
with many frames to send
Suppose N nodes with many
frames to send, each transmits in
slot with probability p
 prob that node 1 has success in a
slot
= p(1-p)N-1
 prob that any node has a success =

Np(1-p)N-1
 For max efficiency with N
nodes, find p* that
maximizes
Np(1-p)N-1
 For many nodes, take limit
of Np*(1-p*)N-1 as N goes
to infinity, gives 1/e = .37
At best: channel
used for useful
transmissions 37%
of time!
Transport Layer 3-96
Pure (unslotted) ALOHA
 unslotted Aloha: simpler, no synchronization
 when frame first arrives
 transmit immediately
 collision probability increases:
 frame sent at t0 collides with other frames sent in [t0-1,t0+1]
Transport Layer 3-97
CSMA (Carrier Sense Multiple Access)
CSMA: listen before transmit:
If channel sensed idle: transmit entire frame
 If channel sensed busy, defer transmission
 Human analogy: don’t interrupt others!
Transport Layer 3-98
CSMA collisions
spatial layout of nodes
collisions can still occur:
propagation delay means
two nodes may not hear
each other’s transmission
collision:
entire packet transmission
time wasted
note:
role of distance & propagation
delay in determining collision
probability
Transport Layer 3-99
CSMA/CD (Collision Detection)
CSMA/CD: carrier sensing, deferral as in CSMA
collisions detected within short time
 colliding transmissions aborted, reducing channel
wastage

 collision detection:
 easy in wired LANs: measure signal strengths,
compare transmitted, received signals
 difficult in wireless LANs: receiver shut off while
transmitting
 human analogy: the polite conversationalist
Transport Layer 3-100
CSMA/CD collision detection
Transport Layer 3-101
“Taking Turns” MAC protocols
channel partitioning MAC protocols:
 share channel efficiently and fairly at high load
 inefficient at low load: delay in channel access, 1/N
bandwidth allocated even if only 1 active node!
Random access MAC protocols
 efficient at low load: single node can fully utilize channel
 high load: collision overhead
“taking turns” protocols
look for best of both worlds!
Transport Layer 3-102
“Taking Turns” MAC protocols
Polling:
 master node
“invites” slave nodes
to transmit in turn
 concerns:



polling overhead
latency
single point of
failure (master)
Token passing:
 control token passed from one
node to next sequentially.
 token message
 concerns:



token overhead
latency
single point of failure (token)
Transport Layer 3-103
Ethernet uses CSMA/CD
 No slots
 adapter doesn’t transmit
if it senses that some
other adapter is
transmitting, that is,
carrier sense
 transmitting adapter
aborts when it senses
that another adapter is
transmitting, that is,
collision detection
 Before attempting a
retransmission,
adapter waits a
random time, that is,
random access
Transport Layer 3-104
Ethernet CSMA/CD algorithm
1. Adaptor receives datagram from net 4. If adapter detects another
layer & creates frame
transmission while transmitting,
2. If adapter senses channel idle, it
aborts and sends jam signal
starts to transmit frame. If it
5. After aborting, adapter enters
senses channel busy, waits until
exponential backoff: after the
channel idle and then transmits
mth collision, adapter chooses a
3. If adapter transmits entire frame
K at random from
without detecting another
{0,1,2,…,2m-1}. Adapter waits
transmission, the adapter is done
K·512 bit times and returns to
with frame !
Step 2
Transport Layer 3-105
Ethernet’s CSMA/CD (more)
Jam Signal: make sure all other
transmitters are aware of
collision; 48 bits
Bit time: .1 microsec for 10 Mbps
Ethernet ;
for K=1023, wait time is about
50 msec
See/interact with Java
applet on AWL Web site:
highly recommended !
Exponential Backoff:
 Goal: adapt retransmission
attempts to estimated
current load

heavy load: random wait will
be longer
 first collision: choose K from
{0,1}; delay is K· 512 bit
transmission times
 after second collision: choose
K from {0,1,2,3}…
 after ten collisions, choose K
from {0,1,2,3,4,…,1023}
Transport Layer 3-106