Transcript ppt

Lecture 4: Congestion Control

Challenge: how do we efficiently share network resources among billions of hosts?
• Today: TCP
– Hosts adjust rate based on packet losses
• Next time: Alternative solutions
– Fair queueing, RED
– Vegas, packet pair
– ECN, rate control, credits
A brief Internet history...
• 1969: ARPANET created
• 1972: TELNET (RFC 318)
• 1973: FTP (RFC 454)
• 1977: MAIL (RFC 733)
• 1982: TCP & IP (RFC 793 & 791)
• 1984: DNS (RFC 883)
• 1986: NNTP (RFC 977)
• 1990: ARPANET dissolved
• 1991: WWW/HTTP
• 1992: MBONE
• 1995: Multi-backbone Internet
TCP: This is your life...
• 1974: TCP described by Vint Cerf and Bob Kahn in IEEE Trans. Comm.
• 1975: Three-way handshake, Raymond Tomlinson, in SIGCOMM 75
• 1982: TCP & IP (RFC 793 & 791)
• 1983: BSD Unix 4.2 supports TCP/IP
• 1984: Nagle's algorithm to reduce overhead of small packets; predicts congestion collapse
• 1986: Congestion collapse observed
• 1987: Karn's algorithm to better estimate round-trip time
• 1988: Van Jacobson's congestion algorithms: congestion avoidance and congestion control (most implemented in 4.3BSD Tahoe)
• 1990: 4.3BSD Reno: fast retransmit, delayed ACKs
TCP: After 1990
• 1993: TCP Vegas (Brakmo et al): real congestion avoidance
• 1994: ECN (Floyd): Explicit Congestion Notification
• 1994: T/TCP (Braden): Transaction TCP
• 1996: SACK TCP (Floyd et al): Selective Acknowledgement
• 1996: Hoe: Improving TCP startup
• 1996: FACK TCP (Mathis et al): extension to SACK
• 1999: TCP Rainier ???
TCP Flow Control
• TCP is a sliding window protocol
– For window size n, can send up to n bytes without receiving an acknowledgement
– When the data is acknowledged, the window slides forward
• Each packet advertises a window size
– Indicates the number of bytes the receiver is willing to get
• Original TCP always sends the entire window
Visualizing the window
[Figure: sequence numbers 1-12. 1-3 sent and acknowledged; 4-6 sent, not ACKed; 7-9 can send ASAP (usable window); 10-12 can't send until window moves. 4-12 form the offered window advertised by the receiver.]
Left side of window advances when data is acknowledged.
Right side controlled by size of window advertisement.
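To make the picture concrete, here is a minimal sender-side sketch of these window regions in Python. The class and its fields (snd_una, snd_nxt) follow conventional TCP variable naming but are illustrative, not taken from a real stack.

    # Illustrative sketch of the sender-side sliding window regions shown above.
    class SlidingWindowSender:
        def __init__(self, advertised_window):
            self.snd_una = 0                 # oldest byte sent but not yet ACKed (left edge)
            self.snd_nxt = 0                 # next byte to send
            self.window = advertised_window  # offered window from the receiver

        def usable_window(self):
            # Bytes we may still send: offered window minus bytes in flight.
            return self.window - (self.snd_nxt - self.snd_una)

        def send(self, nbytes):
            # Send at most the usable window's worth of data.
            nbytes = min(nbytes, self.usable_window())
            self.snd_nxt += nbytes
            return nbytes

        def on_ack(self, ack_no, advertised_window):
            # Left edge advances on ACK; right edge follows the advertisement.
            self.snd_una = max(self.snd_una, ack_no)
            self.window = advertised_window

    s = SlidingWindowSender(advertised_window=12)
    s.send(6)                 # bytes 0-5 in flight
    s.on_ack(3, 12)           # bytes 0-2 acknowledged; window slides forward
    print(s.usable_window())  # 9 bytes may now be sent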
Observed TCP Problems
• Too many small packets
– silly window syndrome
– Nagle's algorithm
– delayed acks
• Too many retransmissions
– dynamic retransmission timer
– congestion collapse
Silly Window Syndrome
• Problem: (Clark, 1982)
– If the receiver advertises small increases in the receive window, then the sender may waste time sending lots of small packets
• Solution
– Receiver must not advertise small window increases
– Increase window by min(MSS, RecvBuffer/2)
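A small sketch of this receiver-side rule, assuming the receiver knows its free buffer space and the window it last advertised; the function and its parameters are made up for illustration.

    # Sketch of receiver-side silly window syndrome avoidance (Clark, 1982):
    # only advertise a window increase once it is at least min(MSS, RecvBuffer/2).
    def advertised_window(free_buffer, currently_advertised, mss, recv_buffer):
        increase = free_buffer - currently_advertised
        if increase >= min(mss, recv_buffer // 2):
            return free_buffer            # worth telling the sender
        return currently_advertised       # hold back small increases

    print(advertised_window(free_buffer=300, currently_advertised=0,
                            mss=536, recv_buffer=8192))   # 0: too small to advertise
    print(advertised_window(free_buffer=600, currently_advertised=0,
                            mss=536, recv_buffer=8192))   # 600: >= MSS, advertise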
Nagle's algorithm
• Small packet problem:
– Don't want to send a 41-byte packet for each keystroke
– How long to wait for more data?
• Solution:
– Allow only one outstanding small segment that has not yet been acknowledged
– Self-adaptive: the faster ACKs arrive, the more tinygrams can be sent
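A minimal sketch of Nagle's rule in Python, assuming a hypothetical sender object that buffers application writes; real stacks integrate this with timers and delayed ACKs.

    # Sketch of Nagle's small-packet rule: a sub-MSS segment may be sent only
    # when no other unacknowledged small segment is outstanding. Illustrative only.
    class NagleSender:
        def __init__(self, mss):
            self.mss = mss
            self.unacked_small_segment = False
            self.pending = b""

        def write(self, data):
            self.pending += data
            self.try_send()

        def try_send(self):
            while len(self.pending) >= self.mss:     # full segments go immediately
                self.transmit(self.pending[:self.mss])
                self.pending = self.pending[self.mss:]
            if self.pending and not self.unacked_small_segment:
                self.transmit(self.pending)          # one small segment allowed
                self.pending = b""
                self.unacked_small_segment = True

        def on_ack(self):
            self.unacked_small_segment = False       # faster ACKs => more tinygrams
            self.try_send()

        def transmit(self, segment):
            print(f"send {len(segment)} bytes")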
Retransmission
• How long a timeout to set?
– Estimate round-trip time R:  R = αR + (1 − α)M
– α is a smoothing factor of 0.9
– M is measured round-trip time (ACKs to data)
• Timeout at βR, where β is a delay variance factor of 2.0
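The estimator above as a few lines of Python, using the slide's α = 0.9 and β = 2.0; the function name and the sample RTT values are illustrative.

    # Sketch of the original (pre-Jacobson) retransmission timeout estimator:
    # R = alpha*R + (1 - alpha)*M, timeout = beta*R.
    def update_rto(smoothed_rtt, measured_rtt, alpha=0.9, beta=2.0):
        smoothed_rtt = alpha * smoothed_rtt + (1 - alpha) * measured_rtt
        return smoothed_rtt, beta * smoothed_rtt   # (new estimate, retransmit timeout)

    r = 0.100                                      # initial estimate: 100 ms
    for m in (0.080, 0.120, 0.300):                # measured RTT samples
        r, rto = update_rto(r, m)
        print(f"R={r*1000:.1f} ms  timeout={rto*1000:.1f} ms")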
Karn's algorithm
• Problem:
– How to estimate RTT when data is retransmitted?
• Solution:
– Don't sample RTT when you retransmit data
– Also added exponential backoff to retransmit interval
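A sketch of both parts of Karn's rule, using hypothetical helper functions; the 64-second cap on the backoff is an added assumption, not from the slide.

    # Karn's algorithm: ignore RTT samples from retransmitted segments (the ACK
    # is ambiguous), and back off the retransmit interval exponentially.
    def rtt_sample(was_retransmitted, measured_rtt):
        # Returns a sample to feed the RTT estimator, or None to skip it.
        return None if was_retransmitted else measured_rtt

    def backoff(rto, max_rto=64.0):
        # Double the retransmit interval on each retransmission, up to a cap (assumed).
        return min(rto * 2, max_rto)

    print(rtt_sample(False, 0.120))    # 0.12 -> usable sample
    print(rtt_sample(True, 0.450))     # None -> ambiguous, do not sample
    print(backoff(1.0), backoff(2.0))  # 2.0 4.0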
Router Congestion
• What if rate of packets arriving > rate of packets departing?
[Figure: input queues from flows a, b, c, d feeding a crossbar; the output queue overflows]
• Routers drop any overflow packets
The Problem
• Original TCP sent full window of data
• When links become loaded, queues fill up, and this can lead to:
– Congestion collapse: when round-trip time exceeds the retransmit interval, this can create a stable condition in which every packet is being retransmitted many times
– Synchronized behavior: network oscillates between loaded and unloaded
Jacobson Solution
• Modify retransmission timer to adapt to variations in queueing delay
• Infer network bandwidth from packet loss
– drops => congestion => reduce rate (but drops are also caused by link noise!)
– no drops => no congestion => increase rate
• Limit send rate based on network bandwidth as well as receiver space
Jacobson's RTT estimator
• Problem:
– Can't deal with wide variance in RTT
• Solution:
– Smooth estimator of variance as well
    Err = M − R
    R = R + g·Err          (gain for avg g = 1/8)
    D = D + h·(|Err| − D)  (gain for dev h = 1/4)
– Timeout at R + 4D
– When variance is small, R dominates
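The same update rules as runnable Python, with the slide's gains g = 1/8 and h = 1/4; the sample RTT values are invented to show how a delay spike widens the timeout.

    # Sketch of Jacobson's RTT estimator with a smoothed mean deviation.
    def jacobson_update(R, D, M, g=1/8, h=1/4):
        err = M - R
        R = R + g * err
        D = D + h * (abs(err) - D)
        return R, D, R + 4 * D           # (smoothed RTT, deviation, timeout)

    R, D = 0.100, 0.025
    for M in (0.090, 0.110, 0.400):      # a spike in measured RTT widens the timeout
        R, D, rto = jacobson_update(R, D, M)
        print(f"R={R*1000:.1f} ms  D={D*1000:.1f} ms  timeout={rto*1000:.1f} ms")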
TCP Congestion Control
• Adjust rate to match network bandwidth
• Additive increase/multiplicative decrease
– oscillate around bottleneck capacity
• Slow start
– quickly identify bottleneck capacity
• Fast retransmit
• Fast recovery
Tracking the Bottleneck Bandwidth
• Throughput = window size / RTT
• Multiplicative decrease
– Timeout => dropped packet => cut window size in half
• Additive increase
– Ack arrives => no drop => increase window size by one packet per window
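A sketch of the two adjustments in Python, counting the window in segments for simplicity (real TCPs count bytes); the functions are illustrative.

    # AIMD window adjustment: roughly one extra segment per round trip on ACKs,
    # and a halving of the window on a loss.
    def on_ack(cwnd):
        return cwnd + 1.0 / cwnd        # +1/cwnd per ACK ~ +1 segment per window

    def on_loss(cwnd):
        return max(cwnd / 2, 1.0)       # multiplicative decrease

    cwnd = 8.0
    for _ in range(8):                  # one window's worth of ACKs
        cwnd = on_ack(cwnd)
    print(round(cwnd, 2))               # just under 9 segments after one RTT
    print(on_loss(cwnd))                # about 4.5 segments after a drop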
TCP "Sawtooth"
• Oscillates around bottleneck bandwidth
– adjusts to changes in competing traffic
[Figure: Additive Increase/Multiplicative Decrease — window (in segs) vs. round-trip times, showing the sawtooth pattern]
Slow start
• How do we find bottleneck bandwidth?
• Start by sending a single packet
– start slow to avoid overwhelming network
• Multiplicative increase until we get packet loss
– quickly find bottleneck
• Previous window < bottleneck rate
– shift into linear increase/multiplicative decrease
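A simplified sketch of the slow start / congestion avoidance split, assuming a made-up per-RTT update function and a slow-start threshold (ssthresh) set to half the window at the last loss.

    # cwnd doubles per RTT below ssthresh, then grows by one segment per RTT.
    # On loss, ssthresh remembers half the window and slow start restarts.
    def next_window(cwnd, ssthresh, loss):
        if loss:
            return 1, max(cwnd // 2, 2)      # restart slow start, remember half
        if cwnd < ssthresh:
            return cwnd * 2, ssthresh        # slow start: multiplicative increase
        return cwnd + 1, ssthresh            # congestion avoidance: linear increase

    cwnd, ssthresh = 1, 64
    for rtt in range(8):
        cwnd, ssthresh = next_window(cwnd, ssthresh, loss=(rtt == 5))
        print(rtt, cwnd, ssthresh)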
Slow Start
• Quickly find the bottleneck bandwidth
[Figure: Slow Start — window (in segs, up to ~300) vs. round-trip times (0-8); the window grows exponentially]
Slow Start Problems
• Slow start usually overshoots bottleneck
– will lose many packets in window
• Bursty traffic source
– will cause bursty losses for other flows
• Short flows
– Can spend entire time in slow start!
Avoiding burstiness: ack pacing
[Figure: sender and receiver separated by a bottleneck link; packets spread out at the bottleneck, so the returning acks pace the sender]
• Window size = round-trip delay × bit rate
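A quick check of that relation, assuming the link speed and RTT from the 10KB-document example later in the lecture (10 Mb/s, 70 ms); the numbers are only for illustration.

    # Bandwidth-delay product: window needed to keep the pipe full.
    bit_rate = 10e6          # bits/second
    rtt = 0.070              # seconds
    window_bits = bit_rate * rtt
    print(window_bits / 8 / 1024)   # ~85 KB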
Problems with Ack Pacing
• Ack compression
– Variations in queueing delays on the return path change the spacing between acks
– Example: ack waits for single long packet
– Worse with bursty cross-traffic
• What happens after a timeout?
Ack Pacing After Timeout
• Packet loss disrupts ack pacing
– slow start/additive increase cause packet loss!
• After loss, use slow start to regain ack pacing
– switch to linear increase at last successful rate
– "congestion avoidance"
[Figure: packet timelines showing ack pacing disrupted by a loss and restored by slow start]
Putting It All Together
[Figure: Slow Start + Congestion Avoidance — window (in segs) vs. round-trip times]
• Timeouts dominate performance!
Fast Retransmit
• Can we detect packet loss without a timeout?
• Duplicate acks imply either
– packet reordering (route change)
– packet loss
• TCP Tahoe
– resend if we see three dup acks
– eliminates timeout delay
[Figure: packet timeline — duplicate acks for a lost segment trigger the retransmission]
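A sketch of the dup-ack counting that triggers fast retransmit, with a hypothetical class; the three-dup-ack threshold is the one named on the slide.

    # Tahoe-style fast retransmit: the third duplicate ACK for the same sequence
    # number triggers an immediate retransmission instead of waiting for the timer.
    DUP_ACK_THRESHOLD = 3

    class FastRetransmit:
        def __init__(self):
            self.last_ack = None
            self.dup_count = 0

        def on_ack(self, ack_no):
            if ack_no == self.last_ack:
                self.dup_count += 1
                if self.dup_count == DUP_ACK_THRESHOLD:
                    print(f"fast retransmit segment {ack_no}")
            else:
                self.last_ack, self.dup_count = ack_no, 0

    fr = FastRetransmit()
    for ack in (1, 2, 2, 2, 2):   # segment 2 was lost: three duplicate ACKs follow
        fr.on_ack(ack)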
Fast Retransmit Caveats
• Requires in-order packet delivery
– Dynamically adjust number of dup acks needed for retransmit
• Doesn't work with small windows
– ex: modems!
• Doesn't work if packets are lost in a burst
– ex: at peak of slow start?
Fast Retransmit
[Figure: Slow Start + Congestion Avoidance + Fast Retransmit — window (in segs) vs. round-trip times]
• Regaining ack pacing limits performance
Fast Recovery
• Use duplicate acks to maintain ack pacing
– dup ack => packet left network
– every other ack => send packet
• Doesn't work if we lose packets in a burst
– timeout and slow start to reestablish ack pacing
[Figure: packet timeline — duplicate acks keep new packets flowing during recovery]
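A simplified sketch of Reno-style fast recovery in Python; the +3 inflation and per-dup-ack increment follow the standard Reno description rather than this slide specifically, and the function names are invented for illustration.

    # After three dup acks: halve ssthresh, inflate cwnd by one segment per extra
    # dup ack (each implies a packet left the network), deflate on a new ACK.
    def enter_fast_recovery(cwnd):
        ssthresh = max(cwnd // 2, 2)
        return ssthresh, ssthresh + 3        # (new ssthresh, inflated cwnd)

    def on_dup_ack(cwnd):
        return cwnd + 1                      # another packet left the network

    def on_new_ack(ssthresh):
        return ssthresh                      # deflate; continue congestion avoidance

    ssthresh, cwnd = enter_fast_recovery(16)  # cwnd was 16 segments at loss detection
    cwnd = on_dup_ack(cwnd)                   # more dup ACKs keep data flowing
    print(ssthresh, cwnd)                     # 8 12
    cwnd = on_new_ack(ssthresh)
    print(cwnd)                               # 8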
Fast Recovery
[Figure: Slow Start + Congestion Avoidance + Fast Retransmit + Fast Recovery — window (in segs) vs. round-trip times]
Delayed ACKs
• Problem:
– In request/response programs, you send separate ACK and Data packets for each transaction
• Solution:
– Don't ACK data immediately
– Wait 200ms (must be less than 500ms)
– Must ACK every other packet
– Must not delay duplicate ACKs
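The rules above as a small decision function; the helper and its parameters are illustrative, not from a real stack.

    # Delayed ACK rules: hold an ACK up to 200 ms hoping to piggyback it on data,
    # but ACK every second segment and never delay a duplicate ACK.
    DELAY = 0.200   # seconds (must be < 500 ms)

    def should_ack_now(segments_unacked, is_duplicate, timer_expired):
        if is_duplicate:
            return True              # never delay dup ACKs (fast retransmit needs them)
        if segments_unacked >= 2:
            return True              # must ACK every other packet
        return timer_expired         # otherwise wait up to DELAY for data to piggyback on

    print(should_ack_now(1, False, False))  # False: wait for more data or the timer
    print(should_ack_now(2, False, False))  # True: second full segment
    print(should_ack_now(1, True,  False))  # True: duplicate ACK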
Delayed Ack Impact
• TCP congestion control triggered by acks
– if we receive half as many acks => window grows half as fast
• Slow start with window = 1
– will trigger delayed ack timer
What if two TCP connections share a link?
• Reach equilibrium independent of initial bandwidth (assuming equal RTTs)
[Figure: two competing TCP windows (in segs) vs. round-trip times, converging to equal shares]
Equilibrium Proof
What if TCP and UDP share a link?
• Independent of initial rates, UDP will get priority! TCP will take what's left.
[Figure: UDP and TCP windows (in segs) vs. round-trip times — UDP holds its rate while TCP backs off]
What if two different TCP implementations share a link?
• If one cuts back more slowly after drops => it will grab a bigger share
• If one adds more quickly after acks => it will grab a bigger share
• Incentive to cause congestion collapse!
– Many TCP "accelerators"
– Easy to improve performance at the expense of the network
• Solution: per-flow fair queueing at the router
What if the TCP connection is short?
• Slow start dominates performance
– What if network is unloaded?
– Burstiness causes extra drops
• Packet losses are an unreliable indicator
– can lose connection setup packet
– can get a drop when connection is nearly done
– signal unrelated to sending rate
• In the limit, have to signal every connection
– 50% loss rate as we increase the number of connections
Example: 10KB document
10 Mb/s Ethernet, 70ms RTT, 536-byte MSS
• Ethernet ~ 10 Mb/s
• 64KB window, 70ms RTT ~ 7.5 Mb/s
• can only use 10KB window ~ 1.2 Mb/s
• 5% drop rate ~ 275 Kb/s (Mathis)
• model timeouts ~ 228 Kb/s (Padhye)
• slow start, no losses ~ 140 Kb/s
• slow start, with 5% drop ~ 75 Kb/s
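The window-limited rates above follow directly from window/RTT; a two-line check, assuming the stated 70 ms round-trip time.

    # Throughput when limited only by window size: window / RTT.
    rtt = 0.070
    for label, window_bytes in (("64KB window", 64 * 1024), ("10KB window", 10 * 1024)):
        print(label, round(window_bytes * 8 / rtt / 1e6, 1), "Mb/s")
    # 64KB window 7.5 Mb/s
    # 10KB window 1.2 Mb/s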
Short flow bandwidth
[Figure: bandwidth (Kbps) vs. packet loss rate (%), median and average; flow length = 10 Kbytes, RTT = 70ms]
Sharing congestion information
• Intra-host sharing
– Multiple web connections from a host
– [Padmanabhan98, Touch97]
• Inter-host sharing
– For a large server farm or a large client population
– How much potential is there?
Destination locality
[Figure: Destination Host Locality (time since host last accessed) — cumulative fraction vs. seconds (0.1 to 100, log scale), for All Flows and Inter-host only]
Sharing Congestion Information
[Figure: a congestion gateway at the border router connects the subnets of an enterprise/campus network to the Internet]
Mathis et al.
• Problem: TCP has complex behavior
• Can we model its performance?
• Make lots of simplifying assumptions
– one drop per window
– long flow behavior (ignore slow start)
– large receiver window
– no timeouts
• Result:
    BW ≈ (MSS × C) / (RTT × √p),   where C ≈ 0.9
What does this mean?
    BW ≈ (MSS × C) / (RTT × √p),   where C ≈ 0.9
• Bigger RTT => worse performance
– For shared links, TCP penalizes WAN traffic
– Queueing delay hurts performance
• Higher drop rate => worse performance
– correlated vs. non-correlated drops
– does my behavior cause more drops for me, or just more drops for other flows?
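The model as plain arithmetic; plugging in the earlier example's numbers (536-byte MSS, 70 ms RTT, 5% loss) lands in the same range as the figure quoted there. The function is illustrative.

    # Mathis et al. steady-state model: BW = MSS * C / (RTT * sqrt(p)).
    from math import sqrt

    def mathis_bw(mss_bytes, rtt, p, c=0.9):
        return mss_bytes * 8 * c / (rtt * sqrt(p))   # bits/second

    print(round(mathis_bw(536, 0.070, 0.05) / 1e3))
    # ~247 Kb/s with C = 0.9; C = 1 would reproduce the ~275 Kb/s on the earlier slide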
Does adding buffers help?
• ++ Increases ability of router to handle bursts => fewer drops
• -- Allows greater overshoot in slow start => bursty traffic => more drops
• -- Potentially increases RTT
Balakrishnan et al.
• Observation: poor performance on wireless links (<= 10% of peak)
• Problem: Interaction with TCP
– Wireless links drop pkts due to noise, handoffs
– TCP interprets all drops as congestion
– => TCP will reduce sending rate
• Paper's goal: Evaluate proposed solutions
Alternatives
• Modify TCP implementation
– better handling of multiple drops per window
– even faster retransmit (one dup ack)
– explicit bit for non-congestion packet loss
• Retransmission at link layer
– Can result in packet reordering
– TCP-aware suppresses duplicate acks
• Split connection
Modify TCP Implementation
• E2E: standard TCP
• NewReno: if multiple drops in window, first ack after fast recovery will be "partial"
• Selective ACK: list recent missing pkts
– IETF -- wait for 3 dup acks
– SMART -- resend after one dup ack
• ELN: explicit bit for non-congestion loss
– ELN + SMART not simulated
Link Layer Solutions
• Link layer: transparent retransmission
– end host sees packets arrive out of order
• TCP aware link layer
– suppress duplicate acks
• Link layer with SACK
– better handling of multiple drops
• Link layer with SACK and TCP aware
– bingo!
Split TCP
• Install proxy at wireless link
– allows local retransmissions
• But:
– proxy failure is visible to end hosts
– end-to-end throughput limited to the minimum across hops