Transcript Chapter 6

Chapter 6
TCP Congestion Control
Professor Rick Han
University of Colorado at Boulder
[email protected]
Announcements
•
•
•
•
Read Sections 6.1 - 6.4, Skip 6.5
Hope to get HW #4 to you by Thursday
Programming Assignment #2 due April 2
Midterm
•
Still grading, a little long, time management also
an issue, will be curved, hope to get back by
Thursday but may have to wait until April 2
• Next, more on TCP
Prof. Rick Han, University of
Colorado at Boulder
Recap of Previous Lecture
• TCP: Transmission Control Protocol
•
Three-way handshake
•
TCP State Machine:
•
•
•
•
SYN and SYN/ACK exchange
FIN and FIN/ACK exchange
•
•
•
Setup
Established
Tear-Down
•
•
Sequence # is # of lowest byte
Cumulative ACK
•
Receiver advertises window
TCP Segments
TCP Flow Control
Prof. Rick Han, University of
Colorado at Boulder
TCP Adaptive Retransmission
• TCP achieves reliability by retransmitting
segments after:
– A Timeout
– Receiving 3 duplicate cumulative ACK’s
• Two out-of-order segments arrive at receiver, but lowest
unacknowledged segment has yet to arrive
• Receiver repeats its highest received cumulative sequence # - hence duplicate cumulative ACK’s
• Doesn’t wait for timeout : “fast retransmit”
• Choosing the value of the Timeout
– If too small, retransmit unnecessarily
– If too large, poor throughput
– Make this adaptive, to respond to changing congestion
Prof. Rick Han, University of
delays in Internet
Colorado at Boulder
Initial Round-trip Estimator
• Round trip times exponentially averaged:
– New RTT estimate = a (old RTT estimate) + (1 - a) (new RTT)
– Recommended value for a: 0.8 - 0.9
• 0.875 for most TCP’s
• Retransmission Timeout RTO = b RTT, where b = 2
– Thought to be large enough to provide enough cushion
to prevent spurious retransmissions
– …and small enough to keep throughput high
– Every time the timer expires, retransmit segment
Prof. Rick Han, University of
Colorado at Boulder
RTT Retransmission Ambiguity
A
B
A
B
X
RTO
Sample
RTT
RTO
Sample
RTT
Prof. Rick Han, University of
Colorado at Boulder
Karn/Partridge’s modified
RTT Estimator
• Basic problem:
– If a sender has retransmitted a segment, then ACK
for that segment may correspond to any of the
retransmissions
• Is RTT for first transmission or retransmission?
• Solution:
– Each time a segment is retransmitted:
• Don’t average the RTT estimate with the current RTT
sample
• Also, Double the RTO – exponential backoff like Ethernet,
assuming that the packet loss was due to congestion.
– If a segment was ACK’ed after one transmission
• Recalculate RTT estimate and RTO
Prof. Rick Han, University of
Colorado at Boulder
Jacobson/Karel’s
Retransmission Timeout
• Key observation:
– Original smoothed RTT can’t keep up with
wide/rapid variations in RTT
• Solution:
– Base RTO on both the average RTT and
variance/standard deviation of RTT estimate
– Should have the property that:
• When stddev is large, want RTO to stay above
the rapid oscillations and not timeout too often
– i.e. set RTO = Average RTT + N*stddev
• When stddev is small, stay close to average
Prof. Rick Han, University of
RTT
Colorado at Boulder
Jacobson/Karel’s
Retransmission Timeout (2)
Err = current RTT – old Ave RTT A
Next A = old A + g*Err, g=0.125
Next Std Dev D = old D + h*(|Err|-old D),
h=0.25
RTO = A + 4 * Next D
Prof. Rick Han, University of
Colorado at Boulder
TCP Congestion Control
• Flow control addresses congestion at receiver,
not in middle of network
• Suppose you initially send up to the size of flow
control window
– Intermediate routers may not be able to handle so
much traffic
– Congestion overflows router buffers causing lost
packets causing retransmissions causing more
congestion – congestion collapse in late 80’s
• Van Jacobsen observation:
– Send only enough packets into the network that the
network has the capacity to handle without loss
Prof. Rick Han, University of
Colorado at Boulder
TCP Congestion Control (2)
• Define a congestion window CW
– Distinct from flow control window FW
– Actual window size W = min (CW, FW)
• # of data bytes that can be on the link
• If receiver is slowest, then W = FW
Else if network is slowest, then W = CW
• Send no more data than the bottleneck can handle
Prof. Rick Han, University of
Colorado at Boulder
TCP Congestion Control (3)
• How does sender set CW?
– Adaptively probe the network with data
segments
– Keep expanding the window until a segment is
lost, then contract window.
– Continue with expand/contract cycle
throughout connection – “sawtooth” behavior
Rate
Prof. Rick Han, University of
Colorado at Boulder
time
TCP “Slow” Start
• The rate at which new packets should be
injected into network is the rate at which ACKs
are returned by other end
• Use ACK’s to pace transmission of packets: “selfclocking”
• Start by setting CW = 1 segment (in bytes)
• Initial segment size set by receiver
• For each ACK that returns, increment CW by one.
• Send 1 packet. When ACK returns, increment CW, CW=2
• Send 2 packets. When 1st ACK returns, increment CW to
CW=3, when 2nd ACK returns, increment CW to CW=4
• Can send 4 packets. After 4 ACKs return, CW will be up to 8
• Exponential increase – not “slow”, quickly reach
window size that the network can accommodate
Prof. Rick Han, University of
Colorado at Boulder
TCP Slow Start
CW=1
CW=2
CW=3
CW=4
CW=8
Prof. Rick Han, University of
Colorado at Boulder
TCP Additive Increase/
Multiplicative Decrease
• How does a sender detect that CW is too large?
• It starts to see timeouts, which are interpreted as
packet loss due to congestion
• After a timeout:
• TCP remembers that congestion occurred near CW by
storing CW/2 in ssthresh = CW/2
• ssthresh = Slow Start Threshold
• TCP drastically resets CW=1 and slow starts again
• But TCP exponentially increases only to ssthresh,
halfway to old congestion mark
• After CW>ssthresh, additively increase CW
• Rationale: Be cautious about sending new data packets once
you get near old mark that caused timeouts/congestion
Prof. Rick Han, University of
Colorado at Boulder
TCP Additive Increase/
Multiplicative Decrease (2)
• Additive increase:
– If entire window’s worth (CW) of packets in a
RTT is ACKed w/o error, then increment CW
by one
– In practice, TCP adds a/CW to CW as each
ACK returns, rather than waiting for a full CW
of ACKs to return
Prof. Rick Han, University of
Colorado at Boulder
TCP Additive Increase
CW=1
CW=2
CW=3
CW=4
Prof. Rick Han, University of
Colorado at Boulder
TCP Additive Increase/
Multiplicative Decrease (3)
• Why not just slow start exclusively
(exponential increase) after timeout,
instead of additive increase?
• Rationale: Be more cautious about adding new
packets once you’re near old congestion point,
so don’t do exponential increase exclusively
• Each time a timeout occurs, divide CW by
half and store in ssthresh: multiplicative
decrease
• Minimum ssthresh and minimum CW is one
Prof. Rick Han, University of
Colorado at Boulder
TCP Additive Increase/
Multiplicative Decrease (4)
• Why not additive decrease instead of
multiplicative decrease after congestion?
• Consequences of having a too-large congestion
window are worse than having a too-small CW
• Adds to congestion
• Additive decrease can keep CW too large for
too long compared to multiplicative decrease
Prof. Rick Han, University of
Colorado at Boulder
TCP Saw Tooth Behavior
Congestion
Window
Initial
Slowstart
Timeouts
may still
occur
Slowstart
to pace
packets
Fast
Retransmit
and Recovery
Prof. Rick Han, University of
Colorado at Boulder
Time
Courtesy: Srini Seshan
TCP Additive Increase/
Multiplicative Decrease (5)
• What happens if the amount of unacknowledged
data is greater than CW?
– Can’t send new data
– Retransmit unacknowledged data
– Wait for ACKs for unacknowledged data to increase
CW above size of unacknowledged data, then can send
new data
• After a timeout, TCP slows down in two ways:
– Congestion window collapses, restricting new data
– RTO backs off exponentially, slowing down
retransmission of old unacknowledged data
Prof. Rick Han, University of
Colorado at Boulder
TCP Tahoe vs. TCP Reno
• TCP Tahoe
–
–
–
–
Slow start
Additive increase after hitting ssthresh
Multiplicative decrease and slow start after timeout
Fast Retransmit
• TCP Reno
– TCP Tahoe + Fast Recovery
– Observation: packet loss can be inferred not only by
timeouts, but also by duplicate ACKs
– Widely deployed on most UNIX systems, though
SACK-TCP is now gaining prominence
Prof. Rick Han, University of
Colorado at Boulder
TCP Tahoe vs. TCP Reno (2)
• How should TCP react if it receives duplicate
cumulative ACKs?
– This is a sign that some but not all packets are getting
through to receiver, out of order
– Don’t react as harshly as when there is a timeout
• If 3 duplicate ACKs are received, then infer that
one segment has been lost
– Retransmit immediately, rather than wait for a
timeout : called Fast Retransmit
– Cancel slow start, and drop CW to half its value
(approximately) rather than to one : called Fast
Recovery
Prof. Rick Han, University of
Colorado at Boulder
TCP Saw Tooth Behavior
Congestion
Window
Initial
Slowstart
Timeouts
may still
occur
Slowstart
to pace
packets
Fast
Retransmit
and Recovery
Prof. Rick Han, University of
Colorado at Boulder
Time
Courtesy: Srini Seshan
Congestion Avoidance
• Congestion control:
– Cycle of actively probing, transmitting more
than the network can handle, then backing off
• Congestion avoidance:
– Back off before there are packet losses
– How can you tell that congestion is increasing?
• Look at RTT – is it expanding?
• Be informed by routers that there is congestion
– Set a bit in the packet : DECbit
Prof. Rick Han, University of
Colorado at Boulder
Congestion Avoidance (2)
• DECbit
– Implemented in Digital Network Architecture (DNA)
– Could be implemented in TCP/IP
– Router sets a congestion bit, and receiver sends back
ACK with a congestion bit set at the transport layer
– If <50% of packets had congestion bit set, then add
one to CW
– If >50% of packets had congestion bit set, then
decrease CW by 0.875
Prof. Rick Han, University of
Colorado at Boulder
Congestion Avoidance (3)
• RED: Random Early Detection
– Rather than explicitly setting a congestion bit, drop a
packet before queues are completely full
• Controversial: why drop a packet until you have to?
– Early signals to end host via timeout or duplicate ACK
that router is getting full
– Policy:
• If avg queue len <= minthresh, queue packet
• If minthresh < avg queue len < maxthresh, drop packet with
probability p
• If avg queue len >= maxthresh, drop packet
Prof. Rick Han, University of
Colorado at Boulder
Congestion Avoidance (4)
• Source-based congestion avoidance: look at
RTT and back off early in your
transmissions
– Example: during two RTTs, if avg RTT > (min
RTT + max RTT)/2, then decrease CW by
factor of 8
– Example: compare throughput, keep
incrementing CW (CW[n+1]=CW+1) until:
• if send rate at RTT(n+1) < 0.5*send rate at RTT(n),
then decrease CW by one
Prof. Rick Han, University of
Colorado at Boulder