TCP IV - LINK@KoreaTech

Download Report

Transcript TCP IV - LINK@KoreaTech

TCP Tutorial
- Part IV Internet Computing Laboratory @ KUT
(http://icl.kut.ac.kr)
Youn-Hee Han
It is licensed under a Creative Commons Attribution 2.5 License
Evolution of TCP
2
Computer Network
Evolution of TCP
1975
Three-way handshake
Ray Tomlinson
In SIGCOMM 75
1974
TCP described by
Vint Cerf, Bob Kahn
In IEEE Trans Comm
1975
1980
1984
Nagel’s algorithm
to reduce overhead
of small packets;
predicts congestion
collapse
1983
BSD Unix 4.2
supports TCP/IP
1981
TCP & IP
RFC 793 & 791
1987
Karn’s algorithm
to better estimate
round-trip time
1986
Congestion
collapse
st
1 observed
1985
1990
4.3BSD Reno
fast recovery
delayed ACK’s
1988
Van Jacobson’s
algorithms
slow start,
congestion
avoidance, fast
retransmit (all
implemented in
4.3BSD Tahoe)
SIGCOMM 88
1990
The slides are provided by Bradley Durandetta, Dept. of CIS, Univ. of Delaware.
3
Computer Network
Evolution of TCP
1994
T/TCP
Transaction TCP
(Braden)
1994
1993
ECN
TCP Vegas(not
Explicit
implemented)
real congestion Congestion
Notification
avoidance
(Floyd)
(Brakmo et al)
1993
1994
1996
NewReno
modified fast
recovery
SACK TCP
Selective Ack
(Floyd et al)
1996
Improving TCP
startup
(Hoe)
1996
FACK TCP
Forward Ack
extension to SACK
(Mathis et al)
1996
The slides are provided by Bradley Durandetta, Dept. of CIS, Univ. of Delaware.
4
Computer Network
TCP Tahoe
TCP Tahoe



RFC 2001, Jan. 1997
Unconditionally (either type of loss event) cuts its cwin to 1 MSS and
enter slow start phase
Slow Start, Congestion Avoidance, Fast Retransmit
 Slow Start (Exponential growth of cwnd)

Each RTT : cwnd  2 * cwnd
 Congestion Avoidance (Linear growth of cwnd)

Each RTT : cwnd  cwnd + 1
 Fast Retransmit

Retransmits after 3 dupACKs without waiting for timeout
window
SS : slow start
CA : congestion avoidance
SS
5
CA
SS
CA
time
Computer Network
TCP Tahoe Trace
(with one dropped segment)
500000
480000
48
44
Sent Segment
ACK'ed Segment
cwnd
ssthresh
40
36
32
Lost segment
Detect the packet loss
28
440000
24
MSS
Sequence #
460000
20
420000
Begin congestion
avoidance
Begin slow-start
16
12
400000
8
4
380000
4.9
6
5.0
5.1
5.2
5.3
Time (s)
5.4
5.5
5.6
0
5.7
Computer Network
TCP Reno
TCP Reno


developed by Van Jacobson in 1990
standardized as RFC 2001, Jan. 1997
Van Jacobson in January 2006
Van Jacobson joined PARC as a research fellow in August 2006.
He previously served as chief scientist at Packet Design LLC[1].
Prior to that, he was chief scientist at Cisco Systems and group leader
for the Network Research Group at Lawrence Berkeley Laboratory.
7
Computer Network
TCP Reno
TCP Reno


Slow start, Congestion Avoidance, Fast Retransmit + Fast Recovery
Basic ideas
 Fast recovery avoid slow start
 3 dupACKs: fast retransmit + fast recovery
 Timeout: slow start
8
Computer Network
TCP Reno
Fast Retransmit



when the sender receives 3 “duplicative ACKs”, i.e. 4 identical ACKs without the
arrival of any other intervening packets,
Then, it assumes that the segment was lost,
Sender retransmits the segment and moves to Fast Recovery phase (instead of
moving to Slow Start phase)
Fast Recovery



9
the sender decreases cwnd the half of its original size, adds 3 (3 packets have left
the network and buffered by the receiver)
It continues to send new segments (if allowed by the cwnd value) until receiving
new different ACK, which should acknowledge receiving all segments sent till
moving to Fast Recovery phase (assuming that no more segments were lost).
For each additional duplicated ACK received, increment cwnd by 1
Computer Network
TCP Reno Example
Flight Size =No. of
Unacknowledged
segments
cwnd=8
Fast Retransmit
cwnd=8/2+3=7
ssthresh=8/2=4
=> Fast Recovery
0
1
2
3
4
5
6
7
8
TCP Reno
Ack(1)
Ack(1)
Ack(1)
Ack(1)
Ack(1)
Ack(1)
Ack(1)
Ack(1)
1
cwnd=8
cwnd=9
9
Ack(9)
Ack(10)
cwnd=10 10
cwnd=11 11
Exit Fast Recovery
cwnd=ssthresh=4
=> Congestion Avoidance
10
12
Computer Network
TCP Reno Trace
(with one dropped segment)
500000
48
Sent Segment
44
ACK'ed Segment
cwnd
480000
40
ssthresh
cwnd in Fast Recovery
36
32
Detect the packet loss
by 3 DupACKs
28
440000
24
MSS
Sequence #
460000
20
420000
16
12
400000
8
4
380000
0
4.9
11
5.0
5.1
5.2
5.3
Time (s)
5.4
5.5
5.6
5.7
Computer Network
TCP Tahoe & Reno Trace
(with one dropped segment)
500000
480000
Tahoe
Reno
Tahoe & Reno
Sequence #
460000
440000
420000
400000
380000
4.9
12
5.0
5.1
5.2
5.3
Time(s)
5.4
5.5
5.6
5.7
Computer Network
TCP Newreno
TCP Newreno

RFC 2582, April 1999

It enables to the algorithm to manage a loss of more than one packet
without changing the TCP message structure

Another improvement (TCP SACK – RFC 2018) enables to cope with a
loss of more than one packet by changing message structure (using TCP
options)
13
Computer Network
TCP Newreno
Limitation of TCP Reno


If cwnd size is too small (smaller than 4 packets) then it’s not possible to
get 3 duplicate acks and run the algorithm
TCP Reno can not manage a loss of multiple packets from a single
window of data
 It will cause a use of retransmission time out

TCP Reno doesn’t manage a loss of packets during the Fast Recovery
stage
What if There are Multiple Losses in a Window?




14
With
With
With
With
two losses in a window, Reno will occasionally timeout.
three losses in a window, Reno will usually timeout.
four losses in a window, Reno is guaranteed to timeout!
three or more losses in a window, Tahoe typically out performs Reno!
Computer Network
TCP Newreno
Limitation of TCP Reno
Flight Size =No. of
Unacknowledged
segments
cwnd=8
Fast Retransmit
cwnd=8/2+3=7
ssthresh=8/2=4
=> Fast Recovery
7
8
Ack(1)
Ack(1)
Ack(1)
Ack(1)
Ack(1)
Ack(1)
1
Ack(3)
cwnd=8
cwnd=9
Exit Fast Recovery
cwnd=ssthresh=4
=> Congestion Avoidance
15
0
1
2
3
4
5
6
Flight Size(=7) > cwnd
=> No new segments
9
Ack(3)
Segment #3 will be time-out
Computer Network
TCP Newreno
TCP Reno Trace
500000
(with two dropped segments)
48
Sent Segment
44
ACK'ed Segment
cwnd
480000
40
ssthresh
cwnd in Fast Recovery
36
32
28
440000
24
MSS
Sequence #
460000
20
420000
16
12
400000
8
4
380000
0
4.9
16
5.0
5.1
5.2
5.3
Time (s)
5.4
5.5
5.6
5.7
Computer Network
TCP Newreno
Idea:



The sender remembers a number of the last segment that was sent
before entering the Fast Retransmit phase
The sender deals with a situation when a “new” ACK (which is not
DupAcks) does not cover the last remembered segment (“partial ACK”)
The sender consider the partial ACK as a situation when more packets
were lost before entering the Fast Retransmit.
After discovering such situation the sender will retransmit the new
lost packet too and will stay at the Fast Recovery stage
The sender will finish the Fast Recovery stage when it will get ACK
that covers last segment sent before the Fast Retransmit
17
Computer Network
TCP Newreno
Modifications to Fast Recovery

Partial ACKs
 An ACK that acknowledges some but not all the segments that were
sent before the start of fast recovery.


NewReno interprets “Partial ACK” as an indication of multiple loss.
If partial ACK received, re-transmit the next lost segment
immediately and deflate cwnd as follows:
 cwnd = cwnd – the amount of new acknowledged data + 1


18
Sender remains in fast recovery until all data outstanding sent
before fast recovery are ACK’ed.
In this fast recovery phase, an additional dupACK’s increase cwnd
by 1*MSS
Computer Network
TCP NewReno Trace
(with two dropped segments)
500000
48
Sent Segment
44
ACK'ed Segment
cwnd
480000
40
ssthresh
cwnd in fast recovery
36
32
28
440000
24
MSS
Sequence #
460000
20
420000
16
7= 16 – 10 + 1
12
400000
8
4
380000
0
4.9
19
5.0
5.1
5.2
5.3
Time(s)
5.4
5.5
5.6
5.7
Computer Network
Tahoe, Reno & NewReno Trace
(with two dropped segments)
500000
480000
NewReno
Reno
Tahoe
Reno & NewReno
All
Sequence #
460000
440000
420000
400000
380000
4.9
20
5.0
5.1
5.2
5.3
Time (s)
5.4
5.5
5.6
5.7
Computer Network
Is There a Better Way?
The only way Tahoe, Reno and NewReno can detect congestion is by
creating congestion!


They carefully probe for congestion by slowly increasing their sending
rate.
When they find (create) congestion, they cut sending rate at least in half!
(basically AIMD)
This slow advance and rapid retreat approach results in a sawtoothed sending rate and highly erratic throughput.
What if TCP could detect congestion without causing congestion?
21
Computer Network
KB
Is There a Better Way?
70
60
50
40
30
20
10
0.5
1.0 1.5
2.0
2.5 3.0
3.5 4.0 4.5
Time (seconds)
5.0
5.5
6.0
6.5
7.0 7.5
8.0 8.5
0.5
1.0 1.5
2.0
2.5 3.0
3.5 4.0 4.5
Time (seconds)
5.0
5.5
6.0
6.5
7.0 7.5
8.0 8.5
0.5
1.0 1.5
2.0
2.5 3.0
3.5 4.0 4.5
Time (seconds)
5.0
5.5
6.0
6.5
7.0 7.5
8.0 8.5
Avg. source send rate
Sending KBps
Congestion window
1100
900
700
500
300
100
In shaded region the cogestion
widow increases. We expect
throughput to increase but it cannot
increase beyond available bandwidth
Any increase in the window size only
results in packets taking up buffer
space at bottleneck router
22
Queue size in router
Buffer space at bottleneck router
10
5
Computer Network
TCP Vegas
Introduced by Brakmo and Peterson (1994)


L. Brakmo, S. O'Malley, and L. Peterson. TCP Vegas: New techniques for
congestion detection and avoidance. In Proceedings of the SIGCOMM '94
Symposium (Aug. 1994) pages 24-35.
L. Brakmo and L. Peterson. TCP Vegas: End to End Congestion Avoidance
on a Global Internet. IEEE Journal on Selected Areas in Communication,
Vol 13, No. 8 (October 1995) pages 1465-1480.
TCP Vegas vs. TCP (New)reno

TCP Vegas provides a proactive response to congestion

TCP (New)reno is reactive with respect to congestion
Lawrence S. Brakmo @ The University of Arizona
23
Computer Network
TCP Vegas
TCP Tahoe vs. TCP (New)reno vs. TCP Vegas
cwnd
Timeout
 SS
CA
TCP Tahoe
cwnd  1
cwnd  1
SS
Duplicate ACK
 CA
SS
CA
SS
CA
time
cwnd
Timeout
 SS
Duplicate ACK
 CA
TCP (New)reno
cwnd  cwnd/2 + 3
cwnd  1
SS
CA
SS
CA
FR
time
cwnd
TCP Vegas
SS
24
CA
time
Computer Network
TCP Vegas
Modified congestion avoidance



source watches for some sign that router’s queue is building up and
congestion will happen too
Sets congestion window size based on the difference between the
expected and actual data rates
Define…
 cwnd: current congestion window size
 rtt*: the minimum of all measured round-trip time


It is commonly the RTT of the first segment sent by the connection
Expected Throughput = cwnd/rtt*
 rtt: actual (with congestion) round-trip time

Actual Throughput = cwnd/rtt
 Diff = Expected Throughput – Actual Throughput


25
Note that Diff is always positive (Diff>0)
Diff < 0 means that we need to change rtt*
Computer Network
TCP Vegas
Modified congestion avoidance

Estimated flight size 
  = (cwnd/rtt* – cwnd/rtt)  rtt*

 : low threshold for  (want  >  )
 : high threshold for  (want  <  )

TCP Vegas attempts to control cwnd as follows:

 If  < , increase cwnd by 1 * MSS
 If  > , decrease cwnd by 1 * MSS
 Otherwise (    ), cwnd is not changed.

26
The overall goal is to keep between  and  extra bytes in the network.
Computer Network
TCP Vegas
Example

Assumption





A connection with rtt* of 100ms
A segment size if 1KBytes (MSS = 1KBytes)
cwnd = 3KBytes
Expected Throughput = 3KBytes/100ms = 30KBytes/s
Actual Sampling
 rtt=300ms
 Actual Throughput = 3KBytes/300ms = 10KBytes/s
 Estimated flight size  = (30 – 10) KBytes/s * 0.1s = 2 KBytes  2 segments

Cwnd Control Example
 If =3KBytes (3 segments) and =5KBytes (5 segments),  < 

Result: cwnd = cwnd + 1 * MSS = 3 KBytes + 1 KBytes = 4 Kbytes
 If =1KBytes (1 segments) and =3KBytes (3 segments),     

Result: No change
 If =1KBytes (1 segments) and =1.5KBytes (1.5 segments),  < 

27
Result: cwnd = cwnd - 1 * MSS = 3 KBytes - 1 KBytes = 2 Kbytes
Computer Network
TCP Vegas
Alpha & Beta

Brakmo sets  to one and  to three.

=1 means…
 at least one segment will be left on the buffer of the bottleneck router.
 Then, when the aggregate traffic from the other connections decrease, our
connection can take advantage of the extra available bandwidth immediately
without having to wait for the one RTT delay necessary for the linear increase
to occur.

- =2 means…
 Small occasional changes in the available bandwidth will not create oscillations
in the window size.
 The - region provides a damping effects
28
Computer Network
TCP Vegas
=1 and =3
KB
congestion window
Consideration
70
60
50
40
30
20
10
0.5
1.0
1.5
2.0
2.5
3.0
3.5
4.0
4.5
5.0
5.5
6.0
6.5
7.0
7.5
8.0
5.0
5.5
6.0
6.5
7.0
7.5
8.0
CAM KBps
Black line = actual rate
Green line = expected rate
Shaded = region between  and 
Throughput
Time (seconds)
240
200
160
120
80
40
0.5
1.0
1.5
2.0
2.5
3.0
3.5 4.0 4.5
Time (seconds)
The goal is to keep the ActualRate between these two thresholds, that is, within the shaded region.
Whenever ActualRate falls below shaded region (i.e., gets too far from Expected Rate), TCP Vegas decreases
the congestion window because it fears that too many packets are being buffered in the network.
Whenever ActualRate goes above the shaded region (i.e., gets too close to the Expected Rate), TCP Vegas
increases the congestion window because it fears that it is underutilizing the network.
29
Computer Network
33
TCP Vegas
TCP Vegas generally outperforms TCP Reno in a homogeneous
environment


TCP Vegas achieves between 40% and 70% better throughput
TCP Vegas has 20% to 50% of the losses compared to the TCP Reno
TCP Reno
TCP Vegas
30
Computer Network
TCP Selective Ack
TCP SACK


RFC 2018, Oct. 1996
Selective Acknowledgment (SACK) allows…
 the receiver to inform the sender about all segments that have been
successfully received.
 the sender to retransmit only those segments that have been lost.
receiver
sender
31
Computer Network
TCP Selective Ack
TCP SACK-related Options

SACK-Permitted Option: from sender to receiver

SACK Option: from receiver to sender
TCP SACK Rules




32
A SACK cannot be sent unless the SACK-permitted option has been
received (in the SYN).
The first SACK block must contain the most recently received segment
that is to be SACKed.
The second block must contain the second most recently received
segment that is to be SACKed, and so forth.
Notice this can result in some data in the receiver’s buffers which should
be SACKed but is not (if there are more segments to SACK than available
space in the TCP header).
Computer Network
TCP Selective Ack
SACK Example (assuming Tahoe & long timeout)
receiver
sender
Assuming 3 maximum blocks
33
Computer Network
TCP Selective Ack
What should the Sender do?



The sender must keep a buffer of unacknowledged data.
When it receives a SACK option, it should turn on a SACK-flag bit for all
segments in the transmit buffer that are wholly contained within one of
the SACK blocks.
After this SACK flag bit has been turned on, the sender should skip that
segment during any later retransmission.
Rough Example

Assume packets 5-25 are transmitted

Let packets 5, 12, and 18 be lost

Receiver sends back a CACK=5, and SACK=(6-11,13-17,19-25)

34
Sender knows that packets 5, 12, and 18 are lost and retransmits them
immediately
Computer Network
TCP Selective Ack
SACK TCP at the Sender Example
receiver
sender
SENDER
TIMEOUT
35
Computer Network
TCP Selective Ack
SACK TCP Observation


SACK TCP follows standard TCP congestion control; it should not damage
the network.
This information allows the sender to better decide what it needs to
retransmit and what it does not.
 This can only serve to help the sender, and should not adversely affect other
TCPs (Tahoe, Reno, NewReno, Vegas).

36
While it is still possible for a SACK TCP to needlessly retransmit segments,
the number of these retransmissions has been shown to be quite low in
simulations, relative to Reno and Tahoe TCP.
Computer Network
TCP Selective Ack
SACK TCP Implementation

Windows 2000

Windows 98 / Windows ME

Solaris 7 and later

Linux kernel 2.1.90 and later

FreeBSD and NetBSD have optional modules
ACIRI has measured the behavior of 2278 random web servers that
claim to be SACK-enabled.
Out of these, 2133 (93.6%) appeared to ignore SACK data and only
145 (6.4%) appeared to actually use the SACK data.
– why is SACK deployment so much slower than Reno and New-Reno?
37
Computer Network