Transcript of slides (PPT)

THE ROYAL SOCIETY
Buffer requirements
for TCP
Damon Wischik
www.wischik.com/damon
DARPA grant W911NF05-1-0254
Motivation
• Internet routers have buffers,
– to accommodate bursts in traffic
– to keep utilization high
• Large buffers are challenging:
– optical packet buffers are hard to build
– large electronic buffers are unsustainable
since data volumes double every 10–12 months [Odlyzko, 2003]
and CPU speeds double every 18 months
but DRAM memory access speeds double every 10 years
• How big do the buffers need to be,
to accommodate TCP traffic?
– 3 GByte? Rule of thumb says buffer = bandwidth×delay
– 300 MByte? buffer = bandwidth×delay/√#flows
[Appenzeller, Keslassy, McKeown, 2004]
– 30 kByte? constant buffer size, independent of line rate
[Kelly, Key, etc.]
History of TCP (Transmission Control Protocol)
• 1974: First draft of TCP/IP
[“A protocol for packet network interconnection”,
Vint Cerf and Robert Kahn]
• 1983: ARPANET switches over to TCP/IP
• 1986: Congestion collapse
• 1988: Congestion control for TCP
[“Congestion avoidance and control”, Van Jacobson]
“A Brief History of the Internet”, the Internet Society
End-to-end congestion control
Internet congestion is controlled by the end-systems.
The network operates as a dumb pipe.
[“End-to-end arguments in system design” by Saltzer, Reed, Clark, 1981]
[Diagram, built up over three slides: the User sends a request to the Server (TCP); the Server sends data; the User returns acknowledgements]
The TCP algorithm, running on the server, decides how fast to send data.
TCP algorithm
[Figure: traffic rate (0-100 kB/sec) over time (0-8 sec)]
// Fragment of the ACK-handling routine (the enclosing function header is
// not shown on the slide). _cwnd = congestion window, _mss = maximum
// segment size, _ssthresh = slow-start threshold.
if (seqno > _last_acked) {                  // this ACK acknowledges new data
    if (!_in_fast_recovery) {
        _last_acked = seqno;
        _dupacks = 0;
        inflate_window();                   // normal congestion-avoidance increase
        send_packets(now);
        _last_sent_time = now;
        return;
    }
    if (seqno < _recover) {                 // partial ACK during fast recovery
        uint32_t new_data = seqno - _last_acked;
        _last_acked = seqno;
        if (new_data < _cwnd) _cwnd -= new_data; else _cwnd = 0;
        _cwnd += _mss;
        retransmit_packet(now);             // retransmit the next lost packet
        send_packets(now);
        return;
    }
    // full ACK: leave fast recovery and deflate the window
    uint32_t flightsize = _highest_sent - seqno;
    _cwnd = min(_ssthresh, flightsize + _mss);
    _last_acked = seqno;
    _dupacks = 0;
    _in_fast_recovery = false;
    send_packets(now);
    return;
}
if (_in_fast_recovery) {                    // duplicate ACK while recovering
    _cwnd += _mss;                          // inflate window per duplicate ACK
    send_packets(now);
    return;
}
_dupacks++;
if (_dupacks != 3) {                        // wait for the third duplicate ACK
    send_packets(now);
    return;
}
// third duplicate ACK: halve the window and enter fast recovery
_ssthresh = max(_cwnd / 2, (uint32_t)(2 * _mss));
retransmit_packet(now);
_cwnd = _ssthresh + 3 * _mss;
_in_fast_recovery = true;
_recover = _highest_sent;
}
Buffer sizing I: TCP sawtooth
[Figure: rate against time, showing the sawtooth]
• The traffic rate produced by a TCP flow follows a ‘sawtooth’
• To prevent the link going idle, the buffer must be big enough
to smooth out a sawtooth
– buffer = bandwidth×delay
• But is this a worthwhile goal?
Why not sacrifice a little utilization, so that virtually no buffer is needed?
• When there are many TCP flows, the sawteeth should average out,
so a smaller buffer is sufficient
– buffer = bandwidth×delay/√N, where N = number of TCP flows
[Appenzeller, Keslassy, McKeown, 2004]
• But what if they don’t average out, i.e. they remain synchronized?
(A small worked comparison of these sizing rules follows below.)
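To give a feel for the scales involved, here is a minimal C++ sketch comparing the three sizing rules. The parameters (a 10 Gb/s link, 250 ms round trip time, 10,000 flows) are illustrative assumptions, not taken from the slides; the 3 GByte / 300 MByte / 30 kByte figures on the Motivation slide assume a faster core link.

#include <cmath>
#include <cstdio>

int main() {
    // Illustrative parameters (assumptions, not from the slides)
    const double bandwidth_bps = 10e9;    // line rate: 10 Gb/s
    const double rtt_s         = 0.25;    // round trip time: 250 ms
    const double n_flows       = 10000;   // number of long-lived TCP flows

    double bdp_bytes = bandwidth_bps * rtt_s / 8;       // rule of thumb
    double stanford  = bdp_bytes / std::sqrt(n_flows);  // Appenzeller et al.
    double constant  = 30e3;                            // constant, rate-independent

    std::printf("buffer = bandwidth x delay          : %7.1f MByte\n", bdp_bytes / 1e6);
    std::printf("buffer = bandwidth x delay / sqrt(N): %7.2f MByte\n", stanford / 1e6);
    std::printf("buffer = constant [Kelly, Key]      : %7.0f kByte\n", constant / 1e3);
}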
Buffer sizing II: TCP dynamics
[Figure, built up over three slides: aggregate data rate against the service rate (‘tcprecord’), queue size, 0-160 pkt (‘queueapprox’), and individual flow traces (‘tcp’), over a 3.5 s window with time ticks at 50, 51, 52 s]
• When the aggregate data rate is less than the service rate,
the queue stays small
• No packet drops, so TCPs increase their data rates
• Eventually the aggregate data rate exceeds the service rate,
and a queue starts to build up
• When the queue is full, packets start to get dropped
• One round trip time later, the TCPs respond and cut back
• They may overreact, leading to synchronization, i.e. periodic fluctuations
• Is it possible to keep the link slightly underutilized,
so that the queue size remains small?
Buffer sizing III: packet burstiness
• TCP traffic is made up of packets
– there may be packet clumps, if the access network is fast
– or the packets may be spaced out
[Figure: packet arrivals over time for the two cases]
• Even if we manage to keep total data rate < service rate,
packet-level bursts will still cause small, rapid fluctuations in queue size
• If the buffer is small, these fluctuations will lead to packet loss
– which makes TCPs keep their data rates low
– which keeps the queue size small, and reduces the chance of synchronization
[Figure: aggregate data rate (‘tcprecord’), queue size, 0-16 pkt (‘queueapprox’), and an individual flow trace (‘tcp’), over a 2.5 s window with time ticks at 47.5-50.0 s]
Mathematical models for the queue
• Over short timescales (<1 ms), TCP traffic is approximately Poisson
[“Internet traffic tends toward Poisson and independent as the load increases”,
Cao, Cleveland, Lin, Sun, 2002]
• With small buffers, queue fluctuations are rapid
(the average duration of a busy period is inversely proportional to the line rate)
[Cao, Ramanan, 2002]
[Figure: aggregate data rate (‘tcprecord’), queue size, 0-16 pkt (‘queueapprox’), and an individual flow trace (‘tcp’), over a 0.1 s window; 500 flows, service rate 0.8 Mbit/s (100 pkt/s/flow), average RTT 120 ms, buffer size 16 pkt]
• Therefore, we model small-buffer routers using Poisson traffic models.
Long-range dependence, self-similarity, and heavy tails are irrelevant!
• The packet loss probability in an M/D/1/B queue, at link utilization ρ,
is roughly p = ρ^B
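A quick way to get a feel for this approximation is to tabulate it; the following minimal C++ sketch (an illustration, not from the slides) evaluates p = ρ^B at a 20-packet buffer for a few utilizations:

#include <cmath>
#include <cstdio>

int main() {
    const int B = 20;                                // buffer size in packets
    for (double rho : {0.80, 0.90, 0.95, 0.99}) {    // link utilization
        double p = std::pow(rho, B);                 // the slide's approximation
        std::printf("utilization %.2f -> loss p = %.2e\n", rho, p);
    }
}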
Mathematical models for TCP
• Over long timescales (>10 ms), TCP traffic can be modelled
using differential equations & control theory
[Misra, Gong, Towsley, 2000]

dx_t/dt = 1/RTT² − x_t · x_{t−RTT} · p_{t−RTT} / 2

where x_t is the average data rate at time t, RTT is the round trip time
(“delay”), and p_{t−RTT} is the packet drop probability at time t−RTT
(calculated from the Poisson model for the queue).
• Use this to
– calculate the equilibrium link utilization (the “TCP throughput formula”),
given C, the available bandwidth per flow
– work out whether the system is synchronized or stable
– evaluate the effect of changing buffer size
– investigate alternatives to TCP
(A small simulation sketch of this model follows below.)
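To make the model concrete, here is a minimal forward-Euler sketch in C++ (an illustration under stated assumptions, not the slides' own code). It integrates the delay differential equation above, taking the drop probability from the M/D/1/B approximation p = ρ^B; the parameters match the earlier figure (100 pkt/s per flow, 120 ms RTT), with an assumed 20-packet buffer.

#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
    const double RTT = 0.12;            // round trip time: 120 ms
    const double C   = 100.0;           // service rate per flow: 100 pkt/s
    const int    B   = 20;              // buffer size in packets (assumed)
    const double dt  = 0.001;           // Euler step: 1 ms
    const int    lag = (int)(RTT / dt); // one RTT, expressed in steps

    std::vector<double> x(lag + 1, 0.5 * C);  // rate history, start at 50% load
    for (int step = 0; step < 20000; ++step) {
        double x_now = x.back();                    // x_t
        double x_lag = x[x.size() - 1 - lag];       // x_{t-RTT}
        double rho   = std::min(x_lag / C, 1.0);    // utilization one RTT ago
        double p     = std::pow(rho, B);            // M/D/1/B loss approximation
        double dx    = 1.0 / (RTT * RTT)            // additive increase
                     - 0.5 * x_now * x_lag * p;     // window halving on loss
        x.push_back(x_now + dx * dt);
        if (step % 2000 == 0)
            std::printf("t=%5.2f s  rate=%6.1f pkt/s  p=%.2e\n",
                        step * dt, x_now, p);
    }
}

Depending on B, C, and RTT, the trajectory either settles to an equilibrium or oscillates with period roughly one RTT; this is what the instability plots below summarize.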
Instability plot
[Plot: log10 of packet loss probability p (−1 down to −4) against link utilization ρ (0.5 to 2). The TCP throughput formula and the loss-probability curve for the M/D/1/B queue are both drawn; the equilibrium lies at their intersection, and shading marks the extent of the oscillations in ρ]
C = bandwidth per flow
RTT = average round trip time
C·RTT = TCP window size
[Second plot: the same axes, with the throughput formula drawn for
C·RTT = 4, 20, and 100 pkts, at buffer size B = 20 pkts]
(A small equilibrium calculation follows below.)
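The equilibrium in these plots is where the two curves cross. Assuming the square-root form of the TCP throughput formula (window ≈ √(2/p), so ρ·C·RTT = √(2/p)) together with p = ρ^B, the crossing has a closed form, ρ = (√2/(C·RTT))^(2/(B+2)); a minimal C++ sketch:

#include <cmath>
#include <cstdio>

int main() {
    const int B = 20;                          // buffer size in packets
    for (double crtt : {4.0, 20.0, 100.0}) {   // C·RTT = window size, in pkts
        // rho^(1 + B/2) = sqrt(2) / (C·RTT), from equating the two curves
        double rho = std::pow(std::sqrt(2.0) / crtt, 1.0 / (1.0 + B / 2.0));
        double p   = std::pow(rho, B);         // loss probability at equilibrium
        std::printf("C.RTT = %5.1f pkt: utilization %.3f, loss p = %.2e\n",
                    crtt, rho, p);
    }
}

With a 20-packet buffer this gives utilization around 0.91 for C·RTT = 4 pkts, falling as the window grows; that is consistent with the conclusion later that a buffer of about 25 pkt gives link utilization above 90%.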
Different buffer size / TCP
[Three instability plots, each showing log10 of loss probability p (−1 down to −4, and down to −6 in the last) against link utilization ρ (0.5 to 1.5):]
• Intermediate buffers (buffer = bandwidth×delay/√#flows)
or large buffers (buffer = bandwidth×delay), with AQM:
buffer = bandwidth×delay×{¼, 1, 4} (curves b25, b100, b400)
• Small buffers:
buffer = {10, 20, 50} pkts (curves b10, b20, b50)
• Small buffers, ScalableTCP:
buffer = {50, 1000} pkts (curves b50, b1000)
[Vinnicombe 2002, T. Kelly 2002]
Results

                         Small buffers             Large buffers
                         [20-50 pkts]              [>50 ms queue delay]

Low-bandwidth flows      • slightly synchronized;  • desynchronized;
[window <5 pkts]         • high utilization;       • high utilization;
                         • low delay/jitter.       • high delay, low jitter.

High-bandwidth flows     • desynchronized;         • severely synchronized;
[window >10 pkts]        • moderate utilization;   • high utilization;
                         • low delay/jitter.       • high delay/jitter.

There is a virtuous circle:
desynchronization permits small buffers; small buffers induce desynchronization.
Limitations/concerns
• Surely bottlenecks are at the access network,
not the core network?
– Unwise to rely on this!
– The small-buffer theory works fine for as few as 20 flows
– If the core is underutilized, it definitely doesn’t need big buffers
• The Poisson model sometimes breaks down
– because of short-timescale packet clumps (not LRD!)
– need more measurement of short-timescale Internet traffic statistics
• Limited validation so far
[McKeown et al. at Stanford, Level3, Internet2]
• Proper validation needs
– a goodly amount of traffic
– a full measurement kit
– the ability to control buffer size
Conclusion
• Buffer sizes can be very small
– a buffer of 25 pkt gives link utilization > 90%
– small buffers mean that TCP flows get better feedback,
so they can better judge how much capacity is available
– use Poisson traffic models for the router,
differential equation models for aggregate traffic
– Further questions:
What is the impact of packet ‘clumpiness’?
How well would non-FIFO buffers work?
• TCP can be improved with simple changes
– e.g. spacing out TCP packets
– e.g. modify the window increase/decrease rules (see the sketch after this list)
[ScalableTCP: Vinnicombe 2004, Kelly 2004; XCP: Katabi, Handley, Rohrs 2000]
– any future transport protocol should be designed along these lines
– improved TCP may find its way into Linux/Windows within 5 years
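To illustrate the kind of rule change meant here, the sketch below contrasts standard TCP's increase/decrease with Scalable TCP's. The constants a = 0.01 (per ACK) and b = 0.125 (on loss) are those proposed for Scalable TCP; treat this as a hedged sketch of the rules, not a full implementation.

#include <cstdio>

// One connection's congestion window, in packets, for simplicity.
static double cwnd = 10.0;

// Standard TCP (AIMD): about +1 packet per round trip, halve on loss.
void standard_on_ack()  { cwnd += 1.0 / cwnd; }
void standard_on_loss() { cwnd *= 0.5; }

// Scalable TCP (MIMD): fixed small increase per ACK,
// gentle multiplicative decrease on loss.
void scalable_on_ack()  { cwnd += 0.01; }
void scalable_on_loss() { cwnd *= (1.0 - 0.125); }

int main() {
    for (int i = 0; i < 1000; ++i) scalable_on_ack();
    std::printf("cwnd after 1000 ACKs: %.1f pkts\n", cwnd);
    scalable_on_loss();
    std::printf("cwnd after one loss : %.1f pkts\n", cwnd);
}

The gentler decrease means a flow recovers from a loss in a time independent of its window size, which is what makes the small-buffer regime attractive at high line rates.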