Transcript slides ppt
New designs for
Internet congestion control
Damon Wischik (UCL)
http://www.cs.ucl.ac.uk/staff/D.Wischik
Some Internet History
• 1974: First draft of TCP/IP
“A protocol for packet network interconnection”,
Vint Cerf and Robert Kahn
• 1983: ARPANET switches on TCP/IP
• 1986: Congestion collapse
• 1988: Congestion control for TCP
“Congestion avoidance and control”, Van Jacobson
“A Brief History of the Internet”, the Internet Society
Sizing router buffers SIGCOMM 2004
Guido Appenzeller Isaac Keslassy
Stanford University
Stanford University
Nick McKeown
Stanford University
Abstract. All Internet routers contain buffers to hold packets during times of congestion. Today, the size of
the buffers is determined by the dynamics of TCP's congestion control algorithm. In particular, the goal is
to make sure that when a link is congested, it is busy 100% of the time; which is equivalent to making sure
its buffer never goes empty. A widely used rule-of-thumb states that each link needs a buffer of size B =
RTT*C, where RTT is the average round-trip time of a flow passing across the link, and C is the data rate
of the link. For example, a 10Gb/s router linecard needs approximately 250ms*10Gb/s = 2.5Gbits of
buffers; and the amount of buffering grows linearly with the line-rate. Such large buffers are challenging
for router manufacturers, who must use large, slow, off-chip DRAMs. And queueing delays can be long,
have high variance, and may destabilize the congestion control algorithms. In this paper we argue that the
rule-of-thumb (B = RTT*C) is now outdated and incorrect for backbone routers. This is because of the
large number of flows (TCP connections) multiplexed together on a single backbone link. Using theory,
simulation and experiments on a network of real routers, we show that a link with N flows requires no
more than B = (RTT*C)/N, for long-lived or short-lived TCP flows. The consequences on router design
are enormous: A 2.5Gb/s link carrying 10,000 flows could reduce its buffers by 99% with negligible
difference in throughput; and a 10Gb/s link carrying 50,000 flows requires only 10Mbits of buffering,
which can easily be implemented using fast, on-chip SRAM.
http://tiny-tera.stanford.edu/~nickm/papers/index.html
bandwidth [0-100 kB/sec]
TCP
time [0-8 sec]
if (seqno > _last_acked) {
if (!_in_fast_recovery) {
_last_acked = seqno;
_dupacks = 0;
inflate_window();
send_packets(now);
_last_sent_time = now;
return;
}
if (seqno < _recover) {
uint32_t new_data = seqno - _last_acked;
_last_acked = seqno;
if (new_data < _cwnd) _cwnd -= new_data; else _cwnd=0;
_cwnd += _mss;
retransmit_packet(now);
send_packets(now);
return;
}
uint32_t flightsize = _highest_sent - seqno;
_cwnd = min(_ssthresh, flightsize + _mss);
_last_acked = seqno;
_dupacks = 0;
_in_fast_recovery = false;
send_packets(now);
return;
}
if (_in_fast_recovery) {
_cwnd += _mss;
send_packets(now);
return;
}
_dupacks++;
if (_dupacks!=3) {
send_packets(now);
return;
}
_ssthresh = max(_cwnd/2, (uint32_t)(2 * _mss));
retransmit_packet(now);
_cwnd = _ssthresh + 3 * _mss;
_in_fast_recovery = true;
_recover = _highest_sent;
}
How TCP shares capacity
individual
flow
bandwidths
available
bandwidth
sum of flow
bandwidths
time
Macroscopic description of TCP
• Let x be the mean bandwidth of a flow [pkts/sec]
Let RTT be the flow’s round-trip time [sec]
Let p be the packet loss probability
• The TCP algorithm increases x at rate 1/RTT2 [pkts/sec]
and reduces x by x/2 for every packet loss
• average increase in rate = average decrease in rate:
1/RTT2 = (p x) x/2
Macroscopic description
• Let x be the mean bandwidth of a flow [pkts/sec]
Let RTT be the flow’s round-trip time [sec]
Let p be the packet loss probability
• The TCP algorithm increases x at rate 1/RTT2 [pkts/sec]
and reduces x by x/2 for every packet loss
• average increase in rate = average decrease in rate:
1/RTT2 = (p x) x/2
• Consider a link with N identical flows
Let NC be the capacity of the link [pkts/sec]
• packet loss ratio = fraction of work that exceeds service rate:
p = (Nx-NC)+/Nx = (x-C)+/x
Fixed-point analysis
traffic intensity x/C
0.5
1
1.5
2
-1
C*RTT=4 pkts
log10 of
pkt loss
probability
-2
-3
C*RTT=20 pkts
-4
C*RTT=100 pkts
Teleological description
U(x)
P(y,C)
• Consider several TCP flows sharing a single link
• Let xr be the mean bandwidth of flow r [pkts/sec]
Let y be the total bandwidth of all flows [pkts/sec]
Let C be the total available capacity [pkts/sec]
• TCP and the network act so as to solve
maximise r U(xr) - P(y,C)
over xr0 where y=r xr
x
C
y
Rate control in communication networks: shadow
prices, proportional fairness and stability
Journal of the Operational Research Society, 1998
F.P.Kelly, A.K.Maulloo, D.K.H.Tan
Statistical Laboratory, Cambridge
Abstract. This paper analyses the stability and fairness of two classes of rate control algorithm for
communication networks. The algorithms provide natural generalizations to large-scale networks of simple
additive increase/multiplicative decrease schemes, and are shown to be stable about a system optimum
characterized by a proportional fairness criterion. Stability is established by showing that, with an
appropriate formulation of the overall optimization problem, the network's implicit objective function
provides a Lyapunov function for the dynamical system defined by the rate control algorithm. The
network's optimization problem may be cast in primal or dual form: this leads naturally to two classes of
algorithm, which may be interpreted in terms of either congestion indication feedback signals or explicit
rates based on shadow prices. Both classes of algorithm may be generalized to include routing control, and
provide natural implementations of proportionally fair pricing.
http://www.statslab.cam.ac.uk/~frank/rate.html
U(x)
Teleological description
little extra valued
attached to highbandwidth flows
severe penalty for
allocating too little
bandwidth
x
x
Teleological description
U(x)
flows with large
RTT are satisfied
with little bandwidth
flows with small
RTT want more
bandwidth
x
P(y,C)
Teleological description
no penalty unless
links are overloaded
C
y
Teleological description
Is this what we want the Internet to optimize?
Does it make good use of the network?
Can it deliver high bandwidth and good quality?
Is it a fair allocation?
Can we design a better allocation?
U(x)
•
•
•
•
•
x
C
y
Teleology & dynamics
• The network acts to solve an optimization problem.
– We can choose which optimization problem,
by changing the router & TCP’s code.
• But the network may or may not attain the solution!
U(x)
– To understand this, we need a
dynamical description of TCP
x
C
y
Dynamical description
• Consider a link with N flows
and capacity NC and buffer N1/2B
• Let xt be the average bandwidth at time t
Let pt be the packet loss probability at time t
• As N we believe a mean-field limit holds.
Dynamical description
• Fluid-based Analysis of a Network of AQM
Routers Supporting TCP Flows with an
Application to RED SIGCOMM 2000
Vishal Misra, Wei-Bo Gong, Don Towsley
• Rate-based versus queue-based models of
congestion control ACM Sigmetrics 2004
Supratim Deb, R. Srikant
• Mean field convergence of a rate model of
multiple TCP connections through a buffer
implementing RED To appear in Annals of Applied Probability
David McDonald, Julien Reynier
Dynamical stability/instability
arrival
rate x/C
1.4
1.2
0.8
0.6
20
40
60
80
100
20
40
60
80
100
1.4
1.2
0.8
0.6
time
• For some values of C and RTT,
the dynamical system is stable
• For others it is unstable and there are oscillations
(i.e. the flows are partially synchronized)
G.Raina and W. (2005)
Illustration of instability
Standard TCP, single bottleneck link, no AQM,
service C=60 pkts/sec/flow, buffer B=170pkts,
RTT=200 ms, #flows N=200
queue size
[0-170pkts]
flow bandwidths
[0-35pkts/RTT]
time [80-90sec]
Instability plot
traffic intensity x/C
0.5
1
1.5
2
-1
C*RTT=4 pkts
log10 of
pkt loss
probability
-2
-3
C*RTT=20 pkts
-4
C*RTT=100 pkts
Alternative buffer-sizing rules
Rule-of-thumb buffer size
buffer = bandwidth*delay
b100
b400
Rule-of-thumb buffer size, with RED
buffer=bandwidth*delay,
drop packets selectively before the buffer fills
b50
1.5
Small buffers
buffer=50 pkts
b50
b1000
-1
-2
p -3
Small buffers, ScalableTCP
-4
buffer=50 pkts, revised rate-increase rule
-5
-6
0.5
1
1.5
Scalable TCP: improving performance in
highspeed wide area networks SIGCOMM CCR 2003
Tom Kelly
CERN -- IT division
Abstract. TCP congestion control can perform badly in highspeed wide area networks because of its slow
response with large congestion windows. The challenge for any alternative protocol is to better utilize
networks with high bandwidth-delay products in a simple and robust manner without interacting badly with
existing traffic. Scalable TCP is a simple sender-side alteration to the TCP congestion window update
algorithm. It offers a robust mechanism to improve performance in highspeed wide area networks using
traditional TCP receivers. Scalable TCP is designed to be incrementally deployable and behaves identically
to traditional TCP stacks when small windows are sufficient. The performance of the scheme is evaluated
through experimental results gathered using a Scalable TCP implementation for the Linux operating system
and a gigabit transatlantic network. The preliminary results gathered suggest that the deployment of
Scalable TCP would have negligible impact on existing network traffic at the same time as improving bulk
transfer performance in highspeed wide area networks.
http://www-lce.eng.cam.ac.uk/~ctk21/scalable/
Teleological description
With small buffers,
the network likes to
run with slightly
lower utilization
U(x)
P(y,C)
ScalableTCP gives
more weight to highbandwidth flows
x
C
y
Conclusion
• The network acts to solve an optimization problem.
– We can choose which optimization problem,
by choosing the right buffer size & changing TCP’s code.
• It might not attain the solution
– In order to make sure the network is stable,
we need to choose the buffer size & TCP code carefully.
• PROPOSAL
– Buffers of size 20 packets in core routers
keep utilization below 90%; eliminate delay and jitter
– ScalableTCP
able to deliver higher bandwidth than TCP