3rd Edition: Chapter 2
Week 11
TCP Congestion Control
Principles of Congestion Control
Congestion:
informally: “too many sources sending too much data too fast for the network to handle”
different from flow control!
manifestations:
lost packets (buffer overflow at routers)
long delays (queueing in router buffers)
a top-10 problem!
Causes/costs of congestion: scenario 1
two senders, two receivers
one router, infinite buffers
no retransmission
[Figure: Host A and Host B send original data at rate λ_in into one router with unlimited shared output link buffers; λ_out is the delivered throughput]
large delays when congested
maximum achievable throughput
Causes/costs of congestion: scenario 2
one router, finite buffers
sender retransmission of lost packets
[Figure: Host A and Host B send λ_in (original data); λ'_in (original data plus retransmitted data) enters the router's finite shared output link buffers; λ_out is the delivered throughput]
Causes/costs of congestion: scenario 2
always: λ_in = λ_out (goodput)
“perfect” retransmission only when loss: λ'_in > λ_out
retransmission of a delayed (not lost) packet makes λ'_in larger (than the perfect case) for the same λ_out
[Figure: three plots of λ_out versus λ_in, axes running to R/2: (a) λ_out = λ_in; (b) goodput levels off near R/3; (c) goodput levels off near R/4]
“costs” of congestion:
more work (retransmissions) for a given “goodput”
unneeded retransmissions: link carries multiple copies of a packet
Causes/costs of congestion: scenario 3
Q: what happens as λ_in and λ'_in increase?
four senders
multihop paths
timeout/retransmit
[Figure: Host A and Host B among four senders; λ_in (original data) and λ'_in (original data plus retransmitted data) cross finite shared output link buffers; λ_out is the delivered throughput]
Example
What happens when each demand peaks at the unity rate? Throughput T = 1.52? (How?) And at twice the unity rate, T = 1.07?
Max-min fair allocation
Given a network and a set of sessions, we would like to find a maximal flow that is fair.
We will see different definitions of max-min fairness and will learn a flow control algorithm.
The tutorial will give an understanding of what max-min fairness is.
How to define fairness?
Any session is entitled to as much network use as any other.
Allocate the same share to all.
Max-Min Flow Control Rule
The rule: maximize the network use allocated to the sessions with the minimum allocation.
An alternative definition: maximize the allocation of each session i under the constraint that an increase in i's allocation does not cause a decrease in the allocation of some other session with the same or a smaller rate than i.
Example
[Figure: sessions 0, 1, 2, and 3 over two links, each with capacity C=1]
The max-min fair flow division gives sessions 0, 1, and 2 a rate of 1/3 each, and session 3 a rate of 2/3.
Notation
G = (N, A) – directed network graph (N is the set of vertices, A is the set of links)
C_a – the capacity of link a
F_a – the flow on link a
P – the set of sessions
r_p – the rate of session p
We assume a fixed, single-path routing method.
Definitions
F_a = Σ_{p crossing a} r_p (the flow on link a is the sum of the rates of all sessions p crossing it)
We have the following constraints on the vector r = {r_p | p ∈ P}:
r_p ≥ 0, for all p ∈ P
F_a ≤ C_a, for all a ∈ A
A vector r satisfying these constraints is said to be feasible.
Definitions
A vector of rates r is said to be max-min fair if it is feasible and, for each p ∈ P, r_p cannot be increased while maintaining feasibility without decreasing r_p' for some session p' for which r_p' ≤ r_p.
We want to find a rate vector that is max-min fair.
Bottleneck Link for a Session
Given some feasible flow r, we say that link a is a bottleneck link with respect to r for a session p crossing a if F_a = C_a and r_p ≥ r_p' for all sessions p' crossing link a.
[Figure: network with links a, b, c, d and five sessions, labeled session:rate (1:2/3, 2:1/3, 3:1/3, 4:1, 5:1/3)]
All link capacities are 1. The bottlenecks for sessions 1, 2, 3, 4, 5 are c, a, a, d, a, respectively.
Note: c is not a bottleneck for 5, and b is not a bottleneck for 1.
Max-Min Fairness Definition Using Bottleneck
Theorem: A feasible rate vector r is max-min fair if and only if each session has a bottleneck link with respect to r.
Algorithm for Computing Max-Min Fair Rate Vectors
The idea of the algorithm: bring all sessions to a state in which each has a bottleneck link; by the theorem, the result is then the max-min fair flow.
We start with an all-zero rate vector and increase the rates on all paths together until F_a = C_a for one or more links a.
At this point, each session using a saturated link has the same rate as every other session using this link. Thus, these saturated links serve as bottleneck links for all sessions using them.
Algorithm for Computing Max-Min Fair Rate Vectors
At the next step, all sessions not using the saturated links are incremented equally in rate until one or more new links become saturated.
Note that the sessions using the previously saturated links might also be using these newly saturated links (at a lower rate).
The algorithm continues from step to step, always equally incrementing all sessions not passing through any saturated link, until all sessions pass through at least one such link.
Algorithm for Computing Max-Min Fair Rate Vectors
Init: k=1, F_a^0 = 0, r_p^0 = 0, P^1 = P, A^1 = A
1. For all a ∈ A^k: n_a^k := number of sessions p ∈ P^k crossing link a
2. Δr := min_{a ∈ A^k} (C_a − F_a^{k−1}) / n_a^k (find increment size)
3. For all p ∈ P^k: r_p^k := r_p^{k−1} + Δr (increment); for all other p: r_p^k := r_p^{k−1}
4. F_a^k := Σ_{p crossing a} r_p^k (update flow)
5. A^{k+1} := the set of unsaturated links
6. P^{k+1} := all p that cross only links in A^{k+1}
7. k := k+1
8. If P^k is empty then stop, else go to 1
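The following Python sketch implements this algorithm under the slides' assumptions (fixed single-path routing, known link capacities). The function and variable names and the saturation tolerance are illustrative choices; the example topology is a hypothesized one that reproduces the rates and bottlenecks stated for the figure.

```python
def compute_max_min_fair(capacity, paths):
    """capacity: {link: C_a}; paths: {session: set of links it crosses}.
    Returns {session: max-min fair rate r_p}."""
    rate = {p: 0.0 for p in paths}                # r_p^0 = 0
    flow = {a: 0.0 for a in capacity}             # F_a^0 = 0
    active = set(paths)                           # P^k: sessions not yet bottlenecked
    unsat = set(capacity)                         # A^k: unsaturated links
    while active:
        # step 1: n_a^k, number of active sessions crossing each unsaturated link
        n = {a: sum(1 for p in active if a in paths[p]) for a in unsat}
        # step 2: smallest equal increment that saturates some link
        dr = min((capacity[a] - flow[a]) / n[a] for a in unsat if n[a] > 0)
        for p in active:                          # step 3: equal increment
            rate[p] += dr
        for a in capacity:                        # step 4: update link flows
            flow[a] = sum(rate[p] for p in paths if a in paths[p])
        # steps 5-6: drop saturated links and the sessions crossing them
        unsat = {a for a in unsat if capacity[a] - flow[a] > 1e-9}
        active = {p for p in active if paths[p] <= unsat}
    return rate

# Example consistent with the five-session figure (all capacities 1):
caps = {'a': 1.0, 'b': 1.0, 'c': 1.0, 'd': 1.0}
paths = {1: {'b', 'c'}, 2: {'a'}, 3: {'a'}, 4: {'d'}, 5: {'a', 'c'}}
print(compute_max_min_fair(caps, paths))  # 1: 2/3, 2: 1/3, 3: 1/3, 4: 1, 5: 1/3
```

Running it reproduces the three saturation steps traced on the next slide: link a first, then c, then d.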
Example of Algorithm Running
[Figure: the same network as before; links a, b, c, d, all of capacity 1, with sessions labeled session:rate (1:2/3, 2:1/3, 3:1/3, 4:1, 5:1/3)]
Step 1: All sessions get a rate of 1/3, because of link a, and link a is saturated.
Step 2: Sessions 1 and 4 get an additional rate increment of 1/3, for a total of 2/3. Link c is now saturated.
Step 3: Session 4 gets an additional rate increment of 1/3, for a total of 1. Link d is saturated.
End
Example revisited
Max-min fair vector if T_ij = ∞:
r = (½, ½, ½, ½)
T = 2 > 1.52
What if the demands are T13 = T31 = ¼, T24 = ½, T42 = ∞?
r = (¼, ½, ¼, ¾)
Causes/costs of congestion: scenario 3
[Figure: multihop topology from scenario 3; Host A's delivered throughput λ_out depends on capacity consumed upstream by Host B's traffic]
Another “cost” of congestion:
when a packet is dropped, any upstream transmission capacity used for that packet was wasted!
Approaches towards congestion control
Two broad approaches towards congestion control:
End-end congestion control:
no explicit feedback from network
congestion inferred from end-system observed loss, delay
approach taken by TCP
Network-assisted congestion control:
routers provide feedback to end systems
single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
explicit rate at which the sender should send
Case study: ATM ABR congestion control
ABR: available bit rate:
“elastic service”
if sender's path “underloaded”: sender should use available bandwidth
if sender's path congested: sender throttled to minimum guaranteed rate
RM (resource management) cells:
sent by sender, interspersed with data cells
bits in RM cell set by switches (“network-assisted”)
NI bit: no increase in rate (mild congestion)
CI bit: congestion indication
RM cells returned to sender by receiver, with bits intact
Case study: ATM ABR congestion control
two-byte ER (explicit rate) field in RM cell
congested switch may lower the ER value in the cell
sender's send rate is thus the minimum supportable rate on the path
EFCI bit in data cells: set to 1 by a congested switch
if the data cell preceding an RM cell has EFCI set, the receiver sets the CI bit in the returned RM cell
TCP Congestion Control
end-end control (no network assistance)
sender limits transmission:
LastByteSent − LastByteAcked ≤ CongWin
Roughly, rate = CongWin / RTT bytes/sec
CongWin is dynamic, a function of perceived network congestion
How does the sender perceive congestion?
loss event = timeout or 3 duplicate ACKs
TCP sender reduces rate (CongWin) after a loss event
three mechanisms:
AIMD
slow start
conservative after timeout events
TCP AIMD
multiplicative decrease: cut CongWin in half after a loss event
additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events: probing
[Figure: congestion window of a long-lived TCP connection, a sawtooth over time between 8 Kbytes, 16 Kbytes, and 24 Kbytes]
Additive Increase
increase CongWin by 1 MSS every RTT in the absence of loss events: probing
cwnd += SMSS*SMSS/cwnd (*)
This adjustment is executed on every incoming non-duplicate ACK.
Equation (*) provides an acceptable approximation to the underlying principle of increasing cwnd by 1 full-sized segment per RTT.
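A numeric sketch of why equation (*) yields roughly 1 MSS per RTT: about cwnd/SMSS non-duplicate ACKs arrive per RTT, each adding SMSS²/cwnd. The SMSS value and window size below are illustrative.

```python
SMSS = 1460.0                    # sender maximum segment size, bytes (illustrative)
cwnd = 10 * SMSS
for _ in range(10):              # ~one window of non-duplicate ACKs ≈ one RTT
    cwnd += SMSS * SMSS / cwnd   # equation (*)
print(cwnd / SMSS)               # ≈ 10.95 segments: about +1 MSS after one RTT
```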
TCP Slow Start
When connection begins, CongWin = 1 MSS
Example: MSS = 500 bytes & RTT = 200 msec, so initial rate = 20 kbps
available bandwidth may be >> MSS/RTT
desirable to quickly ramp up to a respectable rate
When connection begins, increase rate exponentially fast until first loss event
TCP Slow Start (more)
When connection begins, increase rate exponentially until first loss event:
double CongWin every RTT
done by incrementing CongWin for every ACK received
Summary: initial rate is slow but ramps up exponentially fast
[Figure: timing diagram between Host A and Host B; one segment in the first RTT, two in the second, four in the third]
Refinement
After 3 dup ACKs:
CongWin is cut in half; Threshold is set to CongWin
window then grows linearly
But after a timeout event:
Threshold set to CongWin/2, and CongWin instead set to 1 MSS;
window then grows exponentially to a threshold, then grows linearly
Philosophy:
3 dup ACKs indicate the network is capable of delivering some segments
a timeout before 3 dup ACKs is “more alarming”
Refinement (more)
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.
Implementation:
Variable Threshold
At a loss event, Threshold is set to 1/2 of CongWin just before the loss event
Summary: TCP Congestion Control
When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially.
When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly.
When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold.
When a timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS.
TCP sender congestion control
Event: ACK receipt for previously unacked data
State: Slow Start (SS)
TCP Sender Action: CongWin = CongWin + MSS; if (CongWin > Threshold), set state to “Congestion Avoidance”
Commentary: Results in a doubling of CongWin every RTT

Event: ACK receipt for previously unacked data
State: Congestion Avoidance (CA)
TCP Sender Action: CongWin = CongWin + MSS * (MSS/CongWin)
Commentary: Additive increase, resulting in an increase of CongWin by 1 MSS every RTT

Event: Loss event detected by triple duplicate ACK
State: SS or CA
TCP Sender Action: Threshold = CongWin/2; CongWin = Threshold; set state to “Congestion Avoidance”
Commentary: Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.

Event: Timeout
State: SS or CA
TCP Sender Action: Threshold = CongWin/2; CongWin = 1 MSS; set state to “Slow Start”
Commentary: Enter slow start

Event: Duplicate ACK
State: SS or CA
TCP Sender Action: Increment duplicate ACK count for segment being ACKed
Commentary: CongWin and Threshold not changed
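A compact Python sketch of the sender behavior in this table. The class shape, the initial Threshold value, and working in units of whole segments are illustrative simplifications, not mandated by the slides.

```python
class TcpSender:
    """Congestion state machine from the table above (illustrative sketch)."""
    def __init__(self, mss=1):
        self.mss = mss
        self.cong_win = 1 * mss        # CongWin
        self.threshold = 64 * mss      # Threshold (arbitrary large initial value)
        self.state = "SS"              # "SS" = slow start, "CA" = congestion avoidance
        self.dup_acks = 0

    def on_new_ack(self):              # ACK for previously unacked data
        self.dup_acks = 0
        if self.state == "SS":
            self.cong_win += self.mss  # doubles CongWin every RTT
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:                          # CA: +MSS*(MSS/CongWin), i.e. +1 MSS per RTT
            self.cong_win += self.mss * self.mss / self.cong_win

    def on_dup_ack(self):
        self.dup_acks += 1
        if self.dup_acks == 3:         # triple duplicate ACK: fast recovery
            self.threshold = max(self.cong_win / 2, self.mss)
            self.cong_win = self.threshold
            self.state = "CA"

    def on_timeout(self):              # timeout: re-enter slow start at 1 MSS
        self.threshold = max(self.cong_win / 2, self.mss)
        self.cong_win = 1 * self.mss
        self.state = "SS"
        self.dup_acks = 0
```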
TCP Futures
Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
Requires window size W = 83,333 in-flight segments
Throughput in terms of loss rate: throughput = 1.22 · MSS / (RTT · √p)
⇒ requires p = 2·10⁻¹⁰
New versions of TCP needed for high speed!
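As a worked check of these numbers (values from the slide; the script is only a calculation aid):

```python
# Check the slide's numbers with throughput = 1.22 * MSS / (RTT * sqrt(p)).
MSS = 1500 * 8            # segment size in bits
RTT = 0.100               # seconds
target = 10e9             # 10 Gbps

W = target * RTT / MSS                    # in-flight segments needed
p = (1.22 * MSS / (RTT * target)) ** 2    # loss rate the formula allows
print(round(W), p)                        # ~83333 segments, p ≈ 2e-10
```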
Macroscopic TCP model
Deterministic packet losses
1/p packets transmitted in a cycle
[Figure: sawtooth window evolution; each cycle is a run of successes ended by one loss]
TCP Model Contd
Equate the trapezoid area (3/8)W² under one sawtooth cycle to 1/p:
(3/8) W² = 1/p ⇒ W = √(8/(3p))
Average rate = (3/4) W / RTT = C / (RTT √p) packets/sec, where C = √(3/2) ≈ 1.22
TCP Fairness
Fairness goal: if K TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/K
[Figure: TCP connection 1 and TCP connection 2 traverse a bottleneck router of capacity R]
Why is TCP fair?
Two competing sessions:
additive increase gives a slope of 1 as throughput increases
multiplicative decrease decreases throughput proportionally
[Figure: phase plot of Connection 1 throughput versus Connection 2 throughput, both axes up to R; repeated cycles of “congestion avoidance: additive increase” and “loss: decrease window by factor of 2” converge toward the equal-bandwidth-share line]
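The convergence argument can be sketched numerically: both flows add equal increments until the link is full, then both halve, so their difference is cut in half every cycle. Starting rates, capacity, and round count below are arbitrary illustrations.

```python
# Two AIMD flows sharing a link of capacity R converge to equal shares.
R = 100.0
x, y = 80.0, 10.0                  # deliberately unequal starting throughputs
for _ in range(200):
    if x + y > R:                  # loss: multiplicative decrease for both
        x, y = x / 2, y / 2
    else:                          # congestion avoidance: additive increase
        x, y = x + 1.0, y + 1.0
print(round(x, 1), round(y, 1))    # nearly equal after many cycles
```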
Fairness (more)
Fairness and UDP
Multimedia apps often do not use TCP
do not want rate throttled by congestion control
Instead use UDP: pump audio/video at constant rate, tolerate packet loss
Research area: TCP friendly, DCCP
Fairness and parallel TCP connections
nothing prevents an app from opening parallel connections between 2 hosts
Web browsers do this
Example: link of rate R supporting 9 connections;
new app asks for 1 TCP, gets rate R/10
new app asks for 10 TCPs, gets R/2!
Queuing Disciplines
Each router must implement some queuing discipline
Queuing allocates both bandwidth and buffer space:
Bandwidth: which packet to serve (transmit) next
Buffer space: which packet to drop next (when required)
Queuing also affects latency
Typical Internet Queuing
FIFO + drop-tail
Simplest choice
Used widely in the Internet
FIFO (first-in-first-out): implies a single class of traffic
Drop-tail: arriving packets get dropped when the queue is full, regardless of flow or importance
Important distinction:
FIFO: scheduling discipline
Drop-tail: drop policy
FIFO + Drop-tail Problems
Leaves responsibility for congestion control completely to the edges (e.g., TCP)
Does not separate between different flows
No policing: send more packets, get more service
Synchronization: end hosts react to the same events
FIFO + Drop-tail Problems
Full queues
Routers are forced to have large queues to maintain high utilization
TCP detects congestion from loss
Forces the network to have long standing queues in steady state
Lock-out problem
Drop-tail routers treat bursty traffic poorly
Traffic gets synchronized easily, allowing a few flows to monopolize the queue space
Active Queue Management
Design active router queue management to aid congestion control
Why?
Router has a unified view of queuing behavior
Routers see actual queue occupancy (can distinguish queueing delay from propagation delay)
Routers can decide on transient congestion, based on workload
Design Objectives
Keep throughput high and delay low
High power (throughput/delay)
Accommodate bursts
Queue size should reflect the ability to accept bursts rather than steady-state queuing
Improve TCP performance with minimal hardware changes
Lock-out Problem
Random drop
A packet arriving when the queue is full causes some random packet to be dropped
Drop front
On a full queue, drop the packet at the head of the queue
Random drop and drop front solve the lock-out problem, but not the full-queues problem
Full Queues Problem
Drop packets before the queue becomes full (early drop)
Intuition: notify senders of incipient congestion
Example: early random drop (ERD):
if qlen > drop level, drop each new packet with fixed probability p
does not control misbehaving users
Random Early Detection (RED)
Detect incipient congestion
Assume hosts respond to lost packets
Avoid window synchronization: randomly mark packets
Avoid bias against bursty traffic
RED Algorithm
Maintain a running average of the queue length
If avg < min_th, do nothing: low queueing, send packets through
If avg > max_th, drop the packet: protection from misbehaving sources
Else mark the packet with a probability proportional to the queue length: notify sources of incipient congestion
RED Operation
[Figure: RED drop probability versus average queue length; P(drop) is 0 below min_th, rises linearly to maxP at max_th, and jumps to 1.0 above max_th]
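A sketch of the RED decision from the last two slides, assuming the usual exponentially weighted moving average for the queue length and the linear ramp of the figure; the weight and threshold values are illustrative.

```python
import random

W = 0.002                 # EWMA weight for the average queue length (illustrative)
MIN_TH, MAX_TH = 5, 15    # thresholds, in packets (illustrative)
MAX_P = 0.1               # mark/drop probability at avg == MAX_TH

avg = 0.0                 # running average of the queue length

def red_arrival(queue_len):
    """Return True if the arriving packet should be marked/dropped."""
    global avg
    avg = (1 - W) * avg + W * queue_len               # running average
    if avg < MIN_TH:
        return False                                  # low queueing: admit
    if avg >= MAX_TH:
        return True                                   # protect from misbehaving sources
    p = MAX_P * (avg - MIN_TH) / (MAX_TH - MIN_TH)    # linear ramp from the figure
    return random.random() < p
```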
Improving QOS in IP Networks
Thus far: “making the best of best effort” (a simple model for sharing and congestion studies)
Future: next-generation Internet with QoS guarantees
RSVP: signaling for resource reservations
Differentiated Services: differential guarantees
Integrated Services: firm guarantees
Principles for QOS Guarantees
Example: a 1 Mbps IP phone and an FTP transfer share a 1.5 Mbps link.
bursts of FTP can congest the router and cause audio loss
want to give priority to audio over FTP
Principle 1: packet marking is needed for the router to distinguish between different classes, and a new router policy to treat packets accordingly
Principles for QOS Guarantees (more)
what if applications misbehave (audio sends at a higher rate than declared)?
policing: force source adherence to bandwidth allocations
marking and policing at the network edge:
similar to ATM UNI (User Network Interface)
Principle 2: provide protection (isolation) for one class from others
Principles for QOS Guarantees (more)
Allocating fixed (non-sharable) bandwidth to a flow: inefficient use of bandwidth if the flow doesn't use its allocation
Principle 3: while providing isolation, it is desirable to use resources as efficiently as possible
Principles for QOS Guarantees (more)
Basic fact of life: cannot support traffic demands beyond link capacity
Principle 4: call admission: a flow declares its needs; the network may block the call (e.g., busy signal) if it cannot meet them
Summary of QoS Principles
Let's next look at mechanisms for achieving this ...
Scheduling And Policing Mechanisms
scheduling: choose the next packet to send on the link
FIFO (first in first out) scheduling: send in order of arrival to the queue
real-world example?
discard policy: if a packet arrives to a full queue, which packet to discard?
Tail drop: drop the arriving packet
priority: drop/remove on a priority basis
random: drop/remove randomly
Scheduling Policies: more
Priority scheduling: transmit the highest-priority queued packet
multiple classes, with different priorities
class may depend on marking or other header info, e.g., IP source/dest, port numbers, etc.
Scheduling Policies: still more
round robin scheduling:
multiple classes
cyclically scan class queues, serving one from each class (if available)
Scheduling Policies: still more
Weighted Fair Queuing:
generalized round robin
each class gets a weighted amount of service in each cycle
Deficit Weighted Round Robin (DWRR)
DWRR addresses the limitations of the WRR model by accurately supporting the weighted fair distribution of bandwidth when servicing queues that contain variable-length packets.
DWRR addresses the limitations of the WFQ model by defining a scheduling discipline that has lower computational complexity and that can be implemented in hardware. This allows DWRR to support the arbitration of output port bandwidth on high-speed interfaces in both the core and at the edges of the network.
DRR
In DWRR queuing, each queue is configured with a number of parameters:
A weight that defines the percentage of the output port bandwidth allocated to the queue.
A DeficitCounter that specifies the total number of bytes that the queue is permitted to transmit each time it is visited by the scheduler. The DeficitCounter allows a queue that was not permitted to transmit in the previous round (because the packet at the head of the queue was larger than the value of the DeficitCounter) to save transmission “credits” and use them during the next service round.
DWRR
In the classic DWRR algorithm, the scheduler visits each non-empty queue and determines the number of bytes in the packet at the head of the queue.
The variable DeficitCounter is incremented by the value quantum. If the size of the packet at the head of the queue is greater than the variable DeficitCounter, then the scheduler moves on to service the next queue.
If the size of the packet at the head of the queue is less than or equal to the variable DeficitCounter, then the variable DeficitCounter is reduced by the number of bytes in the packet and the packet is transmitted on the output port.
DWRR
A quantum of service that is proportional to the weight of the queue and is expressed in bytes. The DeficitCounter for a queue is incremented by the quantum each time the queue is visited by the scheduler.
The scheduler continues to dequeue packets and decrement the variable DeficitCounter by the size of the transmitted packet until either the size of the packet at the head of the queue is greater than the variable DeficitCounter or the queue is empty.
If the queue is empty, the value of DeficitCounter is set to zero.
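A minimal Python sketch of the scheduler loop just described, covering one visit per queue per round; the function shape, the transmit callback, and keeping the DeficitCounter values in a list are illustrative choices. Packet sizes and quanta follow the example on the next slide.

```python
from collections import deque

def dwrr_round(queues, quanta, deficit, transmit):
    """One scheduler round. queues: list of deques of packet sizes (bytes);
    quanta: per-queue quantum (bytes); deficit: per-queue DeficitCounter,
    carried across rounds; transmit: callback(queue_index, packet_size)."""
    for i, q in enumerate(queues):
        if q:
            deficit[i] += quanta[i]          # add this queue's quantum
            # send head packets while the accumulated credits cover them
            while q and q[0] <= deficit[i]:
                pkt = q.popleft()
                deficit[i] -= pkt
                transmit(i, pkt)
        if not q:
            deficit[i] = 0                   # empty queue: reset its credits

# Example from the next slide: quanta of 1000/500/500 bytes for queues 1/2/3
queues = [deque([300, 400, 600]), deque([400, 300, 400]), deque([400, 300, 600])]
deficit = [0, 0, 0]
dwrr_round(queues, [1000, 500, 500], deficit,
           lambda i, pkt: print(f"queue {i+1}: sent {pkt} bytes"))
```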
DWRR Example
Queue 1: 50% BW, quantum[1] = 1000; packets: 300, 400, 600
Queue 2: 25% BW, quantum[2] = 500; packets: 400, 300, 400
Queue 3: 25% BW, quantum[3] = 500; packets: 400, 300, 600
Modified Deficit Round Robin gives priority to one class, say VoIP.
Policing Mechanisms
Goal: limit traffic so it does not exceed declared parameters
Three commonly used criteria:
(Long-term) average rate: how many packets can be sent per unit time (in the long run)
crucial question: what is the interval length? 100 packets per sec and 6000 packets per min have the same average!
Peak rate: e.g., 6000 packets per min (ppm) average; 1500 ppm peak rate
(Max.) burst size: max number of packets sent consecutively (with no intervening idle)
Policing Mechanisms
Token bucket: limit input to a specified burst size and average rate.
bucket can hold b tokens
tokens generated at rate r tokens/sec unless bucket is full
over an interval of length t, the number of packets admitted is less than or equal to (r·t + b)
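A small sketch of the token-bucket policer just described, spending one token per packet; the class name and the caller-supplied clock are illustrative, and token accounting is fractional for simplicity.

```python
class TokenBucket:
    def __init__(self, r, b):
        self.r = r                  # token generation rate (tokens/sec)
        self.b = b                  # bucket depth (max burst)
        self.tokens = b             # start with a full bucket
        self.last = 0.0             # time of last update (seconds)

    def admit(self, now):
        """Return True if a packet arriving at time `now` conforms."""
        # accumulate tokens since the last arrival, capped at b
        self.tokens = min(self.b, self.tokens + self.r * (now - self.last))
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1        # spend one token per admitted packet
            return True
        return False                # non-conforming: drop or mark
```

Over any interval of length t this admits at most r·t + b packets, matching the bound above.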
Policing Mechanisms (more)
token bucket and WFQ combine to provide a guaranteed upper bound on delay, i.e., a QoS guarantee!
[Figure: arriving traffic passes through a token bucket (token rate r, bucket size b) into a WFQ scheduler that serves the flow at per-flow rate R]
D_max = b/R
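For example (illustrative numbers, not from the slides): with b = 8000 bits and a WFQ per-flow rate R = 1 Mbps, the worst-case delay is D_max = b/R = 8000 bits / 10⁶ bits/sec = 8 ms.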