Understanding Network Performance in Extreme Congestion Scenario

Download Report

Transcript Understanding Network Performance in Extreme Congestion Scenario

Transport and Application Layer
Approaches to Improve End-to-end
Performance in the Internet
PhD thesis defense
Amit Mondal
Committee:
Aleksandar Kuzmanovic, Asst. Professor, Northwestern Univ
Peter Dinda, Assoc. Professor, Northwestern Univ
Yan Chen, Assoc. Professor, Northwestern Univ
Jin Li, Principal Researcher, Microsoft Research
Internet — A multiservice IP network
VoIP
FTP
IPTV
The InternetSRis a UTAH
commercial infrastructure used
by diverse
set of applications and services
UCSB
UCLA
Video Conferencing Streaming
Gaming
2
Challenges involved…

Applications have end-to-end network performance
requirements
– Jitter, latency, packet loss, bandwidth, etc

Original Internet
–
–
–
–
Best effort service
No service assurance
TCP ensures only in-order packet delivery
Destination-based IP routing
“high throughput”
“low delay”
Need to provide support to new set of emerging
applications in the Internet
3
Application classification based on QoS
Low bandwidth
High bandwidth
Latency sensitive
VoIP, network games, SSH,
chatting, web browsing, ecommerce
Multimedia streaming,
IPTV, audio/video
conferencing
Latency
insensitive
Email
File transfer
(FTP/BitTorrent)
My focus:
 Low-latency interactive TCP applications (Chapter
II and III)
–

Telnet, SSH, network games, e-commerce, etc.
Interactive multimedia services (Chapter IV and V)
–
Audio/video conferencing, VoIP, streamed multimedia
services, etc.
4
Endpoint
Forward error
correction,
Bitrate adaptation,
Chapter-V, etc.
TCP smart framing, Limited
retransmit, Early retransmit,
Chapter-II, Chapter-III, etc.
N/A
N/A
N/A
Data Link
Physical
Transport
ECN, ECN+, packet marking &
differential dropping, Service
differentiation, etc.
Application
Infrastr
Overlay routing
(QRON, QSON, etc.)
Chapter-IV
IntServ, DiffServ,
Traffic
engineering,
Constraint based
routing
Network
The spectrum of QoS provisioning
MPLS
Bandwidth
overprovisioning
5
Research thesis
Despite much work to improve end-to-end performance
in the Internet, there still exists a significant space for
improvement. In my dissertation, I develop techniques to
reduce the gap further.

For example, I propose techniques that improve
– Response times of short TCP flows by five times in certain
scenarios
– Median Mean Opinion Score (MOS) of VoIP calls over WiFi
by a factor of two
6
Outline
Chapter I: Introduction
 Chapter II: Improving performance of thin-stream
TCP applications
 Chapter III: Removing exponential backoff from TCP
 Chapter IV: Multi-constraint QoS routing framework
 Chapter V: Audio/video performance Issues:
Diagnosis and solutions
 Conclusion

7
Chapter II: Improving thin-stream TCP
flows
Upgrading mice to elephants
data
packets
strict
priority
TCP-fair rate
“dummy”
packets
Packet switched


Circuit switched
A. Mondal and A. Kuzmanovic, “When TCP Friendliness Becomes Harmful”, IEEE INFOCOM 2007
A. Mondal and A. Kuzmanovic, “Upgrading Mice to Elephants: Effects and End-Point Solutions”, IEEE/ACM
Transactions on Networking, Volume 18, Issue 2, April 2010
8
Chapter III: Removing Exponential Backoff
from TCP

V. Jacobson, “Congestion Avoidance and Control,” in
ACM CCR, 18(4): 314-329, Aug 1988.
– Exponential retransmit timer backoff

Implicit packet conservation principle

Response times improvement of short and
interactive flows by five times in certain scenarios

A. Mondal and A. Kuzmanovic, “Removing Exponential Backoff from TCP”, In ACM SIGCOMM CCR, Volume
38, Number 5, October 2008.
9
Chapter IV: Multi-constraint QoS routing
framework

We design a framework that finds path under
multiple constraints without NP-hard computation
 Dijkstra’s algorithm involves NP-hard computation
Hybrid protocol of path vector protocol and ondemand route discovery
 Using simulation based on real-world data we
demonstrated that our solution is both efficient and
scalable
 Built a functional prototype using Click Modular
router


A. Mondal, P. Sharma, S. Banerjee, and A. Kuzmanovic, “Supporting Application Network Flows with
Multiple QoS Constraints”, In IEEE IWQoS 2009
10
Chapter V: Audio/video performance
issues: Diagnosis and solutions

Identify challenges towards high quality audio/video
conferencing over the Internet

Understand loss and jitter behavior in shorter time
scale and quantify impacts of various network
scenarios

Investigate solutions


A. Mondal, R. Cutler, C. Huang, J. Li, and A. Kuzmanovic, “SureCall: Towards Glitch-Free Real-time
Audio/Video Conferencing”, In IEEE IWQoS 2010
A. Mondal, C. Huang, M. Jain, J. Li, and A. Kuzmanovic, “A Case of WiFi Relay: Improving VoIP Quality for
WiFi Users”, In IEEE ICC 2010
11
Modern AV conferencing System
12
SureCall platform

A distributed measurement and experiment
platform
–
–
–
–

Understand problems and experiment solutions
Agents installed on volunteers’ machines
Measurements and experiments driven by masters
SureCall agents are upgradeable without user
intervention
Available from
http://research.microsoft.com/~chengh/SureCall/SureCall.htm
13
SureCall measurement

Emulated bidirectional audio/video sessions using UDP
–
–
–
–
–

5 minute per hour
Audio bitrate : 24 kbps
Video bitrate: 192 kbps
STUN NAT traversal protocol for home users
Detailed packet-level traces collected
Network connectivity close to the clients
– ICMP packet pair with TTL=2


Traceroute to other endpoint at the beginning and end
of each session
Environmental details on client machines
– CPU load, network interface type
14
SureCall deployment
Microsoft global enterprise network
 Many residential networks
 Current deployment status

– 80 unique machines
• Enterprise - 32
• Home – 20
• Both – 28

Enterprise trace and Home trace
– Two separate masters (within enterprise network and in
public Internet)
15
SureCall dataset

4,800 hours of packet traces
– 4,100 from enterprise
– 700 from home

1968 unique IP addresses
– Enterprise - 1212
– Home -756

Trace classification and
stratification
– Intra-continental vs intercontinental
– Wired vs wireless
– Audio-only vs audio+video

Trace preprocessing
– Clock skew removal
Clock skew in wild
16
Jitter computation algorithm

Multiple algorithms to compute jitter
– Variance of one-way-delay samples
– Time difference between actual packet receiving time
and ideal receiving time
• Most relevant for multimedia streaming/conferencing with
playout buffer
17
Jitter in enterprise and residential
networks
US-US, wired traces
Inter-continental, wired traces
Residential networks have significantly higher jitter
compared to enterprise networks and affected greatly
by inter-continental links.
18
Jitter variation across hosts
Enterprise
Home
Jitter variation is much higher in residential networks than in
enterprise networks. The 95-th percentile jitter values are
significantly worse than median jitter values in home
networks.
19
Packet loss in residential and enterprise
networks
Even well provisioned enterprise networks can become quite
congested in short time scale. Both enterprise and home
networks show long tail in loss burst size distribution.
20
Impact of WiFi connections
In both enterprise and home networks, wireless traces
show significantly worse jitter statistics than wired traces.
Enterprise
Home
21
Impact of WiFi connections
In both enterprise and home networks, wireless traces
show significantly worse jitter statistics than wired traces.
The degradation due to WiFi in enterprise scenarios is
more severe than that in home scenarios.
Enterprise
Home
22
Impact of VPN on performance
Jitter
Loss
VPN connection causes more degradation compared
to wireless.
23
Can jitter predict future loss events?

Extent to which loss and jitter are correlated, i.e.
whether abrupt jitter increase can serve as a
precursor of network congestion and predict future
loss events
– audio/video conferencing applications can take
anticipatory action.

> 10 ms average increase in end-to-end delay for the
last three packets preceding a loss event
– enterprise networks ~ 82% ,
– home networks ~ 80%
24
Correlation between loss burst size and
jitter
Enterprise
Home
1. End-to-end delay increases significantly before loss events
in both enterprise and home networks.
2. Increase in end-to-end delay is not a great indicator of loss
burst size in enterprise networks.
25
Network audio diagnostics
Concealed: percent of packets interpolated or
extrapolated due to unrecovered packet loss
 Stretched: percent of packets stretched via time
compression
 Classifier operates as follows


Supervised training with ground-truth objectively
determined by PESQ score
26
Audio classifier performance
The classifier achieves a true positive rate >80%
and false positive rate < 1% for T1=T2=0.07.
27
WiFi Relay: Improving VoIP Quality for
WiFi Users

Large number of WiFi clients both in enterprise
and residential networks
– 43% enterprises provide only WiFi connections to their
employees
– 36% uses VoIP over WiFi
WiFi links can significantly degrade
VoIP performance

Possible reasons
– dense deployment of APs, overloading of an AP point,
other wireless devices in the vicinity, etc
28
Effectiveness of redundancy

Passive analysis with voice packet replication
– Replication ratio r = 2,3,4, or 5
Packet losses can be effectively mitigated using
application layer packet replication
29
Overhead of replication
• Typical audio packet size = 60 bytes
• Encapsulated with RTP(12bytes), UDP (8bytes),
IP(20bytes), 802.11 MAC(28bytes), PHY (20us for
802.11g) headers.
• w/o ACK: air time = DIFS + PHY header + (60+76
bytes)/54Mbps = 70 us
Replication
ratio
1
Air time (us)
w/o ACK
w/ ACK
70
102
96
128
Replicating audio packet
at application layer 111
causes
79
only marginal increase
in air time
3
87
120
2
4
30
WiFi relay solution



Nearby wired endpoints as relays
Heavy replication between relays and wireless endpoints
No dedicated infrastructure
31
Evaluation

Evaluated on SureCall platform
– Upgrade SureCall clients to support relay

Simultaneous direct call and relayed VoIP calls
between each pair of SureCall agents
– Apple-to-apple comparison
– One-hop overlay (only one wireless endpoint)
– Two-hop overlay (both endpoints are wireless)

Relay node selection based on enterprise internal
database
32
Impact of relay on jitter

No dedicated infrastructure, ordinary endpoints as
relay nodes
Relay has negligible impact on end-to-end jitter
CDF of jitter diff at 50th percentile
CDF of jitter diff at 95th percentile
33
Improvement with WiFi relay
WiFi relay greatly reduces
packet loss
WiFi relay significantly improve
VoIP quality for WiFi users

Mean Opinion Score
(MOS)
– Calculated from
packet loss rate and
jitter (Cole et al.
CCR’01)
– Fixed de-jitter buffer
of 100 ms
34
Summary of Chapter V
SureCall, a distributed experimental platform, to
address the challenges of audio/video
communications over Internet.
 Characterized enterprise and residential networks
over a wide variety of network scenarios
 Classifier that accurately predicts when network
issues most likely to cause audio quality degradation
 WiFi relay that significantly improve VoIP qualify for
WiFi clients

35
Conclusion
Proposed easily deployable techniques to improve
performance of TCP based interactive applications
 Demonstrated that exponential backoff can be
altogether removed from TCP without any stability
issues
 Designed an overlay framework to support
multimedia services with multiple QoS constraints
 Developed an distributed experimental framework,
SureCall, to understand the challenges towards IP
based audio/video communications and for rapid
evaluation of new protocols

36
Thank you!
37
Publications
[1] A. Mondal and A. Kuzmanovic, “When TCP Friendliness Becomes Harmful”, In IEEE INFOCOM
2007
[2] A. Mondal and A. Kuzmanovic , “A Poisoning-Resilient TCP Stack”, In IEEE ICNP 2007
[3] A. Mondal and A. Kuzmanovic , “Removing Exponential Backoff from TCP”, In ACM SIGCOMM
CCR, Volume 38, Number 5, October 2008.
[4] A. Mondal, P. Sharma, S. Banerjee, and A. Kuzmanovic, “Supporting Application Network
Flows with Multiple QoS Constraints”, In IEEE IWQoS 2009
[5] A. Kuzmanovic, A Mondal, S. Floyd, and K.K. Ramakrishnan. “Adding Explicit Congestion
Notification (ECN) Capabilities to TCP’s SYN/ACK Packets”. RFC 5562, June 2009.
[6] A. Mondal and A. Kuzmanovic, “Upgrading Mice to Elephants: Effects and End-Point
Solutions”, In IEEE/ACM Transactions on Networking, Volume 18, Issue 2, April 2010
[7] A. Mondal, R. Cutler, C. Huang, J. Li, and A. Kuzmanovic, “SureCall: Towards Glitch-Free Realtime Audio/Video Conferencing”, In IEEE IWQoS 2010
[8] A. Mondal, C. Huang, M. Jain, J. Li, and A. Kuzmanovic, “A Case of WiFi Relay: Improving VoIP
Quality for WiFi Users”, In IEEE ICC 2010
[9] A. Mondal, I. Trestian, Z. Quin, and A. Kuzmanovic, “P2P as CDN (Akamizing BitTorrent)”,
under submission
[10] J. Miller, A. Mondal, R. Potharaju, P Dinda, and A. Kuzmanovic, “Network Monitoring is
People: Understanding End-user Perception of Network Problems”, Under submission.
38
Backup slides
39
QoS and the Internet

QoS Architectures
–
–
–
–

Integrated Service (Intserv)
Differentiated Service (Diffserv)
Multi Protocol Label Switching (MPLS)
Traffic Engineering and Constraint based routing
Key Challenges
– Scalability issues in core
– Complex signaling protocols
– Deployment overhead
 Current Internet still offers only a best-effort service
 Motivates to investigate easily deployable solutions
that improve end-to-end network performance
40
QoS using transport and application layer
techniques without network support
Explicit congestion notification [ Floyd 94]
 Packet marking and differential dropping [Guo and Matta’01]
 Limited transmit [Allman et al. 01]
 Service differentiation [Neoreddine and Tobagi’02]
 Differential congestion notification [Le et al.’04]
 TCP smart framing [Mellia et al. ‘05]
 ECN+ [Kuzmanovic’05]
 Early retransmit [Allman et al.’06]
 TCP SAReno [Yang and Vecinia’02]
 PCP [Anderson et al. ‘06]

41
Going beyond TCP-fair

Differentiated minRTO
– Application-limited flows use reduced minRTO value

Short-term padding with dummy packets
– Application data followed by three tiny dummy packets

Diversity approach
– Application layer FEC-based approach
– The simplest FEC scheme is replication
42
Why Exponential Backoff?

Jacobson adopted
exponential backoff
from the classical
shared-medium
Ethernet protocol
– “IP gateway has
essentially the same
behavior as Ether in a
shared-medium
network.”
43
Why Exponential Backoff?

Jacobson adopted
exponential backoff
from the classical
shared-medium
Ethernet protocol
– “IP gateway has
essentially the same
behavior as Ether in a
shared-medium
network.”
– Not true!
C
C
44
Removing exponential backoff from TCP
and its implications
Other reasons: no admission control, finite flow size,
skewed traffic distribution, etc.
 When to resend a packet?

– Implicit packet conservation principle
• As soon as the retransmission timeout expires
– End-to-end performance can only improve if we remove
the exponential backoff from TCP

Implications
– Significant improvement of response times for short and
interactive TCP flows
45
Multiple QoS Constraints

The Internet evolves towards the global multiservice
IP network
– Diverse applications and different QoS requirements

Many applications have multiple QoS requirements
– Video streaming, VoIP, Video conferencing, etc.
Need support for end-to-end QoS guarantee under
multiple constraints
 Multiple QoS constraints often make the routing
problem intractable

46
QoS provisioning using overlay networks

Build Overlay Backbone
– Deploy overlay nodes at strategic locations in the Internet

Provide support for per-flow forwarding
– e.g. Anagran Flow Aware Routers

Flow route management architecture
– Discover and setup end-to-end paths for individual flows
with diverse flow QoS requirements
– Monitor end-to-end flow performance to trigger path
adaptation
47
Overlay flow QoS management
architecture
Configure intermediate
overlay nodes for per-flow
forwarding
Sensing local link
characteristics
Adapt to different path
dynamically as current path
fails to meet QoS parameters
Find a path to X
with b/w > b, delay
< d and loss < l%
AS1
AS3
AS4
AS2
End user
Overlay node
Physical link
Logical link
48
Contribution
Design a scalable QoS routing protocol which finds
path under multiple constraints
 Propose a distributed algorithm for dynamic path
adaptation
 Evaluate accuracy, efficiency and scalability of the
protocol using large-scale simulation and compare
with other existing approaches
 Build a functional prototype using Click modular
router

49
Design challenges

Multiple QoS metrics
– Finding a feasible path using Dijkstra’s algorithm is NPComplete
– Randomized and approximation algorithms
– Single composite metric derived from multiple metrics
• Paths might not meet individual QoS constraints

Dynamic overlay-link properties
– Increases control message overhead
50
Multi-constraint QoS routing protocol

Path vector protocol to disseminate path
information
– Tag with QoS parameters

How to aggregate path information when multiple
QoS metrics are considered?
– Distribute the best paths for each metrics

What about QoS requests which could be served by
paths which are not in the best path set?
– On-demand route discovery

A. Mondal, P. Sharma, S. Banerjee, and A. Kuzmanovic, “Supporting Application Network Flows with
Multiple QoS Constraints”, In IEEE IWQoS 2009
51
MCQoS: Disseminating path information
Advertise best path
for each QoS metric
Local link
info
B
QoS Path
Table
X
AS1 (10ms, 0.01%, 1Mbps) AS3 (5ms, 0.01%, 768Kbps) AS5
B/w
X
AS1 (2ms, 0.0%, 128Kbps) AS3 (3ms, 0.005%, 378Kbps) AS5
Loss
X
AS1 (2ms, 0.01%, 128Kbps) AS3 (3ms, 0.02%, 378Kbps) AS5
Delay
A
Tag QoS
characteristics
52
MCQoS: Aggregating path information
 What about QoS requests in the undecidable
region?
Bandwidth (b/w)
undecideable
best b/w
best delay
infeasible
feasible
Delay
There
The
source
will feasible
cannot
node
exist
already
requests
a feasible
knows
that
path
acan
path
in be
theifsupported
network
the QoS if
request
but
thethe
QoS
source
falls
request
in the
node
falls
feasible
in
might
the infeasible
region
not knowregion
about those paths, thus
cannot admit flows based on local information
53
MCQoS: On-demand route discovery
B/W
infeasible
undecideable
Admit or deny flow based on
local QoS table if in feasible or
infeasible region

Otherwise, On-demand route
discovery for requests in
undecideable region

Exploit advertisement received
from neighbors to reduce
search space while route
discovery
feasible
Delay
C
A

B
E
D
54
MCQoS: Illustration through example
best b/w
best delay
10ms 12Mbps
100ms 50Mbps
4ms 5Mbps
105ms 50Mbps
A
(1ms, 100Mbps)
5ms 5Mbps
106ms 50Mbps
C
B
(5ms, 100Mbps)
(2ms, 20Mbps)
2ms
8ms
E
5Mbps D
20Mbps
Requests:
10ms, 3Mbps
120ms, 15Mbps
10ms, 100Mbps
15ms, 15Mbps
OK ABD…E
OK ABC…E
X
---OK ABD…E
???
55
Route maintenance in MCQoS
D
F
H
A
C
E
G
B
 Route maintenance through path patching
 Each intermediate node knows the QoS requirements from
the node to the destination
 Upstream node periodically pushes QoS requirements to
downstream nodes
 As a node detects QoS violation, it triggers alternate path
search at local node
– Notify upstream node if no alternative path
56
Overhead analysis of path dissemination
10 2
1 5 9
3
4
4 10
6
7
8
5
6
10
In MCQoS protocol, a node advertises only the best
path to a destination. Thus many alternative paths
are pruned, which increases scalability.
57
Overhead analysis of on-demand route
discovery

Parameters
– Average out-degree of the nodes
– Overlay distance between source to destination

Worst case
– Message overhead is proportional to sum of all possible
path lengths from source to destination

Amortized cost
– Fraction of request in undecidable region
– Limit no of hops of route discovery
More than 99% of the undecidable region is discovered within
5 hops from the source node, thus amortized cost will be
significantly less than worst case scenario.
58
Experimental evaluation of MCQoS

Built an event-driven simulator

Generated random flat topology of nodes using GTITM
– Outdegree min(10, size/2)

Assigned link metrics from actual planetlab link
measurement data
59
Convergence time of path dissemination
 Convergence time: how long does it take to stabilize for
a given network snapshot?
 Re-stabilization time: how long does it take to stabilize
Being path vector based protocol MCQoS takes longer
once a link metric changes?
time to converge, but does not involve any NP-hard
computation, thus scale with network size
 QRON: Link state based multi-QoS routing protocol
using composite metric approach
60
Message overhead of path dissemination
Message overhead of MCQoS is comparable to LinkState based (QRON) protocol
61
Elaborating the undecidable region
 K-hop path: paths in the undecidabe region discovered
within k-hops of on-demand route discovery process
Depletion
 Global feasible region: feasible region at the source
node ifarea
the source node knew all alternative paths like link-state
protocol
 Depletion area: part of global feasible QoS region not
known at the source node because many alternate paths
are suppressed
62
Overhead of on-demand path discovery
 How many hops does it take to discover the entire
depletion area?
 We measure the fraction of depletion area discovered
More
thank90%
the depletion
is discovered
within
hopsoffrom
the sourcearea
node
within 3 hops
63
Improvement in accuracy by MCQoS
 A feasible path with a composite metric might not
satisfy individual QoS metrics.
 The line-segment based approach often suffers from
loss/distortion.
 Our hybrid approach has no false positive and false
negative percentage can be reduced to less than one
1% by 3-hop on-demand route discovery.
64
QoS violation ratio in dynamic environment
with MCQoS
 100 node topology
 Generate QoS requests with certain arrival rate with b/w
[5Mbps, 55Mbps] and delay [100ms,400ms]
 Each flow lasts between 5 to 10 minutes
 We simulate the network behavior for 10 minutes
 New flows arrive before network stabilizes
– Expect to observe QoS violation
Arrival rate
(conn/sec)The
60
120
240
300
QoS violation ratio is negligible even with
Violation arrival rate of 600 conn/sec.
0.32
0.33
0.78
0.4
ratio (%)
600
1.12
65
MCQoS enabled overlay node prototype
QoS path setup (Y:p -> X:q, Dms, L%, BKbps)
Local link
characteristics
Peers (path ads)
Rt. discovery req,
Rt. discovery reply
Flow setup req
Flow setup
DataIn
Control
Plane
S3
MCQoS
QoS Path
table
Flow id
Next hop
Y:p ->X:q
C
Click Router
DataOut
Data
Plane
66
Summary

Designed a scalable multiple constraints QoS flow
route management protocol
–
–
–
hybrid approach of path vector routing and on-demand
route discovery
Keep balance between flow setup time and control
message overhead
No complex NP-hard computation
Performed large-scale simulations to demonstrate
the efficiency and scalability of the approach
 Built a prototype using Click modular router

67
‘Composite Metric’ approach to multi-QoS
routing (1/2)
 Composite Metric = K1*delay + k2/bw where k1=1,
k2 = 10^7, delay in sec, b/w in bps
 False positive: flow is admitted but the path does not
meet the QoS
 False negative: there exists a feasible path but the
flow is not admitted
68
‘Composite Metric’ approach to multi-QoS
routing (2/2)
69
‘Line Segment’ approach to multi-QoS
routing (1/2)
 Lui et al. proposed line segment based approach to
for topology aggregation in delay-bw plane. Tam et
al. designed a distance vector based QoS protocol
using the line-segment approach
 False positive: Fraction of undecidable region that is
actually infeasible, but the approach labels as
feasible.
 False negative: Fraction of undecidable region that is
feasible, but the approach labels as infeasible.
70
‘Line Segment’ approach to multi-QoS
routing (2/2)
71