Control Plane Issues in the Internet: Personal Perspective
2005.4.11. Monday
Microsoft Research Asia
Beijing, China
Sue B. Moon
Division of Computer Science
Dept. of EECS
KAIST
Overview
• Personal Perspective
– Single-Hop Delay
– Point-to-Point Delay
– Routing Anomaly
– Path Multiplicity as a Value-Added Service
2
Personal Experience at Sprint
• When I first arrived, I heard …
– “No loss” on Sprint backbone network
– “Almost no delay”
– “Cadillac brand of IP service”
3
Monitors in San Jose PoP
* All monitored links are OC3
4
Min/Avg/Max Delay per Minute
5
Link Utilization
6
Single-Hop Delay Distribution
7
Delay w/o Transmission Time (TT)
8
Minimum Router Transit Time (MRTT)
9
Is the queue work-conserving?
10
Delay w/o TX and MRTT
11
Min/Avg/Max Delay without Cisco Router Idiosyncrasies
12
Summary of Single-Hop Delay
• Packet size is a major factor
• Non-work-conserving behavior of a router is a main cause of large delays (> 1 ms)
• Not much queueing observed
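A minimal sketch of the transmission-time correction behind the packet-size observation above. It assumes the nominal OC-3 line rate of 155.52 Mb/s and ignores framing overhead; the function names are illustrative and are not taken from the measurement tools used for these slides.

```python
# Assumption: nominal OC-3 line rate; the usable payload rate is somewhat lower.
OC3_LINE_RATE_BPS = 155.52e6


def transmission_time_us(packet_bytes, link_bps=OC3_LINE_RATE_BPS):
    """Time to serialize a packet onto the link, in microseconds."""
    return packet_bytes * 8 / link_bps * 1e6


def delay_without_tt_us(single_hop_delay_us, packet_bytes,
                        link_bps=OC3_LINE_RATE_BPS):
    """Single-hop delay with the packet-size-dependent transmission time removed."""
    return single_hop_delay_us - transmission_time_us(packet_bytes, link_bps)


if __name__ == "__main__":
    for size in (40, 576, 1500):
        print(size, "bytes:", round(transmission_time_us(size), 2), "us")
```

Under these assumptions a 1500-byte packet takes about 77 µs to serialize onto an OC-3 link and a 40-byte packet about 2 µs, which is why packet size shows up so clearly when queueing is light.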
13
Point-to-Point Delay
14
Delay Distributions
Data Set 3
15
Hourly Delay Distributions
Data Set 3
16
Identification of Constant Factors: Multi-Paths
• Equal Cost Multi Paths (ECMP)
– Src/Dst addresses, Router ID
[Figure: minimum delay of each src/dst flow in Data Set 3, with flows grouped into Path 1, Path 2, and Path 3]
17
Three Paths Connectivity
• Data Set 3
– Fiber propagation delays of the three paths: 28 ms, 32 ms, 34 ms
18
Path Separation of Data Set 3
• TTL difference
• Minimum delay of flow (src ip, dst ip)
Path 1
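The flow-based separation listed above can be sketched roughly as follows. This is an illustration only: it assumes each measured packet is a (src ip, dst ip, delay) tuple and that a reference delay level for each path is already known; the nearest-level rule and the function names are assumptions, not the procedure used for Data Set 3.

```python
def min_delay_per_flow(packets):
    """Minimum observed delay (ms) for every (src_ip, dst_ip) flow."""
    mins = {}
    for src, dst, delay_ms in packets:
        key = (src, dst)
        if key not in mins or delay_ms < mins[key]:
            mins[key] = delay_ms
    return mins


def assign_paths(flow_mins, path_levels_ms):
    """Map each flow to the index of the closest reference delay level."""
    assignment = {}
    for flow, d in flow_mins.items():
        assignment[flow] = min(
            range(len(path_levels_ms)),
            key=lambda i: abs(path_levels_ms[i] - d),
        )
    return assignment


if __name__ == "__main__":
    # Hypothetical sample records, not measured data.
    packets = [
        ("10.0.0.1", "10.1.0.1", 28.3), ("10.0.0.1", "10.1.0.1", 28.1),
        ("10.0.0.2", "10.1.0.9", 32.4), ("10.0.0.3", "10.1.0.7", 34.2),
    ]
    flows = min_delay_per_flow(packets)
    print(assign_paths(flows, [28.0, 32.0, 34.0]))
```

Because ECMP keeps all packets of a src/dst pair on one path, the per-flow minimum is a stable fingerprint of the path even when individual packets see queueing.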
19
Identification of Constant Factors: Packet Size
• Path transit time
– Propagation + packet processing (packet size)
$d_{\text{fixed}}(p) := \alpha p + \beta$
20
Removing Constant Factors
$d_{\text{var}} := d - d_{\text{fixed}}(p)$
Data Set 3
Path1
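A minimal sketch of removing the constant factors, assuming the fixed delay on a path is linear in packet size p, i.e. alpha * p + beta. The fit used here (ordinary least squares through the minimum delay seen for each packet size) is an assumption chosen for illustration, not necessarily the method behind these slides.

```python
def fit_fixed_delay(samples):
    """Fit alpha, beta so that alpha * p + beta tracks the per-size minimum delay.

    samples: list of (packet_bytes, delay_ms) measured on a single path.
    """
    mins = {}
    for p, d in samples:
        mins[p] = min(d, mins.get(p, d))
    if len(mins) < 2:
        raise ValueError("need at least two distinct packet sizes")
    xs = list(mins)
    ys = [mins[p] for p in xs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha = sxy / sxx
    beta = mean_y - alpha * mean_x
    return alpha, beta


def variable_delay(samples, alpha, beta):
    """Subtract the fitted fixed delay from every measured delay."""
    return [d - (alpha * p + beta) for p, d in samples]


if __name__ == "__main__":
    # Hypothetical samples: 28 ms propagation plus 0.001 ms per byte.
    samples = [(40, 28.04), (40, 28.5), (576, 28.576), (1500, 29.5), (1500, 30.0)]
    alpha, beta = fit_fixed_delay(samples)
    print(alpha, beta, variable_delay(samples, alpha, beta))
```

What remains after the subtraction is the part of the delay that changes from packet to packet, which is the quantity examined on the following slides.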
21
Variable Delay: Bulk
Data Set 3, Path 1
22
Variable Delay: Bulk (cont’d)
Data Set 3
23
Impact of Bottleneck Link Load
90
24
Variable Delay Revisited: Tail
Data Set 3, Path 1
25
Peaks in Variable Delay
26
Closer Look
• Queue build-up and drain
27
Summary of Pt-to-Pt Delay
• Not much queueing most of the time
• Severe congestion when bottleneck link utilization > 90%
• Congestion periods longer than 1 sec
– Exact causes unknown
– Possible causes
• Route changes
28
Routing Loop
29
Issues in "Good" Routing
• Misbehaving routing protocols
– BGP misconfigurations
– Pathological behaviors
– Frequent changes
• Even under normal circumstances
– Transient behaviors
– Inter/intra-domain routing not well
understood
30
Scenario for a Transient Routing Loop
In Normal Operation
31
When a link fails, R1 is the
first to detect.
32
R3 is updated before R2.
33
Finally R2 is updated, and the
loop is resolved.
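The sequence above can be replayed with a toy next-hop table. The topology is hypothetical, since the slide figures are not reproduced here: destination D hangs off R1 (primary link) and R2 (backup link), the routers are chained as R1, R3, R2, and the failed link is assumed to be R1 to D.

```python
def trace(next_hop, start, dest, max_hops=16):
    """Follow next hops from start; report delivery, a loop, or hop exhaustion."""
    path = [start]
    visited = {start}
    node = start
    while node != dest and len(path) <= max_hops:
        node = next_hop[node]
        path.append(node)
        if node in visited:
            return path, "loop"
        visited.add(node)
    return path, ("delivered" if node == dest else "hop limit reached")


# Forwarding state toward D at each stage (hypothetical topology).
normal = {"R1": "D", "R3": "R1", "R2": "R3"}
r1_updated = dict(normal, R1="R3")       # R1 detects the failure first
r3_updated = dict(r1_updated, R3="R2")   # R3 is updated before R2
r2_updated = dict(r3_updated, R2="D")    # R2 is updated; the loop is resolved

if __name__ == "__main__":
    for name, table in [("normal", normal), ("R1 updated", r1_updated),
                        ("R3 updated", r3_updated), ("R2 updated", r2_updated)]:
        print(name, trace(table, "R2", "D"))
```

In this toy model a packet entering at R2 loops between R3 and R1 after the first update, then between R3 and R2 after the second, and is delivered again only once R2 has also been updated.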
34
CDF of Routing Loop Duration in Time
35
VoIP experimental setup
[Boutremans2002]
• Traffic injected in the network:
– 200-byte UDP packets
– every 5 ms
• Packets captured and timestamped at
end-systems.
• Traceroute runs continuously during the
experiment.
• Link failures induced on purpose to evaluate convergence time and the impact on end-to-end connections
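With a probe packet sent every 5 ms, the length of an outage can be estimated from the run of missing packets. The sketch below assumes each probe carries a sequence number (an assumption; the slide does not describe the packet contents) and treats 200 bytes as the full packet size when computing the probe rate.

```python
SEND_INTERVAL_S = 0.005   # one probe every 5 ms, as in the setup above
PACKET_BYTES = 200        # probe size from the setup above


def probe_rate_bps():
    """Offered load of the probe stream in bits per second."""
    return PACKET_BYTES * 8 / SEND_INTERVAL_S


def outages(received_seq, min_lost=1):
    """Return (first_missing_seq, lost_count, approx_duration_s) for each gap."""
    gaps = []
    ordered = sorted(set(received_seq))
    for prev, cur in zip(ordered, ordered[1:]):
        lost = cur - prev - 1
        if lost >= min_lost:
            gaps.append((prev + 1, lost, lost * SEND_INTERVAL_S))
    return gaps


if __name__ == "__main__":
    # Hypothetical trace: packets 100..294 never arrive.
    received = list(range(0, 100)) + list(range(295, 400))
    print(probe_rate_bps(), outages(received))
```

The 5 ms spacing sets the resolution of the estimate: an outage is measured in whole probe intervals, so 195 consecutive losses correspond to roughly 0.975 s without connectivity.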
36
Information Sources
• IS-IS & BGP listener logs
• Router logs from both ends of
“failing” links
• Controlled bi-directional VoIP traffic
between Reston and ATL
• SNMP data
37
Delays (1 sec timescale)
[Figure: delay over time at a 1-second timescale, with levels of ~3.4 ms and ~2.6 ms; labels: 3 links up, 2 links down, 2 links up, 3 links down]
38
When the two interfaces went
down …
6.6 seconds
39
When three links came back up
Traffic “black-holed”
for 0.975 seconds
For 30 secs packets
follow a shorter path
Traffic “black-holed”
for 1.745 seconds
40
Approaches To Fix It
• Fine-tuning parameters
– Timer values [Alaettinoglu2002]
• Modify Routing Protocols
– Suppress advertisement and perform local
rerouting using a backwarding table
[Lee04]
– Centralized path computation
[Feamster04,Rexford04]
41
Our Approach
• Key Idea:
– Find a disjoint overlay path and send duplicate packets over it
• Assumptions
– Sender and receiver both within an AS
– Bidirectional link weights
– Extra income for extra b/w consumption
• Pros and cons
– Advantages
• No modification to current infrastructure
• Selective use by only those that need it
– Disadvantages
• Extra b/w consumption
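One way to picture the key idea is relay selection inside a single AS with bidirectional link weights: compute the default shortest path, then pick the relay whose two-leg overlay path shares no link with it. This is a simplified sketch, not the authors' algorithm; the graph, node names, and the cheapest-disjoint-relay rule are assumptions.

```python
import heapq


def shortest_path(graph, src, dst):
    """Dijkstra over {node: {neighbor: weight}}; returns (cost, node list)."""
    dist = {src: 0}
    prev = {}
    heap = [(0, src)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == dst:
            break
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return float("inf"), []
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return dist[dst], path[::-1]


def links(path):
    """Undirected links used by a node path."""
    return {frozenset(pair) for pair in zip(path, path[1:])}


def best_relay(graph, src, dst, candidates):
    """Cheapest relay whose overlay path is link-disjoint from the default path."""
    _, default = shortest_path(graph, src, dst)
    default_links = links(default)
    best = None
    for relay in candidates:
        c1, p1 = shortest_path(graph, src, relay)
        c2, p2 = shortest_path(graph, relay, dst)
        if not p1 or not p2:
            continue
        overlay = p1 + p2[1:]
        if links(overlay) & default_links:
            continue  # shares a link with the default path
        if best is None or c1 + c2 < best[0]:
            best = (c1 + c2, relay, overlay)
    return default, best


if __name__ == "__main__":
    # Hypothetical five-node topology with symmetric weights.
    graph = {
        "A": {"B": 1, "C": 2, "E": 5}, "B": {"A": 1, "D": 1},
        "C": {"A": 2, "D": 2}, "D": {"B": 1, "C": 2, "E": 5},
        "E": {"A": 5, "D": 5},
    }
    print(best_relay(graph, "A", "D", ["B", "C", "E"]))
```

The sender would then transmit each packet twice, once on the default path and once via the chosen relay, so that a failure or transient loop on one path does not interrupt the stream.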
42
Provisioning for Interactive Streaming
• Interactive Streaming
– Not a driving force behind b/w
– A candidate for growing revenue
• Examples
– VoIP gradually taking over PSTN traffic
– Remote viewing of door-camera video on a cell phone
– Online game traffic
• "Good" routing more important than
bandwidth
43
Basic Ideas
source
destination
candidate relay nodes!!!
44
Resilient to Failures
45
What I have learned …
• No loss, almost no delay
– Almost. I gained insight into causes
behind
• Debunking the myths [Odlyzko2005]
– Streaming real-time traffic
– QoS
– Content is king
– Usage-sensitive pricing
46
Other Issues Tackled
• Traffic Matrix Estimation
– Inspired by tomography in other fields
– Before arrival of efficient NetFlow
• Network Anomaly Detection
– NIDS, IDS => PCA-based global
monitoring
• Optimization
– Cross-layer resource allocation
47
Future Work
• Personal perspective
– More into creating value-added services
– MPLS/VPN performance issues
48
Acknowledgements
• Thanks to D. Papagiannaki, B.-Y. Choi, U. Hengartner, C. Boutremans, G. Iannaccone, and M. Cha for help with the slides.
49