State-of-the-Art of
Internet Traffic
Measurement and Analysis
The 31st APEC TEL WG Meeting
April 5th, 2005
Bangkok, Thailand
Sue B. Moon
Division of Computer Science
KAIST
South Korea
Overview
• Brief Historical Overview
• Evolution of Measurement Techniques
• Status Quo of Measurement Techniques
• Future Work
2
Brief Historical Overview
• 1970s and 1980s
– Performance was not an issue
– Very few papers about performance
– “ping” and “traceroute” were only tools
• 1990s
– Internet exploded
– Lack of measurement/analysis/visualization tools sorely felt
– Measurement became important in research
• 2000s
– Competition between ISPs became intense
– Service Level Agreements (SLAs) became critical
– Security became sales point
3
Evolution of
Measurement Techniques
• Internet Design Philosophy
• Basic “ping” and “traceroute”
• Vern Paxson’s Work
• My Personal Perspective
4
Internet Design Philosophy
• Packet switching
• Continued communication around
failures
• Support for diverse services and
protocols
• Distributed management of resources
• No access control
• Simplicity at the core, complexity at
the edge
5
The Internet Hourglass
(Deering@IETF)
email, WWW, phone, ...
SMTP, HTTP, RTP, ...
TCP, UDP, ...
IP
ethernet, PPP, ...
CSMA, Sonet, ...
Copper, fiber, radio, ...
6
What is the Internet today?
BBN
Tier 2 ISP
UUnet
BT
Sprint (AS)
Dial-up ISP
Peering point
7
Internet Users
Different from PSTN Users
• ISPs
– Too much & diverse traffic to monitor
– Hard to get a complete picture
– Routers barely keep up with core tasks
• End-users
– More options than traditional telco
customers
8
ping
• ICMP-based tool for host reachability
• Algorithm
– Sends an ICMP echo request with:
• Identifier for unique ping process
• Sequence number per echo request
– Receiving host returns an ICMP echo reply
– Prints out RTT, TTL, and seq. #.
• Issues
– Many routers filter out ICMP packets
– It goes thru slow path on routers
– RTT includes end system processing time
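
As a rough sketch of the echo request described above, the following Python builds the ICMP header (type 8, identifier, sequence number) and its Internet checksum without sending anything. The function names and the payload are illustrative assumptions, not taken from the talk or from any particular ping implementation.

```python
import struct

def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum used by ICMP."""
    if len(data) % 2:
        data += b"\x00"              # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:               # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_echo_request(identifier: int, sequence: int, payload: bytes = b"") -> bytes:
    """ICMP echo request: type 8, code 0, checksum, identifier, sequence number."""
    header = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence) + payload

# Hypothetical values: one identifier per ping process, one sequence number per request.
packet = build_echo_request(identifier=0x1234, sequence=1, payload=b"probe")
```

A receiver can validate the packet by recomputing the checksum over the whole message, which yields zero when nothing was corrupted. Actually sending it needs a raw socket and usually elevated privileges, which is left out here.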
9
traceroute
• Used to find out the forward path to a host
• Algorithm
– Send an IP datagram with TTL=1
– First router sends back ICMP time exceeded
– Then send a datagram with TTL=2
– Continue till destination is reached/TTL expired
• Issues
– not suited for performance measurements
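
The TTL-stepping loop above can be sketched in Python against a simulated path, so that it runs without network access. The `Probe` callback, `make_simulated_probe`, and the documentation-range addresses are assumptions made for illustration; a real tool would send datagrams and listen for ICMP replies instead.

```python
from typing import Callable, List, Optional, Tuple

# A probe takes a TTL and returns (responder_address, reached_destination),
# or None when no reply arrives.
Probe = Callable[[int], Optional[Tuple[str, bool]]]

def trace_path(probe: Probe, max_ttl: int = 30) -> List[Optional[str]]:
    """Increase the TTL one hop at a time until the destination answers."""
    hops: List[Optional[str]] = []
    for ttl in range(1, max_ttl + 1):
        reply = probe(ttl)
        if reply is None:
            hops.append(None)        # no answer for this TTL
            continue
        address, reached = reply
        hops.append(address)
        if reached:
            break
    return hops

def make_simulated_probe(path: List[str]) -> Probe:
    """Stand-in for the network: routers on `path` answer 'time exceeded'."""
    def probe(ttl: int) -> Optional[Tuple[str, bool]]:
        if ttl < len(path):
            return (path[ttl - 1], False)   # TTL expired at an intermediate router
        return (path[-1], True)             # datagram reached the destination
    return probe

hops = trace_path(make_simulated_probe(["192.0.2.1", "198.51.100.1", "203.0.113.9"]))
```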
10
Vern Paxson’s PhD Thesis
• Many findings about Internet
Performance
– Delay
– Loss
– Unexpected routing behaviors
• route changes, flaps
– Clock synchronization
– Incomplete logging
11
Paxson’s Tools
• Instrumented “ping”s
– Send packets between a set of nodes
• In today’s Internet
– Active measurements for performance
monitoring
– Passive measurements for control-domain
monitoring
12
Passive Measurement
• No traffic injected for measurement
purpose
– Not invasive
• Only data collection increases traffic
– Access limited
• Measurement about total traffic
• Privacy/Security - serious concern
13
Passive Measurement
Examples
• Packet monitors
– Tcpdump for Unix-based hosts
– Dedicated measurement systems
• DAGMON (up to 10GE)
• Router/switch traffic statistics
– Network internal behavior
– SNMP MIBs
– Flow-level information
• Cisco’s NetFlow, Juniper’s Accounting, Arbor’s
Peakflow
14
Packet-Level Measurements
• Pros :
– very fine granularity
• Challenges :
– link speeds are increasing!
– Large volumes of data
– system design issues:
• disk/PCI bus speeds
• installation cost
15
Challenges in Data Collection
• On 1GE link
– # of flows per sec = 100K ~ 1 mil
– 1KB per flow => 1GB per sec
• On 10GE link
– # of packets per sec = 10 mil ~ 200 mil
– 2GHz processor => 10 cycles per packet
• You need 10 GE link to monitor 10 GE
link!
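
The arithmetic behind these figures can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it takes the slide's upper-bound rates as given, assumes 1KB = 1000 bytes, and the function names are made up for the example.

```python
def flow_record_rate_bytes(flows_per_sec: float, bytes_per_flow: float) -> float:
    """Bytes of flow records generated per second."""
    return flows_per_sec * bytes_per_flow

def cycles_per_packet(clock_hz: float, packets_per_sec: float) -> float:
    """Processor cycles available for each packet at a given packet rate."""
    return clock_hz / packets_per_sec

# 1GE: 1 mil flows/sec at 1KB per flow
rate_1ge = flow_record_rate_bytes(1_000_000, 1_000)     # 1e9 bytes/sec = 1GB per sec

# 10GE: a 2GHz processor facing 200 mil packets/sec
budget_high = cycles_per_packet(2e9, 200e6)             # 10 cycles per packet

# At the lower bound of 10 mil packets/sec the budget is larger
budget_low = cycles_per_packet(2e9, 10e6)               # 200 cycles per packet
```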
16
Why Sampling/Filtering?
• Problems with large volumes of data
– feasibility of collection at high-speeds
• memory/bus/processor requirements
– storage limitations
– complexity of analysis
17
State-of-the-Art
• Cisco
– sampled netflow
• capture 1 in N
• aggregate by five-tuple
• Juniper
– filter on any combination of header fields
– sample 1 in N
• recommends 1 in 1000 or less
• How much data do you collect when N
= 1000?
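
A minimal Python sketch of "capture 1 in N, aggregate by five-tuple" follows. It assumes deterministic sampling (every N-th packet) and a hypothetical `Packet` record; actual router implementations may sample differently, and none of these names come from a vendor API.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Tuple

class Packet(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int
    length: int

def sample_one_in_n(packets: Iterable[Packet], n: int) -> List[Packet]:
    """Keep every n-th packet (deterministic 1-in-N sampling)."""
    return [p for i, p in enumerate(packets) if i % n == 0]

def aggregate_by_five_tuple(packets: Iterable[Packet]) -> Tuple[Counter, Counter]:
    """Per-flow packet and byte counts keyed by the five-tuple."""
    packet_counts: Counter = Counter()
    byte_counts: Counter = Counter()
    for p in packets:
        key = p[:5]                   # src/dst address, src/dst port, protocol
        packet_counts[key] += 1
        byte_counts[key] += p.length
    return packet_counts, byte_counts

# Hypothetical trace: 3000 packets of a single flow, sampled with N = 1000.
trace = [Packet("192.0.2.1", "198.51.100.2", 40000, 80, 6, 1500)] * 3000
sampled = sample_one_in_n(trace, 1000)
packet_counts, byte_counts = aggregate_by_five_tuple(sampled)
flow = ("192.0.2.1", "198.51.100.2", 40000, 80, 6)
estimated_packets = packet_counts[flow] * 1000   # scale the sample back up by N
```

Scaling by N recovers the original volume well for large flows; flows much shorter than N packets are often missed entirely, which is the cost of the reduced data volume.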
18
Personal Experience at Sprint
• When I first arrived, I heard …
– “No loss” on Sprint backbone network
– “Almost no delay”
– “Cadillac brand of IP service”
19
Min/Avg/Max
Single-Hop Delay per Minute
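
Per-minute min/avg/max summaries of this kind can be computed from timestamped delay samples as in the sketch below. The input format (timestamp in seconds, delay in arbitrary units) and the function name are assumptions for illustration; the talk does not describe how the plotted values were produced.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def per_minute_stats(samples: Iterable[Tuple[float, float]]) -> Dict[int, Tuple[float, float, float]]:
    """Map minute index -> (min, avg, max) of the delays observed in that minute."""
    buckets: Dict[int, List[float]] = defaultdict(list)
    for timestamp_s, delay in samples:
        buckets[int(timestamp_s // 60)].append(delay)
    return {
        minute: (min(delays), sum(delays) / len(delays), max(delays))
        for minute, delays in sorted(buckets.items())
    }

# Hypothetical samples: (timestamp in seconds, single-hop delay)
stats = per_minute_stats([(0, 20), (30, 40), (61, 30)])
```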
20
Single-Hop Delay w/o Cisco Router
Idiosyncrasies
21
Multi-Hop Delay Distributions
Data Set 3
22
Three Paths Connectivity
• Data Set 3
– Fiber prop. delay on the three paths: 28ms, 32ms, 34ms
Identification of Constant Factors:
Multi-Paths
• Equal Cost Multi Paths (ECMP)
– Src/Dst addresses, Router ID
[Figure: Min delay of src/dst flow (Data Set 3); Path 1, Path 2, Path 3]
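
To illustrate why src/dst addresses and the router ID identify the path under ECMP, here is a small Python sketch of hash-based path selection. The CRC32 hash and the key layout are stand-ins chosen for the example; real routers use their own, typically vendor-specific, hash functions.

```python
import zlib

def ecmp_path_index(src_ip: str, dst_ip: str, router_id: str, num_paths: int) -> int:
    """Pick one of num_paths equal-cost paths from a hash of the flow and router."""
    key = f"{src_ip}|{dst_ip}|{router_id}".encode()
    return zlib.crc32(key) % num_paths

# Hypothetical addresses and router ID; three equal-cost paths as on the slide.
first = ecmp_path_index("192.0.2.1", "198.51.100.2", "router-a", 3)
again = ecmp_path_index("192.0.2.1", "198.51.100.2", "router-a", 3)
```

Because the choice depends only on the hashed fields, every packet of a given src/dst pair follows the same path through that router, so each pair shows a constant minimum delay tied to one of the paths.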
24
Peaks in Variable Delay
25
Closer Look
• Queue build-up & drain
26
Issues in "Good" Routing
• Misbehaving routing protocols
– BGP misconfigurations
– Pathological behaviors
– Frequent changes
• Even under normal circumstances
– Transient behaviors
– Inter/intra-domain routing not well
understood
27
Routing Across Internet
• Protocols
– Interior Gateway Protocols (IS-IS, OSPF,
RIP)
– Exterior Gateway Protocols (BGP)
• How they work
– IGP : find “best” (shortest) path across a
domain
– BGP : announce reachability between
domains
• policy determines inter-domain paths
28
Routing Research Projects
• Routeviews
– 50+ peering at route-views.oregon-ix.net
– MRT format RIBs and BGP updates, “show
ip bgp” dumps, route dampening data
– only E-BGP
• RIPE (Réseaux IP Européens)
– routing updates from 9 mostly European
IXs
– “Looking Glass” services for BGP
– Routing information service (RIS)
29
Scenario
for a Transient Routing Loop
In Normal Operation
30
When a link fails, R1 is the
first to detect.
31
R3 is updated before R2.
32
Finally R2 is updated, and the
loop is resolved.
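
The sequence of events can be replayed with next-hop tables in Python. The topology below (links R1–D, R1–R2, R2–R3, R3–D, with the R1–D link failing) is a hypothetical one chosen to reproduce the described order of updates; the slides' actual figure may differ.

```python
from typing import Dict, List, Tuple

def forward(next_hop: Dict[str, str], source: str, destination: str) -> Tuple[List[str], bool]:
    """Follow next hops from source; return (path, looped)."""
    path = [source]
    node = source
    while node != destination:
        node = next_hop[node]
        if node in path:              # revisiting a router means a forwarding loop
            path.append(node)
            return path, True
        path.append(node)
    return path, False

# Normal operation: next hop toward destination D at each router.
next_hop = {"R1": "D", "R2": "R1", "R3": "D"}
normal = forward(next_hop, "R2", "D")

# Link R1-D fails; R1 detects it first and reroutes via R2, but R2 still points at R1.
next_hop["R1"] = "R2"
after_r1 = forward(next_hop, "R2", "D")

# R3 is updated before R2 (its next hop toward D is unchanged in this topology).
next_hop["R3"] = "D"
after_r3 = forward(next_hop, "R2", "D")

# Finally R2 is updated and the loop is resolved.
next_hop["R2"] = "R3"
after_r2 = forward(next_hop, "R1", "D")
```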
33
CDF of Routing Loop Duration
in Time
34
VoIP experimental setup
[Boutremans2002]
• Traffic injected in the network:
– 200 byte UDP packets
– every 5ms.
• Packets captured and timestamped at
end-systems.
• Traceroute runs continuously during the
experiment.
• Link failures induced on purpose to
evaluate convergence time and impact on
e2e connections
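
With probes sent every 5ms, an outage seen at the receiver can be estimated from gaps in the received packets. The sketch below assumes each probe carries a sequence number and that a run of at least `min_missing` lost packets counts as an outage; both are assumptions for illustration, not details given in the talk.

```python
from typing import Iterable, List, Tuple

def outage_durations_ms(received_seqs: Iterable[int],
                        interval_ms: int = 5,
                        min_missing: int = 2) -> List[Tuple[int, int]]:
    """Return (first_missing_seq, duration_ms) for each run of lost probes."""
    outages: List[Tuple[int, int]] = []
    previous = None
    for seq in sorted(set(received_seqs)):
        if previous is not None:
            missing = seq - previous - 1
            if missing >= min_missing:
                outages.append((previous + 1, missing * interval_ms))
        previous = seq
    return outages

# Hypothetical receiver log: probes 100..299 never arrive (200 lost probes).
received = list(range(0, 100)) + list(range(300, 400))
outages = outage_durations_ms(received)
```

The resolution of such an estimate is bounded by the probe interval, so 5ms spacing cannot distinguish outages shorter than a few milliseconds from isolated loss.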
35
Information Sources
• IS-IS & BGP listener logs
• Router logs from both ends of
“failing” links
• Controlled bi-directional VoIP traffic
between Reston and ATL
• SNMP data
36
Delays (1 sec timescale)
[Figure labels: delay levels ~3.4ms and ~2.6ms; 3 links up, 2 links down, 2 links up, 3 links down]
37
When the two interfaces went
down …
6.6 seconds
38
When three links came back up
• Traffic “black-holed” for 0.975 seconds
• For 30 secs packets follow a shorter path
• Traffic “black-holed” for 1.745 seconds
39
Approaches To Fix It
• Fine-tuning parameters
– Timer values [Alattinoglu2002]
• Modify Routing Protocols
– Suppress advertisement and perform local
rerouting using a backwarding table [Lee04]
– Centralized path computation
[Feamster04,Rexford04]
• Exploit multi-path
– Our approach to provide Value-Added Service
40
What I have learned …
• No loss, almost no delay
– Almost. I gained insight into the
underlying causes
• Debunking the myths [Odlyzko2005]
– Streaming real-time traffic
– QoS
– Content is king
– Usage-sensitive pricing
41
Other Issues Tackled
• Traffic Matrix Estimation
– Inspired by tomography in other fields
– Before arrival of efficient NetFlow
• Network Anomaly Detection
– NIDS, IDS => PCA-based global
monitoring
• Optimization
– Cross-layer resource allocation
42
Taxonomy of Traffic Matrices
• Point-to-Point
– demand btwn ingress and egress point
• Ingress/Egress : POP, link, router, BGP
prefix
43
Scalability
• Example : 20 POPs, 500 routers, 3K
links
• Granularity/size tradeoff
– POP-to-POP : O(100)
– router-to-router : O(10^4)
– prefix-to-prefix : O(10^10)
• Challenge - Collecting, storing and
manipulating large TMs!
44
Usage Based Charging
• Feasible?
– Where to measure?
• At last hop
– Scenario
• A: “I want to download B’s webpage”
• B: “That page is 1MB large”
• A: “OK”
– Between ISPs
• What do you do with retx, ack, delay?
45
Future Work
• In Measurement Technology
– Keep up with increased link speeds (40GE)
– Improve sampling techniques
– Infer what we cannot measure
– Pinpoint security holes
• Personal perspective
– More into creating value-added services
– MPLS/VPN performance issues
• “Sound” Measurement Infrastructure
46
Acknowledgements
• Thanks to D. Papagiannaki, B.-Y. Choi, U. Hengartner, C.
Boutremans, and G. Iannaccone for help with the slides.
47
BACKUP SLIDES