Internet Measurement Basics
• Measurement Overview and Internet Challenges
–Why measure? Why model measurements?
–What to measure? Where to measure?
• Measurement tools
–Active: ping, traceroute, and pathchar
–Passive: logs, SNMP, packet, and flow monitoring
• Two Case Studies:
– trace-route based routing behavior measurement [Pa97]
– OSPF-based passive monitoring of intra-domain routing [AG04]
• Operational applications of measurement
Readings: Please do the required readings
CSci5221: Internet Measurement Basics
Why Measure?
• The Internet is a man-made system, so why
do we need to measure it?
– Because we still don’t really understand it
– Because sometimes things go wrong
• Measurement for network operations
– Reliability analysis, Traffic engineering, Capacity Planning
• Better and more efficient management of network resources
• Detecting, diagnosing and predicting problems
• What-if analysis of future changes
• Measurement for scientific discovery
– Characterizing a complex system as organism
– Creating accurate models that represent reality
– Identifying new features and phenomena
Why Build Models of
Measurements?
• Compact summary of measurements
– Efficient way to represent a large data set
– E.g., exponential distribution with mean 100 sec
• Expose important properties of measurements
– Reveals underlying cause or engineering question
– E.g., mean RTT to help explain TCP throughput
• Generate random but realistic data as input
– Generate new data that agree in key properties
– E.g., topology models to feed into simulators
“All models are wrong, but some models are useful.” – George Box
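As a rough illustration of the "compact summary" and "generate realistic data" points above, the sketch below summarizes a data set by the single parameter of an exponential model (its mean) and then draws fresh synthetic samples from that model. The sample data, seeds, and function names are assumptions made for illustration only.

```python
import random
import statistics


def fit_exponential_mean(samples):
    """Summarize samples by the mean of an exponential model (its only parameter)."""
    return statistics.fmean(samples)


def generate_exponential(mean, n, seed=0):
    """Draw n synthetic samples from an exponential distribution with the given mean."""
    rng = random.Random(seed)
    # random.expovariate takes the rate, i.e. 1 / mean.
    return [rng.expovariate(1.0 / mean) for _ in range(n)]


# Hypothetical "measured" data: here simply drawn from a mean-100 s model.
measured = generate_exponential(mean=100.0, n=50_000, seed=1)
fitted_mean = fit_exponential_mean(measured)
# New data that agree with the measurements in the key property (the mean).
synthetic = generate_exponential(fitted_mean, n=50_000, seed=2)
```

The whole data set is replaced by one number, which is compact but only useful if the exponential assumption is reasonable for the data at hand.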
What Can be Measured?
• Traffic
– Load statistics
– Packet or flow traces
• Performance of paths
– Application performance, e.g., Web download time
– Transport performance, e.g., TCP bulk throughput
– Network performance, e.g., packet delay and loss
• Network structure
– Topology, and paths on the topology
– Dynamics of the routing protocol
Where to Measure, and How?
• Short answer
– Anywhere you can! 
• End hosts
– Application logs, e.g., Web server logs
– Sending active probes to measure performance
• Individual links/routers
– Load statistics, packet traces, flow traces
– Configuration state
– Routing-protocol messages or table dumps
– Alarms
• How: Active vs. Passive Measurement
– First understand some measurement challenges
Internet Challenges Make
Measurement an Art
• Stateless routers
– Routers do not routinely store packet/flow state
– Measurement is an afterthought, adds overhead
• IP narrow waist
– IP measurements cannot see below network layer
– E.g., link-layer retransmission, tunnels, etc.
• Violations of end-to-end argument
– E.g., firewalls, address translators, and proxies
– Not directly visible, and may block measurements
• Decentralized control
– Autonomous Systems may block measurements
– No global notion of time
Active Measurement Example: Ping
• Adding traffic for purposes of
measurement
– Trade-offs between accuracy and overhead
– Need careful methods to avoid introducing bias
• Ping
– Host sends an ICMP ECHO packet to a target
– … and captures the ICMP ECHO REPLY
– Useful for checking connectivity, and RTT
– Only requires control of one of the two end-points
• Problems with ping
– Round-trip rather than one-way delays
– Some hosts might not respond
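To make the ICMP ECHO step concrete, here is a minimal sketch that only builds the bytes of an echo request (type 8, code 0) with its 16-bit checksum. It does not send anything, since raw sockets usually need elevated privileges. The identifier, sequence number, and payload are arbitrary values chosen for illustration.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum as used by ICMP."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    # Fold any carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int, payload: bytes = b"") -> bytes:
    """Build an ICMP echo request: type 8, code 0, checksum, id, sequence, payload."""
    # The checksum is computed with the checksum field set to zero.
    header = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence) + payload


packet = build_echo_request(identifier=0x1234, sequence=1, payload=b"ping")
```

A receiver can validate the packet by recomputing the checksum over all of its bytes, which yields zero for an intact packet. The RTT is then the time between sending this packet and capturing the matching reply.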
Active Measurement Example:
Pathchar for Links
$\mathrm{rtt}(i+1) - \mathrm{rtt}(i) = d + L/c + \epsilon$
i : initial TTL value
c : link capacity
L : packet size
Three delay components:
d : propagation delay
L / c : transmission delay
$\epsilon$ : queueing delay + noise
How to infer d, c?
[Figure: rtt(i+1) − rtt(i) plotted against packet size L; the min. RTT(L) points lie on a line with slope = 1/c and intercept d, and $\epsilon$ is the excess above that line]
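One way to answer "how to infer d, c" under the model above is a straight-line fit: for each packet size L, keep the minimum observed rtt(i+1) − rtt(i) (which filters out most of the queueing term), then fit a line whose intercept estimates d and whose slope estimates 1/c. The sketch below assumes noise-free minima and a made-up link (5 ms, 10 Mbit/s); the function names are illustrative, not pathchar's own.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def infer_link(min_rtt_diff_by_size):
    """Infer (d, c) from {packet size in bits: min of rtt(i+1) - rtt(i) in seconds}."""
    sizes = sorted(min_rtt_diff_by_size)
    delays = [min_rtt_diff_by_size[s] for s in sizes]
    d, slope = fit_line(sizes, delays)
    return d, 1.0 / slope  # the slope of the line is 1/c


# Hypothetical link: d = 5 ms propagation delay, c = 10 Mbit/s capacity.
true_d, true_c = 0.005, 10e6
observations = {bits: true_d + bits / true_c for bits in (512, 4096, 8192, 12000)}
d_hat, c_hat = infer_link(observations)
```

With real probes the minima still carry some residual noise, so many probes per packet size are needed before the fitted slope is trustworthy.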
Active Measurement Example:
Traceroute
• Time-To-Live field in IP packet header
– Source sends a packet with a TTL of n
– Each router along the path decrements the TTL
– “TTL exceeded” sent when TTL reaches 0
• Traceroute tool exploits this TTL behavior
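[Figure: the source sends probes with TTL=1 and TTL=2 toward the destination; the router where each probe's TTL expires returns a "time exceeded" message to the source]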
Send packets with TTL=1, 2, 3, … and record source of “time exceeded” message
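The TTL mechanism can be sketched as a small simulation over a fixed, hypothetical path. No packets are sent; the code only mimics which hop would answer each probe. In practice the destination answers with a different ICMP message than "time exceeded", which this sketch does not distinguish.

```python
def simulate_traceroute(path, max_ttl=30):
    """Simulate probes with TTL = 1, 2, 3, ... over a fixed path.

    `path` is the ordered list of hops, ending with the destination.
    Returns the (ttl, responder) pairs a traceroute would record.
    """
    hops = []
    for ttl in range(1, max_ttl + 1):
        if ttl > len(path):
            break
        remaining = ttl
        for node in path:
            remaining -= 1                # every hop decrements the TTL
            if remaining == 0:
                hops.append((ttl, node))  # this hop answers the probe
                break
        if hops[-1][1] == path[-1]:
            break                         # the destination answered; stop probing
    return hops


# Hypothetical three-router path to a destination.
result = simulate_traceroute(["r1", "r2", "r3", "dst"])
```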
Challenges of Traceroute
• Measuring multiple paths
– Successive probes may traverse different paths
• Non-participating network elements
– Some routers and firewalls don’t reply
• Inaccurate delay information
– Includes processing delays on the router CPU
• Round-trip vs. one-way measurements
– Paths may have asymmetric properties
• Interfaces, not routers
– Returns IP address of interfaces, not routers
Applications of Traceroute
• Network troubleshooting
– Identify forwarding loops and black holes
– Identify long and convoluted paths
– See how far the probe packets get
• Network topology inference
– Launch traceroute probes from many places
– … toward many destinations
– Join together to fill in parts of the topology
– … though traceroute undersamples the edges
Paxson Study: Forwarding Loops
• Forwarding loop
– Packet returns to same router multiple times
• May cause traceroute to show a loop
– If loop lasted long enough
– So many packets traverse the loopy path
• Traceroute may reveal false loops
– Path change that leads to a longer path
– Causing later probe packets to hit same nodes
• Heuristic solution
– Require traceroute to return same path 3 times
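A simplified sketch of spotting a candidate loop in a single traceroute result: an address that reappears after other hops have intervened. This is only the detection step and assumes every hop replied. Telling a real loop from a false one still needs repeated measurements, as the heuristic above suggests. The function name and sample addresses are hypothetical.

```python
def find_loop(hops):
    """Return the first address that reappears after other hops intervened, else None."""
    last_seen = {}
    for index, addr in enumerate(hops):
        # Consecutive repeats of the same address are not treated as a loop here.
        if addr in last_seen and index - last_seen[addr] > 1:
            return addr
        last_seen[addr] = index
    return None


looping = ["a", "b", "c", "b", "c"]
clean = ["a", "b", "b", "c"]
```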
Paxson Study: Causes of Loops
• Transient vs. persistent
– Transient: routing-protocol convergence
– Persistent: likely configuration problem
• Challenges
– Appropriate time boundary between the two?
– What about flaky equipment going up and down?
– Determining the cause of persistent loops?
• Anecdote on recent study of persistent
loops
– Provider has static route for customer prefix
– Customer has default route to the provider
Paxson Study: Path Fluttering
• Rapid changes between paths
– Multiple paths between a pair of hosts
– Load balancing policies inside the network
• Packet-based load balancing
– Round-robin or random
– Multiple paths for packets in a single flow
• Flow-based load balancing
– Hash of some fields in the packet header
– E.g., IP addresses, port numbers, etc.
– To keep packets in a flow on one path
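Flow-based load balancing can be sketched as hashing the header fields and taking the result modulo the number of paths. Real routers use their own (often vendor-specific) hash functions; SHA-256 is used here only as a convenient deterministic stand-in, and the addresses and ports are made up.

```python
import hashlib


def pick_path(src_ip, dst_ip, src_port, dst_port, protocol, num_paths):
    """Map a 5-tuple to one of num_paths paths with a deterministic hash."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{protocol}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % num_paths


# Every packet of the same flow carries the same 5-tuple, so it maps to the same path.
first = pick_path("192.0.2.1", "198.51.100.2", 40000, 80, "tcp", num_paths=4)
second = pick_path("192.0.2.1", "198.51.100.2", 40000, 80, "tcp", num_paths=4)
```

Because the mapping depends on port numbers, successive traceroute probes that vary their ports can land on different paths, which is one way fluttering shows up in measurements.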
Paxson Study: Routing Stability
• Route prevalence
– Likelihood of observing a particular route
– Relatively easy to measure with sound sampling
– Poisson arrivals see time averages (PASTA)
– Most host pairs have a dominant route
• Route persistence
– How long a route endures before a change
– Much harder to measure through active probes
– Look for cases of multiple observations
– Typical host pair has path persistence of a week
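Route prevalence for one host pair can be estimated as the fraction of observations in which the most common route appeared. The sketch below assumes the observations were already collected at suitably sampled times; the routes listed are hypothetical.

```python
from collections import Counter


def route_prevalence(observed_routes):
    """Return (dominant route, fraction of observations in which it appeared)."""
    counts = Counter(tuple(route) for route in observed_routes)
    route, hits = counts.most_common(1)[0]
    return route, hits / len(observed_routes)


# Four hypothetical traceroute observations between one pair of hosts.
observations = [
    ["r1", "r2", "r4"],
    ["r1", "r2", "r4"],
    ["r1", "r3", "r4"],
    ["r1", "r2", "r4"],
]
dominant, prevalence = route_prevalence(observations)
```

Persistence cannot be read off the same way, because two identical observations do not show that the route stayed unchanged in between.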
Paxson Study: Route Asymmetry
• Hot-potato (early-exit) routing
[Figure: Customer A attached to Provider A and Customer B attached to Provider B; the providers connect at multiple peering points, and each uses early-exit routing]
• Other causes
– Asymmetric link weights in intradomain routing
– Cold-potato routing, where an AS requests that traffic enter at a particular place
• Consequences
– Lots of asymmetry
– One-way delay is not necessarily half of the round-trip time
Passive Measurement Example:
Logs at Hosts
• Web server logs
– Host, time, URL, response code, content length, …
– E.g., 122.345.131.2 - - [15/Oct/1998:00:00:25 -0400]
"GET /images/wwwtlogo.gif HTTP/1.0" 304 "http://www.aflcio.org/home.htm" "Mozilla/2.0
(compatible; MSIE 3.02; Update a; AK; AOL 4.0;
Windows 95)" "-"
• DNS logs
– Request, response, time
• Useful for workload characterization,
troubleshooting, etc.
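For workload characterization, log lines are typically parsed into fields first. The sketch below handles only the leading fields of a Common Log Format style line (host, time, request, status, size). The pattern and the sample line are assumptions for illustration; lines in other layouts simply fail to match and return None.

```python
import re

LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_log_line(line):
    """Return a dict of fields, or None if the line does not match the pattern."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    # A size of "-" means no body was sent (e.g., a 304 response).
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields


# Hypothetical log line using a documentation-range address.
sample = '192.0.2.7 - - [15/Oct/1998:00:00:25 -0400] "GET /images/logo.gif HTTP/1.0" 200 2326'
entry = parse_log_line(sample)
```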
“Passive” Traffic Measurement
• Packet-level:
– Tcpdump: software based
– Special hardware packet collectors
• Flow-level:
– Cisco Netflow; other vendors have similar facility
– 5-tuple flow: srcIP, dstIP, srcPort, dstPort, protocol
• use a time-out value to “terminate” a flow
• statistics collected: start/end time, packet/byte counts
– Sampling may be used for scalability
• Link-level:
– SNMP traffic statistics, often over 5-min interval
– IETF MIB (management information base)
• Byte counts, packet counts, etc.
• pros and cons of each?
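The flow-level bullets above (5-tuple key, time-out, per-flow statistics) can be sketched as a small aggregation over packet records. This is an illustrative model rather than how any particular router implements flow export; the record layout, the 30-second time-out, and the sample packets are assumptions.

```python
def packets_to_flows(packets, timeout=30.0):
    """Group (time, src_ip, dst_ip, src_port, dst_port, proto, size) records into flows.

    A flow ends when no packet with the same 5-tuple arrives for `timeout` seconds.
    """
    active = {}     # 5-tuple -> flow record that is still open
    finished = []
    for ts, *five_tuple, size in sorted(packets, key=lambda p: p[0]):
        key = tuple(five_tuple)
        flow = active.get(key)
        if flow is not None and ts - flow["end"] > timeout:
            finished.append(active.pop(key))  # idle too long: terminate the flow
            flow = None
        if flow is None:
            flow = {"key": key, "start": ts, "end": ts, "packets": 0, "bytes": 0}
            active[key] = flow
        flow["end"] = ts
        flow["packets"] += 1
        flow["bytes"] += size
    finished.extend(active.values())          # flush flows still open at the end
    return finished


web = ("192.0.2.1", "198.51.100.2", 40000, 80, "tcp")
dns = ("192.0.2.1", "198.51.100.53", 40001, 53, "udp")
packets = [
    (0.0, *web, 1500),
    (1.0, *web, 500),
    (2.0, *dns, 80),
    (100.0, *web, 40),   # arrives after the time-out, so it starts a new flow
]
flows = packets_to_flows(packets)
```

The output keeps start/end times and packet/byte counts per flow, but per-packet timing and sizes are gone, which is the main trade-off against packet traces.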
Passive Measurement: SNMP
• Simple Network Management Protocol
– Coarse-grained counters on the router
– E.g., byte and packet counts
• Polling
– Management system can poll the counters
– E.g., once every five minutes
• Limitations
– Extremely coarse-grained statistics
– Delivered over UDP!
• Advantages: ubiquitous
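Turning polled byte counters into a load figure is a matter of differencing two readings. The sketch below assumes a 32-bit counter that may wrap at most once between polls and computes average utilization over the polling interval. The counter values, interval, and link capacity are made up.

```python
def link_utilization(prev_octets, curr_octets, interval_s, capacity_bps, counter_bits=32):
    """Average utilization between two polls of a byte counter, allowing one wrap."""
    modulus = 1 << counter_bits
    delta = (curr_octets - prev_octets) % modulus  # handles a single counter wrap
    return delta * 8 / interval_s / capacity_bps


# 37.5 MB in five minutes on a hypothetical 10 Mbit/s link.
utilization = link_utilization(0, 37_500_000, interval_s=300, capacity_bps=10_000_000)

# The counter wrapped between polls: 296 bytes before the wrap plus 704 after it.
wrapped = link_utilization(4_294_967_000, 704, interval_s=300, capacity_bps=10_000_000)
```

The result is an average over the whole interval, so short bursts inside a five-minute window are invisible, and a lost poll (the counters travel over UDP) stretches the interval further.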
Passive Measurement: Packet
Monitoring
• Tapping a link
– Shared media (Ethernet, wireless)
– Multicast switch
– Splitting a point-to-point link
– Line card that does packet sampling
[Figure: monitor placement for each option, with hosts on shared media or a switch, and Router A and Router B on a point-to-point link]
Packet Monitoring:
Selecting the Traffic
• Filter to focus on a subset of the packets
– IP addresses/prefixes (e.g., to/from specific Web sites, client
machines, DNS servers, mail servers)
– Protocol (e.g., TCP, UDP, or ICMP)
– Port numbers (e.g., HTTP, DNS, BGP, Napster)
• Collect first n bytes of packet (snap length)
– Medium access control header (if present)
– IP header (typically 20 bytes)
– IP+UDP header (typically 28 bytes)
– IP+TCP header (typically 40 bytes)
– Application-layer message (entire packet)
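As an illustration of what a 20-byte snap length already reveals, the sketch below decodes the fixed part of an IPv4 header. It assumes the snapshot starts at the IP header (any link-layer header already stripped) and ignores IP options. The sample header is built by hand with documentation-range addresses.

```python
import socket
import struct

IPV4_FORMAT = "!BBHHHBBH4s4s"  # 20 bytes: the IPv4 header without options


def parse_ipv4_header(snapshot: bytes):
    """Decode the fixed 20-byte IPv4 header at the start of a captured snapshot."""
    if len(snapshot) < 20:
        raise ValueError("snap length too short for an IPv4 header")
    (ver_ihl, _tos, total_len, _ident, _frag, ttl, proto, _cksum,
     src, dst) = struct.unpack(IPV4_FORMAT, snapshot[:20])
    return {
        "version": ver_ihl >> 4,
        "header_len": (ver_ihl & 0x0F) * 4,  # IHL counts 32-bit words
        "total_len": total_len,
        "ttl": ttl,
        "protocol": proto,                   # e.g., 6 = TCP, 17 = UDP, 1 = ICMP
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }


# Hypothetical header: a 40-byte TCP packet (protocol 6) with TTL 64.
sample = struct.pack(IPV4_FORMAT, 0x45, 0, 40, 1, 0, 64, 6, 0,
                     socket.inet_aton("192.0.2.1"), socket.inet_aton("198.51.100.2"))
header = parse_ipv4_header(sample)
```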
Analysis of Packet Traces
• IP header
– Traffic volume by IP addresses or protocol
– Burstiness of the stream of packets
– Packet properties (e.g., sizes, out-of-order, etc.)
• TCP header
– Traffic breakdown by application (e.g., Web)
– TCP congestion and flow control
– Number of bytes and packets per session
• Application header
– URLs, HTTP headers (e.g., cacheable response?)
– DNS queries and responses, user key strokes, …
Packet vs. Flow Measurement
• Basic statistics (available from both techniques)
– Traffic mix by IP addresses, port numbers, and protocol
– Average packet size
• Traffic over time
– Both: traffic volumes on a medium-to-large time scale
– Packet: burstiness of the traffic on a small time scale
• Statistics per TCP connection
– Both: number of packets & bytes transferred over the link
– Packet: frequency of lost or out-of-order packets, and the
number of application-level bytes delivered
• Per-packet info (available only from packet traces)
– TCP seq/ack #s, receiver window, per-packet flags, …
– Probability distribution of packet sizes
– Application-level header and body (full packet contents)
Network Topology Measurement
• Use traceroute
– Pros
• Can be done at end hosts
• “router-level” topology
• Can obtain a “sample” of the “global” Internet topology
– Cons
• Active measurement, incurs overhead/load on routers
• Not all routers respond to traceroutes
• IP address aliasing problem
– Also MPLS tunnels may “obscure” real topology
• Only “sampled”, or “snapshots”
• BGP routing data
– “global” AS-level topology
– Partial view, unless you can collect BGP data from all BGP routers
• ISP topology
– If you are the ISP operator, an easier task, but not necessarily
an easy task
OSPF Protocol: A Quick Recap
• Link-state protocol
– Routers flood Link State Advertisements (LSAs)
– Routers compute shortest paths based on weights
– Routers identify next-hop to reach other routers
[Figure: example network topology annotated with OSPF link weights]
CSci5221: Network Failures and Fast Convergence
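The shortest-path step of a link-state protocol can be sketched with Dijkstra's algorithm over the weighted topology that the flooded LSAs describe. The graph and its weights below are hypothetical (they are not the ones in the slide's figure), and the function also records the next hop toward each destination.

```python
import heapq


def shortest_paths(graph, source):
    """Dijkstra: return {node: (distance, next_hop)} as seen from source.

    graph maps node -> {neighbor: link weight}.
    """
    best = {source: (0, None)}
    heap = [(0, source)]
    while heap:
        dist, node = heapq.heappop(heap)
        if dist > best[node][0]:
            continue  # stale queue entry; a shorter path was found already
        for neighbor, weight in graph[node].items():
            new_dist = dist + weight
            # The next hop is the first router on the path after the source.
            hop = neighbor if node == source else best[node][1]
            if neighbor not in best or new_dist < best[neighbor][0]:
                best[neighbor] = (new_dist, hop)
                heapq.heappush(heap, (new_dist, neighbor))
    return best


# Hypothetical four-router topology with symmetric link weights.
topology = {
    "A": {"B": 2, "C": 1},
    "B": {"A": 2, "C": 5, "D": 3},
    "C": {"A": 1, "B": 5, "D": 1},
    "D": {"B": 3, "C": 1},
}
routes = shortest_paths(topology, "A")
```

Each router runs the same computation from its own position, which is why a monitor that hears all LSAs can reconstruct every router's view of the paths.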
Measurement: Intradomain Route
Monitoring
• OSPF is a flooding protocol
– Every link-state advertisement is sent on every link
– Very helpful for simplifying the monitor
• Can participate in the protocol
– Shared media (e.g., Ethernet)
• Join multicast group and listen to LSAs
– Point-to-point links
• Establish an adjacency with a router
• … or passively monitor packets on a link
– Tap a link and capture the OSPF packets
Intradomain Route Monitoring
• Construct continuous view of topology
– Detect when equipment goes up or down
– Input to traffic-engineering and planning tools
• Detect routing anomalies
– Identify failures, LSA storms, and route flaps
– Verify that LSA load matches expectations
– Flag strange weight settings as misconfigurations
• Analyze convergence delay
– Monitor LSAs in multiple locations
– Compare the times when LSAs arrive
• Detect router implementation mistakes
Passive Collection of LSAs
• OSPF is a flooding protocol
– Every LSA sent on every participating link
– Very helpful for simplifying the monitor
• Can participate in the protocol
– Shared media (e.g., Ethernet)
• Join multicast group and listen to LSAs
– Point-to-point links
• Establish an adjacency with a router
• … or passively monitor packets on a link
– Tap a link and capture the OSPF packets
• Note LSAs do not tell us the “root causes” of failures!
– need to gather route configurations, syslogs, …
– need to dig below IP: link/physical layers, …
Reducing Volume of Information
• Prioritizing the messages
– Router failure over router recovery
– Link failure or weight change over a refresh
– Informational messages about weight settings
• Grouping related messages
– Link failure: group messages for the two ends
– Router failure: group the affected links
– Common failure: group links failing close in time
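Grouping links that fail close together in time can be sketched as a single pass over time-sorted events. The one-second window and the sample events are assumptions; a real monitor would tune the window and combine this with the per-link and per-router grouping listed above.

```python
def group_failures(events, window=1.0):
    """Group (time, link) failure events that occur within `window` seconds
    of the previous event in the same group."""
    groups = []
    for ts, link in sorted(events):
        if groups and ts - groups[-1][-1][0] <= window:
            groups[-1].append((ts, link))   # close in time: likely a common cause
        else:
            groups.append([(ts, link)])     # start a new group
    return groups


# Hypothetical events: three links fail together, one fails much later.
events = [(10.0, "A-B"), (50.0, "E-F"), (10.2, "A-C"), (10.5, "A-D")]
grouped = group_failures(events)
```

Reporting one grouped event instead of several raw messages is what reduces the volume an operator has to look at.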
Anomalies Found in Shaikh04 paper
• Intermittent hardware problem
– Router periodically losing OSPF adjacencies
– Risk of network partition if 2nd failure occurred
• External link flaps
– Congestion on edge link causing lost messages
– Lost adjacency leading to flapping routes
• Configuration errors
– Two routers assigned the same IP address
– Inefficient config leading to duplicate LSAs
• Vendor implementation bug
– More frequent refreshing of LSAs than specified
Measurement Challenges for Operators
• Network-wide view
– Crucial for evaluating control actions
– Multiple kinds of data from multiple locations
• Large scale
– Large number of high-speed links and routers
– Large volume of measurement data
• Poor state-of-the-art
– Working within existing protocols and products
– Technology not designed with measurement in mind
• The “do no harm” principle
– Don’t degrade router performance
– Don’t require disabling key router features
– Don’t overload the network with measurement data
Network Operations Tasks
• Reporting of network-wide statistics
– Generating basic information about usage and reliability
• Performance/reliability troubleshooting
– Detecting and diagnosing anomalous events
• Security
– Detecting, diagnosing, and blocking security problems
• Traffic engineering
– Adjusting network configuration to the prevailing traffic
• Capacity planning
– Deciding where and when to install new equipment
Basic Reporting
• Producing basic statistics about the network
– For business purposes, network planning, ad hoc studies
• Examples
– Proportion of transit vs. customer-customer traffic
– Total volume of traffic sent to/from each private peer
– Mixture of traffic by application (Web, Napster, etc.)
– Mixture of traffic to/from individual customers
– Usage, loss, and reliability trends for each link
• Requirements
– Network-wide view of basic traffic and reliability statistics
– Ability to “slice and dice” measurements in different ways
(e.g., by application, by customer, by peer, by link type)
Troubleshooting
• Detecting and diagnosing problems
– Recognizing and explaining anomalous events
• Examples
– Why a backbone link is suddenly overloaded
– Why the route to a destination prefix is flapping
– Why DNS queries are failing with high probability
– Why a route processor has high CPU utilization
– Why a customer cannot reach certain Web sites
• Requirements
– Network-wide view of many protocols and systems
– Diverse measurements at different protocol levels
– Thresholds for isolating significant phenomena
Security
• Detecting and diagnosing problems
– Recognizing suspicious traffic or disruptions
• Examples
– Denial-of-service attack on a customer or service
– Spread of a worm or virus through the network
– Route hijack of an address block by adversary
• Requirements
– Detailed measurements from multiple places
– Including deep-packet inspection, in some cases
– Online analysis of the data
– Installing filters to block the offending traffic
Traffic Engineering
• Adjusting resource allocation policies
– Path selection, buffer management, and link scheduling
• Examples
– OSPF weights to divert traffic from congested links
– BGP policies to balance load on peering links
– Link-scheduling weights to reduce delay for “gold” traffic
• Requirements
– Network-wide view of the traffic carried in the backbone
– Timely view of the network topology and configuration
– Accurate models to predict impact of control operations
(e.g., the impact of RED parameters on TCP throughput)
Capacity Planning
• Deciding whether to buy/install new equipment
– What? Where? When?
• Examples
– Where to put the next backbone router
– When to upgrade a link to higher capacity
– Whether to add/remove a particular peer
– Whether the network can accommodate a new customer
– Whether to install a caching proxy for cable modems
• Requirements
– Projections of future traffic patterns from measurements
– Cost estimates for buying/deploying the new equipment
– Model of the potential impact of the change (e.g., latency
reduction and bandwidth savings from a caching proxy)
Examples of Public Data Sets
• Network-wide data
– Abilene and GEANT backbones
– Netflow, IGP, and BGP traces
• CAIDA DatCat
– Data catalogue maintained by CAIDA
– http://imdc.datcat.org/
• Interdomain routing
– RouteViews and RIPE-NCC
– BGP routing tables and update messages
• Traceroute and looking glass servers
– http://www.traceroute.org/
– http://www.nanog.org/lookingglass.html