Internet Traffic Measurement
CS590F Survey Project
by Vadim Gorbach
Purdue University
December 4, 2000
Why Measurements are Important
• “If we don’t measure the Internet, we don’t have objective data about
how it performs”
• Essential to understanding Internet’s behavior and growth
• Essential to identifying and ameliorating network problems
• The economy relies more and more on the Internet: infrastructure-wide
analysis and planning are needed to scale up the Internet efficiently
• Internet users need reliable means of verifying service guarantees; ISPs
need to diversify grades of service to improve revenues
• Corporations need affordable VPNs instead of pricey private networks (PNs):
strict SLAs are a must (currently, unclear inter-ISP business mechanics are a
hindrance)
• Many user groups view Internet as mission-critical: strict QoS
guarantees are required and as soon as possible
(Success story: AIAG network – AutoNet or ANX)
Why Measurements are Difficult
• To effectively measure the global Internet, wide cooperation is needed.
However, ISPs are reluctant to coordinate their efforts
• To be done correctly and accurately, profound understanding and experience
(expertise) are required, therefore…
• Statistics collection is viewed as a luxury (OC48mon = $100,000) - only large
ISPs can afford statistics collection and analysis - demand is still dormant
• Best Effort service, low profit margins for ISPs make operational support
difficult – data collection is low priority
• Traffic volume, high trunk capacity, diversity of protocols, technologies and
applications make traffic monitoring and analysis a challenging endeavor
• Results get obsolete very rapidly: Internet is under very active development –
traffic, technology and topology change very fast
• Tremendous growth of Internet – it is difficult to scale measurements
• Overprovisioning is a widely practiced solution to network congestion
Internet (the Patient) Today
• Global Internet is growing very fast:
– as of 11am today, there are 98,160,522 hosts on the Internet, and
344,456,271 users (out of 6,113,303,473; or 5.63%) and counting
(future is optimistic)
– number of hosts doubles every 16-18 months (close to Moore’s
law)
– volume of Internet traffic doubles about every 100 days
– volume of World-Wide Web content doubles every 50-70 days
• Diversity (by no means synergetic) of protocols, applications and
technologies
• Concern: growing popularity of streaming applications, which
threatens the stability of the network
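To put the doubling times quoted above on a common scale, the following minimal sketch converts a doubling time into a growth factor per year. It assumes steady exponential growth and 30-day months, neither of which is stated on the slide; the function name is illustrative.

```python
def annual_growth_factor(doubling_time_days: float) -> float:
    """Growth factor over a 365-day year, assuming steady exponential growth."""
    return 2.0 ** (365.0 / doubling_time_days)


# Doubling times quoted on the slide (months approximated as 30 days).
QUOTED_DOUBLING_TIMES = [
    ("hosts, 16 months", 16 * 30),
    ("hosts, 18 months", 18 * 30),
    ("traffic, 100 days", 100),
    ("Web content, 50 days", 50),
    ("Web content, 70 days", 70),
]

for label, days in QUOTED_DOUBLING_TIMES:
    print(f"{label}: grows by a factor of {annual_growth_factor(days):.1f} per year")
```

The point of the comparison is the spread: under these assumptions content and traffic grow by an order of magnitude or more per year, while the host count grows by less than a factor of two.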
Internet Traffic
• MCI backbone measurements by CAIDA in April 1998:
– TCP: 95% of the bytes, 90% of the packets, 80% of the flows
– IPv6, IP-over-IP, ICMP: around 3%
– UDP: rest of the traffic
– HTTP: 75% of bytes, 70% of packets
– SMTP: 5% of bytes, 5% of packets
– FTP: 5% of bytes, 3% of packets
– NNTP: 2% of bytes, less than 1% of packets
– Telnet: 1% of bytes, less than 1% of packets (marked decrease due
to popularity of alternative protocols: ssh, kerberos, rlogin)
– Rest of the traffic is spread over mostly Web-related TCP and UDP
ports (81, 443, 3128 – Proxy?, 8000, 8080 - encodings?)
History
• 1981: RFC 792 (ICMP)
• September 1990: Advanced Network and Services.
http://www.advanced.org
• Early 1995: NSF relinquishes its control over the Internet, sending it
into free flight - Attempts to adequately track and monitor the Internet
get more and more problematic (Will the story repeat with Internet2?)
• 1994-1996: Vern Paxson’s core thesis work
• October 1996: Internet2 consortium (http://www.internet2.edu)
• October 1997: Next Generation Internet (NGI) – US government
initiative (http://www.ngi.gov/, http://www.hpcc.gov/)
History - II
• 1997: IETF IPPM (IP Performance Metrics) working group
– Common definition of IP Metrics: to develop standard metrics to measure
quality, performance, and reliability of Internet data delivery services
– The goal is to provide basis for unbiased quantitative measures of Internet
performance
• May 1998: RFC 2330 (Framework for IP performance metrics)
• September 1999: RFCs 2678 – 2681 (connectivity, one-way delay and
loss, round-trip delay)
• NLANR (http://www.nlanr.net/) and DOE/LBL (http://www-nrg.ee.lbl.gov/) are
currently developing a scalable tool set “to measure the global Internet”
for IPPM-style metrics (NIMI)
• Similar projects: Surveyor, IPMA, Felix, ???
Vern Paxson’s Thesis Work
• Groundbreaking measurement work, first large-scale studies of the end-to-end
Internet routing and packet dynamics
• Network Probe Daemon (NPD) framework, 1994-1995:
– NPDs cooperate with one another for measurement: both NPDs send
and receive the data, tracing packet departure and arrival times
• Some of the findings:
– Forward and Reverse directions of the paths between two nodes are
often quite different: routes of more than half of all network paths
differ in at least one city visited
– Bottleneck bandwidths and queuing delays are frequently asymmetric
• This work served as a basis for Surveyor (adopted for Internet2) and
PingER architectures; promoted SLA mechanism, which would provide
powerful economic incentive for improving QoS in NGI
Measurement
• Measurement is: data collection, analysis and visualization
• Traffic data:
– Network Topology and Mapping (connectivity)
– Workload (passive or non-intrusive)
– Performance (active)
– Routing (BGP routing tables)
• Active approach
– Inject traffic and wait for arrival to the destination or reply
• Passive approach
– No traffic injected; Measurements are done over a collection of
network monitors
Topology and Mapping
• New physical connections among core Internet backbones occur hourly
• Myriads of new technologies and applications: streaming audio and video,
distance education, entertainment, telephony and video-conferencing, as well
as numerous new and still evolving communication protocols
• Tracking and visualizing Internet topology is clearly a challenge
• CAIDA: skitter, a tool for dynamically discovering and depicting global
Internet topology
– X-ray tomography techniques: 3D-object from 2D-images
– collects connectivity, RTT and path data with a number of network monitors across
the United States, Europe and Asia
– sends ICMP packets with progressively longer TTL, as traceroute does
– source host is notified of packets whose TTL expires (ICMP Time Exceeded
message)
• There is an abysmal lack of geographical mapping data for Internet address space
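The TTL mechanism that skitter and traceroute rely on can be illustrated without raw sockets. The sketch below simulates it over a hypothetical path given as a non-empty list of router names ending in the destination; it is not skitter's implementation, and all names are invented for illustration.

```python
from typing import List, Tuple


def send_probe(path: List[str], ttl: int) -> Tuple[str, str]:
    """Simulate one probe along `path` (routers first, destination last).

    Each router decrements the TTL; the router that brings it to zero
    discards the probe and answers with ICMP Time Exceeded.
    """
    for router in path[:-1]:
        ttl -= 1
        if ttl == 0:
            return ("time-exceeded", router)
    return ("echo-reply", path[-1])  # the probe reached the destination


def discover_path(path: List[str], max_ttl: int = 30) -> List[str]:
    """Send probes with TTL 1, 2, 3, ... and record who answers each one."""
    hops = []
    for ttl in range(1, max_ttl + 1):
        kind, responder = send_probe(path, ttl)
        hops.append(responder)
        if kind == "echo-reply":
            break
    return hops


# Hypothetical forward path from a monitor to one destination.
print(discover_path(["router-a", "router-b", "router-c", "destination"]))
```

Each probe reveals exactly one hop, so the cost of mapping a path grows with its length; a real tool would also record the round-trip time of every reply.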
Workload (Passive) Measurement
• Usually infrastructure-wide measurements
• NLANR collects traces from major universities
• Traces suggest very active proliferation of new
applications (streaming video and audio, etc.). Also: non-congestion-controlled
traffic directly affects infrastructural
stability of the networks
• Challenge: developing passive measurement techniques. This
is difficult because their scope of applicability is limited and
needs to be extended
Workload (Passive) Measurement –II
• Performed by network monitors: in routers, switches, or standalone devices
• Used primarily for traffic analysis: composition of traffic by application,
packet size distributions, packet inter-arrival times, performance, path length.
Essential for engineering next generation internetworking equipment and
overall infrastructure.
• Country-specific flows (geographic information), distributions of packet sizes,
flow volume, flow duration; NetFlow capability of Cisco routers
• Results are commonly produced in the form of traffic matrices: traffic between
specific source and destination. Essential for investment decisions
• Per-packet and per-byte switching statistics are essential for optimizing
hardware and software architecture in switching equipment
• It is important to see whether the traffic is composed of friendly flows or not.
E.g.: streaming multimedia traffic
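As an illustration of the traffic matrices mentioned above, the following sketch aggregates passively observed packet records into per-(source, destination) packet and byte counts. The record layout and the network labels are assumptions made for the example; they are not the NetFlow or OC12mon formats.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class Packet(NamedTuple):
    src: str   # source network or country label (hypothetical)
    dst: str   # destination network or country label (hypothetical)
    size: int  # packet size in bytes


def traffic_matrix(packets: Iterable[Packet]) -> Dict[Tuple[str, str], Dict[str, int]]:
    """Count packets and bytes for every (source, destination) pair."""
    matrix = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for p in packets:
        cell = matrix[(p.src, p.dst)]
        cell["packets"] += 1
        cell["bytes"] += p.size
    return dict(matrix)


# A hypothetical four-packet trace.
trace = [
    Packet("US", "NZ", 1500),
    Packet("US", "NZ", 1500),
    Packet("NZ", "US", 40),
    Packet("US", "CH", 576),
]
for (src, dst), cell in sorted(traffic_matrix(trace).items()):
    print(f"{src} -> {dst}: {cell['packets']} packets, {cell['bytes']} bytes")
```

Keeping packet and byte counts side by side matters because the two can rank the same pairs very differently, which is why the slides list per-packet and per-byte statistics separately.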
Workload (Passive) Measurement - III
• Other applications of passive monitoring: optimizing web caches and
proxies; security monitoring; monitoring effectiveness of congestion
control; impact of new technologies and protocols such as multicast or
IPv6, etc.
• Current disparity in transmission and measurement technology: OC-192
routers and switches vs. no commodity measurement solution even at OC-3
(late 1999).
• Focus for near future: support for monitoring at least OC12-OC48 links
(this work is being done by Internet2 consortium), different interface
types, and encapsulation/framing; performance testing of monitors
(whether they keep up with the load); enhanced configuration of what to
collect; improving security and manageability
• OC12mon at iMCI and vBNS
– vBNS is currently being upgraded to OC-48 trunks
– from hundreds of thousands to about a million simultaneous flows
Performance (Active) Measurement
• Most popular for benchmarking end-to-end performance of
commercial service providers (Transit, Access, Content hosting, and
Caching), analyzing traffic behavior across specific paths, monitoring
fulfillment of Service Level Agreements (SLA), diagnosing network
problems
• Commonly measured parameters: delay, packet loss, flow capacity
(throughput), availability
• However, there is no standard metric or measurement methodology
that would allow consistent comparison and calibration
• Unfortunately, active measurement often involves a large number of
parameters that are difficult, if not impossible, to model independently
• “We lack in most cases the ability even to measure traffic at a
granularity that would enable infrastructure-level research”
Performance (Active) Measurement - II
• Proliferation of uncoordinated active measurement initiatives has led
to counterproductive actions, such as ISPs turning off ICMP traffic at
select routers to limit the visibility (and vulnerability) of their
infrastructure
• Challenge: Active Measurements are effective but invasive
• Focus for near future:
– tools that identify critical pieces of the public infrastructure
– tools that find particular periodic cycles or frequency components in
performance data
– developing a calculus for describing and drawing the difference between
two given `snapshots' of network performance
– finding the topological `center' of the net, techniques for real-time
visualization of routing dynamics
– correlation with passive measurements
• See http://www.caida.org/TOOLS/taxonomy/ for available tools
ICMP
• Internet Control Message Protocol, RFC 792
• Integral part of IP
• RFC 792: ICMP messages are sent in several situations: for example,
when a datagram cannot reach its destination, when the gateway does
not have the buffering capacity to forward a datagram, and when the
gateway can direct the host to send traffic on a shorter route
– The purpose of these control messages is to provide feedback about
problems in the communication environment, not to make IP reliable (IP is
not designed to be absolutely reliable).
– ICMP messages typically report errors in processing of datagrams (No
ICMP messages are sent about ICMP messages).
ICMP (continued)
• ICMP message types:
– Destination Unreachable (distance = ∞)
– Time Exceeded (TTL expired)
– Parameter Problem (incorrect values)
– Source Quench (buffer is full)
– Redirect (shorter path found)
– Echo or Echo Reply (by ID or sequence number)
– Timestamp or Timestamp Reply (Originate, Receive, Transmit)
– Information Request or Information Reply (to find out network
number)
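To make the message list concrete, the sketch below pairs each message with its type number from RFC 792 and builds the bytes of an Echo message (type 8, code 0, checksum, identifier, sequence number, data). The function names are illustrative, and the block only constructs the message; actually sending it would need a raw socket and the privileges that go with it.

```python
import struct

# ICMP type numbers as assigned in RFC 792.
ICMP_TYPES = {
    0: "Echo Reply",
    3: "Destination Unreachable",
    4: "Source Quench",
    5: "Redirect",
    8: "Echo",
    11: "Time Exceeded",
    12: "Parameter Problem",
    13: "Timestamp",
    14: "Timestamp Reply",
    15: "Information Request",
    16: "Information Reply",
}


def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo(ident: int, seq: int, payload: bytes = b"") -> bytes:
    """Build an ICMP Echo message with a valid checksum."""
    # The checksum is computed with the checksum field set to zero.
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, checksum, ident, seq) + payload


message = build_echo(ident=0x1234, seq=1, payload=b"probe")
print(ICMP_TYPES[message[0]], message.hex())
```

A receiver can validate the message by running the same checksum over all of its bytes: a correct message yields zero.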
Routing (Dynamics) Measurements
• The reliability and robustness of the Internet highly depend on
efficient, stable routing among provider networks
• Analysis of routing behavior has direct implications for the next
generation of networking hardware, software and operational policies
• Analysis of routing data (BGP, the Border Gateway Protocol) shows
actual current traffic paths, but it is difficult to do measurements
exhaustive enough to generalize across providers
• Routing dynamics gives the following insights:
– effects of outages on surrounding ISPs
– effects of topology changes on Internet performance
– unintended consequences of new routing policies
– potential to improve ability to respond to congestion and topology changes
– infrastructural vulnerabilities caused by critical paths
Routing (Dynamics) Measurements - II
• A very important area of work is identification of optimal routes given
performance results
• Other high-priority areas:
– assessing utilization of the IP address space
– extent of asymmetric routing and route instability as a function of service
provider and over time
– distribution of traffic by network address prefix lengths
– efficiency of usage of BGP routing table space, e.g., via aggregation
– favoritism of traffic flow and routing toward a small proportion of the
possible addresses/entities
– degree of incongruity between unicast and multicast routing
– quantifying effects on connectivity after removing specific ASes
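Two of the items above, the distribution of prefix lengths and the efficiency of routing table space via aggregation, can be sketched with Python's standard `ipaddress` module. The prefixes below are invented for the example (private and documentation ranges); a real analysis would read them from a BGP table dump.

```python
import ipaddress
from collections import Counter

# Hypothetical routing-table entries.
prefixes = [
    ipaddress.ip_network(p)
    for p in [
        "10.0.0.0/24",
        "10.0.1.0/24",
        "10.0.2.0/23",
        "192.0.2.0/25",
        "192.0.2.128/25",
        "198.51.100.0/24",
    ]
]

# Distribution of entries by prefix length.
length_histogram = Counter(net.prefixlen for net in prefixes)

# Merge adjacent and overlapping prefixes into the fewest covering prefixes.
aggregated = list(ipaddress.collapse_addresses(prefixes))

print("entries per prefix length:", dict(sorted(length_histogram.items())))
print("after aggregation:", [str(net) for net in aggregated])
print(f"table entries: {len(prefixes)} -> {len(aggregated)}")
```

This only shows what is arithmetically aggregatable; in a real routing table, prefixes with different next hops or policies cannot be merged even when they are adjacent.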
Metrics
• Utilization
• Availability
• Delay (one-way vs. round-trip)
• Packet Loss (one-way vs. round-trip)
• Throughput
• Routing stability
• No standards and well-understood methodologies
developed yet: results can be hard to interpret or
impossible to compare between implementations
Traffic Analysis
• Collected traffic data is of little use without strong ability to analyze
that data and predict network behavior
• Simulation and Modeling give essential insights
• However, there is currently little consensus on how to accomplish IP
traffic modeling: telephony models (developed at Bell Labs and
elsewhere) rely on queuing theory and other techniques that do not
readily transfer to the packet-switched Internet. In particular, Erlang
distributions, Poisson arrivals, and other means for predicting call-blocking
probabilities and other vital telephony service characteristics
typically don’t apply to wide area internetworking technologies
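For readers unfamiliar with the telephony models referred to above, the classic example is the Erlang B formula, which gives the probability that a call is blocked when a given offered load (in erlangs) is carried on a fixed number of circuits, assuming Poisson call arrivals and no queuing of blocked calls. The sketch below uses the standard recurrence B(E, 0) = 1, B(E, m) = E·B(E, m−1) / (m + E·B(E, m−1)); the function and parameter names are illustrative.

```python
def erlang_b(offered_load: float, circuits: int) -> float:
    """Call-blocking probability for `offered_load` erlangs on `circuits` trunks."""
    blocking = 1.0  # B(E, 0) = 1: with no circuits every call is blocked
    for m in range(1, circuits + 1):
        blocking = offered_load * blocking / (m + offered_load * blocking)
    return blocking


# Blocking probability for a hypothetical load of 10 erlangs.
for circuits in (10, 15, 20):
    print(f"{circuits} circuits: blocking = {erlang_b(10.0, circuits):.4f}")
```

The formula depends on the Poisson arrival assumption, which is exactly what the slide says does not carry over to wide area packet traffic.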
Projects: Coral/OC12mon
• Coral/OC12mon, a passive measurement architecture
deployed on iMCI and vBNS backbones
• Flow-based traffic characterization: flow size by protocol,
percentage composition of traffic by protocol and
application, distributions of flow sizes, length of packet
trains, statistics on IP fragmentation, prefix length
distribution, and address space utilization
• Matrices of traffic flow by country or AS, traffic import
and export, routing/address space coverage
• Non-flow-based analysis: interarrival time behavior,
protocol-relevant (TCP retransmissions, packet size
distributions), security applications
Projects: NIMI
• Goal is to develop NIMI infrastructure for a very large (global) network that
would comprehensively and consistently:
– diagnose performance problems
– measure properties of a wide range of network paths for research purposes
– provide systematized assessment of ISP performance thus spurring ISPs to optimize
their networks
– facilitate public access to Internet measurements
– scale to the global Internet
• Based on Vern Paxson’s original NPDs, where a collection of measurement
probes cooperatively and actively measures the properties of Internet paths and
clouds by exchanging traffic among themselves, with emphasis on scalability
• NIMI is targeted as the fundamental measurement platform, with other
measurement infrastructures to be built on top of it
NIMI - II
• Design goals:
– Work in administratively diverse environment
– Work in commercial Internet
– Support a wide range of measurements
– Conduct active measurements rather than passive (because of commercial Internet)
– Scale to thousands of measurement platforms (minimizing measurement and
control traffic)
– Give platform owners full administrative and policy control over their platforms…
– …but make it easy for platform owners to delegate control (and not exercise it)
– …and provide fine-grained control when needed
– Build in solid security and authentication from the beginning (system design
integrity suffers when security mechanisms are added late in the design process)
– Require minimal administration, maximal self-configuration (scalability)
NIMI - III
• NIMI Architecture goals:
– Measurement requests
– Credential-based authentication (public-key cryptography)
– Policy based on ACLs (access control lists, representing NIMI platform’s measurement and
control policies) and credentials
– Security and Privacy (public-key cryptography)
– Delegating trust (hierarchical with subtables)
– Autoconfiguration
• NIMI Architecture conceptually consists of NIMI platforms (perform measurements and record
results) and different external components that analyze the measurements and control the platforms
• Each NIMI platform runs a measurement server whose job is to:
– authenticate measurement requests as they arrive
– check requests against platform’s policy table
– queue them for future execution
– execute them at the appropriate time
– bundle and ship the results of the requests to whatever destination the request specified
– delete results when instructed to
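The six duties of the measurement server listed above can be sketched as a small request pipeline. This is a toy model, not NIMI code: every class, field and tool name is hypothetical, and authentication is reduced to a lookup in a set of known credentials standing in for the public-key checks the slides describe.

```python
import heapq
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass(order=True)
class Request:
    run_at: float                           # when to execute (only field compared)
    credential: str = field(compare=False)  # who is asking
    tool: str = field(compare=False)        # which measurement module to run
    destination: str = field(compare=False) # where results should be shipped


class MeasurementServer:
    def __init__(self, known_credentials: Set[str],
                 policy: Dict[str, Set[str]],
                 tools: Dict[str, Callable[[], str]]) -> None:
        self.known_credentials = known_credentials
        self.policy = policy    # ACL: credential -> tools it may request
        self.tools = tools      # pluggable measurement modules
        self.queue: List[Request] = []
        self.results: Dict[str, List[str]] = {}

    def submit(self, req: Request) -> bool:
        # 1. authenticate (stand-in for a public-key credential check)
        if req.credential not in self.known_credentials:
            return False
        # 2. check the request against the platform's policy table
        if req.tool not in self.policy.get(req.credential, set()):
            return False
        # 3. queue for future execution, ordered by run time
        heapq.heappush(self.queue, req)
        return True

    def run_due(self, now: float) -> None:
        # 4. execute every request whose time has come
        while self.queue and self.queue[0].run_at <= now:
            req = heapq.heappop(self.queue)
            output = self.tools[req.tool]()
            # 5. bundle results per destination, ready to be shipped
            self.results.setdefault(req.destination, []).append(output)

    def delete_results(self, destination: str) -> None:
        # 6. delete results when instructed to
        self.results.pop(destination, None)


server = MeasurementServer(
    known_credentials={"alice"},
    policy={"alice": {"traceroute"}},
    tools={"traceroute": lambda: "3 hops"},  # dummy module for illustration
)
print(server.submit(Request(10.0, "alice", "traceroute", "dac-1")))
server.run_due(now=10.0)
print(server.results)
```

Keeping the tools in a dictionary of callables means the server itself knows nothing about any particular measurement and only decides who may run what, and when.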
NIMI - IV
• Internally, the NIMI probe is divided into two distinct daemons:
– nimid is responsible for communication with the outside world and performing
access control checks
– scheduled does the actual measurement scheduling, execution and result packaging
• External elements:
– CPOC (Configuration Point of Contact), which serves to configure and administer a
set of NIMI probes within the CPOC’s sphere, in particular:
• CPOC provides the initial policies for each distinct NIMI probe, and, over time, provides
updates to these policies
• When needed, CPOC acts as a repository for NIMI public keys and measurement modules
– MC (Measurement Client), which end users use to access the infrastructure. MC
communicates directly with the NIMI probes involved in the measurement (CPOC
is not involved in the processing of individual measurement requests)
– DAC (Data Analysis Client) acts as repository and post-processor of the data
returned by NIMI probe(s) upon completion of a measurement
NIMI - V
• NIMI is modular: it has no knowledge of particular measurement tools,
so the tools are standalone plug-in modules produced by third parties
• Currently (year 2000) the following measurement modules have been
deployed: traceroute, mtrace, treno, cap/capd, zing, mflect,
traffic/discardd, ftp
• Two major problems are currently being solved by the NIMI team: how to
update the software on the measurement platforms securely, and how to
constrain the resources consumed by different measurements
• Measurements of standardized performance metrics (RFCs by IETF
IPPM WG)
• Other research groups that develop probe platforms for smaller groups
of sites: IPMA (Merit Network), Surveyor (Advanced.org), Felix
(Telcordia)
Projects: Surveyor
• Surveyor, a measurement infrastructure that measures end-to-end unidirectional delay,
packet loss, and route information along Internet paths
• Deployed in Abilene (Internet2 network) at about 60 higher education and research sites
throughout the world, measures over 1500 paths among these sites (almost full mesh),
including transatlantic and transpacific paths
• Features:
– Techniques for scalable and accurate measurements, tools for analysis, architecture
for long-term storage and data access
– Stress the importance of one-way measurements as opposed to traditional round-trip
measurements
• Goal: to create architecture for consistent Internet measurement to promote accurate
common understanding of performance and reliability of the Internet paths
• Measures one-way delay (RFC 2679) and one-way loss (RFC 2680) over long periods of
time, according to metrics specification developed by IETF IPPM workgroup (see RFC
2330, 2678); routing information (modified V.Jacobson’s traceroute)
Surveyor - II
• Emphasis is on unidirectional properties:
– many Internet paths are asymmetric (sequence of routers in forward and
reverse directions differ). In presence of asymmetric paths, traditional
round-trip measurements (e.g., “ping” for latency) measure the
performance of two different paths together
– even if the path is symmetric, load (and therefore performance) may be
radically different in the two directions. Examples: transatlantic and
transpacific paths; as a particular example, traffic from USA to New
Zealand is roughly 4 times higher than in reverse direction. Web caches in
New Zealand take advantage of this asymmetry
– clock synchronization is necessary for one-way measurements, therefore
global positioning system (GPS) hardware is used (precision of
synchronization is better than 1 millisecond; in practice, 2 microseconds
on average)
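A short worked example shows why round-trip figures mislead on asymmetric paths and why clock synchronization matters for one-way figures. The delays are hypothetical values chosen for illustration, and the function names are not from Surveyor.

```python
def half_rtt_error_ms(forward_ms: float, reverse_ms: float) -> float:
    """Error made when the forward one-way delay is estimated as RTT / 2."""
    rtt = forward_ms + reverse_ms
    return rtt / 2 - forward_ms


def measured_one_way_ms(true_delay_ms: float, clock_offset_ms: float) -> float:
    """One-way delay as seen by a receiver whose clock is ahead of the sender's
    by `clock_offset_ms`: the offset adds directly to the measurement."""
    return true_delay_ms + clock_offset_ms


# Hypothetical asymmetric path: 120 ms forward, 40 ms reverse.
print(half_rtt_error_ms(120.0, 40.0))     # RTT/2 underestimates the forward delay
# A 2-microsecond clock offset barely changes a 120 ms measurement.
print(measured_one_way_ms(120.0, 0.002))
```

The first function shows the price of inferring direction from a round trip; the second shows that any clock offset appears one-for-one in the result, which is why synchronization far below the delays being measured is needed.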
Surveyor - III
• Dedicated measurement hardware:
– to ensure that each machine is uniform and runs with a controlled load
(unlike general-purpose multi-user workstations, prone to noise in
measurements)
– special hardware to synchronize clocks, which is easier to install and
maintain on a dedicated computer
– to provide a high level of security (to ensure measurements and the
measurement instrument are not compromised and are not sources of
attacks)
• Continuous measurement to accurately record traffic fluctuations
• Long-term performance data for provisioning, capacity planning and
overall engineering of networks and network research
• Real-time access to performance data – for real-time troubleshooting
Surveyor - IV
• Measurement methodology:
– Delay and loss are measured using the same stream of active test traffic. A Poisson
process on the sending machine schedules test packets (λ = 2, average sending rate is 2
packets/s). New Zealand and Swiss sites use λ = 1. 12-byte UDP packets are used
(minimal size, for beginning)
– Fractal (self-similar) nature of Internet traffic: frequent snapshots are desirable.
However, the amount of test traffic should not perturb measurements
– Also, disk space was a limitation: 178,800 measurements a day per path required
initially more than 2 Mbytes of disk space per day plus relational database
overhead
– Delay: receiver subtracts timestamp in received packet from current time
– Loss: a packet does not arrive in 10 seconds (sequence numbers, Poisson process)
– Route information:
• Modified traceroute, with 10 (instead of 3) ICMP probes in case of failure, and 1 (instead
of 3) – in case of success; traceroutes are Poisson-generated with period 10 minutes on
average, with a forced traceroute if interval exceeds 10 minutes
Surveyor - V
• Surveyor infrastructure:
– Dedicated measurement machines: Dell desktop PC with 200-400 MHz Pentium
processors, NIC (10base-T, 100base-T, FDDI, OC-3), GPS card (TrueTime; ISA,
PCI; with antenna and GPS daughter board by Trimble), BSDI OS v.3.1. These
machines report to central database
– Database (4-processor Silicon Graphics Origin 200 with FibreChannel 600-Gb
RAID storage array): catalogs all the performance data from dedicated
measurement machines (transferred using ssh)
– Analysis server: performs analysis (generates summary statistics for each path;
produces 24-hour plots) and posts results on the Web (3 daily plots for each path:
delay summary, loss summary, histogram of delay values)
• Current work:
– wider Abilene deployment: test packets with DiffServ byte set to test QoS-enhanced
paths, deployment inside Qbone testbed, (IPv6 and multicast paths?)
– SNMP alerts about “interesting” paths, more near-real-time access, more analysis
enhancements
Other Measurement Efforts
• IPMA project at Merit Network, Inc.
– routing protocol collectors to understand dynamic routing behavior in the
Internet
• PingER project at Stanford
– complex measurements at monitoring sites throughout the high-energy
physics research community
• WAND project in New Zealand
– passive (!) unidirectional measurements using GPS
• RIPE Test-Traffic project in Europe
– IETF IPPM unidirectional metrics, similar to Surveyor
• Felix project at Telcordia
– Prototype monitoring infrastructure to track “health” of large networks, without
requiring prior knowledge of network topology or routing information
– Linear Decomposition Algorithms (LDA) for topology discovery and performance
evaluation of specific network elements
Skitter tool by CAIDA
• Macro-level analysis of the Internet
• Measures forward IP paths from a single source to many destinations
(1998: 23,000) by incrementing the TTL hop by hop, as traceroute
does
• Key goals:
– to identify and track routing behavior, e.g. providing indications of low-frequency
persistent routing changes
– to assist in dynamic discovery of network connectivity through probing
paths to destinations spread throughout IPv4 address space
• A secondary goal is to collect RTTs for the paths to each of these
destinations for analysis of general trends in Internet performance
Conclusion: Challenges
• Less invasive active measurements
• More effective passive measurements
• Improving impact of measurements – aggregating, mining and
visualizing massive data sets in ways that are useful to many people
• Mapping IP addresses to more useful entities: specific systems, their
geographic location, countries, etc.
• Both top-down and bottom-up momentums are needed
• Internet data analysis is no longer justifiable as an isolated activity; the
Net has grown too large, under the auspices of too many independent,
uncoordinated entities, so a coordinated effort is greatly needed