
IP Network Performance Measurements
Bruce Morgan
AARNet Pty Ltd
Just checking…




Why metrics?
Metrics are important for identifying network
related issues, especially performance issues
Metrics can be diverse
No one metric is suitable for all needs
Types of Measurement

Active Measurement

Injecting measurement data into the
network
 E.g. UDP, TCP, ICMP packets
Passive Measurement

Measuring what is there already
The Problem

Measurement of the network cloud is difficult
– but is essential if we are to gauge user
perception of the internet
The World Wide Wait
Some problems are host based, while others
are network based:





Physical latency
Network queuing and delays
Server processing delay
Timeouts and packet loss
TCP protocol delays
The Dark Cloud





Diverse network paths
Asymmetric paths
Policy routing
Committed Access Rates
Firewalls and filters
IP Performance Metrics


Framework spelt out in RFC 2330 from the
IPPM Working Group
Goal: “to achieve a situation in which users and
providers of Internet transport service have an
accurate common understanding of the
performance and reliability of the Internet
component 'clouds' that they use/provide.”
On the Standards track…




RFC 2678 IPPM Metrics for Measuring
Connectivity
RFC 2679 A One-way Delay Metric for
IPPM.
RFC 2680 A One-way Packet Loss Metric for
IPPM.
RFC 2681 A Round-trip Delay Metric for
IPPM.
A One-way Delay Metric




Type-P-One-way-Delay
The P stands for the packet type (e.g. protocol)
Packets are injected at times drawn from a
Poisson process
Both source and destination require time
synchronisation
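A minimal sketch of how Poisson-spaced send times could be generated, assuming a mean probe rate chosen by the operator. The function name `poisson_send_times` and the 2 probes/s rate are illustrative assumptions, not taken from the RFCs. It relies on the fact that the gaps between events of a Poisson process are exponentially distributed.

```python
import random

def poisson_send_times(rate_per_s, duration_s, seed=None):
    """Return probe send times (seconds from start) for a Poisson process.

    rate_per_s is the mean number of probes per second (an assumed value,
    chosen by the operator). Gaps between probes are exponential.
    """
    rng = random.Random(seed)
    times = []
    t = rng.expovariate(rate_per_s)
    while t < duration_s:
        times.append(t)
        t += rng.expovariate(rate_per_s)
    return times

# Illustrative schedule: on average 2 probes per second for one minute.
schedule = poisson_send_times(rate_per_s=2.0, duration_s=60.0, seed=1)
print(len(schedule), "probes scheduled")
```

Random spacing of this kind avoids synchronising the probes with periodic behaviour in the network, which fixed-interval probing can do.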
A Round-trip Delay
Metric



Many applications do not perform
well with large end to end delays
Ease of deployment compared to
one-way metrics
Ease of interpretation
Ping


Two way path measurement based on RTTs
(round-trip times)
Choice of monitored address
 Host
 Router interface
 Router loopback address
Packet loss on ICMP

Loss Asymmetry



Loss = 1 - ((1 - Loss_fwd) × (1 - Loss_rcv))
Path Asymmetry
Possibility of Internet Service Providers
(ISPs) or sites or even hosts rate limiting
(including complete blocking) ICMP echo and
thus giving rise to invalid packet loss
measurements.
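The round-trip loss formula on this slide can be sketched as a small function. The name `round_trip_loss` and the example loss rates are illustrative assumptions. The formula itself treats losses in the two directions as independent.

```python
def round_trip_loss(loss_fwd, loss_rcv):
    """Combine forward and return loss probabilities into round-trip loss.

    A probe is answered only if it survives both directions, so the
    round-trip loss is 1 minus the product of the two survival rates.
    Assumes the two directions lose packets independently.
    """
    for p in (loss_fwd, loss_rcv):
        if not 0.0 <= p <= 1.0:
            raise ValueError("loss rates must be between 0 and 1")
    return 1.0 - (1.0 - loss_fwd) * (1.0 - loss_rcv)

# Illustrative values: 2% loss outbound, 0.5% loss on the return path.
example = round_trip_loss(0.02, 0.005)
print(f"round-trip loss: {example:.4f}")
```

A ping measurement sees only the combined figure, so it cannot tell which direction is dropping packets.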
PingER


PingER (Ping End-to-end Reporting) is the name given
to the Internet End-to-end Performance
Measurement (IEPM) project to monitor end-to-end performance of Internet links
Uses ICMP RTT for measurement
Surveyor





Dedicated PC running Unix at
key sites
GPS for clock synchronization
One way delay & loss
measurements
Community is Internet2 clients and
HEP sites collaborating with
Surveyor
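A rough sketch of how one-way delay and loss could be derived from timestamped probes, assuming both clocks are synchronised (for example by GPS, as above). The function name, the dictionary layout and the sample timestamps are all hypothetical and are not Surveyor's actual format.

```python
def one_way_stats(sent, received):
    """Compute one-way delays and loss rate for a set of probes.

    sent and received map a probe sequence number to a timestamp in
    seconds, taken at the sender and at the receiver respectively.
    Any clock offset between the two hosts adds directly to every delay.
    """
    delays = [received[seq] - sent[seq] for seq in sent if seq in received]
    lost = [seq for seq in sent if seq not in received]
    loss_rate = len(lost) / len(sent) if sent else 0.0
    return delays, loss_rate

# Made-up sample: probe 2 never arrives.
sent = {1: 10.000, 2: 10.450, 3: 11.020}
received = {1: 10.075, 3: 11.098}
delays, loss = one_way_stats(sent, received)
print([round(d * 1000, 1) for d in delays], "ms; loss", round(loss, 3))
```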
PingER/Surveyor
Comparison


PingER uses the ICMP echo facility (ping) and
thus only makes round trip measurements.
Surveyor uses a GPS system to synchronise
time between sites and makes one way
measurements.
PingER/Surveyor
Comparison


Surveyor requires a dedicated platform (PC)
to be installed at each site that is monitored,
whereas PingER uses an existing host with
no special software installed at the
monitored site.
PingER cheaper!
PingER/Surveyor Comparison
Surveyor is more accurate and better for short
term measurement, especially for sites which
have good connectivity.
PingER is a more lightweight solution: it requires
less management, uses less bandwidth,
requires less storage, and needs nothing
installed at the remotely monitored sites. It is
good for remote sites with poor connectivity.
PingER/Surveyor
Comparison
                Surveyor                 PingER
Method          1 way delay              2 way ping
Hosts           dedicated                selected
Frequency       ~2*2/s                   ~0.01/s
Timing          Poisson <2/s>            bursty (30 min intervals)
Monitors        ~30                      18
Remotes         ~30 (~full mesh)         ~300 (hierarchical)
Pairs           ~900                     ~1200
Storage         ~38 Mbytes / pair / mo   ~0.6 Mbytes / pair / mo
PingER - Surveyor
Complementarity







Agree well
Surveyor has one way measurements, PingER only
round-trip
Surveyor has dedicated platforms & strong central
management
 Experience with PingER shows this has benefits.
PingER is more parsimonious/lightweight (bandwidth,
disk space, CPU)
 but necessarily less accurate, especially at small (hourly)
 time resolution on low loss links.
PingER is good for looking at long term trends &
grouping, where statistics are less of a problem
TCP SYN / ACK tools


In order to truly measure Web
traffic, which is almost entirely
TCP traffic, it is best to probe
using TCP rather than ICMP
The SYN/ACK mechanism proves
useful for this purpose
TCP SYN/ACK tools
3 way handshake
Initiator                     Target
Send SYN seq=x          -->   Receive SYN
Receive SYN+ACK         <--   Send SYN seq=y, ACK x+1
Send ACK y+1            -->   Receive ACK
TCP SYN/ACK


Sends a connection request (SYN) and
measures the time taken by the target to
respond with a SYN+ACK
The connection is promptly cleared by
another exchange of packets, this time
containing the FIN control flag.
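One way to approximate this measurement from user space is to time an ordinary TCP connect, which returns once the SYN+ACK has been received. This is a sketch under that assumption, not the tool used for these slides. The function name `syn_ack_time` is hypothetical, and the demo uses a throwaway loopback listener rather than a real target.

```python
import socket
import time

def syn_ack_time(host, port, timeout=3.0):
    """Return the seconds taken for a TCP connect to complete, or None.

    connect() returns once the target's SYN+ACK arrives, so the elapsed
    time approximates the SYN to SYN+ACK round trip. Pass an IP address
    rather than a name to keep DNS lookup time out of the figure.
    """
    start = time.perf_counter()
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return None  # refused, timed out or unreachable
    elapsed = time.perf_counter() - start
    sock.close()  # a normal close starts the FIN exchange
    return elapsed

if __name__ == "__main__":
    # Demo against a temporary listener on the loopback interface.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    print(syn_ack_time("127.0.0.1", listener.getsockname()[1]))
    listener.close()
```

Unlike a raw SYN probe, this completes the handshake, so the target's application will see a short-lived connection.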
TCP SYN/ACK tools
Metric               Ping           SYN/ACK
Samples              30000          30000
Average              161.6 ms       158.0 ms
Standard Deviation   33.0 ms        11.6 ms
Median               154.4 ms       153.0 ms
Minimum              151 ms         150 ms
Maximum              1222 ms        610 ms
Lost packets         528 (1.76%)    469 (1.56%)
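The rows of this table can be computed from raw samples along the following lines. This is a sketch: the function name `summarise` and the five sample RTTs are made up for illustration, and it is an assumption that the standard deviation above is a sample rather than a population figure.

```python
import statistics

def summarise(rtts_ms, sent):
    """Summarise RTT samples in the same rows as the table above.

    rtts_ms holds the RTTs of answered probes in milliseconds;
    sent is the total number of probes sent, including lost ones.
    """
    lost = sent - len(rtts_ms)
    return {
        "samples": sent,
        "average": statistics.mean(rtts_ms),
        "stdev": statistics.stdev(rtts_ms),  # sample standard deviation
        "median": statistics.median(rtts_ms),
        "minimum": min(rtts_ms),
        "maximum": max(rtts_ms),
        "lost": lost,
        "lost_pct": 100.0 * lost / sent,
    }

# Made-up data: six probes sent, five answered, one slow outlier.
summary = summarise([151.0, 153.0, 154.0, 160.0, 610.0], sent=6)
for name, value in summary.items():
    print(name, value)
```

A single outlier pulls the average and standard deviation up sharply while barely moving the median, which is the pattern visible in the Ping column.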
TCP SYN/ACK tools
Sting



Sting is a TCP-based network measurement tool
that measures end-to-end network path
characteristics. Sting is unique because it can
estimate one-way properties, such as loss rate,
through careful manipulation and observation of
TCP behaviour.
Avoids increasing problems with ICMP-based
network measurement (blocking, spoofing, rate
limiting, etc).
http://www.cs.washington.edu/homes/savage/sting/
Current AARNet Measurements


MRTG
Perf
 ICMP RTT measurements
 ICMP packet loss measurements
WA
 Host/endpoint reachability
TCP HTTP file transfer measurements
Netflow data
MRTG





Uses SNMP interface statistics
Provides multi-functionality from router
temperature to throughput
Visualisation package
Lacks granularity with time
Deployed at each RNO
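Throughput figures of this kind come from differencing SNMP octet counters between two polls. The sketch below shows the arithmetic only and is not MRTG's own code. The function name, the 300 second polling interval and the counter values are illustrative assumptions.

```python
COUNTER32_MODULUS = 2 ** 32  # 32-bit SNMP counters wrap back to zero

def throughput_bps(prev_octets, curr_octets, interval_s,
                   modulus=COUNTER32_MODULUS):
    """Average bits per second between two polls of an octet counter.

    The modulo arithmetic absorbs a single counter wrap between polls.
    More than one wrap in an interval cannot be detected this way.
    """
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    delta = (curr_octets - prev_octets) % modulus
    return delta * 8 / interval_s

# Illustrative polls taken 300 seconds apart.
print(throughput_bps(1_000, 3_751_000, 300))
# Same calculation across a counter wrap.
print(throughput_bps(2 ** 32 - 1_000, 2_000, 300))
```

Because each figure is an average over the whole polling interval, short bursts are smoothed out, which is the time granularity limitation noted above.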
MRTG graphs
WARNO/ International traffic on June 18
WARNO / VRNO traffic on June 18
Perf Tool




Perfd – uses a BSD-based ping for RTT and
packet loss calculation
Perf – web display tool of the data
Deployed at each RNO to measure all points of
the mesh
Used to check SLA compliance with Cable and
Wireless Optus
Perf – LA Cable
21 June 2000 ICMP Loss
Perf – LA Cable
21 June 2000 ICMP RTT
Perf – Optus IA3
21 June 2000
Packet Loss
Perf – Optus IA3
21 June 2000
ICMP RTT
Perf 6 June
Optus international ICMP Loss
Perf 6 June
Optus international ICMP RTT
Perf 6 June
ACTRNO ICMP Loss
Perf 6 June
ACTRNO ICMP RTT
WA




“what’s alive” is based on nocol
Checks reachability of hosts/endpoints
Uses ICMP echo, but could easily be
extended to check service level availability
Frequent check of all hosts
TCP based Measurements



Uses an active HTTP file transfer
Measure at host
Measure from Netflow records
Can detect retransmissions
 These may result from packet loss or out of
 sequence packets in either direction
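A simplified sketch of how repeated or out of sequence segments might be spotted from the sequence numbers seen at the measuring host. The function name and category labels are hypothetical. It ignores sequence number wrap-around and cannot tell a true retransmission from a late, reordered arrival.

```python
def classify_segments(segments):
    """Classify TCP data segments seen in one direction of one connection.

    segments is an iterable of (sequence_number, length) in arrival order.
    in_order:       starts exactly where the previous data ended
    repeat_or_late: starts below the highest byte already seen
    gap:            starts beyond it, so earlier data is missing so far
    """
    counts = {"in_order": 0, "repeat_or_late": 0, "gap": 0}
    highest_end = None
    for seq, length in segments:
        end = seq + length
        if highest_end is None or seq == highest_end:
            counts["in_order"] += 1
        elif seq < highest_end:
            counts["repeat_or_late"] += 1
        else:
            counts["gap"] += 1
        highest_end = end if highest_end is None else max(highest_end, end)
    return counts

# Made-up trace: one segment arrives late and one is sent twice.
trace = [(1000, 500), (1500, 500), (2500, 500), (2000, 500), (2500, 500)]
result = classify_segments(trace)
print(result)
```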

Load balancing impacts



Can use contiguous IP addresses on
monitoring machine to monitor per destination
load balancing
Monitoring machine can determine
performance on link but unable to determine
which link is used.
If a link fails then traffic will divert to other links
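To illustrate why a run of contiguous source addresses helps, the sketch below probes one destination from several neighbouring addresses and maps each pair to a link with a toy hash. The hash is purely a hypothetical stand-in, since real routers use their own algorithm, and the addresses are documentation-range placeholders.

```python
import ipaddress
import zlib

def contiguous_sources(first, count):
    """Return `count` consecutive IP addresses starting at `first`."""
    base = ipaddress.ip_address(first)
    return [base + i for i in range(count)]

def toy_link_choice(src, dst, n_links):
    """Hypothetical per-destination hash: NOT any router's real algorithm.

    It only shows that a fixed (source, destination) pair always maps
    to the same link, while neighbouring sources may map elsewhere.
    """
    return zlib.crc32(src.packed + dst.packed) % n_links

sources = contiguous_sources("192.0.2.16", 8)
dst = ipaddress.ip_address("198.51.100.7")
for src in sources:
    print(src, "-> link", toy_link_choice(src, dst, n_links=2))
```

The monitoring machine sees only the performance for each source address. As noted above, it cannot tell which physical link a given address was mapped to.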
Load Balancing – round robin
Load Balancing – per packet
Load Balancing – 14 May
Flows…


A flow is taken to be either a
bidirectional or unidirectional
communication between a source and
destination host. Packets in a flow
share the same address/port
pairing.
Flow records are generally the biggest
indicator of scan/DoS attacks!
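The difference between unidirectional and bidirectional flows comes down to how the flow key is built. This is a minimal sketch with hypothetical function names and made-up packets, not the format of any particular flow exporter.

```python
from collections import defaultdict

def flow_key(src, sport, dst, dport, proto, bidirectional=False):
    """Build the key that decides which flow a packet belongs to.

    For bidirectional flows the two endpoints are put in a fixed order,
    so packets in both directions produce the same key.
    """
    a, b = (src, sport), (dst, dport)
    if bidirectional and b < a:
        a, b = b, a
    return (a, b, proto)

def group_packets(packets, bidirectional=False):
    """Aggregate (src, sport, dst, dport, proto, size) packets into flows."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for src, sport, dst, dport, proto, size in packets:
        record = flows[flow_key(src, sport, dst, dport, proto, bidirectional)]
        record["packets"] += 1
        record["bytes"] += size
    return dict(flows)

# Made-up packets: a request, a response and an acknowledgement.
packets = [
    ("192.0.2.1", 40000, "198.51.100.5", 80, "tcp", 60),
    ("198.51.100.5", 80, "192.0.2.1", 40000, "tcp", 1500),
    ("192.0.2.1", 40000, "198.51.100.5", 80, "tcp", 52),
]
print(len(group_packets(packets)), "unidirectional flows")
print(len(group_packets(packets, bidirectional=True)), "bidirectional flow")
```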
Netflow Records

We keep detailed flow records
 Timestamps and durations
 Source/destination addresses
 Protocol types
 Cumulative IP flags
 ICMP control types

Netflow Records



Useful for determining metric targets, e.g. the top
100 WWW hosts
Can derive useful measurements from the
netflow data itself
Be wary of derived throughput: flows can
last a long time.
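A sketch of both uses, ranking destinations by volume and deriving throughput from a record. The `FlowRecord` layout is hypothetical, and the `octets` field in particular is an assumption, since a byte count is not among the fields listed on the earlier slide. The sample records are made up.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class FlowRecord:
    start: float     # flow start time, seconds
    duration: float  # flow duration, seconds
    src: str
    dst: str
    protocol: str
    octets: int      # assumed field: bytes carried by the flow

def top_destinations(records, n=100):
    """Return the n destinations that received the most octets."""
    totals = Counter()
    for rec in records:
        totals[rec.dst] += rec.octets
    return totals.most_common(n)

def mean_throughput_bps(rec):
    """Average bits per second over the whole life of the flow."""
    if rec.duration <= 0:
        return None
    return rec.octets * 8 / rec.duration

records = [
    FlowRecord(0.0, 2.0, "192.0.2.1", "198.51.100.5", "tcp", 500_000),
    FlowRecord(5.0, 600.0, "192.0.2.2", "198.51.100.9", "tcp", 1_500_000),
    FlowRecord(9.0, 1.0, "192.0.2.3", "198.51.100.5", "tcp", 250_000),
]
print(top_destinations(records, n=2))
print([mean_throughput_bps(r) for r in records])
```

The second record carries the most data but shows the lowest derived throughput, because its octets are averaged over ten minutes. A flow that sat idle for most of that time would look slow even if its actual transfers were fast.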
What are the choices?



Various tools and methods are available
No one tool is good for everything
Combinations of tools, both passive and
active, lead to interesting and more detailed
analysis
AARNet futures…





Deployment of measurement machines
Monitoring and measuring ICMP, TCP and
UDP
Monitoring QOS
Deploying one-way and round-trip metrics
To ensure the network does what it's supposed
to do…