Testing Bandwidth Around the World

Testing Bandwidth Availability and Achievability around the World (IEPM-BW)
An Internet End-to-end Performance Monitoring (IEPM) System
History
• For SC2001, SLAC put together a package to test the achievable bandwidth from Denver to collaborator sites around the world.
• Development continued, and today there are 8 monitoring sites:
– SC2003 showroom
– Internet2
– SLAC
– University of Manchester, UK
– FNAL
– INFN, Italy
– Georgia Tech
– NIKHEF, Amsterdam
Methodology
• On a regular basis tests are run to measure available and achievable bandwidth using:
– Available Bandwidth Estimation (ABwE)
– iperf (multiple and single streams)
– BBFTP (the BaBar file transfer protocol)
– Ping
– Traceroutes and reverse traceroutes
• The results are analyzed and presented in several different ways
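To make this concrete, here is a minimal sketch of how such periodic tests might be driven. The host list, log format, and 90-minute cycle are illustrative assumptions rather than the IEPM-BW implementation, and it assumes the standard ping, traceroute, and iperf command-line tools are installed.

```python
import subprocess
import time

# Hypothetical list of remote measurement hosts.
REMOTE_HOSTS = ["node1.example.edu", "node2.example.org"]

def run(cmd, timeout=300):
    """Run one measurement command and capture its text output."""
    try:
        return subprocess.run(cmd, capture_output=True, text=True,
                              timeout=timeout).stdout
    except subprocess.TimeoutExpired:
        return ""  # a hung test should not stall the whole cycle

def measure(host):
    """Collect one round of ping, traceroute, and iperf results for a host."""
    return {
        "ping": run(["ping", "-c", "10", host]),
        "traceroute": run(["traceroute", host]),
        # Achievable bandwidth: single stream (-P 1) and multiple streams (-P 4).
        "iperf1": run(["iperf", "-c", host, "-t", "10", "-P", "1"]),
        "iperf": run(["iperf", "-c", host, "-t", "10", "-P", "4"]),
    }

while True:
    for host in REMOTE_HOSTS:
        with open(f"{host}.log", "a") as f:  # raw output kept for later analysis
            f.write(repr(measure(host)) + "\n")
    time.sleep(90 * 60)  # one full cycle every 90 minutes (illustrative)
```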
Time Series Graphs
Time series plots show variations over time
and allow for the visualization of
correlations
Note that when the ping times rise, the ABwE, single-stream (iperf1), multiple-stream (iperf), and BBFTP results are lower.
Another way to look at the data is
via Diurnal Graphs (BBFTP example)
• The values are sorted into weekday and weekend periods and grouped by time of day.
• The green points are weekday values and the green line is a fit to the weekday values. There is clearly a regular diurnal variation here also.
• The red points are weekend values and the red line is a fit to them.
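As a minimal sketch of the grouping behind such a plot (assuming a hypothetical CSV of timestamped BBFTP throughput samples with columns timestamp and mbps, and the pandas library):

```python
import pandas as pd

# Hypothetical input: timestamped BBFTP throughput samples in Mbits/s.
df = pd.read_csv("bbftp_throughput.csv", parse_dates=["timestamp"])

df["hour"] = df["timestamp"].dt.hour
df["weekend"] = df["timestamp"].dt.dayofweek >= 5  # Saturday and Sunday

# Group by time of day, separately for weekday and weekend periods.
diurnal = df.groupby(["weekend", "hour"])["mbps"].mean().unstack(level=0)
diurnal.columns = ["weekday", "weekend"]
print(diurnal)  # 24 rows: mean throughput for each hour of the day
```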
Scatterplots provide another perspective on
the data
• For node1.lbl.gov, the multiple-stream (iperf) and single-stream (iperf1) bandwidths are about the same.
• For node1.mcs.anl.gov the single-stream (iperf1) bandwidth is much less than the multiple-stream (iperf) bandwidth. This indicates that for the path to node1.mcs.anl.gov there is some per-stream bandwidth limiting.
Histograms
Histograms can be used to display the distribution of the bandwidth measurements.
[Histograms: frequency of available bandwidth (ABwE) and of achievable bandwidth (BBFTP)]
These histograms for BBFTP and ABwE have similar distributions, but the peak of the distribution for BBFTP (achievable bandwidth) is around 63 Mbits/s, while the ABwE available bandwidth peak is around 70 Mbits/s.
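A minimal plotting sketch for such histograms, assuming a hypothetical CSV with one column per measurement type:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical columns: abwe_mbps (available) and bbftp_mbps (achievable).
df = pd.read_csv("bandwidth_measurements.csv")

fig, (ax1, ax2) = plt.subplots(1, 2, sharey=True)
ax1.hist(df["abwe_mbps"], bins=40)
ax1.set(title="Available Bandwidth (ABwE)", xlabel="Mbits/s", ylabel="Frequency")
ax2.hist(df["bbftp_mbps"], bins=40)
ax2.set(title="Achievable Bandwidth (BBFTP)", xlabel="Mbits/s")
plt.show()
```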
Traceroutes are also done
A traceroute shows the path through the network from one node to another.
• Forward and reverse traceroutes are run about every 15 minutes.
• Each differing traceroute for a given path is given a unique route number (a sketch of this bookkeeping follows the summary table below).
• Several different displays are available:
Traceroute Summary Table
[Screenshot: selectors for the time range and node(s), one column per HOUR. Colored boxes are unique route numbers; "." indicates no change from the previous traceroute.]
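A minimal sketch of the route-numbering bookkeeping (illustrative, not the IEPM-BW code): each distinct hop sequence seen for a source/destination pair is assigned the next unused number.

```python
route_numbers = {}  # (src, dst, hop_tuple) -> route number
next_number = {}    # (src, dst) -> next unused number

def route_number(src, dst, hops):
    """Return a stable route number for this traceroute's hop sequence."""
    key = (src, dst, tuple(hops))
    if key not in route_numbers:
        n = next_number.get((src, dst), 1)
        route_numbers[key] = n
        next_number[(src, dst)] = n + 1
    return route_numbers[key]

# Example: the second, differing route gets a new number.
print(route_number("slac", "umich", ["r1", "r2", "r3"]))  # 1
print(route_number("slac", "umich", ["r1", "r4", "r3"]))  # 2
print(route_number("slac", "umich", ["r1", "r2", "r3"]))  # 1 again
```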
And a map of the routes is shown: see next page.
Graphical Traceroute
[Map: routes from SLAC to node1.lsa.umich.edu, one routed via Abilene and one via CalREN-2. node1.lsa.umich.edu and hour 03:00 were selected on the traceroute summary page.]
Long Term Trends
• The bandwidth test data,
traceroutes, and their analyses are
saved for future reference.
• Made available currently by:
– IEPM-BW User Interface
– MonALISA
– Web Services
– PingER User Interface
More Information
• IEPM-BW Home
• IEPM-BW at SLAC
The PingER Project
An Internet End-to-end Performance
Monitoring (IEPM) System
History of the PingER
Project
• Early 1990’s: SLAC begins pinging nodes
around the world to evaluate the quality of
Internet connectivity between SLAC and
other HEP Institutions.
• Around 1996: The PingER project was funded, making it the first IEPM tool available to the HEP community.
• Today: Believed to be the largest Internet end-to-end performance monitoring tool in the world.
PingER Today
• Today, the PingER Project includes 35
Monitoring-hosts in 12 countries. They
are monitoring Remote-hosts in 80
countries.
• That covers 75% of the world's population and 99% of the Internet-connected population!
PingER Architecture
There are three types of hosts:
• Remote-hosts: hosts being monitored
• Monitoring-hosts: run the PingER software to ping the Remote-hosts
• Archive/Analysis-hosts: gather data from the Monitoring-sites
[Diagram, built up over three slides: several Monitoring-hosts each ping many Remote-hosts; the Archive hosts collect the data from the Monitoring-hosts.]
Methodology
• 11 100-byte pings are sent to each Remote-host every 30 minutes.
– The first ping is used to prime the name server cache and is then discarded.
• Depending on the quality of the host's connectivity, PingER may also send 10 1000-byte pings to each Remote-host.
Methodology
• Round trip times, losses, and out-of-order and duplicate packets from the pings are recorded locally at the Monitoring-hosts.
• Data is gathered from the Monitoring-hosts on a daily basis by the Archive/Analysis-hosts at SLAC and Fermilab.
• The Archive/Analysis-hosts also provide web-based presentation and interactive analysis tools.
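A minimal sketch of one such probe (illustrative, not the PingER code; it assumes a Unix ping whose output reports time= values, and it simply drops the first reply):

```python
import re
import subprocess

def pinger_probe(host, count=11, size=100):
    """Send `count` pings of `size` data bytes; return (RTTs in ms, loss).
    The first ping only primes the name-server cache, so it is discarded."""
    out = subprocess.run(["ping", "-c", str(count), "-s", str(size), host],
                         capture_output=True, text=True).stdout
    rtts = [float(m) for m in re.findall(r"time=([\d.]+)", out)][1:]
    loss = 1.0 - len(rtts) / (count - 1)
    return rtts, loss

rtts, loss = pinger_probe("www.example.org")
print(f"avg RTT {sum(rtts) / len(rtts):.1f} ms, loss {loss:.0%}")
```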
Uses and Examples
• PingER can be used to chronologically track network infrastructure changes.
– Pinpoints network upgrades
– Illustrates the effects of the upgrades
– Displays a reduction in congestion as drops in average packet loss
Uses and Examples
• PingER can be used to identify the need to upgrade a network.
– Heavy packet loss (10% - 30%) was reported on a network in early 2002.
– Bandwidth was increased from 128 Kbps to 512 Kbps in May 2002 to reduce the loss.
– The May 2002 upgrade reduced packet loss to roughly 0.1%.
Other Uses
Troubleshooting:
• Discerning whether a reported problem is network related
• Identifying the time a problem started
• Providing quantitative analysis for ISPs
• Identifying step functions and periodic network behavior, and recognizing problems affecting multiple sites
In Summary
PingER provides ongoing support
for monitoring and maintaining
the quality of Internet
connectivity for the worldwide scientific community.
Future
• Plans include maintaining,
upgrading and expanding the
PingER deployment.
• The goal is to have the data,
analysis results and reports
made available to all interested
users via the data selection,
analysis, and display tools.
For More Information
• What is PingER?
• The PingER Tools
• Guided Tour of the IEPM Project
What is the composition of
the Traffic through your
Internet Connection?
IPFIX
(Internet Protocol Flow Information
eXport)
can help you find out
What is IPFIX?
• Routers and switches can generate records that describe the traffic passing through them
• At SLAC we currently use Cisco NetFlow
• Records contain:
– source and destination node IP addresses
– protocol and application information
– number of bytes and packets
Data Analysis Categories
• There are many ways to analyze
the data. We currently do it by
Protocol, Application, and
Program
• Four metrics to consider include:
– Number of bytes
– Number of packets
– Number of flows
– Number of records
• They all present different pictures of the data
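A minimal sketch of computing all four metrics from flow records (assuming a hypothetical CSV export with columns src, dst, protocol, bytes, packets, where several records may belong to one flow):

```python
import csv
from collections import Counter

bytes_by_proto, packets_by_proto = Counter(), Counter()
flows_by_proto, records_by_proto = Counter(), Counter()
seen_flows = set()

with open("flows.csv") as f:
    for rec in csv.DictReader(f):
        proto = rec["protocol"]
        records_by_proto[proto] += 1
        bytes_by_proto[proto] += int(rec["bytes"])
        packets_by_proto[proto] += int(rec["packets"])
        # Count each unique (src, dst, protocol) tuple once as a flow.
        key = (rec["src"], rec["dst"], proto)
        if key not in seen_flows:
            seen_flows.add(key)
            flows_by_proto[proto] += 1

for name, counter in [("bytes", bytes_by_proto), ("packets", packets_by_proto),
                      ("flows", flows_by_proto), ("records", records_by_proto)]:
    print(name, counter.most_common(3))  # each metric paints a different picture
```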
IPFIX can also tell you:
• Which nodes the traffic is coming
from
• Which nodes the traffic is going to
• How the traffic varies by time of
day and day of the week
Example: Analysis of a
day’s data by Protocol and
Application
These “Application
Buckets” were defined
by the DOE/MICS
committee in 1st quarter
2003
Graphical displays:
• Aid in visualization
• Help to readily identify out-of-the-ordinary traffic
The next 2 slides are an example
of this at work.
Example: ICMP attack
Negative values represent Outgoing traffic
Positive values represent Incoming traffic
ICMP attacks often are not heavy in byte or packet counts. However, they result in a lot of records and flows, as seen in the bottom two graphs.
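A minimal sketch of that signature (the thresholds and input layout are illustrative assumptions, not the SLAC tooling): flag periods where ICMP flow counts spike while bytes per flow stay small.

```python
def looks_like_icmp_scan(hourly):
    """hourly: list of dicts with 'icmp_flows' and 'icmp_bytes' per hour."""
    counts = sorted(h["icmp_flows"] for h in hourly)
    baseline = counts[len(counts) // 2]  # median as a rough baseline
    alerts = []
    for h in hourly:
        many_flows = h["icmp_flows"] > 10 * max(baseline, 1)
        small_flows = h["icmp_bytes"] / max(h["icmp_flows"], 1) < 200
        if many_flows and small_flows:   # lots of tiny flows: scan-like
            alerts.append(h)
    return alerts
```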
Breakdown by SLAC Research
Programs
Negative values represent Outgoing traffic
Positive values represent Incoming traffic
Note the unusual outgoing ICMP pattern in the protocol record graph on the left and the program traffic in the right-hand graph. Together these indicate which program was the source of the outgoing ICMP attack.
Longer Term Graphs
present overall picture
The exploits starting around August 18, 2003
were ICMP scan attacks. Note massive
incoming ICMP flows, but no equivalent
outgoing flows.
The break in the data in September 2003 is due to a power outage at SLAC.
Application Analysis
Graphs
Most of the traffic, as measured by bytes and packets, is bulk ("GRID") data transfer.
A large part of the traffic, as measured by flows and records, is not bulk transfer but WWW and services.
Additional Data Mining
• How much traffic is attributable to which collaborators
[Charts: traffic by top-level domain and by top-two-level domain; each bubble = 1 terabyte/day. Note the non-linear scale.]
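A minimal sketch of that attribution step (illustrative; the flow data would come from the NetFlow records described earlier): reverse-resolve each remote address and sum the bytes per top-level domain.

```python
import socket
from collections import Counter

def bytes_by_tld(flows):
    """flows: iterable of (remote_ip, byte_count) pairs."""
    totals = Counter()
    for ip, nbytes in flows:
        try:
            name = socket.gethostbyaddr(ip)[0]  # reverse DNS lookup
            tld = name.rsplit(".", 1)[-1]       # e.g. "edu", "gov", "uk"
        except OSError:
            tld = "unresolved"
        totals[tld] += nbytes
    return totals

print(bytes_by_tld([("198.51.100.7", 1_000_000)]).most_common())
```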
More Information
SLAC Netflow Analysis
IP Flow Information eXport
Protocol (IPFIX Information
Model)
Netflow at Fermilab
Network traffic analysis
• At FNAL we currently use Cisco NetFlow and the open-source flow-tools package to characterize the network traffic that goes through our border router
• Records contain:
– source and destination node IP addresses
– Protocol and application information
– Number of bytes and packets
Distribution of a week’s network traffic
by ‘Traffic Bucket’
(defined by DOE/MICS)
All Inbound Traffic, Oct 11, 2003 | All Outbound Traffic, Oct 11, 2003
[Pie charts of traffic by bucket: WEB, DataGrids, INTERACTIVE, DATABASES, EMAIL, SERVICES, MONITORING, PHYSICS_ANALYSIS, OTHER. The DataGrids bucket dominates both charts (79% and 87%); every other bucket is 10% or less.]
Example of the FNAL traffic,
separated by Buckets
(defined by DOE)
Inbound traffic for a week (ending at 11pm, Oct 12th)
Outbound traffic for a week (ending at 11pm, Oct 12th)
Most of the traffic is SCIENTIFIC DATA or DATA GRID related,
bulk data transfers
Breakdown by FNAL Research
Programs
Inbound traffic for past 36 hours (ending at 1pm, Oct 24th)
Inbound traffic for past 10 days (ending at 1pm, Oct 24th)
Outbound traffic for past 36 hours (ending at 1pm, Oct 24th)
Outbound traffic for past 10 days (ending at 1pm, Oct 24th)
Fermilab traffic flow monitoring
utility
Snapshot of the average daily traffic rate, taken on 10/24/2003
Traffic by Top Level Domain
Note: logarithmic scale on the X axis
NP-2000 Monitoring Appliance
Self-contained passive monitoring appliance
Enables:
• Auditing of network traffic
• Response-time anomaly detection
• Troubleshooting and diagnosis
• Long-term planning
Powerful Java data visualization interface
Network Physics , Mountain View, CA
www.networkphysics.com
Long-Term Throughput Monitoring
• Data collection is
self-managed
• No configuration
required
• Granularity matched to
requested time-scale
• Breakdown available by
protocol, application,
destination, …
Breakdown by Protocol
Traffic composition by application type (TCP/UDP port)
[Sample data showing a rogue-application drilldown by internal users' IP addresses: Real Networks, Gnutella (music sharing), Flash Point (games), AOL streaming audio, and Real Broadcast Network (rbn.com).]
Distribution by Destination AS
• All metrics summarized by destination AS
• Useful for discovery of logical groupings
• Summaries by AS-path also available with
optional BGP feed
Traceroute Topology
Graphical traceroute analysis exhibits routing issues
View historical traceroute data.
Measures hop-by-hop delay metrics to localize network latency problems.
Business-Level Grouping
User-defined IP address groupings to match meaningful organizations
Expand groups to get member info (IP address or protocol).
Traffic Charts by Groups
All traffic metrics can be viewed by user-defined groups
Connection Response-Time Analysis
Analyze end-user response time of TCP connections by:
• Connection setup time
• Application processing time
• Data transfer time
• Retransmission delays
• Round-trip time
[Chart annotations: packet loss observed to contribute about 50% of data transfer time; an incident of a server problem.]
Managing The Flows
Flows link the network to business priorities:
– Business impact
– End-to-end performance
– End-user experience
Questions answered by flows (for customers, internal users, and partners):
• Who has a problem, what is impacted, and why is it happening?
• Is the problem on my network, my provider's network, the servers, or the application?
• Who is using what network resources, and how is that impacting others?
• Can my power users get their job done?
• Am I meeting my service levels, and are my providers meeting theirs?
Applications
Business-Network Integration
Metrics Definition
[TCP timing diagram between Client, NP-1000, and Server: the SYN / SYN-ACK / ACK handshake defines the Connection Setup Time; REQUEST to the first DATA defines the Time to First Byte and Application Response Time; the DATA/ACK exchanges define the Network Transfer Time and Fetch Time; the FIN/ACK exchange closes the connection, bounding the Connection Duration.]
* Round Trip Time is measured for every DATA/ACK pair.
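A minimal sketch of how these metrics decompose, given passively captured packet timestamps (the field names are hypothetical, not the NP-1000 data model):

```python
from dataclasses import dataclass

@dataclass
class ConnTimes:
    syn: float         # client SYN seen
    ack: float         # client ACK that completes the handshake
    request: float     # first request byte from the client
    first_data: float  # first response byte from the server
    last_ack: float    # final ACK of the response data
    fin: float         # teardown FIN seen

def metrics(t: ConnTimes) -> dict:
    return {
        "connection_setup_time": t.ack - t.syn,
        "time_to_first_byte": t.first_data - t.request,
        "network_transfer_time": t.last_ack - t.first_data,
        "connection_duration": t.fin - t.syn,
    }
```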
Software Architecture
Designed for flexibility, scalability, and performance.
[Pipeline diagram: Flow Acquisition (flowstats) feeds parallel filter/logger pairs; a DB layer performs Group Aggregation and Database Management; Analyzer components provide Intelligence/Correlation; UI components handle Data Presentation. Arrows show data flow, with data queues between stages.]
Software Architecture
[Layered diagram: NP/BizFlow Services (charts, tables, graphs; troubleshooting; reporting; ISP management) sit above an Intelligent Correlation layer and the Business-Network Integration (BNI) Engine (business groups, group-to-group business links, business conversations, business apps), which rests on Unified Native Instrumentation (performance, utilization, route, BGP, packet).]
MonALISA
MONitoring Agents using a Large Integrated Services Architecture
MonALISA is a Web Services based facility, currently in development, which provides real-time access to performance information located in data repositories around the world.
MonALISA presents a map of the world with clickable icons on
the available data repositories
Shown here is the initial MonALISA page, with the color-encoded ping times between the active repositories displayed.
The earth can be rotated to show different perspectives. Here
the earth has been rotated to center on the North Pole. The
green to yellow lines show the ping RTTs between nodes.
Green lines represent shorter RTTs and yellow lines represent
longer RTTs
Placing the mouse over a route will display the latest
RTT and Packet Loss statistics. In this case from SLAC
to FNAL.
Ping: SLAC->FNAL: RTT=51 ms, Lost Pkts=0%
More detailed information can be extracted from the repositories. For
example: selecting the SLAC repository pops up a window which
facilitates drilling down to statistical details of specific nodes. The
node selected is at CERN. The box in the upper right-hand corner shows
the statistics available. The LostPackages and RTT parameters have
been selected. When the “Plot” button is selected…
A real time plot of latest RTT & Packet Loss between
SLAC and CERN is displayed.
Other metrics are available
• Node utilization metrics
• Bandwidth test metrics
Other uses:
• Troubleshooting bumps in the
night
For more information on
MonALISA
See: MonALISA
And the MonALISA demo here in
the SLAC and FNAL booth.
ABwE:
Basic characteristics:
• Interactive (replies within 1 second)
• Very low impact on the network traffic (40 packets to get a value for a destination)
• Simple and robust (the responder can be installed on any machine in the network)
• Keyword function to protect client-server communication
• Measures in both directions
• Same resolution as other similar methods
ABwE:
Basic terminology:
• Available BW = Capacity - Used Load
• ABwE is able to distinguish two basic path states: "free" (no packet-pair delay) and "traffic" (packet-pair delay present)
• We measure the packet delivery time on a "free" path and turn it into an estimate of the DBC (dynamic bottleneck capacity).
• We also measure the packet-pair dispersion time on a "loaded" path and turn it into an estimate of the level of XTR (cross traffic).
• ABwE reports 3 values: ABw, DBC, and XTR, where ABw = DBC - XTR
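A minimal worked sketch of the two relations above (the numbers are illustrative, not ABwE output):

```python
def bottleneck_capacity(packet_bytes: int, dispersion_s: float) -> float:
    """Packet-pair principle: the bottleneck serializes back-to-back packets,
    so capacity C = packet size / inter-packet dispersion (Td = Lpp / C)."""
    return packet_bytes * 8 / dispersion_s / 1e6  # Mbits/s

def available_bandwidth(dbc_mbps: float, xtr_mbps: float) -> float:
    """ABw = DBC - XTR."""
    return dbc_mbps - xtr_mbps

# Example: 1500-byte packets arriving 77.4 microseconds apart -> ~155 Mbps DBC;
# with ~85 Mbits/s of cross traffic, about 70 Mbits/s remains available.
print(bottleneck_capacity(1500, 77.4e-6))  # ~155.0
print(available_bandwidth(155.0, 85.0))    # 70.0
```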
ABwE:
Basic principles:
[Diagram: a Probe-Sender S emits back-to-back packet pairs (PP1, PP2) toward a Probe-Receiver R across several hops (e.g. 1000, 622, 622, and 155 Mbps links), with cross traffic entering at each hop. Along the path the pair separation is stretched, compressed, or contracted by cross-traffic packets. The sender records Td-send, and the receiver timestamps each packet to obtain the dispersion time Td-receive. At a hop of capacity C23 the static dispersion delay is Td23 = Lpp / C23, and the free spaces between the pair are filled by cross traffic, so Td-receive = Td34 = Td23 on an otherwise free path. Measurement nodes shown: CESNET, SLAC, CALTECH, NERSC, CERN, APAN.]
Basic relation between ABw, DBC, and XTR
[Diagram: the principle of gradually narrowing bandwidth, illustrated as a light beam from a light source passing through successively narrower apertures (1000 -> 622 -> 155 Mbps). With no cross traffic there is no impact (at t1); high cross traffic at a high-speed segment narrows the beam, i.e. a high load at high-speed segments generates the DBC for the path.]
ABw, DBC, and XTR in a practical example:
[Plots: in the normal situation ABwE tracks the path's DBC; when a heavy load (cross traffic) appears in the path it defines a new DBC, revealing the bottleneck.]
ABwE / MRTG match: TCP test to UFL
[Plots: the ABwE estimates match the router counters; during the test CALREN shows about 600 Mbits/s of sending traffic and IPLS shows 800-900 Mbits/s.]
ABwE and IEPM (iperf)
[Plots comparing ABwE estimates with IEPM iperf measurements for paths to Mib.infn.it, Internet2.edu, Man.ac.uk, and ANL.gov.]