Digital Divide and PingER


SLAC IEPM PingER and BW monitoring & tools
Presented by Les Cottrell, SLAC
At LBNL, Jan 21, 2004
www.slac.stanford.edu/grp/scs/net/talk03/lbl-jan04.ppt
History of the PingER Project
• Early 1990’s: SLAC begins pinging nodes around the world to
evaluate the quality of Internet connectivity between SLAC
and other HEP Institutions.
• Around 1996: The PingER project was funded, making it the first Internet end-to-end monitoring tool available to the HEP community.
• Today: Believed to be the most extensive Internet end-to-end
performance monitoring tool in the world
PingER Today
• Today, the PingER project includes 35 monitoring hosts in 12 countries, monitoring remote hosts at over 550 sites in 80 countries.
• These countries cover 75% of the world's population and 99% of the Internet-connected population!
• Just added Pakistan!
[World map, colored by region: colored countries have remote PingER hosts.]
PingER Architecture
There are three types of hosts:
• Remote hosts: the hosts being monitored
• Monitoring hosts: make ping measurements to the remote hosts
• Archive/analysis hosts: gather data from the monitoring sites, analyze it & make reports
[Diagram: archive hosts collect data from several monitoring hosts, each of which pings its set of remote hosts.]
Methodology
• Every 30 minutes, send 11 × 100-Byte pings followed by 10 × 1000-Byte pings from the monitor to each remote host
• Low impact:
– By default < 100 bits/s per monitor-remote-host pair
– Can be reduced to ~10 bits/s
– No need for co-scheduling of monitors
• Uses the ubiquitous ping
– No software to install at any of the over 500 remote hosts
– Very important for hosts in developing countries
• By centrally gathering, archiving, analyzing and reporting the data, the requirements for monitoring hosts are minimal (typically 1-2 days to install etc.)
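
To make this concrete, here is a minimal sketch of one PingER-style probe in Perl (illustrative only, not the actual PingER scripts; the host list and the Linux iputils ping flags are assumptions):

#!/usr/bin/perl -w
# Minimal sketch of a PingER-style probe (illustrative; not the real PingER code).
use strict;

my @remote_hosts = ('www.example.org');   # hypothetical remote-host list

foreach my $host (@remote_hosts) {
    # 11 pings with 100-byte payloads followed by 10 with 1000-byte payloads,
    # as described above (-c = count, -s = payload size on Linux).
    foreach my $probe ([11, 100], [10, 1000]) {
        my ($count, $size) = @$probe;
        my $out = `ping -c $count -s $size $host 2>&1`;
        my ($loss) = $out =~ /([\d.]+)% packet loss/;
        my ($rtt)  = $out =~ m{min/avg/max[^=]*= [\d.]+/([\d.]+)/};
        printf "%s %5dB: loss=%s%%  avg RTT=%s ms\n", $host, $size,
               defined $loss ? $loss : '?', defined $rtt ? $rtt : '?';
    }
}

In the real system the parsed RTT and loss values are stored for later collection by the archive host.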
Worldwide performance
• Performance is improving
• Developed world: improving by a factor of 10 in 4-5 years
• S.E. Europe & Russia: catching up
• India & Africa: worse off & falling behind
• Developing world: 3-10 years behind
• Many institutes in the developing world have poorer network performance than a household in N. America or Europe
Current State – Aug ‘03 (throughput, Mbits/s)
[Table: derived throughput from each monitoring country (rows) to each remote region (columns), color-coded by the key below.]
• Within-region performance is better
– E.g. Ca|EDU|GOV-N. America, Hu-S.E. Europe, Eu-Eu, Jp-E. Asia, Au-Au, Ru-Ru|Baltics
• Africa, Caucasus, Central & S. Asia are all bad
Key: Bad < 200 kbits/s (less than DSL); Poor 200-500 kbits/s; Acceptable 500-1000 kbits/s; Good > 1000 kbits/s
Network Readiness Index vs Throughput
• NRI from the Center for International Development, Harvard U.: http://www.cid.harvard.edu/cr/pdf/gitrr2002_ch02.pdf
NRI Top 14:
1. Finland 5.92
2. US 5.79
3. Singapore 5.74
4. Sweden 5.58
5. Iceland 5.51
6. Canada 5.44
7. UK 5.35
8. Denmark 5.33
9. Taiwan 5.31
10. Germany 5.29
11. Netherlands 5.28
12. Israel 5.22
13. Switzerland 5.18
14. Korea 5.10
[Scatter plot of NRI vs measured throughput; annotations distinguish countries with an "Internet for all" focus from those with an A&R (academic & research) focus.]
• NRI correlates reasonably well with measured throughput
Typical uses
• Troubleshooting:
– Discerning whether a reported problem is network-related
– Identifying when a problem started
– Providing quantitative analysis for network specialists
– Identifying step functions and periodic network behavior, and recognizing problems affecting multiple sites
• Setting expectations (e.g. SLAs)
• Identifying the need to upgrade
• Providing quantitative information to policy makers & funding agencies
• Seeing the effects of upgrades
Pakistan performance
[Plots of ping loss % and RTT (ms) to Karachi, NIIT/Rawalpindi, Islamabad and Lahore (Pakistan Telecom). Routes: via ESnet (hops 3-8) to DC, then ATT (hops 9-21) to Karachi; or via ESnet (hops 3-6) to SNV, then SINGTEL (hops 7-12) to Karachi.]
NIIT performance from U.S. (SLAC)
Preliminary results; measurements started at the end of Dec 2003.
• Ping RTT & loss: heavy losses during congested daytimes
• Average daily: loss ~1-2%, RTT ~320 ms
Bandwidth measurements using packet-pair dispersion & TCP:
• ABwE (packet-pair dispersion) averages: to NIIT ~350 Kbits/s, from NIIT ~365 Kbits/s
• Iperf/TCP averages: to NIIT ~320 Kbits/s, from NIIT ~40 Kbits/s
One can also derive the throughput (assuming standard TCP) from the RTT & loss using the Mathis formula: BW ≈ 1.2 × MSS / (RTT × √loss) with MSS = 1460 Bytes, giving ~260 Kbits/s. The nominal path bottleneck capacity is 1 Mbits/s.
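
As a sanity check of the formula, the sketch below plugs in the RTT and loss measured above (the exact loss value behind the slide's ~260 Kbits/s figure is not stated, so the 2% used here is an assumption):

#!/usr/bin/perl -w
# Mathis et al. TCP throughput estimate: BW ~ 1.2 * MSS / (RTT * sqrt(loss)).
use strict;

my $mss  = 1460 * 8;   # maximum segment size in bits (1460 Bytes)
my $rtt  = 0.320;      # round-trip time in seconds (~320 ms, measured above)
my $loss = 0.02;       # loss fraction (assumed; ~1-2% daily average was measured)

my $bw_bps = 1.2 * $mss / ($rtt * sqrt($loss));
printf "Estimated TCP throughput: %.0f Kbits/s\n", $bw_bps / 1000;
# These inputs give ~310 Kbits/s; a somewhat higher loss value
# reproduces the ~260 Kbits/s quoted above.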
In Summary
PingER provides ongoing support for monitoring and maintaining the quality of Internet connectivity for the worldwide scientific community. The information is publicly available on the web:
http://www-iepm.slac.stanford.edu/cgi-wrap/pingtable.pl
PingER also quantifies the extent of the “Digital Divide” and provides information to policy makers and funding agencies.
IEPM-BW
• Need something for high-performance links:
– With 10 pings per 30 minutes (480/day), the minimum measurable loss is 1/480 ≈ 0.21% per day, or ~0.007% per month (~10^-8 BER); today's better links exceed this
– Ping losses may not behave like TCP losses
• Needed for Grid and HENP applications and high-performance network connections:
– Setting expectations, planning
– Trouble-shooting, improving performance
– Application steering
– Testing new transports (e.g. FAST, HS-TCP, RBUDP, UDT), applications, and monitoring tools (e.g. QIperf, packet-pair techniques …) in production environments
– Comparing with passive measurements and advertised capacities
Methodology
• Every 90 minutes (± randomization), each monitoring host cycles through the collaborating hosts at several remote sites:
– Sends active probes in turn for: bbftp, gridftp, bbcp, iperf1, iperf, (qiperf), ping, abwe …
• Also measures traceroutes at 15-minute intervals
• Uses ssh for code deployment, management, and to start & stop servers remotely
– Deploys server code for iperf, ABwE, bbftp, GridFTP & various utilities
• 10 monitoring sites, each monitoring between 2 and 40 remote hosts
– Main users: SLAC (BaBar) & FNAL (D0, CDF, CMS)
• Data are archived, analyzed and displayed at the monitoring hosts (one cycle is sketched below)
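
Schematically, one monitoring-host cycle could be structured as below; this is a sketch only (placeholder host and tool lists), and the real harness also handles server start/stop over ssh, timeouts, and archiving:

#!/usr/bin/perl -w
# Schematic of an IEPM-BW measurement cycle (illustrative; not the real harness).
use strict;

my @remote = ('host-a.example.net', 'host-b.example.net');  # placeholder hosts
my @probes = ('ping', 'abwe', 'iperf', 'bbftp');            # subset of the tools above

while (1) {
    foreach my $host (@remote) {
        foreach my $tool (@probes) {
            # Each active probe runs in turn against the remote host; the
            # results would be parsed and archived for analysis and display.
            print "running $tool against $host\n";
        }
    }
    # ~90 minutes between cycles, with randomization so monitors at
    # different sites do not stay synchronized with one another.
    sleep(90 * 60 + int(rand(10 * 60)));
}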
Deployment
[Map of monitor deployment at HENP and network-research sites, distinguishing 100 Mbits/s and Gbits/s hosts; 125 measured bandwidths as of Aug ‘02.]
Visualization
• Time series:
– Overplot multiple metrics
– Plus route changes
– Zoom, history
– Choose individual metrics
• Scatter plots
• Histograms
• Access to the data
Traceroutes
• Analyze traceroutes for unique routes and assign route numbers
• Display the route # at the start, then “.” while the route is unchanged
• If there is a significant change, display the new route # in red (a sketch of this display logic follows below)
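
The display logic reduces to a few lines; a minimal sketch (with made-up route strings, not the IEPM-BW code):

#!/usr/bin/perl -w
# Sketch of the route-numbering display: give each unique route a number,
# print the number when the route changes and "." while it stays the same.
use strict;

# One traceroute per 15-minute interval, reduced to a route string (made up).
my @routes = ('A-B-C', 'A-B-C', 'A-D-C', 'A-D-C', 'A-B-C');

my %route_no;
my $next = 1;
my $last = '';
foreach my $r (@routes) {
    $route_no{$r} = $next++ unless exists $route_no{$r};
    print $r eq $last ? '.' : $route_no{$r};
    $last = $r;
}
print "\n";   # prints "1.2.1": two route changes, otherwise stable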
• Links to:
– History
– Reverse
– Single host
– Raw data
– Summary for emailing
– Available BW
– Topology
[Table: route # per host vs hour of day; several route changes can be seen occurring simultaneously. Demo.]
Topology
• Select times, hosts & direction in the table
• Mouse over a router to see its name
• Click on a router to see the sub-path below it
• Colored by deduced AS
• Click on end nodes to see the names of all hops
Performance (ABwE)
• Requires an ABwE server (mirror) at remote sites
• Gets performance for both directions
• Low impact: 40 × 1000-byte packets
• Less than a second per result
• Can do “real-time” performance monitoring
[24-hour plot (Mbits/s): current bottleneck capacity (usually limited by 100 Mbits/s Fast Ethernet), available bandwidth, cross-traffic, and Iperf measurements every 90 minutes.]
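
The packet-pair idea behind ABwE reduces to one line: two back-to-back probe packets leave the bottleneck link separated by the time needed to serialize one packet, so capacity ≈ packet size / dispersion. A toy calculation (the dispersion value is made up; ABwE's real estimator filters many pairs and also infers available bandwidth and cross-traffic):

#!/usr/bin/perl -w
# Toy packet-pair capacity estimate (illustrative; not the ABwE estimator).
use strict;

my $pkt_bits   = 1000 * 8;   # 1000-byte probe packets, as above
my $dispersion = 80e-6;      # inter-arrival gap in seconds (assumed value)

# bottleneck capacity ~ packet size / dispersion
printf "Estimated bottleneck capacity: %.0f Mbits/s\n",
       $pkt_bits / $dispersion / 1e6;   # 8000 bits / 80 us = 100 Mbits/s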
ABwE/Iperf match: Hadrian to UFL
[Time-series comparison: in the normal situation ABwE and Iperf agree; when a heavy load (cross-traffic) appears, ABwE shows a new DBC (dynamic bottleneck capacity) on the path. CALREN shows 600 Mbits/s of traffic being sent; IPLS shows 800-900 Mbits/s.]
Abing CLI
• Demo of the abing command-line tool
– Since it is low impact (40 × 1000-byte packets), it can be run like ping
Navigation
• MonALISA
• For ABwE: prediction, trouble-shooting
• Working on auto-detection of long-term (many minutes) step changes in bandwidth:
– Developed a simple algorithm and are qualifying its effectiveness (a minimal sketch follows this list)
– Looking at the NLANR (McGregor/H-W Braun) plateau change detector: http://www.ripe.net/pam2001/Abstracts/talk_03.html
– Looking at the correlation between performance, route changes & RTT
– For significant changes, gather: RTT, routes (fwd/rev, before & after if changed), NDT info, bandwidth info (fwd & rev)
– Fold in diurnal changes
– Generate real-time email alerts with filtering
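
The simple algorithm is not spelled out in the talk; one minimal detector in that spirit compares the mean of a recent window against the mean of the preceding history and flags large relative shifts:

#!/usr/bin/perl -w
# Minimal step-change detector (a sketch in the spirit of the slide; not the
# actual IEPM algorithm or the NLANR plateau detector).
use strict;
use List::Util qw(sum);

sub mean { return @_ ? sum(@_) / scalar(@_) : 0; }

my @bw = (95, 97, 96, 94, 95, 96, 60, 58, 59, 61);  # Mbits/s, illustrative series
my $win    = 3;     # size of the "recent" window
my $thresh = 0.3;   # flag relative shifts larger than 30%

for my $i ($win .. $#bw) {
    my $recent  = mean(@bw[$i - $win + 1 .. $i]);   # last $win samples
    my $history = mean(@bw[0 .. $i - $win]);        # everything before them
    if ($history > 0 && abs($recent - $history) / $history > $thresh) {
        printf "possible step change at sample %d (%.0f -> %.0f Mbits/s)\n",
               $i, $history, $recent;
        last;
    }
}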
[Demo: diurnal predictions]
Program API
• It is not realistic to look at thousands of graphs
• Programs also want to look at the data, e.g. for:
– Data placement for replica servers
– Analysis & visualization (e.g. MonALISA)
– Trouble-shooting: correlating data from many sources when a problem is suspected or spotted
• Publish the data in a standard way
– W3C Web Services, GGF OGSI Grid Services
– Currently XML-RPC and SOAP servers
– Using the Network Measurement Working Group schema (NM-WG .xsd)
• The demo is mainly a proof of principle: it accesses IEPM single- & multi-stream iperf, multi-stream GridFTP & bbftp, ABwE and PingER data
– Not pushing deployment and use until the schema is more solid
IEPM SOAP Client
#!/usr/local/bin/perl -w
# Query the IEPM SOAP service for achievable TCP bandwidth measurements.
use SOAP::Lite;
my $node = "node1.cacr.caltech.edu";
my $timePeriod = "20031201-20031205T143000";
my $measurement = SOAP::Lite
  ->service('http://www-iepm.slac.stanford.edu/tools/soap/wsdl/IEPM_profile.wsdl')
  ->GetBandwidthAchievableTCP("$node", "$timePeriod");
# Print the destination host name/IP, the measurement timestamps and values.
print "Host=" . $measurement->{'subject'}->{'destination'}->{'name'}, "\n";
print $measurement->{'subject'}->{'destination'}->{'address'}->{'IP'}, "\n";
print "Times:\n" . $measurement->{'path.bandwidth.achievable.TCP'}
  ->{'timestamp'}->{'startTime'}, "\n";
print "Values:\n" . $measurement->{'path.bandwidth.achievable.TCP'}
  ->{'achievableThroughputResult'}->{'value'}, "\n";
Results:
Host=node1.cacr.caltech.edu
Not-disclosed
Times:
1070528106 1070533504 1070538907 1070544307 1070549706 1070555108 1070560505 1070565907 1070571306 1070576706 1070582106 1070587506 1070592906 1070598310 1070603706 1070609111 1070614506 1070619905 1070625306 1070630706 1070636106 1070641508 1070646905 1070652306 1070657705
Values:
183.5 174.3 196.76 188.75 196.67 196.05 195.86 187.69 192.91 152.99 181.85 193.03 190.21 190.54 168.71 166.79 196.17 172.1 183.77 194.44 195.84 194.01 192.49 171.55 176.43
For more see: http://www-iepm.slac.stanford.edu/tools/web_services/
Demo: http://www-iepm.slac.stanford.edu/tools/soap/IEPM_client.html
For More Information
• PingER:
– www-iepm.slac.stanford.edu/pinger/
• ICFA/SCIC Network Monitoring report, Jan04
– www.slac.stanford.edu/xorg/icfa/icfa-net-paper-jan04.html
• “The PingER Project: Active Internet Performance Monitoring for the HENP Community”, IEEE Communications Magazine, issue on Network Traffic Measurements and Experiments
• IEPM-BW
– http://www-iepm.slac.stanford.edu/bw/
• ABwE: www-iepm.slac.stanford.edu/bw/abwe/abwe-cf-iperf.html and http://moat.nlanr.net/PAM2003/PAM2003papers/3781.pdf