Digital Divide and PingER
SLAC IEPM: PingER and BW monitoring & tools
Presented by Les Cottrell, SLAC
At LBNL, Jan 21, 2003
www.slac.stanford.edu/grp/scs/net/talk03/lbl-jan04.ppt
History of the PingER Project
• Early 1990’s: SLAC begins pinging nodes around the world to
evaluate the quality of Internet connectivity between SLAC
and other HEP Institutions.
• Around 1996: The PingER project was funded making it the
first Internet end-to-end monitoring tool available to the HEP
community.
• Today: Believed to be the most extensive Internet end-to-end
performance monitoring tool in the world
PingER Today
• Today, the PingER project includes 35 monitoring hosts in 12 countries, monitoring remote hosts in 80 countries (over 55 remote sites).
• These countries cover 75% of the world's population and 99% of the Internet-connected population!
(World map: colored countries have remote PingER hosts, colored by region. Just added Pakistan!)
PingER Architecture
There are three types of hosts:
• Remote hosts: the hosts being monitored
• Monitoring hosts: make ping measurements to the remote hosts
• Archive/analysis hosts: gather data from the monitoring sites, analyze it & make reports
(Diagram: archive hosts collect from several monitoring hosts, each of which pings many remote hosts.)
Methodology
• Every 30 minutes, send 11 × 100-byte pings followed by 10 × 1000-byte pings from the monitor to the remote host
• Low impact:
– By default < 100 bits/s per monitor-remote-host pair
– Can be reduced to ~10 bits/s
– No co-scheduling of monitors needed
• Uses the ubiquitous ping
– No software to install at any of the 500+ remote hosts
– Very important for hosts in developing countries
• By centrally gathering, archiving, analyzing and reporting the data, the requirements on monitoring hosts are kept minimal (typically 1-2 days to install)
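The "low impact" claim above can be sanity-checked with a back-of-the-envelope calculation. A minimal sketch, counting only ICMP payload bytes (so it ignores IP/ICMP header overhead) and only the outbound direction:

```python
# Rough per-pair load for PingER's default cycle:
# 11 x 100-byte + 10 x 1000-byte pings every 30 minutes.

def pinger_load_bps(interval_s=30 * 60):
    payload_bytes = 11 * 100 + 10 * 1000   # 11,100 bytes per cycle
    return payload_bytes * 8 / interval_s  # outbound bits per second

print(round(pinger_load_bps(), 1))  # ~49.3 bits/s outbound
```

Roughly doubling this for the echo replies still keeps each monitor-remote pair under the 100 bits/s quoted on the slide.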
Worldwide performance
• Performance is improving
• The developed world is improving by a factor of 10 every 4-5 years
• S.E. Europe and Russia are catching up
• India & Africa are worse off & falling behind
• The developing world is 3-10 years behind
• Many institutes in the developing world have poorer performance than a household in N. America or Europe
Current State – Aug ‘03
(throughput Mbps)
(Table: monitoring country vs. remote regions, throughput in Mbits/s.)
• Within-region performance is better
– E.g. Ca|EDU|GOV-NA, Hu-SE Eu, Eu-Eu, Jp-E Asia, Au-Au, Ru-Ru|Baltics
• Africa, Caucasus, Central & S. Asia are all bad
Key: Bad < 200 kbits/s (< DSL); Poor 200-500 kbits/s; Acceptable 500-1000 kbits/s; Good > 1000 kbits/s
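The key above can be written as a trivial classifier; the category names and cutoffs are taken from the slide, and this is only an illustration:

```python
# Throughput categories from the slide, thresholds in kbits/s.

def grade(kbits_per_s):
    if kbits_per_s < 200:
        return "Bad"         # worse than DSL
    if kbits_per_s < 500:
        return "Poor"
    if kbits_per_s < 1000:
        return "Acceptable"
    return "Good"

print([grade(v) for v in (150, 350, 750, 1500)])
# ['Bad', 'Poor', 'Acceptable', 'Good']
```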
Network Readiness
Index vs Throughput
• NRI from Center for International Development,
Harvard U.
http://www.cid.harvard.edu/cr/pdf/gitrr2002_ch02.pdf
NRI Top 14: Finland 5.92, US 5.79, Singapore 5.74, Sweden 5.58, Iceland 5.51, Canada 5.44, UK 5.35, Denmark 5.33, Taiwan 5.31, Germany 5.29, Netherlands 5.28, Israel 5.22, Switzerland 5.18, Korea 5.10
(Chart annotations: "Internet for all" focus vs. A&R focus.)
• NRI correlates reasonably well with the measured throughput
Typical uses
• Troubleshooting
– Discerning whether a reported problem is network related
– Identifying the time a problem started
– Providing quantitative analysis for network specialists
– Identifying step functions, periodic network behavior, and problems affecting multiple sites
• Setting expectations (e.g. SLAs)
• Identifying the need to upgrade
• Providing quantitative information to policy makers & funding agencies
• Seeing the effects of upgrades
Pakistan performance
(Loss % and RTT (ms) plots for Karachi, NIIT/Rawalpindi, Islamabad and Lahore.
Routes: ESnet (hops 3-8) via DC, then ATT (hops 9-21) to Karachi;
and ESnet (hops 3-6) via SNV, then SINGTEL (hops 7-12) and Pakistan Telecom to Karachi, Rawalpindi and Lahore.)
NIIT performance from U.S. (SLAC)
Preliminary results, started measurements end Dec 2003.
(Ping RTT & loss plot. N.B. heavy losses during congested daytimes. Average daily: loss ~1-2%, RTT ~320 ms.)
Bandwidth measurements using packet-pair dispersion & TCP:
• ABwE (packet-pair dispersion), average: to NIIT ~350 kbits/s, from NIIT ~365 kbits/s
• Iperf/TCP, average: to NIIT ~320 kbits/s, from NIIT ~40 kbits/s
• Nominal path bottleneck capacity: 1 Mbits/s
Throughput (assuming standard TCP) can also be derived from RTT & loss using:
BW ~ 1.2 * MSS(1460 B) / (RTT * sqrt(loss)) ~ 260 kbits/s
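The formula above is the well-known Mathis et al. TCP throughput bound. A minimal sketch using the slide's MSS (1460 B) and RTT (~320 ms), and assuming a 2% loss rate (the daily average reported above was 1-2%), reproduces the quoted order of magnitude:

```python
from math import sqrt

# Mathis-style TCP throughput estimate: BW ~ 1.2 * MSS / (RTT * sqrt(loss)).
# MSS in bytes, RTT in seconds, loss as a fraction; result in bits/s.
# The 2% loss default is an assumption for illustration.

def tcp_throughput_bps(mss_bytes=1460, rtt_s=0.32, loss=0.02):
    return 1.2 * mss_bytes * 8 / (rtt_s * sqrt(loss))

print(round(tcp_throughput_bps() / 1000))  # ~310 kbits/s, same order as the ~260 quoted
```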
In Summary
PingER provides ongoing support for monitoring and
maintaining the quality of Internet connectivity for
the worldwide scientific community.
Information is available publicly on the web
http://www-iepm.slac.stanford.edu/cgi-wrap/pingtable.pl
PingER also quantifies the extent of the “Digital
Divide” and provides information to policy makers
and funding agencies.
IEPM-BW
• Need something for high-performance links
– At 10 pings/30 min, the minimum measurable loss is 0.21% over a day, or 0.007% over a month (~10^-8 BER); today's better links exceed this
– Ping losses may not behave like TCP losses
• Needed for Grid, HENP applications and high-performance network connections
– Set expectations, planning
– Trouble-shooting, improving performance
– Application steering
– Testing new transports (e.g. FAST, HS-TCP, RBUDP, UDT), applications, and monitoring tools (e.g. QIperf, packet-pair techniques …) in production environments
– Compare with passive measurements and advertised capacities
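The minimum-measurable-loss figures quoted above follow from simple counting: the smallest non-zero loss you can observe is one lost ping out of everything sent in the window. A quick sketch:

```python
# Loss "floor" for 10 pings per 30-minute interval (48 intervals/day).

def min_measurable_loss_pct(pings_per_interval=10, intervals_per_day=48, days=1):
    sent = pings_per_interval * intervals_per_day * days
    return 100.0 / sent  # one lost ping, as a percentage

print(round(min_measurable_loss_pct(days=1), 2))    # 0.21 %/day
print(round(min_measurable_loss_pct(days=30), 4))   # 0.0069 %/month
```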
Methodology
• Every 90 minutes (± randomization) the monitoring host cycles through collaborating hosts at several remote sites:
– Sends active probes in turn for: bbftp, gridtcp, bbcp, iperf1, iperf, (qiperf), ping, abwe …
• Also measures traceroutes at 15min intervals
• Uses ssh for code deployment, management and to
start & stop servers remotely
– Deploy server code for iperf, ABwE, bbftp, GridFTP &
various utilities
• 10 monitoring sites, each with between 2 and 40
remote hosts monitored
– Main users SLAC (BaBar) & FNAL (D0, CDF, CMS)
• Data archived, analyzed, displayed at monitoring
hosts
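The measurement loop described above might be sketched as follows. The probe list and the ~90-minute base interval come from the slide; the `run_probe` callback and the jitter amount are assumptions for illustration, not the production code:

```python
import random

# Probe tools named on the slide, run in turn against each remote host.
PROBES = ["bbftp", "gridtcp", "bbcp", "iperf1", "iperf", "ping", "abwe"]

def one_cycle(run_probe, remote_hosts):
    """Run every probe, in turn, against every collaborating remote host."""
    for host in remote_hosts:
        for probe in PROBES:
            run_probe(probe, host)

def next_wait_s(base_s=90 * 60, jitter_s=300):
    """Seconds until the next cycle: ~90 min plus a random offset so that
    monitoring hosts do not all fire in lock-step (jitter amount assumed)."""
    return base_s + random.uniform(-jitter_s, jitter_s)

calls = []
one_cycle(lambda p, h: calls.append((p, h)), ["host-a.example", "host-b.example"])
print(len(calls))  # 2 hosts x 7 probes = 14 measurements per cycle
```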
Deployment
(Map: monitoring hosts for HENP and network research, 100 Mbits/s and Gbits/s hosts; 125 measured bandwidths, Aug '02.)
Visualization
• Time series:
– Overplot multiple metrics
– Plus route changes
– Zoom, history
– Choose individual metrics
• Scatter plots
• Histograms
• Access to data
Traceroutes
• Analyze traceroutes for unique routes and assign route numbers
• Display the route number at the start, then "." if no change
• If there is a significant change, display the route number in red
• Links to: history, reverse route, single host, raw data, summary for emailing, available bandwidth, topology
(Table: host vs. hour of day; demo showed several route changes occurring simultaneously.)
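The route-numbering display described above can be sketched as a toy reimplementation (illustrative only, not the production code):

```python
# Each unique traceroute gets a route number; print the number when the
# route changes and "." when it matches the previous measurement.

def route_display(routes):
    numbers, shown, last = {}, [], None
    for route in routes:
        key = tuple(route)
        if key not in numbers:
            numbers[key] = len(numbers) + 1  # assign the next route #
        num = numbers[key]
        shown.append("." if num == last else str(num))
        last = num
    return shown

print(route_display([["r1", "r2"], ["r1", "r2"], ["r1", "r3"], ["r1", "r2"]]))
# ['1', '.', '2', '1']
```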
Topology
• Select times, hosts & direction in the table
• Mouse over a router to see its name
• Click on a router to see the sub-path below
• Colored by deduced AS
• Click on end nodes to see the names of all hops
Performance (ABwE)
(Plot: 24 hours of measurements in Mbits/s: current bottleneck capacity (usually limited by 100FE), available bandwidth, cross-traffic, and Iperf every 90 min.)
• Requires an ABwE server (mirror) at remote sites
• Gets performance for both directions
• Low impact: 40 × 1000-byte packets
• Less than a second per result
• Can do "real-time" performance monitoring
ABwE/Iperf match: Hadrian to UFL
(Plots: normal situation vs. heavy cross-traffic appearing, showing a new dynamic bottleneck capacity (DBC) on the path; CALREN shows sending traffic of 600 Mbits/s, IPLS shows 800-900 Mbits/s.)
Abing CLI
• Demo of the abing command-line tool
– Since it is low impact (40 × 1000-byte packets), it can be run like ping
Navigation
• MonALISA
• For ABwE: prediction, trouble-shooting
• Working on auto-detection of long-term (many-minute) step changes in bandwidth
– Developed a simple algorithm; qualifying its effectiveness
– Looking at the NLANR (McGregor/H-W Braun) plateau change detector:
http://www.ripe.net/pam2001/Abstracts/talk_03.html
– Looking at the correlation between performance, route changes & RTT
– For significant changes, gather: RTT, routes (fwd/rev, before & after if changed), NDT info, bandwidth info (fwd & rev)
– Fold in diurnal changes
– Generate real-time email alerts with filtering
(Demo: diurnal predictions.)
Program API
• Not realistic to look at thousands of graphs
• Programs also want to look at the data, e.g. for:
– Data placement for replica servers
– Analysis, visualization (e.g. MonALISA)
– Trouble-shooting: correlate data from many sources when a problem is suspected or spotted
• Publish the data in a standard way
– W3C Web Service, GGF OGSI Grid Service
– Currently XML-RPC and SOAP servers
– Using the Network Measurement Working Group schema (NM-WG .xsd)
• Demo is mainly a proof of principle, to access IEPM single- & multi-stream iperf, multi-stream GridFTP & bbftp, ABwE and PingER data
– Not pushing deployment and use until the schema is more solid
IEPM SOAP Client
#!/usr/local/bin/perl -w
use strict;
use SOAP::Lite;
my $node = "node1.cacr.caltech.edu";
my $timePeriod = "20031201-20031205T143000";
my $measurement = SOAP::Lite
  ->service('http://www-iepm.slac.stanford.edu/tools/soap/wsdl/IEPM_profile.wsdl')
  ->GetBandwidthAchievableTCP("$node", "$timePeriod");
print "Host=" . $measurement->{'subject'}->{'destination'}->{'name'}, "\n";
print $measurement->{'subject'}->{'destination'}->{'address'}->{'IP'}, "\n";
print "Times:\n" . $measurement->{'path.bandwidth.achievable.TCP'}
  ->{'timestamp'}->{'startTime'}, "\n";
print "Values:\n" . $measurement->{'path.bandwidth.achievable.TCP'}
  ->{'achievableThroughputResult'}->{'value'}, "\n";
Results:
Host=node1.cacr.caltech.edu
Not-disclosed
Times:
1070528106 1070533504 1070538907 1070544307 1070549706 1070555108 1070560505 1070565907 1070571306 1070576706 1070582106 1070587506 1070592906 1070598310 1070603706 1070609111 1070614506 1070619905 1070625306 1070630706 1070636106 1070641508 1070646905 1070652306 1070657705
Values:
183.5 174.3 196.76 188.75 196.67 196.05 195.86 187.69 192.91 152.99 181.85 193.03 190.21 190.54 168.71 166.79 196.17 172.1 183.77 194.44 195.84 194.01 192.49 171.55 176.43
For more see: http://www-iepm.slac.stanford.edu/tools/web_services/
Demo: http://www-iepm.slac.stanford.edu/tools/soap/IEPM_client.html
For More Information
• PingER:
– www-iepm.slac.stanford.edu/pinger/
• ICFA/SCIC Network Monitoring report, Jan04
– www.slac.stanford.edu/xorg/icfa/icfa-net-paper-jan04.html
• The PingER Project: Active Internet Performance Monitoring for
the HENP Community, IEEE Communications Magazine on
Network Traffic Measurements and Experiments.
• IEPM-BW
– http://www-iepm.slac.stanford.edu/bw/
• ABWE: www-iepm.slac.stanford.edu/bw/abwe/abwe-cf-iperf.html and
http://moat.nlanr.net/PAM2003/PAM2003papers/3781.pdf