Internet2 E2E piPEs Project
Eric L. Boyd
17 July 2015
Internet2 E2E piPEs
Project: End-to-End Performance
Initiative Performance Environment
System (E2E piPEs)
Approach: Collaborative project
combining the best work of many
organizations, including
DANTE/GEANT, Daresbury, EGEE,
GGF NMWG, NLANR/DAST, UCL,
Georgia Tech, etc.
7/17/2015
2
Internet2 E2E piPEs Goals
Enable end-users & network operators to:
• determine E2E performance capabilities
• locate E2E problems
• contact the right person to get an E2E problem resolved.
Enable remote initiation of partial path
performance tests
Make partial path performance data publicly
available
Interoperable with other performance
measurement frameworks
Measurement Infrastructure Components
[Diagram: an End-to-End Path through two routers. Servers run Regularly Scheduled Tests and On-Demand Tests. A laptop computer issues Test Requests and Result Requests; Test Results flow back to the laptop and into a Database of Performance Results.]
Sample piPEs Deployment
[Diagram: backbone nodes on a network backbone, regional nodes, and an application domain. Legend: Regularly Scheduled Tests, On-Demand Tests, Result Collection. Test data is held for the network backbone, the regional level, and the application domain.]
Project Phases
Phase 1: Tool Beacons
• BWCTL (Complete), http://e2epi.internet2.edu/bwctl
• OWAMP (Complete), http://e2epi.internet2.edu/owamp
• NDT (Complete), http://e2epi.internet2.edu/ndt
Phase 2: Measurement Domain Support
• General Measurement Infrastructure (Prototype)
• Abilene Measurement Infrastructure Deployment
(Complete), http://abilene.internet2.edu/observatory
Phase 3: Federation Support
• AA (Prototype – optional AES key, policy file, limits file)
• Discovery (Measurement Nodes, Databases) (Prototype –
nearest NDT server, web page)
• Test Request/Response Schema Support (Prototype – GGF
NMWG Schema)
BWCTL (Jeff Boote)
http://e2epi.internet2.edu/bwctl
[Diagram: a bwctl client makes an initial connection to the bwctld resource broker on each of two hosts and exchanges requests/results with them. The bwctld processes verify time and return results, and each side runs iperf; the iperf test stream flows between the two hosts.]
OWAMP (Jeff Boote)
http://e2epi.internet2.edu/owamp
[Diagram: on the client, owping makes an initial connection to the owampd resource broker on the server. The owping client [control] and owampd [control] processes exchange requests/results, and each side passes requests/results to its own OWD test endpoint.]
NDT (Rich Carlson)
Network Diagnostic Tester
• Developed at Argonne National Lab
• Ongoing integration into piPEs framework
Redirects from well-known host to
“nearest” measurement node
Detects common performance problems
in the “first mile” (edge to campus DMZ)
In deployment on Abilene:
• http://ndt-seattle.abilene.ucaid.edu:7123
piPEs Deployment
In Progress
Abilene
US Govt. Labs
US Universities
GEANT
APAN
Israel
Italy
Poland
Example piPEs Use Cases
Edge-to-Middle (On-Demand)
• Automatic 2-Ended Test Set-up
Middle-to-Middle (Regularly Scheduled)
• Raw Data feeds for 3rd-Party Analysis Tools
– http://vinci.cacr.caltech.edu:8080/
• Quality Control of Network Infrastructure
Edge-to-Edge (Regularly Scheduled)
• Quality Control of Application Communities
Edge-to-Campus DMZ (On-Demand)
• Coupled with Regularly Scheduled Middle-to-Middle
• End User determines who to contact about performance
problem, armed with proof
Test from the Edge to the Middle
Divide and conquer: Partial Path Analysis
Install OWAMP and / or BWCTL
Begin testing:
• http://e2epi.internet2.edu/pipes/ami/bwctl/
– Key Required
• http://e2epi.internet2.edu/pipes/ami/owamp/
– No Key Required
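Partial path analysis can be sketched as a segment-by-segment check along the path. This is a minimal illustration, not piPEs code: the node names, the throughput figures, and the 900 Mb/s threshold are made-up assumptions, and `measure` stands in for whatever BWCTL or OWAMP test is actually run between two measurement nodes.

```python
from typing import Callable, Dict, List, Optional, Tuple


def first_bad_segment(
    nodes: List[str],
    measure: Callable[[str, str], float],
    threshold_mbps: float,
) -> Optional[Tuple[str, str]]:
    """Return the first adjacent pair of nodes whose measured throughput
    falls below threshold_mbps, or None if every segment passes."""
    for src, dst in zip(nodes, nodes[1:]):
        if measure(src, dst) < threshold_mbps:
            return (src, dst)
    return None


# Hypothetical per-segment results in Mb/s, standing in for real tests.
SAMPLE_RESULTS: Dict[Tuple[str, str], float] = {
    ("edge-host", "campus-dmz"): 410.0,
    ("campus-dmz", "regional-node"): 940.0,
    ("regional-node", "backbone-node"): 965.0,
}


def lookup(src: str, dst: str) -> float:
    """Look up a canned result instead of running a live test."""
    return SAMPLE_RESULTS[(src, dst)]


path = ["edge-host", "campus-dmz", "regional-node", "backbone-node"]
suspect = first_bad_segment(path, lookup, threshold_mbps=900.0)
print(suspect)
```

With the canned numbers above, the edge-to-DMZ segment is the one flagged, which is the kind of evidence an end user could take to the responsible operator.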
Abilene Measurement Domain
Part of the Abilene Observatory:
http://abilene.internet2.edu/observatory
Regularly scheduled OWAMP (1-way latency) and
BWCTL/Iperf (Throughput, Loss, Jitter) Tests
Web pages displaying:
• Latest results
http://abilene.internet2.edu/ami/bwctl_status.cgi/TCP/now
• “Weathermap”
http://abilene.internet2.edu/ami/bwctl_status_map.cgi/TCP/now
• Worst 10 Performing Links
http://abilene.internet2.edu/ami/bwctl_worst_case.cgi/TCP/now
Data available via web service:
http://abilene.internet2.edu/ami/webservices.html
Quality Control of Abilene
Measurement Infrastructure (1)
Problem Solving Approach
• Ongoing measurements start detecting a problem
• Ad-hoc measurements used for problem diagnosis
On-going Measurements
• Expect Gbps flows on Abilene
• Stock TCP stack (albeit tuned)
– Very sensitive to loss
– “Canary in a coal mine”
– Web100 just deployed for additional reporting
• Skeptical eye
– Apparent problem could reflect interface contention
Quality Control of Abilene
Measurement Infrastructure (2)
Regularly Scheduled Tests
• Track TCP and UDP Flows (BWCTL/Iperf)
• Track One-way Delays (OWAMP)
• IPv4 and IPv6
Observe:
• Worst 10 TCP flows
• First percentile TCP flow
• Fiftieth percentile TCP flow
• What percentile breaks 900 Mbps threshold
General Conclusions:
• On Abilene, IPv4 and IPv6 statistically indistinguishable
• Consistently low values to one host or across one path
indicates a problem
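The observed statistics can be computed from a batch of throughput results roughly as follows. This is a sketch under stated assumptions: the sample values are invented, and the nearest-rank percentile definition is one common choice, not necessarily the one used by the Abilene measurement infrastructure.

```python
import math
from typing import List


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100.0))
    return ordered[rank - 1]


def percent_below(samples: List[float], threshold: float) -> float:
    """Percentage of samples strictly below the threshold."""
    return 100.0 * sum(1 for s in samples if s < threshold) / len(samples)


def worst_n(samples: List[float], n: int = 10) -> List[float]:
    """The n lowest samples, lowest first."""
    return sorted(samples)[:n]


# Invented TCP throughput results in Mb/s (100 flows).
throughputs = [980.0] * 90 + [950.0] * 7 + [870.0, 610.0, 522.0]

p1 = percentile(throughputs, 1)
p50 = percentile(throughputs, 50)
below_900 = percent_below(throughputs, 900.0)
worst3 = worst_n(throughputs, 3)
print(p1, p50, below_900, worst3)
```

The median barely moves when a few flows degrade, while the first percentile and the worst-N list react immediately, which is why the low percentiles are the ones worth watching.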
A (Good) Day in the Life of
Abilene
Abilene IPv4 TCP performance
[Chart: 1st and 50th percentile throughput in Mb/s (0 to 1,000), 1-Mar-04 through 14-Mar-04]
First two weeks in March
50th percentile right at 980 Mb/s
1st percentile about 900 Mb/s
Take it as a baseline.
Abilene IPv4 TCP performance
[Chart: 1st and 50th percentile throughput in Mb/s (0 to 1,000), 1-Mar-04 through 21-Mar-04]
Beware the Ides of March
1st percentile down to 522 Mb/s
Circuit problems along west coast.
nb: 50th percentile very robust.
Abilene IPv4 TCP performance
[Chart: 1st and 50th percentile throughput in Mb/s (0 to 1,000), x-axis ticks 1-Mar-04 through 22-Apr-04]
Recovery – sort of; life through 29 April
1st percentile back up to mid-800s,
lower and shakier.
nb: 50th percentile still very robust.
Abilene IPv4 TCP performance
[Chart: 1st and 50th percentile throughput in Mb/s (0 to 1,000), x-axis ticks 1-Mar-04 through 2-May-04]
Ah, sudden improvement through 5-May
1st percentile back up above 900 Mb/s
and more stable.
But why??
Abilene IPv4 TCP performance
[Chart: 1st and 50th percentile throughput in Mb/s (0 to 1,000), x-axis ticks 1-Mar-04 through 7-May-04]
Then, while Matt Z is tearing up the tracks
1st percentile back down to the 500s.
Diagnosis: something is killing Seattle.
Oh, and Sunnyvale is off the air.
Abilene IPv4 TCP performance
[Chart: 1st and 50th percentile throughput in Mb/s (0 to 1,000), x-axis ticks 1-Mar-04 through 11-May-04]
Matt fixes Sunnyvale, and things get
(slightly) worse: both Seattle and
Sunnyvale are bad.
1st percentile right at 500 Mb/s.
Diagnosis: web100 interaction.
Abilene IPv4 TCP performance
[Chart: 1st and 50th percentile throughput in Mb/s (0 to 1,000), x-axis ticks 1-Mar-04 through 11-May-04]
Matt fixes the web100 interaction.
1st percentile cruising through 700 Mb/s.
Life is good.
Abilene IPv4 TCP performance
[Chart: 1st and 50th percentile throughput in Mb/s (0 to 1,000), x-axis ticks 1-Mar-04 through 11-May-04]
Friday the (almost) 13th; JUNOS
“up”grade induces packet loss for
about four hours along many links.
1st percentile falls to 63 Mb/s.
Long-distance paths chiefly impacted.
A “Known” Problem
Mid-May: routers all got a new software
load to enable a new feature
Everything seemed to come up, but on
some links, utilization did not rebound
Worst-10 reflected very low
performance across those links
QoS parameter configuration format
change…
Abilene IPv4 TCP performance
[Chart: 1st and 50th percentile throughput in Mb/s (0 to 1,000), x-axis ticks 1-Mar-04 through 18-May-04]
Nice weekend.
1st percentile rises to 968 Mb/s.
But why??
We Found It First
Streams over SNVA-LOSA link all
showed problems
NOC responded: Found errors on
SNVA-LOSA link
(NOC is now tracking errors more
closely…)
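The reasoning behind this diagnosis (every poorly performing stream crossed the same link) can be sketched as a set intersection over the paths of the worst flows. This is an illustration only: the paths and throughput values below are invented, the topology is hypothetical, and the slides do not describe how the actual analysis was carried out.

```python
from typing import FrozenSet, List, Set, Tuple

Flow = Tuple[List[str], float]  # (router path, throughput in Mb/s)
Link = FrozenSet[str]           # undirected link between two routers


def links_of(path: List[str]) -> Set[Link]:
    """Undirected links traversed by a router path."""
    return {frozenset(pair) for pair in zip(path, path[1:])}


def suspect_links(flows: List[Flow], worst_n: int) -> Set[Link]:
    """Links shared by all of the worst_n slowest flows, excluding
    any link that is also used by one of the remaining flows."""
    ranked = sorted(flows, key=lambda flow: flow[1])
    worst, rest = ranked[:worst_n], ranked[worst_n:]
    if not worst:
        return set()
    common = set.intersection(*(links_of(path) for path, _ in worst))
    healthy = set().union(*(links_of(path) for path, _ in rest))
    return common - healthy


# Invented flows over a hypothetical topology.
flows: List[Flow] = [
    (["STTL", "SNVA", "LOSA"], 510.0),
    (["SNVA", "LOSA", "HSTN"], 530.0),
    (["DNVR", "SNVA", "LOSA"], 495.0),
    (["STTL", "SNVA"], 960.0),
    (["LOSA", "HSTN"], 955.0),
    (["DNVR", "SNVA"], 970.0),
    (["STTL", "DNVR", "KSCY"], 975.0),
]

suspects = suspect_links(flows, worst_n=3)
print(suspects)
```

Subtracting the links used by healthy flows narrows the candidates, though in practice a degraded link can still carry some flows at full rate, so the result is a hint for the NOC rather than proof.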
Example Application
Community: VLBI (1)
Very-Long-Baseline Interferometry (VLBI) is a
high-resolution imaging technique used in
radio astronomy.
VLBI techniques involve using multiple radio
telescopes simultaneously in an array to
record data, which is then stored on magnetic
tape and shipped to a central processing site
for analysis.
Goal: Electronic transmission of VLBI data
over high-bandwidth networks (known as
“e-VLBI”).
Example Application
Community: VLBI (2)
Haystack <-> Onsala
• Abilene, Eurolink, GEANT, NorduNet, SUNET
Users: David Lapsley, Alan Whitney
Constraints
• Lack of administrative access (needed for Iperf)
• Heavily scheduled, limited windows for testing
Problem
• Insufficient performance
Partial Path Analysis with BWCTL/Iperf
• Isolated packet loss to local congestion in Haystack area
• Upgraded bottleneck link
Example Application
Community: VLBI (3)
Result
• First demonstration of real-time, simultaneous
correlation of data from two antennas (32 Mbps,
work continues)
Future
• Optimize time-of-day for non-real-time data
transfers
• Deploy BWCTL at 3 more sites beyond Haystack,
Onsala, and Kashima
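One possible approach to the time-of-day optimization is to group regularly scheduled throughput results by hour and pick the hour with the best median. This is a sketch only: the sample data is invented, and the choice of the median as the ranking statistic is an assumption rather than something stated in the slides.

```python
from collections import defaultdict
from statistics import median
from typing import DefaultDict, List, Tuple


def best_hour(samples: List[Tuple[int, float]]) -> int:
    """Given (hour_of_day, throughput_mbps) samples, return the hour
    with the highest median throughput. Ties go to the earliest hour.
    Raises ValueError if samples is empty."""
    by_hour: DefaultDict[int, List[float]] = defaultdict(list)
    for hour, mbps in samples:
        by_hour[hour].append(mbps)
    # max() keeps the first maximum it sees, so iterating hours in
    # ascending order makes the earliest hour win a tie.
    return max(sorted(by_hour), key=lambda h: median(by_hour[h]))


# Invented results: (hour of day, throughput in Mb/s).
history = [
    (3, 30.0), (3, 32.0),
    (14, 12.0), (14, 18.0),
    (20, 25.0), (20, 21.0),
]

window = best_hour(history)
print(window)
```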
Example Application
Community: ESnet / Abilene (1)
3+3 Group
• US Govt. Labs: LBL, FNAL, BNL
• Universities: NC State, OSU, SDSC
• http://measurement.es.net/
Observed:
• 400 usec 1-way Latency Jump
• Noticed by Joe Metzger
Detected:
• Circuit connecting router in the CentaurLab to the NCNI
edge router moved to a different path on metro DWDM
system
• 60 km optical distance increase
• Confirmed by John Moore
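A back-of-the-envelope check relates the extra optical distance to the latency jump. The group index of 1.47 is an assumed typical value for single-mode fiber, not a figure from the slides.

```python
SPEED_OF_LIGHT_KM_PER_S = 299_792.458  # speed of light in vacuum
FIBER_GROUP_INDEX = 1.47               # assumed typical value, not from the slides


def fiber_delay_usec(distance_km: float,
                     group_index: float = FIBER_GROUP_INDEX) -> float:
    """One-way propagation delay in microseconds over distance_km of fiber."""
    return distance_km * group_index / SPEED_OF_LIGHT_KM_PER_S * 1e6


extra_delay = fiber_delay_usec(60.0)
print(round(extra_delay))
```

Under this assumption, 60 km of added fiber contributes a little under 300 usec of one-way delay, the same order of magnitude as the 400 usec jump that was observed. The slides do not say what accounts for the remainder.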
Example Application
Community: ESnet / Abilene (2)
American / European
Collaboration Goals
Awareness of ongoing Measurement Framework
Efforts / Sharing of Ideas (Good / Not Sufficient)
Interoperable Measurement Frameworks (Minimum)
• Common means of data extraction
• Partial path analysis possible along transatlantic paths
Open Source Shared Development (Possibility, In
Whole or In Part)
End-to-end partial path analysis for transatlantic
research communities
• VLBI: Haystack, Mass. <-> Onsala, Sweden
• HENP: Caltech, Calif. <-> CERN, Switzerland
American / European
Collaboration Achievements
UCL E2E Monitoring Workshop 2003
• http://people.internet2.edu/~eboyd/ucl_workshop.html
Transatlantic Performance Monitoring
Workshop 2004
• http://people.internet2.edu/~eboyd/transatlantic_workshop.html
Caltech <-> CERN Demo
Haystack, USA <-> Onsala, Sweden
piPEs Software Evaluation (In Progress)
Architecture Reconciliation (In Progress)
How Can You Participate?
Set up BWCTL, OWAMP, NDT Beacons
Set up a measurement domain
• Now: Place tool beacons “intelligently”
– Determine locations
– Determine policy
– Determine limits
– “Register” beacons
• Future: Install piPEs software
– Run regularly scheduled tests
– Store performance data
– Make performance data available via web service
– Make visualization CGIs available
Solve Problems / Alert us to Case Studies
Extra Slides
American/European
Demonstration Goals
Demonstrate ability to do partial path
analysis between “Caltech” (Los Angeles
Abilene router) and CERN.
Demonstrate ability to do partial path
analysis involving nodes in the GEANT
network.
Compare and contrast measurement of a
“lightpath” versus a normal IP path.
Demonstrate interoperability of piPEs and
analysis tools such as Advisor and
MonALISA
Demonstration Details
Path 1: Default route between LA and CERN
is across Abilene to Chicago, then across
Datatag circuit to CERN
Path 2: Announced addresses so that route
between LA and CERN traverses GEANT via
London node
Path 3: “Lightpath” (discussed earlier by Rick
Summerhill)
Each measurement “node” consists of a
BWCTL box and an OWAMP box “next to”
the router.
All Roads Lead to Geneva
Path 1: DataTag (Default Route)
Path 2: Eurolink (“Cooked” Alternate Route)
Path 3: Lightpath (“Cooked” Alternate Route)
Circles Correspond to OWAMP / BWCTL Measurement Node Pair
Results
BWCTL:
http://abilene.internet2.edu/ami/bwctl_status_eu.cgi/BW/14123130651515289600_14124243902743445504
OWAMP:
http://abilene.internet2.edu/ami/owamp_status_eu.cgi/14123130651515289600_14124243902743445504
MONALISA
NLANR Advisor
Insights (1)
Even with shared source and a single team
of developer-installers, inter-administrative
domain coordination is difficult.
• Struggled with basics of multiple paths.
– IP addresses, host configuration, software (support source
addresses, etc.)
• Struggled with cross-domain administrative coordination
issues.
– AA (accounts), routes, port filters, MTUs, etc.
• Struggled with performance tuning measurement nodes.
– host tuning, asymmetric routing, MTUs
Insights (2)
Connectivity takes a large amount of
coordination and effort; performance
takes even more of the same.
Current measurement approaches have
limited visibility into “lightpaths.”
• Having hosts participate in the measurement is
one possible solution.
Insights (3)
Consider interaction with security; lack
of end-to-end transparency is
problematic.
• Security filters are set up based on expected
traffic patterns
• Measurement nodes create new traffic
• Lightpaths bypass expected ingress points