Internet2 Presentation Template

Internet2 E2E piPEs Project
Eric L. Boyd
17 July 2015
Internet2 E2E piPEs
Project: End-to-End Performance Initiative Performance Environment System (E2E piPEs)
Approach: Collaborative project combining the best work of many organizations, including DANTE/GEANT, Daresbury, EGEE, GGF NMWG, NLANR/DAST, UCL, Georgia Tech, etc.
7/17/2015
Internet2 E2E piPEs Goals
Enable end-users & network operators to:
• determine E2E performance capabilities
• locate E2E problems
• contact the right person to get an E2E problem resolved
Enable remote initiation of partial path performance tests
Make partial path performance data publicly available
Interoperable with other performance measurement frameworks
Measurement Infrastructure Components
[Diagram: an End-to-End Path between Routers, with Regularly Scheduled Tests and On-Demand Tests. Labeled elements: Server, Laptop computer, Test Request, Test Results, Result Request, Database of Performance Results]
Sample piPEs Deployment
[Diagram: Backbone Nodes and Regional Nodes on a Network Backbone. Legend: Regularly Scheduled Tests, On-Demand Tests, Result Collection. Data stores: Network Backbone Test Data, Regional Test Data, Application Domain Test Data]
Project Phases
Phase 1: Tool Beacons
• BWCTL (Complete), http://e2epi.internet2.edu/bwctl
• OWAMP (Complete), http://e2epi.internet2.edu/owamp
• NDT (Complete), http://e2epi.internet2.edu/ndt
Phase 2: Measurement Domain Support
• General Measurement Infrastructure (Prototype)
• Abilene Measurement Infrastructure Deployment (Complete), http://abilene.internet2.edu/observatory
Phase 3: Federation Support
• AA (Prototype – optional AES key, policy file, limits file)
• Discovery (Measurement Nodes, Databases) (Prototype – nearest NDT server, web page)
• Test Request/Response Schema Support (Prototype – GGF NMWG Schema)
BWCTL (Jeff Boote)
http://e2epi.internet2.edu/bwctl
[Diagram labels: bwctl client; Initial connection and Requests/Results to a bwctld Resource Broker on each of two hosts; Verify Time / Return Results between the bwctld processes; iperf Test Stream between the two iperf endpoints]
OWAMP (Jeff Boote)
http://e2epi.internet2.edu/owamp
[Diagram labels: Client side: owping client [control], OWD test endpoint. Server side: owampd [Resource Broker], owampd [control], OWD test endpoint. Arrows: Initial connection, Requests/Results]
NDT (Rich Carlson)
Network Diagnostic Tester
• Developed at Argonne National Lab
• Ongoing integration into piPEs framework
Redirects from well-known host to “nearest” measurement node
Detects common performance problems in the “first mile” (edge to campus DMZ)
In deployment on Abilene:
• http://ndt-seattle.abilene.ucaid.edu:7123
piPEs Deployment
In Progress
Abilene
US Govt. Labs
US Universities
GEANT
APAN
Israel
Italy
Poland
Example piPEs Use Cases
Edge-to-Middle (On-Demand)
• Automatic 2-Ended Test Set-up
Middle-to-Middle (Regularly Scheduled)
• Raw Data feeds for 3rd-Party Analysis Tools
– http://vinci.cacr.caltech.edu:8080/
• Quality Control of Network Infrastructure
Edge-to-Edge (Regularly Scheduled)
• Quality Control of Application Communities
Edge-to-Campus DMZ (On-Demand)
• Coupled with Regularly Scheduled Middle-to-Middle
• End User determines who to contact about performance problem, armed with proof
Test from the Edge to the Middle
Divide and conquer: Partial Path Analysis
Install OWAMP and / or BWCTL
Begin testing!:
• http://e2epi.internet2.edu/pipes/ami/bwctl/
– Key Required
• http://e2epi.internet2.edu/pipes/ami/owamp/
– No Key Required
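A minimal sketch of the divide-and-conquer idea, assuming the path can be broken into adjacent measurement points and that each partial path can be tested on its own. The node names, the 900 Mb/s threshold, and the canned results are hypothetical; the `measure` callback stands in for a real BWCTL or OWAMP test and is not part of either tool.

```python
from typing import Callable, List, Optional, Tuple


def first_bad_segment(
    nodes: List[str],
    measure: Callable[[str, str], float],
    threshold_mbps: float,
) -> Optional[Tuple[str, str]]:
    """Test each adjacent partial path in order and return the first
    one whose measured throughput falls below the threshold."""
    for a, b in zip(nodes, nodes[1:]):
        if measure(a, b) < threshold_mbps:
            return (a, b)
    return None


# Hypothetical path and canned results standing in for real tests.
path = ["edge-host", "campus-dmz", "regional-node", "backbone-node", "far-end"]
canned = {
    ("edge-host", "campus-dmz"): 940.0,
    ("campus-dmz", "regional-node"): 310.0,
    ("regional-node", "backbone-node"): 955.0,
    ("backbone-node", "far-end"): 948.0,
}

result = first_bad_segment(path, lambda a, b: canned[(a, b)], 900.0)
print(result)
```

With real tools, each call to `measure` would be one on-demand test between two measurement nodes, so the end user can see which partial path is responsible before contacting anyone.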
Abilene Measurement Domain
Part of the Abilene Observatory: http://abilene.internet2.edu/observatory
Regularly scheduled OWAMP (1-way latency) and BWCTL/Iperf (Throughput, Loss, Jitter) Tests
Web pages displaying:
• Latest results: http://abilene.internet2.edu/ami/bwctl_status.cgi/TCP/now
• “Weathermap”: http://abilene.internet2.edu/ami/bwctl_status_map.cgi/TCP/now
• Worst 10 Performing Links: http://abilene.internet2.edu/ami/bwctl_worst_case.cgi/TCP/now
Data available via web service: http://abilene.internet2.edu/ami/webservices.html
Quality Control of Abilene Measurement Infrastructure (1)
Problem Solving Approach
• Ongoing measurements start detecting a problem
• Ad-hoc measurements used for problem diagnosis
Ongoing Measurements
• Expect Gbps flows on Abilene
• Stock TCP stack (albeit tuned)
– Very sensitive to loss
– “Canary in a coal mine”
– Web100 just deployed for additional reporting
• Skeptical eye
– Apparent problem could reflect interface contention
Quality Control of Abilene Measurement Infrastructure (2)
Regularly Scheduled Tests
• Track TCP and UDP Flows (BWCTL/Iperf)
• Track One-way Delays (OWAMP)
• IPv4 and IPv6
Observe:
• Worst 10 TCP flows
• First percentile TCP flow
• Fiftieth percentile TCP flow
• What percentile breaks 900 Mbps threshold
General Conclusions:
• On Abilene, IPv4 and IPv6 statistically indistinguishable
• Consistently low values to one host or across one path indicate a problem
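The summary statistics listed above can be sketched as follows. This is an illustration only: the host names and throughput values are made up, the nearest-rank percentile definition is an assumption (the slides do not say which definition was used), and "what percentile breaks 900 Mbps" is read here as the share of flows falling below 900 Mb/s.

```python
import math
from typing import Dict, List, Tuple


def nearest_rank_percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of
    the samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def percent_below(values: List[float], threshold: float) -> float:
    """Share of samples (in percent) strictly below the threshold."""
    return 100.0 * sum(1 for v in values if v < threshold) / len(values)


def worst_n(results: Dict[Tuple[str, str], float], n: int = 10):
    """The n lowest-throughput (source, destination) pairs."""
    return sorted(results.items(), key=lambda kv: kv[1])[:n]


# Hypothetical TCP throughput results in Mb/s, keyed by (source, destination).
results = {
    ("A", "B"): 980.0, ("A", "C"): 975.0, ("B", "A"): 982.0,
    ("B", "C"): 522.0, ("C", "A"): 968.0, ("C", "B"): 910.0,
}
values = list(results.values())

p1 = nearest_rank_percentile(values, 1)
p50 = nearest_rank_percentile(values, 50)
share_below_900 = percent_below(values, 900.0)
worst = worst_n(results, 2)
print(p1, p50, round(share_below_900, 1), worst)
```

Tracking the 1st percentile alongside the 50th is what makes the bad paths visible: the median barely moves when a single path degrades, while the low percentile and the worst-flows list point straight at it.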
A (Good) Day in the Life of Abilene
Abilene IPv4 TCP performance
[Chart: 1st percentile and 50th percentile TCP throughput in Mb/s (0 to 1,000), 1-Mar-04 to 14-Mar-04]
First two weeks in March
50th percentile right at 980 Mb/s
1st percentile about 900 Mb/s
Take it as a baseline.
Abilene IPv4 TCP performance
[Chart: 1st percentile and 50th percentile TCP throughput in Mb/s (0 to 1,000), 1-Mar-04 to 21-Mar-04]
Beware the Ides of March
1st percentile down to 522 Mb/s
Circuit problems along west coast.
nb: 50th percentile very robust.
Abilene IPv4 TCP performance
[Chart: 1st percentile and 50th percentile TCP throughput in Mb/s (0 to 1,000), axis ticks from 1-Mar-04 to 22-Apr-04]
Recovery – sort of; life through 29 April
1st percentile back up to mid-800s, lower and shakier.
nb: 50th percentile still very robust.
Abilene IPv4 TCP performance
[Chart: 1st percentile and 50th percentile TCP throughput in Mb/s (0 to 1,000), axis ticks from 1-Mar-04 to 2-May-04]
Ah, sudden improvement through 5-May
1st percentile back up above 900 Mb/s and more stable.
But why??
Abilene IPv4 TCP performance
[Chart: 1st percentile and 50th percentile TCP throughput in Mb/s (0 to 1,000), axis ticks from 1-Mar-04 to 7-May-04]
Then, while Matt Z is tearing up the tracks
1st percentile back down to the 500s.
Diagnosis: something is killing Seattle.
Oh, and Sunnyvale is off the air.
Abilene IPv4 TCP performance
[Chart: 1st percentile and 50th percentile TCP throughput in Mb/s (0 to 1,000), axis ticks from 1-Mar-04 to 11-May-04]
Matt fixes Sunnyvale, and things get (slightly) worse: both Seattle and Sunnyvale are bad.
1st percentile right at 500 Mb/s.
Diagnosis: web100 interaction.
Abilene IPv4 TCP performance
[Chart: 1st percentile and 50th percentile TCP throughput in Mb/s (0 to 1,000), axis ticks from 1-Mar-04 to 11-May-04]
Matt fixes the web100 interaction.
1st percentile cruising through 700 Mb/s.
Life is good.
Abilene IPv4 TCP performance
[Chart: 1st percentile and 50th percentile TCP throughput in Mb/s (0 to 1,000), axis ticks from 1-Mar-04 to 11-May-04]
Friday the (almost) 13th; JUNOS “up”grade induces packet loss for about four hours along many links.
1st percentile falls to 63 Mb/s.
Long-distance paths chiefly impacted.
A “Known” Problem
Mid-May: routers all got a new software load to enable a new feature
Everything seemed to come up, but on some links, utilization did not rebound
Worst-10 reflected very low performance across those links
QoS parameter configuration format change…
Abilene IPv4 TCP performance
[Chart: 1st percentile and 50th percentile TCP throughput in Mb/s (0 to 1,000), axis ticks from 1-Mar-04 to 18-May-04]
Nice weekend.
1st percentile rises to 968 Mb/s.
But why??
We Found It First
Streams over SNVA-LOSA link all showed problems
NOC responded: Found errors on SNVA-LOSA link
(NOC is now tracking errors more closely…)
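One way to get from "these streams are bad" to "this link is suspect" is to look for links that every bad stream crosses and no good stream crosses. The sketch below assumes the path of each test stream is known; the flow names and every link other than SNVA-LOSA are hypothetical, and this is not presented as the method actually used on Abilene.

```python
from typing import Dict, List, Set


def suspect_links(paths: Dict[str, List[str]], bad_flows: Set[str]) -> Set[str]:
    """Links shared by every bad flow and used by no good flow."""
    bad_sets = [set(paths[flow]) for flow in bad_flows]
    if not bad_sets:
        return set()
    common = set.intersection(*bad_sets)
    good_links: Set[str] = set()
    for flow, links in paths.items():
        if flow not in bad_flows:
            good_links.update(links)
    return common - good_links


# Hypothetical flows and the links each one traverses.
paths = {
    "flow1": ["STTL-SNVA", "SNVA-LOSA"],
    "flow2": ["SNVA-LOSA", "LOSA-HSTN"],
    "flow3": ["STTL-SNVA"],
    "flow4": ["LOSA-HSTN"],
}
bad = {"flow1", "flow2"}

suspects = suspect_links(paths, bad)
print(suspects)
```

The result is only a hint to hand to the NOC: a link can be innocent even if all bad flows share it, so the interface error counters remain the confirmation.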
Example Application Community: VLBI (1)
Very-Long-Baseline Interferometry (VLBI) is a high-resolution imaging technique used in radio astronomy.
VLBI techniques involve using multiple radio telescopes simultaneously in an array to record data, which is then stored on magnetic tape and shipped to a central processing site for analysis.
Goal: Electronic transmission of VLBI data over high-bandwidth networks (known as “e-VLBI”).
Example Application Community: VLBI (2)
Haystack <-> Onsala
• Abilene, Eurolink, GEANT, NorduNet, SUNET
Users: David Lapsley, Alan Whitney
Constraints
• Lack of administrative access (needed for Iperf)
• Heavily scheduled, limited windows for testing
Problem
• Insufficient performance
Partial Path Analysis with BWCTL/Iperf
• Isolated packet loss to local congestion in Haystack area
• Upgraded bottleneck link
Example Application Community: VLBI (3)
Result
• First demonstration of real-time, simultaneous correlation of data from two antennas (32 Mbps, work continues)
Future
• Optimize time-of-day for non-real-time data transfers
• Deploy BWCTL at 3 more sites beyond Haystack, Onsala, and Kashima
Example Application Community: ESnet / Abilene (1)
3+3 Group
• US Govt. Labs: LBL, FNAL, BNL
• Universities: NC State, OSU, SDSC
• http://measurement.es.net/
Observed:
• 400 usec 1-way Latency Jump
• Noticed by Joe Metzger
Detected:
• Circuit connecting router in the CentaurLab to the NCNI edge router moved to a different path on metro DWDM system
• 60 km optical distance increase
• Confirmed by John Moore
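As a rough plausibility check on the two numbers above, the extra one-way propagation delay from 60 km of fibre can be estimated. The group index of 1.47 is an assumed typical value for optical fibre, not a figure from the slides.

```python
# Speed of light in vacuum, in km/s.
C_VACUUM_KM_PER_S = 299_792.458
# Assumed group index of the fibre (typical value, not from the slides).
GROUP_INDEX = 1.47


def fibre_delay_usec(extra_km: float, group_index: float = GROUP_INDEX) -> float:
    """One-way propagation delay, in microseconds, added by extra fibre."""
    speed_km_per_s = C_VACUUM_KM_PER_S / group_index
    return extra_km / speed_km_per_s * 1e6


delay = fibre_delay_usec(60.0)
print(round(delay))
```

Under that assumption, propagation alone gives a delay a little under 300 usec, the same order of magnitude as the 400 usec jump that was observed. The slides do not say what accounts for the difference.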
Example Application Community: ESnet / Abilene (2)
American / European Collaboration Goals
Awareness of ongoing Measurement Framework Efforts / Sharing of Ideas (Good / Not Sufficient)
Interoperable Measurement Frameworks (Minimum)
• Common means of data extraction
• Partial path analysis possible along transatlantic paths
Open Source Shared Development (Possibility, In Whole or In Part)
End-to-end partial path analysis for transatlantic research communities
• VLBI: Haystack, Mass. <-> Onsala, Sweden
• HENP: Caltech, Calif. <-> CERN, Switzerland
American / European Collaboration Achievements
UCL E2E Monitoring Workshop 2003
• http://people.internet2.edu/~eboyd/ucl_workshop.html
Transatlantic Performance Monitoring Workshop 2004
• http://people.internet2.edu/~eboyd/transatlantic_workshop.html
Caltech <-> CERN Demo
Haystack, USA <-> Onsala, Sweden
piPEs Software Evaluation (In Progress)
Architecture Reconciliation (In Progress)
How Can You Participate?
Set up BWCTL, OWAMP, NDT Beacons
Set up a measurement domain
• Now: Place tool beacons “intelligently”
– Determine locations
– Determine policy
– Determine limits
– “Register” beacons
• Future: Install piPEs software
– Run regularly scheduled tests
– Store performance data
– Make performance data available via web service
– Make visualization CGIs available
Solve Problems / Alert us to Case Studies
Extra Slides
American/European Demonstration Goals
Demonstrate ability to do partial path analysis between “Caltech” (Los Angeles Abilene router) and CERN.
Demonstrate ability to do partial path analysis involving nodes in the GEANT network.
Compare and contrast measurement of a “lightpath” versus a normal IP path.
Demonstrate interoperability of piPEs and analysis tools such as Advisor and MonALISA
Demonstration Details
Path 1: Default route between LA and CERN is across Abilene to Chicago, then across Datatag circuit to CERN
Path 2: Announced addresses so that route between LA and CERN traverses GEANT via London node
Path 3: “Lightpath” (discussed earlier by Rick Summerhill)
Each measurement “node” consists of a BWCTL box and an OWAMP box “next to” the router.
All Roads Lead to Geneva
Path 1 – DataTag – Default Route
Path 2 – Eurolink – “Cooked” Alternate Route
Path 3 – Lightpath – “Cooked” Alternate Route
Circles Correspond to OWAMP / BWCTL Measurement Node Pair
Results
BWCTL: http://abilene.internet2.edu/ami/bwctl_status_eu.cgi/BW/14123130651515289600_14124243902743445504
OWAMP: http://abilene.internet2.edu/ami/owamp_status_eu.cgi/14123130651515289600_14124243902743445504
MONALISA
NLANR Advisor
Insights (1)
Even with shared source and a single team of developer-installers, inter-administrative domain coordination is difficult.
• Struggled with basics of multiple paths.
– IP addresses, host configuration, software (support source addresses, etc.)
• Struggled with cross-domain administrative coordination issues.
– AA (accounts), routes, port filters, MTUs, etc.
• Struggled with performance tuning measurement nodes.
– host tuning, asymmetric routing, MTUs
Insights (2)
Connectivity takes a large amount of coordination and effort; performance takes even more of the same.
Current measurement approaches have limited visibility into “lightpaths.”
• Having hosts participate in the measurement is one possible solution.
Insights (3)
Consider interaction with security; lack of end-to-end transparency is problematic.
• Security filters are set up based on expected traffic patterns
• Measurement nodes create new traffic
• Lightpaths bypass expected ingress points