The HOPI Testbed and the new Internet2 Network


The Internet2 Network Observatory
Rick Summerhill
Director Network Research, Architecture, and Technologies
Brian Cashman
Network Planning Manager
Matt Zekauskas
Senior Engineer
Eric Boyd
Director Performance Architecture and Technologies
Internet2 Fall Member Meeting
6 December 2006
Chicago, IL
Agenda
• Introduction
• History and Motivation
• What is the Observatory?
• Examples of Research Projects
• The New Internet2 Observatory
• Initial Observatory Realization
• Measurement Capabilities
• Hardware Deployment in New Racks
• Observatory Usage
• Uses to date
• Network Research Considerations
• Future uses (and collections)
• Sharing Observatory Data and Tools for Inter-domain Use
• perfSONAR
History and Motivation
•Original Abilene racks included measurement
devices
• Included a single (somewhat large) PC
• Early OWAMP, surveyor measurements
• Optical splitters at some locations
•Motivation was primarily operations, monitoring,
and management - understanding the network and
how well it operates
•Data was collected and maintained whenever
possible
• Primarily a NOC function
• Available to other network operators to understand the network
• It became apparent that the datasets were valuable as a
network research tool
The Abilene Upgrade Network
Upgrade of the Abilene Observatory
• An important decision was made during the
Abilene upgrade process (Juniper T-640
routers and OC-192c)
• Two racks, one of which was dedicated to measurement
• Potential for research community to collocate equipment
• Two components to the Observatory
• Collocation - network research groups are able to collocate
equipment in the Abilene router nodes
• Measurement - data is collected by the NOC, the Ohio ITEC,
and Internet2, and made available to the research community
An Abilene router node
(Rack diagram labels: Power (48VDC); Measurement Machines (nms); Space for Collocation; Eth. Switch; Measurement (Observatory) Rack; Out-of-band (M-5); T-640)
Dedicated servers at each node
• Houston Router Node - In this picture:
• Measurement machines
• Collocated PlanetLab machines
Example Research Projects
• Collocation projects
• PlanetLab – Nodes installed in all Abilene Router Nodes.
See http://www.planet-lab.org
• The Passive Measurement and Analysis Project (PMA): The Router Clamp. See http://pma.nlanr.net
• Projects using collected datasets. See
http://abilene.internet2.edu/observatory/researchprojects.html
• “Modular Strategies for Internetwork Monitoring”
• “Algorithms for Network Capacity Planning and Optimal
Routing Based on Time-Varying Traffic Matrices”
• “Spatio-Temporal Network Analysis”
• “Assessing the Presence and Incidence of Alpha Flows in
Backbone Networks”
The New Internet2 Network
• Expanded Layer 1, 2 and 3 Facilities
• Includes SONET and Wave equipment
• Includes Ethernet Services
• Greater IP Services
• Requires a new type of Observatory
The New Internet2 Network
The New Internet2 Observatory
• Seek Input from the Community, both Engineers and
Network Researchers
• Current thinking is to support three types of services
• Measurement (as before)
• Collocation (as before)
• Experimental Servers to support specific projects - for
example, Phoebus (this is new)
• Support different types of nodes:
• Optical Nodes
• Router Nodes
• For example, as illustrated in the following diagrams
• Brian, Eric, and Matt will talk further about the Observatory
Nodes
Router Nodes
Optical Nodes
The New York Node - First Installment
Existing Observatory Capabilities
• One way latency, jitter, loss
• IPv4 and IPv6 (“owamp”)
• Regular TCP/UDP throughput tests – ~1 Gbps
• IPv4 and IPv6; On-demand available (“bwctl”)
• SNMP
• Octets, packets, errors; collected 1/min
• Flow data
• Addresses anonymized by 0-ing the low order 11 bits
• Routing updates
• Both IGP and BGP - Measurement device participates in both
• Router configuration
• Visible Backbone – Collect 1/hr from all routers
• Dynamic updates
• Syslog; also alarm generation (~nagios); polling via router proxy
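The flow-data anonymization listed above (zeroing the low-order 11 bits of each address) can be illustrated with a minimal Python sketch. It assumes IPv4 addresses only; the slides do not say how IPv6 flow addresses are treated, and the function name and sample address are illustrative, not taken from the Observatory tooling.

```python
import ipaddress

LOW_BITS = 11  # number of low-order bits zeroed, as stated on the slide


def anonymize_ipv4(addr: str, low_bits: int = LOW_BITS) -> str:
    """Return the IPv4 address with its low-order `low_bits` bits set to zero."""
    value = int(ipaddress.IPv4Address(addr))
    # Build a 32-bit mask that keeps the high (32 - low_bits) bits.
    mask = (0xFFFFFFFF >> low_bits) << low_bits
    return str(ipaddress.IPv4Address(value & mask))


if __name__ == "__main__":
    # 192.0.2.0/24 is a documentation range, used here only as sample input.
    print(anonymize_ipv4("192.0.2.77"))  # prints 192.0.0.0
```

With 11 bits zeroed, every address in the same aligned block of 2048 addresses maps to one value, so reports built from this data show truncated addresses rather than individual hosts.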
Observatory Functions
Device      Function       Details
nms-rthr1   Measurement    BWCTL on-demand 1 Gbps router throughput, Thrulay
nms-rthr2   Measurement    BWCTL on-demand 10 Gbps router throughput, Thrulay
nms-rexp    Experimental   NDT/NPAD
nms-rpsv    Measurement    Netflow collector
nms-rlat    Measurement    OWAMP with locally attached GPS timing
nms-rpho    Experimental   Phoebus 2 x 10GE to Multiservice Switch
nms-octr    Management     Controls Multiservice Switch
nms-oexp    Experimental   NetFPGA
nms-othr    Measurement    On-demand Multiservice Switch 10 Gbps throughput
Router Nodes
Optical Nodes
Observatory Hardware
• Dell 1950 and Dell 2950 servers
• Dual Core 3.0 GHz Xeon processors
• 2 GB memory
• Dual RAID 146 GB disk
• Integrated 1 GE copper interfaces
• 10 GE interfaces
• Hewlett-Packard 10GE switches
• 9 servers at router sites, 3 at optical only sites
Observatory Databases – Data Types
•Data is collected locally and stored in
distributed databases
•Databases
• Usage Data
• Netflow Data
• Routing Data
• Latency Data
• Throughput Data
• Router Data
• Syslog Data
Sub-outline: Uses and Futures
• Some uses of existing datasets and tools
• Quality Control
• Network Diagnosis
• Network Characterization
• Network Research
• Consultation with researchers
• Open questions
Recall: Datasets
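(QC = Quality Control, ND = Network Diagnosis, NC = Network Characterization, NR = Network Research)
•Usage Data: ND, NR
•Netflow Data: ND, NC, NR
•Routing Data: NR
•Latency Data: QC, ND, NR
•Throughput Data: QC, ND, NR
•Router Data: ND, NR
•Syslog Data: NR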
And, of course, most used for operations
Quality Control: e-VLBI
• When starting to connect telescopes, needed
to verify inter-site paths
• Set up throughput testing among sites (using
same Observatory tool: bwctl)
• Kashima, JP
• Onsala, SE
• Boston, MA (Haystack)
• Collect and graph data; distribute via web
• Quick QC check before applications tests start
Network Diagnosis: e-VLBI
• Target at the time: 50Mbps
• Oops: Onsala-Boston: 1Mbps
• Divide and Conquer
• Verify Abilene backbone tests look good
• Use Abilene test point in Washington DC
• Eliminated European and trans-Atlantic pieces
• Focus on problem: found oversubscribed link
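The divide-and-conquer approach above can be sketched as a binary search over test points along the path. This is a hypothetical illustration, not the procedure actually used: it assumes a single bad segment, an ordered list of test points, and a caller-supplied `measure` function (for example, one that wraps a throughput test). All names here are invented for the sketch.

```python
from typing import Callable, Sequence, Tuple


def locate_bad_segment(
    points: Sequence[str],
    measure: Callable[[str, str], float],
    target_mbps: float,
) -> Tuple[str, str]:
    """Narrow down the segment where throughput falls below target.

    Assumes the end-to-end test (points[0] to points[-1]) already misses
    the target, and that a test from points[0] to an intermediate point
    meets the target only if every segment before that point is healthy.
    """
    lo, hi = 0, len(points) - 1  # points[0]->points[lo] good, ->points[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if measure(points[0], points[mid]) >= target_mbps:
            lo = mid  # the problem lies beyond the midpoint
        else:
            hi = mid  # the problem lies at or before the midpoint
    return points[lo], points[hi]


if __name__ == "__main__":
    # Placeholder test points and a fake measurement with a fault between C and D.
    path = ["A", "B", "C", "D", "E"]

    def fake_measure(src: str, dst: str) -> float:
        return 1.0 if path.index(dst) >= 3 else 60.0

    print(locate_bad_segment(path, fake_measure, target_mbps=50.0))
```

Testing from a midpoint such as the Washington DC test point plays the role of one `measure` call: a good result clears everything on one side of that point in a single step.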
Quality Control: IP Backbone
• Machines with 1GE interfaces, 9000
MTU
• Full mesh
• IPv4 and IPv6
• Expect > 950 Mbps TCP
• Keep list of “Worst 10”
• If any path < 900 Mbps for two
successive testing intervals, throw alarm
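The alarm rule and "Worst 10" list above can be sketched in a few lines of Python. The 900 Mbps threshold and the two-interval rule come from the slide; the class, method names, and path labels are assumptions made for illustration and do not describe the actual Observatory software.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

ALARM_THRESHOLD_MBPS = 900.0  # threshold from the slide
CONSECUTIVE_INTERVALS = 2     # "two successive testing intervals"

Path = Tuple[str, str]  # (source node, destination node), hypothetical labels


class ThroughputMonitor:
    """Track the latest throughput per path and flag sustained low results."""

    def __init__(self) -> None:
        self._low_streak: Dict[Path, int] = defaultdict(int)
        self._latest: Dict[Path, float] = {}

    def record(self, path: Path, mbps: float) -> bool:
        """Store one test result; return True if the path should alarm."""
        self._latest[path] = mbps
        if mbps < ALARM_THRESHOLD_MBPS:
            self._low_streak[path] += 1
        else:
            self._low_streak[path] = 0  # a good interval resets the streak
        return self._low_streak[path] >= CONSECUTIVE_INTERVALS

    def worst(self, n: int = 10) -> List[Tuple[Path, float]]:
        """Return the n paths with the lowest most-recent throughput."""
        return sorted(self._latest.items(), key=lambda item: item[1])[:n]


if __name__ == "__main__":
    monitor = ThroughputMonitor()
    print(monitor.record(("nodeA", "nodeB"), 850.0))  # first low interval
    print(monitor.record(("nodeA", "nodeB"), 880.0))  # second low interval
    print(monitor.worst())
```

Requiring two successive low intervals keeps a single noisy test from raising an alarm, at the cost of one extra testing interval before a real problem is reported.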
Quality Control: Peerings
• Internet2 and ESnet have been watching
the latency across peering points for a
while.
• Internet2 and DREN have been
preparing to do some throughput and
latency testing
• During the course of this set up, found
interesting routing and MTU size issues
Network Diagnosis: End Hosts
• NDT, NPAD servers
• Quick check from a host that has a
browser
• Easily eliminate (or confirm) last mile
problems (buffer sizing, duplex
mismatch, …)
• NPAD can find switch limitations,
provided the server is close enough
Network Diagnosis: Generic
• Generally looking for configuration & loss
• Don’t forget security appliances
• Is there connectivity & reasonable
latency? (ping -> OWAMP)
• Is routing reasonable (traceroute, proxy)
• Is host reasonable (NDT; NPAD)
• Is path reasonable (BWCTL)
Network Characterization
• Flow data collected with flow-tools
package
• All data not used for security alerts and
analysis [REN-ISAC] is anonymized
• Reports from anonymized data available
(see truncated addresses)
• Additionally, some Engineering reports
Network Research Projects
• Major consumption
• Flows
• Routes
• Configuration
• Nick Feamster (while at MIT)
• Dave Maltz (while at CMU)
• Papers in SIGCOMM, INFOCOM
• Hard to track folks that just pull data off of web sites
Network Research Facilities Grant
• Thanks to NSF funds, access to network
researchers for 1.5 yrs
• Interviews
• Presentations at Network Research
conferences and workshops
This material is based in part on work supported by the National Science Foundation
(NSF) under Grant No. SCI-0441149. Any opinions, findings and conclusions or
recommendations expressed in this material are those of the author(s) and do not
necessarily reflect the views of the NSF.
Grant Result Snippets
• Liked Abilene observatory. Keep passive!
• Biggest thing – more data
• But -- network research project driven
• Security-related: want payload
• Want some way to get more information from
flow data
• Alternate anonymization techniques
• Community consensus on passive measurement
Grant Results Snippets
• Want pool of researcher-developed
access tools (sharing among
researchers)
• Want ability to request new data sets
• Both new sources, and derived data
• Extend to cover new facilities (they were
thinking HOPI and L2VPNs, but…)
Lots of Work to be Done
• Internet2 Observatory realization inside racks
set for initial deployment, including new
research projects (NetFPGA, Phoebus)
• Software and links easily changed
• Could add or change hardware depending on
costs
• Researcher tools, new datasets
• Consensus on passive data
Not Just Research
• Operations and Characterization of new
services
• Finding problems with stitched together VLANs
• Collecting and exporting data from Dynamic Circuit
Service...
• Ciena performance counters
• Control plane setup information
• Circuit usage (not utilization, although that is also nice)
• Similar for underlying Infinera equipment
• And consider inter-domain issues
Observatory Requirements Strawman
• Small group: Dan Magorian, Joe Metzger and
Internet2
• See document off of
http://measurement.internet2.edu/
• Want to start working group under new
Network Technical Advisory Committee
• Interested? Talk to Matt or watch NTAC Wiki on
wiki.internet2.edu; measurement page will also
have some information…
Strawman: Potential New Focus Areas
• Technology Issues
• Is it working? How well? How do we debug
problems?
• Economy Issues – interdomain circuits
• How are they used? Are they used
effectively? Monitor violation of any rules
(e.g. for short-term circuits)
• Compare with “vanilla” IP services?
Strawman: Potential High-Level Goals
• Extend research datasets to new equipment
• Circuit “weathermap”; optical proxy
• Auditing Circuits
• Who requested (at suitable granularity)
• What for? (ex: bulk data, streaming media,
experiment control)
• Why? (add’l bw, required characteristics,
application isolation, security)
Inter-Domain Issues Important
• New services (various circuits)
• New control plane
• That must work across domains
• Will require some agreement among
various providers
• Want to allow for diversity…
Sharing Observatory Data
We want to make Internet2 Network
Observatory Data:
• Available:
• Access to existing active and passive
measurement data
• Ability to run new active measurement tests
• Interoperable:
• Common schema and semantics, shared across
other networks
• Single format
• XML-based discovery of what’s available
What is perfSONAR?
• Performance Middleware
• perfSONAR is an international consortium in which
Internet2 is a founder and leading participant
• perfSONAR is a set of protocol standards for
interoperability between measurement and
monitoring systems
• perfSONAR is a set of open source web services
that can be mixed-and-matched and extended to
create a performance monitoring framework
perfSONAR Design Goals
• Standards-based
• Modular
• Decentralized
• Locally controlled
• Open Source
• Extensible
• Applicable to multiple generations of network
monitoring systems
• Grows “beyond our control”
• Customized for individual science disciplines
perfSONAR Integrates
• Network measurement tools
• Network measurement archives
• Discovery
• Authentication and authorization
• Data manipulation
• Resource protection
• Topology
perfSONAR Credits
•perfSONAR is a joint effort:
• ESnet
• GÉANT2 JRA1
• Internet2
• RNP
•ESnet includes:
• ESnet/LBL staff
• Fermilab
•Internet2 includes:
• University of Delaware
• Georgia Tech
• SLAC
• Internet2 staff
•GÉANT2 JRA1 includes:
• Arnes
• Belnet
• Carnet
• Cesnet
• CYNet
• DANTE
• DFN
• FCCN
• GRNet
• GARR
• ISTF
• PSNC
• Nordunet (Uninett)
• Renater
• RedIRIS
• Surfnet
• SWITCH
perfSONAR Adoption
•R&E Networks
• Internet2
• ESnet
• GÉANT2
• European NRENs
• RNP
•Application Communities
• LHC
• GLORIAD Distributed
Virtual NOC
• Roll-out to other
application communities
in 2007
•Distributed Development
• Individual projects (10
before first release) write
components that integrate
into the overall framework
• Individual communities (5
before first release) write
their own analysis and
visualization software
Proposed Data to be made available via
perfSONAR
• First Priorities
• Link status (CIENA and Infinera data)
• VLAN
• SONET (Severely errored seconds, etc.)
• Light levels
• SNMP data
• OWAMP
• BWCTL
• Second Priorities
• Flow data
• Feedback? Alternate priorities?
What will (eventually) consume data?
• We intend to create a series of web pages that will
display the data
• Third-party Analysis/Visualization Tools
• European and Brazilian UIs
• SLAC-built analysis software
• More …
• Real applications
• Network-aware applications
• Consume performance data
• React to network conditions
• Request dynamic provisioning
• Future Example: Phoebus