October 7th 2010, LHCOPN
Jason Zurawski, Internet2
Internet2 Update
Agenda
• Update on DYNES
• perfSONAR-PS Update
2 – 4/6/2016, © 2010 Internet2
DYNES Background
• Internet2 approached by 3 universities (Caltech, University of
Michigan, Vanderbilt) and asked to lead a proposal on behalf of
LHC and R&E community (July 2009)
– The MRI-R2 solicitation capped submissions at 3 per university
• Issue was referred to advisory councils; recommendation to
proceed from AOAC and RAC (July 2009)
• Internet2 convened a series of community calls (August 2009)
• With strong community support, submitted a proposal to the NSF Major
Research Instrumentation – Recovery and Reinvestment (MRI-R2) program
(August 2009)
– Request was for ~$2 million
• Roughly 50% for equipment, 50% for personnel
DYNES Summary
• What is it?
– A nationwide cyber-instrument spanning ~40 US universities and ~14
Internet2 connectors
• Extends Internet2’s ION service into regional networks and campuses, based
on the OSCARS implementation of the IDC protocol (developed in partnership with
ESnet)
• Who is it?
– A collaborative team including Internet2, Caltech, University of
Michigan, and Vanderbilt University
– Community of regional networks and campuses
– LHC, astrophysics community, OSG, WLCG, other virtual organizations
• What are the goals?
– Support large, long-distance scientific data flows in the LHC, other
leading programs in data intensive science (such as LIGO, Virtual
Observatory, and other large scale sky surveys), and the broader
scientific community
– Build a distributed virtual instrument at sites of interest to the LHC but
available to the R&E community generally
DYNES - Dynamic Network Circuits
• DYNES will deliver the needed capabilities to the LHC, and to the broader
scientific community at all the campuses served, by coupling to their
analysis systems:
– Dynamic network circuit provisioning: IDC controller
– Data transport: low-cost IDC-capable Ethernet switch; FDT server for
high throughput; low-cost storage array where needed (also non-LHC)
– End-to-end monitoring services
• DYNES does not fund more bandwidth, but provides access to Internet2’s
dynamic circuit network (“ION”), plus the standard mechanisms, tools and
equipment needed:
– To build circuits with bandwidth guarantees across multiple
network domains, across the U.S. and to Europe
• In a manageable way, with fair-sharing
• Will require scheduling services at some stage
– To build a community with high-throughput capability
using standardized, common methods
DYNES: Why Dynamic Circuits?
• To meet the science requirements, Internet2 and ESnet, along with several US
regional networks, US LHCNet, and GEANT in Europe, have developed a
strategy (starting with a meeting at CERN, March 2004) based on a ‘hybrid’
network architecture
– The traditional IP network backbone is paralleled by a
circuit-oriented core network reserved for large-scale science traffic.
The major examples are Internet2’s Dynamic Circuit Network
(its “ION” service) and ESnet’s Science Data Network (SDN),
each of which provides:
– Increased effective bandwidth capacity and reliability of network access,
by mutually isolating the large, long-lasting flows (on ION and/or the SDN) from the
traditional IP mix of many small flows
– Guaranteed bandwidth as a service, through a system that automatically schedules
and implements virtual circuits traversing the network backbone, and
– Improved ability of scientists to access network measurement data
for all the network segments end-to-end through the perfSONAR monitoring
infrastructure.
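Scheduling guaranteed-bandwidth circuits in advance comes down to admission control: a new reservation is accepted on a link only if, at every instant it overlaps, the bandwidth already booked plus the new request stays within the link capacity. The following is a minimal single-link sketch under that assumption; `Reservation`, `admit` and the 10 Gbps capacity are hypothetical, and this is not the OSCARS scheduling algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reservation:
    # Hypothetical reservation record (not an OSCARS data type).
    start: int    # start time (hypothetical unit: minutes), inclusive
    end: int      # end time, exclusive
    gbps: float   # guaranteed bandwidth for the circuit


def peak_load(existing, start, end):
    """Highest total bandwidth already reserved at any instant in [start, end)."""
    # Load only rises at reservation start times, so it is enough to check the
    # window start and every reservation start that falls inside the window.
    points = {start} | {r.start for r in existing if start <= r.start < end}
    return max(
        sum(r.gbps for r in existing if r.start <= t < r.end) for t in points
    )


def admit(existing, request, capacity_gbps):
    """Accept the request only if the link never exceeds its capacity."""
    return peak_load(existing, request.start, request.end) + request.gbps <= capacity_gbps


booked = [Reservation(0, 60, 4.0), Reservation(30, 90, 3.0)]
print(admit(booked, Reservation(45, 75, 3.0), 10.0))   # True: 7 + 3 <= 10
print(admit(booked, Reservation(45, 75, 4.0), 10.0))   # False: 7 + 4 > 10
print(admit(booked, Reservation(60, 120, 6.0), 10.0))  # True: only 3 Gbps booked from t=60
```

A real scheduler would also have to check every link along the path and handle cancellations; the single-link check is only the core of the idea.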
DYNES - Why Not Static Circuits or IP Networks?
• Separation (physical or logical) of the dynamic circuit-oriented network from
the IP backbone is driven by the need to meet different functional, security,
and architectural needs:
– Static “nailed-up” circuit architectures will not scale.
It is impractical to set up enough static traditional circuits of 1
and 10 Gbps to serve these sites. Building a full mesh among
40 sites would mean 780 circuits (one per pair of sites), for example, with an
aggregate provisioned bandwidth of several terabits/sec (Tbps)
– General-purpose (GP) campus networks use firewalls since they must accommodate
a huge number of disparate devices and provide access from many unidentified
sources, while guarding against improper use
(e.g. file sharing, denial-of-service attacks, inappropriate content).
Firewalls are not architected to support very large traffic flows,
but a science network can use digital certificates instead.
– Implementing many high-capacity ports on traditional routers, which
need to support a huge number of routes, would be very expensive
• The price balance is worse in the next generation: 40G and 100G general-purpose
router ports cost several hundred k$ each.
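The circuit count in the full-mesh example follows from n(n-1)/2 point-to-point circuits for n sites. A quick check, assuming (as an illustration) that every circuit is provisioned at 10 Gbps, the upper of the two circuit sizes mentioned:

```python
def full_mesh_circuits(n_sites: int) -> int:
    """Point-to-point circuits needed to connect every pair of sites once."""
    return n_sites * (n_sites - 1) // 2


def aggregate_tbps(n_circuits: int, gbps_per_circuit: float) -> float:
    """Total provisioned bandwidth in Tbps (assumes equal-size circuits)."""
    return n_circuits * gbps_per_circuit / 1000


circuits = full_mesh_circuits(40)
print(circuits)                      # 780
print(aggregate_tbps(circuits, 10))  # 7.8
```

At 1 Gbps per circuit the same mesh would still need 780 circuits, which is the scaling problem the slide points to: the count grows quadratically with the number of sites.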
DYNES Update
• As of summer 2010, DYNES has been funded!
• Revised budget: ~$1.74 million
– Hardware budget unchanged
– Personnel budget reduced ~25%
• Success due to strong community support
• Going forward, we are looking for strong community engagement
• Need to adapt the plan in response to NSF requests and evolving
network infrastructure
• http://www.internet2.edu/dynes
DYNES Community Support
• Internet2 received a total of 60 Letters of Collaboration
– 44 Universities (some duplicates)
– 14 Regional Networks
– 1 Virtual Organization
– 1 Federal Lab
DYNES Site Expectations (1)
Excerpting from the narrative …
• Each participating campus and regional network site is expected to
install and connect the equipment funded by this proposal or work with
the PIs to design a local customized solution.
• Those sites providing letters of collaboration are expressing an interest
in participation, but may need to evaluate internal budgets and other
conditions at the time of the award but prior to installation.
• Those sites listed but not providing letters of collaboration may be
interested in participating but at submission time were not able to
commit within the timeframe of this solicitation.
• A site may opt-out of the project and return any equipment already
received if they no longer wish to participate, at the site’s sole
discretion. Likewise, if a site receives equipment but does not connect it
to the global instrument within 3 months then the management team
reserves the right to recall the equipment.
DYNES Site Expectations (2)
Excerpting from the narrative …
• It may also be the case that a site has existing equipment that
meets some or all of the requirements for connecting to
DYNES. The DYNES management will work with these sites to
determine how to best augment the existing facilities to meet or
exceed the base requirements for the site (possibly through
substitution of alternate equipment within the same per site
budget envelope).
• In all cases returned or unused equipment will be re-allocated to
another qualifying site identified by the DYNES management.
• The listed set of sites was selected for their early consumption of
bandwidth for this endeavor, but there are other sites that could
be substituted based on willingness to participate and the list is
expected to evolve.
DYNES System Description
AIM: extend hybrid & dynamic capabilities to campus & regional networks.
• A DYNES instrument must provide two basic capabilities at the Tier 2s,
Tier 3s and regional networks:
1. Network resource allocation, such as bandwidth, to ensure performance
of the transfer
2. Monitoring of the network and data transfer performance
• All networks in the path require the ability to allocate network resources
and monitor the transfer. This capability currently exists on backbone
networks such as Internet2 and ESnet, but is not widespread at the campus
and regional level.
• In addition, Tier 2 and Tier 3 sites require:
3. Hardware at the end sites capable of making optimal use of the available
network resources
Figure: Two typical transfers that DYNES supports: one between a Tier 2 and a
Tier 3, and another between a Tier 2 and a Tier 1 site. The clouds represent
the network domains involved in such a transfer.
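The requirement that every network in the path be able to allocate resources implies all-or-nothing behaviour: a circuit is only useful if each domain along it commits the bandwidth. The sketch below models just that property with a hypothetical `Domain` class and `reserve_path` helper; real IDCs such as OSCARS negotiate between domains by exchanging messages, which is not modelled here.

```python
from dataclasses import dataclass, field


@dataclass
class Domain:
    # Hypothetical stand-in for one network domain's IDC.
    name: str
    free_gbps: float
    held: dict = field(default_factory=dict)  # circuit id -> Gbps held

    def reserve(self, circuit_id: str, gbps: float) -> bool:
        if gbps > self.free_gbps:
            return False
        self.free_gbps -= gbps
        self.held[circuit_id] = gbps
        return True

    def release(self, circuit_id: str) -> None:
        self.free_gbps += self.held.pop(circuit_id, 0.0)


def reserve_path(path, circuit_id, gbps):
    """Reserve bandwidth in every domain on the path, or in none of them."""
    done = []
    for domain in path:
        if not domain.reserve(circuit_id, gbps):
            for d in done:  # roll back the partial reservation
                d.release(circuit_id)
            return False
        done.append(domain)
    return True


campus_a = Domain("campus-A", 10.0)   # hypothetical capacities
regional = Domain("regional", 2.0)
backbone = Domain("backbone", 100.0)
print(reserve_path([campus_a, regional, backbone], "c1", 5.0))  # False
print(campus_a.free_gbps)                                       # 10.0 (rolled back)
```

The regional domain is the bottleneck here, which mirrors the slide's point: backbone capability alone is not enough if campus or regional networks cannot allocate resources.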
DYNES: Tier2 and Tier3 - Instrument Design
• Each DYNES (sub-)instrument at a Tier 2 or Tier 3 site consists of the
following hardware, where each item has been carefully chosen to combine
low cost & high performance:
1. An Inter-domain Controller (IDC)
2. An Ethernet switch
3. A Fast Data Transfer (FDT) server. Sites with 10GE throughput capability
will have a dual-port Myricom 10GE network interface in the server.
4. An attached disk array with a Serial Attached SCSI (SAS) controller
capable of several hundred MBytes/sec to local storage.
Figure: Tier 2/Tier 3 instrument with components numbered 1 to 4 as listed
above (diagram annotation: “5 Gbps with 2 Controllers”).
The Fast Data Transfer (FDT) server connects to the disk array via the SAS
controller and runs FDT software developed by Caltech. FDT is an asynchronous
multithreaded system that automatically adjusts I/O and network buffers to
achieve maximum network utilization. In some cases, the disk array stores
datasets to be transferred among the sites; the FDT server then serves as an
aggregator/throughput optimizer, feeding smooth flows over the networks
directly to the Tier 2 or Tier 3 clusters. The IDC server handles the
allocation of network resources on the switch, interactions with other
DYNES instruments related to network provisioning, and network performance
monitoring. The IDC creates virtual LANs (VLANs) as needed.
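The instrument pairs a 10GE interface with a disk array rated at several hundred MBytes/sec, so it helps to put both in the same units. The sketch below converts link rates to MBytes/sec (decimal units, 8 bits per byte, ignoring protocol overhead) and estimates transfer time from the slowest stage; the 500 MB/s disk rate and the 10 TB dataset are hypothetical figures, not DYNES specifications.

```python
def gbps_to_mbytes_per_s(gbps: float) -> float:
    """Decimal units: 1 Gbps = 1000 Mbit/s = 125 MBytes/s (no protocol overhead)."""
    return gbps * 1000 / 8


def transfer_hours(dataset_tb: float, *stage_rates_mbytes: float) -> float:
    """Hours to move a dataset when throughput is set by the slowest stage."""
    bottleneck = min(stage_rates_mbytes)
    return dataset_tb * 1_000_000 / bottleneck / 3600


nic = gbps_to_mbytes_per_s(10)   # one 10GE port
disk = 500.0                     # hypothetical "several hundred MBytes/sec" array
print(nic)                                    # 1250.0
print(round(transfer_hours(10, nic, disk), 2))  # 5.56
```

In this example the disk, not the 10GE link, limits the transfer, which is why the end-site storage and the FDT server's buffering matter as much as the circuit itself.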
DYNES: Regional Network - Instrument Design
• Regional networks require:
1. An Ethernet switch
2. An Inter-domain Controller (IDC)
The configuration of the IDC consists of OSCARS, DRAGON, and perfSONAR,
just as in the Tier 2/Tier 3 cases. This allows the regional network to
provision resources on demand through interaction with the other instruments.
A regional network does not require a disk array or FDT server because it
provides transport for the Tier 2 and Tier 3 data transfers rather than
initiating them.
Figure: Regional network instrument with components 1 and 2 as listed above.
At the network level, each regional network connects the incoming campus
connection to the Ethernet switch provided. Optionally, if a regional network
already has a qualified switch compatible with the dynamic software that it
prefers, it may use that instead of, or in addition to, the provided equipment.
The Ethernet switch provides a VLAN dynamically allocated by OSCARS & DRAGON.
The VLAN has quality of service (QoS) parameters set to guarantee the
bandwidth requirements of the connection as defined in the VLAN. These
parameters are determined by the original circuit request from the
researcher/application. Through this VLAN, the regional network provides
transit between the campus IDCs connected in the same region or to the
global IDC.
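The VLAN handling described above can be pictured as a per-switch pool of tags reserved for dynamic circuits, with each allocated VLAN carrying the bandwidth from the original circuit request. This is an illustrative sketch only: `CircuitRequest`, `VlanPool` and the 3000 to 3009 tag range are hypothetical and do not reflect how OSCARS or DRAGON actually configure a switch; the only external fact relied on is that 802.1Q VLAN IDs range from 1 to 4094.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CircuitRequest:
    # Hypothetical request from a researcher/application.
    src: str
    dst: str
    gbps: float   # bandwidth the request asks to have guaranteed


class VlanPool:
    """Hypothetical allocator of 802.1Q tags reserved for dynamic circuits."""

    def __init__(self, first: int, last: int):
        if not 1 <= first <= last <= 4094:
            raise ValueError("802.1Q VLAN IDs must lie in 1..4094")
        self.free = list(range(first, last + 1))
        self.in_use = {}  # vlan id -> CircuitRequest

    def allocate(self, req: CircuitRequest) -> dict:
        if not self.free:
            raise RuntimeError("no free VLAN tags in the dynamic range")
        vlan = self.free.pop(0)  # lowest free tag
        self.in_use[vlan] = req
        # The QoS parameter is taken straight from the circuit request.
        return {"vlan": vlan, "guaranteed_gbps": req.gbps}

    def release(self, vlan: int) -> None:
        del self.in_use[vlan]
        self.free.append(vlan)
        self.free.sort()


pool = VlanPool(3000, 3009)
print(pool.allocate(CircuitRequest("campus-A", "campus-B", 2.0)))
# {'vlan': 3000, 'guaranteed_gbps': 2.0}
```

Keeping a dedicated tag range per switch avoids collisions with statically configured VLANs; how DYNES sites partition their ranges is not specified here.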
DYNES Site Selection Process
• All sites will need to apply
– Application process still TBD
– Goal is to keep it lightweight, but fair
• External review panel
– Composition and process still TBD
– Likely 6 members
• CMS overall representative
• CMS Tier 3 representative
• ATLAS overall representative
• ATLAS Tier 3 representative
• 2 non-LHC high-performance networking community reps
• Plus ex-officio members: PI and co-PIs
• Community feedback encouraged on site selection process
Changes to ION Service and Impact on DYNES
• It’s been a year since the proposal was written
• In that time, Internet2 has:
– Migrated ION service from dedicated Ciena network onto the
Juniper MX platform
– Updated the Connector Options
• Old way: Contracting for an IP connection gives you a second ION
connection of the same size
• New way: For a modest fee increase, connector can purchase 2
connections to use as they choose
– Widespread adoption to date by connectors
• Practical impacts
– Many (but not all) connectors are using second connection for
redundant IP/CPS connectivity
• These changes affect how DYNES will be implemented
DYNES Roadmap
• ICO and Council notification (Completed August 2010)
• Community announcement
– Letter, website (Completed)
– 2 Community Calls (August 5th and August 17th)
• Set up external review panel (October, 2010)
• CFP (October, 2010)
• Calls with individual regionals and with individual campuses (plus
associated regional) upon request
– Ongoing in late summer and fall 2010
• DYNES BoF (Nov 2010, at the Internet2 FMM in Atlanta)
• Site selections announced (late 2010)
• Building of the distributed virtual instrument begins in 2011
Questions
• Please send questions to [email protected]
• PI List:
– Eric Boyd (Internet2)
– Shawn McKee (University of Michigan)
– Harvey Newman (Caltech)
– Paul Sheldon (Vanderbilt)
• Project updates shared via www.internet2.edu/dynes and
[email protected] mailing list
– Subscription information found on the website
Software Releases
• pS Performance Toolkit 3.2 to be released (October 2010)
– CentOS 5.5 based
– Includes performance measurement and perfSONAR-PS tools
– Special thanks to some LHCOPN operators for helping to test!
– http://psps.perfsonar.net/toolkit
– Mailing list: https://lists.internet2.edu/sympa/info/performancenode-users
• perfSONAR-PS 3.2 to be released (October 2010)
– Updates to all services
– Support to expose Ganglia data
– Available as source/RPMs
– http://psps.perfsonar.net
– Mailing list: https://lists.internet2.edu/sympa/info/perfsonar-psusers
Development Roadmap
• Planned new services
– GridFTP Measurement Archive (MA)
– Traceroute/Tracepath (MTU discovery and correction) MA
– Netflow
• Integrating NAGIOS and Gratia support into the pSPT
• Improved GUIs
• Integration with dynamic circuits
• Working with USATLAS Tier 2s on several projects:
– Specific NAGIOS displays and GUIs for the VO
– Integrating latency and measurement tools onto a single machine
(e.g. preventing measurement pollution)
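The planned Traceroute/Tracepath MA targets MTU discovery. The underlying reasoning is simple: a path carries packets unfragmented only up to its smallest hop MTU, and the hop where that minimum first appears is the likely place to correct a configuration. The sketch below applies that reasoning to per-hop values already collected; it does not probe the network the way `tracepath` does, and the hop names and MTU values are hypothetical.

```python
def path_mtu(hops):
    """Largest packet that crosses every hop unfragmented: the smallest hop MTU."""
    return min(mtu for _, mtu in hops)


def mtu_drops(hops):
    """List (position, hop, mtu) for each hop whose MTU is lower than all hops before it."""
    drops = []
    smallest = hops[0][1]
    for i, (name, mtu) in enumerate(hops[1:], start=1):
        if mtu < smallest:
            drops.append((i, name, mtu))
            smallest = mtu
    return drops


# Hypothetical path: jumbo frames everywhere except the far campus edge.
hops = [("campus-A", 9000), ("regional", 9000), ("backbone", 9000), ("campus-B", 1500)]
print(path_mtu(hops))   # 1500
print(mtu_drops(hops))  # [(3, 'campus-B', 1500)]
```

Reporting where the MTU drops, rather than only the end-to-end minimum, is what makes a "correction" step possible: it points operators at the segment to fix.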
For more information, visit http://www.internet2.edu/dynes and http://psps.perfsonar.net