Transcript 20061206-ESnet4-Johnston

ESnet4: Networking for the Future of DOE Science
December 5, 2006
William E. Johnston
ESnet Department Head and Senior Scientist
Lawrence Berkeley National Laboratory
[email protected], www.es.net
1
DOE Office of Science and ESnet – the ESnet Mission
• "The Office of Science is the single largest supporter of basic research in the physical sciences in the United States, … providing more than 40 percent of total funding … for the Nation's research programs in high-energy physics, nuclear physics, and fusion energy sciences." (http://www.science.doe.gov)
– This funding supports some 15,000 graduate students and post docs.
• ESnet's primary mission is to enable the large-scale science that is the mission of the Office of Science (SC):
– Sharing of massive amounts of data
– Supporting thousands of collaborators world-wide
– Distributed data processing
– Distributed data management
– Distributed simulation, visualization, and computational steering
• ESnet provides network and collaboration services to Office of Science laboratories and many other DOE programs
2
What ESnet Is
• A large-scale IP network built on a national circuit
infrastructure with high-speed connections to all major US and
international research and education (R&E) networks
• An organization of 30 professionals structured for the service
• An operating entity with an FY06 budget of $26.6M
• A tier 1 ISP (direct peerings with all major networks)
• The primary DOE network provider
– Provides production Internet service to all of the major DOE Labs* and
most other DOE sites
– Based on DOE Lab populations, it is estimated that between 50,000 and 100,000 users depend on ESnet for global Internet access
• additionally, each year more than 18,000 non-DOE researchers from
universities, other government agencies, and private industry use Office of
Science facilities
* PNNL supplements its ESnet service with commercial service
3
Office of Science US Community Drives ESnet Design for Domestic Connectivity
[Map: institutions supported by SC and major user facilities across the US, including Pacific Northwest National Laboratory, Idaho National Laboratory, Ames Laboratory, Argonne National Laboratory, Fermi National Accelerator Laboratory, Lawrence Berkeley National Laboratory, Brookhaven National Laboratory, Stanford Linear Accelerator Center, Princeton Plasma Physics Laboratory, Lawrence Livermore National Laboratory, Thomas Jefferson National Accelerator Facility, General Atomics, Sandia National Laboratories, Oak Ridge National Laboratory, Los Alamos National Laboratory, and the National Renewable Energy Laboratory. Legend: institutions supported by SC; major user facilities; DOE specific-mission laboratories; DOE program-dedicated laboratories; DOE multiprogram laboratories.]
Footprint of Largest SC Data Sharing Collaborators
Drives the International Footprint that ESnet Must Support
• Top 100 data flows generate 50% of all ESnet traffic (ESnet handles about 3x10^9 flows/mo.)
• 91 of the top 100 flows are from the Labs to other institutions (shown) (CY2005 data)
What Does ESnet Provide? - 1
• An architecture tailored to accommodate DOE's large-scale science
– Move huge amounts of data between a small number of sites that are scattered all over the world
• Comprehensive connectivity
– High bandwidth access to DOE sites and DOE's primary science collaborators: Research and Education institutions in the US, Europe, Asia Pacific, and elsewhere
• Full access to the global Internet for DOE Labs
– ESnet is a tier 1 ISP managing a full complement of Internet routes for global access
• Highly reliable transit networking
– Fundamental goal is to deliver every packet that is received to the "target" site
6
What Does ESnet Provide? - 2
• A full suite of network services
– IPv4 and IPv6 routing and address space management
– IPv4 multicast (and soon IPv6 multicast)
– Primary DNS services
– Circuit services (layer 2, e.g. Ethernet VLANs), MPLS overlay networks (e.g. SecureNet when it was ATM based)
– Scavenger service so that certain types of bulk traffic can use all available bandwidth, but will give priority to any other traffic when it shows up
– Prototype guaranteed bandwidth and virtual circuit services
7
What Does ESnet Provide? - 3
• New network services
– Guaranteed bandwidth services
• Via a combination of QoS, MPLS overlay, and layer 2 VLANs
• Collaboration services and Grid middleware supporting collaborative science
– Federated trust services / PKI Certification Authorities with science oriented policy
– Audio-video-data teleconferencing
• Highly reliable and secure operation
– Extensive disaster recovery infrastructure
– Comprehensive internal security
– Cyberdefense for the WAN
8
What Does ESnet Provide? - 4
• Comprehensive user support, including "owning" all trouble tickets involving ESnet users (including problems at the far end of an ESnet connection) until they are resolved – 24x7x365 coverage
– ESnet's mission is to enable the network-based aspects of OSC science, and that includes troubleshooting network problems wherever they occur
• A highly collaborative and interactive relationship with the DOE Labs and scientists for planning, configuration, and operation of the network
– ESnet and its services evolve continuously in direct response to OSC science needs
– Engineering services for special requirements
9
ESnet History
• ESnet0/MFENet (mid-1970s to 1986): 56 Kbps microwave and satellite links
• ESnet1 (1986-1995): ESnet formed to serve the Office of Science; 56 Kbps, X.25 to 45 Mbps T3
• ESnet2 (1995-2000): partnered with Sprint to build the first national footprint ATM network; IP over 155 Mbps ATM net
• ESnet3 (2000-2007): partnered with Qwest to build a national Packet over SONET network and optical channel Metropolitan Area Networks; IP over 10 Gbps SONET
• ESnet4 (2007-2012, transition in progress): partner with Internet2 and US Research & Education community to build a dedicated national optical network; IP and virtual circuits on a configurable optical infrastructure with at least 5-6 optical channels of 10-100 Gbps each
10
ESnet3 Today Provides Global High-Speed Internet Connectivity for DOE Facilities and Collaborators (Fall, 2006)
[Map: the ESnet IP core (Packet over SONET optical ring and hubs) and the ESnet Science Data Network (SDN) core connect 42 end user sites - Office of Science sponsored (22), NNSA sponsored (12), joint sponsored (3), other sponsored (NSF LIGO, NOAA), and laboratory sponsored (6) - to commercial peering points, R&E peering points, high-speed peering points with Internet2/Abilene, and specific international R&E network peers including Japan (SINet), Australia (AARNet), Canada (CA*net4), Taiwan (TANet2, ASCC), Singaren, Korea (Kreonet2), GLORIAD (Russia, China), GÉANT (France, Germany, Italy, UK, etc.), Russia (BINP), CERN (USLHCnet, DOE+CERN funded, NSF/IRNC funded), MREN, StarTap, MAE-E, PAIX-PA, Equinix, and AMPATH (S. America). Link types range from the 10 Gb/s SDN core, 10 Gb/s and 2.5 Gb/s IP core, MAN rings (≥ 10 Gb/s), and Lab supplied links down to OC12 ATM (622 Mb/s), OC12 / GigEthernet, OC3 (155 Mb/s), and 45 Mb/s and less.]
ESnet’s Place in U. S. and International Science
•
ESnet, Internet2/Abilene, and National Lambda Rail (NLR)
provide most of the nation’s transit networking for basic
science
– Abilene provides national transit networking for most of the US
universities by interconnecting the regional networks (mostly via the
GigaPoPs)
– ESnet provides national transit networking and ISP service for the
DOE Labs
– NLR provides various science-specific and network R&D circuits
•
GÉANT plays a role in Europe similar to Abilene and ESnet in
the US – it interconnects the European National Research
and Education Networks (NRENs), to which the European
R&E sites connect
– A GÉANT operated, NSF funded like currently carries all non-LHC
ESnet traffic to Europe, and this is a significant fraction of all ESnet
traffic
12
ESnet is a Highly Reliable Infrastructure
[Chart: site availability grouped into "5 nines" (>99.995%), "4 nines" (>99.95%), and "3 nines" bands, with dually connected sites indicated.]
13
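As a quick aid to reading the availability bands above, the following is a minimal sketch (Python, illustrative only) that converts an availability percentage into the maximum downtime per year it allows. The "5 nines" and "4 nines" thresholds are the ones quoted on the slide; 99.9% for "3 nines" is the conventional value and is an assumption here.

```python
# Convert an availability target into allowed downtime per year.
# Illustrative sketch only; 99.995% and 99.95% are the slide's thresholds,
# 99.9% for "3 nines" is assumed.

MINUTES_PER_YEAR = 365 * 24 * 60

def allowed_downtime_minutes(availability_percent: float) -> float:
    """Maximum minutes of downtime per year for a given availability."""
    return MINUTES_PER_YEAR * (1.0 - availability_percent / 100.0)

for label, pct in [("5 nines", 99.995), ("4 nines", 99.95), ("3 nines", 99.9)]:
    print(f"{label} ({pct}%): about {allowed_downtime_minutes(pct):.0f} minutes/year")

# "5 nines" allows roughly 26 minutes of downtime per year,
# "4 nines" roughly 263 minutes, and "3 nines" roughly 526 minutes.
```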
ESnet is An Organization Structured for the Service
[Staffing chart, 30.7 FTE (full-time staff) total, covering: network engineering, routing and network services, and WAN security; deployment and WAN maintenance; internal infrastructure, disaster recovery, and security; applied R&D for new network services (circuit services and end-to-end monitoring); science collaboration services (Public Key Infrastructure certification authorities, AV conferencing, email lists, Web services); management, accounting, and compliance; and network operations and user support (24x7x365, end-to-end problem resolution). Individual functions range from 1.5 to 8.3 FTE.]
14
ESnet FY06 Budget is Approximately $26.6M
Approximate Budget Categories
• Funding (total funds: $26.6M): SC operating: $20.1M; SC Special Projects: $1.2M; SC R&D: $0.5M; Carryover: $1M; Other DOE: $3.8M
• Expenses (total expenses: $26.6M): Circuits & hubs: $12.7M; WAN equipment: $2.0M; Engineering & research: $2.9M; Operations: $1.1M; Internal infrastructure, security, disaster recovery: $3.4M; Collaboration services: $1.6M; Management and compliance: $0.7M; Special projects (Chicago and LI MANs): $1.2M; Target carryover: $1.0M
15
Strategy for Conduct of Business for the Last Few Years
• Increasing Openness
– Making network data available to the community and the Internet
– Outage Calendar and outage reports
– Web-based GUI for traffic measurements (netinfo.es.net)
– Flow stats
– Increasing Instrumentation
• Performance testers (various levels of access to test circuits)
• OWAMP servers (one-way testers - exquisitely sensitive. Note: the OWAMP system is pretty much down. It is migrating to perfSONAR, and new, more relevant R&E sites will be selected for continuous monitoring.)
– Establish the goal of "network performance between ESnet sites and Internet2 sites served by Abilene is equivalent to network performance across one of the networks or the other"
16
Strategy for Conduct of Business for the Last Few Years
• Increasing involvement with national and international collaborations and research activities
– perfSONAR - standards based monitoring platform
– OSCARS - community developed (ESnet leadership) virtual circuit management
• Increasing partnership with the R&E community
– Joint ESnet/I2/Abilene meetings at Joint Techs
– LHC network operations working group participation
– DICE meetings
– Joint Techs meetings participation
• attendance, talks, & program committee
– All leading up to partnership with Internet2 for building ESnet4
17
A Changing Science Environment is the Key Driver of
the Next Generation ESnet
• Large-scale collaborative science – big facilities, massive data,
thousands of collaborators – is now a significant aspect of the
Office of Science (“SC”) program
• SC science community is almost equally split between Labs
and universities
– SC facilities have users worldwide
• Very large international (non-US) facilities (e.g. LHC and ITER)
and international collaborators are now a key element of SC
science
• Distributed systems for data analysis, simulations, instrument operation, etc., are essential and are now common (in fact they dominate the data analysis that now generates 50% of all ESnet traffic)
18
Planning the Future Network - ESnet4
There are many stakeholders for ESnet
1. SC programs
– Advanced Scientific Computing Research
– Basic Energy Sciences
– Biological and Environmental Research
– Fusion Energy Sciences
– High Energy Physics
– Nuclear Physics
– Office of Nuclear Energy
2. Major scientific facilities
– At DOE sites: large experiments, supercomputer centers, etc.
– Not at DOE sites: LHC, ITER
3. SC supported scientists not at the Labs (mostly at US R&E institutions)
(These account for 85% of all ESnet traffic)
4. Other collaborating institutions (mostly US, European, and AP R&E)
5. Other R&E networking organizations that support major collaborators
– Mostly US, European, and Asia Pacific networks
6. Lab operations and general population
7. Lab networking organizations
19
Planning the Future Network - ESnet4
• Requirements of the ESnet stakeholders are primarily determined by
1) Data characteristics of instruments and facilities that will be connected to ESnet
• What data will be generated by instruments coming on-line over the next 5-10 years?
• How and where will it be analyzed and used?
2) Examining the future process of science
• How will the process of doing science change over 5-10 years?
• How do these changes drive demand for new network services?
3) Studying the evolution of ESnet traffic patterns
• What are the trends based on the use of the network in the past 2-5 years?
• How must the network change to accommodate the future traffic patterns implied by the trends?
20
(1) Requirements from Instruments and Facilities
DOE SC Facilities that are, or will be, the top network users
• Advanced Scientific Computing Research
– National Energy Research Scientific Computing Center (NERSC) (LBNL)*
– National Leadership Computing Facility (NLCF) (ORNL)*
– Argonne Leadership Class Facility (ALCF) (ANL)*
• Basic Energy Sciences
– Advanced Light Source (ALS) (LBNL)*
– Advanced Photon Source (APS) (ANL)
– National Synchrotron Light Source (NSLS) (BNL)
– Stanford Synchrotron Radiation Laboratory (SSRL) (SLAC)
– Spallation Neutron Source (ORNL)*
– National Center for Electron Microscopy (NCEM) (LBNL)*
– Combustion Research Facility (CRF) (SNLL)*
• Biological and Environmental Research
– William R. Wiley Environmental Molecular Sciences Laboratory (EMSL) (PNNL)*
– Joint Genome Institute (JGI)
– Structural Biology Center (SBC) (ANL)
• Fusion Energy Sciences
– DIII-D Tokamak Facility (GA)*
– Alcator C-Mod (MIT)*
– National Spherical Torus Experiment (NSTX) (PPPL)*
– ITER
• High Energy Physics
– Tevatron Collider (FNAL)
– B-Factory (SLAC)
– Large Hadron Collider (LHC, ATLAS, CMS) (BNL, FNAL)*
• Nuclear Physics
– Relativistic Heavy Ion Collider (RHIC) (BNL)*
– Continuous Electron Beam Accelerator Facility (CEBAF) (JLab)*
*14 of 22 are characterized by current case studies
21
The Largest Facility: Large Hadron Collider at CERN
[Photo: the LHC CMS detector, 15m x 15m x 22m, 12,500 tons, $700M, with a human figure shown for scale.]
22
(2) Requirements from Examining the Future Process of Science
• In a major workshop [1], and in subsequent updates [2], requirements were generated by asking the science community how their process of doing science will / must change over the next 5 and next 10 years in order to accomplish their scientific goals
• Computer science and networking experts then assisted the science community in
– analyzing the future environments
– deriving middleware and networking requirements needed to enable these environments
• These were compiled as case studies that provide specific 5 & 10 year network requirements for bandwidth, footprint, and new services
23
Science Networking Requirements Aggregation Summary
(Columns: Science Drivers - Science Areas / Facilities; End2End Reliability; Connectivity; Today End2End Bandwidth; 5-year End2End Bandwidth; Traffic Characteristics; Network Services)
• Magnetic Fusion Energy: reliability 99.999% (impossible without full redundancy); connectivity: DOE sites, US Universities, Industry; today 200+ Mbps; in 5 years 1 Gbps; traffic: bulk data, remote control; services: guaranteed bandwidth, guaranteed QoS, deadline scheduling
• NERSC and ACLF: reliability -; connectivity: DOE sites, US Universities, International, Other ASCR supercomputers; today 10 Gbps; in 5 years 20 to 40 Gbps; traffic: bulk data, remote control, remote file system sharing; services: guaranteed bandwidth, guaranteed QoS, deadline scheduling, PKI / Grid
• NLCF: reliability -; connectivity: DOE sites, US Universities, Industry, International; today backbone bandwidth parity; in 5 years backbone bandwidth parity; traffic: bulk data, remote file system sharing
• Nuclear Physics (RHIC): reliability -; connectivity: DOE sites, US Universities, International; today 12 Gbps; in 5 years 70 Gbps; traffic: bulk data; services: guaranteed bandwidth, PKI / Grid
• Spallation Neutron Source: reliability high (24x7 operation); connectivity: DOE sites; today 640 Mbps; in 5 years 2 Gbps; traffic: bulk data
Science Network Requirements Aggregation Summary
(Same columns as the previous slide)
• Advanced Light Source: reliability -; connectivity: DOE sites, US Universities, Industry; today 1 TB/day (300 Mbps); in 5 years 5 TB/day (1.5 Gbps); traffic: bulk data, remote control; services: guaranteed bandwidth, PKI / Grid
• Bioinformatics: reliability -; connectivity: DOE sites, US Universities; today 625 Mbps (12.5 Gbps in two years); in 5 years 250 Gbps; traffic: bulk data, remote control, point-to-multipoint; services: guaranteed bandwidth, high-speed multicast
• Chemistry / Combustion: reliability -; connectivity: DOE sites, US Universities, Industry; today -; in 5 years 10s of Gigabits per second; traffic: bulk data; services: guaranteed bandwidth, PKI / Grid
• Climate Science: reliability -; connectivity: DOE sites, US Universities, International; today -; in 5 years 5 PB per year (5 Gbps); traffic: bulk data, remote control; services: guaranteed bandwidth, PKI / Grid
Immediate Requirements and Drivers
• High Energy Physics (LHC): reliability 99.95+% (less than 4 hrs/year); connectivity: US Tier1 (FNAL, BNL), US Tier2 (Universities), International (Europe, Canada); today 10 Gbps; in 5 years 60 to 80 Gbps (30-40 Gbps per US Tier1); traffic: bulk data, coupled data analysis processes; services: guaranteed bandwidth, traffic isolation, PKI / Grid
(3) These Trends are Seen in the Observed Evolution of Historical ESnet Traffic Patterns
[Plot: total ESnet traffic (Terabytes/month, 0-1400) and the fraction contributed by the top 100 site-to-site workflows; ESnet Monthly Accepted Traffic, January, 2000 – June, 2006.]
• ESnet is currently transporting more than 1 petabyte (1000 terabytes) per month
• More than 50% of the traffic is now generated by the top 100 site-to-site workflows: large-scale science dominates all ESnet traffic
26
ESnet Traffic has Increased by 10X Every 47 Months, on Average, Since 1990
[Log plot of ESnet Monthly Accepted Traffic (Terabytes/month), January, 1990 – June, 2006, with an exponential fit (R² = 0.9898). Milestones: Aug., 1990, 100 MBy/mo.; 38 months to Oct., 1993, 1 TBy/mo.; 57 months to Jul., 1998, 10 TBy/mo.; 40 months to Nov., 2001, 100 TBy/mo.; 53 months to Apr., 2006, 1 PBy/mo.]
Requirements from Network Utilization Observation
• In 4 years, we can expect a 10x increase in traffic over current levels without the addition of production LHC traffic
– Nominal average load on busiest backbone links is ~1.5 Gbps today
– In 4 years that figure will be ~15 Gbps based on current trends (see the sketch below)
• Measurements of this type are science-agnostic
– It doesn't matter who the users are, the traffic load is increasing exponentially
– Predictions based on this sort of forward projection tend to be conservative estimates of future requirements because they cannot predict new uses
• Bandwidth trends drive requirement for a new network architecture
– New architecture/approach must be scalable in a cost-effective way
28
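As a minimal sketch of the projection arithmetic above (assuming the historical 10x-per-47-months growth rate simply continues), the following Python snippet scales today's nominal backbone load forward; the 1.5 Gbps starting point and the 4-year horizon are the figures quoted on the slide.

```python
# Project backbone load forward assuming traffic grows 10x every 47 months,
# the long-term average growth rate quoted for ESnet since 1990.

def projected_load_gbps(current_gbps: float, months_ahead: float,
                        tenfold_period_months: float = 47.0) -> float:
    """Exponential projection: load grows by 10x every tenfold_period_months."""
    return current_gbps * 10 ** (months_ahead / tenfold_period_months)

today = 1.5  # Gbps, nominal average load on the busiest backbone links
print(f"In 4 years: ~{projected_load_gbps(today, 48):.0f} Gbps")
# 48 months is slightly more than one 47-month "decade", giving roughly
# 16 Gbps, consistent with the ~15 Gbps figure on the slide.
```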
Large-Scale Flow Trends, June 2006 (Subtitle: "Onslaught of the LHC")
[Bar chart: traffic volume (Terabytes) of the top 30 AS-AS flows, June 2006 (AS-AS = mostly Lab to R&E site, a few Lab to R&E network, a few "other"), color-coded by DOE Office of Science program: LHC / High Energy Physics Tier 0-Tier 1, LHC / HEP T1-T2, HEP, Nuclear Physics, Math. & Comp. (MICS), LIGO (NSF), Lab - university, and Lab - commodity. Flows shown include CERN -> BNL, Fermilab -> U. Neb.-Lincoln, Fermilab -> MIT, Fermilab -> Italy R&E, BNL -> RIKEN (Japan) and RIKEN (Japan) -> BNL, Fermilab -> DESY-Hamburg (Germany), Fermilab -> Estonia, SLAC -> Italy R&E, SLAC -> UK R&E, SLAC -> IN2P3 (France), SLAC -> Karlsruhe (Germany), Fermilab -> Germany, Swiss, Belgium, and UK R&E, U. Neb.-Lincoln and UC San Diego -> Fermilab, IN2P3 (France) and Italy R&E -> Fermilab, Italy R&E -> SLAC, BNL -> French R&E, Fermilab -> U. Florida and U. Oklahoma, PNNL <-> Abilene (US R&E), ESnet -> CalTech, NERSC (DOE Supercomputer) -> LBNL, and Argonne -> US Commodity. Note: FNAL -> CERN traffic is comparable to BNL -> CERN, but it is on layer 2 flows that are not yet monitored for traffic (soon).]
Traffic Patterns are Changing Dramatically
[Four panels of total traffic (TBy, 0-1200) for 1/05, 7/05, 1/06, and 6/06, each annotated with a 2 TB/month level.]
• While the total traffic is increasing exponentially
– Peak flow (that is, system-to-system) bandwidth is decreasing
– The number of large flows is increasing
30
The Onslaught of Grids
Question: Why is peak flow bandwidth decreasing while total traffic is increasing?
Answer: Most large data transfers are now done by parallel / Grid data movers
(In the flow data, plateaus indicate the emergence of parallel transfer systems - a lot of systems transferring the same amount of data at the same time.)
• In June, 2006 72% of the hosts generating the top 1000 flows were involved in parallel data movers (Grid applications)
• This is the most significant traffic pattern change in the history of ESnet
• This has implications for the network architecture that favor path multiplicity and route diversity
31
Network Observation – Circuit-like Behavior
Look at Top 20 Traffic Generator's Historical Flow Patterns
[Plot: LIGO – CalTech (host to host) traffic in Gigabytes/day, 9/23/04 - 9/23/05 (no data at the start of the period). Over 1 year, the work flow / "circuit" duration is about 3 months.]
32
Network Observation – Circuit-like Behavior (2)
Look at Top 20 Traffic Generator's Historical Flow Patterns
[Plot: SLAC - IN2P3, France (host to host) traffic in Gigabytes/day, 9/23/04 - 9/23/05 (no data at the start of the period). Over 1 year, work flow / "circuit" duration is about 1 day to 1 week.]
33
What is the High-Level View of ESnet Traffic Patterns?
ESnet Inter-Sector Traffic Summary, Mar. 2006
[Diagram: traffic flows between DOE sites, ESnet, R&E networks (mostly universities), commercial peering points, and international peers (almost entirely R&E sites), with inter-Lab traffic at ~10%. Green = traffic coming into ESnet, blue = traffic leaving ESnet; percentages (48%, 58%, 43%, 23%, 12%, 7%, 5%, 3%) are of total ingress or egress traffic.]
Traffic notes
• more than 90% of all traffic is Office of Science
• less than 10% is inter-Lab
34
Requirements from Traffic Flow Observations
• Most of ESnet science traffic has a source or sink outside of ESnet
– Drives requirement for high-bandwidth peering
– Reliability and bandwidth requirements demand that peering be redundant
– Multiple 10 Gbps peerings today, must be able to add more bandwidth flexibly and cost-effectively
– Bandwidth and service guarantees must traverse R&E peerings
• Collaboration with other R&E networks on a common framework is critical
• Seamless fabric
• Large-scale science is now the dominant user of the network
– Satisfying the demands of large-scale science traffic into the future will require a purpose-built, scalable architecture
– Traffic patterns are different than commodity Internet
35
Changing Science Environment → New Demands on Network
Requirements Summary
• Increased capacity
– Needed to accommodate a large and steadily increasing
amount of data that must traverse the network
• High network reliability
– Essential when interconnecting components of distributed
large-scale science
• High-speed, highly reliable connectivity between Labs
and US and international R&E institutions
– To support the inherently collaborative, global nature of large-scale science
• New network services to provide bandwidth guarantees
– Provide for data transfer deadlines for
• remote data analysis, real-time interaction with instruments,
coupled computational simulations, etc.
36
ESnet4 - The Response to the Requirements
I) A new network architecture and implementation strategy
• Rich and diverse network topology for flexible management and high reliability
• Dual connectivity at every level for all large-scale science sources and sinks
• A partnership with the US research and education community to build a shared, large-scale, R&E managed optical infrastructure
• A scalable approach to adding bandwidth to the network
• Dynamic allocation and management of optical circuits
II) Development and deployment of a virtual circuit service
• Develop the service cooperatively with the networks that are intermediate between DOE Labs and major collaborators to ensure end-to-end interoperability
37
Next Generation ESnet: I) Architecture and Configuration
• Main architectural elements and the rationale for each element
1) A high-reliability IP core (e.g. the current ESnet core) to address
– General science requirements
– Lab operational requirements
– Backup for the SDN core
– Vehicle for science services
– Full service IP routers
2) Metropolitan Area Network (MAN) rings to provide
– Dual site connectivity for reliability
– Much higher site-to-core bandwidth
– Support for both production IP and circuit-based traffic
– Multiply connecting the SDN and IP cores
2a) Loops off of the backbone rings to provide
– Dual site connections where MANs are not practical
3) A Science Data Network (SDN) core for
– Provisioned, guaranteed bandwidth circuits to support large, high-speed science data flows
– Very high total bandwidth
– Multiply connecting MAN rings for protection against hub failure
– Alternate path for production IP traffic
– Less expensive router/switches
– Initial configuration targeted at LHC, which is also the first step to the general configuration that will address all SC requirements
– Can meet other unknown bandwidth requirements by adding lambdas
38
ESnet Target Architecture: IP Core + Science Data Network Core + Metro Area Rings
[Diagram: the production IP core and the Science Data Network core, built from 10-50 Gbps circuits, span the country (roughly 2700 miles / 4300 km across and 1625 miles / 2545 km north-south) with IP core hubs and SDN hubs at locations such as Sunnyvale, LA, San Diego, Albuquerque, Denver, New York, and Washington DC, metropolitan area networks or backbone loops for Lab access, loops off the backbone, primary DOE Labs, possible hubs, and international connections at multiple points around the footprint.]
39
ESnet4
• Internet2 has partnered with Level 3 Communications Co. and Infinera Corp. for a dedicated optical fiber infrastructure with a national footprint and a rich topology - the "Internet2 Network"
– The fiber will be provisioned with Infinera Dense Wave Division Multiplexing equipment that uses an advanced, integrated optical-electrical design
– Level 3 will maintain the fiber and the DWDM equipment
– The DWDM equipment will initially be provisioned to provide 10 optical circuits (lambdas - λs) across the entire fiber footprint (80 λs is the maximum)
• ESnet has partnered with Internet2 to:
– Share the optical infrastructure
– Develop new circuit-oriented network services
– Explore mechanisms that could be used for the ESnet Network Operations Center (NOC) and the Internet2/Indiana University NOC to back each other up for disaster recovery purposes
40
ESnet4
• ESnet will build its next generation IP network and its new circuit-oriented Science Data Network primarily on the Internet2 circuits (λs) that are dedicated to ESnet, together with a few National Lambda Rail and other circuits
– ESnet will provision and operate its own routing and switching hardware that is installed in various commercial telecom hubs around the country, as it has done for the past 20 years
– ESnet's peering relationships with the commercial Internet, various US research and education networks, and numerous international networks will continue and evolve as they have for the past 20 years
41
ESnet4
• ESnet4 will also involve an expansion of the multiple 10 Gb/s Metropolitan Area Rings in the San Francisco Bay Area, Chicago, Long Island, Newport News (VA/Washington, DC area), and Atlanta
– provide multiple, independent connections for ESnet sites to the ESnet core network
– expandable
• Several 10 Gb/s links provided by the Labs will be used to establish multiple, independent connections to the ESnet core
– currently PNNL and ORNL
42
ESnet Metropolitan Area Network Ring Architecture for High Reliability Sites
[Diagram: a MAN fiber ring (2-4 x 10 Gbps channels provisioned initially, with expansion capacity to 16-64) connects a large science site to the ESnet production IP core hub and the ESnet SDN core hub over independent east and west paths, with US LHCnet switches attached to the SDN core switches. The ESnet MAN switch at the site delivers both the ESnet production IP service and ESnet managed λ / circuit services; an independent port card supporting multiple 10 Gb/s line interfaces carries SDN circuits to site systems and virtual circuits to the site, and ESnet managed virtual circuit services are tunneled through the IP backbone, terminating on the site edge router / site gateway router and the site LAN.]
43
Internet2 / Level3 / Infinera Optical Infrastructure
[Map: the national Level 3 fiber footprint with Infinera equipment sites, regen sites, other Level 3 nodes, Internet2 core nodes, common DWDM nodes, ESnet-only DWDM nodes, and extensions from the Internet2 network to RON/core connector nodes. Node locations (with street addresses and connector networks such as CENIC, Pacific Northwest GP, OneNet, GPN, LEARN, LONI, FLR, SLR, NCREN, MAGPI, MAX, NOX, NYSERNET, CIC/MREN, MERIT, and BOREAS) include Seattle, Portland, Sunnyvale, Los Angeles, Salt Lake, Denver, Kansas City, Tulsa, Dallas, El Paso, Houston, Baton Rouge, Jacksonville, Atlanta, Nashville, Indianapolis, Chicago, Cleveland, Pittsburgh, New York, Cambridge, Philadelphia, Washington (McLean, VA), Raleigh, and other cities across the US.]
44
ESnet4 2009 Configuration
(Some of the circuits may be allocated dynamically from a shared pool.)
[Diagram: the planned 2009 topology, showing ESnet IP switch/router hubs, ESnet IP switch only hubs, ESnet SDN switch hubs, Lab sites, layer 1 optical nodes at eventual ESnet Points of Presence, and layer 1 optical nodes not currently in ESnet plans. Hubs include Seattle, Portland, Boise, Sunnyvale, LA, San Diego, Salt Lake City, Denver, Albuquerque, El Paso, KC, Tulsa, Houston, Baton Rouge, Chicago, Nashville, Atlanta, Jacksonville, Raleigh, Cleveland, NYC, Philadelphia, Boston, and Wash. DC, with per-segment lambda counts of 1-3 (plus an OC48 segment) and Internet2 circuit numbers in parentheses. Legend: ESnet IP core; ESnet Science Data Network core; ESnet SDN core, NLR links (existing); Lab supplied link; LHC related link; MAN link; International IP Connections; Internet2 circuit number.]
ESnet4 2009 Configuration (with MAN detail)
(Some of the circuits may be allocated dynamically from a shared pool.)
[Diagram: the same planned 2009 core topology with Metropolitan Area Network detail: the Long Island MAN (32 AoA NYC, BNL, USLHCNet), the Chicago MAN (600 W. Chicago, Starlight, FNAL, ANL, USLHCNet), the San Francisco Bay Area MAN (LBNL, SLAC, JGI, NERSC, LLNL, SNLL, Sunnyvale), Newport News - Elite (Wash. DC, MATP, JLab, ELITE, ODU), and Atlanta (180 Peachtree, 56 Marietta, Nashville). Hub, link, and legend conventions are the same as on the previous slide.]
Internet2 and ESnet Optical Node
[Diagram: at a typical optical node, the Internet2/Level3 national optical infrastructure (an Infinera DTN terminating fiber east, west, and north/south) feeds a Ciena CoreDirector grooming device with an optical interface to R&E regional nets, and will in the future support dynamically allocated and routed waves. The ESnet side consists of the ESnet IP core router (M320) and the SDN core switch (T640), connected to the ESnet metro-area networks. Both sides include support devices for measurement, out-of-band access, monitoring, and security, as well as network testbeds with various equipment and experimental control plane management systems and future access to the control plane.]
47
Typical ESnet4 Hub
[Diagram: equipment and circuits at the Washington, DC hub (3 racks at the WDC Level(3) facility), including the WDC-CR1, WDC-AR1, WDC-SDN1, and WDC-PR1 routers and switches (M320, 7206VXR, 7609, M7i, Foundry, and a MAX NGIX-E 6509), 10 GE lambdas toward AoA (NYC), Cleveland, and Atlanta and to the SDN, GE and OC3/DS3/T3/T1 tails to DOE, DOE-RT1, NGA, ORAU DC, DC labs offices, and LLNL-DC, and peering connections to MAX/NGIX-E (College Park), MAE-E, Equinix-Ashburn, GEANT, and MATP/ELITE/JLAB.]
48
The Evolution of ESnet Architecture
[Two diagrams: the ESnet IP core alone (to 2005), and the ESnet IP core plus the ESnet Science Data Network (SDN) core with metro area rings (from 2006-07), showing ESnet sites, ESnet hubs / core network connection points, metro area rings (MANs), other IP networks, and circuit connections to other science networks (e.g. USLHCNet).]
• ESnet to 2005: a routed IP network with sites singly attached to a national core ring
• ESnet from 2006-07:
– A routed IP network with sites dually connected on metro area rings or dually connected directly to the core ring
– A switched network providing virtual circuit services for data-intensive science
– Rich topology offsets the lack of dual, independent national cores
49
ESnet4 Planned Configuration
Core networks: 40-50 Gbps in 2009-2010, 160-400 Gbps in 2011-2012
[Map: the IP core and the Science Data Network core (core network fiber path ~14,000 miles / 24,000 km, roughly 2700 miles / 4300 km across and 1625 miles / 2545 km north-south) with IP core hubs, SDN (switch) hubs, primary DOE Labs, possible hubs, and high speed cross-connects with Internet2/Abilene at locations including Seattle, Boise, Sunnyvale, LA, San Diego, Albuquerque, Denver, Tulsa, Boston, New York, Washington DC, Atlanta, and Jacksonville. International connections: Canada (CANARIE), Asia-Pacific, Australia, Europe (GEANT), CERN (30 Gbps), GLORIAD (Russia and China), and South America (AMPATH), several shown at more than one point. Legend: Production IP core (10 Gbps); SDN core (20-30-40 Gbps); MANs (20-60 Gbps) or backbone loops for site access; International connections.]
50
Next Generation ESnet: II) Virtual Circuits
• Traffic isolation and traffic engineering
– Provides for high-performance, non-standard transport mechanisms that cannot co-exist with commodity TCP-based transport
– Enables the engineering of explicit paths to meet specific requirements
• e.g. bypass congested links, using lower bandwidth, lower latency paths
• Guaranteed bandwidth (Quality of Service (QoS))
– User specified bandwidth
– Addresses deadline scheduling
• Where fixed amounts of data have to reach sites on a fixed schedule, so that the processing does not fall far enough behind that it could never catch up – very important for experiment data analysis
• Reduces cost of handling high bandwidth data flows
– Highly capable routers are not necessary when every packet goes to the same place
– Use lower cost (factor of 5x) switches to route the packets
• Secure
– The circuits are "secure" to the edges of the network (the site boundary) because they are managed by the control plane of the network, which is isolated from the general traffic
• Provides end-to-end connections between Labs and collaborator institutions
51
Virtual Circuit Service Functional Requirements
• Support user/application VC reservation requests
– Source and destination of the VC
– Bandwidth, start time, and duration of the VC
– Traffic characteristics (e.g. flow specs) to identify traffic designated for the VC
• Manage allocations of scarce, shared resources
– Authentication to prevent unauthorized access to this service
– Authorization to enforce policy on reservation/provisioning
– Gathering of usage data for accounting
• Provide circuit setup and teardown mechanisms and security
– Widely adopted and standard protocols (such as MPLS and GMPLS) are well understood within a single domain
– Cross domain interoperability is the subject of ongoing, collaborative development
– Secure end-to-end connection setup is provided by the network control plane
• Enable the claiming of reservations
– Traffic destined for the VC must be differentiated from "regular" traffic
• Enforce usage limits
– Per VC admission control polices usage, which in turn facilitates guaranteed bandwidth
– Consistent per-hop QoS throughout the network for transport predictability
(A minimal sketch of what such a reservation request might look like follows below.)
52
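To make the reservation parameters above concrete, here is a minimal illustrative sketch (Python; not ESnet's actual OSCARS code or API, and all names and example addresses are hypothetical) of a VC reservation request carrying the source, destination, bandwidth, schedule, and flow-spec fields the slide lists.

```python
# Illustrative sketch only: a virtual circuit reservation request carrying the
# fields called out on the slide (source/destination, bandwidth, start time,
# duration, and a flow spec to identify the traffic). Not the real OSCARS API.
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

@dataclass
class FlowSpec:
    src_ip: str               # source host or prefix
    dst_ip: str               # destination host or prefix
    protocol: str = "tcp"
    dst_port: Optional[int] = None

@dataclass
class VCReservationRequest:
    source: str               # VC source (e.g. a Lab border router)
    destination: str          # VC destination (e.g. a collaborator site)
    bandwidth_mbps: int       # guaranteed bandwidth
    start_time: datetime      # when the circuit should come up
    duration: timedelta       # how long it should stay up
    flow_spec: FlowSpec       # identifies the traffic that may claim the VC

# Example: reserve 1 Gbps for 6 hours between two hypothetical endpoints.
request = VCReservationRequest(
    source="site-a-border", destination="site-b-border",
    bandwidth_mbps=1000,
    start_time=datetime(2006, 12, 6, 2, 0),
    duration=timedelta(hours=6),
    flow_spec=FlowSpec(src_ip="198.51.100.10", dst_ip="203.0.113.20"),
)
print(request)
```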
ESnet Virtual Circuit Service: OSCARS
(On-demand Secured Circuits and Advanced Reservation System)
Software Architecture (see Ref. 9)
• Web-Based User Interface (WBUI) will prompt the user for a username/password and forward it to the AAAS.
• Authentication, Authorization, and Auditing Subsystem (AAAS) will handle access, enforce policy, and generate usage records.
• Bandwidth Scheduler Subsystem (BSS) will track reservations and map the state of the network (present and future).
• Path Setup Subsystem (PSS) will setup and teardown the on-demand paths (LSPs).
[Diagram: a human user submits a request via the Web-Based User Interface, or a user application submits a request directly to the AAAS; the Reservation Manager (comprising the AAAS, the Bandwidth Scheduler Subsystem, and the Path Setup Subsystem) processes the request, returns user feedback, and issues instructions to routers and switches to setup/teardown LSPs.]
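As a rough sketch of how these subsystems might hand a request off to one another (assuming only the division of labor described above; the class and method names are hypothetical, not OSCARS code):

```python
# Hypothetical sketch of the OSCARS-style subsystem hand-off described above.
# Each class stands in for one subsystem; method names are illustrative only.

class AAAS:
    """Authentication, Authorization, and Auditing Subsystem."""
    def authorize(self, user: str, password: str, request) -> bool:
        # Check credentials and policy, and record the request for auditing.
        print(f"audit: {user} requested {request}")
        return True

class BandwidthScheduler:
    """Tracks reservations and the present/future state of the network."""
    def __init__(self):
        self.reservations = []
    def reserve(self, request) -> bool:
        # A real scheduler would check link capacity over the requested window.
        self.reservations.append(request)
        return True

class PathSetup:
    """Sets up and tears down the on-demand paths (LSPs)."""
    def setup(self, request):
        print(f"setting up LSP for {request}")
    def teardown(self, request):
        print(f"tearing down LSP for {request}")

class ReservationManager:
    """Ties the subsystems together for a request from the WBUI or an app."""
    def __init__(self):
        self.aaas, self.bss, self.pss = AAAS(), BandwidthScheduler(), PathSetup()
    def handle(self, user: str, password: str, request) -> str:
        if not self.aaas.authorize(user, password, request):
            return "denied"
        if not self.bss.reserve(request):
            return "no capacity"
        return "accepted"  # the PSS acts later, at the reserved start time
```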
The Mechanisms Underlying OSCARS
Based on source and sink IP addresses, the route of the LSP between ESnet border routers is determined using topology information from OSPF-TE. The path of the LSP can be explicitly directed to take the SDN network. On the SDN Ethernet switches all traffic is MPLS switched (layer 2.5), which stitches together VLANs.
On ingress to ESnet, packets matching the reservation profile are filtered out (i.e. policy based routing), policed to the reserved bandwidth, and injected into an LSP.
[Diagram: traffic from the source crosses VLANs 1-3 on the SDN (with RSVP and MPLS enabled on internal interfaces) along a Label Switched Path, then over IP links to the sink. MPLS labels are attached to packets from the source, and those packets are placed in a separate high-priority interface queue to ensure guaranteed bandwidth, while regular production traffic uses the standard, best-effort queue.]
54
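The ingress behavior described above (match the reservation's flow spec, police to the reserved rate, and mark matching packets for the high-priority LSP queue) can be modeled roughly as follows. This is an illustrative token-bucket sketch in Python, not router configuration, and the field names and addresses are assumptions.

```python
# Illustrative model of OSCARS-style ingress handling: packets that match a
# reservation's flow spec are policed to the reserved bandwidth and sent to the
# high-priority (LSP) queue; everything else stays best-effort. Not router code.
import time

class ReservationPolicer:
    def __init__(self, src_ip: str, dst_ip: str, rate_bps: float):
        self.src_ip, self.dst_ip = src_ip, dst_ip
        self.rate_bps = rate_bps          # reserved bandwidth
        self.bucket = rate_bps            # token bucket, in bits (1 s of burst)
        self.last = time.monotonic()

    def matches(self, pkt: dict) -> bool:
        # "Flow spec" match on source/destination addresses.
        return pkt["src"] == self.src_ip and pkt["dst"] == self.dst_ip

    def conforms(self, pkt: dict) -> bool:
        # Refill tokens at the reserved rate, then charge the packet size.
        now = time.monotonic()
        self.bucket = min(self.rate_bps,
                          self.bucket + (now - self.last) * self.rate_bps)
        self.last = now
        bits = pkt["bytes"] * 8
        if bits <= self.bucket:
            self.bucket -= bits
            return True
        return False                      # out of profile: do not use the LSP

def classify(pkt: dict, policer: ReservationPolicer) -> str:
    """Return which queue the packet goes to at the ESnet ingress router."""
    if policer.matches(pkt) and policer.conforms(pkt):
        return "high-priority LSP queue"
    return "standard best-effort queue"

policer = ReservationPolicer("198.51.100.10", "203.0.113.20", rate_bps=1e9)
print(classify({"src": "198.51.100.10", "dst": "203.0.113.20", "bytes": 1500},
               policer))
```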
Environment of Science is Inherently Multi-Domain
• End points will be at independent institutions – campuses or research institutes - that are served by ESnet, Abilene, GÉANT, and their regional networks
– Complex inter-domain issues – a typical circuit will involve five or more domains - of necessity this involves collaboration with other networks
– For example, a connection between FNAL and DESY involves five domains, traverses four countries, and crosses seven time zones:
FNAL (AS3152) [US], ESnet (AS293) [US], GEANT (AS20965) [Europe], DFN (AS680) [Germany], DESY (AS1754) [Germany]
55
OSCARS: Guaranteed Bandwidth VC Service For SC Science
• To ensure compatibility, the design and implementation is done in collaboration with the other major science R&E networks and end sites
– Internet2: Bandwidth Reservation for User Work (BRUW)
• Development of common code base
– GEANT: Bandwidth on Demand (GN2-JRA3), Performance and Allocated Capacity for End-users (SA3-PACE) and Advance Multi-domain Provisioning System (AMPS) extends to NRENs
– BNL: TeraPaths - A QoS Enabled Collaborative Data Sharing Infrastructure for Peta-scale Computing Research
– GA: Network Quality of Service for Magnetic Fusion Research
– SLAC: Internet End-to-end Performance Monitoring (IEPM)
– USN: Experimental Ultra-Scale Network Testbed for Large-Scale Science
• In its current phase this effort is being funded as a research project by the Office of Science, Mathematical, Information, and Computational Sciences (MICS) Network R&D Program
• A prototype service has been deployed as a proof of concept
– To date more than 20 accounts have been created for beta users, collaborators, and developers
– More than 100 reservation requests have been processed
56
ESnet Virtual Circuit Service Roadmap
[Timeline from initial production service (2005) to full production service (2008):]
• Dedicated virtual circuits
• Dynamic virtual circuit allocation
• Dynamic provisioning of Multi-Protocol Label Switching (MPLS) circuits in IP nets (layer 3) and in VLANs for Ethernets (layer 2)
• Interoperability between VLANs and MPLS circuits (layer 2 & 3)
• Interoperability between GMPLS circuits, VLANs, and MPLS circuits (layer 1-3)
• Generalized MPLS (GMPLS)
57
Federated Trust Services – Support for Large-Scale Collaboration
• Remote, multi-institutional, identity authentication is critical for distributed, collaborative science in order to permit sharing widely distributed computing and data resources, and other Grid services
• Public Key Infrastructure (PKI) is used to formalize the existing web of trust within science collaborations and to extend that trust into cyber space
– The function, form, and policy of the ESnet trust services are driven entirely by the requirements of the science community and by direct input from the science community
• International scope trust agreements that encompass many organizations are crucial for large-scale collaborations
– ESnet has led in negotiating and managing the cross-site, cross-organization, and international trust relationships to provide policies that are tailored for collaborative science
• This service, together with the associated ESnet PKI service, is the basis of the routine sharing of HEP Grid-based computing resources between the US and Europe
58
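As a small illustration of what relying parties do with certificates issued under such a CA, the sketch below (Python with the cryptography package; the file names are hypothetical and this is not ESnet or DOEGrids tooling) checks that a user certificate names a given CA as its issuer and is within its validity period. Real Grid deployments also verify signatures, full chains, and revocation status.

```python
# Illustrative sketch: check that a user certificate names the expected CA as
# its issuer and is currently within its validity period. Hypothetical file
# names; not a substitute for full chain and revocation checking.
from datetime import datetime
from cryptography import x509

def basic_checks(user_cert_pem: bytes, ca_cert_pem: bytes) -> bool:
    user_cert = x509.load_pem_x509_certificate(user_cert_pem)
    ca_cert = x509.load_pem_x509_certificate(ca_cert_pem)
    now = datetime.utcnow()
    issued_by_ca = user_cert.issuer == ca_cert.subject
    in_validity = user_cert.not_valid_before <= now <= user_cert.not_valid_after
    return issued_by_ca and in_validity

with open("usercert.pem", "rb") as u, open("ca.pem", "rb") as c:
    print("certificate looks valid:", basic_checks(u.read(), c.read()))
```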
DOEGrids CA (one of several CAs) Usage Statistics
[Chart: cumulative counts of user certificates, service certificates, expired (+revoked) certificates, total certificates issued, and total certificate requests, Jan-03 through Oct-06 (0 to 16,000).]
Production service began in June 2003
• User Certificates: 4059; Host & Service Certificates: 8236; Total No. of Requests: 15285
• Total No. of Active Certificates: 5113; Total No. of Expired Certificates: 7182; Total No. of Certificates Issued: 12295
• ESnet SSL Server CA Certificates: 113; FusionGRID CA certificates: 86
* Report as of Dec 1, 2006
59
DOEGrids CA Usage - Virtual Organization Breakdown
DOEGrids CA Statistics (5113 active certificates):
• FNAL (Fermi Nat. Accelerator Lab.): 29.2%
• *Others (DOE-NSF collab. & auto renewals): 19.6%
• iVDGL (International Very Large Data Grid): 18.0%
• PPDG (Particle Physics Data Grid): 15.0%
• **OSG (Open Science Grid): 10.2%
• ANL (Argonne Nat. Lab.): 2.3%
• NERSC (Nat. Energy Research Supercomputer Center): 1.4%
• LCG (LHC Computing Grid): 1.3%
• LBNL (Lawrence Berkeley Lab.): 0.9%
• ESG (Earth System Grid): 0.7%
• FusionGRID (Fusion Grid): 0.7%
• ORNL (Oak Ridge Nat. Lab.): 0.5%
• ESnet: 0.3%
• PNNL (Pacific Northwest Nat. Lab.): 0.0%
** OSG includes BNL, CDF, CMS, DES, DOSAR, DZero, Fermilab, fMRI, GADU, geant4, GLOW, GRASE, GridEx, GROW, i2u2, iVDGL, JLAB, LIGO, mariachi, MIS, nanoHUB, NWICG, OSG, OSGEDU, SDSS, SLAC, STAR & USATLAS
60
DOEGrids CA (Active Certificates) Usage Statistics
[Chart: active user certificates, active service certificates, and total active certificates (0 to 6000), Jan-03 through Oct-06.]
Production service began in June 2003
* Report as of Dec 1, 2006
61
Summary
• ESnet is currently satisfying its mission by enabling SC science that is dependent on networking and distributed, large-scale collaboration:
"The performance of ESnet over the past year has been excellent, with only minimal unscheduled down time. The reliability of the core infrastructure is excellent. Availability for users is also excellent" - DOE 2005 annual review of LBL
• ESnet has put considerable effort into gathering requirements from the DOE science community, and has a forward-looking plan and expertise to meet the five-year SC requirements
– A Lehman review of ESnet (Feb, 2006) has strongly endorsed the plan presented here
62
References
1. High Performance Network Planning Workshop, August 2002 – http://www.doecollaboratory.org/meetings/hpnpw
2. Science Case Studies Update, 2006 (contact [email protected])
3. DOE Science Networking Roadmap Meeting, June 2003 – http://www.es.net/hypertext/welcome/pr/Roadmap/index.html
4. DOE Workshop on Ultra High-Speed Transport Protocols and Network Provisioning for Large-Scale Science Applications, April 2003 – http://www.csm.ornl.gov/ghpn/wk2003
5. Science Case for Large Scale Simulation, June 2003 – http://www.pnl.gov/scales/
6. Workshop on the Road Map for the Revitalization of High End Computing, June 2003 – http://www.cra.org/Activities/workshops/nitrd – http://www.sc.doe.gov/ascr/20040510_hecrtf.pdf (public report)
7. ASCR Strategic Planning Workshop, July 2003 – http://www.fp-mcs.anl.gov/ascr-july03spw
8. Planning Workshops - Office of Science Data-Management Strategy, March & May 2004 – http://www-conf.slac.stanford.edu/dmw2004
9. For more information contact Chin Guok ([email protected]). Also see http://www.es.net/oscars
63
Additional Information
64
LHC Tier 0, 1, and 2 Connectivity Requirements Summary
[Map: CERN (Tier 0, CERN-1/2/3 links) connects via USLHCNet to the ESnet SDN and IP cores, which reach the US Tier 1 centers BNL (Atlas T1) and FNAL (CMS T1) over virtual circuits, TRIUMF (Atlas T1, Canada) via CANARIE and Seattle, Tier 2 sites on the Abilene / GigaPoP footprint, GÉANT (GÉANT-1, GÉANT-2) paths to Europe, and cross connects with Internet2/Abilene; USLHC nodes, Abilene/GigaPoP nodes, ESnet IP core hubs, ESnet SDN/NLR hubs, Tier 1 centers, and Tier 2 sites are marked.]
• Direct connectivity T0-T1-T2
• USLHCNet to ESnet to Abilene
• Backup connectivity
• SDN, GLIF, VCs
65
Example Case Study Summary Matrix: Fusion
• Considers instrument and facility requirements, the process of science drivers, and resulting network requirements, cross cut with timelines
(Columns: Science Instruments and Facilities; Process of Science; Network; Network Services and Middleware)
Near-term anticipated requirements
– Science Instruments and Facilities: each experiment only gets a few days per year, so high productivity is critical; experiment episodes ("shots") generate 2-3 Gbytes every 20 minutes, which has to be delivered to the remote analysis sites in two minutes in order to analyze before the next shot
– Process of Science: highly collaborative experiment and analysis environment; real-time data access and analysis for experiment steering (the more that you can analyze between shots, the more effective you can make the next shot); shared visualization capabilities
5 years
– Science Instruments and Facilities: 10 Gbytes generated by the experiment every 20 minutes (time between shots) to be delivered in two minutes; Gbyte subsets of much larger simulation datasets to be delivered in two minutes for comparison with experiment; simulation data scattered across the United States
– Process of Science: real-time data analysis for experiment steering combined with simulation interaction = big productivity increase; real-time visualization and interaction among collaborators across the United States; integrated simulation of the several distinct regions of the reactor will produce a much more realistic model of the fusion process
– Network: network bandwidth and data analysis computing capacity guarantees (quality of service) for inter-shot data analysis; Gbits/sec for 20 seconds out of 20 minutes, guaranteed; 5 to 10 remote sites involved for data analysis and visualization
– Network Services and Middleware: transparent security; global directory and naming services needed to anchor all of the distributed metadata; support for "smooth" collaboration in a high-stress environment; parallel network I/O between simulations, data archives, experiments, and visualization; high quality, 7x24 PKI identity authentication infrastructure; end-to-end quality of service and quality of service management; secure/authenticated transport to ease access through firewalls; reliable data transfer; transient and transparent data replication for real-time reliability; support for human collaboration tools
5+ years
– Science Instruments and Facilities: simulations generate 100s of Tbytes; ITER: Tbyte per shot, PB per year
– Process of Science: real-time remote operation of the experiment; comprehensive integrated simulation
– Network: quality of service for network latency and reliability, and for co-scheduling computing resources
– Network Services and Middleware: management functions for network quality of service that provide the request and access mechanisms for the experiment run time, periodic traffic noted above; PKI certificate authorities that enable strong authentication of the community members and the use of Grid security tools and services; directory services that can be used to provide the naming root and high-level (community-wide) indexing of shared, persistent data that transforms into community information and knowledge; efficient means to sift through large data repositories to extract meaningful information from unstructured data
66
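As a quick check of what the shot-cycle numbers above imply for the network, the following minimal sketch (Python; purely the arithmetic from the table, with no assumptions beyond it) converts "N gigabytes delivered in two minutes" into a sustained rate.

```python
# Convert the fusion shot-cycle data volumes quoted above into the sustained
# rate needed to deliver them within the two-minute inter-shot window.

def required_gbps(gigabytes: float, seconds: float) -> float:
    """Sustained rate (Gbits/s) needed to move `gigabytes` in `seconds`."""
    return gigabytes * 8 / seconds

window = 2 * 60  # two minutes between data generation and the analysis deadline
print(f"Near term, 3 GB per shot:   {required_gbps(3, window):.2f} Gbps")
print(f"In 5 years, 10 GB per shot: {required_gbps(10, window):.2f} Gbps")
# Roughly 0.2 Gbps today and about 0.67 Gbps in five years, which is why the
# table asks for guaranteed bandwidth on the order of Gbits/sec during part of
# each 20-minute shot cycle.
```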
The Increasing Dominance of Science Traffic
[Chart: traffic volume (Terabytes/month, 0-1000) of the top 100 AS-AS flows, by month, January 2004 - June 2006 (mostly Lab to R&E site, a few Lab to R&E network - all science).]
67
Parallel Data Movers now Predominate
Look at the hosts involved in 2006-01-31: the plateaus in the host-host top 100 flows are all parallel transfers (thx. to Eli Dart for this observation)
[Table of the top host-to-host flows, grouped into parallel transfer sets: multiple Vanderbilt nodes (A1320xx.N1.Vanderbilt.Edu) moving data with the FNAL lstore1-lstore4 servers; multiple SLAC bbr-xfer hosts moving data with the babar hosts at FZK Karlsruhe (babar.fzk.de, babar2/3.fzk.de), with the bbr-datamove hosts at CNAF (cr.cnaf.infn.it), and with the csfmove/move hosts at RAL (rl.ac.uk); and the INFN Padova bbr-export hosts (pd.infn.it) moving data with SLAC bbr-xfer hosts.]
68
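A minimal sketch of the kind of grouping behind this observation (Python; the flow tuples below are made-up examples in the same spirit as the table, not the actual data):

```python
# Group host-to-host flows into site-pair "parallel transfer sets", the pattern
# the table above illustrates. The sample flows are invented for illustration.
from collections import defaultdict

def site(host: str) -> str:
    """Crude site key: the registered-domain part of a host name."""
    return ".".join(host.split(".")[-2:])

flows = [  # (source host, destination host, terabytes) - made-up examples
    ("nodeA.example-univ.edu", "store1.lab.gov", 6.0),
    ("nodeB.example-univ.edu", "store2.lab.gov", 6.4),
    ("xfer1.lab2.gov", "mover1.site.eu", 2.3),
    ("xfer2.lab2.gov", "mover2.site.eu", 2.4),
]

groups = defaultdict(list)
for src, dst, tb in flows:
    groups[(site(src), site(dst))].append(tb)

for (s, d), volumes in groups.items():
    print(f"{s} -> {d}: {len(volumes)} parallel flows, {sum(volumes):.1f} TB total")
```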
OSCARS Reservations
1. A user submits a request to the RM specifying start and end times, bandwidth requirements, and the source and destination hosts
2. Using the source and destination host information submitted by the user, the ingress and egress border routers and the circuit path (MPLS LSP) are determined
3. This information is stored by the BSS in a database, and a script periodically checks to see if the PSS needs to be contacted, either to create or tear down the circuit
4. At the requested start time, the PSS configures the ESnet provider edge (PE) router (at the start end of the path) to create an LSP with the specified bandwidth
5. Each router along the route receives the path setup request via the Resource Reservation Protocol (RSVP) and commits bandwidth (if available), creating an end-to-end LSP. The RM is notified by RSVP if the end-to-end path cannot be established.
6. Packets from the source (e.g. experiment) are routed through the site's LAN production path to ESnet's PE router. On entering the PE router, these packets are identified and filtered using flow specification parameters (e.g. source/destination IP address/port numbers) and policed at the specified bandwidth. The packets are then injected into the LSP and switched (using MPLS) through the network to the destination (e.g. computing cluster).
7. A notification of the success or failure of LSP setup is passed back to the RM so that the user can be notified and the event logged for auditing purposes
8. At the requested end time, the PSS tears down the LSP
69
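Steps 3, 4, and 8 amount to a periodic check of stored reservations against the clock; a minimal sketch of that loop follows (Python, illustrative only, with hypothetical names rather than OSCARS internals).

```python
# Illustrative sketch of the periodic check in steps 3, 4, and 8 above: walk the
# stored reservations and ask the path setup subsystem (pss) to create or tear
# down LSPs whose start or end times have arrived. Names are hypothetical.
from datetime import datetime

def check_reservations(reservations: list, pss, now: datetime):
    for r in reservations:                      # r is a dict stored by the BSS
        if r["state"] == "scheduled" and now >= r["start"]:
            pss.setup(r)                        # step 4: create the LSP
            r["state"] = "active"
        elif r["state"] == "active" and now >= r["end"]:
            pss.teardown(r)                     # step 8: tear down the LSP
            r["state"] = "finished"

# In a deployment this would run from a periodic scheduler (e.g. cron) against
# the reservation database; here `pss` is any object with setup()/teardown(),
# such as the PathSetup sketch shown earlier.
```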
Inter-domain Reservations: Tough Problem
• Motivation:
– For a virtual circuit service to be successful, it must
• Be end-to-end, potentially crossing several administrative domains
• Have consistent network service guarantees throughout the circuit
• Observation:
– Setting up an intra-domain circuit is easy compared with coordinating an inter-domain circuit
• Issues:
– Cross domain authentication and authorization
• A mechanism to authenticate and authorize a bandwidth on-demand (BoD) circuit request must be agreed upon in order to automate the process
– Multi-domain Acceptable Use Policies (AUPs)
• Domains may have very specific AUPs dictating what the BoD circuits can be used for and where they can transit/terminate
– Domain specific service offerings
• Domains must have a way to guarantee a certain level of service for BoD circuits
– Security concerns
• Are there mechanisms for a domain to protect itself (e.g. RSVP filtering)?
70
Inter-domain Path Setup
[Diagram: Host A attaches to ISP A (reservation manager RM A) and Host B to ISP B; OSCARS receives the request (1), forwards it across ISP X and its reservation manager RM X (2) along the routed path from Host B to Host A, and RM A then initiates reservations toward Host B (3) along the routed path from Host A to Host B, which goes via ISP Y and RM Y.]
1. On receiving the request from the user, OSCARS computes the virtual circuit path and determines the downstream AS (ISP X).
2. The request is then encapsulated in a message forwarded across the network (ISP X) towards Host A, crossing all intervening reservation systems (RM X), until it reaches the last reservation system (RM A) that has administrative control over the network (ISP A) that Host A is attached to.
3. The remote reservation system (RM A) then computes the path of the virtual circuit and initiates the bandwidth reservation requests from Host A towards Host B (via ISP Y). This can be especially complex when the path back (from Host B to A) is asymmetric and traverses ASes (e.g. ISP Y) that were not traversed on the forward path, causing the local OSCARS to see the path originating from a different AS than it originally sent the request to.
71
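A very small sketch of the request-forwarding idea in steps 1-2 (Python; the domain names, neighbor map, and message shape are all hypothetical):

```python
# Hypothetical sketch of forwarding an inter-domain reservation request hop by
# hop toward the domain that administers the far-end host (steps 1-2 above).

# Which reservation manager to contact next, per (current domain, target domain).
NEXT_HOP = {
    ("esnet", "isp_a"): "isp_x",   # ESnet reaches ISP A's RM via ISP X's RM
    ("isp_x", "isp_a"): "isp_a",
}

def forward_request(request: dict, current_domain: str, target_domain: str) -> list:
    """Return the chain of domains the request crosses until it reaches the
    domain with administrative control over the far-end host."""
    chain = [current_domain]
    while current_domain != target_domain:
        current_domain = NEXT_HOP[(current_domain, target_domain)]
        chain.append(current_domain)
    return chain

req = {"src_host": "host-b.example", "dst_host": "host-a.example", "gbps": 1}
print(forward_request(req, "esnet", "isp_a"))   # ['esnet', 'isp_x', 'isp_a']
```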