
Global Networking for the LHC
Artur Barczyk
California Institute of Technology
ECOC Conference
Geneva, September 18th, 2011
INTRODUCTION
First Year of LHC from the network perspective
WLCG Worldwide Resources
WLCG Collaboration Status
Tier 0; 11 Tier 1s; 68 Tier 2 federations
Today we have 49 MoU signatories, representing 34 countries:
Australia, Austria, Belgium, Brazil, Canada, China, Czech Rep, Denmark,
Estonia, Finland, France, Germany, Hungary, Italy, India, Israel, Japan, Rep.
Korea, Netherlands, Norway, Pakistan, Poland, Portugal, Romania, Russia,
Slovenia, Spain, Sweden, Switzerland, Taipei, Turkey, UK, Ukraine, USA.
Today >140 sites
>250k CPU cores
>150 PB disk
In addition to WLCG, O(300) Tier3 sites, not shown
Data and Computing Models
The evolving MONARC picture: circa 1996 and circa 2003
The models are based on the MONARC model, now 10+ years old, with variations by experiment
From Ian Bird, ICHEP 2010
The LHC Optical Private Network
Serving Tier0 and Tier1 sites
• Dedicated network resources for Tier0 and Tier1 data movement
• Layer 2 overlay on R&E infrastructure
• 130 Gbps total Tier0-Tier1 capacity
• Simple architecture
– Point-to-point Layer 2 circuits
– Flexible and scalable topology
• Grew organically
– From star to partial mesh (sketched below)
• Open to technology choices
– Have to satisfy requirements
– OC-192/STM-64, EoMPLS, OTN-3
• Federated governance model
– Coordination between stakeholders
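The LHCOPN architecture is simple enough to capture in a few lines. The Python sketch below, with made-up site names and capacities (not the real circuit list), models it as a set of dedicated point-to-point Layer 2 circuits and shows how adding Tier1-Tier1 links turns the original Tier0 star into a partial mesh; it only illustrates the topology idea.

# Minimal sketch of the LHCOPN topology idea: dedicated point-to-point
# Layer 2 circuits, grown from a Tier0 star into a partial mesh.
# Site names and capacities are illustrative, not the real circuit list.
from collections import defaultdict

class Lhcopn:
    def __init__(self):
        self.links = defaultdict(dict)   # site -> {peer: capacity in Gbps}

    def add_circuit(self, a, b, gbps):
        """Add a bidirectional point-to-point circuit between two sites."""
        self.links[a][b] = gbps
        self.links[b][a] = gbps

    def total_t0_capacity(self, t0="CERN-T0"):
        """Aggregate Tier0-facing capacity (the '130 Gbps'-style number)."""
        return sum(self.links[t0].values())

net = Lhcopn()
# Original star: every Tier1 has a circuit to the Tier0.
for t1, gbps in [("T1-A", 10), ("T1-B", 10), ("T1-C", 20)]:
    net.add_circuit("CERN-T0", t1, gbps)
# Organic growth: a few Tier1-Tier1 circuits make it a partial mesh.
net.add_circuit("T1-A", "T1-B", 10)

print(net.total_t0_capacity())   # 40 (illustrative)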
2010 Worldwide data distribution and analysis (F. Gianotti)
Total throughput of ATLAS data through the Grid, 1st January to November [plot of MB/s per day: 6 GB/s reached vs. ~2 GB/s design; peaks of 10 GB/s reached]
Grid-based analysis in Summer 2010: >1000 different users; >15M analysis jobs
The excellent Grid performance has been crucial for fast release of physics results. E.g. at ICHEP, the full data sample taken until Monday was shown at the conference on Friday
CMS Data Movements (2010)
(All Sites and Tier1-Tier2)
[Plots: throughput in GBytes/s over 120 days, June-October 2010]
Daily average total rates reach over 2 GBytes/s
Daily average Tier1-Tier2 rates reach 1-1.8 GBytes/s
[Plot: 132 hours in October 2010, 10/6 - 10/10]
1-hour average: up to 3.5 GBytes/s; to ~50% during dataset reprocessing & repopulation
Tier2-Tier2 traffic is ~25% of Tier1-Tier2 traffic
THE RESEARCH AND EDUCATION
NETWORKING LANDSCAPE
Selected representative examples
GEANT Pan-European Backbone
34 NRENs, ~40M Users; 50k km Leased Lines
12k km Dark Fiber; Point to Point Services
GN3 Next Gen. Network Started in June 2009
Dark Fiber Core among 19 countries: Austria, Belgium, Croatia, Czech Republic, Denmark, Finland, France, Germany, Hungary, Ireland, Italy, Netherlands, Norway, Slovakia, Slovenia, Spain, Sweden, Switzerland, United Kingdom
SURFnet & NetherLight: 8000 km Dark Fiber
Flexible Photonic Infrastructure
• 5 photonic subnets; λ switching; 10G, 40G; 100G trials
• Cross border fiber: to Belgium, on to CERN (1650 km); to Germany (X-WiN), on to NORDUnet
• Fixed or dynamic lightpaths for LCG, GN3, EXPReS, DEISA, LOFAR, CineGrid
Erik-Jan Bos
GARR-X in Italy: Dark Fiber Network
Supporting LHC Tier1 and Nat’l Tier2 Centers
GARR-X:
• 10G links among the Bologna Tier1 and 5 Tier2s; adding 5 more sites at 10G
• 2 x 10G circuits to the LHCOPN over GEANT, and to Karlsruhe via international Tier2-Tier1 circuits
• Cross border fibers to Karlsruhe (via CH, DE)
M. Marletta
US: DOE ESnet
Current ESnet4 Topology: Multi-10G backbone
[Map of ESnet4; legend: SDN node, IP router node, 10G link, major site]
DOE ESnet – 100 Gbps Backbone Upgrade
ESnet5 100G backbone, Q4 2012; first deployment started Q3 2011
[Map of ESnet5; legend: 100G node, router node, 100G link, major site]
US LHCNet
Non-stop operation; circuit-oriented services
• Core: optical multiservice switches
• Performance-enhancing standard extensions: VCAT, LCAS (illustrated below)
• USLHCNet, ESnet, BNL & FNAL: facility, equipment and link redundancy
• Dynamic circuit-oriented network services with bandwidth guarantees, with robust fallback at Layer 1: hybrid optical network
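VCAT and LCAS are worth a small illustration: VCAT bundles several SONET/SDH members into one logical circuit whose capacity is the sum of the members, and LCAS adjusts the group without tearing it down when members fail or are added. The Python sketch below uses an example member size, not USLHCNet's actual provisioning.

# Illustrative sketch of the VCAT/LCAS idea on circuit-oriented links:
# a virtual-concatenation group is a bundle of fixed-size members, and an
# LCAS-style adjustment drops failed members without taking the group down.
# The member size is an example only.
MEMBER_GBPS = 2.5          # roughly an STS-48c / VC-4-16c sized member (example)

class VcatGroup:
    def __init__(self, n_members):
        self.members = [True] * n_members     # True = member is up

    def capacity(self):
        """Group capacity is the sum of the currently active members."""
        return sum(MEMBER_GBPS for up in self.members if up)

    def member_failed(self, index):
        """LCAS-style reaction: remove the member, keep the group running."""
        self.members[index] = False

group = VcatGroup(4)
print(group.capacity())    # 10.0 Gbps with all members up
group.member_failed(2)     # one member (e.g. one transatlantic path) fails
print(group.capacity())    # 7.5 Gbps: degraded, not down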
Dark Fiber in NREN Backbones 2005 – 2010
Greater or Complete Reliance on Dark Fiber
[Maps for 2005 and 2010] TERENA Compendium 2010: www.terena.org/activities/compendium/

Cross Border Dark Fiber in Europe
Current and Planned: Increasing Use (TERENA Compendium 2010)
Global Lambda Integrated Facility
GLIF 2010 Map – Global View (http://glif.is)
A Global Partnership of R&E Networks and Advanced Network R&D Projects Supporting HEP

GLIF 2010 Map: North America
~16 10G transatlantic links in 2010

GLIF 2010 Map: European View
R&E Networks, Links and GOLEs
GLIF Open Lightpath Exchanges: MoscowLight, CzechLight, CERNLight, NorthernLight, NetherLight, UKLight

Open Exchange Points: NetherLight Example
3 x 40G, 30+ 10G lambdas, use of dark fiber
Convergence of many partners on common lightpath concepts: Internet2, ESnet, GEANT, USLHCNet; nl, cz, ru, be, pl, es, tw, kr, hk, in, nordic
LHC NETWORKING - BEYOND LHCOPN
Computing Models Evolution
• Moving away from the strict MONARC model
• Introduced gradually since 2010
• 3 recurring themes:
– Flat(ter) hierarchy: any site can use any other site as a source of data
– Dynamic data caching: analysis sites will pull datasets from other sites "on demand", including from Tier2s in other regions
• Possibly in combination with strategic pre-placement of data sets
– Remote data access: jobs executing locally, using data cached at a remote site in quasi-real time
• Possibly in combination with local caching
• Variations by experiment (a sketch of the resulting data flow follows below)
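The three recurring themes boil down to a short piece of data-flow logic. The Python sketch below, with hypothetical catalogue and site names, shows the "flat hierarchy plus on-demand caching" idea: a job first looks in the local cache, then pulls the dataset from any site holding a replica, and only falls back to remote reads. It is a sketch of the logic, not any experiment's actual data-management code.

# Sketch of the flatter, on-demand data flow: try the local cache, then pull
# the dataset from any site that has it (not just the regional Tier1), and
# fall back to remote access otherwise. All names are made up.
LOCAL_CACHE = {"dataset-A"}
REPLICA_CATALOGUE = {            # dataset -> sites holding a copy
    "dataset-B": ["T2-Rome", "T1-FNAL", "T2-Tokyo"],
}

def locate(dataset):
    """Return how a job at this site would read the dataset."""
    if dataset in LOCAL_CACHE:
        return ("local", None)
    sites = REPLICA_CATALOGUE.get(dataset, [])
    if sites:
        # Flat hierarchy: any site, in any region, can serve as the source.
        source = sites[0]
        LOCAL_CACHE.add(dataset)            # dynamic caching on demand
        return ("cached-from", source)
    return ("remote-read", None)            # quasi-real-time remote access

print(locate("dataset-A"))   # ('local', None)
print(locate("dataset-B"))   # ('cached-from', 'T2-Rome')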
LHC Open Network Environment
• So far, T1-T2, T2-T2, and T3 data movements have used the General Purpose Network infrastructure
– Shared resources (with other science fields)
– Mostly best-effort service
• Increased reliance on network performance → more than best effort is needed
• Separate large LHC data flows from the routed R&E GPN
• Collaboration on a global scale, diverse environment, many parties
– Solution to be Open, Neutral and Diverse
– Agility and expandability
• Scalable in bandwidth, extent and scope
• Organic activity, growing over time according to needs
• Architecture (a path sketch follows below):
– Switched core, routed edge
– Core: interconnecting trunks between Open Exchanges
– Edge: site border routers, or the border routers of regional aggregation networks
• Services: multipoint, static point-to-point, dynamic point-to-point
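The "switched core, routed edge" architecture can be pictured as a two-level graph: open exchange points joined by trunks in the core, and site or regional border routers attached at the edge. The Python sketch below, with invented exchange and site names, only illustrates how a site-to-site path traverses edge, core, edge.

# Two-level sketch of the LHCONE idea: a switched core of open exchange
# points connected by trunks, and a routed edge of site border routers.
# All names are illustrative.
CORE_TRUNKS = {                      # exchange point -> directly trunked peers
    "ExchangeA": ["ExchangeB"],
    "ExchangeB": ["ExchangeA", "ExchangeC"],
    "ExchangeC": ["ExchangeB"],
}
EDGE_ATTACH = {                      # site border router -> its exchange point
    "SiteX-BR": "ExchangeA",
    "SiteY-BR": "ExchangeC",
}

def site_to_site_path(src, dst):
    """Edge -> core (breadth-first search across trunks) -> edge."""
    start, goal = EDGE_ATTACH[src], EDGE_ATTACH[dst]
    frontier, seen = [[start]], {start}
    while frontier:
        path = frontier.pop(0)
        if path[-1] == goal:
            return [src] + path + [dst]
        for nxt in CORE_TRUNKS[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(path + [nxt])

print(site_to_site_path("SiteX-BR", "SiteY-BR"))
# ['SiteX-BR', 'ExchangeA', 'ExchangeB', 'ExchangeC', 'SiteY-BR']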
LHCONE High-Level Architecture Overview
[LHCONE conceptual diagram]
LOOKING FORWARD: NEW NETWORK SERVICES
Characterization of User Space
Cees de Laat; http://ext.delaat.net/talks/cdl-2005-02-13.pdf
This is where LHC users are
David Foster, 1st TERENA ASPIRE Workshop, May 2011
The Case for Dynamic Circuits in LHC Data Processing
• Data models do not require full-mesh @ full-rate connectivity @ all times
• On-demand data movement will augment and partially replace static pre-placement → network utilisation will be more dynamic and less predictable
• Performance expectations will not decrease
– More dependence on the network for the whole data processing system to work well!
• Need to move large data sets fast between computing sites
– On-demand: caching
– Scheduled: pre-placement
– Transfer latency is important (see the worked example below)
• Network traffic far in excess of what was anticipated
• As data volumes grow rapidly and experiments rely increasingly on network performance, what will be needed in the future is:
– More bandwidth
– More efficient use of network resources
– A systems approach including end-site resources and software stacks
• Note: solutions for the LHC community need global reach
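Moving large data sets fast with bounded transfer latency is essentially an arithmetic statement: the bandwidth a circuit must sustain is the dataset size divided by the acceptable transfer time. The short Python example below works this out with illustrative numbers (100 TB, 12 hours, 80% efficiency); they are not figures from the talk.

# Worked example for the 'move large data sets fast' argument: the rate a
# circuit must guarantee is just dataset size / acceptable transfer time.
# The 100 TB / 12 h / 80% numbers are illustrative, not from the talk.
def required_gbps(dataset_tb, hours, efficiency=0.8):
    """Sustained rate needed, padded for protocol/filesystem overhead."""
    bits = dataset_tb * 1e12 * 8
    return bits / (hours * 3600) / 1e9 / efficiency

print(round(required_gbps(100, 12), 1))   # ~23.1 Gbps for 100 TB in 12 hours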
Dynamic Bandwidth Allocation
• Will be one of the services provided in LHCONE
• Allows network capacity to be allocated on an as-needed basis (a request sketch follows below)
– Instantaneous ("Bandwidth on Demand"), or
– Scheduled allocation
• Significant effort in the R&E networking community
– Standardisation through OGF (OGF-NSI, OGF-NML)
• A dynamic circuit service is present in several advanced R&E networks
– SURFnet (DRAC)
– ESnet (OSCARS)
– Internet2 (ION)
– US LHCNet (OSCARS)
• Planned (or in experimental deployment)
– E.g. GEANT (AutoBAHN), RNP (OSCARS/DCN), …
• DYNES: NSF-funded project to extend hybrid & dynamic network capabilities to campus & regional networks
– In first deployment phase; fully operational in 2012
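The difference between "bandwidth on demand" and a scheduled allocation is mostly a question of what goes into the request. The Python sketch below shows the kind of request object such a service reasons about and a simple admission check; the field names and the reserve() helper are hypothetical, not the actual OSCARS, DRAC, ION or AutoBAHN API.

# Hypothetical sketch of a dynamic-circuit reservation request; field names
# and the reserve() check are illustrative, not a real NSI/OSCARS call.
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class CircuitRequest:
    src: str            # e.g. a site border router or exchange point
    dst: str
    gbps: float
    start: datetime     # now -> "bandwidth on demand"; future -> scheduled
    duration: timedelta

LINK_CAPACITY_GBPS = 10.0
existing_reservations = [CircuitRequest("SiteX", "SiteY", 4.0,
                                        datetime(2011, 10, 1, 12, 0),
                                        timedelta(hours=6))]

def reserve(req):
    """Admit the request only if concurrent reservations fit the link."""
    def overlaps(a, b):
        return a.start < b.start + b.duration and b.start < a.start + a.duration
    used = sum(r.gbps for r in existing_reservations if overlaps(r, req))
    if used + req.gbps <= LINK_CAPACITY_GBPS:
        existing_reservations.append(req)
        return True
    return False

print(reserve(CircuitRequest("SiteX", "SiteY", 5.0,
                             datetime(2011, 10, 1, 14, 0), timedelta(hours=2))))
# True: 4 + 5 <= 10 Gbps during the overlapping period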
US Example: DYNES Project
• NSF-funded project: DYnamic NEtwork System
• What is it?
– A nationwide cyber-instrument spanning up to ~40 US universities and ~14
Internet2 connectors
– Extends Internet2's ION service into regional networks and campuses, based on ESnet's OSCARS implementation of the IDC protocol
• Who is it?
– A collaborative team including Internet2, Caltech, University of Michigan, and
Vanderbilt University
– Community of regional networks and campuses
– LHC, astrophysics community, OSG, WLCG, other virtual organizations
• The goals
– Support large, long-distance scientific data flows in the LHC, other leading
programs in data intensive science (such as LIGO, Virtual Observatory, and other
large scale sky surveys), and the broader scientific community
– Build a distributed virtual instrument at sites of interest to the LHC but available
to R&E community generally
http://www.internet2.edu/dynes
DYNES System Description
• AIM: extend hybrid & dynamic capabilities to campus & regional networks (a simple capability check is sketched below)
– A DYNES instrument must provide two basic capabilities at the Tier2s, Tier3s and regional networks:
1. Network resource allocation, such as bandwidth, to ensure transfer performance
2. Monitoring of the network and data transfer performance
• All networks in the path require the ability to allocate network resources and monitor the transfer. This capability currently exists on backbone networks such as Internet2 and ESnet, but is not widespread at the campus and regional level
– In addition, Tier2 & Tier3 sites require:
3. Hardware at the end sites capable of making optimal use of the available network resources
[Figure: two typical transfers that DYNES supports, one Tier2 - Tier3 and another Tier1 - Tier2; the clouds represent the network domains involved in such a transfer.]
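The two capabilities plus the end-site hardware requirement reduce to a simple check every domain in the path has to support, sketched in Python below with made-up numbers: can each segment, including the end host's NIC and disk, sustain the requested rate, and does monitoring confirm the transfer actually achieved it?

# Sketch of the two DYNES-style capabilities plus the end-site requirement:
# (1) can every domain in the path, and the end hosts, sustain the requested
# rate, and (2) did monitoring show the transfer actually achieve it?
# All numbers are illustrative.
PATH_AVAILABLE_GBPS = {"campus-A": 10, "regional-A": 10,
                       "backbone": 100, "regional-B": 10, "campus-B": 10}
END_HOST_GBPS = {"nic": 10, "disk-array": 6}   # end-site hardware limits

def can_allocate(requested_gbps):
    """Allocation check: the request must fit the end-to-end bottleneck."""
    bottleneck = min(list(PATH_AVAILABLE_GBPS.values()) +
                     list(END_HOST_GBPS.values()))
    return requested_gbps <= bottleneck

def transfer_ok(requested_gbps, measured_gbps, tolerance=0.8):
    """Monitoring check: did we get a reasonable fraction of the allocation?"""
    return measured_gbps >= tolerance * requested_gbps

print(can_allocate(8))          # False: the disk array caps the site at 6 Gbps
print(can_allocate(5))          # True
print(transfer_ok(5, 4.2))      # True: 4.2 >= 0.8 * 5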
Summary
• LHC computing models rely on efficient high-throughput data movement between computing sites (Tier0/1/2/3)
• Close collaboration between the LHC and R&E networking communities
– Regional, national, international
• LHCOPN (LHC Optical Private Network):
– Layer 2 overlay network with dedicated resources for the Tier0 and Tier1 centres
– Very successful operation
• LHCONE (LHC Open Network Environment):
– New initiative to provide reliable services to ALL LHC computing sites (Tier 0-3)
– Being developed as a collaboration between the LHC community and the Research and Education Networks worldwide
– User-driven, organic growth
– Current architecture is built on a switched core with a routed edge
– Will provide advanced network services with dynamic bandwidth allocation
THANK YOU!
[email protected]