
LHCC Comprehensive Review
November 2007
LHCOPN Networking Status
David Foster
Head, Network and Communications Systems Group
CERN IT-CS
LHCC Comprehensive Review, November 2007
David Foster, CERN
Information
• All technical content is on the LHCOPN Twiki:
http://lhcopn.cern.ch
• Coordination Process
– LHCOPN Meetings (every 3 months)
• Active Working Groups
– Routing
– Monitoring
– Operations
– Active Interfaces to External Networking Activities
• European Network Policy Groups
• US Research Networking
• Grid Deployment Board
• LCG Management Board
• EGEE
Overview
• LHC Wide Area Networking
– LHCOPN Mission
– Current Status
– Production
– Issues and Risks
• Not Covered
– CERN General Purpose Networking
– Accelerator and Experiment Networks
– Other Communications Systems
Mission
• To assure the T0-T1 transfer capability.
– Essential for the Grid to distribute data out to the T1s.
– Capacity must be large enough to deal with most situations, including “catch-up”.
– The excess capacity can be used for T1-T1 transfers.
• Lower priority than T0-T1
• May not be sufficient for all T1-T1 requirements
• Resiliency Objective
– No single failure should cause a T1 to be isolated.
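The “catch-up” requirement can be quantified with simple arithmetic. Assuming a T1 normally receives data at a sustained rate r over a link of capacity C > r, an outage of length D leaves a backlog of r·D that drains only at the spare rate C - r, so recovery takes r·D / (C - r). A minimal Python sketch; the function name and the rates in the example are hypothetical, not LHC planning figures.

```python
def catch_up_hours(sustained_gbps: float, capacity_gbps: float, outage_hours: float) -> float:
    """Hours needed to clear the backlog left by an outage of `outage_hours`.

    During the outage, data keeps arriving at `sustained_gbps`; afterwards only
    the spare capacity (capacity - sustained) is available to drain it.
    """
    if capacity_gbps <= sustained_gbps:
        raise ValueError("capacity must exceed the sustained rate, or the backlog never clears")
    backlog = sustained_gbps * outage_hours  # in (Gb/s) x hours
    spare = capacity_gbps - sustained_gbps   # Gb/s left over for catch-up
    return backlog / spare


if __name__ == "__main__":
    # Hypothetical figures: 2 Gb/s sustained into one T1 over a 10G lambda, 12-hour outage.
    print(f"{catch_up_hours(2.0, 10.0, 12.0):.1f} h")  # prints: 3.0 h
```

The formula also shows why headroom matters: as r approaches C, the catch-up time grows without bound.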
GÉANT2:
Consortium of 34 NRENs
22 PoPs, ~200 Sites
38k km Leased Services, 12k km Dark Fiber
Supporting Light Paths for LHC, eVLBI, et al.
Dark Fiber Core Among 16 Countries:
• Austria
• Belgium
• Bosnia-Herzegovina
• Czech Republic
• Denmark
• France
• Germany
• Hungary
• Ireland
• Italy
• Netherlands
• Slovakia
• Slovenia
• Spain
• Switzerland
• United Kingdom
Multi-Wavelength Core (to 40) + 0.6-10G Loops
H. Doebbeling
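OPN Status Summary, November 2007
| Link   | Status         | Nominal E2e Capacity     | Changes                                       | Expected |
|--------|----------------|--------------------------|-----------------------------------------------|----------|
| BNL    | OPN Production | 10G (Colt)               |                                               |          |
| FNAL   | OPN Production | 10G (Qwest)              |                                               |          |
| TRIUMF | OPN Production | 5G                       | +1G backup link                               |          |
| ASGC   | OPN Production | 2x1G (2.5G+10G to AMS)   | 10G via GN2 + IP peering with GN2             | Q4 07    |
| NDGF   | OPN Production | 10G                      | Connected to Nordunet; connect to NDGF OPN    | Q1 08    |
| SARA   | OPN Production | 10G                      | Still using GEANT/IP                          | Q4 07    |
| RAL    | OPN Production | 10G                      |                                               |          |
| FZK    | OPN Production | 10G                      |                                               |          |
| CNAF   | OPN Production | 10G                      |                                               |          |
| IN2P3  | OPN Production | 10G                      |                                               |          |
| PIC    | OPN Production | 10G                      |                                               |          |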
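USLHCNet, November 2007
| Link                 | Status         | Nominal E2e Capacity  | Changes | Expected |
|----------------------|----------------|-----------------------|---------|----------|
| CERN - MANLan        | OPN Production | 10G (Colt)            |         |          |
| CERN - StarLight     | OPN Production | 10G (Qwest)           |         |          |
| CERN - NetherLight   | Backup         | 10G (GN2)             |         |          |
| NetherLight - MANLan | Backup         | 10G (Global Crossing) |         |          |
| MANLan - StarLight   | Backup         | 10G (Global Crossing) |         |          |
| MANLan - StarLight   | Backup         | 10G (Qwest)           |         |          |
| MANLan - London      | Backup         | 10G (Global Crossing) | New     | Q1 08    |
| London - CERN        | Backup         | 10G                   | New     | Q1 08    |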
USLHCNet
• A number of links provide alternate routing for primary traffic.
• Relationship with ESnet (and DOE approval) to provide capacity (O(5G)) on the MANLan-AMS link for additional ESnet-GEANT peering.
– This helps US Tier-1 to EU Tier-2 connectivity.
– US Tier-2 to EU Tier-1 connectivity will require additional Internet2 (I2)-GEANT peering. Discussions are ongoing.
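CBF Status Summary, November 2007
| Link         | Status           | Nominal E2e Capacity | Provider | Changes | Expected             |
|--------------|------------------|----------------------|----------|---------|----------------------|
| SARA - NDGF  |                  | 10G                  |          | Q4 ‘07  | Q4 ‘07 in production |
| SARA - FZK   | In place, unused | 10G                  |          |         |                      |
| FZK - CNAF   | In production    | 10G                  |          |         |                      |
| FZK - IN2P3  | In production    | 10G                  |          |         |                      |
| TRIUMF - BNL | In place         | 1G                   |          |         | Q4 ‘07 in production |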
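The resiliency objective (“no single failure should cause a T1 to be isolated”) can be checked mechanically once the links are written down as a graph: remove each link in turn and see which sites can no longer reach the T0. A minimal Python sketch, assuming a deliberately simplified topology that uses only a few links from the tables above (CERN-FZK, CERN-CNAF, CERN-RAL and the FZK-CNAF CBF); it ignores IP backup paths and shared fibre, and the function names are hypothetical.

```python
from collections import deque


def reachable(adj: dict[str, set[str]], start: str) -> set[str]:
    """Breadth-first search: every site reachable from `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        for nxt in adj[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def single_failure_isolations(links: list[tuple[str, str]], t0: str) -> dict[tuple[str, str], set[str]]:
    """Map each link to the sites cut off from `t0` when that link alone fails."""
    sites = {site for link in links for site in link}
    isolations = {}
    for i, failed in enumerate(links):
        # Rebuild the undirected graph without link i (parallel links survive).
        adj: dict[str, set[str]] = {site: set() for site in sites}
        for j, (a, b) in enumerate(links):
            if j != i:
                adj[a].add(b)
                adj[b].add(a)
        cut_off = sites - reachable(adj, t0)
        if cut_off:
            isolations[failed] = cut_off
    return isolations


if __name__ == "__main__":
    # Simplified, illustrative subset: three T0-T1 lambdas plus the FZK-CNAF CBF.
    links = [("CERN", "FZK"), ("CERN", "CNAF"), ("CERN", "RAL"), ("FZK", "CNAF")]
    for link, isolated in single_failure_isolations(links, "CERN").items():
        print(link, "->", sorted(isolated))  # prints: ('CERN', 'RAL') -> ['RAL']
```

In this toy model the FZK-CNAF CBF protects both FZK and CNAF against loss of their T0 lambda, while a site with a single link (here RAL) is exposed.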
T0-T1 Lambda routing (schematic)
[Figure: map of the T0-T1 lambdas from CERN (T0, Geneva) to the T1s via European and transatlantic PoPs. Lambdas shown: CERN-RAL, CERN-PIC, CERN-IN2P3, CERN-CNAF, CERN-GRIDKA, CERN-NDGF, CERN-SARA, CERN-TRIUMF, CERN-ASGC, USLHCNET NY (AC-2), USLHCNET NY (VSNL N), USLHCNET Chicago (VSNL S). One route is annotated “Via SMW-3 or 4 (?)”.]
From Michael Enrico, DANTE
T1-T1 Lambda routing (schematic)
[Figure: the same map showing the T1-T1 lambdas GRIDKA-CNAF, GRIDKA-IN2P3, GRIDKA-SARA and SARA-NDGF.]
From Michael Enrico, DANTE
Some Initial Observations
[Figure: the T0-T1 and T1-T1 lambda map (key: GEANT2, NREN, USLHCNET, T1-T1 (CBF), via SURFnet), annotated with lambdas that share physical infrastructure: a fibre pair between CERN and Basel, a trench between Basel and Zurich, and a common (sub-)duct/trench. Lambdas named in these groups: CERN-GRIDKA, CERN-NDGF, CERN-SARA, CERN-SURFnet-TRIUMF/ASGC (x2), CERN-CNAF, GRIDKA-CNAF (T1-T1), USLHCNET NY (AC-2) and USLHCNET NY (VSNL N) [supplier is COLT]. USLHCNET Chicago (VSNL S) MAY run in the same (sub-)duct/trench as all of the above [awaiting info from Qwest…].]
From Michael Enrico, DANTE
Result
• SARA-CERN lambda has been rerouted.
• A 4th diverse USLHCNET lambda will be added.
• RAL & PIC still need backups.
• CNAF needs a 3rd route into CERN:
– Long route around the “eastern ring”, OR
– New CBF solution(s)…
• Further investigations are required, in particular concerning:
– Physical routing of GRIDKA-IN2P3 in the Paris area
– Leased lambdas passing through the UK
• Further analysis is ongoing
– There may be some layer-1 switching solutions (LCAS) that could help on the GEANT footprint.
• Can do “LCAS protected 10GE” for ASGC
– Tests are ongoing on the USLHCNet footprint
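The observations above amount to a shared-risk analysis: two routes that share a fibre pair or trench are not independent, because one cut removes both (as with a primary lambda and a CBF backup laid in the same trench). A minimal Python sketch, assuming each route of a site is described by the set of physical segments (“risk groups”) it traverses; the site and segment names are hypothetical and do not reproduce DANTE's actual analysis.

```python
def fatal_risk_groups(routes: dict[str, list[set[str]]]) -> dict[str, set[str]]:
    """Map each site to the risk groups that appear in *every* one of its routes.

    A risk group (fibre pair, duct, trench) shared by all routes of a site is a
    single point of failure for it, however many lambdas or providers are used.
    """
    exposed = {}
    for site, site_routes in routes.items():
        if not site_routes:
            continue  # no route at all: outside the scope of this check
        common = set.intersection(*site_routes)
        if common:
            exposed[site] = common
    return exposed


if __name__ == "__main__":
    # Hypothetical sites and segment labels, for illustration only.
    routes = {
        # Primary and backup both pass through trench-1.
        "SITE-A": [{"trench-1", "trench-2"}, {"trench-1", "duct-3"}],
        # Primary and backup share no physical segment.
        "SITE-B": [{"trench-1", "trench-2"}, {"duct-4", "duct-5"}],
    }
    for site, groups in fatal_risk_groups(routes).items():
        print(site, "->", sorted(groups))  # prints: SITE-A -> ['trench-1']
```

Treating each physical segment as a risk group generalises the single-link check: a site is only truly protected when its routes have no segment in common.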
Link Layer Monitoring
• perfSONAR deployment is very well advanced (but not yet complete). It monitors the “up/down” status of the links.
• Integrated into the “End to End Coordination Unit” (E2ECU) run by DANTE
• Provides simple indications of “hard” faults.
• Insufficient on its own to understand the quality of the connectivity.
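Up/down monitoring reduces each link to a binary time series: enough to find hard faults, but with no information about loss, latency or throughput. A minimal Python sketch, assuming hypothetical status samples as (timestamp_seconds, is_up) pairs from fixed-interval polling; this is not the perfSONAR or E2ECU data format.

```python
def outages(samples: list[tuple[int, bool]]) -> list[tuple[int, int]]:
    """Turn time-ordered (timestamp, is_up) samples into (start, end) down intervals.

    An interval starts at the first 'down' sample and ends at the next 'up'
    sample; a link still down at the last sample gets that sample's timestamp
    as a provisional end.
    """
    intervals = []
    down_since = None
    for ts, is_up in samples:
        if not is_up and down_since is None:
            down_since = ts
        elif is_up and down_since is not None:
            intervals.append((down_since, ts))
            down_since = None
    if down_since is not None:
        intervals.append((down_since, samples[-1][0]))
    return intervals


if __name__ == "__main__":
    # Hypothetical 5-minute polling of one lambda.
    samples = [(0, True), (300, True), (600, False), (900, False), (1200, True), (1500, True)]
    down = outages(samples)
    total = samples[-1][0] - samples[0][0]
    availability = 1 - sum(end - start for start, end in down) / total
    print(down, f"{availability:.1%}")  # prints: [(600, 1200)] 60.0%
```

A link that stays “up” while dropping packets or running congested would score 100% here, which is why active measurements are needed to assess quality.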
Initial Active Measurements
• One Way Latency
– To measure network Reliability & detect Congestion
– Between
• Tier0 and Tier1
• Tier1 to Tier1
• Bandwidth
– To detect & quantify service degradation
– Between
• Tier0 and Tier1
• Tier1 to Tier1
• ICMP-based Latency
– To measure Reliability & Congestion
– Between
• LHCOPN Edge into Tier1 facility
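One-way latency is the receive time minus the send time of each probe, so it depends on both hosts having synchronised clocks (for example GPS- or NTP-disciplined), and queueing can only add delay on top of the propagation floor. A minimal Python sketch of how such samples might flag congestion; the function names, window and threshold are hypothetical, and this is not the algorithm of any particular measurement tool.

```python
from statistics import median


def one_way_delays_ms(probes: list[tuple[float, float]]) -> list[float]:
    """Delays in ms from (send_ts, recv_ts) pairs in seconds.

    Only meaningful if sender and receiver clocks are synchronised; any clock
    offset shows up directly as a delay error.
    """
    return [(recv - send) * 1000.0 for send, recv in probes]


def congestion_suspected(delays_ms: list[float], window: int = 5, threshold_ms: float = 2.0) -> bool:
    """Flag congestion when recent delays sit well above the path's floor.

    The minimum delay seen approximates propagation plus fixed processing time;
    a sustained excess over it in the last `window` samples is treated as queueing.
    """
    if len(delays_ms) < window:
        return False  # not enough samples to judge
    floor = min(delays_ms)
    recent = median(delays_ms[-window:])
    return recent - floor > threshold_ms


if __name__ == "__main__":
    # Hypothetical T0->T1 delays (ms): a ~10 ms floor, then a period of queueing.
    delays = [10.1, 10.0, 10.2, 10.1, 10.0, 13.5, 14.2, 13.9, 14.8, 13.7]
    print(congestion_suspected(delays[:5]))  # prints: False
    print(congestion_suspected(delays))      # prints: True
```

Anchoring the baseline to the minimum rather than the mean keeps it tied to the uncongested path, and the median over the window ignores a single delayed probe.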
Active Monitoring Deployment
• It requires a small number of servers at each Tier-1.
• DANTE proposes to deploy this as a “service”, managed and maintained by DANTE.
– Mainly funded by the GEANT project as part of the “transition to service” activity.
– Major advantages in terms of measurement quality and consistency.
• Will be presented at the next OB
– Documents are in preparation to cover requirements from the T1s and a “security plan”.
Operational Procedures
• Procedures still have to be finalised; they need to deal with change and incident management.
– Many parties are involved.
– We have to agree on the real processes involved (activity being led by Mathieu Goutelle).
• A recent operations workshop made some progress
– Try to avoid, wherever possible, too many “coordination units”.
– All parties agreed we need some centralised information to have a global view of the network and incidents.
– A further workshop is planned to quantify this.
– We also need to understand the existing processes used by the T1s.
Resiliency Issues
• Analysis of the physical fiber paths continues
– Some lambdas have been re-routed. Others may still be.
• Layer 3 backup paths for RAL and PIC are still an issue.
– In the case of RAL, excessive costs seem to be a problem.
– For PIC, there is still some hope of a CBF between RedIRIS and RENATER.
• Overall the situation is quite good with the CBF links, but it can still be improved.
– Most major “single” failures are protected against.
Bigger Issues
• It will be important to get some agreements from the T1s on:
– Active Monitoring
– Operational Management (in progress)
• GEANT-2 will end (March 2009); GEANT-3 is being planned. GN-4 and beyond?
– The assumption is that GEANT will continue ad infinitum.
• What will follow from EGEE-III in terms of network management resources?
– DANTE may be able to take over most of the responsibility.
• Funding for USLHCNet is assumed to continue.