20080122-foster
LHCOPN Status and Plans
Joint-Techs
Hawaii
David Foster
Head, Communications and Networks
CERN
January 2008
Acknowledgments
Many presentations and material in the public domain
have contributed to this presentation, too numerous to
mention individually.
LHC
Mont Blanc, 4810 m
Downtown Geneva
26659m in Circumference
SC Magnets pre-cooled to -193.2°C (80 K) using 10 080 tonnes of liquid nitrogen
60 tonnes of liquid helium bring them down to -271.3°C (1.9 K).
The internal pressure of the LHC is 10^-13 atm, ten times less than the pressure
on the Moon
600 Million Proton Collisions/second
CERN – March 2007
CERN’s Detectors
• To observe the collisions, collaborators from around the
world are building four huge experiments: ALICE,
ATLAS, CMS, LHCb
• Detector components are constructed all over the world
• Funding comes mostly from the participating institutes,
less than 20% from CERN
[Figure labels: CMS, ATLAS, ALICE, LHCb]
The LHC Computing Challenge
• Signal/Noise 10^-9
• Data volume
– High rate x large number of channels x 4 experiments
– 15 PetaBytes of new data each year
• Compute power
– Event complexity x Nb. events x thousands of users
– 100 k of today's fastest CPUs
• Worldwide analysis & funding
– Computing funding locally in major regions & countries
– Efficient analysis everywhere
– GRID technology
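To give the yearly data volume a sense of scale, it can be converted into an average sustained rate. The sketch below is illustrative only: it assumes decimal petabytes (10^15 bytes) and an even spread over a full calendar year, and the function name is hypothetical. Since an accelerator does not take data continuously, real transfer rates would be burstier than this average.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: a non-leap calendar year


def average_rate_gbps(petabytes_per_year: float) -> float:
    """Average sustained rate in Gbit/s for a given yearly data volume.

    Assumes 1 PB = 10**15 bytes and a perfectly even spread over the year.
    """
    bytes_per_second = petabytes_per_year * 1e15 / SECONDS_PER_YEAR
    return bytes_per_second * 8 / 1e9


if __name__ == "__main__":
    rate = average_rate_gbps(15)
    print(f"15 PB/year is about {rate:.1f} Gbit/s on average")
```

Under these assumptions, 15 PB per year works out to a little under 4 Gbit/s averaged over the year.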
The WLCG Distribution of Resources
Tier-0 – the accelerator centre
• Data acquisition and initial processing of raw data
• Distribution of data to the different Tiers
Tier-1 (11 centers) – “online” to the data acquisition process, high availability
• Managed Mass Storage – grid-enabled data service
• Data-heavy analysis
• National, regional support
• Centers:
– Canada – TRIUMF (Vancouver)
– France – IN2P3 (Lyon)
– Germany – Forschungszentrum Karlsruhe
– Italy – CNAF (Bologna)
– Netherlands – NIKHEF/SARA (Amsterdam)
– Nordic countries – distributed Tier-1
– Spain – PIC (Barcelona)
– Taiwan – Academia Sinica (Taipei)
– UK – CLRC (Oxford)
– US – FermiLab (Illinois)
– US – Brookhaven (NY)
Tier-2 – ~200 centres in ~40 countries
• Simulation
• End-user analysis – batch and interactive
Centers around the world form a Supercomputer
• The EGEE and OSG projects are the basis of the
Worldwide LHC Computing Grid Project (WLCG)
Inter-operation between Grids is working!
The Grid is now in operation, working on: reliability, scaling up, sustainability
Tier-1 Centers: TRIUMF (Canada); GridKA (Germany); IN2P3 (France); CNAF (Italy); SARA/NIKHEF (NL); Nordic
Data Grid Facility (NDGF); ASCC (Taipei); RAL (UK); BNL (US); FNAL (US); PIC (Spain)
Guaranteed bandwidth can be a good thing
LHCOPN Mission
• To assure the T0-T1 transfer capability.
• Essential for the Grid to distribute data out to the T1s.
• Capacity must be large enough to deal with most situations,
including “Catch up”
• The excess capacity can be used for T1-T1 transfers.
• Lower priority than T0-T1
• May not be sufficient for all T1-T1 requirements
• Resiliency Objective
• No single failure should cause a T1 to be isolated.
• Infrastructure can be improved
• Naturally started as an unprotected “star” – insufficient for a
production network but enabled rapid progress.
• Has become a reason for and has leveraged cross border fiber.
• Excellent side effect of the overall approach.
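The “Catch up” requirement in the mission can be made concrete with a little arithmetic: after an outage, the backlog has to be drained using only the capacity left over once the ongoing nominal flow is served. The following is a minimal sketch under that simple model; the function name and the numbers in the example are hypothetical and not taken from the presentation.

```python
def catch_up_hours(outage_hours: float, nominal_gbps: float, capacity_gbps: float) -> float:
    """Hours needed to clear the backlog built up during an outage.

    Backlog = nominal rate * outage duration. It drains at
    (capacity - nominal) because new data keeps arriving at the
    nominal rate while the link is catching up.
    """
    headroom = capacity_gbps - nominal_gbps
    if headroom <= 0:
        raise ValueError("no spare capacity: the backlog can never be cleared")
    return outage_hours * nominal_gbps / headroom


if __name__ == "__main__":
    # Hypothetical example: 12 h outage, 4 Gbit/s nominal flow, 10 Gbit/s link.
    print(catch_up_hours(12, 4, 10))  # 8.0
```

The same formula shows why excess capacity matters: as the nominal rate approaches the link capacity, the catch-up time grows without bound.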
LHCOPN Design Information
• All technical content is on the LHCOPN Twiki:
http://lhcopn.cern.ch
• Coordination Process
• LHCOPN Meetings (every 3 months)
• Active Working Groups
– Routing
– Monitoring
– Operations
• Active Interfaces to External Networking Activities
– European Network Policy Groups
– US Research Networking
– Grid Deployment Board
– LCG Management Board
– EGEE
CERN External Network Links
[Network diagram centred on the CERN WAN Network; labels recovered from the figure]
• Link-speed legend: 10Gbps, 1Gbps, 100Mbps
• Capacity labels shown on links: 5G, 6G, 12.5G, 20G, 40G
• Networks and exchanges: SWITCH; Geant2; COLT - ISP; Interoute - ISP; Globalcrossing - ISP; WHO - CIC; CITIC74 - CIC; CIXP; Equinix - TIX; RIPN (Russian Tier2s); USLHCnet (Chicago – NYC – Amst); LHCOPN
• Tier0: CH-CERN
• Tier1: CA-TRIUMF; DE-KIT; ES-PIC; FR-CCIN2P3; IT-INFN-CNAF; NDGF; NL-T1; TW-ASGC; UK-T1-RAL
• Tier1c: US-FNAL-CMS; US-T1-BNL
• Tier2: TIFR; UniGeneva
CERN External Network E513-E – AS513
[Detailed network diagram, last update: 20070801; labels recovered from the figure]
• Peers with AS numbers: SWITCH AS559; GEANT AS20965; RIPE RIS(04) AS12654; Internet Level3 AS3356; Internet COLT AS8220; Internet GC AS3549; Reuters AS65020; JINR AS2875; KIAE AS6801; RadioMSU AS2683; Akamai AS21357; USLHCnet AS1297 192.65.196.0/23; ESnet AS293; FNAL AS3152; Abilene AS11537
• Other networks and services: GPRS - VPN; CIXP E513-X; TIX; GPN; IX Europe; WHO 158.232.0.0/16; CITIC74 195.202.0.0/20; Tier2 UniGe; LHCOPN; GN2 - E2E; StarLight Force10; I-root dns server; K-root dns server
• Locations: Amsterdam; New York POP; Chicago POP
• CERN devices: as1(-5)-csen C2511; r513-c-rca80-1; g513-e-rci76-1; g513-e-rci76-2; e513-x-mfte6-1; e513-e-rci76-1; e513-e-rci76-2; e513-e-rci72-4; e513-e-rci65-3; e513-e-shp3m-4; e513-e-mhpyl-1; l513-c-rftec-1; l513-c-rftec-2; ext-dns-1; ext-dns-2; evo-eu; evo-us; as1-gva C2509; as2-gva C2511
• External devices: swice2.switch.ch C7606; swice3.switch.ch C7606; rt1.par.fr.geant2.net JT640; rt1.gen.ch.geant2.net JT640; who-7204-a; who-7204-b; e600gva1; e600gva2; e600ams; e600nyc.uslhcnet.org; e600chi.uslhcnet.org; x424nyc.uslhcnet.org; tt31.ripe.net; tt87.ripe.net
Transatlantic Link Negotiations Yesterday
A major provider lost their shirt on this deal!
LHCOPN Architecture 2004 Starting Point
GÉANT2:
Consortium of 34 NRENs
22 PoPs, ~200 Sites
38k km Leased Services, 12k km Dark Fiber
Supporting Light Paths for LHC, eVLBI, et al.
Dark Fiber Core Among 16 Countries: Austria, Belgium, Bosnia-Herzegovina, Czech Republic, Denmark, France, Germany, Hungary, Ireland, Italy, Netherlands, Slovakia, Slovenia, Spain, Switzerland, United Kingdom
Multi-Wavelength Core (to 40) + 0.6-10G Loops
H. Doebbeling
Basic Link Layer Monitoring
• perfSONAR deployment is very well advanced (but not yet
complete). It monitors the “up/down” status of the links.
• Integrated into the “End to End Coordination Unit”
(E2ECU) run by DANTE
• Provides simple indications of “hard” faults.
• Insufficient to understand the quality of the connectivity
Active Monitoring
• Active monitoring needed
• Implementation consistency needed for accurate results
– One-way delay
– TCP achievable bandwidth
– ICMP based round trip time
– Traceroute information for path changes
– Needed for service quality issues
• First mission is T0-T1 and T1-T1
• T1 deployment could also be used for T1-T2
measurements as a second step and with
corresponding T2 infrastructure.
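Two of the measurements listed above lend themselves to a small illustration: summarising a series of delay samples, and spotting a path change by comparing successive traceroute hop lists. This is only a sketch of the idea, not the tooling used on the LHCOPN; the function names, hop names and sample values are all hypothetical.

```python
from statistics import median


def summarize_delays(samples_ms):
    """Return min, median and max of a list of delay samples (milliseconds)."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    return {
        "min": min(samples_ms),
        "median": median(samples_ms),
        "max": max(samples_ms),
    }


def path_changed(previous_hops, current_hops):
    """True if two traceroute hop lists differ in length or in any hop."""
    return list(previous_hops) != list(current_hops)


if __name__ == "__main__":
    # Hypothetical measurements, not real LHCOPN data.
    print(summarize_delays([10.2, 10.4, 10.1, 25.0]))
    print(path_changed(["hop-a", "hop-b", "hop-c"], ["hop-a", "hop-x", "hop-c"]))
```

A large gap between the median and the maximum delay, or a hop list that differs from the previous run, is the kind of signal that a simple up/down check would miss.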
Background Stats
Monitoring Evolution
• Long standing collaboration on the measurement and monitoring technologies
– Monitoring working group of the LHCOPN
– ESnet and DANTE have been leading the effort
• Proposal for a Managed Service by DANTE
– Manage the tools, archives
– Manage the hardware, O/S
– Manage integrity of information
• Sites have some obligations
– On-site operations support
– Provision of a terminal server
– Dedicated IP port on the border router
– PSTN/ISDN line for out of band communication
– Gigabit Ethernet Switch
– GPS Antenna
– Protected power
– Rack Space
Operational Procedures
• Have to be finalised but need to deal with change and
incident management.
• Many parties involved.
• Have to agree on the real processes involved
• Recent Operations workshop made some progress
• Try to avoid, wherever possible, too many “coordination units”.
• All parties agreed we need some centralised information to have
a global view of the network and incidents.
• Further workshop planned to quantify this.
• We also need to understand existing processes used by T1s.
Resiliency Issues
• The physical fiber path considerations continue
• Some lambdas have been re-routed. Others still may be.
• Layer3 backup paths for RAL and PIC are still an issue.
• In the case of RAL, excessive costs seem to be a problem.
• For PIC, still some hope of a CBF (cross border fiber) between RedIRIS and RENATER
• Overall the situation is quite good with the CBF links, but
can still be improved.
• Most major “single” failures are protected against.
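The resiliency objective, that no single failure should isolate a T1, can be checked mechanically once the topology is written down as a list of links: remove each link in turn and test whether every T1 can still reach the T0. The sketch below does this with a breadth-first search. The topology in the example is a toy with made-up site names, not the real LHCOPN layout, and it models link failures only, not shared fibres or node failures.

```python
from collections import deque


def reachable(links, source):
    """Set of nodes reachable from source over undirected links (BFS)."""
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen, queue = {source}, deque([source])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def isolating_failures(links, t0, t1_sites):
    """Map each single link whose loss cuts T1s off from the T0 to those T1s."""
    result = {}
    for i, failed in enumerate(links):
        remaining = links[:i] + links[i + 1:]
        cut_off = sorted(set(t1_sites) - reachable(remaining, t0))
        if cut_off:
            result[failed] = cut_off
    return result


if __name__ == "__main__":
    # Toy topology for illustration only; NOT the real LHCOPN layout.
    links = [("T0", "T1-A"), ("T0", "T1-B"), ("T1-A", "T1-B"), ("T0", "T1-C")]
    print(isolating_failures(links, "T0", ["T1-A", "T1-B", "T1-C"]))
```

In this toy example only the hypothetical site T1-C, which has a single link, is reported; the other two sites protect each other through the link between them.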
T0-T1 Lambda routing (schematic)
Connect. Communicate. Collaborate
[Schematic map; labels recovered from the figure]
• T0: GENEVA
• T1 sites: ASGC, TRIUMF, NDGF, BNL, RAL, SARA, GRIDKA, FNAL, IN2P3, PIC, CNAF
• Locations: Copenhagen, London, NY (MAN LAN), Starlight, Amsterdam, Hamburg, Paris, Frankfurt, Strasbourg/Kehl, Stuttgart, Zurich, Basel, Lyon, Madrid, Barcelona, Milan
• Countries: DK, UK, NL, DE, CH, FR, ES, IT
• Other labels: SURFnet; Via SMW-3 or 4 (?); ???; Atlantic Ocean; AC-2/Yellow; VSNL N; VSNL S
• T0-T1s: CERN-RAL, CERN-PIC, CERN-IN2P3, CERN-CNAF, CERN-GRIDKA, CERN-NDGF, CERN-SARA, CERN-TRIUMF, CERN-ASGC, USLHCNET NY (AC-2), USLHCNET NY (VSNL N), USLHCNET Chicago (VSNL S)
T1-T1 Lambda routing (schematic)
[Same schematic map and site labels as the T0-T1 Lambda routing slide]
• T1-T1s: GRIDKA-CNAF, GRIDKA-IN2P3, GRIDKA-SARA, SARA-NDGF
Some Initial Observations
[Same schematic map, annotated with shared-path observations]
KEY: GEANT2; NREN; USLHCNET; Via SURFnet; T1-T1 (CBF)
(Between CERN and BASEL)
• Following lambdas run in same fibre pair: CERN-GRIDKA, CERN-NDGF, CERN-SARA, CERN-SURFnet-TRIUMF/ASGC (x2), USLHCNET NY (AC-2)
• Following lambdas run in same (sub-)duct/trench: (all above +) CERN-CNAF, USLHCNET NY (VSNL N) [supplier is COLT]
• Following lambda MAY run in same (sub-)duct/trench as all above: USLHCNET Chicago (VSNL S) [awaiting info from Qwest…]
(Between BASEL and Zurich)
• Following lambdas run in same trench: CERN-CNAF, GRIDKA-CNAF (T1-T1)
• Following lambda MAY run in same trench as all above: USLHCNET Chicago (VSNL S) [awaiting info from Qwest…]
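Observations of this kind amount to grouping lambdas by the physical resource they share (a fibre pair, a duct, a trench), so that a single cut can be translated into the set of lambdas and sites it affects. The sketch below shows one way to hold that information. The group names and their contents are an illustrative subset loosely based on the slide and should be treated as hypothetical, not as an authoritative record of the routing.

```python
def lambdas_lost(shared_risk_groups, failed_group):
    """Lambdas that go down together when one shared fibre or trench fails."""
    return sorted(shared_risk_groups[failed_group])


def sites_affected(lambdas):
    """Endpoints other than CERN named in 'A-B' style lambda labels."""
    sites = set()
    for name in lambdas:
        sites.update(part for part in name.split("-") if part != "CERN")
    return sorted(sites)


if __name__ == "__main__":
    # Illustrative data only; group names and membership are assumptions.
    groups = {
        "CERN-Basel fibre pair": {"CERN-GRIDKA", "CERN-NDGF", "CERN-SARA"},
        "Basel-Zurich trench": {"CERN-CNAF", "GRIDKA-CNAF"},
    }
    for group in groups:
        lost = lambdas_lost(groups, group)
        print(group, "->", lost, "affects", sites_affected(lost))
```

Keeping the shared-risk information in this form makes it easy to re-run the check whenever a lambda is re-routed.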
Closing Remarks
• The LHCOPN is an important part of the overall
requirements for LHC Networking.
• It is a (relatively) simple concept.
• Statically Allocated 10G Paths in Europe
• Managed Bandwidth on the 10G transatlantic links via
USLHCNet
• Multi-domain operations remain to be completely solved
• This is a new requirement for the parties involved and a learning
process for everyone
• Many tools and ideas exist and the work is now to pull
this all together into a robust operational framework
Simple solutions are often the best!