WGDB Meeting
August 2007
LHCOPN Status and Plans
David Foster
CERN
LHC Optical Private Network
• Status
– Scope, Implementation, Issues
• Operations
– Roles and Processes
• Monitoring
– Requirements and Opportunities
LHCOPN Conceptual Architecture (2004-2007)
LHCOPN Scope
• Provides T0-T1 connectivity
– A simple star concept allowed incremental investment.
– The value delivered to the T1s is clear.
– Not a true “network” design for interconnecting all sites.
• But the objective is that no single event should isolate a T1.
• This is being improved incrementally through cross-border fiber (CBF).
• Can be used for T1-T1 connectivity
– insofar as it does not impact the primary mission
– Does not aim to provision all required T1-T1 connectivity
• Does not provide T1-T2 traffic capabilities
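The objective that no single event should isolate a T1 can be checked mechanically by treating the OPN as a graph and removing one link at a time. The sketch below is a minimal illustration under stated assumptions: a simplified edge list made of one star link from CERN to each T1, plus the CBF links the CBF status table reports as in place (SARA-NDGF is assumed to be in place, since its status is not legible). Capacities, parallel links and shared fibre are ignored.

```python
from collections import defaultdict, deque

# Assumed, simplified topology: one star link CERN -> T1 per site, plus the
# CBF links listed as in place (SARA-NDGF assumed to be in place).
T0 = "CERN"
T1S = ["ASGC", "BNL", "CNAF", "FNAL", "FZK", "IN2P3",
       "NDGF", "PIC", "RAL", "SARA", "TRIUMF"]
STAR = [(T0, t1) for t1 in T1S]
CBF = [("SARA", "NDGF"), ("SARA", "FZK"), ("FZK", "CNAF"),
       ("BNL", "FNAL"), ("FZK", "IN2P3")]


def reachable(edges, start):
    """Breadth-first search over an undirected edge list."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def isolated_by_single_failure(edges):
    """Map each link to the T1s that lose every path to the T0 if it fails."""
    result = {}
    for failed in edges:
        remaining = [e for e in edges if e != failed]
        reach = reachable(remaining, T0)
        lost = sorted(t1 for t1 in T1S if t1 not in reach)
        if lost:
            result[failed] = lost
    return result


if __name__ == "__main__":
    for (a, b), lost in isolated_by_single_failure(STAR + CBF).items():
        print(f"{a}-{b} failure isolates: {', '.join(lost)}")
```

With this assumed topology the check flags ASGC, PIC, RAL and TRIUMF, the T1s that have no cross-border link in the list; appending a planned link such as `("TRIUMF", "BNL")` to `CBF` removes TRIUMF from the output.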
OPN Status Summary
July 2007
Link   | Status         | Nominal E2E Capacity     | Changes
BNL    | OPN Production | 10G (Colt)               | New CFT in 2007
FNAL   | OPN Production | 10G (Qwest)              | New CFT in 2007
TRIUMF | OPN Production | 5G                       |
ASGC   | OPN Production | 2x1G (2.5G + 10G to AMS) |
NDGF   | OPN Production | 10G                      |
SARA   | OPN Production | 10G                      |
RAL    | OPN Production | 10G                      |
FZK    | OPN Production | 10G                      |
CNAF   | OPN Production | 10G                      |
IN2P3  | OPN Production | 10G                      |
PIC    | OPN Production | 10G                      |
Expected (row assignment not recoverable from the slide): 10G via GN2 + IP peering with GN2; ?; Waiting for public AS number; Sometime soon?
CBF Status Summary
April 2007
Link          | Status                    | Nominal E2E Capacity
SARA - NDGF   |                           | 10G
SARA - FZK    | In place, unused          | 10G
FZK - CNAF    | In place                  | 10G
FZK - CERN    |                           | 10G
BNL - FNAL    | In place (from GC, Qwest) | 2x10G
FZK - IN2P3   | In place                  | 10G
TRIUMF - BNL  |                           |
TRIUMF - SARA |                           |
Provider/Expected entries (row assignment not recoverable from the slide): Expected Q4; DFN/Switch; To be Decided; Q3/07
Other Links Summary
Link                 | Status     | Nominal E2E Capacity | Provider
ManLan - Netherlight | Production | 10G                  | GC
Netherlight - CERN   | Production | 10G                  | GN2
[Figure: LHCOPN connectivity schematic: CERN, European T1s and T2s, NetherLight, ManLan, Starlight, US LHCNet (VCAT/LCAS), BNL, FNAL, US T1s and T2s, Canada and Taipei, with the ESNET/I2 and NREN/GN2 IP clouds]
• GEANT2 is an EU project.
• Funding comes from the NRENs and the EU.
• DANTE is the implementation organisation.
• Policy is decided at the NREN Policy Committee meeting (NREN PC).
• Subgroups work on different aspects, e.g. the Global Connectivity Committee (GCC).
(WLCG Jan 2007)
T0-T1 Lambda routing (schematic)
[Figure: schematic map of the T0-T1 lambda routes. Lambdas shown: CERN-RAL, CERN-PIC, CERN-IN2P3, CERN-CNAF, CERN-GRIDKA, CERN-NDGF, CERN-SARA, CERN-TRIUMF, CERN-ASGC, USLHCNET NY (AC-2), USLHCNET NY (VSNL N) and USLHCNET Chicago (VSNL S). Routes run from the T0 in Geneva via Basel, Zurich, Milan, Lyon, Paris, Madrid/Barcelona, Strasbourg/Kehl, Stuttgart, Frankfurt, Hamburg, Copenhagen, Amsterdam (SURFnet) and London, and across the Atlantic (AC-2/Yellow, VSNL N, VSNL S) to MAN LAN (NY) and Starlight. The ASGC/TRIUMF route is marked "Via SMW-3 or 4 (?)".]
Some Initial Observations
[Figure: the same lambda-routing map, with a key distinguishing GEANT2, NREN, USLHCNET, via-SURFnet and T1-T1 (CBF) links]
• Between CERN and Basel:
– The following lambdas run in the same fibre pair: CERN-GRIDKA, CERN-NDGF, CERN-SARA, CERN-SURFnet-TRIUMF/ASGC (x2), USLHCNET NY (AC-2).
– The following lambdas run in the same (sub-)duct/trench (all of the above, plus): CERN-CNAF, USLHCNET NY (VSNL N) [supplier is COLT].
– The following lambda MAY run in the same (sub-)duct/trench as all of the above: USLHCNET Chicago (VSNL S) [awaiting info from Qwest…].
• Between Basel and Zurich:
– The following lambda runs in the same trench as all of the above: GRIDKA-CNAF (T1-T1).
Conclusions
• Given the routing policies that share links for backup purposes, some links will need to be re-routed to provide greater resilience.
– The multiple links from Amsterdam are the most susceptible: the TRIUMF and ASGC links will be studied as candidates for re-routing.
• Some sites (RAL and PIC) have no backup
routes
– This requires further investment.
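Link sharing of this kind is commonly reasoned about as shared-risk link groups: lambdas that share a fibre pair or trench fail together, so a backup only helps if it shares no group with the primary. The sketch below assumes hypothetical risk-group labels loosely based on the fibre-pair and trench notes on the previous slide (the `SARA-NDGF CBF` label is invented for illustration) and shows why a backup for NDGF via SARA would not be diverse on the CERN-Basel segment.

```python
from collections import defaultdict

# Hypothetical shared-risk labels per lambda, loosely based on the
# fibre-pair and trench notes; "SARA-NDGF CBF" is invented for illustration.
FIBRE = "CERN-Basel fibre pair"
TRENCH = "CERN-Basel trench"
RISK_GROUPS = {
    "CERN-GRIDKA": {FIBRE, TRENCH},
    "CERN-NDGF": {FIBRE, TRENCH},
    "CERN-SARA": {FIBRE, TRENCH},
    "CERN-SURFnet-TRIUMF/ASGC": {FIBRE, TRENCH},
    "USLHCNET NY (AC-2)": {FIBRE, TRENCH},
    "CERN-CNAF": {TRENCH},
    "USLHCNET NY (VSNL N)": {TRENCH},
    "SARA-NDGF": {"SARA-NDGF CBF"},
}


def lambdas_per_group(risk_groups):
    """Invert lambda -> risk groups into risk group -> lambdas that fail together."""
    groups = defaultdict(list)
    for lam, labels in risk_groups.items():
        for label in labels:
            groups[label].append(lam)
    return {label: sorted(lams) for label, lams in groups.items()}


def shared_risks(primary, backup, risk_groups):
    """Risk groups common to a primary path and its backup (empty set = diverse)."""
    p = set().union(*(risk_groups[lam] for lam in primary))
    b = set().union(*(risk_groups[lam] for lam in backup))
    return p & b


if __name__ == "__main__":
    # Primary NDGF lambda vs. a backup via SARA and the SARA-NDGF CBF link.
    print(shared_risks(["CERN-NDGF"], ["CERN-SARA", "SARA-NDGF"], RISK_GROUPS))
```

A non-empty result means a single cut in the shared fibre pair or trench takes down both the primary and the backup, which is the situation re-routing is meant to avoid.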
Operations
• Very complex multi-domain problem:
– 11 T1 NOCs
– 1 T0 NOC
– E2ECU (which hides the DANTE NOC and 11 NREN organisations)
– 1 IPCU (which hides the Grid and experiment operations)
Objectives
• Define and document all required procedures to ensure an operational response that meets or exceeds the levels in the LCG MoU
• Measure and report on achieved performance
Operational Issues
• Many components to define and document:
– Roles, Responsibilities
– Functional Units, Processes
• Incident, Problem and Change Management
– Metrics and Measurements
Where are we?
• Much is in place, but needs to be formalised
and documented.
– E2ECU (implemented by DANTE) provides monitoring of all circuit status via perfSONAR.
– IPCU (implemented by ENOC) provides a “service view” of the network and interfaces to GGUS.
• Formal processes are still missing.
• A complete RACI analysis is still missing.
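A RACI matrix records, for each activity, which parties are Responsible, Accountable, Consulted and Informed. A common convention is that every activity has exactly one Accountable party and at least one Responsible party. The sketch below checks those two rules; the activities and assignments in `EXAMPLE` are hypothetical and do not reflect any agreed LHCOPN allocation.

```python
VALID_CODES = set("RACI")


def check_raci(matrix):
    """Return rule violations in a RACI matrix {activity: {party: codes}}.

    A party may hold several codes at once, e.g. "AR".
    """
    problems = []
    for activity, assignments in matrix.items():
        for party, codes in assignments.items():
            unknown = set(codes) - VALID_CODES
            if unknown:
                problems.append(
                    f"{activity}: {party} has unknown code(s) {''.join(sorted(unknown))}")
        accountable = [p for p, codes in assignments.items() if "A" in codes]
        responsible = [p for p, codes in assignments.items() if "R" in codes]
        if len(accountable) != 1:
            problems.append(
                f"{activity}: expected exactly one Accountable, found {len(accountable)}")
        if not responsible:
            problems.append(f"{activity}: no Responsible party")
    return problems


# Hypothetical activities and assignments, for illustration only.
EXAMPLE = {
    "Report T0-T1 circuit outage": {"T1 NOC": "R", "E2ECU": "A", "IPCU": "I"},
    "Change to CBF routing": {"T0 NOC": "R", "T1 NOC": "R", "E2ECU": "C"},
}

if __name__ == "__main__":
    for problem in check_raci(EXAMPLE):
        print(problem)
```

Here the second activity is reported because nobody is Accountable for it, which is the kind of gap a complete RACI analysis is meant to expose.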
Next Steps
• Collaborative tools are being used to define all the elements of the operational handbook.
– This should be largely completed by the end of October.
– The next progress review will be in early October.
• A more complete discussion with many of the parties will take place in November.
– Requires “buy in” from a lot of stakeholders.
– Will be an incremental process, but we need to
keep moving forward.
Monitoring
• WG activity led by Joe Metzger (ESnet) in close collaboration with DANTE.
• perfSONAR measurement points are now available in all NRENs and at some US partners.
• Provides link status information
– Managed by the E2ECU
• But this is not enough
– Distinguish between network and application problems
– Identify network problems even if they are not affecting applications
– Identify and react to changes in the underlying network
• Allow application managers to understand and react to changes in topology & capacity and retune applications as necessary
• Provide the network data necessary to correlate application performance changes with
network topology changes
• Eventually allow applications to automatically react to network changes (network awareness)
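One way to combine per-domain link status into a single end-to-end circuit status, in the spirit of the E2ECU view, is a worst-state-wins rule: a circuit is only up if every domain segment is up, and the domains reporting the worst state are the first place to look. The state names, their ordering and the domain names below are assumptions for illustration, not the E2Emon data model.

```python
from enum import IntEnum


class State(IntEnum):
    """Assumed segment states, ordered from best to worst."""
    UP = 0
    DEGRADED = 1
    UNKNOWN = 2
    DOWN = 3


def end_to_end_state(segments):
    """Worst state among the per-domain segments determines the circuit state."""
    if not segments:
        return State.UNKNOWN
    return max(segments.values())


def culprit_domains(segments):
    """Domains reporting the worst (non-UP) state, i.e. where to look first."""
    worst = end_to_end_state(segments)
    return sorted(d for d, s in segments.items() if s == worst and s != State.UP)


if __name__ == "__main__":
    # Hypothetical per-domain status for a single T0-T1 circuit.
    circuit = {"CERN": State.UP, "GEANT2": State.UP, "SURFnet": State.DEGRADED}
    print(end_to_end_state(circuit).name, culprit_domains(circuit))
```

A rule like this tells application managers whether a performance change coincides with a network-side state change, which is the correlation the bullets above call for.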
PerfSONAR Solutions Current Status
(Columns: Functionality | perfSONAR Tool(s) | Date, grouped by attribute)
• Circuit Up/Down
– Measure & Archive | E2E_MP & SQLMA | Deployed
– Visualize | E2Emon | Deployed
– Alarm | E2Emon | Deployed
• Link Utilization, Errors & Capacity
– Measure & Archive | RRDMA Utilization & Capacity; RRDMA Input Errors & Output Drops; PS-SNMPMA | Done; ??; Beta Aug 1, Package Sep 1
– Visualize | perfSONARUI; Visual Traceroute | Done; ??
– Alarm | ? | ?
• Round Trip Delay (ICMP) & Traceroute
– Measure & Archive | PingerMA; Ping MP | Aug 15?; Aug 15?
– Visualize | perfSONARUI Plugin? | ?
– Alarm | ? | ?
• One-Way Delay Tests between MPs
– Schedule (on-demand) | AMI MA & Scheduler; Hades; OWAMP MP | Beta Sep 15, Package Oct 1; October; Done
– Archive | AMI MA | Oct 1
– Visualize | PerfSONARUI; CNM (if same as HADES MA); I2 CGI (done in Aug, packaged in Oct) | Done ??; October; Beta Aug, Package Oct
– Alarm | Being worked on in Internet2; generate a plan in December 07, implement in 08 | ?
• Bandwidth Tests between MPs
– Schedule & Measure | BWCTL; BWCTL_MP (DFN one); AMI scheduler | Done; Done; Beta Sep 15, Package Oct 1
– Archive | AMI_MA; DFN MA? | Beta Sep 15, Package Oct 1; ?
– Visualize | PerfSONAR UI Plugin; Web CGI scripts | Fall?; October
– Alarm | Look at it Spring 08 | ?
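The link-utilization attribute above is typically derived from SNMP interface octet counters sampled at fixed intervals: utilization = (Δoctets × 8) / (Δt × capacity). The sketch below shows that arithmetic, assuming 64-bit counters (such as `ifHCInOctets`) and at most one counter wrap between samples; it is an illustration of the calculation, not code taken from RRDMA or PS-SNMPMA.

```python
COUNTER64_MODULUS = 2 ** 64  # assumed 64-bit octet counters (e.g. ifHCInOctets)


def utilization(prev_octets, curr_octets, interval_s, capacity_bps,
                modulus=COUNTER64_MODULUS):
    """Fraction of link capacity used between two octet-counter samples.

    Assumes at most one counter wrap between the two samples.
    """
    if interval_s <= 0 or capacity_bps <= 0:
        raise ValueError("interval and capacity must be positive")
    delta_octets = (curr_octets - prev_octets) % modulus  # absorbs one wrap
    return delta_octets * 8 / (interval_s * capacity_bps)


if __name__ == "__main__":
    # 150 GB transferred in a 300 s sample on a 10G link.
    print(f"{utilization(0, 150_000_000_000, 300, 10_000_000_000):.0%}")
```

The modulo step keeps the result correct when the counter rolls over between two polls; with long polling intervals on busy links, 32-bit counters can wrap more than once, which is why 64-bit counters are assumed here.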
End-End Monitoring
• Need “end-end” monitoring insofar as that is possible.
– Requires active monitoring, and this needs an active monitoring infrastructure.
• Dedicated boxes with known characteristics at all sites, connected close to the end systems.
– An infrastructure based on a Linux “appliance” is being proposed by DANTE.
• Still to be discussed and many questions remain, but homogeneous installation, support and maintenance are being proposed at least until the end of the GN2 project.
• A formal proposal from DANTE is expected in the coming weeks.
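One-way delay tests between dedicated measurement boxes (as with OWAMP) reduce to subtracting each packet's send timestamp from its receive timestamp, which is only meaningful when both boxes have synchronised clocks; this is one reason for boxes with known characteristics. The sketch assumes a hypothetical record format of `(seq, send_us, recv_us)` with integer microsecond timestamps and `recv_us = None` for a lost packet.

```python
from statistics import median


def one_way_stats(records):
    """Summarise (seq, send_us, recv_us) records; recv_us is None if lost.

    Delays are only valid if sender and receiver clocks are synchronised;
    a negative delay would point to a clock offset, not a fast network.
    """
    if not records:
        raise ValueError("no records")
    delays = [recv - send for _, send, recv in records if recv is not None]
    lost = sum(1 for _, _, recv in records if recv is None)
    return {
        "sent": len(records),
        "loss": lost / len(records),
        "min_delay": min(delays) if delays else None,
        "median_delay": median(delays) if delays else None,
    }


if __name__ == "__main__":
    # Hypothetical test run: packet 2 was lost.
    run = [(0, 0, 12000), (1, 100000, 113000), (2, 200000, None), (3, 300000, 311000)]
    print(one_way_stats(run))
```

For this sample run the summary reports 4 packets sent, 25 % loss, a minimum delay of 11000 µs and a median of 12000 µs.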
Summary
• Infrastructure is almost complete
– Some additional links are required for resilience
• Many operational issues to sort out.
– Tremendous goodwill and effort from all parties involved: EGEE-SA2, DANTE, USLHCNet, NRENs, ESnet, I2, etc.
• An excellent opportunity to have a coherent T0-T1 monitoring infrastructure quickly.
– A firm proposal from DANTE is still needed.
– Agreement from the T1s is needed.
• Progress will be reported at the next LHCOPN meeting
(November)