perfSONAR MDM Service for LHC OPN

Connect. Communicate. Collaborate
Loukik Kudarimoti
DANTE
Multi Domain Monitoring Service
Introduction
-1-
• perfSONAR Consortium
– ESnet
– GÉANT2
– Internet2
– RNP
• perfSONAR MDM Service
– Support for GÉANT2 users of monitoring software
– Managed Service portfolio to support hardware
Multi Domain Monitoring Service
Introduction
-2-
perfSONAR MDM Service
• Monitor the network
– Deploy and maintain hardware
– Install and maintain monitoring software
– Reports on monitoring results
• Provide access to measurement capabilities & data
– Authorized groups of users
– Various visualisation tools
• Support users of tools and data
– Service Desk
– Agreements with Software development teams
– Reports on Service desk performance
Network Monitoring
-1-
Primary Monitoring Objective
• LHC OPN links between Tier-0 & Tier-1
• Links between Tier-1 sites (e.g. for the ATLAS experiment)
• Metrics
– One way delay, One way delay variation
– TCP Achievable Bandwidth
– Status of circuits
– Traceroute, packet loss and packet re-ordering
– IP Interface statistics - utilisation, errors and discards
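The delay, loss and re-ordering metrics listed above can all be derived from timestamped probe packets. The following is a minimal Python sketch of that derivation, assuming the sender and receiver clocks are synchronised. The `Probe` record and the `summarise` function are hypothetical names used for illustration only; they are not part of the perfSONAR software.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Probe:
    seq: int                   # sender sequence number
    sent: float                # send timestamp in seconds
    received: Optional[float]  # receive timestamp, or None if the packet was lost


def summarise(probes: List[Probe]) -> dict:
    """Derive loss, one-way delay, delay variation and re-ordering counts."""
    if not probes:
        raise ValueError("need at least one probe")

    arrived = [p for p in probes if p.received is not None]

    # One-way delay per packet, taken in sending order.
    delays = [p.received - p.sent for p in sorted(arrived, key=lambda p: p.seq)]

    # Delay variation: absolute difference between consecutive one-way delays.
    variation = [abs(b - a) for a, b in zip(delays, delays[1:])]

    # A packet counts as re-ordered if it arrives after a packet
    # that carries a higher sequence number.
    highest_seq = -1
    reordered = 0
    for p in sorted(arrived, key=lambda p: p.received):
        if p.seq < highest_seq:
            reordered += 1
        else:
            highest_seq = p.seq

    return {
        "loss_ratio": 1 - len(arrived) / len(probes),
        "min_delay": min(delays) if delays else None,
        "max_delay": max(delays) if delays else None,
        "mean_delay_variation": sum(variation) / len(variation) if variation else None,
        "reordered": reordered,
    }
```

The definition of re-ordering used here (arrival behind a higher sequence number) is one common choice; a production tool may apply a different one.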
Network Monitoring
-2-
• Data Visualisation
– Map based tools, diagnostic tools and alarms
– Recent data & archived data for trend analysis
– Data and Tool Administration Interfaces
Appliance Deployment
-1-
Deployment Location
• 12 sites
– 8 in Europe
– 2 in USA
– 1 in Canada
– 1 in Taiwan
• Equipment deployed at each Tier-1 site
– 2 x Sun Servers
– 1 x Bee Server
• Location suggested: close to the site’s border routers
Appliance Deployment
-2-
• Scenario 1: Single border router
• Sites:
– BNL
– CNAF
– FNAL
– IN2P3
– NDGF
– PIC
– RAL
Appliance Deployment
-3-
• Scenario 2a: Two border routers with primary & backup links
• Sites:
– GridKa
Appliance Deployment
-4-
• Scenario 2b: Two border routers in a triangle
• Sites:
– ASGC
– CERN
Appliance Deployment
-5-
• Scenario 2c: Two border routers using Virtual Router technique
• Sites:
– SARA (?)
Site Responsibilities
-1-
• GPS Antenna & receiver card - according to spec
– Installation of GPS Antenna and all necessary cabling
• Dedicated GigE switch
• Network connections and IP Addresses
• Terminal Server - according to spec
– Out of band access for terminal server (PSTN / ISDN)
– In band access via switch
• Dedicated interface(s) on border routers
• 60 minute UPS backup and notification procedures
Site Responsibilities
-2-
• Site Network firewall configurations
– Importantly: SNMP read access to border routers
• Site administrator(s) for local support
– Physical hardware installation
– Provide Network topology and configuration information
– Some on-site maintenance tasks
• Remote hands and eyes
• Site access to 3rd party support on behalf of DANTE
• Other (rack space, electricity)
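SNMP read access to the border routers is what allows interface statistics such as utilisation to be collected. As a rough illustration, utilisation can be computed from two successive readings of an octet counter (for example the standard IF-MIB `ifHCInOctets` object). The sketch below is an assumption about how such a calculation might look, not the service's actual code; the function names, the polling interval and the link speed are all illustrative.

```python
def counter_delta(old: int, new: int, bits: int = 64) -> int:
    """Difference between two counter readings, tolerating a single wrap."""
    return (new - old) % (1 << bits)


def utilisation_percent(old_octets: int, new_octets: int,
                        interval_s: float, link_bps: float,
                        bits: int = 64) -> float:
    """Average link utilisation over the polling interval, in percent."""
    if interval_s <= 0 or link_bps <= 0:
        raise ValueError("interval and link speed must be positive")
    # Octet counters count bytes; convert to bits before comparing
    # against the link capacity over the same interval.
    bits_transferred = counter_delta(old_octets, new_octets, bits) * 8
    return 100.0 * bits_transferred / (interval_s * link_bps)
```

For example, 187,500,000,000 octets counted over 300 seconds on a 10 Gbit/s link corresponds to 50% average utilisation. The modulo in `counter_delta` matters mostly for 32-bit counters, which wrap quickly on fast links.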
Site Responsibilities
-3-
• Provide Network Topology and configuration information
– About Routers, IP interfaces, connectivity, network topology
– About links that are part of end-to-end circuits
• Status of links
– Update information whenever there are changes
• Web Interfaces and service desk procedures will be available
• Software to help in updating the status of circuits
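Because an end-to-end circuit is built from links reported by several sites, its overall status has to be derived from the per-link status information described above. A minimal sketch of one plausible aggregation rule follows. The `Status` values and the rule itself (any down link takes the circuit down; otherwise any unknown link makes the circuit unknown) are assumptions for illustration, not the documented behaviour of the service.

```python
from enum import Enum
from typing import Iterable


class Status(Enum):
    UP = "up"
    DOWN = "down"
    UNKNOWN = "unknown"


def circuit_status(link_statuses: Iterable[Status]) -> Status:
    """Combine per-link statuses into one end-to-end circuit status."""
    statuses = list(link_statuses)
    if not statuses:
        # No link information has been reported for this circuit.
        return Status.UNKNOWN
    if Status.DOWN in statuses:
        # One failed link is enough to break the whole circuit.
        return Status.DOWN
    if Status.UNKNOWN in statuses:
        # Every known link is up, but at least one site has not reported.
        return Status.UNKNOWN
    return Status.UP
```

Under this rule a single stale or missing link report prevents the circuit from being shown as up, which is why sites are asked to update link information whenever it changes.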
Security
-1-
• Network Firewalls and isolation
– Well known flows and ports
– Well defined access to site network equipment
– Separate VLANs and interfaces
• Isolation from the Tier-0/Tier-1 LAN
• Operating System Security
– Red Hat Enterprise Linux 5
– Maintained and promptly updated by Service Desk
– “Bolted down”
Security
-2-
• Monitoring tools and perfSONAR software
– Stable, supported software
– Low privileges
– Well defined ports, protocols and firewall requirements
• Data Security and Availability
– eduGAIN framework for access control
• Data and measurement capability limited to specific user groups
– RAID system for local data backup
Support
-1-
• Service Desk @ DANTE
– Single Point Of Contact for users
– Support for deployments, data and tools
– ITIL Principles
– Agreements with development groups
• Supported user groups
– Tier-0 and Tier-1 Operation Centers
– LHC OPN IP Co-ordination Unit (LIPCU)
– E2ECU
– Performance Enhancement Response Teams
Support
-2-
• Service Desk Reports
– Incidents reported and resolved
– Deployment Availability
• Data and Tool Availability statistics
– Statistics on network monitoring results
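An availability figure of the kind mentioned above is commonly expressed as the percentage of a reporting period during which a deployment or tool was usable. The sketch below shows that arithmetic; the function name, the reporting period and the outage durations are illustrative assumptions, and the slides do not state how the service desk actually computes its figures.

```python
from typing import Iterable


def availability_percent(period_hours: float,
                         outage_hours: Iterable[float]) -> float:
    """Percentage of the reporting period not covered by outages."""
    if period_hours <= 0:
        raise ValueError("reporting period must be positive")
    downtime = sum(outage_hours)
    if downtime > period_hours:
        raise ValueError("total outage time exceeds the reporting period")
    return 100.0 * (period_hours - downtime) / period_hours
```

For example, two outages of 3.0 and 4.2 hours in a 720-hour (30-day) period give an availability of roughly 99%.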
Service Extensions and Conclusions
• Service Extension possibilities
– Investigate more measurements if requested
– Controlled change process to the services
• perfSONAR MDM Service in beta phase (public trial)
– 1st phase - Rolled out to six NRENs
– 2nd phase - eleven NRENs
• Trial phase extended to LHC OPN
– In partnership with Internet2 and ESnet, service desk made available to Tier-0 and all Tier-1 sites.