SA2: “Networking Support”

Enabling Grids for E-sciencE
End-to-End Service Level Agreement
Provisioning and Monitoring
for End-to-End QoS
TNC2007, 21-24 May 2007
Vassiliki Pouli (GRNET/NTUA)
www.eu-egee.org
EGEE-II INFSO-RI-031688
Outline
• Introduction
• SLA parts
• Model of SLA establishment
• Monitoring of SLAs
• Questions
Introduction (1)
• Grid applications require a specific level of QoS for data transfers across different Resource Centres (RCs)
• An end-to-end (e2e) SLA is needed to define this level of QoS
– It provides the technical and administrative details needed for:
  ▪ Maintenance
  ▪ Monitoring
  ▪ Troubleshooting
Introduction (2)
• Grid RCs: users of the network
• Network providers:
– GÉANT (pan-European network)
– NRENs (national networks)
– Regional/metropolitan/campus networks
• The e2e SLA is synthesized from the individual domain SLAs
SLA parts
• ALO (Administrative Level Object)
– Contacts
– Duration (start and end times)
– Availability of service
– Response times
– Fault handling procedures
• SLO (Service Level Object)
– Service flow description
– Excess traffic treatment
– Performance guarantees (OWD, jitter, packet loss, capacity…)
– Reliability guarantees: max downtime (MDT), time to repair (TTR)
– Monitoring infrastructure
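The ALO/SLO split maps naturally onto a record type. The following is a minimal Python sketch; all field names and units (hours, milliseconds, Mbit/s, percent) are hypothetical, since the slides do not define a concrete SLA schema.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class ALO:
    """Administrative Level Object (field names are hypothetical)."""
    contacts: list
    start: datetime              # duration: start of the SLA
    end: datetime                # duration: end of the SLA
    availability_pct: float      # availability of service, percent of the period
    response_time_h: float       # response time to a fault report, in hours
    fault_procedure: str         # reference to the fault handling procedure


@dataclass
class SLO:
    """Service Level Object (field names and units are hypothetical)."""
    flow: dict                   # service flow description, e.g. src/dst prefixes
    excess_traffic: str          # excess traffic treatment, e.g. "drop"
    owd_ms: float                # performance guarantees
    jitter_ms: float
    loss_pct: float
    capacity_mbps: float
    mdt_h: float                 # reliability: maximum downtime per period
    ttr_h: float                 # reliability: time to repair per violation
    monitoring: list = field(default_factory=list)  # monitoring infrastructure


@dataclass
class SLA:
    alo: ALO
    slo: SLO

    def active_at(self, t: datetime) -> bool:
        """True if t falls inside the agreed duration [start, end)."""
        return self.alo.start <= t < self.alo.end


sla = SLA(
    ALO(["noc@example.org"], datetime(2007, 6, 1), datetime(2007, 7, 1),
        99.5, 4.0, "escalate via ENOC"),
    SLO({"src": "192.0.2.0/24", "dst": "198.51.100.0/24"}, "drop",
        owd_ms=30.0, jitter_ms=5.0, loss_pct=0.1, capacity_mbps=100.0,
        mdt_h=8.0, ttr_h=4.0, monitoring=["perfSONAR", "e2emonit"]),
)
print(sla.active_at(datetime(2007, 6, 15)))  # True
```

Keeping ALO and SLO as separate objects mirrors the slide: administrative terms can be agreed once per domain, while service-level terms vary per request.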
Trusted 3rd party - ENOC
• Multi-domain SLA
• Need for a trusted 3rd party responsible for:
– SLA service installation
– SLA management/monitoring
– User support for network issues
  ▪ Network SLA violations
• For EGEE: the ENOC (EGEE Network Operation Centre)
– Counterpart of an NREN NOC for EGEE
– In charge of “EGEE network” operations
Model of SLA establishment (1/2)
Preliminary agreement
• Preliminary agreement of ENOC with participating domains & RCs
• Made once for every participating domain & RC
1. ENOC asks every participating domain and RC to formulate an agreement
2. Each domain NOC provides:
– the ALO (Administrative Level Object)
– the max bandwidth allocated for EGEE
Each RC:
– provides administrative and technical details
– signs the Acceptable Use Policy (AUP)
  ▪ Provisioned network resources are used only for EGEE purposes
3. ENOC stores the received information in the NOD (Network Operational Database)
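Steps 1 to 3 amount to populating a registry that later stages read from. Below is a minimal in-memory Python sketch built around a hypothetical `NOD` class; the real NOD's schema and storage are not described in the slides. It encodes only the two constraints stated above: a domain reports the bandwidth it allocates to EGEE, and an RC is accepted only once it has signed the AUP.

```python
from dataclasses import dataclass


@dataclass
class DomainEntry:
    name: str
    alo: dict                 # administrative details reported by the domain NOC
    max_egee_mbps: float      # max bandwidth the domain allocates to EGEE


@dataclass
class RCEntry:
    name: str
    details: dict             # administrative and technical details of the RC
    aup_signed: bool


class NOD:
    """Toy stand-in for the Network Operational Database (hypothetical API)."""

    def __init__(self):
        self.domains = {}
        self.rcs = {}

    def register_domain(self, entry: DomainEntry) -> None:
        if entry.max_egee_mbps <= 0:
            raise ValueError(f"{entry.name}: no bandwidth allocated for EGEE")
        self.domains[entry.name] = entry

    def register_rc(self, entry: RCEntry) -> None:
        # The preliminary agreement requires a signed AUP before an RC takes part
        if not entry.aup_signed:
            raise ValueError(f"{entry.name}: AUP not signed")
        self.rcs[entry.name] = entry


nod = NOD()
nod.register_domain(DomainEntry("NREN-A", {"contact": "noc@nren-a.example"}, 1000.0))
nod.register_rc(RCEntry("RC-1", {"site": "example"}, aup_signed=True))
```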
Model of SLA establishment (2/2)
2-Stage provisioning model: SR-SA
• Stage 1: Service Reservation (SR) – b2b SLA
– PIP (Premium IP) reservation in the extended QoS network (GEANT/NRENs)
– border-to-border SLA (GEANT/NRENs SLAs)
• Stage 2: Service Activation (SA) – e2e SLA
– Activation of the service ↔ configuration of the routers in the last-mile network
– end-to-end SLA (b2b SLA + NREN client domains’ SLAs)
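The two stages can be read as a small life cycle for each request: it is first reserved (b2b SLA) and only later activated (e2e SLA). Here is a Python sketch using a hypothetical `ProvisioningRequest` class and a `RELEASED` state that the slides do not mention. The point it illustrates is only that activation cannot happen before reservation.

```python
from enum import Enum


class State(Enum):
    REQUESTED = "requested"
    RESERVED = "reserved"      # stage 1 done: b2b SLA in GEANT/NRENs
    ACTIVATED = "activated"    # stage 2 done: e2e SLA incl. last-mile domains
    RELEASED = "released"      # hypothetical end state, not in the slides


# Allowed transitions: activation is only possible after reservation
TRANSITIONS = {
    State.REQUESTED: {State.RESERVED, State.RELEASED},
    State.RESERVED: {State.ACTIVATED, State.RELEASED},
    State.ACTIVATED: {State.RELEASED},
    State.RELEASED: set(),
}


class ProvisioningRequest:
    def __init__(self):
        self.state = State.REQUESTED
        self.b2b_sla = None
        self.e2e_sla = None

    def _move(self, new_state):
        if new_state not in TRANSITIONS[self.state]:
            raise RuntimeError(f"cannot go from {self.state.name} to {new_state.name}")
        self.state = new_state

    def reserve(self, b2b_sla):
        """Stage 1 (SR): record the border-to-border SLA."""
        self._move(State.RESERVED)
        self.b2b_sla = b2b_sla

    def activate(self, client_domain_slas):
        """Stage 2 (SA): combine the b2b SLA with the client domains' SLAs."""
        self._move(State.ACTIVATED)
        self.e2e_sla = {"b2b": self.b2b_sla, "clients": list(client_domain_slas)}


req = ProvisioningRequest()
req.reserve({"owd_ms": 19.0})
req.activate([{"domain": "campus-A", "owd_ms": 2.0}])
print(req.state.name)  # ACTIVATED
```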
Rationale of the 2-stage provisioning model
• Grid applications require the e2e path to be available on time
• There is a lead time between the service request and the service reservation in the extended QoS network
– Manual configuration of the routers
– Currently 2 working days in the GEANT network
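The lead time directly constrains when a request must be submitted. Below is a small Python sketch of the arithmetic. It assumes the 2-working-day figure quoted above, a Monday-to-Friday working week, and no public holidays. The last two are simplifying assumptions; real NOC calendars differ.

```python
from datetime import date, timedelta


def add_working_days(start: date, n: int) -> date:
    """Date that lies n working days (Mon-Fri) after start; holidays ignored."""
    d = start
    while n > 0:
        d += timedelta(days=1)
        if d.weekday() < 5:      # 0 = Monday ... 4 = Friday
            n -= 1
    return d


# A request made on Friday 25 May 2007 with a 2-working-day lead time
print(add_working_days(date(2007, 5, 25), 2))  # 2007-05-29
```

A request made on a Friday is therefore only reserved by the following Tuesday, which illustrates why reserving well ahead of activation matters.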
Service Reservation – b2b SLA
• Case 1: automatic reservation
– Reservation via the AMPS servers of the hosting NRENs and GEANT
– AMPS (Advanced Multi-domain Provisioning System):
  ▪ System under development by the GEANT project
  ▪ Manages the whole PIP provisioning process, from the user request through to the configuration of the appropriate network elements
• Case 2: manual reservation
– No AMPS servers (or similar services) installed
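Whichever case applies, the ENOC first needs the chain of domains between the two RCs and, for each domain, whether an AMPS (or similar) server is available. Here is a Python sketch using a hypothetical domain adjacency map and breadth-first search. It also assumes the automatic/manual choice can be made per domain; the slides describe the two cases but not paths that mix them.

```python
from collections import deque


def domain_path(adjacency, src, dst):
    """Breadth-first search for the shortest chain of domains from src to dst."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        cur = queue.popleft()
        if cur == dst:
            break
        for nxt in adjacency.get(cur, ()):
            if nxt not in prev:
                prev[nxt] = cur
                queue.append(nxt)
    if dst not in prev:
        return None  # no path between the two domains
    path, node = [], dst
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]


def split_by_reservation_mode(path, amps_enabled):
    """Case 1 (automatic) for AMPS-enabled domains, case 2 (manual) otherwise."""
    automatic = [d for d in path if d in amps_enabled]
    manual = [d for d in path if d not in amps_enabled]
    return automatic, manual


adjacency = {
    "NREN-A": ["GEANT"],
    "GEANT": ["NREN-A", "NREN-B"],
    "NREN-B": ["GEANT"],
}
path = domain_path(adjacency, "NREN-A", "NREN-B")
auto, manual = split_by_reservation_mode(path, amps_enabled={"GEANT", "NREN-A"})
print(path)    # ['NREN-A', 'GEANT', 'NREN-B']
print(auto)    # ['NREN-A', 'GEANT']
print(manual)  # ['NREN-B']
```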
Stage 1: Service Reservation (SR)-b2b SLA
case 1: automatic reservation
• Reservation through AMPS or a similar service
• ENOC identifies the involved GEANT/NREN domains
• GEANT/NRENs provide individual SLAs
• Synthesis of the b2b SLA: performed by ENOC based on the reported GEANT/NRENs SLAs
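The slides do not specify how the ENOC combines the per-domain SLAs. One common approach, shown here in Python as an assumption rather than as the ENOC's actual procedure, is to add the delay bounds, take the bottleneck capacity, and combine loss ratios as if losses in different domains were independent. Summing the per-domain delay-variation bounds gives a conservative (not tight) e2e bound.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class Guarantee:
    owd_ms: float         # one-way delay bound
    ipdv_ms: float        # delay variation bound
    loss: float           # packet loss ratio, 0..1
    capacity_mbps: float  # guaranteed capacity


def compose(domains):
    """Combine per-domain guarantees along a path (assumed composition rules)."""
    return Guarantee(
        owd_ms=sum(d.owd_ms for d in domains),                 # delays add up
        ipdv_ms=sum(d.ipdv_ms for d in domains),               # conservative bound
        loss=1 - prod(1 - d.loss for d in domains),            # independent losses
        capacity_mbps=min(d.capacity_mbps for d in domains),   # bottleneck
    )


b2b = compose([
    Guarantee(5.0, 1.0, 0.001, 1000.0),    # e.g. source NREN (hypothetical values)
    Guarantee(10.0, 2.0, 0.0005, 500.0),   # e.g. GEANT
    Guarantee(4.0, 1.0, 0.001, 1000.0),    # e.g. destination NREN
])
print(b2b.owd_ms, b2b.capacity_mbps, round(b2b.loss, 6))  # 19.0 500.0 0.002498
```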
Stage 1: Service Reservation (SR)-b2b SLA
case 2: manual reservation
• Cases with no AMPS servers or similar services installed in NRENs
Stage 1: Service Reservation (SR)-b2b SLA
case 2: manual reservation
• No AMPS servers installed
• ENOC identifies the involved GEANT/NREN domains
• ENOC initiates manual requests to the individual domain NOCs
• NOCs reply by email and provide their individual SLAs
• Synthesis of the b2b SLA: performed by ENOC based on the reported domain SLAs
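In the manual case, the request to each domain NOC is an ordinary email. Here is a Python sketch using the standard library's `email.message.EmailMessage`. The function name, sender address, subject format and body fields are all hypothetical, and actually sending the message (for example via `smtplib`) is left out.

```python
from email.message import EmailMessage


def build_reservation_request(noc_address, domain, request_id, src, dst, mbps, start, end):
    """Compose (but do not send) a manual PIP reservation request to one domain NOC."""
    msg = EmailMessage()
    msg["From"] = "enoc@example.org"            # hypothetical ENOC address
    msg["To"] = noc_address
    msg["Subject"] = f"[EGEE PIP reservation {request_id}] {domain}"
    msg.set_content(
        f"Please reserve Premium IP capacity in {domain}.\n"
        f"Request: {request_id}\n"
        f"Flow: {src} -> {dst}\n"
        f"Bandwidth: {mbps} Mbit/s\n"
        f"Period: {start} to {end}\n"
        "Please reply with your domain SLA (OWD, IPDV, loss, capacity, MDT, TTR).\n"
    )
    return msg


msg = build_reservation_request("noc@nren-b.example", "NREN-B", "REQ-0001",
                                "192.0.2.0/24", "198.51.100.0/24", 100,
                                "2007-06-01 08:00", "2007-06-01 20:00")
print(msg["Subject"])  # [EGEE PIP reservation REQ-0001] NREN-B
```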
Stage 2: Service Activation (SA)-e2e SLA
• ENOC verifies that the reservation in the extended QoS domain is still effective and retrieves it
• Checks whether the NREN client domains (MAN/campus/institution) can support the request
• NREN client domains provide their SLAs
• ENOC produces the e2e SLA based on:
– the reported NREN client domains’ SLAs
– the b2b SLA from stage 1
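The two checks the ENOC performs before activation can be expressed as a simple admission test. Below is a Python sketch using a hypothetical `Reservation` record with a validity window and a reserved rate. It also assumes each client domain reports a single available-capacity figure.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Reservation:
    start: datetime   # validity window of the stage-1 reservation
    end: datetime
    mbps: float       # reserved rate


def can_activate(res, now, client_capacity_mbps):
    """Return (ok, reasons) for activating res at time now."""
    reasons = []
    # Check 1: the b2b reservation must still be effective
    if not (res.start <= now < res.end):
        reasons.append("b2b reservation not effective")
    # Check 2: every client domain must support the requested rate
    for domain, cap in client_capacity_mbps.items():
        if cap < res.mbps:
            reasons.append(f"{domain} offers only {cap} Mbit/s")
    return not reasons, reasons


res = Reservation(datetime(2007, 6, 1, 8), datetime(2007, 6, 1, 20), 100.0)
ok, why = can_activate(res, datetime(2007, 6, 1, 9),
                       {"MAN-A": 1000.0, "campus-B": 50.0})
print(ok, why)  # False ['campus-B offers only 50.0 Mbit/s']
```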
Monitoring of SLAs
• ENOC queries the NPM DT (Network Performance Monitoring Diagnostic Tool)
• NPM DT provides measurement data from the perfSONAR (GEANT/NRENs) and e2emonit (RC-to-RC) monitoring frameworks
• Fault identification/notification
– Case 1: ENOC identifies & notifies the responsible domain
– Case 2: ENOC (not able to isolate the problem) informs GEANT PERT (Performance Enhancement Response Team)
• Reaction and repair according to the SLAs
• ENOC checks SLA compliance
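The two notification cases can be sketched as a decision on the measurement data. If the per-domain measurements point at specific domains, the ENOC notifies them; otherwise it escalates to PERT. Here is a Python sketch that assumes hypothetical dictionaries of per-domain OWD measurements and bounds (as might be derived from perfSONAR data) plus an e2e measurement (as from e2emonit). For brevity, only OWD is checked.

```python
def localise_fault(e2e_owd_ms, e2e_bound_ms, domain_owd_ms, domain_bound_ms):
    """Return ("ok", []), ("notify", [domains]) or ("escalate_to_pert", [])."""
    if e2e_owd_ms <= e2e_bound_ms:
        return "ok", []
    # Domains whose measured delay exceeds their own agreed bound
    offenders = sorted(
        d for d, owd in domain_owd_ms.items()
        if owd > domain_bound_ms.get(d, float("inf"))
    )
    if offenders:
        return "notify", offenders          # case 1: responsible domain(s) found
    return "escalate_to_pert", []           # case 2: cannot isolate the problem


print(localise_fault(30.0, 20.0,
                     {"GEANT": 8.0, "NREN-B": 15.0},
                     {"GEANT": 10.0, "NREN-B": 5.0}))  # ('notify', ['NREN-B'])
```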
SLA monitoring requirements
e2e metrics:
– Performance metrics
  ▪ OWD (One Way Delay)
  ▪ IPDV (IP Packet Delay Variation)
  ▪ RTT (Round Trip Time)
  ▪ Packet loss
  ▪ Available bandwidth
  ▪ Achievable bandwidth
– Reliability metrics
  ▪ TTR (Time To Repair): from trouble ticket issue to recovery, per violation
  ▪ MDT (Maximum DownTime): maximum total of TTRs for all violations in a given period
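The two reliability metrics follow directly from trouble-ticket timestamps. Below is a Python sketch, assuming each ticket is a hypothetical (issued, recovered) pair of datetimes, that tickets do not overlap, and that a ticket counts towards the period in which it was issued.

```python
from datetime import datetime, timedelta


def reliability_report(tickets, period_start, period_end, ttr_limit, mdt_limit):
    """Per-violation TTRs and total downtime for tickets issued in the period."""
    in_period = [(i, r) for i, r in tickets if period_start <= i < period_end]
    ttrs = [recovered - issued for issued, recovered in in_period]  # TTR per violation
    total = sum(ttrs, timedelta())                                    # compared against MDT
    return {
        "ttrs": ttrs,
        "ttr_violations": [t for t in ttrs if t > ttr_limit],
        "total_downtime": total,
        "mdt_ok": total <= mdt_limit,
    }


tickets = [
    (datetime(2007, 6, 1, 10, 0), datetime(2007, 6, 1, 12, 0)),
    (datetime(2007, 6, 3, 8, 0), datetime(2007, 6, 3, 8, 30)),
]
report = reliability_report(tickets, datetime(2007, 6, 1), datetime(2007, 7, 1),
                            ttr_limit=timedelta(hours=4), mdt_limit=timedelta(hours=2))
print(report["total_downtime"], report["mdt_ok"])  # 2:30:00 False
```

Here no single repair exceeds the 4-hour TTR limit, yet the accumulated downtime breaks the 2-hour MDT, which is why both metrics are tracked separately.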
Monitoring features
– Frequent e2e and per-domain monitoring of performance metrics (e.g. every 15 minutes) during the agreed service availability period
– Capability of setting thresholds on metrics to generate violation alarms
  ▪ Different severity levels (?)
– Trouble tickets, raised by users and by ENOC operators in response to alarms, managed via the TTM (Trouble Ticket Manager)
– Statistics from trouble tickets to infer MDT & TTR
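Threshold-based alarming can be sketched as follows. This Python sketch makes three assumptions: there are two hypothetical severity levels (warning, critical), since the slide leaves severity levels open; a higher value is worse for every metric checked (delay, IPDV, loss); and samples taken every 15 minutes are evaluated only within a daily availability window.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass
class Threshold:
    metric: str
    warning: float     # hypothetical first severity level
    critical: float    # hypothetical second severity level


def severity(value, th):
    """Higher values are assumed worse (delay, IPDV, loss)."""
    if value >= th.critical:
        return "critical"
    if value >= th.warning:
        return "warning"
    return None


def evaluate(sample, thresholds, ts, window):
    """Return alarms for one sample taken at ts; window = (start, end) times of day."""
    start, end = window
    if not (start <= ts.time() < end):
        return []  # outside the agreed service availability period
    alarms = []
    for th in thresholds:
        if th.metric in sample:
            level = severity(sample[th.metric], th)
            if level is not None:
                alarms.append((ts, th.metric, level, sample[th.metric]))
    return alarms


thresholds = [Threshold("owd_ms", 20.0, 40.0), Threshold("loss_pct", 0.1, 1.0)]
ts = datetime(2007, 6, 1, 10, 15)   # one of the 15-minute sampling instants
print(evaluate({"owd_ms": 35.0, "loss_pct": 0.05}, thresholds, ts, (time(8), time(20))))
```

Each returned tuple is the kind of event an ENOC operator could turn into a TTM trouble ticket.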
Questions
?