Network in EGEE: Building end-to-end network services
Enabling Grids for E-sciencE
Network in EGEE
Building end-to-end network services
for the Grid
Mathieu Goutelle – CNRS UREC, France
EGEE-II SA2 “Networking support”
www.eu-egee.org
EGEE-II INFSO-RI-031688
EGEE and gLite are registered trademarks
Outline
• Short presentation of EGEE,
• The network in EGEE:
– Network services?
– EGEE focus on end-to-end services in a multi-domain context.
• Network services:
– Resource reservation,
– Service Level Agreement.
• Operational services:
– Monitoring,
– EGEE Network Operational Centre.
• Summary & conclusion
GridNets 2006 – 2006-10-01 – San Jose, CA, USA
EGEE in a nutshell…
• EGEE:
– 1 April 2004 – 31 March 2006
– 71 partners in 27 countries, federated in regional Grids
• EGEE-II:
– 1 April 2006 – 31 March 2008
– 91 partners in 32 countries
– 13 Federations
• Objectives:
– Large-scale, production-quality
infrastructure for e-Science
– Attracting new resources and
users from industry as well as
science
– Improving and maintaining
“gLite” Grid middleware
EGEE in a nutshell…
• More than 20 applications from 7 domains:
– Astrophysics:
MAGIC, Planck
– Computational Chemistry
– Earth Sciences:
Earth Observation, Solid Earth Physics, Hydrology, Climate
– Financial Simulation:
E-GRID
– Fusion
– Geophysics:
EGEODE
– High Energy Physics:
4 LHC experiments (ALICE, ATLAS, CMS, LHCb)
BaBar, CDF, DØ, ZEUS
– Life Sciences:
Bioinformatics (Drug Discovery, GPS@, Xmipp_MLrefine, etc.)
Medical imaging (GATE, CDSS, gPTM3D, SiMRI 3D, etc.)
– Multimedia
– Material Sciences
– …
EGEE Infrastructure
[Map: countries participating in EGEE]
Scale (June 2006):
~ 200 sites in 40 countries
~ 25 000 CPUs
> 10 PB storage
> 35 000 jobs per day
> 100 Virtual Organizations
Network infrastructure
Connects 32 NRENs
Over 3M users
End-to-end network services?
• What type of services?
– Network services are available to the EGEE sites:
Premium IP and similar (e.g. QBSS),
“lightpath” or network resource reservation,
IPv6, multicast…
– Operational services are available to the EGEE sites:
Monitoring of the network (local & backbone),
Operational data (incident, maintenance).
• How to ensure service continuity along the path?
– In the last mile?
– In a multi-domain context?
• What about service availability, interface
standardization, inter-domain agreements, etc.?
EGEE focus
• Network services:
– Network resource reservation:
Bandwidth Allocation and Reservation (BAR),
Dedicated talk on that subject (see session 1, “End to End
Bandwidth Allocation and Reservation for Grid applications”).
– Service Level Agreements (SLAs):
End-to-end SLAs?
• Operational services:
– Monitoring:
Network Performance Monitoring (NPM),
Dedicated talk on that subject (see session 2, “Federated Network
Performance Monitoring for the Grid”).
– Coordination of operational actions:
Concept of the EGEE Network Operational Centre (ENOC).
Network resource reservation
• Based on the framework currently being built by the
GÉANT2 project:
– Hides the multi-domain and multi-technology issues;
– Provides at the Grid level:
A seamless interface for service requests at the “customer” layer;
A high-level view of the network: requests specify characteristics,
not a particular service;
Reduced configuration lead-time;
A description of the service level.
• Issues remain:
– A component (BAR, see dedicated talk) gives access to these
interfaces at the middleware layer, but the application layer is not
yet ready;
– Need for sub-management of the macroscopic reserved resource
at the Grid level;
– What about domains outside the GÉANT2 cloud?
Quick look at the BAR architecture
[Figure: BAR architecture between Site 1 and Site 2 across Network 1, Network 2 and Network 3. Labelled elements: HLM, BAR, NSAP, L-NSAP, L-Network, Extended QoS Network, with an “EGEE” side and a “Network” side.]
• Clear demarcation between the Grid and the network:
– The network is hidden from the Grid (technology, multi-domain
issues…);
– The Grid is hidden from the network (which only knows one “EGEE” user);
– Allows a two-stage process (reservation & activation) suitable in a Grid
context.
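The two-stage process can be pictured as a small state machine. The sketch below is illustrative only and is not the BAR interface: the class, its fields and the state names are all assumptions made for the example. It shows a request expressed as characteristics (endpoints, bandwidth, time window) that must first be reserved and can only later be activated inside its window.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class State(Enum):
    REQUESTED = "requested"
    RESERVED = "reserved"
    ACTIVE = "active"
    RELEASED = "released"


@dataclass
class Reservation:
    """Hypothetical reservation record; every field name is illustrative."""
    src_site: str
    dst_site: str
    bandwidth_mbps: int
    start: datetime
    end: datetime
    state: State = State.REQUESTED

    def reserve(self) -> None:
        # Stage 1: the capacity is booked for the requested window.
        if self.state is not State.REQUESTED:
            raise ValueError(f"cannot reserve from state {self.state.name}")
        if self.end <= self.start:
            raise ValueError("reservation window is empty")
        self.state = State.RESERVED

    def activate(self, now: datetime) -> None:
        # Stage 2: the booked capacity is switched on, only inside the window.
        if self.state is not State.RESERVED:
            raise ValueError(f"cannot activate from state {self.state.name}")
        if not (self.start <= now < self.end):
            raise ValueError("activation outside the reserved window")
        self.state = State.ACTIVE

    def release(self) -> None:
        # The resource is handed back, whatever the current state.
        self.state = State.RELEASED
```

Separating the two stages lets a Grid job book capacity when it is scheduled and switch it on only when the transfer actually starts.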
SLAs
• “SLAs”?
– Description of the characteristics of the service provided (e.g.
after a successful resource reservation request);
– Provided by each domain crossed by the data path;
– Filled in either manually by a human or automatically if the
request is handled entirely by software;
– Definition of templates in cooperation with GÉANT2:
Based on previous work inside EGEE and answers from GÉANT2
to some open issues (procedures, demarcation point…)
• SLA template:
– Administrative part (contact, duration, troubleshooting
procedures);
– SLS (Service Level Specification) part.
• The end-to-end SLA is formed from the individual SLAs provided
by all domains along the end-to-end path.
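As a rough illustration of the two-part template, the sketch below models one domain's SLA as an administrative part plus an SLS part. The administrative fields follow the slide (contact, duration, troubleshooting procedures); the SLS metrics are assumptions, since the slide does not list them, and none of the names come from the actual EGEE/GÉANT2 template.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class AdministrativePart:
    contact: str
    valid_from: date          # start of the SLA duration
    valid_until: date         # end of the SLA duration
    troubleshooting_procedure: str


@dataclass(frozen=True)
class ServiceLevelSpecification:
    # Metric names are assumptions; the slide does not enumerate SLS fields.
    min_bandwidth_mbps: float
    max_one_way_delay_ms: float
    availability: float       # fraction of time, between 0 and 1


@dataclass(frozen=True)
class DomainSLA:
    domain: str
    admin: AdministrativePart
    sls: ServiceLevelSpecification

    def is_valid_on(self, day: date) -> bool:
        # An SLA only applies within its administrative duration.
        return self.admin.valid_from <= day <= self.admin.valid_until
```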
SLAs (cont.)
[Figure: scope of border-to-border connectivity versus end-to-end connectivity.]
• EGEE end-to-end SLA template:
– Concatenation of the individual SLAs of each participating domain;
– SLA between the borders of the NREN cloud (border-to-border SLA).
• Difficulty in accommodating the “last mile”:
– If the “last-mile” network is not participating (no resource reservation
system, no SLA, etc.);
– Try to address this with static information on these networks to provide
service characteristics to the user/application.
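One way to picture the concatenation is shown below. The composition rules are assumptions chosen for illustration, not rules stated in the talk: bandwidth is the minimum along the path, delay the sum, and availability the product (which presumes independent failures). Last-mile segments described only by static information are flagged, so the result is marked as not fully guaranteed.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class SegmentSLS:
    domain: str
    bandwidth_mbps: float
    delay_ms: float
    availability: float      # fraction between 0 and 1
    static: bool = False     # True when taken from static last-mile data


def concatenate(segments: list[SegmentSLS]) -> dict:
    """Combine per-domain figures into end-to-end figures (assumed rules)."""
    if not segments:
        raise ValueError("an end-to-end path needs at least one segment")
    return {
        # The narrowest segment bounds the end-to-end bandwidth.
        "bandwidth_mbps": min(s.bandwidth_mbps for s in segments),
        # Delays add up along the path.
        "delay_ms": sum(s.delay_ms for s in segments),
        # Product of availabilities, assuming independent failures.
        "availability": prod(s.availability for s in segments),
        # Any static (non-participating) segment weakens the guarantee.
        "fully_guaranteed": not any(s.static for s in segments),
    }
```

With a campus network described statically and two participating domains, the result carries the campus bandwidth as the bottleneck and `fully_guaranteed` set to `False`.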
SLA institution
• All domains involved in network services provisioning
to EGEE as part of the existing network infrastructure
hierarchy have to be categorized as one of:
– Compliant with the Premium IP service,
– Supportive of the Premium IP service,
– Indifferent to the Premium IP service.
EGEE focus
• Network services:
– Network resource reservation:
Bandwidth Allocation and Reservation (BAR),
Dedicated talk on that subject (see session 1, “End to End
Bandwidth Allocation and Reservation for Grid applications”).
– Service Level Agreements (SLAs):
End-to-end SLAs?
• Operational services:
– Monitoring:
Network Performance Monitoring (NPM),
Dedicated talk on that subject (see session 2, “Federated Network
Performance Monitoring for the Grid”).
– Operational Interface with the network:
Concept of the EGEE Network Operational Centre (ENOC).
Monitoring
• Not Yet Another Monitoring Framework!
– Role of a Mediator between the various monitoring frameworks and the
various clients (diagnostic tools, middleware, etc.);
– Network Performance Monitoring (NPM) gives access to data collected
at existing monitoring frameworks (site, backbone);
– Use of the NMWG interface to access those frameworks and republish
data;
– Special requirements for some middleware
components for faster access to data.
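The mediator role can be sketched as follows. This is a simplified illustration, not the NPM implementation: the class name, the backend signature and the cache are assumptions, and the NMWG interface itself is not modelled. Clients issue one kind of query, the mediator dispatches it to the relevant framework, and a short-lived cache stands in for the faster access some middleware components need.

```python
import time
from typing import Callable, Dict, Tuple

# A backend takes (source, destination, metric) and returns a measurement.
Backend = Callable[[str, str, str], float]


class Mediator:
    """Hypothetical mediator between clients and monitoring frameworks."""

    def __init__(self, cache_ttl_s: float = 60.0) -> None:
        self._backends: Dict[str, Backend] = {}
        self._cache: Dict[Tuple[str, str, str, str], Tuple[float, float]] = {}
        self._ttl = cache_ttl_s

    def register(self, framework: str, backend: Backend) -> None:
        # Each existing framework (site, backbone) is plugged in by name.
        self._backends[framework] = backend

    def query(self, framework: str, src: str, dst: str, metric: str) -> float:
        if framework not in self._backends:
            raise KeyError(f"no backend registered for {framework!r}")
        key = (framework, src, dst, metric)
        now = time.monotonic()
        hit = self._cache.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            # Recent answer: serve it without contacting the framework.
            return hit[1]
        value = self._backends[framework](src, dst, metric)
        self._cache[key] = (now, value)
        return value
```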
Operational Interface
• The network infrastructure of EGEE is mainly served by
a set of NRENs via GÉANT2;
• Need for an entity coordinating all the NOCs involved
and the Grid Operations:
– Concept of an end-to-end Coordination Unit (GÉANT2);
– Providing an end-to-end operational support.
• A single point of contact as an operational interface
between EGEE and GÉANT2/NRENs dealing with:
– Network problems troubleshooting,
– Interactions with network providers and Grid sites,
– Notifications from NRENs,
– Network SLA installation and monitoring.
• Two Functional Entities inside EGEE:
– EGEE Network Operational Centre (ENOC);
– A Network Trouble Ticket Manager – GGUS.
ENOC
[Figure: ENOC support structure. Labelled elements: Users, GGUS, EGEE Network Support Units, ENOC, NRENs, GÉANT2.]
• From the EGEE point of view:
– GGUS acts as the first line support (interacts with the user);
– Support units are the second level support;
• From the NRENs’ point of view:
– EGEE (via the ENOC) is a single entity;
– The ENOC is the only point of contact for the NRENs (submitter of the
problem).
ENOC (cont.)
• Main challenges:
– To create a network support structure inside EGEE;
– To define the associated network operational procedures.
• The ENOC is the user support for network failures:
– End-to-End network problems troubleshooting;
– Coordination unit of the actions of all the entities involved in a
network incident;
– Try to have an overall view of the end-to-end service, gathering
information from all the involved domains;
– SLA Management: installation and monitoring.
• ENOC Operational Procedures have been defined and
validated during the first phase of EGEE;
• EGEE-II will fully implement ENOC.
ENOC (cont.)
• ENOC Service:
– Collect tickets from NRENs which agree to provide them to the
ENOC;
– Forward to GGUS the ones that seem relevant (possible impact
on the Grid infrastructure);
– Receive tickets assigned to ENOC by the GGUS 1st level
support;
– Troubleshoot them with the help of monitoring tools;
– Contact identified faulty domains or reassign ticket to the
associated site if there is no evidence of a backbone problem
(e.g. LAN issue).
• Main Issues:
– Load on the ENOC team (amount of info, etc.);
– Heterogeneity of systems the ENOC has to deal with
(languages, trouble ticket format, monitoring, etc.).
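The ticket flow described above can be sketched as a small routing function. It is only an illustration of the decision steps; the `Ticket` fields and the returned action strings are hypothetical and do not reflect the ENOC or GGUS tooling.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Ticket:
    ticket_id: str
    origin: str                          # "NREN" or "GGUS"
    affected_sites: list[str]            # EGEE sites possibly impacted
    faulty_domain: Optional[str] = None  # set once troubleshooting finds one


def triage(ticket: Ticket) -> str:
    """Return the next action for a ticket (action names are illustrative)."""
    if ticket.origin == "NREN":
        # Forward only tickets with a possible impact on the Grid.
        return "forward_to_ggus" if ticket.affected_sites else "archive"
    if ticket.origin == "GGUS":
        if ticket.faulty_domain is not None:
            # Troubleshooting pointed at a domain: contact it.
            return f"contact_domain:{ticket.faulty_domain}"
        # No evidence of a backbone problem (e.g. LAN issue): back to the site.
        return "reassign_to_site"
    raise ValueError(f"unknown ticket origin: {ticket.origin}")
```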
ENOC status
• ENOC team is ready!
5 people (2 FTE) including one dedicated to it.
• ENOC receives operational information from GÉANT2
and 10 NRENs (more to come):
About 80% of all the EGEE sites covered;
An average of 5 tickets handled per day;
8 different languages.
• Building tools to follow up or
enhance the network support:
Network Operational Database
(interconnection of
administrative domains between
the EGEE resource centres);
TT parsing and filtering tool;
Dashboard to present overall status
of the “EGEE network”.
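To show why a database of domain interconnections helps, here is a minimal sketch of the kind of lookup it would allow: given a domain named in a trouble ticket, list the resource centres whose traffic crosses it. The site names, domain names and data layout are invented for the example and are not the actual Network Operational Database schema.

```python
# Hypothetical data: the administrative domains each site's traffic crosses.
SITE_PATHS = {
    "site-a": ["campus-a", "nren-x", "backbone"],
    "site-b": ["nren-x", "backbone"],
    "site-c": ["nren-y", "backbone"],
}


def sites_behind(domain: str, site_paths: dict[str, list[str]]) -> list[str]:
    """Return, in sorted order, the sites whose path includes the domain."""
    return sorted(site for site, path in site_paths.items() if domain in path)
```

A ticket announcing maintenance in `nren-x` would then map to `site-a` and `site-b`, which is the information needed to decide whether the ticket is relevant to the Grid.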
EGEE expectations
• Towards a better solution to our “multi-domain”
and “end-to-end” issues:
• Seamless access to network monitoring data:
GÉANT2 will provide such access (perfSONAR), from multiple
domains, aggregating data from multiple frameworks;
• Network resource reservation:
Requests expressed not in terms of service but of characteristics;
The choice of the underlying technology to fulfil them is up to the
network;
Answer to a request = SLA (depending on the current network
status & load);
What about the last mile? The non-NREN domains?
• Standardization of the operational interface:
Trouble Ticket format (data schema and exchange format);
Access method.
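To make the standardization point concrete, the sketch below maps tickets from two providers with different field names onto one common schema and serializes them to a single exchange format. The common fields, the per-provider mappings and the choice of JSON are all assumptions for illustration; no standard trouble-ticket schema is implied.

```python
import json

# Assumed common schema; not a standardized trouble-ticket format.
COMMON_FIELDS = ("id", "domain", "type", "start", "summary")

# Hypothetical per-provider field names mapped onto the common schema.
FIELD_MAPS = {
    "nren-x": {"ticket_no": "id", "kind": "type",
               "begin": "start", "text": "summary"},
    "nren-y": {"ref": "id", "category": "type",
               "from": "start", "description": "summary"},
}


def normalise(provider: str, raw: dict) -> dict:
    """Rename a provider's native fields to the common ones."""
    mapping = FIELD_MAPS[provider]
    ticket = {common: raw[native]
              for native, common in mapping.items() if native in raw}
    ticket["domain"] = provider
    missing = [f for f in COMMON_FIELDS if f not in ticket]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return ticket


def to_exchange_format(ticket: dict) -> str:
    # One serialization for every provider (JSON chosen for the example).
    return json.dumps(ticket, sort_keys=True)
```

Without an agreed schema, one such mapping has to be written and maintained per provider, which is part of the load on the ENOC team.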
Summary & conclusion
• Focus on providing end-to-end services in a multi-domain context:
– Hiding the network complexity from the Grid (users, middleware,
Grid support);
– Hiding the Grid complexity from the network (single point of
contact, operational interface);
• Many building blocks depend on the providers:
– Resource reservation frameworks, SLA installation, backbone
monitoring;
– Fortunately, EGEE and GÉANT2 built up a strong collaboration!
• Many things remain pending:
– Mainly on the operational side (homogenization of the network
interface);
– How to cope with domains outside the GÉANT2 cloud?
• The two infrastructures need to collaborate on these
aspects.
Thank you for your attention!