Slides - TERENA Networking Conference 2002

Grid networking in EU DataGRID
TERENA conference
Limerick - 5th of June 2002
Pascale PRIMET
Manager of the workpackage “Network” of the DataGRID project
INRIA/ RESO - ENS Lyon
1
Outline
• The European DataGRID “network”
• High performance Grid Networking
• Grid Network Monitoring in EDG
• Network Services for the GRID
• Perspectives
2
Grid technology
• The purpose of a Grid is to aggregate a large number of
resources to build a high-performance computing and
storage environment.
• The distributed resources may be interconnected
– via a VPN
– or via the Internet.
3
European DataGRID project
• The EDG project http://www.eu-datagrid.org/ aims to
provide production quality testbeds, using real-world
applications with real data:
• High Energy Physics
– processing of the huge amount of data from the LHC
experiments
• Biology and Medical Imaging
– sharing of genomic databases for the benefit of international
cooperation
– processing of medical images for medical collaborations
• Earth Observations
– access and analysis of atmospheric ozone data collected by
satellites such as Envisat-1
• Schedule: January 2001 to December 2003
• Funded by the European Union
4
EDG - Partners
• CERN – International (Geneva)
• CNRS – France
– Testbed (WP6)
– Network (WP7)
– Bio application (WP10)
• ESA/ESRIN – Italy
• INFN – Italy
• NIKHEF – The Netherlands
• PPARC – UK
5
European DataGRID project
• 7 applications distributed among 6 virtual
organisations
• 11 organisations over 15 countries
• 40 sites in Europe
• Based on the European GEANT backbone
and the national NRENs
http://ccwp7.in2p3.fr
6
EDG - Infrastructure
7
High Performance Grid Networking
• Technical collaboration with Network providers
– Requirement studies (Application and middleware)
– Available infrastructure and services review
– Enhanced Network services tests
• Technical collaboration with Grid users
– End to end monitoring
– Transport protocol studies and optimisation
– E2E performance problem identification
– Implementation of network cost functions for scheduling
8
« Physical » view of a Grid Network
Public Network
No security
No predictable performance
No control on the traffic
The flat INTERNET
Resource = CE
(computing element)
or Resource = SE
(storage element)
9
Logical view of the Grid Network
10
EDG WP7 «Network » activities
Manager: Pascale Primet - INRIA/RESO – 25 people, 2.5 funded
Provisioning
Monitoring
Security
E2E QoS and Transport Services
11
EDG WP7 activities
T7.1 : Technical Collaboration with Dante/NRENs
– Pilot services test (QoS, multicast)
– Dedicated machines in GEANT PoPs
T7.2 : QoS and advanced services
– QoS services test with biological/medical applications
– Reliable Multicast Protocol test and deployment
– High performance transport protocol (TCP/nonTCP)
T7.3 : Network Monitoring Architecture
– Design and deploy a Network Monitoring Infrastructure
– Visualize and analyze monitoring data
T7.4 : Security => EDG Security team
[Figure labels: Applications, Middleware, Infrastructure, Management, Testbed]
12
Collaboration with GEANT
• E2E : Close participation in pilot services
– Test of IP Premium service/WP10
• In Backbone : (our proposal)
– Use of dedicated machines in GEANT PoPs
• Amsterdam, Geneva, London
– Tests of high throughput transfers
– Test of IP multicast for Reliable Multicast
– Sharing WP7 monitoring and DANTE monitoring data
13
Network provisioning
• Network Requirements studies
  • Application Requirements (WP8, WP9, WP10)
  • Middleware Requirements
• Physical Networks
  1. GEANT : 2.5 Gbps to 10 Gbps
  2. NRENs : from 155 Mbps (or less) to 2.5 Gbps
  3. Regional networks : from 2 Mbps to 155 Mbps
  4. Local Area Networks : from 10 Mbps to 1 Gbps
• Is a « Virtual Private Network » required for the
DataGRID ?
  • concept definition / VPN technologies review
• See our D7.1 document on WP7 EDG site
14
Methodology
1 Flows
2 Logical links
3 Physical links
4 Monitoring
15
Application requirement studies
Top down stream identification
[Figure: data flow diagrams for WP8, WP9 and WP10. Recoverable labels: CERN, Tier 1, Tier 2, Tier 3-4, CPU Client, Data Server, Input File Database, desktop, GOME data, GOME archives, GOME processor, GOME product, Application scheduler, Process Binary, Output file, Comp. resource, Product Archives, Visualisation, User control / monitoring, GRID]
Flows list
N°  Name                                                Application  WP  Type     Transfer volume (Mbytes)  Frequency (in days)  Average bitrate constraints (Kbit/s)  Observations
1   Monte Carlo Data Replication                        LHCb         8   gridftp  30 000                    24                   66 667
2   ENVISAT Data from ground station to storage centre  METEO        9   tcp      5 000 000                 0                    66 138
3                                                                                 10                        86 400               80 000
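The average bitrate constraints in the flows list appear to follow from the transfer volume and the transfer frequency. A minimal sketch of that arithmetic, under three assumptions that are not stated on the slide: 1 Mbyte = 8 000 Kbit, the frequency is a number of transfers per day, and the frequency shown as 0 is a rounded 1/7 (one transfer per week). The function name is hypothetical.

```python
SECONDS_PER_DAY = 86_400


def average_bitrate_kbps(volume_mbytes: float, transfers_per_day: float) -> float:
    """Average bitrate (Kbit/s) needed to move `volume_mbytes` per transfer
    when transfers run back to back, `transfers_per_day` times a day."""
    kbits_per_transfer = volume_mbytes * 8 * 1000  # assumes 1 Mbyte = 8 000 Kbit
    seconds_per_transfer = SECONDS_PER_DAY / transfers_per_day
    return kbits_per_transfer / seconds_per_transfer


print(round(average_bitrate_kbps(30_000, 24)))        # flow 1: 24 transfers per day
print(round(average_bitrate_kbps(5_000_000, 1 / 7)))  # flow 2: assumed weekly
print(round(average_bitrate_kbps(10, 86_400)))        # flow 3: one transfer per second
```

Under these assumptions the three calls give 66 667, 66 138 and 80 000 Kbit/s, which matches the values listed for the three flows.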
16
Some numbers
• HEP applications:
– Bulk data transfer : from 100 Mb/s (TB1) to 1 Gb/s cont.
(TB3)
• Medical applications:
– Interactive traffic with bursts of more than 1 Gbyte
– Real-time high-performance visualisation/simulations
17
Network performance measurement (1)
For Provisioning:
– To be available, via visualization, to human
observers (users, network/system administrators)
– To provide tools for network performance
measurement, problem identification and
resolution (bottlenecks, points of unreliability,
quality of service needs, topology…)
– To achieve network performance forecasting and
optimization – capacity planning
18
Network performance measurement (2)
For Resource Brokers:
– Network performance parameters are used for
optimizing resource allocation (replication,
MPI, remote file access…)
– Network performance metrics must
• be published to the Grid Information System
• be accessible through aggregated functions called
by Grid resource broker services (computing and
data storage).
19
Architectural design
• Four functional units:
– monitoring tools or sensors;
– a repository for collected data;
– the means for data analysis to generate network
metrics;
– the means to access and to use the derived
metrics.
• See our D7.2 document on WP7 EDG site
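The four functional units can be pictured as a small pipeline. The sketch below is illustrative only and is not the EDG implementation: the class and function names, the site names and the sample values are all hypothetical, and the "analysis" step is reduced to a plain mean.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List, Tuple

Key = Tuple[str, str, str]  # (source site, destination site, metric name)


@dataclass
class Repository:
    """Unit 2: stores the raw samples collected from sensors."""
    samples: Dict[Key, List[float]] = field(default_factory=dict)

    def store(self, src: str, dst: str, metric: str, value: float) -> None:
        self.samples.setdefault((src, dst, metric), []).append(value)


def fake_rtt_sensor(src: str, dst: str) -> List[float]:
    """Unit 1: stand-in for a real sensor; returns made-up RTT samples in ms."""
    return [24.0, 26.0, 25.0]


def analyse(repo: Repository) -> Dict[Key, float]:
    """Unit 3: derive one metric per key, here simply the mean of the samples."""
    return {key: mean(values) for key, values in repo.samples.items()}


def lookup(metrics: Dict[Key, float], src: str, dst: str, metric: str) -> float:
    """Unit 4: access point that a consumer of the metrics would call."""
    return metrics[(src, dst, metric)]


repo = Repository()
for sample in fake_rtt_sensor("site-a", "site-b"):
    repo.store("site-a", "site-b", "rtt_ms", sample)
print(lookup(analyse(repo), "site-a", "site-b", "rtt_ms"))
```

Keeping the four units behind separate interfaces means a sensor or a publication back end can be swapped without touching the others.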
20
Network Monitoring Architecture
[Figure: network monitoring architecture. Recoverable labels: Sensors (PingER, IPerf, GridFTP, SNMP, RTPL, …), Raw, Data Collector, Repository, Data processor, Analysis, Forecaster, Publication, LDAP, Middleware, P_RTPL, P_NWS, MapCenter, Resource Broker, Network managers]
21
Measurement methods
• Active methods
– Injection of traffic into the network to test
performance between two points
– Problem: may be intrusive (TCP/UDP throughput)
• Passive methods
– Collect traffic information at one point of the network:
router, switch, dedicated passive host, computing
element or storage element (GridFTP logs)…
– Problem: gives network usage, not capacity
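To make the active method concrete, here is a minimal sketch of a TCP throughput probe. It is not one of the EDG tools: it only sends a fixed number of bytes over a loopback connection and times the transfer, and the function names are hypothetical. It also shows why active probes are intrusive, since the measurement itself is injected traffic.

```python
import socket
import threading
import time


def _sink(server: socket.socket) -> None:
    """Accept one connection and discard everything it sends."""
    conn, _ = server.accept()
    with conn:
        while conn.recv(65536):
            pass


def measure_tcp_throughput(n_bytes: int = 5_000_000) -> float:
    """Send n_bytes over a loopback TCP connection and return Mbit/s."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    receiver = threading.Thread(target=_sink, args=(server,))
    receiver.start()

    payload = b"\0" * 65536
    sent = 0
    start = time.perf_counter()
    with socket.create_connection(server.getsockname()) as client:
        while sent < n_bytes:
            chunk = payload[: n_bytes - sent]
            client.sendall(chunk)  # this is the injected test traffic
            sent += len(chunk)
    receiver.join()  # wait until the receiver has drained the data
    elapsed = time.perf_counter() - start
    server.close()
    return sent * 8 / elapsed / 1e6


print(f"{measure_tcp_throughput():.0f} Mbit/s over loopback")
```

Between two real sites the receiver would run on the remote host, and the probe size and schedule would need to be limited so that the test traffic does not disturb production transfers.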
22
Active measurement
Identify bottlenecks and real throughput availability
23
Passive measurement
Passive measures
at one point
24
Metrics and tools
• Round Trip Delay => PingER (Lyon -> NIKHEF)
• Packet Loss => PingER (Lyon -> NIKHEF)
• TCP throughput => IPerfER (NIKHEF -> RAL)
• UDP throughput => UDPMon (CZ -> CERN)
• Site connectivity => MapCenter
• Service availability => MapCenter
• One-way metrics => RIPE NCC test boxes
25
Some results
• from testbed sites to CERN
– PingER RTT: average 25 ms
– OWD: average 9 ms
– OWL: average 0 to 0.3%
– TCP throughput: from 0 to 350 Mb/s
26
PingER results
27
IPerfER Results
28
Schema and LDAP backend
• Grid applications/middleware can access network
monitoring metrics via LDAP services according to a
defined LDAP schema.
• An LDAP back end has been developed to make
measurements visible through the Globus GIIS/GRIS
system.
– It fetches, or has pushed to it, the current metric information
from the local network monitoring data store.
• R-GMA is being tested as an alternative to Globus
MDS
http://ccwp7.in2p3.fr/mapcenter
29
Network Cost functions
Network metrics published in LDAP
repositories are used by resource brokers
and replica managers through network
cost functions :
Time = networkCost (SE1, SE2, filesize)
Computed from
1. GridFTP logs
2. TCP throughput measurements (aggregated)
3. RTT Measurements (aggregated)
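As an illustration of the networkCost(SE1, SE2, filesize) signature, the sketch below estimates a transfer time as one round trip plus file size divided by throughput. This formula is an assumption chosen for simplicity, not the EDG algorithm, and the host names and metric values are made up.

```python
from typing import Dict, Tuple

# Hypothetical published metrics per (source SE, destination SE):
# (aggregated TCP throughput in Mbit/s, aggregated RTT in ms). Values are made up.
METRICS: Dict[Tuple[str, str], Tuple[float, float]] = {
    ("se1.example.org", "se2.example.org"): (100.0, 25.0),
    ("se3.example.org", "se2.example.org"): (10.0, 5.0),
}


def network_cost(se1: str, se2: str, filesize_mbytes: float) -> float:
    """Estimated transfer time in seconds: one RTT plus size / throughput."""
    throughput_mbps, rtt_ms = METRICS[(se1, se2)]
    return rtt_ms / 1000 + filesize_mbytes * 8 / throughput_mbps


def cheapest_source(sources, destination: str, filesize_mbytes: float) -> str:
    """Pick the replica location with the lowest estimated transfer time."""
    return min(sources, key=lambda s: network_cost(s, destination, filesize_mbytes))


replicas = ["se1.example.org", "se3.example.org"]
print(cheapest_source(replicas, "se2.example.org", 1000))
```

A replica manager could call such a function once per candidate source and pick the minimum, which is what cheapest_source does here: the lower RTT of the second source does not compensate for its lower throughput on a 1000 Mbyte file.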
30
EDG Network Cost Function
Network Element => Network COST function
31
EDG MapCenter Tool
– Connectivity of sites
– Availability of services running over all sites involved
– Efficient and flexible model to logically and graphically
represent all communities, organizations and applications
running over grids.
– MapCenter enables representation of any level of
abstraction (national and international organizations,
virtual organizations, application etc) needed by grid
environments.
– http://ccwp7.in2p3.fr/mapcenter
32
Network and Transport Services
• QoS:
– Demonstrate and build experience in the use of
E2E DiffServ services in a Grid context
– Feed experience back to GEANT/DANTE,
NRENs and LANs
• Transport
– High performance transport protocols
– Reliable multicast protocols tests
33
QoS and Grid Applications
• 4 types of flows => Required Services
– Bulk data transfer => Scavenger, AF
– Interactive flows => AF, EF, ECN, others?
– Real-time flows => EF, others?
– Test traffic => Scavenger
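One way an application could request these services is by marking its packets with a DiffServ code point. The sketch below is a simplification under stated assumptions: it picks a single service per flow type (the slide lists several candidates), uses AF11 as the AF class, and uses code point 8 for Scavenger as in QBSS. The function names are hypothetical, and whether a network honours the marking depends on its configuration.

```python
import socket

# DiffServ code points: EF = 46, AF11 = 10, Scavenger (QBSS) = 8.
DSCP = {"EF": 46, "AF11": 10, "Scavenger": 8}

# Assumed one-to-one choice of service per flow type, for illustration only.
FLOW_SERVICE = {
    "bulk": "Scavenger",
    "interactive": "AF11",
    "real-time": "EF",
    "test": "Scavenger",
}


def tos_byte(flow_type: str) -> int:
    """The DSCP occupies the upper six bits of the IP TOS byte."""
    return DSCP[FLOW_SERVICE[flow_type]] << 2


def mark_socket(sock: socket.socket, flow_type: str) -> None:
    """Ask the OS to set the TOS byte on packets sent through this socket."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos_byte(flow_type))


for flow in FLOW_SERVICE:
    print(flow, FLOW_SERVICE[flow], tos_byte(flow))
```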
34
QoS and Experimental work
• Router configuration : WRR, DRR…
• QBSS : in LAN and LFN (CERN-Caltech)
• ECN and TCP over ECN
• Alternative models:
– ABE, EDS, proportional DS
• E2E Premium service for Medical
Applications
35
High Performance Transport
• TCP mechanisms optimization
– Tests of applicability of new mechanisms
• Use of QoS solutions
– Reduction of packet loss
– Active queue management (WRED, ECN)
– TCP over DiffServ (AF, EF, PDS, EDS…)
• Reliable Multicast Protocol
– Test and deployment of JRMS and TRAM
36
Issues and perspectives
• Refine NetworkCost function algorithms
• Scheduling of active measurements
• Sensor deployment scalability
• Automatic metrics analysis
• Network performance forecasting
• E2E availability and effectiveness of QoS services
• Transport services deployment
37
Conclusion
• In testbed 0 and testbed 1 the networking
functionality was in place
– IP technology: Best effort
– GEANT has been deployed
– A Performance Measurement Architecture developed
• In testbed 2 and testbed 3
– Grid application performance optimization
– End to end performance analysis
– Test and provide enhanced network and transport
services : Premium, Scavenger, Multicast
38
WP7 and other collaborations
• WP7 and EU DataTAG collaboration
– QoS service study and experiment
– High Throughput study and experiment
– Network monitoring and measurement
• GGF
– GHPN research group
• Other European Grid projects (FR e-toile,
UK e-science, INFN grid…)
39
For more information
• Consult our sites:
– http://ccwp7.in2p3.fr
– http://eu-datagrid.web.cern.ch
40