Replica Management Services in the European DataGrid Project


Work Package 2
European DataGrid
Outline
• The need for the European DataGrid and replica management
• Overview of replica management services
• Performance evaluation of services
• Future work – replica management in EGEE
• Conclusion
Why do we need a Grid?
• 100s of MB/s data output -> several PB of data per year
• Equivalent to 2 million CDs of data per year, needing 20,000 PCs per experiment to analyse
• Distributed Grid computing…
The European DataGrid
• Ran from Jan 2001 – March 2004
• Aim: to develop a Grid infrastructure for data-intensive
scientific applications
– High energy physics, biology and Earth observation producing
several PB of data per year
• Developed Grid middleware for job, data and fabric
management, information and monitoring
Grid Architecture
[Figure: Grid architecture diagram marking the scope of EDG middleware and the scope of EDG-WP2]
Data Management
• Requirements:
– Enable secure access to massive amounts of
data in a global name space
– Move and replicate data at high speed from one
geographical site to another
• 1st generation: GDMP + edg-replicamanager
– Used Globus for secure file transfer
– C++ based – gave basic replication functionality
and cataloging
Data Management
• 2nd generation – uses web services
– Easy and standardised way to connect distributed services via XML
• Services include
– Replica Manager Client
• main user interface
– Replica Location Service
• stores physical locations of replicas
– Replica Metadata Catalog
• stores logical file name mappings and metadata attributes
– Replica Optimization Service
• provides optimised access to replicas
– Security
• HTTPS + Globus’ GSI
Replica Location Service
• Implementation of RLS framework co-developed with
Globus
• Maps unique identifier (GUID) to multiple replicas (SURLs)
• Local catalog (LRC) with distributed index (RLI)
[Figure: three RLIs holding GUID:LRC mappings, fed by soft-state updates from four LRCs holding GUID:SURL mappings]
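The two-level layout (local catalogs plus a distributed index) can be illustrated with a minimal in-memory sketch. The class and method names below are hypothetical and are not the EDG or Globus RLS API; the timestamp-based expiry is only an assumption about how soft-state entries might age out, and the SURLs are made up.

```python
import time


class LocalReplicaCatalog:
    """Hypothetical LRC: GUID -> set of SURLs held at one site."""

    def __init__(self, name):
        self.name = name
        self.mappings = {}

    def add_mapping(self, guid, surl):
        self.mappings.setdefault(guid, set()).add(surl)

    def list_replicas(self, guid):
        return sorted(self.mappings.get(guid, set()))


class ReplicaLocationIndex:
    """Hypothetical RLI: GUID -> names of LRCs that reported it recently."""

    def __init__(self, ttl_seconds=60.0):
        self.ttl_seconds = ttl_seconds
        self.entries = {}  # guid -> {lrc_name: timestamp of last update}

    def soft_state_update(self, lrc, now=None):
        # An LRC periodically re-announces every GUID it holds.
        now = time.time() if now is None else now
        for guid in lrc.mappings:
            self.entries.setdefault(guid, {})[lrc.name] = now

    def lookup(self, guid, now=None):
        # Entries not refreshed within the TTL are ignored (assumed policy).
        now = time.time() if now is None else now
        reported = self.entries.get(guid, {})
        return sorted(name for name, seen in reported.items()
                      if now - seen <= self.ttl_seconds)


guid = "guid:131f9940-f501-11d8-9669-0800200c9a66"
lrc_a = LocalReplicaCatalog("lrc-a")
lrc_b = LocalReplicaCatalog("lrc-b")
lrc_a.add_mapping(guid, "srm://se1.example.org/data/cal-test")  # made-up SURL
lrc_b.add_mapping(guid, "srm://se2.example.org/data/cal-test")  # made-up SURL

rli = ReplicaLocationIndex(ttl_seconds=60.0)
rli.soft_state_update(lrc_a, now=0.0)
rli.soft_state_update(lrc_b, now=30.0)

print(rli.lookup(guid, now=40.0))  # both LRCs are still fresh
print(rli.lookup(guid, now=70.0))  # lrc-a has not refreshed within the TTL
```

A client would first ask an RLI which LRCs know a GUID, then query those LRCs for the actual SURLs.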
Replica Metadata Catalog
• GUIDs are unfriendly and non-intuitive
– guid:131f9940-f501-11d8-9669-0800200c9a66
• Use user-definable Logical File Names
– lfn:cal-test-data-2004-09-01-005a
• RMC stores LFN:GUID mappings (n:1)
• Can also store ~10 metadata attributes
– e.g. file owner, file size
• Together with RLS gives complete LFN:GUID:SURL view
[Figure: the RMC maps several LFNs to one GUID; the RLS maps that GUID to several SURLs]
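How the two catalogs combine into an LFN:GUID:SURL view can be sketched with plain dictionaries. This is only an illustration: the dictionaries stand in for the RMC and RLS, the function names are hypothetical, and the second LFN and both SURLs are invented.

```python
GUID = "guid:131f9940-f501-11d8-9669-0800200c9a66"

# Stand-in for the RMC: many LFNs may point at one GUID (n:1).
lfn_to_guid = {
    "lfn:cal-test-data-2004-09-01-005a": GUID,
    "lfn:my-calibration-alias": GUID,  # hypothetical second alias
}

# Stand-in for the RLS: one GUID maps to several physical replicas.
guid_to_surls = {
    GUID: [
        "srm://se1.example.org/data/cal-test",  # made-up SURL
        "srm://se2.example.org/data/cal-test",  # made-up SURL
    ],
}


def resolve(lfn):
    """Follow LFN -> GUID (RMC) and then GUID -> SURLs (RLS)."""
    guid = lfn_to_guid[lfn]
    return guid, list(guid_to_surls.get(guid, []))


def aliases_of(guid):
    """All LFNs registered for one GUID."""
    return sorted(lfn for lfn, g in lfn_to_guid.items() if g == guid)


print(resolve("lfn:my-calibration-alias"))
print(aliases_of(GUID))
```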
Replica Optimization Service
• Gives optimised access to replicas by choosing replicas
with quickest access (based on network measurements)
• Automatically replicates files to sites on which they are
needed
• Simulation research (OptorSim) continues to investigate more complex replica management strategies
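Choosing the replica with the quickest access can be sketched as picking the source with the lowest estimated transfer time. The cost model (file size divided by measured bandwidth), the function names and the site names are all assumptions for illustration; the slides do not describe the actual ROS algorithm in this detail.

```python
def estimated_transfer_time(size_mb, bandwidth_mb_per_s):
    """Seconds needed to move the file; unknown or zero bandwidth counts as unreachable."""
    if bandwidth_mb_per_s <= 0:
        return float("inf")
    return size_mb / bandwidth_mb_per_s


def choose_best_replica(replicas, bandwidth_to_dest, size_mb):
    """Return the (site, SURL) pair with the lowest estimated transfer time.

    replicas:          dict of source site -> SURL
    bandwidth_to_dest: dict of source site -> measured MB/s towards the destination
    """
    if not replicas:
        raise ValueError("no replicas to choose from")
    return min(
        replicas.items(),
        key=lambda item: estimated_transfer_time(
            size_mb, bandwidth_to_dest.get(item[0], 0.0)
        ),
    )


replicas = {
    "site-a": "srm://se-a.example.org/data/cal-test",  # made-up SURLs
    "site-b": "srm://se-b.example.org/data/cal-test",
    "site-c": "srm://se-c.example.org/data/cal-test",
}
measurements = {"site-a": 2.0, "site-b": 10.0}  # site-c has no measurement

print(choose_best_replica(replicas, measurements, size_mb=500.0))
```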
Replica Manager
• Client-side tool acts as user interface to services
(although services can also be accessed directly)
• Coordinates service interactions
• Interfaces with external services
– information service (MDS, R-GMA)
– storage services (SRM, EDG-SE)
– file transfer services (GridFTP)
Implementation
• Servers written in Java, clients auto-generated (Java, C++
etc.) from WSDL
• Web services run on Apache Axis inside Java servlet
engine (Tomcat/Oracle AS)
• Use MySQL/Oracle as back-end DB to store persistent
information
• RLS already used in production for LCG (Oracle AS/DB)
– CMS Data Challenge 04 – 2 million entries stored
Service Interactions
“Make a replica of the file specified by LFN to SE2”
1. User Interface -> Replica Manager: replicateFile(LFN, SE2)
2. Replica Manager -> Replica Metadata Catalog: getGuid(LFN)
3. Replica Manager -> Replica Location Service: listReplicas(GUID)
4. Replica Manager -> Replica Optimization Service: listBestFile(SURLs, SE2)
5. Storage Element 1 -> Storage Element 2: copyFile(SE1, SE2)
6. Replica Manager -> Replica Location Service: registerFile(GUID, SURL)
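The six numbered calls in this interaction can be sketched as a small orchestration over in-memory stubs. Everything here is a stand-in under stated assumptions: the Python classes mirror the service names only loosely, the cost table and SURLs are invented, and copy_file merely builds the destination SURL instead of performing a GridFTP transfer.

```python
from urllib.parse import urlparse


class ReplicaMetadataCatalog:
    def __init__(self, lfn_to_guid):
        self.lfn_to_guid = dict(lfn_to_guid)

    def get_guid(self, lfn):
        return self.lfn_to_guid[lfn]


class ReplicaLocationService:
    def __init__(self):
        self.guid_to_surls = {}

    def list_replicas(self, guid):
        return list(self.guid_to_surls.get(guid, []))

    def register_file(self, guid, surl):
        self.guid_to_surls.setdefault(guid, []).append(surl)


class ReplicaOptimizationService:
    def __init__(self, cost):
        self.cost = cost  # (source host, destination host) -> assumed access cost

    def list_best_file(self, surls, dest_se):
        return min(
            surls,
            key=lambda s: self.cost.get((urlparse(s).netloc, dest_se), float("inf")),
        )


def copy_file(source_surl, dest_se):
    # Stand-in for the real transfer: only derives the new SURL.
    return f"srm://{dest_se}{urlparse(source_surl).path}"


class ReplicaManager:
    def __init__(self, rmc, rls, ros):
        self.rmc, self.rls, self.ros = rmc, rls, ros

    def replicate_file(self, lfn, dest_se):               # step 1 (called by the user)
        guid = self.rmc.get_guid(lfn)                     # step 2
        surls = self.rls.list_replicas(guid)              # step 3
        best = self.ros.list_best_file(surls, dest_se)    # step 4
        new_surl = copy_file(best, dest_se)               # step 5
        self.rls.register_file(guid, new_surl)            # step 6
        return new_surl


guid = "guid:131f9940-f501-11d8-9669-0800200c9a66"
lfn = "lfn:cal-test-data-2004-09-01-005a"
rmc = ReplicaMetadataCatalog({lfn: guid})
rls = ReplicaLocationService()
rls.register_file(guid, "srm://se1.example.org/data/cal-test")
rls.register_file(guid, "srm://se3.example.org/data/cal-test")
ros = ReplicaOptimizationService({
    ("se1.example.org", "se2.example.org"): 1.0,
    ("se3.example.org", "se2.example.org"): 5.0,
})

manager = ReplicaManager(rmc, rls, ros)
new_surl = manager.replicate_file(lfn, "se2.example.org")
print(new_surl)
```

After the call, the RLS stub lists three replicas for the GUID, including the newly registered copy.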
RLS performance
• In production use, only single LRC used so far
– Test performance using the Java and C++ APIs to insert and query
GUID:SURL mappings
[Plots: C++ query; Java vs C++ insert]
• Excellent query performance; C++ more stable than Java
RLS performance
Using the Java API and multiple concurrent threads
[Plots: insert of 500,000 mappings; 5 insert and 5 query threads]
• Throughput peaks at ~20 threads; query performance again stable
Security
• Security adds significant overheads!
RLS inserts    Secure client (s)    Insecure client (s)
1              0.77                 0.07
10             7.07                 0.54
100            55.44                3.38
1000           527.12               28.61
• Problem caused by new connection for each transaction
• Could be reduced by using bulk operations
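The size of the overhead can be read off the timings quoted above with a little arithmetic. The sketch below only divides the reported numbers; the function and field names are illustrative.

```python
# (number of inserts, secure client seconds, insecure client seconds), as reported above
timings = [
    (1, 0.77, 0.07),
    (10, 7.07, 0.54),
    (100, 55.44, 3.38),
    (1000, 527.12, 28.61),
]


def overhead_rows(rows):
    """Slowdown factor and per-insert cost for each measurement."""
    result = []
    for inserts, secure_s, insecure_s in rows:
        result.append({
            "inserts": inserts,
            "slowdown": round(secure_s / insecure_s, 1),
            "secure_per_insert_s": round(secure_s / inserts, 3),
            "insecure_per_insert_s": round(insecure_s / inserts, 3),
        })
    return result


for row in overhead_rows(timings):
    print(row)
```

On these figures the secure client is roughly 11 to 18 times slower, and its cost stays above half a second per insert at every batch size, which is consistent with a fixed per-transaction connection cost.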
RMC performance
• Test multiple LFNs per GUID and multiple metadata
attributes
[Plots: C++ query; Java insert]
• Scales well with no. of LFNs per GUID and no. of attributes
RMC Performance
• Command Line Interface: edg-rmc addAlias
Time (s)     Operation
0 - 1.0      Start-up script and JVM start-up
1.0 - 1.1    Parse command and options
1.1 - 2.1    Get RMC service locator
2.1 - 2.3    Get RMC object
2.3 - 3.0    Call to rmc.addAlias() method
3.0          End
• Very slow compared to API calls (two orders of magnitude slower)
• Recommended for testing an installation only
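Where the three seconds go can be worked out from the start and end times listed above. The sketch below is plain arithmetic on those figures; the function name is illustrative.

```python
# (operation, start time in s, end time in s), taken from the timeline above
phases = [
    ("Start-up script and JVM start-up", 0.0, 1.0),
    ("Parse command and options", 1.0, 1.1),
    ("Get RMC service locator", 1.1, 2.1),
    ("Get RMC object", 2.1, 2.3),
    ("Call to rmc.addAlias() method", 2.3, 3.0),
]


def phase_breakdown(rows):
    """Duration of each phase and its share of the total run time."""
    total = rows[-1][2] - rows[0][1]
    return [
        (name, round(end - start, 2), round(100 * (end - start) / total, 1))
        for name, start, end in rows
    ]


for name, seconds, percent in phase_breakdown(phases):
    print(f"{name:35s} {seconds:4.1f} s  {percent:5.1f} %")
```

The addAlias call itself takes 0.7 s of the 3.0 s total; JVM start-up and obtaining the service locator each account for a third of the run.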
The Future of EDG Services
• EGEE - building production quality Grids
• Lessons learned from EDG:
• Less is more: stability and usability most important
• User interface and documentation difficult to get
right first time
• Need easy integration of different providers
• G-Lite - middleware (re)engineering and
integration
– using many concepts/experience from EDG
– but geared towards service-oriented
architecture
EGEE Data Mgt Services
• Replica Manager -> Data Scheduler + Transfer Fetcher +
File Placement Service + File Transfer Service
From EGEE Middleware Architecture
and Planning (Release 1.0) DJRA1.1
EGEE Data Mgt Services
• RLS + RMC -> Combined Catalog Interface to: File Catalog
+ Replica Catalog (+ Metadata Catalog)
From EGEE Middleware Architecture
and Planning (Release 1.0) DJRA1.1
Conclusion
• EDG WP2 has developed a set of integrated replica
management services
• Can cope with demanding Grid conditions
– already used in production environment
• A lot of concepts now being taken forward into EGEE
project