GEONgrid Software Stack Version 1.0


Data Replication Service
Sandeep Chandra
GEON Systems Group
San Diego Supercomputer Center
CYBERINFRASTRUCTURE FOR THE GEOSCIENCES
www.geongrid.org
Outline
• Motivation
• Data Replication Service (DRS)
• Components for DRS
– RLS, GridFTP, RFT
• DRS Deployment
• DRS setup on GEON
• Next Steps
Motivation
• Science domains spend considerable effort
collecting and managing large amounts of
data
• Science domains develop customized data
management services that vary with the type
of application
• Common data management requirements
– Publish and replicate large datasets
– Register data replicas in catalogs and discover them
– Perform metadata-based discovery of datasets
– May require ability to validate correctness of replicas
Motivation (cont.)
• These systems demand considerable
resources to design, implement & maintain
– Typically cannot be re-used by other applications
• Need for a long-term solution
– Generalize functionality provided by these data management
systems
– Provide suite of application-independent services
• Design and build on lower-level grid services
– Globus Reliable File Transfer (RFT) service
– Replica Location Service (RLS)
– GridFTP
A possible solution:
Data Replication Service (DRS)
• Higher-level data management service based
on low-level data management components
like RLS and RFT
• The primary functionality is to
– Allow users to identify a set of desired files
existing in their grid environment
– Make local replicas of those data files by
transferring files from one or more source
locations
– Register the new replicas in a Replica Location
Service
Replica Location Service (RLS)
• A simple registry that keeps track of where
replicas exist on physical storage systems.
• Users or services register files in RLS when
the files are created.
• Query RLS servers to find these replicas.
• RLS can be a distributed registry, consisting
of multiple servers at different sites.
• A distributed RLS increases the overall scale
and stores more mappings than would be
possible in a single, centralized catalog.
RLS (cont.)
• A logical file name is a unique
identifier for the contents of a file.
• A physical file name is the location
of a copy of the file on a storage
system.
• RLS maintains mappings between
logical file names and one or more
physical file names of replicas.
• Users can provide a logical file
name to an RLS server and ask for
all the registered physical file
names of replicas.
• Users can also query an RLS
server to find the logical file name
associated with a particular
physical file location.
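The two lookups described above (logical name to physical names, and physical location back to logical name) can be illustrated with a small in-memory sketch. This is not the Globus RLS client API; the class `ReplicaCatalog`, its method names, and the `gsiftp://…example.org` URLs are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class ReplicaCatalog:
    """Toy in-memory stand-in for an RLS catalog (hypothetical, not the RLS API)."""

    def __init__(self):
        self._lfn_to_pfns = defaultdict(set)  # logical name -> physical names
        self._pfn_to_lfn = {}                 # physical name -> logical name

    def register(self, lfn, pfn):
        """Record that `pfn` is a replica of the file identified by `lfn`."""
        self._lfn_to_pfns[lfn].add(pfn)
        self._pfn_to_lfn[pfn] = lfn

    def replicas(self, lfn):
        """Return all registered physical names for a logical name."""
        return sorted(self._lfn_to_pfns.get(lfn, ()))

    def logical_name(self, pfn):
        """Return the logical name for a physical location, or None."""
        return self._pfn_to_lfn.get(pfn)


# Hypothetical example data: one logical file with replicas at two sites.
catalog = ReplicaCatalog()
catalog.register("XYZ", "gsiftp://site1.example.org/data/XYZ")
catalog.register("XYZ", "gsiftp://site2.example.org/data/XYZ")

print(catalog.replicas("XYZ"))
print(catalog.logical_name("gsiftp://site2.example.org/data/XYZ"))
```

Keeping both directions of the mapping in separate dictionaries makes each query a direct lookup, at the cost of storing every mapping twice.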
[Figure: Logical File Name XYZ, with XYZ replicas 1, 2 and 3 at Sites 1, 2 and 3]
RLS (cont.)
• Two servers: RLI, LRC
• LRC stores mappings between
logical names for data items and
the physical locations of replicas.
• Query the LRC to discover replicas
associated with a logical name.
• RLI server collects information
about the logical name mappings
stored in one or more LRCs.
• RLI returns a list of all the LRCs it is
aware of that contain mappings for
the logical name contained in a
query.
• The client then queries these LRCs
to find the physical locations of
replicas.
[Figure: Replica Location Index (RLI) Nodes above Local Replica Catalogs (LRC)]
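The two-step lookup described above (ask an RLI which LRCs hold a logical name, then query those LRCs for physical locations) might be sketched as follows. This is a simplified model, not the RLS protocol: the class names, the `update_from` method, and the site names "asu" and "sdsc" used in the example data are assumptions made for illustration.

```python
class LocalReplicaCatalog:
    """Toy LRC: maps logical names to physical locations at one site."""

    def __init__(self, site):
        self.site = site
        self.mappings = {}  # logical name -> list of physical locations

    def add(self, lfn, pfn):
        self.mappings.setdefault(lfn, []).append(pfn)

    def lookup(self, lfn):
        return list(self.mappings.get(lfn, []))


class ReplicaLocationIndex:
    """Toy RLI: remembers which LRCs hold mappings for each logical name."""

    def __init__(self):
        self.index = {}  # logical name -> set of LRC site names

    def update_from(self, lrc):
        """Collect the logical names that an LRC currently holds."""
        for lfn in lrc.mappings:
            self.index.setdefault(lfn, set()).add(lrc.site)

    def lrcs_for(self, lfn):
        return sorted(self.index.get(lfn, set()))


def find_replicas(lfn, rli, lrcs_by_site):
    """Step 1: ask the RLI which LRCs know the name. Step 2: query those LRCs."""
    found = []
    for site in rli.lrcs_for(lfn):
        found.extend(lrcs_by_site[site].lookup(lfn))
    return found


# Hypothetical two-site setup.
asu = LocalReplicaCatalog("asu")
sdsc = LocalReplicaCatalog("sdsc")
asu.add("XYZ", "gsiftp://asu.example.org/data/XYZ")
sdsc.add("XYZ", "gsiftp://sdsc.example.org/data/XYZ")
sdsc.add("ABC", "gsiftp://sdsc.example.org/data/ABC")

rli = ReplicaLocationIndex()
for lrc in (asu, sdsc):
    rli.update_from(lrc)

lrcs_by_site = {"asu": asu, "sdsc": sdsc}
print(rli.lrcs_for("ABC"))
print(find_replicas("XYZ", rli, lrcs_by_site))
```

Note that the index in this sketch stores only which catalogs know a name, never the physical locations themselves, which is what keeps the index small relative to the catalogs.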
RLS in Context
• The RLS is one
component in a
layered data
management
architecture
• Consistency
management
provided by higher-level services
[Figure: layered architecture showing Replica Consistency Management Services, Reliable Replication Service, Replica Location Service, Metadata Service, Reliable Data Transfer Service, and GridFTP]
GridFTP
• The GridFTP protocol provides for the secure,
robust, fast and efficient transfer of
(especially bulk) data.
• Globus Toolkit provides the most commonly
used implementation of the protocol, though
others exist.
• The Globus Toolkit provides
– server implementation called globus-gridftp-server
– scriptable command line client called globus-url-copy
– a set of development libraries for custom clients
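Because the command line client is scriptable, a transfer can be driven from a small wrapper. The sketch below only builds the argument list and does not run it. The helper `build_url_copy_command` and the example URLs are hypothetical; the flags shown (`-vb` for performance output, `-p` for parallel streams, `-tcp-bs` for TCP buffer size) are the ones commonly documented for `globus-url-copy`, but they should be checked against the help output of the installed version.

```python
import shlex


def build_url_copy_command(source_url, dest_url, parallel_streams=4,
                           tcp_buffer_bytes=None):
    """Build (but do not run) a globus-url-copy argument list.

    Assumed flags: -vb (show performance), -p (parallel streams),
    -tcp-bs (TCP buffer size in bytes).
    """
    cmd = ["globus-url-copy", "-vb", "-p", str(parallel_streams)]
    if tcp_buffer_bytes is not None:
        cmd += ["-tcp-bs", str(tcp_buffer_bytes)]
    cmd += [source_url, dest_url]
    return cmd


# Hypothetical source and destination.
cmd = build_url_copy_command(
    "gsiftp://source.example.org/data/lidar.dat",
    "file:///tmp/lidar.dat",
    tcp_buffer_bytes=1048576,
)
print(shlex.join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` on a host with the Globus Toolkit client installed and a valid proxy credential.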
Reliable File Transfer (RFT)
• A WSRF compliant web service that
provides “job scheduler” like
functionality for data movement.
• You provide a list of source and
destination URLs (including directories
or files), then the service writes your job
description into a database and moves
the files on your behalf.
RFT (cont.)
• Accepts SOAP description of a desired transfer
• Service methods are provided for querying the
transfer status
• WSRF tools to subscribe for notifications of
state change events
• Supports all the same options as globus-url-copy (buffer size, etc.)
• Increased reliability because state is stored in a
database
• Supports concurrency, multiple files transferred
for better performance
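The reliability point above (transfer state kept in a database, so unfinished work can be retried) can be illustrated with a toy queue. This is not RFT or its SOAP interface: the table layout, the function names, and the `flaky_copy` stand-in for a real transfer are all assumptions, and SQLite is used only because it ships with Python.

```python
import sqlite3


def open_queue(path=":memory:"):
    """Open (or create) the transfer table; a file path would survive restarts."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS transfers ("
        "id INTEGER PRIMARY KEY, source TEXT, dest TEXT, "
        "state TEXT NOT NULL DEFAULT 'pending')"
    )
    db.commit()
    return db


def submit(db, pairs):
    """Record the requested (source, dest) pairs before moving any data."""
    db.executemany("INSERT INTO transfers (source, dest) VALUES (?, ?)", pairs)
    db.commit()


def run_pending(db, copy_file):
    """Attempt every transfer that is not done, persisting each outcome."""
    rows = db.execute(
        "SELECT id, source, dest FROM transfers WHERE state != 'done' ORDER BY id"
    ).fetchall()
    for transfer_id, source, dest in rows:
        try:
            copy_file(source, dest)
            state = "done"
        except OSError:
            state = "failed"
        db.execute("UPDATE transfers SET state = ? WHERE id = ?",
                   (state, transfer_id))
        db.commit()


def status(db):
    """Return a {state: count} summary, similar in spirit to a status query."""
    return dict(db.execute("SELECT state, COUNT(*) FROM transfers GROUP BY state"))


# Hypothetical copy function that fails once for b.dat, then succeeds.
attempts = {}

def flaky_copy(source, dest):
    attempts[source] = attempts.get(source, 0) + 1
    if source.endswith("b.dat") and attempts[source] == 1:
        raise OSError("simulated network failure")


db = open_queue()
submit(db, [
    ("gsiftp://src.example.org/a.dat", "file:///data/a.dat"),
    ("gsiftp://src.example.org/b.dat", "file:///data/b.dat"),
])
run_pending(db, flaky_copy)
first_pass = status(db)
run_pending(db, flaky_copy)
second_pass = status(db)
print(first_pass, second_pass)
```

Because each outcome is committed as soon as it is known, a second call to `run_pending` retries only the transfers that did not finish.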
Globus Services
• WSRF Services
– Data Replication Service
– Delegation Service
– Reliable File Transfer
Service
• Pre WSRF
Components
– Replica Location Service
(Local Replica Catalog,
Replica Location Index)
– GridFTP Server
[Figure: Local Site. A Web Service Container hosts the Data Replication Service (Replicator Resource), Delegation Service (Delegated Credential), and Reliable File Transfer Service (RFT Resource), alongside the Local Replica Catalog, Replica Location Index, and GridFTP Server]
DRS Deployment
• Local storage system
• GridFTP server for file
transfer
• Replica Location
Service:
– LRCs store mappings
from logical names to
storage locations
– RLI collects state
summaries from LRCs
• RFT: WSRF service to
perform data transfer
• DRS: The master
replication service
[Figure: a transfer request is created at the DRS Service; other components shown are the Replica Location Index, Replica Location Catalog, Database, RFT Service, GridFTP Server, and Site Storage System]
[Figure: DRS request flow in numbered steps 1 to 13. A Client with a Request File interacts with the Local Site, where a Web Service Container hosts the Data Replication Service (Replicator Resource), Delegation Service (Delegated Credential), and Reliable File Transfer Service (RFT Resource), together with the Replica Location Index, Local Replica Catalog, and GridFTP Server. Remote Sites 1…N run the same set of services]
DRS Functionality
• Initiate a DRS Request
• Create a delegated credential (Delegate Authority)
• Create a Replicator resource (Replication Service)
• Monitor Replicator resource (Status)
• Discover replicas of files in RLS, select among
replicas
• Start data transfer to local site with RFT service
• Check status
• Register new replicas in RLS catalogs
• Allow client inspection of DRS results
• Destroy Replicator resource
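The data-handling core of the list above (discover replicas, select one, transfer it to the local site, register the new replica, report results) can be sketched in a few lines. This is a simplified model rather than the DRS implementation: credential delegation and the Replicator resource lifecycle are left out, plain dictionaries stand in for the RLS catalogs, and the function `replicate`, the `transfer` callback, and all URLs are hypothetical.

```python
def replicate(lfns, remote_catalog, local_catalog, local_prefix, transfer):
    """Discover, select, transfer, and register replicas for each logical name."""
    results = {}
    for lfn in lfns:
        candidates = remote_catalog.get(lfn, [])        # discover replicas
        if not candidates:
            results[lfn] = "no replica found"
            continue
        source = candidates[0]                          # select (naively: first)
        dest = f"{local_prefix}/{lfn}"
        transfer(source, dest)                          # move data (RFT's role)
        local_catalog.setdefault(lfn, []).append(dest)  # register new replica
        results[lfn] = "replicated"
    return results                                      # for client inspection


# Hypothetical catalogs and a transfer stand-in that only records requests.
remote_catalog = {
    "XYZ": ["gsiftp://asu.example.org/data/XYZ"],
}
local_catalog = {}
transfers = []

results = replicate(
    ["XYZ", "missing-file"],
    remote_catalog,
    local_catalog,
    "gsiftp://sdsc.example.org/replicas",
    lambda source, dest: transfers.append((source, dest)),
)
print(results)
print(local_catalog)
```

Choosing the first candidate is the simplest possible selection policy; a real deployment would more plausibly pick among replicas by network proximity or load.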
GEON DRS Test Setup
[Figure: ASU and SDSC each run a Globus Container with a DRS Service (where a transfer request is created), Replica Location Index, Replica Location Catalog, Database, RFT Service, GridFTP Server, and Site Storage System; Data Transfer runs between the two sites]
Next Tasks
• Transfer LIDAR data from ASU to
an SDSC resource (HPSS, etc.)
• Extend the testbed to include more
nodes.
• Benchmark data movement.
• Package DRS and components with
GEON software stack version 2.0
Acknowledgement
• Ann Chervenak & Robert Schuler (ISI)
• www.globus.org (slides)
Questions?