Slides - Indico
Data Bridge
Solving diverse data access in scientific applications
Zoltán Farkas, Péter Kacsuk, Mark Santcroos, Silvia
Olabarriaga, Ákos Balaskó, Krisztián Karóczkai
Outline
• Problem statement
• Data Bridge as an independent DCI service:
– Data Bridge concept
– Use-cases
– Data Bridge architecture
• WS-PGRADE integration
– Data browsing portlet
• gUSE integration
Problem statement
• Scientific applications:
– Individual jobs or workflows
– Access data from diverse sources
– Science Gateways can hide the details, but…
• Data sources:
– Diverse types: HTTP, FTP, GridFTP, SRM, iRODS, …
– Thus, different APIs are needed to access these
• One possible solution is to use a service that can
be used to access the sources through a unified
interface
Existing solutions
Name | Supported storages | Access possibilities
OGSA-DAI | Web services, XML databases, file services | Web service
Storage Resource Broker | File systems, Relational Databases | Web, APIs, Command line
iRODS | Disk, Tape, Database, Filesystem with Metadata catalog | Web, WebDAV, Java API, Command line
jSAGA | FTP, GridFTP, SRM, LFC | Java API
Globus Online | FTP, GridFTP | Web interface
Data Bridge
• Offers a simple service that provides a
generic interface on top of the storage services
of different DCIs to handle the data stored there
• Depending on the use case, the service offers a
way to browse, upload and download data,
and with multiple server instances
it also enables inter-DCI data transfer
Use cases
• Use case 1: Browse a single DCI data
storage from WS-PGRADE, upload data
• Use case 2: Transfer data files between
different DCIs
• Use case 3: Fetch input data on a DCI
worker node from another DCI
• Use case 4: Cloud storage usage
Use case 1: Storage browsing and data upload
[Diagram: the WS-PGRADE Storage Browsing Portlet uses the Data Bridge (Adaptor Interface, Storage Adaptor) to browse the Storage and upload data to it.]
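Use case 2: Data Transfer – Using multi-level Data Bridge
Client:
• Storage Browsing Portlet
• Custom application
• …
[Diagram: the client calls a first Data Bridge (Adaptor Interface, Storage Adaptor1, Data Bridge Adaptor). Storage Adaptor1 reaches Storage1, and the Data Bridge Adaptor connects to a second Data Bridge (Adaptor Interface, Storage Adaptor2), which reaches Storage2.]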
Use case 3: Fetch data on a DCI's worker node from a "foreign" DCI's storage
• Data bridge usage guidelines:
– First try to fetch the data using native tools
– Only if this fails, use the Data Bridge
[Diagram: on the DCI worker node, a Wrapper runs the Pre-process, Executable and Post-process steps; the Data Bridge (Adaptor Interface, Storage Adaptor) connects the worker node to the Storage.]
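The usage guideline for this use case (native tools first, Data Bridge only on failure) is a plain fallback. A minimal sketch follows; `Fetcher` and `FallbackFetcher` are hypothetical names chosen for illustration and are not part of the Data Bridge API.

```java
import java.io.IOException;

/** Hypothetical abstraction for "one way to fetch a file"; not part of the Data Bridge API. */
interface Fetcher {
    byte[] fetch(String uri) throws IOException;
}

/** Tries the native tool first and uses the Data Bridge only if the native attempt fails. */
final class FallbackFetcher implements Fetcher {
    private final Fetcher nativeTool;
    private final Fetcher dataBridge;

    FallbackFetcher(Fetcher nativeTool, Fetcher dataBridge) {
        this.nativeTool = nativeTool;
        this.dataBridge = dataBridge;
    }

    @Override
    public byte[] fetch(String uri) throws IOException {
        try {
            return nativeTool.fetch(uri);
        } catch (IOException nativeFailure) {
            try {
                return dataBridge.fetch(uri);
            } catch (IOException bridgeFailure) {
                // Keep the native error attached so that both causes stay visible.
                bridgeFailure.addSuppressed(nativeFailure);
                throw bridgeFailure;
            }
        }
    }
}
```

In a wrapper's pre-process step, the first `Fetcher` would wrap the DCI's own client and the second a Data Bridge client.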
Use case 3: Get FTP data from PBS
• Could be other protocols (e.g. SRM) as well
[Diagram: on the PBS worker node, a Wrapper runs the Pre-process, Executable and Post-process steps; the Data Bridge (Adaptor Interface, FTP Adaptor) connects the worker node to the FTP Server.]
Use case 4: Cloud Storage access from WS-PGRADE/gUSE
• Currently, no S3 support in WS-PGRADE
• An S3 Data Bridge adaptor would fix this
[Diagram: a Job on a DCI worker node, submitted through WS-PGRADE/gUSE, reaches Amazon S3 through the Data Bridge.]
Data Bridge Architecture
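[Diagram: the Public Interface (with an HTTP servlet) sits on top of the Adaptor Manager, which holds the Temporary URL queue and a Worker Pool (Thread1 … Threadn). The worker threads pass URIs through the Adaptor Interface to the DCI adaptors (DCI Adaptor1 … DCI Adaptorm); jSAGA is shown next to the adaptors.]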
Data Bridge components
• Interfaces:
– Public Interface
– Adaptor Interface
• Adaptor Manager
• Worker Threads
• DCI Adaptors
Data Bridge components: Interfaces
• Public Interface:
– Provides the public interface for external
components (Portlets, gUSE, …)
– Web Service interface
• Adaptor Interface:
– A Java interface that hides the details of the
different adaptors
Data Bridge Public Interface
• Operations:
– List
– Mkdir
– Delete
– Get
– Put
– Copy
– Move
• Entities:
– URI (either a path, a URL or some specific class)
• Error reports:
– Common exceptions
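To make the operation list concrete, here is a minimal Java sketch of the seven operations with one shared exception type. The names `StorageOperations`, `BridgeException` and `InMemoryStorage` are hypothetical, and so are the signatures; the real Public Interface is a web service whose definition is not shown on these slides. The toy backend keeps everything in memory and lists recursively.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical shared error type standing in for the "common exceptions". */
class BridgeException extends Exception {
    BridgeException(String message) { super(message); }
}

/** Hypothetical Java rendering of the seven operations; signatures are assumptions. */
interface StorageOperations {
    List<String> list(String dir) throws BridgeException;
    void mkdir(String dir) throws BridgeException;
    void delete(String path) throws BridgeException;
    byte[] get(String path) throws BridgeException;
    void put(String path, byte[] data) throws BridgeException;
    void copy(String from, String to) throws BridgeException;
    void move(String from, String to) throws BridgeException;
}

/** Toy in-memory backend, only meant to show how the operations relate to each other. */
final class InMemoryStorage implements StorageOperations {
    private final TreeMap<String, byte[]> files = new TreeMap<>();
    private final TreeSet<String> dirs = new TreeSet<>();

    private static String asDir(String path) {
        return path.endsWith("/") ? path : path + "/";
    }

    /** Returns every directory and file below dir (recursively), directories first. */
    @Override public List<String> list(String dir) {
        String prefix = asDir(dir);
        List<String> result = new ArrayList<>();
        for (String d : dirs) {
            if (d.startsWith(prefix) && !d.equals(prefix)) result.add(d);
        }
        for (String f : files.keySet()) {
            if (f.startsWith(prefix)) result.add(f);
        }
        return result;
    }

    @Override public void mkdir(String dir) { dirs.add(asDir(dir)); }

    @Override public void delete(String path) throws BridgeException {
        if (files.remove(path) == null && !dirs.remove(asDir(path))) {
            throw new BridgeException("not found: " + path);
        }
    }

    @Override public byte[] get(String path) throws BridgeException {
        byte[] data = files.get(path);
        if (data == null) throw new BridgeException("not found: " + path);
        return data.clone();
    }

    @Override public void put(String path, byte[] data) { files.put(path, data.clone()); }

    // Copy and Move are expressed through Get, Put and Delete.
    @Override public void copy(String from, String to) throws BridgeException { put(to, get(from)); }

    @Override public void move(String from, String to) throws BridgeException {
        copy(from, to);
        delete(from);
    }
}
```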
Data Bridge Public Interface: URI
• Represents an element with a given URI (a
directory, a file, metadata attributes, …)
• Also needs to carry security credentials (if needed)
• Attributes:
– Nothing special in the base class
– For gLite, e.g.:
• Path: the full path
• Type: directory or file
• Size: length of the entity (0 for directories)
• Attributes: optional, contains information as returned by the Adaptor Interface's Stat function
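The gLite attributes map naturally onto a small immutable value class. The sketch below uses a hypothetical class name, `StorageEntry`, and an assumed `Map<String, String>` for the optional attributes; only the four fields and the "0 for directories" rule come from the slide.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical value class mirroring the four gLite attributes (Path, Type, Size, Attributes). */
final class StorageEntry {
    enum Type { DIRECTORY, FILE }

    private final String path;
    private final Type type;
    private final long size;
    private final Map<String, String> attributes;

    StorageEntry(String path, Type type, long size, Map<String, String> attributes) {
        if (path == null || path.isEmpty()) {
            throw new IllegalArgumentException("path is required");
        }
        if (type == null) {
            throw new IllegalArgumentException("type is required");
        }
        if (size < 0) {
            throw new IllegalArgumentException("size must not be negative");
        }
        this.path = path;
        this.type = type;
        // Directories always report a size of 0.
        this.size = (type == Type.DIRECTORY) ? 0L : size;
        // Attributes are optional; a missing map becomes an empty one.
        this.attributes = (attributes == null)
                ? Collections.<String, String>emptyMap()
                : Collections.unmodifiableMap(new LinkedHashMap<>(attributes));
    }

    String getPath() { return path; }
    Type getType() { return type; }
    long getSize() { return size; }
    Map<String, String> getAttributes() { return attributes; }
}
```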
Data Bridge Public Interface – Get and Put
• Two-phase up- and download with the temporary URL queue:
• First, the web service interface is invoked to register the transfer request
• Next, a simple HTTP client may use HTTP GET or POST/PUT to down- or upload the data
• This way, web service invocation ("heavyweight" SOAP) is separated from data transfer ("lightweight" HTTP)
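The two phases can be sketched as a small registry of pending transfers. Everything here is an assumption made for illustration: the class name `TemporaryUrlQueue`, the token format, the base URL, and the rule that a temporary URL can be used only once are not taken from the Data Bridge implementation.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory stand-in for the temporary URL queue. */
final class TemporaryUrlQueue {
    enum Direction { DOWNLOAD, UPLOAD }

    /** A registered transfer request waiting for its HTTP phase. */
    static final class Transfer {
        final String storageUri;
        final Direction direction;

        Transfer(String storageUri, Direction direction) {
            this.storageUri = storageUri;
            this.direction = direction;
        }
    }

    private final Map<String, Transfer> pending = new ConcurrentHashMap<>();
    private final String baseUrl;

    TemporaryUrlQueue(String baseUrl) {
        this.baseUrl = baseUrl;
    }

    /** Phase 1: invoked from the web service operation; returns the temporary URL. */
    String register(String storageUri, Direction direction) {
        String token = UUID.randomUUID().toString();
        pending.put(token, new Transfer(storageUri, direction));
        return baseUrl + "/" + token;
    }

    /** Phase 2: invoked by the HTTP servlet; in this sketch a URL is valid exactly once. */
    Transfer redeem(String temporaryUrl) {
        String token = temporaryUrl.substring(temporaryUrl.lastIndexOf('/') + 1);
        Transfer transfer = pending.remove(token);
        if (transfer == null) {
            throw new IllegalArgumentException("unknown or already used temporary URL");
        }
        return transfer;
    }
}
```

A client would call the web service to obtain the URL and then issue a plain HTTP GET (download) or POST/PUT (upload) against it, so bulk data never travels through the SOAP layer.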
Adaptor Manager and Worker threads
• Provided by JAX-WS web service API
• Tasks:
– Manage incoming requests
– Initialize worker threads to perform the
requested operation with the help of the different adaptors
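A minimal sketch of this dispatching, assuming that the adaptor is chosen by URI scheme and that a fixed-size `ExecutorService` serves as the worker pool. `DciAdaptor` and `AdaptorManager` are hypothetical names, and only a List operation is shown.

```java
import java.net.URI;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical, reduced adaptor interface with a single operation. */
interface DciAdaptor {
    List<String> list(URI location) throws Exception;
}

/** Selects an adaptor by URI scheme and runs the operation on a worker thread. */
final class AdaptorManager implements AutoCloseable {
    private final Map<String, DciAdaptor> adaptorsByScheme = new ConcurrentHashMap<>();
    private final ExecutorService workers;

    AdaptorManager(int workerCount) {
        this.workers = Executors.newFixedThreadPool(workerCount);
    }

    void register(String scheme, DciAdaptor adaptor) {
        adaptorsByScheme.put(scheme.toLowerCase(Locale.ROOT), adaptor);
    }

    /** Queues a List request; the caller collects the result from the returned Future. */
    Future<List<String>> submitList(URI location) {
        String scheme = location.getScheme();
        DciAdaptor adaptor = (scheme == null)
                ? null
                : adaptorsByScheme.get(scheme.toLowerCase(Locale.ROOT));
        if (adaptor == null) {
            throw new IllegalArgumentException("no adaptor registered for " + location);
        }
        Callable<List<String>> task = () -> adaptor.list(location);
        return workers.submit(task);
    }

    @Override
    public void close() {
        workers.shutdown();
    }
}
```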
DCI Adaptors
• Implement: Adaptor Interface
• Tasks:
– Perform the operations requested by the Worker Threads,
that is, the operations invoked through the web service
• Types:
– gLite (using jSAGA)
– GridFTP (using jSAGA)
– FTP (using jSAGA)
– …
– Data Bridge: special adaptor to forward requests to other Data Bridges
Data Bridge clients
• Web Service clients:
– Create your own based on the WSDL (or REST)
• Java API:
– Provides a convenient tool to use Data Bridge
Public Interface functions
– Data transfer functions should accept
InputStream and OutputStream objects as their
arguments
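A stream-based client API along these lines might look as follows. The interface name `BridgeClient`, its method signatures and the in-memory implementation are assumptions for illustration only; a real client would stream to and from the Data Bridge over HTTP instead of a map.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical client API whose transfer functions take streams as arguments. */
interface BridgeClient {
    void get(String uri, OutputStream target) throws IOException;
    void put(String uri, InputStream source) throws IOException;
}

/** In-memory stand-in for a remote Data Bridge, used only to exercise the API shape. */
final class InMemoryBridgeClient implements BridgeClient {
    private final Map<String, byte[]> store = new ConcurrentHashMap<>();

    @Override
    public void get(String uri, OutputStream target) throws IOException {
        byte[] data = store.get(uri);
        if (data == null) {
            throw new FileNotFoundException(uri);
        }
        target.write(data);
        target.flush();
    }

    @Override
    public void put(String uri, InputStream source) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int read;
        // Read until end of stream; the caller remains responsible for closing the source.
        while ((read = source.read(chunk)) != -1) {
            buffer.write(chunk, 0, read);
        }
        store.put(uri, buffer.toByteArray());
    }
}
```

Accepting streams lets callers connect files, sockets or in-memory buffers without the API needing to know where the bytes come from.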
WS-PGRADE integration
• A Data Browsing portlet that eases storage
management
WS-PGRADE Workflow
I/O configuration
• During a workflow node's I/O configuration,
the user should be able to select files from
storages
• The provided interface should be the same as
the selected storage's Storage Browsing
portlet (only with one panel)
Current status, future work
• Core Data Bridge (available as a web
service) ready, working with most major
protocols (FTP, GridFTP, SRM)
• User Interface development has
started; a first version will be available as part
of WS-PGRADE/gUSE shortly
Questions?
Thank you for your attention!