Fedora Distributed Data Management (SI1)
Mohamed Rafi
DART – UQ
Outline of Work Package
To enable Fedora to natively handle large datasets.
Explore SRB integration at the storage level of the repository software.
Facilitate distributed data management using Fedora.
Data access integration
Data model integration
Archival management
Metadata compatibility
Deliverables
Working integration.
Timeframe: Sept 2006
Fedora
Current Storage Systems
Usage                                              Data store
Log                                                File
Digital Object – FOXML                             File
DataStream                                         File
Resource Index                                     Kowari Triple Store
XACML Policies                                     File
Performance enhancement data (for Resource Index)  Database
Path Index                                         Database
Temp Dir                                           File
Fedora Storage: Issues
- Currently, Fedora has only a simple file-based implementation for storing content.
- Fedora's definition of distributed data is a distributed repository (not yet implemented).
- Problem: if a single digital object is to be made up of datastreams from multiple data stores, the available options are:
  - Use external references.
  - Ingest all datastreams into one Fedora file system.
  - Maintain multiple repositories (not implemented) and extend Fedora's 'External' method to fetch data from another repository.
DART Architecture
Storage Resource Broker (SRB)
- SRB is a distributed data manager that brokers data from heterogeneous data stores.
- Of particular interest to the DART project are the following features:
  - Application programming interfaces (APIs) exposed by SRB for client applications accessing the SRB server. APIs are currently available in Java and C.
  - Metadata managed by the MCAT catalogue.
  - The global namespace mechanism and the 'collection' view applied to heterogeneous data sources.
  - Authentication schemes, in particular the 'Ticket' abstraction.
  - The logical resource concept, which allows sharing and replication of data among multiple physical resources.
  - Support for large datasets, multiple storage devices and high-speed parallel I/O.
SDSC Storage Resource Broker & Meta-data Catalog
[Figure: SRB architecture diagram. Labels recoverable from the slide: Application; access interfaces (C, C++, Linux I/O; Unix Shell; Java, NT Browsers; Prolog Predicate; Web); SRB; MCAT (Dublin Core; Resource, User; User Defined; Application Meta-data); Archives (HPSS, ADSM, HRM, UniTree, DMF); File Systems (Unix, NT, Mac OSX); Databases (DB2, Oracle, Sybase); Third-party copy; Remote Proxies; DataCutter.]
Distributed Datastreams
- Use the new Fedora-SRB module for accessing SRB. Store bases are limited to one collection.
- Modify the Fedora code to generate different storage paths for different DART-specific MIME types:
  - text/raw, data/curate, image/protein, etc.
- Define an SRB collection as a logical grouping of data stores suitable for storing the different MIME types.
Example
- Digital object: Protein
  - Element 1: Amino acid sequence – text file
  - Element 2: Crystal/X-ray images – large binary file
  - Element 3: 3-D image – special image file, probably copyrighted
  - Element 4: Simulation results – large data file
  - Element 5: Related research publications – links to external sites
- Collection hierarchy
  - Datastream_store (has the following sub-collections)
    - Text – simple file system
    - Images – HPSS
    - Data – some other storage system
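The idea of generating a storage path from a datastream's MIME type can be sketched in a few lines. This is only an illustration in Python, not Fedora or SRB code: the function name `storage_path`, the root path `/Datastream_store`, and the rule of routing on the top-level MIME type are all assumptions, chosen to mirror the example hierarchy above.

```python
from posixpath import join

# Hypothetical root collection, named after "Datastream_store" in the example.
ROOT_COLLECTION = "/Datastream_store"

# Assumed routing rule: top-level MIME type -> sub-collection.
SUBCOLLECTION_BY_TYPE = {
    "text": "Text",     # backed by a simple file system in the example
    "image": "Images",  # backed by HPSS in the example
    "data": "Data",     # backed by some other storage system
}


def storage_path(mime_type: str, datastream_id: str) -> str:
    """Return the collection path under which a datastream would be stored."""
    # "Image/protein" -> "image"; matching is case-insensitive.
    top_level = mime_type.split("/", 1)[0].strip().lower()
    try:
        sub_collection = SUBCOLLECTION_BY_TYPE[top_level]
    except KeyError:
        raise ValueError(
            f"no sub-collection configured for MIME type {mime_type!r}"
        ) from None
    return join(ROOT_COLLECTION, sub_collection, datastream_id)
```

Under these assumptions, the amino acid sequence (`text/raw`) would land in the `Text` sub-collection and the crystal images (`image/protein`) in `Images`. An unknown type raises an error rather than falling back silently to a default store.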
Metadata
Load SRB metadata dynamically into the Fedora object model.
Every time the Fedora FOXML object model is accessed:
- go through each of its datastream objects;
- fetch the corresponding datastream's SRB data path;
- query SRB for the metadata associated with the dataset;
- add the returned list to the object's FOXML (i.e., extpropertyType).
To do this, modify the implementations of the Digital Object Reader interface to import SRB metadata as soon as the digital object stream is created (by deserializing the content model).
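The four steps above amount to a simple loop over an object's datastreams. The Python sketch below shows the shape of that loop only. The classes `Datastream` and `DigitalObject`, the `ext_properties` field, and the `query` callable are hypothetical stand-ins: in the real system the object would be a Fedora digital object and the query would go through the SRB client API, neither of which is modelled here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Datastream:
    ds_id: str
    srb_path: str  # SRB data path recorded for this datastream (assumed field)


@dataclass
class DigitalObject:
    pid: str
    datastreams: List[Datastream] = field(default_factory=list)
    # Stand-in for the extended properties carried in the object's FOXML.
    ext_properties: Dict[str, str] = field(default_factory=dict)


# Hypothetical metadata lookup: SRB path -> {attribute name: value}.
MetadataQuery = Callable[[str], Dict[str, str]]


def import_srb_metadata(obj: DigitalObject, query: MetadataQuery) -> DigitalObject:
    """Copy the metadata returned for each datastream into the object."""
    for ds in obj.datastreams:                 # go through each datastream
        metadata = query(ds.srb_path)          # query by its SRB data path
        for name, value in metadata.items():   # add the returned list
            # Prefix with the datastream id so attributes from different
            # datastreams do not overwrite each other (a design assumption).
            obj.ext_properties[f"{ds.ds_id}.{name}"] = value
    return obj
```

Passing the lookup in as a callable keeps the loop independent of how SRB is actually reached, so it can be exercised with a dictionary-backed stub before any SRB server is available.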
Key Problems/Issues
- DART test data
- Related work packages
- Co-ordination
- Functional overlap
Future Expansion/Work Plans
- Modify Fedora code
- Dynamic datastream paths
- Metadata update
- Test data
- Test and implement the integrated software.