Fedora Distributed Data Management (SI1)
Mohamed Rafi
DART – UQ
Outline of Work Package
To enable Fedora to natively handle large datasets.
Explore SRB integration at the storage level of the repository software.
Facilitate distributed data management using Fedora.
Data access integration
Data model integration
Archival management
Metadata compatibility
Deliverables
Working integration.
Timeframe: Sept 2006
2
Fedora
3
Current Storage Systems
Usage                                               Data store
Log                                                 File
Digital Object – FOXML                              File
DataStream                                          File
Resource Index                                      Kowari Triple Store
XACML Policies                                      File
Performance enhancement data (for Resource Index)   Database
Path Index                                          Database
Temp Dir                                            File
4
Fedora Storage : Issues
Currently there is only a simple file implementation for storing content.
Fedora’s definition of distributed data is a distributed repository (not implemented yet).
Problem: If a single Digital Object is to be made up of Datastreams from multiple datastores, the available options are:
Have external references
Ingest all datastreams into one Fedora file system
Maintain multiple repositories (not implemented) and extend Fedora’s ‘External’ method to fetch data from another repository.
5
DART Architecture
6
Storage Resource Broker (SRB)
SRB is a distributed data manager that brokers data from heterogeneous data stores.
Of particular interest to the DART project are the following features:
Application programming interfaces (APIs) exposed by SRB for client applications accessing the SRB server. Currently APIs are available in Java and C.
Metadata managed by the MCAT catalogue.
The global namespace mechanism and the ‘collection’ view applied to heterogeneous data sources.
Authentication schemes, in particular the ‘Ticket’ abstraction.
The Logical Resource concept, which allows sharing and replication of data among multiple physical resources.
Support for large datasets, multiple storage devices and high-speed parallel I/O.
7
SDSC Storage Resource Broker & Meta-data Catalog
[Figure: SRB architecture diagram. Labels recovered from the slide: Application (C, C++, Linux I/O, Unix Shell, Java, NT Browsers, Prolog Predicate, Web); SRB; MCAT (Dublin Core; Resource, User; User Defined; Application Meta-data); Archives (HPSS, ADSM, HRM, UniTree, DMF); File Systems (Unix, NT, Mac OSX); Databases (DB2, Oracle, Sybase); DataCutter; Third-party copy; Remote Proxies.]
8
Distributed Datastreams
Use the new Fedora-SRB module for accessing SRB. Store bases are limited to one collection.
Modify Fedora code to generate different storage paths for different DART-specific MIME types: text/raw, data/curate, image/protein, etc.
Define an SRB collection as a logical grouping of data stores suitable for storing the different MIME types.
9
Example
Digital Object Protein
Element 1: Amino acid sequence - text file
Element 2: Crystal/X-ray images - large binary file
Element 3: 3-D image - special image file, probably copyrighted
Element 4: Simulation results - large data file
Element 5: Related research publications - links to external sites
Collection hierarchy
Datastream_store
(has the following sub-collections)
Text - Simple file System
Images – HPSS
Data - Some other storage System
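A minimal sketch of how a MIME type could be routed to one of the sub-collections above when generating a storage path. This is an illustration only, not the Fedora-SRB module: the function name storage_path, the mapping from the major MIME type to a sub-collection, and the datastream identifier used below are all assumptions; only the collection names (Datastream_store, Text, Images, Data) and the example MIME types come from the slides.

```python
import posixpath

# Top-level collection and sub-collection names come from the example
# hierarchy; the MIME-type-to-sub-collection mapping is an assumption.
STORE_BASE = "Datastream_store"

SUBCOLLECTION_BY_MAJOR_TYPE = {
    "text": "Text",     # e.g. text/raw      -> simple file system
    "image": "Images",  # e.g. image/protein -> HPSS
    "data": "Data",     # e.g. data/curate   -> some other storage system
}


def storage_path(mime_type: str, datastream_id: str) -> str:
    """Return a collection path for a datastream based on its MIME type.

    Hypothetical helper: it only builds a path string and does not
    contact an SRB server.
    """
    major = mime_type.split("/", 1)[0].strip().lower()
    if major not in SUBCOLLECTION_BY_MAJOR_TYPE:
        raise ValueError(
            f"no sub-collection configured for MIME type {mime_type!r}"
        )
    return posixpath.join(
        STORE_BASE, SUBCOLLECTION_BY_MAJOR_TYPE[major], datastream_id
    )
```

For the Protein object, a call such as storage_path("image/protein", "protein1-DS2") would place the crystal images under Datastream_store/Images, while the amino acid sequence (text/raw) would go under Datastream_store/Text. An unknown major type raises an error rather than silently falling back to a default store.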
10
MetaData
Load SRB metadata dynamically into the Fedora object model.
Every time the Fedora FOXML object model is accessed:
go through each of its DataStream objects
fetch the corresponding DataStream’s SRB data path
query SRB for the metadata associated with the dataset
add the returned list to the object’s FOXML (i.e., extpropertyType).
To do this, modify implementations of the Digital Object Reader interface to import SRB metadata as soon as the digital object stream is created (by deserializing the content model).
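The steps above can be sketched roughly as follows. This is a simplified model, not Fedora or SRB code: the classes Datastream and DigitalObject, the function import_srb_metadata, and the key format "datastream-id.attribute" are hypothetical, and the SRB/MCAT lookup is passed in as a plain callable so that no real SRB API is assumed.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Datastream:
    ds_id: str
    srb_path: str  # SRB data path of the stored dataset


@dataclass
class DigitalObject:
    pid: str
    datastreams: List[Datastream]
    # Stand-in for the extended properties carried in the object's FOXML.
    ext_properties: Dict[str, str] = field(default_factory=dict)


# Hypothetical lookup: given an SRB data path, return its metadata
# as name/value pairs. A real version would query the MCAT catalogue.
MetadataQuery = Callable[[str], Dict[str, str]]


def import_srb_metadata(obj: DigitalObject, query: MetadataQuery) -> DigitalObject:
    """Copy SRB metadata for every datastream into the object's properties."""
    for ds in obj.datastreams:
        for name, value in query(ds.srb_path).items():
            # Prefix with the datastream id so attributes from different
            # datastreams do not overwrite each other.
            obj.ext_properties[f"{ds.ds_id}.{name}"] = value
    return obj
```

In this sketch the function would be called right after an object is deserialized, so every access sees the metadata currently held for its datasets. Passing the lookup as a parameter keeps the example testable with an in-memory dictionary.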
11
Key Problems/Issues
DART test data
Related work packages
Co-ordination
Functional Overlap
12
Future Expansion/Work Plans
Modify Fedora code
Dynamic datastream paths
Metadata update
Test Data
Test and implement the integrated software.
13