Data Management in Geodise

Zhuoan Jiao, Jasmin Wason and Marc Molinari
30-31 January 2003, Edinburgh
Engineering design and optimisation is a computationally intensive process where
data may be generated at different locations with different characteristics. Data is
traditionally stored in flat files with little descriptive metadata provided by the file
system. Our focus is on providing data management by leveraging existing
database tools that are not commonly used in engineering and making them
accessible to users of the system.
The main objectives are to provide:
A data management service
  Store and retrieve data files securely from a repository using GridFTP.
  Add technical and application-specific metadata so that data is easier to search for and locate.
Metadata management services
  Web services provide API access to metadata held in databases.
  Both relational and XML databases are used.
A familiar interface for engineers
  Work with functions and variables rather than the underlying XML, SOAP, SQL, XPath, etc.
© Geodise Project, University of Southampton, 2003.
http://www.geodise.org/
Geodise Database Toolkit
Storage service
Allows applications to archive data sent over GridFTP in file systems curated by Geodise, for the benefits of accessibility by a larger community (via authorisation), storage capacity, and a uniform query interface.
Metadata service
Data can be stored with additional descriptive information detailing technical characteristics (e.g. location, format), ownership, and application domain specific metadata.
Query service
Queries over the metadata database help to locate the needed data intuitively and efficiently.
Authorisation service
Access rights to data can be granted to an authenticated user based on information stored in the authorisation database.
Example (storage service):
Archive data:
>> fileID = gd_archive('C:\input.dat');
Retrieve data:
>> gd_retrieve(fileID, 'E:\tmp')
ans = E:\tmp\input.dat
Example (metadata service):
Define metadata and archive file:
>> m.grids = 1;
>> m.turb_model = 'sa';
>> fileID = gd_archive('C:\input.dat', m);
Example (query service):
>> r = gd_query('standard.userID = me & grids < 2');
To access a value from the first result in the cell array r:
>> r{1}.turb_model
ans = sa
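As a rough illustration of the kind of filtering the condition grids < 2 expresses, the sketch below applies it to a small in-memory cell array of metadata structures. It uses only core MATLAB. The records array and its second entry are made-up sample data, and this is not how gd_query works internally: the real service runs the query against the metadata database.

```matlab
% Hypothetical sample metadata (illustrative only); in Geodise these
% records are held in the metadata database, not in the workspace.
records = { ...
    struct('grids', 1, 'turb_model', 'sa'), ...
    struct('grids', 3, 'turb_model', 'ke')};

% Keep only the records that have a 'grids' field with a value below 2.
isMatch = cellfun(@(m) isfield(m, 'grids') && m.grids < 2, records);
r = records(isMatch);

% Access a value from the first result, in the same way as with the
% cell array returned by gd_query.
disp(r{1}.turb_model)
```

The result is again a cell array of structures, so the r{1}.turb_model access pattern from the example carries over unchanged.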
Example (authorisation service):
>> m.grids = 1;
>> m.access.users = {'userA', 'userB'};
>> m.access.groups = {'groupC'};
>> fileID = gd_archive('C:\input.dat', m);
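To make the role of the access fields more concrete, here is a minimal sketch of how an access decision could be derived from them. The rule used (access is granted if the user is listed directly or belongs to a listed group) is an assumption made for illustration, as are the userID and userGroups values; the source only states that access rights are granted based on information stored in the authorisation database.

```matlab
% Access list as defined in the example above.
m.access.users  = {'userA', 'userB'};
m.access.groups = {'groupC'};

% Hypothetical requesting user and group memberships (illustrative only).
userID     = 'userD';
userGroups = {'groupC', 'groupE'};

% Assumed rule: grant access if the user is listed directly
% or is a member of at least one listed group.
isListedUser  = any(strcmp(userID, m.access.users));
inListedGroup = ~isempty(intersect(userGroups, m.access.groups));
hasAccess     = isListedUser || inListedGroup;

disp(hasAccess)
```

Here userD is not listed by name but is a member of groupC, so the assumed rule grants access through the group entry.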
Application of XML Toolbox for MATLAB
The following functions are responsible for converting a Matlab variable to and from an XML string:
xml_format() - Convert a Matlab variable into an XML string.
xml_parse() - Convert an XML string into a Matlab variable.
Define Matlab variables:
>> meta.grids = 1
>> meta.turb_model = 'sa'
Type-based XML (easy for converting back to Matlab)
>> X = xml_format(meta)
X =
<struct xmlns="http://www.geodise.org/matlab.xsd" idx="0"
fields="grids turb_model">
<double idx="1" name="grids" size="1 1"> 1 </double>
<char idx="1" name="turb_model" size="1 2">sa</char>
</struct>
>> meta = xml_parse(X)
Name-based XML (easy for query)
X =
<file_metadata type="struct" idx="0"
fields="grids turb_model">
<grids type="double" idx="1" size="1 1"> 1 </grids>
<turb_model type="char" idx="1" size="1 2">sa</turb_model>
</file_metadata>
XSLT: type2name (type-based XML to name-based XML)
XSLT: name2type (name-based XML to type-based XML)
Database
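The name-based layout can be illustrated with a few lines of core MATLAB. The sketch below builds a simplified name-based XML string for a flat structure of scalars and character arrays. It is not the XML Toolbox implementation and not the XSLT route shown on the slide: it omits the idx attribute, does not escape XML special characters, and does not handle nested structures. All variable names are illustrative.

```matlab
% Metadata structure from the example above.
meta.grids = 1;
meta.turb_model = 'sa';

names = fieldnames(meta);
rows = cell(numel(names), 1);
for k = 1:numel(names)
    value = meta.(names{k});
    % Character data is used as-is; numeric data is converted to text.
    if ischar(value)
        valueStr = value;
    else
        valueStr = num2str(value);
    end
    % Render the size vector as space-separated integers, e.g. '1 2'.
    sizeStr = strtrim(sprintf('%d ', size(value)));
    % One element per field, named after the field (name-based layout).
    rows{k} = sprintf('<%s type="%s" size="%s">%s</%s>', ...
        names{k}, class(value), sizeStr, valueStr, names{k});
end

% Wrap the field elements in a root element listing the field names.
X = sprintf('<file_metadata type="struct" fields="%s">\n%s\n</file_metadata>', ...
    strjoin(names', ' '), strjoin(rows', sprintf('\n')));
disp(X)
```

Because each element is named after its field, a query such as grids < 2 can address the value by name, which is the reason the slide gives for preferring this form for queries.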
Data Management Implementation
To make the file and metadata management services more usable for engineers, we have implemented
a MATLAB toolkit for archiving, querying and retrieving data to and from a Geodise repository.
Architecture diagram. Labels: Client; Grid; Geodise Database Toolkit; Matlab Functions; Java clients; CoG; Apache SOAP; Browser; Globus Server; GridFTP; SOAP; Location Service; Location Database; Authorisation Service; Authorisation Database; Metadata Archive & Query Services; Metadata Database; Java; .NET; "Refers to".
Query Example
Figure: four numbered screenshots (1 to 4). Annotations: "MATLAB commands to retrieve files." and "Copy and paste".
Future Work
Grid Data Management
  Replace and enhance some of our functionality with that provided by OGSA-DAI for Grid Database Services.
  E.g. a name mapping interface for authenticating Grid credentials to local IDs.
  Automatic collection of data and metadata from a higher-level engineering problem setup GUI.
Manage Matlab Structures
  Some data may take the form of Matlab structures rather than files.
  These can be archived as XML in the repository and then queried.
  The structures can also be retrieved back into the Matlab workspace.
Categorisation of metadata based on XML Schemas
  Group metadata with the same XML Schema in the database.
  Users are not expected to write XML Schemas.
  Generate a simple XML Schema from a metadata structure if one does not already exist to describe it.
  Could help future integration of ontologies.