Data Management in Geodise
Zhuoan Jiao, Jasmin Wason & Marc Molinari
{ z.jiao, j.l.wason, m.molinari } @soton.ac.uk
© Geodise Project, University of Southampton, 2003.
http://www.geodise.org/
Providing Data Management Services for Engineering
Engineering design and optimisation is a computationally
intensive process.
Large quantities of data may be generated at different
locations with different characteristics.
Engineering data is traditionally stored in flat files with
little descriptive metadata provided by the file system.
Our focus is on leveraging existing database tools not
commonly used in engineering …
…and making them accessible to users of the system.
Tools and Services (1)
File storage
Applications can send data over GridFTP to be archived in
file systems. Benefits:
Accessibility by a larger community (via authorisation)
Storage capacity
Additional metadata storage and query facilities
Metadata management service
The data can be stored with additional descriptive
information detailing standard metadata (e.g. file
format, description) and application domain specific
metadata (e.g. grids, flux_order).
An XML database is used because it is flexible enough to
store nested, complex engineering data.
Tools and Services (2)
Query service
Queries can be performed over the metadata
database to help the user locate required data
intuitively and efficiently.
Authorisation service
Access rights to data can be granted to an
authenticated user based on information stored in an
authorisation database.
Location service
Files are referenced with a unique handle.
The location service provides access to a database of
file locations mapped to handles.
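The handle-to-location mapping can be pictured with a minimal sketch. It assumes an in-memory containers.Map standing in for the location database; the gsiftp URL and host name are hypothetical, and the real service is a web service backed by a database, not a local map.

```matlab
% Hypothetical in-memory stand-in for the location database.
locations = containers.Map('KeyType', 'char', 'ValueType', 'char');

% A unique handle of the form used in these slides.
handle = 'input_dat_632d05be-ba26-479b-9607-d1845f3c78ff';

% Register where the file lives (URL is an assumption for illustration).
locations(handle) = 'gsiftp://host.example.org/archive/input.dat';

% Resolve the handle back to a location.
if isKey(locations, handle)
    fileURL = locations(handle);
else
    error('No location registered for handle %s', handle);
end
disp(fileURL)
```

Because users only ever see the handle, a file could be moved by updating the mapping without changing any stored metadata.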
Data Management Implementation for MATLAB
To make the file and metadata management services easier for
engineers to use, we have implemented a MATLAB toolkit for archiving,
querying and retrieving data in a Geodise repository.
[Figure: architecture diagram. Recoverable labels: Client (Geodise
Database Toolkit, MATLAB functions, Java clients, CoG, browser);
Grid (Globus Server, GridFTP, "refers to"); SOAP connections
(Apache SOAP, Java, .NET) to the Metadata Archive & Query Services
with the Metadata Database, the Location Service with the Location
Database, and the Authorisation Service with the Authorisation
Database.]
Geodise Database Toolkit for MATLAB – Archive
gd_archive – Store a file with some metadata.
gd_datagroup – A datagroup is a collection of related files that may
be logically grouped together – this can also have associated
metadata.
Syntax:
groupID = gd_datagroup(<group_name>, [<metadata>])
fileID = gd_archive(<file_name>,[<metadata>],[<groupID>])
Examples:
m.dimension = '2D';
m.component.gamma = 1.4;
groupID = gd_datagroup('2D-LP turbine rotor job9', m)
meta.grids = 1
meta.flux_order = 2
fileID = gd_archive('input.dat', meta, groupID)
fileID = gd_archive('mesh_ns.grid.1.adf', [], groupID)
fileID = gd_archive('airfoil.msh')
XML Toolbox for MATLAB
Marc Molinari – GEM project.
xml_format(): Convert a MATLAB variable into an XML string.
xml_parse(): Convert an XML string into a MATLAB variable.
Example:
>> A.b = 'Hello World';
>> A.c.aa = [1 2; 3 4; 5 6];
>> X = xml_format(A)
X =
<struct idx="0" size="1 1" fields="a b c">
  <char idx="1" name="b" size="1 11">Hello World</char>
  <struct idx="1" name="c" size="1 1" fields="aa ">
    <double idx="1" name="aa" size="3 2"> 1 3 5 2 4 6 </double>
  </struct>
</struct>
>> Y = xml_parse(X);
>> str = Y.b
str =
Hello World
Application of XML Toolbox for MATLAB
Metadata set by user as a MATLAB structure.
MATLAB structure <-> type-based XML
More natural format for MATLAB user.
Element names = variable types (e.g. <struct>, <double>)
Easier for conversion to and from structures.
Type-based XML <-> name-based XML
Element names = variable names (e.g. <grids>, <turb_model>)
Easier for database query.
[Figure: conversion pipeline.
Archiving: MATLAB -> xml_format.m -> type-based XML -> XSLT (type2name) -> name-based XML.
Querying: name-based XML -> XSLT (name2type) -> type-based XML -> xml_parse.m -> MATLAB.]
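To illustrate what "element names = variable names" means, here is a minimal sketch that writes a flat metadata structure directly as name-based XML. It is not the toolkit's method (which goes through type-based XML and an XSLT stylesheet); the `<metadata>` root element and the 'SA' value are assumptions for illustration, and values are not XML-escaped.

```matlab
% Flat metadata structure; the turb_model value is hypothetical.
meta.grids = 1;
meta.flux_order = 2;
meta.turb_model = 'SA';

names = fieldnames(meta);
xml = '<metadata>';                  % assumed root element name
for k = 1:numel(names)
    value = meta.(names{k});
    if isnumeric(value)
        txt = num2str(value);        % numbers become their text form
    else
        txt = char(value);
    end
    % Element name is the field (variable) name itself.
    xml = [xml, sprintf('<%s>%s</%s>', names{k}, txt, names{k})]; %#ok<AGROW>
end
xml = [xml, '</metadata>'];
disp(xml)
```

With the field name as the element name, a path such as grids can be addressed directly in a database query, which is the advantage the slide attributes to the name-based form.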
Geodise Database Toolkit for MATLAB – Query
gd_query
Text-based query expressed over MATLAB variables, for use in MATLAB
scripts.
Converted to XPath to query XML database.
XML Toolbox used to convert results into a list of metadata structures.
Syntax:
Results = gd_query(<query string>, ['file'|'datagroup'])
Example 1: datagroup
Results = gd_query('dimension = 2D', 'datagroup')
Results{1}.standard.files.fileID
ans =
input_dat_632d05be-ba26-479b-9607-d1845f3c78ff
ans =
mesh_ns_cs_adf_ce875805-47b7-4e25-a5f7-9a8adf8f21b6
Example 2: file
r = gd_query('standard.userID = me & grids < 2');
r{1}.grids
ans =
1
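The slides state that the query string is converted to XPath but do not show the result. The following is only a sketch of how such a conversion might look: the `//metadata[...]` shape, the operator set and the quoting rule are all assumptions, not the toolkit's actual translation.

```matlab
query = 'standard.userID = me & grids < 2';

% Split the query into individual conditions on '&'.
terms = strtrim(strsplit(query, '&'));
preds = cell(size(terms));
for k = 1:numel(terms)
    % Parse "name operator value" (assumed grammar).
    tok = regexp(terms{k}, ...
        '^([A-Za-z0-9_.]+)\s*(<=|>=|!=|=|<|>)\s*(.+)$', 'tokens', 'once');
    xp = strrep(tok{1}, '.', '/');       % dotted field -> XPath step
    value = strtrim(tok{3});
    if isnan(str2double(value))
        value = ['''', value, ''''];     % quote non-numeric values
    end
    preds{k} = [xp, tok{2}, value];
end

% Assumed root element name for the metadata documents.
xpath = ['//metadata[', strjoin(preds, ' and '), ']'];
disp(xpath)
```

Nested fields map naturally onto XPath steps, which is one reason name-based XML is convenient for querying.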
Geodise Database Toolkit for MATLAB – Retrieve
gd_retrieve
Retrieve a file from the repository using unique handle.
Asks Authorisation service whether user has permission to retrieve the
file.
Asks Location service where the file is.
File transferred back to local file system using GridFTP.
Syntax
newFileLocation = gd_retrieve(<fileID>, <localPath>)
Examples
gd_retrieve('input_dat_632d05be-ba26-479b-960…', 'E:\tmp')
ans =
E:\tmp\input.dat
gd_retrieve('input_dat_632d05be-ba26-479b-960…', 'E:\tmp\control42.dat')
ans =
E:\tmp\control42.dat
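The two examples suggest that a directory target keeps the original file name, while a full path renames the file. A minimal sketch of that rule follows, assuming the decision is made by checking whether the target is an existing directory; the actual logic inside gd_retrieve is not shown in the slides, and tempdir is used here only so the sketch runs anywhere.

```matlab
originalName = 'input.dat';   % name recorded when the file was archived
target = tempdir;             % stands in for a directory such as E:\tmp

if exist(target, 'dir') == 7
    % Target is a directory: keep the original file name.
    newFileLocation = fullfile(target, originalName);
else
    % Target is a full path: use it as the new file name.
    newFileLocation = target;
end
disp(newFileLocation)
```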
Authorisation
Data Authorisation
Globus certificate subject mapped to user ID.
User sets access rights for the data they archive, so it can be
queried and retrieved by others.
Access rights stored in a relational database, accessed through
Authorisation web service.
Grant users and groups access rights by including their user ID
or group ID in the metadata structure.
Example
m.grids = 1
m.access.users = {'userA', 'userB'}
m.access.groups = {'groupC'}
gd_archive('input.dat', m)
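The access rule implied by this metadata can be sketched as a simple membership test. This is an illustration only: the real check is performed by the Authorisation web service against a relational database, and the `requester` structure and its group memberships are hypothetical.

```matlab
% Access rights as set in the metadata structure above.
m.access.users = {'userA', 'userB'};
m.access.groups = {'groupC'};

% Hypothetical requester, already mapped from a certificate subject.
requester.userID = 'userB';
requester.groups = {'groupD'};

% Allowed if named directly, or if any of the requester's groups is listed.
allowed = ismember(requester.userID, m.access.users) || ...
          ~isempty(intersect(requester.groups, m.access.groups));
disp(allowed)
```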
Future Work
Archive structures as XML
At present the contents of archived files cannot be queried.
Archive MATLAB structures as XML and query them.
OGSA DAI integration
Replace and enhance some of our functionality with that
provided by OGSA DAI.
E.g. name mapping interface for authenticating Grid credentials
to local IDs (system and relational database IDs).
Change database system
Xindice XML database – flexible and good for prototyping, but not
scalable and without security.
Will choose a relational database with XML capabilities, e.g. Oracle,
DB2 or SQL Server.