Integrating Data Management into
Engineering Applications
Zhuoan Jiao, Jasmin Wason, Marc Molinari,
Steven Johnston & Simon Cox
School of Engineering Sciences
University of Southampton, UK
{z.jiao, j.l.wason, m.molinari, s.j.johnston, sjc}@soton.ac.uk
Geodise Project
Grid Enabled Optimisation and Design Search for
Engineering
© Geodise Project, University of Southampton, 2003.
http://www.geodise.org/
Challenges
Large quantities of data generated at different
locations with different characteristics.
Engineering data is traditionally stored in flat files with
little descriptive metadata – hard to search and share.
Our focus is to leverage existing database tools not
commonly used in engineering applications, and to
provide them in an environment familiar to engineers.
Geodise Database Toolkit
Overview
Store data with additional descriptive information.
Standard metadata (file name, size, …)
User-defined application specific metadata
Query over metadata to more easily locate required data.
Retrieve data based on logical data identities.
Provide a familiar interface for engineers – wrap database
services as Matlab functions and metadata as Matlab
structures and variables.
Tools can be used in scripts running locally or on a remote
compute resource.
Architecture
[Diagram: client and Grid components.
Client: Geodise Database Toolkit (Matlab functions, Java clients, GUI), using the CoG kit and Apache SOAP.
Grid: Globus Server reached over GridFTP; Geodise Database Web Services reached over SOAP (Location Service, Authorisation Service, Metadata Archive & Query Services); Metadata Database.]
Services (1)
Storage
Archive data in file systems, sent over GridFTP.
Archive Matlab structures and variables as XML documents
in a database.
Support datagroup concept to aggregate related data.
Location service to map logical data identities to physical
storage locations.
Authorisation service
Access rights to data can be granted to authenticated users
from Matlab.
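The location service idea above can be illustrated with a minimal Python sketch. It is not the Geodise implementation: the class, its method names and the `gsiftp://` example addresses are all hypothetical, and a real service would persist the mapping in a database rather than in memory.

```python
import uuid


class LocationService:
    """Toy in-memory stand-in for a location service (hypothetical API)."""

    def __init__(self):
        self._locations = {}  # logical ID -> physical storage location

    def register(self, physical_location):
        """Record a stored item and hand back a new logical identity."""
        logical_id = uuid.uuid4().hex
        self._locations[logical_id] = physical_location
        return logical_id

    def resolve(self, logical_id):
        """Look up where the data behind a logical identity is stored."""
        return self._locations[logical_id]

    def relocate(self, logical_id, new_location):
        """Move the data without changing the identity that users hold."""
        if logical_id not in self._locations:
            raise KeyError(logical_id)
        self._locations[logical_id] = new_location


service = LocationService()
# The addresses below are made-up examples, not real Geodise hosts.
file_id = service.register("gsiftp://archive.example.org/data/input.dat")
service.relocate(file_id, "gsiftp://mirror.example.org/data/input.dat")
print(service.resolve(file_id))
```

The point of the indirection is that scripts keep working with the same identity even when the physical copy moves.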
Services (2)
Metadata archive service
Descriptive information can be added to data:
Standard technical metadata (e.g. file size, format, date): mostly autogenerated.
Application specific metadata: user defined Matlab structure.
Query service
Query over metadata to efficiently locate required data.
Client side command-line and GUI interfaces.
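As a rough illustration of the two kinds of metadata above, the following Python sketch auto-generates a few technical fields for a file and merges them with user-defined fields. The field names and the choice to group the technical fields under `file` are assumptions made for the example (the query examples in this deck use a `file.` prefix); the actual Geodise record layout is not shown in the slides.

```python
import os
import tempfile
from datetime import datetime, timezone


def build_metadata(path, user_metadata):
    """Combine auto-generated technical metadata with user-defined fields."""
    info = os.stat(path)
    standard = {
        "name": os.path.basename(path),
        "size": info.st_size,  # size in bytes
        "format": os.path.splitext(path)[1].lstrip("."),
        "modified": datetime.fromtimestamp(
            info.st_mtime, tz=timezone.utc
        ).isoformat(),
    }
    # Assumption: technical fields live under "file", user fields at top level.
    return {"file": standard, **user_metadata}


# Create a small throwaway file so the example is self-contained.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "input.dat")
    with open(path, "wb") as handle:
        handle.write(b"12345")
    record = build_metadata(
        path, {"model": "pgb_design", "result": {"bandgap": 20}}
    )

print(record["file"]["name"], record["file"]["size"], record["result"]["bandgap"])
```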
Client Tools (1)
Archive data - gd_archive
Store files/structures into archive with some metadata.
metadata.model = 'pgb_design'
metadata.result.bandgap = 20
fileID = gd_archive('C:\input.dat', metadata)
var.a = [1.4, 5.32, 4.98]
structID = gd_archive(var, metadata)
Group data - gd_datagroup, gd_datagroupadd
Logically group together related data.
groupID = gd_datagroup('my datagroup', group_metadata)
gd_datagroupadd(groupID, fileID)
Client Tools (2)
Query data - gd_query
Query archive from script or GUI.
gd_query('file.userID = me & result.bandgap < 40', 'file.*')
Retrieve data - gd_retrieve
Retrieve archived data to local machine.
gd_retrieve(fileID, 'E:\files\control.dat')
var = gd_retrieve (structID)
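To make the idea of querying over nested metadata concrete, here is a small Python sketch that evaluates conditions such as `result.bandgap < 40` against in-memory records. It does not parse the gd_query string syntax and is not how the query service works internally: the `lookup` and `matches` helpers, the tuple form of the conditions and the sample records are all hypothetical.

```python
import operator

OPERATORS = {"=": operator.eq, "<": operator.lt, ">": operator.gt}


def lookup(record, dotted_path):
    """Follow a dotted path such as 'result.bandgap' through nested dicts."""
    value = record
    for key in dotted_path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None  # path does not exist in this record
        value = value[key]
    return value


def matches(record, conditions):
    """True when every (path, operator, value) condition holds."""
    for path, symbol, expected in conditions:
        actual = lookup(record, path)
        if actual is None or not OPERATORS[symbol](actual, expected):
            return False
    return True


# Made-up records standing in for archived metadata.
archive = [
    {"file": {"userID": "me", "name": "a.dat"}, "result": {"bandgap": 20}},
    {"file": {"userID": "me", "name": "b.dat"}, "result": {"bandgap": 55}},
    {"file": {"userID": "other", "name": "c.dat"}, "result": {"bandgap": 10}},
]
conditions = [("file.userID", "=", "me"), ("result.bandgap", "<", 40)]

# Return only the "file" part of each hit, loosely mirroring 'file.*'.
hits = [record["file"] for record in archive if matches(record, conditions)]
print(hits)
```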
XML Toolbox for Matlab
Matlab Variables/Structures <-> XML
Converts Matlab's proprietary format to an XML description and
back again, transparently and with easy-to-use functions.
Benefit: XML can be transferred, stored, queried and
retrieved across the Grid.
Two functions used in the database toolkit:
xml_format(): converts a MATLAB variable to an XML string.
xml_parse(): converts an XML string to a MATLAB variable.
GEM project
http://www.soton.ac.uk/~gridem/Pages/xmltoolbox.htm
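The round trip between a structure and XML can be sketched in Python with the standard library. This is only an analogy to xml_format() and xml_parse(): the `to_xml` and `from_xml` helpers and the `type` attribute used here are assumptions for the example and do not reproduce the XML Toolbox's actual output format.

```python
import xml.etree.ElementTree as ET


def to_xml(name, value):
    """Build an element from a nested dict of strings and numbers."""
    element = ET.Element(name)
    if isinstance(value, dict):
        for key, child in value.items():
            element.append(to_xml(key, child))
    else:
        # Record the Python type so the value can be restored later.
        element.set("type", type(value).__name__)
        element.text = str(value)
    return element


def from_xml(element):
    """Rebuild the nested dict, using the 'type' attribute to restore values."""
    if len(element):  # element has child elements
        return {child.tag: from_xml(child) for child in element}
    converters = {"int": int, "float": float, "str": str}
    return converters[element.get("type", "str")](element.text or "")


metadata = {"model": "pgb_design", "result": {"bandgap": 20}}
text = ET.tostring(to_xml("metadata", metadata), encoding="unicode")
restored = from_xml(ET.fromstring(text))
print(text)
print(restored == metadata)
```

Once the structure is a plain XML string it can be sent over SOAP, stored in an XML-capable database and queried there, which is the benefit the slide describes.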
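Application of XML Toolbox in Database Toolkit
[Diagram: data flow between Matlab, the file archive and the metadata database.
(A) Generate file: Matlab produces a data file.
(B) Archive: a local file path and a metadata structure are passed in; the data file goes to the file archive, the structure is converted to XML for the metadata database, and a filehandle is returned.
(C) Query: a query string is passed in; matching XML is converted back to structures, each carrying a filehandle.
(D) Retrieve: a filehandle and a local file path are passed in; the data file is returned.]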
Geodise Database Toolkit used in GENIE
Current Work - XML Schema Generation and Evolution
XML Schemas describing metadata can be used for:
Automatically generating graphical query interfaces.
Improving query performance.
Categorisation of user defined metadata.
Schemas automatically generated from XML
Modified tool from the Castor project (http://castor.exolab.org).
Schemas evolve over time
Changes are made to user-defined metadata as a design is developed.
SchemaEvolver tool compares generated schema with those previously stored.
Depending on similarity weighting of closest match the outcome is:
Exact match – Metadata conforms to an existing XML Schema
Similar – Existing XML Schema modified to include differences
No match – Metadata conforms to a new XML Schema which must be stored
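The three outcomes above can be sketched as a small classifier in Python. The slides do not say how the SchemaEvolver computes its similarity weighting, so this sketch assumes a flat schema (element name mapped to type), a Jaccard overlap score and a hypothetical threshold of 0.5.

```python
def classify(new_schema, stored_schemas, threshold=0.5):
    """Return (outcome, index of the closest stored schema or None).

    Schemas are flat dicts mapping element name -> XML Schema type.
    The Jaccard score and the threshold are assumptions for illustration.
    """
    new_items = set(new_schema.items())
    best_index, best_score = None, 0.0
    for index, stored in enumerate(stored_schemas):
        stored_items = set(stored.items())
        union = new_items | stored_items
        score = len(new_items & stored_items) / len(union) if union else 1.0
        if score > best_score:
            best_index, best_score = index, score
    if best_score == 1.0:
        return "exact match", best_index
    if best_score >= threshold:
        return "similar", best_index
    return "no match", None


stored = [{"a": "xs:float", "b": "xs:string"}]
print(classify({"a": "xs:float", "b": "xs:string"}, stored))
print(classify({"a": "xs:float", "b": "xs:string", "c": "xs:string"}, stored))
print(classify({"x": "xs:string"}, stored))
```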
Example – Similar XML Schemas
1. Generate XML from user metadata structure and store in database.
<metadata>
  <a> 5.342 </a>
  <b> 2D </b>
  <c> new info </c>
</metadata>
2. Generate XML Schema from XML.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="metadata">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="a" type="xs:float"/>
        <xs:element name="b" type="xs:string"/>
        <xs:element name="c" type="xs:string"/>
        …
3. Compare with previously stored schemas.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="metadata">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="a" type="xs:float"/>
        <xs:element name="b" type="xs:string"/>
        …
4. If similar schema found, merge the two to create a new, evolved schema.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="metadata">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="a" type="xs:float"/>
        <xs:element name="b" type="xs:string"/>
        <xs:element name="c" type="xs:string" minOccurs="0" />
        …
5. Add this schema to the database and associate the XML with it.
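The merge in step 4 can be sketched in Python as follows. This is an illustration of the idea only, not the SchemaEvolver's algorithm: schemas are reduced to flat dicts, the `evolve` and `render` helpers are hypothetical, and raising an error on conflicting types is an assumption, since the slides do not say how such conflicts are handled.

```python
def evolve(stored, new):
    """Merge two flat schemas; elements missing from either become optional.

    Each schema maps element name -> {"type": ..., "min_occurs": ...}.
    """
    evolved = {}
    for name in list(stored) + [n for n in new if n not in stored]:
        if name in stored and name in new:
            if stored[name]["type"] != new[name]["type"]:
                # Assumption: type conflicts are rejected rather than resolved.
                raise ValueError(f"conflicting types for element {name!r}")
            min_occurs = min(
                stored[name].get("min_occurs", 1),
                new[name].get("min_occurs", 1),
            )
            evolved[name] = {"type": stored[name]["type"], "min_occurs": min_occurs}
        else:
            source = stored if name in stored else new
            evolved[name] = {"type": source[name]["type"], "min_occurs": 0}
    return evolved


def render(evolved):
    """Return the element declarations in XML Schema syntax."""
    lines = []
    for name, spec in evolved.items():
        optional = ' minOccurs="0"' if spec["min_occurs"] == 0 else ""
        lines.append(f'<xs:element name="{name}" type="{spec["type"]}"{optional}/>')
    return "\n".join(lines)


stored_schema = {"a": {"type": "xs:float"}, "b": {"type": "xs:string"}}
new_schema = {
    "a": {"type": "xs:float"},
    "b": {"type": "xs:string"},
    "c": {"type": "xs:string"},
}
evolved_schema = evolve(stored_schema, new_schema)
print(render(evolved_schema))
```

Marking the extra element as optional means that documents written against the older schema still validate against the evolved one.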
Future Work
Metadata and XML Schemas
More work on producing an evolved schema with the SchemaEvolver
GUI tool for user input to assist XML schema modification
Further research into XML Schema versioning
Web query interface
Dynamic generation based on database and XML Schemas.
Database support for Geodise graphical workflow construction
Infrastructure
Improved Web Service security and use of OGSA-DAI.
Jython implementation of client toolkit