XML with Structural Information Description
An Extension to XML Schema
for Structured Data Processing
Presented by: Jacky Ma
Date: 10 April 2002
Presentation Outline
The Problems
Research Objectives
The Schema Extension: MMX
MMX Query System
Discussion
Conclusion
The Problems
Mapping XML data into relational tables
Legacy application-specific structured data
Not natural to XML structure
Efficient, but may not be an effective method
Similar modeling but proprietary implementation
Not interoperable, and difficult to maintain
Lack of modular design and thus difficult to combine
to form more complex data structure
Meta-data can serve a wide range of needs,
while XML Schema is currently used solely for
physical data validation
Research Objectives
To facilitate more effective searching and
storing of XML contents by making use of
meta-data (XML Schema)
Propose a data-oriented model to allow
different storage mechanisms, processing
models, and query models on XML contents
Our Approach – MMX
Use meta-data to map XML data into
structured data objects
Define the structured data models
“conceptually” and link the models to
XML document structure “syntactically”
Meta-data is defined as an extension of
XML Schema
The extension is called MMX (Multi
Model XML)
Program Driven vs. Data Driven
Information for processing
is hard-coded in program
Program Driven
Raw Data
Structured Data (XML)
Data with Modeling Information
MMX!
Data with Program Codes
Data Driven
Processing instruction
is hard-coded in data?!
A Glance at XML Data
A Glance at the Linked Schema
Schema Extension
The extended schema is associated with a namespace
The extended schema goes within a schema element, like
<tree:element> in the example
<tree:element> specifies a single structure object instance
Name association for elements and attributes
Class hierarchies:
<tree:element> -> <tree:internal> -> <tree:leafNode>
finally to the structure specified in <tree:leafNodeValue>
Additional properties in <rootNodeAttr>, <internalNodeAttr>
and <leafNodeAttr>
Schema writer has to know the structure model
specification, while the XML writer only needs to know
the given schema
Modeling
For an instance of “MMX data object”
As an encapsulated information object only
accessible from the root, thus as a “single tree node”
As a mapping from root node, query method and
query parameters to the value at leaf nodes
Leaf nodes may contain any valid XML content, as
long as defined in the Schema
I.e. may contain another “MMX data object”
A query is modeled as a 3-tuple:
[accessing-node, query-method, query-parameters]
Accessing-node is specified by XPath
Query-method is specified as a string value
Query-parameters are multi-dimensional; their form depends on
the current model
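The three-part query above can be written down as a small record type. This is only a sketch of the tuple's shape, not the MMX implementation: the class name, field names, and the XPath string are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class Query:
    """One query: [accessing-node, query-method, query-parameters]."""
    accessing_node: str                  # XPath of the node the query is issued at
    query_method: str                    # method name, given as a string value
    query_parameters: Tuple[Any, ...]    # arity depends on the structure model


# Hypothetical XPath; the slides do not show the document layout.
q = Query("/map/points", "spatial-search", (3, 5))
print(q.query_method, q.query_parameters)
```

Keeping the parameters as an open-ended tuple reflects the point that each structure model decides how many dimensions a query takes.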
Modeling (2)
[Figure: point A, Tree(1), point B, XML elements, Tree(2)]
Tree(1) is accessible from point A. A query
(e.g. [A, “spatial-search”, (3, 5)],
assuming Tree(1) accepts
spatial-search with two coordinates)
may return point B as the answer,
either as the XPath of B or as the
XML subtree of B.
From point B, the user may
drill down the tree by issuing
another query on Tree(2).
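The drill-down from Tree(1) to Tree(2) can be illustrated with a toy nested structure. This is a sketch under stated assumptions: the PointSet class, the point named "C", and its coordinates are hypothetical, and a plain dictionary stands in for whatever structure model backs each tree.

```python
class PointSet:
    """Toy structure model: named 2-D points, each optionally holding a nested object."""

    def __init__(self, points):
        # points maps name -> ((x, y), nested PointSet or None)
        self.points = points

    def query(self, method, params):
        if method != "spatial-search":
            raise ValueError("unsupported query method: " + method)
        # Return the name of the point stored at exactly these coordinates.
        for name, (coords, _child) in self.points.items():
            if coords == tuple(params):
                return name
        return None

    def child(self, name):
        """Return the nested object stored under the named point, if any."""
        return self.points[name][1]


tree2 = PointSet({"C": ((1, 1), None)})       # hypothetical contents of Tree(2)
tree1 = PointSet({"B": ((3, 5), tree2)})      # Tree(1) holds point B

b = tree1.query("spatial-search", (3, 5))              # first query, issued at A
c = tree1.child(b).query("spatial-search", (1, 1))     # second query, on Tree(2)
print(b, c)
```

Here the answer is returned as a name; the slide's two options (XPath of B or the XML subtree of B) would replace that return value in a real system.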
Query with and without MMX
From the original XML data, we could not
assume the semantics of the data:
We can ONLY do XML-based query such as XPath
We can do the spatial query ONLY IF we can map
the data into an R-Tree
After mapping the data into an R-Tree
Spatial Queries
Give me the point at (2,7)
Give me the point nearest to (4,4)
Nearest Neighbor Search
Give me the point nearest to “Franklin”
(0,0)
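The three example queries can be tried on a small in-memory point set. This sketch uses a linear scan rather than an R-Tree, so it shows only what the queries return, not how an R-Tree answers them efficiently. All coordinates and the names other than “Franklin” are made up for illustration.

```python
import math

# Hypothetical sample data; the slide's actual points are not in the transcript.
points = {"Franklin": (2, 7), "Adams": (4, 5), "Lincoln": (8, 1)}


def point_at(coords):
    """Exact-match query: name of the point at coords, or None."""
    for name, xy in points.items():
        if xy == coords:
            return name
    return None


def nearest_to(coords, exclude=None):
    """Nearest-neighbour query by Euclidean distance (linear scan)."""
    candidates = [n for n in points if n != exclude]
    return min(
        candidates,
        key=lambda n: math.hypot(points[n][0] - coords[0], points[n][1] - coords[1]),
    )


def nearest_to_name(name):
    """Nearest neighbour of a named point, excluding the point itself."""
    return nearest_to(points[name], exclude=name)


print(point_at((2, 7)))
print(nearest_to((4, 4)))
print(nearest_to_name("Franklin"))
```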
Processing
Users might not know the “type” of the
node (and do not need to). They are
interested in what they can do
Users retrieve the list of possible
operations by issuing a LIST-OPERATION
method to the root element of an MMX
object
Possible operations may include queries,
updates, and other model-specific
operations
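From the client's side, operation discovery might look like the sketch below. The two classes and every operation name other than LIST-OPERATION are hypothetical; the point is only that the client asks each object what it supports instead of inspecting its type.

```python
class SpatialObject:
    """Hypothetical MMX object backed by a spatial model."""

    def list_operation(self):
        return ["LIST-OPERATION", "QUERY:spatial-search", "UPDATE:insert-point"]


class TextObject:
    """Hypothetical MMX object backed by a text model."""

    def list_operation(self):
        return ["LIST-OPERATION", "QUERY:keyword-search"]


def describe(node):
    """Client code: ask the node what it supports, never its concrete type."""
    return node.list_operation()


for node in (SpatialObject(), TextObject()):
    print(describe(node))
```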
MMX Query System
To show that the schema, modeling, and
processing of the MMX extension are workable
To illustrate how it assists in querying
XML data
To serve as a platform for testing the
implementation of arbitrary structured
models
Implemented with JDK 1.4
System Design
[Figure: system design. Components: XML, Schema, DOM, Node Data, MMX Document, Clients, Abstract MMX Element, and MMX Element classes (R-Tree, X-Tree, VP-Tree, …). Edge labels: Parse, Fetch Classes, Maps, Extends class, (Partly) Defines.]
The Abstract Class defines
the common interface that
has to be implemented in
each MMX Element, such
as LIST-OPERATION,
QUERY, BUILD, etc.
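The role of the abstract class can be sketched as follows. The original system was written in Java; this Python version is only an illustration, and the class names, method signatures, and the "contains" query are assumptions. A plain list of points stands in for a real R-Tree, X-Tree, or VP-Tree.

```python
from abc import ABC, abstractmethod


class AbstractMMXElement(ABC):
    """Common interface every structure model has to implement."""

    @abstractmethod
    def build(self, node_data):
        """Construct the in-memory structure from parsed node data."""

    @abstractmethod
    def query(self, method, params):
        """Answer a query given its method name and parameters."""

    @abstractmethod
    def list_operation(self):
        """Return the names of the operations this element supports."""


class PointListElement(AbstractMMXElement):
    """Toy model: stores points in a list (not a real tree index)."""

    def build(self, node_data):
        self.points = [tuple(p) for p in node_data]

    def query(self, method, params):
        if method == "contains":
            return tuple(params) in self.points
        raise ValueError("unsupported query method: " + method)

    def list_operation(self):
        return ["LIST-OPERATION", "QUERY", "BUILD"]


element = PointListElement()
element.build([(2, 7), (4, 4)])
print(element.list_operation())
print(element.query("contains", (2, 7)))
```

A subclass that leaves out any of the three methods cannot be instantiated, which is one way to enforce the shared interface across models.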
Discussion - Pros
Compatible with the relational approach, and
supersedes it
Modular design promotes reusability and
maintainability
XML “flattens” legacy structured data,
making it text-editable and easy to transport and
process across different systems
Discussion - Cons
There is no generic syntax to precisely
describe all kinds of structure models
An XML file is often larger than
the corresponding legacy data file
Each structure model needs additional
implementation effort
The Schema specification quickly grows
longer as the number of supported models
increases
Conclusion
Propose a representation to encapsulate data
structures
Describe XML data with the Schema
conceptually as well as syntactically
Map legacy structure models into Schema, and
map XML data to the structure models by the
Schema
Structured data repository with increased
interoperability, reusability, and transportability
Q&A