Transcript Streilein
Data Management
for XML:
Research Directions
By: Jennifer Widom
Stanford University
Reviewer: Kristin Streilein
Paper Objectives:
whitepaper
“thoughts on the research
opportunities XML brings to
the general area of data
management”
Not a survey
offers “personal opinions and
thoughts on Data
Management for XML”
“written from a true research
standpoint”
Important Commercial
Perspectives not covered
How will XML be used?
Data exchange format?
Data storage format?
With or without DTDs?
Application interoperation
and data integration will still
cause problems
Proposed mechanisms for
inter-document references
and proposed extensions or
alternatives to DTDs for
richer schema definitions not
covered
Current State of Query
Processing of Web Information
HTML Pages
Need to be preprocessed for
meaningful queries
Simple keyword-based
searches
Traditional DBMS
Simple & rigid forms-based
interfaces
Sample XML Research
Topics:
Ability to map XML-encoded
info into a true data model
Resolve conflicts from mixing
concepts of documents and
databases
Designing XML databases
Theoretical results
Practical techniques
Relationship between XML
DTDs and traditional
database schemas
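As a rough illustration of the first topic above (mapping XML-encoded info into a data model), the sketch below turns a small document into an edge-labeled graph. The graph representation, the function name xml_to_edges, and the sample document are assumptions made for illustration; the paper does not prescribe a particular model.

```python
import xml.etree.ElementTree as ET
from itertools import count


def xml_to_edges(xml_text):
    """Map an XML document to (parent_id, label, child_id) edges plus leaf values.

    Illustrative only: attributes, mixed content, and references are ignored.
    """
    root = ET.fromstring(xml_text)
    ids = count(1)
    edges = []   # (parent_id, label, child_id)
    values = {}  # node_id -> text, for elements with no subelements

    def visit(elem, parent_id):
        node_id = next(ids)
        edges.append((parent_id, elem.tag, node_id))
        children = list(elem)
        if not children:
            values[node_id] = (elem.text or "").strip()
        for child in children:  # document order is preserved
            visit(child, node_id)

    visit(root, 0)  # 0 is a synthetic root object (an assumption of this sketch)
    return edges, values


# Hypothetical sample document
SAMPLE = "<book><title>XML Basics</title><author>A. Author</author></book>"
edges, values = xml_to_edges(SAMPLE)
```

Keeping edges in document order is one possible answer to the document-versus-database conflict listed above: a database model usually treats children as an unordered set, while a document depends on their order.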
Sample XML Research
Topics:
Query language(s)
Database updates in XML
setting
Efficient physical layout and
indexing mechanisms
Query Processing
View mechanisms
How to scale everything to
web proportions
Lore Project at Stanford:
Personal Research Agenda
Storage and Indexing
Clustering schemes
New index types
Compression
DataGuides and DTDs
Build validation into XML database
system
Encode subelement ordering
Performance and functionality
tradeoffs (DataGuides & DTDs)
Combine DataGuides & DTDs
Browse database structure
Allow updates to propagate to database
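To make the DataGuide items above more concrete, here is a minimal sketch of a structural summary in the spirit of a DataGuide: every distinct label path in the data is recorded exactly once. It assumes tree-shaped XML and is not Lore's implementation; the function names build_dataguide and label_paths and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_dataguide(xml_text):
    """Return a nested dict in which every distinct label path appears once."""
    root = ET.fromstring(xml_text)
    guide = {}

    def merge(elem, node):
        # Reuse the entry for this label if a sibling already created it.
        child_node = node.setdefault(elem.tag, {})
        for child in elem:
            merge(child, child_node)

    merge(root, guide)
    return guide


def label_paths(guide, prefix=()):
    """List every label path recorded in the guide, in sorted order."""
    paths = []
    for label in sorted(guide):
        path = prefix + (label,)
        paths.append(".".join(path))
        paths.extend(label_paths(guide[label], path))
    return paths


# Hypothetical sample data: the two books have different subelements.
SAMPLE = """
<library>
  <book><title>A</title><author>X</author></book>
  <book><title>B</title><year>1999</year></book>
</library>
"""
guide = build_dataguide(SAMPLE)
paths = label_paths(guide)
```

Unlike a DTD, a summary of this kind is derived from the data rather than declared up front, and this sketch records neither subelement order nor cardinality. That contrast is one way to read the tradeoff between the two listed above.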
Lore Project at Stanford:
Personal Research Agenda
Databases and Information
Retrieval
Keyword search
Proximity search
Similarity search
Other Database Features
Views
Constraints
Triggers
Change Management
Lore Project at Stanford:
Personal Research Agenda
Mixing Semistructured and
Structured Data
Finding the structure
Exploiting the structure
XML in/on a Traditional
DBMS
Performance Evaluation
Appropriate benchmark for
what XML data should look like
Type of queries & mix of
queries and updates
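One way to picture the "XML in/on a Traditional DBMS" item above is to shred a document into a single relational table of edges and query it with ordinary SQL. The sketch below does this with SQLite. The table name edge, its columns, and the sample document are assumptions made for illustration, not a layout proposed in the paper.

```python
import sqlite3
import xml.etree.ElementTree as ET


def load_xml(conn, xml_text):
    """Shred an XML document into a single 'edge' table (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE edge (id INTEGER PRIMARY KEY, parent INTEGER, "
        "ord INTEGER, label TEXT, value TEXT)"
    )
    next_id = 0

    def insert(elem, parent, ord_):
        nonlocal next_id
        next_id += 1
        node_id = next_id
        text = (elem.text or "").strip() or None
        conn.execute(
            "INSERT INTO edge VALUES (?, ?, ?, ?, ?)",
            (node_id, parent, ord_, elem.tag, text),
        )
        # 'ord' keeps the position of each subelement under its parent.
        for position, child in enumerate(elem):
            insert(child, node_id, position)

    insert(ET.fromstring(xml_text), None, 0)


conn = sqlite3.connect(":memory:")
load_xml(
    conn,
    "<library><book><title>A</title></book><book><title>B</title></book></library>",
)

# Titles of all books, found with a self-join on the edge table.
rows = conn.execute(
    "SELECT t.value FROM edge AS b JOIN edge AS t ON t.parent = b.id "
    "WHERE b.label = 'book' AND t.label = 'title' ORDER BY t.id"
).fetchall()
titles = [value for (value,) in rows]
```

Each extra step in a path costs another self-join in this layout, so the data shapes and the mix of queries and updates listed under Performance Evaluation determine how well it would perform.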