Index structures

Download Report

Transcript Index structures

IDAR 2007
A Generic Framework for Querying and Updating
Secondary XML Index Structures
Katharina Grün
Johannes Kepler University Linz
Department of Business Informatics
Data & Knowledge Engineering
Altenberger Str. 69, 4040 Linz
Austria/Europe
www.dke.jku.at
Research Methodology
Become aware of problem
Suggest solution
Construct solution
Evaluate solution
2
Motivation

Widespread use of XML

XML databases for efficient query and update processing

Require index structures on content and structure of documents

primary index structure




secondary index structures




default index
on whole document
not optimized for specific queries
created on demand
on specific document fragments
adapted to query workload
Framework for querying and updating secondary XML index structures
(SCIENS)
Become aware of problem
3
Running example
projects
Resource
project*
...
Report
@name milestone*
Financial
Report
@id @name resource*
@date
author*
info
description keyword*
Technical
Report
path: projects/project[1]/@name
labelpath: projects/project/@name

//resource[@date>= '2005-01-01']

//project[@name='sciens']/milestone[@id=2]/resource[@date>='2007-01-01']

//element(resource, Report)[author='Smith']
Become aware of problem
4
Challenges

Which secondary index structures are necessary?



How to integrate them into a common framework?



each kind of query is best supported by different index structure
not possible to provide one index structure for each possible query
each secondary index can index arbitrary properties of arbitrary fragments
query and update processing must not depend on specific indices defined
How to update them when documents change?


document updates must be propagated to affected index structures
incremental index maintenance algorithm
Become aware of problem
5
Related work (1)

XML databases


XML index structures




structure and/or content
mostly primary index structure
based on different models, proprietary structures
Object-oriented index structures


limited support for secondary index structures
proprietary structures to support queries on path navigation and/or
inheritance hierarchies
Multidimensional index structures


support several value dimensions
do not consider structure
Become aware of problem
6
Related work (2)

Extensible indexing



Indexing tasks




object-relational databases
adapt index structures to different data types
Maintain secondary indices when documents are updated (KeyX1)
Select optimal index for specific query (XML Access Modules2)
Suggest set of indices for query workload (KeyX1)
currently no integrated approach for processing secondary index
structures in an XML database
1) B.C.Hammerschmidt: KeyX: Selective Key-Oriented Indexing in Native XML Databases. Phd Thesis, University of Lübeck, 2005.
2) Arion, A., Benzaken, V. and Manolescu, I.: XML Acess Modules: Towards Physical Data Independence in XML Databases. Ximep
workshop, 2005.
Become aware of problem
7
SCIENS - Ideas

Structure and Content Indexing with Extensible, Nestable Structures

Which secondary index structures are necessary?



How to integrate them into a common framework?



select a small set of index structures and adapt them to various properties
nest index structures to reflect hierarchical queries
provide an index model
common index interface to query and update indices
How to update them when documents change?


index maintenance algorithm that determines updates for arbitrary indices
based on update fragments and index definitions
Suggest solution
8
Index structures – one dimension (1)

Value indexing



hashtable or B+-tree on value
@date>= '2005-01-01'
Structure indexing

hashtable or B+-tree on path/labelpath/type

//resource

/project[1]//resource

/project[2]/milestone[2]/resource
Construct solution
9
Index structures – one dimension (2)
projects
project 2
milestone 2.1
resource
2.1.1
resource
2.1.2
2.2
2.3
...
milestone 2.2
resource
2.4
milestone 3.1
...
resource
2.4
2.1
project 3
3.3
3.1
3.2
3.3
4.1
2.1.1,
2.1.2
Construct solution
10
Index structures – multiple dimensions (1)
property
example
index structure
(value | structure)+
@date>= '2005-01-01' and
kdb-tree1
author='Smith'
//project[]/milestone[]/resource and
@date>= '2005-01-01'
2-2007
1-2.5
1-3.3
2.4
2006-01-01
3.1
2006-02-05
22007-03
3.3
2004-05-05
1.7
2007-01-02
2.1
2007-05-06
2.8
2007-02-03
2.8
2007-03-01
2.9
2007-03-01
node ids
1) Robinson, J.: The KDB-tree: A search Structure for Large Multidimensional Dynamic Indexes. Sigmod, ACM Press, 1981.
Construct solution
11
Index structures – multiple dimensions (2)
property
example
index structure
((value | structure) ∆
//project[]/milestone[]/resource and
index nesting
(value | structure))+
@date>= '2005-01-01'
//resource and @date>= '2005-0101'
2.4
2.1
2.2
2.3
2007
2005-02-01
2006-03-15
2004-03
3.3
2.4
3.1
3.2
2005-02-01
...
2007-03-21
1.2
3.1
2007
2006-03-15
2007-03-21
3.5
node ids
node ids
Construct solution
12
Comparison
time (ms)
I1
I2 (date,
I3 (date >
I4 (hierarchy >
(date)
hierarchy)
hierarchy)
date)
Q1 (specific milestone)
598
74
199
11
Q2 (specific project)
84
68
57
27
Q3 (all)
14
63
20
207
average
232
68
92
82

queries and indices on milestone hierarchy and date

e.g. //project[1]//resource[@date>2005-01-01]

define index that best matches query workload
Evaluate solution
13
Index framework (1)

index



index entry



maps index keys (value, type, path,…) -> returned nodes
TechnicalReport, Smith -> 3.2.1, 4.3.1,...
index definition




search function consisting of a set of index entries
provides interface to update and retrieve index entries
selects nodes to be indexed
//element(resource, $V1)[author=$V2]
represented as unordered tree pattern with index variables
resource
author
$T1, R
$E2
index structure


specific data structure (hash table, prefix B+-tree, kdb-tree)
one index can use several index structures (index nesting)
Construct solution
14
Index framework (2)

index configuration





provides mapping from index to specific index structure
associates with each index variable the index structure to be used
$T1, $E2: kdb-tree
$E2: hash table, $T1: B+-tree
search configuration




used to access index
associates index key to be searched with each index variable
generated by index selection tool
$T1= Report, $E2= 'Smith'
Construct solution
15
Index maintenance

propagate document updates to affected indices

steps
1.
2.
3.
resource
author
find embeddings of index patterns in update fragments
execute queries
generate index entries
<Technical
resource
Report>
$T1, R
$E2
@date
author
2007-06-10 Smith
info
keyword
<Technical
resource
Report>
@date
2007-06-10 Smith
XML
[(TechnicalReport, 'Smith')  resource]

author author
Tim
info
keyword
XML
[(TechnicalReport, 'Tim')  resource]
up to 9 times faster than existing approach (KeyX)
Construct / evaluate solution
16
Conclusion

select secondary index structures for XML



integrate index structures into framework



hides indexing tasks from query and update processing tasks
provides index model (common index interface)
index maintenance algorithm


extensible: various properties and operations on these properties
nestable: adapt indices to hierarchical queries
propagate updates to index structures
flexibility to define indices that match the query workload
17