Transcript (xBR) tree

Fifteenth East-European Conference on
Advances in Databases and Information Systems
September 19th–23rd, 2011, Vienna, Austria
Performance Comparison
of xBR-trees and R*-trees
for Single Dataset Spatial Queries
George Roumelis
(Master in Information Systems, Open University of Cyprus, Cyprus)
Michael Vassilakopoulos
(*)
(Dept. of Computer Science and Biomedical Informatics, University of Central Greece, Greece)
Antonio Corral
(Department of Languages and Computing, University of Almeria, Spain)
* speaker
Outline
Motivation
Contribution and Background
R-tree and R*-tree
XBR-tree, Internal Nodes and Leaf Nodes
Single Dataset query processing on XBR-trees
Experimental Results: Settings,Tree Building,
Single Dataset Queries
 Conclusions and Future Work






2
Motivation
 In applications, a variety of Spatial Queries arise,
 involving one spatial dataset, like Point Location,
Window, Distance Range , Nearest Neighbor Queries
 involving two spatial datasets, like Distance Join,
Closest Pair, All-Nearest Neighbor Queries
 Usually, there is no overall performance winner
 Many researchers have compared different Access
Methods, regarding their I/O and execution time
performance, for a variety a Spatial Queries
 In this work, we compare the popular R*-tree and the
External Balanced Regular (xBR) tree for Single
Dataset Spatial Queries
3
Contribution






We implement the xBR-tree and present conclusions
arising from the (real data based) experimental
comparison of xBR-trees and R*-trees regarding I/O
Performance and Execution Time for
Tree building
Point Location Queries (PLQs)
Window Queries (WQs)
Distance Range Queries (DRQs)
K-Nearest Neighbor Queries (K-NNQs)
Constrained K-Nearest Neighbor Queries (CK-NNQs)
4
Background (1)
 Given an index I and a query point q, the PLQ
returns true if q belongs to I and false otherwise
 Given an index I and a query rectangle r, the result
of the WQ is the set of all points in I that are
completely inside r
 Given an index I, a query point q and a distance
threshold delta >= 0, the DRQ returns all points of I
that are within the delta distance from q (according
to a distance function)
5
Background (2)
 Given an index I, a query point q, and a value K > 0,
the K-NNQ returns K points of I which are closest to
q (according to a distance function)
 Given an index I, a query point q, a value K > 0 and
a distance threshold delta >= 0, the CK-NNQ
returns K closest points of I which are within delta
distance from q (according to a distance function)
6
R-Tree
Clusters of spatial objects can be recursively grouped into
Minimum Bounding Rectangles – MBRs
R10
R11
R10 R11 R12
R1
R2
R4
R5
R6
R1 R2 R3
R4 R5 R6
R7 R8 R9
R3
R9
R7
R8
R12
Nodes that contain points
Nested MBRs can be organized as a tree (R-tree)
7
R*-Tree
 The R*-tree is the most popular R-tree variation
 It added two major enhancements to the original Rtree, when a node overflow is caused
 First, rather than just considering the area, the
node-splitting algorithm in R*-trees also minimizes
the perimeter and overlap enlargement of the
MBRs
 Second, an overflowed node is not split
immediately, but a portion of entries of the node
is reinserted from the top of the R*-tree (forced
reinsertion)
8
xBR-Tree
 xBR-trees can be defined for various dimensions
 For the ease of exposition, we assume 2
dimensions
 For 2 dimensions the hierarchical decomposition of
space is that of Quadtrees.
 The space indexed by an xBR-tree is a square,
expressed in a coordinate system of real numbers
 The nodes of xBR-trees are disk pages and are
distinguished in two kinds: leaves, which store the
actual multidimensional points and internal nodes,
which provide a multiway indexing mechanism for
these data
9
Internal Nodes
 Each
node consists of a non-predefined number of
entries of the form <shape, address, REG, pointer>
 An
address is formed by directional digits (NW, NE,
SW, SE), determines the region of a child node and is
accompanied by the pointer to this child
 Shape
is a flag that determines if the region of the
child is a complete square (used widely in queries)
 REG stores the coordinates of the region referenced by
address (it is more expensive to calculate it)
 The
region of a child is the subquadrant of its address
minus the subquarants of the next addresses in the
same node
10
An xBR-tree example
 The region of the root is the original space (a quadrangle)
 The * symbol denotes the end of an address
 The address of the right child is 0*, since the region of
this child is the NW quadrant of the original space
 The
address of the left child is *: its region is the whole
space minus the region of the right child
11
Processing PLQs and WQs
 Queries
are processed in a top-down manner on the
xBR-tree, like on the R*-tree
 During
a PLQ for a specified point, the appropriate leaf
is determined by descending the tree from the root
(unlike the R*-tree, a single path is followed)
 During
WQs, we examine if the subquadrant of the
current internal-node entry and the query window
intersect and follow the pointer to the related child
 We repeat until we have examined all entries of the
internal node, or until the query window is completely
inscribed inside the region of the entry that we examine
12
Processing DRQs
 The DRQ follows the same strategy as the WQ
 At first, the querying circle is replaced from its MBR (the
calculations are faster in this way) and if the answer
about the intersection of the subquadrant of the current
entry and the query MBR is positive, then we follow the
pointer to the related child at the next lower level
 If
we reach a leaf with a region that intersects the
query MBR, we select the points in the leaf that are
inside the query circle
13
Processing K-NNQs
 The K-NNQ algorithm follows a DF tree traversal
 In an internal node, entries are visited according to
their mindist from the query point
 The process is repeated recursively until the leaf level is
reached, where a potential next NN is found
 It is possible to reach a leaf, but the next NN may exist
in a neighboring region. Thus, we use a global max Kheap and insert in it every point of this leaf that is
nearest to the query point than the root of the heap
 When the heap is full, according to a set of conditions*,
the search is stopped
*Roumelis et al., Nearest Neighbor Algorithms Using xBR-Trees, PCI 2011
14
Experimental Results
Experimental settings
We used 5 real datasets of different sizes (CSN: 98022
line-segments, NApp: 24493 points, NAcl: 9203 points, Narr:
191637 line-segments, Nard: 569120 line-segments)
To create 2d point datasets from non-point datasets, we
used the centroids of the line-segment MBRs
Environment used: Linux machine, Intel core duo 2x2GHz
processor, 3 GB of RAM, gcc
 Performance measurements: I/O activity (page accesses)
and Execution Time
An extended set of experimental results is accessible from:
http://delab.csd.auth.gr/~michalis/xBRsys/results
15
Tree Construction
 In all cases, the xBR-tree uses less space (i.e. it is more
compact) and time than the R*-tree (the R*-tree
creation is slower, partially, due to the use of forced
reinsertion that improves searching efficiency)
 The difference in creation time is enlarged as the size of
node increases
16
Point Location Query
 The xBR-tree needs less read accesses and executes every query faster
than the R*-tree
 The xBR-tree needs a number of disk accesses equal to its tree height,
while the R*-tree needs at least this number of access and, in most
cases, even more
17
Window Query (1)
 The xBR-tree needs more page accesses
 As the size of node increases the I/O difference between the two trees
becomes smaller
 In both trees, a linear dependence of the number of accesses to the
size of the node appears
 This is due to reduction of tree height as the size of node increases
18
Window Query (2)
 Query windows that were inhabited by points (non-empty windows)
were used
 The xBR-tree becomes clearly faster (execution time) for all sizes of
nodes and the I/O efficiency of the two trees is closer
 Main memory processing is simpler (and thus faster) for the xBR-tree
19
Distance Range Query
 Τhe xBR-tree needs less disk accesses and is faster than the R*-tree, in
all cases and for all datasets
 The results were even better when the DRQs addressed only nonempty regions
20
K- Nearest Neighbor Query
 The xBR-tree needs more disk accesses than the R*-tree, but the
difference gets smaller when the size of node increases
 Regarding the execution time, the xBR-tree shows improved
performance, in relation to its I/O difference from the R*-tree
 The worse time performance of both trees is for larger node sizes
 This is due to the fact that as the node size increases, the trees
become very wide and very short
21
Constrained K-NN Query
 Studying the results of several CK-NNQ experiments, we see that the
xBR-tree is improved for both performance categories
 Depending on the dataset, for non-empty regions the CK-NNQ time
performance of the xBR-tree is almost the same to, or much better
than the R*-tree
22
Conclusions
 We performed an extensive (real data based) experimental
comparison of the performance of xBR-trees and R*-trees,
for spatial queries where a single index is involved
 The conclusions arising from this comparison show that the
two structures are competitive
 The xBR-tree is smaller and is built faster than the R*-tree
 The performance of the xBR-tree is higher for PLQs and
DRQs and for WQs when the query window is non-empty
 The R*-tree is better for K-NNQs and needs less disk
access for CK-NNQs
 The execution time winner for CK-NNQs depends on
whether the query returns result points, or not
23
Future Work
 To extend the xBR-tree for modeling empty regions too
 To use in the xBR-tree internal-node address
representation that facilitates main memory computation
 To study the relative performance of the two structures
using memory consuming BF algorithms
 To study the relative performance of the two trees for two
dataset (join) queries
 To study the relative performance of the two structures in
the presence of buffering
24
Thank you for your attention
[email protected]
http://users.ucg.gr/~mvasilako