Transcript (xBR) tree
Fifteenth East-European Conference on
Advances in Databases and Information Systems
September 19th–23rd, 2011, Vienna, Austria
Performance Comparison
of xBR-trees and R*-trees
for Single Dataset Spatial Queries
George Roumelis
(Master in Information Systems, Open University of Cyprus, Cyprus)
Michael Vassilakopoulos
(*)
(Dept. of Computer Science and Biomedical Informatics, University of Central Greece, Greece)
Antonio Corral
(Department of Languages and Computing, University of Almeria, Spain)
* speaker
Outline
Motivation
Contribution and Background
R-tree and R*-tree
XBR-tree, Internal Nodes and Leaf Nodes
Single Dataset query processing on XBR-trees
Experimental Results: Settings,Tree Building,
Single Dataset Queries
Conclusions and Future Work
2
Motivation
In applications, a variety of Spatial Queries arise,
involving one spatial dataset, like Point Location,
Window, Distance Range , Nearest Neighbor Queries
involving two spatial datasets, like Distance Join,
Closest Pair, All-Nearest Neighbor Queries
Usually, there is no overall performance winner
Many researchers have compared different Access
Methods, regarding their I/O and execution time
performance, for a variety a Spatial Queries
In this work, we compare the popular R*-tree and the
External Balanced Regular (xBR) tree for Single
Dataset Spatial Queries
3
Contribution
We implement the xBR-tree and present conclusions
arising from the (real data based) experimental
comparison of xBR-trees and R*-trees regarding I/O
Performance and Execution Time for
Tree building
Point Location Queries (PLQs)
Window Queries (WQs)
Distance Range Queries (DRQs)
K-Nearest Neighbor Queries (K-NNQs)
Constrained K-Nearest Neighbor Queries (CK-NNQs)
4
Background (1)
Given an index I and a query point q, the PLQ
returns true if q belongs to I and false otherwise
Given an index I and a query rectangle r, the result
of the WQ is the set of all points in I that are
completely inside r
Given an index I, a query point q and a distance
threshold delta >= 0, the DRQ returns all points of I
that are within the delta distance from q (according
to a distance function)
5
Background (2)
Given an index I, a query point q, and a value K > 0,
the K-NNQ returns K points of I which are closest to
q (according to a distance function)
Given an index I, a query point q, a value K > 0 and
a distance threshold delta >= 0, the CK-NNQ
returns K closest points of I which are within delta
distance from q (according to a distance function)
6
R-Tree
Clusters of spatial objects can be recursively grouped into
Minimum Bounding Rectangles – MBRs
R10
R11
R10 R11 R12
R1
R2
R4
R5
R6
R1 R2 R3
R4 R5 R6
R7 R8 R9
R3
R9
R7
R8
R12
Nodes that contain points
Nested MBRs can be organized as a tree (R-tree)
7
R*-Tree
The R*-tree is the most popular R-tree variation
It added two major enhancements to the original Rtree, when a node overflow is caused
First, rather than just considering the area, the
node-splitting algorithm in R*-trees also minimizes
the perimeter and overlap enlargement of the
MBRs
Second, an overflowed node is not split
immediately, but a portion of entries of the node
is reinserted from the top of the R*-tree (forced
reinsertion)
8
xBR-Tree
xBR-trees can be defined for various dimensions
For the ease of exposition, we assume 2
dimensions
For 2 dimensions the hierarchical decomposition of
space is that of Quadtrees.
The space indexed by an xBR-tree is a square,
expressed in a coordinate system of real numbers
The nodes of xBR-trees are disk pages and are
distinguished in two kinds: leaves, which store the
actual multidimensional points and internal nodes,
which provide a multiway indexing mechanism for
these data
9
Internal Nodes
Each
node consists of a non-predefined number of
entries of the form <shape, address, REG, pointer>
An
address is formed by directional digits (NW, NE,
SW, SE), determines the region of a child node and is
accompanied by the pointer to this child
Shape
is a flag that determines if the region of the
child is a complete square (used widely in queries)
REG stores the coordinates of the region referenced by
address (it is more expensive to calculate it)
The
region of a child is the subquadrant of its address
minus the subquarants of the next addresses in the
same node
10
An xBR-tree example
The region of the root is the original space (a quadrangle)
The * symbol denotes the end of an address
The address of the right child is 0*, since the region of
this child is the NW quadrant of the original space
The
address of the left child is *: its region is the whole
space minus the region of the right child
11
Processing PLQs and WQs
Queries
are processed in a top-down manner on the
xBR-tree, like on the R*-tree
During
a PLQ for a specified point, the appropriate leaf
is determined by descending the tree from the root
(unlike the R*-tree, a single path is followed)
During
WQs, we examine if the subquadrant of the
current internal-node entry and the query window
intersect and follow the pointer to the related child
We repeat until we have examined all entries of the
internal node, or until the query window is completely
inscribed inside the region of the entry that we examine
12
Processing DRQs
The DRQ follows the same strategy as the WQ
At first, the querying circle is replaced from its MBR (the
calculations are faster in this way) and if the answer
about the intersection of the subquadrant of the current
entry and the query MBR is positive, then we follow the
pointer to the related child at the next lower level
If
we reach a leaf with a region that intersects the
query MBR, we select the points in the leaf that are
inside the query circle
13
Processing K-NNQs
The K-NNQ algorithm follows a DF tree traversal
In an internal node, entries are visited according to
their mindist from the query point
The process is repeated recursively until the leaf level is
reached, where a potential next NN is found
It is possible to reach a leaf, but the next NN may exist
in a neighboring region. Thus, we use a global max Kheap and insert in it every point of this leaf that is
nearest to the query point than the root of the heap
When the heap is full, according to a set of conditions*,
the search is stopped
*Roumelis et al., Nearest Neighbor Algorithms Using xBR-Trees, PCI 2011
14
Experimental Results
Experimental settings
We used 5 real datasets of different sizes (CSN: 98022
line-segments, NApp: 24493 points, NAcl: 9203 points, Narr:
191637 line-segments, Nard: 569120 line-segments)
To create 2d point datasets from non-point datasets, we
used the centroids of the line-segment MBRs
Environment used: Linux machine, Intel core duo 2x2GHz
processor, 3 GB of RAM, gcc
Performance measurements: I/O activity (page accesses)
and Execution Time
An extended set of experimental results is accessible from:
http://delab.csd.auth.gr/~michalis/xBRsys/results
15
Tree Construction
In all cases, the xBR-tree uses less space (i.e. it is more
compact) and time than the R*-tree (the R*-tree
creation is slower, partially, due to the use of forced
reinsertion that improves searching efficiency)
The difference in creation time is enlarged as the size of
node increases
16
Point Location Query
The xBR-tree needs less read accesses and executes every query faster
than the R*-tree
The xBR-tree needs a number of disk accesses equal to its tree height,
while the R*-tree needs at least this number of access and, in most
cases, even more
17
Window Query (1)
The xBR-tree needs more page accesses
As the size of node increases the I/O difference between the two trees
becomes smaller
In both trees, a linear dependence of the number of accesses to the
size of the node appears
This is due to reduction of tree height as the size of node increases
18
Window Query (2)
Query windows that were inhabited by points (non-empty windows)
were used
The xBR-tree becomes clearly faster (execution time) for all sizes of
nodes and the I/O efficiency of the two trees is closer
Main memory processing is simpler (and thus faster) for the xBR-tree
19
Distance Range Query
Τhe xBR-tree needs less disk accesses and is faster than the R*-tree, in
all cases and for all datasets
The results were even better when the DRQs addressed only nonempty regions
20
K- Nearest Neighbor Query
The xBR-tree needs more disk accesses than the R*-tree, but the
difference gets smaller when the size of node increases
Regarding the execution time, the xBR-tree shows improved
performance, in relation to its I/O difference from the R*-tree
The worse time performance of both trees is for larger node sizes
This is due to the fact that as the node size increases, the trees
become very wide and very short
21
Constrained K-NN Query
Studying the results of several CK-NNQ experiments, we see that the
xBR-tree is improved for both performance categories
Depending on the dataset, for non-empty regions the CK-NNQ time
performance of the xBR-tree is almost the same to, or much better
than the R*-tree
22
Conclusions
We performed an extensive (real data based) experimental
comparison of the performance of xBR-trees and R*-trees,
for spatial queries where a single index is involved
The conclusions arising from this comparison show that the
two structures are competitive
The xBR-tree is smaller and is built faster than the R*-tree
The performance of the xBR-tree is higher for PLQs and
DRQs and for WQs when the query window is non-empty
The R*-tree is better for K-NNQs and needs less disk
access for CK-NNQs
The execution time winner for CK-NNQs depends on
whether the query returns result points, or not
23
Future Work
To extend the xBR-tree for modeling empty regions too
To use in the xBR-tree internal-node address
representation that facilitates main memory computation
To study the relative performance of the two structures
using memory consuming BF algorithms
To study the relative performance of the two trees for two
dataset (join) queries
To study the relative performance of the two structures in
the presence of buffering
24
Thank you for your attention
[email protected]
http://users.ucg.gr/~mvasilako