03_dmkd_OLAP

Download Report

Transcript 03_dmkd_OLAP

Efficient OLAP Operations
for Spatial Data Using P-Trees
Baoying Wang, Fei Pan, Dongmei Ren, Yue Cui, Qiang Ding
William Perrizo
North Dakota State University
OUTLINE
 Introduction
 Review Of Peano Trees (P-trees)
 OLAP Operations Using P-trees
 Peano Data Cubes (PD-Cubes)
 OLAP Operations
 Performance Analysis
 Conclusion
INTRODUCTION
Efficient OLAP for spatial data warehouses
is in great demand
Spatial warehouses is growing with more
and more spatial data, such as remotely
sensed images, geographical information,
digital sky survey data
The data in a warehouse are conceptually
modeled as data cubes (Gray et al, 1997)
INTRODUCTION (Cont.)
OLAP queries are complex and time
consuming
Two major approaches to speed up OLAP
Using index structures
Operating on compressed data.
Bitmap index are space inefficient for high
cardinality attributes, and are only suitable
for narrow domains.
Our Approach on OLAP
A new data warehousing structure, PDcube, is developed to facilitate OLAP
operations and queries
Fast logical operations of P-Trees are
used to accomplish OLAP operations.
Predicate P-trees are used to efficiently
reduce data accesses by filtering out “big
holes” consisting of consecutive 0’s
REVIEW OF PEANO TREES (Ptrees)
 The Ptree is a quadrant-based tree structure
(assuming a 2-dimensional image; more generally, for ndimensional data, an n-polytant tree)
 It is used to facilitate compression and very fast
logical operations on bit sequential (bSQ) data
(Perrizo, 2001)
Ptrees can be 1-dimensional, 2-dimensional, 3-dimensional…
 The most useful form of a Ptree is the predicatePtree: e.g., Pure1 Ptree (P1tree) and NonPure0
Ptree (NP0-tree)
bSQ File and a Pure1 tree (P1-tree)
P1-tree: Tree node=1 iff that sub-quadrant is
purely 1-bits.
1111110011111000111111001111111011110000111100001111000001110000
11
11
11
11
11
11
11
01
11
11
11
11
11
11
11
11
11
10
11
11
00
00
00
00
00
00
00
10
00
00
00
00
0
1
0
0
0 0 1 0
1 1 0 1
1 1 1 0 0 0 1 0 1 1 0 1
0
An Count Ptree
(NOTE: usually counts are the ultimate goal, but Pure1 trees are easier to work with and
produce the needed counts quite quickly)
001
111
11
11
11
11
11
11
11
01
11
11
11
11
11
11
11
11
11
10
11
11
11
11
11
11
00
00
00
10
11
11
11
11
55
0
16
2
3
15
2
3 0 4 1
4 4 3 4
3
1 1 1 0 0 0 1 0 1 1 0 1
 Peano or Z-ordering
 Pure (Pure-1/Pure-0) quadrant
 Root Count
( 7, 1 )
1
( 111, 001 )
8
16
2.2.3
 Level
 Fan-out
 QID (Quadrant ID)
10.10.11
BSQ File and a NP0-tree
NP0-tree: Node=1 iff that sub-quadrant is not
pure zero. (more general; <predicate>-Ptree:
node=1 iff sub-quad satisfies <predicate>
11
11
11
11
11
11
11
01
11
11
11
11
11
11
11
11
11
10
11
11
00
00
00
00
00
00
00
10
00
00
00
00
1
1
1
1
1 0 1 1
1 1 1 1
1 1 1 0 0 0 1 0 1 1 0 1
0
Logical Operations of P-trees
 Operations are level by level
 Consecutive 0’s holes can be filtered out
 We only need to load quadrant with Qid 2 for
ANDing NP0-tree1 and NP0-tree2.
OLAP OPERATIONS USING P-TREES
1. Peano Data Cube (PD-cube)
2. OLAP Operations
1) Slice/Dice
2) Rollup
3. Performance Analysis
Peano Data Cube (PD-cube)
The data cube is partitioned by bit position

 Each bit-wised data cube is in Peano order
Take advantage of the continuity and sparseness of
spatial data
An example: a 3-D data cube representing
the crop yield with three dimensions: Xcoordinate, Y-coordinate, and time T.
A Fact Table and the PD-cubes
X
Y
T
Yield
0
0
0
15 (1111)
1
0
0
15 (1111)
0
1
0
15 (1111)
1
1
0
15 (1111)
0
0
1
15 (1111)
1
0
1
15 (1111)
0
1
1
15 (1111)
1
1
1
15 (1111)
2
0
0
15 (1111)
3
0
0
4 (0100)
2
1
0
1 (0001)
3
1
0
12 (1100)
2
0
1
12 (1100)
3
0
1
2 (0010)
2
1
1
12 (1100)
3
1
1
12 (1100)
0
2
0
15 (1111)
1
2
0
15 (1111)
0
3
0
2 (0010)
1
3
0
0 (0000)
0
2
1
15 (1111)
1
2
1
15 (1111)
0
3
1
2 (0010)
OLAP Query Examples
 “Find all galaxies brighter than magnitude 22.”
“Find average crop yield in a field. ”
“Find area of the region with the color red.”
“Find total traffic flow during a given period.”
Slice/Dice Operations
Typical select statements may have a
number of predicates in their “where” clause.
The predicates may include “=”, “<” and “>”.

These predicates lead to two different query
scenarios: equal queries (“=”) and range
queries (“<” or “>”).
Equal Select Slice Example
 Suppose we have a 3-D data cube representing crop
yield with dimensions X, Y and T, where X = {0, 1}, Y
= {0, 1} and T = {0, 1}.
1
Y
T
Yield
0
0
0
111
1
0
0
011
0
1
0
001
1
1
0
100
0
0
1
011
1
0
1
010
0
1
1
100
1
1
1
001
2
3
7
X
3
1
4
Ptrees for 3-D Cube Example
Pij is a Ptree for the jth bit of the ith attribute.
Slice: “Get yield where Y = 1”
 First get Ptree masks, and then trim all Ptrees
accordingly.
2
3
7
1
3
1
4
Range Slice: Get yield where Y >1001
 Data set {“Y > 1001”} consists of two subsets {“Y
= 11**”} and {“Y = 101*”}, where * is 1 or 0.
 The query clause can be written as “where Y =
11** || Y = 101*”.
 The query is retrieved by Ptree mask
PMgt = PM1 || PM2.
PM1 = P21 & P22
PM2 = P21 & P’22 & P27
1111
1110
1101
1100
1011
1010
------1001
11**
P21 & P22
101*
P21 & P’22 & P23
Other Properties of Range Queries
Combination of an Equality Query and a Range Query

Divide {“T01011001”} into 2 subsets {“T>01011001”}, {“T=01011001”}
Complement of a Range Query

Data set {“T01011001”} is the complement of {“T>01011001”}

With the result of query “Get yield where T>01011001, we can easily
retrieve query “Get yield where T01011001” by making the
complement, i.e. PMle = PM’gt.
Rollup Operations
PD-cube is stored in Peano order rather
than in raster order. Therefore, the rollup
of PD-cube is accomplished differently
from the traditional data cube as a result of
different storage models.
According to the Peano storage of PDcube, we develop the recursive rollup
algorithm.
Rollup of “Yield” along Dimension T
011
010
111
011
001
100
0
0
0
1
0
0
1
1
1
1
0
1
0
001
0
1
0
1
1
1
1
0
0
1
0
0
1
1
1
0
0
1
0
1
1
3
1
1
2
3
7
1
4
0
1
0
0
1
0
4
1
1
1
0
1
1
0
S2[ ] = {1, 1, 1, 0}
1
1
S1 [ ] = {0, 1, 1, 1}
1
2
7
1
1
S0 [ ] = {2, 1, 1, 1}
8
7
7
S [ ] = {8, 7, 7, 7}
S[i] = S2[i] x 22 + S1[i] x 21 + S0[i] x 20
PERFORMANCE ANALYSIS:
 Compare our algorithm with
bitmap indexed data cube
method
 The data is prepared in five
sizes, 128x128, 256x256,
512x512, 1024x1024, and
2048x2048.
 When cube size > 1900KB, our method
outperforms bitmap indexed data cube method.
 As the cube size increases, there is a drastic
increase in response time for bitmap indexed data
cube method.
Response Tim e Com parison
Response Time (ms)
Bitmap
PD-cube
100000
10000
1000
100
100
1000
10000
100000 1000000
Cube Size (KB)
CONCLUSION
 A general spatial data warehousing structure, PDcube, is presented to facilitate OLAP operations.
 The fast logical operations of Ptrees are used to
accomplish these operations.
 Predicate Ptrees are used to find the “big holes” of
consecutive 0’s by performing logical operations.
 Experiments indicate OLAP operations using Ptrees
is much faster than traditional data cube methods.
FUTURE WORK
One future research direction is to extend
our PD-cube into parallel data warehouse
systems.
It appears to be particularly promising to partition
large cubes horizontally or vertically (or both) into
small cubes to improve the query performance
through parallelism.
Thank you.