Approximate Spatial Query Processing Using Raster Signatures

Download Report

Transcript Approximate Spatial Query Processing Using Raster Signatures

Federal
University of
Rio de Janeiro
Approximate Spatial Query
Processing Using
Raster Signatures
Leonardo Guerreiro Azevedo, Rodrigo Salvador Monteiro,
Geraldo Zimbrão & Jano Moreira de Souza
Coppe – Graduate School of Engineering
Institute of Mathematics – Computer Science Department
Common Spatial Queries
 Area of polygon
 Area of polygon within window
 Spatial Joins
 polygon  polygon, polygon  polyline & polyline 
polyline
 Distance
 Buffer
 Perimeter
 Topological queries
Approximate Spatial Query Processing Using Raster Signatures
2
Common Spatial Queries
 Approximate Area of polygon
 Approximate Area of polygon within window
 Approximate Spatial Joins
 polygon  polygon, polygon  polyline & polyline 
polyline
 Approximate Distance
 Approximate Buffer
 Approximate Perimeter
 Approximate Topological queries
Approximate Spatial Query Processing Using Raster Signatures
3
Approximate Answers to Spatial
Queries
What is an approximate
answer?
If the “exact” result is a number,
the approximate result will be a
number and a confidence
interval
If not, the graphical display of
approximate answers is
something like a “fuzzy” map
Approximate Spatial Query Processing Using Raster Signatures
4
Motivation
 The increase of storage capacity
 The decrease of hardware costs
 Disk access time is still high
 Complex queries
 Data stored in devices that are not online.
A query may take minutes or
hours to be processed.
Approximate Spatial Query Processing Using Raster Signatures
5
Motivation
Approximate answer may be enough
“exact” answers are itself approximations
Approximate answers can be computed quickly
Spatial query processing:
Scale
Quality
Round-off errors
Approximate Spatial Query Processing Using Raster Signatures
6
Scenarios and Applications
 Decision Support System
 Increasing business competitiveness
 More use of accumulated data
 Data mining
 During drill down query sequence in ad-hoc data mining
 Earlier queries in a sequence can be used to find out the
interesting queries.
 Data warehouse
 Performance and scalability when accessing very large
volumes of data during the analysis process.
Approximate Spatial Query Processing Using Raster Signatures
7
Scenarios and Applications
Query optimization
 To define the most efficient access plan for a given query
 Distributed data recording and warehousing
environments
 Data may be remote, and even may be unavailable
 Old data can be disposed in order to make room for new
ones. Therefore it becomes impossible to answer to queries
on deleted information.
Approximate Spatial Query Processing Using Raster Signatures
8
Scenarios and Applications
 Mobile computing
 An approximate answer may be an alternative:
• When the data is not available
• To save storage space
Approximate Spatial Query Processing Using Raster Signatures
9
A framework for approximate query processing
Data environment set-up
for providing
approximate answers
Queries
New data
Approx.
Query
Engine
Database
Responses
Approximate Spatial Query Processing Using Raster Signatures
10
Four Color Raster Signature (4CRS)
 Raster approximation (VLDB98)
 Object representation upon a grid of cells.
 Each cell stores relevant information using few bits.
 Grid resolution can be changed
• Precision  storage requirements
 4 types of cells
Bit value
Cell type
Description
00
Empty
The cell is not intersected by the polygon
01
Weak
The cell contains an intersection of 50% or less with the
polygon
10
Strong
The cell contains an intersection of more than 50% with the
polygon and less than 100%
11
Full
The cell is fully occupied by the polygon
Approximate Spatial Query Processing Using Raster Signatures
11
4CRS Approximation
Construction of Signatures
Polygon
4CRS
24th VLDB Conference New York, USA, 1998
12
Polygon approximate area
 The algorithm is based on the sum of the expected
area of each cell grid
 Empty cells: 0%
 Full cells: 100%
 Weak and Strong cells  supposing uniform distribution
• Weak cells: (0, 0.5] interval  mean 0.25
• Strong cells: (0.5, 1) interval  mean 0.75
 Count the number of each cell type in the polygon’s
4CRS, and multiply these values by the presumed
cell area.
Approximate Spatial Query Processing Using Raster Signatures
13
Confidence interval
 A measure of answer accuracy
 The polygon area inside weak or strong cell is assumed to
be uniformly distributed.
(0.5  0)
 0.25
 Weak cells  
2
(1  0.5)
 0.75
 Strong cells  
2
2


0
.
5

0
2 
12
2

1  0.5
2
 
12
1
48
1

48

 Using Central Limit Theorem  confidence interval
 95%
 99%
N    1.96   N
N    2.576   N
Approximate Spatial Query Processing Using Raster Signatures
14
Confidence interval (example)
 Query results
 # weak cells: 100
 # strong cells: 120
 # full cells: 400
 Confidence interval: 95%
 Weak cells:
100  0.25  1.96 
100
 25  2.83
48
 Strong cells:
120  0.75  1.96 
120
 90  3.10
48
 Full cells: 400 (full cells have the exact area!)
 Total: 515  5.93
 Error between -1.15% and 1.15%
Approximate Spatial Query Processing Using Raster Signatures
15
Cell Area Distribution
Cell Area Distribution - Township dataset
3000
2500
2000
1500
1000
500
0
0
Weak
Strong
Comparable to an uniform distribution
0,1
0,2
0,3
0,4
0,5 0.021369
0,6
0,7
0,8 0.020833)
0,9
1
Variance:
(U:
Mean: 0.246453 (U: 0.25)
Approximate Spatial Query Processing Using Raster Signatures
16
Example





# empty cells: 55
# weak cells:
27
# strong cells: 26
# full cells:
79
Approximate area:
( Σ weak
* 0.25 +
Σ strong * 0.75 +
Σ full ) * cellArea
 Exact area: 106.40
 Appr. area: 105.25
 Error:
1.07%
Approximate Spatial Query Processing Using Raster Signatures
17
Approximate area of polygon  window
intersection
 This algorithm is similar to the approximate polygon
area algorithm
 There are two kinds of cell overlap:
 The cell may be completely contained by the window
 The cell may be partially contained by the window
• proportional to its overlapping area
Approximate Spatial Query Processing Using Raster Signatures
18
Experimental tests
 Computer: PC Pentium IV 1,8 GHz, 512 MB RAM
 Page size 2,048 Bytes
 Target: to evaluate the use of 4CRS for approximate query
processing against exact query processing related to the
following aspects:
 Response time
 Storage requirements
 Accuracy
 The algorithms tested were :
 Polygon approximate area
 Approximate area of polygon x window intersection
• 100 random windows for each data set (different sizes and positions)
Approximate Spatial Query Processing Using Raster Signatures
19
Experimental tests
 Use of R*-trees in order to reduce the search
space.
Relation A
Relation A
Relation B
Relation B
Step 1
Step 1
SAMs
SAMs
Candidate pairs
Candidate pairs
Step 2
Step 2
Approximate query 4CRS
processing
Response set
Approximate Spatial Query Processing Using Raster Signatures
$
Exact geometry
processor
Response set
20
Experimental tests
 The polygon real data sets used in the experiments consist
of township boundaries, census block-group, topography,
geologic map and hydrographic map from Iowa (USA),
and Brazilian municipalities.
Dataset
Dataset # pol. # segments Aver. 4CRS 4CRS/
Size
#
size dataset
(KB)
segm. (KB) size -%
Census block29,105 17,844 1,764,588 98 1,006 3.46
group
Topography 123,367 40,140 7,561,104 188 2,487 2.02
Iowa Hydro. map 7,753 2,670 475,812 178 153 1.97
Township
17,508 12,216 1,059,438 86
boundaries
Geologic map 10,703 9,984 640,428 64
Brazil Municipalities 6,382 4,645 399,002 85
722
4.12
583
282
5.45
4.42
Average
Approximate Spatial Query Processing Using Raster Signatures
3.57
R*Base
Tree
type
Iowa Census 4CRS
block-group
Iowa
4CRS
Topography
Iowa
4CRS
Hydrologic
map
Iowa
4CRS
Township
boundaries
Iowa
4CRS
Geologic map Brazil
4CRS
Municipalities -
Node
size Time
Tree
# leafs
average
(KB) (sec)
Height
use (%)
1,604 23.664 67.73
3
896
822 23.394 72.40
3
403
8,712 168.032 31.55
3
4,270
2,388 115.586 56.20
3
1,170
248 2.674 68.88
3
120
124
2.514
72.77
2
60
1,168 19.829
68.11
3
573
582 16.674
70.01
3
285
946 12.268
480 12.168
434 6.720
214 5.468
67.96
69.48
72.05
73.74
3
3
3
3
464
235
211
103
21
Approximate polygon area
25,51
Execution time
14
12
10
8
7,82
Appr
Exact
6
3,26
4
2
3,13
2,03
1,69
1,55
0,32
2,52
0,95
1,18
0,39
0
Census
block-group
Topograp.
Hydrologic
Approximate Spatial Query Processing Using Raster Signatures
Township
boundaries
Geologic map
Municip.
22
Approximate polygon area
# Disk Access
16000
14,552
14000
12000
10000
8753
Appr
8000
Exact
5351
6000
3876
3190
4000
2000
259
694
39
193
154
77
0
Census block-
Topograp.
Hydrologic
group
Approximate Spatial Query Processing Using Raster Signatures
Township
Geologic map
Municip.
boundaries
23
Approximate polygon  window area
Execution tim e
60
53.768
54.769
50
40
Appr
25.807
30
24.665
Exact
20
10
8.95
7.58
5.29
0.94
4.23
2.78
Geologic
Municip.
0
Census blkgroup
Topogr.
Hydrolog.
Approximate Spatial Query Processing Using Raster Signatures
Tow nship
boundar.
24
Approximate polygon  window area
# Disk Access
90,000
80,000
70,000
63,182
60,000
50,000
Appr.
40,000
32,251
Exact
30,727
30,000
17,168
20,000
10,000
4,873
11,629
7,576
885
3,250
2,811
556
Township
boundar.
Geologic
Municip.
0
Census blkgroup
Topogr.
Hydrolog.
Approximate Spatial Query Processing Using Raster Signatures
25
Conclusion
 The experimental results demonstrated the efficiency of the
4CRS use for approximate query processing.
 Storage requirements
• 4CRS has an average of 3.75% of the real data set size
 Accuracy
• Approximate area: average error of 2.62%
• Window query approximate area: average error of 1%
 Response time
• Approximate area: average 28.41%
• Window query approximate area: average 7.22%
 Disk access
• Approximate area: average 1.90%
• Window query approximate area: average 7.04%
Approximate Spatial Query Processing Using Raster Signatures
26
Future works
Algorithms for the other operations
Approximate area of polygon x polygon
intersection algorithm is being evaluated
Use of approximations for mobile
computing
Approximate Spatial Query Processing Using Raster Signatures
27