Transcript Goals

Approximate Query Processing in Spatial Databases Using
Raster Signatures
Leonardo Guerreiro Azevedo
Geraldo Zimbrão
Jano Moreira de Souza
{azevedo, zimbrao,jano}@cos.ufrj.br
Federal University of
Rio de Janeiro
FIRST
FIRST
CONSIDERATION
CONSIDERATIONS
S
GOALS AND
CONTRIBUTIONS
4CRS
4CRS
PROPOSALS
PROPOSALSOF
OF
ALGORITHMS
ALGORITHMS
FINAL
CONSIDERATIONS
Presentation plan
FIRST
FIRST CONSIDERATIONS
CONSIDERATIONS
GOALS AND CONTRIBUTIONS
FOUR-COLOR RASTER SIGNATURE (4CRS)
PROPOSALS OF ALGORITHMS
FINAL
FINAL CONSIDERATIONS
CONSIDERATIONS
FIRST
FIRST
CONSIDERATION
CONSIDERATIONS
S
GOALS AND
CONTRIBUTIONS
4CRS
PROPOSALS OF
ALGORITHMS
FINAL
CONSIDERATIONS
Motivation
 There are many cases where a query can take a
long time to be processed, for example:
– When processing huge volume of data that requires a
large number of I/O operations
• Disk access time is still higher than memory
access time
An exact
– When processing high complex answer
queriescan
demand a
long
– When accessing remote data due
to atime
slow network
link or even temporary non-availability
...
...
...
FIRST
FIRST
CONSIDERATION
CONSIDERATIONS
S
GOALS AND
CONTRIBUTIONS
4CRS
PROPOSALS OF
ALGORITHMS
FINAL
CONSIDERATIONS
Motivation
 There are many cases where a query can take a
long time to be processed, for example:
– When processing huge volume of data that requires a
large number of I/O operations
• Disk access time is still higher than memory access time
–
–
A fast answer
When processing high complex
canqueries
be more
important than
When accessing remote data an
due
to a slow network
exact
response
link or even temporary non-availability
...
...
...
FIRST
FIRST
CONSIDERATION
CONSIDERATIONS
S
GOALS AND
CONTRIBUTIONS
4CRS
PROPOSALS OF
ALGORITHMS
FINAL
CONSIDERATIONS
Motivation
The challenge becomes bigger in spatial data
environments.
399,0000
segments
475,434 segments
FIRST
FIRST
CONSIDERATION
CONSIDERATIONS
S
GOALS AND
CONTRIBUTIONS
4CRS
PROPOSALS OF
ALGORITHMS
FINAL
CONSIDERATIONS
Motivation
 Precision of the query can be lessened, and an
approximate answer returned to the user
– Approximate answers can be quickly computed
– Acceptable precision
FIRST
FIRST
CONSIDERATION
CONSIDERATIONS
S
GOALS AND
CONTRIBUTIONS
4CRS
PROPOSALS OF
ALGORITHMS
FINAL
CONSIDERATIONS
Motivation
There are many approaches on the
approximate query processing field,
however most of them are not suitable
for spatial data.
“Research new techniques for
approximate query processing that
support the uniqueness of spatial data
is a major issue in the database field”.
(Roddick et al., 2004)
FIRST
FIRST
CONSIDERATION
CONSIDERATIONS
S
GOALS AND
CONTRIBUTIONS
4CRS
PROPOSALS OF
ALGORITHMS
FINAL
CONSIDERATIONS
Scenarios and Applications
Decision Support System
Increasing business competitiveness
More use of accumulated data
Data mining
During drill down query sequence in ad-hoc
data mining
Earlier queries in a sequence can be used to
find out the interesting queries.
Data warehouse
Performance and scalability when accessing
very large volumes of data during the analysis
process.
FIRST
FIRST
CONSIDERATION
CONSIDERATIONS
S
GOALS AND
CONTRIBUTIONS
4CRS
PROPOSALS OF
ALGORITHMS
FINAL
CONSIDERATIONS
Scenarios and Applications
Mobile computing
An approximate answer may be an alternative:
When the data is not available
To save storage space
FIRST
FIRST
CONSIDERATION
CONSIDERATIONS
S
GOALS AND
CONTRIBUTIONS
4CRS
PROPOSALS OF
ALGORITHMS
FINAL
CONSIDERATIONS
Traditional SDBMS query processing environment
New data
(inserts or updates)
Queries
ExactSlow
answeres
Spatial
DBMS
Deleted data
FIRST
FIRST
CONSIDERATION
CONSIDERATIONS
S
GOALS AND
CONTRIBUTIONS
4CRS
PROPOSALS OF
ALGORITHMS
FINAL
CONSIDERATIONS
SDBMS set-up for providing approximate query answers
Queries
Approximate
Query Processing
Engine
New data
Exact
answer
Approximate
Answer + conf.
Interval
Fast answer
(inserts or updates)
Spatial
DBMS
Deleted data
FIRST
FIRST
CONSIDERATION
CONSIDERATIONS
S
GOALS AND
CONTRIBUTIONS
4CRS
PROPOSALS OF
ALGORITHMS
FINAL
CONSIDERATIONS
Goals
Execute approximate query processing in
Spatial Databases using Raster Signature
– Four-Color Raster Signature (4CRS) (Zimbrao and
Souza, 1998).
Provide fast approximate query answers for
queries over spatial data.
FIRST
FIRST
CONSIDERATION
CONSIDERATIONS
S
GOALS AND
CONTRIBUTIONS
4CRS
PROPOSALS OF
ALGORITHMS
FINAL
CONSIDERATIONS
Contributions
Proposals of algorithms for many spatial
operations that can be approximately
processed using 4CRS
 Spatial operators returning numbers
 Area, distance, diameter, perimeter…
 Spatial predicates
 Equal, different, disjoint, area disjoint, inside, meet,
adjacent…
 Operators returning spatial data type values
 Intersection, plus (union), minus, common border…
 Spatial operators on set of objects
 Sum, closest, decompose, overlay, fusion.
FIRST
FIRST
CONSIDERATION
CONSIDERATIONS
S
GOALS AND
CONTRIBUTIONS
4CRS
PROPOSALS OF
ALGORITHMS
FINAL
CONSIDERATIONS
Contributions
Proposals of algorithms















Approximate Area of Polygon
Distance
Diameter
Perimeter and Contour
Equal and Different
Disjoint, Area Disjoint, Edge Disjoint
Inside (Encloses), Edge Inside, Vertex Inside
Intersects and Intersection
Overlay
Adjacent, Border in Common, Common border
Plus and Sum
Minus
Fusion
Closest
Decompose
FIRST
FIRST
CONSIDERATION
CONSIDERATIONS
S
GOALS AND
CONTRIBUTIONS
4CRS
4CRS
PROPOSALS OF
ALGORITHMS
FINAL
CONSIDERATIONS
Four-Color Raster Signature (4CRS)
4CRS is a raster approximation
It is an object representation upon a grid of cells
Grid resolution can be changed
 Precision × Storage requirements
FIRST
FIRST
CONSIDERATION
CONSIDERATIONS
S
GOALS AND
CONTRIBUTIONS
4CRS
4CRS
PROPOSALS OF
ALGORITHMS
FINAL
CONSIDERATIONS
Four-Color Raster Signature (4CRS)
Each cell stores relevant information using few bits
 4CRS  4 types of cells
Bit value
Cell type
Description
00
Empty
The cell is not intersected by the polygon
01
Weak
The cell contains an intersection of 50% or less with
the polygon
10
Strong
The cell contains an intersection of more than 50%
with the polygon and less than 100%
11
Full
The cell is fully occupied by the polygon
FIRST
FIRST
CONSIDERATION
CONSIDERATIONS
S
GOALS AND
CONTRIBUTIONS
4CRS
4CRS
PROPOSALS OF
ALGORITHMS
FINAL
CONSIDERATIONS
Four-Color Raster Signature (4CRS) - Generation
Polygon
4CRS
FIRST
FIRST
CONSIDERATION
CONSIDERATIONS
S
GOALS AND
CONTRIBUTIONS
4CRS
4CRS
PROPOSALS
PROPOSALSOF
OF
ALGORITHMS
ALGORITHMS
FINAL
CONSIDERATIONS
Approximate Area of Polygon
Approximate area of polygon
Based on the expected area of polygon within cell
Approximate area of polygon within window
Based on the expected area of polygon within cell
Approximate overlapping area of polygon join
Based on the intersection expected area of two types of cells
FIRST
FIRST
CONSIDERATION
CONSIDERATIONS
S
GOALS AND
CONTRIBUTIONS
4CRS
4CRS
PROPOSALS
PROPOSALSOF
OF
ALGORITHMS
ALGORITHMS
FINAL
CONSIDERATIONS
Approximate area of polygon
Approximate area of polygon
Approximate area of polygon within
cell
Expected area (µ) of cell type
E
Expected Area = zero%  µ = 0
F
Expected Area = 100%  µ = 1
W
Expected Area (0, 0.50]  µ = 0.25
S
Expected Area (0.50, 1)  µ = 0.75
Grid and polygon are independent from each other


Approximate answer    t   cellarea
 t

FIRST
FIRST
CONSIDERATION
CONSIDERATIONS
S
GOALS AND
CONTRIBUTIONS
4CRS
4CRS
FINAL
CONSIDERATIONS
PROPOSALS
PROPOSALSOF
OF
ALGORITHMS
ALGORITHMS
Approximate overlapping area of polygon join
W
S
S
S
×
×
×
×
E
µW×E
E
µS×E
W
µS×W
µS×S
S
expected area
of cells
overlapping
FIRST
FIRST
CONSIDERATION
CONSIDERATIONS
S
GOALS AND
CONTRIBUTIONS
4CRS
4CRS
FINAL
CONSIDERATIONS
PROPOSALS
PROPOSALSOF
OF
ALGORITHMS
ALGORITHMS
Approximate overlapping area of polygon join
Table of expected area of cells overlapping
Cell types
Empty
Weak
Strong
Full
Empty
0
0
0
0
Weak
0
0.0625
0.1875
0.25
Strong
0
0.1875
0.5625
0.75
Full
0
0.25
0.75
1


Approximate answer    i j   cellarea


i j
FIRST
FIRST
CONSIDERATION
CONSIDERATIONS
S
GOALS AND
CONTRIBUTIONS
4CRS
4CRS
PROPOSALS
PROPOSALSOF
OF
ALGORITHMS
ALGORITHMS
FINAL
CONSIDERATIONS
Affinity degree
For some proposed algorithms, it is possible to return an approximate
answer evaluating only cell types.
For other algorithms, when evaluating cell types it is also required to
compute an approximate value in the interval [0,1] that indicates a true
percentage of the response
 Affinity deggree: it is based on expected area of cells overlapping
(Azevedo et al., 2005).
Table of affinity degree
Cell types
Empty
Weak
Strong
Full
Empty
0
0
0
0
Weak
0
0.0625
0.1875
0.25
Strong
0
0.1875
0.5625
0.75
Full
0
0.25
0.75
1
FIRST
FIRST
CONSIDERATION
CONSIDERATIONS
S
GOALS AND
CONTRIBUTIONS
4CRS
4CRS
PROPOSALS
PROPOSALSOF
OF
ALGORITHMS
ALGORITHMS
FINAL
CONSIDERATIONS
Equal
Equal algorithm using 4CRS  the approximate answer is equal to the sum
of affinity degrees divided by the number of comparisons of pair of objects,
if no trivial case occurs.
Trivial case:
not equal  overlap of different
cell types  result false
S
S
F
×
×
×
E
W
S
Sum of affinity degree
E
W
S
F
×
×
×
×
E
µE×E = 1
W
µW×W = 0.0625
S
µS×S = 0.5625
µF×F = 1
F
FIRST
FIRST
CONSIDERATION
CONSIDERATIONS
S
GOALS AND
CONTRIBUTIONS
4CRS
4CRS
PROPOSALS
PROPOSALSOF
OF
ALGORITHMS
ALGORITHMS
FINAL
CONSIDERATIONS
Different
Different algorithm is opposite to equal algorithm
Affinity degree is equal to the 1 - affinity degrees
Trivial case:
different  overlap of different
cell types  result true
S
S
F
×
×
×
E
W
S
Sum of affinity degree
E
W
S
F
×
×
×
×
E
µE×E = 0
W
µW×W = 1-0.0625
S
µS×S = 1-0.5625
µF×F = 0
F
FIRST
FIRST
CONSIDERATION
CONSIDERATIONS
S
GOALS AND
CONTRIBUTIONS
FINAL
CONSIDERATIONS
PROPOSALS
PROPOSALSOF
OF
ALGORITHMS
ALGORITHMS
4CRS
4CRS
Disjoint
Disjoint: two objects are disjoint if they have no portion in common
Case I: At least one
overlap of
W
×
F
S
F
×
S
Trivial case:
Not disjoint (exact answer)
S
E
Case II: Only overlap of
E
W
×
S
Disjoint (partial answer)
Affinity degree += 1
F
W
Case III: weak × weak
weak × strong
W
×
×
W
S
Disjoint (partial answer)
Affinity degree +=
1 – expected area(type1,type2)
FIRST
FIRST
CONSIDERATION
CONSIDERATIONS
S
GOALS AND
CONTRIBUTIONS
4CRS
4CRS
FINAL
CONSIDERATIONS
PROPOSALS
PROPOSALSOF
OF
ALGORITHMS
ALGORITHMS
Distance
Distance can be estimate from 4CRS signatures computing the
distance among cells corresponding to polygons’ borders (Weak
and Strong cells).
Distance = average of the minimum and maximum distances
...
(a)
... ...
(b)
Minimum
distance
Maximum
distance
(c)
FIRST
FIRST
CONSIDERATION
CONSIDERATIONS
S
GOALS AND
CONTRIBUTIONS
4CRS
4CRS
IMPL. AND EVAL.
ALGORITHMS
EXPERIMENTAL
RESULTS
FINAL
CONSIDERATIONS
Conclusions
Goal
 Provide an estimated result in orders of magnitude less time
than the time to compute an exact answer, along with a
confidence interval for the answer.
Proposals
 Use raster approximations for approximate query processing in
spatial databases
 Use 4CRS signature to process the queries over polygons,
avoiding accessing the real data.
 Proposal many algorithms for approximate processing
 Use expected area of polygons (Azevedo et al., 2005) to
estimate responses
FIRST
FIRST
CONSIDERATION
CONSIDERATIONS
S
GOALS AND
CONTRIBUTIONS
4CRS
4CRS
IMPL. AND EVAL.
ALGORITHMS
EXPERIMENTAL
RESULTS
FINAL
CONSIDERATIONS
Future work
Implement and evaluate algorithms involving other
kinds of datasets, for example, points and polylines,
and combinations of them:
• point × polyline, polyline × polygon and polygon ×
polyline.
The experimental evaluation is not addressed in
this work; it is on going work developed on Secondo
(Güting et al., 2005) which is an extensible DBMS
platform for research prototyping and teaching.
Approximate Query Processing in Spatial Databases Using
Raster Signatures
Leonardo Guerreiro Azevedo
Geraldo Zimbrão
Jano Moreira de Souza
{azevedo, zimbrao,jano}@cos.ufrj.br
Federal University of
Rio de Janeiro