Boolean + Ranking: Querying a Database by K

Download Report

Transcript Boolean + Ranking: Querying a Database by K

Zhen Zhang
Seung-won Hwang
Kevin C. Chang
Min Wang
Christian A. Lang
Yuan-chi Chang
Presented ACM SIGMOD Conference (SIGMOD 2006), Chicago, June 2006
Presented By : Pavan Kumar M.K. (1000618890)
Aditya Mangipudi (1000649172)








Introduction
Motivation
A* Search Algorithm
A*-Driven State Space Construction
Optimization Driven Configuration
OPT* Search Algorithm
Experiments
Conclusion


The wide spread of databases for managing
structured data, compounded with the
expanded reach of the Internet, has brought
forward interesting data retrieval and analysis
scenarios to RDBMS
Only the Top-K results are of interest to the
user.
QUERY: Select the Top-5 2nd year students in CSE with
highest GPA
Boolean query:
Qualifying constraint
dept = CSE and year = 2
+
Ranking query:
Top 5 ranked by GPA
Find top answers
B: dept = CSE and year = 2
O: GPA
Quantifying
function
4
 Query


Q = (G, k)
G - Goal Function
G=B.O
k – Retrieval Size
Boolean query
+
Ranking query
How to answer?
6

If evaluated as separate operators
D
… …
Rankingquery
query
Boolean
BR
…
Rankingquery
query
Boolean
BR
 Current techniques optimize only condition-bycondition

If search by an overall goal function G as a
ranking function
Goal function G
D
B
R
7
Att 1
Att 2


Threshold Algorithm essentially relies on a
rigid assumption that G functions are
Monotonic.
The monotonicity requires G to be decreasing
if all its parameters are decreasing.

Consider the example query as below to find
houses in a certain price range with good
price/sqrft ratio
Select h.address from House h,
Where h.price ≤ 200k ν h.price ≥ 400k
Order by h.size/|h.price-300k|

The function G here in Non-Monotonic.
Att 1
Att 2



Existing algorithms build upon their problemspecific assumptions on the goal functions or
index traversals.
For example, Threshold Algorithm assumes the
monotonicity of G and the use of sorted accesses
(interleaf navigation), based on which the search is
implicitly hardwired.
In a Boolean Query like B = price > 100K, such a
search is straightforward as the constraint
expressions B explicitly suggests how to carry out
a focused search, eg., visiting only the nodes with
locality potentially satisfying B.


In contrast, for a general k-constrained
optimization query potentially involving arbitrary
ranking combined with Boolean conditions and
joining multiple relations, eg.. Q maximizing
size/price ratio, it is no longer clear how to focus
the search.
By encoding into a generic search with no
assumptions on G, the search is generalized to
support arbitrary G over potentially multiple indices
and a combination of both hierarchical and
interleaf traversals.






A* is a well known search algorithm that finds the
Shortest Path, given an initial and a designated goal
state.
Widely used in the field of Artificial Intelligence.
Uses Best-First Search Traversal.
Uses heuristic information to carry out the search in
a guided manner.
A* is guaranteed to find the correct answer
(Correctness) by visiting the least number of states
(Optimality)
Ex: GPS, Google Maps, A lot of puzzles, games etc.
For a tuple t with m attribute values, Goal
Function G(t) maps the tuple to a positive
numeric score.
G(t) = B(t)*R(t) =
R(t)
if B(t) is true
0
if B(t) is false
(ie, lowest score)
15
Addr
Price
Size
Score
1.
Oak park,
Chicago
600K
4500
15
Where h.price ≤ 200k ν h.price ≥ 400k
2.
Mattis,
Champaign
350K
2000
0
Order by h.size/|h.price-300k|
3.
…
150K
1000
6.67
4.
…
250K
2000
0
5.
…
300K
3500
0
6.
…
80K
500
2.27
Select h.address from House h,
Addr
Price
Size
Score
1.
Oak park,
Chicago
600K
4500
15
2.
Mattis,
Champaign
350K
2000
0
3.
…
150K
1000
6.67
4.
…
250K
2000
0
5.
…
300K
3500
0
6.
…
80K
500
2.27




To realize k-constrained optimization over
databases, this paper develops the OPT*
framework.
Objective: To Optimize G with the help of indices
as access methods over tuples in D.
Discrete State Search: From the view of using
indices, we are to search the maximizing tuples on
the index nodes as “discrete states”.
Continuous Function Optimization: From the
view of maximizing goal functions, we are to
optimize G.
G
OPT*
Optimize G
over D
Function optimization
of G
Discrete state search
over D
D
D
19
Indices
Value Space


States : States in a search graph represent “localities” of
values at different granularity– from coarse to fine, and
eventually reach tuples in the database.
• Region State
• Tuple State
Transitions : While states of space give “locations” in the
map, transitions further capture possible paths followed to
reach our destination of query answers.
Example : for two states u and v, there is a transition (u, v) if v
∈ Next(u)
b1
0-250
250-600
b2
0-100
b3
100-250
………
250-350
b6
2
5
350-600
b7
1
Price
(k) 600
1
350
2
250
5
4
3
100
6
1500
a1
3000
0-3000
4000
4500
3000-4500
a2
0-1500
size
a3
1500-3000
………
3000-4000
4000-6000
a6
5
1
a7
22
b1
0-250
250-600
b2
0-100
b3
100-250
………
250-350
2
2
b7
5
1
250
5
M22
4
3
……
M
100
55
6
1500
a1
0-3000
3000
4000
4500
size
3000-4500
a2
0-1500
a3
1500-3000
………
3000-4000
(ai, bj)
4
M32
M23
M33
M75 M56 M66 M77
2
5
M76
1
M67
4000-6000
a6
5
=
M11
1
350
350-600
b6
Mij
Price
(k) 600
1
a7
23
b1
0-250
250-600
b2
0-100
b3
100-250
………
250-350
Price
(k) 600
2
b7
2
5
250
1
5
M22 M32 M23 M33
4
3
100
… M55 M75 M56 M66 M77 M67 M76
6
1500
a1
0-3000
3000
4000
4500
size
4
2
5
1
3000-4500
a2
0-1500
M11
1
350
350-600
b6
conceptually, combined
space
Mij =(ai, bj)
a3
1500-3000
………
3000-4000
4000-6000
a6
5
1
a7
24
Challenge 1: What is the search mechanism?
25
K-constrained
optimization
A* Shortest path
Find a tuple with
maximal score
Find a path with
minimal distance
> A* Gives Shortest Path to testable goal.
> The goal is to find optimal tuple states with
maximal G-Score.
26

How to encode a tuple to a path?
◦ Adding a virtual target t* only reachable through tuples

How to encode maximal tuple with minimal path?
◦ Quality of path depends solely on the tuple it passes
by
M1
 For tuple state t
0
D(t, t*) = - G(t)
 For two states r, u
D(r, u) = 0
0
M22
0
1
M32 M23
M33
0
… M55 M75 M56 M66 M77 M67 M76
0
0
4
2
- G(4)
5
1
- G(1)
t*
27
Challenge 2: How to guide the search?
28


Function optimization measures quality of states
Function optimization aspects:
• Defines Proper Heuristics
• Identifies a set of initial states to start search.
29


Input : G(x1,……,xm) and domain of values dom = xi ε [xi1,xi2]
Output : <O,U> = OPT(G,dom)
where O={gives local optima}
U={Upper Bound Score}
OPTPOINT gives O Component of OPT
OPTMAX gives U Component of OPT
Approaches
Analytical Method
Seach based (Ex:Hill
Climbing)
Template Based
High
Medium
Low
Figure illustrates different states have different promises.
Search should favor the choice of M77 over M67 because its
more promising.


To guarantee completeness
◦ A* requires admissible heuristics, i.e., estimate optimistically
To ensure admissible heuristics
◦ Function optimization gives tightest upper bound
 Analytical approaches
 Numeric analysis package
H(region) = OPTMAX(G, region)
i.e., maximal value of G in the region
32
600
M67
350
2
250
3
M77
1
5
4
100
6
1500


3000
4000
4500
h(M67) gives U=0
However if we follow the link from M67 to M77, we can
reach Tuple 1 with score 15.


To guarantee optimality
◦ A* requires descending heuristics
To ensure descending heuristics
◦ Remove uphill links
M11
M22
M32 M23 M33
… M55 M75 M56 M66
4
2
5
M77 M67 M76
1
34


To guarantee correctness
◦ Every tuple state must be reachable from start states
◦ Taking only downhills requires start with high points
To ensure reachability
◦ Initial states should contain all local optima
M11
M22 M32 M23
…
M55 M75
4
M33
M56 M66 M77 M67
2
5
M76
1
35
top-down
M11
M22
…
M55
4

M75
M32
M23
M56
M66
2
5
M33
M77 M67 M76
M57
1
Search is implemented as priority queue driven traversal
36

Example . Given a set of states constructed from the set of
index graph I, the search, in principle, should follow those
transitions to look for the tuple states maximizing the goal
function.. The search may follow the path

M11 → M33 → M77 → 1  Top-down search

M57 → M77 → 1
 Bottom-Up Search
M11
M22 M32 M23
M55 M75
4
M33
M56 M66 M77 M67
2
5
1
M76



OPT* may result in different costs if started at
different initial states.
Top down-> More hops | Bottom up->Less hops
Preference goes to Bottom Up but what if
Goal functions G=1/(X-Y)2+1, any value satisfying
X=Y maximizes the function.



Comparison vs.
◦ Boolean then ranking
◦ Ranking then boolean
Metrics:
node accessed = Nl + Nt
Settings:
◦ Benchmark queries over real dataset
◦ Controlled queries over synthetic dataset
40


Datasets:
◦ 19,706 real estate listing crawled online
Queries
◦ Q1: size * bedrms/| price-450k| : [40k<=price<=50k]
◦ Q2: size * ebedrms / |price-350k| : [price<400k^size>4000]
◦ Q3: size/price : [bedrms=3 ν bedrms=4]
BR_clustered
BR_unclustered
OPT*
Q1
Q2
Q3
41


Datasets
◦ Three randomly generated datasets of 100k points
 Uniform, gaussian, logvariatenormal
Queries
◦ Linear average queries: (eg, 0.4*a + 0.6*b)
◦ Nearest neighbor queries: (eg, (x-3)^2 + (y-4)^2)
◦ Join queries: (0.4*R.a + 0.6*S.b: R.c=R.d)
!"#$
%
!
"#$%
!"#$
42



Problem
◦ Study K-constrained optimization queries as boolean
+ ranking
Abstraction
◦ Encode K-constrained optimization into shortest path
problem
Framework
◦ Develop OPT* to process K-constrained optimization
43

•
•
References
Boolean + Ranking: Querying a Database by K-Constrained
Optimization. Z. Zhang, S. Hwang, K. C.-C. Chang, M. Wang,
C. Lang, and Y. Chang. In Proceedings of the 2006 ACM SIGMOD
Conference (SIGMOD 2006), pages 359-370, Chicago, June 2006
www.wikipedia.org
44
Questions?
45