Transcript ppt slides

Boolean + Ranking:
Querying a Database by K-Constrained
Optimization
Joint work with: Seung-won Hwang, Kevin C.
Chang, Min Wang, Christian A. Lang, Yuan-chi
Chang
Presented By
Rashmi Pagadala(1000574860)
Swetta Bhaskar(1000568628)
Introduction
K- Constrained Optimization Query
Query Q = ( B , O , k )
B – Qualifying Constraint
O – Quantifying Constraint
k – number of tuples
Many queries naturally combine Boolean and ranking
Traditional databases
Boolean query:
dept = CS and year = 2
+
Qualifying constraint
Quantifying function
Find top answers
B: dept = CS and year = 2
R: gpa
Information retrieval
Database applications on Web
Ranking query:
Top 5 ranked by gpa
Boolean + Ranking form a coherent goal
function

Boolean B + Ranking R = Goal function G
For a tuple t
G(t) = B(t)*R(t) =
R(t)
0 (ie, lowest score)
if B(t) is true
if B(t) is false
Motivating scenarios

Data retrieval:

Find houses in certain price range with good
price/sqrft ratio
Select h.address from House h,
h CrimeRate c
Where h.price ≤ 200k ν h.price ≥ 400k and h.zipcode = c.zipcode
Order by h.size/|h.price-300k| *c.crimerate
Limit 1 -1 Limit 10

Data analysis:

Find products with highest sale increase in
consecutive years
Select itemid from Sales s1, Sales s2
Where s1.itemid = s2.itemid and s2.year – s1.year = 1
Order by s2.sale – s1.sale Limit 10
Current techniques lack of global search mechanism

If evaluated as separate operators
D
…
Rankingquery
query
Boolean
BR
…
…
Rankingquery
query
Boolean
BR
 Current techniques optimize only condition-by-condition

If search by an overall goal function G as a ranking
function
Goal function G
D
B
R
 Current techniques restrict G to be monotonic
The nature of Boolean + Ranking is K-constrained
optimization query

Optimize goal function G over database D
Goal
function G
Database D
G
h.size/|h.price-300k|
[h.price ≤ 200k ν h.price ≥ 400k ]
Addr
Zip
Price
Size
1.
Oak park, Chicago
60644
600K
4500
2.
Mattis, Champaign
61821
350K
2000
3.
…
150K
1000
4.
…
250K
2000
5.
…
300K
3500
6.
…
80K
500
D
Our Goal: Evaluate query as its nature suggests!
G
OPT*
Optimize G
over D
Function optimization
of G
Discrete state search
over D
D
D
Query Mechanism

Discrete State Search
Search over a discrete set of index nodes
to find the satisfying data tuples

Continuous Function Optimization
Optimize the goal function G over the
domain of a database
Challenge 1: What is the search mechanism?
We encode as A* because it’s optimal


What A* is: Finding the shortest path
Why we choose: Completeness and optimality
with proper heuristics


Complete: guarantee to find shortest path
Optimal: visit least number of nodes
origin
5
3
1
2
5
1
7
9
6
destination
The To do :
Discrete state search perspective ( indices)
Shortest path problem (Encoding)
Index –Induced State Space: A*-Driven Construction







K constrained optimization
State space induced by indices: state & route
Ii over relation Di
Ii= (V,E), where V=RUT
Dom(ni) ,ni €R
The reachable node depends on t he type of n
Each node is referred to as I.V and I.E.
We view compound index as discrete
space
b1
b2
0-100
Price (k)
0250
250-600
b3
100-250
………
250350
b6
2
5
600
1
350
350600
2
b7
1
250
5
4
3
100
6
1500
a1
03000
a2
01500
3000
4000
30004500
15003000
………
4500 size
a6
a3
30004000
40006000
5
1
a7
We view compound index as discrete
space
b1
b2
0-100
Price (k)
250-600
0250
………
600
b3
100-250
250350
b6
2
5
Mij =(ai, bj)
M11
1
350
350600
b7
1
2
250
5
M22
M23
M33
4
3
…… M55
100
6
M75
M56
M66
M77
M6
7
1500
a1
03000
a2
01500
M32
3000
30004500
15003000
………
4000
4500 size
a6
a3
30004000
40006000
5
1
a7
4
2
5
1
M76
We view compound index as discrete
conceptually, combined space
space
b1
b2
0-100
Price (k)
250-600
0250
………
600
b3
100-250
250350
b6
2
5
Mij =(ai, bj)
M11
1
350
350600
b7
1
2
5
250
M22
100
…
6
a2
01500
3000
a6
4500 size
4000
30004500
15003000
………
M33
M55
M75
M56
M66
M77
M6
7
1500
03000
M23
4
3
a1
M32
a3
30004000
40006000
5
1
a7
4
2
5
1
M76
Mapping the Space : state & transition








States: Index graph ,Composite graph
Composite graph is categorized: Region U Tuples
Region state:
#Internal state
#Leaf state
Tuple state
Transition:
Internal state-Branch in( top down )
Leaf state-Branch out( bottom up)
Cartesian product + intersection of node
Example
Given a set of states constructed from the set of index graph I
considering the transitions among the states. To reach tuple 1.
The search, in principle, should follow those transitions to look for the
tuple states maximizing the goal function. For instance, suppose we
decide to start from the root of the graph M11.
This essentially follows a top-down search strategy.
The search may follow the path M11 →M33 → M77 → 1 to reach the
target tuple state.
Alternatively, as a bottom-up search, suppose we start fromM67, the
search may follow an alternative route M67 → M77 → 1.
Encoding our problem into shortest path is
challenging

K-constrained
optimization
Shortest path
Find a tuple with
maximal score
Find a path with
minimal distance
How to encode:
 a tuple  a path?
 score of tuple distance of path?
Therefore, we encode K-constrained opt. as:

How to encode a tuple to a path?


Adding a virtual target t* only reachable through tuples
How to encode maximal tuple with minimal path?

Quality of path depends solely on the tuple it passes by

M11
For tuple state t
0
D(t, t*) = - G(t)

M22
For two states r, u
D(r, u) = 0
0
M32
M23
M33
0
0
… M55 M75
M56 M66 M77 M67 M76
0
0
4
2
5
1
- G(1)
- G(4)
t*
Challenge 2: How to guide the search?
We use function opt. to sketch the landscape of G

Function optimization measures quality of states

Function optimization enables:



1. How to define heuristics?
2. How to configure space?
3. Where to start the search?
Return local optima O and upper bound score U
1. Define admissible heuristics: Measure tightest
upper bound

To guarantee completeness


A* requires admissible heuristics, ie, estimate
optimistically
To ensure admissible heuristics

Function optimization gives tightest upper bound


Analytical approaches
Numeric analysis package
H(region) = OPTMAX(G, region)
ie, maximal value of G in the region
2. Configure descending space: disconnect
uphills

To guarantee optimality


A* requires descending heuristics
To ensure descending heuristics

Remove uphill links
M11
M22
…
M55
4
M75
M32
M23
M33
M56
M66
M77
2
5
1
M67
M76
Find right start point: Start from local optima

To guarantee correctness
Every tuple state must be reachable from start states
Taking only downhills requires start with high points



To ensure reachability
Initial states should contain all local optima

M11
M22
…
M55
4
M75
M32
M23
M33
M56
M66
M77
2
5
1
M67
M76
OPT SEARCH ALGORITHM
Putting together:
Executing OPT* on the configured space
top-down
M11
M22
…
M55
4

M75
M32
M23
M56
M66
2
5
M33
M77 M67 M76
M57
1
Search is implemented as priority queue driven
traversal
Putting together:
Executing OPT* on the configured space
top-down
M22
…
M55
M75
4
M11
M32
M56
M66
2
5
bottom-up
M22
…
M55
4

M75
M23
M32
M33
M77 M67 M76
M57
1
M11
M23
M56
M66
2
5
M33
M77 M67 M76
M57
1
Bottom-up approach is always better than topdown
Experiments

Comparison vs.


Boolean then ranking
Ranking then boolean

Metrics:

Settings:


node accessed = Nl + Nt
Benchmark queries over real dataset
Controlled queries over synthetic dataset
Benchmark queries

Datasets:


19,706 real estate listing crawled online
Queries



Q1: size * bedrms/| price-450k| : [40k<=price<=50k]
Q2: size * ebedrms / |price-350k| : [price<400k^size>4000]
Q3: size/price : [bedrms=3 ν bedrms=4]
BR_clustered
BR_unclustered
OPT*
Q1
Q2
Q3
Controlled queries

Datasets
Three randomly generated datasets of 100k points



Uniform, gaussian, logvariatenormal
Queries
Linear average queries: (eg, 0.4*a + 0.6*b)
Nearest neighbor queries: (eg, (x-3)^2 + (y-4)^2)
Join queries: (0.4*R.a + 0.6*S.b: R.c=R.d)



!"#$
%
!
!"#$
"#$%
Conclusion

Problem


Abstraction


Study K-constrained optimization queries as boolean +
ranking
Encode K-constrained optimization into shortest path
problem
Framework

Develop OPT* to process K-constrained optimization
References:

Boolean + Ranking: Querying a Database by K-Constrained
Optimization Joint work with: Seung-won Hwang, Kevin C. Chang,
Min Wang, Christian A. Lang, Yuan-chi Chang

W. H. Press, S. A. Teukolsky, W. T .Vetterling, and B. P.Flannery.
CAMBRIDGE UNIVERSITY PRESS, 2nd Edition, 1992.

www-forward.cs.uiuc.edu/talks/2006/asopt-sigmod06-zzhang-jun06

http://portal.acm.org/citation.cfm?id=1142473.1142515
THANK YOU !
Questions?