Transcript slides

Ranking objects based on
relationships
Computing Top-K over Aggregation
Sigmod 2006
Kaushik Chakrabarti et al.
Outline
•
•
•
•
Motivation
The framework and problem definition
Proposed solution
Discussions
• Experiments
Heating up discussion
• We basically know how web search
engines work.
– Having web crawlers collecting web-page
information, index and rank them.
• How do we define searching in a relational
database
– Free-style search v.s. SQL + predicates ?
– What’s the expected outcome?
– How do we rank results?
Motivation
• Searching over a relational database
– information scattered in different relations
Motivation
• Full text search, aggregation already
supported by RDBMS
– What else do we need in order to perform
good searching?
Related work
• Information Retrieval (full text searching)
• Researches in Text Databases
• Explore database via foreign key-primary key
– DBExplorer (ICDE 2002)
– BANKS (ICDE 2002)
– DISCOVER (VLDB 2002)
• What are related work missing
– Target objects don’t contain keywords
– Lack of scoring function for query results
– Not utilizing aggregates to put together search results
for multiple keywords
Contributions
• Introduce an interesting problem domain
• Define “Object Finder” (OF) queries
• Propose scoring functions
• Propose a solution to process OF query
– Return top K ranked results
– Efficient early termination property
System Overview
Scoring functions
• Scoring Matrixes and row- column- marginal's
Scoring semantics
• All Query Keywords Present in each
document
– can be too restrictive
• All Query Keywords Present in Set of
Related Documents
– can not use MIN as row-marginal scoring
• Pseudo-document Approach:
– enlarged searching space
Problem definition
• Object finder problem:
Process OF query as Top-K query
• Top-K query incorporates ranking. Results
are total ordered if we process strong topK
• A good algorithm can utilize early
termination to avoid processing of results
that are not in top-K
Top-K query processing
• General framework:
Supporting Ad-hoc Ranking Aggregates SIGMOD 2006
( presented in May)
*SELECT* ga_1,..ga_n , F
----groups
*FROM* R1,...,Rh
----source rel
*WHERE* c1 AND... cl
----join cond.
*GROUP BY* ga_1,...ga_n
----group def.
*ORDER BY* F
----ordering func. an aggregate
*LIMIT* k
----Top-k setting
Top-K query processing
• For OF query, it is
select TOId, TOValue, score(TOId)
from TargetTable T, R, L1,...,LN
where R.TOId = T.TOID
and R.DocId=Li.DocID (i=1..N)
group by TOId, TOValue
order By score(TOId)
limit k
My work is done
(please try to recall my last talk)
Algorithm : Generate-Prune
Phrase I : Compute top-K candidates
Algorithm Overview
Algorithm
• Phrase II Compute exact top-K
Discussions
• In this work
– Choice of aggregation function
– ranking function in general
– How do you think of this work
• Not limited
– Impact of more complicated schema
– Impact of selectivity of the query
Experiment Results
•
•
•
•
Faster than SQL
Faster than Generate-Only
Robust to # of keywords and selections
Intuitive Results
Experiments
• Faster than SQL
Experiments
• Faster than Generate-Only
Experiments
• Robust to # of keywords and selections
Thank you
Questions to discuss?