Keyword Searching and Browsing in Databases using BANKS
Download
Report
Transcript Keyword Searching and Browsing in Databases using BANKS
Keyword Searching and Browsing
in Databases using BANKS
Seoyoung Ahn
Mar 3, 2005
The University of Texas at Arlington
Outline
Introduction
Database and Query Model
Searching for the best answers
Browsing features of BANKS
Experiment
Conclusion
Seoyoung Ahn
Keyword Searching and Browsing in Databases using BANKS
Introduction
Search engines on Web have popularized an
unstructured querying and browsing
Simple and user-friendly
Users just type in keywords and follow hyperlink
Relational databases are commonly searched using
structured query language
Users need to know the schema
Keyword searching techniques cannot be used on
data stored in databases
It often splits across the tables/tuples due to normalization
Seoyoung Ahn
Keyword Searching and Browsing in Databases using BANKS
Introduction(cond..)
BANKS (Browsing And Keyword
Searching)
a system which enables keyword-based search on
relational databases, together with data and schema
browsing
User
Seoyoung Ahn
HTTP
BANKS
system
JDBC
Database
Keyword Searching and Browsing in Databases using BANKS
Introduction(cond..)
BANKS (Browsing And Keyword
Searching)
a framework for keyword querying of relational
database
a novel and efficient heuristic algorithm for executing
keyword queries
key features of BANKS system
Seoyoung Ahn
Keyword Searching and Browsing in Databases using BANKS
Outline
Introduction
Database and Query Model
Informal Model
Formal Model
Query and Answer Model
Searching for the best answers
Browsing features of BANKS
Experiment
Conclusion
Seoyoung Ahn
Keyword Searching and Browsing in Databases using BANKS
Database and Query Model
Informal Model
Model Description
database
each tuple in db
fk-pk-Link
Seoyoung Ahn
directed graph
node in the graph
directed edge
Keyword Searching and Browsing in Databases using BANKS
Database and Query Model
The Schema
Seoyoung Ahn
Keyword Searching and Browsing in Databases using BANKS
Database and Query Model
A Fragment of the Database
Seoyoung Ahn
Keyword Searching and Browsing in Databases using BANKS
Database and Query Model
Informal Model(cond.)
An answer to a query should be a subgraph
connecting nodes matching the keywords.
The importance of a link depends upon the type of
the link i.e. what relations it connects and on its
semantics
Ignoring directionality would cause problems because
of “hubs” which are connected to a large numbers of
nodes.
Seoyoung Ahn
Keyword Searching and Browsing in Databases using BANKS
Database and Query Model
Formal Database Model
Nodes and edges
Node Weight : N(u)
Depends on the prestige
Set the node prestige = the indegree of the node
Nodes that have multiple pointers to them get a higher
prestige
Node score N = root node weight
+ ∑ leaf node weight
Seoyoung Ahn
Keyword Searching and Browsing in Databases using BANKS
Database and Query Model
Formal Database Model (Cond.)
Edge Weights
Some pupluar tuples can be connected many other
tuples Edge with forward and backward edge weights
Weight of a forward link = the strength of the proximity
relationship between two tuples
(set to 1 by default)
Weight of a backward link = indegree of edges pointing
to the node
Total edge weight = ∑ edge weights
Edge score E = 1 / Total edge weight
Seoyoung Ahn
Keyword Searching and Browsing in Databases using BANKS
Database and Query Model
Formal Database Model (Cond.)
Overall relevance score
= Node weights + Edge Weight
Normalize in the range [0,1]
Combine using weighting factor
Additive: (1- ) E + N;
multiplicative: E * N
Seoyoung Ahn
Keyword Searching and Browsing in Databases using BANKS
Database and Query Model
Query and Answer Model
Query
A set of keywords e.g.{k1,k2,…kn}
A set of nodes Si = {S1,S2,…Sn}
Locate nodes matching search terms t1,t2,…tn
Answer Model
A rooted directed tree connecting keyword nodes
Relevance score of an answer tree
Relevance scores of it nodes and its edge weight
Seoyoung Ahn
Keyword Searching and Browsing in Databases using BANKS
Database and Query Model
Answer Model
A rooted directed tree connecting keyword
nodes
Multiple answers
Ranked by proximity + prestige
Proximity edges weights
Prestige indegree of nodes
Relevance score of an answer tree
Relevance scores of it nodes and its edge weight
Seoyoung Ahn
Keyword Searching and Browsing in Databases using BANKS
Database and Query Model
Result of query “sudarshan soumen”
Seoyoung Ahn
Keyword Searching and Browsing in Databases using BANKS
Outline
Introduction
Database and Query Model
Searching for the best answers
Backward expanding search algorithm
Browsing features of BANKS
Experiment
Conclusion
Seoyoung Ahn
Keyword Searching and Browsing in Databases using BANKS
Searching for the best answers
Backward expanding search algorithm
Offers a heuristic solution for incrementally computing query
results.
Assume that the graph fits in memory
Start at leaf nodes each containing a query keyword
Run concurrent single source shortest path algorithm from each
such node
Traverses the graph edges backwards
Confluence of backward paths identify answer tree roots
Output a node whenever it is on the intersection of the sets of nodes
reached from each keyword
Answer trees may not be generated in relevance order
Insert answers to a small buffer (heap)
Output highest ranked answer from buffer to user when buffer is full
Seoyoung Ahn
Keyword Searching and Browsing in Databases using BANKS
Searching for the best answers
Model (Query : Charuta Sudarshan Roy )
BANKS: Keyword search…
paper
writes
Charuta
Seoyoung Ahn
S. Sudarshan
Prasan Roy
author
Keyword Searching and Browsing in Databases using BANKS
Outline
Introduction
Database and Query Model
Searching for the best answers
Browsing features of BANKS
Experiment
Conclusion
Seoyoung Ahn
Keyword Searching and Browsing in Databases using BANKS
Browsing
BANKS system provides
A rich interface to browse data stored in a relational
database
Automatically generates browsable views of database
relations and query results
Schema browsing and data browsing
A hyperlink to the referenced tuple
Templates for several predefined ways of displaying
data
Seoyoung Ahn
Keyword Searching and Browsing in Databases using BANKS
Browsing
Data browsing
Seoyoung Ahn
Keyword Searching and Browsing in Databases using BANKS
Browsing
Schema browsing
Seoyoung Ahn
Keyword Searching and Browsing in Databases using BANKS
Outline
Introduction
Database and Query Model
Searching for the best answers
Browsing features of BANKS
Experiment
Conclusion
Seoyoung Ahn
Keyword Searching and Browsing in Databases using BANKS
Error scores vs parameter choices
The rankings are
relatively stable
across different
choices of parameter
values
= 0.2 coupled
with log scaling of
edges weights
does best
Seoyoung Ahn
Keyword Searching and Browsing in Databases using BANKS
Outline
Introduction
Database and Query Model
Searching for the best answers
Browsing features of BANKS
Experiment
Conclusion
Seoyoung Ahn
Keyword Searching and Browsing in Databases using BANKS
Conclusion
BANKS system
provides an integrated browsing and keyword
querying system for relational databases
allows users with no knowledge of database systems
or schema to query and browse relational database
with ease
Seoyoung Ahn
Keyword Searching and Browsing in Databases using BANKS