Keyword Searching and Browsing in Databases using BANKS

Download Report

Transcript Keyword Searching and Browsing in Databases using BANKS

Keyword Searching and Browsing
in Databases using BANKS
Seoyoung Ahn
Mar 3, 2005
The University of Texas at Arlington
Outline
 Introduction
 Database and Query Model
 Searching for the best answers
 Browsing features of BANKS
 Experiment
 Conclusion
Seoyoung Ahn
Keyword Searching and Browsing in Databases using BANKS
Introduction
 Search engines on Web have popularized an
unstructured querying and browsing

Simple and user-friendly

Users just type in keywords and follow hyperlink
 Relational databases are commonly searched using
structured query language

Users need to know the schema
 Keyword searching techniques cannot be used on
data stored in databases

It often splits across the tables/tuples due to normalization
Seoyoung Ahn
Keyword Searching and Browsing in Databases using BANKS
Introduction(cond..)
 BANKS (Browsing And Keyword
Searching)
 a system which enables keyword-based search on
relational databases, together with data and schema
browsing
User
Seoyoung Ahn
HTTP
BANKS
system
JDBC
Database
Keyword Searching and Browsing in Databases using BANKS
Introduction(cond..)
 BANKS (Browsing And Keyword
Searching)
 a framework for keyword querying of relational
database
 a novel and efficient heuristic algorithm for executing
keyword queries
 key features of BANKS system
Seoyoung Ahn
Keyword Searching and Browsing in Databases using BANKS
Outline
 Introduction
 Database and Query Model

Informal Model

Formal Model

Query and Answer Model
 Searching for the best answers
 Browsing features of BANKS
 Experiment
 Conclusion
Seoyoung Ahn
Keyword Searching and Browsing in Databases using BANKS
Database and Query Model
 Informal Model
 Model Description
database

each tuple in db
fk-pk-Link
Seoyoung Ahn
directed graph
node in the graph


directed edge
Keyword Searching and Browsing in Databases using BANKS
Database and Query Model
 The Schema
Seoyoung Ahn
Keyword Searching and Browsing in Databases using BANKS
Database and Query Model
 A Fragment of the Database
Seoyoung Ahn
Keyword Searching and Browsing in Databases using BANKS
Database and Query Model
 Informal Model(cond.)
 An answer to a query should be a subgraph
connecting nodes matching the keywords.
 The importance of a link depends upon the type of
the link i.e. what relations it connects and on its
semantics
 Ignoring directionality would cause problems because
of “hubs” which are connected to a large numbers of
nodes.
Seoyoung Ahn
Keyword Searching and Browsing in Databases using BANKS
Database and Query Model
 Formal Database Model
Nodes and edges
 Node Weight : N(u)
Depends on the prestige
Set the node prestige = the indegree of the node
Nodes that have multiple pointers to them get a higher
prestige
Node score N = root node weight
+ ∑ leaf node weight
Seoyoung Ahn
Keyword Searching and Browsing in Databases using BANKS
Database and Query Model
 Formal Database Model (Cond.)
 Edge Weights
Some pupluar tuples can be connected many other
tuples  Edge with forward and backward edge weights
Weight of a forward link = the strength of the proximity
relationship between two tuples
(set to 1 by default)
Weight of a backward link = indegree of edges pointing
to the node
Total edge weight = ∑ edge weights
Edge score E = 1 / Total edge weight
Seoyoung Ahn
Keyword Searching and Browsing in Databases using BANKS
Database and Query Model
 Formal Database Model (Cond.)
 Overall relevance score
= Node weights + Edge Weight
Normalize in the range [0,1]
 Combine using weighting factor 
Additive: (1- ) E + N;
multiplicative: E * N
Seoyoung Ahn

Keyword Searching and Browsing in Databases using BANKS
Database and Query Model
 Query and Answer Model
 Query
A set of keywords e.g.{k1,k2,…kn}
A set of nodes Si = {S1,S2,…Sn}
Locate nodes matching search terms t1,t2,…tn
 Answer Model
A rooted directed tree connecting keyword nodes
 Relevance score of an answer tree
Relevance scores of it nodes and its edge weight
Seoyoung Ahn
Keyword Searching and Browsing in Databases using BANKS
Database and Query Model
 Answer Model
 A rooted directed tree connecting keyword
nodes
 Multiple answers
Ranked by proximity + prestige
Proximity  edges weights
Prestige  indegree of nodes
 Relevance score of an answer tree
Relevance scores of it nodes and its edge weight
Seoyoung Ahn
Keyword Searching and Browsing in Databases using BANKS
Database and Query Model
 Result of query “sudarshan soumen”
Seoyoung Ahn
Keyword Searching and Browsing in Databases using BANKS
Outline
 Introduction
 Database and Query Model
 Searching for the best answers

Backward expanding search algorithm
 Browsing features of BANKS
 Experiment
 Conclusion
Seoyoung Ahn
Keyword Searching and Browsing in Databases using BANKS
Searching for the best answers
 Backward expanding search algorithm

Offers a heuristic solution for incrementally computing query
results.

Assume that the graph fits in memory

Start at leaf nodes each containing a query keyword

Run concurrent single source shortest path algorithm from each
such node

Traverses the graph edges backwards

Confluence of backward paths identify answer tree roots
Output a node whenever it is on the intersection of the sets of nodes
reached from each keyword

Answer trees may not be generated in relevance order
Insert answers to a small buffer (heap)
Output highest ranked answer from buffer to user when buffer is full
Seoyoung Ahn
Keyword Searching and Browsing in Databases using BANKS
Searching for the best answers
 Model (Query : Charuta Sudarshan Roy )
BANKS: Keyword search…
paper
writes
Charuta
Seoyoung Ahn
S. Sudarshan
Prasan Roy
author
Keyword Searching and Browsing in Databases using BANKS
Outline
 Introduction
 Database and Query Model
 Searching for the best answers
 Browsing features of BANKS
 Experiment
 Conclusion
Seoyoung Ahn
Keyword Searching and Browsing in Databases using BANKS
Browsing
 BANKS system provides
 A rich interface to browse data stored in a relational
database
 Automatically generates browsable views of database
relations and query results
 Schema browsing and data browsing
 A hyperlink to the referenced tuple
 Templates for several predefined ways of displaying
data
Seoyoung Ahn
Keyword Searching and Browsing in Databases using BANKS
Browsing
 Data browsing
Seoyoung Ahn
Keyword Searching and Browsing in Databases using BANKS
Browsing
 Schema browsing
Seoyoung Ahn
Keyword Searching and Browsing in Databases using BANKS
Outline
 Introduction
 Database and Query Model
 Searching for the best answers
 Browsing features of BANKS
 Experiment
 Conclusion
Seoyoung Ahn
Keyword Searching and Browsing in Databases using BANKS
Error scores vs parameter choices
 The rankings are
relatively stable
across different
choices of parameter
values
  = 0.2 coupled
with log scaling of
edges weights
does best
Seoyoung Ahn
Keyword Searching and Browsing in Databases using BANKS
Outline
 Introduction
 Database and Query Model
 Searching for the best answers
 Browsing features of BANKS
 Experiment
 Conclusion
Seoyoung Ahn
Keyword Searching and Browsing in Databases using BANKS
Conclusion
 BANKS system
 provides an integrated browsing and keyword
querying system for relational databases
 allows users with no knowledge of database systems
or schema to query and browse relational database
with ease
Seoyoung Ahn
Keyword Searching and Browsing in Databases using BANKS