Swinburne Marketing Strategy

Download Report

Transcript Swinburne Marketing Strategy

Processing XML Keyword Search by
Constructing Effective Structured Queries
Jianxin Li, Chengfei Liu, Rui Zhou and Bo Ning
Swinburne University of Technology, Australia
Outline

Motivation of Keyword Search in XML

Brief Review of Related Work

Existing Problems

Construct Structured Query Templates

Ranking Function

Processing Algorithms

Conclusions
Motivation of XML Keyword Search

Keyword search is easy-to-use
 Users don’t need to know the structure of XML data and
specific query languages.
 The XML data with different structures can be searched
equivalently by a keyword query because it doesn’t specify
the structures of the retrieved results.
Brief Review of Related Work

We focus on 4 references using label and term as
keyword query format:
 [YunyaoLi2004VLDB] Schema-Free XQuery.
 [DanielaFlorescu2002ComputerNetworks] Integrating
keyword search into XML query processing.
 [SaraCohen2003VLDB] XSEarch: A semantic search engine
for XML.
 [WeidongYang2007CIT] Schema-aware keyword search
over xml streams.

Other relevant work can be found in our paper.
Brief Review of Related Work

All the four work utilized label and term as keyword
query format.

The difference: the first three work shared the similar
basic strategy that first retrieves the relevant keyword
lists and then merges them into the results; while the
last one first generate a big template that covers all the
kinds of results w.r.t. XML schema and then cache the
possible results over xml streams.
Template-based strategy can obtain better
performance [WeidongYang2007CIT]!
Existing Problems

[WeidongYang2007CIT] was
used to query over XML streams,
which is not enough because of the challenges:
 Different templates may exist in one XML data repository.
 Users prefer to see part of the results, e.g., top k results.
 Domain knowledge can be helped to process the labels with
the same meaning.

Therefore, it is required to study the problem of applying
template-based keyword search strategy to XML data
repository.
Construct Structured Query Templates

Example: There are two data sources that conform to t1 and
t2 respectively.
Schema t1
Schema t2
Keyword query – (year:2006, title:xml, author:philip)
Construct Structured Query Templates

Identifying context of keywords
 Determine master entities using labels in keyword query and XML
schema.
 Generate FOR clause for each entity.
 Judge the occurrences of every label under each master entity.
 Once a time – Generate WHERE clauses
 More than once – First cluster and then generate WHERE clauses.

Step 1: determine master entity and its
corresponding label set
 Q1 = “For $b in bibliography/books/book”
 Q2 = “For $a in bibliography/articles/article”

Schema t1
Step 2: only one occurrence of each label in
each master entity.
 Q1 += “Where $b/year=‘2006’ and
$b/title.contains(xml) and
$b/author.contains(philip)”
 Q2 += “Where $a/year=‘2006’ and
$a/title.contains(xml) and
$a/author.contains(philip)”
Keyword query – (year:2006, title:xml, author:philip)

Step 1: determine master entity and its
corresponding label set
 Q = “For $bi in bibliography/bib”
Schema t2

Step 2: only two occurrences of each label in
the master entity. Cluster title and author using
book and article respectively
 Q1 += Q + “For $bo in $bi/book”
 Q2 += Q + “For $a in $bi/article”

Step 3: only one occurrence of each label in
each cluster.
 Q1 += “Where $bi/year=‘2006’ and
$bo/title.contains(xml) and
$bo/author.contains(philip)”
 Q2 …
Keyword query – (year:2006, title:xml, author:philip)
Construct Structured Query Templates

Identifying returned nodes
 Step1: If the cardinality of a master entity satisfies “*” and no
cluster operation is activated, we take the master entity as a
return node in constructed queries;
 Step 2: If the cardinality of a master entity satisfies “*” and
clusters are generated, we first check the root node of each
cluster in a recursive procedure (back to step 1);
 Step 3: If the cardinality of a master entity does not satisfy
“*”, we will probe its ancestor nodes one by one until this
kind of node exists or the root of the xml schema.
Schema t1

Schema t2
Master entities are the returned nodes.
 Q1 += “$b”
 Q2 += “$a”

Roots of clusters are the returned nodes.
 Q1 += “$bo”
 Q2 += “$a”
The constructed queries can be read in our paper!
Keyword query – (year:2006, title:xml, author:philip)
Ranking Function
Score ( A , q i ) 
1
n
n
 (vi , ti )
i 1
LengthOfPa th ( v i , v m )

 vm is the master entity nodes;
 ω(vi, ti) is calculated by using tf*idf weight model.
Feature of the function: The Score() consists of two parts
ContextScore() and tf*idf weight, and the former is the upper
bound of the score of the results.
Processing Strategy

Algorithm 1 is used to generate structured queries with
their corresponding context score.

Algorithm 2 is used to schedule the query plan
according to the conditions:
 Users’ requirements, e.g., number of results;
 Context scores of all generated queries;
 And the intermediate results.
Experiments

Dataset:
 Sigmod record
 three variant of DBLP

Keyword Queries:
 q1 (author:David, title:XML)
 q2 (year:2002, title:XML)
Experimental Results
q1
q1(k = 10)
q2
q2(k = 20)
Conclusions

XBridge is proposed to process keyword query over
XML data repository, which can efficiently find the top k
results by evaluating generated structured queries.

A precise ranking function is provided to evaluate the
relevance of the results.

Limitation of this work:
 We take XML schema as tree patterns;
 We didn’t consider reference relationships of XML data.