Ph.D. Comprehensive Exam

Querying for Information Integration:
How to go from an Imprecise Intent
to a Precise Query?
Aditya Telang
Sharma Chakravarthy, Chengkai Li
Motivation
• “Retrieve castles near London that are
reachable by train in less than 2 hours”
• “Find 3-bedroom houses in Houston within 2
miles of a school and within 5 miles of a
highway and priced under $250,000”
• “Retrieve French restaurants within 1 mile of
IMAX Theater in Dallas, Texas”
• …
Current Scenario
• “Retrieve castles near London that are
reachable by train in less than 2 hours”
- Decision Making Process
- Manually Combine Results to arrive at a decision
[Figure: separate sources consulted by hand: London train schedules, castles near London, trains from London]
Ideal Scenario
Intent: “Retrieve castles near London that are reachable by train in less than 2 hours”
→ Information Integration System
→ Actual results for the intent
The InfoMosaic Approach
Query Specification
Query: Castle within 2 hours by train
from London
• Query: “Bank within 1 mile of ‘University of
Texas, Arlington’”
How to specify a query?
• Search Method (e.g., Google) –
– Just needs to search for the ‘keyword’ in a set of
documents
– Get list of documents and post-process (rank,
cluster, classify, etc.)
– In an Integration scenario, this doesn’t work
• ‘bank’ – D1
• ‘University of Texas, Arlington’ – D2
• 1 (out of 1 mile) is ignored
• Intersecting the returned documents will not generate the desired results
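The failure described above can be reproduced with a minimal sketch. The three documents below are made up for illustration, and `matching_docs` is a hypothetical helper, not part of any search engine: it builds the hit lists D1 and D2 and intersects them, with the distance "1 mile" playing no role.

```python
# Toy corpus; ids and texts are invented for this illustration.
DOCS = {
    "d1": "First National Bank opens a branch in Dallas",
    "d2": "University of Texas, Arlington announces fall enrollment",
    "d3": "River bank cleanup organized by University of Texas, Arlington students",
}

def matching_docs(phrase, docs):
    """Return the ids of documents whose text contains the phrase (case-insensitive)."""
    phrase = phrase.lower()
    return {doc_id for doc_id, text in docs.items() if phrase in text.lower()}

# D1: documents for 'bank'; D2: documents for the university.
d1_hits = matching_docs("bank", DOCS)
d2_hits = matching_docs("University of Texas, Arlington", DOCS)

# The '1 mile' constraint is never used: the search only intersects hit lists.
result = sorted(d1_hits & d2_hits)
print(result)
```

The only document in the intersection mentions both phrases, yet it says nothing about a bank located within 1 mile of the university.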
How to specify a query?
• Database Method (e.g., SQL) –
– Too rigid
– Need to know database (or source) and its
corresponding attributes
• SELECT T1.a1, T2.a2 FROM T1, T2 WHERE …
– Web is not organized as a database hence exact
mapping between sources and attributes is not
feasible and not available
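A small sketch of the rigidity, using Python's built-in `sqlite3` module. The table `castle_db`, its columns and the single row are assumptions made up for this example: the query succeeds only when the exact table and attribute names are known, and a reasonable guess at an attribute name is rejected.

```python
import sqlite3

# In-memory database with a hypothetical schema (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE castle_db (name TEXT, location TEXT)")
conn.execute("INSERT INTO castle_db VALUES ('Windsor Castle', 'Windsor')")

# Works because the table name and the attribute 'location' are known in advance.
rows = conn.execute(
    "SELECT name FROM castle_db WHERE location = 'Windsor'"
).fetchall()

# A user who guesses 'place' instead of 'location' gets an error, not an answer.
try:
    conn.execute("SELECT name FROM castle_db WHERE place = 'Windsor'")
    guess_error = None
except sqlite3.OperationalError as exc:
    guess_error = str(exc)

print(rows)
print(guess_error)
```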
How to specify a query?
• Natural Language –
– Ideal mechanism
– Inherently hard considering ambiguities of natural
language.
– school – institution for education; group of fish
– Mechanisms such as Question-Answering
frameworks focus on sophisticated language
models built for specific domains independently.
– Incorrect assumption in an integration scenario
Query Specification
• Query – “castle near London”
SELECT castle.name, …
FROM castle_DB
WHERE
castle.location = ‘London’
Relation containing tuples
List of documents retrieved from Web containing text –
“castle near London”
Query Specification
• Query – “castle near London”
SELECT castle.*
WHERE castle.place = “London”
Information
Integration
No idea about user intent –
Castle:= building, move in chess, … ?
No idea about source, schema, attributes, etc.
No idea about how to pose a query
Proposed Approach
• Approach: refine-as-you-input
• Approach: verify-after-input
[Figure: number of interactions (1 to 4) versus amount of input, from succinct (keyword) to verbose (natural language, SQL), with points labeled refine-as-you-input, verify-after-input and Ideal Formulation]
Approach 1: Refine-as-you-input
• Based on most popular paradigm of querying
used today – keyword search
– Input: Set of keywords/concepts (e.g., castle, train, …)
– Output: Set of one or more precise structured queries
– Challenge:
• Keyword Resolution – entity, attribute, value?
• Generating Query from minimal information
– Problem:
• Could result in too many non-relevant queries
– Positive:
• Paradigm accepted by Web, IR and even DB community !!!
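One way to picture the keyword-resolution challenge is the sketch below. It is not the InfoMosaic implementation: `CATALOG`, `resolve` and `candidate_queries` are hypothetical names, and the tiny catalog of entities, attributes and values is invented. Each keyword is resolved to every role it could play, and every combination of roles yields one candidate query.

```python
from itertools import product

# Hypothetical catalog: entity -> attribute -> known values (illustration only).
CATALOG = {
    "castle": {"location": {"london", "windsor"}},
    "train": {"origin": {"london"}, "destination": {"windsor"}},
}

def resolve(keyword):
    """List every (role, entity, detail) reading of a keyword."""
    kw = keyword.lower()
    found = []
    for entity, attrs in CATALOG.items():
        if kw == entity:
            found.append(("entity", entity, None))
        for attr, values in attrs.items():
            if kw == attr:
                found.append(("attribute", entity, f"{entity}.{attr}"))
            if kw in values:
                found.append(("value", entity, f"{entity}.{attr} = '{kw}'"))
    return found

def candidate_queries(keywords):
    """Build one candidate SQL string per combination of keyword readings."""
    queries = []
    for combo in product(*(resolve(k) for k in keywords)):
        tables = sorted({entity for _, entity, _ in combo})
        columns = [d for role, _, d in combo if role == "attribute"] or ["*"]
        preds = [d for role, _, d in combo if role == "value"]
        sql = f"SELECT {', '.join(columns)} FROM {', '.join(tables)}"
        if preds:
            sql += " WHERE " + " AND ".join(preds)
        queries.append(sql)
    return queries

queries = candidate_queries(["castle", "london"])
for q in queries:
    print(q)
```

Even with two keywords and two entities, the second candidate joins `castle` with `train` on nothing at all, which hints at how quickly non-relevant queries can accumulate.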
Approach 1: Refine-as-you-input
User Interaction
Approach 2: Verify-after-input
• Based on a rigorous method of formulating
queries – similar to SQL
– Input – user-filled template
– Output – single precise query
– Problem
• Users don’t like filling too many details
• Coming up with a unique template across domains
– Positive
• Less ambiguous
• Reduced number of user interactions
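A minimal sketch of the template idea, under the assumption that a template names one entity, the attributes to return and a list of conditions. The class `QueryTemplate`, its fields and the `house` schema are hypothetical. A completely filled template maps to exactly one parameterized query, which is where the reduced ambiguity comes from.

```python
from dataclasses import dataclass, field

ALLOWED_OPS = {"=", "<", "<=", ">", ">="}

@dataclass
class QueryTemplate:
    """Hypothetical user-filled template: one entity, outputs, and conditions."""
    entity: str
    output: list
    conditions: list = field(default_factory=list)  # (attribute, operator, value)

    def to_sql(self):
        """Return a single parameterized query, or raise if the template is incomplete."""
        if not self.entity or not self.output:
            raise ValueError("template is incomplete: entity and output are required")
        clauses, params = [], []
        for attr, op, value in self.conditions:
            if op not in ALLOWED_OPS:
                raise ValueError(f"unsupported operator: {op}")
            clauses.append(f"{attr} {op} ?")
            params.append(value)
        sql = f"SELECT {', '.join(self.output)} FROM {self.entity}"
        if clauses:
            sql += " WHERE " + " AND ".join(clauses)
        return sql, params

template = QueryTemplate(
    entity="house",
    output=["address", "price"],
    conditions=[("bedrooms", "=", 3), ("city", "=", "Houston"), ("price", "<", 250000)],
)
sql, params = template.to_sql()
print(sql)
print(params)
```

The cost is visible as well: the user has to supply every field before a query exists, which is the "too many details" problem noted above.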
Approach 2: Verify-after-input
Evaluation Plan
• Testing the approaches on an RDBMS where the
schema and output are known
• Actual user studies
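The RDBMS test could look roughly like the sketch below, which uses Python's built-in `sqlite3`. The `house` table, its rows, the expected answer and the choice of precision and recall as measures are all assumptions for illustration; the slides do not specify a metric. A generated query is run against a database whose correct answer is known, and the two result sets are compared.

```python
import sqlite3

def run(conn, sql):
    """Execute a query and return its rows as a set of tuples."""
    return set(conn.execute(sql).fetchall())

def precision_recall(returned, expected):
    """Compare returned rows against the known correct rows."""
    hits = len(returned & expected)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(expected) if expected else 0.0
    return precision, recall

# Known schema and data (invented for this example).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE house (address TEXT, city TEXT, bedrooms INTEGER, price INTEGER)"
)
conn.executemany(
    "INSERT INTO house VALUES (?, ?, ?, ?)",
    [
        ("1 Elm St", "Houston", 3, 240000),
        ("2 Oak St", "Houston", 3, 300000),
        ("3 Pine St", "Dallas", 3, 200000),
    ],
)

# Known output for the intent "3-bedroom houses in Houston priced under $250,000".
expected = {("1 Elm St",)}

# A generated query that dropped the city constraint.
generated = "SELECT address FROM house WHERE bedrooms = 3 AND price < 250000"
returned = run(conn, generated)

precision, recall = precision_recall(returned, expected)
print(precision, recall)
```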
Related Work
Future Work
• Perform extensive experiments to prove the
validity of the proposed approaches
• Address other issues in information
integration
• Current focus – Ranking [Telang:DBRank’07]
Thank You !