Multidatabase Querying by Context
Download
Report
Transcript Multidatabase Querying by Context
Querying Relational Databases without Explicit Joins
Ramon Lawrence, Ken Barker
Querying Relational Databases
without Explicit Joins
Ramon Lawrence
Ken Barker
University of Iowa
University of Calgary
[email protected]
[email protected]
Page 1
Querying Relational Databases without Explicit Joins
Ramon Lawrence, Ken Barker
Outline
Introduction,
Motivation, and Background
What
is wrong with SQL?
How can we replace SQL?
Querying
An
by Context using Semantic Names
example query
Query
Architecture
Term
Query
dictionary, X-Specs, query processor
Processor Algorithms
Field/table
mapping discovery, join selection
GUI extensions to simplify query construction
Future
work and conclusions
Page 2
Querying Relational Databases without Explicit Joins
Ramon Lawrence, Ken Barker
Introduction and Motivation
Despite
improvements in “core” database
technology, advances in database query
languages have not kept pace.
SQL is still the fundamental basis for most
access languages and tools.
SQL is often difficult to use for beginning users
or when formulating queries on large and
complex database schema.
Thus, the motivation for designing a high-level
query language for users that is also
backwards compatible with SQL.
Page 3
Querying Relational Databases without Explicit Joins
Ramon Lawrence, Ken Barker
What is wrong with SQL?
There
is nothing wrong with SQL. However,
SQL is not a simple query language for many
reasons:
Querying
by structure does not hide complexities
introduced due to database normalization.
Structures (fields and tables) may be assigned poor
names that do not adequately describe their
semantics.
Notion of a “join” is confusing for beginning users
especially when multiple joins are present.
SQL forces structural access which does not
provide logical query transparency and restricts
logical schema evolution.
Page 4
Querying Relational Databases without Explicit Joins
Ramon Lawrence, Ken Barker
SQL as a Standard
SQL
is an universally-accepted standard
supported by all major relational DB vendors.
Object-oriented
query languages are also developed
similar to SQL.
Most vendors provide graphical query tools that
simplify SQL construction.
SQL is “relatively” standardized across database
systems and platforms.
Conclusion:
Any new query language should
be backwards compatible with SQL to
guarantee its usefulness.
However,
it is desirable to totally hide SQL
formulation at the user-level.
Page 5
Querying Relational Databases without Explicit Joins
Ramon Lawrence, Ken Barker
Previous Work
There
have been several research systems and
prototypes in the area. They fall in 3 categories:
1)
Graphical Query Tools and Models
Query by example (QBE), database query tools
2)
Query by Word Phrases
SemQL (WordNet)
Information retrieval techniques, web searching
3)
User-directed Querying
Kaleidoscope (logic)
Page 6
Querying Relational Databases without Explicit Joins
Ramon Lawrence, Ken Barker
Query Issues
A
desirable query language should:
Allow
the user to be able to systematically and
deterministically produce results.
Provide a suitable formalism for representing and
querying data.
Hide the naming and structures of the database by
providing logical and physical access transparency.
Allow the user to browse the contents of the
database to determine query concepts.
Provide a graphical query interface.
Be generally applicable to multiple database systems
and/or data models.
Page 7
Querying Relational Databases without Explicit Joins
Ramon Lawrence, Ken Barker
Querying by Context (QBC)
Querying
by context (QBC) is a methodology for
querying relational databases by semantics.
Querying
is performed by selecting semantic names
that represent query concepts.
Semantic names are assigned once by the DBA to
describe database semantics.
Users
query the database by selecting semantic
names from the context view.
The
context view contains all concepts present in the
database with appropriate semantic names.
A
query processor maps the user’s selections
and criteria to an actual SQL query.
Page 8
Querying Relational Databases without Explicit Joins
Ramon Lawrence, Ken Barker
Querying by Context Example
Order Database Schema
Tables
Categories
Customers
Employees
OrderDetails
Order
Products
Shippers
Suppliers
Fields
CategoryID, CategoryName
CustomerID, CompanyName
EmployeeID, LastName, FirstName
OrderID, ProductID, UnitPrice, Quantity
OrderID, CustomerID, EmployeeID, OrderDate,
Shipvia
ProductID, ProductName, SupplierID, CategoryID
ShipperID, CompanyName
SupplierID, CompanyName
Page 9
Querying Relational Databases without Explicit Joins
Ramon Lawrence, Ken Barker
Querying by Context Example (2)
User
query: Retrieve all orders and customer
names where an order contains a product from
category ‘Produce’.
SQL:
SELECT O.OrderID, CU.CompanyName
FROM [Categories] AS C, [Customers] AS CU, [Orders]
AS O, [Products] AS P, [Order Details] AS OD
WHERE C.CategoryName = 'PRODUCE’ AND
(C.CategoryID = P.CategoryID) AND (P.ProductID =
OD.ProductID) AND (OD.OrderID = O.OrderID) AND
(O.CustomerID = CU.CustomerID)
Page 10
Querying Relational Databases without Explicit Joins
Ramon Lawrence, Ken Barker
Query by Context Processes
The
query architecture consists of three
separate processes:
Capture
process: independently extracts database
schema information and metadata into a XML
document called a X-Spec.
Integration
process: combines X-Spec(s) into a
structurally-neutral hierarchy of database concepts
called an integrated context view.
Query
process: allows the user to formulate queries
on the integrated view that are mapped by the query
processor to structural queries (SQL) and the
results are integrated and formatted.
Page 13
Querying Relational Databases without Explicit Joins
Ramon Lawrence, Ken Barker
The Integrated Context View
Before
semantic querying can begin, the DBA
must assign semantic names to each field and
table in the database.
Semantic
names and schema mappings are stored in an
XML document called an X-Spec.
Integration
of one or more X-Specs produces a
structurally-neutral hierarchy of concepts called
an integrated context view.
We
A
will only look at querying one database.
context view (CV) is a valid Universal Relation.
Each
field is assigned a semantic name which uniquely
identifies its semantic connotation.
Page 14
Query Architecture
Client
Client
Global Query Layer
• user’s view of integration
2) X-Spec Editor
Integrated Context View
X-Spec
Editor
Standard
Dictionary
Architecture Components:
1) Integrated Context View
• stores schema & metadata
• uses XML
Integration
Algorithm
3) Standard Dictionary
• terms to express semantics
4) Integration Algorithm
Query Processor and ODBC Manager
5) Query Processor
Subtransactions
X-Spec
X-Spec
Database
Database
Local Transactions
• combines X-Specs into
integrated context view
• accepts query on view
• determines data source
mappings and joins
• executes queries and
formats results
Querying Relational Databases without Explicit Joins
Ramon Lawrence, Ken Barker
What is a semantic name?
A
semantic name is a universal, semantic
identifier in a domain.
Similar
to a field name in the Universal Relation.
Semantics are guaranteed unique by construction.
Users
query based on the semantic names
provided by the DBA.
Systematic construction of semantic names
allows:
The
system to insert join conditions into query.
The context view to be organized hierarchically by
semantic concept to reduce burden on user.
Page 16
Querying Relational Databases without Explicit Joins
Ramon Lawrence, Ken Barker
User Query Formulation
Users
query the context view by:
browsing
and selecting semantic names from the
context view for selection/projection
specifying ordering and selection criteria
Users
do not have to specify joins between
concepts:
Some
joins are implicit by virtue of hierarchical
semantic names. (e.g. [Order;Product] of [Order]).
Joins are automatically inserted when required if
not directly specified by user.
User can explicitly direct joins by browsing CV
which automatically connects concepts.
Page 17
Querying Relational Databases without Explicit Joins
Ramon Lawrence, Ken Barker
Query Processor
Query
processor translates from semantic
query to SQL which requires:
Mapping
from semantic names to field and table
names using supplied X-Spec mappings.
One challenge is determining a particular field mapping to
use if multiple are present. (e.g. [Order] Id)
Insertion
of joins to preserve user query semantics.
If user specifies no relationship between concepts, chose
shortest semantic join path, otherwise select shortest
physical join path.
If
user specifies (some) join semantics, then join
determination is simplified.
System does not prevent user from specifying complex,
non-standard joins like outer joins.
Page 18
Querying Relational Databases without Explicit Joins
Ramon Lawrence, Ken Barker
Selecting Joins Using Join Graphs
Determining
joins to use is simplified by
defining a join graph.
A join graph is an undirected graph where:
Each
node Ni is a table in the database.
There is a link from node Ni to node Nj if there is a
join between the two tables.
A
join path is a sequence of joins connecting
two nodes in the graph.
A join tree is a set of joins connecting two or
more nodes.
A join matrix M stores the shortest join paths
between any two nodes (tables).
Page 19
Querying Relational Databases without Explicit Joins
Ramon Lawrence, Ken Barker
Join Graph for Order Database
1
Suppliers
N
N
1
Categories
Products
1
Products
N
OrderDetails
1
Shippers
N
N
1
N
Orders
N
Products
1
Employees
Products
1
Customers
Page 20
Querying Relational Databases without Explicit Joins
Ramon Lawrence, Ken Barker
Join Discovery Results
Join
discovery in a database with a connected,
acyclic join graph is simple as there exists only
one join tree for any set of tables.
For a cyclic join graph, there may exist more
than one join tree for a set of tables and each
tree may have different semantics.
Users
may eliminate possibilities when browsing
the context view or using direct specification.
Otherwise, system selects shortest join path first
considering semantic names, and then physical join
paths.
Page 21
Querying Relational Databases without Explicit Joins
Ramon Lawrence, Ken Barker
Reducing Join Ambiguity
Join
ambiguity can be reduced during query
formulation without the user’s knowledge.
Example:
Retrieve all orders and customer names
where an order contains a product in category
‘Produce’.
Semantic
names selected:
[Order]
Id, [Customer] Name, [Category] Name
no join ambiguity if acyclic graph, however…
Semantic
names with no ambiguity:
[Order]
Id, [Order;Customer] Name,
[Order;Product;Category] Name
Names
indicate path from starting context
“Order”.
Page 22
Querying Relational Databases without Explicit Joins
Ramon Lawrence, Ken Barker
Reducing Join Ambiguity (2)
From
a user interface perspective, the second
set of semantic names can be determined if:
When
the user browses the [Order] context, the
[Customer] name information is merged into the
[Order] context using a hidden connection on
[Order] Id.
Similarly, connect [Product] and [Category]
information through [Order] Id (Order to Product)
and [Product] Id (Product to Category).
As the user browses the view and uses these
connections, this information can be exploited to
determine appropriate join paths.
Page 23
Querying Relational Databases without Explicit Joins
Ramon Lawrence, Ken Barker
Query by Context Discussion
Important
benefits of querying by context:
System
table and field names are not presented to
the user who queries based on semantic names.
Database structure is not shown to the user.
Field and table mappings are automatically
determined based on X-Spec information.
Join conditions are inserted as needed when
available to join tables and will preserve user’s
query semantics as specified.
Structural neutrality of context view allows QBC to
be extended to non-relational databases and be
used as a query language for integrated databases.
Page 24
Querying Relational Databases without Explicit Joins
Ramon Lawrence, Ken Barker
Conclusions
User
querying can be simplified by semantic
naming of schema constructs (by the DBA) that
hierarchically organizes concepts into a view.
Query by context provides logical query
transparency that is suited for databases with
schema evolution or integrated systems.
Users are able to transparently query
integrated systems by concept instead of
structure.
Handling join ambiguity is an important
component in mapping to SQL.
Page 25
Querying Relational Databases without Explicit Joins
Ramon Lawrence, Ken Barker
Future Work
Continuing
to refine a prototype of the system
called Unity.
A comparison study of query by context versus
traditional SQL, database query tools, and
query by example.
The query processor is being extended to
resolve more complex queries.
Joins
on non-standard keys, joins across
databases, etc.
Page 26
Querying Relational Databases without Explicit Joins
Ramon Lawrence, Ken Barker
References
Publications:
Unity
- A Database Integration Tool, R. Lawrence
and K. Barker, TRLabs Emerging Technology
Bulletin, January 2000.
Multidatabase Querying by Context, R. Lawrence
and K. Barker, DataSem2000, pg 127-136, Oct. 2000.
Using Unity to Semi-Automatically Integrate
Relational Schema, Demonstration to appear at
ICDE’2002.
Integrating Relational Database Schemas using a
Standardized Dictionary, SAC’2001 - ACM
Symposium on Applied Computing, March, 2001.
Further
Information:
http://www.cs.uiowa.edu/~rlawrenc/
Page 27