Automatic Database Integration
Download
Report
Transcript Automatic Database Integration
Automatic Data
Ramon Lawrence
University of Manitoba
[email protected]
base Integration
Ken Barker
University of Calgary
[email protected]
Summary
Database
integration allows systems within a
company or on the WWW to interoperate.
Current integration solutions require manual
coding which is costly and time-consuming.
Our integration architecture uses XML and a
standard dictionary to automate integration.
Database systems are integrated using 3
processes: capture, integration, and query.
The architecture is implemented in a software
package called Unity.
Unity is capable of automatic database
integration and querying using ODBC.
Applications include data warehouse design,
web site integration, and systems integration.
Integration Architecture
Client
Client
Multidatabase Layer
• user’s view of integration
2) X-Spec Editor
Integrated Context View
X-Spec
Editor
Standard
Dictionary
Architecture Components:
1) Integrated Context View
• stores schema & metadata
• uses XML
Integration
Algorithm
3) Standard Dictionary
• terms to express semantics
4) Integration Algorithm
Query Processor and ODBC Manager
5) Query Processor
Subtransactions
X-Spec
X-Spec
Database
Database
Local Transactions
• combines X-Specs into
integrated context view
• accepts query on view
• determines data source
mappings and joins
• executes queries and
formats results
Integration Processes
The
integration architecture consists of three
separate processes:
Capture
process: independently extracts database
schema information and metadata into a XML
document called a X-Spec.
Integration
process: combines X-Specs into a
structurally-neutral hierarchy of database concepts
called an integrated context view.
Query
process: allows the user to formulate queries on
the integrated view that are mapped by the query
processor to structural queries (SQL) and the results
are integrated and formatted.
Architecture Components
The
architecture consists of four components:
A
standard dictionary (SD) to capture data semantics
SD terms are used to build semantic names describing
semantics of schema elements.
X-Specs
for storing data semantics.
relational database info. stored and transmitted using XML
stores semantic names to describe schema elements
Integration
Algorithm
Identical concepts in different databases are identified by
similar semantic names.
Produces an integrated view of all database concepts.
Query
Processor
allows the user to formulate queries on the view
translates from semantic names in integrated view to SQL
queries and integrates and formats results
involves determining correct field and table mappings
and discovery of join conditions and join paths
Architecture Discussion
The
architecture automatically integrates
relational schemas into a multidatabase
Desirable properties:
individual
mappings - information sources integrated
one-at-a-time and independently
integrated view constructed for query transparency user queries system by semantics instead of structure
handles schema conflicts - including semantic,
structural, and naming conflicts
automated integration - integrated view constructed
efficiently and automatically
no wrapper or mediator software is required
transparent querying - users issue semantic queries
which are translated to SQL by the query processor
no changes are performed on underlying databases
Conclusions
Automatic
database integration is possible by
using a standard term dictionary and defining
semantic names for schema elements.
Integration of data sources has applications to
the WWW and construction of data warehouses.
Users are able to transparently query integrated
systems by concept instead of structure.
Future Work:
Test
the system in large industrial projects.
Improve the Unity implementation and develop an
integration component for a web browser.
Increase the efficiency of the query processor.
Allow distributed updates and global updates on all
databases.
Background &
Motivation
Why is Integration Required?
There
are many integration environments:
operational
systems within an organization
system integration during company merger
data warehouses, Intranets, and the WWW
Users
require information from many data
sources which often do not work together.
Companies require a global view of their entire
operations which may be present in numerous
operational databases for different departments
and distributed geographically.
Business-to-Business (B-to-B) and Business-toConsumer (B-to-C) E-commerce demand
integration of web databases with production
systems for efficiency and productivity.
What is Integration?
Database
integration requires conflict resolution
during schema integration.
A schema is a description of the data
organization and format in a database.
Schema integration combines database schemas
into a unified view of the data.
Constructing
an integrated view is difficult because
databases will store different types of data, in varying
formats, with different meanings, and reference it
using different names. Subsequently, the construction
of the integrated view must handle the different
mechanisms for storing data (structural conflicts), for
referencing data (naming conflicts), and for attributing
meaning to the data (semantic conflicts). Data is
integrated at both the schema level (the description of
the data) and the data level (individual data instances).
What is the Current Solution?
SQL
and multidatabase query languages:
querying
Research
do
is difficult for numerous databases
systems:
not use standardization so cannot be automated
Industrial
standards:
XML,
BizTalk, E-commerce portals
apply to limited domains/industries
require standard structures and database changes
Current
techniques for database integration have
some of these problems:
require
integrator to understand all databases
integration process is manual
do not hide system complexity from the user
force changes on the existing database systems
Unity
Software
Unity Overview
Unity
is a software package that implements the
integration architecture with a GUI.
Developed using Microsoft Visual C++ 6 and
Microsoft Foundation Classes (MFC).
Unity allows the user to:
construct
and modify standard dictionaries
build X-Specs to describe data sources
integrate X-Specs into an integrated view
transparently query integrated systems using ODBC
and automatically generate SQL transactions
A
screen-shot of Unity editing a standard
dictionary is below.
Unity is available for demonstration and
distribution.
References
Publications:
Unity
- A Database Integration Tool, R. Lawrence and
K. Barker, TRLabs Emerging Technology Bulletin,
January 2000.
Multidatabase Querying by Context, R. Lawrence and
K. Barker, DataSem2000, pages 127-136, Oct. 2000.
Integrating Relational Database Schemas using a
Standardized Dictionary, To appear in SAC’2001 - ACM
Symposium on Applied Computing, March, 2001.
Sponsors:
NSERC,
Further
TRLabs
Information:
http://www.cs.umanitoba.ca/~umlawren/