Automatic Database Integration

Download Report

Transcript Automatic Database Integration

Automatic Data
Ramon Lawrence
University of Manitoba
[email protected]
base Integration
Ken Barker
University of Calgary
[email protected]
Summary
 Database
integration allows systems within a
company or on the WWW to interoperate.
 Current integration solutions require manual
coding which is costly and time-consuming.
 Our integration architecture uses XML and a
standard dictionary to automate integration.
 Database systems are integrated using 3
processes: capture, integration, and query.
 The architecture is implemented in a software
package called Unity.
 Unity is capable of automatic database
integration and querying using ODBC.
 Applications include data warehouse design,
web site integration, and systems integration.
Integration Architecture
Client
Client
Multidatabase Layer
• user’s view of integration
2) X-Spec Editor
Integrated Context View
X-Spec
Editor
Standard
Dictionary
Architecture Components:
1) Integrated Context View
• stores schema & metadata
• uses XML
Integration
Algorithm
3) Standard Dictionary
• terms to express semantics
4) Integration Algorithm
Query Processor and ODBC Manager
5) Query Processor
Subtransactions
X-Spec
X-Spec
Database
Database
Local Transactions
• combines X-Specs into
integrated context view
• accepts query on view
• determines data source
mappings and joins
• executes queries and
formats results
Integration Processes
 The
integration architecture consists of three
separate processes:
 Capture
process: independently extracts database
schema information and metadata into a XML
document called a X-Spec.
 Integration
process: combines X-Specs into a
structurally-neutral hierarchy of database concepts
called an integrated context view.
 Query
process: allows the user to formulate queries on
the integrated view that are mapped by the query
processor to structural queries (SQL) and the results
are integrated and formatted.
Architecture Components
 The
architecture consists of four components:
A
standard dictionary (SD) to capture data semantics
 SD terms are used to build semantic names describing
semantics of schema elements.
 X-Specs
for storing data semantics.
 relational database info. stored and transmitted using XML
 stores semantic names to describe schema elements
 Integration
Algorithm
 Identical concepts in different databases are identified by
similar semantic names.
 Produces an integrated view of all database concepts.
 Query
Processor
 allows the user to formulate queries on the view
 translates from semantic names in integrated view to SQL
queries and integrates and formats results


involves determining correct field and table mappings
and discovery of join conditions and join paths
Architecture Discussion
 The
architecture automatically integrates
relational schemas into a multidatabase
 Desirable properties:
 individual
mappings - information sources integrated
one-at-a-time and independently
 integrated view constructed for query transparency user queries system by semantics instead of structure
 handles schema conflicts - including semantic,
structural, and naming conflicts
 automated integration - integrated view constructed
efficiently and automatically
 no wrapper or mediator software is required
 transparent querying - users issue semantic queries
which are translated to SQL by the query processor
 no changes are performed on underlying databases
Conclusions
 Automatic
database integration is possible by
using a standard term dictionary and defining
semantic names for schema elements.
 Integration of data sources has applications to
the WWW and construction of data warehouses.
 Users are able to transparently query integrated
systems by concept instead of structure.
 Future Work:
 Test
the system in large industrial projects.
 Improve the Unity implementation and develop an
integration component for a web browser.
 Increase the efficiency of the query processor.
 Allow distributed updates and global updates on all
databases.
Background &
Motivation
Why is Integration Required?
 There
are many integration environments:
 operational
systems within an organization
 system integration during company merger
 data warehouses, Intranets, and the WWW
 Users
require information from many data
sources which often do not work together.
 Companies require a global view of their entire
operations which may be present in numerous
operational databases for different departments
and distributed geographically.
 Business-to-Business (B-to-B) and Business-toConsumer (B-to-C) E-commerce demand
integration of web databases with production
systems for efficiency and productivity.
What is Integration?
 Database
integration requires conflict resolution
during schema integration.
 A schema is a description of the data
organization and format in a database.
 Schema integration combines database schemas
into a unified view of the data.
 Constructing
an integrated view is difficult because
databases will store different types of data, in varying
formats, with different meanings, and reference it
using different names. Subsequently, the construction
of the integrated view must handle the different
mechanisms for storing data (structural conflicts), for
referencing data (naming conflicts), and for attributing
meaning to the data (semantic conflicts). Data is
integrated at both the schema level (the description of
the data) and the data level (individual data instances).
What is the Current Solution?
 SQL
and multidatabase query languages:
 querying
 Research
 do
is difficult for numerous databases
systems:
not use standardization so cannot be automated
 Industrial
standards:
 XML,
BizTalk, E-commerce portals
 apply to limited domains/industries
 require standard structures and database changes
 Current
techniques for database integration have
some of these problems:
 require
integrator to understand all databases
 integration process is manual
 do not hide system complexity from the user
 force changes on the existing database systems
Unity
Software
Unity Overview
 Unity
is a software package that implements the
integration architecture with a GUI.
 Developed using Microsoft Visual C++ 6 and
Microsoft Foundation Classes (MFC).
 Unity allows the user to:
 construct
and modify standard dictionaries
 build X-Specs to describe data sources
 integrate X-Specs into an integrated view
 transparently query integrated systems using ODBC
and automatically generate SQL transactions
A
screen-shot of Unity editing a standard
dictionary is below.
 Unity is available for demonstration and
distribution.
References
 Publications:
 Unity
- A Database Integration Tool, R. Lawrence and
K. Barker, TRLabs Emerging Technology Bulletin,
January 2000.
 Multidatabase Querying by Context, R. Lawrence and
K. Barker, DataSem2000, pages 127-136, Oct. 2000.
 Integrating Relational Database Schemas using a
Standardized Dictionary, To appear in SAC’2001 - ACM
Symposium on Applied Computing, March, 2001.
 Sponsors:
 NSERC,
 Further
TRLabs
Information:
 http://www.cs.umanitoba.ca/~umlawren/