Data Integration through database federation
Overview of Database Federation and IBM Garlic Project
Presented by Xiaofen He
References
Data Integration through Database Federation, L. M. Haas, E. T. Lin, M. A. Roth
Towards Heterogeneous Multimedia Information Systems: The Garlic Approach, IBM Almaden Research Center
Outline
Approaches to data integration
Database Federation in IBM DB2
IBM Garlic Project
Various Approaches to Data Integration (1)
Application-specific solutions:
Always work
Expensive, fragile and hard to extend
Application-integration frameworks:
Protection from changes of data source
Do not address data integration issues
Workflow frameworks:
Limited support for comparing and manipulating data
Various Approaches to Data Integration (2)
Digital libraries, meta search engines:
No combination of data
Data warehousing:
Powerful, high-level query language
May not be possible or cost effective; loss of functionality
Database federation:
Virtual data warehouse
Performance tradeoff (query rewrite & cost-based optimization)
Database Federation
Basics of Database Federation
DB2 styles of database federation
Determining the style of database federation to use
Basics of Database Federation
What is ‘database federation’ (DF)?
Also known as ‘mediation’
An architecture in which middleware, consisting of a relational database management system, provides uniform access to a number of heterogeneous data sources
Common Mediator Architecture
Figure 1. Common Mediator Architecture (components shown: data sources, wrappers, mediator)
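The mediator and wrapper roles in Figure 1 can be illustrated with a small sketch. This is not DB2 or Garlic code: the class names (`Wrapper`, `CsvLikeWrapper`, `DictWrapper`, `Mediator`), the sample rows, and the single `join` operation are all assumptions made up for illustration. Each wrapper translates one source's native format into uniform rows, and the mediator combines rows across sources without knowing how each source stores them.

```python
from typing import Dict, Iterable, List, Protocol

Row = Dict[str, object]


class Wrapper(Protocol):
    """Hypothetical wrapper interface: expose a source as uniform rows."""

    def scan(self) -> Iterable[Row]: ...


class CsvLikeWrapper:
    """Wraps a source that exposes comma-separated text lines."""

    def __init__(self, lines: List[str], columns: List[str]) -> None:
        self.lines = lines
        self.columns = columns

    def scan(self) -> Iterable[Row]:
        for line in self.lines:
            yield dict(zip(self.columns, line.split(",")))


class DictWrapper:
    """Wraps a source that exposes a key-value mapping."""

    def __init__(self, mapping: Dict[str, str], key_col: str, val_col: str) -> None:
        self.mapping = mapping
        self.key_col = key_col
        self.val_col = val_col

    def scan(self) -> Iterable[Row]:
        for key, value in self.mapping.items():
            yield {self.key_col: key, self.val_col: value}


class Mediator:
    """Gives uniform access to all registered wrappers."""

    def __init__(self) -> None:
        self.wrappers: Dict[str, Wrapper] = {}

    def register(self, name: str, wrapper: Wrapper) -> None:
        self.wrappers[name] = wrapper

    def join(self, left: str, right: str, on: str) -> List[Row]:
        # Hash join performed in the middleware, across two different sources.
        index: Dict[object, List[Row]] = {}
        for r in self.wrappers[right].scan():
            index.setdefault(r[on], []).append(r)
        out: List[Row] = []
        for l in self.wrappers[left].scan():
            for r in index.get(l[on], []):
                out.append({**l, **r})
        return out


mediator = Mediator()
mediator.register("employees", CsvLikeWrapper(["1,Ada", "2,Linus"], ["emp_id", "name"]))
mediator.register("offices", DictWrapper({"1": "Almaden", "3": "Zurich"}, "emp_id", "office"))
result = mediator.join("employees", "offices", on="emp_id")
print(result)
```

The application only talks to the mediator; adding a new kind of source means writing one more wrapper class, which is the extensibility argument made for this architecture.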
Goals of IBM DF
Transparency
Support heterogeneity
A high degree of function
Extensibility
Openness
Autonomy of individual data sources
Query optimization
DB2 architecture for DF
Figure 2. DB2 architecture for database federation
DB2 Styles of federation
Scalar UDFs:
Federating function
Table UDFs:
Federating data
Wrappers:
Federating function and data
Figure 3. Different styles of federation
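The difference between federating function and federating data can be sketched with Python's built-in `sqlite3` module. SQLite only stands in for the federated engine here; none of this is DB2 syntax. The external scoring routine, the external row source, and all table and column names are invented for illustration. SQLite's Python binding has no table-valued UDFs, so the table UDF style is only emulated by loading the external rows into a temporary table.

```python
import sqlite3


def external_similarity(a: str, b: str) -> float:
    """Hypothetical external function source: word-overlap (Jaccard) score."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def external_rows():
    """Hypothetical external data source yielding (name, weight) rows."""
    yield ("aspirin", 180.16)
    yield ("caffeine", 194.19)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER, title TEXT)")
conn.executemany(
    "INSERT INTO docs VALUES (?, ?)",
    [(1, "database federation"), (2, "garlic wrappers")],
)

# Scalar UDF style: the engine calls an external *function* once per row.
conn.create_function("similarity", 2, external_similarity)
scored = conn.execute(
    "SELECT id, similarity(title, 'database federation overview') "
    "FROM docs ORDER BY id"
).fetchall()

# Table UDF style (emulated): external *data* becomes rows the engine can query.
conn.execute("CREATE TEMP TABLE compounds (name TEXT, weight REAL)")
conn.executemany("INSERT INTO compounds VALUES (?, ?)", external_rows())
heavy = conn.execute("SELECT name FROM compounds WHERE weight > 190").fetchall()

print(scored)
print(heavy)
```

A wrapper, the third style, would cover both cases at once: the engine could query the external rows and also hand parts of the query to the source's own functions.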
Wrapper Architecture
Multi-server integration
Multi-dataset integration and multi-operation integration
Optimization
Transactional integration
Determining the style of DF to use
Figure 4. Determining the style of federation to use
IBM Garlic Project
Introduction
Overview
Architecture
Repositories and Databases
The Garlic Data Model
Queries in Garlic
Interfaces and Applications
Conclusion
Introduction
Need
Goal
Object-Oriented Model
Garlic Overview
Figure 5. Garlic System Architecture (components shown: C++ applications and the Query/Browser; Query Services & Runtime System; Metadata Repository; repository wrappers over the data repositories and the Complex Object Repository)
Garlic Overview
Repositories
Repository type
Repository instance
Repository manager
Databases
Global schema
Wrapper schemas (local schemas)
Garlic Data Model (1)
ODMG-93 object model
Object identity
Objects and values
Inheritance
Weak identity – unique, not necessarily immutable
Legacy references
Implementation-constrained reference
Garlic Data Model (2)
Extensions
Degree of support for alternative implementations of interfaces
Type system flexibility - conformity
Object-appropriate view definition facility
Object-Centered Views
Enhance objects by adding or hiding some of their attributes/methods.
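The idea of an object-centered view, enhancing an object by adding or hiding attributes, can be sketched as follows. This is an illustration of the concept only, not Garlic's view definition facility: the `ObjectView` class, the `Painting` type, and its fields are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, Optional


@dataclass
class Painting:
    """Hypothetical underlying object stored in some repository."""
    title: str
    artist: str
    insured_value: int


class ObjectView:
    """View over one object: hides some attributes, adds derived ones."""

    def __init__(
        self,
        base: Any,
        hidden: Iterable[str] = (),
        added: Optional[Dict[str, Callable[[Any], Any]]] = None,
    ) -> None:
        self._base = base
        self._hidden = set(hidden)
        self._added = dict(added or {})

    def __getattr__(self, name: str) -> Any:
        # Only called when normal lookup fails, so _base/_hidden/_added
        # (stored in the instance dict) never reach this method.
        if name.startswith("_"):
            raise AttributeError(name)
        if name in self._added:
            return self._added[name](self._base)
        if name in self._hidden:
            raise AttributeError(f"{name!r} is hidden by this view")
        return getattr(self._base, name)


painting = Painting("Irises", "van Gogh", 50_000_000)
view = ObjectView(
    painting,
    hidden={"insured_value"},
    added={"label": lambda o: f"{o.title} ({o.artist})"},
)
print(view.title)   # passes through to the underlying object
print(view.label)   # attribute added by the view
```

The view keeps a reference to the underlying object rather than copying it, so the view and the object stay the same entity seen through different interfaces.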
Queries in Garlic
Query language:
Object-oriented extension of SQL
Integrating approximate match query semantics with traditional exact match query semantics
Query processing:
Decomposition
Interesting question: how to characterize the query power of a repository, in terms of the language subset that its wrapper is capable of processing directly
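Decomposition and the question of a wrapper's query power can be made concrete with a toy sketch. It assumes, purely for illustration, that a wrapper's capability is described as the set of comparison operators it can evaluate; the slides do not say how Garlic actually characterizes it. The names `Predicate`, `RepositoryWrapper`, `decompose`, and `run_query` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

Row = Dict[str, object]


@dataclass(frozen=True)
class Predicate:
    column: str
    op: str  # one of "=", "<", ">"
    value: object

    def holds(self, row: Row) -> bool:
        v = row[self.column]
        if self.op == "=":
            return v == self.value
        if self.op == "<":
            return v < self.value
        if self.op == ">":
            return v > self.value
        raise ValueError(f"unsupported operator {self.op!r}")


class RepositoryWrapper:
    """Hypothetical wrapper that declares which operators it can process."""

    def __init__(self, rows: List[Row], supported_ops: Set[str]) -> None:
        self.rows = rows
        self.supported_ops = supported_ops

    def can_process(self, pred: Predicate) -> bool:
        return pred.op in self.supported_ops

    def execute(self, preds: List[Predicate]) -> List[Row]:
        return [r for r in self.rows if all(p.holds(r) for p in preds)]


def decompose(
    preds: List[Predicate], wrapper: RepositoryWrapper
) -> Tuple[List[Predicate], List[Predicate]]:
    """Split predicates into those pushed to the wrapper and the residual."""
    pushed = [p for p in preds if wrapper.can_process(p)]
    residual = [p for p in preds if not wrapper.can_process(p)]
    return pushed, residual


def run_query(preds: List[Predicate], wrapper: RepositoryWrapper) -> List[Row]:
    pushed, residual = decompose(preds, wrapper)
    partial = wrapper.execute(pushed)  # done by the repository
    return [r for r in partial if all(p.holds(r) for p in residual)]  # done by middleware


rows = [
    {"kind": "image", "size": 10},
    {"kind": "image", "size": 500},
    {"kind": "text", "size": 700},
]
wrapper = RepositoryWrapper(rows, supported_ops={"="})
query = [Predicate("kind", "=", "image"), Predicate("size", ">", 100)]
pushed, residual = decompose(query, wrapper)
answer = run_query(query, wrapper)
print(answer)
```

Here the repository handles only the equality test, and the middleware applies the range test to the rows that come back. A richer capability description would let more of the query be pushed down.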
Interfaces and Applications
C++ API
Compiled applications
Dynamic applications
Query/Browser
A dynamic application
Moving back and forth between querying and browsing activities
Summary
Database Federation
A powerful tool for integrating data
Future work
Improve the ease of use
Enhance the performance
Garlic Project
New research in many dimensions