Data Integration through database federation


Overview of Database Federation and IBM Garlic Project
Presented by Xiaofen He
1
References

L. M. Haas, E. T. Lin, M. A. Roth. Data Integration through Database Federation.
Towards Heterogeneous Multimedia Information Systems: The Garlic Approach. IBM Almaden Research Center.
2
Outline

Approaches to data integration
Database federation in IBM DB2
IBM Garlic Project
3
Various Approaches to Data Integration (1)

Application-specific solutions
  Always work
  Expensive, fragile, and hard to extend
Application-integration frameworks
  Protect applications from changes in the data sources
  Do not address data integration issues
Workflow frameworks
  Limited support for comparing and manipulating data
4
Various Approaches to Data Integration (2)

Digital libraries
  Meta search engine
  No combination of data
Data warehousing
  Powerful, high-level query language
  May not be possible or cost effective; loss of functionality
Database federation
  Virtual data warehouse
  Performance tradeoff (query rewrite and cost-based optimization)
5
Database Federation

Basics of database federation
DB2 styles of database federation
Determining the style of database federation to use
6
Basics of Database Federation

What is ‘database federation’ (DF)?
  Also known as ‘mediation’
  An architecture in which middleware, consisting of a relational database management system, provides uniform access to a number of heterogeneous data sources
7
Common Mediation Architecture

Data source
Wrapper
Mediator
Figure 1. Common Mediator Architecture
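The three layers named on this slide can be illustrated with a minimal sketch. It is not DB2 or Garlic code: the class names `CsvWrapper`, `KeyValueWrapper`, and `Mediator`, the `rows()` method, and the sample data are all hypothetical, chosen only to show wrappers hiding source differences behind one interface while the mediator combines the results.

```python
import csv
import io


class CsvWrapper:
    """Wrapper for a hypothetical CSV data source; exposes rows as dicts."""

    def __init__(self, text):
        self._text = text

    def rows(self):
        return [dict(row) for row in csv.DictReader(io.StringIO(self._text))]


class KeyValueWrapper:
    """Wrapper for a hypothetical key-value store; maps each entry to a row."""

    def __init__(self, store, key_name, value_name):
        self._store = store
        self._key_name = key_name
        self._value_name = value_name

    def rows(self):
        return [{self._key_name: key, self._value_name: value}
                for key, value in self._store.items()]


class Mediator:
    """Gives uniform access to every registered wrapper and joins their rows."""

    def __init__(self):
        self._wrappers = {}

    def register(self, name, wrapper):
        self._wrappers[name] = wrapper

    def join(self, left, right, on):
        # Build a hash index over the right-hand source, then probe it.
        index = {}
        for row in self._wrappers[right].rows():
            index.setdefault(row[on], []).append(row)
        return [{**left_row, **right_row}
                for left_row in self._wrappers[left].rows()
                for right_row in index.get(left_row[on], [])]


mediator = Mediator()
mediator.register("drugs", CsvWrapper("id,name\n1,aspirin\n2,ibuprofen\n"))
mediator.register(
    "stock",
    KeyValueWrapper({"1": "in stock", "3": "discontinued"}, "id", "status"),
)
result = mediator.join("drugs", "stock", on="id")
```

Here the caller never sees whether a row came from the CSV text or from the key-value store; only the wrappers know the source format.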
8
Goals of IBM DF

Transparency
Support heterogeneity
A high degree of function
Extensibility
Openness
Autonomy of individual data sources
Query optimization
9
DB2 architecture for DF
Figure 2. DB2 architecture for database federation
10
DB2 Styles of Federation

Scalar UDFs: federating function
Table UDFs: federating data
Wrappers: federating function and data
Figure 3. Different styles of federation
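The difference between federating function and federating data can be sketched with Python's standard `sqlite3` module standing in for the federated database. This is an analogy under stated assumptions, not DB2 syntax: SQLite is not DB2, the `to_usd` function, the rate table, and the supplier rows are hypothetical, and the table-UDF idea is only approximated by loading external rows into a temporary table.

```python
import sqlite3

# Hypothetical external "function source" the database cannot compute itself.
RATES_TO_USD = {"EUR": 1.25, "GBP": 1.5}


def to_usd(amount, currency):
    """Scalar UDF body: returns one value per call (federating function)."""
    return amount * RATES_TO_USD[currency]


def external_suppliers():
    """Hypothetical external data source that yields rows (federating data)."""
    yield ("acme", "EUR")
    yield ("british_tea", "GBP")


def run_demo():
    conn = sqlite3.connect(":memory:")
    # Register the external function so SQL queries can call it by name.
    conn.create_function("to_usd", 2, to_usd)
    conn.execute("CREATE TABLE orders (supplier TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("acme", 100.0), ("british_tea", 200.0)],
    )
    # Stand-in for a table UDF: external rows become a queryable table.
    conn.execute("CREATE TEMP TABLE suppliers (name TEXT, currency TEXT)")
    conn.executemany("INSERT INTO suppliers VALUES (?, ?)", external_suppliers())
    rows = conn.execute(
        "SELECT o.supplier, to_usd(o.amount, s.currency) "
        "FROM orders o JOIN suppliers s ON s.name = o.supplier "
        "ORDER BY o.supplier"
    ).fetchall()
    conn.close()
    return rows


demo_rows = run_demo()
```

A wrapper, the third style on the slide, would combine both roles: the external source contributes rows and can also evaluate part of the query itself.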
11
Wrapper Architecture

Multi-server integration
Multi-dataset integration and multi-operation integration
Optimization
Transactional integration
12
Determining the style of DF to use
Figure 4. Determine the style of federation to use
13
IBM Garlic Project

Introduction
Overview
  Architecture
  Repositories and Databases
  The Garlic Data Model
  Queries in Garlic
  Interface and Application
Conclusion
14
Introduction

Need
Goal
Object-Oriented Model
15
Garlic Overview

Figure 5. Garlic System Architecture
Components shown: C++ Application, Query/Browser, Query Services & Runtime System, Metadata Repository, Repository Wrappers, Complex Object Repository, Data Repositories
16
Garlic Overview

Repositories
  Repository type
  Repository instance
  Repository manager
Databases
  Global schema
  Wrapper schemas (local schemas)
17
Garlic Data Model (1)

ODMG-93 object model
  Objects and values
  Inheritance
Object identity
  Weak identity – unique, not necessarily immutable
Legacy references
  Implementation-constrained reference
18
Garlic Data Model (2)

Extensions
  Degree of support for alternative implementations of interfaces
  Type system flexibility - conformity
  Object-appropriate view definition facility
Object-Centered Views
  Enhance objects by adding or hiding some of their attributes/methods.
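The idea of a view that adds or hides attributes of an underlying object can be sketched in a few lines. This is only an illustration of the concept, not Garlic's view definition facility: the `ObjectView` class, its `hidden` and `added` parameters, and the `Painting` example are hypothetical.

```python
class ObjectView:
    """Presents a base object with some attributes hidden and others added."""

    def __init__(self, base, hidden=(), added=None):
        self._base = base
        self._hidden = frozenset(hidden)
        # Maps a new attribute name to a function computing it from the base.
        self._added = dict(added or {})

    def __getattr__(self, name):
        # Called only when normal lookup fails, so the private fields set in
        # __init__ never reach this method.
        if name.startswith("_"):
            raise AttributeError(name)
        if name in self._added:
            return self._added[name](self._base)
        if name in self._hidden:
            raise AttributeError(f"{name!r} is hidden by this view")
        return getattr(self._base, name)


class Painting:
    def __init__(self, title, artist, insurance_value):
        self.title = title
        self.artist = artist
        self.insurance_value = insurance_value


painting = Painting("Sunflowers", "Van Gogh", 1000000)
public_view = ObjectView(
    painting,
    hidden=["insurance_value"],
    added={"caption": lambda p: f"{p.title} by {p.artist}"},
)
```

Reading `public_view.title` passes through to the base object, `public_view.caption` is computed on demand, and `public_view.insurance_value` raises `AttributeError`.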
19
Queries in Garlic

Query language
  Object-oriented extension of SQL
  Integrating approximate-match query semantics with traditional exact-match query semantics
Query processing
  Decomposition
Interesting question
  How to characterize the query power of a repository, in terms of the language subset that its wrapper is capable of processing directly
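One simple way to picture decomposition and "query power" is to let each wrapper declare which comparison operators it can evaluate, push those predicates down to it, and apply the rest in the middleware. The sketch below assumes exactly that; it is not Garlic's actual mechanism, and `Predicate`, `Wrapper`, `decompose`, `run_query`, and the sample rows are hypothetical names.

```python
import operator
from dataclasses import dataclass

OPERATORS = {"=": operator.eq, "<": operator.lt, ">": operator.gt}


@dataclass(frozen=True)
class Predicate:
    attribute: str
    op: str
    value: object

    def matches(self, row):
        return OPERATORS[self.op](row[self.attribute], self.value)


class Wrapper:
    """Hypothetical repository wrapper that supports only some operators."""

    def __init__(self, rows, supported_ops):
        self._rows = rows
        self.supported_ops = frozenset(supported_ops)

    def can_evaluate(self, predicate):
        return predicate.op in self.supported_ops

    def scan(self, predicates):
        for predicate in predicates:
            if not self.can_evaluate(predicate):
                raise ValueError(f"unsupported operator: {predicate.op}")
        return [row for row in self._rows
                if all(p.matches(row) for p in predicates)]


def decompose(predicates, wrapper):
    """Split a conjunction into the pushed-down part and the residual part."""
    pushed = [p for p in predicates if wrapper.can_evaluate(p)]
    residual = [p for p in predicates if not wrapper.can_evaluate(p)]
    return pushed, residual


def run_query(predicates, wrapper):
    pushed, residual = decompose(predicates, wrapper)
    candidates = wrapper.scan(pushed)  # evaluated by the repository
    # The middleware evaluates whatever the repository could not.
    return [row for row in candidates if all(p.matches(row) for p in residual)]


image_store = Wrapper(
    [
        {"kind": "image", "year": 1988, "title": "A"},
        {"kind": "image", "year": 1994, "title": "B"},
        {"kind": "text", "year": 1995, "title": "C"},
    ],
    supported_ops={"="},  # this repository can only do equality lookups
)
query = [Predicate("kind", "=", "image"), Predicate("year", ">", 1990)]
answer = run_query(query, image_store)
```

A set of supported operators is the crudest possible description of a wrapper's query power; the slide's question is how to describe it more precisely, as a subset of the query language.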
20
Interfaces and Applications

C++ API
  Compiled applications
  Dynamic applications
Query/Browser
  A dynamic application
  Moving back and forth between querying and browsing activities
21
Summary

Database Federation
  A powerful tool for integrating data
Future work
  Improve the ease of use
  Enhance performance
Garlic Project
  New research in many dimensions
22