Interoperability in Information Systems

Information Systems: Modelling Complexity with Categories
Four lectures given by Nick Rossiter at Universidad de Las Palmas de Gran Canaria, 15th-19th May 2000, under the Socrates-Erasmus Programme
Lectures
1. Interoperability in Information Systems
2. Introduction to Category Theory
3. Object Concepts as Categories
4. Handling Heterogeneity with Information Resource Dictionary System
Lecture 1: Interoperability in Information Systems
Nick Rossiter, Computing Science, Newcastle University, England
http://www.cs.ncl.ac.uk/people/b.n.rossiter/
Motivations
• Diversity of modelling techniques
• Distributed businesses may exercise local autonomy in their choice of platforms
• Data warehousing requires heterogeneous systems to be connected
• Data mining enables new rules to be derived from heterogeneous collections
Basic Definitions 1
• Distribution: information bases are stored on multiple computer systems interconnected by a communication medium.
• Homogeneous system: one that uses the same software at all sites.
• Heterogeneous system: one that does not use the same software at all sites.
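The two definitions differ only in whether the software at every site is the same. A minimal sketch, assuming each site can be summarised by a single label for the software it runs (the `Site` type, site names and product labels are hypothetical):

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    name: str
    software: str  # hypothetical label for the DBMS product/version at this site


def is_homogeneous(sites: list[Site]) -> bool:
    """Homogeneous: every site runs the same software."""
    return len({site.software for site in sites}) <= 1


def is_heterogeneous(sites: list[Site]) -> bool:
    """Heterogeneous: at least two sites run different software."""
    return not is_homogeneous(sites)


sites = [Site("Newcastle", "DBMS-X 7"), Site("Las Palmas", "DBMS-Y 2")]
print(is_heterogeneous(sites))  # True
```

In practice "same software" would have to cover the data model, query language and version, not just a product name.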
Basic Definitions 2
• Autonomy: the ability of a site to control its own activities with respect to one or more of:
– design
– communication
– execution
– association
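Because the definition says "one or more of", autonomy is a matter of degree rather than a yes/no property. A minimal sketch that records which of the four kinds a site keeps control of; the example sites are hypothetical:

```python
from __future__ import annotations

from enum import Enum


class Autonomy(Enum):
    """The four respects in which a site may control its own activities."""
    DESIGN = "design"
    COMMUNICATION = "communication"
    EXECUTION = "execution"
    ASSOCIATION = "association"


def is_autonomous(controlled: set[Autonomy]) -> bool:
    """Autonomous: the site controls its own activities in at least one respect."""
    return bool(controlled)


def autonomy_profile(controlled: set[Autonomy]) -> dict[str, bool]:
    """For each kind of autonomy, report whether the site keeps control of it."""
    return {kind.value: kind in controlled for kind in Autonomy}


site_a = {Autonomy.DESIGN, Autonomy.EXECUTION}  # hypothetical federation member
site_b: set[Autonomy] = set()                   # hypothetical fully centralised site
print(is_autonomous(site_a), is_autonomous(site_b))  # True False
print(autonomy_profile(site_a))
# {'design': True, 'communication': False, 'execution': True, 'association': False}
```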
Interoperability 1
• Interoperability: the ability of different systems to request and receive services from one another and to use each other's functionality.
• More than data exchange.
• Implies close integration.
Interoperability 2
• Features:
– exchange of messages and requests
– use of each other's functionality
– client-server abilities
– distribution
– operation of multiple systems as a single unit
– communication despite incompatibilities
– extensibility and evolution
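Several of these features (exchanging requests, using another system's functionality, communicating despite incompatibilities) can be seen together in a tiny example. This is a minimal sketch, not a real middleware API: the `System` class, the service name `publication_date` and the agreed ISO date format are all hypothetical. System B stores dates in its own internal format but answers requests in the shared exchange format.

```python
from typing import Callable, Dict

Message = Dict[str, str]               # requests and replies are plain key/value messages
Service = Callable[[Message], Message]


class System:
    """A hypothetical component system that publishes named services."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.services: Dict[str, Service] = {}

    def offer(self, service_name: str, handler: Service) -> None:
        self.services[service_name] = handler

    def request(self, other: "System", service_name: str, args: Message) -> Message:
        """Send a request message to another system and return its reply."""
        handler = other.services.get(service_name)
        if handler is None:
            return {"error": f"{other.name} offers no service {service_name!r}"}
        return handler(args)


# System B keeps dates internally as 'DD/MM/YYYY'; the agreed exchange format is ISO.
b_publication_dates = {"P1": "15/05/2000"}


def b_publication_date(args: Message) -> Message:
    day, month, year = b_publication_dates[args["paper"]].split("/")
    return {"date": f"{year}-{month}-{day}"}  # translate to the shared format


a, b = System("A"), System("B")
b.offer("publication_date", b_publication_date)
reply = a.request(b, "publication_date", {"paper": "P1"})
print(reply)  # {'date': '2000-05-15'}
```

A never needs to know B's internal date format, which is the sense in which interoperability goes beyond raw data exchange.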
Architectures for Interoperability 1
1. Global schema integration
Produces a single new schema (C) covering the different information systems, whose schemas are A and B.
[Diagram: global schema C is built over local schemas A and B]
Global Schema Integration
• Advantages
– Transparent to end users: appears as a single information system
• Disadvantages
– Difficult: needs human understanding to perform the integration
– Local autonomy lost
– Static: does not evolve automatically
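Both disadvantages, the need for human judgement and the static result, show up even in a toy version. A minimal sketch, assuming a schema is just a mapping from entity names to attribute names and that the designer's decisions are written down as a hand-made correspondence table (the global entity `Person` and the mapping itself are hypothetical; `Contacts` and `Authors` come from the MSQL example later in this lecture):

```python
from typing import Dict, Set, Tuple

Schema = Dict[str, Set[str]]  # entity name -> set of attribute names
Mapping = Dict[Tuple[str, str, str], Tuple[str, str]]


def integrate(local: Dict[str, Schema], mapping: Mapping) -> Schema:
    """Build one global schema C from several local schemas.

    `mapping` records the designer's decisions, keyed by
    (local schema, entity, attribute) -> (global entity, global attribute).
    Anything not mapped is copied across under its local name.
    """
    global_schema: Schema = {}
    for schema_name, schema in local.items():
        for entity, attributes in schema.items():
            for attribute in attributes:
                g_entity, g_attr = mapping.get(
                    (schema_name, entity, attribute), (entity, attribute))
                global_schema.setdefault(g_entity, set()).add(g_attr)
    return global_schema


A = {"Contacts": {"PersonID", "Name", "Email"}}
B = {"Authors": {"Name", "Email", "Paper_ID"}}
mapping: Mapping = {  # hypothetical decisions made by a human integrator
    ("A", "Contacts", "PersonID"): ("Person", "PersonID"),
    ("A", "Contacts", "Name"): ("Person", "Name"),
    ("A", "Contacts", "Email"): ("Person", "Email"),
    ("B", "Authors", "Name"): ("Person", "Name"),
    ("B", "Authors", "Email"): ("Person", "Email"),
    ("B", "Authors", "Paper_ID"): ("Person", "Paper_ID"),
}
C = integrate({"A": A, "B": B}, mapping)
print(sorted(C["Person"]))  # ['Email', 'Name', 'Paper_ID', 'PersonID']

# C is a static snapshot: a later change to A is not reflected automatically.
A["Contacts"].add("Phone")
print("Phone" in C["Person"])  # False
```

Even re-running `integrate` would not be enough: with no entry in the mapping, `Phone` would land in a separate `Contacts` entity until a person decided where it belongs.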
Architectures for Interoperability 2
2. Federated Database Systems
Schemas are less tightly coupled than in global schema integration (1)
Each service specifies its sharable objects through an export schema
Common data model
Internal command language
Decentralised control (local autonomy)
Five-level schema architecture for federated systems
Federated Databases: Loosely-Coupled
• Created by users
[Diagram: view V is defined over export schemas AE and BE, which are derived from base schemas A and B respectively]
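A minimal sketch of the loosely-coupled arrangement, using one SQLite in-memory database to stand in for both component systems (a simplification: a real federation spans separate, possibly different, DBMSs). Tables prefixed `a_`/`b_` play the base schemas A and B, views prefixed `ae_`/`be_` play the export schemas AE and BE, and `v` is the user's view V. All names and rows are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Base schema A: contacts, including a private Phone column.
    CREATE TABLE a_contacts (PersonID INTEGER PRIMARY KEY, Name TEXT, Email TEXT, Phone TEXT);
    -- Base schema B: authors.
    CREATE TABLE b_authors (Name TEXT, Email TEXT, Paper_ID TEXT);

    -- Export schemas AE and BE: each site decides which objects it shares.
    CREATE VIEW ae_contacts AS SELECT Name, Email FROM a_contacts;
    CREATE VIEW be_authors AS SELECT Name, Paper_ID FROM b_authors;

    -- View V, built by a user over the export schemas only.
    CREATE VIEW v AS
        SELECT ae.Name, ae.Email, be.Paper_ID
        FROM ae_contacts AS ae JOIN be_authors AS be ON ae.Name = be.Name;

    INSERT INTO a_contacts VALUES (1, 'J. Smith', 'js@example.org', '555-0100');
    INSERT INTO b_authors VALUES ('J. Smith', 'js@example.org', 'P1');
""")

print(con.execute("SELECT * FROM v").fetchall())
# [('J. Smith', 'js@example.org', 'P1')]

# Site A keeps Phone private: it simply does not appear in its export schema.
exported = [row[1] for row in con.execute("PRAGMA table_info(ae_contacts)")]
print(exported)  # ['Name', 'Email']
```

Local autonomy is preserved because each site alone controls its export view; the user who writes `v` sees only what was exported.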
Federated Databases: Tightly-Coupled
• Created by administrators
• Global schema integration performed over all export schemas
• More formal than loosely-coupled federations
• Much effort needed to resolve semantic inconsistencies
Federated Database Systems General Advantages
• Preserves local autonomy
• Not all data needs to be integrated
• Provides metadata structures for views (external and export schemas, data dictionary)
Federated Database Systems Disadvantages by Approach
• Tightly-coupled
– similar to global schema integration:
1) complex, difficult to make changes dynamically
2) much effort in resolving semantic inconsistencies
• Loosely-coupled
– duplication of effort by different users in building views
– updating data defined in views can be difficult
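Why updating through views is hard can be seen in SQLite, used here only as a convenient stand-in: a plain view is read-only, so an update must be routed to the base tables by an `INSTEAD OF` trigger that someone has to write and maintain. In this minimal sketch the tables, rows and the write-back policy (update the address at both sites) are hypothetical; another user could reasonably choose a different policy, which is exactly the difficulty.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE a_contacts (PersonID INTEGER PRIMARY KEY, Name TEXT, Email TEXT);
    CREATE TABLE b_authors (Name TEXT, Email TEXT, Paper_ID TEXT);
    -- A user-built view joining data from the two (simulated) sites.
    CREATE VIEW v AS
        SELECT c.Name, c.Email, b.Paper_ID
        FROM a_contacts AS c JOIN b_authors AS b ON c.Name = b.Name;
    INSERT INTO a_contacts VALUES (1, 'J. Smith', 'old@example.org');
    INSERT INTO b_authors VALUES ('J. Smith', 'old@example.org', 'P1');
""")

# A plain view is read-only: SQLite refuses to update it directly.
view_error = None
try:
    con.execute("UPDATE v SET Email = 'new@example.org' WHERE Name = 'J. Smith'")
except sqlite3.OperationalError as exc:
    view_error = exc
print(view_error)

# The view's creator must decide where an update really goes. This
# hypothetical policy writes the new address back to both base tables.
con.executescript("""
    CREATE TRIGGER v_update INSTEAD OF UPDATE ON v
    BEGIN
        UPDATE a_contacts SET Email = NEW.Email WHERE Name = OLD.Name;
        UPDATE b_authors SET Email = NEW.Email WHERE Name = OLD.Name;
    END;
""")
con.execute("UPDATE v SET Email = 'new@example.org' WHERE Name = 'J. Smith'")
print(con.execute("SELECT Email FROM a_contacts").fetchone())  # ('new@example.org',)
```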
Multidatabase Language Approach
• No attempt at schema integration
• The schemas of the services provided may be heterogeneous and inconsistent, and may duplicate information in different ways.
• A language (e.g. MSQL) is used to integrate databases at run time.
• The relational data model is used as the common data model.
Multidatabase Language Approach - Diagram
[Diagram: the runtime language MSQL accesses schemas A and B directly]
Multidatabase Language Approach - Advantages
• No preparatory work to understand the semantics of the schemas
• Dynamic: accesses the latest versions
• Very skilled users can succeed in reaching their goals
• Interesting work on multidatabase dependencies
Example Multidatabase Language
• MSQL (Multidatabase SQL)
– Biased towards the relational model
– Illustrates problems
• Consider 2 databases
– each on publications of a computing society
– and the query:
– “What is the name, email and title for each publication of an author appearing in both of the societies’ databases?”
MSQL - Schema
• Schema 1 (for AIIA):
– Contacts (PersonID, Name, Email, …)
– Conference (Name, Type, …)
– Attendees (ID, Conf_ID, Speaker, …)
– Publ_Papers (P_ID, Title, Author_ID, …)
• Schema 2 (for IFIP):
– Member_Socs (Soc_Name, …)
– Conf (Conf_ID, …)
– Publ_Papers (P_Ref, Title, Conf_Ref, …)
– Authors (Name, Email, Paper_ID, …)
Underlined attributes are primary keys; attributes in italics are foreign keys.
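MSQL for Query
USE AIIA, IFIP
-- Name, Email and Title occur in both databases, so each column is qualified
SELECT Authors.Name, Authors.Email, IFIP_Paper.Title, AIIA_Paper.Title
FROM Authors,
     IFIP.Publ_Papers IFIP_Paper,
     Contacts,
     AIIA.Publ_Papers AIIA_Paper
WHERE Authors.Name = Contacts.Name              -- same author in both databases
  AND Contacts.PersonID = AIIA_Paper.Author_ID  -- author's AIIA papers
  AND Authors.Paper_ID = IFIP_Paper.P_Ref;      -- author's IFIP papers
The USE statement declares the multidatabases; tables with the same name (Publ_Papers) are qualified by database and aliased in the FROM clause to distinguish them.
Retrieves the author's name and email, with paper titles from both databases.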
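MSQL itself is not widely available today, but SQLite's `ATTACH DATABASE` gives a rough modern analogue of `USE`: several databases are opened in one session and their tables are qualified by a database alias. This is a minimal sketch, not MSQL: the two attached in-memory databases stand in for AIIA and IFIP, only the columns needed for the query are created, and all rows are invented.

```python
import sqlite3

# One session, two attached in-memory databases standing in for AIIA and IFIP.
con = sqlite3.connect(":memory:")
con.execute("ATTACH DATABASE ':memory:' AS AIIA")
con.execute("ATTACH DATABASE ':memory:' AS IFIP")

con.executescript("""
    CREATE TABLE AIIA.Contacts (PersonID INTEGER PRIMARY KEY, Name TEXT, Email TEXT);
    CREATE TABLE AIIA.Publ_Papers (P_ID INTEGER PRIMARY KEY, Title TEXT, Author_ID INTEGER);
    CREATE TABLE IFIP.Authors (Name TEXT, Email TEXT, Paper_ID TEXT);
    CREATE TABLE IFIP.Publ_Papers (P_Ref TEXT PRIMARY KEY, Title TEXT, Conf_Ref TEXT);

    INSERT INTO AIIA.Contacts VALUES (1, 'J. Smith', 'js@example.org');
    INSERT INTO AIIA.Publ_Papers VALUES (10, 'Paper in AIIA', 1);
    INSERT INTO IFIP.Authors VALUES ('J. Smith', 'js@example.org', 'R7');
    INSERT INTO IFIP.Publ_Papers VALUES ('R7', 'Paper in IFIP', 'C1');
""")

# Counterpart of the MSQL query: tables qualified by database, columns by alias.
rows = con.execute("""
    SELECT Authors.Name, Authors.Email, IFIP_Paper.Title, AIIA_Paper.Title
    FROM IFIP.Authors AS Authors,
         IFIP.Publ_Papers AS IFIP_Paper,
         AIIA.Contacts AS Contacts,
         AIIA.Publ_Papers AS AIIA_Paper
    WHERE Authors.Name = Contacts.Name
      AND Contacts.PersonID = AIIA_Paper.Author_ID
      AND Authors.Paper_ID = IFIP_Paper.P_Ref
""").fetchall()
print(rows)  # [('J. Smith', 'js@example.org', 'Paper in IFIP', 'Paper in AIIA')]
```

If `Title` is left unqualified, SQLite rejects the query as ambiguous, because both `Publ_Papers` tables have a `Title` column. Note also that an author with several papers in each database yields one row per AIIA/IFIP pair.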
Potential Problems with MSQL
• Are the domains of Name comparable in the two databases?
• The LET command can create equivalences of names, but it does not solve domain mismatch.
• What if one schema is not relational? The Entity-Relationship model is often used as a neutral schema for translating and comparing heterogeneous features.
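Domain mismatch is about values, not names: even when both databases call the column `Name`, an equality join fails if the two societies write names differently. A minimal sketch, assuming (hypothetically) that AIIA stores "Surname, Forename" and IFIP stores "F. Surname"; the normaliser `name_key` is an illustrative assumption, not part of MSQL.

```python
from __future__ import annotations


def naive_match(name_a: str, name_b: str) -> bool:
    """What an equality join such as Authors.Name = Contacts.Name tests."""
    return name_a == name_b


def name_key(name: str) -> tuple[str, str]:
    """Hypothetical normaliser: reduce a personal name to (surname, first initial).

    Accepts both 'Surname, Forenames' and 'Forenames Surname' (e.g. 'J. Smith').
    """
    name = name.strip()
    if not name:
        return ("", "")
    if "," in name:
        surname, forenames = (part.strip() for part in name.split(",", 1))
    else:
        *forename_parts, surname = name.split()
        forenames = " ".join(forename_parts)
    return (surname.lower(), forenames[:1].upper())


aiia_name = "Smith, John"  # hypothetical AIIA naming convention
ifip_name = "J. Smith"     # hypothetical IFIP naming convention
print(naive_match(aiia_name, ifip_name))           # False
print(name_key(aiia_name) == name_key(ifip_name))  # True
```

The normalised key trades missed matches for false ones ("Jane Smith" gets the same key), so deciding whether two values denote the same person remains a semantic judgement left to the user.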
Multidatabase Language Disadvantages in General
• Distribution is not transparent
• Users must resolve inconsistencies themselves
• A common language may restrict the scope of heterogeneity (relational bias)
• A local autonomous system may change its schema freely, so that existing queries fail
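The last point is easy to reproduce. In this minimal sketch (SQLite as a stand-in; table and rows hypothetical), a user saves a query against one site's schema, and the site later renames a column without telling anyone. Nothing in a multidatabase-language setup notices the change until the saved query is run again.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Contacts (PersonID INTEGER PRIMARY KEY, Name TEXT, Email TEXT)")
con.execute("INSERT INTO Contacts VALUES (1, 'J. Smith', 'js@example.org')")

# A query written by a multidatabase user against the schema as it is today.
saved_query = "SELECT Name, Email FROM Contacts"
before = con.execute(saved_query).fetchall()
print(before)  # [('J. Smith', 'js@example.org')]

# Later, the autonomous site renames Name to FullName, telling no one.
# (Done by rebuilding the table, which works on any SQLite version.)
con.executescript("""
    CREATE TABLE Contacts_new (PersonID INTEGER PRIMARY KEY, FullName TEXT, Email TEXT);
    INSERT INTO Contacts_new SELECT PersonID, Name, Email FROM Contacts;
    DROP TABLE Contacts;
    ALTER TABLE Contacts_new RENAME TO Contacts;
""")

error = None
try:
    con.execute(saved_query)
except sqlite3.OperationalError as exc:
    error = exc  # SQLite reports that the column Name no longer exists
print("saved query now fails:", error)
```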
Comparison of Approaches
• By coupling:
– how tightly the interoperable system is connected to its underlying systems
• By adaptability:
– the ability of the interoperable system to evolve in line with the underlying schemas
• By transparency:
– how far end users are spared from having to understand the underlying schemas
Comparison of Approaches
Approach                     Coupling   Adaptability   Transparency
Global Schema Integration    Tight      Low            High
Federated Databases          Medium     Medium         Medium
Multidatabase Languages      Low        High           Low
Summary
Trend:
• from Global Schema Integration, through Federated Databases, to Multidatabase Languages
• towards lower coupling, higher adaptability and lower transparency.
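The trend can be checked directly against the comparison table. A minimal sketch that encodes each rating on a hypothetical three-point ordinal scale (treating "Tight" as the top of the coupling scale) and tests that every criterion moves in one direction along the sequence of approaches:

```python
from __future__ import annotations

# Hypothetical ordinal encoding of the table's ratings.
LEVEL = {"Low": 0, "Medium": 1, "High": 2, "Tight": 2}

comparison = [  # (approach, coupling, adaptability, transparency), in trend order
    ("Global Schema Integration", "Tight", "Low", "High"),
    ("Federated Databases", "Medium", "Medium", "Medium"),
    ("Multidatabase Languages", "Low", "High", "Low"),
]


def column(i: int) -> list[int]:
    return [LEVEL[row[i]] for row in comparison]


def strictly_decreasing(xs: list[int]) -> bool:
    return all(a > b for a, b in zip(xs, xs[1:]))


def strictly_increasing(xs: list[int]) -> bool:
    return all(a < b for a, b in zip(xs, xs[1:]))


coupling, adaptability, transparency = column(1), column(2), column(3)
print(strictly_decreasing(coupling),
      strictly_increasing(adaptability),
      strictly_decreasing(transparency))  # True True True
```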
Further Reading
• Elmagarmid, Ahmed; Rusinkiewicz, Marek; Sheth, Amit. Management of Heterogeneous and Autonomous Database Systems. Morgan Kaufmann, 1999.