Methodologies and techniques for the extraction,
the representation and the integration of
structured and semi-structured information
sources
Responsible Unit: CS-RC
Participating Units: BO, CS-RC, MI, MO, RM
D2I
Modena, 27 April 2001
Synthesis
• Aim: developing a framework for uniformly and semi-automatically handling large information sources with different formats and structures
• The proposed framework consists of three steps:
– The representation of the involved information sources through a conceptual model
– The exploitation of the conceptual model for extracting interscheme properties
– The exploitation of interscheme properties for obtaining an integrated and uniform representation of the involved information sources
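The three steps above can be sketched as a small pipeline. This is only a minimal illustration under loud assumptions: the names `ConceptualModel`, `InterschemeProperty`, `represent`, `extract_properties` and `integrate` are hypothetical, and "two concepts with the same normalized name are synonyms" is a toy stand-in for the extraction techniques of the actual approaches, which the slides do not detail.

```python
from dataclasses import dataclass


@dataclass
class ConceptualModel:
    source: str      # name of the information source
    concepts: set    # concept names found in that source


@dataclass(frozen=True)
class InterschemeProperty:
    kind: str        # e.g. "synonymy"
    left: tuple      # (source, concept)
    right: tuple     # (source, concept)


def represent(source_name, raw_fields):
    """Step 1: represent a source through a (toy) conceptual model."""
    return ConceptualModel(source_name, {f.strip().lower() for f in raw_fields})


def extract_properties(models):
    """Step 2: derive interscheme properties from the conceptual models.

    Assumption: equal normalized names are treated as synonyms.
    """
    properties = []
    for i, a in enumerate(models):
        for b in models[i + 1:]:
            for concept in sorted(a.concepts & b.concepts):
                properties.append(
                    InterschemeProperty("synonymy", (a.source, concept), (b.source, concept))
                )
    return properties


def integrate(models, properties):
    """Step 3: merge concepts linked by a synonymy into one global concept."""
    # Each (source, concept) pair starts as its own group.
    parent = {(m.source, c): (m.source, c) for m in models for c in m.concepts}

    def find(node):
        while parent[node] != node:
            node = parent[node]
        return node

    for p in properties:
        if p.kind == "synonymy":
            parent[find(p.right)] = find(p.left)

    groups = {}
    for node in parent:
        groups.setdefault(find(node), []).append(node)
    return sorted(sorted(group) for group in groups.values())


models = [represent("S1", ["Name", "Address"]), represent("S2", ["name ", "Phone"])]
properties = extract_properties(models)
integrated = integrate(models, properties)
```

Here `integrated` holds one group per global concept, each listing the source concepts it covers.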
Synthesis
• The framework stores all the necessary information in a Metadata Repository
• The repository contains all information about the involved sources, their concepts, the properties existing among concepts, etc.
• All steps of the proposed framework take their inputs from the Metadata Repository and store their outputs in it
• In addition, the Metadata Repository is used by all those applications, such as Data Warehousing and Data Mining, which exploit the integrated and uniform representation of the involved information sources that constitutes the output of our framework
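The input/output discipline described above can be illustrated as follows. This is a hedged sketch, not the project's design: `MetadataRepository`, `run_step`, the entry keys, and the two toy step functions are all hypothetical names introduced here for illustration.

```python
class MetadataRepository:
    """Hypothetical in-memory repository shared by all framework steps."""

    def __init__(self):
        self._entries = {}

    def store(self, key, value):
        self._entries[key] = value

    def load(self, key):
        return self._entries[key]


def run_step(repo, reads, writes, step):
    """Load a step's inputs from the repository, run it, store its output."""
    inputs = [repo.load(key) for key in reads]
    repo.store(writes, step(*inputs))


def shared_concepts(sources):
    # Toy extraction step: concepts appearing in every source.
    return sorted(set.intersection(*sources.values()))


def integrated_view(sources, shared):
    # Toy integration step: all concepts, flagged when shared across sources.
    all_concepts = sorted(set().union(*sources.values()))
    return {concept: concept in shared for concept in all_concepts}


repo = MetadataRepository()
repo.store("sources", {"S1": {"name", "address"}, "S2": {"name", "phone"}})
run_step(repo, ["sources"], "properties", shared_concepts)
run_step(repo, ["sources", "properties"], "integrated", integrated_view)

# A downstream application (e.g. a data warehouse loader) reads only the output.
dimensions = list(repo.load("integrated"))
```

Keeping every step behind `load` and `store` means a downstream application depends only on the repository entries, not on which approach produced them.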
Synthesis
• Many implementations of the proposed framework, based on completely different conceptual models and algorithms, can be designed
• In this report we propose three approaches which implement the general ideas of the framework:
– A graph based approach, which extends, to semi-structured data, the ideas at the basis of the system DIKE
– An object-oriented approach, which extends, to semi-structured data, the ideas underlying the system MOMIS
– A Description Logic based approach
Synthesis
• Conceptual models for representing and handling information sources having different formats and structures:
– The graph based approach: the SDR-Network
– The object-oriented approach: the ODLI3 data model
– The Description Logic approach: the DLR Description Logic
Synthesis
Metadata Repository Architectures
• A Metadata Repository Architecture based on the SDR-Network
– It is composed of:
• A metascheme, storing the information about the involved sources, their concepts and the interscheme properties among concepts
• A set of meta-operators, for querying and modifying the metascheme
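The split between a metascheme and its meta-operators can be pictured with a small sketch. Everything below is an assumption made for illustration: the `Metascheme` layout and the operator names (`add_source`, `add_concept`, `add_property`, `concepts_of`, `properties_of`) are hypothetical and do not reproduce the SDR-Network based architecture itself.

```python
from dataclasses import dataclass, field


@dataclass
class Metascheme:
    # source name -> set of concept names
    sources: dict = field(default_factory=dict)
    # list of (kind, (source, concept), (source, concept))
    properties: list = field(default_factory=list)


# Meta-operators that modify the metascheme.
def add_source(ms, name):
    ms.sources.setdefault(name, set())


def add_concept(ms, source, concept):
    ms.sources[source].add(concept)


def add_property(ms, kind, left, right):
    # Reject properties that mention concepts unknown to the metascheme.
    for source, concept in (left, right):
        if concept not in ms.sources.get(source, set()):
            raise KeyError(f"unknown concept {concept!r} in source {source!r}")
    ms.properties.append((kind, left, right))


# Meta-operators that query the metascheme.
def concepts_of(ms, source):
    return sorted(ms.sources[source])


def properties_of(ms, source, concept):
    return [p for p in ms.properties if (source, concept) in (p[1], p[2])]


ms = Metascheme()
add_source(ms, "S1")
add_source(ms, "S2")
add_concept(ms, "S1", "customer")
add_concept(ms, "S2", "client")
add_property(ms, "synonymy", ("S1", "customer"), ("S2", "client"))
```

Routing every change through the operators lets the metascheme enforce simple consistency checks, such as refusing a property over a concept that was never registered.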
Synthesis
A Metadata Repository Architecture based on the SDR-Network
Synthesis
A Metadata Repository Architecture based on the ODLI3 data model
[Figure: Entity-Relationship diagram of the Metadata Repository based on the ODLI3 data model. Labels recoverable from the diagram include: ODLI3 schema, class, attribute, Relationship, Rule (Mapping, Integrity), Affinity (Name, Structural, Global), Cluster (ID, affinity value, nr. of classes).]
Synthesis
Extraction of interscheme properties
– The graph based approach both extracts and represents interscheme
properties by exploiting the SDR-Network and the related metrics
– The object oriented approach both extracts and represents
interscheme properties by exploiting the ODLI3 data model
– Both of them store extracted interscheme properties in the
corresponding Metadata Repositories
– The Description Logic based approach can suitably represent
interscheme properties derived by other approaches
Synthesis
Integration of involved information sources
– The graph based approach exploits interscheme properties for
carrying out a scheme integration
– The object oriented approach exploits derived interscheme properties
for carrying out a scheme integration
– The Description Logic based approach is able to carry out a data
integration
Open Problems and Future Work
• While there is a clear convergence on the general structure of the framework to be adopted for the project, each unit provides its own perspective on the issues
• There are three different approaches to the problem on the table, each of which concentrates on certain aspects of it
• Each approach uses its own formalism and technical grounding
• As future work, it is necessary to harmonize these approaches to attain a single, well-structured and detailed framework to support integration activities
• This will be possible only in the context of a more general agreement on the features of the Metadata Repository