Discovering Mappings between Schemas

Discovering Mappings between Schemas: The Different Approaches
Schema Matching: Different Approaches
Khalid Saleem
LIRMM
Schema and Ontology

- Schema represents the database community
    - Schemas often do not provide explicit semantics of their data (ER, XML document schema).
- Ontology represents the AI community
    - Ontologies are logical systems that themselves obey some formal semantics. They are designed to be interpreted by computers for reasoning (OWL).
- Schemas and ontologies are similar in the sense that
    - Both provide a vocabulary of terms that describes a domain
    - Both constrain the meaning of the terms used in the vocabulary (hierarchy/relations)

[Figure: language layers XML, XML Schema, RDF, RDF Schema, OWL]
Schema vs Ontology : examples

XML:
    <class-def>
      <name>branch</name>
      <slot-constraint>
        <name>is-part-of</name>
        <has-value>tree</has-value>
      </slot-constraint>
    </class-def>

DAML+OIL:
    class-def animal
    % plants are a class that is disjoint from animals
    class-def plant subclass-of NOT animal
    % it is necessary but not sufficient for a tree to be a plant:
    class-def tree subclass-of plant
    % branches are PART OF trees
    class-def branch
      slot-constraint is-part-of has-value tree
    % it is necessary and sufficient for a carnivore to be an animal:
    class-def defined carnivore subclass-of animal
      slot-constraint eats value-type animal
    % herbivores eat only plants OR part of plants
    class-def defined herbivore subclass-of animal
      slot-constraint eats value-type plant OR
        (slot-constraint is-part-of has-value plant)
Match

Takes two schemas/ontologies as input and produces
a mapping between elements of the two schemas that
correspond semantically to each other
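[Figure: two book sources and the matches between their attributes]

Books, Source A
    price    book-title             author-name
    26,60    Harry Potter           J. K. Rowling
    11,50    Marie Des Intrigues    Juliette Benzoni

Books, Source B
    listed-price    title             a-fname    a-lname
    16,50           Nous Les Dieux    Bernard    Werber
    24              Pompei            Robert     Harris

1-1 match: price and listed-price; book-title and title
Complex match: author-name and (a-fname, a-lname)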
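The output of Match for the Books example can be written down as plain data. The sketch below is only an illustration: the `Correspondence` class and its `cardinality` property are hypothetical names introduced here, and the three correspondences are the ones suggested by the figure, not the output of any particular tool.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Correspondence:
    """One entry of a mapping: elements of Source A matched to elements of Source B."""
    source: Tuple[str, ...]
    target: Tuple[str, ...]

    @property
    def cardinality(self) -> str:
        # Classify the correspondence by how many elements sit on each side.
        many_source = len(self.source) > 1
        many_target = len(self.target) > 1
        if many_source and many_target:
            return "n:m"
        if many_source:
            return "n:1"
        if many_target:
            return "1:n"
        return "1:1"


# Mapping suggested by the Books figure (an assumption, written by hand).
books_mapping = [
    Correspondence(("price",), ("listed-price",)),
    Correspondence(("book-title",), ("title",)),
    Correspondence(("author-name",), ("a-fname", "a-lname")),
]

for c in books_mapping:
    print(c.cardinality, c.source, "->", c.target)
```

Keeping each side as a tuple lets the same structure hold both the 1-1 matches and the complex match.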
Schema Matching vs Ontology Matching

- Schema matching is usually performed with the help of techniques that try to guess the meaning encoded in the schemas.

- Ontology matching tries to exploit the knowledge explicitly encoded in the ontologies.

In real-world applications, solutions from both domains are mutually beneficial.
Application Domains

- Traditional (static)
    - Schema integration
    - Data warehousing
    - E-commerce
    - Catalogue integration
- New frontiers (dynamic)
    - Semantic query processing
    - Agent communication
    - Web services integration
    - P2P databases
Basic Classification of Matchers [RB01]

- Schema vs data instance
- Element vs structure
- Language vs constraint
    - String based: prefix, suffix, e.g. auth : author
    - Tokenization, lemmatization, elimination [GSY04], e.g. Tool_Kit : (Tool, Kit), Kits : Kit, IsRelatedTo : Related
    - Data types, value domain, e.g. 1..12 : month
- Match cardinalities: 1:1, 1:n, n:m, e.g. (Tel Res, Other) : (Tel Day, Evening, Night)
- Auxiliary information
    - Global schema, dictionaries, thesauri, previous match decisions, user input
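The string-based techniques listed above (prefix/suffix test, tokenization, lemmatization, elimination) can be sketched in a few lines. This is a simplified illustration, not the procedure of [GSY04]: the stop-word list, the plural-stripping rule and all function names are assumptions made for the example.

```python
import re

# Assumed list of words to eliminate; a real matcher would use a fuller list.
STOP_WORDS = {"is", "to", "of", "the", "a", "an"}


def tokenize(label):
    """Split a label on separators and camelCase boundaries, lower-casing tokens."""
    tokens = []
    for part in re.split(r"[_\-\s]+", label):
        tokens.extend(re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", part))
    return [t.lower() for t in tokens]


def lemmatize(token):
    """Very naive lemmatization: strip a plural 's' (assumption, English only)."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def normalize(label):
    """Tokenize, eliminate stop words, then lemmatize what remains."""
    return [lemmatize(t) for t in tokenize(label) if t not in STOP_WORDS]


def affix_match(a, b):
    """True if the shorter label is a prefix or suffix of the longer one."""
    short, long_ = sorted((a.lower(), b.lower()), key=len)
    return bool(short) and (long_.startswith(short) or long_.endswith(short))


print(normalize("Tool_Kit"))          # tokens of Tool_Kit
print(normalize("Kits"))              # lemma of Kits
print(normalize("IsRelatedTo"))       # IsRelatedTo after elimination
print(affix_match("auth", "author"))  # prefix test
```

A production matcher would normally replace the plural rule with a proper lemmatizer and a dictionary lookup.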
Basic Classification of Matchers

- Structure-level techniques [SE05]
    - Graph matching: children, leaves, relations
    - Taxonomy-based techniques, e.g. if the super-concepts are the same then the sub-concepts are the same, or vice versa
    - Model based: ER, XML or XML Schema, OWL, OO etc.
- Combinational matchers [RB01]
    - Hybrid matcher
    - Multiple/composite matcher
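As an illustration of the "leaves" idea under graph matching, two inner nodes can be compared through the sets of leaf labels below them. The sketch assumes trees are nested `(label, children)` tuples, compares labels by exact equality, and uses the Jaccard coefficient as the set similarity; that measure is a choice made for this example and is not named on the slide.

```python
def leaves(tree):
    """Return the set of leaf labels of a (label, children) tree."""
    label, children = tree
    if not children:
        return {label}
    result = set()
    for child in children:
        result |= leaves(child)
    return result


def jaccard(a, b):
    """Jaccard coefficient of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def leaf_similarity(t1, t2):
    """Structural similarity of two subtrees based on their leaf sets."""
    return jaccard(leaves(t1), leaves(t2))


# Two hypothetical schema fragments.
book_a = ("book", [("title", []), ("author", [("name", [])]), ("price", [])])
book_b = ("publication", [("title", []), ("writer", [("name", [])]), ("isbn", [])])

print(leaf_similarity(book_a, book_b))  # 2 shared leaves out of 4 distinct ones
```

The root labels differ here (book, publication), yet the shared leaves still give a non-zero score, which is the point of a structure-level technique.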
Match Dimensions [SE05]

To design a match algorithm, we need to know the dimensions along which it will be used:

- Input of the algorithm
    - Data or schema, element level or structure level
- Characteristics of the matching process
    - Exact or approximate matching required
    - Performance over quality
- Output of the algorithm
    - The output is a graded result, or one of a set of match algorithm results that are combined into a final mapping
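One simple way to combine graded results from several match algorithms is a weighted average followed by a threshold. The sketch below assumes each matcher returns a similarity in [0, 1]; the matcher names, weights and threshold are invented for the example, and other aggregation strategies (maximum, minimum, learned weights) are equally possible.

```python
def combine(scores, weights):
    """Weighted average of per-matcher similarity scores in [0, 1]."""
    total = sum(weights[name] for name in scores)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(scores[name] * weights[name] for name in scores) / total


def decide(scores, weights, threshold=0.7):
    """Accept the candidate match if the combined score reaches the threshold."""
    return combine(scores, weights) >= threshold


# Hypothetical graded results for one candidate pair of elements.
scores = {"name": 0.8, "datatype": 1.0, "structure": 0.5}
weights = {"name": 2.0, "datatype": 1.0, "structure": 1.0}

print(combine(scores, weights))
print(decide(scores, weights))
```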
Existing Matching Tools

- Cupid [MBR01]
- COMA (COMA++) [ADMR05]
- Similarity Flooding
- SemInt
- Artemis
- DIKE
- TransScm
- AutoMed
- Charlie [TBBT04]
- Ontology specific
    - NOM/QOM
    - OLA
    - Anchor-PROMPT
    - S-Match [GSY04]
    - HICAL
    - SKAT
Matching Tools … continued

- Machine learning
    - GLUE (LSD, CGLUE) [DMDH02]
    - Automatch

These tools do not completely fulfil the requirements for large-scale schema matching because

- They are not fully automated
- They put little emphasis on search space optimisation
Our Approach

Motivation:
- Large-scale scenario
- Peer-to-peer information systems over the XML Web

Our schema matching and integration approach:
- Tree mining techniques
- Element-level matching
    - Name matcher: a=w, b=o, f=d
- Structure-level matching
    - Search sub-trees

[Figure: example XML schema trees whose nodes carry the single-letter labels listed below]
a: author
b: book
d: detail
f: information
g: general
h: birth
i: isbn
n: name
o: own-books
p: publisher
r: price
t: title
w: writer
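The name matcher equivalences a=w, b=o and f=d amount to grouping labels into synonym clusters. A minimal sketch of that grouping follows, using a small union-find structure; the function name and the choice of union-find are assumptions for illustration, and the synonym pairs are simply the three equivalences given above rather than the result of a thesaurus lookup.

```python
# Synonym pairs taken from the name matcher equivalences a=w, b=o, f=d.
SYNONYMS = [("author", "writer"), ("book", "own-books"), ("information", "detail")]


def cluster_labels(labels, synonym_pairs):
    """Group labels into clusters of synonyms using a small union-find."""
    parent = {label: label for label in labels}

    def find(x):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in synonym_pairs:
        if a in parent and b in parent:
            parent[find(a)] = find(b)

    clusters = {}
    for label in labels:
        clusters.setdefault(find(label), set()).add(label)
    return sorted(sorted(c) for c in clusters.values())


labels = ["author", "book", "detail", "information",
          "name", "own-books", "title", "writer"]

for cluster in cluster_labels(labels, SYNONYMS):
    print(cluster)
```

Labels without a synonym, such as name and title, end up in singleton clusters.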
Tree Mining Approach

Inspired by the tree mining algorithms and data structures based on node scope values (calculated by a top-down, depth-first pre-order traversal) [Z02].

[Figure: example schema tree annotated with node scopes]
n0 [0,5] book
    n1 [1,2] author
        n2 [2,2] name
    n3 [3,4] publisher
        n4 [4,4] name
    n5 [5,5] title

Our work extends these data structures to the schema matching and integration process, for handling large sets of XML schema trees. It employs:
a) An element-level name matcher (same node label or synonym)
    - Clusters similar/synonym labels
b) The properties of node scope values, to extract semantics from the structure
    - E.g. the node with label name, n2 [2,2], is a descendant of the node with label author, n1 [1,2], and not of the node with label publisher, n3 [3,4], as verified by the descendant test

Descendant node check: if the scope of node x is [X,Y] and the scope of node xd is [Xd,Yd], then xd is a descendant of x when
Xd > X and Yd <= Y
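The scope values and the descendant check can be reproduced with a short depth-first traversal. This is a sketch under stated assumptions: trees are nested `(label, children)` tuples, a node's scope is [its own pre-order number, the pre-order number of its last descendant], and the function names are hypothetical. The example tree is the book/author/publisher/title tree from the slide.

```python
def assign_scopes(tree):
    """Return [(label, (start, end), parent_number), ...] in pre-order.

    start is the node's pre-order number, end is the number of its last
    descendant, and the root's parent number is -1.
    """
    nodes = []

    def visit(node, parent):
        label, children = node
        number = len(nodes)
        nodes.append(None)  # reserve the slot, filled once the subtree is done
        for child in children:
            visit(child, number)
        nodes[number] = (label, (number, len(nodes) - 1), parent)

    visit(tree, -1)
    return nodes


def is_descendant(scope_d, scope_x):
    """Descendant test from the slide: Xd > X and Yd <= Y."""
    (xd, yd), (x, y) = scope_d, scope_x
    return xd > x and yd <= y


book = ("book", [
    ("author", [("name", [])]),
    ("publisher", [("name", [])]),
    ("title", []),
])

nodes = assign_scopes(book)
for number, (label, scope, parent) in enumerate(nodes):
    print(f"n{number} {list(scope)} {label} (parent {parent})")

# name n2 [2,2] against author n1 [1,2] and publisher n3 [3,4]
print(is_descendant(nodes[2][1], nodes[1][1]))
print(is_descendant(nodes[2][1], nodes[3][1]))
```

Because the test only compares two pairs of integers, ancestry can be decided without walking the tree again.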
Tree Mining Approach … continued

- Data structures used
    - Label List: sorted list of all node labels in the forest of XML schema trees
    - xGrid: matrix in which each row represents a participating XML tree and each column represents a node label. Each cell contains the scope values, parent node number and mapping information.
- Output
    - Creation of a mediated schema tree from the given forest of participating XML schema trees
    - Generation of mapping information between the participating schema trees and the mediated schema tree
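A rough sketch of how the Label List and the xGrid could be built from a forest is given below. Several details are assumptions, not the authors' implementation: trees are nested `(label, children)` tuples, the label list holds each distinct label once, a cell is a list so that a tree with two nodes of the same label keeps both, and the mapping field is left as `None` because it would only be filled during integration.

```python
def preorder(tree):
    """Return [label, start, end, parent] records in pre-order for one tree."""
    nodes = []

    def visit(node, parent):
        label, children = node
        number = len(nodes)
        nodes.append([label, number, number, parent])
        for child in children:
            visit(child, number)
        nodes[number][2] = len(nodes) - 1  # scope end = last descendant

    visit(tree, -1)
    return nodes


def build_xgrid(forest):
    """Build the sorted label list and a rows-by-labels grid of cell entries."""
    per_tree = [preorder(t) for t in forest]
    label_list = sorted({n[0] for nodes in per_tree for n in nodes})
    column = {label: i for i, label in enumerate(label_list)}
    grid = [[[] for _ in label_list] for _ in forest]
    for row, nodes in enumerate(per_tree):
        for label, start, end, parent in nodes:
            grid[row][column[label]].append(
                {"scope": (start, end), "parent": parent, "mapping": None}
            )
    return label_list, grid


# Two hypothetical schema trees.
s1 = ("book", [("author", [("name", [])]),
               ("publisher", [("name", [])]),
               ("title", [])])
s2 = ("book", [("writer", [("name", [])]), ("title", [])])

label_list, grid = build_xgrid([s1, s2])
print(label_list)
print(grid[0][label_list.index("name")])    # both name nodes of s1
print(grid[1][label_list.index("writer")])  # writer node of s2
```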
Tree Mining Approach … continued

[Figure: example xGrid with rows for the schema trees S1 to S4 and the mediated schema Sm, and one column per node label (a, b, d, f, g, h, i, n, n, o, p, r, t, w). Each cell holds the scope values, parent node number and mapping information of a node. The figure also lists pairs such as <1,0>, <2,0>, <3,0>, <4,3> against node labels.]

Mapping information is the column number of the node.
Conclusion

- Element-level name and linguistic matching, supported by a thesaurus, is an integral part of every match system.

- As systems move towards schema/ontology-based manipulation, and global schemas or previous matching results are lacking, structure-level matching is equally important for making out the semantics.

- A peer-to-peer environment requires new methods that deliver both performance and mapping quality, i.e. the integration of tree mining techniques for matching and for search space optimisation.

- Machine learning algorithms can be beneficial in the P2P environment in later stages, once training examples have been created from instance data, provided the target domain remains the same.
References

[AH04] Antoniou G. and Harmelen F. (2004) A Semantic Web Primer. The MIT Press.
[ADMR05] Aumuller D., Do H. H., Massmann S. and Rahm E. (2005) Schema and Ontology Matching with COMA++. Proceedings of the International Conference on Management of Data (SIGMOD), 2005.
[BR04] Bellahsène Z. and Roantree M. (2004) Querying Distributed Data in a Super-peer based Architecture. DEXA 2004.
[BMP04] Bernstein PA., Melnik S., Petropoulos M. and Quix C. (2004) Industrial-Strength Schema Matching. SIGMOD Record, Vol. 33, No. 4, December 2004.
[DMDH02] Doan AH., Madhavan J., Domingos P. and Halevy A. (2002) Learning to Map between Ontologies on the Semantic Web. WWW 2002.
[MBR01] Madhavan J., Bernstein PA. and Rahm E. (2001) Generic Schema Matching with Cupid. VLDB 2001.
[RB01] Rahm E. and Bernstein PA. (2001) A Survey of Approaches to Automatic Schema Matching. VLDB Journal 10(4):334-350.
[SE05] Shvaiko P. and Euzenat J. (2005) A Survey of Schema-based Matching Approaches. Journal on Data Semantics, 2005.
[TBBT04] Tranier J., Baraer R., Bellahsène Z. and Teisseire M. (2004) Where's Charlie: Family Based Heuristics for Peer-to-Peer Schema Integration. IDEAS 2004, 227-235.
[Z02] Zaki MJ. (2002) Efficiently Mining Frequent Trees in a Forest. 8th ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, July 2002.
http://www.w3.org/TR/daml+oil-reference
http://www.doc.ic.ac.uk/automed/
Thank you