
Ontology-Based Information Management
MatchIT 1.1: Data Integration with Semantic Mapping Technologies
Michael Schidlowsky
Sr. Software Architect
4 North Park • Suite 106 • Hunt Valley, MD 21030 • 410-584-0009 • www.revelytix.com
Data Integration
Motivated by:
• Organizational Changes
  - Mergers and Acquisitions
  - Internal reorganizations (e.g., DHS)
• Data Mining
• Standards Conformance
• Migration Efforts
• Legacy Systems
• Decouple data sources from application code
Data Integration
Challenges for the integration specialist include:
• Domain-specific terms
• Unfamiliarity with source schemas
• Large size of schema set
• Semantics often not captured
• Captured semantics
  - Stored in ad-hoc formats
  - Cannot be reused to facilitate future data integration efforts
Data Integration: Example
Background:
Acme Inc. merges with CompuGlobalHyperMeganet.
Technical Challenge:
Need a “virtual database” of all sales for all stores in real time.
• Which fields represent customers?
  - CUSTOMERID
  - CUST_ID
  - SSN
• Which fields represent ‘Price’?
  - Sale_Amt
  - Total_Sale
• What if your database has 10,000 columns?
Data Integration: Example
Background:
HR needs to use employee information for a new company portal.
Technical Challenge:
Data must be in XML and conform to a standard HR schema.
• Which fields are related to ‘Address’?
  - RESIDENCE
  - PREV_RESIDENCE
• What if your database has 10,000 columns?
Ideal Matching Solution
• Finds lexical relationships
• Captures semantic information
• Finds semantic relationships
• Provides programmatic access to results (API)
• Fast
• Scalable
• Human Involvement
MatchIT Philosophy
The best matching tool already exists!
What is meant by “ID”?
- “PLEASE PRESENT ID”
- NY, NJ, ID
- SUPEREGO, EGO, ID
MatchIT 1.1
- MatchIT is a semantic and lexical matching tool.
- Session Outline:
  - Import and process schemas
  - Perform lexical matching
  - Create and manage a semantic vocabulary
  - Perform semantic matching
  - Demonstrate 3rd-party integration with a data integration tool (MetaMatrix)
Import & Process Schemas
Revelytix Models are RDF/OWL
• Flexible model architecture
• Extensible
• Interoperable
Current Importers:
• JDBC
• XML Schema
• MetaMatrix XMI Models
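As a rough illustration of what a schema importer does, the sketch below reads table and column metadata from a relational database and emits RDF-style (subject, predicate, object) triples. It is a minimal sketch under stated assumptions: it uses Python's built-in sqlite3 module in place of JDBC, and the namespace URI and the `hasColumn` / `dataType` property names are hypothetical, not the actual Revelytix model vocabulary.

```python
import sqlite3

# Hypothetical namespace and property names; not the actual Revelytix model vocabulary.
NS = "http://example.org/schema#"

def import_schema(conn):
    """Return (subject, predicate, object) triples describing every table and column."""
    triples = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        table_uri = NS + table
        triples.append((table_uri, "rdf:type", NS + "Table"))
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
        for _cid, column, col_type, _notnull, _default, _pk in conn.execute(
            f'PRAGMA table_info("{table}")'
        ):
            column_uri = f"{table_uri}.{column}"
            triples.append((column_uri, "rdf:type", NS + "Column"))
            triples.append((table_uri, NS + "hasColumn", column_uri))
            triples.append((column_uri, NS + "dataType", col_type))
    return triples

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SALES (CUST_ID INTEGER, Sale_Amt REAL)")
for triple in import_schema(conn):
    print(triple)
```

Emitting plain triples rather than a fixed relational structure is one way to get the flexibility and extensibility listed above: new kinds of statements about a schema element can be added later without changing the model's shape.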
Importer Demo
Lexical Matching
Uses lexical distance measures to determine lexical similarity.
• Fastest matching technique
• Requires no work other than importing schemas
• Often yields interesting results
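The slides do not say which lexical distance measures MatchIT uses. As one common example, the sketch below computes Levenshtein (edit) distance between two field names and normalizes it into a similarity score between 0 and 1; lower-casing the names first is an assumption made here for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if the characters match)
            ))
        prev = curr
    return prev[-1]

def lexical_similarity(a: str, b: str) -> float:
    """1.0 for names identical up to case, falling toward 0.0 as edits accumulate."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest

for candidate in ["CUST_ID", "SSN", "Sale_Amt"]:
    print(candidate, round(lexical_similarity("CUSTOMERID", candidate), 2))
```

SSN, which the earlier example lists as a customer field, scores poorly against CUSTOMERID here: edit distance cannot see that relationship, which is the gap semantic matching is meant to fill.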
Lexical Matching Demo
Create Vocabulary from Schemas
A Vocabulary is
• A set of symbols
• Occurrences of those symbols in your schemas
• Binding of each symbol to one or more semantic concepts
• Created by MatchIT from schemas using tokenization algorithms.
• Reusable
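To make the three parts of a Vocabulary concrete, the sketch below models symbols, their occurrences in schema elements, and their bindings to concepts. The class and method names, the concept identifiers, and the simple underscore split are all hypothetical illustrations, not MatchIT's internal representation.

```python
from collections import defaultdict

class Vocabulary:
    """Hypothetical model of a Vocabulary: symbols, their occurrences, and concept bindings."""

    def __init__(self):
        self.occurrences = defaultdict(set)  # symbol -> schema elements it occurs in
        self.bindings = defaultdict(set)     # symbol -> concept identifiers it is bound to

    @property
    def symbols(self):
        """The set of symbols extracted so far."""
        return set(self.occurrences)

    def add_element(self, element_name):
        """Record each underscore-delimited token of an element name as a symbol occurrence."""
        for token in element_name.lower().split("_"):
            if token:
                self.occurrences[token].add(element_name)

    def bind(self, symbol, concept):
        """Bind a symbol to a concept; a symbol may be bound to several concepts."""
        self.bindings[symbol.lower()].add(concept)

    def elements_for_concept(self, concept):
        """All schema elements containing a symbol bound to the given concept."""
        return {
            element
            for symbol, concepts in self.bindings.items()
            if concept in concepts
            for element in self.occurrences.get(symbol, ())
        }

vocab = Vocabulary()
for name in ["CUST_ID", "CUSTOMER_NAME", "PREV_RESIDENCE", "RESIDENCE"]:
    vocab.add_element(name)
vocab.bind("CUST", "concept:Customer")
vocab.bind("CUSTOMER", "concept:Customer")
vocab.bind("RESIDENCE", "concept:Address")
print(sorted(vocab.elements_for_concept("concept:Customer")))
```

CUST_ID and CUSTOMER_NAME come back together only because both symbols were bound to the same concept; the bindings, not the raw names, carry the meaning, and because they are stored apart from any one mapping they could be kept for the next integration effort.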
Tokenization Algorithms
Different schemas require different tokenization techniques.
Tokenization algorithms determine how symbols are extracted from schemas:
• Capitalization
• Delimiters
• English Language
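The first two strategies listed above, delimiters and capitalization, can be sketched with regular expressions; English-language segmentation (splitting an all-lowercase or all-uppercase run into dictionary words) needs a word list and is omitted. The function name and the delimiter set are illustrative assumptions, not MatchIT's actual rules.

```python
import re

# Delimiters assumed for this sketch: underscore, hyphen, period, whitespace.
DELIMITERS = re.compile(r"[_\-.\s]+")
# Within a delimiter-free chunk: an uppercase run not followed by a lowercase letter
# (an acronym such as "XML"), a capitalized or lowercase word, or a run of digits.
CASE_TOKENS = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")

def tokenize(name: str) -> list[str]:
    """Split a schema element name into lower-cased symbols."""
    tokens = []
    for chunk in DELIMITERS.split(name):
        tokens.extend(t.lower() for t in CASE_TOKENS.findall(chunk))
    return tokens

for name in ["Sale_Amt", "GenderCode", "CUST_ID", "XMLSchemaURI", "CUSTOMERID"]:
    print(name, tokenize(name))
```

CUSTOMERID stays a single token under these rules, which illustrates why capitalization and delimiters alone are not enough and an English-language pass is listed as a separate technique.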
Vocabulary Demo
Matching Techniques
MatchIT currently uses two types of matching techniques:
• Lexical Matching
  - Attempts to determine the similarity of two symbols based on the lexical distance between them.
• Semantic Matching
  - Attempts to determine the similarity of two symbols based on the ontological distance between their concepts within a semantic knowledge base.
Parts Supplier Schema (as seen by a person)
Parts Supplier Schema (as seen by a computer)
Semantic Matching
How semantically similar are two concepts?
Is-a hierarchy (each concept is a kind of the concept it is indented under):
vehicle
  wheeled vehicle
    self-propelled vehicle
      motor vehicle
        car
        truck
  craft
    aircraft
      heavier-than-air craft
        airplane
Car and truck are very similar.
Car and airplane are less similar.
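One simple way to turn this hierarchy into a number is to count the is-a edges on the path between two concepts through their lowest common ancestor and map that count to a similarity. The slides do not specify MatchIT's actual knowledge-base distance measure; the 1 / (1 + path length) formula here is an assumption chosen for illustration, and the parent table is transcribed from the hierarchy on this slide.

```python
# Each concept mapped to its parent ("is a"), transcribed from the hierarchy on this slide.
IS_A = {
    "wheeled vehicle": "vehicle",
    "self-propelled vehicle": "wheeled vehicle",
    "motor vehicle": "self-propelled vehicle",
    "car": "motor vehicle",
    "truck": "motor vehicle",
    "craft": "vehicle",
    "aircraft": "craft",
    "heavier-than-air craft": "aircraft",
    "airplane": "heavier-than-air craft",
}

def ancestors(concept):
    """The concept and its ancestors, nearest first; a list index is the number of steps up."""
    chain = [concept]
    while chain[-1] in IS_A:
        chain.append(IS_A[chain[-1]])
    return chain

def path_length(a, b):
    """Number of is-a edges between a and b via their lowest common ancestor."""
    steps_up_from_a = {c: i for i, c in enumerate(ancestors(a))}
    for steps_from_b, c in enumerate(ancestors(b)):
        if c in steps_up_from_a:  # first shared concept walking up from b is the LCA
            return steps_up_from_a[c] + steps_from_b
    raise ValueError(f"{a!r} and {b!r} share no ancestor")

def similarity(a, b):
    """Assumed mapping from path length to a score in (0, 1]."""
    return 1.0 / (1.0 + path_length(a, b))

print(path_length("car", "truck"), path_length("car", "airplane"))  # prints: 2 8
print(similarity("car", "truck") > similarity("car", "airplane"))   # prints: True
```

Car and truck meet two edges apart at motor vehicle, while car and airplane only meet at vehicle, eight edges apart, which matches the ordering stated on the slide.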
Semantic Matching
Uses knowledge-base distance measures to determine semantic similarity.
• Presents ranked candidate matches
• Based on semantics captured in Vocabularies
• The only way to effectively find relationships between lexically dissimilar symbols:
  - GenderCode / SexCode
  - Provider / Supplier
  - Amount / Quantity
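Ranked candidate matches for lexically dissimilar names can be sketched by comparing the concepts their tokens are bound to rather than their characters. In the sketch below, the token-to-concept bindings are a hypothetical, hand-made stand-in for a Vocabulary, and scoring by Jaccard overlap of concept sets is an illustrative choice, not MatchIT's documented algorithm.

```python
# Hypothetical bindings standing in for a Vocabulary: token -> concept identifiers.
BINDINGS = {
    "gender": {"Sex"},
    "sex": {"Sex"},
    "code": {"Code"},
    "provider": {"Supplier"},
    "supplier": {"Supplier"},
    "amount": {"Quantity"},
    "quantity": {"Quantity"},
    "zip": {"PostalCode"},
}

def concepts(tokens):
    """Union of the concepts bound to any of the given tokens."""
    return set().union(*(BINDINGS.get(t, set()) for t in tokens))

def rank_candidates(source_tokens, candidates):
    """Return (score, name) pairs, best first, scored by Jaccard overlap of concept sets."""
    source = concepts(source_tokens)
    ranked = []
    for name, tokens in candidates.items():
        target = concepts(tokens)
        union = source | target
        score = len(source & target) / len(union) if union else 0.0
        ranked.append((score, name))
    return sorted(ranked, key=lambda pair: (-pair[0], pair[1]))

candidates = {
    "SexCode": ["sex", "code"],
    "ZipCode": ["zip", "code"],
    "Supplier": ["supplier"],
}
print(rank_candidates(["gender", "code"], candidates))
```

GenderCode ranks SexCode first even though "gender" and "sex" share almost no characters; ZipCode gets partial credit only for the shared "code" concept.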
Semantic Matching Demo
3rd Party Integration
MatchIT Integration
• MatchIT Java API
• Stand-alone application
• Embeddable application (as Eclipse plug-ins)
• Hides unapproved matches
• Useful for various 3rd-party applications:
- Data Integration
- Data Discovery
- Ontology Mediation
- Search
- Metadata Management
- Data Cleansing
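The integration layer hides unapproved matches, presumably so that downstream tools see only correspondences a person has signed off on. The sketch below shows that idea with a hypothetical match record and store; none of these names come from the MatchIT Java API, and the approval workflow is an assumption.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Match:
    """Hypothetical candidate correspondence between two schema elements."""
    source: str
    target: str
    score: float

@dataclass
class MatchStore:
    """Hypothetical store that keeps every candidate but exposes only approved ones."""
    _candidates: list = field(default_factory=list)
    _approved: set = field(default_factory=set)

    def propose(self, match: Match) -> None:
        """Record a candidate produced by lexical or semantic matching."""
        self._candidates.append(match)

    def approve(self, match: Match) -> None:
        """Mark a proposed candidate as approved by a person."""
        if match not in self._candidates:
            raise ValueError("can only approve a proposed match")
        self._approved.add(match)

    def matches(self) -> list:
        """What a third-party tool would see: approved matches only, in proposal order."""
        return [m for m in self._candidates if m in self._approved]

store = MatchStore()
good = Match("Sale_Amt", "Total_Sale", 0.9)
store.propose(good)
store.propose(Match("SSN", "Sale_Amt", 0.2))
store.approve(good)
print(store.matches())
```

Keeping unapproved candidates in the store, rather than discarding them, leaves them available for later review while keeping them out of what integration, search, or cleansing tools consume.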
MetaMatrix Demo
Questions?
MatchIT 30-day trial available at http://www.revelytix.com
Michael Schidlowsky