Ontology vs. Data Model

Data Model vs. Ontology
Dr. Tatiana Malyuta
Associate Professor, CUNY
Consultant for DoD
Dr. Barry Smith
UB, NCOR
Data Model – Purpose
• To provide a consistent and efficiently functioning data
store for one or more particular business applications
– Represents specific business concepts in a way that determines
the organization of data in the store
– Commonly used representations are relational and graph; they
are supported by data management technologies, e.g. relational:
Oracle and MySQL; graph: Neo4j and RDF/OWL stores
• Efficiency requires
– Application-specific representations
– Storing only the data needed by the application
• Objective (shared) representation of the domain is not
the purpose – multiple data models for the same domain
to accommodate different business applications
Data Silos
• Numerous partial idiosyncratic representations of the
domain in data models and numerous versions of data in
data stores
• No re-usability
• No single version of truth
[Figure: separate data silos labeled Accounts Receivable, Accounts Payable, and Budget]
Ontology – Purpose
• Objectivity of representation of reality
• Commonly used representation is graph; it is
supported by RDF-based semantic technologies
• Objective (shared) representation of the domain
- one authoritative ontology for the domain of
reality meant for re-use
• Storing vast volumes of data is not the purpose
Financial Ontology
• A single domain ontology (or a collection of ontologies)
• To be re-used in different applications
• Single version of truth (as we know it today)
Note: we discuss ontologies built in accordance with the
methodology and architecture pioneered by Dr. Smith.
Comparison
• Although there are technologies that support a particular
paradigm in the best way, they are not the defining factor
in distinguishing between a data model and ontology
• We compare not technologies but paradigms
[Figure: side-by-side "Data Model" and "Ontology" diagrams of the same Person/Name/Skill domain. Labels appearing in the diagrams: Person, PersonSkill, Skills, Person Name, First Name, Middle Name, Last Name, Nick Name, Skill, Computer Skill, Network Skill, Programming Skill, Java Skill]
Data Model – Types
• Types are general or repeatable entities capable of being
instantiated by indefinitely many particulars
• Data model types and instances are abstractions embodying
efficient ways of describing the data about reality that is
needed by an application (efficient both for reasoning and for
storage)
– Different abstractions depending on the business need
The data model term ‘person’ is used to define an efficient storage solution for data about persons needed by a particular application.
Ontology – Types
• Ontology types and instances are on the side of
reality
• They must provide one term, and one definition, for
each salient type of entity in each domain of interest
The ontology term ‘person’, when it is used to represent data about persons, is designed to establish a link between these data and persons in reality.
Data Model – Organization
• Arbitrary combination of selected types suited for
efficient data processing
• The data model view of reality is flat and rigid
One of the models needs to be changed to accommodate multiple skills of a person. These changes can be performed only through significant effort because of the relative rigidity of data representation languages and the need to re-arrange the physical data store.
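The change described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the presenters' example: the table and column names (Person, PersonSkill, Skill) echo the slide labels, while the rows and the migration steps are assumptions made for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Original application-specific model: exactly one skill per person.
cur.execute("CREATE TABLE Person (PersonID INTEGER PRIMARY KEY, Name TEXT, Skill TEXT)")
cur.executemany("INSERT INTO Person VALUES (?, ?, ?)",
                [(1, "John", "Java"), (2, "Mary", "C++")])

# New requirement: a person may have several skills.
# The schema itself must change: a new table, a data migration,
# and a rebuild of the Person table without the Skill column.
cur.execute("CREATE TABLE PersonSkill (PersonID INTEGER, Skill TEXT)")
cur.execute("INSERT INTO PersonSkill SELECT PersonID, Skill FROM Person")
cur.execute("CREATE TABLE PersonNew (PersonID INTEGER PRIMARY KEY, Name TEXT)")
cur.execute("INSERT INTO PersonNew SELECT PersonID, Name FROM Person")
cur.execute("DROP TABLE Person")
cur.execute("ALTER TABLE PersonNew RENAME TO Person")

# Only now can a second skill be recorded for John.
cur.execute("INSERT INTO PersonSkill VALUES (?, ?)", (1, "SQL"))
conn.commit()

john_skills = sorted(row[0] for row in cur.execute(
    "SELECT Skill FROM PersonSkill WHERE PersonID = 1"))
person_columns = [row[1] for row in cur.execute("PRAGMA table_info(Person)")]
print(john_skills)
print(person_columns)
```

Every query written against the old Person.Skill column would also have to be rewritten, which is part of the effort the slide refers to.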
Ontology – Organization
• Each type appears only once in the ontology
hierarchy.
• The ontology view of reality is synoptic: it
represents in non-redundant fashion an entire
hierarchy of types at different levels of generality.
Each term is associated in an intelligible way with
its subsuming and subsumed terms (and thus with
the ancestor and descendant types) in the
hierarchy of more and less general types
• Representation is more flexible, changes are easier
to make, and changes are not as disruptive
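A type hierarchy of this kind can be sketched as a plain parent map in Python. The terms come from the slide figures; the exact parent-child arrangement, and the helper names ancestors and descendants, are assumptions made for illustration.

```python
# Each type appears exactly once, as a key, with its single parent type.
# The arrangement below is assumed from the labels on the slides.
PARENT = {
    "Skill": None,
    "Computer Skill": "Skill",
    "Network Skill": "Computer Skill",
    "Programming Skill": "Computer Skill",
    "Java Skill": "Programming Skill",
}

def ancestors(term):
    """Return the subsuming types of a term, nearest first."""
    chain = []
    parent = PARENT[term]
    while parent is not None:
        chain.append(parent)
        parent = PARENT[parent]
    return chain

def descendants(term):
    """Return all subsumed types of a term, depth first."""
    result = []
    for child in [t for t, p in PARENT.items() if p == term]:
        result.append(child)
        result.extend(descendants(child))
    return result

print(ancestors("Java Skill"))
print(descendants("Computer Skill"))

# Adding a type is one new entry; existing entries are untouched.
PARENT["Sewing Skill"] = "Skill"
print(descendants("Skill"))
```

Adding Sewing Skill does not disturb any existing term, which is the sense in which changes to the hierarchy are less disruptive than a schema change.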
Questions?
Data Model vs. Ontology – Types and Individuals
[Figure: types and individuals in a data model and in an ontology. Labels appearing in the diagrams: Skill, Computer Skill, Programming Skill, Sewing Skill, Java, C++; Person Name, John, Mary]
Data Model – Labels
• Are not as important because databases are not
directly exposed to users – they are presented
via an application that exposes the database
content using the specific vocabulary of a
narrow community of users
• Can be anything, e.g. ‘PN’, ‘PName’, ‘PersName’,
‘PersonN’, etc. for the person name
• The meaning of the label is often derived from
the context (e.g. Name for the name of the
Person and the name of the Skill in one of the
examples)
Ontology – Labels
• Are exposed to users
• Are nouns and noun phrases from natural
language, and each type has a unique name that
designates the type unambiguously regardless of
the context in which the type might be used, e.g.
PersonName, SkillName
Closed and Open World Assumptions (impact of technologies)
• Database reasoning is confined to search based on
the closed world assumption. If we do not find
something in the database, then this means that
this something does not exist in the world that is
defined by the database.
• Ontologies are based on the idea that we can never
describe entities in the real world completely. This
means that, from the absence in an ontology of a
particular term ‘A’, we cannot infer that As do not
exist. It means also that ontologies are constructed
in a way which allows easy addition of new types
and relations.
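The difference between the two assumptions can be sketched in a few lines of Python. This is a deliberately simplified illustration: the facts and the function names closed_world and open_world are hypothetical, and a real open-world reasoner would also derive falsehood from explicit negative statements, which this sketch does not model.

```python
# Recorded facts about persons and skills (hypothetical data).
FACTS = {
    ("John", "has_skill", "Java"),
    ("Mary", "has_skill", "C++"),
}

def closed_world(fact):
    # Database-style answer: whatever is not recorded is taken to be false.
    return fact in FACTS

def open_world(fact):
    # Ontology-style answer: an unrecorded fact is unknown (None), not false.
    return True if fact in FACTS else None

query = ("Mary", "has_skill", "Java")
print(closed_world(query))  # False: Mary is taken not to have the skill
print(open_world(query))    # None: nothing can be concluded either way
```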
Life Span
• Data models are created in ad hoc ways to
capture targeted selection of features; the data
model usually is not reused, which results in
numerous data silos for a domain
• Ontologies will grow and expand as new
knowledge is gained over time
Summary of Comparison
Dimension of Comparison | Traditional Data Model | Ontologies
Closeness to reality | Variable, application-specific | Reality is always the prime focus
Conceptualization of the domain | Plain and partial (always at the level of detail needed for a particular implementation) | Hierarchical, simultaneously describing the same domain at different levels of detail
Vocabulary | Application-specific, not intended for sharing | Application-independent, intended to support sharing and reuse
Structures or organization of types | Groupings of types to accommodate data access patterns | Taxonomies (type hierarchies) always used to describe/classify the domain
Combinability | Can rarely be combined; even if possible this will typically require significant manual effort | If the ontology building methodology is followed, then the results will be combinable automatically
Flexibility | Rigid, changes normally require significant effort | Flexible, changes can normally be effected very easily
Semantic Enhancement of Data
Models by Ontology
• Semantic Enhancement (SE) is realized with the help of ontologies that are
used to explicate data models and annotate data instances
– Vocabulary of ontologies used for explications and annotations provides
agile horizontal integration
– Ontologies, by virtue of their nature and organization, provide semantic
enhancement of data
[Figure: ontology terms (Education, Technical Education, Skill, ComputerSkill, ProgrammingSkill, SQL, Java, C++) linked to the values of a database table]
PersonID | Name | Description
111 | Java | Programming
222 | SQL | Database
The Meaning of ‘Enhancement’
• Semantic enhancement/enrichment of data = arm’s
length approach (no change to data) – through simple
explication we associate an entire knowledge system
with a database field
– enables analytics to process data, e.g. about computer skills,
“vertically” along the Skill hierarchy, as well as “horizontally” via
relations between Skill and Education.
– and further… while data in the database does not change, its
analysis can be richer and richer as our understanding of the
reality changes
• For this richness to be leveraged by different
communities, persons, and applications it needs to have
the properties mentioned above and be constructed in
accordance with the principles of the SE (see References)
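The "arm's length" idea can be sketched in Python as follows. The two rows are the ones shown in the slide's example table; the placement of terms in the hierarchy (in particular SQL directly under ComputerSkill), the ANNOTATION mapping, and the helper names are assumptions made for illustration. Only the "vertical" direction is shown.

```python
# Database rows stay exactly as stored (values from the slide's example table).
ROWS = [
    {"PersonID": 111, "Name": "Java", "Description": "Programming"},
    {"PersonID": 222, "Name": "SQL", "Description": "Database"},
]

# Ontology fragment as term -> parent. The arrangement is an assumption.
PARENT = {
    "Skill": None,
    "ComputerSkill": "Skill",
    "ProgrammingSkill": "ComputerSkill",
    "Java": "ProgrammingSkill",
    "C++": "ProgrammingSkill",
    "SQL": "ComputerSkill",
}

# Annotation kept outside the database: field value -> ontology term.
ANNOTATION = {"Java": "Java", "SQL": "SQL"}

def is_a(term, ancestor):
    """True if term equals ancestor or is subsumed by it."""
    while term is not None:
        if term == ancestor:
            return True
        term = PARENT[term]
    return False

def persons_with(skill_type):
    """Query the unchanged rows 'vertically' along the Skill hierarchy."""
    return [row["PersonID"] for row in ROWS
            if is_a(ANNOTATION[row["Name"]], skill_type)]

print(persons_with("ProgrammingSkill"))
print(persons_with("ComputerSkill"))
```

If the ontology is later refined, for example by adding a type between ComputerSkill and SQL, the same rows answer richer queries without any change to the stored data.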
SE and Data Integration
• Traditional integration approaches involve creation of a
new model used in
– A new physical store (data warehouse)
• Expensive, resource- and time-consuming
• Another data store – rigid (potential data silo), interoperable with
other stores
– Querying the data sources via it
• Fragile
• Both entail loss and/or distortion of data and semantics, and provide
only ‘local’ integration (do not lead to interoperability with other
sources)
• SE of a store
– Does not require data reorganization and creation of
another store
– Changes to it are non-intrusive
– Leads to integration of the store with other stores,
enhanced previously or in the future
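A minimal Python sketch of this kind of integration follows. The labels PName and PersName and the ontology terms PersonName and SkillName appear earlier in the slides; the labels Sk and Competence, the store contents, and the helper name view are hypothetical.

```python
# Two independent stores with their own idiosyncratic labels.
STORE_A = [{"PName": "John", "Sk": "Java"}]
STORE_B = [{"PersName": "Mary", "Competence": "C++"}]

# Explications: each store's labels mapped to shared ontology terms.
EXPLICATION_A = {"PName": "PersonName", "Sk": "SkillName"}
EXPLICATION_B = {"PersName": "PersonName", "Competence": "SkillName"}

def view(store, explication):
    # Read-only projection onto ontology terms; the store is not modified.
    return [{explication[field]: value for field, value in row.items()}
            for row in store]

# Stores enhanced at different times line up through the shared vocabulary.
integrated = view(STORE_A, EXPLICATION_A) + view(STORE_B, EXPLICATION_B)
names = [row["PersonName"] for row in integrated]
print(names)
```

No new physical store is created and neither source is reorganized; a third store would join the same view by supplying only its own explication.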
References
• Barry Smith, et al., IAO-Intel – An Ontology of Information Artifacts in the
Intelligence Domain, STIDS Conference, 2013.
• Barry Smith, Tatiana Malyuta, William S. Mandrick, Chia Fu, Kesny Parent,
Milan Patel, Horizontal Integration of Warfighter Intelligence Data: A
Shared Semantic Resource for the Intelligence Community, STIDS
Conference, 2012.
• Barry Smith, Tatiana Malyuta, David Salmen, William Mandrick, Kesny
Parent, Shouvik Bardhan, Jamie Johnson, Ontology for the Intelligence
Analyst, Crosstalk: The Journal of Defense Software Engineering, 2012.
• David Salmen, Tatiana Malyuta, Alan Hansen, Shaun Cronen, Barry Smith,
Integration of Intelligence Data through Semantic Enhancement, STIDS
Conference, 2011.
Questions?