Integrating Existing Ontologies
Copyright © 1997 Pangea Systems, Inc. All rights reserved.
Building Ontologies
Copyright © 1998 Pangea Systems, Inc. All rights reserved.
No field of Ontological Engineering equivalent to Knowledge or Software Engineering;
No standard methodologies for building ontologies;
Such a methodology would include:
  a set of stages that occur when building ontologies;
  guidelines and principles to assist in the different stages;
  an ontology life-cycle which indicates the relationships among stages.
Gruber's guidelines for constructing ontologies are well known.
The Development Lifecycle
Two kinds of complementary methodologies emerged:
  Stage-based, e.g. TOVE [Uschold96]
  Iterative evolving prototypes, e.g. MethOntology [Gomez Perez94].
Most have TWO stages:
1. Informal stage: the ontology is sketched out using either natural language descriptions or some diagram technique
2. Formal stage: the ontology is encoded in a formal knowledge representation language that is machine computable
An ontology should ideally be communicated to people and unambiguously interpreted by software
  the informal representation helps the former
  the formal representation helps the latter.
A Provisional Methodology
A skeletal methodology and life-cycle for building ontologies;
Inspired by the software engineering V-process model;
The left side charts the processes in building an ontology
The right side charts the guidelines, principles and evaluation used to 'quality assure' the ontology
The overall process moves through a life-cycle.
The V-model Methodology
[Figure: V-model diagram. Left side: processes; right side: guidelines, principles and evaluation.]
User Model
  Processes: Identify purpose and scope; Knowledge acquisition
  Quality assurance: Ontology in use; Evaluation: coverage, verification, granularity
Conceptualisation Model
  Processes: Conceptualisation; Integrating existing ontologies
  Quality assurance: Conceptualisation principles: commitment, conciseness, clarity, extensibility, coherency
Implementation Model
  Processes: Encoding; Representation
  Quality assurance: Encoding/Representation principles: encoding bias, consistency, house styles and standards, reasoning system exploitation
The ontology building life-cycle
[Figure: life-cycle diagram]
Stages: Identify purpose and scope; Knowledge acquisition; Building (Conceptualisation, Encoding); Evaluation
Other labels in the figure: Integrating existing ontologies; Language and representation; Available development tools
User Model: Identify purpose and scope
Decide what applications the ontology will support
  EcoCyc: Pathway engineering, qualitative simulation of metabolism, computer-aided instruction, reference source
  TAMBIS: retrieval across a broad range of bioinformatics resources
The use to which an ontology is put affects its content and style
  Impacts re-usability of the ontology
User Model: Knowledge Acquisition
Specialist biologists; standard text books; research papers and other ontologies and database schema.
Motivating scenarios and informal competency questions: informal questions the ontology must be able to answer
Evaluation:
  Fitness for purpose
  Coverage and competency
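
Informal competency questions can be turned into executable coverage checks once a draft ontology exists. The following is a minimal sketch, assuming a toy ontology held as (subject, relation, object) triples; the triples, the relation names, the `answers` helper and the three questions are hypothetical illustrations, not taken from EcoCyc or TAMBIS.

```python
# Toy ontology as (subject, relation, object) triples (illustrative only).
TRIPLES = {
    ("Enzyme", "is_a", "Protein"),
    ("Enzyme", "catalyses", "Reaction"),
    ("Gene", "expresses", "Protein"),
}


def answers(triples, subject=None, relation=None, obj=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return {
        (s, r, o)
        for (s, r, o) in triples
        if subject in (None, s) and relation in (None, r) and obj in (None, o)
    }


# Informal competency questions, each paired with a query pattern (hypothetical).
COMPETENCY_QUESTIONS = {
    "What catalyses a reaction?": dict(relation="catalyses", obj="Reaction"),
    "What do genes express?": dict(subject="Gene", relation="expresses"),
    "Where is a gene located?": dict(subject="Gene", relation="found_on"),
}


def coverage_report(triples, questions):
    """Map each question to True if the ontology returns at least one answer."""
    return {q: bool(answers(triples, **pattern)) for q, pattern in questions.items()}


report = coverage_report(TRIPLES, COMPETENCY_QUESTIONS)
unanswered = [q for q, ok in report.items() if not ok]
print(unanswered)
```

A question with no answer points to a coverage gap: here the toy ontology has no relation describing where a gene is located.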
Conceptualisation Model: Conceptualisation
Identify the key concepts, their properties and the
relationships that hold between them;
Which ones are essential?
What information will be required by the applications?
Structure domain knowledge into explicit conceptual models.
Identify natural language terms to refer to such concepts,
relations and attributes;
Determine naming conventions
Consistent naming for classes and slots
EcoCyc:
Classes are capitalized, hyphenated, plural
Slot names are uppercase
A quality ontology captures relevant biological distinctions with
high fidelity
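
A naming convention is easier to keep consistent when it is checked mechanically. The sketch below assumes the EcoCyc-style rules quoted above (classes capitalized and hyphenated, slot names uppercase); the regular expressions, the `check_names` helper and the example names are assumptions made for illustration, and the "plural" rule is deliberately not checked because it cannot be tested reliably with a pattern.

```python
import re

# Classes: capitalized words joined by hyphens, e.g. "Protein-Complexes".
CLASS_NAME = re.compile(r"^[A-Z][A-Za-z]*(-[A-Z][A-Za-z]*)*$")
# Slots: uppercase words joined by hyphens (hypothetical slot names below).
SLOT_NAME = re.compile(r"^[A-Z]+(-[A-Z]+)*$")


def check_names(class_names, slot_names):
    """Return the class names and slot names that break the convention."""
    bad_classes = [n for n in class_names if not CLASS_NAME.match(n)]
    bad_slots = [n for n in slot_names if not SLOT_NAME.match(n)]
    return bad_classes, bad_slots


bad_classes, bad_slots = check_names(
    ["Protein-Complexes", "Nucleic-Acids", "nucleic acids"],
    ["LEFT", "Right"],
)
print(bad_classes, bad_slots)
```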
Conceptualisation Model: Pitfalls
Pitfall: Missing ontological elements
  Missing classes: Swiss-Prot Protein complexes
  Missing attributes: Genetic code identifier
  Confuse 1:1 with 1:Many, or 1:Many with Many:Many
    Cofactor as an attribute of reaction
  Important data is stored within text/comment fields
Pitfall: Extra ontological elements
Pitfall: Stop over-elaborating – when do I stop?
Pitfall: Relevance – do I really need all this detail?
Integrating Existing Ontologies
Reuse or adapt existing ontologies when possible
  Save time
  Correctness
  Facilitate interoperation
Integration of ontologies
  Ontologies have to be aligned
  Hindered by poor documentation and argumentation
  Hindered by implicit assumptions
  Shared generic upper level ontologies should make integration easier
Encoding: Implementation Toolkit
Construct ontology using an ontology-development system
Does the data model have the right expressivity?
Is it just a taxonomy or are relationships needed?
Is multiple parentage needed? Inverse relationships?
What types of constraints are needed?
Are reasoning services needed?
What are authoring features of the development tool?
Can ontology be exported to a DBMS schema?
Can ontology be exported to an ontology exchange language?
Is simultaneous updating by multiple authors needed?
Size limitations of development tool?
Encoding: Ontology Implementation Pitfalls
Pitfall: Semantic ambiguity
  Multiple ways to encode the same information
  Meaning of class definitions unclear
Pitfall: Encoding Bias
  Encoding the ontology changes the ontology
Encoding: Ontology Implementation Pitfalls (continued)
Pitfall: Redundancy (lack of normalization)
  Exact same information repeated
  Presence of computationally derivable information
    Date of birth and age
    DNA sequence and reverse complement
  More effort required for entry and update
  Partial updates lead to inconsistency
  OK if redundant information is maintained automatically
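
One way to avoid the redundancy pitfall is to store only the primary fact and compute the derivable one on demand. This is a minimal sketch using the DNA sequence and reverse complement example; the `DnaSequence` class and its field names are hypothetical, and only the four unambiguous bases A, C, G and T are handled.

```python
from dataclasses import dataclass

# Watson-Crick complement for the four unambiguous bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


@dataclass
class DnaSequence:
    bases: str  # the only stored fact

    @property
    def reverse_complement(self) -> str:
        """Derived on demand, so it can never disagree with `bases`."""
        return self.bases.translate(_COMPLEMENT)[::-1]


seq = DnaSequence("ATGC")
print(seq.reverse_complement)  # GCAT

# Updating the stored sequence needs no second, matching update.
seq.bases = "AAAC"
print(seq.reverse_complement)  # GTTT
```

Because the reverse complement is never stored, a partial update cannot leave the two values inconsistent, which is the "maintained automatically" case the slide allows.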
Encoding: The Interaction Problem
Task influences what knowledge is represented and how it's represented
  Molecular biology: chemical and physical properties of proteins
  Bioinformatics: accession number, function, gene
  Underlying perspectives mean they may not be reconcilable
If an ontology has too many conflicting tasks it can end up compromised (TaO experience)
Evaluate it - A guide for reusability
Conciseness
  No redundancy
  Appropriateness: e.g. modelling protein molecules at atomic resolution when the amino acid level would do
Clarity
Consistency
  Satisfiability: it doesn't contradict itself
    e.g. Enzyme is both a protein which catalyses a reaction and one which does not catalyse a reaction
Commitment
  Do I have to buy into a load of stuff I don't really need or want just to get the bit I do?
Documentation: Make Ontology Understandable!
Produce clear informal and formal documentation
  An ontology that cannot be understood will not be reused
    Genbank feature table
    NCBI ASN.1 definitions
There exists a space of alternative ontology design decisions
  Semantics / Granularity
  Terminology
Pitfall: Neglecting to record design rationale
Publish the Ontology
Formal and informal specifications
Intended domain of application
Design rationale
Limitations
See EcoCyc paper in ISMB-93/Bioinformatics 00
See TAMBIS paper in Bioinformatics 99
Macromolecule Reference Ontology
[Figure: taxonomy diagram of the reference ontology]
Node labels: MacroMolecule, SequenceComponent, Gene, Lipid, Nucleic Acid, RNA, Protein, Motif, Phosphorylation site, Peptide, Enzyme, Restriction site, mRNA, cDNA, DNA, gDNA, mDNA
Relationship label: componentOf
Discussion
What is a macromolecule?
  Where does macromolecule fit into an upper level ontology?
    Substance?
    Structure?
Is lipid a macromolecule?
  If we replace macromolecule with biopolymer is the placement of lipid legit?
Is a peptide a protein and therefore a macromolecule? If not, where does it go?
Taxonomy and Roles
Do we want to assert everything in a taxonomy?
Or do we want to define things in terms of their properties?
  Enzyme = Protein catalyses Reaction
  gDNA = DNA hasLocation Chromosomal
  Sufficient as well as necessary conditions
What's the relationship between
  cDNA and EST
  cDNA and some child of RNA?
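
Defining classes by their properties, rather than asserting every placement in a taxonomy, lets membership be computed. The sketch below treats the two definitions above as necessary and sufficient conditions over simple (relation, value) pairs; the individuals, the `FACTS` table and the `classify` helper are hypothetical, and a real description-logic reasoner does considerably more than this set-membership test.

```python
# Asserted facts about individuals (hypothetical data).
FACTS = {
    "hexokinase": {"is_a": {"Protein"}, "catalyses": {"Reaction"}},
    "collagen": {"is_a": {"Protein"}},
    "chromosome_fragment": {"is_a": {"DNA"}, "hasLocation": {"Chromosomal"}},
}

# Defined classes: conditions that are both necessary and sufficient.
DEFINITIONS = {
    "Enzyme": {("is_a", "Protein"), ("catalyses", "Reaction")},
    "gDNA": {("is_a", "DNA"), ("hasLocation", "Chromosomal")},
}


def satisfies(facts, conditions):
    """True if every (relation, value) condition holds for the individual."""
    return all(value in facts.get(relation, set()) for relation, value in conditions)


def classify(individual):
    """Return the defined classes whose conditions the individual meets."""
    facts = FACTS[individual]
    return sorted(name for name, conds in DEFINITIONS.items() if satisfies(facts, conds))


for name in FACTS:
    print(name, classify(name))
```

Nothing states that hexokinase is an Enzyme; the placement follows from its properties, which is the trade-off the slide raises against a fully asserted taxonomy.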
Axioms and constraints
Not all RNA is translated to protein
  Do we want to say that DNA is translated to protein?
  Do we want to model catalytic RNAs?
Relationships: what other ones do we need?
  Genes express proteins
  Genes express rRNA, tRNA
  Genes are found on gDNA
  Genes are found on mDNA
  Genes have their own components: recursive relationships with partitive semantics
Reasoning? Instances?
Reusable? Clear? Concise?
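
A recursive part-of relationship usually calls for a transitive query: if a component belongs to a gene and the gene belongs to a stretch of DNA, the component belongs to that DNA as well. The following is a minimal sketch, assuming direct componentOf assertions held in a dictionary; the specific assertions (Motif in Gene, Gene in gDNA and mDNA) and the `all_wholes` helper are assumptions for illustration, and reading "found on" as componentOf is itself a modelling choice the slide leaves open.

```python
# Direct componentOf assertions: part -> wholes (hypothetical examples).
COMPONENT_OF = {
    "Motif": {"Gene"},
    "Gene": {"gDNA", "mDNA"},
}


def all_wholes(part, direct=COMPONENT_OF):
    """Everything `part` is directly or indirectly a component of."""
    found = set()
    stack = [part]
    while stack:
        current = stack.pop()
        for whole in direct.get(current, ()):
            if whole not in found:  # also guards against cycles
                found.add(whole)
                stack.append(whole)
    return found


print(sorted(all_wholes("Motif")))
```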
Ontological Pitfalls
Stop-over: when do I stop over-elaborating?
  Proteins, amino acid residues, side chains, physical chemical properties ...
Relevance
  Do we need to mention all the types of nucleic acid?
EcoCyc
[Figure: EcoCyc class hierarchy diagram]
Class labels: Chemicals, MacroMolecule, Compounds-And-Elements, Nucleic-Acids, Compounds, Proteins, Lipids, RNA, Misc-RNA, DNA, PolyPeptides, Protein-Complexes, DNA-Segments, Genes
Macromolecule in other Ontologies
Gene Ontology
  Used to add attributes to gene instances in databases
  Doesn't need to talk about molecules or components of molecules
TAMBIS Ontology
  Models it in a similar way to our reference macromolecule ontology
  Because it asks questions of bioinformatics sources