Bio-Ontologies: Their Creation and Design

Building and Using Ontologies
Robert Stevens
Department of Computer Science
University of Manchester
Manchester UK
http://img.cs.man.ac.uk/stevens
Introduction
• The nature of bioinformatics resources
• What is knowledge?
• What is an ontology?
• What are the uses of ontologies?
• Components of an ontology
• Building an ontology (in brief)
The Nature of Bioinformatics Resources
• Over 500 databanks and analysis tools that work over
resources
• Repositories of knowledge and data and generation of
new knowledge
• Knowledge often held as free text; some use made of
controlled vocabularies
• Enormous amount of semantic heterogeneity and poor
query facilities
• Knowledge about services not always apparent
What is Knowledge?
• Knowledge -- all information and an understanding to
carry out tasks and to infer new information
– Pat Baker is a Manchester bioinformatician who drinks beer.
– Protein that acts as a tyrosine kinase in the liver of primates.
• Information -- data equipped with meaning
– Patricia Grace Kennedy said mine is a pint (name, noun, verb)
– …CEKENN… read as single letter amino acid codes: C -- cysteine, K -- lysine
• Data -- un-interpreted signals that reach our senses
– PATRICIAGRACEKENNEDYSAIDMINEISAPINT
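The step from data to information on this slide can be sketched in Python. This is a minimal illustration, not part of the original slides: the lookup table covers only the four residues in the …CEKENN… fragment (standard one-letter codes), and the names `ONE_LETTER_CODES` and `annotate` are assumptions made for the example.

```python
# Standard one-letter amino acid codes, restricted to the residues that occur
# in the slide's fragment; a complete table would have 20 entries.
ONE_LETTER_CODES = {
    "C": "cysteine",
    "E": "glutamic acid",
    "K": "lysine",
    "N": "asparagine",
}


def annotate(sequence: str) -> list[str]:
    """Turn raw letters (data) into named residues (information)."""
    return [ONE_LETTER_CODES.get(letter, "unknown") for letter in sequence]


if __name__ == "__main__":
    # The same six letters appear inside PATRICIAGRACEKENNEDY...
    print(annotate("CEKENN"))
```

The letters alone are data; only the agreed code table gives them meaning. Knowledge, such as the protein's function, would need more than this lookup.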
Capturing Knowledge
• Capturing knowledge for both humans and computer
applications
• A set of vocabulary definitions that capture a
community’s knowledge of a domain
• `An ontology may take a variety of forms, but
necessarily it will include a vocabulary of terms, and
some specification of their meaning. This includes
definitions and an indication of how concepts are interrelated which collectively impose a structure on the
domain and constrain the possible interpretations of
terms.'
What Does an Ontology Do?
• Captures knowledge
• Creates a shared understanding – between
humans and for computers
• Makes knowledge machine processable
• Makes meaning explicit – by definition and
context
What is an Ontology?
[Figure: a spectrum of ontology kinds, from informal to formal:
Catalog/ID; Terms/glossary; Thesauri ("narrower term" relation);
Informal is-a; Formal is-a; Formal instance; Frames (properties);
Value restrictions; Disjointness, inverse, part-of, …;
General logical constraints]
Roles of Ontologies in Bioinformatics
• We can divide ontology use into three types:
• Domain-oriented, which are either domain specific (e.g.
E. coli) or domain generalisations (e.g. gene function or
ribosomes);
• Task-oriented, which are either task specific (e.g.
annotation analysis) or task generalisations (e.g.
problem solving);
• Generic, which capture common high level concepts,
such as Physical, Abstract and Substance. Important in
ontology management and language applications.
Uses of Ontology
• Community reference -- neutral authoring.
• Either defining database schema or defining a common
vocabulary for database annotation -- ontology as
specification.
• Providing common access to information. Ontology-based search by forming queries over databases.
• Understanding database annotation and technical
literature.
• Guiding and interpreting analyses and hypothesis
generation
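Ontology-based search can be illustrated with a small sketch: a query for a general term is expanded to all of its more specific terms before the annotated records are searched. The toy taxonomy reuses nucleic acid terms from later slides, with tRNA and mRNA placed under RNA as an assumption; the record identifiers and the function names are hypothetical.

```python
# Hypothetical is-a taxonomy: child term -> parent term.
IS_A = {
    "DNA": "nucleic acid",
    "RNA": "nucleic acid",
    "tRNA": "RNA",
    "mRNA": "RNA",
}

# Hypothetical database records, each annotated with one ontology term.
ANNOTATIONS = {
    "rec1": "tRNA",
    "rec2": "DNA",
    "rec3": "protein",
}


def descendants(term: str) -> set[str]:
    """Return the term plus every term below it in the taxonomy."""
    found = {term}
    changed = True
    while changed:
        changed = False
        for child, parent in IS_A.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return found


def search(term: str) -> list[str]:
    """Find records annotated with the term or any of its subtypes."""
    wanted = descendants(term)
    return sorted(rec for rec, tag in ANNOTATIONS.items() if tag in wanted)
```

A plain keyword search for "nucleic acid" would miss both `rec1` and `rec2`; expanding the query through the taxonomy retrieves them because their annotations are kinds of nucleic acid.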
Components of an Ontology
• Concepts: classes of individuals – the concept
Protein and the individual ‘human cytochrome C’
• Relationships between concepts
• The is-a-kind-of relationship forms a taxonomy
• Other relationships give further structure, e.g. is-a-part-of
• Axioms – disjointness, covering, equivalence, …
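The components listed above can be sketched as a small Python data structure. This is an illustrative assumption about how one might hold them in memory, not a representation used by any particular ontology language; the class name `Ontology`, its fields, and the example facts (such as the "active site" part) are made up for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    is_a: dict[str, set[str]] = field(default_factory=dict)     # concept -> parents
    part_of: set[tuple[str, str]] = field(default_factory=set)  # (part, whole)
    disjoint: set[frozenset[str]] = field(default_factory=set)  # disjointness axioms
    instance_of: dict[str, str] = field(default_factory=dict)   # individual -> concept

    def ancestors(self, concept: str) -> set[str]:
        """All concepts reachable by following is-a links upwards."""
        result: set[str] = set()
        stack = list(self.is_a.get(concept, ()))
        while stack:
            parent = stack.pop()
            if parent not in result:
                result.add(parent)
                stack.extend(self.is_a.get(parent, ()))
        return result

    def violates_disjointness(self, concept: str) -> bool:
        """True if the concept falls under two concepts declared disjoint."""
        kinds = self.ancestors(concept) | {concept}
        return any(pair <= kinds for pair in self.disjoint)


# Illustrative content only.
onto = Ontology()
onto.is_a["protein"] = {"macromolecule"}
onto.is_a["enzyme"] = {"protein"}
onto.is_a["nucleic acid"] = {"macromolecule"}
onto.part_of.add(("active site", "enzyme"))
onto.disjoint.add(frozenset({"protein", "nucleic acid"}))
onto.instance_of["human cytochrome C"] = "protein"
```

The is-a links give the taxonomy, the part-of pairs add further structure, and the disjointness axiom lets a program reject a concept that is declared to be both a protein and a nucleic acid.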
Knowledge Representation
• Ontologies are best delivered in some computable
representation
• Variety of choices with different:
– Expressiveness
• The range of constructs that can be used to formally,
flexibly, explicitly and accurately describe the ontology
– Ease of use
– Computational complexity
• Is the language computable in real time?
– Rigour -- satisfiability and consistency of the
representation
• Systematic enforcement mechanisms
– Unambiguous, clear and well defined semantics
Languages
• Vocabularies using natural language
– Hand crafted, flexible but difficult to evolve, maintain and
keep consistent, with weak semantics
– Gene Ontology
• Object-based KR: frames
– Extensively used, good structuring, intuitive. Semantics
defined by OKBC standard
– EcoCyc (uses Ocelot) and RiboWeb (uses Ontolingua)
• Logic-based: Description Logics
– Very expressive, model is a set of theories, well defined
semantics
– Automatically derived classification taxonomies
– Concepts may be defined or primitive
Building Ontologies
• No field of Ontological Engineering equivalent to
Knowledge or Software Engineering;
• No standard methodologies for building ontologies;
• Such a methodology would include:
– a set of stages that occur when building ontologies;
– guidelines and principles to assist in the different stages;
– an ontology life-cycle which indicates the relationships
among stages.
The Development Lifecycle
• Two kinds of complementary methodologies emerged:
– Stage-based, e.g. TOVE [Uschold96]
– Iterative evolving prototypes, e.g. MethOntology [Gomez Perez94].
• Most have TWO stages:
1. Informal stage
• ontology is sketched out using either natural language descriptions or some
diagram technique
2. Formal stage
• ontology is encoded in a formal knowledge representation language that is
machine computable
– the informal representation helps the former
– the formal representation helps the latter.
A Provisional Methodology
• A skeletal methodology and life-cycle for building
ontologies;
• Inspired by the software engineering V-process model;
– The left side charts the processes in building an ontology
– The right side charts the guidelines, principles and evaluation
used to ‘quality assure’ the ontology
• The overall process moves through a life-cycle.
The V-model Methodology
[Figure: the V-model.
Left side (processes): Identify purpose and scope; Knowledge acquisition;
Conceptualisation; Integrating existing ontologies; Encoding; Representation.
Models produced: User Model; Conceptualisation Model; Implementation Model.
Right side (quality assurance): Ontology in Use;
Evaluation: coverage, verification, granularity;
Conceptualisation principles: commitment, conciseness, clarity,
extensibility, coherency;
Encoding/Representation principles: encoding bias, consistency,
house styles and standards, reasoning system exploitation.]
The ontology building lifecycle
[Figure: the ontology building life-cycle.
Identify purpose and scope; Knowledge acquisition;
Building (Conceptualisation, Encoding, Integrating existing ontologies);
Language and representation; Available development tools; Evaluation.]
Starting Concept List
• Chemicals – atom, ion, molecule, compound, element;
• Molecular-compound, ionic-compound, ionic-molecular-compound, …;
• Ionic-macromolecular-compound and ionic-small-macromolecular-compound;
• Protein, peptide, polyprotein, enzyme, holoprotein,
apoprotein,…
• Nucleic acid – DNA, RNA, tRNA, mRNA, snRNA, …
Conceptualisation Sketch
[Figure: sketch of the Chemical taxonomy. Node labels: Chemical; Molecule;
Compound; Element; Ion; Atom; Molecular Compound; Ionic Compound;
Ionic Molecular Compound; Molecular Element; Ionic Molecule;
Metal; Non-Metal; Metalloid.]
Molecule Conceptualisation Sketch
[Figure: sketch of the Molecule taxonomy. Node labels:
Ionic Macromolecular Compound; Macromolecule; Small Molecule;
Polysaccharide; Starch; Glycogen; Protein; Enzyme; Peptide;
Nucleic Acid; DNA; RNA; mRNA; tRNA; rRNA; snRNA.]
Initial Encoding
class-def chemical
  subclass-of substance
class-def molecule
  subclass-of chemical
class-def compound
  subclass-of chemical
class-def molecular-compound
  subclass-of molecule and compound
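To show what this encoding gives a program, here is a minimal Python sketch that reads the slide's `class-def` / `subclass-of` notation and follows the links transitively. It is a toy reader for exactly the lines shown on the slide, not a parser for any real ontology language; the function names `parse` and `superclasses` are assumptions made for the example.

```python
ENCODING = """\
class-def chemical
  subclass-of substance
class-def molecule
  subclass-of chemical
class-def compound
  subclass-of chemical
class-def molecular-compound
  subclass-of molecule and compound
"""


def parse(text: str) -> dict[str, list[str]]:
    """Map each class name to its directly stated superclasses."""
    parents: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines():
        words = line.split()
        if not words:
            continue
        if words[0] == "class-def":
            current = words[-1]
            parents[current] = []
        elif words[0] == "subclass-of" and current is not None:
            # "molecule and compound" states two superclasses.
            parents[current] += [w for w in words[1:] if w != "and"]
    return parents


def superclasses(name: str, parents: dict[str, list[str]]) -> set[str]:
    """All superclasses implied by following subclass-of links transitively."""
    seen: set[str] = set()
    todo = list(parents.get(name, []))
    while todo:
        cls = todo.pop()
        if cls not in seen:
            seen.add(cls)
            todo.extend(parents.get(cls, []))
    return seen
```

With the slide's four definitions, `superclasses("molecular-compound", parse(ENCODING))` yields molecule, compound, chemical and substance, even though only the first two are stated directly.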
Molecules Revisited
[Figure: revised sketch of the Molecule taxonomy. Node labels:
Non-Ionic Macromolecular Compound; Ionic Macromolecular Compound;
Macromolecule; Small Molecule; Polysaccharide; Starch; Glycogen;
Protein; Enzyme; Peptide; Nucleic Acid; DNA; RNA; mRNA; tRNA; rRNA; snRNA.]
More Encoding
class-def chemical
  subclass-of substance
class-def defined molecule
  subclass-of chemical
  slot-constraint contains-bond min-cardinality 1 has-value covalent-bond
class-def defined compound
  subclass-of chemical
  slot-constraint has-atom-types greater-than 1
class-def defined molecular-compound
  subclass-of molecule and compound
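The point of the `defined` classes is that membership follows from the stated conditions, so a classification can be derived automatically. The Python sketch below imitates this with plain predicate checks on individual descriptions; a real description logic reasoner works on the class descriptions themselves, so this is only a stand-in. The `Chemical` record, its fields and the three example substances are assumptions made for the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chemical:
    name: str
    covalent_bonds: int  # stands in for the contains-bond slot
    atom_types: int      # stands in for the has-atom-types slot


def is_molecule(c: Chemical) -> bool:
    # contains-bond min-cardinality 1 has-value covalent-bond
    return c.covalent_bonds >= 1


def is_compound(c: Chemical) -> bool:
    # has-atom-types greater-than 1
    return c.atom_types > 1


def classify(c: Chemical) -> set[str]:
    """Return every class whose definition the description satisfies."""
    classes = {"chemical"}
    if is_molecule(c):
        classes.add("molecule")
    if is_compound(c):
        classes.add("compound")
    # molecular-compound is defined as molecule and compound
    if {"molecule", "compound"} <= classes:
        classes.add("molecular-compound")
    return classes


# Illustrative descriptions only.
water = Chemical("water", covalent_bonds=2, atom_types=2)
oxygen_gas = Chemical("oxygen gas", covalent_bonds=1, atom_types=1)
sodium_chloride = Chemical("sodium chloride", covalent_bonds=0, atom_types=2)
```

Nobody asserts that water is a molecular-compound; the classification falls out of the definitions, which is the benefit the slide attributes to defined concepts.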
Expansion
• Sketch and encode in cycles
• Build a taxonomy of a small portion
• Then build links to other portions
• Add more detail
• Document sources, author, date and
argumentation.
Summary
• An ontology captures knowledge for a shared
understanding
• The important question is not whether an artefact is an
ontology, but whether it does any good
• Making our understanding of the domain explicit, consistent
and processable
• Bioinformatics resources are knowledge resources –
they need to be both human and machine understandable