Ncbo-anatomy-2006-intro

Application of OBO Foundry
Principles in GO
Chris Mungall
Lawrence Berkeley Labs
NCBO
GO Consortium
The GO is 3 ontologies
• Molecular function (MF)
• Biological Process (BP)
• Cellular component (CC)
The GO is 3 orthogonal
ontologies
• Molecular function
– (a kind of dependent continuant)
• Biological Process
– (a kind of occurrent)
• Cellular component
– (a kind of independent continuant)
The GO is 3 orthogonal
ontologies of canonical
biology
• Molecular function
– (a kind of dependent continuant)
• Biological Process
– (a kind of occurrent)
• Cellular component
– (a kind of independent continuant)
Examples:
– oncogenesis: X
– acquisition of nutrients from host: yes
– fin regeneration: yes
The GO is 3 orthogonal
canonical species-neutral
ontologies
• Molecular function
– many core functions
• Biological Process
– core shared processes (e.g. transcription)
– processes specific to organism types (e.g.
fin development, fly courtship behaviour)
• Cellular component
– prokaryotes and eukaryotes
part of the part_of tree in GO
Granularity of the GO
Function
Process
Continuant
GO-BP
GO-CC
(Sub?)Cellular
GO-BP
GO-CC
Organismal
GO-BP
Molecular
GO-MF
The GO is an ontology, with
rich terminological features
• GO ‘terms’ are actually representations of
types (aka kinds, universals, classes)
– The actual terms (i.e. the phrases used by
biologists) are attached to the representations of
types as names and synonyms
• synonyms have linguistic relations to the GO types
– exact, broad, narrow, related
• distinct from ontological relations between GO types
– is_a, part_of
• GO is moving to genus-differentia style
definitions
– Many definitions are still dictionary-style,
terminological
Genus differentia definitions
central nervous system morphogenesis
Genus: morphogenesis
Differentia: has_outcome central nervous system
The process by which the anatomical structure of the
central nervous system is generated and organized
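As a rough illustration of the distinction above, the sketch below keeps terminological data (names and scoped synonyms) apart from ontological relations (is_a, part_of) and from the genus-differentia definition. The class name `GOType`, the identifier `EX:0000001` and the synonym "CNS morphogenesis" are assumptions made for the example; they are not taken from GO.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional, Tuple


class SynonymScope(Enum):
    """Linguistic relation between a phrase and a type."""
    EXACT = "exact"
    BROAD = "broad"
    NARROW = "narrow"
    RELATED = "related"


@dataclass
class GOType:
    """A representation of a type; phrases are attached as name and synonyms."""
    id: str                                        # placeholder ID, not a real GO ID
    name: str
    synonyms: dict = field(default_factory=dict)   # phrase -> SynonymScope
    is_a: list = field(default_factory=list)       # ontological relation
    part_of: list = field(default_factory=list)    # ontological relation
    genus: Optional[str] = None
    differentia: Optional[Tuple[str, str]] = None  # (relation, target type)


cns_morphogenesis = GOType(
    id="EX:0000001",                               # hypothetical identifier
    name="central nervous system morphogenesis",
    synonyms={"CNS morphogenesis": SynonymScope.EXACT},  # hypothetical synonym
    genus="morphogenesis",
    differentia=("has_outcome", "central nervous system"),
)
```

The synonym scope describes how a phrase relates to the type, while `is_a` and `part_of` relate types to each other, so the two are stored in separate fields.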
The GO is both reference and
application ontology
• The same artefact (i.e. file) is used for
both ontology editing and data
annotation
• This has worked reasonably well until
now
• We may encourage making a distinction
– application views (aka GO slims)
• currently only used to present a very small
subset of GO
• consider wider use for extracting most of
Relations in GO (current)
• part_of
– conforms to RO
• X part_of Y: all Xs are part_of some Y (for the entirety of
the duration of the existence of the X)
– e.g. nucleus part_of cell (all nuclei are always part_of a
cell, not all cells have a nucleus as part)
• for both continuants and processes
– no ordering for processes
• is_a
– sort-of conforms to RO
• X is_a Y: all Xs are Ys (for the entirety of Xs existence)
• but there are issues with is_a in GO:
– is_a incompleteness
– is_a polyhierarchies
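The all-some reading of part_of can be checked on a toy set of instances. This is a minimal sketch that ignores the time dimension; the instance names and the function `holds_all_some` are hypothetical and exist only for this example.

```python
def holds_all_some(xs, ys, relation):
    """True if every instance in xs is related to at least one instance in ys."""
    return all(any((x, y) in relation for y in ys) for x in xs)


# Hypothetical instances: cellC has no nucleus.
nuclei = {"nucleus1", "nucleus2"}
cells = {"cellA", "cellB", "cellC"}
instance_part_of = {("nucleus1", "cellA"), ("nucleus2", "cellB")}

# Type level: nucleus part_of cell holds, because every nucleus is part of some cell.
nucleus_part_of_cell = holds_all_some(nuclei, cells, instance_part_of)

# The inverse type-level statement (cell has_part nucleus) does not follow.
instance_has_part = {(whole, part) for (part, whole) in instance_part_of}
cell_has_part_nucleus = holds_all_some(cells, nuclei, instance_has_part)

print(nucleus_part_of_cell, cell_has_part_nucleus)
```

The asymmetry is the point: a type-level part_of statement says nothing about whether every whole has such a part.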
is_a has issues
• Not all GO types have is_a parents
– not a problem in MF
– fixed in CC (July 2006)
– being fixed in BP (Sept 2006: right now, here in
Seattle)
• Still a contentious issue?
– is_a completion requires new high-level types in
ontology
– perceived as being too abstract by biologists
– simple solution: application ontology
• remove high level terms in annotation view
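One way to picture an annotation view that removes high-level terms is sketched below: hidden types are dropped and their children are re-attached to the nearest visible ancestors. The function `annotation_view` and the type "hypothetical abstract process" are assumptions for illustration, not part of any GO tooling.

```python
def annotation_view(is_a, hidden):
    """Return is_a links with hidden types removed and children re-attached."""
    def visible_parents(term):
        found = set()
        for parent in is_a.get(term, ()):
            if parent in hidden:
                # Skip over the hidden type and look at its own parents.
                found |= visible_parents(parent)
            else:
                found.add(parent)
        return found

    return {t: sorted(visible_parents(t)) for t in is_a if t not in hidden}


# Reference ontology with a placeholder high-level type added for is_a completeness.
reference = {
    "biological process": [],
    "hypothetical abstract process": ["biological process"],
    "transcription": ["hypothetical abstract process"],
}

view = annotation_view(reference, hidden={"hypothetical abstract process"})
print(view)
```

The reference ontology keeps its complete is_a hierarchy; only the view shown to annotators changes.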
is_a polyhierarchies
• is_a diamonds cause problems
– tangled DAGs, easy to make mistakes
• Source of problems typically due to multiple
axes of classification
– e.g. due to composite terms
• Solution:
– Genus-differentia (Aristotelian) definitions
• aka cross-products [Hill et al]
– Always a single genus
• choose consistent axis of classification
– Allow classifier/reasoner to provide different views
of ontology
a tangled hierarchy in GO
problem: mixes (at least) two axes of classification
biosynthesis
is_a
metabolism
cysteine
is_a
serine family amino acid
is_a
amino acid
is_a
amine
cysteine
is_a
serine family amino acid
is_a
amino acid
is_a
serine
The solution: separate the
axes
serine family
amino acid
(ChEBI)
metabolism (GO)
biosynthesis
(GO)
computable
genus-differentia
definition
cysteine (ChEBI)
cysteine biosynthesis (GO)
Genus: biosynthesis
Differentia: has_outcome cysteine
Compute the subsumption
DAG from the definition
cysteine metabolism
(GO)
serine family amino acid
biosynthesis (GO)
cysteine biosynthesis (GO)
Genus: biosynthesis
Differentia: has_outcome cysteine
the DAG is required for applications such as annotation search
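The step from definitions to a DAG can be sketched with a very small structural subsumption test: one definition subsumes another if its genus and its differentia target are each equal to, or an is_a ancestor of, the other's. This is a simplified illustration, not a description of any particular reasoner. The definition of cysteine biosynthesis comes from the slide; the definitions given here for the other two terms, and the single-parent `IS_A` tables, are assumptions.

```python
# Asserted single-axis hierarchies (process axis from GO, chemical axis from ChEBI).
IS_A = {
    "biosynthesis": "metabolism",
    "cysteine": "serine family amino acid",
    "serine family amino acid": "amino acid",
}

# term -> (genus, relation, target). Only the first definition is from the slide.
DEFS = {
    "cysteine biosynthesis": ("biosynthesis", "has_outcome", "cysteine"),
    "cysteine metabolism": ("metabolism", "has_outcome", "cysteine"),  # assumed
    "serine family amino acid biosynthesis":
        ("biosynthesis", "has_outcome", "serine family amino acid"),   # assumed
}


def ancestors_or_self(term):
    """Walk the asserted is_a chain upward, including the term itself."""
    chain = [term]
    while term in IS_A:
        term = IS_A[term]
        chain.append(term)
    return chain


def subsumes(parent, child):
    """True if parent's definition is at least as general on both axes."""
    p_genus, p_rel, p_target = DEFS[parent]
    c_genus, c_rel, c_target = DEFS[child]
    return (parent != child and p_rel == c_rel
            and p_genus in ancestors_or_self(c_genus)
            and p_target in ancestors_or_self(c_target))


def direct_parents(child):
    """Inferred is_a parents, keeping only the most specific subsumers."""
    supers = [p for p in DEFS if subsumes(p, child)]
    return sorted(p for p in supers
                  if not any(subsumes(p, q) for q in supers))


print(direct_parents("cysteine biosynthesis"))
```

The two inferred parents reproduce the polyhierarchy, but each asserted hierarchy stays a single axis and the tangle is computed rather than hand-maintained.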
Pre- and post- composition
• References to types can be pre-composed in
ontology, prior to annotation
– Ontology editor creates term, with ID
– Use reasoners to classify the DAG automatically
• References to types can be post-composed
(created on the fly) at annotation time
– No term with ID is created
• Computationally, it makes no difference
– provided we adhere to the genus-differentia
formalism
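The claim that pre- and post-composition are computationally equivalent can be illustrated as follows: a pre-composed term is looked up by ID to obtain its definition, a post-composed reference carries the definition directly, and the same subsumption test runs on both. This is a sketch under assumptions; the ID `EX:1`, the query expression and the helper names are hypothetical.

```python
IS_A = {
    "biosynthesis": "metabolism",
    "cysteine": "serine family amino acid",
}

# Pre-composed: the ontology editor created a term with an ID (placeholder ID here).
PRECOMPOSED = {
    "EX:1": ("biosynthesis", "has_outcome", "cysteine"),  # cysteine biosynthesis
}


def ancestors_or_self(term):
    chain = [term]
    while term in IS_A:
        term = IS_A[term]
        chain.append(term)
    return chain


def resolve(ref):
    """An ID is expanded to its definition; a post-composed tuple is used as-is."""
    return PRECOMPOSED[ref] if isinstance(ref, str) else ref


def is_subsumed_by(child, parent):
    c_genus, c_rel, c_target = resolve(child)
    p_genus, p_rel, p_target = resolve(parent)
    return (c_rel == p_rel
            and p_genus in ancestors_or_self(c_genus)
            and p_target in ancestors_or_self(c_target))


# A query that has no term ID of its own.
query = ("metabolism", "has_outcome", "serine family amino acid")

pre = is_subsumed_by("EX:1", query)
post = is_subsumed_by(("biosynthesis", "has_outcome", "cysteine"), query)
print(pre, post)
```

Both calls give the same answer because the test only ever looks at the genus and differentia, never at whether an ID exists.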
OBO Foundry practices and
pre-composition
• Pre-composition of terms in the
ontology is good as it creates a map of
biological reality, linking foundry
ontologies
– within reason
Examples
• GO Biological Process x OBO Cell
– neuron migration
– cone cell fate specification
– T cell homeostasis
– erythrocyte degranulation
• OBO Cell is species-neutral
Current status
• The ability to effectively create
computationally visible genus-differentia
definitions is new to most OBO ontologies
• Soon to be created:
– SO (many terms now done)
– GO-BP definitions referencing OBO-Cell
– OBO Disease definitions referencing FMA
– And more…
• Difficult:
– GO-BP and ChEBI (chemical entities)
– GO-BP and anatomy (we need CARO!)
development in GO (current)
neural plate morphogenesis
neural plate development
neural plate formation
neural tube development
neural tube formation
GO
part_of
(is_a not shown)
development in GO (future)
neural plate morphogenesis
presumptive spinal cord
neural plate development
neural plate
neural plate formation
neural keel
neural rod
neural tube development
neural tube
neural tube formation
spinal cord
AO
transformation_of
GO
has_participant
part_of
neural tube formation
Genus: tube formation
Differentia: has_outcome neural tube