You want an ontology - Gene Ontology Consortium


Principles for Building
Biomedical Ontologies
Talk delivered by Jennifer Clark, GO Editorial Office
[Figure from Clark et al., 2005, showing is_a and part_of relationships]
Slides and content by:
Barry Smith
http://ifomis.de
• formal ontology
• information science
• special reference to the bio-medical domain
Rama Balakrishnan, David Hill,
Jennifer Clark.
http://www.geneontology.org
The Rules
1. Univocity
2. Positivity
3. Objectivity
4. Single Inheritance
5. Intelligibility of Definitions
6. Basis in Reality
classes: GO terms, types, kinds, universals
instances: annotated gene product attributes, tokens, individuals, particulars
1 Univocity:
Terms should have the same
meanings on every occasion
of use
The Challenge of Univocity: People use the
same words to describe different things
[Three images, each labelled: = bud initiation]
Bud initiation? How is a
computer to distinguish?
Univocity: GO adds “sensu” descriptors to
discriminate among organisms
= bud initiation
sensu Metazoa
= bud initiation
sensu Saccharomyces
= bud initiation
sensu Viridiplantae
The Challenge of Univocity:
People call the same thing by different names
Tactition
Taction
?
Tactile sense
Univocity: GO uses 1 term and many
characterized synonyms
Tactition
Taction
Tactile sense
perception of touch ; GO:0050975
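The one-term-many-synonyms approach above can be sketched as a lookup table in which every label resolves to a single identifier. This is a minimal illustration, not how GO tooling is implemented; the record layout and the function names `build_index` and `resolve` are assumptions, and only the term name, ID and synonyms come from the slide.

```python
# Hypothetical in-memory record; only the name, ID and synonyms come from the slide.
TERM = {
    "id": "GO:0050975",
    "name": "perception of touch",
    "synonyms": ["tactition", "taction", "tactile sense"],
}


def build_index(terms):
    """Map every lower-cased name and synonym to exactly one term ID."""
    index = {}
    for term in terms:
        for label in [term["name"], *term["synonyms"]]:
            key = label.lower()
            # Univocity: one label must never point at two different terms.
            if key in index and index[key] != term["id"]:
                raise ValueError(f"label {label!r} is ambiguous")
            index[key] = term["id"]
    return index


INDEX = build_index([TERM])


def resolve(label):
    """Return the term ID for a name or synonym, or None if it is not known."""
    return INDEX.get(label.strip().lower())
```

Whichever name a user types, `resolve` returns the same ID, and a label claimed by two terms is rejected when the index is built.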
Univocity in part_of relation
• ‘is at times part of’
  • antlers part_of red deer
• ‘necessarily is_part’
  • Seed dormancy part_of seed development
• ‘necessarily has_part’
  • Plant embryo part_of seed
2 Positivity:
The complements of classes
are not themselves classes.
[Figure sequence, two stages:]
Set of all things: Vertebrates vs. non-vertebrates
Set of all organisms: Vertebrates vs. Invertebrates
Images:
http://www.cucco.org/CatPictures/Cat%20Nap.jpg
http://www.digibarn.com/collections/systems/canon-cat/Image53.jpg
http://www.artalyst.com/files/userimages/user70/21088058-O.preview.jpg
membrane-bound organelle ; GO:0043227
Non-membrane-bound organelle
vs.
Not a membrane-bound organelle
Non-membrane-bound organelles
A centrosome is not a membrane-bound organelle,
but it may still be considered an organelle.
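The Positivity rule can be illustrated with plain sets: what a complement contains depends entirely on the universe it is taken from, which is why it makes a poor class. This is a toy sketch; the member names and the `complement` helper are illustrative assumptions, not GO data.

```python
# Toy universes; the member names are illustrative assumptions.
vertebrates = {"cat", "red deer"}
all_organisms = vertebrates | {"fruit fly", "yeast"}
all_things = all_organisms | {"computer"}


def complement(cls, universe):
    """Everything in the universe that is not a member of cls."""
    return universe - cls


# The "same" complement class has different members in each universe.
non_vertebrates_among_things = complement(vertebrates, all_things)
non_vertebrates_among_organisms = complement(vertebrates, all_organisms)
```

Taken against all things, "non-vertebrates" lumps a computer together with a fruit fly. Taken against organisms only, it coincides with the positively defined class of invertebrates.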
3 Objectivity:
The existence of classes
is not dependent
on our biological knowledge.
Terms such as ‘unlocalised’, ‘unknown’ and ‘unclassified’
do not designate biological natural kinds.
[Image: http://news.bbc.co.uk/1/hi/sci/tech/4501152.stm]
Task: Annotate the molecular function of 10-4,
a gene from Drosophila melanogaster
[Figure, shown in three builds:]
Molecular function ontology: molecular function unknown is_a molecular function
Annotations: 10-4
4 Single Inheritance:
No class in a classification
hierarchy should have more
than one is_a parent on the
immediate higher level
[Figure from Clark et al., 2005, showing is_a and part_of relationships]
Rule of Single Inheritance
• no diamonds:
C
is_a2
B
is_a1
A
Problems with multiple inheritance
[Diagram: A with two is_a parents, B (via is_a1) and C (via is_a2)]
‘is_a’ no longer univocal
(univocal: having only one meaning)
Is_a diamond in GO Process
behavior
locomotory behavior
is_a
larval behavior
larval locomotory behavior
behavior
behavior of
a thing
descriptive
behavior
is_a
locomotory behavior
larval behavior
larval locomotory behavior
Is_a diamond in GO Process
behavior
locomotory behavior
larval behavior
is_a2
is_a1
larval locomotory behavior
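The behavior example above can be checked mechanically. The sketch below stores is_a edges as a child-to-parents mapping and reports every term with more than one is_a parent, plus every diamond those parents close. The data layout and function names are assumptions for illustration; the four term names come from the slide.

```python
from itertools import combinations

# is_a edges from the behavior example, written child -> list of parents.
IS_A = {
    "behavior": [],
    "locomotory behavior": ["behavior"],
    "larval behavior": ["behavior"],
    "larval locomotory behavior": ["locomotory behavior", "larval behavior"],
}


def multiple_inheritance(is_a):
    """Terms that break the single-inheritance rule (more than one is_a parent)."""
    return {child: parents for child, parents in is_a.items() if len(parents) > 1}


def ancestors(term, is_a):
    """All terms reachable from term by following is_a edges upwards."""
    seen = set()
    stack = list(is_a.get(term, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(is_a.get(node, []))
    return seen


def diamonds(is_a):
    """(child, top) pairs where two parents of child share the ancestor top."""
    found = []
    for child, parents in is_a.items():
        for left, right in combinations(parents, 2):
            left_line = {left} | ancestors(left, is_a)
            right_line = {right} | ancestors(right, is_a)
            for top in sorted(left_line & right_line):
                found.append((child, top))
    return found
```

On this data, `diamonds(IS_A)` reports the single diamond from the slide, with "larval locomotory behavior" at the bottom and "behavior" at the top.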
5 Intelligibility of
Definitions:
The terms used in a definition
should be simpler
than the term to be defined
cellular process
is_a
cell differentiation
part_of
cell fate specification
cell development
cell differentiation
is_a
osteoblast differentiation
neuron differentiation
adipocyte differentiation
keratinocyte differentiation
garland cell differentiation
‘X cell differentiation’
Essence = Genus + Differentiae
Genus: differentiation
Differentiae: a neuron (or x cell)
X cell differentiation
X cell differentiation
Differentiation of an x cell.
X cell differentiation
The process whereby
a relatively unspecialized cell
acquires specialized features
of an x cell.
[List characteristics of x cell.]
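The genus-plus-differentiae template above lends itself to generating definitions from the cell type alone. The following is a minimal sketch of that idea; the function names and the simple a/an rule are assumptions, while the definition wording is the template from the slide.

```python
def article(noun_phrase):
    """Pick 'a' or 'an' with a simple first-letter rule (an approximation)."""
    return "an" if noun_phrase[0].lower() in "aeiou" else "a"


def differentiation_definition(cell_type):
    """Fill the slide's template for 'X cell differentiation'."""
    return (
        "The process whereby a relatively unspecialized cell "
        f"acquires specialized features of {article(cell_type)} {cell_type}."
    )


def differentiation_term(cell_type):
    """Build the term name and its templated definition together."""
    return {
        "name": f"{cell_type} differentiation",
        "def": differentiation_definition(cell_type),
    }
```

Because every definition comes from the same template, the only part that varies is the differentia, the cell type, which can then be defined once in a cell ontology.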
Process ontology term : Cell Ontology term
• cone cell fate commitment : retinal_cone_cell
• keratinocyte differentiation : keratinocyte
• adipocyte differentiation : fat_cell
• dendritic cell activation : dendritic_cell
[Term]
id: GO:0030182
name: neuron differentiation
namespace: biological_process
def: "The process whereby a relatively unspecialized cell acquires specialized features of a neuron." [GO:mah]
is_a: GO:0030154 ! cell differentiation
relationship: part_of GO:0048699 ! neurogenesis

[Term]
id: CL:0000540
name: neuron
def: "The basic cellular unit of nervous tissue. Each neuron consists of a body\, an axon\, and dendrites. Their purpose is to receive\, conduct\, and transmit impulses in the nervous system." [MESH:A.08.663]
xref_analog: FBbt:00005106
xref_analog: FBbt:00005146
is_a: CL:0000393 ! electrically responsive cell
is_a: CL:0000404 ! electrically signaling cell
relationship: develops_from CL:0000031 ! neuroblast

[Term]
id: GO:0030182
name: neuron differentiation
namespace: biological_process
def: "The process whereby a relatively unspecialized cell acquires specialized features of a neuron. The basic cellular unit of nervous tissue. Each neuron consists of a body\, an axon\, and dendrites. Their purpose is to receive\, conduct\, and transmit impulses in the nervous system." [MESH:A.08.663, GO:mah]
is_a: GO:0030154 ! cell differentiation
intersection_of: is_a GO:0030154 ! cell differentiation
intersection_of: has_participant CL:0000540 ! neuron
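Stanzas like the ones above are simple enough to read with a few lines of code. The sketch below parses a single stanza into a dictionary, assuming each tag-value pair occupies one line. It is a deliberately naive illustration (the name `parse_stanza` is an assumption) and ignores escapes, qualifiers and multi-stanza files, which a real OBO parser has to handle.

```python
from collections import defaultdict

STANZA = '''[Term]
id: GO:0030182
name: neuron differentiation
namespace: biological_process
def: "The process whereby a relatively unspecialized cell acquires specialized features of a neuron." [GO:mah]
is_a: GO:0030154 ! cell differentiation
relationship: part_of GO:0048699 ! neurogenesis
'''


def parse_stanza(text):
    """Collect tag-value pairs from one stanza; tags may repeat, so values are lists."""
    tags = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("["):
            continue  # skip blank lines and the [Term] header
        tag, _, value = line.partition(": ")
        # Drop the trailing "! human-readable name" comment, if present.
        # Naive: a quoted definition containing " ! " would be cut short.
        value = value.split(" ! ")[0].strip()
        tags[tag].append(value)
    return dict(tags)


term = parse_stanza(STANZA)
```

Here `term["is_a"]` holds the parent ID and `term["relationship"]` holds the part_of link, which is enough to feed simple checks over the graph.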
Other Ontologies that can be aligned with GO
• Chemical ontologies
  • 3,4-dihydroxy-2-butanone-4-phosphate synthase activity
• Anatomy ontologies
  • metanephros development
But Eventually…
Building Ontology
Improve
Collaborate
and Learn
6 Basis in Reality:
When building or maintaining an
ontology, always think carefully
about how classes relate to
instances in reality
Catwoman
strength,
speed,
agility
and ultra-keen senses of a cat.
http://home.austarnet.com.au/davekimble/catwoman.jpg
Superman
strength
flight
x-ray vision
leaps over tall buildings in a single bound
http://www.uncleodiescollectibles.com/doesnotcompute/2004-10-11/Actor%20Christoper%20Reeve.jpg
[Figure, shown in four builds:]
Ontology: cartoon character super power ontology
• super senses
  • x-ray vision
  • cat senses
• super physical powers
  • super leaping
  • super strength
(child terms are linked to their parents by is_a)
Annotations, second build: Superman, Catwoman
Annotations, third build: Superman’s X-ray vision, Catwoman’s cat senses, Superman’s super leaping, Catwoman’s super strength
Final build, instances shown directly under the parent classes:
• super senses: Superman’s X-ray vision, Catwoman’s cat senses
• super physical powers: Superman’s super leaping, Catwoman’s super strength
[Figure, shown in two builds:]
Ontology: molecular function
binding
tetrapyrrole binding, cofactor binding
is_a
chlorophyll binding, heme binding, coenzyme binding, quinone binding
Annotations, first build: PSBI, PSBI
Annotations, second build: PSBI’s chlorophyll binding function, PSBI’s quinone binding function
The Rules
1. Univocity: Terms should have the same meanings on every occasion of use.
2. Positivity: Terms such as ‘non-mammal’ or ‘non-membrane’ do not designate genuine classes.
3. Objectivity: Terms such as ‘unknown’ or ‘unclassified’ or ‘unlocalized’ do not designate biological natural kinds.
4. Single Inheritance: No class in a classification hierarchy should have more than one is_a parent on the immediate higher level.
5. Intelligibility of Definitions: The terms used in a definition should be simpler (more intelligible) than the term to be defined.
6. Basis in Reality: When building or maintaining an ontology, always think carefully about how classes relate to instances in reality.
END
spare slides follow.
How to define A is_a B
A is_a B =def.
1. A and B are names of universals
(natural kinds, types) in reality
2. all instances of A are as a matter of
biological science also instances of B
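The second clause of this definition has a direct set-theoretic reading: the instances of A form a subset of the instances of B. The sketch below applies that test to toy data. The instance names are hypothetical, and passing the test on a sample is only a necessary condition, since the definition requires the inclusion to hold as a matter of biological science.

```python
# Toy instance sets; the individual names are hypothetical.
INSTANCES = {
    "mitochondrial chromosome": {"mt_chr_1"},
    "chromosome": {"mt_chr_1", "nuclear_chr_1", "nuclear_chr_2"},
}


def is_a_holds(a, b, instances):
    """True if every recorded instance of class a is also an instance of class b."""
    return instances[a] <= instances[b]
```

The relation is directional: `is_a_holds("mitochondrial chromosome", "chromosome", INSTANCES)` is true, while swapping the arguments gives false.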
True path violation: what is it?
First diagram:
• chromosome part_of nucleus
• mitochondrial chromosome is_a chromosome
Second diagram:
• nuclear chromosome part_of nucleus
• nuclear chromosome is_a chromosome
• mitochondrial chromosome is_a chromosome
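One way to see the violation in the chromosome example is to let part_of statements propagate down is_a links, so that a child inherits every part_of claim made about its parent. The sketch below does this for both diagrams. The reading of the second diagram (nuclear chromosome part_of nucleus) and the function name are assumptions, and each term is given at most one is_a parent for simplicity.

```python
# is_a edges written child -> parent (single parent, assumed acyclic).
IS_A = {
    "mitochondrial chromosome": "chromosome",
    "nuclear chromosome": "chromosome",
}

# part_of edges in the first and second diagram respectively.
PART_OF_FIRST = {"chromosome": {"nucleus"}}
PART_OF_SECOND = {"nuclear chromosome": {"nucleus"}}


def inferred_part_of(term, is_a, part_of):
    """Every whole that term is part_of, directly or inherited through is_a."""
    wholes = set()
    node = term
    while node is not None:
        wholes |= part_of.get(node, set())
        node = is_a.get(node)  # climb to the parent, or stop at the root
    return wholes
```

With the first diagram the mitochondrial chromosome is inferred to be part_of the nucleus, which is false. Moving the part_of link onto the nuclear chromosome removes that inference while keeping the true one.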
The Importance of synonyms for utility:
How do we represent the function of tRNA?
Molecular_function
Triplet_codon amino acid adaptor activity
GO Definition: Mediates the insertion of an amino acid at the correct
point in the sequence of a nascent polypeptide chain during protein
synthesis.
Synonym: tRNA
Main obstacle to integration
• Current ontologies do not deal well with
  • Time and
  • Space and
  • Instances (particulars)
• Our definitions should link the terms in the ontology to instances in spatiotemporal reality
7 Distinguish Universals and Instances
Don’t forget instances when defining relations
• part_of as a relation between classes versus part_of as a relation between instances
  • nucleus part_of cell
  • your heart part_of you