The Ontology of the Gene Ontology


Species and Classification in Biology
Barry Smith
http://ifomis.org
DNA (10^-9 m)
Organism
Organ (10^-1 m)
Tissue
Cell (10^-5 m)
Organelle
Protein
DNA (10^-9 m)
New golden age of classification*
~ 30 million species
30,000 genes in human
200,000 proteins
100s of cell types
100,000s of disease types
1,000,000s of biochemical pathways
(including disease pathways)
*… legacy of Human Genome Project
Organism
Organ (10^-1 m)
Tissue
Cell (10^-5 m)
Organelle
Protein
DNA (10^-9 m)
FUNCTIONAL GENOMICS
proteomics,
reactomics,
metabonomics,
phenomics,
behaviouromics,
toxicopharmacogenomics
…
The incompatibilities between different
scientific cultures and terminologies
immunology
genetics
cell biology
have resurrected the problem of the unity
of science in a new guise:
The logical positivist solution to
this problem addressed a world in
which sciences are associated
with printed texts.
What happens when sciences are
associated with databases?
… when each (chemical, pathological,
immunological, toxicological) information
system uses its own classifications,
how can we overcome the
incompatibilities which become apparent
when data from distinct sources are
combined?
Answer:
“Ontology”
= building software artefacts:
standardized classification systems /
controlled vocabularies
so that data from one source should be
expressed in a language which
makes it compatible with data from
every other source
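The idea of expressing data from different sources in one shared language can be sketched in a few lines of Python. This is only an illustration: the source names, local labels and `TERM:` identifiers are hypothetical and do not come from any real vocabulary.

```python
# Hypothetical shared controlled vocabulary: preferred label -> identifier.
SHARED_VOCABULARY = {"myocardial infarction": "TERM:0001"}

# Hypothetical per-source synonym tables mapping local labels to preferred labels.
LOCAL_TO_SHARED = {
    "clinic_db": {"heart attack": "myocardial infarction"},
    "pathology_db": {"MI": "myocardial infarction"},
}

def normalise(source, label):
    """Translate a source-specific label into a shared identifier."""
    shared_label = LOCAL_TO_SHARED[source].get(label, label)
    return SHARED_VOCABULARY[shared_label]

records = [("clinic_db", "heart attack"), ("pathology_db", "MI")]
print({normalise(source, label) for source, label in records})  # {'TERM:0001'}
```

Once both records carry the same identifier they can be combined without reconciling the local terminologies by hand.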
Google hits (in millions), 25.4.06
ontology                         52.4
ontology + philosophy             2.7
ontology + information science    6.0
ontology + database               7.8
A Linnaean Species Hierarchy
(Small) Disease Hierarchy
Combining hierarchies
Organisms
Diseases
via Dependence Relations
Organisms
Diseases
A Window on Reality
A Window on Reality
Diseases
Organisms
A Window on Reality
How to understand species (aka
types, universals, kinds)
Species are something like invariants in
reality which can be studied by science
Species have instances: this mouse, this cell,
this cell membrane ...
Entity =def
anything which exists, including things and
processes, functions and qualities, beliefs
and actions, documents and software
Domain =def
a portion of reality that forms the subject matter of a single science or
technology or mode of study, e.g.:
proteomics
radiology
viral infections in mouse
Representation =def
an image, idea, map, picture, name or
description ... of some entity or entities.
Analogue representations
Representational units =def
terms, icons, photographs, identifiers ...
which refer, or are intended to refer, to
entities
Composite representation =def
representation
(1) built out of representational units
which
(2) form a structure that mirrors, or is intended
to mirror, the entities in some domain
The Periodic Table
Ontologies are here
Ontologies are representational
artifacts
What do ontologies represent?
A    515287    DC3300 Dust Collector Fan
B    521683    Gilmer Belt
C    521682    Motor Drive Belt
instances
A    515287    DC3300 Dust Collector Fan
B    521683    Gilmer Belt
C    521682    Motor Drive Belt
types
Two kinds of composite
representational artifacts
Databases, inventories: represent what is
particular in reality = instances
Ontologies, terminologies, catalogs:
represent what is general in reality =
types
What do ontologies represent?
Ontologies do not represent
concepts in people’s heads
Ontology is a tool of science
Scientists do not describe the concepts in
scientists’ heads
They describe the types in reality, as a step
towards finding ways to reason about (and
treat) instances of these types
The biologist has a cognitive representation
which involves theoretical knowledge
derived from textbooks
An ontology is like a scientific text;
it is a representation of types in reality
Two kinds of composite
representational artifacts
Databases represent instances
Ontologies represent types
Instances stand in similarity relations
Frank and Bill are similar as humans,
mammals, animals, etc.
Human, mammal and animal are types at
different levels of granularity
types
substance
organism
animal
mammal
cat
siamese
frog
instances
science needs to find uniform ways
of representing types
ontology =def a representational artifact whose
representational units (which may be drawn from
a natural or from some formalized language) are
intended to represent
1. types in reality
2. those relations between these types which
obtain universally (= for all instances)
lung is_a anatomical structure
lobe of lung part_of lung
is_a
A is_a B =def
For all x, if x instance_of A then x
instance_of B
cell division is_a biological process
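The definition can be tried out against a finite sample of instances. The following is a minimal sketch in Python; the `INSTANCES` table and its labels are hypothetical. A finite check of this kind can refute an is_a claim but cannot establish that it holds for all instances.

```python
# Toy instance table: universal name -> set of instance labels.
# All names and data here are illustrative assumptions.
INSTANCES = {
    "cell division": {"division_1", "division_2"},
    "biological process": {"division_1", "division_2", "digestion_1"},
}

def is_a(a, b):
    """A is_a B: every recorded instance of A is also an instance of B."""
    return all(x in INSTANCES[b] for x in INSTANCES[a])

print(is_a("cell division", "biological process"))  # True
print(is_a("biological process", "cell division"))  # False
```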
Entities
Entities
universals (species, types, taxa, …)
particulars (individuals, tokens, instances)
Canonical instances within the
realm of individuals
= those individuals which
1. instantiate universals (entering into
biological laws)
2. are prototypical
Canonical Anatomy: no Siamese twins,
no six-fingered giants, no amputation
stumps, …
Entities
universals
junk
junk
instances
junk
example of junk particulars: desk-mountain
Entities
human
inst
Jane
Ontologies are More than Just
Taxonomies
The Gene Ontology
7 million Google hits
a cross-species controlled
vocabulary for annotations of
genes and gene products
deeper than Darwinianism
When a gene is identified
three important types of questions need to
be addressed:
1. Where is it located in the cell?
2. What functions does it have on the
molecular level?
3. To what biological processes do these
functions contribute?
GO has three ontologies
biological processes
molecular functions
cellular components
GO astonishingly influential
used by all major species genome projects
used by all major pharmacological research
groups
used by all major bioinformatics research
groups
GO part of the Open Biological
Ontologies consortium
Fungal Ontology
Plant Ontology
Yeast Ontology
Disease Ontology
Mouse Anatomy Ontology
Cell Ontology
Sequence Ontology
Relations Ontology
Each of GO’s ontologies
is organized in a graph-theoretical
structure involving two sorts of links or
edges:
is-a (= is a subtype of )
(copulation is-a biological process)
part-of
(cell wall part-of cell)
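A graph with two sorts of edges can be sketched as a list of labelled links. The Python below is a minimal illustration, not GO's actual data format; the first two edges come from the slide, the third is an assumption added so that the traversal has more than one step.

```python
from collections import defaultdict

# Edges are (child, relation, parent).
EDGES = [
    ("copulation", "is-a", "biological process"),
    ("cell wall", "part-of", "cell"),
    ("cell", "is-a", "cellular component"),  # illustrative assumption
]

def build_graph(edges):
    """Index the edges by child term."""
    graph = defaultdict(list)
    for child, relation, parent in edges:
        graph[child].append((relation, parent))
    return graph

def ancestors(graph, term):
    """All terms reachable from `term` by following is-a or part-of edges upward."""
    seen = set()
    stack = [term]
    while stack:
        node = stack.pop()
        for _relation, parent in graph.get(node, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

graph = build_graph(EDGES)
print(sorted(ancestors(graph, "cell wall")))  # ['cell', 'cellular component']
```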
The Gene Ontology
a ‘controlled vocabulary’
designed to standardize annotation of
genes and gene products
used by over 20 genome databases and
many other groups in academia and
industry
and methodology much imitated
The Methodology of Annotations
Scientific curators use experimental observations
reported in the biomedical literature to link gene
products with GO terms in annotations.
The gene annotations taken together yield a slowly
growing computer-interpretable map of
biological reality.
The process of annotating literature also leads to
improvements and extensions of the ontology,
which institutes a virtuous cycle of improvement
in the quality and reach of future annotations
and of the ontology itself.
The Gene Ontology as Cartoon
cellular components: 1372 component terms
molecular functions: 7271 function terms
biological processes: 8069 process terms
The Cellular Component
Ontology (counterpart of anatomy)
membrane
nucleus
The Molecular Function Ontology
protein stabilization
The Molecular Function ontology is
(roughly) an ontology of actions on the
molecular level of granularity
Biological Process Ontology
death
An ontology of occurrents on the level of
granularity of cells, organs and whole
organisms
GO serves here as an example
a. of the sorts of problems confronting life
science data integration
b. of the degree to which formal methods
are relevant to the solution of these
problems
Each of GO’s ontologies
is organized in a graph-theoretical data
structure involving two sorts of links or
edges:
is-a (= is a subtype of )
(copulation is-a biological process)
part-of
(cell wall part-of cell)
Linnaeus
Entities
Entities
universals (kinds, types, taxa, …)
particulars (individuals, tokens, instances …)
Axiom: Nothing is both a universal and a particular
Entities
universals*
*natural, biological, kinds
Entities
universals
instances
universals are natural kinds
Instances are natural exemplars of
natural kinds
(problem of non-standard instances)
Not all individuals are instances of
universals
Entities
universals
instances
instances
penumbra of borderline cases
Entities
universals
junk
junk
instances
junk
example of junk: beachball-desk
Primitive relations:
inst and part
inst(Jane, human being)
part(Jane’s heart, Jane’s body)
A universal is anything that is instantiated
An instance is anything (any individual) that
instantiates some universal
Entities
human
inst
Jane
A is_a B
genus(B)
species(A)
instances
is-a
D3* e is_a f =def universal(e) ∧ universal(f) ∧
∀x (inst(x, e) → inst(x, f)).
genus(A) =def universal(A) ∧ ∃B (B is_a A ∧
B ≠ A)
species(A) =def universal(A) ∧ ∃B (A is_a B ∧
B ≠ A)
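These three definitions translate almost literally into code. A minimal sketch in Python, assuming a toy table `INST` of instance sets (hypothetical data) and treating any entry with at least one instance as a universal:

```python
# Toy taxonomy given by instance sets (illustrative data, not from the slides).
INST = {
    "mammal": {"tibbles", "felix", "rex"},
    "cat": {"tibbles", "felix"},
    "dog": {"rex"},
}

def universal(a):
    # Assumption: a universal is any key of INST with at least one instance.
    return a in INST and len(INST[a]) > 0

def is_a(e, f):
    # Both are universals and every instance of e is an instance of f.
    return universal(e) and universal(f) and INST[e] <= INST[f]

def genus(a):
    # Some distinct universal B stands in B is_a A.
    return universal(a) and any(b != a and is_a(b, a) for b in INST)

def species(a):
    # A stands in A is_a B for some distinct universal B.
    return universal(a) and any(b != a and is_a(a, b) for b in INST)

print(genus("mammal"), species("mammal"))  # True False
print(genus("cat"), species("cat"))        # False True
```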
solve problem of false positives
insist that
A is_a B
holds always as a matter of scientific law
nearest species
nearestspecies(A, B)=def A is_a B &
∀C ((A is_a C & C is_a B) → (C = A or C = B))
B
A
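The definition says that nothing lies strictly between A and B in the is_a hierarchy. A minimal sketch in Python over a hypothetical table of asserted parent links follows. Because is_a is reflexive, the definition as stated does not by itself exclude nearestspecies(A, A); the examples use distinct universals.

```python
# Explicitly asserted is_a links of a toy taxonomy (illustrative assumption).
PARENT = {"siamese": "cat", "cat": "mammal", "mammal": "animal", "frog": "animal"}
UNIVERSALS = set(PARENT) | set(PARENT.values())

def is_a(a, b):
    """Reflexive, transitive closure of the asserted links."""
    while a != b and a in PARENT:
        a = PARENT[a]
    return a == b

def nearestspecies(a, b):
    # A is_a B, and any C between them coincides with A or with B.
    return is_a(a, b) and all(
        c in (a, b) for c in UNIVERSALS if is_a(a, c) and is_a(c, b)
    )

print(nearestspecies("cat", "mammal"))      # True
print(nearestspecies("siamese", "mammal"))  # False: cat lies in between
```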
Definitions
highest genus
lowest species
instances
Lowest Species and Highest Genus
lowestspecies(A)=def
species(A) & not-genus(A)
highestgenus(A)=def
genus(A) & not-species(A)
Theorem:
universal(A) → (genus(A) or
lowestspecies(A))
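On a finite hierarchy these definitions pick out the leaves and the root. A minimal sketch in Python, assuming a hypothetical table of parent links; the theorem is only checked on this toy data, not proved.

```python
# Asserted parent links of a toy taxonomy (illustrative assumption).
PARENT = {"siamese": "cat", "cat": "mammal", "mammal": "animal", "frog": "animal"}
UNIVERSALS = set(PARENT) | set(PARENT.values())

def genus(a):
    # A has at least one distinct subtype.
    return a in PARENT.values()

def species(a):
    # A has at least one distinct supertype.
    return a in PARENT

def lowestspecies(a):
    return species(a) and not genus(a)

def highestgenus(a):
    return genus(a) and not species(a)

print(sorted(u for u in UNIVERSALS if lowestspecies(u)))  # ['frog', 'siamese']
print(sorted(u for u in UNIVERSALS if highestgenus(u)))   # ['animal']
# Theorem from the slide, checked on this toy data only:
print(all(genus(u) or lowestspecies(u) for u in UNIVERSALS))  # True
```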
Axioms
Every universal has at least one instance
Distinct lowest species never share
instances
SINGLE INHERITANCE:
Every species is the nearest species to
exactly one genus
Axioms governing inst
genus(A) & inst(x, A) →
∃B nearestspecies(B, A) & inst(x, B)
EVERY GENUS HAS AN INSTANTIATED
SPECIES
nearestspecies(A, B) → A’s instances are
properly included in B’s instances
EACH SPECIES HAS A SMALLER CLASS
OF INSTANCES THAN ITS GENUS
Axioms
nearestspecies(B, A)
→ ∃C (nearestspecies(C, A) & B ≠ C)
EVERY GENUS HAS AT LEAST TWO
CHILDREN
(nearestspecies(B, A) & nearestspecies(C, A) &
B ≠ C) → not-∃x (inst(x, B) & inst(x, C))
SPECIES OF A COMMON GENUS NEVER
SHARE INSTANCES
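Both axioms can be used as consistency checks on a candidate taxonomy. A minimal sketch in Python, assuming hypothetical tables `CHILDREN` (nearest species of each genus) and `INST` (instance sets):

```python
from itertools import combinations

# Toy data (illustrative assumption): children of each genus and instance sets.
CHILDREN = {"animal": ["mammal", "frog"], "mammal": ["cat", "dog"]}
INST = {
    "cat": {"tibbles"},
    "dog": {"rex"},
    "frog": {"kermit"},
    "mammal": {"tibbles", "rex"},
    "animal": {"tibbles", "rex", "kermit"},
}

def at_least_two_children(children):
    """Every genus has at least two nearest species."""
    return all(len(kids) >= 2 for kids in children.values())

def siblings_disjoint(children, inst):
    """Nearest species of a common genus never share instances."""
    return all(
        inst[b].isdisjoint(inst[c])
        for kids in children.values()
        for b, c in combinations(kids, 2)
    )

print(at_least_two_children(CHILDREN))    # True
print(siblings_disjoint(CHILDREN, INST))  # True
```

A taxonomy in which some individual is recorded under both cat and dog would fail the second check.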
Theorems
(genus(A) & inst(x, A)) → ∃B (lowestspecies(B) & B
is_a A & inst(x, B))
EVERY INSTANCE IS ALSO AN INSTANCE OF
SOME LOWEST SPECIES
(genus(A) & lowestspecies(B) & ∃x(inst(x, A) &
inst(x, B)) → B is_a A)
IF AN INSTANCE OF A LOWEST SPECIES IS AN
INSTANCE OF A GENUS THEN THE LOWEST
SPECIES IS A CHILD OF THE GENUS
Theorems
universal(A) & universal(B) → (A = B or A
is_a B or B is_a A or not-∃x(inst(x, A) &
inst(x, B)))
DISTINCT UNIVERSALS EITHER STAND
IN A PARENT-CHILD RELATIONSHIP
OR THEY HAVE NO INSTANCES IN
COMMON
Theorems
A is_a B & A is_a C
→ (B = C or B is_a C or C is_a B)
UNIVERSALS WHICH SHARE A CHILD
IN COMMON ARE EITHER IDENTICAL
OR ONE IS SUBORDINATED TO THE
OTHER
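This theorem holds in a tree-shaped hierarchy but fails once a universal is given two unrelated parents. A minimal sketch in Python that tests the property on two hypothetical hierarchies, each given as a mapping from a universal to its asserted parents:

```python
def ancestors(parents, a):
    """All universals B with A is_a B, including A itself."""
    seen, stack = {a}, [a]
    while stack:
        for p in parents.get(stack.pop(), []):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def parents_comparable(parents):
    """Whenever A is_a B and A is_a C: B = C, or B is_a C, or C is_a B."""
    for a in parents:
        ups = ancestors(parents, a)
        for b in ups:
            for c in ups:
                if not (b in ancestors(parents, c) or c in ancestors(parents, b)):
                    return False
    return True

# Illustrative data: a single-inheritance tree and a multiple-inheritance graph.
TREE = {"cat": ["mammal"], "mammal": ["animal"], "frog": ["animal"]}
DAG = {"pet cat": ["cat", "pet"], "cat": ["animal"], "pet": ["animal"]}
print(parents_comparable(TREE))  # True
print(parents_comparable(DAG))   # False
```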
Theorems
(genus(A) & genus(B) & ∃x(inst(x, A) &
inst(x, B))) → ∃C(C is_a A & C is_a B)
IF TWO GENERA HAVE A COMMON
INSTANCE THEN THEY HAVE A
COMMON CHILD
Expanding the theory
Sexually reproducing organisms
Organisms in general
To take account of development (child,
adult; larva, butterfly)
Biological processes
Biological functions
-- at different levels of granularity
How to understand species (aka
types, universals, kinds)
Species are something like invariants in
reality which can be studied by science
Species have instances: this mouse, this cell,
this cell membrane ...
Universals, Classes, Sets
A class is the extension of a universal
Class =def
a maximal collection of particulars
determined by a general term (‘cell’,
‘mouse’, ‘Saarländer’)
the class A
= the collection of all particulars x for
which ‘x is A’ is true
Universals and Classes vs. Sums
The former are marked by granularity:
they divide up the domain into whole units,
whose interior parts are traced over.
The universal human being is instantiated
only by human beings as single, whole
units.
A mereological sum is not granular in this
sense (molecules are parts of the
mereological sum of human beings)
A bad solution
Identify both universals and classes with sets in
the mathematical sense
Problem of false positives
adult ⊂ child
lion in Leipzig ⊂ lion
animal owned by the Emperor ⊂ mammal
mammal weighing less than 200 kg ⊂ animal
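The false-positive problem is easy to reproduce with ordinary sets. A minimal sketch in Python, with hypothetical extensions and a hypothetical table of is_a links that are taken to hold by scientific law:

```python
# Toy extensions (illustrative data, not from the slides).
lion = {"leo", "simba", "nala"}
lion_in_leipzig = {"leo"}

# is_a links asserted as holding by scientific law (assumed for illustration).
ASSERTED_IS_A = {("lion", "mammal"), ("mammal", "animal")}

def subset_suggests_is_a(a_members, b_members):
    """What a purely set-based reading would conclude from proper inclusion."""
    return a_members < b_members

print(subset_suggests_is_a(lion_in_leipzig, lion))   # True
print(("lion in Leipzig", "lion") in ASSERTED_IS_A)  # False: a false positive
```

Proper inclusion of member sets is cheap to compute, which is why it cannot by itself be the criterion for a link between universals.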
Sets in the mathematical sense
are marked by granularity
Granularity = each class or set is laid across
reality like a grid consisting
(1) of a number of slots or pigeonholes
each (2) occupied by some member.
Each set is (1) associated with a specific number
of slots, each of which (2) must be occupied
by some specific member.
A class survives the turnover in its instances:
both (1) the number of slots and (2) the
individuals occupying these slots may vary
with time
But sets are timeless
A set is an abstract structure, existing
outside time and space. The set of human
beings existing at t is (timelessly) a
different entity from the set of human
beings existing at t because of births and
deaths.
Biological classes exist in time
Darwin: because the universals of which
they are extensions exist in time
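The contrast can be made concrete in code. In the Python sketch below a `frozenset` plays the role of the timeless set, while a small hypothetical `BiologicalClass` object stands for a class that keeps its identity through turnover of its instances. All names and data are illustrative.

```python
# A set is fixed by its members: different members, different set.
humans_at_t = frozenset({"frank", "bill"})
humans_at_t2 = frozenset({"bill", "jane"})  # after one death and one birth
print(humans_at_t == humans_at_t2)          # False

class BiologicalClass:
    """Hypothetical sketch: one class, many time-indexed extensions."""

    def __init__(self, universal):
        self.universal = universal
        self.extension_at = {}  # time label -> frozenset of instances

    def record(self, time, members):
        self.extension_at[time] = frozenset(members)

human = BiologicalClass("human being")
human.record("t", {"frank", "bill"})
human.record("t2", {"bill", "jane"})

# The extensions differ, yet they belong to one and the same class object.
print(human.extension_at["t"] == human.extension_at["t2"])  # False
print(human.universal)                                      # human being
```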