Introduction-to-OBO - Buffalo Ontology Site
OBO Foundry Principles
BFO
RO
Barry Smith
1
OBO Foundry Principles
open
common formal language (OBO Format, OWL DL, CL)
commitment to collaboration
maintenance in light of scientific advance
unique identifier space (Alan)
naming conventions (Susanna / EBI) – metadata for
changes
versioning
2
OBO Foundry Principles
common architecture (= RO + BFO)
clearly delineated content (redundant –
overlaps with orthogonality)
the ontology is well-documented (– overlaps
with rules for definitions; needs expanding,
for developers, for users, minimal metadata)
plurality of independent users
single locus of authority, trackers, help desk
3
OBO Foundry Principles
textual definitions plus formal definitions
all definitions should be of the genus-species
form, utilizing cross-products
therefore: single is_a inheritance (= each
ontology should be conceived as consisting of
a core of asserted single inheritance with
further is_a relations inferred)
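The genus-species form and the single-inheritance rule above can be sketched in a few lines of Python. This is an illustration only, not Foundry tooling: the class `GenusDifferentia`, the function `multi_parent_terms`, and the example terms ("blood cell", "blood part") are all hypothetical names chosen for the sketch.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class GenusDifferentia:
    """A definition of the form: <genus> that <relation> <filler>."""
    genus: str      # the single asserted is_a parent
    relation: str   # hypothetical relation name, e.g. "part_of"
    filler: str     # term drawn from another ontology (the cross-product partner)

    def render(self) -> str:
        return f"{self.genus} that {self.relation} {self.filler}"


def multi_parent_terms(asserted_is_a):
    """Return terms that have more than one asserted is_a parent."""
    # set() removes duplicate edges so a repeated edge is not counted twice
    counts = Counter(child for child, _parent in set(asserted_is_a))
    return sorted(child for child, n in counts.items() if n > 1)


definition = GenusDifferentia(genus="cell", relation="part_of", filler="blood")
print(definition.render())  # cell that part_of blood

# Two asserted parents for one term violates the single-inheritance core;
# the second parent would instead be inferred from the cross-product definition.
edges = [("blood cell", "cell"), ("blood cell", "blood part")]
print(multi_parent_terms(edges))  # ['blood cell']
```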
4
Orthogonality
• For each domain, there should be convergence upon
a single ontology that is recommended for use by
those who wish to become involved with the
Foundry initiative
• Compare what happens in other parts of science: for
each domain, there should be convergence upon a
single theory
Preventing silos on the side of annotated data =
preventing forking of the ontologies used for
annotation
5
Strategy to ensure orthogonality
• If the Foundry already has an ontology O1
covering a domain D, and an outside group
creates a second ontology O2 covering D (or
part of D), we need to ask:
– is it in every respect better? (then replace O1 with
O2)
– is it in some respects better? (then negotiate an
improved synthesis, O3)
ASSUMPTION: ontologies are always comparable
(PROBLEM: need better measures of ontology quality)
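The decision procedure on this slide can be written down as a small function. The sketch below assumes that both ontologies have been scored on the same quality criteria, which is exactly the open problem the slide names; the function name `resolve_overlap`, the criteria, and the "keep O1" outcome (for the case the slide does not state, where O2 is in no respect better) are assumptions made for illustration.

```python
def resolve_overlap(scores_o1, scores_o2):
    """Compare per-criterion quality scores for O1 (in the Foundry) and O2 (new).

    Both arguments map a criterion name to a number; higher is better.
    """
    if not scores_o1 or scores_o1.keys() != scores_o2.keys():
        raise ValueError("both ontologies must be scored on the same criteria")

    better = [c for c in scores_o1 if scores_o2[c] > scores_o1[c]]
    if len(better) == len(scores_o1):
        return "replace O1 with O2"                   # better in every respect
    if better:
        return "negotiate an improved synthesis, O3"  # better in some respects
    return "keep O1"                                  # assumption: not stated on the slide


o1 = {"coverage": 3, "definitions": 4}
o2 = {"coverage": 5, "definitions": 2}
print(resolve_overlap(o1, o2))  # negotiate an improved synthesis, O3
```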
6
Benefits of orthogonality
• Offers a solution to the problem of silos that is
– modular
– incremental
– empirically based
– incorporates a strategy for motivating potential
developers and users
7
Orthogonality = non-redundancy
for the reference ontologies inside
the Foundry
• CARO-Mammal will not be orthogonal to
CARO
• IDO-Malaria will not be orthogonal to IDO
• IDO will not be orthogonal to DO
• DO will be orthogonal to CL
8
Absolute redundancy for
application ontologies
= all terms in application ontologies should be
taken from orthogonal reference ontologies
within the Foundry
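One rough way to check this requirement is to look at identifier prefixes. The sketch below assumes that term identifiers have the form `PREFIX:number` and that the prefix names the source ontology; the prefix set is an illustrative subset, and the identifiers are placeholders used only for their prefixes, not references to particular terms.

```python
# Illustrative subset of reference-ontology prefixes (an assumption, not a registry).
REFERENCE_PREFIXES = {"GO", "CL", "PATO", "CHEBI"}


def foreign_terms(application_term_ids, reference_prefixes=REFERENCE_PREFIXES):
    """Return term IDs whose prefix is not one of the reference ontologies."""
    return sorted(
        term_id
        for term_id in application_term_ids
        if term_id.split(":", 1)[0] not in reference_prefixes
    )


# "SHRIMP" is a made-up prefix standing for a locally minted term.
application_ontology = ["GO:0000001", "CL:0000002", "SHRIMP:0000001"]
print(foreign_terms(application_ontology))  # ['SHRIMP:0000001']
```

An empty result would mean every term is taken from a reference ontology; anything listed is a candidate for contributing back to the Foundry.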
9
Benefits of orthogonality
• Modularity brings benefits of division of labor,
division of authority, minimizes redundancy
10
Benefits of orthogonality
• Scientists become motivated to commit
themselves to developing an ontology falling
within their domain of expertise because they
themselves will need to use this ontology in
their own work in the future.
• Forking would erode this motivation
11
Benefits of orthogonality
• Incrementality means that the strategy will
still work even if ontologies are still only
partial
• this allows adoption and application at early
stages
12
Benefits of orthogonality
• Empirically based means that we can always
go back and start again if some ontology
module does not work (compare the problem
of non-modular approaches like SNOMED CT,
where it is all or nothing)
13
Benefits of orthogonality
• Modularity brings ownership, motivates
scientist-developers to commit themselves
long term to developing the ontology
• This in turn motivates users to commit
themselves to adoption
– they see strong positive network effects from use
of the ontology
– they gain reassurance from long-term
commitment
14
Benefits of orthogonality
• It helps those new to ontology who need to
know where to look in finding an ontology
relating to their subject-matter
• it obviates the need for ‘mappings’ between
ontologies, which are
– difficult to create and use
– error-prone
– hard to keep up-to-date when mapped ontologies
change
15
Benefits of orthogonality
• modularity (orthogonality) ensures the mutual
consistency of ontologies, and thereby also
the additivity of the annotations created with
their aid by different groups of annotators
describing common bodies of data.
• thereby contributes to the cumulativity of
science and allows new forms of unmanaged
collaboration.
16
Benefits of orthogonality
• brings grave responsibilities to those in charge
of ensuring for each domain that the Foundry
includes an ontology for that domain
• they must commit to perpetual striving for
scientific accuracy and domain-completeness
in their work
• orthogonality rules out the sorts of
simplification and partiality which may be
acceptable under more pluralistic regimes
17
Benefits of orthogonality
• it supports the strategy of utilizing cross-products in composing terms and definitions
• this strategy will work only if we can
– minimize the degree of arbitrariness involved in
selecting the terms to be composed
– and thereby maximize the degree to which the
Foundry ontologies are networked together
through the cross-product links
18
Misunderstandings of Orthogonality
• Orthogonality does not mean that all
ontologies must be developed within the
Foundry framework
• We welcome the development of competing
approaches to open-access ontology
development – which can only make the
Foundry stronger
19
Problems with Orthogonality
• what if researchers need purpose-built
ontologies to meet their own specific needs?
• OBO Foundry provides orthogonal reference
ontologies, so that they can as far as possible
build their application ontologies using terms
composed as cross-products
• thereby avoiding silos
• and contributing new terms back to the
Foundry in case of need
20
Problems with Orthogonality
• For each domain, there should be
convergence upon a single ontology that is
recommended for use by those who wish to
become involved with the Foundry initiative
Q: WHAT DOES ORTHOGONALITY MEAN?
minimally: two ontologies are not orthogonal if
they share a single term with the same meaning
Q: WHAT DOES DOMAIN MEAN?
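The minimal reading of orthogonality given above (no single shared term with the same meaning) can be tested mechanically once "same meaning" has been pinned down. The sketch below assumes, purely for illustration, that each term has already been assigned a meaning key; how such keys would be produced is not addressed in the slides, and the two toy ontologies are invented.

```python
def shared_meanings(ontology_a, ontology_b):
    """Each ontology is a dict: term label -> meaning key (hypothetical)."""
    return set(ontology_a.values()) & set(ontology_b.values())


def are_orthogonal(ontology_a, ontology_b):
    """Minimal test: orthogonal only if no meaning is represented in both."""
    return not shared_meanings(ontology_a, ontology_b)


# Toy ontologies: the labels differ, but two terms carry the same meaning key.
a = {"host": "m:host", "pathogen": "m:pathogen"}
b = {"host organism": "m:host", "cell": "m:cell"}
print(are_orthogonal(a, b))   # False
print(shared_meanings(a, b))  # {'m:host'}
```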
21
[Figure: Initial OBO Foundry Reference Ontologies (jigsaw).
Columns (relation to time): CONTINUANT (INDEPENDENT, DEPENDENT) and OCCURRENT.
Rows (granularity):
ORGAN AND ORGANISM: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO); Organ Function (FMP, CPRO); Phenotypic Quality (PaTO); Biological Process (GO)
CELL AND CELLULAR COMPONENT: Cell (CL); Cellular Component (FMA, GO); Cellular Function (GO)
MOLECULE: Molecule (ChEBI, SO, RnaO, PrO); Molecular Function (GO); Molecular Process (GO)]
22
Homesteading
Recommendation: Ontology developers should
register their claim on territory not yet occupied,
as soon as possible, because the Foundry is
designed to serve as an attractor for collaboration
23
[Figure: OBO Foundry reference ontologies jigsaw]
Orthogonality = Westphalian principles of national
sovereignty for reference ontologies: no shared
territory
24
Varieties of application ontology
• cross-border national parks
• Slims
• Fractal ontologies
• Cross-product ontologies
– Template ontologies (CARO, IDO, GDO …)
25
[Figure: OBO Foundry reference ontologies jigsaw]
cross-border national parks: an ontology for studying
the effects of viral infection on cell function in shrimp
26
[Figure: OBO Foundry reference ontologies jigsaw]
Slims = an ontology of dendritic cells
27
[Figure: OBO Foundry reference ontologies jigsaw]
Slims = an ontology of dendritic cells, with definitions
composed using terms from other ontologies
28
[Figure: OBO Foundry reference ontologies jigsaw]
fractal ontologies, employing small portions of many
ontologies (e.g. MSO Multiple Sclerosis Ontology)
29
[Figure: OBO Foundry reference ontologies jigsaw, with the OCCURRENT column divided by granularity: Organism-Level Process (GO); Cellular Process (GO); Molecular Process (GO)]
rationale of OBO Foundry coverage + BFO
30
[Figure: OBO Foundry reference ontologies jigsaw]
types plus instances
31
Fundamental Dichotomy
Continuants (aka endurants)
– have continuous existence in time
– preserve their identity through change
– exist in toto whenever they exist at all
Occurrents (aka processes)
– have temporal parts
– unfold themselves in successive
phases
– exist only in their phases
Functions are continuants
Functionings are occurrents
[Figures: a sequence of jigsaw diagrams extending the initial coverage. Additions and open questions shown across the sequence: Disease (DO); Biomedical Investigations (OBI); Cellular Pathology (????); Cellular Function (GO???); Molecule divided into Small Molecule (ChEBI), 1-D Sequence (SO) and 2- and 3-D Structure (RNAO, PRO); Molecular Pathway; Molecular Process (GO) ?????; Phenotypic Quality of Molecule (????); Reactome. Organ Function and part of the Organism cell (NCBI Taxonomy) are marked as placeholders.]
Orthogonality can be preserved by
expanding the territory (land
reclamation)
42
[Figure: OBO Foundry reference ontologies jigsaw]
GO already started to deal with biological processes
involving multiple organisms
43
[Figure: OBO Foundry reference ontologies jigsaw, with an added entry at the top of the granularity axis: Family, Community, Deme, Population. Source: http://obofoundry.org]
44
[Figure: OBO Foundry reference ontologies jigsaw, with a new granularity row COMPLEX OF ORGANISMS: Family, Community, Deme, Population; Population Phenotype; Population Process. Source: http://obofoundry.org]
45
[Figure: OBO Foundry reference ontologies jigsaw, with the COMPLEX OF ORGANISMS row (Family, Community, Deme, Population; Population Phenotype; Population Process) and a new ENVIRONMENT column. Source: http://obofoundry.org]
46
[Figure: independent continuants by granularity, with the ENVIRONMENT column filled in: Environment of population; Environment of single organism; Environment of cell; Molecular environment. Source: http://obofoundry.org]
47
[Figure: independent continuants by granularity, with the ENVIRONMENT column: Environment of population; Environment of single organism*; Environment of cell; Molecular environment]
* The sum total of the conditions and elements that make up the surroundings and influence
the development and actions of an individual.
48
[Figure: examples for the ENVIRONMENT column, by granularity]
COMPLEX OF ORGANISMS: biome / biotope, territory, habitat, neighborhood, ...
ORGAN AND ORGANISM: work environment, home environment; host/symbiont environment; ...
CELL AND CELLULAR COMPONENT: extracellular matrix; chemokine gradient; ...
MOLECULE: hydrophobic surface; virus localized to cellular substructure; active site on protein; pharmacophore ...
http://obofoundry.org
49
[Figure: OBO Foundry reference ontologies jigsaw X Organism Taxonomy]
Template ontologies (CARO, IDO, CL?)
50
51
The case of IDO
Human Disease Ontology
– unitary hierarchy with root node: human disease
– refers only to dependent realizable continuants
Infectious Disease Ontology
– draws terms from all BFO categories
– template exists in many copies: specializing to different hosts,
pathogens, vectors, etc.
We have data
TBDB: Tuberculosis Database, including
Microarray data
VFDB: Virulence Factor DB
TropNetEurop Dengue Case Data
ISD: Influenza Sequence Database at LANL
PathPort: Pathogen Portal Project
...
53
We need common controlled vocabularies to
describe these data in ways that will assure
comparability and cumulation
What content is needed to adequately cover the
infectious domain?
– Host-related terms (e.g. carrier, susceptibility)
– Pathogen-related terms (e.g. virulence)
– Vector-related terms (e.g. reservoir, ...)
– Terms for the biology of disease pathogenesis (e.g.
evasion of host defense)
– Population-level terms (e.g. epidemic, endemic,
pandemic)
54
We need to annotate these data
to allow retrieval and integration of
– sequence and protein data for pathogens
– case report data for patients
– clinical trial data for drugs, vaccines
– epidemiological data for surveillance,
prevention
– ...
Goal: to make data deriving from different
sources comparable and computable
55
IDO needs to work with
Disease Ontology (DO) + SNOMED CT
Gene Ontology Immunology Branch
Phenotypic Quality Ontology (PATO)
Protein Ontology (PRO)
Sequence Ontology (SO)
...
56
IDO provides a common template
IDO works like CARO.
It contains terms (like ‘pathogen’, ‘vector’,
‘host’) which apply to organisms of all
species involved in infectious disease and
its transmission
Disease- and organism-specific ontologies
then built as specifications of the IDO
core
57
Proposed additions to list of OBO
Foundry Principles
• INSTANTIABILITY: Terms in an ontology
should correspond to instances in reality
Even disposition terms correspond to
instances in reality
There are no absent nipples
There are no cancelled studies
Proposed additions to list of
OBO Foundry Principles
INSTANTIABILITY: Terms in an ontology should
represent types all of which have instances in
reality
types = what are described in textbooks
instances = (roughly) what are described in data
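Taking "instances = (roughly) what are described in data" at face value, a first-pass check is to list the terms that no annotated instance uses. This is a sketch only: absence from one dataset does not show that a type has no instances in reality, so the output is a list of candidates for review. The function name and the instance identifiers are invented for the example.

```python
def uninstantiated_terms(term_labels, annotations):
    """Return terms that no annotation uses.

    annotations: iterable of (instance_id, term_label) pairs.
    """
    used = {term for _instance, term in annotations}
    return sorted(set(term_labels) - used)


terms = ["nipple", "absent nipple", "study"]
data = [("patient-17-left-nipple", "nipple"), ("study-3", "study")]
print(uninstantiated_terms(terms, data))  # ['absent nipple']
```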
59
Proposed additions to list of
OBO Foundry Principles
Ontologies consist of representations of types in
reality – therefore, their terms should consist
entirely of singular nouns (preferred terms blah
blah)
Ontologies should use singular nouns and noun
phrases belonging to ordinary English as
extended by technical terms already established
in the relevant discipline – they should not use
phrases like ‘EV-EXP-IGI’, no lab slang, no ellipses
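Parts of this naming rule can be checked automatically. The sketch below is a heuristic, not an agreed Foundry check: it flags labels that look like codes (all capitals and digits joined by hyphens or underscores) and labels that contain an ellipsis. It does not attempt to detect plurals or lab slang, which would need a dictionary or human review. The pattern and the function name are assumptions.

```python
import re

# Matches code-like labels such as 'EV-EXP-IGI': upper-case/digit chunks
# joined by hyphens or underscores.
CODE_LIKE = re.compile(r"^[A-Z0-9]+(?:[-_][A-Z0-9]+)+$")


def label_problems(label):
    """Return a list of heuristic complaints about a term label."""
    problems = []
    if CODE_LIKE.match(label):
        problems.append("looks like a code rather than an English noun phrase")
    if "..." in label or "\u2026" in label:
        problems.append("contains an ellipsis")
    return problems


print(label_problems("EV-EXP-IGI"))      # one complaint: looks like a code
print(label_problems("dendritic cell"))  # []
```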
60
Proposed additions to list of
OBO Foundry Principles
EVALUATION
• each ontology should be subject to evaluation
(as far as possible quantitative):
• software (conversion OBO format → OWL)
• specialist review (OWL → natural language)
• when one version is used for a given purpose,
later versions should be applied to the same
purpose and results compared
61
Proposed additions to list of
OBO Foundry Principles
each ontology should be built on the basis of
BFO top-level distinctions (common top level):
• continuants vs. occurrents
• independent continuants (molecules, cells,
organisms …)
• specifically dependent continuants (qualities,
functions, roles …)
• generically dependent continuants
(information artifacts, sequences …)
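The top-level distinctions listed above can be held in a small lookup table. The sketch below uses only the examples given in these slides (with "functioning" taken from the Fundamental Dichotomy slide as the example of an occurrent); the dictionary layout and the function `top_level_path` are illustrative choices, not part of BFO itself.

```python
# Top-level category -> subcategory -> example types named in the slides.
BFO_TOP_LEVEL = {
    "continuant": {
        "independent continuant": ["molecule", "cell", "organism"],
        "specifically dependent continuant": ["quality", "function", "role"],
        "generically dependent continuant": ["information artifact", "sequence"],
    },
    "occurrent": {
        "process": ["functioning"],
    },
}


def top_level_path(example):
    """Return (top-level category, subcategory) for a known example, else None."""
    for top, subcategories in BFO_TOP_LEVEL.items():
        for subcategory, examples in subcategories.items():
            if example in examples:
                return (top, subcategory)
    return None


print(top_level_path("cell"))         # ('continuant', 'independent continuant')
print(top_level_path("functioning"))  # ('occurrent', 'process')
```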
62