Pax Terminologica - Buffalo Ontology Site


Standards and Ontology
The OBO Foundry
Barry Smith
Center of Excellence in Bioinformatics
& Life Sciences, University at Buffalo
IFOMIS, Saarland University
http://ontology.buffalo.edu/smith
1
we are accumulating huge amounts of
sequence data, image data, pharma data, ...
• how do we know what data we have?
• how do I know what data you have?
• how do we know what data we don’t have?
• how do we make different sorts of data combinable, as
we need to do in large domains such as
neurodevelopment, immunology, cancer ...?
2
genomic medicine, molecular medicine,
translational medicine, personalized medicine ...
need
methods for data integration to enable
reasoning across data at multiple granularities
to identify biomedically relevant relations on the side
of the entities themselves
3
4
where in the body ?
what kind of disease process ?
we need semantic annotation of data
= we need ontologies
5
how do we create broad-coverage semantic
annotation systems for biomedicine?
Semantic Web, Moby, wikis, etc.
• let a million flowers (and weeds) bloom
• to create integration, rely on (automatically
generated?) post hoc mappings
6
most successful, thus far: UMLS
built by trained experts
massively useful for information retrieval and information
integration
UMLS Metathesaurus: a system of post hoc mappings
between separately built source vocabularies
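To make "post hoc mapping" concrete, here is a toy sketch (not how the UMLS is actually built): two separately built vocabularies are linked only through a mapping table to shared concept identifiers. The vocabularies, local codes and concept IDs are all hypothetical. Any term the mappers missed drops out of the integration.

```python
from collections import defaultdict

# Two independently built source vocabularies: term -> local code (all hypothetical).
vocab_a = {"myocardial infarction": "A-17", "heart attack": "A-18"}
vocab_b = {"Myocardial Infarction": "B-901", "cardiac infarct": "B-902"}

# Post hoc mapping table: (vocabulary, local code) -> shared concept ID (hypothetical IDs).
post_hoc_map = {
    ("A", "A-17"): "C1",
    ("A", "A-18"): "C1",   # asserted synonym of A-17
    ("B", "B-901"): "C1",
    # ("B", "B-902") was never mapped
}


def concepts_for(vocab_name, vocab):
    """Group a vocabulary's terms under the concept each local code was mapped to."""
    grouped = defaultdict(list)
    for term, code in vocab.items():
        concept = post_hoc_map.get((vocab_name, code), f"UNMAPPED:{code}")
        grouped[concept].append(term)
    return dict(grouped)


a = concepts_for("A", vocab_a)
b = concepts_for("B", vocab_b)
shared = set(a) & set(b)                               # concepts both vocabularies reach
orphans = [c for c in b if c.startswith("UNMAPPED:")]  # terms lost to integration
print(shared, orphans)
```

Integration here is only as good as the mapping table: the unmapped code stays isolated, and a wrong synonymy link would be propagated silently.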
7
8
UMLS-based mappings fall short
of creating interoperability
because local usage is respected
regimentation is frowned upon; no concern for cross-framework consistency
UMLS terminologies have different grades of formal
rigor, different degrees of completeness, different
update policies
9
with UMLS-based annotations
we can know what data we have (via term searches),
but it is noisy
we can map between data at single granularities (via
‘synonyms’), but synonymy information is noisy
how do we know what data we don’t have ?
how do we reason with data (as at the molecular
level) when there is no common logical backbone?
10
for science
what is to be done?
how to develop high-quality annotation resources in a
collaborative, community effort?
create an evolutionary path towards improvement of
terminologies, of the sort we find elsewhere in
science
find ways to reward early adopters of the results
11
for science
for science, consistency is a sine qua non
science works out from a consensus core, and strives
to isolate and resolve inconsistencies as it extends at
the fringes
we need to create a consensus core
start with what for human beings are trivialities (low
hanging fruit) and work out from there
12
[Figure: fragment of the Foundational Model of Anatomy (FMA), showing classes from Anatomical Structure and Anatomical Space down through Organ, Organ Part, Organ Component, Organ Subdivision, Organ Cavity, Serous Sac and Serous Sac Cavity to Pleural Sac, Pleural Cavity, Pleura (Wall of Sac), Parietal Pleura, Visceral Pleura, Mediastinal Pleura, Interlobar recess, Tissue and Mesothelium of Pleura]
13
for science
clinical medicine relies on anatomy
and molecular biology to provide
integration across medical specialisms
include ontologies corresponding to the basic
biomedical sciences in the core
14
for science
but we need more
where do we find scientifically validated information
linking gene products and other entities represented
in biochemical databases to semantically
meaningful terms pertaining to disease, anatomy,
development, histology in different model
organisms?
15
16
what makes GO so wildly successful?
17
The methodology of annotations
science basis of the GO: trained experts curating
peer-reviewed literature
different model organism databases employ
scientific curators who use the experimental
observations reported in the biomedical literature to
associate GO terms with gene products in a
coordinated way
18
A set of standardized textual descriptions of
• cellular locations
• molecular functions
• biological processes
used to annotate the entities represented in the major
biochemical databases
thereby creating integration across these databases and
making them available to semantic search
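A minimal sketch of how shared GO terms integrate separate databases: once gene products from different model organism databases are annotated with the same term ID, one lookup retrieves them all. The database names, gene names and the IDs `GO:9999991`/`GO:9999992` are placeholders, not real records. IDA, IMP and IEA are real GO evidence codes (direct assay, mutant phenotype, electronic annotation); treating only the first two as "curated" is a simplification for this example.

```python
from collections import defaultdict

# Hypothetical annotation records: (database, gene product, GO term ID, evidence code).
# "GO:9999991"/"GO:9999992" are placeholders, not real GO identifiers.
annotations = [
    ("db_fly", "geneA", "GO:9999991", "IDA"),
    ("db_mouse", "geneB", "GO:9999991", "IMP"),
    ("db_mouse", "geneC", "GO:9999992", "IDA"),
    ("db_yeast", "geneD", "GO:9999991", "IEA"),
]

# Evidence codes treated as literature-based curation in this sketch (an assumption).
CURATED = {"IDA", "IMP"}


def index_by_term(records, evidence=None):
    """Map each GO term ID to the (database, gene product) pairs annotated with it."""
    index = defaultdict(list)
    for db, gene, term, code in records:
        if evidence is None or code in evidence:
            index[term].append((db, gene))
    return index


all_hits = index_by_term(annotations)["GO:9999991"]
curated_hits = index_by_term(annotations, CURATED)["GO:9999991"]
print(all_hits)
print(curated_hits)
```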
19
what cellular component?
what molecular function?
what biological process?
20
This process
leads to improvements and extensions of the ontology
which in turn leads to better annotations
→ a virtuous cycle of improvement in the quality and reach
of both future annotations and the ontology itself
RESULT: a slowly growing computer-interpretable map of
biological reality within which major databases are
automatically integrated in semantically searchable form
21
Five bangs for your GO buck
science base
cross-species database integration
cross-granularity database integration
through links to the things which are of
biomedical relevance
→ semantic searchability links people to software
22
but now
need to improve the quality of GO to support more
rigorous logic-based reasoning across the data
annotated in its terms
need to extend the GO by engaging ever broader
community support for the addition of new terms
and for the correction of errors
23
but also
need to extend the methodology to other domains,
including clinical domains → need for
disease ontology
immunology ontology
symptom (phenotype) ontology
clinical trial ontology ...
24
the problem
existing clinical vocabularies are of variable quality and low
mutual consistency
need for prospective standards to ensure mutual consistency
and high quality of clinical counterparts of GO
need to ensure consistency of the new clinical ontologies
with the basic biomedical sciences
if we do not start now, the problem will only get worse
25
the solution
establish common rules governing best practices for creating
ontologies and for using these in annotations
apply these rules to create a complete suite of orthogonal
interoperable biomedical reference ontologies
this solution is already being implemented
26
First step (2003)
a shared portal for (so far) 58 ontologies
(low regimentation)
http://obo.sourceforge.net → NCBO BioPortal
27
28
Second step (2004)
reform efforts initiated, e.g. linking GO to other
OBO ontologies to ensure orthogonality
CL (Cell Ontology) term:
id: CL:0000062
name: osteoblast
def: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone."
is_a: CL:0000055
relationship: develops_from CL:0000008
relationship: develops_from CL:0000375
GO term:
Osteoblast differentiation: Processes whereby an osteoprogenitor cell or a cranial neural crest cell acquires the specialized features of an osteoblast, a bone-forming cell which secretes extracellular matrix.
GO term + CL cell type = New Definition
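The CL stanza on this slide uses the OBO flat-file layout of `tag: value` lines. A minimal parser sketch follows, plus a toy genus-differentia composition that pairs a generic GO process with the CL cell type. The `[Term]` header is the standard OBO stanza header added here for completeness. The genus "cell differentiation", the relation name `acquires_features_of` and the OWL-style output string are hypothetical choices for illustration, not the actual GO/CL cross-product.

```python
def parse_obo_stanza(text):
    """Parse the 'tag: value' lines of one OBO term stanza into tag -> list of values."""
    stanza = {}
    for line in text.strip().splitlines():
        line = line.strip()
        if not line or line.startswith("["):  # skip blank lines and the [Term] header
            continue
        tag, _, value = line.partition(": ")
        stanza.setdefault(tag, []).append(value)
    return stanza


# The CL stanza from the slide, with the def: value rejoined onto a single line.
CL_OSTEOBLAST = """
[Term]
id: CL:0000062
name: osteoblast
def: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone."
is_a: CL:0000055
relationship: develops_from CL:0000008
relationship: develops_from CL:0000375
"""

term = parse_obo_stanza(CL_OSTEOBLAST)
precursors = [v.split()[1] for v in term["relationship"] if v.startswith("develops_from ")]


def cross_product(genus, relation, cell):
    """Genus + differentia: a generic process restricted by a cell type.

    The relation name passed in is hypothetical; the text mimics an OWL class expression.
    """
    return {
        "genus": genus,
        "differentia": (relation, cell["id"][0]),
        "text": f"{genus} and {relation} some {cell['name'][0]}",
    }


print(precursors)
print(cross_product("cell differentiation", "acquires_features_of", term)["text"])
```

Because tags may repeat (two `relationship:` lines here), every tag maps to a list rather than a single string.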
29
Third step (2006)
The OBO Foundry
http://obofoundry.org/
30
The OBO Foundry
a family of interoperable gold standard
biomedical reference ontologies to serve the
annotation of, inter alia:
• scientific literature
• model organism databases
• clinical trial data
31
A prospective standard
designed to guarantee interoperability of ontologies from
the very start (in contrast to post hoc mapping)
established March 2006
12 initial candidate OBO ontologies – focused primarily on
basic science domains
several being constructed ab initio
by influential consortia who have the authority to impose
their use on large parts of the relevant communities.
32
Undergoing rigorous reform:
GO Gene Ontology
ChEBI Chemical Ontology
CL Cell Ontology
FMA Foundational Model of Anatomy
PaTO Phenotype Quality Ontology
SO Sequence Ontology
New:
CARO Common Anatomy Reference Ontology
CTO Clinical Trial Ontology
FuGO Functional Genomics Investigation Ontology
PrO Protein Ontology
RnaO RNA Ontology
RO Relation Ontology
33
Ontology | Scope | URL | Custodians
Cell Ontology (CL) | cell types from prokaryotes to mammals | obo.sourceforge.net/cgi-bin/detail.cgi?cell | Jonathan Bard, Michael Ashburner, Oliver Hofman
Chemical Entities of Biological Interest (ChEBI) | molecular entities | ebi.ac.uk/chebi | Paula Dematos, Rafael Alcantara
Common Anatomy Reference Ontology (CARO) | anatomical structures in human and model organisms | (under development) | Melissa Haendel, Terry Hayamizu, Cornelius Rosse, David Sutherland
Foundational Model of Anatomy (FMA) | structure of the human body | fma.biostr.washington.edu | JLV Mejino Jr., Cornelius Rosse
Functional Genomics Investigation Ontology (FuGO) | design, protocol, data instrumentation, and analysis | fugo.sf.net | FuGO Working Group
Gene Ontology (GO) | cellular components, molecular functions, biological processes | www.geneontology.org | Gene Ontology Consortium
Phenotypic Quality Ontology (PaTO) | qualities of anatomical structures | obo.sourceforge.net/cgi-bin/detail.cgi?attribute_and_value | Michael Ashburner, Suzanna Lewis, Georgios Gkoutos
Protein Ontology (PrO) | protein types and modifications | (under development) | Protein Ontology Consortium
Relation Ontology (RO) | relations | obo.sf.net/relationship | Barry Smith, Chris Mungall
RNA Ontology (RnaO) | three-dimensional RNA structures | (under development) | RNA Ontology Consortium
Sequence Ontology (SO) | properties and features of nucleic sequences | song.sf.net | Karen Eilbeck
34
[Figure: OBO Foundry ontologies arranged by granularity and relation to time]
GRANULARITY | CONTINUANT, INDEPENDENT | CONTINUANT, DEPENDENT | OCCURRENT
ORGAN AND ORGANISM | Organism (NCBI Taxonomy?); Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO); Phenotypic Quality (PaTO) | Organism-Level Process (GO)
CELL AND CELLULAR COMPONENT | Cell (CL); Cellular Component (FMA, GO) | Cellular Function (GO) | Cellular Process (GO)
MOLECULE | Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Process (GO)
Annotations plus ontologies yield an ever-growing computer-interpretable map of biological reality.
35
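The matrix on this slide partitions the domain along two axes, granularity and relation to time, so that each cell has designated ontologies. A small lookup sketch makes that partition explicit and shows how gaps or a single ontology's footprint could be queried. The dictionary is transcribed from the slide; placing PaTO at the organ-and-organism level follows the extracted text order and is an assumption.

```python
# (granularity, relation to time) -> ontologies, transcribed from the slide's matrix.
FOUNDRY_MATRIX = {
    ("organ and organism", "independent continuant"): ["NCBI Taxonomy?", "FMA", "CARO"],
    ("organ and organism", "dependent continuant"): ["FMP", "CPRO", "PaTO"],
    ("organ and organism", "occurrent"): ["GO"],
    ("cell and cellular component", "independent continuant"): ["CL", "FMA", "GO"],
    ("cell and cellular component", "dependent continuant"): ["GO"],
    ("cell and cellular component", "occurrent"): ["GO"],
    ("molecule", "independent continuant"): ["ChEBI", "SO", "RnaO", "PrO"],
    ("molecule", "dependent continuant"): ["GO"],
    ("molecule", "occurrent"): ["GO"],
}


def ontologies_for(granularity, relation_to_time):
    """Ontologies the matrix assigns to one cell; an empty list marks a gap."""
    return FOUNDRY_MATRIX.get((granularity, relation_to_time), [])


def cells_covered_by(ontology):
    """All matrix cells in which a given ontology appears."""
    return sorted(cell for cell, onts in FOUNDRY_MATRIX.items() if ontology in onts)


print(ontologies_for("molecule", "independent continuant"))
print(len(cells_covered_by("GO")))
```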
[Figure: the same matrix of ontologies by granularity and relation to time, but with the occurrent column showing only Biological Process (GO) and Molecular Process (GO)]
Building out from the original GO
36
Under consideration:
• Disease Ontology (DO)
• Biomedical Image and Image Process Ontology (BiiO)
• Upper Biomedical Ontology (OBO UBO)
• Ontology of Biomedical Investigations (OBI)
• Clinical Trial Ontology (CTO)
37
OBO Foundry = a subset of OBO ontologies, whose
developers have agreed in advance to accept a
common set of principles reflecting best practice in
ontology development designed to ensure
• tight connection to the biomedical basic sciences
• compatibility
• interoperability, common relations
• formal robustness
• support for logic-based reasoning
38
CRITERIA
• The ontology is OPEN and available to be used by all.
• The ontology is in, or can be instantiated in, a COMMON FORMAL LANGUAGE.
• The developers of the ontology agree in advance to COLLABORATE with developers of other OBO Foundry ontologies where domains overlap.
39
CRITERIA
• UPDATE: The developers of each ontology commit to its maintenance in light of scientific advance, and to soliciting community feedback for its improvement.
• ORTHOGONALITY: They commit to working with other Foundry members to ensure that, for any particular domain, there is community convergence on a single controlled vocabulary.
40
for science
orthogonality of ontologies
implies additivity of annotations
if we annotate a database or body of literature with
one high-quality biomedical ontology, we should be
able to add annotations from a second such
ontology without conflicts
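A minimal sketch of "orthogonality implies additivity", under a deliberately simple model: each ontology covers exactly one domain, and an annotation layer is a map from data item to a set of term IDs. If no two ontologies claim the same domain, a second layer can be added by plain set union without overwriting the first. The registry, domain labels and `GO:9999993` are hypothetical; `CL:0000062` (osteoblast) is taken from the earlier slide.

```python
from collections import defaultdict

# Hypothetical registry: ontology prefix -> the single domain it covers.
DOMAINS = {"GO": "biological process", "CL": "cell type", "ChEBI": "chemical entity"}


def overlapping_domains(domains):
    """Domains claimed by more than one ontology; an empty list means orthogonal."""
    owners = defaultdict(list)
    for prefix, domain in domains.items():
        owners[domain].append(prefix)
    return sorted(d for d, prefixes in owners.items() if len(prefixes) > 1)


def add_annotations(base, extra):
    """Union two item -> {term IDs} maps; nothing already asserted is overwritten."""
    merged = {item: set(terms) for item, terms in base.items()}
    for item, terms in extra.items():
        merged.setdefault(item, set()).update(terms)
    return merged


# "GO:9999993" is a placeholder ID, not a real GO term.
go_layer = {"paper1": {"GO:9999993"}}
cl_layer = {"paper1": {"CL:0000062"}, "paper2": {"CL:0000062"}}

print(overlapping_domains(DOMAINS))
print(add_annotations(go_layer, cl_layer))
```

Adding a second ontology for "cell type" would make `overlapping_domains` report a clash, which is exactly the situation in which annotations stop being simply additive.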
41
CRITERIA
• IDENTIFIERS: The ontology possesses a unique identifier space within OBO.
• VERSIONING: The ontology provider has procedures for identifying distinct successive versions to ensure BACKWARDS COMPATIBILITY with annotation resources already in common use.
• The ontology includes TEXTUAL DEFINITIONS and, where possible, equivalent formal definitions of its terms.
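Three of these criteria lend themselves to mechanical checks. The sketch below is one possible reading, not the Foundry's actual review procedure: every ID must use the ontology's own prefix, every term must carry a non-empty textual definition, and no ID from an earlier release may disappear. In OBO files a retired term would instead stay in place, flagged with `is_obsolete: true`. The release contents are hypothetical; the second CL definition is a placeholder, not the real CL text.

```python
def check_terms(prefix, terms):
    """Return (IDs outside the ontology's identifier space, IDs lacking a textual definition)."""
    wrong_prefix = [t["id"] for t in terms if not t["id"].startswith(prefix + ":")]
    no_definition = [t["id"] for t in terms if not t.get("def", "").strip()]
    return wrong_prefix, no_definition


def dropped_ids(old_release, new_release):
    """IDs present in the old release but missing from the new one.

    Simplifying assumption: backwards compatibility means an ID is never deleted;
    a retired term stays in the file, flagged as obsolete.
    """
    return sorted({t["id"] for t in old_release} - {t["id"] for t in new_release})


# Hypothetical releases of a CL-like ontology.
v1 = [
    {"id": "CL:0000062", "def": "A bone-forming cell which secretes an extracellular matrix."},
    {"id": "CL:0000055", "def": "placeholder definition"},
]
v2 = [
    {"id": "CL:0000062", "def": "A bone-forming cell which secretes an extracellular matrix."},
    {"id": "XX:0000001", "def": ""},
]

print(check_terms("CL", v2))
print(dropped_ids(v1, v2))
```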
42
CRITERIA
• CLEARLY BOUNDED: The ontology has a clearly specified and clearly delineated content.
• DOCUMENTATION: The ontology is well-documented.
• USERS: The ontology has a plurality of independent users.
43
CRITERIA
• COMMON ARCHITECTURE: The ontology uses relations which are unambiguously defined following the pattern of definitions laid down in the OBO Relation Ontology.*
* Smith et al., Genome Biology 2005, 6:R46
44
OBO Relation Ontology
Foundational: is_a, part_of
Spatial: located_in, contained_in, adjacent_to
Temporal: transformation_of, derives_from, preceded_by
Participation: has_participant, has_agent
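Unambiguously defined relations are what make reasoning over annotations possible. A minimal sketch, assuming the all-some reading of is_a and part_of: both are transitive and compose with each other (A part_of B and B is_a C give A part_of C), so a query for a general structure can also retrieve data annotated to its subtypes and parts. The edges loosely follow the FMA pleura fragment shown earlier and are illustrative only, not checked against the actual FMA.

```python
from collections import defaultdict, deque

# Illustrative edges (child, relation, parent); not checked against the actual FMA.
EDGES = [
    ("visceral pleura", "part_of", "pleura"),
    ("parietal pleura", "part_of", "pleura"),
    ("mediastinal pleura", "part_of", "parietal pleura"),
    ("pleura", "part_of", "pleural sac"),
    ("pleural sac", "is_a", "serous sac"),
]


def ancestors(term, edges, relations=("is_a", "part_of")):
    """Every term reachable from `term` by following the given relations upward."""
    parents = defaultdict(set)
    for child, rel, parent in edges:
        if rel in relations:
            parents[child].add(parent)
    seen, queue = set(), deque([term])
    while queue:  # breadth-first walk; `seen` guards against revisiting
        for parent in parents[queue.popleft()]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def query(target, annotations, edges):
    """Items annotated to `target` itself or to anything below it."""
    return sorted(item for item, term in annotations.items()
                  if term == target or target in ancestors(term, edges))


samples = {"s1": "mediastinal pleura", "s2": "visceral pleura", "s3": "tissue"}
print(ancestors("mediastinal pleura", EDGES))
print(query("pleural sac", samples, EDGES))
```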
45
IT WILL GET HARDER
Further criteria will be added over time in light of
lessons learned in order to bring about a gradual
improvement in the quality of Foundry ontologies
ALL FOUNDRY ONTOLOGIES WILL BE SUBJECT
TO CONSTANT UPDATE IN LIGHT OF
SCIENTIFIC ADVANCE
46
IT WILL GET HARDER
But not everyone needs to join
The Foundry is not seeking to serve as a check on
flexibility or creativity
ALL FOUNDRY ONTOLOGIES WILL ENCOURAGE
COMMUNITY CRITICISM, CORRECTION AND
EXTENSION WITH NEW TERMS
47
GOALS
• to introduce some of the features of SCIENTIFIC PEER REVIEW into biomedical ontology development
• CREDIT for high quality ontology development work
• KUDOS for early adopters of high quality ontologies / terminologies, e.g. in reporting clinical trial results
48
GOALS
• to provide a FRAMEWORK OF RULES to counteract the current practice of ad hoc creation of new annotation schemas by each clinical research group
• REUSABILITY: if data-schemas are formulated using a single well-integrated framework ontology system in widespread use, then the data will to that degree itself become more widely accessible and usable
49
GOALS
• to serve as a BENCHMARK FOR IMPROVEMENTS in discipline-focused terminology resources
• once a system of interoperable reference ontologies is in place, it will make sense to calibrate existing terminologies in its terms in order to achieve more robust alignment and greater domain coverage
• exploit the avenue of EVIDENCE-BASED MEDICINE (NIH CLINICAL RESEARCH NETWORKS) to foster their use by clinicians
50
the vision is spreading
June 2006: establishment of MICheck:
reflects the growing need for prescriptive checklists
specifying the key information to include when
reporting experimental results (concerning
methods, data, analyses and results).
51
MICheck Foundry
• MICheck: ‘a common resource for minimum information checklists’ analogous to OBO / NCBO BioPortal
• MICheck Foundry: will create ‘a suite of self-consistent, clearly bounded, orthogonal, integrable checklist modules’ *
* Taylor CF, et al. Nature Biotech, in press
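Checklist modules that are clearly bounded and orthogonal can be combined mechanically. A minimal sketch under that reading: a report is checked against the union of the fields required by the modules it opts into, and two modules count as non-orthogonal when they require a common field. The "core" fields echo the methods, data, analyses and results named above. All other module and field names are hypothetical, not taken from MIAME or any real checklist.

```python
from itertools import combinations

# Hypothetical checklist modules; field names are illustrative only.
MODULES = {
    "core": {"methods", "data", "analyses", "results"},
    "transcriptomics": {"array_design", "normalization"},
    "flow_cytometry": {"instrument_settings", "gating_strategy"},
}


def missing_fields(report, module_names, modules=MODULES):
    """Fields required by the selected modules that the report does not supply."""
    required = set().union(*(modules[name] for name in module_names))
    return sorted(required - set(report))


def non_orthogonal_pairs(modules):
    """Pairs of modules that both require some field, i.e. overlap."""
    return [(a, b) for a, b in combinations(sorted(modules), 2)
            if modules[a] & modules[b]]


report = {"methods": "...", "data": "...", "analyses": "...", "results": "...",
          "array_design": "..."}
print(missing_fields(report, ["core", "transcriptomics"]))
print(non_orthogonal_pairs(MODULES))
```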
52
MICheck/Foundry communities
Transcriptomics (MIAME Working Group)
Proteomics (Proteomics Standards Initiative)
Metabolomics (Metabolomics Standards Initiative)
Genomics and Metagenomics (Genomic Standards Consortium)
In Situ Hybridization and Immunohistochemistry (MISFISHIE
Working Group)
Phylogenetics (Phylogenetics Community)
RNA Interference (RNAi Community)
Toxicogenomics (Toxicogenomics WG)
Environmental Genomics (Environmental Genomics WG)
Nutrigenomics (Nutrigenomics WG)
Flow Cytometry (Flow Cytometry Community)
53
Fourth Step (the future)
how to replicate the successes of the GO in clinical
medicine?
choose two or three representative disease domains
work out reasoning challenges for those domains
work with specialists to create ontologies interoperable
with OBO Foundry basic science ontologies to address
these reasoning challenges
work with leaders of professional associations and of
clinical trial initiatives to foster the collection of clinical
data annotated in their terms
54
[Figure: matrix of OBO Foundry ontologies by granularity (organ and organism; cell and cellular component; molecule) and relation to time (independent continuant, dependent continuant, occurrent)]
OBO Foundry coverage (canonical ontologies)
55
[Figure: independent continuants by granularity (organism, system, organ, organ part, tissue, cell, acellular anatomical structure, biological molecule, genome) set against dependent continuants: physiology (functions) and pathology (acute, progressive and resolution stages)]
56
Draft Ontology for Acute Respiratory Distress Syndrome
57
Draft Ontology for Muscular Sclerosis
what data do we have?
what data do the others have?
what data do we not have?
58
Draft Ontology for Muscular Sclerosis
to apprehend what is unknown requires a complete demarcation of the relevant space of alternatives
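Why a complete demarcation matters can be shown with set arithmetic: only once the space of alternatives is enumerated can "what data do we not have?" be answered, as the complement of what is annotated. In this sketch the alternatives reuse the pathology stages and anatomical levels from the continuants slide. Crossing them into one grid, and the dataset annotations, are hypothetical.

```python
from itertools import product

# Alternatives taken from the continuants slide; crossing them into a grid is an assumption.
STAGES = ["acute stage", "progressive stage", "resolution stage"]
LEVELS = ["organ", "tissue", "cell"]

# Hypothetical annotated datasets: dataset ID -> (stage, level)
DATASETS = {
    "ds1": ("acute stage", "organ"),
    "ds2": ("acute stage", "cell"),
    "ds3": ("progressive stage", "tissue"),
}


def coverage_gaps(stages, levels, datasets):
    """Combinations in the demarcated space for which no dataset is annotated."""
    space = set(product(stages, levels))
    have = set(datasets.values())
    return sorted(space - have)


gaps = coverage_gaps(STAGES, LEVELS, DATASETS)
print(len(gaps), gaps[0])
```

Without the enumerated `space`, the annotated datasets alone could say what we have, but never what is missing.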
59
National Center for Ontological Research
Goal: to advance ontology as science
http://ncor.us
60
61