NIH VISION THING

How Ontologies Create Research Communities
Barry Smith
http://ontology.buffalo.edu/smith
1
genomic medicine, molecular medicine, translational medicine, personalized medicine ...
need methods for data integration to enable reasoning across data at multiple granularities
to identify biomedically relevant relations on the side of the entities themselves
2
3
where in the body?
what kind of disease process?
we need semantic annotations of data
which human beings can understand
and computers can reason with
4
5
BIRN: Bioinformatics Research Network
Institute for Formal Ontology and Medical Information Science (IFOMIS)
6
BIRN: Bioinformatics Research Network
Center for In Vivo Microscopy
Brain Imaging and Analysis Center
Neuropsychiatric Imaging Research Laboratory
Yerkes National Primate Research Center
Institute for Formal Ontology and Medical Information Science (IFOMIS)
Cognitive
Clinical Neuroscience Laboratory
Mallinckrodt Institute of Radiology
Lineberger Comprehensive Cancer Center
fMRI Research Center
Surgical Planning Laboratory
Center for Magnetic Resonance Research
7
Multiscale Systems Immunology for Adjuvant [Vaccine] Development
Investigators
Duke Center for Computational Immunology: Thomas B Kepler, Lindsay G Cowell, Cliburn Chan
Duke Institute of Statistics and Decision Sciences: Mike West
Duke Computer Science: Jun Yang
Duke Human Vaccine Institute: Greg Sempowski, Munir Alam
Duke Center for Computational Sciences, Engineering and Medicine: John Pormann, Rachael Brady, Bill Rankin
Duke Mathematics: Bill Allard
Department of Pathology, Emory: Bali Pulendran
Department of Physiology & Biophysics, UC Irvine: Michael Cahalan
Department of Pediatrics, Vanderbilt: Kathryn Edwards
how do we make different sorts of data
combinable in ways useful to the human
beings who carry out research?
9
how was this problem solved in the
years BC?
how did clinical researchers from different
disciplines communicate?
how did they learn to communicate?
10
through the basic biomedical sciences:
anatomy, physiology,
biochemistry, histology, ...
11
clinical medicine relies on anatomy
and molecular biology to provide
integration across medical specialisms
create ontologies corresponding to the basic
biomedical sciences
12
13
but we need more
where do we find scientifically validated information
linking gene products and other entities represented
in biochemical databases to semantically
meaningful terms pertaining to disease, anatomy,
development, histology in different model
organisms?
14
15
what makes GO so wildly successful?
16
The methodology of annotations:
different model organism databases
employ scientific curators who use the
experimental observations reported in
the biomedical literature to associate
GO terms with gene products in a
coordinated way
17
A set of standardized textual descriptions of
- cellular locations
- molecular functions
- biological processes
used to annotate the entities represented in the major biochemical databases
thereby creating integration across these databases and making them available to semantic search
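A minimal sketch of how shared GO annotations can integrate separate databases, assuming each database exports (gene product, GO term) pairs. The database names and gene product identifiers are hypothetical; the GO identifiers serve only as illustrative keys (GO:0005634 nucleus, GO:0003677 DNA binding).

```python
from collections import defaultdict
from typing import NamedTuple


class Annotation(NamedTuple):
    database: str      # hypothetical source database name
    gene_product: str  # hypothetical gene product identifier
    go_id: str         # GO term identifier shared by all databases
    aspect: str        # cellular_component, molecular_function or biological_process


# Illustrative records only; the databases and gene products are made up.
ANNOTATIONS = [
    Annotation("mouse_db", "mouse_gene_1", "GO:0005634", "cellular_component"),
    Annotation("fly_db", "fly_gene_7", "GO:0005634", "cellular_component"),
    Annotation("fly_db", "fly_gene_9", "GO:0003677", "molecular_function"),
    Annotation("yeast_db", "yeast_gene_3", "GO:0003677", "molecular_function"),
]


def index_by_term(annotations):
    """Group (database, gene product) pairs under the GO term they share."""
    index = defaultdict(list)
    for a in annotations:
        index[a.go_id].append((a.database, a.gene_product))
    return dict(index)


def search(annotations, go_id):
    """Return every gene product annotated with go_id, across all databases."""
    return sorted(index_by_term(annotations).get(go_id, []))


if __name__ == "__main__":
    for database, gene_product in search(ANNOTATIONS, "GO:0005634"):
        print(database, gene_product)
```

Because every database uses the same term identifiers, one query by term reaches all of them without any database-to-database mapping.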
18
what cellular component?
what molecular function?
what biological process?
19
This process leads to a slowly growing computer-interpretable map of biological reality
within which major databases are automatically integrated in semantically searchable form
20
Five bangs for your GO buck
science base
cross-species database integration
cross-granularity database integration
through links to the things which are of
biomedical relevance
semantic searchability links people to software
21
but also
need to extend this methodology beyond the basic
biomedical sciences, to clinical domains
disease ontology
immunology ontology
symptom (phenotype) ontology
neuron ontology
brain (mal)function ontology ...
22
the problem
need to ensure consistency of the new clinical ontologies
with the basic biomedical sciences
need to find ways to ensure clinical data is annotated in
terms of these new controlled vocabularies
if we do not start now, the problem will only get worse
23
First step (2003)
a shared portal for (so far) 58 ontologies
(low regimentation)
http://obo.sourceforge.net → NCBO BioPortal
24
25
Second step (2004)
reform efforts initiated, e.g. linking GO to other
OBO ontologies to ensure orthogonality
GO + Cell type = New Definition

Cell type:
id: CL:0000062
name: osteoblast
def: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone."
is_a: CL:0000055
relationship: develops_from CL:0000008
relationship: develops_from CL:0000375

New definition (GO):
Osteoblast differentiation: Processes whereby an osteoprogenitor cell or a cranial neural crest cell acquires the specialized features of an osteoblast, a bone-forming cell which secretes extracellular matrix.
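The combination of a GO process term with a cell type entry can be sketched in code. The following is an illustration only, assuming the cell type record is available as tag-value lines like those on this slide; `parse_stanza` and `compose_differentiation_def` are hypothetical helper names, and the generated wording is a simplified template, not the curated GO definition.

```python
STANZA = """
id: CL:0000062
name: osteoblast
def: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone."
is_a: CL:0000055
relationship: develops_from CL:0000008
relationship: develops_from CL:0000375
"""


def parse_stanza(text):
    """Parse 'tag: value' lines into a dict of lists, since tags may repeat."""
    fields = {}
    for line in text.strip().splitlines():
        tag, _, value = line.partition(": ")
        fields.setdefault(tag.strip(), []).append(value.strip())
    return fields


def compose_differentiation_def(cell):
    """Build a differentiation definition from the cell type record (template is an assumption)."""
    name = cell["name"][0]
    cell_id = cell["id"][0]
    # Collect the targets of develops_from relationships as the possible starting cell types.
    sources = [
        rel.split()[1]
        for rel in cell.get("relationship", [])
        if rel.startswith("develops_from ")
    ]
    origin = " or ".join(sources)
    return (
        f"{name} differentiation: processes whereby a cell of type {origin} "
        f"acquires the specialized features of the cell type {name} ({cell_id})."
    )


if __name__ == "__main__":
    print(compose_differentiation_def(parse_stanza(STANZA)))
```

Deriving the process definition from the cell type record means the two ontologies cannot drift apart: a change to the cell entry propagates to every definition built from it.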
26
Third step (2006)
The OBO Foundry
http://obofoundry.org/
27
The OBO Foundry
a family of interoperable gold standard
biomedical reference ontologies to serve the
annotation of inter alia
scientific literature
model organism databases
clinical trial data
28
A prospective standard
designed to guarantee interoperability of ontologies from
the very start (contrast to: post hoc mapping)
established March 2006
12 initial candidate OBO ontologies – focused primarily on
basic science domains
several being constructed ab initio
by influential consortia who have the authority to impose
their use on large parts of the relevant communities.
29
undergoing rigorous reform:
GO Gene Ontology
ChEBI Chemical Ontology
CL Cell Ontology
FMA Foundational Model of Anatomy
PaTO Phenotype Quality Ontology
SO Sequence Ontology
new:
CARO Common Anatomy Reference Ontology
CTO Clinical Trial Ontology
FuGO Functional Genomics Investigation Ontology
PrO Protein Ontology
RnaO RNA Ontology
RO Relation Ontology
30
GO Gene Ontology
ChEBI Chemical Ontology
CL Cell Ontology
FMA Foundational Model of Anatomy
PaTO Phenotype Quality Ontology
SO Sequence Ontology
already in
good shape
CARO Common Anatomy Reference Ontology
CTO Clinical Trial Ontology
FuGO Functional Genomics Investigation Ontology
PrO Protein Ontology
RnaO RNA Ontology
RO Relation Ontology
31
[Figure: Foundational Model of Anatomy. Hierarchy diagram relating Anatomical Structure and Anatomical Space (Organ Cavity, Organ Cavity Subdivision, Serous Sac Cavity, Serous Sac Cavity Subdivision) to Organ, Serous Sac, Organ Component, Organ Subdivision, Organ Part and Tissue, illustrated by Pleural Sac, Pleural Cavity, Interlobar recess, Pleura (Wall of Sac), Parietal Pleura, Visceral Pleura, Mediastinal Pleura and Mesothelium of Pleura]
GRANULARITY / RELATION TO TIME | CONTINUANT, INDEPENDENT | CONTINUANT, DEPENDENT | OCCURRENT
ORGAN AND ORGANISM | Organism (NCBI Taxonomy?); Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO); Phenotypic Quality (PaTO) | Organism-Level Process (GO)
CELL AND CELLULAR COMPONENT | Cell (CL); Cellular Component (FMA, GO) | Cellular Function (GO) | Cellular Process (GO)
MOLECULE | Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Process (GO)
33
GRANULARITY / RELATION TO TIME | CONTINUANT, INDEPENDENT | CONTINUANT, DEPENDENT | OCCURRENT
ORGAN AND ORGANISM | Organism (NCBI Taxonomy?); Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO); Phenotypic Quality (PaTO) | Biological Process (GO)
CELL AND CELLULAR COMPONENT | Cell (CL); Cellular Component (FMA, GO) | Cellular Function (GO) | Biological Process (GO)
MOLECULE | Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Process (GO)
Building out from the original GO
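The grid of granularity against relation to time can be read as a lookup table. The sketch below encodes one reading of the slide; the cell assignments, in particular Biological Process (GO) spanning both the organism and cell rows, are an assumption about the original layout, and the function names are hypothetical.

```python
# Keys are (granularity, category); values are the ontologies named in that cell.
# This encoding is one reading of the slide, not an authoritative listing.
COVERAGE = {
    ("organ and organism", "independent continuant"): ["NCBI Taxonomy", "FMA", "CARO"],
    ("organ and organism", "dependent continuant"): ["FMP", "CPRO", "PaTO"],
    ("organ and organism", "occurrent"): ["GO"],
    ("cell and cellular component", "independent continuant"): ["CL", "FMA", "GO"],
    ("cell and cellular component", "dependent continuant"): ["GO"],
    ("cell and cellular component", "occurrent"): ["GO"],
    ("molecule", "independent continuant"): ["ChEBI", "SO", "RnaO", "PrO"],
    ("molecule", "dependent continuant"): ["GO"],
    ("molecule", "occurrent"): ["GO"],
}


def ontologies_for(granularity, category):
    """Return the ontologies covering one cell of the grid (empty list if unknown)."""
    return COVERAGE.get((granularity.lower(), category.lower()), [])


def cells_covered_by(ontology):
    """Return every (granularity, category) cell in which the ontology appears."""
    return sorted(cell for cell, names in COVERAGE.items() if ontology in names)


if __name__ == "__main__":
    print(ontologies_for("Molecule", "Independent continuant"))
    for cell in cells_covered_by("GO"):
        print(cell)
```

Listing the cells for GO shows how far the other ontologies build out from it: GO supplies the functions and processes, while the independent continuants at the organism and molecule levels come from elsewhere.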
34
Under consideration:
Disease Ontology (DO)
Biomedical Image Ontology (BIO)
Upper Biomedical Ontology (OBO UBO)
Environment Ontology (EnvO)
Systems Biology Ontology (SBO)
35
OBO Foundry = a subset of OBO ontologies, whose
developers have agreed in advance to accept a
common set of principles reflecting best practice in
ontology development designed to ensure
tight connection to the biomedical basic sciences
compatibility
interoperability, common relations
formal robustness
support for logic-based reasoning
36
CRITERIA
- The ontology is OPEN
- The ontology employs a COMMON FORMAL LANGUAGE
- The developers agree to COLLABORATE
- UPDATE in light of scientific advance
- ORTHOGONALITY: one ontology per domain
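The orthogonality criterion lends itself to a simple automated check. As a minimal sketch, assuming each ontology can be reduced to a set of term labels, the function below reports labels claimed by more than one ontology. The ontology names and terms are hypothetical examples, and matching on labels alone is a simplification.

```python
from collections import defaultdict


def overlapping_terms(ontologies):
    """Map each label found in more than one ontology to the ontologies that define it.

    `ontologies` maps an ontology name to an iterable of term labels.
    Labels are compared case-insensitively.
    """
    owners = defaultdict(set)
    for ontology_name, labels in ontologies.items():
        for label in labels:
            owners[label.lower()].add(ontology_name)
    return {label: sorted(names) for label, names in owners.items() if len(names) > 1}


if __name__ == "__main__":
    # Hypothetical content: two ontologies that both try to define "osteoblast".
    example = {
        "cell": {"osteoblast", "neuron"},
        "anatomy": {"pleura", "Osteoblast"},
    }
    print(overlapping_terms(example))
```

An empty result means each label has a single home; any entry returned points to a term that the developers of the listed ontologies would need to assign to one domain.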
37
CRITERIA
- COMMON ARCHITECTURE: The ontology uses relations which are unambiguously defined following the pattern of definitions laid down in the OBO Relation Ontology.*
* Smith et al., Genome Biology 2005, 6:R46
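Unambiguous relation definitions are what make automated reasoning possible. As a minimal sketch, assuming part_of is treated as transitive (as it is in the OBO Relation Ontology), the code below derives the implied part_of links from a few asserted ones. The anatomical assertions are illustrative and the function name is hypothetical.

```python
def transitive_closure(edges):
    """Return all (part, whole) pairs implied by transitivity of the given pairs."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                # a part_of b and b part_of d together imply a part_of d.
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# Illustrative assertions, each read as (part, whole).
ASSERTED_PART_OF = {
    ("mediastinal pleura", "parietal pleura"),
    ("parietal pleura", "pleura"),
    ("pleura", "pleural sac"),
}

if __name__ == "__main__":
    inferred = transitive_closure(ASSERTED_PART_OF) - ASSERTED_PART_OF
    for part, whole in sorted(inferred):
        print(f"{part} part_of {whole}")
```

The inference is only safe because every ontology uses part_of in the same defined sense; with an ambiguous relation the derived links could not be trusted.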
38
IT WILL GET HARDER
Further criteria will be added over time in light of
lessons learned in order to bring about a gradual
improvement in the quality of Foundry ontologies
ALL FOUNDRY ONTOLOGIES WILL BE SUBJECT
TO CONSTANT UPDATE IN LIGHT OF
SCIENTIFIC ADVANCE
39
IT WILL GET HARDER
But not everyone needs to join
The Foundry is not seeking to serve as a check on
flexibility or creativity
ALL FOUNDRY ONTOLOGIES WILL ENCOURAGE
COMMUNITY CRITICISM, CORRECTION AND
EXTENSION WITH NEW TERMS
40
GOALS
- to introduce some of the features of SCIENTIFIC PEER REVIEW into biomedical ontology development
- KUDOS for early adopters of high quality ontologies / terminologies e.g. in reporting clinical trial results
- establish ONTOLOGY CHAMPIONS to create EVIDENCE-BASED TERMINOLOGY RESEARCH
41
GOALS
- DATA REUSABILITY: if data-schemas are formulated using a single well-integrated framework ontology system in widespread use, then this data will to this degree itself become more widely accessible and usable
42
expand to all areas of biomedical
experimentation
June 2006: establishment of MICheck:
reflects growing need for prescriptive checklists
specifying the key information to include when
reporting experimental results (concerning
methods, data, analyses and results).
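A prescriptive checklist of this kind can be applied mechanically. The sketch below assumes a report is a simple dictionary and uses the four headings named above (methods, data, analyses, results) as required keys; it is not an actual MICheck checklist, and the function name is hypothetical.

```python
# Required sections, taken from the four headings named on the slide.
REQUIRED_SECTIONS = ("methods", "data", "analyses", "results")


def missing_sections(report, required=REQUIRED_SECTIONS):
    """Return the required sections that are absent or blank in the report."""
    return [key for key in required if not str(report.get(key, "")).strip()]


if __name__ == "__main__":
    # Hypothetical draft report with one blank and one absent section.
    draft = {"methods": "described in section 2", "data": "", "results": "see table"}
    print(missing_sections(draft))
```

A report passes when the returned list is empty; anything listed is information the checklist requires before the results can be reused by others.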
43
MICheck Foundry
- MICheck: ‘a common resource for minimum information checklists’ analogous to OBO / NCBO BioPortal
- MICheck Foundry: will create ‘a suite of self-consistent, clearly bounded, orthogonal, integrable checklist modules’ *
* Taylor CF, et al. Nature Biotech, in press
44
MICheck/Foundry communities
Transcriptomics (MIAME Working Group)
Proteomics (Proteomics Standards Initiative)
Metabolomics (Metabolomics Standards Initiative)
Genomics and Metagenomics (Genomic Standards Consortium)
In Situ Hybridization and Immunohistochemistry (MISFISHIE Working Group)
Phylogenetics (Phylogenetics Community)
RNA Interference (RNAi Community)
Toxicogenomics (Toxicogenomics WG)
Environmental Genomics (Environmental Genomics WG)
Nutrigenomics (Nutrigenomics WG)
Flow Cytometry (Flow Cytometry Community)
45
Fourth Step (the future)
how to replicate the successes of the GO in clinical
medicine:
choose two or three representative disease domains
work out reasoning challenges for those domains
work with specialists to create ontologies interoperable
with OBO Foundry basic science ontologies to address
these reasoning challenges
work with leaders of clinical trial initiatives to foster the
collection of clinical data annotated in their terms
46
Draft Ontology for Acute Respiratory Distress Syndrome
Draft Ontology for Muscular Sclerosis
what data do we have?
what data do the others have?
what data do we not have?
Draft Ontology for Muscular Sclerosis
to apprehend what is unknown requires a complete demarcation of the relevant space of alternatives
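The three questions above reduce to set arithmetic once an ontology fixes the space of alternatives. A minimal sketch, assuming data sets are described by the ontology terms they are annotated with; the term names are placeholders and the function name is hypothetical.

```python
def demarcate(all_terms, ours, theirs):
    """Split the ontology's terms by who holds data annotated with them."""
    all_terms, ours, theirs = set(all_terms), set(ours), set(theirs)
    return {
        "we_have": sorted(ours & all_terms),
        "only_others_have": sorted((theirs - ours) & all_terms),
        # Only computable because all_terms enumerates the whole space.
        "nobody_has": sorted(all_terms - ours - theirs),
    }


if __name__ == "__main__":
    # Placeholder term names standing in for the terms of a draft disease ontology.
    ontology_terms = {"term_1", "term_2", "term_3", "term_4"}
    our_data = {"term_1", "term_2"}
    their_data = {"term_2", "term_3"}
    for question, terms in demarcate(ontology_terms, our_data, their_data).items():
        print(question, terms)
```

Without the complete list of terms, the last entry could not be computed at all, which is the point of the slide: the ontology is what makes the gaps visible.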