What_Is_Ontology_Tor.. - Buffalo Ontology Site

What is an ontology and
Why should you care?
Barry Smith
http://ontology.buffalo.edu/smith
1
What I do
• Gene Ontology (NHGRI) (Scientific Advisor)
• National Center for Biomedical Ontology (NHGRI)
• Protein Ontology (NIGMS)
• Infectious Disease Ontology (NIAID)
• Biometrics Ontology (US Army)
• Ontology for Integration of Cross-Border Emergency Data (European Union)
2
Uses of ‘ontology’ in PubMed abstracts
3
By far the most successful: GO (Gene Ontology)
4
You’re interested
in which genes
control heart
muscle
development
17,536 results
5
Microarray data shows changed expression of thousands of genes.
How will you spot the patterns?
[Figure: clustered microarray expression data over time, attacked vs. control; tree: Pearson; gene list: all genes (14010). Cluster labels: Defense response, Immune response, Response to stimulus, Toll regulated genes, JAK-STAT regulated genes, Puparial adhesion, Molting cycle, hemocyanin, Amino acid catabolism, Lipid metabolism, Peptidase activity, Protein catabolism]
6
You’re interested in which
of your hospital’s patient
data is relevant to
understanding how genes
control heart muscle
development
7
Lab / pathology data
EHR data
Clinical trial data
Family history data
Medical imaging
Microarray data
Model organism data
Flow cytometry
Mass spec
Genotype / SNP data
How will you spot the patterns?
How will you find the data you
need?
8
How does the
Gene Ontology work?
with thanks to
Jane Lomax, Gene Ontology Consortium
9
1. GO provides a controlled system of representations for use in annotating data
• multi-species, multi-disciplinary, open source
• contributing to the cumulativity of scientific results achieved by distinct research communities
• compare use of kilograms, meters, seconds … in formulating experimental results
10
11
Definitions
12
Gene products involved in cardiac muscle development in humans
13
http://wiki.geneontology.org/index.php/Priority_Cardiovascular_genes
14
Questions for annotation
where is a particular gene product involved
• in what type of cell or cell part?
• in what part of the normal body?
• in what anatomical abnormality?
when is a particular gene product involved
• in the course of normal development?
• in the process leading to abnormality?
with what functions is the gene product
associated in other biological processes?
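The annotation questions above can be pictured as records that link a gene product to an ontology term. The following is a minimal sketch under stated assumptions: the `Annotation` type, the `question` field, and the identifiers `product_1` to `product_3` are hypothetical placeholders, not real gene products or the actual GO annotation format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical record: one answer to one annotation question."""
    gene_product: str  # placeholder identifier, not a real gene product
    question: str      # "where", "when", or "function"
    term: str          # label of the ontology term given as the answer


# Illustrative data only; none of these links are taken from GO.
annotations = [
    Annotation("product_1", "when", "heart muscle development"),
    Annotation("product_1", "where", "some cell part"),
    Annotation("product_2", "when", "heart muscle development"),
    Annotation("product_3", "function", "some molecular function"),
]


def products_for(term: str, records: list[Annotation]) -> set[str]:
    """Return the gene products annotated with the given term."""
    return {r.gene_product for r in records if r.term == term}


def answers(gene_product: str, question: str, records: list[Annotation]) -> list[str]:
    """Return the terms that answer one question for one gene product."""
    return [r.term for r in records
            if r.gene_product == gene_product and r.question == question]


print(products_for("heart muscle development", annotations))
print(answers("product_1", "where", annotations))
```

Because every record uses terms from one shared vocabulary, the same query works across data sets produced by different groups.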
15
2. GO provides a tool for
algorithmic reasoning
16
Hierarchical view representing
relations between represented
types
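One way to read "algorithmic reasoning" over a hierarchy is that a query for a general type should also return data annotated to its subtypes. The sketch below assumes a toy set of is_a links and placeholder gene products; the edges are made up for illustration and are not copied from GO.

```python
# Toy is_a edges (child -> parents), invented for illustration only.
IS_A = {
    "heart muscle development": ["muscle development"],
    "muscle development": ["developmental process"],
    "developmental process": [],
}

# Hypothetical (gene product, term) annotation pairs.
ANNOTATIONS = [
    ("product_1", "heart muscle development"),
    ("product_2", "muscle development"),
]


def ancestors(term, is_a):
    """Return every type reachable from term by following is_a links upward."""
    seen = set()
    stack = list(is_a.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, []))
    return seen


def annotated_to(term, annotations, is_a):
    """Gene products annotated to term directly or to any of its subtypes."""
    return {gp for gp, t in annotations
            if t == term or term in ancestors(t, is_a)}


print(annotated_to("developmental process", ANNOTATIONS, IS_A))
```

The query for the most general type picks up both products even though neither was annotated to it directly.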
17
3. GO allows a new kind of clinical
research, based on analysis of the
massive quantities of annotations
linking GO terms to gene products
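The slides do not say which analyses are meant. One common technique, assumed here purely for illustration, is an over-representation test: given a list of genes of interest, ask whether a GO term is attached to more of them than chance would predict. The function name and the example numbers below are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """Hypergeometric tail probability P(X >= hits).

    population: number of genes in the background set
    annotated:  number of background genes annotated with the term
    sample:     number of genes in the list of interest
    hits:       number of genes in the list annotated with the term
    """
    total = comb(population, sample)
    upper = min(annotated, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Made-up counts: 10 genes, 5 carry the term, 4 selected, 3 of those carry it.
print(enrichment_p_value(10, 5, 4, 3))
```

A small value suggests the term is over-represented in the list; in practice many terms are tested at once, so a multiple-testing correction would be needed.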
18
Uses of GO in studies of
• pathways associated with heart failure development
correlated with cardiac remodeling (PMID 18780759)
• molecular signature of cardiomyocyte clusters derived
from human embryonic stem cells (PMID 18436862)
• contrast between cardiac left ventricle and diaphragm
muscle in expression of genes involved in
carbohydrate and lipid metabolism (PMID 18207466)
• immune system involvement in abdominal aortic
aneurysms in humans (PMID 17634102)
19
GO is amazingly successful – but it
covers only generic biological entities
of three sorts:
–cellular components
–molecular functions
–biological processes
and it does not provide representations of
disease-related phenomena
20
Extending the GO methodology to
other domains of biology and of
clinical and translational medicine
21
The Open Biomedical Ontologies (OBO) Foundry
RELATION TO TIME: CONTINUANT (INDEPENDENT, DEPENDENT), OCCURRENT
GRANULARITY: ORGAN AND ORGANISM; CELL AND CELLULAR COMPONENT; MOLECULE
CONTINUANT, INDEPENDENT
• Organism (NCBI Taxonomy)
• Anatomical Entity (FMA, CARO)
• Cell (CL)
• Cellular Component (FMA, GO)
• Molecule (ChEBI, SO, RnaO, PrO)
CONTINUANT, DEPENDENT
• Organ Function (FMP, CPRO)
• Phenotypic Quality (PaTO)
• Cellular Function (GO)
• Molecular Function (GO)
OCCURRENT
• Biological Process (GO)
• Molecular Process (GO)
22
Foundational Model of Anatomy
23
An A is_a B
All instances of A are instances of B
What are types? what are instances?
(Buckets, thresholds)
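The reading of is_a given above can be checked mechanically when the instances of each type are listed. This is a minimal sketch; the function names and the instance identifiers such as `cell_001` are hypothetical.

```python
def is_a_holds(instances_of_a: set, instances_of_b: set) -> bool:
    """A is_a B holds only if all instances of A are instances of B."""
    return instances_of_a <= instances_of_b


def counterexamples(instances_of_a: set, instances_of_b: set) -> set:
    """Instances of A that are not instances of B."""
    return instances_of_a - instances_of_b


# Hypothetical instances, for illustration only.
cells = {"cell_001", "cell_002"}
anatomical_structures = {"cell_001", "cell_002", "heart_001"}

print(is_a_holds(cells, anatomical_structures))
print(counterexamples(anatomical_structures, cells))
```

A finite sample can refute an is_a claim but cannot prove it, since the claim covers all instances, not only the ones recorded.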
24
Definitions
Cell =Def. an anatomical structure which
consists of cytoplasm surrounded by a
plasma membrane
Anatomical structure =Def. a material
anatomical entity which is generated by
coordinated expression of the organism’s
own genes
An A =Def. a B which Cs
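The pattern "An A =Def. a B which Cs" can be stored as a record with a parent type (B) and a differentiating condition (C), so that the is_a chain falls out of the definitions themselves. The sketch below uses the two definitions given above; the class and function names are hypothetical, and the definitions are assumed to contain no cycles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Definition:
    term: str         # A: the type being defined
    genus: str        # B: the parent type
    differentia: str  # C: what distinguishes A within B

    def render(self) -> str:
        article = "an" if self.genus[0].lower() in "aeiou" else "a"
        return (f"{self.term.capitalize()} =Def. "
                f"{article} {self.genus} which {self.differentia}")


DEFINITIONS = {
    "cell": Definition(
        "cell", "anatomical structure",
        "consists of cytoplasm surrounded by a plasma membrane"),
    "anatomical structure": Definition(
        "anatomical structure", "material anatomical entity",
        "is generated by coordinated expression of the organism's own genes"),
}


def genus_chain(term, definitions):
    """Follow genus links upward; each step is one is_a link."""
    chain = []
    current = term
    while current in definitions:
        current = definitions[current].genus
        chain.append(current)
    return chain


print(DEFINITIONS["cell"].render())
print(genus_chain("cell", DEFINITIONS))
```

Writing every definition in this form means the hierarchy and the definitions cannot drift apart, because the parent is part of the definition.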
25
[Figure: fragment of the Foundational Model of Anatomy hierarchy. Nodes shown: Anatomical Structure, Anatomical Space, Organ Cavity Subdivision, Organ Cavity, Organ, Serous Sac Cavity Subdivision, Serous Sac Cavity, Serous Sac, Organ Component, Organ Subdivision, Pleural Sac, Pleural Cavity, Parietal Pleura, Interlobar recess, Organ Part, Mediastinal Pleura, Pleura (Wall of Sac), Visceral Pleura, Mesothelium of Pleura, Tissue]
Heterotaxy =Def. the abnormal
arrangement of organs or viscera across the
left-right axis differing from “complete situs
solitus” and “complete situs inversus”
Left isomerism =Def. a subset of
heterotaxy where some paired structures on
opposite sides of the left-right axis of the
body are symmetrical mirror images
of each other, and have the morphology of
the normal left-sided structures.
Jacobs, et al., 2007
27
OBO Foundry
recognized by NIH as framework to
address mandates for re-usability of data
collected through Federally funded
research
see NIH PAR-07-425: Data Ontologies for
Biomedical Research (R01)
28
Analysis of outcomes for congenital
cardiac disease: can we do better?
Jeffrey P. Jacobs, et al. 2007
• Improving methodologies for verification of
data
• Clarifying the relationship between
administrative databases [such as ICD] and
clinical databases
• Establishing links between databases
• Moving beyond geographical barriers
• Moving beyond sub-specialty barriers
OBO Foundry provides
• tested guidelines enabling new groups to
develop the ontologies they need in ways which
counteract forking and dispersion of effort
• an incremental bottom-up approach to
evidence-based terminology practices in
medicine that is rooted in basic biology
• automatic web-based linkage between medical
terminologies and biological knowledge
resources (massive integration of databases
across species and biological systems)
30
A good solution to the silo problem
must be:
• modular
• incremental
• bottom-up
• based on consistent, intuitive structure
• evidence-based and thus revisable
• equipped with a strategy for motivating potential developers and users
31
An ontology is not a database
New databases for each new kind of data
New databases for each new project
Ontologies like the GO are a solution to the
silo problems databases cause
32
An ontology is not a terminology
Existing term lists
• built to serve specific data-processing needs
• in ad hoc ways
Ontologies
• designed from the start to ensure
integratability and reusability of data
• by incorporating a common logical
structure
33
Can existing CHD terminologies
serve as ontologies?
An ontology is a representation of the
types of entities in a given domain of
reality and of the relations between types
What happens if we apply evidence-based
rules for ontology construction?
44
Rule
• Every node in the ontology must represent
some type of entity in reality
45
CardioAccess Tree View
Rule: Each term in an ontology represents a type of biological entity
instantiated in biological reality
1. Syntactic Consequences
2. No ‘Other’, No ‘Miscellaneous’, No ‘NOS’
3. Hierarchical organization of types and subtypes
5. Non-redundancy
6. An instance of a process type is never an instance of a
thing type
7. Consistent principles for classification not applied
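Two of these consequences, the ban on catch-all terms and non-redundancy, lend themselves to an automated check over a term list. The following is a rough sketch: the function name, the problem labels, and the example terms are hypothetical and are not taken from CardioAccess.

```python
import re
from collections import Counter

# Catch-all words that do not name a type of entity in reality.
CATCH_ALL = re.compile(r"\b(other|miscellaneous|nos)\b", re.IGNORECASE)


def lint_terms(terms):
    """Return (label, problem) pairs for terms that break the two rules."""
    problems = []
    # Rule: no 'Other', no 'Miscellaneous', no 'NOS'.
    for term in terms:
        if CATCH_ALL.search(term):
            problems.append((term, "catch-all label"))
    # Rule: non-redundancy (same label more than once, ignoring case).
    counts = Counter(term.strip().lower() for term in terms)
    for label, n in counts.items():
        if n > 1:
            problems.append((label, "redundant label"))
    return problems


# Made-up term list for illustration.
example_terms = [
    "Septal defect",
    "Septal defect, NOS",
    "Other cardiac anomaly",
    "septal defect",
]

for label, problem in lint_terms(example_terms):
    print(f"{problem}: {label}")
```

A check like this only catches surface violations; redundancy between differently worded terms still needs review by a domain expert.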
54
Strategy for building a CHD ontology
within the OBO Foundry
A good solution to the silo problem must be:
• modular
• incremental
• bottom-up
• evidence-based
• revisable
• equipped with a strategy for motivating potential developers and users
• able to work well with other ontologies for neighboring domains
55
OBO Foundry principle of modularity
• one ontology for each domain
• once you’ve annotated existing data,
then no need for mappings (which are in
any case too expensive, too fragile, too
difficult to keep up-to-date as mapped
ontologies change)
• everyone knows where to look to find
out how to annotate each kind of data
56
Modularity fosters division of labor
• allows distributed development
• but only if there is a well-tested,
principles-based structure in place
• to ensure that the separate modules work
well together
57
Extending the OBO Foundry to
other domains of biology and of
clinical and translational medicine
59
Congenital Heart Disease Ontology Modules
[Figure: proposed modules arranged on a grid, with RELATION TO TIME (CONTINUANT: INDEPENDENT, DEPENDENT; OCCURRENT) across and GRANULARITY (ORGAN AND ORGANISM; CELL AND CELLULAR COMPONENT; MOLECULE) down. Modules shown: Normal Anatomical Entity, Abnormal Anatomical Entity, Normal Organ Function, Abnormal Organ Function, Developmental Process, Cellular Component, Cellular Function (GO), Genes and Gene Products, Genetic Predispositions, Disease, Molecular Function, Molecular Process, Embryology, Morphology, Surgical Processes]
60