What_Is_Ontology_Mia.. - Buffalo Ontology Site

What is an ontology and
Why should you care?
Barry Smith
http://ontology.buffalo.edu/smith
1
Uses of ‘ontology’ in PubMed abstracts
2
By far the most successful: GO (Gene Ontology)
3
You’re interested
in which genes
control heart
muscle
development
17,536 results
4
Microarray data shows changed expression of
thousands of genes.
How will you spot the patterns?
[Figure: clustered microarray heat map, attacked vs. control samples over time. Tree: Pearson. Colored by: lw n3d. Classification: Set_LW_n3d_5p_... Gene list: all genes (14010). Cluster labels: defense response, immune response, response to stimulus, Toll regulated genes, JAK-STAT regulated genes, puparial adhesion, molting cycle, hemocyanin, amino acid catabolism, lipid metabolism, peptidase activity, protein catabolism]
5
You’re interested in which
of your hospital’s patient
data is relevant to
understanding how genes
control heart muscle
development
6
Lab / pathology data
EHR data
Clinical trial data
Family history data
Medical imaging
Microarray data
Model organism data
Flow cytometry
Mass spec
Genotype / SNP data
How will you spot the patterns?
How will you find the data you
need?
7
One strategy for bringing order into this
huge conglomeration of data is through the
use of Common Data Elements
• Discipline-specific (cancer, NIAID, …)
• Do not solve the problems of balkanization
(data siloes)
• Do not evolve gracefully as knowledge
advances
• Support data cumulation, but do not readily
support data integration and computation
8
How does the
Gene Ontology work?
with thanks to
Jane Lomax, Gene Ontology Consortium
9
1. GO provides a controlled system of
representations for use in annotating data
• multi-species, multi-disciplinary, open
source
• contributing to the cumulativity of
scientific results obtained by distinct
research communities
• compare use of kilograms, meters,
seconds … in formulating experimental
results
10
11
Definitions
12
Gene products involved in cardiac muscle
development in humans
13
GO provides answers to three types
of questions
for each gene product
• in what parts of the cell has it been identified?
• exercising what types of molecular functions?
• with what types of biological processes?
when is a particular gene product involved
• in the course of normal development?
• in the process leading to abnormality
with what functions is the gene product
associated in other biological processes?
14
Some pain-related terms in GO
GO:0048265 response to pain
GO:0019233 sensory perception of pain
GO:0048266 behavioral response to pain
GO:0019234 sensory perception of fast pain
GO:0019235 sensory perception of slow pain
GO:0051930 regulation of sensory perception of pain
GO:0050967 detection of electrical stimulus during sensory perception of pain
GO:0050968 detection of chemical stimulus involved in sensory perception of
pain
GO:0050966 detection of mechanical stimulus involved in sensory perception of
pain
15
GO:0050968 detection of chemical stimulus
involved in sensory perception of pain
16
GO provides a tool for
algorithmic reasoning
17
Hierarchical view representing
relations between represented
types
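The kind of reasoning a hierarchy supports can be sketched in a few lines of Python. This is only an illustration: the is_a links between the pain terms and the gene product names (geneA, geneB, geneC) are assumptions made for the example, not taken from a GO release. It shows how a query for a general term can also return gene products annotated to more specific terms.

```python
# Hypothetical is_a links between GO pain terms (assumed for illustration only).
IS_A = {
    "GO:0019234": ["GO:0019233"],  # fast pain -> sensory perception of pain
    "GO:0019235": ["GO:0019233"],  # slow pain -> sensory perception of pain
    "GO:0048266": ["GO:0048265"],  # behavioral response to pain -> response to pain
}

# Hypothetical annotations: gene product -> GO term (gene names are made up).
ANNOTATIONS = {
    "geneA": "GO:0019234",
    "geneB": "GO:0019235",
    "geneC": "GO:0048266",
}


def ancestors(term, is_a=IS_A):
    """Return every term reachable from `term` by following is_a links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in is_a.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def annotated_to(term, annotations=ANNOTATIONS):
    """Gene products annotated to `term` directly or to a more specific term."""
    return sorted(
        gene
        for gene, annotated_term in annotations.items()
        if annotated_term == term or term in ancestors(annotated_term)
    )


if __name__ == "__main__":
    # Query the general term; the specific fast/slow pain annotations are included.
    print(annotated_to("GO:0019233"))
```

A flat term list could not answer the general query; the hierarchy is what lets annotations made at different levels of detail be combined.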
18
GO allows a new kind of
biological research, based on
analysis and comparison of the
massive quantities of
annotations linking GO terms to
gene products
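One common way to compare annotations across gene sets is a term enrichment test. The sketch below is a generic hypergeometric tail calculation, not the method of any study cited in these slides. The function name and the counts in the usage example are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """Probability of seeing at least `hits` annotated genes in a random sample.

    population: total number of genes considered
    annotated:  genes in the population annotated to the GO term
    sample:     size of the gene list of interest
    hits:       genes in the sample annotated to the GO term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical counts: 20 genes, 5 annotated to a term, 4 selected, 2 hits.
    print(enrichment_p_value(population=20, annotated=5, sample=4, hits=2))
```

A small p-value suggests the term is over-represented in the gene list; in practice a correction for testing many terms at once would also be needed.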
19
One standard method
Sjöblom T, et al. analyzed 13,023 genes in
11 breast and 11 colorectal cancers
using functional information captured by GO
for given gene product types
identified 189 as being mutated at significant
frequency and thus as providing targets for
diagnostic and therapeutic intervention.
Science. 2006 Oct 13;314(5797):268-74.
20
Uses of GO in studies of:
• Biomedical discovery acceleration, with applications to
craniofacial development. PMID: 19325874
• Persistent changes in spinal cord gene expression
after recovery from inflammatory hyperalgesia: a
preliminary study on pain memory. PMID: 18366630
• Spinal cord transcriptional profile analysis reveals
protein trafficking and RNA processing as prominent
processes regulated by tactile allodynia. PMID:
17069981
• Immune system involvement in abdominal aortic
aneurysms. PMID: 17634102
21
$100 million invested in literature
curation using GO
over 11 million annotations relating gene
products described in the UniProt,
Ensembl and other databases to terms in
the GO
experimental results reported in 52,000
scientific journal articles manually annotated
by expert biologists using GO
ontologies provide the basis for capturing
biological theories in computable form
22
GO is amazingly successful in
overcoming problems of balkanization
but it covers only generic biological entities of
three sorts:
– cellular components
– molecular functions
– biological processes
and it does not provide representations of
diseases, symptoms, …
23
Extending the GO methodology to
other domains of biology and
medicine
24
The Open Biomedical Ontologies (OBO) Foundry
GRANULARITY \ RELATION TO TIME | CONTINUANT, INDEPENDENT | CONTINUANT, DEPENDENT | OCCURRENT
ORGAN AND ORGANISM | Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO); Phenotypic Quality (PaTO) | Biological Process (GO)
CELL AND CELLULAR COMPONENT | Cell (CL); Cellular Component (FMA, GO) | Cellular Function (GO); Phenotypic Quality (PaTO) | Biological Process (GO)
MOLECULE | Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO); Phenotypic Quality (PaTO) | Molecular Process (GO)
25
Ontology | Scope | URL | Custodians
Cell Ontology (CL) | cell types from prokaryotes to mammals | obo.sourceforge.net/cgi-bin/detail.cgi?cell | Jonathan Bard, Michael Ashburner, Oliver Hofman
Chemical Entities of Biological Interest (ChEBI) | molecular entities | ebi.ac.uk/chebi | Paula Dematos, Rafael Alcantara
Common Anatomy Reference Ontology (CARO) | anatomical structures in human and model organisms | (under development) | Melissa Haendel, Terry Hayamizu, Cornelius Rosse, David Sutherland
Foundational Model of Anatomy (FMA) | structure of the human body | fma.biostr.washington.edu | JLV Mejino Jr., Cornelius Rosse
Functional Genomics Investigation Ontology (FuGO) | design, protocol, data instrumentation, and analysis | fugo.sf.net | FuGO Working Group
Gene Ontology (GO) | cellular components, molecular functions, biological processes | www.geneontology.org | Gene Ontology Consortium
Phenotypic Quality Ontology (PaTO) | qualities of anatomical structures | obo.sourceforge.net/cgi-bin/detail.cgi?attribute_and_value | Michael Ashburner, Suzanna Lewis, Georgios Gkoutos
Protein Ontology (PrO) | protein types and modifications | (under development) | Protein Ontology Consortium
Relation Ontology (RO) | relations | obo.sf.net/relationship | Barry Smith, Chris Mungall
RNA Ontology (RnaO) | three-dimensional RNA structures | (under development) | RNA Ontology Consortium
Sequence Ontology (SO) | properties and features of nucleic sequences | song.sf.net | Karen Eilbeck
26
OBO Foundry
recognized by NIH as framework to
address mandates for re-usability of data
collected through Federally funded
research
see NIH PAR-07-425: Data Ontologies for
Biomedical Research (R01)
27
OBO Foundry provides
• tested guidelines enabling new groups to
develop the ontologies they need in ways which
counteract forking and dispersion of effort
• an incremental bottom-up approach to
evidence-based terminology practices in
medicine that is rooted in basic biology
• automatic web-based linkage between
biological knowledge resources (massive
integration of databases across species and
biological systems)
28
An ontology is not a terminology
Existing term lists and CDEs
• built to serve specific data-processing purposes
• in ad hoc ways
Ontologies
• designed from the start to ensure
integratability and reusability of data
• by incorporating a common logical
structure
29
The Open Biomedical Ontologies (OBO) Foundry
[Chart repeated: OBO Foundry ontologies arranged by granularity and relation to time]
30
What ontology can do for pain
Cleveland Clinic Semantic Database – how
to mine legacy data in cardiovascular
surgery
• to reveal information on outcomes
• to identify subjects for clinical trials
• to allow virtual experimentation
Goal to extend this approach across the
entirety of medicine -- starting with signs,
symptoms and other basic categories
31
Three distinct classificatory tasks
1. of people (patients, carriers, …)
2. of diseases (cases, instances, problems, …)
3. of presentations (diagnoses, signs, observations
…)
ICD confuses 1. & 2.
HL7, most standard terminologies confuse 2. & 3.
32
Big Picture
33
A disease is a disposition rooted in a
physical disorder in the organism and
realized in pathological processes.
etiological process
 produces
disorder
 bears
disposition
 realized_in
pathological process
 produces
abnormal bodily features
 recognized_as
signs & symptoms
 used_in
interpretive process
 produces
diagnosis
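The chain of relations on this slide can be written down as subject-relation-object triples and walked programmatically. The following is a minimal sketch; the relation and node names come from the slide, while the data structure and the `path_from` helper are assumptions made for illustration.

```python
# Each triple is (subject, relation, object), following the slide's chain.
DISEASE_MODEL = [
    ("etiological process", "produces", "disorder"),
    ("disorder", "bears", "disposition"),
    ("disposition", "realized_in", "pathological process"),
    ("pathological process", "produces", "abnormal bodily features"),
    ("abnormal bodily features", "recognized_as", "signs & symptoms"),
    ("signs & symptoms", "used_in", "interpretive process"),
    ("interpretive process", "produces", "diagnosis"),
]


def path_from(start, triples=DISEASE_MODEL):
    """Follow outgoing relations from `start` until no further link exists."""
    outgoing = {subject: (relation, obj) for subject, relation, obj in triples}
    steps = []
    node = start
    while node in outgoing:
        relation, target = outgoing[node]
        steps.append((node, relation, target))
        node = target
    return steps


if __name__ == "__main__":
    for subject, relation, obj in path_from("disorder"):
        print(f"{subject} --{relation}--> {obj}")
```

Starting the walk at "disorder" rather than "etiological process" gives the part of the chain that holds once the physical basis of the disease already exists.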
34
Elucidation of Primitive Terms
• ‘bodily feature’ - an abbreviation for a physical
component, a bodily quality, or a bodily process.
• disposition - an attribute describing the propensity to
initiate certain specific sorts of processes when
certain conditions are satisfied.
• clinically abnormal - some bodily feature that
(1) is not part of the life plan for an organism of the relevant
type (unlike aging or pregnancy),
(2) is causally linked to an elevated risk either of pain or
other feelings of illness, or of death or dysfunction, and
(3) is such that the elevated risk exceeds a certain threshold
level.*
*Compare: baldness
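The three conditions for ‘clinically abnormal’ combine as a simple conjunction, which a short sketch can make explicit. The field names, the numeric risk score and the threshold values are hypothetical; the slide does not say how risk is measured.

```python
from dataclasses import dataclass


@dataclass
class BodilyFeature:
    name: str
    part_of_life_plan: bool  # condition (1): True for e.g. aging or pregnancy
    elevated_risk: float     # condition (2): hypothetical risk score, 0 means none


def is_clinically_abnormal(feature, threshold):
    """Apply conditions (1) to (3); `threshold` is an assumed numeric cut-off."""
    not_in_life_plan = not feature.part_of_life_plan       # condition (1)
    has_elevated_risk = feature.elevated_risk > 0          # condition (2)
    exceeds_threshold = feature.elevated_risk > threshold  # condition (3)
    return not_in_life_plan and has_elevated_risk and exceeds_threshold


if __name__ == "__main__":
    # Illustrative values only; none of these numbers come from the slides.
    pregnancy = BodilyFeature("pregnancy", part_of_life_plan=True, elevated_risk=0.4)
    low_risk = BodilyFeature("low-risk feature", part_of_life_plan=False, elevated_risk=0.01)
    high_risk = BodilyFeature("high-risk feature", part_of_life_plan=False, elevated_risk=0.6)
    for feature in (pregnancy, low_risk, high_risk):
        print(feature.name, is_clinically_abnormal(feature, threshold=0.1))
```

A feature that carries some risk but falls below the threshold is not classified as clinically abnormal, which is the case the footnote's comparison points to.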
35
Definitions - Foundational Terms

Disorder =def. – A causally linked combination of
physical components that is clinically abnormal.

Pathological Process =def. – A bodily process that is
a manifestation of a disorder and is clinically
abnormal.

Disease =def. – A disposition (i) to undergo
pathological processes that (ii) exists in an organism
because of one or more disorders in that organism.
36
Dispositions and Predispositions
• All diseases are dispositions; not all dispositions are
diseases.
• A predisposition is a disposition.
• Predisposition to Disease of Type X =def. – A disposition
in an organism that constitutes an increased risk of the
organism’s subsequently developing the disease X.
• HNPCC is caused by a disorder (mutation) in a DNA
mismatch repair gene that disposes to the acquisition of
additional mutations from defective DNA repair processes,
and thus is a predisposition to the development of colon
cancer.
37
Definitions - Clinical Evaluation Terms
Sign =def. – A bodily feature of a patient that is
observed in a physical examination and is deemed by
the clinician to be of clinical significance.
(Objectively observable features)
Symptom =def. – A bodily feature of a patient that is
observed by the patient and is hypothesized by the
patient to be a realization of a disease. (A restricted
family of phenomena (including pain, nausea, anger,
drowsiness), which are of their nature experienced in
the first person)
38
Cirrhosis - environmental exposure
Etiological process - phenobarbital-induced hepatic cell death
 produces
Disorder - necrotic liver
 bears
Disposition (disease) - cirrhosis
 realized_in
Pathological process - abnormal tissue
repair with cell proliferation and
fibrosis that exceed a certain
threshold; hypoxia-induced cell death
 produces
Abnormal bodily features
 recognized_as
Symptoms - fatigue, anorexia
Signs - jaundice, splenomegaly
Symptoms & Signs
 used_in
Interpretive process
 produces
Hypothesis - rule out cirrhosis
 suggests
Laboratory tests
 produces
Test results - elevated liver enzymes in
serum
 used_in
Interpretive process
 produces
Result - diagnosis that patient X has a
disorder that bears the disease
cirrhosis
39
Influenza - infectious
Etiological process - infection of
airway epithelial cells with influenza
virus
 produces
Disorder - viable cells with influenza
virus
 bears
Disposition (disease) - flu
 realized_in
Pathological process - acute
inflammation
 produces
Abnormal bodily features
 recognized_as
Symptoms - weakness, dizziness
Signs - fever
Symptoms & Signs
 used_in
Interpretive process
 produces
Hypothesis - rule out influenza
 suggests
Laboratory tests
 produces
Test results - elevated serum antibody titers
 used_in
Interpretive process
 produces
Result - diagnosis that patient X has a
disorder that bears the disease flu
But the disorder also induces normal
physiological processes (immune response)
that can result in the elimination of the
disorder (transient disease course).
40
Huntington’s Disease - genetic
Etiological process - inheritance of
>39 CAG repeats in the HTT gene
 produces
Disorder - chromosome 4 with
abnormal mHTT
 bears
Disposition (disease) - Huntington’s
disease
 realized_in
Pathological process - accumulation of
mHTT protein fragments, abnormal
transcription regulation, neuronal cell
death in striatum
 produces
Abnormal bodily features
 recognized_as
Symptoms - anxiety, depression
Signs - difficulties in speaking and
swallowing
Symptoms & Signs
 used_in
Interpretive process
 produces
Hypothesis - rule out Huntington’s
 suggests
Laboratory tests
 produces
Test results - molecular detection of
the HTT gene with >39 CAG repeats
 used_in
Interpretive process
 produces
Result - diagnosis that patient X has a
disorder that bears the disease
Huntington’s disease
41
HNPCC - genetic pre-disposition
Etiological process - inheritance of a mutant mismatch repair gene
 produces
Disorder - chromosome 3 with abnormal hMLH1
 bears
Disposition (disease) - Lynch syndrome
 realized_in
Pathological process - abnormal repair of DNA mismatches
 produces
Disorder - mutations in proto-oncogenes and tumor suppressor genes with
microsatellite repeats (e.g. TGF-beta R2)
 bears
Disposition (disease) - non-polyposis colon cancer
 realized_in
Symptoms (including pain)
42
Definition: Etiology

Etiological Process =def. – A process in an organism that
leads to a subsequent disorder.

Example: toxic chemical exposure resulting in a mutation in
the genomic DNA of a cell; infection of a human with a
pathogenic virus; inheritance of two defective copies of a
metabolic gene

The etiological process creates the physical basis of that
disposition to pathological processes which is the disease.
43
Definitions - Diagnosis

Clinical Picture =def. – A representation of a clinical
phenotype that is inferred from the combination of
laboratory, image and clinical findings about a given
patient.

Diagnosis =def. – A conclusion of an interpretive process
that has as input a clinical picture of a given patient and
as output an assertion to the effect that the patient has a
disease of such and such a type.
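The input and output of the interpretive process can be sketched as two small data types and a function between them. Everything below is a toy illustration: the class and function names are hypothetical, the findings are taken from the cirrhosis example in these slides, and the matching rule (all required findings present) is an assumption, not a clinical criterion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClinicalPicture:
    """Representation inferred from laboratory, image and clinical findings."""
    patient: str
    signs: frozenset
    symptoms: frozenset
    test_results: frozenset


@dataclass(frozen=True)
class Diagnosis:
    """Assertion that a patient has a disease of a given type."""
    patient: str
    disease: str


def interpret(picture, required_findings, disease):
    """Toy interpretive process: assert `disease` only if every required finding is present."""
    findings = picture.signs | picture.symptoms | picture.test_results
    if required_findings <= findings:
        return Diagnosis(picture.patient, disease)
    return None


if __name__ == "__main__":
    picture = ClinicalPicture(
        patient="patient X",
        signs=frozenset({"jaundice", "splenomegaly"}),
        symptoms=frozenset({"fatigue", "anorexia"}),
        test_results=frozenset({"elevated liver enzymes in serum"}),
    )
    # Hypothetical rule, for illustration only.
    rule = frozenset({"jaundice", "elevated liver enzymes in serum"})
    print(interpret(picture, rule, "cirrhosis"))
```

Keeping the clinical picture and the diagnosis as separate types mirrors the distinction the definitions draw between a representation of findings and a conclusion about disease.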
44
Definitions - Qualities

Manifestation of a Disease =def. – A bodily feature of a
patient that is (a) a deviation from clinical normality that exists
in virtue of the realization of a disease and (b) is observable.
Observability includes being observable through elicitation of a response or
through the use of special instruments.
Preclinical Manifestation of a Disease =def. – A
manifestation of a disease that exists prior to its becoming
detectable in a clinical history taking or physical examination.
Clinical Manifestation of a Disease =def. – A manifestation
of a disease that is detectable in a clinical history taking or
physical examination.
Phenotype =def. – A (combination of) bodily feature(s) of an
organism determined by the interaction of its genetic make-up
and environment.
Clinical Phenotype =def. – A clinically abnormal phenotype.
45
1. pain report
2. symptom (experience of pain)
3. tissue damage
46
[Table with columns: tissue damage | pain experience | pain report]
Cell values in extraction order (row layout not recoverable):
+ + + + + ‒ + + ‒ ‒ + ‒ ‒ ‒ + + + ‒ ‒ ‒ + ‒ ‒ ‒
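If tissue damage, pain experience and pain report are treated as three independent yes/no factors, there are eight possible combinations. That reading of the table is an assumption, since its row layout did not survive extraction. A minimal sketch that enumerates the combinations and picks out one case of interest:

```python
from itertools import product

FACTORS = ("tissue damage", "pain experience", "pain report")


def combinations():
    """All present/absent combinations of the three factors, as dicts."""
    return [
        dict(zip(FACTORS, values))
        for values in product((True, False), repeat=len(FACTORS))
    ]


def report_without_experience(rows):
    """Cases where pain is reported although no pain is experienced."""
    return [row for row in rows if row["pain report"] and not row["pain experience"]]


if __name__ == "__main__":
    for row in combinations():
        print(" ".join("+" if row[factor] else "-" for factor in FACTORS))
```

Enumerating the cases this way makes it easy to agree on the variables before any cluster analysis is run on patient data.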
47
trigeminal neuralgia: in 50% of cases drugs → no pain
trigeminal neuralgia pain
gene: COMT
16 possible polymorphisms
high pain sensitivity
end with disease-disorder-disposition-diagnosis
let’s throw a cluster analysis at this
Dominik – let’s agree on the variables
(ontologically informed CDE approach)
OPERRA is being driven by cluster analysis
48