What_Is_Ontology_VPH.. - Buffalo Ontology Site

What is an ontology?
Barry Smith
http://ontology.buffalo.edu/smith
1
You’re interested
in which genes
control heart
muscle
development
17,536 results
2
time
Defense response
Immune response
Response to stimulus
Toll regulated genes
JAK-STAT regulated genes
Microarray data
shows changed
expression of
thousands of genes.
Puparial adhesion
Molting cycle
hemocyanin
Amino acid catabolism
Lipid metabolism
How will you spot
the patterns?
Peptidase activity
Protein catabolism
Immune response
Immune response
Toll regulated genes
[Figure: microarray expression display, attacked vs. control. Tree: pearson. Colored by: ... lw n3d ... Gene List: all genes (14010)]
3
Lab / pathology data
EHR data
Clinical trial data
Family history data
Medical image data
Microarray data
Model organism data
Flow cytometry
Mass spec
Genotype / SNP data
How will you find the data you
need?
4
Human
Mouse
Rat
Fish
Yeast
E. coli
How will you compare the data?
How will you integrate the data?
5
The GO Idea
GlyProt
MouseEcotope
sphingolipid
transporter
activity
DiabetInGene
GluChem
annotation using common ontologies
yields integration of databases
GlyProt
MouseEcotope
Holliday junction
helicase complex
DiabetInGene
GluChem
ontologies are legends for data
8
they provide a growing set of natural
language labels
to make the data cognitively accessible
to human beings and algorithmically
accessible to reasoning systems
9
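The claim that annotation with common ontologies yields integration of databases can be sketched as follows. This is only a toy illustration: the two database names and the two labels come from the slides, while the record identifiers and the `TERM:...` identifiers are invented placeholders, not real annotations.

```python
from collections import defaultdict

# Hypothetical records: (record id, ontology term id). All IDs are made up.
glyprot = [("gp-1", "TERM:0001"), ("gp-2", "TERM:0002")]
mouse_ecotope = [("me-7", "TERM:0001"), ("me-9", "TERM:0003")]

# The "legend": term id -> natural-language label (labels taken from the slides).
labels = {
    "TERM:0001": "sphingolipid transporter activity",
    "TERM:0002": "Holliday junction helicase complex",
}


def integrate(**databases):
    """Group records from all databases under the term they are annotated with."""
    by_term = defaultdict(list)
    for db_name, records in databases.items():
        for record_id, term_id in records:
            by_term[term_id].append((db_name, record_id))
    return dict(by_term)


merged = integrate(GlyProt=glyprot, MouseEcotope=mouse_ecotope)

# Terms used by more than one database are the points where the data join up.
shared = {
    term: recs
    for term, recs in merged.items()
    if len({db for db, _ in recs}) > 1
}

for term_id, recs in shared.items():
    print(labels.get(term_id, term_id), recs)
```

The shared term identifier does the integration work; the label is what makes the joined data readable to a human.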
compare: legends for maps
10
legends for textbook diagrams
ontologies as legends for images
13
what lesion ?
what brain function ?
14
legends for literature
15
ontologies as legends for
mathematical equations
x_i = vector of measurements of gene i
k = the state of the gene (as “on” or “off”)
θ_i = set of parameters of the Gaussian model
...
...
16
17
Pathway diagrams as ontologically
annotated dynamic cartoons
18
two kinds of annotations
19
names of instances
20
names of types
21
Ontologies are
representations of
types
22
... types which are
instantiated e.g. in
the lab or clinic
23
multiple kinds of
relations between
represented types
provide a tool for
algorithmic reasoning
24
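A minimal sketch of how relations between types support algorithmic reasoning, assuming only a single transitive is_a relation. The type names are borrowed from the BFO slides later in the talk; the dictionary layout and function names are illustrative choices, not part of any ontology tool.

```python
# Asserted is_a links between types (child -> parent).
IS_A = {
    "temperature": "quality",
    "quality": "dependent continuant",
    "dependent continuant": "continuant",
    "organism": "independent continuant",
    "independent continuant": "continuant",
}


def ancestors(term):
    """Return every type reachable from `term` by following is_a upward."""
    found = []
    while term in IS_A:
        term = IS_A[term]
        found.append(term)
    return found


def is_a(child, parent):
    """True if `parent` is `child` itself or one of its inferred ancestors."""
    return child == parent or parent in ancestors(child)


print(ancestors("temperature"))
print(is_a("temperature", "continuant"))
print(is_a("temperature", "independent continuant"))
```

Only five links are asserted, yet the program can answer questions such as whether temperature is a continuant, which was never stated directly.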
Gene Ontology: The Very Top
cellular
component
molecular
function
biological
process
25
Gene Ontology: The Very Top
continuant
cellular
component
molecular
function
occurrent
biological
process
26
BFO: The Very Top
continuant
independent
continuant
dependent
continuant
cellular
component
molecular
function
occurrent
biological
processes
27
Basic Formal Ontology
continuant
independent
continuant
occurrent
dependent
continuant
organism
28
Basic Formal Ontology
continuant
independent
continuant
occurrent
dependent
continuant
anatomical
structure
29
Continuants
• continue to exist through time,
preserving their identity while
undergoing different sorts of changes
• independent continuants – objects,
things, ...
• dependent continuants – qualities,
attributes, shapes, potentialities ...
30
Qualities
temperature
blood pressure
mass
...
are continuants
they exist through time while
undergoing changes
31
Qualities
temperature / blood pressure / mass ...
are dimensions of variation within the
structure of the entity; a quality is
something which can change while its
bearer remains one and the same
32
A Chart representing how
John’s temperature changes
34
John’s temperature,
the temperature he has throughout his
entire life, cycles through different
determinate temperatures from one
time to the next
John’s temperature is a physiology
variable which, in thus changing,
exerts an influence on other physiology
variables through time
35
BFO: The Very Top
continuant
independent
continuant
occurrent
dependent
continuant
quality
temperature
36
Blinding Flash of the Obvious
independent
continuant
dependent
continuant
quality
organism
John
temperature
John’s
temperature
types
instances
37
Blinding Flash of the Obvious
independent
continuant
dependent
continuant
quality
organism
John
temperature
John’s
temperature
types
instances
38
Blinding Flash of the Obvious
inheres_in
organism
John
temperature
John’s
temperature
types
instances
39
types
temperature
37ºC
instantiates
at t1
37.1ºC
instantiates
at t2
37.2ºC
instantiates
at t3
37.3ºC
instantiates
at t4
37.4ºC
instantiates
at t5
37.5ºC
instantiates
at t6
John’s temperature
instances
40
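The slide above can be read as one instance (John’s temperature) that instantiates different determinate types at different times. A small sketch under that reading follows; the class and method names are hypothetical, and the values are the ones shown on the slide.

```python
from dataclasses import dataclass, field


@dataclass
class QualityInstance:
    """One particular quality, e.g. John's temperature, tracked over time."""
    name: str
    quality_type: str          # the determinable type, e.g. "temperature"
    bearer: str                # the independent continuant it inheres in
    history: dict = field(default_factory=dict)  # time label -> determinate type

    def instantiate(self, time, determinate):
        """Record that this instance instantiates `determinate` at `time`."""
        self.history[time] = determinate

    def determinate_at(self, time):
        return self.history[time]


johns_temperature = QualityInstance("John's temperature", "temperature", "John")

readings = ["37ºC", "37.1ºC", "37.2ºC", "37.3ºC", "37.4ºC", "37.5ºC"]
for i, value in enumerate(readings, start=1):
    johns_temperature.instantiate(f"t{i}", value)

print(johns_temperature.determinate_at("t3"))
```

The instance stays one and the same object throughout; only the determinate type it instantiates changes from t1 to t6.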
types
human
embryo
instantiates
at t1
fetus
instantiates
at t2
neonate
instantiates
at t3
infant
child
instantiates
at t4
instantiates
at t5
adult
instantiates
at t6
John
instances
41
• lower level of types does not ‘carry
identity’ in OntoClean terms
• are threshold divisions (hence we do
not have sharp boundaries, and we
have a certain degree of choice, e.g. in
how many subtypes to distinguish,
though not in their ordering)
42
independent
continuant
dependent
continuant
quality
organism
John
temperature
types
John’s
temperature
instances
43
independent
continuant
organism
John
dependent
continuant
occurrent
quality
process
temperature
John’s
temperature
course of
temperature
changes
John’s
temperature history
44
independent
continuant
organism
John
dependent
continuant
occurrent
quality
process
temperature
John’s
temperature
life of an
organism
John’s
life
45
BFO/GO: The Very Top
continuant
independent
continuant
dependent
continuant
cellular
component
molecular
function
occurrent
biological
processes
46
BFO: The Very Top
continuant
independent
continuant
occurrent
dependent
continuant
quality
function
role
disposition
47
Function
of liver: to store glycogen
of birth canal: to enable transport
of eye: to see
of mitochondrion: to produce ATP
not optional; reflection of physical
makeup of bearer; can malfunction
48
Role
optional:
exists because the bearer is in
some special natural, social, or
institutional set of
circumstances in which the
bearer does not have to be
49
Role
- bearers can have more than one
role
person as student / as staff member
- roles often form systems of mutual
dependence
husband / wife
first in queue / last in queue
doctor / patient
host / pathogen
50
Role
of some chemical compound: to serve
as analyte in an experiment
of a dose of penicillin in this human
child: to treat a disease
of this bacterium in a primary host: to
cause infection
51
Qualities are categorical
features of reality – you just
have them
Functions, roles and dispositions
are potential features of reality:
they are realizable dependent
continuants, realized in certain
associated processes
52
independent
continuant
portion of
chemical
compound
this portion
of aspirin
dependent
continuant
occurrent
role
process
drug role
process of drug
administration
role of this
portion of aspirin
John’s taking
this portion of aspirin
53
independent
continuant
portion of
chemical
compound
dependent
continuant
occurrent
role
process
drug role
process of drug
administration
inheres_in
realized_in
this portion
of aspirin
role of this
portion of aspirin
John’s taking
this portion of aspirin
54
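The aspirin diagram can be written down as a handful of instance-level facts. The sketch below is one possible encoding: the entity names and the inheres_in and realized_in relations come from the slide, while `instance_of` and the helper function names are labels assumed here for illustration.

```python
# Instance-level facts as (subject, relation, object) triples.
FACTS = [
    ("this portion of aspirin", "instance_of", "portion of chemical compound"),
    ("role of this portion of aspirin", "instance_of", "drug role"),
    ("John's taking this portion of aspirin", "instance_of",
     "process of drug administration"),
    ("role of this portion of aspirin", "inheres_in", "this portion of aspirin"),
    ("role of this portion of aspirin", "realized_in",
     "John's taking this portion of aspirin"),
]


def objects(subject, relation):
    """All objects linked to `subject` by `relation`."""
    return [o for s, r, o in FACTS if s == subject and r == relation]


def bearers_of_roles_realized_in(process):
    """Independent continuants that bear a role realized in `process`."""
    roles = [s for s, r, o in FACTS if r == "realized_in" and o == process]
    return [bearer for role in roles for bearer in objects(role, "inheres_in")]


print(bearers_of_roles_realized_in("John's taking this portion of aspirin"))
```

Chaining realized_in with inheres_in links a process back to the thing whose role it realizes, without that link being stated directly.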
GRANULARITY \ RELATION TO TIME | CONTINUANT: INDEPENDENT | CONTINUANT: DEPENDENT | OCCURRENT
ORGAN AND ORGANISM | Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO); Phenotypic Quality (PaTO) | Biological Process (GO)
CELL AND CELLULAR COMPONENT | Cell (CL); Cellular Component (FMA, GO) | Cellular Function (GO) | Biological Process (GO)
MOLECULE | Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Process (GO)
The Open Biomedical Ontologies (OBO) Foundry
55
• The Road to Convergence
All ontologies for each given domain (anatomy,
chemistry…) should be part of a single suite of
interoperable ontologies
should use a common top-level core
for subdomains with many variants, should
follow the strategy of canonical ontologies
with extensions
should require acceptance of common, tested
guidelines on all subscribing ontology
developers
56
GRANULARITY \ RELATION TO TIME | CONTINUANT: INDEPENDENT | CONTINUANT: DEPENDENT | OCCURRENT
ORGAN AND ORGANISM | Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO); Phenotypic Quality (PaTO) | Organism-Level Process (GO)
CELL AND CELLULAR COMPONENT | Cell (CL); Cellular Component (FMA, GO) | Cellular Function (GO) | Cellular Process (GO)
MOLECULE | Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Process (GO)
initial OBO Foundry coverage, ontologies automatically semantically coupled
57
Disposition (Internally-Grounded Realizable Entity)
disposition =def.
a realizable entity which if it ceases to
exist, then its bearer is physically
changed, and
whose realization occurs when this
bearer is in some special physical
circumstances, in virtue of the
bearer’s physical make-up
58
Function
• A Disposition (Internally-Grounded
Realizable Entity) that is designed or
selected for
59
OGMS
• Ontology for General Medical Science
http://code.google.com/p/ogms
60
Physical Disorder
– independent continuant
fiat object part
61
Big Picture
62
A disease is a disposition rooted in a
physical disorder in the organism and
realized in pathological processes.
• etiological process
– produces
• disorder
– bears
• disposition
– realized_in
• pathological process
– produces
• abnormal bodily features
– recognized_as
• signs & symptoms
– used_in
• interpretive process
– produces
• diagnosis
63
Elucidation of Primitive Terms
• ‘bodily feature’ - an abbreviation for a physical
component, a bodily quality, or a bodily process.
• disposition - an attribute describing the propensity to
initiate certain specific sorts of processes when
certain conditions are satisfied.
• clinically abnormal - some bodily feature that
– (1) is not part of the life plan for an organism of the
relevant type (unlike aging or pregnancy),
– (2) is causally linked to an elevated risk either of pain or
other feelings of illness, or of death or dysfunction, and
– (3) is such that the elevated risk exceeds a certain threshold
level.*
*Compare: baldness
64
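The three conditions for ‘clinically abnormal’ can be read as a conjunction, which the following sketch makes explicit. Everything numeric here is an arbitrary placeholder chosen for illustration: the slides give no risk values and no threshold, and the field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BodilyFeature:
    name: str
    part_of_life_plan: bool  # condition (1): e.g. aging or pregnancy
    elevated_risk: float     # condition (2): causally linked risk increase (made-up scale)


RISK_THRESHOLD = 0.1  # condition (3): placeholder value, not from the slides


def clinically_abnormal(feature, threshold=RISK_THRESHOLD):
    """A feature is clinically abnormal only if all three conditions hold."""
    return (not feature.part_of_life_plan) and feature.elevated_risk > threshold


# Illustrative inputs only; the risk figures are invented.
pregnancy = BodilyFeature("pregnancy", part_of_life_plan=True, elevated_risk=0.2)
baldness = BodilyFeature("baldness", part_of_life_plan=False, elevated_risk=0.01)
necrotic_liver = BodilyFeature("necrotic liver", part_of_life_plan=False, elevated_risk=0.6)

for feature in (pregnancy, baldness, necrotic_liver):
    print(feature.name, clinically_abnormal(feature))
```

On this reading, baldness fails only condition (3): it is not part of the life plan, but the associated risk stays below the threshold.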
Definitions - Foundational Terms
• Disorder =def. – A causally linked combination of
physical components that is clinically abnormal.
• Pathological Process =def. – A bodily process that is
a manifestation of a disorder and is clinically
abnormal.
• Disease =def. – A disposition (i) to undergo
pathological processes that (ii) exists in an organism
because of one or more disorders in that organism.
65
Dispositions and Predispositions
• All diseases are dispositions; not all dispositions are
diseases.
• A predisposition is a disposition.
• Predisposition to Disease of Type X =def. – A disposition
in an organism that constitutes an increased risk of the
organism’s subsequently developing the disease X.
• HNPCC is caused by a
– disorder (mutation) in a DNA mismatch repair gene that
– disposes to the acquisition of additional mutations from
defective DNA repair processes, and thus is a
– predisposition to the development of colon cancer.
66
Cirrhosis - environmental exposure
• Etiological process - phenobarbital-induced hepatic cell death
– produces
• Disorder - necrotic liver
– bears
• Disposition (disease) - cirrhosis
– realized_in
• Pathological process - abnormal tissue repair with cell proliferation and fibrosis that exceed a certain threshold; hypoxia-induced cell death
– produces
• Abnormal bodily features
– recognized_as
• Symptoms - fatigue, anorexia
• Signs - jaundice, splenomegaly
• Symptoms & Signs
– used_in
• Interpretive process
– produces
• Hypothesis - rule out cirrhosis
– suggests
• Laboratory tests
– produces
• Test results - elevated liver enzymes in serum
– used_in
• Interpretive process
– produces
• Result - diagnosis that patient X has a disorder that bears the disease cirrhosis
67
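The first half of the cirrhosis example is a chain of entities joined by named relations, so it can be traversed mechanically. A small sketch, with the entity labels shortened from the slide and the function names invented for illustration:

```python
# The etiology-to-symptoms chain of the cirrhosis slide as triples.
CHAIN = [
    ("hepatic cell death", "produces", "necrotic liver"),
    ("necrotic liver", "bears", "cirrhosis"),
    ("cirrhosis", "realized_in", "abnormal tissue repair"),
    ("abnormal tissue repair", "produces", "abnormal bodily features"),
    ("abnormal bodily features", "recognized_as", "signs and symptoms"),
]


def trace(start):
    """Follow the chain from `start`, alternating entities and relations."""
    step = {s: (r, o) for s, r, o in CHAIN}
    path = [start]
    while path[-1] in step:
        relation, target = step[path[-1]]
        path += [relation, target]
    return path


def disorders_bearing(disease):
    """Disorders that bear the given disease (a disposition)."""
    return [s for s, r, o in CHAIN if r == "bears" and o == disease]


print(" ".join(trace("hepatic cell death")))
print(disorders_bearing("cirrhosis"))
```

Keeping disorder, disease, and pathological process as separate nodes is what lets a query ask for the disorder behind a disease rather than conflating the two.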
Influenza - infectious
• Etiological process - infection of airway epithelial cells with influenza virus
– produces
• Disorder - viable cells with influenza virus
– bears
• Disposition (disease) - flu
– realized_in
• Pathological process - acute inflammation
– produces
• Abnormal bodily features
– recognized_as
• Symptoms - weakness, dizziness
• Signs - fever
• Symptoms & Signs
– used_in
• Interpretive process
– produces
• Hypothesis - rule out influenza
– suggests
• Laboratory tests
– produces
• Test results - elevated serum antibody titers
– used_in
• Interpretive process
– produces
• Result - diagnosis that patient X has a disorder that bears the disease flu
But the disorder also induces normal physiological processes (immune response) that can result in the elimination of the disorder (transient disease course).
68
Huntington’s Disease - genetic
• Etiological process - inheritance of >39 CAG repeats in the HTT gene
– produces
• Disorder - chromosome 4 with abnormal mHTT
– bears
• Disposition (disease) - Huntington’s disease
– realized_in
• Pathological process - accumulation of mHTT protein fragments, abnormal transcription regulation, neuronal cell death in striatum
– produces
• Abnormal bodily features
– recognized_as
• Symptoms - anxiety, depression
• Signs - difficulties in speaking and swallowing
• Symptoms & Signs
– used_in
• Interpretive process
– produces
• Hypothesis - rule out Huntington’s
– suggests
• Laboratory tests
– produces
• Test results - molecular detection of the HTT gene with >39 CAG repeats
– used_in
• Interpretive process
– produces
• Result - diagnosis that patient X has a disorder that bears the disease Huntington’s disease
69
HNPCC - genetic pre-disposition
• Etiological process - inheritance of a mutant mismatch repair gene
– produces
• Disorder - chromosome 3 with abnormal hMLH1
– bears
• Disposition (disease) - Lynch syndrome
– realized_in
• Pathological process - abnormal repair of DNA mismatches
– produces
• Disorder - mutations in proto-oncogenes and tumor suppressor genes with
microsatellite repeats (e.g. TGF-beta R2)
– bears
• Disposition (disease) - non-polyposis colon cancer
– realized_in
• Symptoms (including pain)
70
The OBO Foundry Initiative
71
A good solution to the data integration
problem must be:
• modular
• incremental
• bottom-up
• evidence-based
• revisable
• incorporate a strategy for motivating
potential developers and users
72
GO is amazingly successful
– but covers only three sorts
of biological entities:
– cellular components
– molecular functions
– biological processes
and does not provide representations
of disease-related phenomena
73
GRANULARITY \ RELATION TO TIME | CONTINUANT: INDEPENDENT | CONTINUANT: DEPENDENT | OCCURRENT
ORGAN AND ORGANISM | Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO); Phenotypic Quality (PaTO) | Biological Process (GO)
CELL AND CELLULAR COMPONENT | Cell (CL); Cellular Component (FMA, GO) | Cellular Function (GO) | Biological Process (GO)
MOLECULE | Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Process (GO)
The Open Biomedical Ontologies (OBO) Foundry
74
OBO Foundry provides
• tested guidelines enabling new groups to
develop the ontologies they need in ways which
counteract forking and dispersion of effort
• an incremental bottom-up approach to
evidence-based terminology practices in
medicine that is rooted in basic biology
• automatic web-based linkage between medical
terminologies and biological knowledge
resources
• traffic laws and traffic police
75
the strategy
establish common rules governing best
practices for creating ontologies in
coordinated fashion, with an evidence-based pathway to incremental
improvement
76
The methodology of cross-products
compound terms in ontologies to be defined
as cross-products of simpler terms:
E.g. elevated blood glucose is a cross-product of
PATO: increased concentration with FMA: blood and
ChEBI: glucose.
= factoring out of ontologies into discipline-specific modules (orthogonality)
77
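The cross-product methodology can be sketched as a compound term that holds references to simpler terms, each owned by a different ontology. The ontology prefixes and labels below are the ones on the slide; the class names and the orthogonality check are assumptions made for illustration and do not reproduce any OBO tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    ontology: str  # the module that owns the term, e.g. "PATO"
    label: str


@dataclass(frozen=True)
class CrossProduct:
    label: str
    components: tuple  # simpler terms the compound term is defined from

    def modules(self):
        """Ontologies the definition draws on, in sorted order."""
        return sorted({t.ontology for t in self.components})

    def is_orthogonal(self):
        """True if every component comes from a different ontology."""
        return len(self.modules()) == len(self.components)


elevated_blood_glucose = CrossProduct(
    "elevated blood glucose",
    (
        Term("PATO", "increased concentration"),
        Term("FMA", "blood"),
        Term("ChEBI", "glucose"),
    ),
)

print(elevated_blood_glucose.modules())
print(elevated_blood_glucose.is_orthogonal())
```

Because the compound term stores references rather than copies, a revision to the glucose term in its home ontology carries over to every cross-product that uses it.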
The methodology of cross-products
enforcing use of common relations in linking terms
drawn from Foundry ontologies serves
• to ensure that the ontologies are maintained and
revised in tandem
• logically defined relations serve to bind terms in
different ontologies together to create a network
78
CRITERIA
• openness
• common formal language
• collaborative development
• evidence-based maintenance
• identifiers
• versioning
• textual and formal definitions
79
Orthogonality = modularity
• one ontology for each domain
• no need for mappings (which are in
any case too expensive, too fragile,
too difficult to keep up-to-date as
mapped ontologies change)
• everyone knows where to look to
find out how to annotate each kind
of data
80
Ontologies and research groups
using BFO and RO
– OBO Foundry (60 biomedical ontologies, including
GO, OBI, Protein Ontology, Cell Ontology, IDO …)
– National Cancer Institute (BiomedGT)
– NIF (NIH Neuroscience Information Framework)
– Cleveland Clinic Semantic Database
– Siemens
– AstraZeneca
– EU (ACGT Cancer Ontology, RAPS, …)
81
Because the ontologies in the
Foundry
are built as orthogonal modules which form an
incrementally evolving network
• scientists are motivated to commit to
developing ontologies because they will need in
their own work ontologies that fit into this
network
• users are motivated by the assurance that the
ontologies they turn to are maintained by
experts
82
More benefits of orthogonality
• helps those new to ontology to find what they
need
• to find models of good practice
• ensures mutual consistency of ontologies
(trivially)
• and thereby ensures additivity of annotations
83
More benefits of orthogonality
• it rules out the sorts of simplification and
partiality which may be acceptable under
more pluralistic regimes
• thereby brings an obligation on the part of
ontology developers to commit to scientific
accuracy and domain-completeness
84
More criteria of a
successful standard
1. intelligibility to users (consistent use of terms
like ‘term’, ‘class’, ‘entity’, ‘object’ …)
2. track record of lessons learned (GO has 10
years of hard user testing)
3. lots of existing users (ontologies are like
telephone networks)
85
COMMON ARCHITECTURE
• The ontology uses relations which are unambiguously
defined following the pattern of definitions laid down
in the Basic Formal Ontology (BFO) including the
Relation Ontology (RO)
http://ifomis.org/bfo
http://www.obofoundry.org/ro/
86
top level: Basic Formal Ontology (BFO)
mid-level: Information Artifact Ontology (IAO); Ontology for Biomedical Investigations (OBI); Spatial Ontology (BSPO)
domain level: Anatomy Ontology (FMA*, CARO); Cell Ontology (CL); Cellular Component Ontology (FMA*, GO*); Environment Ontology (EnvO); Subcellular Anatomy Ontology (SAO); Sequence Ontology (SO*); Protein Ontology (PRO*); Infectious Disease Ontology (IDO*); Phenotypic Quality Ontology (PaTO); Biological Process Ontology (GO*); Molecular Function (GO*)
OBO Foundry Modular Organization
87
BFO:continuant
continuant
  independent continuant
    portion of material
    object
    fiat object part
    object aggregate
    object boundary
    site
  dependent continuant
    generically dependent continuant
      information artifact
    specifically dependent continuant
      quality
      realizable entity
        function
        role
        disposition
  spatial region
    0D-region
    1D-region
    2D-region
    3D-region
BFO:occurrent
occurrent
  processual entity
    process
    fiat process part
    process aggregate
    process boundary
    processual context
  spatiotemporal region
    scattered spatiotemporal region
    connected spatiotemporal region
      spatiotemporal instant
      spatiotemporal interval
  temporal region
    scattered temporal region
    connected temporal region
      temporal instant
      temporal interval
Example: The Cell Ontology
top level: Basic Formal Ontology (BFO)
mid-level: Information Artifact Ontology (IAO); Ontology for Biomedical Investigations (OBI); Spatial Ontology (BSPO)
domain level: Anatomy Ontology (FMA*, CARO); Cell Ontology (CL); Cellular Component Ontology (FMA*, GO*); Environment Ontology (EnvO); Subcellular Anatomy Ontology (SAO); Sequence Ontology (SO*); Protein Ontology (PRO*); Infectious Disease Ontology (IDO*); Phenotypic Quality Ontology (PaTO); Biological Process Ontology (GO*); Molecular Function (GO*)
OBO Foundry Modular Organization
91