Creating the CTSA Ontology
Landscape: A Modular Strategy
Barry Smith
For modularity to work, developers
must accept some basic principles
– for formulating definitions
– of modularity
– of user feedback for error correction and gap
identification
– for ensuring compatibility between modules
– for using ontologies to annotate legacy data
– for using ontologies to create new data
– for developing user-specific views
The Modular Approach
• Create a small set of plug-and-play ontologies as
stable monohierarchies with a high likelihood of
being reused
• Create ontologies incrementally
• Reuse existing ontology resources
• Use these ontologies incrementally in
annotating heterogeneous data
• Annotating = arm’s-length approach; the data
and data models themselves remain as they are
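The arm’s-length idea can be sketched in Python. Annotations live in their own store of (record, term) links, so the legacy records are never edited. This is a minimal illustration, assuming hypothetical record IDs and made-up term IDs with an `HYP:` prefix; it does not reflect any particular CTSA or OBO tool.

```python
from dataclasses import dataclass, field

# Hypothetical legacy records; the annotation layer only reads them.
LEGACY_RECORDS = {
    "rec-1": {"dx_text": "cirrhosis of liver", "site": "clinic A"},
    "rec-2": {"dx_text": "flu", "site": "clinic B"},
}


@dataclass
class AnnotationLayer:
    """Keeps (record_id, term_id) links outside the data itself."""
    links: set = field(default_factory=set)

    def annotate(self, record_id: str, term_id: str) -> None:
        # Only existing records may be annotated; the record itself is not changed.
        if record_id not in LEGACY_RECORDS:
            raise KeyError(f"unknown record: {record_id}")
        self.links.add((record_id, term_id))

    def records_for(self, term_id: str) -> list:
        # Query through the annotations instead of the free-text fields.
        return sorted(rec for rec, term in self.links if term == term_id)


# HYP: term IDs are made up for illustration.
layer = AnnotationLayer()
layer.annotate("rec-1", "HYP:cirrhosis")
layer.annotate("rec-2", "HYP:influenza")
print(layer.records_for("HYP:influenza"))  # ['rec-2']
```

Because the links are held apart from the records, a further ontology can be used later without touching either the data or the annotations already made.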
Benefits of Modularity
• Brings a clean division of labor amongst
domain experts, who can manage governance
aspects pertaining to their own domains
• Automatic consistency of the results of the
distributed efforts – no room for contradiction
• Additivity of annotations even when multiple
independently developed ontologies are used
• Lessons learned in developing and using one
module can be used by the developers and
users of later modules
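Additivity can be illustrated by treating each module’s annotations as a set and merging them by set union, which can only add links, never remove them. As a stand-in for the non-overlap of modules, this Python sketch assumes that each module owns its own ID prefix; the prefixes `DIS` and `ANAT` and the records are hypothetical.

```python
def prefix(term_id: str) -> str:
    """Return the namespace part of an ID such as 'DIS:cirrhosis'."""
    return term_id.split(":", 1)[0]


def merge(*annotation_sets: set) -> set:
    """Union of (record, term) sets from independently developed modules.

    Raises ValueError if two modules use the same prefix, i.e. if their
    coverage would overlap and contradictions could arise."""
    owner_of_prefix = {}
    merged = set()
    for module_index, annotations in enumerate(annotation_sets):
        for _record, term in annotations:
            owner = owner_of_prefix.setdefault(prefix(term), module_index)
            if owner != module_index:
                raise ValueError(f"prefix {prefix(term)!r} used by two modules")
        merged |= annotations
    return merged


# Hypothetical annotations from two modules with made-up prefixes.
disease_annotations = {("rec-1", "DIS:cirrhosis"), ("rec-2", "DIS:influenza")}
anatomy_annotations = {("rec-1", "ANAT:liver")}
combined = merge(disease_annotations, anatomy_annotations)
print(len(combined))  # 3
```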
Benefits of Modularity
• Increased likelihood of reuse, since potential
users will be aware that they are investing in
the results of an authoritative coordinated
approach of proven reliability
• Increased value and portability of training in
any given module
• Incentivization of those responsible for
individual modules
Benefits of Modularity
• All of those involved can more easily inspect
and criticize the results of others’ work
• Creates a collaborative environment for
ontology development
• Serves as a platform for innovations which can
be easily propagated throughout the whole
system
• Developing and using ontologies in a
consistent fashion brings a number of network
effects – the value of existing annotations
increases as new annotations are added
You will need to embrace some strategy along
these lines if you want to get funding for
translational research
NIH Mandates for Sharing of Research Data
Investigators submitting an NIH application seeking
$500,000 or more in any single year are expected to
include a plan for data sharing
(http://grants.nih.gov/grants/policy/data_sharing)
Logical standards can be only part
of the solution
OWL … bring benefits primarily on the side of
syntax (language)
What we need are standards on the semantics
(content) side (via top-level ontologies),
including standards for
• top-level ontologies
• common relations (part_of …)
• relation of lower-level ontologies to each
other and to the higher levels
BFO, DOLCE, SUMO
All exist in FOL and OWL versions
All have been tested in use
BFO: very small, truly domain-neutral
DOLCE: largely extends BFO, but built to support
‘linguistic and cognitive engineering’
SUMO: has its own tiny mathematics, tiny physics,
tiny biology (‘body-covering’, ‘fruit-Or-vegetable’),
…
120+ ontology projects using BFO
http://www.ifomis.org/bfo/
• Open Biomedical Ontologies Foundry
• Ontology for General Medical Science
• eagle-I, VIVO, CTSAconnect
• AstraZeneca
• Elsevier
How a common upper level ontology
can help resist ontology chaos
• something to teach
• training (expertise) is portable
• each new ontology you confront will be more easily
understood at the level of content
– and more easily criticized, error-checked
• provides starting-point for domain-ontology
development
• provides platform for tool-building and innovations
• lessons learned in building and using one ontology
can potentially benefit other ontologies
• promotes shareability of data across disciplinary and
other boundaries
OBO Foundry Modular Organization
Figure: ontologies arranged in three levels (top level, mid-level, domain level)
• top level: Basic Formal Ontology (BFO)
• mid-level and domain level: Information Artifact Ontology (IAO); Ontology for Biomedical Investigations (OBI); Ontology for General Medical Science (OGMS); Anatomy Ontology (FMA*, CARO); Cell Ontology (CL); Cellular Component Ontology (FMA*, GO*); Environment Ontology (EnvO); Subcellular Anatomy Ontology (SAO); Sequence Ontology (SO*); Protein Ontology (PRO*); Infectious Disease Ontology (IDO*); Phenotypic Quality Ontology (PaTO); Biological Process Ontology (GO*); Molecular Function (GO*)
BFO
A simple top-level ontology to support
information integration in scientific research
• No overlap with domain ontologies
(organism, person, society, information, …)
• Based on realism
• No abstracta
• Tested in many natural science domains
Basic Formal Ontology
• Continuant
  – Independent Continuant (entity)
  – Dependent Continuant (property): a property depends on its bearer
• Occurrent (process, event)
depends_on
• Continuant
  – Independent Continuant (thing)
  – Dependent Continuant (property)
• Occurrent (process, event): an event depends on its participants
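The dependence relations on this slide suggest a simple consistency check: every dependent continuant needs a bearer and every occurrent needs a participant. This Python sketch simplifies by requiring that the thing depended on be an independent continuant. The `Instance` class and the example instances are hypothetical.

```python
from dataclasses import dataclass, field

# Categories whose instances must depend on something else.
NEEDS_SUPPORT = {"dependent continuant", "occurrent"}


@dataclass
class Instance:
    name: str
    category: str  # "independent continuant", "dependent continuant" or "occurrent"
    depends_on: list = field(default_factory=list)  # names of bearers or participants


def dependence_violations(instances: list) -> list:
    """Names of instances that need a bearer or participant but have no
    independent continuant to depend on (a simplification of BFO)."""
    by_name = {inst.name: inst for inst in instances}
    violations = []
    for inst in instances:
        if inst.category not in NEEDS_SUPPORT:
            continue  # independent continuants need no bearer
        supports = [by_name.get(name) for name in inst.depends_on]
        if not any(s is not None and s.category == "independent continuant" for s in supports):
            violations.append(inst.name)
    return violations


# Hypothetical example instances.
heart = Instance("this heart", "independent continuant")
mass = Instance("mass of this heart", "dependent continuant", ["this heart"])
beat = Instance("this heartbeat", "occurrent", ["this heart"])
orphan = Instance("free-floating quality", "dependent continuant")
print(dependence_violations([heart, mass, beat, orphan]))  # ['free-floating quality']
```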
Basic Formal Ontology
• continuant
  – independent continuant: cellular component
  – dependent continuant: molecular function; roles, qualities
• occurrent: biological processes

• Continuant
  – Independent Continuant
  – Dependent Continuant: Quality, Disposition
• Occurrent (process, event)
instance_of
types
• Continuant
  – Independent Continuant (thing)
  – Dependent Continuant (property)
• Occurrent (process, event)
instances (each linked to its type by instance_of)
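The distinction between types and instances can be made concrete with two tables: `IS_A` links types to their parent types, and `INSTANCE_OF` links particular instances to the most specific type they instantiate. In this Python sketch the instances (a particular mouse, its weight, its running) are hypothetical, and an instance is assumed to instantiate every ancestor of its asserted type.

```python
# Type-level hierarchy from the slide (is_a).
IS_A = {
    "independent continuant": "continuant",
    "dependent continuant": "continuant",
}

# Instance-level assertions (instance_of); the instances are hypothetical.
INSTANCE_OF = {
    "this mouse": "independent continuant",
    "weight of this mouse": "dependent continuant",
    "running of this mouse": "occurrent",
}


def types_of(instance: str) -> list:
    """All types an instance instantiates, from most to least specific."""
    current = INSTANCE_OF[instance]
    result = [current]
    while current in IS_A:
        current = IS_A[current]
        result.append(current)
    return result


def is_instance(instance: str, type_name: str) -> bool:
    return type_name in types_of(instance)


print(types_of("this mouse"))  # ['independent continuant', 'continuant']
```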
Rationale of OBO Foundry coverage
(rows: granularity; columns: relation to time)
GRANULARITY | CONTINUANT: INDEPENDENT | CONTINUANT: DEPENDENT | OCCURRENT
ORGAN AND ORGANISM | Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO); Phenotypic Quality (PaTO) | Organism-Level Process (GO)
CELL AND CELLULAR COMPONENT | Cell (CL); Cellular Component (FMA, GO) | Cellular Function (GO) | Cellular Process (GO)
MOLECULE | Molecule (ChEBI, SO, RNAO, PRO) | Molecular Function (GO) | Molecular Process (GO)
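One way to read this grid is as a lookup table keyed by granularity and relation to time: a new term is placed by deciding its row and column, which then names the module responsible for it. This Python sketch encodes the cells of the table on this slide; placing PaTO in a single row is a simplification, and the function names are illustrative.

```python
# Cells of the coverage grid: (granularity, relation to time) -> modules.
# PaTO is placed in one row only, as a simplification.
COVERAGE = {
    ("organ and organism", "independent continuant"): ["Organism (NCBI Taxonomy)", "Anatomical Entity (FMA, CARO)"],
    ("organ and organism", "dependent continuant"): ["Organ Function (FMP, CPRO)", "Phenotypic Quality (PaTO)"],
    ("organ and organism", "occurrent"): ["Organism-Level Process (GO)"],
    ("cell and cellular component", "independent continuant"): ["Cell (CL)", "Cellular Component (FMA, GO)"],
    ("cell and cellular component", "dependent continuant"): ["Cellular Function (GO)"],
    ("cell and cellular component", "occurrent"): ["Cellular Process (GO)"],
    ("molecule", "independent continuant"): ["Molecule (ChEBI, SO, RNAO, PRO)"],
    ("molecule", "dependent continuant"): ["Molecular Function (GO)"],
    ("molecule", "occurrent"): ["Molecular Process (GO)"],
}

GRANULARITIES = ["organ and organism", "cell and cellular component", "molecule"]
TIME_RELATIONS = ["independent continuant", "dependent continuant", "occurrent"]


def modules_for(granularity: str, time_relation: str) -> list:
    """Modules responsible for terms in one cell of the grid (empty if none)."""
    return COVERAGE.get((granularity, time_relation), [])


def uncovered_cells() -> list:
    """Grid cells that no module covers yet."""
    return [(g, t) for g in GRANULARITIES for t in TIME_RELATIONS if not modules_for(g, t)]


print(modules_for("cell and cellular component", "occurrent"))  # ['Cellular Process (GO)']
print(uncovered_cells())  # []
```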
Example: The Cell Ontology

Four distinct classificatory tasks
1. of people (patients, carriers, …)
2. of diseases (cases, instances, problems, …)
3. of courses of disease (symptoms, treatments…)
4. of representations (records, observations, data, diagnoses…)
ICD confuses 1. & 2.
Most standard terminologies confuse 2. and 4.
Ontology for General Medical
Science (OGMS)
1. person (patient, carrier, …)
– independent continuant
2. disease (case, instance, problem, …)
– specifically dependent continuant
3. course of disease (symptom, treatment…)
– occurrent
4. representation (record, datum, diagnosis…)
– generically dependent continuant
http://code.google.com/p/ogms/
Four distinct BFO categories
1. people (patients, carriers, …)
– independent continuants
2. disease (case, instance, problem, condition …)
– disposition
3. course of disease (symptom, episode, outbreak …)
– realization of dispositions
4. representations (records, data, diagnoses…)
– generically dependent continuants
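Because each of the four kinds falls under a different BFO category, a terminology grouping can be checked for the kind of confusion mentioned for ICD and other standard terminologies: if the terms in one group map to more than one category, the group mixes kinds. The mapping below uses only example terms from these slides; the function name is hypothetical.

```python
# Example terms from the slides mapped to their BFO categories.
OGMS_CATEGORY = {
    "patient": "independent continuant",
    "carrier": "independent continuant",
    "disease": "disposition",
    "condition": "disposition",
    "symptom": "realization of disposition",
    "episode": "realization of disposition",
    "outbreak": "realization of disposition",
    "record": "generically dependent continuant",
    "datum": "generically dependent continuant",
    "diagnosis": "generically dependent continuant",
}


def mixed_categories(term_group: list) -> set:
    """Categories spanned by a group of terms if there is more than one, else an empty set."""
    categories = {OGMS_CATEGORY[term] for term in term_group}
    return categories if len(categories) > 1 else set()


# A group that lumps diseases with diagnoses mixes kinds 2 and 4.
print(sorted(mixed_categories(["disease", "diagnosis"])))  # ['disposition', 'generically dependent continuant']
```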
Big Picture (Ontology for General Medical Science)
Elucidation of Primitive Terms
• ‘extended organism’ = the organism and all the material entities located within it
• ‘bodily feature’ = either a physical part of the extended organism, a bodily quality, or a bodily process.
Elucidation of Primitive Terms
• ‘clinically abnormal’ = some bodily feature that
(1) is not part of the life plan for an organism of the relevant type (unlike loss of milk teeth, aging or pregnancy),
(2) is causally linked to an elevated risk either of pain or other feelings of illness, or of death or dysfunction, and
(3) is such that the elevated risk exceeds a certain threshold level.*
*Compare: baldness
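The three clauses can be read as a conjunction. This Python sketch assumes that each bodily feature comes with a flag for clause (1) and a single numeric excess-risk estimate standing in for clauses (2) and (3). The elucidation does not say how risk is measured or where the threshold lies, so the scale, the threshold and the example numbers are all hypothetical, not clinical data.

```python
from dataclasses import dataclass


@dataclass
class BodilyFeature:
    name: str
    in_life_plan: bool   # clause (1): part of the life plan for organisms of this type?
    excess_risk: float   # clauses (2) and (3): hypothetical estimate of elevated risk


def clinically_abnormal(feature: BodilyFeature, threshold: float) -> bool:
    """True only if all three clauses of the elucidation hold."""
    not_in_life_plan = not feature.in_life_plan           # (1)
    causally_elevated = feature.excess_risk > 0            # (2), approximated as risk above zero
    above_threshold = feature.excess_risk > threshold      # (3)
    return not_in_life_plan and causally_elevated and above_threshold


THRESHOLD = 0.1  # hypothetical cut-off
# Illustrative numbers only, not clinical data.
features = [
    BodilyFeature("pregnancy", in_life_plan=True, excess_risk=0.05),
    BodilyFeature("baldness", in_life_plan=False, excess_risk=0.0),
    BodilyFeature("necrotic liver tissue", in_life_plan=False, excess_risk=0.9),
]
print([f.name for f in features if clinically_abnormal(f, THRESHOLD)])  # ['necrotic liver tissue']
```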
Disorder
A material entity (fiat object part) which is
clinically abnormal and part of an extended
organism
Compare:
Downtown Santa Barbara
Mount Everest
Definitions - Foundational Terms
• Pathological Process =def. A bodily process that is clinically abnormal.
• Disease =def. A disposition (i) to undergo pathological processes that (ii) exists in an organism because of one or more disorders in that organism.
Disease Course
=Def. The sum of processes through which a
given disease instance is realized.
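Taken together, the definitions of disease and disease course can be sketched as a small data model: a disease instance records the disorder it exists in virtue of and the pathological processes that realize it, and its course is the collection of those processes. In this Python sketch the “sum” is approximated by a set plus its overall time span; the day numbers are illustrative and the labels are borrowed from the cirrhosis slide later in the deck.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Process:
    name: str
    start_day: int  # illustrative time stamps
    end_day: int


@dataclass
class DiseaseInstance:
    name: str
    disorder: str  # the disorder because of which the disposition exists
    realizations: list = field(default_factory=list)  # pathological processes realizing it

    def realize(self, process: Process) -> None:
        self.realizations.append(process)


def disease_course(disease: DiseaseInstance):
    """Return (set of realizing processes, (first day, last day)), or (empty set, None)."""
    parts = frozenset(disease.realizations)
    if not parts:
        return parts, None
    span = (min(p.start_day for p in parts), max(p.end_day for p in parts))
    return parts, span


cirrhosis = DiseaseInstance("cirrhosis in patient X", disorder="necrotic liver")
cirrhosis.realize(Process("abnormal tissue repair with fibrosis", 0, 120))
cirrhosis.realize(Process("hypoxia-induced cell death", 30, 150))
parts, span = disease_course(cirrhosis)
print(len(parts), span)  # 2 (0, 150)
```

A disease with no realizations yet has an empty course, which fits the reading of disease as a disposition that can exist before it is realized.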
A disease is a disposition
Etiological process
→ produces
Disorder
→ bears
Disposition (disease)
→ realized_in
Pathological process
→ produces
Abnormal bodily features
→ recognized_as
Signs & symptoms
→ used_in
Interpretive process
→ produces
Diagnosis
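The chain on this slide can be written down as a small set of allowed (subject type, relation, object type) steps, which makes it possible to check whether a worked example follows the pattern. A minimal Python sketch, assuming each node is a (type, label) pair; the encoding and function name are hypothetical, and the labels in the usage lines come from the cirrhosis slide.

```python
# Allowed steps of the generic chain: (subject type, relation) -> object type.
ALLOWED_STEPS = {
    ("etiological process", "produces"): "disorder",
    ("disorder", "bears"): "disposition",
    ("disposition", "realized_in"): "pathological process",
    ("pathological process", "produces"): "abnormal bodily features",
    ("abnormal bodily features", "recognized_as"): "signs & symptoms",
    ("signs & symptoms", "used_in"): "interpretive process",
    ("interpretive process", "produces"): "diagnosis",
}


def invalid_steps(chain: list) -> list:
    """chain: list of ((type, label), relation, (type, label)) triples.

    Returns triples that break the pattern or do not start where the previous one ended."""
    problems = []
    previous_object = None
    for subject, relation, obj in chain:
        pattern_ok = ALLOWED_STEPS.get((subject[0], relation)) == obj[0]
        connected = previous_object is None or subject == previous_object
        if not (pattern_ok and connected):
            problems.append((subject, relation, obj))
        previous_object = obj
    return problems


etiology = ("etiological process", "phenobarbital-induced hepatic cell death")
disorder = ("disorder", "necrotic liver")
disease = ("disposition", "cirrhosis")
process = ("pathological process", "abnormal tissue repair and fibrosis")
chain = [(etiology, "produces", disorder), (disorder, "bears", disease), (disease, "realized_in", process)]
print(invalid_steps(chain))  # []
```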
Cirrhosis - environmental exposure
Etiological process - phenobarbital-induced hepatic cell death
→ produces
Disorder - necrotic liver
→ bears
Disposition (disease) - cirrhosis
→ realized_in
Pathological process - abnormal tissue repair with cell proliferation and fibrosis that exceed a certain threshold; hypoxia-induced cell death
→ produces
Abnormal bodily features
→ recognized_as
Symptoms - fatigue, anorexia
Signs - jaundice, splenomegaly

Symptoms & Signs
→ used_in
Interpretive process
→ produces
Hypothesis - rule out cirrhosis
→ suggests
Laboratory tests
→ produces
Test results - elevated liver enzymes in serum
→ used_in
Interpretive process
→ produces
Result - diagnosis that patient X has a disorder that bears the disease cirrhosis
Influenza - infectious
Etiological process - infection of airway epithelial cells with influenza virus
→ produces
Disorder - viable cells with influenza virus
→ bears
Disposition (disease) - flu
→ realized_in
Pathological process - acute inflammation
→ produces
Abnormal bodily features
→ recognized_as
Symptoms - weakness, dizziness
Signs - fever

Symptoms & Signs
→ used_in
Interpretive process
→ produces
Hypothesis - rule out influenza
→ suggests
Laboratory tests
→ produces
Test results - elevated serum antibody titers
→ used_in
Interpretive process
→ produces
Result - diagnosis that patient X has a disorder that bears the disease flu
Huntington’s Disease - genetic
Etiological process - inheritance of >39 CAG repeats in the HTT gene
→ produces
Disorder - chromosome 4 with abnormal mHTT
→ bears
Disposition (disease) - Huntington’s disease
→ realized_in
Pathological process - accumulation of mHTT protein fragments, abnormal transcription regulation, neuronal cell death in striatum
→ produces
Abnormal bodily features
→ recognized_as
Symptoms - anxiety, depression
Signs - difficulties in speaking and swallowing

Symptoms & Signs
→ used_in
Interpretive process
→ produces
Hypothesis - rule out Huntington’s
→ suggests
Laboratory tests
→ produces
Test results - molecular detection of the HTT gene with >39 CAG repeats
→ used_in
Interpretive process
→ produces
Result - diagnosis that patient X has a disorder that bears the disease Huntington’s disease
Dispositions and Predispositions
Some dispositions are predispositions to
other dispositions.
HNPCC - genetic predisposition
Etiological process - inheritance of a mutant mismatch repair gene
→ produces
Disorder - chromosome 3 with abnormal hMLH1
→ bears
Disposition (disease) - Lynch syndrome
→ realized_in
Pathological process - abnormal repair of DNA mismatches
→ produces
Disorder - mutations in proto-oncogenes and tumor suppressor genes with microsatellite repeats (e.g., TGF-beta R2)
→ bears
Disposition (disease) - non-polyposis colon cancer
→ realized_in
Symptoms (including pain)
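Chains like the HNPCC example loop back on themselves: a pathological process produces a new disorder, which bears a further disposition. A Python sketch of that loop, assuming a chain is written as alternating (type, label) nodes and relation names; reading off the dispositions in order gives the sequence in which one predisposes to the next. The encoding and function name are hypothetical; the labels are from the HNPCC slide.

```python
# Steps allowed in a predisposition cascade: (subject type, relation) -> object type.
CASCADE_STEPS = {
    ("disorder", "bears"): "disposition",
    ("disposition", "realized_in"): "pathological process",
    ("pathological process", "produces"): "disorder",
}


def dispositions_in_order(chain: list) -> list:
    """chain alternates nodes and relations: [node, relation, node, relation, node, ...]."""
    nodes, relations = chain[0::2], chain[1::2]
    for (subj_type, _), relation, (obj_type, _) in zip(nodes, relations, nodes[1:]):
        if CASCADE_STEPS.get((subj_type, relation)) != obj_type:
            raise ValueError(f"step not allowed: {subj_type} {relation} {obj_type}")
    return [label for node_type, label in nodes if node_type == "disposition"]


hnpcc = [
    ("disorder", "chromosome 3 with abnormal hMLH1"), "bears",
    ("disposition", "Lynch syndrome"), "realized_in",
    ("pathological process", "abnormal repair of DNA mismatches"), "produces",
    ("disorder", "mutations in proto-oncogenes and tumor suppressor genes"), "bears",
    ("disposition", "non-polyposis colon cancer"),
]
print(dispositions_in_order(hnpcc))  # ['Lynch syndrome', 'non-polyposis colon cancer']
```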
Arterial Aneurysm
Disposition – atherosclerosis
→ realized_in
Pathological process – fatty material collects within the walls of arteries
→ produces
Disorder – artery with weakened wall
→ bears
Disposition – of artery to become distended
→ realized_in
Pathological process – process of distending
→ produces
Disorder – arterial aneurysm
→ bears
Disposition – of artery to rupture
→ realized_in
Pathological process – (catastrophic event) of rupturing
→ produces
Disorder – ruptured artery, arterial system with dangerously low blood pressure
→ bears
Disposition – circulatory failure
→ realized_in
Pathological process – exsanguination, failure of homeostasis
→ produces
Death
Systemic arterial hypertension
Etiological process – abnormal reabsorption of NaCl by the kidney
→ produces
Disorder – abnormally large scattered molecular aggregate of salt in the blood
→ bears
Disposition (disease) - hypertension
→ realized_in
Pathological process – exertion of abnormal pressure against arterial wall
→ produces
Abnormal bodily features
→ recognized_as
Symptoms/Signs – elevated blood pressure

Symptoms & Signs
→ used_in
Interpretive process
→ produces
Hypothesis - rule out hypertension
→ suggests
Laboratory tests
→ produces
Test results
→ used_in
Interpretive process
→ produces
Result - diagnosis that patient X has a disorder that bears the disease hypertension
Type 2 Diabetes Mellitus
Etiological process –
→ produces
Disorder – abnormal pancreatic beta cells and abnormal muscle/fat cells
→ bears
Disposition (disease) – diabetes mellitus
→ realized_in
Pathological processes – diminished insulin production, diminished muscle/fat uptake of glucose
→ produces
Abnormal bodily features
→ recognized_as
Symptoms – polydipsia, polyuria, polyphagia, blurred vision
Signs – elevated blood glucose and hemoglobin A1c

Symptoms & Signs
→ used_in
Interpretive process
→ produces
Hypothesis - rule out diabetes mellitus
→ suggests
Laboratory tests – fasting serum blood glucose, oral glucose challenge test, and/or blood hemoglobin A1c
→ produces
Test results
→ used_in
Interpretive process
→ produces
Result - diagnosis that patient X has a disorder that bears the disease type 2 diabetes mellitus
Type 1 hypersensitivity to penicillin
Etiological process – sensitizing of mast cells and basophils during exposure to penicillin-class substance
→ produces
Disorder – mast cells and basophils with epitope-specific IgE bound to Fc epsilon receptor I
→ bears
Disposition (disease) – type I hypersensitivity
→ realized_in
Pathological process – type I hypersensitivity reaction
→ produces
Abnormal bodily features
→ recognized_as
Symptoms – pruritus, shortness of breath
Signs – rash, urticaria, anaphylaxis

Symptoms & Signs
→ used_in
Interpretive process
→ produces
Hypothesis
→ suggests
Laboratory tests –
→ produces
Test results – occasionally, skin testing
→ used_in
Interpretive process
→ produces
Result - diagnosis that patient X has a disorder that bears the disease type 1 hypersensitivity to penicillin
Early Onset Alzheimer’s Disease
Disorder – mutations in APP, PSEN1 and PSEN2
→ bears
Disposition – impaired APP processing
→ realized_in
Pathological process – accumulation of intra- and extracellular protein in the brain
→ produces
Disorder – amyloid plaque and neurofibrillary tangles
→ bears
Disposition – of neurons to die
→ realized_in
Pathological process – neuronal loss
→ produces
Disorder – cognitive brain regions damaged and reduced in size
→ bears
Disposition (disease) – Alzheimer’s dementia
→ realized_in
Symptoms – episodic memory loss and other cognitive domain impairment
Hemorrhagic stroke
Disorder – cerebral arterial aneurysm
→ bears
Disposition – of weakened artery to rupture
→ realized_in
Pathological process – rupturing of weakened blood vessel
→ produces
Disorder – intraparenchymal cerebral hemorrhage
→ bears
Disposition (disease) – to increased intracranial pressure
→ realized_in
Pathological process – increasing intracranial pressure, compression of brain structures
→ produces
Disorder – cerebral ischemia, cerebral neuronal death
→ bears
Disposition (disease) – stroke
→ realized_in
Symptoms – weakness/paralysis, loss of sensation, etc.
Ontology modules extending OGMS
Sleep Domain Ontology (SDO)
Infectious Disease Ontology (IDO)
Ontology of Medically Relevant Social Entities (OMRSE)
Vital Sign Ontology (VSO)
Mental Disease Ontology (MD)
Neurological Disease Ontology (ND)
Infectious Disease Ontology (IDO)
– IDO Core:
• General terms in the ID domain.
• A hub for all IDO extensions.
– IDO Extensions:
• Disease specific.
• Developed by subject matter experts.
• Provides:
– Clear, precise, and consistent natural language
definitions
– Computable logical representations (OWL, OBO)
How IDO evolves
Figure: IDO Core (IDOCore) together with its extension ontologies
• CORE and SPOKES: domain ontologies
• SEMI-LATTICE: by subject matter experts in different communities of interest
• Modules shown: IDOCore, IDOMAL, IDOFLU, IDOHIV, IDOSa, IDOMRSa, IDOHumanSa, IDORatSa, IDOStrep, IDOHumanStrep, IDORatStrep, IDOAntibioticResistant, IDOHumanBacterial
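A core-and-spokes arrangement that grows into a semi-lattice can be modeled as an import graph in which an extension may import more than one parent. The import links in this Python sketch are hypothetical (the slide gives the module names but not their exact links); the check it performs, that every extension reaches IDO Core, reflects the role of IDO Core as hub.

```python
# Hypothetical import links among IDO modules (names from the slide, edges assumed).
IMPORTS = {
    "IDOCore": [],
    "IDOFLU": ["IDOCore"],
    "IDOMAL": ["IDOCore"],
    "IDOSa": ["IDOCore"],
    "IDOAntibioticResistant": ["IDOCore"],
    "IDOHumanBacterial": ["IDOCore"],
    "IDOMRSa": ["IDOSa", "IDOAntibioticResistant"],  # two parents: a semi-lattice, not a tree
    "IDOHumanSa": ["IDOSa", "IDOHumanBacterial"],
}


def imported_closure(module: str, graph: dict = IMPORTS) -> set:
    """All modules reachable from `module` through imports (excluding itself)."""
    seen, stack = set(), list(graph.get(module, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph.get(current, []))
    return seen


def detached_from_core(graph: dict = IMPORTS) -> list:
    """Extensions that do not (directly or indirectly) import IDOCore."""
    return sorted(m for m in graph if m != "IDOCore" and "IDOCore" not in imported_closure(m, graph))


print(sorted(imported_closure("IDOMRSa")))  # ['IDOAntibioticResistant', 'IDOCore', 'IDOSa']
print(detached_from_core())  # []
```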
IDO Core
• Contains general terms in the ID domain:
– E.g., ‘colonization’, ‘pathogen’, ‘infection’
• A contract between IDO extension ontologies
and the datasets that use them.
• Intended to represent information along
several dimensions:
– biological scale (gene, cell, organ, organism, population)
– discipline (clinical, immunological, microbiological)
– organisms involved (host, pathogen, and vector types)
Sample IDO Definitions
• Host of Infectious Agent (BFO Role): A role borne by
an organism in virtue of the fact that its extended
organism contains an infectious agent.
• Extended Organism (OGMS): An object aggregate
consisting of an organism and all material entities
located within the organism, overlapping the
organism, or occupying sites formed in part by the
organism.
• Infectious Agent: A pathogen whose pathogenic
disposition is an infectious disposition.
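The first and third definitions chain together: an infectious agent is identified by its disposition, and an organism bears the host role when its extended organism contains such an agent. A Python sketch under simplifying assumptions: membership in the extended organism is given as a plain set, and the organisms and the non-infectious pathogen are hypothetical.

```python
# Pathogens and the kinds of pathogenic disposition they bear (illustrative data).
PATHOGENIC_DISPOSITIONS = {
    "influenza virus": {"infectious"},
    "hypothetical toxigenic agent": {"toxigenic"},
}

# Material entities within each organism's extended organism (hypothetical).
EXTENDED_ORGANISM = {
    "patient 1": {"influenza virus", "gut flora"},
    "patient 2": {"hypothetical toxigenic agent"},
    "patient 3": set(),
}


def infectious_agents(dispositions: dict) -> set:
    """Infectious Agent: a pathogen whose pathogenic disposition is an infectious disposition."""
    return {pathogen for pathogen, kinds in dispositions.items() if "infectious" in kinds}


def hosts_of_infectious_agent(extended: dict, agents: set) -> list:
    """Organisms bearing the host role: their extended organism contains an infectious agent."""
    return sorted(org for org, contents in extended.items() if contents & agents)


agents = infectious_agents(PATHOGENIC_DISPOSITIONS)
print(hosts_of_infectious_agent(EXTENDED_ORGANISM, agents))  # ['patient 1']
```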
IDO and IDOSa
• Scale of the infection (disorder)
12/10/2010
from Shetty, Tang, and Andrews, 2009
Staphylococcus aureus (Sa), differentiated by:
• Antibiotic Resistance: MSSa, MRSa
• Pathogenesis Location Type: HA-MRSa, CA-MRSa
• Geographic Region: UK CA-MRSa, Australian CA-MRSa
• Various Differentia: Specific Strains
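Each class in this lattice can be read as Staphylococcus aureus plus a set of differentiae, so one class subsumes another when its differentiae are a subset of the other’s. This Python sketch encodes the named classes on the slide that way. The attribute names and values (for example “community-associated” for CA and “hospital-associated” for HA) are assumptions for illustration, and specific strains are omitted.

```python
# Differentiae required by each class; Sa itself requires none.
# Attribute names and values are assumptions for illustration.
SA = frozenset()
MSSA = SA | {("antibiotic resistance", "methicillin-susceptible")}
MRSA = SA | {("antibiotic resistance", "methicillin-resistant")}
HA_MRSA = MRSA | {("pathogenesis location type", "hospital-associated")}
CA_MRSA = MRSA | {("pathogenesis location type", "community-associated")}

CLASSES = {
    "Sa": SA,
    "MSSa": MSSA,
    "MRSa": MRSA,
    "HA-MRSa": HA_MRSA,
    "CA-MRSa": CA_MRSA,
    "UK CA-MRSa": CA_MRSA | {("geographic region", "UK")},
    "Australian CA-MRSa": CA_MRSA | {("geographic region", "Australia")},
}


def subsumes(general: str, specific: str) -> bool:
    """A class subsumes another when its differentiae are a subset of the other's."""
    return CLASSES[general] <= CLASSES[specific]


def classify(isolate: frozenset) -> list:
    """Every class whose differentiae the isolate satisfies."""
    return sorted(name for name, required in CLASSES.items() if required <= isolate)


isolate = frozenset({
    ("antibiotic resistance", "methicillin-resistant"),
    ("pathogenesis location type", "community-associated"),
    ("geographic region", "UK"),
})
print(classify(isolate))  # ['CA-MRSa', 'MRSa', 'Sa', 'UK CA-MRSa']
```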
Sample Application: A lattice of infectious disease application ontologies from NARSA isolate data
NARSA: Network on Antimicrobial Resistance in Staphylococcus aureus
– http://www.narsa.net/content/staphLinks.jsp
True personalized medicine – YourDiseaseOntology
Ways of differentiating Staphylococcus aureus infectious diseases
• Infectious Disease
– By host type
– By (sub-)species of pathogen
– By antibiotic resistance
– By anatomical site of infection
• Bacterial Infectious Disease
– By PFGE (Strain)
– By MLST (Sequence Type)
– By BURST (Clonal Complex)
• Sa Infectious Disease
– By SCCmec type
  • By ccr type
  • By mec class
– spa type
http://www.sccmec.org/Pages/SCC_ClassificationEN.html
Figure: NRS701’s resistance to clindamycin; files shown: ido.owl, narsa.owl, narsa-isolates.owl, ndf-rt
BFO: The Very Top
• continuant
  – independent continuant
  – dependent continuant: quality, function, role, disposition
• occurrent
Basic Formal Ontology
types
• Continuant
  – Independent Continuant (thing)
  – Dependent Continuant (quality)
• Occurrent (process, event)
instances
Basis of BFO in GO
• Continuant
  – Independent Continuant: cellular component
  – Dependent Continuant: molecular function
• Occurrent: biological process