LEWIS - Buffalo Ontology Site


PATO & Phenotypes:
From model organisms to clinical
medicine
Suzanna Lewis
September 4th, 2008
Signs, Symptoms and Findings Workshop
First Steps Toward an Ontology of Clinical Phenotypes
Describing phenotype using ontologies will aid in the
identification of models of disease & candidate
causative genes
 GWAS: Genome Wide Association Studies
 Any study of genetic variation across the entire
human genome that is designed to identify
genetic associations with observable traits
(such as blood pressure or weight), or the
presence or absence of a disease or condition.
 Given an identified gene, then what?
Animal disease models
Humans:        Mutant Gene → Mutant or missing Protein → Mutant Phenotype (disease)
Animal models: Mutant Gene → Mutant or missing Protein → Mutant Phenotype (disease model)
Phenotype data mining = text searching?
 Text-based phenotype resources:
 OMIM (NCBI)
 DECIPHER (Sanger)
 HGMD (Cardiff)
 Disease-specific databases
 MODs
 PubMed
Information retrieval from text-based
resources (OMIM) is not straightforward:
Query                      # of records
"large bone"               713
"enlarged bone"            136
"big bones"                16
"huge bones"               4
"massive bones"            28
"hyperplastic bones"       8
"hyperplastic bone"        34
"bone hyperplasia"         122
"increased bone growth"    543
Thanks to:
M Ashburner
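The spread of record counts above comes from treating each phrasing as a different query. A minimal sketch of the alternative, assuming a hand-built lookup from phrase to an (entity, quality) pair: the labels here are placeholders for ontology terms rather than real term IDs, and `normalize` and `group_by_term` are hypothetical helper names.

```python
# Toy lookup: each free-text variant maps to one (entity, quality) pair.
# The labels are placeholders for ontology terms, not real term IDs.
SYNONYMS = {
    "large bone": ("bone", "increased size"),
    "enlarged bone": ("bone", "increased size"),
    "big bones": ("bone", "increased size"),
    "huge bones": ("bone", "increased size"),
    "massive bones": ("bone", "increased size"),
    "hyperplastic bones": ("bone", "hyperplastic"),
    "hyperplastic bone": ("bone", "hyperplastic"),
    "bone hyperplasia": ("bone", "hyperplastic"),
}


def normalize(query):
    """Return the (entity, quality) pair for a free-text query, or None."""
    return SYNONYMS.get(query.strip().strip('"').lower())


def group_by_term(queries):
    """Group free-text queries under the term pair they resolve to."""
    groups = {}
    for q in queries:
        groups.setdefault(normalize(q), []).append(q)
    return groups
```

With such a mapping, eight of the nine phrasings in the table collapse to two term pairs; "increased bone growth" is left unmapped here because it arguably describes a process rather than a quality of the bone.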
Even if we can find what we are looking
for in one organism, how can we
associate that with phenotypes
observed in different organisms?
Methods to link phenotypic
descriptions of human diseases to
animal models currently don’t exist.
Goal: Turn text-based phenotypes into
ontology-based computable annotations
 Define a model for representing phenotypes
SHH-/+
SHH-/-
shh-/+
shh-/-
Phenotype (clinical sign) = entity + attribute
P1 = eye + hypoteloric
P2 = midface + hypoplastic
P3 = kidney + hypertrophied
Entity terms (eye, midface, kidney): ZFIN
Attribute terms (hypoteloric, hypoplastic, hypertrophied): PATO
Phenotype (clinical sign) = entity + attribute
Entity:
 Anatomical ontology
 Cell & tissue ontology
 Developmental ontology
 Gene ontology (biological process, molecular function, cellular component)
Attribute:
 PATO (phenotype and trait ontology)
Syndrome (disease) = P1 + P2 + P3 (package) = holoprosencephaly
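The entity + attribute model and the syndrome "package" can be sketched as two small record types. This is an illustration only: the class and field names are assumptions, and plain labels stand in for ontology term identifiers.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Phenotype:
    """One clinical sign: an entity plus an attribute (quality)."""
    entity: str   # label from an anatomy or process ontology
    quality: str  # label from PATO


@dataclass(frozen=True)
class Syndrome:
    """A named package of phenotypes."""
    name: str
    phenotypes: FrozenSet[Phenotype]


p1 = Phenotype("eye", "hypoteloric")
p2 = Phenotype("midface", "hypoplastic")
p3 = Phenotype("kidney", "hypertrophied")
hpe = Syndrome("holoprosencephaly", frozenset({p1, p2, p3}))
```

Because the records are frozen and hashable, two annotations with the same entity and quality compare equal, which is what makes set-based comparison across curators or species possible.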
Phenotype annotation model
[Diagram labels: Genetic; Environment; Evidence; Qualifier; Assertion; Source; Entity; relationship; Quality; Units; Properties; Attribution (who makes the assertion; when; what organization)]
OBD and annotations
[Diagram labels: annotation (subj, relation, obj); represents; investigator; experiment/investigation; observation; publish/create; read; communicate; information entity; bio-entity; computational representation; submit/consume; agent (human/computer); community/expert; direct query/annotation; meta-analysis; local db (three instances); multiple schemas]
Example labels: Shh-, Shh, Shh+ (bio-entity); influences; absence of aorta; participates in; heart development
Example source: “Sonic hedgehog is required for cardiac outflow tract and neural crest cell development”, Dev Biol 2005 Jul 15;283(2):357-72
Goal: Turn text-based phenotypes into
ontology-based computable annotations
 Define a model for representing phenotypes
 Develop and extend requisite ontologies
 For the entities being described: anatomies, processes, …
Building a suite of orthogonal interoperable
reference (evidence based) ontologies in the
biomedical domain.
It is critical that ontologies are developed
cooperatively so that
their classification strategies augment one
another.
Truth springs from arguments amongst
friends. (David Hume)
RELATION TO TIME (columns) by GRANULARITY (rows: ORGAN AND ORGANISM; CELL AND CELLULAR COMPONENT; MOLECULE)
CONTINUANT, INDEPENDENT: Organism (NCBI Taxonomy?); Anatomical Entity (FMA, CARO); Cell (CL); Cellular Component (FMA, GO); Molecule (ChEBI, SO, RnaO, PrO)
CONTINUANT, DEPENDENT: Organ Function (FMP, CPRO); Cellular Function (GO); Molecular Function (GO); Phenotypic Quality (PaTO)
OCCURRENT: Biological Process (GO); Molecular Process (GO)
Requisite ontologies
 An ontology of qualities (PATO)
 Organism specific anatomies
 A controlled vocabulary of homologous and
analogous anatomical structures (Uberon)
 Gene Ontology
 Cell Types
Goal: Turn text-based phenotypes into
ontology-based computable annotations
 Define a model for representing phenotypes
 Develop and extend requisite ontologies
 For the entities being described: anatomies, processes, …
 Develop an intuitive annotation environment for
rigorously capturing phenotypes (“semantic
authoring”)
Phenote: Simple software for
annotating using ontologies
 Provide tool for ontology-based annotation
 Standardized model to record annotations for increased
compatibility of data between disparate communities.
 Simple & intuitive user interface
 (especially for users that don’t know/care about what an
ontology is)
 Easy-to-configure for different user-communities
 Pluggable architecture for external applications to
interface/embed in application
 Provide interfaces with external SOAP and REST
services for streamlined workflow (OBD, NCBI,
EBI, etc).
 www.phenote.org
Ontologies can be utilized from various
resources in OWL and OBO format
BioPortal
Local file
External site
CVS
Phenote tour
Editor
Refining terms on-the-spot
 Post-composition:
 Join together 2 (or more) terms for specificity:
 Apoptosis of neuron in skin (GO,CL,FMA)
 S-phase of colon cancer cell (GO,CL)
 Aster of human spermatocyte (GO,FMA)
 Combine terms from different ontologies
 Increase “information content” of an annotation
 Pre-composed:
 Have decomposed definitions of ~2/3rds of MP
terms available to incorporate mouse data
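Post-composition can be sketched as a base term that is refined step by step with terms from other ontologies. This is a simplified, flat illustration: the `Term` and `PostComposed` classes are hypothetical, the connecting words "of" and "in" are taken from the slide's wording rather than from a formal relation ontology, and labels stand in for term IDs.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Term:
    ontology: str  # e.g. "GO", "CL", "FMA"
    label: str


@dataclass(frozen=True)
class PostComposed:
    """A base (genus) term refined by (relation, term) pairs."""
    genus: Term
    differentia: Tuple[Tuple[str, Term], ...] = ()

    def refine(self, relation: str, term: Term) -> "PostComposed":
        # Return a new, more specific expression; the original is unchanged.
        return PostComposed(self.genus, self.differentia + ((relation, term),))

    def ontologies(self):
        return {self.genus.ontology} | {t.ontology for _, t in self.differentia}

    def render(self) -> str:
        parts = [self.genus.label]
        parts += [f"{rel} {t.label}" for rel, t in self.differentia]
        return " ".join(parts)


expr = (
    PostComposed(Term("GO", "apoptosis"))
    .refine("of", Term("CL", "neuron"))
    .refine("in", Term("FMA", "skin"))
)
```

Each call to `refine` adds specificity without requiring a new pre-composed term in any single ontology, which is the point of the "increase information content" bullet.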
Term Info Browser
Annotation Table
Retrieve data from NCBI:
OMIM, PUBMED, …
(SOAP plug-in)
Graphical Viewer
Goal: Turn text-based phenotypes into
ontology-based computable annotations
 Define a model for representing phenotypes
 Develop and extend requisite ontologies
 For the entities being described: anatomies, processes, …
 Develop an intuitive annotation environment for
rigorously capturing phenotypes (“semantic
authoring”)
 Develop a set of guidelines for biocurators
 Annotate mutant phenotypes (OMIM and models)
General Annotation Standards
 Remarkable normality
 Absence
 Relative qualities (what does “small” mean?)
 Rates/frequencies
 does it inhere in the heart or a process?
 Homeotic transformation
 Phenotypes specific to a stage or temporal
duration
Testing the methodology
 Annotated 11 gene-linked human diseases
described in OMIM, and their homologs in
zebrafish and fruitfly.
 ATP2A1, BRODY MYOPATHY
 EPB41, ELLIPTOCYTOSIS
 EXT2, MULTIPLE EXOSTOSES
 EYA1, EYES ABSENT
 FECH, PROTOPORPHYRIA
 PAX2, RENAL-COLOBOMA SYNDROME
 SHH, HOLOPROSENCEPHALY
 SOX9, CAMPOMELIC DYSPLASIA
 SOX10, PERIPHERAL DEMYELINATING NEUROPATHY
 TNNT2, FAMILIAL HYPERTROPHIC CARDIOMYOPATHY
 TTN, MUSCULAR DYSTROPHY
An OMIM Record
Goal: Turn text-based phenotypes into
ontology-based computable annotations
 Define a model for representing phenotypes
 Develop and extend requisite ontologies
 For the entities being described: anatomies, processes, …
 Develop an intuitive annotation environment for
rigorously capturing phenotypes (“semantic
authoring”)
 Develop a set of guidelines for biocurators
 Annotate mutant phenotypes (OMIM and models)
 Collect & store annotations in a common resource
(OBD) and make these broadly available
4355 genes and genotypes in OBD
17782 entity-quality annotations in OBD
OBD model: Requirements
 Generic
 We can’t define a rigid schema for all of biomedicine
 Let the domain ontologies do the modeling of the domain
 Expressive
 Use cases vary from simple ‘tagging’ to complex
descriptions of biological phenomena
 Formal semantics
 Amenable to logical reasoning
 First Order Logic and/or OWL1.1
 Standards-compatible
 Integratable with semantic web
OBD Model: overview
 Graph-based: nodes and links
 Nodes: Classes, instances, relations
 Links: Relation instances
 Connect subject and object via relation plus additional
properties
 Annotations: Posited links with attribution / evidence
 Expressivity equivalent to RDF and OWL
 Links correspond to axioms and facts in OWL
 Attributed links:
 Named graphs
 Reification
 N-ary relation pattern
 Supports construction of complex descriptions
through graph model
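The overview above can be sketched as a small in-memory graph in which an annotation is a link plus its attribution. This is an illustration of the idea, not the OBD schema or API: all class, method and field names are assumptions, and the example triple uses labels from the earlier annotation diagram.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Link:
    """A relation instance connecting a subject node to an object node."""
    subject: str
    relation: str
    obj: str


@dataclass(frozen=True)
class Annotation:
    """A posited link together with its attribution and evidence."""
    link: Link
    source: str
    evidence: Optional[str] = None
    asserted_by: Optional[str] = None


class AnnotationGraph:
    def __init__(self) -> None:
        self.nodes = set()
        self.annotations: List[Annotation] = []

    def annotate(self, subject, relation, obj, source,
                 evidence=None, asserted_by=None):
        # Register both endpoints as nodes, then store the attributed link.
        self.nodes.update((subject, obj))
        ann = Annotation(Link(subject, relation, obj), source,
                         evidence, asserted_by)
        self.annotations.append(ann)
        return ann

    def about(self, subject):
        """All annotations whose link has the given subject."""
        return [a for a in self.annotations if a.link.subject == subject]


graph = AnnotationGraph()
graph.annotate("Shh-", "influences", "absence of aorta",
               source="Dev Biol 2005 Jul 15;283(2):357-72")
```

Keeping attribution on a wrapper around the link, rather than on the link itself, loosely mirrors the reification option listed above; named graphs or an n-ary pattern would be alternative encodings.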
OBD Dataflow
Example of Annotation in OBD
Post-composition of complex anatomical entity descriptions
Post-composition of phenotype classes (PATO EQ formalism)
OBD Architecture
 Two stacks
1. Semantic web stack
 Built using Sesame triplestore + OWLIM
 Future iterations: Science-commons Virtuoso
2. OBD-SQL stack
 Current focus
 Traditional enterprise architecture
 Plugs into Semantic Web stack via D2RQ
OBD-SQL Stack
 Alpha version of API implemented
 Test clients access via SOAP
 Phenote currently accesses via org.obo model & JDBC
(JDBC connectivity in org.obo)
 Wraps org.obo model and OBD schema
 Share relational abstraction layer
 Org.obo wraps OWLAPI
Goal: Turn text-based phenotypes into
ontology-based computable annotations
 Define a model for representing phenotypes
 Develop and extend requisite ontologies
 For the entities being described: anatomies, processes, …
 Develop an intuitive annotation environment for
rigorously capturing phenotypes (“semantic
authoring”)
 Develop a set of guidelines for biocurators
 Annotate mutant phenotypes (OMIM and models)
 Collect & store annotations in a common resource
(OBD) and make these broadly available
 Develop tools & resources for mining data for
novel discovery
 Developed a similarity search algorithm to identify
genotypes with similar phenotype.
sox9 mutations curated in PATO syntax
Human, SOX9
(Campomelic dysplasia)
Scapula: hypoplastic
Lower jaw: decreased size
Heart: malformed or edematous
Phalanges: decreased length
Long bones: bowed
Male sex determination: disrupted
Zebrafish, sox9a
(jellyfish)
Scapulocoracoid: aplastic
Cranial cartilage: hypoplastic
Heart: edematous
Pectoral fin: decreased length
Cartilage development: disrupted
Average annotation consistency
[Bar chart: # annotations (total annotations vs. similar annotations) per gene, with congruence]
Gene     Congruence
EYA1     0.78
SOX10    0.71
SOX9     0.61
PAX2     0.72
Reasoning over phenotype descriptions
recorded with ontologies provides
linkages in annotations.
Ontologies and reasoning
can reveal similarities in
phenotype annotations.
A zebrafish shh similar-phenotype query
returns known hedgehog pathway members
Gene    Similarity  Citation                              Role in hedgehog pathway
smo     0.445       Ochi et al., 2006                     Membrane protein binds shh receptor ptc1
disp1   0.444       Nakano et al., 2004                   Regulates secretion of lipid modified shh from midline
prdm1a  0.43        Roy et al., 2001                      Zinc-finger domain transcription factor, downstream target of shh signaling
hdac1   0.427       Cunliffe and Casaccia-Bonnefil, 2006  Transcriptional regulator required for shh mediated expression of olig2 in ventral hindbrain
scube2  0.398       Hollway et al., 2006                  May act during shh signal transduction at the plasma membrane
wnt11   0.380       Mullor et al., 2001                   Extracellular cysteine rich glycoprotein required for gli2/3 induced mesoderm development
gli2a   0.348       Karlstrom et al., 1999                Zinc finger transcription factor target of shh signaling
bmp2b   0.303       Ke et al., 2008                       Downstream target of gli2 gene repression
gli1    0.303       Karlstrom et al., 2003                Zinc finger transcription factor target of shh signaling
ndr2    0.289       Muller et al., 2000                   TGFbeta family member upstream of hedgehog signaling in the ventral neural tube
hhip    0.265       Ochi et al., 2006                     Binds shh in membrane and modulates interaction with smo
OBD similarity query
 A computational search that enables comparison
of phenotypes within and across species.
 Given a set of phenotype annotations recorded for a
mutant allele we can identify other alleles in the same
gene.
 We can identify other known pathway members in the
same species and known gene orthologs in other species
simply by comparing phenotypes alone.
 This annotation and search method provides a
novel means for laboratory researchers to identify
potential gene candidates participating in
regulatory and/or disease pathways.
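The slides do not spell out the similarity algorithm, so the following is only a sketch of one common approach and not the OBD method: Jaccard overlap between sets of (entity, quality) annotations, with each quality expanded to its ancestors so that related annotations can still match. The `IS_A` table, the profiles and the genotype names are toy assumptions.

```python
def ancestors(term, is_a):
    """Return all ancestors of a term by following single-parent is_a links."""
    found = []
    while term in is_a:
        term = is_a[term]
        if term in found:  # guard against a cyclic toy hierarchy
            break
        found.append(term)
    return found


def expand(profile, is_a):
    """Add, for every annotation, the same entity with each ancestor quality."""
    expanded = set()
    for entity, quality in profile:
        expanded.add((entity, quality))
        for parent in ancestors(quality, is_a):
            expanded.add((entity, parent))
    return expanded


def similarity(a, b, is_a=None):
    """Jaccard similarity of two annotation profiles after expansion."""
    is_a = is_a or {}
    a, b = expand(a, is_a), expand(b, is_a)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank(query, candidates, is_a=None):
    """Return (score, name) pairs, most similar candidate first."""
    scored = [(similarity(query, p, is_a), name)
              for name, p in candidates.items()]
    return sorted(scored, reverse=True)


# Toy quality hierarchy and profiles (assumptions for illustration only).
IS_A = {"decreased length": "decreased size"}
query = {("heart", "edematous"), ("fin", "decreased length")}
candidates = {
    "genotype_1": {("heart", "edematous"), ("fin", "decreased size")},
    "genotype_2": {("eye", "hypoplastic")},
}
```

Calling `rank(query, candidates, IS_A)` places `genotype_1` first. Without the `IS_A` expansion the two fin annotations would not match at all, which is the sense in which reasoning over the ontology "provides linkages in annotations".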
Summary of (some of) the challenges
 Curating the information
 Efficiency (pre-composed vs. post-composed)
 Consistency between curators
 Missing contextual information (genetic background and
environment)
 Observation vs. the inference made from this
 Representing homology (bones named by relative
position)
 Those attempting this anyway: zebrafish,
Drosophila, C. elegans, Cyprinoid fish (evolution),
Dictyostelium, mouse (many), Xenopus,
Paramecium…
Credit to
Berkeley
 Christopher Mungall
 Mark Gibson
 Nicole Washington
 Rob Bruggner
U of Oregon
 Monte Westerfield
 Melissa Haendel
National Institutes of Health
U of Cambridge
 Michael Ashburner
 George Gkoutos (PATO)
 David Osumi-Sutherland
OBO Foundry
 Michael Ashburner
 Christopher Mungall
 Alan Ruttenberg
 Richard Scheuermann
 Barry Smith