Transcript composite

Phenotype annotation
Chris Mungall
Lawrence Berkeley Labs
NCBO
GO
Outline
• Principles of Compositionality
• Tour of PATO
• Pre vs post composition
• Quantitative phenotypes
• Next steps
Phenotype annotation: why?
• To shed light on the relationships
between genes, environment and
phenotype
• To compare genes and phenotypes
across organisms
• To improve human health and wellbeing
Difficulties
• Phenotypes can be complex
– Descriptions are often composite
– Encompass relationships between different kinds
of entities, at different levels of granularity
– Different ways of describing the same thing
• Descriptions must be rigorous and
unambiguous
– Ensures meaningful analyses and comparisons
within and between organisms
Compositionality is essential
for describing phenotypes
• Compositionality is a principle of good
ontology design
– aka building blocks, cross-products,
normalised/modular design
– Create complex descriptions (definitions) from
simpler ones
• Descriptions can be composed at any
time
– Ontology construction time (pre-composition)
– Annotation time (post-composition)
An example of
compositionality
• Plasma membrane of spermatocyte
• Plasma membrane [GO CC]
• Spermatocyte [OBO Cell]
• Formal means of composition
• Genus-differentia
a plasma membrane which is part_of a spermatocyte
– Genus: plasma membrane [GO-CC]
– Differentia: part_of [OBO-REL] a spermatocyte [Cell]
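A minimal sketch of the genus-differentia pattern on this slide: a composed description is held as a genus term plus one or more (relation, filler) pairs. The class names `Term` and `GenusDifferentia` and the `render` method are illustrative assumptions, not part of any OBO tool.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Term:
    """A term drawn from some ontology (hypothetical helper type)."""
    ontology: str
    name: str


@dataclass(frozen=True)
class GenusDifferentia:
    """A composed description: a genus narrowed by (relation, filler) pairs."""
    genus: Term
    differentiae: Tuple[Tuple[str, Term], ...]

    def render(self) -> str:
        # Build the "a <genus> which is <relation> a <filler>" reading.
        parts = [f"{rel} a {filler.name}" for rel, filler in self.differentiae]
        return f"a {self.genus.name} which is " + " and is ".join(parts)


plasma_membrane = Term("GO-CC", "plasma membrane")
spermatocyte = Term("OBO Cell", "spermatocyte")
composed = GenusDifferentia(plasma_membrane, (("part_of", spermatocyte),))
print(composed.render())
```

Keeping the genus and each filler as references to existing terms, rather than as free text, is what lets the description be composed either at ontology construction time or at annotation time.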
Compositionality and ontology
tools
• Composition supported by:
– Phenote
– OBO-Edit
• Cross-product plugin
– Protégé-OWL
– SWOOP
– …and others
Advantage: Automatic DAG
calculation
a membrane which is part_of a germ cell
a plasma membrane which is part_of a spermatocyte
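The placement of a composed description in the DAG can be sketched as a subsumption check: one description subsumes another when its genus and its filler are each equal to, or an is_a ancestor of, the other's. The `IS_A` table below is a hand-written assumption standing in for links that real ontologies would supply, and the check is far simpler than what a real reasoner does.

```python
# Hypothetical is_a links; in practice these come from GO and the Cell ontology.
IS_A = {
    "plasma membrane": {"membrane"},
    "spermatocyte": {"germ cell"},
}


def ancestors(term):
    """Return the term itself plus everything reachable through is_a links."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in IS_A.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def subsumes(general, specific):
    """True if the (genus, relation, filler) triple `general` subsumes `specific`."""
    g_genus, g_rel, g_filler = general
    s_genus, s_rel, s_filler = specific
    return (
        g_rel == s_rel
        and g_genus in ancestors(s_genus)
        and g_filler in ancestors(s_filler)
    )


general = ("membrane", "part_of", "germ cell")
specific = ("plasma membrane", "part_of", "spermatocyte")
print(subsumes(general, specific))
print(subsumes(specific, general))
```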
The building blocks of
phenotype descriptions: EQ
• Entities and qualities (EQ)
– (Bearer) Entity
• E.g: compound eye, spermatocyte, blood, wing growth,
scale morphogenesis
– Quality (aka property, attribute)
• A kind of dependent continuant
• Defined in PATO
• E.g: green, hot, squamous, rugose, edematous, light-sensitivity, luminescent, ectopic, arrested, decomposed
Formal treatment of EQ
• We must be clear about what we mean
when we compose an E and a Q
– Otherwise we will have incomplete query
results and erroneous statistics in
annotations
– The meaning must be computable
• Formally,
an EQ description defines:
a Quality which inheres_in a bearer entity
Example
[Image panels: normal; eya[1]/eya[1]]
E                    Q
Cell death in eye    Increased rate
Eye disc cell        small
Eye disc cell        refractile
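The annotations in this example can be sketched as EQ records keyed by genotype. The `EQ` type, the `ANNOTATIONS` table and the `qualities_of` helper are hypothetical names chosen for illustration; plain strings stand in for ontology term identifiers.

```python
from typing import Dict, List, NamedTuple


class EQ(NamedTuple):
    """A phenotype description: a quality (Q) inhering in a bearer entity (E)."""
    entity: str
    quality: str

    def definition(self) -> str:
        # Mirrors the formal reading "a Quality which inheres_in a bearer entity".
        return f"{self.quality} which inheres_in {self.entity}"


# Annotations keyed by genotype; the values are the EQ pairs listed on the slide.
ANNOTATIONS: Dict[str, List[EQ]] = {
    "eya[1]/eya[1]": [
        EQ("cell death in eye", "increased rate"),
        EQ("eye disc cell", "small"),
        EQ("eye disc cell", "refractile"),
    ],
}


def qualities_of(genotype: str, entity: str) -> List[str]:
    """Return every quality annotated to `entity` for the given genotype."""
    return [eq.quality for eq in ANNOTATIONS.get(genotype, []) if eq.entity == entity]


for eq in ANNOTATIONS["eya[1]/eya[1]"]:
    print(eq.definition())
print(qualities_of("eya[1]/eya[1]", "eye disc cell"))
```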
Kinds of entities which can be
bearers of biological qualities
• Continuants (3D entities)
– Cell parts (GO)
– Cells (OBO Cell ontology)
– Gross anatomical entities (CARO, FMA,
flyAO, MA, zfishAO, …)
– Aggregates of organisms (?)
• Occurrents (4D entities)
– Biological processes (GO)
[Image panels: normal; eya[1]/eya[1]]
E                          Q [PATO]
Cell death in eye [GO]     Increased rate
Eye disc cell [FlyAO]      small
Eye disc cell [FlyAO]      refractile
Tour of PATO
• Tour from the top-down
• The top level of PATO has been built
according to formal ontological
principles
– This helps us define terms in a consistent
and unambiguous way
– The top level can be hidden from end-users by means of ontology views (aka
slims)
– Still subject to change
• Feedback welcome!
PATO: Top level division
(Note: some nodes omitted for brevity)
Quality
• Quality of a continuant: a quality which inheres in a continuant
– Nodes shown: physical quality, color, density, morphology, shape, size, structure, cellular quality
• Quality of an occurrent: a quality which inheres in a process or spatiotemporal region
– Nodes shown: duration, rate, arrested, premature, delayed
Divisions by granularity
Monadic quality of a continuant
• Physical quality: a quality that exists through action of continuants at the physical level of organisation
– color: green, pink, yellow
– temperature: hot, cold
– mass: large mass, small mass
– …
• Cellular quality: a quality that exists at the cellular level of organisation
– ploidy: diploid, haploid, aneuploid
– potency: multipotent, totipotent, oligopotent
– nucleate quality: anucleate, binucleate
– …
Monadic vs relational
quality of a continuant
• Monadic quality of a C: a quality of a C that inheres solely in the bearer and does not require another entity
– Nodes shown: physical quality, cellular quality, morphology, shape, size, structure, …
• Relational quality of a C: a quality of a C that requires another entity apart from its bearer to exist
– Sensitivity (to)
– Displacement (with)
– Connectedness (to)
– …
Example relational quality
• Sensitivity
– Directed towards some entity type
• E.g.
– Sensitivity of an eye to red light
• The quality inheres_in the eye
• With respect to (towards) red light
– Pheno-syntax:
• E= eye Q= sensitivity E2= red_light
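A minimal sketch of reading the `E= … Q= … E2= …` shorthand used on these slides into a dictionary. This is not a parser for the full Pheno-syntax format; the function name `parse_eq` and the assumption that each value is a single whitespace-free token are illustrative only.

```python
import re

# Matches a slot label (E, Q or E2), an equals sign, optional spaces, then one token.
SLOT = re.compile(r"\b(E2|E|Q)=\s*(\S+)")


def parse_eq(line):
    """Parse slide-style shorthand such as 'E= eye Q= sensitivity E2= red_light'."""
    return dict(SLOT.findall(line))


record = parse_eq("E= eye Q= sensitivity E2= red_light")
print(record)

# A monadic quality simply has no E2 slot.
print(parse_eq("E= MA:ear Q= PATO:big"))
```

Keeping E2 as an optional third slot lets monadic and relational qualities share one record shape.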
On absence
• Annotation patterns for absence, counts are
currently under discussion
• “spermatocyte devoid of asters”
– E= CL:spermatocyte
• Inheres in the spermatocyte
– Q= PATO:lacks_part
• The quality/relation of missing some part or parts
– E2= GO-CC:aster
• The quality is with respect to the type “aster”
Pre- vs post- composition
• When do we build the phenotype
description?
– In the ontology
– During annotation?
• Reconciling pre and post composition:
An analysis of the plant_trait ontology
When do we build the
phenotype description?
• Early?
– Pre-composed phenotype definitions
• MP:0000017 “big ears”
• TO:0000227 “root length”
• TO:0000029 “chlorine sensitivity”
• Late?
– Post-composed phenotype definitions
• E= MA:ear Q= PATO:big
• E= PO:root Q= PATO:length
• E= organism Q= PATO:sensitivity E2= CHEBI:chlorine
Is this comparable?
MP:0000285 “abnormal cardiac valve morphology”
MP:0000287 “heart valve hypoplasia”
?
E= MA:heart_valve Q=PATO:hypoplastic
PATO:0000051 “morphology”
PATO:0000141 “structure”
PATO:0000645 “hypoplastic”
Yes: if term is decomposable
MP:0000285 “abnormal cardiac valve morphology”
MP:0000287 “heart valve hypoplasia”
Def: a hypoplasticity which inheres_in a heart valve
=
E= MA:heart_valve Q=PATO:hypoplastic
PATO:0000051 “morphology”
PATO:0000141 “structure”
PATO:0000645 “hypoplastic”
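One way to sketch the comparison: reduce every annotation to an (entity, quality) pair, looking up a definition when the annotation is a pre-composed term ID. The `DEFINITIONS` table is a hypothetical stand-in populated only with pairings shown on the slides; a term with no definition, such as MP:0000285 here, cannot be compared this way.

```python
from typing import Dict, Optional, Tuple, Union

EQPair = Tuple[str, str]  # (entity, quality)

# Hypothetical lookup of EQ definitions for pre-composed terms.
DEFINITIONS: Dict[str, EQPair] = {
    "MP:0000287": ("MA:heart_valve", "PATO:hypoplastic"),
    "MP:0000017": ("MA:ear", "PATO:big"),
}


def normalise(annotation: Union[str, EQPair]) -> Optional[EQPair]:
    """Reduce an annotation to an (entity, quality) pair, or None if undefined."""
    if isinstance(annotation, tuple):
        return annotation  # already post-composed
    return DEFINITIONS.get(annotation)  # pre-composed: look up its definition


def comparable(a, b) -> bool:
    """Two annotations match when both decompose to the same EQ pair."""
    ea, eb = normalise(a), normalise(b)
    return ea is not None and ea == eb


print(comparable("MP:0000287", ("MA:heart_valve", "PATO:hypoplastic")))
print(comparable("MP:0000285", ("MA:heart_valve", "PATO:hypoplastic")))
```

Exact equality is the simplest possible test; a fuller comparison would also use is_a links between qualities and between anatomical entities.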
Comparing phenotypes
• We want to compare and query both within
and across species
– For gross anatomical phenotypes to be compared
across species, descriptions must be decomposed
or decomposable to anatomical terms
• Anatomical terms must be comparable
– Homology links
– CARO: Common Anatomy Reference Ontology
Case study: Defining plant
traits with PATO
• OBO Plant Trait ontology
• Pre-composed phenotype terms
– Analogous to OBO mammalian_phenotype
ontology
• Task: Define these terms with PATO
– A good test of PATO
– Demonstration of compositional approach
– Allows meaningful comparison across plant
species
– Pilot study before applying to metazoans
http://www.bioontology.org/wiki/index.php/PATO:Pre_vs_Post_Coordinating
Methods
• Creation of genus-differentia definitions
– First pass: Obol
– Second pass: manual editing
• Ontologies used
– PATO
– Plant anatomical entities (PO)
– Gramene environment (GEO)
– Chemical entities of biological interest (CHEBI)
– GO
Basic phenotype terms
• “root length” (TO:0000227)
– E= PO:root Q= PATO:length
– Formally:
Def: a length which inheres_in a root
Relational qualities involving
types of chemical
• “Chlorine sensitivity” [TO:0000029]
• Directed towards an additional entity
type
– Q= PATO:sensitivity E2= CHEBI:chlorine
Def: a sensitivity which is directed towards chlorine [ inheres_in organism ]
Relational qualities involving
the environment
• “drought sensitivity” [TO:0000029]
– Directed towards an additional entity type
– Q= PATO:sensitivity E2= EO:drought
Def: a sensitivity which is directed towards drought [ inheres_in organism ]
OBO needs a good environment ontology
Complex phenotypes
• “Chinsura boro”
– “Abortion of microspore development at trinucleate stage”
Def: an arrested quality which inheres_in ( microspore development which during trinucleate stage )
Results of plant_trait analysis
• 252/784 terms provided with genus-differentia definitions so far
• Helped find inconsistencies and
problems in the ontology
• New term suggestions for PATO
– proportionality
• Approach should work for animal
phenotype ontologies
Bacterial phenotypes
• Performed similar analysis on bacterial
phenotype terms
– Provided by Garrity & Hozzein
• Results (morphological only):
– 26 new terms added to PATO
– Rugose, rhizoidal, lobate, filamentous, …
– Todo: chemical utilization phenotypes
• Required:
– Ontologies for aggregates of organisms
– Assay ontology
Measurements
• Ontologies provide qualitative partitions
on the kinds of entities we find in nature
• We may also want to record quantitative
information
– Comes from measurements of qualities
– The measurement is not the phenotype
• Phenotypes exist independently of our
measurements of them
Measurement schema
• A measurement record consists of
– The quality being measured
• E.g. the length of a particular mouse tail
– The unit type
• From PATO UO
– A magnitude
• Floating point number
• Error measure [optional]
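The schema above can be sketched as a small record type. The field names and the numeric values are made up for illustration and are not taken from any dataset; the unit is a plain string standing in for a unit term.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Measurement:
    """One measurement record, following the four parts listed on the slide."""
    quality: str                   # the quality being measured
    unit: str                      # unit type, e.g. a unit ontology term
    magnitude: float               # floating point number
    error: Optional[float] = None  # error measure, optional

    def describe(self) -> str:
        text = f"{self.quality}: {self.magnitude} {self.unit}"
        if self.error is not None:
            text += f" (error {self.error})"
        return text


# Illustrative values only; not real data.
tail = Measurement("length of a particular mouse tail", "meter", 0.085, error=0.002)
print(tail.describe())
```

The record points at the quality rather than replacing it, which keeps the measurement separate from the phenotype it measures.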
Sample of PATO UO
• Unit
  – Base unit
    • Length unit
      – Angstrom
      – meter
    • Mass unit
      – Dalton
      – Gram
    • Substance unit
  – Derived unit
    • Concentration unit
      – pH
• Quality
  – Morphology
    • Size
      – length
  – Physical quality
    • Mass
Phenotype exchange formats
• Genotypes and phenotypes:
– Pheno-syntax
– Pheno-XML
• General purpose
– OWL (using canonical EQ encoding)
• Also has Obo equivalent
• GO annotation files
– Works with pre-coordinated terms only
OBD-Phenotype
• A database for phenotype associations
• Built on OBD framework
– Tuned for inference and reasoning
– Graph traversal built in from the start
• Results
– Annotations on data from OMIM, ZFIN and
FlyBase
– Currently too small a dataset to do analysis
Next steps
• Get PATO & Phenote used across
multiple organisms and projects
– MODs, BIRN, OMIM,
• Collect annotation data from multiple
sources in one repository (OBD)
– Both pre + post composed
– Demonstrate improved analysis of
annotation data using PATO
• filamentous - having thin filamentous extensions at its edge
• pleomorphic - a quality inhering in a cell by virtue of its ability to take on two or more different shapes during its life cycle
• pulvinate - shaped like a cushion or has a marked convex cushion-like form
• umbonate - having a knob or knoblike protuberance
• rugose - having many wrinkles or creases on the surface
• glistening - emitting or reflecting lots of light
• dull - emitting or reflecting little or no light
• viscid - covered with a sticky or clammy coating
• mucoid - consistency of mucus
• spiral - plane curve traced by a point circling about the center but at increasing distances from the center
• rhizoidal - having root-like extensions radiating from its center
• spiny - having spines, thorns or similar stiff projections on its surface
• warty - having a hard rough surface; not smooth
• curled - having parallel chains in undulate fashion on the border
• fragile - easily damaged or disrupted; brittle
• butyraceous - resembling butter in appearance and consistency
• undulate - having a wavy, shallow edge
• punctiform - small and resembling a point
• lobate - a morphological quality in which the bearer has deeply undulated edges forming lobes
• erose - having an irregularly toothed edge
• raised - a thick colony that appears above the medium surface with terraced edges
• convex - a shape that obtains by virtue of having inward facing edges; having a surface or boundary that curves or bulges outward, as the exterior of a sphere
Proportions
• “amylose to amylopectin ratio” TO:0000372
Def: a compositionality which is directed towards amylose relative_to amylopectin [ inheres_in organism ]