quality - National Center for Biomedical Ontology

GO, NCBO, Phenotypes, & the
OBO Foundry
January 29th, 2007
Ontologies for Biomedical
Investigations
La Jolla Institute for Allergy and Immunology
Suzanna Lewis
GO Consortium &
National Center for Biomedical Ontology
http://www.geneontology.org/
http://www.bioontology.org/
Outline
 Perspective on the challenge of our
mutual investment of time and effort on
standards, formalisms, and
representation
 GO case study retrospective
 NCBO today
 Phenotype case study
 OBO-Foundry
The Scientific Method
 A body of techniques for investigating phenomena
and acquiring new knowledge, as well as for
correcting and integrating previous knowledge. It is
based on observable, empirical, measurable
evidence, and subject to rules of reasoning.

Isaac Newton (1687, 1713, 1726). "Rules for the study of natural philosophy",
Philosophiae Naturalis Principia Mathematica, Book 3, The System of the World.
Third edition, the 4 rules as reprinted on pages 794-796 of I. Bernard Cohen and
Anne Whitman's 1999 translation, University of California Press, ISBN 0-520-08817-4, 974 pages.
Today’s data is in electronic
form
 Rules of reasoning are an intrinsic
element of scientific investigation
 In our current era, data reside in
electronic form
 Building and using computable
ontologies will support rules of
reasoning on our data
 And thereby support research in the
computer age.
Necessary Character of a
computational environment for
biological research
 Sustainable
 There must be a mechanism for maintaining the environment (one that
costs less than the initial investment).
 Adaptable
 It must work for the complete spectrum of data types, from
genomics to clinical trials
 It must continually adapt to new knowledge and new technologies
 Interoperable
 We need the capability of easily integrating data from a variety of
sources.
 Evolvable
 Mechanisms must be put in place to respond to the needs of the
biomedical research community. They provide the primary selection
pressure on the evolution of the technology.
10 Factors in achieving goals
1. Clarity of Vision/Goal
2. Political Landscape needs to support the goal
3. Decision-Process needs to be responsive and efficient
4. Message has to be brought to the community
5. Accountability (i.e. no vaporware)
6. Feasible within available resources
7. Providing incentives for adoption
8. Tactics must change with the adoption curve
9. Sustaining effort over multiple years
10. Being satisfied with highly imperfect, but pragmatic solutions.
Credit to John Glaser
Criteria for success, & signs of
failure
 Measurable evidence of improved productivity and efficiency
(time saved vs. number of users for given output)
 Evidence of learning from experience
 (e.g., sustained improvements in content volume and quality)
 Evidence of discoverability - enables positive outputs that were
unanticipated
 Continual, iterative process that occurs at each stage of
development and growth, fully integrated into the lifecycle
 Evidence of community acceptance, that more data is being
provided continuously
 Contains sufficient information to support reproducibility of results
 Negotiation of meaning has occurred
 Stymied by barriers regarding IP, and credit attribution
 High relative cost for maintenance, support, and boosting
interest
 Problems that occur between the cracks
"For a successful technology, reality must
take precedence over public relations, for
Nature cannot be fooled.”
Richard Feynman (1986)
GO case study
Three fundamental
dichotomies
1. types vs. instances
2. continuants vs. occurrents
3. dependent vs. independent
For example, in the GO’s 3 ontologies:
 molecular function: dependent continuant
 cellular component: independent continuant
 biological process: occurrent
Molecules, cell components, organisms are independent
continuants which have functions (these are dependent
continuants), and these functions may be realized as an
Specific Aims of the GO 2006
 We will maintain comprehensive, logically
rigorous and biologically accurate ontologies.
 We will comprehensively annotate 9
reference genomes in as complete detail as
possible.
 We will support annotation across all
organisms.
 We will provide our annotations and tools
to the research community.
Weaving and untangling the
GO
 Missing relations
 is_a completeness
 Adding new relations within single GO
ontology
 Adding “regulates” to BP
 Distinguishing different part_of relations
 Adding relations between GO axes
 Linking between MF & BP & CC
 Adding relations between GO & other
ontologies
 GO+Cell
 GO+anatomy
Implicit ontologies within the GO:
 cysteine biosynthesis (ChEBI)
 myoblast fusion (Cell Type Ontology)
 hydrogen ion transporter activity (ChEBI)
 snoRNA catabolism (Sequence Ontology)
 wing disc pattern formation (Drosophila anatomy)
 epidermal cell differentiation (Cell Type Ontology)
 regulation of flower development (Plant anatomy)
 interleukin-18 receptor complex (not yet in
Relations to Other Ontologies
[Figure: is_a hierarchies shown side by side.
CL: blood cell, lymphocyte, B-cell.
GO: cell differentiation, lymphocyte differentiation, B-cell activation, B-cell differentiation.]
CELL Ontology
[Term]
id: CL:0000236
name: B-cell
is_a: CL:0000542 ! lymphocyte
develops_from: CL:0000231 ! B-lymphoblast
Augmented GO
[Term]
id: GO:0030183
name: B-cell differentiation
is_a: GO:0042113 ! B-cell activation
is_a: GO:0030098 ! lymphocyte differentiation
intersection_of: is_a GO:0030154 ! cell differentiation
intersection_of: has_participant CL:0000236 ! B-cell
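The two stanzas above can be read mechanically. The following is a minimal sketch, not the GO Consortium's own tooling: it assumes only the tag: value layout visible in the stanzas, and the helper name parse_stanzas is hypothetical.

```python
from collections import defaultdict

# The two stanzas exactly as shown on the slides.
OBO_TEXT = """\
[Term]
id: CL:0000236
name: B-cell
is_a: CL:0000542 ! lymphocyte
develops_from: CL:0000231 ! B-lymphoblast

[Term]
id: GO:0030183
name: B-cell differentiation
is_a: GO:0042113 ! B-cell activation
is_a: GO:0030098 ! lymphocyte differentiation
intersection_of: is_a GO:0030154 ! cell differentiation
intersection_of: has_participant CL:0000236 ! B-cell
"""


def parse_stanzas(text):
    """Return one {tag: [values]} mapping per [Term] stanza (hypothetical helper)."""
    stanzas = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "[Term]":
            current = defaultdict(list)
            stanzas.append(current)
            continue
        if current is None:
            continue  # ignore anything before the first stanza
        tag, _, value = line.partition(": ")
        # Drop the trailing "! human-readable name" comment.
        value = value.split(" ! ")[0].strip()
        current[tag].append(value)
    return stanzas


terms = {s["id"][0]: s for s in parse_stanzas(OBO_TEXT)}
b_cell_diff = terms["GO:0030183"]
# The cross-ontology link is the intersection_of value that mentions a CL id.
cross_links = [v for v in b_cell_diff["intersection_of"] if "CL:" in v]
print(b_cell_diff["name"][0], "->", cross_links)
```

The point of the augmented stanza is visible in cross_links: the GO term now carries a machine-readable pointer into the Cell Ontology instead of only mentioning "B-cell" in its name.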
Correlation of mRNA
decay rates with (GO)
function
Genome Research 13:1863-1872, 2003
Decay Rates of Human mRNAs: Correlation With Functional Characteristics
and Sequence Attributes. E. Yang, E. van Nimwegen, M. Zavolan, N.
Rajewsky, M. Schroeder, M Magnasco and JE Darnell, Jr
How GO measures up
 Measurable evidence of improved productivity and efficiency
 Researchers simply use the GO, as judged by publications
 Evidence of learning from experience
 Formalism of the GO continues to improve
 Evidence of discoverability - enables positive outputs that were
unanticipated
 Primary use of GO is cluster analysis of microarray expression data
 Continual, iterative process that occurs at each stage of
development and growth, fully integrated into the lifecycle
 Quarterly updates of software
 Evidence of community acceptance, that more data is being
provided continuously
 Number of species continues to increase
The National Center for
Biomedical Ontology
BioPortal
Phenotype Annotation
NCBO’s 7 Cores
 Core 1: Computer science
 Core 2: Bioinformatics
 Core 3: Driving biological projects
 Core 4: Infrastructure
 Core 5: Education and Training
 Core 6: Dissemination
 Core 7: Administration
Who NCBO is
 Stanford: Tools for ontology alignment,
indexing, and management (Cores 1, 4–7:
Mark Musen)
 Lawrence Berkeley Lab: Tools to use
ontologies for data annotation (Cores 2, 5–7:
Suzanna Lewis)
 Mayo Clinic: Tools for access to large
controlled terminologies (Core 1: Chris
Chute)
 Victoria: Tools for ontology and data
visualization (Cores 1 and 2: Margaret-Anne
Storey)
NCBO Driving Biological Projects
 Trial Bank: UCSF, Ida Sim
 Flybase: Cambridge, Michael Ashburner
 ZFIN: Oregon, Monte Westerfield
BioPortal
 Indexes, searches and visualizes terms
in ontologies in library
 Uses LexGrid (Mayo)
 Contains ontologies that their editors
have released to BioPortal
The BioPortal Needs You!
 We need, and beg and plead for, your
feedback
 http://www.bioontology.org/ncbo/faces/index.xhtml
 For example: Providing URIs for all
ontologies and/or ontology content?
 Tomorrow depends on you; no request
is too mundane.
Phenotype work in progress
Animal disease models
 Humans: Mutant Gene, Mutant or missing Protein, Mutant Phenotype (disease)
 Animal models: Mutant Gene, Mutant or missing Protein, Mutant Phenotype (disease model)
SHH-/+
SHH-/-
shh-/+
shh-/-
Phenotype (clinical sign) = entity + quality
P1 = eye + hypoteloric
P2 = midface + hypoplastic
P3 = kidney + hypertrophied
Entity terms from ZFIN: eye, midface, kidney
Quality terms from PATO: hypoteloric, hypoplastic, hypertrophied
Phenotype (clinical sign) = entity + quality
 Entity: Anatomical ontology; Cell & tissue ontology; Developmental ontology; Gene ontology (biological process, cellular component)
 Quality: PATO (phenotype and trait ontology)
Phenotype (clinical sign) = entity + quality
P1 = eye + hypoteloric
P2 = midface + hypoplastic
P3 = kidney + hypertrophied
Syndrome (disease) = P1 + P2 + P3 = holoprosencephaly
[Figure panels: Human holoprosencephaly; Zebrafish shh; Zebrafish oep]
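The entity + quality decomposition can be sketched as a small data structure. This is an illustration only, not the PATO or Phenote data model: the class names Phenotype and Syndrome and the mutant annotation list are hypothetical, and plain strings stand in for ontology term identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phenotype:
    entity: str   # term from an entity ontology (e.g. an anatomy ontology)
    quality: str  # term from PATO

    def __str__(self):
        return f"{self.entity} + {self.quality}"


@dataclass(frozen=True)
class Syndrome:
    name: str
    phenotypes: frozenset

    def shared_with(self, observed):
        """Phenotypes of this syndrome that also appear in `observed`."""
        return self.phenotypes & frozenset(observed)


p1 = Phenotype("eye", "hypoteloric")
p2 = Phenotype("midface", "hypoplastic")
p3 = Phenotype("kidney", "hypertrophied")
holoprosencephaly = Syndrome("holoprosencephaly", frozenset({p1, p2, p3}))

# Hypothetical annotations for an animal model, invented for illustration.
mutant_annotations = [
    Phenotype("eye", "hypoteloric"),
    Phenotype("midface", "hypoplastic"),
]

overlap = holoprosencephaly.shared_with(mutant_annotations)
print(sorted(str(p) for p in overlap))
```

Because each phenotype is a pair of ontology terms rather than free text, comparing a human disease with an animal model reduces to set intersection over those pairs.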
Observable: Bone
Attribute: length
Value: short
Qualifier: very
Assay
Units
Conditions: Genetic contribution, Environmental contribution
Compound observable (disease?) | observable | attribute | value | units | qualifier | assay | conditions
obesity | body | mass | 30 | BMI | increased | body fat analysis | normal
obesity | blood | triglyceride levels | 400 | ? | elevated | blood test | normal
hypertension | artery | elasticity | 140/90 | mm Hg | | |
TIGR, December 6-7, 2002
PaTO upper level
 Unifying goal: Integrating data
 within and across domains (e.g. different
taxa)
 across levels of granularity
 across different perspectives
 Requires
 Rigorous formal definitions in both
ontologies and annotation schemas
Top level PaTO division: spatial vs temporal
(Note: some nodes omitted for brevity)
Quality
 Quality of a continuant: a quality which inheres in a continuant
 Nodes shown: physical quality, cellular quality, color, density, morphology, shape, size, structure
 Quality of an occurrent: a quality which inheres in a process or spatiotemporal region
 Nodes shown: duration, arrested, premature, delayed
Top level PaTO division: Granularity
Monadic quality of a continuant …
 Physical quality: a quality that exists through action of continuants at the physical level of organisation
 color: green, pink, yellow
 temperature: hot, cold
 mass: large mass, small mass
 Cellular quality: a quality that exists at the cellular level of organisation
 ploidy: diploid, haploid, aneuploid
 potency: multipotent, totipotent, oligopotent
 nucleate quality: anucleate, binucleate
 …
Monadic vs. relational quality of a continuant …
 Monadic quality of a C: a quality of a C that inheres solely in the bearer and does not require another entity
 Nodes shown: Physical quality, Cellular quality, morphology, shape, size, structure
 Relational quality of a C: a quality of a C that requires another entity apart from its bearer to exist
 Sensitivity (to)
 Displacement (with)
 Connectedness (to)
 …
Relational qualities involving
the environment
 “drought sensitivity” [TO:0000029]
 Directed towards an additional entity
type
 Q = PATO:sensitivity
 E2 = EO:drought
 Def: a sensitivity which is directed towards drought [ inheres_in organism ]
OBO needs a good environment ontology
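A relational quality needs one more slot than a monadic one: the second entity (E2) that the quality is directed towards. The sketch below is an assumption-laden illustration, not a PATO or Phenote schema: the class EQStatement, its field names, and the word "towards" in the rendered text are hypothetical; only Q, E2 and inheres_in come from the slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EQStatement:
    entity: str                    # bearer of the quality (E)
    quality: str                   # the quality term (Q)
    towards: Optional[str] = None  # additional entity (E2), relational qualities only

    def is_relational(self):
        return self.towards is not None

    def describe(self):
        text = f"{self.quality} inheres_in {self.entity}"
        if self.is_relational():
            text += f" towards {self.towards}"
        return text


# The slide's example: drought sensitivity [TO:0000029].
drought_sensitivity = EQStatement(
    entity="organism", quality="PATO:sensitivity", towards="EO:drought"
)
# A monadic quality for contrast, reusing an earlier example from the talk.
small_eye = EQStatement(entity="eye", quality="hypoteloric")

print(drought_sensitivity.describe())
print(small_eye.describe())
```

Keeping E2 as a separate, optional slot means the same quality term (sensitivity) can be reused against any environment term rather than minting one pre-composed term per stressor.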
What is Phenote?
A tool for annotating phenotypes
1. Curator reads about a phenotype in the
literature related to taxonomy or genotype
2. Curator enters genotype (or taxonomy)
3. Curator enters genetic context (optional)
4. Curator searches/enters Entity (e.g. Anatomy)
5. Curator searches/enters PATO attribute/value
Phenote
ZFIN integration
Also Phenote
Other ontologies…
 Anatomy
 Cell
 Chemical
 Drug
 Disease
 Environmental context
 ...
 Qualifier
 Unit
 GO - biological process
 GO - molecular function
 GO - cellular component
OBO-Foundry
OBO was conceived and
announced in 2001
10 October 2001
Michael Ashburner and Suzanna Lewis with
acknowledgements of others in the GO community.
open biology ontologies - obo.org
The success of GO leads us to propose the formation of
an umbrella body which we call OBO - open biology
ontologies. OBO would act as an umbrella site for GO
and other ontologies within the broad domain of
biology.
Open Biomedical Ontology
It is critical that ontologies are
developed cooperatively so that
their classification strategies
augment one another.
The OBO-Foundry is:
 foun·dry
 An establishment where metal is melted
and poured into molds
 OBO-foun·dry
 An establishment where scientific theory
is formalized and represented in
ontologies
The OBO Foundry
 The developers of OBO ontologies agree in
advance to accept a common set of principles
designed to assure
 intelligibility to biologist curators, annotators, users
 formal robustness
 stability
 compatibility
 interoperability
 support for logic-based reasoning
 To create the conditions for a step-by-step evolution towards robust gold
standard reference ontologies in the
biomedical domain.
 To introduce some of the features of
scientific peer review into biomedical
ontology development.
 obofoundry.org
RELATION TO TIME: CONTINUANT (INDEPENDENT, DEPENDENT) and OCCURRENT
GRANULARITY | INDEPENDENT CONTINUANT | DEPENDENT CONTINUANT | OCCURRENT
ORGAN AND ORGANISM | Organism (NCBI Taxonomy?); Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO); Phenotypic Quality (PaTO) | Biological Process (GO)
CELL AND CELLULAR COMPONENT | Cell (CL); Cellular Component (FMA, GO) | Cellular Function (GO) |
MOLECULE | Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Process (GO)
Building out from the original GO
Agree on relations
 The success of ontology alignment
demands that ontological relations
(is_a, part_of, ...) have the same
meanings in the different ontologies to
be aligned.
Genome Biology 6:R46, 2005.
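Why shared relation semantics matter can be shown with the simplest inference, transitivity of is_a. The sketch below is illustrative only: the edge table is read off the B-cell figure earlier in the talk and simplified (the CL and lymphocyte-to-cell-differentiation edges are assumptions about what that figure draws), and the function name ancestors is hypothetical.

```python
# is_a edges, child -> parents. Term names stand in for identifiers.
# Assumption: edges are a simplified reading of the earlier B-cell figure.
IS_A = {
    "B-cell differentiation": ["B-cell activation", "lymphocyte differentiation"],
    "lymphocyte differentiation": ["cell differentiation"],
    "B-cell": ["lymphocyte"],
    "lymphocyte": ["blood cell"],
}


def ancestors(term, edges):
    """All terms reachable from `term` by following is_a, assuming is_a is transitive."""
    seen = set()
    stack = [term]
    while stack:
        for parent in edges.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


go_ancestors = ancestors("B-cell differentiation", IS_A)
cl_ancestors = ancestors("B-cell", IS_A)
print(sorted(go_ancestors))
print(sorted(cl_ancestors))
```

The same traversal is applied to both ontologies. That is only sound if is_a means the same thing, with the same transitivity, in each of them, which is the agreement this slide asks for.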
Elements for Success 1
 A Community with a common vision
 A pool of talented and motivated
developers/scientists
 A mix of academic and commercial
 An organized, lightweight approach to
product development
 A leadership structure
 Communication
 A well-defined scope (our “business”)
Adapted from “Open Source Menu for Success”
Elements for Success 2: There
is no requirement that ontology
be done using any particular
technology.
With thanks to*
GO
 Michael Ashburner
 Judith Blake
 J. Michael Cherry
 David Hill
 Midori Harris
 Rex Chisholm
 And many more…
Berkeley
 Seth Carbon
 John Day-Richter
 Karen Eilbeck
 Mark Gibson
 Chris Mungall
 Shu Shengqiang
 Nicole Washington
Others…
 BIRN
 DictyBase
 FlyBase
 MGI
 NESCENT
 ZFIN
 And more…
NCBO
 Mark Musen
 Barry Smith
 Monte Westerfield
 Michael Ashburner
 Daniel Rubin
 And more…
*Without even going into our other projects: Apollo, SO, Chado, GMOD, DAS,