Lewis - Gene Ontology Consortium


NCBO, the OBO-Foundry, and you
GO User’s meeting
September 10th, 2006
Suzanna Lewis
GO Consortium & National Center for Biomedical Ontology
http://www.geneontology.org/
http://www.bioontology.org/
There is no requirement that ontology be done using any particular technology.
Three fundamental dichotomies
1. types vs. instances
2. continuants vs. occurrents
3. dependent vs. independent
For example, in the GO’s 3 ontologies:
cellular component: independent continuant
molecular function: dependent continuant
biological process: occurrent
Molecules, cell components, and organisms are independent continuants which have functions (these are dependent continuants), and these functions may be realized as an
A Portion of the OBO Library
Due Diligence is the 1st step!
• We keep reinventing the wheel
• We don’t even know what’s out there!
• We need tools to help us compare and contrast ontologies
• We need tools to keep track of ontology history and to compare versions
• We need infrastructure for connecting ontologies
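The call for tools that compare ontology versions can be illustrated with a minimal sketch. It assumes each version is available as a plain mapping from term ID to term name. The function name `diff_versions` and the `EX:` identifiers are hypothetical and do not come from any OBO tool.

```python
def diff_versions(old, new):
    """Compare two versions of an ontology given as {term_id: name} dicts."""
    old_ids, new_ids = set(old), set(new)
    return {
        "added": sorted(new_ids - old_ids),
        "removed": sorted(old_ids - new_ids),
        # Same ID in both versions, but the name has changed.
        "renamed": sorted(t for t in old_ids & new_ids if old[t] != new[t]),
    }


# Hypothetical term IDs and names, for illustration only.
v1 = {"EX:0001": "cell", "EX:0002": "B cell", "EX:0003": "T cell"}
v2 = {"EX:0001": "cell", "EX:0002": "B-cell", "EX:0004": "lymphocyte"}

report = diff_versions(v1, v2)
print(report)
```

A real comparison would also need to track changes to relations and definitions; this sketch only covers term IDs and names.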
Open Biomedical Ontologies: OBO Mark 1
• Initially a side-project of the Gene Ontology
• http://obo.sourceforge.net
  o ontology management and versioning
  o website
  o mailing lists
• limitations due to lack of resources
  o lacking ontology development support
  o little in the way of integration: neither ‘nuts-n-bolts’ nor semantic integration
The National Center for Biomedical Ontology
What is NCBO?
NCBO’s 7 Cores
• Core 1: Computer science
• Core 2: Bioinformatics
• Core 3: Driving biological projects
• Core 4: Infrastructure
• Core 5: Education and Training
• Core 6: Dissemination
• Core 7: Administration
Who NCBO is
• Stanford: Tools for ontology alignment, indexing, and management (Cores 1, 4–7: Mark Musen)
• Lawrence Berkeley Labs: Tools to use ontologies for data annotation (Cores 2, 5–7: Suzanna Lewis)
• Mayo Clinic: Tools for access to large controlled terminologies (Core 1: Chris Chute)
• Victoria: Tools for ontology and data visualization (Cores 1 and 2: Margaret-Anne Storey)
• University at Buffalo: Dissemination of best practices for ontology engineering (Core 6: Barry Smith)
cBio Driving Biological Projects
• Trial Bank: UCSF, Ida Sim
• FlyBase: Cambridge, Michael Ashburner
• ZFIN: Oregon, Monte Westerfield
Animal disease models
Humans: Mutant Gene, Mutant or missing Protein, Mutant Phenotype (disease)
Animal models: Mutant Gene, Mutant or missing Protein, Mutant Phenotype (disease model)
Example genotypes: SHH-/+, SHH-/-, shh-/+, shh-/-
Phenotype (clinical sign) = entity + quality
P1 = eye + hypoteloric
P2 = midface + hypoplastic
P3 = kidney + hypertrophied
Entity terms (ZFIN): eye, midface, kidney
Quality terms (PATO): hypoteloric, hypoplastic, hypertrophied
Entity sources: Anatomical ontology, Cell & tissue ontology, Developmental ontology, Gene ontology (biological process, cellular component)
Quality source: PATO (phenotype and trait ontology)
Syndrome (disease) = P1 + P2 + P3 = holoprosencephaly
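The entity + quality decomposition can be sketched as a small data structure. This is an illustrative sketch only: terms are plain strings rather than real ontology identifiers, and the class `Phenotype`, the function `shared_phenotypes`, and the observations for the animal model are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phenotype:
    entity: str   # what is affected, e.g. an anatomy term
    quality: str  # how it is affected, e.g. a PATO term


# Terms taken from the slides, used as plain strings.
P1 = Phenotype("eye", "hypoteloric")
P2 = Phenotype("midface", "hypoplastic")
P3 = Phenotype("kidney", "hypertrophied")

# A syndrome is modelled here as a set of phenotypes.
holoprosencephaly = frozenset({P1, P2, P3})


def shared_phenotypes(syndrome, observed):
    """Return the phenotypes of a syndrome that were also observed elsewhere."""
    return syndrome & frozenset(observed)


# Hypothetical observations for an animal model, for illustration only.
observed_in_model = [
    Phenotype("eye", "hypoteloric"),
    Phenotype("midface", "hypoplastic"),
]
overlap = shared_phenotypes(holoprosencephaly, observed_in_model)
print(sorted((p.entity, p.quality) for p in overlap))
```

Because both human and model-organism phenotypes are built from the same entity and quality terms, comparing them reduces to a set intersection.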
Human holoprosencephaly
Zebrafish shh
Zebrafish oep
What is Phenote?

A tool for annotating Phenotypes
1. Curator reads about a phenotype in the literature related to taxonomy or genotype
2. Curator enters genotype (or taxonomy)
3. Curator enters genetic context (optional)
4. Curator searches/enters Entity (e.g. Anatomy)
5. Curator searches/enters PATO attribute/value
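The curation steps above suggest what a single annotation record might hold. The sketch below is not Phenote's actual data model: the class `PhenotypeAnnotation`, its field names, and the helper `missing_fields` are assumptions made for illustration, and the sample values are not real curated annotations.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PhenotypeAnnotation:
    genotype: str                          # step 2: genotype (or taxonomy)
    entity: str                            # step 4: entity term, e.g. anatomy
    quality: str                           # step 5: PATO attribute/value
    genetic_context: Optional[str] = None  # step 3: optional
    source: Optional[str] = None           # step 1: the literature reference


def missing_fields(annotation):
    """List the required fields that the curator has left empty."""
    required = ("genotype", "entity", "quality")
    return [name for name in required if not getattr(annotation, name).strip()]


# Illustrative values only, built from terms that appear on the slides.
record = PhenotypeAnnotation(genotype="shh-/-", entity="eye", quality="hypoteloric")
incomplete = PhenotypeAnnotation(genotype="shh-/-", entity="", quality="hypoteloric")

print(missing_fields(record))
print(missing_fields(incomplete))
```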
A Portion of the OBO Library
OBO Mark II: Infrastructure
• Integrated access to all OBO ontologies
• Programmatic and user access
  o web interface
  o interface via tools (OBO-Edit, Protégé)
  o application programmer interfaces (APIs)
  o web services
• Advanced search facilities
  o LexGrid
• Visualization
• Ontology metadata
Return to GO
(do not collect $200)
Specific Aims of the GO 2006
• We will maintain comprehensive, logically rigorous and biologically accurate ontologies.
• We will comprehensively annotate 9 reference genomes in as complete detail as possible.
• We will support annotation across all organisms.
• We will provide our annotations and tools to the research community.
Weaving and untangling the GO
• Missing relations
  o is_a completeness
• Adding new relations within a single GO ontology
  o Adding “regulates” to BP
  o Distinguishing different part_of relations
• Adding relations between GO axes
  o Linking between MF & BP & CC
• Adding relations between GO & other ontologies
  o GO+Cell
  o GO+anatomy
Implicit ontologies within the GO:
• cysteine biosynthesis (ChEBI)
• myoblast fusion (Cell Type Ontology)
• hydrogen ion transporter activity (ChEBI)
• snoRNA catabolism (Sequence Ontology)
• wing disc pattern formation (Drosophila anatomy)
• epidermal cell differentiation (Cell Type Ontology)
• regulation of flower development (Plant anatomy)
• interleukin-18 receptor complex (not yet in
Obol produces genus-differentia logical definition
[Diagram: Obol workflow connecting go.obo, cell.obo, the name parser, obol config, OBO editor (oboedit), GO editor, reasoner, ‘fixed’ go, and obol report]
Relations to Other Ontologies
CL (is_a hierarchy): blood cell, lymphocyte, B-cell
GO (is_a hierarchy): cell differentiation, lymphocyte differentiation, B-cell activation, B-cell differentiation
CELL Ontology
[Term]
id: CL:0000236
name: B-cell
is_a: CL:0000542 ! lymphocyte
develops_from: CL:0000231 ! B-lymphoblast
Augmented GO
[Term]
id: GO:0030183
name: B-cell differentiation
is_a: GO:0042113 ! B-cell activation
is_a: GO:0030098 ! lymphocyte differentiation
intersection_of: is_a GO:0030154 ! cell differentiation
intersection_of: has_participant CL:0000236 ! B-cell
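To show how the augmented stanza carries a genus-differentia definition, here is a minimal sketch that reads `[Term]` stanzas and splits the `intersection_of` lines into a genus and its differentia. It is a toy reader that handles only the tag-value lines shown on the slide, not a full OBO parser. The function names `parse_stanzas` and `logical_definition` are hypothetical.

```python
def parse_stanzas(text):
    """Parse [Term] stanzas into dicts mapping each tag to a list of values."""
    terms = []
    current = None
    for raw in text.splitlines():
        line = raw.split("!", 1)[0].strip()  # drop the trailing "! comment"
        if line == "[Term]":
            current = {}
            terms.append(current)
        elif line and current is not None and ":" in line:
            tag, value = line.split(":", 1)
            current.setdefault(tag.strip(), []).append(value.strip())
    return terms


def logical_definition(term):
    """Return (genus, [(relation, filler), ...]) from intersection_of lines."""
    genus, differentia = None, []
    for value in term.get("intersection_of", []):
        parts = value.split()
        if not parts:
            continue
        if len(parts) == 1:          # a bare ID names the genus
            genus = parts[0]
        elif parts[0] == "is_a":     # slide syntax: "is_a <ID>" names the genus
            genus = parts[1]
        else:                        # "<relation> <ID>" is a differentia
            differentia.append((parts[0], parts[1]))
    return genus, differentia


STANZA = """\
[Term]
id: GO:0030183
name: B-cell differentiation
is_a: GO:0042113 ! B-cell activation
is_a: GO:0030098 ! lymphocyte differentiation
intersection_of: is_a GO:0030154 ! cell differentiation
intersection_of: has_participant CL:0000236 ! B-cell
"""

term = parse_stanzas(STANZA)[0]
genus, differentia = logical_definition(term)
print(term["id"][0], genus, differentia)
```

Read this way, B-cell differentiation is defined as a cell differentiation (genus) that has_participant B-cell (differentia), which is what links the GO term to the Cell Ontology.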
There are many less than perfect ontologies
Use the power of combination and collaboration
• Ontologies are like telephones: they are valuable only to the degree that they are used and networked with other ontologies
• But to work telephones must be connected
• Like telephones, most ontologies were broken when the technology was first being developed
The OBO-Foundry is:
• foun·dry
  o An establishment where metal is melted and poured into molds
• OBO-foun·dry
  o An establishment where scientific theory is formalized and represented in ontologies
• To create the conditions for a step-by-step evolution towards robust gold standard reference ontologies in the biomedical domain.
• To introduce some of the features of scientific peer review into biomedical ontology development.
• obofoundry.org
The OBO Foundry
A subset of OBO ontologies whose developers agree in advance to accept a common set of principles designed to assure
• intelligibility to biologist curators, annotators, users
• formal robustness
• stability
• compatibility
• interoperability
• support for logic-based reasoning
RELATION TO TIME: CONTINUANT (INDEPENDENT, DEPENDENT) vs. OCCURRENT
GRANULARITY: ORGAN AND ORGANISM
  Independent continuant: Organism (NCBI Taxonomy?); Anatomical Entity (FMA, CARO)
  Dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
  Occurrent: Biological Process (GO)
GRANULARITY: CELL AND CELLULAR COMPONENT
  Independent continuant: Cell (CL); Cellular Component (FMA, GO)
  Dependent continuant: Cellular Function (GO)
GRANULARITY: MOLECULE
  Independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
  Dependent continuant: Molecular Function (GO)
  Occurrent: Molecular Process (GO)
December 1st & 2nd
Building out from the original GO
CRITERIA
• The ontology is OPEN and available to be used by all.
• The ontology is in, or can be instantiated in, a COMMON FORMAL LANGUAGE.
• The developers of the ontology agree in advance to COLLABORATE with developers of other OBO Foundry ontologies where domains overlap.
The OBO Foundry
http://obofoundry.org/
CRITERIA
• UPDATE: The developers of each ontology commit to its maintenance in light of scientific advance, and to soliciting community feedback for its improvement.
• ORTHOGONALITY: They commit to working with other Foundry members to ensure that, for any particular domain, there is community convergence on a single controlled vocabulary.
• Orthogonality of ontologies implies additivity of annotations
• If we annotate a database or body of literature with one high-quality biomedical ontology, we should be able to add annotations from a second such ontology without conflicts
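Additivity can be sketched as a merge that succeeds only when the two annotation sets draw on different ontologies. The sketch below uses disjoint ID prefixes as a simple stand-in for orthogonality, which is an assumption of this example and not the Foundry's definition. The function names and the gene identifiers `geneA` and `geneB` are hypothetical.

```python
def id_space(term_id):
    """Return the prefix before the colon, e.g. 'GO' for 'GO:0030183'."""
    return term_id.split(":", 1)[0]


def add_annotations(existing, new):
    """Merge two {item: set_of_term_ids} maps, refusing overlapping ID spaces."""
    spaces_existing = {id_space(t) for terms in existing.values() for t in terms}
    spaces_new = {id_space(t) for terms in new.values() for t in terms}
    overlap = spaces_existing & spaces_new
    if overlap:
        raise ValueError(f"shared ID spaces: {sorted(overlap)}")
    merged = {item: set(terms) for item, terms in existing.items()}
    for item, terms in new.items():
        merged.setdefault(item, set()).update(terms)
    return merged


# Hypothetical gene identifiers; the term IDs are those shown on the slides.
go_annotations = {"geneA": {"GO:0030183"}, "geneB": {"GO:0030154"}}
cl_annotations = {"geneA": {"CL:0000236"}}

combined = add_annotations(go_annotations, cl_annotations)
print(sorted(combined["geneA"]))
```

When the ontologies cover separate domains, the second set of annotations simply accumulates alongside the first, with nothing to reconcile.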
CRITERIA
• IDENTIFIERS: The ontology possesses a unique identifier space within OBO.
• VERSIONING: The ontology provider has procedures for identifying distinct successive versions to ensure BACKWARDS COMPATIBILITY with annotation resources already in common use.
• The ontology includes TEXTUAL DEFINITIONS and where possible equivalent formal definitions of its terms.
CRITERIA
• CLEARLY BOUNDED: The ontology has a clearly specified and clearly delineated content.
• DOCUMENTATION: The ontology is well-documented.
• USERS: The ontology has a plurality of independent users.
CRITERIA
• AGREE ON RELATIONS: The ontology uses relations which are unambiguously defined following the pattern of definitions laid down in the OBO Relation Ontology.*
• The success of ontology alignment demands that ontological relations (is_a, part_of, ...) have the same meanings in the different ontologies to be aligned.
* Genome Biology 6:R46, 2005.
Elements for Success 1
• A Community with a common vision
• A pool of talented and motivated developers/scientists
• A mix of academic and commercial
• An organized, lightweight approach to product development
• A leadership structure
• Communication
• A well-defined scope (our “business”)
Adopted from “Open Source Menu for Success”
Elements for Success 2
• Keep It Simple:
  o lowest possible barrier to entry
  o Technology independence
• “With new data, we change our minds”
  o An ontology must adapt to reflect current understanding of reality
  o Plan for and anticipate changes
• Stay close to your users
  o biologists and medical researchers
Ontology: A thing of beauty is a joy forever
With acknowledgement and thanks to
Berkeley
• Seth Carbon
• John Day-Richter
• Karen Eilbeck
• Mark Gibson
• Sima Misra
• Chris Mungall
• Shu Shengqiang
• Nicole Washington
GO
• Michael Ashburner
• Judith Blake
• J. Michael Cherry
• David Hill
• Midori Harris
• Rex Chisholm
• And many more…
NCBO
• Mark Musen
• Chris Chute
• Barry Smith
• Daniel Rubin
• Monte Westerfield
• Michael Ashburner
• And many more…
*Without even going into our other projects: Apollo, SO, Chado, GMOD, DAS, Reactome, BOP, and more…