
amo
amos
amot
amomus
amotis
amont
.
Happy birthday Swiss-Prot
Fortaleza August 2006
Three (Orthogonal) Ontologies
• Biological Process
– Goal or objective within cell, tissue ..
• Molecular Function
– Elemental activity or task
• Cellular Component
– Location or complex
Content of GO
• molecular function: 7,432 terms
• biological process: 10,740 terms
• cellular component: 1,772 terms
• all: 19,994 terms
definitions: 19,042 (96%)
!version v.4.2
!date 4 November 1998
!author Michael Ashburner
$Gene Ontology ; GO:0000001 ; remark:
$function ; GO:0000002 ; remark:
%macromolecule ; GO:0000003 ; remark:
%protein ; GO:0000004 ; remark:
%enzyme ; GO:0000005 ; remark:
%alpha-alpha-trehalase ; GO:0000006 ; remark: ; EC:3.2.1.28
%alpha-alpha-trehalose-phosphate synthase (UDP-forming) ; GO:0000007 ; remark: ; EC:2.4.1.15
%alpha-L-fucosidase ; GO:0000008 ; remark: ; EC:3.2.1.51
%alpha-N-acetylglucosaminidase ; GO:0000009 ; remark: ; EC:3.2.1.50
%alpha-amylase ; GO:0000010 ; remark: ; EC:3.2.1.1
%alpha-glucosidase II ; GO:0000011 ; remark: ; EC:3.1.2.20
%alpha-ketoacid dehydrogenase complex ; GO:0000012 ; remark:
<oxoglutarate dehydrogenase (lipoamide) ; GO:0000013 ; remark: ; EC:1.2.4.2
....
%DNA-directed DNA polymerase ; GO:0000054 ; remark: ; EC:2.7.7.7
%nuclear DNA-directed DNA polymerase ; GO:0000055 ; remark:
%alpha DNA polymerase ; GO:0000056 ; remark:
<alpha DNA polymerase, 180Kd-subunit ; GO:0000057 ; remark:
ma11> wc gene_ontology.v4.1
3081 22643 192480 gene_ontology.v4.1
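A minimal sketch of how lines in the 1998 flat file shown above could be read. It assumes the leading symbol encodes the relationship to the parent term (`$` for the root, `%` for is_a, `<` for part_of) and that fields are separated by semicolons; the names `FlatTerm` and `parse_flat_line` are hypothetical, not part of any GO tool.

```python
from typing import NamedTuple, Optional

# Assumed meaning of the leading symbol in the legacy flat file.
RELATION = {"$": "root", "%": "is_a", "<": "part_of"}


class FlatTerm(NamedTuple):
    relation: str
    name: str
    go_id: str
    ec: Optional[str]


def parse_flat_line(line: str) -> Optional[FlatTerm]:
    """Parse one term line; return None for headers, elisions and blanks."""
    line = line.strip()
    if not line or line[0] not in RELATION:
        return None
    fields = [f.strip() for f in line[1:].split(";")]
    name = fields[0]
    go_id = next((f for f in fields if f.startswith("GO:")), None)
    if go_id is None:
        return None
    # The EC cross-reference is optional.
    ec = next((f[len("EC:"):] for f in fields if f.startswith("EC:")), None)
    return FlatTerm(RELATION[line[0]], name, go_id, ec)


term = parse_flat_line("%alpha-amylase ; GO:0000010 ; remark: ; EC:3.2.1.1")
print(term)
```

The transcript has lost the indentation that carried the depth of each term, so this sketch recovers only the relation type and cross-references, not the tree itself.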
Banbury Center, CSH Labs, August 1998
The founding meeting of the Gene Ontology Consortium
Problems with the GO:
• The is_a and part_of relationships are poorly defined and not used consistently.
• It carries a baggage of implicit ontologies.
• It lacks relationships between the three GO ontologies.
Implicit ontologies within the GO:
• cysteine biosynthesis (ChEBI)
• myoblast fusion (Cell Type Ontology)
• hydrogen ion transporter activity (ChEBI)
• snoRNA catabolism (Sequence Ontology)
• wing disc pattern formation (Drosophila anatomy)
• epidermal cell differentiation (Cell Type Ontology)
• regulation of flower development (Plant anatomy)
• interleukin-18 receptor complex (not yet in OBO)
• B-cell differentiation (Cell Type Ontology)
Integrating ontologies
[Figure: CL terms (blood cell, lymphocyte, B-cell) shown alongside GO terms (cell differentiation, lymphocyte differentiation, B-cell activation, B-cell differentiation), connected by is_a links]
CELL Ontology
[Term]
id: CL:0000236
name: B-cell
is_a: CL:0000542 ! lymphocyte
develops_from: CL:0000231 ! B-lymphoblast
Augmented GO
[Term]
id: GO:0030183
name: B-cell differentiation
is_a: GO:0042113 ! B-cell activation
is_a: GO:0030098 ! lymphocyte differentiation
intersection_of: is_a GO:0030154 ! cell differentiation
intersection_of: has_participant CL:0000236 ! B-cell
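The augmented stanza above defines B-cell differentiation as the intersection of a GO genus (cell differentiation) and a link to a Cell Ontology term. The following is a rough sketch of how such a stanza could be read and the cross-product pulled out. The parser is deliberately tiny and handles only the tag forms shown on the slide; `parse_stanza` and `cross_product` are hypothetical names.

```python
from collections import defaultdict

STANZA = """\
[Term]
id: GO:0030183
name: B-cell differentiation
is_a: GO:0042113 ! B-cell activation
is_a: GO:0030098 ! lymphocyte differentiation
intersection_of: is_a GO:0030154 ! cell differentiation
intersection_of: has_participant CL:0000236 ! B-cell
"""


def parse_stanza(text):
    """Collect tag -> list of values, dropping trailing '!' comments."""
    tags = defaultdict(list)
    for line in text.splitlines():
        line = line.split("!", 1)[0].strip()
        if not line or line.startswith("["):
            continue
        tag, value = line.split(":", 1)
        tags[tag.strip()].append(value.strip())
    return dict(tags)


def cross_product(tags):
    """Return (genus, [(relation, filler), ...]) from intersection_of tags."""
    genus, differentiae = None, []
    for value in tags.get("intersection_of", []):
        parts = value.split()
        if len(parts) == 1:          # bare identifier form
            genus = parts[0]
        elif parts[0] == "is_a":     # the form used on the slide
            genus = parts[1]
        else:
            differentiae.append((parts[0], parts[1]))
    return genus, differentiae


tags = parse_stanza(STANZA)
print(cross_product(tags))
```

Read this way, the GO term points explicitly at CL:0000236, so the Cell Ontology no longer has to be implicit inside the GO term name.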
molecular_function
biological_process
obo.sf.net
obo
www.bioontology.org
obofoundry.org
The OBO Foundry
• To create the conditions for a step-by-step evolution towards robust gold standard reference ontologies in the biomedical domain.
• To introduce some of the features of scientific peer review into biomedical ontology development.
The OBO Foundry
A subset of OBO ontologies whose developers agree in advance to accept a common set of principles designed to assure
– intelligibility to biologist curators, annotators, users
– formal robustness
– stability
– compatibility
– interoperability
– support for logic-based reasoning
The OBO Foundry
• The ontology is open and available to be used by all.
• The developers of the ontology agree in advance to collaborate with developers of other OBO Foundry ontologies where domains overlap. The importance of community collaboration cannot be overstated.
• The ontology is in, or can be instantiated in, a common formal language.
• The ontology possesses a unique identifier space within OBO.
• The ontology provider has procedures for identifying distinct successive versions.
The OBO Foundry
• The ontology has a clearly specified and clearly delineated content.
• The ontology includes textual definitions for all terms.
• The ontology is well-documented.
• The ontology has a plurality of independent users.
• The ontology uses relations which are unambiguously defined following the pattern of definitions laid down in the OBO Relation Ontology.
• Foundational relations
  – is_a
  – part_of
• Spatial relations
  – located_in
  – contained_in
  – adjacent_to
• Temporal relations
  – transformation_of
  – derives_from
  – preceded_by
• Participation relations
  – has_participant
  – has_agent
regulates
Genome Biology 6:R46, 2005.
Good ontologies require:
• Consistent use of terms, supported by logically coherent (non-circular) definitions, in equivalent human-readable and computable formats
• Coherent shared treatment of relations, to allow cascading inference both within and between ontologies
Each node of an ontology
consists of:
• preferred term
• term identifier
• synonyms
• definition, glosses, comments
Ontology =
A Representation of Types
Nodes in an ontology are connected by relations,
primarily is_a (= is subtype of) and part_of,
designed to support search, reasoning and annotation
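As an illustration of the two slides above, here is one possible in-memory shape for a node and a search that follows is_a links. The field names mirror the slide (preferred term, identifier, synonyms, definition); the class `Node` and the functions `ancestors` and `search` are hypothetical, and the two example terms are the Cell Ontology entries quoted earlier in the talk.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    identifier: str
    preferred_term: str
    synonyms: list = field(default_factory=list)
    definition: str = ""
    is_a: list = field(default_factory=list)      # parent identifiers
    part_of: list = field(default_factory=list)   # whole identifiers


def ancestors(ontology, identifier):
    """All types reachable from a node by following is_a upward."""
    seen = set()
    stack = list(ontology[identifier].is_a)
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(ontology[parent].is_a)
    return seen


def search(ontology, type_id):
    """Identifiers of a type and of every subtype beneath it."""
    return {i for i in ontology
            if i == type_id or type_id in ancestors(ontology, i)}


ontology = {
    "CL:0000542": Node("CL:0000542", "lymphocyte"),
    "CL:0000236": Node("CL:0000236", "B-cell", is_a=["CL:0000542"]),
}
print(search(ontology, "CL:0000542"))
```

A query for lymphocyte therefore also returns anything annotated to B-cell, which is the kind of search the is_a relation is meant to support.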
The aims of SO
1. Develop a shared set of terms and concepts to
annotate biological sequences.
2. Apply these in our separate projects to provide
consistent query capabilities between them.
3. Provide a software resource to assist in the
application and distribution of SO.
The scope of the SO
1. Features that can be located on a sequence with coordinates: exon, promoter, binding_site
2. Properties of these features:
  – Sequence attributes
    • Maternally_imprinted_gene
  – Consequences of mutation
    • mutation_affecting_editing
  – Chromosome variation
    • aneuploid
What is a pseudogene?
• Human
– A sequence similar to a known protein but containing frameshift(s) and/or stop codons that disrupt the ORF.
• Neisseria
– A gene that is inactive but may be activated by translocation (e.g. by gene conversion) to a new chromosome site.
– Note: such a gene would be called a “cassette” in yeast.
Give me all the dicistronic
genes
• Define a dicistronic gene in terms of the cardinality of the
transcript to open-reading-frame relationship and the
spatial arrangement of open-reading frames.
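The definition above can be sketched as a query. The reading used here is an assumption, not the SO definition itself: a transcript counts as dicistronic when it has exactly two open reading frames (the cardinality condition) and they do not overlap (the spatial condition). The names `ORF` and `is_dicistronic`, and the toy coordinates, are made up for illustration.

```python
from typing import NamedTuple


class ORF(NamedTuple):
    start: int  # inclusive
    end: int    # exclusive


def is_dicistronic(orfs):
    """Assumed rule: exactly two ORFs that do not overlap."""
    if len(orfs) != 2:
        return False
    first, second = sorted(orfs)
    return first.end <= second.start


# Hypothetical transcripts mapped to the ORFs located on them.
transcripts = {
    "tx_mono": [ORF(10, 310)],
    "tx_di": [ORF(10, 310), ORF(350, 650)],
    "tx_overlap": [ORF(10, 310), ORF(200, 500)],
}

dicistronic = [name for name, orfs in transcripts.items()
               if is_dicistronic(orfs)]
print(dicistronic)
```

The point of the slide is that the query needs no dicistronic label stored on each gene; it falls out of relationships that are already annotated.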
• is_a: 927 relationships
• part_of: 186 relationships
holonym
meronym
Relationships allow
reasoning.
• VALIDATION - We can check the internal
consistency of an annotation against the
ontology. We can also check that any
topological assertions are true.
•  3’ UTR part_of mRNA
•  intron part_of mRNA
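A rough sketch of the validation idea: each part_of assertion in an annotation is checked against the part_of links recorded in the ontology. The small `PART_OF` table is an assumption made for this example only; in it, an intron is recorded as part of the primary transcript rather than of the mRNA, so the second assertion above would be reported as inconsistent.

```python
# Toy part_of links (assumed for illustration, not copied from SO).
PART_OF = {
    "three_prime_UTR": {"mRNA"},
    "intron": {"primary_transcript"},
}


def is_part_of(part, whole, part_of=PART_OF):
    """True if 'whole' is reachable from 'part' via part_of links."""
    stack, seen = [part], set()
    while stack:
        current = stack.pop()
        for parent in part_of.get(current, ()):
            if parent == whole:
                return True
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return False


def validate(assertions):
    """Return the assertions that the ontology does not support."""
    return [(p, w) for p, w in assertions if not is_part_of(p, w)]


assertions = [("three_prime_UTR", "mRNA"), ("intron", "mRNA")]
print(validate(assertions))
```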
Classical Extensional Mereology
• The formal properties of parts:
1. If A is a proper part of B then B is not a part of A (nothing is a proper part of itself)
2. If A is a part of B and B is a part of C then A is a part of C
• Because of these rules, we can apply functions to parts…
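The two rules can be written compactly. The notation is an assumption introduced here, not taken from the slides: P(A,B) reads "A is a part of B" and PP(A,B) reads "A is a proper part of B".

```latex
\begin{align}
  \mathrm{PP}(A,B) &\rightarrow \neg\,\mathrm{P}(B,A)
    && \text{(rule 1: asymmetry)} \\
  &\neg\,\mathrm{PP}(A,A)
    && \text{(nothing is a proper part of itself)} \\
  \mathrm{P}(A,B) \land \mathrm{P}(B,C) &\rightarrow \mathrm{P}(A,C)
    && \text{(rule 2: transitivity)}
\end{align}
```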
Extensional Mereology (EM) : a formal theory of parts
EM operation | Definition
Overlap (x○y) | x and y overlap if they have a part in common.
Disjoint (xιy) | x and y are disjoint if they share no parts in common.
Binary Product (x.y) | The parts that x and y share in common.
Difference (x–y) | The largest portion of x which has no part in common with y.
Binary Sum (x+y) | The set consisting of individuals x and y
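To make the operations concrete, here is a small sketch that models each individual as the set of sequence positions it covers, so that the parts in common are the shared positions. This modelling choice is an assumption made for illustration; in particular the binary sum is rendered as the union of positions. The function names and the `feature` helper are hypothetical.

```python
def feature(start, end):
    """A sequence feature as the set of positions start..end-1."""
    return frozenset(range(start, end))


def overlap(x, y):
    return bool(x & y)          # at least one part in common


def disjoint(x, y):
    return not overlap(x, y)    # no parts in common


def product(x, y):
    return x & y                # the parts x and y share


def difference(x, y):
    return x - y                # largest portion of x sharing nothing with y


def binary_sum(x, y):
    return x | y                # everything that is part of x or of y


exon = feature(100, 200)
cds = feature(150, 400)

print(overlap(exon, cds))                 # shared positions exist
print(sorted(product(exon, cds))[:3])     # first shared positions
print(len(difference(exon, cds)))         # positions only in the exon
```

Applied to annotated features, these functions answer questions such as which portion of an exon lies outside the coding sequence.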
Gene Ontology Consortium
http://www.geneontology.org
The Pathogen
Group
Schizosaccharomyces pombe
Genome Sequencing Project
DictyBase