Transcript Document

Extending the GO model
OBO
open biology ontologies
http://obo.sf.net/
aka - extended go - (ego)
The aims of SO
1. Develop a shared set of terms and concepts to annotate biological sequences.
2. Apply these in our separate projects to provide consistent query capabilities between them.
3. Provide a software resource to assist in the application and distribution of SO.
What is a pseudogene?
• Human
– Sequence similar to a known protein but containing frameshift(s)
and/or stop codons that disrupt the ORF.
• Neisseria
– A gene that is inactive but may be activated by translocation
(e.g. by gene conversion) to a new chromosome site.
– Note: such a gene would be called a “cassette” in yeast.
Give me all the dicistronic genes
• Define a dicistronic gene in terms of the cardinality of the
transcript to open-reading-frame relationship and the spatial
arrangement of open-reading frames.
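The definition above can be read as a query. The following is a minimal sketch, not SO's actual definition: it assumes a transcript counts as dicistronic when it carries exactly two open reading frames that do not overlap. The class name, the gene and transcript names, and the 0-based half-open coordinates are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Transcript:
    name: str
    # Each ORF is a (start, end) pair, 0-based and half-open (assumed convention).
    orfs: List[Tuple[int, int]]


def is_dicistronic(transcript: Transcript) -> bool:
    """Cardinality test (exactly two ORFs) plus spatial test (no overlap)."""
    if len(transcript.orfs) != 2:
        return False
    (start1, end1), (start2, end2) = sorted(transcript.orfs)
    return end1 <= start2


def dicistronic_genes(genes: Dict[str, List[Transcript]]) -> List[str]:
    """Return the names of genes with at least one dicistronic transcript."""
    return sorted(
        gene for gene, transcripts in genes.items()
        if any(is_dicistronic(t) for t in transcripts)
    )


# Hypothetical example data.
genes = {
    "geneA": [Transcript("geneA-RA", [(10, 310), (400, 700)])],
    "geneB": [Transcript("geneB-RA", [(5, 905)])],
    "geneC": [Transcript("geneC-RA", [(10, 310), (200, 500)])],  # ORFs overlap
}
print(dicistronic_genes(genes))
```

Whether overlapping ORFs should be excluded is a modelling choice; the point is only that the query becomes answerable once the cardinality and arrangement are recorded explicitly.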
• ISA: 927 relationships
• PARTOF: 186 relationships
holonym
meronym
Classical Extensional Mereology
• The formal properties of parts:
1. If A is a proper part of B then B is not a part of A (nothing is a proper part of itself).
2. If A is a part of B and B is a part of C then A is a part of C.
• Because of these rules, we can apply some functions to parts…
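As an illustration of the two rules, here is a minimal sketch that treats asserted part-of relationships as ordered pairs, derives the implied ones by transitivity, and checks that no term ends up as a part of one of its own parts. The function names and the example terms are hypothetical and are not taken from SO.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]  # (part, whole)


def transitive_closure(pairs: Set[Pair]) -> Set[Pair]:
    """Rule 2: if A part of B and B part of C, then A part of C."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def violates_asymmetry(closure: Set[Pair]) -> bool:
    """Rule 1: if A is a proper part of B, B must not be a part of A.

    A pair (A, A) also counts as a violation, since nothing is a
    proper part of itself.
    """
    return any((b, a) in closure for a, b in closure)


# Hypothetical asserted relationships.
asserted = {("exon", "transcript"), ("transcript", "gene")}
closure = transitive_closure(asserted)
print(("exon", "gene") in closure)
print(violates_asymmetry(closure))
```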
Extensional Mereology (EM): a formal theory of parts

EM operation     Notation   Definition
Overlap          (x○y)      x and y overlap if they have a part in common.
Disjoint         (xιy)      x and y are disjoint if they share no parts in common.
Binary Product   (x.y)      The parts that x and y share in common.
Difference       (x–y)      The largest portion of x which has no part in common with y.
Binary Sum       (x+y)      The set consisting of individuals x and y
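The operations in the table can be sketched by modelling each individual as a set of atomic parts, here the base positions a sequence feature covers. That modelling choice, the function names and the coordinates are assumptions made for illustration; under this model the binary sum is simply the union of the two sets of parts.

```python
from typing import FrozenSet

Parts = FrozenSet[int]  # an individual modelled as a set of atomic parts


def overlap(x: Parts, y: Parts) -> bool:
    """x and y have at least one part in common."""
    return bool(x & y)


def disjoint(x: Parts, y: Parts) -> bool:
    """x and y share no parts."""
    return not (x & y)


def binary_product(x: Parts, y: Parts) -> Parts:
    """The parts that x and y share."""
    return x & y


def difference(x: Parts, y: Parts) -> Parts:
    """The largest portion of x with no part in common with y."""
    return x - y


def binary_sum(x: Parts, y: Parts) -> Parts:
    """Everything that is a part of x or of y."""
    return x | y


# Two hypothetical exons given as ranges of base positions.
exon_a = frozenset(range(100, 200))
exon_b = frozenset(range(150, 250))

print(overlap(exon_a, exon_b))
print(len(binary_product(exon_a, exon_b)))
print(len(difference(exon_a, exon_b)))
print(len(binary_sum(exon_a, exon_b)))
```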
Exon distribution to transcripts, Drosophila chromosome 4.

Exon part of single transcript   285
Exon in all transcripts          243 (52%)
Exon in one transcript           148 (32%)
Exon in > 1 but < all            74 (16%)
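Categories like these could be derived from exon-to-transcript part-of relationships roughly as follows. This is a sketch on invented data, not the analysis behind the slide: the function name and the exon and transcript identifiers are hypothetical, and treating the first row as "exons of genes with only one transcript" is an assumption about how it was counted.

```python
from collections import Counter
from typing import Dict, Set


def classify_exons(transcripts: Dict[str, Set[str]]) -> Dict[str, str]:
    """Classify each exon of ONE gene by how many of its transcripts contain it.

    `transcripts` maps a transcript name to the set of exon ids it contains.
    """
    n_transcripts = len(transcripts)
    counts = Counter(exon for exons in transcripts.values() for exon in exons)
    categories = {}
    for exon, k in counts.items():
        if n_transcripts == 1:
            categories[exon] = "single-transcript gene"  # assumed reading of row 1
        elif k == n_transcripts:
            categories[exon] = "in all transcripts"
        elif k == 1:
            categories[exon] = "in one transcript"
        else:
            categories[exon] = "in > 1 but < all"
    return categories


# Hypothetical gene with three transcripts.
gene = {
    "RA": {"e1", "e2", "e3"},
    "RB": {"e1", "e2", "e4"},
    "RC": {"e1", "e3"},
}
for exon, category in sorted(classify_exons(gene).items()):
    print(exon, category)
```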
Anatomy Ontologies
For the representation of
phenotypic and expression data.
Now available for: Drosophila,
Mus, C. elegans, Arabidopsis,
Oryza ….
The need for a (bio)chemical ontology.
• CAS - commercial & expensive.
• LIGAND - no internal structure.
• MeSH - semantically weak, very biased towards pharmaceutical agents.
ChEBI - in development at EBI - 1st release was June 2004.
Tissue, cell & pathology ontologies.
Medical ontologies - e.g. SNOMED: (a) commercial; (b) designed not for research, but for billing.
The next challenge
A syntax and semantics for the description
of phenotypic data.
[Diagram: entity - describes - attribute - has - value]
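One possible way to hold entity, attribute and value descriptions in code is sketched below. It is only an illustration of the three-part shape named on the slide: the class name, the helper function and the example terms are hypothetical, and no particular syntax or semantics is implied.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class PhenotypeStatement:
    entity: str     # what is being described, e.g. an anatomy term
    attribute: str  # the quality being measured or observed
    value: str      # the observed value of that attribute


def values_for(statements: Iterable[PhenotypeStatement],
               entity: str, attribute: str) -> List[str]:
    """Return every value recorded for the given entity and attribute."""
    return [s.value for s in statements
            if s.entity == entity and s.attribute == attribute]


# Hypothetical annotations.
statements = [
    PhenotypeStatement("eye", "color", "white"),
    PhenotypeStatement("wing", "shape", "curled"),
    PhenotypeStatement("eye", "size", "reduced"),
]
print(values_for(statements, "eye", "color"))
```

Keeping the three parts as separate fields means each can be drawn from its own ontology, for example an anatomy ontology for the entity.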
Thank yous
• Berkeley
– Chris Mungall, John Richter, Brad Marshall
• Insightful biologists
– Midori Harris, David Hill, Bernard de Bono
• My Co-founders
– Suzanna Lewis, Judith Blake, and Mike Cherry
• The GO Editorial Team at the EBI
– Midori Harris, Jane Lomax, Amelia Ireland & Jennifer Clark
• SO: Karen Eilbeck, Mark Yandell
• And many, many more…
Gene Ontology Consortium
http://www.geneontology.org
The Pathogen Group
Schizosaccharomyces pombe Genome Sequencing Project
DictyBase