Transcript Ontologies

Ontologies
GO Workshop
3-6 August 2010
Ontologies
- What are ontologies?
- Why use ontologies?
- Open Biological Ontologies (OBO), National Center for Biomedical Ontology (NCBO)
- Some useful ontologies…

What Are Ontologies?
“An ontology is an explicit specification of some topic. For our purposes, it is a formal and declarative representation which includes the vocabulary (or names) for referring to the terms in that subject area and the logical statements that describe what the terms are and how they are related to each other…
“Ontologies therefore provide a vocabulary for representing and communicating knowledge about some topic and a set of relationships that hold among the terms in that vocabulary”
(From the Stanford Knowledge Systems Lab).
What Are Ontologies?
“An ontology is a controlled vocabulary of well defined terms
with specified relationships between those terms, capable of
interpretation by both humans and computers.”
Bio-ontologies can be used to provide structured annotation.
Biocurators are biologists who are trained to catalogue biological data (using database structures, bio-ontologies, etc.).
Why use ontologies?
New sequencing technologies are increasing the rate at which DNA is sequenced:
- Jan 2009: 20 billion bases (or letters) of high-quality human DNA sequence (seven times the length of a human genome) in 10 days.
- Computer analysis of the genome took another 10 days.
The complexity of the data is also increasing.
How do we manage the data?
- data sharing
- from data to knowledge
Why use ontologies?
Bio-ontologies are used to capture biological information in a way that can be read by both humans and computers. They:
- annotate data in a consistent way
- allow data sharing across databases
- allow computational analysis of high-throughput “omics” datasets
Objects in an ontology (e.g. genes, cell types, tissue types, stages of development) are well defined.
The ontology shows how the objects relate to each other.
[Figure: Ontologies. Each term has a digital identifier (for computers) and a description (for humans), with relationships between terms.]
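The three parts of a term named above (identifier, description, relationships) can be sketched as a small data structure. This is a minimal illustration, not how any particular ontology tool stores its data: the `Term` class, the `describe` helper, and the placeholder definition strings are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One ontology term: an identifier for computers, text for humans."""
    term_id: str            # stable digital identifier
    name: str               # human-readable label
    definition: str = ""    # human-readable description
    # links to other terms, stored as (relation, target term_id) pairs
    relationships: list = field(default_factory=list)


# Placeholder definitions: illustrative only, not the official GO wording.
catalytic = Term("GO:0003824", "catalytic activity",
                 definition="(illustrative placeholder)")
lyase = Term("GO:0016829", "lyase activity",
             definition="(illustrative placeholder)",
             relationships=[("is_a", "GO:0003824")])

ontology = {t.term_id: t for t in (catalytic, lyase)}


def describe(term_id):
    """Render a term and its outgoing relationships as one line of text."""
    term = ontology[term_id]
    links = [f"{rel} {ontology[target].name}"
             for rel, target in term.relationships]
    return f"{term.term_id} ({term.name}): " + ("; ".join(links) or "no parents recorded")


print(describe("GO:0016829"))
# GO:0016829 (lyase activity): is_a catalytic activity
```

Relationships point at identifiers rather than at names, so a term can be renamed without breaking the links to it.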
Ontology Relationships
- ontologies link terms using relationships
- relations between terms are also categorized and defined
- GO:
  - is a (e.g. lyase activity is a catalytic activity)
  - part of (e.g. replication fork is part of chromosome)
  - regulates
  - negatively regulates
  - positively regulates
- PO:
  - is a
  - part of
  - develops from
http://www.geneontology.org/GO.ontology.relations.shtml
Relationships: the True Path Rule
Why are relationships between terms important?
- TRUE PATH RULE: all attributes of children must hold for all parents
- so if a protein is annotated to a term, the annotation must also be true for all the parent terms
- this enables us to move up the ontology structure from a granular term to a broader term
- premise of many GO analysis tools
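Moving "up the ontology structure" can be sketched as a graph walk from each annotated term to all of its ancestors. The toy graph below is an assumption for illustration: it contains only three terms, treats every link as a simple child-to-parent edge, and uses a made-up gene product called `proteinX`.

```python
# Hypothetical toy graph: child term -> list of parent terms.
PARENTS = {
    "lyase activity": ["catalytic activity"],
    "catalytic activity": ["molecular_function"],
    "molecular_function": [],
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable by walking child -> parent links."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def propagate(annotations, parents=PARENTS):
    """Expand direct annotations so each product also carries all ancestor terms."""
    expanded = {}
    for product, terms in annotations.items():
        full = set(terms)
        for term in terms:
            full |= ancestors(term, parents)
        expanded[product] = full
    return expanded


direct = {"proteinX": {"lyase activity"}}  # proteinX is a made-up gene product
print(sorted(propagate(direct)["proteinX"]))
# ['catalytic activity', 'lyase activity', 'molecular_function']
```

An analysis tool that counts annotations per term can then summarize data at a broad term without losing the products that were annotated only to its more granular children.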
Bio-ontology requirements
1. Ontology development
  - continual process as new terms are added to support more detailed data
2. Annotate data to the ontology
  - computational annotation (breadth - quick)
  - manual biocuration (depth - slow)
3. Tools that use the ontology data
  - browsing and searching the ontology and its associated data
  - analysis of data annotated to the ontology
Resources for biocuration
- bio-ontologies (Open Biomedical Ontologies)
- computational pipelines (‘breadth’)
  - for computational annotations
  - useful for gene products without published information
- manual biocuration (‘depth’)
  - requires trained biocurators
  - community annotation efforts
  - each species has its own body of literature
- biocuration co-ordination
  - MODs? Consortium? Community?
  - biocuration prioritization
  - co-ordination with existing DBs, annotation, nomenclature initiatives
  - data updates
Current bio-ontology limitations
- ontology development
- annotation strategies to match increasing amount of biological data
  - computational pipelines & biocomputing
  - community annotation/prioritization strategies
  - biocurators
- tools for dataset analysis (data complexity)
  - cross-ontology data mining
  - data visualization
http://obo.sourceforge.net/
Open Biomedical Ontologies (OBO) is an initiative to develop bio-ontologies using common rules/principles and resources.
- aim to develop interoperable ontologies
  - common relationships
  - common evidence codes
  - standardize file sharing
  - develop links between ontologies?
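One concrete form that standardized file sharing can take is a shared plain-text format. The sketch below reads two stanzas written in the style of the OBO flat-file format (`[Term]` headers followed by `tag: value` lines). The stanzas are hand-written for illustration rather than copied from a release, and `parse_obo` is a hypothetical helper that ignores most of the real format (headers, escapes, qualifiers).

```python
OBO_TEXT = """\
[Term]
id: GO:0016829
name: lyase activity
namespace: molecular_function
is_a: GO:0003824 ! catalytic activity

[Term]
id: GO:0005657
name: replication fork
namespace: cellular_component
relationship: part_of GO:0005694 ! chromosome
"""


def parse_obo(text):
    """Parse [Term] stanzas into dicts; each tag maps to a list of values."""
    terms = []
    current = None
    for raw in text.splitlines():
        # Naively drop trailing "! comment" text (a simplification).
        line = raw.split("!")[0].strip()
        if not line:
            continue
        if line.startswith("["):
            # Start a new stanza; only [Term] stanzas are kept.
            current = {} if line == "[Term]" else None
            if current is not None:
                terms.append(current)
            continue
        if current is None:
            continue  # header lines and non-Term stanzas are skipped
        tag, _, value = line.partition(":")
        current.setdefault(tag.strip(), []).append(value.strip())
    return terms


for term in parse_obo(OBO_TEXT):
    print(term["id"][0], term["name"][0])
```

Because every ontology in the same format exposes the same tags, one small reader like this can serve several ontologies.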
http://obo.sourceforge.net/
Gene Ontology
Plant Ontology
Sequence Ontology
Trait Ontology
Expression/Tissue Ontologies
Infectious Disease Ontology
Cell Ontology
Genomic Annotation
Genome annotation is the process of attaching biological information to genomic sequences. It consists of two main steps:
1. identifying functional elements in the genome: “structural annotation”
2. attaching biological information to these elements: “functional annotation”
Biologists often use the term “annotation” when they are referring only to structural annotation.
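The two steps can be sketched on a toy sequence. This is only an illustration under strong assumptions: `find_orfs` scans the forward strand for ATG-to-stop stretches, which is far simpler than real gene prediction, and `annotate` looks functions up in a hand-made table standing in for an annotation pipeline or a biocurator.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Structural step (toy): find ATG...stop ORFs on the forward strand.

    Returns (start, end) pairs, 0-based, end exclusive, stop codon included.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


def annotate(orfs, term_lookup):
    """Functional step (toy): attach a term to each element that has one."""
    return [{"start": s, "end": e, "function": term_lookup.get((s, e), "unknown")}
            for s, e in orfs]


genome = "CCATGAAATTTTAGGG"                  # made-up sequence
elements = find_orfs(genome)                 # step 1: structural annotation
lookup = {(2, 14): "lyase activity"}         # hypothetical functional evidence
print(annotate(elements, lookup))            # step 2: functional annotation
# [{'start': 2, 'end': 14, 'function': 'lyase activity'}]
```

The second step depends on the first: a function can only be attached to an element that structural annotation has already located.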
Structural & Functional Annotation
Structural Annotation:
- open reading frames (ORFs) are predicted during genome assembly
- predicted ORFs require experimental confirmation
- Sequence Ontology Project (SO): provides a structured controlled vocabulary for the description of primary annotations of nucleic acid sequence
Functional Annotation:
- Gene Ontology (GO): annotation of gene product function
- initially, predicted ORFs have no functional literature and GO annotation relies on computational methods (rapid)
- functional literature exists for many genes/proteins prior to genome sequencing
Genomic Annotation
- Other annotations using other bio-ontologies, e.g. Anatomy Ontology
- Structural annotation, including Sequence Ontology
- Functional annotation using Gene Ontology
- Nomenclature (species’ genome nomenclature committees)
Gene Ontology (GO)
- Not about genes!
  - Gene products: genes, transcripts, ncRNA, proteins
  - The GO describes gene product function
- Not a single ontology
  - Biological Process (BP or P)
  - Molecular Function (MF or F)
  - Cellular Component (CC or C)
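Because the GO is three ontologies, each annotation belongs to exactly one of them, and the single-letter codes above are a compact way to record which. A minimal sketch, assuming hypothetical gene products (`proteinX`, `proteinY`) and an invented list of annotation rows:

```python
from collections import Counter

# Single-letter aspect codes as listed above.
ASPECTS = {
    "P": "Biological Process",
    "F": "Molecular Function",
    "C": "Cellular Component",
}

# Hypothetical annotations: (gene product, term name, aspect code).
annotations = [
    ("proteinX", "lyase activity", "F"),
    ("proteinX", "replication fork", "C"),
    ("proteinY", "catalytic activity", "F"),
]


def count_by_aspect(rows):
    """Count annotations per ontology, keyed by the full ontology name."""
    return dict(Counter(ASPECTS[aspect] for _, _, aspect in rows))


print(count_by_aspect(annotations))
# {'Molecular Function': 2, 'Cellular Component': 1}
```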
Gene Ontology (GO)
- de facto method for functional annotation
- widely used for functional genomics (high throughput)
- many tools available for gene expression analysis using GO
- The GO Consortium homepage: http://www.geneontology.org
Plant Ontology (PO)
- describes plant structures and growth and developmental stages
- currently used for Arabidopsis, maize, rice; more being added (soybean, tomato, cotton, etc.)
- Plant Structure: describes morphological and anatomical structures representing organ, tissue and cell types
- Growth and developmental stages: describes (i) whole plant growth stages and (ii) plant structure developmental stages
- The PO Consortium homepage: http://www.plantontology.org/
- PO Browser: based on the GO Consortium browser, AmiGO
http://www.ebi.ac.uk/ontology-lookup/