Using the Gene Ontology for Expression Analysis

Using the Gene Ontology (GO) for analysis of expression data
Jane Lomax
EMBL-EBI
25th June 2007
What is the Gene Ontology?
• Set of standard biological phrases (terms)
which are applied to genes/proteins:
– protein kinase
– apoptosis
– membrane
What is the Gene Ontology?
• Genes are linked, or associated, with GO
terms by trained curators at genome
databases
– known as ‘gene associations’ or GO
annotations
• Some GO annotations created
automatically
GO annotations
[Diagram: genome and protein databases; gene -> GO term; GO database; associated genes]
What is the Gene Ontology?
• Allows biologists to make queries across
large numbers of genes without
researching each one individually
Eisen, Michael B. et al. (1998) Proc. Natl. Acad. Sci. USA 95, 14863-14868
Copyright ©1998 by the National Academy of Sciences
GO structure
• GO isn’t just a flat list of
biological terms
• terms are related within
a hierarchy
• This means genes can
be grouped according
to user-defined levels
• Allows broad
overview of gene set
or genome
How does GO work?
• GO is species independent
– some terms, especially lower-level, detailed
terms may be specific to a certain group
• e.g. photosynthesis
– But when collapsed up to the higher levels,
terms are not dependent on species
How does GO work?
What information might we want to
capture about a gene product?
• What does the gene product do?
• Where and when does it act?
• Why does it perform these activities?
GO structure
• GO terms divided into three parts:
– cellular component
– molecular function
– biological process
Cellular Component
• where a gene product acts
• Enzyme complexes in the component
ontology refer to places, not activities.
Molecular Function
• activities or “jobs” of a gene product, e.g.
– glucose-6-phosphate isomerase activity
– insulin binding
– insulin receptor activity
– drug transporter activity
• A gene product may have several
functions
• Sets of functions make up a biological
process.
Biological Process
• a commonly recognized series of events, e.g.
– cell division
– transcription
– regulation of gluconeogenesis
– limb development
– courtship behavior
Ontology Structure
• Terms are linked by two relationships
– is-a
– part-of
[Figure: example terms cell, membrane, chloroplast, mitochondrial membrane and chloroplast membrane, linked by is-a and part-of relationships]
Ontology Structure
• Ontologies are structured as a
hierarchical directed acyclic graph (DAG)
• Terms can have more than one parent
and zero, one or more children
[Figure: the same example terms drawn as a Directed Acyclic Graph (DAG) - multiple parentage allowed]
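The multiple-parentage idea can be sketched in a few lines of Python. This is only an illustration, not how GO itself is stored: the edges below are an assumed reading of the cell/membrane/chloroplast example (the figure is not reproduced in this transcript), and `PARENTS` and `ancestors` are hypothetical names.

```python
# Each term maps to a list of (relationship, parent) pairs.
# ASSUMPTION: these edges are one plausible reading of the slide's example figure.
PARENTS = {
    "cell": [],
    "membrane": [("part-of", "cell")],
    "chloroplast": [("part-of", "cell")],
    "mitochondrial membrane": [("is-a", "membrane")],
    "chloroplast membrane": [("is-a", "membrane"), ("part-of", "chloroplast")],
}


def ancestors(term):
    """Return every term reachable by following is-a / part-of links upwards."""
    found = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relationship, parent in PARENTS[current]:
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


# "chloroplast membrane" has two parents, so both branches are walked.
print(sorted(ancestors("chloroplast membrane")))
# ['cell', 'chloroplast', 'membrane']
```

A traversal of this kind is what lets genes annotated to detailed terms be counted under broader parent terms.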
Anatomy of a GO term
id: GO:0006094    (unique GO ID)
name: gluconeogenesis    (term name)
namespace: process    (ontology)
def: The formation of glucose from noncarbohydrate precursors, such as pyruvate, amino acids and glycerol. [http://cancerweb.ncl.ac.uk/omd/index.html]    (definition)
exact_synonym: glucose biosynthesis    (synonym)
xref_analog: MetaCyc:GLUCONEO-PWY    (database ref)
is_a: GO:0006006    (parentage)
is_a: GO:0006092
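As a rough sketch of how such a term record could be read programmatically, the Python below splits each `tag: value` line into a dictionary. It assumes only the simple layout shown on the slide (the `def` line is left out for brevity); real ontology files carry more tags and quoting rules, so a dedicated parser is the safer choice in practice. `STANZA` and `parse_term` are hypothetical names.

```python
# The term from the slide, minus the long "def" line.
STANZA = """\
id: GO:0006094
name: gluconeogenesis
namespace: process
exact_synonym: glucose biosynthesis
xref_analog: MetaCyc:GLUCONEO-PWY
is_a: GO:0006006
is_a: GO:0006092
"""


def parse_term(text):
    """Collect 'tag: value' lines into a dict of lists, because tags such as is_a repeat."""
    term = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first ": " only, so values like "GO:0006094" stay intact.
        tag, _, value = line.partition(": ")
        term.setdefault(tag, []).append(value)
    return term


term = parse_term(STANZA)
print(term["id"][0], term["name"][0])
print("parents:", term["is_a"])
```

Storing every tag as a list keeps the two `is_a` parents side by side, which matches the multiple parentage GO allows.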
GO terms
• Where do GO terms come from?
– GO terms are added by editors at EBI and annotating
databases
– new terms are usually only added when they are
asked for by annotators
– GO editors work with experts to make major
ontology developments
• metabolism
• pathogenesis
• cell cycle
GO stats
• over 23,000 GO terms:
– 13593 biological_process
– 1980 cellular_component
– 7700 molecular_function
GO annotations
• Where do the links between genes and
GO terms come from?
• Contributing databases:
– Berkeley Drosophila Genome Project (BDGP)
– dictyBase (Dictyostelium discoideum)
– FlyBase (Drosophila melanogaster)
– GeneDB (Schizosaccharomyces pombe, Plasmodium falciparum, Leishmania major and Trypanosoma brucei)
– UniProt Knowledgebase (Swiss-Prot/TrEMBL/PIR-PSD) and InterPro databases
– Gramene (grains, including rice, Oryza)
– Mouse Genome Database (MGD) and Gene Expression Database (GXD) (Mus musculus)
– Rat Genome Database (RGD) (Rattus norvegicus)
– Reactome
– Saccharomyces Genome Database (SGD) (Saccharomyces cerevisiae)
– The Arabidopsis Information Resource (TAIR) (Arabidopsis thaliana)
– The Institute for Genomic Research (TIGR): databases on several bacterial species
– WormBase (Caenorhabditis elegans)
– Zebrafish Information Network (ZFIN) (Danio rerio)
Species coverage
• All major eukaryotic model organism
species
• Human via GOA group at UniProt
• Several bacterial and parasite species
through TIGR and GeneDB at Sanger
– many more in pipeline
Anatomy of a GO annotation
• Three key parts:
– gene name/id
– GO term(s)
– evidence for association
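The three parts map naturally onto a small record type. The sketch below is illustrative only: `Annotation` is a hypothetical class, "geneA" and "geneB" are made-up identifiers, and real annotation files carry more columns than these three.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    gene_id: str   # gene name or database id
    go_id: str     # one GO term per record
    evidence: str  # evidence code supporting the association


# Hypothetical records for illustration; the gene ids are not real.
annotations = [
    Annotation("geneA", "GO:0006915", "IDA"),
    Annotation("geneA", "GO:0006094", "IEA"),
    Annotation("geneB", "GO:0006915", "TAS"),
]

# Collect the GO terms attached to each gene.
terms_by_gene = {}
for record in annotations:
    terms_by_gene.setdefault(record.gene_id, set()).add(record.go_id)

print(sorted(terms_by_gene["geneA"]))
# ['GO:0006094', 'GO:0006915']
```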
Example annotation
• Breast cancer type 1 susceptibility protein gene
in humans
Types of GO annotation:
• Electronic Annotation
• Manual Annotation
Manual annotation
• Created by scientific curators
• High quality
• Small number
Manual annotation
In this study, we report the isolation and molecular
characterization of the B. napus PERK1 cDNA, that is predicted to
encode a novel receptor-like kinase. We have shown that like
other plant RLKs, the kinase domain of PERK1 has
serine/threonine kinase activity. In addition, the location of a
PERK1-GFP fusion protein to the plasma membrane supports the
prediction that PERK1 is an integral membrane protein…these
kinases have been implicated in early stages of wound response…
Electronic Annotation
• Annotation derived without human
validation
– mappings files, e.g. interpro2go, ec2go
– Blast search ‘hits’
• Lower ‘quality’ than manual codes
Mappings files
Fatty acid biosynthesis (Swiss-Prot Keyword) -> GO:Fatty acid biosynthesis (GO:0006633)
EC:6.4.1.2 (EC number) -> GO:acetyl-CoA carboxylase activity (GO:0003989)
IPR000438: Acetyl-CoA carboxylase carboxyl transferase beta subunit (InterPro entry) -> GO:acetyl-CoA carboxylase activity (GO:0003989)
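A mappings file is essentially a lookup table from an external identifier to one or more GO terms. The Python sketch below applies the three mappings from the slide to a hypothetical protein. The key format and the names `MAPPINGS` and `electronic_annotations` are assumptions made for illustration, not the layout of the real interpro2go or ec2go files.

```python
# External identifier -> GO ids, copied from the three examples on the slide.
# ASSUMPTION: the "source:identifier" key format is invented for this sketch.
MAPPINGS = {
    "Swiss-Prot keyword:Fatty acid biosynthesis": ["GO:0006633"],
    "EC:6.4.1.2": ["GO:0003989"],
    "InterPro:IPR000438": ["GO:0003989"],
}


def electronic_annotations(external_ids):
    """Return the GO ids implied by a protein's external identifiers, without duplicates."""
    go_ids = []
    for external_id in external_ids:
        for go_id in MAPPINGS.get(external_id, []):
            if go_id not in go_ids:
                go_ids.append(go_id)
    return go_ids


# A hypothetical protein carrying an EC number and an InterPro match.
print(electronic_annotations(["EC:6.4.1.2", "InterPro:IPR000438"]))
# ['GO:0003989']
```

Both identifiers lead to the same GO term, so the protein receives a single electronic annotation.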
Evidence types
• ISS: Inferred from Sequence/structural Similarity
• IDA: Inferred from Direct Assay
• IPI: Inferred from Physical Interaction
• IMP: Inferred from Mutant Phenotype
• IGI: Inferred from Genetic Interaction
• IEP: Inferred from Expression Pattern
• TAS: Traceable Author Statement
• NAS: Non-traceable Author Statement
• IC: Inferred by Curator
• ND: No Data available
• IEA: Inferred from Electronic Annotation
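Because electronic annotations are described as lower quality than manual ones, an analysis may want to separate the two. A minimal Python sketch, following the slide's split between IEA and the other codes; the function name and the example tuples are hypothetical.

```python
# Codes listed on the slide; IEA is the one set apart as electronic.
MANUAL_CODES = {"ISS", "IDA", "IPI", "IMP", "IGI", "IEP", "TAS", "NAS", "IC", "ND"}
ELECTRONIC_CODES = {"IEA"}


def split_by_evidence(annotations):
    """Split (gene, go_id, evidence) tuples into manual and electronic lists."""
    manual, electronic = [], []
    for gene, go_id, evidence in annotations:
        if evidence in ELECTRONIC_CODES:
            electronic.append((gene, go_id, evidence))
        elif evidence in MANUAL_CODES:
            manual.append((gene, go_id, evidence))
        else:
            raise ValueError(f"unknown evidence code: {evidence}")
    return manual, electronic


# Hypothetical annotations; the gene ids are made up.
example = [
    ("geneA", "GO:0003989", "IEA"),
    ("geneA", "GO:0006633", "IDA"),
    ("geneB", "GO:0006633", "ISS"),
]
manual, electronic = split_by_evidence(example)
print(len(manual), "manual,", len(electronic), "electronic")
# 2 manual, 1 electronic
```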
GO tools
• GO resources are freely available to
anyone to use without restriction
– Includes the ontologies, gene associations
and tools developed by GO
• Other groups have used GO to create
tools for many purposes:
http://www.geneontology.org/GO.tools
GO tools
• Affymetrix also provide a Gene Ontology
Mining Tool as part of their NetAffx™
Analysis Center which returns GO terms
for probe sets
GO tools
• Many tools exist that use GO to find
common biological functions from a list
of genes:
http://www.geneontology.org/GO.tools.microarray.shtml
GO tools
• Most of these tools work in a similar way:
– input a gene list and a subset of ‘interesting’
genes
– tool shows which GO categories have most
interesting genes associated with them i.e.
which categories are ‘enriched’ for
interesting genes
– tool provides a statistical measure to
determine whether enrichment is significant
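The first two steps can be sketched as a simple counting exercise in Python. The data are toy values invented for illustration, `count_by_term` is a hypothetical helper, and the significance step is left out here.

```python
from collections import Counter


def count_by_term(genes, annotations):
    """Count how many of the given genes are annotated to each GO term."""
    counts = Counter()
    for gene in genes:
        for term in annotations.get(gene, ()):
            counts[term] += 1
    return counts


# Hypothetical toy data: gene -> set of GO term names.
annotations = {
    "g1": {"mitosis", "apoptosis"},
    "g2": {"mitosis"},
    "g3": {"glucose transport"},
    "g4": {"mitosis", "glucose transport"},
    "g5": {"apoptosis"},
}
all_genes = list(annotations)   # the full gene list
interesting = ["g3", "g4"]      # the 'interesting' subset

background = count_by_term(all_genes, annotations)
subset = count_by_term(interesting, annotations)

for term in sorted(subset):
    print(
        term,
        f"{subset[term]}/{len(interesting)} interesting,",
        f"{background[term]}/{len(all_genes)} overall",
    )
```

A category looks enriched when its share of the interesting genes is clearly larger than its share of the full list, as with "glucose transport" in this toy data.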
Microarray process
• Treat samples
• Collect mRNA
• Label
• Hybridize
• Scan
• Normalize
• Select differentially regulated genes
• Understand the biological phenomena involved
Traditional analysis
Gene 1: Apoptosis, Cell-cell signaling, Protein phosphorylation, Mitosis, …
Gene 2: Growth control, Mitosis, Oncogenesis, Protein phosphorylation, …
Gene 3: Growth control, Mitosis, Oncogenesis, Protein phosphorylation, …
Gene 4: Nervous system, Pregnancy, Oncogenesis, Mitosis, …
Gene 100: Positive ctrl. of cell prolif, Mitosis, Oncogenesis, Glucose transport, …
Traditional analysis
• gene by gene basis
• requires literature searching
• time-consuming
Using GO annotations
• But by using GO annotations, this work
has already been done for you!
GO:0006915 : apoptosis
Grouping by process
Apoptosis: Gene 1, Gene 53
Positive ctrl. of cell prolif.: Gene 7, Gene 3, Gene 12, …
Mitosis: Gene 2, Gene 5, Gene 45, Gene 7, Gene 35, …
Glucose transport: Gene 7, Gene 3, Gene 6, …
Growth: Gene 5, Gene 2, Gene 6, …
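Moving from a gene-by-gene view to a process-by-process view amounts to inverting a mapping. A minimal Python sketch, using three of the per-gene lists from the "Traditional analysis" slide as input; `group_by_process` is a hypothetical helper name.

```python
from collections import defaultdict

# Gene -> processes, taken from the per-gene lists on the "Traditional analysis" slide.
genes_to_processes = {
    "Gene 1": ["Apoptosis", "Cell-cell signaling", "Protein phosphorylation", "Mitosis"],
    "Gene 2": ["Growth control", "Mitosis", "Oncogenesis", "Protein phosphorylation"],
    "Gene 100": ["Positive ctrl. of cell prolif", "Mitosis", "Oncogenesis", "Glucose transport"],
}


def group_by_process(gene_annotations):
    """Invert gene -> processes into process -> genes."""
    grouped = defaultdict(list)
    for gene, processes in gene_annotations.items():
        for process in processes:
            grouped[process].append(gene)
    return dict(grouped)


by_process = group_by_process(genes_to_processes)
print(by_process["Mitosis"])
# ['Gene 1', 'Gene 2', 'Gene 100']
print(by_process["Oncogenesis"])
# ['Gene 2', 'Gene 100']
```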
GO for microarray analysis
• Annotations give ‘function’ label to genes
• Ask meaningful questions of microarray
data e.g.
– genes involved in the same process,
same/different expression patterns?
Using GO in practice
• statistical measure
– how likely it is that your differentially regulated genes
fall into that category by chance
Microarray experiment: 1000 genes on the array, 100 genes differentially regulated
[Bar chart: number of differentially regulated genes per process]
mitosis – 80/100
apoptosis – 40/100
p. ctrl. cell prol. – 30/100
glucose transp. – 20/100
Using GO in practice
• However, when you look at the
distribution of all genes on the
microarray:
Process               Genes on array   # genes expected in 100 random genes   occurred
mitosis               800/1000         80                                      80
apoptosis             400/1000         40                                      40
p. ctrl. cell prol.   100/1000         10                                      30
glucose transp.       50/1000          5                                       20
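The comparison of expected and observed counts can be turned into a probability. The slides do not say which test a given tool uses; the Python sketch below assumes a one-sided hypergeometric test, a common choice for this kind of enrichment question, and uses the counts from the slide. The function and constant names are hypothetical.

```python
from math import comb

ARRAY_SIZE = 1000  # genes on the microarray
SELECTED = 100     # differentially regulated genes

# process -> (genes on the array in this process, differentially regulated genes in it)
COUNTS = {
    "mitosis": (800, 80),
    "apoptosis": (400, 40),
    "p. ctrl. cell prol.": (100, 30),
    "glucose transp.": (50, 20),
}


def expected_count(on_array):
    """Genes expected in the category if SELECTED genes were picked at random."""
    return SELECTED * on_array / ARRAY_SIZE


def enrichment_p_value(on_array, observed):
    """P(X >= observed) under the hypergeometric distribution (ASSUMED test)."""
    total = comb(ARRAY_SIZE, SELECTED)
    upper = min(on_array, SELECTED)
    tail = sum(
        comb(on_array, i) * comb(ARRAY_SIZE - on_array, SELECTED - i)
        for i in range(observed, upper + 1)
    )
    return tail / total


for process, (on_array, observed) in COUNTS.items():
    p = enrichment_p_value(on_array, observed)
    print(f"{process}: expected {expected_count(on_array):.0f}, observed {observed}, p = {p:.3g}")
```

Mitosis and apoptosis occur exactly as often as expected, so their p-values are large. Positive control of cell proliferation and glucose transport occur far more often than expected, so their p-values are very small and these are the categories worth following up.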
Enrichment tools
• GO is developing its own enrichment tool
as part of the GO browser AmiGO
• Currently in testing phase, should be
released next month
Onto-Express walkthrough
http://vortex.cs.wayne.edu/projects.htm#Onto-Express