Zeeberg - Gene Ontology Consortium

GUI GoMiner and
High-Throughput GoMiner
Analysis of Alternative Splice Variants
Barry Zeeberg, Ari Kahn, Michael Ryan,
David Kane, Curtis Jamison, Hongfang Liu,
Alessandro Ferrucci, William Reinhold, and
John Weinstein
plus a lot of help from Rich Einstein and
Mike Brenner of ExonHit
The World According to a Microarray:
• Genes are not Genes
• Genes are a Mixture of Splice Variants
Patterns of alternative splicing
The Ostrich Effect
• Tend to hide our head in the sand
• Treat microarray data as if a gene did not
have multiple alternative splice forms
• But altered expression of one splice variant
can be more important than altered
expression of the “gene”
> i.e., lumping all splice forms together into
one monolithic measurement hides variant-level changes
Motivation: The Problem
• In many disease states, differential expression of
individual splice variants may be more relevant than
differential expression of genes
• Traditional microarrays are not designed to permit
elucidation of individual splice variants
• State-of-the-art microarrays are being developed to
permit elucidation of individual splice variants
• A major limitation is that software tools are not
available to exploit the potential information content
of the state-of-the-art microarrays
Our Solution: Three Components
• Develop a database (EVDB) and web
application (SpliceMiner) that maps
probe sequences to known splice
variants
• Enhance GoMiner with a mechanism to
process splice variants
• Connect these two “ends” with the
appropriate integration approach
SpliceMiner Home Page
Remember these outputs; they are used later in the
GoMiner “Tilde” mechanism:
• HGNC symbol
• chromosomal coordinates
“Batch” is key to analysis
of microarray results
Our Solution: Three Components
• Develop a database (EVDB) and web
application (SpliceMiner) that maps
probe sequences to known splice
variants
• Enhance GoMiner with a mechanism
to process splice variants
• Connect these two “ends” with the
appropriate integration approach
GoMiner and
High-Throughput GoMiner
• GoMiner organizes lists of 'interesting' genes
(for example, under- and overexpressed
genes from a microarray experiment) for
biological interpretation in the context of the
Gene Ontology
• High-Throughput GoMiner is an
enhancement of GoMiner which efficiently
performs the computationally-challenging
task of automated batch processing of an
arbitrary number of microarray experiments
GoMiner “Tilde” (“~”) Mechanism
• GoMiner traditionally dereplicates input files so that
only one instance of a gene name is processed
• When multiple alternatively spliced forms are to be
analyzed, however, dereplication would result in a
loss of relevant information
• Consequently, we have added a new feature to
GoMiner to retain full information about the
alternative splice variants by replicating the input of
each gene according to the number of alternative
exons
Example of Tilde Mechanism
• As a specific example, suppose that a microarray
platform contained probes that were unique for two
different splice variants of BRCA1
• Then the two splice variants would be designated as
'BRCA1~1' and 'BRCA1~2'
• The '~' tells GoMiner to treat these as different
entries, rather than to de-replicate them, but to ignore
the suffix when querying the GO database
• By this mechanism, all splice variants are counted
when computing the Fisher exact p value
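The BRCA1 example can be sketched in a few lines of Python. This is a minimal illustration, not GoMiner's actual code: the function names, the placeholder gene GENEX, the made-up category membership, and the totals K and N are all assumptions. The p value uses the standard one-sided Fisher exact (hypergeometric tail) formula, which may differ in detail from what GoMiner reports.

```python
from math import comb

def split_tilde(entry):
    """Split 'BRCA1~2' into ('BRCA1', '2'); no '~' gives an empty suffix."""
    symbol, _, suffix = entry.partition("~")
    return symbol, suffix

def dereplicate(entries):
    """Drop exact repeats, keeping order; tilde-suffixed entries stay distinct."""
    seen = set()
    kept = []
    for entry in entries:
        if entry not in seen:
            seen.add(entry)
            kept.append(entry)
    return kept

def fisher_one_sided(k, n, K, N):
    """P(X >= k) when n entries are drawn from N, of which K are in the category."""
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)

# Hypothetical list of changed entries: two BRCA1 splice variants plus a
# placeholder gene that appears twice and is dereplicated as usual.
changed = ["BRCA1~1", "BRCA1~2", "GENEX", "GENEX"]
kept = dereplicate(changed)

# Made-up category membership, keyed by the symbol with the suffix ignored.
toy_category = {"BRCA1"}
k = sum(split_tilde(entry)[0] in toy_category for entry in kept)
n = len(kept)

# Assumed totals, counted in the same variant-level units as k and n.
K, N = 4, 10
p_value = fisher_one_sided(k, n, K, N)
print(kept, k, n, round(p_value, 4))
```

Both BRCA1 variants survive dereplication and both count toward the category, while the repeated GENEX entry collapses to one. Without the suffixes, the two BRCA1 rows would collapse into a single entry and the category count would drop to 1.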
A Publication using Tilde Mechanism
• Study of “exon expression” regulated by
Nova, a key neuronal splicing factor
• Reference: Nova regulates brain-specific splicing to shape the synapse,
Ule et al., Nature Genetics 37, 844–852 (2005)
GoMiner Detected Differences
in Neurologically-Important GO
Categories between Wild Type
and Nova Knockouts
Significance of Nova paper
• First description of a regulatory module
operating at the level of information content
mediated by RNA exon usage
• Levels of Nova-regulated RNAs are unchanged
in knockout versus wild-type brains; alternative
exon usage instead serves as a means of modulating
the quality of synaptic protein interactions
• Regulation of quality, not quantity
Our Solution: Three Components
• Develop a database (EVDB) and web
application (SpliceMiner) that maps
probe sequences to known splice
variants
• Enhance GoMiner with a mechanism to
process splice variants
• Connect these two “ends” with the
appropriate integration approach
Generalization of the
Tilde Mechanism
• A previous slide noted that two splice variants could
be designated as ‘BRCA1~1’ and ‘BRCA1~2’
• But the suffix can be an arbitrary string that carries
biological information, not just used as an ordinal
index
• So we can use the output of SpliceMiner (HGNC
symbol, GenBank accession, chromosomal
coordinates) to construct a string of the correct form,
with a suffix that is highly informative
• Using the output from SpliceMiner as the input to
GoMiner will connect the two “ends” and permit splice
variant-based GO categorization
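One way to picture the connection is a small Python sketch that turns SpliceMiner-style output fields into tilde-form identifiers. The record layout, the suffix format (accession, then chromosome:start-end), and all values below are hypothetical placeholders; the slides specify only that the suffix can carry the HGNC symbol's accompanying GenBank accession and chromosomal coordinates.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SpliceMinerHit:
    """Hypothetical record holding the three output fields named on the slide."""
    hgnc_symbol: str
    accession: str
    chrom: str
    start: int
    end: int

def to_tilde_id(hit):
    """Build 'SYMBOL~suffix'; the suffix layout here is an assumed convention."""
    suffix = f"{hit.accession}|{hit.chrom}:{hit.start}-{hit.end}"
    if "~" in hit.hgnc_symbol or "~" in suffix:
        raise ValueError("fields must not contain '~', the reserved separator")
    return f"{hit.hgnc_symbol}~{suffix}"

def go_lookup_symbol(tilde_id):
    """Return the part before '~', which is what the GO query would use."""
    return tilde_id.partition("~")[0]

# Placeholder accessions and coordinates, not real annotations.
hits = [
    SpliceMinerHit("BRCA1", "ACC_0001", "chr17", 1000, 1200),
    SpliceMinerHit("BRCA1", "ACC_0002", "chr17", 1500, 1650),
]
ids = [to_tilde_id(hit) for hit in hits]
for tilde_id in ids:
    print(tilde_id, "->", go_lookup_symbol(tilde_id))
```

Each identifier stays unique per splice variant, so nothing is lost to dereplication, while the GO lookup still resolves to the plain gene symbol.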
Conclusions
• The new era of microarray research will
demand analysis of differential expression of
exons and transcripts, rather than genes
• We are developing resources to map probe
sequences to exons and transcripts
• GoMiner can integrate this information with
GOA to allow the molecular biologist to
leverage both knowledgebases for enhanced
analysis and interpretation of microarray data
Collaborators
GBG:
Ari Kahn
Michael Ryan
David Kane
Hongfang Liu
William Reinhold
John Weinstein
GMU:
Curtis Jamison
UMBC:
Alessandro Ferrucci
ExonHit:
Rich Einstein
Mike Brenner