Alternative Splicing (a review by Liliana Florea, 2005)

CS 498 SS
Saurabh Sinha
11/30/06
What is alternative splicing?
• The first result of transcription is “pre-mRNA”
• This undergoes “splicing”, i.e., introns are
excised and the remaining exons are joined
to form mRNA
• This splicing process may involve different
combinations of exons, leading to different
mRNAs, and different proteins
• This is alternative splicing
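A minimal sketch of the idea above, assuming a made-up toy pre-mRNA and hypothetical exon coordinates (neither comes from the review): the same precursor yields two different mRNAs depending on which exons are kept.

```python
def splice(pre_mrna, exons, keep):
    """Join the selected exons (half-open [start, end) coordinates) in order."""
    return "".join(pre_mrna[s:e] for i, (s, e) in enumerate(exons) if i in keep)

# Hypothetical toy pre-mRNA: exons in upper case, introns in lower case.
pre_mrna = "AUGGCC" + "guaaguuuag" + "GAAUUC" + "guacgucuag" + "UGGUAA"
exons = [(0, 6), (16, 22), (32, 38)]

# Isoform 1 keeps all three exons; isoform 2 skips the middle exon.
full_isoform = splice(pre_mrna, exons, keep={0, 1, 2})
skipped_isoform = splice(pre_mrna, exons, keep={0, 2})
print(full_isoform)
print(skipped_isoform)
```

Both isoforms come from one gene; only the choice of exons differs.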
Alternative splicing
• Important regulatory mechanism, for
modulating gene and protein content in
the cell
• Large-scale genomic data today
suggests that as many as 60% of the
human genes undergo alternative
splicing
Significance
• Number of human genes has recently been
estimated to be about 20-25 K.
• Not significantly greater than in much less
complex organisms
• Alternative splicing is a potential explanation
of how a large variety of proteins can be
achieved with a small number of genes
• Errors in splicing mechanism implicated in
diseases such as cancers
What happens in alternative splicing?
• Different combinations of exons within a gene
are spliced from the RNA precursor, to be
included in mRNA
• The combination depends on tissue type,
developmental stage, disease state, etc.
• Thus different proteins are produced under
these different conditions
• Different types of alternative splicing
(figure: http://bib.oxfordjournals.org/cgi/content/full/7/1/55/F1)
– exon inclusion/exclusion
– alternative 5’ exon
– alternative 3’ exon
– intron retention
– 5’ alternative UTR
– 3’ alternative UTR
Bioinformatics of Alt. splicing
• Two main goals:
– Identify cases of alt. splicing
• What are the different forms (“isoforms”) of a
gene?
– Find out how alt. splicing is regulated
• What are the sequence motifs that control alt.
splicing and decide which isoform will be
produced?
Identification of splice variants
• All cells have same genome
• But not all cells have the same
“transcriptome” (i.e., set of transcripts)
– Different cells may express different
(alternative) transcripts of the same gene
• The bioinformatics goal is to find the “splice
forms”, i.e., to identify the alternative
splicing events
Identification of splice variants
• Direct comparison between sequences of
different cDNA isoforms
– Q: What is cDNA? How is this different from a
gene’s DNA?
– cDNA is “complementary DNA”, obtained by
reverse transcription from mRNA. It has no introns
• Direct comparison reveals differences in the
isoforms
• But this difference could be part of an exon, a
whole exon, or a set of exons
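As a rough illustration of direct comparison, the sketch below diffs two hypothetical cDNA isoforms with Python's generic `difflib.SequenceMatcher`. This is an assumption made for illustration: a real analysis would use a sequence aligner, and the toy sequences are invented.

```python
from difflib import SequenceMatcher

def isoform_differences(cdna_a, cdna_b):
    """List the segments that differ between two cDNA sequences."""
    matcher = SequenceMatcher(None, cdna_a, cdna_b, autojunk=False)
    return [(tag, cdna_a[i1:i2], cdna_b[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]

# Hypothetical isoforms: the second lacks the middle segment GAATTT.
isoform_1 = "ATGGCC" + "GAATTT" + "CGGTAA"
isoform_2 = "ATGGCC" + "CGGTAA"

differences = isoform_differences(isoform_1, isoform_2)
print(differences)
```

The comparison reports only that a stretch of sequence is missing. Whether that stretch is part of an exon, a whole exon, or several exons cannot be decided without the genomic sequence.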
Figure: Bioinformatics methods for identifying alternative splicing (direct comparison)
Florea, L. Brief Bioinform 2006 7:55-69; doi:10.1093/bib/bbk005
Copyright restrictions may apply.
Identification of splice variants
• Comparison of exon-intron structures
(the gene’s architecture)
• Where do the exon-intron structures
come from?
– Align cDNA (no introns) with genomic
sequence (with introns)
– This gives us the intron and exon structure
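The alignment step can be sketched as a greedy exact mapping of the cDNA onto the genome. This is a simplification under stated assumptions: it tolerates no mismatches, the function name and `min_exon` seed length are hypothetical, and the toy sequences are invented. Real spliced-alignment programs handle sequencing errors and use splice-site signals to place exon boundaries.

```python
def exon_intron_structure(cdna, genome, min_exon=4):
    """Map cDNA onto the genome as exact blocks; return (exons, introns)."""
    exons = []
    c_pos, g_pos = 0, 0
    while c_pos < len(cdna):
        # Locate the next cDNA seed downstream of the previous exon.
        seed = cdna[c_pos:c_pos + min_exon]
        g_start = genome.find(seed, g_pos)
        if g_start < 0:
            raise ValueError("cDNA does not map exactly onto the genome")
        # Extend the match base by base until the sequences diverge.
        length = 0
        while (c_pos + length < len(cdna)
               and g_start + length < len(genome)
               and cdna[c_pos + length] == genome[g_start + length]):
            length += 1
        exons.append((g_start, g_start + length))
        c_pos += length
        g_pos = g_start + length
    # Introns are the genomic gaps between consecutive exons.
    introns = [(end, start) for (_, end), (start, _) in zip(exons, exons[1:])]
    return exons, introns

# Hypothetical gene: three exons separated by two GT...AG introns.
genome = "ATGGCC" + "GTAAGTTTAG" + "CAATTT" + "GTACGTCTAG" + "CGGTAA"
cdna = "ATGGCC" + "CAATTT" + "CGGTAA"
exons, introns = exon_intron_structure(cdna, genome)
print(exons, introns)
```

The gaps in the mapping are the introns; running this on each isoform of a gene and comparing the resulting exon lists is the "comparison of exon-intron structures".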
Figure: Bioinformatics methods for identifying alternative splicing (comparison of exon-intron structures)
Identification of splice variants
• Alignment tools align the cDNA sequence to
the genomic sequence
• Why isn’t this a perfect match with
gaps (introns)?
– Sequencing errors, polymorphisms, etc.
• Special-purpose alignment programs exist
for this task
Identifying full-length alt. spliced transcripts
• Previous methods identified parts of an alt.
spliced transcript
• It is much more difficult to identify full-length
alternatively spliced transcripts
• Methods for this include “gene indices”
Gene indices
• Compare all EST (expressed sequence tag)
sequences against one another
• Identify significant overlaps
• Group and assemble sequences with
compatible overlaps into clusters
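The three steps above can be sketched as follows, assuming exact suffix-prefix overlaps and a hypothetical minimum overlap length. Real gene-index pipelines use error-tolerant alignment, and the toy ESTs are invented.

```python
from itertools import combinations

def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def cluster_ests(ests, min_len=5):
    """Group ESTs connected by significant overlaps (union-find)."""
    parent = list(range(len(ests)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    # All-against-all comparison: quadratic in the number of ESTs.
    for i, j in combinations(range(len(ests)), 2):
        if overlap(ests[i], ests[j], min_len) or overlap(ests[j], ests[i], min_len):
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(ests)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())

# Hypothetical ESTs: the first three tile one transcript, the last two another.
ests = ["ATGGCCGAATT", "CGAATTTCGGT", "TTTCGGTAA", "GGGCATTACCA", "TTACCAGGTT"]
clusters = cluster_ests(ests)
print(clusters)
```

The loop over all pairs is where the cost comes from: doubling the number of ESTs roughly quadruples the number of comparisons.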
Figure: Gene indices
Problems with gene indices
• Overclustering: paralogs may get clustered
together.
– What are paralogs?
– Related but distinct genes in the same species
• Underclustering: occurs if the number of ESTs
is not sufficient to link all transcripts of a gene
• Computationally expensive:
– Quadratic time complexity
Splice graphs
• Nodes: Exons
• Edges: Introns
• Gene: directed acyclic graph
• Each path in this DAG is an alternative
transcript
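A small sketch of a splice graph as an adjacency list, with every source-to-sink path enumerated as a candidate transcript. The exon names and edges are hypothetical.

```python
def enumerate_transcripts(graph, source, sink):
    """Return every source-to-sink path in a DAG given as {node: [successors]}."""
    if source == sink:
        return [[sink]]
    return [[source] + rest
            for nxt in graph.get(source, [])
            for rest in enumerate_transcripts(graph, nxt, sink)]

# Hypothetical gene with four exons; each edge is an observed intron (junction).
splice_graph = {
    "E1": ["E2", "E3"],
    "E2": ["E3", "E4"],
    "E3": ["E4"],
    "E4": [],
}

transcripts = enumerate_transcripts(splice_graph, "E1", "E4")
for path in transcripts:
    print("-".join(path))
```

Because the graph is acyclic the recursion always terminates, but the number of paths can grow exponentially with the number of alternative exons.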
Figure: Splice graph
Splice graphs
• Combinatorially generate all possible
alt. transcripts
• But not all such transcripts are going to
be present
• Need scores for candidate transcripts,
in order to differentiate between the
biologically relevant ones and the
artifactual ones
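One possible scoring scheme, shown only as an assumption and not as the method used in the review: score each candidate transcript by its weakest junction, i.e., the smallest number of supporting sequences over its edges. The support counts below are invented.

```python
def transcript_score(path, junction_support):
    """Score a path (two or more exons) by its least-supported junction."""
    return min(junction_support.get(edge, 0) for edge in zip(path, path[1:]))

# Hypothetical number of ESTs spanning each exon-exon junction.
junction_support = {
    ("E1", "E2"): 12,
    ("E2", "E3"): 9,
    ("E3", "E4"): 10,
    ("E2", "E4"): 1,
}

candidates = [
    ["E1", "E2", "E3", "E4"],
    ["E1", "E2", "E4"],
    ["E1", "E3", "E4"],
]
scores = {"-".join(p): transcript_score(p, junction_support) for p in candidates}
ranked = sorted(scores, key=scores.get, reverse=True)
print(scores)
```

A path that uses a junction with little or no support gets a low score and is more likely to be an artifact of combinatorial enumeration.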
Splice variants from microarray data
• Affymetrix GeneChip technology uses
22 probes collected from exons or
straddling exon boundaries
• When an exon is alternatively spliced,
expression level of its probes will be
different in different experiments
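A hedged sketch of how such a difference could be detected: compare each exon's probe signal, normalized by the gene-level signal, between two experiments. The normalized log ratio used here (often called a splicing index in exon-array analysis), the threshold of 1.0, and the probe intensities are all assumptions chosen for illustration.

```python
from math import log2
from statistics import mean

def splicing_index(cond_a, cond_b):
    """Per-exon log2 change in exon signal relative to the whole-gene signal."""
    gene_a = mean(mean(probes) for probes in cond_a.values())
    gene_b = mean(mean(probes) for probes in cond_b.values())
    return {
        exon: log2((mean(cond_b[exon]) / gene_b) / (mean(cond_a[exon]) / gene_a))
        for exon in cond_a
    }

# Hypothetical probe intensities per exon in two tissues.
tissue_a = {"E1": [100, 110], "E2": [95, 105], "E3": [100, 100]}
tissue_b = {"E1": [100, 110], "E2": [10, 15], "E3": [100, 100]}

index = splicing_index(tissue_a, tissue_b)
# Flag exons whose relative signal changes more than two-fold.
flagged = [exon for exon, si in index.items() if abs(si) > 1.0]
print(flagged)
```

Normalizing by the gene-level signal separates a change in splicing from a change in overall expression of the gene.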
Figure: Bioinformatics methods for identifying alternative splicing (splice variants from microarray data)
Part 2: Regulation of alternative splicing
Biological mechanism
• Splicing of pre-mRNA is a complex cellular
process
• “Spliceosome” is a complex of several
molecules that assembles onto each intron
and catalyzes the excision of the intron
• Splice sites (5’ or donor splice site and 3’ or
acceptor splice site) play a major role in
splicing
• Additional sites in introns and exons, beyond
the splice signals, also contribute to splicing
Biological mechanism
• Cis-regulatory elements (again!)
• They promote (“splicing enhancers”) or
repress (“splicing silencers”) the
inclusion of an exon in the mRNA
• They can be located in exons or introns
Bioinformatics methods
• Goal: find the cis-regulatory elements
that mediate splicing (alternative
splicing)
• Early work: find consensus sequences
(motifs) of splicing enhancers
• More advanced work: Position weight
matrices (PWMs)
Bioinformatics representations of splicing regulatory motifs: (a) consensus sequence and (b)
position weight matrix (PWM)
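To make the two representations concrete, here is a minimal sketch that builds a log-odds PWM from aligned sites, reads off the consensus, and scans a sequence for the best-scoring window. The example sites, the pseudocount, and the uniform background of 0.25 are assumptions, not data from the review.

```python
from math import log2

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log-odds PWM (one dict per position) from equal-length aligned sites."""
    total = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        pwm.append({base: log2(((column.count(base) + pseudocount) / total) / background)
                    for base in "ACGT"})
    return pwm

def best_hit(pwm, seq):
    """Return (score, position) of the highest-scoring window in seq."""
    k = len(pwm)
    return max((sum(pwm[i][seq[p + i]] for i in range(k)), p)
               for p in range(len(seq) - k + 1))

# Hypothetical aligned enhancer sites.
sites = ["GAAGAA", "GAAGAT", "GACGAA", "GAAGAA"]
pwm = build_pwm(sites)

# (a) The consensus keeps only the most frequent base per position.
consensus = "".join(max("ACGT", key=column.get) for column in pwm)
# (b) The PWM scores every base at every position.
score, position = best_hit(pwm, "TTCCGAAGAATCC")
print(consensus, position)
```

The consensus discards the variation seen at the third and sixth positions, whereas the PWM still gives partial credit to sites such as GACGAA.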
Motif finding (again!)
• Statistical overrepresentation
• Find k-mers that occur more often in one
class of sequences than in another;
• Should be statistically significant
• Exonic splicing enhancers (ESE) are more
likely to occur in exons than in introns; hence
find 6-mers (k=6) statistically overrepresented
in exons compared to introns
• Calculate the z-score of each k-mer’s count
– z = (count - mean) / (standard deviation)
– Homework 1
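A sketch of the overrepresentation test under one possible null model, which is an assumption here: the k-mer's count in exons is treated as binomial, with success probability estimated from the introns (plus a pseudocount). Overlapping k-mers are not independent, so this is only approximate. The toy sequences are invented and use k=3 to keep the example small.

```python
from collections import Counter
from math import sqrt

def kmer_counts(seqs, k):
    """Count all overlapping k-mers in a list of sequences."""
    counts = Counter()
    for seq in seqs:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts

def kmer_zscores(exons, introns, k=6, pseudocount=1.0):
    """z = (count - mean) / sd for each k-mer seen in the exons."""
    fg, bg = kmer_counts(exons, k), kmer_counts(introns, k)
    n_fg, n_bg = sum(fg.values()), sum(bg.values())
    zscores = {}
    for kmer, count in fg.items():
        # Background probability of this k-mer, estimated from introns.
        p = (bg[kmer] + pseudocount) / (n_bg + pseudocount * 4 ** k)
        mean = n_fg * p
        sd = sqrt(n_fg * p * (1 - p))
        zscores[kmer] = (count - mean) / sd
    return zscores

# Hypothetical toy data.
exon_seqs = ["GAAGAAGAAGAA", "TGAAGAACT"]
intron_seqs = ["GTATGTTTCTAG", "GTAAGTCTTTAG"]

zscores = kmer_zscores(exon_seqs, intron_seqs, k=3)
top_kmer = max(zscores, key=zscores.get)
print(top_kmer)
```

K-mers with the highest z-scores are the candidate exonic splicing enhancers. With real data the scores still need a significance threshold that accounts for testing all 4^k k-mers.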
Motif finding
• Other standard approaches of motif
finding also adopted:
– MEME & Gibbs sampling
• Comparative genomics
– Find conserved sites in introns
– Find conserved sites in exons. This has to
be done carefully, because exons are already
under selective pressure as coding sequence.
Summary
• Alternative splicing is an important regulatory
mechanism affecting a large fraction of human genes
• Bioinformatics for finding alternatively
spliced forms
• Bioinformatics for finding regulatory
mechanisms