Transcript Lecture 12

Bioinformatics
Lecture 13
• Alternative splicing
• Multiple isoforms
• Exonic Splicing Enhancers (ESE) and Silencers (ESS)
• SpliceNest
ALTERNATIVE SPLICING
Two or more mRNA molecules can be produced from the same gene.
[Figure: one gene giving rise to mRNA 1 and mRNA 2]
The Dscam gene of Drosophila melanogaster can potentially produce up to 38,016 different mature transcripts, whereas the entire Drosophila genome contains only ~14,000 genes!
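The Dscam figure is a combinatorial result: each mature mRNA keeps exactly one exon from each cluster of mutually exclusive exons, so the number of possible isoforms is the product of the cluster sizes. The sketch below assumes the cluster sizes commonly cited in the literature (12, 48, 33 and 2 alternatives for exons 4, 6, 9 and 17); these numbers are not given in this lecture, and the function name is hypothetical.

```python
from math import prod

# Assumed cluster sizes (literature values, not from this lecture): number of
# mutually exclusive alternatives for Dscam exons 4, 6, 9 and 17.
DSCAM_CLUSTERS = {"exon4": 12, "exon6": 48, "exon9": 33, "exon17": 2}

def isoform_count(cluster_sizes):
    """Every isoform uses exactly one exon from each mutually exclusive
    cluster, so the number of possible isoforms is the product of the sizes."""
    return prod(cluster_sizes)

print(isoform_count(DSCAM_CLUSTERS.values()))  # 38016
```

The same reasoning shows why a handful of alternative exons already yields many isoforms: every additional independent choice multiplies the total.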
One Gene, Many Proteins
• The classical view "ONE GENE, ONE PROTEIN" does not hold for at least 40-60% of the mammalian genes studied
• Data show that many variants of mRNA and proteins
can be produced from the same gene
[Figure: one gene gives rise to mRNA1, mRNA2 and mRNA3, which are translated into Protein1, Protein2 and Protein3]
Gene prediction/identification and alternative splicing
• While gene prediction can be done relatively precisely, this may not be sufficient to predict the structure of the mature mRNA
• Different alternative mRNA isoforms can be produced from the same gene in different tissues and at different times
• This means that numerous factors can enhance or silence particular splice sites
• Identifying these factors is essential for improving the predictive power of computer programs
• It is particularly important to combine experimental and computational studies to make progress in this field
Five common models of mRNA alternative splicing
• Exon skipping/inclusion
• Alternative 3’ splice sites
• Alternative 5’ splice sites
• Mutually exclusive exons
• Intron retention
[Figure legend: constitutive exon; alternatively spliced exon]
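To make the exon skipping/inclusion model concrete, here is a minimal sketch that compares two isoforms of one gene and reports the skipped exons. It assumes both isoforms are given as lists of exon coordinates on the same genomic strand (0-based, end-exclusive); the function names are hypothetical. An exon counts as skipped when the other isoform has an intron running exactly from the end of its upstream exon to the start of its downstream exon.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, end), 0-based, end-exclusive genomic coordinates

def introns(exons: List[Exon]) -> List[Tuple[int, int]]:
    """Introns are the gaps between consecutive exons of one isoform."""
    exons = sorted(exons)
    return [(exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1)]

def skipped_exons(inclusion: List[Exon], skipping: List[Exon]) -> List[Exon]:
    """Internal exons of `inclusion` that `skipping` splices over, i.e. the
    skipping isoform joins the upstream exon directly to the downstream one."""
    inclusion = sorted(inclusion)
    skip_introns = set(introns(skipping))
    found = []
    for k in range(1, len(inclusion) - 1):
        upstream_end = inclusion[k - 1][1]
        downstream_start = inclusion[k + 1][0]
        if (upstream_end, downstream_start) in skip_introns:
            found.append(inclusion[k])
    return found

isoform_a = [(0, 100), (200, 250), (400, 500)]
isoform_b = [(0, 100), (400, 500)]
print(skipped_exons(isoform_a, isoform_b))  # [(200, 250)]
```

The other models can be detected in the same spirit by comparing splice-site coordinates: alternative 5’ or 3’ sites change only one end of an intron, and intron retention turns an intron of one isoform into exonic sequence of the other.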
Alternative splicing of the -tropomyosin gene mRNA
Models of serine/arginine-rich (SR) protein action in Exonic Splicing Enhancer (ESE)-dependent splicing
U2 snRNP: U2 small nuclear ribonucleoprotein; RRM: RNA recognition motif; RS: Arg/Ser-rich domain; ESS: Exonic Splicing Silencer
THE MODELS ARE NOT MUTUALLY EXCLUSIVE AND MAY HAVE NUMEROUS VARIATIONS
Predictive identification of exonic splicing enhancers (ESE) in human genes
• ESEs play important roles in constitutive and alternative splicing.
• A computational method, RESCUE-ESE, was developed that predicts which sequences have ESE activity by statistical analysis of exon/intron and splice site sequence composition.
• Applied to large data sets of human gene sequences, the method identified 10 candidate ESE motifs. Representatives of all 10 motifs displayed enhancer activity in vivo, whereas point mutants of these sequences showed sharply reduced activity.
• The identified motifs make it possible to predict the splicing phenotypes of exonic mutations in human genes.
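The core statistical idea behind this kind of motif discovery can be sketched as scoring every short oligonucleotide by how much more frequent it is in exons than in introns. This is a simplified illustration, not the published RESCUE-ESE procedure: it assumes hexamers (k = 6), uses a pseudocounted log2 frequency ratio, and omits the splice-site-strength comparison and significance testing. The function names are hypothetical.

```python
import math
from collections import Counter

def kmer_counts(seqs, k):
    """Count all overlapping k-mers in a list of sequences."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts

def exon_enrichment(exons, introns, k=6, pseudo=1.0):
    """log2(frequency in exons / frequency in introns) for every observed
    k-mer, with pseudocounts so unseen k-mers do not give infinite scores."""
    ex, intr = kmer_counts(exons, k), kmer_counts(introns, k)
    n_ex, n_in = sum(ex.values()), sum(intr.values())
    vocab = set(ex) | set(intr)
    v = len(vocab)
    scores = {}
    for w in vocab:
        p_ex = (ex[w] + pseudo) / (n_ex + pseudo * v)
        p_in = (intr[w] + pseudo) / (n_in + pseudo * v)
        scores[w] = math.log2(p_ex / p_in)
    return scores

scores = exon_enrichment(["GAAGAAGAAGAA"], ["TTTTTTTTTTTT"])
print(max(scores, key=scores.get))  # GAAGAA
```

With real data, the top-scoring k-mers would then be clustered into motifs and tested experimentally, as described in the bullets above.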
Consensus RNA motifs for the sites recognized by four serine/arginine-rich (SR) proteins acting through exonic splicing enhancers (ESE)
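Consensus motifs of this kind are usually written with IUPAC degeneracy codes (for example R = A/G). One way to use them is to scan an exon for matches and check whether a point mutation destroys a site, the kind of splicing-phenotype prediction mentioned above. The motif "GARGAR" below is a hypothetical purine-rich example, not one of the SR-protein motifs from the figure, and the function names are illustrative.

```python
import re

# IUPAC nucleotide codes as regex character classes (U is treated as T).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
         "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
         "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
         "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]"}

def find_sites(seq, consensus):
    """Start positions of (possibly overlapping) matches to the consensus."""
    seq = seq.upper().replace("U", "T")
    body = "".join(IUPAC[c] for c in consensus.upper())
    pattern = re.compile("(?=(" + body + "))")  # lookahead allows overlaps
    return [m.start() for m in pattern.finditer(seq)]

def lost_sites(seq, pos, alt, consensus):
    """Sites present in the wild type but absent after substituting `alt` at `pos`."""
    mutant = seq[:pos] + alt + seq[pos + 1:]
    return set(find_sites(seq, consensus)) - set(find_sites(mutant, consensus))

wild_type = "CCGAAGAACC"
print(find_sites(wild_type, "GARGAR"))          # [2]
print(lost_sites(wild_type, 5, "C", "GARGAR"))  # {2}
print(lost_sites(wild_type, 4, "G", "GARGAR"))  # set()
```

The last call shows why degenerate motifs matter: an A→G change at an R position keeps the site, while a G→C change at a fixed position destroys it.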
Expressed Sequence Tags and splice sites
• An expressed sequence tag (EST) is a short sequence read from a cDNA, i.e. a fragment of an expressed (transcribed) gene. An EST can be used to fish the rest of the gene out of the chromosome by base-pairing with part of the gene.
• ESTs, and particularly the consensus sequences of clustered ESTs, provide useful information about the splice variants of genes.
• Predicted human mRNA sequences were mapped onto human
genomic DNA to compute gene structure and splice variants. The
results have been collected in a public database, SpliceNest, with a
web-based interactive graphical user interface. Similar computations
can be done for several other species.
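Computing gene structure from such a mapping amounts to treating the gaps between consecutive aligned blocks as introns and checking their boundaries. The sketch below assumes the alignment is already done and given as (start, end) blocks on the plus strand of a genome string (0-based, end-exclusive); it only tests for the canonical GT...AG intron ends. The function names are hypothetical, and this is not the SpliceNest pipeline itself.

```python
def introns_from_blocks(blocks):
    """Gaps between consecutive aligned blocks are taken as introns."""
    blocks = sorted(blocks)
    return [(blocks[i][1], blocks[i + 1][0]) for i in range(len(blocks) - 1)]

def gene_structure(genome, blocks):
    """Return (intron_start, intron_end, is_canonical) for each inferred intron,
    where canonical means the intron starts with GT and ends with AG."""
    report = []
    for start, end in introns_from_blocks(blocks):
        intron = genome[start:end].upper()
        canonical = len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")
        report.append((start, end, canonical))
    return report

genome = "AAAAA" + "GTCCCCAG" + "TTTTT"  # exon 1 | intron | exon 2
print(gene_structure(genome, [(0, 5), (13, 18)]))  # [(5, 13, True)]
```

A gap that is not flanked by GT...AG may point to an alignment error, although rarer minor-class introns (e.g. GC-AG, AT-AC) also exist.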
http://splicenest.molgen.mpg.de/
SpliceNest: visualizing gene structure and alternative splicing based on EST clusters
• SpliceNest is a tool for exploring gene structure, including alternative splicing, based on a mapping of the EST consensus sequences (contigs) from GeneNest onto the complete human genome.
• SpliceNest is integrated with GeneNest and the SYSTERS protein sequence cluster set in one framework, permitting exploration of the whole sequence space, covering protein, mRNA and EST sequences as well as genomic DNA.
Cluster: A group of ESTs and/or mRNAs that are sufficiently similar to assume that they constitute transcripts from the same gene.
Contig: A representation of a (partial) transcript summarized by a consensus sequence, created by multiple alignment of overlapping sequences.
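Both definitions can be illustrated in a few lines. The sketch below assumes that two ESTs belong to the same cluster whenever they share an exact k-mer (k = 8 here only to suit the toy data; real pipelines use longer overlaps, similarity scores and reverse-complement handling), merging groups with a union-find structure. It then builds a contig consensus by majority vote over an already computed multiple alignment. The function names are hypothetical, and this is not the GeneNest procedure.

```python
from collections import Counter

def kmers(seq, k):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def cluster_ests(ests, k=8):
    """Group EST indices that are linked, directly or transitively, by a shared k-mer."""
    parent = list(range(len(ests)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    owner = {}  # k-mer -> index of the first EST containing it
    for i, s in enumerate(ests):
        for km in kmers(s.upper(), k):
            if km in owner:
                ri, rj = find(i), find(owner[km])
                if ri != rj:
                    parent[ri] = rj
            else:
                owner[km] = i
    groups = {}
    for i in range(len(ests)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

def consensus(aligned_rows):
    """Majority base per alignment column, ignoring gap characters."""
    if len({len(r) for r in aligned_rows}) != 1:
        raise ValueError("aligned rows must be non-empty and of equal length")
    out = []
    for column in zip(*aligned_rows):
        bases = [c for c in column if c not in "-."]
        if bases:
            out.append(Counter(bases).most_common(1)[0][0])
    return "".join(out)

ests = ["ACGTACGTAAGG", "TACGTAAGGCCT", "GGGGCCCCTTTT"]
print(cluster_ests(ests))                        # [[0, 1], [2]]
print(consensus(["ACGT-", "ACGTA", "AGGTA"]))    # ACGTA
```

Single-linkage merging like this is simple but can chain unrelated sequences through repeats, which is one reason production clustering uses stricter overlap criteria.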
Alternative splice candidates