Splicing (1977) Roberts and Sharp (Nobel 1993)

On the biological significance of alternative splicing: a bioinformatics approach
Sandro J. de Souza
TDR, 07/05/2004
RNA 10:757-765, 2004
Genomics
Bioinformatics
Large-scale Biology
The Real Revolution
Early 20th century: rediscovery of Mendel's inheritance laws
Mid 20th century: DNA as the genetic element (Avery)
Mid 20th century: Watson and Crick and the structure of DNA.
70’s and 80’s: Molecular biology/biotechnology
90’s and 21st century: Genomics and Bioinformatics
Paradigm in Biology: Evolution by means of natural selection
(Darwin and Wallace, mid 19th century)
Bioinformatics
• Development of tools
• Gateway to explore new datasets
• Processing of data derived from large-scale projects
• A new way to do hypothesis-driven science

Splicing (1977)
Roberts and Sharp (Nobel 1993)
Exons
Introns
mRNA
Coding
Non-coding
Splicing
Splicing depends on recognition of exon-intron boundaries
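Consensus sequences at the exon-intron boundaries (numbers are the percent occurrence of each base; "|" marks the boundary):
5’ site: exon A(64) G(73) | G(100) U(100) A(62) A(68) G(84) U(63) … intron
3’ site: intron … Py12 N C(65) A(100) G(100) | N exon
Splice sites are generic and consist solely of:
• 5’ boundary
• 3’ boundary
• Acceptor site
• Polypyrimidine tract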
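The boundary rules above (invariant GU at the 5’ end of the intron, invariant AG at the 3’ end, and a pyrimidine-rich stretch just upstream of the final N C A G) can be illustrated with a minimal sketch. The function name, the returned keys and the 12-base window are assumptions made for illustration; real splice-site recognition is considerably more involved than this check.

```python
PYRIMIDINES = set("CU")

def check_intron_boundaries(intron: str, tract_len: int = 12) -> dict:
    """Check an intron sequence against the invariant boundary bases.

    Hypothetical helper: accepts DNA or RNA letters and reports
    whether the intron starts with GU, ends with AG, and how
    pyrimidine-rich the window upstream of the final 4 bases is.
    """
    seq = intron.upper().replace("T", "U")  # work in the RNA alphabet
    has_gu = seq.startswith("GU")           # 5' site: G and U at 100%
    has_ag = seq.endswith("AG")             # 3' site: A and G at 100%
    # Window of tract_len bases just before the last four (N C A G).
    if len(seq) >= tract_len + 4:
        tract = seq[-(tract_len + 4):-4]
    else:
        tract = ""
    if tract:
        py_fraction = sum(base in PYRIMIDINES for base in tract) / len(tract)
    else:
        py_fraction = 0.0
    return {"gu": has_gu, "ag": has_ag, "py_fraction": py_fraction}


# Made-up intron: GU start, pyrimidine tract, then ACAG at the end.
example = "GTAAGT" + "A" * 10 + "CTCTCTTTCCTC" + "ACAG"
print(check_intron_boundaries(example))
```

A sequence that passes all three checks is only a candidate intron; the percentages on the slide show that the remaining positions are preferences, not requirements.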
.....if they occur at the boundaries of the regions to be spliced
out, can change the splicing pattern, resulting in the deletion
or addition of whole sequences of amino acids.
Walter Gilbert. Why genes in pieces. Nature 271:501, 1978.
At least half of all human genes undergo alternative splicing
Biological significance or spurious events?
Alternative splicing
1. The chromosomal (X:autosome) ratio activates transcription of Sxl in females only.
2. SXL controls splicing of tra mRNA.
3. Females: exon 2 (which has a stop codon) is removed via SXL. Males: exon 2 is not removed.
4. Males: no active TRA. Females: TRA is made.
5. TRA directs splicing of dsx mRNA in a female-specific manner; in males default splicing occurs.
Alternative Splicing – Auditory Hair Cells
[Figure: K+ channel in the plasma membrane (PM), facing the cytosol. Picture of human cochlear hair cells from http://www.sickkids.on.ca/otolaryngology/Hearloss.asp]
Sound frequency → cytosolic Ca2+ concentration → K+ channel opens
Therefore Ca2+ concentration ‘decodes’ frequency
AVSGRK
AVSGRKAMFARYVPEIAALILNRKKYGGTFNSTRGRK
Dotted lines show regions of the protein dependent on splicing
The Ca2+ concentration at which the K+ channel opens depends on alternative splicing of the K+ channel: 576 possible alternative splicing combinations
Types of alternative splicing:
[Figure: mRNA diagrams, 5’ to 3’, one per type]
• Exon skipping
• Alternative 5’ splice site
• Alternative 3’ splice site
• Intron retention
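The four types listed above can be told apart by comparing the exon coordinates of two transcripts from the same gene. The sketch below is one possible way to do it, under stated assumptions: exons are half-open genomic intervals on the plus strand, and the names `introns` and `classify_events` are hypothetical. It is not the method used in the work presented here.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # half-open interval [start, end), plus strand assumed


def introns(exons: List[Exon]) -> List[Tuple[int, int]]:
    """Introns are the gaps between consecutive exons."""
    ordered = sorted(exons)
    return [(a[1], b[0]) for a, b in zip(ordered, ordered[1:])]


def classify_events(reference: List[Exon], variant: List[Exon]) -> List[str]:
    """Name the event types that distinguish the variant from the reference."""
    ref_introns = introns(reference)
    var_introns = introns(variant)
    events = set()
    for start, end in var_introns:
        if (start, end) in ref_introns:
            continue  # same intron in both transcripts
        if any(start <= a and b <= end for a, b in reference):
            # The variant intron swallows a whole reference exon.
            events.add("exon skipping")
        elif any(end == e for _, e in ref_introns):
            # Same 3' end of the intron, different 5' end.
            events.add("alternative 5' splice site")
        elif any(start == s for s, _ in ref_introns):
            # Same 5' end of the intron, different 3' end.
            events.add("alternative 3' splice site")
    for start, end in ref_introns:
        if any(a <= start and end <= b for a, b in variant):
            # A variant exon covers the whole reference intron.
            events.add("intron retention")
    return sorted(events)


reference = [(0, 100), (200, 300), (400, 500)]
print(classify_events(reference, [(0, 100), (400, 500)]))
print(classify_events(reference, [(0, 300), (400, 500)]))
```

Exon skipping is tested first because a skipped exon also produces an intron that shares one end with a reference intron, which would otherwise be mislabeled as an alternative splice site.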
Large-scale analysis of intron retention in the human transcriptome
Pedro F.A. Galante, Noboru Jo Sakabe, Natanja Slager, Sandro J. de Souza
Examples of intron retention events with biological significance
• Msl2 in Drosophila
• P element in Drosophila
• Retroviruses

Immature B cells express membrane-bound Ig. Activation leads to production of the secreted form.
[Figure: Ig gene in the immature B cell and after activation, with labels for the stop codons, the transmembrane domain, and the hydrophilic tail (hydrophilic stretch)]
In immature B cells an intron containing an early translational stop signal is removed, yielding a long transcript. The additional sequence encodes a transmembrane region.
This intron is not removed in activated B cells, giving rise to a truncated (secreted) product.
Intron retention and cancer
CD44: several tumors
Gastrin receptor: pancreas
Ret tyrosine kinase: pheochromocytomas
Fas receptor: T-cell lymphoma
Known mRNAs
EST data
SAGE data
Transcriptome
Database
Genome Data
Genome-based cDNA clustering
[Figure: Exon 1, Exon 2 and Exon 3 on the DNA; the mRNA sequences aligned to them form a cluster]
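Genome-based clustering groups cDNA sequences (known mRNAs and ESTs) by where they align on the genome. A minimal sketch of the idea, assuming each sequence has already been aligned and reduced to one genomic span: sequences whose spans overlap on the same chromosome fall into the same cluster. The tuple layout and the function name are hypothetical, and the sketch ignores strand and exon structure, which a real pipeline would take into account.

```python
from typing import List, Tuple

# (sequence id, chromosome, start, end), half-open coordinates
Alignment = Tuple[str, str, int, int]


def cluster_by_genome(alignments: List[Alignment]) -> List[List[str]]:
    """Group sequence ids whose genomic spans overlap on the same chromosome."""
    clusters: List[List[str]] = []
    current_ids: List[str] = []
    current_chrom = None
    current_end = -1
    # Sort by chromosome, then start, so overlapping spans are adjacent.
    for seq_id, chrom, start, end in sorted(alignments, key=lambda a: (a[1], a[2])):
        if chrom != current_chrom or start >= current_end:
            # No overlap with the running cluster: close it, open a new one.
            if current_ids:
                clusters.append(current_ids)
            current_ids = []
            current_chrom = chrom
            current_end = end
        current_ids.append(seq_id)
        current_end = max(current_end, end)
    if current_ids:
        clusters.append(current_ids)
    return clusters


# Made-up coordinates, for illustration only.
alignments = [
    ("mRNA1", "chr17", 100, 900),
    ("EST1", "chr17", 850, 1200),
    ("EST2", "chr17", 5000, 5400),
    ("EST3", "chr1", 120, 300),
]
print(cluster_by_genome(alignments))
```

Within each cluster, transcripts can then be compared exon by exon to look for splicing variants.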
Transcript Mapping
P53
Types of Data
Dataset: prototype (rows) by retention evidence (columns)
Prototype | Retention: Full length | Retention: EST | Total
Full length | 640 | 691 | 1120
EST | 2594 | n.d | 2594
Total | 2793 | 691 | 3127
Experimental validation
14% of all human genes show evidence
of intron retention
Kan, States & Gish (2002)
36% of RefSeq database!
After sample statistics: 5%
Distribution of events along transcripts.
elite group
Events in | Observed | Expected
CDS | 287 (53%) | 502 (93%)
5’ UTR | 84 (15%) | 27 (5%)
3’ UTR | 170 (32%) | 12 (2%)
p << 0.005
MGC
Events in | Observed | Expected
CDS | 87 (52%) | 155 (93%)
5’ UTR | 15 (9%) | 8 (5%)
3’ UTR | 65 (39%) | 4 (2%)
p << 0.005
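The observed and expected counts above can be compared with a chi-square goodness-of-fit test. The slides do not say which test produced p << 0.005, so the choice of test here is an assumption; the counts are the first set reported above (CDS, 5’ UTR, 3’ UTR). For 2 degrees of freedom the upper-tail probability has the closed form exp(-x/2), which keeps the sketch free of external libraries.

```python
import math


def chi_square_gof(observed, expected):
    """Return the chi-square statistic and degrees of freedom."""
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return stat, df


def p_value_df2(stat):
    """Upper-tail probability of a chi-square with exactly 2 degrees of freedom."""
    return math.exp(-stat / 2.0)


observed = [287, 84, 170]  # CDS, 5' UTR, 3' UTR
expected = [502, 27, 12]

stat, df = chi_square_gof(observed, expected)
assert df == 2  # p_value_df2 is only valid for 2 degrees of freedom
print(stat, p_value_df2(stat))
```

The statistic is dominated by the 3’ UTR term, where 170 events are observed against 12 expected.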
This bias can be a product of:
• Underreporting of sequences
• Nonsense-mediated decay (NMD)
2563 out of 3195 (80%) sequences with a retained intron had an exon/exon boundary downstream of the retention event.
Retained introns are shorter
p << 0.001
Domains encoded by retained introns
Number of domains entirely encoded by:
• Retained introns only: 02
• Exon-intron-exon: 31
Number of domains partially encoded by:
• Retained introns only: 25
• Exon-intron-exon: 10
Retained introns have a higher
GC content
p << 0.001
Did retained introns encode protein domains?
• Only retained introns in the CDS were used.
• Only retained introns defined by full-length mRNAs were used.
• Protein sequences were searched against the PFAM database.

Codon Usage
Conservation of intron retention in mouse cDNA sequences
40%-57% of all retained introns present a mouse hit
Identity of orthologous retained introns is 84% (non-retained introns: 60%; exons: 87%)
Mouse cDNA also corresponds to a retention variant: 26% - 10 out of 46
Frequency of stop codons
[Figure: exon, retained intron, exon]
TACTTGTGCGTAGTCCCCGCGATCTAACGCCACGATGGATGACAC
TACTTGTGCGTAGTCCCCGCGATCTAACGCCACGATGGATGACACTGTGA
Stop codons – TAG, TGA, TAA
Found 651 stop codons
Expected: 1064
p-value << 0.005
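Counting stop codons in a sequence, as in the comparison above, can be sketched as follows. The slides do not say how the expected number (1064) was derived, so this sketch only does the counting; the function names are hypothetical, and the example sequence is the one shown on the slide.

```python
STOP_CODONS = {"TAG", "TGA", "TAA"}


def count_stop_codons(seq: str, frame: int = 0) -> int:
    """Count stop codons read in steps of three, starting at offset `frame`."""
    seq = seq.upper()
    return sum(
        seq[i:i + 3] in STOP_CODONS
        for i in range(frame, len(seq) - 2, 3)
    )


def count_all_frames(seq: str) -> dict:
    """Stop codon counts for the three forward reading frames."""
    return {frame: count_stop_codons(seq, frame) for frame in range(3)}


slide_example = "TACTTGTGCGTAGTCCCCGCGATCTAACGCCACGATGGATGACACTGTGA"
print(count_all_frames(slide_example))
```

For a retained intron inside a CDS, the frame of interest is the one continued from the upstream exon; a count of zero in that frame means the retained intron can be translated through.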
88 cases where the retention generates a putative truncated protein
[Figure: mRNA and CDS of the two forms, with the premature stop marked]
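GC content for sequences upstream and downstream of the premature stop codon (88 cases)
[Figure: 5’ exon, retained intron, exon 3’, with the stop inside the retained intron]
Upstream of the stop: GC 58%
Downstream of the stop: GC 49%
Sequences upstream of the stop are under selective pressure for coding potential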
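The GC comparison around the premature stop codon can be sketched as below. This is an illustration under assumptions: `gc_around_stop` and its `stop_index` argument (the position of the first base of the stop codon within the retained intron) are hypothetical names, and the input sequence is made up.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def gc_around_stop(seq: str, stop_index: int):
    """GC content upstream and downstream of a stop codon at stop_index."""
    upstream = seq[:stop_index]
    downstream = seq[stop_index + 3:]  # skip the three bases of the stop
    return gc_content(upstream), gc_content(downstream)


# Made-up retained intron with a TAG starting at position 4.
up, down = gc_around_stop("GGCCTAGATAT", 4)
print(up, down)
```

Averaging the two values over all 88 cases would give the kind of upstream versus downstream comparison reported above.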
Why is the argument of ‘selection’ important?
• As noted originally by Gilbert (1978), mutations that affect splicing can allow the production of new proteins without the loss of the original one.
• Therefore, there should not be any “negative selection” on this variant.
• If, however, the new variant has some biological significance, selection will act to maintain the function of this variant.
Intron Retention in Tumors
Tissue | T/N | IR
Breast | T | 1.52*
Breast | N | 0.62
Prostate | T | 1.45*
Prostate | N | 0.44
Brain | T | 2.52*
Brain | N | 3.16
Colon | T | 0.85
Colon | N | 0.60
Towards a reliable set of intron retention events
w/ downstream spliced intron | 2563/3195 | 80%
w/ hit w/ mouse cDNAs* | 74/152 | 49%
encoding protein domains* | 47/151 | 31%
experimentally validated (both forms) | 2/2
* full-length vs full-length set and retained intron entirely in the CDS
Second International Conference on Bioinformatics and Computational Biology
www.icobicobi.com.br
25-28/10/2004
Angra dos Reis
Group of
Computational Biology
Sandro J. de Souza: tennis player
Helena Samaia: Research Assistant
Ana C. Pereira: Admin. Assistant
Maarten Leerkes: Ph.D student
Noboru Sakabe: Ph.D student
Maria Vibranovski: Ph.D student
Elza Helena: Ph.D student
Natanja Slager: Ph.D student
Pedro Galante: Ph.D student
Elisson C. Osorio: programmer
Jorge E. de Souza: Ph.D student
Rodrigo Soares: programmer
Andre Zaiats: system admin.