Evolution of alternative splicing


Alternative splicing:
A playground of evolution
Mikhail Gelfand
Research and Training Center for Bioinformatics
Institute for Information Transmission Problems RAS,
Moscow, Russia
RECOMB, 20 May 2008
% of alternatively spliced human and mouse genes, by year of publication
[Figure: plot of published estimates by year; axis labels: 100%, 2008 (C.Burge). Legend: Human (genome / random sample); Human (individual chromosomes); Mouse (genome / random sample). Gene sets: all genes; only multiexon genes; genes with high EST coverage.]
Roles of alternative splicing
• Functional:
– creating protein diversity
• human: ~30,000 genes, >100,000 proteins
– maintaining protein identity
• e.g. membrane (receptor) and secreted isoforms
• dominant negative isoforms
• combinatorial (transcription factors, signaling domains)
– regulatory
• e.g. via channelling to NMD (nonsense-mediated decay)
• Evolutionary
Plan
• Evolution of alternative exon-intron structure
• Origin of new (alternative) exons and sites
• Evolutionary rates in constitutive and alternative regions
Elementary alternatives
Cassette exon
Mutually exclusive exons
Alternative donor site
Alternative acceptor site
Retained intron
Sources of data
• ESTs: 1999 global, 2002-3 comparative
– mapping exon-intron structure to genome
– global alignment of genomes
– identifying non-conserved exons and splice sites
• oligonucleotide arrays (chips): 2001 global, 2004 comparative
– qualitative analysis (inclusion values)
– genome-specific constitutive / alternative exons
• mRNA-seq (new generation high-throughput): 2008 global, expected 2009-10 comparative
Alternative exons are often genome-specific
(Modrek & Lee, 2003)
~ 25% AS events in ~50% genes
are not conserved
Na/K-ATPase
Fxyd2/FXYD2
p53
Nurtdinov…Gelfand, 2003
Alternative exon-intron structure
in fruit flies and malarial mosquito
• Same procedure (AS data from FlyBase)
– cassette exons, splicing sites
– also mutually exclusive exons, retained introns
• Follow the fate of D. melanogaster exons in the
D. pseudoobscura and Anopheles genomes
• Technically more challenging:
– incomplete genomes
– the quality of alignment with the Anopheles genome is lower,
especially for terminal exons
– frequent intron insertion/loss (~4.7 introns per gene in
Drosophila vs. ~3.5 introns per gene in Anopheles)
Malko…Gelfand, 2006
Conservation of coding segments
                                       constitutive   alternative
                                       segments       segments
D. melanogaster – D. pseudoobscura     97%            75-80%
D. melanogaster – Anopheles gambiae    77%            ~45%
Conservation of D.melanogaster elementary
alternatives in D. pseudoobscura genes
[Figure: stacked bar chart, 0-100%, one bar per category: constitutive exon, donor site, acceptor site, retained intron, cassette exon, exclusive exon. Colours: blue – exact; green – divided exons; yellow – joined exons; orange – mixed; red – non-conserved.]
• retained introns are the least conserved
(are all of them really functional?)
• mutually exclusive exons are as conserved as constitutive exons
Conservation of D.melanogaster elementary
alternatives in Anopheles gambiae genes
[Figure: stacked bar chart, 0-100%, one bar per category: constitutive exon, donor site, acceptor site, retained intron, cassette exon, exclusive exon. Colours: blue – exact; green – divided exons; yellow – joined exons; orange – mixed; red – non-conserved.]
• ~30% joined, ~10% divided exons (fewer introns in Aga)
• mutually exclusive exons are conserved exactly
• cassette exons are the least conserved
Genome-specific AS:
real or noise?
young or deteriorating?
• minor isoforms, small inclusion rate
• often frameshifting and/or stop-containing
=> NMD
– regulatory role?
Sorek, Shamir & Ast, 2004
Alternative exon-intron structure
in the human, mouse and dog genomes
• Human-mouse-dog triples of orthologous
genes
• We follow the fate of human alternative sites
and exons in the mouse and dog genomes
• Each human AS isoform is spliced-aligned to
the mouse and dog genome. Definition of
conservation:
– conservation of the corresponding region
(homologous exon is actually present in the
considered genome);
– conservation of splicing sites (GT and AG)
Nurtdinov…Gelfand, 2007
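The second criterion above (conservation of the GT and AG dinucleotides) can be illustrated with a minimal sketch. It assumes the homologous exon has already been located in the target genome by spliced alignment, that coordinates are 0-based and half-open on the forward strand, and that only canonical sites count. The function name and the toy sequences are hypothetical and not taken from the study.

```python
def flanking_sites_conserved(genome: str, exon_start: int, exon_end: int) -> bool:
    """Check that the exon genome[exon_start:exon_end] is flanked by
    a canonical acceptor (AG) upstream and a canonical donor (GT) downstream."""
    if exon_start < 2 or exon_end + 2 > len(genome):
        return False  # not enough flanking sequence to inspect both sites
    acceptor = genome[exon_start - 2:exon_start].upper()
    donor = genome[exon_end:exon_end + 2].upper()
    return acceptor == "AG" and donor == "GT"


# Toy target-genome fragments: intron (lower case), exon ATGGCC, intron.
conserved = "ccccAGATGGCCGTaaaa"
donor_lost = "ccccAGATGGCCGAaaaa"
print(flanking_sites_conserved(conserved, 6, 12))
print(flanking_sites_conserved(donor_lost, 6, 12))
```

A real pipeline would also have to handle the reverse strand and alignment gaps, which this sketch ignores.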
Caveats
• we consider only possibility of AS in mouse and
dog: do not require actual existence of
corresponding isoforms in known transcriptomes
• we do not account for situations when alternative
human exon (or site) is constitutive in mouse or
dog
• functionality assignments (translated / NMD-inducing) are not very reliable
Gains/losses: loss in mouse
Common
ancestor
Gains/losses: gain in human (or noise)
Common
ancestor
Gains/losses: loss in dog
(or possible gain in human+mouse)
Common
ancestor
Triple comparison
Human-specific alternatives: noise?
Conserved alternatives
Translated and NMD-inducing cassette exons
• Mainly included exons are highly conserved irrespective of function
• Mainly skipped translated exons are more conserved than NMD-inducing
ones
• Numerous lineage-specific losses
– more in mouse than in dog
– more of NMD-inducing than of translated exons
• ~40% of almost always skipped (<1% inclusion) human exons are
conserved in at least one lineage (mouse or dog)
Mouse+rat vs human and dog: a possibility to
distinguish between exon gain and noise
Nurtdinov…Gelfand, 2009
The rate of exon gain:
decreases with the exon inclusion rate;
increases with the sequence evolutionary rate
• Caveat: spurious exons
still may seem to be
conserved in the rodent
lineage due to short
time
Conserved rodent-specific exons and pseudoexons
Estimation of “FDR” by analysis of conservation of pseudoexons
• intronic fragments with the same characteristics (length distribution etc.)
• apply standard rules to estimate “conservation”
• obtain the number (fraction) of rodent-specific exons that could be
pseudoexons conserved by chance (brown)
• obtain the number (fraction) of real rodent-specific exons (dark green):
~50%, that is, ~15% of mouse-specific exons (the rest is likely noise)
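The pseudoexon control above amounts to comparing two conservation rates. The sketch below assumes one simple estimator, FDR = (conservation rate of pseudoexons) / (conservation rate of candidate exons); the slide does not state the exact formula, so this is an assumption, and the counts in the example are invented for illustration only.

```python
def real_exon_fraction(conserved_candidates: int, total_candidates: int,
                       conserved_pseudo: int, total_pseudo: int) -> float:
    """Estimate the fraction of 'conserved' candidate exons that are not
    explained by chance conservation, using pseudoexons as the null set."""
    candidate_rate = conserved_candidates / total_candidates
    pseudo_rate = conserved_pseudo / total_pseudo
    if candidate_rate == 0:
        raise ValueError("no conserved candidate exons to evaluate")
    fdr = min(1.0, pseudo_rate / candidate_rate)  # assumed estimator
    return 1.0 - fdr


# Invented counts: 20% of candidates and 10% of pseudoexons look conserved.
print(real_exon_fraction(200, 1000, 100, 1000))
```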
Alternative donor and acceptor sites: same trends
• Higher conservation of ~uniformly used sites
• Internal sites are more conserved than external ones (as expected)
Evolution of (alternative) exon-intron structure in 11 Drosophila spp.
[Figure: phylogenetic tree (D. Pollard, http://rana.lbl.gov/~dan/trees.html)]
Species: D. melanogaster (Dmel), D. sechellia (Dsec), D. yakuba (Dyak), D. erecta (Dere), D. ananassae (Dana), D. pseudoobscura (Dpse), D. persimilis (Dper), D. willistoni (Dwil), D. mojavensis (Dmoj), D. virilis (Dvir), D. grimshawi (Dgri)
Gain and loss of alternative segments and constitutive exons
Unique events per 1000 substitutions.
Caveat: We cannot observe exon gain outside and exon loss within the D.mel. lineage
Sample size: 397 / 18596
[Figure: Drosophila tree (Dmel, Dsec, Dyak, Dere, Dana, Dpse, Dper, Dwil, Dmoj, Dvir, Dgri); branches are labelled with pairs of gain (+) and loss (–) rates.]
Gain and loss of alternative segments and constitutive exons
Non-unique events per 1000 substitutions (Dollo parsimony)
Sample size: 452 / 18874
[Figure: Drosophila tree (Dmel, Dsec, Dyak, Dere, Dana, Dpse, Dper, Dwil, Dmoj, Dvir, Dgri); branches are labelled with pairs of gain (+) and loss (–) rates.]
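Dollo parsimony, named on this slide, allows a character to be gained only once and explains every other absence as a loss. A minimal sketch follows, assuming a tree written as nested tuples and an exon described by the set of species in which it is present. The five-species topology is a toy example, not the tree used in the talk, and all function names are hypothetical.

```python
def leaves(tree):
    """Return the set of leaf names under a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    result = set()
    for child in tree:
        result |= leaves(child)
    return result


def dollo_events(tree, present):
    """Place a single gain at the last common ancestor of all species that
    have the exon, then report each maximal clade below it that lacks the
    exon as one loss. Returns (clade of the gain, list of lost clades)."""
    present = set(present)
    if not present & leaves(tree):
        raise ValueError("exon is absent from every leaf")
    node = tree
    # Walk down while exactly one child contains all carriers of the exon.
    while not isinstance(node, str):
        holders = [child for child in node if leaves(child) & present]
        if len(holders) != 1:
            break
        node = holders[0]
    losses = []

    def collect(subtree):
        if not leaves(subtree) & present:
            losses.append(frozenset(leaves(subtree)))  # whole clade lost once
        elif not isinstance(subtree, str):
            for child in subtree:
                collect(child)

    collect(node)
    return frozenset(leaves(node)), losses


TOY_TREE = ((("Dmel", "Dsec"), ("Dyak", "Dere")), "Dana")  # illustrative only
gain_clade, lost_clades = dollo_events(TOY_TREE, {"Dmel", "Dyak"})
print(sorted(gain_clade), [sorted(clade) for clade in lost_clades])
```

Because the gain is forced to be unique, an exon seen in distant species is pushed to a deep ancestor and produces several inferred losses, which is one reason loss counts can rise under this model.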
Conserved alternative splicing in nematodes
• 92% of cassette exons from Caenorhabditis
elegans are conserved in Caenorhabditis
briggsae and/or Caenorhabditis remanei
(EST-genome comparisons)
– in minor isoforms as well
– especially for complex events
• there is less difference between levels of AS
(exon inclusion) in natural C.elegans isolates
than in mutation accumulation lines
(microarray analysis)
=> positive selection on the level of AS.
Irimia…Roy, 2007; Barberan-Soler & Zahler, 2008
Plants:
little conservation of
alternative splicing
• Arabidopsis thaliana – Oryza sativa (rice)
• Oryza sativa (rice) – Zea mays (maize)
• Few AS events are conserved (5% of genes compared to ~50% of genes with AS)
• the level of conservation is the same for translated and NMD isoforms
Severing…van Ham, 2009
Constitutive exons becoming alternative
• human-mouse comparison, EST data => 612
exons constitutively spliced in one species and
alternatively in the other
• all are major isoform (predominantly included)
• analysis of other species (selected cases):
ancestral exons have been constitutive
• characteristics of such exons (molecular
evolution: Kn/Ks, conservation of intron flanks
etc) are similar to those of constitutive exons
Lev-Maor…Ast, 2007
Changes in inclusion rate
• orthologous alternatively spliced (cassette)
exons of human and chimpanzee
• quantitative microarray profiling
• estimate the inclusion rate by comparison of
exon and exon-junction probes
=> 6-8% of alternative exons have significantly
different inclusion levels
Calarco…Blencowe, 2007
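The inclusion rate estimated from exon and exon-junction probes can be sketched as a simple ratio. The code below assumes background-corrected signals and the definition inclusion / (inclusion + skipping); the actual normalisation used in the cited study is not given on the slide, so the function and the example intensities are illustrative assumptions.

```python
def inclusion_rate(inclusion_signal: float, skipping_signal: float) -> float:
    """Fraction of transcripts that include the cassette exon, estimated from
    probes supporting inclusion (exon body and its two junctions) versus the
    probe spanning the skipping junction."""
    total = inclusion_signal + skipping_signal
    if total <= 0:
        raise ValueError("no signal for this exon")
    return inclusion_signal / total


# Hypothetical intensities for one orthologous exon in two species.
human = inclusion_rate(30.0, 10.0)
chimp = inclusion_rate(12.0, 28.0)
print(human, chimp, human - chimp)
```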
Sources of new exons
• exon shuffling and duplications
– mutually exclusive exons
• exonisation: new exons, new sites
– in repeats
• constitutive exons becoming alternative
Alternative splice sites:
Model of random site fixation
• Plots: Fraction of exon-extending alternative sites as dependent on exon length
– Main site defined as the one in
protein or in more ESTs
– Same trends for the acceptor
(top) and donor (bottom) sites
• The distribution of alt. region
lengths is consistent with
fixation of random sites
– Extend short exons
– Shorten long exons
A natural model: genetic diseases
• Mutations in splice sites yield exon skips or activation of
cryptic sites
• Exon skip or activation of a cryptic site depends on:
– Density of exonic splicing enhancers (lower in skipped exons)
– Presence of a strong cryptic site nearby

Av. dist. to a stronger site
                  Skipped exons   Cryptic site exons   Non-mutated exons
Donor sites       220             75                   289
Acceptor sites    185             66                   81
Kurmangaliev & Gelfand, 2008
Creation of sites
                                               in exon   in intron
acceptor sites
  cryptic sites (mutations in the main site)   88        29
  new sites                                    32        78
donor sites
  cryptic sites (mutations in the main site)   121       133
  new sites                                    46        46
Vorechovsky, 2006; Buratti…Vorechovsky, 2007
MAGE-A family of human CT-antigens
• Retroposition of a spliced mRNA, then duplication
• Numerous new (alternative) exons in individual copies arising from
point mutations
Creation of donor sites
Improvement of an acceptor site
Exonisation of repeats
• early studies: 61 alternatively spliced translated exons with hits to Alu (no constitutive exons)
• 84% frame-shifting or stop-containing
• exonisation by point mutations in cryptic sites in the Alu consensus
– studied in experiment
• both donor and acceptor sites
• recent study: 1824 human exons, 506 mouse exons

                                  human   mouse
unique (Alu / B1, B2, B4, ID)     1060    285
MIR                               181     27
L1                                219     102
L2                                103     9
CR1                               12      0
LTR                               155     72
DNA                               93      11

– Alu, L1, LTR may generate completely new exons
Sorek, Ast, Graur, 2002;
Lev-Maor…Ast, 2003; Sorek…Ast, 2004; Sela…Ast, 2007
Evolutionary rate in constitutive and
alternative regions
• Human and mouse orthologous genes
• D. melanogaster and D. pseudoobscura
• Estimation of the dn/ds ratio:
higher fraction of non-synonymous
substitutions (changing amino acid)
=> weaker stabilizing (or stronger positive) selection
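To make the distinction between synonymous and non-synonymous substitutions concrete, the sketch below classifies codon differences between two aligned, in-frame coding sequences using the standard genetic code. It returns raw counts only. It is not a dn/ds estimator, since it does not normalise by the number of synonymous and non-synonymous sites or correct for multiple hits, and it is not the procedure used in the talk.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order.
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def count_codon_changes(seq_a: str, seq_b: str):
    """Return (synonymous, non_synonymous) counts of differing codons."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned and in frame")
    synonymous = non_synonymous = 0
    for i in range(0, len(seq_a), 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue
        if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            continue  # skip gaps and ambiguous bases
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1
        else:
            non_synonymous += 1
    return synonymous, non_synonymous


# GCT -> GCC keeps alanine; AAA -> AGA changes lysine to arginine.
print(count_codon_changes("ATGGCTAAA", "ATGGCCAGA"))
```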
Human/mouse genes:
non-symmetrical histogram of dn/ds(const. regions) – dn/ds(alt. regions)
[Figure: histogram; x-axis: difference in dn/ds from –1 to 1 in steps of 0.1; y-axis: number of genes (ticks at 10, 100, 1000).]
Black: shadow of the left half.
In a larger fraction of genes dn/ds(alt) > dn/ds(const), especially for larger values
Concatenated regions:
Alternative regions evolve faster than constitutive ones
(*) in some other studies dN(alt)<dN(const): fewer synonymous substitutions in alternative regions

C = constitutive regions, A = alternative regions
              dS               dN               dN/dS
              C       A        C       A        C       A
Mammals       0.405   0.414    0.068   0.076    0.168   0.183
Drosophila    0.79    0.80     0.22    0.25     0.28    0.31
Weaker stabilizing selection (or positive selection) in alternative regions (insignificant in Drosophila)
[Figure: the same dS, dN and dN/dS bar chart as on the previous slide.]
Drosophila:
Synonymous substitutions prevalent in terminal alternative regions; non-synonymous substitutions, in internal alternative regions
Different behavior of terminal alternatives
Mammals: Density of substitutions increases in the N-to-C direction

[Figure: dS, dN and dN/dS bar charts for C (constitutive), A (all alternative), AN (N-terminal alternative), AI (internal alternative) and AC (C-terminal alternative) regions.]

Mammals       C       A       AN      AI      AC
dS            0.405   0.414   0.410   0.437   0.445
dN            0.068   0.076   0.076   0.074   0.132
dN/dS         0.168   0.183   0.186   0.169   0.297

Drosophila    C       A
dS            0.79    0.80
dN            0.22    0.25
dN/dS         0.28    0.31
Drosophila terminal and internal alternative regions (AN, AI, AC): dS 0.62 to 1.43, dN 0.23 to 0.33, dN/dS 0.23 to 0.37.
Many drosophilas, different alternatives
• dN in mutually exclusive exons same as in constitutive exons
• dS lower in almost all alternatives: regulation?
Relaxed (positive?) selection in alternative regions
The McDonald-Kreitman test: evidence for positive selection in (minor isoform) alternative regions
• Human and chimpanzee genome substitutions vs human SNPs
• Exons conserved in mouse and/or dog
• Genes with at least 60 ESTs (median number)
• Fisher’s exact test for significance

          Pn/Ps (SNPs)   Kn/Ks (genomes)   diff.    Signif.
Const.    0.72           0.62              – 0.10   0
Major     0.78           0.65              – 0.13   0.5%
Minor     1.41           1.89              + 0.48   0.1%

Minor isoform alternative regions:
• More non-synonymous SNPs: Pn(alt_minor) = 0.12% >> Pn(const) = 0.06%
• More non-synonymous substitutions: Kn(alt_minor) = 0.91% >> Kn(const) = 0.37%
• Positive selection (as opposed to weaker stabilizing selection):
α = 1 – (Pn/Ps) / (Kn/Ks) ~ 25% positions
• Similar results for all highly covered genes or all conserved exons
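The α estimate can be reproduced from the ratios reported on this slide. The sketch below takes α as 1 minus the polymorphism ratio (Pn/Ps) divided by the divergence ratio (Kn/Ks), and plugs in the three pairs of values from the table; the function and variable names are illustrative.

```python
def mk_alpha(pn_ps: float, kn_ks: float) -> float:
    """Estimated fraction of non-synonymous substitutions fixed by positive
    selection: alpha = 1 - (Pn/Ps) / (Kn/Ks)."""
    if kn_ks <= 0:
        raise ValueError("Kn/Ks must be positive")
    return 1.0 - pn_ps / kn_ks


# (Pn/Ps from SNPs, Kn/Ks from genome comparison), as given on the slide.
REGIONS = {
    "constitutive": (0.72, 0.62),
    "major isoform": (0.78, 0.65),
    "minor isoform": (1.41, 1.89),
}

for name, (pn_ps, kn_ks) in REGIONS.items():
    print(f"{name}: alpha = {mk_alpha(pn_ps, kn_ks):.2f}")
```

Only the minor-isoform regions give a positive α (about 0.25); for the constitutive and major-isoform regions the estimate is negative, which under this test indicates an excess of non-synonymous polymorphism relative to divergence rather than positive selection.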
An attempt of integration
• AS is often species-specific
• young AS isoforms are often minor and tissue-specific
• … but still functional
– although species-specific isoforms may result from aberrant splicing
• AS regions show evidence for decreased negative selection
– excess non-synonymous codon substitutions
• AS regions show evidence for positive selection
– excess fixation of non-synonymous substitutions (compared to SNPs)
• AS tends to shuffle domains and target functional sites in
proteins
• Thus AS may serve as a testing ground for new
functions without sacrificing old ones
What next?
• Changes in inclusion rates (mRNA-seq)
– revisit constitutive-becoming-alternative exons
• Other taxonomical groups
• Evolution of regulation
– donor and acceptor splicing sites
– splicing enhancers and silencers
– cellular context (SR-proteins etc.)
• Control for:
– functionality: translated / NMD-inducing (frameshifts, stop codons)
– exon inclusion (or site choice) level: major / minor isoform
– tissue specificity pattern (?)
– type of alternative – 1: N-terminal / internal / C-terminal
– type of alternative – 2: cassette and mutually exclusive exons, alternative sites, etc.
Acknowledgements
• Discussions
– Eugene Koonin (NCBI)
– Igor Rogozin (NCBI)
– Vsevolod Makeev (GosNIIGenetika)
– Dmitry Petrov (Stanford)
– Dmitry Frishman (GSF, TUM)
– Sergei Nuzhdin (USC)
• Support
– Howard Hughes Medical Institute
– Russian Academy of Sciences
(program “Molecular and Cellular Biology”)
– Russian Foundation of Basic Research
Authors
• Andrei Mironov (Moscow State University)
• Ramil Nurtdinov (Moscow State University)
– human/mouse+rat/dog
• Dmitry Malko (GosNIIGenetika, Moscow)
– drosophila/mosquito
• Ekaterina Ermakova (IITP)
– Kn/Ks
• Vasily Ramensky (Institute of Molecular Biology,
Moscow)
– SNPs, McDonald-Kreitman test
• Irena Artamonova (Inst. of General Genetics and IITP,
Moscow)
– human/mouse, plots, MAGE-A
Bonus track: conserved secondary structures
regulating (alternative) splicing in the
Drosophila spp.
• ~ 50 000 introns
• 17% alternative, 2% with alt. polyA signals
• >95% of D.melanogaster introns mapped to at least 7
of 12 other Drosophila genomes
• Search for conserved complementary words at
intron termini (within 150 nt. of intron boundaries),
then align
• Restrictive search => 200 candidates
• 6 tested in experiment (3 const., 3 alt.). All 3 alt.
ones confirmed
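The search for complementary words at intron termini can be sketched for a single intron as follows. The 150-nt window comes from the slide; the word length of 8, the restriction to exact Watson-Crick matches, and the function names are assumptions made for illustration. The cross-species conservation filter and the alignment step are not shown.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def complementary_words(intron: str, k: int = 8, window: int = 150):
    """Find k-mers near the 5' end of an intron whose reverse complement
    occurs near the 3' end. Returns (start_5p, start_3p, word) tuples with
    0-based positions in the intron; overlapping pairs are skipped."""
    intron = intron.upper()
    head = intron[:window]
    tail = intron[-window:]
    tail_offset = len(intron) - len(tail)
    tail_positions = {}
    for j in range(len(tail) - k + 1):
        tail_positions.setdefault(tail[j:j + k], []).append(tail_offset + j)
    hits = []
    for i in range(len(head) - k + 1):
        word = head[i:i + k]
        for j in tail_positions.get(reverse_complement(word), []):
            if j >= i + k:  # the two arms of the stem must not overlap
                hits.append((i, j, word))
    return hits


# Toy intron: GGCCATTC near the donor can pair with GAATGGCC near the acceptor.
toy = "GT" + "A" * 10 + "GGCCATTC" + "A" * 20 + "GAATGGCC" + "A" * 10 + "AG"
print(complementary_words(toy))
```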
CG33298 (phospholipid translocating ATPase):
alternative donor sites
Atrophin (histone deacetylase):
alternative acceptor sites
Nmnat (nicotinamide mononucleotide adenylyltransferase):
alternative splicing and polyadenylation
Less restrictive search => many more candidates
Properties of regulated introns
• Often alternative
• Longer than usual
• Overrepresented in genes linked to
development
Authors
• Andrei Mironov (idea)
• Dmitry Pervouchine (bioinformatics)
• Veronica Raker, Centre for Genomic Regulation, Barcelona (experiment)
• Juan Valcarcel, Centre for Genomic Regulation, Barcelona (advice)
• Mikhail Gelfand (general pessimism)