Alu elements and splicing events


Bat Sheva Workshop
Can junk DNA be exapted?
Dan Graur
1
Can straw (junk DNA) be spun into gold (genes)?
2
Exaptation
3
15 February 2001
4
The human genome is disappointing:
• It is small
• It is empty
• It is unoriginal
• It is repetitive
5
K-value paradox: Complexity does not correlate with chromosome number.
Homo sapiens: 46
Lysandra atlantica: 250
Ophioglossum reticulatum: 1260
6
C-value paradox: Complexity does not correlate with genome size.
Homo sapiens: 3.4 × 10^9 bp
Amoeba dubia: 6.7 × 10^11 bp
7
N-value paradox: Complexity does not correlate with gene number.
~31,000 genes ~26,000 genes ~50,000 genes
8
(Figure: composition of the genome: exons, 1.5%; introns (junk); intergenic regions (junk).)
The genome is empty.
9
The genome contains a large number of genetic “corpses” (pseudogenes).
10
L-gulono-γ-lactone oxidase deficiency
11
There are gene-dense (urban centers) and gene-poor (deserts) chromosomes, ranging from 23 genes per million base pairs on chromosome 19 (3%) to only 5 genes per million base pairs on chromosome 13 (0.7%).
12
How can we be sure that the genome is empty? Isn’t it possible that the emptiness is a mere artifact of our ignorance?
13
Caenorhabditis elegans: 959 cells (hermaphrodite), 1,031 cells (male); 19,000 genes
Drosophila melanogaster: ~10^8 cells; 13,600 genes
14
The gene number game: Gensweep©
July 2000: 165 bets; mean 61,710; lowest 27,462; highest 153,478
July 2001: 281 bets; median 61,302; lowest 27,462; highest 212,278
15
Humans are not at all original in comparison with other vertebrates.
16
Mouse-human synteny. Human chromosomes
can be cut into ~150 pieces, then shuffled into a
reasonable approximation of the mouse genome.
17
Two solutions to the N-value paradox:
• What looks empty isn’t.
• What looks functional is more so.
18
Junk DNA
Junk can sometimes be useful:
• spare parts (modules)
• motif donors (exon shuffling)
• molds (gene conversion)
19
Eukaryotic genes (exons & introns)
Splicing
Translation
20
Alternative splicing: One gene, several proteins!
(Figure: alternative splicing of one gene's transcript yields mature splice variant I and mature splice variant II.)
21
Types of alternative splicing
22
Cassette exon, or internal-exon skipping
23
Deduction of internal-exon skipping through mRNA sequence alignment
24
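The alignment-based deduction of internal-exon skipping can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the study: it assumes every expressed sequence has already been aligned to the genome and reduced to a list of exon coordinates (the coordinates and transcript names below are hypothetical). An internal exon is called skipped when its two flanking exons are spliced directly to each other in some other transcript.

```python
from typing import Dict, List, Set, Tuple

Exon = Tuple[int, int]  # (genomic start, genomic end); hypothetical coordinates


def junctions(exons: List[Exon]) -> Set[Tuple[int, int]]:
    """Return splice junctions as (end of upstream exon, start of downstream exon)."""
    ordered = sorted(exons)
    return {(ordered[i][1], ordered[i + 1][0]) for i in range(len(ordered) - 1)}


def skipped_exons(transcripts: Dict[str, List[Exon]]) -> Set[Exon]:
    """Find internal exons whose flanking exons are spliced together in another transcript."""
    observed: Set[Tuple[int, int]] = set()
    for exons in transcripts.values():
        observed |= junctions(exons)

    skipped: Set[Exon] = set()
    for exons in transcripts.values():
        ordered = sorted(exons)
        # Walk over every (upstream, internal, downstream) exon triple.
        for up, mid, down in zip(ordered, ordered[1:], ordered[2:]):
            if (up[1], down[0]) in observed:
                skipped.add(mid)
    return skipped


# Two hypothetical aligned mRNAs from one gene: the second one skips exon (300, 350).
example = {
    "mRNA_1": [(100, 200), (300, 350), (500, 600)],
    "mRNA_2": [(100, 200), (500, 600)],
}
print(skipped_exons(example))  # {(300, 350)}
```

Exons that are never skipped in any aligned transcript would, under the same assumptions, be candidates for the constitutive set.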
Large-scale multiple alignment of expressed sequences
• Databases: tens of thousands of mRNAs and millions of ESTs.
• From large-scale alignments, it is known that 40-60% of all human genes undergo alternative splicing.
25
GenCarta (Compugen): Alignment of expressed sequences to genomic sequences
26
Alternative splicing:
• Alternative splicing may be unconditional, i.e., two or more mRNA variants are produced in all tissues expressing the gene.
• Alternative splicing may be conditional, i.e., tissue specific, developmental-stage specific, or physiological-state specific.

27
Initial goal: Identifying sequence elements that regulate alternative splicing
• Compile a database of skipped exons.
• Compile a database of constitutive exons.
• Characterize diagnostic features of alternative splicing versus constitutive splicing.
28
Initial results
• 4,151 constitutive exons.
• 1,182 alternative exons.
• A motif-searching program was run on each set.
• A strong motif, found in some of the alternative exons, was not found in the constitutive ones.
• The motif turned out to be part of an Alu element.
29
Exaptation case report: Alus
30
Alu elements
• Length: ~300 bp
• Repetitive: > 1,000,000 copies in the human genome
• Constitute > 10% of the human genome
• Found mostly in intergenic regions and introns
• Propagate in the genome through retroposition (via RNA intermediates)
31
Repetitive DNA: interspersed (Alus are like that!) or in tandem
32
Evolution of Alu elements
33
Master-gene model for Alu proliferation in the genome
• Master gene A produces replicatively incompetent progeny.
• The progeny undergo multiple independent mutations.
• A mutation renders A nonfunctional and creates a new master gene, B.
• A mutation renders B nonfunctional and creates a new master gene, C.
34
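The serial master-gene model can be illustrated with a toy simulation. This is a hypothetical sketch, not a model fitted to real Alu data: the founder sequence, the substitution rate, and the `master_gene_model` helper are invented for illustration. Each new master differs from its predecessor by one diagnostic substitution, and every progeny copy inherits its master's diagnostic bases but accumulates private mutations that are never copied further.

```python
import random
from typing import List, Tuple

BASES = "ACGT"


def mutate(seq: str, rate: float, rng: random.Random) -> str:
    """Substitute each position with probability `rate` (always to a different base)."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def master_gene_model(
    founder: str,
    diagnostic_changes: List[Tuple[int, str]],
    copies_per_master: int,
    rate: float,
    seed: int = 0,
) -> List[Tuple[str, str]]:
    """Return (master label, progeny sequence) pairs under a serial master-gene model."""
    rng = random.Random(seed)
    masters = [founder]
    # Each new master differs from its predecessor by one diagnostic substitution.
    for pos, base in diagnostic_changes:
        prev = masters[-1]
        masters.append(prev[:pos] + base + prev[pos + 1:])

    progeny = []
    for index, master in enumerate(masters):
        label = chr(ord("A") + index)  # masters are labeled A, B, C, ...
        for _ in range(copies_per_master):
            # Progeny are replicatively incompetent: their own mutations are never copied.
            progeny.append((label, mutate(master, rate, rng)))
    return progeny


# Hypothetical run: founder A, then masters B and C, each one diagnostic change apart.
copies = master_gene_model("ACGTACGTACGTACGTACGT", [(3, "A"), (10, "T")],
                           copies_per_master=5, rate=0.05, seed=1)
for label, seq in copies[:3]:
    print(label, seq)
```

Because the diagnostic changes are shared by all progeny of a given master while the later mutations are private, the copies fall into groups that can be told apart at a small number of diagnostic positions.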
Alu elements can be divided into subfamilies. The subfamilies are distinguished by ~16 diagnostic positions.
35
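Subfamily assignment from diagnostic positions can be sketched as a simple vote. In this minimal Python example the diagnostic table is hypothetical (three made-up positions and bases per subfamily instead of the real ~16), and the element is assumed to be already aligned to consensus coordinates; the element is assigned to the subfamily whose diagnostic bases it matches most often.

```python
from typing import Dict

# Hypothetical diagnostic table: 0-based consensus position -> expected base.
# Real Alu subfamilies are distinguished by ~16 such positions; these values
# are placeholders for illustration, not real Alu data.
DIAGNOSTIC: Dict[str, Dict[int, str]] = {
    "AluJ-like": {5: "A", 12: "C", 20: "G"},
    "AluS-like": {5: "G", 12: "C", 20: "T"},
    "AluY-like": {5: "G", 12: "T", 20: "T"},
}


def classify(aligned: str, table: Dict[str, Dict[int, str]] = DIAGNOSTIC) -> str:
    """Assign an aligned element to the subfamily whose diagnostic bases it matches most."""
    scores = {
        name: sum(aligned[pos] == base for pos, base in sites.items())
        for name, sites in table.items()
    }
    # Ties are broken by table order, because max() keeps the first maximum.
    return max(scores, key=lambda name: scores[name])


element = ["N"] * 25
element[5], element[12], element[20] = "G", "T", "T"
print(classify("".join(element)))  # AluY-like
```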
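Signals of splicing
• Donor site (exon 1 / intron): CAG/GTRAGT
• Branch point: A
• Acceptor site (intron / exon 2): YYYYYYYYYNCAG/G, where YYYYYYYYY is the pyrimidine tract
(Y = pyrimidine, R = purine, N = any nucleotide.)
(Figure: the intron is excised as a lariat formed at the branch-point A, and exons 1 and 2 are joined.)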
36
Because mRNAs and Alus are frequently reverse transcribed and incorporated into the genome, pyrimidine tracts are ubiquitous: the complementary strand of polyA is polyT, which is a pyrimidine tract.
37
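The strand logic is easy to check. In this minimal sketch (the sequence is hypothetical), reverse-complementing a sequence that ends in a poly-A tail yields a leading poly-T run, which consists entirely of pyrimidines.

```python
# Complement table for DNA (upper- and lower-case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def pyrimidine_fraction(seq: str) -> float:
    """Fraction of C and T (pyrimidines) in `seq`."""
    seq = seq.upper()
    return sum(base in "CT" for base in seq) / len(seq)


# Hypothetical 3' end of a retroposed sequence: some body sequence plus a poly-A tail.
plus_strand = "GATTACAGGC" + "A" * 20
minus_strand = reverse_complement(plus_strand)

print(minus_strand[:20])                       # TTTTTTTTTTTTTTTTTTTT
print(pyrimidine_fraction(minus_strand[:20]))  # 1.0
print(pyrimidine_fraction(plus_strand[-20:]))  # 0.0
```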
Our findings
• Out of 1,182 alternatively spliced cassette exons, 62 have a significant hit to an Alu sequence.
• Out of 4,151 constitutively spliced exons, none has a significant hit to an Alu sequence.
Hence, all Alu-containing exons are alternatively spliced.
38
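How unlikely is the 62-versus-0 split by chance? The talk does not report a statistical test, so the following is only an illustrative sketch: if the 62 Alu hits were scattered at random over all 5,333 exons (1,182 alternative + 4,151 constitutive), the probability that every one of them lands among the alternative exons is the hypergeometric probability C(1182, 62) / C(5333, 62). Because zero constitutive hits is the most extreme outcome possible, this is also the one-sided Fisher's exact p-value for the 2 × 2 table.

```python
from math import comb


def prob_all_hits_in_alternative(hits: int, n_alternative: int, n_constitutive: int) -> float:
    """Hypergeometric probability that all `hits` fall in the alternative set
    when hits are distributed at random over all exons."""
    n_total = n_alternative + n_constitutive
    return comb(n_alternative, hits) / comb(n_total, hits)


p = prob_all_hits_in_alternative(hits=62, n_alternative=1182, n_constitutive=4151)
print(f"{p:.1e}")
```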
Retention ratio
• Retention ratio = number of mRNA molecules containing the alternatively spliced exon divided by the total number of mRNA molecules.
• Retention ratio for Alu-containing exons was ~10%.
• Retention ratio for alternatively spliced exons that do not contain Alu was ~45%.
39
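The retention ratio is a simple proportion. This minimal sketch represents each sampled mRNA as a set of exon IDs (the gene and IDs are hypothetical) and assumes every listed transcript counts toward the denominator; a real analysis might restrict the count to transcripts that actually cover the exon's flanking region.

```python
from typing import Iterable, Set


def retention_ratio(transcripts: Iterable[Set[str]], exon_id: str) -> float:
    """Fraction of transcripts (each a set of exon IDs) that include `exon_id`."""
    transcripts = list(transcripts)
    if not transcripts:
        raise ValueError("need at least one transcript")
    included = sum(exon_id in t for t in transcripts)
    return included / len(transcripts)


# Hypothetical gene: 1 of 10 sampled mRNAs retains exon "e2".
sample = [{"e1", "e2", "e3"}] + [{"e1", "e3"}] * 9
print(retention_ratio(sample, "e2"))  # 0.1
```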
Alu elements: Definitions
(Figure: the + strand of an Alu element carries its A-rich tracts (aaaa…); the – strand carries the complementary T-rich tracts (tttt…).)
40
The minus strand of Alu elements contains “near” splice sites
• The minus strand of Alu contains ~3 sites that resemble the consensus acceptor site:
  Consensus acceptor site: YYYYYYNCAG/R
  Alu-J (127-114): TTTTTTGtAG/A
• The minus strand of Alu contains ~9 sites that resemble the consensus donor site:
  Consensus donor site: CAG/GTRAGT
  Alu-J (25-17): CAG/GTGtGA
41
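Finding “near” splice sites amounts to scanning a sequence for windows that match an IUPAC consensus (Y = C/T, R = A/G, N = any) with few mismatches. In this minimal sketch the "/" exon-intron boundary is dropped from the consensus strings, the one-mismatch threshold is an assumption, and the scanned sequence is hypothetical; to scan the minus strand of a real Alu, one would first reverse-complement it.

```python
from typing import Iterator, Tuple

IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT", "N": "ACGT"}


def mismatches(site: str, consensus: str) -> int:
    """Count positions where `site` violates the IUPAC `consensus` (equal lengths)."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    return sum(base not in IUPAC[code] for base, code in zip(site.upper(), consensus))


def near_sites(seq: str, consensus: str, max_mismatches: int = 1) -> Iterator[Tuple[int, str, int]]:
    """Yield (start, window, mismatches) for every window within `max_mismatches`."""
    k = len(consensus)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        m = mismatches(window, consensus)
        if m <= max_mismatches:
            yield i, window, m


print(mismatches("TTTTTTGTAGA", "YYYYYYNCAGR"))  # 1  (acceptor-like Alu-J site)
print(mismatches("CAGGTGTGA", "CAGGTRAGT"))      # 2  (donor-like Alu-J site)
print(list(near_sites("GGTTTTTTGTAGAGG", "YYYYYYNCAGR")))  # [(2, 'TTTTTTGTAGA', 1)]
```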
The plus strand of Alu elements does not contain “near” acceptor splice sites
42
Exonization of a minus strand (all is Alu)
(Figure: donor site, exon, Alu, acceptor site.)
43
Exonization of a plus strand (3’ of Alu is “in”)
(Figure: donor site, Alu, exon, acceptor site.)
44
Alus within alternatively spliced exons
(Figure: counts of Alus on the – strand and the + strand, by position within the exon: 3’ end, 5’ end, middle of exon, or Alu occupies entire exon.)
45
Proposed model for Alu exonization
Exon
Exon
46
Proposed model for Alu exonization
Exon
Exon
47
Does Exonization Represent Functionalization?
1. Alus are only found in alternative exons. Two possible explanations:
– Alu-containing constitutive exons cannot be created by mutation.
– Alu-containing constitutive exons are deleterious and, therefore, selected against.
Constitutive Alu-containing exons are known, and they are invariably deleterious.
48
Does Exonization Represent Functionalization?
2. Alus are only found in alternative exons with low retention ratios.
Highly expressed alternative Alu-containing exons are deleterious.
49
Does Exonization Represent Functionalization?
3. Eighty-four percent of all Alu-containing exons cause frameshifts or premature termination.
Alu-containing exons are unlikely to contribute to the proteome.
50
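The two failure modes in point 3 can be checked mechanically. This simplified sketch assumes the exon is inserted exactly at a codon boundary (phase 0) and uses hypothetical sequences; a real analysis would also need the exon's actual phase and the downstream sequence, where a frameshift usually leads to a premature stop.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard (nuclear) genetic code


def inclusion_effect(exon: str) -> str:
    """Classify the effect of inserting `exon` into a coding sequence,
    assuming the exon starts at a codon boundary (phase 0)."""
    exon = exon.upper()
    if len(exon) % 3 != 0:
        return "frameshift"
    codons = (exon[i:i + 3] for i in range(0, len(exon), 3))
    if any(codon in STOP_CODONS for codon in codons):
        return "premature termination"
    return "in frame"


for exon in ["ATGGCC", "ATGGC", "GCCTAAGCC"]:
    print(exon, inclusion_effect(exon))
# ATGGCC in frame
# ATGGC frameshift
# GCCTAAGCC premature termination
```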
Does Exonization Represent Functionalization?
4. There are reasons to believe that many identifications of alternative splicing are spurious.
The contribution of alternative splicing to the proteomic repertoire may be vastly overestimated.
51
Conclusion?
Alu elements increase the coding and regulatory versatility of the transcriptome while maintaining the intactness of the genomic repertoire.
52
Conclusion

No exaptation
53
Exaptation case report: numts*
*pronounced “new mights”
54
Numts (nuclear mitochondrial DNA sequences) are a type of promiscuous DNA, i.e., nuclear sequences of organelle (e.g., mitochondrial) origin.
55
Numts: Evolution’s misplaced witnesses
56
The transfer of functional genes from the mitochondria to the nucleus is thought to have stopped in evolution after the emergence of animals (~1,000 MYA).
57
The reason is thought to be the differences between the nuclear and mitochondrial genetic codes.
58
The transfer of nonfunctional pieces of mitochondrial genetic information continues to this day.
59
Numts have been found so far in 83 eukaryotic species.
60
Most species whose genomes have been completely sequenced contain very few numts:
Saccharomyces cerevisiae: 17 numts
Caenorhabditis elegans: 3 numts
Drosophila melanogaster: 3 numts
Plasmodium falciparum: 3 numts
61
In the human genome we find ~1,000 numts:
• total length = 831 kb
• ~0.02% of the nuclear genome
62
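As an arithmetic check, the ~0.02% figure follows from dividing total numt length by genome size. This sketch assumes the 3.4 × 10^9 bp genome size quoted on the C-value slide.

```python
def percent_of_genome(feature_bp: float, genome_bp: float) -> float:
    """Percentage of a genome occupied by a set of features."""
    return 100 * feature_bp / genome_bp


# Total numt length (831 kb) against a 3.4 x 10^9 bp genome.
print(round(percent_of_genome(831_000, 3.4e9), 3))  # 0.024
```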
We found 82 numts larger than 1,000 bp in the human genome.
63
Numts were found on all chromosomes.
Numts larger than 1,000 bp were found on 21 chromosomes.
64

The newest numt was found on chromosome 6.
• Length = ~6,000 bp (35% of the human mitochondrial genome)
• Similarity = 98.2% DNA identity
The longest numt was found on chromosome 5.
• Length = ~16,000 bp (an entire mitochondrial genome)
• Similarity = 88.8% DNA identity

65
The largest documented nonhuman numt is a 7.9-kb fragment in the nuclear genome of the domestic cat.
66
The 82 numts contain a total of 362 complete mitochondrial genes (of which 108 are protein-coding genes).
67
With the exception of the D-loop, which is variable and difficult to detect by similarity, all other regions of the mtDNA are represented in numts at frequencies that do not deviate significantly from the random expectation.
68
Only 4 numts retained an intact reading frame. They are annotated as putative protein-coding genes.
69
In all cases the gene is NADH dehydrogenase subunit 4L (ND4L).
70
ND4L is also the only mitochondrial gene that can be translated “without incident” by the nuclear genetic code.
71
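Which codons could cause an “incident”? The standard nuclear code and the vertebrate mitochondrial code (NCBI translation tables 1 and 2) differ at only four codons, so the check reduces to an in-frame codon scan. This is a hedged sketch on a hypothetical sequence, not an analysis of the actual ND4L numts: TGA (Trp in mitochondria) would be read as a stop by the nuclear code, whereas ATA would only change Met to Ile.

```python
from typing import List, Tuple

# Codons read differently by the vertebrate mitochondrial code (NCBI table 2)
# and the standard nuclear code (NCBI table 1): (mitochondrial, nuclear) meaning.
CODE_DIFFERENCES = {
    "TGA": ("Trp", "Stop"),
    "ATA": ("Met", "Ile"),
    "AGA": ("Stop", "Arg"),
    "AGG": ("Stop", "Arg"),
}


def code_conflicts(cds: str) -> List[Tuple[int, str, str, str]]:
    """List (offset, codon, mitochondrial meaning, nuclear meaning) for every
    in-frame codon of `cds` whose meaning differs between the two codes."""
    cds = cds.upper()
    conflicts = []
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        if codon in CODE_DIFFERENCES:
            conflicts.append((i, codon, *CODE_DIFFERENCES[codon]))
    return conflicts


def nuclear_premature_stops(cds: str) -> List[int]:
    """Offsets of TGA codons (Trp in mitochondria) that the nuclear code reads as Stop."""
    return [i for i, _, _, nuclear in code_conflicts(cds) if nuclear == "Stop"]


orf = "ATGTGAATAAAA"  # hypothetical mitochondrial reading frame
print(code_conflicts(orf))           # [(3, 'TGA', 'Trp', 'Stop'), (6, 'ATA', 'Met', 'Ile')]
print(nuclear_premature_stops(orf))  # [3]
```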
Conclusion
No exaptation
72