A Non-EST-Based Method for Exon-Skipping Prediction
Rotem Sorek, Ronen Shemesh, Yuval Cohen, Ortal Basechess, Gil Ast and
Ron Shamir
Genome Research August 2004
楊佳熒
Segments and blocks >300 kb in size with conserved synteny in human are superimposed on the mouse genome.
Homologous human and mouse exons are, on average, 85% identical in their sequences, but introns are more poorly conserved.
(Waterston et al., Nature, 2002)
References
• Sorek, R. et al. Intronic Sequences Flanking Alternatively Spliced Exons Are Conserved Between Human and Mouse. Genome Research, 2003.
• Sorek, R. et al. How prevalent is functional alternative splicing in the human genome? TRENDS in Genetics, 2004.
• Sorek, R. et al. A Non-EST-Based Method for Exon-Skipping Prediction. Genome Research, 2004.
What is Exon Skipping?
[Figure: a gene with exons exon1 to exon6, shown together with four ESTs (est1 to est4) from dbEST.]
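The gene/EST picture can be turned into a small sketch. Assuming each EST has already been aligned to the gene and reduced to the set of exon numbers it covers, an internal exon is a candidate skipping event when one EST jumps over it while another EST includes it. The function name and the EST-to-exon assignments in the example are hypothetical; the figure does not specify which exons each EST covers.

```python
from typing import Dict, List, Set


def find_skipped_exons(est_exons: Dict[str, Set[int]]) -> List[int]:
    """Return exons that some EST skips and some other EST includes.

    est_exons maps an EST name to the set of exon numbers it covers.
    An EST is taken to skip every exon that lies between the first and
    last exon it covers but is absent from it (a simplifying assumption).
    """
    skipped: Set[int] = set()
    included: Set[int] = set()
    for exons in est_exons.values():
        if not exons:
            continue  # an EST with no aligned exon carries no information
        included |= exons
        first, last = min(exons), max(exons)
        skipped |= set(range(first, last + 1)) - exons
    return sorted(skipped & included)


# Hypothetical alignment of the four ESTs to the six exons.
example = {
    "est1": {1, 2, 3},
    "est2": {1, 3},      # jumps over exon2
    "est3": {3, 4, 5},
    "est4": {4, 6},      # jumps over exon5
}
candidates = find_skipped_exons(example)
```

Requiring both a skipping and an including EST is what separates a cassette exon from an exon that is simply absent from every transcript.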
Intronic Sequences Flanking Alternatively Spliced Exons
Are Conserved Between Human and Mouse
Rotem Sorek and Gil Ast
Genome Research July 2003
Objective and Result
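1. Alternatively spliced conserved exons
[Figure: human gene (exon1, A1, exon2, B1, exon3) with human est1 and est2, and mouse gene (exon1, A2, exon2, B2, exon3) with mouse est1 and est2; A1/B1 and A2/B2 mark the regions flanking exon2.]
Alternatively spliced internal exons: 3583
Alternatively spliced conserved exons: 243
Upstream intronic sequence conserved in mouse: 223/243 = 92%
Downstream intronic sequence conserved in mouse: 199/243 = 82%
Both upstream and downstream conserved: 188/243 = 77%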
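2. Constitutively spliced conserved exons
[Figure: human gene (exon1, C1, exon2, D1, exon3) with human est1 to est4, and mouse gene (exon1, C2, exon2, D2, exon3) with a mouse EST; C1/D1 and C2/D2 mark the regions flanking exon2.]
Constitutively spliced internal exons: 7557
Constitutively spliced conserved exons: 1966
Upstream intronic sequence conserved in mouse: 886/1966 = 45%
Downstream intronic sequence conserved in mouse: 691/1966 = 35%
Both upstream and downstream conserved: 343/1966 = 17%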
Per-position conservation near alternatively and
constitutively spliced exons
<Example> Human KCND3 gene (exon 4~8)
RefSeq: NM_004980
KCND3 gene
exon information
KCND3 gene
exon 6 sequences (bold)
(alternatively spliced exon)
Compare to chimpanzee genome (NM_004980)
Compare to chimpanzee genome (NM_172198)
Review: Finding exon-skipping events that are conserved between human and mouse
243 conserved exon-skipping events (25%)
737 (980 - 243) non-conserved exon-skipping events (75%)
How prevalent is functional alternative splicing in the
human genome ?
Rotem Sorek, Ron Shamir and Gil Ast
TRENDS in Genetics Vol. 20, February 2004
Motivation
1. How many of these predicted splice variants are functional?
2. How many are the result of aberrant splicing (noise in the data)?
The influence of alternatively spliced exons on the protein-coding sequence
Conserved alternatively spliced exons: 139 of 191 (73%) are peptide cassettes
Non-conserved alternatively spliced exons: 109 of 510 (21%) are peptide cassettes
Features differentiating between conserved alternatively spliced exons and non-conserved alternatively spliced exons
Feature | Conserved alternatively spliced exons | Non-conserved alternatively spliced exons
Average size | 87 | 116
Percentage of exons that are a multiple of three | 77% (147/191) | 40% (206/510)
Percentage of exons that are “peptide cassettes” | 73% (139/191) | 21% (109/510)
Percentage of exon insertions that result in a longer protein by a nearby stop codon | 61% (27/44) | 8% (25/304)
[row label not recoverable] | 30% | 62%
Percentage of exon insertions that result in a protein <100 amino acids | 9% (4/44) | 30% (91/304)
Average supporting expressed sequences | 9 | 2.2
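Two of the features in the table can be checked directly from sequence. The sketch below assumes that a "peptide cassette" is an exon whose length is a multiple of three and whose insertion adds no in-frame stop codon; that definition is an assumption here, since the slides do not spell it out. The function name and the carry_in / carry_out parameters (bases of a codon shared with the neighbouring exons) are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_peptide_cassette(exon_seq: str, carry_in: str = "", carry_out: str = "") -> bool:
    """Check the assumed peptide-cassette criteria for one exon.

    carry_in:  bases of an unfinished codon left by the upstream exon.
    carry_out: first bases of the downstream exon that finish the last codon.
    """
    if len(exon_seq) % 3 != 0:
        return False  # inserting the exon would shift the reading frame
    framed = (carry_in + exon_seq + carry_out).upper()
    # Walk the sequence codon by codon; a trailing partial codon is ignored.
    codons = (framed[i:i + 3] for i in range(0, len(framed) - 2, 3))
    return all(codon not in STOP_CODONS for codon in codons)


in_frame_no_stop = is_peptide_cassette("GCTGAAACC")     # GCT GAA ACC
in_frame_with_stop = is_peptide_cassette("GCTTAAACC")   # GCT TAA ACC
frame_shifting = is_peptide_cassette("GCTGAAAC")        # 8 nt
```

An exon that fails the first test shifts the frame, which matches the table's split between multiple-of-three exons and the 44 conserved and 304 non-conserved exons that are not.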
Conclusion
1. We show that conserved (functional) cassette exons possess unique characteristics in size, repeat content and in their influence on the protein.
2. By contrast, most non-conserved cassette exons do not share these characteristics.
3. We conclude that a portion of the exon-skipping events evidenced in EST databases is not functional, and might result from aberrant rather than regulated splicing.
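Review: Intronic Sequences Flanking Alternatively Spliced Exons Are Conserved Between Human and Mouse
1. Alternatively spliced conserved exons
[Figure: human gene (exon1, A1, exon2, B1, exon3) with human est1 and est2, and mouse gene (exon1, A2, exon2, B2, exon3) with mouse est1 and est2.]
Alternatively spliced internal exons: 3583
Alternatively spliced conserved exons: 243
Upstream intronic sequence conserved in mouse: 223/243 = 92%
Downstream intronic sequence conserved in mouse: 199/243 = 82%
Both upstream and downstream conserved: 188/243 = 77%
2. Constitutively spliced conserved exons
[Figure: human gene (exon1, C1, exon2, D1, exon3) with human est1 to est4, and mouse gene (exon1, C2, exon2, D2, exon3) with a mouse EST.]
Constitutively spliced internal exons: 7557
Constitutively spliced conserved exons: 1966
Upstream intronic sequence conserved in mouse: 886/1966 = 45%
Downstream intronic sequence conserved in mouse: 691/1966 = 35%
Both upstream and downstream conserved: 343/1966 = 17%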
Review: Features Differentiating Between Alternatively Spliced and Constitutively Spliced Exons
Feature | Alternatively spliced exons | Constitutively spliced exons
Average size | 87 | 128
Percent exons whose length is a multiple of 3 | 73% (177/243) | 37% (642/1753)
Percent exons with upstream intronic elements conserved in mouse | 92% (223/243) | 45% (788/1753)
Percent exons with downstream intronic elements conserved in mouse | 82% (199/243) | 35% (611/1753)
Percent exons with both upstream and downstream intronic elements conserved in mouse | 77% (188/243) | 17% (292/1753)
A Non-EST-Based Method for Exon-Skipping Prediction
Rotem Sorek, Ronen Shemesh, Yuval Cohen, Ortal Basechess,
Gil Ast and Ron Shamir
Genome Research August 2004
Objective
1. Our goal was to find a combination of features that would detect a substantial fraction of the alternative exons.
2. The features we have chosen are the following:
1) exon length
2) divisible / not divisible by 3
3) percent identity when aligned to the mouse
4) conservation in the upstream and downstream intronic sequences
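The features listed above could be computed along these lines. This is a minimal sketch, not the authors' pipeline: it assumes the human and mouse exons are already aligned (gaps written as "-"), defines percent identity as matching columns over alignment length, and measures intronic conservation as the longest stretch shared by the two flanking sequences. All function names are hypothetical.

```python
def length_divisible_by_three(exon_seq: str) -> bool:
    """Features 1 and 2: exon length and whether it is a multiple of three."""
    return len(exon_seq) % 3 == 0


def percent_identity(aligned_human: str, aligned_mouse: str) -> float:
    """Feature 3: identical, non-gap columns as a percentage of alignment length."""
    if len(aligned_human) != len(aligned_mouse):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_human:
        return 0.0
    matches = sum(
        h == m and h != "-" for h, m in zip(aligned_human, aligned_mouse)
    )
    return 100.0 * matches / len(aligned_human)


def longest_shared_run(seq_a: str, seq_b: str) -> int:
    """Feature 4 (simplified): length of the longest substring common to both."""
    best = 0
    prev = [0] * (len(seq_b) + 1)
    for a in seq_a:
        curr = [0] * (len(seq_b) + 1)
        for j, b in enumerate(seq_b, start=1):
            if a == b:
                curr[j] = prev[j - 1] + 1  # extend the run ending one base earlier
                best = max(best, curr[j])
        prev = curr
    return best


identity = percent_identity("ACGT-A", "ACGTTA")
shared = longest_shared_run("GTAAGTTTCC", "AAGTAAGTTT")
```

The longest-shared-run helper only captures exact matches; a best local alignment that tolerates mismatches would need a proper local aligner, which is left out here.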
Result
1. The best rule is:
1) at least 95% identity with the mouse exon counterpart
2) exon size is a multiple of three
3) a best local alignment of at least 15 intronic nucleotides upstream of the exon with at least 85% identity
4) a perfect match of at least 12 intronic nucleotides downstream of the exon
2. This combination of features identified 76 exons, 31% of the 243 alternatively spliced exons in the training set, whereas none of the 1753 constitutively spliced exons matched these features.
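The four-part rule can be written as a single predicate over precomputed features. The thresholds are the ones stated on the slide; the ExonFeatures container, its field names and the function name are hypothetical, and how each feature is measured is left to the caller.

```python
from dataclasses import dataclass


@dataclass
class ExonFeatures:
    exon_length: int                        # nucleotides
    exon_identity_pct: float                # human vs. mouse exon identity
    upstream_alignment_len: int             # best local alignment, upstream intron
    upstream_alignment_identity_pct: float  # identity of that alignment
    downstream_perfect_match_len: int       # longest perfect match, downstream intron


def is_predicted_alternative(f: ExonFeatures) -> bool:
    """Apply the four conditions of the rule; all must hold."""
    return (
        f.exon_identity_pct >= 95.0
        and f.exon_length % 3 == 0
        and f.upstream_alignment_len >= 15
        and f.upstream_alignment_identity_pct >= 85.0
        and f.downstream_perfect_match_len >= 12
    )


# Illustrative values only, not taken from the paper.
passing = ExonFeatures(87, 97.0, 20, 90.0, 14)
wrong_frame = ExonFeatures(88, 97.0, 20, 90.0, 14)
weak_intron = ExonFeatures(87, 97.0, 20, 90.0, 8)
```

Because every condition must hold, the rule trades sensitivity for specificity, which is consistent with it recovering 31% of the alternative training exons and none of the constitutive ones.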
To test this classifier in a genome-wide manner (cont.)
Applying these rules to the 108,983 human exons for which a mouse counterpart could be identified yielded 952 candidate exons (~1%).
1. For 453 (48%) of the 952 candidate alternative exons, ESTs or cDNAs showing exon skipping were found.
2. Only 17% of the 453 exons that were classified by our rule had their exon skipping supported by only one EST.
3. The rest were supported by two or more.
To test this classifier in a genome-wide manner (cont.)
Searching ESTs and cDNAs for all 108,983 human exons for which a mouse counterpart could be identified showed exon skipping for 7% (7495 exons) of the entire set.
1. In comparison, skipping was supported by only a single EST in 46% of the total 7495 exons.
2. This suggests that our classification rule enriches for alternatively spliced exons with a higher probability of being “real” relative to alternative exons merely supported by EST evidence.
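The enrichment claim can be checked with a few lines of arithmetic on the counts quoted in the slides. The variable names are illustrative, and "fold enrichment" is just the ratio of the two skipping rates, not a statistic reported by the authors.

```python
# Counts quoted on the slides.
total_exons = 108_983                # human exons with a mouse counterpart
rule_candidates = 952                # exons selected by the rule
rule_candidates_with_skipping = 453  # of those, with EST/cDNA skipping evidence
all_exons_with_skipping = 7_495      # skipping evidence in the entire set

rate_rule = rule_candidates_with_skipping / rule_candidates  # about 48%
rate_all = all_exons_with_skipping / total_exons             # about 7%
fold_enrichment = rate_rule / rate_all

# Share of skipping events backed by a single EST (percentages from the slides).
single_est_rule = 0.17
single_est_all = 0.46
single_est_ratio = single_est_all / single_est_rule

print(f"skipping rate, rule candidates: {rate_rule:.1%}")
print(f"skipping rate, all exons:       {rate_all:.1%}")
print(f"fold enrichment:                {fold_enrichment:.1f}")
```

On these numbers the rule's candidates show skipping evidence several times more often than exons in general, and single-EST support is less than half as common among them.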
To test this classifier in a genome-wide manner
1. For the remaining 499 candidate alternative exons (952 - 453), no EST/cDNA showing an exon-skipping event was found.
2. Using the UCSC genome browser, we found that for 190 additional exons there was a human expressed sequence showing patterns of alternative splicing other than exon skipping:
1) Alternative donor/acceptor: 22%
2) Intron retention: 17%
3) Mutually exclusive exon: 7%
3. Thus, for 643 (453 + 190; 68%) of the 952 candidate alternative exons identified by this method, there was independent evidence for alternative splicing in dbEST.
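The tally on this slide follows directly from the counts given. The short sketch below reproduces it; the last figure, the number of candidates with no supporting evidence at all, is derived here by subtraction and is not stated on the slides.

```python
candidates = 952
with_skipping_evidence = 453
with_other_splicing_evidence = 190

without_skipping_evidence = candidates - with_skipping_evidence    # 499
supported = with_skipping_evidence + with_other_splicing_evidence  # 643
supported_fraction = supported / candidates                        # about 68%
unsupported = candidates - supported                               # derived, not on the slide

print(f"{supported} of {candidates} supported ({supported_fraction:.0%})")
```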
Conclusion
1. We show that a substantial fraction of the splice variants in the human genome could not be identified through current human EST or cDNA data.
2. In the future, we hope the method can be developed into a more general alternative splicing predictor that would identify other types of alternative splicing.
Classification of alternative splicing
1. Skipped Exons
2. Multiple Skipped Exons
3. Alternative Donor / Acceptors
4. Retained Introns
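The four classes can be illustrated by comparing two isoforms of one gene. This is a simplified sketch under stated assumptions: exons are (start, end) pairs on the plus strand, exons within an isoform do not overlap, and only one event separates the two isoforms. The function name and coordinates are hypothetical, and anything that does not fit the four classes is reported as "complex".

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, end) on the plus strand


def classify_event(iso_a: List[Exon], iso_b: List[Exon]) -> str:
    only_a = sorted(set(iso_a) - set(iso_b))
    only_b = sorted(set(iso_b) - set(iso_a))
    if not only_a and not only_b:
        return "identical"
    # Let iso_a be the isoform with more private exons.
    if len(only_a) < len(only_b):
        iso_a, iso_b = iso_b, iso_a
        only_a, only_b = only_b, only_a
    if not only_b:
        if not iso_b:
            return "complex"
        first, last = min(iso_b), max(iso_b)
        # Skipped exons must lie strictly between exons kept by iso_b.
        internal = all(first[1] < s and e < last[0] for s, e in only_a)
        if internal:
            return "skipped exon" if len(only_a) == 1 else "multiple skipped exons"
        return "complex"
    if len(only_a) == 2 and len(only_b) == 1:
        (start_1, _), (_, end_2) = only_a
        if only_b[0] == (start_1, end_2):
            return "retained intron"  # one exon of iso_b spans two exons of iso_a
    if len(only_a) == 1 and len(only_b) == 1:
        (start_a, end_a), (start_b, end_b) = only_a[0], only_b[0]
        if start_a == start_b:
            return "alternative donor"     # exon 3' boundary differs
        if end_a == end_b:
            return "alternative acceptor"  # exon 5' boundary differs
    return "complex"


full = [(100, 200), (300, 400), (500, 600), (700, 800)]
skip_one = classify_event(full, [(100, 200), (500, 600), (700, 800)])
skip_two = classify_event(full, [(100, 200), (700, 800)])
alt_donor = classify_event(full, [(100, 200), (300, 420), (500, 600), (700, 800)])
retained = classify_event(full, [(100, 200), (300, 600), (700, 800)])
```

On the minus strand the donor and acceptor labels would swap, and differing first or last exons are not handled; both are deliberate simplifications.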