What is EST?


Feng Chia University
Bioinformatics Research Center
Constructions and Applications of Alternative Splicing Databases
speaker: 許芳榮
Outline
• Introduction
• Construction of alternative splicing database
• Survey of existing solutions
• Applications

Introduction
RNA Splicing
Alternative Splicing

Definitions
Splicing the same pre-mRNA in two or more ways to yield two or more different mRNAs that produce two or more different protein products.
Types of alternative splicing
The Troponin T (muscle protein) pre-mRNA is alternatively spliced to give rise to 64 different isoforms of the protein.
• Constitutively spliced exons (exons 1-3, 9-15, and 18)
• Mutually exclusive exons (exons 16 and 17)
• Alternatively spliced exons (exons 4-8)
Exons 4-8 are spliced in every possible way, giving rise to 2^5 = 32 different possibilities.
Exons 16 and 17, which are mutually exclusive, double the possibilities; hence 64 isoforms.
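The isoform count can be checked by brute-force enumeration. The sketch below uses only the exon numbers given on the slide; the function and variable names are illustrative, and it assumes every combination of exons 4-8 is allowed and exactly one of exons 16 and 17 is used.

```python
from itertools import product

# Exon groups as listed on the slide.
CONSTITUTIVE = [1, 2, 3, 9, 10, 11, 12, 13, 14, 15, 18]  # always included
CASSETTE = [4, 5, 6, 7, 8]                               # each included or skipped
MUTUALLY_EXCLUSIVE = [16, 17]                            # exactly one is used


def enumerate_isoforms():
    """Return every exon combination allowed by the three exon groups."""
    isoforms = []
    # 2^5 include/skip patterns for the cassette exons.
    for keep in product([False, True], repeat=len(CASSETTE)):
        chosen = [exon for exon, kept in zip(CASSETTE, keep) if kept]
        # Each pattern is paired with one of the two mutually exclusive exons.
        for exclusive in MUTUALLY_EXCLUSIVE:
            isoforms.append(sorted(CONSTITUTIVE + chosen + [exclusive]))
    return isoforms


isoforms = enumerate_isoforms()
print(len(isoforms))
```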
Expressed Sequence Tags (ESTs)
What are the relationships of Genome, mRNA and ESTs?
[Figure: a gene on the genome, 5' to 3', with Exons 1-4 separated by Introns 1-3; ESTs 1-7 are drawn against the transcript, and AAA... marks the poly-A tail.]
Construction of alternative splicing database
[Figure: genome sequences (3 billion bp) and dbEST (5 million ESTs) -> alignment -> exons/introns database -> gene discovery, SNP, alternative splicing]
Methods of Alternative Splicing Detection

• mRNA-EST alignment (or EST consensus)
  - Without knowledge of genomic sequence
• Genomic sequence to EST alignment
  - Informative

How to cluster ESTs?

• UniGene cluster
  - Consider the ESTs in the same UniGene cluster
  - Saves time but is not informative
• Genome template
  - Genomic sequence to EST alignment
  - Informative but time consuming
The Approaches of EST Clustering

UniGene-like approach
1. Overlapping ESTs are grouped into a cluster, as in UniGene.
2. Generate a consensus sequence for each cluster.
3. Align the consensus sequences to the genome sequence.

Genome template
1. Cut the human genome sequence into 20k base pair segments.
2. Screen the ESTs for similarity with BLAST.
3. Detect exons with sim4.
Direct alignment
UniGene-like approach
[Figure: consensus sequence and genomic sequence -> BLAST -> candidates of gene location (STS, gene) -> report exons]
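The first step of the UniGene-like approach, grouping overlapped ESTs into clusters, can be sketched with a union-find structure. This is a minimal illustration, not the procedure UniGene itself uses: it assumes the pairwise overlaps have already been computed by some alignment step, and the EST names and the function `cluster_ests` are hypothetical.

```python
def cluster_ests(est_ids, overlaps):
    """Group ESTs so that any two linked by a chain of overlaps share a cluster.

    est_ids  -- iterable of EST identifiers
    overlaps -- iterable of (est_a, est_b) pairs reported as overlapping
                (assumed to come from a prior pairwise alignment step)
    """
    parent = {est: est for est in est_ids}

    def find(x):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in overlaps:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = {}
    for est in est_ids:
        clusters.setdefault(find(est), []).append(est)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical input: EST1-EST3 overlap in a chain, EST4 and EST5 overlap each other.
ests = ["EST1", "EST2", "EST3", "EST4", "EST5"]
pairs = [("EST1", "EST2"), ("EST2", "EST3"), ("EST4", "EST5")]
clusters = cluster_ests(ests, pairs)
print(clusters)
```

Each resulting cluster would then feed the consensus-building step; that step is not shown here.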
Genome template
[Figure: genomic template and EST DB -> WU-BLAST -> ESTs with similarity -> sim4 -> exons]
Direct alignment
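The first step of the genome-template approach, cutting the genome into 20k base pair templates, amounts to generating fixed-size coordinate windows. The sketch below is one way to do it; the `overlap` parameter is an assumption added for illustration (so that an exon straddling a boundary is not lost) and is not mentioned on the slides.

```python
def genome_windows(seq_len, size=20_000, overlap=0):
    """Yield (start, end) half-open windows covering a sequence of length seq_len.

    size    -- window length in base pairs (20k on the slide)
    overlap -- bases shared by consecutive windows (an assumption, default 0)
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    start = 0
    while start < seq_len:
        end = min(start + size, seq_len)
        yield start, end
        if end == seq_len:
            break  # the last window already reaches the end of the sequence
        start += step


# A toy 50 kb sequence gives two full windows and a 10 kb remainder.
windows = list(genome_windows(50_000))
print(windows)
```

Each window would then be used as the query template for the BLAST screening step, with sim4 run only on the ESTs that pass.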
Using UniGene Clusters Is Not Informative
• Many ESTs in different UniGene clusters are aligned to the same genome area.
• UniGene clusters 101131, 100437, 100738, 101182 and 100143 should be grouped together to detect alternative splicing.
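Grouping clusters that align to the same genome area can be sketched as interval merging over the alignment coordinates. The cluster IDs below are the ones named on the slide, but the chromosome label, the coordinates, and cluster 999999 are invented for illustration; the slides do not give the real positions.

```python
def group_by_locus(hits):
    """Group cluster IDs whose genomic alignments overlap on the same chromosome.

    hits -- list of (cluster_id, chrom, start, end) tuples
    """
    groups = []
    current = None  # [chrom, furthest_end, cluster_ids] of the locus being built
    for cluster_id, chrom, start, end in sorted(hits, key=lambda h: (h[1], h[2], h[3])):
        if current and current[0] == chrom and start <= current[1]:
            # Overlaps (or touches) the running locus: extend it.
            current[1] = max(current[1], end)
            current[2].append(cluster_id)
        else:
            current = [chrom, end, [cluster_id]]
            groups.append(current)
    return [group[2] for group in groups]


# Coordinates are hypothetical; only the first five cluster IDs come from the slide.
hits = [
    (101131, "chrN", 1_000, 5_000),
    (100437, "chrN", 4_000, 9_000),
    (100738, "chrN", 8_500, 12_000),
    (101182, "chrN", 11_000, 15_000),
    (100143, "chrN", 14_000, 18_000),
    (999999, "chrN", 50_000, 52_000),  # made-up unrelated cluster
]
loci = group_by_locus(hits)
print(loci)
```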

Resource: ASAP
Description: Performed genome-wide detection of human alternative splicing and ...
Approach: UniGene-like
Quantity: 6201 AS sites in all genes
Species: Human
Institute: UCLA

Resource: TAP
Description: Performed an EST-based gene structure prediction in genomic sequences and also collected splicing information.
Approach: Genome template
Quantity: 669 AS sites in 365 of the 1007 multiexon genes
Species: Human
Institute: WU

Resource: PALSdb
Description: PALS db is a collection of Putative Alternative Splicing information. Alternative splicing sites were predicted by using the longest mRNA sequence in each UniGene cluster as the reference sequence.
Approach: UniGene-like
Species: Human, mouse
Institute: Yang Ming

Resource: Compugen
Description: ESTs and all mRNA sequences were aligned with the human genome sequence using LEADS, Compugen's alternative splicing modeling platform.
Approach: UniGene-like
Quantity: 9,952/19,936
Species: Human
Institute: Compugen

Resource: SpliceNest
Description: Web-based graphical tool to explore gene structure, including alternative splicing, based on a mapping of the EST consensus from GeneNest to the complete human genome.
Approach: UniGene-like
Quantity: 26880 introns, 32348 exons from 5468 genes
Species: Human, mouse, arabidopsis, zebrafish
Institute: Max-Planck-Gesellschaft

Resource: STACK
Description: STACK can provide putative tissue-specific transcripts for each gene.
Approach: UniGene-like
Species: Human
Institute: Egenetic, South Africa
Avatar: a value-added transcriptome database

Align entire dbEST to genome using PCs
Number of alternative splicing events

Organism                  5' AS    3' AS    Exon skipping  Mutually exclusive  Intron retention
Homo sapiens              14,989   22,969   11,188         330                 7,481
Mus musculus              7,479    13,075   4,850          127                 3,493
Rattus norvegicus         531      900      401            4                   373
Caenorhabditis elegans    162      28       263            5                   174
Drosophila melanogaster   351      117      221            6                   221
Arabidopsis thaliana      83       4        77             1                   32
Applications
• Cross-species analysis
• Tissue specific analysis
• SNP and alternative splicing
• Quantity analysis
• Splicing enhancer
• Gene prediction through dbEST
• SNP finding through dbEST

Tissue distributions of 51 tumor-specific alternative splicing sites
Whole blood: 2
Testis: 1
Bone marrow: 1
Stomach: 5
Brain: 15
Placenta: 4
Mammary gland: 1
Lymph node: 1
Eye: 1
Lung: 5
Liver: 15
1,598 SNP dependent alternative splicing
Comparison of human and mouse
Exon skipping
[Figure: exon-skipping forms F1 and F2 compared between human and mouse]
Conserved alternative splicing events (CES events)
Non-conserved alternative splicing events (NCES events)
If NCES.F1 > K and NCES.F2 == 0
Discovering the different constitutive splicing events
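The rule "NCES.F1 > K and NCES.F2 == 0" reads as a simple filter over non-conserved events. The sketch below assumes F1 and F2 are EST support counts for the two forms of an event, which the slides suggest but do not state outright; the class name, field names, example values, and the threshold K = 10 are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NonConservedEvent:
    gene: str
    f1: int  # EST support for form 1 (assumed meaning of F1)
    f2: int  # EST support for form 2 (assumed meaning of F2)


def different_constitutive_candidates(events, k):
    """Keep events where form 1 is well supported (> k) and form 2 is never seen."""
    return [event for event in events if event.f1 > k and event.f2 == 0]


# Made-up events purely to exercise the filter.
events = [
    NonConservedEvent("geneA", f1=48, f2=0),  # passes: F1 > K and F2 == 0
    NonConservedEvent("geneB", f1=2, f2=2),   # fails: low F1 and F2 observed
    NonConservedEvent("geneC", f1=30, f2=1),  # fails: F2 observed
]
candidates = different_constitutive_candidates(events, k=10)
print([event.gene for event in candidates])
```

A larger K trades sensitivity for confidence that the missing form is truly absent and not just under-sampled in dbEST.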
[Figure: human SNX3 (ME12713588-1, ME12751459-1, MR12705131-1; EST support: 41) compared with mouse Snx3 (ME2231614-2, ME2238811-1; EST support: 90). Other labels: +, 94, 91; EST frequency >= 1; EST frequency >= 10.]
[Figure: human PSMD13 (MR178998-1, ME184041-1, ME184161-1) compared with mouse Psmd13 (ME579264-1, ME582152-1, ME582275-1). Other labels: CT, 86.]

      F1   F2
CT    48   0
TC    2    2

184167,C,T,D,2,2,48,0,0.00452488687782805
184171,T,C,D,2,2,48,0,0.00452488687782805
Human exon
GGTGAACCCTTTGTCCCTCGTGGAAATCATTCTTCATGTAGTTAGACAGATGACTG
T C
Mouse exon
GGTAAACCCTCTGTCCCTGGTAGAAATAATTCTCCATGTGGTTAGACAGATGACCG
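The two exon sequences above are the same length, so they can be compared position by position without an alignment step. The sketch below lists the differing positions and the percent identity; the function name `mismatches` is illustrative, and a real comparison of exons with insertions or deletions would need a proper aligner.

```python
# Sequences copied from the slide.
human = "GGTGAACCCTTTGTCCCTCGTGGAAATCATTCTTCATGTAGTTAGACAGATGACTG"
mouse = "GGTAAACCCTCTGTCCCTGGTAGAAATAATTCTCCATGTGGTTAGACAGATGACCG"


def mismatches(a, b):
    """Return (index, base_a, base_b) for each position where two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length for ungapped comparison")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


diffs = mismatches(human, mouse)
identity = 1 - len(diffs) / len(human)

for index, human_base, mouse_base in diffs:
    # Report 1-based positions, as is usual for sequence coordinates.
    print(f"position {index + 1}: human {human_base} / mouse {mouse_base}")
print(f"identity: {identity:.1%}")
```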
Finding SNP from dbEST
[Figure: ESTs 1-7 aligned to a gene with Exons 1-4 and Introns 1-3, 5' to 3'; AAA... marks the poly-A tail.]
EST to genome alignment with profile
[Figure: ESTs 3-7 aligned to a gene with Exons 1-4 and Introns 1-3, 5' to 3'; AAA... marks the poly-A tail.]
Translocation
Finding gene from dbEST
[Figure: ESTs 1-7 aligned to a gene with Exons 1-4 and Introns 1-3, 5' to 3'; AAA... marks the poly-A tail.]
Transcriptome Genomics

Where -> What -> Why -> How
Conclusion