encode-pgene-27oct05-draft-summary-for-gt


ENCODE Pseudogene Call Summary
Mark Gerstein
2005,10.27 11:00 EDT
(Draft for G&T call on 2005,10.28 10:00 EDT)
1
Pseudogene group
Core people:
Jennifer Harrow <[email protected]>, <[email protected]>, WEI Chia-Lin,
Adam Frankish <[email protected]>, Sujit Dike <[email protected]>,
Robert Baertsch <[email protected]>, [email protected],
Deyou Zheng <[email protected]>, Yontao Lu <[email protected]>,
[email protected], [email protected]
Others:
Tara L Hoyem <[email protected]>, Roderic Guigo Serra <[email protected]>,
Tom Gingeras <[email protected]>, [email protected],
Suganthi Balasubramanian <[email protected]>
6 Calls: Sept. 15, 22; Oct. 6, 13, 20, 27
2
Refresher: many repetitions of the below “Venn analysis”
[Venn diagram of the Yale, Havana-Gencode and UCSC retrogene sets]
Yale: 167 pseudogenes (164 + 3)
Havana-Gencode: 165 pseudogenes (167 - 2)
UCSC retrogenes: 146 not expressed
Region counts (numbers according to Adam’s note): 54 (2), 17 (2), 16 (0), 81 (34), 15 (1), 16 (7), 33 (1)
Callout:
7 Havana agrees to be added (8, 11, 40, 59, 139, 152, 169).
4 at coding loci. [Yale agrees to delete]
1 with weak sequence identity.*
5 with “non-real” proteins.*
Callout:
9 Havana agrees to be added.
2 at coding loci. [Yale agrees to delete]
1 with weak sequence identity.*
2 with “non-real” proteins.*
* Solved by consistent protein set & threshold
3
A proposal for a qualified union with uniform criteria for boundaries
1. Identify a “good” set of human proteins – HAVANA set?
2. Remove pseudogenes (from all 4 groups) overlapping with current GENCODE exons (does GENCODE have an updated version?).
3. Create a union of the remaining pseudogenes.
4. Find the “best” matching protein for each pseudogene; remove entries without a BLAST hit (e-value cutoff issue?).
5. Realign each pseudogene to its parent protein to produce a uniform alignment and to define the start and end coordinates.
6. Apply a threshold to sequence identity and coverage? (No.)
7. Classify pseudogenes into processed and non-processed (how?)
Overall: 222 pseudogenes
Application of the above recipe gives 198 consensus pseudogenes
Intersection set of the above is 81 (proc) + 49 (non-proc)
Available on browser + ENCODE wiki + http://pseudogene.org/ENCODE
From Deyou Z. + Robert B.
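Steps 2 and 3 of the recipe (drop calls that overlap GENCODE exons, then take a union of what remains) can be sketched as an interval filter followed by a merge. This is only an illustration under stated assumptions, not the pipeline the group ran: the `Interval` type, the function names, the half-open coordinate convention, the toy coordinates, and the rule that overlapping calls collapse into one locus are all hypothetical.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (assumed convention)
    end: int    # exclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def drop_exon_overlaps(calls, exons):
    """Step 2: remove pseudogene calls that overlap any annotated exon."""
    return [c for c in calls if not any(overlaps(c, e) for e in exons)]


def merge_union(calls):
    """Step 3: collapse overlapping calls from different groups into one locus."""
    merged = []
    for c in sorted(calls):  # sorts by (chrom, start, end)
        if merged and overlaps(merged[-1], c):
            last = merged[-1]
            merged[-1] = Interval(last.chrom, last.start, max(last.end, c.end))
        else:
            merged.append(c)
    return merged


# Toy data (hypothetical coordinates, not real ENCODE calls).
yale = [Interval("chr1", 100, 200), Interval("chr1", 500, 600)]
havana = [Interval("chr1", 150, 250), Interval("chr2", 10, 90)]
exons = [Interval("chr1", 550, 650)]

kept = drop_exon_overlaps(yale + havana, exons)
consensus = merge_union(kept)
print(consensus)
```

In this toy run the call at chr1:500-600 is dropped because it overlaps an exon, and the two overlapping chr1 calls merge into a single locus. Boundaries in the real proposal come from realignment to the parent protein (step 5), which this sketch does not attempt.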
4
From Adam F.
Insertion into processed pseudogene
Figure label: Protein evidence
First insertion event: heterogeneous nuclear ribonucleoprotein A1 (HNRPA1) pseudogene (parent on Chr12)
Remnant of a second, mitochondrial insertion event (has post-insertion deletions):
NADH dehydrogenase 2 (MTND2) pseudogene (parent mitochondrial)
NADH dehydrogenase 4 (MTND4) pseudogene (parent mitochondrial)
cytochrome b (CYTB) pseudogene (parent mitochondrial)
5
From Adam F.
Rearranged exon order in unprocessed pseudogene
Dot plot: protein evidence vs genome
adaptor-related protein complex 1, beta 1 subunit (AP1B1) pseudogenes
Figure labels: Protein evidence; Exon 6; Exon 3
Splice sites same as parent gene
Following duplication of the AP1B1 locus, rearrangements/duplications have produced two unprocessed pseudogenes corresponding to exons 6 and 3 of the parent gene
6
From Adam F.
Rearrangement of processed pseudogene
Figure labels: mRNA dot plot; Protein dot plot
pseudogene similar to part of ribosomal protein L3 (RPL3)
Following insertion, one end of the RPL3 pseudogene has been flipped onto the opposite strand (with some loss of internal sequence)
7
Transcription among 198 consensus pseudogenes
- Number overlapped by interrogated regions (Affymetrix arrays): 180 (90.9%)
- Number overlapped by Yale TARs or Affymetrix transfrags (union): 106 (53.5% of all; 58.9% of interrogated)
=> There is evidence of transcription (from TARs or transfrags) of the pseudogene, or of the parent gene if there is cross-hybridization, for 53.5% of the consensus pseudogenes
- Number overlapped by CAGE tags: 11 (5.5%)
- Number overlapped by ditag tags: 1 (0.5%) (83 (41.9%) are overlapped by full-length ditags)
From France D.
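The two denominators used on this slide ("of all" versus "of interrogated") are easy to mix up, so here is a small check of the arithmetic. The counts are taken from the slide; the helper name `pct` is hypothetical.

```python
def pct(count: int, total: int) -> float:
    """Percentage of `total` represented by `count`, rounded to one decimal."""
    return round(100.0 * count / total, 1)


CONSENSUS = 198         # all consensus pseudogenes
INTERROGATED = 180      # overlapped by regions interrogated on the arrays
TAR_OR_TRANSFRAG = 106  # overlapped by Yale TARs or Affymetrix transfrags

print(pct(INTERROGATED, CONSENSUS))         # 90.9
print(pct(TAR_OR_TRANSFRAG, CONSENSUS))     # 53.5 (of all)
print(pct(TAR_OR_TRANSFRAG, INTERROGATED))  # 58.9 (of interrogated)
```

The "of interrogated" figure is the fairer estimate of how often a pseudogene shows array signal, since the 18 pseudogenes outside interrogated regions could not have been detected.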
8
Pseudogene overlapped by TARs/transfrags and ditags: ENCODE_consensus_187
93% similar to parent
From France D.
9
Overlaps by TAR/transfrag subset
- Number overlapped by interrogated regions (Affymetrix arrays): 180 (90.9%)
- Number overlapped by Yale TARs or Affymetrix transfrags (union): 106 (53.5% of all; 58.9% of interrogated)
- Number overlapped by Yale TARs (union): 84 (42.4% of all; 46.7% of interrogated)
- Number overlapped by Affymetrix transfrags (union): 102 (51.5% of all; 56.7% of interrogated)
- Number overlapped by polyA+ TARs/transfrags (union): 105 (53% of all; 58.3% of interrogated)
- Number overlapped by total RNA TARs (union): 61 (30.8% of all; 33.9% of interrogated)
From France D.
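Because the slide gives the Yale TAR count, the Affymetrix transfrag count and their union, the number of pseudogenes supported by both sources follows from inclusion-exclusion. A minimal sketch using the slide's counts; the function name is hypothetical.

```python
def in_both(a: int, b: int, union: int) -> int:
    """Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|."""
    return a + b - union


YALE_TARS = 84
AFFY_TRANSFRAGS = 102
UNION = 106

both = in_both(YALE_TARS, AFFY_TRANSFRAGS, UNION)
yale_only = YALE_TARS - both
affy_only = AFFY_TRANSFRAGS - both
print(both, yale_only, affy_only)  # 80 4 22
```

Under these counts, 80 of the 106 pseudogenes with array evidence are seen by both platforms, which suggests that most of the signal is shared between the two rather than specific to one.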
10
ENCODE pseudogenes expression
• ENCODE pseudogenes from the intersection part of consensus set
– 49 non-processed, 125 processed
• Designed oligos (25mer, Tm 70°C)
– Either specific to pseudogene or shared between parental gene and pseudogene
From Alex R.
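The distinction between pseudogene-specific and shared oligos can be illustrated by comparing the k-mers of a pseudogene with those of its parent. This is a sketch under stated assumptions, not the design procedure actually used: the function names and toy sequences are hypothetical, only exact matches on one strand are considered, and the Tm constraint is ignored.

```python
def kmers(seq: str, k: int = 25) -> set:
    """All distinct length-k substrings of a sequence (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def classify_oligos(pseudogene: str, parent: str, k: int = 25) -> dict:
    """Split candidate oligos into pseudogene-specific and shared with the parent."""
    p = kmers(pseudogene, k)
    g = kmers(parent, k)
    return {"specific": p - g, "shared": p & g}


# Toy example with k=4 so the result is easy to read (real oligos are 25mers).
parent = "ACGTACGGA"
pseudogene = "ACGTTCGGA"  # one substitution relative to the parent
result = classify_oligos(pseudogene, parent, k=4)
print(sorted(result["specific"]))  # ['CGTT', 'GTTC', 'TCGG', 'TTCG']
print(sorted(result["shared"]))    # ['ACGT', 'CGGA']
```

Every k-mer spanning the substituted base is pseudogene-specific, while k-mers from unchanged flanks are shared. A realistic design would also tolerate near-matches, since a single mismatch in a 25mer may still cross-hybridize.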
11
ENCODE pseudogenes expression #2
• 5’ RACE in 12 human tissues
– Brain, heart, kidney, spleen, liver, colon, small intestine, muscle, lung, stomach, testis, placenta
– 5’ RACE done in 12 tissues for the first 96 pseudogenes
– Last 78 will be done next week
• To do: pool multiple RACEs, send to Santa Clara and hybridize to Affymetrix ENCODE 20-nucleotide-resolution arrays
Stylianos Antonarakis, Robert Baertsch, Jorg Drenkow, Tom Gingeras, Charlotte Henrichsen, Philipp Kapranov, Catherine Ucla, Alexandre Reymond
Affymetrix, UCSC, University of Geneva, University of Lausanne
From Alex R.
12
Expression from pseudogene locus (1) – putative novel transcript
HAVANA sialyltransferase pseudogene (RP3-477O4.5) supported by protein evidence
Figure labels: Aligned proteins (column collapsed); Supporting EST (100% ID); polyA site and signal
Putative novel transcript supported by a single EST which has a polyA site and signal
There appears to be some transcription from this locus, which is supported at the 3’ end by a single EST
From Adam F.
13
Expression from pseudogene locus (2) – 5’ UTR of known gene
Figure labels: LILR pseudogene; Frameshift; LILRA3
Upstream pseudogene corresponds to exons 1-3 of LILR family genes; the 3’ exons have been lost. EST evidence supports expression from the pseudogene locus extending to known gene LILRA3.
From Adam F.
14
Intersect Consensus Pseudogenes with ChIP-chip Hits
Factor          Group      Total Hits   Known Genes (405)   Pseudogenes (198)
E2F             UCDavis    400          145                 4
H3K4me3 (0h)    UCSD       1000         149                 25
H3K4me3 (30h)   UCSD       1000         154                 24
Sp3             Stanford   400          86                  3
STAT1           Yale       400          15                  7
From Deyou Z.
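To compare the two sets on an equal footing, the hit counts can be divided by the set sizes (405 known genes, 198 consensus pseudogenes). The sketch below assumes that each entry counts members of the set intersecting at least one hit, and that the factor columns follow the order E2F, H3K4me3 (0h), H3K4me3 (30h), Sp3, STAT1; neither assumption is stated on the slide, and the names are hypothetical.

```python
SET_SIZES = {"known_genes": 405, "pseudogenes": 198}

# Counts per factor; column order is an assumption (see lead-in).
HITS = {
    "E2F":         {"known_genes": 145, "pseudogenes": 4},
    "H3K4me3_0h":  {"known_genes": 149, "pseudogenes": 25},
    "H3K4me3_30h": {"known_genes": 154, "pseudogenes": 24},
    "Sp3":         {"known_genes": 86,  "pseudogenes": 3},
    "STAT1":       {"known_genes": 15,  "pseudogenes": 7},
}


def hit_rate(factor: str, group: str) -> float:
    """Percentage of a set intersecting hits for a factor, to one decimal."""
    return round(100.0 * HITS[factor][group] / SET_SIZES[group], 1)


for factor in HITS:
    print(factor, hit_rate(factor, "known_genes"), hit_rate(factor, "pseudogenes"))
```

Read this way, the E2F rate is about 35.8% for known genes against about 2.0% for pseudogenes, whereas for STAT1 the two rates are similar (about 3.7% and 3.5%).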
15
Consensus Pseudogenes with ≥2 ChIP-chip Hits
Pgene-ID   Pgene-type      E2F   H3K4me3 (0h)   H3K4me3 (30h)   Sp3   STAT1
13         Processed       0     1              1               0     0
45         Processed       0     1              1               0     0
47         Processed       0     1              1               0     0
77         Processed       1     1              1               0     0
126        Processed       0     1              1               0     0
149        Processed       1     1              1               0     0
174        Non-Processed   0     1              1               0     0
[ 177 ]    Non-Processed   1     1              1               0     0
187        Processed       0     1              1               0     0
193        Processed       0     0              0               1     1
Legend entry: Has Transcriptional Evidence (intersects Gencode transcript)
From Deyou Z.
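The "≥2 hits" selection amounts to summing the per-factor flags for each pseudogene and keeping rows whose sum is at least two. A minimal sketch using the rows listed on this slide; the variable and function names are hypothetical, and the flag order is assumed to be E2F, H3K4me3 (0h), H3K4me3 (30h), Sp3, STAT1.

```python
from collections import Counter

FACTORS = ("E2F", "H3K4me3_0h", "H3K4me3_30h", "Sp3", "STAT1")

# Pgene-ID -> (type, hit flags in FACTORS order), as listed on the slide.
ROWS = {
    13:  ("Processed",     (0, 1, 1, 0, 0)),
    45:  ("Processed",     (0, 1, 1, 0, 0)),
    47:  ("Processed",     (0, 1, 1, 0, 0)),
    77:  ("Processed",     (1, 1, 1, 0, 0)),
    126: ("Processed",     (0, 1, 1, 0, 0)),
    149: ("Processed",     (1, 1, 1, 0, 0)),
    174: ("Non-Processed", (0, 1, 1, 0, 0)),
    177: ("Non-Processed", (1, 1, 1, 0, 0)),
    187: ("Processed",     (0, 1, 1, 0, 0)),
    193: ("Processed",     (0, 0, 0, 1, 1)),
}


def factors_hit(flags):
    """Names of the factors whose flag is set."""
    return [name for name, flag in zip(FACTORS, flags) if flag]


multi = {pid: factors_hit(flags)
         for pid, (_, flags) in ROWS.items() if sum(flags) >= 2}
by_type = Counter(ROWS[pid][0] for pid in multi)

print(multi[177])     # ['E2F', 'H3K4me3_0h', 'H3K4me3_30h']
print(dict(by_type))  # {'Processed': 8, 'Non-Processed': 2}
```

All ten listed pseudogenes pass the filter, as expected for a table already restricted to ≥2 hits. Nine of them are hit at both H3K4me3 time points, and only pseudogene 193 is hit by Sp3 and STAT1.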
16
Example Pseudogene with Binding Hits (#177)
From Deyou Z.
17