encode-GT-call-pgene-summary-28oct05

ENCODE
Pseudogene Summary
for GT call
Mark Gerstein
2005.10.28 11:00 EDT
Summary of 6 calls:
Sept. 15, 22; Oct. 6, 13, 20, 27
1
Developed Consensus Set of 198 Pseudogenes
A Derived from a qualified union of GIS, Havana, UCSC, & Yale with uniform criteria on boundaries
1. Identify a “good” set of human proteins – HAVANA set?
2. Remove pseudogenes (from all 4 groups) overlapping with current GENCODE exons (does GENCODE have an updated version?).
3. Create a union of the remaining pseudogenes.
4. Find the “best” matching proteins for each pseudogene; remove entries without a BLAST hit (e-value cutoff issue?).
5. Realign each pseudogene to its parent protein to produce a uniform alignment and to define the start and end coordinates.
6. Apply a threshold to sequence identity and coverage? (No.)
7. Classify pseudogenes into processed and non-processed (how?)
B Overall 222 pseudogenes; application of the above recipe gives 198 Consensus
(Intersection set of above is 81 (proc) + 49 (non-proc))
C Currently on test browser + ENCODE wiki + http://pseudogene.org/ENCODE
From Deyou Z. + Robert B.
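Steps 2 and 3 of the recipe on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the slide does not say how the "qualified union" was built, so merging calls that overlap on the same chromosome is an assumed rule, and the `Interval` type, the function names, and the toy coordinates are all hypothetical.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int        # 0-based, inclusive (assumed convention)
    end: int          # exclusive
    source: str = ""  # group that made the call, e.g. "Yale"


def overlaps(a, b):
    """True if two intervals share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def remove_exon_overlaps(pgenes, exons):
    """Step 2: drop every pseudogene call that overlaps an annotated exon."""
    return [p for p in pgenes if not any(overlaps(p, e) for e in exons)]


def union_calls(pgenes):
    """Step 3 (assumed rule): merge overlapping calls into one locus each."""
    merged = []
    for p in sorted(pgenes, key=lambda x: (x.chrom, x.start, x.end)):
        if merged and overlaps(merged[-1], p):
            last = merged[-1]
            sources = "+".join(sorted(set(last.source.split("+")) | {p.source}))
            merged[-1] = Interval(last.chrom, last.start, max(last.end, p.end), sources)
        else:
            merged.append(p)
    return merged


# Toy calls from the four groups (coordinates are made up).
calls = [
    Interval("chr1", 100, 200, "Yale"),
    Interval("chr1", 150, 260, "Havana"),
    Interval("chr1", 500, 600, "UCSC"),
    Interval("chr2", 100, 200, "GIS"),
]
exons = [Interval("chr1", 550, 580, "GENCODE")]

kept = remove_exon_overlaps(calls, exons)
consensus = union_calls(kept)
for locus in consensus:
    print(locus)
```

In this toy run the UCSC call is dropped because it overlaps an exon, and the Yale and Havana calls collapse into one locus. Steps 4 to 7 (BLAST matching, realignment, classification) are not covered here.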
2
Interesting Complexities of Pseudogene Annotation:
Insertion of One Pseudogene into Another One
First insertion event
heterogeneous nuclear ribonucleoprotein A1 (HNRPA1) pseudogene (parent on Chr12)
Remnant of a second, mitochondrial insertion event (has post-insertion deletions)
NADH dehydrogenase 2 (MTND2) pseudogene (parent mitochondrial)
NADH dehydrogenase 4 (MTND4) pseudogene (parent mitochondrial)
cytochrome b (CYTB) pseudogene (parent mitochondrial)
Protein evidence
From Adam F.
3
EST Evidence of Expression from a Pseudogene
at 5’ UTR of Known Gene
LILR pseudogene
Frameshift
LILRA3
From Adam F.
Upstream pseudogene corresponds to exons 1-3 of LILR family genes; the 3’ exons have been lost. EST evidence supports expression from the pseudogene locus extending to known gene LILRA3.
4
TAR/Transfrag Evidence for Transcription in 198 consensus pseudogenes
- # of 198 overlapped by interrogated regions (Affy arrays): 180 (90.9%)
- # of 198 overlapped by Yale TARs or Affy transfrags (union): 106 (53.5% of all; 58.9% of interrogated)
=> There is evidence of transcription (from TARs or transfrags) of the pseudogene or the parent gene (if cross-hybridization) for 53.5% of the consensus pseudogenes (upper bound on transcription)
- # overlapping CAGE tags: 11 (5.5%)
- # overlapping ditag tags: 1 (0.5%)
(83 (41.9%) are overlapped by full-length ditags)
From France D.
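The two denominators used on this slide ("of all" versus "of interrogated") are easy to mix up, so here is a small arithmetic check. The counts are taken from the slide; the helper `pct` and the constant names are illustrative only.

```python
def pct(count, total):
    """Percentage of total, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


CONSENSUS = 198     # consensus pseudogenes
INTERROGATED = 180  # overlapped by interrogated array regions
TRANSCRIBED = 106   # overlapped by Yale TARs or Affy transfrags (union)
FULL_DITAG = 83     # overlapped by full-length ditags

print("interrogated, of all:         ", pct(INTERROGATED, CONSENSUS))
print("TAR/transfrag, of all:        ", pct(TRANSCRIBED, CONSENSUS))
print("TAR/transfrag, of interrogated:", pct(TRANSCRIBED, INTERROGATED))
print("full-length ditag, of all:    ", pct(FULL_DITAG, CONSENSUS))
```

The 53.5% figure uses all 198 pseudogenes as the denominator, while 58.9% restricts it to the 180 that the arrays could actually interrogate.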
5
Example Pseudogene overlapped by TARs/transfrags and tags: ENCODE_consensus_187
but pseudogene is 93% similar to parent
From France D.
6
Consensus Pseudogenes with ≥2 ChIP-chip Hits
Has Transcriptional Evidence (intersects Gencode transcript)

Pgene-ID  Pgene-type     E2F  H3K4me3 (0h & 30h)  Sp3  STAT1
13        Processed      0    1                   0    0
45        Processed      0    1                   0    0
47        Processed      0    1                   0    0
77        Processed      1    1                   0    0
126       Processed      0    1                   0    0
149       Processed      1    1                   0    0
174       Non-Processed  0    1                   0    0
[ 177 ]   Non-Processed  1    1                   0    0
187       Processed      0    1                   0    0
193       Processed      0    0                   1    1

Look for ChIP-chip hits upstream of the pseudogenes
From Deyou Z.
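The suggestion to look upstream of the pseudogenes could be prototyped as a strand-aware window query. This is a sketch under assumptions: the slide gives no window size or strand handling, so the 2000 bp default, the `Feature` type, the function names, and the coordinates below are all hypothetical.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    chrom: str
    start: int         # 0-based, inclusive (assumed convention)
    end: int           # exclusive
    strand: str = "+"


def upstream_window(pgene, size):
    """Region of `size` bp immediately 5' of the pseudogene on its own strand."""
    if pgene.strand == "+":
        return Feature(pgene.chrom, max(0, pgene.start - size), pgene.start, "+")
    return Feature(pgene.chrom, pgene.end, pgene.end + size, "-")


def upstream_hits(pgene, hits, size=2000):
    """ChIP-chip hits that overlap the upstream window (size is an assumption)."""
    win = upstream_window(pgene, size)
    return [
        h for h in hits
        if h.chrom == win.chrom and h.start < win.end and win.start < h.end
    ]


# Toy example: the same locus on either strand, with made-up hit positions.
pgene_plus = Feature("chr1", 10000, 11000, "+")
pgene_minus = Feature("chr1", 10000, 11000, "-")
hits = [
    Feature("chr1", 9000, 9500),    # 5' of the plus-strand copy
    Feature("chr1", 11200, 11600),  # 5' of the minus-strand copy
    Feature("chr1", 10200, 10400),  # inside the pseudogene body
    Feature("chr2", 9000, 9500),    # different chromosome
]

print(upstream_hits(pgene_plus, hits))
print(upstream_hits(pgene_minus, hits))
```

Flipping the window for minus-strand pseudogenes matters because "upstream" is defined relative to the direction of transcription, not the genome coordinate.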
7
Potentially Transcribed Pseudogene (#177)
with Upstream ChIP-chip Hits
From Deyou Z.
8
Experiments to Validate Expression of ENCODE Pseudogenes
• Select ENCODE pseudogenes from the intersection part of consensus set
– 49 non-processed, 125 processed
• Designed oligos (25mer, Tm 70°C)
– Either specific to pseudogene or shared between parental gene and pseudogene
• Doing 5’RACE in 12 human tissues
– Brain, heart, kidney, spleen, liver, colon, small intestine, muscle, lung, stomach, testis, placenta
– First 96 pseudogenes 5’RACEs done in 12 tissues
– Last 78 will be done next week
• To do: pool multiple RACEs, send to Santa Clara and hybridize to Affymetrix ENCODE 20 nucleotide resolution arrays
From Alex R.
Stylianos Antonarakis, Robert Baertsch, Jorg Drenkow, Tom Gingeras, Charlotte Henrichsen, Philipp Kapranov, Catherine Ucla, Alexandre Reymond
Affymetrix, UCSC, University of Geneva, University of Lausanne
9
Extra Slides
10
Pseudogene group
Core people: Jennifer Harrow, WEI Chia-Lin, Adam Frankish, Sujit Dike, Robert Baertsch, Deyou Zheng, Yontao Lu
Others: Tara L Hoyem, Roderic Guigo Serra, Tom Gingeras, Suganthi Balasubramanian
6 Calls: Sept. 15, 22; Oct. 6, 13, 20, 27
11
Refresher: many repetitions of the below “Venn analysis”
Yale: 167 pseudogenes (164 + 3)
Havana-Gencode: 165 pseudogenes (167 - 2)
UCSC retrogenes: 146 not expressed
Venn region counts: 54 (2), 17 (2), 16 (0), 81 (34), 15 (1), 16 (7), 33 (1)
Numbers according to Adam’s note
Region notes:
  7 Havana agrees to be added (8, 11, 40, 59, 139, 152, 169).
  4 at coding loci. [Yale agrees to delete]
  1 with weak sequence identity.*
  5 with “non-real” proteins.*
Region notes:
  9 Havana agrees to be added.
  2 at coding loci. [Yale agrees to delete]
  1 with weak sequence identity.*
  2 with “non-real” proteins.*
* Solved by consistent protein set & threshold
12
From Adam F.
Rearranged exon order in unprocessed pseudogene
Dot plot protein evidence vs genome
adaptor-related protein complex 1, beta 1 subunit (AP1B1) pseudogenes
Protein evidence
Exon 6
Exon 3
Splice sites same as parent gene
Following duplication of the AP1B1 locus, rearrangements/duplications have produced two unprocessed pseudogenes corresponding to exons 6 and 3 of the parent gene
13
From Adam F.
Rearrangement of processed pseudogene
mRNA dot plot
pseudogene similar to part of ribosomal protein L3 (RPL3)
Protein dot plot
Following insertion, one end of the RPL3 pseudogene has been flipped onto the opposite strand (with some loss of internal sequence)
14
Overlaps by TAR/transfrag subset
- # overlapped by interrogated regions (Affy arrays): 180 (90.9%)
- # overlapped by Yale TARs or Affy transfrags (union): 106 (53.5% of all; 58.9% of interrogated)
- # overlapped by Yale TARs (union): 84 (42.4% of all; 46.7% of interrogated)
- # overlapped by Affy transfrags (union): 102 (51.5% of all; 56.7% of interrogated)
- # overlapped by polyA+ TARs/transfrags (union): 105 (53.0% of all; 58.3% of interrogated)
- # overlapped by total RNA TARs (union): 61 (30.8% of all; 33.9% of interrogated)
From France D.
15
Expression from pseudogene locus (1) – putative novel transcript
HAVANA sialyltransferase pseudogene (RP3-477O4.5) supported by protein evidence
Aligned proteins (column collapsed)
Supporting EST (100% ID)
Putative novel transcript supported by a single EST which has a polyA site and signal
polyA site and signal
There appears to be some transcription from this locus, which is supported at the 3’ end by a single EST
From Adam F.
16
Intersect Consensus Pseudogenes with ChIP-chip Hits

Factors            E2F      H3K4me3  H3K4me3  Sp3       STAT1
Group              UCDavis  UCSD     UCSD     Stanford  Yale
Total Hits         400      1000     1000     400       400
Known Genes (405)  145      149      154      86        15
Pseudogenes (198)  4        25       24       3         7

(H3K4me3 columns: 0h and 30h)
From Deyou Z.
17