Inferring Gap Scores for DNA Alignment

How Dead Are the Dead Zones?
(16/Sep/2010)
Bob Harris
Penn State
Center for Comparative Genomics and Bioinformatics
How Dead are the Dead Zones?
1
• Looking at ChromHMM and Segway (short-range) segmentations vs. certain annotated “features”
• Nothing fancy; just a simple base-counting process
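The base counting mentioned above can be sketched as follows. This is a minimal illustration, assuming each segmentation is a BED file whose fourth column holds the class label (e.g. D0, E1) and whose segments do not overlap; the function name `bases_per_class` is hypothetical, not taken from the author's scripts.

```python
from collections import Counter
from typing import Iterable


def bases_per_class(bed_lines: Iterable[str]) -> Counter:
    """Total bases covered by each class label in a BED-format segmentation.

    Assumption: column 4 holds the class label and segments do not overlap,
    so summing segment lengths counts each base once. BED coordinates are
    0-based and half-open, so a segment covers end - start bases.
    """
    counts: Counter = Counter()
    for line in bed_lines:
        # Skip blank lines and UCSC header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        start, end, label = int(fields[1]), int(fields[2]), fields[3]
        counts[label] += end - start
    return counts


if __name__ == "__main__":
    example = [  # made-up segments, for illustration only
        "chr1\t0\t100\tD0\n",
        "chr1\t100\t150\tE0\n",
        "chr2\t0\t50\tD0\n",
    ]
    print(bases_per_class(example))  # Counter({'D0': 150, 'E0': 50})
```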
Portion of Genome
(full class names given on slide 11)
2
Most (ChromHMM) or much (Segway) of the genome is assigned to dead zone classes.
ChromHMM assigns 76% of the genome to dead zones; Segway assigns 42%.
[Figure omitted; legend: Dead Zone Classes, Promoter Classes, Enhancer Classes, Other Classes]
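Turning per-class base counts into portions like those on this slide might look like the sketch below. The `CLASS_GROUP` mapping is a hypothetical example using only a few ChromHMM class names from slide 11; the author's actual assignment of classes to dead zone, promoter, enhancer and other groups is not reproduced here. The denominator is assumed to be the total number of segmented bases; dividing by the full genome length instead would shift the percentages.

```python
def group_fractions(class_bases: dict[str, int], class_group: dict[str, str]) -> dict[str, float]:
    """Fraction of segmented bases falling in each class group.

    Classes missing from class_group are counted as "Other".
    """
    group_bases: dict[str, int] = {}
    for label, n in class_bases.items():
        group = class_group.get(label, "Other")
        group_bases[group] = group_bases.get(group, 0) + n
    total = sum(group_bases.values())
    return {group: n / total for group, n in group_bases.items()}


# Hypothetical grouping, for illustration only (not the author's mapping).
CLASS_GROUP = {
    "D0": "Dead Zone",
    "D1": "Dead Zone",
    "E0": "Enhancer",
    "TSS0": "Promoter",
}

if __name__ == "__main__":
    counts = {"D0": 60, "D1": 16, "E0": 10, "TSS0": 4, "IG": 10}  # made-up counts
    print(group_fractions(counts, CLASS_GROUP))
```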
Mappable Bases
3
• Mappability derived from signal tracks
– 152 signal tracks are the inputs to the segmentation
– What is considered mappable for a given signal track depends on the tag extension length for that track
• I’m using the union of mappable intervals over all the tracks
– A base is counted as mappable if it appears in an interval in any track
• Not to be confused with the “mapability track” (wgEncodeMapability)
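The union described above is a standard interval merge. A minimal sketch follows, assuming each track's mappable intervals are already available as (chrom, start, end) tuples in 0-based, half-open BED coordinates. How those intervals are derived from a track's tag extension length is not spelled out on the slide, so that step is left out; the function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable

Interval = tuple[str, int, int]  # (chrom, start, end), 0-based half-open


def union_intervals(tracks: Iterable[Iterable[Interval]]) -> dict[str, list[tuple[int, int]]]:
    """Merge intervals from all tracks; a base is in the result if any track covers it."""
    by_chrom = defaultdict(list)
    for track in tracks:
        for chrom, start, end in track:
            by_chrom[chrom].append((start, end))

    merged: dict[str, list[tuple[int, int]]] = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        out: list[tuple[int, int]] = []
        for start, end in ivs:
            if out and start <= out[-1][1]:
                # Overlaps (or abuts) the previous interval: extend it.
                out[-1] = (out[-1][0], max(out[-1][1], end))
            else:
                out.append((start, end))
        merged[chrom] = out
    return merged


def total_bases(merged: dict[str, list[tuple[int, int]]]) -> int:
    """Number of bases covered by a merged interval set."""
    return sum(end - start for ivs in merged.values() for start, end in ivs)
```

Sorting per chromosome and sweeping once keeps the merge at O(n log n) regardless of how many tracks contribute intervals.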
Mappable Bases
4
Dead zones are not dead simply as an artifact of not mapping.
[Figure omitted; legend: Dead Zone, Promoter, Enhancer, Other]
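Asking whether a class is "dead" only because it doesn't map amounts to asking what fraction of the class's bases lie inside the mappable union. One way to compute that is a linear sweep over two sorted, non-overlapping interval lists per chromosome, sketched below under those assumptions (function names hypothetical). The same calculation would apply to the repeat, gene and exon comparisons on the following slides, given those annotations as merged interval lists.

```python
def overlap_bases(a: list[tuple[int, int]], b: list[tuple[int, int]]) -> int:
    """Bases shared by two sorted lists of non-overlapping half-open intervals."""
    i = j = shared = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            shared += hi - lo
        # Advance whichever interval ends first; it cannot overlap anything
        # later in the other list, because both lists are sorted and disjoint.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return shared


def fraction_in_feature(
    class_ivs: dict[str, list[tuple[int, int]]],
    feature_ivs: dict[str, list[tuple[int, int]]],
) -> float:
    """Fraction of a class's bases inside a feature set (e.g. the mappable union).

    Both inputs map chrom -> non-overlapping (start, end) intervals.
    Returns NaN for a class with no bases.
    """
    class_total = sum(e - s for ivs in class_ivs.values() for s, e in ivs)
    inside = sum(
        overlap_bases(sorted(ivs), sorted(feature_ivs.get(chrom, [])))
        for chrom, ivs in class_ivs.items()
    )
    return inside / class_total if class_total else float("nan")
```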
In Repeats
5
Repeat content of ChromHMM dead zones is comparable to that of other classes.
Ditto for Segway’s DF and DFC.
[Figure omitted; legend: Dead Zone, Promoter, Enhancer, Other]
In Genes
6
Dead zones contain interesting things like genes.
[Figure omitted; legend: Dead Zone, Promoter, Enhancer, Other]
In Exons
7
Exon content is low for dead zones.
[Figure omitted; legend: Dead Zone, Promoter, Enhancer, Other]
GC Content, CpG Ratio
8
Dead zones are at the low end for GC content.
CpG ratio is low, but comparable to other non-promoter classes.
[Figure omitted; legend: Dead Zone, Promoter, Enhancer, Other]
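The slide doesn't define its CpG ratio. A common choice is the observed/expected CpG ratio of Gardiner-Garden and Frommer, (CpG count × N) / (C count × G count), where N is the sequence length; the sketch below assumes that definition and counts only A/C/G/T bases toward N, so ambiguous "N" bases are ignored. Counts are pooled over all of a class's intervals before dividing, rather than averaging per-interval ratios. Function names are hypothetical.

```python
from collections import Counter
from typing import Iterable


def base_counts(seqs: Iterable[str]) -> Counter:
    """Pool A/C/G/T and CpG dinucleotide counts over sequences (e.g. all intervals of one class)."""
    counts: Counter = Counter()
    for seq in seqs:
        s = seq.upper()  # soft-masked (lowercase) bases count the same as uppercase
        for base in "ACGT":
            counts[base] += s.count(base)
        # "CG" cannot overlap itself, so str.count gives the exact dinucleotide count.
        counts["CG"] += s.count("CG")
    return counts


def gc_content(counts: Counter) -> float:
    """(G + C) / (A + C + G + T); NaN if there are no called bases."""
    n = sum(counts[b] for b in "ACGT")
    return (counts["G"] + counts["C"]) / n if n else float("nan")


def cpg_obs_exp(counts: Counter) -> float:
    """Observed/expected CpG ratio, CG * N / (C * G); NaN if C or G is absent."""
    n = sum(counts[b] for b in "ACGT")
    c, g = counts["C"], counts["G"]
    return counts["CG"] * n / (c * g) if c and g else float("nan")
```

For example, `base_counts(["ACGTACGT"])` gives a GC content of 0.5 and an observed/expected CpG ratio of 4.0 (2 CpGs × 8 bases / (2 C × 2 G)).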
Related Work
9
• Also looked/looking at
– SNPs
– Sequence composition
• More plots and spreadsheet at
http://www.bx.psu.edu/~rsharris/encode/index.html#dead_zones
• Integration Vignette B02, in progress
http://encodewiki.ucsc.edu/EncodeDCC/index.php/Integration_Vignette_B02
Data Sources
10
• ChromHMM K562 kitchensink
– http://www.broadinstitute.org/~jernst/K562_max_25state_49mark.bed.gz
– Lifted over to hg19
• Segway short-range K562 kitchensink
– http://noble.gs.washington.edu/~stasis/public/2010/segtools/round5b/kitchensink/k562/round5b.kitchensink.k562.1224-0218a.stws1.bed.gz
– Lifted over to hg19
• Signal tracks
– http://noble.gs.washington.edu/~stasis/public/2010/encode/round6/rawSignal/
– 152 *.bedGraph.gz files
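The signal tracks are gzipped bedGraph files. A minimal reader is sketched below using Python's standard `gzip` module; it assumes the usual four-column bedGraph layout (chrom, start, end, value) with optional track/browser header lines. Whether every listed interval was treated as mappable, or some value cutoff was applied, isn't stated on the slides, so the reader just returns the records. Function names are hypothetical.

```python
import gzip
from typing import Iterable, Iterator


def parse_bedgraph(lines: Iterable[str]) -> Iterator[tuple[str, int, int, float]]:
    """Yield (chrom, start, end, value) from bedGraph text lines, skipping headers."""
    for line in lines:
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end, value = line.split()[:4]
        yield chrom, int(start), int(end), float(value)


def read_bedgraph_gz(path: str) -> Iterator[tuple[str, int, int, float]]:
    """Stream records from a gzipped bedGraph file, e.g. one of the *.bedGraph.gz tracks."""
    with gzip.open(path, "rt") as fh:  # "rt" decompresses and decodes to text
        yield from parse_bedgraph(fh)
```

Streaming the records keeps memory flat even for genome-wide tracks; the (chrom, start, end) part of each record can serve as that track's interval list.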
Class Names
11

ChromHMM
Class    State  Description
5P0      14     5'UTR
5P1      24     Promoter - 5' UTR
CNV0     6      Repetitive/CNV high
CNV1     13     Repetitive/CNV medium
CNV2     15     Repetitive/CNV low
D0       8      Dead zone (more dead)
D1       20     Dead zone
E0       16     Enhancer strong
E1       17     Enhancer - moderate
E2       11     Enhancer
GE       3      End of transcription
GS       7      Transcription initiation
I0       9      CTCF + open chromatin high
I1       12     CTCF + open chromatin medium
I2       0      CTCF + open chromatin low
IG       4      Intergenic
K27me3   23     H3K27me3
K36me3   10     H3K36me3 transcribed
K4me1    5      H3K4me1
R0       2      Specific Repression Strong
R1       1      Specific Repression Weaker
T0       21     Weak transcribed
T1       19     Transcribed 5'
TSS0     22     TSS promoter strong
TSS1     18     Promoter/TSS

Segway
Class      Label  Description
BBT        23     0.11 BRF1+BDP1+TR4
D0         1      0.0 D dead zone
D+Alu      18     0.1 K9me1 H3K9me1+H4K20me1
DF         13     0.2 F0 FAIRE
DFC        19     0.3 FC FAIRE+CTCF
E0         0      2.1 GM0 enhancer
E1         2      2.4 GM1 enhancer
FI         16     0.10 FI FAIRE+input
GE0        3      2.7 gene end
GE1        7      0.12 gene end
GM0        21     2.5 gene end
GM1        4      2.6 gene end
GM+K36me3  5      0.7 gene end
GS0        10     2.2 gene body
GS1        15     2.8 GM2 gene middle
I          20     2.3 I insulator
K4me1      12     0.8 H3K4me1+H3K9me1+H4K20me1
R0         6      0.4 R0 repression
R1         9      0.5 R1 repression
R2         22     0.6 R2 repression
R3         11     0.9 R3 repression
RGM        8      0.13 H4K20me1+H3K9me1
RTSS       24     0.14 R4 repressed TSS
TSS0       17     1.0 transcription start site
TSS1       14     2.0 near transcription start site