Discovering conserved DNA
ChIP-seq QC
Xiaole Shirley Liu
STAT115, STAT215
Initial QC
• FASTQC
• Mappability
• Uniquely mapped reads
• Uniquely mapped locations
• Uniquely mapped locations / Uniquely mapped reads
• Good to keep one read / location in peak calling
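The ratio of uniquely mapped locations to uniquely mapped reads, and the "one read per location" filter, can be sketched as below. This is a minimal illustration, assuming each uniquely mapped read is already reduced to a (chromosome, position, strand) tuple; the function names are hypothetical and not taken from any particular tool.

```python
def unique_location_ratio(reads):
    """Uniquely mapped locations / uniquely mapped reads.

    reads: iterable of (chrom, position, strand) tuples (assumed input format).
    """
    reads = list(reads)
    if not reads:
        return 0.0
    return len(set(reads)) / len(reads)


def keep_one_read_per_location(reads):
    """Keep the first read seen at each (chrom, position, strand)."""
    seen = set()
    kept = []
    for read in reads:
        if read not in seen:
            seen.add(read)
            kept.append(read)
    return kept


example_reads = [
    ("chr1", 100, "+"),
    ("chr1", 100, "+"),  # duplicate of the first read
    ("chr1", 100, "-"),  # same position, other strand: a different location
    ("chr2", 50, "+"),
]
print(unique_location_ratio(example_reads))            # 3 locations / 4 reads = 0.75
print(len(keep_one_read_per_location(example_reads)))  # 3
```

A ratio close to 1 means few reads pile up at identical locations; a low ratio suggests many duplicates.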
Peak Calls
• Tag distribution along the genome ~ Poisson distribution (λBG = total tag / genome size)
• ChIP-Seq shows local biases in the genome
– Chromatin and sequencing bias
– 200-300bp control windows have too few tags
– But can look further
Dynamic λlocal = max(λBG, [λctrl, λ1k,] λ5k, λ10k)
[Figure: ChIP and Control tracks with 300bp, 1kb, 5kb, and 10kb windows]
http://liulab.dfci.harvard.edu/MACS/
Zhang et al, Genome Bio, 2008
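The dynamic λlocal idea can be sketched as follows. This is an illustration under stated assumptions, not the MACS implementation: control tag counts in each window are rescaled to the candidate peak width, ChIP and control are assumed to have equal sequencing depth, and the genome size and tag counts are made-up example values. The function names are hypothetical.

```python
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - sum of pmf(0..k-1).

    Adequate for a sketch; very small tail probabilities lose precision.
    """
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # pmf(0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # pmf(i) from pmf(i-1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def dynamic_lambda(lambda_bg, control_counts, peak_width):
    """max(lambda_BG, lambda_window...), each expressed per peak_width bp.

    control_counts: {window_bp: control tags in a window of that size
    centered on the candidate peak} (assumed input format).
    """
    rates = [lambda_bg]
    for window_bp, count in control_counts.items():
        rates.append(count * peak_width / window_bp)
    return max(rates)


# Hypothetical numbers for illustration only.
total_tags = 10_000_000
genome_size = 2_700_000_000
peak_width = 300

lambda_bg = total_tags / genome_size * peak_width  # expected tags per 300 bp
lam_local = dynamic_lambda(lambda_bg, {5_000: 40, 10_000: 50}, peak_width)
p_value = poisson_sf(10, lam_local)                # 10 ChIP tags observed in the candidate
print(lam_local, p_value)
```

Taking the maximum makes the test conservative: a region with locally elevated control signal needs more ChIP tags to be called a peak.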
Peak Call Statistics
• P-value and FDR
• Simulation: random sampling of reads?
• FDR = A / B, BH correction or Q-value
• P-value / FDR changes with sequencing depth
• Fold change does not
[Figure labels: MAT: Quality Control; <1% enriched; A; B]
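The BH correction named above can be sketched in a few lines. This is a plain-Python version of the standard Benjamini-Hochberg step-up adjustment, assuming a list of per-peak p-values; the function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


peak_pvalues = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(peak_pvalues))
```

Peaks whose adjusted value falls below the chosen FDR cutoff are kept. Because the adjustment depends on the p-values, it changes with sequencing depth just as the p-values do.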
ChIP-seq QC
• Number of peaks with good FDR and fold change
• FRiP score:
– Fraction of reads in peaks
– Often higher for histone modifications than transcription factors
– Often increases slightly with increasing read depth
• Overlap with union of peaks in public DNase-seq data
– In a working ChIP-seq experiment, > 70% of peaks overlap the union DHS
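Both QC metrics above reduce to interval lookups. The sketch below assumes half-open intervals [start, end), reads reduced to a single (chrom, position), and a peak counted as overlapping the DHS set when its midpoint falls inside a DHS interval. That midpoint rule is a simplifying assumption, and all function names are hypothetical.

```python
import bisect


def build_index(intervals):
    """Merge (chrom, start, end) intervals into sorted, non-overlapping lists per chrom."""
    by_chrom = {}
    for chrom, start, end in intervals:
        by_chrom.setdefault(chrom, []).append((start, end))
    index = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        merged = [list(ivs[0])]
        for start, end in ivs[1:]:
            if start <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def in_any_interval(index, chrom, pos):
    """True if pos lies in some merged interval on chrom."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect.bisect_right(starts, pos) - 1
    return i >= 0 and pos < ends[i]


def frip(read_positions, peaks):
    """Fraction of reads in peaks; read_positions are (chrom, pos) pairs."""
    index = build_index(peaks)
    reads = list(read_positions)
    if not reads:
        return 0.0
    return sum(in_any_interval(index, c, p) for c, p in reads) / len(reads)


def fraction_overlapping(peaks, reference):
    """Fraction of peaks whose midpoint falls in the reference set (e.g. union DHS)."""
    index = build_index(reference)
    peaks = list(peaks)
    if not peaks:
        return 0.0
    hits = sum(in_any_interval(index, c, (s + e) // 2) for c, s, e in peaks)
    return hits / len(peaks)


peaks = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 0, 50)]
reads = [("chr1", 120), ("chr1", 299), ("chr1", 300), ("chr2", 10), ("chr3", 5)]
print(frip(reads, peaks))  # 3 of 5 reads fall in peaks
```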
DNase-seq
• Captures all regulatory sequences in the prostate genome
Sabo et al, Nat Methods 2006; Thurman et al, Nature 2012
ChIP-seq QC
• Evolutionary conservation
– Can be used for ChIP QC
• Conserved sites more functional?
– Majority of functional sites not conserved
Odom et al, Nat Genet 2007
Enrichment Distribution
• CEAS (Shin et al, Bioinfo, 2009)
– Meta-gene profiles: TF and histone marks
– % of peaks at promoter, exons, introns, and distal intergenic sequences
– SitePro of signal at specific sites
• Replicate agreement: > 60% peak overlap or correlation > 0.6
ChIP-seq Downstream Analysis
Target Gene Assignment
Yeast TF Regulatory Network
[Figure labels: Protein; Transcribe; Regulate; Gene]
Human TF Binding Distribution
• Most TF binding sites are outside promoters
• How to assign targets?
• Nearest distance?
• Binding within 10KB?
• Number of binding sites?
• Other knowledge?
Higher Order Chromatin Interactions
Chromatin conformation capture
Hi-C
Interactions follow exponential decay with distance
Lieberman-Aiden et al, Science 2009
How to Assign Targets for Enhancer Binding Transcription Factors?
• Regulatory potential: sum of binding sites weighted by distance to TSS with exponential decay
• Decay modeled from Hi-C experiments
[Figure label: TSS]
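The regulatory potential described above can be sketched as a distance-weighted sum. The decay length and the distance cutoff below are placeholder assumptions chosen for illustration; they are not the constants fitted from Hi-C data, and the function name is hypothetical.

```python
import math


def regulatory_potential(tss, site_positions, decay_bp=10_000.0, max_dist=100_000):
    """Sum of exp(-distance / decay_bp) over binding sites within max_dist of the TSS.

    decay_bp and max_dist are illustrative assumptions, not fitted values.
    """
    total = 0.0
    for pos in site_positions:
        distance = abs(pos - tss)
        if distance <= max_dist:
            total += math.exp(-distance / decay_bp)
    return total


# One site at the TSS and one 10 kb away: 1 + e^-1.
print(regulatory_potential(0, [0, 10_000]))
```

Unlike a "nearest peak" or "peak within 10KB" rule, this score lets several distal sites add up while still favoring sites close to the TSS.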
Direct Target Identification
• Binary decision?
• Rank product of regulatory potential and differential expression
• BETA
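A rank product of regulatory potential and differential expression can be sketched as below. This is a simplified illustration rather than the BETA code: it assumes one regulatory potential score and one differential expression p-value per gene, breaks ties by input order, and uses hypothetical function names.

```python
def ranks(values, reverse=False):
    """Rank 1 = smallest value (or largest if reverse=True); ties broken by order."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=reverse)
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def rank_product(reg_potential, de_pvalue):
    """Return [(gene, score)] sorted by score; smaller score = better target candidate."""
    genes = sorted(set(reg_potential) & set(de_pvalue))
    n = len(genes)
    rp_rank = ranks([reg_potential[g] for g in genes], reverse=True)  # high potential first
    de_rank = ranks([de_pvalue[g] for g in genes])                    # small p-value first
    scores = {g: (rp_rank[i] / n) * (de_rank[i] / n) for i, g in enumerate(genes)}
    return sorted(scores.items(), key=lambda kv: kv[1])


potential = {"A": 5.0, "B": 1.0, "C": 3.0}
pvals = {"A": 0.001, "B": 0.5, "C": 0.01}
ranked_targets = rank_product(potential, pvals)
print(ranked_targets)
```

The ranked list avoids a binary target / non-target decision: a gene scores well only if it has both strong nearby binding and strong differential expression.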
Is My Factor an Activator, Repressor, or Both?
• Most labs have differential expression profiling of transcription factor together with TF ChIP-seq
• Do genes with higher regulatory potential show more up- or down-expression than all the genes in the genome?
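One way to ask this question is to compare the regulatory potential distribution of up-regulated (or down-regulated) genes against that of unchanged genes. The sketch below computes only a two-sample Kolmogorov-Smirnov statistic in plain Python; the log fold change threshold and all names are assumptions, and a p-value would need an additional step (for example `scipy.stats.ks_2samp`).

```python
def ks_statistic(sample_a, sample_b):
    """Maximum absolute difference between two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def direction_signal(reg_potential, log_fold_change, threshold=1.0):
    """KS statistic of up and down genes versus unchanged genes (threshold is assumed)."""
    up, down, static = [], [], []
    for gene, score in reg_potential.items():
        lfc = log_fold_change.get(gene, 0.0)
        if lfc >= threshold:
            up.append(score)
        elif lfc <= -threshold:
            down.append(score)
        else:
            static.append(score)
    return {
        "up_vs_static": ks_statistic(up, static) if up and static else 0.0,
        "down_vs_static": ks_statistic(down, static) if down and static else 0.0,
    }


rp = {"g1": 9, "g2": 8, "g3": 1, "g4": 2, "g5": 1.5}
lfc = {"g1": 2, "g2": 1.5, "g3": 0, "g4": -0.1, "g5": -2}
signal = direction_signal(rp, lfc)
print(signal)
```

A large statistic for only the up-regulated group would suggest an activator, for only the down-regulated group a repressor, and for both groups a factor that does both.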
ChIP-chip/seq Motif Finding
• ChIP-chip gives 10-5000 binding regions ~200-1000bp long. Precise binding motif?
– Raw data is like perfect clustering, plus enrichment values
• MDscan
– High ChIP ranking => true targets, contain more sites
– Search TF motif from highest ranking targets first (high signal / background ratio)
– Refine candidate motifs with all targets
Similarity Defined by m-match
For a given w-mer and any other random w-mer
TGTAACGT    8-mer
TGTAACGT    matched 8
AGTAACGT    matched 7
TGCAACAT    matched 6
TGACACGG    matched 5
AATAACAG    matched 4
m-matches for TGTAACGT
Pick a reasonable m to call two w-mers similar
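The m-match counts in the table can be reproduced with a position-by-position comparison. A minimal sketch with hypothetical function names:

```python
def match_count(a, b):
    """Number of positions at which two equal-length w-mers agree."""
    if len(a) != len(b):
        raise ValueError("w-mers must have the same length")
    return sum(x == y for x, y in zip(a, b))


def is_m_match(a, b, m):
    """True if the two w-mers agree at m or more positions."""
    return match_count(a, b) >= m


reference = "TGTAACGT"
for other in ["TGTAACGT", "AGTAACGT", "TGCAACAT", "TGACACGG", "AATAACAG"]:
    print(other, match_count(reference, other))  # 8, 7, 6, 5, 4
```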
MDscan Seeds
[Figure: ChIP-chip selected upstream sequences ordered by enrichment (higher enrichment at the top). A 9-mer from the top sequences and its m-matches form a seed motif pattern.]
w-mers shown in the figure:
ATTGCAAAT TTGCAAATC TTTGCGAAT
TTGCAAATC TTGCGAATA TTGCAAATT TTGCCCATC
ATTGCAAAT TTTGCGAAT TTTGCAAAT TTTGCAAAT
GCAAATCCA CAAATCCAA GCAAATTCG CAAATCCAA
GCAAATCCA GAAATCCAC GGAAATCCA GGAAATCCT
TGCAAATCC TGCAAATTC
GCCACCGT ACCACCGT ACCACGGT GCCACGGC
…
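The seed step can be sketched as follows: enumerate every w-mer in the top-ranked sequences, collect its m-matches, and tabulate base counts per position. This is a simplified illustration, not MDscan itself. Candidate seeds are ordered here by the number of m-matches only, whereas MDscan evaluates candidate motifs with its own scoring function. The example sequences and function names are made up.

```python
def wmers(seq, w):
    """All overlapping substrings of length w."""
    return [seq[i:i + w] for i in range(len(seq) - w + 1)]


def match_count(a, b):
    """Number of positions at which two equal-length w-mers agree."""
    return sum(x == y for x, y in zip(a, b))


def seed_candidates(top_sequences, w, m):
    """Map each w-mer in the top sequences to its m-matches, largest groups first."""
    all_wmers = [k for seq in top_sequences for k in wmers(seq, w)]
    seeds = {}
    for seed in set(all_wmers):
        seeds[seed] = [k for k in all_wmers if match_count(seed, k) >= m]
    return sorted(seeds.items(), key=lambda kv: (-len(kv[1]), kv[0]))


def count_matrix(members):
    """Per-position base counts for a group of aligned w-mers."""
    width = len(members[0])
    return [
        {base: sum(k[pos] == base for k in members) for base in "ACGT"}
        for pos in range(width)
    ]


# Hypothetical top-ranked sequences, each containing a variant of TTGCAAATC.
top_sequences = ["GGTTGCAAATCGG", "CCTTGCGAATACC", "AATTGCAAATTAA"]
candidates = dict(seed_candidates(top_sequences, w=9, m=7))
members = candidates["TTGCAAATC"]
matrix = count_matrix(members)
print(sorted(members))
print(matrix[4])  # base counts at the fifth position
```

In the refinement steps, the same m-match test would be applied to w-mers from the remaining ChIP-selected sequences to update each candidate matrix.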
Update Motifs With Remaining Seqs
[Figure labels: Seed1 m-matches; Extreme High Rank; All ChIP-selected targets]
Refine the Motifs
[Figure labels: Seed1 m-matches; Extreme High Rank; All ChIP-selected targets]
Further Refine Motifs
• Could also be used to examine known motif enrichment
• Is motif enrichment correlated with ChIP-seq enrichment?
• Is motif more enriched in peak summits than peak flanks?
• Motif analysis could identify transcription factor partners of ChIP-seq factors
Estrogen Receptor
• Overactive in > 70% of breast cancers
• Where does it go in the genome?
• ChIP-chip on chr21/22, motif and expression analysis found its “pioneering factor” FoxA1
Carroll et al, Cell 2005
[Figure labels: ER; TF??]
Estrogen Receptor (ER)
Cistrome in Breast Cancer
• ER may function far away (100-200KB) from genes
• Only 20% of ER sites have PhastCons > 0.2
• ER has different effect based on different collaborators
Carroll et al, Nat Genet 2006
[Figure labels: NRIP; ER; AP1]
Cell Type-Specific Binding
• The same TF binds to very different locations in different tissues and conditions. Why?
• TF concentration?
• Collaborating factors, especially pioneering factors
• Interesting observations about pioneering factors
Summary
• ChIP-seq identifies genome-wide in vivo protein-DNA interaction sites
• ChIP-seq peak calling needs to shift reads and calculate enrichment and FDR correctly
• Functional analysis of ChIP-seq data:
– Strong vs weak binding, conserved vs non-conserved
– Target identification
– Motif analysis
• Cell type-specific binding Epigenetics