Yeast whole-genome analysis of conserved regulatory motifs


Chromatin state dynamics in nine human cell types
elucidate regulators and disease-associated SNPs
Jason Ernst
Joint work with Pouya Kheradpour, Luke Ward
Brad Bernstein and Manolis Kellis
Goal: interpreting disease-associated variants using epigenomics
[Figure: a single-base variant (CATGACTG vs. CATGCCTG), with panels labelled "Epigenomics" and "Disease variants"]
• GWAS implicate hundreds of non-coding loci in disease
• Challenges in interpreting disease variants:
– Find the ‘true’ causative SNP among many in linkage disequilibrium
– Determine the type of function, especially outside protein-coding regions
– Reveal the relevant cell type of activity
– Link to upstream regulators and downstream target genes
→ Epigenomics tools to address these challenges
From chromatin states to disease
Chromatin State Introduction
Chromatin State Dynamics across Cell Types
Reveal enhancer networks: TF → enhancer → target
Use these to study disease-associated variants
Challenge of data integration in many marks/cells
DNA packaged into chromatin around histone proteins
Histone tail modifications
Epigenomic information retains genome ‘state’ in differentiation and development
Two types: DNA methylation, histone marks
Construct antibodies, pull down chromatin → ChIP-seq tracks
• Dozens of chromatin tracks
• Understand their function
• Reveal their combinations
• Annotate systematically
• Common chromatin states
• Explicitly model combinations
• Unsupervised approach, probabilistic model
From ‘chromatin marks’ to ‘chromatin states’
Promoter states
Transcribed states
Active Intergenic
Repressed
• Learn de novo significant combinations of chromatin marks
• Reveal functional elements, even without looking at sequence
• Use for genome annotation
• Use for studying regulation dynamics in different cell types
Ernst and Kellis, Nat Biotech 2010
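As a rough illustration of how a probabilistic model can score combinations of marks, the sketch below treats each state as a set of independent per-mark frequencies and picks the best-scoring state for one genomic bin. The two states, their frequencies, and the function names are hypothetical, and the dependence between neighbouring bins that a full sequence model would add is left out.

```python
import math

# Hypothetical emission table: P(mark is present | state).
# A real model would learn these values from data; these are made up.
EMISSIONS = {
    "promoter_like": {"H3K4me3": 0.9, "H3K4me1": 0.3, "H3K36me3": 0.05},
    "transcribed_like": {"H3K4me3": 0.05, "H3K4me1": 0.1, "H3K36me3": 0.8},
}

def log_likelihood(observed, state):
    """Log-probability of a present/absent mark vector under one state,
    assuming marks are independent given the state."""
    total = 0.0
    for mark, p in EMISSIONS[state].items():
        total += math.log(p if observed[mark] else 1.0 - p)
    return total

def best_state(observed):
    """State with the highest emission log-likelihood for this bin."""
    return max(EMISSIONS, key=lambda s: log_likelihood(observed, s))

bin_a = {"H3K4me3": True, "H3K4me1": False, "H3K36me3": False}
bin_b = {"H3K4me3": False, "H3K4me1": False, "H3K36me3": True}
print(best_state(bin_a), best_state(bin_b))
```

Scoring whole combinations rather than single marks is what lets a bin with H3K4me3 alone and a bin with H3K36me3 alone land in different states.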
From chromatin states to disease
Chromatin State Introduction
Chromatin State Dynamics across Cell Types
Reveal enhancer networks: TF → enhancer → target
Use these to study disease-associated variants
ENCODE: Study nine marks in nine human cell lines
9 marks x 9 human cell types = 81 Chromatin Tracks (2^81 combinations)
9 marks: H3K4me1, H3K4me2, H3K4me3, H3K27ac, H3K9ac, H3K27me3, H4K20me1, H3K36me3, CTCF (+WCE, +RNA)
9 human cell types:
• HUVEC: Umbilical vein endothelial
• NHEK: Keratinocytes
• GM12878: Lymphoblastoid
• K562: Myelogenous leukemia
• HepG2: Liver carcinoma
• NHLF: Normal human lung fibroblast
• HMEC: Mammary epithelial cell
• HSMM: Skeletal muscle myoblasts
• H1: Embryonic
Brad Bernstein Chromatin Group
Ernst et al, Nature 2011
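The track and combination counts on this slide follow from simple arithmetic, assuming each track is reduced to present/absent at a given position (a simplification made here only for counting). A short check, with variable names chosen for illustration:

```python
# Counts taken from the slide: nine marks profiled in nine cell types.
n_marks = 9
n_cell_types = 9

n_tracks = n_marks * n_cell_types    # one track per mark per cell type
combos_per_cell = 2 ** n_marks       # present/absent patterns within one cell type
combos_all_tracks = 2 ** n_tracks    # patterns across every track at once

print(n_tracks)
print(combos_per_cell)
print(combos_all_tracks)             # exact integer; Python ints do not overflow
```

The gap between these counts and a small number of states is the motivation for summarizing each cell type with a single annotation track.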
Chromatin states dynamics across nine cell types
• Single annotation track for each cell type
• Summarize cell-type activity at a glance
• Can study 9-cell activity pattern across
From chromatin states to disease
Chromatin State Introduction
Chromatin State Dynamics across Cell Types
Reveal enhancer networks: TF → enhancer → target
Use these to study disease-associated variants
Introducing multi-cell activity profiles
[Figure: activity profiles across the nine cell types (HUVEC, NHEK, GM12878, K562, HepG2, NHLF, HMEC, HSMM, H1). Panels and their scales: gene expression (ON / OFF); chromatin states (active enhancer / repressed); active TF motif enrichment (motif enrichment / motif depletion); TF regulator expression (TF On / TF Off); dip-aligned motif biases (motif aligned / flat profile)]
Linking Distal Regulatory Elements to Genes
Which gene(s) is this active enhancer in HMEC likely regulating?
[Figure: H3K27ac signal at the enhancer (HMEC state) alongside IRF6 expression and C1orf107 expression, one value per cell type]
Compute correlations between gene expression levels and enhancer-associated histone modification signals
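The correlation step can be sketched as follows, assuming Pearson correlation over one value per cell type. The slide does not name the correlation measure, and the profiles and gene labels below are hypothetical, not the values from the figure.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical normalized values, one entry per cell type (nine in total).
enhancer_h3k27ac = [-1.0, 0.5, -1.2, -0.8, -1.1, 0.2, 2.1, 1.9, -0.6]
gene_a_expr = [-0.9, 0.4, -1.0, -1.1, -0.7, 0.1, 1.8, 2.2, -0.8]
gene_b_expr = [0.3, -0.2, 1.1, -0.5, 0.9, -1.0, 0.1, -0.4, -0.3]

scores = {
    "gene_a": pearson(enhancer_h3k27ac, gene_a_expr),
    "gene_b": pearson(enhancer_h3k27ac, gene_b_expr),
}
best = max(scores, key=scores.get)
print(best, scores)
```

A candidate gene whose expression rises and falls with the enhancer signal across cell types scores high. One that varies independently scores near zero or below.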
Linking Distal Regulatory Elements to Genes
Which gene(s) is this active enhancer in HMEC likely regulating?
[Figure: real data (H3K27ac signal at the HMEC-state enhancer vs. IRF6 expression) next to randomized data (random H3K27ac signal vs. random gene expression)]
Compare correlations between enhancer and gene expression between real and randomized data
Combine intensity signal from all marks: train a logistic regression classifier to discriminate real from random correlations, conditioned on state, TSS distance, and cell type
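A minimal sketch of the real-versus-random classifier idea, under heavy simplification: a single correlation feature, a hand-written logistic regression fitted by gradient descent, and made-up training values. The slide's version combines signal from all marks and conditions on state, TSS distance, and cell type, none of which is modelled here.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, lr=0.5, steps=2000):
    """Fit p(real | x) = sigmoid(w * x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical enhancer-gene correlations: "real" pairs versus pairs
# recomputed on randomized data (label 1 = real, 0 = randomized).
real_corr = [0.9, 0.8, 0.7, 0.6, 0.2]
random_corr = [0.1, -0.2, 0.3, -0.5, 0.0]
xs = real_corr + random_corr
ys = [1] * len(real_corr) + [0] * len(random_corr)

w, b = train_logistic(xs, ys)

def link_probability(corr):
    """Estimated probability that a correlation comes from a real pair."""
    return sigmoid(w * corr + b)

print(link_probability(0.9), link_probability(-0.5))
```

Training against randomized data gives each candidate link a score that is calibrated against what chance alone would produce.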
Enhancer-gene links supported by eQTL-gene links
eQTL study: sequence variant at distal position (15kb) vs. expression level of gene across individuals

Individual   Allele   Expression level of gene
Indiv. 1     A        -0.5
Indiv. 2     A        -1.5
Indiv. 3     A        -1.8
Indiv. 4     C         3.1
Indiv. 5     A         1.1
Indiv. 6     A        -1.8
Indiv. 7     A        -1.4
Indiv. 8     C         3.2
Indiv. 9     C         4.4
…            …         …

Validation rationale:
• Expression Quantitative Trait Loci (eQTLs) provide independent SNP-to-gene links
• Do they agree with activity-based links?
Example: Lymphoblastoid (GM) cells study
• Expression/genotype across 60 individuals (Montgomery et al, Nature 2010)
• 120 eQTLs are eligible for enhancer-gene linking based on our datasets
• 51 actually linked (43%) using predictions
→ 4-fold enrichment (10% exp. by chance)
• Independent validation of links.
• Relevance to disease datasets.
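The enrichment figure can be checked directly from the counts quoted on the slide. The variable names are illustrative only.

```python
# Figures quoted on the slide.
eligible_eqtls = 120       # eQTLs eligible for enhancer-gene linking
linked_eqtls = 51          # of those, supported by a predicted link
expected_fraction = 0.10   # fraction expected by chance

observed_fraction = linked_eqtls / eligible_eqtls
fold_enrichment = observed_fraction / expected_fraction

print(f"observed: {observed_fraction:.1%}")
print(f"fold enrichment: {fold_enrichment:.2f}")
```

The observed fraction is 42.5%, which the slide rounds to 43%, and the ratio to the chance expectation is a little over 4.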
From chromatin states to disease
Chromatin State Introduction
Chromatin State Dynamics across Cell Types
Reveal enhancer networks: TF → enhancer → target

Use these to study disease-associated variants
Coordinated activity reveals activators/repressors
[Figure: enhancer activity, gene activity, and predicted regulators; activity signatures for each TF]
• Enhancer networks: Regulator → enhancer → target gene
• Ex1: Oct4 predicted activator of embryonic stem (ES) cells
• Ex2: Gfi1 repressor of K562/GM cells
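One way to read "coordinated activity" is as the sign of the association between a TF's expression and the enrichment of its motif in active enhancers across cell types. The sketch below assumes that reading and uses only the sign of the covariance. The TF names and profiles are hypothetical, and a real analysis would also need a significance threshold.

```python
def covariance(xs, ys):
    """Sample covariance of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

def predicted_role(tf_expression, motif_enrichment):
    """Label a TF by whether its expression tracks motif enrichment
    (activator-like) or motif depletion (repressor-like) across cell types."""
    cov = covariance(tf_expression, motif_enrichment)
    if cov > 0:
        return "activator"
    if cov < 0:
        return "repressor"
    return "undetermined"

# Hypothetical profiles over nine cell types (all values are made up).
tf_x_expr = [0.1, 0.0, 0.2, 0.1, 0.0, 0.1, 0.2, 0.1, 3.0]
tf_x_motif = [-0.2, -0.1, 0.0, -0.3, -0.1, 0.0, -0.2, -0.1, 1.5]
tf_y_expr = [0.1, 0.2, 2.5, 2.8, 0.1, 0.0, 0.2, 0.1, 0.0]
tf_y_motif = [0.3, 0.2, -1.2, -1.4, 0.1, 0.3, 0.2, 0.4, 0.2]

print(predicted_role(tf_x_expr, tf_x_motif))
print(predicted_role(tf_y_expr, tf_y_motif))
```

Here the first TF is expressed where its motif is enriched, and the second is expressed where its motif is depleted.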
Causal motifs supported by dips & enhancer assays
Dip evidence of TF binding (nucleosome displacement)
Enhancer activity halved by single-motif disruption
→ Motifs bound by TF, contribute to enhancers
From chromatin states to disease
Chromatin State Introduction
Chromatin State Dynamics across Cell Types
Reveal enhancer networks: TF → enhancer → target
Use these to study disease-associated variants
Revisiting disease-associated variants
(Ganesh et al, Nat Genet 2009)
(Teslovich et al, Nature 2010)
(Stahl et al, Nat Genet 2010)
(Liu et al, Nat Genet 2010)
(Han et al, Nat Genet 2009)
(Kathiresan et al, Nat Genet 2008)
(Kamatani et al, Nat Genet 2009)
(Soranzo et al, Nat Genet 2009)
(Houlston et al, Nat Genet 2008)
(Newton-Cheh et al, Nat Genet 2009)
rs9271100
• Disease-associated SNPs enriched for enhancers in relevant cell type
Ex1: Systemic lupus erythematosus SNP: Ets-1 motif
• SNP in lymphoblastoid GM enhancer state
• Disrupts Ets-1 motif instance, predicted GM regulator
→ Model: Disease SNP abolishes GM enhancer
Ets-1 is a predicted activator of GM enhancers
[Figure: enhancer activity; activity signatures for each TF]
• Ets expression correlates with Ets-1 motif enrichment in enhancers
→ Model: Ets-1 disruption would abolish enhancer state
Ex2: Erythrocyte phenotype study SNP: Gfi-1 motif
K562: erythroleukaemia cell type
• Disease SNP creates motif instance for Gfi-1 repressor
• Gfi-1 predicted repressor for K562-specific enhancers
→ Creation of repressive motif abolishes K562 enhancer
Gfi-1 is a predicted repressor of non-K562 enhancers
[Figure: enhancer activity; activity signatures for each TF]
• Gfi expression correlates with Gfi-1 motif depletion in enhancers
• Prediction: Gfi-1 large-scale repression of non-K562
→ Motif created → Gfi-1 recruited → enhancer repressed
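Whether a SNP disrupts or creates a motif instance can be illustrated by scoring both alleles against a position weight matrix. The sketch below is an assumption-laden toy: the matrix is built from a consensus string rather than taken from a curated Ets-1 or Gfi-1 model, the background is uniform, and the two 8-mers are the illustrative pair from the opening slide.

```python
import math

BASES = "ACGT"
BACKGROUND = 0.25  # uniform background frequency, an assumption

def make_pwm(consensus, p_match=0.7):
    """Hypothetical PWM: the consensus base gets p_match at each position,
    the other three bases share the remainder equally."""
    p_other = (1.0 - p_match) / 3.0
    return [{b: (p_match if b == c else p_other) for b in BASES} for c in consensus]

def log_odds(seq, pwm):
    """Sum of log2(P(base | motif) / background) over positions."""
    return sum(math.log2(col[base] / BACKGROUND) for base, col in zip(seq, pwm))

pwm = make_pwm("CATGACTG")   # toy motif, not a real TF matrix
ref_allele = "CATGACTG"
alt_allele = "CATGCCTG"      # differs from ref at one position (A -> C)

delta = log_odds(alt_allele, pwm) - log_odds(ref_allele, pwm)
print(f"score change: {delta:.2f} bits")
```

A negative change corresponds to disrupting a motif instance, as in Ex1. The same calculation with the alleles swapped gives a positive change, which corresponds to creating one, as in Ex2.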
SNPs from GWAS Enrich for Cell Type Specific Strong Enhancer Chromatin States in Biologically Relevant Cell Types

Author/Journal                      Total #SNPs  Cell Type  # SNPs in Strong enhancers  Fold  FDR
Ganesh et al, Nat Genet 2009        35           K562       9                           17    0.02
Teslovich et al, Nature 2010        101          HepG2      13                          11    0.02
Stahl et al, Nat Genet 2010         29           GM12878    7                           15    0.03
Liu et al, Nat Genet 2010           6            GM12878    4                           41    0.03
Han et al, Nat Genet 2009           18           GM12878    6                           21    0.03
Kathiresan et al, Nat Genet 2008    18           HepG2      5                           24    0.03
Kamatani et al, Nat Genet 2009      39           K562       7                           12    0.03
Soranzo et al, Nat Genet 2009       28           K562       6                           15    0.03
Houlston et al, Nat Genet 2008      4            HepG2      3                           66    0.03
Newton-Cheh et al, Nat Genet 2009   9            K562       4                           30    0.04

Titles:
• Ganesh et al: Multiple loci influence erythrocyte phenotypes in the CHARGE Consortium.
• Teslovich et al: Biological, clinical and population relevance of 95 loci for blood lipids
• Stahl et al: Genome-wide association study meta-analysis identifies seven new rheumatoid arthritis risk loci
• Liu et al: Genome-wide meta-analyses identify three loci associated with primary biliary cirrhosis
• Han et al: Genome-wide association study in a Chinese Han population identifies nine new susceptibility loci for systemic lupus erythematosus.
• Kathiresan et al: Six new loci associated with blood low-density lipoprotein cholesterol, high-density lipoprotein cholesterol or triglycerides in humans.
• Kamatani et al: Genome-wide association study of hematological and biochemical traits in a Japanese population
• Soranzo et al: A genome-wide meta-analysis identifies 22 loci associated with eight hematological parameters in the HaemGen consortium.
• Houlston et al: Meta-analysis of genome-wide association data identifies four new susceptibility loci for colorectal cancer.
• Newton-Cheh et al: Genome-wide association study identifies eight loci associated with blood pressure.
From chromatin states to disease
Chromatin State Introduction
Chromatin State Dynamics across Cell Types
Reveal enhancer networks: TF → enhancer → target
Use these to study disease-associated variants
Chromatin state dynamics: Contributions summary
• Chromatin states capture mark combinations
– Reveal promoter/enhancer/insulator/transcribed regions
• Chromatin states capture chromatin dynamics
– Single annotation track for each cell type
– One 15-state track per cell type instead of 2^9 combinations
• Activity profiles capture correlated changes
– Gene expression vs. chromatin: Enhancer → Gene links
– Motifs vs. TF expr vs. chromatin: Activators/Repressors
• Regulatory predictions validated: eQTLs/dips/luciferase
– eQTLs: links. Dips: binding. Luciferase assays: motif role
• Interpret disease-associated variants
– Intergenic SNPs enriched for cell-type specific enhancers
– Mechanistic predictions reveal potential drug targets
Ever-expanding dimensions of epigenomics
[Figure: thousands of whole-genome datasets along the "Chromatin marks" and "Cell types" axes]
Additional dimensions: Environment, Genotype, Disease, Gender, Stage, Age
• Today: Cell-type and chromatin-mark dimensions
• Next: Personal epigenomes: genotype/phenotype
• Complete matrix of conditions, individuals, alleles
Collaborators and Acknowledgements
Broad Institute/MGH Pathology/HHMI:
• Tarjei Mikkelsen
• Noam Shoresh
• Charles B. Epstein
• Xiaolan Zhang
• Li Wang
• Robyn Issner
• Michael Coyne
• Manching Ku
• Timothy Durham
• Bradley E. Bernstein
MIT compbio group:
• Pouya Kheradpour
• Lucas Ward
• Manolis Kellis
ENCODE consortium
Funding
• NHGRI, NIH, NSF, HHMI, Sloan Foundation