Integrative analysis of epigenomes
illuminates differentiation and diseases
of blood cells
Ross Hardison
Department of Biochemistry and Molecular Biology
Huck Institute for Genomics
Penn State University
9/29/16
Bioinformatics and Genomics, UNC
Charlotte
1
Simplified scheme of hematopoiesis
[Figure: lineage diagram with nodes HSC; CMP, CLP; MEP, GMP; MEG, ERY, EOS, Mast, GRA, MONO; T, B, NK. Annotation: 2 M sec⁻¹]
2
Differentiation and diseases of blood cells
• Lineage specific binding of key transcription factors drives
expression patterns that determine cell type
• Maps of transcription factor occupancy inform models of
regulation
• Cell specific phenotypes arise from lineage-specific binding of
transcription factors at distinct sites
• ValIdated Systematic IntegratiON: A VISION for epigenomics in
hematopoietic gene regulation
– Measure distances between cell types by quantitative comparisons of chromatin
accessibility landscapes and transcriptomes
– Integrative analysis of epigenomics can improve prediction of enhancers
– Formal modeling to understand regulation of a locus and regulatory output of
each cis-regulatory module
• Use this information to increase accuracy of search for genetic
variants in regulatory regions to explain phenotypes
3
The guiding principle of developmental biology:
Differential gene expression determines the
distinctive properties of each cell type.
E.H. Davidson, 1976, Gene Activity in Early Development,
2nd ed.
4
Lineage specific binding of key transcription
factors drives expression patterns that
determine cell type
5
GATA1 is required for production of erythrocytes,
megakaryocytes, mast cells, and eosinophils
ES cells, Gata1- / blastocyst / chimeric mouse
Did the Gata1- ES cells contribute to specific lineages?
Pevny et al. 1991. Nature 349:257;
Pevny et al. 1995. Development 121:163
and subsequent papers, multiple alleles of Gata1
S.H. Orkin (1995) J. Biol. Chem. 270: 4955-4958.
[Figure: lineage diagram (HSC, CMP, GMP, MEP, CLP; MEG, ERY, EOS, Mast, GRA, MONO, T, B, NK) with four X marks]
6
Lineage-restricted TFs determine hematopoietic cell fate
[Figure: lineage diagram (HSC, CMP, MEP, GMP, CLP; MEG, ERY, EOS, Mast, GRA, MONO, T, B, NK) annotated with lineage-restricted TFs: TAL1, GATA2, PU.1, IKAROS, GATA1, CEBPA, FLI1, LMO2, KLF1, PAX5, GATA3]
7
CELL-RESTRICTED TRANSCRIPTION
FACTORS REGULATE TARGET GENES
POSITIVELY AND NEGATIVELY
8
Erythroid differentiation in cultured and primary cells
G1E-ER4
BFU-E
Weiss, Yu, Orkin (1997) Mol. Cell. Biol. 17: 1642
Welch et al. (2004) Blood 104: 3146
Wu et al. (2011) Genome Res. 21: 1659-1671
Pilon, Subramanian, Kumar et al. (2011) Blood, Epub Sep 2011
9
Transcriptional response to GATA1-ER activation in G1E cells
Differentially expressed genes
[Figure, panel B: heat map of induced and repressed genes at 0, 3, 7, 14, 24, 30 hr]
     Platform                 Genes induced   Genes repressed   Reference
A    Affymetrix microarrays   1048            1568              Cheng et al. 2009. Genome Res 19:2172
B    RNA-seq, polyA+ RNA      1416            1039              Jain, Mishra et al. 2015. Genomics Data 4:1-7
10
TFs regulate lineage-specific genes
[Figure: GATA1 bound at a WGATAR motif, shown for an induced gene (+) and for a repressed gene]
Contexts must differ between induced and repressed:
Sequence, motifs?
Other TFs? Co-activators? Co-repressors?
Chromatin?
Nuclear location?
11
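The WGATAR motif named on this slide can be located in a sequence with a short scan. The sketch below is only an illustration: it assumes the standard IUPAC reading (W = A or T, R = A or G), searches both strands, and uses a made-up example sequence. The function names are hypothetical and not taken from the studies cited in the talk.

```python
import re

# IUPAC codes needed for WGATAR and its reverse complement (YTATCW).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "W": "[AT]", "R": "[AG]", "Y": "[CT]"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A",
              "W": "W", "R": "Y", "Y": "R"}


def reverse_complement(motif):
    """Reverse-complement a motif written with the IUPAC codes above."""
    return "".join(COMPLEMENT[base] for base in reversed(motif))


def motif_regex(motif):
    """Compile a lookahead pattern so overlapping instances are all reported."""
    return re.compile("(?=(" + "".join(IUPAC[base] for base in motif) + "))")


def find_motif(sequence, motif="WGATAR"):
    """Return sorted (start, strand) pairs, 0-based, for both strands."""
    sequence = sequence.upper()
    hits = [(m.start(), "+") for m in motif_regex(motif).finditer(sequence)]
    rc = reverse_complement(motif)
    if rc != motif:
        hits += [(m.start(), "-") for m in motif_regex(rc).finditer(sequence)]
    return sorted(hits)


example = "CCAGATAAGGTTTATCTGC"  # made-up sequence, for illustration only
print(find_motif(example))
```

A motif match alone says nothing about occupancy or about whether a nearby gene is induced or repressed; it only marks candidate positions.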
Yong Cheng
Ying Zhang
G. Celine Han
GATA1 OCCUPANCY GENOME-WIDE:
CLUES ABOUT REGULATION
12
Locations of occupancy by GATA1
• ChIP-chip: ~3,558 sites. Cheng et al. 2009.
• ChIP-seq: ~14,000 sites. Cheng et al. 2009. Erythroblasts, Wu et al. 2011; Pimkin et al. 2014
• ChIP-exo: ~10,000 sites. Han et al. 2016. Mol. Cell Biol.
Jain, Mishra et al. 2015. Genomics Data 4:1-7.
13
Distinguishing features of GATA1-mediated gene induction
• GATA1 tends to bind close to the TSS
– Most often in the first intron but frequently in the proximal flanking region
• Multiple GATA1 occupied segments (OSs)
– 58% of induced genes, 24% of repressed genes
• Evolutionary constraint on the GATA motif instances
• Region around the TSS depleted of H3K27me3
Cheng et al. (2009) Genome Res. 19:2172-2184
14
TAL1 + GATA1 = induction
Gerd Blobel
Tripic et al (2009) Blood 113: 2191
Cheng et al (2009) Genome Res. 19: 2172
Wu et al (2011) Genome Res. 21: 1659
Weisheng Wu
15
~15,000 GATA1-bound sites
• More than number of GATA1-responsive genes
(~2,500)
– Average of 6 bound sites per responsive gene
• Far fewer than number of GATA1 binding site motif
instances (~8 million)
– About 1 bound site per 500 motif instances
– Considering DNA segments (500bp) containing at least one
motif instance, about 1 in 150 DNA segments are bound
16
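The ratios on this slide follow from the three approximate counts it gives. The sketch below simply redoes that arithmetic; the last quantity is an inference that assumes each bound site falls in one 500 bp segment, which the slide does not state.

```python
# Approximate counts quoted on the slide.
bound_sites = 15_000
responsive_genes = 2_500
motif_instances = 8_000_000

# Average number of bound sites per GATA1-responsive gene.
sites_per_gene = bound_sites / responsive_genes

# Motif instances per bound site (the slide rounds this to about 500).
motifs_per_bound_site = motif_instances / bound_sites

# Assumption: one bound site per bound 500 bp segment. Then "1 in 150
# segments bound" implies roughly this many motif-containing segments.
implied_motif_segments = bound_sites * 150

print(sites_per_gene)
print(round(motifs_per_bound_site))
print(implied_motif_segments)
```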
Determinants of GATA1 occupancy: Chromatin >> motifs
• Study DNA segments comparable in size to ChIP-chip peaks (500 bp) that also have a match to a GATA1 binding site motif
• What distinguishes GATA1-bound from unbound segments?
• Additional motifs increase discriminatory power only 2-fold.
• Mark of active chromatin (H3K4me1) increases discrimination 25-fold.
Zhang et al. 2009. Nucleic Acids Res 37:7024.
Ying Zhang
17
Epigenetic features associated with transcriptional
regulation, assayed genome-wide
[Figure: genome-wide epigenetic features across regions labeled Repressed chromatin, Enhancer, Promoter, Repressed chromatin; H3K27ac labeled]
18
Maxim Pimkin, Chris Morrissey, Tejas Mishra, Deepti Jain, Weisheng Wu..
CHANGES IN TF OCCUPANCY DRIVE
DIFFERENTIAL REGULATION
19
Most GATA1 and TAL1 binding sites are distinctive to
ERYs vs MEGs
The TFs GATA1 and TAL1 are required for production of both
erythroblasts and megakaryocytes.
Pimkin et al. (2014) Genome Research 24: 1932
20
Major shifts in TAL1 occupancy during hematopoiesis
Wu et al. (2014) Genome Research 24: 1945
21
ValIdated Systematic IntegratiON: A VISION
for epigenomics in hematopoietic gene
regulation
Ross Hardison
Department of Biochemistry and Molecular Biology
Huck Institute for Genome Sciences
Penn State University
22
Rationale for the VISION project
• Acquisition of genome-wide epigenetic data across
hematopoiesis is no longer the major barrier to
understanding mechanisms of gene regulation during normal
and pathological tissue development
• The chief challenges are how to
– integrate epigenetic data in terms that are accessible and
understandable to a broad community of researchers
– build validated quantitative models explaining how the
dynamics of gene expression relate to epigenetic features
– translate information effectively from mouse models to
potential applications in human health.
23
VISION: ValIdated Systematic IntegratiON of
epigenomics in hematopoietic gene regulation
Acquire
Validate
Integrate
Translate
24
Initial VISION Resources
http://www.bx.psu.edu/~giardine/vision/
• BX Browser: Visualize functional genomics data
• 3D Genome Browser
• CODEX compendium of functional genomics
• Repository of hematopoietic transcriptomes (Jens Lichtenberg poster)
• IDEAS data integration
• Single cell transcriptomes, HSC (Gottgens lab)
• ENCODE Element Browser
• Translate between mouse and human
[Figure: browser view, Mouse Dec. 2011 (GRCm38/mm10) chr15:61,950,001-62,300,000 (350,000 bp), showing Myc, Pvt1, Z11981, H2afy3]
25
Generate, compile, and curate epigenomic data
736 datasets
Work from individual labs
11,774 datasets
High quality, high information tracks
Hematopoietic cells
26
Focus on myeloid-erythroid branches of hematopoiesis
[Figure: lineage diagram with nodes HSC, HPC7, CMP, MEP, GMP, CLP, G1E, ER4, CFU-Mk, CFU-E, MEG, ERY, EOS, Mast, GRA, MONO, T, B, NK. Annotation: 2 M sec⁻¹]
27
ScriptSeq RNA-seq at Zfpm1 and neighbors
[Figure: browser view, Mouse Dec. 2011 (GRCm38/mm10) chr8:122,270,001-122,450,000 (180,000 bp). Genes: Zfpm1, Trhr2, Zc3h18, Mvd, Il17c, Gm20735, Cyba, 9330133O14Rik. Track: EnhTested. RNA-seq tracks, each shown for + and - strands: 1007 HSC, 1063 HSC, 1009 CMP, 1010 CMP, 1017 GMP, 1018 GMP, 1049 CFU-M, 1050 CFU-M, 1051 megs, 1052 megs, 1020 MEP, 1064 MEP, 983 G1E, 984 G1E, 985 ER4, 986 ER4, 1046 CFU-E, 1087 CFU-E, 1047 Ery, 1088 Ery]
28
Hierarchical clustering: Erythroid separates from others
Transcript levels of all genes (RNA-seq)
[Figure: clustered heat map with color key and histogram (Value, 0.7 to 1; Count, 0 to 40). Cell values, with rows and columns in the same sample order:]
              1     2     3     4     5     6     7     8     9     10    11    12    13    14    15    16
 1 CFUE1087   1     0.87  0.87  0.86  0.79  0.8   0.8   0.79  0.78  0.79  0.8   0.79  0.79  0.78  0.79  0.8
 2 CFUE1046   0.87  1     0.84  0.87  0.81  0.82  0.82  0.81  0.8   0.79  0.81  0.81  0.79  0.8   0.81  0.81
 3 ery1088    0.87  0.84  1     0.87  0.79  0.8   0.78  0.77  0.77  0.78  0.79  0.78  0.79  0.78  0.79  0.79
 4 ery1047    0.86  0.87  0.87  1     0.83  0.84  0.82  0.81  0.81  0.81  0.82  0.82  0.81  0.82  0.83  0.83
 5 megs1052   0.79  0.81  0.79  0.83  1     0.94  0.84  0.84  0.87  0.87  0.86  0.87  0.88  0.89  0.9   0.9
 6 megs1051   0.8   0.82  0.8   0.84  0.94  1     0.84  0.84  0.87  0.87  0.86  0.87  0.88  0.89  0.9   0.9
 7 MEP1064    0.8   0.82  0.78  0.82  0.84  0.84  1     0.92  0.88  0.87  0.9   0.91  0.87  0.88  0.89  0.89
 8 MEP1019    0.79  0.81  0.77  0.81  0.84  0.84  0.92  1     0.9   0.88  0.92  0.92  0.88  0.89  0.9   0.91
 9 HSC1063    0.78  0.8   0.77  0.81  0.87  0.87  0.88  0.9   1     0.93  0.91  0.92  0.88  0.9   0.9   0.91
10 HSC1007    0.79  0.79  0.78  0.81  0.87  0.87  0.87  0.88  0.93  1     0.91  0.92  0.9   0.91  0.9   0.91
11 CMP1010    0.8   0.81  0.79  0.82  0.86  0.86  0.9   0.92  0.91  0.91  1     0.94  0.91  0.92  0.92  0.92
12 CMP1009    0.79  0.81  0.78  0.82  0.87  0.87  0.91  0.92  0.92  0.92  0.94  1     0.91  0.92  0.92  0.93
13 GMP1017    0.79  0.79  0.79  0.81  0.88  0.88  0.87  0.88  0.88  0.9   0.91  0.91  1     0.94  0.93  0.93
14 GMP1018    0.78  0.8   0.78  0.82  0.89  0.89  0.88  0.89  0.9   0.91  0.92  0.92  0.94  1     0.94  0.94
15 CFUMk1050  0.79  0.81  0.79  0.83  0.9   0.9   0.89  0.9   0.9   0.9   0.92  0.92  0.93  0.94  1     0.95
16 CFUMk1049  0.8   0.81  0.79  0.83  0.9   0.9   0.89  0.91  0.91  0.91  0.92  0.93  0.93  0.94  0.95  1
BG, July 22, 2016
29
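The separation of erythroid samples can be checked on a handful of the heat-map values. The sketch below is not the analysis behind the slide: it assumes the cell values are correlation-like similarities, takes distance as 1 minus the value, and uses average linkage, none of which the slide specifies. The seven samples and their values are as read from the heat map labels; the function names are hypothetical.

```python
from itertools import combinations

# Subset of heat-map values; rows and columns follow SAMPLES order.
SAMPLES = ["CFUE1087", "CFUE1046", "ery1088", "ery1047",
           "HSC1063", "CMP1010", "GMP1017"]
SIMILARITY = [
    [1.00, 0.87, 0.87, 0.86, 0.78, 0.80, 0.79],
    [0.87, 1.00, 0.84, 0.87, 0.80, 0.81, 0.79],
    [0.87, 0.84, 1.00, 0.87, 0.77, 0.79, 0.79],
    [0.86, 0.87, 0.87, 1.00, 0.81, 0.82, 0.81],
    [0.78, 0.80, 0.77, 0.81, 1.00, 0.91, 0.88],
    [0.80, 0.81, 0.79, 0.82, 0.91, 1.00, 0.91],
    [0.79, 0.79, 0.79, 0.81, 0.88, 0.91, 1.00],
]


def average_distance(cluster_a, cluster_b):
    """Average-linkage distance, with distance assumed to be 1 - similarity."""
    total = sum(1.0 - SIMILARITY[i][j] for i in cluster_a for j in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


def agglomerate(n_samples, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    clusters = [frozenset([i]) for i in range(n_samples)]
    while len(clusters) > n_clusters:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_distance(*pair))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return clusters


two_groups = [sorted(SAMPLES[i] for i in c)
              for c in agglomerate(len(SAMPLES), 2)]
for group in two_groups:
    print(group)
```

With these values every erythroid-to-progenitor similarity is lower than every within-group similarity, so the last merge is the one that would join the CFU-E/Ery samples to the HSC/CMP/GMP samples.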
ATAC-seq in Zfpm1 and neighbors
[Figure: browser view, Mouse Dec. 2011 (GRCm38/mm10) chr8:122,270,001-122,450,000 (180,000 bp). Genes: Zfpm1, Trhr2, Zc3h18, Mvd, Il17c, Gm20735, Cyba, 9330133O14Rik. Track: EnhTested. Accessibility tracks: 841:HSC ATAC, 849:HSC ATAC, 987:HSC ATAC, LSK Amit, 842:CMP ATAC, 850:CMP ATAC, CMP Amit, 843:GMP ATAC, 851:GMP ATAC, GMP Amit, 847:CFU-M ATAC, 855:CFU-M ATAC, 848:Megs ATAC, 856:Megs ATAC, 852:MEP ATAC, 844:MEP ATAC, MEP Amit, 870:G1E ATAC, 871:G1E ATAC, 872:ER4 ATAC, 873:ER4 ATAC, 845:CFU-E ATAC, 853:CFU-E ATAC, 846:Ery ATAC, 854:Ery ATAC, Mono Amit, GRA Amit, NK Amit, B Amit, CD4 Amit, CD8 Amit]
30
Hierarchical clustering: Erythroid separates from others
Nuclease accessibility (ATAC-seq)
BG, Aug 03, 2016
31
General model for lineage choice
[Figure: lineage diagram (HSC, CMP, GMP, CFU-Mk, MEG, MEP, CFU-E, ERY) annotated "Similar regulatory landscapes" and "Dynamic TF binding", with a symbol marking each change in regulatory landscape]
• Lineage choice occurs with, or even via, establishment of permissive and repressive chromatin states
• These chromatin states are relatively stable within a lineage, even when expression changes dramatically
• Induction and repression within a lineage are largely a result of changes in patterns of TF binding on the stage of the permissive chromatin
32
Nergiz Dogan
INTEGRATIVE ANALYSIS OF EPIGENOMICS
CAN IMPROVE PREDICTION OF ENHANCERS
33
Epigenetic signatures can predict enhancers with high
accuracy: TAL1 occupancy
Dogan et al (2015) Epigenetics & Chromatin 8: 16
34
TF occupancy: frequently active as enhancers
HMs without TFs: rarely active as enhancers
n= 273
Dogan et al. (2015) Epigenetics & Chromatin 8: 16.
35
Integration of epigenetic signals in two dimensions
simultaneously
• Integration of epigenetic signals along chromosomes and
across cell types
• Yu Zhang (Statistics, PSU): Integrative and Discriminative
Epigenome Annotation System (IDEAS)
– Zhang, An, Yue, Hardison (2016) Nucleic Acids Research 44:6721-6731
• Joint characterization of epigenetic landscapes in many cell
types and detection of differential regulatory regions
• Preserves the position-dependent and cell type-specific
information at fine scales
36
Integrative analysis of histone modifications reveals little
change during erythroid maturation
Ernst & Kellis (2012) Nature Methods
Wu et al. (2011) Genome Research 21: 1659.
37
Integrative and Discriminative Epigenome Annotation
System (IDEAS)
Zhang, An, Yue, Hardison (2016) Nucleic Acids Research 44:6721-6731
38
IDEAS to integrate histone modifications and ATAC-seq
across cell types
[Figure: browser view, Mouse Dec. 2011 (GRCm38/mm10) chr8:122,270,002-122,450,000 (179,999 bp). Genes: Zfpm1, Trhr2, Il17c, Zc3h18, Gm20735, Mvd, Cyba, 9330133O14Rik. Track: EnhTested.]
Input tracks:
• ATAC (Hardison & Bodine, Amit lab): 987:HSC ATAC, LSK Amit, 842:CMP ATAC, CMP Amit
• Histone modification iChIP (Amit lab): H3K27ac, H3K4me1, H3K4me3 in LT-HSC and in CMP
IDEAS: Integrative and Discriminative Epigenome Annotation System: 2D segmentation
Yu Zhang et al. 2016 NAR 44:6721-6731
IDEAS tracks: LSK bm, CMP bm, GMP bm, MEP bm, G1E, ER4, CFU-E ad, Ery ad, Meg h, GRA bm, Mono bm, B spl, NK spl, T CD4 spl, T CD8 spl
State labels: Promoter, Active chromatin, Quiescent
39
Nascent VISION gives new insights
Previous studies: Autoregulation by GFI1B binding to promoter-proximal CRM
Moroy et al. 2005. NAR 33:987.
[Figure labels: Multipotent progenitor cells; Maturing erythroid cells; Structural TFs]
40
Interpreting the maps as testable hypotheses
41
Try to integrate all the epigenomic and
expression information to derive rules
for regulation that apply globally
rules = equations
42
Modeling different aspects of regulation in VISION
43
Functional output from distal CRMs measured for Hbb locus
Blood, 2012
44
Locus models for Hbb and Hba
Locus model: States the functional output X_{i,j} from each of the cis-regulatory modules (CRMs) contributing to the expression level of the target gene (T).
E.g. here is a formal statement of results from Bender et al. 2012:
T_{Hbb} = X_{HS1} + X_{HS2} + X_{HS3} + X_{HS4} + X_{HS5,6} = 0.22 + 0.41 + 0.29 + 0.19 + 0.03
For the Hba complex of enhancers (Hay et al. 2016. Nature Genetics 48: 898):
T_{Hba} = X_{R1} + X_{R2} + X_{R3} + X_{Rm} + X_{R4} = 0.3 + 0.5 + 0.1 + 0.05 + 0.2
45
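The two locus models above are plain sums, so they are easy to write down as code. The sketch below uses only the per-CRM values quoted on the slide; the function names are hypothetical, and reading each value as a share of the summed output is an illustrative assumption.

```python
# Per-CRM functional outputs as quoted on the slide.
HBB_OUTPUTS = {"HS1": 0.22, "HS2": 0.41, "HS3": 0.29, "HS4": 0.19, "HS5,6": 0.03}
HBA_OUTPUTS = {"R1": 0.3, "R2": 0.5, "R3": 0.1, "Rm": 0.05, "R4": 0.2}


def locus_output(crm_outputs):
    """Additive locus model: T is the sum of the outputs X of its CRMs."""
    return sum(crm_outputs.values())


def relative_contributions(crm_outputs):
    """Share of the summed output attributed to each CRM (illustrative)."""
    total = locus_output(crm_outputs)
    return {crm: value / total for crm, value in crm_outputs.items()}


for name, outputs in (("Hbb", HBB_OUTPUTS), ("Hba", HBA_OUTPUTS)):
    shares = relative_contributions(outputs)
    top = max(shares, key=shares.get)
    print(name, round(locus_output(outputs), 2), top, round(shares[top], 2))
```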
Models for cis-regulatory modules (CRMs)
CRM model: Quantitative estimates of the contribution of epigenomic features, sequence, conservation, etc. to the functional output X_{i,j} from each of the CRMs
X_{HS2, Hbb-b1} = 0.41 = combination of f(chromatin state), f(TF occupancy), …
X_{HS1, Hbb-b1} = 0.22
[Figure: browser view, Mouse Dec. 2011 (GRCm38/mm10) chr7:103,858,283-103,862,310 (4,028 bp), covering HS1 and HS2. Tracks: 846:Ery ATAC, 854:Ery ATAC; IDEAS LSK bm, CMP bm, GMP bm, MEP bm, CFU-E ad, Ery ad, Meg h, GRA bm, Mono bm, B spl; TF occupancy: 4_Nfe2_HPC7, G1E Gata2, G1E Tal1, G1E Smad1, HPC Tal1, HPC Ldb1, G1E-ER4 Gata1, G1E-ER4 Tal1, G1ER Smad1, Ter119+ Gata1, Ery Tal1, Ter119+ Nfe2]
46
Global application of models
• Once you have a CRM model, you can apply it globally
• It is an equation using variables for which you have
measurements genome-wide
– H3K27ac, GATA1 occupancy, TAL1 occupancy, motifs, etc.
• So you can predict X_{i,j} for all candidate CRMs
• We learned it from a few CRMs in a few loci, and of course it
should work there. But what about other loci?
• Test these predictions! Genome editing in additional,
reference loci
47
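Applying a CRM model globally amounts to evaluating one equation on every candidate CRM's feature values. The sketch below assumes a simple weighted sum, which is only one possible form of the "combination" the talk refers to. The weights, feature encodings, and candidate names are all hypothetical placeholders, not fitted values.

```python
# Hypothetical weights for illustration only; a real model would be
# learned from CRMs whose output has been measured experimentally.
WEIGHTS = {"H3K27ac": 0.2, "GATA1": 0.3, "TAL1": 0.25, "motif": 0.05}
INTERCEPT = 0.0


def predict_output(features, weights=WEIGHTS, intercept=INTERCEPT):
    """Predicted functional output X for one candidate CRM (linear sketch)."""
    return intercept + sum(w * features.get(name, 0.0)
                           for name, w in weights.items())


# Made-up candidates; 1.0 means the feature is present, 0.0 absent.
candidates = {
    "crm_a": {"H3K27ac": 1.0, "GATA1": 1.0, "TAL1": 1.0, "motif": 1.0},
    "crm_b": {"H3K27ac": 1.0, "GATA1": 0.0, "TAL1": 0.0, "motif": 1.0},
    "crm_c": {},
}

predictions = {name: predict_output(f) for name, f in candidates.items()}
ranked = sorted(predictions, key=predictions.get, reverse=True)
print(ranked)
```

The ranked list is what would be handed to genome-editing tests in reference loci; the predictions themselves prove nothing until tested.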
EPIGENOME MAPS PROVIDE A
GUIDE TO NONCODING VARIANTS
ASSOCIATED WITH PHENOTYPE
48
Variants affecting gene regulation play a prominent role
in complex traits
• The majority of genomic variants associated with complex traits are not in
protein-coding exons
– Hindorff et al (2009) PNAS 106:9362.
• Phenotype-associated, noncoding variants are highly enriched in DNA with
epigenetic signatures of regulatory regions.
Maurano et al. (2012) Science 337: 1190
Schaub et al. (2012) Genome Research
ENCODE Consortium (2012) Integrated Encyclopedia … Nature
49
From GWAS results to allele-specific regulation
CRM = cis-regulatory module, e.g. enhancer
Hardison (2012) JBC 287:30932. Minireview on Epigenetic data as guide to interpret GWAS
50
A cluster of SNPs associated with inflammatory
diseases is close to sites occupied by GATA factors
ENCODE Consortium (2012) Integrated Encyclopedia … Nature
51
Strategy for linking regulatory variation to phenotype
• Locus with phenotype-associated variants
• Identify candidate CRMs from epigenomic data
• Find common and rare variants in CRMs in cohorts of patients
• Predict those likely to affect regulation
• Test for allele-specific effects
[Figure: browser view, Human Feb. 2009 (GRCh37/hg19) chr6:135,370,001-135,520,000 (150,000 bp). Genes: HBS1L, MYB. Tracks: GWAS Catalog; CandCRM_HiHbF; DNase FL ERY (GSM1339559_UwStam_FLER-DS13348.perBase.36.hg19, GSM1339560_UwStam_FLER-DS13290.perBase.36.hg19); DNase Multipot prog (ENCFF001CQQ); PBDE GAT1 UCD; K562 GATA1 IgM; K562 TAL1 IgM; K562 p300 IgR; K562 CTCF IgR; K562 Rad2 Std; K562 SMC3 IgR; Candidate enhancers; Candidate loop bases]
52
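The step of finding variants that fall inside candidate CRMs is an interval lookup. The sketch below assumes non-overlapping CRM intervals on a single chromosome with 1-based inclusive coordinates. The coordinates, CRM names, and variant identifiers are made up for illustration and do not come from the tracks on this slide.

```python
from bisect import bisect_right


def variants_in_crms(crms, variants):
    """Return (variant_id, crm_name) for each variant inside a CRM.

    crms: list of (start, end, name), non-overlapping, 1-based inclusive.
    variants: list of (position, variant_id) on the same chromosome.
    """
    crms = sorted(crms)
    starts = [start for start, _end, _name in crms]
    hits = []
    for position, variant_id in variants:
        # Index of the last CRM whose start is <= position, if any.
        i = bisect_right(starts, position) - 1
        if i >= 0 and crms[i][0] <= position <= crms[i][1]:
            hits.append((variant_id, crms[i][2]))
    return hits


# Hypothetical intervals and variants within the displayed chr6 window.
candidate_crms = [
    (135_418_000, 135_419_500, "cand_crm_1"),
    (135_431_000, 135_432_200, "cand_crm_2"),
]
cohort_variants = [
    (135_418_700, "var_a"),
    (135_425_000, "var_b"),
    (135_432_200, "var_c"),
]

print(variants_in_crms(candidate_crms, cohort_variants))
```

Variants returned here are only candidates; predicting which ones affect regulation and testing for allele-specific effects are the later steps in the strategy.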
Differentiation and diseases of blood cells
• Lineage specific binding of key transcription factors drives
expression patterns that determine cell type
• Maps of transcription factor occupancy inform models of
regulation
• Cell specific phenotypes arise from lineage-specific binding of
transcription factors at distinct sites
• ValIdated Systematic IntegratiON: A VISION for epigenomics in
hematopoietic gene regulation
– Measure distances between cell types by quantitative comparisons of chromatin
accessibility landscapes and transcriptomes
– Integrative analysis of epigenomics can improve prediction of enhancers
– Formal modeling to understand regulation of a locus and regulatory output of
each cis-regulatory module
• Use this information to increase accuracy of search for genetic
variants in regulatory regions to explain phenotypes
53
Thanks to the VISION team
Cheryl Keller
Yu Zhang
Gerd Blobel
James Taylor
Berthold Gottgens
Amber Miller
Feng Yue
Mitch Weiss
David Bodine
Doug Higgs
Belinda Giardine
Jim Hughes
Hardison Lab
http://www.bx.psu.edu/~giardine/vision/
54
Deliverables from VISION
• Comprehensive catalogs of cis-regulatory modules utilized
during hematopoiesis
– Built by integration of multiple data types
– Validated by extensive experimental tests
• Quantitative models for gene regulation
– Built by machine learning
– Extensively tested by genome editing approaches in ten reference loci
– Predictions applied genome-wide.
• A guide for investigators to translate insights from mouse
models to human clinical studies.
55