Gene-specific Analysis

Andrea Baccarelli, MD, PhD, MPH
Laboratory of Environmental Epigenetics
Harvard School of Public Health
Lecture 4
Technologies for DNA
Methylation Analyses
Papers on Epigenetics by Year
Burris and Baccarelli, J Appl Tox 2014
Methods for DNA Methylation Analysis:
Historical Overview
1942: Conrad Waddington coins the term epigenetics
1948: DNA methylation is first described in bacteria
1949: Barr & Bertram describe the condensed X chromosome, eventually found to be highly methylated
1953: Watson & Crick describe the DNA double helix
1962: Huang & Bonner report that histones suppress RNA synthesis
1964: Allfrey et al. link histone acetylation and methylation to RNA synthesis control
1964: DNA methyltransferase is identified in E. coli
1975: Riggs and Holliday & Pugh independently suggest that 5mC is duplicated during cell division and controls gene expression
1978: Bird & Southern use restriction enzymes to detect DNA methylation
1985: CpG islands are first described
1992: Wolffe and Bird laboratories describe methylated-DNA binding domains
2009: Irizarry et al. describe CpG shores
2009: 5-hydroxymethylcytosine is described in human tissues
1989: Restriction endonucleases
1992: Bisulfite sequencing
2002: Methylation analysis by ligation-mediated PCR
Qualitative analyses
Adapted from Baccarelli et al., Circulation Genetics 2010
1996: Methylation-specific PCR
Pyrosequencing
2003: MALDI-TOF
2006: Low-density arrays (1536 CpGs)
2007: Microarrays
2007: Next Generation Sequencing
Quantitative analyses
2015
486K CpGs
2.1M features
5M CpG sites
Genome wide
Lecture 4 Outline
• Gene-specific analysis
• Global methylation content
• Genome-wide methods
From single gene to the epigenome
DNA methylation & histone modification analysis
• Molecular biology: largely related to genetics
– methods that analyze the DNA sequence
• Epigenetic marks: do not modify the
underlying DNA sequence
• Workarounds:
– Bisulfite treatment
– Antibody-based methods
(or alternatively methyl-binding proteins)
• Bisulfite treatment (DNA methylation)
– Workaround: modifies non-methylated cytosines, allowing differentiation of methylated and non-methylated cytosines
– Analysis: any method that can analyze sequence
• Antibodies (DNA methylation, histone modifications)
– Workaround: bind modified or methylated cytosines or modified histones, yielding DNA enriched with the mark of interest (depends on Ab specificity)
– Analysis: any method that can quantify enrichment
Some lab nomenclature
• DNA Methylation
• Gene-specific analysis
– How much methylation at or nearby a candidate gene
• Global methylation content
– How much methylation in a test DNA, regardless of the
position
• Genome-wide scans
– Microarrays, Next Generation Sequencing
• Histone Modifications
• Gene-specific analysis
• Global modification content
• Genome-wide scans
Gene-specific analysis
Measuring methylation in candidate genes
Bisulfite modification of DNA
• Prior to PCR,
DNA is treated with
sodium bisulfite
• Non-methylated C is
permanently modified
to U
• In PCR, U and T are
equivalent
5mC is protected from conversion
                    Genomic DNA      Bisulfite modified DNA
Methylated DNA      5’-CTACGG-3’     5’-TTACGG-3’
                    3’-GATGCC-5’     3’-GATGCT-5’
Unmethylated DNA    5’-CTACGG-3’     5’-TTATGG-3’
                    3’-GATGCC-5’     3’-GATGTT-5’
Any methods that read the resulting DNA sequence can now
discriminate Methylated (M) vs. Unmethylated (U)
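A minimal Python sketch of the example sequences above, under simplifying assumptions that are not in the slides: conversion is treated as 100% efficient, each strand is written 5’ to 3’, and the function name `bisulfite_convert` and its `methylated` argument are hypothetical. After PCR, every unprotected C reads as T.

```python
def bisulfite_convert(strand, methylated=frozenset()):
    """Return a strand (written 5'->3') as it reads after bisulfite treatment and PCR.

    `methylated` holds the 0-based indices of 5mC positions, which are protected.
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(strand)
    )


top = "CTACGG"     # 5'-CTACGG-3'
bottom = "CCGTAG"  # the lower strand 3'-GATGCC-5', rewritten 5'->3'

top_m = bisulfite_convert(top, {3})        # CpG cytosine at index 3 is methylated
top_u = bisulfite_convert(top)             # no methylation
bottom_m = bisulfite_convert(bottom, {1})  # CpG cytosine at index 1 is methylated
bottom_u = bisulfite_convert(bottom)

print("methylated:  ", top_m, bottom_m[::-1])
print("unmethylated:", top_u, bottom_u[::-1])
```

Reversing the lower strand for printing puts it back in the 3’ to 5’ orientation used on the slide.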
Gene-specific Analysis
• You want to investigate whether a
specific sequence/gene is
hypermethylated or hypomethylated:
– In a specific disease
– Following an exposure to an
environmental toxicant
– Etc.
Gene-specific Analysis:
assay set up
– Get information about the gene/sequence
(e.g., promoter)
www.genome.ucsc.edu
– CpG islands are between 200 and 1000 bp in
length
– CpG shores extend up to 2 kb from a
CpG island
– Methylation is typically not evenly
distributed
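When inspecting a candidate sequence, a quick check of its CpG density can help. The sketch below is only an illustration: the thresholds (length of at least 200 bp, GC content above 50%, observed/expected CpG ratio above 0.6) are commonly cited island criteria and are an assumption here, not something stated in the slides. The function names are hypothetical.

```python
def cpg_stats(seq):
    """Return (length, GC fraction, observed/expected CpG ratio) for a sequence."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")
    gc_fraction = (c + g) / n if n else 0.0
    # Observed/expected CpG = (#CpG * length) / (#C * #G)
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return n, gc_fraction, obs_exp


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply the assumed thresholds to a single candidate sequence."""
    n, gc_fraction, obs_exp = cpg_stats(seq)
    return n >= min_len and gc_fraction > min_gc and obs_exp > min_obs_exp


cpg_rich = "CG" * 100  # hypothetical 200 bp CpG-dense sequence
at_rich = "AT" * 100   # hypothetical 200 bp sequence without any CpG

print(cpg_stats(cpg_rich))
print(looks_like_cpg_island(cpg_rich), looks_like_cpg_island(at_rich))
```

In practice the annotated CpG island track of a genome browser is the better source; this only shows what the numbers mean.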
Gene-specific Analysis:
Methods
• Methylation-specific PCR (MSP)
• Quantitative Methylation Specific PCR
– Methylight (not discussed here)
• Bisulfite sequencing
– Pyrosequencing
• MALDI-TOF
– Sequenom
Karin Michels, Epigenetic Epidemiology, p. 41
Methylation-Specific PCR (MSP)
• Based on bisulfite modification
• A pair of PCR assays
– Methylation-dependent primer design
– M & U assays, run side-by-side on the same
sample
• Most common methylation analysis technique
in the literature
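The logic of the paired M and U assays can be sketched in Python. This is a simplified illustration under stated assumptions: the region is hypothetical, all CpGs in a template are either methylated or unmethylated, conversion is complete, and a "primer" is just the first bases of the converted top strand (real primers are longer and must anneal to the complementary strand).

```python
import re


def converted_template(seq, cpg_methylated):
    """Top strand after bisulfite conversion and PCR."""
    seq = seq.upper()
    if cpg_methylated:
        # Only C followed by G (CpG) is protected; every other C reads as T
        return re.sub(r"C(?!G)", "T", seq)
    return seq.replace("C", "T")


def msp_result(template, m_primer, u_primer):
    """Report which of the two assays would find a perfectly matching template."""
    return {"M": template.startswith(m_primer), "U": template.startswith(u_primer)}


region = "GACGTCCGAACGTTAG"  # hypothetical genomic sequence
m_template = converted_template(region, cpg_methylated=True)
u_template = converted_template(region, cpg_methylated=False)

# Methylation-dependent primer design: one primer per expected converted sequence
m_primer, u_primer = m_template[:10], u_template[:10]

print(msp_result(m_template, m_primer, u_primer))
print(msp_result(u_template, m_primer, u_primer))
```

Because each primer spans several CpGs at once, the result is a call for the whole primer site, not for an individual cytosine.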
MSP
Methylation-Specific PCR (MSP)
• Primers usually contain multiple CpG and non-CpG cytosines
• As a result, MSP does not allow for the measurement of
methylation at individual CpG cytosines
• Qualitative Results
– A gene/sequence is called as either methylated or
unmethylated
– The cutoff of methylation for the call is undetermined
– Semiquantitative analysis of band density can be
performed
MSP assay for p16
• Unmethylated tissues
U – Primer for unmethylated sequence (post bisulfite treatment)
M – Primer for methylated sequence (post bisulfite treatment)
W – Primer for untreated DNA (“Wild type”). This is used as a control
for complete bisulfite conversion
Herman, et al. PNAS 1996
MSP assay for p16
• Methylated tissues
Karin Michels, Epigenetic Epidemiology, p. 41
Bisulfite sequencing
• Bisulfite modification + Genomic sequencing
• Gold standard in DNA methylation analysis
• Different methods
– Sequencing (Qualitative or semiquantitative)
– Pyrosequencing (Highly quantitative)
Pyrosequencing
Pyrosequencing Analysis
• Step 1: Bisulfite modification
• Step 2: Unbiased PCR amplification
– Primers include non-CpG cytosines but avoid CpG sites
• Step 3: Pyrosequencing of PCR products
– quantitation of average sequence for the
population of molecules in the test DNA
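The quantitation in Step 3 can be illustrated with a short Python sketch. It assumes that, at each CpG position, methylation is estimated as the C signal divided by the sum of the C and T signals; the peak heights below are hypothetical and the function name is made up for this example.

```python
def percent_methylation(c_peak, t_peak):
    """Percent methylation at one CpG from the C and T peak heights."""
    total = c_peak + t_peak
    if total <= 0:
        raise ValueError("no signal at this position")
    return 100.0 * c_peak / total


# Hypothetical (C, T) peak heights at three CpG positions of one amplicon
peaks = [(300.0, 100.0), (250.0, 250.0), (90.0, 210.0)]

per_site = [percent_methylation(c, t) for c, t in peaks]
mean_methylation = sum(per_site) / len(per_site)

print(per_site, round(mean_methylation, 1))
```

Each value is an average over all molecules in the PCR product, which is why the result is a percentage and not a yes/no call.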
Pyrosequencing
Example of a Pyrosequencing Output (Pyrogram)
B10: TAC/TGAAAATATAAC/TGAAGC/TGTGTATGATC/TGAAATTAGAAGAGATTAAAGTAAAAT
Methylation values shown at the four C/T positions: 59%, 62%, 66%, 71%
Assay Name: MK3
Sample ID: Example1
[Pyrogram: y-axis from 0 to 2000; x-axis dispensation order E, S, ATGAT CGAAT ATGAT CGATG TCGTG TATGT ATCGA T]
Tost et al. Biotechniques 2003
Karin Michels, Epigenetic Epidemiology, p. 41
MALDI-TOF Sequenom Mass-Array
MALDI-TOF MS
• Matrix-assisted laser desorption/ionization
time-of-flight mass spectrometry
• Developed by Sequenom, Inc
(MassArray Quantitative Methylation Analysis)
• Steps:
– Bisulfite treatment
– Cleavage at specific bases
• T reaction + C reaction
– MALDI-TOF detection
MALDI-TOF
The ions are accelerated by a strong electric field and enter a vacuum 'flight
tube', where they separate in time before they hit the detector. An analyser
measures the time-of-flight (TOF) each ion takes to reach the detector.
Because the TOF of an ion depends on its mass-to-charge ratio (m/z), mass spectra
can be generated from simple time measurements.
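The relation between flight time and m/z can be sketched with the idealised linear TOF equation t = L * sqrt(m / (2 z e U)). The tube length (1 m) and accelerating voltage (20 kV) below are hypothetical values chosen for illustration, and the model ignores the acceleration region and initial ion velocities.

```python
import math

DALTON_KG = 1.66053906660e-27          # one dalton in kilograms
ELEMENTARY_CHARGE_C = 1.602176634e-19  # elementary charge in coulombs


def flight_time(mz, tube_length_m=1.0, voltage_v=20_000.0):
    """Idealised drift time in seconds for an ion with the given m/z (Da per charge)."""
    return tube_length_m * math.sqrt(mz * DALTON_KG / (2 * ELEMENTARY_CHARGE_C * voltage_v))


def mz_from_time(t, tube_length_m=1.0, voltage_v=20_000.0):
    """Invert the relation: recover m/z from a measured flight time."""
    return 2 * ELEMENTARY_CHARGE_C * voltage_v * (t / tube_length_m) ** 2 / DALTON_KG


for mz in (1000.0, 4000.0):
    print(mz, f"{flight_time(mz) * 1e6:.2f} microseconds")
```

Since the time scales with the square root of m/z, an ion four times heavier takes twice as long to arrive.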
MALDI-TOF
• Allows for analysis of 80/300 bp in one single run
• High start-up costs, high reagent costs
• 384-well plates, good for very large studies
MSP: Methylation>25% generated positive results
Ammerpohl O et al. Biochim Biophys Acta. 2009 Sep;1790(9):847-62.
Pros/Cons
• Methylation Specific PCR
– Pro: inexpensive and easy to perform
– Cons: qualitative/semiquantitative, no single site resolution
• Real-time PCR
– Pro: equipment easily accessible
– Cons: low precision, no single site resolution
• Pyrosequencing
– Pro: Highly quantitative, single site resolution
– Cons: dedicated equipment, problems with high density CpGs
• MALDI-TOF (Sequenom Mass Array)
– Pro: Quantitative, single site resolution, extended sequence
(amplicon)
– Cons: dedicated equipment (high costs), high costs/gene,
sequence fragmentation may exclude some CpG sites
Global methylation content
How much methylation in your tissue?
• Global methylation
content refers to the
amount of DNA
methylation in a
tissue/DNA sample
• Regardless of the location
of methylation
• No information on
individual CpG
sites/genes/loci
Lower Global Methylation
• Tissue DNA
– Cancer (Feinberg & Vogelstein, 1982)
– Atherosclerotic lesions (Hiltunen, 2002)
• Blood DNA
– Cancer (Hsiung, 2007)
– Cardiovascular Disease (Castro, 2003)
– Folate deficiency (Choi, 2005)
– Inflammatory states (Stenvinkel, 2007)
– Aging (Fuke, 2004)
– Environmental Exposures (Chanda 2005, Bollati 2007, Rusiecki 2008, Baccarelli 2009)
Figure 7.13 Genomes 3 (© Garland Science 2007)
Global Methylation
• Most methylation in repeated elements:
– LINE-1 elements: >500K/haploid genome
– Alu elements: >1,100K/haploid genome
• LINE-1/Alu methylation is correlated with global
content (Weisenberger, 2005)
• Function of repeated elements?:
– Chromosomal structure
– Repeat sequence transcription
– miRNA
How to confuse one reader’s mind
• Global DNA methylation
• Genomic methylation content
• Genome-wide methylation content
• Genome-wide methylation
• Global cell methylation
Analysis of Global Methylation Content
• Direct measurement of methyl group content
– High-Performance liquid chromatography
(Ehrlich M, et al., Nucl Ac Res 1982)
– GC/MS (Rossella F et al., Rapid Comm MS 2009)
– Immunostaining with anti-5mC
– Digestion with methylation sensitive enzymes (MspI, HpaII)
• Estimated in repeated elements
– Based on Pyrosequencing
(Yang et al. Nucl Ac Res 2004)
– Based on Methylight
(Weisenberger et al. Nucl Ac Res 2005)
HPLC for Global Methylation Contents Analysis
• Highly quantitative and reproducible
• Requires larger amounts of DNA
• Expensive
• Not suitable for high throughput analysis
Retrotransposons
Retrotransposon
• Transposition: Movement of
gene from one chromosome
to another or movement
from one site to another;
does not require homology
• Transposons: mobile
genetic elements that
enable genes to move
between non-similar sites
• Retrotransposition: Creates
genetic diversity
• Retrotransposons: Replicate
and move to other sites on
DNA through an RNA
intermediate
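[Figure: RNA polymerase transcribes the original retrotransposon into retrotransposon mRNA; reverse transcriptase copies it into retrotransposon DNA; an integration protein inserts the copy, leaving the original retrotransposon plus a duplicated retrotransposon]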
Estimating Global Methylation Content in Repeated Elements
Alu methylation
Yang et al., Nucl Ac Res 2004
What do we measure in repeated elements
• A few CpG sites in a specific repeated element
sequence
• The sequence (and the CpG sites therein) is
repeated throughout one single haploid genome:
– LINE-1 elements: >500K/haploid genome
– Alu elements: >1,100K/haploid genome
• The LINE-1 and Alu assays measure the %mC in those
repeated elements, with no distinction about their
position in the genome
Weisenberger, et al.
Nucl Ac Res 2005
Advantages of repeated element analysis for global
methylation
• Advantages
– Requires lower amounts of DNA
– Higher throughput
– Less expensive
– Quantitative
– Highly quantitative with Pyrosequencing
• Disadvantages
– Correlation with global content demonstrated only in
studies including cancer tissues
– A marker of global methylation only for cancer tissues
Genome-wide analysis
Unbiased screens of the epigenome
Bisulfite conversion-based
Microarray Analysis
A DNA microarray is a technology that consists of an arrayed series of
thousands of microscopic spots of DNA oligonucleotides, called probes, that
are used to hybridize a target sequence, under high-stringency conditions.
Probe-target hybridization is usually detected and quantified by detection of
fluorophore-, or chemiluminescence-labeled targets.
Fluorescently labelled target
sequences that bind to a probe
sequence generate a signal that
depends on:
• strength of the hybridization
(determined by the number of paired
bases)
• hybridization conditions (such as
temperature)
• washing after hybridization.
Infinium methylation assay
Unmethylated cytosines are chemically deaminated to uracil
in the presence of bisulfite, while methylated cytosines are
refractory to the effects of bisulfite and remain cytosine.
After bisulfite conversion, each sample is whole-genome
amplified (WGA) and enzymatically fragmented.
The bisulfite-converted WGA-DNA samples are purified and
applied to the BeadChips.
Features covered in the 450K Infinium BeadChip
The 450K BeadChip covers a total of 77,537 CpG Islands and CpG Shores (N+S)
The 450K BeadChip covers a total of 20,617 genes

Region Type          Regions   CpG sites covered on 450K BeadChip array   Average # of CpG sites per region
CpG Island           26,153    139,265                                    5.08
N Shore              25,770    73,508                                     2.74
S Shore              25,614    71,119                                     2.66
N Shelf              23,896    49,093                                     1.97
S Shelf              23,968    48,524                                     1.94
Remote/Unassigned    -         104,926                                    -
Total                          485,553

[Figure: gene region diagram labeled N Shelf, N Shore, CpG Island, S Shore, S Shelf, TSS1500, TSS200, 5’ UTR, 3’ UTR]
Illumina Infinium 450K
peculiar kind of data
[Figure labels: M = -∞, M = +∞]
• Handling β-values
– Linear regression and hope for the best
– Robust regression (Joubert BR et al EHP 2012)
– Beta-regression (Seow WJ, PLOSone 2012)
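Why β-values are a peculiar kind of data can be seen in a small Python sketch. It assumes the usual definitions, which are not spelled out in the slides: β = M / (M + U + offset) from the methylated and unmethylated probe intensities (the offset of 100 is an assumed default), and the M-value as the logit of β in base 2. The function names are hypothetical.

```python
import math


def beta_value(meth_intensity, unmeth_intensity, offset=100.0):
    """Proportion-like methylation measure, bounded between 0 and 1."""
    return meth_intensity / (meth_intensity + unmeth_intensity + offset)


def m_value(beta):
    """Base-2 logit of beta; unbounded, so the extremes map to -inf and +inf."""
    if beta <= 0.0:
        return -math.inf
    if beta >= 1.0:
        return math.inf
    return math.log2(beta / (1.0 - beta))


for beta in (0.0, 0.2, 0.5, 0.8, 1.0):
    print(beta, m_value(beta))
```

The bounded scale of β is one reason the slide lists robust regression and beta-regression as alternatives to plain linear regression.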
Next Generation (Deep) sequencing
• Higher coverage: mapping DNA methylation at
single-base resolution provides a quantitative
measure of methylation abundance, rather than
the relative measure obtained from array-based methods.
• Lower cost than the other
commonly used methods
for whole-genome approaches
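How sequencing yields a quantitative measure at single-base resolution can be sketched as read counting. The example below is a toy illustration under stated assumptions: reads are already aligned, all come from the top strand, and the reference and reads are hypothetical. At each reference CpG, the methylation level is the fraction of reads that still show a C.

```python
def cpg_methylation(reference, aligned_reads, min_coverage=1):
    """Return {CpG position: fraction of C among C+T reads} for covered CpG sites."""
    cpg_positions = [i for i in range(len(reference) - 1) if reference[i:i + 2] == "CG"]
    counts = {pos: [0, 0] for pos in cpg_positions}  # [C reads, C + T reads]
    for start, read in aligned_reads:
        for offset, base in enumerate(read):
            pos = start + offset
            if pos in counts and base in ("C", "T"):
                counts[pos][1] += 1
                if base == "C":
                    counts[pos][0] += 1
    return {
        pos: c / total
        for pos, (c, total) in counts.items()
        if total >= max(min_coverage, 1)
    }


reference = "ACGTTCGA"  # hypothetical reference with CpGs at positions 1 and 5
reads = [(0, "ACGTTTGA"), (0, "ACGTTTGA"), (0, "ATGTTCGA"), (0, "ACGTTCGA")]

levels = cpg_methylation(reference, reads)
print(levels)
```

The `min_coverage` argument reflects that an estimate based on very few reads is unreliable; the threshold to use is a study-specific choice.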
(m)RRBS: (multiplexed) Reduced Representation
Bisulfite Sequencing
• Utilizes cutting pattern of MspI (C^CGG) to
systematically digest CpG-poor DNA
• Covers the majority of CpG islands and promoters,
and a reasonable number of exons, shores and
enhancers
• Advantages:
– Only need 50-200ng DNA
– can be from any species
– Cost and time
(reduced sequencing, 96-well format, multiplexing)
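The MspI step can be illustrated with an in-silico digest. This sketch assumes only what the slide states about the cutting pattern (C^CGG, so the cut falls after the first C); the size window and the toy sequence are hypothetical, and the function names are made up for the example.

```python
import re


def mspi_digest(seq):
    """Cut a sequence at every C^CGG site and return the fragments in order."""
    seq = seq.upper()
    cuts = [m.start() + 1 for m in re.finditer(r"(?=CCGG)", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


def size_select(fragments, min_len, max_len):
    """Keep fragments within a length window, mimicking a size-selection step."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


toy = "AAACCGGTTTCCGGAAA"  # hypothetical sequence with two MspI sites
fragments = mspi_digest(toy)
selected = size_select(fragments, min_len=5, max_len=10)

print(fragments)
print(selected)
```

Every fragment produced between two MspI sites starts with CGG and ends with C, so each sequenced fragment carries a CpG at its ends regardless of methylation status.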
Sequencing Capture Epi CpGiant (Roche, Inc)
Materials from Roche, Inc
Microarray vs. NGS
MICROARRAY
• Requires a priori knowledge of the genome or genomic features
• Major obstacle in microarray analysis is cross-hybridization between similar sequences
• High signal to noise ratios and the fact that competitive hybridization on microarrays is a relative measure
• Micrograms of DNA are needed to hybridize to arrays
Next Gen SEQUENCING
• Knowledge of genome annotation is helpful, but not required
• Directly sequencing removes experimental bias and cross-hybridization issues from the analysis
• Quantification is based on counting sequence tags rather than relative measures between samples
• Nanograms of material are sufficient
Next lecture
Tissue specificity & Study Design in Epigenetics
In-class Review of Selected Readings
1. Simpkin, Andrew J., et al. "Longitudinal analysis of DNA methylation associated with
birth weight and gestational age." Human molecular genetics (2015): ddv119.
2. Sharp, Gemma C., et al. "Maternal pre-pregnancy BMI and gestational weight gain,
offspring DNA methylation and later offspring adiposity: Findings from the Avon
Longitudinal Study of Parents and Children." International journal of epidemiology
(2015): dyv042.