Genome Biology and
Biotechnology
9. The localizome
Prof. M. Zabeau
Department of Plant Systems Biology
Flanders Interuniversity Institute for Biotechnology (VIB)
University of Gent
International course 2005
Summary
¤ DNA localizome or DNA interactome
– Genome-wide mapping of DNA binding proteins
• Transcription factor binding sites
• Localization of replication origins
¤ Protein localizome
– High throughput localization of proteins in cellular
compartments
Functional Maps or “-omes”
Figure: matrix of genes or proteins (1, 2, 3, 4, 5 ... n) against “conditions”, one layer per map
– ORFeome: genes
– Phenome: mutational phenotypes
– Transcriptome: expression profiles
– DNA interactome: protein-DNA interactions
– Localizome: cellular, tissue location
– Interactome: protein interactions
– Proteome: proteins
After: Vidal M., Cell, 104, 333 (2001)
Genome-wide Analysis of Regulatory Sequences
¤ Gene expression is regulated by transcription factors
selectively binding to regulatory regions
– protein–DNA interactions involve sequence-specific recognition
– Other factors, such as chromatin structure may be involved
¤ Sequence-specific DNA-binding proteins from
eukaryotes generally
– recognize degenerate motifs of 5–10 base pairs
– Consequently, potential recognition sequences for transcription
factors occur frequently throughout the genome
¤ Genome-wide surveys of in vivo DNA binding proteins
– provide a platform to determine which of these potential sites are
actually bound in vivo
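A rough estimate shows why short recognition sequences occur so frequently. The sketch below computes the expected number of chance matches to one fixed sequence of length k, assuming independent and equally frequent bases. The function name, the round 12 Mb genome size and the uniform-base model are illustrative assumptions, not values from the lecture. Degenerate motifs, which tolerate several bases at some positions, match more often than this estimate.

```python
def expected_random_matches(motif_length: int, genome_size_bp: int,
                            both_strands: bool = True) -> float:
    """Expected number of exact matches to one fixed k-mer in random sequence.

    Assumes every base is independent and has probability 1/4 (a simplification).
    """
    p_match = 0.25 ** motif_length                 # chance that one position matches
    strands = 2 if both_strands else 1             # a site can lie on either strand
    positions = genome_size_bp - motif_length + 1  # possible start positions
    return positions * strands * p_match


# Assumed round figure for the size of the yeast genome (about 12 Mb).
YEAST_GENOME_BP = 12_000_000

for k in (5, 6, 8, 10):
    print(k, round(expected_random_matches(k, YEAST_GENOME_BP), 1))
```

Counting both strands double-counts palindromic sequences, so the two-strand figure is an upper bound for such motifs.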
Genome-wide Analysis of Regulatory Sequences
¤ Methods combine
– Large-scale analysis of in vivo
protein–DNA crosslinking
– microarray technology
¤ ChIP-on-chip
– Chromatin ImmunoPrecipitation on DNA chips
Reprinted from: Biggin M., Nature Genet. 28, 303 (2001)
Genome-Wide Location and Function of DNA Binding
Proteins
Ren et al., Science, 290, 2306 (2000)
¤ Paper presents
– proof of principle for microarray-based approaches to determine
the genome-wide location of DNA-bound proteins
• Study of the binding sites of two well-known gene-specific
transcription activators in yeast: Gal4 and Ste12
– Combines data from
• in vivo DNA binding analysis with
• expression analysis
• to identify genes whose expression is directly controlled by these
transcription factors
Chromatin Immunoprecipitation (ChIP) Procedure
– Cells are fixed with formaldehyde, harvested, and sonicated
– DNA fragments cross-linked to a protein of interest are enriched by
immunoprecipitation with a specific antibody
– Immuno-precipitated DNA is amplified and labeled with the fluorescent
dye Cy5
– Control DNA not enriched by immunoprecipitation is amplified and
labeled with the different fluorophore Cy3
– DNAs are mixed and hybridized to a microarray of intergenic sequences
– The relative binding of the protein of interest to each sequence is
calculated from the IP-enriched/unenriched ratio of fluorescence from
3 experiments
Reprinted from: Ren et al., Science, 290, 2306 (2000)
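The last step of the procedure, turning two-colour fluorescence into a relative binding value per intergenic spot, can be sketched as follows. This is a simplified illustration, not the error model used in the paper: it scales each array so that the median Cy5/Cy3 ratio is 1 and then takes a plain mean over replicate arrays. All function names and intensity values are hypothetical.

```python
from statistics import median


def binding_ratios(cy5_ip, cy3_control):
    """Per-spot IP-enriched/unenriched ratio, scaled so the median ratio is 1.

    Assumes all control intensities are non-zero.
    """
    raw = [ip / ctrl for ip, ctrl in zip(cy5_ip, cy3_control)]
    scale = median(raw)  # most spots are assumed to be unbound
    return [r / scale for r in raw]


def average_over_replicates(replicates):
    """Mean normalized ratio of each spot across replicate arrays."""
    n_spots = len(replicates[0])
    return [sum(rep[i] for rep in replicates) / len(replicates)
            for i in range(n_spots)]


# Hypothetical intensities for four spots on three replicate arrays.
arrays = [
    ([900, 1100, 8000, 1000], [1000, 1000, 1000, 1000]),
    ([950, 1000, 7000, 1050], [1000, 1000, 1000, 1000]),
    ([1000, 1050, 9000, 980], [1000, 1000, 1000, 1000]),
]
normalized = [binding_ratios(ip, ctrl) for ip, ctrl in arrays]
print(average_over_replicates(normalized))
```

In this made-up data set the third spot stands out with a ratio well above 1, which is the signature of a ChIP-enriched fragment.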
Modified Chromatin Immunoprecipitation (ChIP) Procedure
Close-up of a scanned image of a microarray containing 6361 intergenic
region DNA fragments of the yeast genome
ChIP-enriched
DNA fragment
Reprinted from: Ren et al., Science, 290, 2306 (2000)
Proof of concept: Gal4 transcription factor
¤ Identification of sites bound by the transcriptional
activator Gal4 in the yeast genome and genes induced
by galactose
– Gal4 activates genes necessary for galactose metabolism
• The best characterized transcription factor in yeast
– 10 genes were bound by Gal4 and induced in galactose
• 7 genes in the Gal pathway, previously reported to be
regulated by Gal4
• 3 novel genes: MTH1, PCL10, and FUR4
Reprinted from: Ren et al., Science, 290, 2306 (2000)
Genome-wide location of Gal4 protein
Genes whose promoter regions are bound by Gal4 and whose expression levels
were induced at least twofold by galactose
Reprinted from: Ren et al., Science, 290, 2306 (2000)
Role of Gal4 in Galactose-dependent Cellular Regulation
The identification of MTH1, PCL10, and FUR4 as Gal4-regulated genes explains
how regulation of several different metabolic pathways can be coordinated
Figure labels: Fur4 (increases intracellular pools of uracil); Pcl10; Mth1 (reduces levels of glucose transporter)
Reprinted from: Ren et al., Science, 290, 2306 (2000)
Conclusions
¤ The genes whose expression is controlled directly by
transcriptional activators in vivo
– Are identified by a combination of genome-wide location and
expression analysis
¤ Genome-wide location analysis provides information
– On the binding sites at which proteins reside in the genome
under in vivo conditions
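The combination of location and expression data described in these conclusions amounts to a set intersection: a gene counts as a direct target only if its promoter region is bound and its expression changes. A minimal sketch follows. The twofold induction cutoff is taken from the Gal4 slide above; the binding P-value cutoff, the function name and every data value are assumptions made for illustration.

```python
def direct_targets(binding_p_values, expression_fold_change,
                   p_cutoff=0.001, fold_cutoff=2.0):
    """Genes that are both bound (location data) and induced (expression data).

    p_cutoff is an assumed significance threshold for binding;
    fold_cutoff=2.0 mirrors "induced at least twofold".
    """
    bound = {g for g, p in binding_p_values.items() if p <= p_cutoff}
    induced = {g for g, fc in expression_fold_change.items() if fc >= fold_cutoff}
    return sorted(bound & induced)


# Made-up values, for illustration only.
binding = {"GAL1": 1e-6, "GAL10": 1e-6, "FUR4": 5e-4, "ACT1": 0.4}
induction = {"GAL1": 50.0, "GAL10": 40.0, "FUR4": 2.5, "ACT1": 1.0, "HSP12": 3.0}

print(direct_targets(binding, induction))
```

A gene such as the hypothetical HSP12 entry, induced but not bound, is excluded because its induction may be indirect.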
Genomic Binding Sites of the Yeast Cell-cycle
Transcription Factors SBF and MBF
Iyer et al., Nature 409: 533 (2001)
¤ Paper presents
– The use of ChIP and DNA microarrays to define the genomic binding sites
of the SBF and MBF transcription factors in vivo
– The SBF and MBF transcription factors are active in the initiation of the
cell division cycle (G1/S) in yeast
• A few target genes of SBF and MBF are known but the precise roles of
these two transcription factors are unknown
• The two transcription factors are heterodimers containing the same
Swi6 subunit and a DNA binding subunit
– MBF is a heterodimer of Mbp1 and Swi6
– SBF is a heterodimer of Swi4 and Swi6
Genomic targets of SBF and MBF
Reprinted from: Iyer et al., Nature 409: 533 (2001)
In Vivo Targets of SBF and MBF
¤ The ChIP experiments identified
– 163 possible targets of SBF
– 87 possible targets of MBF
– 43 possible targets of both factors
¤ Support for the possible in vivo targets
– Most of the genes downstream of the putative binding sites peak
in G1/S
– Target genes are highly enriched for functions related to DNA
replication, budding and the cell cycle
– In vivo binding sites are highly enriched for sequences matching
the defined consensus binding sites
Reprinted from: Iyer et al., Nature 409: 533 (2001)
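Statements such as "highly enriched for sequences matching the consensus binding sites" are usually backed by an enrichment test. One common choice, not necessarily the one used in the paper, is the hypergeometric tail probability: given how many regions in the whole array contain the motif, how likely is it that at least the observed number of bound regions contain it by chance? The sketch below uses only the standard library; the function name and all counts in the usage example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, successes_in_population,
                           sample, successes_in_sample):
    """P(X >= successes_in_sample) when drawing `sample` items without replacement.

    population: all regions on the array
    successes_in_population: regions containing the motif
    sample: regions bound in the ChIP experiment
    successes_in_sample: bound regions containing the motif
    """
    failures = population - successes_in_population
    upper = min(sample, successes_in_population)
    tail = sum(comb(successes_in_population, k) * comb(failures, sample - k)
               for k in range(successes_in_sample, upper + 1))
    return tail / comb(population, sample)


# Hypothetical counts: 6000 regions, 300 with the motif,
# 160 bound regions of which 60 contain the motif.
print(hypergeom_enrichment_p(6000, 300, 160, 60))
```

A very small value indicates that the overlap between bound regions and motif-containing regions is unlikely to arise by chance.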
Expression Profiles of SBF and MBF Targets
Transcriptome data for synchronized cell cultures
Reprinted from: Iyer et al., Nature 409: 533 (2001)
Expression Profiles of SBF and MBF Targets
¤ Why are two different transcription factors used to
mediate identical transcriptional programmes during
the cell-division cycle in yeast?
– A possible answer is suggested by differences in the functions
of the genes that they regulate
• Many of the targets of SBF have roles in cell-wall biogenesis and
budding
• 25% of the MBF target genes have known roles in DNA replication,
recombination and repair
– The results support a model in which
• SBF is the principal controller of membrane and cell-wall formation
• MBF primarily controls DNA replication
¤ The need for DNA replication and membrane / cell-wall
biogenesis may differ between the mitotic and meiotic cell cycles
Reprinted from: Iyer et al., Nature 409: 533 (2001)
A high-resolution map of active promoters in the
human genome
Kim et al., Nature 436: 876-880 (2005)
¤ Paper presents
– a genome-wide map of active promoters in human fibroblast cells
• determined by experimentally locating the sites of RNA polymerase
II preinitiation complex (PIC) binding
• map defines 10,567 active promoters corresponding to
– 6,763 known genes
– >1,196 un-annotated transcriptional units
– Global view of functional relationships in human cells between
• transcriptional machinery
• chromatin structure
• gene expression
Identification of active promoters in the human genome
¤ Microarrays cover
– All non-repeat DNA at 100 bp
resolution
¤ Pol II preinitiation complex
(PIC)
– RNA polymerase II
– transcription factor IID
– general transcription factors
¤ ChIP of PIC-bound DNA
– monoclonal antibody against
TAF1 subunit of the complex
(TBP-associated factor 1)
Reprinted from: Kim et al., Nature 436: 876-880 (2005)
Results from TFIID ChIP-on-chip analysis
Reprinted from: Kim et al., Nature 436: 876-880 (2005)
Characterization of active promoters
¤ Matched the 12,150 TFIID-binding sites to
– the 5' end of known transcripts in transcript databases
– 87% of the PIC-binding sites were within 2.5 kb of annotated 5' ends of
known messenger RNAs
¤ 8,960 promoters were mapped
– within annotated boundaries of 6,763 known genes in the EnsEMBL genes
Reprinted from: Kim et al., Nature 436: 876-880 (2005)
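Matching binding sites to annotated 5' ends is a nearest-neighbour search along the chromosome. The sketch below, for a single chromosome, finds the closest annotated 5' end for each site by binary search and reports the fraction that falls within a window. The 2.5 kb window comes from the slide; the function names and coordinates are hypothetical, and strand is ignored for simplicity.

```python
from bisect import bisect_left


def nearest_tss_distance(site, sorted_tss):
    """Distance in bp from a binding site to the closest annotated 5' end.

    sorted_tss must be a non-empty, ascending list of positions.
    """
    i = bisect_left(sorted_tss, site)
    # Only the neighbours just left and right of the insertion point can be closest.
    candidates = sorted_tss[max(i - 1, 0): i + 1]
    return min(abs(site - t) for t in candidates)


def fraction_near_tss(sites, tss_positions, window_bp=2500):
    """Fraction of binding sites lying within window_bp of an annotated 5' end."""
    sorted_tss = sorted(tss_positions)
    near = sum(1 for s in sites
               if nearest_tss_distance(s, sorted_tss) <= window_bp)
    return near / len(sites)


# Hypothetical coordinates on one chromosome.
binding_sites = [1_200, 48_500, 90_000, 151_000]
annotated_5prime_ends = [1_000, 50_000, 120_000, 150_500]

print(fraction_near_tss(binding_sites, annotated_5prime_ends))
```

Sites that fall outside the window of every known 5' end are the candidates for promoters of un-annotated transcriptional units.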
The chromatin-modification features of the
active promoters
¤ Validation of active
promoters
– ChIP-on-chip using an anti-RNAP antibody
– ChIP-on-chip analysis using
• anti-acetylated histone H3
(AcH3) antibodies
• anti-dimethylated lysine 4 on
histone H3 (MeH3K4)
antibodies
• known epigenetic markers of
active genes
Reprinted from: Kim et al., Nature 436: 876-880 (2005)
TFIID, RNAP, AcH3 and MeH3K4 profiles on
the promoter of RPS24 gene
Reprinted from: Kim et al., Nature 436: 876-880 (2005)
Additional findings
¤ Promoters of non-coding transcripts
– Are very similar to promoters of protein coding genes
¤ Promoters of novel genes
– An estimated 13% of human genes remain to be annotated in the genome
¤ Clustering of active promoters
– co-regulated genes tend to be organized into coordinately regulated
domains
¤ Genes using multiple promoters
Reprinted from: Kim et al., Nature 436: 876-880 (2005)
Multiple promoters in human genes
¤ WEE1 gene locus
– Two different transcripts with alternative 5' ends
• Encoding different proteins
– Two different TFIID-binding sites: two promoters
– Differential transcription during the cell cycle
Reprinted from: Kim et al., Nature 436: 876-880 (2005)
The transcriptome of a cell line
¤ Functional relationship between transcription machinery and gene
expression
– correlated genome-wide expression profiles with PIC promoter
occupancy
¤ Four general classes of promoters
I. Actively transcribed genes
II. Weakly expressed genes
III. Weakly PIC bound genes
IV. Inactive genes
Reprinted from: Kim et al., Nature 436: 876-880 (2005)
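The four classes arise from crossing two yes/no measurements per gene: is a PIC detected at the promoter, and is a transcript detected? The sketch below encodes that cross-tabulation. The mapping of class numbers to combinations is a reading of the slide labels (class II: PIC bound but little or no transcript; class III: transcript present but PIC binding weak or undetected) and should be treated as an assumption; the function name and example genes are hypothetical.

```python
def promoter_class(pic_bound: bool, transcript_detected: bool) -> str:
    """Assign a promoter to class I-IV from PIC occupancy and expression."""
    if pic_bound and transcript_detected:
        return "I"    # actively transcribed
    if pic_bound:
        return "II"   # PIC present, weakly expressed
    if transcript_detected:
        return "III"  # expressed, weakly PIC bound
    return "IV"       # inactive


# Hypothetical genes: (PIC detected, transcript detected).
genes = {
    "geneA": (True, True),
    "geneB": (True, False),
    "geneC": (False, True),
    "geneD": (False, False),
}
for name, (pic, rna) in genes.items():
    print(name, promoter_class(pic, rna))
```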
Genome-Wide Distribution of ORC and MCM Proteins in
yeast: High-Resolution Mapping of Replication Origins
Wyrick et al., Science, 294, 2357 (2001)
¤ Paper presents
– Genome-wide location analysis to map the DNA replication origins in the
16 yeast chromosomes by determining the binding sites of prereplicative
complex proteins
Chromosome Replication In Eukaryotic Cells
¤ Chromosome replication
– initiates from origins of replication distributed along
chromosomes
– Origins of replication comprise autonomously replicating
sequences (ARS)
• ARS contain an 11-bp ARS consensus sequence (ACS)
– Essential for replication initiation
– Recognized by the Origin Recognition Complex (ORC)
• The majority of sequence matches to the ACS in the genome do not
have ARS activity
¤ Prereplicative complexes at replication origins comprise
– Origin Recognition Complex (ORC) proteins
– Minichromosome Maintenance (MCM) proteins
Reprinted from: Wyrick et al., Science, 294, 2357 (2001)
Prereplicative Complexes At Origins Of
Replication
Reprinted from: Stillman, Science, 294, 2301 (2001)
ORC- and MCM-binding sites compared with known ARSs
¤ High degree of correlation
between MCM and ORC
binding sites and known ARSs
– Correct identification of 88% of
known ARSs
¤ The method can accurately
identify the position of ARSs to
a resolution of 1 kb or less
Reprinted from: Wyrick et al., Science, 294, 2357 (2001)
Genome-wide Location Of Potential Replication Origins
Identification of 429 potential origins on the entire genome
Reprinted from: Wyrick et al., Science, 294, 2357 (2001)
Conclusions
¤ The ChIP-based method identified the majority of
origins found in the analysis of genome-wide
replication timing in yeast
– and provides direct, high-resolution mapping of potential origins
¤ Similar approaches identified origins in other
organisms
– For example: Coordination of replication and transcription along
a Drosophila chromosome
• MacAlpine et al., Genes & Dev. 18: 3094-3105 (2004)
Reprinted from: Wyrick et al., Science, 294, 2357 (2001)
Global analysis of protein localization in budding
yeast
Huh et al., Nature 425, 686-691 (2003)
¤ Paper presents
– An approach to define the organization of proteins in the context of
cellular compartments involving
– the construction and analysis of a collection of yeast strains expressing
full-length, chromosomally tagged green fluorescent protein fusion
proteins
Experimental Strategy
¤ Systematic tagging of yeast ORFs with green
fluorescent protein (GFP)
– GFP is fused to the carboxy terminus of each ORF
– Full length fusion proteins are expressed from their native
promoters and chromosomal location
¤ The collection of yeast strains expressing GFP fusions
was analyzed by
– fluorescence microscopy to determine the primary subcellular
localization of the fusion proteins
• Defines 12 categories
– co-localization with red fluorescent protein (RFP) markers to
refine the subcellular localization
• Defines 11 additional categories
Reprinted from: Huh et al., Nature 425, 686-691 (2003)
Construction of GFP fusion proteins
¤ For each ORF a pair of PCR primers was designed
– Homologous to the chromosomal insertion site
– Matching a GFP – selectable marker construct
¤ Yeast was transformed with the PCR products to generate
– Strains expressing chromosomally tagged ORFs
Reprinted from: Huh et al., Nature 425, 686-691 (2003)
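The primer design for C-terminal tagging can be sketched as follows. Each primer has a 5' part homologous to the chromosomal insertion site and a 3' part that anneals to the GFP and selectable-marker cassette. The forward primer ends at the last codon before the stop codon, so that GFP is fused in frame to the carboxy terminus. The 40-nt homology length, the function names and the short cassette-annealing sequences are assumptions for illustration; they are not taken from the slides.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def tagging_primers(orf_with_stop, downstream, cassette_fwd, cassette_rev,
                    homology=40):
    """Return (forward, reverse) primers, both written 5'->3'.

    orf_with_stop: coding sequence including its stop codon
    downstream:    genomic sequence immediately after the stop codon
    cassette_fwd / cassette_rev: primer parts annealing to the two ends of the
        tag cassette, already written 5'->3' (hypothetical sequences)
    homology:      length of the chromosomal homology arm (assumed value)
    """
    orf = orf_with_stop.upper()
    if orf[-3:] not in {"TAA", "TAG", "TGA"}:
        raise ValueError("ORF must end with a stop codon")
    coding = orf[:-3]  # drop the stop codon so the tag is read through
    forward = coding[-homology:] + cassette_fwd
    reverse = reverse_complement(downstream[:homology]) + cassette_rev
    return forward, reverse


# Toy sequences, far shorter than real ones.
fwd, rev = tagging_primers("ATGAAACCCTAA", "GGGTTT", "ACGT", "TTTT", homology=6)
print(fwd)
print(rev)
```

Because the homology arms come from the native locus, the resulting fusion stays under the control of its own promoter at its normal chromosomal position.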
Representative GFP Images
Nucleus
Nuclear periphery
Bud neck
Mitochondrion
ER
Lipid particle
Reprinted from: Huh et al., Nature 425, 686-691 (2003)
GFP and RFP Co-localization Images
Nucleolar marker
Reprinted from: Huh et al., Nature 425, 686-691 (2003)
Global results
22 categories
¤ Constructed ~6,000 ORF-GFP
fusions
– 4,156 had localizable GFP signals
(~75% of the yeast proteome)
– Good concordance with data from
earlier studies
• GFP does not affect the location
• Localized 70% of the new proteins
– Major compartments: cytoplasm
(30%) and the nucleus (25%)
– 20 other compartments: 44% of the
proteins
¤ Most of the proteins can be located in
discrete cellular compartments
Reprinted from: Huh et al., Nature 425, 686-691 (2003)
The proteome of the nucleolus
¤ Detected 164 proteins
in the nucleolus
– Plus 45 identified in other
studies
¤ Data are consistent with
MS analysis of human
nucleolar proteins
– Allows identification of
yeast-human orthologs
Reprinted from: Huh et al., Nature 425, 686-691 (2003)
Transcriptional co-regulation and subcellular
localization are correlated
Figure: subcellular localization plotted against 33 transcription modules (co-regulated genes)
Reprinted from: Huh et al., Nature 425, 686-691 (2003)
Conclusion
¤ The high-resolution, high-coverage localization data
set
– represents 75% of the yeast proteome
• classified into 22 distinct subcellular localization categories
¤ Analysis of these proteins
– in the context of transcriptional, genetic, and protein–protein
interaction data
• provides a comprehensive view of interactions within and between
organelles in eukaryotic cells.
• helps reveal the logic of transcriptional co-regulation
Reprinted from: Huh et al., Nature 425, 686-691 (2003)
Recommended reading
¤ DNA interactome
– Genome-Wide Location of DNA Binding Proteins
• Ren et al., Science, 290, 2306 (2000)
– Map of active promoters in the human genome
• Kim et al., Nature 436: 876-880 (2005)
¤ Global analysis of protein localization in yeast
• Huh et al., Nature 425, 686-691 (2003)
Further reading
¤ Genome-Wide Location of DNA Binding Proteins
– Genomic Binding Sites of the Yeast Cell-cycle Transcription
Factors SBF and MBF
• Iyer et al., Nature 409: 533 (2001)
– High-Resolution Mapping of Replication Origins
• Wyrick et al., Science, 294, 2357 (2001)