Exploiting transcription factor binding site clustering to identify cis-regulatory modules involved in pattern formation in the Drosophila genome
ECS289A Presentation
By Hua Chen
2003-3-3
Background Knowledge
A key characteristic of cis-regulatory sites: binding sites for several different transcription factors tend to cluster together in one region near the gene, forming cis-regulatory modules (CRMs);
Searching for individual cis-regulatory sites yields too many candidate positions, which makes it difficult to tell the true sites from false positives;
The clustering property of CRMs provides a feasible way to identify cis-regulatory sites in the genome.
One example of a CRM in Drosophila: the eve (even-skipped) gene.
The System Investigated:
The early Drosophila embryo;
Five transcription factors: Bcd, Cad, Hb, Kr and Kni.
Targets:
Adopt the clustering of binding sites within cis-regulatory modules as a method to identify functional cis-regulatory regions;
Test the method on known CRM regions;
Search the genome to discover new CRMs and confirm the results by experiments.
Methods:
Collect transcription factor binding sequences from previous laboratory studies and align them;
Construct position weight matrices (PWMs) for the conserved motifs;
Test the method on the known CRMs;
Search the whole genome for unknown regulatory regions;
Use RNA in situ hybridization and microarray hybridization to test whether the predicted regions lie near genes regulated by the transcription factors;
One special case, the giant gene, is investigated further with transgenic reporter constructs and mutant embryos.
Step 1: Collection and Alignment of TF Binding Sites
Bcd, Cad, Hb, Kr and Kni binding sequences were determined by in vitro DNase I protection (footprinting) assays;
The sequences are aligned with MEME.
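Once the sites are aligned, they can be summarized column by column as a matrix. The sketch below builds a log-odds position weight matrix of the general kind used in the next step; it is a minimal illustration, assuming gap-free aligned sites of equal length containing only A, C, G and T, a uniform 25% background and a pseudocount of 1. These are illustrative choices, not the settings used in the study or Patser's exact weighting, and the function names and toy sites are hypothetical.

```python
import math

BASES = "ACGT"

def count_matrix(sites):
    """Count each base at each column of equal-length, gap-free aligned sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("aligned sites must all have the same length")
    counts = [{b: 0 for b in BASES} for _ in range(length)]
    for site in sites:
        for i, base in enumerate(site.upper()):
            counts[i][base] += 1  # KeyError if a site contains a non-ACGT base
    return counts

def log_odds_pwm(counts, background=None, pseudocount=1.0):
    """Convert counts to log2(p_observed / p_background) weights per column."""
    if background is None:
        background = {b: 0.25 for b in BASES}  # assumption: uniform background
    pwm = []
    for column in counts:
        total = sum(column.values()) + pseudocount * len(BASES)
        pwm.append({
            b: math.log2((column[b] + pseudocount) / total / background[b])
            for b in BASES
        })
    return pwm

def score(pwm, seq):
    """Sum the per-column weights for a sequence as long as the matrix."""
    if len(seq) != len(pwm):
        raise ValueError("sequence length must equal matrix width")
    return sum(col[base] for col, base in zip(pwm, seq.upper()))

# Hypothetical toy sites, not real footprinted sequences.
toy_sites = ["TTTATG", "TTTATG", "TTTATA", "CTTATG"]
pwm = log_odds_pwm(count_matrix(toy_sites))
print(round(score(pwm, "TTTATG"), 2))
```

The pseudocount keeps bases never seen in the alignment from receiving a weight of minus infinity, which matters when only a handful of footprinted sites are available.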
Step 2: Construction of PWMs and Searching:
Patser is used to construct the position weight matrices;
Cis-Analyst is used to identify potential binding sites in the Drosophila genome that match the PWMs:
A user-defined cutoff parameter (site_p) eliminates predicted low-affinity sites;
The sequence is scanned with a window of specified length;
Windows that contain at least min_sites binding sites are retained;
All overlapping windows are merged into a “cluster”.
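The window-and-merge steps above can be sketched as follows. This is a minimal sketch, assuming the PWM hits (already filtered by the site_p cutoff and pooled across the five factors) are given as start coordinates on a single chromosome. Anchoring one window at each hit and counting hits by start position are assumptions of this sketch, and Cis-Analyst's own window placement may differ; the function name find_clusters is hypothetical. The defaults window=700 and min_sites=13 are the criteria reported in Step 3.

```python
from bisect import bisect_left

def find_clusters(site_positions, window=700, min_sites=13):
    """Return merged (start, end) clusters as half-open intervals.

    site_positions: start coordinates of predicted binding sites that
    passed the affinity cutoff (hypothetical input format).
    """
    positions = sorted(site_positions)
    retained = []
    for start in positions:
        end = start + window
        # Number of sites whose start lies in [start, end).
        n = bisect_left(positions, end) - bisect_left(positions, start)
        if n >= min_sites:
            retained.append((start, end))
    # Retained windows are already ordered by start; merge overlapping ones.
    clusters = []
    for start, end in retained:
        if clusters and start < clusters[-1][1]:
            clusters[-1] = (clusters[-1][0], max(clusters[-1][1], end))
        else:
            clusters.append((start, end))
    return clusters

print(find_clusters([0, 100, 200, 300, 1000], window=250, min_sites=2))
```

With the Step 3 defaults, any run of 13 or more hits starting within a 700 bp window anchored at one of them becomes part of a merged cluster, and chains of such windows fuse into a single interval.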
[Figures: binding site sequences for Cad; predicted binding sites]
Step 3: Collection of Known CRMs:
Successful result: 14 of 19 known CRMs detected with the search criteria window size = 700 bp and number of predicted sites >= 13.
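Checking the method against known CRMs amounts to an interval-overlap test. The sketch below counts a known CRM as detected if any predicted cluster overlaps it by at least one base on the same chromosome. That success criterion, the (chrom, start, end) half-open coordinate convention, and the function and CRM names are assumptions, since the slide does not state how a match was scored.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def recovery(known_crms, clusters):
    """Return (sorted names of detected CRMs, fraction detected).

    known_crms: dict mapping a CRM name to its (chrom, start, end) interval.
    clusters: list of predicted (chrom, start, end) clusters.
    """
    hits = sorted(name for name, crm in known_crms.items()
                  if any(overlaps(crm, c) for c in clusters))
    return hits, len(hits) / len(known_crms)

# Hypothetical coordinates for illustration only.
known = {"crm_a": ("2L", 1000, 1700), "crm_b": ("3R", 5000, 5600)}
predicted = [("2L", 1500, 2300), ("3R", 5600, 6000)]
print(recovery(known, predicted))
```

On the slide's numbers, 14 of 19 corresponds to a detection rate of about 0.74. Note that ("3R", 5600, 6000) does not count as overlapping ("3R", 5000, 5600) under the half-open convention, because they only touch.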
Step 4: Genome-wide Searching:
28 clusters were identified;
23 of the 28 fall in intergenic regions;
5 fall in introns;
49 genes lie in the regions near the clusters.
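Sorting clusters into intergenic and intronic positions, and collecting nearby genes, needs only a gene annotation and the same overlap test. A minimal sketch, assuming each gene is described by its transcribed span and exon intervals; the Gene record, the extra "exonic" label and the 20 kb flank used to define "nearby" are hypothetical choices, since the slide does not say how far from a cluster a gene could lie.

```python
from dataclasses import dataclass, field

@dataclass
class Gene:
    name: str
    chrom: str
    start: int                                 # transcribed span, half-open [start, end)
    end: int
    exons: list = field(default_factory=list)  # (start, end) half-open exon intervals

def _overlap(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) share a base."""
    return a_start < b_end and b_start < a_end

def classify(cluster, genes):
    """Label a (chrom, start, end) cluster 'exonic', 'intronic' or 'intergenic'."""
    chrom, start, end = cluster
    label = "intergenic"
    for g in genes:
        if g.chrom != chrom or not _overlap(start, end, g.start, g.end):
            continue
        if any(_overlap(start, end, es, ee) for es, ee in g.exons):
            return "exonic"        # touches an exon of some overlapping gene
        label = "intronic"         # overlaps a gene span but none of its exons
    return label

def nearby_genes(cluster, genes, flank=20_000):
    """Names of genes within `flank` bp of the cluster (hypothetical rule)."""
    chrom, start, end = cluster
    return [g.name for g in genes
            if g.chrom == chrom and _overlap(start - flank, end + flank, g.start, g.end)]

# Hypothetical annotation for illustration only.
genes = [Gene("g1", "2L", 1000, 5000, [(1000, 1500), (4000, 5000)]),
         Gene("g2", "2L", 20000, 25000, [(20000, 25000)])]
print(classify(("2L", 2000, 2700), genes), nearby_genes(("2L", 8000, 8700), genes))
```

Because a single gene can lie near two clusters, the number of distinct nearby genes need not be a simple multiple of the cluster count.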
Step 5: Examine the expression patterns of the 49 genes by RNA in situ hybridization and microarray hybridization:
The 49 genes are examined by hybridization to see whether their expression patterns suggest regulation by the five TFs;
10 of the 28 clusters lie near at least one gene that shows an anterior-posterior expression pattern (consistent with regulation by the five TFs).
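The "at least one gene" criterion above is a simple set test per cluster. A minimal sketch, assuming a hypothetical mapping from each cluster ID to the genes near it and a hypothetical set of genes scored as having an anterior-posterior pattern; the IDs and function names are illustrative only.

```python
def clusters_near_patterned_genes(nearby, patterned_genes):
    """IDs of clusters with at least one nearby gene showing an A-P pattern."""
    patterned = set(patterned_genes)
    return sorted(cid for cid, genes in nearby.items()
                  if patterned.intersection(genes))

def genes_examined(nearby):
    """Distinct genes across all clusters (a gene may lie near two clusters)."""
    return sorted({g for genes in nearby.values() for g in genes})

# Hypothetical cluster-to-gene mapping for illustration only.
nearby = {"c1": ["geneA", "geneB"], "c2": ["geneB"], "c3": ["geneC"]}
print(clusters_near_patterned_genes(nearby, {"geneB"}), len(genes_examined(nearby)))
```

Counting distinct genes with a set avoids double-counting a gene that flanks two clusters when tallying how many genes were examined.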
Step 6: The special case: the giant gene
The posterior expression of giant is regulated by Cad, Hb and Kr;
Its cis-regulatory sites were still unknown;
The predicted CRM nearest to the giant gene was cloned upstream of a lacZ reporter gene;
The lacZ reporter shows an expression pattern similar to that of the giant mRNA.
[Figure panels: +/+ (wild-type) and Kr/Kr (Kr mutant) embryos]
Conclusions:
Binding site clustering is an effective method for identifying cis-regulatory modules;
A major obstacle is the paucity of binding data for most transcription factors, which calls for systematic work;
Real CRM structures are more complex, so the method needs to incorporate more complex rules.
Reference
Berman, B.P., Nibu, Y. et al. 2002. Exploiting transcription factor binding site clustering to identify cis-regulatory modules involved in pattern formation in the Drosophila genome. Proc. Natl. Acad. Sci. USA 99:757-762.