Exploiting transcription factor
binding site clustering to identify
cis-regulatory modules involved
in pattern formation in the
Drosophila genome
ECS289A Presentation
By Hua Chen
2003-3-3
Background Knowledge



A defining feature of cis-regulatory sites: binding sites for
several different transcription factors tend to cluster in one
region near the gene, forming a cis-regulatory module (CRM);
Searching for individual binding sites returns too many
candidate positions, which makes it difficult to tell the true
sites from false positives;
The clustering of sites within CRMs provides a feasible way to
identify functional cis-regulatory sites in the genome;
One example of a CRM in Drosophila:
the eve gene.
The System Investigated:

The early Drosophila embryo;
Five transcription factors are investigated: Bcd, Cad, Hb,
Kr and Kni.

Targets:

Adopt the clustering of binding sites in cis-regulatory modules
as a method to identify functional motifs;
Test the method with known CRM regions;
Search the genome to discover new CRMs and confirm the results
by experiments.
Methods:






Collect transcription factor binding sequences from previous
laboratory work and align them;
Construct position weight matrices (PWMs) for the conserved
motifs;
Test the method on known CRMs;
Search genome-wide for unknown regulatory regions;
Use RNA in situ hybridization and microarray hybridization to
test whether the predicted regions lie near genes regulated by
the transcription factors;
One special case: the giant gene, investigated further with
transgenic and mutant embryos.
Step 1: Collection and Alignment of
TF Binding Sites

Bcd, Cad, Hb, Kr and Kni binding sequences were
determined by in vitro DNase protection
assays;
The sequences were aligned with MEME.
Step 2: Construction of PWMs and
Searching:


Patser is used to construct the position weight matrices;
Cis-Analyst is used to identify potential binding
sites matching the PWMs in the Drosophila genome:




A user-defined cutoff parameter (site_p) eliminates
predicted low-affinity sites;
The sequence is scanned with a window of specified length;
Windows that contain at least min_sites binding sites are
retained;
All overlapping windows are merged into a "cluster".
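
The four steps above can be sketched in a few lines of Python. This is only an illustration of the windowing idea, not the Cis-Analyst implementation. It assumes each predicted site is a (position, p-value) pair, that site_p is an upper bound on the p-value, and that a cluster is reported as the span from its first to its last site. The function name find_clusters and the default site_p value are hypothetical.

```python
from typing import Iterable, List, Tuple

Site = Tuple[int, float]  # (position in bp, p-value of the PWM match); assumed layout


def find_clusters(sites: Iterable[Site],
                  site_p: float = 0.001,   # hypothetical default cutoff
                  window: int = 700,
                  min_sites: int = 13) -> List[Tuple[int, int]]:
    """Return merged (start, end) spans of windows holding >= min_sites sites."""
    # 1. Drop predicted low-affinity sites (p-value above the cutoff).
    positions = sorted(pos for pos, p in sites if p <= site_p)

    # 2-3. Slide over the sorted sites and keep every stretch shorter than
    #      `window` bp that contains at least `min_sites` sites.
    windows = []
    left = 0
    for right, pos in enumerate(positions):
        while pos - positions[left] >= window:
            left += 1
        if right - left + 1 >= min_sites:
            windows.append((positions[left], pos))

    # 4. Merge overlapping windows into clusters. The windows are already
    #    ordered by start because `left` never moves backwards.
    clusters: List[Tuple[int, int]] = []
    for start, end in windows:
        if clusters and start <= clusters[-1][1]:
            prev_start, prev_end = clusters[-1]
            clusters[-1] = (prev_start, max(prev_end, end))
        else:
            clusters.append((start, end))
    return clusters


# Toy input: the weak site at 350 is filtered out, the site at 5000 is isolated.
toy = [(100, 1e-4), (200, 1e-4), (300, 1e-4), (350, 0.01), (5000, 1e-4)]
print(find_clusters(toy, site_p=0.001, window=700, min_sites=3))  # [(100, 300)]
```

Counting sites between sorted positions, rather than stepping a fixed window base by base, gives the same qualifying regions while touching each site only once.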
[Figure: binding site sequences for Cad]
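
A set of aligned binding sites can be turned into a PWM and used to score candidate sequences roughly as follows. This is a minimal sketch, not Patser's actual scoring: it assumes a log-odds matrix with a pseudocount and a uniform background (the real Drosophila genome is AT-rich). The aligned sites are invented toy sequences, not the measured Cad sites. All function names are hypothetical.

```python
import math

BASES = "ACGT"


def build_pwm(sites, background=None, pseudocount=1.0):
    """Build a log2-odds PWM (list of per-position dicts) from aligned sites."""
    if background is None:
        background = {b: 0.25 for b in BASES}  # assumption: uniform background
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all aligned sites must have the same length")
    pwm = []
    total = len(sites) + pseudocount
    for i in range(length):
        column = [s[i] for s in sites]
        row = {}
        for b in BASES:
            # The pseudocount is shared out according to the background.
            freq = (column.count(b) + pseudocount * background[b]) / total
            row[b] = math.log2(freq / background[b])
        pwm.append(row)
    return pwm


def score_site(pwm, seq):
    """Sum the log-odds of each base of `seq` (same length as the PWM)."""
    return sum(col[base] for col, base in zip(pwm, seq))


def scan(pwm, sequence, threshold):
    """Return (position, score) for every window scoring at least `threshold`."""
    width = len(pwm)
    hits = []
    for i in range(len(sequence) - width + 1):
        s = score_site(pwm, sequence[i:i + width])
        if s >= threshold:
            hits.append((i, s))
    return hits


# Invented toy alignment, for illustration only.
toy_sites = ["TTTATG", "TTTATG", "TTTACG", "TTTATA"]
pwm = build_pwm(toy_sites)
print([pos for pos, _ in scan(pwm, "AATTTATGCC", threshold=8.0)])  # [2]
```

A score threshold plays the same role here as the site_p cutoff: both discard weak matches before clustering.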
Step 3: Collection of Known CRMs:

Result: 14 of 19 known CRMs were recovered with the search
criteria window size = 700 bp and number of predicted
sites >= 13.
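
The 14/19 figure is a recovery rate: the fraction of known CRMs that overlap at least one predicted cluster. A minimal sketch of that bookkeeping follows. It assumes CRMs and clusters are (start, end) intervals on the same sequence, and the coordinates in the example are made up.

```python
def overlaps(a, b):
    """True if closed intervals a=(start, end) and b=(start, end) share a base."""
    return a[0] <= b[1] and b[0] <= a[1]


def recovery_rate(known_crms, predicted_clusters):
    """Return (recovered, total, fraction) of known CRMs hit by any cluster."""
    recovered = sum(
        1 for crm in known_crms
        if any(overlaps(crm, cluster) for cluster in predicted_clusters)
    )
    return recovered, len(known_crms), recovered / len(known_crms)


# Hypothetical coordinates: two of the three known CRMs are recovered.
known = [(100, 800), (5000, 5600), (9000, 9500)]
predicted = [(300, 1000), (9400, 9900)]
print(recovery_rate(known, predicted)[:2])  # (2, 3)

# The reported result corresponds to a recovery rate of about 74%.
print(round(14 / 19, 2))  # 0.74
```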
Step 4: Genome-wide Searching:




28 clusters were identified;
23 of the 28 fall in regions between genes;
5 fall within introns;
49 genes lie in the nearby regions.
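
Relating clusters to genes can be sketched as an interval comparison. The example below is a simplification under stated assumptions: genes are represented only by their overall (start, end) span, so a cluster inside a gene span is labelled "intragenic" rather than specifically intronic, and the max_distance used to call a gene "nearby" is a hypothetical parameter, not a value from the study. Gene names and coordinates are invented.

```python
def gene_distance(cluster, gene_span):
    """Distance in bp between a cluster and a gene span (0 if they overlap)."""
    c_start, c_end = cluster
    g_start, g_end = gene_span
    if g_start <= c_end and c_start <= g_end:
        return 0
    if g_end < c_start:
        return c_start - g_end
    return g_start - c_end


def classify(cluster, genes):
    """Label a cluster 'intragenic' if it overlaps any gene span, else 'intergenic'."""
    if any(gene_distance(cluster, span) == 0 for span in genes.values()):
        return "intragenic"
    return "intergenic"


def nearby_genes(cluster, genes, max_distance=10_000):
    """List (gene, distance) pairs within max_distance bp, nearest first."""
    found = [
        (name, gene_distance(cluster, span))
        for name, span in genes.items()
        if gene_distance(cluster, span) <= max_distance
    ]
    return sorted(found, key=lambda item: (item[1], item[0]))


# Invented annotation for illustration.
genes = {"geneA": (1000, 3000), "geneB": (8000, 12000), "geneC": (50000, 52000)}
print(classify((4000, 4700), genes))      # intergenic
print(nearby_genes((4000, 4700), genes))  # [('geneA', 1000), ('geneB', 3300)]
print(classify((9000, 9500), genes))      # intragenic
```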
Step 5: Examine the expression pattern of the 49
genes by RNA in situ hybridization and microarray
hybridization:


The 49 genes were examined
by hybridization to see
whether they show the
expression pattern expected of
genes regulated by the TFs;
10 of the 28 clusters are
near at least one gene that
shows an anterior-posterior
expression pattern (consistent
with regulation by the five TFs).
Step 6: The special case: giant gene




The posterior expression of giant is
regulated by Cad, Hb and Kr;
Its cis-regulatory sites were
still unknown;
The predicted CRM nearest
to the giant gene was cloned
upstream of a lacZ
reporter gene;
The lacZ reporter shows an
expression pattern similar to
that of giant mRNA.

[Figure panels: +/+ and Kr/Kr embryos]
Conclusions:



Binding site clustering is an effective method for
identifying cis-regulatory modules;
A major obstacle is the paucity of binding data
for most transcription factors, which calls for
systematic collection;
Real CRM structures are more complex, so the method
needs to incorporate more complex rules.
Reference

Berman, B.P., Nibu, Y. et al. 2002. Exploiting
transcription factor binding site clustering to
identify cis-regulatory modules involved in
pattern formation in the Drosophila genome.
Proc. Natl. Acad. Sci. USA 99:757-762.