Identification of Transcription
Factor Binding Sites
Lior Harpaz
Ofer Shany
09/05/2004
1
Goal - find TFBSs (transcription factor binding sites)!
input
output
2
Importance
TFs regulate gene expression.
Identifying TFs can teach us about:
Mapping of regulatory pathways
Potential functions of genes
3
Experimental Methods
Footprinting
EMSA - electrophoretic mobility shift assay
Problems:
•Time consuming
•Do not scale up to whole genomes
4
Computational Methods Goals
Identifying known TFBSs in previously
unknown locations.
Identifying unknown TFBSs.
5
Computational Methods
Basic idea - locate TFBSs using sequence searching
Problems:
•Short sequences (5-15 bp)
•Degenerate sequences
•Location
•Biological reality
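To illustrate why short, degenerate sequences are a problem for plain sequence searching, here is a minimal sketch of a consensus search using IUPAC degenerate codes. The function name `find_sites` and the example consensus are illustrative assumptions, not taken from the slides.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def find_sites(consensus, sequence):
    """Return 0-based start positions where a degenerate consensus matches.

    `consensus` is a hypothetical IUPAC string such as "TGAN".
    """
    pattern = "".join(IUPAC[c] for c in consensus.upper())
    # A lookahead is used so that overlapping matches are also reported.
    return [m.start() for m in re.finditer(f"(?=({pattern}))", sequence.upper())]

# A 4-bp degenerate pattern already matches twice in a 10-bp sequence.
print(find_sites("TGAN", "ATGACCTGAT"))
```

The shorter and more degenerate the pattern, the more positions match by chance alone, which is why the later slides add extra evidence such as conservation.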
6
Computational Methods
Possible solutions:
Conservation = functional importance
mRNA expression pattern
Phylogenetic footprinting
Network-level conservation
7
Phylogenetic footprinting
Identify orthologous genes
Concentrate on conserved non-coding
regions (possible regulatory regions)
Look for conserved motifs.
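The three steps above can be sketched very roughly as follows, assuming the non-coding upstream regions of the orthologous genes have already been extracted. This toy version only looks for exact k-mers shared by every region; real phylogenetic footprinting tolerates mismatches. The name `conserved_kmers` is a hypothetical helper.

```python
def conserved_kmers(upstream_regions, k=6):
    """Return the set of k-mers present in every upstream region.

    `upstream_regions` is assumed to hold one non-coding sequence per
    orthologous gene; exact matching is a simplification.
    """
    kmer_sets = [
        {seq[i:i + k] for i in range(len(seq) - k + 1)}
        for seq in upstream_regions
    ]
    return set.intersection(*kmer_sets) if kmer_sets else set()

regions = ["AAATTGACAGG", "CCTTGACATT", "GTTGACAC"]
print(conserved_kmers(regions, k=6))
```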
8
Why should it work ?
40% alignment between the human and
mouse genomes
80% of mouse genes have orthologs in
human genome
Only 1%-5% of human genome
encodes proteins.
9
Things to consider…
Choosing genomes.
Locating transcriptional start site.
Alignment method.
10
More things to consider…
Different evolution rates for different
regions in the genome.
PSSM score cut-off
Note - TFBSs within ORFs are not
detected.
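The PSSM score cut-off mentioned above can be illustrated with a small sketch: a log-odds position-specific scoring matrix is built from aligned example sites and used to scan a sequence, keeping only windows that score at or above the cut-off. The pseudocount, the uniform background of 0.25, and the function names are assumptions made for illustration.

```python
import math

def build_pssm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PSSM (one dict per position) from aligned sites."""
    total = len(sites) + 4 * pseudocount
    pssm = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        pssm.append({
            base: math.log2((column.count(base) + pseudocount) / total / background)
            for base in "ACGT"
        })
    return pssm

def scan(sequence, pssm, cutoff):
    """Return (start, score) for every window scoring at least `cutoff`."""
    width = len(pssm)
    hits = []
    for start in range(len(sequence) - width + 1):
        score = sum(pssm[j][sequence[start + j]] for j in range(width))
        if score >= cutoff:
            hits.append((start, score))
    return hits

sites = ["TGACA", "TGACA", "TGACT", "TGTCA"]
pssm = build_pssm(sites)
print(scan("CCTGACAGG", pssm, cutoff=3.0))
```

Lowering the cut-off recovers weaker sites but admits more chance matches, so the choice trades sensitivity against specificity.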
11
Phylogenetic footprinting in
proteobacterial genomes
Study set of 190 E. coli genes with
known TFBSs.
Orthologs were searched in eight other
bacteria.
Motif search by Bayesian Gibbs
sampling.
12
Bayesian Gibbs sampling
Algorithm for motif search.
Each motif is assigned a MAP (maximum a posteriori probability) value.
13
Bayesian Gibbs sampling
Parameters and extensions:
Model sequence
Palindromic patterns
Background pattern
Distribution of spacing between TFBSs and
translation start site
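For orientation, here is a minimal sketch of a basic Gibbs site sampler. It is not the Bayesian sampler used in the study: it assumes exactly one site per sequence, a uniform background, and a fixed motif width, and it omits the palindromic, background, and spacing extensions listed above. The function name and parameters are illustrative.

```python
import random

def gibbs_motif_search(seqs, width, iterations=200, seed=0):
    """Sample one motif start position per sequence (simplified Gibbs sampler)."""
    rng = random.Random(seed)
    positions = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iterations):
        for i, seq in enumerate(seqs):
            # Profile counts from all sequences except i, with a pseudocount of 1.
            counts = [{base: 1.0 for base in "ACGT"} for _ in range(width)]
            for j, other in enumerate(seqs):
                if j != i:
                    start = positions[j]
                    for k in range(width):
                        counts[k][other[start + k]] += 1
            total = len(seqs) - 1 + 4
            # Weight each candidate start by its likelihood ratio
            # against a uniform background (0.25 per base).
            weights = []
            for start in range(len(seq) - width + 1):
                weight = 1.0
                for k in range(width):
                    weight *= (counts[k][seq[start + k]] / total) / 0.25
                weights.append(weight)
            # Resample the site position in sequence i in proportion to the weights.
            positions[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return positions

seqs = ["ACGTTGACAGT", "TTGACACCGGA", "GGCATTGACAA"]
print(gibbs_motif_search(seqs, width=6))
```

Because the procedure is stochastic, different seeds can give different alignments; in practice several runs are compared and the best-scoring motif is kept.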
14
Results
Overall - in 146/184 sets, motifs matched
known regulatory sequences.
In 18 genes (with 1 ortholog) only 67% of known
sites were matched, and with low MAP values.
In 166 sets (with >=2 orthologs) - 81% of
motifs matched known regulatory sequences.
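As a quick consistency check of the figures on this slide, the sketch below back-computes the number of matched sets from the rounded percentages. The back-computed counts are an assumption derived only from the numbers shown here.

```python
one_ortholog_sets = 18
multi_ortholog_sets = 166
total_sets = one_ortholog_sets + multi_ortholog_sets

# Counts recovered from the rounded percentages on the slide (assumption).
matched_one = round(0.67 * one_ortholog_sets)
matched_multi = round(0.81 * multi_ortholog_sets)
matched_total = matched_one + matched_multi

print(total_sets, matched_one, matched_multi, matched_total)
print(f"overall match rate: {matched_total / total_sets:.0%}")
```

The two subsets add up to the overall 146 of 184 sets, roughly 79%.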
15
Results
Out of the 166 sets (with >= 2 orthologs):
131 corresponded to known TFBSs.
3 corresponded to known stem & loop structures.
32 data sets contained predictions with large MAP
values: these could be undocumented sites!
Documented sites were found in 138 cases without
using palindromic models.
16
Identification of a new TF
New site found near fabA, fabB & yqfA
YijC binds to these sites.
Site location, protein structure & previous
experimental results suggest YijC is a
repressor for the fab genes.
Indication of yqfA’s involvement in
metabolism of fatty-acids.
17
Genomic scale phylogenetic
footprinting
2113 ORFs of E. coli used.
187 new sites identified as probable
sites for 46 known TFs.
Remaining sites are expected to
represent unknown TFBSs
MAP values of predicted sites were
lower.
18
MAP values left-shift
19
Study set
Ortholog Distribution
Full set
20
Conclusions
New sites for known TFs were found.
Conservation of regulatory stem-loops.
New sites for unknown TFs are
predicted.
New TF identified (YijC).
Predicted gene function (yqfA).
21
Break
22
Network level conservation
Each TF regulates the expression of
many genes (20-400).
Conservation of global gene expression
requires the conservation of regulatory
mechanisms.
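One way to quantify this idea is to ask whether the genes carrying a motif in one genome overlap with the orthologous genes carrying it in a second genome more than expected by chance. The sketch below uses a hypergeometric tail probability for that overlap; treating this as the scoring scheme, and all function and parameter names, are assumptions made for illustration.

```python
from math import comb

def overlap_pvalue(n_orthologs, n_with_motif_a, n_with_motif_b, n_shared):
    """Probability of at least `n_shared` overlapping genes by chance.

    n_orthologs:     number of orthologous gene pairs considered
    n_with_motif_a:  genes with the motif upstream in genome A
    n_with_motif_b:  orthologs with the motif upstream in genome B
    n_shared:        ortholog pairs where both members carry the motif
    """
    max_overlap = min(n_with_motif_a, n_with_motif_b)
    total = comb(n_orthologs, n_with_motif_b)
    tail = sum(
        comb(n_with_motif_a, k) * comb(n_orthologs - n_with_motif_a, n_with_motif_b - k)
        for k in range(n_shared, max_overlap + 1)
    )
    return tail / total

# Toy numbers: 10 ortholog pairs, 3 motif-bearing genes in each genome, all 3 shared.
print(overlap_pvalue(10, 3, 3, 3))
```

A small p-value suggests the motif's set of target genes is conserved between the two genomes, which is the signal this approach looks for.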
23
24
Data analysis
Total motifs: 80,000
After p-value filter: 12,000
After low-complexity filter: 7,673
After hierarchical clustering: 1,269
25
Validation
34/48 known sites discovered.
Large fraction of matches for significant p-values.
26
Identification of known binding sites
27
Biological Significance
Functional coherence
Expression coherence
28
Characteristic Features
Conservation of binding affinity
Conservation of position & orientation
29
References
Bulyk M. Computational prediction of transcription-factor
binding site locations. Genome Biol. 2003 5:201.
McCue L, Thompson W, Carmack C, Ryan MP, Liu JS, Derbyshire
V, Lawrence CE. Phylogenetic footprinting of transcription factor
binding sites in proteobacterial genomes. Nucleic Acids Res.
2001 29:774-782.
Pritsker M, Liu YC, Beer MA, Tavazoie S. Whole-genome
discovery of transcription factor binding sites by network-level
conservation. Genome Res. 2004 14:99-108.
30
Sensitivity Vs. Specificity
\[ \text{sensitivity} = \frac{TP}{TP + FN} \]
\[ \text{specificity} = \frac{TP}{TP + FP} \]
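A small sketch of the two ratios on this slide, assuming they read TP/(TP+FN) and TP/(TP+FP). Note that the second ratio is what many fields call precision; it is labeled specificity here to follow the slide. The example numbers reuse the 34 of 48 known sites from the validation slide, and treating the 14 undiscovered sites as false negatives is an illustrative assumption.

```python
def sensitivity(tp, fn):
    """Fraction of real sites that were predicted: TP / (TP + FN)."""
    return tp / (tp + fn)

def specificity(tp, fp):
    """Fraction of predictions that are real sites: TP / (TP + FP).

    This follows the slide's definition (often called precision elsewhere).
    """
    return tp / (tp + fp)

# 34 of 48 known sites discovered: 34 true positives, 14 false negatives.
print(f"sensitivity = {sensitivity(34, 14):.2f}")
```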
31