Identification of Transcription
Factor Binding Sites
Lior Harpaz
Ofer Shany
09/05/2004
1
Goal: find TFBSs!
input
output
2
Importance

TFs regulate gene expression.

Identification of TFBSs can teach us about:

Mapping of regulatory pathways

Potential functions of genes
3
Experimental Methods


Footprinting
EMSA - electrophoretic mobility shift
assay
Problems:
•Time consuming
•Do not scale up to whole genomes
4
Computational Methods Goals


Identifying known TFBSs in previously
unknown locations.
Identifying unknown TFBSs.
5
Computational Methods

Basic idea: locate TFBSs using sequence searching
Problems:
•Short sequences (5-15 bp)
•Degenerate sequences
•Location
•Biological reality
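To make the "short" and "degenerate" problems concrete, here is a minimal sketch of consensus-based sequence searching. It assumes the site is written as a consensus in standard IUPAC nucleotide codes; the function name `find_sites`, the consensus `TGAYNC`, and the toy sequence are made up for illustration.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def find_sites(consensus, sequence):
    """Return 0-based start positions where the degenerate consensus matches."""
    pattern = "".join(IUPAC[base] for base in consensus.upper())
    # The lookahead lets overlapping matches be reported as well.
    return [m.start() for m in re.finditer(f"(?=({pattern}))", sequence.upper())]

# Hypothetical 6 bp degenerate site searched in a toy sequence.
hits = find_sites("TGAYNC", "AATGACGCTTTGATACGG")
print(hits)  # [2, 10]
```

Because the pattern is short and tolerates several bases at two positions, it matches twice in only 18 bp; on a genome scale such a search returns many hits that are not functional sites.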
6
Computational Methods
Possible solutions:
Conservation = functional importance

mRNA expression pattern

Phylogenetic footprinting

Network-level conservation
7
Phylogenetic footprinting



Identify orthologous genes.
Concentrate on conserved non-coding
regions (possible regulatory regions).
Look for conserved motifs.
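The last step can be sketched as a scan for fully conserved windows in an alignment of orthologous non-coding regions. This is a deliberately naive illustration, not the method of any study discussed here: it assumes the sequences are already aligned (with `-` for gaps), and the name `conserved_windows` and the toy alignment are hypothetical.

```python
def conserved_windows(aligned, width, min_identity=1.0):
    """Return start columns of windows conserved across all aligned sequences.

    aligned: equal-length strings from a multiple alignment ('-' marks a gap).
    """
    length = len(aligned[0])
    assert all(len(s) == length for s in aligned)
    # A column is conserved if it has no gap and the same base in every row.
    conserved = [
        "-" not in col and len(set(col)) == 1
        for col in zip(*aligned)
    ]
    hits = []
    for start in range(length - width + 1):
        fraction = sum(conserved[start:start + width]) / width
        if fraction >= min_identity:
            hits.append(start)
    return hits

# Toy alignment of three made-up upstream regions.
alignment = [
    "ACGTTGACGCAT-GA",
    "TCGATGACGCTTAGC",
    "GC-TTGACGCAAAGG",
]
print(conserved_windows(alignment, width=6))  # [4]
```

The single hit at column 4 is the block `TGACGC`, which is identical in all three rows while the flanking columns differ.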
8
Why should it work ?



About 40% of the human genome aligns
with the mouse genome.
80% of mouse genes have orthologs in
the human genome.
Only 1%-5% of the human genome
encodes proteins.
9
Things to consider…

Choosing genomes.

Locating transcriptional start site.

Alignment method.
10
More things to consider…



Different evolution rates for different
regions in the genome.
PSSM score cut-off
Note - TFBSs within ORFs are not
detected.
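The effect of the PSSM score cut-off can be illustrated with a small sketch. It assumes a log-odds PSSM built from aligned known sites with a pseudocount and a uniform background; the training sites, the cut-off values, and the names `build_pssm` and `scan` are all made up for the example.

```python
import math

def build_pssm(sites, background=None, pseudocount=1.0):
    """Build a log-odds PSSM (one dict per position) from aligned sites."""
    background = background or {b: 0.25 for b in "ACGT"}
    width = len(sites[0])
    total = len(sites) + 4 * pseudocount
    pssm = []
    for i in range(width):
        column = [s[i] for s in sites]
        pssm.append({
            b: math.log2(((column.count(b) + pseudocount) / total) / background[b])
            for b in "ACGT"
        })
    return pssm

def scan(sequence, pssm, cutoff):
    """Return (position, score) for every window scoring at or above cutoff."""
    width = len(pssm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pssm[i][base] for i, base in enumerate(window))
        if score >= cutoff:
            hits.append((start, round(score, 2)))
    return hits

known_sites = ["TGACGC", "TGATGC", "TGACAC", "TGATAC"]  # made-up training sites
pssm = build_pssm(known_sites)
strict = scan("AATGACGCTTTGATACGG", pssm, cutoff=6.0)
loose = scan("AATGACGCTTTGATACGG", pssm, cutoff=-2.0)
```

A high cut-off keeps only windows that closely resemble the training sites; lowering it can only add hits, which raises sensitivity at the cost of more false positives.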
11
Phylogenetic footprinting in
proteobacterial genomes



Study set of 190 genes of E. coli with
known TFBSs.
Orthologs were searched in eight other
bacteria.
Motif search by Bayesian Gibbs
sampling.
12
Bayesian Gibbs sampling


Algorithm for motif search.
Each motif is assigned a MAP (maximum a posteriori probability) value.
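For orientation, here is a much-simplified site sampler in the spirit of Gibbs sampling for motif search. It is a sketch under strong assumptions, not the Bayesian sampler used in the study: exactly one site per sequence, a fixed motif width, a uniform background, and no MAP scoring. The function name and the toy sequences are hypothetical.

```python
import random

def gibbs_site_sampler(seqs, width, iterations=200, pseudocount=1.0, seed=0):
    """Simplified site sampler: one motif occurrence per sequence."""
    rng = random.Random(seed)
    # Start from a random site in every sequence.
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iterations):
        for held_out, seq in enumerate(seqs):
            # Build a profile from the current sites in all other sequences.
            others = [s[p:p + width]
                      for i, (s, p) in enumerate(zip(seqs, starts))
                      if i != held_out]
            total = len(others) + 4 * pseudocount
            profile = [
                {b: (sum(site[j] == b for site in others) + pseudocount) / total
                 for b in "ACGT"}
                for j in range(width)
            ]
            # Weight each candidate start by its likelihood ratio
            # against a uniform background (0.25 per base).
            weights = []
            for start in range(len(seq) - width + 1):
                ratio = 1.0
                for j in range(width):
                    ratio *= profile[j][seq[start + j]] / 0.25
                weights.append(ratio)
            # Resample the held-out site in proportion to the weights.
            starts[held_out] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts

# Toy input: made-up sequences that each contain TGACGC somewhere.
toy_seqs = ["AATGACGCTT", "CCTTGACGCA", "TGACGCGGAA", "GGATTGACGC"]
sites = gibbs_site_sampler(toy_seqs, width=6)
```

The sampler is stochastic, so the returned positions depend on the seed; a real implementation would also score each sampled alignment and keep the best one.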
13
Bayesian Gibbs sampling

Parameters and extensions:




Model sequence
Palindromic patterns
Background pattern
Distribution of spacing between TFBSs and
translation start site
14
Results



Overall: in 146/184 sets, motifs matched
known regulatory sequences.
In 18 genes (with 1 ortholog), only 67% of known
sites were matched, and with low MAP values.
In 166 sets (with >=2 orthologs), 81% of
motifs matched known regulatory sequences.
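As a quick consistency check of these figures, assuming the percentages are rounded and that the 184 sets split into 18 + 166 (the counts 12 and 134 are inferred here, not stated on the slide):

$$0.67 \times 18 \approx 12, \qquad 0.81 \times 166 \approx 134, \qquad 12 + 134 = 146, \qquad \frac{146}{184} \approx 79\%$$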
15
Results

Out of the 166 sets (with >= 2 orthologs):

131 corresponded to known TFBSs.

3 corresponded to known stem & loop structures.


32 data sets contained predictions with large MAP
values: these could be undocumented sites!
Documented sites were found in 138 cases without
using palindromic models.
16
Identification of a new TF

New site found near fabA, fabB & yqfA

YijC binds to these sites.


Site location, protein structure & previous
experimental results suggest YijC is a
repressor for the fab genes.
Indication of yqfA’s involvement in
metabolism of fatty-acids.
17
Genomic scale phylogenetic
footprinting




2113 ORFs of E. coli used.
187 new sites identified as probable
sites for 46 known TFs.
Remaining sites are expected to
represent unknown TFBSs.
MAP values of predicted sites were
lower.
18
MAP values left-shift
19
[Figure: ortholog distribution, study set vs. full set]
20
Conclusions

New sites for known TFs were found.

Conservation of Regulatory stem-loops.

New sites for unknown TFs are
predicted.

New TF identified (YijC).

Predicted gene function (yqfA).
21
Break
22
Network level conservation

Each TF regulates the expression of
many genes (20-400).

Conservation of global gene expression
requires the conservation of regulatory
mechanisms.
23
24
Data analysis




Total motifs: 80,000
After P-value filter: 12,000
After low-complexity filter: 7,673
After hierarchical clustering: 1,269
25
Validation


34/48 known sites
discovered.
Large fraction of
matches for
significant p-values.
26
Identification of known binding sites
27
Biological Significance

Functional coherence

Expression coherence
28
Characteristic Features

Conservation of binding affinity

Conservation of position & orientation
29
References

Bulyk, M. Computational prediction of transcription-factor
binding site locations. Genome Biol. 2003 5:201

McCue L, Thompson W, Carmack C, Ryan MP, Liu JS, Derbyshire
V, Lawrence CE. Phylogenetic footprinting of transcription factor
binding sites in proteobacterial genomes. Nucleic Acids Res.
2001 29:774-782.

Pritsker M, Liu YC, Beer MA, Tavazoie S. Whole-genome
discovery of transcription factor binding sites by network-level
conservation. Genome Res. 2004 14:99-108.
30
Sensitivity Vs. Specificity
$$\text{sensitivity} = \frac{TP}{TP + FN}$$

$$\text{specificity} = \frac{TP}{TP + FP}$$
31