Identification of Transcription Factor Binding Sites


Identification of Transcription Factor Binding Sites
Presenting: Mira & Tali
March 03
Goal
[Figure: the sequence AGCCA recurring in the regulatory regions of several genes]
Motif – AGCCA
Binding site???
Why Bother?
UNDERSTAND:
Gene expression regulation
Co-regulation
Difficulties

Multiple factors for a single gene

Variability in binding sites




The nature of variability is NOT well understood
Usually transitions (purine-to-purine or pyrimidine-to-pyrimidine substitutions)
Insertions and deletions are uncommon
Location, location, location…
Experimental methods


EMSA – Electrophoretic mobility shift assay
Nuclease protection assay
NOT ENOUGH!!!!!
So, what can we do?

Find conserved sequences in regulatory regions
1. Define what you want to find
2. Define what is a good result
3. Decide how to find it…
Principal Methods:

Global optimum

Enumerative methods
Going over ALL possibilities
Taking the best one
Advantage: Certainty
Disadvantage: Limited to small search spaces
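A minimal sketch of the enumerative idea: go over all 4^k candidate words and take the best one. The scoring rule used here (number of input sequences containing the word) and the name best_kmer are assumptions made for illustration, not the scoring used by any of the papers discussed.

```python
from collections import Counter
from itertools import product

def best_kmer(seqs, k):
    """Enumerate ALL 4**k candidate words and return the best-supported one.

    Support = number of sequences that contain the word at least once
    (an illustrative score, assumed for this sketch).
    """
    support = Counter()
    for s in seqs:
        # A set, so each word is counted once per sequence.
        for kmer in {s[i:i + k] for i in range(len(s) - k + 1)}:
            support[kmer] += 1
    candidates = ("".join(p) for p in product("ACGT", repeat=k))
    # Ties are resolved in favour of the lexicographically first candidate.
    return max(candidates, key=lambda m: support[m])

print(best_kmer(["TTAGCCATT", "GGAGCCAGG", "CAGCCAC"], 5))
```

The loop over candidates grows as 4^k, which is why the approach is limited to small search spaces.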
Principal Methods:

Local optimum

Gibbs sampling, AlignACE
Start somewhere (arbitrary)
Next step direction – chosen with probability proportional to what we “gain” from it
We can get anywhere with some probability
Advantage: Basically good results, faster
Disadvantage: You can never know whether the result is the global optimum…
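The local-optimum strategy can be sketched as a small Gibbs site sampler. This is a simplified illustration, not AlignACE itself: it assumes exactly one motif occurrence per sequence, a uniform background, sequences over A/C/G/T only, and every sequence at least w bases long. The function name and parameters are hypothetical.

```python
import random

def gibbs_motif_search(seqs, w, iters=200, seed=0, pseudo=1.0):
    """Return one sampled motif start position per sequence (width w)."""
    rng = random.Random(seed)
    alphabet = "ACGT"
    # Start somewhere (arbitrary): a random start position in every sequence.
    pos = [rng.randrange(len(s) - w + 1) for s in seqs]
    for _ in range(iters):
        for i, s in enumerate(seqs):
            # Profile built from all sequences except the one being resampled.
            counts = [{b: pseudo for b in alphabet} for _ in range(w)]
            for j, t in enumerate(seqs):
                if j != i:
                    for k in range(w):
                        counts[k][t[pos[j] + k]] += 1
            total = len(seqs) - 1 + 4 * pseudo
            # Weight of each candidate start = probability of its window
            # under the current profile.
            weights = []
            for start in range(len(s) - w + 1):
                p = 1.0
                for k in range(w):
                    p *= counts[k][s[start + k]] / total
                weights.append(p)
            # Next step: sampled in proportion to what we "gain" from it,
            # so any position can be reached with some probability.
            pos[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return pos

print(gibbs_motif_search(["TTTAGCCATT", "GGAGCCAGGG", "CCCCAGCCAC"], 5))
```

Different seeds can end in different alignments, which is the practical meaning of the disadvantage on this slide.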
Articles Overview

Identifying motifs



Expression patterns
Phylogenetic footprinting
Identifying networks


Common motifs in expression clusters
Combinatorial analysis
Discovery of novel transcription factor binding sites by statistical overrepresentation
S. Sinha, M. Tompa
Goal: Identify binding sites in yeast
Use sets of coregulated genes
Enumeration: YMF algorithm
Identify overrepresented upstream sequences
What constitutes a motif?
(tailored for S.cerevisiae)

In S. cerevisiae, the motif typically has 6-10 conserved bases

Spacers varying in length (1-11 bp)

The spacer is usually located in the middle of the motif
ACCNNNNNNGTT
Taken from SCPD – S. cerevisiae Promoter Database
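A spaced motif such as ACCNNNNNNGTT can be matched with a regular expression in which N stands for any base. The sketch below is only an illustration; the helper names are hypothetical and it assumes upper-case A/C/G/T input.

```python
import re

def spacer_motif_regex(motif):
    """Turn a motif such as 'ACCNNNNNNGTT' into a regex; N matches any base."""
    return "".join("[ACGT]" if c == "N" else c for c in motif)

def find_sites(motif, seq):
    """Return 0-based start positions of all matches, overlapping ones included."""
    # The lookahead lets matches that overlap each other all be reported.
    pattern = re.compile("(?=(" + spacer_motif_regex(motif) + "))")
    return [m.start() for m in pattern.finditer(seq)]

print(find_sites("ACCNNNNNNGTT", "TTACCAAAAAAGTTCC"))
```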
How do we measure motifs?

Z-score – Motif over-representation: the number of standard deviations by which the observed count exceeds the expected count

Pmax(X) – Probability that the Z-score is >= X
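As a rough illustration of the Z-score, the sketch below assumes a simple background in which each position independently starts a motif occurrence with probability p_motif (a binomial model). That model is an assumption made here for brevity; the YMF slides list a transition matrix as input, so the real expectation and variance are computed differently.

```python
import math

def z_score(observed, n_positions, p_motif):
    """Standard deviations by which the observed count exceeds expectation.

    Binomial background assumed: n_positions independent trials,
    each a motif occurrence with probability p_motif.
    """
    expected = n_positions * p_motif
    sigma = math.sqrt(n_positions * p_motif * (1.0 - p_motif))
    return (observed - expected) / sigma

# 20 occurrences observed where 10 are expected.
print(z_score(20, 10000, 0.001))
```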
YMF algorithm
Yeast Motif Finder
INPUT:
A set of promoter regions
Transition matrix
Motif length - l (modest values: 6)
Maximum number of spacers allowed - w (11)
YMF algorithm
Post Processing:
FindExplanators: artificial overrepresentation
TCACGCT (motif)
CACGCTA (artifact)
Co-expression score
W-score
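The motif/artifact pair on this slide differs by a one-base shift, so the artifact is overrepresented simply because it overlaps the true motif. The sketch below only illustrates how such a pair can be recognised; it is not the FindExplanators procedure, whose details are not given in these slides, and both function names are hypothetical.

```python
def suffix_prefix_overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for n in range(min(len(a), len(b)), 0, -1):
        if a[-n:] == b[:n]:
            return n
    return 0

def looks_like_shift_artifact(motif, candidate, max_shift=1):
    """True if candidate is motif shifted left or right by <= max_shift bases."""
    if motif == candidate or len(motif) != len(candidate):
        return False
    need = len(motif) - max_shift
    return (suffix_prefix_overlap(motif, candidate) >= need
            or suffix_prefix_overlap(candidate, motif) >= need)

print(looks_like_shift_artifact("TCACGCT", "CACGCTA"))
```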
Experiments

Validate YMF results


Running YMF on regulons with known binding sites (SCPD)
Run YMF on MIPS catalogs
(MIPS - Munich Information Center for Protein Sequences)


Functional
Mutant phenotype
Validation
New binding sites or false positives?
A novel site candidate
Further research


Validation of novel binding sites and transcription factors
Modification of the algorithm to be applicable to other organisms
Systematic determination of genetic network architecture
Saeed Tavazoie, Jason D. Hughes, Michael J. Campbell, Raymond J. Cho, George M. Church
Goal: Identify co-regulated networks of genes in yeast
Cluster by expression patterns
AlignACE (Aligns Nucleic Acid Conserved Elements)
Identify upstream sequence patterns
Clusters


Cluster – a group of genes with a similar expression pattern
Cluster members


Tend to participate in common processes
Tend to be co-regulated
Clusters
10-54
Identifying motifs


Using AlignACE
18 motifs from 12 clusters were found.
7 of the motifs found were identified experimentally.
And what about the others????
Scanning for more binding sites


Once a significant motif was found, the whole genome was scanned for it
Most motifs were cluster specific
Why so few motifs?



Too stringent rules for defining a “significant” motif
Post-transcriptional regulation (mRNA stability)
Some clusters represent “noise”
“Tightness”

“Tightness” of a cluster: how close the members of a particular cluster are to its mean

There is a strong correlation between the presence of significant motifs and the “tightness” of a cluster
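One simple way to put a number on “tightness” is the mean Euclidean distance of the members' expression profiles from the cluster mean; smaller values mean a tighter cluster. The exact measure used in the paper is not given on the slide, so this definition and the function name are assumptions.

```python
import math

def cluster_tightness(profiles):
    """Mean Euclidean distance of each expression profile to the cluster mean."""
    n = len(profiles)
    dims = len(profiles[0])
    mean = [sum(p[d] for p in profiles) / n for d in range(dims)]
    return sum(math.dist(p, mean) for p in profiles) / n

tight = [[1.0, 2.0, 3.0], [1.1, 2.1, 2.9], [0.9, 1.9, 3.1]]
loose = [[1.0, 2.0, 3.0], [3.0, 0.0, 1.0], [-1.0, 4.0, 5.0]]
print(cluster_tightness(tight) < cluster_tightness(loose))
```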
Things to remember


Discovering regulons and motifs using expression-based clustering
Minimal biases


Validation as a methodology for new organisms
Identifying the expected cis-regulatory motif EACH TIME!!
Identifying regulatory networks by combinatorial analysis of promoter elements
by Yitzhak Pilpel, Priya Sudarsanam & George M. Church
Goals:
Identify motif combinations affecting expression patterns in yeast
Understand the transcriptional network
Basic definitions

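Expression coherence (EC) score – a measure of how similar the expression profiles of a set of genes are

Synergistic motifs –
EC(a&b) > EC(a\b) , EC(b\a), i.e. genes with both motifs are more coherent than genes with only one of them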
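The sketch below illustrates the synergy test. It assumes the EC score of a gene set is the fraction of gene pairs whose expression profiles lie within a Euclidean distance threshold of each other; that definition, the threshold value and the function names are assumptions for illustration, since the slide does not spell out the formula.

```python
import math
from itertools import combinations

def ec_score(profiles, threshold):
    """Fraction of gene pairs whose profiles are closer than threshold."""
    pairs = list(combinations(profiles, 2))
    if not pairs:
        return 0.0
    close = sum(1 for a, b in pairs if math.dist(a, b) < threshold)
    return close / len(pairs)

def is_synergistic(both, only_a, only_b, threshold):
    """EC(a&b) > EC(a\\b) and EC(a&b) > EC(b\\a)."""
    ec_both = ec_score(both, threshold)
    return (ec_both > ec_score(only_a, threshold)
            and ec_both > ec_score(only_b, threshold))

both = [[0.0, 0.0], [0.0, 1.0]]                # genes with motifs a and b
only_a = [[0.0, 0.0], [9.0, 9.0]]              # genes with a but not b
only_b = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0]]  # genes with b but not a
print(is_synergistic(both, only_a, only_b, threshold=2.0))
```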
Methods:
A database of motifs
Gene sets
Calculating EC score
Significant synergistic combinations
Understanding the effect of individual motifs and of motif combinations
Visualizing the transcriptional network
GMC

GMC – Gene Motif Combination.
Motif numbers:
(m1, m2, m3, m4, m5) = (1,0,1,1,0)


Synergistic motif combination: EC(n motifs) > max(EC(n-1 motifs))
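A GMC can be represented as a 0/1 vector over a fixed motif order, and genes sharing the same vector can be grouped together. The sketch below is illustrative; the gene names, motif names and the function group_by_gmc are hypothetical.

```python
from collections import defaultdict

def group_by_gmc(gene_motifs, motif_order):
    """Group genes by their motif presence/absence vector (the GMC)."""
    groups = defaultdict(list)
    for gene, motifs in gene_motifs.items():
        key = tuple(1 if m in motifs else 0 for m in motif_order)
        groups[key].append(gene)
    return dict(groups)

motif_order = ["m1", "m2", "m3", "m4", "m5"]
gene_motifs = {
    "geneA": {"m1", "m3", "m4"},
    "geneB": {"m1", "m3", "m4"},
    "geneC": {"m2"},
}
print(group_by_gmc(gene_motifs, motif_order))
```

Here geneA and geneB share the GMC (1,0,1,1,0) from the slide's example.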
GMC – what is it good for?
Combinograms
Clustering
GMCs
Combinograms – what are they good for?


They help visualize the connection between a single motif and a specific expression pattern
They also show which motif is more critical in determining the expression pattern.
Motif synergy map
visualizing transcription networks
Conclusion

The importance of the combinogram

The importance of the motif synergy map
Phylogenetic footprinting of transcription factor binding sites in proteobacterial genomes
Lee Ann McCue, William Thompson, C. Steven Carmack, Michael P. Ryan, Jun S. Liu, Victoria Derbyshire and Charles E. Lawrence
Goals: Identifying novel TF binding sites in E. coli
Finding orthologs
Describing the transcription regulatory network
Identify upstream sequence patterns
Local optimum: Gibbs sampling algorithm
Methods:
One E. coli gene and orthologs
Data set
Gibbs sampling algorithm
MAP score – a measure of overrepresentation of the motif
Motif
Applying the method on a small scale – Validation




Choosing 190 E.coli genes.
Creating 184 data sets.
Running Gibbs sampling algorithm.
More than 67% success in the prediction for the most probable motif.
Motif Model
Identification of the YijC binding sites


A strongly predicted site was upstream of the fabA, fabB and yqfA genes.
Chromatography – identifying the factor.
Identifying the YijC binding sites and predicting gene function


Mass spectrometry identification – YijC
Predicting a function for yqfA.
Applying the method genome wide

Choosing 2113 E.coli ORFs.

For 2097 of them, a TF binding site was predicted.
[Figure: MAP scores – ortholog distribution; panels: study set, full set]
Adding binding sites for known TFs

Building a TF binding site model for known TFs.

Scanning E.coli upstream regions.

187 new probable sites.
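The two steps above (build a site model from known sites, then scan upstream regions) can be sketched with a log-odds position weight matrix. This is an illustration under stated assumptions: a uniform background, a pseudocount of 1, an arbitrary score cutoff, and made-up example sites. It is not the model used in the paper, and the function names are hypothetical.

```python
import math

def build_pwm(sites, pseudo=1.0, background=0.25):
    """Log2-odds weight matrix from aligned, equal-length known sites."""
    w = len(sites[0])
    pwm = []
    for k in range(w):
        col = {}
        for b in "ACGT":
            count = sum(1 for s in sites if s[k] == b) + pseudo
            freq = count / (len(sites) + 4 * pseudo)
            col[b] = math.log2(freq / background)
        pwm.append(col)
    return pwm

def scan(pwm, seq, min_score):
    """Return (position, score) for every window scoring at least min_score."""
    w = len(pwm)
    hits = []
    for i in range(len(seq) - w + 1):
        score = sum(pwm[k][seq[i + k]] for k in range(w))
        if score >= min_score:
            hits.append((i, score))
    return hits

known_sites = ["AGCCA", "AGCCA", "AGCCA", "AGCGA"]  # made-up example sites
pwm = build_pwm(known_sites)
print(scan(pwm, "TTTAGCCATTT", min_score=5.0))
```

The cutoff controls the trade-off between missing real sites and reporting spurious ones.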
Building a regulatory network

Required steps:



Identifying motif models
Clustering the models
Problem:

Specificity
Conclusion

What have we gained so far?


A better prediction of gene function.
New possibilities for identifying TF binding sites and the TFs that bind them!!!