Transcript Slide 1

Journal club
06/27/08
Phylogenetic footprinting
• A technique used to identify transcription factor binding sites (TFBS) within a non-coding region of DNA of interest by comparing it
to the orthologous sequences in different
species (Tagle et al., 1988)
• The function and DNA binding preferences of
transcription factors are well-conserved between
diverse species
• Important non-coding DNA sequences that are
essential for regulating gene expression will
show differential selective pressure. (A slower
rate of change occurs in TFBS than in other
parts of the non-coding genome)
Phylogenetic footprinting
• Not all conserved sequences are under
selection pressure
• To eliminate false positives, statistical
analysis must be performed to show that the motifs
reported have a mutation rate meaningfully
lower than that of the surrounding
nonfunctional sequence.
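One simple way to picture the statistical check described above is a one-sided binomial test: given a background substitution rate estimated from flanking sequence, how likely is it to see so few substituted sites in a motif by chance? The sketch below is only an illustration under strong assumptions (independent sites, no phylogeny, a known background rate); the function name and the numbers are hypothetical and are not taken from the slides.

```python
from math import comb


def conservation_p_value(n_sites, n_subs, background_rate):
    """Return P(X <= n_subs) for X ~ Binomial(n_sites, background_rate).

    Assumption: each site is substituted independently with the
    background rate, so a small value suggests the motif changes
    more slowly than its surroundings.
    """
    return sum(
        comb(n_sites, k)
        * background_rate ** k
        * (1 - background_rate) ** (n_sites - k)
        for k in range(n_subs + 1)
    )


# Hypothetical example: a 10-bp motif with no substituted sites, while
# the flanking sequence differs at 30% of its sites.
p = conservation_p_value(10, 0, 0.3)
print(f"{p:.4f}")  # 0.0282
```

A real analysis would also need to account for the tree relating the species and for multiple testing across candidate motifs.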
Mixed selective pressures
• Maintain the function of the protein
encoded by the gene (amino acid (AA) selective
pressure)
• Maintain the regulatory role of the sequence (CRUNCS),
e.g., regulatory factor binding sites
Methods
• Sequence conservation:
1. Entropy score
2. Parsimony score
• Conservation p-value (mixture models)
• Posterior distributions of conservation scores
• Conditional p-values
Parsimony vs. entropy score
Fitch’s algorithm
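The two conservation scores named above can be sketched for a single alignment column as follows. The entropy score looks only at the base counts in the column, while the parsimony score (computed here with the bottom-up pass of Fitch's algorithm) also uses the tree. The tree shape, species names, and columns are hypothetical and chosen purely for illustration.

```python
from collections import Counter
from math import log2


def column_entropy(column):
    """Shannon entropy in bits of one alignment column (0 = fully conserved)."""
    counts = Counter(column)
    n = len(column)
    return sum(-(c / n) * log2(c / n) for c in counts.values())


def fitch_score(tree, leaf_states):
    """Minimum number of substitutions on a rooted binary tree.

    tree: nested 2-tuples whose leaves are species names (strings).
    leaf_states: dict mapping species name -> nucleotide in this column.
    """
    def visit(node):
        if isinstance(node, str):          # leaf: its state set is its base
            return {leaf_states[node]}, 0
        (left_set, left_cost), (right_set, right_cost) = visit(node[0]), visit(node[1])
        common = left_set & right_set
        if common:                         # children agree: no extra change
            return common, left_cost + right_cost
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


# Hypothetical four-species tree and two columns with identical base counts.
tree = (("human", "chimp"), ("mouse", "rat"))
col_a = {"human": "A", "chimp": "A", "mouse": "G", "rat": "G"}
col_b = {"human": "A", "chimp": "G", "mouse": "A", "rat": "G"}

print(column_entropy("AAGG"), fitch_score(tree, col_a))  # 1.0 1
print(column_entropy("AGAG"), fitch_score(tree, col_b))  # 1.0 2
```

Both columns have the same entropy, but the second needs two substitutions on this tree instead of one, which is the kind of difference the parsimony score picks up and the entropy score cannot.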
1. Conditional model
How surprising?
Aligned codons illustrating to what extent the
conservation of each column is surprising, given the
amino acids encoded
L: CUN, UUA, UUG; W: UGG; V: GUN; A: GCN; G: GGN; D: GAU, GAC
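To make the "how surprising?" question concrete, the sketch below expands the codon patterns listed above (with W taken as UGG, the standard tryptophan codon) and asks how likely it is that several species share exactly the same codon, given that they all encode the same amino acid. It assumes, purely for illustration, that each species picks uniformly and independently among the synonymous codons; the conditional model on the slides is phylogenetic and is not reproduced here. All function names are hypothetical.

```python
from itertools import product

# Codon patterns for the amino acids on the slide (N = any nucleotide).
PATTERNS = {
    "L": ["CUN", "UUA", "UUG"],
    "W": ["UGG"],
    "V": ["GUN"],
    "A": ["GCN"],
    "G": ["GGN"],
    "D": ["GAU", "GAC"],
}


def expand(pattern):
    """Expand a pattern such as 'CUN' into its concrete codons."""
    choices = ["ACGU" if ch == "N" else ch for ch in pattern]
    return ["".join(p) for p in product(*choices)]


SYNONYMS = {
    aa: sorted(codon for pat in pats for codon in expand(pat))
    for aa, pats in PATTERNS.items()
}


def chance_identical(aa, n_species):
    """P(all species use the same codon | all encode aa).

    Assumption: uniform, independent codon choice among synonyms.
    """
    k = len(SYNONYMS[aa])
    return (1 / k) ** (n_species - 1)


for aa in "LWD":
    print(aa, len(SYNONYMS[aa]), round(chance_identical(aa, 4), 4))
# L 6 0.0046
# W 1 1.0
# D 2 0.125
```

Under this toy null, a perfectly conserved tryptophan column carries no information beyond the amino acid itself, whereas a perfectly conserved leucine column is much harder to explain by amino acid constraint alone.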
2. Mixture model
50 functional classes
e.g., classes favoring hydrophobic residues
classes favoring glycine
2. Mixture model
Non-coding model: HKY model
$\pi_j = (\pi_{A,j}, \pi_{C,j}, \pi_{G,j}, \pi_{T,j})$
$\kappa_j$ represents the corresponding transition/transversion rate ratio
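As a reminder of what the HKY parameters do, the sketch below builds the standard HKY rate matrix from a vector of equilibrium base frequencies and a transition/transversion rate ratio: the rate from base x to base y is the frequency of y, multiplied by the ratio when the change is a transition (A<->G or C<->T). The numeric values are hypothetical, and the matrix is left unscaled (it is not normalized to one expected substitution per unit time).

```python
BASES = "ACGT"
PURINES = {"A", "G"}


def hky_rate_matrix(pi, kappa):
    """Return the unscaled HKY rate matrix as a dict of dicts.

    pi: dict mapping each base to its equilibrium frequency (sums to 1).
    kappa: transition/transversion rate ratio.
    """
    q = {}
    for x in BASES:
        row = {}
        for y in BASES:
            if x == y:
                continue
            # A transition keeps the base class (purine or pyrimidine).
            is_transition = (x in PURINES) == (y in PURINES)
            row[y] = (kappa if is_transition else 1.0) * pi[y]
        row[x] = -sum(row.values())   # each row sums to zero
        q[x] = row
    return q


# Hypothetical parameter values for one class.
pi = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
q = hky_rate_matrix(pi, kappa=2.0)
print(q["A"]["G"], q["A"]["C"])  # 0.4 0.2
```

With these values the A to G transition runs at twice the rate of the A to C transversion, because G and C have equal frequencies and the ratio is 2.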
How to compute :