An analysis of “Alignments anchored on genomic landmarks can aid in the identification of regulatory elements” by Kannan Tharakaraman et al.
Sarah Aerni
July 8, 2005
Gene Regulation

Transcription factors
Cis-acting elements
– Gene expression is regulated by sequences within or adjacent to the gene itself (the gene acts upon itself)
Trans-acting elements
– Gene expression is regulated by the products of other genes (for example, one gene inhibits another)
Gene Regulation

[Figure: gene regulation. Image credit: US Department of Energy Office of Science]
Motifs

Binding sites
– Transcription factors
– Zinc finger
Hard to identify
– Relatively short sequences
– Some positions well conserved
– Usually located within a certain distance of the gene
Techniques to Identify Regulatory Elements

Enumerative Methods
– Create w-mers and find over-represented motifs
– Frequency may be misconstrued due to repeats
Alignment Methods
– Align sequences, usually from orthologous genes
– Depends on local alignments
– Sequences cannot be too similar or too distant
Tharakaraman Technique
– Combine both methods
– Include word placement along with frequency: is the location of cis-regulatory regions correlated?
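The enumerative approach described above can be illustrated with a small sketch. It counts overlapping w-mers and flags words that occur more often than a uniform background would predict. The function names, the `fold` threshold, the uniform-background assumption, and the toy sequences are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def count_wmers(sequences, w=8):
    """Count every overlapping w-mer across a list of DNA sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - w + 1):
            word = seq[i:i + w]
            # Skip words containing masked or ambiguous bases.
            if set(word) <= set("ACGT"):
                counts[word] += 1
    return counts


def overrepresented(counts, w=8, fold=2.0):
    """Return words whose count exceeds `fold` times a uniform expectation."""
    total = sum(counts.values())
    # Assumption: all 4**w words are equally likely in the background.
    expected = total / 4 ** w
    return {word: n for word, n in counts.items() if n > fold * expected}


# Toy promoter-like sequences (invented for illustration).
promoters = ["ACGTTATAAAGGC", "GGTATAAACCGTA", "TTTATAAAGCGCG"]
counts = count_wmers(promoters, w=6)
print(counts.most_common(3))
```

A uniform background is the simplest possible null model. Real genomic sequence is far from uniform, which is one reason raw frequency can be misleading when repeats are present.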
Initial Steps

Mask repeats
– Avoid identifying repeats as motifs
– Maintain one position for possible motifs
Align on the transcription start site (TSS)
– Depends on proximity to the TSS
– Allow for slight shifts: look for clusters
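A minimal sketch of these two steps follows. It assumes repeats arrive soft-masked (lowercase), which is a common convention but is not stated on the slides. The function names, the `tss_index` parameter, and the example sequence are hypothetical.

```python
def hard_mask(seq):
    """Replace soft-masked (lowercase) bases with 'N', keeping the length unchanged."""
    return "".join("N" if base.islower() else base for base in seq)


def anchored_words(seq, tss_index, w=8):
    """Yield (word, offset) pairs, where offset is the word start relative to the TSS."""
    masked = hard_mask(seq)
    for start in range(len(masked) - w + 1):
        word = masked[start:start + w]
        # Words overlapping a masked repeat are never reported as motifs.
        if "N" not in word:
            yield word, start - tss_index


# Invented example: a lowercase repeat followed by a TATA-like stretch,
# with the TSS assumed to sit at index 16.
example = "ACGTacacacTATAAAGGCC"
for word, offset in anchored_words(example, tss_index=16, w=6):
    print(word, offset)
```

Masking in place, instead of deleting the repeat, keeps every remaining base at its original coordinate, so offsets relative to the TSS stay comparable across sequences.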
Define Significance

Alignment scores
– Assign significance using gap penalties from the mock set
– Jittering: watch for over-represented octonucleotides
– ρ = 5 determined to be significant without jittering
TRANSFAC

Database of eukaryotic transcriptional regulatory elements
Comparison of TRANSFAC octonucleotides to those identified by the paper's technique
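One way to picture such a comparison is a set overlap between predicted words and a reference list. The sketch below treats a word and its reverse complement as the same site, which is an assumption made here and not something the slides confirm. The octonucleotides are placeholders, not actual TRANSFAC entries, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word):
    """Return the reverse complement of an uppercase DNA word."""
    return word.translate(COMPLEMENT)[::-1]


def canonical(word):
    """Map a word and its reverse complement to one shared representative."""
    return min(word, reverse_complement(word))


def overlap(predicted, reference):
    """Return the predicted words that occur in the reference set on either strand."""
    ref = {canonical(w) for w in reference}
    return {w for w in predicted if canonical(w) in ref}


# Placeholder words, invented for illustration.
predicted = {"TATAAAAG", "GGGCGGGG", "ACGTACGT"}
reference = {"CTTTTATA", "CCCCGCCC"}
print(sorted(overlap(predicted, reference)))
```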
GLAM

Sequence input
For every sequence, an arbitrary position and window size are chosen
– Gapless multiple alignment of the window sequences
– Uses probability to determine whether windows are repositioned or resized (Gibbs sampling)
“Seed” constraints
– OOPS (one occurrence per sequence)
– ZOOPS (zero or one occurrence per sequence)
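The repositioning step can be sketched with a simplified Gibbs sampler. This is not GLAM itself: the window width is fixed instead of being resized, only the OOPS constraint is modeled, and the background is assumed uniform. The function name, parameters, and toy sequences are hypothetical, and input is assumed to be uppercase A, C, G, T only.

```python
import math
import random


def gibbs_oops(seqs, w, iters=200, pseudo=1.0, seed=0):
    """Simplified Gibbs sampler: one fixed-width, gapless window per sequence (OOPS)."""
    rng = random.Random(seed)
    # Start every window at an arbitrary position.
    starts = [rng.randrange(len(s) - w + 1) for s in seqs]
    for _ in range(iters):
        for i, seq in enumerate(seqs):
            # Column counts from all other windows, with pseudocounts.
            cols = [{b: pseudo for b in "ACGT"} for _ in range(w)]
            for j, other in enumerate(seqs):
                if j != i:
                    for k in range(w):
                        cols[k][other[starts[j] + k]] += 1
            total = len(seqs) - 1 + 4 * pseudo
            # Weight each candidate start by its likelihood ratio
            # against a uniform background (0.25 per base).
            weights = []
            for p in range(len(seq) - w + 1):
                log_ratio = sum(
                    math.log(cols[k][seq[p + k]] / total / 0.25) for k in range(w)
                )
                weights.append(math.exp(log_ratio))
            # Resample this window's position in proportion to the weights.
            starts[i] = rng.choices(range(len(seq) - w + 1), weights=weights)[0]
    return starts


# Toy sequences sharing a planted word (invented for illustration).
seqs = ["TTGACGTCATGC", "ACCTGACGTCAT", "GGTGACGTCACC"]
starts = gibbs_oops(seqs, w=8)
print([s[p:p + 8] for s, p in zip(seqs, starts)])
```

Because the sampler is stochastic, different seeds can settle on different windows. A ZOOPS variant would additionally allow a sequence to contribute no window at all.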
Alignment Techniques

Different techniques show different results
A-GLAM determined to be best
– Compare to TRANSFAC
– AlignACE cannot function computationally at genomic scale
Distance to TSS

Cis-acting element locations determined by blocks
Largest number close to 0 (TSS)
Identified elements correlated with TRANSFAC
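Grouping element positions into blocks can be sketched as a simple histogram over TSS-relative offsets. The block size of 50 and the offsets below are invented for illustration; the slides do not state the block width used in the paper.

```python
from collections import Counter


def block_histogram(offsets, block_size=50):
    """Count occurrences per block; block b covers [b*block_size, (b+1)*block_size)."""
    return Counter(off // block_size for off in offsets)


def densest_block(offsets, block_size=50):
    """Return (block_start_offset, count) for the block with the most occurrences."""
    block, count = block_histogram(offsets, block_size).most_common(1)[0]
    return block * block_size, count


# Hypothetical offsets of predicted elements relative to the TSS (position 0).
offsets = [-480, -120, -35, -30, -22, -10, 15, 40, 310]
print(densest_block(offsets))
```

Floor division keeps negative offsets in the correct upstream block, so an offset of -30 falls in the block covering -50 to -1.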
Further Discussion

Discussion is limited to method results
– Little information given on whether location is truly correlated
– No biological discussion
Proximity of TSS and cis-acting binding sites
– Narrow search range to a smaller field
– Use in identification of types of element?