Phase Bias in Overlapping Genes
Niv Sabath
University of Houston
Introduction
• Overlapping genes are ubiquitous, particularly in
bacteria and viruses
• Genes can overlap
On the same strand (→ →)
On opposite strands (→ ← or ← →)
• In bacteria
~30% of the genes overlap
~70% of the overlaps are on the same strand
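A minimal sketch of how the two overlap categories above could be told apart from gene coordinates. The `Gene` record, its field names, and the 1-based inclusive coordinate convention are assumptions made for illustration; they do not come from the talk. Nested genes are not treated specially.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    # Hypothetical record: 1-based inclusive coordinates on the forward strand.
    start: int
    end: int
    strand: str  # "+" or "-"


def overlap_length(a: Gene, b: Gene) -> int:
    """Number of nucleotides shared by the two genes (0 if they do not overlap)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def overlap_type(a: Gene, b: Gene) -> str:
    """Classify a gene pair as same-strand, convergent, divergent, or none."""
    if overlap_length(a, b) == 0:
        return "none"
    if a.strand == b.strand:
        return "same-strand"
    # Order the pair by leftmost coordinate to read the arrow pattern.
    left = a if a.start <= b.start else b
    # Left gene on "+" and right gene on "-" reads as (→ ←).
    return "convergent" if left.strand == "+" else "divergent"


pair = (Gene(100, 200, "+"), Gene(197, 300, "+"))
print(overlap_type(*pair), overlap_length(*pair))
```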
[Figure: same-strand overlaps. Count against overlap length for short and long overlaps, by phase (phase 0, phase 1, phase 2), with genes drawn 5’ to 3’ and the stop codons TGA, TAA and TAG marked. Successive builds step through overlap lengths 1, 2, 4, 5, 7, 8, 10 and 11.]
“…the phase bias must be a property of gene locations.”
“We propose that through some mechanism yet to be determined, the creation of unidirectional gene overlaps of phase +1 confers some advantage.”
(Cock and Whitworth 2007)
T A G C A T G T A T C A T G G
170 bacterial genomes
15298 long overlaps in phase 1
4153 long overlaps in phase 2
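The size of the bias follows directly from the two counts above. A short check, using only the numbers on the slide (the variable names are illustrative):

```python
# Counts of long same-strand overlaps reported for 170 bacterial genomes.
phase1_long = 15298
phase2_long = 4153

# Ratio of phase 1 to phase 2 long overlaps.
ratio = phase1_long / phase2_long

# Share of long overlaps that fall in phase 1.
phase1_share = phase1_long / (phase1_long + phase2_long)

print(f"phase 1 / phase 2 = {ratio:.2f}")      # 3.68
print(f"phase 1 share = {phase1_share:.1%}")   # 78.6%
```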
Composition?
Scenarios of overlap creation
[Figure: diagrams of gene pairs drawn 5’ to 3’]
Hypothesis
The phase bias in overlap frequency results from a
difference between the frequencies of
initiation/termination codons in the
phase 1 and phase 2 reading frames
We examined the frequencies of ATG, the most
common start codon, and of stop codons in the
phase 1 and phase 2 reading frames
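A minimal sketch of this kind of count, assuming a coding sequence given in its annotated frame (phase 0) and reading triplets shifted by one or two nucleotides. The function name and the `offset` parameter are hypothetical, and which offset the talk calls "phase 1" and which "phase 2" is not asserted here. The 15-nt sequence from the slides is used purely as a toy input.

```python
STOPS = {"TAA", "TAG", "TGA"}


def triplet_counts(cds: str, offset: int) -> dict:
    """Count ATG and stop triplets read `offset` nucleotides (1 or 2)
    downstream of the annotated codon boundaries."""
    cds = cds.upper()
    counts = {"ATG": 0, "stop": 0, "triplets": 0}
    # Step through complete triplets only, starting at the shifted position.
    for i in range(offset, len(cds) - 2, 3):
        triplet = cds[i:i + 3]
        counts["triplets"] += 1
        if triplet == "ATG":
            counts["ATG"] += 1
        elif triplet in STOPS:
            counts["stop"] += 1
    return counts


seq = "TAGCATGTATCATGG"  # toy input: the sequence shown on the slides
for offset in (1, 2):
    c = triplet_counts(seq, offset)
    print(offset, c, c["ATG"] / c["triplets"])
```

On real data the counts would be pooled over all annotated genes of a genome before taking frequencies.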
What causes the difference in ATG frequencies?
[Figure: the sequence 5’ TAGCATGTATCATGG 3’ read in phase 0, phase 1 and phase 2, with the two Met (ATG) triplets marked]
Codons ending in AT: TAT (Tyr), CAT (His), AAT (Asn), GAT (Asp)
Codons beginning with TG: TGT (Cys), TGC (Cys), TGA (Stop), TGG (Trp)
Relative abundance of amino acids
[Figure: relative abundance of amino acids for phase 1 and phase 2; amino acids shown: Phe, Val, Ala, Arg, Leu, Ser, Gln, Gly, Ile, Pro, Lys, Met, Thr, Glu]
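The frequencies of ATG are correlated
with the expected frequencies
[Figure: two scatter plots of ATG frequency against the expected frequency derived from phase-0 codon frequencies, $f^0_{NAT}$ with $f^0_{GNN}$ and $f^0_{NNA}$ with $f^0_{TGN}$; $r^2 = 0.93$ and $r^2 = 0.80$]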
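One way such an expected frequency could be computed is sketched below. It assumes that an out-of-frame ATG spans two adjacent phase-0 codons (NAT followed by GNN, or NNA followed by TGN) and that adjacent codons are independent, so the expectation is a product of two pattern frequencies. The product form and all function names are assumptions for illustration; the slide only shows the four frequency symbols.

```python
from collections import Counter


def codon_frequencies(cds_list):
    """Relative frequency of each phase-0 codon over a list of coding sequences."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper()
        for i in range(0, len(cds) - 2, 3):
            counts[cds[i:i + 3]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: n / total for codon, n in counts.items()}


def pattern_frequency(freqs, pattern):
    """Summed frequency of codons matching `pattern`, where N matches any base."""
    return sum(
        f for codon, f in freqs.items()
        if all(p in ("N", c) for p, c in zip(pattern, codon))
    )


def expected_atg(freqs):
    """Expected out-of-frame ATG frequency for each way ATG can span two codons
    (assumes adjacent codons are independent)."""
    return {
        "NAT|GNN": pattern_frequency(freqs, "NAT") * pattern_frequency(freqs, "GNN"),
        "NNA|TGN": pattern_frequency(freqs, "NNA") * pattern_frequency(freqs, "TGN"),
    }


freqs = codon_frequencies(["CATGTATCATGG"])  # toy input: CAT GTA TCA TGG
print(expected_atg(freqs))
```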
Amino-acid frequency → ATG frequency → Phase bias
Acknowledgments
Dr. Dan Graur
Dr. Giddy Landan
NSF
© Marko Posavec
Why was the correlation between overlap
frequency and GC content unnoticed?
[Figure: long + short overlaps, with series for phase 1 + phase 2, phase 1, and phase 2 (64% ATGA); stop codon usage for TGA, TAA and TAG]
What is the proportion (P) of overlaps
created by each scenario, after
accounting for ATG frequencies?
$$R_{genes} = \frac{\text{phase 1 overlap frequency}}{\text{phase 2 overlap frequency}}$$
$$R_{ATG} = \frac{\text{phase 1 ATG frequency}}{\text{phase 2 ATG frequency}}$$
$$R_{stop} = \frac{\text{phase 1 stop codon frequency}}{\text{phase 2 stop codon frequency}}$$
$$R_{genes} = P \, R_{ATG} + (1 - P) \, R_{stop}$$
$$3.7 = P \times 6.4 + (1 - P) \times 1$$
$$P = 0.5$$
P = 0.5 is the value expected if overlaps are created by the two
scenarios without bias
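The value of P can be checked by rearranging the mixture relation between the three ratios into P = (Rgenes − Rstop) / (RATG − Rstop). A short sketch with the numbers from the slide (the function name is illustrative):

```python
def solve_p(r_genes: float, r_atg: float, r_stop: float) -> float:
    """Solve r_genes = P * r_atg + (1 - P) * r_stop for P."""
    if r_atg == r_stop:
        raise ValueError("P is undetermined when the two ratios are equal")
    return (r_genes - r_stop) / (r_atg - r_stop)


# Values shown on the slide: Rgenes = 3.7, RATG = 6.4, Rstop = 1.
p = solve_p(3.7, 6.4, 1.0)
print(f"P = {p:.2f}")  # P = 0.50
```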