Transcript Slide
Origins and impact of constraints in evolution of gene families
Boris E. Shakhnovich and Eugene V. Koonin
Genome Research 2006, October 19
Stella Veretnik
Journal Club
November 14, 2006
Essential genes and their families:
diverge more slowly than non-essential genes
diverge to a greater extent than non-essential genes
Why does this happen? Which parameters are responsible? Unanswered.
paralogous families with essential genes: E-families
evolution through paralogy
tolerance to mutations -> extent of evolution within the family
paralogous families without essential genes: N-families
Definition of essential genes: genes that, when mutated, can result in a lethal phenotype.
Type of selection acting on evolving genes: purifying selection.
What is purifying selection?
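Purifying selection removes deleterious amino acid changes; it shows up as a ratio Ka/Ks < 1.
Ka is the number of nonsynonymous substitutions per nonsynonymous site.
Ks is the number of synonymous substitutions per synonymous site.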
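A minimal sketch of how the Ka/Ks ratio can be read as a selection regime. The function names, the tolerance band around 1, and the example values are illustrative assumptions, not taken from the paper.

```python
def ka_ks_ratio(ka: float, ks: float) -> float:
    """Return Ka/Ks; Ks must be positive for the ratio to be defined."""
    if ks <= 0:
        raise ValueError("Ks must be positive to form the ratio")
    return ka / ks


def selection_regime(ka: float, ks: float, tolerance: float = 0.05) -> str:
    """Classify a gene pair by its Ka/Ks ratio.

    The tolerance band around 1 is a hypothetical choice for this sketch;
    real analyses use a statistical test rather than a fixed band.
    """
    ratio = ka_ks_ratio(ka, ks)
    if ratio < 1 - tolerance:
        return "purifying"   # amino acid changes are selected against
    if ratio > 1 + tolerance:
        return "positive"    # amino acid changes are favored
    return "neutral"


# Hypothetical values: few nonsynonymous changes relative to synonymous ones.
print(selection_regime(ka=0.02, ks=0.4))
```

The lower the ratio, the stronger the purifying selection, which is the sense in which the E-family and N-family ratios are compared later in the slides.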
Fraction of essential genes that are not singletons
3.5%
1.9
1.3
13.7
Ratio of non-essential to essential genes in E-families
9.2%
18.4%
Most essential genes do not have paralogs. Why?
Is there something special about those which do have paralogs?
No answer in this paper…
How can a gene have paralogs and still be essential? All the paralogs together cannot replace all the functions of the essential gene.
Once they can, the gene becomes non-essential.
Divergence and diffusion graph.
Edges represent homology relationships
Significantly fewer edges between paralogs in E-families
How were the families assembled?
Construction of paralogous families.
Each ORF is a node on a graph.
1. Do an all-vs.-all BLAST comparison of the sequences of all translated ORFs within an organism
2. Measure the amino acid identity level between nodes
3. Translate amino acids back to nucleotides and calculate Ks (synonymous substitutions per site) and Ka (nonsynonymous substitutions per site)
The result is 3 weighted graphs (as defined by 1, 2, and 3).
A paralogous family consists of strongly connected components of the graph.
Cutoffs of Ks = 5 and E-value 1e-15 are used in this work.
In general there is a near-linear dependency of cutoff on Ks.
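The construction above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the input is a hypothetical list of precomputed pairwise hits, an edge is kept when both the E-value and Ks fall at or below their cutoffs (the direction of the Ks cutoff is an assumption), and the graph is treated as undirected, so components are found with a simple union-find. A family is labeled an E-family if it contains at least one essential gene.

```python
from collections import defaultdict

KS_CUTOFF = 5.0          # cutoff quoted on the slide
EVALUE_CUTOFF = 1e-15    # cutoff quoted on the slide


def paralogous_families(pairs, ks_cutoff=KS_CUTOFF, evalue_cutoff=EVALUE_CUTOFF):
    """Group ORFs into families from (orf_a, orf_b, evalue, ks) tuples."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, evalue, ks in pairs:
        root_a, root_b = find(a), find(b)  # registers both ORFs as nodes
        if evalue <= evalue_cutoff and ks <= ks_cutoff:
            parent[root_a] = root_b        # keep the edge: merge components

    groups = defaultdict(set)
    for orf in list(parent):
        groups[find(orf)].add(orf)
    # Singletons have no paralogs, so they are not reported as families.
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


def split_e_and_n(families, essential):
    """E-families contain at least one essential gene; N-families contain none."""
    e_fams = [f for f in families if any(orf in essential for orf in f)]
    n_fams = [f for f in families if not any(orf in essential for orf in f)]
    return e_fams, n_fams


# Hypothetical hits: (ORF, ORF, BLAST E-value, Ks)
hits = [
    ("A", "B", 1e-30, 1.2),
    ("B", "C", 1e-20, 4.0),
    ("D", "E", 1e-40, 0.5),
    ("C", "D", 1e-5, 2.0),    # fails the E-value cutoff
    ("F", "G", 1e-50, 7.0),   # fails the Ks cutoff
]
families = paralogous_families(hits)
e_families, n_families = split_e_and_n(families, essential={"B"})
print(e_families, n_families)
```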
Do non-essential members always evolve from essential members of the family?
Largest families
What is the typical size of an E-family and of an N-family?
Are N-families typically larger? Are there more N-families than E-families? Both?
Can a duplicate of a non-essential paralog become essential?
How paralogous families evolve:
After duplication and divergence the following may happen:
a. Nonfunctionalization: a duplicate turns into a pseudogene (a more typical scenario for N-families)
b. Subfunctionalization: multiple functions of the ancestral gene are divided between the paralogs
c. Neofunctionalization: one of the paralogs evolves a new function, the other keeps the old function(s)
(b and c are more common for E-families)
Purifying selection is about 2 times stronger in E-families: the Ka/Ks ratio is lower in E-families
Implication: N-families diverge faster…
How this is done:
1. For single feature polymorphisms (SFPs): check within Saccharomyces cerevisiae
2. For the Ka/Ks ratio: compare orthologs between closely related species (yeast: S. cerevisiae/S. paradoxus; E. coli: K12/CFT073)
The rate of conversion to pseudogenes is substantially higher in N-families
6.8-fold difference
Paralogs become fixed more often in N-families (explains the larger size of N-families?)
Equal rate of duplication in E-families and in N-families is assumed.
What happens to the paralogs that do not go to fixation?
Do they become pseudogenes, or something else?
Ks is higher in E-families than in N-families
Implication: paralogs in E-families stick around longer than those in N-families (3 times longer)
Sequence divergence is higher in E-families
nonsynonymous substitutions among paralogs within the family
sequence identity among paralogs within the family
It is possible to identify E- and N-families using only sequence divergence information.
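A rough sketch of how such a classifier can be evaluated with a ROC curve. The scores and labels below are made-up placeholders, not data from the paper; the score stands for any per-family divergence measure, with higher values assumed to point toward an E-family. The sketch uses the conventional ROC axes (false positive rate against true positive rate).

```python
def roc_points(scores, labels):
    """Return ROC points (false positive rate, true positive rate).

    labels: 1 for E-family, 0 for N-family.
    A family is called an E-family when its score is at or above the threshold.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one family of each type")
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume all families tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical divergence scores and family labels (1 = E-family).
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0]
points = roc_points(scores, labels)
print(round(auc(points), 3))
```

An area near 1 would mean divergence alone separates the two family types well; an area near 0.5 would mean it does no better than chance.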
ROC plot
(true positives)
Clustering coefficient measures how well connected the neighbors of a given node in a graph are.
(true negatives)
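For reference, the standard local clustering coefficient of a node can be computed as in the sketch below. Whether the paper uses exactly this variant is not stated on the slide, and the small graph is a made-up example.

```python
from itertools import combinations


def local_clustering(adjacency, node):
    """Fraction of pairs of a node's neighbors that are themselves connected.

    adjacency: dict mapping each node to the set of its neighbors
    (undirected graph, so every edge appears in both directions).
    """
    neighbors = adjacency[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # fewer than two neighbors: no pairs to check
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adjacency[u])
    return 2 * links / (k * (k - 1))


# Hypothetical family graph: a triangle A-B-C with D attached only to A.
graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
print(local_clustering(graph, "A"))
```

A family with few edges between its paralogs, as described for E-families, would tend to have low values of this coefficient.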
Transcriptional regulation of paralogs changes more in E-families: paralogs rarely share transcription factors
ChIP-chip experiments
Summary:
Two types of paralogous families exist: E-families and N-families
The two types of families have dramatically different dynamics of molecular evolution:
E-families diverge slowly but persist for long periods of time, so their paralogs diverge further than the paralogs in N-families
N-families undergo a more dynamic evolution: many duplicates become fixed, many others become pseudogenes. The level of sequence divergence is significantly lower.
Duplicates in E-families typically assume part of the functions of the original gene and/or evolve a new function.
This is less so with duplicates in N-families (no data shown for this…)
My musings:
In a minimalistic organism every gene would be an essential gene.
A gene becomes non-essential when its functions are assumed by another gene or split between several genes.
Every non-essential gene will go through the stage of being in an E-family in which there is one essential gene.
N-families gradually evolve from E-families, when the essential gene(s) in the family are no longer essential.
This happens when a sufficient number of duplicates exist to ensure that all functions of the original essential gene are covered.
In this scenario, the E-families are the transitional link for essential genes on their way to becoming non-essential.
(You could argue that a more robust organism has fewer essential genes…)
Essential genes
(singleton)
careful evolution
Transition to non-essentiality
(E-families)
very careful creeping forward
Non-essential genes
(N-families)
careless evolution
Different selection pressures in each category? – Yes.
But… how does the behavior of the family change once it crosses from E-family to N-family?