lecture22_Duplicatio..

families with >5 genes are more
common in plants than in animals
[Figure: bar chart of percentage of genes (0 to 100%) in families of 1, 2, 3-5, and >5 genes, for human, yeast, fruit fly, nematode, rice, and Arabidopsis]
adapted from Lockton S, Gaut BS. 2005. Trends Genet 21: 60-65
alternative splicing (AS) is more
common in animals than in plants
[Figure: alternative splicing (AS) in Arabidopsis and rice]
Boue S, et al. 2003. BioEssays 25: 1031-1034; Iida K, et al. 2004. Nucleic
Acids Res 32: 5096-5103; Kikuchi S, et al. 2003. Science 301: 376-379
duplications occur on any length scale,
from individual genes (where tandem
refers to a gene and its duplicate being
adjacent), to multi-gene segments of
the chromosome, to an entire genome
e.g. wild wheat is diploid 2n, domestication gave a tetraploid 4n (pasta) and a hexaploid 6n (bread)
synteny is when 2 or more genes are
found in the same order/orientation on
the chromosomes of related species
polyploidy (whole genome
duplication) events among plants
[Figure labels: dicot, monocot]
adapted from Blanc G, Wolfe KH. 2004. Plant Cell 16: 1667-1678;
Paterson AH, et al. 2004. Proc Natl Acad Sci USA 101: 9903-9908
phylogeny of the favored plants
there is extensive synteny among Gramineae but between
Gramineae and Arabidopsis there is essentially no synteny
[Figure: tree of sorghum, maize, barley, wheat, and rice (Gramineae, 55~70 Mya) and Arabidopsis (monocot-dicot split, 170~235 Mya)]
the duplication history of rice
every cDNA-defined gene is assigned a duplication category
using the methods of Yu J, et al. 2005. PLoS Biol 3: e38
1. analysis relies entirely on 19,079 full length cDNAs; had we
   used predicted genes instead many of the duplications would
   have been missed
2. a homolog pair refers to a cDNA and its TblastN match (i.e.
   comparisons done at amino acid level to genome translation
   in all 6 reading frames) at an expectation value of 1E-7 and
   requiring that >50% be aligned; note that the TblastN match
   is not necessarily expressed itself
3. if a gene has any homologs at all, the mean (median) number
   of homologs is 40 (5)
4. multiple duplications are difficult to analyze; so consider the
   cDNAs with 1-and-only-1 homolog
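The homolog definition in the list above can be sketched as a filter over TblastN hits. This is a minimal illustration, not the pipeline of Yu et al.: the `Hit` record, the locus names, and the example values are hypothetical. It assumes that self-matches have already been removed and that each hit is a single alignment whose aligned fraction is measured on the cDNA's protein length.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    """One TblastN alignment (hypothetical record layout)."""
    query: str     # cDNA identifier
    subject: str   # identifier of the matched genomic locus
    qstart: int    # alignment start on the query, in amino acids (1-based)
    qend: int      # alignment end on the query, inclusive
    evalue: float


def homologs(hits, query_len, max_evalue=1e-7, min_aligned=0.5):
    """Keep hits at E <= 1E-7 that cover more than 50% of the query."""
    found = defaultdict(set)
    for h in hits:
        aligned = (h.qend - h.qstart + 1) / query_len[h.query]
        if h.evalue <= max_evalue and aligned > min_aligned:
            found[h.query].add(h.subject)
    return dict(found)


def single_homolog_pairs(found):
    """cDNAs with one and only one homolog, as used for the dot plots."""
    return {q: next(iter(s)) for q, s in found.items() if len(s) == 1}


# Made-up hits for illustration only.
hits = [
    Hit("cdna1", "locusA", 1, 280, 1e-50),   # 280/300 aligned: kept
    Hit("cdna2", "locusB", 1, 90, 1e-20),    # 90/200 aligned: too short
    Hit("cdna3", "locusC", 10, 400, 1e-30),  # kept
    Hit("cdna3", "locusD", 5, 390, 1e-9),    # kept, so cdna3 has two homologs
    Hit("cdna4", "locusE", 1, 150, 1e-3),    # E-value too weak: dropped
]
query_len = {"cdna1": 300, "cdna2": 200, "cdna3": 420, "cdna4": 160}

found = homologs(hits, query_len)
pairs = single_homolog_pairs(found)
print(pairs)
```

Only `cdna1` survives as a 1-and-only-1 pair here; `cdna3` is set aside because it has two homologs.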
ONE whole genome duplication, a
recent segmental duplication, and
many individual gene duplications
[Figure labels: whole genome, recent segmental, individual genes; birth, death; axis: time]
18 pairs of duplicated segments
covering 65.7% of rice genome
higher order homologs used to backfill established trend lines
[Figure: Rice-Rice Comparison dot plot; x-axis: Rice Chr02 (Mb); legend: Rice Chr01 to Chr12; label: segmental]
ancient whole genome duplication (WGD) in rice
the plot is uninterpretable if we use cDNAs
with more than one homolog in rice
mean (median) number of homologs per duplicated gene is 40 (5)
[Figure: Rice-Rice Comparison dot plot; x-axis: Rice Chr02 (Mb); legend: Rice Chr01 to Chr12]
unmarked trend along diagonal
from tandem gene duplications
there were NO segmental duplications within a chromosome
[Figure: Rice-Rice Comparison dot plot; x-axis: Rice Chr01 (Mb); legend: Rice Chr01 to Chr12; labels: background, tandem]
computing molecular clocks and
indicators of evolutionary selection
Ka = non-synonymous changes per available site
Ks = synonymous changes per available site
available site corrects for the fact that 76% of single-base substitutions,
or 438 of 576 (64 codons × 9 changes each), encode a different amino acid
Ka/Ks < 1 is evidence of purifying selection
Ka/Ks = 1 is evidence of no selection (pseudogene)
Ka/Ks > 1 is evidence of adaptive selection
mean Ka/Ks is 0.20 in primates and 0.14 in rodents
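The 76% figure behind the "available site" correction can be checked by enumerating the standard genetic code. The sketch below counts every single-base change over all 64 codons (9 changes each, 576 in total) and how many of them alter the encoded amino acid. Treating stop codons as a separate "amino acid" is an assumption of this count, and the helper name `count_substitutions` is illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG;
# '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def count_substitutions(code=CODE):
    """Return (all single-base changes, changes that alter the product)."""
    total = changed = 0
    for codon, aa in code.items():
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                total += 1
                if code[mutant] != aa:
                    changed += 1
    return total, changed


total, changed = count_substitutions()
print(total, changed, round(100 * changed / total))  # 576 438 76
```

Because non-synonymous changes are roughly three times more available than synonymous ones, raw counts must be divided by the number of sites of each kind before Ka and Ks can be compared.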
from neutral substitution rate to
time since divergence of species
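Kumar S, Hedges SB. 1998. Nature 392: 917-920
[Figure: species1 and species2 diverging from a common ancestor]
time since divergence equals the sequence distance between
species1 and species2 divided by (2 × neutral substitution rate);
the factor of 2 counts the changes accumulating on both lineages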
neutral substitution rates vary with genes and evolutionary lineages but
on average they are 2.2×10^-9 for mammals and 6.5×10^-9 for Gramineae
(substitutions per site per year)
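As a worked example of the clock calculation, the sketch below divides a neutral distance by twice the substitution rate. The two rates are the averages quoted above; the distance of 0.65 substitutions per silent site is a made-up input chosen for round numbers, and the function name is illustrative.

```python
MAMMAL_RATE = 2.2e-9     # substitutions per site per year (average quoted above)
GRAMINEAE_RATE = 6.5e-9  # substitutions per site per year (average quoted above)


def divergence_time_years(distance, rate):
    """Time since divergence = distance / (2 * rate).

    `distance` is the neutral distance between the two sequences
    (e.g. Ks, substitutions per silent site); the factor of 2 accounts
    for changes accumulating independently on both lineages.
    """
    return distance / (2.0 * rate)


# Hypothetical distance of 0.65 subs per silent site between two grass genes.
t = divergence_time_years(0.65, GRAMINEAE_RATE)
print(f"{t / 1e6:.0f} million years")  # 50 million years
```

The same distance evaluated with the slower mammalian rate would give a date about three times older, which is why the lineage-specific rate matters.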
17 of 18 segments are attributable
to a whole genome duplication just
before the Gramineae divergence
[Figure: two histograms of Ks from K-Estimator; x-axis: subs per silent site, Ks. Left: Rice-Rice segmental duplication (higher order homologs), Ks from 0 to 1.5. Right: Rice-Rice tandem duplication (two TblastN hits are allowed), Ks from 0 to 0.6]
timing of WGD relative to Gramineae divergence is based on observed syntenies and not Ks
background duplications have Ks
signature like tandem duplications
except that they are more ancient
[Figure: two histograms of Ks from K-Estimator; x-axis: subs per silent site, Ks. Left: Rice-Rice tandem duplication (two TblastN hits are allowed), Ks from 0 to 0.6. Right: Rice-Rice background duplication (one and only one homolog), Ks from 0 to 3]
peak at zero Ks and exponential decay thereafter is indicative of ongoing duplication process
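One way to see why this shape points to an ongoing process: if duplicates arise at a steady rate and each is lost at a constant rate d per unit Ks, the number surviving at age Ks falls off as n0·exp(-d·Ks). The sketch below fits that curve to histogram counts by least squares on the log counts. The model, the bin centers, and the synthetic counts are assumptions for illustration and are not taken from the rice data.

```python
import math


def fit_exponential_decay(ks_centers, counts):
    """Fit log(count) = log(n0) - d * Ks by least squares; return (n0, d)."""
    pts = [(x, math.log(c)) for x, c in zip(ks_centers, counts) if c > 0]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic histogram: 400 pairs at Ks = 0, decaying at d = 5 per unit Ks.
centers = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55]
counts = [400 * math.exp(-5.0 * x) for x in centers]

n0, d = fit_exponential_decay(centers, counts)
half_life_ks = math.log(2) / d  # Ks at which half of the duplicates are gone
print(round(n0), round(d, 3), round(half_life_ks, 3))
```

A single burst of duplication, such as a whole genome duplication, would instead show up as a peak away from zero Ks.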
duplicated genes undergo periods
of relaxed selection and are usually
silenced within 4~17 million years
[Figure labels: progenitor gene; one copy left alone; one copy to modify; post-duplicative 'transient' of duration 4~17 million years; relaxed selection; reduced expression; novel function; eventual death]
hypothesis introduced by Lynch M, Conery JS. 2000. Science 290: 1151;
with details in Lynch M, Conery JS. 2003. J Struct Funct Genomics 3: 35
rice analysis succeeded only because
duplication is not too old
when the duplication is old: an analysis
from yeast comparing related genomes
with and without the duplication
Kellis M, et al. 2004. Proof and evolutionary analysis of ancient genome duplication in the
yeast Saccharomyces cerevisiae. Nature 428: 617-624
when the duplication is extremely new:
an analysis from human
Bailey JA, et al. 2002. Recent segmental duplications in the human genome. Science 297:
1003-1007
proof of whole genome duplication in
Saccharomyces cerevisiae by comparison
to sequence of Kluyveromyces waltii
[Figure labels: duplication, mutation, gene death]
interleaving genes from sister segments in comparison to K. waltii
gene and regional correspondences with K. waltii
ancient whole genome duplication in S. cerevisiae
identifying recent segmental
duplications in human assembly
whole genome shotgun (WGS) reads from Celera are aligned to the map-based genome from
IHGSC; recent segmental duplications are detected as anomalies in similarity and read depth
patterns of intra-chromosomal
and inter-chromosomal duplication
recent segmental duplications of length >10 kb and identity >95%; intra-chromosomal
(blue lines) and inter-chromosomal (red bars) duplication; unique regions surrounded
by intra-chromosomal duplications (gold bars) are hot spots for genomic disorders
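The thresholds in this caption translate directly into a filter over pairwise alignments. The sketch below is a minimal illustration with a hypothetical `Alignment` record and made-up values; it is not the detection method of Bailey et al., which also relies on read depth.

```python
from typing import NamedTuple


class Alignment(NamedTuple):
    """One pairwise genomic alignment (hypothetical record layout)."""
    chrom_a: str
    chrom_b: str
    length_bp: int
    identity: float  # fraction of identical bases, 0..1


def classify_duplications(alignments, min_len=10_000, min_identity=0.95):
    """Keep alignments longer than 10 kb with identity above 95%, and split
    them into intra-chromosomal and inter-chromosomal duplications."""
    result = {"intra": [], "inter": []}
    for a in alignments:
        if a.length_bp > min_len and a.identity > min_identity:
            key = "intra" if a.chrom_a == a.chrom_b else "inter"
            result[key].append(a)
    return result


# Made-up alignments for illustration only.
alignments = [
    Alignment("chr7", "chr7", 25_000, 0.98),   # intra-chromosomal
    Alignment("chr1", "chr9", 40_000, 0.96),   # inter-chromosomal
    Alignment("chr2", "chr2", 8_000, 0.99),    # too short
    Alignment("chr3", "chr5", 50_000, 0.90),   # too divergent
]
dups = classify_duplications(alignments)
print(len(dups["intra"]), len(dups["inter"]))  # 1 1
```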
recent segmental duplications
in IHGSC and Celera genomes
proportion of Celera aligned bases falls rapidly as identity exceeds 97% or length
exceeds 15-kb, but the total sequence lost is still only 2%~3%
NB: search of the map-based rice genome revealed no segmental duplications of
recent origin (Yu J, et al. 2006. Trends Plant Sci 11: 387-391)
“Although it is clear that the detailed clone-ordered
approach is superior in the resolution of segmental
duplications, it would be unrealistic to propose that
the sequencing community should abandon whole-genome-shotgun based approaches. These are the
most efficient cost-effective means of capturing the
bulk of the euchromatic sequence.”
Evan E. Eichler (21 October 2004)