The divergence of duplicate genes in Arabidopsis


The dynamics of nuclear gene order in the eukaryotes
Genome archaeology in the angiosperms
Todd Vision
Department of Biology
University of North Carolina at Chapel Hill
Comparative maps
Spaghetti Diagram
Livingstone et al 1999 Genetics 152:1183
Crop Circle
Gale & Devos 1998 PNAS 95:1972
Arabidopsis as a hub for plant
comparative maps
[Figure: bar chart of genome sizes in angiosperms, in megabases (axis 0 to 1000); bar values: 145, 262, 367, 367, 372, 415, 439, 473, 560, 622, 907]
data from Arumuganathan & Earle (1991) Plant Mol Biol Rep 9:208-218
Tomato-Arabidopsis synteny
Bancroft (2001) TIG 17, 89 after Ku et al (2000) PNAS 97, 9121
Outline
• Ancient genome duplication
– How can we reconstruct genomic history?
• Computational challenges
• Role of different classes of gene duplication
in genome evolution
Rice-Arabidopsis synteny
Mayer et al. (2001) Genome Res. 11, 1167
Paleotetraploidy?
The Arabidopsis Genome Initiative. 2000. Nature 408:796
Genomic dot-plot
Axes: Chromosome copy 1 and Chromosome copy 2, genes 1-8 on each
gene  1  2  3  4  5  6  7  8
1     1  0  0  0  1  0  0  0
2     0  1  0  0  0  1  0  0
3     0  0  1  0  0  0  1  0
4     0  0  0  1  0  0  0  1
5     1  0  0  0  1  0  0  0
6     0  1  0  0  0  1  0  0
7     0  0  1  0  0  0  1  0
8     0  0  0  1  0  0  0  1
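A minimal sketch of how a dot-plot like the one above can be built: a cell is 1 wherever two genes belong to the same family. The family labels and the function name `dot_plot` are hypothetical, chosen so that genes 1-4 are duplicated as genes 5-8.

```python
def dot_plot(families_a, families_b):
    """Return a 0/1 matrix with a 1 wherever gene i of A and gene j of B share a family."""
    return [[1 if fa == fb else 0 for fb in families_b] for fa in families_a]


# Hypothetical gene-family labels: genes 1-4 were duplicated to give genes 5-8.
chromosome = ["w", "x", "y", "z", "w", "x", "y", "z"]
matrix = dot_plot(chromosome, chromosome)

for row in matrix:
    print(" ".join(str(cell) for cell in row))
```

The main diagonal is the trivial self-match; the two off-diagonal runs are the signature of the duplicated segment.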
Duplication vs. multiplication
Multiple duplications generate abundant
overlaps among homeologous regions
Segmental paralogy in Arabidopsis
Vision et al. (2000) Science 290:2114-7.
Many duplicated segments but
few duplication events
[Figure: histogram of the frequency of blocks (0 to 12) by amino acid substitution (0 to 0.9), with age classes A to F]
Blanc, Hokamp, Wolfe (2003) Genome Res. 13, 137-144.
[Figure: rice, Arabidopsis, tomato]
Angiosperm Phylogeny Website. Version 2 August 2001. http://www.mobot.org/MOBOT/research/APweb/.
Block 37: after Asterid-Rosid split
Block 57: before monocot-dicot divergence
Raes, Vandepoele, Saeys, Simillion, Van de Peer (2003) J. Struct. Func. Genomics 3, 117-129
Divergence of homeologs
• Homeologs from age class C and older share less
than a third of their genes
– Gene loss
– Or subsequent gene movement?
• There is no evidence for uneven proportions of
duplicated genes between homeologs
Redundant gene function: SHATTERPROOF
Martin Yanofsky
Implications for comparative maps
• Networks of synteny
• Goodbye to pairwise comparisons
Ghosts and Muggles
Simillion, Vandepoele, Van Montagu, Zabeau, Van de Peer (2002) PNAS 99, 13627
Interspecies comparison can
reveal Ghosts
Things needful
• Identification of highly diverged Muggles
• A systematic way to identify Ghosts
• Centralization of mapped and sequenced
DNA markers from multiple species
FISH (Fast Identification of Segmental Homology)
• Identifies candidate segmental homologies
– Dynamic programming
• Statistically evaluates candidates
– Null model of transpositional duplication
• No permutations required
• Approaches limits to sensitivity
FISH under null model
k    observed number    standard error    upper bound    lower bound
2    45.8               0.06              47.6           40.1
3    2.28               0.02              2.39           1.78
4    0.113              0.003             0.120          0.079
5    0.006              0.001             0.006          0.004
6    0.0003             0.0002            0.0003         0.0002
eAssembler
• Reconstructs ancestral gene order by joining duplicated blocks with overlapping gene content
• Uses ‘breakpoint median’ as objective function
• Similar to algorithms used in sequence assembly
Blanc, Hokamp, Wolfe (2003) Genome Res. 13, 137-144.
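To illustrate the idea of a breakpoint-based objective, here is a small sketch that counts breakpoints between two gene orders and sums them over several orders. This is a generic, unsigned breakpoint count written for illustration; the function names are hypothetical, and it should not be read as how eAssembler itself is implemented.

```python
def breakpoints(order_a, order_b):
    """Count adjacencies of order_a that are absent from order_b (orientation ignored)."""
    adjacent_in_b = {frozenset(pair) for pair in zip(order_b, order_b[1:])}
    return sum(
        1 for pair in zip(order_a, order_a[1:]) if frozenset(pair) not in adjacent_in_b
    )


def median_score(candidate, observed_orders):
    """Total breakpoints between a candidate ancestral order and each observed order."""
    return sum(breakpoints(candidate, order) for order in observed_orders)


# Hypothetical gene orders over the same five genes.
ancestor = [1, 2, 3, 4, 5]
block_1 = [1, 2, 4, 3, 5]
block_2 = [5, 4, 3, 2, 1]

print(breakpoints(ancestor, block_1))
print(median_score(ancestor, [block_1, block_2]))
```

A breakpoint median would be the candidate order that minimizes `median_score`; searching for it is the hard part and is not attempted here.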
PHYTOME
integrating plant genome maps,
sequences and phylogenies
From www.plantgdb.org
Gene duplications in a
chromosomal context
• Turnover within gene families can be high
– Rate of duplication = 0.002 per gene per MY
– Half-life = 23 MY
• Three modes of duplication
– Tandem
– Transpositional
– Segmental
• How does the mode of origin affect the molecular
and functional divergence of duplicate genes?
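The half-life quoted above can be turned into a per-MY loss rate if one assumes that duplicates are lost at a constant rate (simple exponential decay). That assumption, and the function names, are introduced here for illustration only.

```python
import math


def loss_rate_from_half_life(half_life_my):
    """Per-MY loss rate implied by a half-life, assuming exponential decay."""
    return math.log(2) / half_life_my


def surviving_fraction(age_my, half_life_my):
    """Fraction of duplicates expected to survive to a given age under the same assumption."""
    return 0.5 ** (age_my / half_life_my)


rate = loss_rate_from_half_life(23.0)
print(f"loss rate: {rate:.3f} per MY")
print(f"surviving after 46 MY: {surviving_fraction(46.0, 23.0):.2f}")
```

Under this model a half-life of 23 MY corresponds to a loss rate of about 0.03 per MY, and a quarter of duplicates would survive two half-lives.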
Gene family turnover
Lynch and Conery (2000) Science 290, 1151
Importance of tandem and
transpositional duplications
~10% of genes are in tandem arrays
85% of dispersed duplications are not in blocks
• Duplicates on the same chromosome are 20%
more common than expected by chance
• Duplicates on the same chromosome are 86%
as distant as would be expected by chance
Aux/IAA and ARF sister families
• Importance in Arabidopsis
Diversification of the Aux/IAA gene family
David Remington and Jason Reed
Diversification of ARF gene family
Chromosome 2-4 complex:
242 duplicated gene pairs
[Figure: chromosome 4 (4.6 Mb; axis 2600 to 4200) plotted against chromosome 2 (5.6 Mb; axis 1200 to 2800); labels 52, 54, 45, 56, 49]
Substitutions in coding sequences
• silent substitutions (Ks) only alter the
codon, not the resulting amino acid
• replacement substitutions (Ka) alter the
amino acid
• Ka and Ks are standardized by the numbers
of synonymous and nonsynonymous sites
Ratio of Ka to Ks
Ka/Ks < 1: selective constraint
Ka/Ks = 1: pure neutrality
Ka/Ks > 1: positive selection
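The three cases above can be written as a small helper. This is only a sketch: the function name and the `tolerance` parameter are hypothetical, and in practice a departure of Ka/Ks from 1 would need a statistical test rather than a bare comparison.

```python
def classify_ka_ks(ka, ks, tolerance=0.0):
    """Map a Ka/Ks ratio onto the three regimes listed on the slide."""
    if ks <= 0:
        raise ValueError("Ks must be positive to form the ratio")
    ratio = ka / ks
    if ratio < 1 - tolerance:
        return "selective constraint"
    if ratio > 1 + tolerance:
        return "positive selection"
    return "pure neutrality"


# Hypothetical (Ka, Ks) values for three gene pairs.
for ka, ks in [(0.2, 1.0), (1.0, 1.0), (1.5, 1.0)]:
    print(ka, ks, classify_ka_ks(ka, ks))
```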
How have these ancient
segmental duplicates diverged?
1. What is the variation in Ka and Ks among
simultaneously duplicated pairs?
2. Do the Ka/Ks ratios suggest positive selection?
3. Do the members of each duplicated pair evolve
at the same rate?
[Figure: histogram of Ka (0 to 1) against frequency; coefficient of variation = 0.67]
[Figure: histogram of Ks (0 to 5) against frequency; coefficient of variation = 0.53]
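The coefficient of variation reported for each distribution is the standard deviation divided by the mean. A minimal sketch follows, assuming the sample standard deviation is used (the source does not say which estimator was applied); the input values are made up.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


# Hypothetical Ks estimates for a handful of duplicate pairs.
example = [1.0, 2.0, 3.0]
print(coefficient_of_variation(example))
```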
Relationship between Ka and Ks
[Figure: scatter plot of Ka (0 to 1) against Ks (0 to 5), with the line Ka/Ks = 1; r² = 0.558, p < 0.001]
Relative rate test
[Figure: tree of A, B and the outgroup O, with branch lengths d1, d2 and d3]
compare the fit of a model in which d2 = d3
with one in which they are allowed to vary
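One common way to compare two nested models like these is a likelihood ratio test with one degree of freedom, since the constrained model (d2 = d3) has one fewer free parameter. The sketch below assumes that framing and takes the two log-likelihoods as given; the function name and the numbers are hypothetical.

```python
import math


def likelihood_ratio_test(lnl_constrained, lnl_free):
    """Return (statistic, p-value) for nested models differing by one parameter."""
    statistic = max(0.0, 2.0 * (lnl_free - lnl_constrained))
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical log-likelihoods: d2 = d3 enforced versus d2 and d3 free.
statistic, p_value = likelihood_ratio_test(-1004.0, -1000.0)
print(f"statistic = {statistic:.2f}, p = {p_value:.4f}")
```

A small p-value indicates that letting d2 and d3 differ fits significantly better, that is, the two duplicates have evolved at unequal rates.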
Relative rate tests
• 105 gene pairs could be evaluated against an
outgroup
• >30 showed significantly unequal rates of evolution
• no evident chromosomal or regional biases
Distance measure    Significant pairs
protein             15
Ka                  29
Ks                  9
Are paralogs different than
orthologs?
• Homologous genes are either
– Paralogs that diverged through duplication
– Orthologs that diverged through speciation
• Paralogs must coexist in the same genome – do
they diverge differently as a result?
• Comparison to 212 Arabidopsis-Brassica
orthologs by Tiffin and Hahn (2002) JME 54, 746.
– For all pairs, Ka/Ks < 1
– Ka/Ks unimodal around 0.14 (as opposed to 0.20)
– CV(Ks)/CV(Ka) is approx. 2
Conclusions
• A network of synteny due to duplication and gene
loss makes deep comparative mapping difficult
• But phylogenetically-informed methods should
allow us to go much deeper than at present
• Only by going deep will we be able to understand
the varied roles of different kinds of duplication
events in the diversification of gene families
Acknowledgements
• Arabidopsis genome evolution
– Daniel Brown
– Steven Tanksley
• Comparative mapping
– Peter Calabrese
– Sugata Chakravarty
– Luke Huan
• Evolution of duplicated genes
– Liqing Zhang
– Brandon Gaut
– David Remington
– Jason Reed
• Support
– USDA
– NSF
Conservation of gene orientation
parallel
convergent
divergent
Formulating the problem in
terms of graph traversal
• nodes are matches
• edges are unidirectional
• edges have associated distances
The putative duplicated blocks
consist of the paths through the
graph that traverse edges with
short distances
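The graph formulation above lends itself to dynamic programming. The sketch below is a simplified illustration, not the FISH implementation: matches are (x, y) gene positions, an edge runs from one match to another that lies further along both chromosomes, the edge distance is taken to be the Manhattan distance, and only edges no longer than `max_dist` are followed. All of those choices and names are assumptions.

```python
def longest_chain(matches, max_dist):
    """Longest path through matches using only short, forward-pointing edges."""
    points = sorted(matches)
    if not points:
        return []
    best = [1] * len(points)      # best[i]: longest chain ending at points[i]
    prev = [None] * len(points)   # back-pointers for recovering the chain
    for b, (xb, yb) in enumerate(points):
        for a in range(b):
            xa, ya = points[a]
            forward = xa < xb and ya < yb
            short = (xb - xa) + (yb - ya) <= max_dist
            if forward and short and best[a] + 1 > best[b]:
                best[b] = best[a] + 1
                prev[b] = a
    end = max(range(len(points)), key=best.__getitem__)
    chain = []
    while end is not None:
        chain.append(points[end])
        end = prev[end]
    return chain[::-1]


# Hypothetical matches: a four-gene diagonal run plus two isolated dots.
matches = [(1, 5), (2, 6), (3, 7), (4, 8), (2, 1), (7, 3)]
print(longest_chain(matches, max_dist=2))
```

This handles only blocks in the same orientation; inverted blocks would need edges that run forward on one chromosome and backward on the other.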
Statistical framework
• Null model of duplications
– Single-gene duplication/random transposition
– Leads to uniformly distributed dots
• Null distribution for
– The edge distance between nearest neighbors
– The number of serially connected short edges
• Observed edge distances and path lengths
analytically compared to null expectation
• Can be approximated by a permutation test
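A permutation approximation to the null distribution can be sketched as follows. The choices here are assumptions made for illustration: the statistic is the number of pairs of matches within a Manhattan distance of `max_dist`, and the null is generated by shuffling the y-coordinates among matches, which scatters the dots while keeping each chromosome's marginal positions.

```python
import random


def count_short_edges(points, max_dist):
    """Number of pairs of matches separated by at most max_dist (Manhattan)."""
    count = 0
    for a in range(len(points)):
        for b in range(a + 1, len(points)):
            dx = abs(points[a][0] - points[b][0])
            dy = abs(points[a][1] - points[b][1])
            if dx + dy <= max_dist:
                count += 1
    return count


def permutation_p_value(points, max_dist, n_perm=999, seed=0):
    """Observed statistic and the share of shuffles at least as extreme."""
    rng = random.Random(seed)
    observed = count_short_edges(points, max_dist)
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(ys)
        if count_short_edges(list(zip(xs, ys)), max_dist) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical duplicated block: ten co-linear matches.
block = [(i, i + 50) for i in range(10)]
observed, p_value = permutation_p_value(block, max_dist=2)
print(observed, p_value)
```

The analytical null described on the slide avoids this resampling altogether; the permutation version is useful mainly as a check.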
Only a fraction of the genes are (still?) duplicated
Chr2 segment: 1183 genes
Chr4 segment: 1168 genes
326 duplicates (~28%)
271 (83%) pairwise duplications
Tandem substitutions
• correlation between Ka and Ks disappears
when tandem substitutions are excluded
• could be due to
– doublet mutations
– compensatory substitutions
[Figure: four panels, each showing Arabidopsis duplicate genes with an outgroup sequence, branch lengths and a p-value]
Panel titles: 49.5 calmodulin-binding protein; 49.62 beta-expansin; 49.63 NADH-ubiquinone oxidoreductase; 56.1 unknown transmembrane
Arabidopsis genes shown: AT4g31000, At2g18750, AT4g28250, At2g20750, At2g20800, AT4g30430, AT4g28220, At2g23810
Outgroups shown: tobacco 1698547, rice 8118436, potato 5734586, Hemerocallis 3551953