Genome evolution: a sequence
Genome Evolution. Amos Tanay 2010
Genome evolution
Lecture 10: Quantitative traits
Every meaningful evolutionary trait is ultimately quantitative
Continuous traits: weight, height, milk yield, growth rate
Categorical traits: Number of offspring, petals, ears on a stalk of corn
Threshold traits: disease (the underlying liability toward the trait)
F. Galton
Ultimately, fitness is a quantitative trait, so what is special about it?
Historically, research on genetics and directed selection was distinct from evolutionary theory
Currently, a quantitative approach to molecular evolution and population genetics is a major frontier in evolutionary research
The basic observation: heritability
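$\mathrm{Var}(x) = E(x^2) - E(x)^2$
$\mathrm{Cov}(x,y) = E(xy) - E(x)E(y)$
A linear fit $y \approx a + bx$ tries to minimize the mean square deviation:
$SS = E[(y - (a + bx))^2]$
$\frac{\partial SS}{\partial a} = -2E(y - a - bx) = -2(E(y) - a - bE(x)) = 0$
$\frac{\partial SS}{\partial b} = -2E(x(y - a - bx)) = -2[E(xy) - aE(x) - bE(x^2)] = 0$
Substituting $a = E(y) - bE(x)$ from the first condition into the second:
$[E(xy) - E(x)E(y)] - b[E(x^2) - E(x)^2] = 0$
$b = \frac{\mathrm{Cov}(x,y)}{\mathrm{Var}(x)}, \qquad r = \frac{\mathrm{Cov}(x,y)}{\sqrt{\mathrm{Var}(x)\mathrm{Var}(y)}}$
Heritability is defined from the regression of offspring on one parent:
$b = \tfrac{1}{2}h^2$
(the factor 1/2 appears because only one parent is considered)
This is the "narrow sense" heritability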
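The derivation above can be checked numerically. This is a minimal sketch, assuming paired lists of parent and offspring trait values; the helper names (`least_squares_fit`, `parent_offspring_h2`) and the example data are hypothetical. It uses the population (1/n) moments exactly as written in the derivation.

```python
def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    # Cov(x, y) = E(xy) - E(x)E(y), population (1/n) moments as in the derivation
    return mean([x * y for x, y in zip(xs, ys)]) - mean(xs) * mean(ys)

def least_squares_fit(xs, ys):
    """Return (a, b) minimizing SS = E[(y - (a + b x))^2]."""
    b = cov(xs, ys) / cov(xs, xs)   # b = Cov(x, y) / Var(x)
    a = mean(ys) - b * mean(xs)     # from dSS/da = 0
    return a, b

def parent_offspring_h2(parent, offspring):
    """Narrow-sense heritability from a single-parent regression: h^2 = 2b."""
    _, b = least_squares_fit(parent, offspring)
    return 2 * b

parent = [160, 170, 180, 190]
offspring = [165, 170, 175, 180]   # hypothetical: offspring = 85 + 0.5 * parent
print(least_squares_fit(parent, offspring), parent_offspring_h2(parent, offspring))
```

With a slope of one half on a single parent, the estimated $h^2$ is 1, i.e. all of the parental deviation that could be transmitted is transmitted.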
Artificial selection
[Figure: % oil in the selected corn population over the course of the experiment]
Over 100 years of an ongoing selection experiment
From 4.6% to 20.4% oil
What kinds of evolutionary dynamics allow for such a rapid increase in the trait?
Artificial selection
Selection can work by exploiting existing polymorphic sites or by fixing new mutations
SNP data suggest that at least 50 genes were involved in the corn selection
Theory suggests that alleles with strong effects should all be fixed rapidly (within about 20 generations)
Later, one should see fixation of alleles with smaller effects, or of new mutations
Reminder, theorem (Kimura): the mean time to fixation of a favored allele with selective advantage s in a population of size N is
$\bar{t} \approx \frac{2}{s}\ln(2N)$
Repetitive elements are one strong candidate source of new mutations
The corn population is tiny (size 60)
Selection is enhanced due to the threshold effect
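Kimura's approximation makes the "strong effects first" argument concrete. A minimal sketch, assuming only the formula quoted above; the function name is hypothetical and the selection coefficients are illustrative values, not estimates from the corn experiment.

```python
import math

def fixation_time(s, N):
    """Approximate mean time (generations) to fixation of a favored allele,
    t ~ (2/s) ln(2N), the Kimura approximation quoted on the slide."""
    if s <= 0 or N <= 0:
        raise ValueError("s and N must be positive")
    return (2.0 / s) * math.log(2 * N)

# Population size 60 as in the corn example; the s values are illustrative only.
for s in (0.5, 0.1, 0.01):
    print(f"s={s}: t ~ {fixation_time(s, 60):.1f} generations")
```

In a population of size 60 an allele with s = 0.5 is expected to fix in about 19 generations, while one with s = 0.01 needs several hundred, which is why weak-effect alleles and new mutations are expected to dominate later phases of selection.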
Limits to artificial selection
After some (variable) number of generations, artificial selection stops increasing the trait
One reason for that can be the exhaustion of polymorphism
This is frequently not the case: reversing the direction of selection is frequently shown to change the trait, meaning polymorphism is still present
Another reason for converging trait values is selection on other traits (fertility!)
By combining many alleles affecting the trait, artificial selection can reach trait values that are practically never observed in the original population
Not all traits can be artificially selected: in 1960, Maynard-Smith and Sondhi showed that they could not select for an asymmetric body plan in flies by choosing flies with an excess of dorsal bristles on the left side
This suggests that some traits are strongly stabilized
Artificial selection can proceed non-linearly, starting and stopping
A likely main reason for that is that recombination between tightly linked alleles takes time
J. Maynard-Smith
Truncation selection
[Figure: trait distribution of the population (mean M), of the selected parents (mean MS) and of their offspring (mean M’)]
Differential: S = MS - M
Response: R = M’ - M
The selection differential is generally larger than the selection response
One reason is that some of the selected individuals have high trait values due to non-genetic effects
Another reason is that the genotypes of the selected individuals are reshuffled by segregation and recombination before they reach the offspring
We redefine (realized) heritability as the ratio between the selection response and the selection differential:
$R = h^2 S$
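Realized heritability is plain arithmetic on three means. A minimal sketch, assuming the three means are already measured; the function names and the numbers in the example are hypothetical.

```python
def truncation_response(M, M_S, M_offspring):
    """Return (S, R, realized h^2 = R / S) from the three means."""
    S = M_S - M            # selection differential
    R = M_offspring - M    # selection response
    if S == 0:
        raise ValueError("no selection: the differential is zero")
    return S, R, R / S

def predicted_response(h2, S):
    """R = h^2 S: predicted response for a given heritability and differential."""
    return h2 * S

# Hypothetical means: population 10, selected parents 14, their offspring 11.
S, R, h2 = truncation_response(10.0, 14.0, 11.0)
print(S, R, h2)
```

Here the parents were 4 units above the mean but their offspring only 1 unit, giving a realized $h^2$ of 0.25.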
Back to genetics: two loci
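Assume an additive selected trait (entries: trait value (genotype frequency), with p(A) = p(B) = 1/2):
        BB         Bb         bb
AA   4 (1/16)   3 (2/16)   2 (1/16)
Aa   3 (2/16)   2 (4/16)   1 (2/16)
aa   2 (1/16)   1 (2/16)   0 (1/16)
Generally: M = 2(p(A) + p(B)), here M = 2
Selecting classes 0 and 1: MS = 0.8
After selection: p(A) = p(B) = 0.2
Yielding (random mating among the selected individuals): M’ = 0.8 and $h^2 = (M' - M)/(M_S - M) = 1$

Assume a dominant selected trait (same genotype frequencies):
        BB         Bb         bb
AA   4 (1/16)   4 (2/16)   2 (1/16)
Aa   4 (2/16)   4 (4/16)   2 (2/16)
aa   2 (1/16)   2 (2/16)   0 (1/16)
Here M = 3
Selecting classes 2 and 0: MS = 12/7
After selection: p(A) = p(B) = 2/7
Yielding: M’ = 96/49 and $h^2 = (96/49 - 3)/(12/7 - 3) = 17/21 \approx 0.81$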
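Both two-locus examples can be reproduced exactly with rational arithmetic. A minimal sketch, assuming Hardy-Weinberg frequencies before selection and, for the offspring, random mating among the selected with the two loci treated as unlinked and in equilibrium (the same assumption that gives M’ = 2(p(A) + p(B)) in the additive case). The helper names `hw`, `additive`, `dominant` and `truncation_two_loci` are hypothetical.

```python
from fractions import Fraction
from itertools import product

def hw(p):
    """Hardy-Weinberg frequencies keyed by the number of capital alleles (0, 1, 2)."""
    q = 1 - p
    return {2: p * p, 1: 2 * p * q, 0: q * q}

def additive(n):
    return n                    # each capital allele adds 1 to the trait

def dominant(n):
    return 2 if n > 0 else 0    # one capital allele gives the full effect

def truncation_two_loci(effect, selected_classes, p=Fraction(1, 2)):
    """Return (M, MS, p(A), p(B), M', h^2) for truncation selection on two loci."""
    f = hw(p)
    genotypes = [(nA, nB, f[nA] * f[nB], effect(nA) + effect(nB))
                 for nA, nB in product((0, 1, 2), repeat=2)]
    M = sum(w * v for _, _, w, v in genotypes)
    chosen = [g for g in genotypes if g[3] in selected_classes]
    W = sum(w for _, _, w, _ in chosen)
    MS = sum(w * v for _, _, w, v in chosen) / W
    pA = sum(w * nA for nA, _, w, _ in chosen) / (2 * W)
    pB = sum(w * nB for _, nB, w, _ in chosen) / (2 * W)
    # Offspring: random mating, loci assumed unlinked and in equilibrium
    fA, fB = hw(pA), hw(pB)
    M_next = sum(fA[i] * fB[j] * (effect(i) + effect(j))
                 for i, j in product((0, 1, 2), repeat=2))
    return M, MS, pA, pB, M_next, (M_next - M) / (MS - M)

print(truncation_two_loci(additive, {0, 1}))
print(truncation_two_loci(dominant, {0, 2}))
```

Using `Fraction` keeps the results exact (4/5, 12/7, 96/49, 17/21) rather than rounded floats, which makes the comparison with the slide direct.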
Continuous traits: regression of alleles and phenotypes
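We now assume each genotype has a distribution of trait values
The variability may be a consequence of environmental factors or other loci
[Figure: phenotype distributions of genotypes A’A’, AA’ and AA centered at m - a, m + d and m + a; threshold T; population mean M; mean of the selected individuals MS; selected area B; density at the threshold Z; $(M_S - M)/\sigma^2 = Z/B$]
$m_{AA} = m + a, \quad m_{AA'} = m + d, \quad m_{A'A'} = m - a$
m: mean, $m = \frac{m_{AA} + m_{A'A'}}{2}$
a: additivity, $a = \frac{m_{AA} - m_{A'A'}}{2}$
d: dominance, $d = m_{AA'} - m$
With p the frequency of A and q = 1 - p, the population mean is
$M = p^2(m+a) + 2pq(m+d) + q^2(m-a) = m + (p-q)a + 2pqd$
Cov(phenotype, number of A alleles):
$2pm + 2p^2a + 2pqd - [m + (p-q)a + 2pqd](2p) = 2pqa + 2pq(q-p)d$
Var(number of A alleles):
$4p^2 + 2pq - (2p)^2 = 2pq$
Regression slope of phenotype on the number of A alleles:
$b = \frac{2pqa + 2pq(q-p)d}{2pq} = a + (q-p)d$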
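The closed form for the slope can be checked by enumerating the three genotypes directly. A minimal sketch, assuming Hardy-Weinberg frequencies and using genotype means in place of individual phenotypes (zero-mean environmental noise does not change the covariance with allele count). The function names and parameter values are hypothetical.

```python
def genotype_table(p, m, a, d):
    """(number of A alleles, HW frequency, genotype mean) for A'A', AA', AA."""
    q = 1 - p
    return [(0, q * q, m - a), (1, 2 * p * q, m + d), (2, p * p, m + a)]

def population_mean(p, m, a, d):
    q = 1 - p
    return m + (p - q) * a + 2 * p * q * d

def regression_on_allele_count(p, m, a, d):
    """Slope Cov(phenotype, n) / Var(n), computed by direct enumeration."""
    table = genotype_table(p, m, a, d)
    e_n = sum(f * n for n, f, _ in table)
    e_y = sum(f * y for _, f, y in table)
    cov = sum(f * n * y for n, f, y in table) - e_n * e_y
    var = sum(f * n * n for n, f, _ in table) - e_n ** 2   # equals 2pq
    return cov / var

p, m, a, d = 0.3, 10.0, 1.0, 0.5
print(regression_on_allele_count(p, m, a, d), a + (1 - p - p) * d)
```

For p = 0.3 both the enumeration and $a + (q-p)d$ give a slope of 1.2.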
Continuous traits: truncation selection
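[Figure: phenotype distributions of genotypes AA, AA’ and A’A’ with threshold T, population mean M, mean of the selected individuals MS, selected area B and density at the threshold Z; $(M_S - M)/\sigma^2 = Z/B$]
Selecting on a threshold T over the mean of the population
With T measured from m, the threshold relative to each genotype's own mean is:
T - a relative to the AA normal distribution
T - d relative to the AA’ normal distribution
T + a relative to the A’A’ normal distribution
The "fitness" of a genotype is its area beyond the threshold, so relative fitnesses are ratios between these areas
Assuming small differences, the areas between the shifted thresholds are nearly rectangular, with height Z:
$w_{11} - w_{12} \approx Z((T-d) - (T-a)) = Z(a-d)$
$w_{12} - w_{22} \approx Z((T+a) - (T-d)) = Z(a+d)$
where $m_{AA} = m + a$, $m_{AA'} = m + d$, $m_{A'A'} = m - a$, $a = \frac{m_{AA} - m_{A'A'}}{2}$, $d = m_{AA'} - m$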
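The rectangle approximation can be compared against exact normal tail areas. A minimal sketch, assuming every genotype has a normal distribution with the same standard deviation sigma and taking Z as the density of N(0, sigma) at T; the function names and the parameter values are hypothetical. `statistics.NormalDist` is in the Python standard library (3.8+).

```python
from statistics import NormalDist

def genotype_fitnesses(T, a, d, sigma=1.0):
    """Exact areas beyond the threshold for AA, AA', A'A' (T measured from m)."""
    env = NormalDist(0.0, sigma)   # within-genotype spread, assumed equal for all
    def tail(x):
        return 1.0 - env.cdf(x)
    return tail(T - a), tail(T - d), tail(T + a)

def rectangle_approximation(T, a, d, sigma=1.0):
    """Slide approximation: w11 - w12 ~ Z(a - d), w12 - w22 ~ Z(a + d)."""
    Z = NormalDist(0.0, sigma).pdf(T)   # density at the threshold
    return Z * (a - d), Z * (a + d)

w11, w12, w22 = genotype_fitnesses(T=1.0, a=0.02, d=0.01)
approx = rectangle_approximation(T=1.0, a=0.02, d=0.01)
print(w11 - w12, approx[0])
print(w12 - w22, approx[1])
```

For genotype effects that are small relative to sigma the two agree to within a few percent; the error grows as a and d become comparable to the within-genotype spread.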
Allele frequency change
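[Figure: phenotype distributions of genotypes AA, AA’ and A’A’ with threshold T, selected area B and density at the threshold Z; $(M_S - M)/\sigma^2 = Z/B$]
$w_{11} - w_{12} \approx Z((T-d) - (T-a)) = Z(a-d)$
$w_{12} - w_{22} \approx Z((T+a) - (T-d)) = Z(a+d)$
It can be generally shown (compute!) that:
$\Delta p = pq[p(w_{11} - w_{12}) + q(w_{12} - w_{22})]/\bar{w}$
Average fitness is the area B:
$\Delta p \approx pq[pZ(a-d) + qZ(a+d)]/B$
$\Delta p \approx (Z/B)\,pq\,[a + (q-p)d]$
The three factors are the selection intensity (Z/B), the allele frequency term (pq) and the phenotype-to-genotype regression ($a + (q-p)d$)
where $m_{AA} = m + a$, $m_{AA'} = m + d$, $m_{A'A'} = m - a$, $a = \frac{m_{AA} - m_{A'A'}}{2}$, $d = m_{AA'} - m$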
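The "compute!" step can be verified numerically: one generation of viability selection under random mating gives exactly the $\Delta p$ formula, and substituting the threshold fitness differences gives the regression form. A minimal sketch; the function names and the fitness values are hypothetical.

```python
def delta_p_exact(p, w11, w12, w22):
    """p' - p after one generation of selection on genotype fitnesses (random mating)."""
    q = 1 - p
    w_bar = p * p * w11 + 2 * p * q * w12 + q * q * w22
    p_next = (p * p * w11 + p * q * w12) / w_bar
    return p_next - p

def delta_p_formula(p, w11, w12, w22):
    """Delta p = pq[p(w11 - w12) + q(w12 - w22)] / w_bar."""
    q = 1 - p
    w_bar = p * p * w11 + 2 * p * q * w12 + q * q * w22
    return p * q * (p * (w11 - w12) + q * (w12 - w22)) / w_bar

def delta_p_threshold(p, a, d, Z, B):
    """Delta p ~ (Z/B) pq [a + (q - p) d]."""
    q = 1 - p
    return (Z / B) * p * q * (a + (q - p) * d)

print(delta_p_exact(0.3, 0.5, 0.4, 0.2), delta_p_formula(0.3, 0.5, 0.4, 0.2))
```

The last function is algebraically identical to `delta_p_formula` once $w_{11} - w_{12} = Z(a-d)$, $w_{12} - w_{22} = Z(a+d)$ and $\bar{w} = B$ are plugged in.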
Mean Phenotype
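[Figure: phenotype distributions of genotypes AA, AA’ and A’A’ with threshold T, selected area B and density at the threshold Z; $(M_S - M)/\sigma^2 = Z/B$]
$\Delta p \approx (Z/B)\,pq\,[a + (q-p)d]$ (selection intensity × allele frequency × phenotype-to-genotype regression)
$M' = (p+\Delta p)^2(m+a) + 2(p+\Delta p)(q-\Delta p)(m+d) + (q-\Delta p)^2(m-a)$
$\approx p^2(m+a) + 2pq(m+d) + q^2(m-a) + 2[a + (q-p)d]\Delta p$ (neglecting terms in $\Delta p^2$)
$= M + 2[a + (q-p)d]\Delta p$
$M' - M = (Z/B)\,2pq[a + (q-p)d]^2 = (M_S - M)\,2pq[a + (q-p)d]^2/\sigma^2$
$R = S \cdot 2pq[a + (q-p)d]^2/\sigma^2$
$h^2 = 2pq[a + (q-p)d]^2/\sigma^2$
Interpretation: the proportion of phenotypic variance that can be explained by the regression of phenotype on genotype (the additive effects of alleles)
where $m_{AA} = m + a$, $m_{AA'} = m + d$, $m_{A'A'} = m - a$, $a = \frac{m_{AA} - m_{A'A'}}{2}$, $d = m_{AA'} - m$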
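The linearization step can be checked by comparing the exact change in the mean with its first-order term. A minimal sketch; the function names and parameter values are hypothetical, and sigma2 stands for the total phenotypic variance $\sigma^2$.

```python
def mean_phenotype(p, m, a, d):
    """M = m + (p - q) a + 2pqd for genotype means m + a, m + d, m - a."""
    q = 1 - p
    return m + (p - q) * a + 2 * p * q * d

def linear_response(p, a, d, dp):
    """First-order change in the mean: M' - M ~ 2[a + (q - p) d] dp."""
    q = 1 - p
    return 2 * (a + (q - p) * d) * dp

def narrow_sense_h2(p, a, d, sigma2):
    """h^2 = 2pq[a + (q - p) d]^2 / sigma^2."""
    q = 1 - p
    return 2 * p * q * (a + (q - p) * d) ** 2 / sigma2

p, m, a, d, dp = 0.4, 0.0, 1.0, 0.5, 0.01
exact = mean_phenotype(p + dp, m, a, d) - mean_phenotype(p, m, a, d)
print(exact, linear_response(p, a, d, dp))
```

The exact change differs from the linear term by exactly $-2d\,\Delta p^2$, which is the quantity neglected in the derivation; it vanishes for a purely additive locus (d = 0).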
Heritability
[Figure: narrow-sense heritability (axis 0.1 to 0.7) of traits in cattle, poultry, swine and maize: withers height, % protein in milk, body length, body weight, back fat, egg weight, ear weight, albumen content, feed efficiency, sexual maturity, milk yield, daily gain, litter size, plant height, eggs/hen, ear number, yield, calving interval]
Human:
Stature: 0.85
Weight: 0.62
Handedness: 0.31
Fertility: 0.1-0.2
Genome wide association studies for Stature
The standard approach extends what we have shown above to small pedigrees and looks for linkage of QTL
GWAS use large cohorts (63,000 individuals in this example) of unrelated individuals
Simulations of detection power:
“..What have we learned about the nature of quantitative trait variation for height from these studies? At a
first glance it looks quite simple: variation is explained by many variants of small effects, with no evidence
for interactions between alleles, either within loci (dominance) or between loci (epistasis), and there are no
strong differences in effects between males and females. These observations are consistent with patterns
of familial resemblance for height. However, given the design and analysis used, there was little statistical
power to find evidence for departures from this simple model. Not surprisingly, given the small effect sizes
found, there was no significant overlap between the location of the associated variants and previously
reported loci from linkage studies. It remains a challenge to reconcile the findings of GWAS and linkage
studies, because the former suggest individual variants with small effects, whereas the latter suggest
genomic regions with large effects within pedigrees.”
Genetic analysis of genome-wide variation in human gene expression (Morley et al. 2004)
14 CEPH families (of ~8 members each)
3554 genes with variable expression (in lymphoblastoid cells)
2756 SNPs (just a few!)
Alternatively: 94 unrelated CEPH grandfathers
Testing linkage between expression and SNPs in the large family trees yields linkage for ~1000 phenotypes
The test on families uses the genealogical structure (SIBPAL - http://darwin.cwru.edu/)
The alternative test on unrelated individuals uses simple correlation between expression and each individual's genotype coded as 0, 1 or 2
Difficulties: multiple testing vs low resolution
Reporting on loci that are linked with many
QTLs
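The unrelated-individual test reduces to correlating expression with genotype dosage. A minimal sketch, assuming genotypes coded as 0, 1 or 2 copies of one allele and using the standard t-statistic for a Pearson correlation; this is not the SIBPAL family-based analysis, and the function names and data are hypothetical.

```python
import math

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def association_test(genotypes, expression):
    """Correlate expression with genotype dosage (0, 1, 2).
    Returns (r, t), with t = r * sqrt((n - 2) / (1 - r^2)) on n - 2 d.f.
    (undefined if |r| == 1)."""
    r = pearson(genotypes, expression)
    n = len(genotypes)
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return r, t

genotypes = [0, 0, 1, 1, 2, 2]                 # hypothetical individuals
expression = [1.0, 1.2, 2.1, 1.9, 3.0, 3.1]    # hypothetical expression levels
print(association_test(genotypes, expression))
```

Run over every gene and SNP in the design above (3554 × 2756 = 9,794,824 pairs), the number of tests is what makes multiple testing the main difficulty.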
Genetic analysis of radiation-induced changes in human gene expression (Smirnov 2009)
15 CEPH families (of ~8 members each)
Low resolution (~3000 SNPs) and high resolution (HapMap SNPs)
3280 responding genes, measured at different time points during irradiation
Follow-up molecular biology experiments
Variability in B-cell response to irradiation
Mapped eQTL
Genetic Dissection of Transcriptional Regulation in Budding Yeast (Brem et al 2002)
Crossing two budding yeast strains
Full genotyping and expression profiling (later in different conditions)
Hundreds of variably expressed genes
The compact yeast genome helps in deciding linkage
Using the well-characterized biology of yeast helps explain linkage
Identifying regulatory mechanisms using individual variation reveals key role for chromatin
modification (Lee et al 2008)
Building association to groups of genes instead of single genes (Litvin et al 2009)
©2009 by National Academy of Sciences
Schadt EE et al. 2005 (and many publications following it)
R – expression
L – locus genotype
C - phenotype
Looking for gene expression traits that explain QTLs: expression stands between genetic loci and the disease trait of interest
Applied to obesity linkage (in mice)
Further developments use more data (not just expression), or gene subnetworks
Ultimate goal is to build a model explaining
phenotype by genotype through molecular
phenotypes
Positive correlation
suggests linked eQTLs
Correlation between genetic
distance and correlation
suggests LD effect
Possible modes of causality or interaction
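One way to see how the modes of causality can be told apart is a conditional-independence check. A minimal sketch, not the method of Schadt et al.: it swaps in a simple partial correlation, assuming linear effects and simulated data (the `simulate` helper, effect sizes and sample size are hypothetical). If L → R → C, then C carries no information about L once R is known, so the partial correlation of L and C given R is near zero; if L acts on R and C independently, it is not.

```python
import math
import random

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(x, y, z):
    """Correlation of x and y after linearly removing z from both."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

def simulate(model, n=20000, seed=1):
    """Hypothetical data: L = genotype dosage, R = expression, C = phenotype."""
    rng = random.Random(seed)
    L = [rng.randint(0, 1) + rng.randint(0, 1) for _ in range(n)]
    if model == "causal":          # L -> R -> C
        R = [l + rng.gauss(0, 1) for l in L]
        C = [r + rng.gauss(0, 1) for r in R]
    elif model == "independent":   # L -> R and L -> C
        R = [l + rng.gauss(0, 1) for l in L]
        C = [l + rng.gauss(0, 1) for l in L]
    else:
        raise ValueError(model)
    return L, R, C

for model in ("causal", "independent"):
    L, R, C = simulate(model)
    print(model, round(partial_corr(L, C, R), 3))
```

In real data, linkage disequilibrium and measurement noise in R both blur this distinction, which is one reason the published approaches compare full models rather than a single statistic.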
Direct quantitative trait locus mapping of mammalian metabolic phenotypes
in diabetic and normoglycemic rat models (Dumas et al. 2007)
Crossing two rat strains: diabetic and normal
2000 microsatellite and SNP markers
Using NMR to perform metabolic profiling – looking for linkage explaining metabolic abnormalities
(a) The horizontal axis shows the frequency
from the NMR spectrum expressed as
chemical shift from right to left (δ, ppm).
The vertical axis indicates genetic locations
(cM) on chromosomes 1 to X. The lod
scores between each genotype and each
metabolite are color coded. Significant
linkages between genomic locations and
regions of the plasma NMR profile are
present in the aliphatic region (0.5 to 4.5
ppm) and the aromatic region (>5.5 ppm).
Resonances corresponding to the
anesthetics and their degradation products
were withdrawn as described in Methods.
(b,c) Genome-wide linkage mapping
across the full metabonomic spectrum for
marker D14Wox10 (b) and linkage data
across the genome for the metabolite 7.86
(c).
Understanding mechanisms underlying human gene expression
variation with RNA sequencing
JK Pickrell et al. Nature 000, 1-5 (2010) doi:10.1038/nature08872
Loci affecting isoform expression.
“..we identified more than a thousand genes at which genetic variation
influences overall expression levels or splicing..”
Phenotype variation: back to evolutionary theory
Phenotypes in natural environment can be modeled as a combination of genotype and
environmental effects:
P = G + E
More carefully, the genotype effect on phenotype may be a function of the environment,
and the additive form may be wrong
For example, gene expression of stress related genes depends on the genotype
differently for different stresses
Understanding QTL evolution
Mapping phenotypes to QTL