Quantitative Genetics in the
Age of Genomics
Classical Quantitative Genetics
• Quantitative genetics deals with the observed variation in a
trait both within and between populations
• Basic model (Fisher 1918): The phenotype (z) is the sum of
(unseen) genetic (g) and environmental values (e)
• z = g + e
• The genetic value needs to be further decomposed into an
additive part (A) passed from parent to offspring, separate
from dominance (D) and epistatic effects (I) that are only
fully passed along in clones
• g = A + D + I
• Var(g)/Var(z) is a quantitative measure of nature vs. nurture
– fraction of all trait variation due to genetic differences
Fisher’s great insight: Phenotypic covariances between
relatives can estimate the variances of g, e, etc.
• For example, in the simplest settings,
– Cov(parent,offspring) = Var(A)/2
– Cov(Full sibs) = Var(A)/2 + Var(D)/4
– Cov(clones) = Var(g) = Var(A)+Var(D)+Var(I)
• Random-effects model
– Interest is in estimating variances
• Thus, in classical quantitative genetics, a few
statistical descriptors describe the underlying
complex genetics
– This leaves an uneasy feeling among most of my molecular
colleagues.
– Does the age of genomics usher in the death knell of
Quantitative Genetics?
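The covariances between relatives listed on this slide can be inverted to recover the variance components. The following is a minimal sketch, assuming the simplest setting on the slide (no shared environmental effects, and the covariance expressions exactly as given). The function name and the example covariances are hypothetical, chosen only for illustration.

```python
def variance_components(cov_po, cov_fs, cov_clones, var_z):
    """Invert the simplest-case covariances between relatives.

    Cov(parent, offspring) = Var(A)/2
    Cov(full sibs)         = Var(A)/2 + Var(D)/4
    Cov(clones)            = Var(g) = Var(A) + Var(D) + Var(I)
    """
    var_a = 2.0 * cov_po                # additive variance
    var_d = 4.0 * (cov_fs - cov_po)     # dominance variance
    var_g = cov_clones                  # total genetic variance
    var_i = var_g - var_a - var_d       # epistatic variance (remainder)
    return {
        "Var(A)": var_a,
        "Var(D)": var_d,
        "Var(I)": var_i,
        "Var(g)": var_g,
        "Var(e)": var_z - var_g,
        "Var(g)/Var(z)": var_g / var_z,
    }


# Hypothetical phenotypic covariances (not data from the talk)
est = variance_components(cov_po=20.0, cov_fs=25.0, cov_clones=65.0, var_z=100.0)
for name, value in est.items():
    print(f"{name:14s} {value:6.2f}")
```

In practice the covariances are themselves estimates with sampling error, so a component obtained as a difference (such as Var(I) here) can be noisy or even negative.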
Approximate costs of genome projects
• Arabidopsis Genome Project
... $500 million
• Drosophila Genome Project
... $1 billion
• Human Genome Project
... $10 billion
• Working knowledge of multivariate
statistics
... Priceless
Model systems
Euchloe guaymasensis
Neoclassical Quantitative Genetics
• Use information from both an individual’s
phenotype (z) and marker genotype (m)
• z = u + Gm + g + e
– Gm is the genotypic value associated with the scored genotype (m)
– Obvious extensions: include Gm x e and Gm x g
• Mixed model: can treat Gm as fixed
effects; g and e as random
• My molecular colleagues hope that Gm accounts for
most of the variance in the trait
– If true, then Var(g)/Var(z) is trivial
Limitations on Gm
• The importance of particular genotypes may be quite
fleeting
– can easily change as populations evolve and as the biotic and
abiotic environments change
– If epistasis and/or genotype-environment interactions are
significant, any particular genotype may be a good, but not
exceptional, predictor of phenotype
• Quantitative genetics provides the machinery
necessary for managing all this uncertainty in the
face of some knowledge of important genotypes
– e.g., proper accounting of correlations between relatives in
the unmeasured genetic values (g)
The importance of even rather imperfect
marker information
• Suppose an F1 is segregating favorable alleles at n loci, and
we inbreed to fixation before selecting among pure lines
– Pr (fixation of favorable allele) = 1/2 at each locus
• What is the required number of lines for Pr (at least one
line fixed for n favorable alleles) = 0.9?
• For n = 10: 2,360 lines
• For n = 20: 2,400,000 lines
• Suppose marker information increases the probability of
fixation by 50% (to 0.75)
• Required number of lines for Prob(at least one line fixed for
n favorable alleles) = 0.9
• For n = 10: 40 lines (60-fold reduction)
• For n = 20: 725 lines (3,300-fold reduction)
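The line counts above follow from a short calculation. With independent loci, a line is fixed for all n favorable alleles with probability p^n, so the smallest N with 1 - (1 - p^n)^N >= 0.9 is N = ln(0.1) / ln(1 - p^n), rounded up. The sketch below assumes independent loci and independent lines; the function name is hypothetical.

```python
import math


def lines_required(n_loci, p_fix=0.5, target=0.9):
    """Smallest number of lines N such that
    Pr(at least one line fixed for all n favorable alleles) >= target."""
    p_all = p_fix ** n_loci  # Pr(a single line is fixed at every locus)
    # log1p(-p_all) = ln(1 - p_all), numerically stable for tiny p_all
    return math.ceil(math.log(1.0 - target) / math.log1p(-p_all))


for n in (10, 20):
    unaided = lines_required(n, p_fix=0.5)
    with_markers = lines_required(n, p_fix=0.75)
    print(f"n = {n}: {unaided:,} lines unaided, {with_markers:,} with markers, "
          f"about {unaided / with_markers:,.0f}-fold reduction")
```

The exact values agree with the rounded figures on the slide (for n = 10, about 2,357 lines unaided versus 40 with marker information).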
How do we obtain Gm?
• Ideally, we screen a number of candidate loci
• QTL (Quantitative trait locus) mapping
• Uses molecular markers to follow which chromosome
segments are common between individuals
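• This allows construction of a likelihood function, e.g.,

\[
\ell(\mathbf{z} \mid \boldsymbol{\mu}, \sigma^2_A, \sigma^2_{A^*}, \sigma^2_e)
  = \frac{1}{\sqrt{(2\pi)^n \, |\mathbf{V}|}}
    \exp\left[ -\frac{1}{2} (\mathbf{z} - \boldsymbol{\mu})^T \mathbf{V}^{-1} (\mathbf{z} - \boldsymbol{\mu}) \right]
\]

with \sigma^2_A the estimated QTL effect and \sigma^2_{A^*} the background genetic effects, where

\[
\mathbf{V} = \mathbf{R}\,\sigma^2_A + \mathbf{A}\,\sigma^2_{A^*} + \mathbf{I}\,\sigma^2_e
\]

and R is estimated from marker information, while A is known from pedigree relationships:

\[
R_{ij} = \begin{cases} 1 & \text{for } i = j \\ \widehat{R}_{ij} & \text{for } i \neq j \end{cases}
\qquad
A_{ij} = \begin{cases} 1 & \text{for } i = j \\ 2\Theta_{ij} & \text{for } i \neq j \end{cases}
\]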
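To make the likelihood concrete, the sketch below evaluates the multivariate normal log-likelihood for the covariance structure V = R σ²_A + A σ²_A* + I σ²_e in pure Python. It assumes a single common mean for all individuals; the function names and the two-sib example values (including the marker-based sharing estimate of 0.8) are hypothetical. A real analysis would maximize this function over the variance components at each candidate map position.

```python
import math


def cholesky(V):
    """Lower-triangular L with L L^T = V (V symmetric positive definite)."""
    n = len(V)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = V[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def log_likelihood(z, mu, R, A, var_qtl, var_bg, var_e):
    """Log of the multivariate normal density with
    V = R * var_qtl + A * var_bg + I * var_e and common mean mu."""
    n = len(z)
    V = [[R[i][j] * var_qtl + A[i][j] * var_bg + (var_e if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(V)
    # Forward substitution: solve L y = (z - mu), so y.y = (z-mu)^T V^-1 (z-mu)
    d = [zi - mu for zi in z]
    y = []
    for i in range(n):
        y.append((d[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    quad = sum(v * v for v in y)
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + quad)


# Hypothetical pair of full sibs: A_ij = 2 * Theta_ij = 0.5, marker-based R_ij = 0.8
R = [[1.0, 0.8], [0.8, 1.0]]
A = [[1.0, 0.5], [0.5, 1.0]]
ll = log_likelihood([10.2, 11.1], mu=10.0, R=R, A=A,
                    var_qtl=0.5, var_bg=1.0, var_e=2.0)
print(f"log-likelihood = {ll:.4f}")
```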
A typical QTL map from a likelihood analysis
[Figure: QTL likelihood map with labels for the estimated QTL location, the support interval, and the significance threshold]
Genomics and candidate loci
• Typical QTL confidence interval 20-50 cM
• The big question: how do we find suitable
candidates?
• The hope is that a genomic sequence will
suggest candidates
Genomics tools to probe for candidates
• Dense marker maps
• Complete genome sequence
– Expression data (microarrays)
– Proteomics
– Metabolomics
The accelerating pace of genomics
• Faster and cheaper sequencing
• Rapid screening of thousands of loci via
DNA chips
• “Phylogenetic bootstrapping” from model
systems to distant relatives
Prediction of Candidate Genes
• Try homologous candidates from other species
• Examine all Open Reading Frames (ORFs) within a
QTL confidence interval
– Expression array analysis of these ORFs
– Lack of tissue-specific expression does not exclude a
gene
• Proteomics
– Specific protein motifs may provide functional clues
• Cracking the regulatory code (in silico genetics)
• Analysis of networks and pathways
Searching for Natural Variation
• This may be the area where genomics has the
largest payoff
• Source (natural and/or weakly domesticated)
populations contain more variation than the
current highly domesticated lines
• Key is to first detect and localize important
variants, then introgress them into elite lines
Impact of other biotechnologies
• Cloning, other reproductive technologies
– Maintain elite lines as cell cultures?
– Embryo transplantation into elite maternal lines?
• Transgenics
– Important tool in both breeding and evolutionary biology
• Complications:
– Silencing of multiple copies in some species
– Strong position effects
– Currently restricted to major genes
• Major genes can have deleterious effects on other
characters
• Importance of quantitative genetics for selecting for
background polygenic modifiers
Useful Tools for Quantitative Genetic
analysis
• Four subfields of Quantitative Genetics
– Plant breeding
– Animal breeding (forest genetics)
– Evolutionary Genetics
– Human Genetics
• Restricted communications between fields
• Important tools often unknown outside a field
Tools from Plant Breeding
• Special features dealt with by plant breeders
– Diversity of mating systems (esp. selfing)
– Sessile individuals
• Issues
– Creation and selection of inbred lines
– Hybridization between lines
– Genotype x Environment interactions
– Competition
• Plant breeding tools useful in other fields
– Field-plot designs
– G x E analysis models: AMMI and biplots
• These designs are also excellent candidates for the analysis
of microarray expression data
– Covariance between inbred relatives
– Line cross analysis
Animal Breeding
• Special features
– Complex pedigrees
– Large half-sib (more rarely full-sib) families
– Long life spans
– Overlapping generations
• Tree breeders face many of these same issues
• Animal breeding tools useful in other fields
– BLUP (best linear unbiased predictors) for genotypic values
– REML (restricted maximum likelihood) for variance
components
• BLUP/REML allow for arbitrary pedigrees, very complex models
– Maternal effects designs
• Endosperm work of Shaw and Waser
– Selection response in structured populations
Evolutionary Genetics
• Issues
– Estimating the nature and amount of selection
– Population-genetic models of evolution
• Tools
– Estimation of the nature of natural selection on any
specified character
• Lande-Arnold fitness estimation; cubic splines
– Using DNA sequences to detect selection on a locus
• Example: teosinte-branched 1
– Coalescent theory
• The genealogy of DNA sequences within a random sample
– Analysis of finite-locus and non-Gaussian models of
selection response
• Barton and Turelli; Burger
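As a small illustration of estimating selection on a character, the sketch below computes a directional selection differential and gradient for a single trait, in the spirit of the Lande-Arnold approach: relative fitness is regressed on the trait, so the gradient is Cov(w, z)/Var(z). This is only the one-trait linear case (the full method uses multiple regression over several traits, and splines relax linearity); the function name and data are hypothetical.

```python
from statistics import mean


def selection_gradient(z, fitness):
    """Single-trait directional selection.

    Returns (S, beta): S = Cov(w, z) is the selection differential and
    beta = S / Var(z) is the selection gradient, where w is relative
    fitness (absolute fitness divided by its mean).
    """
    n = len(z)
    w_bar = mean(fitness)
    w = [f / w_bar for f in fitness]      # relative fitness, mean 1
    z_bar = mean(z)
    cov_wz = sum((zi - z_bar) * (wi - 1.0) for zi, wi in zip(z, w)) / (n - 1)
    var_z = sum((zi - z_bar) ** 2 for zi in z) / (n - 1)
    return cov_wz, cov_wz / var_z


# Hypothetical trait values and offspring counts
trait = [4.1, 5.0, 5.6, 6.3, 7.2, 7.9]
offspring = [1, 2, 2, 3, 4, 5]
S, beta = selection_gradient(trait, offspring)
print(f"S = {S:.3f}, beta = {beta:.3f}")
```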
Human Genetics
• Issues
– Very small family sizes
– Lack of controlled mating designs
• Tools of potential use
– Sib-pair approaches for QTL mapping
• QTL mapping in populations
– Transmission-disequilibrium test (TDT)
• Account for population structure
– Linkage-disequilibrium mapping
• Use historical recombinations to fine-map genes
– Random-effects models for QTL mapping
• BLUP/REML-type analysis over arbitrary pedigrees
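The transmission-disequilibrium test listed above reduces to a McNemar-type statistic. Among heterozygous parents, let b be the number of times the candidate allele is transmitted to an affected offspring and c the number of times it is not; then (b - c)²/(b + c) is approximately chi-square with 1 degree of freedom under the null hypothesis. A minimal sketch with hypothetical counts:

```python
import math


def tdt(transmitted, untransmitted):
    """Transmission-disequilibrium test for a biallelic marker.

    transmitted / untransmitted: counts, over heterozygous parents, of the
    candidate allele being passed or not passed to an affected offspring.
    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    b, c = transmitted, untransmitted
    chi2 = (b - c) ** 2 / (b + c)
    # Survival function of a 1-df chi-square: erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts
chi2, p = tdt(transmitted=60, untransmitted=40)
print(f"chi-square = {chi2:.2f}, p = {p:.4f}")
```

Because transmitted and untransmitted alleles come from the same parent, the comparison is made within families, which is why the test is robust to population structure.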
A Bayesian Future?
• The 1970s saw the start of a shift in QG from method-of-moments
approaches (i.e., estimators based on sample means
and variances) to likelihood approaches that use the entire
distribution of the data
– Initial objections centered on having to specify a likelihood
function, L(u | data)
– As these methods became computationally feasible, they
started to supplant their method-of-moments
counterparts.
• Similarly, Bayesian approaches have become much more
computationally feasible recently because of both advances
in computational power and a greater appreciation of the
power of resampling methods (MCMC and Gibbs samplers)
Posterior(u | data) = C * Likelihood(u | data) * prior(u)
[Figure: prior and posterior densities; horizontal axis from 100 to 400, vertical axis from 0 to 0.02]
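The relation posterior = C × likelihood × prior can be illustrated with a grid approximation, where C is the constant that makes the posterior integrate to one. The sketch below assumes a normal prior on an unknown mean u and normally distributed data with known standard deviation; the function names, the data, and all parameter values are hypothetical, not those behind the figure.

```python
import math


def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def grid_posterior(grid, data, data_sd, prior_mean, prior_sd):
    """Posterior density of u on an evenly spaced grid:
    posterior(u | data) = C * likelihood(u | data) * prior(u)."""
    prior = [normal_pdf(u, prior_mean, prior_sd) for u in grid]
    like = [math.prod(normal_pdf(x, u, data_sd) for x in data) for u in grid]
    unnorm = [l * p for l, p in zip(like, prior)]
    step = grid[1] - grid[0]
    c = 1.0 / (sum(unnorm) * step)       # normalizing constant C
    return [c * v for v in unnorm]


# Hypothetical example: grid over u from 100 to 400 in steps of 1
grid = [100 + i for i in range(301)]
posterior = grid_posterior(grid, data=[300.0, 320.0, 310.0], data_sd=40.0,
                           prior_mean=250.0, prior_sd=50.0)
step = grid[1] - grid[0]
post_mean = sum(u * p for u, p in zip(grid, posterior)) * step
print(f"posterior mean of u = {post_mean:.1f}")
```

The posterior mean falls between the prior mean and the sample mean, pulled toward the data as more observations are added. Grid evaluation only scales to one or two parameters, which is why MCMC methods are used for larger models.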
Why Bayesian?
• Marginal posteriors
– The effects of the uncertainty in estimating
nuisance parameters (those not of interest) are
fully accounted for.
• Exact for small sample size
• Powerful iterative sampling methods (MCMC,
Gibbs) allow Bayesian analysis to work on problems
with a very large number of parameters and
relatively few actual data points (vectors)
Conclusions
• Genomics will increase, not decrease, the
importance of quantitative genetics
• The machinery of classical quantitative genetics is
easily modified (indeed, it is actually preadapted)
to account for massive advances in genomics and
other fields of biotechnology
• Useful and powerful tools have been developed to
address specific issues in the various subfields of
quantitative genetics
• Bayesian analysis will continue to increase in
importance