Bayesian Inference for QTLs in Inbred Lines
Download
Report
Transcript Bayesian Inference for QTLs in Inbred Lines
networking in biochemistry:
building a mouse model of diabetes
Brian S. Yandell, UW-Madison
October 2008
www.stat.wisc.edu/~yandell
Real knowledge is to know the extent of one’s ignorance.
Confucius (on a bench in Seattle)
October 2008
BMI Chair Talk © Brian S. Yandell
1
outline
1.
2.
3.
4.
5.
how did I got here?
what problems caught my eye?
what have I done, anyway?
how do I work in teams?
what challenges remain?
October 2008
BMI Chair Talk © Brian S. Yandell
2
how did I get here?
• Biostatistics, School of Public Health, UC-Berkeley 1981
– RA/TA with EL Scott, J Neyman, CL Chiang, S Selvin
– PhD 1981
• non-parametric inference for hazard rates (Kjell A Doksum)
– Annals of Statistics (1983) 50 citations to date (2 in 2008)
• research evolution
– early career focus on survival analysis
– shift to non-parametric regression (1984-99)
– shift to statistical genomics (1991--)
• joined Biometry Program at UW-Madison in 1982
– attracted by chance to blend statistics, computing and biology
– valued balance of mathematical theory against practice
– enjoyed developing methodology driven by collaboration
October 2008
BMI Chair Talk © Brian S. Yandell
3
Yandell “Lab” Projects
• Bayesian QTL Model Selection
– R software development (Whipple Neely)
– collaboration with UAB & Jackson Labs
– data analysis of SCD1, ins10
• meta-analysis for fine mapping Sorcs1
– Chr 19 QTL introgressed as congenic lines
– combined analysis across to increase power
• QTL-based causal biochemical networks
– algorithm development (Elias Chaibub)
– data analysis with Christine Ferrara, Duke U
October 2008
BMI Chair Talk © Brian S. Yandell
4
UAB:
Allison,
Yi
Jax:
Churchill,
von Smith
stat/hort:
Yandell
Duke:
Newgaard,
Ferrara
biochem:
Attie,
Keller, Zhu
BMI:
Kendziorski,
Broman,
Craven
Rosetta:
Schadt,
Zhang, Zhu
October 2008
BMI Chair Talk © Brian S. Yandell
5
Pareto diagram of QTL effects
3
(modifiers)
minor
QTL
polygenes
1
2
major
QTL
0
3
additive effect
major QTL on
linkage map
2
1
October 2008
0
4
5
5
10
15
20
25
30
rank order of QTL
BMI Chair Talk © Brian S. Yandell
6
problems of single QTL approach
• wrong model: biased view
– fool yourself: bad guess at locations, effects
– detect ghost QTL between linked loci
– miss epistasis completely
• low power
• bad science
– use best tools for the job
– maximize scarce research resources
– leverage already big investment in experiment
October 2008
BMI Chair Talk © Brian S. Yandell
7
advantages of multiple QTL approach
• improve statistical power, precision
– increase number of QTL detected
– better estimates of loci: less bias, smaller intervals
• improve inference of complex genetic architecture
– patterns and individual elements of epistasis
– appropriate estimates of means, variances, covariances
• asymptotically unbiased, efficient
– assess relative contributions of different QTL
• improve estimates of genotypic values
– less bias (more accurate) and smaller variance (more precise)
– mean squared error = MSE = (bias)2 + variance
October 2008
BMI Chair Talk © Brian S. Yandell
8
QTL mapping idea
• observe phenotype y , marker genotypes m
• genetic architecture identifies model
– number and location of QTL
– gene action and epistasis (pairwise interactions)
• missing data: genotypes q at may be unknown
– pr(q | m, , )
– form of genotype model well known
• phenotype y depends on genotype q
– pr(y | q, µ, )
– often linear model in q
– possible interactions among QTL (epistasis)
October 2008
BMI Chair Talk © Brian S. Yandell
9
October 2008
BMI Chair Talk © Brian S. Yandell
10
how does phenotype y improve
guess of QTL genotypes q?
D4Mit41
D4Mit214
what are probabilities
for genotype q
between markers?
120
bp
110
recombinants AA:AB
100
all 1:1 if ignore y
and if we use y?
90
AA
AA
AB
AA
AA
AB
AB
AB
Genotype
October 2008
BMI Chair Talk © Brian S. Yandell
11
Gibbs sampler for loci indicators
• QTL at pseudomarkers
• loci indicators
= 1 if QTL present
= 0 if no QTL present
• Gibbs sampler on loci indicators
– relatively easy to incorporate epistasis
– Yi et al. (2005, 2007 Genetics)
• (earlier work of Yi, Ina Hoeschele)
(
q
)
(
q
)
(
q
,
q
)
,
0
,
1
q
October 2008
0 1
11
2
22
12
12
12 k
BMI Chair Talk © Brian S. Yandell
12
likelihood and posterior
• likelihood relates “known” data (y,m,q) to
unknown values of interest (,,)
– pr(y,q|m,,,) = pr(y|q,,) pr(q|m,,)
– mix over unknown genotypes (q)
• posterior turns likelihood into a distribution
– weight likelihood by priors
– rescale to sum to 1.0
– posterior = likelihood * prior / constant
October 2008
BMI Chair Talk © Brian S. Yandell
13
Bayes theorem for QTLs
likelihood
*prior
posterior
constant
phenotype
likelihood
*[prior
for
q
,
,
,]
posterior
for
q
,
,
,
constant
pr
(y|q
,
,)*[pr
(q|m
,
,)pr
(|)pr
(|m
,)pr
()]
pr
(q
,
,
,| y,m
)
pr
(y|m
)
October 2008
BMI Chair Talk © Brian S. Yandell
14
why use a Bayesian approach?
• first, do both classical and Bayesian
– always nice to have a separate validation
– each approach has its strengths and weaknesses
• classical approach works quite well
– selects large effect QTL easily
– directly builds on regression ideas for model selection
• Bayesian approach is comprehensive
– samples most probable genetic architectures
– formalizes model selection within one framework
– readily (!) extends to more complicated problems
October 2008
BMI Chair Talk © Brian S. Yandell
15
Markov chain sampling
• construct Markov chain around posterior
– posterior is stable distribution of Markov chain
– use MC samples to estimate posterior
• sample QTL model unknowns from full conditionals
– update unknowns one at a time or in batches
(
,
q
,
,
)
(
,
q
,
,
)
(
,
q
,
,
)
(
,
q
,,)
~
pr
(
,
q
,,|y
,
m
)
1
October 2008
2
BMI Chair Talk © Brian S. Yandell
N
16
Bayes posterior vs. maximum likelihood
• LOD: classical Log ODds
– maximize likelihood over effects µ
– R/qtl scanone/scantwo: method = “em”
• LPD: Bayesian Log Posterior Density
– average posterior over effects µ
– R/qtl scanone/scantwo: method = “imp”
LOD
(
)
log
{max
(y|m
,
,
)}
c
10
pr
LPD
(
)
log
{
pr
(
|m
)pr
(y|m
,
,
)pr
(
)
d
}
C
10
likelihood
mixes
over
missing
QTL
genotypes
:
pr
(y|m
,
,
)
pr
(y|q
,
)pr
(q
|m
,
)
q
October 2008
BMI Chair Talk © Brian S. Yandell
17
LOD & LPD: 1 QTL
n.ind = 100, 10 cM marker spacing
October 2008
BMI Chair Talk © Brian S. Yandell
18
marginal LOD or LPD
• what is contribution of a QTL adjusting for all others?
– improvement in LPD due to QTL at locus
– contribution due to main effects, epistasis, GxE?
• how does adjusted LPD differ from unadjusted LPD?
– raised by removing variance due to unlinked QTL
– raised or lowered due to bias of linked QTL
– analogous to Type III adjusted ANOVA tests
• can ask these same questions using classical LOD
– see Broman’s newer tools for multiple QTL inference
October 2008
BMI Chair Talk © Brian S. Yandell
19
1-QTL LOD vs. marginal LPD
1-QTL LOD
October 2008
BMI Chair Talk © Brian S. Yandell
20
hyper data: scanone
October 2008
BMI Chair Talk © Brian S. Yandell
21
what is best estimate of QTL?
•
find most probable pattern
– 1,4,6,15,6:15 has posterior of 3.4%
•
estimate locus across all nested patterns
– Exact pattern seen ~100/3000 samples
– Nested pattern seen ~2000/3000 samples
•
estimate 95% confidence interval using quantiles
> best <- qb.best(qbHyper)
> summary(best)$best
247
245
248
246
chrom locus locus.LCL locus.UCL
n.qtl
1 69.9 24.44875
95.7985 0.8026667
4 29.5 14.20000
74.3000 0.8800000
6 59.0 13.83333
66.7000 0.7096667
15 19.5 13.10000
55.7000 0.8450000
> plot(best)
October 2008
Manichaikul et al. 2008
Genetics (in review)
BMI Chair Talk © Brian S. Yandell
22
what patterns are “near” the best?
• size & shade ~ posterior
• distance between patterns
–
–
–
–
sum of squared attenuation
match loci between patterns
squared attenuation = (1-2r)2
sq.atten in scale of LOD & LPD
• multidimensional scaling
– MDS projects distance onto 2-D
– think mileage between cities
October 2008
BMI Chair Talk © Brian S. Yandell
23
Software for Bayesian QTLs
R/qtlbim: www.qtlbim.org
• Properties
– cross-compatible with R/qtl
– new MCMC algorithms
• Gibbs with loci indicators; no reversible jump
– epistasis, fixed & random covariates, GxE
– extensive graphics
• Software history
– initially designed (Satagopan, Yandell 1996)
– major revision and extension (Gaffney 2001)
– R/bim to CRAN (Wu, Gaffney, Jin, Yandell 2003)
– R/qtlbim to CRAN (Yi, Yandell et al. 2006)
• Publications
– Yi et al. (2005); Yandell et al. (2007); Yi et al. (2007ab)
October 2008
BMI Chair Talk © Brian S. Yandell
24
BTBR mouse is
insulin resistant
B6 is not
make both obese…
glucose
(courtesy AD Attie)
October 2008
BMI Chair Talk © Brian S. Yandell
insulin
25
studying diabetes in an F2
• mouse model: segregating panel from inbred lines
– B6.ob x BTBR.ob F1 F2
– selected mice with ob/ob alleles at leptin gene (Chr 6)
– sacrificed at 14 weeks, tissues preserved
• physiological study (Stoehr et al. 2000 Diabetes)
– mapped body weight, insulin, glucose at various ages
• gene expression studies
– RT-PCR for a few mRNA on 108 F2 mice liver tissues
• (Lan et al. 2003 Diabetes; Lan et al. 2003 Genetics)
– Affymetrix microarrays on 60 F2 mice liver tissues
• U47 A & B chips, RMA normalization
• design: selective phenotyping (Jin et al. 2004 Genetics)
October 2008
BMI Chair Talk © Brian S. Yandell
26
log10(ins10)
Chr 19
black=all
blue=male
red=female
purple=sexadjusted
solid=512 mice
dashed=311 mice
October 2008
BMI Chair Talk © Brian S. Yandell
27
Sorcs1 study
in mice:
11 sub-congenic strains
marker regression
meta-analysis
within-strain
permutations
Nature Genetics 2006
Clee, Yandell et al.
October 2008
BMI Chair Talk © Brian S. Yandell
28
Interaction plot for D19Mit58 and D8Mit289
we were lucky!
2.0
AA
AB
BB
1.8
BTBR background
needed to see SORCS1
1.6
logins10
epistatic interaction
of chr 19 and 8
…
discovered much later
D19Mit58
1.4
1.2
1.0
0.8
AA
AB
BB
D8Mit289
October 2008
BMI Chair Talk © Brian S. Yandell
29
Sorcs1 gene & SNPs
October 2008
BMI Chair Talk © Brian S. Yandell
30
Sorcs1 study in humans
Diabetes 2007
Goodarzi et al.
October 2008
BMI Chair Talk © Brian S. Yandell
31
2M observations
30,000 traits
60 mice
October 2008
BMI Chair Talk © Brian S. Yandell
32
experimental context
• B6 x BTBR obese mouse cross
– model for diabetes and obesity
– 500+ mice from intercross (F2)
– collaboration with Rosetta/Merck
• genotypes
– 5K SNP Affymetrix mouse chip
– care in curating genotypes! (map version, errors, …)
• phenotypes
– clinical phenotypes (>100 / mouse)
– gene expression traits (>40,000 / mouse / 4-6 tissues)
– other molecular traits (proteomic, miRNA, metabolomic)
October 2008
BMI Chair Talk © Brian S. Yandell
33
QTL mapping
thousands
of gene
expression traits
PLoS Genetics
2006 Lan, Chen et al.
October 2008
BMI Chair Talk © Brian S. Yandell
34
QTLs on chr n
gray scale for
variance
red=trans
blue=cis
October 2008
BMI Chair Talk © Brian S. Yandell
35
Chaibub Neto et al. (2008)
Genetics
October 2008
BMI Chair Talk © Brian S. Yandell
36
causal phenotype networks
• goal: mimic biochemical pathways with
directed (causal) networks
• problem: association (correlation) does not
imply causation
• resolution: bring in driving causes
– genotypes (at conception)
– processes earlier in time
October 2008
BMI Chair Talk © Brian S. Yandell
37
Causal vs Reactive? (Elias Chaibub, Brian Yandell)
y1 causes y2: y1 ~ g1 and y2 ~ g2*y1
October 2008
BMI Chair Talk © Brian S. Yandell
38
Ferrara et al.
October 2008
BMI Chair Talk © Brian S. Yandell
39
inferring phenotype networks
• build in prior pathway knowledge (PPI, TF)
– co-map correlated traits
• Banerjee, Yandell, Yi (2008 Genetics)
– pathways induce correlation structure
• ramp up to 100s, 1000s of phenotypes?
– danger of mixing unrelated pathways
– want closely linked upstream (causal) drivers
October 2008
BMI Chair Talk © Brian S. Yandell
40
UAB:
Allison,
Yi
Jax:
Churchill,
von Smith
stat/hort:
Yandell
Duke:
Newgaard,
Ferrara
biochem:
Attie,
Keller, Zhu
BMI:
Kendziorski,
Broman,
Craven
Rosetta:
Schadt,
Zhang, Zhu
October 2008
BMI Chair Talk © Brian S. Yandell
41
why build Web eQTL tools?
• common storage/maintainence of data
– one well-curated copy
– central repository
– reduce errors, ensure analysis on same data
• automate commonly used methods
– biologist gets immediate feedback
– statistician can focus on new methods
– codify standard choices
October 2008
BMI Chair Talk © Brian S. Yandell
42
how does one build tools?
• no one solution for all situations
• use existing tools wherever possible
– new tools take time and care to build!
– downloaded databases must be updated regularly
• human component is key
– need informatics expertise
– need continual dialog with biologists
• build bridges (interfaces) between tools
– Web interface uses PHP
– commands are created dynamically for R
• continually rethink & redesign organization
October 2008
BMI Chair Talk © Brian S. Yandell
43
steps in using Web tools
•
•
•
•
user enters data on Web page
PHP tool interprets user data
PHP builds R script
R run on script
– creates plots, summaries, warnings
• PHP grabs results & displays on page
• user examines, saves
• user modifies data and reruns
October 2008
BMI Chair Talk © Brian S. Yandell
44
raw data or fancy results?
• raw data flexible but slow
– LOD profiles for 100 (1000) traits?
• fancy results from sophisticated analysis
– IM, MIM, BIM, MOM analysis
– too complicated to put in biologists’ hands?
• methods are unrefined, state-of-art, research tools
• use of methods involved many subtle choices
– batch computation over weeks
• compute once, save, display many times
October 2008
BMI Chair Talk © Brian S. Yandell
45
October 2008
BMI Chair Talk © Brian S. Yandell
46
LOD profiles: many traits
October 2008
BMI Chair Talk © Brian S. Yandell
47
1.5 LOD interval approximate 95% CI
October 2008
BMI Chair Talk © Brian S. Yandell
48
QTLs on chr n
gray scale for
variance
red=trans
blue=cis
October 2008
BMI Chair Talk © Brian S. Yandell
49
what challenges remain?
• from eQTL to candidate pathways
– statistical issues
• networks, correlated traits
• better model selection approaches
– biological evidence (Weiss 2007 Genetics)
• Mouse to human to mouse
• KOs, etc.
• upgrade informatics environment
– harden local code (R, Python, PHP, …)
– build on other high throughput systems
• Swertz, Jansen (2007); Stein (2008) Nat Rev Gen
October 2008
BMI Chair Talk © Brian S. Yandell
50
many thanks
Karl Broman
Jackson Labs
Gary Churchill
Hao Wu
Randy von Smith
U AL Birmingham
David Allison
Nengjun Yi
Tapan Mehta
Samprit Banerjee
Ram Venkataraman
Daniel Shriner
Tom Osborn
David Butruille
Marcio Ferrera
Josh Udahl
Pablo Quijada
Alan Attie
Michael Newton
Hyuna Yang
Daniel Sorensen
Daniel Gianola
Liang Li
my students
Jonathan Stoehr
Hong Lan
Susie Clee
Jessica Byers
Mark Keller
Jaya Satagopan
Fei Zou
Patrick Gaffney
Chunfang Jin
Elias Chaibub Neto
W Whipple Neely
Jee Young Moon
USDA Hatch, NIH/NIDDK (Attie), NIH/R01 (Yi, Broman)
October 2008
BMI Chair Talk © Brian S. Yandell
51