Adjusting Relatedness for Family Data
in Collapsing Test of Rare Variants
Qunyuan Zhang, Doyoung Chung
Ingrid Borecki, Michael A. Province
Division of Statistical Genomics
Washington University School of Medicine
St. Louis, Missouri, USA
IGES, Sept. 2011, Heidelberg
Contact: Qunyuan Zhang, [email protected]
Introduction
Advances in sequencing technologies have facilitated
the identification of rare variants (RVs).
Because family data are potentially enriched with RVs
within pedigrees, they may provide a valuable resource for
detecting associations between RVs and complex human
traits. However, most RV testing methods developed
in recent years are data-driven, permutation-based
collapsing methods, which are inapplicable to family data
because a direct permutation test ignores and destroys
family structure.
Purpose
To address relatedness in family data,
we propose a mixed-model-based procedure that
incorporates family information into collapsing
analysis within a permutation test, denoted MMPT
(Mixed Model-based Permutation Test).
Statistical Model
To deal with family structure, we generalize the collapsing test as
a weighted sum score test based on a linear mixed model:
$$ Y = \alpha + \beta \sum_{i=1}^{m} w_i g_i + Z\gamma + \varepsilon \qquad (1) $$
Y is the observed trait, α the intercept, β the collective effect
coefficient, m the number of RVs in a genetic unit (usually a gene) of
interest, w_i the weight of variant i, g_i the number (0, 1 or 2) of minor
alleles of variant i, and ε the residual. The Σw_ig_i part of the model is the
weighted sum score of multiple variants. Z is the design matrix
corresponding to γ, and γ follows a multivariate normal distribution
N(0, G). Here G is the variance-covariance matrix of γ, which can be
decomposed as G = 2σ²K, where K is the kinship matrix and σ² is the
additive polygenic variance.
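To make model (1) concrete, here is a minimal sketch of testing β for a given score vector. It assumes, beyond what the poster states, that Z = I (one polygenic effect per subject), that ε ~ N(0, σ_e²I), and that σ² and σ_e² are already known; in practice they would be estimated (e.g. by REML), and the poster does not say which software or estimator was used. Under these assumptions Var(Y) = V = 2σ²K + σ_e²I, and β is estimated by generalized least squares with a Wald test. The function name `mixed_model_wald` is hypothetical.

```python
import math
import numpy as np

def mixed_model_wald(y, score, K, sigma2_g, sigma2_e):
    """Wald test of beta in Y = alpha + beta*score + Z*gamma + eps.

    Hypothetical helper. Assumes Z = I, gamma ~ N(0, 2*sigma2_g*K),
    eps ~ N(0, sigma2_e*I), and known variance components.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    V = 2.0 * sigma2_g * np.asarray(K, dtype=float) + sigma2_e * np.eye(n)
    X = np.column_stack([np.ones(n), np.asarray(score, dtype=float)])
    XtVinv = np.linalg.solve(V, X).T          # X' V^{-1}, using symmetry of V
    info = XtVinv @ X                         # X' V^{-1} X
    coef = np.linalg.solve(info, XtVinv @ y)  # GLS estimates (alpha, beta)
    beta = float(coef[1])
    se = math.sqrt(np.linalg.inv(info)[1, 1])
    p = math.erfc(abs(beta / se) / math.sqrt(2.0))  # two-sided normal p-value
    return beta, se, p
```

For unrelated, non-inbred subjects K = I/2, so V is a multiple of I and the estimate of β reduces to the ordinary least-squares slope; relatedness enters only through the off-diagonal kinship entries.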
Weighted Sum Scores
In terms of weighting, most existing collapsing methods
can be viewed as special instances of model (1). For
example, Morgenthaler and Thilly’s CAST is equivalent
to setting w_i = 1 for all RVs; Li and Leal’s CMC sets w_i
= 1 for all RVs but caps the sum at 1. Madsen and
Browning’s WSS calculates w_i from allele frequencies
in controls. Han and Pan’s aSum test recodes genotypes
(equivalent to choosing w_i = 1 or -1) according to a predefined p-value cutoff; Zhang et al.’s PWST and
SPWST define w_i as a rescaled left-tailed p-value.
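The weighting schemes above differ only in how w_i is chosen. A minimal sketch of the weighted sum score for the CAST-style (w_i = 1), CMC-style (w_i = 1, sum capped at 1) and WSS-style cases follows. The WSS weight 1/sqrt(n q_i (1 - q_i)), with q_i the control minor-allele frequency estimated as (m_i + 1)/(2n_ctrl + 2), follows Madsen and Browning; the code assumes no missing genotypes. The poster does not say how "controls" were defined for the quantitative trait Q2, so `is_control` is a hypothetical input, as are the function names. aSum, PWST and SPWST weights depend on per-variant p-values and are not sketched here.

```python
import numpy as np

def collapse_score(G, weights=None, cap_at_one=False):
    """Weighted sum score sum_i w_i * g_i for each subject (hypothetical helper).

    G: (n_subjects, m_variants) array of minor-allele counts (0, 1 or 2).
    weights: length-m array; None means w_i = 1 for every variant (CAST-style).
    cap_at_one: cap the sum at 1, i.e. a carrier indicator (CMC-style).
    """
    G = np.asarray(G, dtype=float)
    w = np.ones(G.shape[1]) if weights is None else np.asarray(weights, dtype=float)
    score = G @ w
    if cap_at_one:
        score = np.minimum(score, 1.0)
    return score

def wss_weights(G, is_control):
    """WSS-style weights 1 / sqrt(n * q_i * (1 - q_i)) (hypothetical helper).

    q_i is the control minor-allele frequency with a +1 / +2 correction,
    so rarer variants receive larger weights. Assumes no missing genotypes.
    """
    G = np.asarray(G, dtype=float)
    ctrl = np.asarray(is_control, dtype=bool)
    n = G.shape[0]
    n_ctrl = ctrl.sum()
    m_ctrl = G[ctrl].sum(axis=0)  # minor alleles per variant among controls
    q = (m_ctrl + 1.0) / (2.0 * n_ctrl + 2.0)
    return 1.0 / np.sqrt(n * q * (1.0 - q))
```

For example, with `G = [[0,1,0],[2,0,0],[0,0,0],[1,1,1]]`, the CAST-style score is `[1, 2, 0, 3]` and the CMC-style score is `[1, 1, 0, 1]`.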
MMPT: Mixed Model-based Permutation Test
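Since WSS, aSum, PWST and SPWST are data-driven,
permutation-based tests, we apply model (1) to them
by permuting the weighted sum score part across subjects while
keeping the subject IDs of all other components fixed, as illustrated
below:
$$ Y = \alpha + \beta \underbrace{\sum_{i=1}^{m} w_i g_i}_{\text{permuted}} + Z\gamma + \varepsilon $$
Permuted: the weighted sum score Σw_ig_i.
Non-permuted, subject IDs fixed: Y, Zγ and ε.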
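The MMPT scheme above can be sketched as a permutation loop around the mixed model. This is a minimal illustration under assumptions the poster does not state: the variance components are treated as known and are not re-estimated per permutation; each subject's whole genotype row is shuffled (so the weighted sum score moves between subjects, and any data-driven weights are recomputed from the permuted genotypes by `score_fn`), while Y and K keep their original subject order; and the statistic is |Wald z| for β. The names `wald_abs_z`, `mmpt` and `score_fn` are hypothetical.

```python
import math
import numpy as np

def wald_abs_z(y, score, K, sigma2_g, sigma2_e):
    """|beta_hat / se| for the score effect in model (1), via GLS with
    known variance components and Z = I (assumptions, see text)."""
    n = len(y)
    V = 2.0 * sigma2_g * K + sigma2_e * np.eye(n)
    X = np.column_stack([np.ones(n), score])
    XtVinv = np.linalg.solve(V, X).T          # X' V^{-1} (V is symmetric)
    info = XtVinv @ X                         # X' V^{-1} X
    coef = np.linalg.solve(info, XtVinv @ y)
    se = math.sqrt(np.linalg.inv(info)[1, 1])
    return abs(float(coef[1]) / se)

def mmpt(y, G, K, score_fn, sigma2_g, sigma2_e, n_perm=999, seed=None):
    """Permutation p-value: genotype rows (hence the weighted sum score) are
    shuffled across subjects; y and K stay attached to the original IDs."""
    y = np.asarray(y, dtype=float)
    G = np.asarray(G, dtype=float)
    K = np.asarray(K, dtype=float)
    rng = np.random.default_rng(seed)
    observed = wald_abs_z(y, score_fn(G, y), K, sigma2_g, sigma2_e)
    exceed = 0
    for _ in range(n_perm):
        G_perm = G[rng.permutation(len(y))]
        # score_fn may recompute data-driven weights (e.g. WSS, aSum) here
        stat = wald_abs_z(y, score_fn(G_perm, y), K, sigma2_g, sigma2_e)
        exceed += int(stat >= observed)
    return (exceed + 1) / (n_perm + 1)
```

Because Y is never shuffled, its correlation structure stays matched to K in every permutation; only the link between genotype score and trait is broken.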
Data
We used the 200 simulated replicates of 697 subjects from 8 extended families
generated by the Genetic Analysis Workshop (GAW) 17 [Almasy et
al., 2011], with the quantitative trait Q2 as the target trait. For each
gene, the genotypes with minor allele frequency (MAF) less than 0.01
were collapsed into a single variable using different weighting methods
(CMC, WSS, aSum, PWST and SPWST). The kinship matrix K was
calculated from the pedigree data.
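The poster says only that K was calculated from the pedigree data. One standard way to do this is the recursive (tabular) kinship algorithm sketched below; it is an illustration, not necessarily the procedure the authors used. The function name and the parent-index input format are hypothetical, and individuals must be ordered so that parents precede their offspring.

```python
import numpy as np

def kinship_matrix(father, mother):
    """Kinship coefficients from a pedigree (hypothetical helper).

    father[j], mother[j]: index of j's parent (earlier in the list) or None
    for an unknown parent / founder.
    """
    n = len(father)
    K = np.zeros((n, n))
    for j in range(n):
        f, m = father[j], mother[j]
        # Self-kinship is 0.5 * (1 + F_j), where F_j = kinship of j's parents.
        parents_kin = K[f, m] if f is not None and m is not None else 0.0
        K[j, j] = 0.5 * (1.0 + parents_kin)
        for i in range(j):
            # Kinship with an earlier individual i averages over j's parents.
            kf = K[f, i] if f is not None else 0.0
            km = K[m, i] if m is not None else 0.0
            K[i, j] = K[j, i] = 0.5 * (kf + km)
    return K
```

With two founders (0, 1), their children 2 and 3, and a child 4 of 2 and 3, parent-offspring and full-sibling pairs both get 0.25, and the inbred individual 4 has self-kinship 0.625.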
The Genetic Analysis Workshop (GAW) 17 is supported by the NIH
Grant R01 GM031575. Preparation of the GAW 17 simulated data set
was supported in part by NIH R01 MH059490 and used sequencing
data from the 1000 Genomes Project (www.1000genomes.org).
Results(1)
Q-Q plots of –log10(P) under the null, CMC non-permutation test (figure not reproduced):
Ignoring family structure: inflation of type-1 error.
Modeling family structure via the mixed model: inflation is corrected.
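The Q-Q plots compare observed –log10(P) under the null with the values expected if the p-values were uniform; points above the diagonal indicate inflation. A minimal sketch of the plot coordinates and of an empirical type-1 error rate at a nominal α follows, assuming the null p-values are already collected in an array (how genes and replicates were pooled is not stated on the poster). The plotting positions (k - 0.5)/n are one common convention, and the function names are hypothetical.

```python
import numpy as np

def qq_points(pvalues):
    """Expected and observed -log10(p) for a Q-Q plot against Uniform(0, 1)."""
    p = np.sort(np.asarray(pvalues, dtype=float))
    n = len(p)
    expected = -np.log10((np.arange(1, n + 1) - 0.5) / n)
    observed = -np.log10(p)
    return expected, observed

def empirical_type1_error(pvalues, alpha=0.05):
    """Fraction of null p-values at or below alpha; well above alpha means inflation."""
    return float(np.mean(np.asarray(pvalues, dtype=float) <= alpha))
```

A well-calibrated test gives points close to the line observed = expected and an empirical type-1 error close to α.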
Results(2)
Q-Q plots under the null for WSS, aSum, PWST and SPWST (figure not reproduced):
Permutation test ignoring family structure: inflation of type-1 error.
Results(3)
Q-Q plots under the null for WSS, aSum, PWST and SPWST (figure not reproduced):
Mixed model-based permutation test (MMPT) modeling family structure: inflation corrected.
Conclusions
Ignoring relatedness between subjects in family data may
result in significant inflation of type-1 error in collapsing tests of
rare variants.
Directly modeling kinship using a mixed model can
correct the inflation of non-data-driven collapsing tests (e.g.,
CMC).
Directly applying data-driven, permutation-based
methods (e.g., WSS, aSum, PWST and SPWST) to family data
may also result in significant inflation of type-1 error.
The inflation of data-driven, permutation-based methods
can be corrected by the proposed MMPT method, which
incorporates kinship information into the permutation test.
Main References
Almasy LA, Dyer TD, Peralta JM, Kent JW Jr, Charlesworth JC, Curran JE,
Blangero J. 2011. Genetic Analysis Workshop 17 mini-exome simulation. BMC Proc
5(Suppl 8):
Han F, Pan W. 2010. A data-adaptive sum test for disease association with
multiple common or rare variants. Hum Hered 70(1):42-54.
Li B, Leal SM. 2008. Methods for detecting associations with rare variants for
common diseases: application to analysis of sequence data. Am J Hum Genet
83(3):311-21.
Madsen BE, Browning SR. 2009. A groupwise association test for rare mutations
using a weighted sum statistic. PLoS Genet 5(2):e1000384.
Morgenthaler S, Thilly WG. 2007. A strategy to discover genes that carry multiallelic or mono-allelic risk for common diseases: a cohort allelic sums test
(CAST). Mutat Res 615(1-2):28-56.
Zhang Q, Irvin MR, Arnett DK, Province MA, Borecki I. 2011. Genet Epidemiol.
doi:10.1002/gepi.20618.