
Identifying and estimating
gene-gene and gene-environment interactions
Christopher Amos1,2 and Carol Etzel1
Departments of Epidemiology
and Bioinformatics and Computational Genetics
U.T. M.D. Anderson Cancer Center, Houston, TX
1
Overview of talk
• Description of terminology
• Epistasis modeling for quantitative traits
• Epistasis modeling of linkage data in
humans
• Approaches to interaction modeling in
human/ outbred populations
• Modeling gene-environment interactions
2
What is an ‘interaction’
• Interaction is a kind of action that occurs
as two or more objects have an effect
upon one another. The idea of a two-way
effect is essential in the concept of
interaction, as opposed to a one-way
causal effect. (Wikipedia)
3
4
Gene - Environmental Interaction
                Environment -        Environment +                         Total
Genetics -      spontaneous 15%      pure environment 2%                   17%
Genetics +      pure genetic 5%      gene-environmental interaction 78%    83%
Total           20%                  80%                                   100%
Schulte, 1994
5
Interactions
• Biological interpretation
– Two or more factors jointly modify a
phenotype
– e.g. lung cancer risk is increased 14-fold
by tobacco smoke, 3-fold by asbestos
exposure, and 42-fold by both
[Figure: bar chart of risk (axis 0 to 50) comparing the ADDITIVE expectation with the OBSERVED risk]
6
• Statistical Interpretation
– Deviation from an additive model (on some
scale). On a logarithmic scale the above
risks are additive (log 14 + log 3 = log 42),
so there is no evidence for interaction on a
multiplicative scale
[Figure: Log Additive Model. Log Risk (axis 0 to 2) for Asbestos, Smoking, and Both]
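The two readings of the smoking and asbestos figures can be checked numerically. The following is a minimal sketch using the relative risks quoted above (14, 3, and 42); the variable names are illustrative only, and the "additive" expectation is taken to be the simple sum used on these slides.

```python
import math

# Relative risks quoted on the slide
rr_smoking = 14.0
rr_asbestos = 3.0
rr_both = 42.0

# Additive scale: compare the observed joint risk with the simple sum
additive_expected = rr_smoking + rr_asbestos
additive_excess = rr_both - additive_expected

# Multiplicative scale: compare with the product, i.e. additivity of the logs
multiplicative_expected = rr_smoking * rr_asbestos
log_deviation = math.log(rr_both) - (math.log(rr_smoking) + math.log(rr_asbestos))

print(f"additive expectation {additive_expected:.0f}, observed {rr_both:.0f}, "
      f"excess {additive_excess:.0f}")
print(f"multiplicative expectation {multiplicative_expected:.0f}, "
      f"log-scale deviation {log_deviation:.3f}")
```

A large excess on the additive scale together with a log-scale deviation of essentially zero is what "interaction on one scale but not the other" means here.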
7
Definitions of Epistasis (Interaction
among alleles at different loci)
• Bateson: gene interaction, in a physical
sense of the direct interaction between
gene products.
– First noticed when crossing chicken strains:
the single comb was produced only rarely.
Using a Punnett square, this feature was
shown to be a doubly homozygous
recessive trait
– Another example is Bombay Phenotype
8
Batesonian Epistasis –
Bombay Phenotype
The H allele produces the precursor of the ABO blood group antigens; in its
absence (h) the ABO antigens are not formed, so the hh genotype
appears as an O phenotype
9
Fisherian Epistasis
• Joint effects of alleles at two loci do not influence
a trait in an additive fashion
• Deviation from a simple oligogenic or polygenic
model – higher correlation among siblings than
parent-offspring
• Further developed by Cockerham (Cockerham,
C. C., 1954 An extension of the concept of
partitioning hereditary variance for analysis of
covariances among relatives when epistasis is
present. Genetics 39: 859–882.)
10
Types of Epistatic Interactions
[Figure: diagram of two-locus genotypic values; labels include AABB, AAbb, aaBB, aabb, 2a, 2b, 2c, d, f, g, h]
Additive-Additive Epistasis: b ≠ c
Additive-Dominant Epistasis: f ≠ g
Dominant-Dominant Epistasis: h ≠ (d-e-(f-g))/2
11
12
Joint effects from multiple loci for
quantitative traits
• Let loci have alleles A, a and B, b
• A typical approach (F∞) is to set design
matrices for the additive (x) and dominance (z)
effects at each locus
• Then define interactions as products: additive x additive epistasis x1*x2,
additive x dominant interactions x1*z2, etc.
13
A characteristic of the F∞ model is
confounding of main effects with epistasis
(Kao and Zeng, Genetics 160:1243-1261, 2002)
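The confounding can be illustrated by computing, over the nine two-locus genotypes of an F2 population, the covariance between the additive column x1 and the additive x dominant column x1*z2. This is a sketch under stated assumptions: the F∞ dominance code is taken as 0, 1, 0 and the Cockerham (F2-metric) code as -1/2, 1/2, -1/2 for the three genotypes, the loci are assumed unlinked, and all names are illustrative.

```python
from itertools import product

# Single-locus genotype frequencies in an F2 intercross
FREQ = {"hom1": 0.25, "het": 0.50, "hom2": 0.25}
# Additive code (x): +1, 0, -1
ADD = {"hom1": 1.0, "het": 0.0, "hom2": -1.0}
# Dominance code (z) under the two parameterizations (assumed codings)
DOM_F_INF = {"hom1": 0.0, "het": 1.0, "hom2": 0.0}
DOM_F2 = {"hom1": -0.5, "het": 0.5, "hom2": -0.5}


def cov_additive_with_axd(dom):
    """Covariance between x1 and x1*z2 over the nine two-locus genotypes."""
    e_u = e_v = e_uv = 0.0
    for g1, g2 in product(FREQ, repeat=2):
        w = FREQ[g1] * FREQ[g2]          # unlinked loci assumed
        u = ADD[g1]                      # additive effect column, locus 1
        v = ADD[g1] * dom[g2]            # additive x dominant column
        e_u += w * u
        e_v += w * v
        e_uv += w * u * v
    return e_uv - e_u * e_v


cov_f_inf = cov_additive_with_axd(DOM_F_INF)
cov_f2 = cov_additive_with_axd(DOM_F2)
print(f"F-infinity coding: cov(x1, x1*z2) = {cov_f_inf:.3f}")
print(f"Cockerham F2 coding: cov(x1, x1*z2) = {cov_f2:.3f}")
```

A nonzero covariance means the two columns are not orthogonal, so part of one effect is absorbed into the estimate of the other; a zero covariance in the F2 population is the property that motivates the Cockerham parameterization.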
14
Preferred model with epistasis
for F2 intercross
15
Estimates from Cockerham model
16
Implications of using a model with
confounding of effects
• Inferences about effects can be biased depending upon
the modeling procedure.
• Additive main effect estimate includes a component due
to dominance by additive epistasis
• Dominance by additive epistasis estimate includes a
component due to additive effects
• If the Type I sums of squares procedure is used, then the
main effect estimate is inflated and the epistasis
estimate is reduced. If the Type III sums of squares
procedure is used then both effects are reduced.
• ML method could be used to estimate parameters if
model is correctly specified.
17
Effects of Scale
• For the quantitative trait just indicated, if
additive by additive interaction is noted, it
may be possible to change scale to
remove this source of epistasis. However,
if multiple genetic factors influence the
trait, a change of scale may not be
sufficient.
18
19
Heterogeneity versus Interaction
• In Epidemiological studies, we usually treat all
subjects as if they are exchangeable – i.e. they
are all identically distributed
• In genetics, we often assume that our
population reflects a mixture of features, which
we may model with admixture or heterogeneity
parameters
• Admixture/heterogeneity ideas are not well
described in the interaction literature
20
Linkage analysis for multilocus
models in humans
• For ‘independence’ models in which the joint
genotype-specific penetrances are products of
the marginal penetrances at each locus, modeling
marginal penetrances yields sufficiently accurate
models to permit linkage detection (whether using a
parametric or nonparametric approach).
• For ‘additive’ models in which penetrance is
increased by presence of either factor,
heterogeneity models are fitted.
21
22
23
24
Linkage Analysis of Lung cancer, with and
without heterogeneity among families
25
Epistasis modeling
• Linkage analysis is modeled according to a
generalization of Risch’s lambda (MLS)
score method:
• Weights depend upon the assumptions of
the model, which can be fitted to
multiplicative (independence) models or
to more general models (allowing for
epistasis)
26
Joint effects of loci influencing risk for hypertension
27
From Bell JT et al. Human Molecular Genetics 2006 15(8):1365-1374
Associating disease with mutations
1) The usual approach for qualitative data is (unconditional)
logistic regression:
$\Pr(y_i = 1) = \frac{1}{1 + e^{-(a + b_1 x_1 + b_2 x_2)}}$
or
$\log\left[\frac{\Pr(y_i = 1)}{\Pr(y_i = 0)}\right] = a + b_1 x_1 + b_2 x_2$
where y is the disease outcome: y = 1 if case, 0 if control
x1 – design matrix with genotype AA = 1, Aa = 0, aa = -1
x2 – genotype AA = 0, Aa = 1, aa = 0
Additive effect if b2 = 0; dominance effects if b2 ≠ 0
If b2 ≠ 0, can then fit
x1 – design matrix with genotype AA = 1, Aa = 1, aa = 0 (A dominant)
x2 – genotype AA = 1, Aa = 0, aa = 0 (A recessive)
28
Epistasis Modeling in Humans
$\log\left[\frac{\Pr(y_i = 1)}{\Pr(y_i = 0)}\right] = a + b_1 x_1 + b_2 x_2 + \gamma x_1 x_2$
• Where x1 and x2 are chosen to reflect best
marginal models (dominant, recessive or
additive) from consideration of univariable
analyses
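As a rough illustration of the model above, the sketch below codes each locus by its chosen marginal model and evaluates the disease probability with and without the product term. The coefficient values are hypothetical, the interaction coefficient is called `gamma` only for this example, and genotypes are represented by the number of copies of the allele of interest (2, 1, 0), which is an assumption of the sketch.

```python
import math

# Marginal codings from the slides, keyed by number of copies of the allele
ADDITIVE = {2: 1, 1: 0, 0: -1}
DOMINANT = {2: 1, 1: 1, 0: 0}
RECESSIVE = {2: 1, 1: 0, 0: 0}


def disease_probability(x1, x2, a, b1, b2, gamma):
    """Pr(y = 1) under a logistic model with an x1*x2 interaction term."""
    log_odds = a + b1 * x1 + b2 * x2 + gamma * x1 * x2
    return 1.0 / (1.0 + math.exp(-log_odds))


def logit(p):
    return math.log(p / (1.0 - p))


# Hypothetical coefficients, chosen only for illustration
a, b1, b2, gamma = -2.0, 0.5, 0.3, 0.8

x1 = DOMINANT[1]    # locus 1 coded as dominant, heterozygous subject
x2 = RECESSIVE[2]   # locus 2 coded as recessive, homozygous subject

p_with = disease_probability(x1, x2, a, b1, b2, gamma)
p_without = disease_probability(x1, x2, a, b1, b2, 0.0)
print(f"with interaction: {p_with:.3f}, without: {p_without:.3f}")
print(f"difference in log odds: {logit(p_with) - logit(p_without):.2f}")
```

For a subject with x1 = x2 = 1, the difference in log odds between the two fits is the interaction coefficient itself, which is the quantity tested when asking whether epistasis is present on the logit scale.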
29
Changing scale may remove
‘interactions’
• Lung cancer risk, smoking and asbestos:
interaction on an additive scale (risk from
smoking is 14, from asbestos is 3, sum is 17,
but the observed joint risk is 42)
• Lung cancer risk, smoking and asbestos:
no interaction on a multiplicative scale (14 x
3 = 42)
• What if you add in radon, which has an additive
effect on risk? E.g. risk from radon is 2, risk from
smoking is 14, risk from radon plus smoking is
16. If someone smokes and has both radon
and asbestos exposure, is there an additive
scale?
30
“Curse of Dimensionality”
• For 2 SNPs, there are 9 = 3^2 possible two-locus genotype
combinations.
• If the alleles are rare (MAF < 10%), then some cells will be empty
[Figure: 3 x 3 grid of SNP1 genotypes (AA, Aa, aa) by SNP2 genotypes (BB, Bb, bb), with an empty cell marked]
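The growth in the number of cells, and how quickly the rarest cell empties, can be sketched as follows. The sample size of 1000 is hypothetical, and the expected count assumes Hardy-Weinberg proportions and independent SNPs with the same minor allele frequency, none of which is stated on the slide.

```python
def genotype_combinations(n_snps):
    """Number of multilocus genotype cells for biallelic SNPs."""
    return 3 ** n_snps


def rarest_cell_expected_count(maf, n_snps, n_subjects):
    """Expected number of subjects homozygous for the minor allele at every SNP.

    Assumes Hardy-Weinberg proportions and independence between SNPs.
    """
    return n_subjects * (maf ** 2) ** n_snps


n_subjects = 1000  # hypothetical sample size
for n_snps in (2, 4):
    cells = genotype_combinations(n_snps)
    expected = rarest_cell_expected_count(0.10, n_snps, n_subjects)
    print(f"{n_snps} SNPs: {cells} cells, rarest cell expects {expected:.5f} subjects")
```

With an expected count far below one subject, the rarest cells will almost always be empty, which is the practical difficulty for any model that needs an estimate in every cell.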
31
“Curse of Dimensionality”
4 SNPs: 81 = 3^4 possible combinations, with more possible empty cells
[Figure: SNP1 (AA, Aa, aa) by SNP2 (BB, Bb, bb) grids nested within SNP3 (CC, Cc, cc) by SNP4 (DD, Dd, dd), with empty cells marked]
32
Tree Models
• Response variable can be
– Simple
• disease indicator (categorical)
• IBD sharing (continuous)
• Number of chromosome breaks (counts)
– Complex
• Survival object
• Regression object
• Multivariate object
• Predictor variables can be categorical, counts
or continuous
• Tree models provide some benefit over
logistic regression with respect to identifying
highest risk groups and not requiring
assumptions, but tend to overfit data
33
Tree Models
• First you “grow” the tree
– Like forward regression
– Only “important” predictors are put in the model
• Control the growth of the tree
– Setting limits on how many predictors to allow in the
model
• Then you “prune” the tree
– Like backward regression
– Only “significant” predictors are left in the model
34
Growing a Classification Tree
[Figure: root node containing 15 affected (A) and 15 unaffected (U) subjects; Pr(A) = 0.50, Pr(U) = 0.50]
• Data are recursively partitioned into
increasingly homogeneous subgroups
• Partitions of the data are ‘branched out’
through binary splits
35
All Possible Binary Splits
[Figure: root node of affected (A) and unaffected (U) subjects, with candidate splits]
Sex: Male vs Female
Genotypes: BB vs Bb & bb; BB & Bb vs bb; BB & bb vs Bb
DNA repair capacity (measure of risk of cancer), range [6.26, 8.96]: ≤6.265 vs >6.265; ≤6.275 vs >6.275
36
[Figure: root node (15 affected, 15 unaffected; Pr(A)=0.50, Pr(U)=0.50) split by sex]
Female: Pr(A)=0.50, Pr(U)=0.50
Male: Pr(A)=0.50, Pr(U)=0.50
37
[Figure: root node (15 affected, 15 unaffected; Pr(A)=0.50, Pr(U)=0.50) split by family history of cancer]
Family History Of Cancer: Pr(A) = 10/15 = 0.667, Pr(U) = 5/15 = 0.333
No Family History Of Cancer: Pr(A) = 5/15 = 0.333, Pr(U) = 10/15 = 0.667
38
Purity-Impurity of a Node
[Figure: three example nodes of affected (A) and unaffected (U) subjects]
Pr(A) = 0.50, Pr(U) = 0.50: IMPURE
Pr(A) = 0.667, Pr(U) = 0.333
Pr(A) = 1.0, Pr(U) = 0.0: PURE
39
Choosing splits
• Different measures are used:
– For the ith group, let Prob(Yi=1) = Ci, and let wi be the
proportion of the sample in a given node
– Entropy measure is
$\sum_i w_i \{ -C_i \log_2 C_i - (1 - C_i) \log_2 (1 - C_i) \}$
– Gini Index is $\sum_i w_i C_i (1 - C_i)$
– Bayesian (misclassification rate based on
sample) is $\sum_i w_i \min\{C_i, 1 - C_i\}$
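The three measures can be written directly from the definitions above. This is a minimal sketch: the function names are illustrative, and the example nodes use the family history split shown earlier (two daughters of 15 subjects each, with 10/15 and 5/15 affected).

```python
import math


def entropy(c):
    """Two-class entropy (log base 2) for a node with Pr(Y=1) = c."""
    if c in (0.0, 1.0):
        return 0.0  # a pure node has no impurity
    return -(c * math.log2(c) + (1.0 - c) * math.log2(1.0 - c))


def gini(c):
    """Gini index for a two-class node."""
    return c * (1.0 - c)


def misclassification(c):
    """Bayes (misclassification) rate for a two-class node."""
    return min(c, 1.0 - c)


def weighted_impurity(nodes, measure):
    """nodes: list of (w_i, C_i); w_i is the share of the sample in node i."""
    return sum(w * measure(c) for w, c in nodes)


# Daughter nodes of the family history split: equal size, 10/15 and 5/15 affected
daughters = [(0.5, 10 / 15), (0.5, 5 / 15)]
for name, measure in (("entropy", entropy), ("gini", gini),
                      ("bayes", misclassification)):
    print(f"{name}: {weighted_impurity(daughters, measure):.3f}")
```

All three measures are largest at C = 0.5 and zero for a pure node; they differ only in how sharply they penalize intermediate proportions.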
40
Measuring Purity-Impurity of a Node
[Figure: Bayes, Gini, and Entropy impurity curves plotted against the node proportion from 0 to 1]
41
Goodness of a Split
IS= p-Pr(AL)L - Pr(AR)R
Entropy of Parent Node
Proportion Affected
in Left Daughter node
Entropy of Left Daughter Node
Proportion Affected
in Left Daughter node
Entropy of Left Daughter Node
42
[Figure: split on sex, with node entropies]
Parent node: P = 0.69
Female: L = 0.69
Male: R = 0.69
IS1 = 0.69 - 0.5*0.69 - 0.50*0.69 = 0
43
[Figure: split on family history of cancer, with node entropies]
Parent node: P = 0.69
Family History Of Cancer: L = 0.64
No Family History Of Cancer: R = 0.64
IS2 = 0.69 - 0.667*0.64 - 0.333*0.64 = 0.05
44
Choice of Best Split
Variable          Goodness of Split
Sex               0.00
Family History    0.05
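The two goodness-of-split values can be recomputed from node counts. This sketch uses natural-log entropy, which is what gives 0.69 for a 50/50 node, and weights each daughter by its share of the parent's subjects (the wi of the "Choosing splits" slide). The family history counts (10/5 and 5/10) come from the slides; the per-sex counts (8/8 and 7/7) are an assumption, and any split that leaves half of each daughter affected gives the same answer.

```python
import math


def node_entropy(n_affected, n_unaffected):
    """Natural-log entropy of a node from its affected/unaffected counts."""
    total = n_affected + n_unaffected
    h = 0.0
    for count in (n_affected, n_unaffected):
        if count > 0:
            p = count / total
            h -= p * math.log(p)
    return h


def goodness_of_split(left, right):
    """Decrease in entropy; left and right are (n_affected, n_unaffected)."""
    parent = (left[0] + right[0], left[1] + right[1])
    n_left, n_right = sum(left), sum(right)
    n = n_left + n_right
    return (node_entropy(*parent)
            - (n_left / n) * node_entropy(*left)
            - (n_right / n) * node_entropy(*right))


sex_split = goodness_of_split((8, 8), (7, 7))              # assumed counts
family_history_split = goodness_of_split((10, 5), (5, 10))  # counts from slides
print(f"Sex: {sex_split:.3f}")
print(f"Family history: {family_history_split:.3f}")
```

Without intermediate rounding the family history value is about 0.057; the slide's 0.05 comes from rounding the entropies to 0.69 and 0.64 before subtracting. Either way, family history is the better first split.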
45
Stopping the Growth of a Tree
• Minimum size of a node to split
• Minimum size of a daughter node after
a split
• Misclassification cost: no more splits if
no gain
46
Pruning a Tree
• Minimum Error: Prune off branches such
that subtree has minimum CV error
• 1-SE Rule: Prune off branches such that the
subtree has CV error not exceeding
R̂ + SE, where R̂ is the minimum CV error
• Alternative Pruning Rules
47
Alternative Pruning Rule
•Tree is allowed to overgrow
•At each node, an OR value is calculated for the test of
Ho: OR=1.0 versus Ha: OR>1.0.
Parent Node: NA Affected, NU Unaffected
Daughter Node 1 (Vi ≤ k): NA1 Affected, NU1 Unaffected
Daughter Node 2 (Vi > k): NA2 Affected, NU2 Unaffected
          Vi ≤ k    Vi > k
Aff       NA1       NA2
UnAff     NU1       NU2
$\mathrm{OR} = \frac{N_{A1} N_{U2}}{N_{A2} N_{U1}}$
48
Alternative Pruning Rule
•Under Ho, the natural log of the odds ratio, ln(OR),
approximately follows a normal distribution with a mean of ln(1) = 0
•At each node, we can calculate a standard normal
variate given by
$Z = \frac{\ln(\mathrm{OR})}{SE}$
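A sketch of the node-level test follows. The slide does not give the formula for SE; the usual large-sample (Woolf) standard error of ln(OR) is assumed here, and the example counts are the family history split from the earlier slides (10 affected and 5 unaffected versus 5 affected and 10 unaffected).

```python
import math


def odds_ratio_z(n_a1, n_u1, n_a2, n_u2):
    """Return (OR, Z) comparing daughter node 1 with daughter node 2.

    SE is the Woolf standard error of ln(OR) (an assumption of this sketch);
    it is undefined if any count is zero.
    """
    odds_ratio = (n_a1 * n_u2) / (n_a2 * n_u1)
    se = math.sqrt(1 / n_a1 + 1 / n_u1 + 1 / n_a2 + 1 / n_u2)
    return odds_ratio, math.log(odds_ratio) / se


Z_CRITICAL = 2.32  # one-sided 0.01 cutoff as quoted on the slides

odds_ratio, z = odds_ratio_z(10, 5, 5, 10)
print(f"OR = {odds_ratio:.2f}, Z = {z:.2f}, below cutoff: {z < Z_CRITICAL}")
```

In the pruning rule, a branch is removed only when the largest Z within it falls below the cutoff, so a weak split is retained if it leads to a strong one further down.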
49
Overgrown Tree
Prune if max Z < Z.01 = 2.32
[Figure: overgrown tree with each node annotated with its OR (0.99 to 6.10), Z (0.10 to 8.23), and max Z; branches with max Z below 2.32 are marked as pruned]
50
Application of Tree Models
• Data Exploration
• Subgroup Selection
– stratification
– most informative group
– risk groups
• Variable Selection
• Identify Interactions among Covariates
• Model Assessment
• Parsimony
• Conversion of Continuous Covariates to
Categorical
• Include Individuals with Missing Data in
Modeling
51
Lung Cancer Risk
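[Figure: classification tree; each node shows no. of controls/no. of cases* and, where given, OR† with 95% CI]
All subjects: 1816/1651
  No reported emphysema diagnosis: 1661/1301
    No asbestos exposure: 1186/795
      Repair ≥ 11.46: 131/43
      Repair < 11.46: 1055/752, OR=2.17 (1.52-3.10)
    Asbestos exposure: 475/506, OR=1.69 (1.21-2.37)
      Self-reported hay fever: 101/69
      No reported hay fever: 374/437, OR=1.69 (1.21-2.37)
  Self-reported emphysema diagnosis: 155/350, OR=2.91 (2.37-3.58)
    Split on Repair (Repair < 4.59; 4.59 ≤ Repair < 12.94; Repair ≥ 12.94),
    with nodes 12/10; 143/321, OR=3.14 (1.29-7.68); 0/19, ‡ OR=636.1 (29.33-Infinity)
    [the pairing of these nodes with the repair ranges is not preserved in this transcript]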
Node Legend
Green Nodes are Reference
* No. of controls/no. of cases
† ORs adjusted for age, sex, and ethnicity
‡ OR and 95% CI from exact logistic regression
52
Lung Cancer Risk
[Figure: the tree from the previous slide, repeated with additional Repair nodes (Repair ≥ 12.94; 4.59 ≤ Repair < 12.94; Repair < 4.59) labeled OR=1.18 (0.98-1.44) and OR=1.18 (1.01-1.37)]
53
54
Random Forests
1. Building Phase
Select m<M variables at random
[default in RF software is m=sqrt(M)]
Select n<N of the subjects at random
2. Grow tree completely; do not prune.
55
Random Forests
3. Testing Phase:
•Use the data not selected to build the tree (out-
of-bag “OOB” data) to test the tree
– Take j to be the class that got the most
votes over all trees for which the case was OOB
– OOB error estimate = proportion of cases,
taken over all cases, for which j is not equal
to the true class of the case
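The testing phase can be sketched as a vote count over out-of-bag predictions. The votes and true classes below are hypothetical, and the function name is illustrative; this is not the interface of any particular random forest package.

```python
from collections import Counter


def oob_error(oob_votes, true_classes):
    """Proportion of cases whose majority OOB vote differs from the true class.

    oob_votes: {case_id: [predicted class from each tree where the case was OOB]}
    Cases that were never out of bag are skipped. Ties go to the class seen first.
    """
    errors = 0
    counted = 0
    for case_id, votes in oob_votes.items():
        if not votes:
            continue
        j = Counter(votes).most_common(1)[0][0]   # class with the most votes
        counted += 1
        errors += (j != true_classes[case_id])
    return errors / counted


# Hypothetical OOB votes ("A" = affected, "U" = unaffected)
oob_votes = {
    "s1": ["A", "A", "U"],
    "s2": ["U", "U"],
    "s3": ["A", "U", "A"],
    "s4": ["U"],
}
true_classes = {"s1": "A", "s2": "U", "s3": "U", "s4": "U"}

print(f"OOB error estimate: {oob_error(oob_votes, true_classes):.2f}")
```

Because each case is scored only by trees that did not see it during building, the estimate behaves like a built-in test set error.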
56
Variable selection in Random Forests model with 782 participants and 36 variables
Pack Years
57
Scree plot of the Random Forests model with 782 participants and 36 variables, which
include 8 pseudo-variables of pack-years, showing that random forests identify important but
collinear variables.
58
ROC curve of Random Forests model
with 7 variables and 470 participants
59
Multifactor Dimensionality Reduction
(MDR)
• Uses a combinatorial partitioning approach.
• It is a non-parametric and genetic model free alternative
to logistic regression.
• In this procedure, the multilocus genotypes are pooled
into high-risk and low-risk groups. As a result, the
genotype predictors are reduced from n dimensions to
one dimension.
60
1. The data are divided into a training set and testing set
61
2. A group of factors are selected from the factor list
62
3 & 4. In each multifactor cell, the ratio of cases to controls in the
training set is calculated; cells are labeled ‘high risk’ or ‘low risk’
63
5 & 6. The best factor model is picked after all possible combinations
of a given number of factors are tested on how well cases
and controls in the testing set are classified
64
Concepts behind MDR
• ‘Best’ Model Selection (based on cross-validation)
– Testing accuracy: how likely a model is to generalize to
independent data
– CV consistency: the number of times a particular set of
attributes is identified across the CV subsets
– Parsimony: if it is unclear which model is best,
select the smaller model
• Imputation (separate program)
– Simple imputation to impute missing data
65
MDR Output
• Dendrogram (on latest version of MDR)
[Figure: dendrogram of X2, X6, X1, and X8; color scale from Red (Synergy) through Orange, Brown, and Green to Blue (Redundancy)]
66
MDR Output
• Graphical Model
[Figure: 3 x 3 grid of X1 (0, 1, 2) by X8 (0, 1, 2) genotype cells, each showing a pair of counts]
67
MDR Output
• If-Then Rules
IF X1 = 0 AND X8 = 0 THEN CLASSIFY AS 0.
IF X1 = 0 AND X8 = 1 THEN CLASSIFY AS 1.
IF X1 = 0 AND X8 = 2 THEN CLASSIFY AS 1.
IF X1 = 1 AND X8 = 0 THEN CLASSIFY AS 1.
IF X1 = 1 AND X8 = 1 THEN CLASSIFY AS 1.
IF X1 = 1 AND X8 = 2 THEN CLASSIFY AS 0.
IF X1 = 2 AND X8 = 0 THEN CLASSIFY AS 1.
IF X1 = 2 AND X8 = 1 THEN CLASSIFY AS 0.
IF X1 = 2 AND X8 = 2 THEN CLASSIFY AS 0.
68
Esophageal Cancer
•210 cases with histologically confirmed disease recruited
from UT-MDACC
•Presented with resectable adenocarcinoma or squamous
cell carcinoma of the esophagus or gastroesophageal
junction
•Patients were treated with preoperative chemoradiation
(platinum/5-FU) followed by esophagectomy
•Study endpoints were recurrence and survival
69
Esophageal Cancer
•5-FU Pathway
–Folate metabolism pathway.
MTHFR Glu429Ala
MTR Asp919Gly
TS a+227G
MTHFR Ala222Val
TS C157T
•Cisplatin Pathway
–Genes that potentially modulate cisplatin efficacy and toxicity.
MDR1 C3435T
MDR1 Ala892Ser
GSTP Ile105Val
GSTP Ala114Val
MPO T-764C
P53 G/A
P53 Pro72Arg
FAS G-670A
FASL C-844T
NQO1 Pro187Ser
•Nucleotide Excision Repair (NER) Pathway
–NER is a major cellular system for repairing platinum-induced DNA adducts.
XPA (5’ UTR)
XPC Lys939Gln
XPD Lys751Gln
XPG1104
ERCC1 3’ UTR
ERCC6 Met1097Val
ERCC6 Arg1230Pro
CCNH Val270Ala
RAD23B Ala249Val
70
MDR Results
•Recurrence
•Genes involved: CCNH Val270Ala, MDRE21910, TS227
•Testing Accuracy: 66%
•Cross Validation: 100%
•HR (95% CI): 3.66 (1.76-7.62)
•Survival
•No higher order interaction detected
71
Caveats to MDR: Interpretation of
‘Interaction’
1 Locus Model
IF X1=0 THEN CLASSIFY AS 1
IF X1=1 THEN CLASSIFY AS 0
IF X1=2 THEN CLASSIFY AS 0
2 Locus Model
IF X1 = 0 AND X8 = 0 THEN CLASSIFY AS 0.
IF X1 = 0 AND X8 = 1 THEN CLASSIFY AS 1.
IF X1 = 0 AND X8 = 2 THEN CLASSIFY AS 1.
IF X1 = 1 AND X8 = 0 THEN CLASSIFY AS 1.
IF X1 = 1 AND X8 = 1 THEN CLASSIFY AS 1.
IF X1 = 1 AND X8 = 2 THEN CLASSIFY AS 0.
IF X1 = 2 AND X8 = 0 THEN CLASSIFY AS 1.
IF X1 = 2 AND X8 = 1 THEN CLASSIFY AS 0.
IF X1 = 2 AND X8 = 2 THEN CLASSIFY AS 0.
72
Caveats to MDR
• Interpretation of ‘Interaction’
3 Locus Model
IF X1 = 0 AND X6 = 0 AND X8 = 0 THEN CLASSIFY AS 0.
IF X1 = 0 AND X6 = 0 AND X8 = 1 THEN CLASSIFY AS 0.
IF X1 = 0 AND X6 = 0 AND X8 = 2 THEN CLASSIFY AS 1.
IF X1 = 0 AND X6 = 1 AND X8 = 0 THEN CLASSIFY AS 0.
IF X1 = 0 AND X6 = 1 AND X8 = 1 THEN CLASSIFY AS 1.
IF X1 = 0 AND X6 = 1 AND X8 = 2 THEN CLASSIFY AS 0.
IF X1 = 0 AND X6 = 2 AND X8 = 0 THEN CLASSIFY AS 1.
IF X1 = 0 AND X6 = 2 AND X8 = 1 THEN CLASSIFY AS 0.
IF X1 = 0 AND X6 = 2 AND X8 = 2 THEN CLASSIFY AS 0.
IF X1 = 1 AND X6 = 0 AND X8 = 0 THEN CLASSIFY AS 0.
IF X1 = 1 AND X6 = 0 AND X8 = 1 THEN CLASSIFY AS 1.
IF X1 = 1 AND X6 = 0 AND X8 = 2 THEN CLASSIFY AS 0.
IF X1 = 1 AND X6 = 1 AND X8 = 0 THEN CLASSIFY AS 1.
IF X1 = 1 AND X6 = 1 AND X8 = 1 THEN CLASSIFY AS 0.
IF X1 = 1 AND X6 = 1 AND X8 = 2 THEN CLASSIFY AS 0.
IF X1 = 1 AND X6 = 2 AND X8 = 0 THEN CLASSIFY AS 0.
IF X1 = 1 AND X6 = 2 AND X8 = 1 THEN CLASSIFY AS 0.
IF X1 = 1 AND X6 = 2 AND X8 = 2 THEN CLASSIFY AS 0.
IF X1 = 2 AND X6 = 0 AND X8 = 0 THEN CLASSIFY AS 1.
IF X1 = 2 AND X6 = 0 AND X8 = 1 THEN CLASSIFY AS 0.
IF X1 = 2 AND X6 = 0 AND X8 = 2 THEN CLASSIFY AS 0.
IF X1 = 2 AND X6 = 1 AND X8 = 0 THEN CLASSIFY AS 0.
IF X1 = 2 AND X6 = 1 AND X8 = 1 THEN CLASSIFY AS 0.
IF X1 = 2 AND X6 = 1 AND X8 = 2 THEN CLASSIFY AS 0.
IF X1 = 2 AND X6 = 2 AND X8 = 0 THEN CLASSIFY AS 0.
IF X1 = 2 AND X6 = 2 AND X8 = 2 THEN CLASSIFY AS 0
73
Models for Gene-Environment
Interaction
74
75
Definitions
• Reaction norm: function relating mean phenotypic response of a
genotype to change in the environment
• Labile trait: a phenotype that can react to the environment
• Nonlabile trait: a phenotype that, during development, becomes
fixed
• Both can display gene x environment interactions; for labile traits,
g x e interactions can be measured by using the same
individuals repeatedly in different environments
• For nonlabile traits, different individuals must be studied in different
environments
• Phenotypic stability: phenotype is similar in different environments
• Phenotypic plasticity: phenotype varies across environment
76
Effects of Gene-Environment
Interactions
• Falconer suggests treating the phenotype in 2
different environments as 2 different traits.
• Then, since (when) the genetic background is the
same for both environments, the correlation of
phenotype in the two environments is 1 for a model
with no interaction, and < 1 if there is interaction
Assumptions :
• either work with a labile trait and measure same
individuals in 2 environments
• or work with inbred lines so that all animals are
genetically identical if different animals are
measured in different environments
• or assume only a finite number of measurable
genotypes affect trait and these are measured
77
78
G x E interactions for qualitative
traits/diseases
• Usually analyzed in the form of case/control
data, where G is the measured genotype and E is
some environmental exposure
• Set up two cross classification tables
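One way to read the two tables is as one exposure-by-disease table per genotype group, with the interaction measured as the ratio of the two exposure odds ratios. The sketch below follows that reading, which is an assumption since the tables themselves are not reproduced here; the counts are hypothetical.

```python
def odds_ratio(cases_exposed, controls_exposed, cases_unexposed, controls_unexposed):
    """Exposure odds ratio from one 2 x 2 case/control table."""
    return (cases_exposed * controls_unexposed) / (cases_unexposed * controls_exposed)


# Hypothetical counts, one table per genotype group
non_carriers = dict(cases_exposed=60, controls_exposed=50,
                    cases_unexposed=40, controls_unexposed=50)
carriers = dict(cases_exposed=45, controls_exposed=15,
                cases_unexposed=15, controls_unexposed=15)

or_e_non_carriers = odds_ratio(**non_carriers)
or_e_carriers = odds_ratio(**carriers)

# Multiplicative interaction: how much the exposure OR changes with genotype
interaction_or = or_e_carriers / or_e_non_carriers

print(f"exposure OR in non-carriers: {or_e_non_carriers:.2f}")
print(f"exposure OR in carriers: {or_e_carriers:.2f}")
print(f"interaction OR: {interaction_or:.2f}")
```

An interaction OR of 1 would mean the exposure effect is the same in both genotype groups on the multiplicative scale; this ratio corresponds to the exponentiated product-term coefficient in a logistic model.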
79
• Tools for sample size estimation to detect
interactions:
– Power program, developed by Garcia-Closas
and Lubin: http://dceg.cancer.gov/POWER/
– Quanto, developed by James Gauderman:
http://hydra.usc.edu/gxe/
80
Results from running Power
Sample sizes to detect a 2-fold interaction effect,
Main effects (marginal effect is 2.4), exposure frequency is 50%
For each exposure, equal number of cases and controls
NB for alleles have to double sample sizes
81
Summaries
• Methods for gene-gene and gene-environment
interaction analysis are widely available
• Results may depend subtly on (unstated)
assumptions
• Epistasis is likely to be a key feature in
disease development and expression of
many traits
82