Gene expression variation and eQTL mapping studies in humans


An integrative genomics approach to
infer causal associations between gene
expression and disease
Schadt, E. E., Lamb, J., Yang, X., Zhu, J., Edwards, S., Guhathakurta, D.,
Sieberts, S. K., Monks, S., Reitman, M., Zhang, C., Lum, P. Y., Leonardson, A.,
Thieringer, R., Metzger, J. M., Yang, L., Castle, J., Zhu, H., Kash, S. F., Drake, T.
A., Sachs, A., and Lusis, A. J.
Nature Genetics 37: 710-717 (2005)
Speaker: Yen-Yi Ho
Advisor: Giovanni Parmigiani
Department of Biostatistics, Johns Hopkins University
Outline
• Introduction
– Background & Definitions
– Scientific Questions
• Previous eQTL Studies
– Gene Expression Data in Humans
– Statistical Analytic Approaches
– Results
• Schadt et al. 2005: An Integrative Approach
– Causality Models
– Application: Gene Expression in BXD Mice
– Results from Application
• Discussion of New Approach
QTL (Quantitative Trait Locus)
Diagram: genetic locus (QTL; L) → disease (D)
• Genes controlling more than 1,000 monogenic (Mendelian) diseases have been identified using traditional gene mapping approaches.
• The involvement of multiple genes, environmental factors, and their interactions has limited the success of mapping complex human traits (such as cancer, diabetes, and asthma).
Introduction
We now have more information:
• DNA: genotype data (SNP polymorphisms)
• mRNA: gene expression data
Expression QTL (eQTL)
Goal: identify genomic locations where genotype significantly affects gene expression.
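The goal above can be illustrated with a single-marker scan for one expression trait. This is a minimal sketch, not the procedure of any reviewed study: it assumes genotypes are coded additively as 0, 1, 2 allele copies, fits expression on genotype at each marker by least squares, and reports a LOD score, LOD = (n/2) log10(RSS0/RSS1), comparing the genotype model with an intercept-only model. The function names and toy data are hypothetical.

```python
import math

def lod_score(expression, genotype):
    """LOD score for one expression trait at one marker.

    Compares a linear model with an additive genotype effect to an
    intercept-only model: LOD = (n / 2) * log10(RSS0 / RSS1).
    Genotypes are assumed to be coded 0, 1, 2 (allele copies).
    """
    n = len(expression)
    mean_y = sum(expression) / n
    mean_g = sum(genotype) / n
    sxx = sum((g - mean_g) ** 2 for g in genotype)
    rss0 = sum((y - mean_y) ** 2 for y in expression)
    if sxx == 0 or rss0 == 0:  # monomorphic marker or constant trait: no information
        return 0.0
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotype, expression))
    rss1 = rss0 - sxy ** 2 / sxx  # residual sum of squares after fitting the slope
    if rss1 <= 0:  # perfect fit
        return math.inf
    return (n / 2) * math.log10(rss0 / rss1)

def scan(expression, markers):
    """Return (best_marker, best_lod) over a dict: marker name -> genotype codes."""
    scores = {name: lod_score(expression, g) for name, g in markers.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Toy data for one gene measured in six individuals (illustrative only).
expression = [1.0, 1.2, 2.1, 1.9, 3.2, 2.8]
markers = {"marker_a": [0, 0, 1, 1, 2, 2], "marker_b": [0, 1, 0, 1, 0, 1]}
best, lod = scan(expression, markers)
```

Here `marker_a` tracks expression closely and gets a LOD of about 4.4, while `marker_b` scores near zero. In a genome-wide study the scan is repeated for every expression trait, and a multiple-testing-aware threshold decides which LOD scores count as significant.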
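Cis-, trans-, and master trans-eQTLs
(Figure panels: cis-eQTL, trans-eQTL, master trans-eQTL)
• cis-eQTL: the locus lies near the gene whose expression it affects.
• trans-eQTL: the locus lies far from its target gene, often on another chromosome.
• master trans-eQTL: a single locus that affects the expression of many genes in trans.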
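The cis/trans distinction reduces to comparing the position of the associated marker with the position of the target gene, and a master trans-eQTL shows up as a marker with many trans targets. A minimal sketch, assuming a hypothetical 1 Mb cis window and a hypothetical `min_targets` cutoff; published studies use their own definitions, and the record fields below are illustrative.

```python
from collections import Counter

CIS_WINDOW_BP = 1_000_000  # hypothetical: within 1 Mb of the gene counts as cis

def classify(eqtl, window=CIS_WINDOW_BP):
    """Label one gene-marker association as 'cis' or 'trans'."""
    same_chrom = eqtl["gene_chrom"] == eqtl["marker_chrom"]
    close = abs(eqtl["gene_pos"] - eqtl["marker_pos"]) <= window
    return "cis" if same_chrom and close else "trans"

def master_trans_loci(eqtls, min_targets=10, window=CIS_WINDOW_BP):
    """Markers acting in trans on at least min_targets genes (hypothetical cutoff)."""
    counts = Counter(e["marker"] for e in eqtls if classify(e, window) == "trans")
    return {marker: n for marker, n in counts.items() if n >= min_targets}

def record(gene, gene_chrom, gene_pos, marker, marker_chrom, marker_pos):
    """Build one association record (hypothetical field layout)."""
    return dict(gene=gene, gene_chrom=gene_chrom, gene_pos=gene_pos,
                marker=marker, marker_chrom=marker_chrom, marker_pos=marker_pos)

# Toy associations: one cis pair and one marker acting on three distant genes.
eqtls = [
    record("g1", "1", 5_000_000, "m1", "1", 5_200_000),
    record("g2", "3", 1_000_000, "m2", "7", 9_000_000),
    record("g3", "5", 2_000_000, "m2", "7", 9_000_000),
    record("g4", "7", 50_000_000, "m2", "7", 9_000_000),
]
labels = [classify(e) for e in eqtls]               # ['cis', 'trans', 'trans', 'trans']
hotspots = master_trans_loci(eqtls, min_targets=3)  # {'m2': 3}
```

Note that `g4` sits on the same chromosome as `m2` but 41 Mb away, so it is still labeled trans under this window.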
Constructing regulatory networks (hypothetical example)
(Figure: genetic loci and expression levels for genes 1-6)
1. Gene 1 (B): cis.
2. Gene 2 (A): cis; controlled by gene 1 (B).
3. No controls.
4. Gene 4 (D): cis; controlled by gene 3 (F).
5. Not cis; controlled by genes 1, 2, 4, and 3.
6. Not cis; controlled by all.
Source: Jansen, R.C. & Nap, J.P. (2001), Trends Genet, 17, 388-391
Scientific Questions
• What is the variation and heritability of gene expression?
• Are there associations between genetic loci and target
gene expression?
• What proportion of eQTLs act in cis versus in trans?
• How can cis-eQTLs be verified?
• Are there any master trans-eQTLs?
• What are the annotations and functional categories (KEGG, GO, …) of the genes affected by cis-, trans-, and master trans-eQTLs?
Scientific questions and goals
• Transcript abundance may act as an intermediate phenotype between genetic loci and the clinical phenotype.
Secondary goal
• Integrate genotype, expression, and clinical-trait data to construct regulatory networks and improve understanding of disease etiology.
Data
The data
• All of the reviewed studies measured expression in lymphoblastoid cell lines from CEPH families.
Differences
• 1. Choice of expression traits analyzed
• 2. Platforms used to measure and preprocess expression
• 3. SNP marker density
• 4. Statistical approaches
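Statistical methods in human eQTL mapping studies
Linkage
• Nonparametric linkage analysis
1. Sib-pair analysis for quantitative traits (ASP)
2. Variance component analysis (VC)
Association (linkage disequilibrium)
• Family-based association analysis (QTDT)
• Population-based association analysis (GWA)
Association mapping generally has finer resolution than linkage mapping: linkage relies on the few meioses within families, so large chromosomal segments are co-inherited, whereas linkage disequilibrium in a population has decayed over many generations and spans much shorter distances.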
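One classical form of sib-pair linkage analysis for a quantitative trait is Haseman-Elston regression: the squared trait difference within each sib pair is regressed on the estimated proportion of alleles the pair shares identical by descent (IBD) at a marker, and a negative slope indicates linkage. This is offered as an illustration of the idea, not as the exact variant used by each reviewed study. The sketch assumes IBD proportions are already estimated and omits significance testing; the toy data are hypothetical.

```python
def haseman_elston(sib_pairs):
    """Haseman-Elston regression at one marker.

    sib_pairs: list of (trait_sib1, trait_sib2, pi_hat) tuples, where pi_hat is the
    estimated proportion of alleles the pair shares identical by descent (IBD).
    Returns (intercept, slope) of the least-squares fit of the squared trait
    difference on pi_hat; a negative slope is the signature of linkage.
    """
    y = [(t1 - t2) ** 2 for t1, t2, _ in sib_pairs]
    x = [pi_hat for _, _, pi_hat in sib_pairs]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

# Toy sib pairs: pairs sharing more alleles IBD have more similar trait values.
pairs = [(1.0, 3.0, 0.0), (2.0, 3.5, 0.5), (2.0, 2.2, 1.0),
         (0.5, 2.5, 0.0), (1.0, 2.0, 0.5), (3.0, 3.1, 1.0)]
intercept, slope = haseman_elston(pairs)
```

In these toy pairs the slope is about -3.98: sibs sharing both alleles IBD differ much less than sibs sharing none.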
(Figure: comparison of resolution between linkage and association analysis)
Literature Review
(Table comparing previous studies; recoverable columns: genes with between-/within-individual variance ratio > 1; heritability)
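A between-/within-individual variance ratio above 1 means that expression differs more between people than between replicate measurements of the same person. A minimal sketch of one simple way to compute such a ratio for a single gene, assuming replicate arrays per individual; the function name and toy values are hypothetical, and the reviewed studies may have used a different estimator.

```python
from statistics import mean, variance

def variance_ratio(replicates):
    """Between- / within-individual variance ratio for one gene.

    replicates: dict mapping individual -> list of replicate expression values
    (at least two replicates per individual, at least two individuals).
    Between = sample variance of the individual means;
    within = average of the per-individual replicate variances.
    """
    individual_means = [mean(values) for values in replicates.values()]
    between = variance(individual_means)
    within = mean([variance(values) for values in replicates.values()])
    return between / within

# Toy replicate measurements for three individuals.
ratio = variance_ratio({"A": [1.0, 1.2], "B": [3.0, 3.2], "C": [5.0, 5.4]})
```

Here the ratio is about 105, far above 1, so this toy gene would count as varying between individuals.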
eQTL findings from previous studies
• Hit rate (the proportion of expression traits significantly linked to eQTLs): from 0.8% to 4%
• The proportion of cis-eQTLs is about 30%.
• Two master trans-eQTLs were identified.
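Both summary figures are simple ratios. A minimal sketch, assuming each tested trait maps to a list of "cis"/"trans" labels for its significant eQTLs and that the cis proportion is computed over eQTLs rather than over traits (studies differ on this); the function name and toy counts are hypothetical.

```python
def summarize_eqtls(n_traits_tested, eqtl_types):
    """Return (hit_rate, cis_fraction).

    eqtl_types: dict mapping expression trait -> list of 'cis'/'trans' labels for
    its significant eQTLs (empty list if the trait has none).
    """
    n_linked = sum(1 for labels in eqtl_types.values() if labels)
    hit_rate = n_linked / n_traits_tested
    labels = [lab for labs in eqtl_types.values() for lab in labs]
    cis_fraction = labels.count("cis") / len(labels) if labels else 0.0
    return hit_rate, cis_fraction

# Toy example: 200 traits tested, three of them with significant eQTLs.
hit_rate, cis_fraction = summarize_eqtls(
    200, {"g1": ["cis"], "g2": ["trans"], "g3": ["trans", "trans"], "g4": []})
```

This toy input gives a hit rate of 1.5% and a cis fraction of 25%.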
Master trans-eQTLs: 14q32 and 20q13
An Integrative Approach:
Schadt et al., Nature Genetics, 2005
An integrative approach
• Models for causality
– Causal model: L → mRNA → Disease
– Reactive model: L → Disease → mRNA
– Independent model: mRNA ← L → Disease
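New approach
M1 likelihood
Diagram: L → mRNA → Disease (L: genotype; R: mRNA level; D: disease)
• Causal model
– Joint probability
$p(L, R, D) = p(L)\,p(R \mid L)\,p(D \mid R)$, because the model assumes $p(D \mid R, L) = p(D \mid R)$
– Likelihood
$\mathcal{L}(\theta \mid M_1) = \prod_{i=1}^{N} \prod_{j=1}^{3} p(L_j)\, L(r_i \mid L_j)\, L(d_i \mid r_i)$
$L(r \mid L) = \frac{1}{\sqrt{2\pi\sigma_R^2}} \exp\left\{-\frac{(r - \mu_{R|L})^2}{2\sigma_R^2}\right\}$
$L(d \mid r) = \frac{1}{\sqrt{2\pi\sigma_{D|R}^2}} \exp\left\{-\frac{(d - \mu_{D|R})^2}{2\sigma_{D|R}^2}\right\}$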
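The M1 log-likelihood can be evaluated once each factor is fitted by maximum likelihood. This is a sketch under stated assumptions, not the authors' code: genotypes are observed, so for individual i only the factor for its own genotype enters (our reading of the double product); p(L_j) is set to the sample genotype proportions; mu_{R|L} is one mean per genotype with a pooled variance; and mu_{D|R} is a least-squares line in r. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def normal_logpdf(x, mu, var):
    """Log of the Gaussian density N(x; mu, var)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)

def genotype_loglik(L):
    """Sum of log p(L_i), with genotype frequencies set to sample proportions."""
    n, counts = len(L), Counter(L)
    return sum(math.log(counts[g] / n) for g in L)

def loglik_given_genotype(y, L):
    """Sum of log N(y_i; mu_{Y|L_i}, sigma^2): one mean per genotype, pooled MLE variance."""
    groups = defaultdict(list)
    for yi, g in zip(y, L):
        groups[g].append(yi)
    mu = {g: sum(v) / len(v) for g, v in groups.items()}
    var = sum((yi - mu[g]) ** 2 for yi, g in zip(y, L)) / len(y)
    return sum(normal_logpdf(yi, mu[g], var) for yi, g in zip(y, L))

def loglik_given_trait(y, x):
    """Sum of log N(y_i; a + b * x_i, sigma^2): least-squares line, MLE variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    var = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y)) / n
    return sum(normal_logpdf(yi, a + b * xi, var) for xi, yi in zip(x, y))

def loglik_causal(L, r, d):
    """M1 (causal, L -> R -> D): log p(L) + log L(r | L) + log L(d | r)."""
    return genotype_loglik(L) + loglik_given_genotype(r, L) + loglik_given_trait(d, r)

# Toy data: nine F2 individuals, genotypes coded 0/1/2 (illustrative only).
L = [0, 0, 0, 1, 1, 1, 2, 2, 2]
r = [1.0, 1.3, 0.8, 2.1, 1.9, 2.2, 3.1, 2.7, 3.0]
d = [2.1, 2.5, 1.7, 4.0, 3.9, 4.5, 6.3, 5.2, 6.1]
ll_m1 = loglik_causal(L, r, d)
```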
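M2 likelihood
Diagram: L → Disease → mRNA (L: genotype; R: mRNA level; D: disease)
• Reactive model
– Joint probability
$p(L, R, D) = p(L)\,p(D \mid L)\,p(R \mid D)$, because the model assumes $p(R \mid D, L) = p(R \mid D)$
– Likelihood
$\mathcal{L}(\theta \mid M_2) = \prod_{i=1}^{N} \prod_{j=1}^{3} p(L_j)\, L(d_i \mid L_j)\, L(r_i \mid d_i)$
$L(d \mid L) = \frac{1}{\sqrt{2\pi\sigma_D^2}} \exp\left\{-\frac{(d - \mu_{D|L})^2}{2\sigma_D^2}\right\}$
$L(r \mid d) = \frac{1}{\sqrt{2\pi\sigma_{R|D}^2}} \exp\left\{-\frac{(r - \mu_{R|D})^2}{2\sigma_{R|D}^2}\right\}$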
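The reactive model swaps the roles of mRNA and disease. A self-contained sketch of the M2 log-likelihood under the same assumptions as one would use for M1 (observed genotypes, sample genotype proportions for p(L_j), one disease mean per genotype with pooled variance, and mu_{R|D} as a least-squares line in d); the function name and toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def normal_logpdf(x, mu, var):
    """Log of the Gaussian density N(x; mu, var)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)

def loglik_reactive(L, r, d):
    """M2 (reactive, L -> D -> R): log p(L) + log L(d | L) + log L(r | d)."""
    n = len(L)
    counts = Counter(L)
    ll_geno = sum(math.log(counts[g] / n) for g in L)

    # L(d | L): one mean mu_{D|L} per genotype, pooled MLE variance sigma_D^2
    groups = defaultdict(list)
    for di, g in zip(d, L):
        groups[g].append(di)
    mu = {g: sum(v) / len(v) for g, v in groups.items()}
    var_d = sum((di - mu[g]) ** 2 for di, g in zip(d, L)) / n
    ll_d = sum(normal_logpdf(di, mu[g], var_d) for di, g in zip(d, L))

    # L(r | d): mu_{R|D} = a + b * d by least squares, MLE variance sigma_{R|D}^2
    md, mr = sum(d) / n, sum(r) / n
    b = sum((di - md) * (ri - mr) for di, ri in zip(d, r)) / sum((di - md) ** 2 for di in d)
    a = mr - b * md
    var_r = sum((ri - a - b * di) ** 2 for di, ri in zip(d, r)) / n
    ll_r = sum(normal_logpdf(ri, a + b * di, var_r) for di, ri in zip(d, r))

    return ll_geno + ll_d + ll_r

# Toy data: nine F2 individuals, genotypes coded 0/1/2 (illustrative only).
L = [0, 0, 0, 1, 1, 1, 2, 2, 2]
r = [1.0, 1.3, 0.8, 2.1, 1.9, 2.2, 3.1, 2.7, 3.0]
d = [2.1, 2.5, 1.7, 4.0, 3.9, 4.5, 6.3, 5.2, 6.1]
ll_m2 = loglik_reactive(L, r, d)
```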
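M3 likelihood
Diagram: mRNA ← L → Disease (L: genotype; R: mRNA level; D: disease)
• Independent model
– Joint probability
$p(L, R, D) = p(L)\,p(R \mid L)\,p(D \mid R, L)$
– Likelihood
$\mathcal{L}(\theta \mid M_3) = \prod_{i=1}^{N} \prod_{j=1}^{3} p(L_j)\, L(r_i \mid L_j)\, L(d_i \mid r_i, L_j)$
$L(r \mid L) = \frac{1}{\sqrt{2\pi\sigma_R^2}} \exp\left\{-\frac{(r - \mu_{R|L})^2}{2\sigma_R^2}\right\}$
$L(d \mid r, L) = \frac{1}{\sqrt{2\pi\sigma_{D|RL}^2}} \exp\left\{-\frac{(d - \mu_{D|RL})^2}{2\sigma_{D|RL}^2}\right\}$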
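In M3 the disease term conditions on both mRNA and genotype. The slide does not say how mu_{D|RL} is parameterized; this sketch assumes one intercept per genotype and a common slope in r (an ANCOVA-style fit), with the other factors fitted as for M1. The function names and toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def normal_logpdf(x, mu, var):
    """Log of the Gaussian density N(x; mu, var)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)

def group_means(values, L):
    """Mean of `values` within each genotype class."""
    groups = defaultdict(list)
    for v, g in zip(values, L):
        groups[g].append(v)
    return {g: sum(v) / len(v) for g, v in groups.items()}

def loglik_independent(L, r, d):
    """M3 (independent): log p(L) + log L(r | L) + log L(d | r, L).

    Assumes mu_{D|RL} = a_L + b * r (genotype-specific intercepts, common slope).
    """
    n, counts = len(L), Counter(L)
    ll_geno = sum(math.log(counts[g] / n) for g in L)

    # L(r | L): one mean mu_{R|L} per genotype, pooled MLE variance sigma_R^2
    mr = group_means(r, L)
    var_r = sum((ri - mr[g]) ** 2 for ri, g in zip(r, L)) / n
    ll_r = sum(normal_logpdf(ri, mr[g], var_r) for ri, g in zip(r, L))

    # L(d | r, L): common slope b estimated from within-genotype deviations
    md = group_means(d, L)
    sxy = sum((ri - mr[g]) * (di - md[g]) for ri, di, g in zip(r, d, L))
    sxx = sum((ri - mr[g]) ** 2 for ri, g in zip(r, L))
    b = sxy / sxx
    fitted = [md[g] + b * (ri - mr[g]) for ri, g in zip(r, L)]
    var_d = sum((di - f) ** 2 for di, f in zip(d, fitted)) / n
    ll_d = sum(normal_logpdf(di, f, var_d) for di, f in zip(d, fitted))

    return ll_geno + ll_r + ll_d

# Toy data: nine F2 individuals, genotypes coded 0/1/2 (illustrative only).
L = [0, 0, 0, 1, 1, 1, 2, 2, 2]
r = [1.0, 1.3, 0.8, 2.1, 1.9, 2.2, 3.1, 2.7, 3.0]
d = [2.1, 2.5, 1.7, 4.0, 3.9, 4.5, 6.3, 5.2, 6.1]
ll_m3 = loglik_independent(L, r, d)
```

Because this M3 contains the M1 disease term as a special case (equal intercepts), its maximized likelihood can never be lower than M1's on the same data; the AIC penalty is what keeps it from winning by default.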
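Model Selection
• Likelihood-based Causality Model Selection (LCMS)
– Calculate the likelihood of each model from the data.
– The model best supported by the data is the one with the smallest AIC (Akaike Information Criterion):
$\mathrm{AIC} = -2 \ln \mathcal{L}(\hat{\theta}) + 2p$, where $p$ is the number of estimated parameters.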
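Given each model's maximized log-likelihood and parameter count, the selection step is a direct application of the AIC formula. A minimal sketch: the slide does not say how "significantly smaller" was judged, so the `margin` argument is a hypothetical threshold, and the log-likelihoods and parameter counts in the example are toy values.

```python
def aic(loglik, n_params):
    """Akaike Information Criterion: -2 ln L(theta_hat) + 2p."""
    return -2.0 * loglik + 2 * n_params

def select_model(fits, margin=0.0):
    """Pick the model with the smallest AIC.

    fits: dict mapping model name -> (maximized log-likelihood, number of parameters);
    at least two models are required.
    margin: hypothetical AIC gap the winner must exceed over every competitor
    to count as a clear choice.
    Returns (best_model, aics, is_clear).
    """
    aics = {name: aic(ll, p) for name, (ll, p) in fits.items()}
    best = min(aics, key=aics.get)
    runner_up = min(v for name, v in aics.items() if name != best)
    return best, aics, runner_up - aics[best] > margin

# Toy log-likelihoods and parameter counts, for illustration only.
fits = {"M1": (-120.0, 8), "M2": (-130.0, 8), "M3": (-119.5, 10)}
best, aics, is_clear = select_model(fits, margin=2.0)
```

In this toy case M3 has the highest log-likelihood, but its two extra parameters push its AIC (259) above M1's (256), so M1 is selected, and it clears the hypothetical margin of 2.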
Simulation study
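Simulated traits: $T_i = \mu + \beta L_i + \varepsilon$
(Figure: L, $T_1$, and $T_2$, with links labeled $R^2_{L,T_1}$, $R^2_{T_1,T_2}$, and $R^2_{L,T_2}$)
A model was recorded as selected when its AIC was significantly smaller than the AICs of the competing models.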
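A simulation study of this kind needs data generated under a known causal structure. A minimal sketch, assuming a linear chain L → T1 → T2 with zero intercepts, unit effects, unit Gaussian noise, and F2 genotype frequencies 1:2:1; these settings and the function names are hypothetical, not those of the original simulation. For a linear chain, the squared correlation between L and T2 equals the product of the other two, which the example reproduces.

```python
import random

def simulate_chain(n, beta_l=1.0, beta_t=1.0, noise_sd=1.0, seed=0):
    """Simulate n individuals under a hypothetical causal chain L -> T1 -> T2.

    L is an F2 genotype (0, 1, 2 with probabilities 1/4, 1/2, 1/4);
    T1 = beta_l * L + e1 and T2 = beta_t * T1 + e2, with Gaussian noise.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        l = rng.choices([0, 1, 2], weights=[1, 2, 1])[0]
        t1 = beta_l * l + rng.gauss(0.0, noise_sd)
        t2 = beta_t * t1 + rng.gauss(0.0, noise_sd)
        data.append((l, t1, t2))
    return data

def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

data = simulate_chain(20000, seed=1)
L, T1, T2 = (list(col) for col in zip(*data))
r2_l_t1 = r_squared(L, T1)    # about 1/3 with the defaults (Var L = 0.5, Var T1 = 1.5)
r2_t1_t2 = r_squared(T1, T2)  # about 0.6 (Var T2 = 2.5)
r2_l_t2 = r_squared(L, T2)    # about 0.2 = (1/3) * 0.6
```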
Application to BXD mice data
The data
BXD mice: F2 offspring from C57BL/6J (B6) and DBA/2J (DBA).
• C57BL/6J: the ob mutation on the C57BL/6J background (B6-ob/ob) causes obesity but only mild and transient diabetes (Coleman and Hummel, 1973).
• DBA/2J: mice show low susceptibility to developing atherosclerotic aortic lesions.
Gene expression
• Liver extracted at 16 months of age
• Expression of 23,574 genes measured on Agilent arrays
Genetic loci
• 139 autosomal genetic loci (microsatellite markers, about 13 cM apart)
Disease
• Omental fat pad mass (OFPM) trait
New approach: filtering
(Diagram: L, mRNA, and Disease, with the links between them marked "?")
• Identify 4 candidate regions for the OFPM trait: chr1 at 95 cM, chr6 at 43 cM, chr9 at 8 cM, and chr19 at 28 cM.
• Select expression traits significantly correlated with OFPM: 440 intermediate expression traits were selected (P < 0.001).
• Keep expression traits with significant eQTLs at the candidate regions: 113 expression traits and 267 eQTLs were identified.
• Perform LCMS model selection on the 113 expression traits and rank them by the percentage of genetic variation in OFPM causally explained by each trait.
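The two filters before LCMS can be expressed as set operations over per-trait results. A minimal sketch: the four candidate regions and the P < 0.001 cutoff come from the slide, while the input layout, the 10 cM `window_cm` tolerance for an eQTL being "at" a region, and the trait names are hypothetical.

```python
# Candidate OFPM regions listed on the slide: (chromosome, position in cM).
CANDIDATE_REGIONS = [("1", 95.0), ("6", 43.0), ("9", 8.0), ("19", 28.0)]

def near_region(locus, regions, window_cm):
    """True if an eQTL (chromosome, cM) lies within window_cm of any candidate region."""
    chrom, pos = locus
    return any(chrom == c and abs(pos - c_pos) <= window_cm for c, c_pos in regions)

def filter_traits(corr_pvalues, eqtls, regions=CANDIDATE_REGIONS,
                  p_cut=0.001, window_cm=10.0):
    """Return expression traits that pass both filters, sorted by name.

    corr_pvalues: dict trait -> p-value of its correlation with the clinical trait.
    eqtls: dict trait -> list of (chromosome, cM) positions of significant eQTLs.
    window_cm is a hypothetical tolerance for an eQTL being "at" a region.
    """
    correlated = [t for t, p in corr_pvalues.items() if p < p_cut]
    return sorted(t for t in correlated
                  if any(near_region(q, regions, window_cm) for q in eqtls.get(t, [])))

# Hypothetical inputs for three expression traits.
corr_pvalues = {"trait_a": 1e-5, "trait_b": 0.01, "trait_c": 1e-4}
eqtls = {"trait_a": [("1", 97.0)], "trait_b": [("6", 43.0)], "trait_c": [("2", 50.0)]}
selected = filter_traits(corr_pvalues, eqtls)
```

Only `trait_a` survives: `trait_b` fails the correlation filter, and `trait_c` is correlated but its eQTL lies outside every candidate region. The surviving traits would then go through LCMS.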
Results from Application
Zfp90: zinc finger protein 90
Hsd11b1: 11-beta hydroxysteroid dehydrogenase isoform 1
C3ar1: complement component 3a receptor 1
Tgfbr2: transforming growth factor, beta receptor II
Knockout mouse experiments (figure panels):
• C3ar1-/- knockout mice (n = 5-7), 10 weeks of age
• Tgfbr2+/- knockout mice (n = 5-7)
Discussion
Diagram: L → mRNA → Disease
• The method fails to discriminate between highly correlated traits.
• Multiple filtering steps are involved.
• Further development is needed before the method can be applied automatically to general data sets.
• When the measurement error of mRNA exceeds that of D, the causal model (L → mRNA → Disease) is less likely to be selected.
• Advantage: enables the construction of eQTL networks.
References
• Morley, M.; Molony, C.M.; Weber, T.M.; Devlin, J.L.; Ewens, K.G.; Spielman, R.S. & Cheung, V.G. Genetic analysis of genome-wide variation in human gene expression. Nature, 2004, 430, 743-747.
• Monks, S.A.; Leonardson, A.; Zhu, H.; Cundiff, P.; Pietrusiak, P.; Edwards, S.; Phillips, J.W.; Sachs, A. & Schadt, E.E. Genetic inheritance of gene expression in human cell lines. Am J Hum Genet, 2004, 75, 1094-1105.
• Cheung, V.G.; Spielman, R.S.; Ewens, K.G.; Weber, T.M.; Morley, M. & Burdick, J.T. Mapping determinants of human gene expression by regional and genome-wide association. Nature, 2005, 437, 1365-1369.
• Stranger, B.E.; Forrest, M.S.; Clark, A.G.; Minichiello, M.J.; Deutsch, S.; Lyle, R.; Hunt, S.; Kahl, B.; Antonarakis, S.E.; Tavaré, S.; Deloukas, P. & Dermitzakis, E.T. Genome-wide associations of gene expression variation in humans. PLoS Genet, 2005, 1, e78.
• Deutsch, S.; Lyle, R.; Dermitzakis, E.T.; Attar, H.; Subrahmanyan, L.; Gehrig, C.; Parand, L.; Gagnebin, M.; Rougemont, J.; Jongeneel, C.V. & Antonarakis, S.E. Gene expression variation and expression quantitative trait mapping of human chromosome 21 genes. Hum Mol Genet, 2005, 14, 3741-3749.
• Jansen, R.C. & Nap, J.P. Genetical genomics: the added value from segregation. Trends Genet, 2001, 17, 388-391.
• Schadt, E.E.; Lamb, J.; Yang, X.; Zhu, J.; Edwards, S.; Guhathakurta, D.; Sieberts, S.K.; Monks, S.; Reitman, M.; Zhang, C.; Lum, P.Y.; Leonardson, A.; Thieringer, R.; Metzger, J.M.; Yang, L.; Castle, J.; Zhu, H.; Kash, S.F.; Drake, T.A.; Sachs, A. & Lusis, A.J. An integrative genomics approach to infer causal associations between gene expression and disease. Nat Genet, 2005, 37, 710-717.
Thank you ☺