The identification of human quantitative trait loci
The Identification of Human
Quantitative Trait Loci
Dr John Blangero
Southwest Foundation for Biomedical Research
ChemGenex Pharmaceuticals
The Goals: Genetic Analysis of Complex Phenotypes
QTL Localization
Where in the genome is the QTL located?
QTL Identification
What is (are) the gene(s) involved?
QTL Allelic Architecture
What are the specific QTNs? How many QTNs?
What are their frequencies and effect sizes?
Quantitative Traits
Usually closer to gene action than
disease itself.
Have superior statistical power.
Quantitative Endophenotypes
Heritable
Genetically correlated with
disease or other focal phenotype
Closer to the action of the genes
Liability: The Threshold Model
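A minimal sketch of the threshold idea, assuming the usual textbook formulation that the slide title refers to: liability is a standard-normal latent variable and an individual is affected when liability exceeds a threshold fixed by population prevalence. The function names and the prevalence value are illustrative assumptions, not taken from the talk.

```python
from statistics import NormalDist


def liability_threshold(prevalence: float) -> float:
    """Threshold t on a standard-normal liability scale with P(liability > t) = prevalence."""
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must be strictly between 0 and 1")
    return NormalDist().inv_cdf(1.0 - prevalence)


def is_affected(liability: float, prevalence: float) -> bool:
    """An individual is affected when liability exceeds the threshold."""
    return liability > liability_threshold(prevalence)


if __name__ == "__main__":
    # Hypothetical disease prevalence of 5%.
    print(f"threshold = {liability_threshold(0.05):.3f}")
    print(is_affected(2.0, 0.05))
```

Under this model a quantitative endophenotype that tracks liability carries more information per subject than the affected/unaffected label alone.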
The process of finding and identifying disease-related genes
involves objective prioritization.
Different Diseases
Different Designs
Different Methods
Family Studies
vs
Studies of Unrelateds
Major Study Designs in Human Genetics: Possible Inferences

                        Inference:
Design                  Heritability  Linkage  Association
Unrelated individuals   No            No       Yes
Triads                  No            Yes      Yes
Sibling pairs           Yes           Yes      Yes
Nuclear families        Yes           Yes      Yes
Extended pedigrees      Yes           Yes      Yes
You can exploit linkage and association information jointly in family studies.
Relative Per-Subject Power to Localize QTLs

Population/Study   Relative Efficiency  Ped. Size  Pedigree Type
Jirel (Nepal)      1.00                 2300       Extended (isolate)
Vermont            0.91                 331        Extended
SAFHS              0.59                 31         Extended
GAIT               0.35                 19         Extended
Framingham         0.24                 5          Extended, nuclear
Nuclear (4 sibs)   0.17                 6          Nuclear
Nuclear (3 sibs)   0.11                 5          Nuclear
Sib-pair           0.04                 2          Relative pair
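One way to read the efficiency column, sketched under an assumption the slide does not state: if relative efficiency is interpreted in the usual statistical sense, the number of subjects needed for equal power scales as the inverse of the efficiency. The reference sample size of 1,000 is hypothetical.

```python
# Relative per-subject efficiencies copied from the table above.
RELATIVE_EFFICIENCY = {
    "Jirel (Nepal)": 1.00,
    "Vermont": 0.91,
    "SAFHS": 0.59,
    "GAIT": 0.35,
    "Framingham": 0.24,
    "Nuclear (4 sibs)": 0.17,
    "Nuclear (3 sibs)": 0.11,
    "Sib-pair": 0.04,
}


def subjects_needed(reference_subjects: float, efficiency: float) -> float:
    """Subjects needed to match a reference design, assuming power per subject
    scales linearly with relative efficiency (an assumption, not a slide claim)."""
    if efficiency <= 0:
        raise ValueError("efficiency must be positive")
    return reference_subjects / efficiency


if __name__ == "__main__":
    reference = 1000  # hypothetical number of subjects in the most efficient design
    for design, eff in RELATIVE_EFFICIENCY.items():
        print(f"{design:18s} {subjects_needed(reference, eff):10.0f}")
```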
Linkage Designs
vs
Association Designs
Power: Linkage vs Association
Example 1: Positional Candidate Genes
QTL for serum leptin levels in the San
Antonio Family Heart Study
Highly replicated QTL
Chromosome 2 Obesity QTL
Bioinformatic Prioritization: GeneSniffer Results 2p22
[Figure: GeneSniffer hit score (axis 0 to 7000) by Mb coordinate (21.3 to 45.6); labeled genes: POMC, GCKR, UCN]
What Do You Do With A Good Positional
Candidate Gene?
The ALL or NOTHING principle
Find all of the variation in the gene.
Preference: Resequence everyone (no
bias against rare variants)
Alternative: Resequence a subset of
individuals
POMC: Pattern of LD
POMC QTN Analysis: Marginal Associations
How To Find the Most Likely Functional SNPs
Bayesian Quantitative Trait Nucleotide Analysis has the potential to aid
the discovery of the DNA variants that influence risk of common
disease. Objectively prioritizes SNPs for further functional work.
BQTN Analysis: Bayesian Model
Selection/Model Averaging
Evaluate possible models of gene action. The model space may be
very large: 2^n models of additive gene action for n variants.
Use Bayesian model selection to choose best models and
average parameters over models. Eliminates problem of
multiple testing.
Yields unbiased estimates of effect size.
Allows prioritization of polymorphisms for further lab
evaluation. Calculation of Posterior Probability of Effect.
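The model-selection idea on this slide can be illustrated with a small sketch. It is not SOLAR's BQTN implementation: it assumes unrelated individuals, ordinary least squares in place of a pedigree-based variance-component model, equal prior weight on every model, and the BIC approximation to the model evidence. All function names are hypothetical.

```python
from itertools import combinations

import numpy as np


def bic_for_subset(y, genotypes, subset):
    """BIC of a linear additive model using only the SNP columns in `subset`."""
    n = len(y)
    columns = [np.ones(n)] + [genotypes[:, j] for j in subset]
    X = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    n_params = X.shape[1]
    return n * np.log(rss / n) + n_params * np.log(n)


def posterior_probability_of_effect(y, genotypes):
    """Enumerate all 2^n SNP subsets and return, for each SNP, the summed
    posterior weight of the models that include it."""
    n_snps = genotypes.shape[1]
    subsets = [s for r in range(n_snps + 1)
               for s in combinations(range(n_snps), r)]
    bics = np.array([bic_for_subset(y, genotypes, s) for s in subsets])
    # exp(-BIC/2) approximates the model evidence; subtract the minimum for stability.
    weights = np.exp(-0.5 * (bics - bics.min()))
    weights /= weights.sum()
    ppe = np.zeros(n_snps)
    for weight, subset in zip(weights, subsets):
        for j in subset:
            ppe[j] += weight
    return ppe


if __name__ == "__main__":
    # Simulated data: 4 SNPs, only SNP index 1 has a true additive effect.
    rng = np.random.default_rng(0)
    g = rng.binomial(2, 0.3, size=(500, 4)).astype(float)
    y = 0.8 * g[:, 1] + rng.normal(size=500)
    print(np.round(posterior_probability_of_effect(y, g), 3))
```

Exhaustive enumeration is only practical for a modest number of variants, which is one reason the analyses described here were run in parallel.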
SOLAR: Sequential Oligogenic Linkage Analysis Routines
All analyses were performed using a parallel version of SOLAR on up to
1,500 processors.
For more information on SOLAR, follow the ‘software’ links at: http://www.sfbr.org
BQTN analysis of POMC polymorphisms
Three variants account for 11% of variation in
leptin levels.
The frequencies of these variants are: 0.005,
0.004 and 0.06.
LD with any other SNPs is very low: 0.075,
0.248 and 0.189.
It would be VERY HARD to find these by LD.
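A short calculation shows why rare variants are hard to tag. For two biallelic loci, the largest attainable r^2 is limited by how different their allele frequencies are. The sketch below applies that standard bound to the three variant frequencies quoted above. The tag-SNP frequency of 0.30 is a hypothetical value chosen for illustration, and the slide does not say which LD measure its reported values use.

```python
def max_r2(p_a: float, p_b: float) -> float:
    """Largest r^2 attainable between two biallelic loci with allele
    frequencies p_a and p_b, maximised over the sign of D."""
    for p in (p_a, p_b):
        if not 0.0 < p < 1.0:
            raise ValueError("allele frequencies must be strictly between 0 and 1")
    d_pos = min(p_a * (1 - p_b), p_b * (1 - p_a))  # largest positive D
    d_neg = min(p_a * p_b, (1 - p_a) * (1 - p_b))  # largest |D| when D is negative
    d_max = max(d_pos, d_neg)
    return d_max ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))


if __name__ == "__main__":
    tag_frequency = 0.30  # hypothetical common tag SNP
    for variant_frequency in (0.005, 0.004, 0.06):  # frequencies from the slide
        bound = max_r2(variant_frequency, tag_frequency)
        print(f"frequency {variant_frequency}: max r^2 with tag = {bound:.4f}")
```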
Linkage Conditional on POMC SNPs
Marginal
LOD=5.86
Conditional
LOD=3.05
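To put the marginal and conditional LOD scores on a common scale, they can be converted to approximate p-values. This sketch assumes the usual asymptotic result for a one-parameter variance-component linkage test, in which 2 ln(10) x LOD follows a 50:50 mixture of a point mass at zero and a chi-square with one degree of freedom. That assumption is not stated on the slide, and the function name is hypothetical.

```python
import math


def lod_to_pvalue(lod: float) -> float:
    """Approximate p-value for a positive LOD score under a 1/2 : 1/2 mixture
    of a point mass at zero and a chi-square with 1 degree of freedom."""
    if lod <= 0:
        raise ValueError("LOD must be positive for this approximation")
    # For 1 df, P(chi2 > x) = erfc(sqrt(x / 2)); here x = 2 * ln(10) * LOD.
    return 0.5 * math.erfc(math.sqrt(math.log(10) * lod))


if __name__ == "__main__":
    for label, lod in (("marginal", 5.86), ("conditional", 3.05)):
        print(f"{label}: LOD = {lod}, approximate p = {lod_to_pvalue(lod):.2e}")
```

The conditional LOD remains above 3, so under this reading the three POMC variants account for part of the linkage signal but not all of it.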
What Do You Do With A Good
Positional Candidate Region?
The ALL or NOTHING principle
Find all of the variation in the region, say
5 – 10 Mb.
Preference: Resequence everyone (no
bias against rare variants).
This can be done NOW! It is the wave of
the future. Don’t waste time with LD. It is
your ENEMY.
Example 2: Identifying Human QTLs Quickly
For expression phenotypes that are cis-regulated,
it should be much easier to quickly identify
functional variants and correlate them with
disease risk.
Gene Expression Levels as Endophenotypes
Quantitative variation in gene expression
levels explains some proportion of the
variation in many phenotypes.
The amount of mRNA of a specific transcript
in a tissue sample is about as “close to gene
action” as possible; hence, such phenotypes
ought to be dissectible by statistical genetic
approaches.
Array-based technologies make it feasible to
quantify the expression levels of many
transcripts simultaneously.
Project Description
San Antonio Family Heart Study (SAFHS) designed in
1991 to investigate the genetics of CVD in Mexican
Americans
Includes 1,431 individuals from 42 families
2 recalls since 1991
Extensive phenotypic data: anthropometry, blood pressure, lipids, obesity,
diabetes, inflammation, oxidative stress, hormones, osteoporosis, brain
structure/function
Genome scanned
Methodology
Blood samples collected from first SAFHS examination
approx 15 years ago
Lymphocytes isolated from blood and stored in RPMI-C
media in liquid nitrogen
RNA extracted and expression profiles generated on stored
lymphocytes
47,289 transcripts interrogated using the Illumina platform
Detection Statistics
1,280 samples analyzed, good data from 1,240 (~97%)
Of the 47,289 transcripts per array, we significantly
detected 20,413 transcripts.
Heritabilities of Autosomal RefSeq Transcripts
[Figure: probability mass function (pdf, axis 0 to 0.1) and cumulative distribution (cdf, axis 0 to 1.0) of transcript heritability estimates (0 to 1)]
Cis-Regulated Expression QTLs

FDR   Significant Transcripts  Expected Number  P-value cutoff
0.01  859                      850              0.0004
0.05  1238                     1177             0.0031
0.10  1569                     1412             0.0080
0.25  2620                     1966             0.0333
0.30  3000                     2102             0.0480
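The expected-number column can be related to the other two with simple arithmetic. Assuming it is the expected count of true discoveries, it should be roughly the number of significant transcripts times (1 - FDR). The sketch below checks that reading against the values in the table; it agrees to within a few transcripts, and the small differences may reflect rounding or an estimation detail not given on the slide.

```python
def expected_true_discoveries(n_significant: int, fdr: float) -> float:
    """Expected number of true positives among n_significant calls at a given FDR."""
    if not 0.0 <= fdr <= 1.0:
        raise ValueError("fdr must be between 0 and 1")
    return n_significant * (1.0 - fdr)


# (FDR, significant transcripts, expected number) copied from the cis-eQTL table.
CIS_ROWS = [
    (0.01, 859, 850),
    (0.05, 1238, 1177),
    (0.10, 1569, 1412),
    (0.25, 2620, 1966),
    (0.30, 3000, 2102),
]

if __name__ == "__main__":
    for fdr, n_sig, reported in CIS_ROWS:
        computed = expected_true_discoveries(n_sig, fdr)
        print(f"FDR {fdr:.2f}: computed {computed:.1f}, reported {reported}")
```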
Identifying Novel Candidate Genes for Disease Risk
After determining cis-regulated QTLs, look for
correlations with phenotypes related to disease risk
Transcriptomic Epidemiology—using high dimensional
endophenotypic search
For example, 383 cis-regulated transcripts are
significantly correlated with BMI (an index of obesity).
Many of these are novel genes of unknown function.
Expression QTLs: LOD > 3

Type of QTL  Number of QTLs  Mean LOD
Cis          693             9.46
Trans        1,325           3.53
Approximately 34% of QTLs are cis.
Effect size (QTL-specific heritability) is 64% larger for cis QTLs.
Cis Regulation UTS2 (urotensin 2 preprotein)
Cis and Trans Regulation HBG2 (G-gamma globin)
Trans Regulation LOC389472
Mitochondrial QTLs Influencing Expression

FDR   Significant Transcripts  Expected Number True  P-value cutoff
0.01  16                       16                    0.000008
0.05  35                       33                    0.000076
0.10  73                       66                    0.00035
0.25  159                      127                   0.00146
0.30  251                      176                   0.0037
Identification of Human QTLs: Example 3
QTL influencing inflammatory response
A novel positional candidate gene
(SEPS1/SELS) found by expression
studies in an animal model
SEPS1 Gene Discovery
SEPS1 (formerly known as Tanis) was first identified
by differential gene expression in liver of diabetic
P. obesus
Putative functions related to ER stress response
through processing and removal of misfolded proteins
(Ye et al (2004). Nature 429, 841-847)
SEPS1 Gene Discovery
Human SEPS1 gene is located on 15q26.3
Mammalian plasma membrane selenoprotein & also a
member of the GRP family
Consists of 6 exons, encodes a 204aa protein
15q26 region shown to contain QTLs influencing
inflammatory disorders:
- Zamani et al (1996). Hum Genet 98, 491-6.
- Field et al (1994). Nat Genet 8, 189-94.
- Blacker et al (2003). Hum Mol Genet 12, 23-32.
- Susi et al (2001). Scand J Gastroenterol 36, 372-4.
- Mahaney et al (2005) Unpublished.
SEPS1 Variant Identification
Sequenced 9.3kb including putative promoter, exons,
introns and conserved regions in 50 individuals from
three different ethnic populations
16 variants genotyped in cohort of 522 Caucasian
individuals from 92 families
Plasma levels of IL-1β, IL-6 and TNF-α measured
Results analyzed for association using SOLAR
Association Analysis
[Figure panels: IL-1β, IL-6, TNF-α]
BQTN Analysis
BQTN analysis strongly supported a model in which
the G-105A SNP was responsible for the observed
associations with estimated posterior probabilities of
>0.999, 0.95, and 0.79 (for TNF-α, IL-1β, and IL-6,
respectively)
Analysis indicates the G-105A SNP is of direct
functional consequence (or is highly correlated with a
functional variant)
Analysis performed to test the functionality of this G-105A variant
Effect of A or G variant on SEPS1 promoter activity under
tunicamycin stress conditions
[Figure: promoter activity (fold change in luc activity over basal, axis 0 to 2.5) for the A variant and the G variant under basal and tunicamycin conditions; P = 0.00006]
Physiological Role of SEPS1
[Figure: cell diagram. Labels: PM, cytoplasm, ER, ER lumen, Golgi, nucleus, mRNA, proteins, chaperone, folded protein, misfolded protein, Derlin-1, p97, SelS, poly-Ub, 26S proteasome, secretion, activation of JNK, caspase12, NFkB, cytokine production, apoptosis, cell survival]
Exploring the Effects of the SEPS1 G-105A QTN
Looked at the in vivo effects of SEPS1 G-105A QTN on
expression levels of SEPS1 and genes in the following
Gene Ontology categories:
Endoplasmic Reticulum
Unfolded Protein Response
Golgi Stack and Protein Transportation
Oxidative Stress
SEPS1 Expression is Correlated With Disease In Vivo
Phenotype
P-value
2 Hr Glucose
Direction of
Correlation
Diabetes Risk
0.050
BMI
0.0006
Relative Fat
0.032
Triglycerides
0.0023
0.027
SEPS1 G-105A QTN Influences Expression In Vivo
SEPS1 transcript is cis-regulated (as defined by
quantitative trait linkage analysis).
The rare A variant is associated with decreased
expression in lymphocytes (p = 0.032).
SEPS1 G-105A Associated Genes

Gene     Correlation  p-value  Function
CSX2001               0.00096  Transcriptional repressor
CSX2002               0.0022   Golgi traffic w/ER
CSX2003               0.0022   Golgi traffic w/ER
SEPN1                 0.0025   ER stress
GRP94                 0.0034   ER unfolded protein response
GLUT3                 0.0055   Oxidative stress
STX6                  0.0068   TNF-alpha secretion
Acknowledgements
Southwest Foundation for Biomedical Research
Joanne Curran
Eric Moses
Matt Johnson
Catherine Jett
Tom Dyer
Shelley Cole
Harald Göring
Jean MacCluer
Charles Peterson
Tony Comuzzie
Laura Almasy
ChemGenex Pharmaceuticals
Jeremy Jowett
Greg Collier
Special thanks to the Azar family of San Antonio for their financial support of our research