Transcript Haplotype
Why this paper
• Causal genetic variants at loci contributing to complex phenotypes remain unknown
• Rats/mice as model organisms in physiology and disease
• Relevant to our work
– Integration of GWAS of different traits
– Interpretation of human GWAS
Advantages of genetic mapping using heterogeneous stocks
• QTL mapping accurate to Mb resolution
• WGS imputation from progenitor genomes
• Haplotypes well defined
– Single SNP vs haplotype (spatial) association
– Difficult in humans: large number of rare/unknown haplotypes
Design
[Figure: founder strains AJ, AKR, Balb, C3H, C57, DBA, IS, RIII → random breeding → HS (generation > 60); sequencing]
Reconstruction of rat genomes as mosaics of founder haplotypes based on 265,551 SNPs (“sequence imputation”)
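The imputation step can be illustrated for one SNP. Once an animal's locus is described by the expected number of haplotype copies from each founder, a founder SNP can be imputed as an expected allele dosage. This is a minimal sketch that assumes biallelic SNPs coded 0/1 in homozygous inbred founders, with founder-haplotype probabilities already estimated upstream (for example by R HAPPY, not shown). `impute_dosage` is a hypothetical name.

```python
def impute_dosage(hap_copies, founder_alleles):
    """Expected alternate-allele dosage (0..2) at one SNP for one animal.

    hap_copies: dict strain -> expected number of haplotype copies (sums to 2)
    founder_alleles: dict strain -> 0/1 allele carried by that inbred founder
    """
    total = sum(hap_copies.values())
    if abs(total - 2.0) > 1e-6:
        raise ValueError(f"haplotype copies must sum to 2, got {total}")
    # Each founder contributes its allele, weighted by the expected copy number
    return sum(copies * founder_alleles[s] for s, copies in hap_copies.items())


# Hypothetical animal: one certain copy from strain A; the other copy is
# split 50/50 between strains B and C (uncertain reconstruction)
copies = {"A": 1.0, "B": 0.5, "C": 0.5}
alleles = {"A": 0, "B": 0, "C": 1}  # only strain C carries the alternate allele
print(impute_dosage(copies, alleles))  # 0.5
```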
Genotypes
• 1,407 phenotyped NIH-HS animals
• 198 parents (~14.2 offspring per breeding pair)
• RATDIV genotyping array (13 inbred strains)
– 803,485 SNPs
– 560,000 segregating in NIH-HS
– 265,551 used for haplotype reconstruction
• Sequencing of founder samples
– Number ?
– 22x coverage
Phenotypes
• 160 measurements
Sequencing
• 7.2M SNPs
• 633,000 indels
• 44,000 structural variants
Sequencing
• False positive rate
– SNPs: 2.7%
– Indels: 2.2%
– Structural variants: 16.7%
• False negative rate
– SNPs: 17.2%
– Indels: 41.4%
– Structural variants: 65%
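The slide does not explain how these rates were estimated. Assuming the usual definitions, false positives are the share of calls that fail validation and false negatives are the share of known variants that were not called. Under those definitions, both rates can be computed from two sets of variant IDs. `error_rates` and the IDs below are hypothetical.

```python
def error_rates(called, validated_truth):
    """Return (false-positive rate, false-negative rate) for variant calls.

    called: set of variant IDs reported by the pipeline
    validated_truth: set of variant IDs known to be real
    FP rate = calls absent from the truth set / all calls
    FN rate = true variants that were not called / all true variants
    """
    fp = len(called - validated_truth) / len(called)
    fn = len(validated_truth - called) / len(validated_truth)
    return fp, fn


calls = {"snp1", "snp2", "snp3", "snp4"}
truth = {"snp1", "snp2", "snp3", "snp5"}
print(error_rates(calls, truth))  # (0.25, 0.25)
```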
Nucleotide diversity in NIH-HS progenitors
• Similar diversity between strains
• 29% of SNPs private to a single strain
– Unique haplotypes relatively common
• Regions of low diversity are small (~400 kb)
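In this sketch, a "private" SNP is one whose minor allele is carried by exactly one founder strain. The sketch counts such SNPs. It assumes homozygous inbred founders (one allele per strain, coded 0/1) and ignores missing calls. The strains and genotypes are made up.

```python
def private_fraction(genotypes):
    """Fraction of SNPs whose minor allele is carried by exactly one strain.

    genotypes: list of per-SNP lists holding one 0/1 allele per inbred strain.
    """
    private = 0
    for snp in genotypes:
        carriers = sum(snp)
        minor = min(carriers, len(snp) - carriers)  # strains with the rarer allele
        if minor == 1:
            private += 1
    return private / len(genotypes)


# Rows = SNPs, columns = 4 hypothetical strains
snps = [
    [1, 0, 0, 0],  # private to strain 1
    [1, 1, 0, 0],  # shared by two strains
    [0, 1, 1, 1],  # private to strain 1 (its minor allele is 0)
    [0, 0, 0, 1],  # private to strain 4
]
print(private_fraction(snps))  # 0.75
```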
Genotyping
QTL mapping
• Reconstruction of rat genomes as mosaics of founder haplotypes
– R HAPPY
Svenson KL et al. Genetics 2012;190:437-447
– Mixed Linear Model (EMMA, normal phenotypes)
$y_i = \mu + \sum_s \beta_s \, E[n_{i,s,l}] + u_i + \varepsilon_i$
– $E[n_{i,s,l}]$: expected number of haplotypes from strain s at locus l in animal i
– $u_i$: random effect
– Resample model averaging (BAGPHENOTYPE, non-normal phenotypes)
• Non-parametric bootstrap aggregation (bagging)
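The slide does not describe BAGPHENOTYPE's actual procedure (model, selection criterion, number of resamples). This sketch uses a simpler stand-in: a single-locus R² scan. It resamples animals with replacement and counts how often each locus is the best hit, which shows only the core bagging idea applied to QTL localization. All names and data are hypothetical.

```python
import random


def r_squared(x, y):
    """Squared Pearson correlation; 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def resample_inclusion(loci, y, n_boot=200, seed=1):
    """Fraction of bootstrap resamples in which each locus is the top hit.

    loci: list of dosage vectors, one per locus (one value per animal)
    y: phenotype vector (one value per animal)
    """
    rng = random.Random(seed)
    n = len(y)
    wins = [0] * len(loci)
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample animals with replacement
        yb = [y[i] for i in idx]
        scores = [r_squared([x[i] for i in idx], yb) for x in loci]
        wins[scores.index(max(scores))] += 1
    return [w / n_boot for w in wins]


# Toy data: 40 animals, 3 loci; only locus 0 affects the phenotype
loci = [
    [i % 3 for i in range(40)],
    [(i // 2) % 3 for i in range(40)],
    [(i // 5) % 3 for i in range(40)],
]
noise = random.Random(0)
y = [2.0 * d + noise.gauss(0.0, 0.1) for d in loci[0]]
freq = resample_inclusion(loci, y)
print(freq)  # locus 0 should be the top hit in (nearly) every resample
```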
QTL mapping
Haplotype
Strain   A   B   C
-----------------
y1 =     2   0   0
y2 =     0   2   0
y3 =     0   1   1
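Each row of the dosage table records how many copies of each founder haplotype an animal carries at the locus. This minimal sketch assumes the animal's pair of founder haplotypes is known exactly. In practice, reconstruction gives probabilities, so the entries become expected counts between 0 and 2. `haplotype_dosage` is a hypothetical helper.

```python
def haplotype_dosage(diplotype, strains=("A", "B", "C")):
    """Copies of each founder haplotype carried by one animal at one locus.

    diplotype: pair of founder labels, e.g. ("B", "C") for an animal that
    inherited one chromosome from strain B and the other from strain C.
    """
    return [diplotype.count(s) for s in strains]


# The three animals from the slide's table
animals = {"y1": ("A", "A"), "y2": ("B", "B"), "y3": ("B", "C")}
for name, diplotype in animals.items():
    print(name, haplotype_dosage(diplotype))
# y1 [2, 0, 0]
# y2 [0, 2, 0]
# y3 [0, 1, 1]
```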
QTL results
• 355 QTLs for 122 phenotypes (avg. 2.9)
Merge analyses
Haplotype (1)
Strain   A   B   C
-----------------
y1 =     2   0   0
y2 =     0   2   0
y3 =     0   1   1
Strain distribution pattern (SDP)
ABC = 0 0 1
ABC = 1 0 0
Sequence variants
Strain   A    B    C
Allele   CC   CC   TT
-------------------
SDP      0    0    1
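An SDP is obtained by coding each founder's allele at a variant. This minimal sketch assumes homozygous founders and a biallelic variant, and it codes the first strain's allele as 0. Under this coding, the complementary pattern (1 1 0 instead of 0 0 1) describes the same split of strains. `strain_distribution_pattern` is a hypothetical name.

```python
def strain_distribution_pattern(founder_genotypes):
    """Encode a biallelic variant as a 0/1 pattern across founder strains.

    founder_genotypes: homozygous genotypes, one per strain in a fixed order
    (e.g. ["CC", "CC", "TT"] for strains A, B, C). The first strain's allele
    is coded 0; the other allele is coded 1.
    """
    alleles = [g[0] for g in founder_genotypes]  # inbred: both alleles identical
    if len(set(alleles)) > 2:
        raise ValueError("only biallelic variants are handled in this sketch")
    ref = alleles[0]
    return [0 if a == ref else 1 for a in alleles]


print(strain_distribution_pattern(["CC", "CC", "TT"]))  # [0, 0, 1]
```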
Merge analyses
Haplotype (1) and sequence variants
Strain   A    B    C
Allele   CC   CC   TT
-------------------
y1 =     2    0    0
y2 =     0    2    0
y3 =     0    1    1
Merge model (2)
• Model (2) is a sub-model of (1)
• If the QTL is caused by a single variant:
– R²(2) ≈ R²(1)
– logP_merge − logP_haplotype > 0 (same fit with fewer parameters)
Allele   C    T
--------------
y1 =     2    0
y2 =     2    0
y3 =     1    1
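The merge step collapses the strain columns according to the variant's SDP. The merged model therefore lies inside the haplotype model. The study compares −log10 P values, which reward the merge model for using fewer parameters. This sketch stops at R², so it shows only the nesting: R²(2) ≤ R²(1), with near equality when one variant explains the QTL. Converting R² to P values would need an F-test (for example `scipy.stats.f.sf`). The real analysis also uses expected dosages within a mixed model. All data and function names are hypothetical.

```python
def collapse_by_sdp(dosage, sdp):
    """Merge per-strain haplotype dosages into per-allele dosages.

    dosage: copies of each founder haplotype (e.g. [0, 1, 1] for strains A, B, C)
    sdp: allele index carried by each strain (e.g. [0, 0, 1] for CC CC TT)
    """
    merged = [0] * (max(sdp) + 1)
    for copies, allele in zip(dosage, sdp):
        merged[allele] += copies
    return merged


def r2_one(x, y):
    """R^2 of y ~ intercept + x (simple linear regression)."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def r2_two(x1, x2, y):
    """R^2 of y ~ intercept + x1 + x2, via the centred normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    c1, c2, cy = [a - m1 for a in x1], [a - m2 for a in x2], [b - my for b in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(a * a for a in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * b for a, b in zip(c1, cy))
    s2y = sum(a * b for a, b in zip(c2, cy))
    syy = sum(b * b for b in cy)
    det = s11 * s22 - s12 * s12
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return (b1 * s1y + b2 * s2y) / syy  # regression SS / total SS


# Haplotype dosages (strains A, B, C) for six hypothetical animals
hap = [[2, 0, 0], [0, 2, 0], [0, 1, 1], [1, 0, 1], [0, 0, 2], [1, 1, 0]]
sdp = [0, 0, 1]                                       # CC CC TT
t_dose = [collapse_by_sdp(d, sdp)[1] for d in hap]    # copies of allele T
a_dose = [d[0] for d in hap]
b_dose = [d[1] for d in hap]                          # C dropped: A + B + C = 2

y_single = [float(t) for t in t_dose]                 # QTL = the C/T variant
y_hap = [1.0 * a + 3.0 * b for a, b in zip(a_dose, b_dose)]  # strain effects only

print(r2_two(a_dose, b_dose, y_single), r2_one(t_dose, y_single))  # both ~1.0
print(r2_two(a_dose, b_dose, y_hap), r2_one(t_dose, y_hap))        # merge fits worse
```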
Merge analyses
• 343 QTLs
– 131 (38%) with at least one candidate variant
• Increased resolution
– 90% of variants ruled out (d < 0, where d = logP_merge − logP_haplotype)
– Candidates in coding regions that affect protein structure are more likely to be causal
– Eliminates candidate genes that are distant from the candidate variants
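The filtering rule is simple: keep a variant only if its merge logP exceeds the haplotype logP for the same QTL. This sketch uses hypothetical logP values and ranks the survivors by d. The slide does not say whether the study ranks candidates this way, so the ordering is an assumption.

```python
def candidate_variants(variants):
    """Keep variants whose merge model beats the haplotype model (d > 0).

    variants: (variant_id, logp_merge, logp_haplotype) tuples for one QTL,
    with logp values given as -log10 P. Returns (variant_id, d) pairs,
    largest d first.
    """
    kept = [(vid, lm - lh) for vid, lm, lh in variants if lm - lh > 0]
    return sorted(kept, key=lambda t: t[1], reverse=True)


# Hypothetical QTL interval with a haplotype logP of 7.5
tested = [("v1", 8.2, 7.5), ("v2", 6.0, 7.5), ("v3", 7.9, 7.5), ("v4", 5.1, 7.5)]
kept = candidate_variants(tested)
print([vid for vid, _ in kept])     # ['v1', 'v3']
print(1 - len(kept) / len(tested))  # fraction ruled out: 0.5
```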
Merge analyses (examples)
• 3 QTLs for platelet aggregation
• Candidate variant in single gene
• Candidate variant in coding region
Merge analysis
• Most QTL effects are not explained by a single variant
– 212 (62%) QTLs had no candidate variant
• Possible reasons
– Causative variants missed in sequencing
– QTL mapping biased towards QTLs without candidate variants
– Merge analysis underestimates statistical significance
– Multiple causal variants
Merge analysis
– Causative variants missed in sequencing
• Simulated all possible SDPs for di- and tri-allelic SNPs and repeated the merge analysis
• 168 (49%) QTLs would still have no causative variant
– Simulation of different QTL architectures
• Single variants
• Multiple variants within a gene; multiple variants at linked loci
• Haplotype effects with no individual causal variant
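The slide lists the simulated architectures but not how they were simulated. This minimal sketch assumes additive effects and no recombination among variants within the QTL interval. Under those assumptions, every architecture reduces to one effect per founder haplotype. Even two variants with different SDPs produce a strain-effect pattern that no single SDP can match. Names and parameter values are hypothetical.

```python
import random


def strain_effects(architecture, n_strains):
    """Reduce a QTL architecture to one additive effect per founder strain."""
    kind = architecture[0]
    if kind == "single":      # ("single", sdp, beta): one biallelic variant
        _, sdp, beta = architecture
        return [beta * a for a in sdp]
    if kind == "multi":       # ("multi", [(sdp, beta), ...]): linked variants
        eff = [0.0] * n_strains
        for sdp, beta in architecture[1]:
            for s in range(n_strains):
                eff[s] += beta * sdp[s]
        return eff
    if kind == "haplotype":   # ("haplotype", [effect per strain])
        return list(architecture[1])
    raise ValueError(f"unknown architecture {kind!r}")


def simulate_phenotypes(dosages, architecture, noise_sd=1.0, seed=0):
    """Genetic value from haplotype dosages plus Gaussian noise, per animal."""
    rng = random.Random(seed)
    eff = strain_effects(architecture, len(dosages[0]))
    return [sum(c * e for c, e in zip(d, eff)) + rng.gauss(0.0, noise_sd)
            for d in dosages]


dosages = [[2, 0, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1]]  # hypothetical animals, 4 strains
single = ("single", [0, 0, 1, 1], 0.5)
multi = ("multi", [([0, 0, 1, 1], 0.5), ([0, 1, 0, 1], 0.3)])
print(strain_effects(single, 4))  # [0.0, 0.0, 0.5, 0.5]
print(strain_effects(multi, 4))   # [0.0, 0.3, 0.5, 0.8]: no single SDP gives this
print(simulate_phenotypes(dosages, multi, noise_sd=0.5))
```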
Merge analysis
– Simulation of causal variants
Merge analysis
• Haplotype mapping overestimates the number of QTLs without a causative variant (?)
• Merge analysis underestimates the number of QTLs without a causative variant (?)
– Multiple causative variants
Concordance between species
• 38 measures common to NIH-HS rats and HS mice
• Orthologous genes rarely contribute to the same phenotype
• KEGG pathways for QTL-associated genes in rats and mice significantly enriched only for “proportion of B cells”
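The slide does not name the enrichment test. A common choice for KEGG over-representation is the one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test. This sketch assumes that test, uses made-up counts, and applies no multiple-testing correction.

```python
from math import comb


def hypergeom_enrichment_p(n_genes, n_pathway, n_selected, n_overlap):
    """One-sided P(X >= n_overlap) for pathway over-representation.

    n_genes: size of the gene universe
    n_pathway: genes annotated to the pathway (e.g. one KEGG pathway)
    n_selected: genes in the tested list (e.g. QTL-associated genes)
    n_overlap: selected genes that fall in the pathway
    """
    total = comb(n_genes, n_selected)
    upper = min(n_pathway, n_selected)
    # Sum the hypergeometric probabilities of n_overlap or more hits
    tail = sum(comb(n_pathway, k) * comb(n_genes - n_pathway, n_selected - k)
               for k in range(n_overlap, upper + 1))
    return tail / total


print(hypergeom_enrichment_p(10, 5, 5, 5))  # 1/252, about 0.00397
```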
Discussion
• Combining sequence with mapping data can identify candidate loci
• ~50% of QTLs cannot be attributed to a single causal variant
– Multiple causal variants; more complex models required
– Rat QTLs similar to trans-eQTLs
• Not possible to accurately assess overlap between species
– limited power of pathway analysis
– limited power from comparing phenotypes (within species?)
– Variants in orthologous genes rarely contribute to same phenotype