trait - iPlant Pods
Association Modeling With iPlant
Goals of this Section
• Become familiar with the basic concepts of
quantitative genetics:
– Traits, phenotypes, genotypes
• Understand the basics of trait mapping
• Understand the conceptual foundations of
association studies
• Learn how to perform a genome-wide association
study in the iPlant Discovery Environment
– Obtain genotypes
– Run a Mixed Linear Model
Phenotype
Observable (measurable) trait (character) of an organism
Trait: eye color
Phenotype: wild type (red), white eyed, orange eyed
http://www.unc.edu/depts/our/hhmi/hhmi-ft_learning_modules/fruitflymodule/phenotypes.html
Qualitative Traits
Campbell, 8e
Controlled by One Locus
Co-segregation in Pedigree
Donahue, R. P., et al., Probable assignment of the Duffy blood group locus to chromosome 1 in man, Proceedings of the National Academy of
Sciences 61, 949-955 (1968).
Quantitative Trait
Trait Varies on a Continuous Scale
[Figure: distribution of Trait Value (x-axis) against Frequency (y-axis); credit: Carlos Harjes]
Quantitative Traits
• Probably caused by multiple loci
– Interaction effects
– Environment
If the mean trait value for individuals with
marker state MM is different from the mean
trait value of individuals with marker state mm
(i.e. the marker is associated with the
phenotype), then the marker is linked to a
quantitative trait locus.
[Figure: trait values of individuals scored at a set of markers]
Mean trait value by marker state:
Marker       Present      Absent
Marker #6    110 ± 10     115 ± 13
Marker #3    99 ± 5       118 ± 8
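The comparison of mean trait values between marker states can be sketched in a few lines of Python. This is only an illustration: the individual trait values and the marker names (marker A, marker B) are hypothetical, and Welch's t statistic is used here as one reasonable way to compare two group means, not as the method used on the slide.

```python
from math import sqrt
from statistics import mean, stdev

def welch_t(present, absent):
    """Welch's t statistic for the difference in mean trait value."""
    standard_error = sqrt(stdev(present) ** 2 / len(present)
                          + stdev(absent) ** 2 / len(absent))
    return (mean(present) - mean(absent)) / standard_error

# Hypothetical trait values, grouped by whether the marker allele is present.
marker_a_present = [98, 101, 95, 104, 97]
marker_a_absent = [117, 121, 112, 125, 115]
marker_b_present = [108, 118, 101, 115, 108]
marker_b_absent = [112, 120, 103, 119, 111]

for name, present, absent in [("marker A", marker_a_present, marker_a_absent),
                              ("marker B", marker_b_present, marker_b_absent)]:
    # A large |t| suggests the marker state is associated with the trait.
    print(name, "difference in means:", mean(present) - mean(absent),
          "Welch t:", round(welch_t(present, absent), 2))
```

In this toy data the group means differ far more, relative to their spread, for marker A than for marker B, so only marker A would be a candidate for linkage to a quantitative trait locus.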
Quantitative Genetics
Exploring the Genetic Architecture* Underlying
Quantitative Traits
*Genetic Architecture
• How many loci?
• Which location?
• How strong?
Tools for Statistical Genetics in the DE
Tool: Purpose
• Genotype by Sequencing Workflow: Automatic pipeline for extracting SNPs from GBS data (with genome from user or from iPlant database)
• UNEAK pipeline: Automatic pipeline for extracting SNPs from GBS data without reference genomes
• MLM workflow: Automatic workflow for fitting Mixed Linear Model
• GLM workflow: Automatic workflow for fitting General Linear Model
• QTLC workflow: Automatic workflow for composite interval mapping
• QTL simulation workflow: Automatic workflow for simulating trait data with given linkage map
• PLINK: PLINK implementation of various association models
• Zmapqtl: Interval mapping and composite interval mapping with the options to perform a permutation test
• LRmapqtl: Linear regression modeling
• SRmapqtl: Stepwise regression modeling
• AntEpiSeeker: Epistatic interaction modeling
• Random Jungle: Random Forest implementation for GWAS
• FaST-LMM: Factored Spectrally Transformed Linear Mixed Modeling
• Qxpak: Versatile mixed modeling
• gluH2P: Convert Hapmap format to Ped format
• LD: Linkage Disequilibrium plot
• Structure: Estimation of population structure
• PGDSpider: Data conversion tool
• GLMstrucutre: GLM with population structure as fixed effect
A Model for Quantitative Traits
Phenotype
Genotype
Environment
$P = G + E + GG + GE$
P=Phenotype
G=Genotype
E=Environment
GG=Interaction between genotypes
GE=Interaction between genotype and environment
A Statistical Model for QTLs
P=G + e
$y_{ij} = \beta_0 + \beta_1 x_i + \varepsilon_{ij}$
$y_{ij}$: trait value in individual j with genotype i
$\beta_0$: population average of trait value
$\beta_1$: effect of marker i on trait value
$x_i$: marker genotype i
$\varepsilon_{ij}$: error term
General Linear Model (in matrix notation): $Y = X\beta + \varepsilon$
Note: If errors are not normally distributed, use generalized linear models
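A minimal sketch of fitting the single-marker model by ordinary least squares is given here. The data are hypothetical, the function name fit_marker_regression is made up for illustration, and the genotype is assumed to be coded as the number of copies of one allele (0, 1, 2). With that coding the intercept is the expected trait value at genotype 0; it equals the population average only if the genotype codes are centered.

```python
from statistics import mean

def fit_marker_regression(genotypes, traits):
    """Least-squares estimates (b0, b1) for trait = b0 + b1 * genotype + error."""
    x_bar, y_bar = mean(genotypes), mean(traits)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(genotypes, traits))
    sxx = sum((x - x_bar) ** 2 for x in genotypes)
    slope = sxy / sxx                  # estimated marker effect
    intercept = y_bar - slope * x_bar  # expected trait value at genotype 0
    return intercept, slope

# Hypothetical data: genotype coded as copies of allele M (0, 1, 2).
genotypes = [0, 0, 1, 1, 2, 2]
traits = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]

b0, b1 = fit_marker_regression(genotypes, traits)
print("intercept:", b0, "marker effect per allele copy:", b1)
```

A slope clearly different from zero is the regression counterpart of a difference in mean trait value between marker states.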
http://concord.org/publications/newsletter/2009-spring/genetics
Linkage Mapping (QTL Mapping)
• Designed population
– F2
– Recombinant inbred (RIL)
– Double-Haploid (DH)
– Back-cross (B2)
Limitation of Linkage Mapping
• Needs large number of related individuals
• Resolution limited (interval contains 100s of
genes)
• QTL position and effect are confounded
Association Mapping
• Use random collection of individuals from
natural population
• Very dense marker map = very high resolution
Linkage & Recombination
Recombination causes linkage decay
Other factors affecting LD:
• Selection (artificial or natural)
• Drift
• Mutations
• Population structure
• Demography
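Linkage disequilibrium between two loci is commonly summarized by the squared correlation r² of their alleles. The sketch below computes it from phased haplotypes; the haplotype data and the function name ld_r_squared are hypothetical, and both loci are assumed to be biallelic and polymorphic in the sample.

```python
def ld_r_squared(haplotypes):
    """r^2 between two biallelic loci from phased haplotypes given as (a, b) pairs of 0/1."""
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n            # frequency of allele 1 at locus A
    p_b = sum(b for _, b in haplotypes) / n            # frequency of allele 1 at locus B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                               # disequilibrium coefficient D
    # Undefined (division by zero) if either locus is monomorphic.
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))

# Hypothetical samples: alleles always travel together vs. fully shuffled.
complete_ld = [(1, 1)] * 5 + [(0, 0)] * 5
no_ld = [(1, 1), (1, 0), (0, 1), (0, 0)] * 2

print("complete LD:", ld_r_squared(complete_ld))
print("no LD:", ld_r_squared(no_ld))
```

Recombination moves a population from the first situation toward the second, which is why r² tends to fall with distance between loci.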
LD decay in Major Crops
[Figure legend: Non-coding sites, Synonymous sites]
Linkage Disequilibrium
Pitfalls: Population Structure
• Difference in allele frequencies
between subpopulations
• Due to neutral or adaptive
processes
• Can create spurious association
Population structure can produce associations
[Figure: G and T alleles at a marker in samples from the U.S. and the Andes, with panels for Plant Height and Kernel Hue (P<<0.001 and P=0.04); annotation: no association within groups]
These non-functional associations can be accounted for by estimating the population structure using random markers.
Slide from Ed Buckler
• Similar effect due to presence of related
individuals (esp. in plants)
• Can be accounted for using the data:
– Estimate number of subpopulations
– Assign individuals to subpopulation
– Estimate kinship
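A small constructed example shows how population structure alone can produce an association. All values are hypothetical: two subpopulations differ both in allele frequency and in mean plant height, while the marker has no effect within either group.

```python
from statistics import mean

# Hypothetical samples: (subpopulation, marker allele, plant height).
samples = [
    ("U.S.", "G", 180), ("U.S.", "G", 200), ("U.S.", "G", 190),
    ("U.S.", "G", 190), ("U.S.", "T", 190),
    ("Andes", "G", 100), ("Andes", "T", 90), ("Andes", "T", 110),
    ("Andes", "T", 100), ("Andes", "T", 100),
]

def allele_difference(rows):
    """Mean height of G carriers minus mean height of T carriers."""
    g = [height for _, allele, height in rows if allele == "G"]
    t = [height for _, allele, height in rows if allele == "T"]
    return mean(g) - mean(t)

# Ignoring structure: G appears to increase height.
pooled = allele_difference(samples)

# Within each subpopulation the apparent effect disappears.
within = {pop: allele_difference([row for row in samples if row[0] == pop])
          for pop in ("U.S.", "Andes")}

print("pooled difference:", pooled)
print("within-group differences:", within)
```

The pooled difference comes entirely from G being common in the taller subpopulation, which is the situation that assigning individuals to subpopulations is meant to correct.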
Accounting for Random Effects: Mixed
Linear Models
• "Cost" associated with estimating a parameter
• We are not interested in the value of the parameter, only the variance
• Q-K method (structured association)
$y = X\beta + S\alpha + Qv + Zu + e$
Fixed effects:
$\beta$: Vector of fixed effects
$\alpha$: Vector of SNP effects
$v$: Vector of subpopulation effects
Random effects:
$u$: Vector of kinship effects
$e$: Residuals
$Q$: Matrix of population association (STRUCTURE)
$X$, $S$, $Z$: Incidence matrices
[Diagram: MLM inputs: Traits, Markers, Population Structure (STRUCTURE), Kinship (TASSEL)]
Not Controlling for Population Structure and Relatedness
Obtain Markers
• Genome Resequencing Workflow
• Genotyping By Sequencing
MLM Pipeline for GWAS
Ed Buckler (Cornell University)
[Diagram: TASSEL pipeline with steps labeled marker, filter, K, convert, impute, GLM, trait, impute]
Zhang et al. Nature Genetics. 2010; doi:10.1038/ng.546
http://www.maizegenetics.net/statistical-genetics
http://www.maizegenetics.net/tassel/docs/Tassel_User_Guide_3.0.pdf
MLM
MLM Input Files
• Hapmap file
• Phenotype data
• Kinship matrix*
• Population structure*
* Kinship matrix and population structure data can be generated using TASSEL or with the “MLM Workflow” App in the DE
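[Figure: phenotype data table, one row per strain and one column per trait]
[Figure: population structure table, one row per strain, with an Origin column and 3 population columns that sum to 1]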
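Because the population structure columns are membership proportions, each strain's row should sum to 1. A quick sanity check before running the model might look like the sketch below; the strain names, the values, and the function name check_membership_rows are all hypothetical, and the table is assumed to be already loaded into a dictionary.

```python
def check_membership_rows(q_matrix, tolerance=1e-6):
    """Return the strains whose membership coefficients do not sum to 1."""
    return [strain for strain, coefficients in q_matrix.items()
            if abs(sum(coefficients) - 1.0) > tolerance]

# Hypothetical membership proportions for 3 populations.
q_matrix = {
    "strain_1": [0.90, 0.05, 0.05],
    "strain_2": [0.20, 0.30, 0.50],
    "strain_3": [0.60, 0.30, 0.20],  # deliberately wrong: sums to 1.1
}

print("rows that do not sum to 1:", check_membership_rows(q_matrix))
```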
• Hapmap file:
– Download (e.g. http://triticeaetoolbox.org/)
– Convert from PLINK (.map/.ped) using Tassel 3 Conversion
– Impute with NPUTE
– Transform to numerical format with NumericalTransform
• Phenotype data
• Kinship matrix
– Generate from hapmap marker data with Kinship
• Population structure
– Generate using ParallelStructure
– Convert to matrix with Structure2Tassel
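To make the idea of a kinship matrix concrete, the sketch below computes a simple pairwise similarity from numerically coded markers. It is an illustration only: it uses the average identity-by-state proportion, which is an assumption and not necessarily the calculation performed by the Kinship app or by TASSEL, and the line names and genotypes are hypothetical.

```python
def ibs_similarity(genotypes_a, genotypes_b):
    """Average identity-by-state between two individuals; genotypes coded 0, 1, 2."""
    shared = [1 - abs(a - b) / 2 for a, b in zip(genotypes_a, genotypes_b)]
    return sum(shared) / len(shared)

def kinship_matrix(genotypes):
    """Nested dict of pairwise similarities, keyed by individual name."""
    names = list(genotypes)
    return {a: {b: ibs_similarity(genotypes[a], genotypes[b]) for b in names}
            for a in names}

# Hypothetical numerical genotypes at four markers.
genotypes = {
    "line_1": [0, 2, 2, 0],
    "line_2": [0, 2, 0, 0],
    "line_3": [2, 0, 0, 2],
}

kinship = kinship_matrix(genotypes)
for name, row in kinship.items():
    print(name, row)
```

Pairs of lines that share more marker states get values closer to 1, which is the information the mixed model uses to account for relatedness.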
MLM Output
• MLM1.txt
– Marker
– “df”: degrees of freedom
– “F”: F statistic from the test of the marker
– “p”: p-value
– “errordf”: df used for denominator of F-test
– etc.
• MLM2.txt
– Estimated effect for each allele for each marker
• MLM3.txt
– The compression results show the likelihood, genetic variance, and error variance for
each compression level tested during the optimization process.
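Once MLM1.txt is available, a common next step is to pick out markers with small p-values. The sketch below assumes a tab-delimited file whose header uses the column names listed above ("Marker", "p"); the excerpt is hypothetical, the real file may contain more columns or rows without a p-value, and the Bonferroni threshold is one conventional choice rather than something prescribed by the workflow.

```python
import csv
import io

# Hypothetical excerpt; the real MLM1.txt may have more columns and a different layout.
mlm1_text = (
    "Marker\tdf\tF\tp\terrordf\n"
    "snp_001\t2\t1.10\t0.34\t250\n"
    "snp_002\t2\t14.80\t0.0000008\t250\n"
    "snp_003\t2\t3.55\t0.03\t250\n"
)

def significant_markers(text, alpha=0.05):
    """Markers whose p-value passes a Bonferroni-corrected threshold."""
    rows = list(csv.DictReader(io.StringIO(text), delimiter="\t"))
    threshold = alpha / len(rows)  # divide alpha by the number of tests
    return [row["Marker"] for row in rows if float(row["p"]) < threshold]

print(significant_markers(mlm1_text))
```

To read an actual result file, the string could be replaced with the contents of the file opened in text mode.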
See TASSEL manual for details:
http://www.maizegenetics.net/tassel/docs/Tassel_User_Guide_3.0.pdf
THANKS!