Transcript Document

Statistical Issues
in Human Genetics
Jonathan L. Haines Ph.D.
Center for Human Genetics Research
Vanderbilt University Medical Center
COMMON COMPLEX DISEASE
Genes + Environment → Complex Disease
What Can The Genes Tell Us?
• Give us a better understanding of the
underlying biology of the trait in question
• Serve as direct targets for better treatments
– Pharmacogenetics
– Interventions
• Give us better predictions of who might develop
disease
• Give us better predictions of the course of the
disease
• Lead to knowledge that can help find a cure or
prevention
• Watson and Crick started it all in 1953 with the description of DNA
• The 53rd anniversary of the paper will be in April
• Both won the Nobel Prize
The DNA between individuals is 99.9% identical.
All differences are in the 0.1% of DNA that varies.
ACCGTCCAGG
ACCGTGCAGG
It’s hard to believe sometimes!
HUMAN CHROMOSOMES
Single-Nucleotide Polymorphisms
(SNPs)
One of the most common types of variation
1st chromosome: GATCCTGTAGCT
2nd chromosome: GATCCTCTAGCT
SNP alleles: G/C (G = normal, C = disease)
Normal: GATCCTGTAGCT / GATCCTGTAGCT
Affected: GATCCTCTAGCT / GATCCTCTAGCT
Extremely frequent across the genome (~1/400 bp) -> high resolution
Easy to genotype -> high-throughput techniques
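To make the SNP idea concrete, here is a minimal Python sketch (not from the talk) that compares two sequences position by position and reports where they differ. It assumes the sequences are already aligned and of equal length; real SNP discovery must also deal with alignment, sequencing error, and insertions/deletions. The function name snp_sites is illustrative.

```python
def snp_sites(seq1: str, seq2: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, base in seq1, base in seq2) for each mismatch.

    Assumes the two sequences are already aligned and equal in length.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip(seq1, seq2), start=1)
        if a != b
    ]


# The two chromosome copies from the SNP slide differ at a single G/C site.
print(snp_sites("GATCCTGTAGCT", "GATCCTCTAGCT"))  # [(7, 'G', 'C')]
```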
What are We Looking For?
Earth ↔ Human genome
City ↔ Chromosome
Street ↔ Band
Address ↔ Gene (DNA)
Haystack: 640 cubic yards ↔ Genome: 3,000 Mb
Needle: 1/100 cubic inch ↔ One base: 1 × 10^-6 Mb
It really is like finding a needle in a haystack!
(and a very BIG haystack, at that)
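The two size pairs on this slide are chosen so that their ratios match. A quick Python check, assuming 36 inches per yard and reading "3,000 MB" as 3,000 megabases and "1 × 10^-6 MB" as a single base:

```python
CUBIC_INCHES_PER_CUBIC_YARD = 36 ** 3  # 46,656

haystack_in3 = 640 * CUBIC_INCHES_PER_CUBIC_YARD  # 640 cubic yards
needle_in3 = 1 / 100                               # 1/100 cubic inch
genome_bases = 3_000 * 10**6                       # 3,000 Mb
target_bases = 1                                   # 1 x 10^-6 Mb = one base

haystack_ratio = haystack_in3 / needle_in3
genome_ratio = genome_bases / target_bases

print(f"haystack/needle = {haystack_ratio:.2e}")  # 2.99e+09
print(f"genome/base     = {genome_ratio:.2e}")    # 3.00e+09
```

Both ratios come out near 3 billion to one, which is the point of the analogy.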
The Genome Sequence is not THE answer!
Disease Gene Discovery In Complex Disease
1. Define Phenotype
a. Consistency b. Accuracy
2. Define the Genetic Component
a. Twin Studies b. Adoption Studies c. Family Studies d. Heritability e. Segregation Analysis
3. Define Experimental Design
4. Ascertain Families
a. Case-Control b. Singleton c. Sib Pairs d. Affected Relative Pairs
5. Collect Data
a. Family Histories b. Clinical Results c. Risk Factors d. DNA Samples
6. Generate Genotypes
a. Genomic Screen b. Candidate Gene
7. Analyze Data
a. Model-dependent: LOD score
b. Model-independent: sib-pair, relative pair
c. Association studies: case-control, family-based
8. Identify, Test, and Localize Regions of Interest
9. Bioinformatics and Gene Identification
10. Identify Susceptibility Variation(s)
11. Define Interactions
a. Gene-Gene
b. Gene-Environment
CLASSES OF HUMAN
GENETIC DISEASE
• Diseases of Simple Genetic Architecture
– Can tell how trait is passed in a family: follows a recognizable
pattern
– One gene per family
– Often called Mendelian disease
– Usually quite rare in population
– “Causative” gene
• Diseases of Complex Genetic Architecture
– No clear pattern of inheritance
– Moderate to strong evidence of being inherited
– Common in population: cancer, heart disease, dementia, etc.
– Involves many genes, or genes and environment
– “Susceptibility” genes
Modes of Inheritance
• Autosomal Dominant
– Huntington disease
• Autosomal Recessive
– Cystic fibrosis
• X-linked
– Duchenne muscular dystrophy
• Mitochondrial
– Leber hereditary optic neuropathy (Leber optic atrophy)
• Additive
– HLA-DR in multiple sclerosis
• Combinations of the above
– Retinitis pigmentosa (RP; 39 loci), nonsyndromic deafness
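The two classic autosomal modes above differ in the risk to a child. A minimal Python sketch of a single-locus Punnett square, assuming "A" is the disease allele and "a" the normal allele (labels chosen for illustration). It covers only the autosomal dominant and recessive cases; X-linked, mitochondrial, and additive inheritance are not modeled.

```python
from fractions import Fraction
from itertools import product


def offspring_genotypes(parent1: str, parent2: str) -> dict[str, Fraction]:
    """Punnett-square genotype probabilities for one autosomal locus.

    Each parent is a two-character genotype such as "Aa"; each of a parent's
    two alleles is passed on with probability 1/2 (Mendel's first law).
    """
    probs: dict[str, Fraction] = {}
    for a1, a2 in product(parent1, parent2):
        genotype = "".join(sorted(a1 + a2))  # "aA" and "Aa" are the same genotype
        probs[genotype] = probs.get(genotype, Fraction(0)) + Fraction(1, 4)
    return probs


def risk(parent1: str, parent2: str, affected) -> Fraction:
    """Probability that a child is affected, given a rule for affected genotypes."""
    return sum(p for g, p in offspring_genotypes(parent1, parent2).items() if affected(g))


def dominant(genotype: str) -> bool:
    return "A" in genotype  # one copy of the disease allele suffices


def recessive(genotype: str) -> bool:
    return genotype == "AA"  # two copies are needed


print(risk("Aa", "aa", dominant))   # 1/2  (e.g. Huntington disease)
print(risk("Aa", "Aa", recessive))  # 1/4  (e.g. cystic fibrosis)
```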
Linkage Analysis
• Traces the segregation of the trait
through a family
• Traces the segregation of the
chromosomes through a family
• Statistically measures the correlation of
the segregation of the trait with the
segregation of the chromosome
A SAMPLE PEDIGREE
The RED chromosome is key
Measures of Linkage
Parametric vs. Non-Parametric
• Two major approaches toward linkage analysis
• Parametric: Defines a genetic model of the action of the
trait locus (loci). This allows more complete use of the
available data (inheritance patterns and phenotype
information).
– The historical approach towards linkage analysis.
Development driven by need to map simple Mendelian diseases
– Quite powerful when model is correctly defined
• Non-Parametric: Uses either a partial genetic model or
no genetic model. Relies on estimates of allele/haplotype/region
sharing across relatives. Makes far fewer assumptions about
the action of the underlying trait locus (loci).
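As one illustration of allele sharing (not necessarily a method used in the talk), here is a Python sketch of a simple affected-sib-pair test. Under no linkage, a sib pair shares 0, 1, or 2 alleles identical by descent (IBD) with probabilities 1/4, 1/2, 1/4, so excess sharing among affected pairs suggests linkage. The sketch assumes the IBD counts are already known (in practice they are estimated from marker data), and the counts below are made up.

```python
def asp_sharing_chi2(n0: int, n1: int, n2: int) -> float:
    """Goodness-of-fit chi-square (2 df) of IBD-sharing counts against 1/4, 1/2, 1/4.

    n0, n1, n2: numbers of affected sib pairs sharing 0, 1, or 2 alleles IBD.
    """
    n = n0 + n1 + n2
    observed = (n0, n1, n2)
    expected = (n * 0.25, n * 0.5, n * 0.25)  # null hypothesis: no linkage
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def mean_ibd_proportion(n0: int, n1: int, n2: int) -> float:
    """Average proportion of alleles shared IBD; 0.5 is expected under no linkage."""
    n = n0 + n1 + n2
    return (n1 + 2 * n2) / (2 * n)


# Hypothetical data: 100 affected sib pairs.
print(asp_sharing_chi2(20, 50, 30))     # 2.0
print(mean_ibd_proportion(20, 50, 30))  # 0.55
```

Other sib-pair statistics exist; this one is shown only because it makes the "sharing across relatives" idea explicit with no trait model at all.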
Linkage Analysis
• Families
– Affected sibpairs
– Affected relative pairs
– Extended families
• Traits
– Qualitative (affected or not)
– Quantitative (ordinal, continuous)
• There are numerous different methods that can be
applied
• These methods differ dramatically depending on the
types of families and traits
Recombination: Nature’s way of making
new combinations of genetic variants
A. A diploid cell.
B. DNA replication and pairing of homologous chromosomes to form a bivalent.
C. Chiasmata are formed between the chromatids of homologous chromosomes.
D. Recombination is complete by the end of prophase I.
Linkage Analysis in Humans
• Measure the rate of recombination
between two or more loci on a
chromosome
• Can be done with any loci, but primary
application is to find the location of a
trait variant by measuring linkage to
known marker variants.
LOD Score Analysis
The likelihood ratio as defined by Morton (1955):
$$LR = \frac{L(\text{pedigree} \mid \theta = x)}{L(\text{pedigree} \mid \theta = 0.50)}$$
where θ represents the recombination fraction and 0 ≤ x ≤ 0.49.
When all meioses are “scorable”, the LR is constructed as:
$$LR = \frac{\theta^{R} (1 - \theta)^{NR}}{(0.5)^{N}}$$
where R is the number of recombinant meioses, NR the number of nonrecombinant meioses, and N = R + NR.
The LOD score (z) is log10(LR):
• $z(\theta)$ is the LOD score at a particular value of the recombination fraction
• $\hat{z}(\hat{\theta})$ is the maximum LOD score, which occurs at the MLE of the recombination fraction
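For the fully scorable case, the formula can be evaluated directly. A minimal Python sketch, assuming phase-known meioses that can each be classified as recombinant or not; real pedigrees need a full likelihood computation (for example, the Elston-Stewart or Lander-Green algorithms). It scans the same θ range as above (0 to 0.49) in steps of 0.01, an arbitrary grid chosen for illustration. A LOD of 3 is the traditional threshold for declaring linkage.

```python
import math


def lod(theta: float, recombinants: int, meioses: int) -> float:
    """log10 of theta^R (1 - theta)^NR / 0.5^N for fully scorable meioses."""
    nonrecombinants = meioses - recombinants
    if theta == 0.0 and recombinants > 0:
        return float("-inf")  # any recombinant rules out theta = 0
    log_lr = 0.0
    if recombinants:
        log_lr += recombinants * math.log10(theta)
    if nonrecombinants:
        log_lr += nonrecombinants * math.log10(1.0 - theta)
    return log_lr - meioses * math.log10(0.5)


def max_lod(recombinants: int, meioses: int) -> tuple[float, float]:
    """Grid search over theta = 0.00, 0.01, ..., 0.49; returns (theta_hat, z_max)."""
    grid = [i / 100 for i in range(50)]
    theta_hat = max(grid, key=lambda t: lod(t, recombinants, meioses))
    return theta_hat, lod(theta_hat, recombinants, meioses)


print(round(lod(0.0, 0, 10), 2))  # 3.01: ten non-recombinant meioses
theta_hat, z_max = max_lod(2, 20)
print(theta_hat)                  # 0.1, i.e. R/N
```

The grid search is used only for transparency; in this scorable case the MLE is simply R/N (when R/N < 0.5).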
Study Designs
• Linkage Analysis
– Large Families
– Small Families
• Association Studies
– Family-Based
– Case-Control
Linkage vs. Association
• Linkage: shared within families
• Association: shared across families
TESTING CANDIDATE GENES
Disease 5/20 vs. Normal 5/20
Gene is not important
TESTING CANDIDATE GENES
Disease 10/20 vs. Normal 5/20
Gene may be important
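The comparison on these two slides can be quantified with an odds ratio and a 2×2 chi-square test. A Python sketch, assuming "5/20" and "10/20" mean 5 or 10 carriers of the variant among 20 people per group (the slides do not say whether the counts are people or alleles):

```python
def odds_ratio(case_carriers, case_non, control_carriers, control_non):
    """Odds of carrying the variant in cases divided by the odds in controls."""
    return (case_carriers * control_non) / (case_non * control_carriers)


def chi2_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Slide 1: 5/20 carriers in disease, 5/20 in normal
print(odds_ratio(5, 15, 5, 15), chi2_2x2(5, 15, 5, 15))    # 1.0 0.0
# Slide 2: 10/20 carriers in disease, 5/20 in normal
print(odds_ratio(10, 10, 5, 15), chi2_2x2(10, 10, 5, 15))  # 3.0 and about 2.67
```

With only 20 per group, the second comparison gives χ² ≈ 2.67, below the 3.84 needed for p < 0.05 at 1 df, which is why "may be important" is the right wording until larger samples are tested.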
Two Basic Study Designs
for Association Analysis
• Family-Based
– Parent-child trio
– Discordant sibpairs
– Advantages: Use existing samples; Robustness to assumptions
– Disadvantages: Ascertainment; Power
• Case-Control
– Advantages: Power; Ascertainment
– Disadvantages: Sensitivity to assumptions; Matching
METHODS FOR FAMILY-BASED ASSOCIATION STUDIES
• Parent-Child
– AFBAC
– TDT
– HHRR
– QTDT
• Sibpair
– S-TDT
– DAT
• Sibship
– SDT
– WSDT
– FBAT
• Pedigree
– Transmit
– PDT
– FBAT
TRANSMISSION
DISEQUILIBRIUM TEST (TDT)
• Examines transmission of alleles to affected individuals
• Requires:
– Linkage (transmission through meioses); and
– Association (specific alleles)
• Test of linkage if association is assumed
• Test of association if linkage is assumed
• Test of linkage AND association if neither is assumed
• Effectively uses the non-transmitted alleles as the control group; a “pseudocontrol” can be made by forming a genotype from the two non-transmitted alleles
• Requires phenotype only for the child
Example trio: parents 1/2 and 1/2; affected child 1/1

TDT calculation
                      Transmitted
                      1       2
Non-Transmitted  1    A       B
                 2    C       D

$$\mathrm{TDT} = \frac{(B - C)^2}{B + C}$$
With > 5 per cell, this follows a χ² distribution with 1 df
TDT
Parents 1/2 and 1/2; affected child 1/1
                      Transmitted
                      1       2
Not transmitted  1    0       0
                 2    2       0
TDT
Parents 2/2 and 1/2; affected child 1/2
                      Transmitted
                      1       2
Not transmitted  1    0       0
                 2    1       1
TDT
Parents 2/2 and 1/1; affected child 1/2
                      Transmitted
                      1       2
Not transmitted  1    1       0
                 2    0       1
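The three example trios can be tallied mechanically. A Python sketch (not the speaker's software) that infers which allele each parent transmitted in a trio with alleles coded 1 and 2, then counts (not transmitted, transmitted) pairs in the same layout as the tables. The function names are illustrative; trios that are not Mendelian-consistent are skipped.

```python
from collections import Counter


def other_allele(genotype, allele):
    """Return the parental allele that was NOT passed on."""
    a, b = genotype
    return b if a == allele else a


def transmission_table(father, mother, child):
    """Count (not_transmitted, transmitted) allele pairs for one parent-child trio.

    Genotypes are 2-tuples such as (1, 2). Returns None if the trio is not
    Mendelian-consistent or the transmissions cannot be resolved.
    """
    c1, c2 = child
    resolutions = set()
    for from_father, from_mother in ((c1, c2), (c2, c1)):
        if from_father in father and from_mother in mother:
            pairs = (
                (other_allele(father, from_father), from_father),
                (other_allele(mother, from_mother), from_mother),
            )
            resolutions.add(tuple(sorted(pairs)))  # order of parents does not matter
    if len(resolutions) != 1:
        return None
    return Counter(resolutions.pop())


def tdt_counts(trios):
    """Sum the discordant cells over trios: (NT=2, T=1) and (NT=1, T=2)."""
    total = Counter()
    for trio in trios:
        table = transmission_table(*trio)
        if table is not None:
            total.update(table)
    return total[(2, 1)], total[(1, 2)]


print(transmission_table((1, 2), (1, 2), (1, 1)))  # Counter({(2, 1): 2})
```

Only heterozygous parents contribute to the discordant cells, which is why homozygous parents (the diagonal counts above) carry no information for the TDT.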
TDT Example
Discordant cells: 42 and 25
$$\mathrm{TDT} = \frac{(42 - 25)^2}{42 + 25} = \frac{289}{67} = 4.31$$
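The worked example can be reproduced in a few lines of Python. The sketch also adds the p-value from the χ² distribution with 1 df, which is not on the slide; it uses the identity P(χ²₁ > x) = erfc(√(x/2)), valid because a 1-df chi-square is the square of a standard normal.

```python
import math


def tdt_statistic(b: int, c: int) -> float:
    """McNemar-type TDT: (B - C)^2 / (B + C), using only the discordant cells."""
    return (b - c) ** 2 / (b + c)


def chi2_1df_pvalue(x: float) -> float:
    """Upper-tail p-value of a chi-square with 1 df: P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2))


stat = tdt_statistic(42, 25)
print(round(stat, 2))                   # 4.31
print(round(chi2_1df_pvalue(stat), 3))  # about 0.038
```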
Analysis of Case-Control Data
• Standard epidemiological approaches can be
used
• Qualitative trait
– Logistic regression
• Quantitative trait
– Linear regression
• The usual concerns about matching apply, but one must
also worry about false positives from population
substructure
Incorporating Genetics
into Your Studies
• Obtain appropriate IRB approval
– DNA studies are quite common
– Template language exists for IRB approval and consent forms
– Genetic Studies Ascertainment Core (GSAC) can help
– Contact: Kelly Taylor
• Collect family history information
• Obtain DNA sample
– Venipuncture
– Buccal wash/swab
– Finger stick
• Extract/Store DNA
– DNA Resources Core can help
– Contact: Cara Sutcliffe
• http://chgr.mc.vanderbilt.edu/