NWO/IOP Genomics Winterschool
Mathematics and Biology
December 17, 2001
Lecture 1
A quick overview of
human genetic linkage
analysis
Terry Speed
Genetics & Bioinformatics, WEHI
Statistics, UCB
Purpose of human
linkage analysis
To obtain a crude chromosomal
location of the gene or genes
associated with a phenotype of
interest, e.g. a genetic disease or
an important quantitative trait.
Examples: cystic fibrosis (found),
diabetes, multiple sclerosis, and
blood pressure
Why at a Genomics
Winter School?
Because the identification of genes
contributing to genetic disease or
other phenotypes is a (perhaps the)
major application of the tools and
techniques of genomics.
Examples include: the physical
mapping of clones, sequencing
candidate regions, identification of
genes in DNA sequence, sequence
analysis of candidate genes, and
mapping expression differences
between genes in broad regions.
Linkage Strategies
Traditional (from the 1980s or earlier)
– Linkage analysis on pedigrees
– Allele-sharing methods: candidate
genes, genome screen
– Association studies: candidate
genes
– Animal models: identifying
candidate genes
Newer (from the 1990s)
– Focus on special populations
(Finland, Hutterites)
– Haplotype-sharing (many variants)
– Congenic/consomic lines in mice
(new for complex traits)
Linkage analysis
Allele-sharing
methods
Association Studies
Animal Models
Linkage Strategies II
On the horizon (here)
– Single-nucleotide polymorphisms
(SNPs)
– Functional analyses: finding
candidate genes
Needed (starting to happen)
– New multilocus analysis techniques,
especially
– Ways of dealing with large
pedigrees
– Better phenotypes: ones closer to
gene products
– Large collaborations
Horses for courses
• Each of these strategies has
its domain of applicability
• Each of them has a different
theoretical basis and method
of analysis
• Which is appropriate for
mapping genes for a disease
of interest depends on a
number of matters, most
importantly the disease, and
the population from which the
sample comes.
The disease matters
Definition (phenotype),
prevalence, features such as
age of onset
Genetics: nature of genes
(resistance, susceptibility),
number of genes, nature of
their contributions (additive,
interacting), size of effect
Environment, other relevant
variables (e.g. sex)
Genotype-by-environment
interactions
The population matters
History: pattern of growth,
immigration
Composition: homogeneous or
melting pot, or in between
Mating patterns: family sizes,
mate choice (level of
consanguinity)
Frequencies of disease-related
alleles, and of marker alleles
Ages of disease-related alleles
Complex traits
Definition vague, but usually thought of as
involving multiple, possibly interacting loci
with unknown penetrances, together with
phenocopies. The terms polygenic and
oligogenic are also used, but these do have
more specific meanings.
There is some evidence that using a range of
made-up models can help map genes for
complex traits, but no-one really knows.
Affected-only methods are widely used, with
variance component methods becoming
popular. The jury is still out on which, if any,
will succeed.
Few success stories so far.
Important: heart disease, cancer
susceptibility, diabetes, … are all “complex”
traits.
Design of gene
mapping studies
How strong is the evidence in your data for a
genetic component to your trait? Can
you estimate the size of the genetic
component?
Have you got, or will you eventually
have enough of the right sort of data to
have a good chance of getting a
definitive result?
Power studies
Simulations.
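As an illustration of a power study by simulation, here is a minimal sketch for one of the simplest designs, affected sib pairs scored for identity-by-descent (IBD) allele sharing at a single marker. Everything specific in it is an assumption made for illustration: the marker is taken to be fully informative so that IBD counts are observed without error, the test is the one-sided "mean sharing" test with a normal approximation, and the sharing probabilities under linkage, (0.15, 0.50, 0.35), are hypothetical rather than derived from any disease model. The function names are likewise made up for this sketch.

```python
import random
from statistics import NormalDist

# P(sib pair shares 0, 1, 2 alleles IBD) when there is no linkage.
NULL_IBD = (0.25, 0.50, 0.25)


def mean_sharing_z(ibd_counts):
    """Standardised excess sharing; each count has mean 1, variance 0.5 under the null."""
    n = len(ibd_counts)
    return (sum(ibd_counts) - n) / (0.5 * n) ** 0.5


def estimate_power(ibd_probs, n_pairs, alpha=0.05, n_reps=2000, seed=0):
    """Fraction of simulated studies in which the one-sided mean test rejects."""
    rng = random.Random(seed)
    critical = NormalDist().inv_cdf(1.0 - alpha)
    rejections = 0
    for _ in range(n_reps):
        # Draw one IBD count per affected sib pair.
        counts = rng.choices((0, 1, 2), weights=ibd_probs, k=n_pairs)
        if mean_sharing_z(counts) > critical:
            rejections += 1
    return rejections / n_reps


if __name__ == "__main__":
    # Sanity check: under the null the rejection rate should be near alpha.
    print("type I error:", estimate_power(NULL_IBD, n_pairs=100))
    # Hypothetical sharing probabilities under linkage.
    print("power:", estimate_power((0.15, 0.50, 0.35), n_pairs=100))
```

Rerunning with different values of n_pairs gives a rough idea of how many pairs a study would need. A realistic power study would also model marker informativeness, genotyping error, and the genome-wide significance threshold.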
Genotyping
Choice of markers: highly
polymorphic preferred.
Heterozygosity and PIC
value are the measures
commonly used.
Reliability of markers is
important too.
Good quality data critical:
errors can play a surprisingly
large role.
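The two informativeness measures named above can be computed directly from marker allele frequencies. The sketch below uses the standard definitions: heterozygosity H = 1 − Σ p_i², and the polymorphism information content PIC = 1 − Σ p_i² − Σ_{i<j} 2 p_i² p_j². The function names and the example frequencies are illustrative assumptions, and the allele frequencies are assumed known and in Hardy-Weinberg proportions.

```python
from itertools import combinations


def _check(freqs):
    """Reject inputs that are not a probability vector."""
    if any(p < 0 for p in freqs) or abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must be non-negative and sum to 1")


def heterozygosity(freqs):
    """Expected proportion of heterozygotes: 1 - sum of squared frequencies."""
    _check(freqs)
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """Polymorphism information content for a codominant marker."""
    _check(freqs)
    homozygous = sum(p * p for p in freqs)
    # Correction term: sum over unordered allele pairs of 2 * p_i^2 * p_j^2.
    cross = sum(2.0 * (p * q) ** 2 for p, q in combinations(freqs, 2))
    return 1.0 - homozygous - cross


if __name__ == "__main__":
    snp_like = [0.5, 0.5]          # two equally frequent alleles
    microsatellite_like = [0.25] * 4  # four equally frequent alleles
    for name, freqs in [("2 alleles", snp_like), ("4 alleles", microsatellite_like)]:
        print(name, heterozygosity(freqs), pic(freqs))
```

PIC is never larger than heterozygosity, and both increase as the number of roughly equally frequent alleles grows, which is why highly polymorphic markers are preferred.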
Preparing genotype
data for analysis
Data cleaning is the
big issue here.
Need much ancillary
data…how good is it?
Analysis
A very large range of
methods/programs are
available.
Effort to understand their
theory will pay off in
leading to the right choice
of analysis tools.
Trying everything is not
recommended, but not
uncommon.
Many opportunities for
innovation.
Interpretation of
results of analysis
An important issue here is whether
you have established linkage. The
standards seem to be getting
increasingly stringent.
What p-value or LOD should you
use?
Dealing with multiple testing,
especially in the context of genome
scans and the use of multiple
models and multiple phenotypes, is
one of the big issues.
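To make the p-value versus LOD question concrete, here is a minimal sketch under simplifying assumptions that are not from the lecture: meioses are phase-known and fully informative, so the data reduce to k recombinants out of n; the pointwise p-value uses the usual one-sided asymptotic chi-square approximation; and a Bonferroni correction stands in for multiple-testing adjustment only because it is the simplest choice, not because it is the recommended one for genome scans. All function names are hypothetical.

```python
import math


def lod_score(k, n, theta):
    """LOD = log10[L(theta) / L(1/2)] for k recombinants in n informative meioses."""
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    log10_lik = 0.0
    if k > 0:
        if theta == 0.0:
            return -math.inf  # a recombinant is impossible when theta = 0
        log10_lik += k * math.log10(theta)
    if n - k > 0:
        log10_lik += (n - k) * math.log10(1.0 - theta)
    return log10_lik - n * math.log10(0.5)


def max_lod(k, n):
    """LOD maximised over theta in [0, 0.5]; the maximiser is min(k/n, 0.5)."""
    theta_hat = min(k / n, 0.5)
    return theta_hat, lod_score(k, n, theta_hat)


def pointwise_p_value(lod):
    """Asymptotic one-sided p-value: half the chi-square(1) tail at 2*ln(10)*LOD."""
    if lod < 0:
        raise ValueError("a maximised LOD score cannot be negative")
    return 0.5 * math.erfc(math.sqrt(lod * math.log(10.0)))


def bonferroni(p, n_tests):
    """Conservative adjustment for n_tests tests (assumed choice, see lead-in)."""
    return min(1.0, p * n_tests)


if __name__ == "__main__":
    theta_hat, lod = max_lod(k=1, n=10)
    p = pointwise_p_value(lod)
    print(theta_hat, lod, p, bonferroni(p, n_tests=400))
```

Under this approximation a LOD of 3 corresponds to a pointwise p-value of roughly 0.0001, far smaller than 0.05, which is one way of seeing how much of the traditional threshold is an allowance for testing many positions.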
Replication of results
This has recently become a big
issue with complex diseases,
especially in psychiatry.
Nature Genetics suggested in May
1998 that it would require
replication before publishing results
mapping complex traits.
Simulations by Suarez et al. (1994)
show that the sample sizes necessary
for replication may be substantially
greater than those needed for first
detection.
Topics not mentioned
Sex-linked traits, sex-specific
recombination fractions, liability classes,
mutations, genetic heterogeneity,
exclusion mapping, homozygosity
mapping, interference, variance
component methods, twin studies, and
much more.
Some of these topics, plus the ones
mentioned, are covered in two books:
Handbook of Human Genetic Linkage
by J.D. Terwilliger & J. Ott (1994)
Johns Hopkins University Press
Analysis of Human Genetic Linkage
by J. Ott, 3rd Edition (1999),
Johns Hopkins University Press