Transcript Document

Center of Statistical Genetics
University of Pisa
The genetic structure of human populations
and the search for complex disease genes
Silvano Presciuttini
European Academy Bozen/Bolzano
EURAC
Dec 16, 2005
Overview
In order to locate genes with moderate phenotypic effect,
we must use methods based on linkage disequilibrium
(LD). LD is a property of the population (unlike
linkage, which is a property of the species); therefore, we
need detailed knowledge of a population’s genetic
structure to design efficient LD mapping studies.
Center of Statistical Genetics
S. Presciuttini – University of Pisa
Outline








The classical approach to locating genes with respect to one another in
experimental organisms was based on linkage analysis
The same approach obviously applies to human genes; however, detecting
linkage in humans is not easy, and the statistical treatment is complicated
For assessing linkage of Mendelian diseases, classical linkage analysis is a
robust method; however, fine mapping is impractical
At genetic distances where linkage analysis becomes unfeasible, LD
mapping starts being useful
For complex diseases, we may still apply linkage analysis, but we need a
good genetic model; in addition, the power to detect linkage decreases as
the effect of the gene becomes smaller
LD mapping may also be efficient in detecting genes that increase disease risk
LD depends on population history, and varies across different populations
Isolated populations founded by a small number of individuals should be
preferred when planning LD mapping studies
Linkage analysis in experimental organisms
In the early 1900s, Bateson and Punnett were studying inheritance in the sweet pea.
They crossed pure lines P/P · L/L (purple flower, long pollen) × p/p · l/l (red, round),
and selfed the F1 heterozygotes. The following table shows the numbers of F2 plants
with each of the four phenotypes:

Phenotype        Genotype    Observed    Expected (9:3:3:1)
purple, long     P/- L/-         4831        3910.5
purple, round    P/- l/l          390        1303.5
red, long        p/p L/-          393        1303.5
red, round       p/p l/l         1338         434.5
Total                            6952        6952

The first evidence of linkage (deviation from the Mendelian principle of
independent assortment).
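The size of this departure can be checked with a Pearson chi-square test against the 9:3:3:1 ratio expected under independent assortment. The sketch below uses only the counts in the table; the function name is ours, for illustration, and the test is not part of Bateson and Punnett's original analysis.

```python
# Observed F2 counts from Bateson and Punnett's sweet-pea cross (table above).
OBSERVED = {
    "purple, long": 4831,
    "purple, round": 390,
    "red, long": 393,
    "red, round": 1338,
}

# Independent assortment of two dominant/recessive genes predicts 9:3:3:1 in the F2.
RATIO = {"purple, long": 9, "purple, round": 3, "red, long": 3, "red, round": 1}


def chi_square_vs_ratio(observed, ratio):
    """Return (expected counts, Pearson chi-square) for the given Mendelian ratio."""
    total = sum(observed.values())
    parts = sum(ratio.values())  # 16 for 9:3:3:1
    expected = {k: total * ratio[k] / parts for k in observed}
    chi2 = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)
    return expected, chi2


expected, chi2 = chi_square_vs_ratio(OBSERVED, RATIO)
for phenotype, exp in expected.items():
    print(f"{phenotype:14s} observed {OBSERVED[phenotype]:5d}  expected {exp:7.1f}")
print(f"chi-square = {chi2:.1f} on 3 degrees of freedom")
```

The statistic is in the thousands, far above the 5% critical value of about 7.81 for 3 degrees of freedom: the excess of the parental combinations (purple-long and red-round) is the signature of linkage.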
The first genetic map in Drosophila
In Sturtevant's own words, "In the latter part of 1911, in conversation with Morgan,
I suddenly realized that the variations in strength of linkage, already attributed by
Morgan to differences in the spatial separation of genes, offered the possibility of
determining sequences in the linear dimension of a chromosome. I went home and
spent most of the night (to the neglect of my undergraduate homework) in
producing the first chromosome map."
Linkage analysis in humans

In principle, detecting linkage in humans is exactly the same as detecting
linkage in any other sexually reproducing diploid organism. The aim is to
discover if two loci segregate independently and, if not, to measure the
recombination rate.
[Figure: a pedigree typed for two loci, A (alleles A1, A2) and B (alleles B1, B2). In this family, only these chromosomes can be scored for crossing over; they are classified as non-recombinant (NR) or recombinant (R): NR, NR, R, NR, NR, R, NR.]
In practice, detecting linkage in humans was precluded for most
of the last century, because suitable polymorphic markers were simply not
available.
For much of this period, human geneticists were envious spectators, because
the idea of constructing a human linkage map was generally considered
unattainable.
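In the phase-known setting of the pedigree above (seven scored chromosomes, two of them recombinant), the recombination fraction can be estimated by direct count, and the evidence for linkage is conventionally summarized by the LOD score: log10 of the likelihood ratio of linkage at θ versus free recombination (θ = 0.5). A minimal sketch, assuming known phase and independent meioses; the function names are illustrative.

```python
import math


def recombination_fraction(n_rec, n_nonrec):
    """Direct-count estimate of theta from phase-known, scorable meioses."""
    return n_rec / (n_rec + n_nonrec)


def lod_score(theta, n_rec, n_nonrec):
    """LOD = log10[L(theta) / L(0.5)] for phase-known meioses.

    L(theta) = theta**R * (1 - theta)**NR; theta = 0.5 means no linkage.
    """
    n = n_rec + n_nonrec
    if theta == 0.0:
        # Complete linkage is impossible once a recombinant has been observed.
        return float("-inf") if n_rec > 0 else n * math.log10(2.0)
    log_l = n_rec * math.log10(theta) + n_nonrec * math.log10(1.0 - theta)
    return log_l - n * math.log10(0.5)


# Scoring from the pedigree: NR, NR, R, NR, NR, R, NR
R, NR = 2, 5
theta_hat = recombination_fraction(R, NR)
print(f"theta_hat = {theta_hat:.3f}")
for theta in (0.0, 0.1, theta_hat, 0.4, 0.5):
    print(f"LOD({theta:.3f}) = {lod_score(theta, R, NR):.3f}")
```

With two recombinants in seven meioses the maximum LOD is only about 0.29, far from the conventional threshold of 3. Because LOD scores from independent families add, many families have to be pooled before linkage can be declared.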
Some peculiarities of linkage analysis in humans

Unlike the linkage maps of experimental organisms, the human linkage map was never going
to be based on genes, because the frequency of matings between two individuals affected
by different genetic disorders is extremely small.

The only way forward for a human linkage map was to base it on neutral
polymorphic markers. These were found with the discovery of RFLPs in the 1970s, but
a true advance came only with the discovery of microsatellites in the 1980s.

Matings cannot be controlled in humans; therefore, geneticists try to collect as many
families as possible, in the hope that there will be enough meioses informative for
linkage

Given the vagaries of family sampling (i.e., variable pedigree structures and
different mating types), calculation of recombination fractions would be a nightmare
without the help of computer programs

As linkage is a property of the genome (i.e., of the species), families with rare
conditions can be collected from all over the world, irrespective of their ethnic
background
Before the 1980s, detecting linkage in humans was only occasional



Before 1980, only very few human genes had been identified as
genetic risk factors for hereditary disorders.
Such early successes were very largely the result of exceptional
characteristics: the biochemical basis of the disease had previously
been established and purification of the gene product could be
achieved without too much difficulty. Such advantages do not apply,
however, to the great majority of diseases resulting from mutations in
human genes.
In the 1980s, the application of recombinant DNA technology offered
new approaches to mapping and identifying the genes underlying
inherited single-gene disorders, and the number of identified disease
genes started to increase rapidly.
Linkage analysis of Mendelian traits

Over the past two decades, rapid progress has been made by using genetics to
identify the molecular cause of human disease. Most of these diseases are rare,
highly penetrant traits that are found to follow Mendelian rules of inheritance in
families, and are therefore often referred to as "Mendelian diseases." Linkage
methods for mapping Mendelian traits are well established and have resulted in the
identification of the molecular causes of hundreds of diseases.
[Figure: Pace of disease gene discovery, 1981-2000 (Peltonen and McKusick, 2001). The number of disease genes included in the chart is 1112. Numbers in parentheses indicate disease-related genes that are polymorphisms ("susceptibility genes").]
For Mendelian traits, linkage analysis is a powerful technique


The remarkable success of positional cloning does not rest simply on
advances in molecular technology.
It also reflects the enormous power of linkage analysis when applied
to Mendelian phenotypes, that is, those characterized by a (near)
one-to-one correspondence between genotypes at a single locus and
the observed phenotype
[Figure: a pedigree showing evidence of linkage of disease to marker. In this family, the disease co-segregates with a marker allele.]
Genetic heterogeneity





One of the most common deviations from the one-to-one
correspondence between a mutated gene and a disease is genetic
heterogeneity (similar phenotypes caused by mutations in more than
one gene).
This happens when a disease is caused by mutations in different loci
(belonging to different complementation groups) whose gene products
participate in the same cellular processes
In these cases, we have a many-to-one relationship between genes
and phenotypes
From the point of view of linkage analysis, genetic heterogeneity
represents only a minor disturbance
With the advent of dense marker maps, mapping Mendelian (or nearly
Mendelian) traits to a chromosomal region is virtually certain to succeed
The limitations of linkage analysis





Once linkage of a gene to a particular trait has been confirmed, the next step is
to narrow the region through the analysis of recombinants.
The standard procedure is to re-examine the families with markers spaced more
closely in the region of interest. However, even with an unlimited supply of
closely linked STRs or SNPs, the limit of resolution remains the number of meioses
in which crossovers might have occurred.
Even when large extended families are available, only a few hundred informative
meiotic events can be observed, limiting the resolution of linkage mapping in the
most favorable cases to about 1 cM (roughly 1% recombination, or ~1 Mb of
DNA, which is still a large amount).
In less favorable cases, there may be as many as a few hundred predicted genes that
might be the relevant disease gene.
Thus, linkage mapping is appropriate for low-resolution mapping, to localize
disease loci to broad chromosome regions of a few cM (<10 cM), which could
contain tens, or hundreds, of genes.
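The link between the number of informative meioses and the achievable resolution can be made explicit with a back-of-the-envelope calculation. The sketch below assumes Haldane's model of crossing over (no interference) and that the disease haplotype can be followed in every meiosis; both are simplifications, and the function name is ours.

```python
def expected_flanking_interval_cm(n_meioses):
    """Expected distance (cM) between the closest crossovers flanking a locus.

    Assumptions: crossovers follow a Poisson process at 1 per Morgan per meiosis
    (Haldane's map, no interference), and the disease haplotype can be followed
    in every one of the n meioses. Pooling n meioses gives a rate of n per Morgan,
    so the nearest crossover on each side lies on average 1/n Morgans away.
    """
    mean_one_side_morgans = 1.0 / n_meioses
    return 2 * mean_one_side_morgans * 100.0


MB_PER_CM = 1.0  # rough human average (1 cM ~ 1 Mb); varies widely along the genome

for n in (50, 100, 200, 500):
    cm = expected_flanking_interval_cm(n)
    print(f"{n:4d} informative meioses -> ~{cm:.1f} cM, ~{cm * MB_PER_CM:.1f} Mb")
```

Under these assumptions about 200 informative meioses are needed to shrink the expected interval to 1 cM, consistent with the figure quoted above; halving the interval requires doubling the number of meioses.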
A random example from literature

The locus (RP1) for one form of autosomal dominant retinitis pigmentosa (adRP)
was mapped to chromosome 8q11-q22 by linkage analysis in an extended family
ascertained in the USA. Investigation of another multigeneration Australian family
with adRP narrowed the critical region to about 4 cM, corresponding to
approximately 4 Mb.
[Figure: Linkage mapping in two families with the RP1 form of retinal degeneration places the disease locus in a 4 Mb interval between D8S601 and D8S285.]
Xu S-Y et al. Hum Genet: 98, 741-743 (1996)
Genetic analysis of HPT-JT
Eighteen families with “Hyperparathyroidism with jaw tumors” (HPT-JT) were submitted to
fine mapping to narrow down the region of the locus (HRPT2); affected haplotypes
were constructed and a recombination map was obtained. In this way, the genetic interval
was reduced to 12 cM, corresponding to a chromosome segment of 14 Mb that
contained 67 candidate genes.
[Figure: A region of 12 cM in chromosome 1 identified by recombination analysis in 18 families with HPT-JT.]
[Figure: A partial transcript map of the critical region. Genes highlighted in BLUE were initially prioritized for mutational analysis. C1orf28 is labeled in RED as the gene of interest.]
Linkage disequilibrium analysis as a fine mapping tool




If the region of interest is smaller than a few Mb, there will be
very few recombinations in this region. Therefore, linkage analysis
becomes uninformative in small regions.
One way to perform fine mapping and confirm linkage of a
susceptibility locus is to test for allele association due to linkage (i.e.,
"linkage disequilibrium") between particular genetic markers and the
disease.
In fact, unlike linkage analysis, association analysis is highly
efficient for fine mapping, as appreciable linkage disequilibrium exists
in humans between loci with recombination fractions of less than 1-2%.
Linkage disequilibrium (LD) analysis has often been instrumental in
the final phases of gene localization.

These successes have fueled hopes that similar approaches will be effective in
localizing genes underlying susceptibility to common, complex diseases.
What is linkage disequilibrium?

Suppose that we have typed a population
sample for two diallelic loci, and let the
results be tabulated as follows:

Observed genotypes (number of individuals)
                  Locus 1
Locus 2      22     12     11    Total
33            6     50     65      121
34            7     26      0       33
44            3      0      0        3
Total        16     76     65      157

We can easily estimate the allele frequencies at
the two loci by direct count:

Allele frequencies
Locus 1: allele 1 = 0.656, allele 2 = 0.344 (total 1)
Locus 2: allele 3 = 0.876, allele 4 = 0.124 (total 1)

From these frequencies, we may calculate the
expected frequencies of the four haplotypes (this
situation corresponds to linkage equilibrium):

Expected haplotype freqs
          1              2
3    (13) 0.575     (23) 0.301
4    (14) 0.081     (24) 0.043

However, if we estimate the four haplotype
frequencies from the genotype data (this is not
obvious, because the phase of the 26 double
heterozygotes 12/34 cannot be observed directly),
we see that they are out of equilibrium:

Estimated haplotype freqs
          1              2
3    (13) 0.656     (23) 0.220
4    (14) 0.000     (24) 0.124

In this particular case, writing p1, p2 for the allele frequencies at locus 1 and
p3, p4 for those at locus 2, D = f(13) - p1·p3 = 0.656 - 0.575 = 0.08, and
D’ = D/Dmax = 1.0 (with Dmax = min(p1·p4, p2·p3) = 0.081), as
one haplotype (1-4) has zero frequency
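Estimating haplotype frequencies from unphased genotypes is commonly done with the expectation-maximization (EM) algorithm: only the double heterozygotes (12/34) have ambiguous phase, and EM repeatedly splits them between the 13/24 and 14/23 configurations in proportion to the current haplotype frequencies. The sketch below applies a standard two-locus EM to the genotype table above and then computes D and D'. It is one common way to obtain such estimates, not necessarily the method used to produce the slide, and all names are illustrative.

```python
# Genotype counts from the table above, keyed by (copies of allele 1 at locus 1,
# copies of allele 3 at locus 2); (1, 1) is the double heterozygote 12/34.
GENOTYPE_COUNTS = {
    (0, 2): 6, (1, 2): 50, (2, 2): 65,  # locus 2 genotype 33
    (0, 1): 7, (1, 1): 26, (2, 1): 0,   # locus 2 genotype 34
    (0, 0): 3, (1, 0): 0, (2, 0): 0,    # locus 2 genotype 44
}


def compatible_pairs(a, b):
    """Distinct unordered haplotype pairs consistent with genotype (a, b)."""
    loc1 = ["1"] * a + ["2"] * (2 - a)
    loc2 = ["3"] * b + ["4"] * (2 - b)
    pairs = {
        tuple(sorted((loc1[0] + loc2[0], loc1[1] + loc2[1]))),
        tuple(sorted((loc1[0] + loc2[1], loc1[1] + loc2[0]))),
    }
    return sorted(pairs)


def em_haplotypes(counts, n_iter=200):
    """EM estimates of the four haplotype frequencies for two diallelic loci."""
    n_chrom = 2 * sum(counts.values())
    p1 = sum(a * n for (a, _), n in counts.items()) / n_chrom  # allele 1, locus 1
    p3 = sum(b * n for (_, b), n in counts.items()) / n_chrom  # allele 3, locus 2
    # Start from linkage equilibrium (products of allele frequencies).
    freq = {"13": p1 * p3, "14": p1 * (1 - p3),
            "23": (1 - p1) * p3, "24": (1 - p1) * (1 - p3)}
    for _ in range(n_iter):
        hap_counts = dict.fromkeys(freq, 0.0)
        for (a, b), n in counts.items():
            if n == 0:
                continue
            pairs = compatible_pairs(a, b)
            if len(pairs) == 1:
                shares = [1.0]
            else:  # E-step: split double heterozygotes by phase probability
                weights = [freq[h1] * freq[h2] for h1, h2 in pairs]
                shares = [w / sum(weights) for w in weights]
            for (h1, h2), s in zip(pairs, shares):
                hap_counts[h1] += n * s
                hap_counts[h2] += n * s
        # M-step: new frequencies are expected counts over all chromosomes.
        freq = {h: c / n_chrom for h, c in hap_counts.items()}
    return p1, p3, freq


def d_and_d_prime(p1, p3, freq):
    """D = f(13) - p1*p3 and D' = |D| / Dmax (Lewontin's standardization)."""
    d = freq["13"] - p1 * p3
    if d >= 0:
        d_max = min(p1 * (1 - p3), (1 - p1) * p3)
    else:
        d_max = min(p1 * p3, (1 - p1) * (1 - p3))
    return d, abs(d) / d_max


p1, p3, freq = em_haplotypes(GENOTYPE_COUNTS)
d, d_prime = d_and_d_prime(p1, p3, freq)
print(f"p1 = {p1:.3f}, p3 = {p3:.3f}")
print({h: round(f, 3) for h, f in freq.items()})
print(f"D = {d:.3f}, D' = {d_prime:.3f}")
```

In these data the EM estimate of f(14) shrinks toward zero at every iteration, so D reaches its maximum possible value and D' = 1, matching the values given on the slide.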
The effects of LD


Linkage disequilibrium is a phenomenon whereby particular alleles of
different loci are associated: people who have one tend to have a
second as well
Linkage disequilibrium of a particular marker allele will occur when
the disease locus and the marker locus are so closely positioned that
recombination events between them are very rare and a certain marker
allele is associated with the disease gene.
Design of a modified LD mapping test applied to HPT-JT
[Figure: marker haplotypes, shown as allele numbers at ten consecutive marker loci, for a set of affected chromosomes and a set of unaffected chromosomes.]
The main advantage of our method derives from selecting trios in which
both the child and one of the parents are affected. This allows us to
determine which of the two chromosomes in the child is the "case", so that
we do not have to use all four founding chromosomes to look for
transmission disequilibrium.
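The idea of identifying the "case" chromosome can be illustrated at the level of a single marker: when both parents and the child are genotyped, the allele the child received from the affected parent is determined whenever only one assignment of the child's two alleles is consistent with Mendelian transmission. The sketch below is a simplified illustration under that assumption (unphased genotypes, no genotyping errors); it is not the authors' actual implementation, and the functions and example genotypes are hypothetical.

```python
def transmitted_allele(child, affected_parent, other_parent):
    """Allele the child received from the affected parent at one marker.

    Genotypes are unordered pairs of allele labels. Returns None when both
    assignments are possible (ambiguous) or neither is (Mendelian error).
    """
    c1, c2 = child
    candidates = set()
    for from_affected, from_other in ((c1, c2), (c2, c1)):
        if from_affected in affected_parent and from_other in other_parent:
            candidates.add(from_affected)
    return candidates.pop() if len(candidates) == 1 else None


def case_haplotype(child, affected_parent, other_parent):
    """Case (disease-bearing) haplotype along markers; '?' where unresolved."""
    haplotype = []
    for c, a, o in zip(child, affected_parent, other_parent):
        allele = transmitted_allele(c, a, o)
        haplotype.append("?" if allele is None else allele)
    return haplotype


# Hypothetical trio genotyped at three consecutive markers
child = [(4, 6), (2, 4), (6, 9)]
affected_parent = [(4, 1), (2, 4), (6, 8)]
other_parent = [(6, 6), (2, 4), (9, 2)]
print(case_haplotype(child, affected_parent, other_parent))  # [4, '?', 6]
```

In practice, unresolved positions would have to be handled with information from flanking markers or additional relatives.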
[Figure: A peak of haplotype sharing among the 18 HPT-JT HRPT2-carrying chromosomes. Marker data for families Kindred-01 to Kindred-18 at a panel of chromosome 1 markers, with the ratio of unique haplotypes in case to control chromosomes plotted along the marker map. The peak points to the interval between D1S384 and D1S412, the location where HRPT2 was actually identified.]
Complex diseases




What is a “complex disease”? Some definitions include:
The term complex trait/disease refers to any phenotype that does not exhibit
classic Mendelian inheritance attributable to a single gene, although such phenotypes
may exhibit familial tendencies (familial clustering, concordance among
relatives). Other hallmarks of complex diseases include known or suspected
environmental risk factors; seasonal, birth order, and cohort effects; late or
variable age of onset; and variable disease progression (M.T. Dorak)
Complex diseases are characterized by risk to relatives of an affected
individual which is greater than the incidence of the disorder in the
population. Complex traits may involve the interaction of two or more genes
to produce a phenotype, or may involve gene-environment interactions
(PhRMA Genomics)
Complex diseases are those that do not show perfect cosegregation with any
single locus owing to such problems as incomplete penetrance, phenocopy,
genetic heterogeneity, and polygenic inheritance (Lander and Schork)
Complex diseases


In short, from a genetic point of view, complex diseases are those that
show many-to-many relationships between genes and phenotypes
This means that if we focus our attention on a particular gene associated
with a disease, we can still reason in terms of the Mendelian paradigm only
if we allow for two kinds of exceptions to the one-to-one rule:
1. Not all individuals who carry the gene are affected with the disease
(incomplete penetrance)
2. Some individuals who do not carry the gene are affected with the trait
(phenocopies)



Both these conditions can be treated by defining appropriate penetrance
functions for the carriers and the non-carriers of the gene
These penetrance functions, coupled with the specification of the gene
frequency, constitute what we call a “genetic model”
Thus, a complex disease can be viewed as a collection of genetic models,
each specifying the contribution of a particular gene to the development of
the disease.
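A genetic model in this sense can be written down in a few lines. The sketch below assumes a single diallelic locus in Hardy-Weinberg equilibrium; the class name and the parameter values are hypothetical, chosen only to show how the gene frequency and the penetrance functions together determine the disease prevalence and the share of affected individuals who are phenocopies.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class GeneticModel:
    """Single-locus genetic model: disease-allele frequency plus penetrances.

    penetrance[k] = P(affected | k copies of the disease allele), k = 0, 1, 2.
    penetrance[0] > 0 allows phenocopies; carrier penetrances < 1 allow
    incomplete penetrance.
    """
    allele_freq: float
    penetrance: Tuple[float, float, float]

    def genotype_freqs(self):
        """Hardy-Weinberg frequencies of 0, 1 and 2 copies of the disease allele."""
        q = self.allele_freq
        return ((1 - q) ** 2, 2 * q * (1 - q), q ** 2)

    def prevalence(self):
        """Population risk K = sum over genotypes of frequency x penetrance."""
        return sum(g * f for g, f in zip(self.genotype_freqs(), self.penetrance))

    def phenocopy_fraction(self):
        """Share of affected individuals who carry no copy of the disease allele."""
        return self.genotype_freqs()[0] * self.penetrance[0] / self.prevalence()


# Hypothetical dominant model: q = 0.01, carrier penetrance 0.5, phenocopy rate 0.001
model = GeneticModel(allele_freq=0.01, penetrance=(0.001, 0.5, 0.5))
print(f"prevalence = {model.prevalence():.5f}")
print(f"phenocopy fraction among affected = {model.phenocopy_fraction():.3f}")
```

With these values roughly 9% of affected individuals are phenocopies and half of the carriers are nonpenetrant: the two exceptions to the one-to-one rule listed above.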
Model-based linkage analysis

Model-based linkage analysis is also called parametric linkage analysis, because the input
must include parameters defining how we think the genotypes at the
locus influence the phenotype, i.e., the mode of inheritance
[Figure: a pedigree with the annotations "This is a phenocopy" and "This is a nonpenetrant case".]
Contrary to what may appear at first sight, this pedigree may
support linkage under an appropriate genetic model
Linkage analysis of complex diseases

Thus, classical linkage analysis can be used to map genes involved in
the etiology of complex diseases; however:
1. The genetic model must be specified correctly; otherwise, spurious results (false
positives) could arise;
2. The power to detect linkage decreases very quickly as the effect of the gene
on the phenotype becomes smaller


Whereas the genetics community has achieved great success in
finding the genes that are responsible for a wide range of Mendelian
diseases, the search for complex disease genes has been relatively
frustrating, despite intense research effort in both the academic and
commercial sectors.
Linkage mapping, which is a powerful tool for finding Mendelian
disease genes, often produces weak, and sometimes inconsistent,
signals in complex disease studies.

To date, only a few variants that contribute to complex diseases have been
conclusively identified.
LD mapping of complex diseases




The rationale underlying LD mapping of complex disease genes is
straightforward and similar to the justification for LD mapping of
Mendelian disease genes.
With both types of disease genes, the primary advantage of LD
analysis remains its ability to use the effects of dozens or hundreds of
past generations of recombination to achieve fine-scale gene
localization.
A major difference, of course, is that the analysis of complex diseases is
complicated by weak associations, which may be more extensive for these
diseases than for most Mendelian diseases.
Despite these challenges, LD mapping holds considerable appeal, and
there is great demand to resolve the genetics of complex diseases.
Consequently, many new techniques have recently been devised to
carry out LD analysis, often with a view toward mapping complex
disease loci.
How is LD generated?

LD is the consequence of the genetic-demographic history of a
population
[Figure: "Emergence of variations over time" and "Variations in chromosomes within a population": a disease mutation arising on the chromosome of a common ancestor, followed along a time axis to the present.]
LD is always in a dynamic state

The actual extent of LD is determined by a balance between the
opposing forces of mutation, selection and drift on one side, and
recombination on the other
1. The mutation arises on a particular genetic background.
2. If the mutation increases in frequency by drift (or selection), the associated
haplotype will also increase in frequency.
3. Over time, the association between the new mutation and linked mutations will
decay by recombination.
Decay of LD over generations



A mutation has occurred in the gene of the ancestral chromosome, and
this has spread in the population
In a series of generations (G), recombinations occur between the disease
allele and the surrounding marker (M) alleles, gradually dissipating
the disequilibrium (gray color).
Marker alleles located in the close vicinity of the disease allele
retain stronger linkage disequilibrium than marker alleles located
more distantly.
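In an idealized large, randomly mating population (ignoring mutation, selection and drift), this decay follows the classical relation D_t = D_0 (1 - θ)^t, where θ is the recombination fraction between the disease and marker loci and t is the number of generations. A minimal sketch of that relation; the function names and the θ values are illustrative.

```python
import math


def ld_remaining(theta, generations):
    """Fraction D_t / D_0 of the initial disequilibrium left after t generations."""
    return (1.0 - theta) ** generations


def ld_half_life(theta):
    """Generations needed for D to fall to half its initial value (theta > 0)."""
    return math.log(0.5) / math.log(1.0 - theta)


for theta in (0.0001, 0.001, 0.01, 0.05):
    print(f"theta={theta:<7} after 100 generations: {ld_remaining(theta, 100):.3f} "
          f"of D0 remains; half-life ~{ld_half_life(theta):,.0f} generations")
```

Markers at θ = 0.0001 keep about 99% of the initial disequilibrium after 100 generations, whereas at θ = 0.05 less than 1% remains; this is why markers in the close vicinity of the disease allele retain stronger LD.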
Genetic-demographic history of the human species



Analysis of data on genetic variation suggests for the human species
an ancestral population size of approximately 10,000 during the
period when the current pattern of genetic variation was largely
established, approximately 100,000-200,000 years ago
Thus, while we know that the human population has grown
enormously since the development of agriculture approximately
15,000 years ago, most human genetic variation arose and became
established in the human population much earlier than this, when the
human population was still small.
This means that the number of generations elapsed since the origin of
the species has been insufficient to erase the effect of the founding
chromosomes on LD, so that significant LD exists at the
interpopulation level at small genetic distances (~1 Mb)
Local patterns of LD



Past human demography included population founding events, expansion, and
migration, and each of these factors plays a complex role in determining local
patterns of LD
In particular, there seems to be less LD within African populations than in
populations outside of Africa
The primary reason is probably that human migrations out of Africa sampled only a
subset of the total diversity present within Africa, and the resulting founder effect
could have inflated LD
Plot of the decay of average LD versus the
physical distance of SNPs.
RED: Asians; BLUE: European Americans;
BLACK: African Americans.
The observed pattern varies widely across
different regions of the genome.
Two-locus LD on the X chromosome in Europeans
Standardized linkage disequilibrium (D') between markers of the X chromosome
as a function of the intermarker distance. Large symbols: D' values with nominal
P ≤ 0.01 (blue: adjacent markers; red: LD computed at 5-marker intervals). Dots:
D' values with P > 0.01. Marker pairs with distance < 1 kb have been omitted.
Multilocus LD of the X chromosome
Bars represent sliding windows of 5 markers each, whose D* value is
plotted. The line under the chart shows the marker location; a large
gap centered at 60 Mb may be noted.
The interest of isolated populations



Because LD reflects the history of recombination, populations with different
demographic histories will often display different LD patterns.
In recently founded groups, such as the Finnish or Mennonite populations, LD may
be seen for loci separated by several cM or more. These patterns have led to the
suggestion that younger populations may be most useful for the initial detection of a
disease locus via LD at large distances. Subsequently, older populations, in which
more recombinants have accumulated, may be more useful for the fine-scale LD
mapping of the disease locus.
Encouraged by the singular successes of LD-based mapping of Mendelian disorders
in isolated populations, many investigators are now turning to these populations in
the search for loci underlying complex diseases. The reasoning is simple: isolated
populations typically have a simpler population history, with fewer founders and less
population admixture. In effect, the ideal isolated population is a large pedigree with
many, many generations. Therefore, it is expected that allelic and locus
heterogeneity should be more limited, permitting easier detection of allelic
associations.
Conclusions

When making inferences of association between genes and complex
diseases, the need to understand population subdivision is critically
important.



For example, if one does a case-control study, and the samples under study are a
mix of two somewhat isolated population groups (one with high disease
incidence and one with low incidence), then there will be a spurious association
between this disease and any genetic marker that shows allele frequency
differences between the population strata.
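The size of such a spurious association can be computed directly. The sketch below assumes two strata in which the marker is, by construction, independent of disease, with cases and controls drawn at random from the pooled population; the function name and all numbers are hypothetical.

```python
def pooled_allele_odds_ratio(strata):
    """Allelic odds ratio for a marker after pooling strata.

    Each stratum is (population share, disease prevalence, marker allele freq).
    Within every stratum the marker is independent of disease, so any pooled
    odds ratio different from 1 is produced by stratification alone.
    """
    case_w = case_allele = ctrl_w = ctrl_allele = 0.0
    for share, prev, freq in strata:
        case_w += share * prev
        case_allele += share * prev * freq
        ctrl_w += share * (1 - prev)
        ctrl_allele += share * (1 - prev) * freq
    p_case = case_allele / case_w  # allele frequency among case chromosomes
    p_ctrl = ctrl_allele / ctrl_w  # allele frequency among control chromosomes
    return (p_case / (1 - p_case)) / (p_ctrl / (1 - p_ctrl))


# Two hypothetical strata of equal size: a high-incidence group with allele
# frequency 0.5 and a low-incidence group with allele frequency 0.1.
mixed = [(0.5, 0.20, 0.5), (0.5, 0.02, 0.1)]
print(f"pooled OR = {pooled_allele_odds_ratio(mixed):.2f}")
print(f"OR in stratum 1 alone = {pooled_allele_odds_ratio(mixed[:1]):.2f}")
```

Pooling the two strata yields an allelic odds ratio of about 2.2 although the odds ratio within each stratum is exactly 1; stratified analyses, or family-based designs such as the trio approach described earlier, avoid this artifact.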
When embarking on linkage disequilibrium mapping, it is important to
collect detailed information on the population structure and
genealogical history to make the best use of founder populations for
novel gene discoveries.
A major challenge in human genetics is to learn to recognize those
relatively few genetic variants that are functionally important against
the large background of neutral variation that distinguishes the
genome. Knowing the population genetic structure is the necessary
prerequisite to investigate the genetics of complex diseases.