map distance


Lecture 7
Human Gene Mapping & Disease Gene Identification

Whether a disease is inherited in a recognizable
mendelian pattern or just occurs at a higher frequency
in relatives of affected individuals, the genetic
contribution to disease must result from genotypic
differences among family members.



The Human Genome Project has provided geneticists
with a complete list of all human genes, knowledge
of their location and structure, and a catalogue of
some of the millions of variants in DNA sequence
found among individuals in different populations.
Some of these variants are common, others are rare,
and still others differ in frequency among different
ethnic groups.
Whereas some variants clearly have functional
consequences, others are neutral. For most, their
significance for human health and disease is
unknown.
Two fundamental approaches to disease gene
identification:
1) Linkage analysis is family-based. Linkage analysis
takes explicit advantage of family pedigrees to follow
the inheritance of a disease over a few generations by
looking for consistent, repeated inheritance of a
particular region of the genome whenever the disease is
passed on in a family.

2) Association analysis is population-based.
Association analysis does not depend explicitly on
pedigrees but instead looks for increased or decreased
frequency of a particular allele or set of alleles in a
sample of affected individuals taken from the
population, compared with a control set of unaffected
people.

Disease gene mapping has immediate clinical application: information
about a gene's location can be used to develop indirect linkage methods
for use in prenatal diagnosis, presymptomatic diagnosis, and carrier
testing.

Disease gene mapping is a critical first step in identifying a
disease gene. Mapping the gene focuses attention on a limited
region of the genome in which to carry out a systematic
analysis of all the genes, so that the mutations or variants
that contribute to the disease can be found. This strategy of
finding a gene from its map position is known as positional
cloning.

Positional cloning of a disease gene provides an
opportunity to characterize the disorder as to the extent
of:
◦ locus heterogeneity,
◦ the spectrum of allelic heterogeneity,
◦ the frequency of various disease-causing or predisposing variants
in various populations,
◦ the penetrance and positive predictive value of mutations,
◦ the fraction of the total genetic contribution to a disease
attributable to the variant at any one locus, and
◦ the natural history of the disease in asymptomatic at-risk
individuals.

Characterization of a gene and the mutations in it
furthers our understanding of disease pathogenesis and
makes possible:
◦ development of specific and sensitive diagnosis by direct
detection of mutations, and of population-based carrier
screening to identify individuals at risk for disease in
themselves or their offspring,
◦ development of cell and animal models,
◦ drug therapy to prevent or ameliorate disease or to slow
its progression, and
◦ treatment by gene replacement.
The effect of recombination on the origin of various portions of a chromosome. Because of crossing over in meiosis, the copy of the chromosome the boy (generation III) inherited from his mother is a mosaic of segments of all four copies of that chromosome carried by his maternal grandparents.



Since homologous chromosomes look
identical under the microscope, we must be
able to differentiate them in order to trace
the grandparental origin of each segment
and to determine if and where recombination
has occurred.
Genetic marker: any characteristic that is
located at the same position on a pair of
homologous chromosomes and that allows
them to be distinguished.
Millions of genetic markers are now available
that can be genotyped by PCR.
Independent assortment of alleles at two loci, 1 and 2, when they are located on different chromosomes. Assume that alleles D and M were inherited from one parent, and d and m from the other. Half (50%) of gametes will be parental (DM or dm) and half (50%) will be nonparental (dM or Dm).
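A quick simulation can make the 50:50 expectation concrete. This is a minimal Python sketch, assuming a doubly heterozygous parent (D and M inherited from one parent, d and m from the other) whose two loci lie on different chromosomes, so each locus segregates independently; the function name `gamete_types` and the sample size are illustrative choices, not part of the lecture.

```python
import random


def gamete_types(n_gametes, seed=0):
    """Count gamete types from a DdMm parent whose two loci are on different
    chromosomes (D and M inherited from one parent, d and m from the other)."""
    rng = random.Random(seed)
    counts = {"DM": 0, "dm": 0, "Dm": 0, "dM": 0}
    for _ in range(n_gametes):
        # Each chromosome pair segregates independently of the other.
        allele1 = rng.choice("Dd")
        allele2 = rng.choice("Mm")
        counts[allele1 + allele2] += 1
    return counts


n = 100_000
counts = gamete_types(n)
parental = (counts["DM"] + counts["dm"]) / n
nonparental = (counts["Dm"] + counts["dM"]) / n
print(f"parental {parental:.3f}, nonparental {nonparental:.3f}")  # both close to 0.5
```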



Assume D and M are paternally derived and d and m are maternally derived. Gametes containing DM or dm are nonrecombinant; gametes containing Dm or dM are recombinant.
Alleles at Loci on the Same Chromosome Assort Independently if at Least One Crossover Occurs Between Them in Every Meiosis
Note: Genes that reside on the same chromosome are said to be syntenic.
If crossing over occurs at least once in the segment between the loci, the resulting chromatids may be either nonrecombinant (DM or dm) or Dm and dM, which are not the same as the parental chromosomes; such a nonparental chromosome is therefore a recombinant chromosome.

The ratio of recombinant to nonrecombinant
genotypes will be, on average, 1:1, just as if the
loci were on separate chromosomes and assorting
independently.
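The 1:1 result can be checked at the level of the four chromatids of a meiotic bivalent. The Python sketch below assumes that every crossover in the interval joins one randomly chosen paternal chromatid with one randomly chosen maternal chromatid (no chromatid interference) and that each gamete receives one of the four chromatids at random; `recombinant_fraction` is a hypothetical helper written only for this illustration.

```python
import random


def recombinant_fraction(n_crossovers, n_meioses=50_000, seed=1):
    """Fraction of gametes recombinant for loci 1 and 2 when exactly
    n_crossovers occur between the loci in every meiosis.

    Assumes each crossover involves one random paternal and one random
    maternal chromatid (no chromatid interference)."""
    rng = random.Random(seed)
    recombinant = 0
    for _ in range(n_meioses):
        # Chromatids 0-1 are paternal sisters (D, M); 2-3 are maternal (d, m).
        chromatids = [["D", "M"], ["D", "M"], ["d", "m"], ["d", "m"]]
        for _ in range(n_crossovers):
            p = rng.choice([0, 1])
            m = rng.choice([2, 3])
            # A crossover between the loci exchanges the segments carrying locus 2.
            chromatids[p][1], chromatids[m][1] = chromatids[m][1], chromatids[p][1]
        gamete = rng.choice(chromatids)  # one chromatid ends up in this gamete
        if "".join(gamete) in ("Dm", "dM"):
            recombinant += 1
    return recombinant / n_meioses


for k in range(4):
    print(k, round(recombinant_fraction(k), 3))
```

With zero crossovers every gamete is parental; with one, two, or three crossovers the recombinant fraction stays near 0.5, because under these assumptions two-, three-, and four-strand double crossovers (yielding 0, 2, and 4 recombinant chromatids) occur in a 1:2:1 ratio.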
• Crossing over between homologous chromosomes in meiosis is shown in the quadrivalents on the left. Crossovers result in new combinations of maternally and paternally derived alleles on the recombinant chromosomes present in gametes.
• If no crossing over occurs in the interval between loci 1 and 2, only parental (nonrecombinant) allele combinations, DM and dm, occur in the offspring.
• If one or two crossovers occur in the interval between the loci, then on average half the gametes will contain a nonrecombinant combination and half a recombinant one. The same is true if more than two crossovers occur between the loci.
The smaller the recombination frequency, the closer
together two loci are.

A common notation for recombination
frequency is θ, where θ varies from 0 (no
recombination at all) to 0.5 (independent
assortment).


Detecting the recombination events between
loci requires that (1) a parent be heterozygous
(informative) at both loci and (2) we know
which allele at locus 1 is on the same
chromosome as which allele at locus 2.
Alleles on the same homologue are in
coupling (or cis), whereas alleles on
different homologues are in repulsion (or
trans).
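Phase is what turns a transmitted genotype into a recombinant or nonrecombinant call. A minimal Python sketch, assuming the parent's two chromosomes are written as (locus 1, locus 2) allele pairs; the function `classify` is hypothetical and ignores genotyping error.

```python
def classify(transmitted, phase):
    """Call a transmitted two-locus haplotype recombinant or nonrecombinant.

    transmitted: (locus-1 allele, locus-2 allele) that the child received
    phase: the parent's two chromosomes as (locus-1, locus-2) pairs;
           alleles within one pair are in coupling (cis)."""
    first, second = phase
    # Recombination is detectable only if the parent is heterozygous
    # (informative) at both loci.
    if first[0] == second[0] or first[1] == second[1]:
        raise ValueError("parent is not informative at both loci")
    possible = {first, second, (first[0], second[1]), (second[0], first[1])}
    if transmitted not in possible:
        raise ValueError("haplotype cannot come from this parent")
    return "nonrecombinant" if transmitted in (first, second) else "recombinant"


coupling = (("D", "M"), ("d", "m"))   # D and M in cis
repulsion = (("D", "m"), ("d", "M"))  # D and M in trans
print(classify(("D", "m"), coupling))   # recombinant
print(classify(("D", "m"), repulsion))  # nonrecombinant
```

The same transmitted haplotype, Dm, is recombinant if the parent carried D and M in coupling but nonrecombinant if they were in repulsion, which is why phase must be known.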
Possible phases of alleles M and m at a marker locus with alleles D and d at
a disease locus
• Co-inheritance of the gene for an autosomal dominant form of retinitis pigmentosa, RP9, with marker locus 2 and not with marker locus 1.
• Only the mother's contribution to the children's genotypes is shown. The mother (I-1) is affected with this dominant disease and is heterozygous at the RP9 locus (Dd) as well as at loci 1 and 2. She carries the A and B alleles on the same chromosome as the mutant RP9 allele (D). The unaffected father is homozygous normal (dd) at the RP9 locus as well as at the two marker loci (AA and BB); his contributions to his offspring are not considered further.


All three affected offspring have inherited the B allele at locus
2 from their mother, whereas the three unaffected offspring
have inherited the b allele. Thus, all six offspring are
nonrecombinant for RP9 and marker locus 2.
However, individuals II-1, II-3, and II-5 are recombinant for
RP9 and marker locus 1, indicating that meiotic crossover has
occurred between these two loci.

Linkage is the term used to describe a departure from the
independent assortment of two loci, or, in other words, the
tendency for alleles at loci that are close together on the same
chromosome to be transmitted together, as an intact unit,
through meiosis.

Analysis of linkage depends on determining the frequency of
recombination as a measure of how close different loci are to
each other on a chromosome. If two loci are so close together
that θ = 0 between them, they are said to be tightly linked; if
they are so far apart that θ = 0.5, they are assorting
independently and are unlinked.

Suppose that among the offspring of informative meioses (i.e.,
those in which a parent is heterozygous at both loci), 80% of
the offspring are nonrecombinant and 20% are recombinant.
At first glance, the recombination frequency is therefore 20%
(θ = 0.2). However, the accuracy of this estimate of θ depends
on the size of the family used to make the measurement, that
is, on the number of informative meioses scored.
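To see why family size matters, the sketch below computes θ together with an approximate 95% interval from the binomial standard error, √(θ(1 − θ)/n). The normal approximation is an assumption made for illustration (it is crude for small n, where exact or likelihood-based intervals are preferable), and `estimate_theta` is a hypothetical helper.

```python
import math


def estimate_theta(recombinants, informative_meioses):
    """Return (theta, low, high): the observed recombination fraction and an
    approximate 95% interval from the normal approximation to the binomial."""
    theta = recombinants / informative_meioses
    se = math.sqrt(theta * (1 - theta) / informative_meioses)
    low = max(0.0, theta - 1.96 * se)
    high = min(1.0, theta + 1.96 * se)
    return theta, low, high


# The same 20% recombination observed in families of very different size.
for r, n in [(2, 10), (20, 100), (200, 1000)]:
    theta, low, high = estimate_theta(r, n)
    print(f"{r}/{n}: theta = {theta:.2f}, 95% CI about {low:.2f} to {high:.2f}")
```

With 10 informative meioses the interval runs from about 0 to 0.45, so the data barely distinguish tight linkage from near-independence; with 1000 it narrows to about 0.18 to 0.22.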





The map distance between two loci is a theoretical concept
that is based on real data, the extent of observed
recombination, θ, between the loci.
Map distance is measured in units called centimorgans (cM),
defined as the genetic length over which, on average, one
crossover occurs in 1% of meioses.
Therefore, a recombination fraction of 1% (θ = 0.01)
translates approximately into a map distance of 1 cM.
As the map distance between two loci increases, however, the
frequency of recombination we observe between them does
not increase proportionately (Fig. 10-7). This is because as
the distance between two loci increases, the chance that the
chromosome carrying these two markers could undergo more
than one crossing over event between these loci also
increases.
As a rule of thumb, recombination frequency begins to
underestimate true genetic distance significantly once θ rises
above 0.1.
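The lecture does not name a mapping function, but one standard way to correct θ for multiple crossovers is Haldane's function, d = −½ ln(1 − 2θ) with d in morgans, which assumes that crossovers occur independently along the chromosome (no interference); other mapping functions, such as Kosambi's, relax that assumption. A minimal Python sketch:

```python
import math


def haldane_cm(theta):
    """Map distance (cM) from recombination fraction, assuming no interference."""
    if not 0.0 <= theta < 0.5:
        raise ValueError("theta must be in [0, 0.5)")
    # -1/2 ln(1 - 2*theta) gives morgans; multiply by 100 for centimorgans.
    return -50.0 * math.log(1.0 - 2.0 * theta)


def haldane_theta(cm):
    """Inverse: expected recombination fraction for a map distance in cM."""
    return 0.5 * (1.0 - math.exp(-2.0 * cm / 100.0))


for theta in (0.01, 0.05, 0.1, 0.2, 0.3, 0.4):
    print(f"theta = {theta:.2f} -> {haldane_cm(theta):5.1f} cM")
```

Under this model θ = 0.01 maps to about 1.0 cM, but θ = 0.1 already maps to about 11.2 cM and θ = 0.3 to about 45.8 cM; going the other way, `haldane_theta` approaches but never exceeds 0.5 however large the distance.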
The relationship between map distance in centimorgans and recombination fraction, θ. Recombination fraction (solid line) and map distance (dotted line) are nearly equal, with 1 cM = 0.01 recombination, for values of genetic distance below 10 cM, but they begin to diverge because of double crossovers as the distance between the markers increases. The recombination fraction approaches a maximum of 0.5 no matter how far apart loci are; the genetic distance increases proportionally to the distance between loci.
Genetic maps and physical maps




To measure true genetic map distance between two widely
spaced loci accurately, therefore, one has to use markers
spaced at short genetic distances in the interval between
these two loci and add up the values of θ between the
intervening markers (Fig. 10-8).
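The additive approach can be contrasted with what a direct measurement across a long interval would show. The sketch assumes hypothetical, equally spaced markers A through H with θ = 0.05 in each of the seven intervals, and assumes no interference, under which recombination fractions across adjacent intervals combine as 1 − 2θ(A,H) = ∏(1 − 2θᵢ). Both the marker spacing and `combine_intervals` are illustrative, not data from the lecture.

```python
import math


def combine_intervals(thetas):
    """Recombination fraction between the outermost markers, assuming
    crossovers in different intervals occur independently (no interference).

    An end-to-end recombinant needs an odd number of interval recombinations,
    which gives 1 - 2*theta_total = product of (1 - 2*theta_i)."""
    return 0.5 * (1.0 - math.prod(1.0 - 2.0 * t for t in thetas))


intervals = [0.05] * 7                  # hypothetical θ for A-B, B-C, ..., G-H
summed = sum(intervals)                 # adding short intervals: about 0.35 (~35 cM)
direct = combine_intervals(intervals)   # θ one would observe directly between A and H
print(f"sum of intervals = {summed:.2f}, direct A-H theta = {direct:.3f}")
```

The direct A to H value (about 0.26) understates the 0.35 obtained by adding the short intervals, because double crossovers between A and H restore the parental combination and go undetected.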
For example, human chromosome 1 is the largest human
chromosome in physical length (283 Mb) and also has the
greatest genetic length, 270 cM (0.95 cM/Mb); the q arm of
the smallest chromosome, number 21, is 30 Mb in physical
length and 62 cM in genetic length (∼2.1 cM/Mb).
Overall, the human genome, which is estimated to contain
about 3200 Mb, has a genetic length of 3615 cM, for an
average of 1.13 cM/Mb.
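The cM/Mb figures above are simple ratios of genetic to physical length; this short Python check recomputes them from the numbers given in the text.

```python
# Genetic length (cM) and physical length (Mb) as quoted in the text.
lengths = {
    "chromosome 1": (270, 283),
    "chromosome 21q": (62, 30),
    "whole genome": (3615, 3200),
}

ratios = {name: cm / mb for name, (cm, mb) in lengths.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f} cM/Mb")  # 0.95, 2.07 and 1.13 respectively
```

Chromosome 21q recombines nearly twice as often per megabase as the genome average, one illustration of how loosely genetic and physical maps track each other.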
Furthermore, when recombination is compared with physical length
at finer and finer resolution, the ratio of genetic distance to
physical length turns out not to be uniform along a chromosome.
Diagram showing how adding together short genetic distances, measured as
recombination fraction, θ, between neighboring loci A, B, C, and so on allows
accurate determination of the genetic distance between two loci, A and H, located
far apart. The value of θ measured directly between A and H is not an accurate
measure of genetic distance.



Just as male and female gametogenesis shows sex
differences in the types of mutations and their
frequencies, there are also significant differences
in recombination between males and females.
Across all chromosomes, the genetic length in
females, 4460 cM, is 72% greater than the genetic
distance of 2590 cM in males, and it is consistently
about 70% greater in females on each of the
different autosomes.
The reason for increased recombination in females
compared with males is unknown, although one
might speculate that it has to do with the many
years that female gamete precursors remain in
meiosis I before ovulation.
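The 72% figure follows directly from the two genome-wide map lengths quoted above, as this one-line Python check shows:

```python
female_cm, male_cm = 4460, 2590  # genome-wide genetic lengths quoted in the text
excess = (female_cm - male_cm) / male_cm  # 1870 / 2590
print(f"The female map is {excess:.0%} longer than the male map")  # 72%
```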


When a disease allele first enters the population (by
mutation or through a founder), the particular set of alleles at
markers linked to the disease locus constitutes a
disease-containing haplotype.
The degree to which this haplotype will persist as such
over time depends on the probability of recombination.

The speed with which recombination will
move the disease allele onto a new haplotype
depends on the product of two main factors:
1) The number of generations, and therefore
the number of opportunities for
recombination
2) The frequency of recombination between
the loci
A third factor, selection for or against a
particular haplotype, may also contribute,
but its effect has been difficult to prove in
humans.
Alleles in linkage disequilibrium with the mutation
constitute a disease-associated haplotype.


The shorter the time since the disease allele appeared
and the smaller the value of θ, the greater is the
chance that the disease-containing haplotype will
persist intact.
With longer time periods and greater values of θ,
shuffling will go to completion and the allele
frequencies for marker alleles in the haplotype that
includes the disease allele will come to equal the
frequencies of these marker alleles in all
chromosomes in the population; i.e., alleles in the
haplotype will have reached linkage equilibrium.
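The interplay of time and θ can be made quantitative with a standard population-genetics result not stated in the lecture: if the disease allele arose once on a single haplotype, the probability that no recombination has separated it from a linked marker after t generations is (1 − θ)^t, and the linkage disequilibrium between them decays by the same factor, D(t) = D(0)(1 − θ)^t. The sketch assumes random mating, a large population, and no selection; `intact_fraction` is a hypothetical name.

```python
def intact_fraction(theta, generations):
    """Probability that the ancestral disease haplotype is still unbroken by
    recombination between the disease locus and a marker at recombination
    fraction theta (equivalently, the factor by which D has decayed),
    assuming random mating, a large population, and no selection."""
    return (1.0 - theta) ** generations


for theta in (0.001, 0.01, 0.1):
    for t in (10, 100, 1000):
        print(f"theta={theta}, t={t}: {intact_fraction(theta, t):.3f}")
```

For small θ, (1 − θ)^t is close to e^(−θt), so it is essentially the product θ × t that governs the decay: under these assumptions a marker with θ = 0.01 retains about 37% of the original association after 100 generations, whereas one with θ = 0.001 retains about 90%.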

One of the biggest human genomics efforts to follow completion of
the sequencing is a project designed to create a haplotype map
(HapMap) of the genome. The goal of the HapMap project is to
make linkage disequilibrium (LD) measurements between a dense collection of millions of
single nucleotide polymorphisms (SNPs) throughout the genome.

To accomplish this goal, geneticists collected and characterized
millions of SNP loci, developed methods to genotype them rapidly
and inexpensively, and used them, one pair at a time, to measure LD
between neighboring markers throughout the genome.

The measurements were made in samples that included both
unrelated population samples and samples containing one child and
both parents, obtained from four geographically distinct groups: a
primarily European population, a West African population, a Han
Chinese population, and a population from Japan.
The study showed that:
1) More than 90% of all SNPs are shared among
such geographically disparate populations,
with allele frequencies that are quite similar
in the different populations
- This finding indicates that most SNPs are old
and predate the waves of emigration out of
East Africa that populated the rest of the
world
- Differences in allele frequencies in a small
fraction of SNPs may be the result of either
genetic drift/founder effect or selection in
localized geographical regions after
migration out of Africa.



Such SNPs, termed ancestry informative
markers, are used in studies of human origin,
migration, and gene flow. They are also used
in forensic investigations to determine the
likely ethnic background when the only
available evidence is DNA.
2) When pairwise measurements of LD were
made for neighboring SNPs across the
genome, contiguous SNPs could be grouped
into clusters of varying size in which the SNPs in
any one cluster show high levels of LD with
each other.
- These clusters of SNPs in high LD, spanning
segments of a few kb to a few dozen
kb, are termed LD blocks.
- The sizes of LD blocks are not identical in all
populations. African populations have smaller
blocks than other populations.
A 145-kb region of chromosome 4 containing 14 SNPs. In cluster 1,
containing SNPs 1 through 9, five of the 2⁹ = 512 theoretically possible
haplotypes are responsible for 98% of all the haplotypes in the population,
reflecting substantial linkage disequilibrium among these SNP loci. Similarly,
in cluster 2, only three of the 2⁴ = 16 theoretically possible haplotypes
involving SNPs 11 to 14 represent 99% of all the haplotypes found. In
contrast, alleles at SNP 10 are found in linkage equilibrium with the SNPs in
cluster 1 and cluster 2.
3) Pairwise measurements of recombination
between closely neighboring SNPs revealed
that the ratio of map distance to base pairs
was not constant at ~1 cM/Mb. Instead, it ranged
from far below 0.01 cM/Mb to more than 60
cM/Mb.
- This indicates that the rate of recombination
between polymorphic markers, which was
thought to be uniform, is in fact an average
of recombination “hotspots” interspersed
among regions of little or no
recombination.
B, A schematic diagram in which each box
contains the pairwise measurement of the
degree of linkage disequilibrium between two
SNPs (e.g., the arrow points to the box,
outlined in black, containing the value of D'
for SNPs 2 and 7). The higher the degree of
LD, the darker the color in the box, with
maximum D' values of 1.0 occurring when
there is complete LD. Two LD blocks are
detectable, the first containing SNPs 1
through 9, and the second SNPs 11 through
14. In the first block, pairwise measurements
of D' reveal LD. A similar level of LD is found
in block 2. Between blocks, the 14-kb region
containing SNP 10 shows no LD with
neighboring SNPs 9 or 11 or with any of the
other SNP loci. Below is a graph of the ratio
of map distance to physical distance
(cM/Mb) showing that a recombination
hotspot is present in the region around SNP
10 between the two blocks, with values of
recombination that are 50- to 60-fold above
the average of approximately 1.13 cM/Mb for
the genome.
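The D' values in the figure are computed from haplotype and allele frequencies. A minimal Python sketch of the standard definition (Lewontin's D'), assuming two biallelic SNPs and already-phased haplotype frequencies; `d_prime` and the example frequencies are hypothetical.

```python
def d_prime(p_ab, p_a, p_b):
    """|D'| for two biallelic loci.

    p_ab: frequency of haplotypes carrying allele A at locus 1 and B at locus 2
    p_a, p_b: frequencies of alleles A and B among the same chromosomes"""
    d = p_ab - p_a * p_b  # D: departure from the product of allele frequencies
    if d > 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    elif d < 0:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    else:
        return 0.0  # linkage equilibrium
    return abs(d) / d_max  # 1.0 means complete LD


# Only AB and ab haplotypes observed (60% and 40%): complete LD.
print(d_prime(p_ab=0.6, p_a=0.6, p_b=0.6))
# AB at exactly the frequency expected from p_a * p_b: no LD.
print(d_prime(p_ab=0.2, p_a=0.5, p_b=0.4))
```

Values near 1 correspond to the dark boxes within the two LD blocks; values near 0 correspond to SNP 10 paired with its neighbors.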