PCR Lecture - Woods Hole Oceanographic Institution
Population Genomics
Friend or Foe?
Tim Shank
4/2/03
Woods Hole Oceanographic Institution
Population Genomics: A View
[Figure: diagram placing Population Genomics among Genome Projects, Microbial Genomics, Genome-Genome Interactions, Comparative Genomics, Functional Genomics, and Pharmacogenomics]
Population Genomics: definitions
• The study of forces that determine patterns of DNA variation in populations (Michel Veuille, European Consortium)
• The field of genomics that links complex genotypes and phenotypes by comparing the flow of genotypic and phenotypic information in breeding and natural populations (Andrew Benson, U. Neb.)
• Genomic variation within species that permits the construction of detailed linkage maps using polymorphic markers and, through crossing experiments between individuals with different phenotypes, the identification of genes responsible for phenotypic variation (e.g., disease susceptibility, drug toxicity) (Andrew Clark, PSU)
Characterizing the genetic relationships of populations is important for understanding:
• Genetic management of protected or threatened populations (e.g. Jones et al. 2002)
• Historical migrations and connectivity of populations (e.g. Eizirik et al. 2001)
• Kin selection and social behavior (e.g. Morin et al. 1994)
• Mating systems (e.g. Engh et al. 2002)
• Dispersal, temporal and spatial genetic structure (e.g. Goodisman & Crozier 2001)
Questions in Marine Population Genetics
• How do marine larvae disperse between localities that may be isolated?
• What role do larval retention and stepping-stone habitats play in species maintenance?
• Do topographic and hydrographic features like transform faults and currents disrupt or facilitate gene flow between demes?
• Does the pattern of colonization and mode of dispersal affect the retention of genetic diversity in marine animals?
Dispersal models
Continuous populations
• Isolation-by-distance
Discrete populations
• Stepping-stone
• Island model
F_ST approaches
$$F_{ST} = \frac{s_A^2}{\bar{p}_A(1 - \bar{p}_A)}$$
where $s_A^2$ is the variance of the frequency of allele A among populations and $\bar{p}_A$ is its mean frequency.
Wright (1951) [The genetical structure of populations. Ann. Eugen. 15:323-354.] noted that the following relationship holds when populations reach an equilibrium between genetic drift and migration:
$$F_{ST} \approx \frac{1}{4Nm + 1}$$
where N is the variance effective population size of the average population, and m is the average proportion of immigrants in each population. Nm is a virtual number.
[Figure: F_ST (0 to 1) plotted against Nm (0 to 10)]
Problem: the useful parameter space is for F_ST values between 0.1 and 0.4.
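The two relationships above can be sketched in Python. This is a minimal illustration under stated assumptions: a single biallelic locus, allele frequencies treated as known without sampling error (no bias correction of the kind applied by estimators such as Weir and Cockerham's θ), and Wright's island-model equilibrium. The function names are hypothetical.

```python
from statistics import mean, pvariance


def fst_from_frequencies(freqs: list[float]) -> float:
    """F_ST = s_A^2 / (p_bar * (1 - p_bar)) for one allele across populations.

    Uses the population variance of the observed frequencies, with no
    correction for sampling error.
    """
    p_bar = mean(freqs)
    if p_bar in (0.0, 1.0):
        raise ValueError("allele fixed or absent everywhere; F_ST undefined")
    return pvariance(freqs) / (p_bar * (1.0 - p_bar))


def nm_from_fst(fst: float) -> float:
    """Invert the island-model equilibrium F_ST = 1 / (4Nm + 1)."""
    if not 0.0 < fst <= 1.0:
        raise ValueError("F_ST must lie in (0, 1]")
    return (1.0 / fst - 1.0) / 4.0


def fst_from_nm(nm: float) -> float:
    """Equilibrium F_ST expected under the island model for a given Nm."""
    return 1.0 / (4.0 * nm + 1.0)


freqs = [0.2, 0.4, 0.6, 0.8]  # hypothetical allele frequencies in 4 populations
fst = fst_from_frequencies(freqs)
print(f"F_ST = {fst:.2f}, Nm = {nm_from_fst(fst):.2f}")  # F_ST = 0.20, Nm = 1.00
```

Inverting the relation also illustrates the problem noted above: F_ST = 0.4 corresponds to Nm = 0.375 and F_ST = 0.1 to Nm = 2.25, so the whole useful F_ST range spans fewer than 2 migrants per generation, and larger Nm values produce F_ST differences too small to resolve reliably.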
The giant tubeworm, Riftia pachyptila
[Figure: map of East Pacific Rise and Galapagos Rift sampling sites (labeled Guaymas, 21°, 13°, 11°, 9°, 2°, and Galapagos Rift), and plot of F_ST / migration rate against distance (km)]
Reject expectations of the "island model"
Consistent with a stepping-stone model
Inference: a species with more limited dispersal abilities
Black et al. 1994. Gene flow among vestimentiferan tube worm (Riftia pachyptila) populations from hydrothermal vents of the Eastern Pacific. Marine Biology 120: 33-39.
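One common way to ask whether gene flow falls off with distance, as a stepping-stone model predicts and an island model does not, is to regress log Nm on log distance across pairs of sites. The sketch below is an ordinary least-squares fit written from scratch; it is not necessarily the analysis Black et al. (1994) ran, and the distances and Nm values are hypothetical, not their data.

```python
import math


def loglog_slope(distances_km: list[float], nm_values: list[float]) -> float:
    """Least-squares slope of log10(Nm) on log10(distance).

    A slope near zero fits an island model (migration independent of
    distance); a clearly negative slope is consistent with isolation by
    distance, as expected under a stepping-stone model.
    """
    xs = [math.log10(d) for d in distances_km]
    ys = [math.log10(m) for m in nm_values]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical pairwise values, for illustration only.
distances = [10, 100, 1000, 10000]
nm_declining = [20.0, 6.0, 2.0, 0.6]   # Nm falls with distance
nm_constant = [5.0, 5.0, 5.0, 5.0]     # Nm independent of distance
print(loglog_slope(distances, nm_declining))
print(loglog_slope(distances, nm_constant))
```

With these made-up numbers the declining series gives a slope of about -0.5, while the constant series gives a slope of essentially zero.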
Molecular Toolkit: markers for inferring population structure and gene flow
• Allozymes: multiple, independent, codominant loci; relatively easy; low cost. Drawbacks: need to freeze samples; state characters.
• RFLPs: variation in restriction fragment lengths; polymorphic due to restriction-site mutation.
• mtDNA: relatively easy; maternally inherited; effectively haploid; non-recombining; modest cost; amenable to genealogical analysis. Drawbacks: linked loci and pseudoreplication.
• Nuclear DNA sequences: amenable to genealogical analysis. Drawbacks: diploid; recombination; start-up time may be considerable.
• AFLPs: can get 100s of loci relatively easily. Drawbacks: dominance; recombination; state characters; mutation models not available.
• Minisatellites: repeats of 10-40 bp units; polymorphic due to unequal crossing over.
• DNA microsatellites: repeat unit 2-3 bp; nuclear; can get dozens of loci relatively easily; method of choice for parentage. Drawbacks: recombination; state characters; start-up time is great; issues of homoplasy in geographical studies; mutation must be taken into account in gene flow models.
• Single-nucleotide polymorphisms (SNPs): the simplest and most common form of genetic polymorphism in most genomes. Drawbacks: large sequencing effort required in nonmodel organisms; violation of the analytical assumption of independence among marker loci.
• Sequence-tagged sites (STSs; physical markers): short DNA segments that occur only once in the genome and whose exact location and base sequence are known, so PCR primers can be designed to amplify them. Drawbacks: very labor intensive; very few loci.
• Expressed sequence tags (ESTs; physical markers): short (100-300 bp) sequences from a cDNA, which can be used to fish the rest of the gene out of the genome by base pairing with part of the gene. Drawbacks: large amount of sequencing effort.
Molecular Markers: Random Amplified Polymorphic DNA (RAPD), AP-PCR
• PCR-based method using arbitrary primers (e.g., the 10-mer ggcattactc) that anneal to the target sequence
• Amplify regions between priming sites by polymerase chain reaction
• Analyze PCR products by agarose gel electrophoresis
• High variability: probably due to mutations in priming sequences
• Marker is dominant (presence/absence of band)
• No prior sequence knowledge required
• Many variations on the theme (e.g., RAMP, ISSR)
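Because RAPD bands are dominant, a band-present individual may be homozygous or heterozygous, so allele frequencies cannot be counted directly. A standard workaround, sketched here under the assumption of Hardy-Weinberg equilibrium, estimates the frequency of the null (band-absent) allele as the square root of the proportion of individuals lacking the band. The function name and the band scores are hypothetical.

```python
import math


def null_allele_frequency(band_scores: list[int]) -> float:
    """Estimate q, the band-absent allele frequency, from 0/1 band scores.

    Assumes Hardy-Weinberg equilibrium, so the band-absent phenotype
    (homozygous null) occurs at frequency q**2. 1 = band present, 0 = absent.
    """
    if not band_scores:
        raise ValueError("no individuals scored")
    absent = sum(1 for s in band_scores if s == 0)
    return math.sqrt(absent / len(band_scores))


scores = [1] * 64 + [0] * 36  # hypothetical: 36% of individuals lack the band
q = null_allele_frequency(scores)
p = 1.0 - q
print(f"q = {q:.2f}, p = {p:.2f}")  # q = 0.60, p = 0.40
```

This simple square-root estimator is biased in small samples, which is one reason a dominant marker carries less information per locus than a codominant one.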
Amplified Fragment Length Polymorphism (AFLPs)
• Polymorphism based on gain or loss of a restriction site, or of selective bases
• Technically demanding and expensive
• Many markers generated, mostly dominant
• More reliable than RAPD, less so than SSR
• No prior sequence knowledge required
Single-Strand Conformational Polymorphism (SSCP)
1. Amplify target sequence
2. Denature product with heat and formamide
3. Analyze on native (nondenaturing) polyacrylamide gel
4. Base sequence determines 3-dimensional conformation
• Highly sensitive to DNA sequence: can detect single base changes
• Simple process but can be difficult to repeat
Denaturing Gradient Gel Electrophoresis (DGGE)
1. Amplify target sequence
2. Run product on a gel with a denaturing gradient (parallel or perpendicular to the direction the gel runs)
3. Product begins denaturing at a certain point, depending on base sequence: this greatly retards migration and allows discrimination of alleles based on small sequence differences
4. Denaturing gradient gels can be difficult to produce: use a perpendicular gradient to identify optimal conditions, then move to CDGE (constant denaturant gel electrophoresis)
Cleaved Amplified Polymorphic Sequence (CAPS)
1. Amplify target sequence
2. Cut with a restriction enzyme that differentiates alleles
3. Alleles can be differentiated by size based on loss or gain of a restriction site; may be able to analyze on an agarose gel
[Figure: Allele 1 and Allele 2 distinguished by a restriction site (X)]
• Fairly simple analysis (cutting can be a hassle)
• Requires sequence information from several alleles (or luck)
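The logic of a CAPS assay, in which a SNP creates or destroys a restriction site and thereby changes fragment sizes, can be sketched as an in-silico digest. The two alleles below are hypothetical toy sequences, and EcoRI (recognition site GAATTC, cut after the G) is used only as an example enzyme; only the top strand is modeled.

```python
def digest(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> list[int]:
    """Return fragment lengths after cutting seq at every occurrence of site.

    cut_offset is the position within the recognition site where the top
    strand is cut (EcoRI cuts G^AATTC, so the offset is 1).
    """
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


allele1 = "A" * 40 + "GAATTC" + "T" * 54  # hypothetical allele with the site
allele2 = "A" * 40 + "GAGTTC" + "T" * 54  # one SNP destroys the site
print(digest(allele1))  # [41, 59]
print(digest(allele2))  # [100]
```

On a gel, allele 1 would give two bands and allele 2 one uncut band, and a heterozygote would show all three, which is why CAPS markers are codominant.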
Allele Discrimination via Quantitative PCR (TaqMan)
Microsatellites (Simple Sequence Repeats)
“…reiterated short sequences [of DNA] tandemly arrayed, with variations in copy number accounting for a profusion of distinguishable alleles”
- (Avise 1994)
Locations:
- Nuclear DNA
- Chloroplast
Microsatellite Types
1. Dinucleotide
• Animals - CA
• Plants - TA, GA
2. Trinucleotide
GTG, CAG, and AAT
Some are associated with diseases and cancers
3. Tetranucleotide
GATA/GACA
Highly polymorphic
Microsatellite Uses
1. Population Genetics
   1. Gene flow
   2. Stock Structure
2. Genetic Probes
   1. Larvae
   2. Gut contents
   3. Scat
   4. Source populations
3. Pedigree Maps
4. Understanding Diseases
Microsatellite Advantages
1. Highly Polymorphic
2. Codominant
3. In every organism examined to date
4. Very abundant
5. Random spacing in the genome
6. Can find same loci in closely related species
7. Easy and reliable scoring
8. Highly sensitive
9. Neutral markers
Microsatellite Disadvantages
1. Expensive
2. Time consuming
3. Several loci are needed to obtain sufficient statistical power
4. Current analysis methods do not distinguish between changes in flanking regions and changes within the microsatellite regions
5. Different rates of evolution at different loci
Mutation Mechanisms
1. Slippage during DNA replication (slipped-strand mispairing, SSM)
   • increases or decreases the repeat by one unit
   • most supporting evidence
2. Recombination
   A. Unequal crossing over (UCO)
   B. Gene conversion
Microsatellite Mutations
• $10^{-3}$ to $10^{-6}$ events per locus per generation (point mutations: $10^{-9}$ to $10^{-10}$)
• Rate varies by:
  • repeat type
  • base composition of the repeat
  • taxonomic group
  • length of the allele
• Most common: addition or deletion of a single repeat; occasionally 2 to several repeats
• Strong evidence that the number of repeats is limited
Mutation Models
1. Infinite Allele Model (IAM)
   • gain or loss of any number of repeats, always resulting in an allelic state not already present in the population
2. Stepwise Mutation Model (SMM)
   • gain or loss of a single repeat
3. Two-Phase Model (TPM)
   • mostly single-repeat changes, with some mutations causing a gain or loss of X repeats
4. K-allele Model (KAM)
   • K possible allelic states
   • related to the IAM, which is the KAM with infinite K
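The practical difference between these models is easiest to see by mutating one allele repeatedly. The toy simulation below is a sketch, not a population-genetic simulator: under the SMM the allelic state is a repeat count, under the IAM each mutation yields a brand-new label, and under the KAM the state jumps uniformly among K values. The starting repeat number, number of mutations, and seed are arbitrary choices.

```python
import random


def mutate_smm(repeats: int, rng: random.Random) -> int:
    """Stepwise mutation model: gain or lose exactly one repeat."""
    return repeats + rng.choice((-1, 1))


def mutate_iam(label: int) -> int:
    """Infinite allele model: every mutation creates a novel allelic state."""
    return label + 1


def mutate_kam(state: int, k: int, rng: random.Random) -> int:
    """K-allele model: jump to one of the other k - 1 states, uniformly."""
    return rng.choice([s for s in range(k) if s != state])


rng = random.Random(42)
smm_allele, iam_allele = 12, 0  # 12 repeats; arbitrary IAM starting label
smm_history, iam_history = [smm_allele], [iam_allele]
for _ in range(10):
    smm_allele = mutate_smm(smm_allele, rng)
    iam_allele = mutate_iam(iam_allele)
    smm_history.append(smm_allele)
    iam_history.append(iam_allele)

print("SMM repeat counts:", smm_history)
print("IAM allele labels:", iam_history)
```

Under the SMM, earlier repeat counts can recur, so two alleles of identical size need not be identical by descent (size homoplasy); under the IAM every state is new, so identity in state always implies identity by descent.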
Creating A Microsatellite-Enriched Library
Genomic DNA → DNA extraction → digestion → add linkers → PCR → DNA library
Enriching Microsat Library
DNA library → hybridize to beads (CACA/GTGT) → PCR → microsatellite-enriched DNA library
Microsatellite Library Screening
Cloning → blots/hybridizations → plasmid preps → enzyme digest of isolated plasmids (check insert size) → dot blot hybridizations
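After enrichment and screening, candidate clones are sequenced and checked for repeats long enough to be useful. A minimal sketch, assuming perfect dinucleotide repeats and an arbitrary threshold of six or more uninterrupted copies (both illustrative choices, not part of the protocol above), uses a regular expression with a backreference. The clone sequence is hypothetical.

```python
import re


def find_dinucleotide_repeats(seq: str, min_copies: int = 6) -> list[tuple[int, str, int]]:
    """Return (start, unit, copies) for each perfect dinucleotide repeat.

    Group 1 captures a two-base unit whose bases differ (the negative
    lookahead rules out homopolymer runs such as AAAA); the backreference
    then requires at least min_copies - 1 further identical copies.
    """
    pattern = re.compile(r"(([ACGT])(?!\2)[ACGT])\1{%d,}" % (min_copies - 1))
    return [
        (m.start(), m.group(1), len(m.group(0)) // 2)
        for m in pattern.finditer(seq.upper())
    ]


clone = "GGG" + "CA" * 8 + "TTT"  # hypothetical clone insert
print(find_dinucleotide_repeats(clone))  # [(3, 'CA', 8)]
```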
References
www.biotech.ufl.edu/WorkshopsCourses/mm_manual.htm
Avise, J.C. 1994. Molecular Markers, Natural History and Evolution. Chapman and Hall, New York. 511 pp.
Balloux, F. and N. Lugon-Moulin. 2002. The estimation of population differentiation with microsatellite markers. Molecular Ecology 11: 155-165.
Goldstein, D.B. and C. Schlötterer (Editors). 1999. Microsatellites: Evolution and Applications. Oxford University Press, Oxford. 352 pp.
Jarne, P. and P.J.L. Lagoda. 1996. Microsatellites, from molecules to populations and back. Trends in Ecology and Evolution 11(10): 424-429.
Slatkin, M. 1995. A measure of population subdivision based on microsatellite allele frequencies. Genetics 139: 457-462.
Fluorescent Labeling of Microsatellites
[Figure: acrylamide gel with 5 microsatellite loci and an internal size standard]
Simultaneous analysis of a dozen loci
Comparing “Genomic” Methods for Population Studies
Method      Polymorphism  Codominance  Prior seq.  Difficulty  Repeatability  DNA quality  Development  Genotyping  Ease of  Gel
                                       knowledge                                           cost         cost/locus  scoring  method
RAPD        ++            N            N           +           -              ++           -            +           +        Agarose
AFLP        ++            N            N           +           ++             ++           +            +           ++       Polyacrylamide
Microsats   +++           Y            Y           ++          ++             +            +++          ++          ++       Polyacrylamide
CAPS        +             Y            Y           +           +              +            +(+)*        +(+)*       ++       Agarose
DGGE, SSCP  +++           Y            N           ++          +              +            ++           ++          ++       Polyacrylamide
TaqMan      +++           Y            Y           +++         ++             ++           +++          +++         ++       None
* Depends on the cost of the restriction enzymes employed
All population genetic/genomic markers are vulnerable to violations of their assumptions: linkage equilibrium, Mendelian inheritance, and neutrality.
Linkage disequilibrium: alleles at different loci are found together more or less often than expected based on their frequencies (and locations in the genome).
Goldstein and Weale 2001 Population genomics: linkage disequilibrium holds the key. Current Biology 11:576-579
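Linkage disequilibrium between two biallelic loci is usually quantified as D = p_AB − p_A·p_B, the departure of a haplotype frequency from the product of its allele frequencies, often normalized as r² = D² / [p_A(1 − p_A)·p_B(1 − p_B)]. A minimal sketch from haplotype counts, assuming the haplotypes are already phased; the counts are made up.

```python
def ld_stats(counts: dict[str, int]) -> tuple[float, float]:
    """Return (D, r^2) from phased two-locus haplotype counts.

    Keys are 'AB', 'Ab', 'aB', 'ab' (allele A/a at locus 1, B/b at locus 2).
    """
    total = sum(counts.values())
    freq_AB = counts["AB"] / total                  # haplotype AB
    freq_A = (counts["AB"] + counts["Ab"]) / total  # allele A at locus 1
    freq_B = (counts["AB"] + counts["aB"]) / total  # allele B at locus 2
    d = freq_AB - freq_A * freq_B
    denom = freq_A * (1 - freq_A) * freq_B * (1 - freq_B)
    r2 = d * d / denom if denom > 0 else float("nan")
    return d, r2


# Hypothetical counts: AB and ab haplotypes in excess, so D > 0.
print(ld_stats({"AB": 40, "Ab": 10, "aB": 10, "ab": 40}))
# Equal haplotype counts: linkage equilibrium, D = 0.
print(ld_stats({"AB": 25, "Ab": 25, "aB": 25, "ab": 25}))
```

In the first example D = 0.40 − 0.5 × 0.5 = 0.15 and r² = 0.15² / 0.0625 = 0.36.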
Population Genomics Research
Understanding population structure, historical migrations, and gene flow among populations (e.g., SNP density distribution, coalescent approaches)
Need relatively moderate polymorphism, low cost per sample
mtDNA, Microsatellites, SNPs
Understanding current gene flow and mating systems by direct methods (e.g., maternity analysis, paternity analysis)
Need high polymorphism, codominance, repeatability, low cost per sample
Microsatellites, SNPs
Pharmacogenomics: polymorphism-based approaches for the discovery and development of new medications; translating polymorphisms into “new genomic medicine”*
Need rapid, low-cost, repeatable way to distinguish alleles
screening large numbers of individuals; SNPs and Sequencing
*New York Times, Nov. 2002
Two main hypotheses for human evolution:
“Recent African origin” hypothesis: modern humans originated in Africa 100,000-200,000 years ago and spread from there
“Multi-regional” hypothesis: modern humans evolved in different parts of the world
mtDNA data favored the out-of-Africa hypothesis but lacked statistical support for deep African branches
• 53 human mtDNA sequences (16,500 bp) used to examine the timing of evolutionary events
• mtDNA evolving in a “clocklike” fashion
• Linkage disequilibrium not evident
• The 3 deepest branches lead exclusively to sub-Saharan Africans
• Note the star-like vs. deep branching topology: larger Ne or longer genetic history in Africa; bottleneck in non-Africans
[Figure: neighbor-joining phylogram based on complete mtDNA genome sequences (excluding D-loop); 1000 bootstrap replicates shown on nodes; asterisk marks the MRCA of the youngest clade containing both African and non-African individuals]
mtDNA mismatch distributions for Africans and non-Africans
• Individuals of African origin show a ragged distribution, consistent with constant population size
• Individuals of non-African origin show a bell-shaped distribution, which strongly suggests a recent population expansion
Exodus from Africa began 100,000 years ago
Divergence of Africans and non-Africans occurred 52,000 ± 28,000 years ago
[Figure: mismatch distributions of pairwise nucleotide differences among (a) African and (b) non-African individuals]
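A mismatch distribution is the histogram of pairwise differences over all pairs of sequences in a sample. The sketch below computes it for aligned, equal-length sequences; the toy alignment is hypothetical and the code ignores gaps and missing data.

```python
from collections import Counter
from itertools import combinations


def mismatch_distribution(seqs: list[str]) -> dict[int, int]:
    """Map k -> number of sequence pairs that differ at exactly k sites."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    counts = Counter(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in combinations(seqs, 2)
    )
    return dict(sorted(counts.items()))


sample = ["ACGTACGT", "ACGTACGA", "ACGAACGA", "TCGAACGA"]  # hypothetical
print(mismatch_distribution(sample))  # {1: 3, 2: 2, 3: 1}
```

Plotting these counts against k for a real sample gives the curves described above: smooth and bell-shaped after a recent expansion, ragged under long-term constant size.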
Mining the human genome produced 507,152 high-confidence SNP candidates as a uniform resource for describing nucleotide diversity and regional variation within and between human populations
So What’s a SNP?
A mutation that causes a single base change is known as a
Single Nucleotide Polymorphism (SNP)
SNPs are the simplest form and most common source of genetic polymorphism in the human genome
~90% of all human DNA polymorphisms; about 1 SNP per 1,000 bp; 1.42 million SNPs mapped
A SNP haplotype is a particular pattern of sequential SNPs (or alleles) found on a single chromosome
Microarrays, mass spectrometry, and sequencing are all used to group SNPs into blocks (haplotyping)
Haplotype determination problem: find all haplotypes given a genome and all identified SNPs (an area of algorithm development)
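Part of what makes haplotype determination hard is combinatorial: an unphased genotype with k ≥ 1 heterozygous SNPs is consistent with 2^(k−1) distinct haplotype pairs. A brute-force enumeration illustrates this, assuming biallelic SNPs coded 0/1/2 (copies of the alternate allele); the function name and genotype are hypothetical, and real phasing methods use population data rather than enumerating.

```python
from itertools import product


def compatible_haplotype_pairs(genotype: list[int]) -> set[tuple[str, ...]]:
    """All unordered haplotype pairs (0/1 strings) consistent with a genotype.

    Genotype codes: 0 = homozygous reference, 1 = heterozygous,
    2 = homozygous alternate.
    """
    het_sites = [i for i, g in enumerate(genotype) if g == 1]
    pairs = set()
    for assignment in product("01", repeat=len(het_sites)):
        h1 = ["1" if g == 2 else "0" for g in genotype]
        h2 = list(h1)
        for site, allele in zip(het_sites, assignment):
            h1[site] = allele                          # one chromosome
            h2[site] = "1" if allele == "0" else "0"   # the other carries the opposite allele
        pairs.add(tuple(sorted(("".join(h1), "".join(h2)))))
    return pairs


pairs = compatible_haplotype_pairs([1, 0, 1, 2, 1])
print(len(pairs))  # 4, i.e. 2 ** (3 - 1) for 3 heterozygous sites
```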
Approaches to SNP discovery and Genotyping
Many and varied!
(Reviewed by Pui-Yan Kwok, Annu. Rev. Genomics Hum. Genet. 2001. 2:235-258)
SNP discovery can be based on expressed sequence tags (ESTs), genomic restriction fragments, aligned BAC sequences, random shotgun clone sequences, or overlapping genomic clone sequences
Parallel genotyping of SNPs using generic high-density oligonucleotide tag arrays
• Fan et al. (2000) Genome Research 10:853-860. (See Stickney et al. 2002 for zebrafish SNP arrays.)
• PCR plus single-base extension of chimeric primers with allele-specific (labeled) dideoxy NTPs, then hybridization to arrays containing thousands of preselected 20-mer oligonucleotide tags
Polymorphism ratio sequencing: a new approach for SNP discovery and genotyping
• Blazej et al. (2003) Genome Research 13:287-293.
• Dideoxy-terminator extension ladders generated from a single sample and a reference template are labeled with fluorescent dyes and coinjected into a separation capillary for comparison of relative signal intensities.
A novel method for SNP detection using a new duplex-specific nuclease from crab hepatopancreas
• Shagin et al. (2002) Genome Research 12:1935-1942.
• “Duplex-Specific Nuclease Preference”: the SNP region is amplified; the template and signal probe form duplexes, and matched duplexes are then cleaved by DSN to generate sequence-specific fluorescence
NCBI (home of GenBank) hosts dbSNP
One year ago, dbSNP had 2,842,021 SNP submissions in total
As of 2003, dbSNP has 6,250,820 submissions for human, 1,368,805 for mosquito, 197,414 for mouse, and 2,031 for zebrafish
dbSNP can be searched by BLAST comparison to a target sequence
The SNP Consortium is an alliance of pharmaceutical and computer companies managed
by Lincoln Stein at Cold Spring Harbor Lab.
“The SNP Consortium Ltd. is a non-profit foundation organized for
the purpose of providing public genomic data. Its mission is to develop up
to 300,000 SNPs distributed evenly throughout the human genome and to
make the information related to these SNPs available to the public without
intellectual property restrictions. The project started in April 1999 and is
anticipated to continue until the end of 2001.”
We describe a map of 1.42 million single nucleotide polymorphisms (SNPs) distributed throughout the
human genome, providing an average density on available sequence of one SNP every 1.9 kilobases.
These SNPs were primarily discovered by two projects: The SNP Consortium and the analysis of clone
overlaps by the International Human Genome Sequencing Consortium. The map integrates all publicly
available SNPs with described genes and other genomic features. We estimate that 60,000 SNPs fall
within exon (coding and untranslated regions), and 85% of exons are within 5 kb of the nearest SNP.
Nucleotide diversity varies greatly across the genome, in a manner broadly consistent with a standard
population genetic model of human history. This high-density SNP map provides a public resource for
defining haplotype variation across the genome, and should help to identify biomedically important
genes for diagnosis and therapy.
• Built a set of pairwise sequence alignments by analyzing the overlapping regions of large-insert clones
• Looked for mismatches, calling SNPs if the PolyBayes probability was ≥ 0.80
• Grouped SNP marker density by overlapping region
• Modeled the marker density distribution
Marker density distributions predicted under competing population genetic models:
• No demographic history: Poisson distribution driven by mutation rate
• Population size changes profoundly affect the distribution of polymorphic sites:
  • Increased population size yields an abundance of new lineages carrying more mutations
  • Decreased population size raises the likelihood of relatedness, resulting in over-representation of sequence identity
• Collapse followed by a phase of recent population recovery
Evaluated the degree of fit between the observed density distribution and the probability predicted under each model, using the log likelihood of the data for that model
r indicates the per-nucleotide, per-generation recombination rate
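The model comparison above rests on computing how probable the observed marker-density counts are under each model. For the simplest case, no demographic history, the Poisson expectation can be sketched as a log likelihood; the per-window counts and rates are hypothetical, and each demographic model would substitute its own predicted probabilities for the Poisson terms.

```python
import math


def poisson_log_likelihood(counts: list[int], rate: float) -> float:
    """Sum over windows of log P(k | rate), with k Poisson-distributed.

    rate is the expected number of SNPs per window and must be positive.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    return sum(k * math.log(rate) - rate - math.lgamma(k + 1) for k in counts)


observed = [0, 1, 1, 2, 0, 3, 1, 0, 2, 1]  # hypothetical SNPs per window
mle_rate = sum(observed) / len(observed)   # the Poisson MLE is the sample mean
for rate in (0.5, mle_rate, 2.0):
    print(f"rate={rate:.2f}  logL={poisson_log_likelihood(observed, rate):.3f}")
```

The higher log likelihood marks the better-fitting parameter value; comparing maximized log likelihoods across models is the same idea applied to the competing demographic scenarios.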
Superior fit of the modeled parameters (with or without recombination) suggests a severe, 2- to 7-fold collapse of population size 40,000 years (1,600 generations) ago…
…followed by a modest recovery
[Figure: percentage of successful trials for each model at each data fraction; assessments based on the amount of data required for rejection by a χ² test]
Interestingly, the fit between observations and the best-fitting models decays as more data are added.
History of the inbred laboratory mouse
• Compared the C57BL/6J mouse genome sequence with 59 finished segments of the 129/Sv inbred strain
• Discovered nearly 70,000 SNPs, in blocks of high SNP density (40 SNPs per 10 kb) separated by blocks of low density (0.5 SNPs per 10 kb)
• Surveyed panels of inbred mouse strains and found that distinct SNP haplotypes were shared among common inbred populations
• Surveys of wild strains showed that 67% of each inbred genome is derived from European mice and 33% from Asian mice
What about other organisms, or new ‘model’ organisms that exemplify phenomena not well studied in humans, worms, or mice?
Three-Spined Sticklebacks
• Morphological evolution
• Populations isolated after the last glaciation have diverged morphologically and in sequence ((CA)n microsatellites)
• Strategy: cross benthic and limnetic fish; intercross the F1s; follow morphological traits and polymorphisms in the F2s
• See Peichel et al. (2001) The genetic architecture of divergence between threespine stickleback species. Nature 414: 901-905.
Stickleback genetic map
(Woods et al. 2000)
227 polymorphisms
1 SNP marker per 4 cM
took ~4 person-years
now mapping genetic basis of
morphological variations
Zebrafish
Genes
• Postlethwait et al. 1994. A genetic linkage map for zebrafish. Science 264: 699-703.
• Woods et al. 2000. A comparative map of the zebrafish genome. Genome Research 10: 1903-1914.
• Geisler et al. 1999. A radiation hybrid map of the zebrafish genome. Nature Genetics 23: 86-89.
Microsatellites
• Shimoda et al. 1999. Zebrafish genetic map with 2000 microsatellite markers. Genomics 58: 219-232.
First zebrafish SNP map (5 months ago)
• 2,102 SNPs for mutation mapping
• Hundreds of SNPs on a single array
• Stickney et al. 2002. Rapid mapping of zebrafish mutations with SNPs and oligonucleotide microarrays. Genome Res. 12: 1929-1934.
[Figure: vertical lines = 25 linkage groups; red dots = SNPs represented on the oligonucleotide microarray]
Population Genomics Research
Understanding population structure, historical migrations, and gene flow
among populations (e.g. SNP density distribution, coalescent approaches)
Need moderate polymorphism, low cost per sample
Allozymes, mtDNA, RAPDs, Microsatellites, AFLPs, RFLPs, SNPs
Understanding current gene flow and mating systems by direct methods
(e.g., maternity analysis, paternity analysis)
Need high polymorphism, codominance, repeatability, low cost per sample
Microsatellites, SNPs
Pharmacogenomics: polymorphism-based approaches for the discovery
and development of new medications; translating polymorphisms into
“new genomic medicine”*
Need rapid, low-cost, repeatable way to distinguish alleles
screening large numbers of individuals; SNPs and Sequencing
*New York Times, Nov. 2002
Inferring Pairwise Relationships with SNPs (in Your Favorite Metazoan)
(Glaubitz, Rhodes, and DeWoody 2003 Molecular Ecology 12: 1039-1047)
Problem:
Need to determine genetic relationships in populations without known pedigrees
Microsatellites are the current method of choice among close kin within a population, but the number of independently segregating microsatellite markers is limited
SNPs may provide a large number of segregating loci with alleles at even frequencies
Goal:
To assess known pairwise relationships via single nucleotide polymorphisms where parallel microsatellite results already exist.
Recent advances in microarray technology permit genotyping of large numbers of individuals at hundreds to thousands of SNP loci (reviewed by Kwok 2001): this could be big!
Need to know: do SNPs equal or exceed the power of practical numbers of microsatellite loci in estimating relationships?
Glaubitz et al. 2003: computer simulations designed to evaluate the ability of SNPs to discriminate a variety of pairwise relationships likely to occur in natural populations, with comparisons to microsatellites from Blouin et al. 1996
• SNPs segregate independently: an idealized genome with 20 autosomes, 5 SNPs per chromosome, and random genotypes for 10,000 individuals
Constructed 5 categories of relationship types
Constructed an array of pedigrees
Estimated pairwise relatedness at a single locus (r1)
Evaluated the performance of 100 simulated SNPs by estimating misclassification rates of relationships
• The figure illustrates that different pairwise relationships can have different amounts of inherent variance in relatedness
• The parent-offspring (PO) and unrelated (U) relationships have zero inherent variance (they share exactly one or no alleles identical by descent)
• Full sibs (FS) have the largest variance; second-order relatives cannot be distinguished from each other via estimates of r
100 independently segregating SNPs determined parent-offspring pairs as well as about 16 or fewer microsatellite loci when both parents are unknown
Even under the optimistic scenario of 100 independent loci, results show little promise
for discriminating higher order relationships on the basis of pairwise relatedness.
Microsatellite approaches are still better…
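One way to see why many SNPs are needed to match a handful of microsatellites is the probability that a true non-parent is excluded because it shares no allele with the offspring. The sketch below computes this single-locus exclusion probability by enumeration, assuming Hardy-Weinberg proportions, unlinked loci, no genotyping error, and neither parent known. The allele frequencies are hypothetical, and this exclusion criterion is simpler than the relatedness estimators Glaubitz et al. evaluated.

```python
from itertools import combinations_with_replacement


def genotype_freqs(allele_freqs: list[float]) -> dict[tuple[int, int], float]:
    """Hardy-Weinberg genotype frequencies; genotypes are sorted allele pairs."""
    freqs = {}
    for i, j in combinations_with_replacement(range(len(allele_freqs)), 2):
        p, q = allele_freqs[i], allele_freqs[j]
        freqs[(i, j)] = p * p if i == j else 2 * p * q
    return freqs


def exclusion_probability(allele_freqs: list[float]) -> float:
    """P(a random unrelated individual shares no allele with the offspring)."""
    g = genotype_freqs(allele_freqs)
    return sum(
        f_off * f_cand
        for off, f_off in g.items()
        for cand, f_cand in g.items()
        if not set(off) & set(cand)
    )


def combined_exclusion(per_locus: list[float]) -> float:
    """Independent loci: a non-parent escapes only if it escapes at every locus."""
    escape = 1.0
    for p_ex in per_locus:
        escape *= 1.0 - p_ex
    return 1.0 - escape


snp = exclusion_probability([0.5, 0.5])    # biallelic SNP, even frequencies
msat = exclusion_probability([0.1] * 10)   # 10 equally frequent alleles
print(f"per locus: SNP {snp:.3f}, microsatellite {msat:.3f}")
print(f"100 SNPs: {combined_exclusion([snp] * 100):.8f}")
print(f"16 microsatellites: {combined_exclusion([msat] * 16):.8f}")
```

Even at its most informative (p = 0.5), a single SNP excludes only 12.5% of non-parents, whereas a multi-allelic microsatellite excludes far more per locus; this gap is why the linkage assumption matters so much once hundreds of SNPs are packed into a genome.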
Conclusion:
“SNPs have limited potential for the delineation of genealogical relationships…”
My two cents:
Based on 1) the assumption of independence among the sampled SNP loci and 2) the assumption that the microsatellites themselves are independent (not linked)
• In the absence of a linkage map, the number of microsatellite or SNP loci scored must be
increased to compensate for the loss of information as a result of nonindependence
between markers
• An alternative to using independently segregating SNPs is to use independently
segregating haplotypes, with each haplotype defined by a cluster of tightly linked
SNPs (e.g., Heaton et al. 2002 sequenced regions around 32 cattle SNPs, finding 183 additional polymorphic sites and more haplotypes for better resolution)
To take full advantage of the “vast” abundance of SNPs in metazoan genomes
and their potential automation, we will need analytical methods that account
for tight genetic linkage (McPeek and Sun 2000) and known recombination
frequencies….
until then, SNP population genomics will likely only be used on model organisms.
Population Genomics Research
Understanding population structure, historic migrations, and gene flow among
populations (e.g., Fst, coalescent approaches)
Need moderate polymorphism, low cost per sample
Allozymes, mtDNA, RAPDs, Microsatellites, AFLPs, RFLPs, SNPs
Understanding current gene flow and mating systems by direct methods (e.g.,
maternity analysis, paternity analysis)
Need high polymorphism, codominance, repeatability, low cost per sample
Microsatellites, allozymes
Pharmacogenomics: polymorphism-based approaches for the discovery and
development of new medications; translating polymorphisms into “new genomic
medicine”*
Need rapid, low-cost, repeatable way to distinguish alleles
screening large numbers of individuals; SNPs and Sequencing
*New York Times, Nov. 2002
Pharmacogenomics
The use of DNA sequence information to measure and predict the
reaction of individuals to drugs.
Pharmacogenetics is the study of this variation at the level of a
single gene, while pharmacogenomics studies variation at the
genome wide level.
Based on the observation that there is great, genetically determined individual variation in response to drugs.
It is possible to measure many thousands of SNPs simultaneously
in a small blood sample from a patient
Can compare “genotypes” for SNP markers linked to virtually any
trait
Evolving Paradigm for Discovery of Genetic Polymorphisms
associated with aberrant drug disposition or effects
Observed phenotype → family studies → inherited basis
More discoveries through polymorphisms in candidate genes (metabolism; transport; targets of candidate medications)
New Drug Targets Expected from the Human Genome Project
[Figure: bar chart of number of drug targets: approx. 500 cumulative targets known today vs. 5,000–10,000 new targets expected from the Human Genome Project]
Source: Drews J. Nat Biotechnol 1996;14.
The Gene for…
Disease Genes Discovered
For 1100 genes at least one disease-related mutation has been identified
Clinical disorders and gene mutations
Different mutations in the same gene can give rise to more or less distinct disorders, so the total number of diseases for which there are known mutations is ~1500
Functional Classifications
Disease genes classed by function and their relative representations
Some Diseases Involve Polygenic Effects
There are a number of classic “genetic diseases” caused by
mutations of a single gene
– Huntington’s, Cystic Fibrosis, Tay-Sachs, PKU, etc.
There are also many diseases that are the result of the
interactions of many genes:
– Asthma, Heart disease, Cancer
Each of these genes may be considered to be a risk factor for the
disease.
Groups of SNP markers may be associated with a disease
without determining mechanism
Gene Product- Drug Interaction
There are proteins that chemically activate or inactivate
drugs.
Other proteins can directly enhance or block a drug's activity.
There are also genes that control side effects.
Some Examples
10% of African Americans have polymorphic alleles of glucose-6-phosphate dehydrogenase (G6PD) that lead to hemolytic anemia when they are given the anti-malarial drug primaquine.
Succinylcholine Toxicity
0.04% of individuals are homozygous for alleles of pseudocholinesterase that are unable to inactivate the muscle relaxant drug succinylcholine, leading to respiratory paralysis.
Isoniazid Metabolism
There are many polymorphic alleles of the N-acetyltransferase (NAT2) gene with reduced (or accelerated) ability to inactivate the drug isoniazid.
Some individuals developed peripheral neuropathy in reaction to this drug.
Some alleles of the NAT2 gene are also associated with susceptibility to various forms of cancer.
Cytochrome P450
~10% of the Caucasian population is homozygous for alleles of
the Cytochrome P450 gene CYP2D6 that do not metabolize the
hypertension drug debrisoquine, which can lead to dangerous
vascular hypotension.
ACE
Patients homozygous for an allele with a deletion in intron 16 of
the gene for angiotensin-converting enzyme (ACE) showed no
benefit from the hypertension drug enalapril while other patients
benefit.
Collect Drug Response Data
These drug response phenotypes are associated with a set of
specific gene alleles.
Identify populations of people who show specific responses to a
drug.
In early clinical trials, it is possible to identify people who react
well and react poorly.
Make Genetic Profiles
Scan these populations with a large number of SNP markers.
Find markers linked to drug response phenotypes.
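Finding markers linked to a drug-response phenotype often starts with a simple comparison of allele counts between good and poor responders. A minimal sketch, assuming a single biallelic SNP, two allele copies per person, and a Pearson chi-square test on the 2×2 table without continuity correction; the function name and all counts are hypothetical.

```python
def allelic_chi_square(resp_alt: int, resp_ref: int,
                       nonresp_alt: int, nonresp_ref: int) -> float:
    """Pearson chi-square (1 df) for a 2x2 table of allele counts."""
    table = [[resp_alt, resp_ref], [nonresp_alt, nonresp_ref]]
    total = sum(map(sum, table))
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2


# Hypothetical: 100 good and 100 poor responders (200 alleles per group).
stat = allelic_chi_square(resp_alt=60, resp_ref=140, nonresp_alt=100, nonresp_ref=100)
print(f"chi-square = {stat:.2f}")  # chi-square = 16.67 (5% critical value, 1 df: 3.84)
```

When thousands of SNP markers are scanned this way, a multiple-testing correction is needed before declaring any marker linked to the response phenotype.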
Use the Profiles
Genetic profiles of new patients can then be used to prescribe
drugs more effectively & avoid adverse reactions.
Can also speed clinical trials by testing on those who are likely to
respond well.
Major pharmacogenetics approaches in the post-genomic era
Identifying SNP variations in the genome and populations
Study of differential gene expression
Chips with mRNAs from different tissue types or normal and diseased tissue
Can detect expression of a target gene among 50,000-300,000 transcripts on a microarray
Simultaneous monitoring of the expression of every gene in any tissue will become possible
Detecting new metabolic disease pathways
Based on comparisons with other model organisms
Micro-Array technology to analyze gene expression
The principle behind this is to look at differences in gene expression when variables are changed, e.g., yeast cells grown in the presence of EtOH: what genes are turned on or off in response to that change in the environment? Another variable could be normal versus diseased tissue.
Pool the cDNAs. The cDNAs are hybridized to microarrays on which every gene that has been cloned is present [the DNA is spotted on the microslides and each spot corresponds to DNA from a different gene].
If a particular gene is expressed, then it will be present and labeled in the cDNA pool. It can then hybridize to the spot of the plate corresponding to that particular gene.
The results from such an experiment look like this [figure], where the color of each spot tells you something about that gene's expression, with implications for drug therapy optimization.
The data can then be analyzed and sorted into tables that show which genes are expressed in response to the stimulus and which are turned off.
This sort of experiment can be done with any collection of RNAs that you want to compare. It is particularly useful for comparing ‘normal’ to mutant/disease states: e.g., it tells you what genes are turned on in cancerous cells, which may give you a clue as to how cancer works.
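In a two-color experiment like the one described, each spot yields two intensities, and a common summary is the log2 ratio of sample to reference. A sketch, assuming background-subtracted intensities and an arbitrary twofold cutoff; real analyses also normalize across the array and test replicates. The gene names and intensities are hypothetical.

```python
import math


def classify_spots(spots: dict[str, tuple[float, float]],
                   fold: float = 2.0) -> dict[str, str]:
    """Label genes 'up', 'down', or 'unchanged' from (treated, control) intensities."""
    cutoff = math.log2(fold)
    labels = {}
    for gene, (treated, control) in spots.items():
        ratio = math.log2(treated / control)
        if ratio >= cutoff:
            labels[gene] = "up"
        elif ratio <= -cutoff:
            labels[gene] = "down"
        else:
            labels[gene] = "unchanged"
    return labels


spots = {"geneA": (1200.0, 300.0), "geneB": (150.0, 900.0), "geneC": (500.0, 450.0)}
print(classify_spots(spots))
# {'geneA': 'up', 'geneB': 'down', 'geneC': 'unchanged'}
```

Using log ratios makes up- and down-regulation symmetric: a fourfold increase is +2 and a fourfold decrease is -2.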
Link Gene Expression to Genome Sequence
Identify promoter and 5' sequence for a group of
co-expressed genes.
Scan for known transcription factor binding sites.
Predict new regulatory sites based on common
sequence elements.
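Scanning the promoters of co-expressed genes for a known binding site can be sketched as a consensus search using IUPAC ambiguity codes on both strands. This is a simplification (dedicated tools typically score position weight matrices), and the TATA-box-like motif TATAWAW and the promoter fragments are used only as hypothetical examples.

```python
import re

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def motif_hits(promoter: str, motif: str) -> list[tuple[int, str]]:
    """Return (forward-strand start, strand) for each match of an IUPAC motif."""
    promoter = promoter.upper()
    # Lookahead so that overlapping matches are all reported.
    regex = re.compile("(?=" + "".join(IUPAC[c] for c in motif.upper()) + ")")
    hits = [(m.start(), "+") for m in regex.finditer(promoter)]
    n, k = len(promoter), len(motif)
    # Map starts on the reverse complement back to forward-strand coordinates.
    hits += [(n - m.start() - k, "-") for m in regex.finditer(reverse_complement(promoter))]
    return sorted(hits)


print(motif_hits("GCGCTATAAAAGGCGC", "TATAWAW"))  # [(4, '+')]
print(motif_hits("GCGCCTTTTATAGCGC", "TATAWAW"))  # [(5, '-')]
```

Motifs that recur in many co-expressed promoters but rarely in a background set are candidates for the new regulatory sites mentioned above.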
Diagnostic arrays
• Examples of factors showing variability that could be detected on arrays
• Provide information on the status of SNPs and gene expression profiles
Pharmacogenomics - The Future
Ultimate goal is to personalize drug treatment regimes
• Faster clinical trials
• Fewer drug side effects
• Identify how genetic factors interact to affect variation in drug outcomes:
  • Inactivation or activation by oxidation by cytochrome P450 enzymes
  • Clearance from bloodstream through kidney
  • Target sensitivity
  • Toxicity
  • Heterogeneity of disease mechanisms
Pharmacogenomics - The Future…continued
Mutations in coding sequences will probably only play a small role
in disease susceptibility between individuals
Variations affecting splicing and gene regulation will play a
greater role
We know very little about the importance of variations in regulatory and intronic sequences and how they differ between populations
Issues:
associating sequence variations with heritable phenotypes
how genotypes affect common diseases, drug responses, and
other complex phenotypes
Booming Population Databases
Science News Focus
The promise is to deliver “personalized” medicine