Genome-wide association studies & chromosome walking
New Class Offering
MSC 381/BIL 385 – 3 Credits
Marine Field Ornithology
Spring Semester 2016
Enrollment Limited to 10 Students
Supplemental Course Fee: $640
Instructor:
Kevin G. McCracken
http://www.bio.miami.edu/mccracken/
Waterbirds such as seabirds, shorebirds, and wading birds are key components of the marine
ecosystem, including pelagic, coastal, and estuarine communities, and they are prominent and
conspicuous members of South Florida avifauna. Their abundance provides a proxy for the health and
well-being of these ecosystems and populations of the economically important fish and many different
species of invertebrates they prey upon. Waterbirds have also served as important model systems for
studies of behavior, evolutionary biology, and ecological theory. This course will provide an
introduction to waterbird biology and conservation, including lectures about the natural history of
species to be encountered in South Florida and a series of four weekend day and/or overnight field trips
to important waterbird habitats such as the Everglades, Florida Keys, and Dry Tortugas.
Field Trip Destinations:
Big Cypress National Preserve
Tigertail Beach (Shorebird Trip)
Everglades National Park
Dry Tortugas National Park
Today’s Lecture
Genetic mapping studies: two approaches
Classical linkage map/genome-wide association study
Physical map
Cloning and isolating genes the old-fashioned way using positional cloning
Search for the cystic fibrosis gene
Next Lecture
Modern genome sequencing
Shotgun sequencing an entire genome
Sequencing the human genome
Functional/comparative genomics
RNA-Seq – sequencing the transcriptome
Classical genetic linkage & association studies:
Genetic linkage mapping involves determining the statistical association
of specific traits with genetic markers on chromosomes using pedigrees
and crosses.
Genome-wide association studies = GWAS
Use recombination frequencies to determine relative distances between markers on a chromosome + statistical association with the trait or disease of interest.
Humans require 24 different maps, one for each of the 22
autosomes and one each for the X and Y chromosomes.
Linked genes = genes located on the same chromosome (linkage group)
The unit measured for each linkage map is the recombination
frequency = # recombinants/total progeny
Reported as map units (mu) or centiMorgans (cM), where 1 map unit = 1 cM = 1% recombination; these are distinct from physical distances (bp).
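The recombination-frequency definition above is simple arithmetic. The Python sketch below uses hypothetical progeny counts and assumes the standard convention that 1 cM equals 1% recombination; it applies no mapping function, so it is only reasonable for closely linked markers where double crossovers are rare.

```python
def recombination_frequency(recombinants: int, total_progeny: int) -> float:
    """Recombination frequency = # recombinants / total progeny."""
    if total_progeny <= 0:
        raise ValueError("total_progeny must be positive")
    if not 0 <= recombinants <= total_progeny:
        raise ValueError("recombinants must be between 0 and total_progeny")
    return recombinants / total_progeny


def map_distance_cm(recombinants: int, total_progeny: int) -> float:
    """Convert recombination frequency to map units (cM), using 1 cM = 1% recombination.

    No mapping function (e.g., Haldane or Kosambi) is applied, so this
    underestimates the distance between loosely linked markers.
    """
    return 100.0 * recombination_frequency(recombinants, total_progeny)


if __name__ == "__main__":
    # Hypothetical test cross: 17 recombinant offspring out of 200 total.
    print(recombination_frequency(17, 200))  # 0.085
    print(map_distance_cm(17, 200))          # 8.5
```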
Different types of markers used in genetic mapping:
1. Genes can be used as genetic markers, but they are not ideal choices because they occur infrequently (ca. every 100 kb in humans).
2. Greater marker density is usually required.
3. Four major types of markers have been used:
   1. RFLPs (restriction fragment length polymorphisms) = substitutions that create or destroy a restriction site.
   2. Microsatellites (STRs) = short tandem repeats.
   3. STSs = sequence-tagged sites.
   4. Single nucleotide polymorphisms (SNPs).
Genome-wide mapping:
High density genetic mapping was revolutionized in the 1980s by
the discovery of abundant polymorphic genetic markers like
microsatellites.
Research teams collaborated and added to a common database.
By 1994, human genetic map had localized:
5,264 microsatellites to 2,335 chromosome loci
(average density of one marker every 599 kb)
In the process, thousands of sequence-tagged sites (STSs) were identified.
STS = a short, unique stretch (a couple hundred base pairs) of known sequence that can be detected by PCR.
High-density genetic map of 5,264 microsatellites localized to each of 23 chromosomes.
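The "one marker every ~600 kb" figure is roughly genome size divided by marker count. The sketch below assumes a haploid human genome of about 3.15 Gb (an assumption, not a figure from the slides) and even spacing, which real maps never achieve. Dividing by the 2,335 distinct loci gives a coarser spacing because several markers map to the same position.

```python
def average_spacing_kb(genome_size_bp: float, n_markers: int) -> float:
    """Average distance between markers in kb, assuming markers are evenly spread."""
    if n_markers <= 0:
        raise ValueError("n_markers must be positive")
    return genome_size_bp / n_markers / 1_000


if __name__ == "__main__":
    GENOME_SIZE_BP = 3.15e9  # assumed approximate haploid human genome size
    for label, count in [("microsatellites", 5_264), ("mapped loci", 2_335)]:
        spacing = average_spacing_kb(GENOME_SIZE_BP, count)
        print(f"{label}: one every ~{spacing:.0f} kb")
```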
From genome-wide mapping to genome sequencing…
For many species with small genomes, such a map would have provided enough landmarks to begin sequencing the entire genome with a conventional map-and-sequence approach (PCR + sequencing).
The human map still lacked resolution; large stretches of uncharted DNA remained.
Average distance between markers was 600 kb.
Physical mapping was required to assist with the sequencing.
Physical map = map of physically identifiable regions of genomic
DNA constructed without recombination analysis.
Time and effort could be minimized by targeting sequencing efforts
to a specific chromosome (or smaller regions).
Two types of physical maps useful for sequencing a genome:
1. Low-resolution cytogenetic/FISH maps
• Stained chromosomes produce banding patterns composed of bands that average 6 Mb.
• Regions are designated by their position relative to the centromere:
“q” = long arm
“p” = short arm
Numbered outward from the centromere starting with “1”
• Genes and other sequences are localized on chromosome maps with probes, using a technique called fluorescence in situ hybridization (FISH).
• Various types of radioactive probes and stains also can be used to mark specific regions of chromosomes.
• Provides a physical map of the overall structure of each chromosome/region.
http://www.mun.ca/biology/scarr/FISH_chromosome_painting.htm
http://www.euchromatin.org/E09.htm
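The band naming convention above can be read mechanically: a label such as 7q31 means chromosome 7, long arm (q), region 3, band 1, with optional sub-bands after a decimal point. The Python sketch below parses simple labels and ranges like 7q31-q32; it is a minimal illustration assuming only that basic form, and it does not handle full ISCN notation (e.g., "cen", "pter", or "qter").

```python
import re
from typing import NamedTuple, Optional, Tuple

# Chromosome (1-22, X, Y), arm (p/q), region digit, optional band digit, optional sub-band.
BAND_PATTERN = re.compile(r"^(\d{1,2}|X|Y)([pq])(\d)(\d)?(?:\.(\d+))?$")


class Band(NamedTuple):
    chromosome: str
    arm: str                  # "p" = short arm, "q" = long arm
    region: int               # counted outward from the centromere, starting at 1
    band: Optional[int]
    sub_band: Optional[str]


def parse_band(label: str) -> Band:
    match = BAND_PATTERN.match(label)
    if match is None:
        raise ValueError(f"unrecognized band label: {label!r}")
    chrom, arm, region, band, sub_band = match.groups()
    return Band(chrom, arm, int(region), int(band) if band else None, sub_band)


def parse_band_range(label: str) -> Tuple[Band, Band]:
    """Parse ranges such as '7q31-q32'; the second band inherits the chromosome."""
    start, _, end = label.partition("-")
    first = parse_band(start)
    if not end:
        return first, first
    if end[0] in "pq":
        end = first.chromosome + end
    return first, parse_band(end)


if __name__ == "__main__":
    start, end = parse_band_range("7q31-q32")
    print(start)  # Band(chromosome='7', arm='q', region=3, band=1, sub_band=None)
    print(end)    # Band(chromosome='7', arm='q', region=3, band=2, sub_band=None)
```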
Two types of physical maps useful for sequencing a genome:
2. High-resolution YAC/BAC clone contig maps
• Mechanically shear or partially digest genomic DNA with restriction enzymes and clone large (200-500 kb) overlapping fragments into YACs or BACs.
• An entire genome or a single chromosome can be represented in a YAC or BAC clone library (depending on the starting point).
• Overlapping YAC/BAC clones can be assembled into a scaffold without sequencing, by DNA fingerprinting with markers such as microsatellites.
• BAC vectors, with a capacity of about 300 kb and the ability to replicate in E. coli, have become popular for genome sequencing (BACs are now routinely sequenced using the shotgun approach).
Fig. 10.1 2nd edition, YAC contig physical map assembled by
microsatellite mapping (combination YACs + microsatellite mapping)
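Assembling clones by shared markers amounts to grouping clones that contain at least one marker in common. The Python sketch below does this grouping with a union-find structure; the clone names and marker sets are hypothetical, and the sketch assumes every shared marker reflects a true overlap. In practice, chimeric YAC clones or markers in repetitive DNA can create false joins, and ordering clones within a contig needs additional information.

```python
from collections import defaultdict


def build_contigs(clone_markers: dict) -> list:
    """Group clones into contigs: clones sharing at least one marker are treated as overlapping.

    clone_markers maps a clone name to the set of markers detected in that clone.
    Returns a sorted list of contigs, each a sorted list of clone names.
    """
    parent = {clone: clone for clone in clone_markers}

    def find(clone):
        # Union-find root lookup with path halving.
        while parent[clone] != clone:
            parent[clone] = parent[parent[clone]]
            clone = parent[clone]
        return clone

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_clone_with_marker = {}
    for clone, markers in clone_markers.items():
        for marker in markers:
            if marker in first_clone_with_marker:
                union(first_clone_with_marker[marker], clone)
            else:
                first_clone_with_marker[marker] = clone

    groups = defaultdict(list)
    for clone in clone_markers:
        groups[find(clone)].append(clone)
    return sorted(sorted(members) for members in groups.values())


if __name__ == "__main__":
    # Hypothetical YAC clones and the microsatellite/STS markers each one contains.
    clones = {
        "YAC_A": {"M1", "M2"},
        "YAC_B": {"M2", "M3"},
        "YAC_C": {"M3", "M4"},
        "YAC_D": {"M7", "M8"},
    }
    print(build_contigs(clones))  # [['YAC_A', 'YAC_B', 'YAC_C'], ['YAC_D']]
```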
Cloning, isolating, and sequencing genes:
Locating a gene is easy if the gene product (protein) is identified.
1. Create a cDNA library using an expression vector.
2. Probe with antibodies that bind the gene product.
3. Isolate and sequence positive clones.
If the gene product is unknown, locating and sequencing a gene is more difficult.
1. Identify a marker (microsatellite, RFLP, SNP) that:
   1. Shows a strong statistical association with the disease phenotype in test crosses or a genome-wide association study (GWAS).
   2. Is physically linked to the gene on the same chromosome.
2. Use a technique called positional cloning + chromosome walking to home in on the gene and actually sequence it.
e.g., cloning and discovery of the cystic fibrosis (CF) gene.
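A "strong statistical association" in a case-control study is often assessed with an allelic test on a 2x2 table of allele counts. The Python sketch below computes a Pearson chi-square statistic (1 degree of freedom, no continuity correction) using hypothetical counts; real GWAS pipelines also adjust for population structure and other covariates. A commonly used genome-wide significance threshold is p < 5 × 10⁻⁸, which accounts for testing very many markers.

```python
import math


def allelic_chi_square(case_a: int, case_b: int, control_a: int, control_b: int):
    """Pearson chi-square test (1 df, no continuity correction) on allele counts.

    Rows are cases / controls; columns are allele A / allele B.
    Returns (chi2, p_value).
    """
    table = [[case_a, case_b], [control_a, control_b]]
    total = case_a + case_b + control_a + control_b
    if total == 0:
        raise ValueError("table is empty")
    row_totals = [sum(row) for row in table]
    col_totals = [case_a + control_a, case_b + control_b]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            if expected == 0:
                raise ValueError("every row and column needs at least one allele count")
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical allele counts at one marker.
    chi2, p = allelic_chi_square(case_a=300, case_b=100, control_a=200, control_b=200)
    print(f"chi2 = {chi2:.1f}, genome-wide significant: {p < 5e-8}")
```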
Positional cloning: identification of the cystic fibrosis (CF) gene:
Most common lethal genetic disease in the U.S. (~1 in 2,000).
One of the first human disease genes identified by positional cloning.
Required 4 years and the work of many laboratories.
Overview of cystic fibrosis:
CF results from a defect in a protein that regulates the movement of salt and water in and out of cells.
Causes thick mucus secretions in the lungs, pancreas, and intestines.
Causes lung disease and organ failure; patients experience chronic bacterial infections.
Life expectancy is about 40 years.
First steps to identifying the CF gene by positional cloning:
1. Many hundreds of individuals from CF pedigrees were screened with a large number of RFLPs.
2. A single recurring RFLP showed weak linkage (statistical association) to the cystic fibrosis trait.
3. The CF gene was next localized to chromosome 7 using a labeled RFLP probe and in situ hybridization to condensed chromosomes.
4. All other known RFLPs from chromosome 7 were simultaneously screened for linkage to CF.
5. Two more linked RFLPs were discovered in a 500,000 bp subregion (bands 31-32) of the long arm of chromosome 7 (7q31-q32).
6. The data indicated that the CF locus lies within this 500,000 bp region of chromosome 7.
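Linkage between a marker and a disease locus is classically scored with a LOD score: the log10 ratio of the likelihood of the data at recombination fraction θ to the likelihood under no linkage (θ = 0.5). A LOD of 3 or more (1000:1 odds) is the conventional threshold for declaring linkage. The Python sketch below assumes phase-known, fully informative meioses and hypothetical counts; real pedigree analyses, such as those used for CF, must also handle unknown phase and missing genotypes.

```python
import math


def lod_score(recombinants: int, nonrecombinants: int, theta: float) -> float:
    """LOD = log10[theta^R * (1 - theta)^NR / 0.5^(R + NR)] for phase-known meioses."""
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    n = recombinants + nonrecombinants
    log_likelihood_linked = (recombinants * math.log10(theta)
                             + nonrecombinants * math.log10(1.0 - theta))
    log_likelihood_unlinked = n * math.log10(0.5)
    return log_likelihood_linked - log_likelihood_unlinked


def max_lod(recombinants: int, nonrecombinants: int):
    """Return (best LOD, theta) over a grid theta = 0.01, 0.02, ..., 0.50."""
    thetas = [k / 100 for k in range(1, 51)]
    return max((lod_score(recombinants, nonrecombinants, t), t) for t in thetas)


if __name__ == "__main__":
    # Hypothetical data: 2 recombinants among 20 informative meioses.
    best, theta = max_lod(recombinants=2, nonrecombinants=18)
    print(f"max LOD = {best:.2f} at theta = {theta}")
```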
Steps to identifying the CF gene (cont.):
1. The section (500 kb) of chromosome 7 containing the CF gene was cut, cloned, and mapped using two techniques called chromosome walking & chromosome jumping.
   1. The end of a cloned sequence is used as a probe to find adjacent overlapping fragments in a genomic library.
   2. Clones that overlap are mapped with RFLPs to determine the extent of overlap.
   3. A new labeled probe made from the far end of the second clone is used to screen the library once again.
   4. Repeat…
Chromosome walking/jumping techniques do not work well with highly repetitive sequences that are scattered throughout the genome.
The length of each step in the chromosome walk/jump is limited by the size of the inserts in the library and the size of the overlap.
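The walking loop above can be simulated if each clone's genomic coordinates are known. In a real walk they are not; here the coordinates stand in for the hybridization and overlap-mapping steps. The Python sketch below uses hypothetical clone intervals, takes a probe from the right end of the current clone, and greedily steps to the hybridizing clone that extends furthest. It does not model repetitive probes, which in practice would hybridize to clones from all over the genome.

```python
def chromosome_walk(clones: dict, start_clone: str, target_bp: int, probe_len: int) -> list:
    """Simulate a rightward chromosome walk through a clone library.

    clones maps clone name -> (start, end) genomic coordinates (hypothetical).
    Returns the list of clones visited, ending with one that reaches target_bp.
    """
    path = [start_clone]
    current = start_clone
    while clones[current][1] < target_bp:
        end = clones[current][1]
        probe_start = end - probe_len  # probe made from the right end of the current clone
        # Clones that contain the probe (hybridize to it) and extend further right.
        hits = [name for name, (s, e) in clones.items() if s <= probe_start and e > end]
        if not hits:
            raise RuntimeError(f"walk stalled at {end} bp: no overlapping clone extends further")
        current = max(hits, key=lambda name: clones[name][1])  # take the longest step
        path.append(current)
    return path


if __name__ == "__main__":
    library = {
        "c1": (0, 40_000),
        "c2": (30_000, 75_000),
        "c3": (38_000, 80_000),
        "c4": (70_000, 115_000),
        "c5": (110_000, 150_000),
    }
    print(chromosome_walk(library, "c1", target_bp=140_000, probe_len=1_000))
    # ['c1', 'c3', 'c4', 'c5']
```

Choosing the clone that extends furthest mirrors the point above that step length is limited by insert size and overlap: a smaller required overlap or a larger insert means fewer steps.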
Fig. 9.10, 2nd edition: illustration showing how chromosome walking was used to identify a candidate gene for a disease like cystic fibrosis.
Technique called chromosome jumping also was used:
1. Use partial restriction digestion to cut a large section of chromosomal DNA into large overlapping fragments.
2. Circularize the fragments with DNA ligase, bringing together DNA ends that previously were distant.
3. Cut the circles with a restriction enzyme yet again to release the junction region (ends are now inverted).
4. Clone the junction regions to form a jumping library.
5. Subclone a small fragment of DNA and use it as a probe to find the next junction fragment in the library (same technique as chromosome walking).
6. Repeat… and/or start chromosome walking.
7. Chromosome jumping reaches the target gene faster than walking.
8. A similar technique called “mate pair” is used in today’s next-generation sequencing to sequence the ends of very long DNA fragments.
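The key property of a jumping clone is that its junction joins the two ends of one long fragment, so a probe matching one half retrieves sequence a full fragment length away. The Python sketch below models this on a random toy sequence and compares step counts for walking and jumping. The fragment, insert, and distance sizes are hypothetical, and the model ignores the restriction-site details of real jumping libraries.

```python
import random


def jumping_junction(genome: str, start: int, length: int, end_len: int) -> str:
    """Model one jumping-library clone.

    The fragment genome[start:start + length] is circularized, so its right end is
    ligated to its left end; recutting releases end_len bases from each end.
    """
    if 2 * end_len > length:
        raise ValueError("end_len must be at most half the fragment length")
    fragment = genome[start:start + length]
    if len(fragment) < length:
        raise ValueError("fragment runs past the end of the genome")
    return fragment[-end_len:] + fragment[:end_len]


def steps_to_cover(distance_bp: int, step_bp: int) -> int:
    """Number of walk or jump steps needed to cover distance_bp at step_bp per step."""
    return -(-distance_bp // step_bp)  # ceiling division


if __name__ == "__main__":
    rng = random.Random(0)
    genome = "".join(rng.choice("ACGT") for _ in range(10_000))  # toy sequence
    junction = jumping_junction(genome, start=100, length=5_000, end_len=20)
    # The two halves of the junction come from sites about one fragment length apart.
    print(genome.find(junction[20:]), genome.find(junction[:20]))
    # Hypothetical sizes: 30 kb net advance per walking step vs. 100 kb per jump.
    print(steps_to_cover(500_000, 30_000), steps_to_cover(500_000, 100_000))
```

This is also the idea behind mate-pair reads: the two reads come from sites separated by a roughly known distance.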
Chromosome jumping
Preparation of next-generation mate-pair library:
http://www.investigativegenetics.com
Summary of the search for the CF gene:
1. 7 chromosome jumps were made for CF.
2. Chromosome walks were made from each jump site to identify overlapping clones.
3. Clones spanning a total of 500 kb eventually were characterized.
4. Next, the cloned DNA was used as a probe against DNA from other species using a restriction digest + Southern blot.
*Genes are more conserved than non-coding sequences, so similar sequences should be found in other species.
5. Five subclones (or candidates) hybridized to DNA from other organisms.
6. Two of the subclones were ruled out by linkage analysis, and a third was a pseudogene (gene-like sequence lacking expression signals).
7. The remaining two clones were hybridized with mRNA on a Northern blot to test whether their sequences are transcribed.
8. One more candidate was eliminated, and the 5th (remaining) candidate was sequenced…
Characteristics of the CF gene:
1. cDNA (same size as the mature mRNA) is 6,500 bp.
2. Genomic DNA: the CF gene spans 250 kb and contains 24 exons.
3. 68% of CF chromosomes in Caucasian patients carry a 3-bp deletion (ΔF508) that removes a single phenylalanine (Phe508) from the protein.
4. Sixty other mutations described.
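Because 3 bp is exactly one codon, the deletion keeps the reading frame intact: one amino acid is lost while everything downstream is still read correctly. The Python sketch below illustrates this with a short sequence modeled on the region around codon 508 (treat the exact bases as illustrative) and a deliberately tiny codon table that covers only the codons used here. The deleted bases span two codons, but the altered codon still encodes Ile, so the net effect is the loss of Phe.

```python
# Tiny codon table: only the codons that appear in this example.
CODON_TABLE = {"ATC": "Ile", "ATT": "Ile", "TTT": "Phe", "GGT": "Gly", "GTT": "Val"}


def translate(dna: str) -> list:
    """Translate complete codons in reading frame 0 using CODON_TABLE."""
    usable = len(dna) - len(dna) % 3
    return [CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3)]


def delete(dna: str, start: int, length: int) -> str:
    """Return dna with `length` bases removed starting at 0-based index `start`."""
    return dna[:start] + dna[start + length:]


wild_type = "ATCATCTTTGGTGTT"     # Ile-Ile-Phe-Gly-Val (illustrative sequence)
mutant = delete(wild_type, 5, 3)  # removes "CTT", spanning the 2nd and 3rd codons

print(translate(wild_type))  # ['Ile', 'Ile', 'Phe', 'Gly', 'Val']
print(translate(mutant))     # ['Ile', 'Ile', 'Gly', 'Val']
```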
Fig. 4.13, CFTR structure: cystic fibrosis transmembrane conductance regulator protein