The Human Genome Project
Lecture 4
Strachan and Read Chapter 8
The HGP’s primary aims
• The main aims of the Human Genome
Project (HGP) were to:
– Construct maps of the genome (genetic and
physical)
– Identify all the genes (estimated at about 30,000 at the time; current
estimates are closer to 20,000 protein-coding genes)
– Determine the entire DNA sequence
(3,000,000,000 bp)
Other aims of HGP
• As well as the genome sequence, the
aims were:
• Technology development
• Model organism genome projects (E. coli,
yeast, mouse, fruit fly, C. elegans)
• Ethical, legal and societal implications
(ELSI)
The linkage map
• The map was built by linkage studies in 60 large families with
grandparents and large numbers of children, collected by the
University of Utah and the Centre d'Étude du Polymorphisme
Humain (CEPH), Paris
• Families were typed with over 5000 polymorphic DNA sequences:
60% were microsatellite repeats (mostly dinucleotide (CA) repeats,
also some tri- and tetra-nucleotides). Only about 400 of them were
actual genes
• Construction of the genetic map:
– Obtain genotypes of all markers on all family members (PCR and gel
electrophoresis, using robots and automated gel apparatus)
– Calculation of recombination fractions between markers
– Observe crossovers between closely linked markers, use this
information to confirm order of markers
• Construction of the linkage map is a large computational problem:
sophisticated software, using advanced statistical methods and algorithms,
was needed to work out the "best fit" map of all the markers
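The recombination-fraction step can be sketched in a few lines of Python. This is only an illustration, not the software used for the real map: it assumes phase-known meioses that can each be scored as recombinant or non-recombinant, and the function names and counts are invented. It also computes a two-point LOD score, the standard log10 likelihood ratio comparing linkage at a given recombination fraction with no linkage (0.5).

```python
import math


def recombination_fraction(recombinants, meioses):
    """Proportion of informative meioses showing a crossover between two markers."""
    if meioses <= 0 or not 0 <= recombinants <= meioses:
        raise ValueError("need 0 <= recombinants <= meioses and meioses > 0")
    return recombinants / meioses


def lod_score(recombinants, meioses, theta):
    """log10 odds of linkage at recombination fraction theta versus theta = 0.5."""
    if not 0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    non_recombinants = meioses - recombinants
    log_linked = (recombinants * math.log10(theta)
                  + non_recombinants * math.log10(1 - theta))
    log_unlinked = meioses * math.log10(0.5)
    return log_linked - log_unlinked


# Invented counts: 2 recombinants observed in 20 phase-known meioses.
theta_hat = recombination_fraction(2, 20)
lod = lod_score(2, 20, theta_hat)
print(f"theta = {theta_hat:.2f}, LOD = {lod:.2f}")
```

With these made-up counts the estimated recombination fraction is 0.10 and the LOD score is a little over 3, the conventional threshold for accepting linkage. Real map construction combines many such pairwise comparisons across all markers at once.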
STSs and ESTs
• Sequence tagged sites (STSs) are specific loci in the genome, for which
enough DNA sequence is available to make PCR primers to amplify the
locus (usually as a fragment of a few 100 bp). These include microsatellites
(e.g. CA repeats) that can be used for linkage studies.
• The information required to use an STS is just the sequences of the PCR
primers; therefore it is very easy to make databases of STSs that can be
used by anyone. No actual bits of DNA need change hands. This is crucial
in allowing genome projects to proceed as international collaborations, with
many laboratories participating in a co-ordinated way.
• ESTs act as specific tags for each human gene, since they are derived by
sequencing cDNA clones which came from mRNA and therefore represent
the actual transcribed sequences (as opposed to STSs, which can be
derived from anywhere in the genome and are mostly non-coding). They
allow rapid access to the actual genes, ignoring introns and “junk” DNA.
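Because an STS is fully described by its two primer sequences, testing for it can be mimicked on a known sequence by a simple string search. The sketch below is a toy "in silico PCR" under strong assumptions: exact primer matches only, one strand searched, and invented primer and template sequences. The function names are illustrative and do not come from any real STS database or tool.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def sts_product_length(template, forward_primer, reverse_primer):
    """Length of the expected PCR product, or None if the STS is absent.

    The forward primer must match the template strand as given; the reverse
    primer anneals to the same strand, so its reverse complement is searched
    for downstream of the forward primer.
    """
    template = template.upper()
    start = template.find(forward_primer.upper())
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse_primer)
    end = template.find(reverse_site, start + len(forward_primer))
    if end == -1:
        return None
    return end + len(reverse_site) - start


# Invented example: a short (CA)6 microsatellite flanked by two primer sites.
template = "GG" + "ATCCAAGT" + "CA" * 6 + "TTGCCGTA" + "CC"
print(sts_product_length(template, "ATCCAAGT", "TACGGCAA"))
print(sts_product_length("GGGGCCCCGGGGCCCC", "ATCCAAGT", "TACGGCAA"))
```

In the first call both primer sites are found and the product spans the repeat; in the second the STS is absent and the function returns None, which corresponds to a negative PCR result.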
ESTs can be 3' or 5' depending on which end of the cDNA was sequenced.
Because of the methods used to make cDNA libraries, parts of the 5' end of
the gene are often lost during cloning whereas the 3' end is more reliable.
Therefore, the same gene may give different 5' ESTs and it will be difficult to
deduce whether they have come from the same gene. This is shown on the
diagram by the white boxes representing cDNA clones being different
lengths. Another complication is due to alternative splicing. On the left is
shown the genomic structure of a gene, with the exons as boxes - the red
one is subject to alternative splicing.
X-ray hybrid mapping
• X-ray hybrids are made by irradiating a human cell line with 3000
rad of X-rays, fusing the irradiated cells to hamster cells, and
isolating hybrid cell lines in culture
• A panel of 100-200 hybrids with 5-10 different fragments of human
DNA in each gives about 1000 fragments in total, i.e. the human
genome has been divided into 1000 bits.
• The closer together 2 markers are in the genome, the more likely it
is that they will be present in the same hybrids (since they are less
likely to be separated by an X-ray induced break).
• By doing a PCR assay for each marker on all the hybrids, a map can
be made. The units are called cR (centiray, where 1cR is a 1%
chance that the markers will be separated by X-ray breakage).
For each pair of markers in turn the "co-retention frequency" is the number of
hybrids in which both markers are present, divided by the number of hybrids
in which one or other (or both) markers are present. On the figure, there are
5 hybrids containing both markers B and C, and 6 containing B and/or C.
Therefore the co-retention frequency is 5/6 or 0.83. Likewise it is 6/7 for
markers E and F, and 2/10 for markers C and E. This shows that B and C are
close together, E and F are close together, but C and E are further apart. The
analysis is extended to all the markers and their order is worked out by
considering all the co-retention frequencies.
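The co-retention calculation is easy to express in code. In the sketch below each marker is a list of 1/0 PCR results across the hybrid panel. The figure itself is not reproduced here, so the retention patterns are invented; they were chosen only so that the counts match the values quoted above (5/6 for B and C, 6/7 for E and F, 2/10 for C and E).

```python
def co_retention(marker_a, marker_b):
    """Hybrids containing both markers divided by hybrids containing either."""
    if len(marker_a) != len(marker_b):
        raise ValueError("markers must be typed on the same hybrid panel")
    both = sum(1 for a, b in zip(marker_a, marker_b) if a and b)
    either = sum(1 for a, b in zip(marker_a, marker_b) if a or b)
    if either == 0:
        raise ValueError("neither marker is retained in any hybrid")
    return both / either


# Invented PCR results (1 = marker present) for a panel of 10 hybrids.
panel = {
    "B": [1, 1, 1, 1, 1, 1, 0, 0, 0, 0],
    "C": [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
    "E": [0, 0, 0, 1, 1, 1, 1, 1, 1, 1],
    "F": [0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
}

for first, second in [("B", "C"), ("E", "F"), ("C", "E")]:
    value = co_retention(panel[first], panel[second])
    print(f"{first}-{second}: {value:.2f}")
```

High values (B-C, E-F) indicate markers that are rarely separated by a break and so lie close together; the low C-E value indicates markers that are further apart.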
Clone contigs
• A clone contig is a series of cloned DNA
segments that overlap each other,
assembled in the correct order along the
genome
• The clones are made using vectors:
– cosmids (capacity 45 kb)
– BACs or YACs (Bacterial or Yeast Artificial
Chromosomes) which can clone 100s of kb of
DNA - more suitable for dealing with large
stretches of mammalian DNA.
Making a clone contig by fingerprinting
Putting it together
• The physical map consists of 1000s of cloned genomic
DNA fragments, in E. coli host cells (BACs, cosmids, 40-250 kb) or
yeast (100-1500 kb: "Yeast artificial chromosomes" or YACs),
X-ray hybrids, and hundreds of thousands of STSs and ESTs.
• The linkage map contains several thousand STSs.
• All of these can be linked together to produce an
integrated genome map.
• The presence or absence of each STS or EST in each
X-ray hybrid and cloned DNA is simply determined by
PCR.
• Because of the huge numbers involved, automation of
the assays is required.
Sequencing
• There was a great deal of human genome to sequence
(3000 Mb, or 3 x 10^9 bp).
• Due to the limitations of the techniques, each
sequencing reaction can only generate up to 700 bp of
DNA sequence.
• So the total sequence must be assembled from millions
of short, overlapping bits of sequence. The starting point
for this is the contigs of overlapping BAC clones.
• Each clone in the contig is subcloned into 100s of
smaller fragments, using a plasmid vector suitable for
preparing templates for the DNA sequencing reactions.
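The scale of the assembly problem follows from simple arithmetic on the figures above. The sketch below divides the genome size by the maximum read length to get the minimum number of reads at single coverage. The coverage parameter is an assumption added for illustration: overlapping reads mean each base must be sequenced several times, but the slides do not state a figure, so the value 8 used here is hypothetical.

```python
import math


def reads_needed(genome_bp, read_bp, coverage=1.0):
    """Minimum number of reads of length read_bp to cover genome_bp `coverage` times."""
    if genome_bp <= 0 or read_bp <= 0 or coverage <= 0:
        raise ValueError("all arguments must be positive")
    return math.ceil(genome_bp * coverage / read_bp)


GENOME_BP = 3_000_000_000  # 3 x 10^9 bp, from the slide
READ_BP = 700              # upper limit per sequencing reaction, from the slide

single_pass = reads_needed(GENOME_BP, READ_BP)
redundant = reads_needed(GENOME_BP, READ_BP, coverage=8)  # hypothetical coverage

print(f"Reads at 1x coverage: {single_pass:,}")
print(f"Reads at 8x coverage: {redundant:,}")
```

Even with every read at the full 700 bp and no overlap at all, more than four million reads are required, which is why the sequence has to be assembled from millions of short pieces.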