The E. coli genome.

IB404 - 3. Bacterial Genomics - Jan 25
1. Fred Sanger sequenced the first complete DNA genomes, e.g. the 5 kb
genome of the phiX174 phage in 1977, the 16 kb human mitochondrial
genome in 1981, and then developed the method of whole genome
shotgun cloning and sequencing to determine the 48 kb lambda phage
genome in 1982.
2. All alternatives, such as primer-walking, nested deletions, transposon insertions, etc., involve additional costs.
3. When faced with the 4.6 Mbp E. coli genome,
Fred Blattner at the University of Wisconsin chose
to map the genome physically as overlapping large
clones, before shotgun sequencing each clone to
build the genome. It took a decade, using mostly
manual radioactive sequencing, and was finally
published in 1997. It is annotated as containing about
4,500 genes, so about one gene per kb (generally true
for bacteria and viruses, e.g. 10 kb HIV has 10 genes).
The E. coli genome. The origin and terminus of replication are shown as
green lines, with blue arrows indicating replichores 1 and 2. A scale
indicates the coordinates both in base pairs and in “minutes” of
recombination. The distribution of genes is depicted on two outer rings:
the orange boxes are genes located on the presented strand, and the
yellow boxes are genes on the opposite strand. Red arrows show the
location and direction of transcription of rRNA genes, and tRNA genes
are shown as green arrows.
4. Craig Venter had already shaken up the human genome field by
generating large numbers of ESTs (expressed sequence tags) from the
ends of randomly picked human cDNA clones at his new institute in
Maryland, TIGR (The Institute for Genomic Research, later run by his
wife, Claire Fraser). In 1995 he tried a whole genome shotgun (WGS) to
sequence the 1.8 Mbp genome of Haemophilus influenzae and the 0.58
Mbp genome of Mycoplasma genitalium, together with Hamilton Smith
at Johns Hopkins (Smith grew up here, went to Uni and UIUC, and won a
Nobel for the first restriction endonuclease, found in H. influenzae).
Photos: J. Craig Venter, Claire Fraser, Hamilton Smith.
H. influenzae genome - outer circle is genes in one direction, inner
circle the other. Colors are functional categories, e.g. enzyme, channel,
receptor, repair, transporter, structural, replication, transcription,
translation, etc. Arrowhead is the origin of replication.
Detail of region around the origin of replication (60 kb total shown).
Note that there is little “spacer” DNA between genes. There are operons
of multiple genes. Not all genes were named or had known functions, e.g.
HIN0006, at least when this was done. Even today, ~100 of the 483 genes
in M. genitalium have unknown functions.
Whole genome shotgun sequencing strategy
1. Randomly shear genomic DNA into small pieces, size-fractionate on
a gel (e.g. only 2-3kb or 9-11kb pieces), and clone in a plasmid.
2. Sequence each randomly picked plasmid clone insert from each end
using flanking primers that anneal to the plasmid vector sequence. These
plasmid insert end sequences don’t usually overlap, but their
orientation and a rough size are known - they are mate-pairs.
3. Do this enough times that you have generated 6-10X coverage of the
entire genome, usually from roughly 20-30X clone coverage.
4. Use an assembly program to build the genome, for bacteria usually
circular, by first building contigs of contiguous overlapping
sequence, and then link these contigs into scaffolds using mate-pair
information, leaving sequence gaps between contigs.
5. Finish sequence gaps, and any clone gaps between scaffolds, by
directed methods, e.g. using PCR with primers to the ends of contigs
or scaffolds to amplify across gaps and sequence the purified PCR
products, usually directly without cloning them.
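The relationship between sequence coverage and clone coverage in step 3 can be checked with a little arithmetic. The following is a minimal sketch, not part of the original slides. The read length (500 bp) and insert size (2.5 kb) are assumed values chosen for illustration, and the function name `shotgun_plan` is hypothetical. The unsequenced fraction uses the standard Poisson approximation, which assumes reads land uniformly at random.

```python
import math

def shotgun_plan(genome_bp, read_bp, insert_bp, target_coverage):
    """Rough planning numbers for a whole genome shotgun project."""
    # Sequence coverage = (number of reads * read length) / genome size,
    # so solve for the number of reads needed to hit the target.
    n_reads = math.ceil(target_coverage * genome_bp / read_bp)
    # Each plasmid clone is sequenced from both ends (two mate-pair reads).
    n_clones = math.ceil(n_reads / 2)
    # Clone coverage counts the whole insert, not just the sequenced ends.
    clone_coverage = n_clones * insert_bp / genome_bp
    # Poisson approximation: chance that a given base is hit by no read.
    frac_unsequenced = math.exp(-target_coverage)
    return {
        "n_reads": n_reads,
        "n_clones": n_clones,
        "clone_coverage": clone_coverage,
        "frac_unsequenced": frac_unsequenced,
    }

# Assumed example: the 1.8 Mbp H. influenzae genome at 8X sequence coverage,
# with 500 bp reads from the ends of 2.5 kb inserts (illustrative values).
plan = shotgun_plan(genome_bp=1_800_000, read_bp=500,
                    insert_bp=2_500, target_coverage=8)
for key, value in plan.items():
    print(key, value)
```

Under these assumptions, 8X sequence coverage corresponds to 20X clone coverage, because each clone insert is five times longer than the two end reads combined. That matches the 6-10X versus 20-30X ranges in step 3.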
WGS schema
One plasmid clone with two mate-pairs sequenced from its ends; dots are
unknown sequence.
Diagram labels: Contig1 and Contig2 are separated by a sequence gap and
joined into a scaffold; a clone gap lies beyond the scaffold.
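To make the scaffold idea concrete, here is a minimal sketch of how mate-pairs that span two contigs give a rough size for the sequence gap between them. It is an illustration under stated assumptions, not the method of any particular assembly program. The function name `estimate_gap`, the coordinate convention, and the example numbers are all hypothetical, and the insert size is treated as a single known value even though size-fractionation only gives a range.

```python
from statistics import median

def estimate_gap(len_a, links, insert_bp):
    """Estimate the sequence gap between contig A (left) and contig B (right).

    Each link is (start_in_a, end_in_b): one mate starts at position
    start_in_a in contig A and points right; the other mate ends at
    position end_in_b in contig B and points left.
    """
    gaps = []
    for start_in_a, end_in_b in links:
        covered_in_a = len_a - start_in_a  # part of the insert inside contig A
        covered_in_b = end_in_b            # part of the insert inside contig B
        # Whatever is left of the insert must lie in the gap.
        gaps.append(insert_bp - covered_in_a - covered_in_b)
    # The median damps the effect of variation in true insert sizes.
    return median(gaps)

# Invented example: a 10 kb contig A linked to contig B by three clones,
# assuming 2.5 kb inserts.
links = [(9000, 1200), (9500, 1650), (8800, 1050)]
gap = estimate_gap(len_a=10_000, links=links, insert_bp=2_500)
print(gap)
```

A clone gap, by contrast, is a region that no clone spans, so no mate-pair can size it. That is why those gaps need the directed PCR methods of step 5.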
Many bacterial genomes
1. Today there are >2000 bacterial genomes available, and >200 from Archaea.
2. For example, Blattner sequenced several strains of E. coli, including
the “hamburger” strain, and related Shigella and Salmonella species,
yielding information on pathogenicity islands of genes implicated in
causing disease.
3. Many others are famous pathogens, e.g. Borrelia burgdorferi,
Helicobacter pylori, Treponema pallidum, Neisseria meningitidis, Yersinia
pestis, and Vibrio cholerae.
4. Others exhibit unusual biology, e.g. Deinococcus radiodurans,
Thermotoga maritima, and Methanococcus jannaschii.
5. They range in size from around 0.5 Mbp for various intracellular
parasites, such as Buchnera species, to over 12 Mbp for Streptomyces
species, which form colonies making antibiotics.
6. The small genomes of intracellular parasites result from gene loss, e.g.
Rickettsia has only about 800 genes, while the genome of the aphid
endosymbiont Buchnera is largely colinear with that of E. coli, but has
lost about 4000 genes!
7. The phylogenetic trees derived from these genome sequences largely
agree with the 3-domain 16S rRNA-based trees of Carl Woese, but only
when the core set of replication, transcription, and translation proteins
is employed.
8. When other gene sets are examined the result is usually a web rather
than a tree, indicating that horizontal gene transfer between distantly
related bacteria, and even archaea, but seldom eukaryota, has been
widespread.
Metagenomics
Venter and others have continued to push the envelope of bacterial
genome sequencing, most prominently by doing metagenomics, in which
genomic DNA is extracted from environmentally collected samples, e.g.
ocean water or a mine dump or human skin, without trying to culture
bacteria, and sequenced extensively. These studies have confirmed that
there is an extraordinary diversity of uncultured Bacteria and Archaea
out there, and that some have entirely novel metabolic abilities. They
also confirm that there are only the known three domains of life.
When the sample is relatively simple, e.g. a few species from a toxic
mine sample, entire circular genomes will sometimes assemble.
Otherwise they generally obtain long scaffolds containing multiple genes
together in operons, which is often enough to define metabolic pathways.
Today a major effort is underway to do this for human commensal
bacteria, called the microbiome, including oral, gut, vaginal, and skin
bacterial communities.
As an example of the kinds of findings from this work, last year a group
published an analysis of the frequency of horizontal gene transfer
(HGT) across bacteria that are human commensals versus those that are
not. They had ~1000 genomes in each category, and looked for regions
with 99% DNA sequence identity in species with <97% rRNA identity (so
they were not closely related). They found high levels of HGT across
human commensals, and even higher HGT across species living in the same
regions of the human body. Thus ecology facilitates or drives HGT.
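The screening criterion described above can be written as a simple filter. This is a minimal sketch, assuming each pairwise comparison has already been reduced to two identity fractions. The record fields, the function name `hgt_candidates`, and the example records are hypothetical and invented for illustration; they are not data or code from the study. Reading "99% identity" as "at least 99%" is also an assumption.

```python
def hgt_candidates(hits, dna_min=0.99, rrna_max=0.97):
    """Keep genome pairs that share near-identical DNA despite divergent rRNA.

    hits: list of dicts with keys 'pair', 'dna_identity' (identity of the
    shared region) and 'rrna_identity' (identity of the rRNA genes).
    """
    return [
        h for h in hits
        if h["dna_identity"] >= dna_min    # shared region is nearly identical
        and h["rrna_identity"] < rrna_max  # but the species are not close relatives
    ]

# Invented records, for illustration only.
hits = [
    {"pair": ("species_A", "species_B"), "dna_identity": 0.995, "rrna_identity": 0.88},
    {"pair": ("species_A", "species_C"), "dna_identity": 0.998, "rrna_identity": 0.99},
    {"pair": ("species_B", "species_D"), "dna_identity": 0.91, "rrna_identity": 0.85},
]
candidates = hgt_candidates(hits)
print([h["pair"] for h in candidates])
```

The rRNA cutoff matters because closely related species share nearly identical DNA through common descent, so only the first pair here counts as a candidate transfer.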