Microbial Genomics - Microbiology and Molecular Genetics


Microbial genomics
Genomics: study of entire genomes
Logical next step after genetics: study of genes
Genomics:
1) “Structural genomics”
* Determine and annotate DNA sequence of entire genome
* Determine crystal structures of all predicted proteins
2) Functional genomics
* What genes are expressed when? DNA microarrays
3) Comparative genomics
* Compare and contrast metabolisms, evolution, gene transfer
Sequencing a genome
Whole-genome shotgun sequencing (Venter & Smith):
* Shear DNA into random fragments of appropriate length
* Clone many fragments into plasmid: library
* Determine the DNA sequence of many of the inserts
* Chain terminator approach, fluorescently labeled
nucleotides, capillary electrophoresis of DNA.
* Assemble sequences into long “contigs”, then entire genome
* Hundreds of bacterial genomes have been sequenced, and the number is
growing rapidly
* Next 2 years: 1,000-fold increase in DNA sequencing
capacity (for the same price).
A bacterial genome for $100, instead of $100,000
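The assembly step above (overlapping reads merged into contigs) can be illustrated with a toy greedy overlap merger. This is only a minimal sketch: the function names, the example reads, and the minimum overlap of 3 bases are assumptions made for illustration, and real assemblers also handle sequencing errors, repeats, and both DNA strands.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            break  # no overlaps left: remaining contigs stay separate
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Three hypothetical reads that overlap end to end by 4 bases each.
reads = ["ATGGCGTAC", "GTACGTTAG", "TTAGCCTGA"]
print(greedy_assemble(reads))  # ['ATGGCGTACGTTAGCCTGA']
```

Reads that share no overlap are left as separate contigs, which mirrors the gaps that remain between contigs before a genome is closed.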
How to sequence a DNA fragment?
1) In vitro DNA replication using DNA polymerase.
2) Use small % of chain-stopper nucleotide derivatives that lack a 3’ OH
group: incorporation stops further growth of DNA chain
Frederick Sanger: 2 Nobel Prizes, for protein sequencing and for
DNA sequencing
Use 4 chain-stopper nucleotides
Recent advance: use 4 fluorescent colors for the 4 chain-stopper
nucleotides
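The logic of reading a sequence from chain-terminated fragments can be sketched as a small simulation. The dye colors, function names, and template below are assumptions chosen for illustration; the sketch ignores signal intensities and assumes one fragment of every length is present.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
# Dye colors are arbitrary labels chosen for this sketch.
DYE_FOR_BASE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
BASE_FOR_DYE = {dye: base for base, dye in DYE_FOR_BASE.items()}


def terminated_fragments(template_3_to_5):
    """Return every chain-terminated product of copying the template.

    The template is written 3'->5', so the new strand (written 5'->3')
    is its base-by-base complement. A chain stopper can be incorporated
    at any position, giving one fragment per length.
    """
    new_strand = "".join(COMPLEMENT[base] for base in template_3_to_5)
    return [new_strand[:n] for n in range(1, len(new_strand) + 1)]


def read_by_electrophoresis(fragments):
    """Sort fragments by length and read the dye on each terminal base."""
    dyes_in_order = [DYE_FOR_BASE[f[-1]] for f in sorted(fragments, key=len)]
    return "".join(BASE_FOR_DYE[dye] for dye in dyes_in_order)


fragments = terminated_fragments("TACGGATC")
random.Random(0).shuffle(fragments)  # the reaction tube holds an unordered mixture
print(read_by_electrophoresis(fragments))  # ATGCCTAG
```

Separation by length restores the order that the mixture lost, and the color of each terminal chain stopper identifies the base at that position.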
Some completed microbial genomes
A typical bacterial genome
* 1-5 million base pairs (Mbp)
* ~500-5000 different proteins encoded
New questions:
* What is the minimal set of proteins that can sustain
bacterial life?
* Can an artificial genome be assembled to yield a novel
bacterium? (“Frankencell”)
Genome annotation
* Identify open reading frames (ORFs)
* Identify ORFs that have been observed before in other
organisms (databases)
* If amino acid sequence very (?) similar, and the related protein
has a known function: gene successfully annotated.
* Problem: ~50% of all genes are of unknown function; many
“conserved hypothetical proteins”
* Methods to analyze various other features of the genome:
* Operons, promoters, ribosome binding sites (RBS), transcription terminators
Operons often contain valuable information:
analysis of flanking genes
* tRNA and rRNA genes
* Domain structures of proteins
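The first annotation step, identifying ORFs, can be sketched as a scan of all six reading frames for an ATG followed in frame by a stop codon. This is a simplified illustration: the function names and the tiny `min_codons` value are assumptions, only ATG is treated as a start codon, and real gene finders use statistical models and much longer length cutoffs.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq, min_codons=3):
    """Find ORFs (ATG ... stop) in all six reading frames.

    Returns tuples (strand, start, end, orf_sequence). Coordinates are
    0-based offsets on the scanned strand; min_codons counts the stop codon.
    """
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens a candidate ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3, s[start:i + 3]))
                    start = None  # look for the next ORF in this frame
    return orfs


# Hypothetical 16-base sequence containing one short ORF.
print(find_orfs("CCATGAAATTTTAGGG"))  # [('+', 2, 14, 'ATGAAATTTTAG')]
```

Each ORF found this way would then be translated and compared against protein databases, as the list above describes.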
How to use a genome sequence?
* How many genes are used for which functions?
How to use a genome sequence?
* Metabolic reconstruction
How to use a genome sequence?
* Mass spectrometry is a very efficient link between a
(complex) protein sample and genomic information
LC-MS can identify the mass of many proteins in a complex
protein sample, and also partial amino acid sequence
information (by fragmenting a selected protein peak).
Combining this information with the predicted molecular
weights of all proteins predicted from the genome leads from a
protein sample to the sequence of the protein and the gene
that encodes it.
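The matching of a measured mass against the predicted proteome can be sketched as follows. The residue masses are approximate average values, and the gene names, sequences, and 1 Da tolerance are hypothetical; the sketch ignores post-translational modifications and signal peptide cleavage, which shift real protein masses.

```python
# Approximate average residue masses in daltons (amino acid minus one water).
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.015  # one water is added back for the free N- and C-termini


def protein_mass(sequence):
    """Predicted average molecular weight of an unmodified protein."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in sequence) + WATER


def match_mass(observed_mass, proteome, tolerance=1.0):
    """Return names of predicted proteins whose mass is within tolerance (Da)."""
    return [name for name, seq in proteome.items()
            if abs(protein_mass(seq) - observed_mass) <= tolerance]


# Hypothetical predicted proteome: gene name -> translated ORF.
proteome = {"geneA": "MKTAYIAK", "geneB": "MSLLNG", "geneC": "MKTAYIAR"}
observed = protein_mass("MSLLNG") + 0.3  # simulated measurement with a small error
print(match_mass(observed, proteome))  # ['geneB']
```

When several predicted proteins fall within the tolerance, the partial sequence obtained by fragmentation is what distinguishes between them.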
DNA microarrays
* Chemically synthesize one (or more) DNA fragments
corresponding to sequence of each predicted gene: few
thousand DNA fragments.
* Put a very small droplet of each DNA fragment on a
glass slide, “dry and bake” to attach DNA
* Purify RNA from cells (for example grown under two
different conditions), then copy RNA into DNA using
reverse transcriptase (cDNA) using fluorescent nucleotide
analogs.
* Hybridize cDNA to glass slide and use microscope to
see which gene is copied into RNA (fluorescence)
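Comparing the fluorescence of each spot between two growth conditions is commonly expressed as a log2 ratio. The sketch below assumes hypothetical intensity values and gene names, a pseudocount of 1 to avoid division by zero, and a two-fold cutoff; real analyses also normalize the arrays and test for statistical significance.

```python
import math


def log2_ratios(condition_a, condition_b, pseudocount=1.0):
    """log2(B/A) per gene for spot intensities measured under two conditions."""
    return {
        gene: math.log2((condition_b[gene] + pseudocount) /
                        (condition_a[gene] + pseudocount))
        for gene in condition_a if gene in condition_b
    }


def differentially_expressed(ratios, threshold=1.0):
    """Keep genes whose expression changed at least 2**threshold-fold."""
    return {gene: r for gene, r in ratios.items() if abs(r) >= threshold}


# Hypothetical spot intensities for three genes.
condition_a = {"geneA": 99, "geneB": 399, "geneC": 199}
condition_b = {"geneA": 399, "geneB": 99, "geneC": 199}

ratios = log2_ratios(condition_a, condition_b)
print(differentially_expressed(ratios))  # geneA is up 4-fold, geneB is down 4-fold
```

A positive value means the gene is expressed more strongly under the second condition, a negative value means the opposite, and values near zero indicate no change.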
Environmental genomics
DNA sequencing of uncultured organisms. An example:
Beja et al., 2000. Evidence for a new type of phototrophy in
the sea. Science 289: 1902-1906.
“Sequence analysis of a 130-kb genomic fragment that
encoded the ribosomal RNA (rRNA) operon from an
uncultivated member of the marine γ-Proteobacteria (that is,
the "SAR86" group) also revealed an open reading frame
(ORF) encoding a putative rhodopsin (referred to here as
proteorhodopsin).”
Expression of proteorhodopsin in E. coli
Beja et al., 2000. Evidence for a new type of phototrophy in
the sea. Science 289: 1902-1906.
“Extremely halophilic archaea contain retinal-binding integral
membrane proteins called bacteriorhodopsins that function as
light-driven proton pumps. So far, bacteriorhodopsins capable
of generating a chemiosmotic membrane potential in response
to light have been demonstrated only in halophilic archaea.
We describe here a type of rhodopsin derived from bacteria
that was discovered through genomic analyses of naturally
occurring marine bacterioplankton. The bacterial rhodopsin
was encoded in the genome of an uncultivated gammaproteobacterium and shared highest amino acid sequence
similarity with archaeal rhodopsins. The protein was
functionally expressed in Escherichia coli and bound retinal
to form an active, light-driven proton pump.”
Rhodopsins in Archaea
Sensory rhodopsin → methyl-accepting chemotaxis protein → CheA → CheY →
flagellar switch → change in swimming
Bacteriorhodopsin: light-driven proton
pump in Halobacterium salinarum
Bacteriorhodopsin
Photosynthetic reaction center:
Photosynthesis based on light-driven electron transfer
from (bacterio)chlorophyll
Bacteriorhodopsin/proteorhodopsin:
Photosynthesis based on photoisomerization of retinal
Beja et al., 2001. Proteorhodopsin phototrophy in the
ocean. Nature 411: 786-789.
“Here we report that photoactive proteorhodopsin is
present in oceanic surface waters. We also provide evidence of
an extensive family of globally distributed proteorhodopsin
variants. The protein pigments comprising this rhodopsin
family seem to be spectrally tuned to different habitats, absorbing
light at different wavelengths in accordance with
light available in the environment. Together, our data suggest
that proteorhodopsin-based phototrophy is a globally
significant oceanic microbial process.”
Color adaptation in proteorhodopsin
Metagenomics
Sequencing of many genomes simultaneously using
environmental DNA samples
Unsolved challenge: assembling data into distinct genomes
Example: Venter et al., 2004. Environmental Genome Shotgun
Sequencing of the Sargasso Sea. Science 304: 66-74.
“We have applied "whole-genome shotgun sequencing" to
microbial populations collected en masse on tangential flow and
impact filters from seawater samples collected from the Sargasso
Sea near Bermuda. A total of 1.045 billion base pairs of
nonredundant sequence was generated, annotated, and analyzed
to elucidate the gene content, diversity, and relative abundance of
the organisms within these environmental samples. These data
are estimated to derive from at least 1800 genomic species based
on sequence relatedness, including 148 previously unknown
bacterial phylotypes. We have identified over 1.2 million
previously unknown genes represented in these samples,
including more than 782 new rhodopsin-like photoreceptors.”
Proteorhodopsin is ubiquitous in the ocean
Summary on proteorhodopsin
In ~5 years bacteriorhodopsin changed from a model
system for proton pumping in halophilic archaea to a
protein (proteorhodopsin) that contributes significantly
to global photosynthetic activity in marine
proteobacteria.