Transcript Document
Coming Soon to
a Lab Near you…
AST734
Andrew Boal
25 November 2004
Image credits: NASA, NPS,
and Protein Data Bank
Extreme environments and astrobiology
Numerous extreme terrestrial habitats are seen as potential
analogs to life-bearing niches in the solar system
Extreme environments are those which exist outside the conditions of
a “mesophilic environment” (T ~30-40 °C, salt concentration <3%, etc.)
Terrestrial examples include hot
springs (high temp.), salt lakes
(high salt), deep sea vents (high
pressure), deserts (low water)
These “extreme” environments
might model conditions found on
Mars, Europa, Titan, elsewhere
Image credits: NASA, NPS
Extreme environments: microbes in residence
Extremophiles are defined by the type of environment
required for growth
There is no overall consensus on the definition of an extreme environment
Organisms that can survive in an extreme environment but do not require
those conditions for growth are extremotolerant
Mesophile: Lives in an ambient environment
Thermophile: Temp. > ~45 °C
Psychrophile: Temp. < ~20 °C
Barophile: High pressure
Xerophile: Low water content
Halophile: Salt content > 3-10%
Acidophile: pH < 5
Alkaliphile: pH > 9
Radiophile: high amounts of radiation
Image credit: CDC
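As a rough illustration, the thresholds listed on this slide can be written as a small Python classifier. This is a minimal sketch and not a standard definition: the function name `classify_environment` is hypothetical, the lower bound of 3% is assumed as the halophile cutoff, and any set of conditions that triggers no label is simply called mesophilic.

```python
def classify_environment(temp_c=None, salt_percent=None, ph=None):
    """Return the extremophile labels suggested by the slide's rough cutoffs.

    The cutoffs are approximate (the slide marks them with "~"), and the
    choice of 3% salt as the halophile threshold is an assumption.
    """
    labels = []
    if temp_c is not None:
        if temp_c > 45:
            labels.append("thermophile")
        elif temp_c < 20:
            labels.append("psychrophile")
    if salt_percent is not None and salt_percent > 3:
        labels.append("halophile")
    if ph is not None:
        if ph < 5:
            labels.append("acidophile")
        elif ph > 9:
            labels.append("alkaliphile")
    # Assumption: conditions with no extreme label are treated as mesophilic.
    return labels or ["mesophile"]


# A hot, acidic spring versus an ordinary laboratory culture.
print(classify_environment(temp_c=80, ph=2))
print(classify_environment(temp_c=35, salt_percent=1, ph=7))
```

Pressure, water content, and radiation are left out because the slide gives no numerical cutoffs for them.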
Biogeography
Biogeography is the study of the environmental distribution of
species
One can explore several isolated, analogous extreme environments,
which may not allow transport of microbes between them, to develop
a better understanding of microbial evolution
Map credit: CIA World Factbook,
Image credits: NPS
But, what about a deeper look?
Molecular components of cells
The predominant components of the molecular makeup of
cells include lipids, nucleotides, and proteins
Nucleotides: protein
blueprints and fabrication
Lipids:
provide cell
membranes
Proteins:
do the
work of
the cell
The ability of these molecules to function is directly related to
molecular shape, which is influenced by the environment, so…
Biomolecular structural endemism
The Big Questions:
Are there
molecular
structures which
are endemic in
an environment?
If so, how and
why are those
structures
arrived at?
Photo Credits: National Park
Service Web pages
What are biomolecules?
Biomolecular structure
Biomolecular structure is determined by a combination of
covalent and noncovalent bonds
Covalent bonds are static entities which are little affected by environment
Noncovalent bonds (hydrophobic interactions, hydrogen bonding, and
electrostatic attraction) exist in a dynamic equilibrium, and thus can be
attenuated by factors such as temperature, ion content, and pH
Biomolecules must be both somewhat flexible and somewhat rigid to
function properly; the forces that hold the molecular shape must
therefore strike a balance with the environment
Too static: function is compromised
Balance: structure and function preserved
Too dynamic: structure is compromised
Lipid structure
Lipids are made up of a hydrophilic (water-loving) head group
and a hydrophobic (water-fearing) tail
[Figure: chemical structure of a phospholipid, labeled with the hydrophilic head group (charged ammonium and phosphate groups) and the hydrophobic tail (long hydrocarbon chains)]
In cell membranes, lipids pack to form a bilayer
so that the heads are in water and the tails are
mixed together
Lipids in thermal environments
Lipids from thermophilic archaea have a dramatically different
chemical structure
[Figure: chemical structures of a mesophile lipid bilayer and a thermophile archaea bilayer built from hyperthermophile archaea lipids]
Increased hydrocarbon branching gives increased hydrophobicity
Head-tail linkages are ethers, not esters, and are chemically more robust
The backbones of both layers are chemically connected, again increasing stability
DNA and RNA: chemistry
DNA and RNA are polymers of nucleotides (oligo- or
polynucleotide)
Nucleotides are composed of a nucleobase attached to a sugar, which is linked into the phosphate backbone
Nucleobases: cytosine, thymine (DNA only), uracil (RNA only), adenine, and guanine
Sugars: ribose is in RiboNucleic Acid (RNA); deoxyribose is in DeoxyriboNucleic Acid (DNA)
[Figure: chemical structures of the five nucleobases, each attached to a sugar, and of the two sugars with their links to the nucleobase and the backbone]
Nucleobases are cyclic structures
which are basic (like ammonia)
The extra -OH (alcohol) in
ribose makes RNA much less
chemically stable than DNA
DNA and RNA: polynucleotide structure
DNA and RNA structure is based on hydrophobic interactions
and hydrogen bonding
Hydrogen bonding is a weak interaction where two
electronegative elements “share” a hydrogen atom (note that
carbon-hydrogen bonds do not partake in hydrogen bonding)
The center of the duplex is hydrophobic
The polynucleotide backbone has charged phosphate groups, which are hydrophilic
[Figure: thymine:adenine (T:A) and guanine:cytosine (G:C) base pairs; dashed lines indicate hydrogen bonds]
DNA: secondary structure
Base pairing determines the nature of the secondary
structure
The basic elements of DNA secondary structure are the duplex
(which is by far the most prevalent), the junction, and the hairpin
[Figure: example sequences folded into a hairpin, a duplex, and a junction]
DNA melting
One of the easiest ways to measure DNA stability is to obtain
a “melting curve”, which is a spectroscopic measurement of
duplex unzipping
[Figure: example of a DNA melting curve obtained spectroscopically]
[Figure: representation of DNA melting by duplex unzipping or unwinding]
Figure taken from: Drukker, K., et al. J. Phys. Chem. B 2000, 104, 6108-6111
Stability of DNA in extreme environments
The main determinant of DNA stability is the fraction of G:C base
pairs in a given oligonucleotide sequence
The primary difference between an A:T and a G:C base pair is that
A:T has two hydrogen bonds while G:C has three, so G:C is more stable
[Figure: melting temperature Tm (°C, axis from 40 to 90) plotted against DNA G:C content fGC (axis from 0.1 to 0.9) at 69 mM, 220 mM, and 1020 mM NaCl, with the structures of the A:T and G:C base pairs]
Data taken from: Owczarzy, R., et al. Biochemistry 2004, 43, 3537-3554.
Other factors include hydrophobicity and
interaction between salt and the DNA backbone
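The G:C fraction on this slide is easy to compute from a sequence. The following is a minimal Python sketch: the function names are hypothetical, and the hydrogen-bond count assumes a perfectly complementary Watson-Crick duplex (two bonds per A:T pair, three per G:C pair). It does not predict a melting temperature, which also depends on salt and hydrophobic effects as noted above.

```python
def gc_fraction(sequence):
    """Fraction of G and C bases in one strand of a DNA sequence."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence must not be empty")
    return sum(base in "GC" for base in sequence) / len(sequence)


def duplex_hydrogen_bonds(sequence):
    """Hydrogen bonds in a duplex of this strand and its exact complement.

    Assumes ideal Watson-Crick pairing: A:T has 2 bonds, G:C has 3.
    Raises KeyError for any base other than A, T, G, or C.
    """
    bonds_per_base = {"A": 2, "T": 2, "G": 3, "C": 3}
    return sum(bonds_per_base[base] for base in sequence.upper())


# Two hypothetical 4-mers of equal length but different G:C content.
for strand in ("AATT", "GGCC"):
    print(strand, gc_fraction(strand), duplex_hydrogen_bonds(strand))
```

For strands of equal length, a higher G:C fraction means more hydrogen bonds across the duplex, which is the trend the melting data illustrate.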
The many faces of RNA
RNA is primarily involved in protein synthesis and comes in
three major types:
Ribosomal RNA (rRNA) forms the skeleton of the ribosome, the machine which makes proteins
Messenger RNA (mRNA) is made by transcription of DNA and lists the amino acid sequence of a protein
Transfer RNA (tRNA) transports amino acids into the ribosome
[Figure labels: growing protein, amino acid]
Structure of tRNA
tRNA is a good molecule to explore for environmental studies
tRNA molecules are usually fairly small (fewer than 100 nucleotide monomers)
tRNA has a relatively simple secondary structure
tRNA usually exists as a free molecule in the cell
[Figure: secondary structure of a tRNA sequence]
Like DNA, RNA secondary structure has elements such as duplexes,
loops, bulges, and hairpins
Stability of tRNA
Like that of DNA, the stability of tRNA can be measured spectroscopically,
but it can also be calculated
Calculated free energy is obtained by factoring in the strength of
noncovalent interactions in a folded and unfolded tRNA and is expressed
as the free energy of complex formation, ∆Gf (NOTE: a lower ∆Gf value
indicates increased stability, because formation is more favorable)
Initial ∆Gf values and predicted secondary structure can be calculated
from raw sequence data:
Calculated ∆Gf values of the GGC codon tRNA from E. coli and T. acidophilum
E. coli: ∆Gf = -28.9 kcal/mol
T. acidophilum: ∆Gf = -30 kcal/mol
Thermoplasma acidophilum: GGGCCGGTAGATCAGAGGTAGATCGCTTCCTTGGCATGGAAGAGGcCAGGGGTTCAAATCCCCTCCGGTCCA
E. coli: GGGGCTATAGCTCAGCTGGGAGAGCGCTTGCATGGCATGCAAGAGGtCAGCGGTTCGATCCCGCTTAGCTCCACCA
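To put the two calculated values in perspective, the difference in ∆Gf can be converted into a ratio of folding equilibrium constants with the standard relation ∆G = -RT ln K. The Python sketch below is illustrative only: the function names are hypothetical, and the temperature of 298.15 K is an assumption, since the slide does not state the temperature used in the calculation.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol*K)

# Calculated values quoted on the slide, in kcal/mol.
DELTA_G_F = {"E. coli": -28.9, "T. acidophilum": -30.0}


def stability_difference(organism_a, organism_b):
    """Return dGf(a) - dGf(b); a negative result means a is more stable."""
    return DELTA_G_F[organism_a] - DELTA_G_F[organism_b]


def equilibrium_ratio(ddg_kcal, temp_k=298.15):
    """Ratio K_a / K_b implied by a free energy difference (dG = -RT ln K).

    The default temperature is an assumption, not a value from the slide.
    """
    return math.exp(-ddg_kcal / (R_KCAL_PER_MOL_K * temp_k))


ddg = stability_difference("T. acidophilum", "E. coli")
print(f"ddG = {ddg:.1f} kcal/mol, K ratio = {equilibrium_ratio(ddg):.1f}")
```

Because the T. acidophilum value is quoted to fewer significant figures than the E. coli value, the resulting ratio should be read as an order-of-magnitude indication rather than a precise number.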
Proteins: amino acids and primary structure
Primary structure is determined by covalent amide bonds
between individual amino acids
Examples of amino acids (structures labeled with the amino functionality, the acid functionality, and the side chain R):
Glutamic acid (E, Glu): hydrophilic, negative
Lysine (K, Lys): hydrophilic, positive
Serine (S, Ser): hydrophilic
Alanine (A, Ala): slightly hydrophobic
Leucine (L, Leu): strongly hydrophobic
Side chain “R-group”: defines the chemical and physical nature of the amino acid
A peptide or protein is a chain of 10-1000 amino acids
Proteins: secondary structure
Secondary structures (folds) are defined by hydrogen
bonding and steric interactions of the side chain
The α-helix is a coil of a peptide chain and has 3.6 residues per helical turn
Primary interactions are hydrogen bonding between residues along the helical axis and steric interactions between side chains
The β-sheet is a linear arrangement of amino acids
Structure is defined by interstrand hydrogen bonds, less by sterics of side chains
Sheets can be parallel or anti-parallel, defined by orientation of the backbone
Other, but far less common, peptide folds include the coiled-coil, random
coil, bulge, β-turn, 3₁₀ helix, 2₇ helix, π-helix, β-barrel, and so on…
Proteins: tertiary and quaternary structure
Tertiary and quaternary structure are defined almost entirely by
noncovalent interactions
Tertiary Structure: the overall shape of a folded protein
Quaternary Structure: the assembly of multiple protein
units into a larger structure
Top View
Side View
Protein model systems: helices
The α-helix is a common protein structural element which
can be readily studied
α-Helices are the secondary structural element most susceptible
to sequence and environmental factors, and the stability of helices is
related to the stability of the overall protein
Like DNA melting, helix (and protein) stability is related to a structural denaturation
Structural stability is measured by spectroscopically observing helix unfolding
Graph taken from: Whitington, S. J., et al. Biochemistry 2003, 42, 14690-14695.
As for tRNA, ∆Gf can be calculated for helices, or it can be measured
using circular dichroism spectroscopy by employing the relationship
∆Gf = -RT ln K, where K is the equilibrium constant obtained from the spectrum
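One common way to obtain K from a spectrum is a two-state analysis, sketched below in Python. This is an assumption about the method and is not taken from the slide: the observed signal is treated as a weighted average of a fully folded and a fully unfolded baseline, the folded fraction f gives K = f / (1 - f), and the free energy follows from ∆G = -RT ln K. All function names and signal values are hypothetical.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol*K)


def fraction_folded(signal, signal_folded, signal_unfolded):
    """Folded fraction from an observed signal and two baseline signals."""
    return (signal - signal_unfolded) / (signal_folded - signal_unfolded)


def delta_g_formation(signal, signal_folded, signal_unfolded, temp_k):
    """Free energy of folding in kcal/mol under a two-state assumption."""
    f = fraction_folded(signal, signal_folded, signal_unfolded)
    if not 0.0 < f < 1.0:
        raise ValueError("signal must lie strictly between the two baselines")
    k = f / (1.0 - f)  # equilibrium constant for unfolded -> folded
    return -R_KCAL_PER_MOL_K * temp_k * math.log(k)


# Hypothetical signal values in arbitrary units, measured at 25 degrees C.
dg = delta_g_formation(signal=-27.0, signal_folded=-30.0,
                       signal_unfolded=0.0, temp_k=298.15)
print(f"dG = {dg:.2f} kcal/mol")
```

At the midpoint of a melting transition f is 0.5, K is 1, and ∆G is zero, which is why the melting temperature is a convenient single-number summary of stability.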
Example of environment related structural
differences
One example is the study of the helices of RecA
RecA is a protein involved with DNA repair, cell division, and other
processes and is found in all environments
[Figure: crystal structure of RecA from E. coli]
RecA sequences from 29 proteins were aligned with that of E. coli, allowing for the determination of helical fragments
The crystal structure of RecA from E. coli was used as a template
There are 10 helical regions in RecA
∆Gf values for these sequences were calculated and analyzed
This work was published as: Petukov, M. et al. Proteins: Structure, Function,
and Genetics 1997, 29, 309-320
Thermophile helices are more stable
Calculated ∆Gf values indicated that helices of thermophile
origin were more stable than mesophile helices
Eight of the thermophile helices were found to be more stable; these
helices are likely related to STRUCTURAL stability
No change was found for two helices, both of which are directly involved
in interactions with DNA and other proteins; these helices likely need to
retain flexibility for FUNCTIONAL stability
[Figure: total helix ∆Gf at 20 °C, 37 °C, and 80 °C for T. thermophilus (80 °C), E. coli (37 °C), and P. aeruginosa (20 °C)]
Interestingly, total helix stability was found to be the same if the
optimal temperature for protein activity is taken into account; this is again
related to the need for molecular flexibility
Biomolecular structural endemism
The Big Questions:
Are there
molecular
structures which
are endemic in
an environment?
If so, how and
why are those
structures
arrived at?
Photo Credits: National Park
Service Web pages
Study roadmap
Bioinformatics
Develop comprehensive
listing of known protein/RNA
sequences from public
database
Search for environment-specific structural elements
Sample Collection
Identify environments for
study (Hawaii lakes, Chile:
Andes and Patagonia?)
Travel/Sample Collection/
Data Analysis
Model Studies
Synthesis of short RNA and peptide sequences
Study structure of these molecules in lab-generated
extreme (thermal/salt/pressure) environments
Computer models of these systems
Environments to be explored
Initial work will be carried out in Hawaiian lakes
These include Lake Kauhako (Moloka’i), Lake Wai’ele’ele (Maui), Green
Lake and Lake Waiau (Hawai’i)
These lakes are relatively accessible and will provide a ready data set
that we will use to develop our sampling and analysis methodologies
This data set will also establish part of the mesophile baseline
South American Lake Environments
South America, specifically the Andes and Patagonia, has
numerous extremophilic environments
South American lakes are less well studied from the
biogeographical viewpoint, so we will be able to describe new
environments
These environments are also geographically isolated from other
extreme environments, which will allow for greater geographic variability
Other possible environments include deep sea trenches and
subglacial lakes (UH collaborations)
What we will look at: “adaptive” proteins
Proteins which serve a function adapted to the environment
Antifreeze protein: inhibits ice
crystal formation
Mechanosensitive
channel: responds
to osmotic stress
Potassium Channel:
transports K+ into the cell
ATP sulfurylase: critical in sulfate-reducing bacteria
What we will look at: “conserved” proteins
Conserved proteins are those which would be expected to be
similar across environments because their function is ubiquitous
ATPase:
synthesis of
ATP, a cell
energy
source
DNA gyrase:
involved in DNA
packaging
Pyruvate kinase: involved in
glycolysis
Rhodopsin: light sensing and transduction
Planned Methodologies: Bioinformatics and
Sample Collection
Bioinformatics is the term used to describe the mining of
biological sequence and structural databases
The initial work here will be to develop a database of molecular
sequences correlated with the organism of origin (which will tell us
the nature of the environments they came from)
These sequences will then be examined for environment-specific
structural motifs
This database will help to establish environmental targets and can
be modified by biogeographical studies
Data that will be collected in the environment
Environmental DNA- will be used to establish the biodiversity of a
site as well as provide information regarding molecular sequences
Physical factors will also be taken into account, including
temperature, salinity, nutrient composition, etc.
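The database described on this slide could be organized roughly as follows. This is a minimal Python sketch under stated assumptions: the `SequenceRecord` fields and the `group_by_environment` helper are hypothetical, the sequence strings are placeholders and not real data, and only the organism names and growth temperatures are taken from the slides.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SequenceRecord:
    molecule: str          # e.g. "RecA" or "tRNA"
    organism: str          # organism of origin
    environment: str       # environment label inferred from the organism
    growth_temp_c: float   # optimal growth temperature in degrees C
    sequence: str          # protein or RNA sequence


def group_by_environment(records):
    """Collect records under their environment label for later comparison."""
    groups = defaultdict(list)
    for record in records:
        groups[record.environment].append(record)
    return dict(groups)


# Placeholder sequences only; real entries would come from public databases.
records = [
    SequenceRecord("RecA", "E. coli", "mesophile", 37.0, "PLACEHOLDER"),
    SequenceRecord("RecA", "T. thermophilus", "thermophile", 80.0, "PLACEHOLDER"),
    SequenceRecord("tRNA", "E. coli", "mesophile", 37.0, "PLACEHOLDER"),
]

for environment, members in group_by_environment(records).items():
    print(environment, [m.molecule for m in members])
```

Keeping the physical factors on each record leaves room to add field measurements from sample collection alongside the database entries.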
Methodologies: Model systems and computations
Synthesis and physical or computational characterization of
model and natural peptides or nucleotide sequences
helices
tRNA
sequences
More complicated peptides
such as the helix bundle
(common in membrane
proteins)
These studies will provide us with a numerical quantity (∆Gf) for stability as
well as molecular-level insights into the mechanism of stability
Other variants of this work include the study of the folding of proteins
isolated from the environment and the study of peptide-oligonucleotide
interactions
Bringing it all together
We will attempt to establish a relationship between the
physical environment, biodiversity, and molecular structure
One way this can be accomplished is to generate plots of
stability vs. structural similarity for individual environments
[Figure: schematic plot with axes “Increasing structural similarity” and “Increasing stability or protein activity”. One range indicates the stability window; the other indicates the variance of structures which are capable of surviving.]
A small stability range would
indicate that there are rigorous
energetic requirements
A small structural similarity range
would indicate environment-specific
structures
If both values are small, it may
indicate that structures evolved to
meet the specific requirements of
that environment
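The two ranges discussed on this slide can be estimated from a set of sequences and their ∆Gf values. The Python sketch below is one possible reading, not the planned method: structural similarity is approximated by pairwise percent identity of pre-aligned, equal-length sequences, and all names and input values are hypothetical.

```python
from itertools import combinations


def percent_identity(seq_a, seq_b):
    """Percent of matching positions between two aligned sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, equal-length, non-empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def environment_windows(samples):
    """Stability and similarity ranges for one environment.

    samples is a list of (aligned_sequence, delta_g_f) pairs.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to compare")
    energies = [delta_g for _, delta_g in samples]
    identities = [percent_identity(a, b)
                  for (a, _), (b, _) in combinations(samples, 2)]
    return {
        "stability_range": max(energies) - min(energies),
        "similarity_range": max(identities) - min(identities),
    }


# Hypothetical aligned fragments with made-up free energies in kcal/mol.
samples = [("AAAA", -5.0), ("AAAT", -4.0), ("AATT", -4.5)]
print(environment_windows(samples))
```

Small values for both ranges in one environment would correspond to the case described above, in which structures appear to have converged on that environment's specific requirements.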