
Protein Structure and Function
CHAPTER4.
From Sequence to Function
: Case Studies in Structural
and Functional Genomics
4-0. Overview
: From Sequence to Function in the Age of Genomics

Genomics is making an increasing contribution to
the study of protein structure and function
-Many computational and experimental tools are now
available.
-Different experimental methods are required to define a
protein’s function.
-In this chapter : methods of comparing amino-acid
sequences to determine their similarity and to search for
related sequences in the sequence databases, and
methods of predicting a protein’s function from its structure.
4-0. Overview
: From Sequence to Function in the Age of Genomics
Figure4-1.Time and distance scales in functional genomics
4-1. Sequence Alignment and Comparison
Sequence comparison provides a measure of the relationship between genes
-Homologous : genes or proteins related by divergent evolution from a
common ancestor.
-Homology : evolutionary similarity between them.
Alignment is the first step in determining whether two sequences are similar to
each other
-Alignment : comparing two or more sequences.
-Insertions and deletions sometimes mean that one sequence must be slid
relative to the other to bring matching residues into register. Sliding creates
gaps.
Figure4-2. Pairwise alignment
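A minimal sketch of how a global pairwise alignment with gaps can be computed by dynamic programming (the Needleman-Wunsch approach). The scoring values (match +1, mismatch -1, gap -2), the function names and the test sequences are illustrative assumptions, not taken from the slides; real tools use substitution matrices and more elaborate gap penalties.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def sub(i, j):
        # Score for pairing a[i-1] with b[j-1] (table indices are 1-based).
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a, aligned_b):
    """Identical positions as a percentage of alignment length (one common convention)."""
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * same / len(aligned_a)


score, top, bottom = global_align("ACGT", "AGT")
print(top)
print(bottom)
print(score, percent_identity(top, bottom))
```

The gap character in the second sequence marks the position where the shorter sequence had to be slid to stay in register with the longer one.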
4-1. Sequence Alignment and Comparison
- E-value : the probability that an
alignment score as good as the
one found between two
sequences would arise by chance,
that is, in a comparison of two
random sequences.
- Up to an E-value of
approximately 10^-10, the likelihood
of an identical function is
reasonably high, but then it starts
to decrease substantially.
Figure4-3. Plot of percentage of protein pairs having the same biochemical
function as sequence changes
4-1. Sequence Alignment and Comparison
Multiple alignments and phylogenetic trees
-The alignment process can be expanded to give a multiple sequence alignment.
-Any residue, or short stretch of sequence, that is identical in all sequences in a
given set is said to be CONSERVED.
Figure4-4. Multiple alignment
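As a small illustration of the definition of a conserved residue, the sketch below scans the columns of an already-computed multiple alignment and reports those where every sequence carries the same residue. The function name and the three toy sequences are assumptions made for the example.

```python
def conserved_columns(alignment):
    """Return 0-based indices of columns that are identical in all sequences."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    conserved = []
    for col in range(length):
        residues = {seq[col] for seq in alignment}
        # A column of gaps is not counted as a conserved residue.
        if len(residues) == 1 and "-" not in residues:
            conserved.append(col)
    return conserved


# Toy alignment (hypothetical sequences, '-' marks a gap).
toy = ["MKV-LG",
       "MRVALG",
       "MKI-LG"]
print(conserved_columns(toy))
```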
4-1. Sequence Alignment and Comparison
-Multiple sequence alignments of
homologous proteins or gene sequences
from different species are used to derive a
so-called evolutionary distance.
-These distances can be used to
construct phylogenetic trees that attempt
to reflect evolutionary relationships
between species.
Figure4-5. Phylogenetic tree comparing the three major MAP kinase subgroups
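One simple way to turn a multiple alignment into evolutionary distances is the fraction of differing aligned positions (often called the p-distance). The sketch below is an assumption-laden illustration: the species names and sequences are hypothetical, and the slides do not say which distance measure or tree-building method was used for the figure.

```python
def p_distance(seq_a, seq_b):
    """Fraction of differing positions, skipping columns with a gap in either sequence."""
    compared = 0
    differing = 0
    for x, y in zip(seq_a, seq_b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared


def distance_matrix(alignment):
    """Map every ordered pair of names to its p-distance."""
    names = list(alignment)
    return {(a, b): p_distance(alignment[a], alignment[b])
            for a in names for b in names}


# Hypothetical aligned sequences from three species.
aligned = {
    "species_A": "MKVLG",
    "species_B": "MKILG",
    "species_C": "MRIAG",
}
for pair, d in distance_matrix(aligned).items():
    print(pair, round(d, 2))
```

A matrix of this kind is the usual input to distance-based tree-building methods; the smaller the distance, the closer the two sequences are placed in the tree.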
4-2. Protein Profiling
Structural data can help sequence comparison find related proteins
-Straightforward sequence alignment does not
indicate any relationship between the
prokaryotic and eukaryotic domain.
-However, when the alignment is performed
by comparing residues in the corresponding
secondary structure elements of the
prokaryotic and eukaryotic domains, some
regions of sequence conservation appear.
Figure4-6. Some Examples of Small Functional Protein Domains
4-2. Protein Profiling
Sequence and structural motifs and patterns can identify proteins with similar
biochemical functions
-Sometimes, only a part of a protein sequence can be aligned with that of
another protein.
-Local alignment can identify a functional module within a protein.
-These function-specific blocks of sequence are called functional motifs.
-Two broad classes of motif :
   short, contiguous motifs = usually specify binding sites
   discontinuous or non-contiguous motifs = usually specify catalytic sites
4-2. Protein Profiling
Figure4-7. Representative examples of short contiguous binding motifs
4-2. Protein Profiling
PSI-BLAST : position-specific iterated
BLAST.
(Figure labels : amino acid position, five sequences, probability for Cys)
Figure4-8. Construction of a profile
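The idea of a profile can be sketched as a table of per-position residue frequencies computed from a set of aligned sequences. The five sequences below are hypothetical, and this is a simplification: the scoring matrices built by PSI-BLAST also involve weighting and log-odds scoring, which are not shown here.

```python
from collections import Counter


def build_profile(alignment):
    """Return one dict per column mapping residue -> observed frequency."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment)
        total = sum(counts.values())
        profile.append({res: n / total for res, n in counts.items()})
    return profile


# Five hypothetical aligned sequences, four positions each.
five_sequences = ["CAKG", "CSKG", "CAHG", "TAKG", "CARG"]
profile = build_profile(five_sequences)

# Probability of finding Cys at each position (0.0 where it never occurs).
print([column.get("C", 0.0) for column in profile])
```

A new sequence can then be scored against the profile position by position, so residues that are common at a position count for more than residues that are rare there.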
4-3. Deriving Function from Sequence
Sequence information is increasing exponentially
- The growth of sequence
information is exponential, and
shows no sign of slowing down.
Figure4-9.The growth of DNA and protein sequence information collected
by GenBank over 20 years
4-3. Deriving Function from Sequence
- As one proceeds from
prokaryotes to eukaryotes, and
from single-celled to multicellular
organisms, the number of genes
increases markedly.
Figure4-10.Table of the size of the genomes of some representative organisms
4-3. Deriving Function from Sequence
In some cases function can be inferred from sequence
- If a protein has more
than about 40% sequence
identity to another protein
whose biochemical
function is known, and if
the functionally important
residues are conserved
between them, the two proteins
can reasonably be assumed to
share a biochemical function.
Green : non-enzymatic
Blue : enzymatic
Figure4-11. Relationship of sequence similarity to similarity of function
4-3. Deriving Function from Sequence
-Local alignments of functional motifs in the sequence can often identify at least
one biochemical function of a protein. (Ex. Helix-turn-helix, zinc finger motifs)
- Walker motif : ATP or GTP binding motif.
Figure4-12.The P loop of the Walker motif
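A short contiguous motif such as the P loop can be searched for with a pattern match. The sketch below assumes the commonly quoted Walker A consensus [AG]-x(4)-G-K-[ST]; the consensus is not spelled out on the slide, and the test sequence is only an illustrative string.

```python
import re

# Walker A (P-loop) consensus, assumed here as [AG]-x(4)-G-K-[ST].
WALKER_A = re.compile(r"[AG].{4}GK[ST]")


def find_p_loops(sequence):
    """Return (0-based start, matched text) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in WALKER_A.finditer(sequence.upper())]


# Illustrative sequence containing one P-loop-like stretch.
print(find_p_loops("MTEYKLVVVGAGGVGKSALTIQ"))
```

A match of this kind only suggests nucleotide binding; short patterns also occur by chance, so a hit is a hypothesis to be checked rather than a proof of function.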
4-3. Deriving Function from Sequence
-Sequence comparison is an active area of research because it is now the easiest
technique to apply to a new protein sequence.
-A large proportion of functions are inferred only from overall sequence similarity to known proteins.
Figure4-13. Analysis of the functions of the protein-coding sequences in the yeast genome
4-4. Experimental Tools for Probing Protein Function
Gene function can sometimes be established experimentally without
information from protein structure or sequence homology
-Experience suggests that genes of similar function often display similar
patterns of expression.
-Expression can be measured at the level of mRNA or protein.
-The mRNA-based techniques :
DNA microarrays and SAGE
- Microarray technology can provide
expression patterns for up to 20,000
genes at a time.
Figure4-14. DNA microarray
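The observation that genes of similar function often show similar expression patterns can be made quantitative by correlating expression profiles. The sketch below uses the Pearson correlation coefficient; the gene names and expression values are invented for illustration, and the slides do not specify which similarity measure is used in practice.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / sqrt(var_x * var_y)


# Hypothetical expression levels of three genes across four conditions.
gene_a = [1.0, 2.0, 4.0, 8.0]
gene_b = [2.0, 4.0, 8.0, 16.0]   # rises together with gene_a
gene_c = [8.0, 4.0, 2.0, 1.0]    # falls while gene_a rises

print(round(pearson(gene_a, gene_b), 3))
print(round(pearson(gene_a, gene_c), 3))
```

Genes whose profiles correlate strongly across many conditions are candidates for a shared pathway or function.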
4-4. Experimental Tools for Probing Protein Function
-High-throughput monitoring of
protein expression can be
achieved by two-dimensional gel
electrophoresis (2D GE).
- Protein spots can be identified
by mass spectrometry.
- 2D GE can detect the amount
of a protein and its modifications.
-But it is slow and expensive.
-It can fail to detect proteins that
are only present in a few copies
per cell.
Figure4-15. 2-D protein gel
4-4. Experimental Tools for Probing Protein Function
-The phenotype produced by inactivating a gene, a gene knockout, is highly
informative about the cellular pathway in which the gene product operates.
-A knockout can be obtained by classical mutagenesis, targeted mutation, RNA
interference, the use of antisense message RNA, or antibody binding.
Figure4-16. The phenotype of a gene knockout can give clues to the role of the gene
4-4. Experimental Tools for Probing Protein Function
-The location of a protein in the cell often provides a valuable clue to its functions.
- Technique : attachment of a tag sequence to the gene in question. A commonly
used method is to fuse the sequence encoding GFP (green fluorescent protein) to the gene.
Figure4-17. Protein localization in the cell
4-4. Experimental Tools for Probing Protein Function
- Interacting proteins can be
found with the yeast two-hybrid system.
-Two distinct domains are
necessary to activate transcription
in yeast.
①. A DNA-binding domain, DBD (binds to
the promoter)
②. An activation domain, AD
- Protein A is fused to the DBD and protein Y is fused to the AD.
- If proteins A and Y interact with each other, the DBD and AD are brought close
together and transcription starts.
Figure4-18.Two-hybrid system for finding interacting proteins
4-5. Divergent and Convergent Evolution
-In general, if the overall identity
between the two sequences is
greater than about 40%, they will
code for proteins of similar fold.
-Rmsd : root-mean-square
difference in spatial positions of
backbone atoms.
Figure4-19. Relationship between sequence and structural divergence of proteins
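The rmsd defined above can be computed directly from two lists of matching atom coordinates. The sketch below assumes that the two structures have already been superimposed and that the atoms are listed in the same order; the coordinates are made-up values in ångströms.

```python
from math import sqrt


def rmsd(coords_a, coords_b):
    """Root-mean-square difference between two equal-length lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return sqrt(total / len(coords_a))


# Two hypothetical two-atom "backbones", the second shifted by 1 Å along z.
backbone_1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
backbone_2 = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(backbone_1, backbone_2))
```

In practice the optimal superposition is found first, because the rmsd of two unaligned coordinate sets mostly reflects their arbitrary placement in space rather than structural divergence.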
4-5. Divergent and Convergent Evolution
Benzoylformate decarboxylase
Pyruvate decarboxylase
Low seq.similarity
Similar structure
Proteins with low sequence similarity but very similar overall structure and active
sites are likely to be homologous
Figure4-20. Ribbon diagram of the structure of a monomer of
benzoylformate decarboxylase (BFD) and pyruvate decarboxylase (PDC)
4-5. Divergent and Convergent Evolution
Divergent evolution can produce proteins
with sequence and structural similarity but
different function
Similar structure
Different function
-Steroid delta-isomerase
-Nuclear transport factor2
-Scytalone dehydratase
Figure4-21. Superposition of the three-dimensional structures of steroid delta-isomerase, nuclear transport factor-2 and scytalone dehydratase
4-6. Structure from Sequence
: Homology Modeling
Homology modeling is used to
deduce the structure of a
sequence with reference to the
structure of a close homolog
-Upper region : sequence similarity is
likely to yield enough structural
similarity for homology modeling.
-Lower region : homology modeling is
highly problematic.
Figure4-22.The threshold for structural homology
4-6. Structure from Sequence
: Homology Modeling
Conservation is measured by Gstat
- High value = more conserved
Homology modeling
Integral membrane protein rhodopsin
with the cluster of conserved
interacting residues (red)
identified on the basis of conservation
Figure4-23. Evolutionary conservation and interactions between residues
in the protein-interaction domain PDZ and in rhodopsin
4-6. Structure from Sequence
: Homology Modeling
Plasminogen (blue) and
chymotrypsinogen (red) are very
similar.
Chymotrypsin (green),
plasminogen (blue) and
chymotrypsinogen (red) have different
active-site conformations.
Figure4-24. Structural changes in closely related proteins
4-7. Structure from Sequence
: Profile-Based Threading and “Rosetta”
Profile-based threading tries to predict the
structure of a sequence even if no sequence
homologs are known
-A computer program forces the sequence
to adopt every known protein fold in turn,
and in each case a scoring function is
calculated that measures the suitability of
the sequence for that particular fold.
-A very high Z-score for one fold indicates that
the sequence almost certainly adopts that
fold.
Figure4-25.The method of profile-based threading
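The Z-score step can be sketched as follows: each fold's raw score is expressed as the number of standard deviations it lies above the mean score over all folds tried. The fold names and raw scores are hypothetical, and the sketch assumes that a higher raw score means a better fit; actual threading programs define their own scoring functions.

```python
from statistics import mean, pstdev


def z_scores(fold_scores):
    """Map fold name -> (score - mean) / standard deviation over all folds."""
    values = list(fold_scores.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("all folds scored identically; Z-scores are undefined")
    return {fold: (s - mu) / sigma for fold, s in fold_scores.items()}


# Hypothetical raw threading scores for one query sequence.
raw = {"fold_A": 10.0, "fold_B": 12.0, "fold_C": 11.0, "fold_D": 30.0}
z = z_scores(raw)
best = max(z, key=z.get)
print(best, round(z[best], 2))
```

A fold that stands far above the rest of the distribution is a much more convincing assignment than one that is merely the top of a tightly bunched list.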
4-7. Structure from Sequence
: Profile-Based Threading and “Rosetta”
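The ROSETTA method attempts to predict
protein structure from sequence without the aid
of a homologous sequence or structure
-The idea behind Rosetta is that the distribution of
conformations sampled for a given short
segment of the sequence is approximated by the
conformations adopted by that segment, and closely
related ones, in known protein structures.
-Each calculated structure is similar to the real
crystal structure, but not identical.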
Figure4-26. Some decoy structures produced by the Rosetta method
4-7. Structure from Sequence
: Profile-Based Threading and “Rosetta”
The level of agreement with the known native structure varies, but in many
cases the overall fold is predicted well enough to be recognizable.
Figure4-27. Examples of the best-center cluster found by Rosetta
for a number of different test proteins
4-8. Deducing Function from Structure
: Protein Superfamilies
- In contrast to the exponential
increase in sequence information,
structural information (X-ray or NMR)
has up to now been increasing at a
much lower rate.
-Superfamily : loosely defined as a set
of homologous proteins with similar
three-dimensional structures.
- Within each superfamily, there are
families with more closely related
functions and significant(>50%)
sequence identity.
Figure4-28. Growth in the number of structures in the protein data bank
4-8. Deducing Function from Structure
: Protein Superfamilies
The four superfamilies of serine proteases are examples of convergent evolution
- Serine proteases fall into several structural superfamilies, which are
recognizable from their amino-acid sequences and the particular disposition
of the three catalytically important residues in the active site.
Different superfamilies
Chymotrypsin
Subtilisin
Figure4-29.The overall folds of two members of different superfamilies of serine proteases
4-8. Deducing Function from Structure
: Protein Superfamilies
Taq DNA polymerase
Reverse transcriptase
DNA polymerase
- Another large enzyme superfamily with numerous different biological
roles is characterized by the so-called polymerase fold, which resembles
an open hand.
Figure4-30. A comparison of primer-template DNA bound to three DNA polymerases
4-9. Strategies for Identifying Binding Sites
Binding sites are identified as regions where the computed interaction
energy between the probe and the protein is favorable for binding
- Zone1 : good site for binding a positively
charged group.
- Zone2 : good site for binding a
hydrophobic group.
- Zone3 : good site for binding a negatively
charged group.
Figure4-31. Example of the use of GRID
Overlay of three pieces of a known
inhibitor of dihydrofolate reductase
onto the zones computed by the GRID program.
4-9. Strategies for Identifying Binding Sites
MSCS(multiple solvent crystal structures) is a crystallographic technique
that identifies energetically favorable binding sites and orientations of
small organic molecules on the surface of proteins.
Figure4-32. Some organic solvents used as probes for binding sites for functional groups
4-9. Strategies for Identifying Binding Sites
Small organic molecules
bind on the protein
surface
Figure4-33. Structure of subtilisin in 100% acetonitrile
4-9. Strategies for Identifying Binding Sites
- The binding sites for different
organic solvent molecules were
obtained by X-ray crystallography
of crystals of thermolysin soaked
in the solvent.
Figure4-34. Ribbon representation showing the experimentally derived
functionality map of thermolysin
4-10. Strategies for Identifying Catalytic Residues
Active-site residues in a structure can sometimes be recognized computationally
by their geometry
-The method searches the structure for geometrical
arrangements of chemically reactive side
chains that match those in the active
sites of known enzymes.
- The geometry of the catalytic triad of
the serine proteases as used to locate
similar sites in other proteins.
Figure4-35. An active-site template
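A geometric template search of this kind can be sketched by comparing the distances between candidate residues with the distances in a known active site. Everything numeric below is a loud assumption: the template coordinates are invented placeholders, not measured catalytic-triad geometry, and the 1.0 Å tolerance and the function names are illustrative only.

```python
from itertools import combinations
from math import dist


def pairwise_distances(points):
    """Distances between every pair of labeled points {label: (x, y, z)}."""
    return {(a, b): dist(points[a], points[b])
            for a, b in combinations(sorted(points), 2)}


def matches_template(candidate, template, tolerance=1.0):
    """True if all inter-residue distances agree with the template within tolerance."""
    if set(candidate) != set(template):
        return False
    cand = pairwise_distances(candidate)
    temp = pairwise_distances(template)
    return all(abs(cand[pair] - temp[pair]) <= tolerance for pair in temp)


# Placeholder template for a Ser-His-Asp arrangement (coordinates are invented).
template = {"Ser": (0.0, 0.0, 0.0), "His": (3.0, 0.0, 0.0), "Asp": (3.0, 4.0, 0.0)}

# Same arrangement translated elsewhere in space: should match.
site_1 = {"Ser": (10.0, 10.0, 10.0), "His": (13.0, 10.0, 10.0), "Asp": (13.0, 14.0, 10.0)}
# Asp moved far away from His: should not match.
site_2 = {"Ser": (0.0, 0.0, 0.0), "His": (3.0, 0.0, 0.0), "Asp": (3.0, 9.0, 0.0)}

print(matches_template(site_1, template), matches_template(site_2, template))
```

Comparing distances rather than raw coordinates makes the test independent of where the protein sits in space or how it is rotated.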
4-10. Strategies for Identifying Catalytic Residues
THEMATICS : the net charge of the potentially ionizable group on each residue in
the protein structure is calculated as a function of pH.
- Amino acids that show an abnormal ionization curve (green His 95 and
blue Glu 165 in triosephosphate isomerase) are possibly catalytic
residues.
Figure4-36.Theoretical microscopic titration curves
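For comparison with the abnormal curves, the sketch below computes the ideal titration curve of a single isolated ionizable group from the Henderson-Hasselbalch relation. It is not the THEMATICS calculation itself, which takes the whole protein structure into account; the pKa values (about 6.0 for a His side chain, about 4.3 for Glu) are typical textbook figures assumed for illustration.

```python
def mean_charge(ph, pka, kind):
    """Average charge of one isolated ionizable group at a given pH.

    kind = "base": protonated form carries +1 (for example His).
    kind = "acid": deprotonated form carries -1 (for example Glu).
    """
    protonated = 1.0 / (1.0 + 10 ** (ph - pka))  # fraction in the protonated form
    if kind == "base":
        return protonated
    if kind == "acid":
        return protonated - 1.0
    raise ValueError("kind must be 'acid' or 'base'")


# Assumed model pKa values for isolated side chains.
for ph in (2.0, 4.0, 6.0, 8.0, 10.0):
    his = mean_charge(ph, 6.0, "base")
    glu = mean_charge(ph, 4.3, "acid")
    print(ph, round(his, 3), round(glu, 3))
```

An isolated group gives a smooth sigmoidal curve with a single sharp transition around its pKa; a residue whose computed curve is flattened or stretched over a wide pH range departs from this reference shape.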
4-10. Strategies for Identifying Catalytic Residues
Structure of triosephosphate isomerase.
His 95 and Glu 165 are both
located in the active site.
Figure4-37. Residues that show abnormal ionization behavior with
changing pH define the active site
4-11. TIM Barrels
: One Structure with Diverse Functions
- Mandelate racemase :
interconverts R- and S-mandelate.
Figure4-38.The chemical reaction catalyzed by mandelate racemase
4-11. TIM Barrels
: One Structure with Diverse Functions
- Muconate lactonizing enzyme :
transforms the cis, cis-muconic
acid derived from mandelate into
muconolactone.
Figure4-39.The chemical reaction catalyzed by muconate lactonizing enzyme
4-11. TIM Barrels
: One Structure with Diverse Functions
Mandelate racemase
Muconate lactonizing enzyme
Only 26% sequence identity, yet the overall folds are essentially identical.
Figure4-40. Mandelate racemase (left) and muconate lactonizing enzyme
(right) have almost identical folds
4-11. TIM Barrels
: One Structure with Diverse Functions
Mandelate racemase
Muconate lactonizing enzyme
The amino acids that coordinate the metal ion are conserved between
the two enzymes, and the catalytic residues are similar.
Figure4-41. A comparison of the active sites of mandelate racemase (left)
and muconate lactonizing enzyme (right)
4-12. PLP Enzymes
: Diverse Structures with One Function
L-aspartate aminotransferase : L-aspartate + α-ketoglutarate → oxaloacetate + L-glutamate
Uses the cofactor pyridoxal phosphate (PLP)
Figure4-42.The overall reaction catalyzed by the pyridoxal
phosphate-dependent enzyme L-aspartate aminotransferase
4-12. PLP Enzymes
: Diverse Structures with One Function
Step 1 : The amino group
of the amino acid
substrate displaces the
side-chain amino group
of the lysine residue that
holds the cofactor PLP in
the active site.
Step 2 : PLP catalyzes
a rearrangement of the
amino acid substrate.
Step 3 : this is followed by
hydrolysis of the
keto-acid portion,
leaving the nitrogen of
the amino acid bound
to the cofactor to form
the intermediate PMP.
Figure4-43.The general mechanism for PLP-dependent catalysis of
transamination, the interconversion of α-amino acids and α-keto acids
4-12. PLP Enzymes
: Diverse Structures with One Function
L-aspartate aminotransferase
D-amino acid aminotransferase
No detectable sequence identity, and the folds are totally different.
Figure4-44.The three-dimensional structures of L-aspartate aminotransferase
(left) and D-amino acid aminotransferase (right)
4-12. PLP Enzymes
: Diverse Structures with One Function
L-aspartate aminotransferase
D-amino acid aminotransferase
However, the active sites are found to be strikingly similar.
Figure4-45. Comparison of the active sites of L-aspartate aminotransferase
(left) and D-amino acid aminotransferase (right)
4-12. PLP Enzymes
: Diverse Structures with One Function
Bacterial D-amino acid aminotransferase
Human branched-chain L-amino acid aminotransferase
The human enzyme recognizes only L-amino acids, yet the two structures are similar.
Figure4-46.The three-dimensional structures of bacterial D-amino acid aminotransferase (left)
and human mitochondrial branched-chain L-amino acid aminotransferase (right)
4-13. Moonlighting
: Proteins with More than One Function
In multicellular organisms, multifunctional proteins help expand the number of
protein functions that can be derived from relatively small genomes
Figure4-47. Some examples of multifunctional proteins with their various functions
4-13. Moonlighting
: Proteins with More than One Function
Cytokine macrophage inhibitory factor (MIF)
Substrate binding and active site
-Proinflammatory cytokine that
activates T cells and macrophages.
-Catalyzes the tautomerization of
phenylpyruvic acid.
Figure4-48.The three-dimensional structure of the monomer of macrophage inhibitory factor, MIF
4-14. Chameleon Sequences
: One Sequence with More than One Fold
Cyclodextrin glycosyltransferase
Beta-galactosidase
-Chameleon sequence : exists in different conformations in different environments.
-The sequence LITTAHA (red) adopts a different conformation in each of the two enzymes.
Figure4-49. Chameleon sequences
4-14. Chameleon Sequences
: One Sequence with More than One Fold
Dimerization of the sequence-specific
DNA-binding protein Fis.
A single-site mutation (Pro26→Ala26)
can convert a beta strand
into an alpha helix.
Figure4-50. Chameleon sequences in the DNA-binding protein Fis
4-14. Chameleon Sequences
: One Sequence with More than One Fold
-Some proteins contain natural chameleon sequences that may be important
to their function.
-DNA-binding transcriptional regulator from yeast.
Figure4-51. Chameleon sequence in the DNA-binding protein MATα2 from yeast
4-15. Prions, Amyloids and Serpins
: Metastable Protein Folds
-Some structures may be metastable : able
to change into one or more different stable
structures.
-The best characterized of these
changeable structures is the prion.
-The precise structure of the disease-causing form is not yet known, but it is known
to have much more beta sheet than the
cellular form.
Figure4-52.The prion protein
4-15. Prions, Amyloids and Serpins
: Metastable Protein Folds
-Alzheimer’s, Parkinson’s and type
Ⅱ diabetes : each disease is
associated with a particular protein,
and extracellular aggregates of
these proteins are thought to be
the origin of the disease.
-The proteins form fibrous aggregates
of identical, largely beta-sheet,
structure.
Figure4-53. A possible mechanism for the formation of amyloid fibrils by a globular protein
4-15. Prions, Amyloids and Serpins
: Metastable Protein Folds
Cleavage of the
loop by protease.
Cleavage triggers a refolding
of the cleaved structure that
makes it more stable.
Figure4-54. Structural transformation in a serine protease inhibitor on binding protease
4-16. Functions for Uncharacterized Genes
: Galactonate Dehydratase
-Similar structures and mechanisms
among members of the same family.
-MR, MLE, enolase.
Figure4-55. Active sites of MR, MLE, and enolase
4-16. Functions for Uncharacterized Genes
: Galactonate Dehydratase
Carbon source
The unknown enzyme, F587 has now been identified as the gene dgoD,
encoding galactonate dehydratase.
Figure4-56.The pathway for the utilization of galactonate in E.coli
4-16. Functions for Uncharacterized Genes
: Galactonate Dehydratase
The fold is the same as those of MR, MLE and enolase (it belongs to the same family).
Figure4-57. Structure of galactonate dehydratase
4-16. Functions for Uncharacterized Genes
: Galactonate Dehydratase
The active site is the same as those of MR, MLE, and enolase (it belongs to
the same family).
Figure4-58. Schematic diagram of a model of the active site
of galactonate dehydratase with substrate bound
4-17. Starting from Scratch
: A Gene Product of Unknown Function
Alanine racemase
YBL036c in yeast
- The yeast protein lacks the largely antiparallel
beta-sheet domain of the racemase; however,
the active sites, indicated by the presence of
the bound cofactor, are similar.
Figure4-59.The three-dimensional structures of bacterial alanine racemase
and yeast YBL036c
4-17. Starting from Scratch
: A Gene Product of Unknown Function
Alanine racemase
YBL036c in yeast
Enzyme-cofactor binding residues are preserved.
Figure4-60. Comparison of the active sites of bacterial alanine racemase and YBL036c
CHAPTER5.
Structure Determination
5-1. The Interpretation of Structural Information
The objective end-product of a crystallographic structure determination is
an electron density map.
3Å resolution
2Å resolution
1Å resolution
Figure5-1. Portion of a protein electron density map at three different resolutions
5-1. The Interpretation of Structural Information
The figure shows the superposition of
the set of models derived from the
internuclear distances measured for this
protein in solution.
Figure5-2. NMR structure ensemble
5-2. Structure Determination
by X-Ray Crystallography and NMR
Figure5-3. Structure determination by X-ray crystallography
5-2. Structure Determination
by X-Ray Crystallography and NMR
Figure5-4. Structure determination by NMR
5-3. Quality and Representation
of Crystal and NMR Structures
(a). Wire model : useful for example in comparisons of two conformations.
(b). Ribbon diagram : alpha helices and beta strands are easily recognizable.
(c). Ball and stick model : bonded and non-bonded distances can be assessed,
which is important for evaluating interactions
Figure5-5. Different ways of presenting a protein structure
5-3. Quality and Representation
of Crystal and NMR Structures
(d). Space filling : useful for assessing the fit of a ligand to a binding site.
(e). Surface topography : can be colored according to different local properties
such as the electrostatic potential at different points on the molecule.
Figure5-5. Different ways of presenting a protein structure