Protein folding. Anfinsen's experiments.


Protein structure prediction.
Protein domains can be defined based on:
• Geometry: a group of residues with high contact
density; the number of contacts within a domain is higher
than the number of contacts between domains.
- chain-continuous domains
- chain-discontinuous domains
• Kinetics: domain as an independently folding unit.
• Physics: domain as a rigid body linked to other domains
by flexible linkers.
• Genetics: minimal fragment of gene that is capable of
performing a specific function.
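The geometric definition above can be illustrated with a minimal sketch that counts contacts within and between domains. The residue numbering, domain assignment, and contact list are hypothetical toy data, not taken from any real structure.

```python
def contact_counts(contacts, domain_of):
    """Return (within, between): contacts inside one domain vs. across domains."""
    within = between = 0
    for i, j in contacts:
        if domain_of[i] == domain_of[j]:
            within += 1
        else:
            between += 1
    return within, between


# Hypothetical toy protein: residues 0-3 form domain "A", residues 4-7 domain "B".
domain_of = {r: ("A" if r < 4 else "B") for r in range(8)}
# Hypothetical contact list (pairs of residue indices).
contacts = [(0, 2), (0, 3), (1, 3), (4, 6), (4, 7), (5, 7), (3, 4)]

within, between = contact_counts(contacts, domain_of)
print(f"within-domain: {within}, between-domain: {between}")
```

A domain assignment is plausible in the geometric sense when the first count is clearly larger than the second.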
Domains as recurrent units of proteins.
• The same or similar domains are found in different
proteins.
• Each domain has a well determined compact structure
and performs a specific function.
• Proteins evolve through domain duplication and domain
shuffling.
• Protein domain classification based on comparing their
recurrent sequence, structure and functional features –
Conserved Domain Database
Protein folds.
• Fold definition: two folds are similar if they
have a similar arrangement of secondary structure
elements (SSEs) (architecture) and connectivity (topology).
Sometimes a few SSEs may be missing.
• Fold classification: structural similarity
between folds is detected using structure-structure comparison algorithms.
Definition of protein folds.
Protein fold – arrangement of secondary structures into a unique
topology/tertiary structure.
Examples of alpha/beta proteins:
• TIM beta/alpha-barrel:
contains a closed parallel beta-sheet barrel; n=8, S=8;
strand order 12345678; surrounded by alpha-helices
• NAD(P)-binding Rossmann-fold domains:
core: 3 layers, a/b/a; parallel beta-sheet of 6 strands,
order 321456
Fold recognition.
Unsolved problem: direct prediction of protein structure from
physico-chemical principles.
Solved problem: recognizing which of the known folds is
similar to the fold of an unknown protein.
Fold recognition is based on observations/assumptions:
- The overall number of different protein folds is limited
(1000-3000 folds)
- The native protein structure is in its ground state (minimum
energy)
Protein structure prediction flowchart (from D. W. Mount)
[Flowchart: protein sequence → database similarity search → "Does the sequence align with a protein of known structure?"
Yes: three-dimensional comparative modeling → predicted three-dimensional structural model.
No: protein family analysis → "Relationship to known structure?" Yes: comparative modeling. No: structural analysis → "Is there a predicted structure?"
Yes: predicted three-dimensional structural model. No: three-dimensional structural analysis in the laboratory.]
Protein structure prediction.
Prediction of a protein's three-dimensional structure from its
sequence. Different approaches:
- Homology modeling (the target has a very close
homolog in the structure database).
- Fold recognition (the target adopts an existing fold).
- Ab initio prediction (the target has a new fold).
Homology modeling.
Aims to produce protein models with accuracy
close to experimental and is used for:
- Protein structure prediction
- Drug design
- Prediction of functionally important sites (active
or binding sites)
Steps of homology modeling.
1. Template recognition & initial alignment.
2. Backbone generation.
3. Loop modeling.
4. Side-chain modeling.
5. Model optimization.
1. Template recognition.
Recognition of similarity between the target and template.
Target – protein with unknown structure.
Template – protein with known structure.
Main difficulty: deciding which template to pick when several
candidate template structures are available.
Template structure can be found by searching for structures
in PDB using sequence-sequence alignment methods.
Two zones of sequence alignment.
Two sequences are guaranteed to fold into the same structure if their
alignment length and sequence identity fall into the “safe” (homology modeling) zone.
[Plot: sequence identity (%, vertical axis) versus alignment length (horizontal axis, 50-200); the homology modeling zone lies at higher identity than the twilight zone.]
2. Backbone generation.
Once the alignment between target and template is ready,
copy the backbone coordinates of those
template residues that are aligned.
If two aligned residues are identical, copy their
side chain coordinates as well.
3. Insertions and deletions.
AHYATPTTT   (insertion relative to the sequence below)
AH---TPSS   (deletion relative to the sequence above)
Insertions and deletions occur mostly between secondary structures, in the loop
regions. Loop conformations are difficult to predict.
Approaches to loop modeling:
- Knowledge-based: searches the PDB for loops with known
structure
- Energy-based: an energy function is used to evaluate the
quality of a loop. Energy minimization or Monte Carlo.
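Before either loop-modeling approach can be applied, the gapped segments have to be located in the alignment. A minimal sketch, assuming the alignment is given as two gapped strings (the helper name is hypothetical):

```python
import re


def gap_segments(aligned_seq):
    """Return half-open (start, end) column ranges of each run of '-'."""
    return [(m.start(), m.end()) for m in re.finditer(r"-+", aligned_seq)]


top = "AHYATPTTT"
bottom = "AH---TPSS"

print(gap_segments(top))     # the top row has no gaps
print(gap_segments(bottom))  # one gap covering columns 2-4 (0-based)
```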
4. Side chain modeling.
Side chain conformations – rotamers. In similar proteins, side chains have similar conformations.
If % identity is high, side chain conformations can be copied
from template to target. If % identity is not very high, side chains are modeled using rotamer libraries, and
different rotamers are scored with energy functions.
Problem: side chain conformations depend on the backbone
conformation, which is itself predicted, not experimentally determined.
[Figure: three candidate rotamers with energies E1, E2, E3.]
E = min(E1, E2, E3)
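The selection rule E = min(E1, E2, E3) amounts to picking the lowest-energy entry from a rotamer library. A minimal sketch; the energy values are hypothetical placeholders, not output of a real energy function:

```python
def best_rotamer(energies):
    """Return (name, energy) of the lowest-energy rotamer."""
    name = min(energies, key=energies.get)
    return name, energies[name]


# Hypothetical energies (arbitrary units) for three chi1 rotamers.
rotamer_energies = {"g+": 1.2, "t": -0.4, "g-": 0.3}

chosen, e_min = best_rotamer(rotamer_energies)
print(chosen, e_min)
```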
5. Model optimization.
Energy optimization of the entire structure.
Since the backbone conformation depends on the
conformations of side chains and vice versa, an iterative approach is used:
predict rotamers, then shift the backbone, and repeat.
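The alternation between rotamer prediction and backbone adjustment can be sketched on a toy problem. Everything below is an illustrative assumption: the backbone is reduced to a single number, the rotamer library to three values, and the energy to a made-up quadratic; real optimizers work on full atomic coordinates and force fields.

```python
ROTAMERS = (0.0, 1.2, 2.4)  # hypothetical discrete side-chain states


def energy(backbone, rotamer):
    """Toy coupled energy: backbone prefers 2.0 and prefers to match the rotamer."""
    return (backbone - 2.0) ** 2 + (backbone - rotamer) ** 2


def optimize(backbone, rotamers=ROTAMERS, max_iter=50, tol=1e-9):
    rotamer = None
    for _ in range(max_iter):
        # Step 1: predict the best rotamer for the current backbone.
        rotamer = min(rotamers, key=lambda r: energy(backbone, r))
        # Step 2: shift the backbone to its optimum for that rotamer
        # (closed-form minimum of the toy energy above).
        new_backbone = (2.0 + rotamer) / 2.0
        if abs(new_backbone - backbone) < tol:
            break
        backbone = new_backbone
    return backbone, rotamer


backbone, rotamer = optimize(0.0)
print(backbone, rotamer, energy(backbone, rotamer))
```

In this toy case the loop settles on a self-consistent pair that is not the global minimum (rotamer 2.4 with backbone 2.2 has lower energy), which illustrates why the starting model matters.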
Classwork I: Homology modeling.
- Go to NCBI Entrez, search for gi461699
- Do a BLAST search against PDB
- Repeat the same for gi60494508
- Compare the results
Fold recognition.
Goal: to find protein with known structure which best
matches a given sequence.
Since similarity between the target and its closest
template is not high, sequence-sequence
alignment methods fail.
Solution: threading – sequence-structure alignment
method.
Threading – method for structure
prediction.
Sequence-structure alignment: the target sequence is
compared to all structural templates in the
database.
Requires:
- Alignment method (dynamic programming, Monte
Carlo,…)
- Scoring function, which yields relative score for
each alternative alignment
Scoring function for threading.
• Contact-based scoring function
depends on the amino acid types of
two residues and distance between
them.
• Sequence-sequence alignment
scoring function does not depend on
the distance between two residues.
• If distance between two nonadjacent residues in the template is
less than 8 Å, these residues make
a contact.
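The 8 Å contact rule can be turned into a contact list with a short sketch. Assumptions made here: one representative point per residue (for example the C-alpha atom), "non-adjacent" interpreted as a sequence separation of at least 2, and made-up toy coordinates.

```python
import math


def contacts(coords, cutoff=8.0, min_separation=2):
    """Return index pairs (i, j), i < j, closer than `cutoff` angstroms.

    coords: list of (x, y, z) points, one per template residue.
    min_separation: smallest |i - j| considered (2 skips chain neighbors).
    """
    pairs = []
    n = len(coords)
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) < cutoff:
                pairs.append((i, j))
    return pairs


# Hypothetical hairpin-like toy coordinates (angstroms).
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
          (11.4, 0.0, 0.0), (11.4, 5.0, 0.0), (7.6, 5.0, 0.0)]

print(contacts(coords))
```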
Scoring function for threading.
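[Figure: template structure with target residues Ala, Trp, Tyr and Ile placed at contacting positions.]
$$S = \sum_{i,j=1}^{N} w(a_i, a_j)$$
For the example shown: $S = w(\mathrm{Ala}, \mathrm{Tyr}) + w(\mathrm{Ile}, \mathrm{Trp})$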
w is calculated from the frequency of amino acid contacts in PDB;
a_i – amino acid type of the target sequence aligned with position “i”
of the template; N – number of contacts
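The contact-based score is a sum of pairwise potentials over the template's contacts. A minimal sketch using the slide's four-residue example; the numeric potentials are hypothetical placeholders (real values are derived from contact frequencies in PDB), and the position indices are assumptions.

```python
def threading_score(aligned_residues, contacts, w):
    """Sum w(a_i, a_j) over all template contacts (i, j).

    aligned_residues: target residue type placed at each template position.
    contacts: list of (i, j) template position pairs in contact.
    w: dict mapping frozenset({type_i, type_j}) -> contact potential.
    """
    total = 0.0
    for i, j in contacts:
        total += w[frozenset((aligned_residues[i], aligned_residues[j]))]
    return total


# Target residues placed at template positions 0..3 (assumed ordering).
aligned_residues = ["Ala", "Trp", "Tyr", "Ile"]
# Contacts as in the example: Ala-Tyr and Ile-Trp.
template_contacts = [(0, 2), (1, 3)]
# Hypothetical potentials; symmetric by construction through frozenset keys.
w = {frozenset(("Ala", "Tyr")): -0.1, frozenset(("Ile", "Trp")): -0.3}

S = threading_score(aligned_residues, template_contacts, w)
print(S)
```

Keying the table by frozenset makes w(a, b) and w(b, a) the same entry, which matches a symmetric contact potential.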
Classwork I: calculate the score for target
sequence “ATPIIGGLPY” aligned to template
structure which is defined by the contact matrix.
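[Figure: 10 × 10 template contact matrix, rows and columns labeled by position (1-10) and by the aligned target residue (A, T, P, I, I, G, G, L, P, Y); contacts are marked with *. The positions of the marks are not recoverable from this transcript.]
[Table: symmetric contact potentials w for the seven residue types A, T, P, Y, I, G, L. Values as extracted, with row and column assignment lost: -0.2, -0.1, 0, -0.1, 0.5, -0.2, 0.2, 0.3, -0.1, -0.2, -0.3, 0.1, 0, -0.2, -0.4, -0.1, 0.1, -0.2, -0.4, -0.2, -0.1, -0.2, 0.3, 0.2, 0.4, 0.4, 0.2, 0.3]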
Alignment algorithms.
• Dynamic programming.
“Frozen approximation”: traceback in the alignment
matrix is not possible when the score depends on interactions between two
amino acids of the target, so the contact partner is taken from the template:
$$S = \sum_{i,j=1}^{N} w(a_i, b_j)$$
b_j – amino acid type from the template, not from the target;
now the score of every position does not depend on
the alignment elsewhere in the sequence.
• Monte Carlo: optimize the sum of residue-residue
contact potentials by a Monte Carlo alignment algorithm.
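The frozen approximation described for dynamic programming can be sketched as a position-specific score table: for each template position, the score of placing a target residue there uses the template's own residues as contact partners. The two-letter alphabet, the potentials, and the toy template are hypothetical; a real implementation would feed this table into a gapped dynamic programming alignment.

```python
def frozen_profile(template_seq, contacts, w, alphabet):
    """profile[i][a] = sum over contacts (i, j) of w(a, b_j), b_j from the template."""
    partners = {i: [] for i in range(len(template_seq))}
    for i, j in contacts:
        partners[i].append(j)
        partners[j].append(i)
    return [
        {a: sum(w[frozenset((a, template_seq[j]))] for j in partners[i])
         for a in alphabet}
        for i in range(len(template_seq))
    ]


def score_ungapped(target_seq, profile):
    """Score an ungapped placement; each position is scored independently."""
    return sum(profile[i][a] for i, a in enumerate(target_seq))


# Hypothetical two-letter alphabet and contact potentials.
w = {frozenset("A"): 0.0, frozenset("AL"): -0.1, frozenset("L"): -0.4}
template_seq = "ALLA"
template_contacts = [(0, 2), (1, 3)]

profile = frozen_profile(template_seq, template_contacts, w, alphabet="AL")
total = score_ungapped("LALL", profile)
print(profile)
print(total)
```

Because every position has its own fixed score column, the table has the same shape as a sequence profile and ordinary traceback works. Note that in this sketch each contact contributes once from each of its two positions.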
CASP prediction competitions.
Threading model validation.
• Correct bond lengths and bond angles (consecutive C-alpha atoms
should not be separated by >> 3.8 Å)
• Correct placement of functionally important sites
• Prediction of global topology, not partial alignment
(minimum number of gaps)
Placement of functionally important sites
in threading.
Prediction of the structure of methylglyoxal synthase based on the
template of carbamoyl phosphate synthase
Classwork II: Homology modeling.
- Go to NCBI Entrez, search for gi461699
- Do a BLAST search against PDB
- Repeat the same for gi60494508
- Predict functionally important sites
GenThreader
http://bioinf.cs.ucl.ac.uk/psipred.
1. Predicts secondary structures for target
sequence.
2. Makes sequence profiles (PSSMs) for
each template sequence.
3. Uses threading scoring function to find
the best matching profile.
Classwork III.
- Go to http://bioinf.cs.ucl.ac.uk/psipred
- Go over the options of protein structure
prediction program
- Predict structure for protein sequence
(“gwu_thread_seq.txt”)
http://bioinf2.cs.ucl.ac.uk/psiout/29594540ad0cf784.gen.html
Protein engineering and protein
design.
Protein engineering – altering a protein's sequence to change its function or
structure
Protein design – designing a de novo protein that satisfies a given requirement
Protein engineering strategies.
Goals:
• Design proteins with certain function
• Increase activity of enzymes
• Increase binding affinity and specificity of proteins
• Increase protein stability
• Design proteins which bind novel ligands
Protein engineering uses combinatorial
libraries.
• Random mutagenesis introduces different mutations into
many copies of the gene of interest.
• Active proteins are separated from inactive ones:
- in vivo (measuring effect on the whole cell)
- in vitro (phage display, gene is inserted into phage
DNA, expressed, selected if it binds immobilized target
protein)
Specificity of Kunitz inhibitors can be
optimized by protein engineering.
• Kunitz domains – specific inhibitors of
trypsin-like proteinases, highly conserved
structure with only 33% identity.
• Each Kunitz domain recognizes one or
more proteinases through the binding loop
(yellow).
• Phage display method found mutants of
Kunitz inhibitors which have higher
specificity than native ones.
• Modeling of mutant proteins showed that
enhanced specificity is caused by
increased complementarity between
binding loop and the active site.
Native state can be stabilized by reducing
the difference in entropy between folded
and unfolded conformations
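$$\Delta G = \Delta H - T\Delta S$$
[Figure: free energy G versus reaction coordinate, showing the unfolded (U) and folded (F) states separated by ΔG.]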
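The effect of the entropy term on stability can be made concrete with a small calculation based on ΔG = ΔH − TΔS. The sketch below takes ΔG as the free energy of unfolding (U minus F) and assumes a two-state transition with temperature-independent ΔH and ΔS, so that the midpoint temperature is ΔH/ΔS. All numeric values are hypothetical.

```python
def delta_g(delta_h, delta_s, temperature):
    """Free energy of unfolding: dG = dH - T * dS (kcal/mol, T in K)."""
    return delta_h - temperature * delta_s


def melting_temperature(delta_h, delta_s):
    """Temperature at which dG = 0 under the two-state assumption."""
    return delta_h / delta_s


# Hypothetical unfolding parameters (kcal/mol and kcal/(mol K)).
delta_h = 100.0
ds_wild_type = 0.32  # entropy gain on unfolding, wild type
ds_mutant = 0.31     # smaller gain: the unfolded state is more restricted

tm_wt = melting_temperature(delta_h, ds_wild_type)
tm_mut = melting_temperature(delta_h, ds_mutant)
print(f"Tm wild type: {tm_wt:.1f} K, Tm mutant: {tm_mut:.1f} K")
print(f"dG at 300 K: {delta_g(delta_h, ds_wild_type, 300.0):.2f} vs "
      f"{delta_g(delta_h, ds_mutant, 300.0):.2f} kcal/mol")
```

With the enthalpy held fixed, reducing the entropy difference raises both the unfolding free energy and the midpoint temperature.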
Model system: lysozyme from
bacteriophage T4.
• Lysozyme has the ability to lyse certain
bacteria by hydrolyzing the β-linkage
between N-acetylmuramic acid (NAM) and
N-acetylglucosamine (NAG) of the
peptidoglycan layer in the bacterial cell
wall.
• The conformational transition in lysozyme
involves the movement of its two
lobes relative to each other in a cooperative
manner
Disulfide bridges increase protein
stability.
• Increasing stability by reducing the number of unfolded
conformations (since enthalpic contribution will be the
same for folded and unfolded states).
• Task: find positions on the backbone where cysteines can
be introduced for disulfide bond formation.
Strategy of introducing a new disulfide
bond.
B. Matthews, 1989:
• Analysis of disulfide bonds geometries in existing structures.
• Analysis of all pairs of amino acids which are close in space.
• Energy optimization of candidate disulfide bonds.
• Analysis of the destabilizing effect of replacing native amino acids with
Cys.
As a result, three disulfide bonds were introduced into lysozyme through mutagenesis
experiments.
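The second step of the strategy, listing residue pairs that are close in space, can be sketched as a distance screen. This is only an illustration of that one step: the C-alpha distance window, the minimum sequence separation, the residue numbers, and the coordinates are all assumptions chosen for the example, and the geometry analysis and energy optimization steps are not covered.

```python
import math
from itertools import combinations


def candidate_pairs(ca_coords, min_dist=4.0, max_dist=7.0, min_separation=3):
    """Return (res_i, res_j, distance) for pairs inside the distance window.

    ca_coords: dict mapping residue number -> (x, y, z) of its C-alpha atom.
    The window and separation defaults are illustrative assumptions.
    """
    pairs = []
    for (i, xi), (j, xj) in combinations(sorted(ca_coords.items()), 2):
        if j - i < min_separation:
            continue  # skip residues that are too close along the chain
        d = math.dist(xi, xj)
        if min_dist <= d <= max_dist:
            pairs.append((i, j, round(d, 2)))
    return pairs


# Hypothetical residue numbers and coordinates (angstroms).
ca_coords = {
    10: (0.0, 0.0, 0.0),
    20: (20.0, 0.0, 0.0),
    30: (5.5, 0.0, 0.0),
    40: (20.0, 6.0, 0.0),
}

print(candidate_pairs(ca_coords))
```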
Stability of mutants compared to wild-type protein.
Measure of stability: melting temperature (Tm), the
temperature at which 50% of the enzyme is
inactivated during reversible heat
denaturation. For wild-type, Tm = 42 °C.
• all mutants were more stable than
wild-type.
• the longer the loop between the Cys residues, the
larger the effect (the more restricted the
unfolded state is).
• the more disulfide bonds were
introduced, the more stable was the
mutant.
From B. Matthews et al.
Attempts to fill cavities to stabilize
lysozyme failed…
• Introducing a cavity the size of a –CH3 group
destabilizes the protein by ~1 kcal/mol.
• T4 lysozyme has two cavities; the mutations Leu →
Phe and Ala → Val destabilize the protein by ~
0.5-1.0 kcal/mol.
• New side-chains (Val and Phe) adopt
unfavorable conformations in cavities.
Classwork IV: analyzing lysozyme
mutants.
• Retrieve structure neighbors (1PQM and 1KNI)
of 2LZM.
• Which mutant might have an increased stability
and why?
Can structural scaffolds be reduced in
size while maintaining function?
A. Braisted & J.A. Wells used Z-domain (58 residues) of
bacterial protein A:
• removed third helix (truncated protein - 38 residues);
• mutated residues in the first and second helices;
• used phage display to select active forms;
• restored the binding of truncated protein.
Designing an amino acid sequence that
will fold into a given structure.
• Inverse protein folding problem:
designing a sequence which will fold
into a given structure – much easier
than folding problem!
• B. Dahiyat & S. Mayo: designed a
sequence of zinc finger domain that
does not require stabilization by Zn.
• Wild type protein domain is
stabilized by Zn (bound to two Cys
and two His); mutant is stabilized by
hydrophobic interactions.
Paracelsus challenge: convert one fold into
another by changing 50% of residues.
• Challenge because all proteins
with > 30% identity seem to have
the same fold.
• L. Regan et al.: Protein G (mainly
beta-sheet) was converted to Rop
protein (alpha-helical) by
changing only 50% of residues
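Both the 30% identity rule of thumb and the 50% limit of the challenge refer to percent sequence identity over an alignment. A minimal sketch of that calculation; counting only columns where neither sequence has a gap is one common convention and is an assumption here, and the two sequences are made up.

```python
def percent_identity(a, b):
    """Percent of identical residues over aligned, non-gap columns."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


# Hypothetical aligned sequences differing at half of their positions.
original = "ACDEFGHIKL"
redesigned = "ACDEFMNPQR"

print(percent_identity(original, redesigned))
```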