Protein Threading - Laboratory of Molecular Modelling
Protein Threading
Zhanggroup 2003 10 22
Overview
Background: protein structure, protein folding and designability
Protein threading
Current limitations to protein threading
Computational complexity of certain formulations of the protein threading problem
Performance of protein threading systems
References
Protein Structure
Primary, secondary, tertiary structure
Can only refer to the structure of a protein if a particular environment is assumed:
solvent environment (aqueous, trans-membrane, ……)
temperature
pH, etc.
Different environments yield different structures, or no stable structure at all
Protein molecules are not completely rigid structures:
kinetic energy: energetic collisions with solvent molecules
vibrations, side-chain conformational changes
flexible sections of the peptide chain
The native tertiary structure of a protein is thus an average over these fluctuations
Protein Folding
Protein folding = searching for a conformation having minimum energy
Factors in protein folding:
hydrophobic effects
electrostatic charges in residues
hydrogen bonding
chaperonins, ribosomes
3 stages of folding:
denatured (unfolded) state
molten globule state
native (compact) state
Most proteins will return to their native state after forced denaturation
The Protein Folding Problem
Given a protein's amino acid sequence, what is its tertiary structure?
The protein folding problem is hard
Direct approach: molecular dynamics simulation
Simulate, on an atomic level, the folding of a single protein molecule
protein = thousands of atoms
solvent environment = hundreds to thousands of molecules => thousands of atoms
Sub-picosecond time scales
run the simulation for 1-5 seconds of simulated time
Further years of Moore's law are needed to make this computation feasible
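The gap between sub-picosecond time scales and seconds of folding can be made concrete with a little arithmetic. The sketch below assumes a time step of 1 femtosecond (1e-15 s); that value is an illustrative assumption, not a figure from these slides.

```python
def md_steps(simulated_seconds: float, timestep_seconds: float) -> float:
    """Number of integration steps needed to cover the simulated time."""
    return simulated_seconds / timestep_seconds

# Assumption: a 1 femtosecond time step, one possible sub-picosecond choice.
TIMESTEP = 1e-15

for seconds in (1.0, 5.0):
    steps = md_steps(seconds, TIMESTEP)
    print(f"{seconds:.0f} s of folding -> about {steps:.1e} steps")
```

Each step also has to evaluate forces over thousands of protein and solvent atoms, so the step count is only a lower bound on the work.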
Designability
A protein with a stable native state cannot have another low-energy state nearby in conformational space
A structure is highly designable if its minimum energy state has no low-energy neighbours
Protein Threading
Inverse protein folding problem: given a tertiary structure, find an amino acid sequence that folds to that structure
Protein threading: given a library of possible protein folds and an amino acid sequence, find the fold with the best sequence -> structure alignment (threading)
Evolution depends on designability to
preserve function under mutation
Estimate: only a limited number of different protein structures exist in nature (Chothia, 1992)
Four components:
a library of protein folds (templates)
a scoring function to measure the fitness of a sequence -> structure alignment
a search technique for finding the best alignment between a fixed sequence and structure
a means of choosing the best fold from among the best scoring alignments of a sequence to all possible folds
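The four components can be sketched as a toy pipeline. Everything here is hypothetical: the two-template library, the preference values, the environment codes ('H' helix, 'E' sheet, 'C' coil) and the function names are made up for illustration, and the search step only tries ungapped offsets, which is the simplest possible choice rather than what any real threading system does.

```python
from typing import Dict, Tuple

# Component 1: a toy fold library, one environment string per template.
LIBRARY: Dict[str, str] = {
    "helical": "HHHHCC",
    "strand": "EEEECC",
}

# Hypothetical preference table: (residue, environment) -> score.
PREFERENCE: Dict[Tuple[str, str], float] = {
    ("A", "H"): 1.0, ("L", "H"): 1.0, ("E", "H"): 0.5,
    ("V", "E"): 1.0, ("I", "E"): 1.0, ("T", "E"): 0.5,
    ("G", "C"): 1.0, ("P", "C"): 1.0,
}

def score(residues: str, environments: str) -> float:
    """Component 2: fitness of residues placed position by position."""
    return sum(PREFERENCE.get(pair, -0.5) for pair in zip(residues, environments))

def best_alignment(sequence: str, template: str) -> Tuple[float, int]:
    """Component 3: try every ungapped offset of the template along the sequence.

    Assumes the sequence is at least as long as the template.
    """
    width = len(template)
    offsets = range(len(sequence) - width + 1)
    return max((score(sequence[o:o + width], template), o) for o in offsets)

def recognize_fold(sequence: str) -> Tuple[str, float, int]:
    """Component 4: pick the template whose best alignment scores highest."""
    results = [(name, *best_alignment(sequence, env)) for name, env in LIBRARY.items()]
    return max(results, key=lambda r: r[1])

print(recognize_fold("ALAEGP"))   # (template name, score, offset)
```

Choosing the fold by raw score is itself a simplification; comparing scores across templates of different sizes would need some normalisation.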
Scoring Schemes for Sequence->Structure Alignments
The scoring scheme for a particular threading of a sequence onto a structure measures the degree to which environmental preferences are satisfied
Different amino acid types prefer different environments, e.g.
structural preferences:
in helix
in sheet
not exposed to solvent
pairwise interactions with neighbouring amino acids
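A scoring scheme of this kind can be sketched as a sum of singleton terms (residue in its structural environment) and pairwise terms (residues brought into contact by the structure). The tables, the environment codes ('b' buried, 'x' exposed) and the convention that a higher score is better are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

def threading_score(
    residues: str,
    environments: str,
    contacts: List[Tuple[int, int]],
    singleton: Dict[Tuple[str, str], float],
    pairwise: Dict[frozenset, float],
) -> float:
    """Sum singleton environment terms plus pairwise contact terms."""
    total = sum(singleton.get((r, e), 0.0) for r, e in zip(residues, environments))
    for i, j in contacts:
        # Unordered pair of residue types at two positions in contact.
        total += pairwise.get(frozenset((residues[i], residues[j])), 0.0)
    return total

# Hypothetical tables: hydrophobic L prefers burial, charged K prefers exposure.
SINGLETON = {("L", "b"): 1.0, ("K", "x"): 1.0, ("L", "x"): -1.0, ("K", "b"): -1.0}
PAIRWISE = {frozenset(("L", "L")): 0.5, frozenset(("K", "E")): 1.0}

example = threading_score(
    residues="LKLE",
    environments="bxbx",
    contacts=[(0, 2), (1, 3)],   # position pairs in contact in the template
    singleton=SINGLETON,
    pairwise=PAIRWISE,
)
print(example)
```

The pairwise term is what makes the score depend on which residues end up at two different template positions at once.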
Formal Statement of the Protein Threading Problem
C is a protein core having m segments C_i, each representing a set of contiguous amino acids. Let c_i be the length of C_i
Sequence a = a_1 a_2 … a_n of amino acids
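One common way to make a threading concrete is as a tuple of start positions (t_1, ..., t_m), one per core segment. The constraints used below are an assumption, since the statement here does not spell them out: segments keep their order, do not overlap, and all fit inside the sequence, that is t_1 >= 1, t_i + c_i <= t_(i+1), and t_m + c_m - 1 <= n. The function name is hypothetical.

```python
from typing import Iterator, List, Tuple

def threadings(core_lengths: List[int], n: int, first: int = 1) -> Iterator[Tuple[int, ...]]:
    """Yield every tuple of 1-based start positions (t_1, ..., t_m)."""
    if not core_lengths:
        yield ()
        return
    c, rest = core_lengths[0], core_lengths[1:]
    # Latest start that still leaves room for this and all later segments.
    last = n - sum(core_lengths) + 1
    for t in range(first, last + 1):
        # The next segment may start no earlier than the end of this one.
        for tail in threadings(rest, n, t + c):
            yield (t,) + tail

# Two segments of length 2 threaded onto a sequence of length 5.
for starts in threadings([2, 2], 5):
    print(starts)
```

The positions between consecutive segments are the variable-length gaps (loops) that the alignment is free to choose.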
Current limitations to protein threading
Statistical problems
Definition of neighbor and/or pairwise contact environments:
energetic neighbor vs. contact neighbor
Computational Complexity of Finding an Optimal Alignment
The complexity of the protein threading problem depends on whether:
(i) variable-length gaps are allowed in alignments
(ii) the scoring function for an alignment incorporates pairwise interactions between amino acids
Property (i) makes the search space exponential in the length of the sequence
Property (ii) forces a solution to take non-local effects into account
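The size of the search space under property (i) can be counted directly. Assuming a core of m ordered, non-overlapping segments that must all fit in a sequence of length n (an assumed formulation), the number of threadings is the binomial coefficient C(n - sum(c_i) + m, m). The sketch below evaluates it for a few hypothetical sizes in which the number of segments grows with the sequence.

```python
from math import comb
from typing import List

def count_threadings(core_lengths: List[int], n: int) -> int:
    """Number of ways to place the ordered segments in a sequence of length n."""
    slack = n - sum(core_lengths)   # residues left over for gaps
    if slack < 0:
        return 0                    # the core does not fit at all
    m = len(core_lengths)
    return comb(slack + m, m)

# Hypothetical sizes: segments of length 5, sequence twice as long as the core.
for m in (5, 10, 20, 40):
    n = 10 * m
    print(f"m={m:2d}, n={n:3d}: {count_threadings([5] * m, n)} threadings")
```

For a fixed number of segments the count is only polynomial in n; the blow-up appears when longer sequences come with more segments, as in the loop above.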
Any protein threading scheme with both properties is NP-complete
(reduction from 3-SAT: Lathrop, 1994)
(reduction from MAX-CUT: Akutsu and Miyano, 1999)
Thus all protein threading approaches can be divided into four groups:
1. no variable-length gaps allowed
2. no pairwise interactions considered in the scoring function
3. no guarantee of an optimal solution
4. exponential runtime
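Group 2 is where exact answers are cheap: once the score has no pairwise terms, each segment's contribution depends only on its own position, and dynamic programming finds an optimal gapped threading in time proportional to m times n. The sketch below is a minimal illustration under assumed conventions (0-based start positions, higher score is better, a hypothetical seg_score callback and toy preference table), not the algorithm of any particular threading program.

```python
from typing import Callable, List, Tuple

def optimal_threading(
    core_lengths: List[int], n: int, seg_score: Callable[[int, int], float]
) -> Tuple[float, List[int]]:
    """Best total score and 0-based start positions, singleton scores only."""
    m = len(core_lengths)
    # suffix[i]: total length of segments i..m-1, which bounds the latest legal start.
    suffix = [0] * (m + 1)
    for i in range(m - 1, -1, -1):
        suffix[i] = suffix[i + 1] + core_lengths[i]
    if suffix[0] > n:
        raise ValueError("sequence is shorter than the core")
    neg_inf = float("-inf")
    # best[i][t]: best score for segments i..m-1 when segment i starts at t or later.
    best = [[neg_inf] * (n + 2) for _ in range(m)] + [[0.0] * (n + 2)]
    choice = [[-1] * (n + 2) for _ in range(m)]
    for i in range(m - 1, -1, -1):
        for t in range(n - suffix[i], -1, -1):
            here = seg_score(i, t) + best[i + 1][t + core_lengths[i]]
            if here > best[i][t + 1]:
                best[i][t], choice[i][t] = here, t
            else:  # starting later is at least as good
                best[i][t], choice[i][t] = best[i][t + 1], choice[i][t + 1]
    # Trace back the chosen start of each segment.
    starts, t = [], 0
    for i in range(m):
        t = choice[i][t]
        starts.append(t)
        t += core_lengths[i]
    return best[0][0], starts

# Toy data (hypothetical): two segments with helix and sheet environments.
SEQUENCE = "GALLGGVVG"
SEGMENT_ENVS = ["HHH", "EE"]
PREF = {("A", "H"): 1.0, ("L", "H"): 1.0, ("V", "E"): 1.0}

def toy_seg_score(i: int, t: int) -> float:
    """Sum of residue/environment preferences for segment i placed at t."""
    return sum(PREF.get((SEQUENCE[t + k], e), -1.0) for k, e in enumerate(SEGMENT_ENVS[i]))

total, starts = optimal_threading([3, 2], len(SEQUENCE), toy_seg_score)
print(total, starts)
```

Adding pairwise terms breaks this recurrence, because the score of one segment's placement then depends on where the other segments were placed.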
Performance of Protein Threading Systems
CASP1 (1994), CASP2 (1996), CASP3 (1998): Critical Assessment of Structure Prediction meetings
protein threading methods have consistently been the winners
success depends on structural similarity of target to known structures
successful even when target sequence and library sequence have low homology
Much room for improvement in all areas of protein threading, e.g.:
algorithms for searching the threading space
reliable, biologically accurate scoring functions