Protein Threading - Laboratory of Molecular Modelling

Protein Threading
Zhanggroup, 2003-10-22
Overview
- Background: protein structure, protein folding and designability
- Protein threading
- Current limitations to protein threading
- Computational complexity of certain formulations of the protein threading problem
- Performance of protein threading systems
- References

Protein Structure

- Primary, secondary and tertiary structure
- We can only refer to the structure of a protein if a particular environment is assumed:
  - solvent environment (aqueous, trans-membrane, …)
  - temperature
  - pH, etc.
- Different environments yield different structures, or no stable structure at all

- Protein molecules are not completely rigid structures:
  - kinetic energy: energetic collisions with solvent molecules
  - vibrations and side-chain conformational changes
  - flexible sections of the peptide chain
- The native tertiary structure of a protein is thus an average over these fluctuating conformations

Protein Folding

- Protein folding = searching for a conformation having minimum energy
- Factors in protein folding:
  - hydrophobic effects
  - electrostatic charges on residues
  - hydrogen bonding
  - chaperonins, ribosomes
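Folding as a search for minimum energy can be made concrete in a toy model. The sketch below assumes the 2D HP lattice model (hydrophobic/polar residues on a square lattice), which these slides do not use: it enumerates every self-avoiding conformation of a short chain and keeps one with the lowest energy. The names `walks`, `energy` and `fold` are hypothetical.

```python
# Toy model (an assumption, not from the slides): the 2D HP lattice model.
# Residues are H (hydrophobic) or P (polar); a conformation is a self-avoiding
# walk on the square lattice; every non-bonded H-H lattice contact scores -1.
STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def walks(n):
    """Yield every self-avoiding walk of n residues; the first bond is fixed
    along +x so that rotated copies are not generated."""
    if n < 2:
        yield [(0, 0)][:n]
        return
    path = [(0, 0), (1, 0)]

    def extend():
        if len(path) == n:
            yield list(path)
            return
        x, y = path[-1]
        for dx, dy in STEPS:
            nxt = (x + dx, y + dy)
            if nxt not in path:              # self-avoidance
                path.append(nxt)
                yield from extend()
                path.pop()

    yield from extend()


def energy(seq, coords):
    """-1 for each pair of H residues adjacent on the lattice but not in sequence."""
    where = {c: i for i, c in enumerate(coords)}
    e = 0
    for i, (x, y) in enumerate(coords):
        if seq[i] != "H":
            continue
        for dx, dy in ((1, 0), (0, 1)):      # look right/up only: each pair counted once
            j = where.get((x + dx, y + dy))
            if j is not None and abs(i - j) > 1 and seq[j] == "H":
                e -= 1
    return e


def fold(seq):
    """Exhaustive search: return (minimum energy, one minimum-energy conformation)."""
    return min(((energy(seq, w), w) for w in walks(len(seq))), key=lambda p: p[0])


print(fold("HPPHPPH"))
```

The number of conformations grows roughly exponentially with chain length, so this exhaustive search only works for very short toy chains.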

- 3 stages of folding:
  1. denatured (unfolded) state
  2. molten globule state
  3. native (compact) state
- Most proteins will return to their native state after forced denaturation

The Protein Folding Problem

Given a protein's amino acid sequence, what is its tertiary structure?

The protein folding problem is hard
- Direct approach: molecular dynamics simulation
  - simulate, at the atomic level, the folding of a single protein molecule
  - protein = thousands of atoms
  - solvent environment = hundreds to thousands of molecules => thousands of atoms
- Sub-picosecond time steps
- The simulation must cover 1-5 seconds of folding time
- We need another … years of Moore's law to make this computation feasible
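A back-of-envelope calculation shows the scale of the problem. The numbers below are assumptions chosen to match the orders of magnitude on the slide (a 1 fs time step, 10,000 atoms including solvent, naive all-pairs force evaluation with no cutoff); real MD codes use cutoffs and other optimisations, so treat this only as a rough sketch of the scale.

```python
# Back-of-envelope cost of a direct molecular-dynamics folding simulation.
# Assumptions (not from the slides): 1 fs time step, 10,000 atoms
# (protein + solvent), naive all-pairs force evaluation with no cutoff.
FS_PER_SECOND = 10**15


def md_cost(simulated_seconds, atoms, timestep_fs=1):
    """Return (number of time steps, total pairwise force evaluations)."""
    steps = simulated_seconds * FS_PER_SECOND // timestep_fs
    pairs = atoms * (atoms - 1) // 2         # pair interactions per step
    return steps, steps * pairs


steps, pair_evals = md_cost(simulated_seconds=1, atoms=10_000)
print(f"{steps:.1e} steps, {pair_evals:.1e} pair evaluations")
# 1.0e+15 steps, 5.0e+22 pair evaluations
```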

Designability
- A protein with a stable native state cannot have another low-energy state nearby in conformational space
- A structure is highly designable if its minimum-energy state has no low-energy neighbours
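In a toy model the stability criterion can be checked directly. Assuming the 2D HP lattice model (not specified by the slides), the sketch compares the lowest-energy conformation with all others. As a simplification it ignores "nearby" and reports the energy gap to the next level over the whole conformational space; a unique ground state with a positive gap is the stable case. All function names are hypothetical.

```python
# Toy stability check in the 2D HP lattice model (an assumption).
STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def canonical_walks(n):
    """Self-avoiding walks of n >= 2 residues with the first bond along +x and
    the first turn towards +y, so rotated and mirrored copies appear once."""
    def extend(path):
        if len(path) == n:
            yield list(path)
            return
        x, y = path[-1]
        on_axis = all(py == 0 for _, py in path)
        for dx, dy in STEPS:
            nxt = (x + dx, y + dy)
            if nxt in path or (on_axis and nxt[1] < 0):
                continue
            path.append(nxt)
            yield from extend(path)
            path.pop()

    yield from extend([(0, 0), (1, 0)])


def energy(seq, coords):
    """-1 for each pair of H residues adjacent on the lattice but not in sequence."""
    where = {c: i for i, c in enumerate(coords)}
    e = 0
    for i, (x, y) in enumerate(coords):
        if seq[i] != "H":
            continue
        for dx, dy in ((1, 0), (0, 1)):      # each lattice pair counted once
            j = where.get((x + dx, y + dy))
            if j is not None and abs(i - j) > 1 and seq[j] == "H":
                e -= 1
    return e


def stability(seq):
    """Return (ground-state energy, number of ground-state conformations,
    gap to the next energy level, or None if every conformation ties)."""
    levels = sorted(energy(seq, w) for w in canonical_walks(len(seq)))
    e0 = levels[0]
    higher = [e for e in levels if e > e0]
    return e0, levels.count(e0), (higher[0] - e0 if higher else None)


print(stability("HPPH"))  # (-1, 1, 1)
```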

Protein Threading
- Inverse protein folding problem: given a tertiary structure, find an amino acid sequence that folds to that structure
- Protein threading: given a library of possible protein folds and an amino acid sequence, find the fold with the best sequence -> structure alignment (threading)
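Inverse folding can be illustrated by brute force in a toy model. Assuming the 2D HP lattice model (an illustration, not the method of these slides), the sketch tries every H/P sequence and keeps those whose unique lowest-energy conformation is the target; the number of such sequences is one common way to quantify how designable a structure is. `inverse_fold` and the helper names are hypothetical.

```python
from itertools import product

# Toy inverse folding in the 2D HP lattice model (an assumption).
STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def canonical_walks(n):
    """Self-avoiding walks (n >= 2), first bond +x, first turn towards +y."""
    def extend(path):
        if len(path) == n:
            yield list(path)
            return
        x, y = path[-1]
        on_axis = all(py == 0 for _, py in path)
        for dx, dy in STEPS:
            nxt = (x + dx, y + dy)
            if nxt in path or (on_axis and nxt[1] < 0):
                continue
            path.append(nxt)
            yield from extend(path)
            path.pop()

    yield from extend([(0, 0), (1, 0)])


def energy(seq, coords):
    """-1 for each pair of H residues adjacent on the lattice but not in sequence."""
    where = {c: i for i, c in enumerate(coords)}
    e = 0
    for i, (x, y) in enumerate(coords):
        if seq[i] != "H":
            continue
        for dx, dy in ((1, 0), (0, 1)):
            j = where.get((x + dx, y + dy))
            if j is not None and abs(i - j) > 1 and seq[j] == "H":
                e -= 1
    return e


def inverse_fold(target):
    """All HP sequences whose unique lowest-energy conformation is `target`
    (`target` must use the same canonical orientation as canonical_walks)."""
    confs = list(canonical_walks(len(target)))
    designs = []
    for letters in product("HP", repeat=len(target)):
        seq = "".join(letters)
        e_target = energy(seq, target)
        # the target must be strictly better than every other conformation
        if all(energy(seq, w) > e_target for w in confs if w != target):
            designs.append(seq)
    return designs


SQUARE = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(inverse_fold(SQUARE))  # ['HHHH', 'HHPH', 'HPHH', 'HPPH']
```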


- Evolution depends on designability to preserve function under mutation
- It is estimated that only … different protein structures exist in nature (Chothia, 1992)
Protein threading has four components:
1. a library of protein folds (templates)
2. a scoring function to measure the fitness of a sequence -> structure alignment
3. a search technique for finding the best alignment between a fixed sequence and structure
4. a means of choosing the best fold from among the best-scoring alignments of a sequence to all possible folds
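The four components can be wired together in a few lines. Everything in this sketch is hypothetical: the fold library, the environment labels (`H` helix, `E` sheet, `B` buried) and the preference scores are made up, and the search is a simple gapless slide of the sequence along each template rather than a realistic alignment algorithm.

```python
# 1. Library of folds (hypothetical): one environment label per template position.
LIBRARY = {
    "fold_A": "HHHHBBEEEE",
    "fold_B": "EEEEBBHHHH",
    "fold_C": "BBBBBBBBBB",
}

# 2. Scoring function (hypothetical residue/environment preferences; higher is better).
PREF = {
    "A": {"H": 2, "E": 0, "B": 1},
    "V": {"H": 0, "E": 2, "B": 1},
    "L": {"H": 1, "E": 1, "B": 2},
    "K": {"H": 1, "E": 0, "B": -2},
}


def score(seq, envs):
    return sum(PREF[aa][env] for aa, env in zip(seq, envs))


# 3. Search: best gapless placement of the sequence on one template.
def best_alignment(seq, template):
    offsets = range(len(template) - len(seq) + 1)
    if not offsets:                          # template shorter than the sequence
        return float("-inf"), None
    k = max(offsets, key=lambda o: score(seq, template[o:o + len(seq)]))
    return score(seq, template[k:k + len(seq)]), k


# 4. Fold choice: the template whose best alignment scores highest.
def thread(seq, library):
    results = {name: best_alignment(seq, t) for name, t in library.items()}
    best = max(results, key=lambda name: results[name][0])
    return best, results[best]


print(thread("AAVV", LIBRARY))  # ('fold_A', (6, 2))
```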
Scoring Schemes for Sequence->Structure Alignments

The scoring scheme for a particular threading of a sequence onto a structure measures the degree to which environmental preferences are satisfied.
Different amino acid types prefer different environments, e.g.:
- structural preferences:
  - in helix
  - in sheet
  - not exposed to solvent
- pairwise interactions with neighbouring amino acids
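A score reflecting both kinds of preference can be sketched as a sum of per-residue environment terms plus pairwise terms over residues that the template places in contact. The tables, environment labels and contact list below are hypothetical; real threading potentials are usually derived statistically from known structures.

```python
# Hypothetical singleton preferences: residue -> environment -> score.
SINGLETON = {
    "A": {"H": 2, "E": 0, "B": 1},
    "V": {"H": 0, "E": 2, "B": 1},
    "L": {"H": 1, "E": 1, "B": 2},
    "K": {"H": 1, "E": 0, "B": -2},
}

# Hypothetical pairwise contact scores, keyed by alphabetically sorted residue pair.
PAIR = {("L", "L"): 1.0, ("L", "V"): 0.5, ("K", "L"): -1.0}


def pair_score(a, b):
    return PAIR.get(tuple(sorted((a, b))), 0.0)


def threading_score(seq, envs, contacts):
    """Score residue k of `seq` placed at template position k (no gaps).
    `contacts` lists template position pairs that touch in the structure."""
    singleton = sum(SINGLETON[aa][env] for aa, env in zip(seq, envs))
    pairwise = sum(pair_score(seq[i], seq[j]) for i, j in contacts)
    return singleton + pairwise


# Template: helix, two buried positions, sheet; positions 0-3 and 1-2 in contact.
print(threading_score("KLLV", ["H", "B", "B", "E"], [(0, 3), (1, 2)]))  # 8.0
```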
Formal Statement of the Protein Threading Problem
- $C$ is a protein core having $m$ segments $C_1, \dots, C_m$, each $C_i$ representing a set of contiguous amino acids; let $c_i$ be the length of $C_i$
- Sequence $a = a_1 a_2 \dots a_n$ of amino acids
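The statement breaks off before defining a threading. Assuming the usual core-segment formulation (an assumption here), a threading assigns each segment $C_i$ a start position $t_i$ in the sequence, keeps segment order and forbids overlap ($t_{i+1} \ge t_i + c_i$), and the last segment must end by position $n$. The sketch enumerates these threadings and checks them against the stars-and-bars count $\binom{n - \sum_i c_i + m}{m}$; the function names are hypothetical.

```python
from math import comb


def threadings(n, lengths):
    """Yield every threading (t_1, ..., t_m) of core segments with the given
    lengths onto a sequence of length n (1-based, ordered, non-overlapping)."""
    m = len(lengths)

    def place(i, start, acc):
        if i == m:
            yield tuple(acc)
            return
        rest = sum(lengths[i + 1:])          # room needed by later segments
        for t in range(start, n - lengths[i] - rest + 2):
            acc.append(t)
            yield from place(i + 1, t + lengths[i], acc)
            acc.pop()

    yield from place(0, 1, [])


def count_threadings(n, lengths):
    """Closed form: spread the n - sum(c_i) unused residues over m + 1 gaps."""
    slack = n - sum(lengths)
    return comb(slack + len(lengths), len(lengths)) if slack >= 0 else 0


print(len(list(threadings(10, [2, 3]))), count_threadings(10, [2, 3]))  # 21 21
print(count_threadings(100, [5] * 10))   # already in the tens of billions
```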

Current limitations to protein threading
- Statistical problems
- Definition of neighbour and/or pairwise contact environments:
  - energetic neighbour ≠ contact neighbour
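How "contact neighbour" is defined is itself a modelling choice. A common convention, assumed here, is a distance cutoff between representative atoms such as C-alpha. The sketch shows that the set of contact neighbours changes with the cutoff, and nothing in it measures whether the residues actually interact energetically. The coordinates, cutoffs and `contact_neighbours` helper are hypothetical.

```python
from math import dist


def contact_neighbours(coords, cutoff, min_separation=2):
    """Pairs (i, j), i < j, at least `min_separation` apart in sequence whose
    representative atoms lie within `cutoff` (same units as coords)."""
    n = len(coords)
    return {
        (i, j)
        for i in range(n)
        for j in range(i + min_separation, n)
        if dist(coords[i], coords[j]) <= cutoff
    }


# Hypothetical C-alpha coordinates in angstroms (3.8 A between consecutive residues).
CA = [(0, 0, 0), (3.8, 0, 0), (3.8, 3.8, 0), (0, 3.8, 0), (0, 7.6, 0)]
for cutoff in (4.0, 6.0, 8.0):
    print(cutoff, sorted(contact_neighbours(CA, cutoff)))
```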
Computational Complexity of Finding an Optimal Alignment
The complexity of the protein threading problem depends on whether:
(i) variable-length gaps are allowed in alignments
(ii) the scoring function for an alignment incorporates pairwise interactions between amino acids

- Property (i) makes the size of the search space exponential in the length of the sequence
- Property (ii) forces a solution to take non-local effects into account
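With pairwise terms between segments, the best start for one segment depends on where the others are placed, so segments cannot be optimised independently. The simplest correct solver is exhaustive enumeration, sketched below with a hypothetical toy score (hydrophobic residues rewarded in buried segments and in antiparallel contacts between them). Its cost grows with the number of threadings, which is combinatorial in the sequence length and the number of segments. Ordered, non-overlapping segment placement is assumed.

```python
from itertools import accumulate, combinations


def all_threadings(n, lengths):
    """Every ordered, non-overlapping threading (1-based starts), via a bijection
    with m-element subsets of {1, ..., n - sum(lengths) + m}."""
    m = len(lengths)
    before = [0, *accumulate(lengths)][:m]   # total length of earlier segments
    for v in combinations(range(1, n - sum(lengths) + m + 1), m):
        yield tuple(v[i] - i + before[i] for i in range(m))


def brute_force(n, lengths, singleton, pairwise, contacts):
    """Exhaustive optimum; `contacts` lists core segment pairs (i, j) that touch."""
    def total(t):
        s = sum(singleton(i, t[i]) for i in range(len(t)))
        return s + sum(pairwise(i, t[i], j, t[j]) for i, j in contacts)

    best = max(all_threadings(n, lengths), key=total)
    return best, total(best)


# Hypothetical toy problem: two buried segments of length 2 packed antiparallel.
SEQ = "HHPPPHHP"
LENGTHS = [2, 2]
CONTACTS = [(0, 1)]


def window(i, t):
    return SEQ[t - 1 : t - 1 + LENGTHS[i]]


def singleton(i, t):
    return window(i, t).count("H")           # hydrophobic residues buried


def pairwise(i, ti, j, tj):
    # residue k of segment i faces residue c_j - 1 - k of segment j
    return sum(x == y == "H" for x, y in zip(window(i, ti), window(j, tj)[::-1]))


print(brute_force(len(SEQ), LENGTHS, singleton, pairwise, CONTACTS))  # ((1, 6), 6)
```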

Any protein threading scheme with both properties is NP-complete
(3-SAT: Lathrop, 1994)
(MAX-CUT: Akutsu & Miyano, 1999)
Thus all protein threading approaches can be divided into four groups:
1. no variable-length gaps allowed
2. no pairwise interactions considered in the scoring function
3. no guarantee of an optimal solution
4. exponential worst-case runtime
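Group 2 is the tractable case: if the score is a sum of independent per-segment terms, dynamic programming finds an optimal threading with variable-length gaps in polynomial time. The sketch below assumes segments keep their order and do not overlap, and keeps a running maximum over admissible previous starts so it needs O(m·n) score evaluations. The function name and toy score are hypothetical.

```python
def dp_threading(n, lengths, singleton):
    """Optimal threading (1-based starts) when the score is a sum of
    singleton(i, t) terms; assumes sum(lengths) <= n."""
    m = len(lengths)
    NEG = float("-inf")
    best = [[NEG] * (n + 2) for _ in range(m)]   # best[i][t]: segments 0..i, segment i at t
    back = [[0] * (n + 2) for _ in range(m)]     # back[i][t]: start of segment i-1
    for i in range(m):
        last = n - sum(lengths[i:]) + 1          # latest start leaving room for later segments
        run_val, run_arg = NEG, 0                # best over admissible previous starts
        for t in range(1, last + 1):
            if i == 0:
                best[0][t] = singleton(0, t)
                continue
            s = t - lengths[i - 1]               # newest admissible start of segment i-1
            if s >= 1 and best[i - 1][s] > run_val:
                run_val, run_arg = best[i - 1][s], s
            if run_val > NEG:
                best[i][t] = run_val + singleton(i, t)
                back[i][t] = run_arg
    t = max(range(1, n + 1), key=lambda s: best[m - 1][s])
    score, starts = best[m - 1][t], [t]
    for i in range(m - 1, 0, -1):                # trace back through earlier segments
        t = back[i][t]
        starts.append(t)
    return tuple(reversed(starts)), score


SEQ = "HHPPPHHP"   # hypothetical H/P sequence
LENGTHS = [2, 2]   # hypothetical core segment lengths


def singleton(i, t):
    # reward hydrophobic residues placed in a buried core segment
    return SEQ[t - 1 : t - 1 + LENGTHS[i]].count("H")


print(dp_threading(len(SEQ), LENGTHS, singleton))  # ((1, 6), 4)
```

Adding any pairwise term between segments breaks the decomposition that `best[i][t]` relies on, which is exactly where the problem becomes hard.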
Performance of Protein Threading Systems
- CASP1 (1994), CASP2 (1996), CASP3 (1998): Critical Assessment of protein Structure Prediction meetings
  - protein threading methods have consistently been the winners
  - success depends on the structural similarity of the target to known structures
  - successful even when the target sequence and library sequences have low sequence similarity
- Much room for improvement in all areas of protein threading, e.g.:
  - algorithms for searching the threading space
  - reliable, biologically accurate scoring functions