No Slide Title

Download Report

Transcript No Slide Title

Protein Structure Prediction
N. Gautham
Department of Crystallography and Biophysics
University of Madras, Guindy Campus
Chennai 600 025
Lecture Outline
• The problem of protein structure prediction - its
statement
• The Levinthal paradox and computational
complexity
• Methods of structure prediction
• Ab initio methods
• Genetic algorithms
• Potential energy minimisation
• MOLS
Information Transfer pathway within the cell
……ATGCATGCATGCATGCATGC..
………CGUACGUACGUACGU…………
DNA
………CGUACGUACGUACGU…………
RNA
DECODING
MECHANISM
PROTEIN Sequence
PROTEIN Structure
Biological function
Statement of the problem
• The sequence of amino acids in a protein
determines its three dimensional structure
….AVTYRGSED….
• The structure of a protein is essential for its
function
Statement of the problem
• Structures are determined experimentally
using X-ray crystallography and NMR
• This is expensive and time-consuming
• Instead, can this be done using computers ?
• The Problem:
Given the sequence of a protein, can we use
available information from Physics, Chemistry (and
databases of previous structures, etc.) to calculate
its three dimensional structure ?
Levinthal Paradox
• The Levinthal paradox arises when we consider
•
protein folding as a thermodynamic phenomenon
(driven by entropy)
This means:
- the native fold of a protein is its minimum
energy state
the protein folds by sampling its conformational
space to find the one with least energy
-
Levinthal Paradox
• The time taken to search all possibilities
increases exponentially with the size of the
protein (by known algorithms)
• THE PARADOX:
In real life proteins fold in a
few milliseconds, though we expect them to
take centuries!!
• In other words – the problem of protein
structure prediction (or protein folding) is
NP in terms of computational complexity
Levinthal Paradox
The ‘Golf
Course’ model
of the
potential
energy
landscape
Levinthal Paradox – The new
view of protein folding
The ‘folding
funnel’ model of
the potential
energy
landscape
Computational complexity
• If an algorithm is such that the computation
time increases as a polynomial function of the
size of the problem – it is a ‘Polynomial time’
algorithm. It belongs to the set P
e.g. Time = const x (size)2 + const x (size)5
• If an algorithm is such that the computation
time increases as an exponential function of
the size of the problem – it is a ‘NonPolynomial time’ algorithm. It belongs to the
set NP
e.g. Time = const x 2.5size
Computational complexity
• The Travelling Salesman problem
– an example of a problem in NP
City 3
City 1
City 6
City 2
City 4
City 5
• Problem – Find the path with the least
distance that covers all cities at least once.
• The number of paths to be tried increases
an exponential function of the number of
cities
as
Computational complexity
• Problems in computational biology that are
in NP :
• Construction of phylogenetic trees
A
B
A
C A
B
A
B
B
D C
D D
C
• Multiple sequence alignment
• Protein Structure Prediction
Methods of Protein Structure
Prediction
• Homology modelling
• Fold recognition
• Ab initio methods
• Genetic algorithms
• Potential energy
minimisation
• MOLS
Ab initio methods: Genetic
algorithms
• a.k.a. ‘Evolutionary Computation’
• The method operates on pieces of information
•
(like Nature on genes)
Start with a group of individuals (binary coded ?)
that represent possible solutions to the problem
• Apply mutation, variation and crossover operators
to the individuals
• From the resulting population, select individuals with
high values of fitness to populate next generation
• Iterate till best individual is obtained
Genetic algorithms: Application to
Protein Structure Prediction
• The initial generation consisted of protein structures
with random choice of torsion angle values
• The fitness function was a semi-empirical
potential
energy function, i.e. EvdW + Eel + Etor + Epseudoentropy
• The mutate operator randomly changed torsion angle
values
• The variate operator made small, random increments
or decrements to torsion angle values
• The crossover operator interchanged portions of
randomly selected pairs of individuals
Potential Energy Minimization
• Minimize Potential Energy (Least squares,
Conjugate Gradient, Molecular Dynamics….)
• The problem – where to start? How to avoid
local minima?
• Many methods
- Build-up method
- Conformational Space Annealing
- Monte Carlo Minimization
- Diffusion Equation and Distance Scaling
- Simulated Annealing
- …….
Potential Energy Minimization
• Build-up method
5
1
Step 1
Minimize
10
6
1
Step 2
10
Minimize
Mutually Orthogonal Latin Squares
OBJECTIVE: To build a library of the lowest
energy conformations of an oligopeptide
Mutually Orthogonal Latin Squares
METHOD (IN BRIEF): The MOLS cycle
Parameterize the search space
Use these to build a set of MOLS (chosen at
random) to globally sample the space
Analyze the samples to obtain a low energy conformation
(This is followed by gradient minimization)
Yes
Another low energy
conformation?
Mutually Orthogonal Latin Squares
Results: The (23) best structures for Met-enkephalin
GA
MOLS
Initial population of 50
individual structures for
the sequence
Sequence is split into
overlapping fragments of
five/seven/nine residues
Mutations
Conformational search
for all fragments using
MOLS yielding ~ 1000
structures each
Resulting structures
are clustered
A library
of
structures
for
each fragment
Variations
Structures from
MOLS libraries
Crossing over
Selection
of individuals with
lower energy
Avian pancreatic
polypeptide
36 residues
RMSD 4.0 A
Prediction
Experiment
(X-ray crystallography)
Villin headpiece
36 residues
RMSD 5.2 A
Prediction
Experiment
(X-ray crystallography)
Tryptophan
zipper
16 residues
RMSD 2.7 A
Prediction
Experiment
(NMR)
Bovine Pancreatic Trypsin
Inhibitor
58 residues
3 disulphide bridges
Prediction
Experiment
(X-ray)
Protein Structure Prediction:
Conclusion
• If the sequence of the protein of unknown structure has greater
than 40% identity with one of known structure, the structure
prediction problem may be considered solved – especially with the
structural genomics initiative
• Ab initio
structure prediction, using knowledge only of
sequence, and of physics and chemistry, is as yet an unsolved
problem