Transcript Charles

Protein Design
CS273: Final Project
Charles Kou
[email protected]
Crystal structure of top7 –
A novel protein structure created with RosettaDesign.
http://rosettadesign.med.unc.edu/
What is Protein Design
 Opposite of structure prediction:
determine low energy sequence that
yield given structure.
 Computationally difficult:
 Search space of 20^n where n =
sequence length (20 amino acids)
 Major algorithms: Dead-end elimination,
genetic algorithms, Monte Carlo, Branch
& Bound.
http://www.stanford.edu/class/cs273/project/project.html
Major Algorithms
 Trade off between thoroughness and
computational speed.
 Monte Carlo / Genetic Algorithm:
 Can sample space with infinite number of solutions
 Sidechain identity, side chain orientation and
backbone structure can be varied continuously.
 No guarantee of reaching global energy minimum.
 Dead-End Elimination
 Allows only discrete conformations.
 Rejection criteria is used to prune the search
space.
Desjarlais JR, Clarke ND.
Computer search algorithms in protein modification and design.
Curr Opin Struct Biol. 1998 Aug;8(4):471-5. PubMed
Review: Energy Landscape
qj
defined over large dimensional
conformation space
qi
qN-1
qN
q2
q1
JC Lantombe, Energy2.ppt
Review: Example Energy Function
• E=
•



S bonded terms
+ S non-bonded terms
+ S solvation terms
E = (ES + EQ + ES-B + ETor) + (EvdW + Edipole)
Bonded terms
- Relatively few
Non-bonded terms
- Depend on distances between pairs of atoms
- O(n2)  Expensive to compute
Solvation terms
- May require computing molecular surface
JC Lantombe, Energy2.ppt
Review: Monte Carlo Simulation
(MCS)
 Random walk through conformation space
 At each cycle:
– Perturb current conformation at random
– Accept step with probability:
 - ΔE

kT
P(accept)=min 1,e



(Metropolis acceptance criterion)
 The conformations generated by an arbitrarily long
MCS are Boltzman distributed, i.e.,
#conformations in V ~

V
e
-
E
kT
dV
JC Lantombe, Energy2.ppt
Monte Carlo Simulation
 Tend to waste time in local min.
 May consist of millions of steps.
 Energy must be evaluated frequently
(computationally expensive).
 Use ChainTree to improve performance.
Lotan, I., Schwarzer, F., Halperin, D., Latombe, J.C.:
Efficient maintenance and self-collision testing for kinematic chains.
In: Symposium on Computational Geometry (2002) 43–52
Koehl, P and Levitt, M. De novo protein design. I. In search of stability and specificity.
Journal of Molecular Biology, 293, 1161-1181 (1999).
Genetic Algorithm
Starts with First generation pool.
 Iteratively apply genetic operators
(selection, recombination, mutation).
 Evloves toward better solution (low
energy function).
S. M. Larson, J. England, J. DesJarlais, and V. S. Pande.
Thoroughly sampling sequence space: large-scale protein design of structural ensembles.
Protein Science 11 2804-281 (2002). Protein Science
Selection
• Selection function takes into account
the value of fitness function. This gives
priority to the “fit” organism but also
gives chance for “less fit” organisms.
http://en.wikipedia.org/wiki/Genetic_algorithm
Selection Method
• Roulette Method: probability of selection is
proportional to the value of fitness function
• Tournament: picks k individuals (tournament
size), and choose the individual with
probability p. Iterate with probability p*(1-p),
then p*(p*(1-p)) …
• Higher k = less chance for weaker individual.
http://en.wikipedia.org/wiki/Roulette_wheel_selection
http://en.wikipedia.org/wiki/Tournament_selection
Recombination, Mutation
 Recombination: different segment of the
structure which is optimized in parallel can be
recombined into the same model.
Recombination occurs with a set probability.
Otherwise, the population is propogated to
the next generation.
 Mutation: avoids local minima by mutating the
child with a set probability.
 Similar to MC: there is no guarantee to
converge into global minimum.
http://en.wikipedia.org/wiki/Genetic_operator
Genome@home
• Genome@Home uses
distributed computing
and genetic algorithm.
• It also incorporates
backbone flexibility
using Monte Carlo
(random perturbation
with RMSD<1.0a) which
improves the result.
http://www.stanford.edu/group/pandegroup/genome/
Dead-end Elimination
 Discrete conformational search.
 Functionally equivalent to exhustive search.
 It uses rejection criteria to prune the search
space.
 The robustness depends on the discreteness
and the rejection criteria used.
 Guaranteed convergence to global min.
 Initially used for sidechain placement. More
difficult for protein design because of high
degrees of freedom.
Looger LL, Hellinga HW.
Generalized dead-end elimination algorithms make large-scale protein side-chain structure prediction tractable:
implications for protein design and structural genomics.
J Mol Biol. 2001 Mar 16;307(1):429-45. PubMed
Energy of conformation
 Reformulation of sidechain placement problem:
Amino acid identity is used instead of rotamer.
 The general DEE allows residue up to 300.
 Energy of conformation is defined as sum of
interaction among side chains and sum of interaction
of sidechain and the backbone.
 Rejection criteria is used and iterated until no more
rotamers can be eliminated. Convergence occurs, or
reduces the problem sufficiently for exhaustive
serach.
DEE filter: Rejection Criteria
 Simple Criterion: If lowest energy
struct that can be found using a given
sidechain rotamer (low energy side chain
conformation) is higer than the highest
energy struct w/ different rotamer, the
first rotamer is eliminated.
DEE filter: Rejection Criteria
 Goldstein Criteria: if energy struct
containing one rotamer is always
lowered by changing to a second one,
the first one is eliminated.
DEE filter: Rejection Criteria
 Generalized Criterion: residues are
added in group, eliminated clusters of
rotamers in the groups maybe excluded
from the minimum operator, in addition
to those which form dead-end clusters
with c.
Mean Field Theory
• Reduce search space.
• Self-consistency is sought
by placing amino acids at
pre-selected positions in a
given structure.
• Energy function is
minimized by mean field.
Voigt CA, Mayo SL, Arnold FH, Wang ZG.
Computational method to reduce the search space for directed protein evolution.
Proc Natl Acad Sci U S A. 2001 Mar 27;98(7):3778-83. PNAS
Review: Branch & Bound
•
•
Set of solutions can be
partitioned into subsets
(branch)
Upper limit on a subset’s
solution can be computed
fast (bound)
Branch & Bound
1. Select subset with best
possible bound
2. Subdivide it, and
compute a bound for
each subset
S.Batzoglou, Threading2.ppt
Rosetta Design
• Initial backbone designed
without regard to side-chain
packing.
• Iterates between sequence
design and backbone optimization
using Monte Carlo.
• Perturbation in random change in
the torsional angles of 1-5
random residue, or substitution
of backbone torsonal angles of 13 consecutive residues with
torsional angles from a structure
in the PDB. Sidechain
optimization. Accept/reject
using Metropolis criterion.
• 1.17-a backbone atom RMSD
between model and structure.
Crystal structure of top7 –
A novel protein structure created with RosettaDesign.
http://rosettadesign.med.unc.edu/
Kuhlman B, Dantas G, Ireton GC, Varani G, Stoddard BL, Baker D.
Design of a novel globular protein fold with atomic-level accuracy.
Science. 2003 Nov 21;302(5649):1364-8. PubMed
Using Rosetta Design
• Red: PDB 1A1M: Mhc
Class I Molecule
B*5301 Complexed
With Peptide
Typdinqml From Gag
Protein Of Hiv2
• Blue: Rosetta Stone
Designed
• Visualized with Deep
View / SwissPdbViewer.
http://us.expasy.org/spdbv/
http://www.rcsb.org/pdb/cgi/explore.cgi?pid=195321117535569&pdbId=1A1M
b.e.a.n.s.
• A simple openGL based
program was developed
to test monte carol and
genetic algorithms on
designing “chain of
jelly beans.”
• User is able to vary the
initial structure of the
“beans” and compare
the efficiency of the
algorithms via built-in
timer.