ppt - Brown University
Download
Report
Transcript ppt - Brown University
Face-centered cubic (FCC) lattice
models for protein folding: energy
function inference and biplane
packing
Allan Stewart
Proteins carry out the work of the cell
Reményi A et al. Genes Dev. 2003;17:2048-2059
Dogma of computational protein
structure prediction (PSP)
The biological “native” has the minimum energy
conformation over the entire fold landscape.
Controversial whether native is unique or there
if there may generally be an ensemble
Protein folding is NP-hard in most
formulations
Reduction to partition problem (Ngo & Marks '92)
The problem is hard under any
reasonable model
The protein energy is minimized iff
there is an assignment of move vectors into two
subsets st. the sum of the subsets are equal
We find NP-hardness results for other
formulations: Ising model, bin packing,
Hamiltonian path. (See Istrail & Lam, Combinatorial Algorithms for
Protein Folding in Lattice Models: A Survey of Mathematical Results, 2009)
This sort of worst-case intractability analysis will
not improve in even the simplest of models.
Biological basis of folding principally
involves hydrophobic collapse
“Truth is much too complicated to allow
anything but approximations.”
~ John von Neumann
Protein primary structure:
N - AGECH... - C
The tertiary (3D) structure is dependent on
primary structure alone (experimental evidence)
Biological basis of folding principally
involves hydrophobic collapse
Suppose the protein sequence to be a string
over the 20-letter amino acid alphabet
Avoiding the complexity (charge, size) of amino
acids, we classify the residues as
'H':
hydrophobic residues
'P':
hydrophilic (polar) residues
The HP model (Ken Dill, 1985) is a simplest
framework for folding, and prioritizes H-H
interactions.
Biological basis of folding principally
involves hydrophobic collapse
Right: red are hydrophobic, green hydrophilic
Reds form a core in the center → Long-range interactions
There are two camps in protein folding
Off-lattice (continuous mathematics)
More flexibility
Heuristic methods perform fairly well
Optimality of simulation is uncertain
On-lattice (discrete mathematics)
Exhaustive enumeration of space
Provably timely and near-optimal results
But a lot is yet unknown...
We don't know if the lattice gives good
prediction
PDB
Repo
continuous
discrete
LatFit
FCC
SC
Energy
Fit an
Energy
function
Predict
Native
Conjecture/
theorems
Structures
Statistical
Evaluation
of existing
methods
My Thesis
Model
Face-centered cubic lattice
Lattices are discrete subgroups of 3R
distinguished by their basis vectors (connectivity)
and coordination number
Folding is the minimization of the
energy potential function
Protein conformations are Boltzmann distributed
Typical energy functions sum values for each of the
pairs of amino acids in the protein sequence.
(Some don't)
Caveat foldtor
Your fold is only as good as your potential function,
and how hard you work is dependent on the function.
Prior work shows that we can do
with just a few parameters
HP model typically only scores H-H contacts
This corresponds to a symmetric interaction
matrix
We look for empirical parameters
which improve over the 'HP' matrix
• Extract 1198 PDB structures
• Generate decoys
decoys are natives which have been perturbed by
roughly 16%
• Count all types of contacts
• Use gradient ascent to optimize choice of
parameters
max
We found an optimum energy
function, but not a universal one.
H
P
H
P
S
-.13
0
-.05
H
-.04 -.06
P
S
X
13997
H
P
S
-.15
0
0
-.04
0
S
X
13365
13997 → 72% successful prediction.
A large fraction of the decoys are very
deceptive
Pairwise Function Impossibility
Conjecture
In collaboration with Warren Schudy and Sorin
Istrail, we conjecture that no linear function f
which sums pairwise potential satisfies axioms
(1) and (2)
(1)
(2)
We formulate it as an LP with the above as linear
constraints.
Towards Realistic Models of Folding
“For me, the first challenge for computing
science is to discover how to maintain order in
a finite, but very large, discrete universe that is
intricately intertwined.”
~ Edsgar Dijkstra
What do lattice algorithms look like?
We chop the protein into blocks and align
blocks with high hydrophobicity.
(Hart and Istrail 1995)
Use inequalities to bound numbers of contacts
Approximation algorithms
What do lattice algorithms look like?
Hart and Istrail (1997) show an 86%
approximation ratio for a 4x2 biplane on FCC
sidechain model
The biplane is near-optimal, but is it
realistic???
We found an optimal center cutting plane
through each protein and annotated the
hydrophobics lying within distance k.
There is high variance in biplane hydrophobicity.
Roughly 50:50 biplanar to non-biplanar
The biplane is near-optimal, but is it
realistic???
set
alpha
beta
unstructured
solvent
biplanar
30%
19%
49%
46%
non-biplanar
37%
20%
44%
47%
Biplane corresponds best to a globular fold.
The alpha helix is a problem!
Rescuing the alpha-helix with the FCC
The alpha-helix is a right-handed helix, ~4
residues per turn.
The FCC places spheres
at angles : the
dihedral angles of the
helix.
Idea #1: Find a 4-tuple of alpha
vectors in FCC
Alpha bundles from
Pokarowski et al. 2003
Idea #2: Assemble octahedrons in
FCC lattice
Goal
73
69
Build an octahedron-like conformation with
hydrophobics towards center.
Exploits angles of FCC: face angles = 120deg.
Conclusion: new frontiers for FCC
sidechain folding
Implications for algorithm design
Block partitioning
Fold block into biplane or octahedron
Can we prove bounds for increasingly complex
methods?
If we prove pairwise impossibility, how do we
construct our energy function?
Questions?
Fire away.
“If you don't work on important problems, it's not
likely that you'll do important work.”
~ Richard Hamming
Thank you for doing important work.
If I reach this slide, something went
wrong
“There's no sense in being precise when you
don't even know what you're talking about.”
– John von Neumann
“It is better to do the right problem the wrong
way than the wrong problem the right way.
— Richard Hamming