Transcript Chapter15
Chapter 15
Structure Prediction: Threading
Motivation
Given a protein sequence, can you predict its molecular
structure?
We want to avoid repeated X-ray crystallography,
but we still want accuracy
You could use sequence alignment to a protein of known
structure, but what do you do with the gapped regions?
More complex methods are only justified if
they can be shown to perform better than
simpler methods
Simpler methods are only justified if they can
perform better than basic sequence
alignment
First Step
Some structure comparison methods
use the predicted secondary structure of the new
sequence
Predict location of secondary structure
elements along the protein’s backbone
and the degree of residue burial
Supervised learning has been shown to
perform well on this task
[Figure: Artificial Neural Network; predicts structure at this point]
Danger
A network may fit its training set well but
fail to generalize to other data (overfitting)
Perhaps we should train several ANNs
and then let them vote on the structure
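A minimal sketch of letting several networks vote, assuming each network emits one three-state label per residue (H = helix, E = strand, C = other). The function name `jury_vote` and its tie-breaking rule are illustrative assumptions, not the procedure of any particular published method.

```python
from collections import Counter


def jury_vote(predictions: list[str]) -> str:
    """Combine per-residue secondary-structure predictions by majority vote.

    Each prediction is a string over {'H', 'E', 'C'} of the same length.
    On a tie, Counter.most_common returns the tied label that appears
    first in the column, i.e. the one from the earliest-listed network.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")
    consensus = []
    for column in zip(*predictions):  # one column = all votes for one residue
        consensus.append(Counter(column).most_common(1)[0][0])
    return "".join(consensus)
```

For example, `jury_vote(["HHHCC", "HHECC", "HEECC"])` gives `"HHECC"`: residue 3 is voted E because two of the three networks call it a strand.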
PHD: Profile network from HeiDelberg
Uses a multiple alignment of the protein family as input
instead of just the new sequence
On the first level, a window of length 13 around
the residue is used
The window slides down the sequence, making a
prediction for each residue
The input includes the frequency of each amino acid
at each position of the multiple alignment (in the
example, there are 5 sequences in the multiple
alignment)
The second level takes as input the first-level
predictions from the networks centered on neighboring
residues
The third level makes a jury decision, letting several independently trained networks vote
[Figure: PHD sliding along the sequence; successive windows predict residues 4, 5 and 6]
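The first-level input can be sketched as follows, assuming the multiple alignment is a list of equal-length strings with `-` for gaps. Each residue gets a window of 13 alignment columns centred on it, and each column contributes the frequencies of the 20 amino acids. The function names and the zero-padding at the ends are assumptions made for illustration; the real PHD input encoding has details not modelled here.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_frequencies(alignment: list[str]) -> list[list[float]]:
    """Relative frequency of each amino acid in every alignment column (gaps ignored)."""
    n_cols = len(alignment[0])
    profile = []
    for col in range(n_cols):
        residues = [seq[col] for seq in alignment if seq[col] in AMINO_ACIDS]
        total = len(residues)
        profile.append([residues.count(aa) / total if total else 0.0
                        for aa in AMINO_ACIDS])
    return profile


def window_inputs(alignment: list[str], window: int = 13) -> list[list[float]]:
    """One flat input vector per column: the profiles of `window` columns centred on it.

    Columns that fall off either end of the alignment are padded with zeros
    (a hypothetical choice for this sketch).
    """
    if window % 2 == 0:
        raise ValueError("window length must be odd so it has a centre")
    profile = column_frequencies(alignment)
    half = window // 2
    zeros = [0.0] * len(AMINO_ACIDS)
    inputs = []
    for centre in range(len(profile)):
        vector = []
        for pos in range(centre - half, centre + half + 1):
            vector.extend(profile[pos] if 0 <= pos < len(profile) else zeros)
        inputs.append(vector)
    return inputs
```

With 13 columns of 20 frequencies each, every residue is described by a 260-value input vector, and sliding the window simply means computing one such vector per column.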
Threading
Threading matches structure to
sequence
True threading considers 3D spatial
interactions
3D-1D Matching (Bowie et al.)
Convert 3D structure into a string
Include α-helix, β-sheet, or neither (3 states)
Include the degree of burial or solvent accessibility (6
levels)
Total of 3 × 6 = 18 distinct states
With $P_{a:j}$ = probability of finding amino acid $a$ in environment $j$, and
$P_a$ = probability of finding $a$ anywhere, the score of $a$ in environment $j$ is
$$ s_{aj} = \log \frac{P_{a:j}}{P_a} $$
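A sketch of the 18 environment classes and of the score $s_{aj}$ computed from counts, assuming `counts[a][j]` is the number of times amino acid `a` was seen in environment class `j` in a set of known structures. The natural log and the pseudocount are assumptions added here (the pseudocount only avoids $\log 0$ for unseen pairs); the function names are hypothetical.

```python
import math

SECONDARY = ("helix", "sheet", "other")
BURIAL_LEVELS = 6  # 6 levels of burial / solvent accessibility


def environment_index(secondary: str, burial: int) -> int:
    """Map (secondary structure, burial level) to one of 3 x 6 = 18 classes."""
    if not 0 <= burial < BURIAL_LEVELS:
        raise ValueError("burial level must be in 0..5")
    return SECONDARY.index(secondary) * BURIAL_LEVELS + burial


def environment_scores(counts, pseudocount=1.0):
    """s[a][j] = log(P(a:j) / P(a)) from counts[a][j].

    P(a:j) is the fraction of class-j positions occupied by a;
    P(a) is the fraction of all positions occupied by a.
    """
    amino_acids = list(counts)
    n_env = len(next(iter(counts.values())))
    smoothed = {a: [c + pseudocount for c in counts[a]] for a in amino_acids}
    env_totals = [sum(smoothed[a][j] for a in amino_acids) for j in range(n_env)]
    grand_total = sum(env_totals)
    scores = {}
    for a in amino_acids:
        p_a = sum(smoothed[a]) / grand_total
        scores[a] = [math.log((smoothed[a][j] / env_totals[j]) / p_a)
                     for j in range(n_env)]
    return scores
```

A positive $s_{aj}$ means amino acid $a$ is seen in environment $j$ more often than its overall frequency would predict; a negative value means it is under-represented there.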
3D-1D
The information values $s_{aj}$ were calculated on a
training set of multiple alignments, and the scores
were used as a profile, one column per position of
the structure string
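To use such scores as a profile, each position of a structure's environment string can be given the column of scores for its class. The hypothetical helpers below build that profile from a table `scores[a][j]` and slide it along a sequence without gaps, returning the best-scoring offset. The ungapped scan is a simplification chosen for brevity.

```python
def structure_profile(env_string, scores):
    """Profile: for each structure position, a dict amino acid -> s[a][env].

    env_string is a sequence of environment class indices, one per position.
    """
    return [{a: scores[a][j] for a in scores} for j in env_string]


def best_ungapped_threading(sequence, profile):
    """Slide the structure profile along the sequence without gaps.

    Returns (best_score, best_offset), where offset is the sequence index
    aligned with the first structure position.
    """
    n = len(profile)
    if len(sequence) < n:
        raise ValueError("sequence is shorter than the structure")
    best = None
    for offset in range(len(sequence) - n + 1):
        total = sum(profile[i][sequence[offset + i]] for i in range(n))
        if best is None or total > best[0]:
            best = (total, offset)
    return best
```

For instance, with `scores = {"L": [1.0, -1.0], "K": [-1.0, 1.0]}` and environment string `[0, 0, 1]`, the sequence `"KLLK"` fits best at offset 1 (`LLK`), with score 3.0.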
When applied to the globin family, the method
clearly distinguished myoglobins from
non-globins, but not from other globins
Methods using 3D interactions
Residues that have large separation in
the sequence may end up next to each
other when the protein is folded.
Define a measure of contact between
residues (e.g., any two of their atoms within 5 Å)
and count the frequency of contact between all
residue pairs in the PDB
Use this measure to evaluate the cost of an
alignment, or to select the best alignment
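A sketch of the contact count, assuming each structure is a list of residues given as `(residue_type, atom_coordinates)` with coordinates in ångströms. Two residues are in contact if any pair of their atoms lies within 5 Å; pairs are stored unordered. The `min_separation` parameter (for skipping residues close together along the chain) is a hypothetical addition; with its default of 1 every pair of distinct residues is counted.

```python
import math
from collections import Counter
from itertools import combinations


def in_contact(atoms1, atoms2, cutoff=5.0):
    """True if any atom of one residue is within `cutoff` Å of any atom of the other."""
    return any(math.dist(p, q) <= cutoff for p in atoms1 for q in atoms2)


def count_contacts(structures, cutoff=5.0, min_separation=1):
    """Count contacting residue-type pairs over a set of structures.

    structures: iterable of residue lists; each residue is (type, [(x, y, z), ...]).
    Returns a Counter keyed by alphabetically sorted type pairs, e.g. ("A", "V").
    """
    counts = Counter()
    for residues in structures:
        for (i, (t1, a1)), (j, (t2, a2)) in combinations(enumerate(residues), 2):
            if j - i < min_separation:
                continue  # too close along the chain (hypothetical filter)
            if in_contact(a1, a2, cutoff):
                counts[tuple(sorted((t1, t2)))] += 1
    return counts
```

Dividing these counts by what would be expected from the residue frequencies alone turns them into contact propensities that can be used as alignment costs.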
[Figure: 3D interactions]
Potentials of mean force (POMF)
Since the notion of contact is somewhat
arbitrary, a more general formulation
can be tried
Derive an empirical function for the
propensity of each of the 400 pairs of
residues to be any given distance apart.
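One way to turn this into code, assuming the input is a list of observed `(type1, type2, distance)` triples taken from known structures: bin the distances for each residue pair and compare them with the distance distribution over all pairs. The 1 Å bins, the 15 Å cutoff, the pseudocount, and the log ratio are illustrative choices. The sign is kept so that favoured distances appear as peaks; energy-style formulations commonly use the negative of this log ratio.

```python
import math
from collections import defaultdict


def pomf(observations, bin_width=1.0, max_distance=15.0, pseudocount=1.0):
    """Distance-dependent propensity for each residue pair.

    observations: iterable of (type1, type2, distance_in_angstroms).
    Returns {pair: [log(f_pair(bin) / f_all(bin)) for each bin]}; positive
    values mark distances the pair favours, negative values ones it avoids.
    """
    n_bins = int(max_distance / bin_width)
    pair_hist = defaultdict(lambda: [0.0] * n_bins)
    all_hist = [0.0] * n_bins
    for t1, t2, d in observations:
        b = int(d / bin_width)
        if b >= n_bins:
            continue  # ignore pairs beyond the distance range
        pair = tuple(sorted((t1, t2)))
        pair_hist[pair][b] += 1
        all_hist[b] += 1
    # reference distribution: all residue pairs pooled together
    all_total = sum(all_hist) + pseudocount * n_bins
    f_all = [(c + pseudocount) / all_total for c in all_hist]
    result = {}
    for pair, hist in pair_hist.items():
        total = sum(hist) + pseudocount * n_bins
        result[pair] = [math.log(((c + pseudocount) / total) / f_all[b])
                        for b, c in enumerate(hist)]
    return result
```

Only pairs that actually occur in the observations get an entry; with 20 amino acid types there are at most 210 unordered pairs, or 400 if order is kept.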
Multiple Sequence Threading
Multiple Sequence Alignment
Align the most similar sequences first to create a
consensus sequence
Align the consensus sequences to create the overall
alignment
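A toy version of this progressive strategy, assuming plain Needleman-Wunsch global alignment with match +1, mismatch -1, gap -1 (scores chosen only for illustration). At each step the most similar pair is replaced by its consensus; where the two aligned residues differ, the consensus simply keeps the first one. Real progressive aligners use substitution matrices and profile-profile alignment instead.

```python
from itertools import combinations

MATCH, MISMATCH, GAP = 1, -1, -1  # illustrative scores, not from the source


def global_align(s, t):
    """Needleman-Wunsch global alignment; returns (score, aligned_s, aligned_t)."""
    n, m = len(s), len(t)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * GAP
    for j in range(1, m + 1):
        F[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = F[i - 1][j - 1] + (MATCH if s[i - 1] == t[j - 1] else MISMATCH)
            F[i][j] = max(diag, F[i - 1][j] + GAP, F[i][j - 1] + GAP)
    # trace back from the bottom-right corner
    a, b, i, j = [], [], n, m
    while i > 0 or j > 0:
        sub = MATCH if i > 0 and j > 0 and s[i - 1] == t[j - 1] else MISMATCH
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sub:
            a.append(s[i - 1]); b.append(t[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + GAP:
            a.append(s[i - 1]); b.append("-"); i -= 1
        else:
            a.append("-"); b.append(t[j - 1]); j -= 1
    return F[n][m], "".join(reversed(a)), "".join(reversed(b))


def consensus(a, b):
    """Merge two aligned strings; on a mismatch keep the residue from `a`."""
    return "".join(y if x == "-" else x for x, y in zip(a, b))


def progressive_consensus(sequences):
    """Repeatedly replace the most similar pair with its consensus sequence."""
    pool = list(sequences)
    while len(pool) > 1:
        i, j = max(combinations(range(len(pool)), 2),
                   key=lambda ij: global_align(pool[ij[0]], pool[ij[1]])[0])
        _, a, b = global_align(pool[i], pool[j])
        merged = consensus(a, b)
        pool = [s for k, s in enumerate(pool) if k not in (i, j)] + [merged]
    return pool[0]
```

For `["ACGT", "ACGT", "ACT"]`, the two identical sequences are merged first, and the remaining `ACT` is then aligned as `AC-T` against `ACGT`, giving the consensus `ACGT`.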
Use the same strategy with structures
Assume that conserved hydrophobic
positions should pack in the core
This appears to be a work in progress (as of 1997)
Example
Two small hydrophobic residues, alanine (A) and
valine (V), both favor packing in the core of the
protein, so POMF(A,V) would have a peak around 5 Å
Aspartate (D) and valine (V) do not often pack
together, so POMF(D,V) will have a dip around 5 Å
[Figure: POMF(A,V) and POMF(D,V), each plotted as probability against distance, with 5 Å marked]
Sequence-Structure Alignment
For each known structure
Align the unknown sequence to that structure
Find the best alignment
Return the structure with the best global
alignment
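When each structure position is scored independently (as in the 3D-1D profiles above), this loop is straightforward: dynamic programming gives the best global alignment to each profile. A minimal sketch, assuming a profile is a list of `{amino_acid: score}` dicts, a linear gap penalty of -2 (an arbitrary choice), and a library given as a `{name: profile}` dict; all names are hypothetical.

```python
def align_to_profile(sequence, profile, gap=-2.0):
    """Global DP score of a sequence against a structure profile.

    profile[i][a] scores amino acid a at structure position i;
    every gap (in either the sequence or the structure) costs `gap`.
    """
    n, m = len(profile), len(sequence)
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + profile[i - 1][sequence[j - 1]],
                          F[i - 1][j] + gap,   # structure position left empty
                          F[i][j - 1] + gap)   # sequence residue skipped
    return F[n][m]


def best_structure(sequence, library, gap=-2.0):
    """Return (name, score) of the library structure whose profile aligns best."""
    return max(((name, align_to_profile(sequence, prof, gap))
                for name, prof in library.items()),
               key=lambda item: item[1])
```

The recursion works only because each cell depends on its own position's score; once contacts between distant positions are scored, that decomposition breaks.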
Unfortunately, once pairwise (3D) interactions are scored,
we can't use dynamic programming (the problem is NP-complete)
Heuristics must be used to explore the search
space.
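With pairwise contact terms, the score of a structure position depends on which residue lands at each of its contact partners, so the alignment can no longer be built up one position at a time. A minimal sketch of one generic heuristic, greedy hill climbing, assuming every structure position is filled (sequence residues may be skipped), that single-site scores default to 0, and that `pair` keys are alphabetically sorted residue pairs. This is a swapped-in illustration, not a specific published threading algorithm, and it can stop at a local optimum.

```python
def threading_score(seq, mapping, single, contacts, pair):
    """Sum of single-site terms plus pair terms over structural contacts.

    mapping[i] is the sequence index placed at structure position i;
    single[i].get(a, 0.0) scores amino acid a at position i; contacts lists
    (i, k) position pairs in contact; pair[(a, b)] scores a and b in contact.
    """
    total = sum(single[i].get(seq[s], 0.0) for i, s in enumerate(mapping))
    for i, k in contacts:
        a, b = sorted((seq[mapping[i]], seq[mapping[k]]))
        total += pair.get((a, b), 0.0)
    return total


def hill_climb(seq, n_positions, single, contacts, pair):
    """Greedy local search: shift one position's residue by +/-1 while it helps."""
    if len(seq) < n_positions:
        raise ValueError("sequence shorter than structure")
    mapping = list(range(n_positions))  # start: first n residues, no gaps
    best = threading_score(seq, mapping, single, contacts, pair)
    improved = True
    while improved:
        improved = False
        for i in range(n_positions):
            for step in (-1, 1):
                new = mapping[:]
                new[i] += step
                lo = new[i - 1] if i > 0 else -1
                hi = new[i + 1] if i + 1 < n_positions else len(seq)
                if not lo < new[i] < hi:
                    continue  # keep the mapping strictly increasing and in range
                score = threading_score(seq, new, single, contacts, pair)
                if score > best:
                    mapping, best, improved = new, score, True
    return mapping, best
```

For example, threading the sequence `"GAV"` onto two contacting positions that favour A and V respectively moves the mapping from `[0, 1]` to `[1, 2]`, picking up the A-V contact bonus.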
Evaluating Methods
Is the complexity worth it?
This is difficult without a benchmark
Few comparative studies have been
performed
When they have been performed, authors of
competing methods have complained that wrong
parameters were used …
Critical Assessment of Structure Prediction
(CASP, since 1994) releases the sequences of proteins
whose structures are solved but not yet published.
Participating groups submit their predictions before the structures are made public
Predictions are analyzed based on fold
recognition, modeling accuracy and alignment
accuracy.
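As an illustration of alignment accuracy, one simple measure is the fraction of residue pairs in a reference (structure-based) alignment that the predicted alignment reproduces. The sketch below assumes both alignments are pairs of gapped strings over the same two sequences; it is not CASP's official assessment procedure, and the function names are hypothetical.

```python
def aligned_pairs(a, b):
    """Set of (i, j) residue-index pairs aligned in two gapped strings."""
    pairs, i, j = set(), 0, 0
    for x, y in zip(a, b):
        if x != "-" and y != "-":
            pairs.add((i, j))
        if x != "-":
            i += 1  # advance the residue index of the first sequence
        if y != "-":
            j += 1  # advance the residue index of the second sequence
    return pairs


def alignment_accuracy(predicted, reference):
    """Fraction of reference-aligned residue pairs recovered by the prediction."""
    ref = aligned_pairs(*reference)
    if not ref:
        raise ValueError("reference alignment has no aligned pairs")
    return len(aligned_pairs(*predicted) & ref) / len(ref)
```

For example, if the reference is `("AC-T", "ACGT")` and the prediction is `("A-CT", "ACGT")`, two of the three reference pairs are recovered, for an accuracy of 2/3.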
No one method or approach is obviously superior