Machine Learning Methods of Protein
Secondary Structure Prediction
Presented by Chao Wang
• What is secondary structure?
• How to evaluate secondary structure
prediction?
• How does secondary structure prediction affect
the accuracy of tertiary structure prediction?
• Our perspective: "elite"
What is secondary structure?
• Hydrogen bond: a non-covalent bond
A hydrogen bond is identified if its electrostatic energy E,
as computed by DSSP, is less than -0.5 kcal/mol
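For reference, DSSP (Kabsch and Sander, 1983) models the backbone hydrogen bond as an electrostatic interaction between partial charges: +q1 on C and -q1 on O of the C=O group (q1 = 0.42e), and -q2 on N and +q2 on H of the N-H group (q2 = 0.20e). This gives E = q1·q2·f·(1/r(ON) + 1/r(CH) - 1/r(OH) - 1/r(CN)), with f = 332 and distances in Å. Below is a minimal sketch of that formula. The function names and tuple-based coordinates are our own. In real DSSP the amide hydrogen is placed from the backbone geometry rather than read from the input.

```python
import math

Coord = tuple[float, float, float]

# DSSP partial charges (units of e) and dimensional factor f, giving kcal/mol.
Q1, Q2, F = 0.42, 0.20, 332.0
HBOND_CUTOFF = -0.5  # kcal/mol; below this a hydrogen bond is assigned


def dssp_hbond_energy(c: Coord, o: Coord, n: Coord, h: Coord) -> float:
    """Electrostatic energy (kcal/mol) between the C=O of one residue and
    the N-H of another, following the DSSP formula. Distances in angstroms."""
    r_on = math.dist(o, n)
    r_ch = math.dist(c, h)
    r_oh = math.dist(o, h)
    r_cn = math.dist(c, n)
    return Q1 * Q2 * F * (1 / r_on + 1 / r_ch - 1 / r_oh - 1 / r_cn)


def is_hbond(c: Coord, o: Coord, n: Coord, h: Coord) -> bool:
    """True if the DSSP energy falls below the -0.5 kcal/mol cutoff."""
    return dssp_hbond_energy(c, o, n, h) < HBOND_CUTOFF
```

Consider a collinear C=O···H-N arrangement with an O···N distance of 2.9 Å. It gives an energy of roughly -2.9 kcal/mol, well past the cutoff. Moving the N-H group 10 Å further away brings the energy close to zero.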
8-state annotation by DSSP
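DSSP assigns each residue one of eight states:

- H (α-helix)
- G (3₁₀-helix)
- I (π-helix)
- E (extended strand)
- B (isolated β-bridge)
- T (turn)
- S (bend)
- coil, written as a blank or often as C or '-'

Three-state predictors need a mapping down to helix/sheet/coil. This sketch assumes one widely used reduction: H, G, I → H; E, B → E; everything else → C. Other reductions exist, for example mapping only H to helix and only E to strand.

```python
# One common 8-to-3 reduction of DSSP states (an assumption; others exist).
EIGHT_TO_THREE = {
    "H": "H", "G": "H", "I": "H",  # helices
    "E": "E", "B": "E",            # strands and isolated bridges
    "T": "C", "S": "C", "C": "C", "-": "C", " ": "C",  # turns, bends, coil
}


def reduce_to_three_state(dssp: str) -> str:
    """Map a DSSP 8-state string to a helix/sheet/coil (H/E/C) string.
    Raises KeyError on symbols outside the table above."""
    return "".join(EIGHT_TO_THREE[s] for s in dssp)


print(reduce_to_three_state("CCHHHHGGGTTEEEEBSC"))  # CCHHHHHHHCCEEEEECC
```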
Prediction
• Early methods of secondary-structure prediction
were restricted to predicting the three
predominant states: helix, sheet, or random coil.
These methods were based on the helix- or
sheet-forming propensities of individual amino
acids, sometimes coupled with rules for
estimating the free energy of forming secondary
structure elements. Such methods were typically
~60% accurate in predicting which of the three
states (helix/sheet/coil) a residue adopts.
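The ~60% figure is a per-residue three-state accuracy, usually called Q3: the fraction of residues whose predicted state matches the observed one. The sketch below makes both ideas concrete with a deliberately crude predictor. Each residue gets the state whose propensity, averaged over a small window, is highest. The propensity numbers, coil threshold, and window size are hypothetical placeholders, not the published Chou-Fasman table or any other real parameter set. The function names are our own.

```python
# Hypothetical helix/sheet propensities (1.0 = neutral); illustrative only,
# NOT a published propensity table. Unlisted residues default to 1.0.
HELIX = {"A": 1.4, "E": 1.5, "L": 1.2, "M": 1.4, "G": 0.6, "P": 0.5}
SHEET = {"V": 1.7, "I": 1.6, "Y": 1.5, "F": 1.4, "G": 0.8, "P": 0.5}
COIL_THRESHOLD = 1.05  # hypothetical: below this, call the residue coil


def predict_by_propensity(seq: str, half_window: int = 2) -> str:
    """Assign H/E/C per residue from window-averaged propensities."""
    out = []
    for i in range(len(seq)):
        window = seq[max(0, i - half_window): i + half_window + 1]
        h = sum(HELIX.get(a, 1.0) for a in window) / len(window)
        e = sum(SHEET.get(a, 1.0) for a in window) / len(window)
        if max(h, e) < COIL_THRESHOLD:
            out.append("C")
        else:
            out.append("H" if h >= e else "E")
    return "".join(out)


def q3(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted 3-state label is correct."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have equal length")
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)
```

For example, `q3("HHHCCEEE", "HHHCCCEE")` is 7/8 = 0.875, because only the sixth residue is mispredicted.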
A significant increase in accuracy (to nearly 80%) was made by
exploiting multiple sequence alignment; knowing the full distribution of
amino acids that occur at a position (and in its vicinity, typically ~7
residues on either side) throughout evolution provides a much better
picture of the structural tendencies near that position. For illustration, a
given protein might have a glycine at a given position, which by itself
might suggest a random coil there. However, multiple sequence
alignment might reveal that helix-favoring amino acids occur at that
position (and nearby positions) in 95% of homologous proteins
spanning nearly a billion years of evolution. Moreover, by examining the
average hydrophobicity at that and nearby positions, the same
alignment might also suggest a pattern of residue solvent accessibility
consistent with an α-helix. Taken together, these factors would suggest
that the glycine of the original protein adopts α-helical structure, rather
than random coil. Several types of methods are used to combine all the
available data to form a 3-state prediction, including neural networks,
hidden Markov models and support vector machines. Modern
prediction methods also provide a confidence score for their predictions
at every position.
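The profile-plus-window idea can be sketched as follows:

1. Build a per-position amino-acid frequency profile from the alignment columns.
2. Describe each residue by the concatenated profile rows of the positions within ±7 residues, zero-padded at the chain ends.

The result is the kind of input vector a neural network, HMM, or SVM could consume. This is a simplified illustration. It assumes equal-length aligned sequences with gaps written as '-'. Real profile tools such as PSI-BLAST add sequence weighting and pseudocounts, which are omitted here. The function names are our own.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {a: i for i, a in enumerate(AMINO_ACIDS)}


def frequency_profile(alignment: list[str]) -> np.ndarray:
    """(L, 20) matrix of amino-acid frequencies per alignment column.
    Gaps and non-standard letters are ignored; an all-gap column stays zero."""
    length = len(alignment[0])
    counts = np.zeros((length, len(AMINO_ACIDS)))
    for seq in alignment:
        for pos, aa in enumerate(seq):
            if aa in AA_INDEX:
                counts[pos, AA_INDEX[aa]] += 1
    totals = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)


def window_features(profile: np.ndarray, half_window: int = 7) -> np.ndarray:
    """Concatenate profile rows i-half_window..i+half_window for each residue i,
    zero-padding beyond the chain ends. Shape: (L, (2*half_window+1)*20)."""
    length, n_aa = profile.shape
    pad = np.zeros((half_window, n_aa))
    padded = np.vstack([pad, profile, pad])
    width = 2 * half_window + 1
    return np.stack([padded[i:i + width].ravel() for i in range(length)])


# Query sequence first, then (made-up) homologs.
alignment = ["GAELK", "AAELK", "AAEIK", "SAELR"]
features = window_features(frequency_profile(alignment))
print(features.shape)  # (5, 300)
```

In this toy alignment the query's glycine at position 0 is seen alongside alanines in the homologs. The profile row therefore records A at 0.5, G at 0.25, and S at 0.25, rather than the glycine alone.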
Outline
• Conditional neural field (CNF) model by Jinbo Xu
• Multi-step learning model by Yaoqi
• Iterative deep learning model by Yaoqi
• Our perspective: Elite
– A new experiment to measure how elite affects
secondary structure prediction.
• Methods
– How to model the probability
– Feature Selection
• Results
– vs. other methods
– Improvement
Protein 8-class secondary
structure prediction using
conditional neural fields
Zhiyong Wang, Feng Zhao, Jian
Peng, and Jinbo Xu
Proteomics. 2011
Model
Training & Prediction
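A conditional neural field (CNF) can be viewed as a linear-chain conditional random field with two kinds of scores:

- Each residue's label score comes from a small neural network over that residue's input features.
- Label-to-label transition scores capture dependencies between neighboring residues.

The simplified sketch below covers only the prediction step. It decodes the highest-scoring label sequence with the Viterbi algorithm and is not the authors' implementation. The sigmoid hidden layer, the layer sizes, and the random weights are assumptions for illustration. Training, which would fit the weights by maximizing a regularized conditional log-likelihood, is not shown.

```python
import numpy as np


def node_scores(features: np.ndarray, w_in: np.ndarray, w_out: np.ndarray) -> np.ndarray:
    """Per-residue label scores from a one-hidden-layer network.
    features: (L, d); w_in: (d, g) gate weights; w_out: (g, K) gate-to-label
    weights. Returns an (L, K) score matrix."""
    hidden = 1.0 / (1.0 + np.exp(-features @ w_in))  # sigmoid gates
    return hidden @ w_out


def viterbi(node: np.ndarray, trans: np.ndarray) -> list[int]:
    """Highest-scoring label path for a linear chain.
    node: (L, K) node scores; trans[a, b]: score of label a followed by b."""
    length, n_labels = node.shape
    best = node[0].copy()                         # best score ending in each label
    back = np.zeros((length, n_labels), dtype=int)
    for i in range(1, length):
        cand = best[:, None] + trans              # (previous label, current label)
        back[i] = cand.argmax(axis=0)
        best = cand.max(axis=0) + node[i]
    path = [int(best.argmax())]
    for i in range(length - 1, 0, -1):            # follow back-pointers
        path.append(int(back[i, path[-1]]))
    return path[::-1]


# Usage with random (untrained) weights; sizes are arbitrary.
LABELS = "HGIEBTSC"  # the eight DSSP classes, in an arbitrary order
rng = np.random.default_rng(0)
L, d, g = 10, 300, 50
feats = rng.normal(size=(L, d))
w_in = rng.normal(scale=0.1, size=(d, g))
w_out = rng.normal(size=(g, len(LABELS)))
trans = rng.normal(size=(len(LABELS), len(LABELS)))
print("".join(LABELS[k] for k in viterbi(node_scores(feats, w_in, w_out), trans)))
```

The hidden layer is what separates this from a plain linear-chain CRF. It lets the per-residue scores depend non-linearly on the input features, while Viterbi decoding stays exactly the same.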
Features
Training/testing set
Results
• Outperforms SSpro8 on each of the eight states
• Regularization factor: performance is insensitive to it,
with the optimum when the factor is set to 9.
• Effect of Neff (number of effective homologs): for SS prediction, it may
not be the best strategy to use evolutionary information from as many
homologs as possible. Instead, when many sequence homologs are available,
a subset of them should be used to build the sequence profile.
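Neff is not defined on this slide. This sketch assumes one definition used in this line of work: the exponential of each column's entropy, averaged over all positions of the profile. Under that definition Neff ranges from 1 (every column fully conserved) to 20 (all amino acids equally frequent). The sketch also shows a hypothetical way to act on the advice above: build the profile from only the query plus the first few homologs. That subset rule (`subset_profile`, keep the first `max_homologs` sequences in the given order) is our own placeholder, not the paper's procedure.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def profile(alignment: list[str]) -> np.ndarray:
    """(L, 20) amino-acid frequencies per column; gaps and unknown letters ignored."""
    counts = np.array([[sum(seq[i] == a for seq in alignment) for a in AMINO_ACIDS]
                       for i in range(len(alignment[0]))], dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)


def neff(prof: np.ndarray) -> float:
    """Average over positions of exp(entropy) of each column's distribution."""
    safe = np.where(prof > 0, prof, 1.0)          # treat 0 * log 0 as 0
    entropy = -(prof * np.log(safe)).sum(axis=1)
    return float(np.exp(entropy).mean())


def subset_profile(alignment: list[str], max_homologs: int) -> np.ndarray:
    """Hypothetical rule: profile from the query (first sequence) plus at most
    max_homologs - 1 further homologs, taken in the given order."""
    return profile(alignment[:max_homologs])


print(neff(profile(["AV", "VA"])))  # 2.0 (up to floating-point rounding)
```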
J Comput Chem. 2012