High Throughput Computing and Protein Structure
Stephen E. Hamby
Overview
• Introduction To Protein Structure
• Dihedral Angles
• Previous Work
• Support Vector Regression
• Optimisation
• Prediction
• Results
• Conclusions
Introduction To Protein Structure
Proteins are molecules of massive biological importance.
Structure determination gives insight into:
• Function, dynamics, and potential drug targets.
Experimental structure determination is:
• Expensive, slow, and difficult.
Introduction To Protein Structure
Primary Structure:
Order of Amino Acids
Secondary Structure:
Building blocks
Tertiary Structure:
Complete 3D Structure
Introduction To Protein Structure
Secondary Structure Types
α-helix
β-sheet
Random Coil
Dihedral Angles
Finding the secondary structure of a protein is
a step towards finding its complete structure
Predicting dihedral angles can help us to get
the secondary structure
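As a rough illustration of what is being predicted, a backbone dihedral angle can be computed from the coordinates of four consecutive atoms (for phi: C of the previous residue, N, CA, C; for psi: N, CA, C, N of the next residue). The function name and the toy coordinates below are hypothetical and only sketch the geometry; they are not taken from the talk.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) defined by four 3D points."""
    b0 = _sub(p0, p1)           # bond pointing back from the central bond
    b1 = _sub(p2, p1)           # central bond (rotation axis)
    b2 = _sub(p3, p2)           # bond pointing forward from the central bond
    norm = math.sqrt(_dot(b1, b1))
    if norm == 0.0:
        raise ValueError("central bond has zero length")
    b1 = tuple(c / norm for c in b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Toy coordinates (not real protein atoms): a quarter turn about the z axis.
quarter_turn = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
```

Angles are returned in the range -180 to 180 degrees, so values near either end of the range describe almost the same conformation.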
How Can We Predict Dihedral Angles?
Previous work
DESTRUCT
Multiple neural networks.
Iterative method.
Predicts secondary structure
and dihedral angles.
Previous work
Real-SPINE
Twin neural networks give a consensus prediction.
Predicts dihedral angles from various amino acid
properties, amino acid composition, and predicted
structure.
Support Vector Regression
Kernel machine learning maps the data into a
higher-dimensional space so that a linear relationship
can be found.
Support Vector Regression
Attempts to fit a linear function to the data in a
high-dimensional feature space.
Accurate, but:
Slow, needs optimisation, and is a black box.
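As a sketch of what "a linear function in a high-dimensional feature space" means, the usual textbook form of the support vector regression predictor is shown below. The symbols (the coefficients α, the offset b, the feature map φ, the cap C, and the tolerance ε) follow standard notation and are assumptions here, not notation from the slides.

```latex
f(\mathbf{x}) = \sum_{i=1}^{n} \left(\alpha_i - \alpha_i^{*}\right) K(\mathbf{x}_i, \mathbf{x}) + b,
\qquad
K(\mathbf{x}, \mathbf{x}') = \langle \phi(\mathbf{x}), \phi(\mathbf{x}') \rangle,
\qquad
0 \le \alpha_i, \alpha_i^{*} \le C
```

Training errors smaller than ε are not penalised, and only the training points with non-zero coefficients (the support vectors) contribute to the prediction.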
Support Vector Regression
Kernel Choice
We tested the various kernels available through
the PyML package.
These are the linear, polynomial, and Gaussian
kernels.
We tested them using the CASP4 dataset.
Gaussian kernel produced the best results.
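For reference, the three kernel families can be sketched in plain Python as below. This is not the PyML API; the function names and the parameters `degree`, `coef0`, and `gamma` are assumptions chosen for illustration.

```python
import math


def linear_kernel(x, y):
    """Plain dot product of two feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """(x . y + coef0) ** degree; degree and coef0 are illustrative defaults."""
    return (linear_kernel(x, y) + coef0) ** degree


def gaussian_kernel(x, y, gamma=1.0):
    """exp(-gamma * ||x - y||^2); gamma sets how quickly similarity decays."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)
```

The Gaussian kernel always returns 1 for identical inputs and decays towards 0 as the inputs move apart, which is why its width parameter usually has to be tuned together with the other regression parameters.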
Optimisation
Three interdependent parameters
Grid-based optimisation on the CASP4 dataset.
Around 10,000 three-hour jobs.
Run in blocks of 10 on Jupiter.
Accuracy assessed using the Pearson
correlation coefficient
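The grid search and its scoring can be sketched as follows. The slides do not name the three parameters; the sketch assumes the common choice for Gaussian-kernel regression (a cost C, a kernel width gamma, and a tolerance epsilon), and the values in the grid are made up for illustration. Each parameter combination corresponds to one job, and jobs are grouped into blocks of 10.

```python
import itertools
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def parameter_grid(c_values, gamma_values, epsilon_values):
    """Every combination of the three (assumed) parameters, one per job."""
    return list(itertools.product(c_values, gamma_values, epsilon_values))


def in_blocks(items, size=10):
    """Split a list of jobs into consecutive blocks of at most `size`."""
    return [items[i:i + size] for i in range(0, len(items), size)]


# Illustrative values only; the real grid is not given in the slides.
grid = parameter_grid([0.1, 1, 10], [0.01, 0.1, 1], [0.05, 0.1, 0.2])
job_blocks = in_blocks(grid, size=10)
```

Because the parameters are interdependent, every combination is evaluated rather than tuning one parameter at a time, and the combination with the highest Pearson correlation between predicted and observed angles is kept.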
Prediction
Support vector machine using a Gaussian kernel
and optimal parameters.
Training on the CB513 dataset.
Tested by 10-fold cross-validation.
CASP4 used as a test set.
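A minimal sketch of how a 10-fold split can be generated is given below. It assumes the split is made over whole proteins (CB513 contains 513 sequences) and uses a hypothetical helper name and random seed; the actual partitioning used in the work is not described in the slides.

```python
import random


def k_fold_splits(n_items, k=10, seed=0):
    """Return k (train_indices, test_indices) pairs covering all items once."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


splits = k_fold_splits(513, k=10)
```

Each protein appears in exactly one test fold, so every prediction that contributes to the cross-validated score comes from a model that never saw that protein during training.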
Results
Results measured by cross-validation

Method            Pearson Correlation Coefficient
DESTRUCT          0.42
Real-SPINE        0.62
SVM Prediction    0.57

The CASP4 test set gives a Pearson correlation
coefficient of 0.56.
Results
Using secondary structure predictions made by
cascade correlation neural networks:
Dihedrals assisted by predicted structure give a
Pearson correlation coefficient of 0.582.
Subsequent iterations should lead to better
predictions of both structure and dihedral angles.
What Next?
Using further iterations to improve accuracy.
The current method is a black box.
Can we use a program like Trepan to extract some
definite rules about secondary structure?
Conclusions
• Dihedral angles define protein secondary
structure.
• Using support vector machines, it is possible to
predict dihedral angles.
• We (hopefully!) can use predicted dihedral
angles to improve the accuracy of secondary
structure prediction.
Acknowledgements
Jonathan Hirst
Hirst group members
BBSRC
The University of Nottingham