Example of 3DIGARS-PSP modeling results on known Hard E. Coli


Next Generation Evolutionary Sampling and Energy Function Guided Ab Initio Protein Structure Prediction
Example of 3DIGARS-PSP modeling results on known hard E. coli and protease inhibitor proteins
Avdesh Mishra, Md Tamjidul Hoque
email: {amishra2, thoque}@uno.edu
Department of Computer Science
University of New Orleans, LA, USA
Introduction
The conformation of a protein is vital to understanding the function it performs within the cell. Toward this goal, we developed a computer program that applies a memory-assisted evolutionary algorithm to sample the energy hyper-surface of the protein folding process, searching for the global minimum, that is, the native fold of the protein. The energy hyper-surface is sampled by novel mutation and crossover operations based on angular rotation and translation. Furthermore, crossover in the current generation is enhanced by using the best parents selected from previous generations. In addition, we employ a novel knowledge-based energy function, 3DIGARS3.0, which most effectively distinguishes the native structure, corresponding to the most thermodynamically stable state, from possible decoy structures. 3DIGARS3.0 is an optimized combination of crucial properties: hydrophobicity versus hydrophilicity, sequence-specific predicted accessibility, and ubiquitous φ-ψ characterization.
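The memory-assisted evolutionary loop described above can be sketched as follows. This is a minimal, hypothetical illustration, not the 3DIGARS-PSP code: it assumes a conformation encoded as one (φ, ψ) pair per residue, replaces 3DIGARS3.0 with a toy energy (squared angular deviation from a hidden target), and reads the rates in the Methods flowchart (5% elite, 70% crossover, 60% angular mutation, rest filled randomly, up to 2000 generations) as population fractions and per-individual probabilities, which is our interpretation. Crossover here is a plain one-point cut on the angle vector, not the segment-translation operator; all function names are illustrative.

```python
import random

# Hypothetical encoding: one (phi, psi) pair in degrees per residue.
Conformation = list[tuple[float, float]]


def wrap(angle: float) -> float:
    """Map an angle in degrees into [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def toy_energy(conf: Conformation, target: Conformation) -> float:
    """Stand-in for 3DIGARS3.0 (assumption): squared angular deviation from a target."""
    return sum(wrap(p - tp) ** 2 + wrap(s - ts) ** 2
               for (p, s), (tp, ts) in zip(conf, target))


def angular_mutation(conf: Conformation, rng: random.Random,
                     step: float = 30.0) -> Conformation:
    """Single-point angular mutation: rotate phi and psi of one random residue."""
    child = list(conf)
    i = rng.randrange(len(child))
    phi, psi = child[i]
    child[i] = (wrap(phi + rng.uniform(-step, step)),
                wrap(psi + rng.uniform(-step, step)))
    return child


def crossover(a: Conformation, b: Conformation, rng: random.Random) -> Conformation:
    """One-point crossover on the angle vector (needs at least two residues)."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def evolve(target: Conformation, pop_size: int = 40, generations: int = 2000,
           elite_rate: float = 0.05, crossover_rate: float = 0.70,
           mutation_rate: float = 0.60, memory_size: int = 5,
           seed: int = 0) -> Conformation:
    rng = random.Random(seed)
    n = len(target)

    def energy(c: Conformation) -> float:
        return toy_energy(c, target)

    pop = [[(wrap(rng.uniform(-180, 180)), wrap(rng.uniform(-180, 180)))
            for _ in range(n)] for _ in range(pop_size)]
    memory: list[Conformation] = []  # best models kept across generations
    n_elite = max(1, round(elite_rate * pop_size))
    n_cross = round(crossover_rate * pop_size)
    for _ in range(generations):
        pop.sort(key=energy)
        # Save the best models of this generation in memory.
        memory = sorted(memory + pop[:memory_size], key=energy)[:memory_size]
        nxt = [list(c) for c in pop[:n_elite]]  # elites copied unchanged
        better_half = pop[: pop_size // 2]
        while len(nxt) < n_elite + n_cross:
            # Memory-assisted crossover: one parent from now, one from memory.
            nxt.append(crossover(rng.choice(better_half), rng.choice(memory), rng))
        while len(nxt) < pop_size:
            nxt.append(list(rng.choice(pop)))  # fill the rest randomly
        for i in range(n_elite, pop_size):
            if rng.random() < mutation_rate:
                nxt[i] = angular_mutation(nxt[i], rng)
        pop = nxt
    return min(memory + pop, key=energy)
```

For example, `evolve([(-60.0, -45.0)] * 8, generations=300)` returns the lowest-energy conformation found. Because elites are copied unchanged and the memory only keeps the best models seen so far, the best energy never increases from one generation to the next.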
Figure 1 | Cysteine protease inhibitor (PDB ID: 1nyc). Left: 3DIGARS-PSP model superposed on the native structure (initial seeds from Rosetta). Right: top Rosetta model (selected by TM-score) superposed on the native structure.
Figure 2 | E. coli protein (PDB ID: 1pohA). Left: 3DIGARS-PSP model superposed on the native structure (initial seeds from Rosetta). Right: top Rosetta model (selected by TM-score) superposed on the native structure.
Figure 3 | E. coli protein (PDB ID: 1pohA). Left: 3DIGARS-PSP model superposed on the native structure (initial seeds from I-TASSER). Right: top I-TASSER model (selected by TM-score) superposed on the native structure.
Figure 4 | E. coli protein (PDB ID: 2z9hA). Left: 3DIGARS-PSP model superposed on the native structure (initial seeds from Rosetta). Right: top Rosetta model (selected by TM-score) superposed on the native structure.
Figure 5 | E. coli protein (PDB ID: 2z9hA). Left: 3DIGARS-PSP model superposed on the native structure (initial seeds from I-TASSER). Right: top I-TASSER model (selected by TM-score) superposed on the native structure.
Figure 6 | E. coli protein (PDB ID: 2p7vA). Left: 3DIGARS-PSP model superposed on the native structure (initial seeds from Rosetta). Right: top Rosetta model (selected by TM-score) superposed on the native structure.
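The captions select the Rosetta and I-TASSER models by TM-score. For reference, the standard TM-score of a model against a native of length L is the maximum, over superpositions, of (1/L) Σ_i 1 / (1 + (d_i / d0)^2), with d0 = 1.24 ∛(L − 15) − 1.8 Å, where d_i is the distance between the i-th pair of aligned residues. The sketch below evaluates that sum for one fixed superposition and residue correspondence supplied by the caller; it performs no search for the optimal superposition, so it gives a lower bound on what tools such as TM-score or TM-align report. The 0.5 Å floor on d0 for short chains is an assumption modeled on those tools, and the function names are illustrative.

```python
import math
from typing import Sequence

Point = tuple[float, float, float]


def tm_d0(l_target: int) -> float:
    """Distance scale d0 (Angstrom) for a native of l_target residues."""
    if l_target <= 15:
        return 0.5  # assumed floor for very short chains
    return max(0.5, 1.24 * (l_target - 15) ** (1.0 / 3.0) - 1.8)


def tm_score_fixed(distances: Sequence[float], l_target: int) -> float:
    """TM-score sum for residue pairs that are already aligned and superposed."""
    d0 = tm_d0(l_target)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / l_target


def ca_distances(model: Sequence[Point], native: Sequence[Point]) -> list[float]:
    """Per-residue C-alpha distances, assuming model[i] corresponds to native[i]."""
    return [math.dist(p, q) for p, q in zip(model, native)]
```

Normalizing by the native length (not the number of aligned pairs) is what makes a model that covers only half the chain perfectly score 0.5 rather than 1.0.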
Methods
[Figure: flowchart of the method. Box labels: Backbone Models; Initialize Population for GA using Single Point Angular Mutation; Dataset of 4332 Protein Structures; Obtain Secondary Structure (SS) and Φ, Ψ Angles using DSSP; Save Best Model in Memory; Select 5% Elite Models; Generate Frequency Distribution of Φ, Ψ Angles and SS Types; Perform Memory Assisted Crossover @ 70%; Fill Rest Randomly; Perform Angular Mutation @ 60%; Calculate Fitness using 3DIGARS3.0; Save Models; Generation < 2000; End; Best Models]
Figure 7 | E. coli protein (PDB ID: 2p7vA). Left: 3DIGARS-PSP model superposed on the native structure (initial seeds from I-TASSER). Right: top I-TASSER model (selected by TM-score) superposed on the native structure.
Figure 8 | E. coli protein (PDB ID: 1k4nA). Left: 3DIGARS-PSP model superposed on the native structure (initial seeds from Rosetta). Right: top Rosetta model (selected by TM-score) superposed on the native structure.
Note: Native structures are shown in cyan and pink; models are shown in red and yellow.
Results
Ongoing Research
• Effective use of the Ramachandran plot
• Effective initialization and use of associated memory
• Development of new operators to implement move sets
Acknowledgements
The authors gratefully acknowledge support from the Louisiana Board of Regents through the Board of Regents Support Fund, LEQSF (2013-16)-RD-A-19.
Figure 9 | E. coli protein (PDB ID: 1k4nA). Left: 3DIGARS-PSP model superposed on the native structure (initial seeds from I-TASSER). Right: top I-TASSER model (selected by TM-score) superposed on the native structure.
Discussions and Conclusions
• In past work, we have shown that our energy function, 3DIGARS3.0, significantly outperforms state-of-the-art methods.
• Also, in our prior work, we have shown that our associated-memory-based sampling algorithm provides superior performance.
• In this work, we aim to find the right combination of our energy function and our sampling algorithm to predict the 3D structure of proteins more accurately than state-of-the-art approaches.
• To this end, we have successfully applied dihedral-angle mutation by rotation and crossover by protein-segment translation to enhance the mutation and crossover operations of the sampling algorithm.
• We are working case by case to obtain accurate predictions of the useful secondary structures in a protein. Toward this, we have incorporated Ramachandran plot information into our sampling algorithm.
• We have found that using the Ramachandran plot yields a significant improvement.
• We are exploring topics such as effective use of the Ramachandran plot, move sets and associated memory to find more efficient and effective rules for the sampling algorithm.
• In the near future, we plan to combine the 3DIGARS and sDFIRE energy functions to make our approach to the PSP problem more robust.
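Using Ramachandran plot information inside the sampler, as described above and as suggested by the flowchart box "Generate Frequency Distribution of Φ, Ψ Angles and SS Types", could look like the following sketch. It is hypothetical throughout: it assumes DSSP output has already been reduced to (SS type, φ, ψ) records, uses 10° bins, and replaces one residue's angles with a pair drawn from the empirical distribution for that residue's (predicted) SS type. Bin width, record format and function names are illustrative choices, not the authors' settings.

```python
import random
from collections import Counter, defaultdict

BIN_WIDTH = 10  # degrees per histogram bin (assumed)
N_BINS = 360 // BIN_WIDTH


def bin_of(angle: float) -> int:
    """Bin index for an angle in [-180, 180]; 180 wraps around to -180."""
    return int((angle + 180.0) // BIN_WIDTH) % N_BINS


def build_distribution(records):
    """Count (phi, psi) bins per SS type from (ss, phi, psi) records,
    e.g. extracted beforehand from DSSP output (input format is an assumption)."""
    hist = defaultdict(Counter)
    for ss, phi, psi in records:
        hist[ss][(bin_of(phi), bin_of(psi))] += 1
    return hist


def sample_angles(hist, ss, rng):
    """Draw a (phi, psi) pair from the empirical distribution of one SS type."""
    counts = hist.get(ss)
    if not counts:
        raise ValueError(f"no Ramachandran data for SS type {ss!r}")
    bins = list(counts)
    i, j = rng.choices(bins, weights=[counts[b] for b in bins], k=1)[0]
    # Pick a point uniformly inside the chosen 2D bin.
    phi = -180.0 + i * BIN_WIDTH + rng.uniform(0, BIN_WIDTH)
    psi = -180.0 + j * BIN_WIDTH + rng.uniform(0, BIN_WIDTH)
    return phi, psi


def ramachandran_mutation(conf, ss_string, hist, rng):
    """Replace one residue's angles with a pair typical of its SS type."""
    child = list(conf)
    k = rng.randrange(len(child))
    child[k] = sample_angles(hist, ss_string[k], rng)
    return child
```

Sampling from observed (φ, ψ) frequencies keeps a mutated residue in a populated region of the plot for its SS type, instead of anywhere in the full 360° × 360° angle space.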