3D Structures of Biological Macromolecules
Part 5
Protein Structure Prediction - II
Jürgen Sühnel
Institute of Molecular Biotechnology, Jena Centre for Bioinformatics
Jena / Germany
Supplementary Material: http://www.imb-jena.de/www_bioc/3D/
Molecular Mechanics (Force Field)
http://cmm.info.nih.gov/modeling/guide_documents/molecular_mechanics_document.html
How Do We Get the Parameters ?
Experimental Data
(Examples: Geometrical Parameters)
Quantum-chemical Calculations
(Examples: Charges)
Quantum Chemistry
Geometry Optimization
Optimization Methods
Optimization Methods – Steepest Descent
Selection of an initial point x0
Determination of direction and step size for calculating the next point
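The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of any particular modeling package: the fixed step size, the convergence tolerance and the quadratic toy "energy" are assumptions made for the example.

```python
def steepest_descent(grad, x0, step=0.1, tol=1e-8, max_iter=10_000):
    """Minimize a function by repeatedly stepping against its gradient."""
    x = list(x0)  # initial point x0
    for _ in range(max_iter):
        g = grad(x)
        # Stop when the gradient norm is (almost) zero.
        if sum(gi * gi for gi in g) ** 0.5 < tol:
            break
        # Direction: negative gradient. Step size: fixed (an assumption).
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


def toy_gradient(p):
    """Gradient of the hypothetical energy E(x, y) = (x - 1)^2 + 2 (y + 2)^2."""
    x, y = p
    return [2.0 * (x - 1.0), 4.0 * (y + 2.0)]


minimum = steepest_descent(toy_gradient, [0.0, 0.0])
```

Production codes usually choose the step size by a line search rather than keeping it fixed.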
Optimization Methods – Conjugate Gradients Method
Optimization Methods – Newton-Raphson Methods
g - gradient
h - Hessian
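In one dimension the Newton-Raphson update reads x(new) = x - g(x) / h(x). The sketch below applies it to a hypothetical convex toy function E(x) = x^2 + exp(-x); the function, tolerance and starting point are assumptions chosen only for illustration.

```python
import math


def newton_raphson(g, h, x0, tol=1e-12, max_iter=50):
    """Find a stationary point using gradient g and second derivative h."""
    x = x0
    for _ in range(max_iter):
        step = g(x) / h(x)  # in n dimensions: solve H * step = g
        x -= step
        if abs(step) < tol:
            break
    return x


def toy_g(x):
    """Gradient of the toy function E(x) = x^2 + exp(-x)."""
    return 2.0 * x - math.exp(-x)


def toy_h(x):
    """Second derivative (1-D Hessian) of the toy function; always positive."""
    return 2.0 + math.exp(-x)


x_min = newton_raphson(toy_g, toy_h, x0=0.0)
```

Because the second derivative is used, far fewer iterations are needed than with steepest descent, at the cost of computing (and, in many dimensions, inverting) the Hessian.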
Molecular Dynamics
Simulation of Protein Folding – Molecular Dynamics
AMBER
GROMOS
CHARMM
TINKER
Molecular Dynamics Simulation
Protein Capsid Of Filamentous Bacteriophage Ph75 From Thermus Thermophilus
1HGV, extended structure
1HGV, actual structure
1HGV, 61% helix, 1.928 ns
1HGV, 75% helix, 3.428 ns
Images created using VMD (Visual Molecular Dynamics): Humphrey, W., Dalke, A. and Schulten, K., 1996.
VMD - Visual Molecular Dynamics. Journal of Molecular Graphics, 14, pp. 33-38.
Molecular Dynamics Packages
AMBER: amber.scripps.edu
GROMOS: www.igc.ethz.ch/gromos/
CHARMM: www.charmm.org
TINKER: dasher.wustl.edu/tinker/
Visualizing and Analyzing Molecular Dynamics Simulations
www.ks.uiuc.edu/Research/vmd/
Folding Surface for Lysozyme
Dobson, Sali, Karplus, Angew. Chem. Int. Ed. 1998, 37, 868.
Protein Folding States
Dobson, Sali, Karplus, Angew. Chem. Int. Ed. 1998, 37, 868.
Monitoring Protein Folding by Experimental Methods
Dobson, Sali, Karplus, Angew. Chem. Int. Ed. 1998, 37, 868.
Monitoring Protein Folding by Experimental Methods
Plaxco, Dobson, Curr. Opin. Struct. Biol. 1996, 6, 630.
Protein Folding by Molecular Dynamics
Villin headpiece domain
(PDB code: 1vii)
Actin binding site highlighted
36 amino acids
Protein Folding by Molecular Dynamics
Radius of Gyration
The radius of gyration Rg is defined by the root-mean-square distance between
all atoms in a molecule and the centroid.
In a globular protein the radius of gyration Rg can be predicted with reasonable
accuracy from the relationship
$R_g(\mathrm{pred}) = 2.2\,N^{0.38}$
where N is the number of amino acids and Rg is given in Å.
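Both quantities are easy to compute. The sketch below follows the definition given here (unweighted centroid, all atoms treated equally) and the empirical relation Rg(pred) = 2.2 N^0.38; the function names are chosen for this example only.

```python
import math


def radius_of_gyration(coords):
    """Root-mean-square distance of all atoms from their centroid.

    coords: list of (x, y, z) tuples; atoms are not mass-weighted.
    """
    n = len(coords)
    centroid = [sum(c[k] for c in coords) / n for k in range(3)]
    mean_sq = sum(
        sum((c[k] - centroid[k]) ** 2 for k in range(3)) for c in coords
    ) / n
    return math.sqrt(mean_sq)


def predicted_rg(n_residues):
    """Empirical estimate for a globular protein with n_residues amino acids."""
    return 2.2 * n_residues ** 0.38
```

For example, `predicted_rg(36)` gives the estimate for a 36-residue domain such as the villin headpiece, which can be compared with `radius_of_gyration` evaluated on simulation snapshots.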
Protein Folding by Molecular Dynamics
Statistical Potentials
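$w_{ij}(r) = -kT \ln\left[\rho_{ij}(r) / \rho^{*}_{ij}\right]$
$w_{ij}(r)$ - interaction free energy
$\rho_{ij}(r)$ - pair density
$\rho^{*}_{ij}$ - reference pair density at infinite separation
$kT$ - thermal energy (Boltzmann constant times temperature)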
Statistical potentials can be determined by
simply counting interactions of a specific type
in a dataset of experimental structures.
The distance dependence may or may not be taken
into account. If not, the interaction free energy is usually
called a contact potential. It represents an average over
distances shorter than some cutoff distance rc.
Thomas, Dill, J. Mol. Biol. 1996, 257, 457-469
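A contact potential of this kind can be sketched with the usual Boltzmann-type inversion w = -kT ln(observed / reference). In the example below the contact counts are invented for illustration, energies are in units of kT, and the choice of reference counts is an assumption (real potentials differ mainly in how the reference state is defined).

```python
import math


def contact_potential(observed, reference, kT=1.0):
    """Return w = -kT * ln(observed / reference) for every contact type."""
    return {
        pair: -kT * math.log(observed[pair] / reference[pair])
        for pair in observed
    }


# Hypothetical counts within a cutoff distance rc (not real data).
observed_counts = {("LEU", "LEU"): 300.0, ("LEU", "LYS"): 100.0, ("LYS", "LYS"): 50.0}
reference_counts = {("LEU", "LEU"): 150.0, ("LEU", "LYS"): 100.0, ("LYS", "LYS"): 100.0}

potential = contact_potential(observed_counts, reference_counts)
```

Contacts seen more often than expected receive a negative (favorable) energy, and contacts seen less often a positive one.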
Lattice Folding
Lattice Algorithm
Red = hydrophobic, Blue = hydrophilic
If Red is near empty space E = E+1
If Blue is near empty space E = E-1
If Red is near another Red E = E-1
If Blue is near another Blue E = E+0
If Blue is near Red E = E+0
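The scoring rules above can be written as a small function for a chain on a 2-D square lattice. Several details are assumptions not stated on the slide: "near" is taken to mean one of the four nearest lattice sites, residues adjacent in the chain do not count as contacts, and each Red-Red contact is counted once.

```python
def lattice_energy(sequence, positions):
    """Score a lattice conformation.

    sequence: string of 'H' (red, hydrophobic) and 'P' (blue, hydrophilic).
    positions: list of (x, y) lattice sites, one per residue, in chain order.
    """
    occupied = {pos: idx for idx, pos in enumerate(positions)}
    energy = 0
    for idx, (x, y) in enumerate(positions):
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            neighbor = occupied.get((x + dx, y + dy))
            if neighbor is None:
                # Empty site: penalize exposed red, reward exposed blue.
                energy += 1 if sequence[idx] == "H" else -1
            elif (neighbor > idx + 1
                  and sequence[idx] == "H" and sequence[neighbor] == "H"):
                # Non-bonded red-red contact, counted once per pair.
                energy -= 1
            # Blue-blue and blue-red contacts contribute 0.
    return energy


# Four residues folded into a square so that the two red ends touch.
square = lattice_energy("HPPH", [(0, 0), (1, 0), (1, 1), (0, 1)])
```

A folding simulation would propose moves that change `positions` and keep those that lower this energy.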
Ab Initio Protein Structure Prediction
http://rosettadesign.med.unc.edu/
Ab Initio Protein Structure Prediction - Rosetta
Structure representation:
Only main-chain heavy atoms and the Cβ atom of each side chain are taken into account.
Bond lengths and bond angles are held constant and correspond to the alanine geometry.
The only remaining geometrical variables are the backbone torsion angles.
Structure generation:
Generation of fragment libraries from experimental structures (3 and 9 amino acids).
Splicing together fragments of proteins of known structure with similar sequences.
The conformational space defined by these fragments is then searched by a Monte Carlo procedure
with an energy function that favors compact structures with paired beta-strands and buried hydrophobic
amino acids.
A total of 1000 independent simulations are carried out (starting from different random number seeds)
for each query sequence.
The resulting structures are clustered.
Initial evaluation by the scoring function
Low-scoring conformations are identified by simulated annealing with a move set that
involves replacing the torsion angles of a segment of the chain with those of a fragment
that has a related amino acid sequence.
Further evaluation by
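The fragment-insertion search can be illustrated with a toy Monte Carlo loop. This is not the Rosetta implementation: the score function, the fragment library, the fixed temperature (no annealing schedule) and all names are hypothetical stand-ins chosen to keep the sketch short.

```python
import math
import random


def toy_score(torsions):
    """Hypothetical score: squared deviation from helical (phi, psi) = (-60, -45)."""
    return sum((phi + 60.0) ** 2 + (psi + 45.0) ** 2 for phi, psi in torsions)


def fragment_monte_carlo(n_residues, fragments, n_steps=2000,
                         temperature=500.0, seed=0):
    """Metropolis search over backbone torsions using fragment insertions."""
    rng = random.Random(seed)
    current = [(180.0, 180.0)] * n_residues  # extended starting chain
    current_score = toy_score(current)
    best, best_score = list(current), current_score
    for _ in range(n_steps):
        # Move: overwrite a random window with the torsions of a fragment.
        fragment = rng.choice(fragments)
        start = rng.randrange(n_residues - len(fragment) + 1)
        trial = current[:start] + list(fragment) + current[start + len(fragment):]
        trial_score = toy_score(trial)
        delta = trial_score - current_score
        # Metropolis criterion: always accept downhill, sometimes uphill.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_score = trial, trial_score
            if current_score < best_score:
                best, best_score = list(current), current_score
    return best, best_score


# Hypothetical 3-residue fragment library (helix-like and strand-like).
library = [
    [(-60.0, -45.0)] * 3,
    [(-120.0, 130.0)] * 3,
    [(-65.0, -40.0), (-60.0, -45.0), (-70.0, -35.0)],
]
best_torsions, best_value = fragment_monte_carlo(12, library)
```

Running the function with different `seed` values corresponds to the independent simulations mentioned above, whose results would then be clustered.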
Protein Backbone Torsion Angles and Ramachandran Plot
Bayesian Statistics
Bayesian statistical methods differ from other types of statistics by the use of
conditional probabilities.
Bayes Theorem
P(A|B) = [P(B|A) x P(A)] / P(B)
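A small numerical example may help. The function below evaluates Bayes' theorem with P(B) expanded over A and not-A; the probabilities in the usage lines are invented for illustration (A = "residue is in a helix", B = "residue is alanine") and are not taken from any dataset.

```python
def posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    # Total probability: P(B) = P(B|A) P(A) + P(B|not A) P(not A)
    p_b = p_b_given_a * prior_a + p_b_given_not_a * (1.0 - prior_a)
    return p_b_given_a * prior_a / p_b


# Hypothetical numbers: P(A) = 0.3, P(B|A) = 0.12, P(B|not A) = 0.06
p_helix_given_ala = posterior(0.3, 0.12, 0.06)
```

With these assumed inputs, observing alanine raises the helix probability from the prior 0.3 to about 0.46.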
ROSETTA Results
Simons, Strauss, Baker. J. Mol. Biol. 2001, 306, 1191-1199.
Computational Thermostabilization
Prediction of stabilizing mutations with Rosetta Design
Computational Thermostabilization
PDB code: 1ox7
Cytosine deaminase (CD) catalyzes the deamination of cytosine (converts cytosine to uracil)
and is only present in prokaryotes and fungi, where it is a member of the pyrimidine salvage pathway.
The enzyme is of interest both for antimicrobial drug design and gene therapy applications against tumors.
Computational Thermostabilization
Superposition of double and triple mutant structures (PDB codes: 1ysb, 1ysd)
A23L
I140L
V108I
Comparing Protein Structures
• The RMSD is a measure to quantify structural similarity
• Requires 2 superimposed structures (designated here as
“a” & “b”)
• N = number of atoms being compared
$\mathrm{RMSD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left[(x_{ai}-x_{bi})^2+(y_{ai}-y_{bi})^2+(z_{ai}-z_{bi})^2\right]}$
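For two structures that are already superimposed and whose atoms are listed in matching order, the RMSD can be computed directly from the coordinates. The function name and the input format (lists of (x, y, z) tuples) are assumptions for this sketch.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two superimposed coordinate sets."""
    if len(coords_a) != len(coords_b):
        raise ValueError("both structures must contain the same number of atoms")
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))
```

The function does not perform the superposition itself; without it, the value also reflects the arbitrary relative placement of the two structures.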
Comparing Protein Structures
http://wishart.biology.ualberta.ca/SuperPose/
http://www.ebi.ac.uk/DaliLite/
http://cl.sdsc.edu/
Comparing Protein Structures – Superpose Server
Beginning with an input PDB file or set of files, SuperPose first extracts the sequences of all chains in the file(s).
Each sequence pair is then aligned using a Needleman–Wunsch pairwise alignment algorithm.
If the pairwise sequence identity falls below the default threshold (25%), SuperPose determines the
secondary structure using VADAR (volume, area, dihedral angle reporter) and performs a secondary structure
alignment using a modified Needleman–Wunsch algorithm.
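The Needleman–Wunsch algorithm fills a dynamic-programming table of best global alignment scores and then traces back one optimal path. The sketch below is a textbook version with a simple match/mismatch/linear-gap scheme; these scoring values are assumptions and are not the parameters used by SuperPose.

```python
def needleman_wunsch(seq_a, seq_b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for a global alignment."""
    def sub(i, j):
        return match if seq_a[i - 1] == seq_b[j - 1] else mismatch

    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Traceback from the bottom-right corner to the origin.
    aligned_a, aligned_b = [], []
    i, j = len(seq_a), len(seq_b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            aligned_a.append(seq_a[i - 1])
            aligned_b.append(seq_b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_a.append(seq_a[i - 1])
            aligned_b.append("-")
            i -= 1
        else:
            aligned_a.append("-")
            aligned_b.append(seq_b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(aligned_a)), "".join(reversed(aligned_b))
```

The same routine works on secondary-structure strings (for example H/E/C codes) if the two inputs are replaced accordingly.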
After the sequence or secondary structure alignment is complete, SuperPose then generates a
difference distance (DD) matrix between aligned alpha carbon atoms. A difference distance matrix can be
generated by first calculating the distances between all pairs of Cα atoms in one molecule to generate an initial
distance matrix. A second pairwise distance matrix is generated for the second molecule and,
for equivalent/aligned Cα atoms, the two matrices are subtracted from one another,
yielding the DD matrix. From the DD matrix it is possible to quantitatively assess the structural
similarity/dissimilarity between two structures. In fact, the difference distance method is particularly good
at detecting domain or hinge motions in proteins. SuperPose analyzes the DD matrices and
identifies the largest contiguous domain between the two molecules that exhibits <2.0 Å difference.
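The difference distance matrix described above can be sketched as follows. The inputs are assumed to be lists of coordinates of already aligned (equivalent) alpha carbon atoms; the function names are chosen for this example and the signed difference is kept, which is one possible convention.

```python
import math


def distance_matrix(coords):
    """All pairwise distances within one structure."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def difference_distance_matrix(coords_a, coords_b):
    """Element-wise difference of the two intramolecular distance matrices."""
    da, db = distance_matrix(coords_a), distance_matrix(coords_b)
    return [[x - y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(da, db)]
```

Because only internal distances enter, the result does not depend on how the two molecules are positioned or oriented, so no prior superposition is needed; regions that move as a rigid body show values close to zero.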
From the information derived from the sequence alignment and DD comparison, the program then makes a
decision regarding which regions should be superimposed and which atoms should be counted in calculating
the RMSD. This information is then fed into the quaternion superposition algorithm and the RMSD calculation
subroutine. The quaternion superposition program is written in C and is based on both Kearsley's method
and the PDBSUP Fortran program developed by Rupp and Parkin. Quaternions were developed by
W. Hamilton (the mathematician/physicist) in 1843 as a convenient way to parameterize rotations in a simple
algebraic fashion. Because algebraic expressions are more rapidly calculable than trigonometric expressions
using computers, the quaternion approach is exceedingly fast.
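How a quaternion encodes a rotation can be shown in a few lines. The sketch below only applies a given rotation via q p q*, using additions and multiplications once q is known; it does not reproduce Kearsley's method for finding the optimal superposition, and the function names are assumptions for this example.

```python
import math


def quaternion_from_axis_angle(axis, angle):
    """Unit quaternion (w, x, y, z) for a rotation by angle (radians) about axis."""
    ax, ay, az = axis
    norm = math.sqrt(ax * ax + ay * ay + az * az)
    s = math.sin(angle / 2.0) / norm
    return (math.cos(angle / 2.0), ax * s, ay * s, az * s)


def quaternion_multiply(p, q):
    """Hamilton product of two quaternions."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return (pw * qw - px * qx - py * qy - pz * qz,
            pw * qx + px * qw + py * qz - pz * qy,
            pw * qy - px * qz + py * qw + pz * qx,
            pw * qz + px * qy - py * qx + pz * qw)


def rotate(point, q):
    """Rotate a 3-D point by the unit quaternion q via q * p * conj(q)."""
    w, x, y, z = q
    conjugate = (w, -x, -y, -z)
    _, rx, ry, rz = quaternion_multiply(
        quaternion_multiply(q, (0.0, *point)), conjugate)
    return (rx, ry, rz)


# Rotating the x unit vector by 90 degrees about z gives the y unit vector.
q90 = quaternion_from_axis_angle((0.0, 0.0, 1.0), math.pi / 2.0)
rotated = rotate((1.0, 0.0, 0.0), q90)
```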
SuperPose can calculate both pairwise and multiple structure superpositions (using standard hierarchical methods)
and can generate a variety of RMSD values for alpha carbons, backbone atoms, heavy atoms and all atoms
(average and pairwise). When identical sequences are compared, SuperPose also generates ‘per residue’
RMSD tables and plots to allow users to identify, assess and view individual residue displacements.
http://wishart.biology.ualberta.ca/SuperPose/
Comparing Protein Structures
http://www-structure.llnl.gov/xray/comp/suptext.htm
IMB LINUX Cluster by IBM
1 Frontend
2 Storage Nodes
26 Compute Nodes
Compute nodes:
2 x 2.4 GHz Intel Xeon processors, 1 GByte RAM,
40 GByte local IDE Hard Disk
Frontend:
Mirrored 73 GByte SCSI Disk
Interconnect:
Myrinet
Disk array:
10 x 73 GByte Fiber Channel Disks
Operating system:
Linux Red-Hat 7.3
Cluster software:
CSM (Cluster Systems Management), GPFS (General Parallel File System)
Cluster vs. Grid Computing
Clusters are made up of dedicated components, and all
components in a cluster are exclusively owned and
managed as part of the cluster. All resources are known,
fixed and usually uniform in configuration. It is a
static environment.
Grids differ from clusters because grids share
resources from and among independent system owners.
Grids are configured from computer systems that are
individually managed and used both as independent
systems and as part of the grid. Thus, individual
components are not 'fixed' in the grid and the overall
configuration of the grid changes over time. This
results in a dynamic system that continually assesses
and optimises its utilisation of resources.
EUROGRID - BioGRID
www.eurogrid.org/wp1.html
Simulation of Protein Folding
thousand trillion FLOPS
IBM Blue Gene Project | System-on-a-Chip Approach
~ 65,000 processors
teraflop – a trillion floating point operations
per second