Comparative Protein Modeling
Jason Wiscarson, Lloyd Spaine
Introduction
Comparative, or homology, modeling is a computational tool
used to predict the three-dimensional structure of proteins
whose structures are unknown. If the target sequence shares
sequence similarity with proteins of known 3-D structure,
those proteins may serve as templates to predict the unknown
protein structure. The term “homology” refers to an
evolutionary relationship between two or more proteins that
have the same ancestor in an evolutionary tree, regardless of
their sequence similarity. Proteins from similar families
often have similar functions, yet there are many instances in
which proteins have similar structures but different
functions. Therefore the process used to construct 3-D models
of proteins, shown in Figure 1, is of central importance.
[Figure 1 flow chart: Find known sequences and 3-D structures
related to the target protein → Align the target and template
amino acid residues → Select templates and adjust/improve the
alignments → Construct Model → Refine Model → Evaluate Model →
Final Model]
Figure 1 Flow chart that shows construction of comparative protein models.
The solid lines represent comparative modeling steps, and dotted lines
represent parameters (template, alignment, construction environment, or
refinement method) that can improve the quality of the protein model.
Finding related sequences and structures
In comparative protein modeling several databases are used to
find genomic, amino acid, and protein data.
The Expert Protein Analysis System (ExPASy) is the
starting point for searching for proteins and their related
sequences.
Swiss-Prot contains curated data that has been refined by
removing redundant information, and TrEMBL receives and
stores the initial genomics data.
PROSITE describes protein families and domains by
biologically significant patterns of key amino acid residues.
ENZYME retrieves an enzyme’s recommended name,
alternative names, catalytic activity, cofactors, human genetic
diseases, and cross-references.
SWISS-MODEL holds comparative models of proteins that do
not have an experimentally determined 3-D structure.
Basic Local Alignment Search Tool (BLAST) uses a
protein sequence of interest to search a database, locate
similar protein sequences, and report the sequence
alignments.
Protein Data Bank (PDB) is a repository for experimentally
determined protein 3-D structures.
Sequence Alignment and Modeling System with
Hidden Markov Models (SAM)-T02 provides sequence
alignment from the target sequence to all templates in steps:
1. Find sequences similar to the target sequence.
2. Predict the secondary structure.
3. Find probable templates for threading.
4. Align the target with the templates.
5. Construct a fragment library for the target.
6. Build a 3-D model of the target.
Threading fits the target sequence onto different proteins
that have similar, solved structures:
1. Creates pseudo-protein models based on solved proteins.
2. Calculates an energy value for each pseudo-protein model.
3. Ranks the alignments based on that energy value.
Sequence Alignment
Alignment based on evolutionary history is applied to the
amino acid residues of the target protein. The types of
alignment are:
a) Global alignment aligns the sequences along their entire
length, including regions that lack similarity.
b) Local alignment first finds regions with significant
similarity and then extends the alignment outward from the
optimally aligned residues.
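The difference between the two alignment types can be sketched with the classic dynamic-programming formulations: Needleman-Wunsch for global alignment and Smith-Waterman for local alignment. This is a minimal illustration, not the implementation used by any program named on this poster; the match, mismatch, and gap values are assumptions chosen for readability, whereas real programs use a similarity matrix and tuned gap penalties.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch score: both sequences are aligned end to end."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + sub, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def local_score(a, b, match=1, mismatch=-1, gap=-2):
    """Smith-Waterman score: best-scoring pair of subsequences."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # A cell never drops below zero, so a poor region ends the alignment.
            cell = max(0, prev[j - 1] + sub, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best
```

For two toy sequences that share only the segment ACDEF, such as "WWWACDEF" and "ACDEFKKK", the local score rewards the shared segment alone, while the global score is pulled down by the gaps needed to reach it.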
To prepare sequences, the Sequence to Coordinates (S2C)
database is used to examine the sequence differences that
originate from mutagenesis studies.
Alignment programs differ in the methods used but they
score or evaluate the final alignment using gap penalties,
similarity matrices and alignment scores.
Similarity Matrices describe the probability of a specific
amino acid residue mutating to a different residue type.
Common similarity matrices include:
1. Point-Accepted Mutation (PAM) matrices, measured in
accepted mutations per 100 amino acid residues, are based on
the probability of an amino acid residue mutating to another
amino acid residue.
2. BLOcks SUbstitution Matrix (BLOSUM) matrices are
similar to PAM but are derived from a more diverse set of
sequences.
3. Gonnet similarity matrices were built by indexing and
organizing amino acid sequences with a tree on a small
cluster of computers.
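Entries in matrices of this kind are usually expressed as log-odds scores: the logarithm of how often two residues are observed aligned, divided by how often they would pair by chance. The sketch below shows that calculation only. The function name, the toy frequencies, and the default scale of 2 (half-bit units, the convention commonly quoted for BLOSUM) are assumptions for illustration.

```python
import math


def log_odds_score(pair_freq, freq_a, freq_b, scale=2.0):
    """Return round(scale * log2(observed / expected)) for one residue pair.

    pair_freq      : observed frequency of the aligned pair (toy value)
    freq_a, freq_b : background frequencies of the two residues (toy values)
    """
    expected = freq_a * freq_b
    return round(scale * math.log2(pair_freq / expected))


# A pair seen four times more often than chance scores positive,
# a pair seen four times less often than chance scores negative.
favoured = log_odds_score(0.04, 0.1, 0.1)
disfavoured = log_odds_score(0.0025, 0.1, 0.1)
```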
Clustal is an alignment program that quickly aligns large
sets of sequences of varying similarity. Sequences are
progressively aligned based on the branching order in a
phylogenetic (guide) tree.
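The idea of a branching order can be sketched as follows: compute pairwise distances, then repeatedly merge the two closest clusters, so that the most similar sequences are aligned first. This is a simplified stand-in and not Clustal's actual procedure. It assumes equal-length toy sequences, uses the fraction of differing positions as the distance, and uses average linkage; the sequences and function names are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of differing positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def average_distance(seqs, c1, c2):
    """Mean pairwise distance between the members of two clusters."""
    pairs = [(m, n) for m in c1 for n in c2]
    return sum(p_distance(seqs[m], seqs[n]) for m, n in pairs) / len(pairs)


def join_order(seqs):
    """Return the cluster merges in the order they occur (closest first)."""
    clusters = [(name,) for name in seqs]
    order = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: average_distance(seqs, *pair))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)
        order.append((c1, c2))
    return order


# Hypothetical toy sequences: A and B are nearly identical, D is an outlier.
toy = {"A": "MKTAYIAK", "B": "MKTAYIAR", "C": "MKSAYLAR", "D": "GRSPWLQR"}
order = join_order(toy)
```

In this toy set A and B are joined first, C is added next, and the divergent sequence D is added last.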
Tree-Based Consistency Objective Function for
Alignment Evaluation (T-Coffee) is a method designed to
rectify a weakness of progressive-alignment (heuristic)
methods: errors in the first alignment cannot be corrected as
other sequences are added to the alignment. Progressive
alignment suffers from greediness, that is, its inability to
correct errors (the addition or extension of a gap).
Divide-and-Conquer Alignment (DCA) aligns the sequences
simultaneously. It uses the simultaneous multiple sequence
alignment (MSA) methodology.
Selecting Templates and Improving Alignments
The first step is to align the sequence of interest (target)
with the other sequences and structures (templates) and to
improve that alignment.
Afterwards, the best templates are chosen based on
evolutionary distance as determined by a phylogenetic tree.
Selecting Templates: a template structure for a protein model
is chosen by considering its R-factor (residual index), the
value that relates how well the refined crystallographic
structure matches the experimental electron density maps.
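The conventional crystallographic R-factor compares observed and calculated structure-factor amplitudes over all reflections, R = Σ| |Fobs| − |Fcalc| | / Σ|Fobs|, so a lower value indicates better agreement. A minimal sketch of that formula is shown here; the amplitude lists are toy values, not data from any real structure.

```python
def r_factor(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|) over all reflections."""
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


# Toy amplitudes: a perfect model gives R = 0, small deviations give a small R.
perfect = r_factor([100.0, 50.0, 25.0], [100.0, 50.0, 25.0])
imperfect = r_factor([100.0, 50.0], [90.0, 55.0])
```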
Improving Sequence Alignment With Primary and
Secondary Structure Analysis: this analysis is used to
reveal regions rich in proline, glutamic acid, serine, and
threonine (PEST regions); locate sequence repeats; predict
the percentage of buried versus accessible residues; and
provide information about the protein’s isoelectric point.
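As an illustration of primary-structure analysis, the sketch below scans a sequence with a sliding window and reports windows rich in P, E, S, and T. It is a simple composition count only, not the scoring scheme of any published PEST-finding program; the window length, the threshold, and the function name are assumptions.

```python
def pest_rich_windows(seq, window=12, min_fraction=0.5):
    """Return (start, fraction) for each window rich in P, E, S, or T."""
    hits = []
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        fraction = sum(res in "PEST" for res in segment) / window
        if fraction >= min_fraction:
            hits.append((start, fraction))
    return hits


# Toy sequence: only the final four residues form a fully PEST window.
hits = pest_rich_windows("AAAAPEST", window=4, min_fraction=1.0)
```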
Pattern and Motif-Based Secondary Structure
Prediction: AA sequence → 3-D structure. Well-known
pattern and motif-based secondary structure prediction
methods include PSIPRED, GenTHREADER, PREDATOR,
PROF, MEMSAT, and PHD.
Constructing Protein Models
Protein Model Refinement
Side-Chains with Rotamer Library (SCWRL)
determines the most likely side-chain conformations by
1) Reading the initial structure and determining the possible
low-energy side-chain conformations (rotamers).
2) Defining disulfide bridges and performing a dead-end
elimination to discard rotamers that cannot be part of the
lowest-energy structure.
3) Constructing a residue graph, determining the rotamer
clusters, and outputting the final structure.
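The dead-end elimination step can be sketched with the original elimination criterion: a rotamer is discarded if its best possible energy is still worse than the worst possible energy of some alternative rotamer at the same residue. The code below is a minimal sketch of that criterion and is not SCWRL's implementation; the data layout and the toy energies are assumptions.

```python
def dead_end_eliminate(self_e, pair_e):
    """Discard rotamers that cannot belong to the lowest-energy combination.

    self_e[i][r]       : energy of rotamer r at residue i with the backbone
    pair_e[i][j][r][s] : interaction energy of rotamer r at i with rotamer s at j
    """
    alive = {i: set(range(len(rots))) for i, rots in enumerate(self_e)}
    changed = True
    while changed:
        changed = False
        for i in alive:
            others = [j for j in alive if j != i]
            for r in list(alive[i]):
                best_case_r = self_e[i][r] + sum(
                    min(pair_e[i][j][r][s] for s in alive[j]) for j in others)
                for t in alive[i] - {r}:
                    worst_case_t = self_e[i][t] + sum(
                        max(pair_e[i][j][t][s] for s in alive[j]) for j in others)
                    if best_case_r > worst_case_t:
                        # r can never beat t, whatever the neighbours do.
                        alive[i].discard(r)
                        changed = True
                        break
    return alive


# Toy system (hypothetical energies): two residues with two rotamers each.
self_e = [[0.0, 5.0], [0.0, 1.0]]
pair_e = {
    0: {1: [[0.0, 1.0], [0.5, 0.0]]},
    1: {0: [[0.0, 0.5], [1.0, 0.0]]},
}
surviving = dead_end_eliminate(self_e, pair_e)
```

In the toy system only rotamer 0 survives at each residue, so no further search is needed; in general the survivors still have to be combined by a search over the remaining rotamers.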
Molecular Mechanics (MM) is a method that removes
repulsive contacts between side chains by allowing the side
chains to relax to low-energy rotamers.
Molecular Dynamics (MD) simulation involves:
1. Warm-up, equilibration, and cool-down.
2. Sampling the trajectory during a “production” run time
period and analyzing the results.
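The core of any MD run is a numerical integrator that advances positions and velocities in small time steps while the trajectory is sampled. The sketch below uses the velocity Verlet scheme on a one-dimensional harmonic oscillator. The force field, mass, and time step are toy assumptions, and the warm-up, equilibration, and cool-down stages are not modeled.

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate Newton's equations of motion and return the sampled trajectory."""
    trajectory = [(x, v)]
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # second half-step velocity update
        trajectory.append((x, v))
    return trajectory


def harmonic_force(x, k=1.0):
    """Toy force field: a single spring with force constant k."""
    return -k * x


# "Production" run on the toy system, followed by a simple analysis:
# the total energy should stay nearly constant along the trajectory.
traj = velocity_verlet(x=1.0, v=0.0, force=harmonic_force,
                       mass=1.0, dt=0.01, n_steps=1000)
energies = [0.5 * v * v + 0.5 * x * x for x, v in traj]
```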
Molecular Dynamics with Simulated Annealing (MDSA) is
an optimization method that heats a system, samples many
energy states, and then slowly cools the system to help
ensure that the low-energy structures are found.
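The heat, sample, and cool idea can be illustrated on a one-dimensional energy surface. Note the substitution: this sketch uses Metropolis Monte Carlo moves rather than molecular dynamics to sample states, and the double-well energy function, cooling schedule, and step size are all hypothetical choices made for illustration.

```python
import math
import random


def energy(x):
    """Toy double-well surface: local minimum near x = 1, global near x = -1."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def simulated_annealing(x, t_start=2.0, t_end=0.01, cooling=0.995,
                        step=0.5, seed=0):
    """Start hot, sample many states, and cool slowly; return the best state seen."""
    rng = random.Random(seed)
    e = energy(x)
    best_x, best_e = x, e
    t = t_start
    while t > t_end:
        trial = x + rng.uniform(-step, step)
        e_trial = energy(trial)
        # Metropolis criterion: always accept downhill moves, and accept
        # uphill moves with a probability that shrinks as the system cools.
        if e_trial <= e or rng.random() < math.exp(-(e_trial - e) / t):
            x, e = trial, e_trial
            if e < best_e:
                best_x, best_e = x, e
        t *= cooling
    return best_x, best_e


# Start in the higher (local) minimum and let the annealing explore.
best_x, best_e = simulated_annealing(1.0)
```

At high temperature the walker can cross the barrier between the two wells; as the temperature drops it settles into whichever low-energy region it occupies.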
Evaluating Protein Models
Several methods exist to construct protein models and to
check the models for imperfections, including:
Satisfaction of Spatial Restraints (SSR) constructs a 3-D
protein model using spatial restraints based on distances,
bond angles, dihedral angles, dihedral pairs, etc.
PROCHECK performs statistical checks and indicates
regions of a protein structure that might require modification
because of nonoptimal stereochemistry.
Segment Match Modeling (SMM) constructs a protein model by:
1. Choosing a protein template.
2. Building a list of possible template matches.
3. Sorting the templates by best fit to the target’s structure.
4. Using probabilities to select the “best segment” from a
low pseudo-energy subset group.
5. Copying the coordinates from the best segment’s template
protein.
Verify 3D scores 3-D models with a probability table and
assesses the probability that each amino acid residue would
occupy a specific position in the 3-D structure.
Multiple Template Method (MTM) uses solved X-ray
structures to build the target sequence’s protein model.
3D-JIGSAW creates a homology model:
1. Select and align templates, based on sequence.
2. Select template segments.
3. Create backbone (framework, scaffold).
4. Add side chains, refine and evaluate target protein model.
ERRAT examines nonbonded distances of C-C, C-N, C-O, N-N, N-O, and O-O atom pairs.
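The kind of bookkeeping behind such a check can be sketched by counting close atom pairs for each of the six pair types. This is not ERRAT's algorithm or its statistics. The 3.5 angstrom cutoff and the input format are assumptions, and the sketch does not filter out covalently bonded neighbours, which a real evaluation would have to exclude.

```python
import math
from collections import Counter
from itertools import combinations


def close_pair_counts(atoms, cutoff=3.5):
    """Count atom pairs within `cutoff` angstroms, grouped by element pair.

    atoms: list of (element, (x, y, z)) tuples with elements C, N, or O.
    """
    counts = Counter()
    for (el1, xyz1), (el2, xyz2) in combinations(atoms, 2):
        if math.dist(xyz1, xyz2) <= cutoff:
            # Sort the element symbols so that N-C and C-N share one key.
            counts["-".join(sorted((el1, el2)))] += 1
    return counts


# Toy coordinates (hypothetical): two close pairs and one distant atom.
atoms = [
    ("C", (0.0, 0.0, 0.0)),
    ("N", (3.0, 0.0, 0.0)),
    ("O", (0.0, 3.0, 0.0)),
    ("C", (10.0, 0.0, 0.0)),
]
counts = close_pair_counts(atoms)
```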
Protein Structure Analysis (ProSa) uses the potential of
mean force, which is the change in potential energy of a
system caused by the variation of a specific coordinate, to
locate the regions of the protein structure that may contain
improper or unsuitable geometries.
Protein Volume Evaluation (PROVE) uses computed
volume of individual atoms as a means of evaluating the
viability of a protein model.
Model Clustering Analysis uses NMRCLUST,
NMRCORE, and OLDERADO which are programs that aid in
the superposition and clustering of protein structure.
Figure 2 Peptide bonds create rigid plates which
rotate about phi and psi.
Figure 3 A Ramachandran plot for the
tripeptide in Figure 2.
References
[1] Esposito, E. X.; Tobi, D.; Madura, J. D. “Comparative Protein Modeling” Reviews in Computational
Chemistry, Volume 22, 2006, Wiley-VCH, John Wiley & Sons, Inc., to be published.
[2] Ramachandran plot and alanine structure: http://www.cgl.ucsf.edu/home/glasfeld/tutorial/AAA/AAA.html