
Protein Classification II
CISC889: Bioinformatics
Gang Situ
04/11/2002
Parts of this lecture borrowed
from lecture given by Dr. Altman
Outline
1. Terminology
2. Classes of protein structures
3. Why do we need to align structures
4. Viewing protein structures and
5. How to recognize structural similarities
6. Algorithms
7. Summary
Terminology
Tertiary structure, three-dimensional
Class
- similar 2° structure
- all α, all β, α + β, α/β
Fold
- major structural similarity
- similar arrangement of 2°
Superfamily (topology)
- probable common ancestry
Family
- clear evolutionary relationship
- sequence similarity > 25%
Individual Protein
Class of Protein Structure
1. Class α
2. Class β
3. Class α/β
4. Class α + β
5. Multidomain proteins
6. Membrane and cell surface
proteins
7. And more …
Structure of α class proteins
Structure of β class proteins
Structure of α/β class proteins
Structure of α + β class proteins
Structure of membrane proteins
What is structure alignment
•In structure alignment, the three-dimensional structure of one protein domain is superimposed upon that of a second protein domain so as to minimize the RMS deviation (RMSD) between corresponding atoms.
•The goal is to discover structural similarity.
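The quantity being minimized can be made concrete with a short sketch. The Python below computes the RMSD between two equal-length lists of corresponding atom coordinates that are assumed to be already superimposed. The function name rmsd and the toy coordinates are illustrative assumptions, not part of the lecture; a full alignment program would also search for the rotation and translation that minimize this value.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired 3-D points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for p, q in zip(coords_a, coords_b):
        # Squared Euclidean distance between one pair of corresponding atoms.
        total += sum((pi - qi) ** 2 for pi, qi in zip(p, q))
    return math.sqrt(total / len(coords_a))

# Toy example (hypothetical coordinates): b is a shifted by 1 unit along x.
a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
b = [(1.0, 0.0, 0.0), (4.8, 0.0, 0.0), (8.6, 0.0, 0.0)]
print(round(rmsd(a, b), 3))
```

Because every pair here is displaced by the same amount, the RMSD equals that displacement; in real structures the value averages over pairs that fit well and pairs that fit poorly.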
Why Align Structures
1. For homologous proteins (similar ancestry),
this provides the “gold standard” for
sequence alignment--elucidates the common
ancestry of the proteins.
2. For nonhomologous proteins, allows us to
identify common substructures of interest.
3. Allows us to classify proteins into clusters,
based on structural similarity.
Evaluating Structural Alignments
Factors to be considered:
1. Number of amino acid correspondences created.
2. RMSD of corresponding amino acids
3. Percent identity in aligned residues
4. Number of gaps introduced
5. Size of the two proteins
6. Conservation of known active site environments …
There are no universally agreed upon criteria. As
usual, it depends on what you are using the
alignment to do.
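Several of the listed criteria can be read directly off the residue correspondences. The sketch below assumes the structural alignment has been written out as two gapped one-letter sequences of equal length (a common but not universal representation); the function name alignment_stats and the example strings are hypothetical. It reports the number of correspondences, the percent identity over aligned residues, and the number of gap runs.

```python
def alignment_stats(aligned_a, aligned_b, gap="-"):
    """Summarize a pairwise alignment given as two gapped strings."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned strings must have equal length")
    pairs = identical = gaps = 0
    in_gap = False
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            # Count each contiguous run of gap columns once.
            if not in_gap:
                gaps += 1
            in_gap = True
        else:
            in_gap = False
            pairs += 1
            identical += (x == y)
    pct = 100.0 * identical / pairs if pairs else 0.0
    return {"correspondences": pairs, "percent_identity": pct, "gaps": gaps}

# Hypothetical aligned sequences, for illustration only.
stats = alignment_stats("MKT-AYIAK", "MRTQAY--K")
print(stats)
```

How these numbers are weighed against RMSD and protein size is left to the user, in keeping with the point that no single criterion is universally agreed upon.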
Methods
(Flowchart; boxes listed by their step numbers. Decision boxes, marked "?", branch on Yes/No.)
1. Protein sequence
2. Database similarity search
3. Align w/ known structure?
4. Protein family analysis
5. Relationship to known structure?
6. Structural analysis
7. Predicted structure?
8. Tertiary comparative modeling
9. Tertiary structure analysis
(Unnumbered box) Predicted tertiary structure
Viewing Protein Structures
•Chime http://www.umass.edu/microbio/chime/
A Web browser plug-in to display and manipulate structures inside a Web page.
•Cn3D http://www.ncbi.nlm.nih.gov/Structure/
Provides viewing of three-dimensional structures from Entrez and MMDB. Cn3D runs on Windows, MacOS, and Unix; simultaneously displays structural and sequence alignments; can show multiple superimposed images from NMR studies.
•Mage http://kinemage.biochem.duke.edu/ (see Richardson and Richardson 1994)
Standard molecular viewing features with animation and kaleidoscope effects.
•RasMol http://www.umass.edu/microbio/rasmol/
Most commonly used viewer for Windows, MacOS, UNIX, and VMS operating systems. Performs many functions.
•Swiss 3D viewer, Spdbv http://www.expasy.ch/spdbv/mainpage.html (Guex and Peitsch 1997)
Protein models can be built by structural alignments; calculates atomic angles and distances, threading, energy minimization, and interacts with the Swiss Model server.
Protein Structure Classification Databases
•SCOP -- structural classification of proteins
•FSSP -- fold classification and multiple structure
alignments
•CATH -- classification by Class, Architecture, Topology, and Homologous superfamily
•MMDB -- structural neighbors found by the VAST program
•SARF
Alignment of Protein Structures
• Difference: sequence vs. structural similarity. Indicator of evolutionary relationship?
• Structures are more difficult to align:
1. A similar structure may form by many different foldings of the amino acid Cα backbone.
2. Although the local environments of many molecules in two proteins may be similar, there may also be some local differences.
How to recognize structural similarities
1. By eye
2. Algorithmically
• point-based methods use properties of points (distances) to establish correspondences
• secondary structure-based methods use vectors representing secondary structures to establish correspondences.
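To illustrate what "vectors representing secondary structures" can mean, the sketch below reduces each secondary structure element to a single vector and compares two elements by the angle between them. Taking the vector from the first to the last Cα of the element is a crude stand-in assumed here for brevity (real programs typically fit an axis to the element); the function names and coordinates are hypothetical.

```python
import math

def sse_vector(ca_coords):
    """Vector from the first to the last C-alpha of one element (a crude axis)."""
    start, end = ca_coords[0], ca_coords[-1]
    return tuple(e - s for s, e in zip(start, end))

def angle_deg(u, v):
    """Angle between two vectors, in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    cos = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
    return math.degrees(math.acos(cos))

# Hypothetical elements: one running along x, one along y.
helix_1 = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
helix_2 = [(5.0, 0.0, 0.0), (5.0, 1.5, 0.0), (5.0, 3.0, 0.0)]
angle = angle_deg(sse_vector(helix_1), sse_vector(helix_2))
print(angle)
```

Two proteins whose elements show similar inter-element angles and distances are candidates for having similar folds, which is far cheaper to check than comparing every atom.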
Align Structures by Secondary Structures
Three prototypical methods
1. STRUCTAL uses dynamic programming iteratively to refine an arbitrary starting alignment.
2. DALI uses distance matrices to find similar patterns of distances, indicating correspondences.
3. LOCK uses vectors associated with secondary structures to do a quick screen for similar structures.
STRUCTAL
Uses dynamic programming iteratively to refine an arbitrary starting alignment.
STEPS:
1. Start with any set of correspondences between two structures (sequence alignment, secondary structure alignment, by eye, random).
2. Superimpose the structures on the current correspondences, then compute a score matrix by scoring all pairs of points based on their distance.
3. Trace back through the score matrix to find a new set of correspondences that maximizes the score (standard DP).
4. Iterate 2 and 3 until the score doesn't change.
Note: heuristic, no guarantees of success; depends on the quality of the starting alignment.
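A single pass of the score-matrix and traceback steps can be sketched as follows. This is a simplified illustration, not the STRUCTAL program: it assumes the two point sets are already superimposed, and the scoring form, the values of M and d0, and the linear gap penalty are all assumptions chosen for the example. A full procedure would re-superimpose the structures on the returned pairs and repeat until the score stops changing.

```python
import math

def score(d, M=20.0, d0=5.0):
    # Assumed scoring form: M at d = 0, zero at d = d0, negative beyond d0.
    return 2.0 * M / (1.0 + (d / d0) ** 2) - M

def dp_correspondences(a, b, gap=-5.0):
    """One DP pass: best-scoring ordered pairing of points in a with points in b."""
    n, m = len(a), len(b)
    S = [[score(math.dist(p, q)) for q in b] for p in a]
    # F[i][j] = best score aligning a[:i] with b[:j] (global, linear gap cost).
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + S[i - 1][j - 1],
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap)
    # Trace back from the corner to recover the correspondences.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if F[i][j] == F[i - 1][j - 1] + S[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs, F[n][m]

# Hypothetical points: b matches the last three points of a.
a = [(0.0, 0, 0), (3.8, 0, 0), (7.6, 0, 0), (11.4, 0, 0)]
b = [(3.9, 0, 0), (7.5, 0, 0), (11.5, 0, 0)]
pairs, total = dp_correspondences(a, b)
print(pairs)
```

The DP finds the optimal pairing for a fixed superposition only; because the superposition itself depends on the pairing, the overall iteration remains a heuristic.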
Scoring in STRUCTAL
Need to find a score that is maximal when the alignment is good (good distances are small). Also may want to include other computable attributes of the points.
Where M is the maximum score desired, d is the measured value (of distance or some other attribute), and d0 is the value at which the score is 0. Values of d below d0 get some “credit”, but values above d0 are penalized.
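One function with the properties described is S(d) = 2M / (1 + (d/d0)^2) - M, which equals M at d = 0, equals 0 at the zero-score distance, and goes negative beyond it. Treat this form, and the default values below, as an assumption that matches the description rather than as the exact formula from the slide. The function name distance_score is hypothetical.

```python
def distance_score(d, M=20.0, d0=5.0):
    """Score that is M at d = 0, zero at d = d0, and negative for d > d0.

    The functional form and the defaults are assumptions for illustration.
    """
    return 2.0 * M / (1.0 + (d / d0) ** 2) - M

for d in (0.0, 2.5, 5.0, 10.0):
    print(d, distance_score(d))
```

The same shape can be reused for attributes other than distance by choosing a suitable zero-score value for each attribute.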
Distance Matrix
•Similar to a dot matrix; used to identify the atoms that lie closest together within a structure.
•If two proteins have a similar structure, the graphs (distance matrices) of these structures will be superimposable.
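A distance matrix is straightforward to compute from coordinates, and the sketch below also shows why it is useful for comparison: it depends only on distances within one structure, so it is unchanged when the structure is moved. The function name and the toy coordinates are illustrative assumptions; in practice the points would be Cα atoms read from a structure file.

```python
import math

def distance_matrix(coords):
    """All-against-all Euclidean distances within one structure."""
    return [[math.dist(p, q) for q in coords] for p in coords]

# Hypothetical points, and the same points translated in space.
points = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
moved = [(x + 10.0, y - 5.0, z + 2.0) for x, y, z in points]

dm = distance_matrix(points)
dm_moved = distance_matrix(moved)
for row in dm:
    print(row)
```

Because no superposition is needed before two such matrices can be compared, methods built on them can look for matching patterns directly.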
DALI
Uses distance matrices to find similar patterns of distances, indicating correspondences.
STEPS:
1. Systematically look through the two distance matrices to find pairs of segments with similar patterns of distances. This provides pairs of similar segments.
2. Assemble pairs into larger sets, to maximize the number of atoms and minimize the RMS distance between them.
The assembly step is done in a random (Monte Carlo) fashion, since the search space is too large to search exhaustively.
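A much-simplified version of step 1 can be sketched as follows. It slides a window of length k along the diagonal of each distance matrix and reports window pairs whose internal distance patterns differ, on average, by no more than a tolerance. This is an illustration under stated assumptions, not the DALI program: DALI compares contact patterns between pairs of fragments and then performs the assembly step, neither of which is shown here. The names similar_segments, k, and tol are hypothetical.

```python
import math

def distance_matrix(coords):
    return [[math.dist(p, q) for q in coords] for p in coords]

def similar_segments(dm_a, dm_b, k=4, tol=0.5):
    """Start indices (i, j) of length-k segments with similar internal distances."""
    hits = []
    for i in range(len(dm_a) - k + 1):
        for j in range(len(dm_b) - k + 1):
            # Compare the upper triangles of the two k-by-k sub-matrices.
            diffs = [abs(dm_a[i + x][i + y] - dm_b[j + x][j + y])
                     for x in range(k) for y in range(x + 1, k)]
            if sum(diffs) / len(diffs) <= tol:
                hits.append((i, j))
    return hits

# Hypothetical chains: a is straight; b is straight for 4 points, then bends.
a = [(0.0, 0, 0), (3.8, 0, 0), (7.6, 0, 0), (11.4, 0, 0)]
b = [(0, 0.0, 0), (0, 3.8, 0), (0, 7.6, 0), (0, 11.4, 0), (3.8, 11.4, 0)]
hits = similar_segments(distance_matrix(a), distance_matrix(b))
print(hits)
```

The straight segment of b is matched even though it points in a different direction from a, since only internal distances are compared; the bent segment is rejected.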
Fast Structural Search based on Secondary
Structure Analysis
Steps for LOCK
1. Define local secondary structures.
2. Find an initial superposition by using DP (and score functions shown) to align secondary structure vectors.
3. Use a greedy algorithm to find nearest neighbors and minimize RMSD.
4. Prune the atoms to get a core with minimal RMSD.
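The greedy nearest-neighbor and pruning steps can be sketched as below. The sketch assumes an initial superposition has already been applied, pairs atoms closest-first with each atom used at most once, and drops any pair farther apart than a cutoff. The function names, the cutoff value, and the coordinates are assumptions for illustration; LOCK itself alternates such pairing with re-superposition, which is omitted here.

```python
import math

def greedy_pairs(a, b, cutoff=3.0):
    """Pair points of a and b, closest pairs first, each point used at most once."""
    candidates = sorted(
        (math.dist(p, q), i, j)
        for i, p in enumerate(a)
        for j, q in enumerate(b)
        if math.dist(p, q) <= cutoff  # pruning: ignore pairs beyond the cutoff
    )
    used_a, used_b, pairs = set(), set(), []
    for d, i, j in candidates:
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            pairs.append((i, j, d))
    return sorted(pairs)

def rmsd_of_pairs(pairs):
    """RMSD over the retained pairs only (the aligned core)."""
    return math.sqrt(sum(d * d for _, _, d in pairs) / len(pairs))

# Hypothetical superimposed atoms; the third atom of each set has no partner.
a = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0)]
b = [(0.5, 0.0, 0.0), (4.0, 1.0, 0.0), (20.0, 0.0, 0.0)]
core = greedy_pairs(a, b)
print([(i, j) for i, j, _ in core], round(rmsd_of_pairs(core), 3))
```

Tightening the cutoff shrinks the core and lowers its RMSD, which is the trade-off between number of correspondences and RMSD noted when evaluating alignments.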
Summary
1. Structural alignment is a key activity, combinatorially expensive, used for:
• Gold standard for alignments
• Elucidating evolutionary relationships
• Creating classifications of protein structure
2. Multiple methods exist, often based on a basic DP approach including
• Analysis of distances
• Analysis of vectors
• Combinations of both
Summary
1. STRUCTAL – dynamic programming using a
distance metric
2. DALI – analysis of distance maps
3. LOCK – analysis of secondary structure vectors,
followed by refinement with distances