Transcript Slide 1
Model Database
Scene
Recognition
Lamdan, Schwartz, Wolfson, “Geometric Hashing”,1988.
Geometric Matching task =
Geometric Pattern Discovery
Inexact Alignment.
Simple case – two closely related proteins with the same number of
amino acids.
T
Question: how to measure
alignment error?
Superposition - best least squares
(RMSD – Root Mean Square Deviation)
Given two sets of 3-D points :
P={pi}, Q={qi} , i=1,…,n;
rmsd(P,Q) = √
S i|pi - qi |2 /n
Find a 3-D rigid transformation T* such that:
rmsd( T*(P), Q ) = minT
√ S i|pi - qi |2 /n
A closed form solution exists for this task.
It can be computed in O(n) time.
Structure Alignment
(Straightforward Algorithm)
• For each pair of triplets, one from each
molecule which define ‘almost’ congruent
triangles compute the rigid transformation
that superimposes them.
• Count the number of point pairs, which are
‘almost’ superimposed and sort the
hypotheses by this number.
• For the highest ranking hypotheses improve the
transformation by replacing it by the best RMSD
transformation for all the matching pairs.
• Complexity : assuming order of n points in both
molecules - O(n8) .
O(n4) if one exploits protein backbone geometry.
Accuracy improvement during detection of 3D
transformation.
Instead of 3 points use more. How many?
Align any possible pair of fragments - Fij(k)
i+k-1
i
j
j+k-1
Accept Fij(k) if rmsd(Fij(k)) <e.
Complexity O(n3 n).
(For each Fij(k) we need compute its rmsd)
can be reduced to O(n3)
Improvement : BLAST idea - detect short
similar fragments, then extend as much as
possible.
k+l-1
k
t
i-1
i+1
i
j-1
j
j+1
ai-1 ai ai+1
bj-1 bj bj+1
Extend while: rmsd(Fij(k)) <e.
Complexity: O(n2)
t+l-1
Protein Structural Alignment based on
Geometric Hashing
Sequence Based
Structure Alignment
•Run pairwise sequence alignment.
•Based on sequence correspondence compute 3D
transformation (least square fit can be applied).
•Iteratively improve structural superposition.
Alignment of Flexible
Molecular Structures
Motivation
• Proteins are flexible. One would like to align
proteins modulo the flexibility.
• Hinge and shear protein domain motions
(Gerstein, Lesk , Chotia).
• Conformational flexibility in drugs.
Motivation
Flexible protein alignment
without prior hinge knowledge
FlexProt - algorithm
– detects automatically flexibility regions
– exploits amino acid sequence order
Examples
Experimental Results
• Task: largest flexible alignment by
decomposing the two molecules into a
minimal number of rigid fragment pairs
having similar 3-D structure.