Sequence order independent structural alignment
Joe Dundas, Andrew Binkowski, Bhaskar DasGupta, Jie Liang
Department of Bioengineering/Bioinformatics, University of Illinois at Chicago
Background
o Extended Central Dogma of molecular biology
DNA → RNA → primary structure → 3D structure → function
o Evolution conserves the 3D structure more than
amino acid sequence.
o Structural similarity often reflects a common
function or origin of proteins.[1]
o It is useful to classify proteins based on their
structures. (SCOP, CATH, FSSP).
o Many methods for structure alignment have
been reported. (CE, DALI, FAST, Matchprot)
Circular Permutation
o Ligation of the N and C termini, and
subsequent cleavage elsewhere.
o In 1979, first natural circular
permutation was observed in favin
vs. concanavalin A.[2]
o In 1983, the first engineered circular
permutation was performed on
bovine pancreatic trypsin inhibitor.[3]
o Since then, studies have shown that
artificially permuted proteins are
able to fold into stable structures
that are similar to the native
protein.[4]
o Circular permutations have been
discovered in lectins, β-glucanases,
swaposin…[5]
Uliel S., Fliess A., Amir A., Unger R. (1999)[6]
Uliel S., Fliess A., Unger R. (2001)[7]
Alignment Problem
o Most structural alignment methods require the
structural units of each protein to align
sequentially, e.g. CE, FAST.
o Some newer methods perform non-sequential
alignments, e.g. Dali, Matchprot.
After explaining our method, we will compare the
results against Dali and Matchprot.
Our Method
• We exhaustively fragment protein A and protein B
into fragments of 4 to 7 residues.
Notation: fragment λa = (a1, a2), where a1 and a2 are the beginning
and ending positions relative to the N-terminus of protein A.
Πa = {λa,1, λa,2, …, λa,n} is the set of all fragments from protein A.
La,i is the length of fragment λa,i.
• Each fragment from protein A is aligned to all fragments
of protein B if La,i = Lb,j, forming a set of Aligned
Fragment Pairs (Λ ⊆ Πa × Πb).
• A similarity function σ maps each aligned fragment pair in Λ to a score.
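The fragmentation and pairing steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `fragments` and `aligned_fragment_pairs` are hypothetical, positions are assumed to be 1-based and inclusive, and only fragment indices are handled (no coordinates, no scoring).

```python
from itertools import product


def fragments(n_residues, min_len=4, max_len=7):
    """Return every (start, end) fragment of a chain with n_residues residues.

    Positions are 1-based and inclusive, counted from the N-terminus
    (an assumption made for this sketch).
    """
    return [
        (start, start + length - 1)
        for length in range(min_len, max_len + 1)
        for start in range(1, n_residues - length + 2)
    ]


def aligned_fragment_pairs(n_a, n_b):
    """Pair each fragment of protein A with every equal-length fragment of B."""
    frags_a = fragments(n_a)
    frags_b = fragments(n_b)
    return [
        (fa, fb)
        for fa, fb in product(frags_a, frags_b)
        if fa[1] - fa[0] == fb[1] - fb[0]  # same length
    ]


# Toy chain lengths, chosen only for illustration.
pairs = aligned_fragment_pairs(10, 8)
print(len(fragments(10)), len(pairs))
```

The number of candidate pairs grows roughly with the product of the two chain lengths, which is why a score threshold is applied before any further processing.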
Similarity Function
σ(Λi) = α − (β · ss_corr(Λi) * rmsd(Λi)) + γ * seq_sim(Λi)
ss_corr(Λi) = 1.0 * C(Λi) + 5.0 * H(Λi) + 1.25 * E(Λi) + 1.0 * M(Λi)
H(Λi) is the percentage of aligned Helical residues.
E(Λi) is the percentage of aligned Strand residues.
C(Λi) is the percentage of aligned Other residues.
M(Λi) is the percentage of Mismatched aligned residues.
rmsd(Λi) is the optimal root mean square distance of the aligned fragment pair.
seq_sim(Λi) is the sequence similarity of the aligned fragment pair.
The parameters α, β, γ were empirically set to 100, 0.5, and 1, respectively.
All Λi with σ(Λi) > Threshold are used to create a conflict graph.
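The secondary-structure correction term ss_corr can be sketched as below, using the weights given on the slide (1.0 for Other, 5.0 for Helical, 1.25 for Strand, 1.0 for Mismatched). Several details are assumptions made for this sketch: secondary structure is encoded as one character per residue ('H' helix, 'E' strand, anything else Other), "percentage" is taken as a fraction between 0 and 1, and the function name `ss_corr` simply mirrors the slide's notation.

```python
def ss_corr(ss_a, ss_b):
    """Secondary-structure correction for one aligned fragment pair.

    ss_a, ss_b: equal-length strings of per-residue labels. 'H' is helix,
    'E' is strand, and every other character is treated as Other
    (assumed encoding, not stated in the slides).
    """
    if len(ss_a) != len(ss_b) or not ss_a:
        raise ValueError("fragments must be non-empty and of equal length")

    helix = strand = other = mismatch = 0
    for a, b in zip(ss_a, ss_b):
        a = a if a in ("H", "E") else "C"
        b = b if b in ("H", "E") else "C"
        if a != b:
            mismatch += 1
        elif a == "H":
            helix += 1
        elif a == "E":
            strand += 1
        else:
            other += 1

    n = len(ss_a)
    # Weights taken from the slide; counts are divided by n to get fractions.
    return (1.0 * other + 5.0 * helix + 1.25 * strand + 1.0 * mismatch) / n


print(ss_corr("HHHH", "HHHH"), ss_corr("HHCC", "HHEC"))
```

With these weights, an all-helix pair gets the largest correction factor, so its RMSD counts more heavily than that of a strand or coil pair.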
Conflict Graph
• Two fragment pairs Λi and Λj are in conflict
if any residue in λi,A is also in λj,A or any
residue in λi,B is also in λj,B.
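The conflict rule above amounts to an interval-overlap test on each protein. The sketch below builds the conflict graph with a plain all-pairs check rather than the sweep mentioned on the slide; the names `overlaps`, `in_conflict`, and `conflict_graph`, and the toy fragment pairs, are hypothetical and serve only as illustration.

```python
from itertools import combinations


def overlaps(f, g):
    """True if inclusive residue ranges f = (start, end) and g share a residue."""
    return f[0] <= g[1] and g[0] <= f[1]


def in_conflict(p, q):
    """p and q are fragment pairs ((a_start, a_end), (b_start, b_end)).

    They conflict if their fragments overlap on protein A or on protein B.
    """
    return overlaps(p[0], q[0]) or overlaps(p[1], q[1])


def conflict_graph(pairs):
    """Return adjacency sets keyed by index into pairs (O(n^2) all-pairs check)."""
    graph = {i: set() for i in range(len(pairs))}
    for i, j in combinations(range(len(pairs)), 2):
        if in_conflict(pairs[i], pairs[j]):
            graph[i].add(j)
            graph[j].add(i)
    return graph


# Toy fragment pairs, chosen only for illustration.
pairs = [
    ((1, 5), (10, 14)),
    ((4, 8), (20, 24)),    # overlaps pair 0 on protein A
    ((9, 13), (12, 16)),   # overlaps pair 0 on protein B
    ((20, 24), (30, 34)),  # overlaps nothing
]
graph = conflict_graph(pairs)
print(graph)
```

The all-pairs check is the simplest correct version; a sweep over sorted fragment endpoints would avoid comparing pairs that are far apart in both proteins.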
Simplified Example
[Figure: fragment pairs δ1 to δ4 plotted by Reference Protein Residues and Query Protein Residues.]
Conflicts can be found by a vertex sweep.
LP Formulation
maximize:
Σi σ(Λi) · xΛi
xΛi is a relaxed integer between 0 and 1
0 = don't use fragment
1 = use fragment
Subject to:
Σ yΛi,a ≤ 1, summed over all Λi with t ∈ λi,A, for each residue t of protein A
Σ yΛi,b ≤ 1, summed over all Λi with t ∈ λi,B, for each residue t of protein B
No conflicting residues in
query or reference protein.
yΛi,a − xΛi ≥ 0
yΛi,b − xΛi ≥ 0
Consistency between variables
0 ≤ yΛi,a, yΛi,b, xΛi ≤ 1
All variables are between 0 and 1
Solve using a linear programming package
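To make the structure of the relaxed linear program concrete, the sketch below assembles its objective, constraints, and bounds as plain Python data that could be handed to any LP package. It assumes one x variable per fragment pair and one y variable per pair and protein, one "at most 1" row per covered residue, and consistency rows read as y − x ≥ 0; the function name `build_lp` and the variable naming scheme (`x0`, `ya0`, `yb0`, ...) are hypothetical. No solver is called.

```python
def build_lp(pairs, sigma):
    """Assemble the relaxed LP for fragment pairs.

    pairs: list of ((a_start, a_end), (b_start, b_end)), inclusive ranges.
    sigma: list of similarity scores, one per pair.
    Returns (objective, constraints, bounds). The objective is to be
    maximized; each constraint is (coefficients, sense, right-hand side).
    """
    n = len(pairs)
    objective = {f"x{i}": sigma[i] for i in range(n)}

    constraints = []
    for side, tag in ((0, "ya"), (1, "yb")):
        residues = sorted(
            {t for p in pairs for t in range(p[side][0], p[side][1] + 1)}
        )
        for t in residues:
            # At most one selected pair may cover residue t on this protein.
            row = {
                f"{tag}{i}": 1.0
                for i, p in enumerate(pairs)
                if p[side][0] <= t <= p[side][1]
            }
            constraints.append((row, "<=", 1.0))
        for i in range(n):
            # Consistency between variables: y_i - x_i >= 0.
            constraints.append(({f"{tag}{i}": 1.0, f"x{i}": -1.0}, ">=", 0.0))

    bounds = {
        name: (0.0, 1.0)
        for i in range(n)
        for name in (f"x{i}", f"ya{i}", f"yb{i}")
    }
    return objective, constraints, bounds


# Toy input, chosen only for illustration.
objective, constraints, bounds = build_lp(
    [((1, 4), (1, 4)), ((3, 6), (10, 13))], [10.0, 20.0]
)
print(len(constraints), len(bounds))
```

Keeping the model as dictionaries makes it easy to translate into the matrix form most LP solvers expect.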
Local Conflict Number
LP will assign a number between 0 and
1 to each xΛi.
For each Λi compute a local conflict
number Θi:
Θi = Σ xΛj, summed over j ∈ Neigh(i)
Define δmin as the vertex with the
smallest local conflict number.
Assign a new σ:
σ(Λi) = σ(Λi) − σ(Λmin) if i ∈ Neigh(min)
σ(Λi) = σ(Λi) otherwise
Remove all vertices with σ ≤ 0 from Λ
and push them onto a stack Ω in
descending order of σ.
[Figure: conflict graph on vertices δ1 to δ4.]
Values shown with the LP solution:
δ1: σ(Λ1) = 50, xΛ1 = 0.85, ΘΛ1 = 1.10
δ2: σ(Λ2) = 20, xΛ2 = 0.25, ΘΛ2 = 1.46
δ3: σ(Λ3) = 20, xΛ3 = 0.6, ΘΛ3 = 0.85
δ4: σ(Λ4) = 15, xΛ4 = 0.01, ΘΛ4 = 0.26
Values shown after the update:
σ(Λ1) = 50, σ(Λ2) = 15, σ(Λ3) = 20, σ(Λ4) = 0
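One round of the local conflict number update can be sketched as below. Two points are assumptions made for this sketch: Neigh(i) is treated as including vertex i itself (so that the minimum vertex drops to σ = 0), and the removed vertices are ordered by their updated σ. The function name `local_conflict_step` and the toy scores, LP values, and graph are hypothetical and are not the values from the slide's figure.

```python
def local_conflict_step(sigma, x, neigh):
    """Perform one update round on the conflict graph.

    sigma: dict vertex -> current score.
    x:     dict vertex -> LP value in [0, 1].
    neigh: dict vertex -> set of conflicting vertices.
    Returns (remaining_sigma, removed), where removed lists the vertices
    whose updated score is <= 0, in descending order of that score.
    """
    # Closed neighbourhoods: each vertex counts itself (assumption).
    closed = {i: (neigh[i] | {i}) & sigma.keys() for i in sigma}

    # Local conflict number: sum of LP values over the neighbourhood.
    theta = {i: sum(x[j] for j in closed[i]) for i in sigma}
    v_min = min(theta, key=theta.get)
    s_min = sigma[v_min]

    # Subtract the minimum vertex's score from its whole neighbourhood.
    updated = {
        i: (s - s_min if i in closed[v_min] else s) for i, s in sigma.items()
    }
    removed = sorted(
        (i for i, s in updated.items() if s <= 0),
        key=lambda i: updated[i],
        reverse=True,
    )
    remaining = {i: s for i, s in updated.items() if s > 0}
    return remaining, removed


# Toy path graph 1 - 2 - 3, chosen only for illustration.
sigma = {1: 30, 2: 12, 3: 8}
x = {1: 0.7, 2: 0.3, 3: 0.1}
neigh = {1: {2}, 2: {1, 3}, 3: {2}}

stack = []
remaining, removed = local_conflict_step(sigma, x, neigh)
stack.extend(removed)
print(remaining, stack)
```

In the full procedure the LP would be re-solved on the remaining vertices before the next round, until every vertex has been pushed onto the stack.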
Repeat
Repeat LP formulation until all vertices have been pushed onto the stack Ω.
Begin with 5 empty alignments.
While the stack is not empty, retrieve an aligned pair by popping the stack.
Insert it into each non-empty alignment if and only if:
1. No residue conflicts occur.
2. The global RMSD does not change by more than some threshold.
If it cannot be inserted into any alignment, insert it into an available empty
alignment.
Determine which alignment has the highest similarity score.
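The stack-driven assembly of the candidate alignments can be sketched as follows. The global RMSD test is left to a caller-supplied predicate, here called `rmsd_ok`, because the slides do not give its threshold; that predicate, the function name `assemble`, and the choice to drop a pair when no empty alignment is left are all assumptions of this sketch.

```python
def overlaps(f, g):
    """True if inclusive residue ranges f and g share a residue."""
    return f[0] <= g[1] and g[0] <= f[1]


def assemble(stack, rmsd_ok, n_alignments=5):
    """Build candidate alignments by popping fragment pairs off the stack.

    stack:   list used as a stack (the last item is popped first) of
             fragment pairs ((a_start, a_end), (b_start, b_end)).
    rmsd_ok: hypothetical predicate rmsd_ok(alignment, pair) -> bool
             standing in for the global RMSD threshold test.
    """
    alignments = [[] for _ in range(n_alignments)]
    while stack:
        pair = stack.pop()
        inserted = False
        for alignment in alignments:
            if not alignment:
                continue  # only non-empty alignments are tried first
            clash = any(
                overlaps(pair[0], q[0]) or overlaps(pair[1], q[1])
                for q in alignment
            )
            if not clash and rmsd_ok(alignment, pair):
                alignment.append(pair)
                inserted = True
        if not inserted:
            empty = next((a for a in alignments if not a), None)
            if empty is not None:
                empty.append(pair)  # start a new alignment
            # Otherwise the pair is dropped (assumption of this sketch).
    return alignments


# Toy stack; the RMSD test is stubbed out to always pass.
stack = [((20, 24), (30, 34)), ((3, 7), (40, 44)), ((1, 5), (1, 5))]
result = assemble(stack, rmsd_ok=lambda alignment, pair: True)
print(result[:2])
```

Choosing the final answer would then be a matter of taking the candidate with the highest similarity score, for example with `max(result, key=score)` for whatever `score` function is in use.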
Results – Circular Permutation?
1jqsC
70S ribosome functional complex
Fold: Ribosome & Ribosomal fragments
RMSD: 2.3194
2pii
PII (Product of glnB)
Fold: Ferredoxin-like
Results – Circular Permutation
1iudA
Aspartate Racemase
Fold: ATC-like
1h0rA
Type II 3-dehydroquinate dehydratase
Fold: Flavodoxin
Results
1fe0
ATX1 Metallochaperone
Fold: ferredoxin-like
1vet
Mitogen activated protein kinase
Results
1e50
Core binding factor
Fold: Core binding factor beta
1pkv
Riboflavin Synthase