Bayesian Refinement of Protein Functional Site Matching

Download Report

Transcript Bayesian Refinement of Protein Functional Site Matching

Kanti V Mardia,
Vysaul B Nyirongo*,
Peter J Green,
Nicola D Gold,
David R Westhead
Presented by Deephan, Mohan
 Background
 Conventional
 Bayesian
Methods
Refinement
 Results
 Conclusion
Disclaimer : Contrary to the assumption made by the authors, the paper
presenter does have a thorough understanding of all the concepts
related to the topics of advanced statistical, graph theory and structural
genomics discussed in the paper..
 Structural
Genomics
• Structural Site comparison
• Functional Site comparison
 Knowledge based methods
 Similarity Search Algorithms

Modeled as a graph theoretic problem
• Shape analysis of Proteins
 Crucial for prediction of molecular interactions
 Infer functional relationship of proteins
 Classification of Binding Patterns

Resource: SITESDB Database
 Contains Protein Structural data
 Entries formed from PDB (Protein Data Bank)
 Graph Similarity
• Objective:
Problem
Matching Functional sites -Comparing amino
acid configurations (Cα and Cβ atoms)
• Functional site – Graph
• Amino acid positions – Vertices

Refining the Graph Match
• Application of Bayesian Strategy
• Markov Chain Monte Carlo (MCMC) procedure
Bayesian Inference:

Complete Distribution of matches

Solution space

Noise Adaptation

Flexibility

Edge over combinatorial methods
 Common
Tool used in Statistical
Inference
 Based on Posterior Joint Distribution
 Product of Prior density and Likelihood
Biologically speaking,
• Prior Density - Distribution of Transformation
Parameters
• Likelihood - Related to matches between
functional sites


Functional sites X and Y represented as
Graphs G1 and G2
Vertex sets
V1 = {Xj, j = 1, 2, ..., m} , V2 = {Yk, k = 1, 2, ..., n}
Xj , Yk - represents coordinates of amino acids in jth
and kth positions of X,Y
x1j, y1k – Cα coordinates for X,Y
x2j, y2k – Cβ coordinates for X,Y
x1 = {x1j : j = 1 ..., m}, x2 = {x2j : j = 1 ..., m}
y1 = {y1k : k = 1 ..., n}, y2 = {y2k : k = 1 ..., n}
 Objective:
• Creation of Vertex Product Graph (Hv)
 Hv = G1 ○v G2
 VH=V1 x V2
An edge between two vertices vh = (Xj, Yk), vh' =
(Xj', Yk') ∈ VH exists for j ≠ j' and k ≠ k' when
 1. the absolute difference between distances
|x1j - x1j'| and |y1k - y1k'| and
 2. also the absolute difference between distances
|x2j - x2j'| and | y2k - y2k'|
are both less than 1.5Å (matching distance
threshold).


Matching between amino acids X and Y
represented by matrix M,
Mjk =
1 if jth amino acid corresponds to kth amino acid
0 otherwise

Transformations to bring the configurations into
alignment is given by
•
xij = Ayik + τ for Mjk = 1, i = 1, 2
A – Rotation Matrix, τ – Translation vector
 Joint
Posterior Distribution:
 p(A), p(τ)
and p(σ) denote prior distributions
for A, τ and σ
 |A| - Jacobian Transformation
 presence of Gaussian noise N(0, σ2) in in the
atomic positions for x1j and y1k
 Side
chains orientation:
Extending the model by taking into
account the relative orientation of Cα and
Cβ in matching amino acids
 Markov
Chain Monte Carlo (MCMC) –
used to sample the full joint distribution
function p(M, A, τ, σ, x1, y1, x2, y2)
 p(M, A, τ, σ, x1, y1, x2, y2)
– function of
RMSD and angle for orientation
difference between amino acids
 RMSD
– Root Mean Square Distribution
 Matches
of lower RMSD over larger
numbers of matching residues are more
statistically significant
 MCMC
Refinement improved the RMSD
(reduction) and the number of matching
residues ( increase)

Decision tree for refining the graph solution by the MCMC method. Boxes with curved corners
show processes and their output while boxes with sharp corners are for branching conditions.
The procedure starts with graph solution MG. The graph solution's RMSD and number of
matches are denoted by RMSDG and LG respectively. MCMC is re-iterated until the MCMC
solution: MB is better. The RMSD and number of matches for MB are denoted by RMSDB and LB
respectively. MB and MG are compared using 1) RMSDs and the number of matches or 2) Pvalues for MG and MG, denoted by PG and PB respectively.
 Two


Binding Sites:
Alcohol dehydrogenase structure
(60 amino acids)
17 – β hydroxysteroid dehydrogenase
( 63 amino acids)
4 Matching Studies were performed
 Each study was performed with and
without considering the physicochemical properties of amino-acids.

Case 1: Site 1hdx_1
matching against its own
SCOP family
 125/145 sites produced
significant matches –
increased to 131/145 (after
refinement)
 RMSD is improved
from > 1.5Å to less than
1Å
 Increase in the number
of matching residues
Case 2: 17 – β
hydroxysteroid
dehydrogenase and
family
After MCMC
Refinement step
significant matches
increased from 248 to
318 of 326 sites
Increased number of
matching residues at a
similar RMSD
 RMSD improvement
in minority of the sites




Case 3: alcohol
dehydrogenase and
superfamily
Matching sites
increased form 200 to
324
Case 4: Alcohol
dehydrogenase and
FAD/NAD(P)-binding
domain
12 sites improved after
MCMC refinement
 MCMC
refinement step provides significant
improvement over Graph Matching
Techniques
 Success
– Lack of dependence on strict
distance matching criteria
 Computationally
 Refinements
expensive
adapts to shape variations in
binding sites
Thank You!!!!