dimaio.icml03
Download
Report
Transcript dimaio.icml03
Using Pictorial Structures to Identify
Proteins in X-ray Crystallographic
Electron Density Maps
Frank DiMaio
Jude Shavlik
George N. Phillips, Jr.
[email protected]
[email protected]
[email protected]
ICML Bioinformatics Workshop
21 August 2003
Task Overview
Given
• Electron density for a
region in a protein
• Protein’s topology
Find
• Atomic positions of
individual atoms in the
density map
Pictorial Structures
A pictorial structure
is…
a collection of image
parts
together with…
a deformable
conformation of these
parts
Pictorial Structures
Formally, a model consists of
Set of parts V={v1, …, vn}
Configuration L=(l1, …, ln)
Edges eij E, connect neighboring parts vi, vj
v1
e13
Appearance parameters Ai for each part
Connection parameters Cij for each edge
e23
v3
– Explicit dependency between li, lj
– G=(V,E) forms a Markov Random Field
v2
e35
e34
v4
v5
e46
v6
Matching Algorithm Overview
Want configuration L of model Θ maximizing
P(L|I,Θ)
P(I|L,Θ) · P(L|Θ)
P(I|L,Θ) = Πi P(I|li,Θ) =
1
Z1
- Σi matchi(li)
e
P(L|Θ) = Π (v ,v )E P(li,lj|Cij) =
i
j
1
Z2
- Σ(vi,vj)E dij(li,lj)
e
Equivalent to minimizing
Σi matchi(li) + Σ(vi,vj)E dij(li,lj)
Linear-Time Matching Algorithm
A Dynamic Programming implementation runs in
quadratic time
Requires tree configuration of parts
Felzenszwalb & Huttenlocher (2000) developed
linear-time matching algorithm
Additional constraint on part-to-part cost function dij
Basic “Trick”: Parallelize minimization computation over
entire grid using a Generalized Distance Transform
Pictorial Structures for Map Interpretation
Basic Idea: Build pictorial structure that is able to
model all configurations of a molecule
Each part in “collection of parts” corresponds to an atom
Model has low-cost conformation
for low-energy states of the molecule
The Screw-Joint Model
Ideally, we would have
cost function = atomic energy
Problem: Impossible to represent atomic
energy function using pairwise potentials while
maintaining tree-structure
Solution: screw-joint model
Ignore non-bonded interactions
Edges correspond to covalent bonds
Allow free rotation around bonds
Screw-Joint Model Details
Each part’s configuration has six params (x,y,z,α,β,γ) with
(x,y,z) is part’s position
αi
α is part’s rotation (about bond connecting vi and vj)
(β,γ) is part’s orientation
(βi,γi)
(βj,γj)
vi
vj
vj
(xj,yj,zj)
αj
vi
(xi,yi,zi)
(xij,yij,zij)
Part-to-part cost function dij based on child’s deviation from ideal
Matching cost function matchi based on 3x3x3 template match
Pictorial Structures for Map Interpretation
Ideally, we would …
Build pictorial structure for the entire protein
Run the matching algorithm to get best layout
However, computationally infeasible
Instead, we use two-phase algorithm that …
a)
b)
computes best backbone trace
computes best sidechain conformation
(current focus)
Sidechain Refinement
Assume we have a rough Cα trace of the protein
Next use pictorial structure matching to place sidechains
Walk along chain one residue at a time, placing individual atoms
Cα, ALA_82
Cα, MET_80
Cα, ARG_81
Cα, PRO_83
Sidechain Refinement
Given:
residue type
approximate Cα locations
Find: most likely location for sidechain atoms in the residue
Example Alanine
N
C-1
O
N
Cα
Cα-1 O-1
C
Cβ
N
O
O
N+1
Cα+1
Matching
algorithm
Learning Model Parameters
O
N
N
N
O
Cα
Cβ
Averaged 3D Template
N
N
C-1
O
Alanine Cα
C
Cα
Cβ
C
N+1
Cα
r = 1.51
θ = 118.4°
φ = -19.7°
Canonic Orientation
r = 1.53
θ = 0.0°
φ = -19.3°
Cβ
C
Averaged Bond Geometry
Soft Maximums
Sometimes we may get
an optimal match like
the one to the right
When this occurs, explore
the space of non-optimal
solutions via soft
maximums in DP
Basic Idea: Take a path
with probability inversely
proportional to its cost
ACTUAL
PREDICTED 1
Soft Maximums
Figure to the right
shows soft maximums
Red molecule eventually
found
Annealing increases
“softness” until legal
structure found
PREDICTED 2
ACTUAL
Legal structure may not
be “right”
PREDICTED 1
Results
Only sidechain refinement implemented & tested
Experimental Methodology
Assume Cα’s known to within 2Å
Trained on 1.7 Å resolution protein,
tested on 1.9 Å resolution protein
Templates built for ALA, VAL, TYR, LYS
Model Parameters
Grid spacing of 0.5 Å within diameter 10 Å sphere
Rotational discretization:
12 rotational steps
84 orientations
Sidechain Placement
Compared predicted vs.
actual location for 599 atoms
on testset protein
1
29.9% atoms within 0.5Å
% atoms
0.8
0.6
0.4
72.3% atoms within 1.0Å
0.2
93.0% atoms within 2.0Å
0
0
2
4
6
Accuracy (angstroms)
Recall 0.5Å grid spacing
8
Predictive Accuracy Task
We used DP matching score as
a predictor of amino acid type
Tested 49 ALA, LYS, TYR, VAL
residues
Highest scoring normalized
template determined type
61.2% accuracy
(majority classification = 33%)
The Good…
PREDICTED vs. ACTUAL
LYSINE
LYSINE
TYROSINE
VALINE
… and the Bad
PREDICTED vs. ACTUAL
LYSINE
VALINE
TYROSINE
ALANINE
Future Work
Implement & integrate backbone tracing algorithm, to
create complete two-tiered solution
Better strategies to handle illegal molecule configurations
perturbation of branches involved in collisions
more accurate representation of atomic energy function,
e.g. torsion angle
Better match function … make use of previous work?
More tests (larger training set, higher resolution)
Acknowledgements
NLM grant 1T15 LM007359-01
NLM grant 1R01 LM07050-01
NIH grant P50 GM64598.