Tracing Protein Backbones in Electron Density Maps using a
Download
Report
Transcript Tracing Protein Backbones in Electron Density Maps using a
A Probabilistic Approach to
Protein Backbone Tracing in
Electron Density Maps
Frank DiMaio, Jude Shavlik
Computer Sciences Department
George Phillips
Biochemistry Department
University of Wisconsin – Madison
USA
Presented at the Fourteenth Conference on Intelligent Systems for Molecular Biology (ISMB 2006),
Fortaleza, Brazil, August 7, 2006
X-ray Crystallography
X-ray beam
FFT
Protein
Crystal
Collection
Plate
Electron
Density Map
(“3D picture”)
Given: Sequence + Density Map
O
O
N
O
N
N
O
N
O
N
O
N
O
N
H
Sequence + Electron Density Map
Find: Each Atom’s Coordinates
O
O
N
O
N
N
O
N
O
N
O
N
O
N
H
Our Subtask: Backbone Trace
Cα
Cα
Cα
Cα
The Unit Cell
3D density function ρ(x,y,z) provided over unit cell
Unit cell may contain multiple copies of the protein
The Unit Cell
3D density function ρ(x,y,z) provided over unit cell
Unit cell may contain multiple copies of the protein
Density Map Resolution
2Å
4Å
3Å
ARP/wARP
TEXTAL
(Perrakis et al. 1997)
(Ioerger et al. 1999)
Resolve
(Terwilliger 2002)
Our focus
Overview of ACMI (our method)
Local Match
Algorithm searches for sequence-specific
5-mers centered at each amino acid
Many false positives
Global Consistency
Use probabilistic model to filter false positives
Find most probable backbone trace
5-mer Lookup and Cluster
…VKH V LVSPEKIEELIKGY…
PDB
Cluster 1
wt=0.67
Cluster 2
wt=0.33
NOTE: can be done in precompute step
5-mer Search
6D search (rotation + translation) for
representative structures in density map
Compute “similarity”
t ( x) frag ( y) frag ( y) map ( x y)
2
y
Computed by Fourier convolution (Cowtan 2001)
Use tuneset to convert similarity score to probability
Convert Scores to Probabilities
NEG
POS
match to tuneset
score
distributions
5-mer representative
Bayes’
rule
probability
distribution
over unit cell
P(5-mer at ui | Map)
search density map
scores ti (ui)
In This Talk…
Where we are now
For each amino acid in the protein, we have a
probability distribution over the unit cell
P(ui | Map)
Where we are headed
Find the backbone layout maximizing
P(u | Map)
P(conformation {ui , u j })
i
AAs i
AA-pairs i, j
Pairwise Markov Field Models
A type of undirected graphical model
Represent joint probabilities as
y
product of vertex and edge potentials
Similar to (but more general than)
u1
u2
u3
Bayesian networks
p(U | y )
edges s t
ψst (us ,ut )
vertices s
s (us | y )
Protein Backbone Model
ALA
GLY
LYS
LEU
Each vertex is an amino acid
Each label ui xi , qi
is location + orientation
Evidence y is the electron density map
Each vertex (or observational) potential
comes from the 5-mer matching
i (ui | y )
Protein Backbone Model
ALA
GLY
LYS
LEU
Two types of edge (or structural) potentials
Adjacency constraints ensure adjacent amino acids
are ~3.8Å apart and in the proper orientation
Protein Backbone Model
ALA
GLY
LYS
LEU
Two types of structural (edge) potentials
Adjacency constraints ensure adjacent amino acids
are ~3.8Å apart and in the proper orientation
Occupancy constraints ensure nonadjacent amino
acids do not occupy same 3D space
Backbone Model Potential
p(u | Map)
ψadj (ui ,u j )
adjacent AAs
i j
ψocc (ui ,u j )
nonadjacent AAs
i j
i (ui | Map)
amino acid i
Constraints between adjacent amino acids:
ψadj (ui ,u j )
=
px (|| xi x j ||)
x
p (ui ,u j )
Backbone Model Potential
p(u | Map)
adjacent AAs
i j
ψadj (ui ,u j )
ψocc (ui ,u j )
nonadjacent AAs
i j
i (ui | Map)
amino acid i
Constraints between nonadjacent amino acids:
0 if || xi x j || K
ψocc ui ,u j
1 otherwise
Backbone Model Potential
p(u | Map)
ψadj (ui ,u j )
adjacent AAs
i j
nonadjacent AAs
i j
ψocc (ui ,u j )
i (ui | Map)
amino acid i
Observational (“amino-acid-finder”) probabilities
ψi ui | Map
Pr(5mer si -2 ...si 2 at ui )
Probabilistic Inference
Want to find backbone layout that maximizes
adjacent AAs
i j
ψadj (ui ,u j )
ψocc (ui ,u j )
nonadjacent AAs
i j
i (ui | Map)
amino acid i
Exact methods are intractable
Use belief propagation (BP) to approximate
marginal distributions
pi (ui | Map)
p(U | Map)
uk , k i
Belief Propagation (BP)
Iterative, message-passing method (Pearl 1988)
n
A message, mi j , from amino acid i to
amino acid j indicates where i expects to find j
n
An approximation to the marginal (or belief) bi ,
is given as the product of incoming messages
Belief Propagation Example
ALA
0
2
ALA
b
( x ALA | y )
GLY
12
mGLY
( xGLY
)
ALA
GLY
ALA
ALA
10
bGLY ( xGLY | y )
Technical Challenges
Representation
of potentials
Store Fourier coefficients in Cartesian space
At each location x, store a single orientation r
Speeding
X
up O(N2X2) naïve implementation
= the unit cell size (# Fourier coefficients)
N = the number of residues in the protein
Speeding Up O(N2X2) Implementation
O(X2) computation for each occupancy message
O(N2) messages computed & stored
Each message must integrate over the unit cell
O(X log X) as multiplication in Fourier space
Approx N-3 occupancy messages with a single message
O(N) messages using a message product accumulator
Improved implementation O(NX log X)
1XMT at 3Å Resolution
prob(AA at location)
HIGH
0.82
0.17
LOW
1.12Å RMSd
100% coverage
1VMO at 4Å Resolution
prob(AA at location)
HIGH
0.25
0.02
LOW
3.63Å RMSd
72% coverage
1YDH at 3.5Å Resolution
prob(AA at location)
HIGH
0.27
0.02
LOW
1.47Å RMSd
90% coverage
Experiments
Tested ACMI against other map interpretation
algorithms: TEXTAL and Resolve
Used ten model-phased maps
Smoothly diminished reflection intensities
yielding 2.5, 3.0, 3.5, 4.0 Å resolution maps
RMS Deviation
Cα RMS Deviation
12
AC
MI
ACMI
ACMI
10
Te
xta l
Textal
8
RResolve
e s o lve
6
4
2
0
2 .0
2 .5
3 .0
3 .5
4 .0
Density Map Resolution
4 .5
Model Completeness
% residues identified
% chain traced
100%
80%
60%
ACMI
MI
ACACMI
40%
xta l
TeTextal
20%
s o lv1 1 e
R eResolve
0%
2 .0
2 .5
3 .0
3 .5
4 .0
4 .5
100%
80%
60%
40%
20%
0%
2 .0
2 .5
Density Map Resolution
3 .0
3 .5
4 .0
4 .5
Resolve RMS Error
TEXTAL RMS Error
Per-protein RMS Deviation
16
14
12
10
8
2 .5 A
3 .0 A
6
4
3 .5 A
4 .0 A
2
0
0
2
4
6
8
10
12
14
16
16
14
12
10
8
6
4
2
0
0
ACMI RMS Error
2
4
6
8
10
12
14
16
Conclusions
ACMI effectively combines weakly-matching
templates to construct a full model
Produces an accurate trace even with
poor-quality density map data
Reduces computational complexity from
O(N2 X2) to O(N X log X)
Inference possible for even large unit cells
Future Work
Improve “amino-acid-finding” algorithm
Incorporate sidechain placement /
refinement
Manage missing data
Disordered regions
Only exterior visible (e.g., in CryoEM)
Acknowledgements
Ameet Soni
Craig Bingman
NLM grants 1R01 LM008796 and
1T15 LM007359