Tracing Protein Backbones in Electron Density Maps using a

Download Report

Transcript Tracing Protein Backbones in Electron Density Maps using a

A Probabilistic Approach to
Protein Backbone Tracing in
Electron Density Maps
Frank DiMaio, Jude Shavlik
Computer Sciences Department
George Phillips
Biochemistry Department
University of Wisconsin – Madison
USA
Presented at the Fourteenth Conference on Intelligent Systems for Molecular Biology (ISMB 2006),
Fortaleza, Brazil, August 7, 2006
X-ray Crystallography
X-ray beam
FFT
Protein
Crystal
Collection
Plate
Electron
Density Map
(“3D picture”)
Given: Sequence + Density Map
O
O
N
O
N
N
O
N
O
N
O
N
O
N
H
Sequence + Electron Density Map
Find: Each Atom’s Coordinates
O
O
N
O
N
N
O
N
O
N
O
N
O
N
H
Our Subtask: Backbone Trace
Cα
Cα
Cα
Cα
The Unit Cell
 3D density function ρ(x,y,z) provided over unit cell
 Unit cell may contain multiple copies of the protein
The Unit Cell
 3D density function ρ(x,y,z) provided over unit cell
 Unit cell may contain multiple copies of the protein
Density Map Resolution
2Å
4Å
3Å
ARP/wARP
TEXTAL
(Perrakis et al. 1997)
(Ioerger et al. 1999)
Resolve
(Terwilliger 2002)
Our focus
Overview of ACMI (our method)
 Local Match


Algorithm searches for sequence-specific
5-mers centered at each amino acid
Many false positives
 Global Consistency


Use probabilistic model to filter false positives
Find most probable backbone trace
5-mer Lookup and Cluster
…VKH V LVSPEKIEELIKGY…
PDB
Cluster 1
wt=0.67
Cluster 2
wt=0.33
NOTE: can be done in precompute step
5-mer Search
 6D search (rotation + translation) for
representative structures in density map
 Compute “similarity”
t ( x)    frag ( y) frag ( y)   map ( x  y) 
2
y
 Computed by Fourier convolution (Cowtan 2001)
 Use tuneset to convert similarity score to probability
Convert Scores to Probabilities
NEG
POS
match to tuneset
score
distributions
5-mer representative
Bayes’
rule
probability
distribution
over unit cell
P(5-mer at ui | Map)
search density map
scores ti (ui)
In This Talk…
 Where we are now
For each amino acid in the protein, we have a
probability distribution over the unit cell
P(ui | Map)
 Where we are headed
Find the backbone layout maximizing

  P(u | Map)   
P(conformation {ui , u j }) 


i
 AAs i

 AA-pairs i, j

Pairwise Markov Field Models
 A type of undirected graphical model
 Represent joint probabilities as
y
product of vertex and edge potentials
 Similar to (but more general than)
u1
u2
u3
Bayesian networks
p(U | y ) 

edges s t
ψst (us ,ut )

vertices s
 s (us | y )
Protein Backbone Model
ALA
GLY
LYS
LEU
 Each vertex is an amino acid
 Each label ui  xi , qi
is location + orientation
 Evidence y is the electron density map
 Each vertex (or observational) potential
comes from the 5-mer matching
 i (ui | y )
Protein Backbone Model
ALA
GLY
LYS
LEU
 Two types of edge (or structural) potentials

Adjacency constraints ensure adjacent amino acids
are ~3.8Å apart and in the proper orientation
Protein Backbone Model
ALA
GLY
LYS
LEU
 Two types of structural (edge) potentials


Adjacency constraints ensure adjacent amino acids
are ~3.8Å apart and in the proper orientation
Occupancy constraints ensure nonadjacent amino
acids do not occupy same 3D space
Backbone Model Potential
p(u | Map) 

ψadj (ui ,u j ) 
adjacent AAs
i j

ψocc (ui ,u j ) 
nonadjacent AAs
i j

 i (ui | Map)
amino acid i
Constraints between adjacent amino acids:
ψadj (ui ,u j )
=
px (|| xi  x j ||)
x
p (ui ,u j )
Backbone Model Potential
p(u | Map) 

adjacent AAs
i j
ψadj (ui ,u j ) 

ψocc (ui ,u j ) 
nonadjacent AAs
i j

 i (ui | Map)
amino acid i
Constraints between nonadjacent amino acids:
0 if || xi  x j || K
ψocc  ui ,u j   
1 otherwise
Backbone Model Potential
p(u | Map) 

ψadj (ui ,u j ) 
adjacent AAs
i j

nonadjacent AAs
i j
ψocc (ui ,u j ) 

 i (ui | Map)
amino acid i
Observational (“amino-acid-finder”) probabilities
ψi  ui | Map  
Pr(5mer si -2 ...si  2 at ui )
Probabilistic Inference
 Want to find backbone layout that maximizes

adjacent AAs
i j
ψadj (ui ,u j ) 

ψocc (ui ,u j ) 
nonadjacent AAs
i j

 i (ui | Map)
amino acid i
 Exact methods are intractable
 Use belief propagation (BP) to approximate
marginal distributions
pi (ui | Map)  
 p(U | Map)
uk , k  i
Belief Propagation (BP)
 Iterative, message-passing method (Pearl 1988)
n
 A message, mi  j , from amino acid i to
amino acid j indicates where i expects to find j
n
 An approximation to the marginal (or belief) bi ,
is given as the product of incoming messages
Belief Propagation Example
ALA
0
2
ALA
b
( x ALA | y )
GLY
12
mGLY
( xGLY
)
ALA
GLY
ALA
ALA
10
bGLY ( xGLY | y )
Technical Challenges
 Representation
of potentials
Store Fourier coefficients in Cartesian space
 At each location x, store a single orientation r

 Speeding
X
up O(N2X2) naïve implementation
= the unit cell size (# Fourier coefficients)
 N = the number of residues in the protein
Speeding Up O(N2X2) Implementation

O(X2) computation for each occupancy message



O(N2) messages computed & stored



Each message must integrate over the unit cell
O(X log X) as multiplication in Fourier space
Approx N-3 occupancy messages with a single message
O(N) messages using a message product accumulator
Improved implementation O(NX log X)
1XMT at 3Å Resolution
prob(AA at location)
HIGH
0.82
0.17
LOW
1.12Å RMSd
100% coverage
1VMO at 4Å Resolution
prob(AA at location)
HIGH
0.25
0.02
LOW
3.63Å RMSd
72% coverage
1YDH at 3.5Å Resolution
prob(AA at location)
HIGH
0.27
0.02
LOW
1.47Å RMSd
90% coverage
Experiments
 Tested ACMI against other map interpretation
algorithms: TEXTAL and Resolve
 Used ten model-phased maps
 Smoothly diminished reflection intensities
yielding 2.5, 3.0, 3.5, 4.0 Å resolution maps
RMS Deviation
Cα RMS Deviation
12
AC
MI
ACMI
ACMI
10
Te
xta l
Textal
8
RResolve
e s o lve
6
4
2
0
2 .0
2 .5
3 .0
3 .5
4 .0
Density Map Resolution
4 .5
Model Completeness
% residues identified
% chain traced
100%
80%
60%
ACMI
MI
ACACMI
40%
xta l
TeTextal
20%
s o lv1 1 e
R eResolve
0%
2 .0
2 .5
3 .0
3 .5
4 .0
4 .5
100%
80%
60%
40%
20%
0%
2 .0
2 .5
Density Map Resolution
3 .0
3 .5
4 .0
4 .5
Resolve RMS Error
TEXTAL RMS Error
Per-protein RMS Deviation
16
14
12
10
8
2 .5 A
3 .0 A
6
4
3 .5 A
4 .0 A
2
0
0
2
4
6
8
10
12
14
16
16
14
12
10
8
6
4
2
0
0
ACMI RMS Error
2
4
6
8
10
12
14
16
Conclusions
 ACMI effectively combines weakly-matching
templates to construct a full model
 Produces an accurate trace even with
poor-quality density map data
 Reduces computational complexity from
O(N2 X2) to O(N X log X)
 Inference possible for even large unit cells
Future Work
 Improve “amino-acid-finding” algorithm
 Incorporate sidechain placement /
refinement
 Manage missing data
Disordered regions
 Only exterior visible (e.g., in CryoEM)

Acknowledgements
 Ameet Soni
 Craig Bingman
 NLM grants 1R01 LM008796 and
1T15 LM007359