Transcript Document

Probabilistic Methods
for Interpreting
Electron-Density Maps
Frank DiMaio
University of Wisconsin – Madison
Computer Sciences Department
[email protected]
3D Protein Structure
backbone
backbone
sidechain
sidechain
C-alpha
3D Protein Structure
…
…
ALA
LEU
PRO
?
?
VAL
ARG
?
High-Throughput Structure
Determination
 Protein-structure determination important



Understanding function of a protein
Understanding mechanisms
Targets for drug design
 Some proteins produce poor density maps
 Interpreting poor electron-density maps is very
(human) laborious
 I aim to automatically interpret
poor-quality electron-density maps
…
…
Electron-Density Map
Interpretation
GIVEN: 3D electron-density map,
(linear) amino-acid sequence
Electron-Density Map
Interpretation
…
…
FIND:
All-atom Protein Model
Density Map Resolution
1.0Å
2.0Å
Morris et al. (2003)
3.0Å
Ioerger et al. (2002)
Terwilliger (2003)
My focus
4.0Å
Thesis Contributions
 A probabilistic approach to protein-backbone tracing
DiMaio et al., Intelligent Systems for Molecular Biology (2006)
 Improved template matching in electron-density maps
DiMaio et al., IEEE Conference on Bioinformatics and Biomedicine (2007)
 Creating all-atom protein models using particle filtering
DiMaio et al. (under review)
 Pictorial structures for atom-level molecular modeling
DiMaio et al., Advances in Neural Information Processing Systems (2004)
 Improving the efficiency of belief propagation
DiMaio and Shavlik, IEEE International Conference on Data Mining (2006)
 Iterative phase improvement in ACMI
ACMI Overview
 Phase 1: Local pentapeptide search
(ISMB 2006, BIBM 2007)

Independent amino-acid search

Templates model 5-mer conformational space
 Phase 2: Coarse backbone model
(ISMB 2006, ICDM 2006)

Protein structural constraints refine local search

Markov field (MRF) models pairwise constraints
 Phase 3: Sample all-atom models


Particle filtering samples high-prob. structures
Probs. from MRF guide particle trajectories
ACMI Overview
 Phase 1: Local pentapeptide search
(ISMB 2006, BIBM 2007)

Independent amino-acid search

Templates model 5-mer conformational space
Phase 2: Coarse backbone model
(ISMB 2006, ICDM 2006)

Protein structural constraints refine local search

Markov field (MRF) models pairwise constraints
Phase 3: Sample all-atom models


Particle filtering samples high-prob. structures
Probs. from MRF guide particle trajectories
5-mer Lookup
…SAW C VKFEKPADKNGKTE…
Protein
DB
 ACMI searches map for each template independently
 Spherical-harmonic decomposition allows rapid search
of all template rotations
Spherical-Harmonic
Decomposition
f (θ,φ)
5-mer Fast Rotation Search
electron density map
pentapeptide fragment
from PDB (the “template”)
sampled region of
density in 5A sphere
calculated (expected)
density in 5A sphere
map-region
sampled in
spherical shells
template-density
sampled in
spherical shells
5-mer Fast Rotation Search
map-region sphericalharmonic coefficients
map-region
sampled in
spherical shells
fast-rotation
function
(Navaza 2006,
Risbo 1996)
template sphericalharmonic coefficients
correlation
coefficient
as function
of rotation
template-density
sampled in
spherical shells
Convert Scores to
Probabilities
scan density map
for fragment
Bayes’
rule
correlation coefficients
over density map ti (ui)
probability
distribution
over density map
P(5-mer at ui | EDM)
ACMI Overview
 Phase 1: Local pentapeptide search
(ISMB 2006, BIBM 2007)

Independent amino-acid search

Templates model 5-mer conformational space
 Phase 2: Coarse backbone model
(ISMB 2006, ICDM 2006)

Protein structural constraints refine local search

Markov field (MRF) models pairwise constraints
 Phase 3: Sample all-atom models


Particle filtering samples high-prob. structures
Probs. from MRF guide particle trajectories
Probabilistic Backbone Model
 Trace assigns a position and orientation
ui={xi, qi} to each amino acid i
 The probability of a trace U = {ui} is
P(U | EDM)  P(u1 
 uN | EDM)
 This full joint probability intractable to compute
 Approximate using pairwise Markov field
Pairwise Markov-Field Model
ALA
GLY
LYS
LEU
SER
 Joint probabilities defined on a graph as
product of vertex and edge potentials
P(U | EDM) 
   (u | EDM)      (u , u ) 
 AAs i i i
  AAs i , j ij i j 
ACMI’s Backbone Model
ALA
GLY
LYS
LEU
SER
Observational potentials tie the map to the model
ACMI’s Backbone Model
ALA
GLY
LYS
LEU
SER
 Adjacency constraints ensure adjacent amino
acids are ~3.8Å apart and in proper orientation
 Occupancy constraints ensure nonadjacent
amino acids do not occupy same 3D space
Backbone Model Potential
p(U | EDM) 
  adj (ui , u j )    occ (ui , u j )    i (ui | EDM)
AAs i , j
|i  j | 1
AAs i , j
|i  j | 1
AAs i
Backbone Model Potential
p(U | EDM) 
  adj (ui , u j )    occ (ui , u j )    i (ui | EDM)
AAs i , j
|i  j | 1
AAs i , j
|i  j | 1
AAs i
Constraints between adjacent amino acids
 adj (ui , u j )
=
px ( || xi  x j || )
×
p (ui , u j )
Backbone Model Potential
p(U | EDM) 
  adj (ui , u j )    occ (ui , u j )    i (ui | EDM)
AAs i , j
|i  j | 1
AAs i , j
|i  j | 1
AAs i
Constraints between all other amino acid pairs
0 if ||xi  x j||  K
 occ (ui , u j )  
1 otherwise
Backbone Model Potential
p(U | EDM) 
  adj (ui , u j )    occ (ui , u j )    i (ui | EDM)
AAs i , j
|i  j | 1
AAs i , j
|i  j | 1
AAs i
Observational (“template-matching”) probabilities
 i (ui | EDM) 
Pr(5mer si 2 ...si  2 at ui )
Inferring Backbone Locations
 Want to find backbone layout that maximizes
  adj (ui , u j )    occ (ui , u j )    i (ui | EDM)
AAs i , j
|i  j | 1
AAs i , j
|i  j | 1
AAs i
Inferring Backbone Locations
 Want to find backbone layout that maximizes
  adj (ui , u j )    occ (ui , u j )    i (ui | EDM)
AAs i , j
|i  j | 1
AAs i , j
|i  j | 1
AAs i
 Exact methods are intractable
 Use belief propagation (Pearl 1988)
to approximate marginal distributions
pi (ui | EDM)  
 p(U | EDM)
uk , k  i
Belief Propagation Example
LYS31
LEU32
mLYS31→LEU32
pˆ LYS31
pˆ LEU32
Belief Propagation Example
LYS31
LEU32
mLEU32→LYS31
pˆ LYS31
pˆ LEU32
Scaling BP to Proteins
(DiMaio and Shavlik, ICDM 2006)
 Naïve implementation O(N2G2)
 N = the number of amino acids in the protein
 G = # of points in discretized density map
 O(G2) computation for each message passed
 O(G log G) as Fourier-space multiplication
 O(N2) messages computed & stored
 Approx (N-3) occupancy msgs with 1 message
 O(N) messages using a message accumulator
 Improved implementation O(NG log G)
Scaling BP to Proteins
(DiMaio and Shavlik, ICDM 2006)
 Naïve implementation O(N2G2)
 N = the number of amino acids in the protein
 G = # of points in discretized density map
 O(G2) computation for each message passed
 O(G log G) as Fourier-space multiplication
 O(N2) messages computed & stored
 Approx (N-3) occupancy msgs with 1 message
 O(N) messages using a message accumulator
 Improved implementation O(NG log G)
Occupancy Message
Approximation
 To pass a message
n 1
ˆ
p
(ui )
n
i
mi  j (u j )    occ (ui , u j ) n1
dui
m j i (ui )
ui
occupancy
edge potential
product of incoming msgs to i
except from j
Occupancy Message
Approximation
 To pass a message
n 1
ˆ
p
(ui )
n
i
mi  j (u j )    occ (ui , u j ) n1
dui
m j i (ui )
ui
 “Weak” potentials between nonadjacent amino
acids lets us approximate
min (u )   occ (ui , u ) pˆ in 1 (ui ) dui
ui
occupancy
edge potential
product of all
incoming msgs to i
Occupancy Message
Approximation
pˆ 3
x  occ  m53
3
pˆ 3
x  occ  m13
3
1
2
3
4
pˆ 3
x  occ  m63
3
5
6
Occupancy Message
Approximation

occ

 pˆ 3
occ
 pˆ 3
x3
x3
1

occ
 pˆ 3
x3
2
3
4
5
6
Occupancy Message
Approximation
ACC
1
2
4
3
5
6
Send outgoing occupancy message product to
a central accumulator
ACC ( x )   mi  ( x )
AAs i
Occupancy Message
Approximation
ACC
1
2
3
4
5
6
Then, each node’s incoming message product
is computed in constant time
m23  m43
pˆ 3   3  ACC 
m2  m3  m4
BP Output
 After some number of iterations, BP gives
probability distributions over Cα locations
…
ALA
LEU
…
PRO
…
pLEU  xLEU 
VAL
ARG
…
…
pVAL  xVAL 
ACMI’s Backbone Trace
 Independently choose Cα locations that
maximize approximate marginal distribution
b  arg max pˆ i ( xi )
*
i
xi
…
Example: 1XRI
prob(AA at location)
3.3Å resolution density map
39° mean phase error
HIGH
0.9
0.1
LOW
0.9009Å RMSd
93% complete
Density-map mean
phase error (deg.)
Testset Density Maps (raw data)
75
60
45
30
15
1.0
2.0
3.0
4.0
Density-map resolution (Å)
% Cα’s located within 2Å of
some Cα / correct Cα
Experimental Accuracy
% backbone correctly placed
% amino acids correctly identified
100
80
60
40
20
0
ACMI
ARP/ Resolve Textal
wARP
ACMI % Cα’s located
Experimental Accuracy on a
Per-Protein Basis
100
100
100
80
80
80
60
60
60
40
40
40
20
20
20
0
0
0
0
20
40
60
80 100
ARP/wARP
% Cα’s located
0
20
40
60
80 100
Resolve
% Cα’s located
0
20
40
60
80 100
Textal
% Cα’s located
ACMI Overview
Phase 1: Local pentapeptide search
(ISMB 2006, BIBM 2007)

Independent amino-acid search

Templates model 5-mer conformational space
Phase 2: Coarse backbone model
(ISMB 2006, ICDM 2006)

Protein structural constraints refine local search

Markov field (MRF) models pairwise constraints
 Phase 3: Sample all-atom models


Particle filtering samples high-prob. structures
Probs. from MRF guide particle trajectories
Problems with ACMI
 Biologists want location of all atoms
 All Cα’s lie on a discrete grid
 Maximum-marginal backbone model may be
physically unrealistic
 Ignoring a lot of information
 Multiple models may better represent
conformational variation within crystal
Probability=0.4
Probability=0.35
Probability=0.25
Maximummarginal structure
ACMI with Particle Filtering
(ACMI-PF)
Idea: Represent protein using a set of static 3D all-atom
protein models
Particle Filtering Overview
(Doucet et al. 2000)
 Given some Markov process x1:KX
with observations y1:K Y
 Particle Filtering approximates some posterior
probability distribution over X using a set of N
weighted point estimates
N

p  x1:K | y1:K    wt  x1:K  x
i 1
(i )
K
(i )
1:K

Particle Filtering Overview
 Markov process gives recursive formulation
p  x1:k | y1:k   p  yk | xk   p  xk | xk 1   p  x1:k 1 | y1:k 1 
 Use importance fn. q(x k |x 0:k-1 ,y k) to grow particles
 Recursive weight update,
wt
(i )
k
 wt
(i )
k 1


 
p yk | xk(i )  p xk(i ) | xk(i)1

q xk(i ) | xk(i)1 , yk


Particle Filtering for
Protein Structures
 Particle refers to one specific 3D layout of
some subsequence of the protein
 At each iteration advance particle’s trajectory
by placing an additional amino-acid’s atoms
Particle Filtering for
Protein Structures
 Alternate extending chain left and right
Particle Filtering for
Protein Structures
bk-1
bk
bk+1
sk
 Alternate extending chain left and right
 An iteration alternately places


Cα position bk+1 given bk
All sidechain atoms sk given bk-1:k+1
Particle Filtering for
Protein Structures
bk-1
bk
bk+1
sk
 Key idea: Use the conditional distribution
p(bk|bik-1,Map) to advance particle trajectories
 Construct this conditional distribution from
BP’s marginal distributions
Particle Filtering for
Protein Structures
sk
…
bk-1
bk
bk+1
Algorithm
place “seeds” bki for each particle i=1…N
while amino-acids remain
place bki+1 / bji-1 given bj:ki for each i=1…N
place ski given bki-1:k+1 for each i=1…N
optionally resample N particles
end while
…
Backbone Step (for particle i )
place bki+1 given bki for each i=1…N
bk
b1…L
k+1
b k-1
(1) Sample L bk+1’s from bk-1–bk–bk+1
pseudoangle distribution
Backbone Step (for particle i )
place bki+1 given bki for each i=1…N
bk
b1…L
k+1
pk+1(b k1+1 )
pk+1(b k2+1 )
b k-1
pk+1(b kL+1 )
(2) Weight each sample by its
ACMI-computed approximate marginal
Backbone Step (for particle i )
place bki+1 given bki for each i=1…N
bk
b1…L
k+1
1
pk+1(b k+1
)
2
pk+1(b k+1
)
b k-1
L
pk+1(b k+1
)
(3) Select bk+1 with probability
proportional to sample weight
Backbone Step (for particle i )
place bki+1 given bki for each i=1…N
bk
b k+1
L
b k-1
 
wtk 1   pk 1 bk 1  wtk
1
(4) Update particle weight as sum
of sample weights
Sidechain Step (for particle i )
place ski given bki-1:k+1 for each i=1…N
Protein
Data
Bank
(1) Sample sk from a database of
sidechain conformations
Sidechain Step (for particle i )
place ski given bki-1:k+1 for each i=1…N
pk(EDM | s k2)
pk(EDM
| s k1)
pk(EDM | sk3 )
(2) For each sidechain conformation, compute
probability of density map given the sidechain
Sidechain Step (for particle i )
place ski given bki-1:k+1 for each i=1…N
pk(EDM | s k2)
pk(EDM
| s k1)
pk(EDM | sk3 )
(3) Select sidechain conformation
from this weighted distribution
Sidechain Step (for particle i )
place ski given bki-1:k+1 for each i=1…N
M


wtk 1   p EDM | skm  wtk
m 1
(4) Update particle weight as sum of
sample weights
Particle Resampling
wt = 0.4
wt = 0.2
wt = 0.3
wt = 0.2
wt = 0.1
wt = 0.2
wt = 0.1
wt = 0.2
wt = 0.1
wt = 0.2
Amino-Acid Sampling Order
j
k
 Begin at some amino acid k with probability
P(k )  exp  entropy  pˆ k (bk ) 
 At each step, move left to right with probability

P( j  1)  exp entropy  pˆ j 1 (b j 1 ) 

P(k  1)  exp  entropy  pˆ k 1 (bk 1 )
Experimental Methodology
 Run ACMI-PF 10 times with 100 particles each
 Return highest-weight particle from each run
 Each run samples amino-acids in a different order
 Refine each structure for 10 iterations in Refmac5
 Compare 10-structure model to others using Rfree

R
Fobs  Fcalc
F
obs
Refined Rfree
ACMI-PF Versus ACMI-Naïve
0.5
0.4
0.3
Acmi-PF
Acmi-Naive
0.2
1
2
3
4
5
6
7
8
9 10
Number of ACMI-PF runs
Additionally, ACMI-PF’s models have …


Fewer gaps (10 vs. 28)
Lower sidechain RMS error (2.1Å vs. 2.3Å)
ACMI-PF Rfree
ACMI-PF Versus Others
0.65
0.65
0.65
0.55
0.55
0.55
0.45
0.45
0.45
0.35
0.35
0.35
0.25
0.25
0.25
0.25
0.35
0.45
0.55
0.65
ARP/wARP Rfree
0.25
0.35
0.45
0.55
Resolve Rfree
0.65
0.25
0.35
0.45
0.55
Textal Rfree
0.65
ACMI-PF Example: 2A3Q
2.3Å resolution
66° phase err.
1.79Å RMSd
92% complete
ACMI Overview
 Phase 1: Local pentapeptide search
(ISMB 2006, BIBM 2007)

Independent
amino-acid
searchphase
 Phase
4: Iterative

Templates
model 5-mer conformational space
improvement
Use particle-filtering models to
 Phase 2: Coarse
backbone
model
improve
density-map
quality
(ISMB 2006, ICDM 2006)
 Rerun entire pipeline on
 Protein structural constraints refine local search
improved density map
 Markov field (MRF) models pairwise constraints
 Repeat until convergence

 Phase 3: Sample all-atom models


Particle filtering samples high-prob. structures
Probs. from MRF guide particle trajectories
Phase Problem
Intensities
Measured by X-ray
crystallography
f I,Φ

Phases
Experimentally
estimated (e.g. MAD, MIR)
Density-Map Phasing
0°
30°
60°
75°
mean phase error
Iterative Phase Improvement
Revised
Initial
density map
Predicted
3D model
I obs
Icalc
Φexp
calc
Φcalc
(deg. mean phase error)
Error in ACMI-PF’s phases
ACMI-PF’s Phase Improvement
75
60
45
30
15
0
0
15
30
45
60
75
Error in initial phases
(deg. mean phase error)
% backbone located
Iteration 2
Two-Iteration ACMI
100
90
80
70
60
50
50
60
70
80
90
100
% backbone located
Iteration 1
Average
mean phase error
60
50
40
?
30
20
10
0
0
1
2
3
4
Number of ACMI iterations
Average
% uninterpreted AAs
Future Work:
Many-iteration ACMI
20
15
?
10
5
0
1
2
3
4
5
Number of ACMI iterations
Conclusions
 ACMI’s three steps construct a set of all-atom
protein models from a density map
 Novel message approximation allows inference
on large, highly-connected models
 Resulting protein models are more accurate
than other methods
Ongoing and Future Work
 Incorporate additional structural biology
background knowledge
 Incorporate more complex potential functions
 Further work on iterative phase improvement
 Generalize my algorithms to other
3D image data
Acknowledgements
 Advisor Jude Shavlik
 Committee
George Phillips
 Charles Dyer
 David Page
 Mark Craven
 Collaborators
 Ameet Soni
 Dmitry Kondrashov
 Eduard Bitto
 Craig Bingman
 6th floor MSCers

 Center for Eukaryotic
Structural Genomics
 Funding



UW-Madison Graduate
School
NLM 1T15 LM007359
NLM 1R01 LM008796