Transcript Document
Probabilistic Methods
for Interpreting
Electron-Density Maps
Frank DiMaio
University of Wisconsin – Madison
Computer Sciences Department
[email protected]
3D Protein Structure
backbone
backbone
sidechain
sidechain
C-alpha
3D Protein Structure
…
…
ALA
LEU
PRO
?
?
VAL
ARG
?
High-Throughput Structure
Determination
Protein-structure determination important
Understanding function of a protein
Understanding mechanisms
Targets for drug design
Some proteins produce poor density maps
Interpreting poor electron-density maps is very
(human) laborious
I aim to automatically interpret
poor-quality electron-density maps
…
…
Electron-Density Map
Interpretation
GIVEN: 3D electron-density map,
(linear) amino-acid sequence
Electron-Density Map
Interpretation
…
…
FIND:
All-atom Protein Model
Density Map Resolution
1.0Å
2.0Å
Morris et al. (2003)
3.0Å
Ioerger et al. (2002)
Terwilliger (2003)
My focus
4.0Å
Thesis Contributions
A probabilistic approach to protein-backbone tracing
DiMaio et al., Intelligent Systems for Molecular Biology (2006)
Improved template matching in electron-density maps
DiMaio et al., IEEE Conference on Bioinformatics and Biomedicine (2007)
Creating all-atom protein models using particle filtering
DiMaio et al. (under review)
Pictorial structures for atom-level molecular modeling
DiMaio et al., Advances in Neural Information Processing Systems (2004)
Improving the efficiency of belief propagation
DiMaio and Shavlik, IEEE International Conference on Data Mining (2006)
Iterative phase improvement in ACMI
ACMI Overview
Phase 1: Local pentapeptide search
(ISMB 2006, BIBM 2007)
Independent amino-acid search
Templates model 5-mer conformational space
Phase 2: Coarse backbone model
(ISMB 2006, ICDM 2006)
Protein structural constraints refine local search
Markov field (MRF) models pairwise constraints
Phase 3: Sample all-atom models
Particle filtering samples high-prob. structures
Probs. from MRF guide particle trajectories
ACMI Overview
Phase 1: Local pentapeptide search
(ISMB 2006, BIBM 2007)
Independent amino-acid search
Templates model 5-mer conformational space
Phase 2: Coarse backbone model
(ISMB 2006, ICDM 2006)
Protein structural constraints refine local search
Markov field (MRF) models pairwise constraints
Phase 3: Sample all-atom models
Particle filtering samples high-prob. structures
Probs. from MRF guide particle trajectories
5-mer Lookup
…SAW C VKFEKPADKNGKTE…
Protein
DB
ACMI searches map for each template independently
Spherical-harmonic decomposition allows rapid search
of all template rotations
Spherical-Harmonic
Decomposition
f (θ,φ)
5-mer Fast Rotation Search
electron density map
pentapeptide fragment
from PDB (the “template”)
sampled region of
density in 5A sphere
calculated (expected)
density in 5A sphere
map-region
sampled in
spherical shells
template-density
sampled in
spherical shells
5-mer Fast Rotation Search
map-region sphericalharmonic coefficients
map-region
sampled in
spherical shells
fast-rotation
function
(Navaza 2006,
Risbo 1996)
template sphericalharmonic coefficients
correlation
coefficient
as function
of rotation
template-density
sampled in
spherical shells
Convert Scores to
Probabilities
scan density map
for fragment
Bayes’
rule
correlation coefficients
over density map ti (ui)
probability
distribution
over density map
P(5-mer at ui | EDM)
ACMI Overview
Phase 1: Local pentapeptide search
(ISMB 2006, BIBM 2007)
Independent amino-acid search
Templates model 5-mer conformational space
Phase 2: Coarse backbone model
(ISMB 2006, ICDM 2006)
Protein structural constraints refine local search
Markov field (MRF) models pairwise constraints
Phase 3: Sample all-atom models
Particle filtering samples high-prob. structures
Probs. from MRF guide particle trajectories
Probabilistic Backbone Model
Trace assigns a position and orientation
ui={xi, qi} to each amino acid i
The probability of a trace U = {ui} is
P(U | EDM) P(u1
uN | EDM)
This full joint probability intractable to compute
Approximate using pairwise Markov field
Pairwise Markov-Field Model
ALA
GLY
LYS
LEU
SER
Joint probabilities defined on a graph as
product of vertex and edge potentials
P(U | EDM)
(u | EDM) (u , u )
AAs i i i
AAs i , j ij i j
ACMI’s Backbone Model
ALA
GLY
LYS
LEU
SER
Observational potentials tie the map to the model
ACMI’s Backbone Model
ALA
GLY
LYS
LEU
SER
Adjacency constraints ensure adjacent amino
acids are ~3.8Å apart and in proper orientation
Occupancy constraints ensure nonadjacent
amino acids do not occupy same 3D space
Backbone Model Potential
p(U | EDM)
adj (ui , u j ) occ (ui , u j ) i (ui | EDM)
AAs i , j
|i j | 1
AAs i , j
|i j | 1
AAs i
Backbone Model Potential
p(U | EDM)
adj (ui , u j ) occ (ui , u j ) i (ui | EDM)
AAs i , j
|i j | 1
AAs i , j
|i j | 1
AAs i
Constraints between adjacent amino acids
adj (ui , u j )
=
px ( || xi x j || )
×
p (ui , u j )
Backbone Model Potential
p(U | EDM)
adj (ui , u j ) occ (ui , u j ) i (ui | EDM)
AAs i , j
|i j | 1
AAs i , j
|i j | 1
AAs i
Constraints between all other amino acid pairs
0 if ||xi x j|| K
occ (ui , u j )
1 otherwise
Backbone Model Potential
p(U | EDM)
adj (ui , u j ) occ (ui , u j ) i (ui | EDM)
AAs i , j
|i j | 1
AAs i , j
|i j | 1
AAs i
Observational (“template-matching”) probabilities
i (ui | EDM)
Pr(5mer si 2 ...si 2 at ui )
Inferring Backbone Locations
Want to find backbone layout that maximizes
adj (ui , u j ) occ (ui , u j ) i (ui | EDM)
AAs i , j
|i j | 1
AAs i , j
|i j | 1
AAs i
Inferring Backbone Locations
Want to find backbone layout that maximizes
adj (ui , u j ) occ (ui , u j ) i (ui | EDM)
AAs i , j
|i j | 1
AAs i , j
|i j | 1
AAs i
Exact methods are intractable
Use belief propagation (Pearl 1988)
to approximate marginal distributions
pi (ui | EDM)
p(U | EDM)
uk , k i
Belief Propagation Example
LYS31
LEU32
mLYS31→LEU32
pˆ LYS31
pˆ LEU32
Belief Propagation Example
LYS31
LEU32
mLEU32→LYS31
pˆ LYS31
pˆ LEU32
Scaling BP to Proteins
(DiMaio and Shavlik, ICDM 2006)
Naïve implementation O(N2G2)
N = the number of amino acids in the protein
G = # of points in discretized density map
O(G2) computation for each message passed
O(G log G) as Fourier-space multiplication
O(N2) messages computed & stored
Approx (N-3) occupancy msgs with 1 message
O(N) messages using a message accumulator
Improved implementation O(NG log G)
Scaling BP to Proteins
(DiMaio and Shavlik, ICDM 2006)
Naïve implementation O(N2G2)
N = the number of amino acids in the protein
G = # of points in discretized density map
O(G2) computation for each message passed
O(G log G) as Fourier-space multiplication
O(N2) messages computed & stored
Approx (N-3) occupancy msgs with 1 message
O(N) messages using a message accumulator
Improved implementation O(NG log G)
Occupancy Message
Approximation
To pass a message
n 1
ˆ
p
(ui )
n
i
mi j (u j ) occ (ui , u j ) n1
dui
m j i (ui )
ui
occupancy
edge potential
product of incoming msgs to i
except from j
Occupancy Message
Approximation
To pass a message
n 1
ˆ
p
(ui )
n
i
mi j (u j ) occ (ui , u j ) n1
dui
m j i (ui )
ui
“Weak” potentials between nonadjacent amino
acids lets us approximate
min (u ) occ (ui , u ) pˆ in 1 (ui ) dui
ui
occupancy
edge potential
product of all
incoming msgs to i
Occupancy Message
Approximation
pˆ 3
x occ m53
3
pˆ 3
x occ m13
3
1
2
3
4
pˆ 3
x occ m63
3
5
6
Occupancy Message
Approximation
occ
pˆ 3
occ
pˆ 3
x3
x3
1
occ
pˆ 3
x3
2
3
4
5
6
Occupancy Message
Approximation
ACC
1
2
4
3
5
6
Send outgoing occupancy message product to
a central accumulator
ACC ( x ) mi ( x )
AAs i
Occupancy Message
Approximation
ACC
1
2
3
4
5
6
Then, each node’s incoming message product
is computed in constant time
m23 m43
pˆ 3 3 ACC
m2 m3 m4
BP Output
After some number of iterations, BP gives
probability distributions over Cα locations
…
ALA
LEU
…
PRO
…
pLEU xLEU
VAL
ARG
…
…
pVAL xVAL
ACMI’s Backbone Trace
Independently choose Cα locations that
maximize approximate marginal distribution
b arg max pˆ i ( xi )
*
i
xi
…
Example: 1XRI
prob(AA at location)
3.3Å resolution density map
39° mean phase error
HIGH
0.9
0.1
LOW
0.9009Å RMSd
93% complete
Density-map mean
phase error (deg.)
Testset Density Maps (raw data)
75
60
45
30
15
1.0
2.0
3.0
4.0
Density-map resolution (Å)
% Cα’s located within 2Å of
some Cα / correct Cα
Experimental Accuracy
% backbone correctly placed
% amino acids correctly identified
100
80
60
40
20
0
ACMI
ARP/ Resolve Textal
wARP
ACMI % Cα’s located
Experimental Accuracy on a
Per-Protein Basis
100
100
100
80
80
80
60
60
60
40
40
40
20
20
20
0
0
0
0
20
40
60
80 100
ARP/wARP
% Cα’s located
0
20
40
60
80 100
Resolve
% Cα’s located
0
20
40
60
80 100
Textal
% Cα’s located
ACMI Overview
Phase 1: Local pentapeptide search
(ISMB 2006, BIBM 2007)
Independent amino-acid search
Templates model 5-mer conformational space
Phase 2: Coarse backbone model
(ISMB 2006, ICDM 2006)
Protein structural constraints refine local search
Markov field (MRF) models pairwise constraints
Phase 3: Sample all-atom models
Particle filtering samples high-prob. structures
Probs. from MRF guide particle trajectories
Problems with ACMI
Biologists want location of all atoms
All Cα’s lie on a discrete grid
Maximum-marginal backbone model may be
physically unrealistic
Ignoring a lot of information
Multiple models may better represent
conformational variation within crystal
Probability=0.4
Probability=0.35
Probability=0.25
Maximummarginal structure
ACMI with Particle Filtering
(ACMI-PF)
Idea: Represent protein using a set of static 3D all-atom
protein models
Particle Filtering Overview
(Doucet et al. 2000)
Given some Markov process x1:KX
with observations y1:K Y
Particle Filtering approximates some posterior
probability distribution over X using a set of N
weighted point estimates
N
p x1:K | y1:K wt x1:K x
i 1
(i )
K
(i )
1:K
Particle Filtering Overview
Markov process gives recursive formulation
p x1:k | y1:k p yk | xk p xk | xk 1 p x1:k 1 | y1:k 1
Use importance fn. q(x k |x 0:k-1 ,y k) to grow particles
Recursive weight update,
wt
(i )
k
wt
(i )
k 1
p yk | xk(i ) p xk(i ) | xk(i)1
q xk(i ) | xk(i)1 , yk
Particle Filtering for
Protein Structures
Particle refers to one specific 3D layout of
some subsequence of the protein
At each iteration advance particle’s trajectory
by placing an additional amino-acid’s atoms
Particle Filtering for
Protein Structures
Alternate extending chain left and right
Particle Filtering for
Protein Structures
bk-1
bk
bk+1
sk
Alternate extending chain left and right
An iteration alternately places
Cα position bk+1 given bk
All sidechain atoms sk given bk-1:k+1
Particle Filtering for
Protein Structures
bk-1
bk
bk+1
sk
Key idea: Use the conditional distribution
p(bk|bik-1,Map) to advance particle trajectories
Construct this conditional distribution from
BP’s marginal distributions
Particle Filtering for
Protein Structures
sk
…
bk-1
bk
bk+1
Algorithm
place “seeds” bki for each particle i=1…N
while amino-acids remain
place bki+1 / bji-1 given bj:ki for each i=1…N
place ski given bki-1:k+1 for each i=1…N
optionally resample N particles
end while
…
Backbone Step (for particle i )
place bki+1 given bki for each i=1…N
bk
b1…L
k+1
b k-1
(1) Sample L bk+1’s from bk-1–bk–bk+1
pseudoangle distribution
Backbone Step (for particle i )
place bki+1 given bki for each i=1…N
bk
b1…L
k+1
pk+1(b k1+1 )
pk+1(b k2+1 )
b k-1
pk+1(b kL+1 )
(2) Weight each sample by its
ACMI-computed approximate marginal
Backbone Step (for particle i )
place bki+1 given bki for each i=1…N
bk
b1…L
k+1
1
pk+1(b k+1
)
2
pk+1(b k+1
)
b k-1
L
pk+1(b k+1
)
(3) Select bk+1 with probability
proportional to sample weight
Backbone Step (for particle i )
place bki+1 given bki for each i=1…N
bk
b k+1
L
b k-1
wtk 1 pk 1 bk 1 wtk
1
(4) Update particle weight as sum
of sample weights
Sidechain Step (for particle i )
place ski given bki-1:k+1 for each i=1…N
Protein
Data
Bank
(1) Sample sk from a database of
sidechain conformations
Sidechain Step (for particle i )
place ski given bki-1:k+1 for each i=1…N
pk(EDM | s k2)
pk(EDM
| s k1)
pk(EDM | sk3 )
(2) For each sidechain conformation, compute
probability of density map given the sidechain
Sidechain Step (for particle i )
place ski given bki-1:k+1 for each i=1…N
pk(EDM | s k2)
pk(EDM
| s k1)
pk(EDM | sk3 )
(3) Select sidechain conformation
from this weighted distribution
Sidechain Step (for particle i )
place ski given bki-1:k+1 for each i=1…N
M
wtk 1 p EDM | skm wtk
m 1
(4) Update particle weight as sum of
sample weights
Particle Resampling
wt = 0.4
wt = 0.2
wt = 0.3
wt = 0.2
wt = 0.1
wt = 0.2
wt = 0.1
wt = 0.2
wt = 0.1
wt = 0.2
Amino-Acid Sampling Order
j
k
Begin at some amino acid k with probability
P(k ) exp entropy pˆ k (bk )
At each step, move left to right with probability
P( j 1) exp entropy pˆ j 1 (b j 1 )
P(k 1) exp entropy pˆ k 1 (bk 1 )
Experimental Methodology
Run ACMI-PF 10 times with 100 particles each
Return highest-weight particle from each run
Each run samples amino-acids in a different order
Refine each structure for 10 iterations in Refmac5
Compare 10-structure model to others using Rfree
R
Fobs Fcalc
F
obs
Refined Rfree
ACMI-PF Versus ACMI-Naïve
0.5
0.4
0.3
Acmi-PF
Acmi-Naive
0.2
1
2
3
4
5
6
7
8
9 10
Number of ACMI-PF runs
Additionally, ACMI-PF’s models have …
Fewer gaps (10 vs. 28)
Lower sidechain RMS error (2.1Å vs. 2.3Å)
ACMI-PF Rfree
ACMI-PF Versus Others
0.65
0.65
0.65
0.55
0.55
0.55
0.45
0.45
0.45
0.35
0.35
0.35
0.25
0.25
0.25
0.25
0.35
0.45
0.55
0.65
ARP/wARP Rfree
0.25
0.35
0.45
0.55
Resolve Rfree
0.65
0.25
0.35
0.45
0.55
Textal Rfree
0.65
ACMI-PF Example: 2A3Q
2.3Å resolution
66° phase err.
1.79Å RMSd
92% complete
ACMI Overview
Phase 1: Local pentapeptide search
(ISMB 2006, BIBM 2007)
Independent
amino-acid
searchphase
Phase
4: Iterative
Templates
model 5-mer conformational space
improvement
Use particle-filtering models to
Phase 2: Coarse
backbone
model
improve
density-map
quality
(ISMB 2006, ICDM 2006)
Rerun entire pipeline on
Protein structural constraints refine local search
improved density map
Markov field (MRF) models pairwise constraints
Repeat until convergence
Phase 3: Sample all-atom models
Particle filtering samples high-prob. structures
Probs. from MRF guide particle trajectories
Phase Problem
Intensities
Measured by X-ray
crystallography
f I,Φ
Phases
Experimentally
estimated (e.g. MAD, MIR)
Density-Map Phasing
0°
30°
60°
75°
mean phase error
Iterative Phase Improvement
Revised
Initial
density map
Predicted
3D model
I obs
Icalc
Φexp
calc
Φcalc
(deg. mean phase error)
Error in ACMI-PF’s phases
ACMI-PF’s Phase Improvement
75
60
45
30
15
0
0
15
30
45
60
75
Error in initial phases
(deg. mean phase error)
% backbone located
Iteration 2
Two-Iteration ACMI
100
90
80
70
60
50
50
60
70
80
90
100
% backbone located
Iteration 1
Average
mean phase error
60
50
40
?
30
20
10
0
0
1
2
3
4
Number of ACMI iterations
Average
% uninterpreted AAs
Future Work:
Many-iteration ACMI
20
15
?
10
5
0
1
2
3
4
5
Number of ACMI iterations
Conclusions
ACMI’s three steps construct a set of all-atom
protein models from a density map
Novel message approximation allows inference
on large, highly-connected models
Resulting protein models are more accurate
than other methods
Ongoing and Future Work
Incorporate additional structural biology
background knowledge
Incorporate more complex potential functions
Further work on iterative phase improvement
Generalize my algorithms to other
3D image data
Acknowledgements
Advisor Jude Shavlik
Committee
George Phillips
Charles Dyer
David Page
Mark Craven
Collaborators
Ameet Soni
Dmitry Kondrashov
Eduard Bitto
Craig Bingman
6th floor MSCers
Center for Eukaryotic
Structural Genomics
Funding
UW-Madison Graduate
School
NLM 1T15 LM007359
NLM 1R01 LM008796