PHASES: A Program Package for the Processing and Analysis

Direct Methods and Many-Site Se-Met MAD Problems Using BnP
W. Furey
Classical Direct Methods
- Main method for "small molecule" structure determination
- Highly automated (almost totally "black box")
- Solves structures containing up to a few hundred non-hydrogen atoms in the asymmetric unit
Direct Methods Assumptions and Requirements
- Non-negativity of electron density
- Atoms are "resolved", i.e. "atomic resolution" data are available
- Unit cell, symmetry and contents are known
Important Concepts - 1
- Normalized structure factors $E_H$ are given by
  $$E_H = F_H / \langle |F_H|^2 \rangle^{1/2}$$
  with averaging in resolution shells
- The phase $\varphi_H$ of $E_H$ is the same as for $F_H$
- $\langle |E_H|^2 \rangle = 1$, hence "normalized"
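A minimal sketch of the normalization in Python, assuming equal-count resolution shells and ignoring the symmetry (epsilon) factors a real program would apply; `normalize_by_shells` is a hypothetical helper, not part of PHASES or SnB. Only amplitudes are rescaled, so phases are untouched, and each shell ends up with $\langle |E|^2 \rangle = 1$.

```python
import math


def normalize_by_shells(s2_values, f_values, n_shells):
    """Return |E| = |F| / <|F|^2>^(1/2) for each reflection, averaging within resolution shells.

    s2_values: a resolution measure per reflection, e.g. (sin(theta)/lambda)^2
    f_values:  observed |F| per reflection (same order)
    Shells hold roughly equal numbers of reflections; epsilon factors are ignored.
    """
    # Sort reflection indices from low to high resolution
    order = sorted(range(len(s2_values)), key=lambda i: s2_values[i])
    shell_size = math.ceil(len(order) / n_shells)
    e_values = [0.0] * len(f_values)
    for start in range(0, len(order), shell_size):
        shell = order[start:start + shell_size]
        # <|F|^2> in this shell
        mean_f2 = sum(f_values[i] ** 2 for i in shell) / len(shell)
        for i in shell:
            e_values[i] = f_values[i] / math.sqrt(mean_f2)
    return e_values
```

Because each shell is divided by its own $\langle |F|^2 \rangle^{1/2}$, the fall-off of $|F|$ with resolution is removed, which is what lets E values from different resolution ranges be compared in the probability formulas.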
Important Concepts - 2
- Structure invariant: a structural quantity independent of the choice of unit cell origin
- Probabilistic estimates can be made for the values of structure invariants, given the associated E magnitudes and the cell contents
Fundamental Formulas Involving Individual Triplets
$$P(\psi_{HK}) = [2\pi I_0(A_{HK})]^{-1} \exp(A_{HK} \cos\psi_{HK})$$
where $P(\psi_{HK})$ is the probability of the triplet structure invariant $\psi_{HK} = \varphi_H + \varphi_K + \varphi_{-H-K}$ having the value $\psi_{HK}$, and
$$A_{HK} = 2\,|E_H E_K E_{-H-K}| / N^{1/2}$$
where N is the number of (equal) atoms in the cell and the E's are normalized structure factors.

Note that the probability that $\psi_{HK}$ lies near 0 increases as $A_{HK}$ increases, and that $A_{HK}$ is proportional to the product of the |E|'s and inversely proportional to $N^{1/2}$.

The expected value of $\cos\psi_{HK}$ is given by
$$\langle \cos\psi_{HK} \rangle = I_1(A_{HK}) / I_0(A_{HK})$$
[Figure: Cochran distribution for various κ, with $\Phi_3 = \psi_{HK}$ and $\kappa = A_{HK}$; σ vs κ]
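The triplet formulas can be evaluated directly. This Python sketch uses hypothetical helper names and sums the modified Bessel functions from their power series, which is adequate for the moderate $A_{HK}$ values typical of triplets (it is not meant for very large arguments). It computes $A_{HK}$, the Cochran probability and $\langle \cos\psi_{HK} \rangle$.

```python
import math


def bessel_i(n, x):
    """Modified Bessel function of the first kind I_n(x), summed from its power series."""
    term = (x / 2) ** n / math.factorial(n)
    total = term
    k = 0
    while term > 1e-17 * total:
        k += 1
        term *= (x / 2) ** 2 / (k * (k + n))
        total += term
    return total


def triplet_a(e_h, e_k, e_mhmk, n_atoms):
    """A_HK = 2 |E_H E_K E_-H-K| / N^(1/2) for N equal atoms."""
    return 2.0 * abs(e_h * e_k * e_mhmk) / math.sqrt(n_atoms)


def cochran_probability(psi, a):
    """P(psi_HK) = [2 pi I_0(A_HK)]^-1 exp(A_HK cos psi_HK)."""
    return math.exp(a * math.cos(psi)) / (2.0 * math.pi * bessel_i(0, a))


def expected_cos(a):
    """<cos psi_HK> = I_1(A_HK) / I_0(A_HK)."""
    return bessel_i(1, a) / bessel_i(0, a)
```

For example, three reflections with |E| = 2 in a cell of 100 equal atoms give $A_{HK} = 2 \cdot 8 / 10 = 1.6$. Raising $A_{HK}$ raises `expected_cos`, which is the quantitative form of the remark that larger E's and fewer atoms make triplet estimates more reliable.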
Classical Direct Methods: Applications for Proteins
- Used for phase extension to very high resolution
- Used with moderate success to locate heavy atom sites in isomorphous derivatives
- E values used in molecular replacement calculations
Current Direct Methods: Applications for Proteins
- Shake-and-Bake (based on the minimum function) has been used to solve complete protein structures with over 1,000 atoms (rubredoxin, lysozyme, calmodulin, etc.), provided data to 1.1 Å or better are available
- Used to locate anomalous scatterer sites from MAD or SAS data
General Shake-and-Bake Concept
- Use a multi-solution method starting with random phases (or randomly positioned atoms) in each trial.
- For each trial phase set, use a "dual-space" procedure iterating between real- and reciprocal-space optimization/constraints.
- Reciprocal-space optimization is based on shifting phases to reduce the "minimum function" $R(\psi)$.
- Real-space optimization and constraints are based on computing new phases only from the largest peaks in the map computed from the previous cycle's phases.
- Each trial phase set is ranked by its value of $R(\psi)$.
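In the SnB literature the minimum function is usually written as $R = \sum A_{HK} (\cos\psi_{HK} - I_1(A_{HK})/I_0(A_{HK}))^2 / \sum A_{HK}$, i.e. it measures how far each triplet cosine lies from its Cochran expectation. The sketch below assumes that form and adds a simplified parameter-shift pass (try each phase shifted by plus and minus 90°, keep whichever lowers R). SnB's own search options differ in detail, and all names here are hypothetical.

```python
import math


def bessel_i(n, x):
    """Modified Bessel function I_n(x) from its power series (adequate for moderate x)."""
    term = (x / 2) ** n / math.factorial(n)
    total = term
    k = 0
    while term > 1e-17 * total:
        k += 1
        term *= (x / 2) ** 2 / (k * (k + n))
        total += term
    return total


def minimum_function(phases, triplets):
    """R = sum A (cos psi - I1(A)/I0(A))^2 / sum A over all triplets.

    phases:   dict reflection label -> phase in radians
    triplets: list of (h, k, l, a), where l labels -H-K, so psi = phi_h + phi_k + phi_l
    """
    num = 0.0
    den = 0.0
    for h, k, l, a in triplets:
        psi = phases[h] + phases[k] + phases[l]
        expected = bessel_i(1, a) / bessel_i(0, a)
        num += a * (math.cos(psi) - expected) ** 2
        den += a
    return num / den


def shift_phases(phases, triplets, shift=math.pi / 2):
    """One simplified parameter-shift pass: for each phase try phi +/- shift, keep the lowest R."""
    phases = dict(phases)
    for label in list(phases):
        original = phases[label]
        best_phi, best_r = original, minimum_function(phases, triplets)
        for trial in (original + shift, original - shift):
            phases[label] = trial
            r = minimum_function(phases, triplets)
            if r < best_r:
                best_phi, best_r = trial, r
        phases[label] = best_phi
    return phases
```

Each accepted shift can only lower R, so a pass never makes a trial worse; ranking finished trials by their final R is what separates likely solutions from failures.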
SnB inner loop for a trial structure
1. Generate a random trial structure
2. Compute phases from the structure
3. Shift phases to reduce $R(\psi)$
4. Compute a map from the new phases
5. Select a "structure" from the largest peaks, then return to step 2
6. Stop after N iterations
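To make the loop concrete, here is a toy one-dimensional version in Python: equal point atoms in a cell of unit length, amplitudes for h = 1..hmax only, and peak picking on a sampled map. It is an illustration under those assumptions, not SnB code, and it deliberately omits the phase-shift step that reduces $R(\psi)$, so what remains is plain dual-space Fourier recycling. All function names are hypothetical.

```python
import cmath
import math
import random


def structure_factors(positions, hmax):
    """F_h = sum_j exp(2*pi*i*h*x_j) for h = 1..hmax (equal point atoms, 1D cell of length 1)."""
    return [sum(cmath.exp(2j * math.pi * h * x) for x in positions)
            for h in range(1, hmax + 1)]


def fourier_map(amplitudes, phases, npts):
    """rho(x_i) = sum_h |E_h| cos(2*pi*h*x_i - phi_h) on a grid of npts points."""
    return [sum(a * math.cos(2 * math.pi * h * i / npts - p)
                for h, (a, p) in enumerate(zip(amplitudes, phases), start=1))
            for i in range(npts)]


def largest_peaks(rho, n):
    """Fractional positions of the n highest local maxima of a periodic 1D map."""
    npts = len(rho)
    maxima = [i for i in range(npts)
              if rho[i] > rho[i - 1] and rho[i] >= rho[(i + 1) % npts]]
    maxima.sort(key=lambda i: rho[i], reverse=True)
    return sorted(i / npts for i in maxima[:n])


def dual_space_trial(obs_amplitudes, n_atoms, n_cycles, npts=200, seed=None, start=None):
    rng = random.Random(seed)
    # Random trial structure, unless a starting model is supplied
    positions = list(start) if start is not None else [rng.random() for _ in range(n_atoms)]
    for _ in range(n_cycles):
        # Compute phases from the current structure
        phases = [cmath.phase(f) for f in structure_factors(positions, len(obs_amplitudes))]
        # (SnB would shift these phases to reduce R(psi) here; omitted in this sketch)
        # Compute a map from the observed amplitudes and current phases
        rho = fourier_map(obs_amplitudes, phases, npts)
        # The n_atoms largest peaks become the new trial structure
        positions = largest_peaks(rho, n_atoms)
    return positions
```

Starting from the true sites, one cycle returns them to grid precision; starting from random sites, many trials would be run and ranked, as in the multi-solution scheme described above.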
Choice of Data for Se Determination
- Use $||F_H|^+ - |F_H|^-|$ (anomalous) differences at a single λ
- Use $||F_H|_{\lambda_i} - |F_H|_{\lambda_j}|$ (dispersive) differences between two λ's
- Use $F_A$ values (derived from data at all λ's)
- Use $F_{HLE}$ values, based on the maximum anomalous and maximum dispersive differences
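The two simplest choices reduce to dictionary operations. This Python sketch assumes each data set maps (h, k, l) to a scaled |F|, and that $|F^-|$ is stored under the same (h, k, l) as its Bijvoet mate; the helper names are hypothetical. $F_A$ and $F_{HLE}$ values need the full multi-wavelength treatment and are not sketched here.

```python
def anomalous_differences(f_plus, f_minus):
    """| |F_H|+ - |F_H|- | at one wavelength, for reflections with both Bijvoet mates measured.

    f_plus:  dict (h, k, l) -> |F(h, k, l)|
    f_minus: dict (h, k, l) -> |F(-h, -k, -l)|, keyed by the same (h, k, l)
    """
    return {hkl: abs(f_plus[hkl] - f_minus[hkl]) for hkl in f_plus if hkl in f_minus}


def dispersive_differences(f_lambda_i, f_lambda_j):
    """| |F_H|(lambda_i) - |F_H|(lambda_j) | for reflections measured at both wavelengths.

    Both inputs are assumed to be on a common scale already.
    """
    return {hkl: abs(f_lambda_i[hkl] - f_lambda_j[hkl]) for hkl in f_lambda_i if hkl in f_lambda_j}
```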
MAD Phasing
- For data collected at λ1, λ2, etc., choose one wavelength λn as the "native" data, and "reduce" that data set by averaging Bijvoet pairs.
- Reduce the data at each other ("derivative") wavelength λd in two ways: with Bijvoet pairs averaged, to form "isomorphous" data sets, and without averaging, to form "anomalous" data sets.
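A sketch of this bookkeeping in Python, assuming each wavelength is stored as {(h, k, l): (|F+|, |F-|)} with None for an unmeasured mate, and using a plain unweighted mean when averaging Bijvoet pairs (a real program would likely weight by σ). The data-set labels and function names are hypothetical, and scaling to the native set is left out.

```python
def reduce_bijvoet(data):
    """'Reduced' set: mean of |F+| and |F-| where both exist, otherwise whichever was measured."""
    reduced = {}
    for hkl, (f_plus, f_minus) in data.items():
        present = [f for f in (f_plus, f_minus) if f is not None]
        if present:
            reduced[hkl] = sum(present) / len(present)
    return reduced


def keep_bijvoet_pairs(data):
    """'Anomalous' set: Bijvoet mates kept separate (only complete pairs retained here)."""
    return {hkl: pair for hkl, pair in data.items() if None not in pair}


def build_mad_sets(datasets, native):
    """datasets: dict wavelength label -> {hkl: (|F+| or None, |F-| or None)}."""
    sets = {"native": reduce_bijvoet(datasets[native]),
            "native_anomalous": keep_bijvoet_pairs(datasets[native])}
    for label, data in datasets.items():
        if label != native:
            sets[label + "_iso"] = reduce_bijvoet(data)
            sets[label + "_ano"] = keep_bijvoet_pairs(data)
    return sets
```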
MAD Phasing
- For the "isomorphous" and "derivative anomalous" data sets, scale the "derivative" to the "native" and use scattering factors of $f_0 = 0$, $f' = f'(\lambda_d) - f'(\lambda_n)$, $f'' = f''(\lambda_d)$
- For "native anomalous" data, use the original native Bijvoet pairs and scattering factors of $f_0 = 0$, $f' = 0$, $f'' = f''(\lambda_n)$
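The scattering-factor rules translate directly into code. In this Python sketch the f′ and f″ tables are inputs keyed by hypothetical wavelength labels, and `mad_scattering_factors` is a hypothetical helper.

```python
def mad_scattering_factors(f_prime, f_double_prime, native):
    """Return (f0, f', f'') for every pseudo-derivative and for the native anomalous set.

    f_prime, f_double_prime: dicts wavelength label -> f'(lambda), f''(lambda) in electrons
    native: label of the wavelength chosen as "native"
    """
    # Native Bijvoet pairs: f0 = 0, f' = 0, f'' = f''(lambda_n)
    factors = {"native_anomalous": (0.0, 0.0, f_double_prime[native])}
    for label in f_prime:
        if label != native:
            # "isomorphous" / "derivative anomalous": f0 = 0, f' = f'(ld) - f'(ln), f'' = f''(ld)
            factors[label] = (0.0, f_prime[label] - f_prime[native], f_double_prime[label])
    return factors
```

With illustrative (not measured) values f′(edge) = −10, f′(remote) = −3 and f″(edge) = 3 electrons, and the remote wavelength as native, the edge pseudo-derivative gets (0, −7, 3).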
Phase Refinement Minimizing
$$\sum_h W_h \sum_{\varphi_P} P_{\varphi_P} \left( |F_{PH}^{obs}|_h - |F_{PH}^{calc}(\varphi_P)|_h \right)^2$$
where
$$|F_{PH}^{calc}(\varphi_P)|_h^2 = |F_P^{obs}|_h^2 + |F_H^{calc}|_h^2 + 2\,|F_P^{obs}|_h\,|F_H^{calc}|_h \cos(\varphi_P - \varphi_H)_h$$
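A Python sketch of this target, with $|F_{PH}^{calc}(\varphi_P)|$ computed from the law-of-cosines expression; the tuple layout of each reflection and the function names are assumptions made for the example.

```python
import math


def fph_calc(fp_obs, phi_p, fh_calc, phi_h):
    """|F_PH^calc(phi_P)| from |F_PH|^2 = |F_P|^2 + |F_H|^2 + 2|F_P||F_H|cos(phi_P - phi_H)."""
    value = fp_obs ** 2 + fh_calc ** 2 + 2.0 * fp_obs * fh_calc * math.cos(phi_p - phi_h)
    return math.sqrt(max(value, 0.0))  # guard against tiny negative round-off


def refinement_target(reflections):
    """sum_h W_h sum_phiP P_phiP (|F_PH^obs|_h - |F_PH^calc(phi_P)|_h)^2.

    Each reflection is a tuple (w_h, fph_obs, fp_obs, fh_calc, phi_h, phase_terms),
    where phase_terms is a list of (phi_P, P_phiP) pairs.
    """
    total = 0.0
    for w_h, fph_obs, fp_obs, fh_calc, phi_h, phase_terms in reflections:
        for phi_p, prob in phase_terms:
            total += w_h * prob * (fph_obs - fph_calc(fp_obs, phi_p, fh_calc, phi_h)) ** 2
    return total
```

The expression for $|F_{PH}^{calc}|$ is just the modulus of the vector sum $F_P + F_H$, so it can be checked against complex arithmetic.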
Phase Refinement Options
$$\sum_h W_h \sum_{\varphi_P} P_{\varphi_P} \left( |F_{PH}^{obs}|_h - |F_{PH}^{calc}(\varphi_P)|_h \right)^2$$
- "Classical": $\varphi_P$ = centroid phase, $W_h = 1/E^2$, $1/\langle E^2 \rangle$ or unity, $P_{\varphi_P} = 1$; use reflections with FOM > 0.4-0.6
- "Maximum likelihood": $\varphi_P$ stepped over the allowed phases, $P_{\varphi_P}$ = the corresponding probability, $W_h = 1/E^2$, $1/\langle E^2 \rangle$ or unity; use reflections with FOM > 0.2
- $\varphi_P$ and $P_{\varphi_P}$ can also come from an external source, e.g. solvent-flattened or NCS-averaged maps
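The two options differ only in which $(\varphi_P, P_{\varphi_P})$ pairs enter the inner sum, which weights are used, and which reflections are kept. This Python sketch builds those pieces; the 0.5 classical cutoff is one arbitrary pick from the quoted 0.4-0.6 range, `phase_probability` stands in for whatever supplies the probabilities (experimental phasing or a solvent-flattened/NCS-averaged map), and all names and scheme labels are hypothetical.

```python
import math


def phase_terms(mode, fom, centroid_phase=None, phase_probability=None, n_steps=36,
                classical_min_fom=0.5, ml_min_fom=0.2):
    """Return the (phi_P, P_phiP) pairs for one reflection, or None if it is excluded."""
    if mode == "classical":
        if fom <= classical_min_fom:
            return None
        # Centroid phase only, with P = 1
        return [(centroid_phase, 1.0)]
    if mode == "ml":
        if fom <= ml_min_fom:
            return None
        # Step phi_P around the circle and normalize the probabilities to sum to 1
        steps = [2.0 * math.pi * k / n_steps for k in range(n_steps)]
        raw = [phase_probability(phi) for phi in steps]
        total = sum(raw)
        return [(phi, p / total) for phi, p in zip(steps, raw)]
    raise ValueError("mode must be 'classical' or 'ml'")


def weight(e_value, mean_e2, scheme):
    """W_h = 1/E^2, 1/<E^2> or unity."""
    if scheme == "inverse_e2":
        return 1.0 / e_value ** 2
    if scheme == "inverse_mean_e2":
        return 1.0 / mean_e2
    return 1.0
```

Stepping over phases lets weak or ambiguously phased reflections contribute in proportion to their probability, which is why the maximum-likelihood option can afford the lower FOM cutoff.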
[Figure: projection of peaks down the NCS twofold axis]
[Flowchart: MAD λ1, λ2, λ3 data (Scalepack files) to final map, via the PHASES programs CMBISO, CMBANO, PHASIT, FSFOUR, EXTRMP, MAPAVG, MISSNG, BNDRY, MAPINV and BLDCEL, with intermediate "iso" and "ano" scaled files, all "native" (λ3) data, and "phase", "submap", "averaging" mask and "extension" files]
MAD Phasing/Averaging Statistics
Wavelength | Type | dmin (Å) | No. refl | Rano | Riso | dmin (Å) (phasing)
λ1, edge | ano | 2.3 | 72,632 | 0.063 | - | 2.6
λ2, peak | ano | 2.3 | 72,996 | 0.060 | - | 2.6
λ3, remote | ano | 2.3 | 72,650 | 0.048 | - | 2.6
λ1-λ3 | iso | 2.3 | 74,407 | - | 0.039 | 2.6
λ2-λ3 | iso | 2.3 | 74,774 | - | 0.035 | 2.6
Rc, Phasing Power, <FOM>:
-
3.47
3.45
2.09
1.89
1.59
0.380
0.447
0.389
0.393
0.357
-
0.55
0.61
Mean FOM (combined) = 0.759 for 48,632 reflections (2.6Å)
Correlation coefficient between monomer density prior to
NCS averaging = 0.764
Correlation coefficient between monomer density after NCS
averaging/phase combination = 0.906
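A map correlation coefficient of this kind is typically the linear (Pearson) correlation between density values at corresponding grid points. A minimal Python sketch, assuming the second monomer's density has already been sampled at the NCS-related points of the first (the NCS operator itself is not shown); `map_correlation` is a hypothetical name.

```python
import math


def map_correlation(rho_a, rho_b):
    """Pearson correlation coefficient between two maps sampled at corresponding points."""
    n = len(rho_a)
    mean_a = sum(rho_a) / n
    mean_b = sum(rho_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(rho_a, rho_b))
    var_a = sum((a - mean_a) ** 2 for a in rho_a)
    var_b = sum((b - mean_b) ** 2 for b in rho_b)
    return cov / math.sqrt(var_a * var_b)
```

The coefficient ignores overall scale and offset, so two monomers with identical density shapes score 1.0 even if one map is on a different absolute scale.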
[Figure: peak anomalous (λ2) difference Patterson]
- With SnB it is possible to locate the anomalous scatterer substructure automatically with data from any one of the dispersive combinations or anomalous pair sets
- As expected, the sets with the maximum dispersive or anomalous signal typically yield a higher frequency of success

Automated Applications of BnP: Methodology
W. Furey¹, L. Pasupulati¹, S. Potter², H. Xu², R. Miller³ & C. Weeks²
¹University of Pittsburgh School of Medicine and VA Medical Center
²Hauptman-Woodward Medical Research Institute
³Center for Computational Research, SUNY at Buffalo
Goal: Provide user-friendly software for automatic
determination of protein crystal structures
SnB Strengths
1. Powerful, state-of-the-art direct methods for automatically locating heavy atom sites
2. Friendly graphical user interface
SnB Weaknesses
1. Stops after finding sites, i.e. no protein phasing
2. No software interface
PHASES Strengths
1. Proven protein phasing (MAD, MIRAS, etc.), solvent flattening, NCS averaging, external program interfacing
2. Interactive graphics
PHASES Weaknesses
1. Doesn't automatically find heavy atom sites
2. Script based, i.e. no GUI
Adopted Strategy
- Combine the SnB program with the "PHASES" package, putting everything under GUI control
- Establish default parameters and procedures allowing all aspects of the structure determination to be fully automated
- Also provide a manual mode, giving experienced users more control and facilitating development
- Provide graphical feedback when possible
- Facilitate coupling with popular external software
Main Developments Required for Automated Structure Determination
- Automatic substructure solution detection
- Automatic substructure validation
- Automatic hand determination (including space group changes, when needed)
Automatic Substructure Solution Detection
- Original method: based on histogram (manual, time-consuming, requires user interaction)
- Current method: based on Rmin and Rcryst statistics (automatic, fast, no user interaction)
Automatic Substructure Validation
- Original method: left up to the user to decide which peaks correspond to true sites (manual)
- Current method (auto mode): based on occupancy refinement against Bijvoet differences (automatic, fast, requires no coordinate refinement, hand insensitive)
- Current method (manual mode): as in auto mode, but peaks from different solutions can also be compared (manual)
Automatic Hand Determination
- Original method: visual inspection of map projections (manual, requires user interaction)
- Current method (MAD, SIRAS or MIRAS): based on variance differences in the protein and solvent regions (automatic, fast since no refinement is required, and no user interaction required)
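One simple way to turn "variance differences in protein and solvent regions" into a number is sketched below in Python: for each hand, compare the density variance inside the protein mask with that in the solvent, and prefer the hand with the larger contrast (the correct hand should give a flatter solvent). This illustrates the idea only; the exact statistic BnP computes is not given here, and the helper names are hypothetical.

```python
def region_variance(rho, mask, inside):
    """Variance of map values where mask == inside (True = protein, False = solvent)."""
    values = [r for r, m in zip(rho, mask) if m == inside]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def variance_contrast(rho, protein_mask):
    """Protein-region variance minus solvent-region variance."""
    return region_variance(rho, protein_mask, True) - region_variance(rho, protein_mask, False)


def pick_hand(rho_original, mask_original, rho_inverted, mask_inverted):
    """Prefer the hand whose map shows the larger protein/solvent variance contrast."""
    if variance_contrast(rho_original, mask_original) >= variance_contrast(rho_inverted, mask_inverted):
        return "original"
    return "inverted"
```

No refinement is involved: each hand needs only one map and a mask, which is consistent with the speed claimed for this approach.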
Automatic Hand Determination
- Current method (SAS data only): comparative analysis of R, FOM and CC after solvent flattening/phase combination (automatic, fast, requires no refinement)
- Current method (SIR, MIR data only): both hands are tried; map examination is needed (requires user interaction)
No man (or program) is an island
Importing data files
- Scalepack files
- D*Trek files
- MTZ files$
- Free format files
Exporting control files
- O
- RESOLVE 2.08
- Arp/wARP 6.1.1
Exporting data files
- Free format files
- CNS files
- MTZ files$
- O files
- CHAIN files
- PDB files
Job submission from GUI
- RESOLVE$ 2.08
- Arp/wARP$ 6.1.1
$RESOLVE, Arp/wARP and/or CCP4 must be obtained from their respective authors/distributors for these options to work
Results for 1jc4
a = 43.6, b = 78.6, c = 89.4 Å, β = 91.95°, P2₁
4 molecules (592 residues) in the asu
2.1 Å data, 3λ MAD data
Substructure: found 24 of 24 Se
Phasing: mean PP 2.95; mean FOM 0.661
Time to map: ~41 min on a G4 (1.5 GHz) PowerBook; ~13 min on a G5 (2.7 GHz) desktop
Auto traceability:
RESOLVE: 87% main chain, 68% side chain
Arp/wARP: 82% main chain, 73% side chain
SeMet ASU Size & Data Resolution
PDB Code | No. Sites | No. Residues | NCS | d (Å)
1QC2 | 4 | 169 | 1 | 1.5
1BX4 | 7 | 345 | 1 | 2.25
1CB0 | 8 | 283 | 1 | 2.2
1T5H | 10 | 504 | 1 | 2.5
2JXH | 12 | 576 | 2 | 3.1
1GSO | 13 | 431 | 1 | 2.22
2TPS | 15 | 454 | 2 | 2.7
1DBT | 19 | 717 | 3 | 2.49
1JEN | 22 | 668 | 2 | 2.25
1JC4 | 24 | 592 | 4 | 2.1
1CLI | 28 | 1380 | 4 | 3.0
1A7A | 30 | 864 | 2 | 2.8
1L8A | 40 | 1772 | 2 | 2.6
1E3M | 45 | 1600 | 2 | 3.0
1HI8 | 50 | 1328 | 2 | 2.8
1GKP | 54 | 2748 | 6 | 2.5
1DQ8 | 60 | 1868 | 4 | 2.33
1E2Y | 60 | 1880 | 10 | 3.2
1M32 | 66 | 2196 | 6 | 2.55
1EQ2 | 70 | 3100 | 10 | 2.9
Phasing Flexibility (Manual Mode)
Conclusion
BnP is a user-friendly, efficient package for the automated determination of protein structures from X-ray diffraction data.
BnP downloads for Linux, Apple G4, G5 & Intel, and SGI are available (to academic & non-profit institutions) at
http://www.hwi.buffalo.edu/BnP/