JCSG Overview. PSI large-scale center, overall pipeline overview
and organization
SDC role within JCSG, highlight production demands for PSI-2
The Joint Center for Structural Genomics
from pilot to production center.
Ashley Deacon
Joint Center for Structural Genomics
Stanford Synchrotron Radiation Laboratory
GM/CA CAT, July 2006
JCSG Overview
PSI-1: 7 pilot centers funded in 2000, 2 more added in 2001
PSI-2: 4 production centers funded in 2005
JCSG target selection and PSI goals
PSI-1
PSI-2
Thermotoga maritima
PSI network goals:
Coarse and fine grain PFAM Coverage
Technology development
targets (mouse)
Biological theme: Central Machinery of Life
JCSG HT pipeline and data flow
Parallel HT
Expression
Automated HT
Purification
Target
Selection
Automated HT
Crystallization
Automated HT
Imaging
Integration of custom and commercial instrumentation into
an HT pipeline.
Data tracking parallels the pipeline collecting 424
parameters from 28 stages.
Graphical interfaces to the experimental stages.
Feedback to the experimental pipeline and Integrated LIMS.
JCSG
DB
Semi-Auto
Publication
Structure Validation
& Deposition
Semi-automated
Structure determination
Automated HT
Crystal Screening
+ Data Collection
T. maritima High-throughput Proteome Screen
                                          Original Tier-1 screen    Jan 2006
Targets                                   1877   100%               1877   100%
Amplifers                                 1791    95%               1791    95%
Expression Clones                         1376    73%               1738    93%
Proteins Attempted Expression             1376    73%               1738    93%
Proteins Delivered for Crystallography     542    29%                765    41%
Proteins Crystallized                      456    24%                714    38%
Original Tier-1 screen: single pass, single step purification, crystals from initial screen.
• 8/01 assembled collection of TM clones for expression
• 8/01 began processing for expression
• 1/02 completed expression / purification / crystal setup
• 2/02 completed image analysis of crystallization screen
Lesley S et al., Proc. Natl. Acad. Sci. USA, 99, 11664-11669 (2002)
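The percentages in the screen table are each stage count divided by the 1877 targets. The following is a minimal Python sketch of that arithmetic, not part of the original slides; the dictionary and function names are illustrative assumptions, and only the counts come from the slide.

```python
# Stage counts copied from the slide; names and layout are illustrative.
TOTAL_TARGETS = 1877

tier1_screen = {
    "Targets": 1877,
    "Amplifers": 1791,
    "Expression Clones": 1376,
    "Proteins Attempted Expression": 1376,
    "Proteins Delivered for Crystallography": 542,
    "Proteins Crystallized": 456,
}

jan_2006 = {
    "Targets": 1877,
    "Amplifers": 1791,
    "Expression Clones": 1738,
    "Proteins Attempted Expression": 1738,
    "Proteins Delivered for Crystallography": 765,
    "Proteins Crystallized": 714,
}


def percent_of_targets(counts, total=TOTAL_TARGETS):
    """Return each stage count as a whole-number percentage of all targets."""
    return {stage: round(100 * n / total) for stage, n in counts.items()}


if __name__ == "__main__":
    for label, counts in (("Tier-1 screen", tier1_screen), ("Jan 2006", jan_2006)):
        print(label)
        for stage, pct in percent_of_targets(counts).items():
            print(f"  {stage}: {counts[stage]} ({pct}%)")
```

Rounding to whole percentages reproduces the figures printed on the slide for both snapshots.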
Growing structural coverage of T. maritima
• From 1877 ORFs, 1738 clones attempted expression, 765 made it to crystallization trials.
• 350 targets screened, 180 with data collected, 168 solved and 162 deposited in PDB
• Structures: 20 novel features, 15 new folds
TM0875
orphan
• Direct structural coverage of 32% of the expressed soluble proteins and ~12% of the proteome
(227 unique PDB structures; 269 total).
• With homology and fold recognition models, structural coverage of T. maritima reaches 80% of
soluble proteins and more than 91% of predicted crystallizable proteins, one of the highest
structural coverages for a single organism.
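The attrition quoted in the bullets on this slide (ORFs through to PDB deposition) can be read as a chain of stage-to-stage yields. The following is a small Python sketch under that reading; the stage labels are paraphrased from the bullets and the function name is a hypothetical helper, not something from the JCSG pipeline.

```python
# Stage counts quoted on the slide; stage labels are paraphrased.
funnel = [
    ("ORFs", 1877),
    ("clones attempted expression", 1738),
    ("crystallization trials", 765),
    ("targets screened", 350),
    ("data collected", 180),
    ("solved", 168),
    ("deposited in PDB", 162),
]


def step_yields(stages):
    """Return (from_stage, to_stage, fraction retained) for consecutive stages."""
    return [
        (a_name, b_name, b_count / a_count)
        for (a_name, a_count), (b_name, b_count) in zip(stages, stages[1:])
    ]


if __name__ == "__main__":
    for src, dst, frac in step_yields(funnel):
        print(f"{src} -> {dst}: {frac:.1%}")
```

Listing the yields side by side makes it easy to see which transitions lose the most targets.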
Structural coverage of T. maritima proteome
[Plot: % covered (0 to 80) versus year (1980 to 2005), with curves for: in PDB; sequence identity > 30%; BLAST e-value < 0.001; FFAS score < -9.5. Annotation: ~73% of feasible targets.]
Strong synergy with core PX program at SSRL
Y05
Y01
PSI-1
Minimal production
Extensive technology development
Prototyping
1 scientist
2 software developers
1 engineer
PSI-2
Large-scale production
Focused development effort
Testing, debugging and feedback
9 scientists
2 software developers
1 research associate
Crystal Screening statistics by year
Year      # crystals screened
2001        235
2002       1530
2003       6262
2004       4253*
2005       6807
2006       1884**
PSI-2      5127
* Reduced screening activity due to SSRL Spear III upgrade
** Covers first quarter 2006
Overall >21000 crystals screened
Average time from receipt of crystal to completion of screening reduced to 4 days
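As a cross-check on the yearly chart, the per-year counts can be summed and accumulated. The sketch below assumes each count belongs to the year label that follows it in the flattened chart; that pairing, like the variable and function names, is an assumption, and the 2006 figure covers the first quarter only.

```python
from itertools import accumulate

# Counts per year as paired with the chart's year labels (assumed reading).
# 2006 covers the first quarter only, per the slide's footnote.
crystals_by_year = {
    2001: 235,
    2002: 1530,
    2003: 6262,
    2004: 4253,
    2005: 6807,
    2006: 1884,
}


def running_totals(counts):
    """Return {year: cumulative crystals screened through that year}."""
    years = sorted(counts)
    return dict(zip(years, accumulate(counts[y] for y in years)))


if __name__ == "__main__":
    for year, total in running_totals(crystals_by_year).items():
        print(year, crystals_by_year[year], total)
```

Under this pairing the yearly counts sum to 20971, the same total reported in the target-category statistics.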
Crystal Screening Statistics by Target category
                                  TM  TM orthologs  Mouse  Mouse orthologs  PFAM   CML  Other  Total
Screened crystals              11540          1373    660             3372   872  1484   1670  20971
Screened targets                 353            61     29               58    35    52     58    643
Crystals screened per target      33            22     28               58    25    28     28     32
# Solved targets                 163            21     11               25    14    18     21    273
Unsolved ≤ 2 Å                     7             1      1                1     0     7      1     18
Unsolved 2.1 - 3 Å                32             6      1                7     7     8      9     70
Unsolved 3.1 - 4 Å                35            11      5               11     2     6     13     82
Unsolved 4.1 - 6 Å                27             1      0                1     3     5      2     39
* NMR targets are excluded.
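The per-category ratios in the statistics above are simple divisions of the counts. The following Python sketch computes crystals per target and the fraction of screened targets solved; the list layout and the `summarize` helper are illustrative assumptions. Not every printed "per target" value matches this division (the Mouse column, for instance), so the output is best read as a cross-check rather than a correction.

```python
# Counts copied from the slide, in column order (Total column omitted).
categories = ["TM", "TM orthologs", "Mouse", "Mouse orthologs", "PFAM", "CML", "Other"]
screened_crystals = [11540, 1373, 660, 3372, 872, 1484, 1670]
screened_targets = [353, 61, 29, 58, 35, 52, 58]
solved_targets = [163, 21, 11, 25, 14, 18, 21]


def summarize():
    """Return one dict per category with derived ratios."""
    rows = []
    for name, crystals, targets, solved in zip(
        categories, screened_crystals, screened_targets, solved_targets
    ):
        rows.append(
            {
                "category": name,
                "crystals_per_target": crystals / targets,
                "solved_fraction": solved / targets,
            }
        )
    return rows


if __name__ == "__main__":
    for row in summarize():
        print(
            f"{row['category']:<16}"
            f"{row['crystals_per_target']:6.1f} crystals/target, "
            f"{row['solved_fraction']:.0%} solved"
        )
```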
Streamlining JCSG structure determination
SDC software
development
Xsolve
Optimize quality
of initial phasing
Using parallel
processing
publication
CNSS
Automatically fill in gaps
in initial model to provide most
complete model for refinement
Xpleo
Build consensus model
by combining all
initial models
annotation
phasing
validation
tracing
refinement
Refinement
management
PDB
Process ensures
high quality
“deposition ready”
structures are
output
Centralized
refinement
staff
SSTS
database
Refinement
and QC list
Six dedicated scientists
Peer review/quality control
Peer-peer training
Close communication
Captures all data and
information needed for
deposition and mmCIF
generation
Tracks progress
of each target
Identifies problems
and bottlenecks
JCSG
refinement
guidelines
Uniform standards
and quality criteria
TM0449
TM0064
What is the real impact of PSI: are new folds most important?
TM0875 from T. maritima
53686717 from N. punctiforme
• New fold
• Two domains of known folds but no
recognizable sequence similarity to
known structures
• No homologs – an “orphan”
• No corresponding Pfam family
• C-terminal domain provides the first
structural template for Pfam family of over
500 sequences (PF00877)
CL1011C archaeal ATPase
• PF01637 (conserved P-loop motif)
• 2 wavelength Se-MAD, 2 molecules per asu
• 356 amino acids, 5 Met per molecule
• Xsolve solution gave 590 out of 712 residues
• Resolution 2.0A
• Completeness 95.4%
• Rfree 22.6%, Rcryst 17.4%
• Bound ADP and Mg
• PDB ID 2fna
Thermotoga maritima proteins
TM0189
Periplasmic transporter /
iron binding protein.
Expressed without N-terminal
transmembrane helix
PF01497
1.7A,
Reductively methylated
and truncated protein used
for crystallization
Ni bound, confirmed by
fluorescence scan
Coordinated to His-tag
TM0957
2.25A, P21
20 homologs in different organisms
New PFAM
Core OB-fold with insertions
Reductively methylated and truncated protein used for crystallization
4 mols/asu
TM1010
PF07883
1.9A and 1.0A, two crystal forms
conformational change
Common cupin fold with low sequence identity
2 unknown ligands observed
Thermotoga maritima proteins
TM1049 (endoglucanase)
PF05343
PDB ID 2fvg
TM1012 (putative nucleotidyltransferase)
1.6A (Se-MET for MAD)
This form has a UNL which may indicate new active site
1.2A (Native, reductively methylated)
Other proteins
XB5462A
MB3864A
2.6A
Acyl CoA Hydrolase
PF03061