Structural Databases the PDBe - European Bioinformatics Institute
Structure Databases:
The Protein Data Bank
Swanand Gore & Gerard Kleywegt
PDBe – EBI
May 7th 2010, 9-10 am
Macromolecular Crystallography Course
Outline
• Structural Biology and Bioinformatics
• Databases in Structural Bioinformatics
• Protein Data Bank
• PDBe
Promise of Structural Biology
• Basic research
– Insights in biophysics of folding
– Insights into Evolution
– Insights into enzymatic catalysis
• Applications
– Design of drug / antibody / epitope / pesticide / enzymes
– Design of new materials
– Understanding disease
• Structural bioinformatics
– Big computational and informatics toolbox
– Full of techniques to translate insights to application
– Databases are a vital aspect
Sequence-Structure-Function
Sequence
Determination
Archival / Retrieval
Classification
Prediction
Modelling
Structure
Design
Engineering
Searching
Mining
Comparison
Alignment
Function
A rich toolbox
Structure
Refinement
Databases
Prediction
Annotation
Structural
Bio-info-computing
Analysis
Mining
Classification
Comparison
Databases are central to the structural
bioinformatics pipeline
Align
Compare
Mine
Classify
Determine
Annotate
Primary
Structural
Databases
Model
Predict
Secondary
Structural
Databases
Databases help in Structure
Determination
• Dihedral preferences
– Ramachandran contours
– Sidechain rotamer libraries
– RNA backbone and puckers
• Likely ring conformations
– Small-molecules (CCDC)
• Molecular replacement
– Choice of probe using homology
– fragment-based MR
• Validation
– Electron density server and PrEDS
Dunbrack, R.L., Jr. Rotamer libraries in the 21st century. Curr. Opin. Struct. Biol. 12:431-440, 2002.
Jane S. Richardson et al (2008) "RNA Backbone: Consensus All-angle Conformers and Modular String Nomenclature (an RNA Ontology Consortium contribution)" RNA 14 :465-481
The Cambridge Structural Database: a quarter of a million crystal structures and rising, F. H. Allen, /Acta Cryst./, B*58*, 380-388, 2002
S.C. Lovell et al. (2003) "Structure Validation by Cα Geometry: φ,ψ and Cβ Deviation." Proteins: Structure, Function and Genetics 50, 437-450.
Claude et al. CaspR: a web server for automated molecular replacement using homology modelling. Nucleic Acids Res. 2004 Jul 1;32(Web Server issue):W606-9.
McCoy, A.J., Grosse-Kunstleve, R.W., Adams, P.D., Winn, M.D., Storoni, L.C. and Read, R.J. (2007). Phaser crystallographic software. J. Appl. Cryst. 40: 658-674.
Gubbi et al. (2007) Solving Protein Structures Using Molecular Replacement Via Protein Fragments, Lecture Notes In Artificial Intelligence;.Vol. 4578. 627.
GJ Kleywegt et al. (2004) "The Uppsala Electron-Density Server", Acta Crystallographica, D60, 2240-2249
Databases are vital to
archiving structures!
• Structures represent invaluable
scientific insights
• But it is costly to solve a
structure
– Time, effort, money
• Organize and safe-keep
painstakingly determined data
– Formal mechanisms of
arranging, searching, backing
up
• Wide-ranging access to an
invaluable repository without
compromising data integrity
• Very low cost of maintenance
in comparison with the cost of
content!
Databases are vital to archiving structures
• “Database is a structured collection of data held
in computer storage, often incorporating software
to make it accessible in various ways”
• Databases
– Provide accessibility with safety and persistence
– Provide context for your data against other data
– Facilitate comparisons and data-mining
• Primary structural databases
– Experimental data and model coordinates
• NDB, wwPDB, BMRB, CSD, EMDB
• Secondary structural databases
– Classification, function annotation
• SCOP, EC2PDB, PALI, and many many more!
Databases / Archival / Retrieval
• Formats of databases
– Flat files (csv, tsv, columnar), supporting scripts
– Relational (MySQL, Oracle): professional, indexed
• Access
– Modes: read, write, edit, delete (PDB provides entry deposition mechanisms)
– Means: Download (wwPDB ftp), Command-line or GUI (SQL queries, Oracle
desktop client), Web-based interfaces (PDBeDatabase service)
– Access frequency
• Schema design
– Tables, primary keys,
foreign keys, views….
– Normal forms: avoid
data repetition,
inconsistencies
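The schema-design points above (tables, primary keys, foreign keys, views, avoiding repetition) can be sketched with Python's built-in sqlite3 module, used here in place of MySQL or Oracle. The table, column and view names are hypothetical illustrations and not the PDBe schema; the identifier 9ZZZ is a made-up entry used only to trigger the foreign-key check.

```python
import sqlite3

# Hypothetical two-table schema: each chain row points at its entry, so
# entry-level facts (title, method) are stored once instead of per chain.
SCHEMA = """
CREATE TABLE entry (
    pdb_id TEXT PRIMARY KEY,
    title  TEXT NOT NULL,
    method TEXT NOT NULL
);
CREATE TABLE chain (
    pdb_id   TEXT NOT NULL REFERENCES entry(pdb_id),
    chain_id TEXT NOT NULL,
    PRIMARY KEY (pdb_id, chain_id)
);
CREATE VIEW entry_summary AS
    SELECT e.pdb_id, e.method, COUNT(c.chain_id) AS n_chains
    FROM entry e LEFT JOIN chain c ON c.pdb_id = e.pdb_id
    GROUP BY e.pdb_id, e.method;
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding one entry with one chain."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only on request
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO entry VALUES (?, ?, ?)",
        ("1CBS", "CELLULAR RETINOIC-ACID-BINDING PROTEIN TYPE II", "X-ray"),
    )
    conn.execute("INSERT INTO chain VALUES (?, ?)", ("1CBS", "A"))
    conn.commit()
    return conn


def orphan_chain_rejected(conn: sqlite3.Connection) -> bool:
    """Return True if a chain without a parent entry is refused."""
    try:
        conn.execute("INSERT INTO chain VALUES (?, ?)", ("9ZZZ", "A"))  # made-up ID
    except sqlite3.IntegrityError:
        conn.rollback()
        return True
    return False
```

Querying the view with `SELECT * FROM entry_summary` returns one row per entry, and the foreign key is what prevents the inconsistencies the slide mentions, such as a chain that belongs to no entry.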
Databases for Classification
• Structural hierarchy
– CATH
• Class, Architecture, Topology, Homology
– SCOP
• Class, Fold, Superfamily, Family
• Enzyme hierarchy
– EC-PDB
• Oxidoreductase, ligase, lyase, isomerase,
hydrolase, transferase.
• Functional ontology
– GOA
• Gene Ontology: Cellular component,
Biological process, Molecular Function
• Linked to structures via SIFTS
Christos A. Ouzounis et al. (2003) Classification schemes for protein structure and function. Nature Reviews Genetics 4, 508-519.
Andreeva et al. (2007) Data growth and its impact on the SCOP database: new developments. Nucleic Acids Res. 36:D419
Gene Ontology: tool for the unification of biology. The Gene Ontology Consortium (2000) Nature Genet. 25: 25-29
Barrell D. et al. (2009) The GOA database in 2009--an integrated Gene Ontology Annotation resource. Nucleic Acids Research 2009 37: D396-D403.
Databases for Comparison
• Structural and structure-sequence alignments
• Phylogeny
– Evolutionary trace
• Evolutionarily important residues
• Mapping onto structure
Mizuguchi K, Deane CM, Blundell TL, Overington JP. (1998) HOMSTRAD: a database of protein structure alignments for homologous families. Protein Science 7:2469-2471.
SISYPHUS - structural alignments for proteins with non-trivial relationships Andreeva et al, Nucleic Acid Research Database Issue 2007, 35, D253-D259
Gowri, V. S. et al. (2003). Integration of related sequences with protein three-dimensional structural families in an updated version of PALI database. Nucleic Acids Res. 2003 31: 486-488.
Bhaduri A, Pugalenthi G, Sowdhamini R. PASS2: an automated database of protein alignments organised as structural superfamilies. BMC Bioinformatics. 2004, 5:35
DBAli tools: mining the protein structure space. Marc A. Marti-Renom et al. Nucleic Acids Research, doi:10.1093/nar/gkm236
Whelan, S., P.I.W. de Bakker, & N. Goldman. (2003). Pandit: a database of protein and associated nucleotide domains with inferred trees. Bioinformatics 19:1556-1563
The Pfam protein families database:,R.D. Finn,et al, Nucleic Acids Research (2010) Database Issue 38:D211-222
Morgan, D.H., D.M. Kristensen, D. Mittleman, and O. Lichtarge. ET Viewer: An Application for Predicting and Visualizing Functional Sites in Protein Structures. Bioinformatics. 2006 Aug 15;22(16):2049-50
Databases for Annotation
• Domains
• Active / allosteric sites
• SNPs
Servant F. et al. (2002) ProDom: Automated clustering of homologous domains. Briefings in Bioinformatics. vol 3, no 3:246-251
Marchler-Bauer A,et al CDD: specific functional annotation with the Conserved Domain Database. Nucleic Acids Res. 2009 Jan;37(Database issue):D205-10
Hulo N., Bairoch A., Bulliard V., Cerutti L., Cuche B., De Castro E., Lachaize C., Langendijk-Genevaux P.S., Sigrist C.J.A. The 20 years of PROSITE. Nucleic Acids Res. 2007
SitesBase: a database for structure-based protein–ligand binding site comparisons , Nicola D. Gold and Richard M. Jackson, Nucleic Acids Research, 2006, Vol. 34, Database issue D231-D234
sc-PDB: an Annotated Database of Druggable Binding Sites from the Protein Data Bank, Esther Kellenberger et al, J. Chem. Inf. Model., 2006, 46 (2), pp 717–727
Binding MOAD, a high-quality protein–ligand database. Mark L. Benson et al, Nucleic Acids Research 2008 36(Database issue):D674-D678
SNPeffect v2.0: a new step in investigating the molecular phenotypic effects of human non-synonymous SNPs. Joke Reumers et al., Bioinformatics 2006 22(17):2183-2185
Databases for Annotation
• Residues critical to enzyme mechanism
• Surface properties, cavities: Voronoia
• Binding partners
– Small molecule: TIMBAL, CREDO
– Protein, DNA: PiBase, JAIL, BIPA
CREDO: A Protein-Ligand Interaction Database for Drug Discovery.Adrian Schreyer, Tom Blundell. Chemical Biology & Drug Design, Vol. 73, No. 2. (February 2009), pp. 157-167
BIPA: a database for protein–nucleic acid interaction in 3D structures. Semin Lee and Tom L Blundell, Bioinformatics 2009 25(12):1559-1560
PIBASE: a comprehensive database of structurally defined protein interfaces. Davis FP and Sali A, Bioinformatics. 2005 May 1;21(9):1901-7.
JAIL: a structure-based interface library for macromolecules. Stefan Günther et al. Nucleic Acids Res. 2009 January; 37(Database issue): D338–D341
Elke Michalsky et al., SuperLigands – a database of ligand structures derived from the Protein Data Bank, BMC Bioinformatics 2005, 6:122
Voronoia: analyzing packing in protein structures. Rother K et al. Nucleic Acids Res. 2009 Jan;37(Database issue):D393-5.
CASTp: Computed Atlas of Surface Topography of proteins. Binkowski et al. Nucleic Acids Res. 2003 Jul 1;31(13):3352-5.
The Catalytic Site Atlas: a resource of catalytic sites and residues identified in enzymes using structural data. Craig T. Porter, Gail J. Bartlett, and Janet M. Thornton (2004) Nucl. Acids. Res. 32: D129-D133.
Databases of Analysis / Mining
• Secondary structure: SSEP
• Protein-peptide interactions
• Active sites
• Frequent structural motifs
• Loop databases
– Protein Coil Library
– Protein Loop Classification
– Loops in Proteins
– Protein Topology Graph Library
Oliva et al (1997) An automated classification of the structure of protein loops. J Mol Biol 266 (4): 814-830.
SSEP: secondary structural elements of proteins, V. Shanthi, P. Selvarani, Ch. Kiran Kumar, C. S. Mohire and K. Sekar, Nucleic Acids Research, 2003, Vol. 31, No. 13 3404-3405
PepX: a structural database of non-redundant protein-peptide complexes. Vanhee F et al., Nucleic Acids Res. 2010 Jan;38(Database issue):D545-51.
Baeten L, et al. (2008) Reconstruction of Protein Backbones from the BriX Collection of Canonical Protein Fragments. PLoS Comput Biol 4(5): e1000083. doi:10.1371/journal.pcbi.1000083
Bystroff C & Baker D. (1998). Prediction of local structure in proteins using a library of sequence-structure motifs. J Mol Biol 281, 565-77.
LigBase: a database of families of aligned ligand binding sites in known protein sequences and structures. Stuart AC et al., Bioinformatics. 2002 Jan;18(1):200-1.
PTGL—a web-based database application for protein topologies. Patrick May et al. Bioinformatics 2004 20(17):3277-3279; doi:10.1093/bioinformatics/bth367
Fitzkee, N. C., Fleming, P. J, Rose G. D. (2005) The Protein Coil Library: a structural database of nonhelix, nonstrand fragments derived from the PDB. Proteins. 58 (4): 852-4.
Databases in Prediction
• Oligomeric state
– PISA at PDBe
• 3D coordinates
– ab-initio folding
– homology models
• Possible binding partners and binding modes
– small-molecule (PRECISE)
– protein-protein (ADAN)
• Dynamics, conformational changes
– MolMovDB
• Cellular location
LOC3D: annotate sub-cellular localization for protein structures. Nair R, Rost B., Nucleic Acids Res. 2003 Jul 1;31(13):3337-40.
MolMovDB: analysis and visualization of conformational change and structural flexibility. Echols N et al., Nucleic Acids Res. 2003 Jan 1;31(1):478-82.
ADAN: a database for prediction of protein-protein interaction of modular domains mediated by linear motifs. Encinar JA et al., Bioinformatics. 2009 Sep 15;25(18):2418-24. Epub 2009 Jul 14.
PRECISE: a Database of Predicted and Consensus Interaction Sites in Enzymes . Shu-Hsien Sheu et al., Nucleic Acids Research, 2005, Vol. 33, Database issue D206-D211
MODBASE, a database of annotated comparative protein structure models and associated resources. Ursula Pieper et al., Nucleic Acids Research 37, D347-D354, 2009.
Krissinel E, Henrick K. Inference of macromolecular assemblies from crystalline state. J. Mol. Biol. (2007) 372:774–797.
S. M. Larson . Folding@Home and Genome@Home: Using distributed computing to tackle previously intractable problems in computational biology. Mod Meth Comp Biol, R. Grant, ed, Horizon Press (2003)
Specialized databases with structures
• MCSIS (GPCRs, Prions etc)
• Antibodies (Abysis)
• Lysozymes
• Carbohydrates
– KEGG Glycans
Abysis: http://www.bioinf.org.uk/abysis/
Horn F., Vriend G., Cohen FE. Collecting and harvesting biological data: the GPCRDB and NucleaRDB information systems. Nucleic Acids Res. 29:346-349 (2001)
LySDB - Lysozyme Structural DataBase. Mohan KS et al., Acta Crystallogr D Biol Crystallogr. 2004 Mar;60(Pt 3):597-600.
The Protein Data Bank
• Unique primary database
– Single archive of experimentally determined macromolecular
(biopolymer) structures
– ~ 65000 entries
– Distributed online
– Updated weekly
– Numerous databases derived and enriched with PDB data
– Many front ends: RCSB, PDBe, PDBsum, OCA, MMDB, Jena, SIB
• “The PDB” is a flat-file archive
– PDB formatted coordinate files
– any experimental data when submitted
The Protein Data Bank
• International Effort
– Curated by RCSB, PDBe, PDBj, BMRB
– ftp archive currently operated by RCSB
FTP traffic at PDB sites
RCSB PDB
200 million
data downloads
PDBe
37 million
data downloads
PDBj
14 million
data downloads
The Protein Data Bank
• When is a biopolymer PDB-worthy?
– Polypeptides
• Gene products
• Non-ribosomal
• Synthetic peptides > 23 residues
– Unless clearly biologically significant
– Polynucleotides
• > 3 residues
– Sugars
• > 3 sugar residues
– Fibers
• Only repeating unit deposited
Annual Growth of PDB
[Chart: annual growth of the PDB, 1972-2009, by method (X-ray, NMR, EM); vertical axis 0-9000]
Dominated by x-ray!
EM rising…
Primary databases differ by orders of magnitude in size.
PDB: < 10^5 structures
UniprotKB: 10^7 protein sequences
GenBank: 10^11 base pairs, 10^8 gene sequences
http://www.rcsb.org/pdb/statistics/contentGrowthChart.do?content=total&seqid=100
http://www.ncbi.nlm.nih.gov/Genbank/genbankstats.html
http://www.ebi.ac.uk/uniprot/TrEMBLstats/
Redundancy in PDB
(as of Nov '08)
• Entries > 54,000
• Chains > 120,000
– Copies of a chain in the same entry (homo-oligomers)
– Same chains in different entries: determined by multiple labs, determined under different conditions, complexed with different partners, mutants
• Chains < 8700 at seq.id < 30%
– Orthologs, paralogs are very similar
• Using non-redundant chains from PDB
– PISCES server
– WHATIF, CATH, SCOP, DALI sets
G. Wang and R. L. Dunbrack, Jr. PISCES: a protein sequence culling server. Bioinformatics, 19:1589-1591, 2003.
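The simplest source of redundancy listed above, identical chains repeated within and across entries, can be collapsed with a few lines of Python. This is only a sketch of exact-sequence grouping; culling at a sequence-identity threshold such as the 30% quoted above needs pairwise alignment, as done by servers like PISCES, and is not attempted here. The entry names and sequences in the usage note are made up.

```python
from collections import defaultdict


def group_identical_chains(chains):
    """Group chain keys by exact sequence.

    chains maps (entry_id, chain_id) -> one-letter sequence string.
    Returns a dict: sequence -> list of (entry_id, chain_id) keys.
    """
    groups = defaultdict(list)
    for key, sequence in chains.items():
        groups[sequence].append(key)
    return dict(groups)


def representatives(chains):
    """Pick one chain per distinct sequence (the smallest key in each group)."""
    groups = group_identical_chains(chains)
    return sorted(min(members) for members in groups.values())
```

For example, with toy data `{("E1", "A"): "MKV", ("E1", "B"): "MKV", ("E2", "A"): "MKV", ("E3", "A"): "GSH"}`, the homo-oligomer copy and the repeat in a second entry collapse onto `("E1", "A")`, leaving two representatives.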
File formats at PDB
• The .pdb format
– Header
• Remarks: experimental setup, refinement details, oligomeric state, deviations from expected geometry
• Biochemical entities: biopolymers, het groups
– Coordinates
• 3D model of the entity
• Multiple coordinates for the same entity can exist: MODELs, altloc identifiers
• Structure factors
– .cif file
File formats at PDB
XML
mmCIF
The PDB format: header
123456789+123456789+123456789+123456789+123456789+123456789+123456789+123456789+
HEADER    RETINOIC-ACID TRANSPORT                 28-SEP-94   1CBS      1CBS   2
COMPND    CELLULAR RETINOIC-ACID-BINDING PROTEIN TYPE II COMPLEXED      1CBS   3
COMPND   2 WITH ALL-TRANS-RETINOIC ACID (THE PRESUMED PHYSIOLOGICAL     1CBS   4
COMPND   3 LIGAND)                                                      1CBS   5
SOURCE    HUMAN (HOMO SAPIENS)                                          1CBS   6
SOURCE   2 EXPRESSION SYSTEM: (ESCHERICHIA COLI) BL21 (DE3)             1CBS   7
SOURCE   3 PLASMID: PET-3A                                              1CBS   8
SOURCE   4 GENE: HUMAN CRABP-II                                         1CBS   9
AUTHOR    G.J.KLEYWEGT,T.BERGFORS,T.A.JONES                             1CBS  10
REVDAT   1   26-JAN-95 1CBS    0                                        1CBS  11
Columns 1-6: record type
Columns 7-72: human-readable, mostly textual information
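Because the header is column-based, it can be read by slicing fixed character ranges. The sketch below splits a line into record type (columns 1-6) and text (columns 7-72) as labelled on the slide; the finer HEADER sub-fields (classification in columns 11-50, deposition date in 51-59, ID code in 63-66) follow the published PDB format description. The function names are illustrative, and the demo line is rebuilt from the slide's values with padding so the columns line up.

```python
def split_record(line: str) -> tuple[str, str]:
    """Split one PDB line into (record type, text of columns 7-72)."""
    padded = line.rstrip("\n").ljust(80)
    # Python slices are 0-based and end-exclusive: columns 1-6 -> [0:6].
    return padded[0:6].strip(), padded[6:72].strip()


def parse_header(line: str) -> dict[str, str]:
    """Extract the sub-fields of a HEADER record by column position."""
    padded = line.rstrip("\n").ljust(80)
    if padded[0:6] != "HEADER":
        raise ValueError("not a HEADER record")
    return {
        "classification": padded[10:50].strip(),   # columns 11-50
        "deposition_date": padded[50:59].strip(),  # columns 51-59
        "id_code": padded[62:66].strip(),          # columns 63-66
    }


# Demo line assembled from the values shown on the slide; ljust() does the
# column padding so no spaces have to be counted by hand.
demo_line = (
    "HEADER".ljust(10)
    + "RETINOIC-ACID TRANSPORT".ljust(40)
    + "28-SEP-94"
    + "   "
    + "1CBS"
).ljust(72) + "1CBS   2"

print(split_record(demo_line)[0])
print(parse_header(demo_line))
```

Slicing by column rather than splitting on whitespace matters here because fields such as the classification contain spaces themselves.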
The PDB format: coordinates
Fields, left to right: record type, atom nr, atom name, residue type, chain name, residue nr, X, Y, Z coordinates, occupancy, "B-factor", element symbol
HETATM    1  C   ACE A   0       4.279  14.829  14.190  1.00 19.08           C
HETATM    2  O   ACE A   0       3.706  14.098  15.038  1.00 20.62           O
HETATM    3  CH3 ACE A   0       3.827  16.236  14.001  1.00 20.22           C
ATOM      4  N   MET A   1       5.514  14.621  13.695  1.00 17.77           N
ATOM      5  CA  MET A   1       6.269  13.401  13.959  1.00 16.51           C
ATOM      6  C   MET A   1       6.702  13.319  15.400  1.00 16.41           C
ATOM      7  O   MET A   1       7.036  12.248  15.870  1.00 15.38           O
ATOM      8  CB  MET A   1       7.529  13.301  13.085  1.00 16.52           C
ATOM      9  CG  MET A   1       7.292  12.805  11.676  1.00 16.48           C
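The coordinate records are fixed-width too, so each field named on the slide can be pulled out by column slicing. This is a minimal sketch that assumes the standard ATOM/HETATM column layout of the PDB format; the `Atom` type and `parse_atom` function are illustrative names, and the demo line is built with a format string from one row of the slide's values.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    record: str       # ATOM or HETATM
    serial: int       # atom nr
    name: str         # atom name
    res_name: str     # residue type
    chain: str        # chain name
    res_seq: int      # residue nr
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float


def parse_atom(line: str) -> Atom:
    """Parse one ATOM/HETATM line using 0-based, end-exclusive slices."""
    return Atom(
        record=line[0:6].strip(),       # columns 1-6
        serial=int(line[6:11]),         # columns 7-11
        name=line[12:16].strip(),       # columns 13-16
        res_name=line[17:20].strip(),   # columns 18-20
        chain=line[21],                 # column 22
        res_seq=int(line[22:26]),       # columns 23-26
        x=float(line[30:38]),           # columns 31-38
        y=float(line[38:46]),           # columns 39-46
        z=float(line[46:54]),           # columns 47-54
        occupancy=float(line[54:60]),   # columns 55-60
        b_factor=float(line[60:66]),    # columns 61-66
    )


# One row from the slide, laid out with a format string so the widths are exact.
demo_line = "%-6s%5d %-4s %3s %s%4d    %8.3f%8.3f%8.3f%6.2f%6.2f" % (
    "ATOM", 5, " CA", "MET", "A", 1, 6.269, 13.401, 13.959, 1.00, 16.51
)

print(parse_atom(demo_line))
```

Whitespace splitting would work for this small example but breaks on real files where adjacent fields touch, for instance large negative coordinates, which is why the column positions are used instead.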
Protein Data Bank in Europe
• PDBe
– European node of wwPDB
– Started 1996 as MSD at EBI
– Deposition site since 1999
– Started EMDB in 2002
• PDBe operations
– Handle deposition and annotation of PDB and EMDB entries
– Build advanced structure databases
– Build services for search, browsing, analysis
– Liaise with broader structural biology community
– Coordinate with other databases e.g. Uniprot
• Funding
PDBe: Protein Data Bank in Europe. S. Velankar et al.,
PDBe Deposition and Annotation
AutoDep Deposition Tool
• Checks
– Is format correct?
– Are biopolymer sequences in biochemical entities consistent with 3D models?
– Are hetero groups named correctly?
– Where does the model deviate from expected geometry?
• Record various types of information
– Experiment: Method, conditions, data resolution, spacegroup, completeness etc.
– Sample: source, expression system, engineered etc.
– Refinement: program, target
AutoDep provides valuable
information to depositors
• Validation of structure factors
– EDS criteria
http://www.ebi.ac.uk/pdbe-xdep/autodep/index.jsp
AutoDep provides valuable
information to depositors
Heterogen summary and Validation against ideal representations of ligands
AutoDep provides valuable
information to depositors
Oligomeric state - PQS
Sequence-structure alignment
Uniprot, Pfam, Interpro
AutoDep provides valuable
information to depositors
• Revisions, withdrawal, release
– Release sequence-only
immediately
– Release coordinates
immediately
– Hold for 1 year
– Release after publication
• Communication with depositors
– Help depositors understand and
conform to PDB standards
– Discussing errors
PDBe Services
PISA, SSM/ PDBeFold, PDBeMotif, PDBeChem, SIFTS, PDBeStatistics, PDBeSearch, PDBeView
PDBeView – the Atlas pages
http://www.ebi.ac.uk/pdbe-srv/view/
PDBe Services
PDBeFold (SSM): has my fold been seen before? Or is it novel!
PDB
???
E. Krissinel and K. Henrick, Secondary-structure matching (SSM), a new tool for fast protein structure alignment in three dimensions. Acta Cryst. (2004). D60, 2256-2268.
PDBe Services
PDBeFold (SSM)
• Why compare structures?
– Reveal conformational changes
• Ligands, mutations, crystal packing, pH..
– Judge structural variability
• NMR ensembles, structure families
– Discover common structural motifs
– Identify fold
– Infer function
• Sequence-alignments do not work well for
distant evolutionary relationships
• Structures diverge much more slowly than
sequences
• Structure improves quality of alignment
• Better inference of function, e.g. when
active sites match well
The relation between the divergence of sequence and structure in proteins. Chothia C, Lesk AM. EMBO J. 1986 Apr;5(4):823-6.
PDBe Services
PDBeFold (SSM) algorithm
[Figure: two structures drawn as graphs of secondary-structure elements, helices (H) and strands (S)]
• Match SSE graphs to get initial alignment
• Iterative expansion of Cα-alignment
PDBe Services
PDBeFold (SSM)
SSM can carry out genuine multiple structure
alignment to reveal a motif common to a
family of structures
PDBe Services
PDBePISA
• What is the likely biological assembly of a given structure?
• Can I learn about it from crystal-packing of chains?
[Diagram: PDB file (ASU) + crystal symmetry → PISA → biological unit]
PISA: generate possible assemblies, rank according to free energy
PDBe Services
PDBePISA
PDB entry 1P30
A monomer?
Biological unit 1P30
Homotrimer!
PDBe Services
PDBePISA
PDB entry 2TBV
A trimer?
Biological Unit 2TBV
180-mer!
PDBe Services
PDBePISA
PDB entry 1E94
2 Biological Units in 1E94:
A dodecamer and a hexamer!
PDBe Services
PDBeMotif
• A very powerful engine to search PDB
• Structure-sequence general searches
– Chemical substructure
– Predefined frequent motifs
– Arbitrary secondary structure patterns
– Φψ patterns
– Protein sequences: Prosite motif, Uniprot, CSA accessions; raw sequence; regular expression
– Interactions between ligands, protein
– Seq-distance between protein motifs
• PDB header searches
• Specialized searches
– Environment around an interaction
– Motif binding
– Occurrence of a motif inside another
MSDmotif: exploring protein sites and motifs. Adel Golovin and Kim Henrick. BMC Bioinformatics 2008, 9:312
Staurosporine Kinase
inhibitor
PDBe Services
PDBeMotif: which motif does my substructure bind often?
PDBe Services
PDBeMotif: which ligands and chemical fragments does my sequence motif bind?
Tyrosine protein kinase-specific active-site signature:
[LIVMFYC]-{A}-[HY]-x-D-[LIVMFY]-[RSTAC]-{D}-{PF}-N-[LIVMFYC](3)
Chemical
fragments
Motif
binding
statistics
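A PROSITE-style signature such as the tyrosine kinase pattern above maps almost directly onto a regular expression, which is one way to run the "regular expression" style of sequence search. The converter below is a minimal sketch covering only the syntax used in this pattern plus the terminal anchors; it is not how PDBeMotif is implemented. The pattern is written with full PROSITE hyphenation, and the test sequence is a short illustrative string.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE pattern into Python regular-expression syntax.

    Handles: '-' separators, x (any residue), [ABC] (allowed residues),
    {ABC} (forbidden residues), (n) or (n,m) repeats, and < > anchors.
    """
    regex = pattern.strip().rstrip(".")
    regex = regex.replace("-", "")                      # separators carry no meaning
    regex = regex.replace("{", "[^").replace("}", "]")  # {A}  -> [^A]
    regex = regex.replace("(", "{").replace(")", "}")   # (3)  -> {3}
    regex = regex.replace("x", ".")                     # x    -> any residue
    regex = regex.replace("<", "^").replace(">", "$")   # sequence termini
    return regex


TYR_KINASE = "[LIVMFYC]-{A}-[HY]-x-D-[LIVMFY]-[RSTAC]-{D}-{PF}-N-[LIVMFYC](3)"
TYR_KINASE_RE = re.compile(prosite_to_regex(TYR_KINASE))


def find_motif(sequence: str):
    """Return (start, matched text) for the first hit, or None."""
    match = TYR_KINASE_RE.search(sequence)
    return (match.start(), match.group()) if match else None
```

The brace replacement has to happen before the parenthesis replacement; otherwise the freshly created `{3}` repeat would itself be rewritten into a character class.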
PDBe Services
PDBeMotif: what does a sequence motif look like in 3D?
Tyrosine protein kinase-specific active-site signature:
[LIVMFYC]-{A}-[HY]-x-D-[LIVMFY]-[RSTAC]-{D}-{PF}-N-[LIVMFYC](3)
Sequence hits
3D alignment
PDBe Services
PDBeMotif: which sequences often host a Ramachandran path?
3D fragment
φ/ψ sequence
-156/-155,-103/17,-134/161
Search
Sequence pattern
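The matching step behind a φ/ψ search can be illustrated as a sliding-window comparison of backbone angles. This is a sketch under stated assumptions: the 30-degree tolerance and the toy backbone in the usage note are invented for illustration, and how PDBeMotif actually scores a Ramachandran path is not described on the slide. Only the three-residue query comes from the slide.

```python
def angle_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two angles in degrees (periodic)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def find_path(backbone, path, tol=30.0):
    """Return start indices where consecutive residues follow the phi/psi path.

    backbone and path are lists of (phi, psi) tuples in degrees; tol is an
    assumed per-angle tolerance.
    """
    hits = []
    for start in range(len(backbone) - len(path) + 1):
        window = backbone[start:start + len(path)]
        if all(
            angle_diff(phi, q_phi) <= tol and angle_diff(psi, q_psi) <= tol
            for (phi, psi), (q_phi, q_psi) in zip(window, path)
        ):
            hits.append(start)
    return hits


# Query path from the slide: -156/-155, -103/17, -134/161
QUERY = [(-156.0, -155.0), (-103.0, 17.0), (-134.0, 161.0)]
```

With a made-up backbone `[(-60, -45), (-150, -160), (-100, 20), (-140, 165), (-60, -45)]`, `find_path(backbone, QUERY)` reports a hit starting at index 1; collecting the residue types at such hits across many structures would give the sequence pattern the slide asks about.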
PDBe Services
PDBeAnalysis: selections and statistics
• Structure Statistics
• frequency plots on 1 or 2 properties of entries
• Residue Statistics
• Choose residues and make frequency plots of a property
• Choose residues in entry meeting certain filters, and plot their
property
• Atom Statistics
• Choose atom-sets in entries and plot distances, angles, dihedrals
between them
• Structure Selection
• Create a subset of entries using various filters
• Database Browser
• Web-based SQL query page to internal database
• Geometric Validation coupled with 3D viewer
http://www.ebi.ac.uk/pdbe-as/pdbevalidate/
PDBe Services
PDBeAnalysis: selections and statistics
CA1-CA2-CA3-CA4
Torsion distribution
Low res
Resolution vs Rfactor
High res
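A torsion distribution like the CA1-CA2-CA3-CA4 plot is built from one dihedral angle per set of four consecutive atoms. The function below is a minimal pure-Python sketch of that geometric calculation using the usual atan2 formulation; it says nothing about how PDBeAnalysis computes its statistics, and the coordinates in the usage note are made up.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p1, p2, p3, p4) -> float:
    """Signed torsion angle in degrees defined by four points.

    0 is cis (eclipsed), +/-180 is trans; the sign is positive when p4 is
    rotated clockwise relative to p1, looking from p2 towards p3.
    """
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)   # normals of the two planes
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))
```

For instance, `dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))` gives 90 degrees; applying the function along a chain of Cα coordinates and histogramming the results yields a distribution of the kind shown on the slide.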
PDBe Services
PDBeAnalysis: geometric validation
AstexViewer coordinated with plots
Table and plot of geometric checks
Phi-psi, chi, omega, B-value,
bonds, angles, chiralities
PDBe Community Work
• X-ray
– CCP4 software: MMDB,
PISA, SSM, harvesting
– Validation Task Force
• NMR
– CCP-NMR software
– Validation task force
• EM
– Validation and
standards
– Ongoing software
development
• SIFTS - coordinating
with other biodatabases
• CAPRI - Provide
infrastructure for
submission and
maintenance of entries
• PiMS – Information
management system for
protein crystallography
experiments
PDBe Community Work
• EuroCarbDB
– Databases and bioinformatic tools in
glycobiology and glycomics
• BIObar
– A toolbar for browsing biological data and
databases, a Mozilla plugin for your browser
• Outreach and training
– Roadshows: invite us!
– Tutorials
PDBe Services: Future Emphasis
• To go from being a historic structural archive to a valuable
resource for structural biomedicine
• PDBeXplore
– Provide relevant interesting avenues to access structural
information
– Ligands, Assemblies, Enzymes, GO, CATH, Sequences,
Publications, Pathways
• PDBe Validation Resource
– Provide a comprehensive battery of validation tools during
deposition and to the end-user
– Migrate and enhance EDS server
– Partner with CCDC to bring cutting edge ligand validation
Summary
• Structural Bioinformatics and Biocomputing are essential to
fulfilling the promise of structural biology
• Databases are indispensable to all aspects of structural
bioinformatics
• PDB is the primary repository of structures and numerous
databases are developed based on PDB.
• PDBe provides high-quality services to depositors and end-users,
and is an active member of structure-determination community.
• PDBe is open to all suggestions to make our services better and
more relevant to your work.
Acknowledgements
• Alejandro and organizers at IPMont
• PDBe group
– Sameer Velankar, Jawahar Swaminathan
• Designers, developers, maintainers of various
structural databases at PDBe and elsewhere