What is Proteomics?
What are all these proteins for?
Putative functions
Known functions
Unknown functions
What is Proteomics?
• Proteomics: a newly emerging field of life science research that uses high-throughput (HT) technologies to display, identify and/or characterize all the proteins in a given cell, tissue or organism (i.e. the proteome).
Genomic Data Sources
Vertical Genomics: genome, transcriptome, proteome, metabolome, physiome
Dinner discussion: Integrative Bioinformatics & Genomics VU
Genome/proteome analysis
• Until recently, researchers studied the expression of one gene at a time
• It is now possible to study the expression of all the genes of an organism simultaneously (this can help us better understand the function of individual genes in their cellular context)
Structural Proteomics
• Elucidating all 3D structures of
proteins in the cell
• This is also called Structural Genomics
• Finding out what these proteins do
• This is also called Functional Genomics
Proteome – by the dictionary
• The term proteome was coined in 1994 as a linguistic equivalent to the concept of genome.
Proteome: the complete set of proteins that is expressed, and modified, by the entire genome in the lifetime of a cell.
Practical: the complement of proteins expressed by a cell at any one time.
Proteomics – by the dictionary
Proteomics (practical): the study of the proteome using technologies for large-scale protein separation and identification.
Large-scale separation: 2DE, liquid chromatography
Identification: MALDI MS, tandem MS/MS, FT-MS, …
Proteomics by Medline
maturation of a field
From 220 publications in the previous millennium (‘94-’99)
To 21,350 (!!!) publications in this millennium (‘00-’05)
[Figure: Medline proteomics papers and reviews per year, 1997-2004]
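The two Medline totals cover windows of equal length (1994-1999 and 2000-2005, six years each), so they can be compared directly. A quick sketch of the arithmetic, using only the counts quoted on the slide:

```python
# Medline publication counts quoted on the slide
early = {"window": (1994, 1999), "papers": 220}
late = {"window": (2000, 2005), "papers": 21_350}

def per_year(entry):
    """Average papers per year; both ends of the window are inclusive."""
    start, end = entry["window"]
    return entry["papers"] / (end - start + 1)

fold = late["papers"] / early["papers"]
print(f"total: {fold:.0f}-fold increase")                            # total: 97-fold increase
print(f"per year: {per_year(early):.1f} -> {per_year(late):.1f}")    # per year: 36.7 -> 3558.3
```

Averaging per year only rescales both totals by the same factor of six, so the roughly 97-fold jump holds either way.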
Proteomics – by Google
the ultimate truth...
Proteomics
Genomics
886,000 hits (2004)
4,700,000 hits (2005)
2,070,000 hits (2004)
16,000,000 hits (2005)
3 Kinds of Proteomics
• Expressional Proteomics
– Electrophoresis, Protein Chips, DNA Chips
– Mass Spectrometry, Microsequencing
• Functional Proteomics
– HT Functional Assays, Ligand Chips
– Yeast 2-hybrid, Deletion Analysis, Motif Analysis
• Structural Proteomics
– High throughput X-ray Crystallography/Modelling
– High throughput NMR Spectroscopy/Modelling
Proteome-wide structural determination
Expressional Proteomics
2-D Gel
QTOF Mass Spectrometry
Expressional Proteomics
Prostate tumor
Normal
Structural Proteomics
• High-throughput protein structure determination
Structural Proteomics: The Goal
What is Structural Biology?
Sequence → 3D structure
MESDAMESETMESSRSMYN
AMEISWALTERYALLKINCAL
LMEWALLYIPREFERDREVIL
MYSELFIMACENTERDIRATV
ANDYINTENNESSEEILIKENM
RANDDYNAMICSRPADNAPRI
MASERADCALCYCLINNDRKI
NASEMRPCALTRACTINKAR
KICIPCDPKIQDENVSDETAVS
WILLWINITALL
Structural Scales
[Figure: from structures (polymerase, helicase, primase, SSBs) to complexes, assemblies, cell and organism; system dynamics]
The Protein Fold Universe
How big is it??? 500? 2000? 10000? ∞?
Why Structural
Proteomics?
• Structure → Function
• Structure → Mechanism
• Structure-based Drug Design
• Solving the Protein Folding Problem
• Keeps Structural Biologists Employed
Structural Genomics
“The next step beyond the human genome project”
From the NIH Request for Proposals for Structural Genomics Centers:
“These studies should lead to an understanding of structure/function
relationships and the ability to obtain structural models of all proteins
identified by genomics. This project will require the determination of a
large number of protein structures in a high-throughput mode.”
Structural Genomics/Proteomics
Subfields
Protein Production
Cloning, expression (e.g., cell-based and cell-free methodologies),
purification and labeling of proteins
Biophysical Characterization/Structure Determination
NMR, X-ray crystallography
Bioinformatics
Algorithms and databases for biophysical data comparison,
prediction methods, homology/molecular modeling,
structure refinement, in silico screening
Rational Drug Design
target identification and optimization
Protein Chip (data array)
Proteomics – in view of other fields
[Figure: proteomics in relation to other fields: genome, genomics, protein science, structural biology, biochemistry, cell, databases, data mining, molecular evolution, biotechnology, nanotechnology, imaging, chemistry, applications]
Proteome
Proteomics Tools
• Identification (mass spectrometry)
• Assignments
• Production and 3D structure determination
• Protein-protein interaction
Systems Biology & Cell Simulation
By the end...
Sequence
2D-Gel
Gene Chip
Bioinformatics
Atomic Resolution Structural Biology
Organ → Tissue → Cell → Molecule → Atoms
• A cell is an organization of millions of molecules
• Proper communication between these molecules is
essential to the normal functioning of the cell
• To understand communication: *determine the arrangement of atoms*
Atomic Resolution Structural Biology
Determine atomic structure to
analyze why molecules interact
The Reward: Understanding → Control
[Figure: duocarmycin (anti-tumor activity): shape and atomic interactions]
Atomic Structure in Context
[Figure: RPA in the context of the NER, BER and RR pathways]
Molecule → Pathway → Activity
Structural Genomics → Structural Proteomics → Systems Biology
The Strategy of Atomic
Resolution Structural Biology
• Break down complexity so that the system
can be understood at a fundamental level
• Build up a picture of the whole from the
reconstruction of the high resolution pieces
• Understanding basic governing principles
enables prediction, design, control
High-throughput Biological Data
• Enormous amounts of biological data are being generated by high-throughput capabilities; even more are coming
– genomic sequences
– gene expression data
– mass spec. data
– protein-protein interaction
– protein structures
– ......
Structural Genomics Pipeline
Genomic-based target selection → Isolation, expression, purification, crystallization → Data collection → Structure determination → Functional annotation → PDB deposition & release → Publication
[Figure: number of released PDB entries per year]
History of the PDB
1970s
– Community discussions about how to establish an archive of protein
structures
– Cold Spring Harbor meeting in protein crystallography
– PDB established at Brookhaven (October 1971; 7 structures)
1980s
– Number of structures increases as technology improves
– Community discussions about requiring depositions
– IUCr guidelines established
– Number of structures deposited increases
1990s
– Structural genomics begins
– PDB moves to RCSB
2000s
– wwPDB formed
Protein structural data explosion
Protein Data Bank (PDB): 14,500 structures (6 March 2001): 10,900 X-ray crystallography, 1,810 NMR, 278 theoretical models, others...
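The method breakdown can be turned into shares of the total. This is a back-of-the-envelope sketch; it assumes that "others" is simply whatever the three listed methods do not cover, which may not hold exactly if the slide's totals were rounded.

```python
# Counts from the slide: PDB holdings on 6 March 2001
total = 14_500
by_method = {
    "X-ray crystallography": 10_900,
    "NMR": 1_810,
    "theoretical models": 278,
}
# Assumption: "others" is the remainder not accounted for by the listed methods
by_method["others"] = total - sum(by_method.values())

# Fraction of all entries contributed by each method
shares = {method: count / total for method, count in by_method.items()}
for method, share in shares.items():
    print(f"{method:<22} {by_method[method]:>6,} {share:6.1%}")
```

Under that assumption X-ray crystallography accounts for about three quarters of the entries, NMR for about one eighth, and the residual "others" for roughly a tenth.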
Policies and Practices for
3D Coordinate Data
• Structural biology
– Release of coordinates upon publication required by
most journals worldwide
– Deposition and release required by many US funding
agencies
– Some depositions from pharmaceutical companies
• Structural genomics
– Deposition of coordinates upon completion of
refinement
– Release US: 6 weeks, International: 6 months
Sequence versus
structural data
• Despite structural genomics efforts, growth of the PDB slowed down somewhat in 2001-2002 (i.e. it did not keep up with Dickerson's formula). Structural genomics initiatives are now in full swing and growth is up again.
• More than 300 completely sequenced
genomes
Increasing gap between structural and
sequence data
[Figure: growth of sequence and structure data, 1980-2005, plotted on two y-axes (0-2,000,000 and 0-200,000)]
Structural Proteomics:
The Motivation
Protein Structure
Initiative
• Organize and recruit interested structural biologists and structural biology centres from around the world
• Coordinate target selection
• Develop new kinds of high throughput
techniques
• Solve, solve, solve, solve….
Structural Proteomics Status
• 20 registered centres (~30 organisms)
• 82,700 targets have been selected
• 52,705 targets have been cloned
• 29,855 targets have been expressed
• 12,311 targets are soluble
• 1,493 X-ray structures determined
• 502 NMR structures determined
• 1,743 structures deposited in PDB
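Read as a pipeline, the status counts show where targets drop out. The sketch below computes the yield of each stage relative to the previous one; it assumes every stage is a subset of the one before and that the X-ray and NMR counts do not overlap, neither of which the slide states explicitly.

```python
# Counts copied from the "Structural Proteomics Status" slide.
# Assumption: each stage is a subset of the previous one, and X-ray and
# NMR structures do not overlap, so the two counts can be summed.
stages = [
    ("selected", 82_700),
    ("cloned", 52_705),
    ("expressed", 29_855),
    ("soluble", 12_311),
    ("structure determined", 1_493 + 502),
]

def stage_yields(stages):
    """Return (stage, fraction of the previous stage) pairs."""
    return [
        (name, count / prev_count)
        for (_, prev_count), (name, count) in zip(stages, stages[1:])
    ]

for name, frac in stage_yields(stages):
    print(f"{name:>21}: {frac:6.1%} of previous stage")

# End-to-end success rate from selection to a determined structure
overall = stages[-1][1] / stages[0][1]
print(f"overall: {overall:.1%} of selected targets reached a structure")
```

Under these assumptions the steepest losses are at solubility (about 41% of expressed targets) and at structure determination (about 16% of soluble targets), which fits the "bottlenecks remain for some proteins" lesson of the pilot phase.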
Structural Genomics Basics
• Target strategy: systematic sampling
of protein sequence families to search
for unique protein structures
• Experimental determination of unique
protein structures in high throughput
operation
• Computational modeling of structures
of sequence family homologs
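A toy sketch of the target strategy, "systematic sampling of protein sequence families", assuming families have already been clustered and that we know which members already have a structure. The `Protein` record, the family labels and the shortest-member tie-breaker are all hypothetical illustrations, not the PSI's actual selection rules.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Protein:
    name: str
    family: str          # hypothetical pre-computed sequence-family label
    has_structure: bool  # True if a structure is already in the PDB
    length: int          # residues; used only as a tie-breaker

def select_targets(proteins):
    """Pick one target per family that has no solved structure yet.

    Families that already contain a solved member are skipped: their other
    members are left to computational (homology) modeling.
    """
    families = {}
    for p in proteins:
        families.setdefault(p.family, []).append(p)

    targets = []
    for _, members in sorted(families.items()):
        if any(m.has_structure for m in members):
            continue
        # Hypothetical heuristic: prefer the shortest member, assuming
        # smaller proteins are easier to express and solve.
        targets.append(min(members, key=lambda m: (m.length, m.name)))
    return targets

proteins = [
    Protein("A1", "famA", True, 250),
    Protein("A2", "famA", False, 180),
    Protein("B1", "famB", False, 320),
    Protein("B2", "famB", False, 210),
    Protein("C1", "famC", False, 140),
]
print([p.name for p in select_targets(proteins)])  # ['B2', 'C1']
```

Skipping families that already contain a solved member mirrors the third bullet: those homologs are left to computational modeling, so experimental effort goes only to families expected to yield a unique structure.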
Protein Structure Initiative (PSI)
Long-Range Goal
To make the three-dimensional
atomic level structures of most
proteins easily available from
knowledge of their corresponding
DNA sequences
Expected PSI Benefits
• Structure provides information on
function and will aid in the design of
experiments
• Development of better therapeutic
targets from comparisons of protein
structures from:
– Pathogens vs. hosts
– Diseased vs. normal tissues
PSI Benefit
• Collection of structures will address key
biochemical and biophysical problems
– Protein folding, prediction, folds, evolution, etc.
• Benefits to biologists
– Technology developments
– Structural biology facilities
– Availability of reagents and materials
– Experimental outcome data on protein production and crystallization
PSI Pilot Phase
Lessons Learned
1. Structural genomics pipelines can be constructed
and scaled-up
2. High throughput operation works for many proteins
3. Genomic approach works for structures
4. Bottlenecks remain for some proteins
5. A coordinated target selection policy must be developed
6. Homology modeling methods need improvement
Lessons
Table 1 Lessons from structural genomics
1. It is possible to construct large-scale facilities that can determine the structures of a hundred or more proteins per year.
2. The difficulties at each step of determining a structure of a particular protein can be quantified.
3. Structures from structural genomics can have an important impact
on scientific research.
4. Rapid deposition of data in public databases increases the impact
and usefulness of the data.
5. Technology development has played a critical role in structural
genomics.
6. Validation of technologies is nearly as important as the technologies
themselves.
7. Structures from structural genomics are of high quality.
8. International cooperation advances the field and improves data
sharing.
As of August 2008, the number of structures deposited by structural genomics consortia is 6048, which corresponds to about 11.5% of the structures in the PDB database.
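The 6048 structural genomics structures and their share of about 11.5% together imply the approximate size of the PDB in August 2008. A minimal sketch; since the share is only approximate, the result is an estimate, and the range assumes 11.5% was rounded to one decimal place.

```python
sg_structures = 6048
share = 0.115  # "about 11.5%", so only approximate

# Total PDB entries implied by the stated share
implied_total = sg_structures / share
print(round(implied_total))  # 52591

# If 11.5% was rounded, the true share lies between 11.45% and 11.55%
low, high = sg_structures / 0.1155, sg_structures / 0.1145
print(f"{low:,.0f} to {high:,.0f} entries")
```

Either way, the PDB would have held somewhere around 52,000 to 53,000 entries at that time.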
Annual Reviews
Structural proteomics is the large-scale study of the
structural description of proteins and their higher
order complexes present in a given cell.
It holds special significance since cellular behavior
and disease are functions of the interactions between
macromolecular complexes involved in cellular
biological transactions.
Important questions in structural proteomics involve
elucidating the structure of these multicomponent
assemblies, including their subcomponents and their
assembly, and relating their structure to function.