An Introduction to Proteomics

The PROTEin complement of the genOME.
Judson Hervey
UT-ORNL GST
Graduate Student
What is Proteomics?
• Defined as “the analysis of the entire
protein complement in a given cell, tissue,
or organism.”
• Proteomics “also assesses activities,
modifications, localization, and interactions
of proteins in complexes.”
• Proteomes differ intrinsically across
species and growth conditions.
Alternative View
• Definition by Mike Tyers (U Toronto):
• Lumps everything “post-genomic”
together, alluding to proteomics as
“protein chemistry on an unprecedented,
high-throughput scale.”
• In any case, no matter which definition you
accept, consider proteomics as the “next
step” in modern biology.
Importance of Proteins:
• they serve as catalysts that maintain metabolic processes in the cell,
• they serve as structural elements both within and outside the cell,
• they are signals secreted by one cell or deposited in the extracellular
matrix that are recognized by other cells,
• they are receptors that convey information about the extracellular
milieu to the cell,
• they serve as intracellular signaling components that mediate the
effects of receptors,
• they are key components of the machinery that determines which
genes are expressed and whether mRNAs are translated into
proteins,
• they are involved in manipulation of DNA and RNA through
processes such as DNA replication, DNA recombination, and RNA
splicing or editing.
http://www-users.med.cornell.edu/~jawagne/proteins_&_purification.html
But what about the Genome?
• What does having the genome of an organism
give us?
 A great diagram, or “blueprint,” of the genes
within an organism.
 Think of the genome as code that needs
to be compiled into functional units.
 The genome gets “compiled” into the proteome
via the central dogma of biology.
 Proteomic strategies attempt to utilize information
from the genome in an attempt to conceptualize
protein function.
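The “compile” analogy above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function names are made up for this example, the input is taken to be an in-frame coding-strand DNA sequence, and the standard genetic code is applied with no splicing or post-translational processing.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna):
    """Coding-strand DNA -> mRNA (T is replaced by U)."""
    return dna.upper().replace("T", "U")


def translate(mrna):
    """mRNA -> protein in one-letter codes, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3].replace("U", "T")  # table is keyed by DNA codons
        aa = CODON_TABLE[codon]
        if aa == "*":  # stop codon
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    gene = "ATGGCCTGGTAA"  # hypothetical toy gene, not a real sequence
    print(translate(transcribe(gene)))
```

The real “compiler” is far less deterministic than this: the same gene can yield different protein forms depending on splicing and modification, which is part of why the proteome is harder to study than the genome.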
Experimental Platforms
Tyers and Mann, pg 194
“Systems biology is an approach to studying complex biological systems made possible
through technological breakthroughs such as the human genome project. …systems biology
simultaneously studies the complex interaction of many levels of biological information to
understand how they work together.”
http://www.systemsbiology.org/
Challenges facing Proteomic Technologies
• Limited/variable sample material
• Sample degradation (occurs rapidly, even during sample
preparation)
• Vast dynamic range required
• Post-translational modifications (often skew results)
• Specificity among tissue, developmental and temporal
stages
• Perturbations by environmental (disease/drugs)
conditions
Researchers have deemed sequencing the genome
“easy,” as PCR was able to assist in overcoming many of
these issues in genomics.
The Peptide Bond
Protein Structure
Figure 3-35. Three levels of organization of a
protein. (Alberts - Molecular Biology of the Cell)
Amino Acid Properties
Pillar Proteomic Technologies
• Amino Acid Composition
• Array-based Proteomics
• 2D PAGE
• Mass Spectrometry
• Structural Proteomics
• Informatics (and the challenges facing the
Human Proteome Project)
Amino Acid Composition (Edman)
• Pioneering method of obtaining
information from proteins.
• Cumbersome and tedious by today’s
standards.
• Requires the use of terrible smelling β-mercaptoethanol.
• Not “high-throughput” by today’s
standards; hence, amino acid composition
analysis is no longer the most widely used
technique.
Protein Sequencing
step 1, fragmenting into peptides
Protein Sequencing
step 2, sequencing the peptides by Edman degradation.
Separation by HPLC and detection by absorbance at 269 nm.
Array-based Proteomics
• Employ two-hybrid assays
• Use GFP, FRET, and GST
 GFP = green fluorescent protein
 FRET = fluorescence resonance energy
transfer
 GST = glutathione S-transferase, a
well-characterized protein used as a marker
protein.
Array-based Proteomics
• Offer a high-throughput technique for
proteome analysis.
• These small plates are able to hold many
different samples at a time.
• Current research is ongoing in an attempt
to interface array methodologies with
Mass Spectrometry at ORNL.
Two-Hybrid Assay
Figure 1235. Griffiths et al., Modern Genetic Analysis.
2D PAGE
• 2-D gel electrophoresis is a
multi-step procedure that can be used to
separate hundreds to thousands of
proteins with extremely high resolution.
• It works by separation of proteins by their
pI's in one dimension using an immobilized
pH gradient (first dimension: isoelectric
focusing) and then by their MW's in the
second dimension.
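As a rough illustration of the second dimension, the sketch below estimates molecular weight from sequence and orders proteins by it. The function names and the toy sequences are hypothetical; the residue masses are standard average values in daltons. Estimating pI (the first dimension) needs pKa values for the ionizable groups and is not attempted here.

```python
# Average residue masses in daltons (free amino acid minus one water).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water is added back for the free N- and C-termini


def molecular_weight(sequence):
    """Average molecular weight (Da) of an unmodified protein sequence."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER


def second_dimension_order(proteins):
    """Protein names sorted from lowest to highest MW.

    `proteins` maps a name to a one-letter sequence. In SDS-PAGE the
    smallest proteins migrate farthest through the gel.
    """
    return sorted(proteins, key=lambda name: molecular_weight(proteins[name]))


if __name__ == "__main__":
    sample = {"toy_A": "MKWVTFISLL", "toy_B": "MKW"}  # made-up sequences
    for name in second_dimension_order(sample):
        print(name, round(molecular_weight(sample[name]), 2))
```

Real spots rarely sit exactly at the predicted MW, since post-translational modifications and processing shift both mass and pI.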
2D PAGE
• 2-D gel electrophoresis process consists
of these steps:
• Sample preparation
• First dimension: isoelectric focusing
• Second dimension: gel electrophoresis
• Staining
• Imaging analysis via software
2D PAGE product of Hs plasma
http://us.expasy.org/ch2dothergifs/publi/elc.gif
Drawbacks of 2D PAGE
• The technique lacks reliable
reproducibility.
• Spots often overlap, making identifications
difficult.
• More of “an art” than “a science.”
• Slow and tedious.
• Process contains many “open” phases
where contamination is possible.
Structural Proteomics
• Pioneering work is under way by
Baumeister et al., which could significantly
reduce the amount of painstaking labor in
the crystallization of proteins.
• Current techniques are not considered
“high throughput” within the structural
realm.
• Novel solutions combine current
technologies, such as NMR and X-ray
crystallography (XRC).
Informatics
• Significant improvements are needed in:
 Data presentation standards and formatting
 Software infrastructure
• ISB has created many powerful software packages that
interpret data from different techniques.
• EBI and HUPO have come together to promote uniform data storage
and analysis:
 http://psidev.sourceforge.net
• The proteomics community has, over the course of the past four
years, become slightly “less proprietary.” Ron Beavis of U. Manitoba
has developed X! Tandem, an open-source search algorithm, as an
alternative to SEQUEST.
• Development of novel analysis software and of strategies [for
biologists] to manage the data are two fronts that I can see as
opportunities for folks with a CS background.
Clinical Proteomics
• This area of proteomics focuses on accelerating
drug development for diseases through the
systematic identification of potential drug targets.
• How could this be accomplished?
• Hopefully, we will have more specific
information, instead of raw genes, that will make
those complex differential equations much
simpler in the coming years.
Mass Spectrometry
• Mass Spectrometry is another tool to analyze
the proteome.
• In general, a Mass Spectrometer consists of:
 Ion Source
 Mass Analyzer
 Detector
• Mass Spectrometers are used to measure the
mass-to-charge (m/z) ratios of substances.
• From this measurement, a mass is determined,
proteins are identified, and further analysis is
performed.
“Mass Spec” Analyses can be run in Tandem
• MS/MS refers to two MS experiments
performed “in tandem.”
• Among other things, MS/MS allows for the
determination of sequence information,
usually in the form of peptides (small parts
of a protein).
• This information is used by algorithms to
identify a protein on the basis of mass of a
constituent peptide.
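To make the link between peptide sequence and MS/MS peaks concrete, the sketch below lists the theoretical fragment masses a search algorithm might compare against a spectrum. It assumes singly charged b and y ions (a common convention for collision-induced fragmentation; the slides do not name the ion types) and unmodified residues. The function name is hypothetical; the monoisotopic masses are standard values.

```python
# Monoisotopic residue masses in daltons.
MONOISOTOPIC_RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def fragment_ions(peptide):
    """Return (b_ions, y_ions) as m/z values for singly charged fragments.

    b_ions[k] is the N-terminal fragment with k+1 residues; y_ions[k] is the
    complementary C-terminal fragment produced by the same bond cleavage.
    """
    residues = [MONOISOTOPIC_RESIDUE_MASS[aa] for aa in peptide.upper()]
    b_ions, y_ions = [], []
    for i in range(1, len(residues)):  # one cleavage per peptide bond
        b_ions.append(sum(residues[:i]) + PROTON)
        y_ions.append(sum(residues[i:]) + WATER + PROTON)
    return b_ions, y_ions


if __name__ == "__main__":
    b, y = fragment_ions("PEPTIDE")  # toy example sequence
    print([round(m, 3) for m in b])
    print([round(m, 3) for m in y])
```

Differences between consecutive peaks in one series correspond to single residue masses, which is how sequence information is read out of the spectrum.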
If you are lost….
• Consider an example: calculating a person’s
weight, without them knowing.
• If we have a backpack that we know is 10
pounds, we could have them put it on.
• Then, walk the subject over a hidden scale in the
floor.
• The weight of the person could be obtained by
subtracting the weight of the backpack.
In a similar manner:
• Mass spectrometers allow the
determination of a mass-to-charge ratio of
the analyte.
• By knowing the charged state of the
analyte through the addition of protons
(the backpack in the example), the mass
can be calculated after deconvolution of
the spectrum.
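The backpack arithmetic can be written out directly. In this minimal sketch, which assumes positive-mode ions charged only by added protons, each proton contributes about 1.007276 Da and one unit of charge, so an ion of neutral mass M and charge z appears at m/z = (M + z × 1.007276) / z. The function names are illustrative.

```python
PROTON = 1.007276  # mass of one proton in daltons (the "backpack")


def mz_from_mass(neutral_mass, charge):
    """m/z at which a protonated analyte of the given neutral mass appears."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return neutral_mass / charge + PROTON


def neutral_mass_from_mz(mz, charge):
    """Recover the neutral mass by 'subtracting the backpack' z times."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return charge * (mz - PROTON)


if __name__ == "__main__":
    # The same hypothetical 1000 Da peptide seen at charge states 1 to 3.
    for z in (1, 2, 3):
        mz = mz_from_mass(1000.0, z)
        print(z, round(mz, 4), round(neutral_mass_from_mz(mz, z), 4))
```

Deconvolution is the harder step of working out z in the first place, for example from the spacing of isotope peaks or from several charge states of the same analyte.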
LCQ Mass Spectrometer
Example MS/MS Spectrum
This spectrum shows the fragmentation of a peptide, which
is used to determine the sequence of the peptide, via a
search algorithm.
Typical MS experiment:
Algorithmic approaches to “tag” identification
• Peptide sequence tags (Mann): extract an
unambiguous sequence tag for ID.
• Cross Correlation (Eng et al. - SEQUEST): comparison
between observed and theoretically generated spectra.
• Probability-based matching (Perkins, and the proprietary
Mascot by Matrix Science): takes into account the statistical
significance of fragmentation.
• Which one of these approaches would you employ?
(Hint: Discussion fostering question.)
• Could DP be employed in searching for
post-translational modifications in future designs?
• Could it be done in advance, to account for
PTMs and speed up the search?
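As a starting point for that discussion, here is a deliberately simple stand-in for comparing observed and theoretical spectra: counting shared peaks within a mass tolerance. This is not SEQUEST's cross-correlation or Mascot's probability model; the function names, the tolerance, and the peak lists are all made up for illustration.

```python
def shared_peak_count(observed, theoretical, tolerance=0.5):
    """Number of theoretical peaks matched by any observed peak (m/z units)."""
    return sum(
        1
        for t in theoretical
        if any(abs(o - t) <= tolerance for o in observed)
    )


def best_match(observed, candidates, tolerance=0.5):
    """Score every candidate peptide and return (best_name, all_scores).

    `candidates` maps a peptide name to its list of theoretical peak m/z values.
    """
    scores = {
        name: shared_peak_count(observed, peaks, tolerance)
        for name, peaks in candidates.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    observed = [100.0, 200.1, 300.0]  # hypothetical observed peaks
    candidates = {
        "peptide_A": [100.2, 200.0, 300.3],  # hypothetical theoretical peaks
        "peptide_B": [100.0, 250.0, 400.0],
    }
    print(best_match(observed, candidates))
```

A raw count like this ignores peak intensity and says nothing about how likely a match is by chance, which is the gap the cross-correlation and probability-based approaches address.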
De novo algorithms
Sequence tagging Algorithms
+/- Sequence Tagging Algorithms
Other Proteomic Tools FYR
My $0.02:
• Proteomics is undoubtedly a critical component of systems biology;
however:
 The lack of hypothesis-driven experiments isn’t necessarily
“good” for science. Discovery-based science should be guided
by hypotheses, IMO.
 Along these lines, as with the HGP, when it comes to literature,
what do you do, just publish the whole thing?
• This is another stumbling block of what to do with all of this
information.
 Proteomics needs its “own PCR,” or “miracle” tool, to increase
the throughput.
• A new technology, or instrument that combines other
approaches, would be useful, esp. in structural proteomics,
quantification, and sample reproduction.
References
• Nature Insight: Proteomics. Nature 422: 191-237.
• Zhu, H. et al. Proteomics. Annual Review of Biochemistry 72: 783-812.
• Griffiths et al. Modern Genetic Analysis. Online: http://ncbi.nih.gov