INF380 – Proteomics

Chapter 3 – Protein digestion
• An important part of the identification process is protein digestion,
cleaving the proteins into peptides.
• The proteins are experimentally digested (cleaved) into peptides
by enzymes called proteases, which are active in all organisms.
• Numerous proteases from numerous species, ranging from man
to bacteria, are known and characterized.
• Because proteases used in proteomic research must have specific
properties, only a few of them are routinely used.
• The peptides are analysed by mass spectrometry, producing a
mass spectrum. This spectrum is then compared to the theoretical
peptide masses from in silico digestion of database sequences.
INF380 - Proteomics-3
Protein digestion
• Let PE be the set of masses from a spectrum, assumed to
come from one protein.
• Let PT be the set of theoretical masses from in silico
digestion (simulating the protease) of the same protein.
• Then, in an ideal situation, and without modifications, PE = PT.
In the real world, however, we have:
– Some of the masses in PE come from other proteins or
molecules contaminating our sample.
– Not all of the peptides are detected by the mass spectrometer,
so their masses will not be in PE.
– There may be a disagreement between the experimental
digestion and the model used for the in silico digestion, for
example when not all expected cleavages are performed in the
experimental digestion (so-called missed cleavages).
• This results in a set of experimental masses that are not in
PT and a set of theoretical masses that are not in PE.
• There can also be peptides containing modified residues.
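The comparison of PE and PT can be sketched as a tolerance-based match between two lists of masses. This is only a minimal illustration under stated assumptions: the function name compare_masses, the absolute tolerance tol (in Da), and the example masses are hypothetical and not taken from the course material.

```python
def compare_masses(experimental, theoretical, tol=0.2):
    """Match experimental masses (PE) to theoretical masses (PT).

    Returns (matched pairs, masses only in PE, masses only in PT).
    tol is an assumed absolute mass tolerance in Da.
    """
    matched = []
    only_experimental = []
    used = set()  # indices of theoretical masses already matched
    for m_exp in sorted(experimental):
        best = None  # (difference, index) of the closest unused theoretical mass
        for i, m_theo in enumerate(theoretical):
            if i in used:
                continue
            diff = abs(m_exp - m_theo)
            if diff <= tol and (best is None or diff < best[0]):
                best = (diff, i)
        if best is None:
            # e.g. a contaminant, a modified peptide, or a missed cleavage
            only_experimental.append(m_exp)
        else:
            used.add(best[1])
            matched.append((m_exp, theoretical[best[1]]))
    # theoretical peptides that were not detected in the spectrum
    only_theoretical = [m for i, m in enumerate(theoretical) if i not in used]
    return matched, only_experimental, only_theoretical


# Hypothetical example masses (Da)
PE = [842.51, 1045.56, 1479.80, 2211.10]
PT = [1045.54, 1479.79, 1638.86, 2465.20]
matched, only_pe, only_pt = compare_masses(PE, PT, tol=0.05)
```

Here two masses match within the tolerance, two experimental masses remain unexplained, and two theoretical masses remain undetected, which mirrors the three situations listed above.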
Protein digestion
• For peptide mass fingerprinting (PMF), a goal is to achieve as many
common peptides as possible for the experimental and the
in silico digestions, and hence as high a coverage as possible.
• Therefore the protein separation steps before digestion are
important.
• A single protein in each sample (for example a single
protein in each spot of a 2D gel) is preferable, and
contamination should be avoided at every step of sample
handling. Human keratins (from our skin or hair) are
common contaminants.
• In PMF identifications of average proteins, a sequence
coverage of 15 to 30 % is frequently achieved, often
corresponding to 5 to 15 peptides.
• This may constitute less than half of the experimental
peptides detected.
Protein digestion
• There are many reasons why the major part of the
sequence is not covered by the experimental peptides.
• As can be understood from the above, the selection of the
protease is a very important consideration.
– The protease should cleave the protein in a unique and
predictable way.
– For two main reasons the protease should cut the protein
neither into too many small peptides nor into a few long peptides:
• Most mass spectrometry instruments have a limited mass range
where they can operate at their optimum.
• The number of sequences containing a specific peptide mass
increases with decreasing mass. For example, if we take the
13,359 human proteins available in the database SwissProt in
January 2006, and digest them with trypsin, there are 633
peptides in the m/z range 499.0 to 501.0, but only 195 peptides in
the m/z range 1999.0 to 2001.0.
• The masses of peptides shorter than six amino acids occur
in so many sequences that they are less appropriate for use in
identification. Thus, the protease should not cut the protein into
many small peptides.
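The two constraints above (a limited instrument mass range and the low information content of short peptides) can be sketched as a simple filter on candidate peptides. This is an illustration only: the function names, the minimum length parameter, and in particular the m/z window of 500 to 4000 are assumptions, not values from the course. The residue masses are standard monoisotopic values rounded to five decimals.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # mass of H2O added to the residue sum
PROTON = 1.00728   # mass of a proton


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER


def peptide_mz(peptide, charge=1):
    """m/z of the peptide carrying `charge` protons."""
    return (peptide_mass(peptide) + charge * PROTON) / charge


def usable_peptides(peptides, min_length=6, mz_window=(500.0, 4000.0)):
    """Keep peptides that are long enough and fall inside the assumed m/z window."""
    low, high = mz_window
    return [p for p in peptides
            if len(p) >= min_length and low <= peptide_mz(p) <= high]


# Hypothetical peptides: the first is too short to be useful
kept = usable_peptides(["GASK", "LVNELTEFAK"])
```

With these assumed settings only the ten-residue peptide is kept; the four-residue peptide is discarded because its mass would occur in too many sequences to be informative.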
Experimental digestion
Cleavage specificity
• Cleavage specificity is a description of the cleavage site of a
protease's substrate protein.
• A cleavage site can be described by:
– A cleavage activator is a set of amino acids that a subsite can
bind to.
– The cleavage point specifies where the site is cleaved.
– A cleavage preventor is a set of amino acids that hinders the
cleavage if one of them occurs at a specific position, despite the
occurrence of the cleavage activators.
• Thus each residue of a cleavage site is part of an activator
or a preventor. Note however that for some proteases an
activator can be X, meaning that any of the amino acids can
occur.
Cleavage specificity
• A notation for describing a cleavage site is to:
– enclose the cleavage activators in brackets, '[]';
– enclose the preventors in '<>';
– specify the cleavage point by a full stop, '.';
– the length of the cleavage site is equal to the
number of activators and preventors.
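The notation can be read mechanically, as the sketch below illustrates. It assumes that each '[]' or '<>' group occupies exactly one sequence position, that X inside an activator means any amino acid, and that trypsin would be written as [KR].<P> (cleavage after lysine or arginine unless a proline follows). The function names are hypothetical, and sites that would extend past either end of the sequence are simply not considered.

```python
import re


def parse_site(pattern):
    """Parse a cleavage-site pattern such as '[KR].<P>'.

    Returns (positions, cut): positions is a list of (kind, residues) with one
    entry per site position, and cut is the number of positions before the
    cleavage point.
    """
    positions = []
    cut = None
    for token in re.findall(r"\[[A-Z]+\]|<[A-Z]+>|\.", pattern):
        if token == ".":
            cut = len(positions)
        elif token.startswith("["):
            positions.append(("activator", set(token[1:-1])))
        else:
            positions.append(("preventor", set(token[1:-1])))
    if cut is None:
        raise ValueError("pattern has no cleavage point '.'")
    return positions, cut


def cleavage_points(sequence, pattern):
    """Indices i such that the chain is cut between sequence[i-1] and sequence[i]."""
    positions, cut = parse_site(pattern)
    points = []
    for start in range(len(sequence) - len(positions) + 1):
        window = sequence[start:start + len(positions)]
        ok = True
        for (kind, residues), aa in zip(positions, window):
            if kind == "activator":
                # 'X' stands for any amino acid
                if "X" not in residues and aa not in residues:
                    ok = False
            elif aa in residues:
                # a preventor residue at this position blocks the cleavage
                ok = False
        if ok:
            points.append(start + cut)
    return points


# Assumed trypsin rule applied to a hypothetical sequence
points = cleavage_points("AKRPGKM", "[KR].<P>")
```

For the sequence AKRPGKM this gives cuts after the first K and after the last K, while the R is skipped because it is followed by P.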
Cleavage specificity
Trypsin is the protease that best satisfies the desired requirements.
• It has high specificity, few missed cleavages and rarely or never
cleaves at unexpected positions.
• Arginine and lysine appear at an average distance of
approximately 11 residues and have only a small probability of being
succeeded by a proline, so peptides of suitable length are produced.
• It is easily obtained and purified.
• It is applicable in most experimental settings and procedures, and
is used to cleave proteins in solution, in gels, or even adsorbed
onto surfaces.
• Thus trypsin, by cleaving after each arginine and lysine, ensures
that each peptide will have a site capable of retaining a proton (for
ionization).
In silico digestion
• The in silico digestion of a protein sequence is performed by
scanning the sequence for cleavage sites. However, one
should have in mind how the experimental data are
produced.
1. There may be missed cleavages.
2. There can be naturally occurring modifications in some
positions of the protein.
3. There can be chemical modifications intentionally introduced.
4. There can be unintentional modifications introduced by the
sample handling.
5. There can be unsuspected cleavages during the maturation/life
cycle of the protein.
6. There can be unexpected cleavages occurring during the
experimental proteolytic treatment.
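Scanning a sequence for cleavage sites and allowing for missed cleavages (point 1) can be sketched as follows. This is a minimal illustration that assumes the common trypsin rule (cut after K or R unless the next residue is P); the function names and the max_missed parameter are hypothetical, and points 2 to 6 (modifications and unexpected cleavages) are not modelled.

```python
def tryptic_sites(sequence):
    """Indices i such that the chain is cut between sequence[i-1] and sequence[i].

    Assumed rule: cleave after K or R, except when followed by P.
    """
    return [i + 1 for i in range(len(sequence) - 1)
            if sequence[i] in "KR" and sequence[i + 1] != "P"]


def digest(sequence, max_missed=0):
    """Return all peptides containing at most max_missed missed cleavages."""
    bounds = [0] + tryptic_sites(sequence) + [len(sequence)]
    last = len(bounds) - 1
    peptides = []
    for i in range(last):
        # j - i - 1 is the number of cleavage sites skipped inside the peptide
        for j in range(i + 1, min(i + 1 + max_missed, last) + 1):
            peptides.append(sequence[bounds[i]:bounds[j]])
    return peptides


# Hypothetical sequence with two cleavage sites
complete = digest("AKRPGKM")
with_one_missed = digest("AKRPGKM", max_missed=1)
```

With no missed cleavages the hypothetical sequence yields AK, RPGK and M; allowing one missed cleavage adds AKRPGK and RPGKM.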
In silico digestion
• Missed cleavages and different modifications (points 1-4)
greatly increase the number of theoretical peptides, thus
also increasing the chances of accidental matches with the
experimental data.
• If the number of cleavage sites in a sequence is n and the
number of missed cleavages allowed in a peptide is k,
then the number of theoretical peptides is
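One way to obtain this count directly from the definitions of n and k: n cleavage sites split the sequence into n + 1 fragments, and a peptide with exactly i missed cleavages spans i + 1 consecutive fragments, so there are n + 1 - i of them. Summing over i = 0, ..., k gives (k + 1)(n + 1) - k(k + 1)/2 for k ≤ n. The sketch below checks this closed form against a brute-force enumeration. It is a derivation under these assumptions, with hypothetical function names, and it ignores modifications.

```python
def count_peptides(n, k):
    """Closed form for the number of peptides with at most k missed cleavages."""
    k = min(k, n)  # a peptide cannot skip more sites than exist
    return (k + 1) * (n + 1) - k * (k + 1) // 2


def count_by_enumeration(n, k):
    """Count peptides spanning fragments i..j with at most k skipped sites."""
    return sum(1
               for i in range(n + 1)
               for j in range(i, min(i + k, n) + 1))


# The two counts agree for a range of small hypothetical cases
agree = all(count_peptides(n, k) == count_by_enumeration(n, k)
            for n in range(0, 20) for k in range(0, 6))
```

For example, n = 10 cleavage sites give 11 peptides with no missed cleavages and 30 peptides when up to two missed cleavages are allowed, which illustrates how quickly the theoretical set grows.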