Transcript Chapter_6

INF380 – Proteomics
Chapter 6 – Mass Spectrometry – MALDI TOF
• MALDI-TOF instruments are the simplest MS instruments suitable for
  protein and peptide analyses.
• They are easy to handle and give rapid results.
• The sample for a MALDI-TOF instrument should preferably contain the
  peptides from a single protein or a very small number of proteins
  (two or three at most).
• Thus, extensive sample preparation must be performed before the
  analysis in the instrument.
• Analysis by MALDI instruments is therefore often combined with
  2-D gel electrophoresis of the sample to separate the proteins,
  and single spots are picked for proteolytic treatment.
INF380 - Proteomics-6
MALDI TOF - properties
• MALDI instruments are robust, fast and easy to use.
• Not all of the peptides become ionized. There is a "competition" between the peptides for
  being ionized, and which peptides are ionized depends partly on the matrix used.
• The ionization process in MALDI is not fully understood, but several studies have
  investigated how the ionization depends on peptide properties.
• MALDI is able to ionize molecules with masses from a few hundred Da up to several hundred
  kDa, although the mass range studied in any single analysis is much more restricted.
• It is usually known whether we study peptides or large proteins, and the instrument
  settings can be optimized for each purpose. The lower end of the mass range is limited by
  noise generated from matrix ions and other unwanted ions.
• Almost all of the ionized peptides carry a single positive charge due to the acquisition of a
  proton. As a general rule, only very large peptides or proteins pick up two or three
  protons. The spectra are therefore relatively easy to interpret.
• The accuracy in the peptide mass range (500-6,000 Da) is good. The exact accuracy
  obtainable depends on the instrument and the calibration procedures. When internal
  calibration is used, the accepted accuracy during database searches is often set to 50
  ppm, meaning that the difference between the experimental m/z value and the
  theoretical m/z value must be less than 50 ppm.
• The resolution is good, meaning that all newer instruments can distinguish the isotopic
  peaks in the peptide mass range and correctly assign the monoisotopic peak.
• The peptides ionized by MALDI are generally quite stable, and there is little fragmentation.
• There can be contaminants: the matrix (which is unavoidable), contaminants introduced by
  sample handling, and contaminants due to insufficient sample separation.
• It is possible to connect the instrument to other "intelligent systems" (robots) for
  automated proteomics.
• MALDI instruments are generally quite sensitive.
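The 50 ppm criterion mentioned above can be made concrete with a small calculation. The sketch below is only illustrative: the function names and the example m/z values are assumptions introduced here, not part of the course material.

```python
def ppm_error(experimental_mz: float, theoretical_mz: float) -> float:
    """Signed mass error in parts per million, relative to the theoretical m/z."""
    return (experimental_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(experimental_mz: float, theoretical_mz: float,
                     tolerance_ppm: float = 50.0) -> bool:
    """True if the absolute ppm error is below the accepted tolerance."""
    return abs(ppm_error(experimental_mz, theoretical_mz)) < tolerance_ppm


# Hypothetical values: at m/z 1000, 50 ppm corresponds to 0.05 m/z units.
print(within_tolerance(1000.04, 1000.0))  # error of about 40 ppm
print(within_tolerance(1000.06, 1000.0))  # error of about 60 ppm
```

Because the tolerance is relative, the same 50 ppm window is wider in absolute terms at higher m/z.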
TOF analyzers
• Time-of-flight analyzers are the dominant analyzers for single MS.
• When connected to a MALDI ionization source, the instrument is
  called a MALDI-TOF mass spectrometer.
• The simplest form of TOF analyzer is the linear TOF.
• Ions are sent to the analyzer in short pulses due to the pulses of the
  laser and the synchronized changes in the electric field.
• The ions are accelerated in the electric field. Let the potential of the
  acceleration field be P, the velocity at the end of the acceleration v,
  the distance to the detector d, and e the charge of an electron.
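The relation between flight time and m/z that follows from these symbols can be sketched as below. This is the usual textbook derivation for an idealized linear TOF, not a reproduction of the slide. The symbols m (ion mass), z (number of charges) and t (flight time over the distance d) are assumptions introduced here, and the time spent in the acceleration region is neglected.

```latex
\begin{align}
  z e P &= \tfrac{1}{2} m v^{2}
    && \text{(energy gained in the field becomes kinetic energy)} \\
  v &= \sqrt{\frac{2 z e P}{m}} \\
  t &= \frac{d}{v} = d \sqrt{\frac{m}{2 z e P}} \\
  \frac{m}{z} &= \frac{2 e P \, t^{2}}{d^{2}}
\end{align}
```

Under these assumptions the m/z value is proportional to the square of the measured flight time, so heavier ions of the same charge arrive later.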
TOF analyzers
• As described in the previous chapter, the resolution of an MS instrument
  is its ability to discriminate between components with small
  differences in their m/z values. We will illustrate how this is related
  to time and mass for TOF analyzers.
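To give a feeling for the time scales involved, the following sketch computes flight times in an idealized linear TOF and the time resolution needed to separate two neighbouring masses. The acceleration potential (20 kV), the flight distance (1 m) and the function names are assumptions chosen for illustration, and the time spent in the acceleration region is ignored.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19   # coulomb
DALTON = 1.66053906660e-27            # kilogram


def flight_time(mass_da, charge=1, potential_v=20_000.0, distance_m=1.0):
    """Drift time in seconds of an ion in an idealized linear TOF.

    Assumes the energy z*e*P is converted entirely to kinetic energy.
    """
    mass_kg = mass_da * DALTON
    velocity = math.sqrt(2 * charge * ELEMENTARY_CHARGE * potential_v / mass_kg)
    return distance_m / velocity


def resolution_from_time(t, delta_t):
    """Since m/z is proportional to t**2, m/dm is approximately t / (2*dt)."""
    return t / (2 * delta_t)


t1 = flight_time(1000.0)
t2 = flight_time(1001.0)
print(f"t(1000 Da) = {t1 * 1e6:.2f} microseconds")
print(f"gap to 1001 Da = {(t2 - t1) * 1e9:.1f} nanoseconds")
print(f"resolution needed = {resolution_from_time(t1, t2 - t1):.0f}")
```

With these assumed settings, ions one Da apart at 1000 Da differ by only nanoseconds on a flight time of microseconds, which is why small spreads in arrival time matter for resolution.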
TOF analyzers
• Another important factor in relation to resolution is that not all ions of a peptide
  (or of several peptides with the same m/z) reach the detector at exactly the
  same time.
• Therefore, there will be a spread in the measured time (and therefore in the
  calculated m/z) of a peptide.
• The detected times differ mainly because not all ions come from exactly the
  same point in the matrix/sample, and they are spread out in
  three-dimensional space when they enter the electric field.
• As a result, the velocities after acceleration may vary slightly.
• Overlaps may therefore occur between different peptides.
TOF analyzers
Two methods are commonly used to reduce the spreading (and thus
increase the resolution): delayed extraction and the reflectron.
• Delayed (pulsed) extraction: When the peptides are ejected from the
  plate by the laser pulse, they exhibit a range of velocities and
  would reach the detector at different times. To remedy this, a
  delayed extraction voltage can be used. When this voltage is turned
  on, the ions with the lowest velocities receive the greatest
  acceleration, such that all ions of the same m/z reach the detector
  at (approximately) the same time.
TOF analyzers
• The reflectron is an electrostatic mirror placed at the end of the linear flight tube. When
  the ions enter the reflectron, they are exposed to another electric field that forces
  them to turn around and fly back, slightly angled relative to the direction they came
  from.
• The reflectron has two effects. First, it increases the flight distance d by sending the
  ions back through a second drift region that has a detector placed at its far end.
  Second, the reflectron is able to "collect" the ions of the same m/z that had slightly
  different velocities in the first drift tube.
• All advanced MALDI-TOF instruments allow a choice between linear TOF and
  reflectron TOF, in addition to delayed extraction.
Constructing the peak list
• The raw data spectrum contains signal from the peptides of the sample
  proteins, and signal due to different forms of noise.
• Revealing the signals (peaks) corresponding to the sample peptides
  is a multistep task.
• The detailed procedures for peak list construction vary with the
  instruments and programs used, and with the type of spectra.
• A main challenge is to remove noise peaks without removing
  any of the peptide peaks.
Constructing the peak list
• The main challenges are to remove noise peaks without removing
  any of the peptide peaks, and to determine the m/z and intensity
  values with the best accuracy possible.
• Noise may be divided into two main types.
  – Chemical noise may have several sources.
    • It may come from chemical contaminants introduced during sample
      handling.
    • It may be proteins or peptides that have unintentionally been introduced
      during sample handling, for example human keratins from skin or hair.
    • In MALDI-TOF analyses it may also come from the instability of peptides after
      receiving energy from the laser pulse, resulting in, for example, fragmentation.
    • In all cases unexpected peaks are produced.
  – Electronic noise is the result of electronic disturbances, and appears as
    random fluctuations between the chemical noise peaks.
• The noise level of a spectrum (or part of a spectrum) is typically
  expressed as a signal-to-noise ratio (s/n ratio).
• Noise and contamination may cause so much disturbance that the
  mass spectrometric analysis is unusable. Both sample handling and
  instrument settings may therefore be critical during an experiment.
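The slide does not say how the s/n ratio is computed. One simple possibility, shown here purely as an assumption, is to estimate the noise level of a region as its median intensity and divide the peak intensity by that estimate. The function names and the example data are hypothetical.

```python
from statistics import median


def estimate_noise(intensities):
    """Crude noise level: the median intensity of the region.

    Assumes most data points in the region are noise, so that the
    median is not dominated by the few real peaks.
    """
    return median(intensities)


def signal_to_noise(peak_intensity, intensities):
    """Ratio of a peak's intensity to the estimated noise level."""
    noise = estimate_noise(intensities)
    if noise <= 0:
        raise ValueError("noise estimate must be positive")
    return peak_intensity / noise


# Hypothetical region of a spectrum: low-level noise plus one clear peak.
region = [2.0, 3.0, 2.5, 40.0, 3.5, 2.0, 3.0]
print(signal_to_noise(max(region), region))
```

Other noise estimators (for example the standard deviation of a peak-free region) are equally plausible; the choice here only keeps the sketch short.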
Constructing the peak list
• Baseline correction
  – The baseline is an offset of the intensities and should be subtracted from
    the measured intensities. It often depends on the m/z value: it is highest at
    low m/z values and decays approximately exponentially.
  – The noise varies from spectrum to spectrum. Each spectrum must therefore be
    treated separately.
  – Many algorithms for baseline correction have been developed. The simplest
    finds the lowest point in the spectrum and drags this point down to zero.
    At the same time, the highest point in the spectrum (the base peak) is kept at 100
    percent intensity. The result is that the intensities in the spectrum are "stretched"
    along the vertical axis.
• Smoothing and noise reduction
  – A spectrum can be jagged, making it difficult to distinguish peaks from noise.
  – The spectrum is therefore smoothed. Again, this can be done in
    different ways.
  – The straightforward way is to use a sliding window, where a new value is calculated
    for the point in the middle of the window, based on the values of the points in the
    window. The calculation is very often a weighted mean.
  – The length of the window and the weights are determined from known forms of the
    noise, often taking the resolution into account. Often a Gaussian filter is used,
    based on the symmetric Gaussian function.
  – Because the noise level varies along the mass range, these parameters are
    calculated for small intervals (down to less than one unit).
  – More advanced smoothing methods use Fourier transforms and wavelet
    transforms.
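The two simplest procedures described above, the min-to-zero baseline correction and the sliding-window weighted mean with Gaussian weights, can be sketched as follows. This is a minimal illustration, not the algorithm of any particular instrument software. The function names, the window half-width, the sigma value and the example spectrum are all assumptions, and the input is assumed to be given in relative intensities.

```python
import math


def simple_baseline_correction(intensities):
    """Shift the lowest point to zero and rescale so the base peak is 100."""
    lowest = min(intensities)
    highest = max(intensities)
    if highest == lowest:
        return [0.0 for _ in intensities]
    scale = 100.0 / (highest - lowest)
    return [(value - lowest) * scale for value in intensities]


def gaussian_weights(half_width, sigma):
    """Symmetric Gaussian weights for a window of 2*half_width + 1 points."""
    raw = [math.exp(-0.5 * (k / sigma) ** 2)
           for k in range(-half_width, half_width + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def smooth(intensities, half_width=2, sigma=1.0):
    """Sliding-window weighted mean; the window is truncated at the edges."""
    weights = gaussian_weights(half_width, sigma)
    smoothed = []
    for i in range(len(intensities)):
        acc = 0.0
        weight_sum = 0.0
        for k, w in enumerate(weights):
            j = i + k - half_width
            if 0 <= j < len(intensities):
                acc += w * intensities[j]
                weight_sum += w
        smoothed.append(acc / weight_sum)
    return smoothed


# Hypothetical jagged spectrum with the base peak at 100 percent.
spectrum = [12.0, 15.0, 11.0, 60.0, 100.0, 55.0, 14.0, 10.0, 13.0]
corrected = simple_baseline_correction(spectrum)
print([round(v, 1) for v in corrected])
print([round(v, 1) for v in smooth(corrected)])
```

Note that this baseline correction removes only a constant offset. It does not model the m/z-dependent decay of the baseline mentioned above, which requires one of the more elaborate algorithms.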
Constructing the peak list
• Peak detection is the process of distinguishing interesting peaks
  from noise.
• An aim is that every isotope in the spectrum should be represented
  by exactly one data point, and several techniques are used.
  – One way is to first identify the apex of a peak. This is the point where the
    intensity changes from increasing to decreasing.
  – Then the intensity at this apex is compared to the surrounding noise level.
  – Alternatively, the valleys on each side of the apex can be used to
    determine the start and end of the peak.
  – Then the area of the peak is calculated and checked against the
    limit for being a peak.
  – Other methods consider a peak as a continuous range of points with
    intensity above the noise level.
  – Some also consider the shape of the peak to distinguish peptides from
    contaminants and noise, as peptide peaks tend to have different shapes
    than noise.
  – The number of expected raw data points per peak (which can vary over
    the spectrum) is also used in some peak detection algorithms.
• Whatever peak detection algorithm is used, high-intensity peaks of
  single peptides are usually detected without failure.
• However, when the peaks have low intensities and there are complex
  patterns of overlapping peptides, errors are much more likely to
  occur.
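The first technique listed above, finding apexes and comparing them with the noise level, can be sketched as follows. The function name, the threshold factor `min_snr` and the example data are assumptions made for illustration. The noise level is taken as a given input rather than estimated.

```python
def detect_peaks(mz, intensity, noise_level, min_snr=3.0):
    """Return (m/z, intensity) pairs of apexes that stand out from the noise.

    An apex is a point where the intensity changes from increasing to
    decreasing. It is kept only if its intensity is at least min_snr
    times the noise level. Both noise_level and min_snr are assumed inputs.
    """
    peaks = []
    for i in range(1, len(intensity) - 1):
        rising = intensity[i] > intensity[i - 1]
        falling = intensity[i] >= intensity[i + 1]
        if rising and falling and intensity[i] >= min_snr * noise_level:
            peaks.append((mz[i], intensity[i]))
    return peaks


# Hypothetical raw data: one clear peak and two small bumps near the noise level.
mz = [1000.0, 1000.1, 1000.2, 1000.3, 1000.4,
      1000.5, 1000.6, 1000.7, 1000.8, 1000.9]
intensity = [2.0, 3.0, 2.0, 10.0, 45.0, 12.0, 3.0, 2.0, 8.0, 3.0]
print(detect_peaks(mz, intensity, noise_level=3.0))
```

The small bumps are local maxima too, but they fall below the threshold and are discarded. A sketch like this is exactly where low-intensity or overlapping peptide peaks would be lost.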
Constructing the peak list
• Intensity normalization
  • Intensities are used to distinguish noise from real signals, in
    scoring of comparisons, and in classifying spectra as good or
    bad.
  • Peak intensities are highly variable from spectrum to
    spectrum, and to use them in a uniform way they should be
    normalized.
  • The normalization should be such that the normalized values
    reflect the probabilities that the ions are real ions from peptides
    (not noise).
  • The traditional way has been to transform to relative
    intensities, relative to the total intensity (total ion current,
    TIC) or to the maximum peak intensity.
  • This can then be used to remove noise, by considering peaks
    with a normalized intensity below a threshold as noise.
  • To obtain higher robustness (across spectra), rank-based
    intensity normalization has been proposed. This means that the most
    intense peak gets rank one, the next highest rank two, and so
    on.
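The three normalization schemes mentioned above can be sketched in a few lines. The function names and the example intensities are assumptions. Ties in the rank-based variant are broken by order of appearance, which is a choice made here and not something the slide specifies.

```python
def normalize_to_tic(intensities):
    """Intensities relative to the total intensity (TIC)."""
    total = sum(intensities)
    return [value / total for value in intensities]


def normalize_to_max(intensities):
    """Intensities relative to the most intense peak."""
    highest = max(intensities)
    return [value / highest for value in intensities]


def rank_normalize(intensities):
    """Rank 1 for the most intense peak, rank 2 for the next, and so on."""
    order = sorted(range(len(intensities)), key=lambda i: -intensities[i])
    ranks = [0] * len(intensities)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


def remove_below_threshold(peaks, normalized, threshold):
    """Keep only the peaks whose normalized intensity reaches the threshold."""
    return [peak for peak, value in zip(peaks, normalized) if value >= threshold]


# Hypothetical peak intensities.
intensities = [200.0, 50.0, 1000.0, 10.0]
print(normalize_to_max(intensities))
print(rank_normalize(intensities))
print(remove_below_threshold(intensities, normalize_to_max(intensities), 0.05))
```

Ranks depend only on the ordering of the peaks, so a single unusually intense peak does not rescale all the others, which is the robustness argument made above.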
Constructing the peak list - calibration
• The spectra obtained from MS instruments must be calibrated to
  achieve the accuracy needed for database searches.
• Internal calibration is most commonly used, and is done by setting
  the exact m/z values of known peaks in the experimental spectrum.
  Such peaks are obtained in one of two ways:
  – known standards may be added to the sample
  – the protease used for digesting the sample will produce known
    autolytic peaks by self-digestion.
• It is also possible to use external calibration by spotting known
  standards close to the sample on the sample plate of the instrument;
  the spectra from the sample spot and the standard spot are then
  acquired consecutively.
Constructing the peak list - calibration
• Different forms of regression curves are used for calculating the calibrated
  values, from a linear $m^* = am + b$ to higher-order forms, for example
  $m^* = (am + b\sqrt{m} + c)^2$, where $m^*$ is the new m/z value.
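For the linear case, the constants a and b can be obtained by ordinary least squares from the measured and known m/z values of the calibrant peaks. The sketch below assumes this fitting method, which the slide does not specify. The function names and the synthetic calibrant values are hypothetical.

```python
def fit_linear_calibration(measured, reference):
    """Least-squares fit of reference = a * measured + b."""
    n = len(measured)
    if n < 2:
        raise ValueError("at least two calibrant peaks are needed")
    mean_x = sum(measured) / n
    mean_y = sum(reference) / n
    sxx = sum((x - mean_x) ** 2 for x in measured)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(measured, reference))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def apply_calibration(mz_values, a, b):
    """Correct every m/z value with the fitted constants."""
    return [a * m + b for m in mz_values]


# Synthetic calibrants: the "true" values are generated from a known
# linear distortion so that the fit can be checked by eye.
measured = [800.0, 1500.0, 2200.0]
reference = [1.0001 * m - 0.05 for m in measured]
a, b = fit_linear_calibration(measured, reference)
print(a, b)
print(apply_calibration([1000.0, 2000.0], a, b))
```

The same `apply_calibration` step serves both calibration modes: the constants are applied either to the spectrum they were fitted on or to a neighbouring sample spectrum.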
Constructing the peak list - calibration
• In the case of internal calibration, the new constants are
  applied to the same spectrum, correcting the m/z values of all
  peaks in the spectrum.
• In the case of external calibration, the constants from the
  standard spectrum are applied to the sample spectrum.
• External calibration will usually be less exact than internal
  calibration.
Peak list preprocessing
• Monoisotoping and deisotoping
  • This process reduces a cluster of isotopic peaks to a single peak, with intensity
    equal to the sum of the isotope intensities.
  • Monoisotoping reduces the cluster to its lowest-m/z detected peak. This is
    usually the procedure for MALDI experiments where delayed extraction and
    reflectron mode have been used.
  • Deisotoping reduces the cluster to a centroided peak, with the m/z value
    determined from the intensities of the individual isotopes.
  • The centroid m/z value corresponds to the value that would be obtained if
    the average masses of the atoms in the peptide had been used for calculating its
    mass.
• Removing spurious peaks
  • The fractional masses can be used to try to remove non-peptide masses, by
    removing those whose fractional masses do not follow the pattern for peptides.
• Removing other peaks
  • The background noise increases at low m/z ratios, and masses below a specified
    limit (sometimes 500, but more typically 700 to 800) are therefore removed.
  • High masses (typically above 3,000 to 4,000) are also removed, since the
    sensitivity and the accuracy are usually lower at high masses.
  • Additionally, there are relatively few peptides in this high-mass range.
  • It is also common to filter the masses against known contaminants, such as
    keratin peptides or autolysis peptides from the protease used.
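The isotope reduction and the simple mass filters described above can be sketched as follows. The function names, the representation of a peak as an (m/z, intensity) pair, the ppm matching tolerance and all numeric example values are assumptions. In particular, the contaminant mass is a placeholder and not a verified reference value. The fractional-mass filter is left out because the slide does not give the pattern it relies on.

```python
def monoisotope(cluster):
    """Reduce an isotopic cluster to its lowest-m/z peak.

    cluster is a list of (m/z, intensity) pairs belonging to one peptide.
    The intensity of the result is the sum over the cluster.
    """
    total = sum(intensity for _, intensity in cluster)
    lowest_mz = min(mz for mz, _ in cluster)
    return lowest_mz, total


def deisotope(cluster):
    """Reduce a cluster to its intensity-weighted centroid."""
    total = sum(intensity for _, intensity in cluster)
    centroid = sum(mz * intensity for mz, intensity in cluster) / total
    return centroid, total


def filter_peaks(peaks, low=700.0, high=4000.0, contaminants=(), tol_ppm=50.0):
    """Drop peaks outside [low, high] or within tol_ppm of a contaminant."""
    kept = []
    for mz, intensity in peaks:
        if not low <= mz <= high:
            continue
        if any(abs(mz - c) / c * 1e6 <= tol_ppm for c in contaminants):
            continue
        kept.append((mz, intensity))
    return kept


# Hypothetical isotopic cluster and peak list.
cluster = [(1000.5, 100.0), (1001.5, 60.0), (1002.5, 20.0)]
print(monoisotope(cluster))
print(deisotope(cluster))

peaks = [(650.3, 10.0), (842.51, 50.0), (1500.8, 80.0), (4500.2, 5.0)]
print(filter_peaks(peaks, contaminants=(842.51,)))  # 842.51 is a placeholder
```

The default limits of 700 and 4,000 are taken from the typical ranges quoted above and would normally be tuned per instrument and experiment.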