The peptide de novo sequencing from MS/MS spectrum

Kaizhong Zhang
Department of Computer Science
University of Western Ontario
London, Ontario, Canada
Joint work with
Bin Ma,
Gilles Lajoie, Amanda Doherty-Kirby,
Chengzhi Liang, Ming Li
Introduction
 Tandem mass spectrometry (MS/MS) now plays a
very important role in protein identification because of
its speed and high sensitivity.
 Deriving a peptide's sequence from its
MS/MS spectrum is an important task in
proteomics.
 Derivation without the help of a protein
database is called de novo sequencing, which is
especially important for identifying
unknown proteins.
Introduction (2)
 The basic experimental steps of this method are
as follows:
 1. The proteins are digested with an enzyme to
produce peptides;
 2. The peptides are charged (ionized) and separated
according to their mass-to-charge (m/z)
ratios;
 3. Each peptide is fragmented into fragment ions
and the m/z values of the fragment ions are
measured.
Introduction (3)
 Steps 2 and 3 are both performed within a
tandem mass spectrometer.
 Since many copies of each peptide are
fragmented and the fragmentation can
occur anywhere along the peptide, a
spectrum of the observed m/z values is
obtained.
Mass spectrum
 For each possible fragment ion there could be a
peak at the corresponding m/z value.
 The height of the peak is proportional to the
frequency with which that m/z value is observed by the
mass spectrometer.
 In general, proteins consist of 20 different types of
amino acids, most of which have different masses
(the exception is the pair leucine and isoleucine).
Mass spectrum (2)
 Consequently different peptides usually
produce different spectra.
 It is therefore possible, and now a common
practice, to use the spectrum of a peptide to
determine its sequence.
Peptide fragmentation
 A charged peptide may be fragmented into two
pieces in three ways, which may produce a pair of
a- and x-ions, a pair of b- and y-ions, or a pair of
c- and z-ions.
 Theoretically, a fragmentation can occur at any
place in a peptide and a spectrum is expected to
contain all the possible ion peaks.
 In practice, due to uneven strength of the bonds at
different positions, different ions occur with
different frequencies.
Peptide fragmentation (2)
Peptide fragmentation (3)
 The most abundant ions are y-ions, which
often form a complete series in a spectrum.
 The next most abundant are a- and b-ions, many of which
are not observed.
 The c-, x-, and z-ions occur much less
frequently.
 In addition, these ions can often form new
ions through the loss of water or ammonia.
The approximate masses of some atoms that appear
in peptides, where C13 is an isotope of C
 Atom           C    C13  H    O    N
 Mass (Dalton)  12   13   1    16   14
Mass of an amino acid
 For any amino acid a, we use ||a|| to denote
the mass of its residue C2H2RNO (R is the side chain), i.e., the amino acid
a after the loss of one water molecule.
 For a sequence of amino acids P = a1 a2 … ak,
let ||P|| = Σ_{1 ≤ j ≤ k} ||aj||.
 Therefore the actual mass of peptide P is
18 + ||P||, because of the extra H2O in it.
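As a quick check of the residue formula, the sketch below adds up the approximate integer atom masses from the atom table for two residues. The helper `formula_mass` and the formula dictionaries are illustrative names, not part of the original slides; the side chains used (R = H for glycine, R = CH3 for alanine) are standard chemistry.

```python
# Approximate integer atom masses in Daltons, as in the atom table.
ATOM_MASS = {"C": 12, "H": 1, "O": 16, "N": 14}

def formula_mass(formula):
    """Sum the atom masses of a formula given as {atom: count}."""
    return sum(ATOM_MASS[atom] * count for atom, count in formula.items())

# Residue C2H2RNO with the side chain R folded into the counts
# (illustrative data, assumed for this example).
GLYCINE_RESIDUE = {"C": 2, "H": 3, "N": 1, "O": 1}  # R = H
ALANINE_RESIDUE = {"C": 3, "H": 5, "N": 1, "O": 1}  # R = CH3
WATER = {"H": 2, "O": 1}

print(formula_mass(GLYCINE_RESIDUE))  # 57
print(formula_mass(ALANINE_RESIDUE))  # 71
print(formula_mass(WATER))            # 18
```

The integer results 57 and 71 agree with the more precise values 57.02 and 71.04 listed for G and A, and the 18 is the water mass added to ||P|| to obtain the peptide mass.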
The approximate masses of the
20 amino acids
 Amino acid     A       R       N       D
 Mass (Dalton)  71.04   156.10  114.04  115.03
 Amino acid     C       E       Q       G
 Mass (Dalton)  103.01  129.04  128.06  57.02
 Amino acid     H       I       L       K
 Mass (Dalton)  137.06  113.08  113.08  128.09
 Amino acid     M       F       P       S
 Mass (Dalton)  131.04  147.07  97.05   87.03
 Amino acid     T       W       Y       V
 Mass (Dalton)  101.05  186.08  163.06  99.07
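The table and the definition of ||P|| can be combined into a small lookup, sketched below. The names `RESIDUE_MASS`, `residue_sum`, and `peptide_mass` are assumptions made for this example; the numbers are the approximate masses from the table.

```python
# Approximate residue masses ||a|| in Daltons, copied from the table.
RESIDUE_MASS = {
    "A": 71.04, "R": 156.10, "N": 114.04, "D": 115.03,
    "C": 103.01, "E": 129.04, "Q": 128.06, "G": 57.02,
    "H": 137.06, "I": 113.08, "L": 113.08, "K": 128.09,
    "M": 131.04, "F": 147.07, "P": 97.05, "S": 87.03,
    "T": 101.05, "W": 186.08, "Y": 163.06, "V": 99.07,
}

def residue_sum(peptide):
    """||P||: the sum of the residue masses of a one-letter sequence."""
    return sum(RESIDUE_MASS[a] for a in peptide)

def peptide_mass(peptide):
    """Actual peptide mass: ||P|| plus 18 for the extra H2O."""
    return 18 + residue_sum(peptide)

# "GA" is an arbitrary example sequence, not taken from the slides.
print(round(residue_sum("GA"), 2))
print(round(peptide_mass("GA"), 2))
```

The values are rounded only for display, because sums of decimal fractions are not exact in floating point.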
The hypothetical spectrum of P
 For a sequence of amino acids A = a1 a2 … an,
we introduce two notations:
||A||b = 1 + ||A||
||A||y = 19 + ||A||
The hypothetical spectrum of P (2)
 Let bi be the mass of the b-ion of P with i
amino acids. Then
bi = ||a1 a2 … ai||b (1 ≤ i < k).
 Let yi be the mass of the y-ion of P with i
amino acids. Then
yi = ||a_{k-i+1} … ak||y (1 ≤ i < k).
 Clearly, y_{k-i} + bi = 20 + ||P||.
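The b-ion and y-ion definitions translate directly into prefix and suffix sums. The following is a minimal sketch, with function names chosen for this example, that also checks the complementary-mass identity numerically on an arbitrary sequence.

```python
# Approximate residue masses in Daltons, from the amino acid table.
RESIDUE_MASS = {
    "A": 71.04, "R": 156.10, "N": 114.04, "D": 115.03,
    "C": 103.01, "E": 129.04, "Q": 128.06, "G": 57.02,
    "H": 137.06, "I": 113.08, "L": 113.08, "K": 128.09,
    "M": 131.04, "F": 147.07, "P": 97.05, "S": 87.03,
    "T": 101.05, "W": 186.08, "Y": 163.06, "V": 99.07,
}

def residue_sum(seq):
    return sum(RESIDUE_MASS[a] for a in seq)

def b_ions(peptide):
    """[b_1, ..., b_{k-1}]: mass of the prefix of i residues, plus 1."""
    return [1 + residue_sum(peptide[:i]) for i in range(1, len(peptide))]

def y_ions(peptide):
    """[y_1, ..., y_{k-1}]: mass of the suffix of i residues, plus 19."""
    return [19 + residue_sum(peptide[-i:]) for i in range(1, len(peptide))]

def complementary_sums(peptide):
    """y_{k-i} + b_i for i = 1..k-1; each should equal 20 + ||P||."""
    b, y, k = b_ions(peptide), y_ions(peptide), len(peptide)
    return [y[k - i - 1] + b[i - 1] for i in range(1, k)]

peptide = "GAVL"  # arbitrary example sequence, not from the slides
total = 20 + residue_sum(peptide)
assert all(abs(s - total) < 1e-6 for s in complementary_sums(peptide))
```

The identity holds because the prefix of i residues and the suffix of k-i residues together cover P exactly once, and the constants 1 and 19 add up to 20.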
The hypothetical spectrum of P (3)
 Around each y-ion peak, it is possible to have
other peaks.
 For each y-ion with mass x, the corresponding
x-ion and z-ion weigh x+26 and x-17, respectively.
 An ion may lose a water molecule to generate a peak at
mass x-18.
 An ion with mass x usually has a peak at x+1,
corresponding to the isotopic ion that contains a
C13.
The hypothetical spectrum of P (4)
 Therefore, for each y-ion with mass x, there
are possible peaks at the masses in the
following set.
 Y(x)={x-18,x-17,x,x+1,x+26}
 Similarly for each b-ion with mass x, the
possible masses are from the following set.
 B(x)={x-28,x-18,x,x+1,x+17}
The hypothetical spectrum of P (5)
 Therefore, the hypothetical spectrum of the
peptide P has peaks at each mass in the
following set.
 S(P) = ∪_{0<i<k} (B(bi) ∪ Y(yi))
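The set S(P) can be built by looping over the cleavage positions and collecting the B and Y offsets. This is a sketch only: the function names are invented for the example, and rounding to two decimals is an added assumption so that floating-point noise does not create near-duplicate set entries.

```python
# Approximate residue masses in Daltons, from the amino acid table.
RESIDUE_MASS = {
    "A": 71.04, "R": 156.10, "N": 114.04, "D": 115.03,
    "C": 103.01, "E": 129.04, "Q": 128.06, "G": 57.02,
    "H": 137.06, "I": 113.08, "L": 113.08, "K": 128.09,
    "M": 131.04, "F": 147.07, "P": 97.05, "S": 87.03,
    "T": 101.05, "W": 186.08, "Y": 163.06, "V": 99.07,
}

B_OFFSETS = (-28, -18, 0, 1, 17)  # offsets in B(x)
Y_OFFSETS = (-18, -17, 0, 1, 26)  # offsets in Y(x)

def residue_sum(seq):
    return sum(RESIDUE_MASS[a] for a in seq)

def hypothetical_spectrum(peptide):
    """S(P): union of B(b_i) and Y(y_i) over every cleavage position i."""
    masses = set()
    for i in range(1, len(peptide)):
        b_i = 1 + residue_sum(peptide[:i])    # prefix of i residues
        y_i = 19 + residue_sum(peptide[-i:])  # suffix of i residues
        masses.update(round(b_i + d, 2) for d in B_OFFSETS)
        masses.update(round(y_i + d, 2) for d in Y_OFFSETS)
    return masses

# "GA" is an arbitrary two-residue example with one cleavage position.
print(sorted(hypothetical_spectrum("GA")))
```

For a two-residue peptide there is a single b-ion and a single y-ion, so the set holds at most ten masses; longer peptides can have fewer distinct masses than 10(k-1) when ions of different types coincide.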
The de novo sequencing problem
 Let P be a peptide and M = ||P|| + 20.
 Given a solution containing peptide P, a
tandem mass spectrometer can measure a
peak list L.
 L is a set of pairs {(xi, hi) | 1 ≤ i ≤ n},
where 0 < x1 < … < xn are the masses and
hi is the intensity of the peak at xi.
 The total mass of P, which equals M-2, can also be
measured.
The de novo sequencing problem (2)
 The masses given by the spectrometer are
not accurate.
 The maximum error varies from 0.01
dalton to 0.5 dalton depending on the type
of spectrometer used.
The de novo sequencing problem (3)
 Let δ be the maximum error of the spectrometer.
 Let S be a set of masses. We say a peak (x,h)
in L is supported by S if there is a y in S
such that |x-y| < δ.
 The subset of peaks in L supported by S is
denoted by L_S:
 L_S = {(x,h) ∈ L | there is y ∈ S s.t. |x-y| < δ}
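The definition of the supported subset is a simple filter over the peak list. The sketch below assumes peaks are (mass, intensity) tuples and calls the error bound `delta`; the name `supported_peaks` and the toy numbers are made up for illustration.

```python
def supported_peaks(peaks, masses, delta):
    """Return the peaks (x, h) that lie within delta of some mass in `masses`."""
    return [(x, h) for (x, h) in peaks
            if any(abs(x - y) < delta for y in masses)]

# Toy data, invented for this example.
peaks = [(58.0, 10.0), (70.5, 3.0), (90.1, 7.0)]
masses = {58.02, 90.04}
print(supported_peaks(peaks, masses, delta=0.1))
# [(58.0, 10.0), (90.1, 7.0)]
```

The scan is quadratic in the worst case. With both inputs sorted, a two-pointer sweep or binary search would do the same job faster, but the direct form mirrors the definition most closely.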
The de novo sequencing problem (4)
 Therefore L_{S(P)} consists of all the peaks in L
that are supported by the masses of the
hypothetical ions of P.
 The more high-intensity peaks there are in
L_{S(P)}, the more likely L is the mass
spectrum of P.
The de novo sequencing problem (5)
 For any peak list L', we define
h(L') = Σ_{(x,h) ∈ L'} h
 The de novo sequencing problem is defined
as follows.
 Given a mass spectrum L, a positive
number M, and an error bound δ,
construct a peptide P such that
| ||P||+20-M | < δ and h(L_{S(P)}) is maximized.
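To make the objective concrete, the sketch below evaluates it for a given candidate peptide: it checks the mass constraint and then sums the intensities of the supported peaks. It only scores candidates and does not search for the optimum, so it is not the sequencing algorithm itself. The function names and the toy spectrum are assumptions for this example.

```python
# Approximate residue masses in Daltons, from the amino acid table.
RESIDUE_MASS = {
    "A": 71.04, "R": 156.10, "N": 114.04, "D": 115.03,
    "C": 103.01, "E": 129.04, "Q": 128.06, "G": 57.02,
    "H": 137.06, "I": 113.08, "L": 113.08, "K": 128.09,
    "M": 131.04, "F": 147.07, "P": 97.05, "S": 87.03,
    "T": 101.05, "W": 186.08, "Y": 163.06, "V": 99.07,
}

def residue_sum(seq):
    return sum(RESIDUE_MASS[a] for a in seq)

def hypothetical_spectrum(peptide):
    """S(P): masses from B(b_i) and Y(y_i) for every cleavage position."""
    masses = set()
    for i in range(1, len(peptide)):
        b_i = 1 + residue_sum(peptide[:i])
        y_i = 19 + residue_sum(peptide[-i:])
        masses.update(b_i + d for d in (-28, -18, 0, 1, 17))
        masses.update(y_i + d for d in (-18, -17, 0, 1, 26))
    return masses

def score(peptide, peaks, M, delta):
    """Total intensity of supported peaks, or None if the mass constraint fails."""
    if abs(residue_sum(peptide) + 20 - M) >= delta:
        return None
    masses = hypothetical_spectrum(peptide)
    # Iterating over peaks (not ions) counts each peak at most once,
    # even when several hypothetical ions match it.
    return sum(h for (x, h) in peaks
               if any(abs(x - m) < delta for m in masses))

# Toy spectrum, invented for this example.
peaks = [(58.0, 10.0), (70.5, 3.0), (90.1, 7.0)]
for candidate in ("GA", "AG", "GG"):
    print(candidate, score(candidate, peaks, M=148.06, delta=0.1))
# GA 17.0
# AG 10.0
# GG None
```

Both GA and AG satisfy the mass constraint, but GA explains more of the observed intensity, which is the sense in which the problem prefers one sequence over another.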
Algorithms
 There are two major difficulties in the de
novo sequencing problem.
 First, each fragmentation may produce a
pair of ions.
 This means that both ends of the spectrum
must be considered at the same time.
Algorithms (2)
 Second, the types of the peaks are unknown,
and a peak may be matched by zero, one, or
two different types of ions.
 When a peak is matched by two ions, the
height of the peak can only be counted once.
Algorithms (3)
 The straightforward approach of “growing” the
peptide from one terminus to the other does not
work.
 We use a more sophisticated dynamic
programming algorithm for the de novo
sequencing problem.
 Our algorithm gradually “grows” a prefix and a
suffix of the optimal solution along a carefully
designed pathway until the prefix and the suffix
are sufficiently long to form the optimal solution.
Experiments
 Our model and algorithm account for most
of the ion types that have been observed in
practice.
 Overlaps of two different ions are correctly
modeled.
 The algorithm tolerates mass error and handles
missing ions in the spectrum.
Experiments (2)
 Experimental results demonstrated that our
algorithm performed extremely well.
 The program has been integrated into a
software package, PEAKS, which is now
accessible online at
http://www.BioinformaticsSolutions.com