Protein Sequencing and Identification by Mass Spectrometry

Motivation
• Proteins are the working units of the cell
– Far fewer genes have been found than there are
expressed protein forms
– Proteins are directly involved in cell processes and diseases
[Figure: DNA (~30,000 human genes; SNPs) → mRNA via alternative splicing (>100,000 RNA messages) → protein via post-translational modification (>1,000,000 distinct protein forms)]
Breaking Protein into Peptides and
Peptides into Fragment Ions
• Proteases, e.g. trypsin, break proteins into
peptides.
• A tandem mass spectrometer further breaks
the peptides down into fragment ions and
measures the mass of each piece.
• The mass spectrometer accelerates the
fragment ions; heavier ions accelerate
more slowly than lighter ones.
• The mass spectrometer measures the
mass/charge ratio (m/z) of each ion.
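The two steps above (digestion, then measuring mass/charge) can be sketched in a few lines of Python. This is a minimal illustration, not an instrument model: the cleavage rule used is the common one for trypsin (cut after K or R unless the next residue is P), the residue masses are standard monoisotopic values, and the function names and the example protein string are hypothetical.

```python
# Monoisotopic residue masses in daltons (standard values, rounded).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O added for the free N- and C-termini
PROTON = 1.00728   # mass of one charge-carrying proton


def tryptic_digest(protein):
    """Cut after every K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal remainder
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass: residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def mass_to_charge(neutral_mass, charge):
    """m/z of an ion carrying `charge` extra protons."""
    return (neutral_mass + charge * PROTON) / charge


# Made-up protein string, used only to exercise the functions.
for pep in tryptic_digest("GVDLKAARPLK"):
    m = peptide_mass(pep)
    print(pep, round(m, 4), [round(mass_to_charge(m, z), 4) for z in (1, 2)])
```

The same peptide appears at a different m/z for each charge state, which is why the charge has to be known or inferred before masses can be compared.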
Peptide Fragmentation
Collision Induced Dissociation
[Figure: a protonated (H+) peptide backbone cleaved at a peptide bond. Prefix fragment: H…-HN-CH(Ri-1)-CO. Suffix fragment: NH-CH(Ri)-CO-NH-CH(Ri+1)-CO-…OH.]
• Peptides tend to fragment along the backbone.
• Fragments can also lose neutral chemical
groups like NH3 and H2O.
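As a rough illustration of the fragments described above, the sketch below lists the prefix and suffix fragment ions of a peptide together with their water and ammonia losses. It assumes singly charged ions and uses the conventional names b-ions (prefix) and y-ions (suffix), which the slides do not use; the function name and the residue table subset are illustrative choices.

```python
# Monoisotopic residue masses in daltons (standard values; subset only).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "V": 99.06841, "L": 113.08406,
    "D": 115.02694, "K": 128.09496, "E": 129.04259, "R": 156.10111,
}
WATER = 18.01056
AMMONIA = 17.02655
PROTON = 1.00728


def fragment_ions(peptide):
    """Return {ion name: m/z} for singly charged backbone fragments.

    b_i is the prefix of length i, y_j is the suffix of length j.
    Each ion is also listed with a neutral loss of H2O and of NH3.
    """
    ions = {}
    n = len(peptide)
    for i in range(1, n):  # one cleavage site between residues i and i+1
        prefix = sum(RESIDUE_MASS[aa] for aa in peptide[:i])
        suffix = sum(RESIDUE_MASS[aa] for aa in peptide[i:])
        b = prefix + PROTON           # prefix fragment ion
        y = suffix + WATER + PROTON   # suffix fragment keeps the C-terminal OH and an extra H
        for name, mz in ((f"b{i}", b), (f"y{n - i}", y)):
            ions[name] = mz
            ions[name + "-H2O"] = mz - WATER
            ions[name + "-NH3"] = mz - AMMONIA
    return ions


# Print the theoretical peaks of GVDLK in order of increasing m/z.
for name, mz in sorted(fragment_ions("GVDLK").items(), key=lambda kv: kv[1]):
    print(f"{name:8s} {mz:10.4f}")
```

Not every listed ion is observed in practice, and real spectra contain peaks this simple model does not produce.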
Ideal Mass Spectrum
Real Mass Spectrum
Mass Spectra
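[Figure: mass spectrum with peaks labeled by the residues G, V, D, L and K. The spacing between peaks identifies a residue, e.g. 57 Da = ‘G’, 99 Da = ‘V’. One peak is marked as an H2O loss. x-axis: mass, starting at 0.]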
• The peaks in the mass spectrum:
– Prefix and Suffix Fragments.
– Fragments with neutral losses (-H2O, -NH3)
– Noise and missing peaks.
Protein Identification with
MS/MS
[Figure: Peptide Identification. An MS/MS spectrum (intensity vs. mass, starting at 0) is read as a peptide with residues G, V, D, L, K.]
Protein Identification by Tandem
Mass Spectrometry
MS/MS instrument → MS/MS spectrum → Sequence
• Database search
– Sequest
• De novo interpretation
– Sherenga
[Figure: MS/MS spectrum. S#: 1708 RT: 54.47 AV: 1 NL: 5.27E6; T: + c d Full ms2 638.00 [ 165.00 - 1925.00]. x-axis: m/z (200 to 2000); y-axis: Relative Abundance (0 to 100). Labeled peaks: 226.9, 326.0, 397.1, 425.0, 489.1, 524.9, 588.1, 589.2, 629.0, 687.3, 850.3, 851.4, 949.4, 1048.6, 1049.6.]
De novo Peptide Sequencing
[Figure: MS/MS spectrum → Sequence. S#: 1708 RT: 54.47 AV: 1 NL: 5.27E6; T: + c d Full ms2 638.00 [ 165.00 - 1925.00]. x-axis: m/z (200 to 2000); y-axis: Relative Abundance (0 to 100).]
Building Spectrum Graph
• How to create vertices (from masses)
• How to create edges (from mass
differences)
• How to score paths
• How to find best path
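One way to approach the last two questions is dynamic programming over the graph. The sketch below assumes vertices are numbered in order of increasing mass, so every edge points from a lower to a higher index and the graph is acyclic, and it assumes a path is scored by the sum of its vertex scores. Both the scoring rule and the function name are illustrative assumptions, not the scheme of any particular tool.

```python
def best_path(scores, edges):
    """Highest-scoring path from vertex 0 to the last vertex of a DAG.

    scores: list of vertex scores, indexed in order of increasing mass.
    edges:  list of (u, v, label) with u < v.
    Returns (total score, concatenated edge labels), or (None, "") if
    the last vertex cannot be reached from vertex 0.
    """
    n = len(scores)
    best = [float("-inf")] * n   # best[v] = best score of a path 0 -> v
    back = [None] * n            # back[v] = (previous vertex, edge label)
    best[0] = scores[0]

    incoming = {}
    for u, v, label in edges:
        incoming.setdefault(v, []).append((u, label))

    # Increasing index order is a topological order because u < v.
    for v in range(1, n):
        for u, label in incoming.get(v, []):
            if best[u] + scores[v] > best[v]:
                best[v] = best[u] + scores[v]
                back[v] = (u, label)

    if best[n - 1] == float("-inf"):
        return None, ""

    labels, v = [], n - 1
    while back[v] is not None:   # walk back from the sink to the source
        u, label = back[v]
        labels.append(label)
        v = u
    return best[n - 1], "".join(reversed(labels))


# Toy graph with arbitrary scores and labels, for illustration only.
toy_scores = [0, 5, 1, 4, 0]
toy_edges = [(0, 1, "G"), (0, 2, "A"), (1, 3, "V"),
             (2, 3, "S"), (3, 4, "K"), (1, 4, "W")]
print(best_path(toy_scores, toy_edges))
```

The running time is linear in the number of vertices plus edges, so the cost of de novo interpretation is dominated by how many edges the graph contains.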
Edges of Spectrum Graph
• Two vertices with mass difference
corresponding to an amino acid A:
– Connect with an edge labeled by A
• Gap edges connect vertices whose mass difference
corresponds to a di- or tri-peptide
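The edge rules above can be sketched as follows. The sketch assumes the vertices are already given as prefix masses, uses a hypothetical mass tolerance of 0.01 Da, and labels a gap edge with the unordered residue composition in brackets (for example "[DL]"), since a gap edge does not fix the order of the residues. The label L stands for both leucine and isoleucine, which have the same mass.

```python
from itertools import combinations_with_replacement

# Monoisotopic residue masses in daltons (standard values); L also covers I.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def spectrum_graph_edges(vertex_masses, tol=0.01, max_gap=3):
    """Return edges (u, v, label) between vertex masses u < v.

    A single-residue edge is added when v - u matches an amino acid mass;
    a gap edge is added when v - u matches the total mass of 2..max_gap
    residues. `tol` is an assumed absolute tolerance in daltons.
    """
    masses = sorted(vertex_masses)

    # Masses of all unordered di- and tri-peptide compositions.
    gap_table = []
    for k in range(2, max_gap + 1):
        for combo in combinations_with_replacement(sorted(RESIDUE_MASS), k):
            total = sum(RESIDUE_MASS[aa] for aa in combo)
            gap_table.append((total, "[" + "".join(combo) + "]"))

    edges = []
    for i, u in enumerate(masses):
        for v in masses[i + 1:]:
            diff = v - u
            for aa, m in RESIDUE_MASS.items():
                if abs(diff - m) <= tol:
                    edges.append((u, v, aa))
            for m, label in gap_table:
                if abs(diff - m) <= tol:
                    edges.append((u, v, label))
    return edges


# Prefix masses of GVDLK with the GVD vertex deliberately left out,
# so the D-L step can only be bridged by a gap edge.
vertices = [0.0, 57.02146, 156.08987, 384.20087, 512.29583]
for edge in spectrum_graph_edges(vertices):
    print(edge)
```

Gap edges let a path survive a missing peak, at the price of ambiguity: several compositions can share one mass, so a single mass difference may produce more than one gap edge.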
References
• Neil C. Jones and Pavel A. Pevzner, An
Introduction to Bioinformatics Algorithms,
MIT Press, 2004.
• Dancik V, Addona TA, Clauser KR, Vath
JE, and Pevzner PA, “De novo peptide
sequencing via tandem mass
spectrometry”, J Comput Biol. 1999 Fall-Winter; 6(3-4):327-42.