How to identify peptides
Gustavo de Souza
IMM, OUS
October 2013
Peptide or Proteins?
Bottom-up Proteomics
2DE-based approach
Peptide Mass Fingerprinting
MALDI (Matrix-Assisted Laser Desorption/Ionization)
Peptide Mass Fingerprinting
[Figure: mass spectrum, intensity vs. m/z]
MS/MS
[Figure: tandem mass spectrometer schematic: ion source, mass analyzers, collision cell, detector]
MS/MS
[Figure: MS/MS schematic; the m/z value 899.013 is labelled three times]
Fragmentation
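Nomenclature for peptide sequence ions:
Collision-Induced Dissociation (CID):
$\mathrm{MH}_n^{\,n+*} + \mathrm{N}_2 \longrightarrow b + y$
Electron Capture Dissociation (ECD):
$\mathrm{MH}_n^{\,n+} + e^- \longrightarrow \mathrm{MH}_n^{\,(n-1)+\bullet} \longrightarrow c + z^{\bullet}$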
Fragmentation
[Figure: peptide backbone with side chains R1 to R8, annotated with b ions (b1 to b7) and y ions (y1 to y7); Roepstorff-Fohlmann-Biemann nomenclature]
Fragmentation
[Figure: b-ion and y-ion series of a 12-amino-acid (12 aa) peptide]
MS/MS of a peptide
[Figure: annotated MS/MS spectrum of the peptide VPTVDVSVVDLTVK.
Scan header: LG_y2_13 #11793, RT: 84.81, AV: 1, NL: 3.57E5; T: ITMS + c ESI d w Full ms2 … [190.00-1485.00].
Labelled peaks: b3 to b13, y2 to y13, and the doubly charged y13 (y++13); y13 and y++13 are marked "P".
Axes: m/z (200 to 1400) vs. relative intensity (0 to 100).]
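The b and y labels in a spectrum such as the one for VPTVDVSVVDLTVK correspond to theoretical fragment masses that can be computed from the sequence. The following is a minimal sketch, assuming standard monoisotopic residue masses, unmodified residues and singly charged fragments by default; the function name `fragment_ions` is illustrative and is not taken from any search tool.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O, added to y ions (they keep the C-terminal OH and an extra H)
PROTON = 1.00728   # mass of one charge-carrying proton


def fragment_ions(peptide, charge=1):
    """Return (b_ions, y_ions) as m/z lists for b1..b(n-1) and y1..y(n-1)."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        b_neutral = sum(masses[:i])            # first i residues (N-terminal side)
        y_neutral = sum(masses[-i:]) + WATER   # last i residues (C-terminal side)
        b_ions.append((b_neutral + charge * PROTON) / charge)
        y_ions.append((y_neutral + charge * PROTON) / charge)
    return b_ions, y_ions


b, y = fragment_ions("VPTVDVSVVDLTVK")
for i, (bi, yi) in enumerate(zip(b, y), start=1):
    print(f"b{i:<2} {bi:9.3f}   y{i:<2} {yi:9.3f}")
```

For a 14-residue peptide this yields b1 to b13 and y1 to y13, which matches the highest ion numbers labelled in the spectrum.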
How to Identify MS/MS
Steen and Mann, 2004.
Peptide Sequence Tags
Autocorrelation
Probability-based matching
Submitting to Search
How does identification happen?
Your data
Step 1: Which theoretical peptides have the same
mass as the observed ion?
Step 2: Of those, which one has the most similar
fragmentation pattern?
Protein database (FASTA)
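The two steps can be sketched in a few lines. This is a toy illustration, not how any particular search engine scores spectra: step 2 is reduced to counting shared peaks, the function `identify` and the tolerances are assumptions, and the entry `PEPTIDE_X` is hypothetical. The first two candidates reuse the HSP-70 peptide and its reversed sequence with the y1 to y3 values shown later in this talk.

```python
def ppm_error(observed, theoretical):
    """Relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def identify(precursor_mass, observed_fragments, candidates,
             precursor_tol_ppm=10.0, fragment_tol_da=0.5):
    # Step 1: keep only theoretical peptides with the same mass as the observed ion.
    same_mass = [c for c in candidates
                 if abs(ppm_error(precursor_mass, c["mass"])) <= precursor_tol_ppm]

    # Step 2: count theoretical fragments that are present in the observed spectrum.
    def shared_peaks(c):
        return sum(
            any(abs(obs - theo) <= fragment_tol_da for obs in observed_fragments)
            for theo in c["fragments"]
        )

    ranked = sorted(same_mass, key=shared_peaks, reverse=True)
    return [(c["sequence"], shared_peaks(c)) for c in ranked]


candidates = [
    {"sequence": "ELEEIVQPIISK", "mass": 1396.7813, "fragments": [147.1, 234.1, 347.2]},
    {"sequence": "SIIPQVIEELEK", "mass": 1396.7813, "fragments": [147.1, 276.2, 389.2]},
    {"sequence": "PEPTIDE_X", "mass": 1402.6500, "fragments": [175.1, 288.2, 401.3]},  # hypothetical
]

# Hypothetical observed data: precursor mass and a few fragment peaks.
result = identify(1396.7820, [147.1, 234.2, 347.2, 500.3], candidates)
print(result)
```

Both isobaric sequences survive step 1; only the fragmentation pattern in step 2 separates them.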
High mass accuracy – what is it good for?
All theoretical tryptic peptide masses from human IPI database
Example: tryptic HSP-70 peptide ELEEIVQPIISK, mass 1396.7813 Da

Instrument   Mass accuracy   Calibration   # of tryptic peptides for m/z 1396.7813
LTQ          500 ppm         Ext.          344
QSTAR        20 ppm          Ext.          52
QSTAR        10 ppm          Int.          33
LTQ-FT       2 ppm           Ext.          11
LTQ-FT       1 ppm           Ext-SIM       9
LTQ-FT       0.5 ppm         Int.          3
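To see why the number of candidate peptides shrinks with better mass accuracy, it helps to convert each ppm tolerance into an absolute mass window around 1396.7813. A minimal sketch (the helper name `mass_window` is illustrative):

```python
def mass_window(mass, tol_ppm):
    """Return the (low, high) mass limits for a symmetric ppm tolerance."""
    delta = mass * tol_ppm / 1e6
    return mass - delta, mass + delta


MASS = 1396.7813  # the HSP-70 example peptide from the slide
for tol in (500, 20, 10, 2, 1, 0.5):
    low, high = mass_window(MASS, tol)
    print(f"{tol:>5} ppm: {low:.4f} to {high:.4f} (width {high - low:.4f})")
```

At 500 ppm the window is about 1.4 Da wide, while at 0.5 ppm it is about 0.0014 Da, so far fewer theoretical peptides fall inside it.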
Defining the “Search Space”
The “Search Space”
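[Figure: a protein cut into tryptic peptides 1 to 6, and the peptides to be considered for each number of missed cleavages (mcl):
0 mcl: 1, 2, 3, 4, 5, 6
1 mcl: 1/2, 2/3, 3/4, 4/5, 5/6
2 mcl: 1/2/3, 2/3/4, 3/4/5, 4/5/6]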
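The growth of the search space with missed cleavages can be reproduced with a small in-silico digest. This is a sketch under the common assumption that trypsin cleaves after K or R unless the next residue is P; the protein sequence below is hypothetical and chosen only so that it yields six fully cleaved peptides.

```python
def cleavage_pieces(protein):
    """Split a protein after K or R, except when the next residue is P."""
    pieces, start = [], 0
    for i, aa in enumerate(protein):
        nxt = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and nxt != "P":
            pieces.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        pieces.append(protein[start:])  # C-terminal peptide without K/R
    return pieces


def tryptic_digest(protein, max_missed_cleavages=0):
    """All peptides with 0..max_missed_cleavages missed cleavage sites."""
    pieces = cleavage_pieces(protein)
    peptides = []
    for start in range(len(pieces)):
        for mcl in range(max_missed_cleavages + 1):
            end = start + mcl + 1
            if end <= len(pieces):
                peptides.append("".join(pieces[start:end]))
    return peptides


PROTEIN = "AAKCCRDDKEEKFFRGG"  # hypothetical sequence with 6 tryptic peptides
for mcl in (0, 1, 2):
    print(mcl, "mcl:", len(tryptic_digest(PROTEIN, mcl)), "peptides")
```

With six fully cleaved peptides, allowing 1 and 2 missed cleavages gives 11 and 15 candidate peptides, the same 6 + 5 + 4 pattern as in the figure.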
Importance of Search Space Size
A search tool does not identify a peptide. It only
reports the theoretical sequence that statistically
best matches the experimental data.
If you increase the size of the database too much, or
the size of the search space, false-positive rates also
increase.
Defining FDRs
Steen and Mann, 2004
MOWSE
The chance that two peptides with different sequences have
approximately the same Mr and share MS/MS similarities.
More variables inserted during search --> higher chance of
random events --> higher MOWSE score threshold
Parameters that can modify the MOWSE calculation:
-Database size;
-MMD (measured mass deviation);
-Number of PTMs chosen;
-Data quality.
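The effect of the number of PTMs chosen can be illustrated by counting how many candidate forms a single peptide produces once a variable modification is allowed. A rough sketch, assuming one hypothetical variable modification that can sit on S, T or Y (as phosphorylation would); modified residues are written in lower case purely for display.

```python
from itertools import combinations


def modified_forms(peptide, targets="STY", max_mods=None):
    """Enumerate all forms of a peptide with 0..max_mods modified target residues."""
    sites = [i for i, aa in enumerate(peptide) if aa in targets]
    limit = len(sites) if max_mods is None else min(max_mods, len(sites))
    forms = []
    for k in range(limit + 1):
        for chosen in combinations(sites, k):
            forms.append("".join(aa.lower() if i in chosen else aa
                                 for i, aa in enumerate(peptide)))
    return forms


forms = modified_forms("VPTVDVSVVDLTVK")
print(len(forms), "candidate forms")
for form in forms:
    print(form)
```

The peptide has three possible sites, so one variable modification turns a single candidate into 2^3 = 8 candidates, and every additional variable PTM multiplies the search space further.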
Example of MMD issue
Mycoplasma sp. sample (Munich 2006):
-Database had ~ 700 entries;
-Data accuracy averaged 0.7 ppm;
-MMD used during search: 3 ppm.
Probability Based Mowse Score
Ions score is -10*Log(P), where P is the probability that the observed match is a random event.
Individual ions scores > 7 indicate identity or extensive homology (p<0.05).
Protein scores are derived from ions scores as a non-probabilistic basis for ranking protein hits.
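The relation between the ions score and P is a simple logarithmic transform, sketched below. The helper names are illustrative. Note that this only converts between score and P; the significance threshold reported for a given search also depends on the search space, which this sketch does not model.

```python
import math


def ions_score(p):
    """Ions score for a match with random-match probability p."""
    return -10.0 * math.log10(p)


def probability_from_score(score):
    """Inverse transform: the probability P that corresponds to an ions score."""
    return 10.0 ** (-score / 10.0)


for p in (0.05, 0.01, 0.001):
    print(f"P = {p:<6} -> score {ions_score(p):.1f}")
```

Every 10 points of score correspond to a tenfold drop in P.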
Strategies to Visualize FDRs
Peng et al. (2003). Evaluation of multidimensional chromatography coupled with tandem mass spectrometry (LC/LC-MS/MS) for large-scale protein analysis: the yeast proteome. J Proteome Res 2, 43-50.
Reversed database sequence
False positive identification using reversed database
HSP-70 tryptic peptide
            (forward)          (reverse)
Peptide     K ELEEIVQPIISK     K SIIPQVIEELEK
Mr          1396.7813 Da       1396.7813 Da

Mascot checks both peptides.

Theoretical y series
            (forward)          (reverse)
y1          147.1              147.1
y2          234.1              276.2
y3          347.2              389.2
....        ....               ....
y11         1268.7             1310.8

Expected ions from the reversed hit should not correlate
with the ions observed in the experiment.
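The forward and reversed HSP-70 peptides can be reproduced with a short sketch: reversing the sequence (including the preceding K) gives a peptide with the same composition, and therefore the same Mr, but a different y series. The mass table below contains only the residues needed for this example, and the helper names are illustrative.

```python
# Monoisotopic residue masses (Da), restricted to the residues in this peptide.
MASS = {"E": 129.04259, "L": 113.08406, "I": 113.08406, "V": 99.06841,
        "Q": 128.05858, "P": 97.05276, "S": 87.03203, "K": 128.09496}
WATER, PROTON = 18.01056, 1.00728


def peptide_mass(seq):
    """Neutral monoisotopic mass (Mr) of an unmodified peptide."""
    return sum(MASS[a] for a in seq) + WATER


def y_series(seq):
    """Singly charged y ions y1..y(n-1)."""
    return [sum(MASS[a] for a in seq[-i:]) + WATER + PROTON
            for i in range(1, len(seq))]


forward_region = "KELEEIVQPIISK"         # preceding K plus the tryptic peptide
reversed_region = forward_region[::-1]   # what a reversed database would contain
forward = forward_region[1:]             # cleave after the leading K
decoy = reversed_region[1:]

print(forward, f"{peptide_mass(forward):.4f}")
print(decoy, f"{peptide_mass(decoy):.4f}")
for i in (1, 2, 3, 11):
    print(f"y{i:<3} {y_series(forward)[i - 1]:8.1f} {y_series(decoy)[i - 1]:8.1f}")
```

Only y1 coincides, because both peptides end in K; from y2 onwards the two series diverge, which is why a reversed hit should not match the observed fragments except by chance.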
Typical Result
[Figure: "All peptides Mascot": Mascot score (0 to 160) vs. sequence length (5 to 25)]
How to Validate the Data
Are there any reversed-hit proteins with 2 peptides above the
MOWSE score threshold?
-No: All proteins identified with 2 peptides scoring above
the p<0.05 threshold are good.
-Yes: Repeat the Mascot search with more stringent
parameters.
What about 1-hit wonders? (Proteins identified with only 1
peptide)
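The decision rule above can be written as a small check over a list of protein hits. This is only a sketch of the rule as stated on the slide: the data structure, the protein names, the scores and the threshold value are all hypothetical, and real search engine exports look different.

```python
def confident(hit, score_threshold, min_peptides=2):
    """True if the protein has at least min_peptides peptides above the threshold."""
    above = [s for s in hit["peptide_scores"] if s > score_threshold]
    return len(above) >= min_peptides


def validate(protein_hits, score_threshold):
    """Return names of confident forward hits and confident reversed hits."""
    forward = [h["name"] for h in protein_hits
               if not h["reversed"] and confident(h, score_threshold)]
    reverse = [h["name"] for h in protein_hits
               if h["reversed"] and confident(h, score_threshold)]
    return forward, reverse


# Hypothetical search result, for illustration only.
hits = [
    {"name": "HSP70", "reversed": False, "peptide_scores": [62.0, 48.0, 31.0]},
    {"name": "PROT_B", "reversed": False, "peptide_scores": [40.0]},  # 1-hit wonder
    {"name": "REV_PROT_C", "reversed": True, "peptide_scores": [27.0, 12.0]},
]
THRESHOLD = 25.0  # hypothetical score threshold for p<0.05

forward, reverse = validate(hits, THRESHOLD)
if reverse:
    print("Reversed hits pass the filter: repeat the search with more stringent parameters")
else:
    print("Accepted proteins:", forward)
```

In this toy data set no reversed protein has two peptides above the threshold, so the forward proteins with two or more such peptides are accepted; the 1-hit wonder is left out and needs separate consideration.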
How to Validate the Data
[Figure: "All peptides Mascot": Mascot score (0 to 160) vs. sequence length (5 to 25)]
Basically, the idea is to “play around” with the statistics to
make your result more reliable.
Take home message
1. Data quality (mass accuracy) and a well-defined
search space are key for reliable peptide identification
2. Reliable identification is an interplay between asking
enough without asking too much (careful when trying
to get “as many IDs as I can”!)
PTMs
Gustavo de Souza
IMM, OUS
October 2013
PTMs in biology
PTMs in biology
Complexity of Protein Samples in Eukaryotes
Modifications are specific
to a group of amino acids
What difference to expect at MS level?
Larsen MR et al, 2006.
Defining the “Search Space”
PTM abundance in a cell
[Figure: number of peptides vs. abundance level, for total peptides in a sample and for modified peptides; differences from 10^2 to 10^4]
PTM abundance in a cell
Stable vs. Labile PTMs
Larsen MR et al, 2006.
Neutral loss
Boersema PJ et al, 2009.
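As a worked example of a neutral loss, the sketch below assumes the labile PTM is a phosphorylation on serine or threonine that is lost as phosphoric acid (H3PO4, 97.9769 Da) during fragmentation; the slide itself does not name the modification. Because the lost molecule is neutral, the charge stays the same and the peak shifts by the loss mass divided by the charge. The precursor values are hypothetical.

```python
H3PO4 = 97.97690  # monoisotopic mass of phosphoric acid (Da)


def neutral_loss_mz(precursor_mz, charge, loss_mass=H3PO4):
    """m/z of the precursor after losing a neutral molecule (charge unchanged)."""
    return precursor_mz - loss_mass / charge


precursor_mz, charge = 800.0, 2  # hypothetical doubly charged phosphopeptide
print(f"{neutral_loss_mz(precursor_mz, charge):.4f}")
```

For a doubly charged precursor the neutral-loss peak therefore appears about 49 m/z units below the precursor, and about 32.7 units below for a triply charged one.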
Identifying Labile PTMs
Larsen MR et al, 2006.
HCD fragmentation
Larsen MR et al, 2006.
Status of PTM coverage
Lemeer and Heck, 2009.
Status of PTM coverage
Derouiche A et al, 2012.
Take home message
- Depending on the PTM, identification can be very easy
or very hard
- This depends on stability under fragmentation and
abundance in the sample
- ID improvement has mostly been driven by instrumentation
improvements (sensitivity, etc.)