The ABRF Edman Sequencing Research Group 2009 Study
Protein Sequencing Research Group:
Results of the PSRG 2011 Study
Sensitivity assessment of Edman and Mass Spectrometric
Terminal Sequencing of an undisclosed protein
Current PSRG Members
Jim Walters (Chair)          Sigma-Aldrich
J. Steve Smith               University of Texas Medical Branch at Galveston
Wendy Sandoval               Genentech, Inc.
Kwasi Mawuenyega             Washington University School of Medicine
Bosong Xiang                 Monsanto, Co.
Detlev Suckau*               Bruker Daltonics
Henriette Remmer*            University of Michigan
Viswanatham Katta*           Genentech, Inc.
Peter Hunziker (ad hoc)      University of Zurich
Jack Simpson (EB liaison)    SAIC/National Cancer Institute at Frederick
* new members added in 2010
PSRG – Review
2009 Study – what techniques are complementary to Edman – two samples
Edman remains reliable
MS-based top-down techniques performed well and show great promise;
bottom-up techniques were successful when prior knowledge of the sequence or reliable
database information was available
2010 Study – follow on from 2009 using an antibody
It was necessary for ISD participants to use T3 sequencing to obtain true
terminal information
Edman analyses required deblocking of the heavy chain
The most complete de novo sequencing was obtained by bottom up
participants
Status: Edman sequencing and mass spectrometry based techniques have varied
strengths and weaknesses depending on several experimental factors and both have
a role in biochemical research
2010 Notables
Second year as PSRG and 3rd year for non-Edman participants
Three new members added
With a complementary role realized, we attempt to push the capabilities of the
varied sequencing techniques, namely assay sensitivity
2011 PSRG Study Timeline
[Timeline figure; the pairing of dates with milestones is not recoverable from the transcript.]
Dates marked: Mar '10, Apr '10, May '10, Jun '10, Aug '10, Sep, Oct '10, Jan '11, Feb '11
Milestones, in the order extracted:
PSRG committee adds three new members
Study proposal sent to EB
Samples sent to participants
Settled on a designer protein not in a database
Data analysis
Explored different potential study samples
Extended deadline for returning data
2011 Study announcement
ABRF 2010
ABRF 2011
Discussed ideas for 2011 study; agreement upon a study design
PSRG 2011 Study Objective
To obtain terminal sequence information on
varying amounts of a protein sample whose
sequence was not in a database
2011 Study Design – The Sample Sets
Sample Set Name   Sample Format            Tube # (Sample amount in pmol)
A                 Solution (lyophilized)   A1 (5), A2 (15), A3 (45)
B                 Gel slice                B1 (5), B2 (15), B3 (45)
C                 Membrane piece           C1 (5), C2 (15), C3 (45)
Participants chose which of three sample sets they wanted to analyze
(designated A, B or C)
Each sample set contained three tubes (designated 1,2 or 3).
Each tube contains the same recombinant protein with increasing amounts
of material
Participants could request any single set (received 3 tubes),
two sets (6 tubes), or all three sets (9 tubes)
The Protein Sample
recombinant protein
expressed in an E. coli system
molecular weight ~50 kDa
amino acid sequence of the protein is not in any public domain database
sample was donated in liquid formulation in buffer
purified and AAA quantified
Sample Preparation and Distribution
Expressed protein purified using C-terminal His tag then
by size exclusion chromatography and confirmed by
SDS-PAGE.
protein containing fractions were quantified by AAA
dispensed into 1.5 mL tubes and lyophilized
dried samples were shipped as is, referred to as Set A.
or samples were resuspended and run on a gel (Set B)
or PVDF membrane (Set C) and the gel/membrane slices
corresponding to the ~50 kDa band were sent to
participants.
the tube with lowest sample amount contains ~ 5 pmol
dried, loaded on gel, or blotted on membrane
A - lyophilized
B – in gel samples
C – membrane
Requests of participants
Analyze samples in the designated numerical order or from lowest sample amount to
highest and report on all samples analyzed
Edman sequencing: participants to provide amino acid yield data at every cycle
Alternative (MS based) methods: asked participants to provide the raw data files and
peak lists, and method used for sequence assignment
instructed not to split sample due to the objective of the study and relatively low
sample amounts
potential presence of a co-purified E. coli protein at <20 kDa in Sample Set A is
known, but of no interest to current study.
suggested buffers to use to dissolve Sample Set A (lyophilized samples).
0.1% TFA
25 mM ammonium bicarbonate
0.1% TFA / 20% acetonitrile
Participants were asked to fill out a survey; all survey responses and raw data were submitted
anonymously
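The per-cycle yield data requested from Edman participants is commonly summarized as an initial yield and a repetitive yield, obtained from a straight-line fit of log(yield) against cycle number. The sketch below is one minimal way to do that fit; the function name and the example yields are hypothetical and are not study data.

```python
import math

def fit_edman_yields(cycles, yields_pmol):
    """Least-squares fit of log10(yield) against cycle number.

    Returns (initial_yield_pmol, repetitive_yield_fraction).
    """
    logs = [math.log10(y) for y in yields_pmol]
    n = len(cycles)
    mean_x = sum(cycles) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in cycles)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(cycles, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Extrapolating the line to cycle 0 gives the initial yield;
    # 10**slope is the fraction of signal carried into the next cycle.
    return 10 ** intercept, 10 ** slope

# Hypothetical PTH-amino acid yields (pmol), invented for illustration.
cycles = [1, 2, 3, 4, 5, 6]
yields_pmol = [3.0 * 0.94 ** c for c in cycles]

initial, repetitive = fit_edman_yields(cycles, yields_pmol)
print(f"initial yield ~ {initial:.2f} pmol, repetitive yield ~ {repetitive:.1%}")
```

Dividing the fitted initial yield by the amount loaded gives a rough recovery figure. In practice the fit is usually restricted to stable residues, which is an analyst's choice rather than anything specified by the study.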
2011 PSRG Study Sample Set Requests
74 sample set requests
(by 38 different labs)
[Bar chart: # of requests (axis 0-40) for each Sample Set (A, B, C); bar values are not recoverable from the transcript.]
Survey response results
(18 out of 38 Labs filled out a survey)
Techniques used and how many responded
[Bar chart: # of responses (axis 0-12) by Technique Performed (Top Down, Bottom-up, Edman); bar values are not recoverable from the transcript.]
N-Terminal Techniques:
Edman Degradation
Uses of Edman Sequencing
Cleavage site determination for proteases
Sequencing of MHC peptides
Sequencing of synthetic peptide libraries
Full characterization of proteins, especially recombinant proteins, that
are present in large quantities
Stoichiometry, Edman is semi-quantitative
Protein identification for non-model organisms which do not have
extensive DNA sequencing
Domain mapping
Confirmation of N-terminus
As a help for mass spectrometry sequencing to perform manual
subtractions
Product characterization for SOPs for pharma
Can distinguish between the isobaric amino acids Leucine and
Isoleucine
Clonality determination or antibody sequencing for cloning
Adapted from: ESRG Presentation: ABRF 2005
Edman Workflows
PSRG 2011 Sample
Direct sequence
ABI Procise Instruments:
7 - 494 HT’s
2 - 494
2 - 494 cLC
Maximum # of correct calls from N-terminus reported
Sample Set   Sample Format            5 pmol   15 pmol   45 pmol
A            Solution (lyophilized)   24       32        49
B            Gel slice                N/A      9*        N/A
C            Membrane piece           26       33        33
* no supporting data provided
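One plausible way to score a read for a table like this is to count consecutive correct calls starting at residue 1. The study does not state its exact scoring rule, so the sketch below is an assumption: the function name and the example reads are hypothetical, and the reference is the 27-residue N-terminal stretch listed in the bottom-up results of this study.

```python
def correct_calls_from_n_terminus(called, reference):
    """Count consecutive calls that match the reference, starting at residue 1.

    Counting stops at the first mismatch or no-call ("X"), or when either
    sequence ends. Calls are compared case-insensitively.
    """
    count = 0
    for call, ref in zip(called.upper(), reference.upper()):
        if call == "X" or call != ref:
            break
        count += 1
    return count

# First 27 residues as listed in the bottom-up results of this study.
REFERENCE_N_TERM = "GALRVFDEFKPLVEEPQNLIRVFDEFK"

# Hypothetical reads, invented for illustration.
print(correct_calls_from_n_terminus("GALRVFDEFKPLVEE", REFERENCE_N_TERM))
print(correct_calls_from_n_terminus("GALXVFDEFK", REFERENCE_N_TERM))
```

A stricter or looser rule (for example, allowing a no-call to be skipped) would give different counts, so scores are only comparable when the same rule is applied to every lab.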
Summary of Edman Data
[Two bar charts for samples A1, A2, A3, C1, C2, and C3: average number of residues called (axis 0-25); amount loaded (pmol) compared with average initial yield (pmol) (axis 0-45). Bar values are not recoverable from the transcript.]
Sample Sets A and C: N-terminal residues identified
[Table: residue calls reported by each lab, cycle by cycle, for samples A1, A2, A3, C1, C2, and C3. The grid holds single-letter residue calls with X entries and occasional lowercase letters; the per-lab column alignment is not recoverable from the transcript.]
Does increasing amount of sample increase calls?
Confident calls by Edman - Sample Set A
Lab               A1 (5 pmols)   A2 (15 pmols)   A3 (45 pmols)
Participant 004   20             N/A             N/A
Participant 024   10             10              10
Participant 058   N/A            N/A             N/A
Participant 020   N/A            10              N/A
PSRG002           24             32              50
Confident calls by Edman - Sample Set C
Lab               C1 (5 pmols)   C2 (15 pmols)   C3 (45 pmols)
Participant 006   1              10              17
Participant 014   9              26              N/A
Participant 016   24             31              N/A
Participant 024   10             10              10
Participant 036   11             20              20
Participant 040   11             N/A             33
Participant 058   13             15              15
PSRG001           26             33              30
PSRG002           11             19              24
Data trends toward longer reads as function of increased sample amount
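The trend can be checked roughly by averaging the reported Set C calls at each amount. This is a minimal sketch using the Set C values tabulated above; N/A entries are skipped, so each mean covers a slightly different group of labs and should be read as indicative only.

```python
from statistics import mean

# Confident calls for Sample Set C as tabulated above; None marks N/A.
set_c_calls = {
    "C1 (5 pmol)":  [1, 9, 24, 10, 11, 11, 13, 26, 11],
    "C2 (15 pmol)": [10, 26, 31, 10, 20, None, 15, 33, 19],
    "C3 (45 pmol)": [17, None, None, 10, 20, 33, 15, 30, 24],
}

def mean_reported(values):
    """Mean over the labs that reported a value (N/A entries are dropped)."""
    reported = [v for v in values if v is not None]
    return mean(reported)

means = {sample: mean_reported(v) for sample, v in set_c_calls.items()}
for sample, m in means.items():
    print(f"{sample}: mean of reported calls = {m:.1f}")
```

The means come out at about 12.9, 20.5, and 21.3 residues, which is consistent with longer reads at higher amounts, with most of the gain between 5 and 15 pmol.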
Edman degradation sample solubility
Lab               Solvent for Set A     Maximum residues
Participant 004   0.1% TFA/30% IPA      18 in A1
Participant 024   0.1% TFA/20% ACN      10 in all
Participant 058   0.1% TFA              0
PSRG002           0.1% TFA/50% ACN      49 in A3
Sample recovery was best when organic solvent was utilized.
Other solvents have been shown to be OK as well, data not shown.
PSRG 2011 Edman Conclusions & Observations
Edman sequencing allows for direct determination of the protein’s N-terminal sequence.
Reliable N-terminal Edman data were obtained from the lowest-amount (5 pmol)
samples for both Sample Sets A and C.
Generally, slightly longer read lengths were observed as sample amount increased.
Sequencing preview and lag became more evident as sample amount increased.
Contaminating proteins in the sample did not contribute negatively to any Edman result.
Sample A: concentration of contaminating protein was too low to be detected.
Sample C: sample was “isolated” by running the gel prior to blotting.
No C-terminal data was produced with Edman.
One lab returned N-terminal data from Set B (gel slice) but did not provide supporting data.
N-Terminal Techniques Overview:
Bottom-Up MS Techniques
Enzymatic Digestion
Uses of Bottom-up Sequencing
Protein identification via sequencing of unique (internal) peptides and
subsequent database search
Biomarker discovery
A high degree of sequence coverage can be achieved by utilizing
different proteases for digestion and combining results
Identification and localization of Post-translational Modifications
Identification and localizations of introduced protein modifications,
e.g. cross linkers
Estimation of relative quantities of like proteins between samples via
spectral counting
Confirmation of the complete protein sequence
De-novo elucidation of complete protein sequences
Elucidation of the N-and C-terminus with limitations (multiple
enzymes or labeling strategies)
PSRG Presentation: ABRF 2011
Bottom-Up MS Experimental – LC-MS Systems
All Labs used LC separation prior to peptide analysis.
Eksigent NanoLC-2D
AB Sciex 4800
Thermo LTQ XL - 2
Thermo LTQ-Orbitrap Velos - 2
Bruker Ultraflex TOF/TOF
Bottom up Sample Preparation
PSRG 2011 Sample
[Mass spectrum figure (m/z axis roughly 200-1100) with two labeled conditions: 100 mM AmBiC and 10 mM AmBiC. Peak labels are not reproduced.]
Digestion Enzymes
1 lab did Trypsin alone
Multiple enzymes
Trypsin, Glu-C, Lys-C
Trypsin, Glu-C
2 Trypsin, Chymotrypsin
Lys-C, Lys-N
2 MASCOT
3 manual
Data Explorer (AB)
Manual DeNovo Mascot
PEAKS 5.2
in house analysis software
Bottom up results
N-terminal Sequence (residues 1-27):    GALRVFDEFK PLVEEPQNLI RVFDEFK
C-terminal Sequence (residues 440-465): ETNLYFQGDD DDKALVALHV HHHHHH
Terminal sequence (de novo)
Sample A1, Participant #040
N-terminus: GALRVFDEFKPLVEE (residues 1-15, note 1)
C-terminus: no calls (X)
Other entries (headings: Sample A2; Samples B1, B2, B3; participants #048, #048*, #034, #026, PSRG003)
[The pairing of rows with participants is not recoverable from the transcript. The extracted rows are:
N-terminus: GALRVFDEFK, then no calls (note 2)
C-terminus: ALVALHVHHHHHH (matches residues 453-465)
N-terminus: Ac-MFMDDFAAFVEKVAVKFVVPRALELLF (27 positions; differs from the N-terminal sequence listed above)
C-terminus: ETNLYFQGDDDDKALVALHVHHHHHH (residues 440-465)
remaining rows: no calls (X)]
* Participant #048 sequenced more than 200 amino acids by manual spectra interpretation .
Note 1: Participant 040 also sequenced by Edman degradation and had the opportunity to search MS/MS data for the correct N-terminal peptide.
Note 2: Participant PSRG003 used Lys-C and Lys-N in combination according to a published procedure for N-terminal sequencing (see reference section).
Legend: a no-call is marked with "X"; an incorrect call is shown as a letter that is not color coded; correct N-terminal and correct C-terminal calls are color coded (the color coding is not preserved in this transcript).
Bottom up Strategies – Lys-C/Lys-N digest:
A Novel Method for Analyzing Protein Terminals. Kishimoto et al., ASMS 2010, TP08
Straightforward ladder sequencing of peptides using a Lys-N metalloendopeptidase. Taouatas et al., Nature Methods, Vol. 5, No. 5, p. 405-407, 2008
Proteome-wide analysis of protein carboxy termini: C terminomics. Nature Methods, Vol. 7, No. 7, p. 508-511, 2010
Lys-N Vendors: Associates of Cape Cod, East Falmouth, MA; Seikagaku KK, Japan
PSRG03
Comparison of the N-terminal peptide from Lys-C and Lys-N digests
[Figure panels: Lys-C and Lys-N]
Lys-N generates the same N-terminal peptide as Lys-C, except that there is no lysine in the
sequence of the Lys-N peptide.
PSRG03
Bottom up Strategies – Lys-C/Lys-N digest
Lys-N generates peptides with the same m/z as Lys-C.
Exception 1. No lysine in the N-terminal peptide using Lys-N
Exception 2. No lysine in the C-terminal peptide using Lys-C
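The rule and its two exceptions can be illustrated with a small in-silico digest. This is a minimal sketch: the function names are hypothetical, missed cleavages are ignored, and the toy sequence simply joins the N-terminal and C-terminal stretches reported in this study; it is not the full protein.

```python
def lys_c(seq):
    """Cleave on the C-terminal side of every lysine (K ends a peptide)."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        if aa == "K":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])
    return peptides

def lys_n(seq):
    """Cleave on the N-terminal side of every lysine (K starts a peptide)."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        if aa == "K" and i > start:
            peptides.append(seq[start:i])
            start = i
    peptides.append(seq[start:])
    return peptides

# Toy sequence: reported N-terminal stretch joined to part of the reported
# C-terminal stretch (an illustration, not the real protein).
toy = "GALRVFDEFKPLVEEPQNLIRVFDEFK" + "DDDDKALVALHVHHHHHH"

c_peps, n_peps = lys_c(toy), lys_n(toy)
print("Lys-C:", c_peps)
print("Lys-N:", n_peps)

# Internal peptides have the same composition (hence the same mass) in both digests.
same_internal = all(sorted(a) == sorted(b)
                    for a, b in zip(c_peps[1:-1], n_peps[1:-1]))
print("internal peptides share composition:", same_internal)
```

In this toy case the N-terminal peptide is GALRVFDEFK with Lys-C and GALRVFDEF with Lys-N, while the C-terminal peptide is ALVALHVHHHHHH with Lys-C and KALVALHVHHHHHH with Lys-N. Peptides whose mass shifts by one lysine between the two digests are therefore candidates for the termini.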
PSRG03
C-terminal MS1 spectra from Lys-C digest
PSRG03
C-terminal peptide spectra and de novo sequencing
PSRG03
Combining Edman and enzymatic digestion using
trypsin and Glu-C to identify the N-terminus (Participant #040)
Sequence Calls using Edman on Sample C3:
GALRVFDEFKPLVEEPQNLIRVFDEFKPLVKPE
MS/MS Data using 4700
Participant 009
Bottom-Up Conclusions
Bottom up analysis involves enzymatic or chemical cleavage of the protein
followed by MS/MS analysis of the peptide mixture.
Small (6-25aa) fragments are generated that usually do not cover the
complete protein sequence and may not include the terminal fragments.
Successful bottom up analyses utilized multiple enzymes and relied heavily
on bioinformatics or manual data interpretation
Successfully called the N-terminus and C-terminus using lyophilized sample,
15 pmol
Successfully called the C-terminus using in-gel sample, 15 pmol
MALDI and ESI both showed success, as did Orbitrap and TOF/TOF instruments
True N-terminal peptides were difficult to assign; however, bottom-up can be used in
complementary fashion with Edman or dedicated chemistry to elucidate
terminal peptides
N-Terminal Techniques Overview:
Top-Down MS
In-Source Decay Fragmentation
In-Source Decay (MALDI-ISD)
MALDI-MS and MS/MS
• Analyte + matrix on metal target plate
• Spot is excited with laser, ionization
occurs
• Ions are resolved by mass in TOF
analyzer
• Second TOF allows for MS/MS by
precursor ion fragmentation
MALDI-ISD
• “pseudo-MS/MS” technique
• Decomposition of protein in the MALDI
plume at <nsec timescale
• Ion formation due to radical transfer from
matrix to analyte (Takayama, 2001)
• Sequence determination without digestion
(“Top Down”) even from large proteins
• Second TOF allows for T³-sequencing
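As a rough illustration of how an ISD ladder reads out a sequence, the sketch below computes singly charged c-ion and y-ion m/z values from standard monoisotopic residue masses. The function names are hypothetical, only the residues needed for the example are tabulated, and the N-terminal stretch is the one reported in this study.

```python
# Monoisotopic residue masses (Da) for the residues used below.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "L": 113.08406, "I": 113.08406,
    "R": 156.10111, "V": 99.06841, "F": 147.06841, "D": 115.02694,
    "E": 129.04259, "K": 128.09496, "H": 137.05891,
}
PROTON, WATER, AMMONIA = 1.00728, 18.01056, 17.02655

def c_ions(seq):
    """Singly charged c-ion m/z values c1..c(n-1), read from the N-terminus."""
    ions, total = [], 0.0
    for aa in seq[:-1]:
        total += RESIDUE_MASS[aa]
        ions.append(total + AMMONIA + PROTON)
    return ions

def y_ions(seq):
    """Singly charged y-ion m/z values y1..y(n-1), read from the C-terminus."""
    ions, total = [], 0.0
    for aa in reversed(seq[1:]):
        total += RESIDUE_MASS[aa]
        ions.append(total + WATER + PROTON)
    return ions

n_term = "GALRVFDEFK"  # N-terminal stretch reported in this study
ladder = c_ions(n_term)

# The spacing between neighbouring c ions equals one residue mass,
# which is how a sequence is read from the ladder.
steps = [round(b - a, 3) for a, b in zip(ladder, ladder[1:])]
print(steps)
```

Because leucine and isoleucine have identical residue masses, a ladder of this kind cannot tell them apart, which is why such positions are written I/L in mass spectrometric reads.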
ISD and T3 Sequencing
Suckau & Resemann, Anal Chem, Vol. 75, 21 (2003)
Uses of MALDI-Top-Down Sequencing (ISD)
Confirmation of N-terminus, even if modified (pyroGlu, Methyl,
Acetyl,…)
Confirmation of C terminus (terminal read length up to 80
residues)
Protein identification from low complexity mixtures
Biopharma: protein termini QC, side products elucidation (terminal
truncations or elongations)
Fusion site confirmation in recombinant proteins
Proteolytic degradation product assignment
PTM elucidation; modification sites and types, PEGylation sites
Enzyme specificity testing on protein fragments (e.g. Kinase
phosphorylation sites determination)
Full characterization of proteins that are present in large quantities
Full de novo sequencing capability up to ~15 kDa
Domain mapping
Identification of ragged termini
ISD Experimental attempts
Sample: 0.1% TFA; 20% ACN/0.1% TFA
Separation / Clean-Up: C4 ZipTip; chloroform-methanol precipitation, reconstituted in 0.1% TFA
Matrix: DAN (1,5-diaminonaphthalene); DHB (2,5-dihydroxybenzoic acid)
ISD Instrumentation: Bruker Ultraflex; AB Sciex 4800
Study Preparation: chloroform-methanol precipitation, ISD, manual data analysis
[Figures: 4700 reflector spectrum and MS/MS spectrum of precursor m/z 1619.1 (% intensity versus mass, m/z), annotated with b- and y-ion labels and with the sequence G A L R V F D E F K P L V E E; a further spectrum spanning roughly m/z 1975-5100 annotated with the tag N I/L I/L V R F. Peak lists are not reproduced.]
(N-terminal seq obtained from Edman analysis)
Red seq from ISD analysis
Summary of Top Down Analysis
None of the participants or PSRG succeeded in obtaining terminal
sequences using ISD from study samples – other Top-Down methods
were not attempted (ECD, ETD, …)
All participants did the routine things, but typical sample issues likely
hindered analysis
Potential Reasons
Solubility - only a fraction of sample is recovered
Sample amount overestimated by traditional quantitative
methods – less provided than presumed
Protein contamination has significant effect in Top-Down:
problem and potential!
Limited sample availability: no investigation of problem, no
optimization possible (intact MW, purity, solubility..)
Protein LC-separation of 100 pmol sample
Pepswift PS-DVB (monolithic column)
[Chromatograms: 100 pmol casein and 100 pmol study sample]
Result:
• Several proteins present
• Much less protein available to the analysis than anticipated by original protein quantification
• ~5-10 pmol instead of 100 pmol
Monolithic LC separation of lyophilized sample
[Chromatogram annotations: protein of interest; theoretical amount of 100 pmol; reveals the presence of several proteins]
ISD of Fraction 75 contains study sample:
Matches sequence, but NOT de novo
ISD of Fraction 36 +Mascot:
30S ribosomal protein S15 E.coli
[ISD spectrum: absolute intensity (× 1000) versus m/z (roughly 2000-7000), annotated with c-, y-, and z+2-ion series; the sequence annotations are not recoverable from the transcript.]
ISD of Fraction 32 +Mascot:
YOBA_ECOLI Fragment 27-84
[ISD spectrum: absolute intensity (× 1000) versus m/z (roughly 2000-6000), annotated with a-, c-, y-, and z+2-ion series; the sequence annotations are not recoverable from the transcript.]
ISD of Fraction 47 +Mascot:
HFQ_SERP5 N-term only (homolog to E.coli??)
[ISD spectrum: absolute intensity (× 1000) versus m/z (roughly 2000-5000), annotated with a-, c-, y-, and z+2-ion series; the sequence annotations are not recoverable from the transcript.]
Summary on MALDI-ISD study follow-up work
Expected ~50 kDa protein present plus contamination in the 16 kDa range
De novo sequencing was not possible due to sample amount restrictions
Protein LC-MALDI analysis showed only ~ 5-10 % of expected protein is
available after separation
Multiple labs observed poor recovery from reverse phase columns
Protein LC-MALDI-ISD analysis theoretically starting with 100 pmols of
sample
49 N-term and 56 C-term matches – not de novo – as sample amount
was much lower than thought
IDs of several bacterial Heat Shock Proteins after ISD-Mascot
analysis
Comments…’but not enough time’
I had planned to isolate/capture N-terminus but did not due to lack
of time
Be more clear in instructions and allow much more time between
sample arrival and data submission so that if extensive
preparation is necessary, there will be time enough to perform it
without affecting standard samples sequenced in the lab
Very nice setup; but I needed more time to take full advantage. As
my ISD ambitions failed (!!) I turned to proteolytic digestions and
PSD: Performed a lot of bottom up analyses, mainly after
sulfonation…
Sorry, I did not have time to properly analyze the data and to do
the experiment as if it would have to be done
Comments
(continued)
did not spend time to purify or evaluate low level sequences by MS... Instructions
were somewhat confusing. Not clear if the sample needed purification before Edman
Thanks! …even though we have de novo software we do NOT have a good strategy
for obtaining sequence and determining N and C termini…Also, we identified quite a
few peptides that likely weren't N-terminal or C-terminal…using other enzymes and
finding overlapping sequences would have been a better strategy
I wouldn't mind trying another of these after I see how to approach it
I will be very interested in seeing the results of the mass spec analysis of these
samples to which I do not have access…would like to see the comparison
It was very tough one to get the whole sequence even though it was not the goal
Sample has a ragged N-terminal sequence... Samples A1 to A3 were solubilized in
0.1% TFA and blotted but no sequence was observed…suggesting that no protein
was in the tube or that it was insoluble in 0.1% TFA.
Challenging but good.
Final conclusions
Two techniques were successfully employed in this study to obtain N-terminal
sequence of an undisclosed protein not present in public databases.
Edman Degradation – lowest sample amounts of Samples A and C
Enzymatic Digestions – 15 pmols sample amounts of Sample A and B
For Edman, slightly longer read lengths were noticed as sample amount
increased; however, sequencing preview and lag became more evident.
De novo Bottom-up was not successful unless a priori knowledge of sequence was
obtained (by Edman, database…etc). There are strategies which can be successful
however the current strategies have limitations.
Top-down was not successful in obtaining terminal sequences using ISD from study
samples; other top-down methods were not attempted.
Likely reasons: poor recovery due to solubility, hindering impurities,
ionization, etc.
Top down was able to obtain sequence in 100 pmol sample using protein LC and
MALDI-ISD strategy as long as theoretical sequence was utilized.
Time is of the essence – for committee to appropriately design and develop study
and for participants to be able to properly analyze samples.
Acknowledgements
Robert English - University of Texas, Medical Branch at Galveston
Accumulation & anonymization of data
Shantanu Roychowdhury - Sigma-Aldrich
Expressed and purified protein
Anja Resemann - Bruker Daltonics
LC MALDI ISD and Top Down work
Jack Simpson and the rest of ABRF Executive Board
For support and scrutiny of study proposal
Participating labs!!!!!!!