Protein & Peptide Analysis
Linda Breci
Chemistry Mass Spectrometry Facility
University of Arizona
MS Summer Workshop
Using mass spectrometry for the measurement
and/or identification of proteins
Overview
• Measuring whole proteins
– Information about proteins is available on the internet
– Limits due to instrument resolution, protein mass, matrix
– method: MALDI/TOF
– method: ESI + various analyzers
• Measuring peptides from proteins by MS
– peptide mass mapping
• Gel separation steps to prepare for protein
identification by MS/MS
• Identifying proteins from peptides by MS/MS
Proteins versus peptides
[Figure: an enzyme cuts a protein into peptides]
Analysis of whole proteins
Good news & bad news
• MALDI-TOF = measure with 1 or 2 protons
• ESI-Ion Trap = measure with many protons (high charge
state)
• Result = mass accuracy not good enough to identify
protein (but still useful!)
– Mass accuracy decreases as size increases
Same protein, 2 ionization methods
MALDI/TOF – whole protein detected
[Figure: MALDI/TOF mass spectrum, intensity vs. m/z (10000 to 40000). Labeled peaks: [M+H]+ at 14318.68, [M+2H]2+ at 7157.18, and [2M+H]+ (labeled 14318.68 on the slide; a dimer ion would be expected near twice that m/z)]
ESI: Protein MW can be calculated from a
protein’s charge distribution
[Figure: ESI mass spectrum, relative intensity vs. m/z (1000 to 2000). Charge-state peaks: +13 at 1101.40, +12 at 1193.20, +11 at 1301.53, +10 at 1431.47, +9 at 1590.33, +8 at 1789.00. Inset: calculated mass spectrum, intensity vs. m/z (5000 to 15000), with a single peak at 14306.0]
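The calculation behind the ESI slide can be sketched as follows. This is a minimal illustration, not the instrument software's algorithm: it assumes adjacent peaks differ by exactly one proton, uses a proton mass of 1.00728 Da, and takes the six m/z values labeled in the spectrum. The function names are made up for this example.

```python
PROTON = 1.00728  # assumed mass of a proton, in Da

def charge_of_peak(mz_low, mz_high):
    """Charge of the higher-m/z peak, given its neighbor at lower m/z
    (the neighbor is assumed to carry exactly one more proton)."""
    return round((mz_low - PROTON) / (mz_high - mz_low))

def neutral_mass(mz, z):
    """Neutral molecular mass M from the m/z of an [M+zH]z+ ion."""
    return z * (mz - PROTON)

# m/z values labeled in the ESI spectrum, in ascending order
peaks = [1101.40, 1193.20, 1301.53, 1431.47, 1590.33, 1789.00]

charges, masses = [], []
for low, high in zip(peaks, peaks[1:]):
    z = charge_of_peak(low, high)
    charges.append(z)
    masses.append(neutral_mass(high, z))
    print(f"m/z {high:8.2f}  z = {z:2d}  M = {masses[-1]:9.1f}")

mean_mass = sum(masses) / len(masses)
print(f"mean M = {mean_mass:.1f}")
```

Each pair of neighboring peaks gives one charge assignment and one mass estimate; averaging them lands within about 1 Da of the 14306.0 shown in the calculated mass spectrum.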
Peak broadening in high mass measurement
We measure ISOTOPES (not averages)
Example: Carbon is 12.000 (not 12.0107)
For every 12C there is 1.1% 13C
[Figure: two bar charts of relative isotope peak intensity (0 to 120) vs. isotope peak number. 10 carbons: 100, 10.8, 0.5. 100 carbons: 92.5, 100, 53.5, 18.8, 5, 1, 0.2]
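The bar heights for 10 and 100 carbons follow from a binomial distribution. The sketch below assumes a 13C abundance of 1.07% (the value implied by the average mass 12.0107) and ignores every element other than carbon; the function name is hypothetical.

```python
from math import comb

def carbon_isotope_pattern(n_carbons, p13=0.0107, max_peaks=8):
    """Relative intensities (largest peak = 100) of the peaks containing
    0, 1, 2, ... 13C atoms, for a molecule with n_carbons carbon atoms."""
    n_peaks = min(max_peaks, n_carbons + 1)
    probs = [
        comb(n_carbons, k) * p13**k * (1 - p13) ** (n_carbons - k)
        for k in range(n_peaks)
    ]
    top = max(probs)
    return [100.0 * p / top for p in probs]

for n in (10, 100):
    pattern = carbon_isotope_pattern(n)
    print(n, "carbons:", [round(x, 1) for x in pattern])
```

With 10 carbons the all-12C peak is the tallest; with 100 carbons the peak containing one 13C is already the tallest, which is why the isotope envelope widens and shifts as mass increases.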
Peak broadening in high mass measurement
Theoretical isotope distribution for a small protein
[Figure: theoretical isotope envelope, with callouts "1st Isotope 14304.885" and "9th Isotope 14313.906"]

Isotope #   m/z         % Maximum
0           14304.885   0.2
1           14305.888   1.2
2           14306.891   4.6
3           14307.893   12.8
4           14308.896   26.9
5           14309.898   46.3
6           14310.900   67.6
7           14311.902   86.3
8           14312.904   97.7
9           14313.906   100.0
10          14314.908   93.5
11          14315.910   80.4
12          14316.912   64.2
13          14317.914   47.8
14          14318.916   33.4
15          14319.918   21.8
16          14320.920   13.2
17          14321.922   7.5
18          14322.924   3.9
19          14323.925   1.8
20          14324.927   0.7
21          14325.929   0.2
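To see why an unresolved protein peak looks broad, the theoretical distribution can be summarized numerically. The sketch below only uses the m/z and % maximum values listed for this small protein; the summary quantities (intensity-weighted average, width at half maximum) are chosen for illustration.

```python
# (m/z, % maximum) pairs from the theoretical isotope distribution
envelope = [
    (14304.885, 0.2), (14305.888, 1.2), (14306.891, 4.6), (14307.893, 12.8),
    (14308.896, 26.9), (14309.898, 46.3), (14310.900, 67.6), (14311.902, 86.3),
    (14312.904, 97.7), (14313.906, 100.0), (14314.908, 93.5), (14315.910, 80.4),
    (14316.912, 64.2), (14317.914, 47.8), (14318.916, 33.4), (14319.918, 21.8),
    (14320.920, 13.2), (14321.922, 7.5), (14322.924, 3.9), (14323.925, 1.8),
    (14324.927, 0.7), (14325.929, 0.2),
]

total = sum(intensity for _, intensity in envelope)
# Intensity-weighted average: what a low-resolution instrument effectively reports
average_mz = sum(mz * intensity for mz, intensity in envelope) / total
# Most abundant isotope peak
most_abundant_mz = max(envelope, key=lambda pair: pair[1])[0]
# Span of the peaks at or above half of the maximum intensity
above_half = [mz for mz, intensity in envelope if intensity >= 50.0]
width_at_half_max = above_half[-1] - above_half[0]

print(f"lightest isotope : {envelope[0][0]:.3f}")
print(f"most abundant    : {most_abundant_mz:.3f}")
print(f"weighted average : {average_mz:.3f}")
print(f"width (>= 50%)   : {width_at_half_max:.3f}")
```

The weighted average sits roughly 9 units above the lightest isotope, and the envelope spans about 6 units at half height, so an instrument that cannot separate the individual isotopes sees one wide peak.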
Peak broadening in high mass measurement
Resolution = Mass Accuracy
Only multiply charged proteins observed
(more peaks/mass unit)
INSTRUMENT       MASS RANGE (m/z)   Resolution (at m/z 1,000)   Accuracy (Error) (at m/z 1,000)
LCQ (Ion Trap)   to 2,000           2,000 (full scan)           0.03% (300 ppm)
                                    10,000 (zoom scan)
MALDI/TOF        to 400,000         15,000 (reflectron)         0.006% (60 ppm) Ext. Cal.
                                                                0.003% (30 ppm) Int. Cal.
FTICR            to 4,000           500,000                     0.0001% (1 ppm)
No reflectron for high masses = reduced resolution
$$\mathrm{ppm} = \frac{\text{Theoretical MW} - \text{Measured MW}}{\text{Theoretical MW}} \times 10^{6}$$
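The ppm definition translates directly into code. The sketch below also inverts it to show how wide the error window becomes at protein-sized masses; the function names are hypothetical, and the 14306.0 example mass is the value from the ESI slide.

```python
def ppm_error(theoretical_mw, measured_mw):
    """Mass error in parts per million, following the slide's definition."""
    return (theoretical_mw - measured_mw) / theoretical_mw * 1e6

def mass_window(theoretical_mw, ppm):
    """Absolute mass error (same units as the mass) for a given ppm accuracy."""
    return theoretical_mw * ppm / 1e6

# 300 ppm (ion trap, full scan) at m/z 1,000 versus at a ~14.3 kDa protein
print(mass_window(1000.0, 300))
print(mass_window(14306.0, 300))
print(ppm_error(1000.0, 999.7))
```

At 300 ppm the uncertainty is 0.3 at m/z 1,000 but more than 4 mass units for a 14.3 kDa protein, which is the reason small mass changes are hard to assign on whole proteins.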
Mass changes are difficult to identify in high mass measurements
Examples of post translational modifications
Modification                                 Abbreviation   Monoisotopic   Average
Acetylation                                  ACET           42.0106        42.0373
Amidation                                    AMID           -0.9840        -0.9847
Beta-methylthiolation                        BMTH           45.9877        46.0869
Biotin                                       BIOT           226.0776       226.2934
Carbamylation                                CAM            43.0058        43.0250
Citrullination                               CITR           0.9840         0.9848
C-Mannosylation                              CMAN           162.0528       162.1424
Deamidation                                  DEAM           0.9840         0.9847
N-acyl diglyceride cysteine (tripalmitate)   DIAC           788.7258       789.3202
Dimethylation                                DIMETH         28.0314        28.0538
FAD                                          FAD            783.1415       783.5420
Farnesylation                                FARN           204.1878       204.3556
Formylation                                  FORM           27.9949        28.0104
Geranyl-geranyl                              GERA           272.2504       272.4741
Gamma-carboxyglutamic acid                   GGLU           43.9898        44.0098
O-GlcNAc                                     GLCN           203.0794       203.1950
Glucosylation (Glycation)                    GLUC           162.0528       162.1424
Hydroxylation                                HYDR           15.9949        15.9994
Lipoyl                                       LIPY           188.0330       188.3027
Methylation                                  METH           14.0157        14.0269
Myristoylation                               MYRI           210.1984       210.3598
Palmitoylation                               PALM           238.2297       238.4136
Phosphorylation                              PHOS           79.9663        79.9799
Pyridoxal phosphate                          PLP            229.0140       229.1290
Phosphopantetheine                           PPAN           339.0780       339.3234
Pyrrolidone carboxylic acid                  PYRR           -17.0266       -17.0306
Sulfation                                    SULF           79.9568        80.0642
Trimethylation                               TRIMETH        42.0471        42.0807
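One way to see why mass changes are difficult to assign at high mass is to look up which modifications are consistent with an observed shift at a given tolerance. The sketch below uses a handful of average mass shifts from the list of post translational modifications; the lookup function and the chosen tolerances are illustrative assumptions.

```python
# Average mass shifts for a few of the listed modifications
PTM_AVERAGE_SHIFTS = {
    "ACET": 42.0373,
    "TRIMETH": 42.0807,
    "PHOS": 79.9799,
    "SULF": 80.0642,
    "METH": 14.0269,
    "HYDR": 15.9994,
    "DIMETH": 28.0538,
    "FORM": 28.0104,
}

def candidate_ptms(observed_shift, tolerance_da):
    """Abbreviations of modifications whose shift lies within the tolerance."""
    return sorted(
        name
        for name, shift in PTM_AVERAGE_SHIFTS.items()
        if abs(shift - observed_shift) <= tolerance_da
    )

print(candidate_ptms(42.04, 0.01))  # tight tolerance, peptide-level measurement
print(candidate_ptms(80.0, 0.1))    # phosphorylation and sulfation overlap
print(candidate_ptms(28.0, 4.3))    # about 300 ppm on a 14.3 kDa protein
```

With a tolerance of a few mass units, as expected for a whole protein on an ion trap, several modifications fit the same observed shift.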
Computer Exercises -- Goals
Open Webpage:
http://www.chem.arizona.edu/facilities/msf/BiochemLab/exercises.html
• Exercise #2, Whole protein analysis
– Explore Expasy information available for a protein
– Find the theoretical MW of a protein
– Find the amino acid sequence of a protein in FASTA format
for use in another exercise
– Explore the X-ray crystal structure of a protein
Proteins versus peptides
[Figure: an enzyme cuts a protein into peptides]
Identification of proteins from peptide analysis
Separate by 2-D (or 1-D) Gel
Remove protein from gel after cutting into
peptides with an enzyme (trypsin)
We can identify hundreds of proteins in one
experiment
Extracting and Separating proteins
• Extracting proteins from biological organisms
– Results in complex mixture of proteins
– May require detergents, etc. that complicate Mass Spec analysis
– Remove contaminants (filtration, dialysis, SPE, etc.)
• Separating proteins
– 1D SDS-PAGE
• Cross linking controls MW separated
• Low resolution technique, spot can contain 10's to 100's of
proteins
– 2D SDS-PAGE
• Best method for complex protein mixtures (IEF + SDS-PAGE)
– Preparative isoelectric focusing (IEF)
– Reverse phase HPLC
– Size exclusion chromatography
– Ion exchange chromatography
– Affinity chromatography
2-D Electrophoresis
• 1st Dimension: Isoelectric Focusing (IEF)
– Requires maximal resolution of a target group of proteins
– Uses Immobiline DryStrip gels (various lengths and pH gradients)
– IPGphor programmed to hydrate and separate proteins by pI (i.e. overnight)
• 2nd Dimension: Gel Separation
– Apply the Immobiline DryStrip to the top of a gel
– Separation by molecular weight is rapid (6-10 hours)
[Figure: 2-D gel with separation #1) by pI and #2) by MW]
2-D Electrophoresis
• Standard Method:
– Separate proteins on 2-dimensional gels
– Spots (and changes) can be observed (manual or with
computer aid).
– Method is reproducible (multiple runs required)
– Cut out and identify spots of interest
• Difference Gel Electrophoresis (DIGE)
– Two or more samples for comparative analysis are labelled with different fluorescent dyes, mixed together, run on the same 2D gel, and interrogated with a multi-wavelength fluorescent scanner
– Allows quantitation of subtle changes in protein expression levels between samples, without inter-gel variability
– Following example: analysis of a Bordetella bronchiseptica enzyme knockout cell line, compared to wild type.
[Figure: 10/29/03 Gel 2: Multiplexed gel image. Horizontal axis: pH 3 to pH 7; vertical axis: molecular weight, small to large]
[Figure: 10/29/03 Gel 2: Side-by-side cropped grayscale images of WT (Cy3) and ∆dnt (Cy5), each spanning pH 3 to pH 7]
nanoLC-MS/MS identification of two differentially expressed protein spots
[Figure: spot locations marked on the WT (Cy3) and ∆dnt (Cy5) images]
BB3856 – AZURIN
L.AAECSVDIAGTDQM#QFDK.K
A.AEC*SVDIAGTDQM#QFDK.K
K.QFTVNLK.H
K.DGIAAGLDNQYLK.A
BB3856 – AZURIN
K.TADMQAVEK.D
K.VLGGGESDSVTFDVAK.L
K.DGIAAGLDNQYLK.A
In-Gel enzymatic digestion
(trypsin most common)
Proteases and Cleavage Specificities
Trypsin                 after K, R
Chymotrypsin            after W, Y, F, not before P
Glu C (V8 protease)     after E, D, not before P
Lys C                   after K, not before P
Asp N                   before D
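The cleavage rules above can be applied to a sequence in a few lines. This is a minimal in-silico digestion sketch: it handles only proteases that cut after a residue, it allows no missed cleavages, and the test sequence is arbitrary (not a real protein). The function name and parameters are hypothetical.

```python
def digest(sequence, cleave_after="KR", block_before=""):
    """Cut `sequence` after every residue in `cleave_after`, unless the
    next residue is in `block_before`. No missed cleavages are allowed."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        is_last = i == len(sequence) - 1
        if residue in cleave_after and not is_last:
            if sequence[i + 1] not in block_before:
                peptides.append(sequence[start:i + 1])
                start = i + 1
    peptides.append(sequence[start:])
    return peptides

sequence = "AKDGRPLKMR"  # arbitrary example sequence

# Trypsin as listed: after K, R
print(digest(sequence, cleave_after="KR"))
# The same rule with cleavage suppressed before proline
print(digest(sequence, cleave_after="KR", block_before="P"))
# Chymotrypsin-style rule: after W, Y, F, not before P
print(digest("GAWSYPFK", cleave_after="WYF", block_before="P"))
```

The proline option is included because many digestion tools add a "not before P" exception to the trypsin rule as well; whether to apply it is a choice made per tool, not something stated on the slide.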
Computer Exercises -- Goals
Open Webpage:
http://www.chem.arizona.edu/facilities/msf/BiochemLab/exercises.html
• Exercise #3, 2-D Gel Electrophoresis
– Find a gel image online containing a protein spot of interest
– Explore the gel images of various organisms
Analysis of peptides from proteins
• MALDI-TOF = measure mass of peptides
– peptide mass mapping
• ESI-Ion Trap = measure mass/charge of peptide
– PLUS can select and fragment (MS/MS) for more information
• Result = possible to identify a protein, or identify SNPs
or modifications made to a protein
Peptide Mass Mapping using MALDI-TOF
MS of a peptide mixture by MALDI/TOF
[Figure: MALDI/TOF mass spectrum of a peptide mixture, m/z 500 to 2500, with reference (Ref) peaks marked]
Data Analysis for Peptide Mass Mapping
[Figure: workflow. Unknown protein (?) → peptides → MS → MS peptide MW found in selected databases (NDALYFPT..., SWDLTAL..., PTDLDVSY...) → rank → identify]
• Important data
– multiple peaks
– mass accuracy
– confirming information
(pI, approx. mass,
organism, etc.)
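The matching and ranking step can be sketched as a simple count of observed masses that fall within a tolerance of each database protein's theoretical peptide masses. Everything here is illustrative: the protein names and mass lists are made up, and real search tools use more elaborate scoring than a raw match count.

```python
def count_matches(observed, theoretical, tol_ppm=100.0):
    """Number of observed masses within tol_ppm of any theoretical mass."""
    hits = 0
    for obs in observed:
        if any(abs(obs - theo) / theo * 1e6 <= tol_ppm for theo in theoretical):
            hits += 1
    return hits

def rank_proteins(observed, database, tol_ppm=100.0):
    """Database entries sorted by how many observed masses they explain."""
    scores = {
        name: count_matches(observed, masses, tol_ppm)
        for name, masses in database.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical theoretical peptide masses for two database proteins
database = {
    "protein_A": [927.49, 1479.80, 1567.74, 1639.94],
    "protein_B": [842.51, 1045.56, 1479.80, 2211.10],
}
# Hypothetical measured peptide masses
observed = [927.50, 1567.75, 1639.93, 2000.00]

for name, score in rank_proteins(observed, database):
    print(name, score)
```

Tightening `tol_ppm` is how better mass accuracy reduces false matches, and requiring several matched peaks per protein corresponds to the "multiple peaks" point in the list above.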
Computer Exercises -- Goals
Open Webpage:
http://www.chem.arizona.edu/facilities/msf/BiochemLab/exercises.html
• Exercise #4, Peptide Mass Mapping
– Identify a protein from a peptide mass list
– Confirm this identity by producing a theoretical mass list
– Optional (for the speedy ones) identify more unknowns from
mass lists
Unknown proteins
• 66 = Bovine Serum Albumin
• 116 = beta-galactosidase from E. coli
• 55 = glutamic dehydrogenase from bovine liver
• 36 = glyceraldehyde-3-phosphate dehydrogenase from rabbit muscle
LC/LC-MS/MS for Complex Mixtures
SCX = Strong cation exchange
RP = Reverse Phase (C-18)
Alternate an increasing salt gradient (move some peptides onto RP)
Follow by RP gradient (separate peptides, send to mass spec)
[Figure: peptides from many proteins → SCX → RP → MS/MS]
Results in thousands of mass spectra
A computational challenge!
MS/MS Method Using ESI
[Figure: ion current over 60 min, with an MS scan and an MS/MS scan taken from the run]
Peptide precursor ions observed by MS
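Calculation of MH+ from the doubly charged precursor:
571.2 m/z measured
× 2 = 1,142.4 [M+2H]
− 1.0 = 1,141.4 [M+H]
[Figure: MS spectrum with [M+2H]2+ at m/z = 571.2 and MH+ at m/z = 1141.3; MS-MS of 571.2, with a labeled peak at 895.25]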
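The same arithmetic generalizes to any charge state. The sketch below assumes a proton mass of 1.00728 (the slide rounds it to 1.0); the function names are hypothetical.

```python
PROTON = 1.00728  # assumed mass of a proton

def singly_protonated_mass(mz, z):
    """MH+ of a peptide observed at `mz` with charge `z`."""
    return mz * z - (z - 1) * PROTON

def mz_for_charge(mh_plus, z):
    """m/z expected for charge `z`, given the MH+ value."""
    return (mh_plus + (z - 1) * PROTON) / z

mh = singly_protonated_mass(571.2, 2)
print(f"MH+ = {mh:.1f}")
print(f"[M+2H]2+ = {mz_for_charge(mh, 2):.1f}")
```

For the precursor at 571.2 with two charges this reproduces the 1,141.4 worked out above.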
Data Analysis for MS/MS Sequencing Method
[Figure: workflow. Peptides from an unknown protein (?) → MS/MS → select database peptides by MS peptide MW found in selected databases (NDALYFPT..., SWDLTAL..., PTDLDVSY...) → generate theoretical spectra → compare with the measured spectrum (relative intensity vs. m/z, 200 to 1600) → rank → identify protein]
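The compare-and-rank step can be illustrated with a shared peak count between the measured fragment masses and each candidate's theoretical fragments. This is a deliberately simple stand-in and not the scoring used by Sequest or any other search engine; the candidate names and peak lists are hypothetical.

```python
def shared_peak_count(experimental, theoretical, tol=0.5):
    """Number of theoretical fragment m/z values matched by a measured peak."""
    return sum(
        1 for t in theoretical if any(abs(t - e) <= tol for e in experimental)
    )

def rank_candidates(experimental, candidates, tol=0.5):
    """(score, name) pairs, best-matching candidate peptide first."""
    scored = [
        (shared_peak_count(experimental, peaks, tol), name)
        for name, peaks in candidates.items()
    ]
    return sorted(scored, reverse=True)

# Hypothetical measured MS/MS peaks (m/z)
experimental = [376.2, 491.1, 622.2, 895.25]
# Hypothetical theoretical fragment lists for two candidate peptides
candidates = {
    "candidate_1": [376.19, 491.22, 622.26, 737.29, 838.34, 895.36],
    "candidate_2": [304.16, 417.25, 532.27, 895.36],
}

for score, name in rank_candidates(experimental, candidates):
    print(name, score)
```

Both candidates would have passed the precursor mass filter; the fragment comparison is what separates them.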
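[Figure: MS/MS spectrum of the peptide VFGTDMDNSR with the y series ions labeled: y3 at 376.2, y4 at 491.1, y5 at 622.2, then y6, y7 and y8; the largest labeled peak is at 895.25. Reading the mass differences between successive y ions gives N (Asn), D (Asp), M (Met), D (Asp), T (Thr), G (Gly), F (Phe)]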
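The y series for VFGTDMDNSR can be calculated by summing residue masses from the C-terminus. The sketch below uses standard monoisotopic residue masses (only the residues in this peptide are listed) plus water and one proton; the function name is hypothetical.

```python
# Monoisotopic residue masses for the amino acids in this peptide
MONO = {
    "G": 57.02146, "S": 87.03203, "V": 99.06841, "T": 101.04768,
    "N": 114.04293, "D": 115.02694, "M": 131.04049, "F": 147.06841,
    "R": 156.10111,
}
WATER = 18.01056
PROTON = 1.00728

def y_ions(peptide):
    """Singly charged y ion m/z values; element 0 is y1 (C-terminal residue)."""
    ions, total = [], WATER + PROTON
    for residue in reversed(peptide):
        total += MONO[residue]
        ions.append(total)
    return ions

y = y_ions("VFGTDMDNSR")
for n, mz in enumerate(y, start=1):
    print(f"y{n:<2d} {mz:9.2f}")
```

The computed y3, y4 and y5 agree with the labeled 376.2, 491.1 and 622.2 to within ion trap accuracy, the computed y8 (about 895.4) is close to the 895.25 peak, and the last value in the series is the MH+ of the intact peptide (about 1141.5).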
Peptide fragment ions
[Figure: peptide backbone showing the peptide bond fragment ions a2, b2 and c2 (containing the N-terminus, H2N) and x2, y2 and z2 (containing the C-terminus, OH)]
[Figure: structures of an internal immonium ion and an amino acid immonium ion]
Peptide Sequencing
[Figure: peptide backbone with two residues marked: Ala (amino acid mass 71 u.) and Asp (amino acid mass 115 u.)]
Amino acid      3-letter   1-letter   Mass (u)
Alanine         ALA        A          71.09
Arginine        ARG        R          156.19
Aspartic Acid   ASP        D          115.09
Asparagine      ASN        N          114.11
Cysteine        CYS        C          103.15
Glutamic Acid   GLU        E          129.12
Glutamine       GLN        Q          128.14
Glycine         GLY        G          57.05
Histidine       HIS        H          137.14
Isoleucine      ILE        I          113.16
Leucine         LEU        L          113.16
Lysine          LYS        K          128.17
Methionine      MET        M          131.19
Phenylalanine   PHE        F          147.18
Proline         PRO        P          97.12
Serine          SER        S          87.08
Threonine       THR        T          101.11
Tryptophan      TRP        W          186.21
Tyrosine        TYR        Y          163.18
Valine          VAL        V          99.14
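Sequencing from a fragment ladder amounts to looking up each mass difference in the residue mass table. The sketch below uses the average residue masses listed above (with 186.21 for tryptophan, the standard average value); the tolerance of 0.3 and the function name are assumptions made for illustration.

```python
# Average residue masses (u)
AVERAGE = {
    "A": 71.09, "R": 156.19, "D": 115.09, "N": 114.11, "C": 103.15,
    "E": 129.12, "Q": 128.14, "G": 57.05, "H": 137.14, "I": 113.16,
    "L": 113.16, "K": 128.17, "M": 131.19, "F": 147.18, "P": 97.12,
    "S": 87.08, "T": 101.11, "W": 186.21, "Y": 163.18, "V": 99.14,
}

def residues_for_delta(delta, tol=0.3):
    """One-letter codes whose residue mass matches a fragment mass difference."""
    return sorted(aa for aa, mass in AVERAGE.items() if abs(mass - delta) <= tol)

# Part of a y ion ladder: each step up adds one residue
ladder = [376.2, 491.1, 622.2]
for low, high in zip(ladder, ladder[1:]):
    delta = high - low
    print(f"{low} -> {high}: delta = {delta:.1f}  {residues_for_delta(delta)}")

print(residues_for_delta(113.16))  # isoleucine and leucine have the same mass
print(residues_for_delta(128.15))  # lysine and glutamine are nearly the same
```

Some differences cannot be resolved by mass alone: I and L are identical in mass, and K and Q differ by only about 0.04, which is below what a low-resolution instrument can separate.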
Computer Exercises -- Goals
Open Webpage:
http://www.chem.arizona.edu/facilities/msf/BiochemLab/exercises.html
• Exercise #5, Peptide Sequencing & Protein ID
– Identify a peptide (and its protein) from an MS-MS mass list
Homology search to find protein function
BLAST: Computer Exercise #6
• Peptide sequences found for an “unknown” protein by
Sequest database searching
• Find a possible function of this protein
Locus                      Spectrum Count   Sequence Coverage   Length   Descriptive Name
CL001145.84_fgenesh_1_aa   1                3.20%               1013     Unknown

FileName                  XCorr    DeltCN   M+H+      Sequence
lb100404_01.2750.2750.2   2.7258   0.181    3290.86   RLVVVNAKPTAASAVGLAGPGAADVLPFVEADLKKS
lb100404_01.1675.1675.2   3.6422   0.181    1658.85   RHFFAAAAGQPPPQY.L
Computer Exercises -- Goals
Open Webpage:
http://www.chem.arizona.edu/facilities/msf/BiochemLab/exercises.html
• Exercise #6, BLAST Search
– Perform a BLAST search for a peptide sequence that was
found in the previous exercise
– Observe the other proteins with similar sequence
– Not all organisms have full genomic information – homology
sequencing is useful for protein identification
Computer Exercises -- Goals
Open Webpage:
http://www.chem.arizona.edu/facilities/msf/BiochemLab/exercises.html
• Exercise #7, Find an unknown protein
– Use the same method as in Exercise #4 to find an unknown peptide
– Information provided:
• MS spectrum
• MS/MS spectrum
Open source software for high-throughput
proteomics: X! Tandem
• Current trend toward free software
• The Global Proteome Machine http://www.thegpm.org/
– X! Tandem
– Sequenced peptide libraries
– Software available to programmers
Computer Exercises -- Goals
Open Webpage:
http://www.chem.arizona.edu/facilities/msf/BiochemLab/exercises.html
• Exercise #8, Find an unknown protein
– X! Tandem identification of the same spectra