CVA for NMR data
Experimental & Bioinformatic Tools for
Proteomics
Steve Oliver
Professor of Genomics
Faculty of Life Sciences
The University of Manchester
http://www.cogeme.man.ac.uk
http://www.bioinf.man.ac.uk
Functional Genomics
Level of analysis: Genome
Definition: Complete set of genes of an organism or its organelles.
Status: Context-independent (modifications to the yeast genome may be made with exquisite precision).
Method of analysis: Systematic DNA sequencing.

Level of analysis: Transcriptome
Definition: Complete set of mRNA molecules present in a cell, tissue or organ.
Status: Context-dependent (the complement of mRNAs varies with changes in physiology, development or pathology).
Method of analysis: Hybridisation arrays; SAGE; high-throughput Northern analysis.

Level of analysis: Proteome
Definition: Complete set of protein molecules present in a cell, tissue or organ.
Status: Context-dependent.
Method of analysis: 2-D gel electrophoresis; peptide mass fingerprinting; two-hybrid analysis.

Level of analysis: Metabolome
Definition: Complete set of metabolites (low molecular weight intermediates) present in a cell, tissue or organ.
Status: Context-dependent.
Method of analysis: Infra-red spectroscopy; mass spectrometry; nuclear magnetic resonance spectroscopy.
GENOME
TRANSCRIPTOME
PROTEOME
METABOLOME
Proteomics
Separation
Identification
Quantitation
Bioinformatics
Complex mixture analysis
[Diagram elements: genome; “virtual” proteome; peptide mass database (knowledge + prediction, post-translational modification); real proteome; separation methods (2D-gels, functional separations, n-dimensional chromatography); complex mixtures & subsets, digested to a complex peptide map fingerprint; simple mixtures & single proteins, digested to a simple peptide map fingerprint; bioinformatics; identification.]
Aberdeen PRF1: S. cerevisiae 2D map
[Figure: annotated 2D gel with axis ticks from 4.0 to 6.5 and from 10 to 150. Spots are labelled with gene names, including ADE6, CDC48, HIS4, SSA1, SSA2, HSP60, PDC1, ENO1, ENO2, PGK1, TDH3, FBA1, TPI1, ADH1, ACT1 and SOD1.]
Peptide mass fingerprinting
1. Denature the protein:
KETAAAKFERQHMDSSTSAASSSNYCNQMMKSRNLTKDRC
LPVNTFVHESLADVQAVCSQKNVACKNGQTNCYQSYSTMS
ITDCRETGSSKYPNCAYKTTQANKHIIVACEGNPYVPVHF
DASV
2. Digest (trypsin) to give peptides m1 to m12:
m1 KETAAAK
m2 FER
m3 QHMDSSTSAASSSNYCNQMMK
m4 SR
m5 NLTK
m6 DR
m7 CLPVNTFVHESLADVQAVCSQK
m8 NVACK
m9 NGQTNCYQSYSTMSITDCR
m10 ETGSSK
m11 YPNCAYKTTQANK
m12 HIIVACEGNPYVPVHFDASV
3. Mass spectrometry
[Figure: spectrum of abundance against mass, with one peak per peptide, m1 to m12.]
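The digest-then-weigh idea above can be sketched in a few lines of Python. This is a minimal illustration, not the software used in the lecture: the cleavage rule (after K or R, but not before P) and the monoisotopic residue masses are standard textbook values, and the function names are made up for this example. A strict in-silico digest gives 14 peptides rather than the 12 drawn on the slide, because the slide leaves two lysine sites uncleaved (KETAAAK and YPNCAYKTTQANK).

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per free peptide

SEQUENCE = (
    "KETAAAKFERQHMDSSTSAASSSNYCNQMMKSRNLTKDRC"
    "LPVNTFVHESLADVQAVCSQKNVACKNGQTNCYQSYSTMS"
    "ITDCRETGSSKYPNCAYKTTQANKHIIVACEGNPYVPVHF"
    "DASV"
)


def tryptic_digest(sequence):
    """Cleave after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        last = i == len(sequence) - 1
        if last or (residue in "KR" and sequence[i + 1] != "P"):
            peptides.append(sequence[start:i + 1])
            start = i + 1
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of the neutral peptide in Da."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER


if __name__ == "__main__":
    for peptide in tryptic_digest(SEQUENCE):
        print(f"{peptide:<24}{peptide_mass(peptide):10.4f}")
```

The printed list of masses is the theoretical "fingerprint" that would be compared with the measured peak list.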
Proteomic applications
• Quantitative Proteomics
– “Expression” proteomics
• protein levels under different
conditions/times
• Qualitative Proteomics
– Identification proteomics
• protein:protein interactions
• post-translational modifications
“A MASS SPECTROMETER
MEASURES THE MW….”
“...A MS ANALYSIS
GIVES
THE MASS-TO-CHARGE RATIO (m/z)
FOR IONS…IN GAS PHASE”.
Brancia FL, Trieste, 12/02/2004
What is a “mass spectrometer”...?
Sample introduction: direct introduction (solid, liquid, gas) or separation techniques (HPLC, CE, GC)
Ion source (“ion generation”): EI, FAB, MALDI, electrospray
Analyzer (“mass analysis”): TOF, quadrupole, ion trap
Detector
Data processing
Pumping system (vacuum)
Various ionisation methods
• Electron impact ionisation, EI (1919, A.J. Dempster)
• Chemical ionisation, CI
• Fast atom bombardment, FAB (1981, M. Barber)
• Matrix-assisted laser desorption ionisation, MALDI (1988, K. Tanaka; M. Karas and F. Hillenkamp)
• Electrospray, ES (1985, J. Fenn)
‘Soft’ Ionisation Techniques
‘Soft’ refers to the low amount of energy imparted to the analyte during ionisation. Too much internal energy will result in fragmentation. Soft ionisation techniques form intact molecular or pseudo-molecular ([M+H]+) ions.
• Matrix-assisted laser desorption ionisation (MALDI)
• Electrospray (ES)
Nobel Prize in Chemistry 2002
“...for their developments of soft desorption ionisation methods for
mass spectrometric analysis of biological macromolecules”.
1/4 to John B. Fenn (USA), Virginia Commonwealth University: electrospray ionization
1/4 to Koichi Tanaka (Japan), Shimadzu Corp., Kyoto: laser ionization
1/2 of the prize went to Kurt Wüthrich (Switzerland) for the development of NMR analysis
Electrospray (ES)
[Figure: electrospray source. Sample solution leaves the electrospray capillary (+HV) at atmospheric pressure, travels towards a counter electrode (near ground) and passes through skimmer electrodes into the high vacuum of the mass analyzer, along a pressure gradient and a potential gradient. The droplet shrinks due to solvent evaporation, then explodes due to the charge density limit; gaseous ions [M+nH]n+ are formed via one of two proposed mechanisms.]
The principal outcome of the electrospray process is the transfer of analyte species, generally ionised in the condensed phase, into the gas phase as isolated entities (Gaskell SJ, Journal of Mass Spectrometry, 1997).
[Figure: capillary at +HV emitting an aerosol of charged droplets.]
ES spectrum of Rho protein
[Figure: electrospray spectrum of Rho protein (47004.33 Da), relative abundance (%) against m/z. A series of multiply charged ions runs from about m/z 654 to m/z 1001; labelled peaks include [M+56H]56+ and [M+50H]50+. Courtesy of Dr Matt Openshaw.]
Electrospray (ES)
For the [M+56H]56+ ion observed at m/z 840.3:
$M = (840.3 \times 56) - 56 = 47000.8\ \text{Da}$
Deconvolution takes all the multiply charged ions and converts them into a spectrum on a mass (Da) scale, i.e. it works out what the molecular weight is most likely to be.
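The arithmetic behind this calculation can be sketched as follows. This is only an illustration under stated assumptions: the function names are invented for the example, the peaks at m/z 825.6 and 840.3 are taken from the Rho spectrum and assumed to be adjacent charge states, and a proton mass of 1.00728 Da is used where the slide rounds to 1 Da (hence a slightly different result from 47000.8 Da).

```python
PROTON = 1.00728  # mass of a proton in Da


def neutral_mass(mz, charge):
    """Mass M of the neutral molecule from an [M+zH]z+ ion at m/z."""
    return (mz - PROTON) * charge


def charge_of_higher_peak(mz_low, mz_high):
    """Charge of the ion at mz_high, given that the adjacent peak at
    mz_low carries exactly one more proton.

    From mz_high = M/z + p and mz_low = M/(z+1) + p it follows that
    z = (mz_low - p) / (mz_high - mz_low).
    """
    return round((mz_low - PROTON) / (mz_high - mz_low))


if __name__ == "__main__":
    z = charge_of_higher_peak(825.6, 840.3)
    print("charge:", z)
    print("neutral mass (Da):", round(neutral_mass(840.3, z), 1))
```

Real deconvolution software repeats this estimate over every pair of peaks in the series and combines the results, which is why the deconvoluted mass is more precise than any single peak allows.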
ES spectrum after deconvolution
[Figure: deconvoluted spectrum, relative abundance (%) against mass from 44000 to 50000 Da, with a single peak near 47004 Da (labels: 47004.9; 47004.0 Da).]
Advantages
• Production of molecular ions from solution
• The ease of coupling with separation techniques (micro LC-MS/MSMS, nano LC-MS/MSMS)
• Production of multiply charged ions
Matrix Assisted Laser Desorption
Ionisation
MALDI
Time-of-Flight
Matrix assisted laser desorption ionisation (MALDI)
Matrices:
• α-cyano-4-hydroxycinnamic acid (CHCA)
• 2,5-dihydroxybenzoic acid (DHB)
• trans-3,5-dimethoxy-4-hydroxycinnamic acid (sinapinic acid; SA)
Typically used with a nitrogen laser (337 nm)
MALDI is an efficient desorption ionisation technique for
producing gaseous ions from a solid sample by laser
pulses
[M+H]+
Matrix Assisted Laser Desorption/Ionisation (MALDI)
Unlike ES, MALDI forms predominantly singly charged ions, e.g. [M+H]+, or adducts (sodium [M+Na]+ or potassium [M+K]+).
Sodium = 23 amu
Potassium = 39 amu
[Figure: spectrum in which [M+Na]+ lies 22 m/z above [M+H]+ and [M+K]+ lies 38 m/z above [M+H]+.]
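The 22 and 38 m/z spacings follow from swapping the proton for a sodium or potassium ion. A small sketch, assuming standard monoisotopic cation masses (these constants and the function name are not from the slides):

```python
# Monoisotopic masses of the singly charged cations, in Da.
PROTON = 1.00728
SODIUM_ION = 22.98922
POTASSIUM_ION = 38.96316


def maldi_ions(neutral_mass):
    """m/z of the singly charged ions expected for a neutral mass M."""
    return {
        "[M+H]+": neutral_mass + PROTON,
        "[M+Na]+": neutral_mass + SODIUM_ION,
        "[M+K]+": neutral_mass + POTASSIUM_ION,
    }


if __name__ == "__main__":
    ions = maldi_ions(1000.0)
    for name, mz in ions.items():
        print(f"{name:<8}{mz:10.3f}  (+{mz - ions['[M+H]+']:.3f})")
```

Because every ion carries a single charge, the mass difference and the m/z difference are the same number.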
Why is the matrix so important?
• The matrix is necessary to dilute and disperse the analyte
• It functions as an energy mediator for ionising the analyte itself or other neutral molecules
• It forms an activated state produced by photoionisation
Advantages
• MALDI primarily creates singly charged ions
[M+H]+
• Less sensitive to contaminants
• Sensitivity at femtomole level
• High throughput analysis
Time-of-flight (ToF) mass spectrometer
[Figure: MALDI target, extraction grid, flight tube (field-free region) of length d, and detector; ions leave the target at t = 0 and reach the detector at t > 0.]
$\tfrac{1}{2} m v^2 = zV$
$t^2 = \dfrac{m}{z} \cdot \dfrac{d^2}{2V}$
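To get a feel for the numbers, the flight-time relation can be evaluated in SI units. This is a rough sketch only: the 20 kV accelerating voltage and 1 m flight tube are illustrative values, not taken from the slides, and the charge is written as z times the elementary charge so that the units work out.

```python
import math

DALTON = 1.66053907e-27             # kg per Da
ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs


def flight_time(mass_da, charge, voltage, length_m):
    """Time (s) for an ion to cross a field-free tube of given length.

    Uses (1/2) m v^2 = z e V, so t = d * sqrt(m / (2 z e V)).
    """
    mass_kg = mass_da * DALTON
    energy = charge * ELEMENTARY_CHARGE * voltage
    return length_m * math.sqrt(mass_kg / (2 * energy))


if __name__ == "__main__":
    for mass in (1000, 2000, 4000):
        t = flight_time(mass, 1, 20000, 1.0)
        print(f"{mass:>5} Da  {t * 1e6:6.2f} microseconds")
```

Flight time scales with the square root of m/z, so an ion four times heavier takes twice as long to reach the detector.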
Reflectron time-of-flight mass analyser
[Figure: laser, target (V_ACCEL), electrostatic mirror, detector 1 and detector 2.]
MALDI and ESI compared
Sensitivity: femtomole, 10^-15 mol (...attomole, 10^-18 mol)
Simplicity: MALDI very easy; ESI training required
$$$: MALDI 70 to 650 k$; ESI 120 to 650 k$
Speed (“high throughput”): MALDI ~10^4/day; ESI dynamic system
Selectivity (“resolution”): >5000
Structural information: MALDI MSn; ESI MSn
Software: “...evaluation in progress.”
Structural information can be achieved by
tandem mass spectrometry
The tandem mass spectrometry experiment
Ion source (e.g. electrospray) → Analyser 1 (e.g. quadrupole) → Decomposition region (collisionally activated decomposition, CAD) → Analyser 2 (e.g. quadrupole, time-of-flight)
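[Figure: collisionally activated decomposition. The ion beam from the ion source passes through MS1 into the collision cell, where collisions with gas molecules break the precursor m+ into fragments f1+ to f4+; MS2 then analyses the ions before the ion detector. Panels (a) and (b) show spectra of total ion current (TIC) against m/z: (a) with m and fragments f1 to f4, (b) with m, f1 and f3.]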
PROBLEMS WITH ‘CLASSICAL’
PROTEOME ANALYSIS:
1. Not comprehensive
2. Not high-throughput
3. Destroys protein-protein interactions
that provide important clues to function
[Figure: number of (protein) database matches (0 to 450) against peptide mass (1000 to 2000 Da) for C. elegans, S. cerevisiae, E. coli and H. influenzae.]
• Multidimensional protein identification technology (MudPIT)
• Washburn MP, et al. Nat Biotechnol 2001, 19:242-247.
• Load complete digest of sample onto a column combining SCX and reverse phase
• Develop with gradient and spray directly onto MS/MS
• Identified 1500 proteins from yeast, including lower abundance species and membrane proteins
• 2415 proteins (46% of the Plasmodium genome) identified across the 4 stages of the parasite life cycle
Just Enough Diagnostic Information
Sidhu KS, Sangavich P, Brancia FL, Sullivan AG,
Gaskell SJ, Wolkenhauer O,
Oliver SG, Hubbard SJ (2001)
Bioinformatic assessment of mass spectrometric
chemical derivatisation techniques for proteome
database searching.
Proteomics 1, 1368-1377.
Provide limited sequence information by:
1. Identification of N-terminal amino acid by
PTC derivatisation
2. Use guanidination to identify C-terminus,
determine lysine content, and improve
signal response
3. Specifically fragment next to Asp
residues using MALDI-QToF MS
PTC-derivatisation
•phenylthiocarbamoyl derivative
•Edman chemistry
•N-terminal amino acid
•b1 ion created via low energy collisions
•precursor ion scan gives parents
•increased sensitivity
[Figure: precursor ion scan. Peptide ions enter MS1, which scans for precursors; after the collision cell, MS2 is fixed on the b1 ion.]
Spectra collected of all peptides which
give rise to a given b1 ion (implying
knowledge of the N-terminal amino acid)
Database peptide hits by N-terminal amino acid

N-terminal amino acid    Mean number of peptides
ANY                      74.15
W                        1.70
C                        1.77
H                        2.30
M                        3.41
:
N                        5.61
I                        5.76
E                        6.04
S                        7.18
L                        8.39
:
I/L                      14.16

Error = ± 0.5 Da
Average number of matching proteins in the yeast proteome when searching with a peptide mass in the 1000-2000 Da range.
Rare amino acids give a bigger search gain.
Guanidination of lysine
[Figure: reaction scheme. Lysine reacts with O-methylisourea to give homoarginine.]
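Converting lysine to homoarginine adds a fixed mass per lysine, which is how guanidination reveals lysine content. A minimal sketch, assuming the standard monoisotopic shift of 42.0218 Da (a net gain of CH2N2) and a hypothetical helper name; the pair 1470.68 and 1512.69 is read from the yeast spectra later in the deck and is only assumed here to be the same peptide before and after treatment.

```python
GUANIDINATION_SHIFT = 42.0218  # Da added per lysine (net gain of CH2N2)


def lysine_count(mass_before, mass_after, tolerance=0.1):
    """Number of lysines implied by the mass shift, or None if the
    shift is not a whole multiple of the guanidination increment."""
    shift = mass_after - mass_before
    count = round(shift / GUANIDINATION_SHIFT)
    if count < 0 or abs(shift - count * GUANIDINATION_SHIFT) > tolerance:
        return None
    return count


if __name__ == "__main__":
    print(lysine_count(1470.68, 1512.69))  # shift of about 42 Da
    print(lysine_count(1000.00, 1084.04))  # shift of about 84 Da
    print(lysine_count(1000.00, 1020.00))  # not a guanidination shift
```

A peptide whose mass does not move at all after treatment contains no lysine, which for a tryptic peptide suggests a C-terminal arginine.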
[Figure: MALDI spectrum of an enolase tryptic digest, counts against mass (m/z, 500 to 2500), with peaks marked K or R.]
[Figure: MALDI spectrum of a tryptic digest of enolase after guanidination, counts against mass (m/z, 800 to 2600), with peaks marked R or *K.]
1. Start with the initial set of search peptides and associated information.
2. Search the database and compile a protein “hit list” with matching peptides.
3. The top-scoring protein is matched. Remove the corresponding peptides from the search list.
4. If all initial search peptide masses are matched, stop; else continue searching.
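The loop above can be sketched as a greedy search. This is an illustration under assumptions, not the authors' implementation: the database is a plain dictionary of theoretical peptide masses per protein, the score is simply the number of matched masses, and the default tolerance of 0.5 Da mirrors the error quoted on the earlier slide. The loop also stops when no protein matches any remaining mass, a case the flowchart does not spell out.

```python
def iterative_search(search_masses, database, tolerance=0.5):
    """Greedy peptide-mass search for protein mixtures.

    database maps protein name -> list of theoretical peptide masses.
    Returns (identified, unmatched), where identified is a list of
    (protein, matched_masses) in the order the proteins were found.
    """
    remaining = list(search_masses)
    identified = []
    while remaining:
        hits = {}
        for protein, peptide_masses in database.items():
            matched = [m for m in remaining
                       if any(abs(m - p) <= tolerance for p in peptide_masses)]
            if matched:
                hits[protein] = matched
        if not hits:
            break  # nothing in the database explains the remaining masses
        best = max(hits, key=lambda name: len(hits[name]))
        identified.append((best, hits[best]))
        remaining = [m for m in remaining if m not in hits[best]]
    return identified, remaining


if __name__ == "__main__":
    toy_db = {
        "A": [1000.0, 1200.0, 1500.0],
        "B": [1200.2, 1800.0],
        "C": [2500.0],
    }
    found, left = iterative_search([1000.1, 1200.1, 1500.3, 1800.2, 3000.0], toy_db)
    print(found)
    print(left)
```

Removing the matched peptides after each round is what lets a second, less abundant protein rise to the top of the hit list.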
Real yeast proteomics
• Alternatives to 2D-gels
– denaturing technology
– low abundance spots difficult to identify
• Many orthogonal 1D steps
– Size exclusion chromatography
– Ion exchange chromatography
– 1D-gels
[Figure: MALDI spectra of a yeast proteome sample before and after guanidination, plotted against mass (m/z, 800 to 3600), with peaks marked K or R.]
Database search gains

Standard MALDI, 7 search peptides (before guanidination); 1656 proteins match at least 1 peptide
YDR457w     6 out of 7     85.7%
YGR192c     5 out of 7     71.4%
YIL192c     5 out of 7     71.4%
YJR009c     4 out of 7     57.1%
YDL140c     4 out of 7     57.1%
YJR109c     4 out of 7     57.1%

Standard MALDI, 12 search peptides (after guanidination); 2549 proteins match at least 1 peptide
YGR192c     10 out of 12   83.3%
YJR009c     9 out of 12    75.0%
YBR208c     7 out of 12    58.3%
YFR031c     6 out of 12    50.0%
YER075c     6 out of 12    50.0%
TY1B_LR2    5 out of 12    41.7%

Combined 19 (7 + 12) search peptides (both experiments); 3235 proteins match at least 1 peptide
YGR192c     15 out of 19   78.9%
YJR009c     13 out of 19   68.4%
YDR457w     10 out of 19   52.6%
YIL129c     10 out of 19   52.6%
YGR098c     8 out of 19    42.1%
YFR031c     8 out of 19    42.1%
Database search gains

Search peptides in common (5 from expt 1, 4 from expt 2); only 289 proteins match at least 1 peptide in both experiments
YGR192c     9 out of 9     100.0%
YJR009c     7 out of 9     77.8%
YJL052w     5 out of 9     55.6%
O7535       4 out of 9     44.4%
YDR545w     4 out of 9     44.4%
YBR223c     4 out of 9     44.4%

PTC derivatised, 3 peptides, N-term = Ile/Leu; only 204 proteins match at least 1 peptide
YGR192c     3 out of 3     100%
YJR009c     3 out of 3     100%
YJL052w     2 out of 3     66.7%
YLR060w     2 out of 3     66.7%
YNL271c     2 out of 3     66.7%
YAL019w     2 out of 3     66.7%

All 3 sets of experimental data combined; only 18 proteins match at least 1 peptide in all 3 experiments
YGR192c     18 out of 22   81.8%
YJR009c     16 out of 22   72.7%
YJL052w     9 out of 22    40.9%
YLR454w     8 out of 22    36.4%
YJL165c     6 out of 22    27.3%
YLR060w     5 out of 22    22.7%

# peptides in common (two columns on the slide): 5, 4, 3, 3, 2, 2 and 3, 2, 2, 2, 2, 2
[Figure: four panels (S. cerevisiae 1 protein, S. cerevisiae 2 proteins, C. elegans 1 protein, C. elegans 2 proteins) plotting % unambiguous identification (0 to 100) against total number of search peptides for each method: standard, guanidination, PTC (500), PTC (50), Asp-frag and Asp-frag (All).]
Improved bioinformatics approaches
for complex mixtures
[Diagram elements: primary data (input masses); search engine; database (proteome, proteins, peptides); protein hit list (quantitative data); secondary data (experimental proteome data); protein information (qualitative data); rule-based system; probability and possibility combined as evidence; final scores.]
Contextual information
pI (theoretical & experimental)
Molecular weight (oligomerisation state)
Subcellular localisation (known, predicted - PSORT)
Molecular environment (soluble, membrane, DNA-,
actin- associated.)
Post-translational modifications (known, putative,
predicted)
Sequence motifs
Homology relationships
Non-native state digestions
Scoring systems
• Bayesian approach
$P(k \mid DI) = \dfrac{P(k \mid I)\, P(D \mid kI)}{P(D \mid I)}$
– k is the hypothesis that the sample protein is protein k,
– D is the mass spec fingerprint data,
– I is background information,
– P(k|DI) is the posterior probability for k given D and I,
– P(k|I) is the prior probability of k given I,
– P(D|kI) is the likelihood of the data D given k and I,
– P(D|I) is a normalisation constant
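As a worked illustration of the formula, the posterior over a handful of candidate proteins can be computed by multiplying each prior by its likelihood and normalising. The numbers below are invented for the example (the ORF names are borrowed from the search tables only as labels), and the slides do not say how the likelihood P(D|kI) was modelled.

```python
def posterior(priors, likelihoods):
    """Bayes' rule over a finite set of candidate proteins.

    priors[k]      = P(k | I)
    likelihoods[k] = P(D | k, I)
    Returns P(k | D, I) for every k; the normaliser P(D | I) is the
    sum of prior * likelihood over all candidates.
    """
    joint = {k: priors[k] * likelihoods[k] for k in priors}
    evidence = sum(joint.values())
    return {k: value / evidence for k, value in joint.items()}


if __name__ == "__main__":
    # Hypothetical values: equal priors, likelihoods from a fingerprint match.
    priors = {"YGR192c": 1 / 3, "YJR009c": 1 / 3, "YJL052w": 1 / 3}
    likelihoods = {"YGR192c": 0.6, "YJR009c": 0.3, "YJL052w": 0.1}
    for name, p in posterior(priors, likelihoods).items():
        print(f"{name}: {p:.2f}")
```

Contextual information such as pI or subcellular localisation would enter through the priors, so two proteins with the same fingerprint score need not end up with the same posterior.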
QUANTITATIVE
PROTEOMICS
DiGE
Difference Gel Electrophoresis
• Ünlü M. et al (1997). Difference gel electrophoresis:a
single gel method for detecting changes in cell extracts.
Electrophoresis,18, 2071-2077
Sample 1: label with Cy2 in the dark, 30 min at 4 °C
Sample 2: label with Cy3 in the dark, 30 min at 4 °C
Sample 3: label with Cy5 in the dark, 30 min at 4 °C
Quench unreacted dye by adding 1 mM lysine in the dark, 10 min at 4 °C
Difference Gel Electrophoresis
[Figure: 2D gel electrophoresis imaged in the Cy3 and Cy5 channels and as a Cy3 + Cy5 overlay. Spots show no difference, presence / absence, or up / down-regulation.]
Stable Isotope Labelling
• In vivo labelling = isotopes introduced during cell culture
[Figure: paired 14N and 15N peaks on an m/z axis.]
Pro: cheap; information rich
Con: only works for microbes and cell culture????; very complex samples; have to deduce sequence before assigning pairs
– Growth of C. elegans on isotopically labelled E. coli
[Figure: metabolic labelling of C. elegans fed on E. coli grown on a 14N or a 15N nitrogen source, comparing light mutant with heavy WT and light WT with heavy mutant.]
Also grew Drosophila on metabolically labelled yeast
Krijgsveld et al. (2003) Nat. Biotech.
In vitro labelling - continued
I. Isotopes introduced during proteolysis: 18O-labelled water, C-termini
II. Guanidinylation of lysine using isotopes of O-methylisourea: lysine residues
III. Dimethyl labelling: lysine residues
Pro: cheap; universal
Con: complex peptide mixture; small mass difference on MS
ICAT – Isotope Coded Affinity Tags
Gygi SP, et al . Nat Biotechnol 1999, 17:994-999.
Reagent: biotin affinity tag, cleavable linker, isotope coded linker (227 / 236 amu; 9 × 13C), SH-reactive group (iodoacetamide)
Pros: universal; simplified sample
Cons: protein must contain cysteine
ICAT method
[Figure: structure of the ICAT reagent, showing biotin, the linker (heavy or light) and the thiol-specific reactive group.]
Gygi S, Rist B et al. (1999) Nature Biotech. 17: 994.
1. Control sample and test sample: denature (SDS) and reduce (TCEP) to give free SH groups
2. Label one sample with light reagent and the other with heavy reagent
3. Pool samples
4. Digest overnight with trypsin
5. Purify labelled peptides using avidin column
6. Cleave biotin portion of the tag with concentrated TFA
7. LC-MSMS
iTRAQ
Ross P. et al. Mol Cell Proteomics. 2004 Sep 22
WORKFLOW
1. Reduce, alkylate (cysteine block) and digest protein sample with trypsin as usual
2. Label each sample (max of 4) with a different iTRAQ reagent; 100 µg of protein is optimal
3. Combine all iTRAQ-labelled samples into one sample mixture
4. Clean up sample by cation-exchange chromatography
5. For complex sample mixtures, pre-fractionation is achieved by using a high-resolution cation-exchange column
6. Analyze the mixture by LC/MS/MS
7. Results are analysed by Pro Quant software
PROTEIN TURNOVER
The missing dimension of proteomics
JM Pratt, J Petty, I Riba-Garcia, DHL Robertson,
SJ Gaskell, SG Oliver, RJ Beynon (2002)
Molec. Cell. Proteomics 1, 579-591.
Experimental Approach
Deuterated leucine labelling, then unlabelled chase
Loss of label from proteins at different rates = turnover
[Figure: protein labelling curve, relative label (0 to 1) against time (0 to 80 h).]
Dilution rate = 0.1 h^-1
Half-time = 6.9 h
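The quoted half-time follows from the dilution rate if label is lost by first-order washout alone, which is the assumption made in this sketch (function names are illustrative). A protein that loses label faster than this baseline is being degraded as well as diluted.

```python
import math


def half_time(rate_per_hour):
    """Half-time (h) of a first-order process with the given rate constant."""
    return math.log(2) / rate_per_hour


def fraction_remaining(rate_per_hour, hours):
    """Fraction of label left after the given time, assuming first-order loss."""
    return math.exp(-rate_per_hour * hours)


if __name__ == "__main__":
    dilution_rate = 0.1  # h^-1, from the slide
    print(f"half-time: {half_time(dilution_rate):.1f} h")
    for t in (0, 10, 25, 51):
        print(f"{t:>3} h  {fraction_remaining(dilution_rate, t):.3f}")
```

Under the same assumption, the degradation rate constant of a protein would be its observed loss rate minus the dilution rate.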
[Figure: Pratt et al., Figure 3. Mass spectra of peptides labelled with 100%, 50% and 0% d9-leucine. L is the number of leucine residues in each peptide (L = 0 to 3); labelled peaks shift by 9 Da per leucine (9 Da for 1 Leu, 27 Da for 3 Leu).]
[Figure: Pratt et al., Figure 3. Mass spectra (relative abundance, %, against m/z) at 0, 4, 6, 8, 12, 25 and 51 h, with expanded (x6) regions around m/z 1520 to 1550 and m/z 2100 to 2130.]
[Figure: Pratt et al., Figure 5. RIAt against time (h) for NADP-glutamate dehydrogenase (GDH; 3 peptides), Hsp26 (2 peptides), pyruvate decarboxylase (PDC; 4 peptides) and Hsp71 (4 peptides), with a bar chart of kloss (h^-1) ± SEM for NADP-GDH, Hsp26, Hsp71 and PDC.]
[Figure: degradation rate constant (h^-1) ± SEM for each protein (spot ID), with an inset showing the distribution (%) of proteins across rate classes: < 0.01, 0.01-0.02, 0.02-0.03, 0.03-0.04 and > 0.04 h^-1.]
INTEGRATION
Evaluating protein-interaction
data
von Mering C, Krause R, Snel B, Cornell M,
Oliver SG, Fields S, Bork P (2002)
Comparative assessment of large-scale data sets
of protein–protein interactions.
Nature 417, 399-403.
Cornell M, Paton NW, Oliver SG (2004)
A critical and integrated view of the yeast interactome.
Comp. Funct. Genom. 5, 382-402
(A) DNA binding domain fused to Protein A, upstream (UAS, promoter) of the LacZ reporter gene.
The fusion of the “bait” protein and the DNA binding domain of the transcriptional activator cannot turn on the reporter gene.
(B) Activator region fused to Protein B.
The fusion of the “prey” protein and the activating region of the transcriptional activator is also insufficient to switch on the reporter.
(C) DNA binding domain fused to Protein A, together with activator region fused to Protein B: transcription of LacZ.
The association of “bait” and “prey” brings the DNA binding domain and the activator region close enough to switch on the reporter gene and turn yeast blue.
Fig. 1 How the two-hybrid system detects protein associations in yeast.
[Figure: schematic representation of the two-hybrid system. In case of interaction of proteins A and B, the activation domain (on B) is brought to the DNA-binding domain (on A) at the UAS, RNA Pol II is recruited and the reporter gene is expressed. In the absence of interaction of A and B there is no transcript.]
Synthetic lethals
Definition: lethality is caused by the combination of mutations in two or more genes, none of which is lethal on its own
[Figure: two cases. Single essential pathway: gene1 to gene5. Functionally overlapping pathways: gene1 to gene5 alongside geneA to geneC.]
Asparagine-linked Glycosylation
Dol-PP-GlcNAc2Man9Glc3 (substrate; ALG genes are responsible for the core synthesis) + Asp-NH2-X-Ser/Thr
→ Asp-NH-GlcNAc2Man9Glc3-X-Ser/Thr
Oligosaccharyltransferase subunits: STT3, OST1, WBP1, OST3, OST6, SWP1, OST2, OST5, OST4
alg mutations are synthetically lethal with conditional mutations affecting oligosaccharyltransferase activity
Integrating complex data with
yeast two-hybrid data
Complex consists of six proteins: A, B, C, D, E, F
In a yeast two-hybrid experiment, A interacts with another protein. Is it B, C, D, E or F?
Large-scale interaction data and the distribution of
interactions according to functional categories.
Quantitative comparison of interaction datasets.
Set of confirmed Y2H interactions
Confirmation of an interaction requires:
1. Identification in more than one Y2H screen, OR
2. The reverse interaction must have been identified,
OR
3. The two proteins must have been identified in the
same protein complex (from either classical or
high-throughput affinity purification studies).
A total of 451 reliable interactions,
involving 581 proteins have been identified
from a combined data set comprising
5214 interactions and 4025 proteins
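The three confirmation criteria can be expressed as a small filter. This is a sketch of the rules as stated on the slide, not the code used for the published analysis; the input layout (bait, prey, screen name) and the function name are assumptions made for the example.

```python
from collections import defaultdict


def confirmed_interactions(observations, complexes):
    """Apply the three confirmation rules to two-hybrid observations.

    observations: list of (bait, prey, screen) tuples.
    complexes:    list of sets of protein names.
    Returns the set of confirmed pairs as sorted (a, b) tuples.
    """
    screens = defaultdict(set)
    directed = set()
    for bait, prey, screen in observations:
        screens[frozenset((bait, prey))].add(screen)
        directed.add((bait, prey))

    confirmed = set()
    for pair, seen_in in screens.items():
        if len(pair) != 2:
            continue  # skip self-interactions
        a, b = sorted(pair)
        in_multiple_screens = len(seen_in) > 1
        reciprocal = (a, b) in directed and (b, a) in directed
        same_complex = any(a in c and b in c for c in complexes)
        if in_multiple_screens or reciprocal or same_complex:
            confirmed.add((a, b))
    return confirmed


if __name__ == "__main__":
    toy_observations = [
        ("A", "B", "screen1"), ("A", "B", "screen2"),  # seen in two screens
        ("C", "D", "screen1"), ("D", "C", "screen1"),  # reverse also found
        ("E", "F", "screen1"),                         # same complex
        ("G", "H", "screen1"),                         # unconfirmed
    ]
    print(sorted(confirmed_interactions(toy_observations, [{"E", "F", "X"}])))
```

Any one criterion is enough, which is why the filter uses a logical OR; requiring all three would shrink the reliable set much further.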
PEDRo: A Systematic Approach to
Modelling, Capturing and
Disseminating Proteomics Data
Taylor CF, Paton NW, Garwood KL,
Kirby PD, Stead DA, Yin Z, Deutsch EW, Selway L, Walker J,
Riba–Garcia I, Mohammed S, Deery MJ, Howard JA,
Dunkley T, Aebersold R, Kell DB, Lilley KS, Roepstorff P,
Yates JR III, Brass A, Brown AJP, Cash P, Gaskell SJ, Hubbard SJ, Oliver SG (2003)
Nature Biotechnol. 21, 247-254.
Garwood K, McLaughlin T, Garwood C, Joens S, Morrison N, Taylor CF, Carroll K, Evans C, Whetton AD, Hart S,
Stead D, Yin Z, Brown AJP, Hesketh A, Chater K, Hansson L, Mewissen M, Ghazal P, Howard J, Lilley KS,
Gaskell SJ, Brass A, Hubbard SJ, Oliver SG, Paton NW (2004)
PEDRo: A database for storing, searching and disseminating experimental proteomics data.
BMC Genomics 5, 68 doi:10.1186/1471-2164-5-68.
Proteomics — the state of play
• The volume of generated proteome data is rapidly increasing
– Movement towards high–throughput approaches
– Experimental techniques increasing in complexity
– Analyses also increasing in complexity
• Current publicly available proteomics data is limited
– 2D–Gel image databases (e.g. SWISS–2DPAGE) contain little information about sample
preparation, or analysis of results
– No widely used databases of mass spectrometry data or analyses
• A robust, future-proofed, standard representation of both methods and data from proteomics experiments is required
– Analogous to the MIAME guidelines for transcriptomics
– Users will know what to expect from datasets (formats etc.)
– Will facilitate handling, exchange and dissemination of data
– Will guide the development of effective search/analysis tools
PEDRo and PEML
• The PEDRo (Proteome Experiment Data Repository) model
– Specifies the information required about a proteomics experiment
• sufficient information to exactly replicate that experiment
– Organised in a manner reflecting the procedures that generated it
– Flexible enough to accommodate new technological developments
– Described in UML (Unified Modelling Language), making it implementation–independent (effectively a generic blueprint)
• Implemented in SQL (the relational database repository)
• Also implemented in Java (later slide), and XML (next bullet)
• PEML (Proteomics Experiment Markup Language)
– The XML implementation of PEDRo for data exchange and rapid dissemination (using XSLT to display PEML files as web pages)
• Two benefits arising from early implementation of the model
– Implementation allows the underlying technologies to be tested
– Making explicit what data might most usefully be captured about proteomics experiments will speed the model’s evolution
The nature of proteomics experiment data
• Sample generation
– Origin of sample
• hypothesis, organism, environment,
preparation, paper citations
• Sample processing
– Gels (1D/ 2D) and columns
• images, gel type and ranges, band/spot
coordinates
• stationary and mobile phases, flow rate,
temperature, fraction details
• Mass Spectrometry
• machine type, ion source, voltages
• In Silico analysis
• peak lists, database name + version,
partial sequence, search parameters,
search hits, accession numbers
The PEDRo UML schema in reduced form
[Figure: UML class diagram. Classes: Experiment, Organism, SampleOrigin, Sample, TaggingProcess, OntologyEntry, Analyte, AnalyteProcessingStep, OtherAnalyte, OtherAnalyteProcessingStep, ChemicalTreatment, TreatedAnalyte, Column, MobilePhaseComponent, GradientStep, PercentX, AssayDataPoint, Fraction, Gel, Gel1D, Gel2D, DiGEGel, GelItem, DiGEGelItem, RelatedGelItem, Band, Spot, BoundaryPoint, MassSpecMachine, MassSpecExperiment, IonSource, Electrospray, MALDI, OtherIonisation, mzAnalysis, Quadrupole, Hexapole, IonTrap, CollisionCell, ToF, OthermzAnalysis, Detection, MSMSFraction, PeakList, Peak, ListProcessing, ChromatogramPoint, PeakSpecificChromatogramIntegration, TandemSequenceData, DBSearch, DBSearchParameters, PeptideHit, ProteinHit, Protein.]
[Figure: PEDRo UML class diagram, in which each class box lists its attributes and the associations carry multiplicities. Key to colours: Sample Generation, Sample Processing, Mass Spectrometry, MS Results Analysis. For example, Experiment has hypothesis, method_citations and result_citations; MassSpecMachine has manufacturer, model_name and software_version; MALDI has laser_wavelength, laser_power, matrix_type, grid_voltage, acceleration_voltage and ion_mode; IonTrap has gas_type, gas_pressure, rf_frequency, excitation_amplitude, isolation_centre, isolation_width and final_ms_level; CollisionCell has gas_type, gas_pressure and collision_offset; Electrospray has spray_tip_voltage, spray_tip_diameter, solution_voltage, cone_voltage, loading_type, solvent, interface_manufacturer and spray_tip_manufacturer.]
The Framework Around PEDRo
1. Lab generated data is encoded using the PEDRo data entry tool,
producing an XML (PEML) file for local storage, or submission
2. Locally stored PEML files may be viewed in a web browser (with
XSLT), allowing web pages to be quickly generated from datasets
3. Upon receipt of a PEML file at the repository site, a validation tool
checks the file before entering it into the database
4. The repository (a relational database) holds submitted data, allowing
various analyses to be performed, or data to be extracted as a PEML
file or another format
The PEDRo Data Collator
• The tool with which a user enters
information about, and data from,
proteomics experiments
–The tool collates these data into a single
PEML file
–The hierarchical nature of the PEDRo
schema (and PEML) is reflected in the
structure of the data entry tool
• Successive stages of the experimental design are
added as ‘children’ of the previous stage
• Enforces an audit trail for data; e.g. details of a
gel cannot be entered without first describing the
sample
• A simple, filterable list of all the sub–records
present and tree-style browser act as ‘index’ and
‘contents’ for the PEML file being edited
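The parent-child audit trail described above can be illustrated with a small sketch. Everything here is hypothetical: the `Record` class, the four record kinds and the XML element names are invented for the example and are not the real PEML schema or the PEDRo Data Collator code. The point is only that a gel cannot be created except as a child of a sample, and that the nesting carries straight through to the XML.

```python
import xml.etree.ElementTree as ET


class Record:
    """One node in a hierarchical experiment description (hypothetical)."""

    # Each kind of record may only be added beneath its stated parent.
    ALLOWED_PARENT = {
        "Experiment": None,
        "Sample": "Experiment",
        "Gel": "Sample",
        "Spot": "Gel",
    }

    def __init__(self, kind, **fields):
        if kind not in self.ALLOWED_PARENT:
            raise ValueError(f"unknown record kind: {kind}")
        self.kind = kind
        self.fields = fields
        self.children = []

    def add_child(self, kind, **fields):
        """Add a child record, enforcing the audit trail."""
        if self.ALLOWED_PARENT.get(kind) != self.kind:
            raise ValueError(f"{kind} cannot be entered under {self.kind}")
        child = Record(kind, **fields)
        self.children.append(child)
        return child

    def to_xml(self):
        """Build a nested XML element mirroring the record hierarchy."""
        element = ET.Element(self.kind, {k: str(v) for k, v in self.fields.items()})
        for child in self.children:
            element.append(child.to_xml())
        return element


if __name__ == "__main__":
    experiment = Record("Experiment", hypothesis="example hypothesis")
    sample = experiment.add_child("Sample", sample_id="S1")
    sample.add_child("Gel", description="2D gel")
    print(ET.tostring(experiment.to_xml(), encoding="unicode"))
```

Keeping the rule in a single lookup table means a new stage can be added to the hierarchy without touching the validation logic.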
Conclusions
• The PEDRo model does require a substantial amount of data
– Much of this information will be available in the lab of origin
– Some data will be common to many experiments, and therefore need only be
entered once, then saved as a template in PEDRoDC
• But there are several advantages to adopting such a model
– All datasets will contain information sufficient to quickly establish the
provenance and relevance (to the researcher) of a dataset
– Datasets will be detailed enough to allow non–standard searches, for
example, by sample extraction technique
– Tools can be developed that allow easy access to large numbers of
such datasets, from a wide range of proteomics sites
– Integration with other resources such as the major sequence
databases, will provide sophisticated search and analysis capability
– Information exchange between researchers will be facilitated through
the use of a common language (PEML), and the ability to rapidly
display PEML-encoded data as a web page