Transcript BioMarker
A biased look at Biomarkers
BioMarker
Definition:
A biomarker is a substance used as an indicator of a biological state:
The existence of living organisms or of a biological process
A particular disease state
Proteins
Nucleic acids
Metabolites:
Carbohydrates
Lipids
Small molecules
Biomarker
Detection of biomarker
Detection of biomarker – diagnosis
Intrinsic properties, e.g. enzymatic activity
Antibodies, IHC, ELISA
Detection of biomarker
Quantitative
a link between quantity of the marker and disease
Qualitative
a link between the existence of a marker and disease
Biomarker & Diagnosis
Ideal Marker for diagnosis
Should have great sensitivity, specificity, and accuracy in reflecting total
disease burden. A tumor marker should also be prognostic of outcome and
treatment
Biomarker for Screening
•The marker must be highly specific, minimizing false positives and false negatives
•The marker must clearly reflect the different stages of the disease, including the early stages
•The marker must be easily detected without complicated medical
procedures. The disease markers released to serum and urine are good
targets for application of early screening.
•The method for screening should be cost effective.
Samples for biomarker detection
Blood, urine, or other body fluids samples
Tissue samples
Prostate Cancer marker PSA
PSA is a protein normally made in the
prostate gland in ductal cells that make
some of the semen. PSA helps to keep
the semen liquid. PSA, also known as
kallikrein III, seminin, semenogelase,
γ-seminoprotein and P-30 antigen, is a
glycoprotein, a serine protease
Prostate Cancer Diagnosis with PSA
Cancer of the prostate does not cause any symptoms until it is locally advanced
or metastatic.
There is a correlation between elevated PSA and prostate cancer.
Detection of PSA is a surrogate for early detection of prostate
cancer.
Large screening trials have shown that PSA nearly doubles the rate of
detection when combined with other methods. Based on these data, PSA
testing was approved by the US FDA for the screening and early detection of
prostate cancer.
PSA is also found in the cytoplasm of benign prostate cells.
“I never dreamed that my discovery four decades ago would lead to such a
profit-driven public health disaster." -Richard Ablin (inventor of the PSA test)
PSA screening generates ~$1.7 billion annually in the U.S. alone.
Sensitivity = the ability of the test to detect the disease (True positive rate)
Specificity = the likelihood that your test will be normal if you are disease free
(True negative rate)
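As a minimal sketch, the two definitions above can be written in terms of counts from a validation study. The function names and the example counts are illustrative assumptions, not values from the slides.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of diseased people that the test flags (true positive rate)."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of disease-free people that the test clears (true negative rate)."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts, chosen only for illustration.
example_sensitivity = sensitivity(true_pos=20, false_neg=80)
example_specificity = specificity(true_neg=940, false_pos=60)
print(f"sensitivity = {example_sensitivity:.2f}, specificity = {example_specificity:.2f}")
```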
A brief aside about Statistics and Probability
-Statistics are the formalization of common sense
-because they have to handle many different
situations, they can be really complicated
-they should make you feel really good or really
bad about your data
-People are inherently bad at statistics and probability
Case Study:
rate for being HIV positive: 1:10000
false positive rate of HIV test: 1:1000
If I test positive, what is the chance that I am really HIV negative?
Possible answers: 0.0001, 0.001, 0.01, 0.1, 0.9, 0.99, 0.9999
Per 10,000 people tested there will be about 1 True Positive and 10 False Positives, so my chance
of being Negative is 10/11 (about 0.91).
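The HIV case study can be checked with a few lines of arithmetic. This is a sketch under the slide's implicit assumption that the test catches every true case (sensitivity = 1); the function name is hypothetical.

```python
def chance_negative_given_positive(prevalence, false_positive_rate, sensitivity=1.0):
    """Probability that a person who tests positive is actually disease free."""
    true_positives = prevalence * sensitivity
    false_positives = (1.0 - prevalence) * false_positive_rate
    return false_positives / (true_positives + false_positives)


# Numbers from the case study: prevalence 1:10000, false positive rate 1:1000.
hiv = chance_negative_given_positive(prevalence=1 / 10_000, false_positive_rate=1 / 1_000)
print(f"Chance of being negative after a positive HIV test: {hiv:.3f}")
```

The result agrees with the 10/11 estimate to three decimal places.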
How about the PSA test?
Rate is 15:10000
False Positive Rate is 60:1000
Per 10,000 men tested, for every 15 True Positives there will be about 600 False Positives!
Chance of being Negative 600/615 = .98
Chance of being Positive = .02 (before test chance was 0.0015)
-Is this true?
The test will also miss 80% of the true positives (sensitivity = 20%),
so only 3 of the 15 True Positives will be detected:
Chance of being Negative 600/603 = 0.995
Chance of being True Positive = 0.005
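The same calculation with the 20% sensitivity included, sketched for a cohort of 10,000 men. The cohort size and function name are illustrative; the rates are the ones quoted above.

```python
def positives_per_cohort(cohort, prevalence, false_positive_rate, sensitivity):
    """Return (detected true positives, false positives) for a screened cohort."""
    diseased = cohort * prevalence
    healthy = cohort - diseased
    return diseased * sensitivity, healthy * false_positive_rate


true_pos, false_pos = positives_per_cohort(
    cohort=10_000, prevalence=15 / 10_000, false_positive_rate=60 / 1_000, sensitivity=0.20
)
chance_negative = false_pos / (true_pos + false_pos)
print(f"{true_pos:.0f} true positives, {false_pos:.0f} false positives")
print(f"Chance of being negative after a positive PSA test: {chance_negative:.3f}")
```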
Follow up for a +HIV test is another blood test.
Follow up for +PSA test is tissue biopsy.
How good does a Biomarker have to be?
By age 65 the rate of prostate cancer climbs to 8:1000 and the test performs
much better.
For every 8 True Positives, there will be 60 False Positives!
Chance of being Negative 60/68 = .88
Chance of being Positive = .12 (before test chance was 0.008)
How good does a Biomarker have to be?
Prostate cancer is one of the most frequent cancers (15:10000); most
cancers are much less frequent (1:10000 to 1:50000), so a biomarker would
have to be much better than the PSA test. It is currently believed that a new
biomarker would need sensitivity and specificity better than 95%.
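To see how strongly prevalence drives performance, here is a sketch of the chance that a positive result is a true positive (the positive predictive value) for a hypothetical marker with exactly 95% sensitivity and 95% specificity. The prevalences are the ones quoted above; the 95% figures are the threshold values, used here only as an assumption.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Chance that a positive test result is a true positive."""
    true_pos = prevalence * sensitivity
    false_pos = (1.0 - prevalence) * (1.0 - specificity)
    return true_pos / (true_pos + false_pos)


prevalences = {"15:10000": 15 / 10_000, "1:10000": 1 / 10_000, "1:50000": 1 / 50_000}
ppv = {
    label: positive_predictive_value(p, sensitivity=0.95, specificity=0.95)
    for label, p in prevalences.items()
}
for label, value in ppv.items():
    print(f"prevalence {label}: chance a positive is real = {value:.4f}")
```

Under these assumptions the predictive value drops as the disease becomes rarer, which is the point of the slide.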
Early Proteomics-Based Biomarker work
was based on SELDI
SELDI can detect 200-300 features in a sample. It has been used to look for
biomarkers in everything from blood to tears.
Early Biomarker work has largely been
discredited
-Biomarkers with similar masses kept being rediscovered
-When the proteins were identified, they were abundant serum
proteins, and many markers came from the same proteins
-Multi-center studies failed to validate the biomarkers in a “clinical”
setting
-Realization that serum and other biofluids are incredibly complex.
-Realization that serum and other biofluids are incredibly variable
and “fragile”
-some strong “biomarkers”
-blood collection tube
-# of freeze-thaw cycles
-diet
Key Concept: Proteins vary widely in concentration
Typical Biomarker Discovery study
will take 50 samples per condition.
It typically takes 10 samples per condition
to have a 90% chance of finding 2-fold
differences. Validation will take 1000s of
samples. Finally, the assay will have to be
converted to something that can be done in
a clinical lab.
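A rough sketch of where a number like "10 samples per condition" can come from, using the standard normal-approximation formula for comparing two group means on a log2 scale. The standard deviation of 0.68 log2 units is an assumed value chosen for illustration; the slides do not state the variability behind their estimate.

```python
from math import ceil
from statistics import NormalDist


def samples_per_condition(log2_fold_change, log2_sd, alpha=0.05, power=0.90):
    """Normal-approximation sample size per group for a two-sided comparison."""
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1.0 - alpha / 2.0)
    z_power = standard_normal.inv_cdf(power)
    return ceil(2.0 * ((z_alpha + z_power) * log2_sd / log2_fold_change) ** 2)


# A 2-fold change is 1.0 on the log2 scale; log2_sd=0.68 is an assumption.
n = samples_per_condition(log2_fold_change=1.0, log2_sd=0.68)
print(f"samples per condition: {n}")
```

The required number grows with the square of the variability, so noisier biofluids need far more samples for the same fold change.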
PCA or other Clustering is used for
Biomarker discovery
2007
Common Serum Markers for Cancer Diagnosis/Prognosis
[Table: serum markers (AFP, CEA, CA15-3, CA19-9, CA125, PSA, PSAf, PAP, hTG, HCGb, Ferr, NSE, A2M, B2M) versus cancer type (Lung, Pancreas, Kidney, Breast, Ovarian, Cervical, Uterine, Prostate, Liver, Gastro, Colon, Bladder, Brain, Myeloma, Thyroid, Testicular, Leukemia); an x marks each marker used for a given cancer.]
Conclusions
-Biomarker Discovery is difficult
-biofluids are complex
-biofluids have a high dynamic range
-biomarkers are usually low abundance
-even taking “proximal” fluids typically
does not help
-there is a lot of person-to-person variability
-Most Biomarkers will never become clinically relevant
-statistical standards for diagnostic tools are very high
-the more prevalent the disease the “better” the
biomarker will perform
-An MS based biomarker assay is unlikely due to the greater
analytical performance of antibody based methods.
-For a biomarker workflow to be meaningful it must be quantitative!
Quantitative Approaches
Stable Isotope Labeling methods
-adds heavy isotopes to one sample so chemically
identical compounds are mass shifted
-added to the peptides/proteins using reactive groups
-added to the proteins in vivo using heavy amino acids
-can be multiplexed
Label free methods
-extracted ion chromatograms
-spectral counting
[Figure: reflector-mode mass spectrum (4700 Reflector Spec #1, base peak m/z 863.4), % Intensity vs. Mass (m/z), with an expanded view of the isotope cluster at m/z 1737.8809, 1738.8808, 1739.8810 and 1740.8808.]
ISOTOPE-CODED AFFINITY TAG (ICAT):
• Label protein samples with heavy and light reagent
• Reagent contains affinity tag and heavy or light isotopes
Chemically reactive group: forms a covalent bond to the protein or peptide
Isotope-labeled linker: heavy or light, depending on which isotope is used
Affinity tag: enables the protein or peptide bearing an ICAT to be isolated by affinity chromatography in a single step
Example of an ICAT Reagent
Biotin affinity tag: binds tightly to streptavidin-agarose resin
Reactive group: thiol-reactive group will bind to Cys
Linker: heavy version will have deuteriums at *; light version will have hydrogens at *
[Figure: chemical structure of the ICAT reagent, with the isotope-labeled linker positions marked *.]
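How does ICAT work?
Lyse & label (one sample Light, one sample Heavy)
MIX
Proteolysis (e.g. trypsin)
Affinity isolation on streptavidin beads
Quantification: MS (light and heavy peaks of the same peptide)
Identification: MS/MS (example peptide NH2-EACDPLR-COOH)
[Figure: ICAT workflow with an example MS scan (m/z 550-590) showing the light/heavy pair and an example MS/MS scan (m/z 200-600).]
ICAT Quantitation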
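A minimal sketch of the quantification step: the light and heavy forms of a peptide appear as a pair of peaks separated by a predictable m/z spacing, and the ratio of their intensities estimates the relative protein level. The count of 8 deuteriums per tag is an assumption (it matches the original d0/d8 ICAT reagent), and the intensities are invented for illustration.

```python
# Monoisotopic mass difference between deuterium and hydrogen, in Da.
DEUTERIUM_MINUS_HYDROGEN = 2.014102 - 1.007825


def heavy_light_spacing(n_deuterium, charge, n_labels=1):
    """Expected m/z spacing between the light and heavy peaks of one peptide."""
    return n_labels * n_deuterium * DEUTERIUM_MINUS_HYDROGEN / charge


def heavy_to_light_ratio(light_intensity, heavy_intensity):
    """Relative abundance of the heavy-labeled sample versus the light one."""
    return heavy_intensity / light_intensity


# Assumed: 8 deuteriums per tag, one labeled cysteine, doubly charged peptide.
spacing = heavy_light_spacing(n_deuterium=8, charge=2)
ratio = heavy_to_light_ratio(light_intensity=1000.0, heavy_intensity=2500.0)
print(f"expected spacing: {spacing:.3f} m/z, heavy/light ratio: {ratio:.2f}")
```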
ICAT
Advantages vs. Disadvantages
Advantages:
• Estimates relative protein levels between samples with a reasonable level of accuracy (within 10%)
• Can be used on complex mixtures of proteins
• Cys-specific label reduces sample complexity
• Can set up the mass spectrometer to fragment only those peaks with a certain ratio
Disadvantages:
• Yield and non-specificity
• Expensive
• Slight chromatography differences
• Tag fragmentation
• Meaning of relative quantification information
• No cysteine residues present, or cysteines not accessible to the ICAT reagent
iTRAQ™ Reagent Design
Isobaric Tag (Total mass = 145) = Reporter + Balance, attached to the Peptide Reactive Group (PRG)
Reporter group: mass = 114 thru 117; charged (retains charge); gives strong signature ion in MS/MS
Balance group: mass = 31 thru 28; neutral loss in MS/MS; balance changes in concert with reporter mass to maintain total mass of 145
Peptide Reactive Group (PRG): amine-specific (NHS)
The tag gives good b- and y-ion series, maintains charge state, and maintains ionization efficiency of the peptide
[Figure: structure of the isobaric tag, with the MS/MS fragmentation sites marked.]
Multiplexed protein quantitation in Saccharomyces cerevisiae using amine-reactive isobaric tagging reagents
Ross, PL., et al, Mol Cell Proteomics 2004 3: 1154-1169.
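The "isobaric" property is simple bookkeeping: every reporter is paired with a balance group so that the total tag mass stays at 145. A small sketch using the nominal masses given above (the names are illustrative):

```python
TOTAL_TAG_MASS = 145
# Nominal reporter mass -> nominal balance mass, as listed on the slide.
REPORTER_TO_BALANCE = {114: 31, 115: 30, 116: 29, 117: 28}


def is_isobaric(reporter_to_balance, total=TOTAL_TAG_MASS):
    """True if every reporter/balance pair adds up to the same total mass."""
    return all(reporter + balance == total
               for reporter, balance in reporter_to_balance.items())


print(is_isobaric(REPORTER_TO_BALANCE))
```

Because the totals match, the same peptide labeled with any of the four tags has the same precursor m/z and only separates into distinct reporter ions in MS/MS.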
Isobaric Tagging - General Method (4-Plex)
S1, S2, S3, S4: Parallel Denature & Digest
Label each sample with one reagent: 114-31-PRG, 115-30-PRG, 116-29-PRG, 117-28-PRG
Mix
MS:
- Reporter-Balance-Peptide INTACT
- 4 samples identical m/z (example precursor at m/z 1352.84)
MS/MS:
- Peptide fragments (b and y ions) EQUAL
- Reporter ions (114, 115, 116, 117) DIFFERENT
[Figure: workflow diagram with an example MS scan (% Intensity vs. Mass (m/z), 1347-1360) and an example MS/MS scan showing the b- and y-ion series and the reporter ion region (m/z 111-118).]
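A sketch of how the reporter ions turn into relative quantities: sum the MS/MS intensity near each reporter m/z and divide by a reference channel. The reporter positions (114.1 to 117.1) are the 4-plex values; the tolerance window, the choice of 114 as reference, and the peak list are assumptions made for illustration.

```python
REPORTER_MZ = {114: 114.1, 115: 115.1, 116: 116.1, 117: 117.1}


def reporter_intensities(peaks, window=0.05):
    """Sum intensity within +/- window of each reporter m/z.

    peaks is an iterable of (mz, intensity) pairs from one MS/MS spectrum.
    """
    totals = {channel: 0.0 for channel in REPORTER_MZ}
    for mz, intensity in peaks:
        for channel, reporter_mz in REPORTER_MZ.items():
            if abs(mz - reporter_mz) <= window:
                totals[channel] += intensity
    return totals


def ratios_to_reference(intensities, reference=114):
    """Express every channel relative to the reference channel."""
    ref = intensities[reference]
    return {channel: value / ref for channel, value in intensities.items()}


# Hypothetical MS/MS peak list: four reporter ions plus one unrelated fragment.
peaks = [(114.11, 1000.0), (115.11, 2000.0), (116.11, 500.0),
         (117.11, 1000.0), (142.1, 300.0)]
ratios = ratios_to_reference(reporter_intensities(peaks))
print(ratios)
```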
Spotfire K-means Clustering of Protein-level Ratios
[Figure: K-means cluster panels of protein-level ratios; the axis labels in each panel are G1L, S, PM.]
MS/MS Spectra of a Singly-charged Peptide
*-TPHPALTEAK-*
[Figure: MS/MS spectrum (% Intensity vs. Mass (m/z)) showing the b- and y-ion series, with expanded views of the reporter ion region (114.1, 115.1, 116.1, 117.1) and of the b7 and y8 fragment ions (m/z 757-767 and 869-879).]
Reporter Group Placement: Selection of ‘Quiet Region’
[Figure: summed ion intensity vs. m/z (0-2000) over ~75,000 spectra.]
Simplified Workflow: (One extra step)
Example: Time course
Control, Test 1, Test 2, Test 3
Trypsin Digestion
Label with iTRAQ™ Reagents (114, 115, 116, 117): 1 hr, RT, single addition
MIX
SCX
LC MS/MS Analysis: single 2D LC analysis for combined samples (4-plex)
ID and Quant from MS/MS
Differential Expression using iTRAQ™ Reagent Approach
OverExpression of Chaperonin 10
Non-Cysteine containing Protein
*VLQATVVAVGSGS*K
* iTRAQ Labeled Residue
[Figure: MS/MS spectrum of the peptide showing the b- and y-ion series (m/z, amu, 100-900), with an inset of the reporter ion region (114-117 m/z, amu) comparing Normal and Cancer samples.]
iTRAQ
Advantages vs. Disadvantages
Advantages:
• Estimates relative protein levels between samples with a reasonable level of accuracy (> 10%)
• Can be used on complex mixtures of proteins
• Isobaric, so the tag is only visible in the MS/MS, keeping the precursor scans as clean as possible.
• The abundance of the peptides sums together, making analysis of low abundance peptides easier.
• Replicates analyzed on the same LC-MS/MS run, minimizing run to run variability.
Disadvantages:
• Reagent not completely specific
• Expensive
• Does not work on ion trap instruments
• Reporters tend to dominate the spectra
• You have to fragment everything and sort out the iTRAQ reporters later. The mass spec spends a lot of time analyzing peptides with no quantitative differences.
Stable Isotope Labeling by Amino acids in Cell culture
SILAC
Advantages vs. Disadvantages
Advantages:
• Estimates relative protein levels between samples with a high level of accuracy (<5%)
• Can be used on complex mixtures of proteins
• Can set up the mass spectrometer to fragment only those peaks with a certain ratio
• Extremely flexible and can be adapted to many systems.
Disadvantages:
• Labeling may be incomplete
• Urea cycle may cause incorporation of heavy isotopes into other amino acids
• Expensive
• Works best on high resolution instruments.
Label-Free Quantitation
All approaches so far require purchase of isotopically labeled reagents
(can be expensive).
•What if you want to compare large numbers of samples (10+)
•What if you can’t afford lots of reagents?
•Peak/Spectral counting
•Peak area comparison (Extracted Ion Chromatograms)
Spectral Counting
•Count the number of peptides identified from a protein in each sample.
•Typically do not count repeat identifications of the same peptide
•Not accurate at quantifying magnitude of change, but can be used to
determine if there is a difference.
•In general, you need a spectral count difference of about 4 peptides in order
to be confident that a difference is real.
•Most proteins in complex mixtures are identified by fewer than 4
peptides.
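The counting rules above can be sketched as follows: count unique peptides per protein (repeat identifications are ignored) and flag proteins whose counts differ by at least 4 between two samples. The protein and peptide identifiers are made up for illustration.

```python
from collections import defaultdict


def spectral_counts(identifications):
    """Count unique peptides per protein from (protein, peptide) pairs."""
    peptides = defaultdict(set)
    for protein, peptide in identifications:
        peptides[protein].add(peptide)  # a set drops repeat identifications
    return {protein: len(sequences) for protein, sequences in peptides.items()}


def differing_proteins(counts_a, counts_b, min_difference=4):
    """Proteins whose counts differ by at least min_difference."""
    proteins = set(counts_a) | set(counts_b)
    return sorted(p for p in proteins
                  if abs(counts_a.get(p, 0) - counts_b.get(p, 0)) >= min_difference)


# Hypothetical identifications from two samples.
sample_a = [("P1", "AAK"), ("P1", "AAK"), ("P1", "CCR"), ("P1", "DDK"),
            ("P1", "EEK"), ("P1", "FFR"), ("P2", "GGK")]
sample_b = [("P1", "AAK"), ("P2", "GGK"), ("P2", "HHR")]
counts_a, counts_b = spectral_counts(sample_a), spectral_counts(sample_b)
changed = differing_proteins(counts_a, counts_b)
print(counts_a, counts_b, changed)
```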
EIC
(Extracted Ion Chromatogram, also written XIC)
•Measure intensity of the peak during its elution off the HPLC column and into the
mass spectrometer.
•Measure the area of the peak in the EIC.
•More accurate than selecting the peak intensity for one given scan.
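A minimal sketch of the idea: pull the intensity of one m/z out of every scan, then integrate over retention time with the trapezoid rule. The scan layout, the m/z tolerance, and the numbers are assumptions for illustration, not a vendor file format.

```python
def extracted_ion_chromatogram(scans, target_mz, tolerance=0.01):
    """Build (retention_time, intensity) points for one target m/z.

    scans is a list of (retention_time, [(mz, intensity), ...]) tuples.
    """
    trace = []
    for retention_time, peaks in scans:
        intensity = sum(i for mz, i in peaks if abs(mz - target_mz) <= tolerance)
        trace.append((retention_time, intensity))
    return trace


def peak_area(trace):
    """Trapezoid-rule area under the chromatogram."""
    area = 0.0
    for (t0, i0), (t1, i1) in zip(trace, trace[1:]):
        area += 0.5 * (i0 + i1) * (t1 - t0)
    return area


# Hypothetical scans: one eluting peptide at m/z 521.7 plus an unrelated ion.
scans = [
    (0.0, [(600.0, 50.0)]),
    (1.0, [(521.7, 10.0), (600.0, 50.0)]),
    (2.0, [(521.7, 20.0)]),
    (3.0, [(521.7, 10.0)]),
    (4.0, []),
]
area = peak_area(extracted_ion_chromatogram(scans, target_mz=521.7))
print(area)
```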
emPAI
(Exponentially Modified Protein Abundance Index)
$\mathrm{emPAI} = 10^{\mathrm{PAI}} - 1$
where $\mathrm{PAI} = N_{\mathrm{observed}} / N_{\mathrm{observable}}$
What is an ‘observable’ peptide?
•Peptides with a precursor mass between
800 and 2400 Da.
•There is a roughly linear relationship
between log protein concentration and the
ratio of observed to ‘observable’ peptides in the
range of 3-500 fmoles.
•If you know how much total protein you
analyzed you can derive absolute
abundances.
Ishihama et al. Mol Cell Proteomics (2005) 4 9 1265-1272
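The emPAI formula is short enough to compute directly. The sketch below also converts emPAI values to a molar percentage by dividing each by the sum over all proteins, as described in the Ishihama paper; the peptide counts and protein names are invented for illustration.

```python
def empai(n_observed, n_observable):
    """emPAI = 10**PAI - 1, with PAI = observed / observable peptides."""
    return 10 ** (n_observed / n_observable) - 1


def protein_content_mol_percent(empai_values):
    """Share of each protein in the mixture, as a percentage of summed emPAI."""
    total = sum(empai_values.values())
    return {protein: 100.0 * value / total for protein, value in empai_values.items()}


# Hypothetical (observed, observable) peptide counts for two proteins.
values = {"protein_A": empai(10, 10), "protein_B": empai(3, 10)}
content = protein_content_mol_percent(values)
print(values, content)
```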
MRM
(Multiple Reaction Monitoring)
Look for a component of a specific mass that, when fragmented, forms a
fragment of another specific mass.
Transition: precursor m/z 521.7, fragment m/z 757.6
•Very sensitive and specific.
MRM
•Best performed on a triple quadrupole instrument.
•Scans are very fast, so can perform multiple transition scans on a
chromatographic time-scale.
•Requires a lot of optimization:
Verify transitions are reproducible, typically want 2-3
transitions/peptide, 3-4 peptides/protein.
Determine the retention time to maximize the number of peptides
that can be analyzed per run.
It is possible to analyze 100s of transition per hour
•MRM coupled to isotopically labeled peptides allows for very high
sensitivity and high accuracy analysis and can give absolute quantification.
•Once optimized 1000s of samples can be run in a short time frame
•Not for discovery! You must already know what you are looking for;
sometimes referred to as targeted proteomics
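A transition is just a precursor/fragment m/z pair, so a targeted method can be sketched as a list of such pairs plus a matching rule. The 0.5 m/z tolerance is an assumed stand-in for quadrupole isolation width, and the peptide name is a placeholder; the m/z values are the example transition from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    peptide: str
    precursor_mz: float
    fragment_mz: float


def matches(transition, precursor_mz, fragment_mz, tolerance=0.5):
    """True if both the precursor and the fragment fall within the tolerance."""
    return (abs(precursor_mz - transition.precursor_mz) <= tolerance
            and abs(fragment_mz - transition.fragment_mz) <= tolerance)


# The peptide name is a placeholder; the m/z values come from the slide.
example = Transition(peptide="example_peptide", precursor_mz=521.7, fragment_mz=757.6)
print(matches(example, precursor_mz=521.9, fragment_mz=757.4))
print(matches(example, precursor_mz=521.7, fragment_mz=800.0))
```

Requiring both masses to match is what gives the method its specificity: an interfering ion must share the precursor mass and also produce the same fragment.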
Issues with MS Quantitation Analysis
•Should you use all data for quantitation?
•Minimum peak intensity?
•Peaks close to the noise level (low signal-to-noise) will have
much higher variability in quantitation
accuracy.
•Very intense peaks may be
saturated.
•Proteins identified by a single peptide are
probably not accurately quantified.
•It is best to ignore sequences with more
than one form: PTMs, missed cleavages,
etc.
•Multiple charge states should be summed.
Results are normally reported with a mean and standard deviation
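Two of the points above, summing charge states and reporting a mean with a standard deviation, can be sketched as below. Refusing to report proteins with fewer than two peptides is an assumed cut-off that reflects the single-peptide caution; all numbers are illustrative.

```python
from statistics import mean, stdev


def summed_over_charge_states(intensity_by_charge):
    """Total intensity of one peptide across its observed charge states."""
    return sum(intensity_by_charge.values())


def protein_ratio(peptide_ratios, min_peptides=2):
    """Mean and standard deviation of peptide ratios, or None if too few peptides."""
    if len(peptide_ratios) < min_peptides:
        return None
    return mean(peptide_ratios), stdev(peptide_ratios)


# Hypothetical values: one peptide seen at 2+ and 3+, and three peptide ratios.
total_intensity = summed_over_charge_states({2: 1500.0, 3: 500.0})
summary = protein_ratio([1.0, 2.0, 3.0])
single_peptide = protein_ratio([1.5])
print(total_intensity, summary, single_peptide)
```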
Conclusions
•There are many different ways to
quantitate proteomics data
•Quantitative studies need to be
approached carefully, because it is easy to
make mistakes
•No one strategy is best
•MRM is the most sensitive and
accurate, but requires the most
optimization and cannot be used for
discovery.