PowerPoint Poster Template
Download
Report
Transcript PowerPoint Poster Template
HMMatch: Peptide Identification by Spectral Matching of
US
Tandem Mass Spectra Using Hidden Markov Models HUPO
Xue
1
Wu ,
Chau-Wen
1
Tseng ,
Nathan J.
2
Edwards
1Department of Computer Science, University of Maryland, College Park
of Biochemistry and Mol. & Cell. Biology, Georgetown Medical Center
Introduction
HMMs Capture Consensus and Variation in MS/MS Spectra
m/z ratio
Hidden Markov Models (HMM)
D4
I0
I1
I2
I3
I4
Begin
b1
y1
b2
y2
•HMM constructed with states
representing ion peaks,
insertions, and deletions
•Spectra preprocessed via peak
normalization, noise filtering,
intensity scaling, discretization
End
•Trained w/ high confidence
spectra to recognize ion
peaks and correlations in
peak intensity patterns
•Outputs Viterbi score for
each spectra
HMMs Used For Protein Families
Protein Family
•Consensus &
variation found
in amino acid
sequence
•Sequence
logos capture
variation
•HMMs capture
both variation &
pattern of gaps
in sequence
•HMMs used by
Pfam to provide
high sensitivity
# spectra
D3
MS/MS Spectra
•Two example spectra for
peptide DLATVYVDVLK
•Different peaks present
•Different peak intensities
Intensity Box Plot
•Computed from multiple
spectra for peptide
•Frequency & intensities
of peaks vary
•Can capture using HMMs!
Peptide ID Case Studies
•Two spectra for peptide DLATVYVDVLK
•Scores for Tandem, Mascot, NIST MS Match
Tandem: 0.0011
NIST:
0.813
HMM: 7.35e-12
Mascot: 1.42
•Extra small peaks reduce search engine E-values
•NIST MS Search score degraded
Peptide Identification: HMM Viterbi Score & p-value
Viterbi Score
# spectra
D2
Peak Position
-log(p-value)
Viterbi Scores
•Compare spectra for
target peptide (train &
test) vs. non-target
peptides (other)
•Confidence based on
Tandem E-values
•Identifies test spectra
•Rejects other spectra
P-values
•HMMatch computes
statistical significance
using synthetic spectra
•Many spectra have low
Tandem E-values to target
peptide (test), non-target
peptide (other), or no
peptide (unknown)
•HMMatch can compute
good p-values for many
such spectra
Model Extrapolation To Nearby Related Peptides
Extrapolated HMM
D1
Intensity
•Spectral matching identifies peptides by comparing
spectra with libraries of identified spectra
•We present HMMatch, a new approach to peptide
identification that summarizes many MS/MS spectra
for a peptide using Hidden Markov Models (HMM)
•HMMs can capture both consensus and variation in
peak intensity found in spectra for a peptide
•As a result, HMMatch can confidently identify many
spectra for a peptide that other tools miss
Peak Occurrence Frequency
Intensity Variation
2Dept.
Trained HMM
Extrapolation
•Compute HMMs for
nearby protein sequence
•Compare p-values for
extrapolated HMM vs.
trained HMM
•Extrapolated HMM
scores correlate well
•Can find novel peptides
HMM: 9.87e-12
Tandem: 0.15
NIST:
0.994
Mascot: 7.52
•Peaks exceeding m/z tolerance significantly
reduced search engine E-values
Observations
•HMMatch confidently identified 3537 spectra
(p-value < 10-5) with low search engine
scores (Tandem & Mascot E-values > 0.05),
NIST MS Search identified 673 (score > 0.9)
•Accounting for intensity variation provides
additional useful information in practice
•HMMatch complements existing MS/MS
search engines & spectral matching methods
•Growing size of MS/MS libraries can improve
applicability & accuracy of HMMatch
•Reference: Wu et al., J of Computational
Biology, 14(8):1025-1043, 2007
Acknowledgments
This research was supported by
• NIH/NCI grant CA126189
• USDA cooperative agreement 5812757342