Data Processing Algorithms for Analysis of High
Resolution MSMS Spectra of Peptides with
Complex Patterns of Posttranslational
Modifications
Shenheng Guan and Alma L. Burlingame
Problem
Input: An MS/MS spectrum of a mixture of peptides:
Heavily modified protein
Same amino acid sequence
Same PTM
Same total number of PTMs
Different PTM configurations
Example
Two peptides with two methylations each.
LATK[+28]AARKSAE
LATK[+14]AARK[+14]SAE
Problem:
Identify the PTM configurations
Estimate their relative abundance
Work flow
Peptide identification
Input
A deisotoped MS/MS spectrum of a mixture of peptides
An identified peptide, the type of PTMs and the number of
PTMs.
Example
Peptide: LATKAARKSAPATGGVKKPHRYRPGTVALRE
PTM: Methylation
#PTM: 4
Problem
Identify the PTM configurations
Estimate their relative abundance
All possible configurations
Assumption:
All methylations are on lysine residues
Each lysine residue has at most 3 methyl groups.
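Under the two assumptions above, the candidate set can be enumerated directly. The following is a minimal sketch, not the authors' code; the function name `enumerate_configurations` and its parameters are hypothetical.

```python
from itertools import product

def enumerate_configurations(peptide, n_ptm, site="K", max_per_site=3):
    """List every way to place n_ptm methyl groups on the `site` residues.

    Each configuration maps a 0-based residue index to its methyl count.
    """
    sites = [i for i, aa in enumerate(peptide) if aa == site]
    configs = []
    # Try every per-site count from 0 to max_per_site and keep the
    # assignments whose counts add up to the required total.
    for counts in product(range(max_per_site + 1), repeat=len(sites)):
        if sum(counts) == n_ptm:
            configs.append(dict(zip(sites, counts)))
    return configs

configs = enumerate_configurations("LATKAARKSAPATGGVKKPHRYRPGTVALRE", 4)
print(len(configs))  # 31 configurations for 4 methyl groups on 4 lysines
```

For the example peptide (4 lysines, 4 methyl groups, at most 3 per lysine) this gives 31 candidate configurations.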
Configuration identification
Score of Spectrum-Configuration-Pair
Spectrum S: ETD peak list
Configuration C: theoretical peak list (c-ion)
Sc(S,C) is the number of peaks matched between the observed peak list and
the theoretical peak list.
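The score can be sketched as follows, assuming singly charged c-ions, monoisotopic masses, and a fixed m/z tolerance. The names `c_ion_mz` and `score`, the tolerance value, and the restriction to charge 1 are illustrative assumptions, not details taken from the slides.

```python
# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
METHYL = 14.01565                  # mass added per methyl group (CH2)
C_ION_OFFSET = 17.02655 + 1.00728  # NH3 plus one proton (charge 1 assumed)

def c_ion_mz(peptide, methyl_counts):
    """Singly charged c-ion m/z values c1..c(n-1) for one configuration.

    methyl_counts maps a 0-based residue index to its methyl count.
    """
    peaks, running = [], 0.0
    for i, aa in enumerate(peptide[:-1]):
        running += RESIDUE_MASS[aa] + METHYL * methyl_counts.get(i, 0)
        peaks.append(running + C_ION_OFFSET)
    return peaks

def score(observed, theoretical, tol=0.01):
    """Sc(S, C): number of observed peaks within tol of a theoretical peak."""
    return sum(any(abs(o - t) <= tol for t in theoretical) for o in observed)

dimethyl_one_site = c_ion_mz("LATKAARKSAE", {3: 2})
monomethyl_two_sites = c_ion_mz("LATKAARKSAE", {3: 1, 7: 1})
print(score(dimethyl_one_site, monomethyl_two_sites))
```

Only the c-ions between the two modified lysines (c4 to c7) differ between the two example configurations, so those are the peaks that discriminate between them.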
Greedy algorithm
Compute the matching score for each configuration
Remove the configuration with the highest score from the
configuration set, and remove the peaks in S that are matched to
that configuration
Repeat the above steps until all configurations have score 0
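The greedy loop above can be sketched as below. This is an illustration under assumed inputs: each candidate configuration is given as a label plus its theoretical peak list, matching uses a fixed tolerance, and ties are broken by insertion order. The function names are hypothetical.

```python
def matched_peaks(observed, theoretical, tol=0.01):
    """Observed peaks lying within tol of some theoretical peak."""
    return [o for o in observed if any(abs(o - t) <= tol for t in theoretical)]

def greedy_identify(observed, candidates, tol=0.01):
    """Pick configurations greedily by matched-peak count.

    candidates maps a configuration label to its theoretical peak list.
    Returns (label, score) pairs in the order they were selected.
    """
    peaks = list(observed)
    remaining = dict(candidates)
    selected = []
    while remaining:
        # Score every remaining configuration against the unexplained peaks.
        hits = {name: matched_peaks(peaks, theo, tol)
                for name, theo in remaining.items()}
        best = max(hits, key=lambda name: len(hits[name]))
        if not hits[best]:
            break  # every remaining configuration scores 0
        selected.append((best, len(hits[best])))
        # Remove the peaks explained by the chosen configuration.
        explained = set(hits[best])
        peaks = [p for p in peaks if p not in explained]
        del remaining[best]
    return selected

toy = {"A": [100.0, 200.0, 300.0], "B": [300.0, 400.0], "C": [100.0]}
print(greedy_identify([100.0, 200.0, 300.0, 400.0], toy))
# [('A', 3), ('B', 1)]
```

In the toy run, configuration C is never reported because its only peak is already explained by A.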
Configuration identification results
Estimation of relative abundance
We have four identified configurations C1, C2, C3, C4.
x1, x2, x3, x4: their relative abundances, which sum to 1
Consider the ith c-ion with charge z
Five possible peaks p0, …, p4
Suppose p2 is matched to C1, C2
Observed peak intensity I(p2)
Theoretical peak intensity: $(x_1 + x_2) \sum_{0 \le j \le 4} I(p_j)$
Compute the observed and theoretical peak
intensity pair for each matched c-ion
Estimation of relative abundance
Find x1, x2, x3, x4 such that the sum of the squared errors
of these intensity pairs is minimized.
Solved with a standard non-negative least-squares procedure
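A minimal sketch of the fit, under stated assumptions: each matched peak contributes one (matched configurations, observed intensity, total intensity of its c-ion peak group) triple, and the model predicts the observed intensity as the summed abundances of the matched configurations times the group total. To stay dependency-free, the sketch uses projected gradient descent as a stand-in for a standard NNLS solver (for example the Lawson-Hanson algorithm); the function name and input layout are hypothetical.

```python
def estimate_abundances(pairs, n_configs, iterations=5000):
    """Non-negative least-squares fit of relative abundances.

    pairs: list of (matched_config_indices, observed_intensity, group_total).
    Returns abundances normalised to sum to 1.
    """
    # One row per matched peak: group_total where the configuration matches.
    rows = [[total if c in matched else 0.0 for c in range(n_configs)]
            for matched, _, total in pairs]
    obs = [observed for _, observed, _ in pairs]
    # Step size 1/L, where L is the squared Frobenius norm of the matrix,
    # an upper bound on the largest eigenvalue of A^T A.
    lipschitz = sum(v * v for row in rows for v in row) or 1.0
    x = [1.0 / n_configs] * n_configs
    for _ in range(iterations):
        residual = [sum(a * xi for a, xi in zip(row, x)) - b
                    for row, b in zip(rows, obs)]
        # Gradient of half the summed squared error.
        grad = [sum(row[c] * r for row, r in zip(rows, residual))
                for c in range(n_configs)]
        # Gradient step, then projection onto the non-negative orthant.
        x = [max(0.0, xi - g / lipschitz) for xi, g in zip(x, grad)]
    total = sum(x)
    return [xi / total for xi in x] if total > 0 else x

pairs = [({0}, 750.0, 1000.0), ({1}, 250.0, 1000.0), ({0, 1}, 1000.0, 1000.0)]
print(estimate_abundances(pairs, 2))
```

With the toy intensities, the estimate converges to abundances of about 0.75 and 0.25.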
A Novel Approach for Untargeted Posttranslational Modification Identification Using
Integer Linear Optimization and Tandem Mass
Spectrometry
Richard C. Baliban, Peter A. DiMaggio, Mariana D.
Plazas-Mayorca, Nicolas L. Young, Benjamin A. Garcia
and Christodoulos A. Floudas
Bottom up PTM identification
Two approaches
Tags
Non-tags
Restricted
Unrestricted
PILOT_PTM
Preprocessing
Remove all peaks related to the precursor ion
Only keep locally significant peaks
Deisotope
Remove the neutral offset if the peak does not
have a complementary peak.
Each candidate peak has a list of
supporting peaks.
ILP Model
Input
A preprocessed deisotoped spectrum S = {a1, a2, …, am}
A peptide (theoretical b-ion peak list) P = {b1, b2, …, bn}
A list of all known PTMs
For each theoretical peak bk
CSk is the set of all possible peaks (indices) in S that bk can be
matched to with PTMs
For each real peak aj
Posj is the set of all possible peaks (indices) in P that aj can be
matched to with PTMs
Supportj is the set of all peaks (indices) supporting peak j in S
Multj is the set of all peaks (indices) that peak j supports
ILP Model
Binary variables
pj,k = 1 if peak aj in S is matched to bk in P,
otherwise pj,k = 0
yj = 1 if peak aj is a supporting peak or a
matched peak, otherwise yj = 0
ILP Model
Objective
Subject to
One peak in P can only match one peak in S
One peak in S can only match one peak in P
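The formulas on this slide did not survive in the transcript. A hedged reconstruction of the two matching constraints, using the sets CSk and Posj and the binary variables pj,k and yj defined earlier, could read as follows. The objective shown is an assumption: it takes the model to maximize the total intensity of matched and supporting peaks, with λj (a symbol introduced here) denoting the intensity of peak aj.

```latex
\begin{align*}
\max \quad & \sum_{j=1}^{m} \lambda_j \, y_j
  && \text{(assumed objective)} \\
\text{s.t.} \quad
  & \sum_{j \in CS_k} p_{j,k} \le 1 \quad \forall k
  && \text{(one peak in $P$ matches at most one peak in $S$)} \\
  & \sum_{k \in Pos_j} p_{j,k} \le 1 \quad \forall j
  && \text{(one peak in $S$ matches at most one peak in $P$)} \\
  & p_{j,k} \in \{0,1\}, \quad y_j \in \{0,1\}
\end{align*}
```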
ILP Model
Subject to:
No three consecutive missing peaks
The intensity of peak i is counted iff there exists a peak j
such that peak i supports j and peak j is a matched peak.
ILP Model
Solve using CPLEX
Report the top-10 variable assignments
Existing problem
No constraints require that the distance between two
neighboring matched peaks match the mass of a residue
(with PTM)
New constraints
For each pj,k
The set of candidate ion peaks j' (with respect to k') such that
no valid jump exists between j and j'
The maximum and minimum masses that can be reached from j
New constraints
Neighboring matched peaks do not conflict
Conflicting matched peaks must have a matched peak between them
The distance between two matched peaks should be bounded
Postprocessing
Re-scoring the 10 candidate modified peptides
Cross-correlation score
Recheck modifications if there are
unmatched peaks indicating nonmodification
Test data sets
Test set A: 44 CID spectra (Ion trap), 174 ETD spectra (Orbitrap) of
chemically synthesized phosphopeptides, manually validated
Test set B: 58 ECD spectra (FTICR) of Histone H3-(1–50) N-terminal Tail,
manually validated
Test set C: 553 CID spectra (Orbitrap) of Propionylated Histone Fragments,
manually validated
Test set D: 525 modified and 6025 unmodified CID spectra (Orbitrap) from a
chromatin fraction. Identified by SEQUEST and validated by MASCOT, with
low-quality spectra removed manually
Test set E: 36 (Ion trap), 37 (Q-TOF), and 4061 (Orbitrap) unmodified CID
spectra. Validated in the same way as test set D
Residue prediction accuracy
Peptide prediction accuracy
Comparison on test sets C and D1
Peptide and residue prediction accuracy
Comparison on test sets C and D1
Subsequence prediction accuracy
Running time
Q&A