Neural Network Predictor for Peptide Fragmentation Spectra
Narmada Jayasankar
Advisor: Dr. Predrag Radivojac
Collaborators: Dr. Haixu Tang, Dr. Randy Arnold, Divya Aggarwal
Abstract
Accurate peptide identification from tandem mass spectrometry experiments is the
cornerstone of proteomics. Although various approaches for matching database
sequences with experimental spectra have been developed to date (e.g., Sequest,
Mascot), the sensitivity and specificity of peptide identification have not yet reached
their full potential. This is in part due to the tradeoffs between robustness and
accuracy of the existing methods with respect to the non-uniform nature of peptide
fragmentation and bond cleavages induced by different mass spectrometers. To
address this problem, we used a data-driven approach to learn peptide
fragmentation rules in mass spectrometry, training a separate ensemble of neural
networks for each ion type. We stress that our method is not restricted to any
particular mass spectrometry instrument: as long as a large set of annotated spectra
is available, it can be applied to any proteomics platform.
Procedure Outline
Data Preparation
• Proteins were digested into peptides with trypsin, and the tryptic
peptides were isolated.
• Experimental spectra were obtained by electrospraying the peptides into
the mass spectrometer.
Feature Extraction
• 202 features were identified for each ion type with expert guidance.
• The experimental spectra were processed and the features were
extracted.
Prediction Accuracies
Sn = sensitivity, Sp = specificity, Acc = accuracy, AUC = area under the ROC curve
                  Doubly Charged Peptides     Triply Charged Peptides
Ion Type          Sn     Sp     Acc    AUC    Sn     Sp     Acc    AUC
Precursor – H2O   72.0   60.8   66.4   70.7   81.3   68.5   74.9   79.7
b                 80.4   75.4   77.9   85.8   80.6   71.9   76.3   84.6
b – H2O           76.8   76.3   76.5   84.6   76.2   60.2   68.2   76.8
b – NH3           75.8   76.0   75.9   82.8   76.9   65.0   70.9   78.6
b – H2O – NH3     69.1   64.6   66.8   73.1   81.8   51.9   66.9   68.1
b2                -      -      -      -      88.4   75.8   82.1   88.5
y                 84.7   79.3   82.0   89.5   88.9   79.1   84.0   91.4
y – H2O           66.4   66.2   66.3   72.2   82.6   56.5   69.6   73.0
y – NH3           70.3   70.8   70.6   79.0   81.2   59.8   70.5   77.8
y – H2O – NH3     60.7   51.1   55.9   56.5   83.2   54.3   68.7   69.6
y2                -      -      -      -      87.9   72.6   80.2   86.8
[Figure: ROC Curves]
So what are….
Proteins?
A protein (from the Greek "protos", meaning "of primary
importance") is a complex, high-molecular-weight
organic compound that consists of amino acids joined by
peptide bonds. Life, chemically speaking, is nothing but
the function of proteins, although the information to make
a unique protein resides in DNA.
[Figure: 3D View of a Protein]
Feature Space Compression
• Principal component analysis was applied to
remove correlated features.
• A t-test feature selection filter was
employed to filter out unpromising
features.
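The t-test filter in the second bullet can be sketched as follows. This is a minimal illustration under stated assumptions, not the poster's implementation: the poster does not say which t-test variant or cutoff was used, so Welch's unequal-variance statistic, the `threshold` value, and the names `welch_t` and `select_features` are all hypothetical. The principal component step is not shown.

```python
import math
from statistics import mean, variance


def welch_t(pos, neg):
    """Welch's t statistic for two samples (assumed variant; not from the poster)."""
    diff = mean(pos) - mean(neg)
    se = math.sqrt(variance(pos) / len(pos) + variance(neg) / len(neg))
    if se == 0.0:
        # Both samples are constant: zero if the means match, infinite otherwise.
        return 0.0 if diff == 0.0 else math.copysign(math.inf, diff)
    return diff / se


def select_features(rows, labels, threshold=2.0):
    """Return indices of features whose |t| between the two classes reaches the threshold."""
    kept = []
    for j in range(len(rows[0])):
        pos = [row[j] for row, y in zip(rows, labels) if y == 1]
        neg = [row[j] for row, y in zip(rows, labels) if y == 0]
        if abs(welch_t(pos, neg)) >= threshold:
            kept.append(j)
    return kept


# Toy data: feature 0 separates the classes, feature 1 does not.
rows = [[1.0, 5.0], [1.2, 4.0], [0.9, 6.0], [3.0, 5.5], [3.1, 4.5], [2.9, 5.0]]
labels = [0, 0, 0, 1, 1, 1]  # 1 = ion present, 0 = ion absent
selected = select_features(rows, labels)
```

On the toy data only feature 0 survives the filter, because the class means of feature 1 are identical.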
Peptides?
Peptides (from the Greek πεπτος, "digestible") are small
protein fragments, typically up to 50 amino acids in
length.
Mass Spectrometer?
A mass spectrometer is a device used for mass
spectrometry; it produces a mass spectrum of a sample
to determine its composition. This is normally achieved by
ionizing the sample, separating ions of differing masses,
and recording their relative abundance by measuring
the intensity of the ion flux. A typical mass spectrometer
comprises three parts: an ion source, a mass analyzer, and a
detector.
[Figure: Sample Results]
NN Model Training
• An ensemble of 30 neural networks
was trained for each ion type.
• An algorithm was developed to dynamically select the
number of hidden neurons for each NN
model.
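One common way to use an ensemble at prediction time is to average the outputs of its members. The sketch below assumes that combination rule; the poster does not state how the 30 networks were combined. The function `ensemble_probability`, the stand-in members, and the 0.5 decision threshold are hypothetical, and the hidden-neuron selection algorithm is not shown because the poster gives no details of it.

```python
def ensemble_probability(members, features):
    """Average the ion-presence probabilities predicted by all ensemble members."""
    predictions = [member(features) for member in members]
    return sum(predictions) / len(predictions)


# Stand-ins for trained networks: each maps a feature vector to a probability.
# A real ensemble for one ion type would hold 30 trained models.
members = [lambda x: 0.25, lambda x: 0.75, lambda x: 0.875]

features = [0.0] * 202  # one value per extracted feature
probability = ensemble_probability(members, features)
ion_predicted = probability >= 0.5  # hypothetical decision threshold
```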
[Figure: Feed Forward Network]
Peptide Fragmentation Ions?
Performance Evaluation
• Model performance was measured as the sensitivity,
specificity, and accuracy of the
trained models on unseen data.
• Ten-fold cross-validation was used.
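The evaluation steps above can be sketched as follows, assuming binary labels (1 = ion present, 0 = ion absent). The function names, the toy data, the 0.5 threshold, and the random shuffling of samples are assumptions. Accuracy is computed here as the mean of sensitivity and specificity, which is consistent with the Acc values reported on the poster but is not stated there explicitly.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists so that each sample is held out exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for labels coded 1 (present) and 0 (absent)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """Area under the ROC curve: fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy predictions on held-out data.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

sn, sp = sensitivity_specificity(y_true, y_pred)
acc = (sn + sp) / 2  # mean of Sn and Sp (assumed definition)
area = auc(y_true, scores)

# Ten train/test splits over 25 samples.
splits = list(k_fold_indices(25, k=10))
```

In a full run, a model would be trained on each `train` split and scored on the matching `test` split, with the metrics then averaged over the ten folds.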
Peptide Sequence: HRDTGILDSIGRZ
Peptide Sequence: HVLSGTLGCPEHTYR
Neural Networks?
A neural network is an interconnected group of nodes, akin to
the vast network of neurons in the human brain. Neural
networks, with their remarkable ability to derive meaning
from complicated or imprecise data, can be used to extract
patterns and detect trends that are too complex to be noticed
by either humans or other computer techniques. They can
then be used to provide projections given new situations of
interest and answer "what if" questions.
[Figure: Single Processing Unit]
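A single processing unit and a small feed-forward network can be sketched as follows. The sigmoid activation, the single hidden layer, the toy layer sizes, and the random untrained weights are assumptions for illustration; the poster does not specify the architecture used.

```python
import math
import random


def sigmoid(z):
    """Squash a real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def unit(inputs, weights, bias):
    """Single processing unit: weighted sum of the inputs passed through a sigmoid."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)


def feed_forward(inputs, hidden_layer, output_unit):
    """One hidden layer of units feeding a single output unit."""
    hidden = [unit(inputs, weights, bias) for weights, bias in hidden_layer]
    out_weights, out_bias = output_unit
    return unit(hidden, out_weights, out_bias)


# Toy network with random, untrained weights.
rng = random.Random(0)
n_inputs, n_hidden = 4, 3
hidden_layer = [
    ([rng.uniform(-1, 1) for _ in range(n_inputs)], rng.uniform(-1, 1))
    for _ in range(n_hidden)
]
output_unit = ([rng.uniform(-1, 1) for _ in range(n_hidden)], rng.uniform(-1, 1))

probability = feed_forward([0.5, -1.0, 0.0, 2.0], hidden_layer, output_unit)
```

Because the output unit is a sigmoid, the result always lies strictly between 0 and 1 and can be read as a probability that an ion is present.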
Future Work
• The current predictor outputs the probability of the presence or absence of an
ion. We would like to extend this and build a regression model in which the
predictor predicts the true intensities of the ion peaks.
• We are also planning to train a single neural network model that predicts all
the ions.
Publication
R. J. Arnold, N. Jayasankar, D. Aggarwal, H. Tang, P. Radivojac, “A Machine
Learning Approach to Predicting Peptide Fragmentation Spectra”, Pacific
Symposium on Biocomputing, PSB 2006.
Disclaimer: The layout and design of the poster are not completely original, and some of the figures in the poster have been borrowed from the internet.