Kiran - Bioinformatics at School of Informatics, Indiana University


ProReP - Protein Results Parser v3.0©
A Tool for Handling Tandem Mass Spectrometer Protein Database Search Results
Capstone Presentation
Kiran Annaiah (M.S. Bioinformatics)
Advisors
Dr. Randy Arnold
Dr. Haixu Tang
Outline
 Background
 Data generation from Mass Spec Experiment
 Mascot Search Engine
 Why parse Mascot results?
 Parser features
 Results
 Conclusions
 Acknowledgements
Background
 High-throughput “shotgun” Proteomics
• Identify, characterize and quantify all expressed proteins simultaneously in a mixture.
 Mass Spectrometry
• Peptide mass fingerprinting
• Collision-induced dissociation (CID) spectra from MS/MS analysis
 LC/MS/MS approach used to identify protein components in a complex mixture
 Tandem mass spectra help infer the amino acid sequences of peptides
Peptide Mass Fingerprinting vs. MS/MS protein identification
James S. Eddes et al., 2002, Proteomics
Database Searching
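[Figure: the peptide L M G S E P I K drawn from its NH2 terminus to its CO2 terminus, with the b-ion series (b1 to b7) and the y-ion series (y7 to y1) marked along the backbone above an m/z axis. The spectrum is passed to database searching software (MASCOT®), which searches a sequence database (SwissProt) and reports the proteins found.]

Protein names shown in the figure: Hemoglobin, beta chain; Actin; Amylase

Database sequences shown in the figure:
MYTCVPIASEQUENCEMIMEWTPQSDLI
RPTVCIMNERCVGGPYILCMTEND
DSLIKRNYTIPMCSQIRECNHIPLMTRCH
GYYKWSIALAINTQSFGIVRIVAMNKLPS
SCRTIVGHWEDRICTMQNCISPPEKELIA
VARGTSP

Results

Pept.  Mass     Score  Sequence
1      738.84   41     HLDNLK
2      912.01   61     VHLTDAEK
3      915.06   56     AAVNGLWGK
4      1090.24  41     VINAFNDGLK
5      1122.33  62     VVAGVASALAHK
6      1218.42  70     LVINAFNDGLK
…      …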
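The b and y fragment series for the example peptide L M G S E P I K can be worked out from residue masses. The following is a minimal Python sketch, not code from ProReP or Mascot: it assumes singly charged fragments and standard monoisotopic residue masses, and the names `RESIDUE_MASS` and `fragment_ions` are invented for illustration.

```python
# Monoisotopic residue masses (Da) for the residues in the example peptide.
RESIDUE_MASS = {
    "L": 113.08406, "M": 131.04049, "G": 57.02146, "S": 87.03203,
    "E": 129.04259, "P": 97.05276, "I": 113.08406, "K": 128.09496,
}
PROTON = 1.007276   # mass of the charge-carrying proton
WATER = 18.010565   # H2O retained by y ions


def fragment_ions(peptide):
    """Return (b_ions, y_ions) as lists of singly charged m/z values.

    b_i holds the first i residues (N-terminal side);
    y_i holds the last i residues (C-terminal side) plus water.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(masses)):
        b_ions.append(sum(masses[:i]) + PROTON)
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)
    return b_ions, y_ions


b, y = fragment_ions("LMGSEPIK")
for i, (b_mz, y_mz) in enumerate(zip(b, y), start=1):
    print(f"b{i} {b_mz:9.4f}   y{i} {y_mz:9.4f}")
```

An eight-residue peptide gives seven b ions and seven y ions, matching the b1 to b7 and y1 to y7 labels in the figure.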
Mascot Search Engine
 Uses mass spectrometry data to identify proteins from primary sequence databases
 MS/MS ion search
– Enzyme cleavage rules are applied to sequences in the protein databases
– Experimental mass values are compared with calculated fragment ion mass values
– A scoring algorithm identifies the closest match or matches
– Probability-based MOWSE scoring algorithm
 Databases
– MSDB – non-identical protein sequence DB
– NCBInr
– SwissProt
– dbEST – “single-pass” cDNA sequences or ESTs
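The enzyme cleavage step above can be illustrated with an in-silico tryptic digest. This Python sketch is an assumption-laden illustration, not Mascot's implementation: it uses the common rule that trypsin cleaves after K or R unless the next residue is P, the function name `tryptic_digest` and its `missed_cleavages` parameter are hypothetical, and it does not attempt the mass comparison or MOWSE scoring. The input is one of the placeholder database sequences from the Database Searching slide.

```python
def tryptic_digest(sequence, missed_cleavages=0):
    """Split a protein sequence into tryptic peptides.

    Cleaves after K or R, except when the next residue is P.
    With missed_cleavages > 0, peptides spanning up to that many
    uncleaved sites are also returned.
    """
    # Record every position where the chain is cut, plus both ends.
    sites = [0]
    for i, residue in enumerate(sequence[:-1]):
        if residue in "KR" and sequence[i + 1] != "P":
            sites.append(i + 1)
    sites.append(len(sequence))

    peptides = []
    for start in range(len(sites) - 1):
        for extra in range(missed_cleavages + 1):
            end = start + 1 + extra
            if end < len(sites):
                peptides.append(sequence[sites[start]:sites[end]])
    return peptides


for peptide in tryptic_digest("DSLIKRNYTIPMCSQIRECNHIPLMTRCH"):
    print(peptide)
```

In a search engine, the calculated masses of peptides like these and of their fragments are what get compared against the experimental values.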
A Typical Experiment
Analysis of Liver / Brain Tissue
Digest with Trypsin
Liquid Chromatography
LC eluting sample electrosprayed into Mass Spec
MS-MS on intense peak of a parent ion
Raw data converted to a DTA file
Mascot Search
Generates HTML file

[Figure: base peak chromatogram, CH_whole_RG_071503_V06, RT: 0.00 - 169.99, NL: 4.27E9, F: + c NSI Full ms [250.00-1500.00]; x-axis Time (min), y-axis Relative Abundance]

[Figure: mass spectrum, CH_whole_RG_071503_V06 #3291, RT: 72.31, AV: 1, NL: 9.71E8, F: + c NSI Full ms [250.00-1500.00]; x-axis m/z (300 to 1500), y-axis Relative Abundance; annotated peptide APAAIGAYSQAVLVDR from 14.5 kDa translational inhibitor protein]
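The DTA step can be made concrete with a small reader. This Python sketch assumes the usual SEQUEST-style DTA text layout, in which the first line holds the precursor (M+H)+ mass and charge and each later line holds a fragment m/z and intensity. The function name `parse_dta` and the example values are hypothetical and are not data from this experiment.

```python
def parse_dta(text):
    """Parse DTA-style text into (precursor MH+, charge, peak list).

    Assumed layout: line 1 is "<MH+ mass> <charge>", and every
    following non-empty line is "<m/z> <intensity>".
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if not rows:
        raise ValueError("empty DTA text")
    precursor_mh = float(rows[0][0])
    charge = int(rows[0][1])
    peaks = [(float(row[0]), float(row[1])) for row in rows[1:]]
    return precursor_mh, charge, peaks


# Made-up content for illustration only.
example = """1000.50 2
200.1 15.0
350.2 100.0
500.3 42.5
"""

mh, z, peaks = parse_dta(example)
base_peak = max(peaks, key=lambda peak: peak[1])
print(mh, z, len(peaks), base_peak)
```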
Mascot output – HTML file (avg. size 5 MB)
Motivation
 Mass spectrometry generates an enormous amount of data
 On average, Mascot returns hundreds of proteins matching the mass spectral data
 Analyzing Mascot results manually is time-consuming
 Need different ways of looking at data
 Comparison of various data sets (experiments)
 No tools were available in the public domain to analyze Mascot results
Protein Results Parser v3.0
Features
 Single-file parsing
 Sequence coverage (with single-file parsing)
 Two-file comparison
 Multiple files
– Compare
– Combine
 Tool was developed using Perl/Tk
 Windows application
Single File Parsing
Screened HTML Result (smaller file size)
Sequence Coverage
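Sequence coverage is commonly reported as the percentage of a protein's residues that fall inside at least one matched peptide. The Python sketch below shows that calculation under that assumption. It is not ProReP's Perl/Tk code, the function name `sequence_coverage` is invented, and the protein string is a hypothetical sequence assembled so that it contains peptides from the Results table.

```python
def sequence_coverage(protein, peptides):
    """Return the percentage of protein residues covered by the peptides."""
    if not protein:
        return 0.0
    covered = [False] * len(protein)
    for peptide in peptides:
        # Mark every occurrence of the peptide, including overlapping ones.
        start = protein.find(peptide)
        while start != -1:
            for i in range(start, start + len(peptide)):
                covered[i] = True
            start = protein.find(peptide, start + 1)
    return 100.0 * sum(covered) / len(protein)


# Hypothetical 25-residue protein; the peptides come from the Results table.
protein = "MKLVINAFNDGLKAAVNGLWGKTTT"
peptides = ["VINAFNDGLK", "LVINAFNDGLK", "AAVNGLWGK"]
print(f"{sequence_coverage(protein, peptides):.1f}% covered")
```

Overlapping peptides such as VINAFNDGLK and LVINAFNDGLK count each residue only once, because coverage is tracked per position.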
Two file Comparison
Results – Comparison of Two Experiments
Combine and Compare Feature
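[Diagram: two treatments (Drug A and Drug B). Each protein digest is separated into fractions (SCX) and analyzed in triplicate (LC/MS/MS), giving 15 data files per treatment. The 15 files for each treatment are combined, and the two combined results are then compared.]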
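The combine-then-compare idea can be sketched with set operations. This Python example is an illustration under stated assumptions, not the ProReP implementation: each parsed result file is modeled as a dictionary from protein name to a set of peptide identifiers, the functions `combine` and `compare` are hypothetical names, and the peptide identifiers and protein assignments are placeholders.

```python
def combine(results):
    """Merge several result files into one protein -> peptides mapping."""
    merged = {}
    for result in results:
        for protein, peptides in result.items():
            merged.setdefault(protein, set()).update(peptides)
    return merged


def compare(first, second):
    """Return proteins shared by both results and unique to each."""
    shared = sorted(set(first) & set(second))
    only_first = sorted(set(first) - set(second))
    only_second = sorted(set(second) - set(first))
    return shared, only_first, only_second


# Placeholder data standing in for parsed result files.
drug_a_files = [
    {"Hemoglobin, beta chain": {"PEP1", "PEP2"}, "Actin": {"PEP3"}},
    {"Hemoglobin, beta chain": {"PEP2", "PEP4"}},
]
drug_b_files = [
    {"Actin": {"PEP3"}},
    {"Amylase": {"PEP5"}},
]

combined_a = combine(drug_a_files)
combined_b = combine(drug_b_files)
shared, only_a, only_b = compare(combined_a, combined_b)
print("shared:", shared)
print("only Drug A:", only_a)
print("only Drug B:", only_b)
```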
Multiple File Comparison
Results – Multiple file comparison (sequential display)
Results – Multiple file comparison (tabular display)
Combine – Merging of multiple experiments
Results – combining multiple experiments
Conclusion
 Decreased data analysis and processing time.
 Search results are reduced automatically using user-specified criteria.
 Removal of low-scoring peptide matches greatly improves the accuracy of data interpretation.
 A single result file can be processed multiple times, using a different set of parsing criteria each time, without the need to repeat the database search.
 The ability to compare two or more result files in an automated fashion makes determination of sample similarity a nearly effortless endeavor.
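Score-based screening of one result set under different criteria can be sketched as follows. This Python example is illustrative only and is not the ProReP code: the rows are the six peptide matches from the Results table on the Database Searching slide, while the name `filter_matches` and the thresholds 50 and 65 are arbitrary assumptions.

```python
from collections import namedtuple

PeptideMatch = namedtuple("PeptideMatch", "number mass score sequence")

# Peptide matches as listed in the Results table.
MATCHES = [
    PeptideMatch(1, 738.84, 41, "HLDNLK"),
    PeptideMatch(2, 912.01, 61, "VHLTDAEK"),
    PeptideMatch(3, 915.06, 56, "AAVNGLWGK"),
    PeptideMatch(4, 1090.24, 41, "VINAFNDGLK"),
    PeptideMatch(5, 1122.33, 62, "VVAGVASALAHK"),
    PeptideMatch(6, 1218.42, 70, "LVINAFNDGLK"),
]


def filter_matches(matches, min_score):
    """Keep only the peptide matches scoring at least min_score."""
    return [match for match in matches if match.score >= min_score]


# The same parsed results screened twice, without a new database search.
lenient = filter_matches(MATCHES, min_score=50)
strict = filter_matches(MATCHES, min_score=65)
print([match.sequence for match in lenient])
print([match.sequence for match in strict])
```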
Acknowledgements
• Dr. Randy Arnold – Manager and Research Scientist
(Proteomics Research and Development Facility – Dept. of Chemistry)
• Dr. Haixu Tang – Asst. Prof, School of Informatics
• Abhijit Mahabal – Grad student, CS Dept.
• Kranthi Varala – Grad Student, Bioinformatics