A Data Model for Annotating the Peaks of Mass Spectrum Data
Dennis Harman[1], David Sigfredo Angulo[2], Kevin Drew[3], Alex Schilling[4]
[1] DePaul University, CTI. [email protected]
[2] DePaul University, CTI. [email protected]
[3] DePaul University, [email protected] (to whom correspondence is to be addressed)
[4] University of Illinois at Chicago. [email protected]
Mass Spectrometry Data
The advent of mass spectrometry (MS) for the characterization of proteins has revolutionized proteomics. MS is a technology used to study gas-phase ions by mass analysis, i.e., the separation of ions by their m/z (mass-to-charge) values. Although mass spectrometers differ in their sensitivity and in the technology they use to calculate m/z, all of them generate mass spectra as output. The raw data generated by a mass spectrometer must be refined before it becomes informative. This refinement involves interpreting the mass spectrum, which requires knowledge of the protonated peptide structure and of the way a peptide fragments [1]. The terminology conventionally used to describe MS ions encapsulates this information, and descriptive information using such terminology can be added to a mass spectrum as metadata.
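As a rough illustration of how ion terminology relates to m/z values, the sketch below computes the m/z of b- and y-type fragment ions from standard monoisotopic residue masses. The function names, the reduced residue table, and the example peptide are assumptions made for this illustration and are not part of the data model described here.

```python
# Standard monoisotopic residue masses (Da) for a few amino acids only;
# a real tool would carry the full table and any modifications.
RESIDUE_MASS = {
    "G": 57.02146,
    "A": 71.03711,
    "S": 87.03203,
    "P": 97.05276,
    "V": 99.06841,
    "L": 113.08406,
    "K": 128.09496,
}
WATER = 18.01056   # monoisotopic mass of H2O
PROTON = 1.00728   # mass of a proton


def _check_index(peptide: str, n: int) -> None:
    # Guard against n = 0, which would otherwise slice the whole peptide.
    if not 1 <= n <= len(peptide):
        raise ValueError("fragment index must be between 1 and the peptide length")


def b_ion_mz(peptide: str, n: int, charge: int = 1) -> float:
    """m/z of the b_n ion: the first n residues (N-terminal fragment)."""
    _check_index(peptide, n)
    neutral = sum(RESIDUE_MASS[aa] for aa in peptide[:n])
    return (neutral + charge * PROTON) / charge


def y_ion_mz(peptide: str, n: int, charge: int = 1) -> float:
    """m/z of the y_n ion: the last n residues plus water (C-terminal fragment)."""
    _check_index(peptide, n)
    neutral = sum(RESIDUE_MASS[aa] for aa in peptide[-n:]) + WATER
    return (neutral + charge * PROTON) / charge
```

For the hypothetical peptide GASPVLK, `y_ion_mz("GASPVLK", 3)` gives the m/z at which a singly charged "y3" peak would be expected, which is the kind of label an annotation would attach to an observed peak.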
Summary
MS protein sequencing techniques provide information that allows researchers to determine the amino acid sequence of peptides. A data model is described that allows for the annotation of mass peak data produced by a range of mass spectrometry experiments directed at peptide sequencing, characterization of post-translational modifications, and metabolite profiling. The model includes comprehensive sets of commonly used ion fragmentation terminology, while also giving the user flexibility through user-defined text fields. The next phase of work will involve completing a physical data design of the model that integrates into the current IBG-MSP database schema and implementing memory-resident data structures to represent annotations.
[Figure: Entity-Relationship Model]
[Figure: Components and Data Flows in the Annotation Process]
[Figure: Process Workflow for a Mass Spectrometry-based Proteomics Experiment]
Analysis of a peptide mass spectrum provides information about the amino acid sequence of the peptide contained in the spectrum. Ions are displayed as spectral peaks, which can be annotated, e.g., a mass peak may be annotated as a “y3” ion. Data requirements for our annotations data model were determined by identifying possible ion types and focusing on use case scenarios to identify other forms of data that a researcher using MS may want to include as annotations to a mass spectrum. In addition to annotations useful for peptide sequencing experiments, the model also allows for annotating experimental data concerned with the characterization of post-translational modifications of proteins, and the identification of metabolites (intermediate molecules resulting from chemical reactions occurring within biological systems).
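One way to picture a single annotated peak in memory is sketched below. This is only an illustration under stated assumptions: the class name `PeakAnnotation`, its field names, and the label parser are hypothetical and are not taken from the model itself; the parser recognizes only the conventional a, b, c, x, y, and z series letters followed by a position number.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class PeakAnnotation:
    """Hypothetical in-memory record for one annotated mass peak."""
    mz: float
    intensity: float
    ion_label: Optional[str] = None  # conventional ion label, e.g. "y3"
    comment: str = ""                # free-text, user-defined field

    def ion_series(self) -> Optional[Tuple[str, int]]:
        """Split a label such as "y3" into ("y", 3).

        Returns None when no label is set or the label is not a plain
        series letter followed by a position number.
        """
        if self.ion_label is None:
            return None
        match = re.fullmatch(r"([abcxyz])(\d+)", self.ion_label)
        if match is None:
            return None
        return match.group(1), int(match.group(2))
```

Labels that do not fit this simple pattern, such as a note about a suspected modification or metabolite, could be carried in the free-text `comment` field instead.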
The annotations data model is mapped to an XML schema that will extend mzXML, an open, XML-based representation for MS data [2]. The XML model will support data exchange between the relational Illinois Bio-Grid Mass Spectrum database (IBG-MSD) [3] and other tools. One such tool is the IBG Desktop, an open-source tool which can be used to edit and display annotations [4].
*R. Aebersold and M. Mann, "Mass spectrometry-based proteomics", Nature, vol. 422, 2003 Mar, pp. 198-207.
Excerpt from XML Schema
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
  <!-- Root container: zero or more peak annotations -->
  <xs:element name="peakAnnotations">
    <xs:complexType>
      <xs:sequence>
        <xs:element minOccurs="0" maxOccurs="unbounded" ref="annotation"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <!-- One annotated peak, with an optional ion series type -->
  <xs:element name="annotation">
    <xs:complexType>
      <xs:sequence>
        <xs:element minOccurs="0" ref="ionSeriesType"/>
      </xs:sequence>
      <xs:attribute name="id" use="required"/>
      <xs:attribute name="mz" use="required"/>
      <xs:attribute name="intensity" use="required"/>
      <xs:attribute name="comment"/>
    </xs:complexType>
  </xs:element>
  <!-- Excerpt ends here; ionSeriesType and the remaining declarations are not shown -->
</xs:schema>
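To make the excerpt more concrete, the following sketch builds a small instance document using only the element and attribute names that appear in it (peakAnnotations, annotation, id, mz, intensity, comment). The helper name `build_peak_annotations` and the sample peak values are hypothetical, and the optional ionSeriesType child is left out because its content model is not part of the excerpt.

```python
import xml.etree.ElementTree as ET


def build_peak_annotations(peaks):
    """Build a <peakAnnotations> element from a list of peak dicts.

    Each dict needs "id", "mz", and "intensity"; "comment" is optional,
    mirroring the required and optional attributes in the excerpt.
    """
    root = ET.Element("peakAnnotations")
    for peak in peaks:
        attrs = {
            "id": str(peak["id"]),
            "mz": str(peak["mz"]),
            "intensity": str(peak["intensity"]),
        }
        if peak.get("comment"):
            attrs["comment"] = peak["comment"]
        ET.SubElement(root, "annotation", attrs)
    return root


# Hypothetical sample data, not taken from a real spectrum.
example = build_peak_annotations([
    {"id": 1, "mz": 359.2653, "intensity": 1200.0, "comment": "candidate y3 ion"},
    {"id": 2, "mz": 129.0659, "intensity": 310.5},
])
xml_text = ET.tostring(example, encoding="unicode")
```

The resulting `xml_text` holds one peakAnnotations element with two annotation children; the second omits the optional comment attribute.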
References
[1] M. Kinter and N.E. Sherman, Protein Sequencing and Identification Using Tandem Mass Spectrometry, John Wiley & Sons, Inc., New York, NY, 2000.
[2] P.G.A. Pedrioli, et al., “A common open representation of mass spectrometry data and its application to proteomics research”, Nature Biotechnology, vol. 22, 2004 Nov, pp. 1459-1466.
[3] http://histone.cti.depaul.edu/IBG-MSP/login.do
[4] http://gridweb.cti.depaul.edu/twiki/bin/view/IBG/IBGSoft#IBG_DeskTop
Acknowledgements
Work supported in part by the National Institutes of Health (NIH) under Grant No. R01 HG3864.
