Molecule Pages - Berkeley Database Research
Database Publishing at Nature
Timo Hannay
Nature Publishing Group
7 October 2005
Overview
Publishing collaborations: Making databases more like journals
NPG New Technology: Making journals more like databases
Tagging and social bookmarking: New methods of annotation and navigation
Database publishing at NPG
The AfCS-Nature Signaling Gateway (http://www.signaling-gateway.org/)
The CMC-Nature Cell Migration Gateway (http://www.cellmigration.org/)
Forthcoming collaborations with NCI and several other groups
The AfCS-Nature Signaling Gateway
A freely available online resource for anyone interested in cellular signalling
A collaboration with the research community through the Alliance for Cellular Signaling
An experiment in the next generation of online, database-driven scientific publications
The Signaling Gateway
Home, Info & News
Signaling Update
• News & comment written and commissioned by NPG editors
Molecule Pages
• Facts and figures on major cell signaling proteins (3,700+)
• Continually updated by selected experts (~1000)
• Peer-review run by NPG
Data Center
• Repository for raw experimental data from AfCS
• Tools for viewing and analyzing AfCS data (online & offline)
Hardware & software hosted at San Diego Supercomputer Center
The Molecule Pages
Comprehensive, structured data for 3,700+ proteins involved in cellular signalling
Some information automatically fed in from other online databases and updated monthly
Other information entered by selected expert authors and updated annually
Author-entered data peer-reviewed by NPG
Fully citable using digital object identifiers (DOIs)
Using Digital Object Identifiers
Nature 409, 860-921 (2001), doi:10.1038/35057062
http://dx.doi.org/10.1038/35057062 → IDF/CrossRef databases → Correct URL at publisher’s website
• Allows unambiguous identification of paper
• Allows readers to find the paper online
• Allows publishers to cross-link reference lists
• Guaranteed not to change (even if the publisher changes)
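The lookup on this slide can be sketched in a few lines of Python. This is a minimal illustration, not NPG's or CrossRef's implementation: the function name `doi_to_url` is hypothetical, the resolver address is the one shown on the slide, and the validity check (a `10.` prefix, a slash, a non-empty suffix) is a simplifying assumption.

```python
from urllib.parse import quote

RESOLVER = "http://dx.doi.org/"  # resolver address as shown on the slide


def doi_to_url(doi: str) -> str:
    """Turn a DOI (with or without a 'doi:' prefix) into a resolver URL."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:]
    prefix, sep, suffix = doi.partition("/")
    # Simplified sanity check (an assumption, not the full DOI syntax).
    if not (prefix.startswith("10.") and sep and suffix):
        raise ValueError(f"not a DOI: {doi!r}")
    # Keep the slash between prefix and suffix; escape anything URL-unsafe.
    return RESOLVER + quote(doi, safe="/")


print(doi_to_url("doi:10.1038/35057062"))
```

The resolver then redirects the reader to wherever the publisher currently hosts the paper, which is why the citation keeps working even if the publisher's own URLs change.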
The Molecule Pages: A scientific publication
Characteristics compared across three columns: Traditional journal, Traditional database, Molecule Pages
• Recognised serial publication with an ISSN
• Authored by recognised scientific experts [?]
• Subjected to full anonymous peer review
• Maintained indefinitely (with errata and addenda)
• Formally citable and fully integrated into CrossRef
• Structured and highly queryable
The Molecule Pages has the same features as a traditional journal, except that the information it contains is more highly structured and queryable.
Overview
Publishing collaborations: Making databases more like journals
NPG New Technology: Making journals more like databases
Tagging and social bookmarking: New methods of annotation and navigation
Great underestimated technologies of our age
Technology | Purported use | Eventual impact
Steam engines (early 1700s) | Pumping water from coal mines | The Industrial Revolution
Alternating current (1880s) | Executing criminals | The electrically powered society
Web-based scientific publishing (2004) | A new charging model for scientific papers | Redefining the concept of the scientific paper
Scientific papers as structured data objects
[Diagram: the scientific paper from circa 2000 to circa 2006. Elements shown: print journal; online facsimile; article metadata database; structured data sets; structured, interactive and queryable figures and text (labelled <svg> and <rdf>).]
Experimental article metadata database
Initial data to be included:
Author and institute details
Scientific:
Molecules (InChI)
Genes (Entrez Gene)
Proteins (UniProt)
Cellular processes, functions, locations (GO)
Species (NCBI)
Citation annotations (controlled vocabulary)
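One way to picture a record in such a database is as a small typed structure. The sketch below is an assumption about shape only: the class name `ArticleMetadata`, its field names, the helper `articles_mentioning`, the placeholder DOIs and the citation term "supports" are all hypothetical, and only the identifier schemes come from the list above.

```python
from dataclasses import dataclass, field


@dataclass
class ArticleMetadata:
    """One record in a hypothetical article metadata database."""
    doi: str
    authors: list[str] = field(default_factory=list)
    institutes: list[str] = field(default_factory=list)
    molecules: list[str] = field(default_factory=list)  # InChI strings
    genes: list[str] = field(default_factory=list)      # Entrez Gene IDs
    proteins: list[str] = field(default_factory=list)   # UniProt accessions
    go_terms: list[str] = field(default_factory=list)   # GO identifiers
    species: list[str] = field(default_factory=list)    # NCBI taxonomy IDs
    # Cited DOI -> term from a controlled vocabulary (terms are made up here).
    citations: dict[str, str] = field(default_factory=dict)


def articles_mentioning(records, protein):
    """Return the DOIs of all records annotated with the given protein."""
    return [r.doi for r in records if protein in r.proteins]


# Placeholder records; the DOIs are not real.
records = [
    ArticleMetadata(doi="10.0000/example-1", proteins=["P03378", "Q05397"]),
    ArticleMetadata(doi="10.0000/example-2", proteins=["P48061"],
                    citations={"10.0000/example-1": "supports"}),
]
print(articles_mentioning(records, "Q05397"))
```

Because every field holds identifiers from a shared scheme rather than free text, the same lookup works across articles, journals and external databases.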
Support for structured data sets
Preview in browser
Download to desktop software
Developing support for:
• Systems Biology Markup Language
• CellML
• Chemical Markup Language
• Others
Search for more data
SVG: Figures as interactive data objects
Plot graph on axes of choice
Overlay data sets of choice
Zoom and pan to view detail
Click to download raw data
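The idea of replotting a figure "on axes of choice" can be illustrated by generating the SVG from the underlying data rather than shipping a fixed image. The following is a minimal sketch under stated assumptions: the function name `to_svg`, its parameters and the sample values are all invented for illustration and do not describe NPG's tooling.

```python
import math


def to_svg(points, width=200, height=100, log_y=False):
    """Render (x, y) pairs as an SVG polyline, optionally on a log10 y-axis."""
    xs = [x for x, _ in points]
    ys = [math.log10(y) if log_y else y for _, y in points]
    x0, x1 = min(xs), max(xs)
    y0, y1 = min(ys), max(ys)

    def scale(v, lo, hi, size):
        # Map v from [lo, hi] onto [0, size]; a flat range maps to the middle.
        return size / 2 if hi == lo else (v - lo) / (hi - lo) * size

    # SVG's y-axis points down, so flip the y coordinate.
    coords = " ".join(
        f"{scale(x, x0, x1, width):.1f},{height - scale(y, y0, y1, height):.1f}"
        for x, y in zip(xs, ys)
    )
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
        f'<polyline fill="none" stroke="black" points="{coords}"/></svg>'
    )


data = [(1, 10), (2, 100), (3, 1000)]  # made-up values
print(to_svg(data, log_y=True))
```

Switching `log_y` redraws the same data on a different scale, which is only possible when the figure carries its raw data with it.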
Automated scientific markup and linking
Increasing structure in text markup (1)
The old way (no semantic markup):
<p>...gp120 binding to CXCR4 or CCR5 activates PYK2 and FAK…</p>
Now (key entities and concepts marked up):
<p>...<protein id="urn:lsid:uniprot.org:uniprot:P03378">gp120</protein>
<action id="urn:lsid:geneontology.org:go:000548">binding</action> to
<protein id="urn:lsid:uniprot.org:uniprot:P48061">CXCR4</protein> or
<protein id="urn:lsid:uniprot.org:uniprot:P10147">CCR5</protein>
<action id="urn:lsid:geneontology.org:go:0008047">activates</action>
<protein id="urn:lsid:uniprot.org:uniprot:O43150">PYK2</protein> and
<protein id="urn:lsid:uniprot.org:uniprot:Q05397">FAK</protein>…</p>
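To show what the marked-up version buys a machine reader, the sketch below parses the sentence with Python's standard `xml.etree.ElementTree` and lists every tagged entity with its identifier. The element names and identifiers are copied from the slide; the function names `extract_entities` and `plain_text` are hypothetical, and the leading and trailing ellipses are dropped for brevity.

```python
import xml.etree.ElementTree as ET

# The marked-up sentence from the slide, identifiers copied verbatim.
FRAGMENT = (
    '<p><protein id="urn:lsid:uniprot.org:uniprot:P03378">gp120</protein> '
    '<action id="urn:lsid:geneontology.org:go:000548">binding</action> to '
    '<protein id="urn:lsid:uniprot.org:uniprot:P48061">CXCR4</protein> or '
    '<protein id="urn:lsid:uniprot.org:uniprot:P10147">CCR5</protein> '
    '<action id="urn:lsid:geneontology.org:go:0008047">activates</action> '
    '<protein id="urn:lsid:uniprot.org:uniprot:O43150">PYK2</protein> and '
    '<protein id="urn:lsid:uniprot.org:uniprot:Q05397">FAK</protein></p>'
)


def extract_entities(xml_text):
    """Return (tag, surface text, identifier) for each annotated span."""
    root = ET.fromstring(xml_text)
    return [(el.tag, el.text, el.get("id")) for el in root if el.get("id")]


def plain_text(xml_text):
    """Recover the unannotated sentence by concatenating all text nodes."""
    return "".join(ET.fromstring(xml_text).itertext())


for tag, text, ident in extract_entities(FRAGMENT):
    print(f"{tag:8} {text:10} {ident}")
print(plain_text(FRAGMENT))
```

The readable sentence is still recoverable, so the markup adds machine-readable identifiers without costing human readers anything.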
Increasing structure in text markup (2)
The new way (full RDF/XML):
<p>...
<rdf:Graph xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:go="urn:lsid:geneontology.org:go:"
xmlns:uniprot="urn:lsid:uniprot.org:uniprot:">
<go:000548>
<uniprot:Protein rdf:resource="urn:lsid:uniprot.org:uniprot:P03378"/>
<uniprot:Protein rdf:resource="urn:lsid:uniprot.org:uniprot:P48061"/>
<go:0008047 rdf:resource="urn:lsid:uniprot.org:uniprot:O43150"/>
<go:0008047 rdf:resource="urn:lsid:uniprot.org:uniprot:Q05397"/>
</go:000548>
<go:000548>
<uniprot:Protein rdf:resource="urn:lsid:uniprot.org:uniprot:P03378"/>
<uniprot:Protein rdf:resource="urn:lsid:uniprot.org:uniprot:P10147"/>
<go:0008047 rdf:resource="urn:lsid:uniprot.org:uniprot:O43150"/>
<go:0008047 rdf:resource="urn:lsid:uniprot.org:uniprot:Q05397"/>
</go:000548>
<rdf:label>gp120 binding to CXCR4 or CCR5 activates PYK2 and
FAK</rdf:label>
</rdf:Graph>
…</p>
With RDF markup, the article XML itself literally becomes a relational database
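The "article as database" claim can be made concrete by flattening statements into subject-predicate-object triples and querying them by pattern. This is a deliberately simplified sketch: it models only the two binding statements from the example sentence, uses plain tuples instead of an RDF library, and the names `TRIPLES` and `match` are hypothetical. Identifiers are copied from the slide.

```python
UNIPROT = "urn:lsid:uniprot.org:uniprot:"
GO = "urn:lsid:geneontology.org:go:"

# Simplified flattening of the slide's example: gp120 binds CXCR4 or CCR5.
TRIPLES = [
    (UNIPROT + "P03378", GO + "000548", UNIPROT + "P48061"),
    (UNIPROT + "P03378", GO + "000548", UNIPROT + "P10147"),
]


def match(triples, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# "Show me statements about this protein" becomes a pattern query.
for subject, predicate, obj in match(TRIPLES, s=UNIPROT + "P03378"):
    print(subject, predicate, obj)
```

A query such as `match(TRIPLES, o=UNIPROT + "P10147")` then answers "what is stated to bind this protein?" without any text search.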
Why go to all this effort?
Discoverability and recontextualisation
“Show me statements about the hedgehog gene.”
“Find claims that disagree with this.”
Transparency and flexibility
“Plot this graph on a different scale, with error bars added and with these two extra data sets overlaid.”
Specificity and completeness
“Give me a full description of this mathematical model that I can run on my own computer.”
Reuse and interoperability
“Provide the raw data set used in this analysis in a form that allows me to merge it with my own data.”
Views from the database side
“Before the end of the next decade, pathway databases will become scientific journals and journals will become databases. Biologists will be greatly empowered, and bioinformatics will continue its long evolution.”
Lincoln Stein (Reactome)
“Is a biological database any different than a biological journal? I am working toward reaching an answer of, no, there is no difference.”
Phil Bourne (Protein Data Bank)
Overview
Publishing collaborations: Making databases more like journals
NPG New Technology: Making journals more like databases
Tagging and social bookmarking: New methods of annotation and navigation
A few uses for Connotea
Keeping bookmarks and references in order
Sharing links and ideas within a team (perhaps geographically dispersed)
Providing readers with a (dynamic) list of further or related reading
Encouraging readers to share relevant links with the author and with each other