BioMed Central’s open data
Alliance for Permanent Access conference
7th November 2012
Iain Hrynaszkiewicz
Publisher (Open Science), BioMed Central
About BioMed Central
• Launched in 2000, largest global publisher of peer-reviewed open access journals (>240)
• >136,000 peer-reviewed open access articles published
• Part of Springer Science+Business Media since 2008
• Publish using Creative Commons (CC-BY) licenses
• Non-journal products include ISRCTN database
• Interested in innovation and recognise the growing need
for data sharing and publication
BioMed Central and open data
• Increasing transparency in scientific research and
scholarly communication is at the core of strategy
• Data are an increasingly integral part of scholarly
communication, with many opportunities for increasing
the pace of knowledge discovery
• Publishers, particularly open access publishers, are well placed to share information across domain boundaries
“By ‘open data’ BioMed Central means that these data are freely available on the public
internet permitting any user to download, copy, analyse, re-process, pass them to
software or use them for any other purpose without financial, legal, or technical barriers
other than those inseparable from gaining access to the internet itself. BioMed Central
encourages the use of fully open formats wherever possible.”
BioMed Central open data initiatives
• Data journals and article types
• Open Data Award
• Data hosting, citation, deposition and linking
• Lab notebook-journal integration (LabArchives)
• Data licensing
• Guidance and best practice e.g. human subjects –
confidentiality and consent
• Data formats and standards – efficient reuse
• Facilitation of data/text mining research
Problem: Lack of credit/recognition for
data sharing and publication
• In science credit is everything but incentives for data
publication are still emerging
• Datasets are not generally as discoverable and
citable as journal articles – yet
• Requirements for data sharing are field/location-specific
• Need more empirical evidence of the benefits of data
publication for individual scientists
Solution #1: Journals and article types
enabling data publication
Data notes: “[B]riefly describe a biomedical data set
or database, with the data being readily accessible
and attributed to a source” http://bit.ly/y3Jb3b
Research: E.g. The International Stroke Trial
Data notes: “[E]xceptional datasets deposited
in our GigaScience repository that have been
selected for further peer review”
Solution #2: Open Data Award
“We ... recognize researchers who have ... have demonstrated
leadership in the sharing, standardization, publication, or re-use of
biomedical research
Solution #3: Enable and
encourage/require data citation
Only articles, datasets and abstracts that have been published or
are in press, or are available through public e-print/preprint servers,
may be cited
“Dataset with persistent identifier
Zheng, L-Y; Guo, X-S; He, B; Sun, L-J; Peng, Y; Dong, S-S; Liu, T-F;
Jiang, S; Ramachandran, S; Liu, C-M; Jing, H-C (2011): Genome
data from sweet and grain sorghum (Sorghum bicolor).
GigaScience. http://dx.doi.org/10.5524/100012.”
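The citation format above can be generated from a handful of metadata fields. The following is only a minimal sketch in Python, not BioMed Central's or GigaScience's own tooling: the `Dataset` class, its field names and the `format_dataset_citation` function are hypothetical names introduced for illustration, and the punctuation simply mirrors the sorghum example.

```python
from dataclasses import dataclass
from typing import List

# Resolver prefix as it appears in the example citation above.
DOI_RESOLVER = "http://dx.doi.org/"


@dataclass
class Dataset:
    """Hypothetical container for the fields a dataset citation needs."""
    authors: List[str]   # each author as "Surname, Initials"
    year: int            # year the dataset was released
    title: str           # dataset title, without trailing full stop
    publisher: str       # repository or journal that issued the DOI
    doi: str             # bare DOI, without the resolver prefix


def format_dataset_citation(ds: Dataset) -> str:
    """Build a citation string in the style of the example above."""
    authors = "; ".join(ds.authors)
    return (
        f"{authors} ({ds.year}): {ds.title}. "
        f"{ds.publisher}. {DOI_RESOLVER}{ds.doi}."
    )


# The sorghum dataset from the example citation above.
sorghum = Dataset(
    authors=[
        "Zheng, L-Y", "Guo, X-S", "He, B", "Sun, L-J", "Peng, Y",
        "Dong, S-S", "Liu, T-F", "Jiang, S", "Ramachandran, S",
        "Liu, C-M", "Jing, H-C",
    ],
    year=2011,
    title="Genome data from sweet and grain sorghum (Sorghum bicolor)",
    publisher="GigaScience",
    doi="10.5524/100012",
)

citation = format_dataset_citation(sorghum)
print(citation)
```

Keeping the DOI as a bare identifier and adding the resolver prefix only when formatting means the same record could be rendered against a different resolver address without editing the stored metadata.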
Problem: Where can data be stored –
• Publishers not best placed to run repositories for long
term preservation of large datasets
• Mirrors of publisher content not able to accept
arbitrary amounts of additional data
• Many data repositories exist but most are
domain/location specific and there are many different
types of funding model, license agreement and
persistent identifiers in use
Solution #1: Journal with integrated database
Assistant Editor:
Laurie Goodman, BGI (USA)
Scott Edmunds, BGI (China)
Alexandra Basford, BGI (China)
GigaScience publishes ‘big-data’
studies from the entire spectrum
of life sciences
Novel publishing format: manuscript publication and data
Assignment of data DOIs allows
separate data citation
The BGI is covering all APCs for
the first year after launch
GigaDB is a new database integrated with the GigaScience journal to meet the needs of a new generation of biological
and biomedical research as it enters the era of “big-data”…
Anatomy of a GigaScience Publication
Solution #2: Comprehensive author
information on available data repositories
Solution #3: Research on repositories
We are looking for
repositories with interests
in clinical research data –
can you help?
Problem: Data are not consistently
linked to publications
• Data deposition policies are not established in all
• Even where they are, links/accession numbers tend to
be inconsistently presented and rarely cited
• Researchers may, independently of journal
requirements, deposit data in repositories
• A missed opportunity to enhance the literature
Solution #1: ‘Availability of supporting
data’ article section
• A tool to put data deposition policies – encouraged or
mandated – into practice
• Provides links in a consistent place within an article
to supporting data, regardless of the location or
format of the data
• Data must be permanently available (DOI or
• ~50 journals including GigaScience, BMC series
Availability of supporting data
BMC Res Notes 2012, 5:21 http://www.biomedcentral.com/1756-0500/5/21/
GigaScience 2012, 1:3 http://www.gigasciencejournal.com/content/1/1/3
Solution #3: Lab notebook integration
• BMC authors entitled to LabArchives’
(http://www.labarchives.com/bmc) online lab
notebook with 100 MB of free storage
• Features include:
- Data publishing with DOI assignment
- Citable, linkable data supporting publications
- Reusable, integrable data with CC0 waiver
- Integrated manuscript submission to BMC journals
- Additional free storage (standard is 25 MB)
LabArchives partnership
24 Oct 2012
Open data partnership leads to release of data from
Nobel Prize-winning laboratory for public use
Problem: Licensing that restricts efficient
data integration and (re)use
“[P]eople mis-use copyright licenses on
uncopyrightable materials and data sets: the
confusion of the legal right of attribution in
copyright with the academic and professional
norm of citation of one's efforts.” John
Wilbanks, VP, Science, Creative Commons,
http://bit.ly/djl5Fa August 11, 2010
“...any restrictions on use should be strongly
resisted and we endorse explicit encouragement
of open sharing.” Schofield et al.: Post-publication
sharing of data and tools. Nature 2009, 461:171.
“The data should be released in standardized
formats without intellectual property constraints.”
Conway PH, VanLare JM: Improving Access to
Health Care Data: The Open Government
Strategy. JAMA 2010;304(9):1007-1008.
Why Creative Commons CC0?
• interoperability: CC0 is human- and machine-readable
• universality: CC0 is global and universal and widely
• simplicity: no need for humans to make, and
respond to, individual data requests – avoids
“attribution stacking” with CC-BY licenses
Schaeffer P: Why does Dryad use CC0?
Solution: Stakeholder engagement and
community collaboration, leadership
Public consultation on
implementing CC0 for
data published in open
access journals: closes
10th November 2012
Hrynaszkiewicz I, Cockerill MJ:
Open by default: a proposed
copyright license and waiver
agreement for open access
research and data in peer-reviewed journals. BMC Research
Notes 2012, 5:494
Implementing CC0 in journals – how?
• Specify a date from which the new license would
apply to data (CC-BY remains for other content)
• Only applies to data submitted to the journal
• Some relatively minor technical and operational
• Cultural change may be the biggest challenge
• Consultation is identifying common concerns, FAQs,
and further definitions and use cases for open data in
journal publications
Hrynaszkiewicz I, Cockerill MJ: Open by default: a proposed copyright
license and waiver agreement for open access research and data
in peer-reviewed journals. BMC Research Notes 2012, 5:494
Problem: Lack of guidance, exemplars,
incentives to make data reusable
• Sharing/publishing detailed human subjects data, in
the absence of explicit consent, can potentially
infringe privacy (ethically and legally)
• Data are more (re)usable if published in community
endorsed, standard formats
• Standards and appropriate guidance do not yet exist
in all domains
• Few incentives to follow data standards
Solution #1: Work with journal editors
to produce guidance where it is needed
BMJ 2010;340:c181
Co-published in:
Trials 2010, 11:9
Solution #2: Publish exemplars
Solution #3: Incentivize, promote and
share best practice and standards
Problem: Adding value to data of use to
researchers, readers and publishers
• Text/data mining applications are often research
project- or research-specific and not always attractive
to commercial publishing platforms and their
• Value to the non-expert can be limited
• Makes business model/case challenging for
www.casesdatabase.com – coming soon
The future...
Image adapted from Gillam
et al: The Healthcare
Singularity and the Age
of Semantic Medicine. In
The Fourth Paradigm (2009)
