Making research findings
visible – the future of the
scientific paper
Matthew Cockerill
Publisher, BioMed Central
"There is nothing more amusing than
watching business interests work
themselves up into a righteous frenzy over
a threat to their monopoly profits from a
new technology or some upstart with a
different business model. Invariably, the
monopolists… try to present themselves as
champions of the consumer, or defenders of
a level playing field, as if they hadn't
become ridiculously rich by sticking it to
consumers and enjoying years in which the
playing field was tilted to their advantage."
Steven Pearlstein in the Washington Post, July 19 2006
Status of open access publishing
Momentum for transition to OA
 We are seeing action (not just words) from
funding agencies and governments
– Wellcome and several UK research councils now require OA
deposit as a condition of grants
– Federal Research Public Access Act may do the same in US
 OA journals continue to grow rapidly
 Impressive impact factors demonstrate OA
and quality are absolutely compatible
 Move to OA basically unstoppable
Growth of OA
[Chart: rolling 28-day count of submissions to BioMed Central journals. Y-axis: submissions, 0 to 1400. X-axis: Jul-00 to Jul-06.]
Impact factors
 Genome Biology – IF 9.71
 BMC Bioinformatics – IF 4.96
 BMC Genomics – IF 4.09
Genome Biology is:
 10th of 124 in GENETICS & HEREDITY
 4th of 139 in BIOTECHNOLOGY & APPLIED MICROBIOLOGY
What does this mean for the
future of the scientific article?
Why did we start BioMed Central
as an open access publisher?
 Limited access to research articles makes
further research needlessly inefficient
 Barriers to access obstruct interdisciplinary
cross-fertilization
 It is in the interest of researchers for their
research to be read and cited as widely as
possible
 Traditional scientific publishing is not an
effective market, and so high serials prices
mean a poor deal for the scientific community
The main reason we started
BioMed Central
 Publications and data are a continuum
 Publications include data
 Publications are data
 To make sense of data and publications
delivered by post-genomic science, we need
– The best possible tools
– The widest possible collection of raw material
 Open access stimulates the creation of tools
by providing access to the raw material
The future of the scientific
article
 Computers will be at least as
important as human readers
Text mining
 Open access facilitates text mining
 BioMed Central XML corpus of full
text articles is freely downloadable
 The more semantics that are
captured in the XML, the richer the
possibilities for mining
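
A minimal sketch of why richer XML helps mining. The sample article is invented, and the element names (article, title, p, gene) and the id attribute are assumptions for illustration, not the actual BioMed Central schema. Where entities are tagged, a script can count them directly; where they are not, it has to fall back on the flattened free text.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Invented sample; element and attribute names are hypothetical,
# not the real BioMed Central XML schema.
SAMPLE = """
<article>
  <title>An example study</title>
  <body>
    <p>Expression of <gene id="EX:1">abc1</gene> rose, while
       <gene id="EX:2">xyz2</gene> was unchanged.</p>
    <p>Knockdown of <gene id="EX:1">abc1</gene> reversed the effect.</p>
  </body>
</article>
"""


def gene_mentions(xml_text):
    """Count tagged gene mentions by identifier."""
    root = ET.fromstring(xml_text)
    return Counter(g.get("id") for g in root.iter("gene"))


def plain_text(xml_text):
    """Flatten the article to whitespace-normalized text for free-text mining."""
    root = ET.fromstring(xml_text)
    return " ".join("".join(root.itertext()).split())


counts = gene_mentions(SAMPLE)
text = plain_text(SAMPLE)
```

The tagged route gives identifiers with no guessing; the flattened text still contains the gene names, but a miner would have to recognize and disambiguate them itself.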
Existing examples of automated
sifting of published research
Postgenomic
CiteULike
This is just
bibliographic information –
but it's a start
Semantic enrichment
 Ensure that the rest of the knowledge
represented in scientific articles is
structured to be computer-readable
 Ideally capture semantics
unambiguously at time of publication
 Mining of free text is a stopgap/fall-back
 It is not just articles that need semantic
enrichment, but data sets too
 Appropriate standards are now emerging
RDF
 Useful common technical standard
for expressing semantics
 Subject-predicate-object triples
 BioMed Central already exposes
bibliographic RDF for all articles
 Tools like Piggy Bank can
capture RDF and then store it in
triple stores (local or networked)
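
To make the triple idea concrete, here is a small sketch in plain Python rather than a real RDF library. The article URI, title and author names are invented; the predicates are Dublin Core element URIs, chosen as an assumption about what bibliographic RDF might contain. The match function is a hypothetical helper that stands in for a triple-store query.

```python
# Each statement is a (subject, predicate, object) triple.
# The article URI, title and authors are invented for illustration.
DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core elements namespace
ARTICLE = "http://example.org/articles/1"  # hypothetical identifier

triples = {
    (ARTICLE, DC + "title", "An example study"),
    (ARTICLE, DC + "creator", "A. Author"),
    (ARTICLE, DC + "creator", "B. Author"),
    (ARTICLE, DC + "date", "2006"),
}


def match(store, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# "Who created anything in the store?"
creators = [o for _, _, o in match(triples, p=DC + "creator")]
```

Because every statement has the same three-part shape, triples from different publishers can be pooled in one store and queried with the same patterns.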
Semantic Laundry List
 Scientific stuff
– Genes
– Proteins
– Anatomy
– Taxonomy
– Small molecules/drugs
– Macromolecules
– Diseases
– Experimental methodologies
– Experimental data types
 General stuff
– People, Places, Organizations, Relationships
NCBO
e.g. of enriched research
Neurocommons.org
 A ScienceCommons project
 Working with open access articles
from BioMed Central and PLoS
 Attempting to define best
practices/gold standard for
semantic enrichment of articles
 Text mining and enhanced
authoring tools both have role
The role of wikis
 The challenge: Ontologies, to be useful,
must stay up-to-date and receive
ongoing maintenance and curation
 Scope of problem is enormous - every
entity and relationship of relevance to
science
 Wikis provide a promising approach, perhaps the only viable approach
 e.g. AuthorIDs
Projects at BioMed Central to
capture structured info
Case reports
Clinical trials
Biological processes
Chemical structures
Taxonomic descriptions
Publishing research articles in a more structured form
allows the results to be treated as a database
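
As a rough sketch of "results as a database", the example below loads a few structured case reports into an in-memory SQLite table and queries them. The table layout, field names and records are all invented for illustration; the slides do not describe an actual schema.

```python
import sqlite3

# Hypothetical schema and invented records, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE case_report ("
    "id INTEGER PRIMARY KEY, diagnosis TEXT, drug TEXT, age INTEGER)"
)
conn.executemany(
    "INSERT INTO case_report (diagnosis, drug, age) VALUES (?, ?, ?)",
    [
        ("diagnosis A", "drug X", 34),
        ("diagnosis A", "drug Y", 61),
        ("diagnosis B", "drug X", 47),
    ],
)


def related_reports(connection, diagnosis):
    """Return (id, drug, age) for every report with the same diagnosis."""
    cursor = connection.execute(
        "SELECT id, drug, age FROM case_report WHERE diagnosis = ? ORDER BY id",
        (diagnosis,),
    )
    return cursor.fetchall()


rows = related_reports(conn, "diagnosis A")
```

Once the fields are captured at submission time, a question such as "which other reports share this diagnosis?" becomes a one-line query instead of a literature search.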
Structured authoring
Publicon – an experiment in
structured authoring
Benefits of structure
Live maths in articles
Problem – adding structure is a
hassle
Incentivize authors
 Ideally, create structured
authoring tools that remove work
rather than add it (e.g. EndNote)
 If you do create extra work for
authors, find a way to provide the
author with an immediate return
on investment
Reduce work - smart authoring
e.g. auto suggest
 Standard way to disambiguate contacts
 Why not chemicals, genes, species too?
– Unambiguously capture semantics
– Increase accuracy, save time, encourage uptake
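
A minimal sketch of auto-suggest against a controlled vocabulary, in the spirit of address-book completion. The vocabulary entries and identifiers are invented placeholders, and suggest is a hypothetical helper, not part of any existing authoring tool. The point is that the author picks a label while the tool records an unambiguous identifier.

```python
# Hypothetical controlled vocabulary: label -> identifier.
# Labels and IDs are invented placeholders, not real database entries.
VOCABULARY = {
    "alpha kinase 1": "EX:0001",
    "alpha kinase 2": "EX:0002",
    "beta receptor": "EX:0003",
}


def suggest(prefix, vocabulary=VOCABULARY, limit=5):
    """Return up to `limit` (label, identifier) pairs matching the prefix."""
    needle = prefix.strip().lower()
    if not needle:
        return []
    hits = [
        (label, ident)
        for label, ident in vocabulary.items()
        if label.lower().startswith(needle)
    ]
    return sorted(hits)[:limit]


options = suggest("Alpha")
```

A real tool would need fuzzier matching and synonyms, but even this simple form saves typing and stores the identifier rather than an ambiguous name.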
Return on investment
 Automatic update of meta-analysis
based on clinical trial data
 Automatic list of closely-related
case reports from database
 Automatic deposit of taxonomic
information in registry (ZooBank)
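
The first item can be sketched as follows. The slides do not say how the meta-analysis would be computed; this example assumes fixed-effect inverse-variance pooling, and the trial numbers are invented. When a new structured trial result arrives, the pooled estimate is simply recomputed.

```python
import math


def pooled_estimate(trials):
    """Fixed-effect inverse-variance pooling of (effect, standard_error) pairs.

    Returns the pooled effect and its standard error.
    """
    weights = [1.0 / se ** 2 for _, se in trials]
    total = sum(weights)
    effect = sum(w * e for w, (e, _) in zip(weights, trials)) / total
    return effect, math.sqrt(1.0 / total)


# Invented numbers for illustration only.
trials = [(0.20, 0.10), (0.40, 0.10)]
before = pooled_estimate(trials)

# A newly published trial is deposited in structured form...
trials.append((0.30, 0.05))
# ...and the meta-analysis updates without manual re-extraction.
after = pooled_estimate(trials)
```

With structured trial data the update is a recomputation; with results locked in prose or PDF tables, someone has to re-extract the numbers by hand first.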
Q&A