Co-evolution of
digital technologies
and research methods
David De Roure
e-Science
• e-Science was defined by John Taylor (Director
General of the UK Research Councils) as
global collaboration in key areas of science
and the next generation of infrastructure that
will enable it
• e-Science was the name of the destination
• It became the name of the journey
• When we arrive, the destination is just called
science
Researchers
Infrastructure
...the imminent flood of
scientific data expected
from the next generation of
experiments, simulations,
sensors and satellites
Tony Hey and Anne Trefethen
Source: CERN, CERN-EX-0712023, http://cdsweb.cern.ch/record/1203203
4th Paradigm
The Fourth Paradigm:
Data-Intensive
Scientific Discovery
Presenting the first
broad look at the rapidly
emerging field of data-intensive science
http://research.microsoft.com/en-us/collaboration/fourthparadigm/
Doug Kell
BioEssays,
26(1):99–105, January 2004
e-Research
“e-research extends
e-Science and
cyberinfrastructure
to other disciplines,
including the
humanities and
social sciences.”
http://mitpress.mit.edu/catalog/item/default.asp?tid=12185&ttype=2
SALAMI
23,000 hours of
recorded music
Digital Music
Collections
Student-sourced
ground truth
Music Information
Retrieval Community
Community
Software
Supercomputer
Linked Data
Repositories
Datascopes
telescopes for the naked mind
Malcolm Atkinson
NRAO/AUI/NSF
From Signal to Understanding
Jeannette M. Wing COMMUNICATIONS OF THE ACM March 2006/Vol. 49, No. 3 Pages 33-35
E. Science laboris
• Workflows are the new rock
and roll
• Machinery for coordinating
the execution of (scientific)
services and linking together
(scientific) resources
• The era of Service Oriented
Applications
• Repetitive and mundane
boring stuff made easier
Carole Goble
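As a rough illustration of "machinery for coordinating the execution of services", the sketch below runs three steps in dependency order. It is not how Taverna or any of the other workflow systems work internally; the step functions and the `WORKFLOW` table are hypothetical stand-ins for remote services, and Python 3.9+ is assumed for `graphlib`.

```python
from graphlib import TopologicalSorter

# Hypothetical stand-ins for remote (scientific) services. Each one receives
# the outputs of the steps it depends on and returns its own result.
def fetch_sequences(inputs):
    return ["ATGC", "GGCA", "ATGC"]

def deduplicate(inputs):
    return sorted(set(inputs["fetch_sequences"]))

def count_bases(inputs):
    return {seq: len(seq) for seq in inputs["deduplicate"]}

# The workflow: each step name maps to (service, names of upstream steps).
WORKFLOW = {
    "fetch_sequences": (fetch_sequences, []),
    "deduplicate": (deduplicate, ["fetch_sequences"]),
    "count_bases": (count_bases, ["deduplicate"]),
}

def run_workflow(workflow):
    """Run every step once, after all of its upstream steps have finished."""
    order = TopologicalSorter({name: deps for name, (_, deps) in workflow.items()})
    results = {}
    for name in order.static_order():
        service, deps = workflow[name]
        results[name] = service({dep: results[dep] for dep in deps})
    return results

results = run_workflow(WORKFLOW)
print(results["count_bases"])
```

Because the workflow is plain data, someone else can reuse it unchanged with different services behind the same step names, which is the kind of reuse the Paul and Jo story describes.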
Triana
Trident
Kepler
BPEL
Meandre
Taverna
Galaxy
co-shaping
co-design
co-evolution
co-creation
co-construction
co-constitution
co-realisation
co-
Reuse, Recycling, Repurposing
• Paul writes workflows for identifying biological
pathways implicated in resistance to
Trypanosomiasis in cattle
• Paul meets Jo. Jo is investigating Whipworm in
mouse.
• Jo reuses one of Paul’s workflows without change.
• Jo identifies the biological pathways involved in
sex dependence in the mouse model, believed to
be involved in the ability of mice to expel the
parasite.
• Previously a manual two-year study by Jo had
failed to do this.
Carole Goble
“A biologist would rather share their
toothbrush than their gene name”
Mike Ashburner and others
Professor in Dept of Genetics,
University of Cambridge, UK
Data mining: my data’s mine and your
data’s mine
• “Facebook for Scientists” ...but
different to Facebook!
• A repository of research
methods
• A community social network of
people and things
• A Social Virtual Research
Environment
• A probe into researcher
behaviour
• Open source (BSD) Ruby on
Rails app
• REST and SPARQL interfaces,
supports Linked Data
• Inspiration for: BioCatalogue,
MethodBox and SysMO-SEEK
myExperiment currently has 4767 members, 270 groups, 1848
workflows, 424 files and 174 packs
http://www.myexperiment.org/
[Diagram: Paul’s Pack and Paul’s Research Object, spanning method and data. Items: QTL, Workflow 16, Results, Logs, Metadata, Slides, Paper, Common pathways, Workflow 13, Results. Relationships: produces, Included in, Published in, Feeds into.]
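One way to picture the pack above is as a set of typed relationships between items. The sketch below is only an illustration: the item names and relation labels are taken from the diagram, but which item each relation connects is an assumption, and plain tuples stand in for whatever RDF model myExperiment actually uses.

```python
# (subject, relation, object) triples. Names come from the diagram; the
# specific pairings are assumed for illustration.
TRIPLES = [
    ("QTL", "feeds into", "Workflow 16"),
    ("Workflow 16", "produces", "Results"),
    ("Workflow 16", "produces", "Logs"),
    ("Results", "feeds into", "Workflow 13"),
    ("Workflow 13", "produces", "Common pathways"),
    ("Results", "published in", "Paper"),
    ("Paper", "included in", "Pack"),
    ("Slides", "included in", "Pack"),
    ("Logs", "included in", "Pack"),
    ("Metadata", "included in", "Pack"),
]

def contents(pack, triples):
    """Return the items aggregated by a pack, in alphabetical order."""
    return sorted(s for s, rel, o in triples if rel == "included in" and o == pack)

def upstream(item, triples):
    """Return everything an item was derived from, nearest first."""
    derived_via = {"produces", "feeds into"}
    found, frontier = [], [item]
    while frontier:
        current = frontier.pop(0)
        for s, rel, o in triples:
            if o == current and rel in derived_via and s not in found:
                found.append(s)
                frontier.append(s)
    return found

print(contents("Pack", TRIPLES))
print(upstream("Common pathways", TRIPLES))
```

Keeping the relationships explicit is what lets a reader ask where a result came from, rather than only seeing the finished paper.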
The Six Rs of Research Object Behaviours
Research Objects enable data-intensive research to be:
1. Replayable – go back and see what happened
2. Repeatable – run the experiment again
3. Reproducible – independent experiment to reproduce
4. Reusable – use as part of new experiments
5. Repurposeable – reuse the pieces in a new experiment
6. Reliable – robust under automation
7. Referenceable – citable and traceable
De Roure, D. (2010) “Replacing the Paper: The Twelve Rs of the e-Research Record”, Nature Network eResearch blog, article posted November 27, 2010. Available on
http://blogs.nature.com/eresearch/2010/11/27/replacing-the-paper-the-twelve-rs-of-the-e-research-record
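The difference between "replayable" and "repeatable" can be made concrete with a small sketch. This is one possible reading of those two Rs, not a definition from the talk; the `experiment` function and the record layout are hypothetical.

```python
import json

def experiment(samples):
    """Hypothetical analysis step: the mean of the samples."""
    return sum(samples) / len(samples)

def run_and_record(func, inputs):
    """Run func and keep a record of what went in and what came out."""
    output = func(inputs)
    return json.dumps({"step": func.__name__, "inputs": inputs, "output": output})

def replay(record):
    """Replayable: inspect what happened, without running anything."""
    return json.loads(record)["output"]

def repeat(record, func):
    """Repeatable: run the same step again on the recorded inputs."""
    return func(json.loads(record)["inputs"])

record = run_and_record(experiment, [2.0, 4.0, 6.0])
print(replay(record), repeat(record, experiment))
```

Replay needs only the record; repeat also needs the method to still be runnable, which is why the method has to travel with the data.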
Semantically enhanced publication versus
Shared digital Research Objects
Challenging the mindset of immutable paper-sized chunks
“Documents
under glass”
Jeremy Frey
MethodBox
http://www.methodbox.org/
Enable cross-disciplinary research
into major public
health problems
Ease handling of data
and sharing of results
and insights
A Bioinformatics Experiment
Scott Marshall
Marco Roos
“…to discover proteins that interact with transmembrane
proteins, particularly those that can be related to neurodegenerative diseases in which amyloids play a significant role”
1) Taverna provenance exposed as RDF
2) myExperiment RDF document for a protein discovery workflow
3) Mocked-up BioCatalogue document using myExperiment RDF
data as example
4) Provisional RDF documents obtained from the ConceptWiki
(conceptwiki.org) development server
5) An RDF document for an example protein, obtained from the RDF
interface of the UniProt web site
www.wf4ever-project.org
www.executablepapers.com
Executable Journals
• A scientific publishing perspective
• The “executable journal” is a platform for
publishing experiments
– the platform hosts the experiments
– journal submissions both run on and add
components to the platform
• To be discussed at the Future of Research
Communication
Computational Research Objects
• Programmatic use, e.g. autonomic curation
• Research Objects contain process specifications
• Developing a “semantics” of Research Object
execution and composition
• Combines REST, Linked Data and Programming
Language semantics
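A minimal sketch of what "execution and composition" might mean for a Research Object that carries a process specification. The `ResearchObject` class, its `execute` and `then` methods, and the example steps are all hypothetical; the talk does not define such an interface.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List

@dataclass
class ResearchObject:
    """Hypothetical Research Object holding an executable process."""
    name: str
    process: Callable[[Any], Any]
    parts: List["ResearchObject"] = field(default_factory=list)

    def execute(self, data):
        """Execution: apply the contained process to some input."""
        return self.process(data)

    def then(self, other):
        """Composition: a new object that feeds this output into other."""
        return ResearchObject(
            name=f"{self.name} >> {other.name}",
            process=lambda data: other.execute(self.execute(data)),
            parts=[self, other],
        )

double = ResearchObject("double", lambda xs: [2 * x for x in xs])
total = ResearchObject("total", sum)

pipeline = double.then(total)
print(pipeline.name, pipeline.execute([1, 2, 3]))
```

The composed object keeps references to its parts, so a program (for example, an automated curation service) could still inspect or re-run each piece on its own.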
Headlines
1. Primacy of method in a data-centric world
2. Emergence of new sharable digital artefacts
3. Social Media elsewhere in the cycle
4. Executable papers and journals
5. Computational Research Objects
blogs.nature.com/eresearch
@dder
Thanks to: Carole Goble, myGrid and myExperiment; Iain Buchan; Sean
Bechhofer; Doug Kell; Jeremy Frey; Marco Roos; Malcolm Atkinson.