eBank slides courtesy of Liz Lyon and Jeremy Frey - Purdue e-Pubs

The Role of Libraries
in the Context of e-Science
Dr Anne E Trefethen
Oxford e-Research Centre
IATUL Porto, May 21, 2006
A Definition of e-Science
‘e-Science is about global collaboration in key
areas of science, and the next generation of
infrastructure that will enable it.’
John Taylor
Director General of Research Councils
Office of Science and Technology, 2001
UK e-Science Programme
Director’s Awareness and Co-ordination Role
Pilot Application Programme
• Research Councils (£74m), £96.3m
– PPARC (£26m) £31.6m
– BBSRC (£8m) £10.0m
– MRC (£8m) £13.1m
– NERC (£7m) £8.0m
– ESRC (£3m) £10.6m
– EPSRC (£17m) £18.0m
– CLRC (£5m) £5.0m
• DTI (£5m)
Director’s Management Role
Generic Challenges
• EPSRC (£15m) £16.2m, DTI (£15m)
Collaborative projects
Industrial Collaboration
e-Science Goals
• to enable new forms of science that are
– distributed
– collaborative
– multi-disciplinary
– information-intensive
– data-intensive
• to use information technology to
– leverage data as a form of science capital
– manage the “data deluge”
– improve access to scientific information
AstroGrid Slides courtesy
of Nick Walton, Cambridge
Powering the Virtual
Universe
http://www.astrogrid.ac.uk
(Edinburgh, Belfast, Cambridge, Leicester,
London, Manchester, RAL)
Multi-wavelength images showing the jet in M87: from top to
bottom – Chandra X-ray, HST optical, Gemini mid-IR,
VLA radio. AstroGrid will provide advanced, Grid-based
federation and data mining tools to facilitate better and
faster scientific output.
21/03/05
Picture credits: “NASA / Chandra X-ray Observatory /
Herman Marshall (MIT)”, “NASA/HST/Eric Perlman
(UMBC)”, “Gemini Observatory/OSCIR”, “VLA/NSF/Eric
Perlman (UMBC)/Fang Zhou, Biretta (STScI)/F Owen
(NRAO)”
National Centre for Text Mining
Gamma Ray Bursts
• SWIFT satellite observes gamma ray burst
• Interaction with observatory pipelines
• Localise GRB alert in minutes – as they fade rapidly
• Large computational photometric redshift calcs on multi-λ > gives distance
• Cross reference multi-λ data – ID pre-cursor and or environment
• Compare against SN light curves – bump shows evidence for a SN in the GRB (Price et al, 2002)
• Collate data from multiple telescopes over months – meta data issues
Reprocessing of ionospheric STP data: change coords from earth to Image + IRIS data
Image from ESO
D. Ducros, ESA
Dark Matter + Large Scale Structure
• Multi-TB ΛCDM models, e.g. Millennium Sim
• Multiple large image sources: registration & association
• Automatic cluster finding techniques
• Generate Shear Maps c.f. CDM models > DM distribution with redshift
• Source ID from multiplexed spectral data
• Colour-Colour relationships classification in multi-phase space
• Remove star correlate gal with z
X-ray cluster: Chandra X-ray (Mullis) overlaid on a deep BRI image (Clowe & Luppino).
Image from ESO
Some facts on Astronomy data
• Virtual observatories
– Many national virtual observatories containing data at
different wavelengths. Estimated volumes:
• US NVO project alone will store 500 Terabytes/year
• Laser Interferometer Gravitational-Wave Observatory (LIGO) generates
250 Terabytes/year
• VISTA, the Visible and Infrared Survey Telescope for Astronomy, is estimated to generate
250 Gigabytes of raw data/night – 10 terabytes of stored data/year
• Data analysis needs to be combined with previously published
knowledge on the same astronomical time/space events
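To give a feel for the scale of the figures quoted on this slide, the short sketch below adds up the three annual storage estimates and works out an upper bound on VISTA's raw output. It is an illustration only: the decimal unit conversion (1 TB = 1000 GB) and the 365 observing nights per year are assumptions, not figures from the slide.

```python
# Annual stored-data estimates quoted on the slide, in terabytes per year.
STORED_TB_PER_YEAR = {
    "US NVO": 500,
    "LIGO": 250,
    "VISTA": 10,
}

# Assumption (not from the slide): decimal units, 1 TB = 1000 GB.
GB_PER_TB = 1000

# Assumption (not from the slide): VISTA observes every night of the year,
# so this gives an upper bound on its raw output.
NIGHTS_PER_YEAR = 365
VISTA_RAW_GB_PER_NIGHT = 250  # raw data per night, as quoted on the slide

# Combined stored volume across the three projects named on the slide.
total_stored_tb = sum(STORED_TB_PER_YEAR.values())

# Raw VISTA volume per year under the assumptions above.
vista_raw_tb_upper = VISTA_RAW_GB_PER_NIGHT * NIGHTS_PER_YEAR / GB_PER_TB

print(f"Combined stored data: {total_stored_tb} TB/year")
print(f"VISTA raw data, upper bound: {vista_raw_tb_upper:.2f} TB/year")
```

Under these assumptions the raw VISTA volume is several times larger than the 10 terabytes stored per year, which suggests that substantial reduction happens before data reach long-term storage.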
myGrid:
Directly Supporting the e-Scientist
myGrid slides
courtesy of
Carole Goble
Partners
Manchester, EBI, Southampton, Nottingham, Newcastle, Sheffield
AstraZeneca
GlaxoSmithKline
IBM
Merck KGaA
SUN Microsystems
Epistemics Ltd
GeneticXchange
Network Inference
http://mygrid.man.ac.uk
(courtesy of Carole Goble, Manchester)
myGrid Project
• Imminent ‘deluge’ of
genomics data
• Highly heterogeneous
• Highly complex and
inter-related
• Convergence of data
and literature archives
An in silico experiment = a web of interconnected
information and components
• People
• Provenance record of workflow runs
• Literature
• Provenance of the workflow template. Related workflows.
• Notes
• Data in and out
• Ontologies describing workflows
• Services used
The eBank Project
• Building links between e-research data, from the
CombeChem project, with scholarly communication and
other on-line sources
• Investigating the role of aggregator services in linking
data-sets from Grid enabled projects to open data archives
contained in digital repositories through to peer-reviewed
articles as resources in portals
• JISC-funded project led by UKOLN in partnership with
the Universities of Southampton and Manchester
(eBank slides courtesy of Liz Lyon and Jeremy Frey)
Comb-e-Chem Project
Diagram labels: Video, Simulation, Diffractometer, Properties, Analysis, Structures Database, X-Ray e-Lab, Properties e-Lab, Grid Middleware
Goals of e-Bank Project
• Provide self archive of results plus the raw and
analysed data
• Links from traditionally published work provide
the provenance to the work
• Disseminate for “Public Review” – raw data
provided so that users can check for themselves
• Avoid the “publication bottleneck” but still
provide the quality check
Crystallographic e-Prints
• Direct Access to Raw Data
from scientific papers
Raw data sets can be very large and these
are stored at National Datastore using SRB
server
e-Bank: Some Comments
• Data as well as traditional bibliographic
information is made available
• Can construct high level search on data
– aggregate data from many e-print systems
• Build new data services
• Will extend to provision of real spectra - rather
than very reduced summaries - for chemistry
publications
Current E-Science Focus: Experimentation
Virtual collaborations for large-scale experimentation & analysis
Diagram labels: Grid; E-Scientists; collaboration; storage & processing; data & metadata; E-Experimentation
(eBank slides courtesy of Liz Lyon)
1 Experimentation & Analysis Cycle
Diagram labels: E-Scientists; Grid; E-Experimentation
2 Publication & Preservation Cycle
Diagram labels: E-Scientists; Grid; E-Experimentation; Data, Metadata & Ontologies; Certified Experimental Results & Analyses; Preprints & Metadata; Technical Reports; Peer-Reviewed Journal & Conference Papers; Reprints; Local Web; Institutional Archive; Publisher Holdings
3 Research Cycle: access & impact
Diagram adds: Digital Library
4 Learning Cycle: training and developing tomorrow’s e-scientists
Diagram adds: Virtual Learning Environment; Undergraduate Students; Graduate Students
5 Entire E-Science Cycle
Encompassing experimentation, analysis, publication, research, learning
Role of publications in science
• Product of research
• Cumulative, historical
record of science
• Input to research
• Value chain: Network
of documents linked
via citations
(courtesy of Christine Borgman)
Publication changes
• Changes much broader than just the libraries
• Nature of publishing
• Cycle of authoring, publication, access
Drivers
• Technology
• Economics
• Social and Legal
Data Publishing
Databases, notably in biology, are replacing (paper)
publications as a medium of communication
– Built and maintained with a great deal of human effort
– Often do not contain source experimental data, sometimes just
annotation/metadata
– Borrow extensively from, and refer to, other databases
– Researchers are now judged by databases as well as (paper) publications
– Upwards of 1000 (public databases) in genetics
• Integration of literature and data analysis of increasing
importance - linking bio-database to literature, using
publishers to check, complete or complement contents
of such databases
Digital Curation?
• ‘In next 5 years e-Science projects will produce more
scientific data than has been collected in the whole of
human history’- Tony Hey
• In 20 years can guarantee that the operating system and
spreadsheet program and the hardware used to store
data will not exist
– Research curation technologies and best practice
– Need to liaise closely with individual research communities,
data archives and libraries
Generic Issues
• Data Deluge from e-Science projects requires
technologies to facilitate discovery, analysis, curation
of data
• Sheer volume of text published, and of new results
appearing, is impossible for researchers to read and
correlate – text mining
• Effective automated processing required to search,
locate, gather and make use of knowledge encoded
electronically in available literature
What data deserve to be
permanently accessible?
• What are the scientific
criteria for preservation?
• What is the equivalent of
peer review for data?
• Whose data do you trust?
• What data will be re-used?
• How much to invest?
• Who will add the value?
Digital Curation Centre
• Actions needed to maintain and utilise digital data
and research results over entire life-cycle
– For current and future generations of users
• Digital Preservation
– Long-run technological/legal accessibility and usability
• Data curation in science
– Maintenance of body of trusted data to represent current state of
knowledge
• Research in tools and technologies
– Integration, annotation, provenance, metadata, security…..
(www.dcc.ac.uk)
The hybrid library
‘The dominant user view of a library is of a
physical space. But libraries are services which
provide organised access, to the intellectual
record, wherever it resides, whether in physical
places or scattered digital information spaces. The
‘hybrid’ library of the future will be a managed
combination of physical and virtual collections
and information resources.’
Reg Carr, Oxford University
Conclusions
• Publication of data and “paper” becoming
integrated in the digital scholarly research cycle
• Libraries will move further to the “hybrid” model
– Institutional repositories
• e-Science brings with it the data deluge – needs
for data management and curation skills
• e-Scientists also need library training in discovery
and access
• Have implicitly touched on Open Access; as
policies begin to apply to data as well as
published research outputs, the above will
apply even more strongly.
Acknowledgements
With special thanks to Tony Hey, Carole Goble,
Reg Carr, Jeremy Frey, Liz Lyon, Chris Borgman
and Nick Walton