eBank UK : linking research data,
scholarly communication and learning.
Dr Liz Lyon, UKOLN, University of Bath
Dr Simon Coles, School of Chemistry, University of Southampton
AHM, Nottingham, September 2004
Overview
• In context: scholarly communications
– Open Access
– Data, information, workflows and provenance
• The data publication bottleneck
– e-Science and crystallography
– Comb-e-chem Project
• eBank UK
– Information architecture and data flow
– Interoperability issues
• Challenges for the future
Scholarly communications
Current chemistry publishing protocols
Ideas and interpretations
Hooks into the literature
Raw data!
Results & derived data
The government line
“It is envisaged that the sharing of primary data would prevent
unnecessary repetition of experiments and enable scientists to build
directly on each others’ work, creating greater efficiencies and
productivity in the research process.”
Figure: The scholarly knowledge cycle. Liz Lyon, eBank UK article, Ariadne, July 2003.
Diagram labels (research & e-Science workflows):
• Data creation / capture / gathering: laboratory experiments, Grids, fieldwork, surveys, media
• Data analysis, transformation, mining, modelling
• Research & e-Science workflows
• Deposit / self-archiving
• Repositories: institutional, e-prints, subject, data, learning objects
• Data curation: databases & databanks
• Peer-reviewed publications: journals, conference proceedings
• Harvesting metadata
• Aggregator services: national, commercial
• Presentation services: subject, media-specific, data, commercial portals
• Searching, harvesting, embedding
• Resource discovery, linking, embedding
• Process labels: Validation, Publication, Linking
Figure: The scholarly knowledge cycle, learning & teaching workflows.
Diagram labels:
• Learning object creation, re-use
• Learning & Teaching workflows
• Deposit / self-archiving
• Repositories: institutional, e-prints, subject, data, learning objects
• Peer-reviewed publications: journals, conference proceedings
• Harvesting metadata
• Aggregator services: national, commercial
• Presentation services: subject, media-specific, data, commercial portals
• Institutional presentation services: portals, Learning Management Systems, u/g, p/g courses, modules
• Quality assurance bodies
• Searching, harvesting, embedding
• Resource discovery, linking, embedding
• Process labels: Validation
Figure: The scholarly knowledge cycle, with research & e-Science workflows and learning & teaching workflows shown together.
Diagram labels:
• Data creation / capture / gathering: laboratory experiments, Grids, fieldwork, surveys, media
• Data analysis, transformation, mining, modelling
• Research & e-Science workflows
• Learning object creation, re-use
• Learning & Teaching workflows
• Deposit / self-archiving
• Repositories: institutional, e-prints, subject, data, learning objects
• Data curation: databases & databanks
• Peer-reviewed publications: journals, conference proceedings
• Harvesting metadata
• Aggregator services: national, commercial
• Presentation services: subject, media-specific, data, commercial portals
• Institutional presentation services: portals, Learning Management Systems, u/g, p/g courses, modules
• Quality assurance bodies
• Searching, harvesting, embedding
• Resource discovery, linking, embedding
• Process labels: Validation, Publication, Linking
Figure: The scholarly knowledge cycle, with eBank UK shown as the aggregator service.
Diagram labels:
• Data creation / capture / gathering: laboratory experiments, Grids, fieldwork, surveys, media
• Data analysis, transformation, mining, modelling
• Research & e-Science workflows
• Learning object creation, re-use
• Learning & Teaching workflows
• Deposit / self-archiving
• Repositories: institutional, e-prints, subject, data, learning objects
• Data curation: databases & databanks
• Peer-reviewed publications: journals, conference proceedings
• Harvesting metadata
• Aggregator services: eBank UK
• Presentation services: subject, media-specific, data, commercial portals
• Institutional presentation services: portals, Learning Management Systems, u/g, p/g courses, modules
• Quality assurance bodies
• Searching, harvesting, embedding
• Resource discovery, linking, embedding
• Process labels: Validation, Publication, Linking
The Data Publication Bottleneck
The data deluge
Data overload!
EPSRC National Crystallography Service
How do we disseminate?
CombeChem: An EPSRC pilot project
Figure: CombeChem components connected through Grid Middleware.
Diagram labels: Simulation; Video; Diffractometer; Properties; Analysis; Structures Database; Properties e-Lab; X-Ray e-Lab; Grid Middleware
Figure: Entire E-Science Cycle, encompassing experimentation, analysis, publication, research, learning.
Diagram labels: Virtual Learning Environment; Undergraduate Students; Graduate Students; Digital Library; E-Scientists; E-Experimentation; Grid; Technical Reports; Reprints; Peer-Reviewed Journal & Conference Papers; Preprints & Metadata; Publisher Holdings; Institutional Archive; Local Web; Certified Experimental Results & Analyses; Data, Metadata & Ontologies
The eBank UK Project
eBank UK project
• JISC-funded for 1 year from September 2003
• UKOLN at the University of Bath (lead), University of
Southampton, University of Manchester
• “Building the links between research data, scholarly
communication and learning”
• Exemplar: e-Science testbed ‘Combechem’
– Grid-enabled combinatorial chemistry
– Crystallography, laser and surface chemistry examples
– Development of an e-Lab using pervasive computing technology
– National Crystallography Service
• Resource Discovery Network / PSIgate physical
sciences portal
• http://www.ukoln.ac.uk/projects/ebank-uk/
The project team
• UKOLN
– Michael Day
– Monica Duke
– Rachel Heery
– Liz Lyon
– + Andy Powell
• Southampton
– Les Carr
– Simon Coles
– Jeremy Frey
– Chris Gutteridge
– Mike Hursthouse
• Manchester
– John Blunden-Ellis
First steps: establishing common ground…
• Understand the data creation process
• Terminology and definitions
– Data
– Metadata
– Datafile
– Dataset
– Data holding
• Different views
– Digital library researchers, computer scientists, chemists
– Generic vs specific
– Modeller vs practitioner
• Aim for a common ontology
• Modelling the domain
• Creating a metadata schema
Progress update
• Version 2.0 eBank metadata schema
• Enhanced ePrints.org software
• Pilot institutional e-data repository for
harvesting (raw, derived, results data)
• Exports records as ebank_dc and oai_dc
• Validation of schema
• Pilot eBank UK aggregator service
• Developing search interface Version 1.0
• Testing with PSIgate physical sciences portal
– embedding eBank UK
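As an illustration of what exporting records "as ebank_dc and oai_dc" means for a harvester, the sketch below builds one OAI-PMH ListRecords request per format. The base URL is a placeholder: the slides do not give the pilot repository's OAI-PMH endpoint. The `verb` and `metadataPrefix` request parameters are standard OAI-PMH.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the slides do not give the repository's OAI-PMH base URL.
BASE_URL = "https://repository.example.org/oai"


def list_records_url(base_url: str, metadata_prefix: str) -> str:
    """Build an OAI-PMH ListRecords request for one metadata format."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


# The two export formats named on the slide.
urls = {prefix: list_records_url(BASE_URL, prefix) for prefix in ("oai_dc", "ebank_dc")}
for prefix, url in urls.items():
    print(prefix, url)
```

A generic harvester would typically ask for oai_dc, while a service that understands the richer schema could ask for ebank_dc from the same endpoint.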
Crystallography workflow
• Initialisation: mount new sample on diffractometer &
set up data collection
• Collection: collect data
• Processing: process and correct images
• Solution: solve structures
• Refinement: refine structure
• CIF: produce CIF (Crystallographic Information File
format)
• Report: generate Crystal Structure Report
RAW DATA
DERIVED DATA
RESULTS DATA
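The workflow above can be sketched as an ordered list of stages, each tagged with the kind of data it yields. The stage names and their order come from the slide. The assignment of stages to raw, derived, and results data is an assumption made for illustration, since the transcript lists the three categories without saying which stage produces which.

```python
from enum import Enum


class Category(Enum):
    RAW = "raw data"
    DERIVED = "derived data"
    RESULTS = "results data"


# Stage names and order are from the slide. The category attached to each
# stage is an ASSUMPTION for illustration; the slide does not state it.
WORKFLOW = [
    ("Initialisation", Category.RAW),
    ("Collection", Category.RAW),
    ("Processing", Category.DERIVED),
    ("Solution", Category.DERIVED),
    ("Refinement", Category.DERIVED),
    ("CIF", Category.RESULTS),
    ("Report", Category.RESULTS),
]


def stages_for(category: Category) -> list[str]:
    """Return the workflow stages assumed to produce the given category."""
    return [name for name, cat in WORKFLOW if cat is category]


for category in Category:
    print(category.value, "<-", ", ".join(stages_for(category)))
```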
Deposition into the archive
An Archive entry
For a demo come to the JISC booth!
Today @ 13:00 & during tea
ecrystals.chem.soton.ac.uk
All the way back to the underlying data…
Some metadata issues
• Using simple and qualified Dublin Core
• Additional chemical information in schema for
harvesting e.g. empirical formula
• Schema contains International Chemical Identifier
(InChI)
• Links to all datasets associated with an experiment
• Links to individual datasets within an experiment
• Links to eprints (and other published literature)
derived from the data
• Using vocabularies specific to crystallography
• Engaging the broader scientific community to ensure
different schemas are compliant and standards can
emerge
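To make the Dublin Core points concrete, here is a minimal sketch of a record that carries an identifier, a type, and links to its datasets. The two namespace URIs are the standard Dublin Core ones. The wrapper element name `record`, the function name, and all URLs are placeholders, not the real ebank_dc schema. The chemical fields (empirical formula, InChI) are left out because the slides do not name the elements used for them.

```python
import xml.etree.ElementTree as ET

# Standard Dublin Core namespaces (simple and qualified).
DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"
ET.register_namespace("dc", DC)
ET.register_namespace("dcterms", DCTERMS)


def build_record(identifier: str, dataset_urls: list[str]) -> ET.Element:
    """Build a small Dublin Core style record for one crystal structure."""
    # "record" is a placeholder wrapper, NOT the actual ebank_dc root element.
    record = ET.Element("record")
    ET.SubElement(record, f"{{{DC}}}identifier").text = identifier
    ET.SubElement(record, f"{{{DC}}}type").text = "CrystalStructure"
    # One qualified-DC link per dataset associated with the experiment.
    for url in dataset_urls:
        ET.SubElement(record, f"{{{DCTERMS}}}references").text = url
    return record


# Invented example values.
record = build_record(
    "https://repository.example.org/structure/1",
    [
        "https://repository.example.org/structure/1/dataset-a",
        "https://repository.example.org/structure/1/dataset-b",
    ],
)
print(ET.tostring(record, encoding="unicode"))
```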
Data flow in eBank
Figure: linking within the institutional repository. Model input Andy Powell, UKOLN.
• Elements: Dataset (three shown); Crystal structure (data holding); ebank_dc record (XML), dc:type=“CrystalStructure” and/or “Collection”; Crystal structure report (HTML); Eprint oai_dc record (XML), dc:type=“Eprint” and/or “Text”; Eprint “jump-off” page (HTML); Eprint manifestation (e.g. PDF); Institutional repository
• Link labels: dcterms:references; dcterms:isReferencedBy; dc:identifier; Linking; Deposit
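The linking in this model can be sketched with plain records. The sketch assumes one reading of the diagram: the data holding's record lists its datasets under dcterms:references and names the eprint that cites it under dcterms:isReferencedBy. The class, field names, and identifiers are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    identifier: str                                        # dc:identifier
    dc_type: str                                           # dc:type
    references: list = field(default_factory=list)         # dcterms:references
    is_referenced_by: list = field(default_factory=list)   # dcterms:isReferencedBy


# All identifiers below are invented for illustration.
holding = Record(
    identifier="https://repository.example.org/structure/1",
    dc_type="CrystalStructure",
    references=[
        "https://repository.example.org/structure/1/dataset-a",
        "https://repository.example.org/structure/1/dataset-b",
    ],
    is_referenced_by=["https://repository.example.org/eprint/7"],
)
eprint = Record(
    identifier="https://repository.example.org/eprint/7",
    dc_type="Eprint",
)


def data_behind(eprint_record, records):
    """Walk from an eprint to the datasets of every holding that it cites."""
    datasets = []
    for rec in records:
        if eprint_record.identifier in rec.is_referenced_by:
            datasets.extend(rec.references)
    return datasets


print(data_behind(eprint, [holding, eprint]))
```

Starting from a published eprint, the function returns the dataset links of the holdings behind it, which is the "back to the underlying data" navigation the demo slides show.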
Data flow in eBank
Figure: harvesting from the institutional repository into aggregator and presentation services. Model input Andy Powell, UKOLN.
• Elements: Dataset (three shown); Crystal structure (data holding); ebank_dc record (XML), dc:type=“CrystalStructure” and/or “Collection”; Crystal structure report (HTML); Eprint oai_dc record (XML), dc:type=“Eprint” and/or “Text”; Eprint “jump-off” page (HTML); Eprint manifestation (e.g. PDF); Institutional repository
• Services: eBank UK aggregator service; ePrint UK aggregator service; PSIgate portal; Subject service
• Link labels: dcterms:references; dcterms:isReferencedBy; dc:identifier; Linking; Deposit
• Flow labels: Harvesting OAI-PMH (oai_dc, ebank_dc); Searching, linking and embedding
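On the aggregator side, harvesting amounts to reading OAI-PMH responses and pulling out the Dublin Core fields. The sketch below parses a small oai_dc ListRecords response. The sample XML is invented, while the three namespace URIs are the standard OAI-PMH, oai_dc, and Dublin Core ones. The function name `harvest` and the output keys are illustrative choices.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Invented sample response; a real one would come from the repository.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:7</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:identifier>https://repository.example.org/eprint/7</dc:identifier>
          <dc:type>Eprint</dc:type>
          <dc:type>Text</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def harvest(xml_text):
    """Extract the OAI identifier, dc:identifier and dc:type values per record."""
    root = ET.fromstring(xml_text)
    harvested = []
    for rec in root.iterfind("oai:ListRecords/oai:record", NS):
        dc = rec.find("oai:metadata/oai_dc:dc", NS)
        if dc is None:  # a record without a metadata payload is skipped
            continue
        harvested.append({
            "oai_id": rec.findtext("oai:header/oai:identifier", namespaces=NS),
            "identifier": dc.findtext("dc:identifier", namespaces=NS),
            "types": [t.text for t in dc.findall("dc:type", NS)],
        })
    return harvested


records = harvest(SAMPLE)
print(records)
```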
Harvesting: OAIster
Linking and aggregating: Search & discover
For a demo come to the JISC booth!
Today @ 13:00 & during tea or the buffet
Linking and aggregating: Hit browsing
And finally…
eBank embedded in a science portal
Currently we are…
• Assessing outcomes of a Consultation Workshop
held in August e.g.
– Cost-benefit issues for researchers?
– RAE / assessment impact?
– Disciplinary differences?
• Presenting a demonstrator
• Completing supporting studies on
(1) Provenance and (2) Data models and schema
• Promoting Open Access and Open eData Archives to
international crystallographic organisations,
publishers, learned societies
• Seeking Phase 2 funding for a further 12 months
Challenges for the future
Phase 2 plan… (1)
• Continue to progress towards generic metadata
schemas
• Validation against other schemas
– CLRC Scientific Metadata Model
• Modify Eprints.org software to allow for more generic
scientific data and schemas
• Metadata enhancement: subject keyword additions
based on knowledge of keywords in related publications
• Investigate identifiers e.g. International Chemical
Identifier (InChI code)
• Explore context-sensitive linking: “find me…”
– Datasets by this person; Journal articles by this person; Datasets
related to this subject; Journal articles on this subject; Learning
objects by this person; Learning objects on this subject
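The "find me" queries listed above share one shape: pick a kind of resource, then filter by person or by subject. A minimal in-memory sketch follows. The `Item` fields, the `find_me` function, and the sample entries are hypothetical, and a real service would query harvested metadata rather than a Python list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    title: str
    kind: str      # "Dataset", "Journal article" or "Learning object"
    creator: str
    subject: str


def find_me(items, kind, *, creator=None, subject=None):
    """Return items of one kind, optionally narrowed by creator and/or subject."""
    return [
        item
        for item in items
        if item.kind == kind
        and (creator is None or item.creator == creator)
        and (subject is None or item.subject == subject)
    ]


# Invented catalogue entries for illustration.
CATALOGUE = [
    Item("Structure 1 data", "Dataset", "A. Researcher", "crystallography"),
    Item("Paper on structure 1", "Journal article", "A. Researcher", "crystallography"),
    Item("Intro to diffraction", "Learning object", "B. Lecturer", "crystallography"),
    Item("Surface study data", "Dataset", "B. Lecturer", "surface chemistry"),
]

# "Datasets by this person"
by_person = find_me(CATALOGUE, "Dataset", creator="A. Researcher")
# "Learning objects on this subject"
on_subject = find_me(CATALOGUE, "Learning object", subject="crystallography")
print([i.title for i in by_person], [i.title for i in on_subject])
```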
Phase 2 plan… (2)
• Full embedding into the crystallographic research and
publishing communities
• Chemistry workflow embedding
– SMART TEA e synthesis Lab
– Other analytical techniques in chemistry
• e-Learning embedding and pedagogic evaluation
– Undergraduate chemical informatics courses
– Introduction to visiting schools
• Expand into other physical, mathematical, geological
and engineering sciences
• Feasibility study in related domains – bio and medical
sciences
• Feasibility study in unrelated domains – arts and
humanities
Thank you.
Questions?