eBank UK: linking research data, scholarly communications and learning
Dr Liz Lyon, UKOLN, University of Bath, UK
JISC CNI Conference
July 2004, Brighton.
UKOLN is supported by:
www.ukoln.ac.uk
a centre of expertise in digital information management
www.bath.ac.uk
Overview
• Setting the scene: e-Research
• The scholarly knowledge cycle
– Data, information and workflows
– Provenance
• eBank UK Project
– The experience so far
– Issues arising
• Challenges for the future
Setting the scene: e-Research
e-Research trends summary
• Increasingly data-intensive, quantitative
• Implementing new science
• Inter-disciplinary
• New disciplines e.g. Astro-informatics
• New skills requirements
– IT + statistics + domain
• Collaborative
• Highly distributed resources
– Knowledge discovery / extraction
• Open access to data and information
– OECD Declaration January 2004
• A changing landscape of scholarly communications
The scholarly knowledge cycle
[Figure: the scholarly knowledge cycle. Source: Liz Lyon, eBank UK article, Ariadne, July 2003.]
• Data creation / capture / gathering: laboratory experiments, Grids, fieldwork, surveys, media
• Data analysis, transformation, mining, modelling
• Research & e-Science workflows
• Data curation: databases & databanks
• Repositories: institutional, e-prints, subject, data, learning objects
• Peer-reviewed publications: journals, conference proceedings
• Aggregator services: national, commercial
• Presentation services: subject, media-specific, data, commercial portals
• Arrow labels: deposit / self-archiving; validation; publication; linking; harvesting metadata; searching, harvesting, embedding; resource discovery, linking, embedding
[Figure: the scholarly knowledge cycle, learning and teaching view]
• Learning & Teaching workflows
• Learning object creation, re-use
• Repositories: institutional, e-prints, subject, data, learning objects
• Peer-reviewed publications: journals, conference proceedings
• Aggregator services: national, commercial
• Presentation services: subject, media-specific, data, commercial portals
• Institutional presentation services: portals, Learning Management Systems, u/g, p/g courses, modules
• Quality assurance bodies
• Arrow labels: deposit / self-archiving; validation; harvesting metadata; searching, harvesting, embedding; resource discovery, linking, embedding
[Figure: the scholarly knowledge cycle, research and learning views combined]
• Data creation / capture / gathering: laboratory experiments, Grids, fieldwork, surveys, media
• Data analysis, transformation, mining, modelling
• Research & e-Science workflows
• Learning & Teaching workflows
• Learning object creation, re-use
• Data curation: databases & databanks
• Repositories: institutional, e-prints, subject, data, learning objects
• Peer-reviewed publications: journals, conference proceedings
• Aggregator services: national, commercial
• Presentation services: subject, media-specific, data, commercial portals
• Institutional presentation services: portals, Learning Management Systems, u/g, p/g courses, modules
• Quality assurance bodies
• Arrow labels: deposit / self-archiving; validation; publication; linking; harvesting metadata; searching, harvesting, embedding; resource discovery, linking, embedding
[Figure: the combined scholarly knowledge cycle, with eBank UK shown as the aggregator service]
• Data creation / capture / gathering: laboratory experiments, Grids, fieldwork, surveys, media
• Data analysis, transformation, mining, modelling
• Research & e-Science workflows
• Learning & Teaching workflows
• Learning object creation, re-use
• Data curation: databases & databanks
• Repositories: institutional, e-prints, subject, data, learning objects
• Peer-reviewed publications: journals, conference proceedings
• Aggregator services: eBank UK
• Presentation services: subject, media-specific, data, commercial portals
• Institutional presentation services: portals, Learning Management Systems, u/g, p/g courses, modules
• Quality assurance bodies
• Arrow labels: deposit / self-archiving; validation; publication; linking; harvesting metadata; searching, harvesting, embedding; resource discovery, linking, embedding
The eBank UK Project
eBank UK project
• JISC-funded for 1 year from September 2003
• UKOLN at the University of Bath (lead), University of
Southampton, University of Manchester
• “Building the links between research data, scholarly
communication and learning”
• e-Science testbed Comb-e-Chem
– Grid-enabled combinatorial chemistry
– Crystallography, laser and surface chemistry
– Development of an e-Lab using pervasive computing technology
– National Crystallography Service
• Resource Discovery Network PSIgate physical sciences portal
• http://www.ukoln.ac.uk/projects/ebank-uk/
The project team
• UKOLN
– Michael Day
– Monica Duke
– Rachel Heery
– Liz Lyon
– + Andy Powell
• Southampton
– Les Carr
– Simon Coles
– Jeremy Frey
– Chris Gutteridge
– Mike Hursthouse
• Manchester
– John Blunden-Ellis
Comb-e-Chem Project
[Figure: Comb-e-Chem components connected through Grid middleware: video, simulation, diffractometer, properties, analysis, structures database, X-ray e-Lab, properties e-Lab]
Crystallography workflow
• Initialisation: mount new sample on
diffractometer & set up data collection
• Collection: collect data
• Processing: process and correct images
• Solution: solve structures
• Refinement: refine structure
• CIF: produce CIF (Crystallographic
Information File format)
• Report: generate Crystal Structure Report
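The seven stages above run in a fixed order, which can be captured with an ordered enumeration. The following is only a minimal sketch: the `Stage` enum and the `next_stage` and `remaining` helpers are hypothetical names chosen for illustration and are not part of the Comb-e-Chem or eBank UK software.

```python
from enum import IntEnum
from typing import List, Optional


class Stage(IntEnum):
    """Crystallography workflow stages, in the order listed on the slide."""
    INITIALISATION = 1  # mount sample on diffractometer, set up data collection
    COLLECTION = 2      # collect data
    PROCESSING = 3      # process and correct images
    SOLUTION = 4        # solve structures
    REFINEMENT = 5      # refine structure
    CIF = 6             # produce Crystallographic Information File
    REPORT = 7          # generate Crystal Structure Report


def next_stage(current: Stage) -> Optional[Stage]:
    """Return the stage that follows `current`, or None after the final stage."""
    if current is Stage.REPORT:
        return None
    return Stage(current + 1)


def remaining(current: Stage) -> List[Stage]:
    """List the stages still to run after `current`, in workflow order."""
    return [stage for stage in Stage if stage > current]
```

Each stage produces data that could be deposited, so an ordering like this gives a natural place to attach per-stage datasets later on.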
First steps: establishing common ground…
• Understand the data creation process
• Terminology and definitions
– Data
– Metadata
– Datafile
– Dataset
– Data holding
• Different views
– Digital library researchers, computer scientists, chemists
– Generic vs specific
– Modeller vs practitioner
• Aim for a common ontology
• Modelling the domain
• Creating a metadata schema
Progress update
• Version 2.0 eBank metadata schema
• Enhanced ePrints.org software
• Pilot institutional e-data repository for
harvesting (raw, derived, results data)
• Exports records as ebank_dc and oai_dc
• Validation of schema
• Pilot eBank UK aggregator service
• Develop search interface Version 1.0
• Testing with PSIgate physical sciences portal
– embedding eBank UK
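Since the pilot repository exports records as both ebank_dc and oai_dc, a harvester would presumably select the format through the standard OAI-PMH `metadataPrefix` argument. The sketch below only builds the request URLs. The base URL is a placeholder (the slides do not give the real endpoint) and `list_records_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Placeholder endpoint: the real repository address is not given in the slides.
BASE_URL = "https://repository.example.org/oai"


def list_records_url(base_url: str, metadata_prefix: str) -> str:
    """Build an OAI-PMH ListRecords request for one metadata format."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


# One request per export format named on the slide.
ebank_url = list_records_url(BASE_URL, "ebank_dc")
simple_url = list_records_url(BASE_URL, "oai_dc")
```

Keeping oai_dc alongside the richer format means a generic harvester can still collect the records, while an eBank-aware aggregator can ask for the extra chemical fields.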
Some metadata issues
• Using simple and qualified Dublin Core
• Additional chemical information in schema for
harvesting e.g. empirical formula
• Schema contains International Chemical
Identifier (InChI)
• Links to all datasets associated with an
experiment
• Links to individual datasets within an experiment
• Links to eprints (and other published literature)
derived from the data
• Using vocabularies specific to crystallography for now
• Will substitute standard vocabularies when they emerge
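To make the points above concrete, here is a rough sketch of a record that combines Dublin Core elements with chemical fields and links. It is not the eBank metadata schema: the `record` wrapper, the `EBANK` namespace and its `empiricalFormula` and `inchi` elements are assumptions made for illustration, and every URL and value is a placeholder. Only the two Dublin Core namespace URIs are the real ones.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"
EBANK = "http://example.org/ebank/"  # hypothetical namespace, illustration only


def build_record(report_url, formula, inchi, dataset_urls, eprint_urls):
    """Assemble an illustrative XML description of one crystal structure."""
    record = ET.Element("record")
    ET.SubElement(record, f"{{{DC}}}identifier").text = report_url
    ET.SubElement(record, f"{{{DC}}}type").text = "CrystalStructure"
    # Additional chemical information (element names are assumed).
    ET.SubElement(record, f"{{{EBANK}}}empiricalFormula").text = formula
    ET.SubElement(record, f"{{{EBANK}}}inchi").text = inchi
    for url in dataset_urls:  # one link per dataset in the experiment
        ET.SubElement(record, f"{{{DCTERMS}}}references").text = url
    for url in eprint_urls:   # literature derived from the data
        ET.SubElement(record, f"{{{DCTERMS}}}isReferencedBy").text = url
    return record


# Placeholder values (benzene), not taken from any eBank record.
record = build_record(
    "https://repository.example.org/structure/1",
    "C6H6",
    "InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H",
    ["https://repository.example.org/structure/1/raw",
     "https://repository.example.org/structure/1/cif"],
    ["https://eprints.example.org/42"],
)
xml_text = ET.tostring(record, encoding="unicode")
```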
Data flow in eBank
[Figure: deposit and linking. Elements: datasets; crystal structure (data holding); crystal structure report (HTML); institutional repository; ebank_dc record (XML) with dc:type="CrystalStructure" and/or "Collection"; eprint oai_dc record (XML) with dc:type="Eprint" and/or "Text". Link and arrow labels: Deposit, Linking, dc:identifier, dcterms:references, dcterms:isReferencedBy.]
Model input Andy Powell, UKOLN.
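The link labels in the model suggest that data holdings and eprints point at each other. As a sketch under a stated assumption (that a data-holding record lists its eprints under dcterms:isReferencedBy and the eprint record points back with dcterms:references, which is one reading of the figure rather than a confirmed detail), a consistency check over harvested records might look like this. Plain dictionaries stand in for parsed XML, and all identifiers are made up.

```python
from typing import Dict, List, Tuple

# Records keyed by identifier; every identifier here is hypothetical.
records: Dict[str, Dict[str, List[str]]] = {
    "structure/1": {
        "dc:type": ["CrystalStructure"],
        "dcterms:references": ["dataset/1a", "dataset/1b"],
        "dcterms:isReferencedBy": ["eprint/42"],
    },
    "eprint/42": {
        "dc:type": ["Eprint"],
        "dcterms:references": ["structure/1"],
    },
}


def missing_backlinks(recs: Dict[str, Dict[str, List[str]]]) -> List[Tuple[str, str]]:
    """Return (source, target) pairs where source claims isReferencedBy target
    but target is absent or does not list source under dcterms:references."""
    problems = []
    for source, fields in recs.items():
        for target in fields.get("dcterms:isReferencedBy", []):
            back = recs.get(target, {}).get("dcterms:references", [])
            if source not in back:
                problems.append((source, target))
    return problems
```

An empty result means every claimed citation is matched by a link in the other direction. Any pair returned marks a link that an aggregator could not follow both ways.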
Data flow in eBank
[Figure: deposit, linking and harvesting. Elements: datasets; crystal structure (data holding); crystal structure report (HTML); institutional repository; ebank_dc record (XML) with dc:type="CrystalStructure" and/or "Collection"; eprint oai_dc record (XML) with dc:type="Eprint" and/or "Text"; ePrint UK aggregator service; eBank UK aggregator service; subject service. Link and arrow labels: Deposit, Linking, dc:identifier, dcterms:references, dcterms:isReferencedBy, Harvesting OAI-PMH oai_dc (twice), Harvesting OAI-PMH ebank_dc.]
Model input Andy Powell, UKOLN.
Data flow in eBank
[Figure: deposit, linking, harvesting and presentation. Elements: datasets; crystal structure (data holding); crystal structure report (HTML); institutional repository; ebank_dc record (XML) with dc:type="CrystalStructure" and/or "Collection"; eprint oai_dc record (XML) with dc:type="Eprint" and/or "Text"; ePrint UK aggregator service; eBank UK aggregator service; subject service; PSIgate portal. Link and arrow labels: Deposit, Linking, dc:identifier, dcterms:references, dcterms:isReferencedBy, Harvesting OAI-PMH oai_dc, Harvesting OAI-PMH ebank_dc, Searching, linking and embedding (three times).]
Model input Andy Powell, UKOLN.
Currently we are…
• Planning Consultation Workshop – August
• Developing a demonstrator
• Promoting Open Access and Open eData
Archives to international crystallographic
organisations, publishers, learned societies
• e-Science All Hands Meeting, Nottingham
September 2004.
• Seeking funding for a Phase 2 proposal covering a further
12 months
Challenges for the future
Phase 2 plan (1)
• Continue to progress generic data models and
metadata schemas
• Validation against other schemas
– CLRC Scientific Metadata Model version 1.0, 2001 (under revision)
http://www-dienst.rl.ac.uk/library/2002/tr/dltr-2002001.pdf
• Complex digital objects
• Investigate packaging options
– METS
– MPEG 21 DIDL
– ??
• Metadata enhancement - subject keyword additions to
datasets based on knowledge of keywords in related
publications
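The metadata enhancement idea in the last bullet can be sketched as a simple merge: a dataset inherits any subject keywords found on publications linked to it. The function name, the dictionary layout and the sample keywords below are all hypothetical, and a real service would probably need vocabulary control rather than plain string matching.

```python
from typing import Dict, List


def enrich_keywords(dataset_keywords: List[str],
                    related_publications: List[Dict]) -> List[str]:
    """Return the dataset keywords plus any new ones from related publications.

    Existing keywords come first, additions follow in first-seen order,
    and duplicates are detected case-insensitively.
    """
    seen = {keyword.lower() for keyword in dataset_keywords}
    enriched = list(dataset_keywords)
    for publication in related_publications:
        for keyword in publication.get("keywords", []):
            if keyword.lower() not in seen:
                seen.add(keyword.lower())
                enriched.append(keyword)
    return enriched


# Made-up publications linked to one dataset.
publications = [
    {"title": "Hypothetical paper A",
     "keywords": ["crystallography", "hydrogen bonding"]},
    {"title": "Hypothetical paper B",
     "keywords": ["Hydrogen bonding", "polymorphism"]},
]
result = enrich_keywords(["crystallography"], publications)
```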
Phase 2 plan (2)
• Investigate identifiers e.g. International Chemical
Identifier (InChI code)
– Access to scientific (climate) data using DOIs (German
National Library of Science & Technology)
• Explore context sensitive linking: find me
– Datasets by this person
– Journal articles by this person
– Datasets related to this subject
– Journal articles on this subject
– Learning objects by this person
– Learning objects on this subject
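The six "find me" queries share one shape: filter by resource type, then by person or by subject. A minimal sketch of that shape follows. The `Record` fields, the `find_me` function and the sample catalogue are invented for illustration and do not describe how the eBank UK aggregator is actually implemented.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Record:
    title: str
    kind: str                 # "dataset", "article" or "learning object"
    creator: str
    subjects: Tuple[str, ...]


def find_me(records: List[Record], kind: str,
            creator: Optional[str] = None,
            subject: Optional[str] = None) -> List[Record]:
    """Return records of one kind, optionally narrowed by creator and/or subject."""
    return [
        record for record in records
        if record.kind == kind
        and (creator is None or record.creator == creator)
        and (subject is None or subject in record.subjects)
    ]


# Made-up catalogue entries.
catalogue = [
    Record("Structure 1", "dataset", "A. Chemist", ("crystallography",)),
    Record("Paper on structure 1", "article", "A. Chemist", ("crystallography",)),
    Record("Intro module", "learning object", "B. Tutor", ("crystallography",)),
]

datasets_by_person = find_me(catalogue, "dataset", creator="A. Chemist")
learning_on_subject = find_me(catalogue, "learning object", subject="crystallography")
```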
Phase 2 plan (3)
• Workflow embedding
– Expand to include SMART e-Lab metadata e.g.
sample preparation
• e-Learning embedding and pedagogic
evaluation
– MChem course
– Chemical informatics course
• Expand into other physical sciences
• Feasibility study in a related domain: biosciences
[Figure: the combined scholarly knowledge cycle, shown again with eBank UK as the aggregator service linking research & e-Science workflows, repositories, publications, presentation services and learning & teaching workflows]
Potential longer term impact
1. Track data, information and workflows in e-research
and scholarly communications – knowledge audit??
2. Validate the accuracy and authenticity of derived
works – ideas audit??
3. Facilitate explicit referencing and acknowledgment of
original contributors – intellectual integrity??
4. Raise standards associated with publication of
research outputs – academic publishing rigour??
5. Implement open access to and dissemination of data
and information – enhance the research process??
6. Give students links to original data underpinning
published works – enhance the learning process??
Thank you.
Questions?