Adding Value to Data & Information

Adding Value to Data and Information: Moving towards a Science Commons?
Dr Liz Lyon
Director, UKOLN
Science Commons Workshop, Brussels, September 2006.
UKOLN is supported by:
This work is licensed under a Creative Commons Licence
Attribution-ShareAlike 2.0
www.ukoln.ac.uk
A centre of expertise in digital information management
Scholarship today?
OA landscape
http://www.flickr.com/photos/dmclean/239158788/in/photostream/
15 September 2006
Architecture of Participation?
Reference datasets as infrastructure?
Datacentric 2020 vision
(Very simple) e-Research Cycle
[Diagram: stages of the cycle, linked by "Data processing" steps]
• Formulate hypothesis / ideas, test, experiment, observe: data creation, collection & capture
• Data management, storage & validation: description, deposit, self-archiving, preservation, certification
• Scholarly communications: data disclosure, publication, citation, discovery, re-use
• Adding value: data linking, annotation, visualisation, simulation
• (New) knowledge extraction: data mining, modelling, analysis, synthesis
• Other diagram labels: e-Infrastructure, Open access, Collaboration
Understanding the research process: workflows
• UK JISC-funded activity
• Project StORe: Source-to-Output Repositories (Edinburgh)
• RepoMMan: Repository Metadata and Management (Hull)
– Primary data : research publications
– Survey questionnaire, activity diagrams
e-Scientist desktop?
Slide: Carole Goble
Data capture
Deposit scenario (…part of….)
1. Produce strategy for synthesis (=idea)
2. Submit plan to SmartTea system (incl. identifiers)
3. Retrieve and follow instructions (sub-workflow?)
4. Experimental synthesis metadata automatically recorded on instruments (Smart Lab)
5. Create record for synthesised sample (+ proposed chemical identifier) in R4L laboratory data management system
6. Run spectral analyses on sample capturing further analysis metadata (incl. time-stamp, analysis software version, researcher details etc.)
7. Save spectrum in native and common formats
8. Invoke R4L data capture service and deposit files + metadata in laboratory repository…
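Steps 5 to 8 of the scenario can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, field names and file names below are hypothetical and are not taken from the R4L or SmartTea systems, whose actual interfaces are not described on the slide.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
import json


@dataclass
class AnalysisMetadata:
    """Metadata captured with one spectral analysis (step 6). Hypothetical fields."""
    researcher: str
    software_version: str
    # Time-stamp is recorded automatically when the metadata object is created.
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


@dataclass
class SampleRecord:
    """Record for a synthesised sample (step 5). Hypothetical fields."""
    sample_id: str
    proposed_identifier: str
    files: list = field(default_factory=list)
    analyses: list = field(default_factory=list)

    def add_spectrum(self, native_path, common_path, metadata):
        # Step 7: keep both the native-format and the common-format file.
        self.files.extend([native_path, common_path])
        self.analyses.append(metadata)


def build_deposit_package(record):
    """Step 8: serialise files + metadata into one package ready for deposit."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)


# Placeholder values for illustration only.
record = SampleRecord(sample_id="sample-001", proposed_identifier="example-identifier")
record.add_spectrum(
    "sample-001.native",
    "sample-001.common",
    AnalysisMetadata(researcher="A. Researcher", software_version="1.0"),
)
package = build_deposit_package(record)
print(package)
```

The point of the sketch is that the metadata travels with the files in a single package, so nothing has to be re-keyed at deposit time.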
RAW DATA
DERIVED DATA
RESULTS DATA
The R4L Repository
Create new compound
Add experiment data and metadata
Deposit
Search / Browse
Slide: Simon Coles
eBank UK Project
http://www.ukoln.ac.uk/projects/ebank-uk/
• Promoting open access data in an institutional repository
• Adding value through linking from data to derived publication
• Embedding data service in learning workflows: pedagogy
• UKOLN (lead), University of Southampton, University of Manchester
[Diagram: e-Crystals Federation model]
• Data creation & capture in “Smart lab”
• Laboratory repository
• Institutional data repositories
• Data curation & preservation: databases & databanks
• Aggregator services
• Presentation services: portals
• Data discovery, linking, citation
• Data analysis, transformation, mining, modelling
• e-Research workflows
• Publishers: peer-review journals, conference proceedings
• Validation (Chemistry Central)
• Flow labels: Deposit, Harvest, Search / harvest, Validation, Publication, Linking / citation
Digital repositories, OA & preservation
• Long-term access: trust, responsibility, policy
• Audit Checklist for the Certification of Trusted Digital Repositories (draft), Research Libraries Group-NARA Task Force, 2005
• Self-certification: DINI-Zertifikat
• UK Digital Curation Centre: advice, tools & services
• RepInfo Registry http://www.dcc.ac.uk/
• EU CASPAR Integrated Project http://www.casparpreserves.info/pages/1/index.htm
• Task Force on the Permanent Access to the Records of Science http://tfpa.kb.nl/
Data, metadata and interdisciplinary discovery
• Validation, publication & discovery of data models & schema
• Metadata packaging standards
– METS, MPEG-21 DIDL
– Complex object model?
• Semantic descriptions
– Formal high-level and domain ontologies
• ePrints DC Application Profile http://www.ukoln.ac.uk/repositories/digirep/index/Eprints_Application_Profile
• eBank Application Profile crystallography data http://www.ukoln.ac.uk/projects/ebankuk/schemas/
• UK Intute IR search service (eprints)
• Informal social network approaches “folksonomies”
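As a rough illustration of what a simple metadata description looks like in practice, the sketch below builds a flat Dublin Core record with Python's standard library. It uses only the basic Dublin Core element namespace; it is not the ePrints DC Application Profile or the eBank Application Profile, and the wrapper element name and all field values are placeholders chosen for the example.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15 basic Dublin Core elements.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Build a flat record from (element, value) pairs, e.g. ("title", "...")."""
    root = ET.Element("record")  # hypothetical wrapper element
    for element, value in fields:
        child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
        child.text = value
    return root


# Placeholder values for illustration only.
record = dublin_core_record([
    ("title", "Example crystal structure dataset"),
    ("creator", "A. Researcher"),
    ("type", "Dataset"),
    ("identifier", "http://example.org/dataset/1"),
])
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

An application profile would go further than this, constraining which elements are required and which vocabularies their values may come from.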
Persistent identifiers for data citation
• How will they be used? We need use cases: depositor, author, service provider, researcher, publisher?
• Schemes: DOI, Handle, ARK, PURL
• Publication & citation of scientific primary data project: National Library for Science & Technology (TIB), University of Hanover, Germany. STD-DOI Project: DOI registry for datasets http://www.std-doi.de
• eBank exemplar
• DOIs from TIB http://dx.doi.org/10.1594/ecrystals.chem.soton.ac.uk/145
• Data citation policy http://ecrystals.chem.soton.ac.uk/rights.html
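The eBank DOI above shows the general pattern: a DOI is a prefix and a suffix separated by the first "/", and a citable link is formed by appending the DOI to a resolver address. A minimal sketch, assuming the dx.doi.org resolver used on the slide; the function names are hypothetical helpers, not part of any DOI library.

```python
DOI_RESOLVER = "http://dx.doi.org/"  # resolver address as used on the slide


def normalise_doi(doi):
    """Strip whitespace and an optional 'doi:' label, then sanity-check the form."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {doi!r}")
    return doi


def split_doi(doi):
    """Return (prefix, suffix); the suffix may itself contain '/' characters."""
    prefix, suffix = normalise_doi(doi).split("/", 1)
    return prefix, suffix


def doi_to_url(doi):
    """Turn a DOI into a resolvable link suitable for a data citation."""
    return DOI_RESOLVER + normalise_doi(doi)


print(doi_to_url("doi:10.1594/ecrystals.chem.soton.ac.uk/145"))
print(split_doi("10.1594/ecrystals.chem.soton.ac.uk/145"))
```

Splitting only on the first "/" matters here because the eBank suffix embeds a host name and a record number separated by a further "/".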
Discovering data:
• Domain identifier: International Chemical Identifier (InChI) code
• Google molecule using InChI
Slide from Simon Coles
Coles, S.J., Day, N.E., Murray-Rust, P., Rzepa, H.S., Zhang, Y., Org. Biomol. Chem., 2005, (10), 1832-1834. DOI: 10.1039/b502828k
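Because an InChI is a plain text string derived from the structure, it can serve as an exact-match search key, which is what makes web-search-style discovery of a molecule possible. The sketch below is an assumption-laden illustration: the record layout and the example.org URLs are hypothetical, and the InChI strings are the standard identifiers for benzene and water, used only as sample keys.

```python
from collections import defaultdict

# Hypothetical dataset records; URLs are placeholders.
RECORDS = [
    {"inchi": "InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H", "url": "http://example.org/data/1"},
    {"inchi": "InChI=1S/H2O/h1H2", "url": "http://example.org/data/2"},
    {"inchi": "InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H", "url": "http://example.org/data/3"},
]


def build_inchi_index(records):
    """Map each InChI string to every dataset URL that describes that molecule."""
    index = defaultdict(list)
    for record in records:
        index[record["inchi"]].append(record["url"])
    return dict(index)


def find_datasets(index, inchi):
    """Exact-match lookup; returns an empty list when the molecule is unknown."""
    return index.get(inchi, [])


index = build_inchi_index(RECORDS)
print(find_datasets(index, "InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H"))
```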
Adding value: repository services
• Tools: for deposit, normalisation, manipulation, transformation…
• Linking, annotation, visualisation
• Aggregators: generic, (sub-)disciplinary
Knowledge extraction:
• Mining (data, text, structures)
• Modelling (economic, climate, mathematical, biological…)
• Analysis (statistical, lexical, gene…)
Adding value: eBank linking data to publications
New forms of publication: integration of data and journals
Linking research to learning - embedding eBank aggregator service in a science portal for student learners
• MChem course
• Assess role in Undergraduate Chemical Informatics courses
• Pedagogic evaluation
• Report to be published.
NaCTeM
http://www.nactem.ac.uk/
Emerging tools: TerMine, GENIA, Cafetiere
Nature 23 March 2006
OTMI: Open Text Mining Interface
Avian flu outbreaks mashup - Nature January 2006
Data from FAO, WHO… + Google Earth
Thank you.
UKOLN receives core funding from the Joint Information Systems
Committee (JISC) and the Museums, Libraries & Archives Council (MLA)
and is based at the University of Bath, UK.