DCC Presentation
This work is licensed under a Creative Commons Licence
Attribution-ShareAlike 2.0
An Introduction to the UK Digital Curation Centre
Dr Liz Lyon
DCC Associate Director, Outreach
Director, UKOLN, University of Bath, UK
CURL/SCONUL Workshop
December 2005
Funded by:
Digital | Curation | Centre
Overview
• About the Digital Curation Centre
– Organisation and structure
• What is digital curation?
– e-Research cycle
• DCC activities
– Development activity
– Research agenda
– Advisory services
– Outreach programme
UK Digital Curation Centre
• Development activities
• Research agenda
• Delivering services
• Outreach Programme
http://www.dcc.ac.uk/
DCC people (some of them…)
• Management & Co-ordination
– Director Chris Rusbridge (University of Edinburgh)
• Community Support & Outreach
– Led by Dr Liz Lyon (UKOLN, University of Bath)
• Service Definition & Delivery
– Led by Professor Seamus Ross (HATII, University of Glasgow)
• Development
– Led by Dr David Giaretta (Astronomical Software & Services, CCLRC)
• Research
– Led by Professor Peter Buneman (University of Edinburgh)
What is digital curation?
• Data preservation (static): for later use?
• Data curation (dynamic): in use now (and the future)?
“maintaining and adding value to a trusted body of digital information for current and future use”
(Very simple) e-Research Cycle and Data Curation
• Formulate hypothesis / ideas, test, experiment, observe: data creation, collection & capture
• Data management, storage & validation: description, deposit, self-archiving, preservation, certification
• Scholarly communications: data disclosure, publication, citation, discovery, re-use
• Adding value: data linking, annotation, visualisation, simulation
• (New) knowledge extraction: data mining, modelling, analysis, synthesis
Data processing takes place between each stage. At the centre of the cycle: e-Infrastructure, open access, collaboration.
Engineering Product Information
EPSRC Grand Challenge Project, Prof Chris McMahon, University of Bath
– Access Grid
– Collaborative telematic art
– Modify spaces for performers
– Interplay: Hallucinations
Data capture & integration into research workflows
• R4L (Repository for the Laboratory) Project (JISC-funded): automated data capture from instrumentation, deposit of results (chemistry)
• SMART TEA: electronic laboratory notebook + annotations
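As an illustration of automated capture and deposit, the sketch below wraps a raw instrument reading with capture metadata and a SHA-256 fixity value, giving a record that could be sent to a repository. It is a minimal sketch under stated assumptions: the field names (`instrument_id`, `experiment`, `captured_at`, `sha256`) and the record layout are hypothetical and are not R4L's actual deposit format.

```python
import hashlib
import json
from datetime import datetime, timezone


def make_deposit_record(instrument_id: str, raw_output: bytes, experiment: str) -> dict:
    """Wrap raw instrument output with capture metadata (hypothetical schema)."""
    return {
        "instrument_id": instrument_id,  # which instrument produced the data
        "experiment": experiment,        # link back to the lab notebook entry
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "size_bytes": len(raw_output),
        # Fixity value: lets the repository check the bytes were not altered in transit
        "sha256": hashlib.sha256(raw_output).hexdigest(),
    }


def verify(record: dict, raw_output: bytes) -> bool:
    """Recompute the checksum on receipt and compare it with the deposited value."""
    return hashlib.sha256(raw_output).hexdigest() == record["sha256"]


if __name__ == "__main__":
    data = b"2theta,intensity\n10.0,153\n10.1,160\n"  # made-up instrument output
    record = make_deposit_record("diffractometer-01", data, "notebook-entry-17")
    print(json.dumps(record, indent=2))
    print(verify(record, data))  # True
```

Capturing the checksum at the instrument, rather than at the repository, means any corruption between lab and archive can be detected on deposit.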
The scholarly knowledge cycle. Liz Lyon, Ariadne, July 2003.
© Liz Lyon (UKOLN, University of Bath), 2005
Components shown in the diagram:
• Data creation / capture / gathering: laboratory experiments, Grids, fieldwork, surveys, media
• Research & e-Science workflows; Learning & Teaching workflows
• Data analysis, transformation, mining, modelling; learning object creation, re-use
• Deposit / self-archiving into repositories: institutional, e-prints, subject, data, learning objects; disciplinary data-centres
• Validation and publication: peer-reviewed publications (journals, conference proceedings); quality assurance bodies
• Harvesting metadata; searching, harvesting, embedding
• Aggregator services: national, commercial
• Presentation services: subject, media-specific, data, commercial portals; institutional presentation services (portals, Learning Management Systems, u/g, p/g courses, modules)
• Resource discovery, linking, embedding
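The harvesting step between repositories and aggregator services is commonly implemented with the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), although the diagram does not name a protocol. The sketch below builds a `ListRecords` request and pulls Dublin Core identifiers and titles out of a response. The base URL and the embedded sample response are hypothetical; a real harvester would also follow `resumptionToken`s and handle protocol errors.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH ListRecords request URL."""
    return base_url + "?" + urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})


def parse_records(xml_text: str) -> list[tuple[str, str]]:
    """Return (identifier, title) pairs from an oai_dc ListRecords response."""
    root = ET.fromstring(xml_text)
    pairs = []
    for rec in root.iterfind(".//oai:record", NS):
        ident = rec.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        title = rec.findtext(".//dc:title", default="", namespaces=NS)
        pairs.append((ident, title))
    return pairs


# Hypothetical response from a data repository
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:dataset-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example crystal structure dataset</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("http://repository.example.org/oai"))
    print(parse_records(SAMPLE))
```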
eBank UK Project
http://www.ukoln.ac.uk/projects/ebank-uk/
• Two key themes:
– Open access to datasets
– Linking research data to publications and to learning
• UKOLN, University of Southampton, University of Manchester
• e-Science application ‘Combechem’: Grid-enabled combinatorial chemistry + National Crystallography Service
• Resource Discovery Network / PSIgate physical sciences portal
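eBank's two themes come down to keeping links between datasets, publications and learning resources so that each can be reached from the others. A minimal sketch of such a bidirectional link index follows; the identifiers and relation names are hypothetical placeholders, not eBank's data model.

```python
from collections import defaultdict


class LinkIndex:
    """Bidirectional links between research outputs (hypothetical model)."""

    def __init__(self) -> None:
        # item id -> set of (relation, other item id); every link is stored both ways
        self._links: dict[str, set[tuple[str, str]]] = defaultdict(set)

    def link(self, source: str, relation: str, target: str, inverse: str) -> None:
        """Record source --relation--> target and the inverse link back."""
        self._links[source].add((relation, target))
        self._links[target].add((inverse, source))

    def related(self, item: str, relation: str) -> list[str]:
        """Return ids linked from `item` by `relation`, sorted for stable output."""
        return sorted(other for rel, other in self._links.get(item, set()) if rel == relation)


if __name__ == "__main__":
    idx = LinkIndex()
    idx.link("dataset:xtal-42", "isSupplementTo", "article:chem-2005-17", "hasSupplement")
    idx.link("dataset:xtal-42", "isUsedBy", "course:chem101-unit3", "uses")
    print(idx.related("article:chem-2005-17", "hasSupplement"))  # ['dataset:xtal-42']
    print(idx.related("dataset:xtal-42", "isUsedBy"))            # ['course:chem101-unit3']
```

Storing the inverse explicitly lets a reader of an article find the data, and a student in a course find both, without a reverse search.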
A data repository entry
Access to the underlying data: complex objects
ecrystals.chem.soton.ac.uk
Data descriptions
• Validation, publication & discovery
of data models & schema
• Managing complex objects
• Metadata packaging standards
– METS
– MPEG 21 DIDL
• Semantic descriptions
– Formal controlled vocabularies
– High-level and domain ontologies
– Inter-disciplinary discovery
• Informal approaches: Web 2.0 “folksonomies”
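To make packaging a complex object concrete, the sketch below builds a minimal METS document that lists the files of one dataset and groups them into a single logical object. The file IDs, MIME types and URLs are hypothetical; a real package would also carry descriptive (`dmdSec`) and administrative (`amdSec`) metadata, and MPEG-21 DIDL would be an alternative wrapper.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS)
ET.register_namespace("xlink", XLINK)


def build_mets(label: str, files: list[tuple[str, str, str]]) -> ET.Element:
    """Build a minimal METS package; files are (file_id, mime_type, url) triples."""
    root = ET.Element(f"{{{METS}}}mets")
    file_sec = ET.SubElement(root, f"{{{METS}}}fileSec")
    group = ET.SubElement(file_sec, f"{{{METS}}}fileGrp", USE="original")
    for file_id, mime, url in files:
        f = ET.SubElement(group, f"{{{METS}}}file", ID=file_id, MIMETYPE=mime)
        # FLocat points at where the bytes actually live
        ET.SubElement(f, f"{{{METS}}}FLocat", {"LOCTYPE": "URL", f"{{{XLINK}}}href": url})
    # The structMap (mandatory in METS) ties the files together as one logical object
    struct = ET.SubElement(root, f"{{{METS}}}structMap")
    div = ET.SubElement(struct, f"{{{METS}}}div", TYPE="dataset", LABEL=label)
    for file_id, _, _ in files:
        ET.SubElement(div, f"{{{METS}}}fptr", FILEID=file_id)
    return root


if __name__ == "__main__":
    pkg = build_mets("Crystal structure 42", [
        ("F1", "chemical/x-cif", "http://repository.example.org/42/structure.cif"),
        ("F2", "image/png", "http://repository.example.org/42/thumbnail.png"),
    ])
    print(ET.tostring(pkg, encoding="unicode"))
```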
Trusted digital repositories
• Audit Checklist for Certification
• Draft Report published August 2005
• Research Libraries Group RLG-NARA Taskforce
• Defined criteria under 4 categories:
– Organisation
– Functions, processes & procedures
– Designated community & usability
– Technologies & technical infrastructure
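A self-assessment against a checklist of this kind can be organised by the four categories. The sketch below records which criteria a repository meets and reports coverage per category. The criterion IDs are hypothetical placeholders, not the actual RLG-NARA criteria.

```python
from dataclasses import dataclass

CATEGORIES = (
    "Organisation",
    "Functions, processes & procedures",
    "Designated community & usability",
    "Technologies & technical infrastructure",
)


@dataclass
class Criterion:
    criterion_id: str  # hypothetical identifier, e.g. "ORG-1"
    category: str      # one of CATEGORIES
    met: bool


def coverage(criteria: list[Criterion]) -> dict[str, float]:
    """Fraction of criteria met in each category.

    A category with no criteria assessed reports 0.0, so an unassessed
    category is not mistaken for a fully compliant one.
    """
    result = {}
    for cat in CATEGORIES:
        in_cat = [c for c in criteria if c.category == cat]
        result[cat] = sum(c.met for c in in_cat) / len(in_cat) if in_cat else 0.0
    return result


if __name__ == "__main__":
    checklist = [
        Criterion("ORG-1", "Organisation", True),
        Criterion("ORG-2", "Organisation", False),
        Criterion("TECH-1", "Technologies & technical infrastructure", True),
    ]
    for cat, frac in coverage(checklist).items():
        print(f"{cat}: {frac:.0%}")
```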
OAIS Reference Model
DCC: Development
• “DCC Approach to Digital Curation” based on the Reference Model for an Open Archival Information System (OAIS), ISO standard 14721:
– Monitoring international standards
– Development of a Representation Information
(RI) registry/repository (DCC-RR)
– Recommendations for tools and methods for
generating Representation Information
– Creating test-beds for digital curation tools
Development info: see http://dev.dcc.ac.uk for details of the Wiki and email list (open to all)
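In OAIS, Representation Information (RI) may itself need further RI before it can be understood, forming a network that stops where the Designated Community's existing knowledge is enough. The sketch below resolves that network from a toy registry. The registry entries, the identifiers and the `known` set are hypothetical, and this is not the DCC-RR interface.

```python
# Hypothetical registry: RI identifier -> RI identifiers it depends on
REGISTRY: dict[str, list[str]] = {
    "fits-image": ["fits-standard"],
    "fits-standard": ["pdf-1.4", "ascii"],
    "pdf-1.4": ["ascii"],
    "ascii": [],
}


def ri_network(item: str, known: set[str]) -> list[str]:
    """Return the RI needed to interpret `item`, depth-first, without repeats.

    Traversal stops at anything in `known`, which stands in for the
    Designated Community's Knowledge Base.
    """
    needed: list[str] = []
    seen: set[str] = set()

    def visit(ri_id: str) -> None:
        for dep in REGISTRY.get(ri_id, []):
            if dep in known or dep in seen:
                continue
            seen.add(dep)
            needed.append(dep)
            visit(dep)

    visit(item)
    return needed


if __name__ == "__main__":
    print(ri_network("fits-image", known={"ascii"}))  # ['fits-standard', 'pdf-1.4']
```

A community that already reads ASCII needs less RI than one that does not; the same registry serves both by changing `known`.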
Persistent identifiers for data citation
• Identify use cases: depositor, author, service provider,
reader, publisher, ?
• Schemes: DOI, Handle, ARK, PURL
• Global identification: express as http URIs
• Added value services: CrossRef, resolution service,
integration (Globus), look-up service
• Domain identifiers: e.g. International Chemical Identifier (InChI) codes
• Google molecules using InChIs demo:
Peter Murray-Rust, University of Cambridge
• DCC Workshop June 2005 Glasgow
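Expressing each scheme's identifiers as HTTP URIs usually means prefixing them with a public resolver. The sketch below does this for DOI, Handle and ARK, and passes PURLs through unchanged because they are already HTTP URIs. The resolver hosts (`https://doi.org/`, `https://hdl.handle.net/`, `https://n2t.net/`) are current public resolvers chosen as assumptions, not ones named in the talk, and the Handle and ARK values in the checks are made up.

```python
def to_http_uri(identifier: str) -> str:
    """Map a persistent identifier string to an HTTP(S) URI via a public resolver."""
    ident = identifier.strip()
    lower = ident.lower()
    if lower.startswith(("http://", "https://")):
        return ident  # PURLs (and other URIs) are already HTTP URIs
    if lower.startswith("doi:"):
        return "https://doi.org/" + ident[4:]
    if lower.startswith("hdl:"):
        return "https://hdl.handle.net/" + ident[4:]
    if lower.startswith("ark:"):
        return "https://n2t.net/" + ident  # n2t.net resolves the full "ark:/..." form
    raise ValueError(f"unrecognised identifier scheme: {identifier!r}")


if __name__ == "__main__":
    print(to_http_uri("doi:10.1594/GFZ/ICDP/KTB/ktbgeoch-gaschr-p"))
    print(to_http_uri("hdl:1234/5678"))
```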
One approach to data citation using DOIs
• Publication & citation of scientific primary data project: STD-DOI Project, National Library for Science & Technology (TIB), University of Hanover, Germany
http://www.std-doi.de
• DOI registry for datasets
• Data publication agents: World Data Center Climate,
GeoForschungsZentrum Potsdam
• Data requirements: quality control, long-term curation,
use DOI resolver
• Exemplar data citation:
– Kamm, H; Machon, L; Donner, S (2004): Gas chromatography (KTB Field Lab), GFZ Potsdam. doi:10.1594/GFZ/ICDP/KTB/ktbgeoch-gaschr-p
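The exemplar follows a pattern of authors, year, title, publisher and DOI. A minimal sketch that assembles a citation in that pattern from its parts is below; the field names are assumptions inferred from this single example, not the STD-DOI Project's specification.

```python
from dataclasses import dataclass


@dataclass
class DataCitation:
    authors: list[str]  # "Surname, Initial" strings
    year: int
    title: str
    publisher: str
    doi: str            # bare DOI, without the "doi:" prefix

    def __str__(self) -> str:
        # Mirrors the exemplar: authors joined by "; ", year in brackets, "doi:" prefix
        return f"{'; '.join(self.authors)} ({self.year}): {self.title}, {self.publisher}. doi:{self.doi}"


if __name__ == "__main__":
    c = DataCitation(["Kamm, H", "Machon, L", "Donner, S"], 2004,
                     "Gas chromatography (KTB Field Lab)", "GFZ Potsdam",
                     "10.1594/GFZ/ICDP/KTB/ktbgeoch-gaschr-p")
    print(c)
```

Keeping the parts structured, rather than storing only the finished string, lets the same record feed other citation styles and a DOI resolver.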
Adding value: eBank linking data to publications
Linking research to learning: embedding the eBank aggregator service in a science portal for student learners
Adding value through annotation
DCC Research at the University of Edinburgh
• Scientific databases: Annotation scoping report
• AstroDAS: distributed annotation servers in astronomy
• New annotation model + prototype: top-ranked demonstration at a recent database conference
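One way to read "annotation" for scientific databases is to keep annotations outside the curated data, attached to a record and optionally to one field, with a note of who made them and when. The sketch below is a minimal in-memory version of that idea; it is not the Edinburgh annotation model or AstroDAS, and the record keys are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Annotation:
    record_id: str          # key of the annotated record in the source database
    field_name: str | None  # None means the annotation applies to the whole record
    text: str
    author: str
    created: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class AnnotationStore:
    """Annotations held separately, so the curated data itself is never modified."""

    def __init__(self) -> None:
        self._items: list[Annotation] = []

    def add(self, ann: Annotation) -> None:
        self._items.append(ann)

    def for_record(self, record_id: str, field_name: str | None = None) -> list[Annotation]:
        """All annotations on a record, or only those on `field_name` if given."""
        return [a for a in self._items
                if a.record_id == record_id
                and (field_name is None or a.field_name == field_name)]


if __name__ == "__main__":
    store = AnnotationStore()
    store.add(Annotation("obj-1042", "redshift", "Value disputed; see re-observation.", "observer-a"))
    store.add(Annotation("obj-1042", None, "Possible blend with neighbour.", "observer-b"))
    for a in store.for_record("obj-1042"):
        print(a.author, a.field_name, a.text)
```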
DCC Research agenda
• Publishing & integrating scientific databases
• ‘Archiving’ past states of volatile databases
• Database provenance and annotation
• Organisational dynamics of trusted repositories
• Automating metadata extraction
• Cost-benefit analysis of data curation
• Rights and responsibilities
– “Public domain, public interest, public funding” paper, Waelde & McGinley
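Archiving past states of a volatile database need not mean storing a full copy of every release. The sketch below keeps, for each key, each value together with the range of versions in which it held, so any past state can be rebuilt. This interval-based merging is an illustrative technique chosen for the example; the talk does not describe the DCC's method, and the keys and values are made up.

```python
class VersionedArchive:
    """Archive successive database snapshots as per-key value histories."""

    def __init__(self) -> None:
        # key -> list of [value, first_version, last_version], last inclusive
        self._history: dict[str, list[list]] = {}
        self.latest = 0

    def add_snapshot(self, snapshot: dict[str, str]) -> int:
        """Merge the next version of the database; returns its version number."""
        self.latest += 1
        v = self.latest
        for key, value in snapshot.items():
            hist = self._history.setdefault(key, [])
            # Extend the current interval if the value is unchanged since the previous version
            if hist and hist[-1][0] == value and hist[-1][2] == v - 1:
                hist[-1][2] = v
            else:
                hist.append([value, v, v])
        return v

    def state_at(self, version: int) -> dict[str, str]:
        """Reconstruct the database as it was at `version`."""
        return {key: value
                for key, hist in self._history.items()
                for value, first, last in hist
                if first <= version <= last}


if __name__ == "__main__":
    arch = VersionedArchive()
    arch.add_snapshot({"P1": "kinase", "P2": "unknown"})
    arch.add_snapshot({"P1": "kinase", "P2": "phosphatase"})
    arch.add_snapshot({"P2": "phosphatase"})  # P1 deleted in version 3
    print(arch.state_at(1))  # {'P1': 'kinase', 'P2': 'unknown'}
    print(arch.state_at(3))  # {'P2': 'phosphatase'}
```

Unchanged values cost nothing extra per release, which matters when most of a large database is stable between versions.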
Facilitate “post-processing” and knowledge extraction
Enable the acquisition of newly-derived information and knowledge
• Run complex algorithms over primary datasets
• Mining (data, text, structures)
• Modelling (economic, climate, mathematical, biological)
• Analysis (statistical, lexical, pattern matching, gene)
DCC Case Study published: Wide Field Astronomy Unit
Supporting the community
• DCC Outreach & Services:
– [email protected] (legal and technical guidance)
– Curation Manual (45 chapters planned), Briefing Papers
– Workshops: Future-proofing Institutional Web sites, Jan 19-20, London
– Information Days: regional
– 1st International DCC Conference, Bath, Sept 2005
– PV2005, November, Edinburgh
– 2nd International Conference, November 2006, Glasgow (tbc)
• International Journal of Digital Curation: www.ijdc.net
• Peer-reviewed; Editorial Board
• Peter Buneman, Editor (research)
• Richard Waller, Production editor
• Papers for submission are very welcome!
• 1st issue soon…
Associates Network
Goals
Develop understanding, share best practice, advance
research, promote recognition, develop consensus
Membership
International groups, national bodies, industry partners, funders, research groups, HEIs, FEIs, individuals…
Benefits
Early access to R&D outputs, advisory services, training,
input to definition and design, community participation
Discussion Forum www.dcc.ac.uk
Please join us!
Developing skills & collaboration
• NSF Report: “Data scientist”
• Develop hybrid skills
• Embed in u/g, p/g curriculum
• Facilitate community collaboration:
– Researchers
– Data centres
– Libraries & archives
• New roles?
• Achieve cultural change
Thank you.
Questions?
[email protected]
Join the DCC Associates Network at
www.dcc.ac.uk