Transcript Powerpoint

a centre of expertise in data curation and preservation
Looking to the longer term: some perspectives on data curation and preservation
Dr Liz Lyon,
DCC Associate Director Outreach
Director, UKOLN, University of Bath, UK
This work is licensed under a Creative Commons Licence
Attribution-ShareAlike 2.0
Funded by:
IMechE Workshop, London, 26th September 2006
About UKOLN
• “a centre of expertise in digital information management”
• Funding: Joint Information Systems Committee (JISC) +
Museums, Libraries & Archives Council (MLA)
• Portfolio of R&D projects: Delos, DRIVER, Grand Challenge
• 29+ staff based at the University of Bath
• Inform the library, information, education and cultural
heritage communities
• Policy, advocacy at national level, build innovative Web-based systems & services, R&D, e-journal Ariadne,
workshops and conferences.
• http://www.ukoln.ac.uk/
Acknowledgement: Alex Ball,
Grand Challenge Project
UK Digital Curation Centre
• Digital Curation Centre
• Funded by JISC & EPSRC
• Development activities
• Research agenda
• Delivering services
• Outreach Programme
• http://www.dcc.ac.uk/
Overview
• Data curation and digital preservation issues
• Draw on research and scholarship
perspectives
• Data / information flows and the “business
process”
• UK Digital Curation Centre activities
“maintaining and adding value to a trusted
body of digital information for current and
future use”
Reference datasets as infrastructure?
Data-centric 2020 vision
(Very simple) Product Research Cycle & Data Curation
• (New) knowledge extraction: data mining, modelling, analysis, synthesis
• Formulate ideas / hypothesis, test, experiment, observe, design: data creation, collection & capture
• Adding value: data linking, annotation, visualisation, simulation
• Data management, storage & validation: description, deposit, self-archiving, preservation, certification
• Scholarly communications & business transactions: data disclosure, publication, citation, discovery, re-use
• Data processing (label repeated around the cycle)
• e-Infrastructure; Open ?? access; Collaboration
• RepoMMan: Repository Metadata and
Management (Hull) using WS-BPEL
• Are your engineering workflows
identified and described?
e-Scientist desktop?
Slide: Carole Goble
Workflow
Diagram: maintenance diagnosis workflow (DAME signal processing workflows using Grid Services).
Actors: Airport Maintenance Engineer; DS&S Maintenance Analyst (Fleet Manager); Rolls Royce Domain Expert.
Labels include: Aircraft Lands; Visual Inspection; Brief Diagnosis / Prognosis; Check Diagnoses; Diagnosis Result; Detailed Diagnosis / Prognosis; Request Information; Provide Information; Analyst Decision; Detailed Analysis; Request Further Details; Provide Further Details; Expert Decision; Maintenance Procedure; Maintenance Result; Sign-off; Release Engine.
Guards: [unknown], [known], [Clear], [information required], [diagnosis], [diagnosis complete], [fault resolved], [fault unresolved].
Research outputs in institutional repositories: engineering
“JISC Vision”: a global landscape of
federated repositories
• Multi-disciplinary, cross-sectoral
• e-Framework and Information Environment context
• National, institutional
• Define common + domain-specific + repository “services”
• Different platforms
• Many format types: data, eprints, images, geospatial
• Interoperability based on open standards, software tools
Diagram labels: repository (several); fusion layer ‘repository federator’; portal (several); heterogeneous - metadata formats, content formats, identifiers, packaging standards; homogeneous - metadata formats, content formats, identifiers, packaging standards
From Andy Powell: http://www.ukoln.ac.uk/distributed-systems/jiscie/arch/presentations/jiie-jcs-2005/
Pilot Engineering Repository Xsearch PerX
http://www.engineering.ac.uk/
Interoperability???
STEP (ISO 10303)
Repositories and OAIS Reference Model
“an archive consisting of an organisation of people and systems that has
accepted the responsibility to preserve information and make it available for
a Designated Community ... an identified group of potential consumers who
should be able to understand a particular set of information”
Assuring permanence: digital preservation
• Trusted DR Audit Checklist for Certification: Draft, Research Libraries Group-NARA Taskforce, 2005. Defined criteria:
– Organisation
– Functions, processes & procedures
– Designated community & usability
– Technologies & technical infrastructure
• Revised Checklist based on feedback and pilot audits (KB, BADC)
• Self-certification: DINI-Zertifikat requirements & recommendations:
– Server policy / Guidelines
– Author support
– Legal issues
– Authenticity and integrity
– Cataloguing
– Access statistics
– Long-term sustainability
• Has your repository / PLM been audited?
Interdisciplinary discovery
• Validation, publication & discovery of
data models & schema
• Harmonisation and normalisation of
metadata and semantics
• Packaging standards:
METS, MPEG-21 DIDL
• Formal high-level and domain
ontologies
• ePrints DC Application Profile
http://www.ukoln.ac.uk/repositories/digirep/index/Eprints_Application_Profile
• eBank Application Profile
crystallography data
http://www.ukoln.ac.uk/projects/ebankuk/schemas/
• What data models and metadata
schema are in place?
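The harmonisation bullet above can be illustrated with a small metadata crosswalk. This is only a minimal sketch under stated assumptions: the two source record layouts ("cad_archive", "test_rig") and all their field names are hypothetical, invented for illustration; only the target element names (title, creator, date, identifier) come from simple Dublin Core. It is not the ePrints or eBank application profile.

```python
# Hypothetical crosswalks: source field name -> simple Dublin Core element.
# The source layouts ("cad_archive", "test_rig") are invented for illustration.
CROSSWALKS = {
    "cad_archive": {"part_name": "title", "designer": "creator",
                    "released": "date", "drawing_no": "identifier"},
    "test_rig": {"run_title": "title", "operator": "creator",
                 "run_date": "date", "run_id": "identifier"},
}


def harmonise(record: dict, source: str) -> dict:
    """Map a source record onto Dublin Core element names.

    String values are stripped of surrounding whitespace. Fields with no
    mapping are collected under 'unmapped' rather than silently dropped.
    """
    mapping = CROSSWALKS[source]
    harmonised, unmapped = {}, {}
    for field, value in record.items():
        value = value.strip() if isinstance(value, str) else value
        if field in mapping:
            harmonised[mapping[field]] = value
        else:
            unmapped[field] = value
    if unmapped:
        harmonised["unmapped"] = unmapped
    return harmonised


if __name__ == "__main__":
    cad = {"part_name": " Fan blade ", "designer": "A. Engineer",
           "released": "2006-09-26", "drawing_no": "FB-001", "layer": "3"}
    print(harmonise(cad, "cad_archive"))
```

Keeping unmapped fields visible is a deliberate choice in this sketch: it shows which domain-specific semantics a common schema does not yet cover.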
Persistent identifiers for data citation
• How will they be used? We need use cases: depositor, author,
service provider, researcher, publisher?
• Schemes: DOI, Handle, ARK, PURL
• Global identification: express as http URIs
• Data citation (human and machine-actionable)
• “Publication & citation of scientific primary data” project: National
Library for Science & Technology (TIB), University of Hanover,
Germany. STD-DOI Project: DOI registry for datasets
http://www.std-doi.de
• Is there a data citation
policy?
• What persistent
identifiers have been
assigned to your data?
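The "express as http URIs" point can be sketched in a few lines. This is an illustrative example, not a recommended implementation: the resolver hosts (doi.org, hdl.handle.net, n2t.net) are public resolvers assumed here and are not named on the slide, the Handle and ARK values are hypothetical, and the dataset in the citation is invented. The DOI used is the one cited later in this deck for Coles et al.

```python
# Public resolver prefixes for three of the schemes named above
# (an assumption of this sketch; the slide names no resolvers).
RESOLVERS = {
    "doi": "https://doi.org/",
    "handle": "https://hdl.handle.net/",
    "ark": "https://n2t.net/",
}


def to_http_uri(scheme: str, identifier: str) -> str:
    """Express a persistent identifier as an http(s) URI."""
    scheme = scheme.lower()
    if scheme == "purl":
        # A PURL is already an http URI, so it is returned unchanged.
        if not identifier.startswith(("http://", "https://")):
            raise ValueError("a PURL must already be an http(s) URI")
        return identifier
    if scheme not in RESOLVERS:
        raise ValueError(f"unknown scheme: {scheme}")
    return RESOLVERS[scheme] + identifier.strip()


def cite(creator: str, year: int, title: str, scheme: str, identifier: str) -> dict:
    """Build a human-readable and a machine-actionable form of one citation."""
    uri = to_http_uri(scheme, identifier)
    return {
        "human": f"{creator} ({year}). {title}. {uri}",
        "machine": {"creator": creator, "year": year, "title": title, "id": uri},
    }


if __name__ == "__main__":
    print(to_http_uri("doi", "10.1039/b502828k"))
    # Hypothetical dataset and Handle, for illustration only.
    print(cite("Example Lab", 2006, "Engine test dataset", "handle", "12345/abc")["human"])
```

A production version would also percent-encode reserved characters in the identifier; that step is omitted here for brevity.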
Discovering data: eBank Project
• Domain identifier:
International
Chemical Identifier
(InChI) code
• Google molecule
using InChI
Slide from Simon Coles
Coles, S.J., Day, N.E., Murray-Rust, P., Rzepa, H.S., Zhang, Y., Org.
Biomol. Chem., 2005, (10),1832-1834. DOI: 10.1039/b502828k
Domain identifiers for engineering?
Format migration challenges?
CAD Program Compatibility Chart
http://www.okino.com/conv/filefrmt_cad.htm
Registry development
Development: Representation
Information Registry Repository
• “DCC Approach to Digital Curation” based on OAIS
• Representation Information Registry Repository
• Prototype demonstrator: based on 2 key concepts to
facilitate sharing of the curation effort
– Curation Persistent Identifier (CPID)
– Descriptive “label” (structural, semantic, other metadata)
• Development of (M2M) tools and interfaces for creating,
using and re-using representation information
• http://dev.dcc.ac.uk Wiki and email list
• EU CASPAR Integrated Project
http://www.casparpreserves.info/pages/1/index.htm
• Task Force on the Permanent Access to the Records of
Science http://tfpa.kb.nl/
Registry API
Allows applications to talk to many
different registry implementations
e.g. GDFR, PRONOM, UDDI
• GUI access, and access via Web browser: http://registry.dcc.ac.uk
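The idea of one API in front of many registry implementations can be sketched as a small adapter pattern. Everything below is hypothetical and for illustration only: the class names, the `lookup` method, the CPID strings and the label fields are assumptions, not the DCC Registry API, and the in-memory registries merely stand in for real services such as GDFR or PRONOM.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class RepInfoLabel:
    """Hypothetical descriptive label attached to a Curation Persistent Identifier."""
    cpid: str
    structure: str   # structural representation information
    semantics: str   # semantic representation information


class Registry(ABC):
    """Common interface that each registry adapter would implement."""
    name: str

    @abstractmethod
    def lookup(self, cpid: str) -> Optional[RepInfoLabel]:
        """Return the label for a CPID, or None if this registry lacks it."""


class InMemoryRegistry(Registry):
    """Stand-in for a real registry client; holds labels in a dict."""

    def __init__(self, name: str, labels: Sequence[RepInfoLabel]):
        self.name = name
        self._labels = {label.cpid: label for label in labels}

    def lookup(self, cpid: str) -> Optional[RepInfoLabel]:
        return self._labels.get(cpid)


def resolve(cpid: str, registries: Sequence[Registry]) -> Optional[Tuple[str, RepInfoLabel]]:
    """Ask each registry in turn; return (registry name, label) for the first hit."""
    for registry in registries:
        label = registry.lookup(cpid)
        if label is not None:
            return registry.name, label
    return None


if __name__ == "__main__":
    # Illustrative entries only; the CPIDs and descriptions are made up.
    a = InMemoryRegistry("registry-a", [RepInfoLabel("cpid:example:0001", "binary layout", "sensor readings")])
    b = InMemoryRegistry("registry-b", [RepInfoLabel("cpid:example:0002", "XML schema", "CAD assembly")])
    print(resolve("cpid:example:0002", [a, b]))
```

The calling application depends only on the `Registry` interface, so supporting another registry means adding one adapter class rather than changing the caller.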
Adding value through annotation
Research at the University of Edinburgh
• Scientific databases: Annotation scoping report
• New annotation model + prototype MONDRIAN
• Intuitive visual interface iMONDRIAN
• Annotate sets of values
• Support for querying annotations
NaCTeM
http://www.nactem.ac.uk/
Emerging tools: TerMine,
GENIA, Cafetiere
Knowledge extraction:
Nature 23 March 2006
OTMI: Open Text Mining Interface
• Mining (data, text, structures)
• Modelling (economic, climate,
mathematical, biological…)
• Analysis (statistical, lexical,
gene….)
Supporting the community: Services
• [email protected]
• legal - technical guidance
• Curation Manual: 45 chapters planned
– Metadata (umbrella)
– Open Source
– Archival metadata
– Preservation metadata
– Selection & appraisal
– Curating emails
• Briefing Papers
– Curating emails
– Digital repositories
– Geospatial data
– Data protection
– eScience data
• Case studies
DCC Case Study published: Wide Field Astronomy Unit
Supporting the community:
Outreach & Services
• Workshops:
– Geospatial data, NeSC, 27 October
– OAIS 5 year Review, October
– Audit & Certification Forum, October
– Records Management, Liverpool, 30 Nov
– Curation & Preservation Training, Dec
– 2007: Preservation of journals (tbc)
– 2007: Legal environment (tbc)
– 2007: Preparing for audit (tbc)
• Information Days: British Library, Liverpool, UCL
• 2nd International DCC Conference
21-22 November, Glasgow
• Keynotes: Hans F. Hoffmann, CERN,
Clifford Lynch, CNI
DCC Phase 2: 2007-2010
• Working more closely with data centres, e-Science
Programmes and Research Councils
• SCARP Project: disciplinary approach
• JISC Digital Repository Programme collaboration
• RepInfo Registry service migration
• Define self-assessment procedures and tools
• Collaborate with CASPAR, DPE and PLANETS (EU-funded Digital Preservation Projects)
• Workshop Programme, International Conference 2007
Thank you.
Questions?
[email protected]
Join the DCC Associates Network at www.dcc.ac.uk
University of Bath, 13 September 2006