Digital Repositories as a Mechanism for the Capture, Management and Dissemination of Chemical Data
Simon Coles
School of Chemistry,
University of Southampton, U.K.
© S.J. Coles 2006
Funding Body Viewpoint
Supporting Small Laboratory Working Practice
“Data from experiments conducted as recently as six months ago might be suddenly deemed important, but those researchers may never find those numbers – or if they did might not know what those numbers meant”
“Lost in some research assistant’s computer, the data are often irretrievable or an undecipherable string of digits”
“To vet experiments, correct errors, or find new breakthroughs, scientists desperately need better ways to store and retrieve research data”
“Data from Big Science is … easier to handle, understand and archive. Small Science is horribly heterogeneous and far more vast. In time Small Science will generate 2-3 times more data than Big Science.”
‘Lost in a Sea of Science Data’, S. Carlson, The Chronicle of Higher Education (23/06/2006)
The Information Environment
Institutional Data Sources
A Data-Rich Subject – the Crystallography Problem
Figure: chemical structure diagrams annotated with the numbers 1.5,000,000; 30,000,000; 450,000
Data and Information Loss
Open Access as the Answer?
Separating Data from Interpretations
Intellect & Interpretation
Underlying data
Figure: The scholarly knowledge cycle (Liz Lyon, eBank-UK article, Ariadne, July 2003).
Components:
• Data creation / capture / gathering: laboratory experiments, Grids, fieldwork, surveys, media
• Data analysis, transformation, mining, modelling
• Research & e-Science workflows
• Repositories: institutional, e-prints, subject, data, learning objects
• Aggregator services: national, commercial
• Presentation services: subject, media-specific, data, commercial portals
• Resource discovery, linking, embedding
• Peer-reviewed publications: journals, conference proceedings
• Data curation: databases & databanks
Connections: validation; deposit / self-archiving; harvesting metadata; searching, harvesting, embedding; publication; linking
eBank-UK and the eCrystals Repository
Workflow Capture and Analysis
RAW DATA
DERIVED DATA
RESULTS DATA
The eCrystals Data Archive
http://ecrystals.chem.soton.ac.uk
Access to the underlying data
Metadata Publication
• Using simple Dublin Core:
  - Crystal structure
  - Title (Systematic IUPAC Name)
  - Authors
  - Affiliation
  - Creation Date
• Additional chemical information through Qualified Dublin Core:
  - Empirical formula
  - International Chemical Identifier (InChI)
  - Compound Class & Keywords
  - Specifies which ‘datasets’ are present in an entry
• DOI http://dx.doi.org/10.1594/ecrystals.chem.soton.ac.uk/145
• Rights & Citation http://ecrystals.chem.soton.ac.uk/rights.html
• Application Profile http://www.ukoln.ac.uk/projects/ebank-uk/schemas/
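To make the simple Dublin Core part of this list concrete, here is a minimal Python sketch that serialises one entry as an `oai_dc` record, the unqualified Dublin Core format that OAI-PMH repositories expose. The input keys (`iupac_name`, `authors`, `created`, `identifier`, `rights_url`) and the example values are hypothetical, and mapping “Crystal structure” to `dc:type` is an assumption. Affiliation and the qualified chemistry elements (empirical formula, InChI, compound class) are left out, because their exact element names come from the eBank-UK application profile, which is not reproduced here.

```python
import xml.etree.ElementTree as ET

# Namespaces of the unqualified Dublin Core record format used by OAI-PMH.
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)


def build_dc_record(entry: dict) -> ET.Element:
    """Build a simple Dublin Core record for one crystal structure entry.

    The keys of `entry` are hypothetical; qualified Dublin Core elements
    from the eBank-UK application profile are deliberately not modelled.
    """
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")

    def add(element: str, value: str) -> None:
        child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
        child.text = value

    add("type", "Crystal structure")        # assumed mapping of the entry type
    add("title", entry["iupac_name"])       # systematic IUPAC name
    for author in entry["authors"]:
        add("creator", author)              # one dc:creator per author
    add("date", entry["created"])           # creation date (ISO 8601)
    add("identifier", entry["identifier"])  # e.g. a DOI or landing-page URL
    add("rights", entry["rights_url"])      # rights & citation statement
    return root


if __name__ == "__main__":
    record = build_dc_record({
        "iupac_name": "hypothetical compound name",
        "authors": ["Author, A.", "Author, B."],
        "created": "2006-01-01",
        "identifier": "https://example.org/entry/1",
        "rights_url": "https://example.org/rights.html",
    })
    print(ET.tostring(record, encoding="unicode"))
```

Emitting one `dc:creator` per author, rather than a single joined string, keeps each name individually searchable by harvesters.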
Metadata and Data Quality Control
Data manipulation toolbox
Associated Metadata
Value added
Format conversion
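One way a toolbox like this can add value is by deriving metadata directly from the data files instead of relying on hand entry. The Python sketch below pulls a few single-line items out of a crystallographic information file (CIF), such as `_chemical_formula_sum` for the empirical formula. It also splits a value such as `7.123(4)` into the number and its standard uncertainty. It is a simplification under stated assumptions: it ignores `loop_` tables, multi-line text fields and comments, so it is not a full CIF parser, and the function names are hypothetical.

```python
import re
from typing import Dict, Optional, Set, Tuple

# A number optionally followed by a standard uncertainty in parentheses,
# e.g. "7.123(4)" means 7.123 with an uncertainty of 0.004.
_SU_PATTERN = re.compile(r"([-+]?\d*\.?\d+)(?:\((\d+)\))?")


def extract_cif_items(cif_text: str, wanted: Set[str]) -> Dict[str, str]:
    """Return the values of the wanted single-line CIF data items.

    Simplification: loop_ tables, multi-line (semicolon) fields and
    trailing comments are not handled.
    """
    items = {}
    for raw_line in cif_text.splitlines():
        line = raw_line.strip()
        if not line.startswith("_"):
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # tag without a value on the same line (e.g. loop header)
        tag, value = parts
        if tag in wanted:
            items[tag] = value.strip().strip("'\"")
    return items


def parse_su(text: str) -> Tuple[float, Optional[float]]:
    """Split a CIF numeric value into (value, standard uncertainty)."""
    match = _SU_PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a numeric CIF value: {text!r}")
    number, su_digits = match.groups()
    value = float(number)
    if su_digits is None:
        return value, None
    # The uncertainty digits apply to the last decimal places of the number.
    decimals = len(number.split(".")[1]) if "." in number else 0
    return value, int(su_digits) / 10**decimals
```

For example, `extract_cif_items(text, {"_chemical_formula_sum", "_cell_length_a"})` returns the formula string ready to be stored as metadata, and `parse_su` turns the cell length into a number that can be range-checked during quality control.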
Harvesting & Aggregating: Google
Coles, S.J., Day, N.E., Murray-Rust, P., Rzepa, H.S., Zhang, Y., Org. Biomol. Chem., 2005, (10), 1832-1834. DOI: 10.1039/b502828k
Harvesting: OAIster
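OAIster builds its index by harvesting repositories over the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). The Python sketch below shows the client side of a `ListRecords` harvest in Dublin Core: it builds request URLs, extracts each record's identifier and titles, and follows resumption tokens until the list is complete. The base URL and the `fetch` callable are assumptions, so the network layer can be swapped or faked, and only titles are extracted to keep the example short.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Iterator, List, Optional, Tuple
from urllib.parse import urlencode

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url: str, resumption_token: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request for Dublin Core records."""
    if resumption_token:
        # resumptionToken is an exclusive argument: no metadataPrefix allowed.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text: str) -> Tuple[List[Dict], Optional[str]]:
    """Extract identifiers and titles, plus the resumption token if any."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iter(f"{{{OAI_NS}}}record"):
        identifier = record.findtext(f"{{{OAI_NS}}}header/{{{OAI_NS}}}identifier")
        titles = [t.text for t in record.iter(f"{{{DC_NS}}}title")]
        records.append({"identifier": identifier, "titles": titles})
    # An empty or missing resumptionToken means the list is complete.
    token = root.findtext(f".//{{{OAI_NS}}}resumptionToken")
    return records, (token or None)


def harvest(base_url: str, fetch: Callable[[str], str]) -> Iterator[Dict]:
    """Yield every record, following resumption tokens page by page."""
    token = None
    while True:
        records, token = parse_list_records(fetch(list_records_url(base_url, token)))
        yield from records
        if token is None:
            break
```

Against a live endpoint, `fetch` could be `lambda url: urllib.request.urlopen(url).read().decode("utf-8")`. Passing it in as a parameter keeps the paging logic testable without network access.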
Linking and aggregating
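Because each eCrystals entry carries an InChI in its metadata, records describing the same compound can in principle be linked across repositories by comparing InChIs. A minimal Python sketch, assuming each harvested record is a dict with a hypothetical `id` key and an optional `inchi` key, groups records from several sources by InChI and keeps only the InChIs shared by two or more records.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Tuple


def link_by_inchi(
    sources: Mapping[str, Iterable[dict]],
) -> Dict[str, List[Tuple[str, str]]]:
    """Group (repository, record id) pairs by InChI.

    Only InChIs shared by two or more records are returned. Exact string
    comparison assumes all InChIs were generated with the same options.
    """
    index: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for repository, records in sources.items():
        for record in records:
            inchi = record.get("inchi")
            if inchi:  # records without an InChI cannot be linked this way
                index[inchi].append((repository, record["id"]))
    return {inchi: hits for inchi, hits in index.items() if len(hits) > 1}
```

Using the InChI itself as the join key means no central registry of compound identifiers is needed, which suits aggregation across independently run repositories.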
Embedded in a science portal
The Repository for the Laboratory – R4L
Repositories Supporting Laboratory Working Practice
• eBank-UK / eCrystals concentrates on disseminating data compiled once a study is complete, which is ideal for complex studies
• There is still a need to capture data from ‘single-shot’ experiments on small laboratory instruments
• To fully assure the quality and accuracy of metadata, it is essential to capture and describe data at the point when they are generated
• Solution: a repository with the potential to store data and metadata as they are generated in the laboratory
• Added bonus: a repository can manage data and provide automated report generation and data analysis tools
Laboratory Repositories and Information Management
Workflow Analysis
Researcher, Compound, Experiment type, Timestamp
Sample preparation
Deposit current dataset
Data acquisition
Analyse: Refine experiment?
Complete experiment deposit
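This workflow can be sketched as a loop in which each round of data acquisition is deposited immediately, so nothing is lost if the experiment is later refined. In this Python sketch the `Experiment` class, the `acquire` and `needs_refinement` callables and the `max_rounds` cap are hypothetical stand-ins for instrument control and the researcher's judgement. Sample preparation is assumed to happen before the function is called.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable, List


@dataclass
class Experiment:
    # Context recorded at the start: researcher, compound, type, timestamp.
    researcher: str
    compound: str
    experiment_type: str
    timestamp: datetime
    datasets: List[Any] = field(default_factory=list)
    complete: bool = False


def run_experiment(
    researcher: str,
    compound: str,
    experiment_type: str,
    acquire: Callable[[int], Any],
    needs_refinement: Callable[[Any], bool],
    max_rounds: int = 5,
) -> Experiment:
    """Capture metadata first, then acquire and deposit data until done."""
    experiment = Experiment(
        researcher, compound, experiment_type, datetime.now(timezone.utc)
    )
    for round_number in range(1, max_rounds + 1):
        dataset = acquire(round_number)      # data acquisition
        experiment.datasets.append(dataset)  # deposit the current dataset
        if not needs_refinement(dataset):    # analyse: refine experiment?
            break
    # The deposit is completed after the final round, or once max_rounds is reached.
    experiment.complete = True
    return experiment
```

Recording the researcher, compound, experiment type and timestamp before any data exist mirrors the point made earlier: metadata is most reliable when captured at the moment the data are generated.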
The R4L Repository
Create new compound
Add experiment data and metadata
Deposit
Search / Browse
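The four operations listed for the R4L repository can be illustrated with a small in-memory model. This Python sketch is not the R4L software: the `LabRepository` class, its method names and the rule that only deposited records are visible to search and browse are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class CompoundRecord:
    compound_id: str
    name: str
    experiments: List[Dict[str, Any]] = field(default_factory=list)
    deposited: bool = False


class LabRepository:
    """Hypothetical in-memory model of a laboratory repository."""

    def __init__(self) -> None:
        self._records: Dict[str, CompoundRecord] = {}

    def create_compound(self, compound_id: str, name: str) -> CompoundRecord:
        if compound_id in self._records:
            raise ValueError(f"compound {compound_id!r} already exists")
        record = CompoundRecord(compound_id, name)
        self._records[compound_id] = record
        return record

    def add_experiment(self, compound_id: str, data_file: str, **metadata: Any) -> None:
        # Store the data file reference together with its descriptive metadata.
        self._records[compound_id].experiments.append({"file": data_file, **metadata})

    def deposit(self, compound_id: str) -> None:
        # Assumption: only deposited records are visible to search and browse.
        self._records[compound_id].deposited = True

    def browse(self) -> List[str]:
        return sorted(cid for cid, rec in self._records.items() if rec.deposited)

    def search(self, **criteria: Any) -> List[Tuple[str, Dict[str, Any]]]:
        """Return (compound id, experiment) pairs whose metadata match all criteria."""
        hits = []
        for record in self._records.values():
            if not record.deposited:
                continue
            for experiment in record.experiments:
                if all(experiment.get(key) == value for key, value in criteria.items()):
                    hits.append((record.compound_id, experiment))
        return hits
```

A typical session creates a compound, attaches one experiment per instrument run with keyword metadata (for example `experiment_type="IR"`), deposits the record and then finds it with `search(experiment_type="IR")`.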
The eCrystals Federation Model
Components:
• Data creation & capture in “Smart lab”
• Laboratory repository
• Institutional data repositories
• e-Research workflows
• Data analysis, transformation, mining, modelling
• Aggregator services
• Presentation services: portals
• Data discovery, linking, citation
• Data curation & preservation: databases & databanks
• Publishers: peer-review journals, conference proceedings
Connections: deposit; validation; harvest; search, harvest; publication; linking, citation