Transcript Dr Liz Lyon

C21st Scholarship: Data as an Agent for Change
Dr Liz Lyon, Director, UKOLN, University of Bath, UK
Associate Director, UK Digital Curation Centre
3rd Bloomsbury Conference, London, June 2009
UKOLN is supported by:
This work is licensed under a Creative Commons Licence
Attribution-ShareAlike 2.0
www.ukoln.ac.uk
A centre of expertise in digital information management
Perspectives
1. The 21stC Scholar : Team Science in the Cloud
2. Chemical Crystallography : Data Publishing Showcase
3. The Future : a Transformational Agenda
The 21stC Scholar : Team Science in the Cloud
http://www.flickr.com/photos/wwarby/3632317031/
What does the C21st research(er) look like?
• “From users to choosers” (Yanosky)
• Pro-sumers (Toffler)
• Digital nomads
• Work on the Webtop
http://www.flickr.com/photos/shankrad/2905938179/
• Multi-scale & complex
• Highly data-intensive
• Increasingly “open”
http://www.flickr.com/photos/stormsriver/2286011597/
“Continuum of Openness”?
OPEN
CLOSED
What do we mean by Team Science?
• Science as a social activity
Tweet, Blog, Comment, Rate, Vote, Recommend, Tag, Share, Mash
• Trust is key
• Highly collaborative
• Inter-institutional collaboration – better science (Brian Uzzi, 2008)
• Multi-disciplinary
• Core team skills
A new digital economy?
• Data is:
– On demand
– A utility
– Commoditised
– Un-differentiated
– “Publish then filter” (Shirky)
– Traded
• “Cloud” model?
• Brokers & aggregators are key roles
• Free, pay per use, pay as you grow...
http://www.flickr.com/photos/will-lion/2738252562/
• Economies of scale
• Network effects
• New data publishing business models
Chemical Crystallography : Data Publishing Showcase
http://www.flickr.com/photos/thomasreichart/2130018485/sizes/l/
Slide: Dr Simon Coles, Univ Soton
Data Deluge
“40 years ago a PhD student would determine about 3 crystal structures for their thesis – this can now be easily achieved in a day”
0.5 million
35 million
2.5 million
‘Few thousand’
A bottleneck : the primary cause is the current data publication process, which is tied to journal articles and peer review
eCrystals Team
Domain (Chemists): Simon Coles, Mike Hursthouse, Jeremy Frey, Cameron Neylon, Andrew Milsted, Richard Stephenson, Jamie Robinson, Steven Wilson, Andrew Bailey, Mark Borkum
Computer science: Dave DeRoure, Les Carr, Monica Schraefel, Chris Gutteridge, Tim Myles-Board, Arouna Woukei, Dave Tarrant, Stuart Middleton
Informatics: Liz Lyon, Manjula Patel, Rachel Heery, Monica Duke, Michael Day, Traugott Koch, Pete Cliff
eCrystals Data Repository
• Quick & simple to deposit
• Software tools
• Laboratory archive
• Community involvement
• ‘Embargo’ facility
• Structured foundations
• Discoverable & harvestable
http://ecrystals.chem.soton.ac.uk
Data sustainability
Trust
Standards
Audit and certification tools
• TRAC
• DRAMBORA
eCrystals Curation Reports (3)
• Preservation metadata
• PREMIS Data Dictionary
• OAIS
• Representation Information Registry/Repository RRORI
• PLATTER
• NESTOR
• Data Seal of Approval
Data Discovery & Access
“Community Criteria for Interoperability”
(Scaling Up Report 2008)
• Domain data format standard: CIF
• Domain data validation standard: CheckCIF
• Metadata schema: eCrystals Application Profile
http://www.ukoln.ac.uk/projects/ebank-uk/schemas/
• Crystallography Data Commons: TIDCC Data Model in development
• Embargo & Rights http://ecrystals.chem.soton.ac.uk/rights.html
• Domain identifier: International Chemical Identifier
• Citation & linking: DOI
http://dx.doi.org/10.1594/ecrystals.chem.soton.ac.uk/145
Paris, March 2009
Memorandum of Understanding
http://wiki.ecrystals.chem.soton.ac.uk/index.php/Main_Page
Dr Simon Coles, Univ Southampton
Slide of data services: CrystalEye, Crystal Web, ChemXSeer etc. Search structures, check PMR stuff; aggregate, syndicate, filter etc.
New Web service to aggregate published crystallography data...
... federated search...
... structure search...
Original slide: Dr Simon Coles, Univ Soton
Data casts : Lab Blogs
Tools
Machines
Sensors
Publishing and sharing
methodologies ...
... and workflows ...
... data for re-use, mash-ups, mining,
computation, models, simulations ...
Slide: Dr Simon Coles, Univ Soton
oreChem – The Chemical Semantic Web
• At-source capture of chemistry data
• Chemical structure search
• Compound object authoring
• Retrospective harvesting of chemistry data
• Reuse through common ORE data model
• Semantic authoring
• Virtualized triple storage
• University of Cambridge
• Cornell University
• Indiana University
• Penn State University
• University of Queensland
• University of Southampton
[Diagram: Data (capture), Semantic Graph (storage), Mash-up (reuse). Labels: scientists, experiments, measurements, documents, text, molecules, data]
The Future : a Transformational Agenda?
http://www.flickr.com/photos/cyber_chof/1246303241/sizes/m/
We need to understand the value and benefits of data publishing and the associated data curation and management, and articulate them clearly
• Values & benefits may be:
– political
– economic
– societal...
• DCC Research Data Management Forum 3
Some issues and challenges.....
1. Research quality
• Publications based on closed peer review
• Maintain reputation
• Demonstrate provenance
• Open pilots – Nature
• Use collective intelligence
• Ratings, polls, recommender systems
• Data publishing policy?
2. Research sustainability
• Ensure curation & preservation of the long-term scientific record, including the data
• Requires significant investment in infrastructure
• Assure data security
• Demonstrate resilience & robustness
• Establish trust
• New business models
• Understand full costs
3. Research capacity & capability
• Multidisciplinary team
• Hybrid skills
• New field: data informatics
• New roles for information professionals?
IJDC 2009 (in press)
• Increase capacity & capability
• Embed skills in LIS curriculum
• Develop career paths, incentivise
Take homes
1. Team science is a social activity
2. We need to advocate the value & benefits of data publishing
3. Data informatics underpins C21st scholarship
Moving to Multi-Scale Science: Managing Complexity & Diversity
Thank you
Slides will be available at :
http://www.ukoln.ac.uk/ukoln/staff/e.j.lyon/presentations.html
http://www.dcc.ac.uk/