SageCite - The National Academies


Monica Duke
[email protected]
Project Manager, SageCite Project
http://blogs.ukoln.ac.uk/sagecite/
#sagecite
Developing Data Attribution and Citation
Practices and Standards
An International Symposium and Workshop
August 22-23, 2011
UKOLN is supported by:
www.ukoln.ac.uk
A centre of expertise in digital information management
Citation in the domain of disease network modelling
Funded: August 2010 – July 2011
SageCite project overview
• Review of data citation (issues, technology)
• Understanding the domain
– Sage Bionetworks partners in project
– Site visit
– Documenting processes (workflow tools)
SageCite project overview
• Demonstrator
– Adding support for data citation
– Using DataCite services
• Working with publishers
• Benefits analysis: KRDS Taxonomy
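As a rough illustration of what adding citation support to a dataset record might involve, the sketch below renders a citation string in the pattern DataCite recommends (Creator (PublicationYear): Title. Publisher. Identifier). The class, the function, the field names and the example DOI are all hypothetical, and nothing here calls a DataCite service or reflects the actual SageCite demonstrator.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetRecord:
    """Minimal citation metadata for a dataset (hypothetical field names)."""
    creators: List[str]
    year: int
    title: str
    publisher: str
    doi: str


def format_citation(rec: DatasetRecord) -> str:
    """Render 'Creator (Year): Title. Publisher. Identifier' as one string."""
    creators = "; ".join(rec.creators)
    return f"{creators} ({rec.year}): {rec.title}. {rec.publisher}. doi:{rec.doi}"


# Placeholder values only; the DOI below is not a real identifier.
example = DatasetRecord(
    creators=["Example A", "Example B"],
    year=2011,
    title="Illustrative curated expression dataset",
    publisher="Example Repository",
    doi="10.1234/example",
)
print(format_citation(example))
```

Keeping the metadata in a small record type, separate from the formatting function, means the same fields could feed other citation styles.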
www.sagebase.org
• US-based non-profit organisation
• Creating a resource for community-based, data-intensive biological
discovery
• Community-based analysis is
required to build accurate models
Slide by Lara Mangravite, Sage Bionetworks
Sage data and processes
Data curation -> Statistical QC -> Genomic analysis -> Network construction -> Network analysis -> Data mining -> Validation
• Idealised 7-stage process
• A combination of phenotypic, genetic, and expression
data is processed to determine a list of genes
associated with diseases
• Different people are responsible for different stages of
the modelling process. One person oversees the
whole process.
• Stage 1: Data Curation
– basic data validation to ensure integrity
and completeness
– datasets include microarray data and
clinical data.
– ensures that the format of the data is
understood and the required metadata is
present.
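The completeness check described for Stage 1 could be sketched roughly as follows. This is a minimal illustration only: the required field names are assumptions chosen for the example, not the metadata Sage Bionetworks actually requires, and the functions are hypothetical.

```python
# Assumed required fields; the real curation rules are not given in the slides.
REQUIRED_FIELDS = ("sample_id", "platform", "tissue")


def missing_metadata(record: dict) -> list:
    """Return the required fields that are absent or empty in one record."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


def curation_report(records: list) -> dict:
    """Map each record's index to its missing fields; complete records are omitted."""
    report = {}
    for index, record in enumerate(records):
        missing = missing_metadata(record)
        if missing:
            report[index] = missing
    return report


records = [
    {"sample_id": "S1", "platform": "microarray", "tissue": "liver"},
    {"sample_id": "S2", "platform": "", "tissue": None},
]
print(curation_report(records))
```

A report keyed by record index lets a curator see which records need attention without halting on the first problem.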
Agreeing standards to support sharing
Data curation -> Statistical QC -> Genomic analysis -> Network construction -> Network analysis -> Data mining -> Validation
• Derry J et al. Developing predictive
Molecular Maps of Human Disease
through Community-based Modeling.
• http://precedings.nature.com/documents/5883/version/1/files/npre20115883-1.pdf
Workflow capture using Taverna
http://www.vimeo.com/27287109
Documenting data processes through
workflow tools
– supports better citation
– makes the cited resource more reusable
– strengthens the reproducibility and
validation of the research
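To make the link between documented processes and reproducibility concrete, the sketch below logs a content fingerprint of the data entering and leaving each processing step. It is plain Python, not Taverna, and not the method used in the project; the function names and the two toy steps are hypothetical.

```python
import hashlib
import json


def fingerprint(data) -> str:
    """SHA-256 of a JSON-serialisable value, used as a content identifier."""
    payload = json.dumps(data, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def run_with_provenance(data, steps):
    """Apply (name, function) steps in order, logging input and output fingerprints."""
    log = []
    for name, func in steps:
        result = func(data)
        log.append({
            "step": name,
            "input": fingerprint(data),
            "output": fingerprint(result),
        })
        data = result
    return data, log


# Toy steps standing in for real processing stages.
steps = [
    ("drop_missing", lambda values: [v for v in values if v is not None]),
    ("scale", lambda values: [v * 2 for v in values]),
]
result, log = run_with_provenance([1, None, 3], steps)
for entry in log:
    print(entry["step"], entry["input"][:8], "->", entry["output"][:8])
```

Because each step's output fingerprint equals the next step's input fingerprint, the log forms a chain that a citation could point into at any stage.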
Data Citation Purposes
• For attribution
– Leading to credit and reward
• For reproducibility
– Supports validation, re-use
• Eric Schadt at Sage Bionetworks
Congress 2011
– http://fora.tv/2011/04/16/Eric_Schadt_Map_Building (start at 4.28)
Open challenges: attribution
• Preserving link with original data
– Some discipline-based repositories have their own
identifiers
– Bi-directional links
• Attributing data creators
– including individuals?
• Defining the creation of a new intellectual object, e.g. a
curated dataset?
• Cultural challenge in recognising non-standard
contributions; microattribution
• New metrics
• Identification of contributors
Open challenges: reproducibility
• Identification and granularity
– Discipline identifiers, global identifiers
– How much value has been added since
the data entered the workflow?
• Identifying processes and software
Acknowledgements
• University of Manchester
– Carole Goble
– Peter Li
• British Library
– Max Wilkinson
– Tom Pollard
• Sage Bionetworks
• UKOLN
– Liz Lyon
– Monica Duke
• Nature Genetics
– Myles Axton
• PLoS Comp Bio
– Phil Bourne