wdi_igad_sandiego - Research Data Alliance

RDA Wheat Data Interoperability
Cookbook and latest developments
9th March 2015, San Diego
The WDI working group in brief
• Endorsed by RDA in March 2014
• Members: about 30, of whom 15 are active (Wheat scientists, data and metadata technologists)
• The goal: contribute to improving Wheat related data interoperability by
  • Building a common interoperability framework (metadata, data formats and vocabularies)
  • Providing guidelines for describing, representing and linking Wheat related data
Initial plans
• Deliverables
  • A report on the survey of existing standards
  • A cookbook for the Wheat data manager community, with guidelines on which data formats, metadata, vocabularies and ontologies to use to describe, represent and link different types of Wheat data
  • A library of linked vocabularies and ontologies in machine-readable formats that follow the Linked Data standards
  • A prototype that showcases the gain in interoperability
Where we are
Surveys
• Landscape of Wheat related standards and their use by the community
• Comprehensive overview of Wheat related ontologies and vocabularies
Workshops
• Recommendations
• Mappings between different data formats
• Actions to conduct in order to improve the current level of Wheat related data interoperability
• Interoperability use cases
Implementation
• Interactive cookbook: recommendations + guidelines
• A repository of Wheat related linked vocabularies (BioPortal)
Wheat related standards survey and workshop
Data formats currently used (grouped on the slide as standardized, tool specific or non standardized) and recommendations, by data type:
• SNPs
  Currently used: VCF, BAM/SAM, BED, VarScan, VEP
  Recommendation: VCF files generated from the IWGSC survey sequences, plus metadata about the VCF files to enrich the information about the SNPs
• Genome annotations
  Currently used: GenBank Flat File, General Feature Format (GFF), EMBL
  Recommendation: GFF3, plus specifications for the description of specific columns
• Germplasm
  Currently used: MCPD, ABCD, Darwin Core, Darwin Core Germplasm, GRIN-Global, tabulated
  Recommendation: MCPD
• Gene expression
  Currently used: many format standards laid out by repositories such as NCBI (GEO) and EBI ArrayExpress
  Recommendation: existing format standards laid out by repositories such as NCBI (GEO), EBI ArrayExpress and ENA
• Physical maps
  Currently used: GFF, Cmap, fpc
  Recommendation: GFF3
• Genetic maps
  Currently used: Cmap, gnpmap
  Recommendation: GFF3 (to be confirmed)
• Phenotypes
  Currently used: Drops, ped, ISA-Tab, ephesis, tabulated
  Recommendation: ISA-Tab
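To make the genome annotation recommendation above more concrete, here is a minimal Python sketch that splits one GFF3 feature line into the nine columns defined by the GFF3 specification. The `parse_gff3_line` helper and the sample line are illustrative assumptions; they are not taken from the working group's cookbook.

```python
from urllib.parse import unquote

# The nine tab-separated columns defined by the GFF3 specification.
GFF3_COLUMNS = ["seqid", "source", "type", "start", "end",
                "score", "strand", "phase", "attributes"]


def parse_gff3_line(line):
    """Parse one GFF3 feature line into a dict (hypothetical helper)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 columns, got {len(fields)}")
    record = dict(zip(GFF3_COLUMNS, fields))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    # Column 9 holds semicolon-separated tag=value pairs, URL-escaped.
    attributes = {}
    if record["attributes"] != ".":
        for pair in record["attributes"].split(";"):
            if pair:
                tag, _, value = pair.partition("=")
                attributes[unquote(tag)] = unquote(value)
    record["attributes"] = attributes
    return record


# Made-up example feature; all identifiers are placeholders.
example = ("chr1A\texample_source\tgene\t1000\t2500\t.\t+\t.\t"
           "ID=gene0001;Name=example%20gene")
gene = parse_gff3_line(example)
print(gene["type"], gene["start"], gene["end"], gene["attributes"]["Name"])
```

A shared convention for what goes into the attributes column (for example which tags are mandatory) is the kind of column-level specification the recommendation refers to.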
Examples of use cases
Title
Searching for germplasm with specific traits
Description
Example of searching for germplasm with specific traits (tagged with ontology terms?)
Data types
Germplasm, phenotype
Challenges
● Metadata are very important (standardized format)
● Association of genes to traits, linked to germplasm and marker information
● Need for quality controls: how confident are you of the data source?
● Provenance of the germplasm: pedigree, ownership
● Standard system for tracking germplasm and names
Title
Identification of wheat genes that control root growth
Description
Requires annotated genes (Gene Ontology, Pfam and other functional annotation)
Data types
Genomic annotations? Gene location? (IWGS-SS ID or MIPS HCS link)
Challenges
Mapping between wheat genes and orthologs from other species (to deduce function by sequence similarity); access to RNA-Seq data (genes that are not expressed in roots may be irrelevant); mapping of wheat genes to information on their function based on the literature
Title
Query on trial data associated with varieties
Description
Search wheat varieties together with distribution maps, production figures, performance in wheat mega-environments and associated projects worldwide, plus layers of climatic data on specific wheat production areas and disease prevention information.
Data types
Phenotypic data, GIS data, (wheat economy/production data)
Challenges
Phenotypic data should be linked to GIS data. Using keywords or ontology terms, a system or tool should be able to pull such information from the different websites and systems developed by the wheat community.
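The first use case above (searching for germplasm with specific traits) can be sketched in a few lines of Python, assuming that phenotype observations are already tagged with ontology term identifiers. The accession names, the `TRAIT:...` term IDs and the data layout are all made up for illustration; a real system would use identifiers from an actual trait ontology.

```python
from collections import defaultdict

# Hypothetical phenotype observations: (accession, trait term ID, value).
# The term IDs are placeholders, not real ontology identifiers.
observations = [
    ("ACC-001", "TRAIT:0001", "resistant"),
    ("ACC-002", "TRAIT:0001", "susceptible"),
    ("ACC-002", "TRAIT:0002", "early"),
    ("ACC-003", "TRAIT:0002", "early"),
]


def index_by_trait(rows):
    """Build an index: trait term ID -> {accession: observed value}."""
    index = defaultdict(dict)
    for accession, term_id, value in rows:
        index[term_id][accession] = value
    return index


def find_germplasm(index, criteria):
    """Return the accessions that match every (term ID -> value) criterion."""
    matches = None
    for term_id, wanted in criteria.items():
        hits = {acc for acc, value in index.get(term_id, {}).items()
                if value == wanted}
        matches = hits if matches is None else matches & hits
    return sorted(matches or [])


trait_index = index_by_trait(observations)
print(find_germplasm(trait_index, {"TRAIT:0002": "early"}))
```

Because the lookup key is a term identifier and not a free-text trait name, the same query could work across datasets that label the trait differently, provided they share the ontology.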
Wheat related ontologies and vocabularies survey
The objectives of the survey
• Assess the level of visibility and interoperability of Wheat related vocabularies and ontologies
  • Is the vocabulary/ontology updated regularly?
  • What license and/or copyright is used?
  • Is the vocabulary/ontology part of any ontology communities or listing services?
  • Is the vocabulary/ontology used or implemented in any database/repository?
  • Does the vocabulary/ontology interlink and/or map to other vocabularies and ontologies?
  • Does the vocabulary/ontology
• Identify the domain covered by the ontologies and vocabularies
• Refine the cookbook
• Collect more interoperability use cases
• Collect some technical details
The objectives of the survey
• What level of visibility/operability?
• What content?
• What formats and technologies?
Guidelines and repository
The Wheat related BioPortal allows one to search for terms across multiple ontologies, browse mappings between terms in different ontologies, receive recommendations on which ontologies are most relevant for a corpus, and annotate text with terms from ontologies.
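As a rough illustration of the term search described above, the sketch below builds a search request URL. It assumes that the Wheat related BioPortal exposes the same REST interface as the public NCBO BioPortal (`/search` endpoint with `q`, `ontologies` and `apikey` parameters); the base URL of the Wheat instance may differ, and `ONTO_A` and `YOUR_API_KEY` are placeholders. No request is actually sent.

```python
from urllib.parse import urlencode

# Assumption: the public NCBO BioPortal REST endpoint. The Wheat related
# instance may be served from a different base URL.
BASE_URL = "https://data.bioontology.org"


def build_search_url(term, ontologies=(), api_key="YOUR_API_KEY"):
    """Build a term-search URL (hypothetical helper, no network access)."""
    params = {"q": term, "apikey": api_key}
    if ontologies:
        # Restrict the search to the given ontology acronyms.
        params["ontologies"] = ",".join(ontologies)
    return f"{BASE_URL}/search?{urlencode(params)}"


# "ONTO_A" is a placeholder acronym, not a real ontology.
url = build_search_url("root growth", ontologies=["ONTO_A"])
print(url)
```

The resulting URL could then be fetched with any HTTP client, with a real API key substituted for the placeholder.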
Next steps
• Metadata (harmonization, minimal metadata sets)
• Mappings
• Next workshop (summer 2015)
• Review and complete the recommendations
• Refine and complete the guidelines and the best practices
• Finalize the repository of Wheat related vocabularies
• Implement the prototype
Thanks!