WP4 Data Representation

CASIMIR WP4
Data Representation
John Hancock
Duncan Davidson
CASIMIR Networking Meeting
Heathrow, July 2007
Objectives
• Assessment of technical aspects of database interoperability as
a barrier to scientific and financial sustainability
• Assessment of the variability of practice in the semantics of
biological data representation, e.g. genotype, gene expression
• Assessment of emerging standards and current practice for data
representation, annotation and ontologies
• 4.1 - D9 - Classified list of data representations in European
mouse-centric and related databases
• 4.4 - Network meeting 1 - June-Sep 07 - Bring together
bioinformatics reps from (EU-funded) mouse projects to
discuss data representation
• 4.4 - Joint work package meeting to discuss results (4-5 Oct 07)
• 4.5 - Sep - Dec 07 - Report of network meeting
• 4.6 - Present conclusions at meetings
Discussion Points
• What do we understand by “data
representation” - is it just CVs/Ontologies?
– Interaction with other work packages
• What kinds of data?
• What ontologies? How many on the PRIME
list do you use? Do you use others? Do you
use OBO ontologies by default?
• What processes are they involved in
elsewhere to discuss/unify data
representation?
Future: Cross-Species
Interactions
• Mouse-Human must be a priority because of the
disease angle
• Mouse-Rat - already quite well integrated (to what
extent?) because of MGI-RGD-OBO interactions
• Other important models
– Chick (ChickEST (UK), ChickVD (CN), Ensembl, others?)
– Xenopus
– Zebrafish
– Drosophila
– C. elegans
– Yeast, E. coli
• In longer term get together with community reps to
discuss similarities & differences
Extant Resources
• PRIME Expert Group Report and
Outcomes
• Euromouse
• Interphenome discussion group & pilots
• EUMORPHIA/EUMODIC
bioinformaticians
PRIME Expert Group
• Draft lists of:
– Databases
– Ontologies
Interphenome
• Phenotype data:
– Common data description
– Common protocol description
– Standard for data exchange
Interphenome - Current Status
Publication in Mammalian Genome 18, 157-163 (March 2007):
“Integration of Mouse Phenome Data Resources”
by The Mouse Phenotype Database Integration Consortium
• Ontologies
– Investigate cross-mapping of current approaches and
eventual possible convergence (?)
• Protocols
– Work on developing a format that can accommodate all
information needed for a protocol
– Encode this as an XML schema
– PPML?
• Data Exchange
– Work on an XML schema that will allow structured exchange
of phenotype data and metadata - started work on this in
EUMODIC
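A minimal sketch of what "structured exchange of phenotype data and metadata" could look like in practice. The element and attribute names (phenotypeObservation, strain, protocol, measure, unit, term) and the example identifiers are hypothetical placeholders chosen for illustration; they are not taken from the EUMODIC schema or from PPML, neither of which is described in these slides.

```python
import xml.etree.ElementTree as ET


def phenotype_record_to_xml(record):
    """Serialise one phenotype observation (a plain dict) to an XML element.

    All tag and attribute names here are illustrative assumptions.
    """
    root = ET.Element("phenotypeObservation")
    # Metadata: which strain was measured and under which protocol.
    ET.SubElement(root, "strain").text = record["strain"]
    ET.SubElement(root, "protocol", id=record["protocol_id"])
    # The measured value, annotated with a unit and an ontology term ID.
    measure = ET.SubElement(
        root, "measure", unit=record["unit"], term=record["term_id"]
    )
    measure.text = str(record["value"])
    return root


def xml_to_phenotype_record(root):
    """Parse the element produced above back into a plain dict."""
    measure = root.find("measure")
    return {
        "strain": root.findtext("strain"),
        "protocol_id": root.find("protocol").get("id"),
        "term_id": measure.get("term"),
        "unit": measure.get("unit"),
        "value": float(measure.text),
    }


# Hypothetical record; the protocol and term identifiers are made up.
example = {
    "strain": "C57BL/6J",
    "protocol_id": "PROTO_001",
    "term_id": "TERM:0000001",
    "unit": "g",
    "value": 24.5,
}

xml_text = ET.tostring(phenotype_record_to_xml(example), encoding="unicode")
round_trip = xml_to_phenotype_record(ET.fromstring(xml_text))
```

The point of the round trip is that a receiving database can recover the value together with its protocol and ontology annotation, rather than a bare number. A real exchange format would also need a schema to validate against, which this sketch omits.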
WP4 - 1st Actions
• Update the PRIME list of European
mouse projects
• Also identify “mouse-related” projects
• Identify contacts
• To hold a meaningful dialogue, get as
many as possible to a networking
meeting
Ontologies - So Far
• We have a little list
• Test how many of these are actually in
use - Questionnaire
• Check how up to date it is, and track
developments (e.g. Relationships Ontology,
potential Synapse Ontology)
The CASIMIR Questionnaire
• http://www.casimir.org.uk/questionnaire.php
• 1a. Are you using a relational database, object
database or flat files?
• 1b. If relational, what is your chosen RDBMS
(Relational Database Management System)?
• 2a. Is your database providing external links to other
on-line resources; possibly via URL/HTTP (if yes
please name them)?
• 2b. Supported/Installed Web Services (if yes please
name them)? Do you plan to install or develop web
services in the near future?
The CASIMIR Questionnaire
• 3a. Please list the sorts of data entities you store
(e.g. protein sequence data, mouse strain information,
etc.)
• 4a. Can you provide a brief explanatory
description/schema of your data/data structure?
• 4b. Are you willing to provide an entity-relationship
diagram, and would you be willing to provide it under
an open source license?
The CASIMIR Questionnaire
• 5a. Are you currently using or do you intend to use
any ontologies or controlled vocabularies to describe
your data?
• 5b. Do you plan to expand your use of ontologies in
future?
• 5c. Do you use OBO ontologies?
• 5d. Do you perceive the need for additional
ontologies to serve your domain of knowledge?
The CASIMIR Questionnaire
• 6. Do you make use of Minimum Information
standards (such as MIAME for microarray
experiments) to describe any data? If so,
which ones? If you do not make use of these
standards, are you likely to do so in future?
Minimum Standards
• MIAME - Brazma et al. (2001) Nat.
Genet. 29, 365-71
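As a rough illustration of how a database might apply a Minimum Information standard, a submission can be checked against a list of required fields before it is accepted. The field names below are illustrative placeholders loosely modelled on the kinds of information MIAME asks for; they are not the official checklist wording, and the submission is invented.

```python
def missing_fields(record, required):
    """Return the required field names that are absent or empty in record."""
    return [name for name in required if not record.get(name)]


# Hypothetical checklist; names are assumptions, not official MIAME terms.
REQUIRED_FIELDS = [
    "raw_data",
    "processed_data",
    "sample_annotation",
    "experimental_design",
    "array_annotation",
    "protocols",
]

# Invented submission with one field left empty and one missing entirely.
submission = {
    "raw_data": "scan_01.cel",
    "processed_data": "normalised.tsv",
    "sample_annotation": "liver, 12 weeks, male",
    "experimental_design": "mutant vs wild type, 3 replicates",
    "array_annotation": "",
}

gaps = missing_fields(submission, REQUIRED_FIELDS)
```

Here gaps lists the fields a curator would need to chase up (array_annotation and protocols), which is the sort of check question 6 of the questionnaire is probing for.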
The CASIMIR Questionnaire
• 7. What do you perceive as the main limiting factor in
data representation/interoperability etc. in European
bioinformatics databases?
• 8. Do you have any comments/thoughts on standards
for data representation that need to be developed or
that you might like discussed in CASIMIR?
The CASIMIR Questionnaire
Please fill it in as soon as humanly possible!
We will be chasing around database
coordinators over the next few months to
make sure we have as much information as
possible
Agenda for Today
• Reports from some databases:
– MUGEN - Christina Chandras
– EMMA - Glenn Proctor
– EUMODIC - Niels Adams
– EUCLIS - Eduardo Mendoza
• Discussion, e.g.
– Comments on the questionnaire/CASIMIR’s aims
– How to get widest possible participation
– What do people see as the main obstacles to the
aim of integrating all this data?
Mouse to Human
[Figure: diagram labelled Human, Mouse, DISEASE, PHENOTYPING, Phenotypic Attributes, Phenotypic Measures]