Phenotype database interoperability and integration
Damian Smedley, EBI
Mouse models for human disease
The Royal Society London, May 19-21st, 2010
Why do we need data integration and interoperability?
Centralised vs distributed
[Diagram: centralised warehouse solution and distributed solutions (v1, v2). Labels: portal, central database, nightly data syncs, web services. Genomics: MGI, Ensembl. IKMC projects: KOMP, EUCOMM, NorCOMM, TIGM. Strains: JaxMice, IMSR, EMMA. Phenotype/Expression: Eurexpress/GXD etc, Europhenome.]
Centralised solutions
Advantages
– Better query performance for large datasets
– Easier to analyse raw data in one location
Disadvantages
– Regular data deposition is non-trivial
– Designing a single schema to store different types of data is not simple
– Persuading people to “give up” their data/databases/websites is difficult
– Still needs to be made interoperable with other data sources
Distributed solutions
Advantages
– Domain expertise at production site exploited
– Different types of data easily integrated as long as they share something in common such as a gene identifier
– No need for nightly data flow to keep data up to date
– No need for redundant data in each database
– Easier to persuade people to collaborate in a distributed scenario
Disadvantages
– Technical knowledge required to deploy the web services
– Potential query performance problems for large datasets (may need to provide summary level data)
– Potential problems performing analysis over all datasets
– Problems with services going down
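The point about a shared gene identifier, and the risk of services going down, can be illustrated with a minimal sketch. It assumes three hypothetical sources that each return records keyed by an MGI ID. The source names, fields and identifiers are invented for illustration and do not reflect any real resource's API or data.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Each hypothetical source maps an MGI ID to one record (a dict of fields).
Source = Callable[[], Dict[str, dict]]


def alleles() -> Dict[str, dict]:
    # Placeholder data; the IDs and values are made up.
    return {"MGI:0000001": {"allele_status": "ES cells available"},
            "MGI:0000002": {"allele_status": "mice available"}}


def expression() -> Dict[str, dict]:
    # Placeholder data; the IDs and values are made up.
    return {"MGI:0000001": {"expressed_in": "heart"}}


def phenotypes() -> Dict[str, dict]:
    # Simulates a remote service that is currently unavailable.
    raise ConnectionError("phenotype service is down")


def federated_join(sources: Dict[str, Source]) -> Tuple[Dict[str, dict], List[str]]:
    """Merge records from every reachable source, keyed on the shared gene ID."""
    merged: Dict[str, dict] = defaultdict(dict)
    failed: List[str] = []
    for name, fetch in sources.items():
        try:
            records = fetch()
        except ConnectionError:
            failed.append(name)  # keep going with the remaining sources
            continue
        for gene_id, fields in records.items():
            merged[gene_id][name] = fields
    return dict(merged), failed


result, unavailable = federated_join(
    {"alleles": alleles, "expression": expression, "phenotypes": phenotypes}
)
```

No source needs to know another's schema; the only shared contract is the identifier. The list of failed sources lets a portal show partial results rather than nothing when one service is down.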
1000 Genomes - centralisation
International Cancer Genome Consortium
– France: Liver (alcohol-related), Breast (HER2+ve)
– UK: Breast (several subtypes)
– Japan: Liver (virus related)
– Canada: Pancreas
– China: Stomach
– Spain: CLL
– India: Oral Cavity
– Australia: Pancreas
ICGC - distributed
Joint Ensembl and EurExpress query
IKMC portal: knockoutmouse.org
[Diagram labels: NorCOMM, Eurexpress, IMSR, CMMR, Europhenome, EUCOMM, GXD, EMMA, KOMP rep, KOMP, TIGM, Ensembl, CREATE.]
IKMC interoperability strategy
[Diagram: resources linked by MGI ID and reached through BioMart query interface(s). Resources: CREATE, Ensembl, GXD, IKMC, EURExpress, MGI, Phenotype (EuroPhenome etc). ES cells + lines: EMMA (UK), KOMP (USA), CMMR (Canada). Sites: Sanger, UK; JAX, USA; EBI, UK; Edinburgh, UK; Harwell, UK.]
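A BioMart interface accepts a query as a small XML document naming a dataset, its filters and the attributes to return. The following is a minimal sketch of building such a query keyed on an MGI ID. The dataset, filter and attribute names and the identifier are placeholders chosen for illustration, not names confirmed for any of the marts on the slide; a real mart's configuration lists the valid ones.

```python
import xml.etree.ElementTree as ET


def build_biomart_query(dataset, filters, attributes):
    """Return a BioMart-style XML query string for a single dataset."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "0",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # Filters restrict the rows returned (here, to one gene identifier).
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    # Attributes select the columns returned.
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    body = ET.tostring(query, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n<!DOCTYPE Query>\n' + body


# All names and the ID below are hypothetical placeholders.
xml_query = build_biomart_query(
    dataset="example_dataset",
    filters={"mgi_accession_id": "MGI:0000001"},
    attributes=["mgi_accession_id", "marker_symbol"],
)
```

The resulting string would typically be sent as the `query` parameter to a mart's service endpoint; that step is left out so the sketch makes no network calls. Because every mart is filtered on the same identifier, the same query shape can be reused across resources with only the dataset and attribute names changed.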
www.knockoutmouse.org/martsearch
Europhenome: raw and summary data
Possible strategy for phenotype data
[Diagram: resources linked by MGI ID and reached through BioMart query interface(s). Resources: CREATE, Ensembl, GXD, IKMC, EURExpress, MGI. ES cells + lines: EMMA (UK), KOMP (USA), CMMR (Canada). Sites: Sanger, UK; JAX, USA; EBI, UK; Edinburgh, UK. Labels for phenotype data: High throughput phenotyping centres; raw; Central database; Analysis to assign phenotypes to genes; Presentation of results.]
Linking from IKMC portal
Phenotype searches
Phenotyping
Linking from IKMC portal
Acknowledgements
The whole CASIMIR consortium and in particular:
• MouseFinder tool: Paul Schofield, Michael Gruenberger, Chao-Kung Chen, George Gkoutos, Ann-Marie Mallon, John Hancock
• MartSearch: Vivek Iyer, Darren Oakley, Bill Skarnes
• BioMart: Arek Kasprzyk, Syed Haider, Edoardo Marcora