JamesMaloneGeneAtlasBioRDF20110228


RDFizing the EBI Gene Expression Atlas
James Malone, Electra Tapanari
[email protected]
Motivation
- Initial motivation is explorative
- Can we ask new questions?
- Do we get new answers?
- Can we integrate this data with other related data?
- Is there a sufficient user community to justify an RDF Atlas resource?
SESL Project
- Semantic Enrichment of Scientific Literature Working Group
- Includes EBI (Dietrich Rebholz) and Pistoia Alliance
- Pilot project in 2010 looking at developing knowledge brokering standards for semantic integration of gene to Type II diabetes data using Gene Expression Atlas, OMIM, UniProt, literature
Gene Expression: Archive to Atlas
[Diagram: flow from archive to Atlas. Labels: AE/GEO acquire; ArrayExpress; Curation (twice); Re-annotate & summarize; >250,000 assays; >10,000 experiments; ATLAS]
Experimental Factor Ontology
• We consume parts of reference ontologies from domain
• Construct new classes and relations to answer our use cases
• Aim is reuse of existing resources, shared frameworks and mapping of equivalencies where they exist
[Diagram: EFO and the ontologies it draws on: Ontology for Biomedical Investigations; Relation Ontology; Disease Ontology; Chemical Entities of Biological Interest (ChEBI); Anatomy Reference Ontology; Various Species Anatomy Ontologies]
Gene Expression Atlas @ www.ebi.ac.uk/gxa
Query for Cell adhesion genes in all ‘organism parts’
‘View on EFO’
Modeling Sample Variables in Gene Expression Data Ontologically
Input XML
Mapping XML Results to RDF (1)
• Gene to related transcripts, sequence and gene functions
• Also EFO ontology classes in RDF form (shown is label to IRI triple)
Id here is an ENSEMBL Gene ID, e.g. RUNX1 (ENSG00000159216)
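
A minimal sketch of the label-to-IRI triple mentioned on this slide, written out as N-Triples lines from Python. The `http://example.org/atlas/gene/` base IRI and the helper names are assumptions for illustration, not the scheme the Atlas actually uses; the EFO class IRI and label serve only as sample input.

```python
# rdfs:label is the standard RDF Schema property for human-readable names.
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def escape_literal(text):
    """Escape backslashes, quotes and newlines for an N-Triples string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def label_triple(subject_iri, label):
    """Return one N-Triples line attaching a label to an IRI."""
    return f'<{subject_iri}> <{RDFS_LABEL}> "{escape_literal(label)}" .'


def gene_iri(ensembl_id, base="http://example.org/atlas/gene/"):
    """Build a gene IRI from an Ensembl gene ID (the base IRI is hypothetical)."""
    return base + ensembl_id


# Sample input: an EFO class and the RUNX1 gene ID quoted on the slide.
print(label_triple("http://www.ebi.ac.uk/efo/EFO_0000001", "experimental factor"))
print(label_triple(gene_iri("ENSG00000159216"), "RUNX1"))
```

Keying the gene IRI on the Ensembl ID keeps the identifier stable even if the gene symbol changes.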
Mapping XML Results to RDF (2)
• Connecting gene and ontology id together with experimental metrics
Mapping XML Results to RDF (3)
• Connecting gene with experimental metadata
Relationship Issues
• EFO attempts to follow OBO Foundry guidance and uses the OBO Relation Ontology
• OBI model is more complex, e.g. the relation between sample and measure is indirect*
• Relationship between some of the entities is still not well represented across the community, even protein product to gene (see my post to OBO list)
• is_about relation is very generic and largely meaningless
• We will use RO where possible, subclass RO otherwise and continue to monitor OBO
*see Brinkman et al. (2010) Modeling biomedical experimental processes with OBI, JBMS, 1(Suppl 1):S7
Display of query results in Gene Expression Atlas DB
Already:
1) JSON format
2) XML format
Plus now:
3) RDF format
RDF pipeline
• Pipeline for generating the RDF given the XML input
• Note this works with any XML input
INPUT: XML result doc from Atlas, plus XML doc with triple patterns (triple pattern specification)
PROCESS: Java code
OUTPUT: RDF triples XML doc
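
A rough sketch of the pattern-driven idea behind the pipeline, in Python rather than the Java used for the real implementation. Both XML layouts below (`<results>`, `<patterns>`, the `row` and `base` attributes, the `{field}` placeholders) are invented for illustration; the actual Atlas result format and triple pattern specification are not shown in these slides. The sketch returns plain tuples instead of an RDF/XML document to stay short.

```python
import xml.etree.ElementTree as ET

# Hypothetical result document: one <result> element per row.
RESULT_XML = """
<results>
  <result><id>ENSG00000159216</id><name>RUNX1</name></result>
</results>
"""

# Hypothetical pattern document: each <triple> is a template whose
# {placeholders} are filled from the child elements of a row.
PATTERN_XML = """
<patterns row="result" base="http://example.org/atlas/">
  <triple subject="gene/{id}" predicate="name" object="{name}"/>
</patterns>
"""


def xml_to_triples(result_xml, pattern_xml):
    """Apply every triple pattern to every row of the result document."""
    patterns = ET.fromstring(pattern_xml.strip())
    base = patterns.get("base")
    triples = []
    for row in ET.fromstring(result_xml.strip()).iter(patterns.get("row")):
        # Map child element names to their text, e.g. {"id": "ENSG...", "name": "RUNX1"}.
        fields = {child.tag: (child.text or "").strip() for child in row}
        for pattern in patterns.findall("triple"):
            subject = base + pattern.get("subject").format(**fields)
            predicate = base + pattern.get("predicate")
            obj = pattern.get("object").format(**fields)
            triples.append((subject, predicate, obj))
    return triples


for triple in xml_to_triples(RESULT_XML, PATTERN_XML):
    print(triple)
```

Because the mapping lives in the pattern document and not in the code, the same function can be pointed at a different XML source by swapping the patterns, which is the property the slide notes.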
Example RDF
Blank Node Connections
• First row (n1_0) 7 triples
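
To illustrate how a blank node can tie one result row together, here is a small Python sketch that emits N-Triples lines sharing a single blank node subject. The predicate names (`gene`, `factor`, `pValue`), the `http://example.org/atlas/` namespace and the reading of `n1_0` as "row 0" are assumptions; the seven triples on the original slide are not reproduced here.

```python
def bnode_triples(row_index, gene_iri, efo_iri, metrics, ns="http://example.org/atlas/"):
    """Return N-Triples lines that hang one row's values off a blank node."""
    node = f"_:n1_{row_index}"  # blank node label modelled on the slide's n1_0
    lines = [
        f"{node} <{ns}gene> <{gene_iri}> .",
        f"{node} <{ns}factor> <{efo_iri}> .",
    ]
    # Each experimental metric becomes one literal-valued triple on the same node.
    for name, value in metrics.items():
        lines.append(f'{node} <{ns}{name}> "{value}" .')
    return lines


row = bnode_triples(
    0,
    "http://example.org/atlas/gene/ENSG00000159216",
    "http://www.ebi.ac.uk/efo/EFO_0000001",
    {"pValue": 0.01},
)
print("\n".join(row))
```

The blank node stands for the row itself, so the gene, the ontology class and the metrics stay linked without minting a public IRI for every result.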
Discussion
• Is there a community that warrants directing resources towards this?
• Can we answer new questions?
• Can we integrate with other data sources?
• Can we consolidate complex, non-interoperable ontologies?
• EFO represents a view on this but is a scoped, pragmatic choice – will this indeed always be the case?
Acknowledgements
• Electra Tapanari (intern that did bulk of implementation)
• Dietrich Rebholz-Schuhmann (funding internship)
• Christoph Grabmuller
• Misha Kapushesky
• Helen Parkinson
• Contact me
James Malone: [email protected]