20130122_ConvergenceMeeting_OpenPhacts

Paul Groth
http://www.few.vu.nl/~pgroth/
@pgroth
VU University Amsterdam
Convergence Meeting: Semantic Interoperability for Clinical
Research & Patient Safety in Europe
The Problem
We are all doing this many times…
Pfizer
AZ
GSK
Merck
n
Open PHACTS objective
Platform
Apps
API
Standards
Partners
Associate Partners
Sequenomics
“Let me compare MW, logP and PSA for known oxidoreductase inhibitors”
“What is the selectivity profile of known p38 inhibitors?”
“Find me compounds that inhibit targets in NFkB pathway assayed in only functional assays with a potency <1 μM”
ChEMBL
Gene Ontology
DrugBank
ChEBI
UniProt
ConceptWiki
WikiPathways
UMLS
ChemSpider
GeneGo
GVKBio
TrialTrove
TR Integrity
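The NFkB question above can be read as a chain of joins across sources: pathway membership (for example from a pathway resource), then target activities and assay types (for example from a bioactivity resource). The following is a minimal Python sketch over toy in-memory records. All identifiers, field names and values are invented for illustration, and "only functional assays" is interpreted here as restricting the search to functional assays; this is not the Open PHACTS API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    compound: str      # hypothetical compound identifier
    target: str        # hypothetical target identifier
    assay_type: str    # "functional" or "binding"
    potency_um: float  # potency in micromolar


# Toy data: every identifier and value below is made up.
PATHWAY_TARGETS = {"NFkB": {"T1", "T2"}}
ACTIVITIES = [
    Activity("C1", "T1", "functional", 0.2),
    Activity("C2", "T1", "binding", 0.1),
    Activity("C3", "T2", "functional", 5.0),
    Activity("C4", "T3", "functional", 0.05),
    Activity("C5", "T2", "functional", 0.9),
]


def compounds_for_pathway(pathway, max_potency_um=1.0):
    """Compounds active on a pathway target in a functional assay below the cutoff."""
    targets = PATHWAY_TARGETS.get(pathway, set())
    return sorted({
        a.compound
        for a in ACTIVITIES
        if a.target in targets
        and a.assay_type == "functional"
        and a.potency_um < max_potency_um
    })


print(compounds_for_pathway("NFkB"))  # ['C1', 'C5']
```

In practice each of the three filters draws on a different data source, which is why the question is hard to answer without an integration layer.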
Open PHACTS Explorer
PharmaTrek
ChemBioNavigator
Utopia Documents
Semantic interoperability approach
Principles
• Respect data providers
• Make it easy for application developers
Semantic interoperability approach
Fig. 1. The Open PHACTS Core Architecture
Semantic Resources – Data sets
814,535,923 triples
Source | Version | Supplier | Downloaded | Initial Records | Triples | Properties
ChEMBL | ChEMBL 13 RDF (11-Jun-2012) | Maastricht | 08 Aug 2012 | 1,149,792 (~1,091,462 compounds, 8,845 targets) | 146,079,194 | 17 (for compounds), 13 (for targets)
DrugBank | Aug 2008 | Bio2RDF (www4.wiwiss.fu-berlin.de) | 08 Aug 2012 | 19,628 (~14,000 targets, 5,000 drugs) | 517,584 | 74
SwissProt | 2012_07 (July 11, 2012) | SIB | 07 Aug 2012 | 536,789 | 156,569,764 | 78
ENZYME | July 11, 2012 | SIB | 08 Aug 2012 | 6,187 | 73,838 | 2
ChEBI | Release 94 | EBI | 08 Aug 2012 | 35,584 | 905,189 | 2
ChemSpider / ACD Labs | | ChemSpider | 07 Aug 2012 | 1,194,437 | 161,336,857 | 22 ACD, 4 CS
ConceptWiki | | NBIC | 07 Aug 2012 | 2,828,966 | 3,739,884 | 1
Semantic Resources - Mappings
18 million mappings
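One way to think about instance-level mappings is as pairwise links that group identifiers from different sources into equivalence classes. The Python sketch below uses a union-find structure for that grouping. It assumes mappings are symmetric and transitive, which is a simplification (real linksets may use predicates that do not guarantee this), and the identifiers shown are hypothetical.

```python
class IdentifierMapper:
    """Groups identifiers linked by pairwise mappings into equivalence classes."""

    def __init__(self):
        self._parent = {}

    def _find(self, x):
        # Register unseen identifiers as their own class, then walk to the root.
        self._parent.setdefault(x, x)
        while self._parent[x] != x:
            self._parent[x] = self._parent[self._parent[x]]  # path halving
            x = self._parent[x]
        return x

    def add_mapping(self, a, b):
        """Record that identifiers a and b denote the same thing."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def equivalents(self, x):
        """All identifiers in the same class as x, including x itself."""
        root = self._find(x)
        return sorted(i for i in list(self._parent) if self._find(i) == root)


mapper = IdentifierMapper()
mapper.add_mapping("chembl:X1", "chemspider:Y1")   # hypothetical identifiers
mapper.add_mapping("chemspider:Y1", "drugbank:Z1")
mapper.add_mapping("chembl:X2", "chemspider:Y2")
print(mapper.equivalents("drugbank:Z1"))
# ['chembl:X1', 'chemspider:Y1', 'drugbank:Z1']
```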
Semantic resources - Summary
• Types of semantic resources
– RDF Datasets
– Mappings
– Terminologies
• MeSH, UMLS, NCIM
– Hierarchies are essential
• E.g. Target Ontology, Gene Ontology, Enzyme classification
• Class reasoning is essential
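To illustrate why hierarchies and class reasoning matter, a query for "oxidoreductase" targets has to match targets annotated with any class below oxidoreductases, not only the top-level class itself. The Python sketch below expands a class to all of its descendants before filtering. The EC numbers are a small slice of the Enzyme Commission hierarchy; the target names and their annotations are hypothetical.

```python
from collections import defaultdict

# (child, parent) pairs from the Enzyme Commission hierarchy
# (EC 1 = oxidoreductases, EC 2 = transferases).
SUBCLASS_OF = [
    ("EC 1.1", "EC 1"),
    ("EC 1.1.1", "EC 1.1"),
    ("EC 1.1.1.1", "EC 1.1.1"),
    ("EC 2.7", "EC 2"),
]

# Hypothetical target annotations, invented for this example.
TARGET_CLASS = {"target_a": "EC 1.1.1.1", "target_b": "EC 2.7", "target_c": "EC 1.1"}


def build_children(pairs):
    """Index the hierarchy as parent -> set of direct children."""
    children = defaultdict(set)
    for child, parent in pairs:
        children[parent].add(child)
    return children


def descendants(children, root):
    """All classes at or below root, found by iterative depth-first traversal."""
    seen, stack = set(), [root]
    while stack:
        cls = stack.pop()
        if cls in seen:
            continue
        seen.add(cls)
        stack.extend(children.get(cls, ()))
    return seen


def targets_in_class(root):
    """Targets annotated with root or any of its subclasses."""
    classes = descendants(build_children(SUBCLASS_OF), root)
    return sorted(t for t, c in TARGET_CLASS.items() if c in classes)


print(targets_in_class("EC 1"))  # ['target_a', 'target_c']
```

Without the expansion step, a lookup on "EC 1" alone would return no targets, since none is annotated with the top-level class directly.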
Methodology for semantic integration
1. Define use cases
2. Data Providers – create RDF with VoID headers
3. Create mappings
– between dataset and known datasets (instance level)
– index for text to URL conversion
4. Ingest RDF into data cache (i.e. triple store)
5. Define access paths to core concepts in data
6. Extend or create SPARQL queries for API calls
7. Publish API calls
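The final steps, writing SPARQL queries and publishing them as API calls, can be pictured as a parameterized query template behind each call. The Python sketch below is an illustration under stated assumptions: the `ex:` predicates, the template, and the `build_query` helper are hypothetical and do not reflect the actual Open PHACTS schema or API.

```python
import re
from string import Template

# Hypothetical query template; the ex: predicates are placeholders.
COMPOUND_INFO = Template("""\
PREFIX ex: <http://example.org/schema#>
SELECT ?label ?mw WHERE {
  <$uri> ex:label ?label ;
         ex:molecularWeight ?mw .
}
LIMIT $limit""")

# Reject characters that could break out of the <...> IRI in the query.
_SAFE_URI = re.compile(r'^https?://[^\s<>"{}|\\^`]+$')


def build_query(uri, limit=10):
    """Expand an API call's parameters into a SPARQL query string."""
    if not _SAFE_URI.match(uri):
        raise ValueError(f"not a safe http(s) URI: {uri!r}")
    if not isinstance(limit, int) or limit < 1:
        raise ValueError("limit must be a positive integer")
    return COMPOUND_INFO.substitute(uri=uri, limit=limit)


print(build_query("http://example.org/compound/1", limit=5))
```

Keeping the query text on the server side means it can be tuned without changing what application developers call.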
It's easy to integrate, but difficult to integrate well
Adoption of standards
• Basic Semweb standards
– SPARQL 1.1, RDF(S), SKOS
• Dataset descriptions
– Vocabulary of Interlinked Datasets (VoID)
– VoID linkset descriptions
• QUDT Quantities, Units, Dimensions and Types
• Provenance
– W3C PROV, PAV, Nanopublications
• BioPortal
Tooling
• Infrastructure
– Linked Data API
– BridgeDb - identifier to identifier mapping
– ConceptWiki - text to identifier mapping and curation
– ChemSpider: chemistry registration and services
– Triple store: Virtuoso Professional edition
• Data
– VoID descriptions and HTTP and FTP sites
– GitHub for data conversion scripts
– Recommend Turtle as RDF syntax
• friendly for scripting
Quality assurance of the semantic resources
• Provenance everywhere
• Validation
– ChemSpider Validation and Standardization Platform (CVSP) for flagging chemical representation issues
• Curation
– High quality chemical names and synonyms
– Curation interfaces for terminologies (ConceptWiki)
• Report data quality issues to data providers
Semantic interoperability issues
1. Do not underestimate infrastructure
2. APIs are important
– Allow for tuning of SPARQL queries
– Make it easy for developers
3. Ontologies: Requirements vs. Recommendation
4. Modeling is hard
Open PHACTS Information
• http://www.openphacts.org
• [email protected]
• @Open_PHACTS
• Publications
– Overview paper: Williams, A.J., Harland, L., Groth, P., Pettifer, S., Chichester, C., Willighagen, E.L., Evelo, C.T., Blomberg, N., Ecker, G., Goble, C., Mons, B.: Open PHACTS: Semantic interoperability for drug discovery. Drug Discovery Today. 17, 1188–1198 (2012).
– Technical approach: Gray, A.J.G., Groth, P., Loizou, A., et al.: Applying linked data approaches to pharmacology: Architectural decisions and implementation. Semantic Web. (2012).