Semantics for eScience

Susie Stephens,
Principal Research Scientist,
Eli Lilly
Outline
• Introduction to the Semantic Web
• W3C’s Semantic Web for Health Care and Life Sciences Interest Group
• Semantic Web Solutions at Lilly
Introduction to the Semantic Web
Drivers for the Semantic Web
• Business models develop rapidly these days, so infrastructure
that supports change is needed
• Organizations increasingly form and disband collaborations, so
they need to be able to share data more easily
• Increasing need in pharma to be able to query across data silos
• Data is growing so quickly that it is no longer possible for
individuals to identify patterns in their heads
• Increasing recognition of the benefits of collective intelligence
Characterizing the Semantic Web
• Semantic Web is an interoperability technology
• An architecture for interconnected communities and
vocabularies
• A set of interoperable standards for knowledge exchange
Creating a Web of Data
[Figure: three layers, top to bottom: Applications; Graph representation; Data in various formats]
Source: Ivan Herman
Mashing Data
Source: W3C
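The layering on the two slides above (data in various formats, lifted into one graph representation, then consumed by applications) can be illustrated with a minimal sketch. It is not taken from the talk: the CSV and JSON inputs, the identifiers, and the helper names are all hypothetical, and a plain Python set of (subject, predicate, object) tuples stands in for an RDF graph.

```python
import csv
import io
import json

# Hypothetical inputs: the same entity described in two different formats.
CSV_TEXT = "id,name\ndrug1,ExampleDrug\n"
JSON_TEXT = '[{"id": "drug1", "treats": "example condition"}]'


def triples_from_csv(text):
    """Turn every non-id CSV cell into a (subject, predicate, object) triple."""
    for row in csv.DictReader(io.StringIO(text)):
        subject = row["id"]
        for column, value in row.items():
            if column != "id":
                yield (subject, column, value)


def triples_from_json(text):
    """Turn every non-id JSON field into a (subject, predicate, object) triple."""
    for record in json.loads(text):
        subject = record["id"]
        for key, value in record.items():
            if key != "id":
                yield (subject, key, value)


# "Mashing" the data: merging two graphs is just a set union of their triples.
graph = set(triples_from_csv(CSV_TEXT)) | set(triples_from_json(JSON_TEXT))


def describe(graph, subject):
    """Application layer: collect everything the merged graph says about a subject."""
    return {p: o for s, p, o in graph if s == subject}


print(describe(graph, "drug1"))
```

The point of the sketch is that once both sources share an identifier ("drug1" here), merging requires no schema negotiation; real deployments would use URIs and an RDF store rather than short strings and a set.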
W3C’s Semantic Web for Health Care and Life Sciences Interest Group
Task Forces
• Terminology – Semantic Web representation of existing resources
  • Task lead - John Madden
• Scientific Discourse – building communities through networking
  • Task leads - Tim Clark, John Breslin
• Clinical Observations Interoperability – patient recruitment in trials
  • Task lead - Vipul Kashyap
• BioRDF – integrated neuroscience knowledge base
  • Task lead - Kei Cheung
• Linking Open Drug Data – aggregation of Web-based drug data
  • Task lead - Chris Bizer
• Other Projects: Clinical Decision Support, URI Workshop, Collaborations with CDISC & HL7
BioRDF: Integrating Heterogeneous Data
• Integration and analysis of heterogeneous data sets
  • Hypothesis, Genome, Pathways, Molecular Properties, Disease, etc.
[Figure: integrated data sources: PDSPki, Gene Ontology, NeuronDB, Reactome, BAMS, Antibodies, NC Annotations, Entrez Gene, Allen Brain Atlas, BrainPharm, MESH, Mammalian Phenotype, SWAN, AlzGene, Homologene, Publications, PubChem]
BioRDF: Looking for Targets for Alzheimer’s
• Signal transduction pathways are
considered to be rich in “druggable”
targets
• CA1 Pyramidal Neurons are
known to be particularly damaged
in Alzheimer’s disease
• Casting a wide net, can we find
candidate genes known to be
involved in signal transduction and
active in Pyramidal Neurons?
Source: Alan Ruttenberg
BioRDF: SPARQL Query
Source: Alan Ruttenberg
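The query itself did not survive in this transcript. As a rough illustration of the kind of question posed on the previous slide (genes involved in signal transduction and active in pyramidal neurons), the sketch below joins three triple patterns over a toy in-memory graph. Everything in it is an assumption made for illustration: the identifiers, the predicate names, and the matching helper are hypothetical and are not the BioRDF vocabulary or the original SPARQL.

```python
# Toy triples; all identifiers are hypothetical placeholders, not BioRDF data.
TRIPLES = {
    ("gene:G1", "participates_in", "process:P1"),
    ("process:P1", "subclass_of", "process:signal_transduction"),
    ("gene:G1", "expressed_in", "cell:pyramidal_neuron"),
    ("gene:G2", "participates_in", "process:P2"),
    ("process:P2", "subclass_of", "process:metabolism"),
    ("gene:G2", "expressed_in", "cell:pyramidal_neuron"),
    ("gene:G3", "participates_in", "process:P1"),
    ("gene:G3", "expressed_in", "cell:astrocyte"),
}


def match(pattern, binding):
    """Yield extensions of `binding` that satisfy one triple pattern.

    Terms starting with '?' are variables; all other terms must match exactly.
    """
    for triple in TRIPLES:
        candidate = dict(binding)
        consistent = True
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if candidate.setdefault(term, value) != value:
                    consistent = False
                    break
            elif term != value:
                consistent = False
                break
        if consistent:
            yield candidate


def query(patterns):
    """Join the patterns left to right, as a basic graph pattern would."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for b2 in match(pattern, b)]
    return bindings


results = query([
    ("?gene", "participates_in", "?process"),
    ("?process", "subclass_of", "process:signal_transduction"),
    ("?gene", "expressed_in", "cell:pyramidal_neuron"),
])
candidates = sorted({(b["?gene"], b["?process"]) for b in results})
print(candidates)
```

Only the gene that satisfies all three patterns survives the join. In SPARQL the same shape would be written as three triple patterns in a WHERE clause, with the annotation, ontology, and expression data coming from different integrated sources.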
BioRDF: Results: Genes, Processes
Gene, ID         Process
DRD1, 1812       adenylate cyclase activation
ADRB2, 154       adenylate cyclase activation
ADRB2, 154       arrestin mediated desensitization of G-protein coupled receptor protein signaling pathway
DRD1IP, 50632    dopamine receptor signaling pathway
DRD1, 1812       dopamine receptor, adenylate cyclase activating pathway
DRD2, 1813       dopamine receptor, adenylate cyclase inhibiting pathway
GRM7, 2917       G-protein coupled receptor protein signaling pathway
GNG3, 2785       G-protein coupled receptor protein signaling pathway
GNG12, 55970     G-protein coupled receptor protein signaling pathway
DRD2, 1813       G-protein coupled receptor protein signaling pathway
ADRB2, 154       G-protein coupled receptor protein signaling pathway
CALM3, 808       G-protein coupled receptor protein signaling pathway
HTR2A, 3356      G-protein coupled receptor protein signaling pathway
DRD1, 1812       G-protein signaling, coupled to cyclic nucleotide second messenger
SSTR5, 6755      G-protein signaling, coupled to cyclic nucleotide second messenger
MTNR1A, 4543     G-protein signaling, coupled to cyclic nucleotide second messenger
CNR2, 1269       G-protein signaling, coupled to cyclic nucleotide second messenger
HTR6, 3362       G-protein signaling, coupled to cyclic nucleotide second messenger
GRIK2, 2898      glutamate signaling pathway
GRIN1, 2902      glutamate signaling pathway
GRIN2A, 2903     glutamate signaling pathway
GRIN2B, 2904     glutamate signaling pathway
ADAM10, 102      integrin-mediated signaling pathway
GRM7, 2917       negative regulation of adenylate cyclase activity
LRP1, 4035       negative regulation of Wnt receptor signaling pathway
ADAM10, 102      Notch receptor processing
ASCL1, 429       Notch signaling pathway
HTR2A, 3356      serotonin receptor signaling pathway
ADRB2, 154       transmembrane receptor protein tyrosine kinase activation (dimerization)
PTPRG, 5793      transmembrane receptor protein tyrosine kinase signaling pathway
EPHA4, 2043      transmembrane receptor protein tyrosine kinase signaling pathway
NRTN, 4902       transmembrane receptor protein tyrosine kinase signaling pathway
CTNND1, 1500     Wnt receptor signaling pathway
Many of the genes are related to AD through gamma secretase (presenilin) activity
Source: Alan Ruttenberg
LODD: Introduction
Use Semantic Web technologies to
1. publish structured data on the Web
2. set links from data in one data source to data in other data sources
[Figure: Linked Data browsers, Linked Data mashups, and search engines operating over "Things" in data sources A, B, C, D, and E, which are connected by typed links]
Source: Chris Bizer
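The two steps on this slide (publish structured data, then set typed links between sources) can be sketched as follows. This is an illustrative toy under stated assumptions: the three sources, their identifiers, and the link types are hypothetical, and dictionaries stand in for data that would normally be published as RDF behind dereferenceable URIs.

```python
# Hypothetical data sources A, B, and C, each publishing its own things.
SOURCES = {
    "A": {"A:trial1": {"label": "Example trial"}},
    "B": {"B:drug1": {"label": "Example drug"}},
    "C": {"C:city1": {"label": "Example city"}},
}

# Typed links between sources: (subject, link type, object).
LINKS = [
    ("A:trial1", "intervention", "B:drug1"),
    ("A:trial1", "location", "C:city1"),
]


def lookup(identifier):
    """Resolve an identifier in the source named by its prefix."""
    source = identifier.split(":", 1)[0]
    return SOURCES[source][identifier]


def follow(identifier, link_type):
    """Follow all links of one type from a thing and return the target labels."""
    return [
        lookup(target)["label"]
        for subject, kind, target in LINKS
        if subject == identifier and kind == link_type
    ]


print(follow("A:trial1", "intervention"))
print(follow("A:trial1", "location"))
```

Because each link carries a type, a browser or mashup can choose which links to traverse (for example only "intervention" links) without knowing the internal schema of the target source.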
LODD: Potential Links between Data Sets
Source: Chris Bizer
LODD: Potential questions to answer
• Physicians and Pharmacists
  • What are alternative drugs for a given indication (disease)?
  • What are equivalent drugs (generic version of a brand name, or the chemical name of an active ingredient)?
  • Are there ongoing clinical trials for a drug?
• Patients
  • What background information is available about a drug?
  • What are the contraindications of a drug?
  • Which alternative drugs are available?
  • What are the results of clinical trials for a drug?
• Pharmaceutical Companies
  • What are other companies with drugs in similar areas?
  • Which companies have a similar therapeutic focus?
Source: Chris Bizer
LODD: Linked Version of ClinicalTrials.gov
• Total number of triples: 6,998,851
• Number of Trials: 61,920
• RDF links to other data sources: 177,975
• Links to:
  • DBpedia and YAGO (from intervention and conditions)
  • GeoNames (from locations)
  • Bio2RDF.org's PubMed (from references)
Source: Chris Bizer
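To put the figures above in proportion, a quick calculation gives the average size of a trial description and its outgoing link density. The numbers are the ones on the slide; the variable names are only illustrative, and the averages say nothing about how triples or links are distributed across individual trials.

```python
# Figures from the slide above.
total_triples = 6_998_851
total_trials = 61_920
outgoing_links = 177_975

# Simple averages over all trials.
triples_per_trial = total_triples / total_trials
links_per_trial = outgoing_links / total_trials

print(f"{triples_per_trial:.1f} triples per trial on average")
print(f"{links_per_trial:.2f} outgoing RDF links per trial on average")
```

On these figures a trial is described by roughly 113 triples and carries just under three links to external data sources.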
Semantic Web Solutions at Lilly
Implementations at Lilly
• Integration of Clinical and Pathways Data
• Competitive Intelligence
• Experimental Metadata
• Discovery Metadata
Discovery Metadata: Goals
• Integrate master data throughout the discovery
process to enable information sharing/integration for
the scientific community
• Model key relationships between master data classes
• Provide the ability to integrate disparate data sets more quickly than
the normal warehouse paradigm typically allows
• Create a re-usable and sustainable semantic implementation
• Allow for user-driven, manual curation of key data
relationships
Source: Phil Brooks
Discovery Metadata: Ontology
[Figure: ontology diagram; labels: SAP, Legacy, REFDB, GSM, Manual Curation, NCBI]
Source: Phil Brooks
Discovery Metadata: Architecture
[Figure: layered architecture diagram; labels by layer]
• APPS: Application 1, Application 2, Application 3, …
• SOA: SOA Layer/Enterprise Service Bus (WebServices, Visualizers, DataAccess Components); SQL, Authentication, SPARQL, ETL
• DATA: Source Model 1, Source Model 2, Source Model 3, Source Model 4, Other Sources, …, Rdbms, Local Assertions, Top Level Ontology, Provenance, Other Tools, Spreadsheets
Source: Phil Brooks
External Collaborations
• RDF Access to Relational Databases - Chris Bizer, Eric Prud'hommeaux
  • Scalability testing of relational to RDF mapping approaches
• End User Semantic Web Authoring - David Karger
  • Enhancing the scalability and robustness of the Exhibit and Potluck tools
• Scientist-Driven Semantic Integration of Knowledge in Alzheimer's Disease - Tim Clark, June Kinoshita
  • Project to develop an integrated knowledge infrastructure for the neuromedical research community, pairing rich digital semantic context with the ever-growing digital scientific content on the web
• Provenance Collection and Management - Carole Goble, Beth Plale
  • Project to develop a metadata taxonomy for global data at Lilly which enables the rapid integration of data and mining/analysis algorithms into dataflows which support clinical and discovery decisions
• W3C’s Health Care and Life Sciences Interest Group
Conclusion
• Many Semantic Web solutions are being explored within the
health care and life sciences community
• Lilly is seeing tangible benefits in multiple projects from
Semantic Web
• Semantic Web provides a flexible framework for data integration
  • Incremental adoption of technology
  • Flexibility to integrate unanticipated data sets
  • Link existing silos together
• Lilly is setting up open collaborations in this space
• Try out LSG