Applying Semantic Web Standards to Drug
Discovery and Development
Eric Neumann
W3C HCLS co-chair
Knowledge
“--is the human acquired capacity (both
potential and actual) to take effective
action in varied and uncertain
situations.”
How does this translate into using Information Systems
better in support of Innovation?
2
Knowledge → Predictiveness
• Knowledge of Target Mechanisms
• Knowledge of Toxicity
• Knowledge of Patient-Drug Profiles
3
Where Information Advances are
Most Needed
• Supporting Innovative Applications in R&D
– Mol Diagnostics (Biomarkers)
– Molecular Mechanisms (Systems)
– Data Provenance, Rich Annotation
• Clinical Information
– eHealth Records + EDC
– Clinical Submission Documents
– Safety Information, Pharmacovigilance, Adverse Events
– Handling Biomarker evidence
• Standards
– Central Data Sources
• Genomics, Diseases, Chemistry, Toxicology
– MetaData
• Ontologies
• Vocabularies
4
[Diagram: Semantic Bridge]
Data formats and standards: Raw Data, MAGE-ML, GO, CDISC, BioPAX, PSI XML, ICH, ASN.1, XLS, SAS Tables, CSV
Application areas: Decision Support, Biomarker Qualification, Translational Research, Target Validation, Safety, Tox, New Applications
5
Losing Connectedness in Tables
Fast Uptake and ease of use,
but loose binding to entities and terms
[Figure labels: Genes, Tissues]
6
Data Integration?
• Querying Databases is not sufficient
• Data needs to include the Context of Local
Scientists
• Concepts and Vocabulary need to be
associated
• More about Sociology than Technology
Information → Knowledge
7
Data Integration:
Biology Requirements
[Diagram labels: Papers, Disease, Proteins, Genes, Retention Policy, Samples, Compounds, Audit Trail, Curation, Ontology, Experiment, Tools]
8
Standards: Why Not?
• Good when there’s majority agreement
• By vendors, for vendors?
• Mainly about data packing; should be more about semantics (user-defined)
• Ease and Expressivity
• Too often they’re brittle and slow to develop
• “They’re great, that’s why there are so many of them”
9
Data Integration Enables Business
Integration: Efficiency and Innovation
• Searching
• Visualization
• Analysis
• Reporting
• Notification
• Navigation
10
Searching…
#1 way for finding information in
companies…
11
Semantic Web Data Integration
R&D Scientist
Dynamic, Linked, Searchable
[Diagram labels: LIMS, Bioinformatics, Cheminformatics, Public Data Sources]
13
The Current Web
• What the computer sees: “Dumb” links
• No semantics - <a href> treated just like <bold>
• Minimal machine-processable information
14
The Semantic Web
• Machine-processable semantic information
• Semantic context published, making the data more informative to both humans and machines
15
The Web of Data
• URIs are universal IDs
• Distributed data references
• Non-locality of data
• NamedGraphs can help segment external references
• New meaning for Annotation
[Diagram labels: target, target, gene, pathway]
16
Case Study: Omics
ApoA1 …
… is produced by the Liver
… is expressed less in Atherosclerotic Liver
… is correlated with DKK1
… is cited regarding Tangier’s disease
… has Tx Reg elements like HNFR1
Subject → Verb → Object
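The ApoA1 statements above can be sketched as subject/verb/object triples. This is a minimal illustration, not the speaker's implementation; the predicate names (`producedBy`, `expressedLessIn`, and so on) are hypothetical labels invented here to stand in for the verbs on the slide.

```python
from collections import namedtuple

Triple = namedtuple("Triple", ["subject", "predicate", "object"])

# Statements taken from the slide; predicate names are assumptions.
triples = [
    Triple("ApoA1", "producedBy", "Liver"),
    Triple("ApoA1", "expressedLessIn", "Atherosclerotic Liver"),
    Triple("ApoA1", "correlatedWith", "DKK1"),
    Triple("ApoA1", "citedRegarding", "Tangier's disease"),
    Triple("ApoA1", "hasTxRegElement", "HNFR1"),
]


def match(store, subject=None, predicate=None, obj=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in store
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.object == obj)
    ]


if __name__ == "__main__":
    # Everything known about ApoA1, one statement per line.
    for t in match(triples, subject="ApoA1"):
        print(f"{t.subject} --{t.predicate}--> {t.object}")
```

The same `match` call answers the reverse question ("what is produced by the Liver?") by fixing the object instead of the subject.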
17
Example:
Knowledge
Aggregation
18
Courtesy of
BG-Medicine
Tim Berners-Lee’s App View
20
Semantic Web Drug Discovery and Development Application Space
[Diagram labels: Therapeutics, Chem Lib, manufacturing, NDA, Production, Genomics, Clinical Studies, HTS, eADME, Biology, Compound Opt, DMPK, genes, Patent informatics]
21
W3C Launches Semantic Web for HealthCare
and Life Sciences Interest Group
• Interest Group formally launched Nov 2005: http://www.w3.org/2001/sw/hcls
• First Domain Group for W3C - “…take SW through its paces”
• An Open Scientific Forum for Discussing, Capturing, and Showcasing Best Practices
• Recent life science members: Pfizer, Merck, Partners HealthCare, Teranode, Cerebra, NIST, U Manchester, Stanford U, AlzForum
• SW Supporting Vendors: Oracle, IBM, HP, Siemens, AGFA
• Co-chairs: Dr. Tonya Hongsermeier (Partners HealthCare); Eric Neumann (Teranode)
22
HCLS Objectives
• Share use cases, applications,
demonstrations, experiences
• Exposing collections
• Developing vocabularies
• Building / extending (where appropriate) core
vocabularies for data integration
23
HCLS Activities
• BioRDF - data as RDF
• BioNLP - unstructured data
• BioONT - ontology coordination
• Clinical Trials - CDISC/HL7
• Scientific Publishing - evidence management
• Adaptive Healthcare Protocols
24
Semantic Web in R&D
Progression
Manager
Toxicogenomicist
Shared Annotations
Notified of Alternatives
Reporting on Progression
Notify Others of Decisions
A Single Compound
Scientist
Found Determinations
Noted Alternatives
Open Data Format and Flexible Linking Enabled
Data Integration and Collaboration
25
R&D Applications in the Semantic Web
Progression Manager
Project Dashboard
Toxicogenomicist
Experiment Manager
Scientist
R&D Commons
A Single Compound
26
Other Benefits of Semantic Web
• Enterprise Distributed Connectivity
– Uniform Resource Identifiers (URI)
• Authenticity
– Auditability (Sarbanes-Oxley)
– Authorship Non-repudiability
• Privacy
– Encryptibility and Trust Networks
• Security
– At any level of granularity
27
What is the Semantic Web?
[Callouts: It’s Semantic Webs; It’s Text Extraction; It’s AI; It’s Web 2.0; It’s Data Tracking; It’s a Global Conspiracy; It’s Ontologies]
• http://www.w3.org/2006/Talks/0125-hclsig-em/
28
W3C Roadmap
• Semantic Web foundation specifications
– RDF, RDF Schema and OWL are W3C
Recommendations as of Feb 2004
• Standardization work is underway in Query,
Best Practices and Rules
• Goal of moving from a Web of Documents to
a Web of Data
The Only Open and Web-based Data Integration Model
Game in Town
29
Leveraging with Semantic Web
Benefit #1
• Free Data from Applications…
– Data uniquely defined by URI’s, even across multiple
databases
– Mapped through a common graph semantic model
– Data can be distributed (not in one location)
– New relations and attributes dynamically added
• As easy as spreadsheets, but with semantics and web
locations
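A minimal sketch of what "freeing data from applications" could look like, assuming two separate databases that happen to use the same URI for one target. All URIs and predicate names under `http://example.org/` are hypothetical; only the entity names (kenpaullone, GSK3beta, WNT, DiabetesT2) come from elsewhere in this deck.

```python
def merge_graphs(*graphs):
    """Union several triple sets; shared URIs connect them with no extra mapping."""
    merged = set()
    for g in graphs:
        merged |= set(g)
    return merged


def describe(graph, uri):
    """Collect every (predicate, object) pair asserted about one URI."""
    return sorted((p, o) for s, p, o in graph if s == uri)


# Hypothetical URIs, for illustration only.
V = "http://example.org/vocab#"
GSK3B = "http://example.org/target/GSK3beta"

# Two independent sources that never heard of each other.
chem_db = {("http://example.org/cmpd/kenpaullone", V + "inhibits", GSK3B)}
bio_db = {(GSK3B, V + "inPathway", "http://example.org/pathway/WNT")}

merged = merge_graphs(chem_db, bio_db)

# A new relation is added dynamically; no schema migration is involved.
merged.add((GSK3B, V + "associatedWith", "http://example.org/disease/DiabetesT2"))

if __name__ == "__main__":
    for predicate, obj in describe(merged, GSK3B):
        print(predicate, obj)
```

The design point is that the merge is a plain set union: because identity lives in the URI rather than in a table key, no join logic is needed.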
30
Leveraging with Semantic Web
Benefit #2
• All things on the Web can have semantics added to them
– Ability to define and link in ontologies
– Document Management through Links
– Changed data and semantics can be managed as versions
– Semantics can be used to define and apply policies
– No need for complex Middleware
31
Leveraging with Semantic Web
Benefit #3
• Supporting the Management of Knowledge
– All data nodes and doc resources can be linked
– Ability to represent Assertions and Hypotheses
• Include authorship and assumptions
• Use of KD45 logic
– Both Local and Global Knowledge
• Scientists can upload partially validated facts
– View Data and Interpretations through Points-of-View
(Semantic Lenses)
• Share views with others
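One way to picture assertions that carry authorship, and a "lens" that filters them by point of view, is sketched below. This is an assumption-laden toy: the `Assertion` fields, the `status` labels, and the example facts are invented for illustration (they are not validated findings), and the sketch does not implement KD45 logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    subject: str
    predicate: str
    obj: str
    author: str   # who made the statement
    status: str   # "hypothesis" or "validated" (labels assumed for this sketch)


def lens(assertions, author=None, status=None):
    """Filter assertions by point of view; None means 'do not filter on this'."""
    return [
        a for a in assertions
        if (author is None or a.author == author)
        and (status is None or a.status == status)
    ]


# Hypothetical notebook entries; names reuse labels from this deck.
notebook = [
    Assertion("GSK3beta", "implicatedIn", "DiabetesT2", "John", "validated"),
    Assertion("GSK3beta", "implicatedIn", "Alzheimers", "John", "hypothesis"),
    Assertion("SB44121", "inhibits", "GSK3beta", "GSK3 Team", "hypothesis"),
]

hypotheses = lens(notebook, status="hypothesis")
johns_view = lens(notebook, author="John")

if __name__ == "__main__":
    for a in hypotheses:
        print(f"{a.author} proposes: {a.subject} {a.predicate} {a.obj}")
```

Keeping the author and status on each statement is what lets partially validated facts sit in the same store as validated ones without being confused with them.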
32
The Technologies: RDF
• Resource Description Framework
• Think: "Relational Data Format"
• W3C standard for making statements of fact or
belief about data or concepts
• Descriptive statements are expressed as triples:
(Subject, Verb, Object)
– We call verb a “predicate” or a “property”
Subject: <Patient HB2122>
Property: <shows_sign>
Object: <Disease Pneumococcal_Meningitis>
33
What RDF Gets You
Universal, semantic
connectivity supports
the construction of
elaborate structures.
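As a sketch of how the patient triple from the RDF slide might be written down with real identifiers, the snippet below emits one N-Triples line. The `http://example.org/clinic/` namespace and the path layout are assumptions made here; the slide gives only the local names.

```python
def to_ntriple(subject, predicate, obj):
    """Serialize one triple of absolute URIs as a single N-Triples line."""
    return f"<{subject}> <{predicate}> <{obj}> ."


BASE = "http://example.org/clinic/"  # hypothetical namespace

line = to_ntriple(
    BASE + "Patient/HB2122",
    BASE + "shows_sign",
    BASE + "Disease/Pneumococcal_Meningitis",
)

if __name__ == "__main__":
    print(line)
```

Because each of the three positions is a URI, any other dataset that mentions the same patient or disease URI connects to this statement automatically.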
34
What does RDF get you?
• Structure is not format-rigid (i.e. tree)
– Semantics not implicit in Syntax
– No new parsers need to be defined for new data
• Entities can be anywhere on the web (URI)
• Define semantics into graph structures (ontologies)
– Use rules to test data consistency and extract important relations
• Data can be merged into complete graphs
• Multiple ontologies supported
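The idea of using rules to test data consistency can be sketched as follows. This is a simplification under stated assumptions: the `SCHEMA` table is hypothetical, and the check is closed-world (it flags missing types), whereas RDFS domain and range declarations would normally infer types rather than reject data.

```python
RDF_TYPE = "rdf:type"

# Hypothetical rule table: property -> (required subject class, required object class)
SCHEMA = {"shows_sign": ("Patient", "Disease")}


def types_of(graph, node):
    """All classes a node is explicitly typed with."""
    return {o for s, p, o in graph if s == node and p == RDF_TYPE}


def check_consistency(graph, schema):
    """Return triples whose subject or object lacks the class the rule requires."""
    violations = []
    for s, p, o in graph:
        if p in schema:
            domain, range_ = schema[p]
            if domain not in types_of(graph, s) or range_ not in types_of(graph, o):
                violations.append((s, p, o))
    return violations


graph = {
    ("HB2122", RDF_TYPE, "Patient"),
    ("Pneumococcal_Meningitis", RDF_TYPE, "Disease"),
    ("HB2122", "shows_sign", "Pneumococcal_Meningitis"),
    ("HB2122", "shows_sign", "GSK3beta"),  # GSK3beta is not typed as a Disease
}

if __name__ == "__main__":
    for bad in check_consistency(graph, SCHEMA):
        print("inconsistent:", bad)
```

No parser had to change to run the check; the rule operates on the graph structure itself.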
35
RDF vs. XML example
Wang et al., Nature Biotechnology, Sept 2005
AGML
36
HUPML
RDF Stripe Mode
Node>Edge>Node
>Edge….
37
RDF Graph
38
# Prefixes (gsk:, dc:, rdf:, rdfs: and the default ":") are assumed to be declared elsewhere.
gsk:KENPAL
    rdf:type :Compound ;
    dc:source <http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=14698171> ;
    :chemID "3820" ;
    :clogP "2.4" ;
    :kA "e-8" ;
    :mw "327.17" ;
    # The IC50 measurement is a blank node with its own value, units and target.
    :ic50 [ rdf:type :IC50 ; :value "23" ; :units :nM ; :forTarget gsk:GSK3beta ] ;
    :chemStructure "C16H11BrN2O" ;
    rdfs:label "kenpaullone" ;
    :synonym "bromo-paullone" ;
    :smiles "C1C2=C(C3=CC=CC=C3NC1=O)NC4=C2C=C(C=C4)B" ;
    :inChI "1/C16H11BrN2O/c17-9-5-6-14-11(7-9)12-8-15(20)18-13-4-2-1-3-10(13)16(12)1914/h1-7,19H,8H2,(H,18,20)/f/h18H" ;
    :xref <http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=3820> .
40
Mapping from Current Formats
DB
41
Excel => RDF
ls:indivCell ${ rdf:type ls:GE_Cell ;
    ls:probeHub gl:CASP2 ;
    ls:GE_Expected_Ratio "0.2726" ;
    ls:conditionHub gl:BREAST_MALIGNANT
};
ls:indivCell ${ rdf:type ls:GE_Cell ;
    ls:probeHub gl:TNFRS ;
    ls:GE_Expected_Ratio "0.0138" ;
    ls:conditionHub gl:BREAST_MALIGNANT
};
ls:indivCell ${ rdf:type ls:GE_Cell ;
    ls:probeHub gl:CASP2 ;
    ls:GE_Expected_Ratio "0.1275" ;
    ls:conditionHub gl:BREAST_NORMAL
};
[Spreadsheet labels: Casp2, TNFRS, Breast Malig]
42
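The Excel-to-RDF mapping above can be sketched in a few lines. The sheet layout (probes as rows, conditions as columns) and the blank-node naming are assumptions made for this illustration; the `ls:` and `gl:` terms and the three ratio values are the ones shown on the slide.

```python
import csv
import io

# Assumed layout: one row per probe, one column per condition; empty cell = no value.
SHEET = """probe,BREAST_MALIGNANT,BREAST_NORMAL
CASP2,0.2726,0.1275
TNFRS,0.0138,
"""


def cells_to_triples(text):
    """Turn each non-empty cell into a node linked to its row (probe) and column (condition)."""
    triples = []
    reader = csv.DictReader(io.StringIO(text))
    for row_no, row in enumerate(reader):
        probe = row["probe"]
        for condition, value in row.items():
            if condition == "probe" or not value:
                continue
            cell = f"_:cell_{row_no}_{condition}"  # hypothetical blank-node label
            triples += [
                (cell, "rdf:type", "ls:GE_Cell"),
                (cell, "ls:probeHub", f"gl:{probe}"),
                (cell, "ls:GE_Expected_Ratio", value),
                (cell, "ls:conditionHub", f"gl:{condition}"),
            ]
    return triples


cell_triples = cells_to_triples(SHEET)

if __name__ == "__main__":
    for t in cell_triples:
        print(*t)
```

Each cell keeps explicit links to both its row and its column entity, which is the binding a plain table leaves implicit.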
W3C Launches Semantic Web for HealthCare and Life Sciences
Interest Group
• Interest Group formally launched Nov 2005: http://www.w3.org/2001/sw/hcls
• First Domain Group for W3C - “…take SW through its paces”
– Not a standards group, but a group to identify the best implementations of current SW Standards!
• An Open Scientific Forum for Discussing, Capturing, and Showcasing Best Practices
• Co-chairs: Dr. Tonya Hongsermeier (Partners HealthCare); Eric Neumann (Teranode)
43
W3C Launches Semantic Web for HealthCare and Life Sciences
Interest Group
• First formal meeting: Jan 25-26, 2006 Cambridge, MA
• SW Supporting Vendors: Oracle, IBM, HP, Siemens, Agfa
• Recent life science members: Pfizer, Merck, Partners HealthCare, Teranode, Cerebra, NIST, U Manchester, Stanford U, U Bolzano, AlzForum
• Joining W3C gets you in as a group member
– Early access to technology and discussions
– Interaction with potential partners and clients
44
Multiple Ontologies Used Together
[Diagram labels: Disease, OMIM, UMLS, Group, FOAF, Disease Polymorphisms, SNP, Drug target ontology, UniProt, Protein, BioPAX, Person, PubChem, Patent ontology, Chemical entity]
[Legend: Extant ontologies; Under development; Bridge concept]
45
Potential Linked Clinical Ontologies
[Diagram labels: Clinical Obs, Disease Descriptions, SNOMED, Applications, CDISC, ICD10, RCRIM (HL7), Clinical Trials, Disease Models, ontology, Mechanisms, Pathways (BioPAX), IRB, Tox, Genomics, Molecules]
[Legend: Extant ontologies; Under development; Bridge concept]
46
Case Studies
47
Case Study: NeuroCommons.org
• Public Data & Knowledge for CNS
• R&D Forum
• Available for industry and academia
• All based on Semantic Web Standards
48
NeuroCommons
The Recontribution of Knowledge
Publications are usually copyrighted…
Knowledge of Nature should be openly shareable!
49
NeuroCommons.org
The Neurocommons project, a collaboration between Science Commons
and the Teranode Corporation, is creating a free, public Semantic Web
for neurological research. The project has three distinct goals:
1. To demonstrate that scientific impact and innovation are directly related to
the freedom to legally reuse and technically transform scientific
information.
2. To establish a legal and technical framework that increases the impact
of investment in neurological research in a public and clearly
measurable manner.
3. To develop an open community of neuroscientists, funders of
neurological research, technologists, physicians, and patients to extend
the Neurocommons work in an open, collaborative, distributed manner.
50
NeuroCommons First Steps
The first stage is underway:
• Using NLP and other automated technologies,
extract machine-readable representations of
neuroscience-related knowledge as contained
in free text and databases
• Assemble those representations into a graph
• Publish the graph with no intellectual property
rights or contractual restrictions on reuse
52
HCLS Neuro Tasks
• Aggregate facts and models around
Parkinson’s Disease
• SWAN: scientific annotations and evidence
• Use RDF and OWL to describe
– Brain scans in The Whole Brain Atlas
– Neural entries in NCBI’s Entrez Gene Database
– 'Brain Connectivity'
– Neuronal data in SenseLab
– Neurological Disease entries in OMIM
53
Case Study: BioPAX (Pathways)
<bp:PATHWAYSTEP rdf:ID="xDshToXGSK3bPathwayStep">
  <bp:next-step rdf:resource="#xGSK3bToBetaCateninPathwayStep"/>
  <bp:step-interactions>
    <bp:MODULATION rdf:ID="xDshToXGSK3b">
      <bp:left rdf:resource="#xDsh"/>
      <bp:right rdf:resource="#xGSK-3beta"/>
      <bp:participants rdf:resource="#xGSK-3beta"/>
      <bp:name rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Dishevelled to GSK3beta</bp:name>
      <bp:direction rdf:datatype="http://www.w3.org/2001/XMLSchema#string">IRREVERSIBLE-LEFT-TO-RIGHT</bp:direction>
      <bp:control-type rdf:datatype="http://www.w3.org/2001/XMLSchema#string">INHIBITION</bp:control-type>
      <bp:participants rdf:resource="#xDsh"/>
    </bp:MODULATION>
  </bp:step-interactions>
</bp:PATHWAYSTEP>
54
Case Study: BioPAX (Pathways)
<bp:PATHWAYSTEP rdf:ID="xDshToXGSK3bPathwayStep">
  <bp:next-step rdf:resource="#xGSK3bToBetaCateninPathwayStep"/>
  <bp:step-interactions>
    <bp:MODULATION rdf:ID="xDshToXGSK3b">
      <bp:left rdf:resource="#xDsh"/>
      <bp:right rdf:resource="#xGSK-3beta"/>
      <bp:participants rdf:resource="#xGSK-3beta"/>
      <bp:name rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Dishevelled to GSK3beta</bp:name>
      <bp:direction rdf:datatype="http://www.w3.org/2001/XMLSchema#string">IRREVERSIBLE-LEFT-TO-RIGHT</bp:direction>
      <bp:control-type rdf:datatype="http://www.w3.org/2001/XMLSchema#string">INHIBITION</bp:control-type>
      <drug:affectedBy rdf:resource="http://pharma.com/cmpd/CHIR99102"/>
      <bp:participants rdf:resource="#xDsh"/>
    </bp:MODULATION>
  </bp:step-interactions>
</bp:PATHWAYSTEP>
[Diagram labels: Modulation, affectedBy, CHIR99102]
55
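The point of the second BioPAX slide, that a property from another vocabulary can be attached without disturbing existing consumers, can be sketched with Python's standard XML parser. Several things here are assumptions: the `drug:` namespace URI is hypothetical, the BioPAX namespace URI is assumed to be the Level 1 one, the snippet is trimmed to a single MODULATION node, and ElementTree reads it as XML only (it is not an RDF parser).

```python
import xml.etree.ElementTree as ET

BP = "http://www.biopax.org/release/biopax-level1.owl#"  # assumed BioPAX namespace
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DRUG = "http://example.org/drug#"  # hypothetical namespace for drug:affectedBy

DOC = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:bp="{BP}" xmlns:drug="{DRUG}">
  <bp:MODULATION rdf:ID="xDshToXGSK3b">
    <bp:left rdf:resource="#xDsh"/>
    <bp:right rdf:resource="#xGSK-3beta"/>
    <bp:control-type>INHIBITION</bp:control-type>
    <drug:affectedBy rdf:resource="http://pharma.com/cmpd/CHIR99102"/>
  </bp:MODULATION>
</rdf:RDF>"""


def properties(doc):
    """Map each child property of the MODULATION node to its resource or text value."""
    root = ET.fromstring(doc)
    node = root.find(f"{{{BP}}}MODULATION")
    out = {}
    for child in node:
        value = child.get(f"{{{RDF}}}resource") or (child.text or "").strip()
        out[child.tag] = value
    return out


props = properties(DOC)

# A consumer that only understands BioPAX simply ignores the extra property.
bp_only = {k: v for k, v in props.items() if k.startswith("{" + BP + "}")}

if __name__ == "__main__":
    for tag, value in props.items():
        print(tag, "=", value)
```

The `bp_only` view is unchanged by the added `drug:affectedBy` statement, while a drug-aware consumer can pick it up from the same node.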
Case Study: Drug Discovery
Dashboards
• Dashboards and Project Reports
• Next generation browsers for semantic
information via Semantic Lenses
• Renders OWL-RDF, XML, and HTML documents
• Lenses act as information aggregators and logic
style-sheets
add { ls:TheraTopic
hs:classView:TopicView
}
56
Drug Discovery Dashboard
http://www.w3.org/2005/04/swls/BioDash
Topic: GSK3beta Topic
Disease: DiabetesT2
Alt Dis: Alzheimers
Target: GSK3beta
Cmpd: SB44121
CE: DBP
Team: GSK3 Team
Person: John
Related Set
Path: WNT
58
Bridging Chemistry and Molecular Biology
Semantic Lenses: Different Views of the same
data
BioPax
Components
Target Model
urn:lsid:uniprot.org:uniprot:P49841
Apply Correspondence Rule:
if ?target.xref.lsid == ?bpx:prot.xref.lsid
then ?target.correspondsTo.?bpx:prot
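A minimal sketch of the correspondence rule above, assuming targets and BioPAX proteins are available as simple records. The record layout, the `id` values, and the second LSID are hypothetical; the UniProt LSID for P49841 is the one shown on the slide.

```python
def apply_correspondence(targets, proteins):
    """Link a target to a pathway protein when their xref LSIDs are identical."""
    by_lsid = {}
    for prot in proteins:
        by_lsid.setdefault(prot["xref_lsid"], []).append(prot["id"])
    return [
        (t["id"], "correspondsTo", prot_id)
        for t in targets
        for prot_id in by_lsid.get(t["xref_lsid"], [])
    ]


LSID = "urn:lsid:uniprot.org:uniprot:P49841"

# Hypothetical records from a target model and a BioPAX pathway model.
targets = [{"id": "target:GSK3beta", "xref_lsid": LSID}]
proteins = [
    {"id": "bpx:xGSK-3beta", "xref_lsid": LSID},
    {"id": "bpx:xDsh", "xref_lsid": "urn:lsid:example.org:protein:unknown"},
]

links = apply_correspondence(targets, proteins)

if __name__ == "__main__":
    for link in links:
        print(*link)
```

The rule produces a new `correspondsTo` statement rather than merging the two records, so each model keeps its own description of the object.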
59
Bridging Chemistry and Molecular Biology
• Lenses can aggregate, accentuate, or even analyze new result sets
• Behind the lens, the data can be persistently stored as RDF-OWL
• Correspondence does not need to mean “same descriptive object”, but may mean objects with identical references
60
Case Study: Drug Safety
‘Safety Lenses’
• Lenses can ‘focus’ data in specific ways
– Hepatotoxicity, genotoxicity, hERG, metabolites
• Can be “wrapped” around statistical tools
• Aggregate other papers and findings (knowledge) in context with a particular project
• Align animal studies with clinical results
• Support special “Alert-channels” by regulators for each different toxicity issue
• Integrate JIT information on newly published mechanisms of action
61
GeneLogic GeneExpress Data
• Additional relations and aspects can be defined
Diseased
Tissue
Links to
OMIM (RDF)
62
ClinDash: Clinical Trials Browser
• Values can be normalized across all measurables (rows)
• Samples can be aligned to their subjects using RDF rules
• Clustering can now be done over all measurables (rows)
[Figure labels: Subjects, Clinical Obs, Expression Data]
63
Case Study: Nokia
• Developer’s Forum Portal
64
Case Study: TERANODE Design Suite
Supports Laboratory Data and Workflow
• Protocol Modeler
– Accelerates workflow development
– Eliminates database programming
• Protocol Player
– Guides users through workflow
– Automates data capture
– Automates complex data flow plates
– Integrates lab data with project and enterprise data
65
Conclusions:
Key Semantic Web Principles
• Plan for change
• Free data from the application that created it
• Lower reliance on overly complex Middleware
• The value in "as needed" data integration
• Big wins come from many little ones
• The power of links - network effect
• Open-world, open solutions are cost effective
• Importance of "Partial Understanding"
66
Efficiency and Innovation:
Semantic Web Applications Roadmap