DIA-CDM-3-18-07-f - W3C Public Mailing List Archives


Tutorial: Semantic Web Applications
in Clinical Data Management
Eric Neumann
Clinical Semantic Group
W3C HCLS chair, MIT Fellow
1
Tutorial Overview
• Bench-to-Bedside Vision
• Information Challenges
• Semantic Web: What is it?
– RDF: Recombinant Data (Aggregation)
– OWL: Vocabularies (NCI, SNOMED)
– Rules
• Translational Medicine Needs
• Clinical Data Standards: CDISC
• Re-Using Clinical Knowledge
– Retrospective DBs: JANUS
– Open Knowledge Benefits: Tox Commons
2
Bench-to-Bedside
• Connecting pre-clinical and clinical studies
– Translational Medicine
• Patient Stratification & Personalized
medicine (not the same)
• Knowledge and Data Integration
– Better Disease Understanding
– Next Generation Therapies, New Applications
• More Predictive (earlier) Safety Signals
3
4
from Innovation or Stagnation, FDA Report March 2004
New Regulatory Issues
Confronting Pharmaceuticals
Tox/Efficacy
ADME Optim
from Innovation or Stagnation, FDA Report March 2004
5
Translational Medicine
• Enable physicians to more effectively translate
relevant findings and hypotheses into therapies for
human health
• Support the blending of huge volumes of clinical
research and phenotypic data with genomic research
data
• Apply that knowledge to patients and finally make
individualized, preventative medicine a reality for
diseases that have a genetic basis
6
Drug Discovery & Development
Knowledge
Qualified Targets
Molecular Mechanisms
Lead Generation
Toxicity & Safety
Lead Optimization
Pharmacogenomics
Biomarkers
Clinical Trials
Launch
7
Ecosystem: Goal State
Merging Biomed Research, Clinical Trials and Clinical Practice
Biomedical Research
Clinical Practice
8
HC
Choices
HCLS Ecosystem
Insurers
Grants
HMO,PPO
Gov/Funding
Biomed Research
Publications and
Public Databases
BKB
Risks &
Benefits
Disease
Areas
Mol Path Res
Clin Res
EHR
Chem
Manuf
HCP
VA
System
R&D
Large
Studies
Clin POC
Surveillance
Biomarker
Tox
Preclin
Gov/Regulatory
Safety
Commons
Drug
Programs
Public
Clin
Safety
Marketing
CROs
JANUS
9
Information Challenges
• No common way to bring data and docs together
– HTML links carry no meaning with them
• Today’s integration approaches prevent data re-use
• No global way to annotate our experiments and
experiences
– Most annotations cannot be found by context
– No “sci-blog” for data interpretation
• Enterprise Information access and discoverability are weak
– Making timely discoveries!
– Why we all like Google
• Cutting and pasting between docs promotes fact mutation
and loss of provenance
– Address business operations and tracking, and reduce static data copying
10
A web of information
Courtesy of
R. Stevens
11
Distributed Nature of
Biomedical Knowledge
Silos of Data…
HCS
Patents
Tox
Biomarkers
Libraries
Targets
Assays
Drug
Registry
Genotypes
Diseases
Clinical
Trials
12
The Big Picture In Drug R&D
Hard to understand from just a
few isolated Points of View
13
What if Scientists could put it
together for themselves?
14
15
Complete view tells a very different Story
Whose Schema?
Clinical Papers
Disease
Subjects Genotype
Enrollment
Criteria
Dosing
Observations
Audit
Trail
Tox Signals
Statistics
Ontology
Trials
16
Why Searching ala Google is
not enough
Google’s ability to rank and graph without
using semantics is comparable to…
… a Drug R&D Project that looks for
associations, but makes no attempt to find or
represent mechanisms of action
17
What is the Semantic Web?
The Layer
Cake
19
The Current Web
• What the computer sees: “Dumb” links
• No semantics - <a href> treated just like <bold>
• Minimal machine-processable information
20
The Semantic Web
• Machine-processable semantic information
• Semantic context published – making the data more informative to both humans and machines
21
Needed to realize the SW vision
• A standard way of identifying things
• A standard way of describing things
• A standard way of linking things
• Standard vocabularies for talking about things
22
The Semantic Web
Basic Standards for Describing Things
• Richer structure for basic resources (XML)
• Describe Data by Semantics and Not Syntax: RDF
• Define Semantics using RDFS or OWL
• Reference and Relate All Resources using URIs
• Query RDF graphs with SPARQL, as SQL queries relational tables
• Rules for higher level reasoning
23
The Technologies: RDF
• Resource Description Framework (RDF)
• W3C standard for making statements or
hypotheses about data and concepts
• Descriptive statements are expressed as triples:
(Subject, Verb, Object)
Subject
<Compound HB-2182>
Property
<binds_to>
Object
<Target P38_alpha>
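The triple on this slide can be sketched in a few lines of Python. This is only an illustration, assuming plain strings stand in for URIs; the `Triple` type and the `match` helper are hypothetical names, not part of any RDF library.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    """One statement: subject, predicate (property), object."""
    subject: str
    predicate: str
    obj: str


def match(triples: List[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t.subject == s)
            and (p is None or t.predicate == p)
            and (o is None or t.obj == o)]


# The statement from the slide, written as a triple.
statements = [Triple("Compound HB-2182", "binds_to", "Target P38_alpha")]

# "What binds to anything?" leaves subject and object open.
hits = match(statements, p="binds_to")
```

Leaving any position open turns the same structure into a query pattern, which is the idea SPARQL builds on.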
24
Facts as triples
has_associated_disease
PARK1
subject
Parkinson disease
predicate
object
25
From triples to a graph
[Figure: individual triples that share nodes combine into a single graph. Genes: MAPT, PARK1, TBP. Diseases: Pick disease, Parkinson disease, Spinocerebellar ataxia. Edge label: has_associated_disease]
26
Connecting graphs
• Integrate graphs from multiple resources
• Query across resources
Neurodegenerative diseases
isa
Alzheimer disease
Parkinson disease
APP
Alzheimer disease
has_associated_disease
PARK1
Parkinson disease
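Merging and querying across graphs can be sketched with plain Python sets. The triples below are the ones named on this slide; representing a graph as a set of tuples and the helper name `genes_for_disease_class` are assumptions made for illustration.

```python
# Graph 1: a disease hierarchy published by one resource.
disease_hierarchy = {
    ("Alzheimer disease", "isa", "Neurodegenerative diseases"),
    ("Parkinson disease", "isa", "Neurodegenerative diseases"),
}

# Graph 2: gene-disease associations published by another resource.
gene_associations = {
    ("APP", "has_associated_disease", "Alzheimer disease"),
    ("PARK1", "has_associated_disease", "Parkinson disease"),
}

# Because both graphs use the same names for the diseases,
# integration is a set union; no schema mapping is needed.
merged = disease_hierarchy | gene_associations


def genes_for_disease_class(graph, disease_class):
    """Find genes associated with any disease that 'isa' disease_class."""
    members = {s for (s, p, o) in graph if p == "isa" and o == disease_class}
    return sorted(s for (s, p, o) in graph
                  if p == "has_associated_disease" and o in members)


genes = genes_for_disease_class(merged, "Neurodegenerative diseases")
```

The query only succeeds on the merged graph: neither source alone links a gene to the class of neurodegenerative diseases.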
27
The URI - global identification
URI serves as a universal
and uniform identifier for all
web based resources.
29
A Family of Identifiers
URI = Uniform Resource Identifier
URL = Uniform Resource Locator
URN = Uniform Resource Name
LSID = Life Science Identifier
http://www.w3.org/Addressing/
30
Uniform Resource Locator
• A type of resource identifier
• Identifies the location of a resource (or part thereof)
• Specifies a protocol to access the resource
– http, ftp, mailto
• E.g.,
– http://www.nlm.nih.gov/
31
Uniform Resource Name
• A type of resource identifier
• Identifies the name of a resource
• Location independent
• Defines a namespace
• E.g.,
– urn:isbn:0-262-02591-4
– urn:umls:C0001403
32
Life Science Identifier
• A type of resource identifier
• A type of URN
• For biological entities
• Specific properties
– Versioned
– Resolvable
– Immutable
• E.g., urn:lsid:ncbi.nlm.nih.gov:pubmed:12571434
– DNS name: ncbi.nlm.nih.gov
– namespace: pubmed
– unique ID: 12571434
http://lsid.sourceforge.net/
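The parts of an LSID can be pulled apart with a small parser. This is a sketch, assuming the colon-separated layout shown above with an optional trailing revision field; `LSID` and `parse_lsid` are hypothetical names, and resolving the identifier against its authority is out of scope here.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str            # DNS name of the issuing authority
    namespace: str            # namespace chosen by that authority
    object_id: str            # unique ID within the namespace
    revision: Optional[str]   # optional version field


def parse_lsid(text: str) -> LSID:
    """Split 'urn:lsid:authority:namespace:object[:revision]' into its parts."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError(f"LSID has an empty field: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(authority, namespace, object_id, revision)


example = parse_lsid("urn:lsid:ncbi.nlm.nih.gov:pubmed:12571434")
```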
33
RDF Examples
…as RDF-XML
<cdisc:Subject rdf:about="http://clinic.com/study/T2271/subject/4183542663506">
<nci:sex_code rdf:resource="nci#Female" />
<cdisc:treatment
rdf:resource="http://clinic.com/study/T2271/subject/4183542663506/observation/O2241" />
<cdisc:vitalSigns
rdf:resource="http://clinic.com/study/T2271/subject/4183542663506/observation/O6561" />
<cdisc:adverseEvent
rdf:resource="http://clinic.com/study/T2271/subject/4183542663506/observation/O6622" />
</cdisc:Subject>
…as N3
<http://clinic.com/study/T2271/subject/4183542663506>
a cdisc:Subject ;
nci:sex_code nci:Female ;
cdisc:treatment <http://clinic.com/study/T2271/subject/4183542663506/observation/O2241> ;
cdisc:vitalSigns <http://clinic.com/study/T2271/subject/4183542663506/observation/O6561> ;
cdisc:adverseEvent <http://clinic.com/study/T2271/subject/4183542663506/observation/O6622> .
34
Semantic Data Integration:
Incremental Roadmap
• Data assets remain as they are!
They do not need to be modified
• The wrapper abstracts out details related to location,
access and data structure
• Integration happens at the information level
• Highly configurable and incremental process
• Ability to specify declarative rules and mappings for
further hypothesis generation
35
RDBM => RDF
<URI>
{primary keys}
<URI>
<hasDisease>
<URI>
{primary keys}
<interactsWith>
<URI>
<URI>
<URI>
<canCause>
Virtualized RDF
36
Semantic Data Integration
Bridging Clinical and Genomic Information
“Paternal”
“Mr. X”
1
90%
degree
type
name
Patient
(id = URI1)
has_structured_test_result
evidence1
Patient
(id = URI1)
related_to
has_family_history
Person
(id = URI2)
associated_relative
MolecularDiagnosticTestResult
(id = URI4)
identifies_mutation
indicates_disease
problem
FamilyHistory
(id = URI3)
“Sudden Death”
MYH7 missense Ser532Pro
(id = URI5)
EMR Data
LIMS Data
Dilated
Cardiomyopathy
(id = URI6)
evidence2
Rule/Semantics-based Integration:
- Match Nodes with same Ids
- Create new links: IF a patient’s structured test result indicates a disease
THEN add a “suffers_from” link to that disease
95%
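The integration rule above can be sketched as a single pass over a merged set of triples. The URIs and predicate names come from the diagram; which triples live in the EMR and which in the LIMS is an assumption, as is the function name `apply_suffers_from_rule`.

```python
def apply_suffers_from_rule(graph):
    """IF patient has_structured_test_result R and R indicates_disease D
    THEN add (patient, suffers_from, D). Returns a new, larger graph."""
    inferred = set()
    for (patient, p1, result) in graph:
        if p1 != "has_structured_test_result":
            continue
        for (r, p2, disease) in graph:
            if r == result and p2 == "indicates_disease":
                inferred.add((patient, "suffers_from", disease))
    return graph | inferred


# EMR side (assumed split): the patient and what is recorded about them.
emr = {
    ("URI1", "has_family_history", "URI3"),
    ("URI1", "has_structured_test_result", "URI4"),
}

# LIMS side (assumed split): what the molecular test result says.
lims = {
    ("URI4", "identifies_mutation", "URI5"),
    ("URI4", "indicates_disease", "URI6"),
}

# Step 1: nodes with the same IDs match, so merging is a set union.
# Step 2: the rule adds the new link.
integrated = apply_suffers_from_rule(emr | lims)
```

The new link appears only after the merge, because the rule needs one triple from each source.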
37
Semantic Data Integration:
Bridging Clinical and Genomic Information
90%
evidence
Dilated
Cardiomyopathy
(id = URI6)
suffers_from
“Paternal”
“Mr. X”
1
type
name
degree
indicates_disease
StructuredTestResult
(id = URI4)
has_structured_test_result
identifies_mutation
MYH7 missense Ser532Pro
(id = URI5)
Patient
(id = URI1)
related_to
has_family_history
has_gene
Person
(id = URI2)
associated_relative
problem
FamilyHistory
(id = URI3)
RDF Graphs provide a semantics-rich substrate for decision
support. Can be exploited by SWRL Rules
“Sudden Death”
38
Drug
Discovery
Dashboard
Semantic
Data Integration
and Visualization:
http://www.w3.org/2005/04/swls/BioDash
Drug Discovery
Topic: GSK3beta Topic
Disease: DiabetesT2
Alt Dis: Alzheimers
Target: GSK3beta
Cmpd: SB44121
CE: DBP
Team: GSK3 Team
Person: John
Related Set
Path: WNT
39
Semantic Data Integration:
Bridging Chemistry and Molecular Biology
Semantic Lenses: Different Views of the same
data
BioPax
Components
Target Model
urn:lsid:uniprot.org:uniprot:P49841
Apply Correspondence Rule:
if ?target.xref.lsid == ?bpx:prot.xref.lsid
then ?target.correspondsTo.?bpx:prot
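The correspondence rule amounts to a join on a shared LSID. Below is a sketch under stated assumptions: each model is reduced to a dictionary from a local identifier to its cross-reference LSID, and the local identifiers (`target:GSK3beta`, `bpx:protein_17`, `bpx:protein_42`) and the `example.org` LSID are made up for illustration.

```python
def correspondences(targets, proteins):
    """Emit (target, 'correspondsTo', protein) wherever both carry the same LSID."""
    by_lsid = {}
    for protein_id, lsid in proteins.items():
        by_lsid.setdefault(lsid, []).append(protein_id)
    return {(target_id, "correspondsTo", protein_id)
            for target_id, lsid in targets.items()
            for protein_id in by_lsid.get(lsid, [])}


# Target model: local target name -> cross-reference LSID.
targets = {"target:GSK3beta": "urn:lsid:uniprot.org:uniprot:P49841"}

# BioPAX components: local protein name -> cross-reference LSID.
proteins = {
    "bpx:protein_17": "urn:lsid:uniprot.org:uniprot:P49841",
    "bpx:protein_42": "urn:lsid:example.org:proteins:X1",
}

links = correspondences(targets, proteins)
```

The two objects keep their own descriptions; the rule only records that they reference the same thing.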
40
Semantic Data Integration
Bridging Chemistry and Molecular Biology
•Lenses can aggregate, accentuate,
or even analyze new result sets
• Behind the lens, the data can be
persistently stored as RDF-OWL
• Correspondence does not need
to mean “same descriptive
object”, but may mean objects
with identical references
41
Semantic Data Integration
Pathway Polymorphisms
•Merge directly onto
pathway graph
•Identify targets with
lowest chance of genetic
variance
•Predict parts of pathways
with highest functional
variability
Non-synonymous
polymorphisms
from db-SNP
•Map genetic influence to
potential pathway elements
•Select mechanisms of
action that are minimally
impacted by polymorphisms
42
Scenario: Biomarker Qualification
• Semantics which Define…
• Biomarker Roles
– Disease
– Toxicity
– Efficacy
• Molecular and cytological markers
– Tissue-specific
– High content screening derived information
– Different sets associated with different predictive tools
• Statistical discrimination based on selected samples
– Predictive power
– Alternative cluster prediction algorithms
– Support qualifications from multiple studies (comparisons)
• Causal mechanisms
– Pathways
– Population variation
43
Semantic Data Integration: Advantages
• RDF: Graph based data model
– More expressive than the tree based XML Schema Model
• RDF: Reification
– Same piece of information can be given different values of belief by different
clinical genomic researchers
• Potential for “Schema-less” Data Integration
– Hypothesis driven approach to defining mapping rules
– Can define mapping rules on the fly
• Incremental approach for Data Integration
– Ability to introduce new data sources into the mix incrementally at low cost
• Use of Ontology to disallow meaningless mapping rules?
– E.g., mapping a gene to a protein…
44
Semantic Data Integration
“Schema-free” data integration
• Low cost approach for data integration
• No need for maintenance of costly schema
mappings
• Ability to “merge” RDF graphs based on simple
declarative rules that specify:
– Equality of URIs
– Connecting nodes of same type
– Connecting two nodes associated by a “path”
• Disadvantage: Potential for specifying spurious,
nonsensical rules
45
Semantic Data Integration
Use of Reification
• Level of accuracy of test result.
– Sensitivity and Specificity of lab result
– Level of confidence in genotyping or gene sequencing
• Probabilistic relationships
– Likelihood that a particular test result or condition is indicative of
a disease or other medical condition
• Level of trust in a resource
– Results from one lab may be trusted more than results from another
– Results from well known health sites (NLM) may be trusted more
than others
• Belief attribution
– Scientific hypotheses may be attributed to appropriate
researchers
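Reification can be pictured as wrapping a statement in a record that carries who asserted it and how strongly. This is a sketch only: the `ReifiedStatement` class, the `belief` field, and the source names `lab_A` and `lab_B` are assumptions, and it does not use the RDF reification vocabulary itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReifiedStatement:
    """A statement about a statement: the triple plus its attribution."""
    subject: str
    predicate: str
    obj: str
    asserted_by: str   # who holds this belief (hypothetical source name)
    belief: float      # confidence between 0.0 and 1.0


claims = [
    ReifiedStatement("URI4", "indicates_disease", "URI6", "lab_A", 0.90),
    ReifiedStatement("URI4", "indicates_disease", "URI6", "lab_B", 0.95),
]


def beliefs_for(claims, subject, predicate, obj):
    """Collect each source's belief in one particular triple."""
    return {c.asserted_by: c.belief
            for c in claims
            if (c.subject, c.predicate, c.obj) == (subject, predicate, obj)}


by_source = beliefs_for(claims, "URI4", "indicates_disease", "URI6")
```

The same triple appears twice without contradiction, because each copy is attributed to a different source.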
46
The Available Data Space
Separate RDF
documents are merged
automatically into one
aggregate graph.
47
Recombination in
Molecular Genetics works
due to proper alignment of
genetic regions, thereby
preventing gene loss,
mangling, or duplication.
48
Recombinant Data
Graphs can be filtered
and pivoted, without
losing meaning
49
Recombinant Data
• Mash-ups that don’t lose perspective
• Dynamic mixing of data
• Provide Different Views for Different Roles
and Functions
– Dashboards
• Direct output of a SPARQL query
50
Key Functionality offered by
Semantic Web
• Ubiquity
– Same identifiers for anything from anywhere
• Discoverability
– Global search on any entity
• Interoperability
– => “Recombinant Data” is Application
Independence
51
Data Vision
• Aggregating data and statements using the Web
– Defined aggregation by need and role
– “Recombinant Data”
• Common system of referencing things (no copying)
– even if they exist in one of many databases
• Indexing things by types and with tags
– Common and ad hoc vocabularies
• Supporting the collective knowledge of an R&D
Community
– A Wiki that has awareness about types and things
– New Generation Discovery Tools
52
Ontologies and
Web Ontology Language (OWL)
53
OWL Introduction
• History: DAML + OIL = OWL (2001)
– DAML – DARPA Agent Markup Language (1999)
– OIL – Ontology Inference Layer (1997)
• Based on RDF(S)
• Added features, mostly related to identity
– Restrictions
• Three flavors of increasing expressiveness, but
decreasing tractability
– OWL Lite
– OWL DL (used for most applications)
– OWL Full
54
The Knowledge Semantics Continuum
[Figure: a spectrum running from Simple Terminologies to Expressive Ontologies]
Levels: Catalog; Terms/glossary; Thesauri (BT/NT, Parent/Child); Informal Is-A; Formal is-a; Formal instances; Frames (Properties); Value Restriction; Disjointness, Inverse; General Logical constraints
Examples shown: Medication Lists, DDI Lists, DB Schema, MeSH, Gene Ontology, UMLS Meta, TAMBIS, KEGG, RDF(S), Ontylog, BioPAX, OWL, CYC, IEEE SUO, Snomed, EcoCyc
55
Ontology Dimensions based on McGuinness and Finin
OWL DL Example
• Class: Benign intracranial meningioma
http://cancer.gov/cancerinfo/terminologyresources/
in the NCI Thesaurus
<owl:Class rdf:ID="Benign_Intracranial_Meningioma">
<rdfs:label>Benign Intracranial Meningioma</rdfs:label>
<code>C5133</code>
<owl:equivalentClass>
<owl:Class>
<owl:intersectionOf rdf:parseType="Collection">
<owl:Class rdf:about="#Benign_Intracranial_Neoplasm"/>
<owl:Class rdf:about="#Benign_Meningioma"/>
<owl:Class rdf:about="#Intracranial_Meningioma"/>
</owl:intersectionOf>
</owl:Class>
</owl:equivalentClass>
<Preferred_Name>Benign Intracranial Meningioma</Preferred_Name>
<Semantic_Type>Neoplastic Process</Semantic_Type>
<dSynonym>Benign Intracranial Meningioma</dSynonym>
[…]
<NCI_META_CUI>CL006955</NCI_META_CUI>
</owl:Class>
56
OWL Class Constructors
Borrowed from Tutorial on OWL by Bechhofer, Horrocks and Patel-Schneider
http://www.cs.man.ac.uk/~horrocks/ISWC2003/Tutorial/
57
OWL Axioms
• Axioms (mostly) reducible to inclusion (⊑)
– C ≡ D iff both C ⊑ D and D ⊑ C
Borrowed from Tutorial on OWL by Bechhofer, Horrocks and Patel-Schneider
http://www.cs.man.ac.uk/~horrocks/ISWC2003/Tutorial/
58
Existential vs. Universal Quantification
• Existential quantification
– owl:someValuesFrom
– Necessary condition
– E.g., migraine = headache & has_symptom throbbing pain [only
if one-sided]
• Universal quantification
– owl:allValuesFrom
– Necessary and sufficient condition
– E.g., heart disease = disease & located_to heart
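The difference between the two restrictions can be checked on instance data. This sketch evaluates them over a closed list of triples, which is a simplification: an OWL reasoner works under the open-world assumption and would not conclude that a universal restriction holds just because no counterexample is recorded. The function names and the individuals `case_1` and `case_2` are hypothetical.

```python
def some_values_from(graph, individual, prop, cls):
    """Existential: at least one value of prop lies in cls."""
    return any(o in cls for (s, p, o) in graph if s == individual and p == prop)


def all_values_from(graph, individual, prop, cls):
    """Universal: every recorded value of prop lies in cls (true if none recorded)."""
    return all(o in cls for (s, p, o) in graph if s == individual and p == prop)


graph = [
    ("case_1", "has_symptom", "throbbing pain"),
    ("case_1", "has_symptom", "nausea"),
]
throbbing = {"throbbing pain"}

exists = some_values_from(graph, "case_1", "has_symptom", throbbing)
only = all_values_from(graph, "case_1", "has_symptom", throbbing)

# An individual with no recorded symptoms fails the existential test
# but passes the universal one vacuously.
empty_exists = some_values_from(graph, "case_2", "has_symptom", throbbing)
empty_only = all_values_from(graph, "case_2", "has_symptom", throbbing)
```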
59
OWL reasoners
• For OWL DL, not OWL Full
• Reasoners
– Fact++ http://owl.man.ac.uk/factplusplus/
– Pellet http://www.mindswap.org/2003/pellet/
– RacerPro http://www.racer-systems.com/
• Functions
– Consistency checking
– Automatic classification
60
OWL Reasoners Details
• CEL
– Polynomial time classifier for the description logic EL+
– EL+ is specially geared towards biomedical ontologies
• Cerebra
– Commercial C++ reasoner, Support for OWL-API
– Tableaux based reasoning for TBoxes and ABoxes
• Fact++
– Free open source reasoner for DL reasoning
– Support for Lisp API and OWL API
• KAON2
– Free Java based DL reasoner with support for SWRL fragment
– Support for DIG API
• MSPASS
– A generalized theorem prover for numerous logics, also works for DLs
• Pellet
– Free open source Java based reasoner for DLs
– Support for OWL, DIG APIs and Jena Interface
• RacerPro
– Commercial Lisp based reasoner for DLs
– Support for OWL APIs and DIG APIs
61
http://protege.stanford.edu/
Editing OWL ontologies
62
Resources available in OWL
• Many resources currently available in
OWL
– Gene Ontology http://www.geneontology.org/
– NCI Thesaurus http://cancer.gov/cancerinfo/terminologyresources/
• Many projects using OWL
– e.g., BioPax http://www.biopax.org/
• NCBO - Mark Musen, Director
63
Domain Semantics in Clinical Trials
Clinical Semantics
• Patient/Subject → Disease/Health state
• Diagnostics → Findings
• Findings → Inferred (proposed) Disease state
• Disease state → Patient Classification /
Segmentation
• Design → Trial arms / treatments
• Observation → POC, safety, mechanisms
65
Linking Clinical Ontologies with
the Semantic Web
Clinical Obs
Disease
Descriptions
SNOMED
Applications CDISC
ICD10
RCRIM
(HL7)
Clinical Trials Disease
Models
ontology
Mechanisms
Pathways
(BioPAX)
IRB
Tox
Extant ontologies
Genomics
Molecules
Under development
Bridge concept
66
Ontology Referencing
<rdf:RDF
xmlns:owl="http://www.w3.org/2002/07/owl#"
xmlns:snomed="http://snomed.org"
xmlns:cdisc="http://cdisc.org/cdisc#"
xmlns:icd10="http://www.ich.org/icd10#"
xmlns:nci="http://www.nci.nih.gov/thesaurus#"
xmlns:rcrim="http://www.hl7.org/rcrim#"
xmlns:biopax="http://biopax.org/biopax#"
…
<snomed:disease snomed:DiabetesType2>
<biopax:involves> <nci#InsulinSignalingPathway>
67
Rules and Policies
68
Imagine this CDS Rule:
If Renal Disease and DM and no contraindication,
should be on ACE inhibitor or ARB
• Renal disease =
– Chronic Renal Failure
• Nephropathy, chronic renal failure, end-stage renal disease, renal
insufficiency, hemodialysis, peritoneal dialysis on Problem List
(SNOMED)
• Creatinine > 2
• Calculated GFR < 50
– Malb/creat ratio test > 30
• Diabetes
• Many variants on the problem list
• On Insulin or oral hypoglycemic drug
• Contraindication to ACE inhibitor
• Allergy, Cough on ACE on adverse reaction list, or Hyperkalemia on
problem list, Pregnant (20 sub rules to define this state)
• K test result > 5
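The rule above can be sketched as a few predicates over a patient record. Several things here are assumptions: the `Patient` fields, the string labels standing in for SNOMED-coded problem-list entries, and the reading of the listed criteria as alternatives (any one is enough). The thresholds are the ones on the slide; units are not stated there and are not added here.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

# Hypothetical problem-list labels; a real system would match SNOMED codes.
RENAL_TERMS = {"nephropathy", "chronic renal failure", "end-stage renal disease",
               "renal insufficiency", "hemodialysis", "peritoneal dialysis"}


@dataclass
class Patient:
    problems: Set[str] = field(default_factory=set)
    medications: Set[str] = field(default_factory=set)
    adverse_reactions: Set[str] = field(default_factory=set)
    creatinine: Optional[float] = None
    gfr: Optional[float] = None
    malb_creat_ratio: Optional[float] = None
    potassium: Optional[float] = None
    pregnant: bool = False


def above(value, limit):
    return value is not None and value > limit


def below(value, limit):
    return value is not None and value < limit


def has_renal_disease(p):
    return (bool(p.problems & RENAL_TERMS) or above(p.creatinine, 2)
            or below(p.gfr, 50) or above(p.malb_creat_ratio, 30))


def has_diabetes(p):
    return ("diabetes" in p.problems
            or bool(p.medications & {"insulin", "oral hypoglycemic"}))


def has_contraindication(p):
    return (bool(p.adverse_reactions & {"ace allergy", "ace cough"})
            or "hyperkalemia" in p.problems or p.pregnant
            or above(p.potassium, 5))


def should_be_on_ace_or_arb(p):
    return has_renal_disease(p) and has_diabetes(p) and not has_contraindication(p)


candidate = Patient(problems={"diabetes"}, gfr=42.0, potassium=4.1)
excluded = Patient(problems={"diabetes"}, gfr=42.0, potassium=5.6)
```

Even this reduced version needs four helper definitions for a one-line rule, which is the point of the slide: each clinical term hides its own set of sub-rules.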
69
Translational Medicine
70
Translational Medicine in Drug R&D
Early
Middle
Late
Cellular
Systems
Human
In Vitro Studies
Animal Studies
Clinical Studies
Disease Models (Therapeutic Relevance)
Toxicities
Target/System Efficacy
$
$$
$$$
71
Case Study: Drug Safety
‘Safety Lenses’
• Lenses can ‘focus’ data in specific ways
– Hepatotoxicity, genotoxicity, hERG, metabolites
• Can be “wrapped” around statistical tools
• Aggregate other papers and findings (knowledge) in context with
a particular project
• Align animal studies with clinical results
• Support special “Alert-channels” by regulators for each
different toxicity issue
• Integrate JIT information on newly published mechanisms of
actions
72
ClinDash: Clinical Trials Browser
Subjects
•Values can be
normalized across all
measurables (rows)
Clinical Obs
•Samples can be
aligned to their
subjects using RDF
rules
Expression
Data
•Clustering can now be
done over all
measurables (rows)
73
GeneLogic GeneExpress Data
• Additional relations
and aspects can be
defined
Diseased
Tissue
Links to
OMIM (RDF)
74
EDC and EHR
• Should they be merged?
– Differences in goals and implementations
• Reduce data redundancy
• The Semantic Web solution
– Use EHR RDF to generate part of EDC frame
– Use same URIs for patient, clinic entities
76
AE Channels
<item rdf:about="http://www.cdc.gov/MMWR/48e905bdb66310af85ad2e8503628e01">
<title>Female service members reported higher rates of reactions to the previous dose of vaccine during anthrax vaccination of all U.S. military personnel.</title>
<link>http://www.cdc.gov/MMWR/48e905bdb66310af85ad2e8503628e01</link>
<description>Posted by alan_zimmers to health.mil/adverse_events&#x26;Processes on Thu Jan 19 2006</description>
<dc:creator>alan_zimmers</dc:creator>
<dc:date>2006-01-19T11:24:03Z</dc:date>
<rdf:type>AdverseEvent</rdf:type>
<dc:subject>Anthrax Vaccination&#x26;Treatment</dc:subject>
<nih:uri>
<dc:title>Female service members reported higher rates of reactions to the previous dose of vaccine during anthrax vaccination of all U.S. military personnel.</dc:title>
<dc:creator>A Sainz-Perez</dc:creator>
<dc:creator>H Gary-Gouy</dc:creator>
<dc:identifier>
<nih:PubMedID>
<nih:idValue>16408101</nih:idValue>
<rdf:value>PMID: 16408101</rdf:value>
</nih:PubMedID>
</dc:identifier>
<dc:date>2006-01-12</dc:date>
<prism:publicationName>Leukemia</prism:publicationName>
<prism:issn>0887-6924</prism:issn>
</nih:uri>
</item>
77
AE Channels
<item rdf:about="http://www.cdc.gov/MMWR/48e905bdb66310af85ad2e8503628e01">
<title>Female service members reported higher rates of reactions to the previous dose of vaccine during anthrax vaccination of all U.S. military personnel.</title>
<link>http://www.cdc.gov/MMWR/48e905bdb66310af85ad2e8503628e01</link>
<description>Posted by alan_zimmers to health.mil/adverse_events&#x26;Processes on Thu Jan 19 2006</description>
<dc:creator>alan_zimmers</dc:creator>
<dc:date>2006-01-19T11:24:03Z</dc:date>
<rdf:type>AdverseEvent</rdf:type>
<dc:subject>Anthrax Vaccination&#x26;Treatment</dc:subject>
<kn:nugget rdf:resource="#N251">
<tn:expert>Alan R</tn:expert>
<tn:topic>ns#AnthraxTreatment</tn:topic>
<tn:kChannel>ns#HomelandSecurity</tn:kChannel>
<tn:comment>This research suggests a lower limit of adverse responses</tn:comment>
</kn:nugget>
<nih:uri>
<dc:title>Female service members reported higher rates of reactions to the previous dose of vaccine during anthrax vaccination of all U.S. military personnel.</dc:title>
<dc:creator>A Sainz-Perez</dc:creator>
<dc:creator>H Gary-Gouy</dc:creator>
<dc:identifier>
<nih:PubMedID>
<nih:idValue>16408101</nih:idValue>
<rdf:value>PMID: 16408101</rdf:value>
</nih:PubMedID>
</dc:identifier>
<dc:date>2006-01-12</dc:date>
<prism:publicationName>Leukemia</prism:publicationName>
<prism:issn>0887-6924</prism:issn>
</nih:uri>
</item>
78
Surveillance using RSS/RDF
(CDC)
81
Clinical Data Standards
(CDISC)
82
“Protocol” and the Semiotic Triangle
Doug Fridsma (U Pittsburgh)
Concept 2
Concept 1
“We need to sign off on
the protocol by Friday”
“Protocol XYZ has enrolled
73 patients”
Symbol
Thing 1
Thing 2
Study
“Protocol”
Document
Concept 3
“Per the protocol, you must be
at least 18 to be enrolled”
Source: John Speakman/Charlie Mead
Thing 3
Plan
83
CDISC and the Semantic Web?
• Reduce the need to
write data parsers to
any CDISC XML
Schema
• Make use of ontologies
and terminologies
directly using RDF
• Easier inclusion of
Genomic data
• Use Semantic Lenses
for Reviewers
• Easier acceptance by
industry with their
current technologies
84
During 2006-2007
Relationship HL7/CDISC
SDTM variables as
Common Data
Elements
&
Controlled
Terminologies
In OWL format
CDISC
Clinical Data
Interchange Standards
Consortium
RCRIM
Regulated Clinical Research
and
Information Management,
technical committee
HL7
“Health Level
Seven”
NCI Thesaurus
UMLS
BRIDG
Biomedical Research
Integrated Domain
Group Model
85
Relationship HL7/CDISC
Ongoing work at FDA
CDISC
Clinical Data
Interchange Standards
Consortium
Announcement of CDISC/SDTM
as a standard format
RCRIM
Regulated Clinical Research
and
Information Management,
technical committee
HL7
“Health Level
Seven”
Janus Model and Data Warehouse
"The FDA has the largest pool of randomized clinical trial data in the world, but it cannot be analyzed now because it is inaccessible"
“… populate a cross-study database and do more comprehensive analyses for the benefit of patients.”
Dr. Janet Woodcock, Deputy Commissioner for Operations and Chief Operating Officer, FDA
27 January 2006
86
Retrospective DBs: JANUS
• Accessing and analyzing CT data without
changing any schema
• Interpretive Annotations without impacting
CDB
• Analyses and Insights by Future Projects
• Collective Knowledge on Targets,
Diseases, and Toxicities
87
Tox Commons
• Proposed Open Re-use of Failed
Compounds
• Common Effort by Pharmaceuticals
• Puzzle Analogy
• No real IP in Failed Clinical Data
• Part of Science Commons Initiative
• Is a Drug Safety Commons Possible?, Bio-ITWorld
88
FDA’s JANUS Full Model
one visual representation
89
FDA’s JANUS basic elements diagram
another visual representation
General classes of
Clinical observations
90
SDTM ala RDF
<http://clinic.com/study/T2271/subject/4183542663506>
a cdisc:Subject ;
nci:sex_code nci:Female ;
cdisc:treatment <http://clinic.com/study/T2271/subject/4183542663506/observation/O2241> ;
cdisc:vitalSigns <http://clinic.com/study/T2271/subject/4183542663506/observation/O6561> ;
cdisc:adverseEvent <http://clinic.com/study/T2271/subject/4183542663506/observation/O6622> .

# Source table row:
# ROUTE  DRGGROUP  DOSE  pid            treatment             tpfday   tptday
# IV     B         7 MG  4183542663506  7mg then 14mg SEMWEB  6/11/84  7/11/84

<http://clinic.com/study/T2271/subject/S83221/observation/O2241>
a cdisc:Treatment ; # cdisc:Treatment is a subclass of cdisc:Observation
cdisc:design_arm <http://clinic.com/study/T2271/treated_B/double_dose> ;
cdisc:route cdisc:IV_route ;
cdisc:drug_group "B" ;
cdisc:dose "7" ;
cdisc:dose_units nist:mg ;
cdisc:treatment "7mg then 14mg SEMWEB" ;
cdisc:first_date "6/11/84" ;
cdisc:term_date "7/11/84" .
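Going from a table row to statements like these is mostly a column-to-predicate mapping. The sketch below is one way to do it, not the author's tooling: the mapping table and the function names are assumptions, every value is emitted as a plain string literal (the slide additionally splits the dose into value and units and uses a resource for the route), and the `pid` column is left out because it identifies the subject rather than describing the observation.

```python
# Hypothetical mapping from table columns to predicates used on the slide.
COLUMN_TO_PREDICATE = {
    "ROUTE": "cdisc:route",
    "DRGGROUP": "cdisc:drug_group",
    "DOSE": "cdisc:dose",
    "treatment": "cdisc:treatment",
    "tpfday": "cdisc:first_date",
    "tptday": "cdisc:term_date",
}


def quote(value: str) -> str:
    """Write a value as a double-quoted literal, escaping \\ and \"."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def row_to_turtle(row: dict, observation_uri: str) -> str:
    """Render one treatment row as a block of predicate-object pairs."""
    parts = ["a cdisc:Treatment"]
    for column, predicate in COLUMN_TO_PREDICATE.items():
        if column in row:
            parts.append(f"{predicate} {quote(row[column])}")
    body = " ;\n    ".join(parts)
    return f"<{observation_uri}>\n    {body} ."


row = {
    "ROUTE": "IV", "DRGGROUP": "B", "DOSE": "7 MG", "pid": "4183542663506",
    "treatment": "7mg then 14mg SEMWEB", "tpfday": "6/11/84", "tptday": "7/11/84",
}
turtle = row_to_turtle(
    row, "http://clinic.com/study/T2271/subject/S83221/observation/O2241")
print(turtle)
```

Adding a column later means adding one entry to the mapping; the rows already converted stay valid.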
91
SDTM ala RDF
<http://clinic.com/study/T2271/subject/S83221/observation/O2241>
a cdisc:Biomarker_Measure ; # a subclass of cdisc:Observation
cdisc:biomarker_proc <http://clinic.com/study/T2271/treated_B/biomarker_sample> ;
cdisc:mol_analyses nih:gene_expression ;
cdisc:biomarker_set <http://nci.nih.gov/biomarkers/colon_cancer/B324> ;
cdisc:biomarker_values ( 2.343 1.211 0.531 23.34 83.12 4.323 9.543 ) ;
cdisc:units nist:norm_ratio ;
cdisc:date "6/11/84" .
92
HCLS Drug Safety and Efficacy Focus Areas
• Translational Science Perspective
– Subject State Thinking (biomarkers)
• Safety dimensions
• Efficacy (disease models)
– Animal → Human (CDISC’s SEND, SDTM/ODM)
• Clinical Observations and their relation to biomarkers (+
mechanisms) and pharmacogenomics
• Connecting back to Discovery
– Targets
– Biomarkers
– Therapeutic Knowledge
– Leads, Candidates selection
– Mechanisms of Action
• BioPAX…
93
Proposed Notes and Activities
http://www.w3.org/2001/sw/hcls/
• Notes planned
– SDTM and JANUS from a SW perspective
• Semantic enriched evolvable recombinant clinical observations
• DEMO: Table and XML models ala RDF
– Retrospective DBs (JANUS) and SW + power of annotations and links
• DEMO: using URI code and RDBM
– Provenance and trust (non-repudiation)
• ACL?
94
Reasons for SW
• Exponential Growth and Distribution of Medical Knowledge and
Complex Data - needs to scale with the Web!
• Reduce innovation adoption curve from discovery into accepted
standards of practice (currently 17 years)
• Reduce the cost/duration/risk of clinical trial management
– Patient identification and recruitment
– Trial Design (Learn/Confirm, adaptive trials)
– Improved data quality and clinical outcomes measurement
– Post-market surveillance (knowledge channels)
• Reduce preventable, anticipatable adverse events (5-10%)
• The market is balking at healthcare inflation; new technologies and
therapeutics will find increasing resistance for reimbursement
– SW could prove many times less expensive than traditional IT solutions
– Less code creation and maintenance if Rules are SW based
• SW content is more manageable to achieve business goals
95
Key Semantic Web Principles
• Plan for change
• Free data from the application that created it
• Lower reliance on overly complex Middleware
• The value in "as needed" data integration
• Big wins come from many little ones
• The power of links - network effect
• Open-world, open solutions are cost effective
• Importance of "Partial Understanding"
96
Thank You
More info at
http://www.w3.org/2001/sw/hcls/