Design Templates: Translational Biomedical Research and

caGrid, Fog and Clouds
Joel Saltz MD, PhD
Director, Center for Comprehensive Informatics
Overview
• Introduction
• Tools
• Use Case
• Fog and Clouds
Fog Computing
Grid interoperable enterprise virtualization
caGrid, Enterprise and Clouds
Biomedical Middleware: caGrid, TRIAD,
i2b2
caGrid Components
– Security (GAARDS)
– Language (metadata, ontologies)
– Semantic/Federated query
– Workflow
– Grid Service Graphical Development Toolkit (Introduce)
– DICOM, IHE compatibility
– Advertisement and Discovery
Vocabulary/Ontology
Concept Code
Relationships
Preferred Name
Definition
Synonyms
Interoperability
– Registered metadata
– Ontology concept codes used to annotate models
– XML schemas that define data structures also registered
– Thus both data semantics AND data structures are registered. That is how we achieve (relative) interoperability.
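A minimal sketch of the idea that both semantics and structures are registered. It is not the caGrid API: the `Concept` and `Registry` classes, the `HYP-` concept codes and the schema names are all hypothetical, chosen only to mirror the vocabulary fields listed earlier (concept code, preferred name, definition, synonyms, relationships). A data structure is treated as (relatively) interoperable only when its schema is registered and every field is annotated with a registered concept code.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class Concept:
    """One vocabulary entry (hypothetical layout, not a caGrid type)."""
    code: str
    preferred_name: str
    definition: str
    synonyms: Tuple[str, ...] = ()
    # (relation, target concept code) pairs
    relationships: Tuple[Tuple[str, str], ...] = ()


@dataclass
class Registry:
    """Holds registered concepts (semantics) and schemas (structure)."""
    concepts: Dict[str, Concept] = field(default_factory=dict)
    # schema name -> {field name -> concept code annotating that field}
    schemas: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def register_concept(self, concept: Concept) -> None:
        self.concepts[concept.code] = concept

    def register_schema(self, name: str, fields: Dict[str, str]) -> None:
        self.schemas[name] = dict(fields)

    def is_interoperable(self, schema_name: str) -> bool:
        """True only if the schema is registered AND every field is
        annotated with a registered concept code."""
        fields = self.schemas.get(schema_name)
        if fields is None:
            return False
        return all(code in self.concepts for code in fields.values())


# Hypothetical example data; the codes are placeholders, not real ontology codes.
registry = Registry()
registry.register_concept(Concept(
    code="HYP-0001",
    preferred_name="Glioblastoma",
    definition="Placeholder definition for illustration.",
    synonyms=("GBM",),
))
registry.register_schema("TumorSpecimen", {"diagnosis": "HYP-0001"})
registry.register_schema("ImagingStudy", {"modality": "HYP-9999"})

print(registry.is_interoperable("TumorSpecimen"))  # True
print(registry.is_interoperable("ImagingStudy"))   # False
```

The second schema fails the check because its field points at a concept code that was never registered, which is the case the slide's "both semantics AND structures" rule is meant to exclude.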
Use Case: Will treatment work, and if not, why not?
Avastin and Glioblastoma in RTOG-0825
Treatment: Radiation therapy and Avastin (anti-angiogenesis)
Predict and Explain: Genetic, gene expression, microRNA, Pathology, Imaging
RT, imaging, Pathology markup/annotations/query
Active Data workflows
Avastin and GBMs in RTOG-0825
• Analyze pre-treatment tissue to extract imaging and molecular biomarkers indicative of outcome and Avastin response.
• Perform whole-genome mRNA and microRNA expression profiling of GBM tumor specimens to identify outcome and Avastin response biomarkers.
• Analyze pathology imaging and diagnostic imaging registered with the therapy plan to extract biomarkers that can indicate Avastin response.
• Does advanced imaging (e.g., diffusion-weighted imaging) provide markers that can predict patient response?
RT, Diagnostic Imaging and Pathology: Support Human/Algorithm analyses, annotation, markup
• A data management and display framework integrates pathology with radiology, therapy treatment information and clinical data. This involves integrating the platforms that manage imaging data at ACRIN, pathology at UCSF and molecular data from Emory.
• For quality control, reproducibility and data sharing, the results of RT, imaging and pathology observations and analyses need to be described in a well-defined manner.
Finding: mass
Mass ID: 1
Margins: spiculated
Length: 2.3cm
Width: 1.2cm
Cavitary: Y
Calcified: N
Spatial relationships: Abuts pleural surface; invades aorta
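One way to read the finding above as a well-defined record is sketched below. The `MassFinding` class and the `parse_mass_finding` helper are hypothetical names, not part of any annotation standard or of caGrid; the sketch only assumes the "Key: value" layout shown on the slide, lengths given in cm, and Y/N flags.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MassFinding:
    """Typed form of the slide's example finding (hypothetical layout)."""
    mass_id: int
    margins: str
    length_cm: float
    width_cm: float
    cavitary: bool
    calcified: bool
    spatial_relationships: List[str]


def _cm(value: str) -> float:
    """Convert a string such as '2.3cm' to a float number of centimetres."""
    value = value.strip()
    if not value.endswith("cm"):
        raise ValueError(f"expected a length in cm, got {value!r}")
    return float(value[:-2])


def _yes_no(value: str) -> bool:
    """Map the slide's Y/N flags to booleans; anything else raises KeyError."""
    return {"Y": True, "N": False}[value.strip().upper()]


def parse_mass_finding(text: str) -> MassFinding:
    """Parse 'Key: value' lines into a MassFinding."""
    fields = {}
    for line in text.strip().splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip().lower()] = value.strip()
    if fields.get("finding") != "mass":
        raise ValueError("only 'mass' findings are handled in this sketch")
    return MassFinding(
        mass_id=int(fields["mass id"]),
        margins=fields["margins"],
        length_cm=_cm(fields["length"]),
        width_cm=_cm(fields["width"]),
        cavitary=_yes_no(fields["cavitary"]),
        calcified=_yes_no(fields["calcified"]),
        spatial_relationships=[
            part.strip()
            for part in fields["spatial relationships"].split(";")
            if part.strip()
        ],
    )


EXAMPLE = """
Finding: mass
Mass ID: 1
Margins: spiculated
Length: 2.3cm
Width: 1.2cm
Cavitary: Y
Calcified: N
Spatial relationships: Abuts pleural surface; invades aorta
"""

finding = parse_mass_finding(EXAMPLE)
print(finding)
```

Typed fields make the record queryable (for example, all spiculated masses longer than 2 cm), which free-text observations do not allow.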
Distinguish (and maybe redefine) astrocytic, oligodendroglial
and oligoastrocytic tumors using TCGA and Rembrandt
Important since treatment and Outcome differ
• Link nuclear shape, texture to biological and clinical
behavior
• How are nuclear shape and texture related to the gene
expression categories defined by clustering analysis of
Rembrandt data sets?
• Relate nuclear morphometry and gene expression to
neuroimaging features (Vasari feature set)
• Genetic and gene expression correlates of high
resolution nuclear morphometry and relation to MR
features using Rembrandt and TCGA datasets.
Annotation and Markup of Pathology Data
needs Human/Algorithm Cooperation
Astrocytoma vs Oligodendroglioma
• TCGA finds genetic, gene expression
overlap
• Pathologists have also long seen
overlap
• Relationship between Pathology,
Molecular, Radiology
• Relationship to Outcome, treatment
response
Example: Compute Intensive Workflow
Fog Computing
Attributes of Fog Computing
• For legal and organizational reasons, data location is
constrained
• Virtual machines used to create software stacks that use
caGrid + other middleware to expose data services (we
regularly ship VMs with caGrid based software stacks)
• Enterprises such as Emory increasingly rely on
virtualization architectures
• Mid-range active storage platforms (e.g., Emory: 1 PB
archival, 100 TB fast disk, 2K cores, InfiniBand
interconnect) will rely heavily on virtualization
Relationship between Fog and Cloud
Computing
• Workflows have enterprise, caGrid and compute
intensive components
• Interoperability between enterprise and caGrid
software stacks is a major current NCI/ONR issue
• Given virtualization, HPC and large scale data
requirements can be tackled with cloud
approach
Issues
• Organizational, legal requirements create
constraints on placement of datasets, where
VMs can be run
• Mapping of VMs, datasets:
– Huge variation in communication, I/O bandwidth,
compute capabilities in Fog/Cloud continua
– Multiple software stacks, heterogeneous hardware
architectures
• Security, Fog to Cloud: multiple distinct but
overlapping security-related requirements and
constraints
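The placement issue above can be illustrated with a small greedy sketch. It is an assumption-laden toy, not the talk's method: the function `place_vms`, the site names, the dataset names and the core counts are all hypothetical, and only two of the listed factors are modeled (where a dataset may legally be processed, and how many cores a site has free). Bandwidth, I/O and security constraints are left out.

```python
from typing import Dict, List, Optional, Tuple


def place_vms(
    vm_needs: Dict[str, Tuple[str, int]],
    dataset_sites: Dict[str, List[str]],
    site_cores: Dict[str, int],
) -> Dict[str, Optional[str]]:
    """Greedy placement of VMs onto sites.

    vm_needs:      vm name -> (dataset it reads, cores it requires)
    dataset_sites: dataset -> sites where that dataset may be processed
    site_cores:    site -> free cores

    Each VM goes to the permitted site with the most free cores, or to
    None when no permitted site has enough capacity.
    """
    remaining = dict(site_cores)
    placement: Dict[str, Optional[str]] = {}
    for vm, (dataset, cores) in vm_needs.items():
        candidates = [
            site for site in dataset_sites.get(dataset, [])
            if remaining.get(site, 0) >= cores
        ]
        if not candidates:
            placement[vm] = None
            continue
        best = max(candidates, key=lambda site: remaining[site])
        remaining[best] -= cores
        placement[vm] = best
    return placement


# Hypothetical scenario: identified clinical data must stay in the
# enterprise (fog); de-identified images may also go to a cloud.
dataset_sites = {
    "clinical": ["enterprise"],
    "deidentified_images": ["enterprise", "cloud"],
}
site_cores = {"enterprise": 8, "cloud": 64}
vm_needs = {
    "query_service": ("clinical", 4),
    "feature_extraction": ("deidentified_images", 32),
    "cohort_analysis": ("clinical", 16),
}

placement = place_vms(vm_needs, dataset_sites, site_cores)
print(placement)
```

In this scenario the compute-heavy job on de-identified images lands in the cloud, while the clinical job that needs 16 cores cannot be placed at all, because its data may not leave an enterprise site that has too few free cores.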
Thank you