The CardioVascular Research Grid (CVRG):
A National Infrastructure for
Representing, Sharing, Analyzing,
and Modeling Cardiovascular Data
Stephen J. Granite, MS
Director of Database/Software Development
The Johns Hopkins University Center for Cardiovascular
Bioinformatics and Modeling
Why Is There A Need For The CVRG?
The challenge of how best to represent CV data
Emerging data representation standards are seldom used
No standards for representing electrophysiological data,
and no culture of sharing it
The challenge of sharing data
National initiatives in CV genetics, genomics and proteomics
are underway but there is no direct, easy way to discover data
Facilitate data discovery
The challenge of how best to develop
and deploy “hardened” data analysis workflows
The challenge of discovering new
knowledge from the CV data itself
Grids And E-Science
Grids¹
Interconnected networks of computers and storage systems
Running common software
Enabling resource sharing and problem
solving in multi-institutional environments
E-Science
Computationally intensive science carried out on grids
Science with immense data sets that require grid computing
Two major “bio-grids” are active today
The Biomedical Informatics Research Network
(BIRN; http://www.nbirn.net/)
The Cancer Biomedical Informatics Grid
(caBIG; http://cabig.nci.nih.gov/)
¹ I. Foster and C. Kesselman (2004). The Grid: Blueprint for a New Computing Infrastructure. Elsevier.
The Biomedical Informatics Research Network (BIRN)
Principal Investigator: Mark Ellisman, UCSD
Grid infrastructure for sharing,
analyzing and visualizing brain
imaging data sets
32 participating research sites,
> 400 investigators
4 “driving biological projects”
BIRN is a “bottom up”
effort (scientific applications
drive technology)
[Figure: BIRN sites linked over the grid, with panels labeled UCLA Image Acquisition, BWH Image Segmentation, JHU Shape Analysis, and End User Shape Visualization]
The Cancer Biomedical Informatics Grid (caBIG)
caGrid Lead Development Team: Joel Saltz, OSU
NCI intramural research effort,
with selected external collaborators
Develop open-source software
Enables cancer researchers to become a caBIG node
Share data with the cancer research community
Develop controlled vocabularies for describing
cancer phenotypes and multi-scale data
Develop grid analytic services
for analyzing cancer data sets
The CVRG Driving Biological Project (DBP)
The D. W. Reynolds Cardiovascular Clinical Research Center (PI - E. Marban)
Center studies the cause and treatment of Sudden
Cardiac Death (SCD) in the setting of heart failure (HF)
HF is the primary U.S. hospital discharge diagnosis
Incidence of ~ 400,000 per year, prevalence of ~ 4.5 million
Prevalence increasing as population ages
Leading cause of SCD (30-50% of deaths are sudden)
Medical expenditures ~ $20 billion per year
Manifestation of HF occurs at multiple biological levels
Genetic Predisposition via Single/Multi-Gene Mutations
Modified Gene/Protein Expression
Electrophysiological Remodeling and Altered Cellular Function
Heart Shape and Motion Changes
Reduced cardiac output, mechanical pump failure
The CVRG Driving Biological Project (DBP)
The D. W. Reynolds Cardiovascular Clinical Research Center (PI - E. Marban)
Multi-Scale Data
Genetic Variability (SNPs)
Protein Expression Profiling
Gene Expression Profiling
Electrophysiological Data
Multi-Modal Imaging
Large patient cohort (~ 1,200)
at high risk for SCD
All have received ICD
placement to prevent SCD
Collecting multi-scale
data for all these patients
Patients with ICD firings are
defined as high risk for SCD;
patients without as low risk
Within the 1st year, only 5% of the
ICDs implanted have actually fired
Challenge – discover multi-scale
biomarkers that are predictive of
which patients should receive ICDs
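The labeling rule above can be sketched in a few lines. This is a minimal illustration, not the Reynolds Center's actual data model: the `Patient` class, its field names, and the toy cohort are all hypothetical, and the 60-patient figure simply applies the 5% first-year firing rate to a cohort of 1,200 as an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Patient:
    patient_id: str   # hypothetical de-identified identifier
    icd_fired: bool   # True if the implanted ICD has fired


def risk_label(patient: Patient) -> str:
    """Apply the slide's rule: an ICD firing means high risk for SCD."""
    return "high" if patient.icd_fired else "low"


def class_counts(patients: List[Patient]) -> Dict[str, int]:
    """Count how many patients fall into each risk class."""
    counts = {"high": 0, "low": 0}
    for patient in patients:
        counts[risk_label(patient)] += 1
    return counts


# Toy cohort (assumed numbers): 1,200 patients, 5% of whom have had a firing.
cohort = [Patient(f"P{i:04d}", icd_fired=(i < 60)) for i in range(1200)]
counts = class_counts(cohort)
print(counts)
```

Under these assumed numbers the high-risk class is a small fraction of the cohort, which is one reason biomarker discovery here has to cope with few positive examples.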
The CVRG Project
R24 NHLBI Resource, start date 3/1/07
3 development teams
Winslow, Geman, Miller, Naiman, Ratnanather, Younes (JHU)
Saltz, Kurc (OSU)
Ellisman, Grethe (UCSD)
Aims
Develop tools for representing,
managing and sharing multi-scale data
SNP, genomic and proteomic data (Project 1)
Electrophysiological data (Projects 1 & 2)
Heart Shape and Motion (Cardiac Computational Anatomy)
data (Projects 1, 3 & 4)
Use multi-scale data to discover biomarkers
that predict need for ICD placement (Project 5)
Project 1: The CVRG Core Infrastructure
Develop and deploy CVRG-Core middleware
Reuse components and assure interoperability with BIRN and caBIG
Open-source software stack that instantiates a CVRG node
The CVRG-Core (Projects 1-5)
BioMANAGE (Project 1)
BioINTEGRATE (Project 1)
BioPORTAL (Project 1)
CVRG Data Services
SNP (Project 1)
Gene Expression (Project 1)
Protein Expression (Project 1)
EP Data (Project 2)
Imaging (Project 1)
Patient/Study (Project 1)
CVRG Analytic Services
Multiple Analytical Methods (Projects 2, 3, 4 & 5)
Project 2: Electrophysiological (EP)
Data Management And Dissemination
Goal
Adopt/develop data models to represent cardiovascular EP data
Create databases for managing and sharing these data
[Figure: panels labeled EP data, ECG, Ontologies, XML, Databases, and ECG & EP Data Analysis Portal]
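To make the idea of an XML data model for EP data concrete, here is a minimal sketch of writing and reading one ECG lead. The element and attribute names (`ecgRecord`, `lead`, `subjectId`, `samplingRateHz`, `units`) are hypothetical and are not taken from any CVRG schema or existing ECG standard; only the Python standard library is used.

```python
import xml.etree.ElementTree as ET
from typing import List


def ecg_to_xml(subject_id: str, lead: str, sampling_hz: int,
               samples_mv: List[float]) -> str:
    """Serialize one ECG lead to an XML string (hypothetical schema)."""
    root = ET.Element("ecgRecord", attrib={"subjectId": subject_id})
    lead_el = ET.SubElement(
        root,
        "lead",
        attrib={"name": lead, "samplingRateHz": str(sampling_hz), "units": "mV"},
    )
    # Store samples as whitespace-separated values with fixed precision.
    lead_el.text = " ".join(f"{value:.3f}" for value in samples_mv)
    return ET.tostring(root, encoding="unicode")


def xml_to_samples(xml_text: str) -> List[float]:
    """Parse the samples of the first lead back out of the XML string."""
    root = ET.fromstring(xml_text)
    lead_el = root.find("lead")
    if lead_el is None or lead_el.text is None:
        return []
    return [float(token) for token in lead_el.text.split()]


xml_text = ecg_to_xml("deidentified-001", "II", 500, [0.0, 0.12, 0.95, -0.2])
samples = xml_to_samples(xml_text)
print(xml_text)
print(samples)
```

A real data model would also carry acquisition metadata and ontology terms; the point here is only that a self-describing record can be round-tripped between a file or database and analysis code.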
Project 3: Mathematical Characterization Of
Cardiac Ventricular Anatomic Shape And Motion
Goal
Develop methods for statistically characterizing
variability of heart shape and motion in health and disease
Use these methods to discover shape and motion biomarkers for CV disease
Methods
Measure heart shape and motion over time in the Reynolds population using
multi-modal imaging (MR, multi-detector CT and Gd+ contrast-enhanced MR)
Model variation of heart shape/motion in both low-risk and high-risk Reynolds patients
Discover shape and motion parameters
that predict who should receive ICD placement
Cardiac Computational
Anatomy And Shape Analysis
Large Deformation Diffeomorphic Metric Mapping²
[Figure: panels labeled Template (smoothed), Targets (Normal Training Set), Targets (Diseased Training Set), and an unlabeled heart marked "?"]
² Beg et al (2004). Mag. Res. Med. 52: 1167
Cardiac Computational
Anatomy And Shape Analysis
Large Deformation Diffeomorphic Metric Mapping²
[Figure: panels labeled Template (smoothed), Targets (Normal Training Set), Targets (Diseased Training Set), and Diseased Heart]
² Beg et al (2004). Mag. Res. Med. 52: 1167
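For readers unfamiliar with LDDMM, the matching problem is commonly written as the variational problem below. The slides do not state it, so the notation is an assumption: $I_0$ is the template image, $I_1$ a target image, $v_t$ a time-dependent velocity field in a smooth space $V$, $\phi_t$ the flow it generates, and $\sigma$ a weight balancing smoothness of the deformation against image mismatch.

```latex
\hat{v} \;=\; \underset{v \,:\, \dot{\phi}_t = v_t(\phi_t)}{\arg\min}
\left( \int_0^1 \lVert v_t \rVert_V^2 \, dt
\;+\; \frac{1}{\sigma^2} \,
\bigl\lVert I_0 \circ \phi_1^{-1} - I_1 \bigr\rVert_{L^2}^2 \right)
```

The first term measures how much deformation is needed to carry the template onto the target, and that cost is what makes the mapping usable as a shape metric between hearts.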
Project 4: Grid-Tools for Cardiac Computational Anatomy
[Figure: numbered workflow]
1-2 Deidentification and upload; BioMANAGE
3 Segmentation
4 Landmarking, Affine & LDDMM Shape Analysis
5 Statistical Analysis
6 Visualization
Supercomputing: TeraGrid
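The numbered workflow can be read as a chain of stages applied to each imaging study. The sketch below shows only that chaining; every stage is a stub, the field names (`patient_name`, `mrn`, `completed`) are hypothetical, and the ordering of the first two steps is an assumption because the slide layout does not make it unambiguous.

```python
from typing import Any, Callable, Dict, List, Tuple

Study = Dict[str, Any]
Stage = Tuple[str, Callable[[Study], Study]]

# Hypothetical direct identifiers to strip before upload.
IDENTIFYING_FIELDS = {"patient_name", "mrn"}


def deidentify(study: Study) -> Study:
    """Drop identifying fields and record that the step ran."""
    cleaned = {k: v for k, v in study.items() if k not in IDENTIFYING_FIELDS}
    cleaned["completed"] = list(study.get("completed", [])) + ["deidentification"]
    return cleaned


def stub(step: str) -> Callable[[Study], Study]:
    """Build a placeholder stage that only records its own name."""
    def stage(study: Study) -> Study:
        return {**study, "completed": list(study.get("completed", [])) + [step]}
    return stage


PIPELINE: List[Stage] = [
    ("1 deidentification and upload", deidentify),
    ("3 segmentation", stub("segmentation")),
    ("4 landmarking, affine and LDDMM shape analysis", stub("shape analysis")),
    ("5 statistical analysis", stub("statistical analysis")),
    ("6 visualization", stub("visualization")),
]


def run(study: Study, pipeline: List[Stage]) -> Study:
    """Apply each stage in order, passing the result to the next stage."""
    for _name, stage in pipeline:
        study = stage(study)
    return study


result = run({"patient_name": "Jane Doe", "mrn": "12345", "modality": "MR"}, PIPELINE)
print(result)
```

On a grid, each stub would instead call a remote analytic service, with the heavy shape-analysis step dispatched to supercomputing resources.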
Project 5: Statistical Learning
With Multi-Scale Cardiovascular Data
Goal – predict risk of SCD and identify patients to receive ICDs
Develop learning methods that work in the “small sample regime”
[Figure: data from Patient A and Patient B pass through Algorithms³⁻⁶; Patient A is labeled SCD HIGH RISK and Patient B is labeled SCD LOW RISK]
Deploy these multi-scale biomarker discovery tools on the CVRG Portal
³ Geman et al (2004). Stat. Appl. Genet. Mol. Biol. 3(1): Article 19.
⁴ Xu et al (2005). Bioinformatics 21(20): 3905-3911
⁵ Anderson et al (2007). Proteomics 7(8): 1197
⁶ Price et al (2007). PNAS 104(9): 3414
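The slides do not describe the cited algorithms. One family of small-sample methods, which I take to be the approach of reference 3 (this is an assumption), classifies by comparing the relative order of two measurements within a patient rather than their absolute values. The sketch below is a simplified version of such a "top-scoring pair" rule, not the authors' implementation; the data and function names are invented for illustration.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Model = Tuple[int, int, int]  # (feature i, feature j, class predicted when x[i] < x[j])


def fit_tsp(X: Sequence[Sequence[float]], y: Sequence[int]) -> Model:
    """Pick the feature pair whose ordering differs most between classes 0 and 1.

    Assumes both classes appear at least once in y.
    """
    best_score = -1.0
    best_model: Model = (0, 1, 1)
    for i, j in combinations(range(len(X[0])), 2):
        # p[c] estimates P(x[i] < x[j] | class c) from the training rows.
        p: List[float] = []
        for c in (0, 1):
            rows = [x for x, label in zip(X, y) if label == c]
            p.append(sum(x[i] < x[j] for x in rows) / len(rows))
        score = abs(p[1] - p[0])
        if score > best_score:
            best_score = score
            best_model = (i, j, 1 if p[1] > p[0] else 0)
    return best_model


def predict_tsp(model: Model, x: Sequence[float]) -> int:
    """Classify one sample from the ordering of the chosen feature pair."""
    i, j, class_if_less = model
    return class_if_less if x[i] < x[j] else 1 - class_if_less


# Toy data: class 1 (say, high risk) has feature 0 below feature 1.
X = [[1.0, 2.0, 5.0], [0.5, 3.0, 4.0], [2.0, 1.0, 6.0], [3.0, 0.5, 5.5]]
y = [1, 1, 0, 0]
model = fit_tsp(X, y)
print(model, predict_tsp(model, [0.8, 2.5, 9.0]), predict_tsp(model, [4.0, 1.0, 9.0]))
```

Because the decision depends on a single within-sample comparison, a rule like this has very few parameters to estimate, which suits cohorts where high-risk examples are scarce.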
Project 6: Resource Management
Establish CVRG Working Groups to create a mechanism
for community input on design and function of CVRG-Core and the CVRG
CV ontologies/data models
Testbed Projects (HLB-STAT)
New Technologies
Data Sharing/IRB
Undertake outreach efforts to inform, train, and
support researchers in use of CVRG tools and resources
Acknowledgements
The CVRG Development Team
Johns Hopkins University
Siamak Ardekani
Donald Geman
Stephen Granite
Joe Henessy
David Hopkins
Anthony Kolasny
Aaron Lucas
Michael Miller
Daniel Naiman
Tilak Ratnanather
Kyle Reynolds
Aik Tan
Rai Winslow
Gem Yang
Ohio State University
Shannon Hastings
Tahsin Kurc
Stephen Langella
Scott Oster
Tony Pan
Justin Permar
Joel Saltz
UCSD
Mark Ellisman
Jeff Grethe
Ramil Manasala
NHLBI (R24 HL085343)
Microsoft Research Faculty Summit 2007