Transcript MammoGrid

MammoGrid
European federated mammogram database implemented on a GRID infrastructure
presented by Salvator Roberto Amendolia / CERN
on behalf of the MammoGrid Consortium
HealthGrid Forum
Brussels, 20th September, 2002
MammoGrid Consortium
• CERN (Technical Coordinator)
– Vitamib (France) - subcontractor Finance/Admin
• Mirada Solutions (UK) – Medical Image Analysis S/W
• Univ of Oxford (UK) – Medical Vision Laboratory
• Univ of Pisa (I) – Medical Physics section
• Univ of Sassari (I) – Maths & Physics Dept
• Univ West of England (UK) – Computing Research
• Univ of Cambridge (UK) – Addenbrookes Hospital
• Univ Hospital of Udine (I) – Inst of Diagnostic Imaging
– Ospedale Valdese Torino (I) - Breast Screening Unit - subcontr.
– Zybert Computing Ltd. (UK) - Subcontractor for GRIDserver
Project kick-off meeting CERN 18th-19th September 2002
20 September 2002
HealthGrid Forum - Brussels
MammoGrid Project
MammoGrid Objectives
1. To evaluate current Grid technologies and determine
the requirements for Grid-compliance in a pan-European
mammography database.
2. To implement the MammoGrid database, using novel Grid-compliant
and Federated-Database technologies that will provide
improved access to distributed data and will allow rapid
deployment of software packages to operate on locally stored
information.
3. To deploy enhanced versions of a standardization system that
enables comparison of mammograms in terms of intrinsic tissue
properties independently of scanner settings, and to explore its
place in the context of medical image formats (DICOM).
4. To develop software tools to automatically extract image
information that can be used to perform quality controls on the
acquisition process of participating centres (e.g. average
brightness, contrast).
MammoGrid Objectives (cont.)
5. To develop software tools to automatically extract tissue
information that can be used to perform clinical studies (e.g.
breast density, presence, number and location of microcalcifications) in order to increase the performance of breast
cancer screening programs.
6. To use the annotated information and the images in the
database to benchmark the performance of the software
described in points 3, 4 and 5.
7. To exploit the MammoGrid database and the algorithms
to propose initial pan-European quality controls on
mammographic acquisition and ultimately to provide a
benchmarking system to third party algorithms.
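Objective 4 mentions extracting simple image statistics such as average brightness and contrast for quality control. The sketch below is only an illustration of that idea, not the project's software: it assumes an image is a plain 2-D list of grey levels, takes contrast to mean RMS contrast (the standard deviation of the grey levels), and uses a hypothetical `flag_centres` helper that compares each centre with the median across centres.

```python
from statistics import fmean, median, pstdev


def qc_metrics(pixels):
    """Return (average brightness, RMS contrast) for a 2-D list of grey levels."""
    flat = [p for row in pixels for p in row]
    if not flat:
        raise ValueError("image has no pixels")
    brightness = fmean(flat)   # mean grey level
    contrast = pstdev(flat)    # RMS contrast: population standard deviation
    return brightness, contrast


def flag_centres(metrics_by_centre, tolerance):
    """Return centres whose brightness deviates from the median by more than tolerance.

    metrics_by_centre maps a centre name to a (brightness, contrast) pair.
    The median-based rule is an assumption made for this example.
    """
    reference = median(b for b, _ in metrics_by_centre.values())
    return sorted(
        centre
        for centre, (b, _) in metrics_by_centre.items()
        if abs(b - reference) > tolerance
    )


if __name__ == "__main__":
    image = [[0, 100], [100, 200]]
    print(qc_metrics(image))
    print(flag_centres({"A": (100, 70), "B": (105, 68), "C": (160, 40)}, 20))
```

Real mammograms would normally be read from DICOM files and processed with an array library; the plain lists here only keep the example self-contained.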
Mammography Diagnosis and GRID
GRID opens up the possibility of addressing important
clinical needs in radiology:
- Data sharing among clinicians for second review
diagnosis.
- Image-based epidemiological studies
- Computer Aided Quality Control in Acquisition and
Diagnosis
- Validation of Computer Aided Diagnosis Systems
A GRID INFRASTRUCTURE IS IDEAL
The databases needed to statistically validate image-based clinical hypotheses:
• Are populated by a large number of cases
• Contain large files (1 mammogram 10 MB+)
• Are held in geographically distributed repositories
• Use heterogeneous database formats
• Need to be accessible to co-workers
Development and validation of medical image analysis solutions demand:
• Computationally expensive simulations
• Repeated runs for optimal parameter tuning
• Statistical test rigs
• Remote execution and maintenance
Services (e.g. security) must be system-resident, invisible, generic
MammoGrid Philosophy
• Project concentrates on applying emerging GRID technology rather than on
developing it.
• It plans to implement a ‘lightweight’ (but fully functional) GRID and study
its usage in hospitals
• It will draw heavily on other Grid projects, e.g. DataGrid
• It will deliver a prototype federated database of mammograms in
hospitals in the UK and Italy
• It will investigate:
– the role of Grid-based meta-data for resolving queries
– the use of standardised mammograms to resolve image and population
variability
– Health data security using a novel ‘Grid box’
– the infrastructure needed for CADe
• It will provide rapid feedback from the Hospital community
• And should inform the next generation of HealthGrid developments
HEP vs MammoGrid
• Similarities
– Large number of big files
– Files can be sensibly organized in a directory tree
– Need to replicate and move file copies between sites
– Need to execute commands on the node which hosts the data
locally
• Difficulties
– Complexity of co-working in medical environment
– Lack of trained IT personnel
– Confidentiality
Problem
• Typical next generation HEP experiment
– Large scale simulation & reconstruction effort
– Heavily distributed processing and event storage
• ~1000 scientists in ~100 institutions
– Complex analyses of distributed data
– Large files (one event up to 2GB)
10^9 files/year (x n, n>2)
2 PB/year
• Experiment lifetime
– 20-25 years
• GRID
– Widely accepted as a solution
The challenge in HEP
Can we provide, building on top of available
public domain and open source components and
standards, a functional distributed computing
infrastructure to the community of our users
which will remain operational even if
underlying technologies keep changing?
One HEP solution
• AliEn framework
– Lightweight, simplified but fully functional GRID
implementation
– Distributed file catalogue with support for replication
– Strong (certificate-based) authentication
– Resource broker
– Possibility to submit and execute commands in the system
• It makes extensive use of Open Source components and the latest
internet standards (SOAP, Web services, OpenSSL, OpenLDAP,
Globus, MySQL, perl, CPAN)
• AliEn provides a coherent interface and shields the user from rapid
changes in underlying technology
• In the mid to long term, the ALICE experiment remains committed to
integrating AliEn with DataGRID solutions as they become available
• Given the worldwide nature of ALICE computing, AliEn will be
interfaced to other GRID solutions (U.S., Asia, Japan, ...)
AliEn and Open Source
[Pie chart: AliEn 1%, Open Source modules 99%]
Benefits of development based on OpenSource
components are more than obvious…
Some AliEn features
• Authentication module which supports various authentication
methods (including Globus/GSI)
• Distributed file catalogue built on top of an RDBMS with a user
interface that mimics the file system
• Secure file transport and replication service
• Task queue which holds commands to be executed in the system
(commands, inputs and outputs are all registered in the catalogue)
• Computing and Storage elements
• Metadata catalogue
• Monitoring framework
• C/C++/perl API
• Web portal
• EDG compatible authentication and JDL
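The idea of a file catalogue built on top of an RDBMS, with an interface that mimics a file system and tracks replicas, can be sketched as follows. This is a minimal illustration under stated assumptions, not AliEn's actual schema or API: the `FileCatalogue` class, its table layout, and the example paths are all hypothetical, and SQLite stands in for the MySQL backend named in the slides.

```python
import sqlite3


class FileCatalogue:
    """Toy catalogue: logical file names (LFNs) mapped to physical replicas."""

    def __init__(self):
        # In-memory SQLite database as a stand-in for a real RDBMS backend.
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE replica ("
            " lfn TEXT NOT NULL, site TEXT NOT NULL, pfn TEXT NOT NULL,"
            " PRIMARY KEY (lfn, site))"
        )

    def register(self, lfn, site, pfn):
        """Record that the logical file `lfn` has a copy at `site`."""
        self.db.execute(
            "INSERT OR REPLACE INTO replica VALUES (?, ?, ?)", (lfn, site, pfn)
        )

    def ls(self, directory):
        """List the entries directly under `directory`, like a file system."""
        prefix = directory.rstrip("/") + "/"
        rows = self.db.execute(
            "SELECT DISTINCT lfn FROM replica WHERE substr(lfn, 1, ?) = ?",
            (len(prefix), prefix),
        )
        return sorted({lfn[len(prefix):].split("/", 1)[0] for (lfn,) in rows})

    def whereis(self, lfn):
        """Return (site, physical file name) pairs for every replica."""
        rows = self.db.execute(
            "SELECT site, pfn FROM replica WHERE lfn = ? ORDER BY site", (lfn,)
        )
        return rows.fetchall()


if __name__ == "__main__":
    cat = FileCatalogue()
    cat.register("/mammogrid/udine/case01/left.dcm", "udine", "file:///data/a1.dcm")
    cat.register("/mammogrid/udine/case01/left.dcm", "cambridge", "file:///copy/a1.dcm")
    cat.register("/mammogrid/udine/case02/left.dcm", "udine", "file:///data/b1.dcm")
    print(cat.ls("/mammogrid/udine"))
    print(cat.whereis("/mammogrid/udine/case01/left.dcm"))
```

Keeping the directory tree purely logical means a file can be replicated or moved between sites without changing the name users see.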
GRID of GRIDs
[Diagram: the AliEn User Interface on top of three stacks: iVDGL stack, AliEn stack, EDG stack]
Federated System Solution
[Diagram: four sites (Hospital Italy, Hospital UK, University Database, Healthcare Institute), each with its own Local Analysis and Local Query, connected through the GRID; Clinician’s Workstations send a Query and receive a Result. Legend: Shared meta-data, Analysis-specific data]
Massively distributed data AND distributed analyses
• Knowledge is stored alongside data
• Active (meta-)objects manage various versions of data and
algorithms
• Small network bandwidth required
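The federated arrangement on this slide (local query and local analysis at each site, with only a result returned to the clinician) can be sketched as below. It is a simplified illustration, not the MammoGrid design: the `Node` class, the record fields `age` and `density`, and the `federated_mean_density` function are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One site in the federation; its records never leave the node."""
    name: str
    records: list = field(default_factory=list)

    def local_query(self, predicate):
        """Select matching records on the node that hosts them."""
        return [r for r in self.records if predicate(r)]

    def local_analysis(self, predicate, analyse):
        """Run the analysis next to the data and return only a small summary."""
        return analyse(self.local_query(predicate))


def federated_mean_density(nodes, predicate):
    """Combine per-node (sum, count) summaries into one overall mean."""
    total, count = 0.0, 0
    for node in nodes:
        partial_sum, partial_count = node.local_analysis(
            predicate,
            lambda rows: (sum(r["density"] for r in rows), len(rows)),
        )
        total += partial_sum
        count += partial_count
    return total / count if count else None


if __name__ == "__main__":
    udine = Node("udine", [{"age": 52, "density": 0.30}, {"age": 61, "density": 0.50}])
    cambridge = Node("cambridge", [{"age": 55, "density": 0.40}, {"age": 40, "density": 0.90}])
    print(federated_mean_density([udine, cambridge], lambda r: r["age"] >= 50))
```

Only two numbers per site cross the network, which is one way to read the slide's point that little bandwidth is needed when the analysis runs where the data is stored.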
MammoGrid Workpackages
• WP 1 Project Management
• WP 2 User Requirements Specifications
• WP 3 Information System Architecture and Grid Compliance
• WP 4 Local Node Implementation
• WP 5 Integration Environment
• WP 6 Standardisation Software
• WP 7 Acquisition Control Software for a Grid deployment scenario
• WP 8 Software for CAD Diagnosis and QC in a Grid deployment
scenario
• WP 9 Pilot Study 1: Breast Density Measurements in a Grid deployment
scenario
• WP 10 Pilot Study 2: CAD for Quality Control in a Grid deployment
scenario
• WP 11 Dissemination & Exploitation
MammoGrid Implementation
[Diagram: work packages and responsible partners]
• WP 1 - CERN (Vitamib): Project Management
• WP 2 - CERN/UWE, Hospitals: User Req’s & Specs
• WP 3 - CERN/UWE: GRID/DB infrastructure specifications (Information infrastructure)
• WP 4 - Mirada: H/W local node implem.
• WP 5 - CERN: Integration test bed
• WP 6 - Mirada: Standardisation S/W
• WP 7&8 - Oxford, Pisa/Sassari: Application S/W
• WP 9&10 - Cambridge, Udine: Use case/validation
• WP 11 - All: Dissemination & Exploitation
Main Deliverables/milestones
• User Requirements Specification and Technical
System Specification (months 3, 6)
• Prototype GRID-compliant database and information
infrastructure (first release m. 18, final rel. m. 36)
• Packaged medical imaging workstation with interface
to GRID, secure GRID box (month 12)
• Grid-compliant SMF (Standard Mammogram Form) software (month 12)
• Application software (CADe etc.) (months 12, 24, 36)
• Clinical Trial results (months 24, 36)
Dissemination & Clustering
• “GRID related” dissemination efforts
– CERN/UWE are members of the Global Grid Forum (GGF) and of the
Object Management Group (OMG)
– CERN already in GRIDSTART for dissemination and closely
working with DataGrid. UWE to join.
– Pursue relationship with EU funded GRID projects (e.g.
CROSSGRID, BIOGRID, GEMSS)
– Develop relationship with NDMA project (USA)
• “Clinical dissemination”
– Advisory Group (Oxford, Torino, IMIM & GEIE-LINC) to give
visibility to future MammoGrid partners.
• “Medical Image Analysis dissemination”
– Academic dissemination through targeted conferences and
journals in computer science & medical informatics
Exploitation
• Mirada
– Establish SMF as a standard for breast density measurement.
– Establish SMF as a standard for mammogram data exchange.
– Prototype SMF based review workstation for CADiagnosis.
• CERN/UWE
– Study the spinning-out of GRID/database technologies to address needs
in healthcare
• Oxford
– Develop patentable technologies for medical image analysis products.
– Transfer of technology agreement with Mirada.
• Pisa/Sassari
– Develop patentable technologies for medical image analysis products.
MammoGrid Added Value
Previous EC projects
• Distributed network of medical images DB (Medimedia)
• GroupWare collaborative platform (Horizon)
• Remote consultation/diagnostic (Europath)
• Case-by-case security implementations
• Point-to-point connectivity on the Web
MammoGrid
• Multiple federated mammogram databases
• Clinicians tele- and co-working in new and innovative groupings
(‘virtual organisations’)
• Distributed and ubiquitous analysis and diagnosis
• Security handled by ‘services on the Grid’
• Massive connectivity and datasets with massive available compute
power
eDiamond (UK) and GP-CALMA (I)
• similar approach, one UK based, one Italy based, one
Europe wide
• synergy!!
• Different areas of application
  • Teaching & CPD
  • Tele-diagnosis
  • Quality control
  • Epidemiology
  • Algorithm development: data mining
  • CADe development
[Project labels shown beside the list, in extraction order: eDiamond, GP-CALMA, MammoGrid, MammoGrid, eDiamond, GP-CALMA]
GRID - which one?
• One year ago GRID projects were still in their infancy
• Globus Toolkit™
– Open source toolkit for building GRID infrastructure and applications
– APIs, SDKs, and tools which implement Grid protocols & services
• Components of the Globus Toolkit™ are used in many current GRID
projects (including EU DataGrid) but…
– a toolkit is only a toolkit
– someone has to do the integration work
• Emerging new technologies and standards
– Web services, W3C standard protocols, ...
– B2B solution, not specifically designed to support massive distributed
computing
“Web Services”
• Geneva, May 2001:
– Instead of using the Globus toolkit or waiting for DataGRID to deliver a
repackaged version of Globus, we decided to try a different path and use
Web Services and related standards as the backbone of our GRID
implementation
• Web Services - components
– WSDL: Web Services Description Language
• Interface Definition Language for Web services
– SOAP: Simple Object Access Protocol
• XML-based RPC protocol; common WSDL target
– UDDI: Universal Desc., Discovery, & Integration
• Directory for Web services
• GGF in Toronto, February 2002:
– Web Services declared a key element of the new OGSA (Open Grid
Services Architecture) initiative
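To make the description of SOAP as an XML-based RPC protocol more concrete, the sketch below builds and parses a SOAP 1.1 request envelope with Python's standard library. It is only an illustration: the service namespace `urn:example:catalogue`, the method name `listFiles`, and its parameter are hypothetical and do not describe any real AliEn or MammoGrid service.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical namespace for the service being called.
SERVICE_NS = "urn:example:catalogue"


def build_request(method, params):
    """Serialise an RPC-style call as a SOAP 1.1 envelope (string)."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{method}")
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def parse_request(xml_text):
    """Recover the method name and parameters from an envelope."""
    envelope = ET.fromstring(xml_text)
    body = envelope.find(f"{{{SOAP_ENV}}}Body")
    call = body[0]                        # first child of Body is the call
    method = call.tag.split("}", 1)[1]    # strip the "{namespace}" prefix
    return method, {child.tag: child.text for child in call}


if __name__ == "__main__":
    message = build_request("listFiles", {"directory": "/mammogrid/udine"})
    print(message)
    print(parse_request(message))
```

In a full Web Services setup, a WSDL document would describe this operation and a UDDI registry would let clients discover the service; both are outside the scope of this sketch.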

OGSA
AliEn Open Source Components
• SASL/OpenSSL/OpenCA as authentication protocol
• Globus/GSS as an implementation of authentication
compatible with other Grid projects
• ClassAds language for job description (compatible with EU
DataGrid)
• OpenLDAP for configuration management
• Apache for Web Portal
• MySQL as relational database backend
Conclusions on AliEn
• After just one year of development with limited resources, AliEn
has become a lightweight, simplified but fully functional GRID
implementation
• Adding an AliEn interface between our application and the external GRID
infrastructure
– Allows us early prototyping of GRID technology
– Enables massive distributed production
– Protects users from rapid changes in technology
• It makes extensive use of Open Source components and the latest
internet standards (SOAP, OpenSSL, OpenLDAP, Globus, MySQL, ...)
• These components and modules are interchangeable and easily
replaceable by other (possibly non-Open Source) components
offering the same functionality
(…)
In our current picture, GridBox will act as an adapter between GRID (services)
and the Mirada Workstation.
It is possible to define an abstract interface (API to be used by the Mirada
Workstation) for which we will provide an implementation based on AliEn.
If there is a clear benefit for our project, we could consider using:
• WebSphere as UDDI service
• DB2 as relational DB backend
However, this has not been planned and would require additional effort.
[Diagram: tree rooted at mamogrid.cern.ch with nodes labelled Region#1 to Region#3, Town#1 to Town#3, Hospital#1 to Hospital#3, and a User]
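The adapter idea on this slide (an abstract interface used by the workstation, with a Grid-backed implementation behind it) can be sketched as follows. This is a hedged illustration only: the `GridBoxAPI` methods, the `InMemoryGridBox` stand-in, and the `total_size` client function are hypothetical names invented for the example, and no real AliEn or Mirada API is used.

```python
from abc import ABC, abstractmethod


class GridBoxAPI(ABC):
    """Hypothetical abstract interface the workstation would program against."""

    @abstractmethod
    def store_image(self, path, data):
        """Store image bytes under a logical path."""

    @abstractmethod
    def fetch_image(self, path):
        """Return the image bytes stored under a logical path."""

    @abstractmethod
    def list_images(self, prefix):
        """Return the logical paths that start with prefix, sorted."""


class InMemoryGridBox(GridBoxAPI):
    """Stand-in backend; an AliEn-based class would implement the same methods."""

    def __init__(self):
        self._files = {}

    def store_image(self, path, data):
        self._files[path] = bytes(data)

    def fetch_image(self, path):
        return self._files[path]

    def list_images(self, prefix):
        return sorted(p for p in self._files if p.startswith(prefix))


def total_size(gridbox: GridBoxAPI, prefix):
    """Client code depends only on the abstract interface, not the backend."""
    return sum(len(gridbox.fetch_image(p)) for p in gridbox.list_images(prefix))


if __name__ == "__main__":
    box = InMemoryGridBox()
    box.store_image("Region#1/Town#1/Hospital#1/img001", b"\x00" * 10)
    box.store_image("Region#1/Town#2/Hospital#2/img002", b"\x00" * 5)
    print(box.list_images("Region#1/Town#1"))
    print(total_size(box, "Region#1"))
```

Because the client only sees the abstract class, the backend could later be swapped (for example, for the WebSphere or DB2 options mentioned on the slide) without changing workstation code.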