The MammoGrid Project Grids
Architecture
Richard McClatchey
CHEP’03, San Diego March 24th 2003
On behalf of the MammoGrid Consortium:
CERN, Mirada Solutions, Univ of Oxford, Univ of
Sassari & Pisa, Univ West of England, Univ Hospitals of
Cambridge (Addenbrooke's) & Udine
Contents
1. The MammoGrid project objectives
2. Project challenges and philosophy
3. HEP vs distributed medical image analysis
4. The MammoGrid infrastructure
5. Implementation and current status
6. Future plans
7. Conclusions & questions
R. McClatchey, CHEP’03 San Diego March 2003
What is the Mammogrid?
• EU FP5 project to build a pan-European
distributed Database of mammography
images using GRID Technologies.
• Aim: To provide a demonstrator for use in
epidemiological studies, quality control and
validation of computer aided detection
algorithms.
Mammogrid Objectives
1. To evaluate current Grids technologies and determine the requirements for Grid-compliance in a pan-European mammography database.
2. To implement the Mammogrid database, using novel Grid-compliant and Federated-Database technologies that will provide improved access to distributed data and will allow rapid deployment of software packages to operate on locally stored information.
3. To deploy enhanced versions of a standardization system that enables comparison of mammograms in terms of intrinsic tissue properties independently of scanner settings, and to explore its place in the context of medical image formats (DICOM).
4. To develop software tools to automatically extract image information that can be used to perform quality controls on the acquisition process of participating centers (e.g. average brightness, contrast).
5. To develop software tools to automatically extract tissue information that can be used to perform clinical studies (e.g. breast density, presence, number and location of micro-calcifications) in order to increase the performance of breast cancer screening programs.
6. To use the annotated information and the images in the database to benchmark the performance of the software described in points 3, 4 and 5.
7. To exploit the Mammogrid database and the algorithms to propose initial pan-European quality controls on mammographic acquisition and ultimately to provide a benchmarking system to third party algorithms.
Mammogrid Philosophy
• Project concentrates on applying emerging GRID
technology rather than on developing it.
• It plans to implement a ‘lightweight’ (but fully
functional) GRID and study its usage in hospitals
• It will draw heavily on other Grids projects e.g.
DataGrid
• It will deliver a prototype federated database of
mammograms in hospitals in the UK and Italy
• It will provide rapid feedback from the Hospital
community
• And will inform the next generation of HealthGrids
developments
Why a Mammography Database?
• Breast cancer is a huge problem:
– 10% of women develop breast cancer,
– 19% of cancer deaths are due to breast cancer,
– 24% of all cancer cases are breast cancers,
– there are 348,000 cases in EU & USA, 50,000 die
every year,
– fortunately there is a solution.
• Early diagnosis through mammography screening
improves prognosis
...but
• Quality control in acquisition, diagnosis and efficient data
management is vital.
• Improving the reliability of screening and early diagnosis
requires:
– better epidemiological understanding,
– improved diagnostic tools,
– enhanced quality control,
– continuous training and
– efficient management of data and records.
• A way to achieve the above is through repositories of
mammography data for research and training that contain
sufficiently large statistical samples e.g.
– Mammogrid-EU,
– NDMA-US,
– eDiaMoND-UK (Mirada, IBM, Oxford, Edin., KCL, UCL)
– GPCalma-Italy
The Mammogrid Challenge
• Building this repository is not trivial because:
– Large numbers of exemplars are required.
– Cases must be obtained from many geographically
remote locations.
– Data itself is large: 2 breasts × 2 views × 4K × 4K
pix × 2 bytes = 128 Mbyte per patient per visit; at 3M
women per year in the UK, that is ~400 Terabytes per
year in the UK alone.
– Acquisition is highly variable: the same image may look
different depending on machine and parameters.
How do you compare?
– Patient privacy and data security are key.
– Many relevant items of metadata.
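The storage estimate on this slide can be checked with a few lines of Python. This is only a back-of-envelope sketch: it assumes that "4K" means 4096 pixels and that each of the 3M women has one visit per year, neither of which is stated on the slide. The function names are illustrative.

```python
# Back-of-envelope check of the storage estimate.
# Assumptions (not stated on the slide): "4K" = 4096 pixels per side,
# and one visit per woman per year.

BREASTS = 2
VIEWS_PER_BREAST = 2
PIXELS_PER_SIDE = 4096
BYTES_PER_PIXEL = 2
WOMEN_PER_YEAR_UK = 3_000_000


def bytes_per_visit() -> int:
    """Raw, uncompressed image bytes for one patient visit (four images)."""
    images = BREASTS * VIEWS_PER_BREAST
    return images * PIXELS_PER_SIDE * PIXELS_PER_SIDE * BYTES_PER_PIXEL


def terabytes_per_year(women: int = WOMEN_PER_YEAR_UK) -> float:
    """Yearly volume in decimal terabytes (10**12 bytes)."""
    return women * bytes_per_visit() / 1e12


if __name__ == "__main__":
    print(bytes_per_visit() // 2**20, "Mbyte per visit")   # 128
    print(round(terabytes_per_year()), "Terabytes per year")  # 403
```

Under these assumptions the per-visit figure comes out at exactly 128 Mbyte and the yearly UK volume at roughly 400 Terabytes, in line with the slide.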
A GRID Infrastructure is ideal
• The databases needed to statistically validate image-based clinical
hypotheses:
– Are populated by a large number of cases
– Contain large files (1 mammogram 10 MB+)
– Are held in geographically distributed repositories
– Use heterogeneous database formats
– Need to be accessible to co-workers
• Development and validation of medical image analysis solutions
demands:
– Computationally expensive simulations
– Repeated runs for optimal parameter tuning
– Statistical test rigs
– Remote execution and maintenance
• Services (e.g. security) must be system-resident, invisible, generic
High Energy Physics vs. Mammogrid
• Mammogrid heavily relies on technologies developed
primarily in the field of high energy physics.
– Similarities
• Large number of big files
• Files can be sensibly organized in directory tree
• Need to replicate and move file copies between
sites
• Need to execute commands on the node which
hosts data locally
– Difficulties
• Complexity of co-working in medical environment
• Lack of trained IT personnel
• Confidentiality
Federated System Solution
[Diagram: Hospital Italy, Hospital UK, a University Database and a Healthcare Institute with Clinician's Workstations, joined by the GRID. Each site has its own Local Query and Local Analysis; Query and Result messages travel over the GRID. Legend: Shared meta-data, Analysis-specific data.]
• Knowledge is stored alongside data
• Active (meta-)objects manage various versions of data and algorithms
• Small network bandwidth required
• Massively distributed data AND distributed analyses
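The federated idea on this slide (data stays at each site, query and analysis run locally, only small results cross the network) can be sketched in Python as follows. This is an illustration under stated assumptions, not the MammoGrid implementation: the `Site` class, the record fields `age` and `density`, and the helper names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Record = Dict[str, float]


@dataclass
class Site:
    """One node of the federation; its records never leave the site."""
    name: str
    records: List[Record]

    def local_query(self, predicate: Callable[[Record], bool]) -> List[Record]:
        # Runs against locally stored data only.
        return [r for r in self.records if predicate(r)]

    def local_analysis(
        self,
        predicate: Callable[[Record], bool],
        analyse: Callable[[List[Record]], dict],
    ) -> dict:
        # Only this small summary is sent back, not the underlying data.
        return {"site": self.name, **analyse(self.local_query(predicate))}


def federated_run(sites, predicate, analyse) -> List[dict]:
    """Send the same query and analysis to every site and collect summaries."""
    return [site.local_analysis(predicate, analyse) for site in sites]


def count_and_mean_density(rows: List[Record]) -> dict:
    n = len(rows)
    mean: Optional[float] = sum(r["density"] for r in rows) / n if n else None
    return {"cases": n, "mean_density": mean}


def demo() -> List[dict]:
    # Made-up records, for illustration only.
    uk = Site("Hospital UK", [{"age": 52, "density": 0.30},
                              {"age": 61, "density": 0.20}])
    italy = Site("Hospital Italy", [{"age": 58, "density": 0.40}])
    return federated_run([uk, italy], lambda r: r["age"] >= 55,
                         count_and_mean_density)


if __name__ == "__main__":
    for summary in demo():
        print(summary)
```

The design point the sketch tries to capture is that each summary is a few bytes, whereas the images behind it are tens of megabytes each, which is why the slide can claim a small network bandwidth requirement.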
Mammogrid Implementation
[Diagram: project work packages]
WP 1 - CERN (Vitamib): Project Management
WP 2 - CERN/UWE: User Req's & Specs (Hospitals specifications)
WP 3 - CERN/UWE: GRID/DB infrastructure (Information infrastructure)
WP 4 - Mirada: H/W local node implem.
WP 5 - CERN: Integration test bed
WP 6 - Mirada: Standardisation S/W
WP 7&8 - Oxford, Pisa/Sassari: Application S/W
WP 9&10 - Cambridge, Udine: Use case/validation
WP 11 - All: Dissemination & Exploitation
MammoGram Analysis Use-Case
Example Use-Case: Mammogram Analysis
• View and Annotate Images
• Run CAD
• Execute Queries
[Use-case diagram: actor Mammogram Analyst. Use cases shown, linked by <<include>> and <<extend>> relations: View Patient Details (from Maintain Patient Basic Details), Obtain User Authorization, View Mammogram Image, Annotate Mammogram Images (from Use Case View), Perform Radiological Analysis, Define Queries (from Use Case View), Execute Radiological Queries, Run CAD Software.]
MammoGrid Data Structures
Database Entities:
• Hospitals
• Users (Radiologists)
• Equipment
• Patients
• Studies
• Series
• Images
[Diagram: a Patient has Medical History Entries and Patient Studies; a Patient Study contains Series (Mammography, Sonography, MR); a Series is linked to Equipment, and a Mammography Series contains Mammography Images.]
Patient: Name, Date of Birth, Age at Menopause, Age at Menarche, Place of Birth, Ethnic Group, Nationality
Patient Study: Date, Time, Description, Weight, Symptoms
Equipment: X-ray machine, Film Processor, Digitiser
Mammography Image: Laterality (Right/Left), Implant present?, Modality (CC/MLO), Exposure KvP, Exposure MAS, Breast Thickness, AEC Position, Exposure Comments
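The entity hierarchy on this slide can be written down as plain Python dataclasses. This is a minimal sketch: the attribute names follow the slide, but the types, defaults, units, and the `count_images` helper are assumptions made for illustration, and a single `Series` class stands in for the Mammography, Sonography and MR series.

```python
from dataclasses import dataclass, field
from datetime import date, time
from typing import List, Optional


@dataclass
class MammographyImage:
    laterality: str             # "Right" or "Left"
    implant_present: bool
    modality: str               # "CC" or "MLO", as labelled on the slide
    exposure_kvp: float
    exposure_mas: float
    breast_thickness: float     # unit not given on the slide
    aec_position: str
    exposure_comments: str = ""


@dataclass
class Series:
    kind: str                   # "Mammography", "Sonography" or "MR"
    equipment: List[str] = field(default_factory=list)   # e.g. "X-ray machine"
    images: List[MammographyImage] = field(default_factory=list)


@dataclass
class PatientStudy:
    study_date: date
    study_time: time
    description: str
    weight: Optional[float] = None
    symptoms: str = ""
    series: List[Series] = field(default_factory=list)


@dataclass
class Patient:
    name: str
    date_of_birth: date
    place_of_birth: str = ""
    ethnic_group: str = ""
    nationality: str = ""
    age_at_menarche: Optional[int] = None
    age_at_menopause: Optional[int] = None
    medical_history: List[str] = field(default_factory=list)
    studies: List[PatientStudy] = field(default_factory=list)


def count_images(patient: Patient) -> int:
    """Total number of images across all studies and series of a patient."""
    return sum(len(s.images) for study in patient.studies for s in study.series)
```

A patient record is then a small tree: `Patient` at the root, `PatientStudy` objects below it, `Series` below those, and images at the leaves.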
MammoGrams & Annotation
[Diagram: the data structure from the previous slide, with an Annotation attached to each Mammography Image.]
Annotation: Features, Size of Features, Feature properties, Malignancy, Biopsy Proven?, Comments
Main Deliverables/milestones
• User Requirements Specification and Technical
System Specification (months 3, 6)
• Prototype GRID-compliant database and information
infrastructure (first release m. 18, final rel. m. 36)
• Packaged medical imaging workstation with interface
to GRID, secure GRID box (month 12)
• Grid-compliant SMF software (month 12)
• Application software (months 12, 24, 36)
• Clinical Trial results (months 24, 36)
Overall Grids Architecture
[Diagram: sites connected over a GRID VPN network at a high security level. Each site, including the Cambridge site with its workstations and Mirada WST (MAS), has a GridBox holding a File Cat. Replica, Mammogrid Data and AliEn Data. A central GridBox holds the Central File Catalogue, the AliEn Backup and the Mammogrid Data Backup. Data replication runs between the GridBoxes.]
Local Site Architecture
GRID: Mammogrid – AliEn
[Diagram elements:]
• Workstations: Digitizer, MAS (Mirada Acquisition System), Mirada Workstation
• The Mirada Workstation sends DICOM files and SOAP messages
• DICOM File: Description Inf., Image
• Object: Patient (Patient Personal Information, Additional Information, …)
• DICOM Server, Local Cache, File Transfer Daemon
• Web Services: Information Service
• AliEn File Catalogue (LFNs), AliEn Database (PFNs), Mammogrid Database
• Read / Write operations
Clinician to Data
[Diagram elements, in the order shown: Clinician, Mirada Workstation, Client Frontend, DICOM Server, SOAP, Grid Server, ..., Mammogrid Server]
MammoGrid AliEn Prototype
[Diagram elements: Mirada-AliEn Interface, Perl SOAP Server, AliEn prototype Interface, AliEn Catalogue with directories cambridge, cern, …, udine]
• The Catalogue is divided into several databases, which can be distributed.
• The catalogue keeps the LFN-PFN mapping and the metadata.
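The role of the catalogue described on this slide (a mapping from logical file names to physical file names plus metadata, split into several databases) can be sketched as follows. This is not AliEn's API: the `FileCatalogue` class, its methods, the partition-by-top-level-directory rule, and the example paths are all hypothetical and chosen only to illustrate the idea.

```python
from collections import defaultdict
from typing import Dict, List


class FileCatalogue:
    """Toy LFN -> PFN catalogue, partitioned by top-level directory."""

    def __init__(self) -> None:
        # One independent table per top-level directory (e.g. "cambridge"),
        # so that each table could live in a different database.
        self._partitions: Dict[str, Dict[str, dict]] = defaultdict(dict)

    @staticmethod
    def _partition_of(lfn: str) -> str:
        # "/cambridge/p1/img.dcm" -> "cambridge"
        return lfn.strip("/").split("/", 1)[0]

    def register(self, lfn: str, pfn: str, **metadata: str) -> None:
        """Record one physical copy (PFN) of a logical file (LFN)."""
        table = self._partitions[self._partition_of(lfn)]
        entry = table.setdefault(lfn, {"pfns": [], "metadata": {}})
        if pfn not in entry["pfns"]:
            entry["pfns"].append(pfn)
        entry["metadata"].update(metadata)

    def replicas(self, lfn: str) -> List[str]:
        """All known physical locations of a logical file."""
        entry = self._partitions.get(self._partition_of(lfn), {}).get(lfn)
        return list(entry["pfns"]) if entry else []

    def metadata(self, lfn: str) -> dict:
        entry = self._partitions.get(self._partition_of(lfn), {}).get(lfn)
        return dict(entry["metadata"]) if entry else {}


if __name__ == "__main__":
    cat = FileCatalogue()
    # Hypothetical paths, for illustration only.
    cat.register("/cambridge/p1/img.dcm", "file://gridbox-cam/data/0001.dcm",
                 laterality="Left")
    cat.register("/cambridge/p1/img.dcm", "file://gridbox-cern/data/0001.dcm")
    print(cat.replicas("/cambridge/p1/img.dcm"))
    print(cat.metadata("/cambridge/p1/img.dcm"))
```

Clients only ever refer to the logical name; where the copies physically live, and how many there are, is resolved by the catalogue.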
Interaction Diagram
GRID Environment: Mammogrid - AliEn, reached through SOAP messages
Case: READ
• Mirada WST sends a Query to the IS (Information Service)
• Negotiation with the FTD (File Transfer Daemon); the File Catalogue is read
• A Result Set is returned to the Mirada WST
Case: WRITE
• Mirada WST pushes a DICOM file to the DICOM Server: Push(DICOM File)
• Negotiation with the FTD (File Transfer Daemon); a File Handle is obtained
• The File Catalogue is updated
Legend: AliEn Service, Mammogrid Service
Current Hardware Setup
Gridbox specifications:
– 2x Intel Xeon processors
– 2 GB DDR 200/266 MHz
– Redundant Power Supply
– 2x 20 GB IDE HDD (7200 rpm) UDMA
– RAID-1 IDE adapter
– 360 GB usable, RAID-1
– Ethernet network adapter 10/100 Mb/s
– Gigabit network adapter
Conclusions
• Distributed Health informatics is an important
application area for Grids technologies – HealthGrid
• Many similarities with High Energy Physics
• Need rapid feedback from the user community –
MammoGrid user requirements specified, BUT
• Effective Grid deployment needed now, and
• Many open questions, e.g.:
– How to resolve distributed queries?
– What role for meta-data?
– How to maintain secure, reliable data?
• MammoGrid: First results expected late 2003