Medical Imaging Research Experiences
Applications of Machine Learning to Medical Imaging
Daniela S. Raicu, PhD
Associate Professor, CDM
DePaul University
Email: [email protected]
Lab URL: http://facweb.cs.depaul.edu/research/vc/
About me…
• BS in Mathematics from University of Bucharest, Romania
• MS in CS from Wayne State University, Michigan
• PhD in CS from Oakland University, Michigan
My dissertation work
• Research areas: Data Mining & Computer Vision
• Dissertation topic: Content-based image retrieval
• Research hypothesis:
“A picture is worth thousands of words…”
• “There is enough information in the image content to perform image retrieval whose similarity results correspond to the human perceived similarity”.
My dissertation work (cont)
• Methodology: 1) extract color image features, 2) define color-based similarity, 3) cluster images based on color, 4) retrieve similar images
• Output: color-based CBIR for general-purpose image datasets
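The four-step methodology can be illustrated with a minimal Python sketch. It is a hedged illustration and not the dissertation's implementation. It rests on these assumptions: images are plain lists of RGB tuples, the color feature is a coarsely quantized RGB histogram, and similarity is histogram intersection. The clustering step (3) is left out for brevity, and all function names are hypothetical.

```python
from collections import Counter

# Hypothetical helpers: a sketch of color-based CBIR, not the original system.

def color_histogram(pixels, bins_per_channel=4):
    """Quantize each RGB channel into bins_per_channel levels and return a
    normalized histogram over the joint (r, g, b) bins."""
    step = 256 / bins_per_channel
    counts = Counter(
        tuple(min(int(v // step), bins_per_channel - 1) for v in rgb)
        for rgb in pixels
    )
    total = len(pixels)
    return {color_bin: n / total for color_bin, n in counts.items()}

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1 means identical color distributions."""
    return sum(min(p, h2.get(color_bin, 0.0)) for color_bin, p in h1.items())

def retrieve(query_pixels, database, k=3):
    """Return the names of the k database images most similar to the query.

    database maps an image name to its list of RGB pixels (assumed format).
    """
    query_hist = color_histogram(query_pixels)
    ranked = sorted(
        database,
        key=lambda name: histogram_intersection(query_hist, color_histogram(database[name])),
        reverse=True,
    )
    return ranked[:k]
```

Using a coarse quantization such as 4 levels per channel (64 joint bins) makes the histogram tolerant to small color shifts. Finer bins would make the retrieval more discriminative but also more sensitive to noise.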
Evidence supporting the hypothesis: Google Similar Images, http://similarimages.googlelabs.com/
Towards an academic career
• Assistant Professor at DePaul, 2002-2008
• Associate Professor, 2008-present
• Teaching areas & research interests: data analysis, data mining, image processing, computer vision & medical informatics
• Co-director of the Intelligent Multimedia Processing, Medical Informatics lab & the NSF REU Program in Medical Informatics
Outline
Part I: Introduction to Medical Informatics
Medical Informatics
Clinical Decision Making
Imaging Modalities and Medical Imaging
Basic Concepts in Image Processing
Part II: Advances in Medical Imaging Research
Computer-Aided Diagnosis
Computer-Aided Diagnostic Characterization
Texture-based Classification
Content-based Image Retrieval
Medical informatics research
What is medical informatics?
Medical informatics is the application of computers, communications and information technology and systems to all fields of medicine:
- medical care
- medical education
- medical research.
MF Collen, MEDINFO '80, Tokyo
What is medical informatics?
Medical informatics is the branch of science concerned with the use of computers and communication technology to acquire, store, analyze, communicate, and display medical information and knowledge to facilitate understanding and improve the accuracy, timeliness, and reliability of decision-making.
Warner, Sorenson and Bouhaddou, Knowledge Engineering in Health Informatics, 1997
Clinical decision making
• Making sound clinical decisions requires:
– right information, right time, right format
• Clinicians face a surplus of information
– ambiguous, incomplete, or poorly organized
• Rising tide of information
– Expanding knowledge sources
– 40K new biomedical articles per month
– Publicly accessible online health info
– Hundreds of pictures per scan for one patient
Clinical decision making: what is the problem?
• Man is an imperfect data processor
– We are sensitive to the quantity and organization of information
• Army officers and pilots commit ‘fatal errors’ when given too many, too few, or poorly organized data
• The same is true for clinicians who ‘watch’ for events
• Clinicians are particularly susceptible to errors of omission
Clinical decision making: what is the problem?
• Humans are “non-perfectable” data processors
– Better performance requires more time to process
– Irony:
• Clinicians increasingly face productivity expectations
• Clinicians face increasing administrative tasks
Subdomains of medical informatics (by Wikipedia)
• imaging informatics
• clinical informatics
• nursing informatics
• consumer health informatics
• public health informatics
• dental informatics
• clinical research informatics
• bioinformatics
• pharmacy informatics
What is medical imaging (MI)?
The study of medical imaging is concerned with the interaction of all forms of radiation with tissue and the development of appropriate technology to extract clinically useful information (usually displayed in an image format) from observation of this technology.
Sources of Images:
• Structural/anatomical information (CT, MRI, US) - within each elemental volume, tissue-differentiating properties are measured.
• Information about function (PET, SPECT, fMRI).
Examples of medical images
The imaging “chain”
[Figure: signal acquisition → raw data → processing (reconstruction, filtering) → analysis → quantitative output]
Image analysis: turning an image into data
• User-extracted qualitative features
• User-extracted quantitative features
• Semi-automated
• Automated
[Figure: features recorded at the exam level and at the finding level (Feature 1, Feature 2, Feature 3, ...)]
Major advances in medical imaging
Image Segmentation
Image Classification
Computer-Aided Diagnosis Systems
Computer-Aided Diagnostic Characterization
Content-based Image Retrieval
Image Annotation
These advances can play a major role in early detection, diagnosis, and computerized treatment planning in cancer radiation therapy.
Computer-Aided Diagnosis
• Computer-Aided Diagnosis (CAD) is diagnosis made by a radiologist when the output of computerized image analysis methods has been incorporated into his or her medical decision-making process.
• CAD may be interpreted broadly to incorporate both
– the detection task: locating the abnormality, and
– the classification task: estimating the likelihood that the abnormality represents a malignancy.
Motivation for CAD systems
The amount of image data acquired during a CT scan is becoming overwhelming for human vision, and the overload of image data for interpretation may result in oversight errors.
Computer-Aided Diagnosis for:
• Breast Cancer
• Lung Cancer
– A thoracic CT scan generates about 240 section images for radiologists to interpret.
• Colon Cancer
– CT colonography (virtual colonoscopy) is being examined as a potential screening device (400-700 images).
CAD for Breast Cancer
A mammogram is an X-ray of breast tissue used as a screening tool to search for cancer when there are no symptoms of anything being wrong. A mammogram detects lumps, changes in breast tissue, or calcifications when they are too small to be found in a physical exam.
• Abnormal tissue shows up as dense white on mammograms.
• The left scan shows a normal breast, while the right one shows malignant calcifications.
CAD for Lung Cancer
• Identification of lung nodules in thoracic CT scans; the identification is complicated by the blood vessels.
• Once a nodule has been detected, it may be quantitatively analyzed as follows:
– The classification of the nodule as benign or malignant
– The evaluation of temporal changes in the nodule size
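The slides do not say how temporal changes in nodule size are measured. One widely used measure is volume doubling time (VDT), which assumes exponential growth: VDT = Δt · ln 2 / ln(V2 / V1). The sketch below also assumes a roughly spherical nodule, so that volume can be estimated from diameter as πd³/6. The helper names are hypothetical.

```python
import math

# Hypothetical helpers; assumes exponential growth and a spherical nodule.

def sphere_volume(diameter_mm):
    """Volume of a sphere with the given diameter (roughly spherical nodule assumed)."""
    return math.pi * diameter_mm ** 3 / 6.0

def volume_doubling_time(v1, v2, days_between):
    """Days for the nodule volume to double between two scans.

    Returns math.inf when the volume is unchanged; a negative value means
    the nodule shrank between the two scans.
    """
    if v1 <= 0 or v2 <= 0:
        raise ValueError("volumes must be positive")
    if v1 == v2:
        return math.inf
    return days_between * math.log(2) / math.log(v2 / v1)
```

For example, if the diameter doubles between scans 90 days apart, the volume grows eightfold (three doublings), so the VDT is about 30 days.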
CAD for Colon Cancer
• Virtual colonoscopy (CT colonography) is a minimally invasive imaging technique that combines volumetrically acquired helical CT data with advanced graphical software to create two- and three-dimensional views of the colon.
Three-dimensional endoluminal view of the colon showing the
appearance of normal haustral folds and a small rounded polyp.
Role of Image Analysis & Machine Learning for CAD
• An overall scheme for computer-aided diagnosis systems:
[Figure: input images (breast images, thoracic images) → organ segmentation (breast boundary, lungs, colon) → lesion/abnormality segmentation (nodules, polyps) → feature extraction (texture, shape, geometrical properties) → classification (malignant, benign) → evaluation & interpretation]
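The scheme above can be illustrated with a deliberately tiny Python sketch. Every piece is an assumption made for illustration. Organ and lesion segmentation are collapsed into a single intensity threshold, the only features are area and mean intensity, and the classifier is a nearest-centroid rule with made-up class centroids. Real CAD systems use much richer segmentation, features, and trained classifiers.

```python
from statistics import mean

# Hypothetical toy pipeline: segmentation -> features -> classification.

def segment(image, threshold):
    """Toy segmentation: mark pixels brighter than the threshold as lesion."""
    return [[value > threshold for value in row] for row in image]

def extract_features(image, mask):
    """Return (area in pixels, mean intensity) of the segmented region."""
    values = [v for row, mrow in zip(image, mask) for v, inside in zip(row, mrow) if inside]
    if not values:
        return (0.0, 0.0)
    return (float(len(values)), float(mean(values)))

def classify(features, centroids):
    """Nearest-centroid rule: pick the label whose centroid is closest
    (squared Euclidean distance; features are not rescaled here)."""
    def sq_dist(label):
        return sum((f - c) ** 2 for f, c in zip(features, centroids[label]))
    return min(centroids, key=sq_dist)

def cad_pipeline(image, threshold, centroids):
    """Run segmentation, feature extraction, and classification in order."""
    mask = segment(image, threshold)
    features = extract_features(image, mask)
    return classify(features, centroids), features
```

In practice, features on very different scales (pixel counts vs. Hounsfield units) would be normalized before a distance-based classifier is applied.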
SoC Medical imaging research projects
1. Computer-aided characterization for lung nodules
Goal: establish the link between computer-based image features of lung nodules in CT scans and visual descriptors defined by human experts (semantic concepts) for automatic interpretation of lung nodules
Example: This lung nodule has a “solid” texture and has a “sharp” margin
Why computer-aided characterization?
• Reader 1: Lobulation=4, Malignancy=5 “highly suspicious”, Sphericity=2
• Reader 2: Lobulation=1 “marked”, Malignancy=5 “highly suspicious”, Sphericity=4
• Reader 3: Lobulation=2, Malignancy=5 “highly suspicious”, Sphericity=5 “round”
• Reader 4: Lobulation=5 “none”, Malignancy=5 “highly suspicious”, Sphericity=3 “ovoid”
Ratings and boundaries differ across radiologists!
Computer-aided characterization
• Research Hypothesis
– “The working hypothesis is that certain radiologists’ assessments can be mapped to the most important low-level image features”.
• Methodology
– new semi-supervised probabilistic learning approaches that will deal with both the inter-observer variability and the small set of labeled data (annotated lung nodules).
– Our proposed learning approach will be based on an ensemble of classifiers (instead of a single classifier as with most CAD systems) built to emulate the LIDC ensemble (panel) of radiologists.
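The idea of emulating the LIDC panel with an ensemble can be sketched as follows. One model is trained per radiologist on that reader's own ratings. For a new nodule, the sketch then reports the distribution of ratings predicted by the panel members. This is a hypothetical and fully supervised illustration. Its per-reader model is a simple nearest-centroid rule, not the semi-supervised probabilistic learners proposed above, and the feature vectors are assumed to be already extracted.

```python
import math
from collections import Counter

# Hypothetical sketch: one simple model per radiologist, combined as a panel.

def train_reader_model(features, ratings):
    """Nearest-centroid model for one reader: rating -> mean feature vector."""
    groups = {}
    for x, r in zip(features, ratings):
        groups.setdefault(r, []).append(x)
    return {r: [sum(col) / len(xs) for col in zip(*xs)] for r, xs in groups.items()}

def predict(model, x):
    """Rating whose centroid is closest to x (Euclidean distance)."""
    return min(model, key=lambda r: math.dist(model[r], x))

def train_panel(features, ratings_by_reader):
    """Train one model per reader; ratings_by_reader maps reader id -> ratings."""
    return {reader: train_reader_model(features, ratings)
            for reader, ratings in ratings_by_reader.items()}

def panel_predict(panel, x):
    """Distribution of ratings predicted by the panel members for x."""
    votes = Counter(predict(model, x) for model in panel.values())
    n = sum(votes.values())
    return {rating: count / n for rating, count in votes.items()}
```

Returning a distribution instead of a single label preserves the inter-observer variability. For example, a result of {1: 2/3, 3: 1/3} means that two of three emulated readers would give rating 1.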
Computer-aided characterization (cont.)
• Expected outcome:
– an optimal set of quantitative diagnostic features linked to the visual descriptors (semantic concepts).
• Significance:
– The derived mappings can serve to:
• show the computer interpretation of the corresponding radiologist rating in terms of a set of standard and objective image features,
• automatically annotate new images, and
• augment the lung nodule retrieval results with their probabilistic diagnostic interpretations.
Computer-aided characterization
• Preliminary results
– NIH Lung Image Database Consortium (LIDC):
• 149 distinct nodules from about 85 cases/patients;
• four radiologists marked the nodules using 9 semantic characteristics on a scale from 1 to 5, except for calcification (1 to 6) and internal structure (1 to 4)
Computer-aided characterization
• LIDC high-level concepts & ratings (characteristic: possible scores)
– Calcification: 1. Popcorn, 2. Laminated, 3. Solid, 4. Non-central, 5. Central, 6. Absent
– Internal structure: 1. Soft Tissue, 2. Fluid, 3. Fat, 4. Air
– Lobulation: 1. Marked, 2-4 (unlabeled), 5. None
– Malignancy: 1. Highly Unlikely, 2. Moderately Unlikely, 3. Indeterminate, 4. Moderately Suspicious, 5. Highly Suspicious
– Margin: 1. Poorly Defined, 2-4 (unlabeled), 5. Sharp
– Sphericity: 1. Linear, 2. (unlabeled), 3. Ovoid, 4. (unlabeled), 5. Round
– Spiculation: 1. Marked, 2-4 (unlabeled), 5. None
– Subtlety: 1. Extremely Subtle, 2. Moderately Subtle, 3. Fairly Subtle, 4. Moderately Obvious, 5. Obvious
– Texture: 1. Non-Solid, 2. (unlabeled), 3. Part Solid (Mixed), 4. (unlabeled), 5. Solid
Computer-aided characterization
• Low-level image features
– Shape features: Circularity, Roughness, Elongation, Compactness, Eccentricity, Solidity, Extent, RadialDistanceSD
– Size features: Area, ConvexArea, Perimeter, ConvexPerimeter, EquivDiameter, MajorAxisLength, MinorAxisLength
– Intensity features: MinIntensity, MaxIntensity, SDIntensity, MinIntensityBG, MaxIntensityBG, MeanIntensityBG, SDIntensityBG, IntensityDifference
– Texture features: 11 Haralick features calculated from co-occurrence matrices, 24 Gabor features, 5 Markov Random Field features
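To illustrate the size and intensity groups, the sketch below computes a few of the listed features from a binary nodule mask and the corresponding CT intensities. It assumes the usual region-property definitions. EquivDiameter is the diameter of a circle with the same area, and Extent is the region area divided by the bounding-box area. SDIntensity uses the population standard deviation. The slides do not specify these details, and the function name is hypothetical.

```python
import math
from statistics import pstdev

# Hypothetical helper; feature definitions are assumed, not taken from the study.

def region_features(image, mask):
    """Compute a subset of size/intensity features for the pixels where mask is truthy."""
    coords = [(r, c) for r, row in enumerate(mask) for c, m in enumerate(row) if m]
    if not coords:
        raise ValueError("empty mask")
    values = [image[r][c] for r, c in coords]
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    area = len(coords)
    bbox_area = (max(rows) - min(rows) + 1) * (max(cols) - min(cols) + 1)
    return {
        "Area": area,                                       # pixel count
        "EquivDiameter": math.sqrt(4 * area / math.pi),     # circle of equal area
        "Extent": area / bbox_area,                         # fill ratio of bounding box
        "MinIntensity": min(values),
        "MaxIntensity": max(values),
        "SDIntensity": pstdev(values),                      # population SD (assumption)
    }
```

The background-intensity variants (…BG) would apply the same statistics to a ring of pixels around the mask. How that ring is defined is not given in the slides.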
Computer-aided characterization
• Accuracy results
Characteristic | Decision trees | Add instances predicted with high confidence (60%) | Add instances predicted with high confidence (60%) and instances with low margin (5%)
Lobulation | 27.44% | 81.00% | 69.66%
Malignancy | 42.22% | 96.31% | 96.31%
Margin | 35.36% | 98.68% | 96.83%
Sphericity | 36.15% | 91.03% | 90.24%
Spiculation | 36.15% | 63.06% | 58.84%
Subtlety | 38.79% | 93.14% | 92.88%
Texture | 53.56% | 97.10% | 97.36%
Average | 38.52% | 88.62% | 86.02%
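The middle column corresponds to adding unlabeled instances that the classifier predicts with high confidence (threshold 60%). A generic self-training loop of that kind is sketched below. It makes assumptions and is not the study's code. It uses a k-nearest-neighbor vote instead of decision trees, where confidence is the share of the k votes won by the predicted label. It also does not implement the low-margin (5%) variant. All names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical self-training sketch; k-NN stands in for the decision trees.

def knn_predict(labeled, x, k=3):
    """Predict a label for x from its k nearest labeled neighbors.

    labeled is a list of (feature_tuple, label) pairs. Returns
    (label, confidence), where confidence is the share of the k votes
    won by the predicted label.
    """
    neighbors = sorted(labeled, key=lambda item: math.dist(item[0], x))[:k]
    votes = Counter(label for _, label in neighbors)
    label, count = votes.most_common(1)[0]
    return label, count / len(neighbors)

def self_train(labeled, unlabeled, threshold=0.6, k=3, max_rounds=10):
    """Repeatedly pseudo-label unlabeled points whose prediction confidence
    is >= threshold and add them to the training set.

    Returns (augmented labeled set, points that were never added).
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        confident, remaining = [], []
        for x in pool:
            label, confidence = knn_predict(labeled, x, k)
            if confidence >= threshold:
                confident.append((x, label))
            else:
                remaining.append(x)
        if not confident:
            break  # nothing new to add; stop early
        labeled.extend(confident)
        pool = remaining
    return labeled, pool
```

Within a round, every prediction uses the training set as it stood at the start of that round, so the order of the unlabeled points does not affect which ones are added.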
Computer-aided characterization
• Challenges
• Small number of training samples and large number of features (the “curse of dimensionality” problem)
• Nodule size
• Variation in the nodules’ boundaries
• Different types of imaging acquisition parameters
• Clinical evaluation: observer performance studies require collaboration with medical schools or hospitals
SoC Medical imaging research projects
2. Texture-based Pixel Classification
- tissue segmentation
- context-sensitive tools for radiology reporting
[Figure: pixel-level texture extraction → texture descriptors d1, d2, ..., dk → pixel-level classification → tissue_label → organ segmentation]
Texture-based Pixel Classification
• Texture feature extraction: consider the texture around the pixel of interest.
• Capture texture characteristics by estimating the joint conditional probability of pixel-pair occurrences Pij(d,θ).
– Pij denotes the normalized co-occurrence matrix specified by the displacement (distance) d and the angle θ.
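Below is a minimal pure-Python sketch of estimating Pij(d, θ) for a small gray-level image or pixel neighborhood. It makes three assumptions. Gray levels are already quantized to the integers 0..levels-1. θ is limited to 0°, 45°, 90°, and 135°, using the (row, column) offsets in OFFSETS, which is one common convention. Pixel pairs are counted in one direction only unless symmetric=True. To obtain pixel-level texture features, the same function would be applied to the window around each pixel of interest. The function name is hypothetical.

```python
# Hypothetical sketch of a normalized gray-level co-occurrence matrix.

# (row, col) step for each angle; one common convention, assumed here.
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}

def cooccurrence(image, levels, d=1, theta=0, symmetric=False):
    """Normalized co-occurrence matrix P[i][j] for displacement d and angle theta.

    image is a 2D list of ints in range(levels). P[i][j] estimates the
    probability that a pixel with gray level i has a neighbor with gray
    level j at distance d in direction theta.
    """
    dr, dc = OFFSETS[theta]
    dr, dc = dr * d, dc * d
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:  # skip pairs leaving the window
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                if symmetric:
                    counts[j][i] += 1
    total = sum(map(sum, counts))
    if total == 0:
        return [[0.0] * levels for _ in range(levels)]
    return [[n / total for n in row] for row in counts]
```

For the 2×3 image [[0, 0, 1], [1, 2, 2]] at θ = 0° and d = 1, the four horizontal pairs (0,0), (0,1), (1,2), and (2,2) each receive probability 0.25.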
[Figure: neighborhood of a pixel]
Haralick Texture Features
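Several Haralick-style features can be computed directly from a normalized co-occurrence matrix P. The definitions below are common textbook formulations chosen for illustration. Energy is taken as the angular second moment, entropy is measured in bits, and cluster tendency is Σ(i + j − μx − μy)² P(i, j). The slides list neither the exact formulas nor the full set of 11 features used. The function name is hypothetical.

```python
import math

# Hypothetical helper; formulas are common formulations, assumed for illustration.

def haralick_features(P):
    """A few texture features from a normalized co-occurrence matrix P (rows sum to a total of 1)."""
    n = len(P)
    # Zero cells contribute nothing to any sum below, so skip them (avoids log(0)).
    cells = [(i, j, P[i][j]) for i in range(n) for j in range(n) if P[i][j] > 0]
    mu_x = sum(i * p for i, _, p in cells)  # mean gray level of the first pixel
    mu_y = sum(j * p for _, j, p in cells)  # mean gray level of the second pixel
    return {
        "energy": sum(p * p for _, _, p in cells),
        "entropy": -sum(p * math.log2(p) for _, _, p in cells),
        "contrast": sum((i - j) ** 2 * p for i, j, p in cells),
        "homogeneity": sum(p / (1 + abs(i - j)) for i, j, p in cells),
        "cluster_tendency": sum((i + j - mu_x - mu_y) ** 2 * p for i, j, p in cells),
    }
```

Energy is high for uniform textures, where probability mass concentrates in a few cells. Entropy is high for disordered textures. Cluster tendency measures how strongly pixel pairs group around similar gray levels.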
Examples of Texture Images
Texture images: original image, energy and cluster tendency, respectively.
M. Kalinin, D. S. Raicu, J. D. Furst, D. S. Channin, "A Classification Approach for Anatomical Regions Segmentation", The IEEE International Conference on Image Processing (ICIP), Genoa, Italy, September 11-14, 2005.
Texture Classification of Tissues in CT Chest/Abdomen
Example of Liver Segmentation:
(J.D. Furst, R. Susomboon, and D.S. Raicu, "Single Organ Segmentation Filters for Multiple Organ Segmentation", IEEE 2006 International Conference of the Engineering in Medicine and Biology Society (EMBS'06))
[Figure: original image → initial seed at 90% → split & merge at 85% → split & merge at 80% → region growing at 70% → region growing at 60% → segmentation result]
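In the liver example the acceptance level drops step by step (90% → 85% → 80% → 70% → 60%). Suppose the percentages are thresholds on a per-pixel liver probability produced by the texture classifier. The coarse-to-fine idea can then be sketched with plain 4-connected region growing at each step. This is a simplification and not the cited method: the split & merge steps are replaced by the same region-growing step, and the function names are hypothetical.

```python
from collections import deque

# Hypothetical sketch: seeded, multi-threshold region growing on a probability map.

def grow(prob, region, threshold):
    """Add 4-connected pixels with prob >= threshold that touch the region."""
    rows, cols = len(prob), len(prob[0])
    region = set(region)
    queue = deque(region)
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in region and prob[nr][nc] >= threshold):
                region.add((nr, nc))
                queue.append((nr, nc))
    return region

def multi_threshold_segmentation(prob, seed=0.90, schedule=(0.85, 0.80, 0.70, 0.60)):
    """Seed with high-confidence pixels, then grow with progressively lower thresholds."""
    region = {(r, c) for r, row in enumerate(prob) for c, p in enumerate(row) if p >= seed}
    for t in schedule:
        region = grow(prob, region, t)
    return region
```

Lowering the threshold gradually means that low-confidence pixels are accepted only when they connect to already-accepted liver tissue. An isolated pixel with probability 0.65 elsewhere in the image is never added.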
Classification models: challenges
(a) Optimal selection of an adequate set of textural features
is a challenge, especially with the limited data we often have
to deal with in clinical problems. Consequently, the
effectiveness of any classification system will always be
conditional on two things:
(i) how well the selected features describe the tissues
(ii) how well the study group reflects the overall target
patient population for the corresponding diagnosis
(b) how other types of information can be incorporated into the classification models:
- metadata
- image features from other imaging modalities (requiring image fusion)
(c) how stable and general the classification models are
Content-based medical image retrieval (CBIR) systems
Definition of Content-based Image Retrieval:
Content-based image retrieval is a technique for retrieving images on the basis of automatically derived image features such as texture and shape.
Applications of Content-based Image Retrieval:
• Teaching
• Research
• Diagnosis
• PACS and Electronic Patient Records
Diagram of a CBIR system
[Figure: query image and image database → feature extraction → image features [D1, D2, …, Dn] → similarity retrieval → query results → user evaluation → feedback algorithm]
http://viper.unige.ch/~muellerh/demoCLEFmed/index.php
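The diagram does not name its feedback algorithm. One classic option is Rocchio relevance feedback. It shifts the query's feature vector toward images the user marked as relevant and away from those marked non-relevant: q' = α·q + β·mean(relevant) − γ·mean(non-relevant). The sketch assumes each image is represented by a fixed-length feature vector [D1, …, Dn]. The defaults α = 1, β = 0.75, γ = 0.15 are commonly cited values and do not come from the source. The function names are hypothetical.

```python
# Hypothetical sketch of Rocchio relevance feedback on image feature vectors.

def mean_vector(vectors, dim):
    """Component-wise mean; a zero vector when no vectors are given."""
    if not vectors:
        return [0.0] * dim
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return the refined query vector after one round of user feedback."""
    dim = len(query)
    rel = mean_vector(relevant, dim)
    non = mean_vector(nonrelevant, dim)
    return [alpha * q + beta * r - gamma * n for q, r, n in zip(query, rel, non)]
```

The refined vector is then passed back to similarity retrieval, which closes the loop in the diagram.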
CBIR as a Diagnosis Aid
An image retrieval system can help when the diagnosis
depends strongly on direct visual properties of images in the
context of evidence-based medicine or case-based
reasoning.
CBIR as a Teaching Tool
An image retrieval system will allow students/teachers to browse
available data themselves in an easy and straightforward fashion by
clicking on “show me similar images”.
Advantages:
- stimulate self-learning and a comparison of similar cases
- find optimal cases for teaching
Teaching files:
• Casimage: http://www.casimage.com
• myPACS: http://www.mypacs.net
CBIR as a Research Tool
Image retrieval systems can be used:
• to complement text-based retrieval methods
• for visual knowledge management whereby the images
and associated textual data can be analyzed together
• multimedia data mining can be applied to
learn the unknown links between visual
features and diagnosis or other patient
information
• for quality control to find images that might have been
misclassified
CBIR as a tool for lookup and reference in CT chest/abdomen
• Case Study: lung nodules retrieval
– Lung Imaging Database Resource for Imaging Research
http://imaging.cancer.gov/programsandresources/InformationSystems/LIDC/page7
– 29 cases, 5,756 DICOM images/slices, 1,143 nodule images
– 4 radiologists annotated the images using 9 nodule
characteristics: calcification, internal structure, lobulation,
malignancy, margin, sphericity, spiculation, subtlety, and texture
• Goals:
– Retrieve nodules based on image features:
• Texture, Shape, and Size
– Find the correlations between the image features and the
radiologists’ annotations
[Screenshot: choose a nodule; choose an image feature & a similarity measure]
M. Lam, T. Disney, M. Pham, D. Raicu, J. Furst, “Content-Based Image Retrieval for
Pulmonary Computed Tomography Nodule Images”, SPIE Medical Imaging Conference,
San Diego, CA, February 2007
[Screenshot: retrieved images]
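The "choose an image feature & a similarity measure" step can be sketched as ranking nodules with an interchangeable distance function. The example contrasts point-based distances (Euclidean, Manhattan) with a distribution-based one (chi-square distance between normalized feature histograms). The nodule IDs, feature vectors, and function names are hypothetical, and the cited system's actual features and measures may differ.

```python
import math

# Hypothetical sketch: rank nodules by a user-selected distance measure.

def euclidean(a, b):
    """Point-based: straight-line distance between feature vectors."""
    return math.dist(a, b)

def manhattan(a, b):
    """Point-based: sum of absolute per-feature differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def chi_square(h1, h2):
    """Distribution-based: chi-square distance between normalized histograms."""
    return 0.5 * sum((x - y) ** 2 / (x + y) for x, y in zip(h1, h2) if x + y > 0)

MEASURES = {"euclidean": euclidean, "manhattan": manhattan, "chi_square": chi_square}

def rank_nodules(query, database, measure="euclidean", k=5):
    """Return the IDs of the k nodules closest to the query under the chosen measure."""
    dist = MEASURES[measure]
    return sorted(database, key=lambda nid: dist(query, database[nid]))[:k]
```

Point-based measures treat each feature as an independent coordinate. Distribution-based measures such as chi-square suit features that are themselves histograms, for example gray-level distributions within the nodule.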
CBIR systems: challenges
• Type of features
– image features:
- texture features: statistical, structural, model-based and filter-based
- shape features
– textual features (such as physician annotations)
• Similarity measures
– point-based and distribution-based metrics
• Retrieval performance:
– precision and recall
– clinical evaluation
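For a single query, precision and recall can be computed at a cutoff k. Precision is the fraction of the top-k results that are relevant. Recall is the fraction of all relevant items that appear in the top k. The caller decides what counts as "relevant" (for example, nodules sharing a radiologist rating); that choice is an assumption. The function name is hypothetical.

```python
# Hypothetical helper for evaluating one retrieval query.

def precision_recall_at_k(retrieved, relevant, k):
    """Precision and recall of the first k retrieved items against a relevant set."""
    top = retrieved[:k]
    hits = sum(1 for item in top if item in relevant)
    precision = hits / len(top) if top else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

Averaging these values over many query nodules, and over several values of k, gives the precision-recall behavior of a retrieval configuration.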
Questions?