Automated Classification of Crystallization Images
Christian A. Cumbaa and Igor Jurisica, Division of Signaling Biology, Ontario Cancer Institute, Toronto, Ontario
Results
Experiment 1: Independent crystallization conditions.
6 classifiers trained to detect clear, phase separation, precipitate, skin,
crystal, garbage.
Training/test images limited to unanimously-scored images (per category).
• Table 1 summarizes the performance of each.
Experiment 2: Compound crystallization conditions.
One 10-way classifier trained to distinguish between 10 compound
categories: crystal only, crystal+phase separation, crystal+precipitate,
precipitate only, precipitate+skin, precipitate+phase separation, phase
separation+skin, phase separation only, clear drop, and garbage.
Training/test images limited to unanimously-scored images belonging to
one of the 10 categories.
• Table 2 summarizes the performance of the classifier.
• Figure 1 illustrates the distribution of true positives and false negatives.
• Figure 2 illustrates the distribution of true positives and false positives.
• Figure 3 gives example images of each.
Discussion
Experiments 1 and 2 reveal degrees of difficulty in recognizing
crystallization outcomes.
Most singleton categories in Experiment 2 were classified well.
Clear drops were classified most accurately (recall .98, precision .89).
Many compound categories demonstrate the classifier's confusion
between certain mixtures of outcomes. All crystal-bearing categories
are confused to a degree. Precipitates as a whole are easily detected,
but compound precipitates are difficult to subdivide.
Classifier      Truth+/Machine+  Truth+/Machine-  Truth-/Machine+  Truth-/Machine-  Precision  Recall
Crystal                     234              448             1167           144223       0.17    0.34
Phase Sep.                 9315              630            47059            41261       0.17    0.94
Precip.                   60528             2849             3429            52395       0.95    0.96
Skin                       5783             1453            13103           106919       0.31    0.80
Clear                     28278              782             5093            89159       0.85    0.97
Garbage                     407              122             2068           138131       0.16    0.77
Table 1: The confusion matrices summarizing the match between actual crystallization
outcomes and the labels assigned by the classification system (Experiment 1). Numbers
indicate counts of actual images.
[Figure: 10 x 10 grid. Rows: Truth. Columns: Machine classification.]
Recall: Crystal .55; Crystal/Phase Sep. .22; Crystal/Precip. .46; Precip. .83;
Precip./Skin .60; Precip./Phase Sep. .04; Phase Sep./Skin .16; Phase Sep. .62;
Clear .98; Garbage .52
Figure 1: Distributions of classification labels (columns), as
applied by the image analysis system to observed
crystallization outcomes (rows). Elements on the diagonal
indicate correct classifications. Numbers indicate Recall
scores for each outcome.
Goal: We aim to automatically classify all images generated by the HWI
robotic imaging system, and eliminate the need for a crystallographer to
search among hundreds of images for crystal hits, or other conditions of
interest.
Data source:
Truth data for 147456 images from Hauptman-Woodward's High-Throughput Screening (HTS) Laboratory
• Each image evaluated by 3 or more experts
• Scored for presence/absence of 7 independent crystallization
conditions: clear, phase separation, precipitate, skin, crystal, garbage,
unsure
Experiment 2 was supplemented with
• 6456 crystal images (NESG-sourced proteins)
• 11504 crystal images (SGPP-sourced proteins)
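Both experiments keep only unanimously-scored images. The sketch below shows one way such a filter could work. It assumes each image carries one set of condition names per expert; that data layout and the function names `unanimous_for` and `compound_category` are hypothetical, not taken from the HWI data format.

```python
# The ten compound categories of Experiment 2, keyed by the set of conditions present.
COMPOUND_CATEGORIES = {
    frozenset({"crystal"}): "crystal only",
    frozenset({"crystal", "phase separation"}): "crystal+phase separation",
    frozenset({"crystal", "precipitate"}): "crystal+precipitate",
    frozenset({"precipitate"}): "precipitate only",
    frozenset({"precipitate", "skin"}): "precipitate+skin",
    frozenset({"precipitate", "phase separation"}): "precipitate+phase separation",
    frozenset({"phase separation", "skin"}): "phase separation+skin",
    frozenset({"phase separation"}): "phase separation only",
    frozenset({"clear"}): "clear drop",
    frozenset({"garbage"}): "garbage",
}


def unanimous_for(condition, expert_scores):
    """Experiment 1 style: True/False if every expert agrees on one condition, else None."""
    votes = {condition in scores for scores in expert_scores}
    return votes.pop() if len(votes) == 1 else None


def compound_category(expert_scores):
    """Experiment 2 style: the category name if all experts report the same
    condition set and that set is one of the ten categories, else None."""
    first = frozenset(expert_scores[0])
    if any(frozenset(scores) != first for scores in expert_scores[1:]):
        return None  # experts disagree: image is excluded
    return COMPOUND_CATEGORIES.get(first)  # None if not one of the ten


# Three experts agree that crystal and precipitate are both present.
scores = [{"crystal", "precipitate"}, {"precipitate", "crystal"}, {"crystal", "precipitate"}]
print(compound_category(scores))
```

Images for which either function returns `None` would simply be left out of the corresponding training/test set.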
Image analysis: Each image was processed by our image processing
algorithms in order to extract 840 numeric measures of image texture.
These features measure the presence of straight edges, grey-tone
statistics, etc., each measured at multiple scales and contrast levels.
Feature selection: For each target category of images, we select a
subset of the 840 features that most effectively distinguishes
positive/negative examples of that category. Images are therefore
reduced to short vectors of numeric values.
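The selection criterion is not stated here. As an illustration only, the sketch below ranks features by a Fisher-style score (squared gap between class means over summed class variances) and keeps the top k. The score, the value of k, and the function names are assumptions of this sketch, not the method used in the experiments.

```python
from statistics import mean, pvariance


def fisher_score(pos_values, neg_values):
    """Squared gap between class means divided by the summed class variances."""
    gap = mean(pos_values) - mean(neg_values)
    spread = pvariance(pos_values) + pvariance(neg_values)
    if spread == 0:
        # Constant within both classes: perfect separator if the means differ.
        return float("inf") if gap != 0 else 0.0
    return gap * gap / spread


def select_features(positives, negatives, k):
    """Return the indices of the k highest-scoring features."""
    n_features = len(positives[0])
    scores = [
        fisher_score([row[j] for row in positives], [row[j] for row in negatives])
        for j in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


def project(row, selected):
    """Reduce one full feature vector to the selected features."""
    return [row[j] for j in selected]
```

With 840 input features, `project` is what turns each image into the short vector passed on to the classifier.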
Image classification: To train a classifier, we construct statistical models
of the probability distribution of feature-vector values: one for each
category. For these experiments, we use multivariate Gaussians to
estimate probability density.
New images are classified by comparing their feature vectors to each
category's probability distribution. The result, for each image, is itself a
probability distribution across all categories. The classifier outputs the
category with the highest probability.
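A minimal sketch of this kind of classifier, using NumPy. It fits one full-covariance Gaussian per category and combines them by Bayes' rule. The class name `GaussianClassifier`, the small ridge added to each covariance, and the use of class frequencies as priors are assumptions of this sketch; the text above does not specify them.

```python
import numpy as np


class GaussianClassifier:
    """One multivariate Gaussian per category, combined by Bayes' rule."""

    def fit(self, X, y, ridge=1e-6):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        self.params_ = []
        for c in self.classes_:
            Xc = X[y == c]  # needs at least two examples per category
            mean = Xc.mean(axis=0)
            # Assumption: a small ridge keeps the covariance invertible.
            cov = np.atleast_2d(np.cov(Xc, rowvar=False)) + ridge * np.eye(X.shape[1])
            _, logdet = np.linalg.slogdet(cov)
            # Assumption: priors are the category frequencies in the training set.
            log_prior = np.log(len(Xc) / len(X))
            self.params_.append((mean, np.linalg.inv(cov), logdet, log_prior))
        return self

    def predict_proba(self, X):
        """Probability distribution across all categories, one row per image."""
        X = np.atleast_2d(np.asarray(X, dtype=float))
        d = X.shape[1]
        log_post = np.empty((X.shape[0], len(self.classes_)))
        for k, (mean, precision, logdet, log_prior) in enumerate(self.params_):
            diff = X - mean
            maha = np.einsum("ij,jk,ik->i", diff, precision, diff)
            log_post[:, k] = log_prior - 0.5 * (maha + logdet + d * np.log(2 * np.pi))
        log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
        post = np.exp(log_post)
        return post / post.sum(axis=1, keepdims=True)

    def predict(self, X):
        """Category with the highest probability for each image."""
        return self.classes_[self.predict_proba(X).argmax(axis=1)]
```

Working in log space avoids underflow when a feature vector lies far from a category's mean.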
To avoid bias in our models, each data point is used in turn for training
and testing in a 10-fold cross-validation process.
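The fold bookkeeping for 10-fold cross-validation can be sketched as follows. The shuffling, the fixed seed, and the function name `k_fold_splits` are assumptions of this sketch.

```python
import random


def k_fold_splits(n_items, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs; each index is tested exactly once."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # assumption: folds are drawn at random
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


for train, test in k_fold_splits(25):
    # Fit the category models on `train`, then classify the images in `test`.
    assert not set(train) & set(test)
```

Each model is fitted on nine folds and scored on the tenth, so no image is classified by a model that saw it during training.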
Measuring performance: Two important performance measures are
used, precision and recall.
Precision measures the fraction of images classified as category C that
actually belong to C.
Recall measures the fraction of images belonging to C that were
classified as C.
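As a worked example, the two measures can be computed from the crystal-classifier counts reported in Table 1 (234 true positives, 448 false negatives, 1167 false positives). The function names are illustrative.

```python
def precision(tp, fp):
    """Fraction of images classified as C that actually belong to C."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Fraction of images belonging to C that were classified as C."""
    return tp / (tp + fn)


# Crystal classifier, Experiment 1 (counts from Table 1).
tp, fn, fp = 234, 448, 1167
print(f"precision = {precision(tp, fp):.2f}")  # 234 / 1401 -> 0.17
print(f"recall    = {recall(tp, fn):.2f}")     # 234 / 682  -> 0.34
```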
Rows: Truth. Columns: Image classification.
(Cry = Crystal, Pr = Precip., PS and PhSep = Phase Sep.)
Truth                Crystal   Cry/PS   Cry/Pr   Precip  Pr/Skin    Pr/PS  PS/Skin    PhSep    Clear  Garbage
Crystal                 3934      354     1039      280       88        1        4      268     1174       21
Crystal/Phase Sep.       578      433      281      117       51       14        0      421       94        2
Crystal/Precip.         1016      153     2972     1721      296       23        2      211       69        0
Precip.                  397       49     1325    24547      987       52        4     1213      810       27
Precip./Skin             120       24      206     1201     2557        5        3       98       29        8
Precip./Phase Sep.        19       13      101      199       38       18        1       49        1        0
Phase Sep./Skin            7        2        4        2       16        0       11       24        1        0
Phase Sep.               422      115       77      274       73       29        2     3721     1229       32
Clear                    101        1       12      128        9        0        1      123    28482      174
Garbage                   19        1        0       33        4        0        1        4      163      246
Table 2: The confusion matrix summarizing the match between actual crystallization
outcomes and the labels assigned by the image analysis system (Experiment 2).
Numbers indicate counts of actual images.
[Figure: 10 x 10 grid. Rows: Truth. Columns: Image classification.]
Precision: Crystal .59; Crystal/Phase Sep. .38; Crystal/Precip. .49; Precip. .86;
Precip./Skin .62; Precip./Phase Sep. .13; Phase Sep./Skin .38; Phase Sep. .61;
Clear .89; Garbage .48
Figure 2: Distributions of observed crystallization outcomes
(rows) grouped by labels (columns) applied by the image
analysis system. Elements on the diagonal indicate correct
classifications. Numbers indicate Precision scores for each
class.
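The per-category scores in Figures 1 and 2 follow from the Table 2 counts: recall divides each diagonal entry by its row (truth) total, and precision divides it by its column (label) total. A short sketch of that arithmetic, with the matrix copied from Table 2 and illustrative variable names:

```python
CATEGORIES = [
    "Crystal", "Crystal/Phase Sep.", "Crystal/Precip.", "Precip.", "Precip./Skin",
    "Precip./Phase Sep.", "Phase Sep./Skin", "Phase Sep.", "Clear", "Garbage",
]

# Table 2: rows are true outcomes, columns are assigned labels.
CONFUSION = [
    [3934, 354, 1039, 280, 88, 1, 4, 268, 1174, 21],
    [578, 433, 281, 117, 51, 14, 0, 421, 94, 2],
    [1016, 153, 2972, 1721, 296, 23, 2, 211, 69, 0],
    [397, 49, 1325, 24547, 987, 52, 4, 1213, 810, 27],
    [120, 24, 206, 1201, 2557, 5, 3, 98, 29, 8],
    [19, 13, 101, 199, 38, 18, 1, 49, 1, 0],
    [7, 2, 4, 2, 16, 0, 11, 24, 1, 0],
    [422, 115, 77, 274, 73, 29, 2, 3721, 1229, 32],
    [101, 1, 12, 128, 9, 0, 1, 123, 28482, 174],
    [19, 1, 0, 33, 4, 0, 1, 4, 163, 246],
]

row_totals = [sum(row) for row in CONFUSION]
col_totals = [sum(col) for col in zip(*CONFUSION)]

recalls = [CONFUSION[i][i] / row_totals[i] for i in range(len(CATEGORIES))]
precisions = [CONFUSION[i][i] / col_totals[i] for i in range(len(CATEGORIES))]

for name, r, p in zip(CATEGORIES, recalls, precisions):
    print(f"{name:<20} recall {r:.2f}  precision {p:.2f}")
```

For example, 3934 of the 7163 true crystal-only images were labelled crystal (recall .55), while 3934 of the 6613 images labelled crystal were truly crystal-only (precision .59).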
New directions
New image analysis system (under development)
• Revised and expanded feature set
• Textural features of local regions of the image
• More precise texture, straight edge, and discrete
object metrics
World Community Grid
• New system will run on the World Community Grid
• 150 CPU-years compute time per day
• Will compute features for 60 million images
• Project launch Spring/Summer 2007
• http://www.worldcommunitygrid.org/
Acknowledgements
All images in these studies were generated at the
High-Throughput Screening lab at The Hauptman-Woodward Institute. Multi-outcome truth data was
painstakingly generated by eight heroes at HWI,
and carefully organized, cleaned, and curated by
Max Thayer and Raymond Nagel at HWI.
This work was funded by the following grants and
organizations:
NIH U54 GM074899-01, Genome Canada, IBM,
NSERC RGPIN 203833-02.
Earlier work supported by NIH P50 GM62413-05,
NSERC and CITO.
[Figure: grid of example images. Columns: True positives (highest-scoring); False negatives (lowest-scoring); False positives (highest-scoring). Rows: Crystal; Crystal + Phase Sep.; Crystal + Precip.; Precip.; Precip. + Skin; Precip. + Phase Sep.; Phase Sep. + Skin; Phase Sep.; Clear; Garbage.]
Figure 3: Example classifications and misclassifications for each category (Experiment 2).