Transcript Slide 1
Leaf Classification from
Local Boundary Analysis
Anne Jorstad
AMSC 664
University of Maryland
Spring 2008
Final Report
Advisor: Dr. David Jacobs, Computer Science
1
Background
• Electronic Field Guide for Plants
2
Background
• Current System:
– Inner-Distance Shape Context (IDSC)
• Measures the shortest distance between two points on a
path contained entirely within a figure
• Good for detecting similarities between deformable
structures
3
Background
• Current System:
– All shape information is compared at a global level,
no specific consideration of edge types
Cephalanthus occidentalis
(smooth boundary)
Carpinus caroliniana
(serrated boundary)
4
Problem Statement
Use local boundary information to make
classification decisions that complement the
existing system.
5
The Algorithm
• Input:
• Capture
boundary curve:
6
The Algorithm: Wavelets
• Discrete wavelet transform
– In: vector of points
– Out: two vectors, each half original length
• Approximation coefficients:
– general spatial information
• Detail coefficients:
– local detail information
– Repeat for multiple scales
7
The Algorithm: Wavelets
• Model leaf by its detail coefficients
over several scales
Input
Approximations,
continually subtracting
out detail information
8
The Algorithm: Data
• Forget leaves:
– Each boundary point:
• Lose one degree of freedom in preserving rotation invariance
– For 3 wavelet scales, leaf is ~2000 5-D points
• Combine data for all leaves:
– #leaves x ~2000 5-D points
• Group all points into meaningful clusters
9
The Algorithm: Clustering
• Goal: Sort points into “buckets” to get a
unique distribution for each leaf species
• K-Means Clustering:
group all points into 36
representative clusters
10
The Algorithm: Distribution
Comparison
• Distribution of individual leaf’s 2000
points over the 36 clusters represents leaf
(a)
(b)
(c)
Leaf image and corresponding histogram for (a) Corylus americana,
(b) Corylus americana, different example, (c) Asimina triloba
11
The Algorithm: Distribution
Comparison
• Compare distributions between leaves using the
chi-squared distance:
where
• Smallest distance defines best match
– New leaf is assigned the species of the closest match
12
Validation
• Training data: 20 species, 10 examples of each
→ 200 leaves
10 serrated species
10 smooth species
13
Validation
• Test data: same 20 species, 5 new examples of each
• Nearest-Neighbor Classification
• Species classification: 46% correct
• Serration classification: 100% correct
– closest match was to species with appropriate serration
14
Validation
• Test data: same 20 species, 5 new examples of each
• Nearest-Neighbor Classification
Local serration
information IS
being captured!
• Species classification: 46% correct
• Serration classification: 100% correct
– closest match was to species with appropriate serration
15
Combining Results
• Original IDSC results on same data set:
– Species correct: 62%
– Serration correct if species wrong: 53%
• No better than chance
• How to combine wavelet distances with IDSC
distances?
16
Combining Results
• Given
and
• Want to find:
17
Naïve Bayes Classification
• From Bayes’ Rule:
• Can now calculate all relevant probabilities
from training data
18
Naïve Bayes Classification
• Wavelet distances → binary serration value
– Add small linear smoothing term
• IDSC distances → species ranked in order
from nearest to farthest
– Add Gaussian smoothing term
19
Validation Results
• Test on same 20 species, 5 examples of each
• Adding serration information has improved
overall classification results!
20
Full Data Set
• 245 species, 7481 leaves
• Binary serration assignment no longer makes
sense:
21
Linear Optimization
• Train over previous
training set
% correct
• Find best linear weighting of distances:
alpha
22
Full Data Set
• Nearest-Neighbor Classification over all
7481 leaves
– Wavelet alone: 20% correct
– IDSC alone:
54% correct
– Combined:
64% correct
23
In Practice
• Electronic field guide displays top 5, 10 or 20
matches
• Calculate correct % in top n matches,
for n = 1, …, 20
24
% correct
In Practice
# matches considered
25
In Practice
• Need results in near real-time
– Otherwise no benefit over paper field guides
• Running time
– Preprocessing:
(several hours)
• Determine cluster centers
• Determine distributions for each leaf
– On the spot:
(0.92 seconds)
• Calculate single distribution
• Compare to all distributions in system
26
Conclusions
• Wavelets do capture local serration
information
• Wavelet + IDSC classification does a better
overall job than the original IDSC alone
• Calculations can be done in real time to make
the system realistic to use
27
References
•
Gaurav Agarwal, Haibin Ling, David Jacobs, Sameer Shirdhonkar, W. John Kress, Rusty Russell, Peter
Belhumeur, Nandan Dixit, Steve Feiner, Dhruv Mahajan, Kalyan Sunkavalli, Ravi Ramamoorthi, Sean
White. “First Steps Toward an Electronic Field Guide for Plants”. Taxon, vol. 55, no. 3, Aug. 2006.
•
Cene C.-H. Chuang, C.-C. Jay Kuo. “Wavelet Descriptor of Planar Curves: Theory and Applications”.
IEEE Transactions of Image Processing, Vol. 5, No. 1, January 1996.
•
Pedro F. Felzenszwalb, Jushua D. Schwartz. “Hierarchical Matching of Deformable Shapes”. IEEE
Conference on Computer Vision and Pattern Recognition, 2007.
•
Haibin Ling, David Jacobs. “Using the Inner-Distance for Classification of Articulated Shapes.” CVPR,
Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition,
Vol. 2, 2005.
•
Jitendra Malik, Serge Belongie, Thomas Leung, Jainbo Shi. “Contour and Texture Analysis for Image
Segmentation”. International Journal of Computer Vision, vol. 34, no. 1, July 2001.
•
Stephane Mallat. “A Wavelet Tour of Signal Processing”. Academic Press, Chestnut Hill, Massachusetts,
1999.
28