Transcript: Defense Presentation

Semi-Supervised Training for Appearance-Based Statistical Object Detection Methods
Charles Rosenberg
Thesis Oral
May 10, 2004
Thesis Committee
Martial Hebert, co-chair
Sebastian Thrun, co-chair
Henry Schneiderman
Avrim Blum
Tom Minka, Microsoft Research
1
Motivation: Object Detection
Example eye detections from the Schneiderman detector.
• Modern object detection systems “work”.
• Lots of manually labeled training data is required.
• How can we reduce the cost of training data?
2
Approach: Semi-Supervised Training
• Supervised training: costly, fully labeled data
• Semi-supervised training: fully and weakly labeled data
• Goal: develop a semi-supervised approach for the object detection problem and characterize the issues
3
What is Semi-Supervised Training?
• Supervised Training
– Standard training approach
– Training with fully labeled data
• Semi-Supervised Training
– Training with a combination of fully labeled data and
unlabeled or weakly labeled data
• Weakly Labeled Data
– Certain label values unknown
– E.g. object is present, but location and scale unknown
– Labeling is relatively “cheap”
• Unlabeled Data
– No label information known
4
Issues for Object Detection
• What semi-supervised approaches are applicable?
– Ability to handle the unique aspects of the object detection problem.
– Compatibility with existing detector implementations.
• What are the practical concerns?
– Object detector interactions
– Training data issues
– Detector parameter settings
• What kind of performance gain is possible?
– How much labeled training data is needed?
5
Contributions
• Devised an approach which achieves substantial performance gains through semi-supervised training.
• Comprehensive evaluation of semi-supervised training applied to object detection.
• Detailed characterization and comparison of the semi-supervised approaches used.
6
Presentation Outline
• Introduction
• Background
• Semi-supervised Training Approach
• Analysis: Filter Based Detector
• Analysis: Schneiderman Detector
• Conclusions and Future Work
7
What is Unique About Object Detection?
• Complex feature set
– high dimensional, continuous, with a complex distribution
• Large inherent variation
– lighting, viewpoint, scale, location, etc.
• Many examples per training image
– many negative examples and a very small number of positive examples
• Negative examples are free.
• Large class overlap
– the object class is a “subset” of the clutter class
[Figure: the object class density is a subset of the clutter class density in P(X)]
8
Background
• Graph-Based Approaches
– A graph is constructed to represent the labeled and unlabeled data relationships – the construction method is important.
– Edges in the graph are weighted according to a distance measure.
– Blum, Chawla, ICML 2001; Szummer, Jaakkola, NIPS 2001; Zhu, Ghahramani, Lafferty, ICML 2003.
• Information Regularization
– Explicit about the information transferred from P(X) to P(Y|X)
– Szummer, Jaakkola, NIPS 2002; Corduneanu, Jaakkola, UAI 2003.
• Multiple Instance Learning
– Addresses multiple examples per data element
– Dietterich, Lathrop, Lozano-Perez, AI 1997; Maron, Lozano-Perez, NIPS 1998; Zhang, Goldman, NIPS 2001.
• Transduction, other methods…
9
Presentation Outline
• Introduction
• Background
• Semi-supervised Training Approach
• Analysis: Filter Based Detector
• Analysis: Schneiderman Detector
• Conclusions and Future Work
10
Semi-Supervised Training Approaches
• Expectation-Maximization (EM)
– Batch Algorithm
• All data processed each iteration
– Soft Class Assignments
• Likelihood distribution over class labels
• Distribution recomputed each iteration
• Self-Training
– Incremental Algorithm
• Data added to the active pool at each iteration
– Hard Class Assignments
• Most likely class assigned
• Labels do not change once assigned
11
Semi-Supervised Training with EM
1. Train the initial detector model with the initial labeled data set.
2. Expectation step: run the detector on the weakly labeled set and compute the most likely detection; compute expected statistics of the fully labeled examples and of the weakly labeled examples weighted by their class likelihoods.
3. Maximization step: update the parameters of the detection model.
4. Repeat steps 2–3 for a fixed number of iterations or until convergence (sketched in code below).
• Dempster, Laird, Rubin, 1977.
• Nigam, McCallum, Thrun, Mitchell, 1999.
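As a concrete illustration, here is a minimal sketch of this EM loop, assuming a toy 1-D two-class Gaussian model in place of the real detector; the data and parameters are hypothetical stand-ins, not the thesis implementation.

```python
import numpy as np

# Toy semi-supervised EM: a 1-D two-class Gaussian model stands in for
# the detector; unlabeled points play the role of weakly labeled data.
rng = np.random.default_rng(0)
x_lab = np.array([-2.1, -1.9, 2.0, 2.2])   # fully labeled points
y_lab = np.array([0, 0, 1, 1])             # their class labels
x_unl = np.concatenate([rng.normal(-2, 1, 50), rng.normal(2, 1, 50)])

# initialize from the fully labeled data only
mu = np.array([x_lab[y_lab == 0].mean(), x_lab[y_lab == 1].mean()])
sigma, prior = np.ones(2), np.full(2, 0.5)

for _ in range(20):
    # E-step: soft class assignments (responsibilities) for unlabeled data
    lik = np.stack([prior[k] * np.exp(-0.5 * ((x_unl - mu[k]) / sigma[k]) ** 2)
                    / sigma[k] for k in (0, 1)])
    resp = lik / lik.sum(axis=0)
    # M-step: labeled statistics plus likelihood-weighted unlabeled statistics
    x = np.concatenate([x_lab, x_unl])
    for k in (0, 1):
        w = np.concatenate([(y_lab == k).astype(float), resp[k]])
        mu[k] = np.average(x, weights=w)
        sigma[k] = np.sqrt(np.average((x - mu[k]) ** 2, weights=w)) + 1e-6
        prior[k] = w.sum() / len(x)

print("means:", mu, "stds:", sigma, "priors:", prior)
```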
12
Semi-Supervised Training with Self-Training
1. Train the detector model with the labeled data set.
2. Run the detector on the weakly labeled set and compute the most likely detection.
3. Score each detection with the selection metric.
4. Select the m best-scoring examples and add them to the labeled training set.
5. Repeat until the weakly labeled data is exhausted or until some other stopping criterion is met (a code sketch follows the citations below).
• Nigam, Ghani, 2000; Moreno, Agaarwal, ICML 2003.
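A minimal, detector-agnostic sketch of this loop; train, detect, and score are hypothetical callables standing in for the real detector and selection metric, and each detection is assumed to be usable directly as a new labeled example.

```python
def self_train(labeled, weakly_labeled, train, detect, score, m=5):
    """Self-training with hard label assignments.

    train(labeled) -> model; detect(model, img) -> most likely detection;
    score(detection, labeled) -> selection-metric value (higher = better,
    e.g. detector confidence or negated NN distance)."""
    model = train(labeled)                          # initial detector
    while weakly_labeled:
        # run the detector on every remaining weakly labeled image
        dets = [(img, detect(model, img)) for img in weakly_labeled]
        # rank candidate detections by the selection metric
        dets.sort(key=lambda d: score(d[1], labeled), reverse=True)
        # hard assignment: the m best detections become labeled examples
        labeled = labeled + [det for _, det in dets[:m]]
        weakly_labeled = [img for img, _ in dets[m:]]
        model = train(labeled)                      # retrain on enlarged set
    return model
```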
13
Self-Training Selection Metrics
• Detector Confidence
– Score = detection confidence
– Intuitively appealing
– Can prove problematic in practice
• Nearest Neighbor (NN) Distance
– Score = minimum distance between the detection and the labeled examples (a sketch follows)
[Figure: each candidate point scored by its minimum distance to a labeled example]
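For instance, the NN score might be computed as below; this is a sketch using Euclidean distance over hypothetical feature vectors (the filter-based and Schneiderman detectors each define their own features and distances later).

```python
import numpy as np

def nn_score(candidate, labeled_examples):
    """Minimum distance from one candidate's feature vector to any
    labeled example; lower means more similar to the labeled set,
    so low-scoring candidates are selected first."""
    diffs = np.asarray(labeled_examples) - np.asarray(candidate)
    return float(np.sqrt((diffs ** 2).sum(axis=1)).min())
```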
14
Selection Metric Behavior
Confidence Metric vs. Nearest-Neighbor (NN) Metric
[Figure sequence, slides 15–20: an animation comparing which points (legend: class 1, class 2, unlabeled) each selection metric picks as self-training proceeds]
20
Semi-Supervised Training & Computer Vision
• EM Approaches
– S. Baluja. Probabilistic Modeling for Face Orientation
Discrimination: Learning from Labeled and Unlabeled Data. NIPS
1998.
– R. Fergus, P. Perona, A. Zisserman. Object Class Recognition by
Unsupervised Scale-Invariant Learning. CVPR 2003.
• Self Training
– A. Selinger. Minimally Supervised Acquisition of 3D Recognition
Models from Cluttered Images. CVPR 2001.
• Summary
– Reasonable performance improvements reported
– “One-off” experiments
– No insight into issues or general application.
21
Presentation Outline
• Introduction
• Background
• Semi-supervised Training Approach
• Analysis: Filter Based Detector
• Analysis: Schneiderman Detector
• Conclusions and Future Work
22
Filter Based Detector
[Figure: input image → filter bank → feature vector f_i at each pixel x_i → object GMM M_o and clutter GMM M_c (Gaussian Mixture Models)]
23
Filter Based Detector Overview
• Input Features and Model
– Features = output of 20 filters at each pixel location
– Generative model = a separate Gaussian Mixture Model for the object and clutter classes
– A single model is used for all locations on the object
• Detection (sketched in code below)
– Compute the filter responses and the likelihood under the object and clutter models at each pixel location
– A “spatial model” is used to aggregate pixel responses into object-level responses
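A minimal sketch of the per-pixel scoring step, using scikit-learn GMMs and random toy data as stand-ins for the trained models and the 20 filter responses; the spatial model is omitted.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def pixel_log_likelihood_ratio(features, gmm_object, gmm_clutter):
    """features: (n_pixels, 20) filter responses, one row per pixel.
    Returns per-pixel log P(f | object) - log P(f | clutter)."""
    return gmm_object.score_samples(features) - gmm_clutter.score_samples(features)

# toy usage: fit small GMMs to random 20-D "filter responses"
rng = np.random.default_rng(0)
gmm_o = GaussianMixture(n_components=3, random_state=0).fit(rng.normal(1.0, 1.0, (500, 20)))
gmm_c = GaussianMixture(n_components=3, random_state=0).fit(rng.normal(0.0, 1.0, (500, 20)))
ratios = pixel_log_likelihood_ratio(rng.normal(0.5, 1.0, (100, 20)), gmm_o, gmm_c)
```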
24
Spatial Model
[Figure: training images and object masks → per-pixel log likelihood ratios → spatial model → example detection]
25
Typical Example Filter Model Detections
[Figure: sample detection plots and log likelihood ratio plots]
26
Filter Based Detector Overview
• Fully Supervised Training
– Fully labeled example = image + pixel mask
– Gaussian Mixture Model parameters are trained
– Spatial model is trained from the pixel masks
• Semi-Supervised Training
– Weakly labeled example = image containing the object
– Initial model is trained using the fully labeled object and clutter data
– The spatial model and clutter class model are fixed once trained with the initial labeled data set
– EM and self-training variants are evaluated
27
Self-Training Selection Metrics
• Confidence-based selection metric (sketched in code below)
– Selection score is the detector odds ratio $\frac{P(Y = \text{object} \mid X)}{P(Y = \text{clutter} \mid X)}$
• Nearest neighbor (NN) selection metric
– Selection score is the distance to the closest labeled example
– The distance is based on a model of each weakly labeled example
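A sketch of the confidence metric, assuming equal class priors so the posterior odds reduce to a likelihood ratio; the per-pixel log-likelihoods are the kind produced by the GMM sketch earlier. Log odds are returned since the ranking is unchanged and it avoids numeric overflow.

```python
import numpy as np

def detection_log_odds(loglik_object, loglik_clutter):
    """log [P(object | X) / P(clutter | X)] under equal priors,
    from summed per-pixel log-likelihoods for one candidate detection;
    monotonic in the odds ratio, so self-training ranks identically."""
    return float(np.sum(loglik_object) - np.sum(loglik_clutter))
```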
28
Filter Based Experiment Details
• Training Data
– 12 images of a desktop telephone + clutter, viewpoints ±90 degrees
– Roughly constant scale and lighting conditions
– 96 images clutter only
• Experimental variations
– 12 repetitions with different fully/weakly labeled training data splits
• Testing data
– 12 images, disjoint set, similar imaging conditions
[Figure: an example correct detection and an example incorrect detection]
29
Example Filter Model Results
[Figure: example results for labeled data only, expectation-maximization, self-training with the confidence metric, and self-training with the NN metric]
30
Single Image Semi-Supervised Results
• Labeled Only = 26.7%
• Expectation-Maximization = 19.2%
• Confidence Metric = 34.2%
• 1-NN Selection Metric = 47.5%
31
Two Image Semi-Supervised Results
[Figure: reference image and the close, near, and far image pairs]
• Labeled Data Only + Near Pair = 52.5%
• 4-NN Metric + Near Pair = 85.8%
32
Presentation Outline
• Introduction
• Background
• Semi-supervised Training Approach
• Analysis: Filter Based Detector
• Analysis: Schneiderman Detector
• Conclusions and Future Work
33
Example Schneiderman Face Detections
34
Schneiderman Detector Details
Schneiderman 1998, 2000, 2003, 2004.
[Figure: detection process = wavelet transform → feature construction → classifier, with a search over location and scale; training process = wavelet transform → feature search → feature selection → AdaBoost]
The classifier thresholds a sum of log likelihood ratios over the selected features: $\lambda_1 \log \frac{P(F_1 \mid o)}{P(F_1 \mid c)} + \lambda_2 \log \frac{P(F_2 \mid o)}{P(F_2 \mid c)} + \cdots$
35
Schneiderman Detector Training Data
• Fully Supervised Training
– Fully labeled examples with landmark locations
• Semi-Supervised Training
– Weakly labeled example = image containing the object
– Initial model is trained using the fully labeled data
– Variants of self-training are evaluated
36
Self Training Selection Metrics
• Confidence-based selection metric
– Classifier output / odds ratio: $\lambda_1 \log \frac{P(F_1 \mid o)}{P(F_1 \mid c)} + \lambda_2 \log \frac{P(F_2 \mid o)}{P(F_2 \mid c)} + \cdots + \lambda_r \log \frac{P(F_r \mid o)}{P(F_r \mid c)}$
• Nearest Neighbor selection metric (sketched in code below)
– Preprocessing $g$ = high-pass filter + variance normalization
– Mahalanobis distance to the closest labeled example: $\text{Score}(W_i) = \min_j \text{Mah}(g(W_i), g(L_j), \Sigma)$
[Figure: a candidate image window compared against the labeled images]
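A sketch of this NN metric under the definitions above; the preprocessing g and the covariance Sigma are assumed to be computed elsewhere.

```python
import numpy as np

def nn_mahalanobis_score(g_candidate, g_labeled, sigma):
    """Score(W_i) = min_j Mah(g(W_i), g(L_j), Sigma).

    g_candidate: (d,) preprocessed candidate window; g_labeled: (n, d)
    preprocessed labeled examples; sigma: (d, d) covariance matrix."""
    sigma_inv = np.linalg.inv(sigma)
    diffs = g_labeled - g_candidate                          # (n, d)
    d2 = np.einsum('nd,de,ne->n', diffs, sigma_inv, diffs)   # squared Mahalanobis
    return float(np.sqrt(d2.min()))                          # closest labeled example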
37
Schneiderman Experiment Details
• Training Data
– 231 images from the FERET data set and the web
– Multiple eyes per image = 480 training examples
– 80 synthetic variations – position, scale, orientation
– Native object resolution = 24x16 pixels
– 15,000 non-object examples from clutter images
38
Schneiderman Experiment Details
• Evaluation Metric
– Detections within ±0.5 object radius and ±1 scale octave are counted as correct
– Area under the ROC curve (AUC) is the performance measure
• ROC curve = Receiver Operating Characteristic curve: detection rate (percent) vs. false positive count
[Figure: example ROC curve, detection rate in percent vs. number of false positives]
A code sketch of this evaluation follows.
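A sketch of both pieces of this evaluation, with illustrative numbers (not results from the thesis): a correctness test under the ±0.5 radius / ±1 octave rule, and trapezoidal AUC over ROC points.

```python
import numpy as np

def detection_correct(det_xy, det_scale, gt_xy, gt_scale, gt_radius):
    """Correct if within ±0.5 object radius in position and ±1 octave in scale."""
    dist_ok = np.hypot(det_xy[0] - gt_xy[0], det_xy[1] - gt_xy[1]) <= 0.5 * gt_radius
    scale_ok = abs(np.log2(det_scale / gt_scale)) <= 1.0
    return dist_ok and scale_ok

# illustrative ROC points: detection rate vs. false-positive count
false_positives = np.array([0, 1, 2, 5, 10, 20])
detection_rate = np.array([0.55, 0.70, 0.78, 0.85, 0.90, 0.94])

# area under the curve, with the FP axis normalized to [0, 1]
auc = np.trapz(detection_rate, false_positives / false_positives.max())
print(f"AUC = {auc:.3f}")
```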
39
Schneiderman Experiment Details
• Experimental Variations
– 5-10 runs with random data splits per experiment
• Experimental Complexity
– Training the detector = one iteration
– One iteration = 12 CPU hours on a 2 GHz class machine
– One run = 10 iterations = 120 CPU hours = 5 CPU days
– One experiment = 10 runs = 50 CPU days
– All experiments took approximately 3 CPU years
• Testing Data
– Separate set of 44 images with 102 examples
40
Example Detection Results
Fully Labeled Data Only
Fully Labeled + Weakly Labeled Data
41
Example Detection Results
Fully Labeled Data Only
Fully Labeled + Weakly Labeled Data
42
When can weakly labeled data help?
[Figure: full-data-normalized AUC vs. fully labeled training set size on a log scale, with saturated, smooth, and failure regimes marked]
• Three regimes of operation: saturated, smooth, failure
• Weakly labeled data can help in the “smooth” regime
43
Performance of Confidence Metric Self-Training
[Figure: confidence metric self-training AUC performance (full-data-normalized) vs. fully labeled training set size (24, 30, 34, 40, 48, 60)]
• Improved performance over a range of data set sizes.
• Not all improvements are significant at the 95% level.
44
Performance of NN Metric Self-Training
[Figure: NN metric self-training AUC performance (full-data-normalized) vs. fully labeled training set size (24, 30, 34, 40, 48, 60)]
• Improved performance over a range of data set sizes.
• All improvements are significant at the 95% level.
45
MSE Metric Changes to Self-Training Behavior
[Figure: base-data-normalized AUC vs. iteration number, for the confidence metric and the NN metric]
• The NN metric performance trend is level or upward
46
Example Training Image Progression
[Figure: the training images selected at each iteration, with AUC after each iteration shown per metric]

Iteration    Confidence Metric    NN Metric
initial      0.822                0.822
1            0.770                0.867
2            0.798                0.882
3            0.798                0.922
4            0.745                0.931
5            0.759                0.906
48
How much weakly labeled data is used?
[Figure: two panels vs. fully labeled training set size (24, 30, 34, 40, 48, 60): the weakly labeled data set size used, and the ratio of weakly to fully labeled data]
• The ratio is relatively constant over the initial data set size.
49
Presentation Outline
• Introduction
• Background
• Semi-supervised Training Approach
• Analysis: Filter Based Detector
• Analysis: Schneiderman Detector
• Conclusions and Future Work
50
Contributions
• Devised an approach which achieves substantial performance gains through semi-supervised training.
• Comprehensive evaluation (3 CPU years) of semi-supervised training applied to object detection.
• Detailed characterization and comparison of the semi-supervised approaches used – much more analysis and many more details in the thesis.
51
Future Work
• Enabling the use of training images with clutter for context
– Context priming
• A. Torralba, P. Sinha. ICCV 2001; A. Torralba, K. Murphy, W. Freeman, M. Rubin. ICCV 2003.
• Training with weakly labeled data only
– Online robot learning
– Mining the web for object detection
• K. Barnard, D. Forsyth. ICCV 2001.
• K. Barnard, P. Duygulu, N. de Freitas, D. Forsyth, D. Blei, M. Jordan. JMLR 2003.
52
Conclusions
• Semi-supervised training can be practically applied to object detection, to good effect.
• The self-training approach can substantially outperform EM.
• The selection metric is crucial for self-training performance.
53
Filter Model Results
Accuracy by training condition:

Algorithm              Single Image   Close Pair   Near Pair   Far Pair
Full Data Set          100.0%         100.0%       100.0%      100.0%
True Location          86.7%          95.8%        98.3%       98.3%
Labeled Only           26.7%          40.8%        52.5%       50.8%
Batch EM               19.2%          35.8%        52.5%       54.2%
Confidence Metric      34.2%          48.3%        73.3%       52.5%
1-NN Metric            47.5%          64.2%        82.5%       70.8%
4-NN / 40-MM Metric    53.3%          69.2%        85.8%       76.7%

• Key Points
– Batch EM does not provide a performance increase
– Self-training provides a performance increase
– The 1-NN and 4-NN metrics work better than confidence
– “Near Pair” accuracy is highest
56
Weakly Labeled Point Performance
Does confidence metric self-training improve point performance?
• Yes - over a range of data set sizes.
57
Weakly Labeled Point Performance
Does MSE metric self-training improve point performance?
• Yes – to a significant level over a range of data set sizes.
58
Schneiderman Features
59
Schneiderman Detection Process
60
Sample Schneiderman Face Detections
61
Simulation Data
[Figure: simulated data with labeled and unlabeled points, and the hidden labels]
63
Simulation Data
[Figure: selections made on the simulation data by the nearest neighbor metric vs. the confidence metric]
64
Simulation Data
[Figure: selections made on the simulation data by the model-based metric vs. the confidence metric]
65
Future Work – Mining the Web
[Figure: learned “Clinton” vs. “Not-Clinton” color models; green regions are classified “Not-Clinton”]
67
Future Work – Mining the Web
[Figure: learned “Flag” vs. “Not-Flag” color models; green regions are classified “Not-Flag”]
68