CRF-OPT: An Efficient High Quality Conditional Random Field Solver

Constrained Optimization for Validation-Guided Conditional Random Field Learning
Presented by: Qiang Yang, HKUST
Minmin Chen, Yixin Chen, Michael Brent, Aaron Tenney
Washington University in St. Louis
KDD 2009
Conditional Random Fields
- Conditional Random Fields (Lafferty, McCallum & Pereira 2001)
  - Probabilistic model for sequential data segmentation and labeling
[Figure: a linear-chain CRF, with a labeling sequence (1 1 1 2 2) over an observation sequence (A T G C G) and feature functions connecting the two.]
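As a concrete illustration (a minimal sketch, not the CRF-OPT implementation), the snippet below computes the conditional log-likelihood of a linear-chain CRF. It assumes the weighted features have already been collapsed into per-position label scores `emit` and label-transition scores `trans`; both array names are illustrative.

```python
from scipy.special import logsumexp

def crf_log_prob(emit, trans, y):
    """log P(y | x; w) for a linear-chain CRF.

    emit:  (T, K) numpy array; emit[t, k] is the weighted feature score
           of label k at position t (illustrative pre-computed form)
    trans: (K, K) numpy array; trans[i, j] is the score of transitioning
           from label i to label j
    y:     length-T integer label sequence
    """
    T, _ = emit.shape
    # Unnormalized score of the given labeling.
    score = emit[0, y[0]] + sum(trans[y[t - 1], y[t]] + emit[t, y[t]]
                                for t in range(1, T))
    # Log of the partition function Z(x) via the forward recursion.
    alpha = emit[0]
    for t in range(1, T):
        alpha = logsumexp(alpha[:, None] + trans, axis=0) + emit[t]
    return score - logsumexp(alpha)
```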
Applications
- Natural Language Processing
  - Lafferty J., McCallum A. & Pereira F. 2001
  - Sha F. & Pereira F. 2003
  - Sarawagi S. & Cohen W. W. 2005
- Computer Vision
  - Sminchisescu C., Kanaujia A. & Metaxas D. 2006
  - Vishwanathan S. V. N. et al. 2006
- Bioinformatics
  - Culotta A., Kulp D. & McCallum A. 2005
  - Gross S. S. et al. 2007
Challenges and Related Work
- Overfitting
  - Likelihood training → overfitting
  - Model flexibility → large feature set → overfitting
- Related Work
  - Regularization (Sha 2003, Vail 2007)
  - Smoothing methods (Newman 1977, Rosenfeld 1996)
  - Regularization with a Gaussian prior works as well as or better than smoothing methods, but the regularization parameter is hard to tune
Motivation of Proposed Work
- Cross-validation is often used to estimate the accuracy of a classifier and to select models. Generally, performance on the validation set is strongly correlated with the performance of the trained model on unseen data.
- Constraints prescribing the performance of the trained model on the validation set are used to guide the learning process and to keep the model from fitting too freely and tightly to the training data.
Single Training Multiple Validation (STMV) Framework
[Figure: STMV data layouts for a small data set and a large data set, each showing a training block, validation segments V1, V2, V3, and a test block.]
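A hypothetical helper (not from the paper) showing how one such layout could be carved out of n sequential examples:

```python
def stmv_split(n, n_val_sets=3, val_frac=0.1, test_frac=0.2):
    """Carve one training block, several validation blocks, and a test
    block out of n sequential examples (all sizes are illustrative).

    Blocks are contiguous so that sequence order is preserved.
    """
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    train_end = n - n_test - n_val_sets * n_val
    train = range(0, train_end)
    vals = [range(train_end + i * n_val, train_end + (i + 1) * n_val)
            for i in range(n_val_sets)]
    test = range(n - n_test, n)
    return train, vals, test
```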
Constrained Formulation
- Original objective: maximize the log-likelihood of the labeling sequences given the observation sequences of the training data,

    max_w  Σ_i log P(y^(i) | x^(i); w)

  where F(y, x; w) denotes the score of a specific labeling sequence y for an observation sequence x, and P(y | x; w) ∝ exp F(y, x; w).
- For each validation sequence v^(j), let F(ŷ^(j), v^(j); w) be the score of the most likely sequence found by Viterbi under the current model, and F(y^(j), v^(j); w) the score of the real labeling sequence.
- Constraints prescribing the difference of these scores ensure that the model takes its performance on the validation sets into consideration:

    F(ŷ^(j), v^(j); w) − F(y^(j), v^(j); w) ≤ τ_j   for each validation sequence v^(j),

  where τ_j is a tolerance on the score gap.
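To make the constraints concrete, here is a sketch of how the score gap on one validation sequence could be evaluated, using the same illustrative `emit`/`trans` score arrays as the earlier sketch and with `tau` as a stand-in for the tolerance:

```python
import numpy as np

def viterbi(emit, trans):
    """Most likely label sequence under per-position scores emit (T, K)
    and transition scores trans (K, K)."""
    T, K = emit.shape
    dp = emit[0].copy()                    # best score ending in each label
    back = np.zeros((T, K), dtype=int)     # backpointers
    for t in range(1, T):
        cand = dp[:, None] + trans         # score of each (prev, cur) pair
        back[t] = cand.argmax(axis=0)
        dp = cand.max(axis=0) + emit[t]
    path = [int(dp.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

def seq_score(emit, trans, y):
    """Score F(y, v; w) of a labeling y."""
    return emit[0, y[0]] + sum(trans[y[t - 1], y[t]] + emit[t, y[t]]
                               for t in range(1, len(y)))

def constraint_violation(emit, trans, y_true, tau=0.0):
    """h_j(w) = F(Viterbi path) - F(true labeling) - tau; feasible if <= 0."""
    y_hat = viterbi(emit, trans)
    return seq_score(emit, trans, y_hat) - seq_score(emit, trans, y_true) - tau
```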
Extended Saddle Point (ESP) Theory
- Extended Saddle Point Theory (Wah & Chen 2005)
  - Introduces a necessary and sufficient condition on the Constrained Local Minimum (CLM) of a constrained nonlinear programming problem in a continuous, discrete, or mixed space
  - Offers several salient advantages over previous constraint-handling theories:
    • Does not require the constraints to be differentiable or in closed form
    • Is satisfied over an extended region of penalty values
    • Is both necessary and sufficient
Extended Saddle Point (ESP) Search Algorithm
- Transform the constrained formulation into a penalty form,

    Φ(w, α) = −L(w) + Σ_j α_j · max(0, h_j(w)),

  where h_j(w) = F(ŷ^(j), v^(j); w) − F(y^(j), v^(j); w) − τ_j is the violation of the j-th validation constraint
- The outer loop updates the extended penalty values α
- The inner loop minimizes Φ(w, α) over w
- Challenges
  - Efficient calculation of the gradient of the penalty function
    • The first term of the constraint is determined by the most likely sequence found by the Viterbi algorithm.
    • A change of the parameters w can result in a very different sequence → non-differentiable
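The two nested loops can be sketched as follows (a schematic under the penalty form above, not the authors' implementation; `loss_grad`, `constraints`, and all step sizes are illustrative):

```python
import numpy as np

def esp_search(loss_grad, constraints, w0, lr=0.01, alpha_step=1.0,
               inner_steps=50, outer_iters=20):
    """Schematic ESP search.

    loss_grad(w)   -> (-L(w), its gradient)
    constraints(w) -> (h, dh): constraint values h_j(w), shape (J,), and
                      approximate gradients dh_j(w), shape (J, D)
    """
    w = w0.copy()
    alpha = np.zeros(constraints(w)[0].shape[0])  # extended penalty values
    for _ in range(outer_iters):
        # Inner loop: approximately minimize
        #   Phi(w, alpha) = -L(w) + sum_j alpha_j * max(0, h_j(w))
        for _ in range(inner_steps):
            _, g = loss_grad(w)
            h, dh = constraints(w)
            active = h > 0                        # violated constraints
            g = g + (alpha[active, None] * dh[active]).sum(axis=0)
            w = w - lr * g
        # Outer loop: raise penalties on constraints still violated.
        h, _ = constraints(w)
        if np.all(h <= 0):
            break
        alpha += alpha_step * np.maximum(0.0, h)
    return w
```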
Approximation of the Discontinuous Gradient
- The gradient of the constraint is approximated using quantities from the Viterbi trellis: the highest probability of reaching each state at time t, together with the transition features f(s1, s2, X).
[Figure: Viterbi trellis annotated with the highest probability of reaching each state at time t and the features f(s1, s2, X).]
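One simple way to realize such an approximation (a sketch; the paper's exact smoothing scheme over the trellis may differ): hold the Viterbi path locally fixed, so the gradient of each sequence score reduces to the feature counts accumulated along that path, and the constraint gradient becomes a difference of counts. `feats` is a hypothetical per-step feature representation.

```python
import numpy as np

def path_feature_counts(feats, y):
    """Sum of feature vectors along a labeled path.

    feats[t][(prev, cur)] -> feature vector f(prev, cur, X) for taking
    the transition prev -> cur at step t (hypothetical layout; step 0
    uses prev = None).
    """
    total = np.array(feats[0][(None, y[0])], dtype=float)
    for t in range(1, len(y)):
        total += feats[t][(y[t - 1], y[t])]
    return total

def approx_constraint_grad(feats, y_hat, y_true):
    """d/dw [F(y_hat) - F(y_true)], treating the Viterbi path y_hat as
    locally fixed (the approximation that sidesteps the discontinuity)."""
    return path_feature_counts(feats, y_hat) - path_feature_counts(feats, y_true)
```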
Experimental Results: Gene Prediction
- Find the protein-coding regions and their associated components
- Apply our STMV framework and ESP search algorithm to CONTRAST, a state-of-the-art gene predictor
- Data set:
  - Fruit fly genome, 27,463 genes, evenly divided into 4 sets
  - Feature set of around 33,000 features
- Compare performance to the original CONTRAST and to CONTRAST with regularization
- Metrics: gene-, exon-, and nucleotide-level sensitivity and specificity
Experimental Results: Gene Prediction (cont.)
- The performance of the original CRF, regularized CRF (CRFr), and constrained CRF (CRFc) on CONTRAST
[Table: gene-, exon-, and nucleotide-level sensitivity and specificity for CRF, CRFr, and CRFc.]
Experimental Results: Stock Price Prediction
- Predict whether tomorrow's stock price will rise or fall relative to today's, based on historical data
- Preprocessing techniques smooth out the noise in the raw data
- Data set:
  - 1,741 stocks from NASDAQ and NYSE
  - Each contains stock prices from 2002-02-01 to 2007-09-12
  - Feature set includes 2,000 features
- To predict day T+1:
  - Training sequence: days 1 ~ T
  - Validation sequence: days T−V+1 ~ T (V = 100)
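In code, the day-indexed split on the slide looks like this (illustrative helper; day numbers are 1-based as on the slide, list indices 0-based):

```python
def stock_split(prices, T, V=100):
    """Training days 1..T and validation days T-V+1..T, used to
    predict day T+1. The validation window is the tail of training."""
    train = prices[:T]          # days 1 .. T
    val = prices[T - V:T]       # days T-V+1 .. T
    next_day = T + 1            # the day being predicted
    return train, val, next_day
```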
Experimental Results: Stock Price Prediction (cont.)
[Figure: stock price prediction results.]
Conclusion
- A Single Training Multiple Validation (STMV) framework
  - Integrates validation into the training process by modeling validation quality as constraints in the problem formulation
  - Effectively avoids overfitting of CRF models
- An approximation scheme
  - Efficiently approximates the discontinuous gradient of our constrained formulation
- An Extended Saddle Point (ESP) search algorithm
  - Robustly finds a constrained local minimum of our constrained formulation
Questions?
 [email protected]