Transcript, Lecture 1
Advanced Pattern Recognition
Lecture 1
Spring 2007
[1] J. Shawe-Taylor, N. Cristianini, "Kernel Methods for Pattern Analysis", Cambridge University Press, 2004.
[2] B. Schölkopf, A. Smola, "Learning with Kernels", MIT Press, 2002.
[3] www.kernel-methods.net
[4] Journal papers, tutorials.
• Pattern Recognition is a field of Computational Intelligence in which a predefined form of an input signal is searched for, or similarities between signal forms are studied.
• The input signal can come from an electrical measurement or, for example, from text documents.
• Computational Intelligence includes method groups such as Artificial Intelligence, Pattern Recognition, Fuzzy Logic, Genetic Algorithms, Neural Networks, etc.
• Application fields include Image Analysis, Speech Recognition, and medical signal and DNA sequence analysis.
Watanabe:
• Traditionally, Pattern Recognition (PR) is divided into statistical pattern recognition and structural pattern recognition.
• In statistical PR, signal statistics are needed for recognition.
• In structural (syntactical) PR, a pattern is described by a grammar over structural elements.
PR applications
PR applications (cont’d)
Examples
• Handwriting recognition on a PDA.
• Structural PR: recognition of hanzi (Chinese characters).
• Cluster analysis (area, length, etc.).
How many clusters?
• Forest photo segmentation.
Example: Face detection
Example: Segmentation for multiple sclerosis
Salmon or Sea bass?
Overfitting problem.
Non-linear decision boundary
Planetary data:
T is the period of revolution (years) and R is the radius of the orbit.
The quantity R³/T² remains the same for all planets (Kepler's third law).
Example: Planetary data (1)
T²/R³ = Const
or, taking logarithms of both sides,
2 log T = 3 log R + C
i.e. the data plotted as log(T) versus log(R) lie on a straight line.
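A minimal numerical check of this relation, assuming standard approximate textbook values for the orbital period T (years) and orbital radius R (AU):

import math

# Approximate orbital period T (years) and orbital radius R (AU) for six planets.
planets = {
    "Mercury": (0.241, 0.387),
    "Venus":   (0.615, 0.723),
    "Earth":   (1.000, 1.000),
    "Mars":    (1.881, 1.524),
    "Jupiter": (11.86, 5.203),
    "Saturn":  (29.46, 9.537),
}

for name, (T, R) in planets.items():
    ratio = T**2 / R**3                        # nearly the same constant for every planet
    C = 2*math.log(T) - 3*math.log(R)          # nearly the same constant C in the log form
    print(f"{name:8s}  T^2/R^3 = {ratio:.4f}   2 log T - 3 log R = {C:.4f}")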
Example: Planetary data (2)
The artificial planetary data lie on an ellipse in two dimensions; the same data represented using the features x² and y² show a linear relation:
x²/a² + y²/b² = 1
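A minimal sketch of this change of representation, assuming an ellipse with semi-axes a = 3 and b = 2 chosen purely for illustration:

import math

a, b = 3.0, 2.0                                   # semi-axes of the ellipse (illustrative values)
points = [(a*math.cos(t), b*math.sin(t))          # points (x, y) on the ellipse
          for t in (k*math.pi/8 for k in range(16))]

for x, y in points:
    # Quadratic in (x, y), but linear in the features (x^2, y^2):
    #   (1/a^2)*x^2 + (1/b^2)*y^2 = 1
    print(f"x^2 = {x*x:6.3f}  y^2 = {y*y:6.3f}  x^2/a^2 + y^2/b^2 = {x*x/a**2 + y*y/b**2:.3f}")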
We will define pattern recognition as a function (pattern function) for which
f(x) = 0     (1)
For example, for the planetary data (with T in years and R in astronomical units)
f(R,T) = R³ − T² = 0
This is zero in the ideal case; in practical cases f(x) is not exactly zero:
f(R,T) = R³ − T² ≈ 0
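A minimal check of this pattern function on approximate planetary values (R in AU, T in years; the numbers are standard textbook approximations):

# Approximate (R, T) pairs: orbital radius in AU, orbital period in years.
planets = {"Earth": (1.000, 1.000), "Mars": (1.524, 1.881), "Jupiter": (5.203, 11.86)}

def f(R, T):
    # Pattern function for the planetary data; exactly 0 for ideal data.
    return R**3 - T**2

for name, (R, T) in planets.items():
    print(name, f(R, T))    # small relative to R**3, but not exactly 0 for measured values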
Let's assume that we have a function g(x), which predicts the output (i.e. the class a pattern belongs to) for an input x.
Let's also assume that we have a training set for which we know the correct output y (i.e. the correct classification).
The training set is formed by pairs (x, y). The function f can be defined as
f(x,y) = L(g(x), y) = 0     (2)
where g is the prediction function and L: Y × Y → R+ is a loss function whose value is 0 when the predicted class g(x) and the correct class y are equal.
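A minimal sketch of equation (2) with a zero-one loss; the toy nearest-mean prediction function g and the fish lengths are illustrative assumptions only:

def L(predicted, correct):
    # Zero-one loss: 0 when the predicted class equals the correct class, 1 otherwise.
    return 0 if predicted == correct else 1

def g(x, means={"salmon": 5.0, "sea bass": 10.0}):
    # Toy prediction function: assign x (e.g. a length measurement)
    # to the class whose mean is closest.
    return min(means, key=lambda c: abs(x - means[c]))

def f(x, y):
    # Pattern function of equation (2): zero exactly when g classifies x correctly.
    return L(g(x), y)

training_set = [(4.5, "salmon"), (6.0, "salmon"), (9.5, "sea bass"), (11.0, "sea bass")]
print([f(x, y) for x, y in training_set])    # all zeros when every example is classified correctly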
In practical cases the relation f(x,y) = 0 is not exact, but we have to accept the approximation f(x,y) ≈ 0.
Note 1. If (2) is exactly valid for a training set, there is a risk of overfitting. It means that we tune the recognition to that set and it does not work well in general.
In the statistical sense, this means that E f(x) ≈ 0, where E is the expected value.
Definition A. A Pattern Analysis Algorithm takes as input a finite set of examples from the source of data to be analyzed. Its output is either an indication that no patterns are detectable in the data, or a positive pattern function f that the algorithm asserts satisfies
E f(x) = 0     (5)
A Pattern Analysis Algorithm should be
• Computationally efficient: it must be possible to use it on large data sets;
• Robust: it must be able to handle noisy data and identify approximate patterns;
• Statistically stable: the output of the algorithm should not depend strongly on the particular data set used (a different sample from the same source should give a similar output).
• It would be desirable to use linear functions. In practice, however, the data are often not separable by linear functions.
• In Kernel Methods we use linear functions in a feature space via kernels, without performing the actual transformation to the feature space.
Example
φ(x1, x2) = (z1, z2, z3) = (x1², √2·x1x2, x2²)
The inner product is
⟨φ(x), φ(u)⟩ = ⟨(x1², √2·x1x2, x2²), (u1², √2·u1u2, u2²)⟩
= x1²u1² + 2·x1x2u1u2 + x2²u2²
= (x1u1 + x2u2)²
= ⟨x, u⟩²
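A minimal numerical check of this identity; the two test vectors are arbitrary illustrative choices:

import math

def phi(x):
    # Explicit feature map (x1, x2) -> (x1^2, sqrt(2)*x1*x2, x2^2)
    x1, x2 = x
    return (x1*x1, math.sqrt(2)*x1*x2, x2*x2)

def dot(a, b):
    return sum(ai*bi for ai, bi in zip(a, b))

x = (1.0, 2.0)
u = (3.0, -1.0)

lhs = dot(phi(x), phi(u))     # inner product computed in the feature space
rhs = dot(x, u) ** 2          # the same value from the kernel (x·u)^2 in the input space
print(lhs, rhs)               # both ≈ 1.0 (equal up to floating-point rounding)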