Transcript: Lecture 9
LING 696B: Final thoughts on nonparametric methods; overview of speech processing
For those taking the class for credit
Talk to me sometime about what you are planning to do (term project / homework)
My office hours: TR 2:00-3:00
Review: inductive inference from last time
[Diagram: old data -> (estimation) -> hypothesis -> (prediction) -> new data; interpolation/smoothing links old data to new data directly]
Example from last time: Transductive SVM
Generalization can also depend on other new data (see demo)
Example from last time: Gaussian process
Infinite feed-forward neural net:
Hidden: h_j(x) = tanh(Σ_i v_ij x_i + a_j)
Output: o_k(x) = Σ_j w_jk h_j(x) + b_k
Weights: v_ij, w_jk; biases: a_j, b_k
Don't train the network with backprop: let the weights be random; with infinitely many hidden units, the network becomes a Gaussian process model
Another non-parametric machine (see demo)
Hidden units can be thought of as complex kernel extensions -- simple kernels work too
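A minimal sketch (in Python, not from the lecture) of this random-weights view: with many hidden units and suitably scaled weight priors, the network's outputs at any set of inputs are approximately jointly Gaussian, which is Neal's classic route to Gaussian processes. The widths and prior scales below are illustrative assumptions.

    import numpy as np

    def random_net(x, n_hidden=10000, sigma_v=5.0, sigma_a=5.0, sigma_w=1.0):
        """One feed-forward net with random (untrained) weights, 1-D input.
        Hidden: h_j(x) = tanh(v_j x + a_j); output: o(x) = sum_j w_j h_j(x) + b."""
        rng = np.random.default_rng()
        v = rng.normal(0, sigma_v, size=n_hidden)   # input-to-hidden weights v_ij
        a = rng.normal(0, sigma_a, size=n_hidden)   # hidden biases a_j
        # scale output weights by 1/sqrt(n_hidden) so the sum has finite variance
        w = rng.normal(0, sigma_w / np.sqrt(n_hidden), size=n_hidden)
        b = rng.normal(0, 1.0)
        h = np.tanh(np.outer(x, v) + a)             # hidden activations, one row per x
        return h @ w + b

    x = np.linspace(-2, 2, 200)
    samples = np.stack([random_net(x) for _ in range(5)])  # five random "function draws"
    # outputs at any fixed set of x values are (approximately) jointly Gaussian
    print(samples.shape, samples.mean(), samples.std())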
Making non-parametric methods more analogy-like
Function approximation: predict y ∈ Y from (x1, y1), …, (xN, yN) and a new x ∈ X
Building blocks of the predictor: kernel functions K(x1, x2), the similarity between x1 and x2
This is not yet "analogy" -- x ∈ R^n has no structure (data points are just vectors)
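A minimal sketch of one standard kernel-based predictor of this kind (Nadaraya-Watson kernel regression; the Gaussian kernel and bandwidth are illustrative choices, not something fixed by the slides):

    import numpy as np

    def gaussian_kernel(x1, x2, bandwidth=0.5):
        """K(x1, x2): similarity between two points in R^n."""
        return np.exp(-np.sum((x1 - x2) ** 2) / (2 * bandwidth ** 2))

    def predict(x_new, X, y, kernel=gaussian_kernel):
        """Nadaraya-Watson: a kernel-weighted average of the training y_i."""
        weights = np.array([kernel(x_new, xi) for xi in X])
        return weights @ y / weights.sum()

    # toy data: y = sin(x) plus noise
    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(50, 1))
    y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=50)
    print(predict(np.array([1.0]), X, y))   # close to sin(1.0)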
Making non-parametric methods more analogy-like
What if the input x has some structure?
Example: x1, x2 are sequences
Extension: choose kernel functions sensitive to the structure of x1, x2, e.g. string kernels K_t(x1, x2) = number of common subsequences of length t (see sketch below)
Finding the "right" metric requires some understanding of the structure
Example: probability kernels K(x1, x2) = Σ_h p(x1|h) p(x2|h) p(h)
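A minimal sketch of the string-kernel idea. For simplicity this counts common contiguous subsequences (substrings) of length t, i.e. a t-gram spectrum kernel; kernels over non-contiguous subsequences exist too but need dynamic programming.

    from collections import Counter

    def string_kernel(s1, s2, t=2):
        """K_t(s1, s2): number of matching length-t substrings,
        counted with multiplicity (a t-gram spectrum kernel)."""
        grams1 = Counter(s1[i:i + t] for i in range(len(s1) - t + 1))
        grams2 = Counter(s2[i:i + t] for i in range(len(s2) - t + 1))
        return sum(grams1[g] * grams2[g] for g in grams1)

    print(string_kernel("banana", "bandana", t=2))  # shared bigrams: ba, an, na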
Making non-parametric methods more analogy-like
What if the output y has some structure?
Make the error function sensitive to the structure of y (computationally intensive)
These extensions have made non-parametric, discriminative methods (e.g. SVM) "outperform" other ones in many tasks
One exception: speech
Final thoughts on nonparametric models
Machine: most non-parametric methods look like the following
minimize (error + constant * complexity)
People: are often able to generalize without relying on explicit rules
Connectionist propaganda often sells this point, yet such models can control neither the error nor the complexity
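A minimal sketch of the "error + constant * complexity" template, instantiated as ridge regression (squared error plus an L2 complexity penalty; the regularization weight lam plays the role of the constant):

    import numpy as np

    def ridge_fit(X, y, lam=1.0):
        """Minimize ||X w - y||^2 + lam * ||w||^2  (error + constant * complexity).
        Closed form: w = (X^T X + lam I)^{-1} X^T y."""
        n_features = X.shape[1]
        return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))
    y = X @ np.array([1.0, -2.0, 0.0, 0.5, 3.0]) + rng.normal(0, 0.1, size=100)
    print(ridge_fit(X, y, lam=0.1))   # close to the true weights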
Final thoughts on nonparametric models
Why not build explicit similarity/analogy models with non-parametric methods?
Term project idea: find some experimental data from the literature, and build a model that "outperforms" neural nets
Maybe "outperform" isn't the right goal
How does the model help us understand people?
Moving on to phonology
"these problems do not arise when phonetic transcription is understood in the terms outlined above, that is, not as a direct record of the speech signal, but rather as a representation of what the speaker of a language takes to be the phonetic properties of an utterance…"
-- SPE p. 294
Alternative to feature/segment representations?
Exemplar people offer one
Yet convincing arguments for real alternatives are few
Coleman paper: maybe we should explore more "realistic" representations by looking at acoustics
This is often hard, as seen in many years of research on speech recognition
Ladefoged's experiment
"There was once a young rat named Arthur, who could never take the trouble to make up his mind."
The recording is superimposed with the word "dot"
Where is "dot"?
A very quick tour of speech processing
Dimension reduction: finding a basis for speech signals
Most often used: Fourier basis (sinusoids)
Orthogonal vs. overcomplete bases
Short-time processing assumption: taking snapshots over time
No perfect snapshot: one loses either time or frequency resolution
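A minimal sketch of short-time processing (a toy STFT, not any particular toolkit's): slice the signal into windowed frames and take the Fourier transform of each. The frame length sets the time/frequency trade-off; all sizes below are illustrative.

    import numpy as np

    def stft(signal, frame_len=256, hop=128):
        """Short-time Fourier transform: windowed snapshots over time.
        Longer frames -> finer frequency resolution, coarser time resolution."""
        window = np.hanning(frame_len)
        n_frames = 1 + (len(signal) - frame_len) // hop
        frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                           for i in range(n_frames)])
        return np.fft.rfft(frames, axis=1)     # one spectrum per snapshot

    # toy signal: a 440 Hz tone at a 16 kHz sampling rate
    sr = 16000
    t = np.arange(sr) / sr
    x = np.sin(2 * np.pi * 440 * t)
    S = stft(x)
    print(S.shape)  # (n_frames, frame_len // 2 + 1)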
A zoo of signal representations
LPC/reflection coefficients
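A minimal sketch of how LPC and reflection coefficients can be computed from one frame (the standard Levinson-Durbin recursion on the frame's autocorrelation; the model order and frame are illustrative, and a real front end would pre-emphasize and window first):

    import numpy as np

    def lpc(frame, order=12):
        """Levinson-Durbin: fit an order-p all-pole model to one frame.
        Returns LPC coefficients a (a[0] = 1) and reflection coefficients k.
        Assumes the frame has nonzero energy (r[0] > 0)."""
        r = np.array([frame[:len(frame) - i] @ frame[i:] for i in range(order + 1)])
        a = np.zeros(order + 1); a[0] = 1.0
        k = np.zeros(order)
        err = r[0]
        for i in range(1, order + 1):
            acc = r[i] + a[1:i] @ r[1:i][::-1]      # prediction of r[i] from lower lags
            k[i - 1] = -acc / err                    # reflection coefficient
            a_prev = a.copy()
            for j in range(1, i + 1):
                a[j] = a_prev[j] + k[i - 1] * a_prev[i - j]
            err *= 1 - k[i - 1] ** 2                 # remaining prediction error
        return a, k

    rng = np.random.default_rng(0)
    frame = np.hanning(400) * rng.normal(size=400)   # stand-in for one speech frame
    a, k = lpc(frame)
    print(a[:4], k[:3])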
A zoo of signal representations
Mel-frequency filterbank / cepstra
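A minimal sketch of the mel filterbank / cepstra pipeline (triangular filters equally spaced on the mel-warped axis, log energies, then a DCT; all sizes are illustrative and this ignores pre-emphasis and liftering):

    import numpy as np
    from scipy.fft import dct

    def mel(f):  # Hz -> mel
        return 2595 * np.log10(1 + f / 700)

    def mel_filterbank(n_filters=26, n_fft=512, sr=16000):
        """Triangular filters equally spaced on the mel scale."""
        pts = np.linspace(mel(0), mel(sr / 2), n_filters + 2)
        bins = np.floor((n_fft + 1) * (700 * (10 ** (pts / 2595) - 1)) / sr).astype(int)
        fb = np.zeros((n_filters, n_fft // 2 + 1))
        for m in range(1, n_filters + 1):
            l, c, r = bins[m - 1], bins[m], bins[m + 1]
            fb[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising slope
            fb[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling slope
        return fb

    def mfcc(power_spectrum, n_ceps=13):
        """power_spectrum: (n_frames, n_fft // 2 + 1), e.g. |S|**2 from an STFT."""
        energies = power_spectrum @ mel_filterbank().T        # filterbank energies
        logs = np.log(energies + 1e-10)                       # compress dynamic range
        return dct(logs, type=2, axis=1, norm='ortho')[:, :n_ceps]  # -> cepstra

    spec = np.abs(np.random.default_rng(0).normal(size=(10, 257))) ** 2  # stand-in spectra
    print(mfcc(spec).shape)  # (10, 13)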
A zoo of signal representations
PLP/RASTA spectra/cepstra for "linguistics"
The perceptual relevance of distance metrics
People do not need information from all frequency bands to recover the linguistic content
Example: low-pass, high-pass and band-pass filtered speech
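A minimal sketch of generating such filtered-speech stimuli (Butterworth filters via scipy; the cutoffs are illustrative assumptions, not the original experiments' settings):

    import numpy as np
    from scipy.signal import butter, sosfiltfilt

    def filtered_versions(speech, sr=16000):
        """Low-pass, high-pass and band-pass versions of the same utterance."""
        low = butter(4, 1000, btype='lowpass', fs=sr, output='sos')
        high = butter(4, 2000, btype='highpass', fs=sr, output='sos')
        band = butter(4, [1000, 2000], btype='bandpass', fs=sr, output='sos')
        return (sosfiltfilt(low, speech),
                sosfiltfilt(high, speech),
                sosfiltfilt(band, speech))

    rng = np.random.default_rng(0)
    noise = rng.normal(size=16000)           # stand-in for one second of speech
    lp, hp, bp = filtered_versions(noise)
    print(lp.shape, hp.shape, bp.shape)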
Extending distance metrics to sequences
Dynamic time warping
A template-based method
Depends on the distance metric between single frames
Often requires many heuristics (large literature)
See example
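A minimal sketch of dynamic time warping (the textbook DP recursion with a Euclidean frame distance; real systems add slope constraints and other heuristics from that literature):

    import numpy as np

    def dtw(seq1, seq2):
        """Align two feature sequences (n_frames x n_dims) and return the
        total cost of the best warping path."""
        n, m = len(seq1), len(seq2)
        D = np.full((n + 1, m + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                d = np.linalg.norm(seq1[i - 1] - seq2[j - 1])   # frame distance metric
                # best of: diagonal match, insertion, deletion
                D[i, j] = d + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
        return D[n, m]

    # the same "template" at two speaking rates:
    a = np.column_stack([np.sin(2 * np.pi * np.linspace(0, 1, 50))])
    b = np.column_stack([np.sin(2 * np.pi * np.linspace(0, 1, 80))])
    print(dtw(a, b))  # small cost despite the different lengths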