Transcript: Lecture 9

LING 696B: Final thoughts on
nonparametric methods,
Overview of speech
processing
1
For those who are taking the class for credit

Talk to me sometime about what you are planning to do (term project / homework)
My office hours: TR 2:00-3:00
2
Review: inductive inference
from last time
[Diagram: estimation leads from old data to a hypothesis; prediction leads from the hypothesis to new data; interpolation/smoothing connects old data directly to new data]
3
Example from last time:
Transductive SVM

Generalization can also depend on other
new data (see demo)
4
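To make the idea concrete, here is a rough sketch (not the demo from class) of how unlabeled new data can shift the decision boundary: a simple self-training loop with scikit-learn's SVC that absorbs the most confident unlabeled points each round. This only approximates the transductive idea rather than implementing a true transductive SVM; the kernel, the confidence heuristic, and the data shapes are illustrative assumptions.

# Sketch: letting unlabeled (new) data influence generalization, in the
# spirit of transductive SVMs, via simple self-training (binary labels assumed).
import numpy as np
from sklearn.svm import SVC

def self_training_svm(X_lab, y_lab, X_unlab, rounds=5, top_k=5):
    X, y = X_lab.copy(), y_lab.copy()
    pool = X_unlab.copy()
    clf = SVC(kernel="rbf")
    for _ in range(rounds):
        clf.fit(X, y)
        if len(pool) == 0:
            break
        # Confidence = distance from the current decision boundary
        scores = np.abs(clf.decision_function(pool))
        idx = np.argsort(scores)[-top_k:]          # most confident unlabeled points
        X = np.vstack([X, pool[idx]])
        y = np.concatenate([y, clf.predict(pool[idx])])
        pool = np.delete(pool, idx, axis=0)
    return clf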
Example from last time:
Gaussian process

Infinite feed-forward neural net:

Hidden: h_j(x) = tanh(Σ_i v_ij x_i + a_j)
Output: o_k(x) = Σ_j w_jk h_j(x) + b_k
Weights: v_ij, w_jk; biases: a_j, b_k
Don't train the network with backprop: if the weights are instead left random (drawn from a prior), this network becomes a Gaussian process model


Another non-parametric machine (see demo)
Hidden units can be thought of as complex kernel
extensions -- simple kernels work too
5
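Here is a minimal sketch of Gaussian process regression itself (not the class demo), with a squared-exponential kernel written in numpy; the length scale, noise level, and toy data are illustrative assumptions.

# Sketch: Gaussian process regression with a squared-exponential kernel.
# Hyperparameters (length scale, noise) are arbitrary illustrative values.
import numpy as np

def sq_exp_kernel(A, B, length=1.0):
    # K[i, j] = exp(-||A_i - B_j||^2 / (2 * length^2))
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-d2 / (2 * length**2))

def gp_predict(X_train, y_train, X_test, noise=1e-2):
    K = sq_exp_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    K_s = sq_exp_kernel(X_test, X_train)
    alpha = np.linalg.solve(K, y_train)
    mean = K_s @ alpha                                             # posterior mean
    cov = sq_exp_kernel(X_test, X_test) - K_s @ np.linalg.solve(K, K_s.T)
    return mean, cov

# Toy usage: fit a 1-D function from a few noisy points
X = np.linspace(0, 5, 8)[:, None]
y = np.sin(X).ravel() + 0.05 * np.random.randn(8)
mean, cov = gp_predict(X, y, np.linspace(0, 5, 100)[:, None])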
Making non-parametric methods more analogy-like

Function approximation: predict y ∈ Y from (x1, y1), …, (xN, yN) and a new x ∈ X
Building blocks of the predictor: kernel functions K(x1, x2), giving the similarity between x1 and x2 (sketched below)

This is not yet "analogy" -- x ∈ R^n has no structure (data points)
6
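A minimal sketch of a predictor built from such kernel functions, in the Nadaraya-Watson style: the prediction for a new x is a similarity-weighted average of the stored y values. The Gaussian kernel and its bandwidth are illustrative assumptions.

# Sketch: predict y for a new x as a kernel-weighted average of stored (x_i, y_i).
# The Gaussian kernel and bandwidth are illustrative choices.
import numpy as np

def gaussian_kernel(x1, x2, bandwidth=1.0):
    return np.exp(-np.sum((x1 - x2)**2) / (2 * bandwidth**2))

def kernel_predict(X_train, y_train, x_new, bandwidth=1.0):
    weights = np.array([gaussian_kernel(x_new, xi, bandwidth) for xi in X_train])
    return np.sum(weights * y_train) / np.sum(weights)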
Making non-parametric methods more analogy-like

What if the input x has some structure?

Example: x1, x2 are sequences
Extension: choose kernel functions sensitive to the structure of x1, x2, e.g. string kernels K_t(x1, x2) = number of common subsequences of length t (sketched below)

Finding the "right" metric requires some understanding of the structure

Example: p kernels: K(x1, x2) = Σ_h p(x1|h) p(x2|h) p(h)
7
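A hedged sketch of the string-kernel idea: for simplicity it counts shared contiguous substrings of length t (a "spectrum kernel" simplification), rather than the non-contiguous common subsequences mentioned on the slide, which require a dynamic-programming kernel.

# Sketch of a string kernel: count substrings of length t shared by two sequences.
# This is the contiguous-substring simplification of the subsequence kernel.
from collections import Counter

def string_kernel(x1, x2, t=2):
    grams1 = Counter(x1[i:i + t] for i in range(len(x1) - t + 1))
    grams2 = Counter(x2[i:i + t] for i in range(len(x2) - t + 1))
    # Inner product in the space of substring counts
    return sum(grams1[g] * grams2[g] for g in grams1 if g in grams2)

print(string_kernel("banana", "bandana", t=2))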
Making non-parametric methods more analogy-like

What if the output y has some structure?

Make the error function sensitive to the structure of y (computationally intensive)
These extensions have made non-parametric, discriminative methods (e.g. SVM) "outperform" other methods in many tasks
One exception: speech
9
Final thoughts on non-parametric models

Machine: most non-parametric methods look like the following (a concrete instance is sketched below):
minimize (error + constant × complexity)
People: are often able to generalize without relying on explicit rules

Connectionist propaganda often sells this
Yet unable to control either the error or the complexity
12
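As one concrete instance of the "error + constant × complexity" recipe, here is kernel ridge regression in numpy: a squared-error term plus a λ-weighted complexity penalty, solved in closed form. The RBF kernel and the value of λ are illustrative assumptions.

# Sketch: minimize (squared error + lambda * complexity) in closed form
# via kernel ridge regression. Kernel choice and lambda are illustrative.
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * d2)

def kernel_ridge_fit(X, y, lam=0.1):
    K = rbf_kernel(X, X)
    # alpha minimizes ||y - K alpha||^2 + lam * alpha' K alpha
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def kernel_ridge_predict(X_train, alpha, X_new):
    return rbf_kernel(X_new, X_train) @ alpha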
Final thoughts on non-parametric models

Why not build explicit similarity/analogy models with non-parametric methods?
Term project idea: find some experimental data from the literature, and build a model that "outperforms" neural nets
Maybe "outperform" isn't the right goal

How does the model help us understand people?
15
Moving on to phonology

“these problems do not arise when
phonetic transcription is understood in
the terms outlined above, that is, not as
a direct record of the speech signal, but
rather as a representation of what the
speaker of a language takes to be the
phonetic properties of an utterance…”
-- SPE p. 294
16
Alternative to feature/segment representations?

Exemplar people

Yet convincing arguments for real alternatives are few
Coleman paper: maybe we should explore more "realistic" representations by looking at acoustics

This is often hard, as seen in many years of research on speech recognition
17
Ladefoged’s experiment



"There was once a young rat named Arthur, who could never take the trouble to make up his mind."
Superimposed with the word "dot"
Where is "dot"?
18
A very quick tour of speech
processing

Dimension reduction: finding a basis for speech signals

Most often used: Fourier basis (sinusoids)
Orthogonal vs. overcomplete bases
Short-time processing assumption: taking snapshots over time (see the sketch below)

No perfect snapshot: each trades time resolution against frequency resolution
19
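A minimal sketch of the short-time processing assumption: cut the signal into overlapping windowed frames and take the Fourier transform of each. The frame length and hop size are illustrative values; lengthening the window buys frequency resolution at the cost of time resolution.

# Sketch: short-time Fourier analysis. Frame length and hop are illustrative;
# a longer window gains frequency resolution but loses time resolution.
import numpy as np

def stft(signal, frame_len=400, hop=160):
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)        # one spectrum ("snapshot") per frame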
A zoo of signal representations

LPC/reflection coefficients
20
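A hedged sketch of LPC analysis for a single frame: estimate autocorrelations and solve the Toeplitz normal equations for the predictor coefficients. scipy's solve_toeplitz stands in here for the Levinson-Durbin recursion, which would also yield the reflection coefficients; the model order is an illustrative choice.

# Sketch: linear predictive coding (LPC) for one windowed frame.
# solve_toeplitz stands in for Levinson-Durbin; model order is illustrative.
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc(frame, order=12):
    frame = frame * np.hamming(len(frame))
    # Autocorrelation at lags 0..order
    r = np.array([np.dot(frame[:len(frame) - k], frame[k:]) for k in range(order + 1)])
    # Solve the Toeplitz system R a = r[1:] for the predictor coefficients
    return solve_toeplitz((r[:-1], r[:-1]), r[1:])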
A zoo of signal representations

Mel-frequency filterbank / cepstra
21
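A rough sketch of how mel-frequency filterbank energies and cepstra can be computed from one frame's power spectrum; the number of filters, FFT size, and sample rate are illustrative assumptions, not a reference implementation.

# Sketch: mel filterbank energies and cepstra for one power spectrum
# (here assumed to come from a 512-point FFT at 16 kHz).
import numpy as np
from scipy.fft import dct

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters=26, n_fft=512, sr=16000):
    # Triangular filters with centres equally spaced on the mel scale
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fb[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)     # rising slope
        fb[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)     # falling slope
    return fb

def mel_cepstra(power_spectrum, n_ceps=13):
    energies = mel_filterbank() @ power_spectrum
    return dct(np.log(energies + 1e-10), type=2, norm="ortho")[:n_ceps]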
A zoo of signal representations

PLP/RASTA spectra/cepstra for
“linguistics”
22
The perceptual relevance of
distance metrics


People do not need information from all frequency bands to recover the linguistic content
Example: low-pass, high-pass and band-pass filtered speech (sketched below)
23
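A small sketch of how such stimuli might be made with scipy's Butterworth filters: low-pass, high-pass, and band-pass versions of one signal. The cutoff frequencies and filter order are illustrative assumptions.

# Sketch: low-pass, high-pass, and band-pass filtered versions of a speech signal.
# Cutoff frequencies and filter order are illustrative choices.
from scipy.signal import butter, filtfilt

def filtered_versions(signal, sr=16000):
    nyq = sr / 2
    b_lo, a_lo = butter(4, 1000 / nyq, btype="low")            # keep below ~1 kHz
    b_hi, a_hi = butter(4, 2000 / nyq, btype="high")           # keep above ~2 kHz
    b_bp, a_bp = butter(4, [1000 / nyq, 2000 / nyq], btype="band")
    return (filtfilt(b_lo, a_lo, signal),
            filtfilt(b_hi, a_hi, signal),
            filtfilt(b_bp, a_bp, signal))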
Extending the distance metric to sequences

Dynamic time warping

Template-based method
Depends on a distance metric between single frames
Often requires many heuristics (large literature)
See example (and the sketch below)
24
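A minimal sketch of dynamic time warping between two frame sequences, using a plain Euclidean frame distance and no path constraints; real systems add the many heuristics mentioned above.

# Sketch: dynamic time warping between two feature sequences (e.g. frames of cepstra).
# Frame distance is plain Euclidean; path constraints and slope weights are omitted.
import numpy as np

def dtw(X, Y):
    n, m = len(X), len(Y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(X[i - 1] - Y[j - 1])          # frame-level distance
            D[i, j] = cost + min(D[i - 1, j],                   # insertion
                                 D[i, j - 1],                   # deletion
                                 D[i - 1, j - 1])               # match
    return D[n, m]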