Transcript

Speech Recognition
Christian Schulze

Motivation
Design of a speech recognition system that
distinguishes the digits 0 to 9 and the
words "yes" and "no"
Applications:
- speech input of telephone numbers on
cellular phones (needed for hands-free dialing in cars)
- spoken selection of the desired floor in
an elevator
Problem
- Storing all patterns requires too much
memory
- An algorithm that compares each spoken word
with all stored patterns requires a lot of
computing power
=> too costly to implement
Instead of storing the whole signal, store only
representative features of the signal
=> One possibility:
formants
What are formants?
Speech consists of different sounds that are
combined with each other
Every sound has a characteristic spectrum in the
frequency domain
The maxima of the spectral envelope (the contour
of the spectrum) are called formants
Every sound, especially every vowel, has its own
representative formants
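The definition of formants as maxima of the spectral envelope can be illustrated with a minimal peak-picking sketch. It assumes the envelope is already available as an array of magnitudes over frequency bins; the function name `first_two_maxima` and the synthetic envelope with bumps at 700 Hz and 1200 Hz are hypothetical and chosen only for illustration.

```python
import numpy as np

def first_two_maxima(freqs, envelope):
    """Return the frequencies of the first two local maxima of a spectral envelope.

    A bin counts as a local maximum if it is strictly larger than both neighbours.
    """
    peaks = [i for i in range(1, len(envelope) - 1)
             if envelope[i] > envelope[i - 1] and envelope[i] > envelope[i + 1]]
    return [float(freqs[i]) for i in peaks[:2]]

# Synthetic envelope with two resonance bumps (values chosen for illustration only)
freqs = np.arange(0, 4000, 50)   # 0 .. 3950 Hz in 50 Hz steps
envelope = (np.exp(-((freqs - 700) / 150.0) ** 2)
            + 0.6 * np.exp(-((freqs - 1200) / 150.0) ** 2))
print(first_two_maxima(freqs, envelope))   # [700.0, 1200.0]
```

The first two peaks found this way are the quantities a two-formant system would keep per frame.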
Data collection
- Recording of 50 analog samples (utterances)
per word
- Division of the signal into frames
of 10 ms length
- Calculation of the spectrum of each frame using
the Discrete Fourier Transform (DFT)
- Smoothing of the spectrum
using the cepstral algorithm
- Storage of the first two maxima (formants)
=> two-formant recognition system
- The resulting (98 x 1) vector is used
as the input vector for
training an MLP network
- The network assigns the signal
to one of 12 classes
[Figure: signal of the spoken digit 8 (500 ms)]
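The data collection steps can be sketched as a feature-extraction pipeline: 10 ms framing, DFT, cepstral smoothing, and picking the first two maxima per frame. This is a minimal sketch under stated assumptions: the sampling rate (`FS = 8000`), the number of kept cepstral coefficients (`LIFTER = 15`), zero-padding when fewer than two peaks exist, and all function names are hypothetical. The slides do not state how exactly 98 values arise, so this sketch simply returns two values per frame.

```python
import numpy as np

FS = 8000          # assumed sampling rate in Hz (not stated in the slides)
FRAME_MS = 10      # frame length from the slides
LIFTER = 15        # assumed number of low-quefrency cepstral coefficients kept

def frames(signal, fs=FS, frame_ms=FRAME_MS):
    """Split a 1-D signal into non-overlapping frames of frame_ms milliseconds."""
    n = int(fs * frame_ms / 1000)
    count = len(signal) // n            # drop an incomplete last frame
    return signal[:count * n].reshape(count, n)

def smoothed_log_spectrum(frame, lifter=LIFTER):
    """Cepstral smoothing: keep only the low-quefrency part of the real cepstrum."""
    log_mag = np.log(np.abs(np.fft.rfft(frame)) + 1e-10)   # DFT, log magnitude
    cep = np.fft.irfft(log_mag, n=len(frame))               # real cepstrum
    cep[lifter:len(frame) - lifter + 1] = 0.0               # symmetric low-pass lifter
    return np.fft.rfft(cep).real                            # smoothed log spectrum

def first_two_peaks(envelope, freqs):
    """Frequencies of the first two local maxima; missing peaks are padded with 0.0."""
    idx = [i for i in range(1, len(envelope) - 1)
           if envelope[i] > envelope[i - 1] and envelope[i] > envelope[i + 1]]
    found = [float(freqs[i]) for i in idx[:2]]
    return found + [0.0] * (2 - len(found))

def two_formant_features(signal, fs=FS):
    """Return a (2 * n_frames, 1) column vector of [F1, F2] pairs, frame by frame."""
    fr = frames(np.asarray(signal, dtype=float), fs)
    freqs = np.fft.rfftfreq(fr.shape[1], d=1.0 / fs)
    feats = [first_two_peaks(smoothed_log_spectrum(f), freqs) for f in fr]
    return np.array(feats).reshape(-1, 1)

t = np.arange(int(0.5 * FS)) / FS      # 500 ms test signal, like the "8" example
test_word = np.sin(2 * np.pi * 700 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)
print(two_formant_features(test_word).shape)   # (100, 1)
```

Smoothing before peak picking is what makes the maxima reflect the spectral envelope rather than individual harmonics.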
Network and results
MLP trained with the backpropagation algorithm
3 hidden layers with 12 neurons each
Learning rate = 0.01, momentum = 0.1
100,000 epochs
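The network configuration can be sketched as a small NumPy MLP trained by backpropagation with momentum, using the slide values (98 inputs, 3 hidden layers of 12 neurons, 12 output classes, learning rate 0.01, momentum 0.1). The sigmoid hidden units, softmax output with cross-entropy loss, scaled normal initialization, full-batch updates, and all function names are assumptions, not details taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

def init(sizes):
    """Weights and biases for consecutive layer sizes, e.g. [98, 12, 12, 12, 12]."""
    return [(rng.normal(0, 1 / np.sqrt(m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(params, x):
    """Return the activations of every layer; softmax on the output layer."""
    acts = [x]
    for i, (w, b) in enumerate(params):
        z = acts[-1] @ w + b
        if i < len(params) - 1:
            acts.append(sigmoid(z))
        else:
            e = np.exp(z - z.max(axis=1, keepdims=True))
            acts.append(e / e.sum(axis=1, keepdims=True))
    return acts

def train(x, y_onehot, sizes=(98, 12, 12, 12, 12), lr=0.01, momentum=0.1, epochs=1000):
    params = init(list(sizes))
    velocity = [(np.zeros_like(w), np.zeros_like(b)) for w, b in params]
    for _ in range(epochs):
        acts = forward(params, x)
        delta = (acts[-1] - y_onehot) / len(x)      # softmax + cross-entropy gradient
        for i in reversed(range(len(params))):
            w, b = params[i]
            gw, gb = acts[i].T @ delta, delta.sum(axis=0)
            if i > 0:                               # backpropagate through the sigmoid
                delta = (delta @ w.T) * acts[i] * (1 - acts[i])
            vw, vb = velocity[i]
            vw = momentum * vw - lr * gw            # momentum update
            vb = momentum * vb - lr * gb
            velocity[i] = (vw, vb)
            params[i] = (w + vw, b + vb)
    return params

def predict(params, x):
    """Index of the most probable class (0 to 11) for each input row."""
    return forward(params, x)[-1].argmax(axis=1)
```

With the slide settings, training would be started as `train(x, y, epochs=100_000)`, where each row of `x` is one (98 x 1) feature vector and `y` holds the one-hot class labels.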
Best solution so far:
training success rate = 86.11%
test success rate = 61.67%
=> needs to be improved