Text Independent Speaker Recognition
with Added Noise
Jason Cardillo & Raihan Ali Bashir
April 11, 2005
Problem Definition

Many methods exist for text-independent speaker
recognition (MFCC, Gaussian, Markov,
etc.).
 Few of these methods perform well with noisy
speech samples.
Project Goal

Implement a text-independent speaker
recognition system that is robust to noise.
 The proposed implementation method is a
recurrent neural network (RNN).
Definition of RNN

Recurrent networks (RNs) are models with bi-directional data flow. While a
feed-forward network propagates data linearly from input to output, RNs also
propagate data from later processing stages to earlier stages.

In a fully recurrent network, every neuron receives inputs from every
other neuron in the network. These networks are not arranged in layers.
Usually only a subset of the neurons receive external inputs in addition
to the inputs from all the other neurons, and another, disjoint subset of
neurons report their output externally as well as sending it to all the
neurons. These distinctive inputs and outputs perform the function of
the input and output layers of a feed-forward or simple recurrent
network, and also join all the other neurons in the recurrent processing.
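
A minimal sketch of one time step of such a fully recurrent network, for illustration only. The function names, the weight layout, and the tanh activation are assumptions; the slides do not specify them.

```python
import math

def recurrent_step(state, weights, external, input_units):
    """Advance a fully recurrent network by one time step.

    state       : current activation of every neuron
    weights     : weights[i][k] is the link from neuron k to neuron i
    external    : external input values, one per entry in input_units
    input_units : indices of the neurons that also receive external input
    """
    n = len(state)
    # Every neuron sums the activations of every other neuron.
    drive = [
        sum(weights[i][k] * state[k] for k in range(n) if k != i)
        for i in range(n)
    ]
    # Only the input subset additionally sees the outside world.
    for value, i in zip(external, input_units):
        drive[i] += value
    # tanh is an assumed activation function.
    return [math.tanh(d) for d in drive]

def read_outputs(state, output_units):
    """Report the activations of the (disjoint) output subset."""
    return [state[i] for i in output_units]

# Three neurons: neuron 0 receives external input, neuron 2 reports output.
weights = [[0.5] * 3 for _ in range(3)]
state = recurrent_step([0.0, 0.0, 0.0], weights, [1.0], [0])
state = recurrent_step(state, weights, [0.0], [0])
outputs = read_outputs(state, [2])
```

The input reaches the output neuron only on the second step, after it has travelled through the recurrent links.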
Why RNN for our Purpose?

An RNN captures long-term contextual effects
over time.
 It can therefore use temporal context to
compensate for missing data.
 It also allows a single net to perform both
imputation and classification.
Corrupted Data Solution
X = missing data at time t
y = learning rate
Vjm = weight of the recurrent link from hidden unit j to
missing input m
hid = activation of hidden unit j at time t-1
Impute the missing values for the next frame through the
recurrent links after a feed-forward pass.
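
A rough sketch of how the symbols above might combine into an update, since the slide's equation did not survive. The running-average form used here (old estimate blended with the feedback from the hidden units) is an assumption, as are the function and argument names.

```python
def impute_missing(x_prev, hid_prev, v, missing, rate):
    """Estimate the missing inputs for the next frame.

    x_prev   : previous estimate of each input feature (X at time t-1)
    hid_prev : hidden-unit activations at time t-1 (hid)
    v        : v[j][m] is the recurrent weight from hidden unit j to input m (Vjm)
    missing  : indices m of the inputs that are missing
    rate     : learning rate (y), assumed to lie between 0 and 1
    """
    x_next = list(x_prev)
    for m in missing:
        # Feedback reaching input m through the recurrent links.
        feedback = sum(v[j][m] * hid_prev[j] for j in range(len(hid_prev)))
        # Assumed update: blend the old estimate with the feedback.
        x_next[m] = (1.0 - rate) * x_prev[m] + rate * feedback
    return x_next

# Two inputs, two hidden units; only input 1 is missing.
estimate = impute_missing(
    x_prev=[0.5, 0.0],
    hid_prev=[1.0, -1.0],
    v=[[0.0, 0.2], [0.0, -0.4]],
    missing=[1],
    rate=0.5,
)
```

Inputs that are present are passed through unchanged; only the missing entries are overwritten.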
Corrupted Data Solution (cont’d)

Plain-English version of the previous slide:
– Basically, fill in missing data with the average of all
of the non-corrupted frames.
– Accomplished by factoring in the sum-squared error
between the correct targets and the RNN output for
each frame.
– Back-propagate this error through time to fix the
corrupted inputs.
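
A small sketch of the two quantities named in the bullets above, read literally: a per-feature average over non-corrupted frames, and the sum-squared error for one frame. This is not the RNN training itself, and the function names and the mask layout are assumptions.

```python
def mean_fill(frames, mask):
    """Replace corrupted entries with the per-feature mean of clean entries.

    frames : list of frames, each a list of feature values
    mask   : same shape as frames; True marks a corrupted entry
    """
    filled = [list(frame) for frame in frames]
    for k in range(len(frames[0])):
        clean = [f[k] for f, m in zip(frames, mask) if not m[k]]
        if not clean:
            continue  # nothing to average for this feature; leave it alone
        mean = sum(clean) / len(clean)
        for i, m in enumerate(mask):
            if m[k]:
                filled[i][k] = mean
    return filled

def sum_squared_error(targets, outputs):
    """Sum-squared error between the targets and the net output for one frame."""
    return sum((t - o) ** 2 for t, o in zip(targets, outputs))

frames = [[1.0, 2.0], [3.0, 0.0], [5.0, 4.0]]
mask = [[False, False], [False, True], [False, False]]
filled = mean_fill(frames, mask)             # frame 1, feature 1 is replaced
error = sum_squared_error([1.0, 0.0], [0.8, 0.1])
```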
System Architecture
Performance Testing

Measured by comparing the error in the original
signal with the error remaining after the signal
passes through the system.
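
One way this comparison could be computed, as a sketch. It assumes a clean reference signal is available and uses mean squared error as the error measure; the slides specify neither, and the function names are hypothetical.

```python
def mean_squared_error(reference, signal):
    """Mean squared difference between a reference and a signal."""
    return sum((r - s) ** 2 for r, s in zip(reference, signal)) / len(reference)

def error_remaining(clean, noisy, restored):
    """Return (error before, error after, fraction of error remaining)."""
    before = mean_squared_error(clean, noisy)
    after = mean_squared_error(clean, restored)
    fraction = after / before if before else 0.0
    return before, after, fraction

clean = [1.0, 2.0, 3.0]
noisy = [2.0, 2.0, 1.0]      # the signal before it passes through the system
restored = [1.0, 2.0, 2.0]   # the signal after it passes through the system
before, after, fraction = error_remaining(clean, noisy, restored)
```

A fraction below 1 means the system removed some of the original error; a fraction above 1 means it made the signal worse.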
References
[1] Parveen, S., Green, P. D. Speech Recognition with Missing
Data using Recurrent Neural Nets. University of
Sheffield, Dept. of Computer Science.
http://www.dcs.shef.ac.uk/~shahla/nips002.pdf
[2] http://encyclopedia.laborlawtalk.com/Neural_network
[3] Recurrent Neural Networks.
http://www.idsia.ch/~juergen/rnn.html
[4] Jain, B. J., Wysotzki, F. Learning with Neural Networks in
the Domain of Graphs. Technical University of Berlin.
http://ki.cs.tu-berlin.de/~bjj/fgml04.pdf
Questions