Quasi-isometric Representation of Three Dimensional

Voice Recognition by a Realistic Model of Biological Neural Networks
by Efrat Barak
Supervised by Karina Odinaev and Igal Raichelgauz
Structure
• Project Objective
• The Model
• The Classification Process
• Results & Analysis
• Conclusion
Project Objective
Configure a neural-network-based system for voice recognition
The Model
The Main Principle
The readout function recognizes the basin that the network has converged to and classifies the input according to the indicator of that basin.
Correspondence with the Theory of Attractor Neural Networks
• The system converges to a basin
• The basins are periodic attractors
Correspondence with the Liquid State Machine (LSM) theory
• The neural network may be treated as a liquid
• The readout function receives only the current state of the liquid and transforms it into an output signal
• The system can perform several tasks simultaneously
Neural Network Structure
• 22 input neurons
• 135 spiking neurons in a 3x3x15 formation
• Leaky integrate-and-fire (LIF) model for neuron behavior
• 20% of the neurons are inhibitory and 80% are excitatory
• Dynamic synapses
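To make the LIF bullet concrete, here is a minimal sketch of a single leaky integrate-and-fire neuron integrated with the Euler method. The slides give no neuron parameters, so the time step, membrane time constant, threshold, reset value and resistance below are all hypothetical, and the dynamic synapses and the excitatory/inhibitory split are not modeled.

```python
def simulate_lif(input_drive, dt=1e-3, tau_m=0.03, v_rest=0.0,
                 v_thresh=0.015, v_reset=0.0, r_m=1.0):
    """Euler integration of one leaky integrate-and-fire neuron.

    input_drive: one input value per time step (hypothetical units).
    Returns the indices of the time steps at which the neuron spiked.
    All default parameter values are illustrative assumptions.
    """
    v = v_rest
    spike_steps = []
    for step, i_in in enumerate(input_drive):
        # Leaky integration: dv/dt = (-(v - v_rest) + r_m * i_in) / tau_m
        v += dt * (-(v - v_rest) + r_m * i_in) / tau_m
        if v >= v_thresh:
            spike_steps.append(step)  # threshold crossed: emit a spike
            v = v_reset               # and reset the membrane potential
    return spike_steps


# A constant drive above threshold produces regular spiking;
# a drive whose steady state stays below threshold produces none.
strong = simulate_lif([0.02] * 1000)
weak = simulate_lif([0.01] * 1000)
```

With a constant suprathreshold drive the neuron fires at a fixed interval, which is the simplest case of the periodic activity the model relies on.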
Creating the Stimulus
30 seconds of recorded speech are encoded into 1 second of spike trains, using one of the following methods:
• Time Encoding: a straightforward conversion
• Mel Frequency Cepstral Coefficients (MFCC) encoding: the frequency bands are positioned logarithmically, on the mel scale.
A periodic spike train is appended to the one-second voice segment.
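As an illustration of what positioning bands on the mel scale means, the sketch below uses a commonly quoted conversion, mel = 2595 · log10(1 + f/700). The slides do not say which mel formula, frequency range, or number of bands was used, so the function names and the values in the usage lines are assumptions.

```python
import math


def hz_to_mel(f_hz):
    """Convert a frequency in Hz to mels (common 2595*log10 variant)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(f_min, f_max, n_bands):
    """Return n_bands + 2 edge frequencies (Hz), equally spaced in mels."""
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_max - m_min) / (n_bands + 1)
    return [mel_to_hz(m_min + i * step) for i in range(n_bands + 2)]


# Hypothetical range and band count, chosen only for illustration.
edges = mel_band_edges(0.0, 8000.0, 10)
widths = [hi - lo for lo, hi in zip(edges, edges[1:])]
```

Equal steps in mels translate into bands that grow wider with frequency, so low frequencies are resolved more finely than high ones.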
Performing a Simulation
• A new network is created
• A stimulus of one speech segment is fed to the network, followed by a periodic driving force (repeated for every combination of segment and frequency)
• The basins are categorized by their activity vector
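Two pieces of the procedure above can be sketched without the network itself: generating the periodic driving force, and grouping segments by the basin they reach. The slides do not define the activity vector, so this sketch assumes it is a discrete vector summarizing the network state after convergence; the function names and sample data are hypothetical.

```python
from collections import defaultdict


def periodic_spike_times(freq_hz, start_s, duration_s):
    """Spike times (s) of a periodic train at freq_hz beginning at start_s."""
    period = 1.0 / freq_hz
    n_spikes = int(duration_s * freq_hz)
    return [start_s + k * period for k in range(n_spikes)]


def group_by_basin(final_activity):
    """Map each distinct activity vector to the segment ids that reached it.

    final_activity: dict of segment id -> activity vector (assumed discrete).
    """
    basins = defaultdict(list)
    for segment_id, activity in final_activity.items():
        basins[tuple(activity)].append(segment_id)
    return dict(basins)


# Driving force that follows a one-second stimulus (18 Hz, half a second).
drive = periodic_spike_times(18, start_s=1.0, duration_s=0.5)

# Made-up activity vectors for three segments.
basins = group_by_basin({"seg_a": [1, 0, 1], "seg_b": [1, 0, 1], "seg_c": [0, 0, 1]})
```

Segments that end in the same activity vector are treated as having converged to the same basin, which is what the indicator map later counts.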
The Classification Process
The Indicators Map
• Nw(b): the number of segments of the wanted voice that converged to basin b.
• Nu(b): the number of segments of the unwanted voice that converged to basin b.
• N(b): the total number of initials that converged to basin b.
The Indicators Map
$$W(b) = \frac{100 \, N_w(b)}{R_A}, \qquad U(b) = \frac{100 \, N_u(b)}{R_B}$$
The indicator of basin b:
$$S(b) = \frac{W(b) - U(b)}{N(b)}$$
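The indicator definitions can be sketched in a few lines. Three things are assumed here rather than stated on the slides: that R_A and R_B are the total numbers of wanted and unwanted segments, that N(b) equals Nw(b) + Nu(b), and that the numerator of S(b) is the difference W(b) − U(b), which is consistent with the negative thresholds reported in the results.

```python
def basin_indicators(n_w, n_u, r_a, r_b):
    """Compute W(b), U(b) and S(b) for one basin.

    n_w, n_u: wanted / unwanted segments that converged to the basin.
    r_a, r_b: normalizers, assumed to be the total wanted / unwanted counts.
    """
    n = n_w + n_u  # assumption: N(b) is the sum of both counts
    if n == 0:
        raise ValueError("a basin must contain at least one initial")
    w = 100.0 * n_w / r_a  # share of the wanted set, in percent
    u = 100.0 * n_u / r_b  # share of the unwanted set, in percent
    s = (w - u) / n        # assumption: numerator is a difference
    return w, u, s


# Made-up counts: 10 of 100 wanted and 5 of 400 unwanted segments.
w, u, s = basin_indicators(n_w=10, n_u=5, r_a=100, r_b=400)
```

A positive S(b) marks a basin reached mostly by the wanted voice, and a negative value marks one dominated by the unwanted voice.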
The Indicators Map
[Figures: examples of indicator maps; indicators’ average]
The Classification Process
Tuning
Step 1. Select frequencies
Before Step 2: why do we need a threshold?
Step 2. Determine the threshold
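The role of the threshold in classification can be sketched as a simple readout: look up the indicator of the basin the network converged to and compare it with the threshold. The direction of the comparison is an assumption, chosen because the reported results show more inputs classified as wanted when the threshold is lowered. Returning None for a basin not seen during tuning is a choice of this sketch, not something the slides specify.

```python
def classify(basin, indicator_map, threshold):
    """Classify an input by the indicator S(b) of the basin it reached.

    indicator_map: dict of basin id -> S(b), built during tuning.
    Returns "wanted", "unwanted", or None for an unknown basin.
    """
    s = indicator_map.get(basin)
    if s is None:
        return None  # basin was never observed during tuning
    return "wanted" if s > threshold else "unwanted"


# Made-up indicator values for two basins.
indicator_map = {"b1": 0.5, "b2": -0.15}
strict = classify("b2", indicator_map, threshold=0.0)
lenient = classify("b2", indicator_map, threshold=-0.2)
```

Lowering the threshold moves borderline basins such as "b2" from unwanted to wanted, trading fewer misses for more false alarms.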
Results – Amplitude Encoded Input
Input examples: [figures: wanted voice; unwanted voice]
Results of a verification test
Results of a Classification Test
Input | Classified as | Our Classification | True Classification
Wanted | Wanted | 71% | 100%
Wanted | Unwanted | 29% | 0%
Unwanted | Wanted | 55.9% | 0%
Unwanted | Unwanted | 44.1% | 100%
Results of Classification by Two Different Systems
Input | Classified as | System 1 | System 2 | True Classification
Wanted | Wanted | 71% | 94% | 100%
Wanted | Unwanted | 29% | 6% | 0%
Unwanted | Wanted | 55.9% | 61.23% | 0%
Unwanted | Unwanted | 44.1% | 38.77% | 100%
Cross Classification
Results of cross classification for systems 1 and 2:
50.2% answered, 49.8% unanswered
Input | Classified as | System 1 | System 2 | Cross Classification
Wanted | Wanted | 71% | 94% | 97.1%
Wanted | Unwanted | 29% | 6% | 2.9%
Unwanted | Wanted | 55.9% | 61.23% | 66.5%
Unwanted | Unwanted | 44.1% | 38.77% | 33.5%
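The slides do not define cross classification. One reading that fits the answered/unanswered split is that an input is answered only when both systems agree, and is left unanswered otherwise. The sketch below implements that assumed rule; the function names and sample labels are hypothetical.

```python
def cross_classify(label_1, label_2):
    """Return the shared label if both systems agree, else None (unanswered)."""
    return label_1 if label_1 == label_2 else None


def answered_fraction(labels_1, labels_2):
    """Fraction of inputs on which the two systems agree."""
    results = [cross_classify(a, b) for a, b in zip(labels_1, labels_2)]
    return sum(r is not None for r in results) / len(results)


# Made-up outputs of the two systems on four inputs.
system_1 = ["wanted", "unwanted", "wanted", "unwanted"]
system_2 = ["wanted", "wanted", "unwanted", "unwanted"]
fraction = answered_fraction(system_1, system_2)
```

Under this rule, and if the two systems erred independently, the share of answered wanted inputs classified as wanted would be 0.71 × 0.94 / (0.71 × 0.94 + 0.29 × 0.06) ≈ 97.5%, which is close to the 97.1% reported in the table.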
Results – MFCC Encoded Input
Input examples: [figures: wanted voice; unwanted voice]
Results of a classification test
Two sets of new data were used.
Test I segments: 100 wanted, 400 unwanted
Test II segments: 30 wanted, 30 unwanted
True Classification | Classified as | Test I | Test II
Wanted | Wanted (Hit) | 87% | 86.8%
Wanted | Unwanted (Miss-Hit) | 13% | 13.2%
Unwanted | Wanted (False Alarm) | 55.3% | 45%
Unwanted | Unwanted (Hit) | 44.7% | 55%
Data set 3 segments: 100 wanted, 400 unwanted
Data set 4 segments: 30 wanted, 30 unwanted
Data set | Input | Classification | f=18Hz, th=0.3 | f=18Hz, th=0 | f=18Hz, th=-0.12 | f=18Hz, th=-0.2
Data set 3 | Wanted | Wanted (Hit) | 58% | 87% | 96% | 100%
Data set 3 | Wanted | Unwanted (Miss-Hit) | 42% | 13% | 4% | 0%
Data set 3 | Unwanted | Wanted (False Alarm) | 32.2% | 55.3% | 77.5% | 93.75%
Data set 3 | Unwanted | Unwanted (Hit) | 67.8% | 44.7% | 22.5% | 6.25%
Data set 4 | Wanted | Wanted (Hit) | 47.3% | 86.8% | 97% | 100%
Data set 4 | Wanted | Unwanted (Miss-Hit) | 52.7% | 13.2% | 2.6% | 0%
Data set 4 | Unwanted | Wanted (False Alarm) | 17.5% | 45% | 82.5% | 92.5%
Data set 4 | Unwanted | Unwanted (Hit) | 82.5% | 55% | 17% | 7%
Basins Creation Pattern
[Figure panels: (a) 324 initials, (b) 100 initials, (c) 60 initials]
Conclusion
• A voice recognition system based on neuro-computation was designed
• The system succeeded in recognizing the wanted voice when the input was encoded by its amplitude
• The MFCC method yielded very different inputs, so the system's ability to recognize such input was only partially demonstrated
• The system's stability was demonstrated
Suggestions for Future Projects
• Prepare the system for various types of inputs
• Perform automatic tuning by using statistical tools
• Prove that the system can perform several tasks simultaneously