Demetz Clément
ECE 539
Final Project Fall 2003
Lip-recognition Software using a Kohonen Algorithm for Image Compression
Outline
-Problem and motivation
-Data creation: preprocessing
-Kohonen self organization map (SOM)
-Multi-Layer perceptron
-Final results
-Conclusion
-References
Problem
-Problem of voice recognition: a combined approach leads to better results than voice recognition alone.
-For cell phones and PDAs: combine voice recognition and visual recognition.
[Diagram: Lip-recognition + Voice-recognition -> Combined recognition]
Problem of lip-recognition software
-Needs high computational power.
-Needs to be implemented on low-power systems (PDA, cell phone).
How can we reduce the size of the information?
Problem: find a way to implement such an algorithm with little computation.
Motivation
Reduce the size of the image with a Kohonen self-organization map.
[Diagram: image from a cell phone digital camera -> filter -> contour of the mouth -> Kohonen SOM -> Multi-Layer perceptron]
Preprocessing
-Starting with low-quality JPEG pictures.
-Gradient filters are applied to keep only the contour of the mouth.
-The opening of the mouth is a relevant input: it needs to follow a certain pattern to pronounce a sound.
[Figures: JPEG picture of the mouth; dark part of the mouth; contour of the dark part]
Problem: a contour corresponds to thousands of points: it is still too large for a low computation time.
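The preprocessing step (keep the dark part of the mouth, then apply a gradient filter to keep its contour) can be sketched as follows. This is only an illustrative sketch, not the Matlab code of the project: the function name contour_points, both thresholds, and the choice of a finite-difference gradient are assumptions, and the input is assumed to be a grayscale image scaled to [0, 1].

```python
import numpy as np

def contour_points(gray, dark_threshold=0.3, grad_threshold=0.5):
    """Return (row, col) coordinates lying on the contour of the dark region.

    gray: 2-D array of grayscale values in [0, 1] (assumed input format).
    Both thresholds are illustrative values, not taken from the slides.
    """
    gray = np.asarray(gray, dtype=float)
    # Keep only the dark part of the picture (assumed to be the mouth opening).
    dark = (gray < dark_threshold).astype(float)
    # Finite-difference gradient of the binary mask along rows and columns.
    grad_rows, grad_cols = np.gradient(dark)
    magnitude = np.hypot(grad_rows, grad_cols)
    # The mask only changes value at the border of the dark region.
    return np.argwhere(magnitude >= grad_threshold)
```

Even on a small picture this returns many points, which is the motivation for reducing the contour further before classification.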
Kohonen Self Organisation Map (SOM)
-Idea: use a Kohonen self-organization map to reduce the information to 12 neurons
-problems:
•Initialization
•Bad stretching or turning of the SOM
Kohonen SOM
We want to keep all the information: here we are losing the left part.
Kohonen SOM
-A way to avoid these problems:
•We link the first and the last neurons, so that the chain of neurons forms a closed loop.
[Diagram: neurons 1 to n and "Vector n+2", shown before and after the next iteration of the Kohonen process]
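Linking the first and the last neurons amounts to measuring neighbourhood distances on a ring instead of on an open chain. The sketch below illustrates that idea under stated assumptions: the function name fit_ring_som, the learning-rate and neighbourhood schedules, the number of iterations, and the initialisation on a circle around the centroid are all illustrative choices, not details given in the slides.

```python
import numpy as np

def fit_ring_som(points, n_neurons=12, n_iter=2000, lr0=0.5, sigma0=3.0, seed=0):
    """Fit a closed 1-D Kohonen map (ring) to 2-D contour points.

    Returns an (n_neurons, 2) array of neuron positions.
    All hyperparameters are assumed values chosen for illustration.
    """
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    # Assumed initialisation: neurons placed on a circle around the centroid.
    center = points.mean(axis=0)
    radius = points.std(axis=0).mean()
    angles = 2 * np.pi * np.arange(n_neurons) / n_neurons
    weights = center + radius * np.column_stack([np.cos(angles), np.sin(angles)])
    index = np.arange(n_neurons)
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                     # learning rate decays to 0
        sigma = max(sigma0 * (1.0 - frac), 0.5)     # neighbourhood width shrinks
        x = points[rng.integers(len(points))]       # pick one contour point
        winner = np.argmin(((weights - x) ** 2).sum(axis=1))
        # Circular index distance: neuron 0 and neuron n-1 are neighbours.
        d = np.abs(index - winner)
        d = np.minimum(d, n_neurons - d)
        h = np.exp(-(d ** 2) / (2.0 * sigma ** 2))
        # Move the winner and its ring neighbours towards the sample.
        weights += lr * h[:, None] * (x - weights)
    return weights
```

With a closed loop there are no end neurons, so no part of a closed contour is left covered only by the free end of the chain.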
Kohonen SOM
-Results of the Kohonen map: we keep 12 points representing the contour.
Multi-Layer perceptron
-We take the 12 points given by the SOM as inputs. The SOM is applied many times to each picture to create the database.
-3 classes of pictures: only 3 sounds, because the lip-recognition is a support to voice recognition.
-Training on 15 pictures, testing on 3 pictures.
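A classifier of this kind can be sketched as a small perceptron with one hidden layer, trained by gradient descent with a momentum term. The slides give only the learning rate (alpha), the momentum, the hidden-layer sizes and the number of classes; everything else here is an assumption made for illustration: the tanh hidden units, the softmax output with cross-entropy loss, the weight initialisation, and the function names train_mlp and predict.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(x, params):
    w1, b1, w2, b2 = params
    hidden = np.tanh(x @ w1 + b1)           # assumed hidden activation
    return hidden, softmax(hidden @ w2 + b2)

def train_mlp(x, labels, n_hidden=10, n_classes=3,
              alpha=0.01, momentum=0.01, n_iter=400, seed=0):
    """Batch gradient descent with momentum; returns (params, loss history)."""
    rng = np.random.default_rng(seed)
    params = [rng.normal(0.0, 0.5, (x.shape[1], n_hidden)), np.zeros(n_hidden),
              rng.normal(0.0, 0.5, (n_hidden, n_classes)), np.zeros(n_classes)]
    velocity = [np.zeros_like(p) for p in params]
    onehot = np.eye(n_classes)[labels]
    losses = []
    for _ in range(n_iter):
        hidden, prob = forward(x, params)
        losses.append(-np.mean(np.log(prob[np.arange(len(labels)), labels] + 1e-12)))
        # Backpropagate the mean cross-entropy loss.
        d_out = (prob - onehot) / len(x)
        d_hid = (d_out @ params[2].T) * (1.0 - hidden ** 2)
        grads = [x.T @ d_hid, d_hid.sum(axis=0), hidden.T @ d_out, d_out.sum(axis=0)]
        for i in range(4):
            velocity[i] = momentum * velocity[i] - alpha * grads[i]
            params[i] += velocity[i]
    return params, losses

def predict(x, params):
    return forward(x, params)[1].argmax(axis=1)
```

Here x would hold one row per picture, built from the SOM output, and labels one integer class (0, 1 or 2) per row. Once trained, classifying a picture costs only two small matrix products.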
Multi-Layer perceptron: Result
Layers  alpha  momentum  Configuration    Testing classification  Training classification
                         (hidden layers)  rate (%)                rate (%)
2       0.1    0.8       10               27                      33
2       0.05   0.05      10               73.33                   93
2       0.01   0.01      10               92                      100
3       0.1    0.8       10 10            52                      76
3       0.01   0.01      10 10            100                     100
100% Classification rate is obtained
Multi-Layer perceptron: Result
100% classification rate is obtained with 400 training iterations.
Conclusion
• The Kohonen SOM reduces the problem to a 12-dimensional problem (previously, working on pictures meant thousands of dimensions).
• The Multi-Layer perceptron needs training, but once it is trained, computations are very fast.
• We can obtain a 100% classification rate with 3 sounds.
• Problem: in Matlab, transforming a picture into a matrix is computationally costly (solution: use another language that is more oriented towards picture processing).
Some references
-Image compression by self-organized Kohonen map.
Christophe Amerijckx, Philippe Thissen. IEEE Transactions on Neural Networks, 1998.
http://www.dice.ucl.ac.be/~verleyse/papers/ieeetnn98ca.pdf
-SRAM bitmap shape recognition and sorting using neural networks.
Collica, R.S., Card, J.P., and Martin, W. IEEE. ISBN 0894-6507.
http://www.ibexprocess.com/solutions/wp_SRAM.pdf
-From your lips to your printer.
James Fallow.
-A Kohonen neural network controlled all-optical router system.
E.E.E. Frietman, M.T. Hill, G.D. Khoe.
http://www.ph.tn.tudelft.nl/~ed/pdfs/IJCR.pdf