05-yorku-brain-recognition-poster


Computational Video Group
From recognition in brain
to recognition in perceptual vision systems.
Case study: face in video.
Example: identifying computer users with low-resolution webcams.
Dmitry Gorodnichy and Gilles Bessens
http://iit-iti.nrc-cnrc.gc.ca
http://synapse.vit.iit.nrc.ca (www.perceptual-vision.com)
CVR Conference on Computational Vision in Neural and Machine Systems, York University, Toronto, Ontario, Canada. June 15 - 18, 2005
What do we want?
Why bother?
Figure: Face recognition systems performance, on a 0-100 scale, by humans and by computers, in photos and in video (from NATO Biometrics workshop, Ottawa, Oct. 2004).
- A lot of money has already been spent on applying face recognition to video data...
- Still, computers fail...
- And still the task of the Face Recognition Grand Challenge (www.frvt.org) is seen as “making the video data of better quality” instead of developing approaches which can deal with low-quality data.
Wrong approach - wrong results
ICAO-conformed passport photograph
(presently used for forensic identification)
Image-based biometric modalities
Photographic facial data and video-based facial data are two different modalities:
• different nature of data
• different biometrics
• different approaches
• different testing benchmarks
In video, faces are expected to be of low quality and resolution.
Humans recognize faces on TV with 12 pixels between the eyes all the time.
Images from surveillance cameras (of the September 11 hijackers) and TV.
NB: VCD is 352x240 pixels (NTSC)
Another application: Seeing computers
Computers which can see: users with accessibility needs (e.g. residents of the SCO Health Center in Ottawa) will benefit the most from them, but other users would benefit too.
Seeing tasks:
1. Where - to see where the user is: {x,y,z…}
2. What - to see what the user is doing: {actions}
3. Who - to see who the user is: {names}
Figure labels: monitor; PVS; xy,za,bg; binary event (ON / OFF); recognition / memorization (“Unknown User!”).
Our goal: to build Perceptual Vision Systems (PVS) which can do all three tasks.
Where are we now?
The precision and convenience of tracking the convex-shape nose feature allow one to use the nose as a mouse (or joystick handle).
Copyright S. A. LA NACION 2003. All rights reserved.
image
-> Motion, colour, edges, Haar-wavelets -> nose search box: x,y,width,height
-> Convex-shape template matching -> nose tip detection: I,J (pixel precision)
-> Integration over continuous intensity -> X,Y (sub-pixel precision)
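The last stage of the pipeline above, going from the pixel-precision estimate I,J to the sub-pixel estimate X,Y, could be sketched as an intensity-weighted centroid over a small window. This is only an illustration under that assumption: the poster does not say how the integration is done, and the function name `subpixel_centroid`, the `radius` parameter and the sample `patch` are hypothetical.

```python
def subpixel_centroid(image, I, J, radius=1):
    """Intensity-weighted centroid of a small window around pixel (I, J).

    image is a list of rows of non-negative intensities; I is the column (x)
    and J is the row (y) of the pixel-precision estimate.
    Assumption: "integration over continuous intensity" is approximated here
    by a weighted average, which is not confirmed by the poster.
    """
    total = 0.0
    sum_x = 0.0
    sum_y = 0.0
    for j in range(max(0, J - radius), min(len(image), J + radius + 1)):
        for i in range(max(0, I - radius), min(len(image[j]), I + radius + 1)):
            w = float(image[j][i])
            total += w
            sum_x += w * i
            sum_y += w * j
    if total == 0.0:
        # Empty window: fall back to the pixel-precision estimate.
        return float(I), float(J)
    return sum_x / total, sum_y / total


# Hypothetical 4x4 intensity patch whose bright blob lies between pixels.
patch = [
    [0, 0, 0, 0],
    [0, 4, 4, 0],
    [0, 4, 4, 0],
    [0, 0, 0, 0],
]
X, Y = subpixel_centroid(patch, 1, 1)
```

With this patch the estimate moves from the pixel (1, 1) to the centre of the bright blob, between pixels.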
Rating by Planeta Digital
(Aug. 2003)
Keys to resolving the recognition problem
To understand how the human brain does it:
• 12 pixels between the eyes is sufficient!
• Three main features of the human visual recognition system:
1) Efficient visual attention mechanisms
2) Accumulation of data in time
3) Efficient neuro-associative mechanisms
• Three main neuro-associative principles:
1. Non-linear processing
2. Massively distributed collective decision making
3. Synaptic plasticity
a) to accumulate learning data in time by adjusting synapses,
b) to associate a visual stimulus with a semantic meaning based on the computed synaptic values
Lessons from biological vision
• Saliency-based localization and rectification (implemented)
• Accumulation over time and space (implemented)
• Local brightness adjustment (implemented)
• Recognition decision at time t depends on our recognition decision at time t-1 (implemented)
Lessons from biological memory
• The brain stores information using the synapses connecting the neurons.
• In the brain: 10^10 to 10^13 interconnected neurons
• Neurons are either at rest or activated, depending on the values of other neurons Yj and the strength of the synaptic connections; neuron states are binary: Yi={+1,-1}
• The brain is a network of “binary” neurons evolving in time from an initial state (e.g. a stimulus coming from the retina) until it reaches a stable state, an attractor.
• Attractors are our memories!
Refs: Hebb’49, Little’74,’78, Willshaw’71
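The evolution described in the bullets above is commonly written, in Hopfield-type attractor networks, as the update below. This is the standard textbook form rather than a formula taken from the poster; the symbols S_i (postsynaptic potential) and N (number of neurons) are notation assumed here, and t* denotes the time at which the network stops changing.

```latex
S_i(t) = \sum_{j=1}^{N} C_{ij}\, Y_j(t), \qquad
Y_i(t+1) = \operatorname{sign}\bigl(S_i(t)\bigr) \in \{+1, -1\}, \qquad
Y(t^{*}+1) = Y(t^{*}) \quad \text{(attractor)}
```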
From visual image to saying name
From a neuro-biological perspective, memorization and recognition are two stages of the associative process:
from receptor stimulus R to effector stimulus E
Main associative principle
Figure labels: in brain; in computer; “Dmitry”; stimulus neuron Xi: {+1 or -1}; response neuron Yj: {+1 or -1}; synaptic strength: -1 < Cij < +1.
Main question of learning: how to update the synaptic weights Cij as f(X,Y)?
Learning process
Learning rules: From biologically plausible to mathematically justifiable
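Synaptic update: $C_{ij}^{m} = C_{ij}^{m-1} + \Delta C_{ij}^{m}$
• Hebb (correlation learning): $\Delta C_{ij}^{m} = \frac{1}{N} V_i^{m} V_j^{m}$ is of form $\Delta C_{ij}^{m} = \alpha F(V_i^{m}, V_j^{m})$
• Better, however, is the form of Widrow-Hoff’s (delta) rule: $\Delta C_{ij}^{m} = \alpha F(C_{ij}^{m-1}, V_i^{m}, V_j^{m})$
• Should be of form: $\Delta C_{ij}^{m} = \alpha F(C^{m-1}, V^{m})$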
• We use the Projection Learning rule (also called the pseudo-inverse rule: $C = VV^{+}$). It is the most preferable, as it is:
- incremental, and takes into account the relevance of training stimuli and attributes;
- guaranteed to converge (it is obtained from the stability condition $V^{m} = CV^{m}$);
- fast in both memorization and recognition.
Refs: Amari’71,’77, Kohonen’72, Personnaz’85, Kanter-Sompolinsky’86, Gorodnichy’95-’99
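A minimal sketch of an incremental projection (pseudo-inverse) update is given below, assuming the commonly used form in which each new pattern V adds the normalized outer product of its residual V - CV to the weight matrix. The function names, the 8-neuron size and the three sample patterns are illustrative assumptions, not the authors' code or data.

```python
def mat_vec(C, V):
    """Matrix-vector product C V for a list-of-lists matrix."""
    return [sum(c * v for c, v in zip(row, V)) for row in C]


def projection_update(C, V, eps=1e-9):
    """Incrementally store pattern V (entries +1/-1) in weight matrix C, in place.

    Assumed update: C += (V - CV)(V - CV)^T / ||V - CV||^2.
    Returns False, leaving C unchanged, when V is already a linear
    combination of the stored patterns (zero residual).
    """
    S = mat_vec(C, V)
    R = [v - s for v, s in zip(V, S)]      # residual V - CV
    E = sum(r * r for r in R)              # squared norm of the residual
    if E < eps:
        return False
    for i in range(len(V)):
        for j in range(len(V)):
            C[i][j] += R[i] * R[j] / E
    return True


N = 8
C = [[0.0] * N for _ in range(N)]
# Hypothetical training patterns (binary +1/-1 vectors).
patterns = [
    [+1, -1, +1, -1, +1, -1, +1, -1],
    [+1, +1, -1, -1, +1, +1, -1, -1],
    [+1, +1, +1, +1, -1, -1, -1, -1],
]
for V in patterns:
    projection_update(C, V)

# Stability condition from the text: every stored pattern satisfies V = C V.
residuals = [max(abs(v - s) for v, s in zip(V, mat_vec(C, V))) for V in patterns]
```

Because the update depends on the whole current matrix C and the whole pattern V, it falls under the last of the rule forms listed above; presenting an already stored pattern again leaves C unchanged.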
Steps of video-based recognition
1. Face-looking regions are detected using rapid classifiers.
2. They are verified to have skin colour and not to be static.
3. Face rotation is detected and corrected; the face is eye-aligned, resampled, and extracted at a resolution of 12 pixels between the eyes (figure labels: 24; 2·IOD; 12).
4. The extracted face is converted to a binary feature vector (Receptor): Yr
5. The nametag vector (Effector) Ye is then appended to this vector: V = Y(0) = (Yr,Ye)
6. In memorization, the synapses of the network are updated using V: dCij(V).
In recognition, memory recall is achieved by converging from Y(0) to an attractor Y(t*).
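Steps 5 and 6 above can be illustrated with a toy example: a receptor part Yr and an effector (nametag) part Ye are concatenated into V, stored, and later recalled from a noisy face with all nametag neurons switched off. Everything concrete here is an assumption made for illustration: the vector sizes, the feature values, the synchronous sign update, and the single-pattern weight matrix V V^T / N (for one +1/-1 pattern the correlation and projection rules give the same matrix).

```python
def sign(x):
    """Binary neuron output; a zero potential is mapped to +1 by convention here."""
    return 1 if x >= 0 else -1


def recall(C, Y0, max_iters=100):
    """Synchronously iterate Y(t+1) = sign(C Y(t)) until Y(t*+1) == Y(t*)."""
    Y = list(Y0)
    for _ in range(max_iters):
        Y_next = [sign(sum(c * y for c, y in zip(row, Y))) for row in C]
        if Y_next == Y:
            return Y
        Y = Y_next
    return Y  # no fixed point reached within max_iters


# Memorization: receptor part Yr plus effector (nametag) part Ye.
Yr = [+1, -1, -1, +1, +1, -1]   # hypothetical binarized face features
Ye = [-1, +1, -1]               # nametag neuron 2 of 3 is "on"
V = Yr + Ye
N = len(V)
C = [[vi * vj / N for vj in V] for vi in V]   # one stored pattern, demo only

# Recognition: a noisy face, with all nametag neurons initially "off".
Yr_seen = [+1, -1, +1, +1, +1, -1]            # one receptor bit flipped
Y0 = Yr_seen + [-1, -1, -1]
Y_star = recall(C, Y0)
Ye_recalled = Y_star[len(Yr):]
```

In this toy case the network converges back to the stored V, so the recalled effector part names the memorized person even though the receptor input was corrupted.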
Recognition process
• Each frame initializes the system to the state Y(0) = (01000011…, 0000), from which associative recall is achieved by convergence to an attractor Y(t*)= Y(t*+1) = (01000001…, 0010), as in the brain.
• The effector component of the attractor (0010) is analyzed. Possible outcomes: S00 (none of the nametag neurons fires), S10 (one fires) and S11 (several fire).
• The final decision is made over several frames:
0000100000
0000000000
0000100000
0000100000
0000100000
0010100000
0000100000
0000100000
0000100000
0000010000
0000100000
0000100000
0000100000
(e.g. this is ID=5 in all these cases)
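The per-frame outcomes and the decision over several frames could be sketched as follows, using the thirteen nametag strings listed above. The poster does not state how frames are combined, so the majority vote over single-firing (S10) frames is an assumption, and the names `frame_outcome` and `decide` are hypothetical.

```python
from collections import Counter


def frame_outcome(nametag_bits):
    """Classify one frame's effector string, e.g. '0000100000'.

    Returns ('S00', None), ('S10', id) or ('S11', None); IDs are the
    1-based positions of the firing nametag neuron.
    """
    firing = [k + 1 for k, bit in enumerate(nametag_bits) if bit == "1"]
    if not firing:
        return "S00", None
    if len(firing) == 1:
        return "S10", firing[0]
    return "S11", None


def decide(frames):
    """Majority vote over the frames with exactly one firing neuron (assumed rule)."""
    votes = Counter(uid for state, uid in map(frame_outcome, frames) if state == "S10")
    if not votes:
        return None
    return votes.most_common(1)[0][0]


frames = [
    "0000100000", "0000000000", "0000100000", "0000100000", "0000100000",
    "0010100000", "0000100000", "0000100000", "0000100000", "0000010000",
    "0000100000", "0000100000", "0000100000",
]
final_id = decide(frames)
```

Under this rule the isolated S00, S11 and ID=6 frames are outvoted, and the final decision for the sequence is ID=5.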
Tested!
- Using TV programs annotation
- Using IIT-NRC 160x120 facial video database
(one video to memorize, another to recognize)
Perceptual Vision Interface Nouse™
• Evolved from a single demo program into a hands-free perceptual vision system which can recognize users.
• Uses a 160x120 low-fi webcam to constantly monitor the user’s identity
• Runs in the background (a user may not even know he is being watched)
• Integrated with facial tracking
• Provides means for complete hands-free interaction
From our website: try friv.exe yourself
- Works with your webcam or an .avi file
- Shows the “brain model” synapses as you watch (in memorization mode)
- Shows the states of the nametag neurons as you watch a facial video (in recognition mode)
References
• Gorodnichy, D. Video-based framework for face recognition in video. Second Workshop
on Face Processing in Video (FPiV'05) in Proceedings of Second Canadian Conference on
Computer and Robot Vision (CRV'05), pp. 330-338. Victoria, BC, Canada. 9-11 May, 2005.
NRC 48216.
• Gorodnichy, D. Associative neural networks as means for low-resolution video-based
recognition. International Joint Conference on Neural Networks (IJCNN'05). Montreal,
Quebec, Canada. July 31-August 4, 2005. NRC 48217.
• Gorodnichy, D. Projection Learning vs. Correlation Learning: From Pavlov Dogs to Face Recognition. In Correlation Learnings AI'05 Workshop. May 8, 2005. Victoria, B.C. NRC 48209.
• Bessens, G., Gorodnichy, D. Towards Building User Seeing Computers. Second Canadian
Conference on Computer and Robot Vision Workshop on Face Processing in Video
(FPiV'05). May 9-11, 2005. Victoria, B.C. NRC 48210.
• Gorodnichy, D. Recognizing Faces in Video Requires Approaches Different from Those
Developed for Face Recognition in Photographs, NATO IST - 044 Workshop on
"Enhancing Information Systems Security through Biometrics". Ottawa, Ontario, Canada.
October 18-20, 2004. NRC 47149.
• Dmitry O. Gorodnichy and Gerhard Roth. Nouse 'Use your nose as a mouse' perceptual vision technology for hands-free games and interfaces. Image and Vision Computing, Volume 22, Issue 12, 1 October 2004, pp. 931-942. NRC 47140.
• D.O. Gorodnichy, A.M. Reznik. Increasing Attraction of Pseudo-Inverse Autoassociative
Networks, Neural Processing Letters, volume 5, issue 2, pp. 123-127, 1997.