Transcript: hayhoe-talk1

Adaptive Control of Gaze and
Attention
Mary Hayhoe
University of Texas at Austin
Jelena Jovancevic
University of Rochester
Brian Sullivan
University of Texas at Austin
Selecting information from visual scenes
What controls the selection process?
Fundamental Constraints
Acuity is limited.
High acuity only in central retina.
Attention is limited.
Not all information in the image can be processed.
Visual Working Memory is limited.
Only a limited amount of information can be retained
across gaze positions.
Neural Circuitry for Saccades
[Diagram of the saccade circuitry: planning movements, target selection, saccade decision, inhibition of the SC, saccade command, signals to the muscles.]
Saliency and Attentional Capture
Image properties (e.g. contrast, edges, chromatic saliency) can account for some fixations when viewing images of scenes (e.g. Itti & Koch, 2001; Parkhurst & Niebur, 2003; Mannan et al., 1997).
Saliency is computed from the image using feature maps (color, intensity,
orientation) at different spatial scales, filtered with a center-surround
mechanism, and then summed. Gaze goes to the peak.
From Itti & Koch (2000).
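As a concrete illustration of that pipeline, here is a minimal sketch of an Itti & Koch-style saliency map. It is a simplified stand-in, not the published model: it assumes only intensity and two color-opponency feature maps, a few Gaussian scales, and crude per-map normalization, with no orientation channels or iterative competition.

```python
# Minimal sketch of an Itti & Koch-style saliency map (simplified: intensity
# and two color-opponency channels only, three scales, crude normalization).
import numpy as np
from scipy.ndimage import gaussian_filter

def center_surround(channel, center_sigma, surround_sigma):
    """Difference-of-Gaussians approximation of a center-surround cell."""
    return np.abs(gaussian_filter(channel, center_sigma) -
                  gaussian_filter(channel, surround_sigma))

def saliency_map(rgb):
    """rgb: HxWx3 float array in [0, 1]. Returns an HxW saliency map."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    intensity = (r + g + b) / 3.0
    rg = r - g                    # red-green opponency
    by = b - (r + g) / 2.0        # blue-yellow opponency

    smap = np.zeros(intensity.shape)
    for sigma in (1, 2, 4):                      # a few spatial scales
        for channel in (intensity, rg, by):      # feature maps
            fmap = center_surround(channel, sigma, 4 * sigma)
            if fmap.max() > 0:
                fmap = fmap / fmap.max()         # crude normalization
            smap += fmap                         # sum into one saliency map
    return smap

# Gaze is directed to the peak of the summed map.
img = np.random.rand(120, 160, 3)                # stand-in for a scene image
peak = np.unravel_index(np.argmax(saliency_map(img)), img.shape[:2])
print("predicted fixation (row, col):", peak)
```

The full Itti & Koch model also includes orientation maps from Gabor filters, an iterative normalization that favors maps with isolated peaks, and a winner-take-all network with inhibition of return, so it produces a sequence of fixations rather than a single peak.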
Attentional Capture
Certain stimuli are thought to capture attention or gaze in a bottom-up manner, by interrupting ongoing visual tasks (e.g. sudden onsets, moving stimuli; Theeuwes et al., 2001).
This is conceptually similar to the idea of salience.
Limitations of Saliency Models
Will this work in natural vision?
Important information may not be salient, e.g. an irregularity in the sidewalk.
Salient information may not be important, e.g. retinal image transients from eye/body movements.
Saliency doesn't account for many observed fixations, especially in natural behavior (previous lecture).
(Direct comparisons: Rothkopf et al., 2007; Stirk & Underwood, 2007)
Need to Study Natural Behavior
Viewing pictures of scenes is different from acting within scenes.
Heading
Obstacle avoidance
Foot placement
Dynamic Environments
The Problem
Any selective perceptual system must choose what to select, and when to select it.
How is this done, given that the natural world is unpredictable? (The "initial access" problem, Ullman, 1984)
Answer: it's not all that unpredictable, and we're really good at learning it.
Looming stimuli seem like good candidates for bottom-up attentional capture (Regan & Gray, 2000; Franconeri & Simons, 2003).
Is bottom-up capture effective in natural environments?
Human Gaze Distribution when Walking
• Experimental Question: How sensitive are subjects to unexpected salient events?
• General Design: Subjects walked along a footpath in a virtual environment while avoiding pedestrians. Do subjects detect unexpected potential collisions?
Virtual Walking Environment
Virtual Research V8 Head Mounted Display with 3rd Tech HiBall Wide Area motion tracker.
V8 optics with ASL501 Video Based Eye Tracker (left) and ASL 210 Limbus Tracker (right).
Virtual Environment
[Figure: Bird's-eye view of the virtual walking environment, with the Monument landmark.]
Experimental Protocol
• 1 - Normal Walking: "Avoid the pedestrians while walking at a normal pace and staying on the sidewalk."
• 2 - Added Task: Identical to condition 1, with the additional instruction: "Follow the yellow pedestrian."
[Videos: normal walking; follow leader]
Distribution of Fixations on Pedestrians Over Time
[Figure: Probability of fixation on pedestrians as a function of time since their appearance on screen (0-5 s, in 1-s bins), for the Normal Walking and Follow Leader conditions.]
- Pedestrians are fixated most when they first appear.
- There are fewer fixations on pedestrians in the leader trials.
What Happens to Gaze in Response to an Unexpected Salient Event?
[Figure: Plan view of the pedestrians' paths, with the colliding pedestrian's path highlighted.]
• The Unexpected Event: Pedestrians veered onto a collision course for 1 second (10% frequency). The change occurs during a saccade.
Does a potential collision evoke a fixation?
Fixation on Collider
No Fixation During Collider Period
Probability of Fixation During Collision Period
[Figure: Probability of fixation during the collision period for control pedestrians vs. colliders, No Leader (normal walking) and Leader conditions. More fixations on colliders in normal walking.]
Why are colliders fixated?
The small increase in the probability of fixating the collider could be caused either by a weak effect of attentional capture, or by active, top-down search of the peripheral visual field.
Probability of Fixation During Collision Period
[Figure: Probability of fixation during the collision period for control pedestrians vs. colliders, No Leader (normal walking) and Follow Leader conditions. More fixations on colliders in normal walking; no effect in the Leader condition.]
Why are colliders fixated?
The failure of the collider to attract attention when an added task (following the leader) is present suggests that detections result from active search rather than attentional capture.
Prior Fixation of Pedestrians Affects Probability of Collider Fixation
Conditional probabilities:
• Fixated pedestrians may be monitored in the periphery, following the first fixation.
• This may increase the probability of fixation of colliders.
Other evidence for detection of colliders?
Do subjects slow down during the collider period?
[Figure: Change in the distance to the Leader (m) during the collider period, comparing colliders that were fixated vs. not fixated, and trials with vs. without prior fixations of the pedestrian.]
Subjects slow down, but only when they fixate the collider. This implies that fixation measures "detection".
Slowing is greater if the pedestrian was not previously fixated, consistent with peripheral monitoring of previously fixated pedestrians.
Detecting a Collider Changes Fixation Strategy
[Figure: Duration of fixations (s) on normal pedestrians following detection of a collider, comparing trials in which the collider was fixated ("hit") vs. not fixated ("miss"), No Leader and Follow Leader conditions.]
Longer fixation on pedestrians following a detection of a collider
Effect of Collider Speed
[Figure: Probability of fixation for colliders vs. controls in the No Leader condition, with constant vs. increased collider speed.]
Colliders are fixated with equal probability whether or not they
increase speed (25%) when they initiate the collision path.
No systematic effects of stimulus properties on fixation.
[Figures: Probability of fixation as a function of pedestrian color (pink, purple, red, green), degrees of rotation (0-25°), distance to the observer (3.5-5 m), and number of pedestrians on screen, in the No Leader and Leader conditions.]
Summary
• Subjects fixate pedestrians more when they first appear in the field of view, perhaps to predict their future path.
• A potential collision can evoke a fixation, but the increase is modest.
• Potential collisions do not evoke fixations in the leader condition.
• Collider detection increases fixations on normal pedestrians.
Subjects rely on active search to detect potentially hazardous events like collisions, rather than reacting to bottom-up looming signals (attentional capture).
To make a top-down system work, subjects need to learn the statistics of environmental events and distribute gaze/attention based on these expectations.
Possible reservation…
Perhaps the looming robots are not similar enough to real pedestrians to evoke a bottom-up response.
Walking - Real World
• Experimental question: Do subjects learn to deploy gaze in response to the statistics of environmental events?
Experimental Setup
[Photo: A subject wearing the ASL Mobile Eye.]
System components: head-mounted optics (76 g), color scene camera, modified DVCR recorder, Eye Vision Software, PC with a Pentium 4 2.8 GHz processor.
Experimental Design (ctd)
• Occasionally some pedestrians veered onto a collision course with the subject (for approx. 1 sec).
• 3 types of pedestrians:
Trial 1: Rogue pedestrian - always collides; Safe pedestrian - never collides; Unpredictable pedestrian - collides 50% of the time.
Trial 2: Rogue becomes Safe; Safe becomes Rogue; Unpredictable remains the same.
Fixation on Collider
Effect of Collision Probability
Probability of fixating increased with higher collision probability.
(Probability is computed over the period the pedestrian is in the field of view, not just the collision interval.)
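To make that measure concrete, here is a minimal sketch of how a per-pedestrian fixation probability could be computed from gaze data: the fraction of appearance episodes during which the pedestrian was fixated at least once while in the field of view. The data structure, the helper name, and the numbers below are hypothetical, not the authors' analysis code.

```python
# Hedged sketch: fixation probability = fraction of appearance episodes in
# which the pedestrian was fixated at least once while visible.
# The episode format and numbers are invented for illustration.

def fixation_probability(episodes):
    """episodes: list of dicts with 'visible' = (t_on, t_off) and
    'fixations' = list of (t_start, t_end) intervals on that pedestrian."""
    hits = 0
    for ep in episodes:
        t_on, t_off = ep["visible"]
        fixated = any(t_start < t_off and t_end > t_on      # interval overlap
                      for t_start, t_end in ep["fixations"])
        hits += fixated
    return hits / len(episodes)

# Toy example: three appearances of a Rogue pedestrian, two of them fixated.
episodes = [
    {"visible": (0.0, 4.0),  "fixations": [(1.2, 1.6)]},
    {"visible": (10.0, 14.0), "fixations": []},
    {"visible": (20.0, 24.5), "fixations": [(21.0, 21.4), (23.0, 23.3)]},
]
print("P(fixation) =", fixation_probability(episodes))   # -> 0.666...
```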
Detecting Collisions: proactive or reactive?
• The probability of fixating the risky pedestrian is similar whether or not he/she actually collides on that trial.
• Almost all of the fixations on the Rogue were made before the collision path onset (92%).
Thus gaze and attention are anticipatory.
Effect of Experience
[Figure: Probability of fixation on pedestrians with no prior experience (Trial 1: Safe vs. Rogue) and after conflicting experience (Trial 2: Safe, previously Rogue, vs. Rogue, previously Safe).]
Safe and Rogue pedestrians interchange roles between trials.
Learning to Adjust Gaze (N=5)
• Changes in fixation behavior are fairly fast, happening over 4-5 encounters (fixations on the Rogue get longer, on the Safe shorter).
Shorter Latencies for Rogue Fixations
• Rogues are fixated earlier after they appear in the field of view. This change is also rapid.
Effect of Behavioral Relevance
Fixations on all pedestrians go down when pedestrians STOP instead of COLLIDING.
STOPPING and COLLIDING should have comparable salience.
Note that the Safe pedestrians behave identically in both conditions; only the Rogue changes behavior.
Summary
• Fixation probability increases with the probability of a collision path.
• Fixation probability is similar whether or not the pedestrian collides on that encounter.
• Fixations are anticipatory.
• Changes in fixation behavior are fairly rapid (fixations on the Rogue get longer and earlier, on the Safe shorter and later).
Neural Substrate for Learning Gaze Patterns
Dopaminergic neurons in the basal ganglia signal expected reward.
Neurons at all levels of the saccadic eye movement circuitry are sensitive to reward (e.g. Hikosaka et al., 2000, 2007; Platt & Glimcher, 1999; Sugrue et al., 2004; Stuphorn et al., 2000).
This provides the neural substrate for learning gaze patterns in natural behavior, and for modelling these processes using Reinforcement Learning (e.g. Sprague, Ballard & Robinson, 2007).
Neural Circuitry for Saccades
[Diagram of the saccade circuitry: planning movements, target selection, saccade decision, inhibition of the SC, saccade command, signals to the muscles.]
RL Modeling of Gaze Control
Walter the Virtual Humanoid
The Virtual Humanoid has a small library of simple visual behaviors:
– Sidewalk Following
– Picking Up Blocks
– Avoiding Obstacles
Sprague, Ballard, & Robinson, TAP (2007)
Each behavior uses a limited, task-relevant selection of visual information from the scene.
Walter's Sequence of Fixations
[Video: Walter's gaze alternates among the obstacles, the litter, and the sidewalk.]
Walter learns where and when to direct gaze using a reinforcement learning algorithm; a toy sketch of the idea follows.
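The sketch below is not the Sprague, Ballard & Robinson scheduler itself, only a minimal tabular Q-learning toy under assumed dynamics: giving gaze to one behavior resets its uncertainty, the other behaviors' uncertainty grows, and the reward penalizes total uncertainty. The behavior names, constants, and uncertainty model are all invented for illustration.

```python
# Toy sketch of learning where to direct gaze with tabular Q-learning.
# NOT the Sprague, Ballard & Robinson (2007) model; dynamics are invented.
import random
from collections import defaultdict

random.seed(0)
BEHAVIORS = ("sidewalk", "litter", "obstacles")
MAX_UNCERTAINTY = 3          # each behavior's uncertainty is 0..3
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

Q = defaultdict(float)       # Q[(state, action)] -> value

def step(state, action):
    """Gaze to one behavior resets its uncertainty; the others' grows."""
    new_state = tuple(0 if i == action else min(u + 1, MAX_UNCERTAINTY)
                      for i, u in enumerate(state))
    reward = -sum(new_state)                 # cost of being uncertain
    return new_state, reward

def choose(state):
    if random.random() < EPSILON:            # epsilon-greedy exploration
        return random.randrange(len(BEHAVIORS))
    return max(range(len(BEHAVIORS)), key=lambda a: Q[(state, a)])

state = (0, 0, 0)
for _ in range(20000):
    action = choose(state)
    next_state, reward = step(state, action)
    best_next = max(Q[(next_state, a)] for a in range(len(BEHAVIORS)))
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next
                                   - Q[(state, action)])
    state = next_state

# The greedy policy tends to fixate the behavior with the largest uncertainty.
query = (0, 3, 1)            # sidewalk just fixated, litter long unattended
best = max(range(len(BEHAVIORS)), key=lambda a: Q[(query, a)])
print("greedy fixation target at", query, "->", BEHAVIORS[best])
```

The actual model reasons about the expected cost of uncertainty in each behavior's state estimate; the toy above only captures the flavor of learning a gaze-allocation policy from reward.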
Conclusions
Subjects must learn the statistical structure of the world and allocate attention and gaze accordingly.
Control of gaze and attention is proactive, not reactive, and thus is model-based.
Anticipatory use of gaze is probably necessary for much visually guided behavior, because of visuo-motor delays.
Subjects behave very similarly despite the unconstrained environment and absence of instructions.
Reinforcement learning models are needed to account for the control of attention and gaze in the natural world.
How do subjects perceive unexpected events?
• Task-based models can do a good job by learning scene statistics (real walking: Jovancevic & Hayhoe, 2007).
• Another solution: attention may be attracted to deviations from expectations based on a memory representation of the scene.
• Hollingworth & Henderson (2002) argue that elaborate representations of scenes are built up in long-term memory.
• To detect a change, subjects may compare the current image with the learnt representation.
• If so, such representations might serve as a basis for attracting attention to changed regions of scenes (e.g. Brockmole & Henderson, 2005).
Thus subjects should be more sensitive to changes in familiar environments than in unfamiliar ones, because the memory representation is well-defined.
Overview of the Experiment
• Question: If subjects become familiar with an environment, are changes more likely to attract attention? (cf. Brockmole & Henderson, 2005)
• Design: Subjects walked along a footpath in a virtual environment including both stable and changing objects, while avoiding pedestrians.
Virtual Environment
[Figure: Bird's-eye view of the virtual environment, with the Monument landmark.]
Experimental Setup
Virtual Research V8 Head Mounted Display with 3rd Tech HiBall Wide Area motion tracker; V8 optics with ASL501 Video Based Eye Tracker.
Object Changes
• Stable objects
• Disappearance
• Moved object
• Replaced object
• New object
Procedure
• Two groups, 19 subjects/group:
– Inexperienced Group: one familiarization trial
– Experienced Group: 19 familiarization laps before the changes occurred
Average gaze duration/object/lap
[Figure: Average gaze duration per object per lap (msec) on stable vs. changing objects, for the Inexperienced and Experienced groups.]
• Total gaze duration on changed objects was much longer after experience in the environment.
• Fixation durations on stable objects were almost the same for the two groups.
Effects of Different Changes
[Figure: Gaze duration (msec) on stable, replaced, disappeared, moved, and new objects, for the Inexperienced and Experienced groups.]
Distribution of Gaze
[Figure: Percentage of gaze allocated to the ground, the environment, pedestrians, changing objects, and stable objects, for the Inexperienced and Experienced groups.]
Object fixations account for only a small percentage of
gaze allocation.
Change Blindness
• Probability of being aware of the changes was correlated with gaze duration on the changing objects (rho = 0.59).
• Awareness of the changes was low, suggesting that fixations are a more sensitive indicator.
• Change blindness in the natural world may be fairly uncommon, because most scenes are familiar.
• Suggests we learn the structure of natural scenes over time, and that attention is attracted by deviations from the normal state.
• These results are consistent with Brockmole & Henderson (2005) and generalize the result to immersive environments and long time scales.
• Consistent with Predictive Coding models of cortical function.
Predictive Coding (Rao & Ballard, 1999)
Input is matched to a stored representation; the difference signal reveals the mismatch: e = I - Ur.
[Diagram: bottom-up input I from the retina (via the LGN) is compared with the top-down prediction Ur generated from memory in cortex; the residual e is passed through U^T to update the cortical estimate r.]
Unmatched residual signal prompts a re-evaluation of
image data and may thereby attract attention.
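As a purely numerical illustration of e = I - Ur, here is a minimal sketch assuming a linear generative model with arbitrary dimensions, an arbitrary learning rate, and random data standing in for the image and the learnt basis:

```python
# Minimal numerical sketch of the predictive-coding residual e = I - U r
# (linear generative model in the spirit of Rao & Ballard, 1999; the sizes,
# learning rate, and random data are arbitrary).
import numpy as np

rng = np.random.default_rng(0)
n_pixels, n_causes = 64, 10

U = rng.normal(size=(n_pixels, n_causes))    # learnt basis (memory / cortex)
I = rng.normal(size=n_pixels)                # bottom-up input from the retina
r = np.zeros(n_causes)                       # top-down estimate of the causes

# Settle r by gradient descent on the squared residual ||I - U r||^2.
for _ in range(200):
    e = I - U @ r                            # residual (mismatch) signal
    r += 0.01 * U.T @ e                      # error passed through U^T updates r

print("residual magnitude after settling:", np.linalg.norm(I - U @ r))
# A large leftover residual flags input that deviates from the stored
# representation, a candidate signal for attracting attention.
```

In the full model the basis U is itself learnt from natural images and the same residual computation is stacked hierarchically.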
“Surprise”
• A mechanism that attracts attention and gaze based on mismatch with a model is similar to the idea of Bayesian "Surprise" (Itti & Baldi, 2005).
• One question is where the prior comes from. Itti & Baldi calculate surprise with respect to image changes over a short time scale. Here we suggest surprise is measured with respect to a memory representation.
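For illustration only, here is a hedged sketch of Bayesian surprise as the KL divergence between posterior and prior for a single Bernoulli event (e.g. "this object differs from the remembered one"), with a Beta prior standing in for the memory representation; the counts are invented.

```python
# Hedged sketch: Bayesian surprise as KL(posterior || prior) for one Bernoulli
# event, with a Beta prior standing in for the memory representation.
# All counts below are invented.
from scipy.special import betaln, digamma

def kl_beta(a1, b1, a2, b2):
    """KL divergence KL(Beta(a1, b1) || Beta(a2, b2)) in nats."""
    return (betaln(a2, b2) - betaln(a1, b1)
            + (a1 - a2) * digamma(a1)
            + (b1 - b2) * digamma(b1)
            + (a2 - a1 + b2 - b1) * digamma(a1 + b1))

priors = {
    "inexperienced": (2, 2),     # vague expectations after little exposure
    "experienced": (1, 50),      # "this object almost never changes"
}
evidence = (5, 0)                # observations suggesting the object changed

for name, (a0, b0) in priors.items():
    a, b = a0 + evidence[0], b0 + evidence[1]     # conjugate Beta update
    print(name, "surprise:", round(float(kl_beta(a, b, a0, b0)), 2), "nats")

# With these made-up numbers the same evidence is more surprising against the
# sharp prior learnt from experience, in line with the greater sensitivity to
# changes seen in the Experienced group.
```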
Conclusion
• Familiarity with the visual environment increases the probability that gaze will be attracted to changes in the scene.
• A mechanism whereby attention is attracted by deviations from a learnt representation may serve as a useful adjunct to task-driven fixations when unexpected events occur in natural visual environments.
Thank You
Behaviors Compete for Gaze/Attentional Resources
The probability of fixation is lower for both Safe and Rogue pedestrians in both Leader conditions than in the baseline condition.
Note that all pedestrians are allocated fewer fixations, even the Safe ones.