Transcript lecture12

Control of Attention and Gaze in
Natural Environments
Selecting information from visual scenes
What controls the selection process?
Fundamental Constraints
Acuity is spatially restricted.
Attention is limited.
Visual Working Memory is limited.
Humans must select a limited subset of the available
information in the environment.
Only a limited amount of information can be retained.
What controls these processes?
Saliency - bottom-up
Image properties (e.g., contrast, edges, chromatic saliency) can
account for some fixations when viewing images of scenes.
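As a rough illustration of what "bottom-up" means here, the sketch below scores each pixel by its local contrast and predicts a fixation at the highest-scoring location. This is a toy under stated assumptions, not any published saliency model: the function names, the 8-neighbor contrast measure, and the 5x5 test image are all hypothetical.

```python
def contrast_saliency(image):
    """Return a map of local contrast: |pixel - mean of its neighbors|."""
    rows, cols = len(image), len(image[0])
    saliency = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Collect the (up to 8) neighbors that fall inside the image.
            neighbors = [
                image[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            saliency[r][c] = abs(image[r][c] - sum(neighbors) / len(neighbors))
    return saliency


def most_salient(image):
    """Return the (row, col) of the highest-contrast pixel."""
    saliency = contrast_saliency(image)
    return max(
        ((r, c) for r in range(len(image)) for c in range(len(image[0]))),
        key=lambda rc: saliency[rc[0]][rc[1]],
    )


# Hypothetical 5x5 grayscale image: uniform background, one bright pixel.
image = [[0.1] * 5 for _ in range(5)]
image[2][3] = 0.9
predicted_fixation = most_salient(image)
```

A model of this kind picks out the bright pixel regardless of whether it matters for the current task.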
Limitations of Saliency Models
Important information may not be salient, e.g., Stop signs in a
cluttered environment.
Salient information may not be important, e.g., retinal image
transients from eye/body movements.
Saliency doesn’t account for many observed fixations, especially in
natural behavior (e.g., Land etc.).
Need to Study Natural Behavior
Natural vision is not the same as viewing pictures.
Behavioral goals determine what information is needed.
Task structure (often) allows interpretation of role of
fixations.
Top-down factors
Viewing pictures of scenes is different from acting within scenes.
Heading
Obstacle avoidance
Foot placement
To what extent is the selection of information from
scenes determined by cognitive goals (i.e., top-down)
and how much by the stimulus itself (i.e., salient regions,
bottom-up effects)?
Modeling Top Down Control
Walter the Virtual Humanoid
The Virtual Humanoid has a small library of simple visual behaviors:
– Sidewalk Following
– Picking Up Blocks
– Avoiding Obstacles
Sprague & Ballard (2003)
Each behavior uses a limited, task-relevant
selection of visual information from the scene.
This is computationally efficient.
Walter’s sequence of fixations
[Figure labels: obstacles, litter, sidewalk]
Walter learns where/when to direct gaze using a reinforcement
learning algorithm.
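To give a flavor of learning gaze targets from reward, here is a minimal single-state value-learning sketch with epsilon-greedy exploration. It is not the Sprague & Ballard model: the reward numbers, learning rate, exploration rate, and function name are hypothetical, and a real system would also have to track state and timing.

```python
import random


def learn_gaze_values(rewards, episodes=2000, alpha=0.1, epsilon=0.1, seed=0):
    """Learn a value for looking at each target from repeated reward feedback."""
    rng = random.Random(seed)
    targets = list(rewards)
    q = {t: 0.0 for t in targets}
    for _ in range(episodes):
        if rng.random() < epsilon:
            target = rng.choice(targets)      # explore: look somewhere at random
        else:
            target = max(targets, key=q.get)  # exploit: look at the best-valued target
        # Move the value estimate toward the reward just received.
        q[target] += alpha * (rewards[target] - q[target])
    return q


# Hypothetical payoff for fixating each target (made up for illustration).
rewards = {"sidewalk": 0.2, "litter": 0.5, "obstacles": 1.0}
q_values = learn_gaze_values(rewards)
preferred_target = max(q_values, key=q_values.get)
```

With these made-up payoffs the learner ends up preferring the target whose fixation pays off most, without ever being told which one that is.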
What about unexpected events?
Walter the Virtual Humanoid
Sprague & Ballard (VSS 2004)
Dynamic Environments
            Computational load    Unexpected events
Bottom-up   Expensive             Can handle unexpected salient events
Top-down    Efficient             How to deal with unexpected events?
Driving Simulator
Gaze distribution is very different for different tasks
Total Fixation Duration (%)
        Follow    Follow + Stop
ROAD    7.02      5.89
CAR     77.1      42
SIDE    0.779     6.44
INT     14.9      45.4
BACK    0.191     0.243
[Figure: Time fixating Intersection. Labels: 120 m, 30 m.]
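The size of the task effect can be read off the slide's numbers with a little arithmetic. The sketch below uses the fixation percentages listed on the slide. The assignment of the CAR and SIDE values to columns is reconstructed from the flattened slide text, and is consistent with each column summing to roughly 100%. The variable and function names are hypothetical.

```python
# Percent of total fixation duration per region, as listed on the slide.
fixation_pct = {
    "ROAD": {"Follow": 7.02, "Follow+Stop": 5.89},
    "CAR": {"Follow": 77.1, "Follow+Stop": 42.0},
    "SIDE": {"Follow": 0.779, "Follow+Stop": 6.44},
    "INT": {"Follow": 14.9, "Follow+Stop": 45.4},
    "BACK": {"Follow": 0.191, "Follow+Stop": 0.243},
}


def column_total(task):
    """Sum the percentages for one task; should be close to 100."""
    return sum(row[task] for row in fixation_pct.values())


def change(region):
    """Percentage-point change when the stop-sign task is added."""
    row = fixation_pct[region]
    return row["Follow+Stop"] - row["Follow"]


totals = {task: column_total(task) for task in ("Follow", "Follow+Stop")}
shifts = {region: round(change(region), 3) for region in fixation_pct}
```

Adding the traffic-rule instruction moves about 30 percentage points of fixation time onto the intersection, mostly at the expense of the lead car.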
The Problem
Any selective perceptual system must
choose the right visual computations, and
when to carry them out.
How do we deal with the unpredictability of the
natural world?
Answer - it’s not all that unpredictable and we’re really good
at learning it.
Human Gaze Distribution when Walking
• Experimental Question:
How sensitive are
subjects to unexpected
salient events?
• General Design:
Subjects walked along a
footpath in a virtual
environment while
avoiding pedestrians.
Do subjects detect
unexpected potential
collisions?
Virtual Walking Environment
Virtual Research V8 Head Mounted
Display with 3rd Tech HiBall Wide
Area motion tracker
V8 optics with ASL501 Video Based
Eye Tracker (Left) and ASL 210
Limbus Tracker (Right)
[Figure labels: Limbus Tracker; Video Based Tracker]
Virtual Environment
Monument
Bird’s Eye view of the virtual walking environment.
Experimental Protocol
• 1 - Normal Walking: Avoid the pedestrians
while walking at a normal pace and staying
on the sidewalk.
• 2 - Added Task (Follow Leader): Identical to condition 1,
with the additional instruction to follow
a yellow pedestrian.
What Happens to Gaze in Response to
an Unexpected Salient Event?
Pedestrians’ paths
Colliding pedestrian
path
• The Unexpected Event: Pedestrians on a non-colliding
path changed onto a collision course for 1 second (10%
frequency). The change occurs during a saccade.
Does a potential collision evoke a fixation?
Fixation on Collider
No Fixation During Collider Period
[Figure: Probability of Fixation During Collision Period. Y-axis: probability of fixation (0 to 1). X-axis: Controls vs. Colliders. Conditions: Normal Walking (No Leader) and Follow Leader.]
More fixations on colliders in normal walking.
No effect in Leader condition.
Why are colliders fixated?
Small increase in probability of fixating the
collider.
Failure of collider to attract attention with an
added task (following) suggests that
detections result from top-down monitoring.
Detecting a Collider Changes Fixation Strategy
[Figure: Time fixating normal pedestrians following detection of a collider. Y-axis: fixation durations (s), 0 to 1. X-axis: “Miss” (collider not fixated) vs. “Hit” (collider fixated). Conditions: Normal Walking (No Leader) and Follow Leader.]
Longer fixation on pedestrians following a detection of a collider
Subjects rely on active search to detect potentially
hazardous events like collisions, rather than reacting
to bottom-up, looming signals.
To make a top-down system work, subjects need to
learn the statistics of environmental events and distribute
gaze/attention based on these expectations.
Possible reservations…
Perhaps looming robots are not similar enough to
real pedestrians to evoke a bottom-up
response.
Walking - Real World
• Experimental question:
Do subjects learn to deploy gaze in
response to the probability of
environmental events?
• General design: Subjects walked on
an oval path and avoided pedestrians
Experimental Setup
A subject wearing the ASL Mobile Eye
System components: Head mounted
optics (76g), Color scene camera,
Modified DVCR recorder, Eye Vision
Software, PC Pentium 4, 2.8GHz
processor
Experimental Design (ctd)
• Occasionally some pedestrians veered onto a collision
course with the subject (for approx. 1 sec)
• 3 types of pedestrians:
Trial 1: Rogue pedestrian - always collides
Safe pedestrian - never collides
Unpredictable pedestrian - collides 50% of time
Trial 2: Rogue becomes Safe
Safe becomes Rogue
Unpredictable - remains same
Fixation on Collider
Effect of Collision Probability
• Probability of fixating increased with higher collision
probability.
Detecting Collisions: pro-active or reactive?
• Probability of fixating risky pedestrian similar, whether or
not he/she actually collides on that trial.
Learning to Adjust Gaze
• Changes in fixation behavior are fairly fast, happening over 4-5
encounters (fixations on Rogue get longer, on Safe shorter).
Shorter Latencies for Rogue Fixations
• Rogues are fixated earlier after they appear in the field of view.
This change is also rapid.
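One simple way to picture learning that settles within a handful of encounters is a running estimate of each pedestrian's collision probability, nudged toward every new outcome. The lecture does not specify a learning rule. The delta-rule update, the learning rate of 0.4, the starting estimate of 0.5, and the function names below are all assumptions chosen for illustration.

```python
def update_estimate(estimate, collided, learning_rate=0.4):
    """Move the collision-probability estimate toward the latest outcome."""
    outcome = 1.0 if collided else 0.0
    return estimate + learning_rate * (outcome - estimate)


def estimates_over_encounters(outcomes, start=0.5, learning_rate=0.4):
    """Return the estimate after each encounter in sequence."""
    estimate = start
    history = []
    for collided in outcomes:
        estimate = update_estimate(estimate, collided, learning_rate)
        history.append(estimate)
    return history


# Five encounters with a Rogue (always collides) and a Safe (never collides).
rogue = estimates_over_encounters([True] * 5)
safe = estimates_over_encounters([False] * 5)
```

Under these assumed settings the two estimates are far apart after five encounters, so a gaze policy driven by them would favor the Rogue.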
Effect of Behavioral Relevance
Fixations on all pedestrians go down when pedestrians STOP instead
of COLLIDING.
STOPPING and COLLIDING should have comparable salience.
Note that the Safe pedestrians behave identically in both conditions; only the Rogue changes behavior.
• Fixation probability increases with probability
of a collision.
• Fixation probability similar whether or not the
pedestrian collides on that encounter.
• Changes in fixation behavior fairly rapid
(fixations on Rogue get longer, and earlier,
and on Safe shorter, and later)
Our Experiment:
Virtual environment - want to compare real and virtual.
Do observers learn to deploy visual attention based on
environmental probabilities?
Safe pedestrians - rarely collide
Risky pedestrians - often collide
Rogue pedestrians - collide a lot
Do subjects fixate risky and rogue pedestrians more?
How quickly does this happen?
Conclusions
Subjects must learn the probabilistic structure of the
world and allocate gaze accordingly. That is, gaze control
is model-based.
Subjects behave very similarly despite unconstrained
environment and absence of instructions.
Control of gaze is proactive, not reactive, and thus
is model based.
Anticipatory use of gaze is probably necessary
for much visually guided behavior.
Behaviors Compete for Gaze/
Attentional Resources
The probability of fixation is lower for both Safe and Rogue
pedestrians in both Leader conditions than in the baseline
condition.
Note that all pedestrians are allocated fewer fixations, even the Safe
ones.
Conclusions
Data are consistent with task-driven sampling of visual information
rather than bottom-up capture of attention.
- No effect of increased salience of collision event.
- Colliders fail to attract gaze in the leader condition,
suggesting the extra task interferes with detection.
Observers rapidly learn to deploy visual attention based on
environmental probabilities.
Such learning is necessary in order to deploy gaze and
attention effectively.
Certain stimuli are thought to capture attention bottom-up
(e.g., Theeuwes et al., 2001, etc.).
Looming stimuli seem like good candidates for bottom-up
attentional capture (Regan & Gray, 200; Franconeri & Simons, 2003).
No effect of increased collider speed.
[Figure: Probability of fixation (0 to 1) on Colliders and Controls at Constant vs. Increased collider speed. Panels: Normal Walking (No Leader) and Follow Leader.]
Greater saliency of the unexpected event does not
increase fixations.
Other evidence for detection of colliders?
Do subjects slow down during collider period?
[Figure: Change in the distance to the Leader (m), 0 to 0.3. One panel: Not fixated vs. Fixated. Other panel: No prior fixations vs. With prior fixations.]
Subjects slow down, but only when they fixate the collider. This implies that fixation
measures “detection”.
Slowing is greater if the collider was not previously fixated. This is consistent with peripheral
monitoring of previously fixated pedestrians.
Conclusions
• Subjects learn the probabilities of
events in the environment and distribute
gaze accordingly
• The findings from the Leader
manipulation support the claim that
different tasks compete for attention
Effect of Context
Fixations of Safe pedestrian in different contexts
[Figure: Probability of fixation (0 to 1) of the Safe pedestrian in three contexts: Safe ped./Safe environment; Safe ped./No prior experience; Safe ped./Conflicting experience.]
Probability of fixating the Safe pedestrian is higher in the context of a
riskier environment.
Summary
• Direct comparison between real and virtual collisions is
difficult, but colliders are still not reliably fixated.
• Subjects appear to be sensitive to several parameters of the
environment:
– Experience
• Experience with the Rogue pedestrian elevated the fixation probability of
the Safe pedestrian to 70% (50% without experience).
• Experience with the Safe pedestrian led to an 80% fixation probability of the Rogue
(89% without experience).
• Experience with the Safe carries less weight than experience with the Rogue.
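The asymmetry in the bullets above can be made explicit by comparing the two shifts in percentage points. The sketch only restates the slide's four numbers. The dictionary layout and function name are hypothetical.

```python
# Fixation probabilities from the slide, with and without prior experience
# of the other pedestrian type.
fixation_prob = {
    "Safe": {"without_experience": 0.50, "with_experience": 0.70},
    "Rogue": {"without_experience": 0.89, "with_experience": 0.80},
}


def shift_in_points(pedestrian):
    """Change in fixation probability, in percentage points."""
    p = fixation_prob[pedestrian]
    return round((p["with_experience"] - p["without_experience"]) * 100)


safe_shift = shift_in_points("Safe")    # after experience with the Rogue
rogue_shift = shift_in_points("Rogue")  # after experience with the Safe
```

Experience with the Rogue moves fixations on the Safe by 20 points, while experience with the Safe moves fixations on the Rogue by only 9 points in the other direction.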
Shinoda et al. (2001)
“Follow the car.”
or
“Follow the car and obey
traffic rules.”
Total Fixation Duration (%)
        Follow    Follow + Stop
ROAD    7.02      5.89
CAR     77.1      42
SIDE    0.779     6.44
INT     14.9      45.4
BACK    0.191     0.243
[Figure: Time fixating Intersection. Region labels: Road, Car, Roadside, Intersection. Distance labels: 120 m, 30 m.]
Detection of signs at intersection results from frequent looks.
How well do human subjects detect unexpected
events?
Shinoda et al. (2001)
Detection of briefly presented
Stop signs.
Intersection: P = 1.0
Mid-block: P = 0.3
Greater probability of detection in probable locations.
Suggests subjects (Ss) learn where to attend/look.
What do Humans Do?
Shinoda et al. (2001) found better detection of
unexpected stop signs at probable locations (intersections)
in a virtual driving task.
Spatial distribution of fixations
[Figure: Percentage of Total Fixation Duration (axis 0 to 1) for No Leader and Leader conditions. X-axis labels: Other, Pedestrians, Leader.]
[Figure: Change in velocity (m/s), axis -0.2 to 0.2, for No Leader and Leader conditions. X-axis: No prior fixation vs. With prior fixations.]