Transcript c05-I

Eye-Based Interaction in Graphical Systems:
Theory & Practice
Part I
Introduction to the Human Visual System
A: Visual Attention
“When the things are apprehended by the senses,
the number of them that can be attended to at
once is small, `Pluribus intentus, minor est ad
singula sensus' ”
— William James
• Latin translation: “Many filtered into few
for perception”
• Visual scene inspection is performed
minutatim (piecemeal), not in toto
A.1: Visual Attention—chronological
review
• Qualitative historical background: a
dichotomous theory of attention—the
“what” and “where” of (visual) attention
• Von Helmholtz (ca. 1900): mainly concerned with
eye movements to spatial locations, the “where”,
i.e., attention as an overt mechanism (eye movements)
• James (ca. 1900): defined attention mainly in terms
of the “what”, i.e., attention as a more internal,
covert mechanism
A.1: Visual Attention—chronological
review (cont’d)
• Broadbent (ca. 1950): defined attention as
“selective filter” from auditory experiments;
generally agreeing with Von Helmholtz’s “where”
• Deutsch and Deutsch (ca. 1960): rejected “selective
filter” in favor of “importance weightings”; generally
corresponding to James’ “what”
• Treisman (ca. 1960): proposed unified theory of
attention—attenuation filter (the “where”) followed
by “dictionary units” (the “what”)
A.1: Visual Attention—chronological
review (cont’d)
• Main debate at this point: is attention
parallel (the “where”) or serial (the “what”)
in nature?
• Gestalt view: recognition is a wholistic
process (e.g., the Kanizsa figure)
• Theories advanced through early
recordings of eye movements
A.1: Visual Attention—chronological
review (cont’d)
• Yarbus (ca. 1967): demonstrated sequential, but
variable, viewing patterns over particular image
regions (akin to the “what”)
• Noton and Stark (ca. 1970): showed that subjects
tend to fixate identifiable regions of interest,
containing “informative details”; coined term
“scanpath” describing eye movement patterns
• Scanpaths helped cast doubt on the Gestalt
hypothesis
A.1: Visual Attention—chronological
review (cont’d)
Fig.2: Yarbus’ early scanpath
recording:
• trace 1: examine at will
• trace 2: estimate wealth
• trace 3: estimate ages
• trace 4: guess previous activity
• trace 5: remember clothing
• trace 6: remember position
• trace 7: time since last visit
A.1: Visual Attention—chronological
review (cont’d)
• Posner (ca. 1980): proposed attentional “spotlight”,
an overt mechanism independent from eye
movements (akin to the “where”)
• Treisman (ca. 1986): once again unified “what” and
“where” dichotomy by proposing the Feature
Integration Theory (FIT), describing attention as a
“glue” which integrates features at particular
locations to allow wholistic perception
A.1: Visual Attention—chronological
review (cont’d)
• Summary: the “what” and “where”
dichotomy provides an intuitive sense of the
attentional, foveo-peripheral visual
mechanism
• Caution: the “what/where” account is
probably overly simplistic and is but one
theory of visual attention
B: Neurological Substrate of the
Human Visual System (HVS)
• Any theory of visual attention must
address the fundamental properties of
early visual mechanisms
• Examination of the neurological substrate
provides evidence of limited information
capacity of the visual system—a
physiological reason for an attentional
mechanism
B.1: The Eye
Fig. 3: The eye—“the world’s
worst camera”
• suffers from numerous
optical imperfections...
• ...endowed with several
compensatory
mechanisms
B.1: The Eye (cont’d)
Fig. 4: Ocular optics
B.1: The Eye (cont’d)
• Imperfections:
• spherical aberrations
• chromatic aberrations
• curvature of field
• Compensations:
• iris—acts as a stop
• focal lens—sharp focus
• curved retina—matches
curvature of field
B.2: The Retina
• Retinal photoreceptors constitute the first
stage of visual perception
• Photoreceptors act as transducers, converting
light energy into electrical impulses (neural
signals)
• Photoreceptors are functionally classified
into two types: rods and cones
B.2: The Retina—rods and cones
• Rods: sensitive to dim and achromatic
light (night vision)
• Cones: respond to brighter, chromatic
light (day vision)
• Retinal construction: 120M rods, 7M cones
arranged concentrically
B.2: The Retina—cellular makeup
• The retina is composed of 3 main layers of
different cell types (a 3-layer “sandwich”)
• Surprising fact: the retina is “inverted”—
photoreceptors are found in the bottom
layer (furthest away from incoming light)
• Connection bundles between layers are
called plexiform or synaptic layers
B.2: The Retina—cellular makeup
(cont’d)
Fig.5: The retinocellular
layers (w.r.t. incoming
light):
• ganglion layer
• inner plexiform (synaptic) layer
• inner nuclear layer
• outer plexiform (synaptic) layer
• outer (receptor) layer
B.2: The Retina—cellular makeup
(cont’d)
Fig.5 (cont’d): The neuron:
• all retinal cells are types
of neurons
• certain neurons mimic a
“digital gate”, firing when
activation level exceeds a
threshold
• rods and cones are
specific types of
dendrites
B.2: The Retina—retinogeniculate
organization (from outside in, w.r.t. cortex)
• Outer layer: rods and cones
• Inner layer: horizontal cells, laterally
connected to photoreceptors
• Ganglion layer: ganglion cells, connected
(indirectly) to horizontal cells, which project via
myelinated pathways to the Lateral
Geniculate Nuclei (LGN) in the thalamus
B.2: The Retina—receptive fields
• Receptive fields: collections of
interconnected cells within the inner and
ganglion layers
• Field organization determines impulse
signature of cells, based on cell types
• Cells may depolarize due to light
increments (+) or decrements (-)
B.2: The Retina—receptive fields
(cont’d)
Fig.6: Receptive fields:
• signal profile
resembles a
“Mexican hat”
• receptive field
sizes vary
concentrically
• color-opposing
fields also exist
B.3: Visual Pathways
• Retinal ganglion cells project to the LGN
along two major pathways, distinguished
by morphological cell types: α and β cells
• α cells project to the magnocellular (M-) layers
• β cells project to the parvocellular (P-) layers
• Ganglion cells are functionally classified
by three types: X, Y, and W cells
B.3: Visual Pathways—functional
response of ganglion cells
• X cells: sustained stimulus, location, and
fine detail
• innervate along both M- and P-projections
• Y cells: transient stimulus, coarse
features, and motion
• innervate along only the M-projection
• W cells: coarse features and motion
• project to the Superior Colliculus (SC)
B.3: Visual Pathways (cont’d)
Fig.7: Optic tract and radiations
(visual pathways):
• The LGN is of particular
clinical importance
• M- and P-cellular
projections are clearly
visible under microscope
• Axons from M- and P-layers
of the LGN terminate in
area V1
B.3: Visual Pathways (cont’d)
Characteristics                          Magno    Parvo
ganglion size                            large    small
transmission time                        fast     slow
receptive fields                         large    small
sensitivity to small objects             poor     good
sensitivity to change in light levels    large    small
sensitivity to contrast                  low      high
sensitivity to motion                    high     low
color discrimination                     no       yes

Table 1: Functional characteristics of ganglionic
projections
B.4: The Occipital Cortex and
Beyond
Fig.8: The brain
and visual
pathways:
• the cerebral
cortex is
composed of
numerous
regions
classified by
their function
B.4: The Occipital Cortex and
Beyond (cont’d)
• M- and P-pathways terminate in distinct
layers of cortical area V1
• Cortical cells (unlike center-surround
ganglion receptive fields) respond to
orientation-specific stimuli
• Pathways emanating from V1 that join
multiple cortical areas involved in vision
are called streams
B.4: The Occipital Cortex and
Beyond—directional selectivity
• Cortical Directional Selectivity (CDS) of
cells in V1 contributes to motion
perception and control of eye movements
• CDS cells establish a motion pathway
from V1 projecting to areas V2 and MT (V5)
• In contrast, Retinal Directional Selectivity
(RDS) may not contribute to motion
perception, but is involved in eye
movements
B.4: The Occipital Cortex and
Beyond—cortical cells
• Two consequences of the visual system’s
motion-sensitive, single-cell organization:
• due to motion sensitivity, the eyes are never
perfectly still (instead a tiny jitter is observed, termed
microsaccade)—if the eyes were stabilized, the image
would fade!
• due to single-cell organization, representation of
natural images is quite abstract: there is no “retinal
buffer”
B.4: The Occipital Cortex and
Beyond—2 attentional streams
• Dorsal stream:
• V1, V2, MT (V5), MST, Posterior Parietal Cortex
• sensorimotor (motion, location) processing
• the attentional “where”?
• Ventral (temporal) stream:
• V1, V2, V4, Inferotemporal Cortex
• cognitive processing
• the attentional “what”?
B.4: The Occipital Cortex and
Beyond—3 attentional regions
• Posterior Parietal Cortex (dorsal stream):
• disengages attention
• Superior Colliculus (midbrain):
• relocates attention
• Pulvinar (thalamus; colocated with LGN):
• engages, or enhances, attention
C: Visual Perception (with emphasis on the foveo-peripheral distinction)
• Measurable performance parameters may
often (but not always!) fall within ranges
predicted by known limitations of the
neurological substrate
• Example: visual acuity may be estimated
from knowledge of the density and distribution
of the retinal photoreceptors
• In general, performance parameters are
obtained empirically
C.1: Spatial Vision
• Main parameters sought: visual acuity and
contrast sensitivity
• Dimensions of retinal features are
measured in terms of the scene projected
onto the retina, in units of degrees of visual
angle:

A = 2 arctan(S / 2D)

where S is the object size and D is the
distance to the object (a worked example
follows below)
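The relation above is straightforward to evaluate numerically. A minimal Python sketch; the 24 mm coin diameter and 0.7 m arm’s length are illustrative assumptions, not values from the slides:

import math

def visual_angle_deg(size, distance):
    # Visual angle A = 2 * arctan(S / 2D), returned in degrees.
    # size and distance must be in the same units (e.g., metres).
    return math.degrees(2.0 * math.atan(size / (2.0 * distance)))

# Assumed values: a 24 mm coin viewed at arm's length (~0.7 m)
# subtends roughly 2 degrees, consistent with Table 2.
print(round(visual_angle_deg(0.024, 0.7), 2))   # -> 1.96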
C.1: Spatial Vision—visual angle
Fig.9: Visual angle
C.1: Spatial Vision—common visual
angles
Object            Distance        Angle subtended
thumbnail         arm’s length    1.5-2 deg
sun or moon       —               .5 deg
US quarter coin   arm’s length    2 deg
US quarter coin   85 m            1 min
US quarter coin   5 km            1 sec

Table 2: Common visual angles
C.1: Spatial Vision—retinal regions
• Visual field: 180° horiz. × 130° vert.
• Fovea Centralis (foveola): highest acuity
• 1.3° visual angle; 25,000 cones
• Fovea: high acuity (at 5°, acuity drops to 50%)
• 5° visual angle; 100,000 cones
• Macula: within “useful” acuity region (to about 30°)
• 16.7° visual angle; 650,000 cones
• Hardly any rods in the foveal region
C.1: Spatial Vision—visual angle and
receptor distribution
Fig.10: Retinotopic receptor distribution
C.1: Spatial Vision—visual acuity
Fig.11: Visual acuity at
eccentricities and light levels:
• at photopic (day) light levels,
acuity is fairly constant
within central 2°
• acuity drops off linearly to 5°;
drops sharply (exp.) beyond
• at scotopic (night) light
levels, acuity is poor at all
eccentricities
C.1: Spatial Vision—measuring
visual acuity
• Acuity roughly corresponds to receptor
distribution in the fovea, but not
necessarily in the periphery
• Due to various contributing factors
(synaptic organization and later-stage
neural elements), effective relative visual
acuity is generally measured by
psychophysical experimentation
C.2: Temporal Vision
• Visual response to motion is characterized
by two distinct phenomena: persistence of vision
(POV) and the phi phenomenon
• POV: essentially describes the human
temporal sampling rate
• Phi: describes the threshold above which
humans detect apparent movement
• Both phenomena are exploited in media to elicit
motion perception
C.2: Temporal Vision—persistence of
vision
Fig.12: Critical Fusion Frequency:
• stimulus flashing at about
50-60Hz appears steady
• CFF explains why flicker is
not seen when viewing
sequence of still images
• cinema: 24 fps × 3 = 72 Hz
due to 3-bladed shutter
• TV: 60 fields/sec, interlaced
C.2: Temporal Vision—phi
phenomenon
• The phi phenomenon explains why motion is
perceived in cinema, TV, and graphics
• Besides the necessary flicker rate (60 Hz), the
illusion of apparent, or stroboscopic,
motion must be maintained
• Similar to old-fashioned neon signs with
stationary bulbs
• Minimum rate: 16 frames per second
C.2: Temporal Vision—peripheral
motion perception
• Motion perception is not homogeneous
across the visual field
• Sensitivity to target motion decreases with
retinal eccentricity for slow motion...
• a higher rate of target motion (e.g., a spinning disk) is
needed to match apparent velocity in the fovea
• …but motion is more salient in the periphery
than in the fovea (it is easier to detect moving
targets than stationary ones)
C.2: Temporal Vision—peripheral
sensitivity to direction of motion
Fig.13: Threshold isograms for
peripheral rotary movement:
• periphery is twice as
sensitive to horizontal-axis
movement as to
vertical-axis movement
• (numbers in diagram
are rates of pointer
movement in rev./min.)
C.3: Color Vision—cone types
Fig.14: Spectral sensitivity curves
of cone photoreceptors
• foveal color vision is
facilitated by three types of
cone photoreceptors
• a good deal is known
about foveal color vision;
relatively little is known
about peripheral color
vision
• of the 7,000,000 cones,
most are packed tightly
into the central 30° foveal
region
C.3: Color Vision—peripheral color
perception fields
Fig.15: Visual fields for monocular
color vision (right eye)
• blue and yellow fields are
larger than red and green
fields
• most sensitive to blue, up
to 83°; red up to 76°;
green up to 74°
• chromatic fields do not
have definite borders;
sensitivity gradually and
irregularly drops off over a
15-30° range
C.4: Implications for Design of
Attentional Displays
• Need to consider distinct characteristics
of foveal and peripheral vision, in
particular:
• spatial resolution
• temporal resolution
• luminance / chrominance
• Furthermore, gaze-contingent systems
must match dynamics of human eye
movement
D: Taxonomy and Models of Eye
Movements
• Eye movements are mainly used to
reposition the fovea
• Five main classes of eye movements:
• saccadic
• vestibular
• smooth pursuit
• physiological nystagmus
• vergence
• (fixations)
• Other types of movements are non-positional (adaptation, accommodation)
D.1: Extra-Ocular Muscles
Fig.16: Extrinsic muscles of the eyes:
• in general, eyes move within 6 degrees
of freedom (6 muscles)
D.1: Oculomotor Plant
Fig.17: Oculomotor system:
• eye movement signals
emanate from three
main distinct regions:
• occipital cortex (areas
17, 18, 19, 22)
• superior colliculus (SC)
• semicircular canals
(SCC)
D.1: Oculomotor Plant (cont’d)
• Two pertinent observations:
1. the eye movement system is, to a large extent, a
feedback circuit
2. the controlling regions can be functionally
characterized as:
• voluntary (occipital cortex—areas 17, 18, 19, 22)
• involuntary (superior colliculus, SC)
• reflexive (semicircular canals, SCC)
D.2: Saccades
• Rapid eye movements used to reposition
the fovea
• Voluntary and reflexive
• Range in duration from 10 ms to 100 ms
• Effectively blind during the transition
• Deemed ballistic (pre-programmed) and
stereotyped (reproducible)
D.2: Saccades—modeling
x_t = g_0 s_t + g_1 s_{t-1} + … = Σ_{k≥0} g_k s_{t-k}

Fig.18: Linear moving average filter model:
• s_t = input (pulse), x_t = output (step), g_k = filter coefficients
• e.g., Haar filter {1, -1} (see the sketch below)
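A minimal Python sketch of the moving-average relation above. The choice of an all-ones filter to turn the pulse into a step, and the use of the two-tap Haar filter {1, -1} to recover the pulse from the step, are illustrative readings of the slide’s example, not a definitive model:

def moving_average_filter(s, g):
    # Compute x_t = sum_k g_k * s_{t-k} for a causal FIR filter g.
    x = []
    for t in range(len(s)):
        x.append(sum(g[k] * s[t - k] for k in range(len(g)) if t - k >= 0))
    return x

# Pulse input; an all-ones filter (a running sum) yields a step-like output.
pulse = [0, 0, 1, 0, 0, 0]
step = moving_average_filter(pulse, [1] * len(pulse))      # [0, 0, 1, 1, 1, 1]

# The two-tap Haar filter {1, -1} applied to the step recovers the pulse.
recovered = moving_average_filter(step, [1, -1])           # [0, 0, 1, 0, 0, 0]
print(step, recovered)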
D.3: Smooth Pursuits
• Involved when visually tracking a moving
target
• Depending on the range of target motion, the eyes
are capable of matching target velocity
• Pursuit movements are an example of a
control system with built-in negative
feedback
D.3: Smooth Pursuits—modeling
x_{t+1} = h(s_t − x_t)

Fig.19: Linear, time-invariant filter model:
• s_t = target position, x_t = (desired) eye position, h = filter
• retinal receptors give additive velocity error (see the sketch below)
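A minimal Python sketch of the negative-feedback idea, under the simplifying assumption that the filter h reduces to a constant gain applied to the position error and integrated into the eye position; the gain value and target trajectory are made up for illustration:

def pursuit_track(targets, gain=0.5, x0=0.0):
    # Negative-feedback tracking: the velocity command is a gain times
    # the position error, so x_{t+1} = x_t + gain * (s_t - x_t).
    x = [x0]
    for s_t in targets:
        x.append(x[-1] + gain * (s_t - x[-1]))
    return x

# Target moving at constant velocity; the eye lags at first, then matches
# the target's velocity with a constant positional error.
targets = [0.5 * t for t in range(10)]
print([round(v, 2) for v in pursuit_track(targets)])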
D.4: Nystagmus
• Conjugate eye movements characterized by a
sawtooth-like time course pattern (pursuits
interspersed with saccades)
• Two types (virtually indistinguishable):
• Optokinetic: compensation for retinal movement of target
• Vestibular: compensation for head movement
• May be possible to model with combination
of saccade/pursuit filters
D.5: Fixations
• Possibly the most important type of eye
movement for attentional applications
• 90% of viewing time is devoted to fixations
• duration: 150ms - 600ms
• Not technically eye movements in their
own right, rather characterized by
miniature eye movements:
• tremor, drift, microsaccades
D.6: Eye Movement Analysis
• Two significant observations:
1. only three types of eye movements are mainly
needed to gain insight into overt localization of
visual attention:
• fixations
• saccades
• smooth pursuits (to a lesser extent)
2. all three signals may be approximated by linear,
time-invariant (LTI) filter systems
D.6: Eye Movement Analysis—
assumptions
• Important point: it is assumed that observed
eye movements disclose evidence of overt
visual attention
• it is possible to attend to objects covertly (without
moving the eyes)
• Linearity: although practical, this
assumption is an operational
oversimplification of neuronal (non-linear)
systems
D.6: Eye Movement Analysis—goals
• the goal of analysis is to locate
regions where the signal
average changes abruptly
• fixation end, saccade start
• saccade end, fixation start
• two main approaches:
• summation-based
• differentiation-based
• both approaches rely on
empirical thresholds
Fig.20: Hypothetical eye movement
signal
D.6: Eye Movement Analysis—
denoising
Fig.21: Signal denoising—reduce noise due to:
• eye instability (jitter), or worse, blinks
• removal possible based on device
characteristics (e.g., blink = [0,0]; see the sketch below)
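A minimal Python sketch of device-based blink removal, assuming (as in the example on the slide) that the tracker reports blinks as the sentinel coordinate (0, 0); the sample values are made up:

def remove_blinks(samples, blink=(0.0, 0.0)):
    # Drop samples reported as blinks by the device, assumed here to be
    # flagged with the sentinel coordinate (0, 0).
    return [(x, y) for (x, y) in samples if (x, y) != blink]

raw = [(10.2, 4.1), (0.0, 0.0), (10.4, 4.0), (0.0, 0.0), (10.3, 4.2)]
print(remove_blinks(raw))   # blink samples removed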
D.6: Eye Movement Analysis—
summation based
• Dwell-time fixation detection depends on:
• identification of a stationary signal (fixation), and
• size of time window specifying range of duration
(and hence temporal threshold)
• Example: position-variance method:
• determine whether M of N points lie within a certain
distance D of the mean (μ) of the signal
• values M, N, and D are determined empirically (see the
sketch below)
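A minimal Python sketch of the position-variance test; the window contents and the M and D values are made-up placeholders for the empirically determined thresholds:

import math

def is_fixation(points, M, D):
    # Position-variance test: do at least M of the N points in the window
    # lie within distance D of the window's mean position?
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    close = sum(1 for x, y in points if math.hypot(x - mean_x, y - mean_y) <= D)
    return close >= M

# N is implied by the window length; M and D are placeholders for
# empirically determined values.
window = [(10.1, 5.0), (10.2, 5.1), (9.9, 4.9), (10.0, 5.2), (10.1, 5.0)]
print(is_fixation(window, M=4, D=0.5))   # True for this stable window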
D.6: Eye Movement Analysis—
differentiation based
• Velocity-based saccade/fixation detection:
• calculated velocity (over signal window) is
compared to threshold
• if velocity > threshold then saccade, else fixation
• Example: velocity detection method:
• use short Finite Impulse Response (FIR) filters to
detect saccade (may be possible in real-time)
• assuming a symmetrical velocity profile, can extend to
velocity-based prediction (a simple velocity-threshold sketch follows)
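A minimal Python sketch of velocity thresholding, using simple point-to-point differentiation rather than a longer FIR filter; the sampling rate, gaze samples, and 100°/s threshold are illustrative assumptions:

def classify_velocity(positions, dt, threshold):
    # Label each inter-sample interval 'saccade' or 'fixation' by comparing
    # point-to-point velocity against an empirical threshold.
    labels = []
    for p0, p1 in zip(positions, positions[1:]):
        velocity = abs(p1 - p0) / dt
        labels.append('saccade' if velocity > threshold else 'fixation')
    return labels

# 1-D gaze angle (degrees) sampled at an assumed 60 Hz; the 100 deg/s
# threshold is an illustrative value.
gaze = [10.0, 10.1, 10.0, 14.0, 18.0, 18.1, 18.0]
print(classify_velocity(gaze, dt=1 / 60, threshold=100.0))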
D.6: Eye Movement Analysis (cont’d)
(a) position-variance
(b) velocity-detection
Fig.22: Saccade/fixation detection
D.6: Eye Movement Analysis—
example
Fig.23: FIR filter velocity-detection
method based on idealized
saccade detection:
• 4 conditions on measured
acceleration:
|I_1| > A                    • acc. > thresh. A
|I_2| > B                    • acc. > thresh. B
sgn(I_2) ≠ sgn(I_1)          • sign change
T_min ≤ I_2 − I_1 ≤ T_max    • duration thresh.
• thresholds derived from
empirical values
D.6: Eye Movement Analysis—
example (cont’d)
• Amplitude thresholds A, B: derived from expected peak
saccade velocities: 600°/s
• Duration thresholds Tmin, Tmax: derived from expected
saccade duration: 120 ms - 300 ms (see the sketch after Fig.24)
Fig.24: FIR filters for saccade detection
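A minimal Python sketch of the four conditions of Fig.23, interpreting I_1 and I_2 as the onset and offset acceleration peaks and the duration condition as the time between them; the trace values, thresholds, and sampling interval are made up, and real values would be derived empirically as described above:

def saccade_events(acceleration, dt, A, B, t_min, t_max):
    # Scan an acceleration trace for the four conditions: an onset peak I1
    # with |I1| > A, a later peak I2 with |I2| > B and opposite sign,
    # separated by a duration between t_min and t_max.
    events = []
    for i, a1 in enumerate(acceleration):
        if abs(a1) <= A:
            continue
        for j in range(i + 1, len(acceleration)):
            a2 = acceleration[j]
            duration = (j - i) * dt
            if duration > t_max:
                break
            if abs(a2) > B and (a2 > 0) != (a1 > 0) and duration >= t_min:
                events.append((i, j))   # indices of the two acceleration peaks
                break
    return events

# Idealized burst: accelerate, cruise, decelerate (all values made up).
acc = [0, 0, 900, 0, 0, 0, -900, 0, 0]
print(saccade_events(acc, dt=0.01, A=600, B=600, t_min=0.02, t_max=0.1))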