CSE 415 Intro Artificial Intelligence
Image Understanding 1
Outline:
Saccade Art: What's going on?
Vision and Intelligence
Motivating applications and ideas
Human vision and illusions
Image representation:
Sampling, Quantization, Thresholding
Stereo vision as an AI problem
Stereograms, Geometry of stereograms, Computing correspondences
Letting cues vote for hypotheses:
Polar representation of a line, Hough transform
CSE 415 -- (c) S. Tanimoto, 2008
Image Understanding I
Saccade Art
A great example is at the San Francisco Exploratorium.
An online example is at
http://www.cs.washington.edu/research/metip/SaccArt/
Block off all but one stripe. Can you still see the effect? How
few stripes are enough for you?
Is it an image? Is there any image?
Is Vision Part of Intelligence?
25% of the brain by volume is concerned with vision.
This is Brodmann area 17, the striate cortex (primary visual cortex, V1).
image from: http://en.wikipedia.org/wiki/Primary_Visual_Cortex
Vision Requires Intelligence
1. The image is usually missing relevant information.
2. The gaps must be filled in by making inferences using knowledge and context.
3. How are these inferences made? At many levels. At the retinal level, relevant structure is extracted from "receptive fields". The brain sends expectations to the eyes. An intermediate-level theory is Marr's "Primal Sketch". At the high level: object recognition.
4. Active vision is the use of motion, including eye movements, to gather scene data; the eyes must be controlled by the brain.
5. "Visual thinking" includes spatial reasoning, navigation, pattern recognition, associative memory based on visual aspects, and perception itself.
6. Specific phenomena that reflect on visual intelligence include unawareness of blind spots, visual illusions, pareidolia, and hallucination.
Motivation
Allow computers and robots to read books.
Allow mobile robots to navigate using vision.
Support applications in industrial inspection, medical image
analysis, security and surveillance, and remote sensing of the
environment.
Permit computers to recognize users’ faces and fingerprints, and to
track users in various environments.
Provide prostheses for the blind.
Develop artistic intelligence.
Human Vision
25% of brain volume is allocated to visual perception.
Human vision is a parallel & distributed system,
involving 2 eyes, retinal processing, and multiple
layers of processing in the striate cortex.
Most humans are trichromats and perceive color in a 3-D color
space; the exceptions are dichromats and monochromats.
Vision provides a high-bandwidth input mechanism...
“a picture is worth 1000 words.”
The Human Eye
Retina: cross section ((a) schematic, (b) photo)
Densities of Rods and Cones
Visual Pathway
Visual Illusions
Help us understand
• the limits of human perception
• the processes of perception
• ways to produce effects in art and architecture
• possible approaches to artificial perception
Visual Illusions
They provide insights about the nature of the human visual
system, helping us understand how it works.
Mueller-Lyer illusion
Hermann Grid Illusion
Hermann Grid Illusion (dark on light)
Subjective Contour (Triangle)
Dalmatian Illusion
Camouflage vs Acute Perception
Hyperacute perception: Hallucination
Pareidolia: attributing significance to patterns perceived in random
arrangements (UFOs, visions in the twilight, etc.)
From http://www.gifford.co.uk/~principia/Illusions
Perception: Stimulus + Expectation
Image understanding is the interpretation of visual
stimuli using context and knowledge.
Image understanding (IU) by computer normally begins with digital
images from a camera.
Image Representation
Sampling: Number and density of “pixel” measurements
Quantization: Number of levels permitted in pixel values.
Image Representation (cont.)
Sampling: e.g., 4 by 4, square grid, 1 pixel/cm
Quantization: e.g., binary, {0, 1}, 0 = black, 1 = white.
0 0 1 1
1 0 0 0
0 1 1 1
0 0 0 0
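Sampling and quantization can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name `sample_and_quantize` and the hypothetical "ramp" scene are made up for the example, each pixel is sampled at its centre, and brightness is assumed to lie in [0, 1] with 0 = black, as on the slide.

```python
def sample_and_quantize(f, n, spacing_cm, levels):
    """Sample brightness f(x, y) (x, y in cm, values in [0, 1]) on an
    n-by-n square grid, one sample at the centre of each pixel, then
    quantize each sample uniformly to the integers 0 .. levels - 1."""
    image = []
    for row in range(n):
        y = (row + 0.5) * spacing_cm            # centre of this pixel row
        pixels = []
        for col in range(n):
            x = (col + 0.5) * spacing_cm        # centre of this pixel column
            v = min(max(f(x, y), 0.0), 1.0)     # clamp brightness to [0, 1]
            pixels.append(min(int(v * levels), levels - 1))
        image.append(pixels)
    return image


def ramp(x, y):
    # Hypothetical scene: brightness rises from black (left) to white (right).
    return x / 4.0


# 4 by 4 square grid, 1 pixel/cm, binary quantization.
for row in sample_and_quantize(ramp, n=4, spacing_cm=1.0, levels=2):
    print(row)  # [0, 0, 1, 1] on every row
```

Raising `levels` (for example to 256) gives finer quantization of the same samples; changing `n` and `spacing_cm` changes the sampling.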
Aliasing due to Under-sampling
Here the apparent frequency is about 1/5 the true frequency.
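The figure's actual frequencies are not given, so the numbers below are hypothetical: a pattern of 5 cycles per cm sampled at 6 samples per cm. A sinusoid of frequency f sampled at rate fs gives the same samples as one at |f - k·fs|, where k is the integer nearest f/fs. With these numbers the alias is 1 cycle per cm, one fifth of the true frequency. A cosine (zero phase) is used so the alias matches without a sign flip.

```python
import math


def alias_frequency(f, fs):
    """Apparent frequency of a sinusoid of frequency f sampled at rate fs:
    fold f into [0, fs/2] by subtracting the nearest multiple of fs."""
    k = round(f / fs)
    return abs(f - k * fs)


f_true, fs = 5.0, 6.0          # hypothetical: 5 cycles/cm sampled at 6 samples/cm
f_alias = alias_frequency(f_true, fs)
print(f_alias, f_alias / f_true)  # 1.0 0.2

# The samples of the true pattern and of the slow alias are identical.
for n in range(12):
    x = n / fs
    true_sample = math.cos(2 * math.pi * f_true * x)
    alias_sample = math.cos(2 * math.pi * f_alias * x)
    assert abs(true_sample - alias_sample) < 1e-9
```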
Shannon/Nyquist Sampling
A band is a range of frequency values. (But sometimes it's
defined as a range of wavelengths.)
A signal that is bandlimited to band B has no frequency
components outside of B.
Now we'll assume B = [0, fmax].
Theorem:
If a continuous signal z(t) is bandlimited to B and is sampled at a
frequency fs > 2 fmax, then z(t) can be perfectly reconstructed from
the samples.
2 fmax is called the Nyquist rate.
fs / 2 is called the Nyquist frequency, and depends on fs.
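The reconstruction in the theorem is usually done with the Whittaker-Shannon interpolation formula, z(t) = Σ z(n/fs) sinc(fs·t - n), where sinc(u) = sin(πu)/(πu). The sketch below assumes a hypothetical 1 Hz sine sampled at fs = 4 Hz (above the 2 Hz Nyquist rate). The theorem needs infinitely many samples; this sketch keeps only 4001 of them, so the reconstruction between sample instants is close but not exact.

```python
import math


def sinc(u):
    """Normalized sinc: sin(pi*u) / (pi*u), with sinc(0) = 1."""
    return 1.0 if u == 0 else math.sin(math.pi * u) / (math.pi * u)


def reconstruct(samples, fs, t):
    """Whittaker-Shannon interpolation from samples z(n / fs).
    `samples` maps integer n -> z(n / fs); the infinite sum is
    truncated to the n that are present."""
    return sum(z_n * sinc(fs * t - n) for n, z_n in samples.items())


f_max = 1.0                        # hypothetical signal: a 1 Hz sine
fs = 4.0                           # fs > 2 * f_max, the Nyquist rate
z = lambda t: math.sin(2 * math.pi * f_max * t)
samples = {n: z(n / fs) for n in range(-2000, 2001)}

t = 0.3                            # a time between sample instants
print(z(t), reconstruct(samples, fs, t))  # close; truncation leaves a small error
```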
Minimal Sampling
Let P be an oscillating pattern in an image.
To capture P in a sampled representation, you need
(1) the rest of the image to be bandlimited to P's frequency,
and either
(2a) two samples per cycle and luck (the phase must be right),
or
(2b) more than two samples per cycle.
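The role of luck in case (2a) can be seen by sampling a cosine pattern at exactly two samples per cycle with two different phases. The helper `sample_pattern` is a hypothetical name for this sketch; samples are taken at unit spacing, so the pattern frequency is given in cycles per sample.

```python
import math


def sample_pattern(cycles_per_sample, phase, count):
    """Sample cos(2*pi*f*x + phase) at x = 0, 1, 2, ... (unit sample spacing),
    where f = cycles_per_sample."""
    return [math.cos(2 * math.pi * cycles_per_sample * n + phase)
            for n in range(count)]


# (2a) Exactly two samples per cycle (f = 0.5 cycle per sample).
lucky = sample_pattern(0.5, 0.0, 6)            # samples land on peaks and troughs
unlucky = sample_pattern(0.5, math.pi / 2, 6)  # samples land on zero crossings
print([round(v, 6) for v in lucky])            # [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
print(max(abs(v) for v in unlucky) < 1e-9)     # True: the pattern has vanished

# (2b) Three samples per cycle still see the pattern at this unlucky phase.
three = sample_pattern(1 / 3, math.pi / 2, 6)
print(max(abs(v) for v in three) > 0.8)        # True
```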
Quantization
Capturing a wide dynamic range of brightness levels or
colors requires fine quantization. A common choice is 256 levels
(8 bits) for each of red, green, and blue.
Segmentation is simplified by having a small number of
levels, provided foreground and background pixels are
reliably distinguished by their dark or light values.
Grayscale thresholding is typically used to reduce the
number of quantization levels to 2.
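Grayscale thresholding maps every pixel at or above a threshold t to 1 (white) and every other pixel to 0 (black). In this minimal sketch the function name, the made-up 4 by 4 grayscale image, and the threshold t = 128 are all assumptions; in practice t is often chosen from the image histogram (for example by Otsu's method).

```python
def threshold(gray, t):
    """Reduce a grayscale image (rows of ints 0..255) to two levels:
    1 (white) where the pixel is >= t, else 0 (black)."""
    return [[1 if p >= t else 0 for p in row] for row in gray]


gray = [                       # made-up 8-bit grayscale values
    [12, 30, 200, 220],
    [180, 25, 40, 35],
    [20, 190, 210, 230],
    [15, 10, 28, 22],
]
for row in threshold(gray, 128):   # t = 128: an arbitrary mid-range choice
    print(row)
```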
Vision as Inferring Information from Clues
Deriving 3D structure from 2D info requires
additional information: e.g., constraints.
Deriving global descriptions from local data
requires information fusion, i.e., inference.
Stereo Vision as an AI Problem
Projection from 3 dimensions to 2 loses information.
With 2 projections, we can gain back some of that information.
Recovering the missing information is an inference problem.
The missing information is constrained by knowledge about the
real world and assumptions about the scene.
The use of knowledge and assumptions to make inferences is a
standard approach in artificial intelligence.
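Both halves of the argument can be sketched with a pinhole camera model. This is an illustration under assumptions not in the slides: two identical cameras with parallel optical axes, a hypothetical focal length f and horizontal baseline b, and known correspondence between the two views. One projection maps every point on a ray to the same image point; with two views the horizontal disparity d = x_left - x_right equals f·b/z, so depth comes back as z = f·b/d.

```python
def project(point, f):
    """Pinhole projection of a 3-D point (x, y, z), z > 0, onto an image
    plane at focal distance f: depth z is divided out and lost."""
    x, y, z = point
    return (f * x / z, f * y / z)


def depth_from_disparity(x_left, x_right, f, baseline):
    """Parallel cameras separated horizontally by `baseline`:
    disparity d = x_left - x_right = f * baseline / z, so z = f * baseline / d."""
    d = x_left - x_right
    return f * baseline / d


f, b = 1.0, 0.1                           # hypothetical focal length and baseline
near, far = (0.2, 0.1, 2.0), (0.4, 0.2, 4.0)  # two points on the same ray
print(project(near, f), project(far, f))  # same image point: depth is lost

# A second camera, shifted by b along x, sees the two points differently.
for p in (near, far):
    xl, _ = project(p, f)
    xr, _ = project((p[0] - b, p[1], p[2]), f)  # coordinates relative to camera 2
    print(depth_from_disparity(xl, xr, f, b))   # approximately p's depth z
```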
Stereo and Stereograms
A stereogram can help us understand what
information is required by a human to make
convincing inferences about depth.
This can provide a model for a stereo image
understanding system.
Stereograms
Two-view stereograms:
1. spatially separated left-eye/right-eye pair (including virtual-reality goggles).
2. superimposed, with separation using color filters.
3. superimposed, with temporal shuttering.
4. superimposed, with separation using polarizing filters.
Single-view stereograms:
1. Magic-eye pictures with depth-modulated carrier.
2. Wallpaper offering depth effects due to its periodicity.
Geometry of Stereograms
Computing Correspondence
Approach 1: Extract features and find a consistent
matching of features in each view.
Approach 2: Directly compute a disparity map,
performing local correlations of the views.
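Approach 2 can be sketched on a single scanline of a rectified stereo pair. For each left pixel the sketch tries every shift up to a maximum and keeps the shift whose surrounding window best matches the right view. As an assumption, it scores matches with the sum of squared differences, a common stand-in for correlation (lower is better). The function name, window size, and synthetic data are hypothetical.

```python
import random


def disparity_row(left, right, max_disp, half):
    """Disparity along one scanline by local matching.
    For each left pixel x, try shifts d = 0..max_disp and keep the d whose
    window right[x-d-half .. x-d+half] best matches left[x-half .. x+half]
    (smallest sum of squared differences). Border pixels get None."""
    width = len(left)
    disp = [None] * width
    for x in range(half + max_disp, width - half):
        best_d, best_cost = None, float("inf")
        for d in range(max_disp + 1):
            cost = sum((left[x + k] - right[x - d + k]) ** 2
                       for k in range(-half, half + 1))
            if cost < best_cost:
                best_d, best_cost = d, cost
        disp[x] = best_d
    return disp


rng = random.Random(0)
left = [rng.randint(0, 255) for _ in range(40)]  # random texture, one scanline
right = left[3:] + [0, 0, 0]                     # same scene shifted 3 pixels
print(disparity_row(left, right, max_disp=6, half=2))  # 3 wherever a window fits
```

Random texture makes the match unambiguous; on uniform regions many shifts score equally well, which is one reason Approach 1 (matching distinctive features) is sometimes preferred.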
Processing Incomplete and Uncertain Evidence:
How it's sometimes handled in image understanding
Case study: the Hough Transform
(rhymes with "rough France dorm")
Inferring Trends via Voting Methods
The classical Hough Transform identifies prominent
lines in a scene by letting each edge point vote for the
line(s) it is on.
Voting methods can do well under noisy conditions.
Votes are tallied in an array of accumulators, indexed
by theta and rho (polar parameters of a line).
ρ = x cos θ + y sin θ.
Letting a Point Vote for All the Lines that Pass Through It
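Hough Transform: Polar representation
ρ = x cos θ + y sin θ.
[Figure: a line through the point (x, y); ρ is its distance from the origin (0, 0) and θ is the angle of its normal.]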
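The polar form can be checked numerically: every point on one line gives the same ρ for that line's θ, while a single point, swept over all θ, traces a sinusoid of (θ, ρ) pairs, one for each line through it. The example line x + y = 2 (normal at 45°, distance √2 from the origin) is hypothetical.

```python
import math


def rho(x, y, theta):
    """Polar line parameter: rho = x cos(theta) + y sin(theta)."""
    return x * math.cos(theta) + y * math.sin(theta)


# Points on the line x + y = 2: its normal is at 45 degrees, rho = sqrt(2).
theta = math.pi / 4
for x, y in [(0, 2), (1, 1), (2, 0), (3, -1)]:
    print(round(rho(x, y, theta), 6))   # 1.414214 each time

# One point votes for one line per theta; its votes trace a sinusoid:
# rho = sqrt(2) * cos(theta - 45 deg), i.e. 1, 1.414, 1, 0 at 0, 45, 90, 135 deg.
curve = [(t, rho(1, 1, math.radians(t))) for t in range(0, 180, 45)]
print(curve)
```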
Hough Transform (Cont.)
Nondirectional, unweighted Hough Transform:
$$H(\theta,\rho) = \sum_x \sum_y f(x,y)\,\delta(x\cos\theta + y\sin\theta - \rho)$$
where
$$\delta(z) = \begin{cases} 1 & \text{if } |z| < 1 \\ 0 & \text{otherwise} \end{cases}$$
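The accumulation formula translates almost directly into code. In this sketch the binary image f is given as the list of (x, y) points where f = 1, θ is quantized into `n_theta` steps over [0, π), and ρ into integers from -`rho_max` to `rho_max`; these resolutions and the sample points are assumptions. Because only integers within distance 1 of a point's ρ value can get a vote, each point adds at most two votes per θ. This window also lets neighbouring θ cells collect the same votes, which is why the peak-detection step follows.

```python
import math


def hough(points, n_theta, rho_max):
    """Nondirectional, unweighted Hough transform.
    points: (x, y) pixels where f(x, y) = 1 (all other pixels have f = 0).
    Row i of the accumulator is theta = i * pi / n_theta; column j is the
    integer rho = j - rho_max. A point adds one vote to cell (theta, rho)
    whenever delta(x cos theta + y sin theta - rho) = 1, i.e. |z| < 1."""
    acc = [[0] * (2 * rho_max + 1) for _ in range(n_theta)]
    for x, y in points:
        for i in range(n_theta):
            theta = i * math.pi / n_theta
            r = x * math.cos(theta) + y * math.sin(theta)
            # Only the integers floor(r) and floor(r) + 1 can lie within 1 of r.
            for rho in (math.floor(r), math.floor(r) + 1):
                if abs(r - rho) < 1 and -rho_max <= rho <= rho_max:
                    acc[i][rho + rho_max] += 1
    return acc


line = [(x, 5) for x in range(10)]   # ten edge points on the line y = 5
noise = [(2, 8), (7, 1), (4, 3)]     # three stray edge points
rho_max = 15
acc = hough(line + noise, n_theta=180, rho_max=rho_max)
print(acc[90][5 + rho_max])          # 10: every line point votes for theta = 90 deg, rho = 5
```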
Hough Transform Peak Detection
After vote accumulation:
Apply smoothing to suppress non-dominant peaks.
Extract peaks.
Trace lines in image space to determine endpoints.
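The first two steps can be sketched on a small made-up accumulator. As assumptions, the sketch uses a 3x3 box average for smoothing and treats a cell as a peak if it reaches a threshold and no 8-neighbour is larger; the thresholds are arbitrary. It also ignores the wrap-around in θ (θ and θ + π with ρ negated describe the same line). Line tracing is not shown.

```python
def smooth(acc):
    """3x3 box average of the accumulator (cells outside the array count as 0)."""
    n, m = len(acc), len(acc[0])
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            total = 0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    if 0 <= i + di < n and 0 <= j + dj < m:
                        total += acc[i + di][j + dj]
            out[i][j] = total / 9.0
    return out


def peaks(acc, min_value):
    """Cells (i, j, value) that reach min_value and are not smaller than any 8-neighbour."""
    n, m = len(acc), len(acc[0])
    found = []
    for i in range(n):
        for j in range(m):
            v = acc[i][j]
            if v < min_value:
                continue
            neighbours = [acc[i + di][j + dj]
                          for di in (-1, 0, 1) for dj in (-1, 0, 1)
                          if (di or dj) and 0 <= i + di < n and 0 <= j + dj < m]
            if all(v >= w for w in neighbours):
                found.append((i, j, v))
    return found


votes = [                 # made-up accumulator: one broad peak, one isolated spike
    [0, 2, 1, 0, 0, 0],
    [2, 9, 3, 0, 0, 0],
    [1, 3, 2, 0, 0, 5],
    [0, 0, 0, 0, 0, 0],
]
print(peaks(votes, 4))            # [(1, 1, 9), (2, 5, 5)]: the spike counts as a peak
print(peaks(smooth(votes), 1.0))  # only the broad peak at (1, 1) survives smoothing
```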