Graphical Models in Vision.
Alan L. Yuille.
UCLA. Dept. Statistics
The Purpose of Vision.
“To Know What is Where by Looking”.
Aristotle. (384-322 BC).
Information Processing: receive a signal by light rays and decode its information.
Vision appears deceptively simple, but there is more to Vision than meets the Eye.

Ames Room
Perspective.
What are Humans Ideal for?

Clearly humans are not good at determining the size of objects in images, at least for these types of stimuli.
But they are good at determining context and taking contextual cues into account, i.e. they use perspective cues to estimate depth and make adjustments.
What reasoning/statistical tasks are humans ideal for?
Brightness of Patterns: Adelson (MIT)
Visual Illusions

The perception of the brightness of a surface, or the length of a line, depends on context.
It does not depend only on basic measurements like the number of photons that reach the eye or the length of the line in the image.
Vision is ill-posed.
Vision is ill-posed: the data in the retina is not sufficient to determine the visual scene unambiguously.
Vision is possible because we have prior knowledge about visual scenes.


Even simple perception is an act of creation.
Perception as Inference

Helmholtz. 1821-1894.
“Perception as Unconscious Inference”.
Ball in a Box. (D. Kersten)
How Hard is Vision?

The Human Brain devotes an enormous amount of resources to vision.
(I) The optic nerve is one of the largest nerves in the body by fiber count (roughly a million fibers).
(II) Roughly half of the neurons in the cortex are involved in vision (van Essen).
If intelligence is proportional to neural activity, then vision requires more intelligence than mathematics or chess.
Vision and the Brain
Half the Cortex does Vision
Vision and Artificial Intelligence

The difficulty of vision became clearer in the 1960s, when the Artificial Intelligence community tried to design computer programs to do vision.
AI workers thought that vision was “low-level” and easy.
Prof. Marvin Minsky (pioneer of AI) asked a student to solve vision as a summer project.
Chess and Face Detection

The Artificial Intelligence community preferred Chess to Vision.
By the mid-1990s Chess programs could beat the world champion, Kasparov.
But computers could not find faces in images.
Man and Machine.

David Marr (1945-1980)

Three Levels of explanation:
1. Computation Level/Information Processing
2. Algorithmic Level
3. Hardware: Neurons versus silicon chips.
Claim: Man and Machine are similar at Level 1.
Vision: Decoding Images
Vision as Probabilistic Inference

Represent the World by S.
Represent the Image by I.
Goal: decode I and infer S.
Model image formation by a likelihood function (a generative model), P(I|S).
Model our knowledge of the world by a prior, P(S).
Bayes Theorem



Then Bayes’ Theorem states that we should infer the world S from the image I by
P(S|I) = P(I|S)P(S)/P(I).
Rev. T. Bayes. 1702-1761
Bayes to Infer S from I

P(I|S): likelihood function. P(S): prior.
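
As a minimal sketch of the rule P(S|I) = P(I|S)P(S)/P(I), the snippet below computes a posterior over a small discrete set of candidate scenes. The scene names and all the numbers are hypothetical and chosen only for illustration; the evidence P(I) is obtained by summing P(I|S)P(S) over the candidates.

```python
def posterior(prior, likelihood):
    """Return P(S|I) for each scene S, given P(S) and P(I|S) for one observed image I."""
    # Unnormalized posterior: P(I|S) * P(S) for every candidate scene S.
    joint = {s: likelihood[s] * prior[s] for s in prior}
    # Evidence P(I) = sum over S of P(I|S) P(S); it normalizes the posterior.
    evidence = sum(joint.values())
    return {s: p / evidence for s, p in joint.items()}


# Hypothetical numbers: three candidate scenes and one observed image.
prior = {"scene_a": 0.5, "scene_b": 0.3, "scene_c": 0.2}        # P(S)
likelihood = {"scene_a": 0.1, "scene_b": 0.4, "scene_c": 0.4}   # P(I|S)

post = posterior(prior, likelihood)
best = max(post, key=post.get)  # the most probable scene given the image
```

In this toy case the scene with the highest prior is not the one selected, because the image is much better explained by the other candidates.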
Ambiguity and Complexity of Images.

Similar objects give rise to very different
images. Different objects can cause similar
images.
Ideal Observers
The Image of a cylinder is consistent with
multiple objects and viewpoints.

The likelihood is ambiguous (concave or convex).
The prior resolves the ambiguity by biasing towards convex objects viewed from above.
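
The cylinder example can be sketched numerically. The values below are hypothetical: the likelihood is set equal for both shapes to stand for an image that fits either interpretation, and the prior is an assumed bias towards convex shapes. With equal likelihoods, the posterior is decided entirely by the prior.

```python
# Hypothetical numbers for the ambiguous cylinder image.
likelihood = {"convex": 0.5, "concave": 0.5}  # P(I|S): the image fits both shapes equally
prior = {"convex": 0.8, "concave": 0.2}       # P(S): assumed bias towards convex shapes

# Bayes' rule: P(S|I) is proportional to P(I|S) * P(S).
unnormalized = {s: likelihood[s] * prior[s] for s in prior}
evidence = sum(unnormalized.values())
posterior = {s: v / evidence for s, v in unnormalized.items()}

# A likelihood ratio of 1 means the image alone cannot separate the hypotheses.
likelihood_ratio = likelihood["convex"] / likelihood["concave"]
map_shape = max(posterior, key=posterior.get)
```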
Influence Graphs and Visual Tasks

Influence Graphs and the Visual Task
A Simple Taxonomy of Graphs

A Taxonomy of Graphs: [figure; panel labels B, C, D]
Examples of Vision Tasks

Visual Inference:
(1) Estimating Shape.
(2) Segmenting Images.
(3) Detecting Faces.
(4) Detecting and Reading Text.
(5) Parsing the full image: detect and recognize all objects in the image, and understand the viewed scene.
Segmentation (Level Sets)
Analysis by Synthesis

Invert the generation process to parse the image.

Probabilistic Grammars for image generation (week 2).
Probabilistic Grammars for Images

(I) Images are generated by composing visual patterns.
(II) Parse an image by decomposing it into patterns.
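
A toy illustration of step (I), assuming a very small probabilistic grammar: the symbols, rules, and probabilities below are hypothetical and are not the grammar used in the talk. Sampling from the grammar yields a parse tree whose leaves are the visual patterns; step (II), parsing, would recover such a tree from the image.

```python
import random

# Hypothetical toy grammar: each symbol maps to a list of (probability, expansion).
# Symbols without an entry are terminal visual patterns.
GRAMMAR = {
    "Scene": [(0.7, ["Background", "Object"]), (0.3, ["Background"])],
    "Background": [(0.5, ["sky"]), (0.5, ["texture"])],
    "Object": [(0.6, ["face"]), (0.4, ["text"])],
}


def generate(symbol, rng):
    """Sample a parse tree (symbol, children) by recursively expanding `symbol`."""
    if symbol not in GRAMMAR:
        return (symbol, [])  # terminal pattern: nothing left to expand
    probs, expansions = zip(*GRAMMAR[symbol])
    # Pick one production rule according to its probability.
    expansion = rng.choices(expansions, weights=probs, k=1)[0]
    return (symbol, [generate(child, rng) for child in expansion])


def leaves(tree):
    """Read off the terminal patterns of a parse tree, left to right."""
    symbol, children = tree
    if not children:
        return [symbol]
    return [leaf for child in children for leaf in leaves(child)]


tree = generate("Scene", random.Random(0))
patterns = leaves(tree)  # the visual patterns composing this sampled "image"
```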
Generative Models for Patterns

Examples of images synthesized from
generative models (MCMC).
Shape Inference
Face and Text Detection.
Text Detection
Towards Full Image Parsing

The image genome project (Zhu).

Attempt to determine the grammar for images by interactive parsing of images.

Thereby learn the statistical regularities of images: the priors and the representations.
Parse graph with horizontal relations
Example: street scene
Database
Inventory of the annotated image database by Nov. 06: 561,726 images, 3,309,257 POs in total.
PO means a parsed object node in the database.
scene: 10,139 images, 217,007 POs (outdoor, indoor, ...)
aerial image: 723 images, 48,907 POs
generic object: 22,405 images, 129,184 POs
natural: animal (land mammal, marine, insect, bird, ...), plant (flower, fruit, ...), other
manmade: vehicle, furniture, electronic, weapon, other (food, flag, container, computer, tools, music instrument, stationery, ...)
face: 1,194 images, 13,889 POs (age, pose, expression)
text: 602 images, 18,878 POs (English, Chinese)
video: 525,850 frames, 2,794,727 POs (surveillance, video clips, cartoon, movie clips)
others: 804 images, 86,665 curves (low-middle level vision: attribute curve, graphlet, weak boundary, ...)
Back to the Brain

Top level: compare human performance to Ideal Observers.

Explain human perceptual biases (visual illusions) as strategies that are “statistically effective”.
Brain Architecture



The Bayesian models have interesting analogies to the brain.
Generative models and analysis by synthesis.
Is this consistent with top-down processing? (Kersten’s talk next week).
Conclusion





Vision is unconscious inference.
The Bayesian approach leads to vision as analysis by synthesis: inverting the image generation process.
This requires “sophisticated” priors about the statistics of natural images.
This can be formulated mathematically in terms of Probabilistic Grammars for image formation.
These grammars can be learnt by analysing the “sophisticated” statistics of natural images.