
Human Vision and Cameras
CS 678
Spring 2017
Outline
• Human vision system
• Human vision for computer vision
• Cameras and image formation
• Projection geometry
• Reading: Chapter 1 (F&P book); Chapter 2 (Szeliski)
Human Eyes
• The human eye is the organ that gives us the
sense of sight, allowing us to observe and learn
more about the surrounding world than we can
with any of the other four senses.
• The eye allows us to see and interpret the
shapes, colors, and dimensions of objects in the
world by processing the light they reflect or
emit. The eye is able to detect bright light or
dim light, but it cannot sense objects when light
is absent.
Anatomy of the Human Eye
• http://www.tedmontgomery.com/the_eye/
Some Concepts
• Retina – The retina is the innermost layer of the eye and is comparable to the film inside a camera. It is composed of nerve tissue that senses the light entering the eye.
Concepts
• The macula lutea is the small, yellowish
central portion of the retina. It is the area
providing the clearest, most distinct vision.
• The center of the macula is called the fovea
centralis, an area where all of the
photoreceptors are cones; there are no rods in
the fovea.
• Learn more concepts at
http://www.tedmontgomery.com/the_eye/
Rods and Cones
• The retina contains two types of photoreceptors, rods and cones.
• The rods are more numerous, some 120 million, and are more
sensitive than the cones. However, they are not sensitive to color.
• The 6 to 7 million cones provide the eye's color sensitivity and they
are much more concentrated in the central yellow spot known as
the macula.
The Electromagnetic Spectrum
Human Vision System
• We do not “see” with our eyes, but with our
brains
Human Vision for Computer Vision
"Human vision is vastly better at recognition
than any of our current computer systems, so
any hints of how to proceed from biology are
likely to be very useful."
– David Lowe
Feedforward Processing
LGN – The lateral geniculate nucleus
V1 – The primary visual cortex
V2 – Visual area V2
IT – Inferior temporal cortex
Thorpe & Fabre-Thorpe 2001
A Hierarchical Model
• Contains alternating layers of simple and complex
cell units, creating increasing complexity [Hubel & Wiesel
1962, Riesenhuber & Poggio 1999, Serre et al. 2007]:
– Simple Cell (linear operation) – Selective
– Complex Cell (nonlinear operation) – Invariant
S1 Layer
• Being selective
• Applying Gabor filters to the input image, setting parameters to match what is known about the primate visual system
• 8 bands and 4 orientations
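A minimal sketch of such an S1-style Gabor filtering stage in Python with NumPy/SciPy; the kernel sizes, wavelengths, and sigma values below are illustrative assumptions, not the published parameters fit to primate data:

```python
import numpy as np
from scipy.signal import convolve2d

def gabor_kernel(size, wavelength, theta, sigma, gamma=0.5):
    """Real-valued Gabor kernel: a sinusoid windowed by a Gaussian."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)   # rotate to orientation theta
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr**2 + (gamma * yr)**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * xr / wavelength)
    k = envelope * carrier
    return k - k.mean()          # zero mean: no response to flat regions

def s1_responses(image, sizes=(7, 9, 11, 13)):
    """Apply a small Gabor bank: len(sizes) scale bands x 4 orientations."""
    thetas = (0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)
    maps = [np.abs(convolve2d(image, gabor_kernel(s, s / 2.0, t, s / 4.0),
                              mode='same'))
            for s in sizes for t in thetas]
    return np.stack(maps)        # (n_filters, H, W)
```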
Models with Four Layers (S1 → C1 → S2 → C2)
• Powerful for object category recognition; representative works include:
– Riesenhuber & Poggio ’99, Serre et al. ’05, ’07, Mutch & Lowe ’06
[Figure: Serre et al.’s model (PAMI’07). On-line pipeline: Gabor filtering (S1) → pixelwise MAX and spatial MAX pooling (C1) → template matching against pre-learned prototypes p1, p2, …, pN (S2) → global MAX (C2), yielding feature values v1, v2, …, vN. The prototypes are learned off-line. Example categories: airplanes, motorbikes, faces, cars.]
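Continuing the S1 sketch above, the later stages are two kinds of MAX pooling around a template-matching step. The pooling window, prototype shapes, and the Gaussian width beta below are illustrative assumptions, not the tuned values from the paper:

```python
import numpy as np

def c1_pool(s1, window=8):
    """C1: local MAX over non-overlapping spatial windows (shift invariance)."""
    n, h, w = s1.shape
    hh, ww = h // window, w // window
    blocks = s1[:, :hh * window, :ww * window].reshape(n, hh, window, ww, window)
    return blocks.max(axis=(2, 4))          # (n, hh, ww)

def s2_response(c1, proto, beta=1.0):
    """S2: Gaussian (RBF) match of one prototype patch at every position."""
    n, h, w = c1.shape
    _, ph, pw = proto.shape                 # prototype spans all n maps
    resp = np.empty((h - ph + 1, w - pw + 1))
    for i in range(resp.shape[0]):
        for j in range(resp.shape[1]):
            patch = c1[:, i:i + ph, j:j + pw]
            resp[i, j] = np.exp(-beta * np.sum((patch - proto) ** 2))
    return resp

def c2_features(c1, prototypes):
    """C2: global MAX over all positions -> one value per prototype."""
    return np.array([s2_response(c1, p).max() for p in prototypes])
```

The resulting C2 vector (the v1, …, vN in the figure) is what gets handed to an ordinary classifier.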
Biologically Inspired Models
• T. Serre, L. Wolf, S. Bileschi, M. Riesenhuber, and T.
Poggio. Robust object recognition with cortex-like
mechanisms. IEEE Trans. Pattern Anal. Mach. Intell.,
29(3):411–426, 2007.
• G. Guo, G. Mu, Y. Fu, and T. S. Huang. Human age
estimation using bioinspired features. In IEEE CVPR,
2009.
• Any other biologically inspired models?
• Optional homework: read Serre et al.’s paper or other
newer models, discuss and compare those models, and
think about their advantages and disadvantages
Cameras
Image Formation
[Figure: a digital camera and photographic film]
Alexei Efros’ slide
How do we see the world?
• Let’s design a camera
– Idea 1: put a piece of film in front of an object
– Do we get a reasonable image?
Slide by Steve Seitz
Pinhole camera
• Add a barrier to block off most of the rays
– This reduces blurring
– The opening is known as the aperture
Slide by Steve Seitz
Pinhole camera model
• Pinhole model:
– Captures pencil of rays – all rays through a single
point
– The point is called Center of Projection (focal point)
– The image is formed on the Image Plane
Slide by Steve Seitz
Dimensionality Reduction Machine (3D to 2D)
3D world
2D image
Point of observation
What have we lost?
• Angles
• Distances (lengths)
Slide by A. Efros
Figures © Stephen E. Palmer, 2002
Projection properties
• Many-to-one: any points along same ray map
to same point in image
• Points → points
– But projection of points on focal plane is
undefined
• Lines → lines (collinearity is preserved)
– But line through focal point projects to a point
• Planes → planes (or half-planes)
– But plane through focal point projects to line
Projection properties
• Parallel lines converge at a vanishing point
– Each direction in space has its own vanishing point
– But lines parallel to the image plane remain parallel
– All directions in the same plane have vanishing points on the same line
How do we construct the vanishing point/line?
Vanishing points
• Each set of parallel lines
meets at a different point
– The vanishing point for this
direction
• Sets of parallel lines on the
same plane lead to collinear
vanishing points.
– The line is called the horizon
for that plane
• Good ways to spot faked
images
– scale and perspective don’t
work
– vanishing points behave badly
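One concrete way to answer the construction question above, using the homogeneous coordinates introduced later in this lecture: the line through two image points is their cross product, and the intersection of two lines is again a cross product. A small sketch with made-up point coordinates:

```python
import numpy as np

def line_through(p, q):
    """Homogeneous line through two image points (x, y)."""
    return np.cross((p[0], p[1], 1.0), (q[0], q[1], 1.0))

def intersect(l1, l2):
    """Intersection of two homogeneous lines as an image point."""
    x, y, w = np.cross(l1, l2)
    return np.array([x / w, y / w])   # w == 0 means parallel in the image

# Two imaged parallel scene lines (e.g., the edges of a road)
vp = intersect(line_through((100, 400), (220, 250)),
               line_through((500, 400), (380, 250)))   # -> (300, 150)
# The horizon of a plane is the line through two such vanishing points
```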
Distant objects are smaller
Size is inversely proportional to distance.
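This follows directly from the projection equations derived later in this lecture: a point at height y and distance z images at height y′ = f′·y/z, so doubling the distance z halves the projected size y′.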
Perspective distortion
• What does a sphere project to?
Shrinking the aperture
• Why not make the aperture as small as possible?
– Less light gets through
– Diffraction effects…
Slide by Steve Seitz
Shrinking the aperture
The reason for lenses
Adding a lens
• A lens focuses light onto the film
– Rays passing through the center are not
deviated
Slide by Steve Seitz
Adding a lens
focal point
f
• A lens focuses light onto the film
– Rays passing through the center are not deviated
– All parallel rays converge to one point on a plane
located at the focal length f
Slide by Steve Seitz
Adding a lens
“circle of
confusion”
• A lens focuses light onto the film
– There is a specific distance at which objects are “in
focus”
• other points project to a “circle of confusion” in the image
Slide by Steve Seitz
Thin lens formula
[Figure: thin lens with an object of height y at distance D and its image of height y′ at distance D′; focal length f. Frédo Durand’s slides]
• Similar triangles everywhere!
  y′/y = D′/D (central ray)
  y′/y = (D′ − f)/f (ray through the focal point)
• Equating the two and rearranging gives the thin lens formula; any point satisfying it is in focus:
  1/D′ + 1/D = 1/f
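A tiny worked example of the thin lens formula, solving for the image distance D′ given an object distance D and focal length f (the numbers are hypothetical):

```python
def image_distance(D, f):
    """Solve 1/D' + 1/D = 1/f for D' (use the same length unit throughout)."""
    return 1.0 / (1.0 / f - 1.0 / D)

# A 50 mm lens focused on an object 2 m away:
Dp = image_distance(D=2000.0, f=50.0)   # ~51.3 mm behind the lens
# As D -> infinity, D' -> f: distant scenes focus at the focal plane
```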
Depth of Field
http://www.cambridgeincolour.com/tutorials/depth-of-field.htm
Slide by A. Efros
How can we control the depth of field?
• Changing the aperture size affects depth of field
– A smaller aperture increases the range in which the
object is approximately in focus
– But small aperture reduces amount of light – need to
increase exposure
Slide by A. Efros
Varying the aperture
Large aperture = small DOF
Small aperture = large DOF
Slide by A. Efros
Nice Depth of Field effect
Source: F. Durand
Field of View (Zoom)
Slide by A. Efros
Field of View (Zoom)
Slide by A. Efros
Field of View
[Figure: two cameras with different focal lengths f]
• FOV depends on focal length and size of the camera retina
• Smaller FOV = larger focal length
Slide by A. Efros
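The standard pinhole relation behind this slide: with sensor (“retina”) size d and focal length f, the field of view is FOV = 2·arctan(d / 2f). A quick check with assumed 35 mm-format numbers:

```python
import math

def fov_deg(sensor_mm, f_mm):
    """Pinhole field of view in degrees: 2 * atan(d / 2f)."""
    return math.degrees(2 * math.atan(sensor_mm / (2.0 * f_mm)))

print(fov_deg(36, 24))   # ~73.7 degrees: short f, wide angle
print(fov_deg(36, 200))  # ~10.3 degrees: long f, narrow (telephoto) view
```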
Field of View / Focal Length
Large FOV, small f
Camera close to car
Small FOV, large f
Camera far from the car
Sources: A. Efros, F. Durand
Same effect for faces
wide-angle
standard
telephoto
Source: F. Durand
Approximating an affine camera
Source: Hartley & Zisserman
Lens systems
• A good camera lens may
contain 15 elements and cost a
thousand dollars
• The best modern lenses may
contain aspherical elements
Lens Flaws: Chromatic Aberration
• The lens has a different refractive index for each
wavelength, which causes color fringing
Near Lens Center
Near Lens Outer Edge
Lens flaws: Spherical aberration
• Spherical lenses don’t focus light perfectly
Rays farther from the optical axis focus closer
Lens flaws: Vignetting
Radial Distortion
– Caused by imperfect lenses
– Deviations are most noticeable for rays that pass through the
edge of the lens
No distortion
Pin cushion
Barrel
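A common polynomial model of radial distortion (the form used in standard calibration toolboxes; the coefficients below are made up): each normalized image point is pushed along its radius by a factor 1 + k1·r² + k2·r⁴, with k1 < 0 giving barrel and k1 > 0 giving pincushion distortion.

```python
import numpy as np

def distort(x, y, k1, k2=0.0):
    """Apply radial distortion to normalized image coordinates (x, y)."""
    r2 = x * x + y * y
    factor = 1.0 + k1 * r2 + k2 * r2 * r2
    return x * factor, y * factor

print(distort(0.8, 0.6, k1=-0.2))     # corner point pulled inward (barrel)
print(distort(0.05, 0.05, k1=-0.2))   # near the center: almost unchanged
```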
Digital camera
• A digital camera replaces film with a sensor array
– Each cell in the array is a light-sensitive diode that converts photons to electrons
– Two common types
• Charge Coupled Device (CCD)
• Complementary metal oxide semiconductor (CMOS)
– http://electronics.howstuffworks.com/digital-camera.htm
Slide by Steve Seitz
CCD vs. CMOS
• CCD: transports the charge across the chip and reads it at one corner of the array. An analog-to-digital converter (ADC) then turns each pixel’s value into a digital value by measuring the amount of charge at each photosite and converting that measurement to binary form.
• CMOS: uses several transistors at each pixel to amplify and move the charge using more traditional wires. The CMOS signal is digital, so it needs no ADC.
http://electronics.howstuffworks.com/digital-camera.htm
http://www.dalsa.com/shared/content/pdfs/CCD_vs_CMOS_Litwiller_2005.pdf
Color sensing in camera: Color filter array
• Bayer grid
• Estimate missing components from neighboring values (demosaicing)
• Why more green? [Figure: human luminance sensitivity function, which peaks in the green part of the spectrum]
Source: Steve Seitz
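A minimal bilinear demosaicing sketch for an assumed RGGB Bayer layout; real cameras use edge-aware methods, and this naive version is exactly the kind that produces the moiré discussed next:

```python
import numpy as np
from scipy.ndimage import convolve

def demosaic_bilinear(raw):
    """Bilinear demosaicing of an RGGB Bayer mosaic (H x W float array)."""
    h, w = raw.shape
    r = np.zeros((h, w)); r[0::2, 0::2] = 1    # which sites measured red
    b = np.zeros((h, w)); b[1::2, 1::2] = 1    # ...and blue; the rest are green
    g = 1 - r - b
    k_g = np.array([[0, 1, 0], [1, 4, 1], [0, 1, 0]]) / 4.0
    k_rb = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]]) / 4.0
    rgb = np.empty((h, w, 3))
    rgb[..., 0] = convolve(raw * r, k_rb)      # interpolate each channel from
    rgb[..., 1] = convolve(raw * g, k_g)       # the sites that measured it
    rgb[..., 2] = convolve(raw * b, k_rb)
    return rgb
```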
Problem with demosaicing: color moiré
Slide by F. Durand
The cause of color moiré
[Figure: fine pattern projected onto the detector]
• Fine black and white detail in the image is misinterpreted as color information
Slide by F. Durand
Color sensing in camera: Prism
• Requires three chips and precise alignment
• More expensive
CCD(R)
CCD(G)
CCD(B)
Color sensing in camera: Foveon X3
• CMOS sensor
• Takes advantage of the fact that red, blue and green light
penetrate silicon to different depths
http://www.foveon.com/article.php?a=67
http://en.wikipedia.org/wiki/Foveon_X3_sensor
• Better image quality
Source: M. Pollefeys
Issues with digital cameras
• Noise
• low light is where you most notice noise
• light sensitivity (ISO) / noise tradeoff
• stuck pixels
• Resolution: Are more megapixels better?
• requires higher quality lens
• noise issues
• In-camera processing
• oversharpening can produce halos
• RAW vs. compressed
• file size vs. quality tradeoff
• Blooming
• charge overflowing into neighboring pixels
• Color artifacts
• purple fringing from microlenses, artifacts from Bayer patterns
• white balance
• More info online:
– http://electronics.howstuffworks.com/digital-camera.htm
– http://www.dpreview.com/
Slide by Steve Seitz
Historical context
• Pinhole model: Mozi (470-390 BCE),
Aristotle (384-322 BCE)
• Principles of optics (including lenses):
Alhacen (965-1039 CE)
• Camera obscura: Leonardo da Vinci
(1452-1519), Johann Zahn (1631-1707)
• First photo: Joseph Nicephore Niepce (1822)
• Daguerréotypes (1839)
• Photographic film (Eastman, 1889)
• Cinema (Lumière Brothers, 1895)
• Color Photography (Lumière Brothers, 1908)
• Television (Baird, Farnsworth, Zworykin, 1920s)
• First consumer camera with CCD:
Sony Mavica (1981)
• First fully digital camera: Kodak DCS100 (1990)
Alhacen’s notes
Niepce, “La Table Servie,” 1822
CCD chip
Modeling projection
[Figure: camera coordinate system with axes x, y, z]
• The coordinate system
– We will use the pinhole model as an approximation
– Put the optical center (O) at the origin
– Put the image plane (Π’) in front of O
Source: J. Ponce, S. Seitz
Modeling projection
• Projection equations
– Compute intersection with Π′ of ray from P = (x, y, z) to O
– Derived using similar triangles:
  (x, y, z) → (f′ x/z, f′ y/z, f′)
• We get the projection by throwing out the last coordinate:
  (x, y, z) → (f′ x/z, f′ y/z)
Source: J. Ponce, S. Seitz
Homogeneous coordinates
  (x, y, z) → (f′ x/z, f′ y/z)
• Is this a linear transformation?
  – No: division by z is nonlinear
• Trick: add one more coordinate:
  – homogeneous image coordinates: (x, y) ⇒ (x, y, 1)
  – homogeneous scene coordinates: (x, y, z) ⇒ (x, y, z, 1)
• Converting from homogeneous coordinates: divide by the last coordinate, (x, y, w) ⇒ (x/w, y/w)
Slide by Steve Seitz
Perspective Projection Matrix
• Projection is a matrix multiplication using homogeneous coordinates:

  [ 1  0    0    0 ] [x]   [  x   ]
  [ 0  1    0    0 ] [y] = [  y   ]
  [ 0  0  1/f′   0 ] [z]   [ z/f′ ]
                     [1]

  then divide by the third coordinate: ⇒ (f′ x/z, f′ y/z)
In practice: lots of coordinate transformations…

  2D point (3×1) = [camera-to-pixel coord. trans. matrix (3×3)] × [perspective projection matrix (3×4)] × [world-to-camera coord. trans. matrix (4×4)] × 3D point (4×1)
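A sketch of that chain with every matrix made up for illustration: assumed intrinsics K (camera-to-pixel), the canonical 3×4 projection, and a world-to-camera rigid motion that just translates along z:

```python
import numpy as np

K = np.array([[800.0,   0.0, 320.0],    # assumed focal length in pixels and
              [  0.0, 800.0, 240.0],    # principal point (image center)
              [  0.0,   0.0,   1.0]])
P = np.hstack([np.eye(3), np.zeros((3, 1))])   # canonical projection (f' = 1)
Rt = np.eye(4)
Rt[:3, 3] = [0.0, 0.0, 5.0]                    # camera 5 units from the origin

X_world = np.array([1.0, 0.5, 0.0, 1.0])       # homogeneous 3D point
x = K @ P @ Rt @ X_world                       # 3x1 homogeneous 2D point
u, v = x[:2] / x[2]                            # pixel coordinates: (480.0, 320.0)
```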
Weak perspective
• Assume all object points are at (approximately) the same depth −z0
• Projection then reduces to uniform scaling by a constant magnification m = −f′/z0:
  (x, y, z) → (m x, m y)
Orthographic Projection
• Special case of perspective projection
– Distance from center of projection to image plane is infinite
[Figure: world points projected onto the image by parallel rays]
– Also called “parallel projection”
– What’s the projection matrix?
Slide by Steve Seitz
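For reference, one answer to the question above: orthographic projection simply drops z, so in homogeneous coordinates

  [ 1  0  0  0 ] [x]   [x]
  [ 0  1  0  0 ] [y] = [y]
  [ 0  0  0  1 ] [z]   [1]
                 [1]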
Pros and Cons of These Models
• Weak perspective (including orthographic) has
simpler mathematics
– Accurate when object is small relative to its
distance.
– Most useful for recognition.
• Perspective is much more accurate for scenes.
– Used in structure from motion.
• When accuracy really matters, we must model
the real camera
– Use perspective projection with other calibration
parameters (e.g., radial lens distortion)