Transcript Document
3D Geometry and
Camera Calibration
3D Coordinate Systems
• Right-handed vs. left-handed
[Figure: right-handed and left-handed coordinate frames, each with x, y, z axes]
2D Coordinate Systems
• y axis up vs. y axis down
• Origin at center vs. corner
• Will often write (u, v) for image coordinates
[Figure: example (u, v) image-axis conventions]
3D Geometry Basics
• 3D points = column vectors
$$p = \begin{pmatrix} x \\ y \\ z \end{pmatrix}$$
• Transformations = pre-multiplied matrices
$$Tp = \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix}$$
Rotation
• Rotation about the z axis
$$R_z = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}$$
• Rotation about x, y axes similar
(cyclically permute x, y, z)
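For reference, the cyclic permutation can be written out as follows. This is a sketch that assumes the same right-handed, counterclockwise-positive convention as the z-axis rotation, with a single angle symbol θ reused for each axis.

```latex
R_x(\theta) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{pmatrix},
\qquad
R_y(\theta) = \begin{pmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{pmatrix}
```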
Arbitrary Rotation
• Any rotation is a composition of rotations about
x, y, and z
• Composition of transformations =
matrix multiplication (watch the order!)
• Result: orthonormal matrix
– Each row, column has unit length
– Dot product of any two distinct rows (or columns) = 0
– Inverse of matrix = transpose
Arbitrary Rotation
• Rotate around x, y, then z:
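$$R = \begin{pmatrix}
\cos\theta_y \cos\theta_z & -\cos\theta_x \sin\theta_z + \sin\theta_x \sin\theta_y \cos\theta_z & \sin\theta_x \sin\theta_z + \cos\theta_x \sin\theta_y \cos\theta_z \\
\cos\theta_y \sin\theta_z & \cos\theta_x \cos\theta_z + \sin\theta_x \sin\theta_y \sin\theta_z & -\sin\theta_x \cos\theta_z + \cos\theta_x \sin\theta_y \sin\theta_z \\
-\sin\theta_y & \sin\theta_x \cos\theta_y & \cos\theta_x \cos\theta_y
\end{pmatrix}$$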
• Don’t do this! Compute simple matrices and
multiply them!
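A minimal NumPy sketch of that advice: build the three single-axis matrices and multiply them. The helper names (rot_x, rot_y, rot_z) and the angle values are illustrative assumptions, and the sign convention is the usual right-handed one.

```python
import numpy as np

def rot_x(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def rot_y(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def rot_z(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

tx, ty, tz = 0.3, -0.5, 1.2  # arbitrary example angles, in radians

# Rotate around x first, then y, then z.
# With pre-multiplied matrices, the first rotation applied sits rightmost.
R = rot_z(tz) @ rot_y(ty) @ rot_x(tx)

# Orthonormal result: R^T R = I, so the inverse equals the transpose.
is_orthonormal = np.allclose(R.T @ R, np.eye(3))
inverse_is_transpose = np.allclose(np.linalg.inv(R), R.T)

# Watch the order: composing in the opposite order gives a different matrix.
R_reversed = rot_x(tx) @ rot_y(ty) @ rot_z(tz)
order_matters = not np.allclose(R, R_reversed)
```

The product reproduces the entries of the closed-form matrix (for example, R[2, 0] equals -sin(ty)) without anyone having to type the long expression by hand.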
Scale
• Scale in x, y, z:
$$S = \begin{pmatrix} s_x & 0 & 0 \\ 0 & s_y & 0 \\ 0 & 0 & s_z \end{pmatrix}$$
Shear
• Shear parallel to xy plane:
$$\sigma_{xy} = \begin{pmatrix} 1 & 0 & \sigma_x \\ 0 & 1 & \sigma_y \\ 0 & 0 & 1 \end{pmatrix}$$
Translation
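• Can translation be represented by multiplying
by a 3×3 matrix?
• No.
• Proof: for every 3×3 matrix A, A0 = 0, so the origin always maps to itself, whereas a nonzero translation moves the origin.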
Homogeneous Coordinates
• Add a fourth dimension to each point:
$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} \rightarrow \begin{pmatrix} x \\ y \\ z \\ w \end{pmatrix}$$
• To get “real” (3D) coordinates, divide by w:
$$\begin{pmatrix} x \\ y \\ z \\ w \end{pmatrix} \rightarrow \begin{pmatrix} x/w \\ y/w \\ z/w \end{pmatrix}$$
Translation in
Homogeneous Coordinates
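$$\begin{pmatrix} 1 & 0 & 0 & t_x \\ 0 & 1 & 0 & t_y \\ 0 & 0 & 1 & t_z \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \\ w \end{pmatrix} = \begin{pmatrix} x + t_x w \\ y + t_y w \\ z + t_z w \\ w \end{pmatrix}$$
• After divide by w, this is just a translation
by (tx, ty, tz)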
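A small NumPy sketch of the same idea, under the assumption that a 3D point (x, y, z) is lifted to (xw, yw, zw, w). The helper names are illustrative, not part of the lecture.

```python
import numpy as np

def translation(tx, ty, tz):
    """4x4 homogeneous translation matrix."""
    T = np.eye(4)
    T[:3, 3] = [tx, ty, tz]
    return T

def to_homogeneous(p, w=1.0):
    """Lift a 3D point to homogeneous coordinates (x*w, y*w, z*w, w)."""
    p = np.asarray(p, dtype=float)
    return np.append(p * w, w)

def from_homogeneous(ph):
    """Divide by w to get back "real" 3D coordinates."""
    return ph[:3] / ph[3]

p = [1.0, 2.0, 3.0]
T = translation(10.0, 20.0, 30.0)

moved = from_homogeneous(T @ to_homogeneous(p))            # uses w = 1
moved_w2 = from_homogeneous(T @ to_homogeneous(p, w=2.0))  # same point, w = 2
```

Both results are (11, 22, 33): the choice of w does not change the translated point once the divide is done.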
Perspective Projection
• What does 4th row of matrix do?
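$$\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \\ w \end{pmatrix} = \begin{pmatrix} x \\ y \\ z \\ z \end{pmatrix}$$
• After divide,
$$\begin{pmatrix} x \\ y \\ z \\ z \end{pmatrix} \rightarrow \begin{pmatrix} x/z \\ y/z \\ 1 \end{pmatrix}$$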
Perspective Projection
• This is projection onto the z=1 plane
[Figure: the ray from the origin (0,0,0) through (x,y,z) meets the plane z=1 at (x/z, y/z, 1)]
• Add scaling, etc. → pinhole camera model
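The projection onto z=1 can be checked numerically. This sketch assumes points with z ≠ 0 and w = 1; the function name project is illustrative.

```python
import numpy as np

# Perspective matrix: the 4th row copies z into the w slot.
P = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 1, 0]], dtype=float)

def project(point):
    """Project a 3D point onto the z = 1 plane (requires z != 0)."""
    x, y, z = point
    h = P @ np.array([x, y, z, 1.0])  # gives (x, y, z, z)
    return h[:3] / h[3]               # gives (x/z, y/z, 1)

near = project([2.0, 1.0, 4.0])
far = project([4.0, 2.0, 8.0])  # twice as far along the same ray
```

Both points land on (0.5, 0.25, 1): every point on a ray through the origin projects to the same place, which is why depth is lost in a single image.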
Putting It All Together:
A Camera Model
$$T_{img}\, S_{pix}\, P_{cam}\, R_{cam}\, T_{cam}\, x$$
• $T_{img}$: translate to image center
• $S_{pix}$: scale to pixel size
• $P_{cam}$: perspective projection
• $R_{cam}$: camera orientation
• $T_{cam}$: camera location
• $x$: 3D point (homogeneous coords)
• Then perform homogeneous divide, and get (u,v) coords
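The chain can be sketched with 4x4 matrices in NumPy. All numbers below (camera position, focal scale in pixels, image center) are hypothetical values chosen for illustration, and the sketch assumes T_cam moves the camera location to the origin.

```python
import numpy as np

def translation(t):
    T = np.eye(4)
    T[:3, 3] = t
    return T

def rotation(R3):
    R = np.eye(4)
    R[:3, :3] = R3
    return R

def scale(sx, sy, sz=1.0):
    return np.diag([sx, sy, sz, 1.0])

P_cam = np.array([[1, 0, 0, 0],
                  [0, 1, 0, 0],
                  [0, 0, 1, 0],
                  [0, 0, 1, 0]], dtype=float)

# Hypothetical camera parameters (not from the lecture).
camera_center = np.array([0.0, 0.0, -5.0])
R3 = np.eye(3)                 # camera axes aligned with world axes
focal_px = 800.0               # scale to pixel size
image_center = (320.0, 240.0)

T_cam = translation(-camera_center)  # bring the camera location to the origin
R_cam = rotation(R3)
S_pix = scale(focal_px, focal_px)
T_img = translation([image_center[0], image_center[1], 0.0])

M = T_img @ S_pix @ P_cam @ R_cam @ T_cam

def to_pixels(X):
    """Apply the full chain, then the homogeneous divide, to get (u, v)."""
    h = M @ np.append(np.asarray(X, dtype=float), 1.0)
    return h[:2] / h[3]

u, v = to_pixels([1.0, 0.5, 5.0])
```

For this point the camera-frame coordinates are (1, 0.5, 10), so u = 800·1/10 + 320 = 400 and v = 800·0.5/10 + 240 = 280.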
Putting It All Together:
A Camera Model
$$\underbrace{T_{img}\, S_{pix}\, P_{cam}}_{\text{Intrinsics}}\; \underbrace{R_{cam}\, T_{cam}}_{\text{Extrinsics}}\; x$$
Putting It All Together:
A Camera Model
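$$T_{img}\, S_{pix}\, P_{cam}\, R_{cam}\, T_{cam}\, x$$
[Figure: the same chain annotated with names for the coordinates at each stage: world coordinates, camera coordinates, eye coordinates, normalized device coordinates, image coordinates, pixel coordinates]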
More General Camera Model
• Multiply all these matrices together
• Don’t care about “z” after transformation
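$$\begin{pmatrix} a & b & c & d \\ e & f & g & h \\ i & j & k & l \end{pmatrix} \begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix} \;\xrightarrow{\text{homogeneous divide}}\; \begin{pmatrix} \dfrac{ax + by + cz + d}{ix + jy + kz + l} \\[2ex] \dfrac{ex + fy + gz + h}{ix + jy + kz + l} \end{pmatrix}$$
• Scale ambiguity → 11 free parameters (12 matrix entries, minus one overall scale that cancels in the divide)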
Radial Distortion
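• Radial distortion cannot be represented
by a matrix
$$u_{img} \rightarrow c_u + u^*_{img}\left(1 + k\left(u^{*2}_{img} + v^{*2}_{img}\right)\right)$$
$$v_{img} \rightarrow c_v + v^*_{img}\left(1 + k\left(u^{*2}_{img} + v^{*2}_{img}\right)\right)$$
• $(c_u, c_v)$ is the image center,
$u^*_{img} = u_{img} - c_u$, $v^*_{img} = v_{img} - c_v$,
$k$ is the first-order radial distortion coefficient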
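The first-order model is short enough to write directly. This is a sketch of the formula as stated; the function name distort and the sample numbers are illustrative assumptions.

```python
def distort(u_img, v_img, cu, cv, k):
    """Apply first-order radial distortion about the image center (cu, cv)."""
    u_star = u_img - cu
    v_star = v_img - cv
    factor = 1.0 + k * (u_star ** 2 + v_star ** 2)
    return cu + u_star * factor, cv + v_star * factor

# A point 100 pixels to the right of a hypothetical center (320, 240).
u_d, v_d = distort(420.0, 240.0, 320.0, 240.0, 1e-6)

# The image center itself is a fixed point of the distortion.
u_c, v_c = distort(320.0, 240.0, 320.0, 240.0, 1e-6)
```

With k = 1e-6 the sample point moves outward by about one pixel (the factor is 1.01 at radius 100). The displacement grows with the cube of the radius, which no single matrix multiplication can reproduce.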
Camera Calibration
• Determining values for camera parameters
• Necessary for any algorithm that requires
3D ↔ 2D mapping
• Method used depends on:
– What data is available
– Intrinsics only vs. extrinsics only vs. both
– Form of camera model
Camera Calibration – Example 1
• Given:
– 3D ↔ 2D correspondences
– General perspective camera model
(11-parameter, no radial distortion)
• Write equations:
$$u_1 = \frac{a x_1 + b y_1 + c z_1 + d}{i x_1 + j y_1 + k z_1 + l}$$
$$v_1 = \frac{e x_1 + f y_1 + g z_1 + h}{i x_1 + j y_1 + k z_1 + l}$$
Camera Calibration – Example 1
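$$\left(\begin{array}{cccccccccccc}
x_1 & y_1 & z_1 & 1 & 0 & 0 & 0 & 0 & -u_1 x_1 & -u_1 y_1 & -u_1 z_1 & -u_1 \\
0 & 0 & 0 & 0 & x_1 & y_1 & z_1 & 1 & -v_1 x_1 & -v_1 y_1 & -v_1 z_1 & -v_1 \\
x_2 & y_2 & z_2 & 1 & 0 & 0 & 0 & 0 & -u_2 x_2 & -u_2 y_2 & -u_2 z_2 & -u_2 \\
0 & 0 & 0 & 0 & x_2 & y_2 & z_2 & 1 & -v_2 x_2 & -v_2 y_2 & -v_2 z_2 & -v_2 \\
\vdots & & & & & & & & & & & \vdots
\end{array}\right)
\begin{pmatrix} a \\ b \\ c \\ \vdots \\ l \end{pmatrix} = 0$$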
• Linear equation
• Overconstrained (more equations than unknowns)
• Underconstrained (rank deficient matrix – any multiple
of a solution, including 0, is
also a solution)
Camera Calibration – Example 1
• Standard linear least squares methods for
$Ax = 0$ will give the solution $x = 0$
• Instead, look for a solution with $|x| = 1$
• That is, minimize $|Ax|^2$ subject to $|x|^2 = 1$
Camera Calibration – Example 1
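• Minimize $|Ax|^2$ subject to $|x|^2 = 1$
• $|Ax|^2 = (Ax)^T(Ax) = (x^T A^T)(Ax) = x^T (A^T A) x$
• Expand x in terms of eigenvectors of $A^T A$:
$$x = \mu_1 e_1 + \mu_2 e_2 + \dots$$
$$x^T (A^T A) x = \lambda_1 \mu_1^2 + \lambda_2 \mu_2^2 + \dots$$
$$|x|^2 = \mu_1^2 + \mu_2^2 + \dots$$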
Camera Calibration – Example 1
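• To minimize
$$\lambda_1 \mu_1^2 + \lambda_2 \mu_2^2 + \dots$$
subject to
$$\mu_1^2 + \mu_2^2 + \dots = 1$$
set $\mu_{min} = 1$ and all other $\mu_i = 0$
• Thus, least squares solution is the eigenvector corresponding to the minimum (nonzero) eigenvalue of $A^T A$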
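The whole of Example 1 fits in a short NumPy sketch. Two assumptions to flag: the synthetic camera matrix M_true and the random 3D points are made up for the demonstration, and the code takes the right singular vector of A for the smallest singular value, which is the same vector as the eigenvector of AᵀA with the smallest eigenvalue but is usually better conditioned numerically.

```python
import numpy as np

def build_system(points_3d, points_2d):
    """Two rows of A per 3D-2D correspondence; unknowns are (a, b, ..., l)."""
    rows = []
    for (x, y, z), (u, v) in zip(points_3d, points_2d):
        rows.append([x, y, z, 1, 0, 0, 0, 0, -u * x, -u * y, -u * z, -u])
        rows.append([0, 0, 0, 0, x, y, z, 1, -v * x, -v * y, -v * z, -v])
    return np.array(rows, dtype=float)

def calibrate(points_3d, points_2d):
    """Minimize |Ax|^2 subject to |x|^2 = 1; return x as a 3x4 matrix."""
    A = build_system(points_3d, points_2d)
    _, _, Vt = np.linalg.svd(A)
    x = Vt[-1]  # unit vector for the smallest singular value of A
    return x.reshape(3, 4)

def project(M, X):
    """Apply a 3x4 camera matrix and the homogeneous divide."""
    h = M @ np.append(X, 1.0)
    return h[:2] / h[2]

# Synthetic test data from a made-up camera matrix.
M_true = np.array([[800.0, 0.0, 320.0, 100.0],
                   [0.0, 800.0, 240.0, 50.0],
                   [0.0, 0.0, 1.0, 5.0]])
rng = np.random.default_rng(0)
pts3d = rng.uniform(-1.0, 1.0, size=(10, 3))
pts2d = np.array([project(M_true, X) for X in pts3d])

M_est = calibrate(pts3d, pts2d)
reprojected = np.array([project(M_est, X) for X in pts3d])
max_err = np.abs(reprojected - pts2d).max()
```

M_est matches M_true only up to an overall scale (and possibly sign), which is the scale ambiguity in action; the reprojected pixel coordinates are unaffected by it.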
Camera Calibration – Example 2
• Incorporating additional constraints into camera
model
– No shear, no scale (rigid-body motion)
– Square pixels
– etc.
• These impose nonlinear constraints on camera
parameters
Camera Calibration – Example 2
• Option 1: solve for general perspective model,
then find closest solution that
satisfies constraints
• Option 2: nonlinear least squares
– Usually “gradient descent” techniques
– Common implementations available
(e.g. Matlab optimization toolbox)
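As one illustration of Option 1, here is a sketch for a single constraint: forcing a 3x3 block to be a pure rotation. It assumes "closest" means closest in the Frobenius norm, for which the SVD gives the answer directly. The function name and the perturbation values are hypothetical, and a full calibration would have to handle the other parameters as well.

```python
import numpy as np

def nearest_rotation(M):
    """Closest rotation matrix to M in the Frobenius norm, via the SVD."""
    U, _, Vt = np.linalg.svd(M)
    # Flip the last axis if needed so the result has determinant +1.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    return U @ D @ Vt

# A true rotation about z, corrupted by a small made-up perturbation.
c, s = np.cos(0.4), np.sin(0.4)
R_true = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
noise = 0.05 * np.array([[0.3, -0.1, 0.2],
                         [0.0, 0.4, -0.2],
                         [0.1, 0.1, -0.3]])
M_noisy = R_true + noise

R_fixed = nearest_rotation(M_noisy)
```

The perturbed matrix is no longer orthonormal, while R_fixed is; an exact rotation passed through nearest_rotation comes back unchanged.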
Camera Calibration – Example 3
• Incorporating radial distortion
• Option 1:
– Find distortion first (straight lines in
calibration target)
– Warp image to eliminate distortion
– Run (simpler) perspective calibration
• Option 2: nonlinear least squares
Camera Calibration – Example 4
• What if 3D points are not known?
• Structure from motion problem!
• As we saw last time, can often be solved since #
of knowns > # of unknowns
Multi-Camera Geometry
• Epipolar geometry – relationship between
observed positions of points in multiple cameras
• Assume:
– 2 cameras
– Known intrinsics and extrinsics
Epipolar Geometry
[Figure: a 3D point P observed at p1 by camera C1 and at p2 by camera C2]
Epipolar Geometry
[Figure: the same configuration, with the line l2 through p2 in the image of camera C2]
Epipolar Geometry
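[Figure: the same configuration, with l2 labeled as the epipolar line and the epipoles marked where the line joining C1 and C2 crosses each image]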
Epipolar Geometry
• Goal: derive equation for l2
• Observation: P, C1, C2 determine a plane
Epipolar Geometry
• Work in coordinate frame of C1
• Normal of plane is $T \times R p_2$, where T is relative
translation, R is relative rotation
Epipolar Geometry
• p1 is perpendicular to this normal:
$$p_1 \cdot (T \times R p_2) = 0$$
Epipolar Geometry
• Write cross product as matrix multiplication
$$T \times x = T^* x, \qquad T^* = \begin{pmatrix} 0 & -T_z & T_y \\ T_z & 0 & -T_x \\ -T_y & T_x & 0 \end{pmatrix}$$
Epipolar Geometry
• $p_1 \cdot T^* R\, p_2 = 0$
⇒ $p_1^T E\, p_2 = 0$, where $E = T^* R$
• E is the essential matrix
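The constraint can be verified on a synthetic point. This sketch assumes the relative pose convention X1 = R X2 + T, where X2 holds the coordinates of P in camera 2's frame and X1 the coordinates in camera 1's frame. The pose values and the helper name cross_matrix are made up for the example.

```python
import numpy as np

def cross_matrix(t):
    """Matrix T* such that T* @ x equals the cross product t x x."""
    tx, ty, tz = t
    return np.array([[0.0, -tz, ty],
                     [tz, 0.0, -tx],
                     [-ty, tx, 0.0]])

# Hypothetical relative pose between the two cameras.
c, s = np.cos(0.2), np.sin(0.2)
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])  # rotation about y
T = np.array([1.0, 0.1, 0.0])

E = cross_matrix(T) @ R  # essential matrix

X2 = np.array([0.3, -0.2, 4.0])  # point P in camera 2's frame
X1 = R @ X2 + T                  # the same point in camera 1's frame

p1 = X1 / X1[2]  # projections onto each camera's z = 1 plane
p2 = X2 / X2[2]

residual = p1 @ E @ p2  # should vanish up to rounding
cross_ok = np.allclose(cross_matrix(T) @ X2, np.cross(T, X2))
```

The residual is zero for any choice of P, since E depends only on R and T and not on the point being observed.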
Essential Matrix
• E depends only on camera geometry
• Given E, can derive equation for line l2
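One way to read the line off the constraint, assuming points are written as p = (x, y, 1) and a line as a coefficient vector l = (a, b, c) with ax + by + c = 0: for a fixed p1, the constraint on p2 is linear, so its coefficients give l2.

```latex
p_1^T E\, p_2 = 0
\;\Longleftrightarrow\;
(E^T p_1)^T p_2 = 0
\quad\Longrightarrow\quad
l_2 = E^T p_1, \qquad l_2 \cdot p_2 = 0
```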
Fundamental Matrix
• Can define fundamental matrix F analogously,
operating on pixel coordinates instead of
camera coordinates
$$u_1^T F\, u_2 = 0$$
• Advantage: can sometimes estimate F without
knowing camera calibration
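When the calibration is known, F and E are related by a short derivation. It assumes homogeneous pixel coordinates are obtained from camera coordinates as u_i = K_i p_i, where K_i is a 3×3 intrinsics matrix for camera i; the symbol K_i is introduced here for illustration and does not appear in the original.

```latex
p_i = K_i^{-1} u_i
\quad\Longrightarrow\quad
p_1^T E\, p_2 = u_1^T \left( K_1^{-T} E\, K_2^{-1} \right) u_2 = 0
\quad\Longrightarrow\quad
F = K_1^{-T} E\, K_2^{-1}
```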