Slide 1 - University of Haifa


Articulated Body Tracking
Eran Sela
Articulated Body
• A general 3D motion can be modeled as the motion of a group of joints connected by links.
• An articulated body consists only of joints and fixed-length limbs.
Motivation
• Given input data such as depth maps, color images and silhouette maps, we will look today at two works on:
• How to implement real-time skeleton tracking of an articulated body.
• The tracking can be used to drive computer graphics models and to capture the 3D motion of the human body.
Tracking Methods
• Supervised or semi-supervised learning trackers:
Training decision trees or other statistical models on labeled and unlabeled data.
• Model-based skeleton tracking:
Modeling the human body with primitives/surfaces and fitting the model to the data using an optimization scheme.
• Image-processing-based tracking:
Generating a skeleton from mathematical conditions that the data must satisfy.
Presentation timeline
1. Articulated Soft Objects for Video-based Body Modeling
• Modeling the articulated body
• Fitting the model to the data (least-squares optimization framework)
• Data constraints
• Results
2. A Multiple Hypothesis Approach to Figure Tracking
• Introduction
• The 2D Scaled Prismatic Model
• Mode-based Multiple-Hypothesis Tracking
• Multiple Modes as Piecewise Gaussians
• Results
Articulated Soft Objects for Video-based Body Modeling
Input:
A video sequence providing:
• A depth map (from stereo cameras or another method).
• A silhouette map (the points where the line of sight from the camera is tangent to the surface).
Output:
• A set of 3D ellipsoid primitives, with translation, orientation and scale, corresponding to the articulated body parts.
Modelling with Primitives vs Soft Objects
Problem: primitive models such as cylinders and spheres are too crude for precise recovery of both shape and motion.
Solution: use soft objects.
Each primitive defines a field function, and the skin is taken to be a level set of the sum of these fields.
Advantages:
• Effective use of stereo and silhouette data.
• Accurate shape description with a small number of parameters.
• Explicit modeling of 3D geometry.
Modelling the body parts:
State vector θ, where:
• B: number of body parts
• N: number of consecutive frames
• J: number of joints
The state vector θ changes on each frame.
Generalized algebraic surfaces
Example in 2D:
• $f_1^{\text{implicit}}: e^{-2(x^2+y^2)} = 0.1$
• $f_2^{\text{implicit}}: e^{-2((x-1)^2+(y-0.5)^2)} = 0.1$
• $f^{\text{implicit}}: e^{-2(x^2+y^2)} + e^{-2((x-1)^2+(y-0.5)^2)} = 0.1$
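The blending effect of summing the fields can be checked numerically. The sketch below (Python with NumPy; the helper names are illustrative) evaluates the two 2D fields above at a point on the perpendicular bisector of the two centers. That point lies outside each individual level set $f = 0.1$ but inside the level set of their sum, which is what lets a soft object bridge two primitives smoothly.

```python
import numpy as np

T = 0.1  # threshold used in the 2D example


def f1(x, y):
    """First source field, centered at (0, 0)."""
    return np.exp(-2.0 * (x**2 + y**2))


def f2(x, y):
    """Second source field, centered at (1, 0.5)."""
    return np.exp(-2.0 * ((x - 1.0) ** 2 + (y - 0.5) ** 2))


def soft_field(x, y):
    """Soft object: the skin is the level set soft_field == T of the summed fields."""
    return f1(x, y) + f2(x, y)


# A point on the perpendicular bisector of the two centers, at distance 1.0 from their midpoint.
midpoint = np.array([0.5, 0.25])
normal = np.array([-0.5, 1.0]) / np.linalg.norm([-0.5, 1.0])
px, py = midpoint + 1.0 * normal

inside_circle_1 = f1(px, py) >= T
inside_circle_2 = f2(px, py) >= T
inside_soft_object = soft_field(px, py) >= T
print(inside_circle_1, inside_circle_2, inside_soft_object)
```

Both single-source tests are false (each field is about 0.072 there), while the summed field (about 0.145) exceeds the threshold.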
Generalized algebraic surfaces
Example in 3D:
• $D(x,y,z) = \sum_i b_i\, e^{-a_i\left((x-x_{0,i})^2+(y-y_{0,i})^2+(z-z_{0,i})^2\right)}$
• Surface: $D(x,y,z) = 1$, with $a_i = 2$ for every $i \in \{1, 2\}$
Metaballs
Blinn [2]
Metaballs (generalized algebraic surfaces) are defined by a summation over n 3-dimensional Gaussian density distributions, each called a source or primitive.
The final surface S is found where the density function F equals some threshold T; in our case:
$F(x,y,z) = \sum_{i=1}^{n} f_i\big(d_i(x,y,z)\big), \qquad S = \{(x,y,z) \mid F(x,y,z) = T\}$
• $d(x,y,z)$ is an algebraic ellipsoidal distance function (defined next).
• $f_i(d)$ is a 1D function that is differentiable over the whole domain and has a long-range effect because it approaches zero slowly.
Ellipsoids as sources
Why choose ellipsoids as sources for metaballs?
• They are simple.
• They allow accurate modeling of human limbs with relatively few primitives.
• Their shape is controlled by higher-level width and length parameters, so problems such as over-fitting to high-curvature regions do not occur.
Next we define the 3D quadratic distance function d() from a point (x,y,z) to each ellipsoid source.
3D Quadratic distance
For a specific metaball and a state vector θ we define the 4x4 matrix:
$Q_\theta = L_\theta^{w,l} \cdot C_\theta^{w,l} \cdot S_\theta^{l,r}$
$LC_\theta^{w,l} = L_\theta^{w,l} \cdot C_\theta^{w,l}$
• $LC_\theta^{w,l}$ is the scaling and translation along the major axis of the ellipsoid.
• $L$ holds the radii of the ellipsoid (half the axis lengths along the principal directions).
• $C$ is the primitive's center.
• $\theta^w, \theta^l$ are the coefficients from the state vector.
3D Quadratic distance
• $L = (l_x, l_y, l_z)$ is the initial scaling of each ellipsoid, chosen proportional to the body-part dimensions to prevent over-fitting to high-curvature regions. These parameters are constant for each part across all frames.
• $\theta^w, \theta^l$ are the width and length scaling coefficients of the ellipsoid. The width scaling is identical for the x and y axes.
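World frame and joint frame
What changes every frame?
• The translation of each ellipsoid center (the vector C) is constant.
• The scaling of each body part, $\theta^w$ and $\theta^l$, does not change from frame to frame.
• The global pose $\theta^g$ (translation and rotation of the world frame) and the joint rotations $\theta^r$ do change.
[Figure: World frame ($\theta^g$: translation, rotation) → Joint 1 frame ($\theta^r$: rotation) → Joint 2 frame ($\theta^r$: rotation)]
• E is the per-joint rotation matrix to the quadric frame and is constant across frames.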
3D Quadratic distance
• $S_\theta$ is the skeleton-induced transformation: a 4x4 rotation-translation matrix from the world frame to the frame to which the metaball is attached.
• Given the rotation of a joint J, $S_\theta$ is written as the product of two homogeneous 4x4 transformations: $E$, from the joint frame to the quadric frame, and the transformation from the world frame to the joint frame.
• $d(x, \theta)$, computed from $Q_\theta$, is the ellipsoidal quadratic distance field.
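A minimal NumPy sketch of this construction, under stated assumptions: all matrices are homogeneous 4x4 transforms, the radii are taken as $(l_x\theta^w, l_y\theta^w, l_z\theta^l)$ (width scaling shared by x and y), $S_\theta$ is passed in already composed from the global pose, joint rotations and $E$, and the distance is the squared norm of the first three components of $Q_\theta x$, so that d = 1 exactly on the ellipsoid. The function names are hypothetical, and the original paper may treat the homogeneous coordinate differently.

```python
import numpy as np


def rigid(R, t):
    """4x4 homogeneous transform mapping x to R x + t."""
    M = np.eye(4)
    M[:3, :3] = R
    M[:3, 3] = t
    return M


def quadric_matrix(S, center, l, theta_w, theta_l):
    """Q = L . C . S for one ellipsoidal source (illustrative parameterization)."""
    C = rigid(np.eye(3), -np.asarray(center, dtype=float))  # move the center to the origin
    radii = np.array([l[0] * theta_w, l[1] * theta_w, l[2] * theta_l])
    L = np.diag(np.append(1.0 / radii, 1.0))                # divide each axis by its radius
    return L @ C @ S


def quadratic_distance(Q, x):
    """d(x): squared norm of the first three components of Q x; d == 1 on the ellipsoid."""
    u = (Q @ np.append(np.asarray(x, dtype=float), 1.0))[:3]
    return float(u @ u)


# Body part with identity skeleton transform: center (0, 0, 1), radii (0.5, 0.5, 2).
Q = quadric_matrix(np.eye(4), [0.0, 0.0, 1.0], [1.0, 1.0, 2.0], theta_w=0.5, theta_l=1.0)

# Same part when S_theta contains a 90-degree rotation about z.
Rz90 = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
Q_rot = quadric_matrix(rigid(Rz90, np.zeros(3)), [0.0, 0.0, 1.0], [1.0, 1.0, 2.0], 0.5, 1.0)
```

Points with d < 1 are inside the ellipsoid; the field function $f_i$ is then applied to d to obtain that source's contribution to F.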
Least Squares Framework
• In each frame we are given 3D data points $x \in \mathbb{R}^3$.
• Based on the state vector θ, we define an observation to be the difference between the total field function and the threshold value.
• This constrains the point to lie on the surface parameterized by the state vector θ:
$obs_i = F(x_i, \theta) - T, \quad i \in 1..n_{obs}$, where $n_{obs}$ is the number of 3D points.
Least Squares Framework
$F(x, \theta) = \sum_{i=1}^{n} f_i\big(d_i(x, \theta)\big)$
A least-squares optimization framework is used to estimate the state vector parameters:
$obs_i = F(x_i, \theta) - T$
$y_i(\theta) = obs_i - \varepsilon_i, \quad 1 \le i \le n_{obs}$
• $y_i(\theta)$ is the observation equation for the least-squares framework.
• $\varepsilon_i$ is the deviation from the model; $n_{obs}$ is the number of observations.
We minimize $v^T P v$, where $v = [\varepsilon_1, \dots, \varepsilon_{n_{obs}}]^T$ and $P$ is the diagonal weight matrix associated with the observations.
Each weight $p_i$ can be set according to the type of observation: object-space coordinates, silhouette rays, or temporal constraints.
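The weighted objective can be made concrete with a short sketch. It assumes spherical Gaussian sources $f_i(d) = b_i e^{-a_i d}$ with d the squared Euclidean distance (a simplification of the ellipsoidal d above), treats each residual $\varepsilon_i$ as $F(x_i,\theta) - T$, and stores P as a vector of diagonal weights. All names are hypothetical.

```python
import numpy as np


def total_field(x, centers, a, b):
    """F(x) = sum_i b_i * exp(-a_i * d_i(x)), d_i = squared distance to source i (assumption)."""
    d = np.sum((centers - x) ** 2, axis=1)
    return float(np.sum(b * np.exp(-a * d)))


def weighted_cost(points, weights, centers, a, b, T):
    """Return v^T P v and v, where v_i = F(x_i, theta) - T and P = diag(weights)."""
    v = np.array([total_field(x, centers, a, b) - T for x in points])
    return float(v @ (weights * v)), v


# One source at the origin; with a = 2 and T = exp(-2) the surface is the unit sphere.
centers = np.zeros((1, 3))
a, b = np.array([2.0]), np.array([1.0])
T = np.exp(-2.0)
points = np.array([[1.0, 0.0, 0.0],    # on the surface: zero residual
                   [0.0, 0.0, 0.0]])   # at the source center: positive residual
weights = np.array([1.0, 3.0])         # e.g. a larger weight for a silhouette observation
cost, v = weighted_cost(points, weights, centers, a, b, T)
```

Storing P as a weight vector rather than a dense matrix keeps the cost linear in the number of observations, which matters with thousands of 3D points per frame.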
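Least Squares Framework
The optimization problem is solved with the Levenberg-Marquardt algorithm, which solves the least-squares problem and yields the new state vector θ.
The Jacobian is computed for any point x via the chain rule:
$\frac{\partial F(x,\theta)}{\partial \theta} = \sum_{i=1}^{n} f_i'\big(d_i(x,\theta)\big)\, \frac{\partial d_i(x,\theta)}{\partial \theta}$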
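A toy Levenberg-Marquardt loop illustrates the fitting step. As an assumption for brevity, the model is a single spherical soft primitive $F(x) = e^{-a\lVert x - c\rVert^2}$ with state $\theta = (c_x, c_y, c_z, a)$, the Jacobian is the analytic chain-rule derivative, and the damping uses Marquardt's diagonal scaling. It is a sketch of the technique, not the authors' implementation; the start value is placed near the solution, as when tracking from the previous frame.

```python
import numpy as np

T = 0.1  # field threshold defining the surface


def residuals(theta, pts):
    """F(x_i, theta) - T for a single spherical soft primitive, theta = (cx, cy, cz, a)."""
    c, a = theta[:3], theta[3]
    d = np.sum((pts - c) ** 2, axis=1)
    return np.exp(-a * d) - T


def jacobian(theta, pts):
    """Chain rule: dF/dc = 2 a f (x - c), dF/da = -d f."""
    c, a = theta[:3], theta[3]
    diff = pts - c
    d = np.sum(diff ** 2, axis=1)
    f = np.exp(-a * d)
    J = np.empty((len(pts), 4))
    J[:, :3] = (2.0 * a * f)[:, None] * diff
    J[:, 3] = -d * f
    return J


def levenberg_marquardt(theta, pts, weights, iters=100, lam=1e-2):
    """Minimize sum_i p_i * r_i^2 with Marquardt-scaled damping."""
    def cost(th):
        return float(np.sum(weights * residuals(th, pts) ** 2))

    current = cost(theta)
    for _ in range(iters):
        r = residuals(theta, pts)
        J = jacobian(theta, pts)
        JtP = J.T * weights                 # J^T P for a diagonal P
        A, g = JtP @ J, JtP @ r
        step = np.linalg.solve(A + lam * np.diag(np.diag(A)), -g)
        candidate = theta + step
        new = cost(candidate)
        if new < current:                   # accept: behave more like Gauss-Newton
            theta, current, lam = candidate, new, lam * 0.3
        else:                               # reject: behave more like gradient descent
            lam *= 10.0
    return theta


# Synthetic data: points on a sphere of radius 0.8 around c_true.
rng = np.random.default_rng(0)
dirs = rng.normal(size=(200, 3))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
c_true = np.array([0.2, -0.1, 0.3])
pts = c_true + 0.8 * dirs

theta = levenberg_marquardt(np.array([0.25, -0.15, 0.35, 3.0]), pts, np.ones(len(pts)))
radius = np.sqrt(-np.log(T) / theta[3])  # surface radius implied by the fitted a
```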
Silhouette Observations
Silhouette points are defined as the points where the line of sight from the camera is perpendicular to the surface normal.
Why is silhouette data important?
Integrate silhouette constraint
Given the last state vector $\theta \in \Theta$, a silhouette point x whose line of sight has direction r satisfies:
$F(x, \theta) = T, \qquad \nabla_x F(x, \theta) \cdot r = 0$
Integrate silhouette constraint
• We integrate silhouette observations into the framework by first performing a search along the line of sight (using Brent's line minimization) to find the point that is closest to the model in its current configuration.
• Once this closest silhouette point is found, it is given a higher weight in the weight matrix P, so that silhouette points contribute more to the fit.
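The line search can be sketched as follows, under explicit assumptions: "closest to the model" is taken to mean the point on the line of sight where the field F is largest, the model is a single spherical primitive, and a golden-section search stands in for Brent's method (Brent's method adds parabolic interpolation on top of the same bracketing idea). The weight value 5.0 is hypothetical.

```python
import math

import numpy as np


def field(x):
    """Illustrative model: one spherical soft primitive at the origin (assumption)."""
    return math.exp(-2.0 * float(np.dot(x, x)))


def golden_section_max(g, lo, hi, tol=1e-8):
    """Maximize a unimodal 1D function g on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if g(c) > g(d):
            b = d   # maximum lies in [a, d]
        else:
            a = c   # maximum lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return 0.5 * (a + b)


def silhouette_observation(origin, direction, t_range, weight=5.0):
    """Point on the line of sight closest to the model, plus its (larger) weight in P."""
    direction = direction / np.linalg.norm(direction)
    t_best = golden_section_max(lambda t: field(origin + t * direction), *t_range)
    return origin + t_best * direction, weight


point, w = silhouette_observation(np.array([-5.0, 0.9, 0.0]), np.array([1.0, 0.0, 0.0]), (0.0, 10.0))
```

The returned point then enters the least-squares problem like any other observation, only with the larger weight.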
Fitting Result
Sensor configuration:
• Depth is acquired by 3 cameras in an L configuration, taking non-interlaced images at 30 frames/s with an effective resolution of 640 x 400.
• The stereo algorithm produces very dense point clouds, which are then filtered to yield about 4000 evenly distributed 3D points on the surface of the subject.
• The top row shows the original sequences of upper-body motions of different people; the bottom row shows the tracking and fitting results. Although the two people have very different body sizes, the system adapts the generic model accordingly.
Fitting Result
[Figure: fitting results for the first person and the second person]
End of topic 1
A Multiple Hypothesis Approach to Figure Tracking
• 2D human figure tracking.
• A probabilistic approach to estimating the 2D human figure model.
• Maintains a set of possible tracking solutions (hypotheses).
• Every possible track can potentially be updated with every new measurement.
• Over time, the track branches into many possible directions.
Used in radars
• MHT (Multiple Hypothesis Tracking) is designed for situations in which the target motion model is very unpredictable, since all potential track updates are considered.
• As each radar update is received, every possible track can potentially be updated with the new measurement. Over time, the track branches into many possible directions.
The 2D Scaled Prismatic Model
How can we enforce the 3D kinematic constraints of the model while conforming to 2D monocular image data?
Scaled Prismatic Models (SPM):
• Each link in a scaled prismatic model describes the image-plane projection of an associated rigid link in an underlying 3D kinematic chain.
• Each link has 2 DOF: the distance between the joint centers of adjacent links, and the rotation angle at its joint center around an axis perpendicular to the image plane.
• It captures the foreshortening that occurs when 3D links rotate into and out of the image plane.
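A minimal sketch of how an SPM chain maps its 2 DOF per link to image positions. Assumptions: joint angles are relative to the previous link, and foreshortening is illustrated with orthographic projection, where a 3D link of length ℓ tilted by α out of the image plane projects to length ℓ cos α. The function names are hypothetical.

```python
import math


def spm_chain(base, params):
    """Image-plane joint positions of an SPM chain.

    base: (u, v) image position of the chain root.
    params: list of (theta, d) per link: theta is the rotation at the link's joint
            (relative to the previous link), d is the link's projected length.
    """
    u, v = base
    phi = 0.0
    joints = [(u, v)]
    for theta, d in params:
        phi += theta                # rotation about the axis normal to the image plane
        u += d * math.cos(phi)      # prismatic extension along the link direction
        v += d * math.sin(phi)
        joints.append((u, v))
    return joints


def foreshortened_length(length_3d, tilt):
    """Orthographic projection of a 3D link tilted out of the image plane by 'tilt' radians."""
    return length_3d * math.cos(tilt)


# An arm: 30 px upper arm in the image plane, 25 px forearm tilted 60 degrees out of it.
forearm = foreshortened_length(25.0, math.pi / 3)
joints = spm_chain((0.0, 0.0), [(0.0, 30.0), (math.pi / 2, forearm)])
```

The tracker never needs the 3D tilt itself: the projected length d is estimated directly, which is why the SPM link is "scaled prismatic".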
Tracking problem representation
• We model the 2D human figure as a branched SPM chain.
• Each link in the arms, legs and head is modeled as an SPM link.
• Each link has 2 DOF, giving a total body model with 18 DOF.
• The tracking problem consists of estimating a vector of SPM parameters for the figure in each frame of a video sequence, given some initial state.
Probability Density Representation
• The choice of representation for the probability density of the tracker state is dominated by two concerns:
• The unimodality constraint imposed by a Gaussian-based parametric representation, such as the Kalman filter, makes it inaccurate when tracking in a cluttered environment.
• A sample-based representation (such as the one used in the CONDENSATION algorithm) requires a prohibitive number of samples to encode the probability distribution of a high-DOF SPM model.
Condensation Algorithm
• The Condensation algorithm is an application of particle filtering. In hand tracking, for example:
• Observations and hidden states are represented by hand contours, e.g. as splines or as lists of angles between phalanges.
• There is a model for P(next state | previous state). It can be set manually by studying the anatomy of the hand, or learned from many example sequences of hand movement, e.g. recorded with special gloves that report the exact hand location and shape.
• The observation likelihood P(observation | state) is estimated using visual features (SIFT, Harris, etc.).
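One select-predict-measure cycle of a particle filter, in the spirit of Condensation, can be sketched on a toy problem. Assumptions: a 1D state instead of hand contours, a constant-drift dynamic model with Gaussian process noise, and a Gaussian observation likelihood; none of these come from the slides.

```python
import numpy as np


def condensation_step(particles, weights, z, rng, drift=1.0, process_sd=0.5, meas_sd=1.0):
    """One select-predict-measure cycle for a toy 1D state."""
    # Select: resample particles in proportion to their current weights.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    particles = particles[idx]
    # Predict: apply the dynamic model P(next state | previous state).
    particles = particles + drift + rng.normal(0.0, process_sd, size=particles.shape)
    # Measure: weight each particle by the likelihood P(observation | state).
    w = np.exp(-0.5 * ((z - particles) / meas_sd) ** 2)
    return particles, w / w.sum()


rng = np.random.default_rng(1)
particles = rng.uniform(-5.0, 5.0, size=2000)
weights = np.full(2000, 1.0 / 2000)
for t in range(1, 11):  # the true state moves one unit per frame and is observed directly
    particles, weights = condensation_step(particles, weights, float(t), rng)
estimate = float(np.sum(weights * particles))
```

With one DOF, 2000 particles are plenty; the number needed to cover the space grows rapidly with dimension, which is the cost the slides point to for an 18-DOF SPM model.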
Probability Density Representation
A hybrid approach:
• Supports a multimodal description but requires fewer samples for modeling.
• The representation retains only the modes (peaks) of the probability density and models the local neighborhood surrounding each mode with a Gaussian.
MHT Algorithm
• Input: a video sequence containing one or more humans.
• Output: for each frame, a state vector with values for all DOF of the SPM chains that make up the model.
Mode-based Multiple-Hypothesis Tracking
The tracking problem is solved in each frame by finding the $x_t$ that maximizes (Bayes' rule):
$p(x_t \mid Z_t) = k\, p(z_t \mid x_t)\, p(x_t \mid Z_{t-1})$
where:
• $x_t$ is the tracker state at time t.
• $z_t$ is the observed data at time t.
• $Z_t$ is the aggregation of the past image observations (i.e. $z_\tau$ for $\tau = 0, \dots, t$).
• $k$ is a normalization constant.
• $z_t$ is assumed to be conditionally independent of $Z_{t-1}$ given $x_t$.
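The Bayes update can be illustrated on a discretized 1D state space. The two-mode prior and the likelihood below are made-up examples; the point is that the posterior is the pointwise product of likelihood and prior, and that k only normalizes it, so it does not change which $x_t$ maximizes it.

```python
import numpy as np

x = np.linspace(-5.0, 5.0, 1001)   # discretized 1D state space
dx = x[1] - x[0]


def gauss(x, m, s):
    """Unnormalized Gaussian bump."""
    return np.exp(-0.5 * ((x - m) / s) ** 2)


prior = 0.6 * gauss(x, -1.0, 0.8) + 0.4 * gauss(x, 2.0, 0.5)  # p(x_t | Z_{t-1}), two modes
likelihood = gauss(x, 1.8, 0.6)                                # p(z_t | x_t)
unnormalized = likelihood * prior
k = 1.0 / (unnormalized.sum() * dx)                            # normalization constant
posterior = k * unnormalized                                   # p(x_t | Z_t)
x_map = x[np.argmax(posterior)]
```

Although the prior's stronger mode is at -1, the likelihood selects the weaker mode near 2, which is exactly the situation where a unimodal tracker that had committed to -1 would be lost.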
The algorithm
The stages of the algorithm at each time frame are:
1. Generate the new prior density $p(x_t \mid Z_{t-1})$ by passing the modes of $p(x_{t-1} \mid Z_{t-1})$ through the Kalman filter prediction step.
2. Likelihood computation, involving:
(a) Creating initial hypothesis seeds by sampling the distribution $p(x_t \mid Z_{t-1})$.
(b) Refining the hypotheses through differential state-space search to obtain the modes of the likelihood $p(z_t \mid x_t)$.
(c) Measuring the local statistics associated with each likelihood mode (saving the modes selected in the likelihood, to be updated later).
3. Compute the posterior density $p(x_t \mid Z_t)$ via Bayes' rule, then update and select the set of modes.
Generating Prior Distributions
• Obtaining the prior density $p(x_t \mid Z_{t-1})$ for the next time frame is similar to the Kalman filter prediction step.
• $v$ is obtained from a naive constant-velocity predictor, e.g. $v = \frac{x_{t-1} - x_{t-2}}{\Delta t}$.
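A sketch of the naive constant-velocity prediction applied mode by mode. Assumptions: each mode carries its own state at t-1 and t-2 (so the mode correspondence across frames is known), and the covariance is simply inflated by a process-noise term Q, i.e. A = I for the covariance. The names are hypothetical.

```python
import numpy as np


def predict_modes(modes, dt, Q):
    """Constant-velocity prediction of each mode of p(x_{t-1} | Z_{t-1}).

    modes: list of (x_prev, x_prev2, S): the mode's state at t-1 and t-2 and its covariance.
    Returns (predicted mean, predicted covariance) pairs for p(x_t | Z_{t-1}).
    """
    predicted = []
    for x_prev, x_prev2, S in modes:
        v = (x_prev - x_prev2) / dt      # naive constant-velocity estimate
        mean = x_prev + v * dt           # x_t ~ x_{t-1} + v * dt
        cov = S + Q                      # inflate uncertainty by process noise
        predicted.append((mean, cov))
    return predicted


modes = [(np.array([1.0, 2.0]), np.array([0.5, 2.5]), 0.1 * np.eye(2))]
pred = predict_modes(modes, dt=1.0 / 30.0, Q=0.01 * np.eye(2))
```

With this predictor the mean simply becomes $2x_{t-1} - x_{t-2}$; the frame interval cancels out for the mean but still matters if Q is scaled with Δt.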
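[Figure: the modes of $p(x_{t-1} \mid Z_{t-1})$ and the velocity $v$ are passed through the Kalman filter to obtain $p(x_t \mid Z_{t-1})$]
Kalman Filter
State prediction: $x_k = A x_{k-1} + B u_k + w_k$
Measurement prediction: $z_k = H x_k + v_k$
• $x_k$: state prediction
• $u_k$: control signal (most of the time there is no control signal)
• $w_k$: process noise
• $A, B, H$: matrices that define the physics of interest (position, speed, acceleration, ...)
• $z_k$: measurement prediction
• $v_k$: measurement noise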
Kalman Filter
• The Kalman filter equations fall into two groups:
o Time update equations (prediction)
o Measurement update equations (correction)
• The time update equations project the current state and error covariance estimates forward in time to obtain the a priori estimates for the next time step.
• The measurement update equations provide the feedback, i.e. they incorporate a new measurement into the a priori estimate to obtain an improved a posteriori estimate.
Kalman Filter
Predict
1. Predict the state ahead: $\hat{x}_t^- = A \hat{x}_{t-1}$
2. Predict the error covariance ahead: $\Sigma_t^- = A \Sigma_{t-1} A^T + Q$
Update
1. Update the state estimate: $\hat{x}_t = \hat{x}_t^- + K_t \left(z_t - H \hat{x}_t^-\right)$
2. Update the error covariance: $\Sigma_t = \left(I - K_t H\right) \Sigma_t^-$
where the Kalman gain $K_t$ is: $K_t = \Sigma_t^- H^T \left(H \Sigma_t^- H^T + R\right)^{-1}$
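The predict/update equations translate directly into NumPy. This is a generic textbook Kalman filter without a control term; the position-velocity model, noise levels and measurements in the usage part are illustrative assumptions.

```python
import numpy as np


def kalman_predict(x, Sigma, A, Q):
    """Time update: project the state and error covariance ahead."""
    x_pred = A @ x
    Sigma_pred = A @ Sigma @ A.T + Q
    return x_pred, Sigma_pred


def kalman_update(x_pred, Sigma_pred, z, H, R):
    """Measurement update: correct the a priori estimate with measurement z."""
    K = Sigma_pred @ H.T @ np.linalg.inv(H @ Sigma_pred @ H.T + R)  # Kalman gain
    x = x_pred + K @ (z - H @ x_pred)
    Sigma = (np.eye(len(x_pred)) - K @ H) @ Sigma_pred
    return x, Sigma


A = np.array([[1.0, 1.0], [0.0, 1.0]])  # constant-velocity model, dt = 1
H = np.array([[1.0, 0.0]])              # only the position is measured
Q = 1e-4 * np.eye(2)
R = np.array([[0.1]])
x, Sigma = np.zeros(2), 10.0 * np.eye(2)
for t in range(1, 21):                  # the target moves one unit per step
    x, Sigma = kalman_predict(x, Sigma, A, Q)
    x, Sigma = kalman_update(x, Sigma, np.array([float(t)]), H, R)
```

Only positions are observed, yet the filter recovers the velocity (about 1) through the coupling in A; in the MHT tracker only the predict half is applied to each mode to form the prior.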
Multiple Modes as Piecewise Gaussians
• Representing the $p(x_t \mid Z_{t-1})$ pdf efficiently:
• When modes occur in clusters (as is often the case), it is erroneous to use the individual modes directly as components in a Gaussian sum representation.
• This can result in a cluster of weaker modes being over-represented at the expense of strong but isolated modes.
• In contrast, with a MoG (mixture of Gaussians), a good Gaussian-sum approximation requires a complex fitting process (e.g. the EM algorithm).
Multiple Modes as Piecewise Gaussians
Given a set of N modes, where the i-th mode has state $m_i$, estimated covariance $S_i$ and probability $p_i$, an accurate construction of the probability density function requires a local maximum of value $p_i$ located at each $m_i$, with the neighborhood surrounding $m_i$ being approximately Gaussian with covariance $S_i$.
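One way to meet these requirements is to let each point be governed by the mode whose scaled Gaussian is largest there, which matches the "i-th mode's Gaussian is the maximum" test used when sampling. The sketch below evaluates such an unnormalized piecewise-Gaussian density; the mode values are made up.

```python
import numpy as np


def pwg_components(x, modes):
    """Scaled Gaussian of every mode at x: p_i * exp(-0.5 (x - m_i)^T S_i^{-1} (x - m_i))."""
    values = []
    for m, S, p in modes:
        diff = x - m
        values.append(p * np.exp(-0.5 * diff @ np.linalg.solve(S, diff)))
    return np.array(values)


def pwg_density(x, modes):
    """Piecewise Gaussian (unnormalized): x is governed by the mode whose scaled Gaussian is largest."""
    return float(np.max(pwg_components(x, modes)))


modes = [
    (np.array([0.0, 0.0]), 0.5 * np.eye(2), 0.7),  # (m_i, S_i, p_i)
    (np.array([3.0, 0.0]), 0.2 * np.eye(2), 0.3),
]
```

Taking the maximum instead of the sum is what keeps a tight cluster of weak modes from adding up and overshadowing a strong isolated one: the value at each $m_i$ stays (close to) $p_i$.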
Sampling from Piecewise Gaussians
Sampling the distribution $p(x_t \mid Z_{t-1})$:
1. Select the i-th mode with probability $p_i$ from the set of N modes.
2. Obtain a single sample $s$ from the original Gaussian distribution associated with the i-th mode.
3. If $s$ lies within the boundaries of the i-th mode, i.e. the i-th mode's Gaussian is the maximum at $s$:
$p_i\, e^{-\frac{1}{2}(s-m_i)^T S_i^{-1}(s-m_i)} \ge p_j\, e^{-\frac{1}{2}(s-m_j)^T S_j^{-1}(s-m_j)} \quad \text{for all } j$,
accept it; otherwise reject it.
4. Return to step 1 until the required number of accepted samples has been obtained.
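The four steps map onto a short rejection sampler. This sketch uses NumPy's `Generator.choice` and `Generator.multivariate_normal`; the acceptance test in step 3 is the "largest scaled Gaussian" condition, and the example modes are made up.

```python
import numpy as np


def component_values(s, means, covs, probs):
    """p_j * exp(-0.5 (s - m_j)^T S_j^{-1} (s - m_j)) for every mode j."""
    return np.array([p * np.exp(-0.5 * (s - m) @ np.linalg.solve(S, s - m))
                     for m, S, p in zip(means, covs, probs)])


def sample_pwg(means, covs, probs, n, rng):
    """Draw n samples from a piecewise-Gaussian density by rejection."""
    probs = np.asarray(probs, dtype=float)
    probs = probs / probs.sum()
    samples = []
    while len(samples) < n:                               # step 4: repeat until enough
        i = rng.choice(len(means), p=probs)               # step 1: pick mode i
        s = rng.multivariate_normal(means[i], covs[i])    # step 2: sample its Gaussian
        if np.argmax(component_values(s, means, covs, probs)) == i:  # step 3: in mode i's region?
            samples.append(s)                             # accept; otherwise reject
    return np.array(samples)


means = [np.array([0.0, 0.0]), np.array([10.0, 0.0])]
covs = [np.eye(2), np.eye(2)]
samples = sample_pwg(means, covs, [0.8, 0.2], 2000, np.random.default_rng(2))
fraction_near_first = float(np.mean(samples[:, 0] < 5.0))
```

For well-separated modes almost nothing is rejected and about 80% of the samples land near the first mode; rejections only occur where the modes' regions overlap.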
Sampling from Piecewise Gaussians
[Flowchart: select mode i with probability $p_i$ → obtain sample s from mode i → check whether s satisfies p(s): if not, reject it; if so, keep it → repeat until enough samples have been accepted]
Template Registration
• To estimate the likelihood distribution, template images of the model must first be registered.
• This can be done, for example, by sampling random values for the SPM model chains and rendering a 3D graphics model of a person whose joints conform to the model state.
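Likelihood Computation
• $u$ represents image pixel coordinates.
• $I(u)$ are the image pixel values at $u$.
• $T(u, x_t)$ are the overlapping template pixel values at $u$ when the SPM model has state $x_t$. The templates can be images rendered from the model at state $x_t$ (template registration process).
• $\sigma^2$ is the pixel noise variance (this has to be known a priori or obtained experimentally).
Assuming independent Gaussian pixel noise, the likelihood is $p(z_t \mid x_t) \propto \exp\left(-\frac{1}{2\sigma^2}\sum_u \big(I(u) - T(u, x_t)\big)^2\right)$.
We maximize it by minimizing the negative log-likelihood:
$-\log p(z_t \mid x_t) = \frac{1}{2\sigma^2}\sum_u \big(I(u) - T(u, x_t)\big)^2 + \text{const}$
using the iterative Gauss-Newton method.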
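Gauss-Newton refinement of a hypothesis can be shown on a toy 1D "image". Assumptions: the rendered template is a Gaussian bump whose position is the (scalar) state, and the observed image is that same bump at the true position without noise. Real SPM templates have many DOF, but the update $x \leftarrow x - (J^T J)^{-1} J^T r$ has the same form.

```python
import numpy as np

u = np.linspace(0.0, 10.0, 201)   # pixel coordinates of a toy 1D image
WIDTH = 1.0                        # width of the rendered bump (hypothetical)


def template(x):
    """Hypothetical rendered template T(u, x): a bump whose position is the state x."""
    return np.exp(-0.5 * ((u - x) / WIDTH) ** 2)


def neg_log_likelihood(image, x, sigma2):
    """-log p(z_t | x_t) up to a constant, assuming i.i.d. Gaussian pixel noise."""
    r = image - template(x)
    return float(r @ r) / (2.0 * sigma2)


def gauss_newton(image, x0, iters=20):
    """Gauss-Newton on a scalar state; sigma^2 only scales the cost, so it drops out."""
    x = x0
    for _ in range(iters):
        T = template(x)
        r = image - T                          # residuals I(u) - T(u, x)
        J = -T * (u - x) / WIDTH**2            # dr/dx = -dT/dx
        x = x - float(J @ r) / float(J @ J)    # x <- x - (J^T J)^{-1} J^T r
    return x


image = template(3.2)          # observed image I(u): the figure is actually at 3.2
x_hat = gauss_newton(image, x0=2.8)
```

Each hypothesis seed sampled from the prior is refined this way; seeds that converge to different minima become the separate likelihood modes.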
Deriving Posterior Distributions
• We calculate the posterior for each mode based on the prior density $p(x_t \mid Z_{t-1})$ and the likelihood $p(z_t \mid x_t)$, both represented as PWG.
• We then update the selected modes (those chosen in the likelihood maximization) using the calculated posterior.
• To prevent an exponential increase in the number of modes, in the experiments each likelihood mode generates one posterior mode by combining with the most compatible prior mode.
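A sketch of the one-posterior-mode-per-likelihood-mode rule. The slides do not define "most compatible", so as an assumption compatibility is scored as the prior weight times the overlap integral of the two Gaussians, $p_j\,\mathcal{N}(m_l; m_j, S_l + S_j)$. The matched pair is combined with the standard Gaussian product formula. Names are hypothetical.

```python
import numpy as np


def gaussian_overlap(m1, S1, m2, S2):
    """Integral of N(x; m1, S1) N(x; m2, S2) dx, which equals N(m1; m2, S1 + S2)."""
    S = S1 + S2
    d = m1 - m2
    norm = np.sqrt((2.0 * np.pi) ** len(d) * np.linalg.det(S))
    return float(np.exp(-0.5 * d @ np.linalg.solve(S, d)) / norm)


def posterior_modes(likelihood_modes, prior_modes):
    """One posterior mode per likelihood mode, paired with its most compatible prior mode."""
    out = []
    for ml, Sl, pl in likelihood_modes:
        scores = [pp * gaussian_overlap(ml, Sl, mp, Sp) for mp, Sp, pp in prior_modes]
        j = int(np.argmax(scores))                      # most compatible prior mode
        mp, Sp, _ = prior_modes[j]
        S = np.linalg.inv(np.linalg.inv(Sl) + np.linalg.inv(Sp))  # product of Gaussians
        m = S @ (np.linalg.solve(Sl, ml) + np.linalg.solve(Sp, mp))
        out.append((m, S, pl * scores[j]))
    total = sum(w for _, _, w in out)
    return [(m, S, w / total) for m, S, w in out]


lik = [(np.array([1.0]), np.array([[1.0]]), 0.5), (np.array([5.0]), np.array([[1.0]]), 0.5)]
pri = [(np.array([0.0]), np.array([[1.0]]), 0.5), (np.array([6.0]), np.array([[1.0]]), 0.5)]
post = posterior_modes(lik, pri)
```

The number of posterior modes equals the number of likelihood modes, so the mode set cannot grow exponentially from frame to frame.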
Example of the process for each frame
[Flowchart for a new image I:
1. Pass $p(x_{t-1} \mid Z_{t-1})$ through the Kalman filter to get $p(x_t \mid Z_{t-1})$.
2. Create initial hypotheses by sampling from $p(x_t \mid Z_{t-1})$.
3. Estimate the likelihood $p(z_t \mid x_t)$ by maximum-likelihood search from the new image I and the templates.
4. Save the modes selected in the likelihood maximization.
5. Compute the posterior $p(x_t \mid Z_t)$ and update the selected modes.]
Experimental Results
• The algorithm was tested on three sequences involving Fred Astaire from the movie 'Shall We Dance'. A 2D 19-DOF SPM model is manually initialized in the first image frame, after which tracking is fully automatic.
• First experiment: the probability distribution over the state space is described by only 1 mode (unimodal).
• Second experiment: the probability distribution over the state space is typically described by 10 modes in a PWG representation.
Experimental Results
• Single-hypothesis tracker (initialized with a single mode): the tracker fails to handle the self-occlusion caused by Fred Astaire's legs crossing.
Experimental Results
• Multiple-hypothesis tracker (initialized with 10 modes):
• Top row: the multiple modes of the tracker.
• Bottom row: the dominant mode, demonstrating the tracker's ability to handle ambiguous situations and thus survive the occlusion event.
References
• Plankers, R. and Fua, P., "Articulated Soft Objects for Video-based Body Modeling", ICCV 2001.
• Cham, T.J. and Rehg, J.M., "A Multiple Hypothesis Approach to Figure Tracking", CVPR 1999 (II:239-245).
The End