Video-Based People Tracking


Brubaker, Sigal and Fleet
Presented by Patrick Davis
1

The process to track people…
◦ Process the image
◦ Get the skeleton
◦ Try to guess the posture/movement

Challenges
◦ Things that cover your bones
 Muscle
 Fat
 Clothes
◦ Things that clutter the image

Lots of Formulas…
◦ I will not cover those

Lots of Models used to simplify processing
2


Even though this paper was written in 2010,
it does not discuss the Kinect.
Fair warning
◦ If you ask about the Kinect or how it works, I will
not know the answer.
◦ Most of the paper is concerned with single-camera
processing; the Kinect is not just one
camera.
3


Tracking boils down to finding a sequence
of probable poses
The formulas mostly define what the paper calls the
likelihood
◦ The likelihood is used to approximate the current body
position.
◦ The per-frame results are then combined to find the most
probable motion
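
As a rough sketch of how a likelihood and a motion model are usually combined, here is the standard Bayesian filtering recursion. The symbols are assumptions introduced for illustration and are not taken from the slides: x_t is the pose at frame t, and z_t is the image measurement at frame t.

```latex
p(x_t \mid z_{1:t}) \;\propto\;
  \underbrace{p(z_t \mid x_t)}_{\text{likelihood}}
  \int \underbrace{p(x_t \mid x_{t-1})}_{\text{motion model}}
       \, p(x_{t-1} \mid z_{1:t-1}) \, dx_{t-1}
```

Read from right to left: take the belief about the previous pose, push it through the motion model, then reweight it by how well each candidate pose explains the current image.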
4

The skeleton can be treated as a structure
◦ Think of a tree with hinges on each of the branches
◦ Each type of hinge can move a certain way
 Degrees of Freedom
◦ This gives a simple, logical way of connecting the arms
to the shoulders
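
A minimal sketch of the "tree with hinges" idea, assuming a flat (2D) arm with one degree of freedom per hinge. The function name `forward_kinematics` and the bone lengths are hypothetical, chosen only for illustration.

```python
import math

def forward_kinematics(lengths, angles):
    """Return the 2D position of every joint in a planar chain.

    lengths[i] is the length of bone i; angles[i] is the hinge angle
    (radians) of bone i relative to its parent bone.
    """
    x, y, heading = 0.0, 0.0, 0.0
    positions = [(x, y)]          # the root joint (e.g. the shoulder)
    for length, angle in zip(lengths, angles):
        heading += angle          # a child inherits its parent's rotation
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        positions.append((x, y))
    return positions

# Upper arm 0.3 m, forearm 0.25 m; shoulder raised 90 degrees, elbow straight.
shoulder, elbow, wrist = forward_kinematics([0.3, 0.25], [math.pi / 2, 0.0])
```

Because each bone's direction is accumulated from its parent, rotating the shoulder moves the elbow and wrist with it, which is what makes the tree structure convenient.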

Describing a joint rotation with three angles can lead to Gimbal Lock
◦ Gimbal lock is the loss of one degree of freedom in
a three-dimensional rotation. It happens when two of
the rotation axes line up.
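
Gimbal lock can be shown with a small numerical sketch. It assumes rotations are built from yaw, pitch and roll angles in Z-Y-X order, which is one common convention and not necessarily the one used in the paper. All function names are hypothetical.

```python
import math

def matmul(a, b):
    """Multiply two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rot_x(t):
    c, s = math.cos(t), math.sin(t)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def rot_y(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]

def rot_z(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def euler_zyx(yaw, pitch, roll):
    """Rotation built as Rz(yaw) * Ry(pitch) * Rx(roll)."""
    return matmul(rot_z(yaw), matmul(rot_y(pitch), rot_x(roll)))

def same_rotation(a, b, tol=1e-9):
    """True if two rotation matrices agree entry by entry."""
    return all(abs(a[i][j] - b[i][j]) < tol
               for i in range(3) for j in range(3))

# With pitch at 90 degrees, adding the same amount to yaw and roll
# leaves the rotation unchanged: two angles now do the same job.
locked = same_rotation(euler_zyx(0.3, math.pi / 2, 0.5),
                       euler_zyx(0.5, math.pi / 2, 0.7))

# With pitch at 0, the same change produces a different rotation.
free = same_rotation(euler_zyx(0.3, 0.0, 0.5),
                     euler_zyx(0.5, 0.0, 0.7))
```

Here `locked` comes out true and `free` comes out false: at 90 degrees of pitch only the difference between roll and yaw matters, so one degree of freedom is gone.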
5

Parameters can be used to improve posture
estimation
◦ Computationally expensive
◦ The SCAPE model uses a small number of parameters
to control a body mesh. It is only practical for
offline processing.


Clothing is assumed to be tight fitting
Parameters can be entered via calibration
targets
6

2D images
◦ If one can identify the joints and track their locations,
one can infer the likelihood of the current position.
◦ This can be simplified with some pre-initialization.

Creating a logical green screen
◦ By removing the background, we can more easily see the
foreground person.
◦ Requires the background to be static

Appearance
◦ Processing the foreground is harder because of shading
differences, but the likelihood of the pose helps.

Edges and Gradient Based Features
◦ Edge detection is used to find the edges and the
distances between points. Once this is done, later
calculations are measured only at edge points.
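
Removing a static background, as described on this slide, can be sketched in a few lines. This is a simplified illustration assuming grayscale images stored as nested lists and a fixed brightness threshold; the function name and the threshold value are hypothetical.

```python
def foreground_mask(frame, background, threshold=30):
    """Mark pixels whose brightness differs from the background model.

    frame and background are equal-sized grayscale images stored as
    nested lists of integers in the range 0..255.
    """
    return [[abs(p - b) > threshold for p, b in zip(frame_row, bg_row)]
            for frame_row, bg_row in zip(frame, background)]

# A tiny 2x3 example: the middle column is where the person stands.
background = [[10, 10, 10],
              [10, 10, 10]]
frame = [[12, 200, 11],
         [9, 180, 10]]
mask = foreground_mask(frame, background)
```

The sketch only works because the background image never changes, which is why the slide lists a static background as a requirement.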
7

Joint Limits
◦ Your knees do not bend forward, but joint limits alone do not carry enough
information to infer the body position

Smoothness and linear Dynamical models
◦ In theory, human motion is smooth. Each pose is equal to the previous
pose plus some noise (Markov model)

Activity Specific Models
◦ If you know the type of motion being tracked (or the person), you can apply
stronger models to the movement. (Principal Component Analysis is used
to approximate the pose based on mean poses.)
◦ We can tell a computer what sitting in a chair looks like, and the computer
can then more easily recognize someone sitting in a chair

Physics-based Motion Models
◦ These models use forces such as muscle forces and gravity to put constraints
on the allowed motions. They can get very complex very quickly.
◦ Just think of all the forces on your body as you sat down in your chair.
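
The smoothness idea, "each pose is the previous pose plus some noise", can be sketched as follows. This is an illustration under simple assumptions: a pose is a short list of joint angles and the noise is Gaussian. The function name and the noise level are hypothetical.

```python
import random

def simulate_smooth_motion(initial_pose, n_frames, noise_std=0.02, seed=None):
    """Generate poses where each pose is the previous pose plus Gaussian noise."""
    rng = random.Random(seed)
    poses = [list(initial_pose)]
    for _ in range(n_frames - 1):
        previous = poses[-1]
        # Markov property: the new pose depends only on the previous one.
        poses.append([angle + rng.gauss(0.0, noise_std) for angle in previous])
    return poses

# Three joint angles (radians) followed over five frames.
poses = simulate_smooth_motion([0.0, 0.5, -0.2], n_frames=5, seed=1)
```

A small `noise_std` keeps consecutive poses close together, which is the smoothness assumption; a tracker can use the same model to predict where the pose is likely to be in the next frame.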
8



The goal is to compute an approximation of
the likely body position
This can be done by tracking the movement
of each pixel
Markov Chain Monte Carlo Filtering
◦ Produces a set of particles (variables) that,
as time (or frames) advances, move closer to the
target formation.
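
To make the particle idea concrete, here is a sketch of a basic bootstrap particle filter. Note that this is a simpler relative of the Markov Chain Monte Carlo filtering named on the slide, not the paper's method. It assumes the state is a single joint angle observed with Gaussian noise; all names and parameter values are hypothetical.

```python
import math
import random

def track_angle(observations, n_particles=500, motion_std=0.05,
                obs_std=0.2, seed=0):
    """Estimate one joint angle per frame with a bootstrap particle filter."""
    rng = random.Random(seed)
    # Start with no idea where the joint is: spread particles widely.
    particles = [rng.uniform(-3.0, 3.0) for _ in range(n_particles)]
    estimates = []
    for z in observations:
        # Predict: each particle follows "previous pose plus noise".
        particles = [p + rng.gauss(0.0, motion_std) for p in particles]
        # Weight: Gaussian likelihood of the observation given each particle.
        weights = [math.exp(-0.5 * ((z - p) / obs_std) ** 2)
                   for p in particles]
        # Resample: likely particles are copied, unlikely ones die out.
        particles = rng.choices(particles, weights=weights, k=n_particles)
        estimates.append(sum(particles) / n_particles)
    return estimates

# Thirty frames in which the measured angle stays at 1.0 radians.
estimates = track_angle([1.0] * 30)
```

Over the frames the particles cluster around the observed angle, so the averaged estimate moves toward the target, which is the behavior the slide describes.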
9

Discriminative Methods for Pose Estimation
◦ These are usually derived from a set of training
examples that are assumed to be fair samples from
the joint distribution over states and
measurements.

Nearest Neighbor
◦ Given a set of known poses the body is mapped to
the closest example
 Drawbacks
 Large Training Sets
 All training examples must be stored and retrieved
 It can produce ambiguities (such as the difference between
sitting down in a chair and doing squats at the gym)
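
The nearest-neighbor approach can be sketched as a simple lookup. The feature vectors and pose labels below are made up for illustration, and the function name is hypothetical; a real system would use far larger training sets and richer image features.

```python
import math

def nearest_pose(features, training_set):
    """Return the pose label of the training example closest to `features`."""
    best_features, best_pose = min(
        training_set, key=lambda example: math.dist(features, example[0]))
    return best_pose

# Hypothetical training examples: (image feature vector, pose label).
training_set = [
    ([0.9, 0.1], "standing"),
    ([0.5, 0.5], "sitting"),
    ([0.1, 0.9], "lying down"),
]
pose = nearest_pose([0.55, 0.4], training_set)
```

Every query scans the whole training set, which illustrates the storage and retrieval drawback listed above. Two different activities with similar feature vectors would also map to the same example, which is the ambiguity problem.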
10
Questions
11