Transcript Document

Human Face Modeling and
Animation
Example of problems in application of
multimedia signal processing
Introduction
• The face is a very important INTERFACE for people
• It is an interface which is very natural and very rich
in signalling. We know we have special structures in the
brain for processing face information
The problem of this lecture: we would like to have
computer interfaces with a face talking to us. That
could be, for example, a kind of assistant to us.
HOW TO MAKE THIS?
Application Areas
• Advanced user interfaces: Social agents and avatars
• Education: Pedagogical agents
• Medicine: Facial tissue surgical simulation
• Criminalistics: Forensic face recognition
• Teleconferencing: Transmitting & receiving facial
images
• Game industry: Realistic games
• Media and art: Movies: Avatar, Alice in
Wonderland… (faces in part realistic); Shrek
(faces non-realistic)
FACE IS COMPLICATED
Face anatomy:
• Face muscles
  • More than 200
  • Varying shapes
  • Connected to bones & tissues
• Skin
Facial Modeling
• The idea: USE 3D graphics to render face
• Involves determination of geometric descriptions
& animation capabilities.
• The face has a very complex, flexible 3-D
surface, with color and texture variations, and it
usually contains creases and wrinkles.
• Methods for effective animation and efficient
rendering: volume representation, surface
representation and face features.
Methods for Effective Animation
and Efficient Rendering
• Volume representation: includes
constructive solid geometry (CSG),
volume element (voxel) arrays and
aggregated volume elements such as
octrees.
• Surface representation: includes implicit
surfaces, parametric surfaces and
polygonal surfaces.
Surface Representation
• Implicit surfaces: defined by a function
F(x,y,z)=0.
• Parametric surfaces: generated by three
functions of two parametric variables, one
function for each of the spatial dimensions.
Include B-splines, Beta-splines, Bézier patches,
NURBS.
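As a small illustration of the parametric case, the sketch below
(not from the slides) evaluates one point on a generic bicubic
Bézier patch from a 4×4 control grid using Bernstein polynomials.

```python
# A minimal sketch, assuming a bicubic patch: three functions of two
# parameters (u, v), one per spatial dimension.
from math import comb
import numpy as np

def bernstein3(i, t):
    """Cubic Bernstein basis polynomial B_{i,3}(t)."""
    return comb(3, i) * t**i * (1 - t)**(3 - i)

def bezier_patch_point(control, u, v):
    """control: 4x4x3 array of control points; returns surface point S(u, v)."""
    p = np.zeros(3)
    for i in range(4):
        for j in range(4):
            p += bernstein3(i, u) * bernstein3(j, v) * control[i, j]
    return p

# Example: a flat 4x4 control grid in the z = 0 plane.
ctrl = np.zeros((4, 4, 3))
ctrl[..., 0], ctrl[..., 1] = np.meshgrid(range(4), range(4), indexing="ij")
print(bezier_patch_point(ctrl, 0.5, 0.5))  # centre of the patch: [1.5, 1.5, 0.]
```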
Surface Representation
• Polygonal surfaces: include
regular polygonal meshes and
arbitrary polygon networks.
• Face features: the face
is usually the sum of many parts
and details. A complete approach
is to integrate the facial mask,
facial feature details and hair
into models of the complete head.
[Figure: polygonal surface face mask and detailed features]
Techniques Used for Surface
Measurements
Specifying the 3-D surface of a face is a
significant challenge. The general
approaches to specify the 3-D face are:
• 3-D digitizer
• Photogrammetric measurement
• Modeling based on laser scans
3-D Digitizer
• Special hardware devices that rely
on mechanical, electro- magnetic
or acoustic measurements to
locate positions in space.
• Works best with sculptures or
physical models that, unlike real
faces, do not change during the
measuring process.
• Used to sequentially measure the
vertex positions of the sculpture.
[Figure: plastic sculpture digitized as a mesh]
Photogrammetric Measurement
• This method captures facial surface shapes and
expressions photographically.
• The basic idea is to take multiple simultaneous
photographs of the face, each from a different view.
• If certain constraints are observed when taking the
photographs, the desired 3-D surface data points can be
computed based on measured data from the multiple 2-D views.
• Then a 3-D coordinate system with a coordinate origin
near the center of the head is established.
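The constrained multi-view computation can be sketched as classic
two-view linear triangulation; the toy camera matrices below are
illustrative assumptions, not the actual setup from the slides.

```python
# A hedged sketch: recover a 3-D point from two calibrated views by
# linear (DLT) triangulation.
import numpy as np

def triangulate(P1, P2, x1, x2):
    """P1, P2: 3x4 camera projection matrices; x1, x2: (u, v) image points."""
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)           # least-squares solution of A X = 0
    X = vt[-1]
    return X[:3] / X[3]                   # homogeneous -> Euclidean

# Two toy cameras: identity view, and a view translated along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X = np.array([0.2, 0.1, 4.0, 1.0])        # ground-truth point
x1 = (P1 @ X)[:2] / (P1 @ X)[2]
x2 = (P2 @ X)[:2] / (P2 @ X)[2]
print(triangulate(P1, P2, x1, x2))        # ~ [0.2, 0.1, 4.0]
```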
Procedure of Photogrammetric
Measurement
[Figure: setup of cameras and projectors; photographs from different views; resulting mesh]
Modeling Based on Laser
Scans
• Laser based surface scanning devices can be used to
measure faces.
• These devices typically produce a very large regular
mesh of data values in a cylindrical coordinate system.
• In addition to the range data, a color camera captures
the surface color of the head, so for each range point the
corresponding color is also available. This is useful for
texture mapping in the final rendering of the talking
head.
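A minimal sketch of how such cylindrical range data maps to
Cartesian points (the grid layout and axis convention here are
assumptions):

```python
import math

def cylindrical_to_cartesian(r, theta, y):
    """r: measured range, theta: angle around the head, y: height."""
    return (r * math.cos(theta), y, r * math.sin(theta))

# One row of a toy scan: 8 range samples around the head at y = 0.1 m,
# each paired with the colour captured for that range point.
ranges = [0.11, 0.12, 0.12, 0.13, 0.13, 0.12, 0.12, 0.11]
colors = [(200, 160, 140)] * 8
points = [(cylindrical_to_cartesian(r, i * 2 * math.pi / 8, 0.1), c)
          for i, (r, c) in enumerate(zip(ranges, colors))]
```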
Procedure of Modeling Based
on Laser Scans
[Figure: scanned surface range data and color data; an adapted mesh overlaid on its surface color data]
Face Modeling
1. Geometrical graphic model used as the generic face model
2. Cyber scanner used to obtain texture &
range data of a face
[Figure: texture and range data viewed in 3-D]
3. Feature points are selected and the face model is
automatically warped to produce a customized face model
The end result – is it looking realistic?
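The slides do not name the warping method used in step 3; one
common recipe, sketched below under that assumption, spreads the
feature-point displacements over all vertices by inverse-distance
weighting.

```python
import numpy as np

def warp_to_features(vertices, feat_src, feat_dst, power=2.0):
    """Move generic-model vertices so feat_src points land on feat_dst."""
    disp = feat_dst - feat_src                     # (k, 3) displacements
    out = vertices.copy()
    for n, v in enumerate(vertices):
        d = np.linalg.norm(feat_src - v, axis=1)
        if d.min() < 1e-9:                         # vertex is a feature point
            out[n] = v + disp[d.argmin()]
            continue
        w = 1.0 / d**power                         # closer features weigh more
        out[n] = v + (w[:, None] * disp).sum(0) / w.sum()
    return out
```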
Facial Animation
To make the face “alive”. Several
approaches to facial animation:
• Interpolation
• Performance driven animation
• Direct parametrization
• Pseudo muscle based animation
• Muscle based animation
Interpolation
• Most widely used technique.
• Uses key-framing approach.
• The desired facial expression is specified for a certain
point in time and then again for another point in time
some number of frames later.
• A computer algorithm then generates the frames in
between these key frames.
• Facial animation is achieved by digitizing the face in
each of the several different expressions and then
interpolating between these expressions.
• Key-frame animation requires complete specification of
the model geometry for each key facial expression.
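A minimal sketch of the in-between generation, assuming each key
expression is stored as a full set of vertex positions (real
systems often replace the linear blend with smoother easing):

```python
import numpy as np

def inbetween(key_a, key_b, t):
    """key_a, key_b: (n_vertices, 3) arrays; t in [0, 1]."""
    return (1.0 - t) * key_a + t * key_b

neutral = np.zeros((500, 3))                        # toy geometry
smile = neutral + np.random.randn(500, 3) * 0.01    # toy key expression
frames = [inbetween(neutral, smile, k / 29.0) for k in range(30)]
```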
Interpolation Between
Expressions
[Figure: actual “surprised” and “sad” expressions, with an interpolated “worried” expression]
Performance Based Animation
• Involves using information derived by measuring
real human actions to drive synthetic characters.
• Often uses interactive input devices, such as
gloves, instrumented body suits and laser or
video based motion tracking systems.
• One of the approaches is expression mapping.
In this approach, different expressions and phoneme
poses are digitized directly from a real person.
Examples of Performance
Based Animation
Human facial movements and phonemes are digitized
to be used by an animated character
Direct Parameterized Model
• Sets of parameters are used to define facial
conformation and to control facial expressions.
• Uses local region interpolations, geometric
transformations and mapping techniques.
• The three basic ideas used in this model are:
  1. the fundamental concept of parameterization
  2. development of an appropriate descriptive parameter set
  3. development of a parameterized model coupled with an
image synthesizer to create the desired image.
Pseudo Muscle Based Facial
Animation
• Muscle actions are simulated using
geometric deformation operators.
• Facial tissue dynamics are not simulated.
• Includes abstract muscle actions and free-form
deformation.
Muscle Based Animation
• Uses a mass-and-spring model to simulate
facial muscles.
• Muscles are of two types: linear muscles
that pull and elliptic muscles that squeeze.
• Muscle parameters: muscle vector and
zone of muscle effect.
Linear and Elliptical Muscles
[Figure: linear muscle and elliptical muscle, each with muscle parameters: 1. muscle vector, 2. zone of muscle effect]
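A hedged sketch of a linear (pulling) muscle in the spirit of this
model: the zone of effect is a cone around the muscle vector, and
the falloff shape and constants are illustrative assumptions.

```python
import numpy as np

def apply_linear_muscle(vertices, head, tail, zone_angle, zone_radius,
                        contraction):
    """Pull vertices inside the zone of effect toward the muscle head."""
    out = vertices.copy()
    axis = (tail - head) / np.linalg.norm(tail - head)   # muscle vector
    for n, v in enumerate(vertices):
        to_v = v - head
        dist = np.linalg.norm(to_v)
        if dist < 1e-9 or dist > zone_radius:
            continue                                     # outside radial zone
        cos_a = np.dot(to_v, axis) / dist
        if cos_a < np.cos(zone_angle):
            continue                                     # outside angular zone
        falloff = cos_a * (1.0 - dist / zone_radius)
        out[n] = v - contraction * falloff * (to_v / dist)
    return out
```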
Modeling the Primary Facial
Expressions
• The following are the basic facial expressions
that are considered to be generic to the
human face: Happiness, Anger, Fear, Surprise, Disgust
and Sadness.
• Facial Action Coding System (FACS) is a widely used
notation for the coding of facial articulation.
• FACS describes 66 muscle actions (some muscle
blends) which in combination can give rise to thousands
of possible facial expressions.
• Examples of facial expressions are shown in the
following slide.
Examples of Facial Expressions
[Figure: neutral face and expressions of anger, happiness, surprise, fear and disgust]
Facial Image Synthesis
The next step is to actually generate the
sequence of facial images that form the desired
animation. Image synthesis includes three
major tasks:
• Transforming the geometric model and its components
into the viewing coordinate system.
• Determining which surfaces are visible from the viewing
position.
• Computing the color values for each image pixel based
on the lighting conditions and the properties of the visible
surfaces.
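A compact sketch of the three tasks for a triangle mesh; the
painter's-sort visibility and Lambertian shading below are
stand-ins for whatever a full renderer would use.

```python
import numpy as np

def shade_mesh(verts, tris, view, light_dir, albedo):
    """verts: (n, 3); tris: triples of vertex indices; view: 4x4 matrix."""
    v = (view[:3, :3] @ verts.T).T + view[:3, 3]   # task 1: viewing coords
    # Task 2: painter's algorithm, far-to-near (camera looks down -z).
    order = sorted(range(len(tris)),
                   key=lambda k: float(v[list(tris[k])].mean(0)[2]))
    colors = []
    for k in order:                                # task 3: Lambert shading
        a, b, c = v[list(tris[k])]
        n = np.cross(b - a, c - a)
        n = n / np.linalg.norm(n)
        colors.append(albedo * max(float(np.dot(n, light_dir)), 0.0))
    return order, colors
```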
Basic Idea
• Face Tracker
– Piece-wise Bézier Volume
Deformation Face Model
• Purpose: To design FACS motion units
[Figure: face surface mesh embedded between top and bottom Bézier volume layers, shown before and after deformation]
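The deformation idea can be sketched as free-form deformation with
Bernstein weights: each mesh vertex gets parametric coordinates
(u, v, w) inside a control lattice, and moving control points
displaces the embedded surface smoothly. The quadratic lattice
degree below is an assumption; the actual model is piece-wise.

```python
from math import comb
import numpy as np

def bernstein(i, n, t):
    return comb(n, i) * t**i * (1 - t)**(n - i)

def bezier_volume(uvw, control):
    """uvw: (m, 3) coords in [0,1]^3; control: (3, 3, 3, 3) lattice."""
    n = control.shape[0] - 1
    out = np.zeros((len(uvw), 3))
    for m, (u, v, w) in enumerate(uvw):
        for i in range(n + 1):
            for j in range(n + 1):
                for k in range(n + 1):
                    out[m] += (bernstein(i, n, u) * bernstein(j, n, v) *
                               bernstein(k, n, w) * control[i, j, k])
    return out
```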
Warping in 3D
• Face expression change
[Figure: (a) Bézier controlling mesh; (b) an expression “smile”]
Face animation is too hard?
• Try to take a REAL face and use
it for animation
• This may be much easier than
generating a complete synthetic
natural face
BUT HOW TO DO IT?
• Let’s say our goal is to generate a
natural face talking and controlled by
computer
To do it:
We take video of a real person and we
will change the lip movements and face
expression according to the speech
How to do it: Analysis Stage
• Given video of the subject speaking,
extract mouth position and lip shape
• Hand label training images:
– 34 points on mouth (20 outer boundary, 12
inner boundary, 1 at bottom of upper teeth, 1
at top of lower teeth)
– 20 points on chin and jaw line
– Morph training set to get to 351 images
Audio Analysis
• Want to capture visual dynamics of
speech
• Phonemes are not enough
• Consider coarticulation
• Lip shapes for many phonemes are
modified based on phoneme’s context
(e.g. /T/ in “beet” vs. /T/ in “boot”)
Audio Analysis (continued)
• Segment speech into triphones
• e.g. “teapot” becomes /SIL-T-IY/, /T-IY-P/,
/IY-P-AA/, /P-AA-T/ and /AA-T-SIL/
• Emphasize middle of each triphone
• Effectively captures forward and backward
coarticulation
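A minimal sketch of the segmentation, reproducing the “teapot”
example above:

```python
def to_triphones(phonemes):
    """Pad with silence and emit overlapping (left, centre, right) triples."""
    seq = ["SIL"] + phonemes + ["SIL"]
    return [tuple(seq[i:i + 3]) for i in range(len(seq) - 2)]

print(to_triphones(["T", "IY", "P", "AA", "T"]))
# [('SIL','T','IY'), ('T','IY','P'), ('IY','P','AA'), ('P','AA','T'),
#  ('AA','T','SIL')]
```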
Audio Analysis (continued)
• Training footage audio is labeled with phonemes
and associated timing
• Use gender-specific segmentation
• Convert transcript into triphones
Synthesis Stage
• Given some new speech utterance
– Mark it with phoneme labels
– Determine triphones
– Find a video example with the desired
transition in the database
• Compute a matching distance for each
candidate triphone:
error = α·Dp + (1 − α)·Ds
where Dp is the phoneme-context distance and Ds
the lip-shape distance (defined below)
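A sketch of this matching rule; the α value and the toy candidate
list are illustrative assumptions (Dp and Ds come from the audio
and lip-shape analyses):

```python
def match_error(dp, ds, alpha=0.5):
    """Combined distance; alpha weights phoneme context vs. lip shape."""
    return alpha * dp + (1.0 - alpha) * ds

# Toy database of candidate clips with precomputed (Dp, Ds) distances.
candidates = [("clip_017", 0.30, 0.10), ("clip_042", 0.05, 0.60)]
best = min(candidates, key=lambda c: match_error(c[1], c[2]))
print(best[0])   # clip_017: error 0.20 vs. 0.325 for clip_042
```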
Viseme Classes
• Cluster phonemes into viseme classes
(speech units + face movements)
• Use 26 viseme classes (10 consonant, 15
vowel):
(1) /CH/, /JH/, /SH/, /ZH/
(2) /K/, /G/, /N/, /L/
…
(25) /IH/, /AE/, /AH/
(26) /SIL/
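As a sketch, the clustering amounts to a lookup table; only the
classes quoted above are filled in, since the full 26-class table
is not reproduced in the slides.

```python
VISEME_CLASS = {
    "CH": 1, "JH": 1, "SH": 1, "ZH": 1,   # class (1)
    "K": 2, "G": 2, "N": 2, "L": 2,       # class (2)
    "IH": 25, "AE": 25, "AH": 25,         # class (25)
    "SIL": 26,                            # class (26)
}

def same_viseme_class(p, q):
    """True when two phonemes share a viseme class (unknowns never match)."""
    a, b = VISEME_CLASS.get(p), VISEME_CLASS.get(q)
    return a is not None and a == b
```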
Lip Shape Distance
• Ds is distance between lip shapes in
overlapping triphones
– E.g. for “teapot”, contours for /IY/ and /P/
should match between /T-IY-P/ and /IY-P-AA/
– Compute Euclidean distance between 4-element
vectors (lip width, lip height, inner lip
height, height of visible teeth)
• Solution depends on neighbors in both
directions (use dynamic programming)
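A sketch of one Ds term, the Euclidean distance between the
4-element lip-shape vectors named above:

```python
import numpy as np

def lip_shape_distance(a, b):
    """a, b: (lip width, lip height, inner lip height, visible-teeth height)."""
    return float(np.linalg.norm(np.asarray(a, float) - np.asarray(b, float)))

print(lip_shape_distance((4.1, 2.0, 1.2, 0.3), (4.0, 2.2, 1.0, 0.4)))
```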
Time Alignment of Triphone Videos
• Need to combine triphone videos
• Choose the portion of overlapping triphones
where lip shapes are as close as possible
• Already done when computing Ds
Time Alignment to Utterance
• Still need to time align with target audio
– Compare corresponding phoneme transcripts
– Start time of center phoneme in triphone is
aligned with label in target transcript
– Video is then stretched/compressed to fit time
needed between target phoneme boundaries
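A sketch of the stretch/compress step as a linear remapping of
frame timestamps (all numbers are illustrative):

```python
def align_times(src_start, src_end, dst_start, dst_end, frame_times):
    """Map source frame timestamps into the target utterance's timeline."""
    scale = (dst_end - dst_start) / (src_end - src_start)
    return [dst_start + (t - src_start) * scale for t in frame_times]

# The centre phoneme spans 0.10-0.30 s in the clip but must fit the
# 0.50-0.58 s slot in the target audio.
print(align_times(0.10, 0.30, 0.50, 0.58, [0.10, 0.20, 0.30]))
# [0.5, 0.54, 0.58]
```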
Combining Lips and
Background
• Need to stitch the new mouth movie into the
original background face sequence
• Compute transform M as before
• Warping replacement mask defines mouth
and background portions in final video
[Figure: mouth mask and background mask]
Combining Lips and
Background
• Mouth shape comes from triphone image,
and is warped using M
• Jaw shape is a combination of the background
jaw and triphone jaw lines
• Near the ears the jaw depends on the background;
near the chin it depends on the mouth
• Illumination matching is used to avoid seams
between mouth and background
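A sketch of the final compositing with a soft replacement mask;
image shapes and the crude rectangular mask are illustrative
assumptions.

```python
import numpy as np

def composite(background, mouth, mask):
    """mask in [0, 1]: 1 keeps the mouth pixel, 0 keeps the background."""
    m = mask[..., None]                    # broadcast over colour channels
    return (m * mouth + (1.0 - m) * background).astype(background.dtype)

bg = np.zeros((64, 64, 3), dtype=np.float32)     # background frame
mo = np.ones((64, 64, 3), dtype=np.float32)      # warped mouth image
mask = np.zeros((64, 64), dtype=np.float32)
mask[40:60, 20:44] = 1.0                         # crude mouth region
frame = composite(bg, mo, mask)
```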
Video Rewrite Results
• Video: 8 minutes of video, 109 sentences
• Training Data: front-facing segments of video,
around 1700 triphones
“Emily” sequences
Video Rewrite Results
• 2 minutes of video, 1157 triphones
JFK sequences
Conclusion
• Face is very important but very difficult
interface
• Perhaps in the future we will have realistic
synthetic faces
• At present the best approach is to take video
of a real person and perform face warping on it
• This looks quite natural