Perceptual and Sensory Augmented Computing Machine Learning
WS 13/14
Machine Learning – Lecture 21
Learning Bayesian Networks & Extensions
27.01.2014
Bastian Leibe
RWTH Aachen
http://www.vision.rwth-aachen.de
[email protected]
Announcements
• Exam
1st Date: Monday, 17.02., 13:30 – 17:30h
2nd Date: Monday, 17.03., 09:30 – 12:30h
Closed-book exam; the core exam time will be 2h.
Admission requirement: 50% of the exercise points or passed
test exam
We will send around an announcement with the exact starting
times and places by email.
• Test exam
Date: Thursday, 06.02., 10:15 – 11:45h, room 5056
Core exam time will be 1h
Purpose: Prepare you for the questions you can expect.
Possibility to collect bonus exercise points!
Announcements (2)
• Last lecture next Tuesday: Repetition
Summary of all topics in the lecture
“Big picture” and current research directions
Opportunity to ask questions
Please use this opportunity and prepare questions!
Course Outline
• Fundamentals (2 weeks)
Bayes Decision Theory
Probability Density Estimation
• Discriminative Approaches (5 weeks)
Linear Discriminant Functions
Statistical Learning Theory & SVMs
Ensemble Methods & Boosting
Decision Trees & Randomized Trees
• Generative Models (4 weeks)
Bayesian Networks
Markov Random Fields
Exact Inference
Applications & Extensions
Recap: Graph Cuts for Binary Problems
[Figure: s-t graph with terminal nodes s and t, t-links D_p(s) and D_p(t), n-links w_pq, and a cut]
The "expected" intensities of object and background, I^s and I^t, can be re-estimated:
D_p(s) \propto \exp(-||I_p - I^s||^2 / 2\sigma^2)
D_p(t) \propto \exp(-||I_p - I^t||^2 / 2\sigma^2)
EM-style optimization
Slide credit: Yuri Boykov
[Boykov & Jolly, ICCV’01]
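To make the data terms above concrete, here is a minimal numpy sketch (not from the original slides; the array img and the values I_s, I_t, sigma are hypothetical) that computes the negative-log versions of D_p(s), D_p(t) and re-estimates the object/background intensities from a current labeling, as in the EM-style loop:

import numpy as np

def unary_costs(img, I_s, I_t, sigma):
    # Quadratic t-link costs, i.e. the negative logs of
    # D_p(s) ~ exp(-||I_p - I^s||^2 / 2 sigma^2) and D_p(t) analogously.
    D_s = (img - I_s) ** 2 / (2.0 * sigma ** 2)
    D_t = (img - I_t) ** 2 / (2.0 * sigma ** 2)
    return D_s, D_t

def reestimate_intensities(img, labels):
    # EM-style re-estimation of the "expected" intensities I^s, I^t
    # from the pixels currently labeled as object (1) / background (0).
    return img[labels == 1].mean(), img[labels == 0].mean()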
Recap: s-t-Mincut Equivalent to Maxflow
Flow = 0 initially.
• Augmenting Path Based Algorithms
[Figure: example s-t graph with source, nodes v1 and v2, sink, and edge capacities]
1. Find a path from source to sink with positive capacity.
2. Push the maximum possible flow through this path.
3. Repeat until no such path can be found.
Algorithms assume non-negative capacities.
Slide credit: Pushmeet Kohli
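As an illustration of steps 1–3 (a plain BFS-based augmenting-path scheme, i.e. Edmonds-Karp, not the specialized Boykov-Kolmogorov algorithm used in vision), here is a self-contained sketch; the example graph and its capacities are made up:

from collections import deque

def max_flow(capacity, source, sink):
    # Build the residual graph, adding zero-capacity reverse edges.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)
    flow = 0
    while True:
        # 1. Find a source->sink path with positive residual capacity (BFS).
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow              # 3. No augmenting path left -> maximum flow found.
        # 2. Push the maximum possible flow through this path.
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        flow += bottleneck

caps = {"s": {"v1": 2, "v2": 9}, "v1": {"v2": 1, "t": 5}, "v2": {"t": 4}, "t": {}}
print(max_flow(caps, "s", "t"))      # value of the max flow = cost of the min cut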
Recap: When Can s-t Graph Cuts Be Applied?
E(L) = \sum_p E_p(L_p) + \sum_{(p,q) \in N} E(L_p, L_q),    L_p \in \{s, t\}
(unary potentials: t-links; pairwise potentials: n-links)
• s-t graph cuts can only globally minimize binary energies that are submodular
[Boros & Hammer, 2002; Kolmogorov & Zabih, 2004]:
E(L) can be minimized by s-t graph cuts  <=>  E(s,s) + E(t,t) <= E(s,t) + E(t,s)
(submodularity, "convexity")
• Submodularity is the discrete equivalent to convexity.
Implies that every local energy minimum is a global minimum.
Solution will be globally optimal.
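The condition above is easy to check directly; a tiny sketch (the pairwise energy table is a made-up example):

def is_submodular(E):
    # Submodularity: E(s,s) + E(t,t) <= E(s,t) + E(t,s)
    return E[("s", "s")] + E[("t", "t")] <= E[("s", "t")] + E[("t", "s")]

# A Potts-style smoothness term (0 if the labels agree, 1 otherwise) is submodular:
potts = {("s", "s"): 0, ("t", "t"): 0, ("s", "t"): 1, ("t", "s"): 1}
print(is_submodular(potts))   # True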
Recap: α-Expansion Move
• Basic idea:
Break multi-way cut computation into a sequence of
binary s-t cuts.
No longer globally optimal result, but guaranteed approximation
quality and typically converges in few iterations.
Slide credit: Yuri Boykov
Recap: Converting an MRF to an s-t Graph
[Figure: resulting s-t graph for two pixels a1, a2 with t-links fgCost/bgCost, an n-link cost(p,q), Source (0), and Sink (1); the min-cut assigns a1 = bg, a2 = fg]

Graph *g;

for all pixels p
    /* Add a node to the graph */
    nodeID(p) = g->add_node();
    /* Set cost of terminal edges (t-links to source and sink) */
    set_weights(nodeID(p), fgCost(p), bgCost(p));
end

for all adjacent pixels p,q
    /* Set cost of the n-link between p and q */
    add_weights(nodeID(p), nodeID(q), cost(p,q));
end

g->compute_maxflow();

/* Read out the label of pixel p (0 or 1) */
label_p = g->is_connected_to_source(nodeID(p));
Slide credit: Pushmeet Kohli
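The pseudo-code above maps almost directly onto existing max-flow libraries. Below is a sketch of the same construction in Python, assuming the PyMaxflow package and its Graph / add_nodes / add_tedge / add_edge / maxflow / get_segment interface; the two pixels and all costs are made-up example values, not taken from the slide:

import maxflow   # assumption: PyMaxflow is installed (pip install PyMaxflow)

fgCost = {"a1": 5.0, "a2": 1.0}      # hypothetical unary costs
bgCost = {"a1": 1.0, "a2": 4.0}
pairCost = 2.0                       # hypothetical cost(p, q) for the one neighbour pair

g = maxflow.Graph[float]()
node = dict(zip(fgCost, g.add_nodes(len(fgCost))))   # one graph node per pixel

for p in fgCost:
    # t-links: capacity to the source from fgCost, to the sink from bgCost
    g.add_tedge(node[p], fgCost[p], bgCost[p])

# n-link between the two adjacent pixels
g.add_edge(node["a1"], node["a2"], pairCost, pairCost)

print("min-cut cost:", g.maxflow())
for p in fgCost:
    # get_segment(n) reports which side of the cut node n ends up on (0 = source side)
    print(p, "-> label", g.get_segment(node[p]))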
Topics of This Lecture
• Learning Bayesian Networks
Learning with known structure, full observability
Learning with known structure, partial observability
Structure learning
• Models for Sequential Data
Independence assumptions
Markov models
• Hidden Markov Models (HMMs)
Traditional view
Graphical Model view
• Extensions
Bayesian Networks
• What we’ve learned so far…
We know they are directed graphical models.
Their joint probability factorizes into conditional probabilities.
We know how to convert them into undirected graphs.
We know how to perform inference for them.
– Sum/Max-Product BP for exact inference in (poly)tree-shaped BNs.
– Loopy BP for approximate inference in arbitrary BNs.
– Junction Tree algorithm for converting arbitrary BNs into trees.
• But what are they actually good for?
How do we apply them in practice?
And how do we learn their parameters?
Image source: C. Bishop, 2006
Parameter Learning in Bayesian Networks
• We need to specify two things:
Structure of Bayesian network (graph topology)
Parameters of each conditional probability table (CPT)
• It is possible to learn both from training data.
But learning structure is much harder than learning parameters.
Also, learning when some nodes are hidden is much harder than
when everything is observable.
• Four cases:
    Structure   Observability   Method
    Known       Full            Maximum Likelihood Estimation
    Known       Partial         EM (or gradient ascent)
    Unknown     Full            Search through model space
    Unknown     Partial         EM + search through model space
Learning Parameters
• Example:
p(x) = p(x_1) p(x_2|x_1) p(x_3|x_1) p(x_4|x_2)
Assume each variable x_i is discrete and can take K_i values.
The parameters of this model can be represented with 4 tables (called conditional probability tables – CPTs):
– p(x_1 = k) = \theta_{1,k}              (\theta_1 has K_1 entries)
– p(x_2 = k'|x_1 = k) = \theta_{2,k,k'}  (\theta_2 has K_1 \times K_2 entries)
– p(x_3 = k'|x_1 = k) = \theta_{3,k,k'}
– p(x_4 = k'|x_2 = k) = \theta_{4,k,k'}
– Note that \sum_{k'} \theta_{i,k,k'} = 1
Slide credit: Zoubin Ghahramani
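As a concrete illustration (not part of the original slide), here is a small numpy sketch of these CPTs for binary variables (K_i = 2; the table entries are made-up numbers) together with the factorized joint:

import numpy as np

# CPTs for the network x1 -> x2, x1 -> x3, x2 -> x4, all variables binary.
theta1 = np.array([0.6, 0.4])                # theta1[k]     = p(x1 = k)
theta2 = np.array([[0.7, 0.3], [0.2, 0.8]])  # theta2[k, k'] = p(x2 = k' | x1 = k)
theta3 = np.array([[0.9, 0.1], [0.5, 0.5]])  # theta3[k, k'] = p(x3 = k' | x1 = k)
theta4 = np.array([[0.4, 0.6], [0.1, 0.9]])  # theta4[k, k'] = p(x4 = k' | x2 = k)

def joint(x1, x2, x3, x4):
    # p(x) = p(x1) p(x2|x1) p(x3|x1) p(x4|x2)
    return theta1[x1] * theta2[x1, x2] * theta3[x1, x3] * theta4[x2, x4]

print(theta2.sum(axis=1))   # each row of a CPT sums to 1
print(sum(joint(a, b, c, d) for a in (0, 1) for b in (0, 1)
          for c in (0, 1) for d in (0, 1)))   # the joint sums to 1 as well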
Case 1: Known Structure, Full Observability
• Assume a training data set: D = \{x^{(n)}\}_{n=1}^{N}
How do we learn \theta from D?
• Maximum Likelihood:
p(x^{(n)}|\theta) = p(x_1^{(n)}|\theta_1) p(x_2^{(n)}|x_1^{(n)}, \theta_2) p(x_3^{(n)}|x_1^{(n)}, \theta_3) p(x_4^{(n)}|x_2^{(n)}, \theta_4)
p(D|\theta) = \prod_{n=1}^{N} p(x^{(n)}|\theta) = \prod_{n=1}^{N} \prod_{i=1}^{4} p(x_i^{(n)}|x_{pa(i)}^{(n)}, \theta_i)
• Maximum Log-Likelihood:
\ln p(D|\theta) = \sum_{n=1}^{N} \sum_{i=1}^{4} \ln p(x_i^{(n)}|x_{pa(i)}^{(n)}, \theta_i)
Slide credit: Zoubin Ghahramani
Case 1: Known Structure, Full Observability
• Maximum Log-Likelihood:
\ln p(D|\theta) = \sum_{n=1}^{N} \sum_{i=1}^{4} \ln p(x_i^{(n)}|x_{pa(i)}^{(n)}, \theta_i)
This decomposes into a sum of functions of the individual \theta_i.
Each \theta_i can be optimized separately:
\theta_{i,k,k'} = n_{i,k,k'} / \sum_{k''} n_{i,k,k''}
where n_{i,k,k'} is the number of times in D that x_i = k' and x_{pa(i)} = k.
• ML solution
Simply calculate frequencies!
Slide credit: Zoubin Ghahramani
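A minimal sketch of this counting recipe for one CPT, e.g. p(x_2 | x_1), from fully observed samples (the small data matrix is made up):

import numpy as np

# Fully observed training data D: each row is one sample (x1, x2, x3, x4), all binary.
D = np.array([[0, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 1, 0, 1],
              [1, 1, 1, 0],
              [0, 0, 0, 0]])

def ml_cpt(data, child, parent, K=2):
    # theta[k, k'] = n(parent = k, child = k') / sum_k'' n(parent = k, child = k'')
    counts = np.zeros((K, K))
    for row in data:
        counts[row[parent], row[child]] += 1
    return counts / counts.sum(axis=1, keepdims=True)

print(ml_cpt(D, child=1, parent=0))   # ML estimate of p(x2 | x1)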
Case 2: Known Structure, Hidden Variables
• ML learning with hidden variables
Assume a model parameterized by \theta with observed variables X and hidden (latent) variables Z.
• Goal
Maximize the parameter log-likelihood given the observed data:
L(\theta) = \ln p(X|\theta) = \ln \sum_Z p(X, Z|\theta)
[Figure: Bayesian network with hidden nodes Z_1, Z_2, Z_3 and observed node X]
• EM Algorithm: Iterate between two steps:
E-step: fill-in hidden / missing variables
M-step: apply complete-data learning to filled-in data.
Slide adapted from Zoubin Ghahramani
Learning with Hidden Variables: EM Algorithm
• Goal:
Maximize the parameter log-likelihood given the observed data:
L(\theta) = \ln p(X|\theta) = \ln \sum_Z p(X, Z|\theta)
• EM Algorithm: Derivation
We do not know the values of the latent variables in Z, but we can express their posterior distribution given X and (an initial guess for) \theta.
E-step: Evaluate p(Z|X, \theta^{old}).
Since we cannot use the complete-data log-likelihood directly, we maximize its expected value under the posterior distribution of Z.
M-step: Maximize
\theta^{new} = \arg\max_\theta \sum_Z p(Z|X, \theta^{old}) \ln p(X, Z|\theta)
Learning with Hidden Variables: EM Algorithm
• Note on the E-step:
The E-step requires solving the inference problem.
I.e., finding the distribution over the hidden variables p(Z|X, \theta^{old}) given the current model parameters.
This can be done using belief propagation or the junction tree
algorithm.
As inference becomes a subroutine of the learning procedure,
fast inference algorithms are crucial!
Slide adapted from Bernt Schiele
Example Application
• Mixture-of-Gaussian Fitting with EM
Standard application of EM.
Corresponding Bayesian network:
• Important point here
Bayesian networks can be treacherous!
They hide the true complexity in a very simple-looking diagram.
E.g., the diagram here only encodes the information that there are latent variables z_n, observed variables x_n that depend on them, and parameters \theta = (\pi, \mu, \Sigma) (parameter arrows are optional).
– The information that p(x_n|z_n, \theta) encodes a mixture of Gaussians needs to be communicated additionally!
– On the other hand, this general framework can also be used to
apply EM for other types of distributions or latent variables.
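For concreteness, a compact 1-D sketch of this standard EM fit for a two-component mixture of Gaussians (made-up data, fixed number of iterations, no numerical safeguards):

import numpy as np

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 0.5, 100)])

pi  = np.array([0.5, 0.5])        # mixing coefficients
mu  = np.array([-1.0, 1.0])       # component means
var = np.array([1.0, 1.0])        # component variances

def gauss(x, mu, var):
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

for _ in range(50):
    # E-step: responsibilities p(z_n = k | x_n, theta), i.e. "fill in" the latent variables.
    resp = pi * gauss(x[:, None], mu, var)
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: complete-data ML estimates using the filled-in (soft) assignments.
    Nk  = resp.sum(axis=0)
    pi  = Nk / len(x)
    mu  = (resp * x[:, None]).sum(axis=0) / Nk
    var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / Nk

print(pi, mu, np.sqrt(var))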
Image source: C.M. Bishop, 2006
Summary: Learning with Known Structure
• ML-Learning with complete data (no hidden variables)
Log-likelihood decomposes into a sum of functions of \theta_i.
Each \theta_i can be optimized separately.
ML solution: simply calculate frequencies.
• ML-Learning with incomplete data (hidden variables)
Iterative EM algorithm.
E-step: compute expected counts E[n_{i,j,k}|D, \theta^{(t)}] given the previous parameter settings \theta^{(t)}.
M-step: re-estimate parameters \theta using the expected counts:
\theta_{i,j,k}^{(t+1)} \leftarrow E[n_{i,j,k}|D, \theta^{(t)}] / \sum_{k'} E[n_{i,j,k'}|D, \theta^{(t)}]
Slide credit: Bernt Schiele
Cases 3+4: Unknown Structure
• Goal
Learn a directed acyclic graph (DAG) that best explains the data.
• Constraints-based learning
Use statistical tests of marginal and conditional independence.
Find the set of DAGs whose d-separation relations match the
results of conditional independence tests.
• Score-based learning
Use a global score such as BIC (Bayesian Information Criterion).
Find a structure that maximizes this score.
Slide adapted from Zoubin Ghahramani
Cases 3+4: Unknown Structure
• Extremely hard problem
NP-hard
Number of DAGs on N variables is super-exponential in N.
– 4 nodes: 543 DAGs
– 10 nodes: O(10^18) DAGs.
Need to use heuristics to prune down the search space and use
efficient methods to evaluate hypotheses.
• Additional problem: often not enough data available.
Need to make decisions about statistical conditional
independence.
Typically only feasible if the structure is relatively simple and a
lot of data is available…
Example Application
• Analyzing gene expression from micro-array data
1000s of measurement spots (probes) on a micro-array, each sensitive to a specific DNA marker (e.g., a section of a gene).
The probes measure if the corresponding
gene is expressed (=active).
Collect samples from patients with
a certain disease or condition.
Monitor 1000s of genes simultaneously.
• Interesting questions
Is there a statistical relationship
between certain gene expressions?
If so, can we derive the structure by which they influence each other?
[Figure: micro-array with ~40k probes]
Image source: Wikipedia
Topics of This Lecture
• Learning Bayesian Networks
Learning with known structure, full observability
Learning with known structure, partial observability
Structure learning
• Models for Sequential Data
Independence assumptions
Markov models
• Hidden Markov Models (HMMs)
Traditional view
Graphical Model view
• Extensions
Sequential Data
• Many real-world problems
involve sequential data
Speech recognition
Visual object tracking
Robot planning
DNA sequencing
Financial forecasting
…
• In the following, we will
look at sequential problems
from a Graphical Models
perspective…
Image source: C.M. Bishop, 2006
Models for Sequential Data
• Simplest model
Treat all observations as independent (i.i.d.)
Corresponding graphical model:
• What can we infer from such a model?
Only relative frequencies of certain events.
Such a model is of limited use in practice.
In practice, the data often exhibits trends that help prediction!
Image source: C.M. Bishop, 2006
Markov Models
• Markov assumption
Each observation only depends on the most recent previous
observation:
• First-order Markov chain:
• Second-order Markov chain:
Image source: C.M. Bishop, 2006
Markov Models
• We can generalize this further
M-th order Markov chains
However, this does not scale well.
Suppose each x_n can take on K possible values.
#Parameters in the model: K^{M-1}(K - 1)
Complexity grows sharply!
• Goal
We want a model that is not as limited by the Markov assumption
But that can be specified by few parameters
We can achieve that by introducing a state space model.
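For a first-order chain, the ML parameters are again just normalized transition counts; a minimal sketch follows (the symbol sequence is made up). An M-th order model would need one such row per joint configuration of the M previous symbols, which is exactly the blow-up noted above.

import numpy as np

seq = [0, 1, 1, 2, 0, 0, 1, 2, 2, 1, 0, 1]   # observed sequence over K = 3 symbols
K = 3

counts = np.zeros((K, K))
for prev, cur in zip(seq[:-1], seq[1:]):
    counts[prev, cur] += 1                    # n(x_{n-1} = prev, x_n = cur)

A = counts / counts.sum(axis=1, keepdims=True)   # A[j, k] ~ p(x_n = k | x_{n-1} = j)
print(A)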
Topics of This Lecture
• Learning Bayesian Networks
Learning with known structure, full observability
Learning with known structure, partial observability
Structure learning
• Models for Sequential Data
Independence assumptions
Markov models
• Hidden Markov Models (HMMs)
Traditional view
Graphical Model view
• Extensions
Hidden Markov Models (HMMs)
• Traditional view
At each time step, the system is in a certain state k \in \{1, 2, 3\}.
The (Markovian) state transition probabilities are given by the matrix A:
A = [ A_11  A_12  A_13
      A_21  A_22  A_23
      A_31  A_32  A_33 ]
We cannot observe the states directly; they are hidden.
We just know the initial probability distribution over states \pi.
Each state produces a characteristic output, given by a probability distribution over output symbols \phi.
Image source: C.M. Bishop, 2006
Hidden Markov Models (HMMs)
• HMMs
Widely used in speech recognition,
natural language modelling,
handwriting recognition, biological
sequence analysis (DNA, proteins),
financial forecasting,…
Really changed the field…
• Often used in special forms
E.g., left-to-right HMM
A = [ A_11  A_12  A_13
      0     A_22  A_23
      0     0     A_33 ]
• How can we encode them as graphical models?
Image source: C.M. Bishop, 2006
Hidden Markov Models (HMMs)
• Graphical Model view
Introduce latent variables z for the current system state.
The observed output x is conditioned on this state.
The state transition probabilities p(z_n|z_{n-1}) are given by the entries of A:
p(z_{n,k} = 1 | z_{n-1,j} = 1) = A_{jk}
Image source: C.M. Bishop, 2006
Hidden Markov Models (HMMs)
• State transitions
[Figure: transitions between the K states, unrolled over time into a lattice (trellis) diagram]
Image source: C.M. Bishop, 2006
Interpretation as a Generative Model
[Figure: the model (left) and samples generated from it (right)]
• Ancestral Sampling from an HMM
Choose initial latent variable z1 according to ¼k
Sample the corresponding observation x1
Choose state of variable z2 by sampling from p(z2|z1)
…
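A minimal sketch of this ancestral sampling procedure for a discrete-output HMM (the tables pi, A, and phi below are made-up numbers, not taken from the figure):

import numpy as np

rng = np.random.default_rng(1)
pi  = np.array([0.6, 0.3, 0.1])            # initial state distribution
A   = np.array([[0.8, 0.1, 0.1],           # A[j, k] = p(z_n = k | z_{n-1} = j)
                [0.2, 0.7, 0.1],
                [0.1, 0.2, 0.7]])
phi = np.array([[0.9, 0.1],                # phi[k, v] = p(x_n = v | z_n = k)
                [0.5, 0.5],
                [0.1, 0.9]])

def sample_hmm(T):
    z = [rng.choice(3, p=pi)]               # choose z_1 according to pi
    x = [rng.choice(2, p=phi[z[0]])]        # sample the corresponding observation x_1
    for _ in range(T - 1):
        z.append(rng.choice(3, p=A[z[-1]])) # choose z_n by sampling from p(z_n | z_{n-1})
        x.append(rng.choice(2, p=phi[z[-1]]))
    return z, x

print(sample_hmm(10))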
Image source: C.M. Bishop, 2006
Three Main Tasks in HMMs
1. Likelihood Estimation
Given: an observation sequence and a model
What is the likelihood of this sequence given
the model?
“Forward-backward algorithm”
special case of Sum-Product!
2. Finding the most probable state sequence
Given: an observation sequence and a model
What is the most likely sequence of states?
“Viterbi algorithm”
special case of Max-Sum!
3. Learning HMMs
Given: several observation sequences
How can we learn/adapt the model parameters?
“Baum-Welch algorithm”
application of EM!
Image source: C.M. Bishop, 2006
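To connect these tasks to the sum-product / max-sum view, here is a minimal sketch of the forward pass (task 1) and the Viterbi recursion (task 2) for a discrete-output HMM; all numbers are made up, and there is no log-space rescaling, so it is only meant for short sequences:

import numpy as np

pi  = np.array([0.6, 0.4])                  # initial state distribution
A   = np.array([[0.7, 0.3],                 # A[j, k] = p(z_n = k | z_{n-1} = j)
                [0.4, 0.6]])
phi = np.array([[0.9, 0.1],                 # phi[k, v] = p(x_n = v | z_n = k)
                [0.2, 0.8]])
obs = [0, 0, 1, 0, 1]                       # an example observation sequence

def likelihood(obs):
    # Forward algorithm (sum-product): p(X | model), summing over all state sequences.
    alpha = pi * phi[:, obs[0]]
    for v in obs[1:]:
        alpha = (alpha @ A) * phi[:, v]     # alpha_n(k) = sum_j alpha_{n-1}(j) A_jk p(x_n | k)
    return alpha.sum()

def viterbi(obs):
    # Viterbi algorithm (max-product): most probable state sequence.
    delta = pi * phi[:, obs[0]]
    back = []
    for v in obs[1:]:
        trans = delta[:, None] * A          # trans[j, k] = delta_{n-1}(j) * A_jk
        back.append(trans.argmax(axis=0))   # best predecessor for each state k
        delta = trans.max(axis=0) * phi[:, v]
    path = [int(delta.argmax())]
    for b in reversed(back):                # backtrack through the stored pointers
        path.append(int(b[path[-1]]))
    return path[::-1]

print(likelihood(obs))
print(viterbi(obs))

(Baum-Welch, task 3, wraps this forward-backward machinery inside the EM loop discussed earlier.)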
Topics of This Lecture
• Learning Bayesian Networks
Learning with known structure, full observability
Learning with known structure, partial observability
Structure learning
• Models for Sequential Data
Independence assumptions
Markov models
• Hidden Markov Models (HMMs)
Traditional view
Graphical Model view
• Extensions
Discussion
• Limitations of HMMs
Representation of the times for which the system remains in a
given state:
p(T) = (A_{kk})^T (1 - A_{kk}) \propto \exp(T \ln A_{kk})
Not appropriate for certain applications.
Poor at capturing long-range correlations between the observed
variables due to restriction to Markov chains.
Generative model spends much of its effort on modeling the
data distribution p(X,Z), even though the data will be given at
test time.
Extensions with discriminative training, CRFs
• With the Graphical Model background, we can now think
of extensions…
HMM Extensions
• Autoregressive HMM
Can better capture longer-range
correlations between observed
variables.
Still simple probabilistic structure
• Input-Output HMM
Targeted at supervised learning for
sequential data
Used, e.g., for articulated tracking
[Gammeter et al., ECCV’08]
Image source: C.M. Bishop, 2006
HMM Extensions (2)
• Factorial HMM
Multiple independent chains of
latent variables that influence the
observations.
Trades off larger #variables vs.
smaller #latent states.
However, more complex to train
(needs approximate inference)
Image source: C.M. Bishop, 2006
Other Extensions…
• Up to now…
Assumption that the zn are discrete.
We can also make them continuous to model the state with a
parametric distribution.
Using Gaussian distributions and linear dynamics, we get
Kalman filters.
Image source: C.M. Bishop, 2006
Tracking with Linear Dynamic Systems
• Idea:
Track the state of an object or variable over time by making
predictions and applying corrections.
Image source: C.M. Bishop, 2006
Tracking with Linear Dynamic Systems
Radar-based tracking
of multiple targets
Visual tracking of
articulated
objects
(L. Sigal et al., 2006)
• Many applications in different domains…
Slide adapted from Erik Sudderth
Extensions: Conditional Random Fields
• HMM is a generative model
Goal: model the joint distribution p(X,Z)
Interpretation of the directed links: conditional probabilities
• Limitations
To define the joint probability, we need to enumerate all possible observation sequences.
Not practical to represent multiple interacting features or long-range dependencies.
[Figure: HMM graphical model, modelling the joint p(X, Z)]
Image source: C.M. Bishop, 2006
Conditional Random Fields
• Alternative: conditional model
Idea: model the conditional distribution p(Z|X) instead.
Specify the probabilities of possible label sequences given an
observation sequence.
• Advantages
Does not expend effort on the observations, which are fixed
anyway at test time.
Interpretation of the factors: feature functions, more flexible
[Figure: CRF graphical model, modelling the conditional p(Z|X)]
Image source: C.M. Bishop, 2006
Conditional Random Fields
• Applications
First used for natural language interpretation
Replaced HMMs for some tasks.
Learned using iterative estimation algorithms.
• In the meantime, also many applications in vision
Image segmentation
Object recognition
Context modeling
Scene categorization
Gesture recognition
…
Used mainly instead of MRFs (solved with Graph Cuts).
[He, Zemel, Carreira-Perpinan, 2004]
References and Further Reading
• A thorough introduction to Graphical Models in general
and Bayesian Networks in particular can be found in
Chapter 8 of Bishop’s book.
Christopher M. Bishop
Pattern Recognition and Machine Learning
Springer, 2006
• HMMs and their interpretation as graphical models are
described in detail in Chapter 13.
• Original CRF paper:
J. Lafferty, A. McCallum, F. Pereira, Conditional Random Fields: Probabilistic
Models for Segmenting and Labeling Sequence Data, ICML, 2001.