Dynamic Programming
& Hidden Markov Models.
Alan Yuille
Dept. of Statistics, UCLA
Goal of this Talk
• This talk introduces one of the major
algorithms: dynamic programming (DP).
• It then describes how DP can be used in
conjunction with EM for learning.
Dynamic Programming
• Dynamic Programming exploits the
graphical structure of the probability
distribution. It can be applied to any
structure without closed loops.
• Consider the two-headed coin example
given in Tom Griffiths' talk (Monday).
Probabilistic Grammars
• By the Markov Condition:
• Hence we can exploit the graphical
structure to efficiently compute:
The structure means that the sum over x2 drops out. We need
only sum over x1 and x3. Only four operations instead of eight.
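The saving from pushing sums inside the product can be checked directly in code. Below is a minimal sketch with made-up probability tables for a binary chain x1 -> x2 -> x3, comparing the brute-force sum over all joint states against the factored sum that eliminates one variable at a time:

```python
import itertools

# Hypothetical numbers for a binary chain x1 -> x2 -> x3.
p_x1 = {0: 0.6, 1: 0.4}
p_x2_given_x1 = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}  # p[x1][x2]
p_x3_given_x2 = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}  # p[x2][x3]

def joint(x1, x2, x3):
    # Markov condition: the joint factorizes along the chain.
    return p_x1[x1] * p_x2_given_x1[x1][x2] * p_x3_given_x2[x2][x3]

# Brute force: sum the joint over all 2^2 settings of (x1, x2) to get P(x3=1).
brute = sum(joint(x1, x2, 1) for x1, x2 in itertools.product([0, 1], repeat=2))

# Factored: first marginalize x1 to get P(x2), then sum over x2 only.
p_x2 = {x2: sum(p_x1[x1] * p_x2_given_x1[x1][x2] for x1 in (0, 1))
        for x2 in (0, 1)}
factored = sum(p_x2[x2] * p_x3_given_x2[x2][1] for x2 in (0, 1))

assert abs(brute - factored) < 1e-12
```

For three binary variables the saving is small, but on a chain of length n the factored computation is linear in n while the brute-force sum is exponential.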
Dynamic Programming Intuition
• Suppose you wish to travel to Boston from Los
Angeles by car.
• To determine the cost of going via Chicago,
you only need to calculate the shortest cost from
Los Angeles to Chicago and then, independently,
the shortest cost from Chicago to Boston.
• Decomposing the route in this way gives an
efficient algorithm which is polynomial in the
number of nodes and feasible for computation.
Dynamic Programming Diamond
• Compute the shortest cost from A to B.
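A sketch of the diamond computation, with hypothetical edge costs: processing nodes in a topological order, the cheapest cost to each node depends only on the cheapest costs to its predecessors, so each edge is examined once.

```python
# Hypothetical edge costs for the diamond graph A -> {C1, C2} -> B.
edges = {
    'A': {'C1': 3, 'C2': 5},
    'C1': {'B': 4},
    'C2': {'B': 1},
    'B': {},
}

def shortest_cost(start, goal, order):
    # cost[v] = cheapest cost from start to v, filled in topological order.
    cost = {v: float('inf') for v in order}
    cost[start] = 0
    for u in order:
        for v, w in edges[u].items():
            cost[v] = min(cost[v], cost[u] + w)
    return cost[goal]

print(shortest_cost('A', 'B', ['A', 'C1', 'C2', 'B']))  # prints 6 (A -> C2 -> B)
```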
Application to a 1-dim chain.
• Consider a distribution defined on a 1-dim chain.
• Important property: directed and undirected graphs are
equivalent (for a 1-dim chain).
P(A,B) = P(A|B) P(B)
or P(A,B) = P(B|A) P(A)
• For these simple graphs with two nodes, you cannot
distinguish causation from correlation without intervention
(Wu's lecture Friday).
• For this lecture we will treat a simple one-dimensional chain and
cover directed and undirected models simultaneously. (Translating
between directed and undirected is generally possible for
graphs without closed loops, but has subtleties.)
Probability distribution on 1-D chain
• (Proof by induction).
1-Dim Chain
• We can also use DP to compute other
properties: e.g. to convert the distribution
from undirected form:
• To directed form:
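The conversion can be sketched as follows, using hypothetical pairwise potentials on a binary chain. Backward messages b_i(x) sum out the suffix of the chain; from them we read off P(x1) and each conditional P(x_{i+1} | x_i), and the resulting directed product reproduces the normalized undirected distribution:

```python
import itertools

# Sketch: convert an undirected binary chain P ∝ prod_i psi(x_i, x_{i+1})
# into directed form P(x1) prod_i P(x_{i+1} | x_i) via backward messages.
psi = [[2.0, 1.0], [1.0, 3.0]]  # hypothetical potential, same at every link
n = 4

# Backward messages: b_i(x) = sum over x_{i+1}..x_n of the remaining factors.
b = [[1.0, 1.0]]  # b_n
for _ in range(n - 1):
    prev = b[0]
    b.insert(0, [sum(psi[x][y] * prev[y] for y in (0, 1)) for x in (0, 1)])

Z = sum(b[0])
p1 = [b[0][x] / Z for x in (0, 1)]          # P(x1)

def cond(i, x, y):                          # P(x_{i+1}=y | x_i=x)
    return psi[x][y] * b[i + 1][y] / b[i][x]

# Check: the directed product equals the normalized undirected product.
for xs in itertools.product([0, 1], repeat=n):
    undirected = 1.0
    for i in range(n - 1):
        undirected *= psi[xs[i]][xs[i + 1]]
    directed = p1[xs[0]]
    for i in range(n - 1):
        directed *= cond(i, xs[i], xs[i + 1])
    assert abs(directed - undirected / Z) < 1e-12
```

The telescoping of the b_i factors in the product is what makes each conditional properly normalized.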
Special Case: 1-D Ising Spin Model
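For the 1-D Ising model the pairwise potential becomes a transfer matrix, and the same forward recursion computes the partition function in time linear in the chain length. A minimal sketch with a made-up coupling J (inverse temperature absorbed) and free boundary conditions:

```python
import itertools
import math

# 1-D Ising chain: energy -J * sum_i s_i * s_{i+1}, spins in {-1, +1}.
J, n = 0.5, 6
spins = (-1, 1)
T = [[math.exp(J * s * t) for t in spins] for s in spins]  # transfer matrix

# DP forward recursion over the chain: O(n) work.
m = [1.0, 1.0]
for _ in range(n - 1):
    m = [sum(m[a] * T[a][b] for a in range(2)) for b in range(2)]
Z_dp = sum(m)

# Brute-force check: sum exp(J * sum s_i s_{i+1}) over all 2^n configurations.
Z_brute = sum(
    math.exp(J * sum(s[i] * s[i + 1] for i in range(n - 1)))
    for s in itertools.product(spins, repeat=n)
)
assert abs(Z_dp - Z_brute) < 1e-9
```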
Dynamic Programming Summary
• Dynamic Programming can be applied to perform
inference on all graphical models defined on trees.
– The key insight is that, for trees, we can define
an order on the nodes (not necessarily unique)
and process nodes in sequence (never needing to
return to a node that has already been
processed).
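A sketch of this ordering idea on a small tree with hypothetical potentials: eliminating leaves first, each leaf sends a message (a summed-out factor) to its neighbor, and no node is ever revisited.

```python
import itertools

# Star-shaped tree: edges 1-2, 2-3, 2-4, binary states,
# P ∝ prod over edges of psi(x_u, x_v) with a hypothetical symmetric psi.
psi = [[2.0, 1.0], [1.0, 3.0]]
edges = [(1, 2), (2, 3), (2, 4)]

# Brute-force partition function over all 2^4 assignments.
Z_brute = 0.0
for xs in itertools.product([0, 1], repeat=4):
    x = dict(zip([1, 2, 3, 4], xs))
    w = 1.0
    for u, v in edges:
        w *= psi[x[u]][x[v]]
    Z_brute += w

# DP: process the leaves 1, 3, 4 in sequence; each sends a message to node 2.
msg = [1.0, 1.0]
for leaf in (1, 3, 4):
    m = [sum(psi[xl][x2] for xl in (0, 1)) for x2 in (0, 1)]
    msg = [msg[x2] * m[x2] for x2 in (0, 1)]
Z_dp = sum(msg)

assert abs(Z_brute - Z_dp) < 1e-9
```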
Extensions of Dynamic Programming:
• What to do if you have a graph with closed
loops?
• There are a variety of advanced ways to exploit
the graphical structure and obtain efficient exact
algorithms.
• Prof. Adnan Darwiche (CS, UCLA) is an expert on
this topic. There will be an introduction to his
SamIam code.
• We can also use approximate methods like belief
propagation (BP).
Junction Trees.
• It is also possible to take a probability distribution
defined on a graph with closed loops and
reformulate it as a distribution on a new set of nodes
without closed loops (Lauritzen and Spiegelhalter
1990).
• This leads to a variety of algorithms generally known
as junction trees.
• This is not a universal solution, because the
resulting new graphs may have too many nodes to
make them practical.
• Google "junction trees" to find nice tutorials on
junction trees.
Graph Conversion
• Convert graph by a set of transformations.
Triangles & Augmented Variables
• From triangles to ordered triangles.
Original Variables: Loops
Augmented Variables: No Loops
Summary of Dynamic Programming.
• Dynamic Programming can be used to efficiently compute
properties of a distribution for graphs defined on trees.
• Directed graphs on trees can be reformulated as
undirected graphs on trees, and vice versa.
• DP can be extended to apply to graphs with closed loops
by restructuring the graphs (junction trees).
• It is an active research area to determine efficient
inference algorithms which exploit the graphical structures
of these models.
• Relationship between DP and reinforcement learning (week
2).
• DP and A*. DP and pruning.
HMMs: Learning and Inference
• So far we have considered inference only.
• This assumes that the model is known.
• How can we learn the model?
• For 1-D models, this uses DP and EM.
A simple HMM for Coin Tossing
• Two coins, one biased and the other fair, with the coins
switched occasionally.
• The observation 0,1 is whether the coin shows heads or
tails.
• The hidden state A,B is which coin is used.
• There are unknown transition probabilities between the
hidden states A and B, and unknown probabilities for the
observations conditioned on the hidden states.
• The learning task is to estimate these probabilities from
a sequence of measurements.
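The inference step that EM relies on is itself dynamic programming: the forward algorithm. A sketch for the two-coin model, with made-up parameter values (in the learning task these would be the unknowns re-estimated by EM), checked against a brute-force sum over all hidden paths:

```python
import itertools

# Hypothetical parameters for the two-coin HMM:
# hidden states A (fair) and B (biased); observations 0 = tails, 1 = heads.
states = ('A', 'B')
start = {'A': 0.5, 'B': 0.5}
trans = {'A': {'A': 0.9, 'B': 0.1}, 'B': {'A': 0.2, 'B': 0.8}}  # switching
emit = {'A': {0: 0.5, 1: 0.5}, 'B': {0: 0.1, 1: 0.9}}           # coin biases

obs = [1, 1, 0, 1, 1, 1]

# Forward algorithm (DP): alpha[s] = P(observations so far, hidden state = s).
alpha = {s: start[s] * emit[s][obs[0]] for s in states}
for o in obs[1:]:
    alpha = {s: sum(alpha[r] * trans[r][s] for r in states) * emit[s][o]
             for s in states}
likelihood = sum(alpha.values())

# Brute-force check: sum over all 2^T hidden paths (exponential in T).
brute = 0.0
for path in itertools.product(states, repeat=len(obs)):
    p = start[path[0]] * emit[path[0]][obs[0]]
    for t in range(1, len(obs)):
        p *= trans[path[t - 1]][path[t]] * emit[path[t]][obs[t]]
    brute += p

assert abs(likelihood - brute) < 1e-12
```

EM (Baum-Welch) would alternate between computing expected hidden-state counts using these forward quantities (together with a matching backward pass) and re-estimating `trans` and `emit` from those counts.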
HMM for Speech
HMM Summary
• HMMs define a class of Markov models with
hidden variables, used for speech
recognition and many other applications.
• Tasks involving HMMs include learning,
inference, and model selection.
• These can often be performed by
algorithms based on EM and DP.