An Introduction to Probabilistic Graphical Models.


Courseware for "Introduction to Artificial Intelligence", College of Computer Science, Zhejiang University
Lecture 10
Introduction to Probabilistic Graphical Models
Chapter 10: Introduction to Probabilistic Graphical Models
Weike Pan and Congfu Xu
{panweike, xucongfu}@zju.edu.cn
Institute of Artificial Intelligence
College of Computer Science, Zhejiang University
October 12, 2006
References
• Michael I. Jordan. An Introduction to Probabilistic Graphical Models.
  http://www.cs.berkeley.edu/~jordan/graphical.html
Outline
Preparations
Probabilistic Graphical Models (PGM)
Directed PGM
Undirected PGM
Insights of PGM
Outline
• Preparations
  • PGM “is” a universal model
  • Different schools of thought in machine learning
  • Different training approaches
  • Different data types
  • Bayesian Framework
  • Chain rule of probability theory
  • Conditional Independence
• Probabilistic Graphical Models (PGM)
• Directed PGM
• Undirected PGM
• Insights of PGM
Different schools of thought in machine learning
• Statistics (modeling uncertainty, detailed information)
  vs.
  Logic (modeling complexity, high-level information)
• Unifying Logical and Statistical AI. Pedro Domingos, University of Washington. AAAI 2006.
• Speech: statistical information (acoustic model + language model + affect model…) + high-level information (experts/logic)
Different training approaches
• Maximum likelihood training: MAP (Maximum a Posteriori)
  vs.
  Discriminative training: maximum margin (SVM)
• Speech: the classical combination is maximum likelihood + discriminative training
Different data types
• Directed acyclic graph (Bayesian Networks, BN)
  • Modeling asymmetric effects and dependencies: causal/temporal dependence (e.g. speech analysis, DNA sequence analysis…)
• Undirected graph (Markov Random Fields, MRF)
  • Modeling symmetric effects and dependencies: spatial dependence (e.g. image analysis…)
PGM “is” a universal model
• To model both temporal and spatial data, by unifying
  • Thoughts: statistics + logic
  • Approaches: maximum likelihood training + discriminative training
• Furthermore, the directed and undirected models together provide modeling power beyond that which either could provide alone.
Bayesian Framework
• Problem description: Observation → Conclusion (classification or prediction)
• Bayes rule:

$$P(c_i \mid O) = \frac{P(O \mid c_i)\, P(c_i)}{P(O)}$$

where $O$ is the observation and $c_i$ is class $i$; $P(O \mid c_i)$ is the likelihood, $P(c_i)$ the prior probability, $P(c_i \mid O)$ the posterior probability, and $P(O)$ the normalization factor.

What we care about is the conditional probability, and it is a ratio of two marginal probabilities.
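As a concrete illustration, here is a minimal Python sketch of Bayes-rule classification; the prior and likelihood values are made up for the example, not taken from the lecture:

```python
# Minimal Bayes-rule classification sketch; the numbers are hypothetical.
priors = {"c1": 0.7, "c2": 0.3}          # P(c_i), the prior probabilities
likelihoods = {"c1": 0.2, "c2": 0.9}     # P(O | c_i) for one observation O

# Normalization factor: P(O) = sum_i P(O | c_i) P(c_i)
evidence = sum(likelihoods[c] * priors[c] for c in priors)

# Posterior: P(c_i | O) = P(O | c_i) P(c_i) / P(O)
posteriors = {c: likelihoods[c] * priors[c] / evidence for c in priors}
print(posteriors)   # {'c1': 0.3414..., 'c2': 0.6585...}
```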
Chain rule of probability theory
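The chain rule makes no independence assumptions: for any ordering of the variables,

$$p(x_1, \ldots, x_n) = \prod_{i=1}^{n} p(x_i \mid x_1, \ldots, x_{i-1})$$

A graphical model then shrinks each conditioning set from all predecessors down to a small subset (the parents).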
Conditional Independence
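$X$ and $Z$ are conditionally independent given $Y$, written $X \perp Z \mid Y$, when

$$p(x, z \mid y) = p(x \mid y)\, p(z \mid y),$$

or equivalently $p(x \mid y, z) = p(x \mid y)$: once $Y$ is known, $Z$ carries no further information about $X$.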
Outline
Preparations
Probabilistic Graphical Models (PGM)
Directed PGM
Undirected PGM
Insights of PGM
PGM
• Nodes represent random variables/states
• The missing arcs represent conditional independence assumptions
• The graph structure implies the decomposition of the joint probability distribution
Directed PGM (BN)
Probability Distribution
Representation
Queries
Implementation
Conditional Independence
Interpretation
Probability Distribution
Local functions attached to the nodes must be nonnegative and normalized:

$$f_i(x_i, x_{\pi_i}) \ge 0, \qquad \sum_{x_i} f_i(x_i, x_{\pi_i}) = 1,$$

where $\pi_i$ denotes the parents of node $i$.
Definition of the joint probability distribution:

$$p(x) = \prod_i f_i(x_i, x_{\pi_i})$$

Check: under the two constraints above, $p(x)$ is indeed a valid joint distribution, and each local function turns out to be a conditional probability: $f_i(x_i, x_{\pi_i}) = p(x_i \mid x_{\pi_i})$.
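To make the factorization concrete, here is a minimal Python sketch, assuming a hypothetical three-node chain $x \to y \to z$ with made-up CPT values (not an example from the lecture):

```python
# Joint probability of a BN as a product of local conditionals:
# p(x, y, z) = p(x) * p(y | x) * p(z | y)  for the chain x -> y -> z.
p_x = {0: 0.6, 1: 0.4}                       # p(x)
p_y_given_x = {0: {0: 0.9, 1: 0.1},          # p(y | x)
               1: {0: 0.2, 1: 0.8}}
p_z_given_y = {0: {0: 0.7, 1: 0.3},          # p(z | y)
               1: {0: 0.5, 1: 0.5}}

def joint(x, y, z):
    return p_x[x] * p_y_given_x[x][y] * p_z_given_y[y][z]

# The factors are normalized, so the joint sums to one:
total = sum(joint(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1))
print(joint(0, 1, 1), total)   # 0.03 1.0
```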
Representation
Graphical models represent joint probability distributions more economically, using
a set of “local” relationships among variables.
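As a standard counting illustration (our example, not on the slide): a full joint table over $n$ binary variables has $2^n - 1$ free parameters, whereas a chain-structured BN

$$p(x_1, \ldots, x_n) = p(x_1) \prod_{i=2}^{n} p(x_i \mid x_{i-1})$$

needs only $1 + 2(n-1)$, one free parameter per configuration of each factor's parent.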
Conditional Independence (basic)
• Interpret missing edges in terms of conditional independence.
• Assert the conditional independence of a node from its ancestors, conditional on its parents.
Conditional Independence (3 canonical graphs)
• Classical Markov chain (X → Y → Z): “past”, “present”, “future”; X ⊥ Z | Y.
• Common cause (X ← Y → Z): Y “explains” all the dependencies between X and Z; again X ⊥ Z | Y.
• Common effect (X → Y ← Z): multiple, competing explanations. Here the roles are reversed: X and Z are marginally independent but become dependent once Y is observed. From the factorization

$$p(x, y, z) = p(x)\, p(z)\, p(y \mid x, z),$$

marginalizing out $y$ gives

$$p(x, z) = \sum_y p(x, y, z) = p(x)\, p(z),$$

so marginal independence $p(x, z) = p(x)\, p(z)$ holds, while $p(x, z \mid y)$ does not factor in general.
Conditional Independence (check)
Bayes ball algorithm (rules):
• One incoming arrow and one outgoing arrow (chain): the ball passes through when the middle node is hidden and is blocked when it is observed.
• Two outgoing arrows (common cause): same rule as the chain.
• Two incoming arrows (common effect): the reverse rule; the ball is blocked when the middle node is hidden and passes through when it (or one of its descendants) is observed.
• Check conditional independence through reachability: X ⊥ Z | Y holds exactly when no ball can travel from X to Z.
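The reachability check can be implemented compactly via the moral ancestral graph, a standard criterion that gives the same answers as Bayes ball; the sketch below is a minimal Python version under that assumption, with graph and function names of our own choosing:

```python
from collections import defaultdict, deque

def d_separated(parents, x, z, given):
    """Return True if x and z are d-separated given the set `given`.

    `parents` maps each node to a list of its parents. Uses the
    moral-ancestral-graph criterion, equivalent to the Bayes ball check.
    """
    # 1. Keep only the ancestors of x, z, and the conditioning set.
    relevant, stack = set(), [x, z, *given]
    while stack:
        n = stack.pop()
        if n not in relevant:
            relevant.add(n)
            stack.extend(parents.get(n, []))
    # 2. Moralize: connect each node to its parents, marry co-parents,
    #    then drop all edge directions.
    nbrs = defaultdict(set)
    for child in relevant:
        ps = [p for p in parents.get(child, []) if p in relevant]
        for p in ps:
            nbrs[p].add(child)
            nbrs[child].add(p)
        for i, p in enumerate(ps):
            for q in ps[i + 1:]:
                nbrs[p].add(q)
                nbrs[q].add(p)
    # 3. Delete observed nodes and test whether z is reachable from x.
    seen, queue = {x}, deque([x])
    while queue:
        n = queue.popleft()
        if n == z:
            return False               # reachable => dependent
        for m in nbrs[n]:
            if m not in seen and m not in given:
                seen.add(m)
                queue.append(m)
    return True                        # unreachable => d-separated

# Two of the canonical graphs from the slides, over nodes x, y, z:
chain = {"x": [], "y": ["x"], "z": ["y"]}          # x -> y -> z
vstruct = {"x": [], "z": [], "y": ["x", "z"]}      # x -> y <- z
print(d_separated(chain, "x", "z", {"y"}))         # True:  blocked by y
print(d_separated(vstruct, "x", "z", set()))       # True:  marginally indep.
print(d_separated(vstruct, "x", "z", {"y"}))       # False: explaining away
```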
Outline
Preparations
Probabilistic Graphical Models (PGM)
Directed PGM
Undirected PGM
Insights of PGM
Undirected PGM (MRF)
Probability Distribution
Representation
Queries
Implementation
Conditional Independence
Interpretation
Probability Distribution (1)
• Clique
  • A clique of a graph is a fully-connected subset of nodes.
  • Local functions should not be defined on domains of nodes that extend beyond the boundaries of cliques.
• Maximal cliques
  • The maximal cliques of a graph are the cliques that cannot be extended to include additional nodes without losing the property of being fully connected.
  • We restrict ourselves to maximal cliques without loss of generality, as they capture all possible dependencies.
• Potential function (local parameterization)
  • $\psi_{X_C}(x_C)$: a potential function on the possible realizations $x_C$ of the maximal clique $X_C$
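The “without loss of generality” above rests on a standard absorption argument: a potential defined on a non-maximal clique can always be multiplied into the potential of a maximal clique that contains it, e.g. for $\{x, y\} \subset \{x, y, z\}$:

$$\tilde{\psi}_{\{x,y,z\}}(x, y, z) = \psi_{\{x,y\}}(x, y)\, \psi_{\{x,y,z\}}(x, y, z)$$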
Probability Distribution (2)
[Figure: an example undirected graph with its maximal cliques highlighted]
Probability Distribution (3)
• Joint probability distribution:

$$p(x) = \frac{1}{Z} \prod_C \psi_{X_C}(x_C), \qquad Z = \sum_x \prod_C \psi_{X_C}(x_C) \quad \text{(normalization factor)}$$

• Boltzmann distribution: writing each potential as $\psi_{X_C}(x_C) = \exp\{-H_C(x_C)\}$,

$$p(x) = \frac{1}{Z} \prod_C \exp\{-H_C(x_C)\} = \frac{1}{Z} \exp\Big\{-\sum_C H_C(x_C)\Big\} = \frac{1}{Z} \exp\{-H(x)\},$$

where $H(x) = \sum_C H_C(x_C)$ and $Z = \sum_x \exp\{-H(x)\}$.
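As a sanity check of these definitions, here is a brute-force Python sketch for a tiny hypothetical MRF (a chain a – b – c of binary variables with two maximal cliques; the potential tables are made up):

```python
import itertools

# Hypothetical MRF on binary a - b - c with maximal cliques {a, b}, {b, c}.
# psi tables: psi[(i, j)] is the potential value for that realization.
psi_ab = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}
psi_bc = {(0, 0): 1.5, (0, 1): 0.5, (1, 0): 1.0, (1, 1): 2.0}

def unnormalized(a, b, c):
    return psi_ab[(a, b)] * psi_bc[(b, c)]     # prod_C psi_C(x_C)

# Z = sum over all configurations of the product of clique potentials.
Z = sum(unnormalized(a, b, c)
        for a, b, c in itertools.product((0, 1), repeat=3))

def p(a, b, c):
    return unnormalized(a, b, c) / Z           # p(x) = (1/Z) prod_C psi_C(x_C)

print(Z, p(1, 1, 1))                           # 18.0 0.333...
# The probabilities sum to one by construction:
print(sum(p(*x) for x in itertools.product((0, 1), repeat=3)))  # 1.0
```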
Conditional Independence
Checking $X_A \perp X_C \mid X_B$ in an undirected model is a pure “reachability” problem in graph theory: delete the nodes in $X_B$; the conditional independence holds exactly when no path connects $X_A$ and $X_C$.
Representation
Outline
Preparations
Probabilistic Graphical Models (PGM)
Directed PGM
Undirected PGM
Insights of PGM
Insights of PGM (Michael I. Jordan)
• Probabilistic graphical models are a marriage between probability theory and graph theory.
• A graphical model can be thought of as a probabilistic database, a machine that can answer “queries” regarding the values of sets of random variables.
• We build up the database in pieces, using probability theory to ensure that the pieces have a consistent overall interpretation. Probability theory also justifies the inferential machinery that allows the pieces to be put together “on the fly” to answer the queries.
• In principle, all “queries” of a probabilistic database can be answered if we have in hand the joint probability distribution.
Insights of PGM (data structure & algorithm)
• A graphical model is a natural/perfect tool for representation (the data structure) and inference (the algorithm).
Thanks!