An Introduction to Probabilistic Graphical Models.
Lecture slides for Introduction to Artificial Intelligence, College of Computer Science, Zhejiang University
Lecture 10
Introduction to Probabilistic Graphical Models
Chapter 10 Introduction to Probabilistic
Graphical Models
Weike Pan and Congfu Xu
{panweike, xucongfu}@zju.edu.cn
Institute of Artificial Intelligence
College of Computer Science, Zhejiang University
October 12, 2006
References
An Introduction to Probabilistic Graphical Models. Michael I. Jordan.
http://www.cs.berkeley.edu/~jordan/graphical.html
Outline
Preparations
Probabilistic Graphical Models (PGM)
Directed PGM
Undirected PGM
Insights of PGM
Outline
Preparations
PGM “is” a universal model
Different schools of thought in machine learning
Different training approaches
Different data types
Bayesian Framework
Chain rule of probability theory
Conditional Independence
Probabilistic Graphical Models (PGM)
Directed PGM
Undirected PGM
Insights of PGM
Different schools of thought in machine learning
Statistics (modeling uncertainty, detailed information)
vs.
Logic (modeling complexity, high-level information)
Unifying Logical and Statistical AI. Pedro Domingos, University of Washington. AAAI 2006.
Speech: statistical information (acoustic model + language model + affect model…) + high-level information (expert knowledge/logic)
Different training approaches
Maximum Likelihood Training / MAP (Maximum a Posteriori)
vs.
Discriminative Training: Maximum Margin (e.g. SVM)
Speech: the classical combination is Maximum Likelihood + Discriminative Training
Different data types
Directed acyclic graph (Bayesian Networks, BN)
Modeling asymmetric effects and dependencies: causal/temporal
dependence (e.g. speech analysis, DNA sequence analysis…)
Undirected graph (Markov Random Fields, MRF)
Modeling symmetric effects and dependencies: spatial
dependence (e.g. image analysis…)
PGM “is” a universal model
To model both temporal and spatial data, by unifying
Thoughts: Statistics + Logic
Approaches: Maximum Likelihood Training + Discriminative
Training
Furthermore, the directed and undirected models together provide modeling power beyond that which either could provide alone.
Bayesian Framework
Problem description
Observation → Conclusion (classification or prediction)
Bayes rule
P(c_i \mid O) = \frac{P(O \mid c_i)\, P(c_i)}{P(O)}
where O is the observation, c_i is class i, P(O \mid c_i) is the likelihood, P(c_i) is the prior probability, P(c_i \mid O) is the posterior probability, and P(O) is the normalization factor.
What we care about is the conditional (posterior) probability, and it is a ratio of two marginal probabilities.
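As a minimal numeric sketch of the rule above (the class names and all probability values below are made-up illustrative numbers):

# Posterior over two hypothetical classes c1, c2 for one fixed observation O.
priors = {"c1": 0.7, "c2": 0.3}          # P(c_i)
likelihood = {"c1": 0.2, "c2": 0.6}      # P(O | c_i)
# Normalization factor: P(O) = sum_i P(O | c_i) P(c_i)
p_obs = sum(likelihood[c] * priors[c] for c in priors)
# Bayes rule: P(c_i | O) = P(O | c_i) P(c_i) / P(O)
posterior = {c: likelihood[c] * priors[c] / p_obs for c in priors}
print(posterior)                         # {'c1': 0.4375, 'c2': 0.5625}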
Chain rule of probability theory
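The chain rule factorizes any joint distribution into a product of conditionals, with no independence assumptions:
p(x_1, \ldots, x_n) = \prod_{i=1}^{n} p(x_i \mid x_1, \ldots, x_{i-1})
For example, p(x_1, x_2, x_3) = p(x_1)\, p(x_2 \mid x_1)\, p(x_3 \mid x_1, x_2).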
Conditional Independence
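X is conditionally independent of Z given Y, written X ⊥ Z | Y, when
p(x, z \mid y) = p(x \mid y)\, p(z \mid y), equivalently p(x \mid y, z) = p(x \mid y).
Graphical models encode exactly such statements in their structure.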
Outline
Preparations
Probabilistic Graphical Models (PGM)
Directed PGM
Undirected PGM
Insights of PGM
PGM
Nodes represent random variables/states
The missing arcs represent conditional independence assumptions
The graph structure implies the decomposition of the joint probability distribution
Directed PGM (BN)
Probability Distribution
Representation
Queries
Implementation
Conditional Independence
Interpretation
Probability Distribution
Each node i has a local conditional function f_i that satisfies
f_i(x_i, x_{\pi_i}) \ge 0, \qquad \sum_{x_i} f_i(x_i, x_{\pi_i}) = 1
where \pi_i denotes the parents of node i.
Definition of the joint probability distribution:
p(x) = \prod_i f_i(x_i, x_{\pi_i})
Check: the product is nonnegative and sums to one, so it is a valid joint distribution, and f_i(x_i, x_{\pi_i}) = p(x_i \mid x_{\pi_i}).
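A minimal sketch of this definition in Python, for a hypothetical two-node network X1 → X2 with made-up conditional probability tables:

# Local conditionals f_i(x_i, x_pi_i) = p(x_i | x_pi_i); numbers are invented.
p_x1 = {0: 0.6, 1: 0.4}                      # p(x1), no parents
p_x2_given_x1 = {0: {0: 0.9, 1: 0.1},        # p(x2 | x1 = 0)
                 1: {0: 0.3, 1: 0.7}}        # p(x2 | x1 = 1)
# Joint distribution as the product of the local conditionals.
joint = {(x1, x2): p_x1[x1] * p_x2_given_x1[x1][x2]
         for x1 in (0, 1) for x2 in (0, 1)}
# Check: every entry is nonnegative and the joint sums to one.
assert all(v >= 0 for v in joint.values())
assert abs(sum(joint.values()) - 1.0) < 1e-12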
Representation
Graphical models represent joint probability distributions more economically, using
a set of “local” relationships among variables.
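For example, a completely general joint distribution over n binary variables needs 2^n − 1 parameters, while a Markov chain factorization p(x_1) \prod_{i=2}^{n} p(x_i \mid x_{i-1}) over the same variables needs only 2n − 1.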
Conditional Independence (basic)
Interpret missing edges in terms of conditional independence
Assert the conditional independence of a node from
its ancestors, conditional on its parents.
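In symbols, for a node X_i with parents X_{\pi_i} and any set X_A of its non-parent ancestors:
p(x_i \mid x_{\pi_i}, x_A) = p(x_i \mid x_{\pi_i})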
Conditional Independence (3 canonical graphs)
Classical Markov chain (X → Y → Z): "past", "present", "future"; X ⊥ Z | Y.
Common cause (X ← Y → Z): Y "explains" all the dependencies between X and Z; X ⊥ Z | Y.
Common effect (X → Y ← Z): multiple, competing explanations; instead of conditional independence we get marginal independence:
p(x, y, z) = p(x)\, p(z)\, p(y \mid x, z)
p(x, z) = \sum_y p(x)\, p(z)\, p(y \mid x, z) = p(x)\, p(z)
so X ⊥ Z a priori, but X and Z become dependent once Y is observed.
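The common-effect case can be verified numerically. A sketch with invented CPTs (X and Z are fair coins, Y is a noisy OR of them):

# Explaining away in X -> Y <- Z.
p = {}
for x in (0, 1):
    for z in (0, 1):
        p_y1 = 0.9 if (x or z) else 0.1           # p(y = 1 | x, z), made up
        for y in (0, 1):
            p[(x, y, z)] = 0.5 * 0.5 * (p_y1 if y else 1 - p_y1)
# Marginal independence: p(x=1, z=1) = p(x=1) p(z=1) = 0.25.
print(sum(p[(1, y, 1)] for y in (0, 1)))          # 0.25
# But given y = 1, observing z = 1 lowers the probability of x = 1.
p_y1_total = sum(p[(x, 1, z)] for x in (0, 1) for z in (0, 1))
print(sum(p[(1, 1, z)] for z in (0, 1)) / p_y1_total)     # ~0.643
print(p[(1, 1, 1)] / sum(p[(x, 1, 1)] for x in (0, 1)))   # 0.5, "explained away"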
Conditional Independence (check)
Bayes ball algorithm (rules)
The ball's behavior at a node depends on its arrow pattern, one rule per canonical graph:
One incoming arrow and one outgoing arrow (Markov chain)
Two outgoing arrows (common cause)
Two incoming arrows (common effect)
Conditional independence is then checked through reachability: X ⊥ Z | Y holds iff the ball cannot travel from X to Z.
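Bayes ball propagates a ball according to the rules above; as a sketch, the same queries can also be answered with an equivalent construction (ancestral subgraph, moralization, then plain reachability). The function name and graph encoding below are hypothetical:

from collections import deque

def d_separated(parents, xs, zs, given):
    # True iff X ⊥ Z | S in the DAG described by `parents`
    # (a dict mapping each node to a list of its parents).
    # 1. Restrict to the ancestors of the query nodes.
    keep, stack = set(), list(xs | zs | given)
    while stack:
        n = stack.pop()
        if n not in keep:
            keep.add(n)
            stack.extend(parents.get(n, []))
    # 2. Moralize: link each node to its parents, and co-parents to each other.
    adj = {n: set() for n in keep}
    for n in keep:
        ps = parents.get(n, [])
        for a in ps:
            adj[n].add(a); adj[a].add(n)
            for b in ps:
                if a != b:
                    adj[a].add(b)
    # 3. Remove the conditioning set and 4. test undirected reachability.
    seen, queue = set(xs), deque(xs)
    while queue:
        for m in adj[queue.popleft()] - given:
            if m not in seen:
                seen.add(m); queue.append(m)
    return not (seen & zs)

# Chain X -> Y -> Z: blocked by observing Y.
chain = {"Y": ["X"], "Z": ["Y"]}
print(d_separated(chain, {"X"}, {"Z"}, {"Y"}))   # True
print(d_separated(chain, {"X"}, {"Z"}, set()))   # False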
Outline
Preparations
Probabilistic Graphical Models (PGM)
Directed PGM
Undirected PGM
Insights of PGM
Undirected PGM (MRF)
Probability Distribution
Representation
Queries
Implementation
Conditional Independence
Interpretation
Probability Distribution (1)
Clique
A clique of a graph is a fully-connected subset of nodes.
Local functions should not be defined on domains of nodes that extend
beyond the boundaries of cliques.
Maximal cliques
The maximal cliques of a graph are the cliques that cannot be extended to include additional nodes without losing the property of being fully connected.
We restrict ourselves to maximal cliques without loss of generality, as they capture all possible dependencies.
Potential function (local parameterization)
\psi_{X_C}(x_C): a potential function on the possible realizations x_C of the maximal clique X_C
Probability Distribution (2)
Maximal cliques
Probability Distribution (3)
Joint probability distribution:
p(x) = \frac{1}{Z} \prod_C \psi_{X_C}(x_C)
Normalization factor:
Z = \sum_x \prod_C \psi_{X_C}(x_C)
Boltzmann distribution: writing each potential in exponential form, \psi_{X_C}(x_C) = \exp\{-H_C(x_C)\}, gives
p(x) = \frac{1}{Z} \prod_C \exp\{-H_C(x_C)\} = \frac{1}{Z} \exp\{-\sum_C H_C(x_C)\} = \frac{1}{Z} \exp\{-H(x)\}
with H(x) = \sum_C H_C(x_C) and Z = \sum_x \exp\{-H(x)\}.
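A minimal sketch of these formulas for a hypothetical 3-node chain MRF X1 - X2 - X3, whose maximal cliques are {X1, X2} and {X2, X3} (potential values invented):

import itertools

def psi(a, b):
    # Clique potential; equivalently exp{-H(a, b)} with H = -log(psi).
    return 2.0 if a == b else 1.0    # favors neighbors that agree

states = (0, 1)
# Normalization factor Z: sum over all configurations of the potential product.
Z = sum(psi(x1, x2) * psi(x2, x3)
        for x1, x2, x3 in itertools.product(states, repeat=3))

def p(x1, x2, x3):
    return psi(x1, x2) * psi(x2, x3) / Z

# The joint sums to one by construction of Z.
assert abs(sum(p(*c) for c in itertools.product(states, repeat=3)) - 1.0) < 1e-12
print(p(0, 0, 0), p(0, 1, 0))        # agreeing configurations are more probable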
Conditional Independence
In an undirected graph it is a “reachability” problem in graph theory: X_A ⊥ X_C | X_B holds iff every path from a node in A to a node in C passes through B.
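A sketch of that check, assuming the graph is given as an adjacency dict (the helper below is hypothetical, not from the slides):

def separated(adj, a, c, b):
    # X_A ⊥ X_C | X_B iff removing B disconnects A from C.
    seen, stack = set(a), list(a)
    while stack:
        for m in adj[stack.pop()] - b:
            if m not in seen:
                seen.add(m); stack.append(m)
    return not (seen & c)

# Chain 1 - 2 - 3: node 2 separates 1 from 3.
adj = {1: {2}, 2: {1, 3}, 3: {2}}
print(separated(adj, {1}, {3}, {2}))   # True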
Representation
Outline
Preparations
Probabilistic Graphical Models (PGM)
Directed PGM
Undirected PGM
Insights of PGM
Insights of PGM (Michael I. Jordan)
Probabilistic Graphical Models are a marriage between probability theory
and graph theory.
A graphical model can be thought of as a probabilistic database, a machine
that can answer “queries” regarding the values of sets of random variables.
We build up the database in pieces, using probability theory to ensure that
the pieces have a consistent overall interpretation. Probability theory also
justifies the inferential machinery that allows the pieces to be put together
“on the fly” to answer the queries.
In principle, all “queries” of a probabilistic database can be answered if we
have in hand the joint probability distribution.
Insights of PGM (data structure &
algorithm)
A graphical model is a natural/perfect tool for
representation (data structure) and
inference (algorithm).
Thanks!