Transcript BN-02old

Use graphs and not pure logic
• Variables are represented by nodes and dependencies by edges.
• Common in our language: "threads of thought", "lines of reasoning", "connected ideas", "far-fetched arguments".
• Still, capturing the essence of dependence is not an easy task. When modeling causation, association, and relevance, it is hard to distinguish between direct and indirect neighbors.
• If we simply connect every pair of "dependent variables", we will get cliques.
Undirected Graphs Can Represent Independence
Let G = (V, E) be an undirected graph.
For disjoint sets of nodes X, Y, and Z, define I_G(X, Z, Y) to hold if and only if all paths between a node in X and a node in Y pass through a node in Z.
In the textbook, another notation used is <X|Z|Y>_G.
M = { I_G(M1, {F1, F2}, M2), I_G(F1, {M1, M2}, F2) } closed under symmetry
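The separation test I_G(X, Z, Y) can be sketched as a reachability check: delete the nodes in Z and ask whether any path still connects X to Y. A minimal sketch (graph encoding and function name are mine), applied to the two-males/two-females cycle M1-F1-M2-F2:

```python
from collections import deque

def separated(adj, X, Z, Y):
    """Return True iff every path from X to Y passes through a node in Z."""
    blocked = set(Z)
    frontier = deque(n for n in X if n not in blocked)
    seen = set(frontier)
    while frontier:
        node = frontier.popleft()
        if node in Y:
            return False          # found a path that avoids Z
        for nb in adj[node]:
            if nb not in seen and nb not in blocked:
                seen.add(nb)
                frontier.append(nb)
    return True

# 4-cycle: each male is linked to both females and vice versa
adj = {"M1": {"F1", "F2"}, "M2": {"F1", "F2"},
       "F1": {"M1", "M2"}, "F2": {"M1", "M2"}}

print(separated(adj, {"M1"}, {"F1", "F2"}, {"M2"}))  # True:  I_G(M1,{F1,F2},M2)
print(separated(adj, {"F1"}, {"M1", "M2"}, {"F2"}))  # True:  I_G(F1,{M1,M2},F2)
print(separated(adj, {"M1"}, {"F1"}, {"M2"}))        # False: a path remains via F2
```

Both statements in M come out separated, while removing only F1 leaves the path M1-F2-M2 open.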
Dependency Models – Abstraction of Probability Distributions
Recall that composition and contraction are implied.
(namely, Composition holds)
The set of all independence statements defined by (3.11) is called the pairwise basis of G.
These are the independence statements that define the graph.
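The pairwise basis can be enumerated directly: for every pair of non-adjacent nodes alpha, beta, statement (3.11) asserts I(alpha, V − {alpha, beta}, beta). A hedged sketch (function name and graph encoding are mine), using the earlier four-node cycle:

```python
from itertools import combinations

def pairwise_basis(adj):
    """One statement I(a, V - {a, b}, b) per NON-adjacent pair a, b."""
    V = set(adj)
    return [(a, frozenset(V - {a, b}), b)
            for a, b in combinations(sorted(V), 2)
            if b not in adj[a]]

# 4-cycle M1-F1-M2-F2 from the two-males/two-females example
adj = {"M1": {"F1", "F2"}, "M2": {"F1", "F2"},
       "F1": {"M1", "M2"}, "F2": {"M1", "M2"}}

for stmt in pairwise_basis(adj):
    print(stmt)
# Exactly two statements: I(F1, {M1, M2}, F2) and I(M1, {F1, F2}, M2)
```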
Edge minimal and unique.
The set of all independence statements defined by (3.12) is called the neighboring basis of G.
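The neighboring basis states, for each node alpha, that alpha is independent of all non-neighbours given its set of neighbours N(alpha): I(alpha, N(alpha), V − N(alpha) − {alpha}). A hedged sketch (names are mine), again on the four-node cycle:

```python
def neighboring_basis(adj):
    """One statement I(a, N(a), V - N(a) - {a}) per node a."""
    V = set(adj)
    basis = []
    for a in sorted(V):
        rest = V - set(adj[a]) - {a}
        if rest:                # skip nodes adjacent to all other nodes
            basis.append((a, frozenset(adj[a]), frozenset(rest)))
    return basis

adj = {"M1": {"F1", "F2"}, "M2": {"F1", "F2"},
       "F1": {"M1", "M2"}, "F2": {"M1", "M2"}}

for stmt in neighboring_basis(adj):
    print(stmt)
# e.g. I(M1, {F1, F2}, {M2}): M1's neighbours screen it off from M2
```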
Testing I-mapness
Proof: (2) ⇒ (1) ⇒ (3) ⇒ (2).
(2) ⇒ (1): holds because G is an I-map of G0 and G0 is an I-map of P, so G is an I-map of P.
(1) ⇒ (3): true due to the I-mapness of G (by definition).
(3) ⇒ (2):
Insufficiency of Local Tests for Non-Strictly-Positive Probability Distributions
Consider the case X = Y = Z = W. What is a Markov network for it? Is it unique? The intersection property is critical!
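A numeric sketch of this example (encoding and helper name are mine): take four binary variables forced to be equal, so the joint is not strictly positive. Every conditional independence given a single variable holds, yet the intersection step that would combine them into a marginal independence fails:

```python
from itertools import product

# Joint over four binary variables forced to be equal:
# P(0,0,0,0) = P(1,1,1,1) = 1/2, everything else 0 -> NOT strictly positive.
P = {o: (0.5 if len(set(o)) == 1 else 0.0) for o in product((0, 1), repeat=4)}

def cond_indep(P, i, j, cond):
    """Check I(X_i, X_cond, X_j) numerically: P(i,j|c) == P(i|c) P(j|c)."""
    for c_val in product((0, 1), repeat=len(cond)):
        sel = {o: p for o, p in P.items()
               if all(o[k] == v for k, v in zip(cond, c_val))}
        pc = sum(sel.values())
        if pc == 0:
            continue  # conditioning event has probability zero
        for a, b in product((0, 1), repeat=2):
            pij = sum(p for o, p in sel.items() if o[i] == a and o[j] == b) / pc
            pi = sum(p for o, p in sel.items() if o[i] == a) / pc
            pj = sum(p for o, p in sel.items() if o[j] == b) / pc
            if abs(pij - pi * pj) > 1e-12:
                return False
    return True

# Conditioning on any single variable determines all the others, so:
print(cond_indep(P, 0, 3, (1,)))  # True : I(X, {Y}, W)
print(cond_indep(P, 0, 1, (3,)))  # True : I(X, {W}, Y)
# Intersection would now license I(X, {}, {Y, W}) -- but marginally X = W:
print(cond_indep(P, 0, 3, ()))    # False: I(X, {}, W) fails
```

With strict positivity the intersection property would forbid exactly this combination of results, which is why local tests break down here.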
Markov Networks that Represent Probability Distributions (rather than just independence)
The two males and females example
Theorem 6 does not guarantee that every dependency of P will be represented by G. However, one can show the following claim:
Theorem X: Every undirected graph G has a distribution P such that G is a perfect map of P.
(In light of the previous notes, P must have the form of a product over cliques.)
Proof Sketch
Given a graph G, it is sufficient to show that for an independence statement σ = I(α, Z, β) that does NOT hold in G, there exists a probability distribution that satisfies all independence statements that hold in the graph and does not satisfy σ = I(α, Z, β).
Well, simply pick a path in G between α and β that does not contain a node from Z. Define a probability distribution that is a perfect map of this chain and multiply it by arbitrary marginal probabilities on all other nodes, forming P_σ. Now "multiply" all the P_σ (Armstrong relation) to obtain P.
Interesting task (replacing HMW #4): given an undirected graph over binary variables, construct a perfect-map probability distribution. (Note: most Markov random fields are perfect maps!)
Recall:
The set of all independence statements defined by (3.12) was called the neighboring basis of G.
Interesting conclusion of Theorem X: all independence statements that follow, for strictly positive probability, from the neighborhood basis are derivable via symmetry, decomposition, intersection, and weak union. These axioms are sound and complete for neighborhood bases!
The same conclusion holds for pairwise bases. In fact, for saturated statements, independence and separation have the same characterization. See paper P2 in the recitation class.
Drawback: Interpreting the Links is Not Simple
Another drawback is the difficulty with extreme probabilities.
Both drawbacks disappear in the class of decomposable models, which are a special case of Bayesian networks.
Decomposable Models
Example: Markov Chains and Markov Trees
Assume the following chain is an I-map of some P(x1, x2, x3, x4) and was constructed using the methods we just described.
The "compatibility functions" on all links can be easily interpreted in the case of chains.
The same holds for trees. This idea actually works for all chordal graphs.
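The chain interpretation can be checked numerically. A hedged sketch (the transition probabilities are arbitrary choices of mine): build a Markov chain X1–X2–X3–X4 and verify that the joint equals the product of link marginals divided by the separator marginals, P(x1,x2,x3,x4) = P(x1,x2) P(x2,x3) P(x3,x4) / (P(x2) P(x3)):

```python
from itertools import product

# Hypothetical strictly positive chain over binary variables
p1 = {0: 0.6, 1: 0.4}                            # P(x1)
t = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}   # P(x_{k+1} | x_k)

P = {(a, b, c, d): p1[a] * t[a][b] * t[b][c] * t[c][d]
     for a, b, c, d in product((0, 1), repeat=4)}

def marg(P, idxs):
    """Marginal distribution over the variables at positions idxs."""
    m = {}
    for o, p in P.items():
        key = tuple(o[i] for i in idxs)
        m[key] = m.get(key, 0.0) + p
    return m

m12, m23, m34 = marg(P, (0, 1)), marg(P, (1, 2)), marg(P, (2, 3))
m2, m3 = marg(P, (1,)), marg(P, (2,))

for (a, b, c, d), p in P.items():
    recon = m12[(a, b)] * m23[(b, c)] * m34[(c, d)] / (m2[(b,)] * m3[(c,)])
    assert abs(recon - p) < 1e-12   # link marginals / separator marginals
print("chain factorization verified")
```

Each compatibility function on a link is just that link's marginal, corrected by the shared (separator) marginal, which is exactly the decomposable form discussed below.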
Chordal Graphs
Interpretation of the links
[Figure: a chordal graph decomposed into overlapping cliques – Clique 1, Clique 2, Clique 3]
A probability distribution that can be written as a product of low-order marginals divided by a product of low-order marginals is said to be decomposable.
Importance of Decomposability
When assigning compatibility functions, it suffices to use marginal probabilities on cliques and just make sure they are locally consistent. Marginals can be assessed from experts or estimated directly from data.
The Diamond Example – The Smallest Non-Chordal Graph
Adding one more link turns the graph into a chordal graph.
Turning a general undirected graph into a chordal graph in some optimal way is the key to all exact computations done on Markov and Bayesian networks.
Chordal Graphs
Example of the Theorem
1. Each cycle has a chord.
2. There is a way to direct the edges legally, namely: A→B, A→C, B→C, B→D, C→D, C→E.
3. A legal removal order (e.g.): start with E, then D, then the rest.
4. The maximal cliques form a join (clique) tree.
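The removal-order condition in point 3 can be tested mechanically: a graph is chordal if and only if it has a perfect elimination order, i.e. one where each removed vertex's remaining neighbours form a clique. A minimal sketch (function name is mine), run on the example's edge set and on the diamond:

```python
from itertools import permutations

def is_perfect_elimination(adj, order):
    """True iff removing the vertices in this order always leaves each
    removed vertex's remaining neighbours pairwise adjacent (a clique)."""
    adj = {v: set(nb) for v, nb in adj.items()}   # work on a copy
    for v in order:
        nbs = adj[v]
        for a in nbs:
            for b in nbs:
                if a != b and b not in adj[a]:
                    return False                  # two neighbours not adjacent
        for a in nbs:                             # remove v from the graph
            adj[a].discard(v)
        del adj[v]
    return True

# Graph of the example: edges AB, AC, BC, BD, CD, CE
adj = {"A": {"B", "C"}, "B": {"A", "C", "D"},
       "C": {"A", "B", "D", "E"}, "D": {"B", "C"}, "E": {"C"}}
print(is_perfect_elimination(adj, ["E", "D", "C", "B", "A"]))  # True: chordal

# The diamond (4-cycle A-B-D-C-A, no chord) admits no such order at all:
diamond = {"A": {"B", "C"}, "B": {"A", "D"}, "C": {"A", "D"}, "D": {"B", "C"}}
print(any(is_perfect_elimination(diamond, list(o))
          for o in permutations(diamond)))       # False: not chordal
```

In the diamond every vertex has two non-adjacent neighbours, so the very first elimination step fails whichever vertex is chosen.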