S1-Inference of Independence - Final
Download
Report
Transcript S1-Inference of Independence - Final
Axioms and Algorithms
for Inferences Involving
Probabilistic Independence
Dan Geiger, Azaria Paz, and Judea Pearl,
Information and Computation 91(1), March 1991, 128-141.
Presentation by Guy Moses & Omer Weissbrod
for the course 236372 - Bayesian Networks
Computer Science Faculty, Technion – winter 2009
partially based on the presentation by Ilan Gronau
What’s ahead?
Introduction
- some definitions, notations and reminders.
Proof of Completeness.
- “if it’s true – it can be proved”.
Preparations for the Membership Algorithm
– more definitions, and some theoretical groundwork.
The Membership Algorithm
– description, proof of correctness, complexity analysis.
2
Definitions
• U (Universe) – set of random variables with probability
distribution P.
• X,Y – finite sets of random variables:
X = x1,…,xn, Y = y1,…,ym
• P(X,Y) = P(X)·P(Y) - a short-hand notation for the equality:
Pr{x1=a1,…, xn=an, y1=b1, …, ym=bm} =
Pr{x1=a1,…, xn=an} · Pr{y1=b1, …, ym=bm}
for every choice of a1, …, an, b1, …, bm
• (X,Y) – short-hand for P(X,Y) = P(X)·P(Y)
This is called an independence statement.
3
*note that X,Y are disjoint sets of variables (XY = ).
Notations
• - a specific independence statement of the form
(X,Y)
• - a set of independence statements of the form
(X,Y): = 1, … , k
• XY - short-hand notation for the union X Y
• P satisfies = (X,Y) means: P(X,Y) = P(X)·P(Y)
for that specific P.
4
Soundness and Completeness
Definitions:
• iff every distribution that satisfies also satisfies .
• iff cl(),i.e. there exists a derivation chain 1,…,n=
s.t. for each j, either j or j is derived by an axiom from
the previous statements.
For a set of axioms A:
Soundness: A is sound iff for every and :
Completeness: A is complete iff for every and :
Completeness - Alternative definition:
A is complete iff for every and every cl() there exists a
distribution P that satisfies cl)( and does not satisfy .
5
Independence Axioms
1a Trivial Independence:
1b Symmetry:
1c Decomposition:
1d Mixing:
( X , )
( X , Y ) (Y , X )
( X , YZ ) ( X , Y )
( X , Y ) ( XY , Z ) ( X , YZ )
We saw (in 1st lecture) that axioms 1a-1d are sound (always infer correctly).
Today we’ll show they are complete (can derive every true statement).
6
What’s ahead?
Introduction
- some definitions, notations and reminders.
Proof of Completeness.
- “if it’s true – it can be proved”.
Preparations for the Membership Algorithm
– more definitions, and some theoretical groundwork.
The Membership Algorithm
– description, proof of correctness, complexity analysis.
7
Minimal Statement
•
Definition: =(X,Y) cl() is minimal if for every non-empty X’,Y’
s.t. X’X, Y’ Y, X’Y’XY we have (X’,Y’) cl().
•
For every =(X,Y) cl() we can find an appropriate minimal
’=(X’,Y’)cl() through iterative decomposition.
•
Observation:
P satisfies
Therefore:
P doesn’t satisfy ’
P satisfies ’
(decomposition soundness),
P doesn’t satisfy .
•
Our plan: Given an arbitrary cl(), We will find a distribution P that
satisfies cl() but doesn’t satisfy ’. This will prove completeness (using
the alternative completeness definition and the observation above).
•
To simplify annotation, we will assume WLOG that =(X,Y) is already
minimal.
8
Completeness Proof
Let =(X,Y) cl() be a minimal statement where:
X={x1,…,xn}, Y={y1,…,ym}, and Z={z1,z2,…,zk} stand for the rest of
the variables in U.
We will construct P as follows: All variables, except x1, are fair coins
(probability for each of their two values)
x1 is defined thus:
n
m
i 2
j 1
x1 xi y j mod 2
Part 1: P does not satisfy
We will inspect the following scenario: x1=1, all other variables are 0.
P(x1, … , xn, y1, … , ym) P(x1, … , xn)·P(y1, … , ym)
=0
=0.5n
Therefore, P does not satisfy , as required.
9
=0.5m
Completeness Proof – cont’d
Part 2: P satisfies cl()
Let (V,W) cl(). We will show that P(V,W)=P(V)·P(W).
This is done by inspecting different scenarios:
Scenario 1: either V or W contains only elements of Z. We will
assume WLOG that W contains only elements of Z.
all variables in Z are independent under P and therefore:
P VW P V P zi P V P W
zi W
Z
Z
Z
Y
Y
W
Z
Z
Z
X
Z
X
10
Z
Y
Z
X
Z
Y
V
Z
Y
Z
Z
X
Y
Z
Completeness Proof – cont’d
Part 2: P satisfies cl()
Let (V,W) cl(). We will show that P(V,W)=P(V)·P(W).
This is done by inspecting different scenarios:
Scenario 2: Both V and W contain elements of X Y,
but V W doesn’t contain all elements of X Y.
Without full information about the assignments of the
variables in X Y, x1 could turn out to be 0 or 1 with
probability , and therefore:
Z
P VW 0.5
0.5 0.5
V
W
Z
V W
0.5
V W
P V P W
Z
Y
Z
Z
Z
X
Z
X
11
Y
Z
Z
Y
Z
Z
Y
X
Y
Z
Z
X
Y
Z
Completeness Proof – cont’d
Part 2: P satisfies cl() - continued
Scenario 3: Both V and W contain elements of X Y,
and (X Y)(V W).
We will show a derivation chain for =(X,Y), contradicting our original
assumption that cl():
Mark: (V,W)=(XVYVZV, XWYWZW)cl()
where: Y=YVYW, X=XVXW, ZVZWZ, V=XVYVZV, W=XWYWZW
Z
Remove all z’s by decomposition: (XVYV, XWYWZ)cl()
Y
Z
Z
Due to minimality of =(X,Y): (XV,YV)cl()
and (XZW,Y)cl()
Z
Z
(XV,YV)(XVYV, XWYW) mix
(XW,Y) (XWY, XV)
12
mix
X
(XV,YV XYWYW) = (XV,XWY)
Z
(Y, XVXW) = (Y,X) =
X
Z
Y
V
Z
Y
Z
Z
X
Z
Y
X
Y
Z
Completeness Proof – Summary
Reminder: Completeness - Alternative definition:
A is complete iff for every and every cl() there exists a
distribution P that satisfies cl)( and does not satisfy .
We’ve shown: given a minimal cl(), there exists a distribution P
that obeys:
1.
P does not satisfy .
2.
P satisfies .
Given a non-minimal cl(), we will derive its minimal statement ’,
and devise a distribution P’ that satisfies but does not
satisfy ’. Due to soundness of decomposition, P’ cannot satisfy
as well.
13
Scope of Completeness
The proof uses P - a binary p.d. (probability distribution
function) therefore:
all p.d.’s over U
discrete p.d.’s
binary p.d.’s
however,
normal
p.d.’s
•P
for normal p.d.’s, the axiom set a1-d1 is not complete.
a stronger axiom is required:
replace: 1d Mixing: I P ( X , Y ) I P ( X Y , Z ) I P ( X , Y Z )
14
with: 1d' Composition: I P ( X , Y ) I P ( X , Z )
IP ( X ,Y Z )
What’s ahead?
Introduction
- some definitions, notations and reminders.
Proof of Completeness.
- “if it’s true – it can be proved”.
Preparations for the Membership Algorithm
– more definitions, and some theoretical groundwork.
The Membership Algorithm
– description, proof of correctness, complexity analysis.
15
Some more Definitions and Tools
Definition: Span
span(): the set of elements represented in statement .
Example: span(x1x2,x3,x4) = {x1,x2,x3,x4}
span(): the set of elements represented in all statements of .
Example: span({(x1,x2),(x1,x3)}) = {x1,x2,x3}
16
Some more Definitions and Tools
Definition: Projection
The projection of on X, denoted (X), is the statement
derived from by removing all elements not in X from .
Example:
if =(x1x2x3, x4x5) and X={x2,x3,x4} then (X)=(x2x3, x4).
The projection of on X, denoted (X), is {(X) | }.
17
Some more Definitions and Tools
Projection Lemma: iff ‘ ,
where ’ = (span())
) if ' then clearly because all the
statements in ‘ can be derived from the statements in
by decomposition.
18
Some more Definitions and Tools
Projection Lemma: iff ’ ,
where ’ = (span()), s = span()
) if then there is a derivation chain for : 1, 2, … , k.
For each j:
if k j, k<j, (by symmetry or decomposition)
then k(s) j(s) by symmetry or decomposition respectively.
Similarly,
if j is derived from k and l by mixing,
then j(s) is derived from k(s), l(s) by mixing.
19
Some more Definitions and Tools
Projection Lemma: iff ’ ,
where ’ = (span()), s = span()
Observations from projection lemma:
• Variables not in are unnecessary for determining whether .
• The problem of verifying whether can be simplified to the
problem of verifying whether ' , where '= (span()).
• This problem can be solved with a possibly reduced time and space
complexity.
20
Conditions for
Inference of Independence
Maim claim: for a given , we have ’ iff:
1.
is trivial: =(X,) (up to symmetry)
OR
2.
is in ’: ’ (up to symmetry)
OR
3.
is derivable from ’:
there exists ’’ s.t. span() = span(’)
and
for ’=(AP,BQ) =(AQ,BP) (A,B,Q,P may be empty)
’ (A,P), ’ (B,Q)
21
(up to symmetry)
Proof of Main Claim
Maim claim: for a given , we have ’ iff:
1. is trivial*: =(X,)
*up to symmetry
2. is in* ’: ’
3. is derivable* from ’: ’’ s.t. span() = span(’)
and for ’=(AP,BQ) =(AQ,BP) : ’ (A,P), ’ (B,Q)
) if 1. is trivial*
OR
2. is in* ’. than the proof is immediate.
otherwise,
3. there exists ’’ s.t. span() = span(’)
and for ’=(AP,BQ) =(AQ,BP) : ’ (A,P), ’ (B,Q)
we will show a constructive proof under these conditions
22
Proof of Main Claim
Maim claim: for a given , we have ’ iff:
1. is trivial*: =(X,)
*up to symmetry
2. is in* ’: ’
3. is derivable* from ’: ’’ s.t. span() = span(’)
and for ’=(AP,BQ) =(AQ,BP) : ’ (A,P), ’ (B,Q)
) (contd.) given that ’ (AP,BQ), ’ (A,P), ’ (B,Q).
1.
(A,P)(AP,BQ)
2.
(B,Q)(AP,BQ)
3.
(PB,Q)(A,PBQ)
mix
mix
mix
We’ve proven this direction.
23
(A,PBQ)
(APB,Q)
(AQ,PB)
dec.
=
(PB,Q)
(AQ, BP) =
Proof of Main Claim
Maim claim: for a given , we have ’ iff:
1. is trivial*: =(X,)
*up to symmetry
2. is in* ’: ’
3. is derivable* from ’: ’’ s.t. span() = span(’)
and for ’=(AP,BQ) =(AQ,BP) : ’ (A,P), ’ (B,Q)
) Given ’ , if 1. is trivial* OR 2. is in* ’,
than the proof is immediate.
Otherwise, since no axiom can add new variables to a
statement, there must exist ’’ s.t. span() = span(’)
in the derivation chain of .
also: = (AQ,BP)
dec.
= (AQ,BP)
dec.
24
(A,P)
(Q,B)
Conclusions from Claim
•
We’ve seen that, after discarding unneeded variables,
it is possible to tell whether ’ (when it’s not
immediately obvious) by:
a. Finding another statement ’’ for which
span() = span(’),
b. Verifying that ’ (A,P), ’ (B,Q)
when ’=(AP,BQ) =(AQ,BP).
•
This suggests using a recursive “divide and conquer”
approach.
25
What’s ahead?
Introduction
- some definitions, notations and reminders.
Proof of Completeness.
- “if it’s true – it can be proved”.
Preparations for the Membership Algorithm
– more definitions, and some theoretical groundwork.
The Membership Algorithm
– description, proof of correctness, complexity analysis.
26
The Membership Algorithm
Procedure Find(,):
1.
set ’ := (span()).
2. if is trivial, or ’ (up to symmetry)
then Find(,) := TRUE.
3. else if for all non-trivial ’’: span() span(’),
then Find(,) := FALSE.
4. else there exists ’’: span() = span(’),
and ’=(AP,BQ) =(AQ,BP),
set 1:= (A,P), 2:= (B,Q).
Find(,) := (Find(’,1) Find(’,2))
27
Algorithm Correctness Proof
We will prove that Find(,) := TRUE cl() by
induction on k=.
Induction base: if k=1 then is trivial, therefore the
algorithm will return TRUE in step 2 and cl().
28
Algorithm Correctness Proof
Induction assumption: Find(,) := TRUE cl()
for each ’<k.
Induction step: Find(,) := TRUE iff either:
1. Step 2 returns TRUE is trivial or ’ cl().
2. Step 4 returns TRUE
iff (according to algorithm’s definition)
Find(’,1) := TRUE Find(’,2) := TRUE
iff (according to induction assumption)
1cl(’) 2cl(’)
iff (according to main claim)
cl(’)
iff cl() (according to projection lemma)
29
Complexity Analysis
Definitions:
n = the number of distinct variables in {}.
k = the number of distinct variables in {}.
• First projection cost: O(||·n) – happens only once.
• Recursive step: T)k) ||·k + T(k1) + T(k2)
where k1+k2=k, k1=|1|, k2=|2|
• Can be shown by induction: T)k) ||·k·(depth of recursion)
• Worst case analysis: T)k) ||·k·k= ||·k2
• Total run time is bounded by: O(||·n + ||·k2)
which is also: O(||·n2) since k n.
30
Improvements and Variations
• Instead of arbitrarily choosing ’, find one whose substatements {A,B,P,Q} have balanced size (can improve
run-time complexity).
• Using the derivation chain presented in the
constructive proof, the algorithm can also return a
derivation chain for with a length
of O(k).
31
Variations (contd.)
The algorithm can be expanded into a polynomial algorithm
for the following problems:
• Given two sets and ,
is cl() cl() ?
is cl() = cl() ?
• Minimize the size of while preserving cl():
Start with a maximal-size statement and remove from
all statements derivable from it.
Repeat with the next largest statement etc.
32
Thank you!