S1-Inference of Independence - Final

Download Report

Transcript S1-Inference of Independence - Final

Axioms and Algorithms
for Inferences Involving
Probabilistic Independence
Dan Geiger, Azaria Paz, and Judea Pearl,
Information and Computation 91(1), March 1991, 128-141.
Presentation by Guy Moses & Omer Weissbrod
for the course 236372 - Bayesian Networks
Computer Science Faculty, Technion – winter 2009
partially based on the presentation by Ilan Gronau
What’s ahead?
 Introduction
- some definitions, notations and reminders.
 Proof of Completeness.
- “if it’s true – it can be proved”.
 Preparations for the Membership Algorithm
– more definitions, and some theoretical groundwork.
 The Membership Algorithm
– description, proof of correctness, complexity analysis.
2
Definitions
• U (Universe) – set of random variables with probability
distribution P.
• X,Y – finite sets of random variables:
X = x1,…,xn, Y = y1,…,ym
• P(X,Y) = P(X)·P(Y) - a short-hand notation for the equality:
Pr{x1=a1,…, xn=an, y1=b1, …, ym=bm} =
Pr{x1=a1,…, xn=an} · Pr{y1=b1, …, ym=bm}
for every choice of a1, …, an, b1, …, bm
• (X,Y) – short-hand for P(X,Y) = P(X)·P(Y)
This is called an independence statement.
3
*note that X,Y are disjoint sets of variables (XY = ).
Notations
•  - a specific independence statement of the form
(X,Y)
•  - a set of independence statements of the form
(X,Y):  = 1, … , k
• XY - short-hand notation for the union X  Y
• P satisfies = (X,Y) means: P(X,Y) = P(X)·P(Y)
for that specific P.
4
Soundness and Completeness
Definitions:
•    iff every distribution that satisfies  also satisfies .
•    iff   cl(),i.e. there exists a derivation chain 1,…,n= 
s.t. for each j, either j   or j is derived by an axiom from
the previous statements.
For a set of axioms A:
Soundness: A is sound iff for every  and :

Completeness: A is complete iff for every  and :

Completeness - Alternative definition:
A is complete iff for every  and every   cl() there exists a
distribution P that satisfies cl)( and does not satisfy .
5
Independence Axioms
1a Trivial Independence:
1b Symmetry:
1c Decomposition:
1d Mixing:
( X , )
( X , Y )  (Y , X )
( X , YZ )  ( X , Y )
( X , Y )  ( XY , Z )  ( X , YZ )
We saw (in 1st lecture) that axioms 1a-1d are sound (always infer correctly).
Today we’ll show they are complete (can derive every true statement).
6
What’s ahead?
 Introduction
- some definitions, notations and reminders.
 Proof of Completeness.
- “if it’s true – it can be proved”.
 Preparations for the Membership Algorithm
– more definitions, and some theoretical groundwork.
 The Membership Algorithm
– description, proof of correctness, complexity analysis.
7
Minimal Statement
•
Definition: =(X,Y)  cl() is minimal if for every non-empty X’,Y’
s.t. X’X, Y’ Y, X’Y’XY we have (X’,Y’)  cl().
•
For every =(X,Y)  cl() we can find an appropriate minimal
’=(X’,Y’)cl() through iterative decomposition.
•
Observation:
P satisfies 

Therefore:
P doesn’t satisfy ’
P satisfies ’

(decomposition soundness),
P doesn’t satisfy .
•
Our plan: Given an arbitrary   cl(), We will find a distribution P that
satisfies cl() but doesn’t satisfy ’. This will prove completeness (using
the alternative completeness definition and the observation above).
•
To simplify annotation, we will assume WLOG that =(X,Y) is already
minimal.
8
Completeness Proof
Let =(X,Y)  cl() be a minimal statement where:
X={x1,…,xn}, Y={y1,…,ym}, and Z={z1,z2,…,zk} stand for the rest of
the variables in U.
We will construct P as follows: All variables, except x1, are fair coins
(probability  for each of their two values)
x1 is defined thus:
n
m
i 2
j 1
x1   xi   y j  mod 2 
Part 1: P does not satisfy 
We will inspect the following scenario: x1=1, all other variables are 0.
P(x1, … , xn, y1, … , ym)  P(x1, … , xn)·P(y1, … , ym)
=0
=0.5n
Therefore, P does not satisfy , as required.
9
=0.5m
Completeness Proof – cont’d
Part 2: P satisfies cl()
Let (V,W)  cl(). We will show that P(V,W)=P(V)·P(W).
This is done by inspecting different scenarios:
Scenario 1: either V or W contains only elements of Z. We will
assume WLOG that W contains only elements of Z.
all variables in Z are independent under P and therefore:
P VW   P V    P  zi   P V   P W 
zi W
Z
Z
Z
Y
Y
W
Z
Z
Z
X
Z
X
10
Z
Y
Z
X
Z
Y
V
Z
Y
Z
Z
X
Y
Z
Completeness Proof – cont’d
Part 2: P satisfies cl()
Let (V,W)  cl(). We will show that P(V,W)=P(V)·P(W).
This is done by inspecting different scenarios:
Scenario 2: Both V and W contain elements of X  Y,
but V  W doesn’t contain all elements of X  Y.
Without full information about the assignments of the
variables in X  Y, x1 could turn out to be 0 or 1 with
probability , and therefore:
Z
P VW   0.5
 0.5  0.5
V
W
Z
V W
 0.5
V W
 P V   P W 
Z
Y
Z
Z
Z
X
Z
X
11
Y
Z
Z
Y
Z
Z
Y
X
Y
Z
Z
X
Y
Z
Completeness Proof – cont’d
Part 2: P satisfies cl() - continued
Scenario 3: Both V and W contain elements of X  Y,
and (X  Y)(V  W).
We will show a derivation chain for =(X,Y), contradicting our original
assumption that   cl():
Mark: (V,W)=(XVYVZV, XWYWZW)cl()
where: Y=YVYW, X=XVXW, ZVZWZ, V=XVYVZV, W=XWYWZW
Z
Remove all z’s by decomposition: (XVYV, XWYWZ)cl()
Y
Z
Z
Due to minimality of =(X,Y): (XV,YV)cl()
and (XZW,Y)cl()
Z
Z
(XV,YV)(XVYV, XWYW) mix
(XW,Y)  (XWY, XV)
12
mix
X
(XV,YV XYWYW) = (XV,XWY)
Z
(Y, XVXW) = (Y,X) = 
X
Z
Y
V
Z
Y
Z
Z
X
Z
Y
X
Y
Z

Completeness Proof – Summary
Reminder: Completeness - Alternative definition:
A is complete iff for every  and every   cl() there exists a
distribution P that satisfies cl)( and does not satisfy .
We’ve shown: given a minimal   cl(), there exists a distribution P
that obeys:
1.
P does not satisfy .
2.
P satisfies .
Given a non-minimal   cl(), we will derive its minimal statement ’,
and devise a distribution P’ that satisfies  but does not
satisfy ’. Due to soundness of decomposition, P’ cannot satisfy
 as well.
13
Scope of Completeness
The proof uses P - a binary p.d. (probability distribution
function) therefore:
all p.d.’s over U
discrete p.d.’s
binary p.d.’s
however,
normal
p.d.’s
•P
for normal p.d.’s, the axiom set a1-d1 is not complete.
a stronger axiom is required:
replace: 1d Mixing: I P ( X , Y )  I P ( X  Y , Z )  I P ( X , Y  Z )
14
with: 1d' Composition: I P ( X , Y )  I P ( X , Z )
 IP ( X ,Y  Z )
What’s ahead?
 Introduction
- some definitions, notations and reminders.
 Proof of Completeness.
- “if it’s true – it can be proved”.
 Preparations for the Membership Algorithm
– more definitions, and some theoretical groundwork.
 The Membership Algorithm
– description, proof of correctness, complexity analysis.
15
Some more Definitions and Tools
Definition: Span
span(): the set of elements represented in statement .
Example: span(x1x2,x3,x4) = {x1,x2,x3,x4}
span(): the set of elements represented in all statements of .
Example: span({(x1,x2),(x1,x3)}) = {x1,x2,x3}
16
Some more Definitions and Tools
Definition: Projection
The projection of  on X, denoted (X), is the statement
derived from  by removing all elements not in X from .
Example:
if =(x1x2x3, x4x5) and X={x2,x3,x4} then (X)=(x2x3, x4).
The projection of  on X, denoted (X), is {(X) |   }.
17
Some more Definitions and Tools
Projection Lemma:    iff ‘  ,
where ’ = (span())
) if '   then clearly    because all the
statements in ‘ can be derived from the statements in
 by decomposition.
18
Some more Definitions and Tools
Projection Lemma:    iff ’  ,
where ’ = (span()), s = span()
) if    then there is a derivation chain for : 1, 2, … , k.
For each j:
if k  j, k<j, (by symmetry or decomposition)
then k(s)  j(s) by symmetry or decomposition respectively.
Similarly,
if j is derived from k and l by mixing,
then j(s) is derived from k(s), l(s) by mixing. 
19
Some more Definitions and Tools
Projection Lemma:    iff ’  ,
where ’ = (span()), s = span()
Observations from projection lemma:
• Variables not in  are unnecessary for determining whether   .
• The problem of verifying whether    can be simplified to the
problem of verifying whether '  , where '= (span()).
• This problem can be solved with a possibly reduced time and space
complexity.
20
Conditions for
Inference of Independence
Maim claim: for a given ,  we have ’   iff:
1.
 is trivial: =(X,) (up to symmetry)
OR
2.
 is in ’: ’ (up to symmetry)
OR
3.
 is derivable from ’:
there exists ’’ s.t. span() = span(’)
and
for ’=(AP,BQ) =(AQ,BP) (A,B,Q,P may be empty)
’  (A,P), ’  (B,Q)
21
(up to symmetry)
Proof of Main Claim
Maim claim: for a given ,  we have ’   iff:
1.  is trivial*: =(X,)
*up to symmetry
2.  is in* ’: ’
3.  is derivable* from ’:  ’’ s.t. span() = span(’)
and for ’=(AP,BQ) =(AQ,BP) : ’  (A,P), ’  (B,Q)
) if 1.  is trivial*
OR
2.  is in* ’. than the proof is immediate.
otherwise,
3. there exists ’’ s.t. span() = span(’)
and for ’=(AP,BQ) =(AQ,BP) : ’  (A,P), ’  (B,Q)
we will show a constructive proof under these conditions
22
Proof of Main Claim
Maim claim: for a given ,  we have ’   iff:
1.  is trivial*: =(X,)
*up to symmetry
2.  is in* ’: ’
3.  is derivable* from ’:  ’’ s.t. span() = span(’)
and for ’=(AP,BQ) =(AQ,BP) : ’  (A,P), ’  (B,Q)
) (contd.) given that ’  (AP,BQ), ’  (A,P), ’  (B,Q).
1.
(A,P)(AP,BQ)
2.
(B,Q)(AP,BQ)
3.
(PB,Q)(A,PBQ)
mix
mix
mix
We’ve proven this direction.
23
(A,PBQ)
(APB,Q)
(AQ,PB)
dec.
=
(PB,Q)
(AQ, BP) = 
Proof of Main Claim
Maim claim: for a given ,  we have ’   iff:
1.  is trivial*: =(X,)
*up to symmetry
2.  is in* ’: ’
3.  is derivable* from ’:  ’’ s.t. span() = span(’)
and for ’=(AP,BQ) =(AQ,BP) : ’  (A,P), ’  (B,Q)
) Given ’  , if 1.  is trivial* OR 2.  is in* ’,
than the proof is immediate.
Otherwise, since no axiom can add new variables to a
statement, there must exist ’’ s.t. span() = span(’)
in the derivation chain of .
also:  = (AQ,BP)
dec.
 = (AQ,BP)
dec.
24
(A,P)
(Q,B)

Conclusions from Claim
•
We’ve seen that, after discarding unneeded variables,
it is possible to tell whether ’   (when it’s not
immediately obvious) by:
a. Finding another statement ’’ for which
span() = span(’),
b. Verifying that ’  (A,P), ’  (B,Q)
when ’=(AP,BQ) =(AQ,BP).
•
This suggests using a recursive “divide and conquer”
approach.
25
What’s ahead?
 Introduction
- some definitions, notations and reminders.
 Proof of Completeness.
- “if it’s true – it can be proved”.
 Preparations for the Membership Algorithm
– more definitions, and some theoretical groundwork.
 The Membership Algorithm
– description, proof of correctness, complexity analysis.
26
The Membership Algorithm
Procedure Find(,):
1.
set ’ := (span()).
2. if  is trivial, or ’ (up to symmetry)
then Find(,) := TRUE.
3. else if for all non-trivial ’’: span()  span(’),
then Find(,) := FALSE.
4. else there exists ’’: span() = span(’),
and ’=(AP,BQ) =(AQ,BP),
set 1:= (A,P), 2:= (B,Q).
Find(,) := (Find(’,1)  Find(’,2))
27
Algorithm Correctness Proof
We will prove that Find(,) := TRUE  cl() by
induction on k=.
Induction base: if k=1 then  is trivial, therefore the
algorithm will return TRUE in step 2 and cl().
28
Algorithm Correctness Proof
Induction assumption: Find(,) := TRUE  cl()
for each ’<k.
Induction step: Find(,) := TRUE iff either:
1. Step 2 returns TRUE   is trivial or ’  cl().
2. Step 4 returns TRUE
iff (according to algorithm’s definition)
Find(’,1) := TRUE  Find(’,2) := TRUE
iff (according to induction assumption)
1cl(’)  2cl(’)
iff (according to main claim)
cl(’)
iff cl() (according to projection lemma) 
29
Complexity Analysis
Definitions:
n = the number of distinct variables in   {}.
k = the number of distinct variables in {}.
• First projection cost: O(||·n) – happens only once.
• Recursive step: T)k)  ||·k + T(k1) + T(k2)
where k1+k2=k, k1=|1|, k2=|2|
• Can be shown by induction: T)k)  ||·k·(depth of recursion)
• Worst case analysis: T)k)  ||·k·k= ||·k2
• Total run time is bounded by: O(||·n + ||·k2)
which is also: O(||·n2) since k n.
30
Improvements and Variations
• Instead of arbitrarily choosing ’, find one whose substatements {A,B,P,Q} have balanced size (can improve
run-time complexity).
• Using the derivation chain presented in the
constructive proof, the algorithm can also return a
derivation chain for  with a length
of O(k).
31
Variations (contd.)
The algorithm can be expanded into a polynomial algorithm
for the following problems:
• Given two sets  and ,
is cl()  cl() ?
is cl() = cl() ?
• Minimize the size of  while preserving cl():
Start with a maximal-size statement and remove from
 all statements derivable from it.
Repeat with the next largest statement etc.
32
Thank you!