
Selected Topics in Graphical Models
Petr Šimeček
Independence

Unconditional independence: $X \perp\!\!\!\perp Y$

Discrete r.v.: $p(x, y) = p(x)\, p(y)$ for all $x, y$
Continuous r.v.: $f(x, y) = f(x)\, f(y)$ $P$-a.s.

Conditional independence: $X \perp\!\!\!\perp Y \mid Z$

Discrete r.v.: $p(x, y, z)\, p(z) = p(x, z)\, p(y, z)$ for all $x, y, z$
Continuous r.v.: $f(x, y, z)\, f(z) = f(x, z)\, f(y, z)$ $P$-a.s.
List of Independence Relationships

$N$ random variables $X_1, X_2, \dots, X_N$ and their distribution $P$.

List of all conditional and unconditional independence relations between them:

$\{(A, B \mid C):\ A, B, C \subseteq \{1, \dots, N\}\ \text{disjoint},\ (X_i,\ i \in A) \perp\!\!\!\perp (X_i,\ i \in B) \mid (X_i,\ i \in C)\}$
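For a small $N$ this candidate list can be enumerated directly; a minimal sketch for $N = 3$:

```python
from itertools import chain, combinations

def subsets(s):
    """All subsets of s as sorted tuples, including the empty one."""
    s = sorted(s)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

N = 3
V = set(range(1, N + 1))

# Ordered triples (A, B | C) with A, B non-empty and A, B, C pairwise disjoint.
triples = [(A, B, C)
           for A in subsets(V) if A
           for B in subsets(V - set(A)) if B
           for C in subsets(V - set(A) - set(B))]

print(len(triples))   # 18 for N = 3; (A, B | C) and (B, A | C) are counted twice
print(triples[:2])    # ((1,), (2,), ()) and ((1,), (2,), (3,))
```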
Representation by Graph

[Figure: two graphs over nodes X1, …, X6 whose edges represent the independence relations]
Example – Sprinkler Network

[Figure: DAG with arcs Cloudy → Sprinkler, Cloudy → Rain, Sprinkler → Wet Grass, Rain → Wet Grass]
Example – Sprinkler Network

[Figure: the same DAG, now annotated with its conditional probability tables]

P(Cloudy):
            C=T    C=F
            0.5    0.5

P(Rain | Cloudy):
            R=T    R=F
    C=T     0.8    0.2
    C=F     0.2    0.8

P(Sprinkler | Cloudy):
            S=T    S=F
    C=T     0.1    0.9
    C=F     0.5    0.5

P(Wet Grass | Rain, Sprinkler):
                  W=T     W=F
    R=T, S=T      0.99    0.01
    R=T, S=F      0.9     0.1
    R=F, S=T      0.9     0.1
    R=F, S=F      0       1
Example – Sprinkler Network

$P(C, R, S, W) = P(C)\, P(R \mid C)\, P(S \mid C)\, P(W \mid R, S)$

The number of parameters needn't grow exponentially with the number of variables! It depends on the number of parents of the nodes.

[Figure: the DAG over C, S, R, W]
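The factorization can be coded directly from the tables above; a minimal sketch (the variable names and the `joint` function are mine, not from the lecture):

```python
# CPTs of the sprinkler network; each entry is the probability of True
# given the parent values, and False is the complement.
p_c = 0.5                                         # P(C = T)
p_r = {True: 0.8, False: 0.2}                     # P(R = T | C)
p_s = {True: 0.1, False: 0.5}                     # P(S = T | C)
p_w = {(True, True): 0.99, (True, False): 0.9,    # P(W = T | R, S)
       (False, True): 0.9, (False, False): 0.0}

def bern(p_true, value):
    return p_true if value else 1.0 - p_true

def joint(c, r, s, w):
    """P(C, R, S, W) = P(C) P(R | C) P(S | C) P(W | R, S)."""
    return (bern(p_c, c) * bern(p_r[c], r)
            * bern(p_s[c], s) * bern(p_w[(r, s)], w))

# 1 + 2 + 2 + 4 = 9 free parameters, instead of 2**4 - 1 = 15 for an
# unrestricted table over four binary variables.
print(joint(True, True, False, True))   # 0.5 * 0.8 * 0.9 * 0.9 = 0.324
```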
Purpose 1 – Propagation of Evidence

What is the probability that it is raining if we know that the grass is wet?

[Figure: the sprinkler DAG, with Wet Grass highlighted as the observed node]
Propagation of Evidence

In general: I have observed some variable(s). What is the probability of the other variable(s)? What are the most probable values?

Why not convert the BN into a contingency table? Because marginalization does not work for large $N$: it needs $2^N$ memory and much time, and has low precision…
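A brute-force answer to the rain-given-wet-grass query, reusing the `joint` function from the factorization sketch above: sum the joint over all states consistent with the evidence, once restricted to the query and once not, and divide.

```python
from itertools import product

def prob(query, evidence):
    """P(query | evidence) by exhaustive enumeration over C, R, S, W.

    Both arguments are dicts such as {'R': True}. This is exactly the
    contingency-table approach the slide warns against: fine for four
    variables, but it costs 2**N terms in general.
    """
    num = den = 0.0
    for c, r, s, w in product((True, False), repeat=4):
        state = {'C': c, 'R': r, 'S': s, 'W': w}
        if any(state[k] != v for k, v in evidence.items()):
            continue
        p = joint(c, r, s, w)
        den += p
        if all(state[k] == v for k, v in query.items()):
            num += p
    return num / den

print(prob({'R': True}, {'W': True}))   # about 0.708 with the CPTs above
```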
Purpose 2 – Parameter Learning

[Figure: the sprinkler DAG whose conditional probability tables now contain "?" in every entry]
Parameter Learning

We know:
• the graph (CI structure)
• a sample (observations) of the BN

We don't know:
• the conditional probability distributions (they can be estimated by MLE or Bayesian statistics; see the sketch below)
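For complete data the MLE is just a table of relative frequencies; a minimal sketch for $P(R \mid C)$ (the toy observations are invented):

```python
from collections import Counter

# Invented complete observations: (cloudy, rain) pairs.
data = [(True, True), (True, True), (True, False),
        (False, False), (False, True), (False, False)]

counts = Counter(data)
for c in (True, False):
    n_c = sum(n for (cv, _), n in counts.items() if cv == c)
    # MLE: P(R = T | C = c) = #(C = c, R = T) / #(C = c).
    # A Bayesian alternative would add pseudo-counts (Laplace smoothing).
    print(f"P(R=T | C={c}) = {counts[(c, True)] / n_c:.3f}")
```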
Purpose 3 – Structure Learning

[Table: observed data, one row per observation giving TRUE/FALSE values of CLOUDY, SPRINKLER, RAIN and WET GRASS]
Structure Learning

We know:
• independent observations (data) of the BN
• sometimes, the causal ordering of the variables

We don't know:
• the graph (CI structure)
• the conditional probability distributions

Solution:
• CI tests
• maximization of some criterion (AIC, BIC, Bayesian approach) over a huge search space; see the sketch below
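The criterion-maximization idea can be made concrete with BIC: the maximized log-likelihood minus $(\log m / 2)$ times the number of free parameters, computed node by node. A minimal scorer of mine for complete binary data (not code from the lecture):

```python
import math
from collections import Counter
from itertools import product

def bic(graph, data):
    """BIC score of a candidate DAG for complete binary data.

    graph: dict mapping each node to a tuple of its parents;
    data:  list of dicts mapping node names to True/False.
    """
    m, loglik, n_params = len(data), 0.0, 0
    for node, parents in graph.items():
        n_params += 2 ** len(parents)   # one free parameter per parent configuration
        counts = Counter((tuple(row[p] for p in parents), row[node]) for row in data)
        for pa in product((True, False), repeat=len(parents)):
            n_pa = counts[(pa, True)] + counts[(pa, False)]
            for val in (True, False):
                n = counts[(pa, val)]
                if n:                   # MLE plug-in: P(val | pa) = n / n_pa
                    loglik += n * math.log(n / n_pa)
    return loglik - 0.5 * math.log(m) * n_params

# Toy comparison: with strongly dependent R and W, the arc beats no arc.
rows = ([{'R': True,  'W': True}]  * 40 + [{'R': True,  'W': False}] * 10 +
        [{'R': False, 'W': True}]  * 5  + [{'R': False, 'W': False}] * 45)
print(bic({'R': (), 'W': ('R',)}, rows))   # R -> W: about -117.5
print(bic({'R': (), 'W': ()}, rows))       # no arc: about -142.7
```

The search space is the real obstacle: the number of DAGs grows super-exponentially with $N$, so in practice the score is maximized heuristically (e.g., by K2 or greedy hill climbing).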
Example – Entry Examination
Markov Equivalence

Some arcs can be reversed without changing the CI relationships.

[Figure: Rain → Wet Grass and Rain ← Wet Grass encode the same CI relations]

The best one can hope to do is to identify the model up to Markov equivalence.
Structure Learning

Theory:
• algorithms proved to be asymptotically correct: Janžura, Nielsen (2003)
• 1,000,000 observations for 10 binary variables

Practice:
• in medicine, usually 50–1500 observations
• BNs are often used in spite of that
Structure Learning – Simulation

3 variables; for each m from 100 to 1000, repeat 100 times:
• generate a Bayesian network
• generate m samples
• use the K2 structure-learning algorithm
then count the probability of successful selection for each m.

This should answer the question: "Is there a chance to find the true model?" (A sketch of this loop follows the figure below.)
[Figure: Probability of True Model Selection (%), from 45 to 75, against Number of Observations, from 200 to 1000]
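A runnable sketch of that simulation, with two simplifications of mine: the true graph is fixed to the chain A → B → C (only its CPTs are randomized), and exhaustive BIC search over all 25 three-node DAGs stands in for the K2 algorithm; success is counted up to Markov equivalence. It reuses `bic` from the sketch above.

```python
import random
from itertools import product

NODES = ('A', 'B', 'C')
PAIRS = (('A', 'B'), ('A', 'C'), ('B', 'C'))

def is_acyclic(parents):
    """Kahn-style elimination: repeatedly remove parentless nodes."""
    remaining = dict(parents)
    while remaining:
        free = [n for n, ps in remaining.items()
                if not any(p in remaining for p in ps)]
        if not free:
            return False
        for n in free:
            del remaining[n]
    return True

def all_dags():
    """All 25 DAGs on three nodes, as dicts node -> tuple of parents."""
    dags = []
    for orient in product((0, 1, 2), repeat=3):   # 0: no arc, 1: u -> v, 2: v -> u
        g = {n: [] for n in NODES}
        for (u, v), d in zip(PAIRS, orient):
            if d == 1:
                g[v].append(u)
            elif d == 2:
                g[u].append(v)
        parents = {n: tuple(ps) for n, ps in g.items()}
        if is_acyclic(parents):
            dags.append(parents)
    return dags

def sample_chain(p_a, p_b, p_c, m):
    """m observations from the true model A -> B -> C."""
    rows = []
    for _ in range(m):
        a = random.random() < p_a
        b = random.random() < p_b[a]
        c = random.random() < p_c[b]
        rows.append({'A': a, 'B': b, 'C': c})
    return rows

# The chain and its two Markov-equivalent reorientations all count as success.
TRUE_CLASS = [{'A': (), 'B': ('A',), 'C': ('B',)},    # A -> B -> C
              {'C': (), 'B': ('C',), 'A': ('B',)},    # A <- B <- C
              {'B': (), 'A': ('B',), 'C': ('B',)}]    # A <- B -> C

u = lambda: random.uniform(0.05, 0.95)
for m in (100, 500, 1000):
    hits = 0
    for _ in range(100):
        data = sample_chain(u(), {True: u(), False: u()}, {True: u(), False: u()}, m)
        best = max(all_dags(), key=lambda g: bic(g, data))
        hits += best in TRUE_CLASS
    print(m, hits / 100)
```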
To Do List:
• software: free, open source, easy to use, fast, with a separate API
• more simulations: theory vs. practice
• popularization of structure learning
• Czech literature: maybe my PhD thesis
References:
• Castillo E. et al. (1997): Expert Systems and Probabilistic Network Models, Springer-Verlag.
• Neapolitan R. E. (2003): Learning Bayesian Networks, Prentice Hall.
• Janžura N., Nielsen J. (2003): A Numerical Method for Learning Bayesian Networks from Statistical Data, WUPES.