Learning Bayesian Metanetworks

Learning Bayesian Metanetworks
from Data with Multilevel
Uncertainty
Vagan Terziyan, Oleksandra Vitko
University of Jyväskylä,
Kharkov National University of Radioelectronics
AIAI-2004 (WCC 2004), Toulouse, France
24 August 2004
Contents
• Bayesian Metanetworks
    • Metanetworks for managing conditional dependencies
    • Metanetworks for managing feature relevance
• Learning Bayesian Metanetworks from Data
• Conclusions
This presentation: http://www.cs.jyu.fi/ai/AIAI-2004.ppt
Oleksandra Vitko
Department of Artificial Intelligence
Kharkov National University of
Radioelectronics (Ukraine)
http://www.cs.jyu.fi/ai/oleksandra
Vagan Terziyan
Industrial Ontologies Group
Department of Mathematical
Information Technologies
University of Jyväskylä (Finland)
http://www.cs.jyu.fi/ai/vagan
Bayesian Metanetworks
Bayesian Metanetwork
• Definition. A Bayesian Metanetwork is a set of Bayesian networks layered on top of each other in such a way that the elements (nodes or conditional dependencies) of each network depend on the local probability distributions associated with the nodes of the next-level network.
Two-level Bayesian C-Metanetwork for Managing Conditional Dependencies
[Figure: a two-level network with a contextual level above a predictive level]
Contextual and Predictive Attributes
[Figure: a machine monitored by sensors within its environment. Labels: air pressure, dust, humidity, temperature, emission. The attribute vector X = <x1, …, x7> is divided into predictive attributes and contextual attributes.]
Contextual Effect on Conditional
Probability (1)
[Figure: attribute vector X divided into predictive and contextual attributes; a contextual attribute xt influences the conditional dependence between the predictive attributes xk and xr]
Assume a conditional dependence between predictive attributes (a causal relation between physical quantities)…
… some contextual attribute may directly affect the conditional dependence between the predictive attributes, but not the attributes themselves.
Contextual Effect on Conditional
Probability (2)
• X = {x1, x2, …, xn} – predictive attribute with n values;
• Z = {z1, z2, …, zq} – contextual attribute with q values;
• P(Y|X) = {p1(Y|X), p2(Y|X), …, pr(Y|X)} – conditional dependence attribute (a random variable) between X and Y with r possible values;
• P(P(Y|X)|Z) – conditional dependence between attribute Z and attribute P(Y|X).

$$P(Y = y_j) = \sum_{k=1}^{r} \sum_{i=1}^{n} \Big\{ p_k(Y = y_j \mid X = x_i) \cdot P(X = x_i) \cdot \sum_{m=1}^{q} \big[ P(Z = z_m) \cdot P\big(P(Y \mid X) = p_k(Y \mid X) \mid Z = z_m\big) \big] \Big\}$$
Contextual Effect on Conditional
Probability (3)
Contextual attribute Xt
    Xt1 : I am in Paris
    Xt2 : I am in Moscow
Predictive attribute Xk : Order present
    Xk1 : order flowers
    Xk2 : order wine
Predictive attribute Xr : Make a visit
    Xr1 : visit football match
    Xr2 : visit girlfriend

P1(Xr|Xk)   Xk1   Xk2
Xr1         0.3   0.9
Xr2         0.4   0.5

P2(Xr|Xk)   Xk1   Xk2
Xr1         0.1   0.2
Xr2         0.8   0.7
Contextual Effect on Conditional
Probability (4)
Xt1 : I am in Paris
Xt2 : I am in Moscow

P(P(Xr|Xk) | Xt)   Xt1   Xt2
P1(Xr|Xk)          0.7   0.2
P2(Xr|Xk)          0.3   0.8

P1(Xr|Xk)   Xk1   Xk2
Xr1         0.3   0.9
Xr2         0.4   0.5

P2(Xr|Xk)   Xk1   Xk2
Xr1         0.1   0.2
Xr2         0.8   0.7
Contextual Effect on Unconditional
Probability (1)
[Figure: attribute vector X divided into predictive and contextual attributes; a contextual attribute xt influences the probability distribution P(X) of a predictive attribute xk with values x1, x2, x3, x4]
Assume some predictive attribute is a random variable with an appropriate probability distribution over its values…
… some contextual attribute may directly affect the probability distribution of the predictive attribute.
Contextual Effect on Unconditional
Probability (2)
• X = {x1, x2, …, xn} – predictive attribute with n values;
• Z = {z1, z2, …, zq} – contextual attribute with q values, and P(Z) – probability distribution for the values of Z;
• P(X) = {p1(X), p2(X), …, pr(X)} – probability distribution attribute for X (a random variable) with r possible values (different possible probability distributions for X), and P(P(X)) – probability distribution for the values of attribute P(X);
• P(Y|X) – conditional probability distribution of Y given X;
• P(P(X)|Z) – conditional probability distribution for attribute P(X) given Z.

$$P(Y = y_j) = \sum_{k=1}^{r} \sum_{i=1}^{n} \Big\{ P(Y = y_j \mid X = x_i) \cdot p_k(X = x_i) \cdot \sum_{m=1}^{q} \big[ P(Z = z_m) \cdot P\big(P(X) = p_k(X) \mid Z = z_m\big) \big] \Big\}$$
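One way to read the formula above is as ordinary marginalisation over X with an "effective" prior, obtained by mixing the candidate distributions p_k(X) according to the context. The sketch below illustrates that reading in Python; the function names `effective_prior` and `marginal_y` and all numbers are assumptions made for illustration only.

```python
def effective_prior(priors, p_z, p_prior_given_z):
    """Mix the candidate priors p_k(X) with weights sum_m P(z_m) * P(p_k | z_m).

    priors[k][i]          -- candidate distribution p_k(X=x_i)
    p_z[m]                -- P(Z=z_m)
    p_prior_given_z[m][k] -- P(P(X)=p_k | Z=z_m)
    """
    weights = [
        sum(p_z[m] * p_prior_given_z[m][k] for m in range(len(p_z)))
        for k in range(len(priors))
    ]
    n_x = len(priors[0])
    return [
        sum(weights[k] * priors[k][i] for k in range(len(priors)))
        for i in range(n_x)
    ]


def marginal_y(p_y_given_x, p_x):
    """Standard marginalisation: P(y_j) = sum_i P(y_j | x_i) * P(x_i)."""
    n_y = len(p_y_given_x[0])
    return [
        sum(p_y_given_x[i][j] * p_x[i] for i in range(len(p_x)))
        for j in range(n_y)
    ]


# Illustrative (hypothetical) numbers.
priors = [[0.7, 0.3], [0.2, 0.8]]           # p_1(X), p_2(X)
p_z = [0.5, 0.5]                            # P(Z)
p_prior_given_z = [[0.4, 0.6], [0.9, 0.1]]  # rows: z_m, columns: p_k
p_y_given_x = [[0.9, 0.1], [0.3, 0.7]]      # rows: x_i, columns: y_j

p_x_eff = effective_prior(priors, p_z, p_prior_given_z)
p_y = marginal_y(p_y_given_x, p_x_eff)
print(p_x_eff, p_y)
```

Splitting the computation this way makes it explicit that the context only changes which distribution over X is used; the table P(Y|X) stays fixed.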
Contextual Effect on Unconditional
Probability (3)
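Contextual attribute Xt
    Xt1 : I am in Paris
    Xt2 : I am in Moscow
Predictive attribute Xk : Order present
    Xk1 : order flowers
    Xk2 : order wine

P(P(Xk) | Xt)   Xt1   Xt2
P1(Xk)          0.4   0.9
P2(Xk)          0.6   0.1

[Figure: bar charts of the candidate distributions P1(Xk) and P2(Xk) over the values Xk1 and Xk2; bar values shown: 0.7, 0.5, 0.3, 0.2]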
Causal Relation between Conditional
Probabilities
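[Figure: two second-level nodes. P(Xn|Xm) takes the values P1(Xn|Xm), P2(Xn|Xm), P3(Xn|Xm) with distribution P(P(Xn|Xm)); P(Xr|Xk) takes the values P1(Xr|Xk), P2(Xr|Xk) with distribution P(P(Xr|Xk)). They are linked by P(P(Xr|Xk) | P(Xn|Xm)).]
There may be a causal relationship between the conditional probabilities of two pairs of attributes.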
Two-level Bayesian C-Metanetwork for Managing Conditional Dependencies
[Figure: the predictive level contains the attribute pairs A, B and X, Y; the contextual level contains the nodes P(B|A) and P(Y|X)]
Example of Bayesian C-Metanetwork
The nodes of the 2nd-level network correspond to the conditional probabilities of the 1st-level network, P(B|A) and P(Y|X). The arc in the 2nd-level network corresponds to the conditional probability P(P(Y|X)|P(B|A)).

$$P(Y = y_j) = \sum_{i} \sum_{k} \Big\{ p_k(Y = y_j \mid X = x_i) \cdot P(X = x_i) \cdot \sum_{r} \big[ P\big(P(Y \mid X) = p_k(Y \mid X) \mid P(B \mid A) = p_r(B \mid A)\big) \cdot P\big(P(B \mid A) = p_r(B \mid A)\big) \big] \Big\}$$
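As a rough illustration of the C-Metanetwork formula, the Python sketch below first propagates belief along the second-level arc, from the node P(B|A) to the node P(Y|X), and then uses the resulting weights on the predictive level. The function names and all numbers are hypothetical and are not taken from the presentation.

```python
def second_level_weights(p_ba, p_yx_given_ba):
    """P(P(Y|X)=p_k) = sum_r P(p_k(Y|X) | p_r(B|A)) * P(P(B|A)=p_r).

    p_ba[r]             -- P(P(B|A)=p_r)
    p_yx_given_ba[r][k] -- P(P(Y|X)=p_k | P(B|A)=p_r)
    """
    n_k = len(p_yx_given_ba[0])
    return [
        sum(p_ba[r] * p_yx_given_ba[r][k] for r in range(len(p_ba)))
        for k in range(n_k)
    ]


def marginal_y(cpts, p_x, weights):
    """P(y_j) = sum_i sum_k p_k(y_j | x_i) * P(x_i) * weights[k]."""
    n_y = len(cpts[0][0])
    return [
        sum(
            cpts[k][i][j] * p_x[i] * weights[k]
            for k in range(len(cpts))
            for i in range(len(p_x))
        )
        for j in range(n_y)
    ]


# Illustrative (hypothetical) numbers.
p_ba = [0.3, 0.7]                         # distribution over tables p_r(B|A)
p_yx_given_ba = [[0.9, 0.1], [0.2, 0.8]]  # rows: p_r(B|A), columns: p_k(Y|X)
cpts = [
    [[1.0, 0.0], [0.0, 1.0]],             # p_1(Y|X)
    [[0.5, 0.5], [0.5, 0.5]],             # p_2(Y|X)
]
p_x = [0.4, 0.6]

weights = second_level_weights(p_ba, p_yx_given_ba)
p_y = marginal_y(cpts, p_x, weights)
print(weights, p_y)
```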
Two-level Bayesian R-Metanetwork for Modelling the Selection of Relevant Features
[Figure: a two-level network with a contextual level above a predictive level]
Feature relevance modelling (1)
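We consider relevance as the probability that a variable is important for inferring the target attribute in the given context. Under this definition, relevance inherits all the properties of a probability.

The relevance ψX of the predictive attribute X selects between two models:
• with probability P(ψ(X) = "yes") = ψX the attribute is relevant, and the model is X → Y with P(X) and P(Y|X);
• with probability P(ψ(X) = "no") = 1 − ψX the attribute is not relevant, and the model is Y alone with P0(Y).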
Feature relevance modelling (2)
$$P(Y) = \frac{1}{n_x} \sum_{X} P(Y \mid X) \cdot \big[ n_x \cdot \psi_X \cdot P(X) + (1 - \psi_X) \big]$$
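The relevance formula interpolates between two cases: with ψX = 1 it reduces to the usual marginalisation over X, and with ψX = 0 it averages P(Y|X) uniformly over the nx values of X. A small Python sketch of this behaviour follows; the function name `p_y_with_relevance` and the numbers are illustrative assumptions.

```python
def p_y_with_relevance(p_y_given_x, p_x, psi):
    """P(y_j) = (1/nx) * sum_i P(y_j | x_i) * [nx * psi * P(x_i) + (1 - psi)].

    p_y_given_x[i][j] -- P(Y=y_j | X=x_i)
    p_x[i]            -- P(X=x_i)
    psi               -- relevance of X, i.e. P(psi(X) = "yes")
    """
    nx = len(p_x)
    n_y = len(p_y_given_x[0])
    return [
        sum(
            p_y_given_x[i][j] * (nx * psi * p_x[i] + (1.0 - psi))
            for i in range(nx)
        ) / nx
        for j in range(n_y)
    ]


# Illustrative (hypothetical) numbers.
p_y_given_x = [[0.9, 0.1], [0.3, 0.7]]
p_x = [0.8, 0.2]

fully_relevant = p_y_with_relevance(p_y_given_x, p_x, 1.0)  # uses P(X)
irrelevant = p_y_with_relevance(p_y_given_x, p_x, 0.0)      # ignores P(X)
half = p_y_with_relevance(p_y_given_x, p_x, 0.5)
print(fully_relevant, irrelevant, half)
```

Since the expression is linear in ψX, intermediate relevance values give a weighted average of the two extreme cases.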
General Case of Managing Relevance (1)
Predictive attributes:
X1 with values {x11,x12,…,x1nx1};
X2 with values {x21,x22,…,x2nx2};
…
XN with values {xn1,xn2,…,xnnxn};
Target attribute:
Y with values {y1,y2,…,yny}.
Probabilities:
P(X1), P(X2),…, P(XN);
P(Y|X1,X2,…,XN).
Relevancies:
ψX1 = P(ψ(X1) = “yes”);
ψX2 = P(ψ(X2) = “yes”);
…
ψXN = P(ψ(XN) = “yes”);
Goal: to estimate P(Y).
General Case of Managing Relevance (2)
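$$P(Y) = \frac{1}{\prod_{s=1}^{N} n_{x_s}} \sum_{X_1} \sum_{X_2} \cdots \sum_{X_N} \Big[ P(Y \mid X_1, X_2, \ldots, X_N) \cdot \prod_{\forall r\,(\psi(X_r) = \text{"yes"})} n_{x_r} \cdot \psi_{X_r} \cdot P(X_r) \cdot \prod_{\forall q\,(\psi(X_q) = \text{"no"})} (1 - \psi_{X_q}) \Big]$$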
Example of Relevance Bayesian
Metanetwork (1)
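Conditional relevance: the relevance of X depends on the relevance of A.

$$P(Y) = \frac{1}{n_x} \sum_{X} \Big\{ P(Y \mid X) \cdot \Big[ n_x \cdot P(X) \cdot \sum_{\psi_A} P(\psi_X \mid \psi_A) \cdot P(\psi_A) + (1 - \psi_X) \Big] \Big\}$$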
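The conditional-relevance formula can be sketched in two steps: first marginalise the relevance of X over the relevance of A, then apply the single-attribute relevance formula. The sketch below assumes that ψX in the last term denotes this marginal value, which is an interpretation rather than something the slide states. All names and numbers are hypothetical.

```python
def marginal_relevance(p_psi_a, p_psi_x_yes_given_psi_a):
    """psi_X = sum over psi_A of P(psi(X)="yes" | psi_A) * P(psi_A)."""
    return sum(
        p_psi_x_yes_given_psi_a[value] * p
        for value, p in p_psi_a.items()
    )


def p_y_with_relevance(p_y_given_x, p_x, psi):
    """P(y_j) = (1/nx) * sum_i P(y_j | x_i) * [nx * psi * P(x_i) + (1 - psi)]."""
    nx = len(p_x)
    n_y = len(p_y_given_x[0])
    return [
        sum(
            p_y_given_x[i][j] * (nx * psi * p_x[i] + (1.0 - psi))
            for i in range(nx)
        ) / nx
        for j in range(n_y)
    ]


# Illustrative (hypothetical) numbers.
p_psi_a = {"yes": 0.6, "no": 0.4}                   # P(psi(A))
p_psi_x_yes_given_psi_a = {"yes": 0.9, "no": 0.25}  # P(psi(X)="yes" | psi(A))
p_y_given_x = [[0.9, 0.1], [0.3, 0.7]]
p_x = [0.8, 0.2]

psi_x = marginal_relevance(p_psi_a, p_psi_x_yes_given_psi_a)
p_y = p_y_with_relevance(p_y_given_x, p_x, psi_x)
print(psi_x, p_y)
```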
Example of Relevance Bayesian
Metanetwork (2)
Learning Bayesian
Metanetworks from Data
Learning Task
• Given: a training set D of training examples <X1, X2, …, Xn, Y>, possibly with missing values (??):
    <9.7 0.6 8 14 18>
    <0.2 1.3 5 ?? ??>
    <1.3 2.8 ?? 0 1>
    <?? 5.6 0 10 ??>
    ……………….
• Goal: to restore
    • the set of levels of the Bayesian Metanetwork {l1, l2, …, lL}, where each level is a Bayesian network;
    • the interlevel links for each pair of successive levels {lr, lr+1};
    • the network structure and parameters at each level, in particular the probabilities P(vi) and P(vi|parents(vi)) for each variable vi.
Learning Bayesian Metanetwork
• Use well-known learning methods to learn the component Bayesian networks at each level of the Metanetwork.
• Add procedures for learning the interlevel relationships, which are specific to multilevel probabilistic Metanetworks.
Learning Process
Stage 1. Division of attributes among the levels
Stage 2. Learning the network structure
Stage 3. Learning the interlevel links to the subsequent level
Stage 4. Learning the network parameters
(carried out over all levels of the Metanetwork)
Stage 1. Division of attributes among the levels
• The task of this stage is to divide the input vector of attributes <X1, X2, …, Xn> into predictive, contextual and possibly metacontextual attributes.
[Figure: attribute vector X = <x1, …, x7> divided into predictive attributes and contextual attributes]
Stage 2. Learning the network structure at the current level of the Metanetwork
• Can be done with well-known, well-performing methods (the Cheng-Greiner method, the K2 algorithm, etc.).
[Figure: example network over the nodes A, B, C, D, E]
Stage 3. Learning the interlevel links between the current and subsequent levels
• This is a new stage, added specifically for learning a Bayesian Metanetwork.
• It differs between the C-Metanetwork and the R-Metanetwork.
Learning Interlevel Links in
C-Metanetwork
Context 1: examples <A, B, X, Y>1 give P1(B|A) and P1(Y|X)
Context 2: examples <A, B, X, Y>2 give P2(B|A) and P2(Y|X)
...
Context n: examples <A, B, X, Y>n give Pn(B|A) and Pn(Y|X)

Different probability tables, corresponding to different contexts, are associated with the vertices of the second-level Bayesian network.
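A simple way to obtain one probability table per context is to group the training examples by context and estimate each table from relative frequencies. The Python sketch below does this for P(B|A); it assumes every example already carries a known context label, which the slides do not specify, and the function name `cpt_per_context` and the data are hypothetical.

```python
from collections import Counter, defaultdict


def cpt_per_context(examples):
    """Estimate P(B|A) separately in each context by relative frequency.

    examples -- iterable of (context, a, b) triples
    returns  -- {context: {a: {b: P(b | a) in that context}}}
    """
    counts = defaultdict(Counter)  # (context, a) -> counts of b
    for context, a, b in examples:
        counts[(context, a)][b] += 1

    tables = defaultdict(dict)
    for (context, a), b_counts in counts.items():
        total = sum(b_counts.values())
        tables[context][a] = {b: c / total for b, c in b_counts.items()}
    return dict(tables)


# Illustrative (hypothetical) training examples: (context, A, B).
examples = [
    ("c1", "a1", "b1"), ("c1", "a1", "b1"), ("c1", "a1", "b2"),
    ("c1", "a2", "b2"),
    ("c2", "a1", "b2"), ("c2", "a1", "b2"),
    ("c2", "a2", "b1"),
]

tables = cpt_per_context(examples)
print(tables["c1"]["a1"])  # P(B | A=a1) in context c1
print(tables["c2"]["a1"])  # P(B | A=a1) in context c2
```

The same grouping would apply to P(Y|X). With sparse data a smoothed estimate would be preferable to raw frequencies.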
Context variables in C-Metanetwork
Context random variable U takes the values {Pt(B|A)}.
Context random variable W takes the values {Pj(Y|X)}.
The arc between them carries P(W|U):

P(W|U) = {P(W = Pj(Y|X) | U = Pt(B|A))}
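Once each context has been assigned a value of U (its table for P(B|A)) and a value of W (its table for P(Y|X)), the interlevel table P(W|U) can be estimated by counting co-occurrences across contexts. The sketch below assumes the per-context tables have already been mapped to a finite set of labels, a step the slides do not describe. The function name `estimate_p_w_given_u` and the labels are hypothetical.

```python
from collections import Counter


def estimate_p_w_given_u(pairs):
    """Estimate P(W=w | U=u) from one (u, w) observation per context.

    pairs   -- list of (u, w) labels, one pair per observed context
    returns -- {(w, u): P(W=w | U=u)}
    """
    joint = Counter(pairs)
    u_totals = Counter(u for u, _ in pairs)
    return {(w, u): n / u_totals[u] for (u, w), n in joint.items()}


# Illustrative (hypothetical) observations over four contexts.
pairs = [
    ("P1(B|A)", "P1(Y|X)"),
    ("P1(B|A)", "P1(Y|X)"),
    ("P1(B|A)", "P2(Y|X)"),
    ("P2(B|A)", "P2(Y|X)"),
]

p_w_given_u = estimate_p_w_given_u(pairs)
print(p_w_given_u[("P1(Y|X)", "P1(B|A)")])
```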
Learning Interlevel Links in
R-Metanetwork
Context 1: examples <A, B, X, Y>1 give ψ1(A), ψ1(B), ψ1(X)
Context 2: examples <A, B, X, Y>2 give ψ2(A), ψ2(B), ψ2(X)
...
Context n: examples <A, B, X, Y>n give ψn(A), ψn(B), ψn(X)

Different relevancies, corresponding to different contexts, are associated with the vertices of the second-level Bayesian network.
Context variables in R-Metanetwork
Context random variable U takes the values {ψt(A)}.
Context random variable W takes the values {ψj(X)}.
The arc between them carries P(W|U):

P(W|U) = {P(W = ψj(X) | U = ψt(A)) = ψjt(X|A)}
Stage 4. Learning the parameters in the network at the current level
• Done by the standard procedure, taking into account how the parameter values change across different contexts.
[Figure: example network over the nodes A, B, C, D, E annotated with the distributions P(A), P(B), P(C), P(D), P(E) and the conditional tables P(C|B), P(D|A,B), P(E|D,C)]
When to use Bayesian Metanetworks?
1. A Bayesian Metanetwork can be a very powerful tool when the structure (or strength) of the causal relationships between the observed parameters of an object depends essentially on the context (e.g. external environment parameters).
2. It can also be a useful model for an object whose diagnosis depends on a different set of observed parameters in each context.
Conclusions
• The main challenge of this work is extending the standard Bayesian learning procedures with an algorithm for learning the interlevel links.
• Experiments on data from a highly contextual domain have shown the effectiveness of the proposed models and learning procedures.
Read more about Bayesian Metanetworks in:
Terziyan V., A Bayesian Metanetwork, In: International Journal on Artificial Intelligence Tools, Vol. 14, Nos. 3-4, World Scientific (to appear).
http://www.cs.jyu.fi/ai/IJAIT-2003.doc
Terziyan V., Vitko O., Bayesian Metanetwork for Modelling User Preferences in Mobile Environment, In: German Conference on Artificial Intelligence (KI-2003), Hamburg, Germany, September 15-18, 2003.
http://www.cs.jyu.fi/ai/papers/KI-2003.pdf
Vitko O., The Multilevel Probabilistic Networks for Modelling Complex Information Systems under Uncertainty, Ph.D. Thesis, Kharkov National University of Radioelectronics, 2003.