Augmented Bayesian Classifier


Augmented Bayesian Classifier
Beijie Xu
1
Problem to be solved
• Naïve Bayesian Classifier
– Why "naïve"? It assumes conditional independence among attributes.
– Is that true in real-world situations?
• Augmented Bayesian Classifier
– Allows for relationships between attributes.
2
Assumption / limitation
• Each node/attribute may have at most one (non-class) parent.
• An arc from A1 to A2 indicates that attribute A2 depends on A1.
3
Naïve Bayesian Classifier
• The probability of an instance j belonging to class i is proportional to

$P(C_i) \prod_k P(A_k = V_{kj} \mid C_i)$ ---- ①

• The Augmented Bayesian Classifier modifies this probability to account for dependencies between attributes.
4
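As a concrete illustration, here is a minimal Python sketch of score ①, assuming the prior and conditional probability tables (`prior` and `cond`; both names are illustrative, not from the paper) have already been estimated from training data:

# Minimal sketch of score (1): P(C_i) * prod_k P(A_k = V_kj | C_i).
# `prior[i]` holds P(C_i); `cond[(k, v, i)]` holds P(A_k = v | C_i).
# Both tables are assumed to be estimated from training data beforehand.

def naive_bayes_score(instance, class_i, prior, cond):
    """Unnormalized probability that `instance` belongs to class `class_i`."""
    score = prior[class_i]
    for k, v in enumerate(instance):  # product over attributes A_k
        score *= cond[(k, v, class_i)]
    return score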
If A1 → A2
• Multiply ① (slide 4) by

$\dfrac{P(A_a = V_{aj} \mid C_i,\ A_b = V_{bj})}{P(A_a = V_{aj} \mid C_i)}$

• Denominator: factors OUT the effect of the orphan node Aa.
• Numerator: factors IN the effect of the arc from Ab to Aa (the mutual effect of Aa and Ab given class Ci).
5
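A hedged sketch of this adjustment in Python, building on naive_bayes_score above; the two-parent table `cond2` (an illustrative name) is assumed to hold P(A_a = v_a | C_i, A_b = v_b):

# Hypothetical sketch: adjust score (1) for a single augmenting arc A_b -> A_a.
# `cond2[(a, v_a, i, b, v_b)]` holds P(A_a = v_a | C_i, A_b = v_b), assumed
# to be precomputed alongside the one-parent table `cond`.

def augmented_score(instance, class_i, prior, cond, cond2, a, b):
    """Score with a single augmenting arc A_b -> A_a; a sketch only."""
    base = naive_bayes_score(instance, class_i, prior, cond)
    v_a, v_b = instance[a], instance[b]
    # Factor OUT the orphan term P(A_a = v_a | C_i); factor IN the
    # dependent term P(A_a = v_a | C_i, A_b = v_b).
    return base * cond2[(a, v_a, class_i, b, v_b)] / cond[(a, v_a, class_i)]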
Conditional mutual information $I_{\hat{P}_D}(A_i, A_j \mid C)$

$I_P(X, Y \mid Z) = \sum_{x,y,z} P(x,y,z) \log \dfrac{P(x,y,z)\,P(z)}{P(x,z)\,P(y,z)}$

Let X, Y, Z be three variables; the conditional mutual information between X and Y given Z is determined by the equation above. $I_{\hat{P}_D}(A_i, A_j \mid C)$ is this quantity for attributes $A_i$ and $A_j$ given the class, computed under the empirical distribution $\hat{P}_D$.
6
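For illustration, this quantity can be estimated from data with a short Python routine. This is a plain empirical-frequency sketch (no smoothing), not necessarily the paper's exact estimator:

import math
from collections import Counter

def conditional_mutual_information(xs, ys, zs):
    """Estimate I(X; Y | Z) from three parallel lists of observations."""
    n = len(xs)
    c_xyz = Counter(zip(xs, ys, zs))
    c_xz = Counter(zip(xs, zs))
    c_yz = Counter(zip(ys, zs))
    c_z = Counter(zs)
    total = 0.0
    for (x, y, z), c in c_xyz.items():
        # Each count divided by n is an empirical probability; the n's
        # cancel inside the log, leaving raw counts.
        total += (c / n) * math.log((c * c_z[z]) / (c_xz[(x, z)] * c_yz[(y, z)]))
    return total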
Greedy Hill Climbing Search (HCS) (1/3)
• Step 1:
– Build a J×I matrix where cell[j, i] is the probability that instance j belongs to class Ci (there are J instances and I classes).
7
Greedy Hill Climbing Search (HCS) (2/3)
• Step 2:
– Try every possible arc addition. If at least one arc improves predictive accuracy, add the arc that improves accuracy the most to the current network.
8
Greedy Hill Climbing Search (HCS) (3/3)
• Step 3: update the J×I matrix.
– Go back to Step 2.
– Iterate until no arc addition improves accuracy.
9
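Putting the three steps together, a schematic of HCS might look as follows. The `accuracy` and `candidate_arcs` functions and the `network` object's `add`/`remove` methods are assumed helpers, not code from the paper; the J×I probability matrix of Step 1 is implicit in how `accuracy` classifies each instance:

def hill_climbing_search(network, data):
    """Greedy HCS sketch: start from the naive network, add one arc at a time."""
    best = accuracy(network, data)            # Step 1: score the initial network
    while True:
        best_arc, best_acc = None, best
        for arc in candidate_arcs(network):   # Step 2: try every possible arc
            network.add(arc)
            acc = accuracy(network, data)
            network.remove(arc)
            if acc > best_acc:
                best_arc, best_acc = arc, acc
        if best_arc is None:                  # no arc addition improves accuracy
            break
        network.add(best_arc)                 # keep the single best arc
        best = best_acc                       # Step 3: update scores, repeat
    return network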
Three definitions
• Orphan – a node with no parent other than the class node is called an orphan.
• SuperParent – if we extend arcs from node Ax to every orphan, node Ax is a SuperParent.
• FavoriteChild – if we extend an arc from node Ax to each orphan in turn and test the effect on predictive accuracy, the node pointed to by the best arc is said to be the FavoriteChild of Ax.
10
SuperParent (1/3)
• If an attribute is truly independent of all other attributes, then

$P(C_i) \prod_k P(A_k = V_{kj} \mid C_i) = P(C_i) \prod_k P(A_k = V_{kj} \mid C_i,\ A_{SP})$ --- (2)
11
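One way to turn equation (2) into a score is to sum, over instances and classes, how far the two sides differ. A hypothetical sketch follows: the table `cond_sp` holding P(A_k = v | C_i, A_SP = v_sp) is assumed precomputed, and skipping A_SP's own factor on both sides is a simplification, not the paper's exact procedure:

def superparent_deviation(data, classes, sp, prior, cond, cond_sp):
    """Total |LHS - RHS| of equation (2) for candidate SuperParent `sp`."""
    total = 0.0
    for instance in data:
        v_sp = instance[sp]
        for i in classes:
            lhs = rhs = prior[i]
            for k, v in enumerate(instance):
                if k == sp:                          # skip A_SP's own term
                    continue
                lhs *= cond[(k, v, i)]               # left-hand side of (2)
                rhs *= cond_sp[(k, v, i, v_sp)]      # right-hand side of (2)
            total += abs(lhs - rhs)
    return total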
SuperParent (2/3)
• Step 1:
– Find the attribute Asp that makes the right-hand side of equation (2) (slide 11) differ from the left-hand side the most.
12
SuperParent (3/3)
• Step 2:
– Find Asp's FavoriteChild and add that arc.
• Iterate Steps 1-2 until no further improvement.
13
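A schematic of the full SuperParent loop, reusing superparent_deviation from the sketch above; `accuracy`, `network.parent_of`, and `network.add`/`remove` are assumed helpers as in the HCS sketch:

def superparent_search(network, data, classes, attrs, prior, cond, cond_sp):
    """SuperParent sketch: pick a candidate parent, then its FavoriteChild."""
    while True:
        orphans = [a for a in attrs if network.parent_of(a) is None]
        # Step 1: the candidate SuperParent is the attribute whose
        # conditioning moves equation (2) the most.
        sp = max(attrs, key=lambda a: superparent_deviation(
            data, classes, a, prior, cond, cond_sp))
        candidates = [o for o in orphans if o != sp]
        if not candidates:
            break
        # Step 2: the FavoriteChild is the orphan whose new arc from sp
        # improves predictive accuracy the most.
        def acc_with(child):
            network.add((sp, child))
            acc = accuracy(network, data)
            network.remove((sp, child))
            return acc
        child = max(candidates, key=acc_with)
        if acc_with(child) <= accuracy(network, data):
            break                          # no improvement: stop iterating
        network.add((sp, child))
    return network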
Questions
• Does SuperParent produce the same set
of augmented arcs as HCS?
• How many children attributes can an
attribute have in each algorithm?
• Does the order of trying (possible arcs)
affect the result?
14
References
• E. J. Keogh and M. J. Pazzani, "Learning the structure of augmented Bayesian classifiers," International Journal on Artificial Intelligence Tools, vol. 11, no. 4, pp. 587-601, 2002.
16