Automatic threshold selection for conditional independence
tests in learning a Bayesian network
Rafi Bojmel, supervised by Dr. Boaz Lerner
Department of Electrical Engineering, Ben-Gurion University
Introduction
Machine Learning (ML) investigates the mechanisms by which knowledge is acquired through experience.
Hard-core ML-based applications:
Web search engines, on-line help services
Document processing (text classification, OCR)
Biological data analysis, military applications
The Bayesian network (BN) has become one of the most studied machine learning models for knowledge representation and probabilistic inference.

Bayesian network (BN)
An efficient graphical model to represent the joint probability distribution (arcs) of a set of random variables (nodes).

Figure 1 describes the BN of a simple apple/orange classifier, with four attribute nodes (Weight, Radius, Pixels Coverage, and Color) and a class node (Apple or Orange?). After learning the structure, we are able to use it for Bayesian inference; for example, we measure all attributes 1-4, use the statistical relations, and answer the question "how likely is the fruit to be an apple?" A minimal inference sketch follows.
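As a minimal sketch of that inference step (the naive-Bayes-shaped structure, the discretized attribute values, and every probability below are illustrative assumptions, not the poster's learned model), the query can be answered directly by Bayes' rule:

```python
# Toy apple/orange query: P(class | measured attributes).
# All structure and numbers here are assumed for illustration only.
p_class = {"apple": 0.6, "orange": 0.4}              # prior P(class)
p_weight = {"apple": {"light": 0.7, "heavy": 0.3},   # P(Weight | class)
            "orange": {"light": 0.4, "heavy": 0.6}}
p_color = {"apple": {"green": 0.8, "orange": 0.2},   # P(Color | class)
           "orange": {"green": 0.1, "orange": 0.9}}

def posterior(weight, color):
    """P(class | weight, color), with the attributes assumed
    conditionally independent given the class."""
    joint = {c: p_class[c] * p_weight[c][weight] * p_color[c][color]
             for c in p_class}
    z = sum(joint.values())                          # normalizing constant
    return {c: v / z for c, v in joint.items()}

print(posterior("light", "green"))   # -> apple with probability ~0.95
```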
BN Structure Learning:
Find the in/dependence relations between the variables and the set of conditional probability distributions that represent the problem most accurately.
Two approaches (types of algorithm):
search-and-score: a heuristic search method that constructs a model and then evaluates it using a score.
constraint-based (CB): constructs the network by analyzing dependency relationships among nodes. These relationships are measured by performing conditional independence (CI) tests.

Constraint Based (CB) Approach - 'PC algorithm'
A CB algorithm that learns a structure from the complete undirected graph and then "thins" it to its accurate structure.
Advantage: run time (quick).
Disadvantage: an arbitrary significance level (threshold) is needed to decide on independencies.

Stage I:
Find all independent nodes and remove edges accordingly, based on conditional independence (CI) tests; i.e., if $I(X_i, X_j \mid S) < \varepsilon$ then remove the edge, where

$$I(X_i, X_j \mid S) = \sum_{s \in S} \sum_{x_i \in X_i} \sum_{x_j \in X_j} P(x_i, x_j, s) \log \frac{P(x_i, x_j \mid s)}{P(x_i \mid s)\, P(x_j \mid s)}$$

is the conditional mutual information (CMI) and ε is the selected threshold. A sketch of this CI test follows.
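As a sketch of the Stage I test under stated assumptions (the discrete integer-array data layout and the function names are ours; this is a plain empirical estimate, not the authors' implementation):

```python
import numpy as np
from itertools import product

def cmi(data, i, j, S):
    """Empirical I(Xi, Xj | S) for discrete data (rows = samples),
    following the CMI definition above."""
    n = data.shape[0]
    s_cols = data[:, S] if S else np.zeros((n, 0), dtype=int)
    total = 0.0
    for s_val in {tuple(row) for row in s_cols}:     # each configuration s of S
        mask = np.all(s_cols == np.array(s_val), axis=1)
        p_s, sub = mask.mean(), data[mask]
        for xi, xj in product(np.unique(data[:, i]), np.unique(data[:, j])):
            p_ij = np.mean((sub[:, i] == xi) & (sub[:, j] == xj))  # P(xi,xj|s)
            p_i, p_j = np.mean(sub[:, i] == xi), np.mean(sub[:, j] == xj)
            if p_ij > 0:
                total += p_s * p_ij * np.log(p_ij / (p_i * p_j))
    return total

# Stage I edge test: drop the edge i-j when I(Xi, Xj | S) < eps.
def edge_is_removed(data, i, j, S, eps):
    return cmi(data, i, j, S) < eps
```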
Stages II+III:
Orient edges.

Figure 2. The three-stage PC algorithm. In the first stage, an edge is removed if and only if the two nodes are mutually independent or conditionally mutually independent, as derived from CI tests (the figure annotates removed edges with tests such as I(X4, X5 | S) < ε and I(X1, X4 | S) < ε).
Automatic Threshold Selection (ATS)
The Challenge:
• Automate the learning process by finding a technique to define the best threshold (ε) candidate.
• Preserve or improve classification accuracy.
The Methods:
• Zero Crossing Decision (ZCD)
• Best Candidate (BC)
Both are based on the hypothesis that a histogram/PDF of the CMI values for a given database allows distinguishing between dependent and independent nodes, as demonstrated in Figure 3.
Figure 3. Ideal mutual information probability density function (PDF). [Plot of f(mi) against mutual information I(Xi, Xj) = mi over the range 0-1, showing an ideal MI PDF and a bimodal MI PDF.]
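To illustrate the hypothesis behind Figure 3 with a toy simulation (the sample sizes, noise level, and distributions are our assumptions, not the poster's data): MI values of independent pairs pile up near zero, while dependent pairs form a second, separated mode.

```python
import numpy as np

rng = np.random.default_rng(0)

def mi(x, y):
    """Empirical mutual information of two discrete samples."""
    total = 0.0
    for xv in np.unique(x):
        for yv in np.unique(y):
            p_xy = np.mean((x == xv) & (y == yv))
            if p_xy > 0:
                total += p_xy * np.log(p_xy / (np.mean(x == xv) * np.mean(y == yv)))
    return total

mis = []
for _ in range(200):
    x = rng.integers(0, 2, 1000)
    if rng.random() < 0.5:                  # independent pair -> MI near 0
        y = rng.integers(0, 2, 1000)
    else:                                   # dependent pair (noisy copy) -> large MI
        y = np.where(rng.random(1000) < 0.9, x, 1 - x)
    mis.append(mi(x, y))

counts, edges = np.histogram(mis, bins=30)
print(counts)  # two well-separated modes, as the ATS hypothesis suggests
```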
Zero Crossing Decision (ZCD)
Perform the PC algorithm as needed. On each step where a threshold selection is needed:
• Draw a histogram of the CMI values of all nodes in the database for the relevant conditional order.
• Select the first CMI value where the histogram crosses the CMI-counter axis (see Figure 4 and the sketch below).

Figure 4. Illustrates the ATS-ZCD technique. [Histogram of CMI values - illustration: CMI counter against CMI values, with the selected thresholds marked for ZCD (order=0, mutual information) and ZCD (order=1, conditional mutual information, |S|=1).]
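A minimal sketch of the selection rule (the bin count and the fallback when no crossing exists are our assumptions):

```python
import numpy as np

def zcd_threshold(cmi_values, bins=30):
    """Zero Crossing Decision: histogram the CMI values of the current
    conditional order and return the first CMI value at which the count
    crosses the (counter = 0) axis."""
    counts, edges = np.histogram(cmi_values, bins=bins)
    for k in range(1, len(counts)):
        if counts[k] == 0 and counts[k - 1] > 0:   # first zero crossing
            return edges[k]                        # CMI value at the crossing
    return None  # no crossing found; selection fails (cf. 'NO CNV' in Table 1)
```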
Best Candidate (BC)
A hill-climbing technique in which we select the best candidate threshold out of a set of candidates, for each conditional set size separately. A sketch follows.
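A sketch of that hill-climbing loop under stated assumptions (the candidate grid and the scoring function are not specified on the poster; here `score` stands for any model-quality measure, e.g., cross-validated classification accuracy):

```python
def best_candidate(candidates, score):
    """Hill-climb over a sorted list of candidate thresholds, moving to a
    neighbouring candidate while doing so improves the score; run once per
    conditional set size (order 0, order 1, ...)."""
    k, best = 0, score(candidates[0])
    while True:
        improved = False
        for nb in (k - 1, k + 1):                   # neighbours of position k
            if 0 <= nb < len(candidates):
                s = score(candidates[nb])
                if s > best:
                    k, best, improved = nb, s, True
                    break
        if not improved:
            return candidates[k]                    # local optimum reached
```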
Experiments and Results
Ten real-world databases were taken from the UCI Repository.
All databases were analyzed in a CV10 (10-fold cross-validation) experiment.
CI tests were carried out using the normalized CMI criterion I* = I/N, where N = ∏(|Xi|·|Xj|·|S|), with the threshold selected using either of the ATS techniques (see the sketch below).
Prediction accuracies of the PC algorithm using manual threshold selection, ZCD, and BC, as well as the best performance achieved by other CB algorithms, are summarized in Table 1.
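Reading N as the product of the cardinalities of Xi, Xj, and each variable in S (our interpretation of the poster's N = ∏(|Xi|·|Xj|·|S|)), the criterion reduces to a small helper:

```python
from math import prod

def normalized_cmi(raw_cmi, card_i, card_j, cards_S):
    """I* = I/N, with N = |Xi| * |Xj| * (product of |Sk| for Sk in S);
    for an empty conditioning set the product term is 1."""
    return raw_cmi / (card_i * card_j * prod(cards_S))
```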
Table 1. Mean classification accuracy (STD) of the PC and other CB algorithms in a CV10 experiment. Numbers inside the table are classification accuracies (percent) together with the standard deviation in brackets, where reported.

Database      | n  | PC, manual threshold | Other best CB (previous experiments) | PC + ATS: ZCD | PC + ATS: BC
Australian    | 15 | 85.51 (0.52)         | 86.2 (1.5)                           | 86.2 (6.0)    | 72.5 (15.3)
Car           | 7  | 92.94 (1.06)         | 85.07 (1.83)                         | 93.8 (2.4)    | 93.8 (2.4)
Cmc           | 10 | 51.12 (3.16)         | 50.92 (2.3)                          | 51.3 (3.6)    | 51.3 (3.6)
Corral        | 7  | 98.52 (3.31)         | 84.53 (15.45)                        | 100 (0)       | 100 (0)
Crx           | 16 | 86.38 (2.63)         | 86.38 (2.63)                         | 84.5 (3.8)    | 86.4 (2.6)
Flare         | 11 | 84.3 (2.54)          | 84.3 (2.5)                           | 83.9 (2.5)    | 84.3 (2.2)
Iris          | 5  | 96.0 (4.35)          | 96.0 (4.35)                          | 94 (4.9)      | 96.0 (3.4)
Led-7         | 8  | 73.59 (1.56)         | 73.31 (1.8)                          | NO CNV        | 71.6 (2.8)
Mofn          | 11 | 93.16                | 81.4                                 | 100 (0)       | 94.1 (7.7)
Voting        | 17 | 95.87 (1.71)         | 95.64 (1.87)                         | NO CNV        | 97.1 (3.4)
Average (std) |    | 85.8 (14.2)          | 82.3 (12.8)                          | 86.7* (15.7)  | 84.71 (15.4)

* 'NO CNV' (no convergence) databases were taken out of (ignored in) the average values; thus, comparison between average values is not fully reliable.
Conclusions
On 80% of the 10 databases, each of the ATS techniques improved the accuracy of the PC algorithm, as well as that of any other CB algorithm examined in previous papers.
Dominance of the condition set of order zero: this indicates that direct and simple relations between attributes are significantly more common in real life.
The results testify that there is potential both to enjoy an automatic process and to improve performance.
Further research will be carried out to validate and improve the proposed techniques.
Literature cited
Bishop, C. M. Neural Networks for Pattern Recognition. Oxford University Press, 1995.
Cheng, J. & Greiner, R. Learning Bayesian belief network classifiers: Algorithms and system. Proc. 14th Canadian Conference on Artificial Intelligence, pages 141-150, 2001.
Murphy, K. The Bayes Net Toolbox for Matlab. Computing Science and Statistics, vol. 33, 2001. http://www.cs.ubc.ca/~murphyk/
Newman, D. J., Hettich, S., Blake, C. L. & Merz, C. J. UCI Repository of machine learning databases. University of California, Irvine, Dept. of Information and Computer Science, 1998. http://www.ics.uci.edu/~mlearn/MLRepository.html
Spirtes, P., Glymour, C. & Scheines, R. Causation, Prediction, and Search, 2nd edition. MIT Press, 2000.
Yehezkel, R. & Lerner, B. Recursive autonomy identification for Bayesian network structure learning. The 10th International Workshop on Artificial Intelligence and Statistics (AISTATS 2005), pages 429-436, 2005.