
Constructing a Large Node Chow-Liu
Tree Based on Frequent Itemsets
Kaizhu Huang, Irwin King, Michael R. Lyu
Multimedia Information Processing Laboratory
The Chinese University of Hong Kong
Shatin, NT. Hong Kong
{kzhuang, king, lyu}@cse.cuhk.edu.hk
ICONIP2002, November 19, 2002
Orchid Country Club, Singapore
Outline
Background


Probabilistic Classifiers
Chow-Liu Tree
Motivation
Large Node Chow-Liu tree
Experimental Results
Conclusion
A Typical Classification Problem
Given a set of symptoms, one wants to find
out whether these symptoms give rise to a
particular disease.
Background
Probabilistic Classifiers
The classification function is defined as:
c* = argmax_c P(C = c | A1, A2, ..., An) = argmax_c P(C = c) P(A1, A2, ..., An | C = c) / P(A1, A2, ..., An),
where P(A1, A2, ..., An) is a constant for a given instance of A1, A2, ..., An.
The joint probability P(A1, A2, ..., An | C = c) is not easily estimated from the dataset; thus an assumption has to be made about the distribution, i.e., about the dependence or independence relationships among the variables.
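A minimal sketch (not from the slides) of this classification rule in Python; the names `prior` and `likelihood` are hypothetical placeholders for whatever probability estimates a concrete model, such as a Chow-Liu tree per class, supplies:

# Bayes classification rule: c* = argmax_c P(c) * P(a1..an | c).
# `prior` and `likelihood` are hypothetical placeholders; a real classifier
# supplies P(a1..an | c) from its own factored model of the joint distribution.
def classify(instance, classes, prior, likelihood):
    """Return the class maximizing P(c) * P(instance | c).

    prior:      dict mapping c -> P(C = c)
    likelihood: function (instance, c) -> P(A1..An = instance | C = c)
    """
    # The evidence P(A1..An) is constant for a given instance, so it can be dropped.
    return max(classes, key=lambda c: prior[c] * likelihood(instance, c))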

Background
Chow-Liu Tree (CLT)

Assumption: a dependence tree exists among the
variables, given the class variable C.
[Figure: a tree dependence structure]
Background
Chow-Liu Tree
Advantages
Comparable with some of the state-of-the-art classifiers.
The tree structure gives it resistance to the over-fitting problem and a decomposition property.
[Figure: tree dependence structure with the class variable]
Disadvantages
It cannot model non-tree dependence relationships among the attributes or variables.
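For concreteness, here is a minimal sketch of the classical Chow-Liu construction (pairwise mutual information followed by a maximum-weight spanning tree). It assumes discrete attributes stored as rows of tuples in `data`; all names are illustrative, not taken from the authors' implementation.

import math
from collections import Counter
from itertools import combinations

def mutual_information(data, i, j):
    # Empirical mutual information I(Xi; Xj) estimated from the rows of `data`.
    n = len(data)
    pij = Counter((row[i], row[j]) for row in data)
    pi = Counter(row[i] for row in data)
    pj = Counter(row[j] for row in data)
    return sum((c / n) * math.log((c / n) / ((pi[a] / n) * (pj[b] / n)))
               for (a, b), c in pij.items())

def chow_liu_tree(data, n_attrs):
    # Undirected edges of a maximum-weight spanning tree over pairwise mutual
    # information, built greedily (Kruskal-style) with union-find to avoid cycles.
    edges = sorted(combinations(range(n_attrs), 2),
                   key=lambda e: mutual_information(data, *e), reverse=True)
    parent = list(range(n_attrs))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    tree = []
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:          # adding this edge keeps the structure a tree
            parent[ru] = rv
            tree.append((u, v))
    return tree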
Motivation
[Figures (a) and (b): two dependence structures over A, B, C, D and E, each with the class variable]
1. Fig. (b) can represent the same independence relationship as Fig. (a): given B and E, there is an independence relationship among A, C, and D.
2. Fig. (b) is still a tree structure, which inherits the advantages of a tree.
3. By combining several nodes, a large node tree structure can represent a non-tree structure. This motivates our Large Node Chow-Liu Tree approach.
Overview of Large Node Chow-Liu
Tree (LNCLT)
[Figure: the underlying structure, the Chow-Liu tree drafted in Step 1, and the structure refined in Step 2, which represents the same independence relationship]
Step 1. Draft the Chow-Liu tree
Draft the CL-tree of the dataset according to the CLT algorithm.
Step 2. Refine the Chow-Liu tree based on the combination rules
Refine the Chow-Liu tree into a large node Chow-Liu tree based on the combination rules.
Combination Rules
Bounded cardinality
The cardinality of each large node should not be greater than a bound k.
Frequent itemsets
Each large node should be a frequent itemset.
Father-son or sibling relationship
The nodes in a large node should have a father-son or sibling relationship.
Combination Rules (1)
Bounded Cardinality
The cardinality of each large node (the number of nodes in a large node) should not be greater than a bound k.
For example, if we set k to the number of attributes or variables, the LNCLT becomes a "one large node tree", which loses all the merits of a tree.
[Figure: a "one node tree" loses all the merits of the "tree"]
Combination Rules (2)
Frequent Itemsets
Food store example: in a food store, if you buy {bread}, it is highly likely that you will also buy {butter}. Thus {bread, butter} is called a frequent itemset.
A frequent itemset is a set of attributes that frequently occur together.
Frequent itemsets are possible "large nodes", since the attributes in a frequent itemset act just like one "attribute": they occur together frequently at the same time.
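A hedged, level-wise sketch of mining frequent itemsets of size at most k, in the spirit of Apriori [AS1994]; `transactions` (a list of item sets) and `min_support` (a count threshold) are illustrative assumptions, not the paper's notation.

from collections import Counter

def frequent_itemsets(transactions, min_support, k):
    # Return {itemset (frozenset): frequency} for all frequent itemsets of size <= k,
    # found level by level: frequent size-s itemsets are joined into size-(s+1) candidates.
    result = {}
    candidates = {frozenset([item]) for t in transactions for item in t}
    size = 1
    while candidates and size <= k:
        counts = Counter()
        for t in transactions:
            for cand in candidates:
                if cand <= t:          # candidate occurs in this transaction
                    counts[cand] += 1
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == size + 1}
        size += 1
    return result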
Combination Rules (3)
Father-son or sibling relationship
Combining father-son or sibling nodes increases the data fitness of the tree structure on the dataset (proved in the paper).
Combining father-son or sibling nodes maintains the graphical structure as a tree.
Combining nodes that are neither father-son nor siblings may result in a non-tree structure.
[Figures: father-son combination and sibling combination]
Constructing Large Node Chow-Liu Tree
1. Generate the frequent itemsets
Call Apriori [AS1994] to generate the frequent itemsets whose size is no larger than k. Record all the frequent itemsets together with their frequency in a list L.
2. Draft the Chow-Liu tree
Draft the CL-tree of the dataset according to the CLT algorithm.
3. Combine nodes based on the combination rules
Iteratively combine the frequent itemset with the maximum frequency that satisfies the combination conditions (father-son or sibling relationship), until L is empty (see the sketch below).
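Below is a minimal sketch of the refinement in step 3, assuming k = 2 (as in the example that follows), a drafted tree given as a hypothetical `parent_of` map (node -> parent, None for the root), and the (itemset, frequency) pairs from step 1. It only decides which node pairs to merge; the actual contraction of the tree is omitted.

def father_son_or_sibling(itemset, parent_of):
    # Combination rule (3) for a pair of nodes: father-son, or siblings sharing a parent.
    if len(itemset) != 2:
        return False
    a, b = tuple(itemset)
    return (parent_of.get(a) == b or parent_of.get(b) == a
            or (parent_of.get(a) is not None and parent_of.get(a) == parent_of.get(b)))

def refine(parent_of, itemsets):
    # Greedily pick the most frequent admissible itemset, then drop itemsets that
    # have coverage with (overlap) the new large node; repeat until the list is empty.
    large_nodes = []
    remaining = sorted(itemsets, key=lambda x: x[1], reverse=True)
    while remaining:
        itemset, freq = remaining.pop(0)
        if father_son_or_sibling(itemset, parent_of):
            large_nodes.append(itemset)
            remaining = [(s, f) for s, f in remaining if not (s & itemset)]
    return large_nodes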
Example: Constructing LNCLT
We assume k is 2. After step 1, we get the frequent itemsets {A, B}, {A, C}, {B, C}, {B, E}, {B, D}, {D, E}, with f({B, C}) > f({A, B}) > f({B, E}) > f({B, D}) > f({D, E}), where f(*) denotes the frequency of a frequent itemset. (b) is the CLT obtained in step 2.
[Figure: structures (b)-(d) during the refinement]
1. {A, C} does not satisfy the combination condition; filter out {A, C}.
2. f({B, C}) is the biggest and {B, C} satisfies the combination condition; combine them into (c).
3. Filter out the frequent itemsets which have coverage with {B, C}; {D, E} is left.
4. {D, E} is the frequent itemset that satisfies the combination condition; combine them into (d).
Experimental Setup
Dataset:
MNIST handwritten digit database (28*28 gray-level bitmaps)
Training dataset size: 60,000
Testing dataset size: 10,000
Experimental Environment
Platform: Windows 2000
Development tool: Visual C++ 6.0
Experiments
Data Fitness Comparison
[Charts: minus log-likelihood (bits/digit) on the training and testing datasets for digits 0-9, comparing CLT and LNCLT]
Experiments
Recognition Rate
Future Work
Evaluate our algorithm extensively on other benchmark datasets.
Examine other combination rules.
Conclusion
A novel Large Node Chow-Liu tree is constructed based on frequent itemsets.
LNCLT can partially overcome the disadvantages of CLT, i.e., its inability to represent non-tree structures.
We demonstrate, both theoretically and experimentally, that our LNCLT model has better data fitness and better prediction accuracy.
Main References
[AS1994] R. Agrawal and R. Srikant (1994). Fast algorithms for mining association rules. In Proc. VLDB 1994.
[Chow, Liu 1968] Chow, C.K. and Liu, C.N. (1968). Approximating discrete probability distributions with dependence trees. IEEE Trans. on Information Theory, 14, pp. 462-467.
[Friedman 1997] Friedman, N., Geiger, D. and Goldszmidt, M. (1997). Bayesian Network Classifiers. Machine Learning, 29, pp. 131-161.
[Cheng 1997] Cheng, J., Bell, D.A. and Liu, W. (1997). Learning Belief Networks from Data: An Information Theory Based Approach. In Proceedings of ACM CIKM'97.
[Cheng 2001] Cheng, J. and Greiner, R. (2001). Learning Bayesian Belief Network Classifiers: Algorithms and System. In E. Stroulia and S. Matwin (Eds.): AI 2001, LNAI 2056, pp. 141-151.
Q&A
Thanks.