
Forest Trees for On-line Data
João Gama, Pedro Medas, Ricardo Rocha
Proc. ACM Symposium on Applied Computing (SAC'04)
2005/5/27
Presenter: 董原賓
Outline



The VFDT System
Ultra Fast Forest Trees (UFFT)
Experiment
The VFDT System


The Very Fast Decision Tree (VFDT) system:
A decision tree is learned by recursively replacing leaves with decision nodes.
Each leaf maintains a set of sufficient statistics, and a split-test is installed at a leaf when there is enough statistical evidence in favor of a split.
The VFDT System


Information gain is used to find the best attribute as the splitting criterion.
The main innovation of the VFDT system is the use of the Hoeffding bound to decide when to install a split-test at a given leaf node.
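As a rough illustration of this control flow, here is a minimal Python sketch; the helper names (sort_to_leaf, update_statistics, best_two_gains, split) are hypothetical, not from the paper.

import math

def hoeffding_bound(R, delta, n):
    # epsilon = sqrt(R^2 * ln(1/delta) / (2n)) -- see the Hoeffding bound slides below
    return math.sqrt(R * R * math.log(1.0 / delta) / (2.0 * n))

def vfdt_process(example, tree, R, delta, n_min):
    leaf = tree.sort_to_leaf(example)        # route the example to a leaf
    leaf.update_statistics(example)          # update the sufficient statistics
    if leaf.examples_seen % n_min == 0:      # evaluate H() only every n_min examples
        (best, h_best), (second, h_second) = leaf.best_two_gains()
        eps = hoeffding_bound(R, delta, leaf.examples_seen)
        if h_best - h_second > eps:          # enough statistical evidence for a split
            tree.split(leaf, best)           # the leaf becomes a decision node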
VFDT: Information Gain

Information gain using attribute Aj:
H(Aj) = info(examples) - info(Aj)
info(Aj) = Σi Pi · ( Σk -Pik log(Pik) )
where nijk is the number of examples with value i of attribute j and class k that reached the leaf,
Pik = nijk / Σk nijk, and Pi is the fraction of examples with value i of attribute j.
The VFDT System

Example:

Name   Gender   Height   Class
John   Male     1.85m    Tall
Mary   Female   1.68m    Tall
Kent   Male     1.55m    Short
Sue    Female   1.71m    Tall
Paul   Male     1.79m    Tall
VFDT: Information Gain

Info(examples) = 4/5·log(5/4) + 1/5·log(5/1) = 0.217
Info(Gender) = 3/5·( 2/3·log(3/2) + 1/3·log(3/1) ) + 2/5·( 2/2·log(2/2) )
             = 0.166 + 0 = 0.166
H(AttributeGender) = Info(examples) - Info(Gender) = 0.217 - 0.166 = 0.051

Split on Gender: John(T), Mary(T), Kent(S), Sue(T), Paul(T)
  male:   John(T), Kent(S), Paul(T)
  female: Mary(T), Sue(T)
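The arithmetic above can be checked with a few lines of Python; note that the slide's numbers come out with base-10 logarithms, which this sketch assumes.

import math

def info(counts):
    # entropy of a list of class counts, using base-10 logs as in the slide
    total = sum(counts)
    return sum(c / total * math.log10(total / c) for c in counts if c > 0)

info_examples = info([4, 1])                          # 4 Tall, 1 Short             -> ~0.217
info_gender = 3/5 * info([2, 1]) + 2/5 * info([2])    # male: 2T, 1S  female: 2T, 0S -> ~0.166
print(round(info_examples, 3), round(info_gender, 3),
      round(info_examples - info_gender, 3))          # 0.217 0.166 0.051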
VFDT: Hoeffding Bound

Suppose we have made n independent observations of a random variable r whose range is R.
The Hoeffding bound states that, with probability 1-δ, the true mean of r is at least r̄ - ε, where r̄ is the observed mean and
ε = ( R² · ln(1/δ) / (2n) )^(1/2)
VFDT: Hoeffding Bound

Let xa be the attribute with the highest H(), xb the attribute with the second-highest H(), and ΔH = H(xa) - H(xb).
If ΔH > ε, then the Hoeffding bound states with probability 1-δ that xa really is the attribute with the highest value of the evaluation function, and the leaf must be transformed into a decision node that splits on xa.
Hoeffding Bound
Using the example of slide 6: variable r = Height, with range R = 1.85 - 1.55 = 0.3,
n = 5 and δ = 0.2:
ε = ( 0.3² · ln(5) / (2·5) )^(1/2) = 0.12
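The same calculation as a runnable check (the formula and values are those on this slide).

import math

def hoeffding_bound(R, delta, n):
    # epsilon = sqrt(R^2 * ln(1/delta) / (2n))
    return math.sqrt(R ** 2 * math.log(1 / delta) / (2 * n))

print(round(hoeffding_bound(R=0.3, delta=0.2, n=5), 2))  # 0.12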

The VFDT System


It is not efficient to compute H() every time an example arrives, because evaluating the function for every example would be very expensive.
VFDT only computes the attribute evaluation function H() when a minimum number of examples has been seen since the last evaluation. This minimum number of examples, nmin, is a user-defined parameter.
The VFDT System


When two or more attributes continuously have very similar values of H(), even given a large number of examples, the Hoeffding bound will not decide between them.
To solve this problem VFDT uses a constant τ introduced by the user. If ΔH < ε < τ, then the leaf is transformed into a decision node.
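Putting the last two slides together, the split decision can be sketched as below; should_split is an illustrative name, not from the paper.

def should_split(delta_h, epsilon, tau):
    # delta_h: H(x_a) - H(x_b), the gap between the two best attributes
    # epsilon: Hoeffding bound for the current number of examples
    # tau:     user-supplied tie-breaking constant
    if delta_h > epsilon:            # x_a is the best attribute with probability 1 - delta
        return True
    if delta_h < epsilon < tau:      # near tie, and more examples are unlikely to separate them
        return True
    return False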
Ultra Fast Forest Trees (UFFT)



Uses the Hoeffding bound to decide when to install a splitting test at a leaf, turning it into a decision node.
Uses analytical techniques to choose the splitting criteria, and information gain to estimate the merit of each possible splitting test.
The algorithm builds a binary tree for each possible pair of classes.
Ultra Fast Forest Trees (UFFT)



During the training phase the algorithm maintains a short-term memory: a fixed number of the most recent examples are kept in a data structure.
The sufficient statistics of newly created leaves are initialized with the examples from the short-term memory that fall into those leaves.
The Ultra Fast Forest Tree (UFFT) system is an incremental algorithm.
The Splitting Criteria


Uses the analytical method for split point selection to choose, for each numerical attribute, the most promising split value.
This analysis assumes that the values of an attribute follow a normal distribution for both classes.
The Splitting Criteria

Let x̄ and σ² be the sample mean and variance of the attribute within a class:
σ² = Σi P(xi)·(xi - x̄)²
p(-) denotes the estimated probability that an example belongs to class -.
d1 and d2 are the possible roots of the equation obtained by equating the two class-conditional normal densities, weighted by the class probabilities.
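A sketch of this computation, assuming (as the analytical method described above suggests) that d1 and d2 are the roots of the quadratic obtained by equating the two prior-weighted normal densities; the coefficient algebra below is reconstructed, not quoted from the paper.

import math

def split_point(mean_p, std_p, prior_p, mean_m, std_m, prior_m):
    # Quadratic a*d^2 + b*d + c = 0 from prior_p*N(d; mean_p, std_p) = prior_m*N(d; mean_m, std_m)
    a = 1 / (2 * std_m ** 2) - 1 / (2 * std_p ** 2)
    b = mean_p / std_p ** 2 - mean_m / std_m ** 2
    c = (mean_m ** 2 / (2 * std_m ** 2) - mean_p ** 2 / (2 * std_p ** 2)
         - math.log(prior_m * std_p / (prior_p * std_m)))
    if abs(a) < 1e-12:                               # equal variances: the equation is linear
        roots = [-c / b]
    else:                                            # assumes the two densities do intersect
        disc = math.sqrt(b ** 2 - 4 * a * c)
        roots = [(-b - disc) / (2 * a), (-b + disc) / (2 * a)]
    # d is the root closer to the sample means of both classes
    return min(roots, key=lambda d: abs(d - mean_p) + abs(d - mean_m))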
The Splitting Criteria


Let d be the root closer to the sample means of both classes. The splitting test candidate for each numeric attribute i is Atti ≦ di.
To compute the information gain we need to construct a contingency table with the per-class distribution of the number of examples less than d and greater than d.
The Splitting Criteria


The information kept in the tree is not sufficient to compute the exact number of examples for each entry of the contingency table.
With the assumption of normality, we can compute the probability of observing a value less than or greater than di.
The Splitting Criteria
Example contingency table for dj = 1.7m:

                  Class Tall   Class Short
AttHeight ≦ 1.7     P1Tall       P1Short
AttHeight > 1.7     P2Tall       P2Short
The Splitting Criteria

The information gain of an attribute:
H(Atti) = info(PTall, PShort) + Σj=1..2 Pj · info(PjTall, PjShort)
info(PTall, PShort) = PTall · log(1/PTall) + PShort · log(1/PShort)
PTall = P1Tall + P2Tall
PShort = P1Short + P2Short
This method only requires maintaining the mean and standard deviation of each class per attribute.
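A sketch of how the table entries can be estimated from the per-class mean and standard deviation alone, mirroring the arithmetic of the worked example on the next slide; the data layout (stats maps a class name to (count, mean, std)) and class names are illustrative.

import math

def normal_cdf(x, mean, std):
    # P(X <= x) for a normal distribution, via the error function
    return 0.5 * (1 + math.erf((x - mean) / (std * math.sqrt(2))))

def info(*probs):
    # info(P1, P2, ...) = sum P * log(1/P), base-10 logs as in the worked example
    return sum(p * math.log10(1 / p) for p in probs if p > 0)

def height_gain(d, stats, n_total):
    # stats: class -> (count, mean, std); fills the 2x2 contingency table for Att <= d
    p = {}
    for cls, (count, mean, std) in stats.items():
        below = normal_cdf(d, mean, std)           # P(value <= d | class), normality assumption
        p[('<=', cls)] = count / n_total * below
        p[('>', cls)] = count / n_total * (1 - below)
    p_tall = p[('<=', 'Tall')] + p[('>', 'Tall')]
    p_short = p[('<=', 'Short')] + p[('>', 'Short')]
    return (info(p_tall, p_short)
            + info(p[('<=', 'Tall')], p[('<=', 'Short')])
            + info(p[('>', 'Tall')], p[('>', 'Short')]))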
The Splitting Criteria
Name   Income   Height   Class
John   20000    1.85m    Tall
Mary   35000    1.68m    Tall
Kent   25000    1.71m    Short
Sue    55000    1.59m    Short
Paul   18000    1.79m    Tall

                  Class Tall       Class Short
AttHeight ≦ 1.7     P1Tall = 1/5     P1Short = 1/5
AttHeight > 1.7     P2Tall = 2/5     P2Short = 1/5

PTall = 1/5 + 2/5 = 3/5
PShort = 1/5 + 1/5 = 2/5
H(AttHeight) = ( 3/5·log(5/3) + 2/5·log(5/2) ) + ( 1/5·log 5 + 1/5·log 5 )
             + ( 2/5·log(5/2) + 1/5·log 5 ) = 0.292 + 0.28 + 0.299 = 0.871
From Leaf to Decision Node



To expand the tree, the test Atti ≦ di is installed at the leaf, and the leaf becomes a decision node with two new descendant leaves.
To expand a leaf, two conditions must be satisfied:
1. The information gain of the selected splitting test must be positive.
2. There must be statistical support in favor of the best splitting test, which is asserted using the Hoeffding bound, as in VFDT.
In VFDT, nmin is a user-defined constant; in UFFT, the number is computed as
Short Term Memory



UFFT uses a short-term memory that keeps a limited number of the most recent examples.
The short-term memory is used to update the statistics of new leaves when they are created:
the examples in the short-term memory traverse the tree, and the ones that reach the new leaves update the sufficient statistics of the tree.
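A minimal sketch of such a buffer in Python; the buffer size and the tree/leaf interface are illustrative, not taken from the paper.

from collections import deque

SHORT_TERM_SIZE = 1000                         # illustrative capacity
short_term_memory = deque(maxlen=SHORT_TERM_SIZE)

def remember(example):
    short_term_memory.append(example)          # the oldest example is dropped automatically

def initialize_new_leaves(tree, new_leaves):
    # replay the buffered examples; only those reaching a new leaf update its statistics
    for example in short_term_memory:
        leaf = tree.sort_to_leaf(example)      # hypothetical routing helper
        if leaf in new_leaves:
            leaf.update_statistics(example)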
Classification Strategies at Leaves

UFFT implements two different classification strategies at the leaves:
1. majority class
2. naive Bayes classifier
At each leaf, the sufficient statistics maintained to compute the information gain are also the statistics needed to compute the conditional probabilities P(xi | Class).
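Since the per-class mean and standard deviation are already stored, a naive Bayes prediction for numeric attributes can be sketched as below; the data layout is illustrative.

import math

def gaussian_pdf(x, mean, std):
    return math.exp(-((x - mean) ** 2) / (2 * std ** 2)) / (std * math.sqrt(2 * math.pi))

def naive_bayes_predict(example, leaf_stats, class_counts):
    # leaf_stats[cls][attr] = (mean, std) for numeric attributes (illustrative layout)
    total = sum(class_counts.values())
    best_cls, best_score = None, float('-inf')
    for cls, count in class_counts.items():
        score = math.log(count / total)                        # log P(class)
        for attr, value in example.items():
            mean, std = leaf_stats[cls][attr]
            score += math.log(gaussian_pdf(value, mean, std))  # log P(x_i | class)
        if score > best_score:
            best_cls, best_score = cls, score
    return best_cls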
Forest of Trees



The UFFT algorithm builds a binary tree for each possible pair of classes.
In the general case of n classes, the algorithm grows a forest of n(n-1)/2 binary trees.
When a new example is received during the tree-growing phase, each tree receives the example only if the example's class is one of the two classes in the tree's label.
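A sketch of the forest construction and of routing training examples to the relevant binary trees; the BinaryTree placeholder only stands in for a two-class Hoeffding tree as described on the previous slides.

from itertools import combinations

class BinaryTree:
    # placeholder for a two-class Hoeffding tree
    def __init__(self, pair):
        self.pair = pair
    def update(self, example, label):
        pass                                   # the real tree would update leaf statistics here

classes = ['c1', 'c2', 'c3']                   # n classes -> n*(n-1)/2 binary trees

forest = {pair: BinaryTree(pair) for pair in combinations(classes, 2)}

def train_on(example, label):
    # during growth an example reaches only the trees whose label pair contains its class
    for pair, tree in forest.items():
        if label in pair:
            tree.update(example, label)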
Experiment


Athlon XP 2000+ (1.66 GHz), 512 MB RAM, Linux Red Hat 7.3
Three artificial datasets:
1. Waveform, in versions with 21 and 40 attributes, 3 classes
2. LED, with 24 binary attributes (17 of them irrelevant) and 10 classes
3. Balance, with 4 attributes and 3 classes

Parameter values: δ = 0.05, τ = 0.001 and nmin = 300
Experiment Result