“BOF” Trees Diagram as a Visual Way to Improve Interpretability of Tree Ensembles
Vesna Luzar-Stiffler, Ph.D.
University Computing Centre, and CAIR Research Centre,
Zagreb, Croatia
Charles Stiffler, Ph.D.
CAIR Research Centre, Zagreb, Croatia
[email protected], [email protected]
BOF Trees Visualization Zagreb, June 12, 2004
Outline
Introduction/Background
Trees
Ensemble Trees
Visualization Tools
Simulation Results
Web Survey Results
Conclusions/Recommendations
Introduction / Background
Classification / Decision Trees
Data mining (statistical learning) method for
classification
Invented twice:
Statistical community: Breiman, Friedman, et al. (1984)
Machine Learning community: Quinlan (1986)
Many positive features
Interpretability, ability to handle data of mixed type
and missing values, robustness to outliers, etc.
Disadvantages
unstable vis-à-vis seemingly minor data perturbations
low predictive power
Introduction / Background
Possible improvements: Ensembles
Bagging, i.e., bootstrapping trees (Breiman, 1996)
Boosting, e.g., AdaBoost (Freund & Schapire, 1997)
Random Forests (Breiman, 2001)
Stacking, randomized trees, etc.
Advantage:
Improved prediction
Disadvantage:
Loss of interpretability (“black box”)
Classification Tree
Let $\hat{f}(x)$ be the classification tree prediction at input $x$ obtained from the full “training” data
$Z = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$
Bagging Classification Tree
Let $\hat{f}^{*b}(x)$ be the classification tree prediction at input $x$ obtained from the bootstrap sample $Z^{*b}$, $b = 1, 2, \ldots, B$.
Bagging estimate:
$\hat{f}_{\mathrm{bag}}(x) = \frac{1}{B} \sum_{b=1}^{B} \hat{f}^{*b}(x)$
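The bagging estimate can be illustrated with a minimal sketch. It assumes two classes coded 0/1 and uses a one-split tree (a "stump") as a stand-in for a full classification tree; the helper names `fit_stump` and `bagged_predict`, the toy data, and the default B=25 are illustrative assumptions, not part of the original slides.

```python
import random
from collections import Counter

def fit_stump(X, y):
    """Fit a one-split tree by minimizing training misclassifications (illustrative stand-in for a full tree)."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue  # not a real split
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = sum(v != left_label for v in left) + sum(v != right_label for v in right)
            if best is None or errors < best[0]:
                best = (errors, j, t, left_label, right_label)
    if best is None:
        # No valid split (all predictor values identical): predict the majority class.
        majority = Counter(y).most_common(1)[0][0]
        return lambda x: majority
    _, j_best, t_best, l_best, r_best = best
    return lambda x: l_best if x[j_best] <= t_best else r_best

def bagged_predict(X, y, x_new, B=25, seed=0):
    """Average the predictions of B trees, each fit to a bootstrap sample Z*b."""
    rng = random.Random(seed)
    n = len(X)
    votes = []
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]  # draw n cases with replacement
        tree = fit_stump([X[i] for i in idx], [y[i] for i in idx])
        votes.append(tree(x_new))
    return sum(votes) / B  # (1/B) * sum over b of f*b(x)

# Toy data (hypothetical): one predictor, class 1 for larger values.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
p_hi = bagged_predict(X, y, [3.0])
p_lo = bagged_predict(X, y, [0.0])
```

With 0/1 labels the average is the share of bootstrap trees voting for class 1, so thresholding it at 0.5 gives a majority-vote classification.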
Visualization tools
Graphs based on predictor “importances”
(B×p) matrix F (p = # of predictors)
For bagged trees, we take the average:
$\hat{I}_k^2 = \frac{1}{B} \sum_{b=1}^{B} \hat{I}_k^2(T_b)$
Diagram 1, importance mean bar chart
Diagram 2 (“BOF Clusters”) is the cluster means chart (NEW)
Diagram 3 (“BOF MDPREF”) is the multidimensional preference bi-plot (NEW)
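As a small illustration of the averaging step, the sketch below takes a B×p matrix F whose row b holds the squared importances of the p predictors in bootstrap tree b, and returns the column means that Diagram 1 would plot. The function name `mean_importance` and the numbers in `F` are made up for illustration; how each tree's importances are computed is not shown here.

```python
def mean_importance(F):
    """Average a B x p matrix of per-tree squared importances over the B trees (rows)."""
    B = len(F)
    p = len(F[0])
    return [sum(F[b][k] for b in range(B)) / B for k in range(p)]

# Hypothetical importances for B=3 bootstrap trees and p=3 predictors.
F = [
    [0.60, 0.30, 0.10],
    [0.20, 0.70, 0.10],
    [0.70, 0.20, 0.10],
]
I_bar = mean_importance(F)  # one averaged value per predictor
```

The same matrix F, kept unaveraged, is the input one would cluster by rows (trees) or feed to a preference bi-plot, which is presumably how Diagrams 2 and 3 use it.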
Visualization tools
Graphs based on proximity (nxn) matrix P,
(n=# of cases)
Diagram 4 (“Proximity Clusters”) is the cluster means chart (Breiman, 2002)
Diagram 5 (“Proximity MDS”) is the multidimensional scaling plot of “similar” cases (Breiman, 2002)
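A common way to build such a proximity matrix, following Breiman's random forest convention, is to count how often two cases land in the same terminal node across the trees of the ensemble. The sketch below assumes that definition; the slides do not spell it out, and the input layout `leaf_ids[b][i]` (terminal node of case i in tree b) and the toy values are hypothetical.

```python
def proximity_matrix(leaf_ids):
    """Return the n x n matrix P where P[i][j] is the share of trees in which
    cases i and j fall into the same terminal node."""
    B = len(leaf_ids)
    n = len(leaf_ids[0])
    counts = [[0] * n for _ in range(n)]
    for leaves in leaf_ids:          # one row of terminal-node labels per tree
        for i in range(n):
            for j in range(n):
                if leaves[i] == leaves[j]:
                    counts[i][j] += 1
    return [[c / B for c in row] for row in counts]

# Hypothetical terminal-node labels for B=2 trees and n=4 cases.
leaf_ids = [
    [1, 1, 2, 2],
    [1, 2, 2, 2],
]
P = proximity_matrix(leaf_ids)
```

Using 1 - P[i][j] as a dissimilarity gives an input suitable for clustering or multidimensional scaling of the cases.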
Simulation experiments
S1:
Generate a sample of size n=30, two classes, and p=5 variables (x1-x5), with a standard normal distribution and pair-wise correlation 0.95.
The responses are generated according to Pr(Y=1|x1≤0.5) = 0.2, Pr(Y=1|x1>0.5) = 0.8.
S2:
Generate a sample of size n=30, two classes, and p=5 variables (x1-x5), with a standard normal distribution and pair-wise correlation 0.95 between x1 and x2, and 0 among other predictors.
The responses are generated according to Pr(Y=1|x1≤0.5) = 0.2, Pr(Y=1|x1>0.5) = 0.8.
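The two designs can be reproduced with a short generator. This is a sketch under one assumption not stated in the slides: the correlated predictors are built from a single shared normal factor, which yields unit variances and pairwise correlation rho. The function name `simulate`, its parameters, and the seeds are illustrative.

```python
import math
import random

def simulate(n=30, p=5, rho=0.95, correlated=None, seed=0):
    """Draw n cases with p standard normal predictors. Predictors whose index is in
    `correlated` share pairwise correlation rho; the rest are independent."""
    rng = random.Random(seed)
    correlated = set(range(p) if correlated is None else correlated)
    X, y = [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)  # shared factor for the correlated block
        row = []
        for j in range(p):
            e = rng.gauss(0.0, 1.0)
            if j in correlated:
                # Var = rho + (1 - rho) = 1, Cov between two such columns = rho
                row.append(math.sqrt(rho) * z + math.sqrt(1.0 - rho) * e)
            else:
                row.append(e)
        prob = 0.2 if row[0] <= 0.5 else 0.8  # Pr(Y=1 | x1), with x1 = row[0]
        X.append(row)
        y.append(1 if rng.random() < prob else 0)
    return X, y

X1, y1 = simulate()                    # S1: all five predictors correlated
X2, y2 = simulate(correlated=[0, 1])   # S2: only x1 and x2 correlated
```

In both designs only x1 drives the response, so in S1 the other four predictors act as near-duplicates of x1, while in S2 only x2 does.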
Diagram 1, Mean importance (panels: S1, S2)
Diagram 2, “BOF Clusters” (panels: S1, S2)
Diagram 3, “BOF MDPREF” (panels: S1, S2)
Diagram 4, “Proximity Clusters” (panels: S1, S2)
Web Survey data
ICT infrastructure/usage in Croatian
primary and secondary schools
25,000+ teachers (cases)
200+ variables
Response: “classroom use of a computer
by educators” (yes/no)
Partition
50% training
25% validation
25% test
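A 50/25/25 partition like the one above can be sketched as a shuffled index split. The slides do not say how the cases were assigned, so simple random assignment is an assumption here, as are the function name `partition`, the seed, and the round figure of 25,000 cases.

```python
import random

def partition(n_cases, fractions=(0.50, 0.25, 0.25), seed=0):
    """Randomly split case indices into training, validation, and test sets."""
    idx = list(range(n_cases))
    random.Random(seed).shuffle(idx)
    n_train = int(fractions[0] * n_cases)
    n_valid = int(fractions[1] * n_cases)
    train = idx[:n_train]
    valid = idx[n_train:n_train + n_valid]
    test = idx[n_train + n_valid:]  # remainder, so every case is used exactly once
    return train, valid, test

train, valid, test = partition(25000)  # illustrative size
```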
Initial tree (before bagging)
Diagram 1, “Mean importance”
Diagram 2, “BOF Clusters”
Diagram 3, “BOF MDPREF”
Bootstrap tree 11
Bootstrap tree 22
Bootstrap tree 12
Clustering trees
Diagram 5, “Proximity MDS”
Conclusions/Recommendations
There are software packages for trees
There are some software packages for tree ensembles
There are some visualization tools (old and new)
The problem is that they are not “interfaced” (integrated)