Maximal Tree

Decision Trees in Variable Selection
• In predictive modeling and data mining, we are
often confronted with a large number of input
variables, some of which are irrelevant.
• A decision tree is an alternative method for
eliminating irrelevant variables and selecting
variables that have predictive power.
Decision Tree Results
Differences in Calculation of Variable Importance
Traditional Approach
• Looks at the zero-order
correlations between all
possible inputs and the target
Decision Tree Method
• Sum of the worth statistics for
an input across all the split
nodes
• Incorporates the effect of an
input across various splits
• Captures a different dimension
than a multiple regression does
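The tree-based calculation above can be sketched in a few lines. This is a minimal illustration, not the exact formula of any particular tool: the split list, the variable names, and the final scaling to the top-ranked input are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical split nodes from a fitted tree: (input variable, worth of the split).
splits = [
    ("income", 0.40),
    ("age", 0.25),
    ("income", 0.15),
    ("tenure", 0.05),
]

def variable_importance(split_nodes):
    """Sum split worth per input, then scale so the top input has importance 1.0.

    The scaling step is an assumption for readability, not part of the slide.
    """
    totals = defaultdict(float)
    for variable, worth in split_nodes:
        totals[variable] += worth
    top = max(totals.values())
    return {name: total / top for name, total in totals.items()}

importance = variable_importance(splits)
for name, score in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.3f}")
```

An input that is used in several splits (here `income`) accumulates worth from each of them, which a single zero-order correlation with the target would not reflect.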
Interactions
• Interactions of input variables can also be observed from a decision
tree map
• One can construct an interaction term that combines input variables
that can be useful in predictive modeling
• Increases predictive power of the model
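As a rough sketch of building such a term: if a tree splits on one input and then on another along the same path, the two conditions can be combined into a single indicator. The field names and cut points below are hypothetical and chosen only for illustration.

```python
# Hypothetical records; field names are illustrative only.
records = [
    {"age": 25, "income": 30000},
    {"age": 52, "income": 85000},
    {"age": 47, "income": 28000},
]

def add_interaction(row, age_cut=40, income_cut=50000):
    """Flag the combination of two tree splits (age >= cut AND income >= cut)."""
    new_row = dict(row)  # copy so the original record is left untouched
    new_row["older_high_income"] = int(
        row["age"] >= age_cut and row["income"] >= income_cut
    )
    return new_row

augmented = [add_interaction(r) for r in records]
print([r["older_high_income"] for r in augmented])
```

The new column can then be offered to another model as an ordinary input.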
Cross-Contributions of Decision Trees and Other
Approaches
Contribution of Other Methods to
Decision Trees
Association:
• creates associations and sequences as
composite inputs to a decision tree to
determine relationships
Clustering:
• might be useful in creating composite
clusters for inclusion in decision trees
Regression:
• creates linear composites for inclusion as
inputs (a data reduction technique)
Neural Networks:
• fit and fine-tune unclassified
observations
Contribution of Decision Trees
To Other Methods
Regression:
• Define strata for regression treatment
• Compute dummy variables
• Identify interactions
• Impute missing values based on inputs with
various levels of measurement
Neural Networks:
• Prequalify variables for inclusion, including
bins for categories
• Apply a decision tree to the predicted scores to
assist in interpretation
• Apply a decision tree to the score residuals
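To illustrate the last point, the sketch below searches for the single best split of one input against a set of residuals, using reduction in the sum of squared errors as the split criterion. The data are made up, and the criterion is an assumption for the example; a real tree tool would search all inputs and grow further splits.

```python
def sse(values):
    """Sum of squared deviations from the mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(xs, residuals):
    """Return (threshold, SSE reduction) of the best single split on one input."""
    pairs = sorted(zip(xs, residuals))
    total = sse([r for _, r in pairs])
    best = (None, 0.0)
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold fits between two equal input values
        left = [r for _, r in pairs[:i]]
        right = [r for _, r in pairs[i:]]
        gain = total - sse(left) - sse(right)
        if gain > best[1]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (threshold, gain)
    return best

# Hypothetical residuals (actual minus predicted score) from some other model.
x = [1, 2, 3, 4, 5, 6]
residuals = [-1.0, -1.2, -0.8, 1.1, 0.9, 1.0]
threshold, gain = best_split(x, residuals)
print(threshold, round(gain, 2))
```

A large gain at some threshold suggests the original model is systematically off on one side of it, which is the kind of pattern a tree on residuals is meant to expose.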
Using Decision Tree Node for Variable Selection
• The inputs which create significant splits in the development of the
tree are passed to the next node with the role of Input.
• Creates a special categorical variable called _NODE_ and optionally
passes it to the next node as an input.
• The variable _NODE_ can be used as a class input in the Regression
node.
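Outside of the tool, the idea behind a leaf-identifier variable can be sketched as follows: each record is assigned the id of the leaf it falls into, and that id is dummy-coded so a regression can treat it as a class input. The tree rules, field names, and leaf ids here are hypothetical and do not reproduce how the Decision Tree node numbers its leaves.

```python
def assign_node(row):
    """Hypothetical two-split tree; returns the leaf id for one record."""
    if row["income"] < 50000:
        return 1
    return 2 if row["age"] < 40 else 3

def one_hot(node_id, levels):
    """Dummy-code a leaf id, dropping the first level as the reference."""
    return [int(node_id == level) for level in levels[1:]]

records = [
    {"age": 25, "income": 30000},
    {"age": 31, "income": 72000},
    {"age": 52, "income": 85000},
]
levels = [1, 2, 3]
node_ids = [assign_node(r) for r in records]
design_rows = [one_hot(n, levels) for n in node_ids]
print(node_ids)
print(design_rows)
```

Each row of `design_rows` holds the dummy columns for one record, with leaf 1 serving as the reference level.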
Decision Tree Node Configurations
• Variable Selection Property – YES
• Leaf Variable Property – YES
• Leaf Role Property – INPUT