Fuzzy-Rough Feature Significance
for Fuzzy Decision Trees
Richard Jensen
Qiang Shen
Advanced Reasoning Group
Department of Computer Science
The University of Wales, Aberystwyth
Outline
• Utility of decision tree induction
• Importance of attribute selection
• Introduction of fuzzy-rough concepts
• Evaluation of the fuzzy-rough metric
• Results of F-ID3 vs FR-ID3
• Conclusions
Decision Trees
• Popular classification algorithm in data mining
and machine learning
• Fuzzy decision trees (FDTs) follow similar
principles to crisp decision trees
• FDTs allow greater flexibility
• Partitioning of the instance space; attributes are
selected to derive partitions
• Hence, attribute selection is an important factor
in decision tree quality
Fuzzy Decision Trees
• Object membership
  • Traditionally, node membership is crisp: {0, 1}
  • Here, membership may take any value in the range [0, 1]
  • Calculated as the conjunction of membership degrees along the path to the node (sketched below)
• Fuzzy tests
  • Carried out within nodes to determine the membership of feature values to fuzzy sets
• Stopping criteria
• Measure of feature significance
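A minimal Python sketch of the path-conjunction idea above (the min t-norm is an assumption; the product t-norm is an equally common choice):

# Sketch only: an object's membership in a node is the conjunction of its
# memberships in the fuzzy tests along the root-to-node path.

def node_membership(path_memberships):
    """Combine the membership degrees (values in [0, 1]) collected along a
    root-to-node path; the root (empty path) has membership 1.0."""
    membership = 1.0
    for degree in path_memberships:
        membership = min(membership, degree)  # conjunction via the min t-norm
    return membership

# Example: an object matching "outlook is Sunny" to 0.7 and
# "humidity is High" to 0.4 belongs to that node to degree 0.4.
print(node_membership([0.7, 0.4]))  # -> 0.4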
Decision Tree Algorithm
Input: training set S and (optionally) the maximum depth l of the decision tree
Start forming the decision tree from the top level.
Do loop until
(1) the depth of the tree reaches l, or
(2) there is no node left to expand:
  a) Gauge the significance of each attribute of S not already expanded in this branch
  b) Expand the attribute with the greatest significance
  c) Stop expanding a node once maximum significance has been obtained
End do loop
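The loop above can be sketched in Python as follows; significance() and partition() are placeholders standing in for the chosen feature-significance measure (fuzzy entropy or the fuzzy-rough metric) and the fuzzy partitioning step, not the authors' implementation:

def build_tree(data, attributes, significance, partition, depth=0, max_depth=None):
    # Stop if the depth limit l has been reached or no attribute remains to expand.
    if (max_depth is not None and depth >= max_depth) or not attributes:
        return {"leaf": True, "data": data}

    # a) Gauge the significance of each attribute not yet expanded on this branch.
    scores = {a: significance(a, data) for a in attributes}

    # b) Expand the attribute with the greatest significance.
    best = max(scores, key=scores.get)
    remaining = [a for a in attributes if a != best]

    children = {}
    for value, subset in partition(data, best):
        # c) Once the expanded attribute attains maximum significance (assumed
        #    here to be 1.0, e.g. a fuzzy-rough dependency of 1), its children
        #    are not expanded further.
        if scores[best] >= 1.0:
            children[value] = {"leaf": True, "data": subset}
        else:
            children[value] = build_tree(subset, remaining, significance,
                                         partition, depth + 1, max_depth)
    return {"leaf": False, "attribute": best, "children": children}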
Feature Significance
• Previous FDT inducers use fuzzy entropy (one common form is given below)
• There has been little research into alternative significance measures
• Fuzzy-rough feature significance has been used
previously in feature selection with much success
• This can also be used to gauge feature
importance within FDT construction
• The fuzzy-rough measure extends concepts from
crisp rough set theory
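For reference, one common form of the fuzzy entropy employed by fuzzy ID3 inducers is sketched here (this specific sigma-count formulation is an assumption, as the slide does not give the formula):

p_k = \frac{\sum_{x \in U} \min\!\big(\mu_N(x), \mu_{C_k}(x)\big)}{\sum_{x \in U} \mu_N(x)},
\qquad
H(N) = -\sum_{k=1}^{m} p_k \log_2 p_k

where \mu_N is the membership of objects in the current node and C_1, ..., C_m are the decision classes.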
Crisp Rough Sets
(Figure: a set X with its lower and upper approximations, built from equivalence classes [x]B)
[x]B is the set of all points that are indiscernible from point x in terms of the feature subset B.
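In the usual rough set notation, these two approximations are defined as:

\underline{B}X = \{\, x \mid [x]_B \subseteq X \,\},
\qquad
\overline{B}X = \{\, x \mid [x]_B \cap X \neq \emptyset \,\}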
Fuzzy Equivalence Classes
At the centre of Fuzzy-Rough Feature Selection:
• Incorporate vagueness
• Handle real-valued data
• Cope with noisy data
(Figure: a crisp equivalence class contrasted with a fuzzy equivalence class)
Image: Rough Fuzzy Hybridization: A New Trend in Decision Making, S. K. Pal and A. Skowron (eds), Springer-Verlag, Singapore, 1999
Fuzzy-Rough Significance
• Deals with real-valued features via fuzzy sets
• Fuzzy lower approximation
• Fuzzy positive region
• Evaluation function γ'
• Feature importance is estimated with γ' (the standard definitions are sketched below)
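The standard fuzzy-rough definitions, as used in fuzzy-rough feature selection and assumed here since the slide only names the quantities, are:

% Fuzzy lower approximation of a concept X by the features in P,
% evaluated at a fuzzy equivalence class F_i:
\mu_{\underline{P}X}(F_i) = \inf_{x \in U} \max\!\big(1 - \mu_{F_i}(x), \, \mu_X(x)\big)

% Extension to an object x, and the fuzzy positive region of the decision Q:
\mu_{\underline{P}X}(x) = \sup_{F \in U/P} \min\!\big(\mu_F(x), \, \mu_{\underline{P}X}(F)\big),
\qquad
\mu_{POS_P(Q)}(x) = \sup_{X \in U/Q} \mu_{\underline{P}X}(x)

% Evaluation (dependency) function used to rank features:
\gamma'_P(Q) = \frac{\sum_{x \in U} \mu_{POS_P(Q)}(x)}{|U|}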
Evaluation
• Is the γ' metric a useful gauge of feature significance?
• The γ' metric is compared with leading feature rankers:
  • Information Gain, Gain Ratio, Chi2, Relief, OneR
• Applied to test data (a generation sketch follows):
  • 30 random feature values for 400 objects
  • 2 or 3 features used to determine the classification
• Task: locate those features that affect the decision
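An illustrative reconstruction of this test setup in Python; the uniform distribution, the random seed, and the exact decision rule are assumptions, only the 400 objects, 30 features, and the 2-3 relevant features come from the slide:

import numpy as np

rng = np.random.default_rng(0)
data = rng.uniform(0.0, 1.0, size=(400, 30))   # 30 random feature values for 400 objects

# Only two or three features determine the class, e.g. x*y*z² > 0.125
x, y, z = data[:, 0], data[:, 1], data[:, 2]
labels = (x * y * z**2 > 0.125).astype(int)

# A useful significance metric should rank features 0, 1 and 2 above the
# remaining 27 irrelevant ones.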
Evaluation…
• Results for x*y*z² > 0.125
• Results for (x + y)³ < 0.125
• FR, IG and GR perform best
• FR metric locates the most important features
FDT Experiments
• Fuzzy ID3 (F-ID3) compared with Fuzzy-Rough
ID3 (FR-ID3)
• Only difference between methods is the choice of
feature significance measure
• Datasets used are taken from the machine learning repository
• Data split into two equal halves: training and
testing
• Resulting trees converted to equivalent rulesets
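A tree in the nested-dictionary form sketched earlier can be flattened into an equivalent ruleset by collecting the tests along each root-to-leaf path; a minimal illustration, not the authors' conversion procedure:

def tree_to_rules(node, conditions=()):
    """Yield one rule per leaf as a (conditions, leaf) pair."""
    if node["leaf"]:
        yield list(conditions), node
        return
    for value, child in node["children"].items():
        # Each branch adds one test of the form "attribute is value".
        yield from tree_to_rules(child, conditions + ((node["attribute"], value),))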
Results
• Real-valued data
• Average ruleset size:
  • 56.7 for F-ID3
  • 88.6 for FR-ID3
• F-ID3 performs marginally better than FR-ID3
Results…
• Crisp data
• Average ruleset size:
  • 30.2 for F-ID3
  • 28.8 for FR-ID3
• FR-ID3 performs marginally better than F-ID3
Conclusion
• Decision trees are a popular means of
classification
• The selection of branching attributes is key to
resulting tree quality
• The use of a fuzzy-rough metric for this purpose
looks promising
• Future work:
  • Further experimental evaluation
  • A fuzzy-rough feature reduction pre-processor