Decision Trees & the Iterative Dichotomiser 3 (ID3) Algorithm


David Ramos
CS 157B, Section 1
May 4, 2006
Review of Basics
● What exactly is a Decision Tree?
A tree where each branching node represents a choice among two or
more alternatives, with every branching node lying on a path to a leaf
node (the bottom of the tree). The leaf node represents the decision
derived from the tree for the given input.
● How can Decision Trees be used to classify instances of
data?
Instead of representing decisions, leaf nodes represent a particular
classification of a data instance, based on the given set of attributes
(and their discrete values) that define the instance; for illustration,
an instance is roughly analogous to a relational tuple.
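As a minimal sketch of such classification (the tree, attribute names, and values here are invented for illustration), a Decision Tree can be represented as nested branching nodes, and classifying an instance means following one path from root to leaf:

```python
# A tiny decision tree as nested dicts: internal nodes test an attribute,
# leaves hold a classification. All names/values are invented for illustration.
tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": {"attribute": "humidity",
                  "branches": {"high": "no", "normal": "yes"}},
        "overcast": "yes",
        "rain": {"attribute": "wind",
                 "branches": {"strong": "no", "weak": "yes"}},
    },
}

def classify(node, instance):
    """Follow the branch matching each tested attribute's value to a leaf."""
    while isinstance(node, dict):
        node = node["branches"][instance[node["attribute"]]]
    return node

print(classify(tree, {"outlook": "sunny", "humidity": "normal"}))  # -> yes
```

Note that an instance only needs values for the attributes actually tested along its path, mirroring how a path through branching nodes ends at a classification "bucket."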
Review of Basics (cont’d)
● For a basic understanding of ID3, it is important that data instances
have boolean or discrete values for their attributes, although there
are extensions of ID3 that deal with continuous data.
● Because Decision Trees can classify data instances into different
types, they can be interpreted as a “good generalization of
unobserved instances” of data, appealing to people because “it
makes the classification process self-evident” [1]. They represent
knowledge about these data instances.
How does ID3 relate to Decision Trees, then?
● ID3, the Iterative Dichotomiser 3 algorithm, is a Decision Tree learning
algorithm. The name is apt: it creates Decision Trees for
“dichotomizing” data instances, classifying them discretely through
branching nodes until a classification “bucket” (a leaf node) is reached.
● By using ID3 and other machine-learning algorithms from Artificial
Intelligence, expert systems can engage in tasks usually done by human
experts, such as doctors diagnosing diseases by examining various
symptoms (the attributes) of patients (the data instances) in a complex
Decision Tree.
● Of course, accurate Decision Trees are fundamental to Data Mining and
Databases.
ID3 relates to Decision Trees (cont’d)
● Decision tree learning is a method for approximating discrete-valued
target functions, in which the learned function is represented by a
decision tree. Decision tree learning is one of the most widely used
and practical methods for inductive inference. [2]
● The input data of ID3 is known as a set of “training” or “learning”
data instances, which the algorithm uses to generate the
Decision Tree. The machine is “learning” from this set of preliminary
data. For future exams, remember that ID3 was developed
by Ross Quinlan in 1983.
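A learning set can be pictured as a small collection of labeled instances; this sketch uses invented attribute names and classifications, not data from the lecture:

```python
# A toy learning set: each instance is (attributes, classification).
# Attribute names, values, and labels are invented for illustration.
training_data = [
    ({"outlook": "sunny",    "windy": "no"},  "play"),
    ({"outlook": "sunny",    "windy": "yes"}, "stay"),
    ({"outlook": "overcast", "windy": "no"},  "play"),
    ({"outlook": "rain",     "windy": "yes"}, "stay"),
]
```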
Description of ID3
● The ID3 algorithm generates a Decision Tree by using a greedy
search through the input sets of data instances to
determine each node and the attribute it branches on. The
emerging tree is built in a top-down (root to leaf) approach
through each of its nodes. This occurs
RECURSIVELY, recalling those “pointless” tree traversal
strategies in CS 146 that you hated doing.
● At each emerging node, the traversal attempts to determine whether
the decision “attribute” on which the branching will be based is the
most ideal branching attribute (by using the input sets of data). One
metric that can be used to judge a branching attribute is
INFORMATION GAIN, which measures the expected reduction in
ENTROPY (the two are, abstractly, inversely proportional).
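The greedy, recursive, top-down construction described above can be sketched as follows. This is a minimal illustration, not Quinlan's exact formulation: the `choose` parameter stands in for the attribute-selection metric (such as Information Gain), and the data layout is invented.

```python
from collections import Counter

def id3(examples, attributes, choose):
    """Greedy top-down tree construction. 'examples' is a list of
    (attribute-dict, label) pairs; 'choose' picks the branching attribute."""
    labels = [label for _, label in examples]
    # Base case: all training instances share one classification -> leaf.
    if len(set(labels)) == 1:
        return labels[0]
    # Base case: no attributes left to branch on -> leaf with majority class.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    best = choose(examples, attributes)  # greedy choice, never revisited
    node = {"attribute": best, "branches": {}}
    for value in {inst[best] for inst, _ in examples}:
        subset = [(inst, lbl) for inst, lbl in examples if inst[best] == value]
        rest = [a for a in attributes if a != best]
        node["branches"][value] = id3(subset, rest, choose)  # RECURSION
    return node

# Demo with a toy learning set and a trivial chooser (first attribute).
data = [({"outlook": "sunny"}, "no"),
        ({"outlook": "overcast"}, "yes"),
        ({"outlook": "rain"}, "yes")]
tree = id3(data, ["outlook"], choose=lambda ex, attrs: attrs[0])
```

In a real run, `choose` would rank the remaining attributes by Information Gain and return the best one; the greedy search never backtracks on that choice.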
Why do we use Entropy as THE metric?
● ID3 uses Entropy to determine whether, based on the input set of
data, the selected branching attribute for a particular node of the
emerging tree is adequate. Specifically, the attribute that yields
the greatest reduction of Entropy over the learning sets is the
best.
● GOAL: Find a way to optimally classify a learning set, such that the
resulting Decision Tree is not too deep and the number of branches
(internal nodes) of the Tree is minimized.
● SOLUTION: The more Entropy (a measure of impurity in a collection
of data sets) a system has, the more branches and depth the tree
will have. FIND entropy-reducing attributes in the learning
sets and use them for branching!
How is Entropy related to Information Gain, the
other metric for determining whether ID3 is
choosing appropriate branching attributes from the
training sets?
● Information Gain measures the expected reduction in
Entropy: the higher the Information Gain, the greater the expected
reduction in Entropy.
● It turns out that Entropy, a measure of non-homogeneity within a
collection of learning sets, can be calculated in a straightforward
manner. For a more detailed presentation of the definition of Entropy
and its calculation, see Prof. Lee’s Lecture Notes.
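As a sketch of that straightforward calculation (the function names and toy learning set below are invented for illustration, not taken from the lecture notes), Entropy over a set of labels is -Σ p·log2(p) across the class proportions, and Information Gain is the Entropy before a split minus the weighted Entropy of the subsets the split produces:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Impurity of a collection: -sum(p * log2(p)) over class proportions."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(examples, attribute):
    """Expected reduction in entropy from branching on 'attribute'.
    'examples' is a list of (attribute-dict, label) pairs."""
    labels = [lbl for _, lbl in examples]
    before = entropy(labels)
    subsets = {}
    for inst, lbl in examples:
        subsets.setdefault(inst[attribute], []).append(lbl)
    after = sum(len(s) / len(examples) * entropy(s) for s in subsets.values())
    return before - after

# Toy learning set: branching on "windy" separates the classes perfectly,
# so the gain equals the full entropy of the 50/50 split (1.0 bit).
data = [({"windy": "yes"}, "play"), ({"windy": "yes"}, "play"),
        ({"windy": "no"}, "stay"), ({"windy": "no"}, "stay")]
print(entropy(["play", "play", "stay", "stay"]))  # -> 1.0
print(information_gain(data, "windy"))            # -> 1.0
```

A pure subset contributes zero entropy, which is exactly the "entropy-reducing attribute" ID3 greedily hunts for at each node.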