Decision Trees - Department of Computer Science
DECISION TREES
Asher Moody, CS 157B
Overview
Definition
Motivation
Algorithms
ID3
Example
Entropy
Information Gain
Applications
Conclusion
Decision Tree
Decision trees are a fundamental technique used in data mining.
Decision trees are used for classification, clustering, feature selection, and prediction.
Motivation
Decision trees help accurately classify data
Decision trees help us understand the predictive nature of the data by recognizing patterns
Decision trees depict the relationships between input data and target outputs
Algorithms
Decision tree algorithms are greedy: once a test has been selected to partition the data, other options are not explored.
Popular Algorithms
Computer Science: ID3, C4.5, and C5.0
Statistics: Classification and Regression Trees (CART)
ID3 Algorithm
Given: Examples(S); Target attribute (C); Attributes (R)
Initialize Root
Function ID3 (S,C,R)
Create a Root node for the tree
IF S = empty, return a single node with value Failure;
IF all examples in S have the same value of C, return a single node with that value;
IF R = empty, return a single node with the most frequent value of the target attribute C;
ELSE
BEGIN… (next slide)
ID3 (cont)
BEGIN
Let D be the attribute with largest Gain(D, S) among attributes in R;
Let {dj | j = 1, 2, …, n} be the values of attribute D;
Let {Sj | j = 1, 2, …, n} be the subsets of S consisting respectively of records with value dj for attribute D;
Return a tree with root labeled D and arcs labeled d1, d2, …, dn going respectively to the subtrees;
For each branch in the tree:
IF Sj = empty, add a leaf labeled with the most frequent value of C;
ELSE
ID3 (S1, C, R – {D}), ID3 (S2, C, R – {D}), …, ID3 (Sn, C, R – {D})
END ID3
Return Root
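A minimal Python sketch of the procedure above. The dataset layout (a list of dicts plus the target attribute's name) and the helper names entropy and gain are assumptions for illustration, not part of ID3 itself:

import math
from collections import Counter

def entropy(examples, target):
    # H(S) = -sum(p_i * log2(p_i)) over the target-value proportions p_i
    counts = Counter(ex[target] for ex in examples)
    total = len(examples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def gain(examples, attr, target):
    # Gain(S, A) = Entropy(S) - sum(|Sv|/|S| * Entropy(Sv)) over values v of A
    total = len(examples)
    remainder = 0.0
    for v in set(ex[attr] for ex in examples):
        subset = [ex for ex in examples if ex[attr] == v]
        remainder += len(subset) / total * entropy(subset, target)
    return entropy(examples, target) - remainder

def id3(examples, target, attrs):
    if not examples:                       # S = empty
        return "Failure"
    classes = {ex[target] for ex in examples}
    if len(classes) == 1:                  # all records share one target value
        return classes.pop()
    if not attrs:                          # R = empty: most frequent value of C
        return Counter(ex[target] for ex in examples).most_common(1)[0][0]
    best = max(attrs, key=lambda a: gain(examples, a, target))
    tree = {best: {}}
    # Branch on the values of D observed in S, so no subset is empty here;
    # branching on D's full domain would need the "most frequent C" fallback.
    for v in set(ex[best] for ex in examples):
        subset = [ex for ex in examples if ex[best] == v]
        tree[best][v] = id3(subset, target, [a for a in attrs if a != best])
    return tree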
Example 1
Example 2
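For instance, running the id3 sketch above on a small hypothetical weather table (four records invented for illustration) produces a nested-dictionary tree:

# Hypothetical data: "windy" perfectly predicts "play", "outlook" does not.
data = [
    {"outlook": "sunny", "windy": "false", "play": "yes"},
    {"outlook": "sunny", "windy": "true",  "play": "no"},
    {"outlook": "rain",  "windy": "false", "play": "yes"},
    {"outlook": "rain",  "windy": "true",  "play": "no"},
]
print(id3(data, "play", ["outlook", "windy"]))
# -> {'windy': {'false': 'yes', 'true': 'no'}}  (branch order may vary)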
Entropy
Entropy gives us a measure of how uncertain we are about the data
Maximum: The measure should be maximal if all the outcomes are equally likely (uncertainty is highest when all possible events are equiprobable).
Entropy(S) = − Σ_{i=1}^{c} p_i log2(p_i)
where p_i is the proportion of instances in the dataset that take the ith value of the target attribute, and c is the number of distinct values
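A quick numeric check of the formula, for a hypothetical sample with 9 positive and 5 negative instances:

import math

p = [9 / 14, 5 / 14]                       # proportions of the two classes
H = -sum(pi * math.log2(pi) for pi in p)
print(round(H, 3))                         # 0.94; a 50/50 split would give 1.0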
Information Gain
Gain calculates the reduction in entropy (gain in information) that would result from splitting the data on a particular attribute A.
Gain(S, A) = Entropy(S) − Σ_{v ∈ Values(A)} (|S_v| / |S|) · Entropy(S_v)
where v is a value of A, S_v is the subset of instances of S where A takes the value v, |S_v| is the size of that subset, and |S| is the number of instances in S
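Applied to the hypothetical weather data from the example above (reusing the entropy and gain helpers from the ID3 sketch), a pure split earns the maximum gain and an uninformative one earns zero:

print(gain(data, "windy", "play"))    # 1.0: both branches are pure
print(gain(data, "outlook", "play"))  # 0.0: branches as mixed as the whole set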
Applications
Business: tracking purchasing patterns
Medicine: identifying potential risks associated with diseases
Banking: identifying potential credit risks
Government: determining features of potential terrorists
Seismology: predicting earthquakes
Conclusion
Search through the attributes to find the class proportions
Calculate the entropy for each possible value of a particular attribute
Calculate the gain for each attribute
Make the attribute with the highest gain the root node
Continue the process on each subset until the decision tree is complete
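In practice, library implementations automate this recipe. A sketch using scikit-learn, whose CART-style learner picks splits by information gain when criterion="entropy"; the integer encoding of the attributes below is an assumption for illustration:

from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical encoding: outlook (sunny=0, rain=1), windy (false=0, true=1)
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = ["yes", "no", "yes", "no"]

clf = DecisionTreeClassifier(criterion="entropy")   # entropy-based splits
clf.fit(X, y)
print(export_text(clf, feature_names=["outlook", "windy"]))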