Data Mining and Decision Tree by Masumi Shimoda

Data Mining
Decision Tree
CS157B Spring 2006
Masumi Shimoda
• Brief introduction to data mining
• Definition
• Objective
• Application
• Decision tree
What is Data Mining?
• Process of automatically finding
relationships and patterns in, and
extracting meaning from, an enormous
amount of data.
• Also called “knowledge discovery”
• Extracting hidden, or not easily
recognizable, knowledge from a large
amount of data … Know the past
• Predicting what is likely to happen if a
particular type of event occurs …
Predict the future
• Marketing example
• Sending direct mail to randomly chosen
• Database of recipients’ attribute data (e.g.
gender, marital status, # of children, etc.) is
• How can this company increase the
response rate of direct mail?
Application (Cont’d)
• Figure out the patterns and relationships
among attributes that those who responded
have in common
• Helps decide what kind of
group of people the company should
• Data mining helps analyze large
amounts of data and make
decisions … but how exactly does it work?
• One commonly used method is the
decision tree
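Before turning to trees, the idea of finding attributes that responders have in common can be sketched as a per-attribute response-rate tally. This is only an illustration: the records, the field names (`marital_status`, `responded`) and the helper `response_rate_by` are made up for the example and do not come from the slides.

```python
from collections import defaultdict

# Made-up recipient records, for illustration only.
recipients = [
    {"gender": "F", "marital_status": "married", "responded": True},
    {"gender": "M", "marital_status": "married", "responded": True},
    {"gender": "F", "marital_status": "single", "responded": False},
    {"gender": "M", "marital_status": "single", "responded": False},
    {"gender": "F", "marital_status": "single", "responded": True},
    {"gender": "M", "marital_status": "married", "responded": False},
]


def response_rate_by(attribute, records):
    """Return {attribute value: fraction of recipients who responded}."""
    sent = defaultdict(int)
    responded = defaultdict(int)
    for r in records:
        sent[r[attribute]] += 1
        responded[r[attribute]] += int(r["responded"])
    return {value: responded[value] / sent[value] for value in sent}


rates = response_rate_by("marital_status", recipients)
for value, rate in rates.items():
    print(f"{value}: {rate:.2f}")
```

A group whose rate is clearly higher than the others is a candidate for more targeted mailing; a decision tree automates this kind of comparison across several attributes at once.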
Decision Tree
• One of many methods to perform data
mining - particularly classification
• Divides the dataset into multiple groups
by evaluating attributes
• A decision tree can be explained as a series
of nested if-then-else statements.
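The if-then-else view can be made concrete with a small sketch. The rules below are hypothetical and chosen only to echo the direct-mail attributes mentioned earlier (gender, marital status, number of children); they are not rules derived from any real data.

```python
def predict_response(gender, marital_status, num_children):
    """Classify a recipient as 'respond' or 'no response'.

    Hypothetical rules, written as the nested if-then-else statements
    that a small decision tree is equivalent to.
    """
    if marital_status == "married":
        if num_children > 0:
            return "respond"
        else:
            return "no response"
    else:
        if gender == "female":
            return "respond"
        else:
            return "no response"


print(predict_response("male", "married", 2))
```

Each `if` corresponds to a non-leaf node that tests one attribute, and each `return` corresponds to a leaf.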
Decision Tree (Cont’d)
• Each non-leaf node has an associated
predicate that tests an attribute of the data
• Leaf node represents a class, or
• To classify a data item, start from the root node
and traverse down the tree by testing
predicates and taking branches
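The traversal described above can be sketched as a small data structure plus a loop. The `Node` class, the `classify` function and the example tree are assumptions made for illustration; the slides do not prescribe a particular representation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Node:
    # A leaf carries a class label; a non-leaf carries a predicate
    # that maps a record to the name of the branch to follow.
    label: Optional[str] = None
    predicate: Optional[Callable[[dict], str]] = None
    branches: Dict[str, "Node"] = field(default_factory=dict)


def classify(node, record):
    """Start at the root and follow predicate outcomes until a leaf."""
    while node.label is None:
        outcome = node.predicate(record)
        node = node.branches[outcome]
    return node.label


# Hypothetical tree over the direct-mail attributes.
tree = Node(
    predicate=lambda r: r["marital_status"],
    branches={
        "married": Node(
            predicate=lambda r: "yes" if r["num_children"] > 0 else "no",
            branches={
                "yes": Node(label="respond"),
                "no": Node(label="no response"),
            },
        ),
        "single": Node(label="no response"),
    },
)

print(classify(tree, {"marital_status": "married", "num_children": 2}))
```

Every predicate here returns one of a fixed set of branch names, which keeps the lookup in `branches` a plain dictionary access.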
Example of Decision Tree
Advantages of Decision Tree
• Easy to visualize the process of
• Can easily tell why the data is classified in
a particular category - just trace the path to
get to the leaf and it explains the reason
• Simple, fast processing
• Once the tree is made, just traverse down
the tree to classify the data
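The point that the path to the leaf explains the classification can be illustrated by recording each test made on the way down. The nested-tuple tree format and the function `classify_with_path` are hypothetical choices made for this sketch.

```python
# Hypothetical tree: a non-leaf is (attribute_name, {value: subtree});
# a leaf is simply a class label string.
TREE = ("marital_status", {
    "married": ("has_children", {"yes": "respond", "no": "no response"}),
    "single": "no response",
})


def classify_with_path(tree, record):
    """Return the class label and the list of tests that led to it."""
    path = []
    while not isinstance(tree, str):
        attribute, branches = tree
        value = record[attribute]
        path.append(f"{attribute} = {value}")
        tree = branches[value]
    return tree, path


label, path = classify_with_path(
    TREE, {"marital_status": "married", "has_children": "yes"}
)
print(label, "because", " and ".join(path))
```

Classification costs one test per level of the tree, which is why processing stays fast once the tree has been built.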
Decision Tree is for…
• Classifying the dataset which
• The predicates return discrete values
• Does not have an attribute for which all data
has the same value
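An attribute with the same value in every record cannot separate the data into groups, so it is worth checking for before building a tree. The sketch below shows one way to do that check; the function name `constant_attributes` and the sample records are assumptions made for illustration.

```python
def constant_attributes(records):
    """Return the attribute names whose value is identical in all records."""
    if not records:
        return []
    return [
        name
        for name in records[0]
        if len({r[name] for r in records}) == 1
    ]


# Made-up records: "country" never varies, so it cannot split the data.
records = [
    {"country": "US", "gender": "F", "marital_status": "married"},
    {"country": "US", "gender": "M", "marital_status": "single"},
    {"country": "US", "gender": "F", "marital_status": "single"},
]

constant = constant_attributes(records)
print(constant)
```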