Decision Tree (Rule Induction)
Classification Process with 10 records

Step 1: Model Construction with 6 records

Training Data:

  NAME   RANK             YEARS   TENURED
  Mike   Assistant Prof   3       no
  Mary   Assistant Prof   7       yes
  Bill   Professor        2       yes
  Jim    Associate Prof   7       yes
  Dave   Assistant Prof   6       no
  Anne   Associate Prof   3       no

The training data is fed to a classification algorithm, which constructs the classifier (model):

  IF rank = 'professor' OR years > 6
  THEN tenured = 'yes'
Step 2: Test the Model with 4 records & Use the Model in Prediction

Testing Data:

  NAME      RANK             YEARS   TENURED
  Tom       Assistant Prof   2       no
  Merlisa   Associate Prof   7       no
  George    Professor        5       yes
  Joseph    Assistant Prof   7       yes

The testing data is run through the classifier to estimate its accuracy. The model is then used to predict unseen data:

  (Jeff, Professor, 4) → Tenured?
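The two steps above can be sketched in code. This is a minimal illustrative sketch (not the output of any particular tool): the classifier is the induced rule hard-coded as a function, applied first to the test set and then to the unseen record.

```python
# Minimal sketch of the two-step classification process.
# The "model" is the rule induced in Step 1:
#   IF rank = 'professor' OR years > 6 THEN tenured = 'yes'

def classify(rank, years):
    """Apply the learned rule to one record."""
    return "yes" if rank == "Professor" or years > 6 else "no"

# Step 2a: test the model on the 4 test records (NAME, RANK, YEARS, TENURED).
test_data = [
    ("Tom",     "Assistant Prof", 2, "no"),
    ("Merlisa", "Associate Prof", 7, "no"),
    ("George",  "Professor",      5, "yes"),
    ("Joseph",  "Assistant Prof", 7, "yes"),
]
correct = sum(classify(rank, yrs) == t for _, rank, yrs, t in test_data)
print(f"test accuracy = {correct}/{len(test_data)}")  # Merlisa is misclassified: 3/4

# Step 2b: use the model in prediction on unseen data.
print(classify("Professor", 4))  # Jeff -> 'yes'
```

Note that Merlisa (7 years, not tenured) violates the rule, so the model scores 3/4 on the test set; a perfect fit to the training data does not guarantee perfect test accuracy.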
Who buys a notebook computer?

The training dataset is given below. This follows an example from Quinlan's ID3. The input variables are age, income, student, and credit_rating; the output is buys_computer.

  age     income   student   credit_rating   buys_computer
  <=30    high     no        fair            no
  <=30    high     no        excellent       no
  31…40   high     no        fair            yes
  >40     medium   no        fair            yes
  >40     low      yes       fair            yes
  >40     low      yes       excellent       no
  31…40   low      yes       excellent       yes
  <=30    medium   no        fair            no
  <=30    low      yes       fair            yes
  >40     medium   yes       fair            yes
  <=30    medium   yes       excellent       yes
  31…40   medium   no        excellent       yes
  31…40   high     yes       fair            yes
  >40     medium   no        excellent       no
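ID3 chooses the attribute to split on by information gain (the reduction in entropy). A small sketch computing the gains for the four input attributes of the table above; the encoding of the values follows the table directly:

```python
import math
from collections import Counter

# The 14 training records: (age, income, student, credit_rating, buys_computer)
data = [
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31…40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31…40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31…40", "medium", "no", "excellent", "yes"),
    ("31…40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def info_gain(col):
    """Information gain of splitting the dataset on column `col`."""
    labels = [r[-1] for r in data]
    split = {}
    for r in data:
        split.setdefault(r[col], []).append(r[-1])
    remainder = sum(len(s) / len(data) * entropy(s) for s in split.values())
    return entropy(labels) - remainder

for i, name in enumerate(["age", "income", "student", "credit_rating"]):
    print(f"{name}: gain = {info_gain(i):.3f}")
```

age has the highest gain (about 0.247 bits), so ID3 places age? at the root, which is exactly the tree shown below.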
Tree Output: A Decision Tree for buys_computer

  age?
  ├── <=30  → student?
  │           ├── no  → no
  │           └── yes → yes
  ├── 31…40 → yes
  └── >40   → credit_rating?
              ├── excellent → no
              └── fair      → yes
Extracting Classification Rules from Trees

 Represent the knowledge in the form of IF-THEN rules
 One rule is created for each path from the root to a leaf
 Each attribute-value pair along a path forms a conjunction
 The leaf node holds the class prediction
 Rules are easier for humans to understand
 Example

  IF age = “<=30” AND student = “no”  THEN buys_computer = “no”
  IF age = “<=30” AND student = “yes” THEN buys_computer = “yes”
  IF age = “31…40”                    THEN buys_computer = “yes”
  IF age = “>40” AND credit_rating = “excellent” THEN buys_computer = “no”
  IF age = “>40” AND credit_rating = “fair”      THEN buys_computer = “yes”
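An extracted ruleset like this can be applied directly. A minimal sketch: each rule is a (condition, class) pair, and the first rule whose condition matches fires.

```python
# The five IF-THEN rules extracted from the buys_computer tree,
# represented as (condition, predicted class) pairs.
rules = [
    (lambda r: r.get("age") == "<=30" and r.get("student") == "no",  "no"),
    (lambda r: r.get("age") == "<=30" and r.get("student") == "yes", "yes"),
    (lambda r: r.get("age") == "31…40",                              "yes"),
    (lambda r: r.get("age") == ">40" and r.get("credit_rating") == "excellent", "no"),
    (lambda r: r.get("age") == ">40" and r.get("credit_rating") == "fair",      "yes"),
]

def predict(record):
    """Return the class of the first matching rule, or None if no rule covers the record."""
    for condition, cls in rules:
        if condition(record):
            return cls
    return None

print(predict({"age": "<=30", "student": "yes"}))             # yes
print(predict({"age": ">40", "credit_rating": "excellent"}))  # no
```

Because the rules come from disjoint root-to-leaf paths, at most one rule can match any record, so the order in which they are tried does not matter here.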
An Example of ‘Car Buyers’ – Who Buys a Lexton?

  No.  Job  M/F  Area  Age  Y/N
  1    NJ   M    N     35   N
  2    NJ   F    N     51   N
  3    OW   F    N     31   Y
  4    EM   M    N     38   Y
  5    EM   F    S     33   Y
  6    EM   M    S     54   N
  7    OW   F    S     49   Y
  8    NJ   F    N     32   N
  9    NJ   M    S     32   Y
  10   EM   M    S     35   Y
  11   NJ   F    S     54   Y
  12   OW   M    N     50   Y
  13   OW   F    S     36   Y
  14   EM   M    N     49   N

(Job: EM = employee, OW = owner, NJ = no job; Area: N = north, S = south)
The induced tree, with node counts written as (a, b, c), where a is the total # of records, b the ‘N’ count, and c the ‘Y’ count:

  Job (14,5,9)
  ├── Employee (5,2,3) → Age?
  │     ├── Below 43 (3,0,3) → Y
  │     └── Above 43 (2,2,0) → N
  ├── Owner (4,0,4) → Y
  └── No Job (5,3,2) → Res. Area?
        ├── South (2,0,2) → Y
        └── North (3,3,0) → N
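The (a, b, c) triples can be recomputed directly from the table. A sketch, using the 14 records above (job codes EM/OW/NJ as in the table):

```python
from collections import Counter

# Records from the 'Car Buyers' table: (no, job, sex, area, age, buys)
records = [
    (1, "NJ", "M", "N", 35, "N"), (2, "NJ", "F", "N", 51, "N"),
    (3, "OW", "F", "N", 31, "Y"), (4, "EM", "M", "N", 38, "Y"),
    (5, "EM", "F", "S", 33, "Y"), (6, "EM", "M", "S", 54, "N"),
    (7, "OW", "F", "S", 49, "Y"), (8, "NJ", "F", "N", 32, "N"),
    (9, "NJ", "M", "S", 32, "Y"), (10, "EM", "M", "S", 35, "Y"),
    (11, "NJ", "F", "S", 54, "Y"), (12, "OW", "M", "N", 50, "Y"),
    (13, "OW", "F", "S", 36, "Y"), (14, "EM", "M", "N", 49, "N"),
]

def counts(rows):
    """Return the (total, N-count, Y-count) triple for a set of rows."""
    c = Counter(r[-1] for r in rows)
    return (len(rows), c["N"], c["Y"])

print("root:", counts(records))  # (14, 5, 9)
for job in ("EM", "OW", "NJ"):
    print(job, ":", counts([r for r in records if r[1] == job]))
```

Running this reproduces the root triple (14,5,9) and the three Job branches, e.g. Owner gives (4,0,4): every owner bought, so that branch becomes a pure Y leaf with no further splitting.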
Lab on Decision Tree (1)

 Commercial tools: SPSS Clementine, SAS Enterprise Miner
 See5/C5.0: download the See5/C5.0 2.02 evaluation version from http://www.rulequest.com
Lab on Decision Tree (2)

 From the initial screen, choose File – Locate Data.
Lab on Decision Tree (3)

 Select housing.data from the Samples folder and click Open.
Lab on Decision Tree (4)

 This data set is about predicting house prices in the Boston area. It has 350 cases and 13 variables.
Lab on Decision Tree (5)

 Input variables
– crime rate
– proportion large lots: residential space
– proportion industrial: ratio of commercial area
– CHAS: dummy variable
– nitric oxides ppm: pollution rate in ppm
– av rooms per dwelling: number of rooms per dwelling
– proportion pre-1940
– distance to employment centers: distance to the city center
– accessibility to radial highways
– property tax rate per $10,000
– pupil-teacher ratio
– B: racial statistics
– percentage low income earners: ratio of low-income people
 Decision variable
– Top 20%, Bottom 80%
Lab on Decision Tree (6)

 To run the analysis, click Construct Classifier, or choose Construct Classifier from the File menu.
Lab on Decision Tree (7)

 Check the Global pruning option (✓), then click OK.
Lab on Decision Tree (8)

 The output shows:
– the decision tree
– an evaluation with the training data
– an evaluation with the test data
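See5's evaluation screens report error counts on the training and test data. A sketch of the same bookkeeping: given true and predicted labels, compute the error rate and a confusion matrix. The label values below are hypothetical, purely for illustration.

```python
def evaluate(true_labels, predicted):
    """Return (error rate, confusion matrix) for paired label lists."""
    errors = sum(t != p for t, p in zip(true_labels, predicted))
    classes = sorted(set(true_labels) | set(predicted))
    matrix = {(t, p): 0 for t in classes for p in classes}
    for t, p in zip(true_labels, predicted):
        matrix[(t, p)] += 1  # row: true class, column: predicted class
    return errors / len(true_labels), matrix

# Hypothetical labels for the two classes of the lab ('top20' vs 'bottom80'):
true = ["top20", "bottom80", "bottom80", "top20", "bottom80"]
pred = ["top20", "bottom80", "top20",    "top20", "bottom80"]
rate, cm = evaluate(true, pred)
print(f"error rate = {rate:.0%}")  # one of five cases is wrong: 20%
```

As with See5, the error rate on the training data is usually optimistic; the test-data evaluation is the honest estimate of how the tree will perform on new cases.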
Lab on Decision Tree (9)

 Understanding the picture
– We can see that (av rooms per dwelling) is the most important variable in determining house price.
Lab on Decision Tree (11)

 The rules are hard to read from the decision tree diagram alone.
 To view the rules, close the current screen and click Construct Classifier again, or choose Construct Classifier from the File menu.
Lab on Decision Tree (12)

 Choose Rulesets, then click OK.
Lab on Decision Tree (13)