MIS 451 Building Business Intelligence Systems


Classification (2)
Data
Classification

Divide and Conquer


Pick the attribute that gives the largest entropy
reduction and divide the data set by it
Repeat on each subset; stop when no attribute is left to pick or the
data in every leaf node is pure (i.e., belongs to one class)
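
The two bullets above can be sketched as a short recursive procedure. This is a minimal illustration, not the course's own implementation: the function names (`entropy`, `conditional_entropy`, `build_tree`) and the four-record toy data set are hypothetical, and ties are broken arbitrarily.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * log2(n / total) for n in Counter(labels).values())


def conditional_entropy(rows, attr, target):
    """Weighted entropy of `target` after dividing `rows` by `attr`."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attr], []).append(row[target])
    return sum(len(g) / len(rows) * entropy(g) for g in groups.values())


def build_tree(rows, attrs, target):
    """Return a class label (leaf) or a nested dict {attribute: {value: subtree}}."""
    labels = [row[target] for row in rows]
    # Stop when the node is pure or no attribute is left to pick.
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]  # majority class
    # The lowest entropy after division is the largest entropy reduction.
    best = min(attrs, key=lambda a: conditional_entropy(rows, a, target))
    remaining = [a for a in attrs if a != best]
    branches = {}
    for value in sorted({row[best] for row in rows}):
        subset = [row for row in rows if row[best] == value]
        branches[value] = build_tree(subset, remaining, target)
    return {best: branches}


# Hypothetical toy data, unrelated to the course data set.
toy = [
    {"windy": "yes", "warm": "yes", "play": "no"},
    {"windy": "yes", "warm": "no", "play": "no"},
    {"windy": "no", "warm": "yes", "play": "yes"},
    {"windy": "no", "warm": "no", "play": "yes"},
]
tree = build_tree(toy, ["windy", "warm"], "play")
print(tree)
```

In the toy data, `windy` separates the classes perfectly, so it is picked at the root and both children are pure leaves.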
Classification

Step 1: there are four attributes to pick:
student, income, age, and credit rating





E(BD) = 0.940
E(D|student) = 0.789
E(D|age) = 0.694
E(D|income) = 0.911
E(D|credit) = 0.892
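
The Step 1 figures can be checked with a short script. This is a sketch under a stated assumption: the table from the Data slide did not survive in this transcript, so the 14 records below are the customer table commonly used for this textbook example, chosen because they reproduce the values above. The column names and the class column `buys` are likewise assumptions.

```python
from collections import Counter
from math import log2

COLUMNS = ("age", "income", "student", "credit", "buys")
# ASSUMED records: the original table is not part of this transcript.
ROWS = [
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31-40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31-40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31-40", "medium", "no", "excellent", "yes"),
    ("31-40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]
DATA = [dict(zip(COLUMNS, row)) for row in ROWS]


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * log2(n / total) for n in Counter(labels).values())


def conditional_entropy(rows, attr, target="buys"):
    """Weighted entropy of the class after dividing `rows` by `attr`."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attr], []).append(row[target])
    return sum(len(g) / len(rows) * entropy(g) for g in groups.values())


base = entropy([row["buys"] for row in DATA])
after = {a: conditional_entropy(DATA, a) for a in ("student", "age", "income", "credit")}

print(f"E(BD) = {base:.3f}")
for attr, value in after.items():
    print(f"E(D|{attr}) = {value:.3f}")
print("attribute to pick:", min(after, key=after.get))
```

The printed values should agree with the slide to within rounding in the third decimal place, and the attribute with the lowest entropy after division is the one to pick.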
Classification


Step 2: Age leaves the lowest entropy (0.694), so divide the original
data set by age into subset 1 (<=30), subset 2 (31-40), and subset 3 (>40)
Step 3-1: For subset 1, there are three attributes to
pick: income, student, and credit




E(BD) = 0.971
E(D|student) = 0
E(D|income) = ??
E(D|credit) = ??
Classification


Step 3-2: E(D|student) = 0, so divide subset 1 by student into
subset 1-1 (yes) and subset 1-2 (no)
Step 4-1: For subset 3, there are three attributes to
pick: income, student, and credit





E(BD) = 0.971
E(D|credit) = 0
E(D|income) = ??
E(D|student) = ??
Step 4-2: E(D|credit) = 0, so divide subset 3 by credit into
subset 3-1 (fair) and subset 3-2 (excellent)
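
Steps 2 through 4 can be sketched as repeated partitioning. As before, the 14 records are an assumption (the original table is missing from this transcript), and the helper names `partition` and `conditional_entropy` are hypothetical. The sketch checks only that dividing subset 1 by student and subset 3 by credit leaves pure subsets.

```python
from collections import Counter, defaultdict
from math import log2

# ASSUMED records, in the order (age, income, student, credit, class).
ROWS = [
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31-40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31-40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31-40", "medium", "no", "excellent", "yes"),
    ("31-40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]
AGE, INCOME, STUDENT, CREDIT, LABEL = range(5)


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * log2(n / total) for n in Counter(labels).values())


def partition(rows, col):
    """Group rows by the value found in column `col`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[col]].append(row)
    return dict(groups)


def conditional_entropy(rows, col):
    """Weighted class entropy after dividing `rows` by column `col`."""
    parts = partition(rows, col).values()
    return sum(len(p) / len(rows) * entropy([r[LABEL] for r in p]) for p in parts)


# Step 2: divide the original data set by age.
by_age = partition(ROWS, AGE)
subset1, subset3 = by_age["<=30"], by_age[">40"]

# Steps 3-2 and 4-2: divide subset 1 by student and subset 3 by credit.
split1 = partition(subset1, STUDENT)
split3 = partition(subset3, CREDIT)

print("E(D|student) on subset 1:", conditional_entropy(subset1, STUDENT))
print("E(D|credit) on subset 3:", conditional_entropy(subset3, CREDIT))
for name, part in (
    ("subset 1-1 (student = yes)", split1["yes"]),
    ("subset 1-2 (student = no)", split1["no"]),
    ("subset 3-1 (credit = fair)", split3["fair"]),
    ("subset 3-2 (credit = excellent)", split3["excellent"]),
):
    print(name, "classes:", sorted({r[LABEL] for r in part}))
```

Each of the four resulting subsets contains a single class, which is the stopping condition for those branches.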
Extract rules from the model



Each path from the root to a leaf node forms
an IF-THEN rule.
In this rule, the tests at the root and internal nodes
along the path are joined with AND to form the IF part.
The leaf node gives the THEN part of the rule.
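
Rule extraction can be sketched as a walk over every root-to-leaf path. The nested-dictionary tree format and the function name `extract_rules` are hypothetical. The tree shape follows the steps in these slides (age at the root, then student and credit), but the class labels at the leaves are assumed for illustration, since the slides do not state them.

```python
def extract_rules(node, conditions=()):
    """Return one IF-THEN rule string per root-to-leaf path."""
    if not isinstance(node, dict):
        # Leaf node: it supplies the THEN part.
        if_part = " AND ".join(f"{attr} = {value}" for attr, value in conditions)
        return [f"IF {if_part or 'TRUE'} THEN class = {node}"]
    # Internal node: a dict with one key, the attribute tested at this node.
    attr, branches = next(iter(node.items()))
    rules = []
    for value, child in branches.items():
        rules.extend(extract_rules(child, conditions + ((attr, value),)))
    return rules


# Tree shape from the slides; leaf class labels are ASSUMED.
TREE = {
    "age": {
        "<=30": {"student": {"yes": "yes", "no": "no"}},
        "31-40": "yes",
        ">40": {"credit": {"fair": "yes", "excellent": "no"}},
    }
}

rules = extract_rules(TREE)
for rule in rules:
    print(rule)
```

The tree has five leaves, so the walk yields five rules, one per path.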
Reading: Data Mining book, pp. 279-291