Exam 3 Review - Temple Fox MIS

Exam 3 Review
Decision Trees
SAS Cluster Analysis
Association Rules
Data Visualization
SAS
• When to Use Which Analysis (D, C or A)?
– When someone gets an A in this class, what other
classes do they get an A in?
– What predicts whether a company will go
bankrupt?
– If someone upgrades to an iPhone, do they also
buy a new case?
– Which party will win the election?
– Can we group our website visitors into types
based on their online behaviors?
– Which customers will purchase our product?
– Can we identify different product markets based
on customer demographics?
Decision Trees
• Which is the Root Node?
• # Leaf Nodes?
• Probability of Purchase?
i) Female, 130 lbs, 12 ft? ii) 120 lbs, 5 feet,
male?
• Best predictor variable?
[Figure: decision tree. Split variables and branch labels: Height (>=6', <6'); Weight (<150, >=150); Weight (<170, >=170); Gender (Male, Female). Node tables (Outcome 0 / Outcome 1, n): 62% / 38%, n = 350; 55% / 45%, n = 250; 40% / 60%, n = 150; 60% / 40%, n = 250; 45% / 55%, n = 75; 35% / 65%, n = 75.]
• Probability of Purchase?
i) 5 ft 5 inches?
ii) 6 ft 5 inches 190 lbs?
[Figure: the same decision tree as on the previous slide.]
Decision Trees
• What does it mean that Gender is only on
the right side of the tree? Why is it not on
both sides?
• Based on the tree, which demographic is
MOST likely to buy the product? Least
likely to buy the product?
Decision Trees
• What Statistics are Used to Determine Splits for
Decision Trees?
– Gini Coefficient, Chi-Square Statistics (p-value)
• What does it mean when the Gini = 1?
• What does it mean when the Chi-square is bigger?
• What happens to the p-value as the Chi-square
gets bigger?
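
To make the split statistics concrete, here is a minimal Python sketch that scores two candidate splits. It rests on several assumptions that are not in the slides: Gini is computed as the impurity 1 - sum(p^2) (the course software may define or scale Gini differently), the chi-square is the Pearson statistic for a 2x2 table with one degree of freedom, and the counts and function names are made up for illustration.

```python
import math

def gini_impurity(counts):
    """Gini impurity 1 - sum(p^2) of a node, given its class counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def chi_square_2x2(left, right):
    """Pearson chi-square for a two-way split with a binary outcome.

    left and right are (count_of_0, count_of_1) for the two child nodes.
    """
    table = [left, right]
    row_totals = [sum(row) for row in table]
    col_totals = [left[0] + right[0], left[1] + right[1]]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2

def p_value_df1(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))

# Hypothetical counts (not taken from the slides): (non-buyers, buyers) per child.
weak_split = ((45, 55), (55, 45))
strong_split = ((20, 80), (80, 20))

for name, (left, right) in [("weak", weak_split), ("strong", strong_split)]:
    chi2 = chi_square_2x2(left, right)
    print(name, "chi-square:", round(chi2, 2), "p-value:", p_value_df1(chi2))
    print(name, "child Gini:", gini_impurity(left), gini_impurity(right))
```

In this sketch the split that separates buyers from non-buyers more sharply produces the larger chi-square, the smaller p-value, and the purer (lower-impurity) child nodes.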
Clustering
• What statistics do we care about in cluster
analysis? What do they represent?
• What happens to these statistics as the
number of clusters is increased?
• Why do we standardize data? Why do we
eliminate outliers?
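
The questions above can be explored with a small Python sketch. It assumes that the statistics in question include the within-cluster sum of squared errors (SSE) and that standardizing means converting each variable to z-scores; the customer data, cluster labels, and function names are hypothetical.

```python
import statistics

def standardize(values):
    """Convert values to z-scores: (x - mean) / population standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

def within_cluster_sse(points, labels):
    """Sum of squared distances from each point to its own cluster centroid."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    sse = 0.0
    for members in clusters.values():
        dims = len(members[0])
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dims)]
        for p in members:
            sse += sum((p[d] - centroid[d]) ** 2 for d in range(dims))
    return sse

# Hypothetical customers: income in dollars and site visits per month.
income = [30000, 32000, 90000, 95000]
visits = [2, 3, 10, 12]

# Without standardizing, income would dominate every distance calculation
# simply because its numbers are thousands of times larger than visits.
points = list(zip(standardize(income), standardize(visits)))

one_cluster = within_cluster_sse(points, [0, 0, 0, 0])
two_clusters = within_cluster_sse(points, [0, 0, 1, 1])
print("SSE with 1 cluster:", one_cluster)
print("SSE with 2 clusters:", two_clusters)
```

Splitting the points into more clusters can only keep each point as close or closer to its centroid, so within-cluster SSE falls as the number of clusters grows.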
Clustering
• What are the pros and cons of having only
a few clusters (compared to having many
clusters)?
• What is bad about the cluster analysis
result below? How would you improve it?
Association Rules
• How would you describe the following
association rule?
– {Meat, Dairy} → {Vegetables}
• How many items are in this item set?
• What is (are) the antecedent(s)? What is (are)
the consequent(s)?
• What are the statistics we care about when
evaluating an association rule?
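
As a worked illustration, the following Python sketch computes support, confidence, and lift for the Meat and Dairy rule, assuming those are the three statistics meant here. The baskets are hypothetical and the function names are illustrative.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the share that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

def lift(transactions, antecedent, consequent):
    """Confidence divided by the consequent's overall support."""
    return confidence(transactions, antecedent, consequent) / support(transactions, consequent)

# Hypothetical baskets, not data from the course.
baskets = [
    {"Meat", "Dairy", "Vegetables"},
    {"Meat", "Dairy", "Vegetables"},
    {"Meat", "Dairy"},
    {"Vegetables"},
    {"Dairy"},
]

antecedent = {"Meat", "Dairy"}
consequent = {"Vegetables"}

print("support:   ", support(baskets, antecedent | consequent))
print("confidence:", confidence(baskets, antecedent, consequent))
print("lift:      ", lift(baskets, antecedent, consequent))
```

Here two of the five baskets contain all three items, and three contain both Meat and Dairy, so support is 2/5 and confidence is 2/3.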
Association Rules
• Do the following two rules have to have
the same Confidence? The same
Support? The same Lift?
– {Meat, Dairy} → {Vegetables}
– {Vegetables} → {Meat, Dairy}
• What does Lift > 1 mean? Would you take
action on such a rule?
– What about Lift < 1?
– What about Lift = 1?
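
One way to check the comparison between the two rules is to compute all three statistics in both directions from basket counts. This is a minimal Python sketch using the standard definitions; the counts are hypothetical and `rule_stats` is an illustrative helper, not part of any library.

```python
def rule_stats(n_total, n_antecedent, n_consequent, n_both):
    """Return (support, confidence, lift) for a rule, given basket counts."""
    support = n_both / n_total
    confidence = n_both / n_antecedent
    lift = confidence / (n_consequent / n_total)
    return support, confidence, lift

# Hypothetical counts out of 1000 baskets.
n_total = 1000
n_meat_dairy = 200   # baskets containing both Meat and Dairy
n_vegetables = 500   # baskets containing Vegetables
n_all_three = 150    # baskets containing Meat, Dairy and Vegetables

# {Meat, Dairy} -> {Vegetables}
forward = rule_stats(n_total, n_meat_dairy, n_vegetables, n_all_three)
# {Vegetables} -> {Meat, Dairy}
reverse = rule_stats(n_total, n_vegetables, n_meat_dairy, n_all_three)

print("forward (support, confidence, lift):", forward)
print("reverse (support, confidence, lift):", reverse)
```

Support and lift come out the same in both directions because they treat the two sides symmetrically, while confidence changes because it divides by the antecedent's count. A lift of exactly 1 would mean the two sides appear together only as often as expected if they were independent.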
Association Rules
• What might you do as a manager if you
saw a very high Lift and Confidence for the
following rule about product purchase?
Why would you do this?
– {Pasta} → {Orange Juice}
Association Rules
• What is the most reliable association rule
below?
Data Visualization
• Look at In-Class Exercise Answers...