DATA MINING: DEFINITIONS
AND DECISION TREE EXAMPLES
Emily Thomas
Director of Planning and Institutional Research
1
WHAT IS DATA MINING?
+ Data mining is the discovery of hidden
knowledge, unexpected patterns and
new rules in large databases.
- Data mining is exploratory. Its results
lack the protection against spurious
conclusions that theory-based,
hypothesis-driven statistics provide.
2
WHY USE DATA MINING?
In the corporate world:
• Large amounts of data are captured in
enterprise databases.
• These databases are too large for
traditional statistical techniques.
• Identifying patterns in the data can target
profitable, or unprofitable, customers.
3
WHY USE DATA MINING?
In institutional research:
• Large numbers of variables
• We have insufficient time/resources to
investigate all the relationships that might be
informative.
• Identifying data patterns can shed light on
student behavior.
4
WHY DATA MINING NOW?
• Development of large, integrated
enterprise databases
• Development of data mining techniques
and software
• Development of simplified user
interfaces
5
DATA MINING TECHNIQUES
• Decision trees
• Rule induction
• Nearest neighbors
• Exploratory
factor analysis
• Stepwise regression
• Neural networks
• Clustering
• Genetic algorithms
6
DECISION TREE ANALYSIS
CHAID: Chi-squared Automatic Interaction Detector
(SPSS AnswerTree)
1. Select significant independent variables
2. Identify category groupings or interval breaks to
create groups most different with respect to the
dependent variable
3. Select as the primary independent variable the
one identifying groups with the most different
values of the dependent variable
4. Select additional variables to extend each branch
if there are further significant differences
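Step 2 above can be illustrated with a small sketch. It is a simplified stand-in, not the AnswerTree procedure: it greedily merges adjacent ordered categories while their outcome rates are not significantly different, using a plain 2 x 2 Pearson chi-square and a fixed critical value of 3.841 (1 degree of freedom, alpha = .05). The category labels and counts are hypothetical, and a production CHAID implementation also applies adjustments (for example for multiple comparisons) that are omitted here.

```python
def chi_square_2x2(a, b):
    """Pearson chi-square for two groups, each given as (yes, no) counts."""
    total = sum(a) + sum(b)
    col_totals = [a[0] + b[0], a[1] + b[1]]
    stat = 0.0
    for row in (a, b):
        row_total = sum(row)
        for j, observed in enumerate(row):
            expected = row_total * col_totals[j] / total
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


CRITICAL_DF1_05 = 3.841  # chi-square critical value for 1 df at alpha = .05


def merge_similar(groups):
    """Merge adjacent categories until every neighboring pair differs significantly."""
    groups = [(label, tuple(counts)) for label, counts in groups]
    while len(groups) > 1:
        stats = [chi_square_2x2(groups[i][1], groups[i + 1][1])
                 for i in range(len(groups) - 1)]
        i = min(range(len(stats)), key=stats.__getitem__)
        if stats[i] >= CRITICAL_DF1_05:
            break  # the most similar pair is already significantly different
        (l1, c1), (l2, c2) = groups[i], groups[i + 1]
        groups[i:i + 2] = [(l1 + "+" + l2, (c1[0] + c2[0], c1[1] + c2[1]))]
    return groups


# Hypothetical ordered categories with (returned, left) counts.
bands = [("A", (30, 20)), ("B", (32, 18)), ("C", (90, 10)),
         ("D", (92, 8)), ("E", (95, 5))]
merged = merge_similar(bands)
for label, (returned, left) in merged:
    print(label, returned, left)
```

With these made-up counts the five bands collapse into two groups, which is the kind of interval break the decision tree then reports as a single split.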
7
TRANSFER RETENTION RATES
Percent of new full-time Fall 2002 transfers
returning in Spring 2003
4 x 2 contingency table: chi square=214.41 p=.0000

Group                Returned        Left            N
All new transfers         .88             .12    1,258
GPA 0-1.31            80  .65         43  .35      123
GPA 1.31-3.00        586  .90         63  .10      649
GPA 3.00-4.00        428  .94         25  .06      453
no GPA                 7  .21         26  .79       33
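The chi-square statistic reported for this table can be recomputed from the counts. The sketch below uses only the returned/left counts shown on the slide and the standard Pearson formula; the function and variable names are illustrative, not part of any package.

```python
# (returned, left) counts by Fall 2002 GPA group, taken from the slide.
observed = {
    "GPA 0-1.31": (80, 43),
    "GPA 1.31-3.00": (586, 63),
    "GPA 3.00-4.00": (428, 25),
    "no GPA": (7, 26),
}


def pearson_chi_square(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    rows = list(table.values())
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(col_totals)
    stat = 0.0
    for row in rows:
        row_total = sum(row)
        for observed_count, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand_total
            stat += (observed_count - expected) ** 2 / expected
    return stat


chi2 = pearson_chi_square(observed)
dof = (len(observed) - 1) * (2 - 1)  # (rows - 1) * (columns - 1)
print(f"chi-square = {chi2:.2f}, df = {dof}")  # chi-square = 214.41, df = 3
```

Most of the statistic comes from the "no GPA" and "GPA 0-1.31" rows, where the share leaving is far above the overall 12 percent.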
8
TRANSFER RETENTION RATES
FALL 2002-SPRING 2003
Percent returning in Spring 2003
All new transfers: 88% (n=1258)
Fall 2002 GPA
  0-1.31: 65% (n=123)
    Age
      13-20: 81% (n=62)
      20-48: 50% (n=61)
  1.31-3.00: 90% (n=649)
  3.00-4.00: 94% (n=453)
  missing: 21% (n=33)
9
SOS 2000: SATISFACTION WITH
THE QUALITY OF EDUCATION
Percent rating the quality of education good or excellent
Overall: 70% (n=1695)
Self-reported intellectual growth (chi square=418.46)

Growth        Rated good/excellent      n    Share of students
low/none               .30            122           7%
moderate               .56            560          33%
large                  .79            689          41%
very large             .91            324          19%
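As a consistency check on this split, the overall rate at the root of a tree is the n-weighted average of the rates in its child nodes, and each node's share is its n divided by the total. A minimal sketch using the figures on the slide (variable names are illustrative):

```python
# (share rating quality good or excellent, n) per intellectual-growth node, from the slide.
nodes = {
    "low/none": (0.30, 122),
    "moderate": (0.56, 560),
    "large": (0.79, 689),
    "very large": (0.91, 324),
}

total_n = sum(n for _, n in nodes.values())
# Root rate = n-weighted average of the child-node rates.
overall = sum(rate * n for rate, n in nodes.values()) / total_n
# Each node's share of all respondents, as a whole percent.
shares = {label: round(100 * n / total_n) for label, (_, n) in nodes.items()}

print(total_n, round(overall, 2), shares)
```

The recomputed total, overall rate, and node shares agree with the 1695, 70%, and 7/33/41/19% shown on the slide.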
10
VERY LARGE INTELLECTUAL GROWTH
19% of students
What is your overall impression
of the quality of education at this college? 70%*
Very large intellectual growth: 91%*
Quality of instruction
  Dissatisfied: 71%*
  Satisfied: 94%*
  Very satisfied: 97%*
Intellectually stimulated
  Not always: 93%*
  Always: 100%*
* Percent of students reporting “excellent” or “good”
quality of education.
LARGE INTELLECTUAL GROWTH
41% of students
79%
of students rated educational quality good or excellent
Satisfied with academic experience
  Less than half time: 51%*
  About half the time: 72%
  More than half the time: 88%
  Almost always: 94%
Concern for you as an individual
  Dissatisfied: 59%
  Neutral: 71%
  Satisfied: 94%
Course availability
  Dissatisfied: 77%
  Neutral: 93%
  Satisfied: 93%
* Percent of students reporting “excellent” or “good” quality of education.
12
LOW OR MODERATE INTELLECTUAL GROWTH
40% of students
INTELLECTUAL GROWTH
  None/small: 30%*
  Moderate: 55%
Class size relative to type of course
  Very dissatisfied: 8%
  Dissatisfied-satisfied: 41%
Satisfied with academic experience
  Rarely: 31%
  Half the time: 58%
  More than half time: 77%
Sense of belonging
  Dissatisfied: 18%
  Satisfied: 40%
Quality of instruction
  Dissatisfied: 65%
  Satisfied: 86%
* Percent of students reporting “excellent” or
“good” quality of education.
13
SOS 2000: SATISFACTION WITH
“THIS COLLEGE IN GENERAL”
FACULTY COME TO CLASS WELL PREPARED
Rarely/less than half the time
(25% of students)
• Concern for you as an individual
• Condition of campus buildings and grounds
• Condition of residence hall facilities
• Sense of belonging
Half or more than half the time
(43% of students)
• Academic experiences [in the classroom]
• Sense of belonging
• Sense of belonging
• Personal safety
Almost always
(31% of students)
• Quality of instruction
• Sense of belonging
• Concern for you as an individual
• Attitudes of campus staff
14
DECISION TREE
ADVANTAGES AND DISADVANTAGES
+ Discover unexpected relationships
+ Identify subgroup differences
+ Use categorical or continuous data
+ Accommodate missing data
- Possibly spurious relationships
- Presentation difficulties
15
BIBLIOGRAPHY
• AnswerTree 2.0: User’s Guide. SPSS, 1998.
• Adriaans, P and D Zantinge (1996). Data Mining.
Harlow, England and elsewhere: Addison-Wesley.
• Bordon, VMH (1995). Segmenting Student Markets
with a Student Satisfaction and Priorities Survey.
Research in Higher Education 16:2, 115-138.
• Neville, PG. (1999). “Decision Trees for Predictive
Modeling,” SAS Technical Report, The SAS Institute.
• Thomas, EH and N Galambos. What Satisfies
Students? Mining Student-Opinion Data with
Regression and Decision Tree Analysis. Forthcoming
in Research in Higher Education, May 2004.
16