Transcript Session 4
Methods of Research and Enquiry
Basic Statistics and Correlational Research
by Dr. Daniel Churchill
What is statistics?
R&D in IT in Education
Statistics is a body of mathematical
techniques or processes for gathering,
organizing, analyzing, and interpreting
numerical data.
Basic Concepts
Measurement – assigning a number to an
observation based on certain rules
Variable – a measured characteristic (e.g.,
age, grade level, test score, height, gender)
A constant – a measure that has only one
value
Continuous variable – can take any value
within a range (e.g., height)
Discrete variable – has a finite number
of distinct values between any two given
points (e.g., age in whole years between 30 and 50)
Basic Concepts
Independent variables -- purported
causes
Dependent variables -- purported effects
Two instructional strategies, co-operative
groups and traditional lectures, were used
during a three week social studies unit.
Students’ exam scores were analyzed for
differences between the groups.
The independent variable is the instructional
approach (of which there are two levels)
The dependent variable is the students’
achievement
Obj. 2.3
Basic Concepts
A population – entire group of elements
that have at least one characteristic in
common
A sample – a small group of observations
selected from the total population
A parameter – a measure of a
characteristic of an entire population
A statistic – a measure of a characteristic
of a sample
Statistics – a method
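The difference between a parameter and a statistic can be sketched in a few lines of Python. This is only an illustration: the population below is invented (random scores for 1,000 hypothetical students), and the sample size of 30 is an arbitrary choice.

```python
import random
import statistics

random.seed(4)  # fixed seed so the sketch is repeatable

# Hypothetical population: test scores of 1,000 students (invented data).
population = [random.randint(30, 90) for _ in range(1000)]

# A parameter is a measure of a characteristic of the entire population.
population_mean = statistics.mean(population)

# A sample is a small group of observations selected from the population.
sample = random.sample(population, 30)

# A statistic is a measure of a characteristic of the sample.
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

The two values will usually be close but not identical, which is why conclusions drawn from a sample are stated in terms of probability.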
Basic Concepts
Descriptive statistics – classify, organize,
and summarize numerical data about a
particular group of observations (e.g., a
number of students in HK, the mean maths
grade, ethnic make-up of students)
Inferential statistics – involve selecting a
sample from a defined population and
studying it to draw conclusions about the population.
These two types of statistics are not mutually
exclusive
Probability and Level of Significance
Studies yield statistical results which are
used to decide whether to retain or reject
the null hypothesis
The decision is made in terms of
probability, not certainty
Once we obtain a sample statistic, we
compare the obtained value to the
appropriate critical value (from tables)
Mostly, a result with a probability below 5% (p < .05)
is considered statistically significant
Data Collection – Measurement scales
Nominal – categories
Gender, ethnicity, etc.
Ordinal – ordered categories
Rank in class, order of finish, etc.
Interval – equal intervals
Test scores, attitude scores, etc.
Ratio – absolute zero
Time, height, weight, etc.
Obj. 2.1
Measurement Scales
Watch videos from Learner.org
http://learner.org/resources/series158.html
Watch Video 5. Variation About the Mean
Statistical measures
Measures of central tendency or averages
Mean -- the arithmetic average of the scores
Median -- a point in an array, above & below
which one-half of the scores fall
Mode -- the score that occurs most frequently
in a distribution
Organizing Data
Source: http://www.learnactivity.com/lo/
Example
Here is a set of maths test scores (raw scores) for a class of 31 students:
37, 42, 52, 58, 58, 61, 65, 73, 74, 42, 39, 56, 66, 54, 57, 59,
69, 72, 67, 61, 51, 56, 63, 78, 45, 63, 72, 48, 63, 48, 54
Organizing measurements
Stem | Leaf
3 | 7 9
4 | 2 2 5 8 8
5 | 1 2 4 4 6 6 7 8 8 9
6 | 1 1 3 3 3 5 6 7 9
7 | 2 2 3 4 8
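The stem-and-leaf arrangement above can be produced mechanically: the tens digit of each score is the stem and the units digit is the leaf. A minimal Python sketch, using the 31 raw scores from the example (the function name `stem_and_leaf` is made up for this illustration):

```python
from collections import defaultdict

scores = [37, 42, 52, 58, 58, 61, 65, 73, 74, 42, 39, 56, 66, 54, 57, 59,
          69, 72, 67, 61, 51, 56, 63, 78, 45, 63, 72, 48, 63, 48, 54]

def stem_and_leaf(values):
    """Group each value's units digit (leaf) under its tens digit (stem)."""
    rows = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        rows[stem].append(leaf)
    return dict(rows)

plot = stem_and_leaf(scores)
for stem in sorted(plot):
    print(stem, "|", " ".join(str(leaf) for leaf in plot[stem]))
```

Because the leaves are sorted, the plot also makes it easy to count along to the middle score.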
Organizing measurements –frequency tables
Test Score   Frequency   Midpoint   Cumulative Frequency   Percent Frequency   Cumulative Percentage
36-40        2           38         31                     6                   100
41-45        3           43         29                     10                  94
46-50        2           48         26                     6                   84
51-55        4           53         24                     13                  77
56-60        6           58         20                     19                  65
61-65        6           63         14                     19                  45
66-70        3           68         8                      10                  26
71-75        4           73         5                      13                  16
76-80        1           78         1                      3                   3
N = 31
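A frequency table like this one can be computed directly from the 31 raw scores. The sketch below assumes class intervals of width 5 starting at 36 and accumulates frequencies from the highest class downward, so that the lowest class carries the full count; the function name and the dictionary keys are invented for this illustration.

```python
scores = [37, 42, 52, 58, 58, 61, 65, 73, 74, 42, 39, 56, 66, 54, 57, 59,
          69, 72, 67, 61, 51, 56, 63, 78, 45, 63, 72, 48, 63, 48, 54]

def frequency_table(values, low=36, high=80, width=5):
    """Build one row per class interval, listed from the lowest class up."""
    n = len(values)
    classes = [(start, start + width - 1) for start in range(low, high + 1, width)]
    rows = []
    cumulative = 0
    # Accumulate from the highest class downward.
    for start, end in reversed(classes):
        frequency = sum(start <= v <= end for v in values)
        cumulative += frequency
        rows.append({
            "class": f"{start}-{end}",
            "frequency": frequency,
            "midpoint": (start + end) // 2,
            "cumulative": cumulative,
            "percent": round(100 * frequency / n),
            "cumulative_percent": round(100 * cumulative / n),
        })
    rows.reverse()
    return rows

table = frequency_table(scores)
for row in table:
    print(row["class"], row["frequency"], row["midpoint"], row["cumulative"],
          row["percent"], row["cumulative_percent"])
```

Percentages are rounded to whole numbers, so the percent column need not sum to exactly 100.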
Organizing measurements – Histogram
[Figure: histogram of the test scores. x-axis: Test Score (36-40 to 76-80); y-axis: Frequency (0 to 7)]
Organizing measurements – Mean
The mean: $\bar{X} = \frac{\sum X}{N}$
$\bar{X}$ = mean
$\sum$ = sum of
X = scores in a distribution
N = number of scores
It is the base from which many
measures are computed.
$\bar{X} = \frac{37 + 58 + 74 + \dots + 72 + 63}{31} \approx 58$
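The calculation of the mean for the 31 scores can be checked with a short Python sketch that follows the formula term by term (sum of the scores divided by the number of scores):

```python
scores = [37, 42, 52, 58, 58, 61, 65, 73, 74, 42, 39, 56, 66, 54, 57, 59,
          69, 72, 67, 61, 51, 56, 63, 78, 45, 63, 72, 48, 63, 48, 54]

total = sum(scores)   # sum of X
n = len(scores)       # N, the number of scores
mean = total / n      # mean = (sum of X) / N

print(f"sum = {total}, N = {n}, mean = {mean:.2f}")
```

The result rounds to the value of 58 shown on the slide.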
Organizing measurements – Mode and Median
Mode -- the score that occurs most frequently in a distribution: 63
Median -- a point in an array, above & below which one-half of the scores fall: 59
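Both measures can be checked against the raw scores with Python's standard `statistics` module. Note that the middle (16th) value of the 31 sorted raw scores is 58, while the slide gives 59. A value of about 59 is what interpolation within the 56-60 class of the grouped frequency table produces; that this is how the slide's figure was obtained is an assumption, not something the slides state.

```python
import statistics

scores = [37, 42, 52, 58, 58, 61, 65, 73, 74, 42, 39, 56, 66, 54, 57, 59,
          69, 72, 67, 61, 51, 56, 63, 78, 45, 63, 72, 48, 63, 48, 54]

mode = statistics.mode(scores)      # most frequent score
median = statistics.median(scores)  # middle value of the sorted raw scores

# Assumed alternative: estimate the median from the grouped data by
# interpolating within the class (56-60) that contains the middle score.
lower_boundary = 55.5                           # lower real limit of 56-60
below = sum(v < 56 for v in scores)             # scores below the class
in_class = sum(56 <= v <= 60 for v in scores)   # scores inside the class
grouped_median = lower_boundary + (len(scores) / 2 - below) / in_class * 5

print(f"mode = {mode}, raw median = {median}, grouped median = {grouped_median:.2f}")
```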
Organizing measurements – Histogram
[Figure: histogram of the test scores with the Mean, Median and Mode marked. x-axis: Test Score (36-40 to 76-80); y-axis: Frequency (0 to 7)]
Statistical measures
Measures of spread or dispersion
Range -- the difference between the highest
and the lowest scores plus one
Standard deviation – average distance from
the mean (also see calculator)
Variance – squared standard deviation
Z-score -- the number of standard deviations
a score lies from the mean
Z = (score - mean) / SD
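These measures of spread can be sketched for the 31 test scores from the earlier example. The sketch assumes the variance is computed by dividing by N, which is what `statistics.pvariance` and `statistics.pstdev` do; `statistics.variance` and `statistics.stdev` divide by N - 1 instead and give slightly larger values.

```python
import statistics

scores = [37, 42, 52, 58, 58, 61, 65, 73, 74, 42, 39, 56, 66, 54, 57, 59,
          69, 72, 67, 61, 51, 56, 63, 78, 45, 63, 72, 48, 63, 48, 54]

# Range: highest score minus lowest score, plus one.
score_range = max(scores) - min(scores) + 1

mean = statistics.mean(scores)
variance = statistics.pvariance(scores)  # mean squared distance from the mean
sd = statistics.pstdev(scores)           # square root of the variance

def z_score(score, mean, sd):
    """Number of standard deviations a score lies from the mean."""
    return (score - mean) / sd

z_top = z_score(max(scores), mean, sd)

print(f"range = {score_range}, variance = {variance:.2f}, SD = {sd:.2f}")
print(f"z-score of the highest score ({max(scores)}) = {z_top:.2f}")
```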
Basic Formulas for Sample
Variance: $S^2 = \frac{\sum (X - \bar{X})^2}{n}$
Standard deviation: $S = \sqrt{S^2}$
Normal Distribution
Source: http://en.wikipedia.org/wiki/Normal_distribution
Source: http://noppa5.pc.helsinki.fi/koe/flash/histo/histograme.html
Z-Score Example
z score: $z = \frac{X - \bar{X}}{S}$
Example, compare a student’s performance on Maths and
English tests if the student’s scores, class means and
standard deviations for the classes are known
Subject   Student's Score   Class Mean   Class S
English   50                45           5
Maths     68                56           6
$z_{English} = \frac{50 - 45}{5} = +1$
$z_{Maths} = \frac{68 - 56}{6} = +2$
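The comparison in this example can be reproduced with a small helper function (the name `z_score` is chosen for this sketch only):

```python
def z_score(score, mean, sd):
    """Number of standard deviations a score lies from the class mean."""
    return (score - mean) / sd

z_english = z_score(50, 45, 5)
z_maths = z_score(68, 56, 6)

print(f"z English = {z_english:+.1f}")
print(f"z Maths   = {z_maths:+.1f}")
```

Relative to each class, the Maths result is the stronger one: it lies two standard deviations above the class mean, against one for English.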
Z-Score Example
[Figure: positions of zEnglish and zMaths]
Z score vs. T score, and Percentile Rank
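The slide does not spell out the conversions, so the following sketch rests on two assumptions: the common convention that T scores have a mean of 50 and a standard deviation of 10 (T = 50 + 10z), and a normal distribution of scores for the percentile rank. It uses `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

def t_score(z):
    """Convert a z score to a T score (assumed scale: mean 50, SD 10)."""
    return 50 + 10 * z

def percentile_rank(z):
    """Percentage of scores below z, assuming a normal distribution."""
    return 100 * NormalDist().cdf(z)

for z in (1, 2):  # the English and Maths z scores from the example
    print(f"z = {z:+d}: T = {t_score(z)}, percentile rank = {percentile_rank(z):.1f}")
```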
Correlational Studies
Attempts to describe the predictive
relationships between or among
variables
The predictor variable is the variable from
which the researcher is predicting
The criterion variable is the variable to
which the researcher is predicting
Objectives 10.1 & 10.2
Relationship Studies
General purpose
Gain insight into variables that are related to other
variables relevant to educators
Achievement
Self-esteem
Self-concept
Two specific purposes
Suggest subsequent interest in establishing cause
and effect between variables found to be related
Control for variables related to the dependent variable
in experimental studies
Objectives 5.1 & 5.2
Correlational Data
Income/month ($)   Expenditure/month ($)
4000               4000
4000               5000
5000               6000
2000               2000
9000               6000
4000               2000
7000               5000
8000               6000
9000               9000
5000               3000
Scatter Diagram
[Figure: scatter diagram of the income and expenditure data. x-axis: Income (0 to 10000); y-axis: Expenditure (0 to 10000); the point (2000, 2000) is labelled]
Source: http://noppa5.pc.helsinki.fi/koe/corr/index.html
Correlation Coefficients
The general rule
+.95 is a strong positive correlation
+.50 is a moderate positive correlation
+.20 is a low positive correlation (small correlation)
-.26 is a low negative correlation
-.49 is a moderate negative correlation
-.95 is a strong negative correlation
Predictions
Between .60 and .70 are adequate for group
predictions
Above .80 is adequate for individual predictions
Objective 3.3 & 3.5
Conducting a Prediction Study
Identify a set of variables
Limit to those variables logically related to the criterion
Identify a population and select a sample
Identify appropriate instruments for measuring
each variable
Ensure appropriate levels of validity and reliability
Collect data for each instrument from each
subject
Typically data is collected at different points in time
Compute the results
Regression coefficient
Regression equation
Hypotheses for Correlation
H0: r = 0
HA: r ≠ 0
Collecting Measurement
Instrument – a tool used to collect data
Test – a formal, systematic procedure for
gathering information
Assessment – the general process of
collecting, synthesizing, and interpreting
information
Obj. 3.1 & 3.2
The Process
Participant and instrument selection
Minimum of 30 subjects
Instruments must be valid and reliable
Higher validity and reliability require smaller samples
Lower validity and reliability require larger samples
Design and procedures
Collect data on two or more variables for each subject
Data analysis
Compute the appropriate correlation coefficient
Objectives 2.2 & 2.3
Selection of a Test
Sources of test information, e.g.,:
Mental Measurements Yearbooks (MMY)
Buros Institute
ETS Test Collection
Types of Correlation Coefficients
The type of correlation coefficient depends
on the measurement level of the variables
Pearson r - continuous predictor and criterion
variables
Math attitude and math achievement
Spearman rho – ranked or ordinal predictor
and criterion variables
Rank in class and rank on a final exam
Phi coefficient – dichotomous predictor and
criterion variables
Gender and pass/fail status on a high stakes test
Objectives 7.1, 7.2, & 7.3
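For the two coefficients other than Pearson r, a minimal Python sketch may help. All data below are hypothetical, invented for illustration. The Spearman function uses the shortcut formula rho = 1 - 6Σd² / (n(n² - 1)), which assumes there are no tied ranks; the phi function uses the standard formula for a 2x2 table of counts.

```python
from math import sqrt

def spearman_rho(ranks_x, ranks_y):
    """Spearman rho from two lists of ranks (assumes no tied ranks)."""
    n = len(ranks_x)
    sum_d2 = sum((x - y) ** 2 for x, y in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

def phi_coefficient(a, b, c, d):
    """Phi for a 2x2 table of counts laid out as [[a, b], [c, d]]."""
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical ranks of five students: rank in class vs. rank on a final exam.
class_rank = [1, 2, 3, 4, 5]
exam_rank = [2, 1, 3, 5, 4]
rho = spearman_rho(class_rank, exam_rank)

# Hypothetical counts: rows are the two gender groups, columns are pass / fail.
phi = phi_coefficient(20, 10, 10, 20)

print(f"Spearman rho = {rho:.2f}")
print(f"phi = {phi:.2f}")
```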
Calculating Pearson Correlation Coefficient
Z-score formula
$r = \frac{\sum z_x z_y}{N}$
Raw score formula
$r = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{[N\sum X^2 - (\sum X)^2][N\sum Y^2 - (\sum Y)^2]}}$
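Both formulas can be tried on the income and expenditure data from the Correlational Data slide. The sketch assumes the flattened slide values pair up row by row as (income, expenditure), and that the z scores are computed with standard deviations that divide by N (`statistics.pstdev`), which is what makes the two formulas agree.

```python
from math import sqrt
from statistics import mean, pstdev

# Assumed pairing of the slide data: (income, expenditure) row by row.
income = [4000, 4000, 5000, 2000, 9000, 4000, 7000, 8000, 9000, 5000]
expenditure = [4000, 5000, 6000, 2000, 6000, 2000, 5000, 6000, 9000, 3000]

def pearson_raw(x, y):
    """Pearson r from the raw score formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

def pearson_z(x, y):
    """Pearson r as the mean product of paired z scores."""
    mean_x, mean_y = mean(x), mean(y)
    sd_x, sd_y = pstdev(x), pstdev(y)
    products = [((a - mean_x) / sd_x) * ((b - mean_y) / sd_y) for a, b in zip(x, y)]
    return sum(products) / len(x)

r_raw = pearson_raw(income, expenditure)
r_z = pearson_z(income, expenditure)
print(f"raw score formula: r = {r_raw:.3f}")
print(f"z-score formula:   r = {r_z:.3f}")
```

Under these assumptions both functions return the same coefficient, roughly .80.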
Just for information
Critical Values of the Pearson Product-Moment Correlation
Coefficient:
First you determine degrees of freedom (df). For a correlation
study, the degrees of freedom is 2 less than the number of
subjects (e.g., 27 subjects give df = 25). Use the critical value
table to find the intersection of alpha .05 (see columns) and
25 degrees of freedom (see rows). The value found at the intersection (.381) is the
minimum correlation coefficient needed to confidently state 95
times out of a hundred that the relationship you found with your
subjects exists in the population from which they were drawn.
If the absolute value of your correlation coefficient is above
.381, you reject your null hypothesis (there is no relationship)
and accept the alternative hypothesis: e.g., there is a
statistically significant relationship between arm span and
height, r (25) = .87, p < .05.
If the absolute value of your correlation coefficient were less
than .381, you would fail to reject your null hypothesis: There
is not a statistically significant relationship between arm span
and height, r (25) = .12, p > .05.
Source: http://www.gifted.uconn.edu/siegle/research/Correlation/alphaleve.htm
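The decision rule described above can be sketched in Python. Rather than reading the critical r from a correlation table, the sketch derives it from the critical t value using r = t / sqrt(t² + df). The t value of 2.060 (two-tailed, alpha .05, df = 25) is taken from a standard t table and is an input assumed here, not something given in the slides.

```python
from math import sqrt

def critical_r(t_critical, df):
    """Critical Pearson r corresponding to a critical t value."""
    return t_critical / sqrt(t_critical ** 2 + df)

def is_significant(r, r_critical):
    """Reject the null hypothesis when |r| exceeds the critical value."""
    return abs(r) > r_critical

n_subjects = 27
df = n_subjects - 2        # degrees of freedom: 2 less than the number of subjects
T_CRITICAL = 2.060         # assumed: two-tailed t, alpha = .05, df = 25

r_crit = critical_r(T_CRITICAL, df)
print(f"df = {df}, critical r = {r_crit:.3f}")

for r in (0.87, 0.12):     # the two arm span and height results quoted above
    decision = "reject H0" if is_significant(r, r_crit) else "fail to reject H0"
    print(f"r({df}) = {r:.2f}: {decision}")
```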
Prediction and Regression
The position of the line
is determined by “b”,
the slope (the angle),
and “a”, the
intercept (the point
where the line
intersects the Y-axis).
Y = bX + a
Source: http://noppa5.pc.helsinki.fi/koe/corr/index.html
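As an illustration of Y = bX + a, the slope and intercept can be estimated by ordinary least squares. The slides do not name the fitting method, so least squares is an assumption here, as is the row-by-row pairing of the income and expenditure values from the Correlational Data slide.

```python
# Assumed pairing of the slide data: (income, expenditure) row by row.
income = [4000, 4000, 5000, 2000, 9000, 4000, 7000, 8000, 9000, 5000]
expenditure = [4000, 5000, 6000, 2000, 6000, 2000, 5000, 6000, 9000, 3000]

def fit_line(x, y):
    """Return the least-squares slope b and intercept a for Y = bX + a."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sum_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sum_xx = sum((xi - mean_x) ** 2 for xi in x)
    b = sum_xy / sum_xx          # slope
    a = mean_y - b * mean_x      # intercept: where the line crosses the Y-axis
    return b, a

b, a = fit_line(income, expenditure)
predicted = b * 6000 + a         # predicted expenditure for an income of 6000

print(f"Y = {b:.3f} X + {a:.2f}")
print(f"predicted expenditure at income 6000: {predicted:.0f}")
```

Here income plays the role of the predictor variable and expenditure the criterion variable.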
Other Correlation Analyses
Multiple Regression
Two or more variables are used to predict
one criterion variable
Canonical correlation
An extension of multiple regression in which
more than one predictor variable and more
than one criterion variable are used
Factor analysis
A correlational analysis used to take a large
number of variables and group them into a
smaller number of clusters of similar
variables called factors
References
Gay, L. R., Mills, G. E., & Airasian, P.
(2006). Educational Research:
Competencies for Analysis and
Applications. Upper Saddle River, NJ:
Pearson/Merrill Prentice Hall.
Ravid, R. (2000). Practical statistics for
educators (2nd ed.). New York, NY:
University Press of America, Inc.