Transcript Session4

Methods of Research and Enquiry
Basic Statistics and Correlational Research
by Dr. Daniel Churchill
What is statistics?
R&D in IT in Education

Statistics is a body of mathematical
techniques or processes for gathering,
organizing, analyzing, and interpreting
numerical data.
Basic Concepts

Measurement – assigning a number to an
observation based on certain rules
Variable – a measured characteristic (e.g.,
age, grade level, test score, height, gender)
A constant – a measure that has only one
value
Continuous variable – can take any value
within a range (e.g., height)
Discrete variable – can take only a finite number
of distinct values between any two given
points (e.g., age in whole years between 30 and 50)
Basic Concepts
Independent variables -- purported causes
Dependent variables -- purported effects
 Example: Two instructional strategies, co-operative groups and traditional lectures, were used during a three-week social studies unit. Students' exam scores were analyzed for differences between the groups.
 The independent variable is the instructional approach (which has two levels)
 The dependent variable is the students' achievement
Obj. 2.3
Basic Concepts
A population – the entire group of elements that have at least one characteristic in common
A sample – a small group of observations selected from the total population
A parameter – a measure of a characteristic of an entire population
A statistic – a measure of a characteristic of a sample
Statistics – a method
Basic Concepts
Descriptive statistics – classify, organize,
and summarize numerical data about a
particular group of observations (e.g., a
number of students in HK, the mean maths
grade, ethnic make-up of students)
Inferential statistics – involve selecting a
sample from a defined population and
studying it in order to draw conclusions about that population.
These two types of statistics are not mutually
exclusive
Probability and Level of Significance
Studies yield statistical results which are
used to decide whether to retain or reject
the null hypothesis
The decision is made in terms of
probability, not certainty
Once we obtain a sample statistic, we
compare the obtained value to the
appropriate critical value (from tables)
Most often, a probability level of 5% (p = .05)
is used as the threshold for statistical significance
Data Collection
Measurement scales
 Nominal – categories: gender, ethnicity, etc.
 Ordinal – ordered categories: rank in class, order of finish, etc.
 Interval – equal intervals: test scores, attitude scores, etc.
 Ratio – absolute zero: time, height, weight, etc.
Obj. 2.1
Measurement Scales

Watch videos from Learner.org
http://learner.org/resources/series158.html

Watch Video 5. Variation About the Mean
Statistical measures

Measures of central tendency or averages
 Mean
 Median -- a point in an array, above & below which one-half of the scores fall
 Mode -- the score that occurs most frequently in a distribution
Organizing Data
Source: http://www.learnactivity.com/lo/
Example
Here is a set of maths test scores (raw scores) for a class of 31 students:
37, 42, 52, 58, 58, 61, 65, 73, 74, 42, 39, 56, 66, 54, 57, 59,
69, 72, 67, 61, 51, 56, 63, 78, 45, 63, 72, 48, 63, 48, 54
Organizing measurements
Stem | Leaf
3 | 7 9
4 | 2 2 5 8 8
5 | 1 2 4 4 6 6 7 8 8 9
6 | 1 1 3 3 3 5 6 7 9
7 | 2 2 3 4 8
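The stem-and-leaf layout can be reproduced mechanically. The following is a minimal sketch, assuming two-digit scores so that the tens digit serves as the stem and the units digit as the leaf; the function name `stem_and_leaf` is illustrative only.

```python
from collections import defaultdict

# The 31 raw maths test scores from the example.
scores = [37, 42, 52, 58, 58, 61, 65, 73, 74, 42, 39, 56, 66, 54, 57, 59,
          69, 72, 67, 61, 51, 56, 63, 78, 45, 63, 72, 48, 63, 48, 54]


def stem_and_leaf(values):
    """Group two-digit scores by tens digit (stem); units digits are the leaves."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(plot)


plot = stem_and_leaf(scores)
for stem in sorted(plot):
    print(stem, "|", " ".join(str(leaf) for leaf in plot[stem]))
```

Sorting the scores first means the leaves in each row come out in ascending order, which makes the middle score easy to locate by counting.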
Organizing measurements – frequency tables

Test Score  Frequency  Midpoint  Cumulative  Percent    Cumulative
                                 Frequency   Frequency  Percentage
36-40       2          38        31          6          100
41-45       3          43        29          10         94
46-50       2          48        26          6          84
51-55       4          53        24          13         77
56-60       6          58        20          19         65
61-65       6          63        14          19         45
66-70       3          68        8           10         26
71-75       4          73        5           13         16
76-80       1          78        1           3          3
N = 31
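A grouped frequency table can be computed directly from the 31 listed scores. This is a sketch under stated assumptions: class intervals of width 5 starting at 36, and cumulative counts accumulated from the highest class downward (so the lowest class shows the total), as in the table. The function name and dictionary keys are illustrative.

```python
# The 31 raw maths test scores from the example.
scores = [37, 42, 52, 58, 58, 61, 65, 73, 74, 42, 39, 56, 66, 54, 57, 59,
          69, 72, 67, 61, 51, 56, 63, 78, 45, 63, 72, 48, 63, 48, 54]


def frequency_table(values, start=36, width=5, classes=9):
    """Build one row per class interval: frequency, midpoint, cumulative figures."""
    n = len(values)
    rows = []
    for i in range(classes):
        low = start + i * width
        high = low + width - 1
        freq = sum(low <= v <= high for v in values)
        at_or_above = sum(v >= low for v in values)  # accumulated from the top class down
        rows.append({
            "class": f"{low}-{high}",
            "frequency": freq,
            "midpoint": (low + high) // 2,
            "cumulative": at_or_above,
            "percent": round(100 * freq / n),
            "cumulative_percent": round(100 * at_or_above / n),
        })
    return rows


rows = frequency_table(scores)
for row in rows:
    print(row["class"], row["frequency"], row["midpoint"],
          row["cumulative"], row["percent"], row["cumulative_percent"])
```

Because the percentages are rounded to whole numbers, the percent column need not sum to exactly 100.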
Organizing measurements – Histogram
[Figure: histogram of the test scores. X-axis: Test Score (classes 36-40 to 76-80); Y-axis: Frequency (0 to 7).]
Organizing measurements – Mean

The mean: $\bar{X} = \frac{\Sigma X}{N}$
$\bar{X}$ = mean
$\Sigma$ = sum of
$X$ = scores in a distribution
$N$ = number of scores
It is the base from which many measures are computed.
$\bar{X} = \frac{37 + 58 + 74 + \dots + 72 + 63}{31} = \frac{1803}{31} \approx 58$
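As a quick check of the arithmetic, the mean of the 31 scores can be computed as the sum of the scores divided by their number. A minimal sketch; the function name `mean` is illustrative.

```python
# The 31 raw maths test scores from the example.
scores = [37, 42, 52, 58, 58, 61, 65, 73, 74, 42, 39, 56, 66, 54, 57, 59,
          69, 72, 67, 61, 51, 56, 63, 78, 45, 63, 72, 48, 63, 48, 54]


def mean(values):
    """Sum of the scores divided by the number of scores."""
    return sum(values) / len(values)


mean_score = mean(scores)
print(sum(scores), len(scores), round(mean_score, 2))  # 1803 31 58.16
```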
Organizing measurements – Mode and Median
Mode -- the score that occurs most frequently in a distribution: 63
Median -- a point in an array, above & below which one-half of the scores fall: 58 (the 16th of the 31 scores when ordered)
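The median and mode of the 31 listed scores can be checked with Python's standard `statistics` module. A minimal sketch: with an odd number of scores, `statistics.median` returns the middle one of the ordered scores (here the 16th).

```python
import statistics

# The 31 raw maths test scores from the example.
scores = [37, 42, 52, 58, 58, 61, 65, 73, 74, 42, 39, 56, 66, 54, 57, 59,
          69, 72, 67, 61, 51, 56, 63, 78, 45, 63, 72, 48, 63, 48, 54]

median_score = statistics.median(scores)  # middle of the ordered scores
mode_score = statistics.mode(scores)      # most frequently occurring score

print(median_score, mode_score)  # 58 63
```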
Organizing measurements – Histogram
[Figure: histogram of the test scores with the positions of the mean, median, and mode marked. X-axis: Test Score (classes 36-40 to 76-80); Y-axis: Frequency (0 to 7).]
Statistical measures
Measures of spread or dispersion
 Range -- the difference between the highest and the lowest scores plus one
 Standard deviation – a measure of the typical distance of scores from the mean (also see calculator)
 Variance – the square of the standard deviation
 Z-score -- the number of standard deviations a score lies from the mean
 Z = (score - mean) / SD
Basic Formulas for Sample

Variance: $S^2 = \frac{\Sigma (X - \bar{X})^2}{n}$
Standard deviation: $S = \sqrt{S^2}$
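The measures of spread can be sketched in a few lines for the 31 maths scores used earlier. This follows the formulas as given here, dividing by n (many textbooks divide by n - 1 for a sample, which would give slightly larger values). The function names are illustrative.

```python
import math

# The 31 raw maths test scores from the example.
scores = [37, 42, 52, 58, 58, 61, 65, 73, 74, 42, 39, 56, 66, 54, 57, 59,
          69, 72, 67, 61, 51, 56, 63, 78, 45, 63, 72, 48, 63, 48, 54]


def inclusive_range(values):
    """Highest score minus lowest score, plus one."""
    return max(values) - min(values) + 1


def variance(values):
    """Mean of the squared deviations from the mean (divides by n)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def standard_deviation(values):
    """Square root of the variance."""
    return math.sqrt(variance(values))


score_range = inclusive_range(scores)
var = variance(scores)
sd = standard_deviation(scores)
print(score_range, round(var, 2), round(sd, 2))
```

For these scores the range is 42, the variance is roughly 112.6, and the standard deviation is roughly 10.6.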
Normal Distribution

Source: http://en.wikipedia.org/wiki/Normal_distribution
Source: http://noppa5.pc.helsinki.fi/koe/flash/histo/histograme.html
Z-Score Example
$z = \frac{X - \bar{X}}{S}$
Example: compare a student's performance on Maths and English tests if the student's scores, class means and standard deviations for the classes are known.

Subject   Student's Score   Class Mean   Class S
English   50                45           5
Maths     68                56           6

$z_{English} = \frac{50 - 45}{5} = +1$
$z_{Maths} = \frac{68 - 56}{6} = +2$
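The same comparison can be written as a small function. A minimal sketch using the figures from the example; the function name `z_score` is illustrative.

```python
def z_score(score, mean, sd):
    """Number of standard deviations a score lies from the class mean."""
    return (score - mean) / sd


z_english = z_score(50, 45, 5)  # English: score 50, class mean 45, class S 5
z_maths = z_score(68, 56, 6)    # Maths: score 68, class mean 56, class S 6

print(z_english, z_maths)  # 1.0 2.0
```

Since the Maths z-score is the larger of the two, the student performed better relative to the class in Maths than in English, even though the raw scores are on different scales.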
Z-Score Example
[Figure showing zEnglish and zMaths]
Z score vs. T score, and Percentile Rank
Correlational Studies

Attempts to describe the predictive relationships between or among variables
 The predictor variable is the variable from which the researcher is predicting
 The criterion variable is the variable to which the researcher is predicting
Objectives 10.1 & 10.2
Relationship Studies

General purpose
 Gain insight into variables that are related to other variables relevant to educators
  Achievement
  Self-esteem
  Self-concept
Two specific purposes
 Suggest subsequent interest in establishing cause and effect between variables found to be related
 Control for variables related to the dependent variable in experimental studies
Objectives 5.1 & 5.2
Correlational Data
Income/month ($)   Expenditure/month ($)
4000               4000
4000               5000
5000               6000
2000               2000
9000               6000
4000               2000
7000               5000
8000               6000
9000               9000
5000               3000
Scatter Diagram
[Figure: scatter diagram of monthly expenditure against monthly income. X-axis: Income (0 to 10000); Y-axis: Expenditure (0 to 10000); the point (2000, 2000) is labelled.]
Source: http://noppa5.pc.helsinki.fi/koe/corr/index.html
Correlation Coefficients
The general rule
 +.95 is a strong positive correlation
 +.50 is a moderate positive correlation
 +.20 is a low positive correlation (small correlation)
 -.26 is a low negative correlation
 -.49 is a moderate negative correlation
 -.95 is a strong negative correlation
Predictions
 Coefficients between .60 and .70 are adequate for group predictions
 Coefficients above .80 are adequate for individual predictions
Objective 3.3 & 3.5
Conducting a Prediction Study

Identify a set of variables
 Limit to those variables logically related to the criterion
Identify a population and select a sample
Identify appropriate instruments for measuring each variable
 Ensure appropriate levels of validity and reliability
Collect data for each instrument from each subject
 Typically data is collected at different points in time
Compute the results
 Regression coefficient
 Regression equation
Hypotheses for Correlation
$H_0: r = 0$
$H_A: r \neq 0$
Collecting Measurement
 Instrument – a tool used to collect data
 Test – a formal, systematic procedure for gathering information
 Assessment – the general process of collecting, synthesizing, and interpreting information
Obj. 3.1 & 3.2
The Process

Participant and instrument selection
 Minimum of 30 subjects
 Instruments must be valid and reliable
  Higher validity and reliability permit smaller samples
  Lower validity and reliability require larger samples
Design and procedures
 Collect data on two or more variables for each subject
Data analysis
 Compute the appropriate correlation coefficient
Objectives 2.2 & 2.3
Selection of a Test

Sources of test information, e.g.:
 Mental Measurements Yearbooks (MMY)
  Buros Institute
 ETS Test Collection
Types of Correlation Coefficients

The type of correlation coefficient depends on the measurement level of the variables
 Pearson r – continuous predictor and criterion variables
  Math attitude and math achievement
 Spearman rho – ranked or ordinal predictor and criterion variables
  Rank in class and rank on a final exam
 Phi coefficient – dichotomous predictor and criterion variables
  Gender and pass/fail status on a high stakes test
Objectives 7.1, 7.2, & 7.3
Calculating Pearson Correlation Coefficient
Z-score formula
$r = \frac{\Sigma z_x z_y}{N}$
Raw score formula
$r = \frac{N \Sigma XY - (\Sigma X)(\Sigma Y)}{\sqrt{(N \Sigma X^2 - (\Sigma X)^2)(N \Sigma Y^2 - (\Sigma Y)^2)}}$
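Both formulas can be sketched in Python and checked against each other. The data are the monthly income and expenditure figures from the Correlational Data slide, under the assumption that each income value is paired with the expenditure value that follows it in the listing. The function names are illustrative, and the z-scores use standard deviations computed with N in the denominator.

```python
import math

# Assumed pairing: (income, expenditure) read row by row from the data slide.
income = [4000, 4000, 5000, 2000, 9000, 4000, 7000, 8000, 9000, 5000]
expenditure = [4000, 5000, 6000, 2000, 6000, 2000, 5000, 6000, 9000, 3000]


def pearson_raw(x, y):
    """Raw score formula for the Pearson correlation coefficient."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


def pearson_z(x, y):
    """Z-score formula: mean of the products of paired z-scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    products = ((a - mean_x) / sd_x * (b - mean_y) / sd_y for a, b in zip(x, y))
    return sum(products) / n


r_raw = pearson_raw(income, expenditure)
r_z = pearson_z(income, expenditure)
print(round(r_raw, 2), round(r_z, 2))
```

Under the assumed pairing, both formulas give r of about .80, a strong positive correlation between income and expenditure.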
Just for information
Critical Values of the Pearson Product-Moment Correlation Coefficient:
 First you determine the degrees of freedom (df). For a correlation study, the degrees of freedom is 2 less than the number of subjects; for example, 27 subjects give df = 25. Use the critical value table to find the intersection of alpha .05 (see columns) and 25 degrees of freedom (see rows). The value found at the intersection (.381) is the minimum correlation coefficient needed to confidently state 95 times out of a hundred that the relationship you found with your subjects exists in the population from which they were drawn.
 If the absolute value of your correlation coefficient is above .381, you reject your null hypothesis (there is no relationship) and accept the alternative hypothesis: e.g., there is a statistically significant relationship between arm span and height, r (25) = .87, p < .05.
 If the absolute value of your correlation coefficient were less than .381, you would fail to reject your null hypothesis: There is not a statistically significant relationship between arm span and height, r (25) = .12, p > .05.
Source: http://www.gifted.uconn.edu/siegle/research/Correlation/alphaleve.htm
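The decision rule described above can be sketched as a small function. It does not look up the table itself: the critical value (.381 for 25 degrees of freedom at alpha .05, as quoted above) is passed in by the caller. The function name and the sample size of 27 subjects are assumptions chosen to match df = 25.

```python
def correlation_decision(r, n_subjects, critical_value):
    """Return (df, significant) for an observed correlation coefficient.

    df is 2 less than the number of subjects; the null hypothesis is
    rejected when |r| exceeds the critical value taken from a table.
    """
    df = n_subjects - 2
    significant = abs(r) > critical_value
    return df, significant


# Arm span and height examples from the text, assuming 27 subjects (df = 25).
print(correlation_decision(0.87, 27, 0.381))  # (25, True)  -> reject H0
print(correlation_decision(0.12, 27, 0.381))  # (25, False) -> fail to reject H0
```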
Prediction and Regression
The position of the line is determined by “b”, the slope (the angle of the line), and “a”, the intercept (the point where the line intersects the Y-axis).
Y = bX + a
Source: http://noppa5.pc.helsinki.fi/koe/corr/index.html
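As an illustration of Y = bX + a, the slope and intercept can be estimated by ordinary least squares, a standard way of fitting the line that is not spelled out in the slides. The data are the income and expenditure figures from the Correlational Data slide, assuming each income value is paired with the expenditure value that follows it in the listing. The function name is illustrative.

```python
# Assumed pairing: (income, expenditure) read row by row from the data slide.
income = [4000, 4000, 5000, 2000, 9000, 4000, 7000, 8000, 9000, 5000]
expenditure = [4000, 5000, 6000, 2000, 6000, 2000, 5000, 6000, 9000, 3000]


def regression_line(x, y):
    """Least-squares slope b and intercept a for the line Y = bX + a."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(p * q for p, q in zip(x, y))
    sum_x2 = sum(p * p for p in x)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n  # the line passes through (mean X, mean Y)
    return b, a


b, a = regression_line(income, expenditure)
predicted = b * 6000 + a  # predicted monthly expenditure for an income of 6000
print(round(b, 3), round(a, 1), round(predicted))
```

Under the assumed pairing the slope is about 0.72 and the intercept about 708, so predicted expenditure rises by roughly 72 cents for each additional dollar of income.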
Other Correlation Analyses

Multiple Regression
 Two or more variables are used to predict one criterion variable

Canonical correlation
 An extension of multiple regression in which more than one predictor variable and more than one criterion variable are used

Factor analysis
 A correlational analysis used to take a large number of variables and group them into a smaller number of clusters of similar variables called factors
References

Gay, L. R., Mills, G. E., & Airasian, P. (2006). Educational Research: Competencies for Analysis and Applications. Upper Saddle River, NJ: Pearson/Merrill Prentice Hall.
Ravid, R. (2000). Practical statistics for educators (2nd ed.). New York, NY: University Press of America, Inc.