Basic Measurement and Statistics in Testing


Outline
• Central Tendency and Dispersion
• Standardized Scores
• Error and Standard Error of Measurement (Sm)
• Item Analysis

Central Tendency and Dispersion

Central Tendency

Measures of central tendency are measures of
the location of the middle or the center of a
distribution. The definition of "middle" or "center"
is purposely left somewhat vague so that the
term "central tendency" can refer to a wide
variety of measures. The mean is the most
commonly used measure of central tendency.
Mean


The arithmetic mean is what is commonly called the
average. The mean is the sum of all the scores
divided by the number of scores.
The formula in summation notation is:
ΣX/N

The mean is a good measure of central tendency for
roughly symmetric distributions but can be
misleading in skewed distributions, since it can be
greatly influenced by scores in the tail. Other
statistics such as the median may therefore be more
informative for distributions such as reaction time or
family income, which are frequently very skewed.
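The effect of a tail score on the mean can be sketched in a few lines of Python. The reaction times below are hypothetical values chosen for illustration; they do not come from the slides.

```python
def mean(scores):
    """Arithmetic mean: the sum of the scores divided by the number of scores."""
    return sum(scores) / len(scores)

# Hypothetical reaction times in milliseconds (illustrative only).
symmetric = [480, 490, 500, 510, 520]
skewed = [480, 490, 500, 510, 2000]  # one extreme score in the tail

print(mean(symmetric))  # 500.0
print(mean(skewed))     # 796.0
```

A single extreme score pulls the mean well above four of the five scores, which is why the mean can mislead for skewed distributions.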
Median



The median is the middle of a distribution: half the
scores are above the median and half are below the
median.
The median is less sensitive to extreme scores than the
mean and this makes it a better measure than the mean
for highly skewed distributions.
Computation of Median
When there is an odd number of numbers, the median is
simply the middle number. For example, the median of 2,
4, and 7 is 4.
When there is an even number of numbers, the median
is the mean of the two middle numbers. Thus, the
median of the numbers 2, 4, 7, 12 is (4+7)/2 = 5.5.
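The two cases above (odd and even number of scores) can be written as a short Python sketch. The function name is an illustrative choice; the two examples are the ones from the slide.

```python
def median(scores):
    """Middle score of the sorted list, or the mean of the two middle scores."""
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of scores: the middle one is the median.
        return ordered[mid]
    # Even number of scores: average the two middle ones.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([2, 4, 7]))      # 4
print(median([2, 4, 7, 12]))  # 5.5
```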
Mode

The mode is the most frequently occurring score in a
distribution and is used as a measure of central
tendency. It is the only measure of central tendency that
can be used with nominal data.

The mode is highly subject to sampling fluctuations and is
therefore not recommended as the only
measure of central tendency. A further disadvantage of
the mode is that many distributions have more than one
mode. These distributions are called "multimodal."

In a normal distribution, the mean, median, and mode
are identical.
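As a rough sketch, the mode can be found by counting how often each score occurs. Returning a list allows for multimodal distributions; the sample data are hypothetical and only illustrate the idea.

```python
from collections import Counter

def modes(scores):
    """Return every score that occurs with the highest frequency."""
    counts = Counter(scores)
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)

print(modes([3, 5, 5, 7, 8]))          # [5]
print(modes([3, 5, 5, 7, 7, 8]))       # [5, 7]
print(modes(["red", "blue", "red"]))   # ['red']
```

The last call uses nominal data, for which neither the mean nor the median is defined.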
Spread, Dispersion, Variability


A variable's spread is the degree to which scores on the
variable differ from each other. If every score on the
variable were about equal, the variable would have very
little spread. There are many measures of spread. The
distributions shown below have the same mean but differ
in spread: The distribution on the bottom is more spread
out.
Variability and dispersion are synonyms for spread.
Spread/Dispersion
Range




The range is the simplest measure of spread or
dispersion: It is equal to the difference between the
largest and the smallest values.
The range can be a useful measure of spread because it
is so easily understood. However, it is very sensitive to
extreme scores since it is based on only two values.
The range should almost never be used as the only
measure of spread, but can be informative if used as a
supplement to other measures of spread.
Example:
The range of the numbers 1, 2, 4, 6, 12, 15, 19, 26 is 26 - 1 = 25
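A minimal Python sketch of the range, using the numbers from the example. The extra score of 90 in the second call is a hypothetical addition to show how one extreme value changes the result.

```python
def score_range(scores):
    """Difference between the largest and the smallest score."""
    return max(scores) - min(scores)

scores = [1, 2, 4, 6, 12, 15, 19, 26]
print(score_range(scores))         # 25
print(score_range(scores + [90]))  # 89
```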
Variance



The variance is a measure of how spread out a
distribution is. In other words, it is a measure of
variability.
The variance is computed as the average squared
deviation of each number from its mean.
Example of Calculation
For the numbers 1, 2, and 3, the mean is 2
and the variance will be:
$$\frac{(1-2)^2 + (2-2)^2 + (3-2)^2}{3} = \frac{2}{3} \approx 0.667$$
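The calculation above can be reproduced with a short Python sketch. It divides by the number of scores, as the worked example does; the function name is an illustrative choice.

```python
def variance(scores):
    """Average squared deviation of each score from the mean (divides by N)."""
    m = sum(scores) / len(scores)
    return sum((x - m) ** 2 for x in scores) / len(scores)

print(round(variance([1, 2, 3]), 3))  # 0.667
```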
Standard Deviation

The standard deviation formula is very simple: it is the
square root of the variance. It is the most commonly
used measure of spread.

In a normal distribution, about 68% of the scores are
within one standard deviation of the mean and about
95% of the scores are within two standard deviations of
the mean.

The standard deviation has proven to be an extremely
useful measure of spread in part because it is
mathematically tractable. Many formulas in inferential
statistics use the standard deviation.
• Different ways of calculating the standard deviation: the raw score method and the deviation method
• Standard deviation score and standard deviation value
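The slides name the two methods without giving their formulas. The sketch below assumes they refer to the usual definitional formula (deviations from the mean) and the usual computational formula (sums of raw scores and squared raw scores), both dividing by N as in the variance example. The function names are hypothetical.

```python
import math

def sd_deviation_method(scores):
    """Square root of the mean squared deviation from the mean."""
    n = len(scores)
    m = sum(scores) / n
    return math.sqrt(sum((x - m) ** 2 for x in scores) / n)

def sd_raw_score_method(scores):
    """Same quantity computed from the sum of X and the sum of X squared."""
    n = len(scores)
    sum_x = sum(scores)
    sum_x2 = sum(x * x for x in scores)
    return math.sqrt((sum_x2 - sum_x ** 2 / n) / n)

scores = [1, 2, 3]
print(round(sd_deviation_method(scores), 3))  # 0.816
print(round(sd_raw_score_method(scores), 3))  # 0.816
```

Both routes give the square root of the variance computed earlier (0.667), so the choice between them is one of convenience.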

Standardized Scores
Z scores and T scores and their uses
Standardized Scores: Z scores

Z-score
Z = (raw score - mean score) / standard deviation
Example:

ID   X    Mean   D    StdDv   Z
1    95   90      5   5        1
2    90   90      0   5        0
3    85   90     -5   5       -1
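The Z-score calculation for the three scores in the example can be sketched in Python as follows, taking the mean (90) and standard deviation (5) as given in the table.

```python
def z_score(raw, mean, sd):
    """Distance of a raw score from the mean, in standard deviation units."""
    return (raw - mean) / sd

for raw in [95, 90, 85]:
    print(raw, z_score(raw, mean=90, sd=5))
# 95 1.0
# 90 0.0
# 85 -1.0
```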
Standardized Scores: Z scores

Using the Z-score
Comparing between scores in two tests
Example: compare the previous scores with these:

ID   X   Mean   D       S      Z
1    3   5.67   -2.67   2.45   -1.09
2    6   5.67    0.33   2.45    0.13
3    8   5.67    2.33   2.45    0.95
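A small Python sketch of the comparison, using the means and standard deviations as given in the two tables. It assumes that the same ID refers to the same student on both tests, which the slides imply but do not state.

```python
def z_score(raw, mean, sd):
    """Distance of a raw score from the mean, in standard deviation units."""
    return (raw - mean) / sd

# (raw score, mean, standard deviation) as given in the two tables.
test_1 = [(95, 90, 5), (90, 90, 5), (85, 90, 5)]
test_2 = [(3, 5.67, 2.45), (6, 5.67, 2.45), (8, 5.67, 2.45)]

for student_id, (first, second) in enumerate(zip(test_1, test_2), start=1):
    z1 = round(z_score(*first), 2)
    z2 = round(z_score(*second), 2)
    print(student_id, z1, z2)
# 1 1.0 -1.09
# 2 0.0 0.13
# 3 -1.0 0.95
```

Although the raw scores are on very different scales, the Z scores show that student 1 did relatively better on the first test and student 3 on the second.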
Standardized scores: T scores

Z scores can look unfamiliar, especially negative ('-') scores
Formula for T-score: T = 10(Z) + 50

ID   X   Mean   D       Sd     Z       T
1    3   5.67   -2.67   2.45   -1.09   39.1
2    6   5.67    0.33   2.45    0.13   51.3
3    8   5.67    2.33   2.45    0.95   59.5
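The conversion from Z scores to T scores can be sketched in Python with the formula T = 10(Z) + 50, applied to the Z values in the table.

```python
def t_score(z):
    """Rescale a Z score to a mean of 50 and a standard deviation of 10."""
    return 10 * z + 50

for z in [-1.09, 0.13, 0.95]:
    print(z, round(t_score(z), 1))
# -1.09 39.1
# 0.13 51.3
# 0.95 59.5
```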
Error and Standard Error of Measurement (Sm)
• Every score has an error
• Error either adds to or subtracts from your true score
• True score = Obtained score +/- Error
• How to calculate error?
• $S_m = SD\sqrt{1 - r}$, where r is the reliability coefficient of the test

Example

Obtained score = 20; SD = 2; r = 0.64
$$S_m = SD\sqrt{1 - r} = 2\sqrt{1 - 0.64} = 2\sqrt{0.36} = 2 \times 0.6 = 1.2$$
True score = 20 - 1.2 = 18.8 and 20 + 1.2 = 21.2, or
between 18.8 and 21.2 (at 1 SEM)
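The worked example can be checked with a brief Python sketch. The function names are illustrative, and r is treated as the reliability coefficient of the test.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """Sm = SD * sqrt(1 - r)."""
    return sd * math.sqrt(1 - reliability)

def score_band(obtained, sem):
    """Band of one SEM on either side of the obtained score."""
    return obtained - sem, obtained + sem

sem = standard_error_of_measurement(sd=2, reliability=0.64)
low, high = score_band(20, sem)
print(round(sem, 2), round(low, 2), round(high, 2))  # 1.2 18.8 21.2
```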
Item Analysis
Item difficulty
Item discrimination
Distractor analysis
Item difficulty (p)
• How difficult is the item?
• Sometimes referred to as item facility.
• Used only with objective-type tests
• Number of students who got the item correct divided by the number of students who attempted the item.
• Every item has an item difficulty value
• Possible values range from 0 to 1; values near 0 indicate a difficult item and values near 1 an easy item

Example
30 students attempted the item (* marks the key)

Option   No. of students
A         4
B         0
C         8
*D       18

• Find p
$$p = \frac{\text{No. of students who got it right}}{\text{No. of students who attempted}} = \frac{18}{30} = 0.60$$
• Note, this is also equal to 60 percent correct
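A minimal Python sketch of the item difficulty calculation, using the option counts from the example. Storing the counts in a dictionary keyed by option letter is an illustrative choice.

```python
def item_difficulty(option_counts, key):
    """Proportion of students who attempted the item and chose the key."""
    attempted = sum(option_counts.values())
    return option_counts[key] / attempted

counts = {"A": 4, "B": 0, "C": 8, "D": 18}  # D is the key
print(item_difficulty(counts, key="D"))  # 0.6
```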

Item Discrimination (D)
• Used to discriminate between good and weak students
• Must determine the good and weak students first
• Number of good students who got the item correct minus the number of weak students who got it correct, divided by the number of students in either group
• Every item has an item discrimination value, which ranges from -1 to 1

Example

Total number of students = 45
Number of students in Upper Group and Lower Group = 15 each

Options      A   B   C   *D
Upper (Ug)   2   0   3   10
Lower (Lg)   2   1   6    6

Compute D
$$D = \frac{\text{No. in Ug correct} - \text{No. in Lg correct}}{\text{No. of students in either group}} = \frac{10 - 6}{15} = 0.267$$
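The same calculation as a short Python sketch, using the upper and lower group counts for the key (option D) from the example.

```python
def item_discrimination(upper_correct, lower_correct, group_size):
    """D = (No. in Ug correct - No. in Lg correct) / No. in either group."""
    return (upper_correct - lower_correct) / group_size

print(round(item_discrimination(10, 6, 15), 3))  # 0.267
```

A negative value would mean that more weak students than good students answered the item correctly.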
Deciding on Good and Bad Items
• Item difficulty
• Item discrimination
• Check for miskeying, ambiguity and guessing
• Evidence for miskeying: more students chose a distractor than the key
• Guessing: responses spread equally across the options
• Ambiguity: an equal number of students chose one distractor and the key
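The three checks could be turned into a rough screening function such as the Python sketch below. The slides give no cut-offs, so the `tolerance` parameter (how close two counts must be to count as "equal") and the function name are assumptions, and any flag is a prompt to inspect the item rather than a verdict.

```python
def flag_item(option_counts, key, tolerance=1):
    """Return the checks an item trips, given the count for each option."""
    counts = list(option_counts.values())
    # Guessing: responses spread (almost) equally across all options.
    if max(counts) - min(counts) <= tolerance:
        return ["possible guessing"]
    key_count = option_counts[key]
    distractor_counts = [c for option, c in option_counts.items() if option != key]
    flags = []
    # Miskeying: some distractor was chosen more often than the key.
    if any(c > key_count for c in distractor_counts):
        flags.append("possible miskeying")
    # Ambiguity: some distractor was chosen about as often as the key.
    if any(abs(c - key_count) <= tolerance for c in distractor_counts):
        flags.append("possible ambiguity")
    return flags

# Lower group from the item discrimination example (D is the key).
lower_group = {"A": 2, "B": 1, "C": 6, "D": 6}
print(flag_item(lower_group, key="D"))  # ['possible ambiguity']

# Item difficulty example: most students chose the key, so nothing is flagged.
print(flag_item({"A": 4, "B": 0, "C": 8, "D": 18}, key="D"))  # []
```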
