Items analysis
1. Introduction
• Items can adopt different formats and assess cognitive
variables (skills, performance, etc.) where there are
right and wrong answers, or non-cognitive variables
(attitudes, interests, values, etc.) where there are no
right answers.
• The statistics that we present are used primarily with
skills or performance items.
2
1. Introduction
• To carry out the analysis of the items, the following must be available:
– A data matrix with the subjects' responses to each item.
• To analyze test scores and the responses to the correct alternative, the
matrix will take the form of ones (hits) and zeros (mistakes).
• To analyze the incorrect alternatives, the matrix must record the specific
option selected by each subject.
• The analysis of the correct alternative (which offers more information about
the quality of the test) allows us to obtain the indices of difficulty,
discrimination, reliability, and validity of the item.
3
1. Introduction
• Empirical difficulty of an item: proportion of subjects who answer
it correctly.
• Discriminative power: the ability of the item to distinguish between
subjects with different levels in the measured trait.
• Both statistics are directly related to the mean and variance of
total test scores.
• The reliability and validity of the items are related to the standard
deviation of the test and indicate the possible contribution of each
item to the reliability and validity of total scores of the test.
4
2. Items difficulty
• Proportion of subjects who have responded correctly to the item:
– One of the most popular indices to quantify the difficulty of dichotomous or
dichotomized items.
• The difficulty thus considered is relative because it depends on:
– The number of people who attempt to answer the item.
– Their characteristics.

$ID = \frac{A}{N}$

A: number of subjects who answer the item correctly.
N: number of subjects who attempt to answer the item.
• It ranges between 0 and 1.
– 0: no subject has answered the item correctly. The item is difficult.
– 1: all subjects have answered the item correctly. The item is easy.
5
2. Items difficulty
• Example: A performance item in mathematics is applied to 10 subjects, with
the following results:

Subject:  a  b  c  d  e  f  g  h  i  j
Answer:   1  1  1  1  0  1  0  1  1  0

$ID = \frac{7}{10} = 0.70$

• The obtained value does not indicate whether the item is good or bad. It
represents how hard the item has been for the sample of subjects who
attempted to answer it.
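A minimal Python sketch of this computation (not from the slides; variable
names are illustrative):

# Difficulty index of a dichotomous item: ID = A / N
responses = [1, 1, 1, 1, 0, 1, 0, 1, 1, 0]  # 1 = hit, 0 = mistake

A = sum(responses)   # number of hits
N = len(responses)   # number of subjects who attempted the item
ID = A / N
print(ID)            # 0.7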
6
2. Items difficulty
• The ID is directly related to the mean and variance of the test. In
dichotomous items:
$ID = \frac{\sum_{j=1}^{N} X_j}{N}$

$X_j = 1$ or $0$ according to success or failure on the item, and
$\sum_{j=1}^{N} X_j = A$ (hits).
– The sum of all scores obtained by subjects in this item is equal to the
number of hits. Therefore, the item difficulty index is equal to its mean.
• If we generalize to the total test, the average of the test scores
is equal to the sum of the difficulty indices of items.
7
2. Items difficulty
• The relationship between difficulty and variance of the test is
also direct. In dichotomous items:
$S_j^2 = p_j q_j$

$p_j$ = proportion of subjects who answer the item correctly (the ID).
$q_j = 1 - p_j$
• In item analysis, a relevant question is which value of $p_j$ maximizes the
variance of an item.
– Maximum variance is achieved by an item when its $p_j$ is 0.5, as sketched
below.
– An item is most informative when different subjects give different answers
to it.
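A quick check of that claim (standard calculus, not part of the original
slides):

$\frac{d}{dp}\big[p(1-p)\big] = 1 - 2p = 0 \;\Rightarrow\; p = 0.5, \qquad S_j^2\big|_{p=0.5} = 0.5 \times 0.5 = 0.25$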
8
2. Items difficulty
2.1. Correction of hits by chance
• Answering an item correctly depends not only on whether subjects know the
answer, but also on luck: subjects who do not know it may guess the correct
option.
• The higher the number of distractors, the less likely it is that subjects
hit the item by chance.
• It is advisable to correct the ID:

$ID_c = \frac{A - \dfrac{E}{K-1}}{N} = p - \frac{q}{K-1}$

$ID_c$ = corrected ID (negative values can be found)
A = hits
E = mistakes
p = proportion of hits
q = proportion of mistakes
K = number of item alternatives
N = number of subjects who attempt to answer the item
9
2. Items difficulty
2.1. Correction of hits by chance
Example. Test composed of items with 3 alternatives. Calculate ID and IDc for
each item.

Subjects  Item 1  Item 2  Item 3  Item 4  Item 5
A         1       1       1       1       1
B         1       0       1       0       1
C         1       1       0       1       0
D         1       0       0       1       0
E         0       1       0       1       1
F         1       0       0       1       0
G         0       1       1       1       0
H         1       0       0       1       0
I         1       1       0       0       0
J         0       0       0       1       1
ID
IDc
10
2. Items difficulty
2.1. Correction of hits by chance
Solution (same data as the previous table):

          Item 1  Item 2  Item 3  Item 4  Item 5
ID        0.7     0.5     0.3     0.8     0.4
IDc       0.55    0.25    -0.05   0.7     0.1

The items that have undergone the largest correction are those that proved
most difficult.
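A minimal Python sketch that reproduces these values (the data dictionary
simply re-encodes the table above; names are illustrative):

# ID and corrected ID (IDc = p - q/(K-1)) for the 3-alternative example above.
data = {  # 1 = hit, 0 = mistake, subjects A..J
    "Item 1": [1, 1, 1, 1, 0, 1, 0, 1, 1, 0],
    "Item 2": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
    "Item 3": [1, 1, 0, 0, 0, 0, 1, 0, 0, 0],
    "Item 4": [1, 0, 1, 1, 1, 1, 1, 1, 0, 1],
    "Item 5": [1, 1, 0, 0, 1, 0, 0, 0, 0, 1],
}
K = 3  # number of alternatives per item
for name, r in data.items():
    p = sum(r) / len(r)
    q = 1 - p
    idc = p - q / (K - 1)
    print(f"{name}: ID = {p:.2f}, IDc = {idc:.2f}")
# Item 1: ID = 0.70, IDc = 0.55 ... Item 3: ID = 0.30, IDc = -0.05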
11
2. Items difficulty
2.1. Correction of hits by chance
Recommendations
• Items whose difficulty index takes extreme values for the population they
are targeted at should be eliminated from the final test.
• In aptitude tests we will obtain better psychometric results if the majority
of the items are of medium difficulty.
– Easy items should also be included, preferably at the beginning (to measure
the less competent subjects).
– And difficult items too (to measure the more competent subjects).
12
3. Discrimination
• Logic: in a good item, subjects with high test scores answer it correctly
in a higher proportion than those with low scores.
• If an item is not useful to differentiate between
subjects based on their skill level (does not
discriminate between subjects) it should be
deleted.
13
3. Discrimination
3.1. Index of discrimination based on extreme
groups (D)
• It is based on the proportions of hits in extreme skill groups (the upper
and lower 25% or 27% of the total sample).
– The upper 25% or 27% is formed by the subjects who scored above the 75th or
73rd percentile in the total test.
• Once the groups are formed, we calculate the proportion of correct answers
to a given item in each group and apply the following equation:

$D = p_s - p_i$

$p_s$ = proportion of hits in the upper group
$p_i$ = proportion of hits in the lower group
14
3. Discrimination
3.1. Index of discrimination based on extreme
groups (D)
• D index ranges between -1 and 1.
– 1 = all people in the upper group hit the item and all people in the lower
group fail it.
– 0 = the item is hit equally in both groups.
– Negative values = less competent subjects hit the item more often than the
most competent ones (the item confuses the more skilled subjects).
15
3. Discrimination
3.1. Index of discrimination based on extreme
groups (D)
Example. Answers given by 370 subjects to the 3 alternatives (A, B, C) of an
item where B is the correct option. Rows show the frequency of subjects who
selected each alternative in the upper 27%, the lower 27%, and the central 46%
of the sample according to their total test score.

                    A     B*    C
Upper 27%           19    53    28
Intermediate 46%    52    70    48
Lower 27%           65    19    16

Calculate the corrected index of difficulty and the discrimination index.
16
3. Discrimination
3.1. Index of discrimination based on extreme
groups (D)
• Proportion of correct answers: $p = (53+70+19)/370 = 0.38$
• Proportion of mistakes: $q = 228/370 = 0.62$

$ID_c = p - \frac{q}{K-1} = 0.38 - \frac{0.62}{3-1} = 0.07$

$D = p_s - p_i = \frac{53}{19+53+28} - \frac{19}{65+19+16} = 0.53 - 0.19 = 0.34$
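A minimal Python sketch of the same computation (the row dictionaries
re-encode the table above; names are illustrative):

# Corrected difficulty and D index for the 370-subject example.
upper = {"A": 19, "B": 53, "C": 28}   # upper 27%
middle = {"A": 52, "B": 70, "C": 48}  # central 46%
lower = {"A": 65, "B": 19, "C": 16}   # lower 27%
correct, K, N = "B", 3, 370

hits = upper[correct] + middle[correct] + lower[correct]
p = hits / N
q = 1 - p
IDc = p - q / (K - 1)

D = upper[correct] / sum(upper.values()) - lower[correct] / sum(lower.values())
print(round(IDc, 2), round(D, 2))  # 0.08 0.34 (the slides give 0.07 because they round p and q first)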
17
3. Discrimination
3.1. Index of discrimination based on extreme
groups (D)
Interpretation of D values (Ebel, 1965)

Values               Interpretation
D ≥ 0.40             The item discriminates very well
0.30 ≤ D ≤ 0.39      The item discriminates well
0.20 ≤ D ≤ 0.29      The item discriminates slightly
0.10 ≤ D ≤ 0.19      The item needs revision
D < 0.10             The item is useless
18
3. Discrimination
3.2. Indices of discrimination based
on the correlation
• If an item discriminates adequately, the correlation between the scores
obtained by the subjects on that item and their scores on the total test will
be positive.
– Subjects who score high on the test are more likely to hit the item.
• Definition: correlation between the subjects' scores on the item and their
scores on the test (Muñiz, 2003).
• The total test score is calculated after discounting the score on the item
being analyzed.
19
3. Discrimination
3.2. Indices of discrimination based on the correlation
3.2.1. Correlation coefficient Φ
• Used when both the item score and the criterion are strictly dichotomous.
• It allows us to estimate the discrimination of an item with respect to some
criterion of interest (e.g., pass/fail, sex, etc.).
• First, we have to arrange the data in a 2x2 contingency table.
– 1 = item is hit / criterion is met.
– 0 = item is failed / criterion is not met.

$\Phi = \frac{p_{xy} - p_x p_y}{\sqrt{p_x q_x p_y q_y}}$
20
3. Discrimination
3.2. Indices of discrimination based on the correlation
3.2.1. Correlation coefficient Φ
Example. The following table shows the results of 50 subjects on item 5 (X)
and on the criterion (Y), the last psychometrics exam.

                       Item 5 (X)
Criterion (Y)          1      0
Fit                    30     5        p_y = 35/50 = 0.7
Not fit                5      10       q_y = 15/50 = 0.3
                       p_x = 35/50 = 0.7
                       q_x = 15/50 = 0.3
                                        N = 50
p_xy = 30/50 = 0.6
21
3. Discrimination
3.2. Indices of discrimination based on the correlation
3.2.1. Correlation coefficient Φ
$\Phi = \frac{p_{xy} - p_x p_y}{\sqrt{p_x q_x p_y q_y}} = \frac{0.6 - 0.7 \times 0.7}{\sqrt{0.7 \times 0.3 \times 0.7 \times 0.3}} = 0.52$
• There is a high correlation between the item
and the criterion. That is, those subjects who
hit the item usually pass the psychometrics
exam.
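A minimal Python sketch of the phi computation (counts taken from the 2x2
table above; names are illustrative):

# Phi coefficient for item 5 vs. the pass/fail criterion.
from math import sqrt

hit_fit, miss_fit = 30, 5      # criterion met (fit)
hit_unfit, miss_unfit = 5, 10  # criterion not met
N = 50

p_xy = hit_fit / N                   # 0.6
p_x = (hit_fit + hit_unfit) / N      # 0.7
p_y = (hit_fit + miss_fit) / N       # 0.7
q_x, q_y = 1 - p_x, 1 - p_y

phi = (p_xy - p_x * p_y) / sqrt(p_x * q_x * p_y * q_y)
print(round(phi, 2))  # 0.52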
22
3. Discrimination
3.2. Indices of discrimination based on the correlation
3.2.2. Point-biserial correlation
• When the item is a dichotomous variable and the test
score is continuous.
$r_{pb} = \frac{\bar{X}_1 - \bar{X}_T}{S_X}\sqrt{\frac{p}{q}}$

$\bar{X}_1$ = mean test score of the participants who answered the item correctly.
$\bar{X}_T$ = mean of the test.
$S_X$ = standard deviation of the test.
p = proportion of participants who answered the item correctly.
q = proportion of participants who answered the item incorrectly.
• The item score is removed from the test score before computing.
23
3. Discrimination
3.2. Indices of discrimination based on the correlation
3.2.2. Point-biserial correlation
Example. The following table shows the responses of 5 subjects to 4
items. Calculate the point-biserial correlation of the second item.
              Items
Participants  1   2   3   4
A             0   1   0   1
B             1   1   0   1
C             1   1   1   1
D             0   0   0   1
E             1   1   1   0
24
3. Discrimination
3.2. Indices of discrimination based on the correlation
3.2.2. Point-biserial correlation
              Items               Total
Participants  1   2   3   4    X    (X-i)   (X-i)²
A             0   1   0   1    2    1       1
B             1   1   0   1    3    2       4
C             1   1   1   1    4    3       9
D             0   0   0   1    1    1       1
E             1   1   1   0    3    2       4
∑                                   9       19

(X-i): total score after removing the item 2 score.
25
3. Discrimination
3.2. Indices of discrimination based on the correlation
3.2.2. Point-biserial correlation
• The participants who answered the item correctly are A, B, C and E, so their
mean (on the test without item 2) is:

$\bar{X}_1 = \frac{1 + 2 + 3 + 2}{4} = 2$

• The total mean is:

$\bar{X}_T = \frac{9}{5} = 1.8$

• The standard deviation of the test is:

$S_X = \sqrt{\frac{\sum X^2}{N} - \bar{X}_T^2} = \sqrt{\frac{19}{5} - 1.8^2} = \sqrt{0.56} = 0.75$
26
3. Discrimination
3.2. Indices of discrimination based on the correlation
3.2.2. Point-biserial correlation
$p = \frac{4}{5} = 0.8 \qquad q = \frac{1}{5} = 0.2$

$r_{pb} = \frac{\bar{X}_1 - \bar{X}_T}{S_X}\sqrt{\frac{p}{q}} = \frac{2 - 1.8}{0.75}\sqrt{\frac{0.8}{0.2}} \approx 0.53$
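A minimal Python sketch of the point-biserial computation (the data re-encodes
the 5x4 table above; names are illustrative):

# Point-biserial discrimination of item 2.
from math import sqrt

items = {
    "A": [0, 1, 0, 1],
    "B": [1, 1, 0, 1],
    "C": [1, 1, 1, 1],
    "D": [0, 0, 0, 1],
    "E": [1, 1, 1, 0],
}
j = 1  # zero-based index of item 2

# Test scores with the analyzed item removed
totals = {s: sum(r) - r[j] for s, r in items.items()}
scores = list(totals.values())

n = len(scores)
mean_T = sum(scores) / n
sd = sqrt(sum(x * x for x in scores) / n - mean_T ** 2)

hit = [totals[s] for s, r in items.items() if r[j] == 1]
p = len(hit) / n
q = 1 - p
mean_1 = sum(hit) / len(hit)

r_pb = (mean_1 - mean_T) / sd * sqrt(p / q)
print(round(r_pb, 2))  # ≈ 0.53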
27
3. Discrimination
3.2. Indices of discrimination based on the correlation
3.2.3. Biserial correlation
• Used when both the item and the test score are inherently continuous
variables, although one of them (the item) has been dichotomized.

$r_b = \frac{\bar{X}_1 - \bar{X}_T}{S_X}\cdot\frac{p}{y}$

y = ordinate (height) of the normal curve at the standard score that leaves
below it a probability equal to p (see table).
• Values greater than 1 can be found, especially when one of the variables is
not normal.
• Example. Using the table of the previous example, calculate the biserial
correlation of item 3.
28
3. Discrimination
3.2. Indices of discrimination based on the correlation
3.2.3. Biserial correlation
              Items               Total
Participants  1   2   3   4    X    (X-i)   (X-i)²
A             0   1   0   1    2    2       4
B             1   1   0   1    3    3       9
C             1   1   1   1    4    3       9
D             0   0   0   1    1    1       1
E             1   1   1   0    3    2       4
∑                                   11      27
29
3. Discrimination
3.2. Indices of discrimination based on the correlation
3.2.3. Biserial correlation
• The participants who answered the item correctly are C and E, so their mean
(on the test without item 3) is:

$\bar{X}_1 = \frac{3 + 2}{2} = 2.5$

• The total mean is:

$\bar{X}_T = \frac{11}{5} = 2.2$

• The standard deviation of the test is:

$S_X = \sqrt{\frac{\sum X^2}{N} - \bar{X}_T^2} = \sqrt{\frac{27}{5} - 2.2^2} = \sqrt{5.4 - 4.84} = \sqrt{0.56} = 0.75$
30
3. Discrimination
3.2. Indices of discrimination based on the correlation
3.2.3. Biserial correlation
$p = \frac{2}{5} = 0.4$

$r_b = \frac{\bar{X}_1 - \bar{X}_T}{S_X}\cdot\frac{p}{y} = \frac{2.5 - 2.2}{0.75}\cdot\frac{0.4}{0.3863} = 0.4 \times 1.035 \approx 0.41$

Because the value p = 0.4 does not appear in the first column of the table, we
look up its complement (0.6), which is associated with an ordinate y = 0.3863.
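A minimal Python sketch that replaces the normal-curve table with SciPy
(assuming scipy is available; names are illustrative):

# Biserial discrimination of item 3, using the normal ordinate at probability p.
from math import sqrt
from scipy.stats import norm

totals = [2, 3, 3, 1, 2]     # test scores with item 3 removed (A..E)
correct = [0, 0, 1, 0, 1]    # item 3 responses (A..E)

n = len(totals)
mean_T = sum(totals) / n
sd = sqrt(sum(x * x for x in totals) / n - mean_T ** 2)
hit = [t for t, c in zip(totals, correct) if c == 1]
p = len(hit) / n
mean_1 = sum(hit) / len(hit)

y = norm.pdf(norm.ppf(p))    # ordinate of the normal curve at probability p
r_b = (mean_1 - mean_T) / sd * (p / y)
print(round(r_b, 2))  # ≈ 0.42 (0.41 in the slides, which round S_X to 0.75)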
31
3. Discrimination
3.3. Discrimination in attitude items
• There are no right or wrong answers; instead, each participant is placed
along the established continuum according to the degree of the measured
attribute.
• Discrimination is the correlation between item scores and test scores.
– Because the items are not dichotomous, the Pearson correlation coefficient
is used.
• That coefficient can be interpreted as a Homogeneity Index (HI): it
indicates to what extent the item measures the same dimension or attitude as
the rest of the items of the scale.
32
3. Discrimination
3.3. Discrimination in attitude items
$R_{jx} = \frac{N\sum JX - \sum J \sum X}{\sqrt{\left[N\sum J^2 - \left(\sum J\right)^2\right]\left[N\sum X^2 - \left(\sum X\right)^2\right]}} = \frac{COV(JX)}{S_j S_x}$

N = sample size
$\sum J$ = sum of the subjects' scores on item J
$\sum X$ = sum of the subjects' scores on the scale
$R_{jx}$ = correlation between the scores obtained by the subjects on item J and on the scale

• Items whose HI is below 0.20 should be eliminated.
• Correction: subtract the item score from the total score, or apply the
formula below:

$R_{j(x-j)} = \frac{R_{jx} S_x - S_j}{\sqrt{S_x^2 + S_j^2 - 2 R_{jx} S_x S_j}}$
33
3. Discrimination
3.3. Discrimination in attitude items
Example. The table below presents the answers of 5 people to 4 attitude items.
Calculate the discrimination of item 4 using the Pearson correlation.

Subjects  X1  X2  X3  X4   Total XT   X4·XT   X4²   XT²
A         2   4   4   3    13         39      9     169
B         3   4   3   5    15         75      25    225
C         5   2   4   3    14         42      9     196
D         3   5   2   4    14         56      16    196
E         4   5   2   5    16         80      25    256
∑                     20   72         292     84    1042
34
3. Discrimination
3.3. Discrimination in attitude items
• The correlation or HI between item 4 and the total score of the test is:

$R_{jx} = \frac{5 \times 292 - 20 \times 72}{\sqrt{[5 \times 84 - 20^2][5 \times 1042 - 72^2]}} = 0.88$

• The result is inflated because the item 4 score is included in the total
score. Correction:
– Standard deviations of item 4 and of the total score:

$S_{x_4} = \sqrt{\frac{3^2+5^2+3^2+4^2+5^2}{5} - 4^2} = \sqrt{0.80} = 0.89$

$S_{x_T} = \sqrt{\frac{13^2+15^2+14^2+14^2+16^2}{5} - 14.4^2} = \sqrt{1.04} = 1.02$

$R_{j(x-j)} = \frac{R_{jx} S_x - S_j}{\sqrt{S_x^2 + S_j^2 - 2 R_{jx} S_x S_j}} = \frac{0.88 \times 1.02 - 0.89}{\sqrt{1.04 + 0.80 - 2 \times 0.88 \times 1.02 \times 0.89}} = 0.01$
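A minimal Python sketch of the HI and its correction (data from the table
above; names are illustrative):

# Pearson item-total correlation (HI) for item 4 and its corrected version.
from math import sqrt

X4 = [3, 5, 3, 4, 5]          # item 4 scores (subjects A..E)
XT = [13, 15, 14, 14, 16]     # total scale scores
N = len(X4)

def mean(v): return sum(v) / N
def sd(v): return sqrt(sum(x * x for x in v) / N - mean(v) ** 2)

cov = sum(j * x for j, x in zip(X4, XT)) / N - mean(X4) * mean(XT)
r_jx = cov / (sd(X4) * sd(XT))

# Corrected correlation: remove the influence of the item on the total score
r_corr = (r_jx * sd(XT) - sd(X4)) / sqrt(sd(XT) ** 2 + sd(X4) ** 2
                                         - 2 * r_jx * sd(XT) * sd(X4))
print(round(r_jx, 2), round(r_corr, 2))  # 0.88 0.0 (the slides obtain 0.01 due to rounding)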
35
3. Discrimination
3.3. Discrimination in attitude items
• The large difference produced by the correction is due to the small number
of items used in the example.
– As the number of items increases, this effect decreases because the
influence of each item score on the total score becomes smaller. With more
than 25 items, the corrected and uncorrected values are very similar.
36
3. Discrimination
3.3. Discrimination in attitude items
• Another procedure:
– Useful, but less efficient than the previous one because it does not use the
entire sample.
– Determine whether the item mean for the subjects with the highest scores on
the total test is statistically higher than the mean for those with the lowest
scores. It is common to use the 25% or 27% of subjects with the best and worst
scores.
– Once the groups are identified, we test whether the mean difference is
statistically significant with a Student t test.
– H0: the means of both groups are equal.

$T = \frac{\bar{X}_{uj} - \bar{X}_{lj}}{\sqrt{\dfrac{(n_u - 1)S_{uj}^2 + (n_l - 1)S_{lj}^2}{n_u + n_l - 2}\left(\dfrac{1}{n_u} + \dfrac{1}{n_l}\right)}}$
37
3. Discrimination
3.3. Discrimination in attitude items
$\bar{X}_{uj}$ = mean of the item scores of the 25% of participants with the highest test scores.
$\bar{X}_{lj}$ = mean of the item scores of the 25% of participants with the lowest test scores.
$S_{uj}^2$ = variance of the item scores of the 25% of participants with the highest test scores.
$S_{lj}^2$ = variance of the item scores of the 25% of participants with the lowest test scores.
$n_u$ and $n_l$ = number of participants in the upper and the lower group, respectively.
– Conclusions:
• T ≤ T(α, nu+nl-2) → The null hypothesis is accepted. There are no
statistically significant differences between the means. The item does not
discriminate adequately.
• T > T(α, nu+nl-2) → The null hypothesis is rejected. There are statistically
significant differences between the means. The item discriminates adequately.
– The Student t test is used when the scores on the item and on the scale are
normally distributed and their variances are equal. If any of these
assumptions is violated, a non-parametric test should be used (e.g.,
Mann-Whitney U).
3. Discrimination
3.3. Discrimination in attitude items
Exercise: using the data presented in the last example, apply the Student t
test to item 2 (α = 0.05).
• To calculate the discrimination of item 2 with the Student t test, we have
to form groups with extreme scores. For didactic reasons, we will use only 2
participants per group.

Upper group                    Lower group
Participants   X2              Participants   X2
E (16)         5               A (13)         4
B (15)         4               C (14)         2
39
3. Discrimination
3.3. Discrimination in attitude items
$\bar{X}_{uj} = \frac{\sum X_{uj}}{n_u} \qquad \bar{X}_{lj} = \frac{\sum X_{lj}}{n_l}$

Upper group                            Lower group
Participants   X2    X2²               Participants   X2    X2²
E (16)         5     25                A (13)         4     16
B (15)         4     16                C (14)         2     4
∑              9     41                ∑              6     20

$\bar{X}_{uj} = \frac{9}{2} = 4.5 \qquad \bar{X}_{lj} = \frac{6}{2} = 3$
40
3. Discrimination
3.3. Discrimination in attitude items
$S_{uj}^2 = \frac{\sum X_{uj}^2}{n_u} - \bar{X}_{uj}^2 = \frac{41}{2} - 4.5^2 = 20.5 - 20.25 = 0.25$

$S_{lj}^2 = \frac{\sum X_{lj}^2}{n_l} - \bar{X}_{lj}^2 = \frac{20}{2} - 3^2 = 10 - 9 = 1$

$T = \frac{\bar{X}_{uj} - \bar{X}_{lj}}{\sqrt{\dfrac{(n_u - 1)S_{uj}^2 + (n_l - 1)S_{lj}^2}{n_u + n_l - 2}\left(\dfrac{1}{n_u} + \dfrac{1}{n_l}\right)}} = \frac{4.5 - 3}{\sqrt{\dfrac{(2-1)0.25 + (2-1)1}{2+2-2}\left(\dfrac{1}{2} + \dfrac{1}{2}\right)}} = 1.9$

One tail: T(α, nu+nl-2) = T(0.05, 2+2-2) = T(0.05, 2) = 2.92

1.9 < 2.92 → The null hypothesis is accepted. There are no statistically
significant differences between the means. The item does not discriminate
adequately.
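A minimal Python sketch of this t computation, following the slides' formula
(which plugs population variances into the pooled estimate; names are
illustrative):

# Student t for item 2 using the extreme groups above.
from math import sqrt

upper = [5, 4]   # item 2 scores of E and B (highest total scores)
lower = [4, 2]   # item 2 scores of A and C (lowest total scores)

def mean(v): return sum(v) / len(v)
def var_pop(v): return sum(x * x for x in v) / len(v) - mean(v) ** 2

nu, nl = len(upper), len(lower)
pooled = ((nu - 1) * var_pop(upper) + (nl - 1) * var_pop(lower)) / (nu + nl - 2)
t = (mean(upper) - mean(lower)) / sqrt(pooled * (1 / nu + 1 / nl))
print(round(t, 2))  # 1.9 (< 2.92, the one-tailed critical value with 2 df)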
41
3. Discrimination
3.4. Factors that affect the discrimination
3.4.1. Variability
• Relation between test variability and item discrimination:
$S_x = \sum_{j=1}^{n} S_j r_{jx}$

$S_x$ = standard deviation of the test
$S_j$ = standard deviation of item j
$r_{jx}$ = discrimination index of item j
• If the test is composed of dichotomous items:

$S_x^2 = \left(\sum_{j=1}^{n} \sqrt{p_j q_j}\, r_{jx}\right)^2 ; \qquad S_x = \sum_{j=1}^{n} \sqrt{p_j q_j}\, r_{jx}$

• To maximize the discriminative ability of a test, we have to consider
jointly both the difficulty (pj) and the discrimination (rjx) of its items.
– The maximum is achieved when discrimination is maximal (rjx = 1) and the
difficulty is medium (pj = 0.5).
3. Discrimination
3.4. Factors that affect the discrimination
3.4.2. Item difficulty
[Figure: item discrimination plotted against item difficulty.]
An item reaches its maximum discriminative power when its difficulty is medium.
3. Discrimination
3.4. Factors that affect the discrimination
3.4.3. Dimensionality of the test
• When constructing a test, we usually try to measure a single construct
(unidimensionality).
• In multidimensional tests, item discrimination
should be estimated considering only the items
that are associated with each dimension.
44
3. Discrimination
3.4. Factors that affect the discrimination
3.4.4. Test reliability
• If discrimination is defined as the correlation between scores
obtained by participants in the item and the test, then reliability
and discrimination are closely related.
• It is possible to express the Cronbach alpha coefficient as a function of
the discrimination of the items:

$\alpha = \frac{n}{n-1}\left[1 - \frac{\sum_{j=1}^{n} S_j^2}{S_x^2}\right] = \frac{n}{n-1}\left[1 - \frac{\sum_{j=1}^{n} S_j^2}{\left(\sum_{j=1}^{n} S_j r_{jx}\right)^2}\right]$

• Small values of item discrimination are typically associated with unreliable
tests.
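A minimal Python/NumPy sketch illustrating that identity, using the earlier
5x4 dichotomous data (assuming NumPy is available; names are illustrative):

# Cronbach's alpha computed directly and via the item discriminations r_jx.
import numpy as np

X = np.array([[0, 1, 0, 1],
              [1, 1, 0, 1],
              [1, 1, 1, 1],
              [0, 0, 0, 1],
              [1, 1, 1, 0]])
n_items = X.shape[1]
total = X.sum(axis=1)

S_j = X.std(axis=0)              # item standard deviations (population)
S_x = total.std()                # test standard deviation
r_jx = np.array([np.corrcoef(X[:, j], total)[0, 1] for j in range(n_items)])

alpha_direct = n_items / (n_items - 1) * (1 - (S_j ** 2).sum() / S_x ** 2)
alpha_from_r = n_items / (n_items - 1) * (1 - (S_j ** 2).sum() / (S_j * r_jx).sum() ** 2)
print(round(alpha_direct, 2), round(alpha_from_r, 2))  # 0.31 0.31 - both expressions coincide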
45
3. Discrimination
3.4. Factors that affect the discrimination
3.4.4. Test reliability
[Figure: reliability coefficient (KR21) plotted against mean item discrimination.]
As the mean discrimination of the test increases, so does the reliability
coefficient.
46
4. Indices of reliability and validity of the items
4.1. Reliability index
• To quantify the degree to which an item measures the attribute of interest
accurately.

$RI = S_j D_j$

$S_j$ = standard deviation of the scores on the item.
$D_j$ = discrimination index of the item.
• When a correlation coefficient is used to calculate the discrimination of
the items:

$RI = S_j r_{jx} \qquad S_j^2 = pq \qquad \sum RI = S_X$
47
4. Indices of reliability and validity of the items
4.1. Reliability index
• The more we select items with a high RI, the better the reliability of the
test will be.
• Highest possible value of RI = 1.
• Example: given the information presented in the table below, calculate the
RI of item 4.

           Item 4
p          0.47
r_pb       0.5
48
4. Indices of reliability and validity of the items
4.1. Reliability index
$q = 1 - p = 1 - 0.47 = 0.53$

$S_j^2 = pq = 0.47 \times 0.53 = 0.25 \;\Rightarrow\; S_j = \sqrt{0.25} = 0.5$

$RI = S_j r_{jx} = 0.5 \times 0.5 = 0.25$
49
4. Indices of reliability and validity of the items
4.2. Validity index
• The validity of an item involves correlating the scores obtained by a sample
of participants on the item with the scores obtained by the same subjects on
some external criterion of interest.
– It serves to determine the degree to which each item of a test contributes
to making predictions about that external criterion.

$VI = S_j r_{jy}$

• When the criterion is a continuous variable and the item is a dichotomous
variable, we use the point-biserial correlation; but it is not necessary to
subtract the item score from the external criterion score, because it is not
included in it.

$VI = S_j r_{pb_{jy}}$
50
4. Indices of reliability and validity of the items
4.2. Validity index
• Test validity (rxy) can be expressed in terms of the VI of the items. The
higher the VI of the items, the better the validity of the test will be.

$r_{xy} = \frac{\sum S_j r_{jy}}{\sum S_j r_{jx}} = \frac{\sum VI}{\sum RI}$

• This formula shows how the validity of the test can be estimated from the
discrimination index of each item ($r_{jx}$), its validity index ($r_{jy}$)
and its difficulty index ($S_j^2 = p_j q_j$).
51
4. Indices of reliability and validity of the items
4.2. Validity index
• Paradox in the selection of items: if we want to select
items to maximize the reliability of the test we have to
choose those items with a high discrimination index (rjx),
but this would lead us to reduce the validity of the test
(rxy) because it increases as validity indexes (VI) are high
and reliability indexes (RI) are low.
52
4. Indices of reliability and validity of the items
4.2. Validity index
Example. The table below presents the scores of 5
participants in a test with 3 items.
Participants   Item 1   Item 2   Item 3
A              0        0        1
B              1        1        1
C              1        0        0
D              1        1        1
E              1        1        1
r_jy           0.2      0.4      0.6

Calculate the validity index of the test (rxy).
53
4. Indices of reliability and validity of the items
4.2. Validity index
$r_{xy} = \frac{\sum S_j r_{jy}}{\sum S_j r_{jx}} = \frac{0.4 \times 0.2 + 0.49 \times 0.4 + 0.4 \times 0.6}{0.4 \times 0.25 + 0.49 \times 0.99 + 0.4 \times 0.25} = \frac{0.516}{0.685} = 0.75$

$S_j^2 = p_j q_j \;\Rightarrow\; S_j = \sqrt{S_j^2}$

$S_1^2 = \frac{4}{5} \times \frac{1}{5} = 0.8 \times 0.2 = 0.16 \;\Rightarrow\; S_1 = \sqrt{0.16} = 0.4$

$S_2^2 = \frac{3}{5} \times \frac{2}{5} = 0.6 \times 0.4 = 0.24 \;\Rightarrow\; S_2 = \sqrt{0.24} = 0.49$

$S_3^2 = \frac{4}{5} \times \frac{1}{5} = 0.8 \times 0.2 = 0.16 \;\Rightarrow\; S_3 = \sqrt{0.16} = 0.4$
54
4. Indices of reliability and validity of the items
4.2. Validity index
       Items            Total
       1   2   3     X    (X-it1)  (X-it2)  (X-it3)  (X-it1)²  (X-it2)²  (X-it3)²
A      0   0   1     1    1        1        0        1         1         0
B      1   1   1     3    2        2        2        4         4         4
C      1   0   0     1    0        1        1        0         1         1
D      1   1   1     3    2        2        2        4         4         4
E      1   1   1     3    2        2        2        4         4         4
∑                         7        8        7        13        14        13
r_jy   0.2 0.4 0.6

(X-itj): total score after removing the score on item j.
55
4. Indices of reliability and validity of the items
4.2. Validity index
Item 1 (using the scores with item 1 removed):

$\bar{X}_1 = \frac{2+0+2+2}{4} = 1.5 \qquad \bar{X}_T = \frac{7}{5} = 1.4$

$S_X = \sqrt{\frac{13}{5} - 1.4^2} = \sqrt{2.6 - 1.96} = \sqrt{0.64} = 0.8$

$p = \frac{4}{5} = 0.8 \qquad q = 1 - p = 0.2$

$r_{pb_1} = \frac{\bar{X}_1 - \bar{X}_T}{S_X}\sqrt{\frac{p}{q}} = \frac{1.5 - 1.4}{0.8}\sqrt{\frac{0.8}{0.2}} = 0.125 \times 2 = 0.25$
56
4. Indices of reliability and validity of the items
4.2. Validity index
Item 2 (using the scores with item 2 removed):

$\bar{X}_1 = \frac{2+2+2}{3} = 2 \qquad \bar{X}_T = \frac{8}{5} = 1.6$

$S_X = \sqrt{\frac{14}{5} - 1.6^2} = \sqrt{2.8 - 2.56} = \sqrt{0.24} = 0.49$

$p = \frac{3}{5} = 0.6 \qquad q = 1 - p = 0.4$

$r_{pb_2} = \frac{\bar{X}_1 - \bar{X}_T}{S_X}\sqrt{\frac{p}{q}} = \frac{2 - 1.6}{0.49}\sqrt{\frac{0.6}{0.4}} = 0.99$
57
4. Indices of reliability and validity of the items
4.2. Validity index
Item 3 (using the scores with item 3 removed):

$\bar{X}_1 = \frac{0+2+2+2}{4} = 1.5 \qquad \bar{X}_T = \frac{7}{5} = 1.4$

$S_X = \sqrt{\frac{13}{5} - 1.4^2} = \sqrt{2.6 - 1.96} = \sqrt{0.64} = 0.8$

$p = \frac{4}{5} = 0.8 \qquad q = 1 - p = 0.2$

$r_{pb_3} = \frac{\bar{X}_1 - \bar{X}_T}{S_X}\sqrt{\frac{p}{q}} = \frac{1.5 - 1.4}{0.8}\sqrt{\frac{0.8}{0.2}} = 0.125 \times 2 = 0.25$
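A minimal Python sketch that reproduces this result end to end (names are
illustrative):

# Validity of the 3-item test: r_xy = sum(S_j * r_jy) / sum(S_j * r_jx),
# with r_jx the point-biserial of each item against the test minus that item.
from math import sqrt

X = [[0, 0, 1],   # A
     [1, 1, 1],   # B
     [1, 0, 0],   # C
     [1, 1, 1],   # D
     [1, 1, 1]]   # E
r_jy = [0.2, 0.4, 0.6]   # item-criterion correlations (given)
N, n_items = len(X), len(X[0])

def pbis(j):
    rest = [sum(row) - row[j] for row in X]   # test score without item j
    mean_T = sum(rest) / N
    sd = sqrt(sum(x * x for x in rest) / N - mean_T ** 2)
    hit = [rest[i] for i in range(N) if X[i][j] == 1]
    p = len(hit) / N
    return (sum(hit) / len(hit) - mean_T) / sd * sqrt(p / (1 - p))

S = [sqrt((sum(row[j] for row in X) / N) * (1 - sum(row[j] for row in X) / N))
     for j in range(n_items)]
r_jx = [pbis(j) for j in range(n_items)]

r_xy = sum(s * ry for s, ry in zip(S, r_jy)) / sum(s * rx for s, rx in zip(S, r_jx))
print(round(r_xy, 2))  # ≈ 0.75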
58
5. Analysis of distractors
• It involves examining the distribution of subjects across the wrong
alternatives (distractors), in order to detect possible reasons for the low
discrimination of an item or to see, for example, that some alternatives are
not selected by anyone.
• In this analysis, the first steps are:
– To check that all the incorrect options are chosen by a minimum number of
subjects. If possible, they should be equiprobable.
• Criterion: each distractor has to be selected by at least 10% of the sample,
and there should not be large differences between them.
– To check that the test performance of subjects who selected each incorrect
alternative is lower than the performance of subjects who selected the
correct one.
– It is expected that as the skill level of subjects increases, the percentage
who select incorrect alternatives decreases, and vice versa.
59
5. Analysis of distractors
5.1. Equiprobability of distractors
• Distractors are equiprobable if they are selected by a
minimum of participants and if they are equally
attractive to those who do not know the correct answer.
• Χ² test:

$\chi^2 = \sum_{i=1}^{k}\frac{(E_i - O_i)^2}{E_i}$

$E_i$ = expected (theoretical) frequency.
$O_i$ = observed frequency.
60
5. Analysis of distractors
5.1. Equiprobability of distractors
• Degrees of freedom: K -1 (K = number of incorrect alternatives).
• H0: Ei = Oi (for participants who do not know the correct answer, every
distractor is equally attractive).
• Conclusion:
– $\chi^2_O \le \chi^2_{(\alpha,\,k-1)}$ → The null hypothesis is accepted. The distractors are
equally attractive.
– $\chi^2_O > \chi^2_{(\alpha,\,k-1)}$ → The null hypothesis is rejected. The distractors are not
equally attractive.
61
5. Analysis of distractors
5.1. Equiprobability of distractors
Example. Determine if the incorrect alternatives
are equally attractive (α=0.05).
Number of answers
A      B*     C
136    142    92
62
5. Analysis of distractors
5.1. Equiprobability of distractors
$E_i = \frac{136 + 92}{2} = \frac{228}{2} = 114$

$\chi^2 = \sum_{i=1}^{k}\frac{(E_i - O_i)^2}{E_i} = \frac{(114-136)^2}{114} + \frac{(114-92)^2}{114} = \frac{484 + 484}{114} = \frac{968}{114} = 8.49$

To be equiprobable, each distractor should be selected by 114 participants.
63
5. Analysis of distractors
5.1. Equiprobability of distractors
(2 ,k 1)  (20.05,21)  (20.05,1)  3.84
8.49>3.84 → The null hypothesis is rejected. Incorrect alternatives
are not equally attractive to all subjects, although they met the
criterion of being selected by a minimum of 10% of the total
sample (N).
N  136  142  92  370
370 *10
10% 
 37
100
136  37
92  37
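A minimal Python sketch of the same test using SciPy (assuming scipy is
available):

# Equiprobability of distractors A and C (B is the correct option).
from scipy.stats import chisquare, chi2

observed = [136, 92]                # choices of the two distractors
stat, pvalue = chisquare(observed)  # expected frequencies default to an equal split (114 each)
critical = chi2.ppf(0.95, df=len(observed) - 1)
print(round(stat, 2), round(critical, 2), pvalue < 0.05)  # 8.49 3.84 True -> reject H0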
64
5. Analysis of distractors
5.2. Discriminative power of distractors
• It is expected from a good distractor that its correlation with test
scores is negative.
• To quantify the discriminative power of the incorrect alternatives we use
correlation. Depending on the kind of variable, we will use the biserial,
point-biserial, phi, or Pearson coefficient.
65
5. Analysis of distractors
5.2. Discriminative power of distractors
• Example. Answers of 5 subjects to 4 items. Brackets show the alternative
selected by each subject when wrong, and the correct alternative is marked
with an asterisk. Calculate the discrimination of distractor b in item 3.

          Items                               Total
Subjects  1(a*)   2(b*)   3(a*)   4(c*)    X    (X-i)
A         0 (b)   1       0 (b)   1        2    2
B         1       1       0 (b)   1        3    3
C         1       1       1       1        4    3
D         0 (c)   0 (a)   0 (b)   1        1    1
E         1       1       1       0 (b)    3    2
66
5. Analysis of distractors
5.2. Discriminative power of distractors
• The subjects who selected alternative b (an incorrect one) in item 3 are A,
B and D. The mean of these subjects on the test, after eliminating the
analyzed item score, is:

$\bar{X}_A = \frac{2 + 3 + 1}{3} = 2$

• The total mean of the test, subtracting the item 3 score from each subject's
score, is:

$\bar{X}_{T-i} = \frac{2 + 3 + 3 + 1 + 2}{5} = 2.2$

• The standard deviation of the (X-i) scores is:

$S_{T-i} = \sqrt{\frac{2^2 + 3^2 + 3^2 + 1^2 + 2^2}{5} - 2.2^2} = \sqrt{0.56} = 0.75$

• The proportion of subjects who answered the item correctly is 2/5 = 0.4, and
the proportion who failed it is 3/5 = 0.6.
5. Analysis of distractors
5.2. Discriminative power of distractors
• The point-biserial correlation between the incorrect alternative "b" and the
test scores, discounting the item score, is:

$r_{pb} = \frac{\bar{X}_A - \bar{X}_{T-i}}{S_{X-i}}\sqrt{\frac{p}{q}} = \frac{2 - 2.2}{0.75}\sqrt{\frac{0.4}{0.6}} = -0.22$

– Since the subjects who chose the incorrect alternative score 0 on the item,
there is nothing to subtract from their total test score.
• The distractor discriminates in the opposite direction to the correct
alternative. It is a good distractor.
68
5. Analysis of distractors
5.2. Discriminative power of distractors
• Visual inspection of the distribution of subjects' answers to the various
alternatives:

Skill level / Statistics    A       B       C*
High                        20      25      55
Low                         40      35      25
p                           0.28    0.5     0.22
Mean                        5       10      9
rpb                         -0.20   0.18    0.29

– p: proportion of subjects who selected each option.
– Mean: test mean of the subjects who selected each alternative.
– rpb: discrimination index of each option.
69
5. Analysis of distractors
5.2. Discriminative power of distractors
• Positive discrimination index of the correct alternative: it is mostly
chosen by the competent subjects.
• Distractor A:
– Has been selected by an acceptable minimum of subjects (28%), and it is
selected by the less competent subjects in a higher proportion.
– The test mean of the subjects who selected it is lower than the test mean of
the subjects who selected the correct alternative (consistent with its
negative discrimination index).
5. Analysis of distractors
5.2. Discriminative power of distractors
• Distractor B should be revised:
– It is chosen as correct by subjects with better scores on the test.
– It has been the most selected option (50%), its discrimination is positive,
and the mean of the subjects who selected it is higher than that of the
subjects who chose the correct alternative.
71
5. Analysis of distractors
5.2. Discriminative power of distractors
• In distractor analysis we can go further and use statistical inference. The
test mean of the subjects who choose the correct alternative should be higher
than the test mean of the subjects who choose each distractor.
– ANOVA. IV or factor: the item, with as many levels as answer alternatives.
DV: the raw test score of the subjects. A minimal sketch follows below.
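A minimal Python sketch of such an ANOVA with SciPy; the score lists are
invented for illustration only, not data from the slides:

# One-way ANOVA on test scores grouped by the alternative chosen on one item.
from scipy.stats import f_oneway

scores_A = [12, 14, 11, 13]   # test scores of subjects who chose distractor A (illustrative)
scores_B = [15, 13, 14, 16]   # distractor B (illustrative)
scores_C = [20, 22, 19, 23]   # correct alternative C (illustrative)
F, p = f_oneway(scores_A, scores_B, scores_C)
print(round(F, 2), p < 0.05)  # significant differences among the alternatives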
72