Item Response Theory Using Bayesian Networks
by Richard Neapolitan
I will follow the Bayesian network approach to IRT put forward by Almond and Mislevy:
http://ecd.ralmond.net/tutorial/
A good tutorial that introduces basic IRT is provided at
the following site:
http://www.creative-wisdom.com/multimedia/ICHA.htm
Let Θ represent arithmetic ability.
Θ is called a proficiency.
We have the following items to test Θ:

Item          Task
1 (easiest)   2 + 2
2             16 - 12
3             64 x 27
4             673 x 515
5 (hardest)   105,110 / 67
-2 is the lowest ability.
0 represents average ability.
2 is the highest ability.
We assume performance on items is independent given the ability.
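This is the local independence assumption: given Θ, the joint distribution of the item responses factors as

P(X_1, \ldots, X_5 \mid \Theta) = \prod_{i=1}^{5} P(X_i \mid \Theta)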
[Bayesian network: Theta (states pos2, pos1, Zero, neg1, neg2 with prior probabilities 10, 20, 40, 20, 10%) is the parent of Item_1 through Item_5. Marginal probabilities of Right: Item_1 77.2%, Item_2 64.6%, Item_3 49.3%, Item_4 35.4%, Item_5 22.9%.]
IRT Logistic Evidence Model
p(X_i = \text{Right} \mid \theta) = \frac{1}{1 + e^{-(\theta - b_i)}}

b_i measures the difficulty of the item.
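As a quick sanity check on the model, here is a minimal Python sketch (the function name and example values are mine, not the slides'):

```python
import numpy as np

def p_right(theta, b):
    """Logistic evidence model: P(X_i = Right | theta) for difficulty b."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

# At theta = b the probability of answering right is exactly 0.5.
print(p_right(0.0, 0.0))    # 0.5   (average ability, average item)
print(p_right(0.0, -1.5))   # ~0.82 (easy item)
print(p_right(0.0, 1.5))    # ~0.18 (hard item)
```

The three curves below are exactly this function for b = 0, -1.5, and 1.5.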
[Plot: item characteristic curve for b = 0 (average difficulty); P(Right) vs. theta from -5 to 5.]
[Plot: item characteristic curve for b = -1.5 (easy item); P(Right) vs. theta from -5 to 5.]
[Plot: item characteristic curve for b = 1.5 (hard item); P(Right) vs. theta from -5 to 5.]
Discrimination Parameter: a
p(X_i = \text{Right} \mid \theta) = \frac{1}{1 + e^{-a_i(\theta - b_i)}}
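The same sketch extended with the discrimination parameter (again, the function name is mine):

```python
import numpy as np

def p_right_2pl(theta, a, b):
    """2PL logistic evidence model: P(X_i = Right | theta).

    Larger a makes the curve steeper around theta = b, so the item
    more sharply separates abilities just below b from those just above.
    """
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

print(p_right_2pl(0.5, 5.0, 0.0))   # ~0.92: steep curve, a small ability edge decides the item
print(p_right_2pl(0.5, 0.5, 0.0))   # ~0.56: shallow curve, the same edge barely matters
```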
[Plot: item characteristic curve for a = 5, b = 0; P(Right) vs. theta from -5 to 5.]
[Plot: item characteristic curve for a = .5, b = 0; P(Right) vs. theta from -5 to 5.]
[Plot: item characteristic curve for a = 5, b = 1.5; P(Right) vs. theta from -5 to 5.]
Two Proficiency Models
Compensatory:
More of Proficiency 1 compensates for less of Proficiency 2.
Combination rule is sum.
Conjunctive:
Both proficiencies are needed to solve the problem.
Combination rule is minimum.
Disjunctive:
Two proficiencies represent alternative solution paths to the
problem.
Combination rule is maximum.
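A minimal sketch of the three combination rules (the integer coding L = 0, M = 1, H = 2 is my illustrative assumption):

```python
# Proficiency levels coded as integers: L = 0, M = 1, H = 2 (illustrative).
def compensatory(p1, p2):
    return p1 + p2        # more of one proficiency offsets less of the other

def conjunctive(p1, p2):
    return min(p1, p2)    # the weaker proficiency is the bottleneck

def disjunctive(p1, p2):
    return max(p1, p2)    # either proficiency alone suffices

# A student high in Proficiency 1 and low in Proficiency 2:
print(compensatory(2, 0))   # 2: the strong skill makes up for the weak one
print(conjunctive(2, 0))    # 0: limited by the weak skill
print(disjunctive(2, 0))    # 2: the strong skill carries the item
```

A combined value like this can then play the role of θ in an evidence model such as the logistic one above.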
[Bayesian networks for the three models: in each, parents P1 and P2 (states H, M, L, uniform at 33.3%) point to an item node (Right 50.0%, Wrong 50.0%) labeled Compensatory, Conjunctive, or Disjunctive.]
[Bayesian network: Skill1 and Skill2 (states Yes, No at 50.0% each) are parents of Task1, Task2, and Task3 (Right 50.0%, Wrong 50.0% each).]
Mixed Number Subtraction
This example is drawn from the research of Tatsuoka (1983) and her
colleagues. Almond and Mislevy (2012) did the analysis.
Their work began with cognitive analyses of middle-school
students’ solutions of mixed-number subtraction problems.
Klein et al. (1981) identified two methods that students used to
solve problems in this domain:
• Method A: Convert mixed numbers to improper fractions, subtract, then reduce if necessary.
• Method B: Separate mixed numbers into whole-number and fractional parts; subtract as two subproblems, borrowing one from the minuend's whole number if necessary; then simplify and reduce if necessary.
Their analysis concerns the responses of 325 students whom
Tatsuoka identified as using Method B, on fifteen items for
which it is not necessary to find a common denominator.
The items are grouped in terms of which of the following
procedures is required for a solution under Method B:
Skill 1: Basic fraction subtraction.
Skill 2: Simplify/reduce fraction or mixed number.
Skill 3: Separate whole number from fraction.
Skill 4: Borrow one from the whole number in a given mixed
number.
Skill 5: Convert a whole number to a fraction.
All models are conjunctive.
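Because the item models are conjunctive, an item is likely to be answered correctly only when the student has every skill it requires. A minimal sketch (the skill sets and the two probabilities are illustrative assumptions, not values from the study):

```python
def p_right(student_skills, required_skills, p_if_mastered=0.9, p_if_not=0.2):
    """Conjunctive evidence model: a right answer is likely only if the
    student has every skill the item requires.

    p_if_mastered and p_if_not (one minus the slip and guess probabilities)
    are illustrative values, not parameters from the Tatsuoka data.
    """
    if required_skills <= student_skills:   # all required skills present
        return p_if_mastered
    return p_if_not

# Hypothetical item needing Skills 1, 3, and 4 (subtract, separate, borrow):
item = {1, 3, 4}
print(p_right({1, 2, 3, 4, 5}, item))  # 0.9: all required skills mastered
print(p_right({1, 3}, item))           # 0.2: missing Skill 4
```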
Learning Parameters From Data
Learning From Complete Data
We use Dirichlet distributions to represent our belief about the
parameters.
In our hypothetical prior sample,
– a11 is the number of times Θ took its first value.
– b11 is the number of times Θ took its second value.
– a21 is the number of times I took its first value when Θ took its first value.
– b21 is the number of times I took its second value when Θ took its first value.
Θ   I
1   1
1   1
1   2
2   1
2   1
2   2
2   2
2   2
Suppose we have the data in the table above, and suppose the prior sample gave a11 = b11 = 2 and a21 = b21 = 1. We add the observed counts to the prior counts:

a11 = a11 + 3 = 2 + 3 = 5 (Θ took its first value 3 times)
b11 = b11 + 5 = 2 + 5 = 7 (Θ took its second value 5 times)
P(Θ1) = 5/(5 + 7) = 5/12
a21 = a21 + 2 = 1 + 2 = 3 (I took its first value in 2 of the 3 cases with Θ = 1)
b21 = b21 + 1 = 1 + 1 = 2 (I took its second value in 1 of the 3 cases with Θ = 1)
P(I1 | Θ1) = 3/(3 + 2) = 3/5
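The same update in a few lines of Python (variable names are mine):

```python
# Prior Dirichlet counts from the hypothetical prior sample.
a11, b11 = 2, 2    # counts for Theta = 1 and Theta = 2
a21, b21 = 1, 1    # counts for I = 1 and I = 2, given Theta = 1

data = [(1, 1), (1, 1), (1, 2), (2, 1), (2, 1), (2, 2), (2, 2), (2, 2)]

# Add the observed counts to the prior counts.
a11 += sum(1 for t, i in data if t == 1)             # + 3 -> 5
b11 += sum(1 for t, i in data if t == 2)             # + 5 -> 7
a21 += sum(1 for t, i in data if t == 1 and i == 1)  # + 2 -> 3
b21 += sum(1 for t, i in data if t == 1 and i == 2)  # + 1 -> 2

print(a11 / (a11 + b11))   # P(Theta = 1)        = 5/12
print(a21 / (a21 + b21))   # P(I = 1 | Theta = 1) = 3/5
```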
Θ   I
?   1
?   1
?   2
?   1
?   1
?   2
?   2
?   2
But we don’t have data on the proficiency.
We then use algorithms that learn when there is missing data:
– Markov chain Monte Carlo (MCMC).
– Expectation maximization (EM); a small sketch follows.
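Here is a minimal EM sketch for the two-node model Θ → I above (initial guesses and variable names are mine; with only one item per case the parameters are not identifiable, so this only illustrates the mechanics — real applications observe many items per student):

```python
data = [1, 1, 2, 1, 1, 2, 2, 2]   # observed I values; Theta is missing

p  = 0.4   # initial guess for P(Theta = 1)
q1 = 0.7   # initial guess for P(I = 1 | Theta = 1)
q2 = 0.3   # initial guess for P(I = 1 | Theta = 2)

for _ in range(50):
    # E-step: posterior responsibility P(Theta = 1 | I = i) for each case.
    w = []
    for i in data:
        l1 = p * (q1 if i == 1 else 1 - q1)
        l2 = (1 - p) * (q2 if i == 1 else 1 - q2)
        w.append(l1 / (l1 + l2))
    # M-step: re-estimate the parameters from the expected counts.
    p  = sum(w) / len(w)
    q1 = sum(wi for wi, i in zip(w, data) if i == 1) / sum(w)
    q2 = sum(1 - wi for wi, i in zip(w, data) if i == 1) / sum(1 - wi for wi in w)

print(p, q1, q2)
```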
Influence Diagrams
Standard IRT
In traditional applications of IRT there is usually one
proficiency Θ and a set of items.
A normal prior is placed on Θ.
The parameters a and b in the logistic function are
learned from data.
The model is then used to do inference for the next
case.
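A minimal sketch of that last inference step, assuming a and b have already been learned; the grid approximation and all numeric values are my illustration:

```python
import numpy as np

# Item parameters assumed already learned from earlier examinees (illustrative).
a = np.array([1.0, 1.2, 0.8, 1.5])
b = np.array([-1.0, 0.0, 0.5, 1.5])
responses = np.array([1, 1, 0, 0])   # the next examinee: 1 = Right, 0 = Wrong

theta = np.linspace(-4, 4, 801)      # grid over ability
prior = np.exp(-theta**2 / 2)        # standard normal prior (unnormalized)

# 2PL likelihood of each response at every grid point.
p = 1.0 / (1.0 + np.exp(-a[:, None] * (theta[None, :] - b[:, None])))
likelihood = np.prod(np.where(responses[:, None] == 1, p, 1.0 - p), axis=0)

posterior = prior * likelihood
posterior /= posterior.sum()          # normalize over the grid
print((theta * posterior).sum())      # posterior mean of the examinee's ability
```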