Introduction to Test Development

Graham McMahon, MD, MMSc.
Sarah E. Peyre, EdD
Educational Research Methods Program
Learning Objectives

Understand the pros and cons of various question types for written examinations
Learn how to determine
  Item difficulty
  Item discrimination
Understand the psychometrics of a high-stakes test
  Validity
  Reliability
  Standard setting
Come to our Workshop!

Work in small groups to…
  Review problematic multiple-choice items
  Establish validity and reliability for a test
  Participate in a standard-setting exercise

Question Types – Pros and Cons

Essay Items
Short Answer and Completion Items
Matching Items
True-False and Multiple-Choice Tests
Interviews
Portfolios
…all can be scored and can be subject to test development
Multiple-Choice Items

Stem: An 85-year-old woman has difficulty raising her arms above her head and combing her hair. She has morning aches in her shoulders and neck. Her reflexes are symmetrical and normal. There is no muscle tenderness or joint swelling.
Lead-in: Which one of the following laboratory tests should be obtained to confirm the most likely diagnosis?
Responses (one correct response, the rest distractors):
A. Anti-nuclear antibody.
B. Erythrocyte sedimentation rate.
C. Serum concentration of creatine kinase.
D. Serum concentration of angiotensin-converting enzyme.
E. Urine microscopy.
Tips for writing discriminating MCQs

Be sure that each item reflects a clearly defined learning outcome
Stem
  The stem of the item should be self-contained and written in clear and precise language.
  Avoid ‘trigger’ words (e.g. pill-rolling tremor)
  Negatives, excepts, absolutes and qualifiers in question stems are no-no’s.
Responses
  All answers should be plausible and homogeneous
  Items need to be independent of one another
  Answer choices should be similar in length and grammatical form
  List answer choices in alphabetical or numerical order
  Avoid ‘all of the above’ as a response
  Avoid technical flaws (tense or plurality for example)
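A few of these tips can be checked mechanically before an item goes to expert review. The sketch below is only an illustration: the function name `check_item`, the word list and the three checks are assumptions made here, not part of the workshop material, and it only handles alphabetical (not numerical) ordering.

```python
import re
from typing import List

# Hypothetical word list for negatives, 'except' and absolutes in a stem.
FLAGGED_WORDS = re.compile(r"\b(not|except|never|always)\b", re.IGNORECASE)


def check_item(stem: str, options: List[str]) -> List[str]:
    """Return a list of possible flaws found in one multiple-choice item."""
    problems = []
    if FLAGGED_WORDS.search(stem):
        problems.append("stem contains a negative, 'except' or an absolute")
    if any(o.strip().lower().rstrip(".") == "all of the above" for o in options):
        problems.append("'all of the above' used as a response")
    if options != sorted(options, key=str.lower):
        problems.append("options are not in alphabetical order")
    return problems


# The porphyria item from the workshop passes all three checks.
stem = ("Acute intermittent porphyria is the result of a defect "
        "in the biosynthetic pathway for")
options = ["collagen", "corticosteroid", "fatty acid", "glucose", "heme"]
print(check_item(stem, options))  # []
```

A check like this only catches surface flaws; plausibility and homogeneity of the options still need a human reviewer.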
Pros and Cons of MCQs

Pros
  Useful for measuring learning outcomes at almost any level
  Easy to understand
  Easy to score
  Easily analyzed for effectiveness
  Allow broad coverage efficiently
Cons
  Good questions
    Take a long time to write
    Are difficult to write
  Constrain creative responses from learners
  May have more than one correct answer
Item Analysis

Qualitative: looks at whether the content matches the information, attitude, characteristic or behavior being assessed
Quantitative:
  Item difficulty
  Item discrimination

Determining item difficulty

The percentage of participants who get that item correct
Item difficulty scores can range from 0 to 100%
  Low value = high difficulty
  High value = low difficulty

[Figure: number of students achieving each score (0 to 100) for a hard exam, a normal exam and an easy exam]

Difficulty level     Percent correct
High (Difficult)     <= 30%
Medium (Moderate)    > 30% and < 80%
Low (Easy)           >= 80%
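As a minimal sketch of the calculation, item difficulty is the share of correct responses, classified with the cut-offs given above (<= 30%, between 30% and 80%, >= 80%). The function names and the sample data are made up for illustration.

```python
from typing import List


def item_difficulty(responses: List[bool]) -> float:
    """Percentage (0-100) of test takers who answered the item correctly."""
    if not responses:
        raise ValueError("need at least one response")
    return 100.0 * sum(responses) / len(responses)


def difficulty_level(percent_correct: float) -> str:
    """Classify an item using the cut-offs from the slide."""
    if percent_correct <= 30:
        return "High (Difficult)"
    if percent_correct < 80:
        return "Medium (Moderate)"
    return "Low (Easy)"


# Hypothetical data: 12 of 40 students answered the item correctly.
responses = [True] * 12 + [False] * 28
p = item_difficulty(responses)
print(p, difficulty_level(p))  # 30.0 High (Difficult)
```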
Discrimination Index

The Discrimination Index distinguishes, for each item, between the performance of students who did well on the exam and students who did poorly.
Index of discrimination:
  The % of people answering correctly in one extreme group minus the % of people answering correctly in the other extreme group
  Item discrimination scores can range from -1.00 to +1.00
Example
  100 test takers: 20 of the top 25 students were correct but only 5 of the lowest 25 students were correct.
  DI = (20-5)/25 = 0.6
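The worked example can be checked with a few lines of code. This is a sketch that assumes equally sized upper and lower groups; the function name is invented here. With 20 of 25 correct in the upper group and 5 of 25 in the lower group it gives 15/25 = 0.6.

```python
def discrimination_index(upper_correct: int, lower_correct: int, group_size: int) -> float:
    """Proportion correct in the upper group minus proportion correct in the lower group."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    return (upper_correct - lower_correct) / group_size


# Numbers from the example: top 25 and lowest 25 of 100 test takers.
di = discrimination_index(upper_correct=20, lower_correct=5, group_size=25)
print(di)  # 0.6
```

A negative value means the lower group outperformed the upper group on that item, which usually signals a flawed item or a wrong answer key.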
Item Discrimination (D) by Item Difficulty

Item Discrimination (D)    High difficulty    Med difficulty    Low difficulty
D <= 0%                    review             review            review
0% < D < 30%               ok                 review            ok
D >= 30%                   ok                 ok                ok
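The grid above can be written as a small lookup. The sketch assumes difficulty is given as percent correct (banded at 30% and 80% as on the item difficulty slide) and discrimination as a proportion, so 0.30 stands for 30%. The function name is hypothetical.

```python
def review_decision(percent_correct: float, discrimination: float) -> str:
    """Return 'ok' or 'review' for one item, following the grid."""
    if percent_correct <= 30:
        difficulty = "high"
    elif percent_correct < 80:
        difficulty = "med"
    else:
        difficulty = "low"

    if discrimination <= 0:
        return "review"  # D <= 0%: review at every difficulty level
    if discrimination < 0.30:
        # 0% < D < 30%: only medium-difficulty items are sent for review
        return "review" if difficulty == "med" else "ok"
    return "ok"  # D >= 30%


print(review_decision(percent_correct=55, discrimination=0.20))  # review
print(review_decision(percent_correct=90, discrimination=0.20))  # ok
```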
Item Analysis Report

[Figure: sample item analysis report, annotated with order ID and group number, percentages and counts]

The left half shows percentages, the right half counts.
The correct option is indicated in parentheses.
Point Biserial is similar to the discrimination index, but is not based on fixed upper and lower groups. For each item, it compares the mean score of students who chose the correct answer to the mean score of students who chose the wrong answer.
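One common way to turn that comparison into a number is the point-biserial correlation: the difference between the two group means, divided by the standard deviation of all scores and scaled by the square root of p(1 - p), where p is the proportion answering correctly. The sketch below assumes the total score includes the item itself; some reports exclude it. The data are invented.

```python
from math import sqrt
from statistics import mean, pstdev
from typing import List


def point_biserial(item_correct: List[bool], total_scores: List[float]) -> float:
    """Point-biserial correlation between one item (right/wrong) and the total score."""
    right = [s for c, s in zip(item_correct, total_scores) if c]
    wrong = [s for c, s in zip(item_correct, total_scores) if not c]
    if not right or not wrong:
        raise ValueError("item needs both correct and incorrect responses")
    sd = pstdev(total_scores)  # population SD of all total scores
    if sd == 0:
        raise ValueError("total scores have no variance")
    p = len(right) / len(total_scores)
    return (mean(right) - mean(wrong)) / sd * sqrt(p * (1 - p))


# Hypothetical data: five students, one item, total scores out of 10.
item_correct = [True, True, True, False, False]
total_scores = [9, 8, 7, 5, 4]
r_pb = point_biserial(item_correct, total_scores)
print(round(r_pb, 2))
```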
Test Validity

Validity:
  The extent to which inferences made from a test are appropriate, meaningful, or useful.
  Does my test measure what it is intended to measure?
Content validity
  Expert review
Criterion validity – Predictive/Concurrent
  Scores can be related to another known metric
Construct validity
  Successfully differentiates between levels of learners
Kissing Cousins

A test cannot be valid until it is reliable.
Test Reliability

Reliability: measures the underlying construct consistently = trustworthiness/stability
  Test-retest reliability
  Alternate forms reliability
  Internal consistency reliability (Cronbach’s alpha)
  Inter-rater reliability
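Of these, internal consistency is the one most easily computed from a single administration of a test. A minimal sketch of Cronbach's alpha, using the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), follows; the function name and the score matrix are illustrative only.

```python
from statistics import variance
from typing import List


def cronbach_alpha(scores: List[List[float]]) -> float:
    """scores[s][i] is the score of student s on item i."""
    if len(scores) < 2 or len(scores[0]) < 2:
        raise ValueError("need at least two students and two items")
    k = len(scores[0])
    # Sample variance of each item across students.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of the students' total scores.
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have no variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical data: four students, three items scored 1 (right) or 0 (wrong).
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
alpha = cronbach_alpha(scores)
print(round(alpha, 2))  # 0.75
```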

How do I set a passing grade?

Standard Setting

Norm referenced: Z-scores
  Number of standard deviations below the mean
Criterion referenced: Angoff Method
  A panel of experts is asked to evaluate each item and estimate the fraction of minimally competent students who would answer each item correctly
  Ratings are averaged across the experts for each item, discussed and then summed to get the panel raw cut score
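Both approaches reduce to short calculations. The sketch below is illustrative: the function names and sample numbers are invented, the panel discussion step of the Angoff method is not modelled, and the number of standard deviations below the mean is a policy choice that the slides do not specify.

```python
from statistics import mean, pstdev
from typing import List


def angoff_cut_score(ratings: List[List[float]]) -> float:
    """ratings[e][i]: expert e's estimate (0 to 1) of the fraction of minimally
    competent students who would answer item i correctly."""
    n_items = len(ratings[0])
    # Average across experts for each item, then sum over items.
    item_means = [mean([expert[i] for expert in ratings]) for i in range(n_items)]
    return sum(item_means)


def norm_referenced_cut(scores: List[float], sds_below_mean: float) -> float:
    """Passing score set a chosen number of standard deviations below the mean."""
    return mean(scores) - sds_below_mean * pstdev(scores)


# Hypothetical panel: two experts rating a three-item test.
ratings = [
    [0.50, 0.75, 1.00],
    [0.50, 0.25, 1.00],
]
print(angoff_cut_score(ratings))  # 2.0 (raw cut score out of 3 items)

# Hypothetical class scores with mean 5 and standard deviation 2.
print(norm_referenced_cut([2, 4, 4, 4, 5, 5, 7, 9], sds_below_mean=1.0))  # 3.0
```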

Thank you!
Welcome to Our Workshop on Test Development!
Graham McMahon, MD, MMSc.
Sarah E. Peyre, EdD
Educational Research Methods
The Academy at Harvard Medical School
Outline
  Learning Objectives
  Creating MCQ Items
  Item Template
  Item Flaws
  Tips for Success
  Establishing Validity and Reliability for a Test
  Mock Standard Setting
Item Creation
  Consider beginning with the end in mind
  What is it that you think the medical student should demonstrate that he/she knows or knows how to do?
  This should be an objective from your lesson plan.
[Figure: Objectives, Learning Activities, Evaluation]
Item Stems: Clinical Vignettes

Things to consider:
  Patient description (46-year-old female)
  Functional disability (difficulty rising from a seated position, but has no difficulty flexing her legs)

The question based on this item template:
A 46-year-old female has difficulty rising from a seated position, but has no difficulty flexing her legs. Which of the following muscles has been injured?
[Objective: Identify and explain the function of the muscles in the…. ]

Item Creation

Lead-in: The most likely diagnosis is
  Options: disorders, diseases
  Objective: Describe the signs and symptoms of X. Compare and contrast the signs and symptoms of X, Y and Z.

Lead-in: Which of the following additional symptoms would you expect to be present?
  Options: symptoms
  Objective: same as above

Lead-in: The most likely cause is
  Options: bacteria, toxins, medications, metabolic defects
  Objective: List and explain the causes of X.

Lead-in: The most likely mechanism is
  Options: disease mechanisms, pharmacologic mechanisms
  Objective: Diagram and explain the mechanism of drug X.
Item Templates

Other considerations:
  Age, gender, race, ethnicity
  Site of care (ER, office visit)
  Presenting complaint
    presents for a routine physical exam
    presents with a headache
  Duration
  Patient history, family history
    There is no history of…
    He has a history of…
  Physical findings
  Lab values, imaging studies, pathology reports
  Treatment, subsequent findings
Item Creation

Add the lead-in (question) and the options
Which of the following pulmonary variables is most likely to be lower than normal in this patient?
A. Alveolar-arterial PO2 difference
B. Compliance of the lung
C. Oncotic pressure of the alveolar fluid
D. Work of breathing
E. Residual volume
Item Creation: Taking Recall up to Another Level

Recall question:
What area is supplied with blood by the posterior inferior cerebellar artery?
[Objective: Identify the areas of the brain supplied by the major cerebral arteries.]

Item Creation: Taking Recall up to Another Level

Application question:
A 62-year-old man develops left-sided limb ataxia, Horner’s syndrome, nystagmus and loss of facial pain and temperature sensation. Which artery is most likely to be occluded?
[Objective: Differentiate the signs and symptoms that would occur upon occlusion of each of the major cerebral arteries.]
Your Turn!
Review the distributed questions and identify strengths and weaknesses in each.
Question

Acute intermittent porphyria is the result of a defect in the biosynthetic pathway for
A. collagen
B. corticosteroid
C. fatty acid
D. glucose
E. heme
Rewritten…

An otherwise healthy 33-year-old male has mild weakness and occasional episodes of steady, severe abdominal pain with some cramping but no diarrhea. One aunt and a cousin have had similar episodes. During an episode, his abdomen is distended, and bowel sounds are decreased. Neurological examination shows mild weakness in the upper arms. These findings suggest a defect in the biosynthetic pathway for:
A. collagen
B. corticosteroid
C. fatty acid
D. glucose
E. heme
Question
A 52-year-old male presents to the office with a one-week history of flank pain and hematuria. Past medical history is unremarkable. Physical examination reveals a left-sided abdominal mass. The greatest risk factor for renal cell carcinoma is
A. diabetes
B. female gender
C. hyperlipidemia
D. low body mass index
E. smoking
Question
Which of the following is a correct statement about cystic fibrosis (CF)?
A. The incidence of CF is 1:2000.
B. Children with CF usually die in their teens.
C. Males with CF are sterile.
D. CF is an autosomal recessive disease.
E. Symptoms of CF only appear in infancy.
What other flaws can you detect in this question?
Item Flaws: Unfocused items

Which of the following is correct regarding [topic]?
There is not enough information in the stem to answer the question without looking at the options.
The responses are disparate. The distractors have to be 100% false. Thus, the question basically becomes a true/false question. Avoid these!
A 45-year-old man comes to the physician because of a 6-week history of a non-productive cough. An X-ray film of the chest shows a 0.8 cm well-circumscribed peripheral nodule in the right lung. Biopsy shows a necrotizing granuloma. Which of the following is the most likely diagnosis?
(A) Pulmonary embolus
(B) Small cell carcinoma
(C) Pseudomonas aeruginosa infection
(D) Histoplasma capsulatum
(E) Herpes pneumonitis
(F) Metastatic renal cell carcinoma
A healthy 57-year-old woman comes to the physician because of a 2 cm mass in her right breast. Biopsy reveals an invasive ductal carcinoma. Which of the following is the most important prognostic factor?
(A) High grade tumor cytology
(B) Infiltrative nature of tumor into benign breast
(C) Numerous mitotic figures
(D) Amount of tumor fibrosis
(E) Presence of lymph node metastasis
(F) Number of plasma cells in tumor
A 63-year-old man comes to the physician because of a 6-week history of progressive dyspnea on exertion, orthopnea, and ankle edema. He has received multiagent chemotherapy for Waldenström’s macroglobulinemia for the past year. Urinalysis shows proteinuria. A bone marrow biopsy shows a partial response to therapy with ongoing marrow involvement still identified. Which of the following is the most likely diagnosis?
(A) Cardiac amyloidosis
(B) Viral myocarditis
(C) Cardiac sarcoidosis
(D) Myocardial infarct
(E) Hypertrophic cardiomyopathy
A question submitted

In aortic stenosis what other abnormal heart sounds might accompany the resulting murmur?
A. Physiological splitting of S2
B. An accentuated S2
C. Paradoxical splitting of S2
D. A muffled S2
Revised question

A 60-year-old patient with an active lifestyle is found to have a systolic murmur on a routine physical exam. He currently has no symptoms. If this were aortic stenosis, what other abnormal heart sounds might accompany the systolic murmur?
A. Physiological splitting of S2
B. An accentuated S2
C. Paradoxical splitting of S2
D. A muffled S2
Summary
  Use action verbs to write objectives
  Write your exam items based on the objectives
  Tie the clinical vignette to the lead-in
  Choose appropriate options with one best answer
  Avoid technical flaws
  Use an item checklist to ensure that you have done all you can to write the best items possible.
  Pretest your items

Establishing Validity and Reliability (Groups)
Standard Setting (Groups)
Graham McMahon
Item Discrimination: Examples

Number of students per group = 100

Item No.   Correct answers, upper 1/4   Correct answers, lower 1/4   Item Discrimination Index
1          90                           20                            0.7
2          80                           70                            0.1
3          100                          0                             1
4          100                          100                           0
5          50                           50                            0
6          20                           60                            -0.4
Distractor Analysis: Examples

Item 1                         A*    B     C     D     E     Omit
% of students in upper ¼       20    5     0     0     0     0
% of students in the middle    15    10    10    10    5     0
% of students in lower ¼       5     5     5     10    0     0

Item 2                         A     B     C     D*    E     Omit
% of students in upper ¼       0     5     5     15    0     0
% of students in the middle    0     10    15    5     20    0
% of students in lower ¼       0     5     10    0     10    0

(*) marks the correct answer.
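Tables like these can be screened automatically for distractors that deserve a second look. The sketch below uses the Item 2 figures and two rules of thumb chosen here for illustration, not taken from the slides: flag a distractor that nobody chose, and flag one chosen more often by the upper group than by the lower group. The function name is hypothetical.

```python
from typing import Dict, List


def distractor_report(table: Dict[str, Dict[str, float]], correct: str) -> List[str]:
    """Return notes on distractors that may need revision."""
    notes = []
    for option, row in table.items():
        if option == correct:
            continue  # only distractors are screened
        total = row["upper"] + row["middle"] + row["lower"]
        if total == 0:
            notes.append(f"{option}: never chosen")
        elif row["upper"] > row["lower"]:
            notes.append(f"{option}: chosen more by the upper group than the lower group")
    return notes


# Item 2 from the table above (% of students); D is the correct answer.
item2 = {
    "A": {"upper": 0, "middle": 0, "lower": 0},
    "B": {"upper": 5, "middle": 10, "lower": 5},
    "C": {"upper": 5, "middle": 15, "lower": 10},
    "D": {"upper": 15, "middle": 5, "lower": 0},
    "E": {"upper": 0, "middle": 20, "lower": 10},
}
print(distractor_report(item2, correct="D"))  # ['A: never chosen']
```

In Item 2, option A attracts no one and so does no work as a distractor, while the correct answer D is chosen mainly by the upper group, as intended.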