
e-Assessment: From
Implementation to Practice
Myles Danson
Farzana Khandia
Loughborough University
Learning Outcomes
• Identify the benefits of online assessment
• Understand the pros and cons of using
different question types
• Construct questions based on the advice
given
• Translate statistical output of questions to
improve the quality
Overview of CAA
“CAA is a common term for the use of
computers in the assessment of student
learning. The term encompasses the use of
computers to deliver, mark and analyse
assignments or examinations.” (Bull et al,
2001, p.8)
Benefits – For Staff
1. Reduces lecturer administration (marking)
   "... online assessment is capable of greater flexibility, cost effectiveness and timesaving. It is these attributes that have made it appealing in an age of competitive higher education funding and resource constraints." (McLoughlin, 2002, p. 511)
2. Supports distance learning assessment
3. Potential to introduce a richer range of material (audio, visual)
4. Ability to monitor the quality of the questions
5. Question reusability
Benefits – For Students
1. They can revise and rehearse at their own
pace
2. Flexible access
3. Instant Feedback
“Action without feedback is completely
unproductive for a learner.” (Laurillard, 1993, p.
61)
4. Alternative form of assessment
Types of CAA - Web
• Authoring via PC
• Delivery on PC / Web
Types of CAA - OMR (Optical Mark Recognition)
• Authoring / delivery on paper
• Marked using technology
• Example
Scenarios of use - Diagnostic
A one-off test at the start of the academic year, or at intervals throughout the year, to gauge:
1. Variety in student knowledge
2. Gaps in student knowledge
3. Common misconceptions
or to help:
4. Plan lecture content
Scenarios of use - Formative
• Promote learning by providing feedback
• Lecturers may wish to use objective tests at regular intervals within a course to:
  – determine which topics have been understood, or
  – motivate students to keep pace with the teaching of the module
Scenarios of use - Summative
Can be used to test the range of the student's understanding of course material.
• Norm- or criterion-referenced
• High risk
Demo
• Lboro Tests
• Bruce Wright Excel test
• Maths practice
Drivers
• Widening participation (student diversity)
• Increasing student retention
• Enhanced quality of feedback
• Flexibility for distance learning
• Coping with large student numbers
• Objectivity in marks / defensibility
• QCA / DFES / HEFCE / JISC
QCA
by 2009
• e-Assessment will be rolled out in post-16 education by
2009
• e-Assessment will make a significant contribution to
reducing the assessment burden and improving the
quality of assessment
• e-Assessment field trials should be taking place in at
least two subjects per awarding body during 2005
• 75% of key and basic skills tests will be delivered on
screen by 2005
• All new qualifications will include an option for on-screen
assessment
• All Awarding Bodies should be set up to accept and
assess e-portfolios
Barriers
• Availability of resources
• Lack of confidence in e-assessment
• Fear of technical failure in high-stakes assessments
• Work pressures on academic staff (insufficient time to give to developing the potential of e-assessment)
• Fit for purpose / assessing appropriate levels of learning?
• Authentication issues, e.g. learner identity, plagiarism
Support and Training
• Student support
– Practice tests and fall back
– Demonstration of the software
– Special needs
• Staff support
– Introductory session
– Follow-up session
– Pre, during and post-examination procedures
Question Design
It’s very easy to write objective questions but
difficult to write good objective questions.
Parts of a MCQ
A multiple choice question consists of four discrete elements: the STEM, the OPTIONS, the KEY and the DISTRACTERS.

STEM:
As children's language skills increase in complexity, from the pre-linguistic phase to telegraphic speech, the progression is most noticeable in the area of

OPTIONS:
a. semantics
b. intonation
*c. syntax (the KEY)
d. inference
e. clause combinations

The incorrect options (a, b, d and e) are the DISTRACTERS.
Writing Stems
• When possible, state the stem as a
direct question rather than as an
incomplete statement.
e.g.
Alloys are ordinarily produced by ...
How are alloys ordinarily produced?
• Avoid irrelevant clues such as grammatical structure, well-known verbal associations or connections between stem and answer.
e.g.
A chain of islands is called an:
*a. archipelago.
b. peninsula.
c. continent.
d. isthmus.
(Grammatical clue: the article "an" rules out "peninsula" and "continent".)
The height to which a water dam is built depends on
a. the length of the reservoir behind the dam.
b. the volume of water behind the dam.
*c. the height of water behind the dam.
d. the strength of the reinforcing wall.
(Clue: "height" in the stem connects directly to the correct answer.)
• Use negatively stated stems sparingly.
When used, underline and/or capitalise
the negative word.
e.g.
Which of the following is not cited as an
accomplishment of the Kennedy
administration?
Which of the following is NOT cited as an
accomplishment of the Kennedy
administration?
• Eliminate excessive verbiage or
irrelevant information from the stem.
e.g.
While ironing her formal, Jane burned her hand accidentally on the hot iron. This was due to a transfer of heat by ...
Which of the following ways of heat
transfer explains why Jane's hand was
burned after she touched a hot iron?
• Include in the stem any word(s) that might otherwise
be repeated in each alternative.
e.g.
In national elections in the United States the President is
officially
a. chosen by the people.
b. chosen by members of Congress.
c. chosen by the House of Representatives.
*d. chosen by the Electoral College.
In national elections in the United States the President is
officially chosen by
a. the people.
b. members of Congress.
c. the House of Representatives.
*d. the Electoral College.
• Present a definite, explicit and singular
question or problem in the stem.
e.g.
Psychology ...
The science of mind and behaviour is
called ...
Writing Distracters
• Use the alternatives "none of the above" and "all of the above" sparingly. When used, such alternatives should occasionally be used as the correct response.
• Ensure there is only one unquestionably correct answer
e.g.
The two most desired characteristics in a classroom test are validity and
a. precision.
*b. reliability.
c. objectivity.
*d. consistency.
The two most desired characteristics in a classroom test are validity and
a. precision.
*b. reliability.
c. objectivity.
d. standardisation.
• Make the alternatives grammatically parallel with each other,
and consistent with the stem.
What would do most to advance the application of atomic
discoveries to medicine?
*a. Standardised techniques for treatment of patients.
b. Train the average doctor to apply radioactive treatments.
c. Remove the restriction on the use of radioactive substances.
d. Establishing hospitals staffed by highly trained radioactive therapy
specialists.
What would do most to advance the application of atomic
discoveries to medicine?
*a. Development of standardised techniques for treatment of
patients.
b. Training of the average doctor in application of radioactive
treatments.
c. Removal of restriction on the use of radioactive substances.
d. Addition of trained radioactive therapy specialists to hospital
staffs.
• Make alternatives approximately equal in length.
e.g.
The most general cause of low individual incomes in the
United States is
*a. lack of valuable productive services to sell.
b. unwillingness to work.
c. automation.
d. inflation.
What is the most general cause of low individual
incomes in the United States?
*a. A lack of valuable productive services to sell.
b. The population's overall unwillingness to work.
c. The nation's increased reliance on automation.
d. An increasing national level of inflation.
• Make all alternatives plausible and attractive to the less
knowledgeable or skilful student.
e.g.
What process is most nearly the opposite of photosynthesis?
a. Digestion
b. Relaxation
*c. Respiration
d. Exertion

a. Digestion
b. Assimilation
*c. Respiration
d. Catabolism
• Vary the position of the correct answer in a random way
Exercise
Item Intimations
Question / Test Performance
• Caution is needed when using statistics
based on small samples.
• External awarding bodies and test
producers aim to trial items with at least
200 candidates to get reliable item
statistics
• If there are fewer than 50 candidates the
statistics will only give an approximate
indication of how the items are working
Statistics for whole test
• Mean
• Standard deviation
• Reliability.
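As a hypothetical illustration (not part of the original slides), the whole-test mean and standard deviation above can be computed directly from each candidate's total score; the score list and maximum mark below are invented for the sketch.

```python
# Illustrative sketch only: whole-test mean and standard deviation expressed
# as percentages of the maximum mark. The scores and max_score are hypothetical.
from statistics import mean, pstdev

def whole_test_stats(scores, max_score):
    """Return (mean %, standard deviation %) for a list of candidate totals."""
    mean_pct = mean(scores) / max_score * 100
    sd_pct = pstdev(scores) / max_score * 100   # population s.d. of the cohort
    return mean_pct, sd_pct

scores = [96, 104, 88, 101, 93]                 # hypothetical totals on a 120-mark test
mean_pct, sd_pct = whole_test_stats(scores, max_score=120)
print(f"Mean: {mean_pct:.1f}%  S.D.: {sd_pct:.1f}%")
```

A mean in the 65-75% band and a standard deviation of 10-15% would fall within the expected ranges given on the following slides.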
Mean
= Average mark
Expected value:
• 65-75% for most tests (50-75%
acceptable)
• Up to 90% in formative tests
• Mean should be similar to previous years.
Possible causes if Mean is
not as expected:
1. Candidates are more / less able than in past (or
than expected)
2. Candidates are better / less well taught
3. Test is too difficult / too easy
4. Time allowance was too short.
Standard deviation (s.d. or σ)
= a measure of the spread of the marks
Standard deviation (s.d. or σ)
Expected value:
• 10-15%
• Will be much lower if mean is high
• S.D. should be similar to previous years.
Possible causes if Standard
Deviation is low:
1. The candidates are genuinely of similar
ability
2. The test is not distinguishing satisfactorily
between the better and weaker
candidates
3. The test covers two or more quite
different topics (or abilities) .
Reliability
= a measure of internal consistency (0 – 1)
Given as a decimal; the theoretical
maximum is 1.0
Expected value:
• above 0.75
Possible causes if Reliability
is low:
1. Too many items with low Discrimination
2. Test is too short .
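The slides do not spell out how the reliability figure is computed; a sketch of one common internal-consistency measure, Kuder-Richardson 20 (the "K-R 20 reliability" quoted in the Beauty Therapy example below), is given here with an invented response matrix.

```python
# Illustrative sketch only: Kuder-Richardson 20 (KR-20) for dichotomously
# scored items. Rows are candidates, columns are items, 1 = correct, 0 = incorrect.
# The response matrix is hypothetical.
from statistics import pvariance

def kr20(responses):
    """KR-20 internal-consistency reliability (theoretical maximum 1.0)."""
    k = len(responses[0])                        # number of items
    totals = [sum(row) for row in responses]     # each candidate's total score
    total_variance = pvariance(totals)           # variance of the total scores
    pq_sum = 0.0
    for i in range(k):
        p = sum(row[i] for row in responses) / len(responses)   # item facility
        pq_sum += p * (1 - p)
    return (k / (k - 1)) * (1 - pq_sum / total_variance)

responses = [
    [1, 1, 0, 1],   # candidate 1
    [1, 0, 0, 1],   # candidate 2
    [1, 1, 1, 1],   # candidate 3
    [0, 0, 0, 1],   # candidate 4
    [1, 1, 1, 0],   # candidate 5
]
print(f"KR-20 reliability: {kr20(responses):.2f}")
```

A figure above 0.75 would meet the expectation quoted above; with only five hypothetical candidates the value is unstable, which echoes the earlier caution about small samples.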
Example: Beauty Therapy exam

Number of candidates:   151
Max possible score:     120
Range of scores:        63 / 112 (53-93%)
Mean:                   96.2 (80.2%)
Standard deviation:     9.96 (8.3%)
K-R 20 reliability:     not calculated
Pass rate:              96.7%
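The bracketed percentages express each raw figure against the maximum score of 120: for example, the mean of 96.2 marks is 96.2 / 120 ≈ 80.2%, and the standard deviation of 9.96 marks is 9.96 / 120 ≈ 8.3%.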
Statistics for each item:
• Facility
• Discrimination
• Number and % choosing each option
• Mean for outcome
Facility
Also called Difficulty or P-value
= Proportion or percentage of candidates
answering question correctly
= Proportion choosing the key (multiple-choice)
Expected value:
• 40-90% for multiple-choice
• May be lower in other item types.
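For example, in the option display shown later in the deck, 137 of 151 candidates choose the key, giving a facility of 137 / 151 ≈ 90.7%.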
Possible causes if Facility
below 40:
1. Key is not the best answer
2. Some candidates didn’t have time to answer
3. Topic is too difficult
4. The item is unclear or contains error
5. One or more distractors are unfair
6. Item is too complex.
Possible causes if Facility
above 90:
1. Topic is too easy
2. Wording contains clues
3. One or more distractors are weak.
Discrimination
Theoretically possible values from –1.0 to +1.0
Shows whether the candidates who answered
this item correctly were the candidates who
generally did better on the whole test
Expected value: +0.2 or above .
Possible causes if
Discrimination is too low:
1. Topic tested is unlike the rest of the test
2. Item is very easy (Facility > 90) or very difficult (Facility < 40)
3. Item is misleading
4. One or more distractors are unfair.
A negative Discrimination is never acceptable.
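The slides do not say which discrimination index the software reports; one widely used choice is the point-biserial correlation between the item outcome and the total test score, sketched below with invented data.

```python
# Illustrative sketch only: point-biserial discrimination, i.e. the correlation
# between an item's 0/1 outcome and candidates' total test scores. The slides do
# not name a specific index; the item and total scores here are hypothetical.
from statistics import mean, pstdev

def point_biserial(item_scores, total_scores):
    """Discrimination in the range -1.0 to +1.0 for a dichotomous (0/1) item."""
    correct = [t for i, t in zip(item_scores, total_scores) if i == 1]
    incorrect = [t for i, t in zip(item_scores, total_scores) if i == 0]
    p = len(correct) / len(item_scores)          # item facility
    sd = pstdev(total_scores)                    # spread of the total scores
    return (mean(correct) - mean(incorrect)) / sd * (p * (1 - p)) ** 0.5

item_scores = [1, 1, 0, 1, 0, 1]                 # hypothetical item outcomes
total_scores = [88, 95, 60, 102, 71, 90]         # hypothetical test totals
print(f"Discrimination: {point_biserial(item_scores, total_scores):+.2f}")
```

A value of +0.2 or above meets the expectation quoted earlier; a negative value means the weaker candidates were more likely to answer the item correctly.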
Display for each option

Option             A      B*     C      D
Number             10     137    1      3
%                  6.6    90.7   0.7    1.9
Mean for outcome   87.7   97.4   74     76

Facility: 90.7
Discrimination: 0.39

(B* marks the key.)
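Reading the display: the counts sum to 151 candidates (10 + 137 + 1 + 3), the same number as in the Beauty Therapy example; each percentage is the count divided by 151 (for instance 137 / 151 ≈ 90.7%); the facility of 90.7 therefore equals the percentage choosing the key B; and, as expected, the candidates who chose the key have the highest mean on the whole test (97.4 against 87.7, 74 and 76 for the distractors).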
Number and % choosing each
option
Also called Frequency and Percent or ‘Times
answered’
Expected values:
• at least 5% for each distractor
• % for distractor should not be more than %
for the key
Possible causes if distractor
chosen is under 5%:
1. There is a clue in the wording or the
distractor is silly
2. Topic is very easy.
Possible causes if distractor
attracts more candidates than
the key:
1. Item is misleading in some way.
Mean for outcome
Mean (average) mark on the whole test of the
candidates who chose this option.
Expected value:
• Mean for key (or correct outcome) should be
higher than Mean for each of the distractors
(or for incorrect outcome).
Possible causes if Mean for
outcome not as expected:
1. Item wording is misleading or the key
is arguable
2. Item is either very easy or very
difficult.
Recent Innovations
• Confidence Assessment
• Adaptive Testing
• Free Text
  – www.sagrader.com
  – www.intelligentassessment.com
• JISC
  – e-Assessment Glossary
  – e-Assessment Case Studies
  – e-Assessment Road Map
• www.caaconference.com/
Questions?
References
• Bull, J. and McKenna, C. (2001). Blueprint for Computer-assisted Assessment. CAA Centre.
• Laurillard, D. (1993). Rethinking University Teaching: A Framework for the Effective Use of Educational Technology. London; New York: Routledge.
• McLoughlin, C. (2002). "Editorial." British Journal of Educational Technology 33(5): 511-513.
Further Reading
• Introduction to e-assessment
  – Bull and McKenna (2000), Blueprint for CAA (CAA Centre), 8-10.
• CAA Benefits for Staff and Students
  – Brown, Race & Bull (1999), Computer Assisted Assessment in Higher Education (Kogan Page), 7-8.
  – Sambell et al (1999) Students' Perception of the learning benefits of computer-assisted assessment: a case study in electronic engineering, in: S. Brown, J. Bull & P. Race (Eds) Computer-assisted Assessment in Higher Education (Birmingham, SEDA), 179-191.
  – Ashton, H.S. et al (2003) Pilot summative web assessment in secondary education, Proceedings of 7th CAA Conference (Loughborough University), 33-44.
• Item Banks
  – Sclater et al (2004) Item Banks Infrastructure Study (JISC).
• Student Acceptance
  – Warburton (2005) Whither E-assessment?, Proceedings of 9th CAA Conference (Loughborough University), 471-482.
• Future Developments
  – Boyle, A. (2005) Sophisticated Tasks in E-Assessment: What are they and what are their benefits? Proceedings of 9th CAA Conference (Loughborough University), 51-66.