NCSA Presentation 2015 - Artificial Intelligence Scoring

Artificial Intelligence Scoring of Student Essays:
West Virginia’s Experience
June 22, 2015
Vaughn G. Rhudy, Ed.D., NBCT
Office of Assessment
West Virginia Department of Education
West Virginia Writing Assessment
1984-2004
• Statewide writing assessment began in 1984.
• Traditional paper-pencil assessment administered from
1984-2004.
o Grades 4, 7 and 10
o Approximately 20,000 students per grade level
o Hand scored
o Grade-level rubrics for scoring
o Modified holistic scoring on 4-point scale
o Four genres – narrative, descriptive, expository, persuasive
o Results not included as part of state accountability data
West Virginia Writing Assessment
1984-2004 Grade 4 Rubric
West Virginia Writing Assessment
1984-2004 Grades 7 and 10 Rubric
WV Online Writing Assessment
2005-2007
• Online Writing Assessment from 2005-2007, except grade 4.
• Paper-pencil test in grade 4
o Hand scored
• Computer-based assessment in grades 7 and 10
o Artificial intelligence engine scoring
• Approximately 20,000 students per grade level
• Grade-level rubrics for scoring
• Analytic trait scoring on a 6-point scale
o Five traits – Organization, Development, Sentence Structure,
Word Choice, Mechanics
• Four genres – narrative, descriptive, informative, persuasive
• Results not included as part of state accountability data
• Scores on each analytic trait added to obtain a Summative Score
and Performance Level
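The summative score described above is a plain sum of the five analytic trait scores. The following is a minimal sketch in Python, assuming each trait is scored as an integer from 1 to 6 (the slides say only "6-point scale"). The function name and the sample scores are hypothetical. The cut scores that map a total to a Performance Level are not given in the slides, so that step is left out.

```python
# Trait names come from the 2005-2007 rubric; everything else is illustrative.
TRAITS = (
    "Organization",
    "Development",
    "Sentence Structure",
    "Word Choice",
    "Mechanics",
)


def summative_score(trait_scores):
    """Add the five analytic trait scores (assumed to be integers 1-6)."""
    missing = [t for t in TRAITS if t not in trait_scores]
    if missing:
        raise ValueError(f"missing trait scores: {missing}")
    for trait in TRAITS:
        score = trait_scores[trait]
        if not 1 <= score <= 6:
            raise ValueError(f"{trait} score {score} is outside the assumed 1-6 range")
    return sum(trait_scores[t] for t in TRAITS)


# Hypothetical essay scores, invented for this example.
example = {
    "Organization": 4,
    "Development": 3,
    "Sentence Structure": 4,
    "Word Choice": 5,
    "Mechanics": 3,
}
total = summative_score(example)
print(total)
```

Under the 1-to-6 assumption, totals range from 5 to 30.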
WV Online Writing Assessment
2007 Grade 4 Writing Prompt
• Imagine that you are on a magic carpet that takes you
anywhere you want to go. Tell about where you might
go and what you might do.
WV Online Writing Assessment
2005-2007 Grade 7 Rubric
WV Online Writing Assessment
2005-2007 Grade 10 Rubric
WV Online Writing Assessment
Initial Challenges
• Bandwidth - Connectivity
• Number of testing devices/computer labs
• Computer classes in labs
• Security updates
• Length of testing window
• Concerns about keyboarding skills, particularly younger
students
• Validity and reliability of artificial intelligence scoring
engine
WV Online Writing Assessment
Actions
• State and districts increased bandwidth.
• State and districts increased the number of testing
devices.
• Nine-week testing window was established to address
technology concerns and reduce daily testing load.
o Window spanned from February to April.
• From 2005-2007, fourth graders continued paper-pencil
testing because of concerns about keyboarding skills.
• State engaged teachers in reviewing computer scoring
to help with teacher buy-in.
WV Online Writing Assessment
New Online Writing Assessment
Field Test - 2008
• Expanded to grades 3-11
o Hand scored
• Approximately 20,000 students per grade level
• Grade-level rubrics for scoring
• Analytic trait scoring on a 6-point scale
o Five traits – Organization, Development, Sentence
Structure, Word Choice/Grammar Usage, Mechanics
• Four genres – narrative, descriptive, informative and
persuasive
o Only narrative and descriptive at grade 3
• Passages added to prompts
• Results not included as part of state accountability data
WV Online Writing Assessment
2008 Field Test
• New prompts with passages
• 136 prompts field tested – 2 genres at grade 3, 4
genres at grades 4-11, 4 prompts per genre
• 2 operational prompts selected per genre
• New grade-specific, 6-point analytic writing rubrics
• All student essays were hand scored
• State staff and selected teachers participated in range
finding
• Hand scored essays used to train new AI scoring
engine
WV Online Writing Assessment
2009-2014 Sample Grade 3 Descriptive Writing Prompt
WV Online Writing Assessment
This is where you will begin typing your essay. At the end of the paragraph, hit the
enter key at least once to skip a line between paragraphs.
Do not hit the tab key to indent your paragraph. It will not work.
WV Online Writing Assessment
2009-2014 Grade 7 Rubric
WV Online Writing Assessment
Grade 3 Student Survey
• 85 percent of grade 3 students indicated they
preferred writing their essays on the computer over
using traditional paper-pencil.
WV Online Writing Assessment
WESTEST 2 Online Writing
Assessment – 2009-2014
• Grades 3-11
o Artificial intelligence engine scoring
• Approximately 20,000 students per grade level
• Grade-level rubrics for scoring
• Analytic trait scoring on a 6-point scale
o Five traits – Organization, Development, Sentence
Structure, Word Choice/Grammar Usage, Mechanics
WV Online Writing Assessment
WESTEST 2 Online Writing
Assessment – 2009-2014
• Four genres – narrative, descriptive, informative and
persuasive
o Only narrative and descriptive at grade 3
• Passages/prompts
• Results not included as part of state accountability data
• Online formative assessment practice program available
for schools to use
WV Online Writing Assessment
Later Challenges
• Bandwidth/Connectivity
o Continued in some districts and schools but improved
overall
• Number of testing devices/computer labs
o Continued in some districts and schools but improved
overall
• Computer classes in labs
o Continued to be an issue but improved overall
• Browser updates
o Test platform only allowed the use of Internet Explorer
o Microsoft auto updates sometimes created problems
• Accuracy and reliability of AI scoring in practice program
o Created lack of confidence in summative scoring engine
WV Online Writing Assessment
Formative Assessment Practice
Program – 2009-2014
• Writing Roadmap – shelf product
o Shelf prompts
o Shelf rubric
o AI scoring
• West Virginia Writes – customization of Writing
Roadmap for West Virginia
o WV passages and prompts (field tested)
o WV writing rubrics
o Student responses from field test used to train AI
engine
WV Online Writing Assessment
AI Scoring Challenges
• Teacher buy-in and understanding of AI scoring
• Field testing sufficient number of prompts
o WV lost some prompts during psychometric analysis
resulting in the need to repeat prompts in alternate
years
• Rubric development for use in AI scoring
• Initial hand scoring
• Range finding
• Training sets
• Sufficient number of student responses to train engine
o Particularly finding sufficient number of student
responses scored in the high range
Artificial Intelligence Scoring
Scoring Reliability
• Validation Papers/Iterations
• Second Reads
• Comparability Studies
Artificial Intelligence Scoring
Importance of Comparability
• Engine to Professional Hand Scorers
• Engine to West Virginia Teachers
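Comparisons between engine scores and human scores are commonly summarized with agreement rates. The following is a minimal sketch in Python, assuming paired scores for the same essays. The slides do not say which statistics West Virginia or its vendor used, so the exact and within-one-point agreement rates shown here are illustrative choices, and the function name and sample scores are invented for the example.

```python
def agreement_rates(engine_scores, human_scores):
    """Return (exact, within_one) agreement rates for paired scores.

    exact:      share of essays where both scores are identical
    within_one: share of essays where the scores differ by at most 1 point
    """
    if len(engine_scores) != len(human_scores) or not engine_scores:
        raise ValueError("score lists must be non-empty and the same length")
    diffs = [abs(e - h) for e, h in zip(engine_scores, human_scores)]
    n = len(diffs)
    exact = sum(d == 0 for d in diffs) / n
    within_one = sum(d <= 1 for d in diffs) / n
    return exact, within_one


# Hypothetical scores on a 6-point scale, invented for this example.
engine = [4, 3, 5, 2, 6, 4, 3, 4]
human = [4, 3, 4, 2, 4, 4, 3, 5]
exact, within_one = agreement_rates(engine, human)
print(exact, within_one)
```

The same function could be run twice, once against professional hand scorers and once against West Virginia teachers, to compare the two sets of rates side by side.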
WV Online Writing Assessment
Vendor Validation
WV Online Writing Assessment
WV Comparability Studies
WV Online Writing Assessment
Benefits of Teacher Participation
• Professional development in using rubrics for hand
scoring of student essays
• Improvement of instructional practices
• Teacher buy-in of artificial intelligence scoring
Artificial Intelligence Scoring
Considerations
• Involve teachers in prompt and rubric
development
• Pilot testing and field testing important
• Sufficient number of prompts should be included
in field test depending on sample size
• Include teachers in range finding
• Sufficient number of essays at each score point
necessary to train engine, particularly for highest
score point
Artificial Intelligence Scoring
Considerations
• Quality of training sets important
• Engine must be calibrated to the scoring rubric(s)
• Engine training is key
• Vendor validation and read-behind studies
• State comparability studies with state teachers
• Ongoing engine training to account for potential drift
• Provide practice program for teachers and students
• Professional development for state teachers
Artificial Intelligence Scoring
Scoring Strengths and Weaknesses
Human Scoring                           | Engine Scoring
Scoring accuracy dependent on training  | Scoring accuracy dependent on training
Get tired, hungry, bored                | Doesn’t get tired, hungry, bored
Individual Scorer Bias                  | No Bias
Easier to train – quicker               | More difficult to train – time-consuming
Can make inferences                     | Has difficulty with inferences
Slow, expensive scoring                 | Quick, less expensive scoring