LouTalk - Measured Progress


Exploring Skills Diagnostic
Opportunities at Measured Progress
Lou DiBello
William Stout
Meetings with Measured Progress
February 11-12, 2008
I. Overview, Goals, Purpose
Establish a clear conceptual framework
and language for understanding and
discussing diagnostic assessment
Identify practical steps for developing
diagnostic assessments
Consider challenges
Explore possibilities for collaborative
work between IAI or AIARE and MP
Learning Science Research Institute--UIC--Informative Assessment Initiative
Presentation Agenda
I. Overview, goals, purpose
II. Background
III. Assessment as evidentiary system
IV. Practical Steps and Challenges
V. Possibilities for Collaborative Work
VI. Wrap-up

II. Background
Who we are
Our primary expertise is theoretical and
applied psychometrics
Our primary interest is broader: to develop
the engineering science of diagnostic
assessment

In addition to science and theory, we are
focused on practical issues: costs, production,
sustainability, scalability, implementation,
evaluation, dissemination
Who we are-Bill
Bill is Professor Emeritus, Dept. of Statistics,
University of Illinois at Urbana-Champaign
Co-lead of Informative Assessment
Initiative in the Learning Sciences Research
Institute, University of Illinois at Chicago
Co-founder of (LLC) Applied Informative
Assessment Research Enterprises (AIARE).
Past director of ETS External Diagnostic
Research Team (the X Team)
Who we are-Lou
Lou is Co-lead of Informative Assessment
Initiative
Research Professor and Associate Director of
Learning Sciences Research Institute,
University of Illinois at Chicago
Co-founder of (LLC) Applied Informative
Assessment Research Enterprises (AIARE)
Former Director ETS Profile Scoring Initiative
—Contract Manager for the X Team
Who we are-Bill and Lou
Bill:
Distinguished psychometrician; past president of
the Psychometric Society
NCME scientific award winner for foundational
work in skills diagnostic modeling, dimensionality,
and item and test bias detection
Lou:
Recently served as a research director within the
testing industry
Directed effort to operationalize diagnostic
assessment for a large scale operational
assessment
Our Affiliations
Informative Assessment Initiative (IAI)
One of three initiatives that make up the Learning
Sciences Research Institute (LSRI) at UIC
LSRI directed by Jim Pellegrino & Susan Goldman
The other two initiatives are Cognitive Science and
Math and Science Education
Applied Informative Assessment Research Enterprises (AIARE), a new LLC that owns and licenses Arpeggio software
Joint Work: Bill, Lou (& Louis)
Pursuing research and development at the forefront
of a new skills diagnostic psychometric research area
Invited co-editors of an upcoming special issue of the Journal of Educational Measurement on skills diagnosis
Served as invited co-authors of a foundational paper
on psychometric approaches to cognitive diagnostic
assessment, just published in the Handbook of
Statistics (DiBello, Roussos & Stout, 2007)
Other publications in refereed academic journals
Directed numerous research and development
projects, both within academia and private sector

III. A View of Assessments
as Evidentiary Systems
A View of Assessment Design
Assessment as Evidentiary System
Assessment design is deciding: “…how one wants to frame inferences about students, what data one needs to see, how one arranges situations to get the pertinent data, and how one justifies reasoning from the data to inferences about the student.” (Junker)
Integrated Classroom or
Learning Environment
[Diagram: Instruction, Assessment, and Curriculum as interacting components of the classroom]
Integrated Classroom or
Learning Environment
[Diagram: the assessment triangle (Cognition, Observation, Interpretation) overlaid on Instruction, Assessment, and Curriculum]
Assessment Triangle (Pellegrino et al.)
[Diagram: the assessment triangle linking Cognition, Observation, and Interpretation]
Comprehensive View of Assessment
Assessment conceptually involves:
Cognition
Curriculum design
Instruction
Teaching practice
Teacher preparation
Psychometrics
Assessment design
Testing Industry Marketing and Implementation
Validity—Thinking about Assessment Quality and Value
Level 1: test design was soundly based on cognitive principles—“inner” and “outer”
Level 2: test meets quantitatively defined
requirements for internal diagnostic quality
Level 3: independent confirmation, outside
the test, demonstrates that test-based
diagnostic skills inferences are accurate—
includes protocol studies and criterion validity
Level 4: consequential validity: proper use
of assessment and differential instruction leads
to improved teaching and learning
Validity Studies
Level 1: design—expert analyses
Level 2: internal diagnostic quality—gather
data and compute reliability and fit
Level 3: independent confirmation—
includes protocol studies and criterion validity
Level 4: consequential validity—studies of
learning outcomes, teacher practices, teacher
preparation
Practical Assessment Validity
Assessment validity provides a conceptual
framework for thinking about diagnostics
Validity studies are expensive, and it is not
practical to address very many of the aspects
of validity at once. A reasonable strategy is to
identify specific validity targets to address as
part of diagnostic development and stage
them over time

IV. Practical Steps and
Challenges in Developing
Successful Skills
Diagnostic Assessments
Implementation Paradigm
Describe assessment purpose
Describe a model for the skills space
Develop and analyze the assessment items
Specify an appropriate psychometric model
linking observable performance to latent skills
Select statistical methods for model
estimation and evaluating the results
Develop methods for reporting assessment
results to examinees, teachers, and others
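The "psychometric model" step can be sketched with the DINA model, a simpler conjunctive cousin of the Fusion Model discussed below; the slip and guess parameters here are invented for illustration, not Fusion Model estimates:

```python
import numpy as np

def p_correct(alpha, q_row, slip, guess):
    """DINA-style item response probability: an examinee whose mastery
    vector alpha covers every skill the Q-matrix row requires answers
    correctly with probability 1 - slip; otherwise with probability guess."""
    masters_all = bool(np.all(alpha >= q_row))
    return 1 - slip if masters_all else guess

alpha = np.array([1, 0, 1, 1, 0])   # examinee masters skills 1, 3, and 4
q_row = np.array([1, 0, 1, 1, 0])   # item requires skills 1, 3, and 4
print(p_correct(alpha, q_row, slip=0.1, guess=0.2))  # 0.9
```

The Fusion Model generalizes this all-or-nothing link, but the sketch shows the core idea: observable item performance is tied to latent skill mastery through the Q matrix.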
Walking through the Steps
The next few slides walk through the steps of the Implementation Paradigm:
Purpose
Skills space
Tasks/items
Formative Reports
Psychometric Model: Fusion Model
Model calibration: Arpeggio
Diagnostic Assessment Purposes
Provide timely information about students’
learning and understanding
Support teachers, learners, parents
Support teacher actions, decisions, planning
track students’ progress toward standards
diagnose deficiencies
group by skill profiles for instruction and practice
Curriculum evaluation and planning
Skills Framework
A cognitive diagnostic model (e.g., the Fusion Model) requires item-skills links as input
The skills framework is the set of skills selected for measurement and reporting
For K-12 classrooms, the skills must be:
aligned with standards and curriculum
aligned with teacher actions
supportable statistically
Q matrix—encodes the
skills required for each item
Items=rows
Skills=columns
1

1
0

Q  0
0

0
0

0 0 1 0

0 1 1 0
1 1 1 0

1 1 1 0
1 1 1 0

0 0 0 1
0 0 0 1 
7x5 matrix
For example: Item 2 requires
skills 1, 3, and 4
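The Q matrix above can be represented directly in code. A minimal sketch (the matrix values come from the slide; the helper function is illustrative, not part of Arpeggio):

```python
import numpy as np

# The 7x5 Q matrix from the slide: rows are items, columns are skills.
# Q[i, k] = 1 means item i+1 requires skill k+1.
Q = np.array([
    [1, 0, 0, 1, 0],   # item 1 requires skills 1 and 4
    [1, 0, 1, 1, 0],   # item 2 requires skills 1, 3, and 4
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
])

def skills_required(q, item):
    """Return the 1-based skill numbers that an item requires."""
    return [k + 1 for k, needed in enumerate(q[item - 1]) if needed]

print(skills_required(Q, 2))  # [1, 3, 4]
```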
Skills Example—PTS3 Reading
A good starting point for PTS3 Reading skills is:
Skill 1: Literary
Skill 2: Informational
Skill 3: Comprehension & Analysis
Skill 4: Reading Process & Language Skills
PTS3-Math Initial Skills
A good starting point for PTS3 Math skills is:
Skill 1: Numbers and Operations
Skill 2: Algebra
Skill 3: Geometry & Measurement
Skill 4: Data Analysis & Probability
Skills—Practical Constraints
Alternative skills representations may be
supported within the substantive literature
Theory may suggest that 100 skills influence performance within a particular mathematics test domain. A 50-minute assessment cannot accurately measure 100 skills, and teachers could not manage 100-skill diagnostic profiles for each student
Skills must be simultaneously comprehensive, of “coarse” granularity, and aligned with standards, curriculum, and instruction
Skills Pragmatics—Focus
Developing skills frameworks is usually a creative act. A small number of foundational or core skills must be determined that are:
Important and useful to measure
Statistically supportable by the assessment
So that other skills can be ignored with impunity
Think of this as focusing the assessment
design in light of the diagnostic purpose—
assumptions about what to measure and
what to “ignore”
Diagnostic “Score Reports”
A key component of diagnostic assessment is the “score report,” construed broadly as any and all information presented to users as a result of assessment performance
A diagnostic assessment reports a profile of scores
such as mastery/nonmastery on each skill
In addition, the score report can and should include information that promotes better teaching and learning:
possible action steps for teacher or learner
suggestions to student for improvement
interpretive information
Score Reporting Statistics
An Arpeggio analysis produces (as noted in Bill’s Monday presentation):
Item/skill level parameters
For each student a posterior probability of mastery for each
skill
For each student, a classification of master/non-master for
each skill based on the above posterior probability
Examinee probability distribution on the skill space
Estimates of skill classification accuracy
Fit statistics
The skills profiles are based on the second and third outputs above
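A minimal sketch of how posterior mastery probabilities turn into a skills profile; the probability values and the 0.5 cutoff are assumptions for illustration, not Arpeggio output or Arpeggio's rule:

```python
import numpy as np

# Hypothetical posterior probabilities of mastery from a calibration run:
# one row per student, one column per skill.
posterior = np.array([
    [0.92, 0.35, 0.71, 0.10],
    [0.48, 0.88, 0.64, 0.97],
])

# Classify master/non-master by thresholding each posterior probability;
# the 0.5 cutoff is assumed here purely for illustration.
profiles = posterior >= 0.5

for s, row in enumerate(profiles, start=1):
    labels = ["master" if m else "non-master" for m in row]
    print(f"student {s}: {labels}")
```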
Skills Classification Accuracy
The Fusion Model and Arpeggio provide several estimated indices of skills classification accuracy or reliability:
CCR=individual skill correct classification rate
TCR=test-retest consistency rate (like classical
reliability)
Skill Pattern correctness or consistency rates
As is the case for standard unidimensional
IRT reliability, these measures are internal to
the model and the data—no external criteria
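What a correct classification rate measures can be illustrated with a toy simulation; note that in an operational analysis the true mastery states are unobserved, and Arpeggio estimates the rate internally from the fitted model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy illustration: simulate "true" mastery states for one skill, then
# classifications that disagree with the truth about 10% of the time.
true_mastery = rng.integers(0, 2, size=1000)
flip = rng.random(1000) < 0.10
classified = np.where(flip, 1 - true_mastery, true_mastery)

# CCR for this skill = proportion of examinees classified correctly;
# by construction it comes out near 0.90 here.
ccr = float(np.mean(classified == true_mastery))
print(f"CCR: {ccr:.3f}")
```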
Evaluating the Assessment
Once the model is calibrated, we estimate the skills classification accuracy and calculate certain measures of fit that are directly relevant to the diagnostic purpose of the assessment. Both reflect on:
Which skills are selected and their definitions
Skill codings in Q matrix
Model suitability
Statistical analysis procedures employed
Model-Data Fit
We evaluate model-data fit by computing fit indices directly relevant to the diagnostic purpose. Considering MCMC convergence, item parameter values and fit, we examine:
Are the items appropriate and of “good quality”?
Are the skills framework and Q matrix appropriate?
Is the test “well designed”—enough good items for each skill; no fatal information-blocking in the Q matrix; good alignment between difficult items and difficult skills; other aspects of good design?
Are any aspects of the model suspect?
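One simple example of a diagnostically relevant fit check is comparing each item's observed proportion correct with the proportion the calibrated model predicts; the numbers and the 0.05 tolerance below are invented for illustration:

```python
import numpy as np

# Hypothetical observed proportion-correct per item versus the proportion
# the calibrated model predicts (values invented for illustration).
observed = np.array([0.72, 0.55, 0.61, 0.48, 0.80])
predicted = np.array([0.70, 0.58, 0.60, 0.39, 0.79])

# Flag items whose absolute misfit exceeds an assumed tolerance of 0.05.
misfit = np.abs(observed - predicted)
flagged = (np.where(misfit > 0.05)[0] + 1).tolist()  # 1-based item numbers
print("items flagged for misfit:", flagged)  # [4]
```

A flagged item would prompt a second look at its Q-matrix coding, its wording, or the skills framework itself.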

V. Possibilities for
Collaborative Work
Status of Diagnostic Research
DiBello and Stout have collaborated with
other researchers, including Louis Roussos
Their studies provide a scientific and applied
foundation for cognitive diagnostic research
The IRT-based skills-diagnostic Fusion Model has been developed, along with software called Arpeggio that calibrates the Fusion Model using Markov Chain Monte Carlo (MCMC) statistical methodology
“X Team” 3-year R&D output
Arpeggio R&D was directed within ETS by DiBello and externally by Stout
46 Research Studies:
18 studies on modeling issues
4 studies on skills-level linking methods
4 studies on skills-level reliability
2 studies on techniques for data-model fit
10 applied studies
8 theoretical studies backing the algorithms
5 Descriptions of Algorithms and software code
12 Sets of user documentation
Assets and Resources
An estimated $12M of investment underlies the development of the Arpeggio software system and its underlying theory, research studies, and analyses
Resources:
Informative Assessment Initiative within LSRI-UIC (IAI)
Applied Informative Assessment Research Enterprises (AIARE), an LLC
Ownership of Arpeggio software and broad rights to license the patent
Louis Roussos of MP is a major researcher, inventor, collaborator, and developer of Arpeggio
Current Status of Arpeggio
AIARE owns copyright and trademark to
all Arpeggio software and has
unconstrained access to patent rights,
including right to license them to others
Practical reality: freedom to fashion any
agreement that is mutually beneficial to
MP and AIARE
ETS is guaranteed a share of royalties
IAI Current Activities (as background)
NSF project to do formative assessment using established math curricula ($3M, funded)
IES proposal for classroom assessment ($2M, applied for)
More grants likely to be applied for concerning skills
level formative and embedded assessments (testing as
integral part of curricular learning process)
Upgrade and expand capabilities of Arpeggio and the
Fusion Model (technical grant proposals planned)
Develop, upgrade, and disseminate the engineering
science of diagnostic assessment in educational settings
Work with testing companies, such as ETS and CTB
IAI Project Ideas (some may be of interest to MP)
Developing Specific Diagnostic Assessments &
Pilot Trials
The Practice of Developing Lists of Skills for
Diagnostic Measurement & Reporting
Assessment-Curriculum-Instruction Linkages
Diagnostic Validity Studies
Foundational and Applied Psychometric
Diagnostic Research
Diagnostic Assessment Design
Develop diagnostic scoring capability for PTS3
and other existing tests
Design new diagnostic tests
Needs and capacity analyses
What market needs exist?
How might diagnostic assessment help teachers and learners, directly in the classroom or indirectly through summative or accountability tests?
What capacity do teachers and curricula have to incorporate and use diagnostic assessment?
Planned Foundational and Applied Diagnostic Psychometric Research
Diagnostic Modeling
Skills-level assessment accuracy
Model-data Fit
Computational speed and performance
Efficacy Studies
Group-level diagnostic survey testing a la NAEP
Embedded Assessments
Growth Modeling
Concrete Possibilities with MP
Our proposal is that Measured Progress and we explore possible cooperation to help MP bring to fruition its strong interest in skills diagnosis
This seems like a superb opportunity to pursue:
Turn PTS3 in stages into a skills diagnostic test
Grants/contracts
Collaboration on research projects of joint interest
Explore diagnostic applications to state tests
AERA/NCME proposals
VI. Wrap-up
We are mapping the dimensions of what we envision as a new engineering science of diagnostic assessment
Focused on supporting teachers and learners, school
districts, state departments of education
With due attention to sustainability and scalability to
support commercial and operational success
As a natural mode of dissemination, we are appealing
especially to testing companies interested in
assessment products and services that improve
teaching and learning
Discussion
Next Steps