
Foundations of Psychological Testing
Chapter 11: Piloting and Revising Tests
The Pilot Test
• Pilot test: a scientific investigation of a new test’s reliability and
validity for its specified purpose
• The purpose of the pilot test is to study how well the test performs
• Set the pilot test in an environment that matches the actual
circumstances in which the test will be used; the sample should
resemble the test’s target audience
• Follow the APA code of ethics (Appendix B of the text)
• Conducting the pilot test may require gathering extra data, such as
measures of performance and the length of time needed to complete the
test; questionnaires or interviews may be used to gather data from
respondents about the test
• Analyze the results using statistical procedures: internal consistency
can be estimated (see the sketch after this list), qualitative and
quantitative information can be gathered to guide revisions, and the
performance of each test item can be evaluated (item analysis)
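For instance, internal consistency is often estimated with Cronbach’s alpha (the specific coefficient is an assumption; the slide does not name one). A minimal sketch in Python, assuming pilot responses have already been scored 1 (correct) or 0 (incorrect); the data are invented for illustration:

```python
def cronbach_alpha(scores):
    """Estimate internal consistency from pilot-test item scores.

    scores: one row per test taker, each row a list of 0/1 item scores.
    alpha = (k / (k - 1)) * (1 - sum(item variances) / variance of totals)
    """
    k = len(scores[0])   # number of items

    def variance(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical pilot data: 5 test takers x 4 items
pilot = [[1, 1, 0, 1],
         [1, 0, 0, 1],
         [0, 0, 0, 0],
         [1, 1, 1, 1],
         [1, 1, 0, 0]]
print(round(cronbach_alpha(pilot), 2))   # 0.73
```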
Quantitative Item Analysis
Item difficulty
• Percentage of test takers who respond correctly (p value = number of
correct responses divided by total number of responses)
• Seek an average p value of about .5 (0–.2 = too difficult;
.8–1.0 = too easy)
• The p value thus indicates how easy the item is: the higher the
value, the more test takers answered it correctly (see the sketch
below)
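A minimal sketch of the p value computation in Python; the difficulty bands come from the slide, while the responses themselves are invented:

```python
def item_difficulty(responses):
    """p value: proportion of test takers answering the item correctly."""
    return sum(responses) / len(responses)

def difficulty_label(p):
    """Apply the rough bands from the slide; aim for an average near .5."""
    if p <= 0.2:
        return "too difficult"
    if p >= 0.8:
        return "too easy"
    return "acceptable"

# Hypothetical responses to one item (1 = correct, 0 = incorrect)
item = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
p = item_difficulty(item)
print(p, difficulty_label(p))   # 0.7 acceptable
```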
Quantitative Item Analysis
Item Discrimination
• The discrimination index measures how test takers with a high degree
of the skill, knowledge, attitude, or personality characteristic being
measured differ from those who demonstrate little of it
• D = U − L, where U and L are the proportions of correct responses in
the upper and lower thirds of the total-score distribution (see the
sketch below)
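A minimal sketch of D = U − L in Python, using the upper and lower thirds of the total-score distribution as the slide describes; the data are invented:

```python
def discrimination_index(total_scores, item_responses):
    """D = U - L using the upper and lower thirds of the score distribution.

    total_scores: each test taker's total test score.
    item_responses: the same test takers' 0/1 responses to one item.
    """
    ranked = sorted(zip(total_scores, item_responses), key=lambda t: t[0])
    third = len(ranked) // 3
    lower = [resp for _, resp in ranked[:third]]
    upper = [resp for _, resp in ranked[-third:]]
    u = sum(upper) / len(upper)   # proportion correct in upper third
    l = sum(lower) / len(lower)   # proportion correct in lower third
    return u - l

totals = [12, 18, 9, 20, 15, 7, 17, 11, 19]   # hypothetical total scores
item   = [1,  1,  0, 1,  1,  0, 1,  0,  1]    # responses to one item
print(round(discrimination_index(totals, item), 2))   # 1.0
```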
Quantitative Item Analysis
Inter-Item Correlations
• Construct an inter-item correlation matrix
• Code items as dichotomous variables: correct (1) or incorrect (0)
• Phi coefficients are used because each variable takes only two
values (see the sketch below)
• Provides information for increasing the test’s internal consistency
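A minimal sketch in Python of the phi coefficient for one pair of dichotomous items; the full inter-item matrix simply holds this value for every pair of items. The data are invented:

```python
import math

def phi(x, y):
    """Phi coefficient between two 0/1 item-score vectors.

    phi = (ad - bc) / sqrt((a+b)(c+d)(a+c)(b+d)), where a..d are the
    cell counts of the 2x2 table of joint responses.
    """
    a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)
    b = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)
    c = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1)
    d = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 0)
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom if denom else 0.0

item1 = [1, 1, 0, 1, 0, 1, 0, 0]
item2 = [1, 0, 0, 1, 0, 1, 1, 0]
print(round(phi(item1, item2), 2))   # 0.5
```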
Quantitative Item Analysis
Item-Criterion Correlations
• Correlation of item responses with a criterion measure, such as job
performance (see the sketch below)
• Empirically based tests: the decision for category placement is
based on the quantitative relationship between the predictor (test
score) and the criterion (possible categories)
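A minimal sketch in Python, assuming the item-criterion correlation is computed as a point-biserial correlation (the Pearson r between a 0/1 item score and a continuous criterion); the data are invented:

```python
import math

def pearson_r(x, y):
    """Pearson correlation; with a 0/1 item this is the point-biserial r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy)

item_scores = [1, 0, 1, 1, 0, 1, 0, 1]                   # 0/1 item responses
job_ratings = [4.2, 2.5, 3.9, 4.8, 3.0, 4.1, 2.2, 3.7]   # hypothetical criterion
print(round(pearson_r(item_scores, job_ratings), 2))
```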
Quantitative Item Analysis
Item Characteristic Curves
• Item Response Theory (IRT) relates the performance of each item to a
statistical estimate of the test taker’s ability on the construct
being measured
• An Item Characteristic Curve (ICC) is the line that results when the
probability of answering an item correctly is graphed against the
level of ability on the construct being measured
• On the ICC, difficulty is determined by the location of the point at
which the curve indicates a probability of .5 (a 50–50 chance) of
answering correctly (see the sketch below)
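A minimal sketch in Python of an ICC under a two-parameter logistic model (the specific model is an assumption; the slide does not commit to one). The difficulty parameter b is the ability level at which the probability of a correct answer is exactly .5:

```python
import math

def icc(theta, a, b):
    """2PL item characteristic curve: P(correct | ability theta).

    a = discrimination (slope), b = difficulty (ability where P = .5).
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

a, b = 1.2, 0.5   # hypothetical item parameters
for theta in [-2, -1, 0, 0.5, 1, 2]:
    print(f"ability {theta:+.1f}: P(correct) = {icc(theta, a, b):.2f}")
# At theta = b = 0.5 the probability is exactly .50, which is how the
# ICC locates the item's difficulty.
```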
Quantitative Item Analysis
Item Bias
• Used to establish that differences in responses are related to the
construct being measured, not to differences in culture, gender, or
experiences of the test taker
• Item bias refers to an item being easier for one group than for
another
• The method involves computing item characteristics (such as the
p value) separately by group (see the sketch below)
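A minimal first-pass sketch in Python: compute the item’s p value separately for each group and compare. A large gap flags the item for closer review; formal bias methods also condition on overall ability, which this sketch omits. The data and the flagging threshold are invented:

```python
def p_value(responses):
    """Proportion answering the item correctly."""
    return sum(responses) / len(responses)

# Hypothetical responses to one item, split by group
group_a = [1, 1, 0, 1, 1, 1, 0, 1]
group_b = [1, 0, 0, 1, 0, 0, 1, 0]

gap = p_value(group_a) - p_value(group_b)
print(f"p(A) = {p_value(group_a):.2f}, "
      f"p(B) = {p_value(group_b):.2f}, gap = {gap:+.2f}")
if abs(gap) > 0.2:   # illustrative threshold, not a standard
    print("Flag item for bias review")
```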
Qualitative Item Analysis
• Questionnaires for Test Takers
Surveys with an open-ended format
• Expert Panels
Using information provided by experts
to understand & improve test results
Revising the Test
Choosing the Final Items
• Construct a matrix to select the best items
• See Table 11.3
• Review inter-item coefficients to determine
evidence of internal consistency
Revising the Test
Revising the Test Instructions
• Review qualitative information obtained
from test takers and test administrators
Validation and Cross-Validation
Validation is the process of obtaining evidence that the test
effectively measures what it is supposed to measure
1. Establish content validity as the test is developed
2. Establish construct validity (whether the test measures one or
more constructs) and criterion-related validity (its ability to
predict an outside criterion)
3. Validation should take place in one or more settings that match
the actual circumstances in which the test will be used (more
than one test site can provide evidence of generalizability);
the sample should resemble the test’s target audience, and APA
ethics should be followed
4. The purpose of the validation study is to affirm the test’s
ability to yield meaningful results
Validation and Cross-Validation
Differential validity – when a test yields
significantly different validity coefficients for
subgroups (see the sketch below)
• Single-group validity – when a test is valid for one
group but not for another
• Practical significance of differential validity – if a single
validity coefficient and regression line are used to establish
cutoff scores to select applicants for admissions, the equation
will overpredict the number of men, women, and members of other
subgroups who might be successful
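A minimal sketch in Python of checking for differential validity: compute the test-criterion validity coefficient (a Pearson r) separately for each subgroup and compare. The data are invented:

```python
import math

def pearson_r(x, y):
    """Pearson correlation between test scores and a criterion."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy)

# Hypothetical test scores and criterion measures per subgroup
scores_a    = [55, 62, 70, 48, 66, 73, 59]
criterion_a = [3.1, 3.4, 4.0, 2.8, 3.6, 4.2, 3.2]
scores_b    = [50, 64, 58, 71, 45, 68, 60]
criterion_b = [3.0, 3.2, 3.1, 3.4, 2.9, 3.3, 3.1]

print(f"validity (group A): {pearson_r(scores_a, criterion_a):.2f}")
print(f"validity (group B): {pearson_r(scores_b, criterion_b):.2f}")
# Markedly different coefficients would suggest differential validity;
# a significance test of the difference is still required.
```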
Validation and Cross-Validation
Defining Unfair Discrimination
• When group membership changes or
contaminates test scores, test bias exists, and
members of some groups might be treated
unfairly as a result
• Differential validity is not a widespread
phenomenon
• Test developers administer the test items a
number of times to ensure that the test is
effectively measuring the intended construct
Developing Norms and Cut Scores
• Cut Scores – decision points for dividing
test scores into pass or fail groupings
• Norms and cut scores provide information
that assists the test user in interpreting
test results
Developing Norms and Cut Scores
Developing Norms (distribution of test scores)
• Purpose – to provide a reference point for
understanding a score
• Administer tests in various locations to construct
a large database from which statistics can be
computed for use as norms
• As the size of the database grows, the statistics
used for norms become more stable (see the
sketch below)
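A minimal sketch in Python of turning a norm-group database into reference statistics (mean, standard deviation, and percentile rank); the norm group here is invented:

```python
import math

def norm_stats(scores):
    """Mean and standard deviation of the norm group."""
    n = len(scores)
    mean = sum(scores) / n
    sd = math.sqrt(sum((s - mean) ** 2 for s in scores) / (n - 1))
    return mean, sd

def percentile_rank(score, norm_scores):
    """Percentage of the norm group scoring below the given score."""
    below = sum(1 for s in norm_scores if s < score)
    return 100 * below / len(norm_scores)

norm_group = [42, 55, 61, 48, 70, 66, 53, 59, 74, 50]  # hypothetical database
mean, sd = norm_stats(norm_group)
print(f"norm mean = {mean:.1f}, sd = {sd:.1f}")
print(f"a score of 64 is at the {percentile_rank(64, norm_group):.0f}th percentile")
```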
Developing Norms and Cut Scores
Identifying Cut Scores
• Two approaches:
1. Expert approach – employ a panel of expert judges who
provide opinions/ratings about the number of test items that a
barely qualified person is likely to answer correctly
(employment tests)
2. Empirical approach – use the correlation between the
test and an outside criterion to predict the test score
that a person who performs at a minimum level of
acceptability is likely to make
• A major problem with setting cut scores is allowing for test
error (the standard error of measurement, SEM); see the sketch
below
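A minimal sketch in Python of the empirical approach: regress the criterion on test scores, invert the regression line to find the test score that predicts the minimum acceptable criterion level, then widen the cut by the SEM to allow for test error. The data, the SEM value, and the use of simple least squares are all illustrative assumptions:

```python
def least_squares(x, y):
    """Slope and intercept of the simple regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return slope, my - slope * mx

scores    = [50, 58, 63, 70, 76, 82, 90]          # hypothetical test scores
criterion = [2.4, 2.9, 3.1, 3.5, 3.8, 4.1, 4.6]   # hypothetical job ratings
slope, intercept = least_squares(scores, criterion)

min_acceptable = 3.0                          # minimum acceptable performance
cut = (min_acceptable - intercept) / slope    # score predicting that level

sem = 2.5                                     # assumed SEM for the test
print(f"raw cut score: {cut:.1f}")
print(f"cut lowered by one SEM to allow for test error: {cut - sem:.1f}")
```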
Developing the Test Manual
The test manual provides:
• The rationale for constructing the test
• A history of the development process
• The results of the validation studies
• A description of the appropriate target audience
• Instructions for administration and scoring
• Norms and information for interpretation
• Evidence of test reliability and validity
• Limitations for use
• Measurement accuracy