Transcript Slide 1

Appendix 3
Statistical Properties of Standardized Tests:
How to Interpret a Child’s Test Score
Beate Peter
NOVA
Comprehensive Perspectives on Child Speech Development and Disorders
Standardized Tests in Clinical Practice
• SLPs use standardized tests routinely as part
of a comprehensive assessment
• The test yields raw scores plus various
standardized scores
• These scores need to be interpreted and
incorporated into clinical decisions
Articulation and Phonology Tests
Tests vary along the following parameters:
• Purpose: articulation or phonological processes
• Target sounds: consonants only, consonants plus
rhotic vowels, consonants plus all vowels
• Weighting of sounds
– Sample all sounds in all possible word positions, sum
the errors
– Sample all sounds in all possible word positions, then
weight the errors by how frequently the errored
sound occurs in spoken language
• Norming sample
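The two weighting schemes can be sketched as follows. This is only an illustration: the sounds, error counts, and frequency values are hypothetical and are not taken from any published test or frequency count.

```python
# Hypothetical error tally: sound -> number of word positions in error.
errors = {"s": 2, "r": 3, "th": 1}

# Hypothetical relative frequencies of each sound in spoken language
# (illustrative values only, not from a published frequency count).
frequency = {"s": 0.07, "r": 0.06, "th": 0.01}


def unweighted_score(errors):
    """Sum the errors across all sounds and word positions."""
    return sum(errors.values())


def weighted_score(errors, frequency):
    """Weight each sound's errors by how often that sound occurs."""
    return sum(count * frequency[sound] for sound, count in errors.items())


print(unweighted_score(errors))                     # 6
print(round(weighted_score(errors, frequency), 2))  # 0.33
```

Under the weighted scheme, an error on a frequent sound counts more than an error on a rare one, so two children with the same number of errors can receive different scores.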
Constructing a Hypothetical
Standardized Test of Articulation
• Raw scores
• Premise: children’s abilities change as a function
of age; the trajectories differ for boys and girls
– Give the test to, say, 400 children
• Within given age ranges
• Separately for boys and girls
• The hypothetical test scale ranges from 0 to 50
Figure A3.1 Dot plot of hypothetical test scores
Figure A3.2 Histogram of test scores consolidated into 14 bins
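As a rough sketch of how raw scores might be consolidated into bins, the code below simulates 400 scores on the 0 to 50 scale and counts them in 14 equal-width bins. The simulated sample, its mean and spread, and the function name `bin_counts` are assumptions for illustration; they are not the data behind the figure.

```python
import random

random.seed(0)  # fixed seed so the made-up sample is reproducible

# 400 hypothetical raw scores on the 0-50 scale (simulated, not real data).
scores = [min(50, max(0, round(random.gauss(30, 8)))) for _ in range(400)]


def bin_counts(scores, low=0, high=50, n_bins=14):
    """Count scores in n_bins equal-width bins spanning [low, high]."""
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for s in scores:
        # The top score goes into the last bin rather than a 15th one.
        index = min(int((s - low) / width), n_bins - 1)
        counts[index] += 1
    return counts


counts = bin_counts(scores)
for i, c in enumerate(counts, start=1):
    print(f"bin {i:2d}: {'*' * c}")
```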
Descriptive Statistics
• Mean: What was the average of the raw scores?
– Sum up the raw scores from all children, then divide by the
number of children
• Variance: How widely spread were the scores (in units
of squared test scores)?
– Take a child’s raw score, compute the difference to the
mean, then square that difference (so it’s always positive),
then do the same for all the other children’s scores and
add them up, then divide that sum by the number of
children (minus 1)
• Standard deviation: How widely spread were the scores
(in units of test scores)?
– Take the square root of the variance
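The three steps above can be sketched in a few lines. The five raw scores are hypothetical, and the function names are chosen only for this example.

```python
import math


def mean(scores):
    """Sum the raw scores, then divide by the number of children."""
    return sum(scores) / len(scores)


def variance(scores):
    """Sum of squared differences from the mean, divided by n - 1."""
    m = mean(scores)
    return sum((x - m) ** 2 for x in scores) / (len(scores) - 1)


def standard_deviation(scores):
    """Square root of the variance, in units of test scores."""
    return math.sqrt(variance(scores))


scores = [28, 31, 35, 37, 39]  # hypothetical raw scores

print(mean(scores))                          # 34.0
print(variance(scores))                      # 20.0
print(round(standard_deviation(scores), 2))  # 4.47
```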
Mean

$$\mu = \frac{1}{n} \sum_{i=1}^{n} x_i$$

Variance (Average sum of squares)

$$\sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n - 1} = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \mu)^2$$

Standard deviation

$$\sigma = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \mu)^2}$$
More Information About Norming
Samples
• Norming distributions do not necessarily follow a
normal distribution
– Skewness is a measure of asymmetry
• Negative skew: left tail is longer than the right tail
• Positive skew: right tail is longer than the left tail
– Kurtosis is a measure of how flat or peaked the
distribution is
• Platykurtic distributions: flatter than a normal distribution
• Leptokurtic distributions: higher and narrower than a normal
distribution
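Skewness and kurtosis can be estimated from a sample in several ways. The sketch below uses one common convention, based on central moments, and reports kurtosis as excess kurtosis so that a normal distribution scores about 0. The slides give no formulas, so this choice of estimator and the sample data are assumptions.

```python
def central_moment(scores, k):
    """Average of (x - mean)**k over the sample."""
    m = sum(scores) / len(scores)
    return sum((x - m) ** k for x in scores) / len(scores)


def skewness(scores):
    """Negative: longer left tail. Positive: longer right tail."""
    return central_moment(scores, 3) / central_moment(scores, 2) ** 1.5


def excess_kurtosis(scores):
    """Negative: platykurtic (flatter). Positive: leptokurtic (more peaked)."""
    return central_moment(scores, 4) / central_moment(scores, 2) ** 2 - 3


symmetric = [1, 2, 3, 4, 5]      # hypothetical, evenly spread
right_tail = [1, 1, 1, 2, 10]    # hypothetical, one very high score
left_tail = [1, 9, 10, 10, 10]   # hypothetical, one very low score

print(skewness(symmetric))                    # 0.0
print(skewness(right_tail) > 0)               # True
print(skewness(left_tail) < 0)                # True
print(round(excess_kurtosis(symmetric), 2))   # -1.3
```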
Standardized Test Scores
• Z score: A normed test score in units of standard deviations, e.g., -1, 2, or 0.5
• Standard score: A linear transformation of the z score to one of several available normed scales (e.g., mean = 100 [SD = 15]; mean = 10 [SD = 3]; mean = 50 [SD = 8])
• T score: A linear transformation of the z score to convert it to a scale that is nearly always positive (multiply by 10, add 50)
• Percentile: Out of 100 hypothetical random observations, X were lower than the tested individual.
– Example: “Ella obtained a percentile ranking of 29.” = If Ella were one of 100 children taking the test, 29 would obtain a lower score than Ella.
• Confidence interval: A measure of how reliable the obtained test score is.
– Example: “Ella’s standard score falls into a 90% confidence interval of 87 to 94.” = If the test were to be repeated 100 times, 90 times the true score would be found in the interval between 87 and 94. Higher confidence of 95% usually requires a broader interval.
• Age equivalent: The age at which the median score in the norming sample equals the child’s raw score.
– Example: “Kyle, age 5;6 (years;months), obtained a raw score of 37, with an age equivalent of 4;6.” = Kyle’s raw score matched the median score of the children in the sample at age 4;6, which makes it look as if he is delayed in his abilities.
– Bear in mind that the median score of a group of same-age children is not a meaningful way to describe the performance of a tested child.
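The conversions among these scores can be sketched as follows. The raw score and the norming mean and standard deviation are hypothetical, and the percentile step assumes that the norming distribution is normal, which does not hold for every test.

```python
from statistics import NormalDist


def z_score(raw, norm_mean, norm_sd):
    """Distance of the raw score from the norming mean, in SD units."""
    return (raw - norm_mean) / norm_sd


def standard_score(z, mean=100, sd=15):
    """Linear transformation of z to a normed scale (default 100 [SD = 15])."""
    return mean + sd * z


def t_score(z):
    """Multiply by 10, add 50."""
    return 50 + 10 * z


def percentile(z):
    """Percentile rank of z, assuming a normal norming distribution."""
    return 100 * NormalDist().cdf(z)


# Hypothetical child: raw score 30 in a cohort with mean 34 and SD 8.
z = z_score(raw=30, norm_mean=34, norm_sd=8)

print(z)                        # -0.5
print(standard_score(z))        # 92.5
print(t_score(z))               # 45.0
print(round(percentile(z), 1))  # 30.9
```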
The Quality of a Standardized Test
• For some tests, the score distribution for an age/sex cohort does
not follow the normal (Gaussian) distribution. In that case, a
standard score of 100 does not necessarily correspond to the 50th
percentile.
• Norming samples vary, so be sure to look under the hood to learn
to which cohort you are comparing your proband child. Here are
just some of the variables:
– Size
– Age
– Sex
– Ethnicity
– SES
– Geographical regions
– Inclusion or exclusion of children with disabilities
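To see why a standard score of 100 need not sit at the 50th percentile, consider a small, negatively skewed cohort. The ten scores below are hypothetical and chosen only to make the effect visible.

```python
from statistics import mean

# Hypothetical cohort with a ceiling effect (long left tail).
scores = [20, 35, 40, 44, 46, 47, 48, 48, 49, 50]

cohort_mean = mean(scores)  # a raw score here maps to standard score 100

# Empirical percentile of the mean: share of children scoring below it.
below = sum(1 for s in scores if s < cohort_mean)
empirical_percentile = 100 * below / len(scores)

print(cohort_mean)           # 42.7
print(empirical_percentile)  # 30.0
```

In this sample, a child scoring exactly at the mean outperforms only 30% of the cohort, not 50%.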
Comparing Some Norming Samples
Test | Sample Size | Ages | Norms by Sex? | Children with Disabilities
Arizona-3 | 2,758 Males, 2,758 Females | 1;6 – 18;11 | No | No information
CAAP | 760 Males, 859 Females, + 88 Canadian Males and Females | 2;6 – 8;11 | No | Children with speech or language deficits were excluded
GFTA-2 | 1,175 Males, 1,175 Females | 2;0 – 21;11 | Yes | Included representative proportions of speech, language, learning, and cognitive disabilities
HAPP-3 | 452 Males, 432 Females | 3;0 – 7;11 | No | Included representative proportions of children with phonological and other disabilities
SPAT D-3 | 1,151 Males, 1,119 Females | 3;0 – 9;11 | Yes | No information
Validity and Reliability
• Validity: What the test measures
– Concurrent validity: Correlations with other closely or distantly related
tests; tests administered to the same individuals at the same time
– Predictive validity: Correlations with other tests administered to the
same individuals later
– Construct validity: Correlations with measures that are known to
target the same ability or trait
• Reliability: How the test performs its purpose
– Inter-rater reliability: Correlation between test scores obtained by
different clinicians
– Test-retest reliability: Correlation between test scores obtained in a
first and second administration of the test
– Internal consistency: Correlations between test scores from two halves
of the test items
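The validity and reliability measures above are all reported as correlations. As a minimal sketch, assuming the Pearson correlation coefficient is the statistic in use (test manuals may report other coefficients), test-retest reliability could be computed as follows. Both score lists are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical scores for the same five children, tested twice.
first_administration = [28, 31, 35, 37, 39]
second_administration = [29, 30, 36, 38, 38]

r = pearson_r(first_administration, second_administration)
print(round(r, 2))
```

The same function applies to inter-rater reliability (scores from two clinicians) and to split-half internal consistency (scores from two halves of the items).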
From Standardized Test Scores to
Clinical Decisions
• How should standard scores be interpreted?
• How can they be used in clinical decisions?
• The answers depend, in part, on given guidelines and resources
– Most agree that scores > -1 standard deviation do not qualify for treatment
– Some settings (by federal, state, or local guidelines) require a score < -1.5
standard deviations (about 7th percentile) or even -2 standard deviations
(about 2nd percentile)
– In some settings, the nature of the speech errors determines whether a child
qualifies for services, e.g.,
• Must have speech sounds across at least two classes in error
• Must have difficulty saying his/her own name
• Must demonstrate reduced access to/benefit from instruction at school
• Rules for qualifying a child for treatment depend on
– Other clinical observations
• Nature of speech errors
• Overall profile of strengths and needs
• Impact of the disorder in daily life
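The percentiles quoted for the cutoffs can be checked with a short sketch. It assumes a normal norming distribution and a standard-score scale with mean 100 and SD 15; the function name is chosen only for this example.

```python
from statistics import NormalDist


def cutoff_percentile(z):
    """Percentile that corresponds to a z-score cutoff under normality."""
    return 100 * NormalDist().cdf(z)


for cutoff in (-1.0, -1.5, -2.0):
    standard = 100 + 15 * cutoff
    print(f"z = {cutoff:+.1f}  standard score = {standard:.1f}  "
          f"percentile = {cutoff_percentile(cutoff):.1f}")
# z = -1.0  standard score = 85.0  percentile = 15.9
# z = -1.5  standard score = 77.5  percentile = 6.7
# z = -2.0  standard score = 70.0  percentile = 2.3
```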