4. pp slides: scales and norms

Scales, Transformations,
& Norms
Norms
Norm-Referenced Test: One of the most useful ways of
describing a person’s performance on a test is to compare
his/her test score to the test scores of some other person or
group of people.
• Norms are average scores computed for a large
representative sample of the population.
• The arithmetic average (mean) is used to judge whether a
score on the scale is above or below the average relative to
the population of interest.
Norms (cont.)
• No single population can be regarded as the normative
group.
• A representative sample is required to ensure meaningful
comparisons are made.
• When norms are collected from the test performance of
groups of people, these reference groups are labeled
normative or standardized samples.
Norms (cont.)
• The sample selected as the normative group depends on
the particular research question.
• The normative sample must be representative of the
examinee and appropriate to the research question to be
answered for meaningful comparisons to be made.
• For example: a test measuring attitudes towards
federalism whose norm group consists only of students
in the province of Quebec might be very useful for
regional interpretation in Quebec; however, its
generalizability to other parts of the country (e.g.,
Yukon; Toronto, Ontario) would be suspect.
Sample Groups
Although the three terms below are often used interchangeably,
they are different.
Standardized Sample - the group of individuals on
whom the test is standardized in terms of scoring
procedures, administration procedures, and development of
the test's norms (e.g., the sample used in the technical manual).
Normative Sample - can refer to any group from which
norms are gathered, including norms collected after the test
is published.
Reference Group - any group of people against which test
scores are compared. (e.g., a designated group such as
students in 3090.03 or World Champions)
Types of Norms
Norms can be Developed:
• Locally
• Regionally
• Nationally
Normative Data Can be Expressed By:
• Percentile Ranks
• Age Norms
• Grade Norms
Local Norms
• Test users may wish to evaluate scores on the
basis of reference groups drawn from a specific
geographic or institutional setting.
For Example
Norms can be created for employees of a particular
company or the students of a certain university.
• Regional & National norms examine much
broader groups.
Subgroup Norms
• When large samples are gathered to represent broadly
defined populations, norms can be reported in aggregate or can
be separated into subgroup norms.
• Provided that subgroups are of sufficient size and fairly
representative of their categories, they can be formed in terms
of:
- Age
- Sex
- Occupation
- Education Level
Or any other variable that may have a significant impact on
test scores or yield comparisons of interest.
Percentile Ranks
• Percentile ranks are the most common form of norms and the
simplest method of presenting test data for comparative purposes.
• The percentile rank represents the percentage of the norm group
that earned a raw score less than or equal to the score of that
particular individual.
For example, a score at the 50th percentile indicates that the
individual did as well or better on the test than 50% of the norm
group.
• When a test score is compared to several different norm groups,
percentile ranks may change.
For example, a percentile rank on a mathematical reasoning test
may be lower when the score is compared to math students than
when it is compared to music students.
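The definition above (percentage of the norm group scoring at or below a given raw score) can be sketched in a few lines. This is a minimal illustration, not a standard library routine: the function name `percentile_rank` and both score lists are invented for the example.

```python
def percentile_rank(score, norm_scores):
    """Percentage of the norm group with a raw score <= `score`."""
    if not norm_scores:
        raise ValueError("norm group must not be empty")
    at_or_below = sum(1 for s in norm_scores if s <= score)
    return 100.0 * at_or_below / len(norm_scores)


# Hypothetical raw scores on a mathematical reasoning test (made-up data).
math_students = [28, 30, 31, 33, 34, 35, 36, 38, 39, 40]
music_students = [15, 18, 20, 22, 24, 25, 27, 29, 31, 33]

score = 31
rank_vs_math = percentile_rank(score, math_students)    # 3 of 10 at or below
rank_vs_music = percentile_rank(score, music_students)  # 9 of 10 at or below
print(rank_vs_math, rank_vs_music)
```

With these invented groups the same raw score of 31 sits at the 30th percentile among the math students and at the 90th percentile among the music students, which is the effect described in the last bullet.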
Age Norms
• Method of describing scores in terms of the average or
typical age of the respondents achieving a specific test score.
• Age norms can be developed for any characteristic that
changes systematically with age.
• In establishing age norms, we need to obtain a representative
sample at each of several ages and measure the particular age
related characteristic in each of these samples.
• It is important to remember that there is considerable
variability within the same age, which means that some
children at one age will perform similarly to children at other
ages.
Grade Norms
• Most commonly used in school settings.
• Similar to age norms except the baseline is grade level rather
than age.
• It is important to remember that there is considerable
variability among individuals within the same grade, which means
that some children in one grade will perform similarly to or
below children in other grades.
• One needs to be extremely careful when interpreting grade
norms not to fall into the trap of saying that, just because a
child obtains a certain grade-equivalent on a particular test,
he/she is at that grade level in all areas.
Evaluating Suitability of a Normative
Sample
• How large is the normative sample?
• When was the sample gathered?
• Where was the sample gathered?
• How were individuals identified and selected?
• What was the composition of the normative sample?
- age, sex, ethnicity, education level, socioeconomic status
Caution When Interpreting Norms
• Norms may be based on samples that do not adequately
represent the type of population to which the examinee’s
scores are compared.
• Normative data can become outdated very quickly.
• The normative sample may be too small to yield stable norms.
Setting Standards/Cutoffs
• Rather than finding out how you stand compared to
others, it might be useful to compare your performance on
a test to some external standard.
For Example - if most people in class get an F on a test and
you get a D, your performance in comparison to the
normative group is good. However, overall your score is
not good.
Criterion-Referenced Tests - assess your performance
against some set of standards (e.g., school tests, Olympics).
Cutoff Scores - 1 SD?, 2 SD?
Raw Scores
• Raw scores are computed for instruments using Likert scales
(interval or ordinal) by assigning scores to responses and totaling
the scores of the items.
- For positively phrased items, e.g., “I think things will turn out right”
5=Always, 4=Often, 3=Sometimes, 2=Seldom, 1=Never
- For negatively phrased items, e.g., “I think things will turn out wrong”, scoring is reversed
1=Always, 2=Often, 3=Sometimes, 4=Seldom, 5=Never
• The raw score is the sum of the scores for the pertinent items.
• The problem with raw scores is that they are fairly meaningless
without some sort of benchmark with which to make a comparison
(e.g., what would a raw score of 30 on an Optimism scale mean?).
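The scoring rule described above (sum the item scores, reversing the negatively phrased items) might be sketched as follows. The item ids, the responses, and the choice of which items are negative are all hypothetical. The sketch assumes every response is recorded on the same 5=Always ... 1=Never coding before reversal.

```python
SCALE_MAX = 5  # 5-point scale: 5=Always, 4=Often, 3=Sometimes, 2=Seldom, 1=Never


def raw_score(responses, negative_items):
    """Sum item scores, reverse-scoring negatively phrased items.

    responses: dict of item id -> recorded response (1..SCALE_MAX)
    negative_items: set of item ids that are negatively phrased
    """
    total = 0
    for item, response in responses.items():
        if not 1 <= response <= SCALE_MAX:
            raise ValueError(f"response for {item!r} is outside 1..{SCALE_MAX}")
        if item in negative_items:
            # Reverse the scale: Always (5) becomes 1, Never (1) becomes 5.
            response = SCALE_MAX + 1 - response
        total += response
    return total


# Hypothetical four-item scale; q3 and q4 are assumed to be negatively phrased.
responses = {"q1": 4, "q2": 5, "q3": 2, "q4": 1}
total = raw_score(responses, negative_items={"q3", "q4"})
print(total)  # 4 + 5 + (6 - 2) + (6 - 1) = 18
```

The total of 18 is still only a raw score: it needs a normative benchmark before it says anything about the respondent.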
Transformations
• Raw scores (i.e., simplest counts of behaviour sampled by
a measuring procedure) do not always provide useful
information.
• It is often necessary to reexpress, or transform, raw scores
into some more informative scale.
• The simplest form of transformation is changing raw
scores to percentages.
For Example
If a student answers 35 questions out of 50 correctly on a test, that
student’s score could be reexpressed as a score of 70%.
Linear Transformations
• A linear transformation changes the units of measurement
while leaving the interrelationships among scores unaltered.
• An advantage of this procedure is that the normally
distributed scores of tests with different means and score
ranges can be meaningfully compared and averaged.
• The most familiar linear transformation is the z score.
Standard Scores
• Standard scores allow each obtained score to be
compared to the same reference value.
• This facilitates comparison between obtained scores
and the scores of other individuals (i.e., the normative
sample), as well as comparison among the various scales
and instruments.
• Standard scores are calculated from raw scores such that
each scale and subscale will have the same mean (or
average) score and standard deviation.
For example, IQ scores are transformed so that the
average score is 100, with a SD of 15.
Z Scores
• A z-score tells how many standard deviations someone is above or
below the mean. Simply put, the mean of the distribution is given the
z value of zero (0) and distance from it is counted in standard
deviation units.
• A z-score of -1.4 indicates that someone is 1.4 standard deviations
below the mean. If scores are normally distributed, someone in that
position would have done as well as or better than about 8% of the
students who took the test.
• To calculate a z-score, subtract the mean from the raw score and
divide the result by the standard deviation (e.g., raw score = 15,
mean = 10, standard deviation = 4: 15 minus 10 equals 5, and 5
divided by 4 equals 1.25, so the z-score is 1.25).
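The calculation above can be checked with a short sketch using Python's standard `statistics.NormalDist`. The function name `z_score` is chosen for this example. The percentage for z = -1.4 assumes the scores follow a normal distribution.

```python
from statistics import NormalDist


def z_score(raw, mean, sd):
    """Number of standard deviations `raw` lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (raw - mean) / sd


# Worked example from the slide: raw score 15, mean 10, SD 4.
z = z_score(15, 10, 4)
print(z)  # 1.25

# Share of a normal distribution at or below z = -1.4 (roughly 8%).
pct_at_or_below = 100 * NormalDist().cdf(-1.4)
print(round(pct_at_or_below, 1))
```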
Test Your Knowledge:
Problem 1:
Assume that a test has a raw-score mean of 62 and a
standard deviation of 9. If the test-taker obtains a raw
score of 71 on the test, what would their z-score be?
Problem 2:
Assume that a test has a raw-score mean of 62 and a
standard deviation of 9. If the test-taker obtains a raw
score of 53 on the test, what would their z-score be?
T Scores
T-Scores (or standardized scores) are a conversion
(transformation) of raw individual scores into a standard
form, where the conversion is usually made using the mean
and standard deviation of the normative sample, since the
population's values are typically unknown.
• The scale has a mean set at 50 and a standard deviation
set at 10.
T = 50 + 10 × z
• An advantage of using T-scores is that scores are almost
never negative (only a z-score below -5 would produce one).
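The T-score formula is a one-line linear rescaling of the z-score. A minimal sketch, with `t_score` as a name chosen for illustration:

```python
def t_score(z):
    """Convert a z-score to a T-score (mean 50, standard deviation 10)."""
    return 50 + 10 * z


print(t_score(0))     # the mean maps to 50
print(t_score(1.25))  # 62.5
print(t_score(-5.0))  # 0.0, the point below which T-scores turn negative
```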
Test your knowledge:
Johnny obtains a z-score of -2.0. What
numerical value would his score be if
converted to T-score units?
Area Transformations
• Area transformations do more than simply put scores on a new and
more convenient scale; they change the point of reference.
• Area transformations adjust the mean and standard deviation of
the distribution into convenient units.
• Advantages of area transformations are obvious. Out of the infinite
number of possible empirical distributions of test scores, the normal
distribution is most frequently assumed and approximated. It is also
most frequently studied, in considerably greater detail than other
possible test score distributions.
• Normalization thus allows the application of knowledge concerning
properties of standard normal distribution toward the interpretation
of the obtained scores.
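One common way to carry out such a normalization is to convert each raw score to its proportion of the norm group and then look up the z-score that cuts off the same area under the standard normal curve. The sketch below assumes that approach. The function name, the mid-rank convention (counting half of the tied scores), and the data are assumptions made for illustration.

```python
from statistics import NormalDist


def normalized_z(score, norm_scores):
    """Area transformation: raw score -> proportion of norm group -> normal z."""
    n = len(norm_scores)
    below = sum(1 for s in norm_scores if s < score)
    equal = sum(1 for s in norm_scores if s == score)
    if equal == 0:
        raise ValueError("this sketch expects the score to occur in the norm group")
    # Mid-rank proportion: keeps the value strictly between 0 and 1,
    # which inv_cdf requires.
    proportion = (below + 0.5 * equal) / n
    return NormalDist().inv_cdf(proportion)


# Hypothetical, strongly skewed norm group (made-up data).
skewed = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(round(normalized_z(9, skewed), 2))
print(round(normalized_z(100, skewed), 2))
```

In raw units the top score is 91 points above the next one. After the area transformation the two differ by well under one z unit, because only rank position in the norm group is retained.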
Skewness
• Skewness is the nature and extent to which symmetry is
absent.
Positive Skewness - when relatively few of the scores fall at
the high end of the distribution.
For Example - positively skewed examination results may indicate
that a test was too difficult.
Negative Skewness - when relatively few of the scores fall
at the low end of the distribution.
For Example - negatively skewed examination results may indicate
that a test was too easy.
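The direction of skew can be quantified with the moment coefficient of skewness (third central moment divided by the 1.5 power of the variance). This is one of several skewness statistics, chosen here for simplicity. The two score lists are invented.

```python
from statistics import fmean


def skewness(scores):
    """Moment coefficient of skewness: m3 / m2**1.5, using population moments."""
    n = len(scores)
    mean = fmean(scores)
    m2 = sum((x - mean) ** 2 for x in scores) / n  # variance
    m3 = sum((x - mean) ** 3 for x in scores) / n  # third central moment
    if m2 == 0:
        raise ValueError("all scores are identical; skewness is undefined")
    return m3 / m2 ** 1.5


# Hypothetical exam results (made-up data).
hard_test = [20, 25, 25, 30, 30, 30, 35, 40, 60, 85]  # few high scores
easy_test = [15, 40, 60, 65, 70, 70, 70, 75, 75, 80]  # few low scores

print(skewness(hard_test) > 0)  # True: positive skew
print(skewness(easy_test) < 0)  # True: negative skew
```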
Normal Distribution Curve
• Many human variables fall on a normal or close to normal curve
including IQ, height, weight, lifespan, and shoe size.
• Theoretically, the normal curve is bell shaped with the highest
point at its center. The curve is perfectly symmetrical, with no
skewness (i.e., where symmetry is absent). If you fold it in half at the
mean, both sides are exactly the same.
• From the center, the curve tapers on both sides approaching the X
axis. However, it never touches the X axis. In theory, the distribution
of the normal curve ranges from negative infinity to positive infinity.
• Because of these properties, we can estimate how people will compare
on specific variables once we know the mean and standard
deviation.
Normal Distribution
The bell-shaped curve has the following properties:
1. bilaterally symmetrical (right and left halves are mirror images)
3. the limits of the curve are plus and minus infinity, so the tails of
the curve will never quite touch the baseline
4. about 68% of the total area of the curve lies between one
standard deviation below the mean and one standard deviation
above the mean
5. about 95% of the total area of the curve lies between two
standard deviations below the mean and two standard deviations
above the mean
6. about 99.7% of the total area of the curve lies between three
standard deviations below the mean and three standard deviations
above the mean.
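The areas in the last three properties can be checked numerically with the standard normal distribution from Python's `statistics` module. The helper name `area_within` is chosen for this sketch.

```python
from statistics import NormalDist


def area_within(k):
    """Proportion of a normal curve lying within k standard deviations of the mean."""
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(100 * area_within(k), 1))
# Prints roughly 68.3, 95.4 and 99.7 percent for 1, 2 and 3 SDs.
```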
Standard Deviations
The standard deviation represents roughly the average distance
of scores from the mean (formally, the square root of the
average squared deviation).
Use of Standard Deviations with Norms:
• Knowing the average of a population allows for a determination as
to whether a particular respondent scored above or below that
average, but does not indicate how much above or below average the
score falls. Standard Deviation plays a role in this.
• Scores within 1 SD of the average are pretty much in the middle cluster
of the population. Scores between 1 and 2 SDs from the average are
moderately above or below the average, and scores 2 or more SDs from the
average are markedly above or below the average.
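The rule of thumb above can be turned into a small helper. This is only a sketch: the function name is invented, and the handling of scores exactly 1 or 2 SDs from the mean is an assumption, since the slide does not say which band they belong to. The example uses the IQ scale mentioned earlier (mean 100, SD 15).

```python
def describe_score(raw, mean, sd):
    """Label a score by its distance from the mean in SD units."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    z = (raw - mean) / sd
    direction = "above" if z > 0 else "below"
    if abs(z) <= 1:   # assumption: exactly 1 SD counts as the middle cluster
        return "middle cluster"
    if abs(z) < 2:    # assumption: exactly 2 SDs counts as "markedly"
        return f"moderately {direction} average"
    return f"markedly {direction} average"


print(describe_score(110, 100, 15))  # middle cluster
print(describe_score(125, 100, 15))  # moderately above average
print(describe_score(65, 100, 15))   # markedly below average
```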