Measurement – class 11

Education (2/2)
A. Measuring students' skills: PISA
Measuring students' skills: PISA
• Cf. class 10: OECD became the dominant
source for measurement of education
understood as “outcome”: individual skills
• 1995: International Adult Literacy Survey
• 2000: PISA
• Cornerstone of these measurement tools:
measuring skills, not performance
Measuring students' skills: PISA
• Psychometrics:
– Measured score = true score + measurement error
– Historically, attempts at measuring “intelligence” (IQ)
– The skill is the “hidden factor”, only partially revealed by tests (see the sketch below)
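A minimal simulation of this « true score + error » model (a sketch only; all numbers are hypothetical, not taken from PISA):

```python
# Sketch: classical test theory, observed score = true score + error.
import random

random.seed(0)

true_skill = 600.0   # hypothetical latent skill of one student
error_sd = 30.0      # hypothetical measurement error SD

# One test only partially reveals the latent skill...
one_test = true_skill + random.gauss(0, error_sd)

# ...but averaging many independent administrations converges
# on the "hidden factor".
many_tests = [true_skill + random.gauss(0, error_sd) for _ in range(1000)]
estimate = sum(many_tests) / len(many_tests)

print(f"single test: {one_test:.1f}, average of 1000 tests: {estimate:.1f}")
```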
Measuring students' skills: PISA
• Germany: “PISA shock” (huge debate)
• England: “Are we not such dunces after all?” (The Times, Dec. 5, 2000)
• Again, much debate about rankings…
• So what does PISA tell us, and what does
it not?
Measuring students' skills: PISA
• Goal = measuring the ability to use knowledge in a practical setting: « PISA's aim of tapping students' preparedness for life. »
• Ex: reading = « The capacity of an
individual to understand, use and reflect
on written texts in order to achieve one’s
goals, to develop one’s knowledge and
potential, and to participate in society. »
Measuring students' skills: PISA
• Math = « The capacity of an individual to
identify and understand the role that
mathematics plays in the world, to make
well-founded judgements and to use and
engage with mathematics in ways that
meet the needs of that individual’s life as a
constructive, concerned and reflective
citizen. »
Measuring students' skills: PISA
• Items are designed by experts to measure certain skills
• Items are then ranked on a difficulty scale according to the % of pre-test students who succeeded on them
• The surveyed students (not the same students as in the pre-test) are then given a score
• The student's score in turn predicts the probability of success on items of each difficulty level (5 levels); see the sketch below
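The scoring logic just described can be sketched with a Rasch-type (one-parameter logistic) model. PISA's actual scaling is more elaborate; the difficulty values below are hypothetical:

```python
# Sketch of Rasch-type scoring: a single score predicts the
# probability of success on items of each difficulty level.
import math

def p_success(score, difficulty):
    """P(student with this score solves an item of this difficulty),
    one-parameter logistic (Rasch) model."""
    return 1.0 / (1.0 + math.exp(-(score - difficulty)))

# Hypothetical difficulties for the 5 levels (logit scale)
levels = {1: -2.0, 2: -1.0, 3: 0.0, 4: 1.0, 5: 2.0}

student_score = 0.5  # estimated from the items the student solved
for level, b in levels.items():
    print(f"level {level}: P(success) = {p_success(student_score, b):.2f}")
```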
Measuring students' skills: PISA
• Scores are standardized (see the sketch below):
– Average score of OECD countries = 500
– Standard deviation = 100 (i.e. 20% of the mean)
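A sketch of this standardization step (the raw scores are hypothetical placeholders):

```python
# Rescale raw scores so the OECD average is 500 and the SD is 100.
raw_scores = [-1.2, 0.0, 0.4, 1.1, -0.3]   # hypothetical OECD pool (logits)

mean = sum(raw_scores) / len(raw_scores)
sd = (sum((x - mean) ** 2 for x in raw_scores) / len(raw_scores)) ** 0.5

pisa_scores = [500 + 100 * (x - mean) / sd for x in raw_scores]
print([round(s) for s in pisa_scores])
```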
Measuring students' skills: PISA
• Items are tested in different countries and eliminated if they present a cultural, gender… bias
• In the end, is there a cultural bias?
• Convincingly argued that there is not:
– « Stimuli » are 15% longer in French, but French-speaking Canadians do as well as the best English-speaking Canadian provinces
– Score differences appear even between same-language countries (US / English-speaking Canada)
– Type of question? Not really (the French score well on multiple-choice items)
Measuring students' skills: PISA
• Other sources of bias?
• Only 15-year-olds currently attending school are covered:
– UK: 95%
– France: 90%
– Brazil: 55% (OECD partner state)
– Mexico: 54% (OECD member)
Measuring students' skills: PISA
• Other sources of bias?
• Target population: remote schools, disabled youth or non-native speakers may be left out, but within less than 5% of the population
• But some countries left out more than 5%
• Response rates? Very low in the UK in 2000 and 2003 => UK results excluded from PISA international comparisons
Measuring students' skills: PISA
• Main source of uncertainty: sampling error
• Most countries: +/- 5 points on the average score
• Ex. France (Grenet, working paper):
– Ranked somewhere between 18th and 28th (out of 56)
– Mean score not statistically different from that of 13 other countries (out of 55); see the sketch below
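Why a +/- 5 point sampling error blurs the ranking, in a short sketch (country names and means are hypothetical):

```python
# Overlapping confidence intervals make many pairwise rankings
# statistically meaningless.
se = 5.0  # approximate standard error on a country's mean score

countries = {"A": 511, "B": 508, "C": 503, "D": 496}  # hypothetical means

for name, mean in countries.items():
    low, high = mean - 1.96 * se, mean + 1.96 * se
    print(f"{name}: mean {mean}, 95% CI [{low:.0f}, {high:.0f}]")
# The intervals overlap heavily: these countries cannot be ranked
# against each other with any confidence.
```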
Measuring students' skills: PISA
• Differences:
– between France and Australia = 32 points
– between Australia and Argentina = 136 points
– between 2 proficiency levels = 70 points
Measuring students' skills: PISA
• Educational context:
– PISA measures « skills », not the contents of a curriculum, but the two are linked (skills are taught at school, even if only a little!)
– Ex: the first notions of probability come during the 5th year of high school in France
Measuring students' skills: PISA
• 15-year-olds participating in PISA 2003:
– 6th year of high school ("1ère") = "early": 2%
– 5th year of high school = "on time": 57%
– 4th year of high school (repeated once): 34%
– 3rd year of high school (repeated twice): 5%
– other: 1%
• The 15-year-olds sampled by PISA did not all receive the same education => what is being measured is not just the quality of education but many things at once (e.g. grade repetition). Ex: students who repeated a grade have not yet been taught the first notions of probability
Measuring students' skills: PISA
• Huge gap: ex. on the reading scale, PISA 2000:
– « on time » students in non-vocational classes score 560 on average (= the Finnish score)
– students having repeated once: 430 on average = bottom of the international rankings (see the computation sketch below)
• PISA sheds light on these structural differences and raises relevant policy questions
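A back-of-the-envelope check of this composition effect, using the subgroup scores quoted above; the scores assumed for the « early » and bottom groups are hypothetical:

```python
# Weighted mean of subgroup scores (shares from the PISA 2003 slide,
# "other" lumped with the twice-repeaters for simplicity).
groups = {
    "early (2%)":               (0.02, 560),  # score assumed ~ on-time level
    "on time (57%)":            (0.57, 560),
    "repeated once (34%)":      (0.34, 430),
    "repeated 2x + other (6%)": (0.06, 350),  # hypothetical score
}

national_mean = sum(share * score for share, score in groups.values())
print(f"implied national mean: {national_mean:.0f}")  # ~ 498
# A mean near 500 can thus hide a 560 subgroup and a 430 subgroup:
# the national score mixes teaching quality with grade repetition.
```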
Measuring students' skills: PISA
• Motivation effect: PISA 2003 asked students to rate their effort in answering the questionnaire on a 1-to-10 scale:
– French students: 7/10 on average
– 40th out of 41 participating countries…
Measuring students' skills: PISA
• Binet's joke (one of the designers of the IQ test): « What is intelligence? Well, it's what my test measures! »
• DeSeCo program (1999): asked non-psychologists (philosophers, sociologists, economists, ethnologists) what skills were needed to succeed in today's world
= the need for a definition of what is being measured, outside of the definition of the measurement tool
A word on non-response
• PISA: at least 3 causes
– Student can't answer => measurement OK
– Student didn't have time => intensity of the test, not skill => must be taken into account in scoring
– Student didn't even try => motivation effect, measurement problem
• Relatively OK for the assessment of skills within a school context
– Students are used to complying. Problem with adults!
A word on non-response
• In general:
– Total non-response: no survey at all for the sampled unit
– Partial non-response: parts of the survey, some items
• High total non-response => probable bias
• Partial non-response: what does it say?
A word on non-response
• Refusal to answer some questions because they are too private = not the most frequent case
– Ex: income. Refusals rise when moving from brackets to the actual amount
– Ex: questions on divorce, getting along with one's partner…
• Very often: the question is meaningless for the respondent (Bourdieu)
– Ex: « What do you think of the US foreign policy regarding Cuba? »
A word on non-response
• Answering an opinion question meaningfully means having an opinion = having already thought about the question
• Conclusion for survey design and for reading survey answers:
– Always leave the possibility to answer « don't know » as distinct from « refusal to answer »
– When reading tables / regression results, check the number of respondents
C. Measuring adults' skills: IALS
Measuring adults’ skills: IALS
• IALS: 1995. A case in point of « measurement failure »
• Idea: the same as in part A on PISA, but for adults
• Items of various difficulties; the individual's level = the difficulty level of the items he or she answers correctly with probability 0.8 (see the sketch below)
• One single literacy scale. Level 1 = has difficulties, ~ « illiterate »
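Under a Rasch-type item model (an assumption here, used as a sketch of the idea rather than the actual IALS scaling code), the 0.8 success probability pins down the skill level exactly:

```python
# Invert the logistic curve: find the skill theta such that
# P(success) = p on an item of a given difficulty.
import math

def skill_at(p, difficulty):
    """Theta with P = 1 / (1 + exp(-(theta - difficulty))) = p."""
    return difficulty + math.log(p / (1 - p))

print(skill_at(0.8, difficulty=1.0))  # ~ 2.39 on the logit scale
```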
Measuring adults’ skills: IALS
• France left the IALS project and completely censored its results
• Except that France was left in the appendix tables => articles like the IHT one
• How did that happen?
Measuring adults' skills: IALS
• Sampling method = random path => dwelling (see the sketch below)
– Addresses are sampled from a sampling frame (census, phonebook…)
– To avoid bias due to dwellings missing from the sampling frame, another dwelling is actually surveyed, reached by following an itinerary from the sampled one
– Ex: « start from 48, bd Jourdan. Head north, take the second street on the left, count 5 buildings on the right side, enter the 6th one. Go to the 3rd floor and choose the 2nd apartment on the right side of the corridor »
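A purely illustrative sketch of such a random-route rule; the function and the generated itinerary are made up, not the IALS protocol:

```python
# Derive a replacement dwelling from a sampled seed address by a
# reproducible random itinerary (hypothetical rule).
import random

def random_route(seed_address, rng):
    turns = rng.choice(["left", "right"])
    n_streets = rng.randint(1, 3)
    n_buildings = rng.randint(3, 8)
    floor = rng.randint(0, 5)
    return (f"start from {seed_address}; take street #{n_streets} "
            f"on the {turns}; enter building #{n_buildings}; "
            f"go to floor {floor}, flat on the right")

rng = random.Random(2024)  # fixed seed: interviewers follow the same route
print(random_route("48, bd Jourdan", rng))
```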
Measuring adults’ skills: IALS
• The Kalton, Lyberg, Rempp audit (1995):
• Replacement rates (replacing the protocol dwelling with another) = very high
• Refusals = 45% (in addition to absent households, etc.)
Measuring adults’ skills: IALS


The Kalton, Lyberg, Rempp audit (1995) – replacements:

Address interviewed    N      %
Protocol address       1363   45.5
1st replacement        792    26.4
2nd replacement        841    28.1
Total                  2996   100.0

Non-response = 45.2%
Measuring adults’ skills: IALS
• Probably an upward bias when the protocol was not strictly implemented:
– Germany: 100% report having no problem reading German
– Households were not selected at random
– Within households, the most able / motivated member was selected in 5 to 10% of cases, contrary to protocol
Measuring adults’ skills: IALS
• Motivation effect: very important
– The interviewer had nothing to do while the respondent filled in the booklet => pressure on the respondent (theoretically, there was no time limit)
Measuring adults’ skills: IALS
• Work by A. Blum and F. Guerin (1998)
• They interviewed the interviewers:
– In 20% of households, people seemed to answer without thinking, just to be done with the survey
• They studied the items: ambiguities, answers coded « wrong » when the item was in fact perfectly understood. The « onion example »
Measuring adults’ skills: IALS
• Conclusion on IALS
• Not one single defect, but a series of dysfunctions all along the measurement production chain:
– sampling
– item design
– survey fieldwork
– imputation of non-response
– coding and calculation of scores (one single scale)
Lessons to be learnt
• HMK 11:
– always consider the VARIANCE, not just the mean / median (see the sketch below)
– are levels relevant, or only ranks? (absolute vs. relative measurement)
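A short illustration of why the variance matters (hypothetical scores):

```python
# Same mean, very different stories.
import statistics

country_a = [490, 495, 500, 505, 510]   # homogeneous
country_b = [340, 420, 500, 580, 660]   # same mean, huge spread

for name, scores in [("A", country_a), ("B", country_b)]:
    print(name, "mean:", statistics.mean(scores),
          "sd:", round(statistics.stdev(scores), 1))
# Reporting only the means (both 500) would hide that country B has
# many very weak and very strong students.
```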
Lessons to be learnt
• IALS + HMK 11: non-response matters
– It is often non-ignorable and induces biases; try to assess them
– Partial non-response says something about the question itself
• PISA and IALS:
– ALWAYS read the technical documents / annexes before saying anything – you can!
– Rankings are relevant only if the variable values are actually different…
– Same lesson as « look for the standard error »