Statistical Inference in Education


Introduction to
Statistical Inference
EDUC 502
November 28, 2005
Statistical Inference in Education
Illuminating article:
Daniel, L.G. (1998). Statistical significance testing:
A historical overview of misuse and
misinterpretation with implications for editorial
policies of educational journals. Research in the
Schools, 5 (2), 23-32. Available online:
http://www.personal.psu.edu/users/d/m/dmr/
sigtest/3mspdf.pdf

Statistical Inference in Education


“Probably few methodological issues have
generated as much controversy among
sociobehavioral scientists as the use of
[statistical significance] tests” (Pedhazur &
Schmelkin, 1991, p. 198).
“The test of significance does not provide the
information concerning psychological
phenomena characteristically attributed to it…a
great deal of mischief has been associated with
its use” (Bakan, 1966, p. 423).
Statistical Inference in Education


Huberty (1987) asserted, “There is nothing
wrong with statistical tests themselves! When
used as guides and indicators, as opposed to
means of arriving at definitive answers, they are
Okay” (p. 7).
Main problem: “The ingenuous assumption that
a statistically significant result is necessarily a
noteworthy result” (Daniel, 1997, p. 106).
Statistical Inference in Education


Another problem: It is “common practice to
drop the word ‘statistical’ and instead speak of
‘significant differences,’ ‘significant correlations,’
and the like” (Pedhazur & Schmelkin, 1991, p.
202).
Schafer (1993) noted, “I hope that most
researchers understand that significant
(statistically) and important are two different
things. Surely the term significant was ill-chosen”
(p. 387).
Statistical Inference in Education
To better understand this controversy,
we will explore some of the mathematics behind
statistical inference.
We will follow the outline provided by:
Moore, D.S. (1997). Statistics: Concepts and
controversies (4th ed.). New York: W.H. Freeman.

Statistical Inference in Education



Inference simply means drawing conclusions
from data, as we have discussed up to this point.
The phrase “statistical inference” is reserved for
occasions when probability concepts are used to
help in drawing conclusions.
Probability can account for chance variation,
which allows us to correct our judgment of
what is happening in certain situations.
Statistical Inference in Education


Scenario: Suppose a multiple-choice test is used
to compare the performance of students
receiving teaching method A with that of students
receiving teaching method B. Twenty students were
assigned at random to teaching method A and
another 20 to teaching method B. At the end of
the experiment, 12 of the students in group A
received Fs on the test, while only 8 in group B
received Fs.
Question: Can we conclude that teaching
method B better prevents students from
receiving Fs?
Statistical Inference in Education


Answer: Not necessarily. A difference of this size
could easily be due to chance variation alone. A
probability calculation can tell us how often a
difference at least this large would arise by
chance if the two methods were equally effective.
While there is a numerical difference between
the number of Fs in the two groups, that
difference might vanish if the experiment were
repeated a number of times.
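One way to carry out such a probability calculation is sketched below in Python. It assumes the two methods are equally effective, so that the 20 Fs observed overall (12 + 8) would have occurred no matter how the students were assigned, and it asks how often random assignment alone would place 12 or more of those Fs in group A. The function name and this particular model (a hypergeometric count, the idea behind Fisher's exact test) are choices made for illustration; they are not part of the original scenario.

```python
from math import comb

def prob_at_least(fs_in_a, group_size=20, total_fs=20, total_students=40):
    """Chance that random assignment alone puts at least `fs_in_a` of the
    Fs in group A, assuming the teaching method has no effect on who fails."""
    total_ways = comb(total_students, group_size)
    passes = total_students - total_fs
    max_fs = min(group_size, total_fs)
    # Count the assignments in which group A holds k of the F students
    # and (group_size - k) of the passing students.
    favorable = sum(
        comb(total_fs, k) * comb(passes, group_size - k)
        for k in range(fs_in_a, max_fs + 1)
    )
    return favorable / total_ways

p = prob_at_least(12)
print(f"P(12 or more Fs in group A by chance) = {p:.3f}")
```

Under these assumptions the probability comes out at about 0.17, or roughly one time in six, which is far too common to count as strong evidence for method B.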
Statistical Inference in Education


Drawing conclusions in mathematics: Start with
a hypothesis and then use a logical argument to
prove that the conclusion follows.
Example: If a quadrilateral is a rectangle, then its
diagonals are congruent. (This can be proven
through an a priori logical argument – not by
just examining a bunch of rectangles and
measuring their diagonals to see if they are
congruent).
Statistical Inference in Education


Drawing conclusions in social science is almost
the opposite of mathematics: You need to start
with a number of observations and draw
conclusions from them. (Inductive reasoning).
Important implication: Social science research
studies do NOT produce proofs. They produce only
evidence that something may or may not
be the case. (For example, you can never prove that
teaching method A is better than method B, but
you can systematically gather evidence to help
you make decisions about how to teach.)
Statistical Inference in Education


“Statistical inference uses probability to say how
strong an inductive argument is” (Moore, 1997,
p. 459).
In the teaching method A vs. teaching method B
scenario, a probability calculation could help us
see that the argument in favor of teaching
method B is not very strong. We could likely get
different results if the experiment were
replicated a number of times.
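The effect of replication can be illustrated with a short simulation. The Python sketch below assumes, purely for illustration, that both methods have the same underlying chance of an F (0.5, the overall rate of 20 out of 40 in the scenario) and repeats the 20-versus-20 experiment many times. The function name, the seed, and the 0.5 rate are assumptions, not values taken from Moore.

```python
import random

def simulate_gaps(n_experiments=10_000, group_size=20, f_rate=0.5, seed=502):
    """Replay the experiment many times with NO real difference between
    methods and record the gap in F counts (group A minus group B)."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_experiments):
        fs_a = sum(rng.random() < f_rate for _ in range(group_size))
        fs_b = sum(rng.random() < f_rate for _ in range(group_size))
        gaps.append(fs_a - fs_b)
    return gaps

gaps = simulate_gaps()
# How often is the gap, in either direction, as large as the observed 12 vs. 8?
share = sum(abs(g) >= 4 for g in gaps) / len(gaps)
print(f"Replications with a gap of 4 or more Fs: {share:.1%}")
```

Even with no real difference between the methods, a gap as large as the one observed should turn up in a sizable share of the simulated replications, which is why the single result is weak evidence.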
Statistical Inference in Education


Note: The probability calculations required for
statistical inference depend upon probability
samples or randomized comparative
experiments.
Very few educational research studies have this
sort of luxury, with a few notable exceptions.
For example:
National Assessment of Educational Progress
(NAEP)
Trends in International Mathematics and Science Study (TIMSS)

Some Essential Terminology


“A parameter is a number that describes the
population. For example, the proportion of the
population having some characteristic of
interest is a parameter we call p. In a statistical
inference problem, population parameters are
fixed numbers, but we do not know their values”
(Moore, 1997, p. 460).
Example: The actual proportion of 3rd graders
who can read in the U.S. is a population
parameter. We can only estimate it by drawing
random samples from the population. We will
probably never know it exactly.
Some Essential Terminology


“A statistic is a number that describes the sample
data. For example, the proportion of the sample
having some characteristic of interest is a statistic
that we call p-hat. Statistics change from sample to
sample. We use the observed statistics to get
information about the unknown parameters”
(Moore, 1997, p. 460).
Example: We could draw a random sample out of
all the 3rd graders in the U.S. and administer a
literacy test. The proportion that could read would
be a statistic to estimate the population parameter.
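The distinction between a parameter and a statistic can be seen in a small simulation. The Python sketch below invents a population proportion (p = 0.70, a made-up value and not an actual literacy rate) and draws several random samples of 500 students; each sample yields a somewhat different p-hat. All names and numbers here are hypothetical.

```python
import random

TRUE_P = 0.70  # hypothetical parameter; in a real study this is unknown

def sample_proportion(n, p, rng):
    """Draw n students at random and return p-hat, the share who can read."""
    readers = sum(rng.random() < p for _ in range(n))
    return readers / n

rng = random.Random(2005)
p_hats = [sample_proportion(500, TRUE_P, rng) for _ in range(5)]
for i, p_hat in enumerate(p_hats, start=1):
    print(f"sample {i}: p-hat = {p_hat:.3f}")
```

The parameter stays fixed at 0.70 throughout, while the statistic moves around it from one sample to the next.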
Confidence Intervals

Scenario: “The NAEP survey includes a short
test of quantitative skills, covering mainly basic
arithmetic and the ability to apply it to realistic
problems. Scores on the test range from 0 to
500. For example, a person who scores 233 can
add the amounts of two checks appearing on a
bank deposit slip; someone scoring 325 can
determine the price of a meal from a menu, a
person scoring 375 can transform a price in
cents per ounce into dollars per pound” (Moore,
1997b, p. 207).
Confidence Intervals

Scenario (contd.): “In a recent year, 840 men 21
to 25 years of age were in the NAEP sample.
Their mean quantitative score was 272 (statistic).
These 840 men are a simple random sample
from the population of all young men. On the
basis of this sample, what can we say about the
mean score in the population of all 9.5 million
young men of these ages (parameter)?” (Moore,
1997b, p. 207).
Confidence Intervals



Because the statistic was 272, you might guess
the actual population parameter is around 272.
Statistical Inference question related to
confidence intervals: “How would the sample
mean (statistic) vary if we took many samples of
840 young men from this same population?”
(Moore, 1997b, p. 207).
This seems like an impossible question to
answer on the face of it, but some statistical
facts help us out.
Confidence Intervals




Useful fact #1: For large samples such as this one,
the sampling distribution of the sample mean is
approximately normal!
Useful fact #2: The mean of the sampling
distribution is equal to the mean of the
population.
Useful fact #3: The 68-95-99.7 rule for normal
distributions.
Useful fact #4: From long experience, we
calculate the standard deviation of the sampling
distribution to be 2.1.
Confidence Intervals



Putting the facts together: The 68-95-99.7 rule
says that about 95% of the means will be within
two standard deviations of the population mean.
In our case, 95% of the sample means will be
within 4.2 points of the population mean.
In 95% of all samples taken, the actual
population mean is within 4.2 points of the
sample mean.
This means that in 95% of all samples the actual
population mean lies between (sample mean) –
4.2 and (sample mean) + 4.2.
Confidence Intervals


Bottom line: If we choose very many samples,
95% of the intervals defined by (sample mean)
plus or minus (4.2) will capture the actual
population mean.
Back to the NAEP scenario: Recall that our
sample mean was 272. This means we can say
that we are 95% confident that the actual
population mean for the NAEP lies between:
272-4.2 = 267.8 and 272+4.2 = 276.2.
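The arithmetic behind this interval is short enough to write out. The Python sketch below uses the two numbers given above (a sample mean of 272 and a sampling-distribution standard deviation of 2.1); the function name is chosen for illustration only.

```python
def confidence_interval_95(sample_mean, sampling_sd):
    """Sample mean plus or minus two standard deviations of the sampling
    distribution (the '95' part of the 68-95-99.7 rule)."""
    margin = 2 * sampling_sd
    return sample_mean - margin, sample_mean + margin

low, high = confidence_interval_95(sample_mean=272, sampling_sd=2.1)
print(f"95% confidence interval: {low:.1f} to {high:.1f}")
```

The same function gives the interval for any other sample mean, as long as the standard deviation of the sampling distribution is known.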
Confidence Intervals

“Be sure you understand the grounds for our
confidence. There are only two possibilities:
1. The interval between 267.8 and 276.2 contains the
true population mean.
2. Our simple random sample was one of the few
samples for which the sample mean is not within 4.2
points of the true population mean. Only 5% of all
samples give such inaccurate results” (Moore, 1997,
p. 210).

Confidence Intervals


“We cannot know whether our sample is one of
the 95% for which the interval catches the actual
population mean, or one of the unlucky 5%.
The statement that we are 95% confident that
the actual population mean lies between 267.8
and 276.2 is shorthand for saying, ‘We got these
numbers by a method that gives correct results
95% of the time’” (Moore, 1997b, p. 210).
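The idea of "a method that gives correct results 95% of the time" can be checked by simulation. The Python sketch below assumes a population mean of 270 (a hypothetical value, since the true mean is never known in practice) and the sampling-distribution standard deviation of 2.1 used above. It draws many sample means from a normal distribution and counts how often the interval (sample mean) ± 4.2 captures the assumed population mean.

```python
import random

def coverage(true_mean=270.0, sampling_sd=2.1, n_samples=10_000, seed=1997):
    """Share of intervals (sample mean +/- 2 sd) that capture the true mean.
    true_mean is hypothetical; a real study never knows it."""
    rng = random.Random(seed)
    margin = 2 * sampling_sd
    hits = 0
    for _ in range(n_samples):
        sample_mean = rng.gauss(true_mean, sampling_sd)
        if sample_mean - margin <= true_mean <= sample_mean + margin:
            hits += 1
    return hits / n_samples

print(f"Intervals capturing the true mean: {coverage():.1%}")
```

The share should land close to 95%. Any single interval either contains the mean or does not; the 95% describes the method, not one particular sample.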
Homework Exercise 1

“The report of a sample survey of 1500 adults
says, ‘With 95% confidence, between 27% and
33% of American adults believe that drugs are
the most serious problem facing our nation’s
public schools.’ Explain to someone who knows
no statistics what the phrase ‘ninety-five percent
confidence’ means in this report” (Moore, 1997,
p. 468).
Homework Exercise 2

“A student reads that a 95% confidence interval
for the mean NAEP quantitative score for men
of ages 21 to 25 is 267.8 to 276.2. Asked to
explain the meaning of this interval, the student
says, ‘ninety-five percent of all young men have
scores between 267.8 and 276.2.’ Is this student
right? Justify your answer” (Moore, 1997b, p.
217).
Hypothesis Tests

“The other major type of formal inference is the
test of significance. The purpose of a statistical test
is to assess the evidence provided by the data
against some claim about a parameter. A test
says, ‘If we took many samples and the claim
were true, we would rarely get a result like this.’
Observing a result that would rarely occur if a
claim were true is evidence that the claim is not
true. Replace the word ‘rarely’ by a probability
and you have a numerical measure of our
confidence in the evidence that the data give us”
(Moore, 1997, p. 483).
Hypothesis Tests


Generic Example: Suppose we want to compare
a new teaching method (A) against another one
(B). We might start by guessing that teaching
method A will work better.
We would then state a null and an alternative
hypothesis. Null: Mean posttest scores for the
two groups will be identical. Alternative: Mean
posttest scores for group A will be greater than
those for group B.
Hypothesis Tests


If we believe in teaching method A, we hope to
gather evidence against the null hypothesis and
in support of the alternative.
If we gather enough evidence (with both “enough”
and “significance” defined in probabilistic terms),
we can reject the null hypothesis. Note, however,
that this does not prove the alternative
hypothesis. All that any social science study can do
is to gather evidence.
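A permutation test is one simple way to put a probability on "enough evidence." The Python sketch below is an illustration, not the procedure from Moore: the posttest scores are invented, and the function name, seed, and number of shuffles are arbitrary choices. It asks how often randomly relabeling the students, which is what the null hypothesis amounts to, produces a mean difference at least as large as the one observed.

```python
import random
from statistics import mean

# Hypothetical posttest scores, invented for illustration only.
group_a = [78, 85, 90, 72, 88, 95, 81, 79]
group_b = [70, 75, 80, 68, 82, 77, 74, 71]

def permutation_p_value(a, b, n_shuffles=10_000, seed=28):
    """One-sided p-value: the share of random relabelings whose mean
    difference (A minus B) is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = mean(a) - mean(b)
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        diff = mean(pooled[:len(a)]) - mean(pooled[len(a):])
        if diff >= observed:
            extreme += 1
    return observed, extreme / n_shuffles

observed, p_value = permutation_p_value(group_a, group_b)
print(f"observed difference = {observed:.3f}, p-value = {p_value:.4f}")
```

A small p-value here counts as evidence against the null hypothesis. It is not a proof of the alternative, and it says nothing about whether the difference is large enough to matter in a classroom.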
Homework Exercise 3

Suppose you read in an educational research
report that students’ posttest scores after
receiving teaching method A were significantly
higher than those of students who received
teaching method B. Does this prove that
teaching method A is more effective than
teaching method B? Why or why not?