PPAL-6200 Intro to Inference

Chapters 14 and 15 (16 is review)
March 8-9, 2011
(revised March 8 22:30)
Why do we do research?
• Once we pull a sample and run our experiment, test, or whatever, we learn something about the sample.
• However, that is not really our goal
• Our goal is to infer from the sample some conclusion about the wider population
What if there are differences among samples?
• If we pull two samples from the same population, there is a good chance that our results will differ
• Therefore, we can never be 100% certain that any statistics or results we draw from a sample will be reflected in the population parameters, or in how something might work in the wider world
Therefore,…
• When we do a study based on a sample, we generally have to calculate two things
• Confidence Intervals
• Significance
• Both of these are based on the sampling distribution of a statistic (the distribution of values the statistic would take if we applied the same method of choosing a sample and calculating the statistic repeatedly, many times).
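A quick simulation makes the idea of a sampling distribution concrete. The sketch below is only an illustration, and its conditions are assumptions:

- a hypothetical Normal population with mean 100 and standard deviation 15;
- samples of size 25;
- 5,000 repetitions.

The helper name `sampling_distribution_of_mean` is made up for this example. Any two samples give different means, but the means pile up around the population mean with a spread close to σ/√n.

```python
import random
import statistics

def sampling_distribution_of_mean(mu, sigma, n, reps, seed=0):
    """Draw `reps` simple random samples of size n from a Normal(mu, sigma)
    population and return the list of their sample means."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    means = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means

# Hypothetical population: mean 100, standard deviation 15; samples of size 25
means = sampling_distribution_of_mean(mu=100, sigma=15, n=25, reps=5000)

print("first two sample means:", round(means[0], 2), round(means[1], 2))  # they differ
print("mean of the sample means:", round(statistics.fmean(means), 2))      # close to 100
print("sd of the sample means:", round(statistics.stdev(means), 2))        # close to 15/sqrt(25) = 3
```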
Imagine a very simple situation
• SRS (simple random sample) from the population
• Variable is precisely Normal
• We don’t know the population mean μ, but we do know the population standard deviation σ
• Note, as the book states, this is a bit implausible
If you only knew the std. dev. of the pop., life would be easy
• Take two examples from the book. They simply assume that the mean of the sample is equal to the mean of the population, give or take.
• What do we use to calculate that “give or take”?
• The sampling distribution of the sample mean, and our knowledge of the Normal curve
So let’s look at an example
• x̄ (mean BMI, body mass index) of 654 women = 26.8
• We know from some other source that the standard deviation σ of BMI for all women is 7.5
• We know the std. dev. of the sampling distribution (the standard deviation of the means of all possible samples) is
$$\frac{\sigma}{\sqrt{n}} = \frac{7.5}{\sqrt{654}} \approx 0.3$$

Estimating using the 68-95-99.7 Rule
• The 68-95-99.7 rule tells us that 95% of all sample means will be within two standard deviations of the mean of the sampling distribution (the population mean). Therefore, if the standard deviation of the sampling distribution is 0.3, then two standard deviations will be 0.6, so 95% of the time the sample mean lands within 0.6 of the population mean
• Therefore the 95% confidence interval will be between
$$\bar{x} - 0.6 = 26.8 - 0.6 = 26.2 \quad\text{and}\quad \bar{x} + 0.6 = 26.8 + 0.6 = 27.4$$
Another Example
• NAEP test of basic math skills
• 840 young men in the sample
• Mean score for young men = 272
• We want to estimate the mean score in the population of young men. From another source we know the population is Normal and has a standard deviation of 60
• So we can figure out the std. dev. of the sampling distribution, and from there the 95% confidence interval
$$\frac{\sigma}{\sqrt{n}} = \frac{60}{\sqrt{840}} \approx 2.07$$
• So if one standard deviation of the sampling distribution is 2.07,
• then two is plus or minus 4.14
Therefore we are 95% confident that the mean score in the population is between
$$\bar{x} - 4.14 = 272 - 4.14 = 267.86 \quad\text{and}\quad \bar{x} + 4.14 = 272 + 4.14 = 276.14$$
Now let’s take what we learned and think
wider…
• In the first example (slides 7 and 8), the margin of error was ±0.6 with 95% confidence
• For 95% confidence, we saw that the margin of error was just plus or minus 2 × the standard deviation of the sampling distribution (which we worked out as 0.3). Now please note: if we chose a different confidence level, we would get a different margin of error. For 99.7% confidence it would be 3 × the standard deviation of the sampling distribution.
Example #1 margin of error, now with 99.7% confidence rather than 95% (3 × 0.3 = 0.9)
$$\bar{x} - 0.9 = 26.8 - 0.9 = 25.9 \quad\text{and}\quad \bar{x} + 0.9 = 26.8 + 0.9 = 27.7$$
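The rule-of-thumb arithmetic for the BMI example can be checked in a few lines of Python. This is a minimal sketch, and the function name `rule_of_thumb_interval` is hypothetical. It rounds the interval endpoints to one decimal place, as the slides do. It does not round σ/√n to 0.3 first, but it still lands on the same intervals. With k = 2 it gives the 95% interval, and with k = 3 the 99.7% interval.

```python
import math

def rule_of_thumb_interval(xbar, sigma, n, k=2):
    """Interval xbar -/+ k * sigma/sqrt(n), per the 68-95-99.7 rule
    (k=2 for about 95% confidence, k=3 for about 99.7%)."""
    se = sigma / math.sqrt(n)   # std. dev. of the sampling distribution of the mean
    margin = k * se
    # Round to one decimal place, as the slides do
    return round(xbar - margin, 1), round(xbar + margin, 1)

se_bmi = 7.5 / math.sqrt(654)
print(round(se_bmi, 3))                             # 0.293 (the slide rounds this to 0.3)
print(rule_of_thumb_interval(26.8, 7.5, 654))       # (26.2, 27.4)
print(rule_of_thumb_interval(26.8, 7.5, 654, k=3))  # (25.9, 27.7)
```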

Think of it like playing darts…
• As you can see, there is a sort of trade-off here. All other things being equal, higher confidence means a greater margin of error.
• Think of it like playing darts.
– The smaller the target area
(margin of error) the less
likely you are to hit it
(confidence)
– The larger the target area
(margin of error) the more
likely you are to hit it
(confidence)
Rogues
• In the two examples, we estimated a range for the population mean from the sample mean and the sampling distribution, using a method that captures the population mean 95% of the time.
• That means if we use our method 100 times, we expect about five occasions on which the population mean falls outside our interval (either above or below it).
• Those five occasions are sometimes called “rogues”. We can discuss how to detect them later.
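To see what "95% of the time" means, we can simulate many samples from a population whose mean is known and count how often the interval catches it. This is a sketch under assumed conditions:

- a hypothetical Normal population with μ = 50 and σ = 10;
- samples of size 20;
- 10,000 repetitions;
- z* = 1.96, the unrounded version of the "2" in the 68-95-99.7 rule.

The helper `coverage_rate` is made up for illustration. The intervals that miss are the "rogues".

```python
import math
import random
import statistics

def coverage_rate(mu, sigma, n, reps, z_star=1.96, seed=1):
    """Fraction of `reps` intervals xbar -/+ z_star*sigma/sqrt(n) that
    contain the true population mean mu (sigma assumed known)."""
    rng = random.Random(seed)
    margin = z_star * sigma / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        xbar = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / reps

# Hypothetical population: mu = 50, sigma = 10, samples of size 20
rate = coverage_rate(mu=50, sigma=10, n=20, reps=10_000)
print(f"intervals that caught mu: {rate:.3f}")       # close to 0.95
print(f"'rogue' intervals:        {1 - rate:.3f}")   # close to 0.05
```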
The standardized Normal curve
• In practice we can simplify what we did before by using the properties of the standard Normal curve and its known critical values z*
Confidence level C    Critical value z*
90%                   1.645
95%                   1.960
99%                   2.576
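Python's standard library can reproduce the critical values in this table from the standard Normal curve. This sketch assumes Python 3.8+ for `statistics.NormalDist`, and `critical_z` is a hypothetical helper. It reuses the BMI numbers (σ = 7.5, n = 654) to show the confidence versus margin-of-error trade-off: a higher C means a larger z* and a wider margin.

```python
import math
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def critical_z(confidence):
    """z* such that the middle `confidence` area of the standard Normal
    curve lies between -z* and +z*."""
    return std_normal.inv_cdf((1 + confidence) / 2)

se_bmi = 7.5 / math.sqrt(654)  # BMI example: sigma = 7.5, n = 654
for c in (0.90, 0.95, 0.99):
    z_star = critical_z(c)
    print(f"C = {c:.0%}: z* = {z_star:.3f}, BMI margin = +/-{z_star * se_bmi:.2f}")
```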
So let’s see this with an equation and numbers
$$\bar{x} - z^{*}\frac{\sigma}{\sqrt{n}} \quad\text{to}\quad \bar{x} + z^{*}\frac{\sigma}{\sqrt{n}}$$
• Let’s go back and check the math test example using this method (slides 9-11).
• As you recall:
– Mean is 272
– Sample size n is 840
– Std. dev. of the population is 60
– Critical z* for 95% is 1.960
$$z^{*}\frac{\sigma}{\sqrt{n}} = 1.96 \times \frac{60}{\sqrt{840}} \approx 4.06$$
$$\bar{x} - 4.06 = 272 - 4.06 = 267.94 \quad\text{and}\quad \bar{x} + 4.06 = 272 + 4.06 = 276.06$$
• The difference between the two slides is due to rounding (the 68-95-99.7 rule rounds z* = 1.96 up to 2) and nothing more
Significance and testing the null hypothesis
• Why do we always test the null hypothesis that there is “no” relationship among the variables? Because we can never really prove that anything is true.
• The p value is the probability, computed assuming the null hypothesis is true, of getting a result at least as extreme as the one we actually observed. (It is not the probability that the null hypothesis is true.)
• Thinking about the 95% confidence interval, we can just reverse this and say we want to see results that have a p value < .05, meaning that if the null hypothesis were true, results this extreme would occur less than 5% of the time.
In terms of statistical tests
• We are asking: if only random chance were at work (that is, if the null hypothesis were true), how probable would results like the ones we observed be? A p value < .05 says results this extreme would happen less than 5% of the time by chance alone.
• Looking at the example in the book (does cola lose sweetness in storage?)
• Our null hypothesis is that there is no loss of sweetness, i.e., that the mean loss of sweetness is in fact μ = 0
Here comes the normal curve again
• We know there are 10 tasters
• We know their mean loss of sweetness score
was 1.02
• We also know that for any cola the std.
deviation of sweetness loss is 1.0
If the mean really were zero, then what we are talking about is
$$0 \pm \frac{\sigma}{\sqrt{n}} = 0 \pm \frac{1}{\sqrt{10}} \approx 0 \pm 0.316$$
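The cola numbers can be turned into a z statistic and a P-value. This sketch assumes a one-sided alternative Ha: μ > 0 (sweetness is lost), which matches the question being asked. It uses only the Python standard library. A z of about 3.23 (P roughly 0.0006) means a mean loss of 1.02 would be very unusual if the true mean loss were 0.

```python
import math
from statistics import NormalDist

# Cola example: 10 tasters, mean sweetness loss 1.02, sigma = 1.0; H0: mu = 0
xbar, mu0, sigma, n = 1.02, 0.0, 1.0, 10

se = sigma / math.sqrt(n)               # about 0.316 if H0 (mu = 0) is true
z = (xbar - mu0) / se                   # how many standard errors 1.02 is above 0
p_one_sided = 1 - NormalDist().cdf(z)   # P(Z >= z) for Ha: mu > 0

print(f"se = {se:.3f}, z = {z:.2f}, P = {p_one_sided:.4f}")  # se = 0.316, z = 3.23, P = 0.0006
```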
Now let’s think about moving beyond eyeballs on graphs to numbers
• Like z, all of the statistics we use have known critical values, so we can go to a table and look up how large the statistic has to be, for a given sample size, to be significant at the p < 0.05 level
• In fact, your software will calculate the precise probability (or at least to three decimal places)
What we are looking for
• Data that would rarely occur if the null hypothesis H0 were true provide evidence that H0 is not true. P-values give us a measure of “would rarely occur”.
• State: what is the practical question we are testing
• Plan: Identify the parameter, state null and alternative
hypotheses, and choose the type of test that fits the data
and problem
• Solve:
– Check the conditions for the test you plan to use
– Calculate the statistic
– Find the P-value
• Conclude: Return to the practical question to describe your
results in this setting
Tests for a population mean: the z test statistic
• The z test statistic measures how far the observed sample mean x̄ lies from the hypothesized population mean μ0, in units of the standard deviation of the sampling distribution, σ/√n (and allows us to go on to judge whether the difference is significant).
• This is worth doing for the sake of knowing how. However, it has a big flaw as a real-world test: it assumes you know the population standard deviation σ (the population mean μ0 is simply the value stated by the null hypothesis).
Let’s look at the blood pressure
example
• An executive wants to know if his/her people
have abnormal blood pressure.
So if we do it
$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{126.07 - 128}{15/\sqrt{72}} \approx -1.09$$
• Now go to the back of the book, look at Table A, and find the area beyond z = -1.09 (1.09 units below 0). Find the row -1.0, then go across to column .09, and you find 0.1379. Double it (the test is two-sided, so we count both tails) and you have the proportion of times, about 28%, that a sample of 72 men would have a mean blood pressure at least this far from the population mean if the population mean really were 128: P = 2 × 0.1379 = 0.2758.
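The same calculation can be done in code instead of Table A. This is a sketch using Python's `statistics.NormalDist` (Python 3.8+), and `z_test` is a hypothetical helper. It uses the unrounded z ≈ -1.092 rather than -1.09, so its two-sided P is about 0.275 rather than the table's 2 × 0.1379 = 0.2758; the difference is rounding. It also shows that here the one-sided P-value (for Ha: μ < 128) is half the two-sided one.

```python
import math
from statistics import NormalDist

def z_test(xbar, mu0, sigma, n):
    """One-sample z test with known sigma. Returns z, the two-sided
    P-value, and the one-sided (lower-tail) P-value."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    lower_tail = NormalDist().cdf(z)                  # area to the left of z
    two_sided = 2 * min(lower_tail, 1 - lower_tail)   # both tails
    return z, two_sided, lower_tail

# Blood pressure example: xbar = 126.07, mu0 = 128, sigma = 15, n = 72
z, p_two, p_lower = z_test(126.07, 128, 15, 72)
print(f"z = {z:.2f}")                                  # z = -1.09
print(f"two-sided P = {p_two:.4f}")                    # about 0.275
print(f"one-sided P (Ha: mu < 128) = {p_lower:.4f}")   # half of the two-sided P
```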
Thinking about Inference
• No matter which statistical test you employ, the
reasoning of confidence intervals and significance
is the same
• Statistics is applied mathematics. You must know the mathematical theorems (such as the fact that the z statistic has a standard Normal distribution when H0 is true).
• You must also use judgment to determine when to apply these theorems and when not to (what I call the “deer hunter phenomenon”).
When you teach stats you sometimes come away
feeling as though you are teaching people to use a
high-power rifle without teaching them what deer look
like
Some rules and ideas to keep in mind so that
you ensure you are shooting at deer, rather than
things you are not supposed to hunt
                     H0 true              Ha true
Reject H0            Type I error         Correct conclusion
Fail to reject H0    Correct conclusion   Type II error
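The two error types in the table can be estimated by simulation. This sketch assumes a hypothetical setting:

- H0: μ = 0, with σ = 1 and n = 10;
- a two-sided z test at α = 0.05, so z* = 1.96;
- for the Type II rate, a hypothetical true mean of 0.5.

`rejection_rate` is a made-up helper. When H0 is true, the test rejects about 5% of the time (Type I errors). When the true mean is 0.5, it fails to reject roughly 65% of the time with only 10 observations (Type II errors).

```python
import math
import random
import statistics

def rejection_rate(true_mu, mu0, sigma, n, reps, z_star=1.96, seed=2):
    """Simulate `reps` two-sided z tests of H0: mu = mu0 at alpha = 0.05
    (z* = 1.96) when the true mean is true_mu; return the fraction rejected."""
    rng = random.Random(seed)
    se = sigma / math.sqrt(n)
    rejections = 0
    for _ in range(reps):
        xbar = statistics.fmean(rng.gauss(true_mu, sigma) for _ in range(n))
        if abs((xbar - mu0) / se) > z_star:
            rejections += 1
    return rejections / reps

# Hypothetical setting: H0: mu = 0, sigma = 1, n = 10
type_1 = rejection_rate(true_mu=0.0, mu0=0.0, sigma=1.0, n=10, reps=20_000)
type_2 = 1 - rejection_rate(true_mu=0.5, mu0=0.0, sigma=1.0, n=10, reps=20_000)
print(f"Type I error rate (H0 true):        {type_1:.3f}")  # close to 0.05
print(f"Type II error rate (true mu = 0.5): {type_2:.3f}")  # roughly 0.65
```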
• Conditions for inference
– Understand the conditions that apply for a given statistic so that its confidence intervals and associated significance tests can be trusted
– E.g., the z procedures only worked because we knew the population standard deviation σ (a situation that will rarely occur). Therefore this test is of little use in the real world. However, there is another way to do this that will be discussed in Chapter 17. The other requirements, that the data be an SRS from the population and that the population be Normal, are harder to get around. This points to two big questions you must always ask:
• Where did the data come from?
• What is the shape of the population distribution?
• Where did the data come from?
– Most statistical tests assume the data come from some sort of random sample. Do they really?
• What is the shape of the population distribution?
– In an ideal world you will have a sample drawn from a Normally distributed population.
– In truth this will often not apply, and often it does not matter that much. However, for some statistical tests it does matter, and you will have to pay attention to those warnings.
• How do confidence intervals behave?
– What causes the margin of error to get smaller, and what causes it to get larger? (A larger sample makes it smaller; higher confidence or a larger σ makes it larger.)
• What does the margin of error include?
– The margin of error calculated for a confidence interval only includes the error caused by random sampling (the spread of the sampling distribution)
– Any other source of error, e.g. response bias, nonresponse, etc., is not covered, and such errors will certainly influence the results
• How do significance tests behave?
– How small a P-value is convincing?
– 0.05 is good enough for social research, but not so hot for designing nuclear reactors
• Whether a significance test should be one-sided or two-sided always depends on the alternative hypothesis
• Significance does not mean important.
– Many associations are relatively minor but significant. For
example, a study might show that a certain type of behavior
increases the risk of getting a certain type of cancer by 0.05%
and that the relationship is significant. This might be cause for
concern until you read that only 1% of people get that sort of
cancer.
– Then you have to ask: what am I being asked to give up in order to avoid that risk? Eating a specific type of fatty food? Okay, I’ll give that up. Living in a large urban area? Probably not, because living in a large urban area has other benefits that probably counterbalance that risk.
• If you do it often enough, you will get a significant result.
– Beware of multiple analyses. If you have several studies using the same method and only one produces significant results, be careful.
• Use a sample size large enough to give the margin of error you want at your chosen confidence level, and enough power for your significance test at your chosen significance level.
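The margin of error is m = z*σ/√n. It shrinks as n grows (quadrupling n halves m) and grows with the confidence level. Solving for n gives n = (z*σ/m)².

This is a sketch under assumptions:

- σ = 60 is borrowed from the NAEP example;
- the ±3-point target margin is hypothetical;
- the helper names are made up.

It covers only the confidence-interval side. Choosing a sample size for a desired power needs a separate calculation.

```python
import math
from statistics import NormalDist

def margin_of_error(sigma, n, confidence=0.95):
    """m = z* * sigma / sqrt(n) for a z confidence interval (sigma known)."""
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)
    return z_star * sigma / math.sqrt(n)

def sample_size_for_margin(sigma, m, confidence=0.95):
    """Smallest n with margin of error <= m: n = (z* * sigma / m)^2, rounded up."""
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)
    return math.ceil((z_star * sigma / m) ** 2)

# NAEP-style sigma = 60: each fourfold increase in n halves the margin
for n in (210, 840, 3360):
    print(n, round(margin_of_error(60, n), 2))

# Hypothetical target: a +/-3 point margin at 95% and at 99% confidence
print(sample_size_for_margin(60, 3, 0.95), sample_size_for_margin(60, 3, 0.99))  # 1537 2654
```

Rounding up with `math.ceil` is deliberate: rounding down would give a sample slightly too small to reach the target margin.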