Inference about a Population Mean, Two Sample T Tests, Inference

From Theory to Practice:
Inference about a Population
Mean, Two Sample T Tests,
Inference about a Population
Proportion
Chapters 17-18-etc
Recently we have
• Looked at the basic theoretical ideas underpinning inference
– Randomness
– Samples vs. Populations
– The law of large numbers
– The central limit theorem
– Confidence Intervals
– Probability (in regard to an observed statistic or equation and a population parameter or relationship)
– Hypothesis testing
Today we are going to move on to
making inferences about variables
• Inference about a population mean with the (t)
statistic.
• Then we are going to move on to two sample
problems using a different variation of (t).
• Then we are going to delve back into theory-land to look at population proportions. (Note: for
this topic I am going to switch to the publisher’s
slides that accompanied the book, as they are
better than the slides I have produced in the
past.)
Inference about a Population Mean
• Last class we had a bit of fun calculating
confidence intervals and probabilities, but it was
a bit unrealistic: we assumed that even
though we did not know the population mean, we
did know its standard deviation
• This class we will look at the more realistic
situation where both of these parameters of the
population (mean and std. deviation) are
unknown
• We are going to do this by using:
– the sample mean as an estimate for the population
mean
– And confidence levels and P-values derived from the
sampling distribution of the mean
• In order to do this certain conditions need to
apply
– The data are drawn from an SRS (simple random sample)
– The variable we are interested in has a roughly
normal distribution in the population; a symmetric
distribution is probably good enough
– The population must be much larger than the sample
size (general requirement for almost all work with
statistics)
But we don’t know the
population standard deviation…
• Because we don’t know
the population standard
deviation, we estimate the
standard deviation of the
sample mean using something
called the standard error of
the sample mean
• This is just the sample
standard deviation (s)
divided by the square root
of the sample size (n):
$$\mathrm{SE}_{\bar{x}} = \frac{s}{\sqrt{n}}$$
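To make the arithmetic concrete, here is a minimal sketch of the standard error calculation. The function name `standard_error` and the five data values are made up for illustration; they do not come from the book.

```python
import math
import statistics

def standard_error(sample):
    """Standard error of the sample mean: s / sqrt(n)."""
    n = len(sample)
    s = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
    return s / math.sqrt(n)

sample = [4.0, 5.0, 6.0, 7.0, 8.0]  # made-up illustrative data
se = standard_error(sample)
print(f"standard error = {se:.4f}")
```

Note that `statistics.stdev` is the sample standard deviation (n - 1 in the denominator), which is the "s" we want here, not the population version.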
• As with many other things we have
looked at this term, the sample mean
standardized by its standard error
has a known distribution for a given
sample size. This is the t distribution,
and the standardized value is known
as the ‘t’ statistic.
• The sample size minus one (n − 1) is
referred to as the degrees of freedom
• The distribution of t is almost but
not quite normal because the
substitution of the sample standard
deviation for the population standard
deviation adds more variability
• The equation opposite is not
something I think you would ever
use in the real world but instead
explains how the values for table C
at the back of the book were
calculated
$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$
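A rough sketch of how the one sample t statistic could be computed by hand. The hypothesized population mean `mu_0`, the function name, and the data are all assumptions chosen for illustration.

```python
import math
import statistics

def one_sample_t(sample, mu_0):
    """Return (t, degrees of freedom) for t = (x_bar - mu_0) / (s / sqrt(n))."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean
    return (x_bar - mu_0) / se, n - 1

sample = [4.0, 5.0, 6.0, 7.0, 8.0]  # made-up illustrative data
t_stat, df = one_sample_t(sample, mu_0=5.0)
print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```

The resulting t and its degrees of freedom are what you would then look up in a t table to judge how unusual the sample mean is.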
• The net result is that
we can use the
resulting “t” statistic to
estimate a confidence
interval for the
population mean from
our sample mean,
without knowing the
population standard
deviation:
$$\bar{x} \pm t^{*} \frac{s}{\sqrt{n}}$$
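As a sketch of the interval calculation, the snippet below takes the critical value t* as an input, the way you would read it from a t table. The data are made up, and 2.776 is the standard table value for 95% confidence with 4 degrees of freedom; the function and constant names are hypothetical.

```python
import math
import statistics

def t_confidence_interval(sample, t_star):
    """Return (low, high) for x_bar +/- t_star * s / sqrt(n)."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    margin = t_star * se  # margin of error
    return x_bar - margin, x_bar + margin

sample = [4.0, 5.0, 6.0, 7.0, 8.0]  # made-up data, n = 5 so df = 4
T_STAR_95_DF4 = 2.776               # table value: 95% confidence, 4 df
low, high = t_confidence_interval(sample, T_STAR_95_DF4)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```

With a different sample size or confidence level you would swap in the matching t* from the table.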
• The good news is that if you have a
computer you don’t need to do this by
hand; any stats package will calculate this
“one sample t procedure”.
• But since we are practicing the math, you
can find the table of values of (t) for
various degrees of freedom and various
confidence levels at the very back of
your book in table “C”
Break
• Another slightly different t test is the matched pair test.
Here we are trying to figure out whether the mean difference
within the pairs is significantly different from zero.
• This is also sometimes used as a before and after test
(for example, patients might be asked to rate their pain
on a scale of 1 to 10 before and after receiving a drug).
• Here the question is this: What is the probability that the “t”
statistic which was calculated from the scores for
matched pairs in our sample would occur if the
difference in means among matched pairs in the overall
population were actually zero? If it is low, then we can
infer the drug had an impact on the pain reported by
patients and we would say our results are
statistically significant.
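A minimal sketch of the matched pair calculation: take the difference within each pair, then treat those differences as a single sample and test against zero. The before and after pain scores are invented for illustration, as is the function name.

```python
import math
import statistics

def paired_t(before, after):
    """Return (t, df) for the mean of the pairwise differences vs. zero."""
    diffs = [b - a for b, a in zip(before, after)]  # one difference per pair
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / se, n - 1

before = [7.0, 6.0, 8.0, 5.0, 9.0]  # made-up pain ratings before the drug
after = [5.0, 5.0, 6.0, 4.0, 7.0]   # made-up ratings for the same patients after
t_stat, df = paired_t(before, after)
print(f"paired t = {t_stat:.3f} with {df} degrees of freedom")
```

The key design point is that the pairing reduces the problem to a one sample t procedure on the differences, so the degrees of freedom are the number of pairs minus one.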
• Things to remember about t
– As the distribution of the variable you are
interested in departs from normal, so too will the
usefulness of t decline
– In general it is fine as long as the sample is
large and the population is even larger
(regardless of normality). The book says
sample size 40 or greater.
Comparing Means: Two Sample T
• A third t test is
called the two sample t
test
• Here the assumption is
that our two samples are
independent of each
other and from
populations independent
from each other. For
example, voters in two
different provincial
elections.
• The math looks like this
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
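Here is a sketch of the two sample t statistic for independent samples, using separate variances for each group. The two lists of left-right scores are invented, and the function name is hypothetical; this only computes t and leaves the degrees of freedom to your stats package or textbook rule.

```python
import math
import statistics

def two_sample_t(sample_1, sample_2):
    """t = (x_bar_1 - x_bar_2) / sqrt(s1^2 / n1 + s2^2 / n2)."""
    n1, n2 = len(sample_1), len(sample_2)
    var_1 = statistics.variance(sample_1)  # s1 squared
    var_2 = statistics.variance(sample_2)  # s2 squared
    se_diff = math.sqrt(var_1 / n1 + var_2 / n2)
    return (statistics.mean(sample_1) - statistics.mean(sample_2)) / se_diff

province_a = [52.0, 48.0, 55.0, 50.0, 45.0]        # made-up left-right scores
province_b = [60.0, 58.0, 62.0, 56.0, 64.0, 60.0]  # made-up, different sample size
t_stat = two_sample_t(province_a, province_b)
print(f"two sample t = {t_stat:.3f}")
```

Notice that the two samples do not need to be the same size, which is one practical sign that you are not in a matched pair situation.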
• The real trick with t is knowing which of
the three procedures to use
– 1 sample
• We want to find out how well the mean we
observe in our sample data reflects the mean in
the population, and with what level of confidence
the population mean falls within our
confidence limits.
– Paired or matched sample
• We want to find out the significance of the
difference in means we observe between pairs.
E.g. Do husbands spend less time on average
per week cleaning than wives? (or)
• We want to find out the significance of the
difference in means we observe before and after
an event or intervention. E.g. Did the pain killer
produce a meaningful change in the pain
experienced by a sample of patients?
– Independent sample t test
• Similar to the paired sample, but our two samples
are independent of each other and therefore we
cannot assume they share the same population mean
or the same population standard deviation. E.g.
A national public opinion survey asks voters in
every province to place themselves on a left-right
scale with 0 being extreme left and 100 being
extreme right. Do voters in two different provinces
have similar mean scores on the left-right
continuum?
Break 2
Inference about a Population
Proportion
• Up until now we have been looking at
making inferences about population
means by looking at sample means
• Now we are going to shift focus to looking
at whether we can predict what portion of
a population is going to exhibit some trait
or outcome based on the portion of the
sample that does so
• Please note, for the rest of this lesson I am
going to switch to the slides that were
provided by the publisher as they do a
much better job of explaining this than I
do.
• As they are copyrighted, I cannot generally
distribute them. They are on the secure
server for your class.