(s/sqrt(n)) - People Server at UNCW


• Inference about the mean of a population of
measurements (μ) is based on the standardized
value of the sample mean (Xbar).
• The standardization involves subtracting the
mean of Xbar and dividing by the standard
deviation of Xbar – recall that
– Mean of Xbar is μ; and
– Standard deviation of Xbar is σ/sqrt(n)
• Thus we have (Xbar - μ)/(σ/sqrt(n)), which has a
Z distribution if:
– Population is normal and σ is known; or if
– n is large, so the CLT takes over…
• But what if σ is unknown? Then we must standardize
with s in place of σ, and this standardized Xbar
doesn't have a Z distribution anymore, but a
so-called t-distribution with n-1 degrees of
freedom…
• Since σ is unknown, the standard deviation of
Xbar, σ/sqrt(n), is unknown. We estimate it by
the so-called standard error of Xbar, s/sqrt(n),
where s = the sample standard deviation.
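As a quick illustration, here is a minimal sketch (with hypothetical data, not a textbook example) of computing the standard error of Xbar and the t-standardized sample mean:

```python
# Minimal sketch (hypothetical sample): the standard error of Xbar,
# s/sqrt(n), estimating sigma/sqrt(n), and the t-standardized mean.
import numpy as np

sample = np.array([9.8, 10.4, 10.1, 9.6, 10.7, 10.2, 9.9, 10.5])  # hypothetical data
n = len(sample)
xbar = sample.mean()
s = sample.std(ddof=1)          # sample standard deviation (divides by n-1)
se = s / np.sqrt(n)             # standard error of Xbar

mu0 = 10.0                      # hypothetical value of mu to standardize against
t_stat = (xbar - mu0) / se      # has a t(n-1) distribution if the population is normal
print(f"xbar = {xbar:.3f}, s = {s:.3f}, s.e.(Xbar) = {se:.3f}, t = {t_stat:.3f}")
```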
• There is a t-distribution for every value of the
sample size; we'll use t(k) to stand for the
particular t-distribution with k degrees of
freedom. There are some properties of these
t-distributions that we should note…
• Every t-distribution looks like a N(0,1) distribution; i.e., it
is centered and symmetric around 0 and has the same
characteristic "bell" shape… however, the standard
deviation of t(k), sqrt(k/(k-2)) for k > 2, is greater than 1, the s.d.
of Z, so the t-distribution density curve is more spread out
than Z. Probabilities involving r.v.s that have the t(k)
distribution are given by areas under the t(k) density
curve… Table D in the back of our book gives us the
probabilities we need…
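To see the extra spread numerically, here is a minimal sketch comparing the standard deviation and an upper-tail area of t(k) with N(0,1) using scipy (the printed values are illustrative, not entries from Table D):

```python
# Minimal sketch: spread and tail areas of t(k) versus N(0,1).
import numpy as np
from scipy import stats

for k in (3, 10, 30):
    sd_tk = np.sqrt(k / (k - 2))        # standard deviation of t(k), defined for k > 2
    tail_t = stats.t.sf(2.0, df=k)      # P(T > 2) under t(k)
    tail_z = stats.norm.sf(2.0)         # P(Z > 2) under N(0,1)
    print(f"k={k:2d}: sd(t(k))={sd_tk:.3f}, P(T>2)={tail_t:.4f}, P(Z>2)={tail_z:.4f}")
```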
• The good news is that everything we've already
learned about constructing confidence intervals
and testing hypotheses about μ carries through
under the assumption of unknown σ…
• So e.g., a 95% confidence interval for μ based
on an SRS from a population with unknown σ is
Xbar +/- t*(s.e.(Xbar))
Recall that s.e.(Xbar) = s/sqrt(n). Here t* is the
appropriate tabulated value from Table D for t(n-1) so that
the area between -t* and +t* is .95.
• As we did before, if we change the level of
confidence then the value of t* must change
appropriately…
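A minimal sketch of the interval above, with hypothetical data; t* is taken from the t(n-1) distribution via scipy rather than Table D, and the loop shows how t* changes with the confidence level:

```python
# Minimal sketch (hypothetical data): level-C confidence interval for mu,
# Xbar +/- t* * s/sqrt(n), with t* from t(n-1).
import numpy as np
from scipy import stats

sample = np.array([12.1, 11.4, 13.0, 12.7, 11.9, 12.5, 12.2, 13.1, 11.8, 12.6])
n, xbar = len(sample), sample.mean()
se = sample.std(ddof=1) / np.sqrt(n)

for C in (0.90, 0.95, 0.99):
    t_star = stats.t.ppf((1 + C) / 2, df=n - 1)   # area between -t* and +t* is C
    lo, hi = xbar - t_star * se, xbar + t_star * se
    print(f"{C:.0%} CI for mu: ({lo:.3f}, {hi:.3f}), t* = {t_star:.3f}")
```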
• Similarly, we may test hypotheses using this
t-distributed standardized Xbar… e.g., to test
H0: μ = μ0 against Ha: μ > μ0 we use
(Xbar - μ0)/(s/sqrt(n)), which has a
t-distribution with n-1 df, assuming the null
hypothesis is true. See page 422 (7.1, 3/7) for a
complete summary of hypothesis testing in the
case of "the one-sample t-test" …
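A minimal sketch of the one-sample t-test just described, again with hypothetical data; the statistic and one-sided P-value are computed directly from the formula, then checked against scipy's built-in routine:

```python
# Minimal sketch (hypothetical data): one-sample t-test of H0: mu = mu0
# against Ha: mu > mu0.
import numpy as np
from scipy import stats

sample = np.array([5.4, 6.1, 5.9, 6.3, 5.7, 6.0, 6.4, 5.8, 6.2, 6.5])
mu0 = 5.8                                   # hypothesized value under H0
n = len(sample)
t_stat = (sample.mean() - mu0) / (sample.std(ddof=1) / np.sqrt(n))
p_value = stats.t.sf(t_stat, df=n - 1)      # one-sided P(T >= t) under t(n-1)
print(f"t = {t_stat:.3f}, one-sided p-value = {p_value:.4f}")

# scipy's built-in check (the alternative= argument requires scipy >= 1.6)
print(stats.ttest_1samp(sample, popmean=mu0, alternative="greater"))
```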
• HW: Read Section 7.1 through p. 433; go over all the
examples carefully and answer the HW questions
following them: #7.1-7.9. Work on the following
problems (p. 441 ff.), using software as needed:
#7.15-7.22, 7.25, 7.32, 7.35-7.37, 7.41.
Is there a difference in aggressive behavior of patients on
"moon days" compared with "non-moon days"?
• To summarize the analysis:
– when the data comes in matched pairs, the analysis is
performed on the differences between the paired
measurements
– then use the t-statistic with n-1 d.f. (n = # of pairs) to
construct confidence intervals and test hypotheses on
the true mean difference.
• In a matched pairs design, subjects are matched
in pairs and the outcomes are compared within
each matched pair. A coin toss could determine
which of the two subjects gets the treatment and
which gets the control… One special kind of
matched pairs design is when a subject acts as
his/her own control, as in a before/after study…
See Example 7.7 on page 428 ff (7.1, 4/7). Note
that the paired observations (# of aggressive
behaviors) are subtracted and the difference in
scores becomes the single number analyzed
with a one-sample t-statistic with n-1 df, where
n=the number of pairs… see the top of page 431
and the next page for a summary of the process.
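A minimal sketch of a matched-pairs analysis in this spirit; the before/after counts below are hypothetical numbers, not the textbook's data from Example 7.7:

```python
# Minimal sketch (hypothetical paired data): analyze the paired differences
# with a one-sample t procedure, where n = number of pairs.
import numpy as np
from scipy import stats

moon  = np.array([3.33, 3.67, 2.67, 3.33, 3.00, 4.00, 2.33, 3.67])   # hypothetical
other = np.array([0.27, 0.59, 0.32, 0.19, 1.26, 0.11, 0.30, 0.40])   # hypothetical
diff = moon - other                          # one number per pair
n = len(diff)
t_stat = diff.mean() / (diff.std(ddof=1) / np.sqrt(n))   # tests H0: mean difference = 0
p_two_sided = 2 * stats.t.sf(abs(t_stat), df=n - 1)
print(f"mean difference = {diff.mean():.3f}, t = {t_stat:.3f}, p = {p_two_sided:.4f}")

# Equivalently: stats.ttest_rel(moon, other) or stats.ttest_1samp(diff, 0.0)
```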
• HW: Read through p. 433. Go over Example 7.7,
then do #7.32, 7.35, 7.41.
• Read the section on Robustness of the t
procedures (starting p. 432 (7.1, 5/7))… note
the definition of the statistical term robust –
essentially, a statistic is robust if it is insensitive
to violations of the assumptions made when the
statistic is used. For example, the t-statistic
requires normality of the population… how
sensitive is the t-statistic to violations of
normality? Look at the practical guidelines for
inference on a single mean at the bottom of p. 432…
– If the sample size is < 15, use the t procedures if the
data are close to normal.
– If the sample size is >= 15, then unless there is strong
non-normality or outliers, t procedures are OK.
– If the sample size is large (say n >= 40), then even if
the distribution is skewed, t procedures are OK.
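One way to see what these guidelines are getting at is a small simulation; the sketch below (my own illustration, not from the textbook) estimates how often a nominal 95% t interval covers the true mean when the population is strongly skewed, for a few sample sizes:

```python
# Minimal sketch: coverage of a nominal 95% t interval when sampling from
# a right-skewed (exponential) population.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_mean, reps = 1.0, 5000
for n in (10, 15, 40):
    t_star = stats.t.ppf(0.975, df=n - 1)
    covered = 0
    for _ in range(reps):
        x = rng.exponential(scale=true_mean, size=n)   # strongly skewed population
        half = t_star * x.std(ddof=1) / np.sqrt(n)
        covered += (x.mean() - half <= true_mean <= x.mean() + half)
    print(f"n={n:3d}: estimated coverage of nominal 95% interval = {covered/reps:.3f}")
```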