confidence level C - People Server at UNCW
Objectives
6.1
Estimating with confidence
Statistical confidence
Confidence intervals
Confidence interval for a population mean
How confidence intervals behave
Choosing the sample size
Methods for drawing conclusions about a population
from sample data are called statistical inference
So we’ll use data to make inferences; i.e., draw
conclusions about populations from data in our samples
or from our experiments
We'll consider two types:
Confidence interval estimation
Tests of significance
In both of these cases, we'll consider our data as either
being a random sample from a population or as data
from a randomized experiment
Start with estimation… there are two situations we'll
consider
estimating the mean µ of a population of
measurements
estimating the proportion p of successes (Ss) in a population of
successes and failures (Fs)
In either case, we'll construct a confidence interval of the
form estimate +/- M.O.E., where M.O.E. = margin of
error of the estimator.
The MOE gives information on how good the estimate is
through the variation in the estimator (its standard error)
and through the level of confidence in the confidence
interval (through a tabulated value).
The standard error of an estimator is its estimated
standard deviation (treating the estimator as a statistic
with a sampling distribution…)
The best estimator of µ is x̄, and we know from the previous
chapter that x̄ is approximately N(µ, σ/√n).
The best estimator of p is p̂, and we know from the last
chapter that p̂ is approximately N(p, √(p(1 − p)/n)).
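As a rough illustration of the two standard deviations above, the following Python sketch computes σ/√n and √(p(1 − p)/n). The function names and the numeric inputs are made up for illustration and are not part of the course materials.

```python
import math

def se_mean(sigma: float, n: int) -> float:
    """Standard deviation of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

def se_proportion(p: float, n: int) -> float:
    """Standard deviation of the sample proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

# Hypothetical inputs, chosen only to show the calls.
print(se_mean(sigma=2.5, n=25))
print(se_proportion(p=0.5, n=100))
```

In practice p is unknown, so p̂ is plugged in for p, which gives the standard error described earlier.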
Statistical confidence
Although the sample mean, x̄, is a unique number for any particular
sample, if you pick a different sample you will probably get a different
sample mean.
In fact, you could get many different values for the sample mean, and
virtually none of them would actually equal the true population mean, µ.
But the sampling distribution of x̄ is narrower than the population distribution:
its standard deviation is σ/√n, smaller than σ by a factor of √n.
[Figure: the population distribution (individual subjects, x) and the narrower sampling distribution of sample means x̄ (n subjects per sample), both centered at µ. Red dot: mean value of an individual sample.]
Thus, the estimates x̄ gained from our samples are always relatively close to the population parameter µ.
If the population is normally distributed N(µ, σ), so is the sampling distribution of x̄: N(µ, σ/√n).
95% of all sample means will be within the MOE (2σ/√n) of the population parameter µ (MOE = margin of error).
Distances are symmetrical, which implies that the population parameter µ must be within roughly 2 standard deviations (2σ/√n) of the sample average x̄ in 95% of all samples.
This reasoning is the essence of statistical inference - know and understand this figure!
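The figure's reasoning can be checked with a small simulation. This is only a sketch: the population values (µ = 64.5, σ = 2.5), the sample size, the seed, and the function name are assumptions picked for illustration. It draws many samples from a normal population, then compares the spread of the sample means with σ/√n and counts how many means land within 2σ/√n of µ.

```python
import random
import statistics

def simulate_sample_means(mu, sigma, n, num_samples=10_000, seed=1):
    """Draw num_samples samples of size n from N(mu, sigma); return their means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
        for _ in range(num_samples)
    ]

# Assumed values for illustration only.
mu, sigma, n = 64.5, 2.5, 10
means = simulate_sample_means(mu, sigma, n)

moe = 2 * sigma / n ** 0.5  # the "roughly 2 standard deviations" margin
within = sum(abs(m - mu) <= moe for m in means) / len(means)

print(f"SD of sample means: {statistics.stdev(means):.3f}")
print(f"sigma / sqrt(n):    {sigma / n ** 0.5:.3f}")
print(f"Fraction of sample means within the MOE of mu: {within:.3f}")
```

The fraction should come out close to 95%, since 2 is a rounded version of the exact multiplier 1.96.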
Confidence intervals
The confidence interval is a range of values with an associated
confidence level C. The confidence level quantifies how often the method
produces an interval that contains the true population parameter.
x̄ ± 4.2 is a 95% confidence interval for the population parameter µ.
This statement says that in ~95% of samples, the actual value of µ will be
within 4.2 units of the value of x̄.
Implications
We don’t need to take a lot of random samples to “rebuild” the sampling distribution and find µ at its center.
All we need is one SRS of size n, relying on the properties of the sample mean's distribution to infer the population mean µ.
Reworded
With 95% confidence, we can say that µ should be within roughly 2 standard deviations (2σ/√n) of our sample mean x̄.
In 95% of all possible samples of this size n, µ will indeed fall in our confidence interval.
In only 5% of samples would x̄ be farther from µ.
A confidence interval can be expressed as:
Sample mean ± MOE, where MOE is called the margin of error
µ within x̄ ± m, where m is the MOE
Example: 120 ± 6
Two endpoints of an interval
µ within (x̄ − MOE) to (x̄ + MOE)
Example: 114 to 126
A confidence level C (in %) indicates how confident we are that µ falls within the interval.
It represents the area under the normal curve within ± MOE of the center of the curve.
Review: standardizing the normal curve using z
z = (x̄ − µ) / (σ/√n)
[Figure: the distribution of x̄, N(µ, σ/√n), illustrated with N(64.5, 2.5), is standardized to N(0, 1) for z (standardized height, no units).]
Here, we work with the sampling distribution of the sample mean, and σ/√n is its standard deviation (spread).
Remember that σ is the standard deviation of the original population.
Varying confidence levels
Confidence intervals contain the population mean µ in C% of samples, in
the long run. Different areas under the curve give different confidence
levels C.
Practical use of z: z*
z* is related to the chosen confidence level C.
C is the area under the standard normal curve between −z* and z*.
The confidence interval is thus:
x̄ ± z* σ/√n
Example: For an 80% confidence level C, 80% of the normal curve’s area is contained in the interval between −z* and z*.
How do we find specific z* values?
We can use a table of z (Table A) or t values (Table D). In Table D, for a
particular confidence level, C, the appropriate z* value is just above it.
Example: For a 98% confidence level, z*=2.326
We can use software. In JMP:
Create a new column, Edit Formula, and choose Normal Quantile( p ) under
Probability, where p = (1 − C)/2 is the area to the left of −z*.
Since we want the middle C probability, each tail has area (1 − C)/2, so Normal Quantile((1 − C)/2) returns −z*.
Example: A 98% confidence level, Normal Quantile (.01) = −2.326349 (= neg. z*)
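For readers without JMP, the same lookup can be sketched in Python with the standard library's `statistics.NormalDist`. The helper name `z_star` is an assumption made for this example. It asks for the quantile at (1 + C)/2, the area to the left of +z*, so the result is positive.

```python
from statistics import NormalDist

def z_star(confidence: float) -> float:
    """Critical value z* such that the area between -z* and z* equals confidence."""
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    # Area to the left of +z* is C plus one tail: C + (1 - C)/2 = (1 + C)/2.
    return NormalDist().inv_cdf((1 + confidence) / 2)

for c in (0.70, 0.80, 0.96, 0.98):
    print(f"C = {c:.2f}  z* = {z_star(c):.3f}")

# The lower-tail quantile, as in the JMP example, gives the negative of z*.
print(NormalDist().inv_cdf(0.01))
```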
Link between confidence level and margin of error
The confidence level C determines the value of z* (in Table D).
The margin of error m also depends on z*:
m = z* σ/√n
Higher confidence C implies a larger margin of error m (thus less precision in our estimates).
A lower confidence level C produces a smaller margin of error m (thus better precision in our estimates).
Different confidence intervals for the same
set of measurements
Density of bacteria in solution:
Measurement equipment has standard deviation σ = 1 × 10^6 bacteria/ml fluid.
Three measurements: 24, 29, and 31 × 10^6 bacteria/ml fluid
Mean: x̄ = 28 × 10^6 bacteria/ml. Find the 96% and 70% CI.
96% confidence interval for the true density: z* = 2.054, so
x̄ ± z* σ/√n = 28 ± 2.054(1/√3) = (28 ± 1.19) × 10^6 bacteria/ml
70% confidence interval for the true density: z* = 1.036, so
x̄ ± z* σ/√n = 28 ± 1.036(1/√3) = (28 ± 0.60) × 10^6 bacteria/ml
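The two intervals for the bacteria data can be reproduced with a short Python sketch. The function name `z_interval` is hypothetical. Values are kept in units of 10^6 bacteria/ml, and σ is treated as known, as in the example.

```python
from math import sqrt
from statistics import NormalDist, fmean

def z_interval(data, sigma, confidence):
    """Return (low, high, margin) for x_bar +/- z* * sigma / sqrt(n), sigma known."""
    n = len(data)
    x_bar = fmean(data)
    z = NormalDist().inv_cdf((1 + confidence) / 2)  # critical value z*
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin, margin

measurements = [24, 29, 31]  # units of 10^6 bacteria/ml
for c in (0.96, 0.70):
    low, high, margin = z_interval(measurements, sigma=1, confidence=c)
    print(f"{c:.0%} CI: {low:.2f} to {high:.2f} (margin {margin:.2f})")
```

The same data give a wider interval at 96% than at 70%, which is the trade-off between confidence and precision.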
Properties of Confidence Intervals
User chooses the confidence level
Margin of error follows from this choice
We want
high confidence
small margins of error
The margin of error, z* σ/√n, gets smaller when
z* (and thus the confidence level C) gets smaller
σ is smaller
n is larger
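The three levers in this list can be tried out numerically. The sketch below uses an assumed helper `margin_of_error` and made-up baseline values (z* = 1.96, σ = 1, n = 16), then changes one input at a time.

```python
from math import sqrt

def margin_of_error(z_star: float, sigma: float, n: int) -> float:
    """Margin of error z* * sigma / sqrt(n)."""
    return z_star * sigma / sqrt(n)

# Hypothetical baseline: 95% confidence, sigma = 1, n = 16.
base = margin_of_error(1.96, 1.0, 16)
lower_confidence = margin_of_error(1.645, 1.0, 16)  # z* for 90% confidence
smaller_sigma = margin_of_error(1.96, 0.5, 16)
larger_n = margin_of_error(1.96, 1.0, 64)           # four times as many observations

print(f"baseline:         {base:.3f}")
print(f"lower confidence: {lower_confidence:.3f}")
print(f"smaller sigma:    {smaller_sigma:.3f}")
print(f"larger n:         {larger_n:.3f}")
```

Quadrupling n only halves the margin of error, because n enters through √n.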
Impact of sample size
The spread in the sampling distribution of the mean is a function of the
number of individuals per sample.
The larger the sample size, the smaller
the standard deviation (spread) of the
sample mean distribution.
But the spread only decreases at a rate proportional to 1/√n.
[Figure: standard error σ/√n versus sample size n]
Sample size and experimental design
You may need a certain margin of error (e.g., drug trial, manufacturing
specs). In many cases, the population variability (σ) is fixed, but we can
choose the number of measurements (n).
So plan ahead what sample size to use to achieve that margin of error.
m = z* σ/√n  ⇔  n = (z* σ / m)²
Remember, though, that sample size is not always stretchable at will. There are
typically costs and constraints associated with large samples. The best
approach is to use the smallest sample size that can give you useful results.
What sample size for a given margin of error?
Density of bacteria in solution:
Measurement equipment has standard deviation
σ = 1 × 10^6 bacteria/ml fluid.
How many measurements should you make to obtain a margin of error
of at most 0.5 × 10^6 bacteria/ml with a confidence level of 95%?
For a 95% confidence interval, z* = 1.96.
n = (z* σ / m)² = (1.96 × 1 / 0.5)² = 3.92² = 15.3664
Using only 15 measurements will not be enough to ensure that m is no
more than 0.5 × 10^6. Therefore, we need at least 16 measurements.
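The sample size calculation can be sketched in Python as follows. The function name `required_sample_size` is an assumption. The result is rounded up, because rounding down would leave the margin of error slightly above the target.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(sigma: float, margin: float, confidence: float) -> int:
    """Smallest n with z* * sigma / sqrt(n) <= margin, i.e. n >= (z* * sigma / margin)^2."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)  # critical value z*
    return ceil((z * sigma / margin) ** 2)

# Bacteria example, in units of 10^6 bacteria/ml.
n_needed = required_sample_size(sigma=1, margin=0.5, confidence=0.95)
print(n_needed)  # 16
```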
Cautions about using x̄ ± z* σ/√n
Data must be a SRS from the population.
Formula is not correct for other sampling designs.
Inference cannot rescue badly produced data.
Confidence intervals are not resistant to outliers.
If n is small (<15) and the population is not normal, the true
confidence level will be different from C.
The standard deviation of the population must be known.
The margin of error in a confidence interval covers only
random sampling errors!
Interpretation of Confidence Intervals
Conditions under which an inference method is valid are never fully met in
practice. Exploratory data analysis and judgment should be used when
deciding whether or not to use a statistical procedure.
Any individual confidence interval either will or will not contain the true
population mean. It is wrong to say that the probability is 95% that the true
mean falls in the confidence interval.
The correct interpretation of a 95% confidence interval is that we are 95%
confident that the true mean falls within the interval. The confidence interval
was calculated by a method that gives correct results in 95% of all possible
samples.
In other words, if many such confidence intervals were constructed, ~95%
of these intervals would contain the true mean.
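The long-run interpretation can be illustrated by simulation. In this sketch the true mean, σ, sample size, seed, and the function name `coverage` are all assumptions chosen for illustration. It builds many intervals x̄ ± z* σ/√n from independent samples and reports the fraction that contain the true mean.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def coverage(mu, sigma, n, confidence, num_intervals=10_000, seed=7):
    """Fraction of simulated z-intervals that contain the true mean mu."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    margin = z * sigma / sqrt(n)
    covered = 0
    for _ in range(num_intervals):
        x_bar = fmean([rng.gauss(mu, sigma) for _ in range(n)])
        # Each single interval either contains mu or it does not.
        if x_bar - margin <= mu <= x_bar + margin:
            covered += 1
    return covered / num_intervals

rate = coverage(mu=28, sigma=1, n=3, confidence=0.95)
print(f"Fraction of intervals containing mu: {rate:.3f}")
```

The 95% describes the method over repeated samples, not any single interval.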
HW: Read Section 6.1; do # 6.1-6.8, 6.10-6.12, 6.15-6.18, 6.27, 6.28