Normal Distributions


Chapter 7
Frequency Distributions
© Ray Panko
Probability Distributions
Same mean, different standard deviations
Different means
The event is the estimation of the mean (X̄) from a sample of size n.
[Figure: frequency distribution for a variable X, centered on µ, beside the sampling distribution used to find the mean of the variable, X̄, centered on μX̄]
[Figure: population distribution (mean µ, standard deviation σ) and sampling distribution for µ]
The sampling distribution of the sample mean has
$\mu_{\bar{X}} = \mu$
$\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$
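The relationship between the population standard deviation and the standard deviation of the sample mean can be checked by simulation. The sketch below is only an illustration: the population values (mean 100, standard deviation 15), the sample size of 25, and the function name `simulate_standard_error` are assumptions chosen for the example, not values from the slides.

```python
import math
import random
import statistics

def simulate_standard_error(mu, sigma, n, num_samples, seed=0):
    """Return the standard deviation of num_samples simulated sample means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    sample_means = []
    for _ in range(num_samples):
        # Draw one sample of size n from a normal population.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        sample_means.append(statistics.fmean(sample))
    return statistics.stdev(sample_means)

# Hypothetical population: mean 100, standard deviation 15, samples of size 25.
theoretical = 15 / math.sqrt(25)  # sigma / sqrt(n) = 3.0
simulated = simulate_standard_error(100, 15, 25, num_samples=5000)
print(theoretical, round(simulated, 2))  # the two values should be close
```

With more simulated samples, the simulated value should settle closer to σ/√n.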

Forty percent of voters call themselves
independents.
◦ 40% is a proportion (π)
◦ Take a sample to estimate π
◦ The sample proportion, p, is an unbiased estimator of π
◦ The standard deviation of the sampling distribution, σp, is given by:
$\sigma_p = \sqrt{\dfrac{\pi(1-\pi)}{n}}$
“Based on a sample of 1,500 households,
the percentage of voters in favor of
Proposition X is 40%, with a sampling error
of plus or minus 3%.”




The sample mean (X̄) or proportion (p) is not
likely to be exactly the population mean (µ)
or proportion (π).
However, they should be close.
Confidence intervals allow us to estimate
how close.
Example: “It is estimated that the
proportion of independent voters is 49%,
with a sampling error of plus or minus 3%.”

A confidence interval is a range around the sample mean X̄ that is
expected to contain the true population mean µ with a stated degree of
confidence.
[Figure: 95% confidence interval centered on X̄]


If the confidence level is 95%, then the area outside
the confidence interval, which we call α, is 0.05.
The upper and lower tails each have area α/2 = 0.025.
$95\% = 1 - \alpha$, so $\alpha = 0.05$
[Figure: sampling distribution centered on X̄ with area α/2 = 0.025 in each tail]

Find the Z values for α/2.

For a cumulative probability of 1 − 0.025 = 0.975, Z is 1.96

So the Z values are -1.96 and 1.96
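The same lookup can be done in code instead of a Z table. This is a minimal sketch using Python's standard library; the helper name `z_critical` is an assumption made for the example.

```python
from statistics import NormalDist

def z_critical(confidence_level):
    """Return Z_(alpha/2) for a two-sided interval at the given confidence level."""
    alpha = 1.0 - confidence_level
    # The cumulative probability up to the upper critical value is 1 - alpha/2.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

print(round(z_critical(0.95), 2))   # 1.96
print(round(z_critical(0.90), 3))   # 1.645
```

The lower critical value is the negative of the upper one because the standard normal distribution is symmetric about 0.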
[Figure: standard normal curve with area α/2 = 0.025 in each tail. In Z units, the marks are Zα/2 = −1.96, 0, and Zα/2 = 1.96; in X units, they are the lower confidence limit, the point estimate, and the upper confidence limit]
Confidence Level    Confidence Coefficient, 1 − α    Zα/2 value
80%                 0.80                             1.28
90%                 0.90                             1.645
95%                 0.95                             1.96
98%                 0.98                             2.33
99%                 0.99                             2.58
99.8%               0.998                            3.09
99.9%               0.999                            3.29


A sample of 11 circuits from a large normal
population has a mean resistance of 2.20
ohms. We know from past testing that the
population standard deviation is 0.35 ohms.
95% confidence interval for the true mean:
$\bar{X} \pm Z_{\alpha/2} \dfrac{\sigma}{\sqrt{n}} = 2.20 \pm 1.96 \left(0.35/\sqrt{11}\right) = 2.20 \pm 0.2068$
$1.9932 \le \mu \le 2.4068$


A sample of 11 circuits from a large normal
population has a mean resistance of 2.20
ohms. We know from past testing that the
population standard deviation is 0.35 ohms.
90% confidence interval for the true mean:
$\bar{X} \pm Z_{\alpha/2} \dfrac{\sigma}{\sqrt{n}} = 2.20 \pm 1.645 \left(0.35/\sqrt{11}\right) = 2.20 \pm 0.173595$
$2.0264 \le \mu \le 2.3736$
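The two circuit calculations (95% and 90% confidence) follow the same steps, so they can be sketched with one small function. The function name `z_interval` is an assumption for illustration; the inputs are the values given in the example.

```python
import math

def z_interval(sample_mean, sigma, n, z):
    """Confidence interval for the mean when the population sigma is known."""
    margin = z * sigma / math.sqrt(n)  # sampling error (the plus or minus)
    return sample_mean - margin, sample_mean + margin

# 11 circuits, mean 2.20 ohms, known population sigma 0.35 ohms.
low95, high95 = z_interval(2.20, 0.35, 11, 1.96)
low90, high90 = z_interval(2.20, 0.35, 11, 1.645)
print(round(low95, 4), round(high95, 4))   # 1.9932 2.4068
print(round(low90, 4), round(high90, 4))   # 2.0264 2.3736
```

Lowering the confidence level from 95% to 90% shrinks the Z value, so the interval becomes narrower.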
Confidence Intervals
  Population Mean
    σ Known: use the normal distribution with σ
    σ Unknown: use the t distribution, based on the sample standard deviation S computed from the sample instead of σ
  Population Proportion

Assumptions
◦ Population standard deviation is unknown
◦ Population is normally distributed
◦ If population is not normal, use large sample


Use Student’s t Distribution instead of the
normal distribution
Confidence Interval Estimate: $\bar{X} \pm t_{\alpha/2} \dfrac{S}{\sqrt{n}}$
(where tα/2 is the critical value of the t distribution with
n − 1 degrees of freedom and an area of α/2 in each tail)
Degrees of freedom (d.f.)
Idea: the number of observations that are free to vary
after the sample mean has been calculated
Example: Suppose the mean of 3 numbers is 8.0
X1 = 7
X2 = 8
X3 = ?
If the mean of these three
values is 8.0,
then X3 must be 9
(i.e., X3 is not free to vary)
Here, the sample size (n) = 3
So degrees of freedom = n – 1 = 3 – 1 = 2
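The same arithmetic can be written out as a short sketch. The helper name `last_value` is hypothetical; it only shows that once the mean and the first two values are fixed, the third value is forced.

```python
def last_value(known_values, mean, n):
    """Return the one remaining value forced by a fixed mean of n numbers."""
    return mean * n - sum(known_values)

x3 = last_value([7, 8], mean=8.0, n=3)  # 8.0 * 3 - (7 + 8)
degrees_of_freedom = 3 - 1              # n - 1 values are free to vary
print(x3, degrees_of_freedom)           # 9.0 2
```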
For confidence intervals based on
sample standard deviations,
d.f. = n-1
Where n is the sample size
Note: t → Z as n increases
t-distributions are bell-shaped and symmetric,
but have ‘fatter’ tails than the normal
[Figure: standard normal curve (t with df = ∞) compared with t (df = 13) and t (df = 5), all centered on 0]
90% confidence level: α = 0.10, α/2 = 0.05
Sample size = 3, so df = n − 1 = 2

      Upper Tail Area
df    .25      .10      .05
1     1.000    3.078    6.314
2     0.817    1.886    2.920
3     0.765    1.638    2.353

The body of the table contains t values, not probabilities.
[Figure: t distribution with upper-tail area α/2 = 0.05 to the right of t = 2.920]
Confidence Level    t (10 d.f.)    t (20 d.f.)    t (30 d.f.)    z
.90                 1.812          1.725          1.697          1.645
.95                 2.228          2.086          2.042          1.96
.99                 3.169          2.845          2.750          2.58

As sample size n increases, df (n − 1) increases.
As df increases, t approaches z.
So at large sample sizes, t and z are nearly the same.
A random sample of n = 25 has X̄ = 50 and
S = 8. Form a 95% confidence interval for μ
◦ d.f. = n – 1 = 24, and α/2 = .025
◦ From Table E.1, tα/2 = 2.0639
◦ So the confidence interval is
$\bar{X} \pm t_{\alpha/2} \dfrac{S}{\sqrt{n}} = 50 \pm (2.0639) \dfrac{8}{\sqrt{25}}$
46.698 ≤ μ ≤ 53.302
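The interval for this example can be reproduced in a few lines. This sketch takes the critical value 2.0639 as an input, as read from the t table for 24 degrees of freedom; the function name `t_interval` is an assumption for illustration.

```python
import math

def t_interval(sample_mean, s, n, t_crit):
    """Confidence interval for the mean using the sample standard deviation S."""
    margin = t_crit * s / math.sqrt(n)  # t * S / sqrt(n)
    return sample_mean - margin, sample_mean + margin

# n = 25, sample mean 50, S = 8, t critical value for 24 d.f. from the table.
low, high = t_interval(50, 8, 25, 2.0639)
print(round(low, 3), round(high, 3))   # 46.698 53.302
```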


TINV(Probability, df)
For a 95% confidence level, sample size of
25, and a standard deviation S of 8
◦ df is 24 (n-1)
◦ Probability is α (.05), not α/2 (.025)
◦ Equation is = TINV(.05,24)
◦ Its value is 2.063899
◦ This is the same value found with the table
lookup
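For readers without a spreadsheet, the two-tailed lookup can be approximated in plain Python. The sketch below is not the spreadsheet's own algorithm: it is an assumed numerical approach that integrates the t density with Simpson's rule and searches for the critical value by bisection. All function names are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def two_tail_area(t, df, steps=2000):
    """Area in both tails beyond +/- t, using Simpson's rule on [0, t]."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3  # area between 0 and t
    return 1.0 - 2.0 * central_half

def t_inv_two_tailed(probability, df):
    """Find t whose two-tail area equals probability (bisection search)."""
    low, high = 0.0, 1.0
    while two_tail_area(high, df) > probability:  # widen the bracket
        high *= 2.0
    for _ in range(60):
        mid = (low + high) / 2.0
        if two_tail_area(mid, df) > probability:
            low = mid   # tails still too large: move right
        else:
            high = mid
    return (low + high) / 2.0

print(round(t_inv_two_tailed(0.05, 24), 4))   # 2.0639
```

As with TINV, the first argument is the total two-tail probability α, so a 95% confidence level uses 0.05. Where SciPy is installed, `scipy.stats.t.ppf(1 - 0.05 / 2, 24)` should give the same critical value.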
Confidence Intervals
  Population Mean (σ Known, σ Unknown)
  Population Proportion
Example: Based on a sample of 70, 95% of our
faculty members have PhDs.

Recall that the distribution of the sample
proportion is approximately normal if the
sample size is large, with standard deviation
$\sigma_p = \sqrt{\dfrac{\pi(1-\pi)}{n}}$
We will estimate this with sample data:
$\sqrt{\dfrac{p(1-p)}{n}}$

Upper and lower confidence limits for the
population proportion are calculated with the
formula
$p \pm Z_{\alpha/2} \sqrt{\dfrac{p(1-p)}{n}}$
where
◦ Zα/2 is the standard normal value for the level of confidence desired
◦ p is the sample proportion
◦ n is the sample size
Note: must have np > 5 and n(1 − p) > 5

A random sample of 100 people shows that
25 are left-handed. Form a 95% confidence
interval for the true proportion of left-handers.
$p \pm Z_{\alpha/2} \sqrt{p(1-p)/n} = 25/100 \pm 1.96 \sqrt{0.25(0.75)/100} = 0.25 \pm 1.96\,(0.0433)$
$0.1651 \le \pi \le 0.3349$
Determining sample size for a desired error size and confidence level

To determine the required sample size for
the mean, you must know:
◦ The desired level of confidence (1 − α), which
determines the critical value, Zα/2
◦ The acceptable sampling error, e (the plus or
minus in the estimate).
◦ The population standard deviation, σ
If σ = 45, what sample size is needed
to estimate the mean within ± 5 with
90% confidence?
$n = \dfrac{Z_{\alpha/2}^2 \, \sigma^2}{e^2} = \dfrac{(1.645)^2 (45)^2}{5^2} = 219.19$
So the required sample size is n = 220
(Always round up)
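The sample size calculation, including the round-up step, can be sketched as follows. The function name `required_sample_size` is an assumption for illustration; the inputs are the values from the example.

```python
import math

def required_sample_size(z, sigma, e):
    """Return the exact and rounded-up sample size for sampling error e."""
    n_exact = (z ** 2) * (sigma ** 2) / (e ** 2)
    return n_exact, math.ceil(n_exact)  # always round up

# 90% confidence (Z = 1.645), sigma = 45, acceptable sampling error of 5.
n_exact, n_required = required_sample_size(1.645, 45, 5)
print(round(n_exact, 2), n_required)   # 219.19 220
```

Rounding up rather than to the nearest integer keeps the sampling error at or below the target e.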

If unknown, σ can be estimated when
using the required sample size formula
◦ Use a value for σ that is expected to be at
least as large as the true σ
◦ Select a pilot sample and estimate σ with
the sample standard deviation, S




A confidence interval estimate (reflecting
sampling error) should always be included
when reporting a point estimate
The level of confidence should always be
reported
The sample size should be reported
An interpretation of the confidence interval
estimate should also be provided