Inference for the Mean of a Population

Chapter 7 and Chapter 8
1
Inference for the Mean of a
Population – Part 1
Chapter 7.1
(omit sign test pp 469 – 470)
2
The situation where σ
is not known
• If σ is known then the std
deviation of the sample mean is
given by σ/sqrt(n)
• We now consider the more
realistic situation where σ is not
known. In effect, we estimate σ
using s, the sample standard
deviation.
3
4
t-table (Table D)
Upper tail probability p
df        0.25     0.20     0.15     0.10     0.05    0.025     0.01    0.005   0.0025    0.001   0.0005
1        1.000    1.376    1.963    3.078    6.314   12.706   31.821   63.656  127.321  318.289  636.578
2        0.816    1.061    1.386    1.886    2.920    4.303    6.965    9.925   14.089   22.328   31.600
3        0.765    0.978    1.250    1.638    2.353    3.182    4.541    5.841    7.453   10.214   12.924
4        0.741    0.941    1.190    1.533    2.132    2.776    3.747    4.604    5.598    7.173    8.610
5        0.727    0.920    1.156    1.476    2.015    2.571    3.365    4.032    4.773    5.894    6.869
6        0.718    0.906    1.134    1.440    1.943    2.447    3.143    3.707    4.317    5.208    5.959
7        0.711    0.896    1.119    1.415    1.895    2.365    2.998    3.499    4.029    4.785    5.408
8        0.706    0.889    1.108    1.397    1.860    2.306    2.896    3.355    3.833    4.501    5.041
9        0.703    0.883    1.100    1.383    1.833    2.262    2.821    3.250    3.690    4.297    4.781
10       0.700    0.879    1.093    1.372    1.812    2.228    2.764    3.169    3.581    4.144    4.587
11       0.697    0.876    1.088    1.363    1.796    2.201    2.718    3.106    3.497    4.025    4.437
12       0.695    0.873    1.083    1.356    1.782    2.179    2.681    3.055    3.428    3.930    4.318
13       0.694    0.870    1.079    1.350    1.771    2.160    2.650    3.012    3.372    3.852    4.221
14       0.692    0.868    1.076    1.345    1.761    2.145    2.624    2.977    3.326    3.787    4.140
15       0.691    0.866    1.074    1.341    1.753    2.131    2.602    2.947    3.286    3.733    4.073
16       0.690    0.865    1.071    1.337    1.746    2.120    2.583    2.921    3.252    3.686    4.015
17       0.689    0.863    1.069    1.333    1.740    2.110    2.567    2.898    3.222    3.646    3.965
18       0.688    0.862    1.067    1.330    1.734    2.101    2.552    2.878    3.197    3.610    3.922
19       0.688    0.861    1.066    1.328    1.729    2.093    2.539    2.861    3.174    3.579    3.883
20       0.687    0.860    1.064    1.325    1.725    2.086    2.528    2.845    3.153    3.552    3.850
21       0.686    0.859    1.063    1.323    1.721    2.080    2.518    2.831    3.135    3.527    3.819
22       0.686    0.858    1.061    1.321    1.717    2.074    2.508    2.819    3.119    3.505    3.792
23       0.685    0.858    1.060    1.319    1.714    2.069    2.500    2.807    3.104    3.485    3.768
24       0.685    0.857    1.059    1.318    1.711    2.064    2.492    2.797    3.091    3.467    3.745
25       0.684    0.856    1.058    1.316    1.708    2.060    2.485    2.787    3.078    3.450    3.725
26       0.684    0.856    1.058    1.315    1.706    2.056    2.479    2.779    3.067    3.435    3.707
27       0.684    0.855    1.057    1.314    1.703    2.052    2.473    2.771    3.057    3.421    3.689
28       0.683    0.855    1.056    1.313    1.701    2.048    2.467    2.763    3.047    3.408    3.674
29       0.683    0.854    1.055    1.311    1.699    2.045    2.462    2.756    3.038    3.396    3.660
30       0.683    0.854    1.055    1.310    1.697    2.042    2.457    2.750    3.030    3.385    3.646
60       0.679    0.848    1.045    1.296    1.671    2.000    2.390    2.660    2.915    3.232    3.460
1000     0.675    0.842    1.037    1.282    1.646    1.962    2.330    2.581    2.813    3.098    3.300
z*       0.674    0.842    1.036    1.282    1.645    1.960    2.326    2.576    2.807    3.090    3.291
C        50.0%    60.0%    70.0%    80.0%    90.0%    95.0%    98.0%    99.0%    99.5%    99.8%    99.9%
Confidence Level C
5
Using the t-table
6
• Example: The following data are the
amounts of vitamin C, measured in
mg per 100 grams of blend (dry basis),
for a random sample of size 8 from a
production run:
26, 31, 23, 22, 11, 22, 14, 31
• We want a 95% c.i. for µ, the mean
vitamin C content produced during this
run.
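The interval asked for here can be sketched in a few lines of Python. This is a minimal sketch, assuming the usual one-sample t interval (sample mean plus or minus t* times s/sqrt(n)); the variable names are illustrative, and t* is read from the df = 7 row of Table D rather than computed.

```python
import math
import statistics

# Vitamin C content (mg per 100 g) for the sample of 8 on the slide.
data = [26, 31, 23, 22, 11, 22, 14, 31]

n = len(data)
xbar = statistics.mean(data)
s = statistics.stdev(data)      # sample standard deviation (divides by n - 1)
se = s / math.sqrt(n)           # estimated std deviation of the sample mean

# t* for df = n - 1 = 7 and 95% confidence, read from Table D.
t_star = 2.365

margin = t_star * se
ci = (xbar - margin, xbar + margin)

print(f"mean = {xbar:.2f}, s = {s:.3f}, SE = {se:.3f}")
print(f"95% c.i. for mu: ({ci[0]:.2f}, {ci[1]:.2f})")
```

With these data the interval comes out at roughly 16.5 to 28.5 mg per 100 g.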
7
• Example: A random sample of 10 one-bedroom
apartment rental ads from your
local newspaper has these monthly rents
(dollars):
500, 650, 600, 505, 450, 550, 515, 495, 650, 395.
Do these data give good reason to believe
that the mean rent of all advertised one-bedroom
apartments is greater than $500 per month?
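One way to work this example is a one-sided, one-sample t test of H0: µ = 500 against Ha: µ > 500. The sketch below assumes that setup; the dictionary holds the first few entries of the df = 9 row of Table D and is used to bracket the p-value the same way one would by hand. All names are illustrative.

```python
import math
import statistics

rents = [500, 650, 600, 505, 450, 550, 515, 495, 650, 395]
mu0 = 500  # H0: mu = 500, Ha: mu > 500

n = len(rents)
xbar = statistics.mean(rents)
s = statistics.stdev(rents)
t_stat = (xbar - mu0) / (s / math.sqrt(n))

# Row df = 9 of Table D: upper tail probability p -> critical value t*.
table_df9 = {0.25: 0.703, 0.20: 0.883, 0.15: 1.100,
             0.10: 1.383, 0.05: 1.833, 0.025: 2.262}

# If t_stat >= t*(p), the one-sided p-value is at most p;
# if t_stat < t*(p), the p-value is larger than p.
upper = min(p for p, t in table_df9.items() if t <= t_stat)
lower = max(p for p, t in table_df9.items() if t > t_stat)

print(f"xbar = {xbar}, s = {s:.2f}, t = {t_stat:.3f}, df = {n - 1}")
print(f"{lower} < p-value <= {upper}")
```

The t statistic is about 1.18, so the p-value falls between 0.10 and 0.15: the sample gives only weak evidence that the mean rent exceeds $500.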
8
Matched Pairs
• Here are some sales
before and after a
motivational course.

Employee  Before  After
1         212     237
2         282     291
3         203     191
4         327     341
5         165     192
6         198     180

Does the course
appear to be effective
in increasing sales?
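A matched pairs analysis reduces to a one-sample t procedure on the differences. The sketch below assumes a one-sided test of H0: mean difference = 0 against Ha: mean difference > 0, with differences taken as After minus Before; the table values are the df = 5 row of Table D, and the names are illustrative.

```python
import math
import statistics

before = [212, 282, 203, 327, 165, 198]
after = [237, 291, 191, 341, 192, 180]

# Work with the per-employee differences (After - Before).
diffs = [a - b for a, b in zip(after, before)]

n = len(diffs)
dbar = statistics.mean(diffs)
s_d = statistics.stdev(diffs)
t_stat = dbar / (s_d / math.sqrt(n))

# Row df = 5 of Table D: upper tail probability p -> critical value t*.
table_df5 = {0.25: 0.727, 0.20: 0.920, 0.15: 1.156, 0.10: 1.476, 0.05: 2.015}

upper = min(p for p, t in table_df5.items() if t <= t_stat)
lower = max(p for p, t in table_df5.items() if t > t_stat)

print(f"differences = {diffs}")
print(f"dbar = {dbar}, s_d = {s_d:.2f}, t = {t_stat:.3f}, df = {n - 1}")
print(f"{lower} < p-value <= {upper}")
```

The mean gain is 7.5 with t just under 1, so the one-sided p-value lies between 0.15 and 0.20; with only six employees the data do not show a clear effect.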
9
Robustness of the t
procedures
• A statistical procedure is said
to be robust if the probability
calculations required are
insensitive to violations of the
assumptions made:
• For t:
– n < 15: use t if data are clearly
close to normal. If clearly non-normal
or outliers are present
do not use t.
– 15 ≤ n < 40: can use t except
in presence of outliers or
strong skewness.
– Large samples: can use t
procedures even for clearly
skewed data when sample
size is large, roughly n ≥ 40.
10
Inference for the Mean of a
Population – Part 2: Comparing
Two Means
Chapter 7.2
(omit pp 498- 503)
11
Overview
• Want to compare means
of two populations
• Can use c.i. or hypothesis
tests.
• Many specialized
procedures -- depending
on data and underlying
distributions.
• We’ll look at some of the
most important ones.
12
The idealized situation
• We assume variances are known and
normal populations.
• Doesn’t happen often in practice.
• Can do hypothesis tests and compute
p-values as in Ch 6.
• Example: sigma1 = 20, sigma2 = 30, n1 =
120, n2 = 150, x1bar = 67.3, x2bar =
72.0
• H0: mu1 - mu2 = 0. Ha: mu1 - mu2 ≠ 0
– (a) Compute the z statistic and p-value.
– (b) Get a 95% c.i. for mu1 - mu2
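Parts (a) and (b) can be checked numerically. This is a minimal sketch, assuming the two-sample z statistic with known standard deviations, z = (x1bar - x2bar) / sqrt(sigma1^2/n1 + sigma2^2/n2), and using `NormalDist` from Python's standard library for the normal probabilities.

```python
import math
from statistics import NormalDist

sigma1, sigma2 = 20, 30
n1, n2 = 120, 150
x1bar, x2bar = 67.3, 72.0

# Standard deviation of x1bar - x2bar when both sigmas are known.
se = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)

# (a) z statistic and two-sided p-value for H0: mu1 - mu2 = 0.
z = (x1bar - x2bar) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# (b) 95% c.i. for mu1 - mu2.
z_star = NormalDist().inv_cdf(0.975)
diff = x1bar - x2bar
ci = (diff - z_star * se, diff + z_star * se)

print(f"z = {z:.3f}, two-sided p-value = {p_value:.3f}")
print(f"95% c.i. for mu1 - mu2: ({ci[0]:.2f}, {ci[1]:.2f})")
```

The interval contains 0, which agrees with a two-sided p-value above 0.05.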
13
Two-sample t procedures
• The most common situation.
We use sample standard
deviations to estimate sigma1
and sigma2.
14
Example
The purchasing department has suggested that
all new computer monitors for your company
should have flat screens. You want to be sure
employees like them. The next 20 employees
needing screens are randomly divided into two
groups, with 10 in each group. 10 get flat
screens, the other 10 get conventional
monitors.
One month after receiving the monitors, the
employees rate their satisfaction with their
monitors on a scale from 1 to 5 by responding
to the question “I like my new monitor” (1 =
strongly disagree, 5 = strongly agree). Flat
screen employees have an average satisfaction
of 4.6 with std dev of 0.7. The employees with
the standard monitors have an average 3.2 with
a standard deviation of 1.6.
(a) Give a 95% c.i for the difference in mean
satisfaction scores for all employees.
(b) What about a hypothesis test for comparing the
two means?
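Both parts can be sketched from the summary statistics alone. The slide does not say which degrees of freedom to use, so the sketch assumes the conservative choice df = min(n1 - 1, n2 - 1) = 9 with t* taken from Table D, and also reports the Welch-Satterthwaite approximation that software typically uses. Names are illustrative.

```python
import math

# Summary statistics from the slide (flat screen vs. conventional monitor).
x1bar, s1, n1 = 4.6, 0.7, 10
x2bar, s2, n2 = 3.2, 1.6, 10

v1, v2 = s1**2 / n1, s2**2 / n2
se = math.sqrt(v1 + v2)            # standard error of x1bar - x2bar
diff = x1bar - x2bar

# (a) 95% c.i. with the conservative df (an assumption, see lead-in).
df_conservative = min(n1 - 1, n2 - 1)
t_star = 2.262                     # Table D, df = 9, 95% confidence
ci = (diff - t_star * se, diff + t_star * se)

# (b) t statistic for H0: mu1 - mu2 = 0.
t_stat = diff / se

# Welch-Satterthwaite df, as used by most software.
df_welch = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

print(f"SE = {se:.4f}, t = {t_stat:.3f}")
print(f"95% c.i. (df = {df_conservative}): ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"Welch df = {df_welch:.2f}")
```

With df = 9, t is about 2.54, which lies between the Table D critical values 2.262 (p = 0.025) and 2.821 (p = 0.01), so a two-sided p-value would fall between 0.02 and 0.05.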
15
Robustness of the two
sample procedures
• Generally procedures are quite robust.
• If sample sizes are equal and distributions
of the two populations have similar shapes,
p-values from t table are quite accurate even
when n1 and n2 are as small as 5.
• If sample sizes are unequal can use the
following (same as for one sample t-tests
and conf. ints., but replace n by n1+n2):
– n1+n2 < 15: use t if data are clearly
close to normal. If clearly non-normal
or outliers are present do not use t.
– 15 ≤ n1+n2 < 40: can use t except in
presence of outliers or strong
skewness.
– Large samples: can use t procedures
even for clearly skewed data when
sample size is large, roughly n1+n2 ≥
40.
16
Small samples
• Have to be very careful.
– Substantial uncertainty in
estimates, but if the difference
in means is large, can often
detect this.
• Specialized procedures
– If we can assume that two
populations have equal
variances then can use
pooled estimator.
– Can test for equal
variances (F test)
– Numerical procedures
(optional) appear in text.
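For reference, the pooled estimator mentioned above can be sketched as follows. This assumes the standard pooled variance formula, sp^2 = ((n1 - 1) s1^2 + (n2 - 1) s2^2) / (n1 + n2 - 2), with n1 + n2 - 2 degrees of freedom. The function name and the numbers in the usage lines are hypothetical and are not taken from the slides.

```python
import math


def pooled_se(s1, n1, s2, n2):
    """Return (pooled std dev, standard error of x1bar - x2bar, df).

    Only appropriate when the two populations can be assumed to have
    equal variances.
    """
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    sp = math.sqrt(sp2)
    se = sp * math.sqrt(1 / n1 + 1 / n2)
    return sp, se, df


# Hypothetical summary statistics, for illustration only.
sp, se, df = pooled_se(s1=2.0, n1=8, s2=2.4, n2=12)
print(f"sp = {sp:.3f}, SE = {se:.3f}, df = {df}")
```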
17
Excel
• Data Analysis ToolPak can do the
two-sample t-tests that we have
discussed + optional material:
– Most important for us is the
two-sample t test that does not
assume equal variances.
– Excel also does the calculation for
a specialized test that assumes
the two populations have equal
variance.
• All are very easy to use.
• We should always plot data, do
normal quantile plots, etc.
18
Excel example
• Example– Do piano lessons
improve spatial-temporal
reasoning?
• Excel output appears below.
t-Test: Two-Sample Assuming Unequal Variances

                               Variable 1   Variable 2
Mean                           3.618        0.386
Variance                       9.334        5.871
Observations                   34           44
Hypothesized Mean Difference   0
df                             62
t Stat                         5.059
P(T<=t) one-tail               0.000
t Critical one-tail            1.670
P(T<=t) two-tail               0.000
t Critical two-tail            1.999
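The t Stat and df rows of this output can be reproduced from the Mean, Variance and Observations rows. The sketch below assumes Excel uses the unequal-variance (Welch) statistic and rounds the Welch-Satterthwaite df to the nearest integer; because the inputs are the rounded values shown in the output, the result matches only to about two decimals.

```python
import math

# Summary rows from the Excel output above.
mean1, var1, n1 = 3.618, 9.334, 34
mean2, var2, n2 = 0.386, 5.871, 44

v1, v2 = var1 / n1, var2 / n2
se = math.sqrt(v1 + v2)

# Two-sample t statistic for a hypothesized mean difference of 0.
t_stat = (mean1 - mean2) / se

# Welch-Satterthwaite degrees of freedom.
df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

print(f"t Stat = {t_stat:.3f}")
print(f"df = {df:.2f} (rounds to {round(df)})")
```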
19
Chapter 8 Inferences for
Proportions
(Section 8.1)
20
How do sample proportions
behave?
Chapter 5 tells us
…
21
22
Example
• A SRS of 1600 BC
residents found that
954 favored
construction of a new
highway to Whistler.
• Give a 95% c.i for the
true proportion of BC
residents who favor a
new highway to
Whistler.
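The formula slides for this section did not survive in the transcript, so the sketch below assumes the usual large-sample interval for a proportion: p-hat plus or minus z* times sqrt(p-hat (1 - p-hat) / n), with z* = 1.960 from the z* row of Table D. Names are illustrative.

```python
import math

n = 1600
successes = 954

p_hat = successes / n
se = math.sqrt(p_hat * (1 - p_hat) / n)   # standard error of p_hat

z_star = 1.960                            # Table D, z* row, 95% confidence
margin = z_star * se
ci = (p_hat - margin, p_hat + margin)

print(f"p_hat = {p_hat:.4f}, SE = {se:.4f}")
print(f"95% c.i.: ({ci[0]:.3f}, {ci[1]:.3f})")
```

This gives an interval of roughly 57% to 62% in favor.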
23
A variation that works better for small samples
24
Using the plus 4 estimator
for small samples
• 9 of 15 people in a
SRS of 15 Buec 232
students felt that the
course workload was
too heavy.
• Compute an
approximate 90% c.i.
for the proportion of
students who felt the
course workload was
too heavy.
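A sketch of this calculation follows. It assumes the standard plus-four estimate, which adds two successes and two failures, so p-tilde = (X + 2) / (n + 4), and then uses p-tilde and n + 4 in the usual interval formula. z* = 1.645 is taken from the z* row of Table D for 90% confidence; names are illustrative.

```python
import math

n = 15
successes = 9

# Plus-four estimate: pretend there were 2 extra successes and 2 extra failures.
p_tilde = (successes + 2) / (n + 4)
se = math.sqrt(p_tilde * (1 - p_tilde) / (n + 4))

z_star = 1.645                            # Table D, z* row, 90% confidence
margin = z_star * se
ci = (p_tilde - margin, p_tilde + margin)

print(f"p_tilde = {p_tilde:.4f}, SE = {se:.4f}")
print(f"approx. 90% c.i.: ({ci[0]:.3f}, {ci[1]:.3f})")
```

The interval is wide, roughly 0.39 to 0.77, as expected for a sample of only 15.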
25
Hypothesis tests for proportions
– we use sample proportion
rather than plus 4 estimate.
26
Example
• We found that 11 customers
in a sample of 40 would be
willing to buy a software
upgrade that costs $100. If
the upgrade is to be
profitable, you will need to
sell it to more than 20% of
your customers. Do the
sample data give good
evidence that more than
20% are willing to buy?
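This example can be worked as a one-sided z test of H0: p = 0.20 against Ha: p > 0.20. The sketch assumes the usual large-sample test, in which the standard error is computed from the null value p0 and the sample proportion (not the plus-four estimate) goes in the numerator. `NormalDist` is from Python's standard library.

```python
import math
from statistics import NormalDist

n = 40
successes = 11
p0 = 0.20                                  # H0: p = 0.20, Ha: p > 0.20

p_hat = successes / n
se0 = math.sqrt(p0 * (1 - p0) / n)         # standard error under H0
z = (p_hat - p0) / se0

# One-sided p-value: P(Z >= z).
p_value = 1 - NormalDist().cdf(z)

print(f"p_hat = {p_hat:.3f}, z = {z:.3f}, p-value = {p_value:.3f}")
```

With z near 1.19 and a p-value around 0.12, the sample does not give good evidence that more than 20% would buy.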
27
• A poll (March 2, 2004) estimated
that support for the BC Liberal
party was 39%. Using this
estimate as a “guessed value” for
a follow up study, how large a
sample would I need to estimate
Liberal support to within +/- 3%?
I want a 95% level of confidence
in my estimate.
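The required sample size can be sketched with the usual formula n = (z*/m)^2 p*(1 - p*), where p* is the guessed value and m the desired margin of error. The sketch assumes that formula and z* = 1.960 from Table D, and rounds up to the next whole person.

```python
import math

p_guess = 0.39     # guessed value p* from the earlier poll
margin = 0.03      # desired margin of error (+/- 3%)
z_star = 1.960     # Table D, z* row, 95% confidence

n_exact = (z_star / margin) ** 2 * p_guess * (1 - p_guess)
n_required = math.ceil(n_exact)   # always round up

print(f"n = {n_exact:.1f}, so sample at least {n_required} people")
```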
28