Slides 1-31 Hypothesis Testing


BA 275
Quantitative Business Methods
Agenda
- Hypothesis Testing
  - Elements of a Test
  - Concept behind a Test
  - Examples
Midterm Examination #1
Question 1 – A
[Figure: Histogram A, Histogram B, and Histogram C; each horizontal axis runs from 0 to 100 in steps of 20]
Question 1 – B
[Figure: Box-and-Whisker Plot for A, B, and C on an axis from 0 to 100]
Question 1 – C
C – 1. (1 point) The mean score will (circle one)
A). Be unchanged
B). Increase by 5 points
C). Increase by √5 points
D). None of the above
C – 2. (1 point) The median score will (circle one)
A). Be unchanged
B). Increase by 5 points
C). Increase by √5 points
D). None of the above
C – 3. (1 point) The standard deviation of scores will (circle one)
A). Be unchanged
B). Increase by 5 points
C). Increase by √5 points
D). None of the above
C – 4. (1 point) The interquartile range (IQR) of scores will (circle one)
A). Be unchanged
B). Increase by 5 points
C). Increase by √5 points
D). None of the above
C – 5. (1 point) The range of scores will (circle one)
A). Be unchanged
B). Increase by 5 points
C). Increase by √5 points
D). None of the above
Question 2
Center City:   $940, $955, $965, $975, $980, $985, $999, $1,000, $1,119, $1,247
Outlying Area: $575, $690, $694, $705, $725, $725, $745, $750, $775, $800
Summary Statistics     Center City                   Outlying Area
Count                  A                             B
Average                1,016.5                       C
Median                 D = 982.5 = (980+985)/2       E
Mode                   F                             G = 725
Variance               H = 8949.16 = (94.6)^2        3,755.6
Standard deviation     94.6                          I = 61.28295
Minimum                J                             K = 575
Maximum                L = 1247                      M
Range                  N = 307 = 1247-940            O
Lower quartile         965.0                         694.0
Upper quartile         P = 1000 = 965+35             750.0
Interquartile range    35.0                          Q = 56 = 750-694
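The summary statistics for Question 2 can be checked with a short script. This is only a sketch: it assumes the quartiles are taken as the medians of the lower and upper halves of the sorted data (which reproduces 965, 1000, 694, and 750), and the helper names `quartiles` and `summarize` are made up for illustration. The variance it prints for Center City is unrounded, so it differs slightly from the slide's 8949.16, which squares the rounded 94.6.

```python
import statistics

center_city = [940, 955, 965, 975, 980, 985, 999, 1000, 1119, 1247]
outlying_area = [575, 690, 694, 705, 725, 725, 745, 750, 775, 800]


def quartiles(values):
    """Lower and upper quartiles as medians of the two halves (assumed convention)."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower_half = ordered[:half]
    upper_half = ordered[len(ordered) - half:]
    return statistics.median(lower_half), statistics.median(upper_half)


def summarize(values):
    """Collect the rows of the summary table for one column of data."""
    q1, q3 = quartiles(values)
    return {
        "count": len(values),
        "average": statistics.mean(values),
        "median": statistics.median(values),
        "variance": statistics.variance(values),  # sample variance, divides by n - 1
        "std_dev": statistics.stdev(values),      # sample standard deviation
        "minimum": min(values),
        "maximum": max(values),
        "range": max(values) - min(values),
        "lower_quartile": q1,
        "upper_quartile": q3,
        "iqr": q3 - q1,
    }


center_summary = summarize(center_city)
outlying_summary = summarize(outlying_area)

for label, summary in (("Center City", center_summary), ("Outlying Area", outlying_summary)):
    print(label)
    for row, value in summary.items():
        print(f"  {row}: {value}")
```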
Question 3 – A, B, and C
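[Figure: Empirical rule for attendance. The horizontal axis runs from 5k (x̄ − 3s) to 11k (x̄ + 3s), with x̄ at 8k. About 68% of games fall within x̄ ± s, 95% within x̄ ± 2s, and 99.7% within x̄ ± 3s. Moving outward from the center, the bands on each side hold 34%, 13.5%, 2.35%, and 0.15%.]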
A. Estimate the percentage of games that have between 7,000 and 10,000 people in attendance.
81.5% = 68% + 13.5%
B. Estimate the percentage of games that have less than 6,000 people in attendance.
2.5% = 2.35% + 0.15%
C. Estimate the percentage of games that have more than 9,000 people in attendance.
16% = 13.5% + 2.35% + 0.15%
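The three answers are sums of bands under the empirical rule. A minimal sketch of that bookkeeping follows, assuming the mean of 8,000 and standard deviation of 1,000 read from the axis labels (5k to 11k); the constant and variable names are illustrative only.

```python
# Percent of observations in each band on one side of the mean (68-95-99.7 rule).
BAND_WITHIN_1S = 34.0    # between the mean and 1 standard deviation
BAND_1S_TO_2S = 13.5     # between 1 and 2 standard deviations
BAND_2S_TO_3S = 2.35     # between 2 and 3 standard deviations
BAND_BEYOND_3S = 0.15    # beyond 3 standard deviations

# Assumed from the axis labels: mean = 8,000 and s = 1,000.
# A. 7,000 to 10,000 spans mean - 1s to mean + 2s.
pct_7k_to_10k = 2 * BAND_WITHIN_1S + BAND_1S_TO_2S
# B. Below 6,000 is below mean - 2s.
pct_below_6k = BAND_2S_TO_3S + BAND_BEYOND_3S
# C. Above 9,000 is above mean + 1s.
pct_above_9k = BAND_1S_TO_2S + BAND_2S_TO_3S + BAND_BEYOND_3S

print(f"A: {pct_7k_to_10k:.1f}%  B: {pct_below_6k:.1f}%  C: {pct_above_9k:.1f}%")
```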
Question 3 – D and E
D. If we assume that the attendance follows a normal distribution with the same mean and
standard deviation, estimate the percentage of games that have between 7,500 and 10,250 people
in attendance.
P(7500 < X < 10250) = P( -0.5 < Z < 2.25)
= 0.9878 – 0.3085
= 0.6793
E. Again, with the same normality assumption, estimate the percentage of games that have
more than 5,500 people in attendance.
P( X < 5500 ) = P( Z < -2.5)
= 0.0062
P( X > 5500 ) = 1 – 0.0062 = 0.9938
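The table lookups in D and E can be cross-checked with Python's standard-library `statistics.NormalDist`. The sketch assumes a mean of 8,000 and a standard deviation of 1,000, the values implied by the z scores above; the results agree with the table-based answers to about three decimal places.

```python
from statistics import NormalDist

# Assumed parameters: (7500 - 8000) / 1000 = -0.5 and (10250 - 8000) / 1000 = 2.25.
attendance = NormalDist(mu=8000, sigma=1000)

# D. P(7500 < X < 10250) as a difference of two cumulative probabilities.
p_between = attendance.cdf(10250) - attendance.cdf(7500)

# E. P(X > 5500) as the complement of P(X < 5500).
p_above_5500 = 1 - attendance.cdf(5500)

print(f"P(7500 < X < 10250) = {p_between:.4f}")
print(f"P(X > 5500)         = {p_above_5500:.4f}")
```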
Question 4
A bottling company uses a filling machine to fill
plastic bottles with cola. The bottles are supposed to
contain 300 milliliters (ml). In fact, the contents vary
according to a normal distribution with mean μ = 298
ml and standard deviation σ = 3 ml.
A. What is the probability that an individual bottle
contains less than 296 ml?
P( X < 296 ) = P( Z < -0.67 ) = 0.2514
B. What is the probability that the mean contents of
the bottles in a six-pack is less than 296 ml?
$P(\bar{X} < 296) = P\left(Z < \frac{296 - 298}{3/\sqrt{6}}\right) = P(Z < -1.64) = 0.0505$
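The difference between parts A and B is the standard deviation used: σ for one bottle, σ/√n for the mean of n bottles. A rough check with `statistics.NormalDist` is sketched below; the variable names are illustrative. Because it uses unrounded z scores (about -0.667 and -1.633), the probabilities differ from the table lookups in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist

MU = 298        # mean fill in ml
SIGMA = 3       # standard deviation of one bottle in ml
PACK_SIZE = 6   # bottles in a six-pack
LIMIT = 296     # threshold in ml

# A. One bottle: X ~ N(298, 3^2).
single_bottle = NormalDist(MU, SIGMA)
p_single = single_bottle.cdf(LIMIT)

# B. Mean of six bottles: the standard deviation shrinks to 3 / sqrt(6).
six_pack_mean = NormalDist(MU, SIGMA / sqrt(PACK_SIZE))
p_pack = six_pack_mean.cdf(LIMIT)

print(f"P(X < 296)    = {p_single:.4f}")
print(f"P(Xbar < 296) = {p_pack:.4f}")
```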
Question 5
- A questionnaire about study habits was given to a random sample
of students taking a large introductory statistics class. The sample
of 25 students reported that they spent an average of 110 minutes
per week studying statistics. Assume that the standard deviation is
40 minutes.
- Give a 90% confidence interval for the mean time spent studying
statistics by students in this class.
$110 \pm 1.645 \times \frac{40}{\sqrt{25}} = 110 \pm 13.16$
- If we wish to reduce the margin of error to only 8 minutes while
keeping the confidence level at 90%, how large a sample do we
need?
$n = \left(\frac{1.645 \times 40}{8}\right)^2 = 67.65$, rounded up to $n = 68$
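Both parts of Question 5 use the same margin-of-error formula, z × σ/√n, once solved for the margin and once for n. The sketch below assumes a known σ (so a z interval applies) and uses hypothetical helper names; the critical value comes from `NormalDist().inv_cdf` rather than a table, so it is 1.6449 instead of 1.645.

```python
from math import ceil, sqrt
from statistics import NormalDist


def z_critical(confidence):
    """Two-sided critical value, e.g. about 1.645 for 90% confidence."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def margin_of_error(sigma, n, confidence):
    """Half-width of the z confidence interval for a mean."""
    return z_critical(confidence) * sigma / sqrt(n)


def required_sample_size(sigma, target_margin, confidence):
    """Smallest n whose margin of error does not exceed target_margin."""
    return ceil((z_critical(confidence) * sigma / target_margin) ** 2)


margin = margin_of_error(sigma=40, n=25, confidence=0.90)
interval = (110 - margin, 110 + margin)
n_needed = required_sample_size(sigma=40, target_margin=8, confidence=0.90)

print(f"90% CI: 110 +/- {margin:.2f} = ({interval[0]:.2f}, {interval[1]:.2f})")
print(f"Sample size for a margin of 8 minutes: {n_needed}")
```

The sample size is rounded up, not to the nearest integer, because rounding down would leave the margin slightly above 8 minutes.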
Central Limit Theorem (CLT)
- The CLT applied to Means
If $X \sim N(\mu, \sigma^2)$, then $\bar{X} \sim N(\mu, \sigma^2/n)$.
If $X \sim$ any distribution with mean $\mu$ and variance $\sigma^2$,
then $\bar{X} \sim N(\mu, \sigma^2/n)$ approximately, given that n is large.
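A small simulation can make the second statement concrete. The sketch below is an illustration, not part of the course material: it assumes an exponential population with mean 1 and variance 1 (clearly not normal), draws many samples of size 50, and checks that the sample means center on μ with variance close to σ²/n. The seed, sample size, and replication count are arbitrary choices.

```python
import random
import statistics

random.seed(275)  # fixed seed so the run is repeatable

POP_MEAN = 1.0       # an exponential population with rate 1 has mean 1
POP_VARIANCE = 1.0   # and variance 1
SAMPLE_SIZE = 50
REPLICATIONS = 5000


def sample_mean(n):
    """Mean of n independent draws from the skewed exponential population."""
    return sum(random.expovariate(1.0) for _ in range(n)) / n


sample_means = [sample_mean(SAMPLE_SIZE) for _ in range(REPLICATIONS)]

mean_of_means = statistics.mean(sample_means)
variance_of_means = statistics.variance(sample_means)

print(f"mean of sample means:     {mean_of_means:.4f} (theory {POP_MEAN})")
print(f"variance of sample means: {variance_of_means:.4f} "
      f"(theory {POP_VARIANCE / SAMPLE_SIZE})")
```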
Example 1
The number of cars sold annually by used car
salespeople is normally distributed with a standard
deviation of 15. A random sample of 400 salespeople
was taken and the mean number of cars sold
annually was found to be 75. Find the 95%
confidence interval estimate of the population mean.
Interpret the interval estimate.
$\bar{X} \pm 1.96 \frac{\sigma}{\sqrt{n}} = 75 \pm 1.96 \times \frac{15}{\sqrt{400}} = 75 \pm 1.47$
Statistical Inference: Estimation
Population
Example:
 = 10,000
n = 100
What is the value of μ?
Research Question:
What is the parameter value?
Example: μ = ?
Sample of size n
Tools (i.e., formulas):
Point Estimator
Interval Estimator
Example 2: Concept behind a Hypothesis Test
A bank has set up a customer service goal
that the mean waiting time for its customers
will be less than 2 minutes. The bank
randomly samples 30 customers and finds
that the sample mean is 100 seconds.
Assuming that the sample is from a normal
distribution and the standard deviation is 28
seconds, can the bank safely conclude that
the population mean waiting time is less than
2 minutes?
Statistical Inference: Hypothesis
Testing
Population
Research Question:
Is a claim about the
parameter value supported?
Example:
 = 10,000
n = 100
Is “μ > 22,000”?
Example: “μ > 22,000”?
Sample of size n
Tool (i.e., formula):
Z or T score
Elements of a Test
Before collecting data
- Hypotheses
  - Null Hypothesis H0
  - Alternative Hypothesis Ha
- Test Statistic
- Decision Rule (Rejection Region)
After collecting data
- Evidence (actual observed test statistic)
- Conclusion
  - Reject H0 if the evidence falls in the R.R.
  - Do not reject H0 if the evidence falls outside the R.R.
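The elements listed above map naturally onto a small piece of code. The sketch below uses the bank waiting-time example (H0: μ ≥ 120 versus Ha: μ < 120, σ = 28, n = 30, sample mean 100). The 5% significance level is an assumption, since the example does not state one, and the class name `LowerTailZTest` is invented for illustration.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import NormalDist


@dataclass
class LowerTailZTest:
    """H0: mu >= mu0 versus Ha: mu < mu0, with a known standard deviation."""
    mu0: float     # boundary value of the null hypothesis
    sigma: float   # population standard deviation
    n: int         # sample size
    alpha: float   # significance level

    def critical_value(self):
        """Decision rule: reject H0 if z falls below this cutoff."""
        return NormalDist().inv_cdf(self.alpha)

    def test_statistic(self, sample_mean):
        """z score of the observed sample mean under H0."""
        return (sample_mean - self.mu0) / (self.sigma / sqrt(self.n))

    def reject_h0(self, sample_mean):
        """Conclusion: True when the evidence falls in the rejection region."""
        return self.test_statistic(sample_mean) < self.critical_value()


# Before collecting data: hypotheses, test statistic, and rejection region.
test = LowerTailZTest(mu0=120, sigma=28, n=30, alpha=0.05)  # alpha is assumed
cutoff = test.critical_value()

# After collecting data: evidence and conclusion.
z_observed = test.test_statistic(100)
decision = test.reject_h0(100)

print(f"Reject H0 if z < {cutoff:.3f}")
print(f"Observed z = {z_observed:.4f}, reject H0: {decision}")
```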
Example 2 (cont’d)
A bank has set up a customer service goal
that the mean waiting time for its customers
will be less than 2 minutes. The bank
randomly samples 30 customers and finds
that the sample mean is 112 seconds.
Assuming that the sample is from a normal
distribution and the standard deviation is 28
seconds, can the bank safely conclude that
the population mean waiting time is less than
2 minutes?
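Changing the sample mean from 100 to 112 seconds changes how far the evidence sits from the null value of 120 seconds. As a rough illustration, the sketch below computes the z score for both sample means, along with the lower-tail probability P(Z < z) under the null hypothesis; the function name and the side-by-side comparison are additions for illustration only.

```python
from math import sqrt
from statistics import NormalDist


def lower_tail_evidence(sample_mean, mu0=120, sigma=28, n=30):
    """Return the z score and P(Z < z) for a sample mean, assuming mu = mu0."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    return z, NormalDist().cdf(z)


z_100, tail_100 = lower_tail_evidence(100)
z_112, tail_112 = lower_tail_evidence(112)

print(f"sample mean 100: z = {z_100:.4f}, P(Z < z) = {tail_100:.5f}")
print(f"sample mean 112: z = {z_112:.4f}, P(Z < z) = {tail_112:.5f}")
```

A sample mean of 100 would be extremely unlikely if the true mean were 120, while a sample mean of 112 would occur by chance in roughly 6% of samples.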
Type I and II Errors

Conclusion \ State of Nature                      H0 is true (μ ≥ 120)    Ha is true (μ < 120)
Ha is not supported (cannot say μ < 120)          Correct                 Type II error
Ha is supported (have evidence to say μ < 120)    Type I error            Correct

Chance of making Type I error = P( Type I error ) = α
Chance of making Type II error = P( Type II error ) = β
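To see how α and β come out of the same decision rule, the two probabilities can be computed for the bank example (σ = 28, n = 30). The sketch relies on two assumptions that are not in the slides: a significance level of α = 0.05 and a hypothetical true mean of 110 seconds, which is needed because β depends on which value under Ha is actually true.

```python
from math import sqrt
from statistics import NormalDist

MU0 = 120        # boundary of H0: mu >= 120
SIGMA = 28
N = 30
ALPHA = 0.05     # assumed significance level
TRUE_MEAN = 110  # hypothetical true mean under Ha, chosen for illustration

standard_error = SIGMA / sqrt(N)

# Decision rule on the sample-mean scale: reject H0 if the sample mean is below the cutoff.
cutoff = MU0 + NormalDist().inv_cdf(ALPHA) * standard_error

# Type I error: rejecting H0 when mu is really 120.
alpha_check = NormalDist(MU0, standard_error).cdf(cutoff)

# Type II error: failing to reject H0 when mu is really TRUE_MEAN.
beta = 1 - NormalDist(TRUE_MEAN, standard_error).cdf(cutoff)

print(f"Reject H0 if the sample mean is below {cutoff:.2f} seconds")
print(f"P(Type I error)  = {alpha_check:.4f}")
print(f"P(Type II error) = {beta:.4f} when the true mean is {TRUE_MEAN}")
```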
Example 3
- The manager of a department store is thinking about
establishing a new billing system for the store’s credit
customers. After a thorough financial analysis, she
determines that the new system will be cost-effective
only if the mean monthly account is greater than $70.
A random sample of 200 monthly accounts is drawn,
for which the sample mean account is $76 with a
standard deviation of $30. Is there enough evidence
at the 5% significance level to conclude that the new
system will be cost-effective?
- What if the sample mean is $68? $74?
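The three scenarios in Example 3 differ only in the sample mean, so one loop covers them. The sketch below tests H0: μ ≤ 70 against Ha: μ > 70 at the 5% level and, as an assumption, treats the sample standard deviation of 30 as if it were the population value, which is a common large-sample shortcut with n = 200. The names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

MU0 = 70      # cost-effective only if the mean exceeds $70
S = 30        # sample standard deviation, used in place of sigma (n is large)
N = 200
ALPHA = 0.05

# Upper-tail rejection region: reject H0 if z exceeds this value (about 1.645).
critical_z = NormalDist().inv_cdf(1 - ALPHA)


def z_score(sample_mean):
    """z score of a sample mean relative to the null value of 70."""
    return (sample_mean - MU0) / (S / sqrt(N))


results = {}
for sample_mean in (76, 68, 74):
    z = z_score(sample_mean)
    results[sample_mean] = (z, z > critical_z)
    verdict = "reject H0" if z > critical_z else "do not reject H0"
    print(f"sample mean ${sample_mean}: z = {z:.4f}, {verdict}")
```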
Answer Key to the Examples
- Example 1: $\bar{X} \pm 1.96 \frac{\sigma}{\sqrt{n}} = 75 \pm 1.96 \times \frac{15}{\sqrt{400}} = 75 \pm 1.47$
- Example 2: the z score of the sample mean 100 is -3.9123,
meaning the sample mean is almost 4 standard deviations (of the sample mean)
below the null hypothesis value μ = 120. This is strong evidence to reject
H0 and to conclude that the true mean waiting time is less than
120 seconds.
- Example 2 (cont’d): the z score of the sample mean 112 is only
-1.56, less than 2 standard deviations below μ = 120. Not enough
evidence to reject H0.
- Example 3: at α = 5%, the rejection region is: Reject H0 if z >
1.645.
  - The z score of 76 is 2.828. Reject H0.
  - 68 is below the null μ = 70. No evidence at all to reject H0.
  - The z score of 74 is 1.8856. Reject H0.