Slides 1-31 Hypothesis Testing
BA 275
Quantitative Business Methods
Agenda
Hypothesis Testing
Elements of a Test
Concept behind a Test
Examples
Midterm Examination #1
Question 1 – A
[Figure: Histograms A, B, and C, each with a horizontal axis from 0 to 100]
Question 1 – B
[Figure: Box-and-whisker plots A, B, and C on a scale from 0 to 100]
Question 1 – C
C – 1. (1 point) The mean score will (circle one)
A). Be unchanged   B). Increase by 5 points   C). Increase by … 5 points   D). None of the above
C – 2. (1 point) The median score will (circle one)
A). Be unchanged   B). Increase by 5 points   C). Increase by … 5 points   D). None of the above
C – 3. (1 point) The standard deviation of scores will (circle one)
A). Be unchanged   B). Increase by 5 points   C). Increase by … 5 points   D). None of the above
C – 4. (1 point) The interquartile range (IQR) of scores will (circle one)
A). Be unchanged   B). Increase by 5 points   C). Increase by … 5 points   D). None of the above
C – 5. (1 point) The range of scores will (circle one)
A). Be unchanged   B). Increase by 5 points   C). Increase by … 5 points   D). None of the above
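Assuming part C describes raising every score by the same 5 points (an assumption suggested by the answer choices), a short Python sketch can check how each statistic responds. The scores are hypothetical and `summary` is an illustrative helper. Adding a constant shifts every observation, so the measures of center move by 5 while the measures of spread stay the same.

```python
import statistics

def summary(scores):
    """Mean, median, sample SD, IQR and range of a list of scores (illustrative helper)."""
    # Quartiles from statistics.quantiles (default 'exclusive' method). Every quartile
    # moves by the same constant under a shift, so the IQR is unaffected by the method.
    q1, _, q3 = statistics.quantiles(scores, n=4)
    return {
        "mean": statistics.fmean(scores),
        "median": statistics.median(scores),
        "sd": statistics.stdev(scores),
        "iqr": q3 - q1,
        "range": max(scores) - min(scores),
    }

scores = [52, 61, 68, 70, 74, 77, 81, 85, 90, 96]   # hypothetical scores, not exam data
before = summary(scores)
after = summary([x + 5 for x in scores])            # assumed change: +5 points for everyone

for key in before:
    print(f"{key:6s} change: {after[key] - before[key]:+.2f}")
```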
Question 2
Center City: $940, $955, $965, $975, $980, $985, $999, $1,000, $1,119, $1,247
Outlying Area: $575, $690, $694, $705, $725, $725, $745, $750, $775, $800
Summary Statistics     Center City                 Outlying Area
Count                  A                           B
Average                1,016.5                     C
Median                 D: 982.5 = (980+985)/2      E
Mode                   F                           G: 725
Variance               H: 8949.16 = 94.6²          3,755.6
Standard deviation     94.6                        I: 61.28295
Minimum                J                           K: 575
Maximum                L: 1247                     M
Range                  N: 307 = 1247 - 940         O
Lower quartile         965.0                       694.0
Upper quartile         P: 1000 = 965 + 35          750.0
Interquartile range    35.0                        Q: 56 = 750 - 694
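A Python sketch that recomputes the Question 2 summary statistics from the two data lists. The helper names are hypothetical. Quartiles are taken as the medians of the lower and upper halves of the sorted data; this is an assumption, chosen because it reproduces the slide's 965.0, 1000, 694.0, and 750.0 (spreadsheet QUARTILE functions interpolate and can give different values). `statistics.variance` is the sample variance (divisor n − 1); for Center City it returns about 8949.83, slightly above the slide's 8949.16, which squares the rounded standard deviation 94.6.

```python
import statistics

def median_of_halves_quartiles(data):
    """Q1 and Q3 as medians of the lower and upper halves of the sorted data
    (the middle value is left out of both halves when n is odd)."""
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]
    upper = xs[half + len(xs) % 2:]
    return statistics.median(lower), statistics.median(upper)

def summarize(data):
    q1, q3 = median_of_halves_quartiles(data)
    modes = statistics.multimode(data)
    return {
        "count": len(data),
        "average": statistics.fmean(data),
        "median": statistics.median(data),
        # multimode returns every value when all values are distinct, i.e. no mode
        "mode": modes if len(modes) < len(data) else None,
        "variance": statistics.variance(data),   # sample variance, divides by n - 1
        "std_dev": statistics.stdev(data),
        "minimum": min(data),
        "maximum": max(data),
        "range": max(data) - min(data),
        "lower_quartile": q1,
        "upper_quartile": q3,
        "iqr": q3 - q1,
    }

center_city = [940, 955, 965, 975, 980, 985, 999, 1000, 1119, 1247]
outlying_area = [575, 690, 694, 705, 725, 725, 745, 750, 775, 800]

for name, data in [("Center City", center_city), ("Outlying Area", outlying_area)]:
    print(name)
    for key, value in summarize(data).items():
        print(f"  {key}: {value}")
```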
Question 3 – A, B, and C
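[Figure: Empirical rule for game attendance with x̄ = 8k and s = 1k. About 68%, 95%, and 99.7% of games fall within 1, 2, and 3 standard deviations of the mean. On each side of the mean the bands hold 34% (x̄ to x̄ ± s), 13.5% (x̄ ± s to x̄ ± 2s), 2.35% (x̄ ± 2s to x̄ ± 3s), and 0.15% (beyond x̄ ± 3s). Axis marked from 5k (x̄ − 3s) to 11k (x̄ + 3s).]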
A. Estimate the percentage of games that have between 7,000 and 10,000 people in attendance.
81.5% = 68% + 13.5%
B. Estimate the percentage of games that have less than 6,000 people in attendance.
2.5% = 2.35% + 0.15%
C. Estimate the percentage of games that have more than 9,000 people in attendance.
16% = 13.5% + 2.35% + 0.15%
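The empirical-rule answers in parts A to C are sums of band percentages. A small Python sketch, assuming the mean of 8,000 and standard deviation of 1,000 read from the figure's axis (x̄ = 8k, s = 1k); `BANDS` and `empirical_percent` are illustrative names, and the helper only works when both limits fall on a band edge.

```python
import math

# Percent of observations in each band under the 68-95-99.7 rule.
# Band edges are measured in standard deviations from the mean.
BANDS = [
    (-math.inf, -3, 0.15),
    (-3, -2, 2.35),
    (-2, -1, 13.5),
    (-1, 0, 34.0),
    (0, 1, 34.0),
    (1, 2, 13.5),
    (2, 3, 2.35),
    (3, math.inf, 0.15),
]

def empirical_percent(low, high, mean, sd):
    """Estimated percent of observations between low and high.

    Both limits must fall on a band edge (mean + k*sd for an integer k, or +/- inf)."""
    z_low = (low - mean) / sd
    z_high = (high - mean) / sd
    total = sum(p for a, b, p in BANDS if a >= z_low and b <= z_high)
    return round(total, 2)

MEAN, SD = 8_000, 1_000   # read off the figure's axis (x-bar = 8k, s = 1k)
print(empirical_percent(7_000, 10_000, MEAN, SD))       # part A: 81.5
print(empirical_percent(-math.inf, 6_000, MEAN, SD))    # part B: 2.5
print(empirical_percent(9_000, math.inf, MEAN, SD))     # part C: 16.0
```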
Question 3 – D and E
D. If we assume that the attendance follows a normal distribution with the same mean and
standard deviation, estimate the percentage of games that have between 7,500 and 10,250 people
in attendance.
P(7500 < X < 10250) = P( -0.5 < Z < 2.25)
= 0.9878 – 0.3085
= 0.6793
E. Again, with the same normality assumption, estimate the percentage of games that have
more than 5,500 people in attendance.
P( X < 5500 ) = P( Z < -2.5)
= 0.0062
P( X > 5500 ) = 1 – 0.0062 = 0.9938
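For parts D and E, Python's `statistics.NormalDist` can stand in for the z table. This sketch again assumes the mean of 8,000 and standard deviation of 1,000 from the figure. Exact CDF values give about 0.6792 for D and 0.9938 for E; the small difference from 0.6793 in D comes from rounding in the four-digit table entries.

```python
from statistics import NormalDist

# Attendance model assumed from the figure: mean 8,000, standard deviation 1,000.
attendance = NormalDist(mu=8_000, sigma=1_000)

# D: P(7,500 < X < 10,250) = P(-0.5 < Z < 2.25)
p_d = attendance.cdf(10_250) - attendance.cdf(7_500)

# E: P(X > 5,500) = 1 - P(Z < -2.5)
p_e = 1 - attendance.cdf(5_500)

print(f"D: {p_d:.4f}  E: {p_e:.4f}")
```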
Question 4
A bottling company uses a filling machine to fill
plastic bottles with cola. The bottles are supposed to
contain 300 milliliters (ml). In fact, the contents vary
according to a normal distribution with mean = 298
ml and standard deviation = 3 ml.
A. What is the probability that an individual bottle
contains less than 296 ml?
P( X < 296 ) = P( Z < -0.67 ) = 0.2514
B. What is the probability that the mean contents of
the bottles in a six-pack is less than 296 ml?
$P(\bar{X} < 296) = P\left(Z < \dfrac{296 - 298}{3/\sqrt{6}}\right) = P(Z < -1.64) = 0.0505$
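The same `NormalDist` approach handles Question 4. The only change for part B is the standard deviation of the sample mean, σ/√n = 3/√6. Without rounding z, the results are about 0.2525 and 0.0512; the slide's 0.2514 and 0.0505 come from table lookups at z = −0.67 and z = −1.64.

```python
from math import sqrt
from statistics import NormalDist

MU, SIGMA = 298, 3   # fill volume in ml, from the question

# A: a single bottle, X ~ N(298, 3^2)
p_one = NormalDist(MU, SIGMA).cdf(296)

# B: mean of a six-pack, X-bar ~ N(298, (3 / sqrt(6))^2)
n = 6
p_six_pack = NormalDist(MU, SIGMA / sqrt(n)).cdf(296)

print(f"A: {p_one:.4f}  B: {p_six_pack:.4f}")
```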
Question 5
A questionnaire about study habits was given to a random sample
of students taking a large introductory statistics class. The sample
of 25 students reported that they spent an average of 110 minutes
per week studying statistics. Assume that the standard deviation is
40 minutes.
Give a 90% confidence interval for the mean time spent studying
statistics by students in this class.
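$\bar{x} \pm z^{*}\dfrac{\sigma}{\sqrt{n}} = 110 \pm 1.645\dfrac{40}{\sqrt{25}} = 110 \pm 13.16$, i.e., from 96.84 to 123.16 minutes.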
If we wish to reduce the margin of error to only 8 minutes while
keeping the confidence level at 90%, how large a sample do we
need?
$n = \left(\dfrac{1.645 \times 40}{8}\right)^{2} = 67.65$, so round up to $n = 68$.
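Both Question 5 calculations in a short Python sketch, using z* = 1.645 as on the slide. `z_margin` and `required_n` are hypothetical helper names. The sample size is always rounded up, because rounding down would leave the margin of error slightly above the 8-minute target.

```python
import math

def z_margin(z_star, sigma, n):
    """Margin of error z* * sigma / sqrt(n) for a known-sigma interval."""
    return z_star * sigma / math.sqrt(n)

def required_n(z_star, sigma, margin):
    """Smallest n with z* * sigma / sqrt(n) <= margin (always round up)."""
    return math.ceil((z_star * sigma / margin) ** 2)

Z90 = 1.645                      # z* for 90% confidence, as on the slide
m = z_margin(Z90, 40, 25)        # sigma = 40 minutes, n = 25 students
print(f"90% CI: {110 - m:.2f} to {110 + m:.2f}")
print("n needed for a margin of 8 minutes:", required_n(Z90, 40, 8))
```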
Central Limit Theorem (CLT)
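The CLT applied to means:
If $X \sim N(\mu, \sigma^2)$, then $\bar{X} \sim N\left(\mu, \dfrac{\sigma^2}{n}\right)$.
If $X$ follows any distribution with mean $\mu$ and variance $\sigma^2$, then $\bar{X}$ is approximately $N\left(\mu, \dfrac{\sigma^2}{n}\right)$, provided that $n$ is large.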
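A quick simulation makes the CLT statement concrete. This Python sketch (an illustration, not from the slides) draws samples of size n = 40 from an exponential population, which is strongly skewed, and checks that the sample means center near μ = 1, have variance near σ²/n = 0.025, and put roughly 95% of their values within 1.96 standard errors of μ, as a normal distribution would. The seed, sample size, and number of repetitions are arbitrary choices.

```python
import math
import random
import statistics

def sample_means(n, reps, seed=275):
    """Means of `reps` samples of size n drawn from an exponential population
    with mean 1 and variance 1 (strongly right-skewed, far from normal)."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(reps)]

n = 40
means = sample_means(n, reps=5_000)
se = math.sqrt(1.0 / n)   # sigma / sqrt(n) with sigma = 1

center = statistics.fmean(means)      # CLT: close to mu = 1
spread = statistics.variance(means)   # CLT: close to sigma^2 / n = 0.025
# Under approximate normality, about 95% of sample means lie within 1.96 SE of mu.
share_within = sum(abs(m - 1.0) < 1.96 * se for m in means) / len(means)
print(round(center, 3), round(spread, 4), round(share_within, 3))
```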
Example 1
The number of cars sold annually by used car
salespeople is normally distributed with a standard
deviation of 15. A random sample of 400 salespeople
was taken and the mean number of cars sold
annually was found to be 75. Find the 95%
confidence interval estimate of the population mean.
Interpret the interval estimate.
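$\bar{X} \pm 1.96\dfrac{\sigma}{\sqrt{n}} = 75 \pm 1.96\dfrac{15}{\sqrt{400}} = 75 \pm 1.47$, i.e., from 73.53 to 76.47 cars.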
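Example 1 as a Python sketch; `z_interval` is a hypothetical helper. It derives z* from the confidence level with `NormalDist().inv_cdf` instead of hard-coding 1.96, and gives roughly (73.53, 76.47).

```python
import math
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence=0.95):
    """Known-sigma confidence interval xbar +/- z* sigma / sqrt(n)."""
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    margin = z_star * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin

low, high = z_interval(75, 15, 400)
print(f"95% CI: ({low:.2f}, {high:.2f})")
```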
Statistical Inference: Estimation
Population → Sample of size n
Research Question: What is the parameter value? (Example: μ = ?)
Example: σ = 10,000 and n = 100. What is the value of μ?
Tools (i.e., formulas): Point Estimator, Interval Estimator
Example 2: Concept behind a H.T.
A bank has set up a customer service goal
that the mean waiting time for its customers
will be less than 2 minutes. The bank
randomly samples 30 customers and finds
that the sample mean is 100 seconds.
Assuming that the sample is from a normal
distribution and the standard deviation is 28
seconds, can the bank safely conclude that
the population mean waiting time is less than
2 minutes?
Statistical Inference: Hypothesis Testing
Population → Sample of size n
Research Question: Is a claim about the parameter value supported? (Example: “μ > 22,000”?)
Example: σ = 10,000 and n = 100. Is “μ > 22,000”?
Tool (i.e., formula): Z or T score
Elements of a Test
Before collecting data:
  Hypotheses: Null Hypothesis H0 and Alternative Hypothesis Ha
  Test Statistic
  Decision Rule (Rejection Region)
After collecting data:
  Evidence (actual observed test statistic)
  Conclusion:
    Reject H0 if the evidence falls in the R.R.
    Do not reject H0 if the evidence falls outside the R.R.
Example 2 (cont’d)
A bank has set up a customer service goal
that the mean waiting time for its customers
will be less than 2 minutes. The bank
randomly samples 30 customers and finds
that the sample mean is 112 seconds.
Assuming that the sample is from a normal
distribution and the standard deviation is 28
seconds, can the bank safely conclude that
the population mean waiting time is less than
2 minutes?
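The Example 2 calculations in Python. `left_tail_z_test` is a hypothetical helper for H0: μ ≥ 120 versus Ha: μ < 120. The slides do not state a significance level for Example 2, so α = 0.05 is an assumption here; the function also reports the p-value, the area to the left of the observed z.

```python
import math
from statistics import NormalDist

def left_tail_z_test(xbar, mu0, sigma, n, alpha=0.05):
    """One-sample z test of H0: mu >= mu0 against Ha: mu < mu0."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    p_value = NormalDist().cdf(z)              # area to the left of the observed z
    reject = z < NormalDist().inv_cdf(alpha)   # rejection region: z < -1.645 at 5%
    return z, p_value, reject

# Waiting times in seconds: H0 mean 120 (2 minutes), sigma = 28, n = 30.
for xbar in (100, 112):
    z, p, reject = left_tail_z_test(xbar, 120, 28, 30)
    print(f"xbar = {xbar}: z = {z:.4f}, p-value = {p:.4g}, reject H0: {reject}")
```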
Type I and II Errors
Conclusion                                        State of Nature
                                                  H0 is true (μ ≥ 120)    Ha is true (μ < 120)
Ha is not supported (cannot say μ < 120)          Correct                 Type II error
Ha is supported (have evidence to say μ < 120)    Type I error            Correct

Chance of making a Type I error = P(Type I error) = α
Chance of making a Type II error = P(Type II error) = β
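To connect the table to numbers, this Python sketch uses the Example 2 setting with an assumed α = 0.05. It simulates many samples with μ = 120 (H0 true) to show that the rejection rate, i.e. the chance of a Type I error, lands near α, and it computes β for a hypothetical true mean of 110 seconds, which is not a value from the slides. The seed and repetition count are arbitrary.

```python
import math
import random
from statistics import NormalDist, fmean

MU0, SIGMA, N, ALPHA = 120, 28, 30, 0.05   # Example 2 setting; alpha is assumed
SE = SIGMA / math.sqrt(N)
CUTOFF = MU0 + NormalDist().inv_cdf(ALPHA) * SE   # reject H0 when xbar < CUTOFF

def simulated_type_i_rate(reps=10_000, seed=1):
    """Share of samples drawn with mu = 120 (H0 true) that still reject H0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        xbar = fmean(rng.gauss(MU0, SIGMA) for _ in range(N))
        rejections += xbar < CUTOFF
    return rejections / reps

def type_ii_probability(mu_true):
    """beta = P(do not reject H0 | true mean mu_true), for some mu_true < 120."""
    return 1 - NormalDist(mu_true, SE).cdf(CUTOFF)

print("simulated alpha:", simulated_type_i_rate())
print("beta if mu = 110 (hypothetical):", round(type_ii_probability(110), 3))
```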
Example 3
The manager of a department store is thinking about
establishing a new billing system for the store’s credit
customers. After a thorough financial analysis, she
determines that the new system will be cost-effective
only if the mean monthly account is greater than $70.
A random sample of 200 monthly accounts is drawn,
for which the sample mean account is $76 with a
standard deviation of $30. Is there enough evidence
at the 5% significance level to conclude that the new
system will be cost-effective?
What if the sample mean is $68? $74?
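Example 3 and its follow-up questions, sketched in Python. `right_tail_z_test` is a hypothetical helper for H0: μ ≤ 70 versus Ha: μ > 70 at α = 0.05; it uses the sample standard deviation of $30 in place of σ, the usual large-sample approximation with n = 200. It should reproduce the answer key: z ≈ 2.828 and 1.886 (reject) for $76 and $74, and z ≈ −0.94 (do not reject) for $68.

```python
import math
from statistics import NormalDist

def right_tail_z_test(xbar, mu0, s, n, alpha=0.05):
    """Large-sample z test of H0: mu <= mu0 against Ha: mu > mu0,
    with the sample SD s standing in for sigma."""
    z = (xbar - mu0) / (s / math.sqrt(n))
    critical = NormalDist().inv_cdf(1 - alpha)   # about 1.645 at the 5% level
    return z, z > critical

for xbar in (76, 68, 74):
    z, reject = right_tail_z_test(xbar, 70, 30, 200)
    print(f"xbar = ${xbar}: z = {z:.4f}, reject H0: {reject}")
```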
Answer Key to the Examples
Example 1: $\bar{X} \pm 1.96\dfrac{\sigma}{\sqrt{n}} = 75 \pm 1.96\dfrac{15}{\sqrt{400}} = 75 \pm 1.47$
Example 2: the z score of the sample mean 100 is -3.9123, meaning the sample mean is almost 4 standard deviations
below the null hypothesis value μ = 120. This is strong evidence to reject
H0 and to conclude that the true mean waiting time is less than
120 seconds.
Example 2 (cont’d): the z score of the sample mean 112 is only
-1.56, less than 2 standard deviations below μ = 120. Not enough
evidence to reject H0.
Example 3: at α = 5%, the rejection region is: Reject H0 if z >
1.645.
The z score of 76 is 2.828. Reject H0.
68 is below the null value μ = 70. No evidence at all to reject H0.
The z score of 74 is 1.8856. Reject H0.