Test of Significance 2
Introduction to Inference
Tests of Significance
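Proof
$z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
$\mu_{\bar{x}} = 1000$, $\sigma_{\bar{x}} = \dfrac{125}{\sqrt{25}} = 25$
$z = \dfrac{979 - 1000}{25} = -.84$
$P(\bar{x} < 979) = .2005$
[Figure: sampling distribution of $\bar{x}$, axis marked at 925, 950, 975, 1000, with $\bar{x} = 979$ indicated]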
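Proof
$z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
$\mu_{\bar{x}} = 1000$, $\sigma_{\bar{x}} = \dfrac{125}{\sqrt{25}} = 25$
$z = \dfrac{920 - 1000}{25} = -3.2$
$P(\bar{x} < 920) = .0007$
[Figure: sampling distribution of $\bar{x}$, axis marked at 925, 950, 975, 1000, with $\bar{x} = 920$ indicated]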
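The two probabilities on the Proof slides can be checked numerically. This is a minimal sketch using Python's standard-library `statistics.NormalDist`. The helper name `prob_mean_below` is an illustration and not part of the slides, and the values 125 and 25 are read as the population standard deviation and the sample size.

```python
from math import sqrt
from statistics import NormalDist

def prob_mean_below(x_bar, mu, sigma, n):
    """Return (z, P(sample mean < x_bar)) when the sample mean is Normal(mu, sigma / sqrt(n))."""
    standard_error = sigma / sqrt(n)      # 125 / sqrt(25) = 25 on the slides
    z = (x_bar - mu) / standard_error     # standardized sample mean
    return z, NormalDist().cdf(z)         # area to the left of z

z_979, p_979 = prob_mean_below(979, mu=1000, sigma=125, n=25)
z_920, p_920 = prob_mean_below(920, mu=1000, sigma=125, n=25)
print(f"x-bar = 979: z = {z_979:.2f}, P = {p_979:.4f}")
print(f"x-bar = 920: z = {z_920:.2f}, P = {p_920:.4f}")
```

A sample mean of 979 is unremarkable if the true mean is 1000, while a sample mean of 920 would almost never happen by chance alone.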
Definitions
• A test of significance is a method for using
sample data to make a decision about a
population characteristic.
• The null hypothesis, written H0, is the starting
value for the decision (e.g. H0: μ = 1000).
• The alternative hypothesis, written Ha, states
the belief or claim we are testing for statistical
significance (e.g. Ha: μ < 1000).
Note: the population characteristic could be a mean (μ)
or a proportion (p), compared against a hypothesized value.
Examples
• Chrysler Concord
– H0: μ = 8
– Ha: μ > 8
• K-mart
– H0: μ = 1000
– Ha: μ < 1000
Chrysler
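$z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \dfrac{8.7 - 8}{1/\sqrt{10}} = 2.21$
$P(\bar{x} > 8.7) = .0134$
[Figure: sampling distribution centered at 8 with $\bar{x} = 8.7$ indicated]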
K-mart
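$z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \dfrac{982 - 1000}{65/\sqrt{20}} = -1.24$
$P(\bar{x} < 982) = .1078$
[Figure: sampling distribution centered at 1000 with $\bar{x} = 982$ indicated]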
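The Chrysler and K-mart calculations follow the same pattern and differ only in the direction of the alternative. A minimal sketch, assuming the standard deviations 1 and 65 are treated as known; the function name `one_sided_z_test` and its `tail` argument are hypothetical names chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

def one_sided_z_test(x_bar, mu0, sigma, n, tail):
    """Return (z, p_value) for a one-sided z-test of H0: mu = mu0.

    tail="greater" tests Ha: mu > mu0; tail="less" tests Ha: mu < mu0.
    """
    z = (x_bar - mu0) / (sigma / sqrt(n))
    if tail == "greater":
        p_value = 1 - NormalDist().cdf(z)   # area to the right of z
    elif tail == "less":
        p_value = NormalDist().cdf(z)       # area to the left of z
    else:
        raise ValueError("tail must be 'greater' or 'less'")
    return z, p_value

chrysler_z, chrysler_p = one_sided_z_test(8.7, mu0=8, sigma=1, n=10, tail="greater")
kmart_z, kmart_p = one_sided_z_test(982, mu0=1000, sigma=65, n=20, tail="less")
print(f"Chrysler: z = {chrysler_z:.2f}, p-value = {chrysler_p:.4f}")
print(f"K-mart:   z = {kmart_z:.2f}, p-value = {kmart_p:.4f}")
```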
Phrasing our decision
• In the justice system, what are our null and
alternative hypotheses?
• H0: defendant is innocent
• Ha: defendant is guilty
• What does the jury state if the defendant
wins?
• Not guilty
• Why?
Phrasing our decision
• H0: defendant is innocent
• Ha: defendant is guilty
• If we have the evidence:
– We reject the belief the defendant is innocent
because we have the evidence to believe the
defendant is guilty.
• If we don’t have the evidence:
– We fail to reject the belief the defendant is
innocent because we do not have the
evidence to believe the defendant is guilty.
Chrysler Concord
• H0: μ = 8
• Ha: μ > 8
• p-value = .0134
• We reject H0. Since the probability is so
small, there is enough evidence to believe
the mean Concord time is greater than 8
seconds.
K-mart light bulb
• H0: μ = 1000
• Ha: μ < 1000
• p-value = .1078
• We fail to reject H0. Since the probability is
not very small, there is not enough
evidence to believe the mean lifetime is
less than 1000 hours.
Remember:
Inference procedure overview
• State the procedure
• Define any variables
• Establish the conditions (assumptions)
• Use the appropriate formula
• Draw conclusions
Test of Significance Example
• A package delivery service claims it takes an
average of 24 hours to send a package from
New York to San Francisco. An independent
consumer agency is doing a study to test the
truth of the claim. Several complaints have led
the agency to suspect that the delivery time is
longer than 24 hours. Assume that the delivery
times are normally distributed with standard
deviation (assume for now) of 2 hours. A
random sample of 25 packages has been taken.
Example 1
test of significance
μ = true mean delivery time
Ho: μ = 24
Ha: μ > 24
Given a random sample
Given a normal distribution
Safe to infer a population of at least 250 packages
Example 1 (look, don’t copy)
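$z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \dfrac{24.85 - 24}{2/\sqrt{25}} = 2.125$
$\sigma_{\bar{x}} = \dfrac{2}{\sqrt{25}} = .4$
[Figure: sampling distribution centered at 24, axis from 22.8 to 25.2 in steps of .4, with $\bar{x} = 24.85$ indicated]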
Example 1
test of significance
μ = true mean delivery time
Ho: μ = 24
Ha: μ > 24
Given a random sample
Given a normal distribution
Safe to infer a population of at least 250 packages.
$z = \dfrac{24.85 - 24}{2/\sqrt{25}} = 2.125$
p-value = 1 − .9834 = .0166
let α = .05
Example 1
test of significance
μ = true mean delivery time
Ho: μ = 24
Ha: μ > 24
Given a random sample
Given a normal distribution
Safe to infer a population of at least 250 packages.
$z = \dfrac{24.85 - 24}{2/\sqrt{25}} = 2.125$
p-value = .0166
let α = .05
We reject Ho. Since p-value < α, there is enough
evidence to believe the mean delivery time is longer
than 24 hours.
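The full write-up for Example 1 can be traced in a few lines. This is a sketch, not the slides' own method. The "population at least 10 times the sample" check is inferred from the 250-package condition, and the variable names are chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

# Values stated in the example
n, x_bar = 25, 24.85
mu0, sigma, alpha = 24, 2, 0.05

# Condition: population should be at least 10 times the sample size (assumed rule)
min_population = 10 * n

# Test statistic and one-sided p-value for Ha: mu > 24
z = (x_bar - mu0) / (sigma / sqrt(n))
p_value = 1 - NormalDist().cdf(z)

reject = p_value < alpha
print(f"z = {z:.3f}, p-value = {p_value:.4f}, reject Ho: {reject}")
```

The slide looks up z in a table after rounding, which gives 1 − .9834 = .0166; the unrounded calculation gives a p-value of about .0168. Either way the p-value is below α = .05.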
Wording of conclusion revisit
• If I believe the statistic is just too extreme and
unusual (P-value < α), I will reject the null
hypothesis.
• If I believe the statistic is just normal chance
variation (P-value > α), I will fail to reject the null
hypothesis.
We reject Ho. Since the p-value < α, there is
enough evidence to believe… (Ha in context…)
We fail to reject Ho. Since the p-value > α, there is not
enough evidence to believe… (Ha in context…)
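The two conclusion templates can be expressed as a small helper. This is only an illustrative sketch: the function name `conclusion` is hypothetical, and the slides do not say what to do when the p-value equals α exactly, so the sketch treats that case as "fail to reject".

```python
def conclusion(p_value, alpha, claim):
    """Return the conclusion sentence for a test, with Ha stated in context as `claim`."""
    if p_value < alpha:
        return (f"We reject Ho. Since the p-value < alpha, "
                f"there is enough evidence to believe {claim}.")
    # Assumption: a p-value equal to alpha is treated as not significant.
    return (f"We fail to reject Ho. Since the p-value > alpha, "
            f"there is not enough evidence to believe {claim}.")

delivery = conclusion(0.0166, 0.05, "the mean delivery time is longer than 24 hours")
missiles = conclusion(0.1038, 0.05, "the mean distance traveled is more than 340 miles")
print(delivery)
print(missiles)
```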
Example 3
test of significance
μ = true mean distance
Ho: μ = 340
Ha: μ > 340
Given random sample.
Given normally distributed.
Safe to infer a population of at least 100 missiles.
$z = \dfrac{348 - 340}{20/\sqrt{10}} = 1.26$
p-value = .1038
let α = .05
We fail to reject Ho. Since p-value > α, there is not
enough evidence to believe the mean distance
traveled is more than 340 miles.
Familiar transition
• What happened on day 2 of confidence
intervals involving mean and standard
deviation?
• Switch from using z-scores to using the t-distribution.
• What changes occur in the write up?
Example 3
t-test
μ = true mean distance
Ho: μ = 340
Ha: μ > 340
Given random sample.
Given normally distributed.
Safe to infer a population of at least 100 missiles.
$z = \dfrac{348 - 340}{20/\sqrt{10}} = 1.26$
p-value = .1038
let α = .05
We fail to reject Ho. Since p-value > α, there is not
enough evidence to believe the mean distance
traveled is more than 340 miles.
Example 3
t-test
μ = true mean distance
Ho: μ = 340
Ha: μ > 340
Given random sample.
Given normally distributed.
Safe to infer a population of at least 100 missiles.
df = 9
$t = \dfrac{348 - 340}{20/\sqrt{10}} = 1.26$
p-value = .1038
let α = .05
We fail to reject Ho. Since p-value > α, there is not
enough evidence to believe the mean distance
traveled is more than 340 miles.
Example 3
t-test
μ = true mean distance
Ho: μ = 340
Ha: μ > 340
Given random sample.
Given normally distributed.
Safe to infer a population of at least 100 missiles.
df = 9
$t = \dfrac{348 - 340}{20/\sqrt{10}} = 1.26$
p-value =
let α = .05
We fail to reject Ho. Since p-value > α, there is not
enough evidence to believe the mean distance
traveled is more than 340 miles.
t-chart: t = 1.26
Example 3
t-test
μ = true mean distance
Ho: μ = 340
Ha: μ > 340
Given random sample.
Given normally distributed.
Safe to infer a population of at least 100 missiles.
df = 9
$t = \dfrac{348 - 340}{20/\sqrt{10}} = 1.26$
.10 < p-value < .15
let α = .05
We fail to reject Ho. Since p-value > α, there is not
enough evidence to believe the mean distance
traveled is more than 340 miles.
Example 3
t-test
μ = true mean distance
Ho: μ = 340
Ha: μ > 340
Given random sample.
Given normally distributed.
Safe to infer a population of at least 100 missiles.
df = 9
$t = \dfrac{348 - 340}{20/\sqrt{10}} = 1.26$
p-value = .1188
let α = .05
We fail to reject Ho. Since p-value > α, there is not
enough evidence to believe the mean distance
traveled is more than 340 miles.
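The t-test p-value can be reproduced without a t-chart. Python's standard library has no t distribution, so this sketch integrates the t density numerically with Simpson's rule; that is a swapped-in technique for illustration, not what a calculator or the slides do. The helper names `t_pdf` and `t_upper_tail` are hypothetical.

```python
from math import gamma, pi, sqrt

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coefficient = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coefficient * (1 + t * t / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    if t < 0:
        raise ValueError("this sketch only handles t >= 0")
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2          # Simpson weights 4, 2, 4, ..., 4
        total += weight * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    return 0.5 - area_0_to_t                # by symmetry, P(T > 0) = 0.5

# Example 3 values
n, x_bar, s, mu0 = 10, 348, 20, 340
df = n - 1
t_stat = (x_bar - mu0) / (s / sqrt(n))
p_value = t_upper_tail(t_stat, df)
print(f"df = {df}, t = {t_stat:.2f}, p-value = {p_value:.4f}")
```

The result falls inside the t-chart bracket .10 < p-value < .15 and is larger than the z-based .1038, because the t distribution with 9 degrees of freedom has heavier tails.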
1 proportion z-test
p = true proportion of pure short plants
Ho: p = .25
Ha: p ≠ .25
Given a random sample.
np = 1064(.25) > 10
n(1 − p) = 1064(1 − .25) > 10
Sample size is large enough to use normality.
Safe to infer a population of at least 10,640 plants.
$z = \dfrac{.2603 - .25}{\sqrt{\dfrac{.25(1 - .25)}{1064}}} = .78$
p-value = .4361
let α = .05
We fail to reject Ho. Since p-value > α, there is not
enough evidence to believe the proportion of pure
short is different from 25%.
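The one-proportion z-test, including its conditions, can be sketched as follows. The sample proportion .2603 is taken from the slide as already rounded, so the p-value here (about .438) differs slightly from the slide's .4361, which presumably came from the unrounded count. Variable names are chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

n, p_hat, p0, alpha = 1064, 0.2603, 0.25, 0.05

# Conditions: expected successes and failures under Ho, and the population size check
expected_successes = n * p0          # should exceed 10
expected_failures = n * (1 - p0)     # should exceed 10
min_population = 10 * n              # assumed "10 times the sample" rule

# Test statistic uses the hypothesized proportion p0 in the standard error
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# Two-sided p-value because Ha is p != .25
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
reject = p_value < alpha
print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject Ho: {reject}")
```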
Choosing a level of significance
• How plausible is H0? If H0 represents a
long held belief, strong evidence (small α)
might be needed to dissolve the belief.
• What are the consequences of rejecting
H0? The choice of α will be heavily
influenced by the consequences of
rejecting or failing to reject.
Errors in the justice system
                              Actual truth
                              Guilty              Not guilty
Jury decision   Guilty        Correct decision    Type I error
                Not guilty    Type II error       Correct decision
“No innocent man is jailed” justice system
                              Actual truth
                              Guilty                    Not guilty
Jury decision   Guilty                                  Type I error (smaller)
                Not guilty    Type II error (larger)
“No guilty man goes free” justice system
                              Actual truth
                              Guilty                    Not guilty
Jury decision   Guilty                                  Type I error (larger)
                Not guilty    Type II error (smaller)
Errors in the justice system
                                      Actual truth
                                      Guilty              Not guilty
                                      (Ha true)           (H0 true)
Jury decision   Guilty                Correct decision    Type I error
                (reject H0)
                Not guilty            Type II error       Correct decision
                (fail to reject H0)
Type I and Type II errors
• If we believe Ha when in fact H0 is true,
this is a type I error.
• If we believe H0 when in fact Ha is true,
this is a type II error.
• Type I error: if we reject H0 and it’s a
mistake.
• Type II error: if we fail to reject H0 and
it’s a mistake.
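A short simulation illustrates the link between the significance level and the Type I error: when H0 is really true, a test at level α rejects by mistake about α of the time. This is a sketch under assumed values (mean 24, standard deviation 2, samples of 25, borrowed from the delivery example); the function name and the seed are arbitrary.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def type_one_error_rate(mu0, sigma, n, alpha, trials, seed=0):
    """Fraction of samples drawn with H0 true that a one-sided z-test (Ha: mu > mu0) rejects."""
    rng = random.Random(seed)
    z_critical = NormalDist().inv_cdf(1 - alpha)   # reject when z exceeds this value
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]   # data generated under H0
        z = (fmean(sample) - mu0) / (sigma / sqrt(n))
        if z > z_critical:
            rejections += 1                                   # a Type I error
    return rejections / trials

rate = type_one_error_rate(mu0=24, sigma=2, n=25, alpha=0.05, trials=20000)
print(f"Simulated Type I error rate: {rate:.3f}")
```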
Type I and Type II example
A distributor of handheld calculators receives very large
shipments of calculators from a manufacturer. It is too
costly and time consuming to inspect all incoming
calculators, so when each shipment arrives, a sample is
selected for inspection. Information from the sample is
then used to test Ho: p = .02 versus Ha: p < .02, where p
is the true proportion of defective calculators in the
shipment. If the null hypothesis is rejected, the distributor
accepts the shipment of calculators. If the null hypothesis
cannot be rejected, the entire shipment of calculators is
returned to the manufacturer due to inferior quality. (A
shipment is defined to be of inferior quality if it contains
2% or more defectives.)
Type I and Type II example
• Type I error: We think the proportion of
defective calculators is less than 2%, but
it’s actually 2% (or more).
• Consequence: Accept shipment that has
too many defective calculators so potential
loss in revenue.
Type I and Type II example
• Type II error: We think the proportion of
defective calculators is 2%, but it’s actually
less than 2%.
• Consequence: Return shipment thinking
there are too many defective calculators,
but the shipment is ok.
Type I and Type II example
• Distributor wants to avoid Type I error.
Choose α = .01
• Calculator manufacturer wants to avoid
Type II error. Choose α = .10
Concept of Power
• Definition?
• Power is the capability of accomplishing
something…
• The power of a test of significance is…
Power Example
In a power generating plant, pressure in a certain line is
supposed to maintain an average of 100 psi over any
4-hour period. If the average pressure exceeds 103 psi
for a 4-hour period, serious complications can evolve.
During a given 4-hour period, thirty random
measurements are to be taken. The standard
deviation for these measurements is 4 psi (a graph of the
data is reasonably normal). Test Ho: μ = 100 psi versus
the alternative “new” hypothesis μ = 103 psi at
the alpha level of .01. Calculate the probability of a Type II
error and the power of this test. In the context of the
problem, explain what the power means.
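Type I error and α
$\dfrac{s}{\sqrt{n}} = \dfrac{4}{\sqrt{30}} = .73$
for α = .01, t* = 2.462
α is the probability that we think
the mean pressure is above 100 psi,
but actually the mean pressure is
100 psi (or less)
[Figure: sampling distribution centered at 100, axis marked at 100, 100.73, 101.46, 102.19]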
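Type I error and α
$\dfrac{s}{\sqrt{n}} = \dfrac{4}{\sqrt{30}} = .73$
for α = .01, t* = 2.462
$\dfrac{\bar{x} - 100}{.73} = 2.462$, so $\bar{x} = 101.80$
[Figure: sampling distribution centered at 100, axis marked at 100, 100.73, 101.46, 102.19]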
Type II error and β
$z_\beta = \dfrac{101.8 - 103}{.73} = -1.64$
β = .0505
[Figure: axis marked at 100, 100.73, 101.46, 101.8, 102.19, 103]
Type II error and β
β is the probability that we think the mean pressure is
100 psi, but actually the pressure is greater than 100 psi.
β = .0505
[Figure: axis marked at 100, 100.73, 101.46, 101.8, 102.19, 103]
Power?
Power = 1 − .0505 = .9495
β = .0505
[Figure: axis marked at 100, 100.73, 101.46, 101.8, 102.19, 103]
Power
There is a .9495 probability that this test of
significance will correctly detect if the pressure
is above 100 psi.
β = .0505
[Figure: axis marked at 100, 100.73, 101.46, 101.8, 102.19, 103]
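The power calculation can be traced step by step. This sketch follows the slides' approach (a t* cutoff for α = .01, then a normal curve centered at 103 for β) but keeps full precision, so it gives β ≈ .0499 and power ≈ .9501 rather than the slides' rounded .0505 and .9495. The value t* = 2.462 is taken from the slide; variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu0, mu_alt = 100, 103          # hypothesized mean and the "what if" alternative
s, n = 4, 30
t_star = 2.462                  # critical value for alpha = .01, as given on the slide

standard_error = s / sqrt(n)                     # about .73
cutoff = mu0 + t_star * standard_error           # reject Ho when x-bar exceeds this
z_beta = (cutoff - mu_alt) / standard_error      # cutoff standardized under mu = 103
beta = NormalDist().cdf(z_beta)                  # P(fail to reject Ho | mu = 103)
power = 1 - beta

print(f"standard error = {standard_error:.2f}, cutoff = {cutoff:.2f}")
print(f"beta = {beta:.4f}, power = {power:.4f}")
```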
Concept of Power
• The power of a test of significance is
the probability that the null hypothesis
will be correctly rejected.
• Because the true value of μ is unknown,
we cannot know what the power is for μ,
but we are able to examine “what if”
scenarios to provide important
information.
• Power = 1 – β
Effects on the Power of a Test
• The larger the difference between the
hypothesized value and the true value of
the population characteristic, the higher
the power.
• The larger the significance level, α, the
higher the power of the test.
• The larger the sample size, the higher the
power of the test.
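The three effects listed above can be checked with "what if" scenarios. This is a minimal sketch for a one-sided z-test with a known standard deviation. The function name `power_upper_tail` is hypothetical, and the scenario values (true mean 102, sample size 60, α = .05) are made up for illustration and do not come from the slides.

```python
from math import sqrt
from statistics import NormalDist

def power_upper_tail(mu0, mu_true, sigma, n, alpha):
    """Power of a one-sided z-test (Ha: mu > mu0) when the true mean is mu_true."""
    standard_error = sigma / sqrt(n)
    cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * standard_error
    # Probability that the sample mean lands above the cutoff when mu = mu_true
    return 1 - NormalDist(mu_true, standard_error).cdf(cutoff)

base = power_upper_tail(100, 102, sigma=4, n=30, alpha=0.01)
bigger_difference = power_upper_tail(100, 103, sigma=4, n=30, alpha=0.01)
bigger_alpha = power_upper_tail(100, 102, sigma=4, n=30, alpha=0.05)
bigger_sample = power_upper_tail(100, 102, sigma=4, n=60, alpha=0.01)

print(f"baseline:          {base:.3f}")
print(f"larger difference: {bigger_difference:.3f}")
print(f"larger alpha:      {bigger_alpha:.3f}")
print(f"larger sample:     {bigger_sample:.3f}")
```

Each change, made one at a time against the baseline, raises the power.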