Transcript 11.3, 11.4

1) The diastolic blood pressure for American women
aged 18-44 has approximately a Normal distribution with
population mean 75 mmHg and standard deviation 10 mmHg.
We suspect that regular exercise will lower blood pressure.
A random sample of 25 women who jog at least five
miles a week gives sample mean blood pressure 71
mmHg. Is this good evidence that the mean diastolic
blood pressure for the population of female regular
exercisers is lower than 75 mmHg?
2) A standard solution is supposed to have conductivity 5
microsiemens per centimeter. We know that
measurements of conductivity aren’t perfectly precise:
they vary according to a Normal distribution with mean
equal to the true conductivity and standard deviation
0.2 microsiemens per centimeter. Six measurements
of the solution’s conductivity are:
5.32 4.88 5.10 4.73 5.15 4.75
Is this evidence that the true conductivity (the mean of
the population of all measurements) is not 5
microsiemens per centimeter?
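Both problems above give a known population standard deviation, so one way to work them is a one-sample z test. The sketch below is only an illustration under that assumption. The helper name `z_statistic` is made up for this example, and the Normal areas come from Python's standard `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist, mean

def z_statistic(xbar, mu0, sigma, n):
    """One-sample z statistic when the population sigma is known."""
    return (xbar - mu0) / (sigma / sqrt(n))

std_normal = NormalDist()  # standard Normal: mean 0, sd 1

# Problem 1: Ha is mu < 75, so the P-value is the lower-tail area.
z_bp = z_statistic(xbar=71, mu0=75, sigma=10, n=25)
p_bp = std_normal.cdf(z_bp)

# Problem 2: Ha is mu != 5, so the P-value doubles the tail area beyond |z|.
conductivity = [5.32, 4.88, 5.10, 4.73, 5.15, 4.75]
z_cond = z_statistic(mean(conductivity), mu0=5, sigma=0.2, n=len(conductivity))
p_cond = 2 * std_normal.cdf(-abs(z_cond))

print(f"Problem 1: z = {z_bp:.2f}, P-value = {p_bp:.4f}")
print(f"Problem 2: z = {z_cond:.2f}, P-value = {p_cond:.4f}")
```

Problem 1 gives z = -2.00 with a P-value of about 0.023, which is fairly strong evidence that the mean is below 75 mmHg. Problem 2 gives a P-value of about 0.89, so the six measurements give no real evidence against a true conductivity of 5.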
11.3: Using Significance Tests
• Statistical significance is valued because it
points to an effect that is unlikely to occur
simply by chance.
• Widely used in reporting the results of
research in applied science, industry, and
legal proceedings.
• Some products require significant
evidence of effectiveness and safety.
Choosing a Level of Significance
• The purpose of a test of significance is to give a clear
statement of the degree of evidence provided by
the sample against the null hypothesis. The P-value
does exactly this!
• Sometimes we will make a decision if our
evidence reaches a certain standard, so we
need a standard to set the evidence against. That
standard is the level of significance.
• Ex1: Drug companies use the .01 level.
• Ex2: Lawsuits alleging racial discrimination in hiring
use the .05 level for evidence that the percent of
ethnic minorities hired is too low.
Choosing a Significance Level
• How plausible is the null hypothesis?
If the null represents an assumption that people
have believed for years, your significance level
should be small (you need strong evidence to reject it).
• What are the consequences of rejecting the null
hypothesis?
If rejecting the null means an expensive change,
you need strong evidence that the change will
bring about a profit or benefit to those having to
bear the expense.
Significant/Insignificant
• There is no sharp border between “significant”
and “insignificant”, only increasingly strong
evidence as the P-value decreases.
• It is always better to report the P-value, which allows
each of us to decide individually if the evidence is
sufficiently strong.
• Statistical significance is not the same as
practical importance! Pay attention to the actual
data as well as the P-value. Plot your data!
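To see why statistical significance is not the same as practical importance, here is a small sketch with made-up numbers. It assumes a known σ and a one-sided z test, and the 0.5 mmHg difference and both sample sizes are hypothetical, chosen only for illustration. The same tiny difference is nowhere near significant in a small sample and overwhelmingly significant in a huge one.

```python
from math import sqrt
from statistics import NormalDist

def lower_tail_p(xbar, mu0, sigma, n):
    """P-value for Ha: mu < mu0 with known sigma (one-sample z test)."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    return NormalDist().cdf(z)

# Hypothetical: the sample mean is only 0.5 mmHg below mu0 = 75 (sigma = 10).
p_small_sample = lower_tail_p(xbar=74.5, mu0=75, sigma=10, n=25)
p_huge_sample = lower_tail_p(xbar=74.5, mu0=75, sigma=10, n=10_000)

print(f"n = 25:     P-value = {p_small_sample:.4f}")
print(f"n = 10000:  P-value = {p_huge_sample:.2e}")
```

The size of the effect is the same in both runs. Only the sample size changed, which is why the P-value alone cannot say whether an effect matters in practice.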

Outliers and other considerations
• A few outlying observations can produce highly
significant results if you blindly apply common
tests of significance.
• Outliers can also destroy the significance of
otherwise convincing data.
• Faulty data collection, or testing a hypothesis
on the same data that suggested the hypothesis,
can invalidate a test.
• A confidence interval estimates the size of an
effect rather than simply asking if it is too large to
reasonably occur by chance alone.
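As an illustration of the last point, the sketch below computes a confidence interval for the jogger blood pressure problem (n = 25, sample mean 71 mmHg, σ = 10 mmHg). The 95% confidence level and the helper name `z_interval` are assumptions made for this example, not something the notes specify.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence=0.95):
    """Confidence interval for a mean when the population sigma is known."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sigma / sqrt(n)
    return xbar - margin, xbar + margin

# Joggers: n = 25, sample mean 71 mmHg, sigma = 10 mmHg (from the problem).
low, high = z_interval(xbar=71, sigma=10, n=25)
print(f"95% CI for the mean: ({low:.2f}, {high:.2f}) mmHg")
```

The interval runs from about 67.1 to 74.9 mmHg. It says how far below 75 the mean might plausibly be, which a bare "significant" verdict does not.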
P. 721, 11.43, 11.44 Statistical applet
11.4: Inference as Decision
Link Calculators: Download “Power” program.
• Using significance tests with a fixed alpha level
points to viewing the outcome of a test as a decision.
• If our result is significant at this level, we reject
the null hypothesis in favor of the alternative
hypothesis. Otherwise we fail to reject the null
hypothesis.
• Tests of significance concentrate on the null
hypothesis.

Scenario
• Present: Suspect and President of Student Court
• This student suspect has been arrested for stealing
paper clips from the main office.
• The suspect claims, “I was only fiddling with the paper
clips while waiting for an appointment with my
counselor.”
• You, the class, are the student court. If the student is
found guilty of the “crime,” the suspect will not be
allowed to attend the next school dance.
• The president of the student court announces, “This
student should be considered innocent unless there is
sufficient evidence to find them guilty.”
• That is, the court’s hypothesis is that the student is innocent,
and it is looking to see if the evidence against them is
sufficient to warrant rejecting that hypothesis and thus
finding them guilty.
• What types of errors can be made in this situation?
Type I/Type II Errors
Example 1
• A producer of bearings and the consumer
of the bearings agree that each carload
must meet certain quality standards. When
a carload arrives, the consumer inspects a
sample of the bearings. On the basis of
the sample outcome, the consumer
decides whether or not to
reject the carload.
Error Probabilities
We assess any rule for making decisions by looking at the probabilities
of the two types of error.
The mean diameter of a type of bearing is supposed to
be 2.000 cm. The bearing diameters vary Normally with
standard deviation 0.010 cm. When a lot of the bearings
arrives, the consumer takes an SRS of 5 bearings from
the lot and measures their diameters. The consumer
rejects the bearings if the sample mean diameter is
significantly different from 2.000 cm at the 5% significance
level. Find:
P(Type I Error)
P(Type II Error) when the mean is 2.015 cm.
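One way to work the bearing problem is to find the range of sample means that does not lead to rejection, then ask how likely that range is under each value of the true mean. The sketch below assumes a two-sided z test with the stated σ, and the variable names are invented for this example.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, alpha = 2.000, 0.010, 5, 0.05
se = sigma / sqrt(n)                          # sd of the sample mean
z_star = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value

# Fail to reject H0 when the sample mean lands between these limits.
lower, upper = mu0 - z_star * se, mu0 + z_star * se

# Type I error: reject H0 when mu really is 2.000; this equals alpha by design.
xbar_if_null = NormalDist(mu0, se)
p_type1 = 1 - (xbar_if_null.cdf(upper) - xbar_if_null.cdf(lower))

# Type II error: fail to reject H0 when mu is really 2.015.
xbar_if_alt = NormalDist(2.015, se)
p_type2 = xbar_if_alt.cdf(upper) - xbar_if_alt.cdf(lower)

print(f"Fail to reject H0 if {lower:.4f} <= xbar <= {upper:.4f}")
print(f"P(Type I)  = {p_type1:.3f}")
print(f"P(Type II) = {p_type2:.3f}")
```

P(Type I error) comes out to 0.05, the significance level, and P(Type II error) when the mean is 2.015 cm is about 0.08.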
Power
• It is usual to report the probability that a test does reject
the null hypothesis when an alternative is true. This
probability is the power of the test: power = 1 - P(Type II error).
• The higher this probability is, the more sensitive the test
is.
• High power is desirable!
• 80% power is becoming a standard.
• In order to calculate power, fix an alpha so we have a
fixed rule to reject H0; usually .05.
• 4 ways to increase power:
1) Increase alpha
2) Consider an alternative farther from your hypothesized
value
3) Increase n
4) Decrease σ
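The four ways to increase power listed above can be checked numerically. This is a rough sketch for a two-sided z test with known σ. The baseline numbers (µ0 = 100, alternative 103, σ = 10, n = 25) are hypothetical, and `power_two_sided` is a helper written for this example, not the “Power” calculator program.

```python
from math import sqrt
from statistics import NormalDist

def power_two_sided(mu0, mu_alt, sigma, n, alpha=0.05):
    """Power of a two-sided one-sample z test against the alternative mu_alt."""
    z_star = NormalDist().inv_cdf(1 - alpha / 2)
    shift = (mu_alt - mu0) / (sigma / sqrt(n))  # true mean in standard-error units
    std_normal = NormalDist()
    # P(z <= -z_star) + P(z >= z_star) when z is centered at `shift`.
    return std_normal.cdf(-z_star - shift) + (1 - std_normal.cdf(z_star - shift))

# Hypothetical baseline, then one change at a time.
base = dict(mu0=100, mu_alt=103, sigma=10, n=25, alpha=0.05)
scenarios = {
    "baseline": base,
    "1) larger alpha": {**base, "alpha": 0.10},
    "2) farther alternative": {**base, "mu_alt": 106},
    "3) larger n": {**base, "n": 100},
    "4) smaller sigma": {**base, "sigma": 5},
}
powers = {name: power_two_sided(**kw) for name, kw in scenarios.items()}
for name, value in powers.items():
    print(f"{name:24s} power = {value:.3f}")
```

Each of the four changes raises the power above the baseline. Note that raising alpha buys power at the cost of a larger P(Type I error).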

P-value vs. Power
• P-value: Describes what happens if the null
hypothesis is true
= Assumes H0 is true!
• Power: Describes what happens if the
alternative hypothesis is true
= Assumes Ha is true!
• Decide what alternatives the test should detect
and check that the power is adequate; power
depends on which parameter value under Ha we are
interested in.
Many homeowners buy detectors to check for the invisible gas radon in their
homes. We want to determine the accuracy of these detectors. To answer
this question, university researchers placed 12 radon detectors in a
chamber that exposed them to 105 picocuries per liter of radon. The
detector readings were as follows:
91.9 97.8 111.4 122.3 105.4 95.0
103.8 99.6 96.6 119.3 104.8 101.7
Assume that σ = 9 picocuries per liter of radon for the population of all radon
detectors. We want to determine if there is convincing evidence at the
10% significance level that the mean reading of all detectors of this type
differs from the true value 105, so our hypotheses are H0: µ = 105 and
Ha: µ ≠ 105. A significance test to answer this question was carried out.
The test statistic is z = –0.3336, and the P-value is 0.74.
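The reported test statistic and P-value can be reproduced from the twelve readings. The sketch below assumes the two-sided z test described in the problem and uses only Python's standard library.

```python
from math import sqrt
from statistics import NormalDist, mean

readings = [91.9, 97.8, 111.4, 122.3, 105.4, 95.0,
            103.8, 99.6, 96.6, 119.3, 104.8, 101.7]
mu0, sigma = 105, 9

xbar = mean(readings)
z = (xbar - mu0) / (sigma / sqrt(len(readings)))
p_value = 2 * NormalDist().cdf(-abs(z))  # two-sided: Ha is mu != 105

print(f"xbar = {xbar:.2f}, z = {z:.4f}, P-value = {p_value:.2f}")
```

The sample mean is about 104.13, which gives z of about -0.3336 and a P-value of about 0.74, matching the values stated in the problem.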
1) Describe what a Type I error would be in this situation.
2) Calculate the probability of a Type I error for this problem.
3) The researchers who carried out the study suspect that the large P-value is
due to low power. First describe what a Type II error would be in this
situation, then determine the probability of a Type II error when in fact µ
= 100. Finally, compute the power of the test against the alternative.
4) If the sample size is increased to n = 30, what will be the power against
the alternative, µ = 100? What happened to the power as the sample size
increased?
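Questions 2 through 4 can be checked with a short calculation. This is a sketch only, assuming the two-sided z test at alpha = 0.10 with σ = 9 from the problem. The helper `type2_and_power` is a name invented here.

```python
from math import sqrt
from statistics import NormalDist

def type2_and_power(mu0, mu_alt, sigma, n, alpha):
    """P(Type II error) and power for a two-sided z test with known sigma."""
    se = sigma / sqrt(n)
    z_star = NormalDist().inv_cdf(1 - alpha / 2)
    # Sample means between these limits do not lead to rejecting H0.
    lower, upper = mu0 - z_star * se, mu0 + z_star * se
    xbar_if_alt = NormalDist(mu_alt, se)  # sampling distribution when mu = mu_alt
    p_type2 = xbar_if_alt.cdf(upper) - xbar_if_alt.cdf(lower)
    return p_type2, 1 - p_type2

# Question 2: P(Type I error) is the significance level itself.
alpha = 0.10

# Questions 3 and 4: alternative mu = 100, first with n = 12, then n = 30.
beta_12, power_12 = type2_and_power(mu0=105, mu_alt=100, sigma=9, n=12, alpha=alpha)
beta_30, power_30 = type2_and_power(mu0=105, mu_alt=100, sigma=9, n=30, alpha=alpha)

print(f"P(Type I) = {alpha:.2f}")
print(f"n = 12: P(Type II) = {beta_12:.3f}, power = {power_12:.3f}")
print(f"n = 30: P(Type II) = {beta_30:.3f}, power = {power_30:.3f}")
```

With n = 12 the power against µ = 100 is about 0.61, so P(Type II error) is about 0.39. With n = 30 the power rises to about 0.92, which shows that increasing the sample size increases power.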