Transcript 11.1

11.1: Significance Tests





2nd type of inference
Assesses the evidence provided by the data
in favor of some claim about the population
Asks how likely an observed outcome would
be if the claim were true
Formal procedure for comparing observed
data with a hypothesis whose truth we want
to assess – in other words, we test a claim!
Begin with the unrealistic assumption that we
know the population standard deviation.
Ha/Ho!
STEP 1: P = Identify your Parameter
STEP 2: H = State your Hypotheses
• Make a claim and ask if the data give evidence
against it.
• What we want to show (find evidence for) becomes your Ha.
• Your statement of equality ("no effect") always becomes your
Ho; the effect is not present in the population. We try
to find evidence against it.
• Ho and Ha always refer to some population and thus
must be written in terms of a population parameter.
• One-sided/two-sided hypothesis
Some hypothesis notes…
Start with stating the alternative
hypothesis, since this is the effect that
we hope to find evidence for, then set
up the null hypothesis as the statement
that the hoped-for effect is not present
• If you do not have a specific direction
firmly in mind in advance, use a 2-sided
alternative.

Vehicle accidents can result in serious injuries
to drivers and passengers. When they do,
someone usually calls 911. Police,
firefighters, and paramedics respond to these
calls as fast as possible. A city decides to
record response times to all accidents
involving life-threatening emergencies, and
finds the mean response time to be 6.7 min. with a
std. dev. of 2 minutes. The city manager tells
them to “do better” next year. At the end of
the next year, the manager selects an SRS of
400 calls and examines the response times.
For this sample, the mean response time was
6.48 min. Does this data provide good
evidence that response times have
decreased since last year?
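Applying Steps 1 and 2 to this example gives something like the following. This is a sketch: the symbol \mu for this year's mean response time is notation assumed here, not taken from the slides. The alternative is one-sided because the question asks only whether times have decreased.

```latex
\[
\begin{aligned}
&\mu = \text{mean response time (min.) for all such calls this year} \\
&H_0 : \mu = 6.7 \\
&H_a : \mu < 6.7
\end{aligned}
\]
```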
Exploring “good evidence”
x-bar = 6.61
• This result could
occur just by chance
when the population
mean is 6.7. Not good
evidence of a decrease
in response time.
x-bar = 6.48
• This result is much
farther from the pop.
mean; an observed
value this small would
rarely occur by chance
if the true pop. mean is
6.7. Good evidence of a
decrease in response
time.
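One way to see "could occur just by chance" is to simulate it. The sketch below assumes that, when the population mean really is 6.7, the sample mean of 400 calls is approximately Normal with standard deviation 2/sqrt(400) = 0.1. It then counts how often a simulated sample mean comes out at or below each of the two observed values, 6.61 and 6.48. The names, the seed, and the number of trials are illustrative choices, not part of the slides.

```python
import random
from math import sqrt

random.seed(1)  # fixed seed so the run is repeatable

MU_0 = 6.7                 # population mean under Ho (minutes)
SIGMA = 2.0                # population std. dev., treated as known
N = 400                    # sample size
SD_XBAR = SIGMA / sqrt(N)  # std. dev. of the sample mean = 0.1

# Assumption: the sample mean is approximately Normal(MU_0, SD_XBAR) under Ho.
TRIALS = 100_000
sample_means = [random.gauss(MU_0, SD_XBAR) for _ in range(TRIALS)]


def share_at_or_below(cutoff):
    """Fraction of simulated sample means that are <= cutoff."""
    return sum(m <= cutoff for m in sample_means) / TRIALS


share_661 = share_at_or_below(6.61)
share_648 = share_at_or_below(6.48)
print(f"sample mean <= 6.61 in about {share_661:.1%} of samples")
print(f"sample mean <= 6.48 in about {share_648:.1%} of samples")
```

The first share should land near 18% and the second near 1.4%, which is why only the second result counts as good evidence of a decrease.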
Assumptions/Test Statistics

STEP 3: Assumptions: SRS, Normality,
Independence

STEP 4: Test Statistic
The test is based on a statistic that compares the value of
the parameter (as stated in Ho) with an estimate of
the parameter from the sample data.
Values of the estimate far from the parameter value in
the direction of Ha give evidence against Ho.
$$z = \frac{\text{estimate} - \text{hypothesized value}}{\text{standard deviation of the estimate}}$$
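For a test about a mean with known population standard deviation, the ratio above can be sketched as a small function. The function name and argument names are illustrative, and the standard deviation of the sample mean is taken to be sigma/sqrt(n). The numbers plugged in are from the response-time example.

```python
from math import sqrt


def z_statistic(sample_mean, mu_0, sigma, n):
    """(estimate - hypothesized value) / (std. dev. of the estimate)."""
    return (sample_mean - mu_0) / (sigma / sqrt(n))


# Response-time example: x-bar = 6.48, Ho mean = 6.7, sigma = 2, n = 400
z = z_statistic(sample_mean=6.48, mu_0=6.7, sigma=2.0, n=400)
print(round(z, 2))  # -2.2
```

A z of -2.2 says the sample mean sits 2.2 standard deviations below the value stated in Ho.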





We measure the strength of the evidence
against Ho by a probability computed from
our z-score: the P-value.
P-value = the probability, computed assuming Ho is true,
of a result at least as far out (in the direction of Ha)
as what we actually got.
A quantitative measure of just how unlikely a
given finding is, assuming Ho is true.
The lower the P-value, the stronger the
evidence against Ho; the observed value would be
unlikely to occur by chance if Ho were true.
Large p-values fail to give evidence against
Ho.
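Turning a z-score into a P-value depends on the direction of Ha. The sketch below uses the standard Normal distribution from Python's standard library. The function name and the labels "less", "greater", and "two-sided" are illustrative choices, not notation from the slides.

```python
from statistics import NormalDist


def p_value(z, alternative):
    """P-value for a z statistic under the standard Normal distribution.

    alternative: "less" (Ha: mu < mu_0), "greater" (Ha: mu > mu_0),
    or "two-sided" (Ha: mu != mu_0).
    """
    std_normal = NormalDist()  # mean 0, std. dev. 1
    if alternative == "less":
        return std_normal.cdf(z)
    if alternative == "greater":
        return 1 - std_normal.cdf(z)
    if alternative == "two-sided":
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")


# Response-time example: z = -2.2 with Ha: mu < 6.7
print(round(p_value(-2.2, "less"), 4))  # about 0.0139
```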





Significance level is the value (alpha) to which we
compare our p-value in order to determine
significance.
“Statistically significant” = not likely to happen by
chance alone when Ho is true
If p-value < alpha, the result is statistically significant at level alpha.
P-value allows us to assess significance at any level
we choose.
If you are going to draw a conclusion based on
statistical significance, then the level should be
chosen BEFORE the data is produced.
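The comparison itself is a one-line rule. As a sketch, with an illustrative function name, here is the P-value of about 0.014 from the response-time example checked against three common choices of alpha:

```python
def is_significant(p_value, alpha):
    """True when the P-value falls below the chosen significance level."""
    return p_value < alpha


p = 0.014  # P-value quoted for the response-time example
for alpha in (0.10, 0.05, 0.01):
    verdict = "significant" if is_significant(p, alpha) else "not significant"
    print(f"alpha = {alpha:.2f}: {verdict}")
```

The same result is significant at the 10% and 5% levels but not at the 1% level, which is why the level should be fixed before looking at the data.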

STEP 5: Conclusion
• If the true mean response time were still 6.7
minutes, there is about a 1.4% chance that the
manager would obtain a sample of 400 calls
with a mean response time of 6.48 minutes or
less.
• The small p-value provides STRONG
evidence AGAINST Ho and in favor of the
alternative Ha, so we conclude that the mean
response time appears to be less than 6.7
minutes.
Does the job satisfaction of assembly workers differ
when their work is machine-paced rather than self-paced? One study chose 18 subjects at random from
a group of women who worked at assembling
electronic devices. Half of them were assigned at
random to each of two groups. Both groups did
similar assembly work, but one work setup allowed
workers to pace themselves and the other featured
an assembly line that moved at fixed time intervals so
that the workers were paced by machine. After 2
weeks, all subjects took a test of job satisfaction.
Then they switched work setups, and took the test
after two more weeks. The response variable is the
difference in scores, self-paced – machine-paced.
The authors of the study want to know if the two
work conditions have different levels of job
satisfaction. Data from 18 workers gave x-bar = 17.
Assume the SRS, Normality, and Independence conditions
hold and that the pop. std. dev. = 60.
Do all Steps: PHATC
Values as far from 0 as x-bar=17 would
happen 23% of the time when the true
population mean is 0 (Ho). An outcome
that would occur so often when Ho is
true is not good evidence against Ho.
• Simple terms: Fail to reject Ho! P-value
too big.
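The 23% figure can be checked with a short calculation. This sketch assumes a two-sided alternative (the study asks only whether satisfaction differs), n = 18 workers, and a known population standard deviation of 60. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

x_bar = 17.0   # mean difference, self-paced minus machine-paced
mu_0 = 0.0     # Ho: no difference in mean satisfaction
sigma = 60.0   # population std. dev., treated as known
n = 18         # number of workers with data

z = (x_bar - mu_0) / (sigma / sqrt(n))
# Two-sided Ha (mu != 0): count both tails beyond |z|.
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"z = {z:.2f}, P-value = {p_two_sided:.2f}")  # z = 1.20, P-value = 0.23
```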

Using Significance Tests
Widely used in reporting the results of
research in applied science, industry,
and legal proceedings
• Some products require significant
evidence of effectiveness and safety
• Statistical significance is valued
because it points to an effect that is
unlikely to occur simply by chance

Same problem…different question!
Sulfur compounds cause “off-odors” in wine, so winemakers want
to know the odor threshold, the lowest concentration of a
compound that the human nose can detect. The odor threshold
for dimethyl sulfide (DMS) in trained wine tasters is about 25
micrograms per liter of wine (µg/L). The untrained noses of
consumers may be less sensitive, however. Here are the DMS
odor thresholds for 10 untrained students:
31 31 43 36 23 34 32 30 20 24
Assume that the standard deviation of the odor threshold for
untrained noses is known to be 7. Are you convinced that the
mean odor threshold for beginning students is higher than the
published threshold, 25 micrograms per liter of wine (µg/L)?
Carry out an appropriate significance test and then state your
conclusions clearly in complete English sentences.
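A possible sketch of the calculation step for this problem, assuming a one-sided alternative (mean threshold higher than 25) and taking the standard deviation of 7 as known. It covers only the test statistic and P-value. The parameter, hypotheses, conditions (with only 10 students, Normality deserves a look), and the written conclusion still have to be stated separately.

```python
from math import sqrt
from statistics import NormalDist, mean

thresholds = [31, 31, 43, 36, 23, 34, 32, 30, 20, 24]  # data from the problem
mu_0 = 25.0    # published threshold for trained tasters
sigma = 7.0    # std. dev. given as known

x_bar = mean(thresholds)
z = (x_bar - mu_0) / (sigma / sqrt(len(thresholds)))
p_upper = 1 - NormalDist().cdf(z)   # one-sided: Ha is mu > 25

print(f"x-bar = {x_bar}, z = {z:.2f}, P-value = {p_upper:.4f}")
```

The sample mean is 30.4, z comes out near 2.44, and the P-value is roughly 0.007.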

At the bakery where you work, loaves of bread are
supposed to weigh 1 pound. From experience, the weights
of loaves produced at the bakery follow a Normal
distribution with standard deviation σ = 0.13 pounds. You
believe that new personnel are producing loaves that are
heavier than 1 pound. As supervisor of Quality Control,
you want to test your claim at the 5% significance level.
You weigh 20 loaves and obtain a mean weight of 1.05
pounds.
1. Identify the population and parameter of interest. State
your null and alternative hypotheses.
2. Identify the statistical procedure you should use. Then
state and verify the conditions required for using this
procedure.
3. Calculate the test statistic and the P-value. Illustrate
using a graph.
4. State your conclusions clearly in complete sentences.
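For item 3, the arithmetic might look like the sketch below. It treats the 0.13-pound standard deviation as the known population value, uses a one-sided alternative (loaves heavier than 1 pound), and takes alpha = 0.05 as the level for the test. The variable names are illustrative, and the graph is left out.

```python
from math import sqrt
from statistics import NormalDist

mu_0, sigma = 1.0, 0.13   # target weight and known std. dev. (pounds)
n, x_bar = 20, 1.05       # loaves weighed and their mean weight
alpha = 0.05              # assumed significance level

z = (x_bar - mu_0) / (sigma / sqrt(n))
p_upper = 1 - NormalDist().cdf(z)   # one-sided: Ha is mu > 1

print(f"z = {z:.2f}, P-value = {p_upper:.3f}, significant: {p_upper < alpha}")
```

With z near 1.72 and a P-value near 0.043, the result falls just under the 0.05 cutoff.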