Business Statistics - QBM117

Introduction to hypothesis testing
Objectives

To introduce the second type of statistical inference: hypothesis testing.

To introduce the concept of hypothesis testing.

To gain a basic understanding of the methodology of
hypothesis testing.
Introduction to hypothesis testing

Hypothesis testing is another type of statistical inference
where, once again, decisions are based on sample data.

The purpose of hypothesis testing is to determine whether
the sample results provide sufficient statistical evidence to
support (or fail to support) a particular belief about a
population parameter.

Over the next few lectures we will develop a step-by-step
methodology that will enable us to test these beliefs.
The objective of hypothesis testing is captured
by this question:
Is the sample evidence consistent with a particular
hypothesized population parameter, or does the
sample evidence contradict the hypothesized
value?
By rejecting the plausibility of the initially
hypothesized value, we indirectly establish
the plausibility of an alternative hypothesized
value or range of values.
The null and alternative hypotheses
The null hypothesis, denoted $H_0$, is the challenged
hypothesis. It is the assertion we hold as true until we
have sufficient statistical evidence to conclude otherwise.
It always expresses a value of the population parameter
which we intend to subject to scrutiny, based on sample
data.
The purpose of scrutinizing the null hypothesis is
to determine whether there is support for the
alternative hypothesis, denoted $H_A$.
Examples of null and alternative hypotheses
The operations manager is concerned with determining
whether the process for filling 100g boxes of
Smarties is working properly.
If the manager wants to know whether the average fill
of the boxes is less than 100g, he would specify the
null and alternative hypotheses to be
$H_0: \mu = 100$
$H_A: \mu < 100$
If the manager wants to know whether the average fill
of the boxes is more than 100g, he would specify the
null and alternative hypotheses to be
$H_0: \mu = 100$
$H_A: \mu > 100$
If the manager wants to know whether the average fill
of the boxes differs from 100g, he would specify the
null and alternative hypotheses to be
$H_0: \mu = 100$
$H_A: \mu \neq 100$
The manager hopes to find that the filling process is
working properly. However, he may find that the sampled
boxes weigh too little ($H_A: \mu < 100$)
or too much ($H_A: \mu > 100$).
As a result he may decide to halt the production
process until the reason for the failure to fill to the
required weight of 100g is determined.
By analysing the difference between the weights obtained
from the sample and the expected weight of 100g, he can
reach a decision based on this sample information and
draw one of two conclusions: reject $H_0$ or do not reject $H_0$.
The test statistic
The test statistic is a sample statistic calculated from
the data. Its value is used in determining whether to
reject or not reject the null hypothesis.
When testing hypotheses about the population mean
$\mu$ when the population variance is known, the test
statistic will be $\bar{x}$ or its standardised value
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
as long as the population is normal or $n \ge 30$.
When testing hypotheses about the population mean
$\mu$ when the population variance is unknown, the test
statistic will be $\bar{x}$ or its standardised value
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
as long as the population is normal. This statistic has
$n - 1$ degrees of freedom.
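A similar sketch for the case where $\sigma$ is unknown, assuming the raw sample is available; `t_statistic_mean` and the five box weights are hypothetical. The standard library's `statistics.stdev` uses the $n - 1$ divisor, so it gives $s$ as used in the formula.

```python
import math
import statistics


def t_statistic_mean(sample: list[float], mu0: float) -> tuple[float, int]:
    """Return (t, degrees of freedom) for a mean when sigma is unknown.

    Assumes the population is normal.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation, divisor n - 1
    return (x_bar - mu0) / (s / math.sqrt(n)), n - 1


# Hypothetical fills (grams) for five boxes
t, df = t_statistic_mean([98.0, 99.0, 100.0, 97.0, 101.0], mu0=100.0)
print(round(t, 3), df)  # -1.414 4
```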
When testing hypotheses about the population proportion
$p$, the test statistic will be $\hat{p}$ or its standardised value
$$z = \frac{\hat{p} - p}{\sqrt{pq/n}}, \qquad q = 1 - p$$
as long as both $np \ge 5$ and $nq \ge 5$.
The rejection region
The sampling distribution of the test statistic is divided
into regions, a region of rejection (critical region) and a
region of non-rejection.
The rejection region consists of all values of the test
statistic for which H0 is rejected.
The non-rejection region consists of all values of the test
statistic for which H0 is not rejected.
The value that separates the rejection region from the non-rejection region is the critical value.
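The three kinds of rejection region can be sketched as a single check on a standardised test statistic. This is an illustration only: `in_rejection_region` is a hypothetical helper, the critical value is passed in as a positive number, and the values 1.96 and 1.645 below are placeholders here (how critical values are determined is discussed later in these notes).

```python
def in_rejection_region(stat: float, critical: float, tail: str) -> bool:
    """Return True if a standardised test statistic falls in the rejection region.

    `critical` is the positive critical value; `tail` is "two", "upper" or "lower".
    """
    if critical <= 0:
        raise ValueError("pass the critical value as a positive number")
    if tail == "two":    # rejection regions in both tails
        return stat < -critical or stat > critical
    if tail == "upper":  # rejection region in the upper tail only
        return stat > critical
    if tail == "lower":  # rejection region in the lower tail only
        return stat < -critical
    raise ValueError("tail must be 'two', 'upper' or 'lower'")


print(in_rejection_region(-2.4, 1.96, "two"))     # True
print(in_rejection_region(-2.4, 1.645, "upper"))  # False
```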
[Figure: Two-tailed hypothesis test, $H_A: \mu \neq 100$. A region of rejection lies in each tail of the sampling distribution, separated from the central region of non-rejection by a critical value. Horizontal axis: standardised values from -3 to 3.]
[Figure: Upper-tailed hypothesis test, $H_A: \mu > 100$. The region of rejection lies in the upper tail, beyond the critical value; the rest of the distribution is the region of non-rejection. Horizontal axis: standardised values from -3 to 3.]
[Figure: Lower-tailed hypothesis test, $H_A: \mu < 100$. The region of rejection lies in the lower tail, below the critical value; the rest of the distribution is the region of non-rejection. Horizontal axis: standardised values from -3 to 3.]
The decision rule
This is a rule that specifies the conditions under which the
null hypothesis will be rejected.
It is a mathematical representation of the region of
rejection as seen in the previous three slides.
For example, the rejection region in the second of these
slides (the upper-tailed test) might be described by
Reject $H_0$ if
$\bar{x}$ is greater than 110,
or the rejection region in the third slide (the lower-tailed
test) might be described by
Reject $H_0$ if
$\bar{x}$ is less than 90.
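The two example decision rules can be written almost word for word as code. In this sketch `decision` is a hypothetical function and the cut-offs 110 and 90 are simply the illustrative values from the slide; supplying both bounds gives a two-tailed rule.

```python
from typing import Optional


def decision(x_bar: float, lower: Optional[float] = None,
             upper: Optional[float] = None) -> str:
    """Apply a decision rule stated on the scale of the sample mean.

    Reject H0 if x_bar > upper (when given) or x_bar < lower (when given).
    """
    reject = ((upper is not None and x_bar > upper)
              or (lower is not None and x_bar < lower))
    return "Reject H0" if reject else "Do not reject H0"


print(decision(112.0, upper=110.0))  # Reject H0
print(decision(95.0, lower=90.0))    # Do not reject H0
```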
The critical value
Therefore, in order to illustrate these rejection regions and
describe them mathematically, we need to know the
critical value(s), i.e., the value or values which separate the
rejection region(s) from the non-rejection region.
How do we determine the critical value?

The determination of the critical value depends on the size
of the rejection region.

The size of the rejection region depends on the probability
of making an error when testing our hypotheses.

So, what are these errors?
Errors in hypothesis testing
A hypothesis test concludes with a decision to either reject
or not reject the null hypothesis. This decision, together
with whether the null hypothesis is actually true or not,
leads to two possible errors:

rejecting the null hypothesis when it is true, a type I error;

not rejecting the null hypothesis when it is false, a type II
error.
Because the decision we make and the conclusion we
reach are based on sample data, there is always a
possibility of making an error.
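The four combinations of decision and truth can be laid out explicitly. This is a sketch with a hypothetical `classify_outcome` function; in practice the truth of $H_0$ is unknown, which is exactly why the errors can occur.

```python
def classify_outcome(reject_h0: bool, h0_true: bool) -> str:
    """Label a test result given the decision and the (in practice unknown) truth."""
    if reject_h0 and h0_true:
        return "type I error"
    if not reject_h0 and not h0_true:
        return "type II error"
    return "correct decision"


for h0_true in (True, False):
    for reject_h0 in (True, False):
        print("H0 true:", h0_true, "| reject H0:", reject_h0,
              "|", classify_outcome(reject_h0, h0_true))
```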
Type I and type II errors

The probability of making a type I error is defined as $\alpha$. This
probability is also referred to as the level of significance.
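One way to see that $\alpha$ is the probability of a type I error is to simulate many samples from a population for which $H_0$ really is true and count how often a test rejects it. This sketch assumes a normal population with $\mu_0 = 100$ and $\sigma = 5$ (hypothetical values), samples of size 30, and a two-tailed $z$ test using 1.96, the standard normal critical value for $\alpha = 0.05$. The rejection rate should come out close to 0.05.

```python
import math
import random


def simulated_type_i_rate(trials: int = 20_000, n: int = 30, mu0: float = 100.0,
                          sigma: float = 5.0, z_crit: float = 1.96,
                          seed: int = 1) -> float:
    """Estimate P(reject H0 | H0 true) for a two-tailed z test by simulation."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # H0 is true: the population mean really is mu0
        x_bar = sum(rng.gauss(mu0, sigma) for _ in range(n)) / n
        z = (x_bar - mu0) / (sigma / math.sqrt(n))
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials


print(simulated_type_i_rate())  # roughly 0.05
```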

The probability of making a type II error is defined as $\beta$.
Ideally we would like to keep both $\alpha$ and $\beta$ as
small as possible. Unfortunately, for a fixed sample size,
as $\alpha$ decreases $\beta$ increases, and vice versa. Therefore,
the size of $\alpha$ is generally decided by the cost of
making a type I error.
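The trade-off between $\alpha$ and $\beta$ can be made concrete for an upper-tailed $z$ test with known $\sigma$. Under the assumed true mean $\mu_1$, the statistic $z$ is normal with mean $(\mu_1 - \mu_0)/(\sigma/\sqrt{n})$ and standard deviation 1, so $\beta$ is the probability that it still falls at or below the critical value. The numbers ($\mu_0 = 100$, $\mu_1 = 102$, $\sigma = 5$, $n = 25$) and the function name are hypothetical; the printed $\beta$ grows as $\alpha$ shrinks.

```python
import math
from statistics import NormalDist


def beta_upper_tailed(alpha: float, mu0: float, mu1: float,
                      sigma: float, n: int) -> float:
    """P(type II error) for an upper-tailed z test of H0: mu = mu0
    when the true mean is mu1 and sigma is known."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)       # critical value
    shift = (mu1 - mu0) / (sigma / math.sqrt(n))  # true mean in standard-error units
    # H0 is not rejected when z <= z_alpha; under mu1, z ~ N(shift, 1)
    return std_normal.cdf(z_alpha - shift)


# Hypothetical values: mu0 = 100, true mean 102, sigma = 5, n = 25
for alpha in (0.10, 0.05, 0.01):
    print(alpha, round(beta_upper_tailed(alpha, 100, 102, 5, 25), 3))
```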

The size of $\alpha$ is usually kept as small as possible,
generally a value between 1% and 10%.

Once the value of $\alpha$ is specified by the decision maker, the
size of the rejection region is known, because $\alpha$ is the
probability of rejecting the null hypothesis when it is true.

The size of this rejection region is therefore $\alpha$.
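Once $\alpha$ is fixed, the critical values for a $z$ test follow from the standard normal distribution: $\alpha/2$ goes in each tail for a two-tailed test, and all of $\alpha$ goes in one tail otherwise. This sketch uses `statistics.NormalDist` from the Python standard library; `z_critical_values` is a hypothetical name. It covers $z$ tests only: a $t$ test would need $t$-distribution critical values with $n - 1$ degrees of freedom, which the standard library does not provide.

```python
from statistics import NormalDist


def z_critical_values(alpha: float, tail: str) -> tuple[float, ...]:
    """Critical value(s) of the standard normal for significance level alpha."""
    std_normal = NormalDist()
    if tail == "two":    # alpha/2 in each tail
        c = std_normal.inv_cdf(1 - alpha / 2)
        return (-c, c)
    if tail == "upper":  # all of alpha in the upper tail
        return (std_normal.inv_cdf(1 - alpha),)
    if tail == "lower":  # all of alpha in the lower tail
        return (std_normal.inv_cdf(alpha),)
    raise ValueError("tail must be 'two', 'upper' or 'lower'")


for tail in ("two", "upper", "lower"):
    print(tail, [round(c, 3) for c in z_critical_values(0.05, tail)])
```

With $\alpha = 0.05$ this gives -1.96 and 1.96 for the two-tailed test, 1.645 for the upper-tailed test and -1.645 for the lower-tailed test, matching standard normal tables.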
[Figure: Two-tailed hypothesis test, $H_A: \mu \neq 100$, when $\alpha = 0.05$. The central region of non-rejection has area 0.95; each tail is a region of rejection of area $\alpha/2 = 0.025$, bounded by a critical value. Horizontal axis: standardised values from -3 to 3.]
[Figure: Upper-tailed hypothesis test, $H_A: \mu > 100$, when $\alpha = 0.05$. The region of non-rejection has area 0.95; the region of rejection in the upper tail, beyond the critical value, has area $\alpha = 0.05$. Horizontal axis: standardised values from -3 to 3.]
[Figure: Lower-tailed hypothesis test, $H_A: \mu < 100$, when $\alpha = 0.05$. The region of non-rejection has area 0.95; the region of rejection in the lower tail, below the critical value, has area $\alpha = 0.05$. Horizontal axis: standardised values from -3 to 3.]

Reading for next lecture


Chapter 10, sections 10.3 and 10.5
(Chapter 9, sections 9.3 and 9.5 abridged)