Transcript tests

STATISTICAL HYPOTHESIS
A statistical hypothesis is an assumption about a
POPULATION.
NULL hypothesis (H0) versus ALTERNATIVE hypothesis (H1)
Two-sided hypothesis:
H0: μ = 50
H1: μ ≠ 50
(H0 is valid only at μ = 50; all other possibilities are H1)
One-sided hypothesis:
H0: μ ≤ 50
H1: μ > 50
(or H0: μ ≥ 50 against H1: μ < 50)
(H0 is valid on one side of 50, H1 is valid on the other)
TEST CRITERION, CRITICAL POINT
Test criterion (TC) is a random variable whose distribution
is known under both H0 and H1.
Critical point (CP) is the boundary between the intervals in
which we accept or reject H0.
[Figure: distribution of the test criterion for H0 and distribution of the test criterion for H1 (the same distribution as in the case of H0, but with different parameters). The critical point CP separates the interval where we accept H0 from the interval where we reject H0.]
General concept of statistical test
We test H0: μ ≤ 50 against H1: μ > 50.
The population has "infinitely" many data points, with μ ≤ 50 under H0; this is a one-sided test.
[Figure: axis with the value 50 separating the region of H0 from the region of H1.]
Our question:
Can we, on the basis of a sample, reject with sufficient probability the hypothesis that the population has μ ≤ 50?
Now the question is:
Is the difference between the theoretical (blue) and experimental (red) distributions so
large that we can reject the null hypothesis with sufficient probability (in this case we
reject the null hypothesis),
or
is the difference between the theoretical (blue) and experimental (red) distributions so
small that we cannot rule out, with sufficient probability, that the population has mean μ ≤ 50
(in this case we cannot reject the null hypothesis)?
[Figure: distribution of the TEST CRITERION with the CRITICAL POINT separating the region where H0 is ACCEPTED from the region where H0 is REJECTED.]
TWO SIDED TEST
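[Figure: distribution of the test criterion for a two-sided test. H0 is rejected in the interval below the lower critical point a(α/2) and in the interval above the upper critical point b(α/2), each with probability α/2; H0 is accepted in the interval between the two critical points.]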
p-VALUE
p-value: the probability, assuming H0 is true, of observing a value of TC as extreme as
or more extreme than the one we observed. It is also referred to as the observed
significance level.
p < α ⇒ H0 is rejected
p > α ⇒ H0 is not rejected
[Figure: distribution of the test criterion with the critical point; on one side of it we reject H0, on the other we accept H0. The figure labels 1 − p as the probability of rejecting H0.]
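EVALUATION OF THE RESULTS
OF THE TEST (decision rules)
|TC| < CP ⇒ H0 IS ACCEPTED
|TC| > CP ⇒ H0 IS REJECTED
(two-sided test with critical points −CP and +CP; in a one-sided test, TC is compared with the single critical point on the side of H1)
p < α ⇒ H0 IS REJECTED
p > α ⇒ H0 IS ACCEPTED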
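The two decision rules can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this sketch, the two-sided rule assumes a symmetric distribution of the test criterion (such as the t-distribution), and the one-sided rule assumes that H1 lies in the upper tail. The test criteria and critical points are the ones from the worked examples in these slides; the p-value 0.03 is a hypothetical number.

```python
def decide_by_critical_point(tc: float, cp: float, two_sided: bool = True) -> str:
    """Compare the test criterion (TC) with a positive critical point (CP)."""
    # Two-sided test (symmetric distribution assumed): reject outside [-CP, +CP].
    # One-sided test (H1 in the upper tail assumed): reject when TC exceeds CP.
    statistic = abs(tc) if two_sided else tc
    return "reject H0" if statistic > cp else "do not reject H0"


def decide_by_p_value(p: float, alpha: float = 0.05) -> str:
    """Compare the p-value with the significance level alpha."""
    return "reject H0" if p < alpha else "do not reject H0"


# Hypsometer example (two-sided): TC = -2.72, CP = 2.145
print(decide_by_critical_point(-2.72, 2.145))
# Chlorine example (one-sided): TC = 1.494, CP = 1.717
print(decide_by_critical_point(1.494, 1.717, two_sided=False))
# Hypothetical p-value, for illustration only
print(decide_by_p_value(0.03))
```

Both rules always lead to the same decision for the same test, because the p-value falls below α exactly when the test criterion lies beyond the critical point.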
EXAMPLE
The accuracy of a hypsometer was examined. A known height (μ0 = 20
m) was measured 15 times. The sample mean is x̄ = 19.2 m and the
sample SD is S = 1.1 m. We want to know whether the hypsometer
is OK.
o formulation of H0 and H1:
H0: Measurements of the hypsometer are correct.
H0: μ = μ0 (the sample with statistics x̄ and S² is taken from a
population with normal distribution N(μ, σ²) where the parameter
μ is equal to the known value μ0).
H1: Measurements of the hypsometer are not correct.
H1: μ ≠ μ0 (the sample with statistics x̄ and S² is not taken
from a population with normal distribution N(μ, σ²) where the
parameter μ is equal to the known value μ0).
o α = 0.05:
to reject H0 we require a probability of at least 1 − α = 0.95.
o selection of test
test of mean for one sample
t = (x̄ − μ0) · √(n − 1) / S
t = (19.2 − 20) · √(15 − 1) / 1.1 = −2.72
o critical point
=TINV(0,05;14) = 2.145
(t quantile for α/2 = 0.05/2 = 0.025 and df = 15 − 1 = 14)
Because the t-distribution is symmetric, we have two critical points: −2.145 and +2.145.
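As a quick check of the arithmetic in this example, the test criterion can be recomputed in Python. This is a minimal sketch: the function name is made up, the critical point 2.145 is taken from the slides rather than computed, and the formula with √(n − 1) is used exactly as written on the slide.

```python
import math


def one_sample_t(sample_mean: float, mu0: float, s: float, n: int) -> float:
    """Test criterion of the one-sample test of the mean, as written on the slide."""
    return (sample_mean - mu0) * math.sqrt(n - 1) / s


t = one_sample_t(sample_mean=19.2, mu0=20.0, s=1.1, n=15)
critical_point = 2.145  # from the slides: TINV(0.05; 14)

# Two-sided test: reject H0 when t lies outside [-2.145, +2.145].
reject_h0 = abs(t) > critical_point
print(f"t = {t:.2f}, reject H0: {reject_h0}")  # t = -2.72, reject H0: True
```

Because −2.72 lies below the lower critical point −2.145, the test criterion falls in the interval of rejecting H0.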
TESTING PROCEDURE ON AN EXAMPLE
[Figure: t-distribution with the interval of NOT rejecting H0 between the critical points −2.145 and +2.145 and the intervals of rejecting H0 outside them. The test criterion −2.72 lies in the lower interval of rejecting H0, so H0 is rejected.]
TYPE I ERROR AND TYPE II ERROR,
THE POWER OF A STATISTICAL TEST
Type I error (α)
the error of rejecting a null hypothesis when it is actually true
Type II error (β)
the error of failing to reject ("accept") a null hypothesis when
in fact we should have rejected it
The power of a statistical test (1 − β)
is the probability that the test will reject a false null hypothesis.
FACTORS INFLUENCING POWER
o Type I error (position of critical value)
o Effect size (difference between H0 and H1)
o Variability
o Sample size
http://www.intuitor.com/statistics/CurveApplet.html
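The influence of the four factors can be tried out numerically. The sketch below uses a normal approximation for a one-sided test of a mean, power ≈ Φ(δ·√n/σ − z(1−α)), which is a simplification chosen here so that only the Python standard library is needed; it is not the method used elsewhere in these slides. The function name and all input values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(delta: float, sd: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power of a one-sided test of a mean (normal approximation)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha)   # position of the critical value (Type I error)
    shift = delta * sqrt(n) / sd    # standardized distance between H0 and H1
    return z.cdf(shift - z_crit)


# Hypothetical baseline: effect size 0.5, SD 1.0, n = 20, alpha = 0.05
base = approx_power(delta=0.5, sd=1.0, n=20)
print(f"baseline power:       {base:.3f}")
print(f"larger Type I error:  {approx_power(0.5, 1.0, 20, alpha=0.10):.3f}")
print(f"larger effect size:   {approx_power(1.0, 1.0, 20):.3f}")
print(f"larger variability:   {approx_power(0.5, 2.0, 20):.3f}")
print(f"larger sample size:   {approx_power(0.5, 1.0, 40):.3f}")
```

In this approximation, power rises with a larger Type I error, a larger effect size, and a larger sample size, and falls with larger variability.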
[Figures: how the power changes with the Type I error, the effect size, and the variability.]
PRACTICAL IMPORTANCE OF
POWER OF THE TEST
Power analysis - example
After the introduction of a new method of water treatment at a water company, the
chlorine content of the drinking water was monitored.
According to the standard, the allowed chlorine content of drinking water is
0.3 mg·l⁻¹.
Assess whether the actual chlorine content meets the requirements of the standard.
Moreover, we need to know how many samples must be taken so that the test
error with possibly serious consequences is not higher than 5 %.
A preliminary set of 23 samples was taken (content of Cl in water, in mg·l⁻¹):
0.10  0.15  0.25  0.15  0.30  0.25  0.25  0.30
0.55  0.70  0.70  0.25  0.20  0.15  0.65  0.55
0.30  0.35  0.30  0.25  0.80  0.35  0.50
This is an example of a one-sided test (we are interested only in exceeding the standard;
all mean values below the standard are OK).
H0: content of Cl ≤ 0.3 mg·l⁻¹
H1: content of Cl > 0.3 mg·l⁻¹
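One-sample t-test with a one-sided alternative, in R:
[Figure: R output annotated with the test criterion t, the degrees of freedom, and the p-value; p-value > α (0.05).]
We can also compare the test criterion (1.494) with the critical value of the t-distribution (1.717):
since 1.494 < 1.717, H0 is not rejected.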
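The slides run this test in R. As a cross-check, the test criterion can also be recomputed from the 23 preliminary samples with the Python standard library. This is a minimal sketch: the variable names are made up, the critical value 1.717 is taken from the slides rather than computed, and the usual form t = (x̄ − μ0) / (s / √n) with the sample SD s (divisor n − 1) is assumed.

```python
from math import sqrt
from statistics import mean, stdev

# Preliminary samples: content of Cl in water, mg/l
chlorine = [
    0.10, 0.15, 0.25, 0.15, 0.30, 0.25, 0.25, 0.30,
    0.55, 0.70, 0.70, 0.25, 0.20, 0.15, 0.65, 0.55,
    0.30, 0.35, 0.30, 0.25, 0.80, 0.35, 0.50,
]
mu0 = 0.3               # allowed content according to the standard
critical_value = 1.717  # from the slides (one-sided, alpha = 0.05, df = 22)

n = len(chlorine)
t = (mean(chlorine) - mu0) / (stdev(chlorine) / sqrt(n))

print(f"n = {n}, df = {n - 1}, t = {t:.3f}")
# One-sided test with H1 in the upper tail: reject only if t exceeds the critical value.
print("reject H0" if t > critical_value else "do not reject H0")
```

The computed value should come out close to the test criterion 1.494 reported above, which is below the critical value, so H0 is not rejected.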
When the null hypothesis is not rejected, we need to compute the actual power of the test.
Power is only 42 %. This means that about 58 % of tests would incorrectly "accept"
a false null hypothesis; this test is very unreliable.
Now we need to know how many samples we should take if we want to keep the
required Type II error of 0.05. We can adjust "delta" (the effect size) according to
our best knowledge.
Required sample size is 92.
If this sample size is unacceptable, we must raise the "errors" of the test. We start with
the Type I error, because we evaluated its practical consequences as not so problematic.
Now we need 61 samples.
If this sample size (62 samples) is still too big, we can also slightly raise the
Type II error, e.g. from 0.05 to 0.10.
After increasing the Type II error we need "only" 46 samples.
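The trade-off between the two errors and the sample size can be illustrated with a normal approximation, n ≈ ((z(1−α) + z(1−β)) · σ / δ)². This is a rough sketch only: the slides do not state the delta or the raised Type I error that were used, so `delta`, `sd`, and the α values below are hypothetical, the function name is made up, and the results are not expected to reproduce the sample sizes quoted above.

```python
from math import ceil
from statistics import NormalDist


def approx_sample_size(delta: float, sd: float,
                       alpha: float = 0.05, beta: float = 0.05) -> int:
    """Approximate sample size for a one-sided test of a mean (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha)  # quantile for the Type I error
    z_beta = z.inv_cdf(1 - beta)    # quantile for the Type II error
    return ceil(((z_alpha + z_beta) * sd / delta) ** 2)


sd = 0.2      # hypothetical: rounded SD of the preliminary samples
delta = 0.07  # hypothetical effect size in mg/l

n_strict = approx_sample_size(delta, sd, alpha=0.05, beta=0.05)
n_raised_alpha = approx_sample_size(delta, sd, alpha=0.10, beta=0.05)
n_raised_both = approx_sample_size(delta, sd, alpha=0.10, beta=0.10)

print("alpha = 0.05, beta = 0.05:", n_strict)
print("alpha = 0.10, beta = 0.05:", n_raised_alpha)
print("alpha = 0.10, beta = 0.10:", n_raised_both)
```

Raising either error lowers the required sample size, which is the same pattern as in the example above.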