Transcript H 0 - LICH
SUMMARY
Hypothesis testing
Self-engagement assessment
μ = 7.8
σ = 0.76
Null hypothesis
Null hypothesis: we assume that the populations without
and with the song are the same.
At the beginning of our calculations, we assume the null
hypothesis is true.
[Figure: two populations, labelled "song" and "no song"]
Hypothesis testing song
• population: μ = 7.8, σ = 0.76
• sample: n = 30, x̄ = 8.2
$$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{8.2 - 7.8}{0.76 / \sqrt{30}} \approx 2.85$$
The corresponding probability is 0.0022.
Because of such a low probability, we interpret 8.2 as a
significant increase over 7.8 caused by the undeniable
pedagogical qualities of the 'Hypothesis testing song'.
[Figure: sampling distribution with 7.8 and 8.2 marked on the axis]
Four steps of hypothesis testing
1. Formulate the null and the alternative hypothesis (this
includes choosing a one- or two-directional test).
2. Select the significance level α, the criterion by which
we decide whether to reject the claim being tested.
--- COLLECT DATA ---
3. Compute the p-value. The p-value is the probability that
the data would be at least as extreme as those
observed, if the null hypothesis were true.
4. Compare the p-value to the α-level. If p ≤ α, the
observed effect is statistically significant, the null is
rejected, and the alternative hypothesis is accepted.
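The four steps can be sketched in R with the numbers of the song example. This is only an illustration: the value α = 0.05 and the one-tailed direction are assumptions made here, and the variable names are not from the slides. Without intermediate rounding the statistic comes out near 2.88; the 2.85 quoted on the slide is what you get if the standard error is first rounded to 0.14.

```r
# Step 1: H0: mu = mu0, HA: mu > mu0 (one-tailed; assumed direction)
mu0   <- 7.8    # population mean (from the slides)
sigma <- 0.76   # population s.d. (from the slides)
n     <- 30     # sample size
xbar  <- 8.2    # sample mean

# Step 2: significance level (assumed here)
alpha <- 0.05

# Step 3: test statistic and one-tailed p-value
se <- sigma / sqrt(n)                 # standard error of the mean
z  <- (xbar - mu0) / se               # Z statistic
p  <- pnorm(z, lower.tail = FALSE)    # P(Z >= z) if H0 is true

# Step 4: compare the p-value with alpha
reject_h0 <- p <= alpha
cat("Z =", round(z, 2), " p =", signif(p, 2), " reject H0:", reject_h0, "\n")
```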
One-tailed and two-tailed
one-tailed (directional) test
two-tailed (non-directional) test
Z-critical value,
what is it?
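As a rough answer to the question: the Z-critical value is the point on the standard normal distribution that cuts off a tail area of α (one-tailed test) or α/2 in each tail (two-tailed test). A minimal sketch, assuming α = 0.05:

```r
alpha <- 0.05   # assumed significance level

# one-tailed test: the whole alpha sits in one tail
z_one_tailed <- qnorm(1 - alpha)

# two-tailed test: alpha is split between both tails
z_two_tailed <- qnorm(1 - alpha / 2)

cat("one-tailed critical Z:", round(z_one_tailed, 3), "\n")
cat("two-tailed critical Z: +/-", round(z_two_tailed, 3), "\n")
```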
NEW STUFF
Decision errors
• Hypothesis testing is prone to misinterpretation.
• It's possible that the students selected for the musical
lesson were already more engaged.
• And we wrongly attributed the high engagement score to the
song.
• Of course, it's unlikely to select by chance a sample with
a mean engagement of 8.2. The probability of doing so
is 0.0022, which is pretty low. Thus we concluded it is unlikely.
• But it's still possible to randomly obtain a sample
with such a mean.
Four possible things can happen
                      Decision
State of the world    Reject H0    Retain H0
H0 true               1            3
H0 false              2            4
In which cases did we make a wrong decision?
Four possible things can happen
                      Decision
State of the world    Reject H0           Retain H0
H0 true               Type I error        correct decision
H0 false              correct decision    Type II error
In which cases did we make a wrong decision? In cases 1 and 4:
rejecting a true H0 (Type I error) and retaining a false H0
(Type II error).
Type I error
• When there really is no difference between the
populations, random sampling can lead to a difference
large enough to be statistically significant.
• You reject the null, but you shouldn't.
• False positive: the person doesn't have the disease, but
the test says they do.
Type II error
• When there really is a difference between the populations,
random sampling can lead to a difference small enough to
be not statistically significant.
• You do not reject the null, but you should.
• False negative: the person has the disease, but the test
doesn't pick it up.
• Type I and II errors are theoretical concepts. When you
analyze your data, you don't know if the populations are
identical. You only know the data in your particular samples.
You will never know whether you made one of these
errors.
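Although you never know whether a particular decision was an error, a simulation can show how often a Type I error happens in the long run. The sketch below draws many samples from a population where H0 is true by construction and counts how often a two-tailed Z-test rejects it. The seed, the number of simulated experiments, and α = 0.05 are assumptions chosen for illustration.

```r
set.seed(1)      # arbitrary seed, for reproducibility only
mu0   <- 7.8     # population mean (from the slides)
sigma <- 0.76    # population s.d. (from the slides)
n     <- 30      # sample size
alpha <- 0.05    # assumed significance level
n_sim <- 2000    # number of simulated experiments (assumed)

reject <- replicate(n_sim, {
  x <- rnorm(n, mean = mu0, sd = sigma)      # H0 is true here
  z <- (mean(x) - mu0) / (sigma / sqrt(n))   # Z statistic
  2 * pnorm(-abs(z)) <= alpha                # two-tailed decision
})

type1_rate <- mean(reject)   # share of false positives
cat("Simulated Type I error rate:", type1_rate, "\n")
```

The simulated rate should land close to α, which is exactly what the significance level controls.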
The trade-off
• If you set the α level to a very low value, you will make few
Type I errors.
• But by reducing the α level you also increase the chance of
a Type II error.
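The trade-off can be made concrete by computing the Type II error probability β of a two-tailed Z-test for several α levels. The sketch assumes a hypothetical true mean of 8.0 (not from the slides) together with the population values of the song example; `type2_prob` is a helper name introduced here.

```r
mu0     <- 7.8    # mean under H0 (from the slides)
sigma   <- 0.76   # population s.d. (from the slides)
n       <- 30     # sample size
mu_true <- 8.0    # hypothetical true mean, assumed for illustration

# shift of the Z statistic when the true mean is mu_true
delta <- (mu_true - mu0) / (sigma / sqrt(n))

# beta = P(fail to reject H0 | true mean is mu_true)
type2_prob <- function(alpha) {
  z_crit <- qnorm(1 - alpha / 2)
  pnorm(z_crit - delta) - pnorm(-z_crit - delta)
}

alphas <- c(0.10, 0.05, 0.01)
betas  <- sapply(alphas, type2_prob)
print(data.frame(alpha = alphas, beta = round(betas, 3)))
```

Under these assumptions, β grows as α shrinks, which is the trade-off stated above.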
Clinical trial for a novel drug
• Drug that should treat a disease for which there exists no
therapy.
• If the result is statistically significant, the drug will be
marketed.
• If the result is not statistically significant, work on the drug
will cease.
• Type I error: treat future patients with an ineffective drug.
• Type II error: cancel the development of a functional drug
for a condition that is currently not treatable.
• Which error is worse?
• I would say Type II error. To reduce its risk, it makes
sense to set α = 0.10 or even higher.
Harvey Motulsky, Intuitive Biostatistics
Clinical trial for a me-too drug
• Drug that should treat a disease for which there already
exists another therapy.
• Again, if the result is statistically significant, the drug will be
marketed.
• Again, if the result is not statistically significant, work on
the drug will cease.
• Type I error: treat future patients with an ineffective drug.
• Type II error: cancel the development of a functional drug
for a condition that can be treated adequately with
existing drugs.
• Thinking scientifically (not commercially), I would minimize
the risk of Type I error (set α to a very low value).
Harvey Motulsky, Intuitive Biostatistics
Engagement example, n = 30
H0: μ = μ_song
HA: μ ≠ μ_song
μ = 7.8
σ = 0.76
n = 30
x̄ = 8.06 (Z = 1.87)
μ_song = 7.91 (Z = 0.79)
α = 0.05, two-tailed test
[Figure: standard normal curve centred at Z = 0]
www.udacity.com - Statistics
Engagement example, n = 30
Which of these four quadrants represents the result
of our hypothesis test?
                      Decision
State of the world    Reject H0    Retain H0
H0 true
H0 false                           X
Engagement example, n = 50
H0: μ = μ_song
HA: μ ≠ μ_song
μ = 7.8
σ = 0.76
n = 50
x̄ = 8.06 (Z = 2.42)
μ_song = 7.91 (Z = 1.02)
α = 0.05, two-tailed test
[Figure: standard normal curve centred at Z = 0]
Engagement example, n = 50
Which of these four quadrants represents the result
of our hypothesis test?
                      Decision
State of the world    Reject H0    Retain H0
H0 true
H0 false              X
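The two engagement examples can be checked numerically. The sketch below recomputes the Z statistic for n = 30 and n = 50 with the same sample mean and compares it with the two-tailed critical value. `z_test` is a helper name introduced here for illustration.

```r
mu0   <- 7.8    # population mean under H0
sigma <- 0.76   # population s.d.
xbar  <- 8.06   # sample mean
alpha <- 0.05   # two-tailed significance level

z_crit <- qnorm(1 - alpha / 2)   # two-tailed critical value

# Z statistic as a function of the sample size
z_test <- function(n) (xbar - mu0) / (sigma / sqrt(n))

z30 <- z_test(30)
z50 <- z_test(50)

cat("n = 30: Z =", round(z30, 2), " reject H0:", abs(z30) > z_crit, "\n")
cat("n = 50: Z =", round(z50, 2), " reject H0:", abs(z50) > z_crit, "\n")
```

Since the true song mean is given as 7.91, H0 is false in both cases. Retaining H0 with n = 30 is therefore a Type II error, while the larger sample leads to the correct rejection.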
Population of students that did not attend the musical lesson:
parameters μ0 and σ0 are known.
Population of students that did attend the musical lesson:
parameters μ and σ are unknown.
Sample: the statistic x̄ is known.
Test statistic
$$Z = \frac{\bar{x} - \mu_0}{\sigma_0 / \sqrt{n}}$$
Z-test
We use the Z-test if we know the population
mean μ0 and the population s.d. σ0.
New situation
• The average engagement score in a population of 100
students is 7.5.
• A sample of 50 students was exposed to the musical
lesson. Their mean engagement score was 7.72, with an
s.d. of 0.6.
• DECISION: Does a musical performance lead to a
change in the students' engagement? Answer YES/NO.
• Set up a hypothesis test, please.
Hypothesis test
• H0: μ0 = μ
• H1: μ0 ≠ μ
• In this case, a two-sided test is the only way to test the null.
You compare the sample mean of 7.72 with the population mean of
7.5. It seems that the sample mean is larger than the population
mean (7.72 > 7.5), but the sample s.d. is 0.6. You can't set up a
one-tailed test, as you can't guess the correct direction of the
relationship. Actually, you could very easily miss the correct
direction.
• α = 0.05
Formulate the test statistic
$$Z = \frac{\bar{x} - \mu_0}{\sigma_0 / \sqrt{n}}$$
Population of students that did not attend the musical lesson:
μ0 known, σ0 unknown.
The formula needs σ0, but this is unknown!
• Instead of σ0 we only know the sample s.d.
• We can use it as the point estimate of the population
s.d.
• However, this will estimate the s.d. of the population
exposed to the musical lesson, while σ0 in the above
formula is for the "unperturbed" population.
• In this case, it is common to make the assumption
that both populations have the same standard
deviation.
Population of students that did attend the musical lesson:
μ and σ unknown.
Sample: x̄ and s.
t-statistic
$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$
one-sample t-test (Czech: jednovýběrový t-test)
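Applied to the "new situation" above (μ0 = 7.5, n = 50, x̄ = 7.72, s = 0.6, two-tailed α = 0.05), the t-statistic can be computed as follows. The slides do not state the result of this test; the sketch only plugs their numbers into the formula.

```r
mu0   <- 7.5    # population mean
n     <- 50     # sample size
xbar  <- 7.72   # sample mean
s     <- 0.6    # sample s.d., used in place of the unknown sigma0
alpha <- 0.05   # two-tailed significance level

t_stat <- (xbar - mu0) / (s / sqrt(n))       # one-sample t-statistic
df     <- n - 1                              # degrees of freedom
t_crit <- qt(1 - alpha / 2, df)              # two-tailed critical value
p      <- 2 * pt(-abs(t_stat), df)           # two-tailed p-value

cat("t =", round(t_stat, 2), " critical t = +/-", round(t_crit, 3),
    " p =", signif(p, 2), "\n")
cat("Reject H0:", abs(t_stat) > t_crit, "\n")
```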
Choose the correct alternative in the following statements:
1. The larger/smaller the value of x̄, the stronger the
evidence that μ > μ0.
2. The larger/smaller the value of x̄, the stronger the
evidence that μ < μ0.
3. The further the value of x̄ from μ0 in either direction, the
stronger/weaker the evidence that μ ≠ μ0.
t-distribution
One-sample t-test
$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$
H0: μ = μ0
HA: μ < μ0, or μ > μ0, or μ ≠ μ0
α level
Quiz
$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$
• What will increase the t-statistic? Check all that apply.
1. A larger difference between x̄ and μ0.
2. Larger s.
3. Larger n.
4. Larger standard error.
Z-test vs. t-test
• Use the Z-test if
  • you know the standard deviation of the population, or
  • you know the sample standard deviation s AND you have a large
  sample size (traditionally over 30). In addition, you assume that
  the population standard deviation is the same as the sample
  standard deviation.
• Use the t-test if
  • you don't know the population standard deviation (you know only
  the sample standard deviation s) and have a relatively small
  sample size.
• Tip: If you know only the sample standard deviation,
always use the t-test.
• For a two-sided test and α = 0.05, what are the critical
values of the Z- and t-distributions?
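The critical values asked about above can be looked up with `qnorm()` and `qt()`. The degrees of freedom in this sketch are arbitrary examples; the t-critical value is always larger than the Z-critical value and approaches it as the d.f. grow.

```r
alpha <- 0.05

z_crit <- qnorm(1 - alpha / 2)            # two-tailed Z-critical value

df     <- c(5, 10, 24, 30, 100, 1000)     # example degrees of freedom
t_crit <- qt(1 - alpha / 2, df)           # two-tailed t-critical values

print(data.frame(df = df, t_crit = round(t_crit, 3)))
cat("Z-critical value:", round(z_crit, 3), "\n")
```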
Typical example of one-sample t-test
• You have to prepare 20 tubes with a 30% solution of NaCl.
When you're finished, you measure the strength of the 20
solutions. The mean strength is 31.5%, with an s.d. of
1.15%.
• Decide whether you have a 30% solution or not.
• μ0 = 30%
• H0: μ = 30%, H1: μ ≠ 30%
• You use the t-test in such a situation.
• You could use the Z-test if you had a large sample (e.g., you
prepared 100 tubes), but generally it is always correct to
use the t-test.
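The NaCl example can be worked through from its summary statistics. The slides do not give the outcome; the sketch below assumes a two-tailed test at α = 0.05.

```r
mu0   <- 30      # target concentration in %
n     <- 20      # number of tubes
xbar  <- 31.5    # mean measured strength in %
s     <- 1.15    # sample s.d. in %
alpha <- 0.05    # assumed significance level

t_stat <- (xbar - mu0) / (s / sqrt(n))   # one-sample t-statistic
df     <- n - 1
t_crit <- qt(1 - alpha / 2, df)          # two-tailed critical value

cat("t =", round(t_stat, 2), " d.f. =", df,
    " critical t = +/-", round(t_crit, 3), "\n")
cat("Reject H0 (solution is not 30%):", abs(t_stat) > t_crit, "\n")
```

With the raw measurements in a vector `x` (hypothetical name), the same test would be `t.test(x, mu = 30)`.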
Dependent t-test for paired samples
• Two samples are dependent when the same subject
takes the test twice.
• paired t-test (Czech: párový t-test)
• This is a two-sample test, as we work with two samples.
• Examples of such situations:
  • Each subject is assigned to two different conditions (e.g., use
  a QWERTZ keyboard and an AZERTY keyboard and compare the error
  rate).
  • Pre-test vs. post-test.
  • Growth over time.
Example
• 25 students attended a normal lesson. Their mean
engagement is x̄_N = 5.08.
• The same 25 students then heard the 'Hypothesis testing
song'. Their mean engagement score is x̄_S = 7.80.
student      x_i    y_i    D_i = x_i − y_i
student 1    x_1    y_1    D_1
student 2    x_2    y_2    D_2
⋮            ⋮      ⋮      ⋮
student n    x_n    y_n    D_n
(The two score columns hold each student's "song" and "no song" scores.)
Do the hypothesis test
• Now we follow the same procedure as for the one-sample
t-test, except that we use the values of the differences D.
• What will be the null? n = 25, x̄_N = 5.08, x̄_S = 7.8
  • H0: μ_N = μ_S
  • But this is equivalent to stating H0: μ_N − μ_S = 0
• And the alternative?
  • HA: μ_N ≠ μ_S
• What is our point estimate for μ_N − μ_S?
  • x̄_N − x̄_S = 5.08 − 7.8 = −2.72
Do the hypothesis test
• What else do we need to calculate a t-statistic?
• We need the standard deviation s of the differences.
• We have a paired samples table, so we know each value,
and we can easily calculate s (do not forget, you're dividing
by n − 1!).
• Let's say it is s = 3.69.
• The t-statistic:
$$t = \frac{\bar{x}_N - \bar{x}_S}{s / \sqrt{n}} = \frac{-2.72}{3.69 / \sqrt{25}} = -3.68$$
• Do we reject the null or do we fail to reject the null at
α = 0.05?
• Critical values for d.f. = n − 1 = 24 for two-tailed α = 0.05 are ±2.064.
• We reject the null.
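The paired example can be reproduced from its summary statistics. Without rounding, the statistic is about −3.686, which agrees with the value on the slide up to rounding.

```r
n      <- 25      # number of students (pairs)
d_mean <- -2.72   # mean difference, 5.08 - 7.80
s_d    <- 3.69    # s.d. of the differences (value assumed on the slide)
alpha  <- 0.05    # two-tailed significance level

t_stat <- d_mean / (s_d / sqrt(n))   # paired t-statistic on the differences
df     <- n - 1
t_crit <- qt(1 - alpha / 2, df)      # two-tailed critical value

cat("t =", round(t_stat, 3), " critical t = +/-", round(t_crit, 3), "\n")
cat("Reject H0:", abs(t_stat) > t_crit, "\n")
```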
Dependent samples
• e.g., give one person two different conditions to see how
they react. Maybe one control and one treatment, or
two types of treatments.
• Advantages
  • we can use fewer subjects
  • cost-effective
  • less time-consuming
• Disadvantages
  • carry-over effects
  • order may influence results
Independent samples
• Disadvantages of dependent samples become
advantages of independent samples and vice versa.
  • We need more subjects, it's generally more time-consuming and
  more expensive.
  • No carry-over effects (each subject only gets one treatment).
• Everything else is the same:
  • H0: μ1 − μ2 = 0, H1: μ1 − μ2 ≠ 0
  • test statistic:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{SE}$$
  • Reject H0 if p ≤ α, fail to reject H0 if p > α.
Independent samples
• However, the standard error changes because it is based on
two sample sizes and two standard deviations.
• If we subtract one normally distributed variable from another,
the difference is again normally distributed:
$$N(\mu_1, \sigma_1) - N(\mu_2, \sigma_2) = N\left(\mu_1 - \mu_2, \sqrt{\sigma_1^2 + \sigma_2^2}\right)$$
• Similarly, for the sample:
$$s.d. = \sqrt{s_1^2 + s_2^2}$$
This is true only if the two samples are independent!
• standard error (for two samples of the same size n):
$$\frac{s.d.}{\sqrt{n}} = \frac{\sqrt{s_1^2 + s_2^2}}{\sqrt{n}} = \sqrt{\frac{s_1^2 + s_2^2}{n}} = \sqrt{\frac{s_1^2}{n} + \frac{s_2^2}{n}}$$
Independent samples
• With different sample sizes n1 and n2, each sample variance
is divided by its own sample size, and the standard error
becomes
$$SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
An example
• Again, the musical lesson.
• Let's teach n_N = 10 students without the musical
performance, and expose a different n_S = 20 students to the
song.
• What will be the null and the alternative?
  • H0: μ_N = μ_S, HA: μ_N ≠ μ_S
• Which direction will we use?
  • two-tailed
An example
• n_N = 10, n_S = 20
• x̄_N = 5.08, s_N = 2.65
• x̄_S = 7.80, s_S = 2.18
• Standard error
$$SE = \sqrt{\frac{s_N^2}{n_N} + \frac{s_S^2}{n_S}} = \sqrt{\frac{2.65^2}{10} + \frac{2.18^2}{20}} = 0.97$$
• Calculate the t-statistic
$$t = \frac{\bar{x}_N - \bar{x}_S}{SE} = \frac{5.08 - 7.80}{0.97} = -2.80$$
• How will you proceed further?
  • calculate d.f., define α, find the critical t-value, compare the
  t-statistic with the t-critical, decide about the null
An example
• d.f. = 10 + 20 − 2 = 28
• t-critical value for α = 0.05 is ±2.048
• Reject or fail to reject the null?
  • Reject the null.
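The whole independent-samples example can be recomputed from the summary statistics. The sketch follows the slides in using d.f. = n_N + n_S − 2; note that R's own `t.test()` uses the Welch approximation for the d.f. by default (`var.equal = FALSE`), so its result on raw data can differ slightly.

```r
n_N <- 10;  xbar_N <- 5.08;  s_N <- 2.65   # no song
n_S <- 20;  xbar_S <- 7.80;  s_S <- 2.18   # song
alpha <- 0.05                              # two-tailed significance level

se     <- sqrt(s_N^2 / n_N + s_S^2 / n_S)  # standard error of the difference
t_stat <- (xbar_N - xbar_S) / se           # t-statistic
df     <- n_N + n_S - 2                    # d.f. as used on the slide
t_crit <- qt(1 - alpha / 2, df)            # two-tailed critical value

cat("SE =", round(se, 2), " t =", round(t_stat, 2),
    " critical t = +/-", round(t_crit, 3), "\n")
cat("Reject H0:", abs(t_stat) > t_crit, "\n")
```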
Summary of t-tests
• one-sample test (Czech: jednovýběrový test)
  • you test H0: μ = μ0
• two-sample test (Czech: dvouvýběrový test)
  • you test H0: μ1 − μ2 = 0
  • dependent samples
    • paired t-test (Czech: párový test)
  • independent samples
    • equal variances, σ1 ≈ σ2
    • unequal variances, σ1 ≠ σ2
F-test of equality of variances
• How do we know whether our variances are equal or not?
• var.test() in R, H0: σ1 = σ2
• The test statistic is a ratio of two variances. It has an
F-distribution. The numerator and the denominator each have
their own number of d.f.
source: Wikipedia
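A minimal sketch of `var.test()` on simulated data. The two vectors are generated here purely for illustration (the slides give only summary statistics), and their names are hypothetical.

```r
set.seed(42)   # arbitrary seed, for reproducibility only

# simulated engagement scores, NOT real data
no_song <- rnorm(10, mean = 5, sd = 2.6)
song    <- rnorm(20, mean = 8, sd = 2.2)

res <- var.test(no_song, song)   # H0: the two variances are equal

# the F statistic is simply the ratio of the two sample variances
f_manual <- var(no_song) / var(song)

print(res$statistic)   # F
print(res$parameter)   # numerator and denominator d.f.
print(res$p.value)
```

If the p-value is above α, there is no evidence against equal variances, which is the situation in which `t.test(..., var.equal = TRUE)` is usually considered.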
t-test in R
• t.test()
• Let's have a look at the R manual:
http://stat.ethz.ch/R-manual/R-patched/library/stats/html/t.test.html
• See my website for a link to a PDF explaining the various
t-tests in R (with examples).
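The three kinds of t-test covered so far map onto `t.test()` as sketched below. The data are simulated for illustration only, and the variable names are hypothetical.

```r
set.seed(7)   # arbitrary seed, for reproducibility only

# simulated scores for 25 students, NOT real data
before <- rnorm(25, mean = 5, sd = 2)
after  <- before + rnorm(25, mean = 2.5, sd = 3)

# one-sample t-test: H0: mu = 7.5
one_sample <- t.test(after, mu = 7.5)

# paired t-test: the same students measured twice
paired <- t.test(before, after, paired = TRUE)

# a paired test is a one-sample test on the differences
on_diffs <- t.test(before - after, mu = 0)

# independent two-sample t-test with equal variances assumed
# (treating the two vectors as separate groups, for illustration only)
independent <- t.test(before, after, var.equal = TRUE)

print(paired)
```

Dropping `var.equal = TRUE` gives the default Welch test for unequal variances.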
Assumptions
1. Unpaired t-tests are highly sensitive to violations of
the independence assumption.
2. The populations the samples come from should be
approximately normal.
  • This is less important for large sample sizes.
• What to do if these assumptions are not fulfilled:
  1. Use the paired t-test.
  2. Let's see further.
Check for normality: histogram
Check for normality: QQ-plot
qqnorm(rivers)
qqline(rivers)
Check for normality: tests
• The graphical methods for checking data normality still
leave much to your own interpretation. If you show any of
these plots to ten different statisticians, you can get ten
different answers.
• H0: Data follow a normal distribution.
• Shapiro-Wilk test
• shapiro.test(rivers):
    Shapiro-Wilk normality test
    data: rivers
    W = 0.6666, p-value < 2.2e-16
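The three checks can be run together on the built-in `rivers` dataset. The `pdf(NULL)` line is an addition made here so that the sketch also runs in a non-interactive session without writing a file; drop it (and `dev.off()`) to see the plots on screen. The α = 0.05 threshold is an assumption.

```r
pdf(NULL)                 # null graphics device (assumption: non-interactive run)

hist(rivers)              # histogram: look for a roughly bell-shaped form
qqnorm(rivers)            # QQ-plot: points should follow a straight line
qqline(rivers)

invisible(dev.off())      # close the graphics device

sw <- shapiro.test(rivers)    # H0: data follow a normal distribution
alpha <- 0.05                 # assumed significance level

cat("W =", round(unname(sw$statistic), 4), "\n")
cat("Reject normality:", sw$p.value <= alpha, "\n")
```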
Nonparametric statistics
• Small samples from considerably non-normal
distributions.
• non-parametric tests
  • No assumption about the shape of the distribution.
  • No assumption about the parameters of the distribution (thus they
  are called non-parametric).
• Simple to do; however, their theory is extremely
complicated. Of course, we won't cover it at all.
• However, they are less powerful than their parametric
counterparts when the normality assumption holds.
• So if your data fulfill the assumptions about normality, use
parametric tests (t-test, F-test).
Nonparametric tests
• If the normality assumption of the t-test is violated, and
the sample sizes are too small, then its nonparametric
alternative should be used.
• The nonparametric alternative of the t-test is the Wilcoxon test.
• wilcox.test()
• http://stat.ethz.ch/R-manual/R-patched/library/stats/html/wilcox.test.html
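A minimal sketch of `wilcox.test()` for independent and for paired samples. The data are simulated from a skewed distribution purely for illustration, and all variable names are hypothetical.

```r
set.seed(3)   # arbitrary seed, for reproducibility only

# two small independent samples from a skewed distribution, NOT real data
no_song <- rexp(8, rate = 1 / 5)
song    <- rexp(9, rate = 1 / 8)

# independent samples: Wilcoxon rank-sum (Mann-Whitney) test
two_sample <- wilcox.test(no_song, song)

# paired samples: every "after" score is larger than its "before" score
before <- rexp(10, rate = 1 / 5)
after  <- before + rexp(10, rate = 1 / 2)

# paired samples: Wilcoxon signed-rank test
paired <- wilcox.test(before, after, paired = TRUE)

print(two_sample$p.value)
print(paired$statistic)   # V: sum of ranks of the positive differences
print(paired$p.value)
```

The call mirrors `t.test()`: the same `paired` argument switches between the independent and the dependent version.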