Inferential Statistics 2

Maarten Buis
January 11, 2006
Outline
• Student recap
• Sampling distribution
• Hypotheses
• Type I and II errors and power
• Testing means
• Testing correlations
Sampling distribution
• PrdV example from last lecture.
• If H0 is true, then the population consists of 16 million persons, of which 41% (= 6.56 million persons) support the PrdV.
• I have drawn 100,000 random samples of 2,598 persons each and computed the average support in each sample.
Sampling distribution
• 5%, or 5,000 samples, have a mean of 39% or less.
• So if we reject H0 when we find a support of 39% or less, then we will have a 5% chance of making an error.
• Notice: we assume that the only reason we would make an error is random sampling error.
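This simulation is easy to reproduce. The sketch below is a minimal illustration in Python/NumPy (not part of the lecture materials, and it assumes a binomial draw is an adequate stand-in for sampling 2,598 voters); it recovers the cutoff of roughly 39%.

```python
# Minimal sketch of the simulation described above: 100,000 samples of
# 2,598 persons from a population in which 41% support the PrdV.
import numpy as np

rng = np.random.default_rng(42)          # arbitrary seed
n_samples, n, p0 = 100_000, 2598, 0.41

# support in each sample = number of supporters / sample size
sample_means = rng.binomial(n, p0, size=n_samples) / n

# value below which 5% of the sample means fall
cutoff = np.quantile(sample_means, 0.05)
print(f"5% cutoff: {cutoff:.3f}")        # about .394, the 39% quoted above
print(f"observed support .31 is below the cutoff: {0.31 < cutoff}")
```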
[Figure: histogram of the sampling distribution of support for PrdV; x-axis: % support for PrdV (.36 to .46), y-axis: number of samples (0 to 10,000)]
More precise approach
• We want to know the score below which only 5% of the samples lie.
• Drawing lots of random samples is a rather rough approach; an alternative is to use the theoretical sampling distribution.
• The proportion is a mean, and the sampling distribution of a mean is the normal distribution with a mean equal to the H0 value and a standard deviation (called the standard error) of $\sigma / \sqrt{N}$.
More precise approach
• For a standard normal distribution we know the z-score below which 5% of the samples lie (Appendix 2, table A): -1.645.
• So if we compute a z-score for the observed value (.31) and it is below -1.645, we can reject the H0, and we will do so wrongly in only 5% of the cases.
• $z = \dfrac{x - \mu}{se}$
More precise approach
• $\mu$ is the mean of the sampling distribution, so .41 (H0)
• $se$ is $\sigma / \sqrt{N}$, and $\sigma$ of a proportion is $\sqrt{p(1 - p)}$
• so the $se$ is $\sqrt{\dfrac{.41(1 - .41)}{2598}} = .0096$
• so the z-score is $z = \dfrac{x - \mu}{se} = \dfrac{.31 - .41}{.0096} = -10.4$
• -10.4 is less than -1.645, so we reject the H0
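As a check, the same z-test can be computed in a few lines of Python (a sketch, not lecture code):

```python
# z-test for the PrdV example: H0 p = .41, observed support .31, n = 2,598.
from math import sqrt

p0, p_obs, n = 0.41, 0.31, 2598

se = sqrt(p0 * (1 - p0) / n)   # standard error under H0, about .0096
z = (p_obs - p0) / se          # about -10.4

print(f"se = {se:.4f}, z = {z:.1f}")
print("reject H0" if z < -1.645 else "do not reject H0")
```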
Null Hypothesis
• A sampling distribution requires you to
imagine what the population would look
like if H0 is true.
• This is possible if H0 is one value (41%)
• This is impossible if H0 is a range (<41%)
• So H0 should always contain an equal sign (either =, ≤, or ≥)
Null hypothesis
• In practice the H0 is almost always 0, e.g.:
– difference between two means is 0
– correlation between two variables is 0
– regression coefficient is 0
• This is so common that SPSS always
assumes that this is the H0.
Undirected Alternative Hypotheses
• Often we have an undirected alternative
hypothesis, e.g.:
– the difference between two means is not zero
(could be either positive or negative)
– the correlation between two variables is not
zero (could be either positive or negative)
– the regression coefficient is not zero (could be either positive or negative)
Directed alternative hypothesis
• In the PrdV example we had a directed
alternative hypothesis: Support for PrdV is
less than 41%, since PrdV would have still
participated if his support were more than
41%.
Type I and Type II errors

                          actual situation
decision                  H0 is true                 H0 is false
reject H0                 Type I error               correct decision
                          probability = α            probability = 1-β (power)
do not reject H0          correct decision           Type II error
                          probability = 1-α          probability = β
Type I error rate
• You choose the type I error rate (α).
• It is independent of sample size, type of alternative hypothesis, or model assumptions.
Type I versus type II error rate
• A low probability of rejecting H0 when H0 is true (type I error) is obtained by rejecting the H0 less often.
• This also means a higher probability of not rejecting H0 when H0 is false (type II error).
• In other words: a lower probability of finding a significant result when you should, i.e. lower power.
How to increase your power:
• Accept a higher type I error rate (α)
• Use a larger sample size
• Use a directed instead of an undirected alternative hypothesis
• Use more assumptions in your model (non-parametric tests make fewer assumptions, but also have less power)
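These effects can be made concrete with a small simulation. The sketch below is my own illustration built on the PrdV setup; the "true" support of 39% is a hypothetical value chosen to show how sample size changes the power of a one-sided test at α = 5%.

```python
# Sketch: estimated power of the one-sided 5% z-test of support = .41,
# when the (hypothetical) true support is .39, at two sample sizes.
# Assumes NumPy and SciPy are available.
import numpy as np
from scipy.stats import norm

def estimated_power(n, p_true=0.39, p0=0.41, alpha=0.05, reps=50_000, seed=1):
    rng = np.random.default_rng(seed)
    means = rng.binomial(n, p_true, size=reps) / n   # simulated sample support
    se0 = np.sqrt(p0 * (1 - p0) / n)                 # standard error under H0
    z = (means - p0) / se0
    return np.mean(z < norm.ppf(alpha))              # share of samples that reject H0

for n in (500, 2598):
    print(f"n = {n:4d}: power is about {estimated_power(n):.2f}")
```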
Testing means
• What kind of hypotheses might we want to
test:
– Average rent of a room in Amsterdam is 300
euros
– Average income of males is equal to the
average income of females
Z versus t
• In the PrdV example we knew everything about the sampling distribution with only a hypothesis about the mean.
• In the rent example we don’t: we have to estimate
the standard deviation.
• This adds uncertainty, which is why we use the
t distribution instead of the normal
• Uncertainty declines when sample size becomes
larger.
• In large samples (N>30) we can use the normal.
t-distribution
• It has a mean and standard error like the
normal distribution.
• It also has degrees of freedom, which depend on the sample size.
• The larger the degrees of freedom the closer
the t-distribution is to the normal
distribution.
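The convergence to the normal distribution can be seen by comparing critical values; the short sketch below uses scipy.stats (illustrative only, two-sided 5% test):

```python
# Sketch: two-sided 5% critical values shrink toward the normal's 1.96
# as the degrees of freedom grow.
from scipy.stats import norm, t

print(f"normal: {norm.ppf(0.975):.3f}")          # 1.960
for df in (2, 10, 30, 100):
    print(f"t with df = {df:3d}: {t.ppf(0.975, df):.3f}")
```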
Data: rents of rooms

           rent               rent
room 1     175    room 11     240
room 2     180    room 12     250
room 3     185    room 13     250
room 4     190    room 14     280
room 5     200    room 15     300
room 6     210    room 16     300
room 7     210    room 17     310
room 8     210    room 18     325
room 9     230    room 19     620
room 10    240
Rent example
• H0: μ = 300, HA: μ ≠ 300
• We choose α to be 5%.
• N = 19, so df = 18.
• We reject H0 if we find a t less than -2.101 or more than 2.101 (Appendix B, table 2).
• We do not reject H0 if we find a t between -2.101 and 2.101.
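The ±2.101 cutoff from the appendix can also be looked up with scipy.stats (a quick check, not part of the lecture):

```python
# Two-sided 5% critical value of the t-distribution with 18 degrees of freedom.
from scipy.stats import t
print(round(t.ppf(1 - 0.05 / 2, df=18), 3))   # 2.101
```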
Rent example
• $t = \dfrac{\bar{x} - \mu}{se}$, where $se = \dfrac{\sigma}{\sqrt{n}}$
• We use $s^2$ as an estimate of $\sigma^2$
• $\bar{x} = 258$, $\mu = 300$, $s = 99$, $N = 19$
• So $t = \dfrac{258 - 300}{99 / \sqrt{19}} = -1.85$
• -1.85 is between -2.101 and 2.101, so we do not reject H0
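The whole test can be reproduced from the rent data with scipy.stats.ttest_1samp; the sketch below gives a t of about -1.84 (the slide's -1.85 uses the rounded summary statistics), so H0 is again not rejected.

```python
# One-sample t-test of H0: mean rent = 300, on the 19 rents listed above.
from scipy.stats import ttest_1samp

rents = [175, 180, 185, 190, 200, 210, 210, 210, 230, 240,
         240, 250, 250, 280, 300, 300, 310, 325, 620]

result = ttest_1samp(rents, popmean=300)
print(f"t = {result.statistic:.2f}, two-sided p = {result.pvalue:.3f}")
# t is about -1.84, between -2.101 and 2.101, so H0 is not rejected at the 5% level
```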
Compare means in SPSS

Group Statistics: incmid household income in guilders

sex respondent    N      Mean        Std. Deviation   Std. Error Mean
1 male            1131   2833.2228   1530.70376       45.51556
2 female          1121   2199.2640   1366.42170       40.81144

Independent Samples Test: incmid household income in guilders

Levene's Test for Equality of Variances: F = 19.012, Sig. = .000

t-test for Equality of Means
                                 t        df         Sig.         Mean         Std. Error   95% CI of the Difference
                                                     (2-tailed)   Difference   Difference   Lower         Upper
Equal variances assumed          10.365   2250       .000         633.95876    61.16368     514.01564     753.90189
Equal variances not assumed      10.370   2225.825   .000         633.95876    61.13297     514.07516     753.84236
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{se_{\bar{x}_1 - \bar{x}_2}} = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{se_{\bar{x}_1 - \bar{x}_2}} = \frac{\bar{x}_1 - \bar{x}_2}{se_{\bar{x}_1 - \bar{x}_2}}$$

$$se_{\bar{x}_1 - \bar{x}_2} = s_{pool}\sqrt{\frac{1}{N_1} + \frac{1}{N_2}}, \qquad s_{pool} = \sqrt{s^2_{pool}}, \qquad s^2_{pool} = \frac{(N_1 - 1)s_1^2 + (N_2 - 1)s_2^2}{N_1 + N_2 - 2}$$

$$se_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{(N_1 - 1)s_1^2 + (N_2 - 1)s_2^2}{N_1 + N_2 - 2}\left(\frac{1}{N_1} + \frac{1}{N_2}\right)}$$

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{(N_1 - 1)s_1^2 + (N_2 - 1)s_2^2}{N_1 + N_2 - 2}\left(\dfrac{1}{N_1} + \dfrac{1}{N_2}\right)}}$$
Do before Monday
• Read Chapters 9 and 10
• Do the “For solving Problems”