Today
• Null and alternative hypotheses
• 1- and 2-tailed tests
• Regions of rejection
• Sampling distributions
• The Central Limit Theorem
• Standard errors
• z-tests for sample means
• The 5 steps of hypothesis-testing
• Type I and Type II error
(not necessarily in this order)
1
Hypothesis testing
• Approach hypothesis testing from the standpoint
of theory.
• If our theory about some phenomenon is correct,
then things should be a certain way.
• If the commercial really works, then we should see
an increase in sales (that cannot easily be
attributed to chance).
• Hypotheses are stated in terms of parameters (e.g.,
“the average difference between Groups A and B is
zero in the population”).
2
Hypothesis testing
• We will always observe some kind of effect, even if
nothing interesting is going on.
• It could be due to chance fluctuations, or sampling
error... or there really could be an effect in the
population.
• Inferential statistics help us decide.
• If we conclude, on the basis of statistics, that an
effect should not be attributed to chance, the effect
is termed statistically significant.
3
Gender: F
• Say we know μ and σ, and that they are μ = 64.28″ and σ = 3.1″, like in the female sample.
• We want to know if the 74-inch-tall person is female.
• Use logic to make a good guess.
[Figure: histogram of female heights — x-axis: "How tall are you in inches?" (56–70), y-axis: Frequency; Mean = 64.28, Std. Dev. = 3.077, N = 47]
4
Gender: F
• If the person is female, then her distribution has μ = 64.28″ and σ = 3.1″ (assuming normality).
• That implies that "her" z-score is:
z = (x − μ)/σ = (74 − 64.28)/3.1 ≈ 3.14, p < .001
[Figure: histogram of female heights — x-axis: "How tall are you in inches?", y-axis: Frequency; Mean = 64.28, Std. Dev. = 3.077, N = 47]
• Very unlikely that this person is female!
• We could do this because we made the assumption of normality, and assumed μ = 64.28″ and σ = 3.1″.
5
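A minimal Python sketch (assuming SciPy is available) reproduces the z-score and tail probability above:

```python
from scipy.stats import norm

mu, sigma = 64.28, 3.1      # female-sample parameters given on the slide
x = 74                      # observed height in inches

z = (x - mu) / sigma        # (74 - 64.28) / 3.1 ≈ 3.14
p = norm.sf(z)              # upper-tail probability, ≈ .0008
print(z, p)
```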
Hypothesis testing
• A hypothesis is a theory-based prediction about
population parameters.
• Researchers begin with a theory.
• Then they define the implications of the theory.
• Then they test the implications using if-then logic
(e.g., if the theory is true, then the population mean
should be greater than 3.8).
6
Hypothesis testing
• Null hypothesis – Represents the "status quo" situation. Usually, the hypothesis of no difference or no relationship. E.g. ...
H0: μ = 0
• Alternative hypothesis – What we are predicting will occur. Usually, the most scientifically interesting hypothesis. E.g. ...
H1: μ ≠ 0
7
Conventions
• By convention, the null and alternative hypotheses are mutually exclusive and exhaustive. E.g. ...
H0: μ = 40%
H1: μ ≠ 40%
• Not everyone follows this convention.
8
Hypothesis testing
• This is an example of a 2-tailed hypothesis test:
H0: μ = 40%
H1: μ ≠ 40%
• Null distribution:
[Figure: Null Distribution for Recidivism — a distribution centered at 40%; x-axis: Recidivism Percentage (0–100)]
9
1-tailed tests
• Say we had the following hypotheses:
H0: μ ≤ 0
H1: μ > 0
• We would reject the null hypothesis only if the observed mean is sufficiently positive.
• "Sufficiently" because sample means will always differ. We care about the population, not samples.
• If we conclude that chance variability isn't driving the effect, then we say the effect is statistically significant.
10
An example...
• Say we want to know if UNC students' IQ differs from the national average. We know:
μ = 500
σ = 100
• We pick a student at random (our "sample"), and give her an IQ test. She scores 700.
• Was her score drawn from the U.S. population at large, or from another (more intelligent) distribution?
11
An example...
• The null hypothesis is that she is part of the U.S. population distribution of IQ test-takers. Nothing special.
• The alternative is that she is from some other (more intelligent) population distribution.
H0: μ ≤ 500
H1: μ > 500
• 1-tailed, because we are interested only in whether UNC students are more intelligent than average.
12
An example...
• First, draw the null distribution:
[Figure: Null Distribution of IQ scores (U.S. population) — normal curve centered at 500; x-axis: IQ (100–900)]
• Then define the region(s) of rejection:
[Figure: the same null distribution with the upper-tail rejection region (area α) shaded]
13
An example...
[Figure: Null Distribution of IQ scores (U.S. population) with the upper-tail rejection region (area α) shaded; x-axis: IQ (100–900)]
14
An example...
• How did I find the "critical value" of IQ? By knowing alpha, knowing how to use Table E10, and a little algebra...
• First, find z given p, then...
z = (x − μ0)/σ
z·σ = x − μ0
x = z·σ + μ0 = 1.645(100) + 500 = 664.5
15
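A minimal Python sketch (using the inverse normal CDF in place of Table E10; SciPy assumed) reproduces the same algebra:

```python
from scipy.stats import norm

alpha = 0.05
mu0, sigma = 500, 100

z_crit = norm.ppf(1 - alpha)     # ≈ 1.645 for a 1-tailed test at alpha = .05
x_crit = mu0 + z_crit * sigma    # critical raw score ≈ 664.5
print(z_crit, x_crit)
```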
An example...
• Our student's IQ score is 700. Does it fall in the region of rejection?
[Figure: Null Distribution of IQ scores (U.S. population), with 700 falling inside the shaded upper-tail rejection region; x-axis: IQ (100–900)]
• ...Yes!
16
An example...
• We could have done this by comparing z-scores instead of raw scores.
z = (x − μ0)/σ = (700 − 500)/100 = 2.0
• 2.0 > 1.645, so we reject H0.
17
An example...
• We also could have done this by comparing a p-value to α instead of comparing raw scores or z-scores.
• The p-value corresponding to a z-score of 2.0 is .0228.
• .0228 < .05, so we reject H0.
• A UNC student with an IQ of 700 would be very rare if drawn from the null population with μ = 500. In fact, even more rare than we are willing to tolerate (remember, α = .05).
18
3 Decision rules in this example
• We need to know if we should reject H0. These three rules all yield the same conclusion. Reject H0 if...
x_obs ≥ x_critical
z_obs ≥ z_critical
p ≤ α
19
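As a sketch (SciPy assumed), the three rules can be checked side by side for the IQ example above (x = 700, μ0 = 500, σ = 100, 1-tailed, α = .05); all three comparisons give the same decision:

```python
from scipy.stats import norm

x, mu0, sigma, alpha = 700, 500, 100, 0.05

z_obs = (x - mu0) / sigma        # 2.0
z_crit = norm.ppf(1 - alpha)     # ≈ 1.645
x_crit = mu0 + z_crit * sigma    # ≈ 664.5
p = norm.sf(z_obs)               # ≈ .0228

# Three equivalent decision rules
print(x >= x_crit, z_obs >= z_crit, p <= alpha)   # True True True -> reject H0
```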
But...
• Wait a minute – we did all that with only one
student??
• The sample was too small (N = 1) to make such
bold claims about UNC.
• We need a representative sample, N >> 1.
• The logic of hypothesis testing is exactly the same
with samples as it is with individuals.
• But, we need to know about sampling
distributions...
20
Sampling distributions
• Sampling distribution: The distribution of a statistic over repeated samples from the same population.
• “Sampling distribution of _____” (mean / variance /
z, t, etc.)
21
The Central Limit Theorem
• Given a population with mean μ and variance σ², the sampling distribution of the mean (the distribution of sample means) will have a mean equal to μ and a variance equal to:
σx̄² = σ²/N
...and thus a standard deviation of:
σx̄ = σ/√N
The distribution will approach normality as N increases. [from Howell, p. 267]
22
The Central Limit Theorem
σx̄ = σ/√N
• ...is called the standard error of the mean, or simply standard error.
23
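A small simulation sketch (NumPy assumed; the population values are the IQ example's μ = 500 and σ = 100, with an arbitrary N = 20) illustrates that the standard deviation of sample means comes out close to σ/√N:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, N = 500, 100, 20

# 10,000 samples of size N, one mean per sample
means = rng.normal(mu, sigma, size=(10_000, N)).mean(axis=1)

print(means.std())            # empirical SD of the sample means
print(sigma / np.sqrt(N))     # theoretical standard error ≈ 22.36
```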
The Central Limit Theorem
• As sample size increases, the standard error decreases.
[Figure: Null Distribution of IQ scores (U.S. population) drawn as the sampling distribution of the mean for N = 1, N = 5, and N = 20 — the curve grows narrower (and taller) as N increases; x-axis: IQ (100–900)]
24
The Central Limit Theorem
• Another example...
25
Back to the UNC IQ example...
• Let's say that we collect a sample of N = 4 UNC students.
• Their IQs are 700, 710, 680, and 670.
• Now the mean is x̄ = 690.
• Is there enough evidence to claim that UNC students are brighter than average?
• Now the question is, "If the population mean is 500, how extreme would a sample mean of 690 be (given that N = 4)?"
26
In terms of z-scores...
z = (x̄ − μ0)/σx̄ = (x̄ − μ0)/(σ/√N) = (690 − 500)/(100/√4) = 3.8
• The critical value for z is still +1.645 (because it's a 1-tailed test and α = .05).
• 3.8 > 1.645, so reject H0.
• Conclusion: UNC students are likely brighter than average (we'll never really know for sure).
27
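A minimal sketch of the same test in Python (NumPy and SciPy assumed), using the four scores listed above:

```python
import numpy as np
from scipy.stats import norm

scores = np.array([700, 710, 680, 670])   # the N = 4 UNC sample
mu0, sigma, alpha = 500, 100, 0.05

se = sigma / np.sqrt(len(scores))         # 100 / sqrt(4) = 50
z = (scores.mean() - mu0) / se            # (690 - 500) / 50 = 3.8

print(z, z > norm.ppf(1 - alpha))         # 3.8 > 1.645 -> reject H0
```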
Another example
• Your theory says that Benadryl should alter reaction time on some task, but you are not sure how. The null and alternative hypotheses might be:
H0: μ = 0.09 sec
H1: μ ≠ 0.09 sec
• We're given that σ = .032 seconds
• We're given that N = 400
• We're given that α = .01
28
Finding critical z’s for a 2-tailed test
0.5
Standard Normal Distribution
0.4
Density
0.3
a
2
0.2
a
2
0.1
0.0
-5
-4
-3
-2
-1
0
1
2
3
4
5
z
z = -2.575
z = +2.575
29
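The critical z's in the figure can be reproduced with the inverse normal CDF; a minimal sketch (SciPy assumed):

```python
from scipy.stats import norm

alpha = 0.01
z_lo = norm.ppf(alpha / 2)        # ≈ -2.576
z_hi = norm.ppf(1 - alpha / 2)    # ≈ +2.576
print(z_lo, z_hi)
```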
Finding critical reaction times
σ = .032 sec
σx̄ = .032/√400 = .032/20 = .0016
[Figure: Reaction Time Sampling Distribution of the Mean — Density vs. seconds (0.082–0.098), centered at 0.090, with an area of α/2 shaded in each tail]
30
Another example
• We collect data from our 400 subjects and find the mean RT to be .097 seconds.
• .097 is different from .09, but different enough?
z = (x̄ − μ0)/σx̄ = (.097 − .09)/(.032/√400) = .007/.0016 = 4.375
• 4.375 > 2.575, so reject H0. Benadryl probably does have an effect on reaction time. Specifically, it slows people down.
31
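A minimal sketch of the whole Benadryl test in Python (NumPy and SciPy assumed), using the values given on the slides:

```python
import numpy as np
from scipy.stats import norm

mu0, sigma, N, alpha = 0.09, 0.032, 400, 0.01
xbar = 0.097                          # observed mean reaction time

se = sigma / np.sqrt(N)               # 0.032 / 20 = 0.0016
z = (xbar - mu0) / se                 # 4.375

z_crit = norm.ppf(1 - alpha / 2)      # ≈ 2.576 for a 2-tailed test
p = 2 * norm.sf(abs(z))               # 2-tailed p-value
print(z, p, abs(z) > z_crit)          # reject H0
```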
N = 1: a special case?
• When N = 1,
x̄ = x and σx̄ = σ/√N = σ/√1 = σ
...and:
z = (x̄ − μ0)/σx̄ = (x − μ0)/σ
32
The 5 steps of hypothesis testing
1. Specify null and alternative hypotheses.
2. Identify a test statistic.
3. Specify the sampling distribution and sample size.
4. Specify alpha and the region(s) of rejection.
5. Collect data, compute the test statistic, and make a
decision regarding H0.
33
1. Null and alternative hypotheses
• Specify H0 and H1 in terms of population
parameters.
• H0 is presumed to be true in the absence of
evidence against it.
• H1 is adopted if H0 is rejected.
H0: μ = 0.09 sec
H1: μ ≠ 0.09 sec
34
2. Identify a test statistic
• Identify a test statistic that is useful for
discriminating between different hypotheses about
the population parameter of interest, taking into
account the hypothesis being tested and the
information known.
• E.g., z, t, F, and χ².
35
3. Sampling distribution and N
• Specify the sampling distribution and sample size.
• The sampling distribution here refers to the
distribution of all possible values of the test
statistic obtained under the assumption that H0 is
true.
• E.g., “N = 48. The sampling distribution is the
standard normal distribution (distribution of z
statistics), because we are testing a hypothesis
about the population mean when σ is known."
36
4. Specify α and the rejection regions
• Alpha (α) is the probability of incorrectly rejecting H0 (rejecting the null hypothesis when it is really true).
• Regions of rejection are those ranges of the test statistic's sampling distribution which, if encountered, would lead to rejecting H0.
• The regions of rejection are determined by α and by whether the test is 1-tailed or 2-tailed.
37
5. Collect data, compute the test statistic, make a decision
• For example...
z = (x̄ − μ0)/(σ/√N) = (x̄ − 20)/(5/√48) = 2/.722 = 2.77
• E.g., "2.77 > 1.96, so reject H0 and conclude that..."
• Always couch the conclusion in terms of the original problem.
38
The 5 steps: Example
• Let's say you think a certain standardized achievement test is biased against Asian-Americans. You know that for the non-Asian-American population...
μ = 100
σ = 10
• In the sample...
N = 28
39
The 5 steps: Example
1. Specify null and alternative hypotheses.
H0: μ ≥ 100
H1: μ < 100
2. Identify a test statistic.
We want to compare a sample mean to a hypothesized value, and we know σ, so we use a z-test.
40
The 5 steps: Example
3. Specify the sampling distribution and sample size.
The sampling distribution of z is the standard normal distribution.
N = 28
4. Specify alpha and the region(s) of rejection.
α = .05
The regions of rejection are harder...
41
The 5 steps: Example
5. Collect data, compute the test statistic, make a decision.
We collect data. Say the mean is 97.1. Does 97.1 fall in the region of rejection?
z = (x̄ − μ)/σx̄ = (x̄ − μ)/(σ/√N) = (97.1 − 100)/(10/√28) = −2.9/1.89 = −1.53
42
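Putting the five steps together for this example in one minimal sketch (NumPy and SciPy assumed; the 1-tailed setup and α = .05 from steps 1 and 4 are carried over):

```python
import numpy as np
from scipy.stats import norm

# Steps 1-4: H1 is mu < 100, z statistic, standard normal distribution, alpha = .05
mu0, sigma, N, alpha = 100, 10, 28, 0.05
z_crit = norm.ppf(alpha)                   # ≈ -1.645 (lower-tail rejection region)

# Step 5: compute the test statistic from the observed mean
xbar = 97.1
z = (xbar - mu0) / (sigma / np.sqrt(N))    # ≈ -1.53

print(z, z_crit, z <= z_crit)              # -1.53 > -1.645 -> do not reject H0
```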
Type I and Type II errors
• There are two ways to make an incorrect decision in hypothesis testing: Type I and Type II errors.
• Type I error: Concluding that the null hypothesis is false when it is really true.
• We control the probability of making a Type I error (alpha).
• Alpha (α): The risk of incorrectly rejecting a true null hypothesis.
• Why not make α really, really small? The smaller we make α, the more likely it becomes that we will encounter a Type II error.
43
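A small simulation sketch makes the trade-off concrete (NumPy and SciPy assumed; the true alternative mean of 103, σ = 10, and N = 25 are hypothetical values chosen only for illustration):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
sigma, N, trials = 10, 25, 20_000
se = sigma / np.sqrt(N)

for alpha in (0.05, 0.001):
    z_crit = norm.ppf(1 - alpha)                        # 1-tailed critical value
    # Type I rate: H0 true (mu = 100), proportion of samples we wrongly reject
    null_means = rng.normal(100, sigma, (trials, N)).mean(axis=1)
    type1 = np.mean((null_means - 100) / se > z_crit)
    # Type II rate: H0 false (mu = 103), proportion of samples we fail to reject
    alt_means = rng.normal(103, sigma, (trials, N)).mean(axis=1)
    type2 = np.mean((alt_means - 100) / se <= z_crit)
    print(alpha, round(type1, 3), round(type2, 3))      # smaller alpha -> larger beta
```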
Type I and Type II errors
• Type II error: Concluding the null hypothesis is true when it is really false.
• Beta (β): The probability of incorrectly retaining a false null hypothesis.
44
Next time...
• Power
• Effect size
• Statistical significance vs. practical significance
• Confidence intervals
45