10-09 lecture +Q

Transcript 10-09 lecture +Q

Review
You run a t-test and get a result of t = 0.5. What is your conclusion?
A. Reject the null hypothesis because t is bigger than expected by chance
B. Reject the null hypothesis because t is smaller than expected by chance
C. Keep the null hypothesis because t is bigger than expected by chance
D. Keep the null hypothesis because t is smaller than expected by chance
Review
Your null hypothesis states µ = 50, and your sample has a mean of M = 58.
If your t statistic equals 4, what is the standard error of the mean (sM)?
A. Depends on the sample size
B. 0.5
C. 4
D. 2
Review
Your null hypothesis predicts the population mean should be µ0 = 100. You
measure a sample of 25 people and calculate statistics of M = 94 and s = 10.
What is the value of your t statistic?
A. 5
B. -0.6
C. -3
D. 2.4
Hypothesis Testing
10/9
Where Am I?
• Wake up after a rough night in unfamiliar surroundings
• Still in Boulder?
• Expected if in Boulder (large likelihood)
• Surprising but not impossible (moderate likelihood)
• Couldn’t happen IF in Boulder (likelihood near zero) → Can’t be in Boulder
Steps of Hypothesis Testing
1. State clearly the two hypotheses
2. Determine which is the null hypothesis (H0) and which is the
alternative hypothesis (H1)
3. Compute a relevant test statistic from the sample
4. Find the likelihood function of the test statistic according to the
null hypothesis
5. Choose alpha level (α): how willing you are to abandon the null (usually .05)
6. Find the critical value: the cutoff with probability α of being exceeded
under H0
7. Compare the actual result to the critical value
• Less than critical value → retain null hypothesis
• Greater than critical value → reject null hypothesis; accept alternative hypothesis
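The seven steps can be traced in a short sketch. This is a minimal illustration, assuming a one-tailed binomial test (the lecture's example of a test whose likelihood function is the binomial distribution) on a hypothetical set of 20 two-choice guesses; the function name `binomial_test` and the numbers are illustrative, not from the lecture.

```python
from math import comb

def binomial_test(k_observed, n, p0=0.5, alpha=0.05):
    """Hypothetical one-tailed binomial test, following the seven steps.

    Steps 1-2: H0 says the chance of a correct answer is p0;
    H1 says it is greater than p0.
    """
    # Step 3: the test statistic is the number correct, k_observed.
    # Step 4: the likelihood function of k under H0 is Binomial(n, p0).
    def prob_at_least(k):
        return sum(comb(n, j) * p0**j * (1 - p0)**(n - j) for j in range(k, n + 1))

    # Steps 5-6: the critical value is the smallest k whose upper-tail
    # probability under H0 is <= alpha (k = n + 1 means "never reject").
    k_crit = next(k for k in range(n + 2) if prob_at_least(k) <= alpha)

    # Step 7: compare the observed statistic to the critical value.
    decision = "reject H0" if k_observed >= k_crit else "retain H0"
    return k_crit, decision

print(binomial_test(16, 20))
print(binomial_test(14, 20))
```

With n = 20 and α = .05 the cutoff is 15 correct, so 16 correct leads to rejecting H0 and 14 correct leads to retaining it.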
Specifying Hypotheses
• Both hypotheses are statements about
population parameters
• Null Hypothesis (H0)
– Always more specific, e.g. 50% chance, mean of 100
– Usually the less interesting, "default" explanation
• Alternative Hypothesis (H1)
– More interesting – researcher’s goal is usually to
support the alternative hypothesis
– Less precise, e.g. > 50% chance, µ > 100
Test Statistic
• Statistic computed from sample to decide between
hypotheses
• Relevant to hypotheses being tested
– Based on mean if hypotheses are about means
– Based on number correct (frequency) if hypotheses are
about probability correct
• Sampling distribution according to null hypothesis
must be fully determined
– Can only depend on data and on values assumed by H0
• Often a complex formula with little intuitive meaning
– Inferential statistic: Only used in testing reliability
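For hypotheses about a mean, the test statistic is t = (M − µ0) / sM, where sM = s / √n. A minimal sketch of that formula, using a made-up sample and the hypothetical helper name `one_sample_t`:

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """t = (M - mu0) / sM, where sM = s / sqrt(n) is the estimated standard error."""
    n = len(sample)
    m = statistics.mean(sample)    # sample mean M
    s = statistics.stdev(sample)   # sample SD s (n - 1 in the denominator)
    s_m = s / math.sqrt(n)         # standard error of the mean sM
    return (m - mu0) / s_m

scores = [52, 48, 55, 60, 50]      # hypothetical data
print(round(one_sample_t(scores, mu0=50), 3))
```

The statistic depends only on the data and on the value µ0 assumed by H0, which is what lets its sampling distribution under H0 be fully determined.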
Likelihood Function
• Probability distribution of a statistic according to a
hypothesis
– Gives probability of obtaining any possible result
• Usually interested in distribution of test statistic
according to null hypothesis
• Same as sampling distribution, assuming the
population is accurately described by the hypothesis
• Test statistic chosen because we know its likelihood
function
– Binomial test: Binomial distribution
– t-test: t distribution
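As an illustration of a likelihood function, the sketch below lists the probability of every possible number correct under a hypothetical H0 of chance guessing (p = .5) on 10 trials. The helper name `binomial_likelihood` is illustrative.

```python
from math import comb

def binomial_likelihood(n, p):
    """Probability of each possible number correct, k = 0..n, under Binomial(n, p)."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

# H0: guessing at chance (p = .5) on 10 hypothetical trials
h0 = binomial_likelihood(10, 0.5)
for k, prob in enumerate(h0):
    print(f"{k:2d} correct: {prob:.4f}")
```

Because the list covers every possible result, its probabilities sum to 1, and the middle value (5 correct) is the most likely under H0.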
Critical Value
• Cutoff for test statistic between retaining and rejecting null hypothesis
– If test statistic is beyond critical value, null will be rejected
– Otherwise, null will be retained
• Before collecting data: What strength of evidence will you require to reject null?
– How many correct outcomes?
– How big a difference between M and µ0, relative to sM?
• Critical region
– Range of values that will lead to rejecting null hypothesis
– All values beyond critical value
[Figure: critical region shaded under the H0 distribution, for number correct (Frequency, 0 to 15) and for t (-4 to 4)]
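One way to find the critical region for a count is to start at the most extreme value and add values while the region's total probability under H0 stays at or below α. A minimal sketch, assuming 15 hypothetical trials with p0 = .5 and an upper-tailed H1; `upper_critical_region` is an illustrative name.

```python
from math import comb

def upper_critical_region(n, p0=0.5, alpha=0.05):
    """Return (critical value, critical region, actual Type I error rate) for an
    upper-tailed test on the number correct out of n, under H0: p = p0."""
    pmf = [comb(n, k) * p0**k * (1 - p0)**(n - k) for k in range(n + 1)]
    tail = 0.0
    k_crit = n + 1                  # empty region until a value is added
    # Grow the region downward from k = n while its total probability stays <= alpha.
    for k in range(n, -1, -1):
        if tail + pmf[k] > alpha:
            break
        tail += pmf[k]
        k_crit = k
    return k_crit, list(range(k_crit, n + 1)), tail

print(upper_critical_region(15))
```

For 15 trials the region is 12 to 15 correct. Its probability under H0 (about .018) falls below α because counts are discrete, so no cutoff gives exactly .05.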
Types of Errors
• Goal: Reject null hypothesis when it’s false; retain it when it’s
true
• Two ways to be wrong
– Type I Error: Null is correct but you reject it
– Type II Error: Null is false but you retain it
• Type I Error rate
– IF H0 is true, probability of mistakenly rejecting H0
– Proportion of false theories we conclude are true
• E.g., proportion of useless treatments that are deemed effective
• Logic of hypothesis testing is founded on controlling Type I
Error rate
– Set critical value to give desired Type I Error rate
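Controlling the Type I error rate can be checked by simulation: draw many samples from a population where H0 is true and count how often the test rejects. This sketch assumes a z-test with a known population SD (a simplification; the lecture's t-test estimates the SD from the sample) and made-up values µ0 = 100, σ = 15, n = 25. It uses Python's `statistics.NormalDist` for the critical value; `simulated_type_i_rate` is a hypothetical name.

```python
import random
from statistics import NormalDist

def simulated_type_i_rate(n=25, mu0=100.0, sigma=15.0, alpha=0.05,
                          reps=20_000, seed=1):
    """Draw many samples from a population where H0 is TRUE (mu = mu0) and
    return the fraction in which an upper-tailed z-test rejects H0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)   # critical value chosen from alpha
    se = sigma / n ** 0.5                      # standard error of the mean (sigma known)
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (sum(sample) / n - mu0) / se
        if z > z_crit:
            rejections += 1                    # Type I error: H0 is true but rejected
    return rejections / reps

print(simulated_type_i_rate())
```

The printed rate should land close to .05, because the critical value was set so that exactly that fraction of results under H0 lies beyond it.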
Alpha Level
• Choice of acceptable Type I Error rate
– Usually .05 in psychology
– Higher → more willing to abandon null hypothesis
– Lower → require stronger evidence before abandoning null hypothesis
• Determines critical value
– Under the sampling distribution of the test statistic according to the null
hypothesis, the probability of a result beyond the critical value is α
[Figure: sampling distribution of the test statistic under H0; the area beyond the critical value equals α]
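How α determines the critical value can be sketched by inverting the H0 distribution. This assumes the test statistic is standard normal under H0 (a z statistic) rather than t, since Python's standard library has no t distribution; `z_critical_value` is a hypothetical helper.

```python
from statistics import NormalDist

def z_critical_value(alpha):
    """Upper-tailed critical value: P(Z > z_crit) = alpha under a standard normal H0."""
    return NormalDist().inv_cdf(1 - alpha)

for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f} -> critical value {z_critical_value(alpha):.3f}")
```

A lower α pushes the critical value further out, which is the "require stronger evidence" side of the trade-off.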
Doping Analogy
• Measure athletes' blood for signs of doping
– Cheaters have high RBCs, but even honest people vary
• What rule to use?
– Must set some cutoff, and punish anyone above it
– Will inevitably punish some innocent people
• H0 likelihood function is like distribution of innocent athletes’ RBCs
• Cutoff determines fraction of innocent people that get unfairly punished
– This fraction is alpha
[Figure: distribution of innocent athletes’ RBC; the cutoff splits it into “Don’t Punish” (below) and “Punish” (above)]
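The analogy translates directly into arithmetic. The sketch assumes, hypothetically, that innocent athletes' RBC scores follow a standard normal distribution (standardized units); the share of that distribution above the cutoff is alpha. `fraction_innocent_punished` is an illustrative name.

```python
from statistics import NormalDist

# Hypothetical: innocent athletes' RBC scores, standardized to mean 0, SD 1
innocent = NormalDist(mu=0, sigma=1)

def fraction_innocent_punished(cutoff):
    """Share of innocent athletes above the cutoff, i.e. alpha for that rule."""
    return 1 - innocent.cdf(cutoff)

for cutoff in (1.0, 2.0, 3.0):
    print(f"cutoff {cutoff}: {fraction_innocent_punished(cutoff):.4f} of innocent athletes punished")
```

Raising the cutoff punishes fewer innocent athletes, at the price of letting more cheaters through.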
Power
• Type II Error rate
– IF H0 is false, probability of failing to reject it
– E.g., fraction of cheaters that don’t get caught
• Power
– IF H0 is false, probability of correctly rejecting it
– Equal to one minus Type II Error rate
– E.g., fraction of cheaters that get caught
• Power depends on sample size
– Choose sample size to give adequate power
– Researchers must make a guess at effect size to compute power
[Figure: H0 and H1 sampling distributions; the H0 area beyond the critical value is the Type I error rate (α), the H1 area short of it is the Type II error rate, and the H1 area beyond it is power]
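Choosing a sample size for adequate power can be sketched as follows. This assumes an upper-tailed z-test with known SD (not the lecture's t-test) and a guessed effect size expressed in SD units; the names `power_one_tailed` and `n_for_power` and the 80% target are illustrative assumptions.

```python
from statistics import NormalDist

Z = NormalDist()

def power_one_tailed(effect_size, n, alpha=0.05):
    """Power of an upper-tailed z-test when the true mean is effect_size SDs above mu0."""
    z_crit = Z.inv_cdf(1 - alpha)
    shift = effect_size * n ** 0.5       # center of the z distribution under H1
    return 1 - Z.cdf(z_crit - shift)     # P(z > z_crit) under H1

def n_for_power(effect_size, target=0.80, alpha=0.05):
    """Smallest sample size whose power reaches the target."""
    n = 1
    while power_one_tailed(effect_size, n, alpha) < target:
        n += 1
    return n

d = 0.5                                  # guessed effect size
n = n_for_power(d)
print(n, round(power_one_tailed(d, n), 3), round(1 - power_one_tailed(d, n), 3))
```

The last two printed values are power and the Type II error rate, which always sum to 1. A smaller guessed effect size would require a larger sample for the same power.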
Two-Tailed Tests
• Sometimes want to detect effects in either direction
– Drugs that help or drugs that hurt
• Formalized in alternative hypothesis
– µ < 0 or µ > 0
• Two critical values, one in each tail
• Type I error rate is sum from both critical regions
– Need to divide errors between both tails
– Each gets α/2 (2.5% when α = .05)
[Figure: H0 distribution of t with a “Reject H0” region of area α/2 below -tcrit and another above tcrit]
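Splitting α between the tails can be sketched with a standard normal reference distribution (a stand-in for t, since Python's standard library has no t distribution). `two_tailed_critical_values` is a hypothetical helper.

```python
from statistics import NormalDist

def two_tailed_critical_values(alpha=0.05):
    """Split alpha evenly: alpha/2 below -crit and alpha/2 above +crit (standard normal H0)."""
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    return -crit, crit

lo, hi = two_tailed_critical_values(0.05)
Z = NormalDist()
left_tail = Z.cdf(lo)            # alpha/2
right_tail = 1 - Z.cdf(hi)       # alpha/2
print(f"critical values {lo:.3f}, {hi:.3f}; total Type I error {left_tail + right_tail:.3f}")
```

Each two-tailed cutoff sits further out than the one-tailed cutoff for the same α, because each tail holds only half of the error rate.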
One-Tailed vs. Two-Tailed Tests
[Figure: One-tailed panel puts all of α beyond a single tcrit; two-tailed panel puts α/2 beyond -tcrit and α/2 beyond tcrit]
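The practical difference shows up in decisions near the cutoffs. A minimal sketch, again using the standard normal as the H0 distribution instead of t; `decide` and the z values are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()

def decide(z, alpha=0.05, tails=2):
    """Return 'reject H0' or 'retain H0' for an observed z statistic."""
    if tails == 1:                       # H1: mu > mu0, all of alpha in the upper tail
        return "reject H0" if z > Z.inv_cdf(1 - alpha) else "retain H0"
    crit = Z.inv_cdf(1 - alpha / 2)      # H1: mu != mu0, alpha/2 in each tail
    return "reject H0" if abs(z) > crit else "retain H0"

for z in (1.8, 2.2, -2.2):
    print(z, "one-tailed:", decide(z, tails=1), "| two-tailed:", decide(z, tails=2))
```

A result of 1.8 clears the one-tailed cutoff but not the two-tailed one, while -2.2 is rejected only by the two-tailed test, since the one-tailed test ignores effects in the other direction.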
An Alternative View: p-values
• Reversed approach to hypothesis testing
– After you collect sample and compute test statistic
– How big must α be to reject H0?
• p-value
– Measure of how consistent data are with H0
– Probability, assuming H0 is true, of a value equal to or more extreme than what you actually got
– Large p-value → H0 is a good explanation of the data
– Small p-value → H0 is a poor explanation of the data
• p > α: Retain null hypothesis
• p < α: Reject null hypothesis; accept alternative hypothesis
• Researchers generally report p-values, because then the reader can choose their
own alpha level
– E.g. “p = .03”
– If willing to allow a 5% error rate (α = .05), then accept result as reliable
– If more stringent, say 1% (α = .01), then remain skeptical
[Figure: H0 distribution of t marking tcrit for α = .05 and tcrit for α = .01; the observed t = 2.15 (p = .03) lies between them]
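Computing a p-value and comparing it with different alpha levels can be sketched as below. This assumes a standard normal H0 distribution (a z statistic) rather than the t distribution, so the value for a hypothetical observed 2.15 is close to, but not the same as, a t-based p-value; `p_value` is an illustrative name.

```python
from statistics import NormalDist

Z = NormalDist()

def p_value(z, tails=2):
    """Probability under H0 of a statistic at least as extreme as the observed z."""
    if tails == 2:
        return 2 * (1 - Z.cdf(abs(z)))   # extreme in either direction
    return 1 - Z.cdf(z)                  # upper tail only (H1: mu > mu0)

p = p_value(2.15)                        # hypothetical observed z, two-tailed
for alpha in (0.05, 0.01):
    verdict = "reject H0" if p < alpha else "retain H0"
    print(f"p = {p:.3f}, alpha = {alpha}: {verdict}")
```

The same p leads to rejection at α = .05 but not at α = .01, which is why reporting p lets each reader apply their own alpha level.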
Review
Later this semester, we’ll learn about hypothesis tests for distributions of
nominal variables. For example, we’ll poll everyone on their favorite colors and
count the frequency for each color. What would be a reasonable null
hypothesis?
A. Each color is chosen by the same number of people in the class
B. Each color would be chosen by the same number of people in the
population
C. Some colors are more popular than others among people in this class
D. Some colors are more popular than others among the population
Review
If you run a 1-tailed t-test with a sample size of n = 10 and α = .05,
the critical value is tcrit = 1.81.
Now imagine you ran the same test, but 2-tailed. Which of the following are
the new critical values? (You should be able to rule out all wrong answers.)
A. 1.21, 2.41
B. -2.01, 2.45
C. -2.23, 2.23
D. -1.67, 1.67
Review
You run a t-test and get a result of t = 0.56 and p = .32. If your chosen alpha
level was 5%, what do you conclude?
A. Retain the null hypothesis, because p > α
B. Reject the null hypothesis, because p > α
C. Retain the null hypothesis, because p < t
D. Reject the null hypothesis, because p < t
