235_lecture11_080401
Psyc 235:
Introduction to Statistics
http://www.psych.uiuc.edu/~jrfinley/p235/
DON’T FORGET TO SIGN IN FOR CREDIT!
Stuff
• Thursday: office hours
hands-on help with specific problems
• Next week's labs:
demonstrations of solving various types of
hypothesis testing problems
[Diagram: a population, a sample of size n drawn from it, and the sampling distribution of the mean (X̄).]
Descriptive vs Inferential
• Descriptive
describe the data you’ve got
if those data are all you’re interested in, you’re
done.
• Inferential
make inferences about population(s) of
values
(when you don’t/can’t have complete data)
Inferential
• Point Estimate
• Confidence Interval
• Hypothesis Testing
1 population parameter: z, t tests
2 pop. parameters: z, t tests on differences
3 or more?... ANOVA!
Hypothesis Testing
1. Choose pop. parameter of interest
   • (ex: μ)
2. Formulate null & alternative hypotheses
   • assume the null hyp. is true
3. Select test statistic (e.g., z, t) & form of sampling distribution
   • based on what's known about the pop., & sample size
Defining our hypothesis
• H0= the Null hypothesis
Usually designed to be the situation of no
difference
The hypothesis we test.
• H1= the alternative hypothesis
Usually the research related hypothesis
Null Hypothesis
(~Status Quo)
Examples:
• Average entering age is 28
(until shown different)
• New product no different from old one (until shown better)
• Experimental group is no different from control
group (until shown different)
• The accused is innocent
(until shown guilty)
Null Hypothesis (H0) vs. Alternative Hypothesis (H1, HA, Ha)
• H0: μ = μ0          Ha: μ ≠ μ0
• H0: μ ≤ μ0 (μ = μ0)   Ha: μ > μ0
• H0: μ ≥ μ0 (μ = μ0)   Ha: μ < μ0
- Ha is the hypothesis you are gathering evidence in support of.
- H0 is the fallback option = the hypothesis you would like to reject.
- Reject H0 only when there is lots of evidence against it.
- A technicality: always include “=” in H0
- H0 (with = sign) is assumed in all mathematical calculations!!!
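For instance, the entering-age example from the earlier slide (two-sided, with the hypothesized value μ0 = 28 taken from that slide) would be written:
H0: μ = 28    Ha: μ ≠ 28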
Decision Tree for Hypothesis Testing
(selecting a distribution and test statistic)
• Population standard deviation known?
  • Yes → Pop. distribution normal?
    • Yes → z-score (standard normal distribution)
    • No → n large? (CLT)
      • Yes → z-score
      • No → can't do it
  • No → Pop. distribution normal?
    • Yes → t-score (t distribution)
    • No → n large?
      • Yes → t-score
      • No → can't do it
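A minimal sketch of the same decision logic in Python (the function name and the n ≥ 30 rule of thumb for "n large" are illustrative assumptions, not from the lecture):

def choose_test(sigma_known, pop_normal, n, n_large=30):
    """Pick the test statistic/distribution per the decision tree above.
    n_large is an illustrative rule of thumb for invoking the CLT."""
    if pop_normal or n >= n_large:
        # Known sigma -> z-score with the standard normal distribution;
        # unknown sigma -> t-score with the t distribution.
        return "z-score" if sigma_known else "t-score"
    return "can't do it"   # small sample from a non-normal population

# e.g., choose_test(sigma_known=False, pop_normal=True, n=12) -> "t-score"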
Hypothesis Testing
4. Calculate test stat.:
   test stat. = (sample stat. − pop. param. under H0) / (std. dev. of sampling distribution)
5. Note: The null hypothesis implies a certain sampling distribution
6. If the test stat. is really unlikely under H0, then reject H0
   • HOW unlikely does it need to be? Determined by α
Three equivalent methods of hypothesis testing
(α = significance level)
• Compute the standardized statistic. If the standardized statistic is more extreme than the critical value, then reject H0.
• Compute a 1 − α confidence interval. If μ0 is not in the confidence interval, then reject H0.
• Compute the p-value of the observed X̄. If p-value ≤ α, then reject H0.
p-value: prob. of getting a test stat. at least as extreme as the one observed, if H0 is really true.
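A minimal sketch in Python of the three equivalent methods for a one-sample z test with known σ (mu0, sigma, the sample values, and alpha below are made up for illustration):

import numpy as np
from scipy import stats

# Illustrative (made-up) setup: H0: mu = mu0, two-sided Ha, sigma known
mu0, sigma, alpha = 28.0, 5.0, 0.05
x = np.array([31, 27, 33, 29, 35, 30, 26, 32, 34, 28], dtype=float)
n, xbar = len(x), x.mean()
se = sigma / np.sqrt(n)                      # std. dev. of the sampling distribution

z = (xbar - mu0) / se                        # standardized test statistic
z_crit = stats.norm.ppf(1 - alpha / 2)       # critical value for a two-sided test

# Method 1: compare |z| to the critical value
reject_1 = abs(z) > z_crit

# Method 2: check whether mu0 falls outside the 1 - alpha confidence interval
ci = (xbar - z_crit * se, xbar + z_crit * se)
reject_2 = not (ci[0] <= mu0 <= ci[1])

# Method 3: compare the p-value to alpha
p_value = 2 * stats.norm.sf(abs(z))
reject_3 = p_value <= alpha

print(reject_1, reject_2, reject_3)          # all three methods agree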
Hypothesis Testing as a Decision Problem
• Retain H0 (fail to reject H0):
  • H0 true: Great!
  • H0 false: Type II Error. P(Type II Error | H0 false) = β
• Reject H0:
  • H0 true: Type I Error (false rejection of H0). P(Type I Error) = α = significance level
  • H0 false: Great! Power = 1 − P(Type II Error) = 1 − β: our ability to reject the null hypothesis when it is indeed false. Power depends on sample size and on how much the null and alternative hypotheses differ.
ERRORS
• Type I errors (α): rejecting the null hypothesis given that it is actually true; e.g., a court finding a person guilty of a crime that they did not actually commit.
• Type II errors (β): failing to reject the null hypothesis given that the alternative hypothesis is actually true; e.g., a court finding a person not guilty of a crime that they did actually commit.
[Figure: Type I and Type II errors. Two overlapping sampling distributions (null and alternative) with the decision criterion marked; regions labeled α, β, and Power (1 − β).]
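A minimal sketch of a power calculation for a one-sided z test with known σ (the helper function and all numbers below are illustrative, not from the lecture):

import numpy as np
from scipy import stats

def power_one_sided_z(mu0, mu_true, sigma, n, alpha=0.05):
    """Power of a one-sided z test of H0: mu = mu0 vs Ha: mu > mu0,
    when the true mean is mu_true (illustrative helper)."""
    se = sigma / np.sqrt(n)                       # std. dev. of sampling distribution
    crit = mu0 + stats.norm.ppf(1 - alpha) * se   # decision criterion on the X-bar scale
    return stats.norm.sf((crit - mu_true) / se)   # P(X-bar > criterion | mu = mu_true)

# Power grows with sample size and with the gap between mu0 and mu_true
print(power_one_sided_z(mu0=28, mu_true=30, sigma=5, n=10))
print(power_one_sided_z(mu0=28, mu_true=30, sigma=5, n=40))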
ANOVA: Analysis of Variance
• a method of comparing 3 or more group
means simultaneously to test whether the
means of the corresponding populations
are equal
(why not just do a bunch of 2-sample t-tests?...)
inflation of the Type I error rate (see the sketch below)
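A minimal sketch of why the Type I error rate inflates across many pairwise tests (the familywise rate 1 − (1 − α)^k assumes the comparisons are independent, which is a simplification):

# Chance of at least one false rejection across k independent tests at level alpha
alpha = 0.05
for k in (1, 3, 10):                  # e.g., 3 groups -> 3 pairwise t tests
    familywise = 1 - (1 - alpha) ** k
    print(k, round(familywise, 3))    # 0.05, 0.143, 0.401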
ANOVA: 1-Way
• You have sample data from several
different groups
• “One-way” refers to one factor.
• Factor = a categorical variable that
distinguishes the groups.
• Level (group) of the factor refers to the
different values that the categorical
variable can take.
ANOVA: 1-Way
• Examples of Factors & groups:
Factor: Political Affiliation
groups: Democrat, Republican, Independent
X=annual income
Factor: Studying Method
groups: Re-read notes, practice test, do nothing
(control)
X=score on exam
ANOVA: 1-Way
• So you've got 3(+) sets of sample data, from 3 different populations.
• You want to test whether those 3 populations all have the same mean (μ)
• Null Hypothesis:
  H0: μ1 = μ2 = μ3 (all pop. means are the same)
  H1: the pop. means are NOT all the same (at least one differs)
• [draw examples on chalkboard]
ANOVA: Assumptions
• Normality
populations are normally distributed
• Homogeneity of variance
populations have same variance (σ²)
• 1-Way “Independent Samples”:
groups are independent of each other
ANOVA: the idea
• Two ways to estimate σ²
  MSB: Mean Square Between groups
    based on how spread out the sample means are from each other
    (variation between samples)
  MSW: Mean Square Within groups (aka MSE: MS Error)
    based on the spread of data within each group
    (variation within samples)
• If the 3(+) populations really do have the same mean, then these 2 #s should be ~ the same
• If NOT, then MSB should be bigger.
ANOVA: calculating
• MSB: Variation between samples
(sample size) * (variance of sample means)
if sample sizes are the same in all groups
note: use the “sample variance” formula
• MSW: Variation within samples
(mean of sample variances)
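In symbols, for k groups of equal size n with sample means X̄_i and sample variances s_i² (this notation is added here for clarity; it does not appear on the slide):

$$\mathrm{MSB} = n\, s^2_{\bar{X}} = \frac{n}{k-1}\sum_{i=1}^{k}\left(\bar{X}_i - \bar{\bar{X}}\right)^2, \qquad \mathrm{MSW} = \frac{1}{k}\sum_{i=1}^{k} s_i^2$$

where \bar{\bar{X}} is the mean of the k sample means.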
ANOVA: the F statistic
• So how to compare MSB and MSW?
  F(dfn, dfd) = MSB / MSW
• Under H0: F ≈ 1
• So calculate your F test statistic and compare to the F distribution, see if it falls in the region of rejection. [chalkboard]
• note: F is one-tailed!
ANOVA: F & df
• F distribution requires specification of 2
degrees of freedom values
• DFn: degrees of freedom numerator:
(# of groups) - 1
• DFd: degrees of freedom denominator:
(total sample size (N)) - (# of groups)
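A minimal sketch of looking up the F distribution with scipy (the group count, total sample size, and observed F below are made up for illustration):

from scipy import stats

# Illustrative numbers: 4 groups, 40 observations total (not from the lecture)
k, N, alpha = 4, 40, 0.05
dfn, dfd = k - 1, N - k                      # numerator and denominator df
f_crit = stats.f.ppf(1 - alpha, dfn, dfd)    # reject H0 if observed F > f_crit
p_value = stats.f.sf(2.5, dfn, dfd)          # p-value for an illustrative observed F = 2.5
print(f_crit, p_value)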
ANOVA: example
• Groups: adults w/ 3 different activity levels
• X = % REM sleep

Group                Sample size   Sample mean   Sample variance
Very Active          10            26.6          3
Moderately Active    10            25.1          14.4
Inactive             10            26.7          4.7

• MSB = (sample size)(variance of sample means) = ...
• MSW = (mean of sample variances) = ...
• F = MSB/MSW = ...
• dfn = # groups − 1 = ...   dfd = Ntotal − # groups = ...
• Fcritical = ...   p-value = ...
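A minimal sketch of working this example through in Python, using only the summary statistics in the table above (scipy supplies the F critical value and p-value; the α = .05 level is assumed for illustration):

import numpy as np
from scipy import stats

# Summary statistics from the table (n = 10 per group)
n = 10
means = np.array([26.6, 25.1, 26.7])       # sample means
variances = np.array([3.0, 14.4, 4.7])     # sample variances
k = len(means)                             # number of groups
N = k * n                                  # total sample size

msb = n * means.var(ddof=1)                # (sample size) * (variance of sample means)
msw = variances.mean()                     # mean of the sample variances
F = msb / msw

dfn, dfd = k - 1, N - k
alpha = 0.05                               # assumed significance level
f_crit = stats.f.ppf(1 - alpha, dfn, dfd)
p_value = stats.f.sf(F, dfn, dfd)

print(msb, msw, F, dfn, dfd, f_crit, p_value)
# Reject H0 only if F > f_crit (equivalently, if p_value <= alpha)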